Sunday, January 15, 2012

Top NoSQL databases

HBase is an open-source, nonrelational, distributed database that is modeled after Google's BigTable and written in Java. It was developed as part of the Apache Software Foundation's Apache Hadoop project and runs on top of the Hadoop Distributed File System (HDFS).

Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System papers. It is a top-level Apache project, written in Java and built by a global community of contributors. Yahoo is the largest contributor to the project and uses Hadoop extensively across its businesses.

MongoDB is an open-source, high-performance, schema-free, document-oriented NoSQL database system written in C++. It manages collections of BSON documents that can be nested in complex hierarchies and still be easy to query and index, so many applications can store data in a natural way that matches their native data types and structures. 10gen began developing MongoDB in October 2007. The first public release was in February 2009.
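
To make the idea of nested documents that stay easy to query a little more concrete, here is a minimal sketch in plain Python. It does not use MongoDB or any driver: the `post` document, the `get_path` helper, and the `find` helper are all hypothetical names invented for illustration. They only mimic, under that assumption, how a dot-separated path can reach into a nested document.

```python
from typing import Any, Dict, List

# A hypothetical document, nested the way a BSON document can be.
post: Dict[str, Any] = {
    "title": "Top NoSQL databases",
    "author": {"name": "alice", "contact": {"email": "alice@example.com"}},
    "tags": ["nosql", "databases"],
}

# Sentinel so that a stored None is not confused with a missing field.
_MISSING = object()


def get_path(document: Dict[str, Any], path: str, default: Any = None) -> Any:
    """Follow a dot-separated path such as 'author.contact.email'."""
    current: Any = document
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def find(collection: List[Dict[str, Any]], path: str, value: Any) -> List[Dict[str, Any]]:
    """Return documents whose field at `path` equals `value`.

    If the field holds a list, a document also matches when the list
    contains `value`.
    """
    matches = []
    for doc in collection:
        found = get_path(doc, path, _MISSING)
        if found is _MISSING:
            continue
        if found == value or (isinstance(found, list) and value in found):
            matches.append(doc)
    return matches


if __name__ == "__main__":
    print(get_path(post, "author.contact.email"))
    print(len(find([post], "tags", "nosql")))
```

The point of the sketch is only that the document keeps the application's own shape (a post with an author and a list of tags) and that a query can address a nested field by its path. A real database adds indexes over such paths, which this toy linear scan does not attempt.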

Apache CouchDB, commonly referred to as CouchDB, is an open-source document-oriented database written mostly in the Erlang programming language. It is part of the NoSQL group of data stores and is designed for local replication and to scale horizontally across a wide range of devices. CouchDB is supported by commercial enterprises Couchbase and Cloudant.

Apache Cassandra is an open-source distributed database management system. It is an Apache Software Foundation top-level project[1] designed to handle very large amounts of data spread out across many commodity servers while providing a highly available service with no single point of failure. It is a NoSQL solution that was initially developed by Facebook and powered their Inbox Search feature until late 2010. Jeff Hammerbacher, who led the Facebook Data team at the time, has described Cassandra as a BigTable data model running on an Amazon Dynamo-like infrastructure.
Cassandra provides a structured key-value store with tunable consistency. Keys map to multiple values, which are grouped into column families. The column families are fixed when a Cassandra database is created, but columns can be added to a family at any time. Furthermore, columns are added only to specified keys, so different keys can have different numbers of columns in any given family. The values from a column family for each key are stored together, which makes Cassandra a hybrid between a column-oriented DBMS and a row-oriented store. Besides borrowing BigTable's data model, Cassandra has properties inspired by Amazon's Dynamo: eventual consistency, the gossip protocol, and a master-master approach to serving read and write requests.
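
The column-family model described above can be sketched as a small in-memory structure. This is a toy illustration in plain Python, not Cassandra's API or storage engine: the `ColumnFamilyStore` class and its `insert` and `get_row` methods are hypothetical names, and the sketch assumes only what the paragraph states (families fixed at creation, columns added per key at any time).

```python
from typing import Any, Dict, Iterable


class ColumnFamilyStore:
    """Toy model: column family -> row key -> {column name: value}."""

    def __init__(self, column_families: Iterable[str]) -> None:
        # The set of column families is fixed when the store is created.
        self._families: Dict[str, Dict[str, Dict[str, Any]]] = {
            name: {} for name in column_families
        }

    def insert(self, family: str, row_key: str, column: str, value: Any) -> None:
        """Add or overwrite one column for one key; no schema change is needed."""
        if family not in self._families:
            raise KeyError(f"unknown column family: {family!r}")
        # All columns of a family for a given key are kept together in one dict.
        self._families[family].setdefault(row_key, {})[column] = value

    def get_row(self, family: str, row_key: str) -> Dict[str, Any]:
        """Return a copy of every column stored for `row_key` in `family`."""
        return dict(self._families[family].get(row_key, {}))


# Hypothetical usage: two keys in the same family with different columns.
store = ColumnFamilyStore(["users"])
store.insert("users", "alice", "email", "alice@example.com")
store.insert("users", "alice", "city", "Oslo")
store.insert("users", "bob", "email", "bob@example.com")

if __name__ == "__main__":
    print(store.get_row("users", "alice"))
    print(store.get_row("users", "bob"))
```

Here `alice` ends up with two columns and `bob` with one, even though both live in the `users` family, while inserting into a family that was not declared at creation raises an error. Replication, gossip, and consistency levels are deliberately left out of the sketch.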
