aliquote.org

Overview of next gen database

January 18, 2011

The NoSQL paradigm is not a way to work with a database without having to connect to a server. It is a term coined to describe new “non-relational” models for organizing data, usually (though not necessarily) on top of a distributed architecture, and it should not be confused with the existing software that is also called NoSQL.

According to http://nosql-database.org/, NoSQL is

Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontal scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more characteristics apply as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), a huge data amount, and more. So the misleading term “nosql” (the community now translates it mostly with “not only sql”) should be seen as an alias to something like the definition above.

The key concepts are high availability and data partitioning, which come at the expense of consistency. Standard RDBMSs rely on the ACID principle: Atomicity, Consistency, Isolation, and Durability. From the CAP theorem, we know that only two of the following three characteristics can be satisfied at the same time: consistency, availability, partition tolerance. Classical RDBMSs (e.g., MySQL, Oracle) are usually described as satisfying the first two. NoSQL systems relax the requirement on consistency, and instead provide availability and partition tolerance. In contrast to ACID, we speak of BASE, which stands for Basically Available, Soft state, and Eventual consistency.
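To make "eventual consistency" a bit more concrete, here is a minimal sketch in Python. It is a toy model, not how any of the systems mentioned in this post actually work: the `Replica` class, the `anti_entropy` function, and the last-write-wins rule based on a caller-supplied timestamp are all assumptions chosen for illustration.

```python
class Replica:
    """A toy replica holding key -> (timestamp, value). Hypothetical, for illustration."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def write(self, key, value, timestamp):
        # Last-write-wins: only keep the value carrying the newest timestamp.
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def anti_entropy(replicas):
    """Push every replica's entries to every replica (a crude background sync)."""
    for source in replicas:
        for key, (timestamp, value) in list(source.store.items()):
            for target in replicas:
                target.write(key, value, timestamp)


a, b = Replica("a"), Replica("b")
a.write("user:1", "alice", timestamp=1)   # accepted by replica a only
b.write("user:1", "alicia", timestamp=2)  # a later write lands on replica b

stale = a.read("user:1")   # replica a stays available but serves the older value
anti_entropy([a, b])       # replicas exchange their state
converged = (a.read("user:1"), b.read("user:1"))
```

Both replicas keep answering reads and writes at all times (availability), a read may return an outdated value for a while (soft state), and once the replicas have exchanged their data they agree again (eventual consistency).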

NoSQL databases often achieve their performance by offering far fewer features than SQL databases, but they remain usable with massive data sets like the ones found on the Internet (e.g., at Facebook).
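As a rough illustration of this trade-off, the sketch below shows a key-value store whose whole API is `put` and `get`, with keys spread over several partitions by hashing. The `ShardedStore` class, the choice of SHA-256, and the in-memory dictionaries standing in for separate machines are assumptions of this example, not the design of any particular product.

```python
import hashlib


class ShardedStore:
    """A toy key-value store spread over a fixed number of in-memory shards."""

    def __init__(self, n_shards):
        # Each dict stands in for a separate machine (an assumption of this sketch).
        self.shards = [{} for _ in range(n_shards)]

    def _shard_for(self, key):
        # Stable hash of the key, reduced modulo the number of shards.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key, value):
        self.shards[self._shard_for(key)][key] = value

    def get(self, key, default=None):
        return self.shards[self._shard_for(key)].get(key, default)


store = ShardedStore(n_shards=4)
for i in range(1000):
    store.put(f"user:{i}", {"id": i})

sizes = [len(shard) for shard in store.shards]  # how many keys each shard holds
```

There are no joins and no queries on anything but the key, which is precisely what makes it easy to add shards as the data grows. Note that a plain modulo scheme like this one forces most keys to move when the number of shards changes; real systems generally use more elaborate partitioning schemes.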

Some current projects that reflect this philosophy of data management include: Voldemort (LinkedIn); BigTable (Google); CouchDB; Redis; HBase (Java), which provides BigTable-like support for Hadoop; Neptune, based on ZooKeeper (Hadoop HDFS); Apache Cassandra (Facebook); and MongoDB (see Opricot for an HTML-based GUI).

The Wikipedia entry is full of other interesting links, but see also NoSQL In The Cloud. I also found very interesting handouts on these various technologies on NOSQL debrief. Finally, a lot of papers can be found on http://nosqlsummer.org/.
