Next Gen database overview

2011-01-18

The NoSQL paradigm is not a way to work with a database without having to connect to a server. It is merely a term coined to describe new "non-relational" models for organizing data, often (though not necessarily) on a distributed architecture, and it should not be confused with the existing software of the same name, NoSQL.

According to http://nosql-database.org/, NoSQL is

Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontal scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more characteristics apply as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), a huge data amount, and more. So the misleading term "nosql" (the community now translates it mostly with "not only sql") should be seen as an alias to something like the definition above.

The key concepts are high availability and data partitioning, which come at the expense of consistency. Standard RDBMSs rely on the ACID principle: Atomicity, Consistency, Isolation, and Durability. From the CAP theorem, we know that at most two of the following three characteristics can be satisfied at the same time: consistency, availability, partition tolerance. Classical RDBMSs (e.g., MySQL, Oracle) are usually described as satisfying the first two. NoSQL systems typically relax the requirement on consistency, and instead provide availability and partition tolerance. In contrast to ACID, we speak of BASE, which stands for Basically Available, Soft state, and Eventual consistency.
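To make the "eventual consistency" part of BASE a bit more concrete, here is a minimal toy sketch in Python. It is not how any of the systems mentioned in this post are implemented: the `Replica` class, the `sync` function, and the integer version numbers are all hypothetical names chosen for illustration, and conflicts are resolved with a simple "highest version wins" rule.

```python
class Replica:
    """One node holding a full copy of the data (toy model, hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (version, value)

    def write(self, key, value, version):
        # Accept the write only if it is newer than what we already hold.
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange entries in both directions; the highest version wins."""
    for src, dst in ((a, b), (b, a)):
        for key, (version, value) in list(src.store.items()):
            dst.write(key, value, version)


r1, r2 = Replica("r1"), Replica("r2")

# The write is acknowledged by r1 alone, so the system stays available.
r1.write("user:42", "alice", version=1)

# Until the replicas exchange state, r2 does not know about the write.
stale = r2.read("user:42")

# After synchronization, both replicas agree.
sync(r1, r2)
fresh = r2.read("user:42")

print(stale, fresh)
```

The point of the sketch is only the window between the write and the call to `sync`: during that time a reader may get a different answer depending on which replica it asks, which is exactly what ACID-style consistency rules out.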

NoSQL systems often achieve performance by offering far fewer features than SQL databases, but they are usable with massive data sets like the ones found on the Internet (e.g., Facebook).
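As an illustration of what "far fewer features" can look like, the sketch below shows a key-value store whose whole API is `put` and `get`, with keys spread over several partitions by hashing. This is an assumption-laden toy, not the design of any particular product: `ShardedKV` is a made-up name, the shards are plain in-memory dictionaries standing in for separate servers, and real systems generally use more elaborate partitioning schemes.

```python
import hashlib


class ShardedKV:
    """Toy key-value store partitioned by key hash (hypothetical example)."""

    def __init__(self, n_shards):
        # Each dict stands in for a separate server holding one partition.
        self.shards = [dict() for _ in range(n_shards)]

    def _shard_for(self, key):
        # A stable hash of the key decides which partition owns it.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        return self._shard_for(key).get(key, default)


kv = ShardedKV(n_shards=4)
for i in range(1000):
    kv.put(f"user:{i}", {"id": i})

print(kv.get("user:7"))
print([len(shard) for shard in kv.shards])  # how the keys were spread
```

There are no joins, no secondary indexes, and no transactions across keys here. Giving those up is what makes it straightforward to put each partition on a different machine.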

Some current projects that reflect this philosophy of data management: Voldemort (LinkedIn); BigTable (Google); CouchDB; Redis; HBase (Java), which provides BigTable-like support for Hadoop; Neptune, based on ZooKeeper (Hadoop HDFS); Apache Cassandra (Facebook); MongoDB (see Opricot for an HTML-based GUI).

The Wikipedia entry is full of other interesting links, but see also NoSQL In The Cloud. I also found very interesting handouts on these various technologies at NOSQL debrief. Finally, a lot of papers can be found at http://nosqlsummer.org/.

I should eventually think about how the NoSQL approach will impact data mining, probably in a future post.
