NOSQL

Introduction:

We all know RDBMSs & we have been pretty happy with them. Transactions in an RDBMS are well protected, recovery is good & we are able to come back from failures reliably, and the row-column structure is a good data model that structures the data as it is loaded, so we can retrieve it easily. So why was there a need for a new approach called NoSQL? The main motivation was the huge volume of unstructured data building up, and loading this data into an RDBMS with a fixed schema was a challenge. Secondly, RDBMS systems scale up well (give them more memory & CPU and they perform better), but they cannot really scale out (horizontally, by adding more machines). And lastly, an RDBMS focuses on data consistency rather than performance, and stressing consistency has a cost in performance. This blog discusses some of the basic theories NoSQL systems are built upon and sets a base for further interest in these systems.

CAP Theorem:

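Let us get into something called the CAP theorem: Consistency, Availability & Partition tolerance. It states that when the network is partitioned (some nodes cannot reach others), a distributed system has to choose between consistency (every read sees the latest write) and availability (every request gets a response); it cannot guarantee both. These dimensions give an insight into how a particular DBMS behaves. Many NoSQL systems, such as Dynamo-style stores like Cassandra, fall into the Availability-Partition tolerance (AP) category. So they relax the strict data consistency that an RDBMS is very careful about.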
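To make the trade-off concrete, here is a minimal Python sketch of a replica cut off by a partition, assuming a newer value was written on the other side of the partition and never reached it. The `Replica` class and its `mode` switch are hypothetical and only illustrate the two choices; real databases make this decision in far more nuanced ways.

```python
class Unavailable(Exception):
    """Raised by a CP-style replica that refuses to answer during a partition."""


class Replica:
    """A single replica holding one value (hypothetical, for illustration only)."""

    def __init__(self, mode: str, value: str = "v1", partitioned: bool = False):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = value              # this replica's local copy
        self.partitioned = partitioned  # True when peers are unreachable

    def read(self) -> str:
        if self.partitioned and self.mode == "CP":
            # Choose consistency: refuse rather than risk returning stale data.
            raise Unavailable("cannot confirm the latest value during a partition")
        # Choose availability: answer with the local copy, which may be stale.
        return self.value


# Suppose "v2" was written on the other side of the partition; both replicas still hold "v1".
ap = Replica("AP", partitioned=True)
cp = Replica("CP", partitioned=True)
print("AP read:", ap.read())  # prints "AP read: v1", a stale but available answer
try:
    cp.read()
except Unavailable as err:
    print("CP read failed:", err)
```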

Eventual Consistency:

RDBMS systems follow the ACID (atomicity, consistency, isolation, durability) properties for a transaction. The system ensures that any data modification made by a transaction is consistent & that everyone sees the same data, typically by using locks (which can cause contention and even deadlocks). Isolation levels control how strictly transactions are kept apart from each other. In contrast, NoSQL systems say that your data will be eventually consistent. This means that right after an update, not all copies of the data may reflect it, but if no new updates are made, all copies eventually converge to the same value. In return, the data stays available & a NoSQL instance can scale out to many nodes, unlike RDBMS systems, which cannot scale out to thousands of servers. In place of ACID, NoSQL systems follow a less formal model called BASE (basically available, soft state, eventually consistent): your data remains available even after multiple failures, the strict consistency requirement of an RDBMS is given up, & the data becomes consistent at some point in time, without a guarantee of when that will be.
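The idea of replicas converging can be sketched in a few lines of Python. This is a toy model, assuming last-write-wins conflict resolution by timestamp (Cassandra resolves conflicting cell values by timestamp; Dynamo itself used vector clocks) and a simple all-to-all anti-entropy exchange. The `Node` class and `anti_entropy` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # key -> (timestamp, value); assumes timestamps are unique per key
    data: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        """Keep the value only if it is newer than what this replica holds (last-write-wins)."""
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(nodes):
    """Push every replica's entries to every other replica so they converge."""
    for src in nodes:
        for key, (ts, value) in list(src.data.items()):
            for dst in nodes:
                dst.write(key, value, ts)


a, b, c = Node("a"), Node("b"), Node("c")
a.write("cart", ["book"], ts=1)   # update accepted by replica a only
print(b.read("cart"))             # None: b has not seen the update yet
anti_entropy([a, b, c])
print(b.read("cart"))             # ['book']: all replicas now agree
```

Between the write and the exchange, a reader hitting replica `b` sees old data; that window is exactly the "eventually" in eventual consistency.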

Major impacting NoSQL systems:

Three systems made a major impact on the NoSQL space, & most later systems follow models taken from them:

Memcached – First demonstrated that in-memory indexes can be highly scalable, by distributing & replicating data across multiple nodes. It uses the consistent hashing technique to decide which node stores a given key.
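Consistent hashing places both nodes and keys on a hash ring; each key belongs to the first node found clockwise from the key's position, so adding or removing a node moves only the keys in that node's segments. A minimal Python sketch follows, assuming MD5 as the hash and 100 virtual points per node to spread load evenly. `HashRing` is a hypothetical class, not a Memcached client API.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        # MD5 is used only to spread keys evenly, not for security.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring has no nodes")
        # Walk clockwise to the first point at or after the key's hash, wrapping around.
        i = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[i % len(self._ring)][1]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add("cache-d")
moved = [k for k in keys if ring.node_for(k) != before[k]]
# Only keys that now fall into cache-d's segments move; every other key stays put.
print(len(moved), "of", len(keys), "keys moved, all to", {ring.node_for(k) for k in moved})
```

With plain `hash(key) % number_of_nodes`, adding a fourth server would remap most keys; the ring limits the disruption to the new node's share.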

Dynamo – Developed at Amazon, demonstrated the idea that eventual consistency is a way to achieve high availability & scalability.

BigTable – Developed by Google, demonstrated that persistent record storage can be scaled to thousands of nodes.

Cassandra:

Apache Cassandra is a hybrid database: it takes its distributed design from Dynamo & its data model from BigTable.
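BigTable's data model is usually described as a sparse, sorted map indexed by row key, column key & timestamp. The toy Python class below (a hypothetical `WideColumnTable`, in memory only) sketches that shape: rows hold only the columns written to them, each cell keeps timestamped versions, and rows can be scanned in key order. Cassandra's take on this model differs in details, for example it does not expose multiple timestamped versions of a cell.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy model of a BigTable-style map: (row key, column key, timestamp) -> value.
    Rows are sparse: each row stores only the columns that were actually written."""

    def __init__(self):
        # row key -> column key -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        self._rows[row][column][ts] = value

    def get(self, row, column):
        """Return the newest version of a cell, or None if it was never written."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, end_row):
        """Yield (row, {column: newest value}) for rows in [start_row, end_row), in key order."""
        for row in sorted(self._rows):
            if start_row <= row < end_row:
                cols = self._rows[row]
                yield row, {c: v[max(v)] for c, v in sorted(cols.items())}


t = WideColumnTable()
t.put("com.example/index", "contents:html", "<html>v1</html>", ts=1)
t.put("com.example/index", "contents:html", "<html>v2</html>", ts=2)
t.put("com.example/about", "anchor:other.example", "About us", ts=1)
print(t.get("com.example/index", "contents:html"))  # <html>v2</html>
for row, cols in t.scan("com.example/", "com.example/~"):
    print(row, cols)
```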

Major features of Cassandra include:

Decentralization
Linear scalability
Tunable consistency
MapReduce support (also supports Pig & Hive)
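Tunable consistency means the client chooses, per request, how many replicas must acknowledge a write (W) or answer a read (R), given a replication factor (RF). A common rule of thumb is that reads are guaranteed to see the latest acknowledged write when R + W > RF, because the read and write replica sets must then overlap. The sketch below models a few of Cassandra's consistency level names (ONE, TWO, THREE, QUORUM, ALL) with the quorum size floor(RF/2) + 1; the helper functions are hypothetical and ignore failure edge cases.

```python
def replicas_required(level: str, rf: int) -> int:
    """Replicas that must respond for a consistency level (a subset of Cassandra's levels)."""
    levels = {"ONE": 1, "TWO": 2, "THREE": 3, "QUORUM": rf // 2 + 1, "ALL": rf}
    n = levels[level]
    if n > rf:
        raise ValueError(f"{level} needs {n} replicas but RF is {rf}")
    return n


def is_strongly_consistent(read_level: str, write_level: str, rf: int) -> bool:
    """True when every read replica set must overlap every write replica set: R + W > RF."""
    r = replicas_required(read_level, rf)
    w = replicas_required(write_level, rf)
    return r + w > rf


for read, write in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read, write, is_strongly_consistent(read, write, rf=3))
```

With RF = 3 the loop prints `ONE ONE False`, `QUORUM QUORUM True` and `ONE ALL True`: QUORUM reads with QUORUM writes trade some latency for consistency, while ONE/ONE favors speed and availability.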

Comparison between Cassandra & RDBMS:

Apache Cassandra is an open-source, distributed NoSQL data store. Some of its features, compared with a traditional RDBMS (MS SQL Server), are given below:

Cassandra	RDBMS (MS SQL Server)
Can scale out to thousands of servers to store huge amounts of data that does not fit on a single server	Primarily scales up: performance depends on the RAM & CPU of a single server, and huge data sets are not natively spread across many servers
Supports a primary key; its partition key determines how data is distributed across nodes	Primary keys are supported, backed by a B-tree index structure
Supports secondary indexes for faster data retrieval; each node indexes only the data it stores	Secondary indexes are supported, also as B-tree index structures
Wide-column (column-family) data model	Row-oriented relational data model
Does not support full ACID transactions; atomicity & isolation are provided at the partition level	ACID properties are supported
Data is generally denormalized for better read performance	Data is normalized to avoid redundancy
Highly tunable, e.g. the consistency level can be chosen per request to suit application requirements	Can be tuned to some extent without compromising the ACID properties of a transaction
Not recommended when data must be strongly consistent, e.g. financial / banking data	Recommended when strong consistency is required
Does not support ad hoc queries well; queries need to be planned & known at design time	Supports ad hoc queries
No single point of failure: the application can keep working when some nodes are down (highly available by design)	The server is a single point of failure: if it goes down for any reason, the database is down (a standby must be set up for high availability)
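Two rows of the table, denormalization and the lack of ad hoc queries, come from the same habit: in Cassandra you typically model one table per known query. Here is a hypothetical Python sketch of that idea, using plain dictionaries to stand in for two tables keyed by different partition keys; `User` and `UserStore` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    email: str
    name: str


class UserStore:
    """Query-first model: one 'table' per known query, keyed by that query's lookup field."""

    def __init__(self):
        self.users_by_id = {}     # serves: get user by id
        self.users_by_email = {}  # serves: get user by email

    def save(self, user: User) -> None:
        # Denormalized: the same record is written to every table that serves a query.
        # In Cassandra the application keeps such copies in sync, e.g. with a logged batch.
        old = self.users_by_id.get(user.user_id)
        if old is not None and old.email != user.email:
            del self.users_by_email[old.email]  # drop the copy stored under the stale key
        self.users_by_id[user.user_id] = user
        self.users_by_email[user.email] = user

    def by_id(self, user_id):
        return self.users_by_id.get(user_id)

    def by_email(self, email):
        return self.users_by_email.get(email)


store = UserStore()
store.save(User("u1", "ravi@example.com", "Ravi"))
print(store.by_email("ravi@example.com"))
```

A query that was not planned for, such as finding users by name, has no table to serve it; answering it would mean scanning every row, which is why such queries need to be known at design time.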

Conclusion:

So after reading all this, one thing is pretty clear: RDBMS is not going away, as many applications require their data to be consistent & cannot afford to rely on inconsistent data at any point in time. NoSQL systems, on the other hand, do a great job of scaling out & providing high performance with fault tolerance.

Learn tech with Ravi
