Introduction:
Most of us know relational database management systems (RDBMS) and have been
largely happy with them. Transactions are well protected, recovery from
failures works well, and the row-column structure is a good data model: it
structures the data at load time so that it can be retrieved easily. So why was
a new approach called NoSQL needed? The first motivation was the growing volume
of unstructured data, which is hard to load into an RDBMS with a fixed schema.
Second, RDBMS scale up well (give them more memory and CPU and they perform
better) but do not easily scale out (horizontally, by adding more machines).
Lastly, RDBMS put data consistency ahead of performance, and the stronger the
consistency guarantee, the greater the impact on performance. This blog
discusses some of the basic theories NoSQL systems are built upon and sets a
base for generating further interest in these systems.
CAP Theorem:
Let us start with the CAP theorem: Consistency, Availability and Partition
tolerance. The theorem says that a distributed data system cannot guarantee all
three at the same time; when a network partition occurs, it has to give up
either consistency or availability. These dimensions give an insight into how a
particular DBMS is designed. Many NoSQL systems fall into the
Availability-Partition tolerance (AP) category, so they relax the strict data
consistency that RDBMS are very careful about.
Eventual Consistency:
RDBMS follow the ACID (atomicity, consistency, isolation, durability) rules for
a transaction. The system ensures that any data modification made by a
transaction is consistent and that everyone sees the same data. It does this
with locks, which can sometimes lead to deadlocks, and with different isolation
levels. In contrast, NoSQL systems say that your data will be eventually
consistent. This means that right after an update the copies of the data may
not all agree, but if no new updates arrive they all converge to the same
value. In exchange, the data stays available and the NoSQL instance can scale
to many nodes, unlike RDBMS, which cannot easily scale out to thousands of
servers. As a counterpart to ACID, NoSQL systems follow a less well-known model
called BASE (basically available, soft state, eventually consistent). The data
stays available even after multiple failures, the strict consistency
requirement of RDBMS is abandoned, and the data becomes consistent at some
point in time, with no guarantee of when that will be.
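To make the idea concrete, here is a minimal Python sketch of eventual consistency. It is a toy model, not how any particular NoSQL product is implemented: the class names `Replica` and `EventuallyConsistentStore`, the logical `clock`, and the last-write-wins rule are all assumptions made for illustration.

```python
from collections import deque


class Replica:
    """One copy of the data; maps key -> (version, value)."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def apply(self, key, version, value):
        # Last-write-wins (an assumption): keep the update only if it is newer.
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


class EventuallyConsistentStore:
    """Hypothetical toy cluster: a write lands on one replica, others catch up later."""

    def __init__(self, replica_names):
        self.replicas = [Replica(n) for n in replica_names]
        self.pending = deque()  # updates not yet delivered to a replica
        self.clock = 0          # simple logical version counter

    def write(self, key, value):
        self.clock += 1
        # Acknowledge after updating a single replica: available, not yet consistent.
        self.replicas[0].apply(key, self.clock, value)
        for replica in self.replicas[1:]:
            self.pending.append((replica, key, self.clock, value))

    def read_from(self, index, key):
        return self.replicas[index].read(key)

    def replicate(self):
        # Background propagation: deliver every queued update.
        while self.pending:
            replica, key, version, value = self.pending.popleft()
            replica.apply(key, version, value)


store = EventuallyConsistentStore(["a", "b", "c"])
store.write("balance", 100)
stale = store.read_from(2, "balance")      # replica "c" has not seen the write yet
store.replicate()
converged = store.read_from(2, "balance")  # after propagation every copy agrees
print(stale, converged)
```

The read before `replicate()` returns nothing for the key, while the read after it returns the written value. That gap is the "soft state" of BASE; a real system would propagate updates continuously rather than on an explicit call.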
Major impacting NoSQL systems:
A few systems made a major impact on the NoSQL world, and most NoSQL systems
follow the models of these three:
- Memcached: first demonstrated that in-memory indexes can be highly scalable,
distributing and replicating data across multiple nodes. It uses the consistent
hashing technique.
- Dynamo: developed at Amazon, demonstrated that eventual consistency is a way
to achieve high availability and scalability.
- BigTable: developed by Google, demonstrated that persistent record storage
can be scaled to thousands of nodes.
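The consistent hashing technique mentioned for Memcached can be sketched in a few lines of Python. This is an illustrative sketch only, not the code of any Memcached client: the `HashRing` class, the `vnodes` parameter, the node names, and the choice of MD5 as the hash function are assumptions.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # MD5 is used here only as a stable, well-spread hash, not for security.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []    # sorted hash positions on the ring
        self._owner = {}   # position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node is placed at many positions to even out the load.
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            bisect.insort(self._ring, pos)
            self._owner[pos] = node

    def remove_node(self, node):
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            self._ring.remove(pos)
            del self._owner[pos]

    def get_node(self, key):
        if not self._ring:
            raise ValueError("ring is empty")
        # First position clockwise from the key's hash, wrapping around the ring.
        idx = bisect.bisect_right(self._ring, _hash(key)) % len(self._ring)
        return self._owner[self._ring[idx]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.get_node(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved to the new node")
```

The point of the design is visible in `moved`: when a node joins, only the keys that now belong to the new node change owner, whereas a plain `hash(key) % number_of_nodes` scheme would reshuffle most keys.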
Cassandra:
Apache Cassandra is a hybrid database that takes its distributed design from
Dynamo and its data model from BigTable.
Major features of Cassandra include:
- Decentralization
- Linear scalability
- Tunable consistency
- MapReduce support (also works with Pig and Hive)
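Tunable consistency is easiest to see with a little arithmetic. With N replicas of each piece of data, a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. The Python sketch below only does this arithmetic for the consistency levels ONE, QUORUM and ALL; the function names are hypothetical and it does not talk to a real cluster.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    levels = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[level]


def reads_see_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


rf = 3  # assumed replication factor for the example
for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
    strong = reads_see_latest_write(write_level, read_level, rf)
    print(f"write={write_level:<6} read={read_level:<6} overlap guaranteed: {strong}")
```

With three replicas, ONE/ONE gives the fastest and most available operations but no overlap guarantee, while QUORUM/QUORUM and ALL/ONE both guarantee overlap at the cost of waiting for more replicas. The application picks the trade-off per operation.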
Comparison between Cassandra & RDBMS
Apache Cassandra is an open source, distributed NoSQL data system. Some of its
features are compared with an RDBMS (MS SQL Server) below:
| Cassandra | RDBMS (MS SQL Server) |
| --- | --- |
| Can scale out to thousands of servers to store huge amounts of data that do not fit on a single server | Mainly scales up for better performance: performance depends on RAM and CPU, and multiple servers cannot easily share huge amounts of data |
| Supports primary indexing; the primary key determines how data is partitioned between nodes | Primary indexing is supported; a B-tree index structure is created |
| Supports secondary indexes for faster data retrieval; uses an in-memory indexing concept to store index details | Secondary indexes are supported, again as a B-tree index structure |
| Columnar (column-family) data model | Row-oriented data model |
| Does not support full ACID transactions | ACID properties are supported |
| Data is generally denormalized for better performance | Data is normalized to avoid redundancy |
| Highly tunable; can be tuned at any level based on application requirements | Can be tuned to some extent without compromising the ACID properties of a transaction |
| Not recommended if data must be highly consistent, such as financial or banking data | Recommended if consistency is required |
| Does not support ad hoc queries from the application side; queries need to be planned and known at design time | Supports ad hoc queries |
| No single point of failure; the application can still work if some nodes are down (highly available by design) | The server is a single point of failure; if it goes down, the database is down (a standby must be set up for high availability) |
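The rows on denormalization and planned queries go together: because queries must be known up front, the same data is usually written once per query pattern. The Python sketch below imitates this with plain dictionaries. The two "tables" and the `record_order` helper are hypothetical and only illustrate the design idea, not Cassandra's storage engine.

```python
from collections import defaultdict

# Two hypothetical "tables", each keyed for exactly one planned query.
orders_by_customer = defaultdict(list)  # query: all orders for a customer
orders_by_product = defaultdict(list)   # query: all orders for a product


def record_order(order_id, customer_id, product_id, quantity):
    row = {
        "order_id": order_id,
        "customer_id": customer_id,
        "product_id": product_id,
        "quantity": quantity,
    }
    # Denormalized write: one copy of the row is stored per query pattern.
    orders_by_customer[customer_id].append(dict(row))
    orders_by_product[product_id].append(dict(row))


record_order(1, "alice", "book", 2)
record_order(2, "bob", "book", 1)
record_order(3, "alice", "pen", 5)

# Each read is a single key lookup; no join or ad hoc filter is needed.
alice_orders = [r["order_id"] for r in orders_by_customer["alice"]]
book_orders = [r["order_id"] for r in orders_by_product["book"]]
print(alice_orders, book_orders)
```

The cost is redundancy: every order is stored twice, and a new kind of query (say, orders by date) needs a new table filled at write time. A normalized RDBMS schema stores the order once and answers such questions with ad hoc joins.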
Conclusion:
After reading all this, one thing is fairly certain: RDBMS are not going away,
as there are many applications that require consistent data and cannot afford
to rely on inconsistent data at any point in time. NoSQL systems, on the other
hand, do a great job of scaling and providing good performance with fault
tolerance.
