Cassandra vs HBase

The following is a brief rant on why Cassandra is better than HBase.

  • Cassandra was built from the ground up to be a scalable data store, whereas HBase runs on top of HDFS, which isn't designed for random reads. HBase manages to support random reads/writes with a few tricks: in-memory caching, asynchronous reads/writes, and compaction. It is a roundabout way to do things and certainly not as efficient as Cassandra.

  • HBase requires multiple servers/services to run (HBase Master, RegionServers, HDFS NameNodes, ZooKeeper and, optionally, Apache Phoenix), which makes it very brittle. Cassandra is more highly available than its counterpart and has no single point of failure.

  • Cassandra's tunable consistency is very handy: it can be configured per operation to give fast writes with consistent reads, or consistent writes with fast reads.

  • Cassandra has more affinity with Spark than HBase does; HBase is more commonly associated with Hadoop MapReduce. Spark is widely regarded as far more advanced than Hadoop MapReduce, and Cassandra/Spark is a much better tech stack on the whole than HBase/Hadoop MapReduce.

  • Cassandra uses finer-grained data blocks (around 256 KB), whereas HBase is stuck with something like 64 MB blocks because it runs on HDFS. We could reduce the HDFS block size, but that would increase the load on the HDFS NameNode, which is again a single point of failure. Cassandra has no single point of failure.

  • Cassandra has more contributors and more development activity than HBase. This translates into better community support, better tooling and libraries, and faster delivery of features and bug fixes.

  • JVM languages are first-class citizens in HBase. Other languages are supported via Thrift/REST, but with a significant performance penalty. Cassandra, on the other hand, has no bias toward any language: drivers for every language talk to it through the same CQL interface.

  • Cassandra's declarative CQL API is more advanced than HBase's imperative API.

  • Cassandra is more widely adopted than HBase, and the general industry trend is swinging in favour of Cassandra.

  • The Cassandra website states that Cassandra is three years ahead of HBase in many aspects and features.
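To make the "no single point of failure" claim concrete, here is a minimal sketch of masterless, ring-based replica placement in the spirit of Cassandra's token ring. Assumptions: MD5 stands in for Cassandra's Murmur3 partitioner, each node owns a single token (real clusters typically use many virtual nodes), replicas are chosen SimpleStrategy-style by walking the ring clockwise, and the node names are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a position on the ring.

    Assumption: MD5 stands in for Cassandra's Murmur3 partitioner so the
    sketch needs only the standard library.
    """
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class TokenRing:
    """Masterless replica placement: every node can compute it from the ring alone."""

    def __init__(self, nodes, replication_factor):
        if replication_factor > len(nodes):
            raise ValueError("replication factor cannot exceed the number of nodes")
        self.rf = replication_factor
        # One token per node; real clusters usually assign many virtual nodes per host.
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key):
        """Walk clockwise from the key's token and take the next rf distinct nodes."""
        # bisect_left returns len(ring) past the last token; % wraps to the start.
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]  # hypothetical hosts
    ring = TokenRing(nodes, replication_factor=3)
    owners = ring.replicas("user:42")
    print("replicas for user:42:", owners)

    # Simulate one replica going down: the remaining two can still serve the key.
    down = owners[0]
    print("still reachable:", [n for n in owners if n != down])
```

Because placement depends only on the ring, any node (or client driver) can route a request without consulting a master, and losing one replica still leaves the others reachable.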
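The tunable-consistency point boils down to simple quorum arithmetic: with replication factor N, a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N. The sketch below is a simplified model covering only Cassandra's ONE, QUORUM and ALL levels in a single datacenter; it is not Cassandra's implementation.

```python
def replicas_for(level: str, n: int) -> int:
    """Replicas that must respond for a consistency level (simplified model)."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return n // 2 + 1  # strict majority of the n replicas
    if level == "ALL":
        return n
    # Other levels (TWO, LOCAL_QUORUM, EACH_QUORUM, ...) are not modelled here.
    raise ValueError(f"unsupported level: {level}")


def is_strongly_consistent(write_level: str, read_level: str, n: int) -> bool:
    """True when every read replica set must intersect every write replica set."""
    return replicas_for(write_level, n) + replicas_for(read_level, n) > n


if __name__ == "__main__":
    n = 3  # replication factor
    for w, r in [("ONE", "ALL"), ("ALL", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ONE")]:
        print(f"W={w:<6} R={r:<6} consistent={is_strongly_consistent(w, r, n)}")
```

With N = 3, fast writes at ONE paired with reads at ALL, fast reads at ONE paired with writes at ALL, or QUORUM on both sides all satisfy R + W > N; ONE/ONE trades that guarantee for latency.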
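The block-size point is easy to quantify: the HDFS NameNode tracks every block in memory, so shrinking blocks multiplies the number of entries it must hold. This sketch just counts blocks for a hypothetical 1 TiB dataset at the two block sizes mentioned above; it deliberately ignores replication and per-file overhead.

```python
KiB = 1024
MiB = 1024 ** 2
TiB = 1024 ** 4


def block_count(data_bytes: int, block_bytes: int) -> int:
    """Blocks needed to store data_bytes; a partially filled last block still counts."""
    return -(-data_bytes // block_bytes)  # ceiling division on integers


if __name__ == "__main__":
    data = 1 * TiB  # hypothetical dataset size
    for label, size in [("64 MiB", 64 * MiB), ("256 KiB", 256 * KiB)]:
        print(f"{label:>8}: {block_count(data, size):>9,} blocks tracked by the NameNode")
```

Going from 64 MiB to 256 KiB blocks raises the count from 16,384 to 4,194,304, a 256-fold increase in block metadata for the NameNode to hold.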

To conclude, Cassandra is simply a better piece of software, designed from the ground up to be parallel, scalable and fault tolerant, with efficient reads/writes. HBase, on the other hand, runs on top of HDFS, which was not designed for random reads/writes and is not truly linearly scalable because of its asymmetric architecture, with the NameNode as a single point of failure. Many benchmark tests have shown that Cassandra scales better than HBase and is much faster. Cassandra is emerging as the clear winner in the NoSQL wars against HBase, and it is set to gain more adoption and community support and to add features at a much faster pace than HBase.
