Apache Cassandra recently incremented its major version from 3 to 4 after nearly six years of work. Six years encompasses almost an entire technology cycle, with new Java virtual machines, new system kernels, new hardware, new libraries and even new algorithms. Progress in these areas presented the engineers behind Cassandra with an unprecedented opportunity to achieve new levels of performance. Did they seize it?
As engineers behind ScyllaDB, a Cassandra-compatible open source database designed from the ground up for extremely high throughput and low latency, we were curious about the performance of Cassandra 4.0. Specifically, we wanted to understand how far Cassandra 4.0 performance has advanced over Cassandra 3.11, and how it compares with ScyllaDB Open Source 4.4.3. So we put them all to the test.
Cassandra 4.0 is an advance over Cassandra 3.11. It is clear that Cassandra 4.0 has aptly piggybacked on advancements to the JVM, and upgrading from Cassandra 3.11 to Cassandra 4.0 will benefit many use cases.
In our test setup, Cassandra 4.0 showed a 25% improvement for a write-only, disk-intensive workload and a 33% improvement for read-only workloads with either a low or a high cache hit rate. Otherwise, the maximum throughput of the two Cassandra releases was relatively similar.
However, most workloads won’t run at maximum utilization, and tail latency at maximum utilization is usually poor. In our tests, we therefore recorded the throughput each database sustained under a service-level agreement of less than 10 milliseconds for both P90 and P99 latency. At this service level, Cassandra 4.0, powered by the new JVM/GC (JVM garbage collection), can deliver twice the throughput of Cassandra 3.11. Beyond sheer performance, we tested a wide range of administrative operations, including adding nodes, doubling a cluster, removing nodes and compaction, all of them under emulated production load. Cassandra 4.0 improves the times for these admin operations by up to 34%.
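To make the latency-bound throughput metric concrete, here is a minimal sketch of how one might pick the highest load level that still meets the SLA. It is not the tooling used in our benchmarks. The function names, the nearest-rank percentile method and the sample latencies are all assumptions made up for illustration.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100 (illustrative helper)."""
    if not samples:
        raise ValueError("no latency samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def max_throughput_within_sla(runs, sla_ms=10.0):
    """Return the highest offered load (ops/s) whose P90 and P99 stay under sla_ms.

    `runs` maps an offered load in ops/s to the latencies (ms) observed at that load.
    Returns None if no load level meets the SLA.
    """
    passing = [
        rate
        for rate, latencies in runs.items()
        if percentile(latencies, 90) < sla_ms and percentile(latencies, 99) < sla_ms
    ]
    return max(passing) if passing else None


# Made-up latency samples, NOT measurements from the benchmark.
sample_runs = {
    10_000: [1.0] * 99 + [4.0],
    15_000: [3.0] * 90 + [8.0] * 10,
    20_000: [2.0] * 95 + [12.0] * 5,  # P99 is 12 ms, so this level fails the SLA
}
best = max_throughput_within_sla(sample_runs)
```

With these invented samples, `best` is the 15,000 ops/s level: the 20,000 ops/s run has a healthy P90 but its P99 exceeds 10 ms, which is exactly the kind of tail behavior that a maximum-throughput figure hides.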
But for data-intensive applications that require ultra-low latency with extremely high throughput, consider other options such as ScyllaDB, the fastest NoSQL database. ScyllaDB provides the same Cassandra Query Language (CQL) interface and queries, the same drivers, even the same on-disk SSTable format, but with a modern architecture designed to eliminate Cassandra’s performance issues, limitations and operational barriers. ScyllaDB consistently and significantly outperformed Cassandra 4.0 in our benchmarks. On identical hardware, ScyllaDB withstood up to 5x greater traffic and offered lower latencies than Apache Cassandra 4.0 in almost every tested scenario. ScyllaDB also completed admin tasks 2.5 to 4 times faster than Cassandra 4.0.
Moreover, ScyllaDB’s feature set goes beyond Cassandra’s in many respects. The bottom line: Cassandra’s performance has improved since its initial release in 2008, but ScyllaDB has leapt ahead of Cassandra with its shared-nothing, shard-per-core architecture that takes full advantage of modern infrastructure and networking capabilities.