Scylla vs Apache Cassandra – Performance Benchmark by Samsung
Samsung MSL (Memory Solutions Lab) recently released benchmark results from a YCSB evaluation they conducted. We are thrilled to share Samsung’s results, which reiterate previous benchmark findings that Scylla performs 10X better than Apache Cassandra. If you have high-end hardware, you can expect the same results. On smaller machines, the difference is in the range of 1.5X to 3X. We recommend using larger machines to reduce both your node count and your Total Cost of Ownership.
Test Methodology (tools, setup, and configuration)
The test setup consisted of a three-server Scylla cluster and nine machines acting as YCSB clients. Each server was equipped with four NVMe SSDs, organized into a level 0 software RAID and formatted with the XFS filesystem. The database was populated with a 2TB dataset, replicated across the three servers, with compression disabled. An explicit effort was made to set up a tuned Apache Cassandra 3.9 running on Java 1.8 with a G1 garbage collection configuration. Four different YCSB workloads were used:
|Workload||Operation mix||Example application|
|A – Update Heavy||Read: 50%, Update: 50%||Session store recording recent actions in a user session|
|B – Read Heavy||Read: 95%, Update: 5%||Photo tagging: can add a tag in an update, but most operations are to read tags|
|C – Read Only||Read: 100%||User profile cache, where profiles are constructed elsewhere (e.g. Hadoop)|
|D – Read Latest||Read: 95%, Insert: 5%||User status update|
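As a rough illustration of how these mixes translate into YCSB settings, the sketch below encodes the four workloads from the table above and renders each as property lines. The property names (`readproportion`, `updateproportion`, `insertproportion`) follow YCSB's core workload conventions; the actual property files Samsung used are not given here, so the helper `to_properties` and the layout are assumptions for illustration only.

```python
# Operation mixes copied from the workload table above.
# Property names follow YCSB core workload conventions; the exact files
# used in the benchmark are not published here, so this is an assumption.
WORKLOADS = {
    "A": {"readproportion": 0.50, "updateproportion": 0.50, "insertproportion": 0.0},
    "B": {"readproportion": 0.95, "updateproportion": 0.05, "insertproportion": 0.0},
    "C": {"readproportion": 1.00, "updateproportion": 0.0, "insertproportion": 0.0},
    "D": {"readproportion": 0.95, "updateproportion": 0.0, "insertproportion": 0.05},
}


def to_properties(name):
    """Render one workload mix as YCSB-style 'key=value' lines (hypothetical helper)."""
    mix = WORKLOADS[name]
    total = sum(mix.values())
    # The proportions of a workload should account for every operation.
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"workload {name} proportions sum to {total}, expected 1.0")
    return "\n".join(f"{key}={value}" for key, value in mix.items())


if __name__ == "__main__":
    for name in WORKLOADS:
        print(f"# workload {name}")
        print(to_properties(name))
```

Keeping the mixes in one dictionary makes it easy to check that every workload sums to 100% before a run is launched.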
Comparing the performance of ScyllaDB versus Apache Cassandra, using the same 2TB dataset and run over two hours, demonstrates Scylla outperforms Apache Cassandra by a staggering 10X to 37X factor.
The Samsung team also ran Apache Cassandra with a small 50GB dataset that fits in server RAM and compared it to ScyllaDB running a 2TB dataset at a 100% hit rate. The results show Scylla is faster than Apache Cassandra by a factor of 4.4X to 8.6X while storing 40X the data.
Moreover, the Samsung team repeated the test with ScyllaDB running the 2TB dataset at only a 60% hit rate (i.e., the NVMe SSDs serve 40% of the requests). Even then, Scylla is faster than Apache Cassandra by a factor of 2.3X to 3X while storing 40X the data.
Measuring Scylla Latency
The Samsung team selected a load of 50% of the maximum throughput, the top of the anticipated working range.
They measured latency for each workload, starting from a ~60-80% hit rate and going up to a ~100% hit rate. The results range from 0.6ms to 2.6ms depending on the hit rate and the workload. Yes, a cluster of 3 machines with a replication factor of 3 can sustain several hundred thousand IOPS at millisecond latency using a 10-column, 1KB schema.
|Workload||Hit rate % and Avg. Latency (lower hit rate)||Hit rate % and Avg. Latency (higher hit rate)|
|A||Hit rate: 58%, Latency: 2.64 ms||Hit rate: 97%, Latency: 1.56 ms|
|B||Hit rate: 64%, Latency: 1.72 ms||Hit rate: 97%, Latency: 0.69 ms|
|C||Hit rate: 64%, Latency: 1.71 ms||Hit rate: 99%, Latency: 0.46 ms|
|D||Hit rate: 82%, Latency: 1.25 ms||Hit rate: 98%, Latency: 0.61 ms|
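To make the effect of the hit rate easier to compare across workloads, the following sketch computes how much the average latency drops between the lower and the higher hit rate for each workload. The figures are copied from the latency table; the names `LATENCY_MS` and `latency_reduction` are hypothetical and exist only for this example.

```python
# (hit rate %, average latency in ms) pairs copied from the latency table:
# first the lower hit rate measurement, then the higher one.
LATENCY_MS = {
    "A": ((58, 2.64), (97, 1.56)),
    "B": ((64, 1.72), (97, 0.69)),
    "C": ((64, 1.71), (99, 0.46)),
    "D": ((82, 1.25), (98, 0.61)),
}


def latency_reduction(workload):
    """Ratio of low-hit-rate latency to high-hit-rate latency (hypothetical helper)."""
    (_, low_hit_ms), (_, high_hit_ms) = LATENCY_MS[workload]
    return low_hit_ms / high_hit_ms


if __name__ == "__main__":
    for workload, ((low_hit, _), (high_hit, _)) in LATENCY_MS.items():
        ratio = latency_reduction(workload)
        print(f"{workload}: {low_hit}% -> {high_hit}% hit rate, latency lower by {ratio:.2f}x")
```

A ratio like this only summarizes the two published data points per workload; it says nothing about latency at hit rates in between.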
Download Samsung’s full report, ‘ScyllaDB and Samsung NVMe SSDs Accelerate NoSQL Database Performance’, to get all the details.
Apache® and Apache Cassandra® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.