In the past we compared Scylla and Cassandra by running an equal number of servers, using the common cassandra-stress tool. This time we were curious to see the results from a standard workload generator for NoSQL databases, YCSB (Yahoo! Cloud Serving Benchmark), and to validate our assumption that cluster size can shrink when migrating from Cassandra to Scylla.
This report presents our results, comparing 3 Scylla nodes against 3, 9, 15, and 30-node Cassandra clusters. We compared both operations per second (OPS) and latency, which is often the more important metric. For the impatient, our results speak for themselves.
A 3-node Scylla cluster executes 4.6X more OPS than a Cassandra cluster of the same size. Only a 30-node Cassandra cluster can match the throughput of a Scylla cluster one tenth its size. And the 1:10 gain is not the end of it: latency measurements reveal that Scylla holds a 4X-10X advantage in P99 latency.
The benchmark tests a 50:50 read/write mix. We verified that the gains hold for all the other YCSB workloads. In addition, we tested a 9-node Scylla cluster as well. There was no point in testing a 30-node Scylla cluster, since it would require 60 loader machines. A nice surprise arrived with the results of a larger data set of 250 million rows; we promise to follow up on those in a subsequent report.
Migrating to Scylla allows an immediate reduction in cost of ownership, better application responsiveness, and significant savings in DevOps labor. Migration is straightforward, as Scylla is fully compatible with Cassandra.
The benchmark used 10 loading servers, each running 10 concurrent loading processes, for a total of 100 loading processes. The loading tool was YCSB 0.5. All servers are bare metal in the Rackspace cloud. The benchmark configuration is the industry standard: replication factor 3 and consistency level QUORUM. Needless to say, the data is fully consistent and all writes reach the drive as appropriate. The table size was 10 million rows, with a row size of 1KB (10 columns). We started hammering the clusters at 10K OPS and increased the load up to 600K OPS in 10K steps.
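As a quick sanity check on the working set (our back-of-the-envelope arithmetic, not from the benchmark scripts): 10 million rows at ~1KB each is roughly 10 GB of raw data, or about 30 GB across the cluster at replication factor 3.

```python
# Rough dataset size implied by the benchmark parameters.
rows = 10_000_000   # table size
row_bytes = 1_000   # ~1 KB per row (10 columns)
rf = 3              # replication factor

raw_gb = rows * row_bytes / 1e9   # data held once
total_gb = raw_gb * rf            # data held across all replicas
print(raw_gb, total_gb)           # 10.0 30.0
```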
We wish to thank Sea Data, and Alon Eldi specifically, for the successful execution of this benchmark.
Hardware configurations were the same for Scylla and Cassandra. We used Rackspace OnMetal Servers.
Note: YCSB 0.5 does not support prepared statements, which is against best practices for both Scylla and Cassandra.
Scylla Version 0.17. For a list of RPMs used, see this GitHub Gist.
The following step-by-step guide explains how to set up Scylla and Cassandra clusters in the Rackspace cloud: Cassandra and Scylla setup on Rackspace.
We used Scylla setup scripts to tune common OS and disk configurations, for both databases. Details on the scripts are in the Scylla system configuration guide.
We added the poll-mode option to SCYLLA_ARGS in /etc/sysconfig/scylla-server.
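For reference, the edit looks roughly like this (the exact flag spelling and any pre-existing arguments depend on your Scylla version and installation; check the file on your own servers):

```
# /etc/sysconfig/scylla-server (append to the existing arguments)
SCYLLA_ARGS="--poll-mode"
```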
The following options were applied in the Cassandra configuration (cassandra.yaml):
concurrent_writes: 80
In each iteration of the benchmark, we ran the following steps:
echo 1 > /proc/sys/vm/drop_caches
The loaders drove each cluster from 10K OPS to 600K OPS, incremented by 10K OPS with each measurement, creating 60 observation points. Before each load, data and caches were cleaned. Each load ran for 15 minutes.
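The ramp can be sketched as a small driver loop along these lines (a simplified sketch, not the actual benchmark script; the binding name and workload path are placeholders that vary by YCSB version):

```python
# Sketch of the load ramp: 10K -> 600K OPS in 10K steps, 60 observation points.
# In the real run, each step would first drop caches, then run YCSB for
# 15 minutes throttled to the target rate.

def ramp_steps(start=10_000, stop=600_000, step=10_000, duration_s=15 * 60):
    """Yield (target_ops, ycsb_command) for each observation point."""
    for target in range(start, stop + 1, step):
        cmd = [
            "bin/ycsb", "run", "cassandra-cql",       # binding name varies by YCSB version
            "-P", "workloads/workloada",              # 50:50 read/update mix
            "-target", str(target),                   # throttle to the desired OPS
            "-p", "maxexecutiontime=%d" % duration_s, # cap the run at 15 minutes
        ]
        yield target, cmd

steps = list(ramp_steps())
print(len(steps))  # 60
```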
The following workloads were tested: Scylla 3 Nodes vs. Cassandra 3, 9, 15, and 30 nodes, using replication factor 3 and consistency level QUORUM.
The operations were 50% reads and 50% writes (see YCSB Workloads).
The following Python script and config file were used for the load: view on GitHub. We ran 10 client processes on each of 10 client systems, for a total of 100 processes, in order to create a large number of connections so that all of Scylla’s internal shards would be loaded. Load can be monitored using the scyllatop utility.
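The fan-out amounts to splitting a cluster-wide target rate across the 100 client processes. A minimal sketch of that idea (our illustrative names, assuming the real script does something equivalent):

```python
def per_process_targets(total_target_ops, hosts=10, procs_per_host=10):
    """Split a cluster-wide OPS target evenly across all client processes.

    Any remainder is spread one extra op/s at a time over the first few
    processes, so the per-process targets always sum back to the total.
    """
    n = hosts * procs_per_host
    base, extra = divmod(total_target_ops, n)
    return [base + (1 if i < extra else 0) for i in range(n)]

targets = per_process_targets(550_000)
print(len(targets), sum(targets))  # 100 550000
```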
Results were gathered from the YCSB summary log files. For latency, an average value was calculated across processes; for OPS, the results of all 100 client processes were aggregated. Latency is measured in microseconds.
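Aggregating the 100 per-process summaries takes only a few lines of parsing (a sketch; the summary line format shown is standard YCSB output, but the surrounding file layout may differ):

```python
import re

# YCSB summary lines look like:
#   [OVERALL], Throughput(ops/sec), 5500.0
#   [READ], AverageLatency(us), 800.0
LINE = re.compile(r"\[(\w+)\], ([\w()/%]+), ([\d.]+)")

def aggregate(summaries):
    """Sum throughput and average the read latency across process summaries."""
    total_ops = 0.0
    read_lat = []
    for text in summaries:
        for section, metric, value in LINE.findall(text):
            if section == "OVERALL" and metric == "Throughput(ops/sec)":
                total_ops += float(value)
            elif section == "READ" and metric == "AverageLatency(us)":
                read_lat.append(float(value))
    return total_ops, sum(read_lat) / len(read_lat)

demo = ["[OVERALL], Throughput(ops/sec), 5500.0\n[READ], AverageLatency(us), 800.0",
        "[OVERALL], Throughput(ops/sec), 5480.0\n[READ], AverageLatency(us), 820.0"]
print(aggregate(demo))  # (10980.0, 810.0)
```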
This chart shows the number of OPS we asked the clients to execute (the OPS load) compared to the OPS the cluster actually performed. Comparing requested load to actual OPS is a good way to observe the maximum throughput of the cluster and to see what happens when we pass that point.
The 3-node Scylla cluster scales linearly up to 550K OPS. A matching cluster of 3 Cassandra nodes cannot do more than 120K OPS. We were surprised to discover that even a cluster of 15 Cassandra nodes cannot match the throughput of our small Scylla cluster. It’s important to note that Scylla scales out as well as it scales up, and one can easily create a cluster of 30 Scylla machines. However, in this benchmark we want to measure Scylla vs. Cassandra, so we’ll leave large Scylla cluster tests for the future. Let’s move on to the latency metrics collected during the test above.
The graph shows the average latency of each cluster, again as a function of throughput load. As expected, latency grows with OPS. Scylla presented outstanding results: a flat line, well below 1ms, for almost the entire OPS range. This means that if you run a Scylla cluster below 50% of its maximum load, you can expect the best latencies in the industry.
The latency figures for the update operations are similar.
Scylla shines even more when we analyze the 99th-percentile latency (the time within which 99% of requests return, ignoring the 1% of outliers). Cassandra’s 99th-percentile latencies are between 15ms and 25ms, while Scylla performs well below this range with more OPS and less hardware.
Let’s see whether a 30-node Cassandra cluster can compete with our rock-solid 3-node Scylla cluster. We repeated the scenario with it:
Finally, a proper contender. The maximum OPS for the 30-node Cassandra cluster is 630K, a little more than the 3-node Scylla cluster, which hit “only” 550K. But is the fight over? Let’s look at the latency measurements and see who has the upper hand.
Through most of the range, Scylla outperforms Cassandra by far: Scylla’s latency is a third of Cassandra’s. Only at the maximum throughput level are the results the same.
Here the situation is similar. Only above about 500K OPS can a 30-node Cassandra cluster match the latency of a 3-node Scylla cluster. Average latency, though, is a misleading metric: typical latency goals for a real-world application focus on worst-case or 99th-percentile latencies. If your site issues 12 queries per page, a user who visits 5 pages in a session executes 60 queries and has a 45% chance (1 − 0.99^60) of experiencing that “rare” 99th-percentile latency.
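The arithmetic behind that 45% figure:

```python
# Probability that at least one of 60 queries (12 per page x 5 pages)
# hits the 99th-percentile tail latency.
queries = 12 * 5
p_tail = 1 - 0.99 ** queries
print(round(p_tail, 2))  # 0.45
```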
Again, in the P99 graph the difference is quite large, and it’s clear that Scylla is the big winner of the whole benchmark: even a 30-node Cassandra cluster cannot match Scylla’s consistently low latency, no differently from the 3-, 9-, or 15-node Cassandra clusters.
A 3-node Scylla cluster is comparable to a 30-node Cassandra cluster in terms of throughput, while providing consistently low latencies, far below Cassandra’s. Scylla allows a 1:10 reduction in total cost of ownership while providing better latency results. Maintenance cost is reduced as well: per-server MTBF (Mean Time Between Failures) is constant, so with a tenth of the servers, failures occur a tenth as often. Now, following our 1.0 release, we encourage you to migrate to Scylla today and improve your application latency and end-user satisfaction.
Getting started takes only a few minutes. Scylla has an installer for every major platform and is well documented. If you get stuck, we’re here to help.