Measuring Performance Improvements in Scylla 2.2
When we released Scylla 2.2, we announced that it includes many performance (throughput, latency) improvements. We’re glad for the opportunity to quantify some of those improvements. In a recent post, we described a large partition use case with the improved query paging of Scylla 2.2. In this blog post, we will put Scylla 2.2 to the test against Scylla 2.1, comparing the two versions with read and write workloads. This post is a collaborative effort between Larisa Ustalov and myself, with the help of many others.
Highlights of Our Results:
- Read benchmark: 42% reduction in 99th percentile latency with 1kB cells
- Write benchmark: 18% throughput increase
We tested with two Scylla use cases: Write and Read. Details, including a description of the workload and results of each workload, are included below.
Read Workload – Latency Test
We tested the impact of Scylla 2.2 on read workload latencies and, in particular, the improvement from changing the row digest hash from MD5 to xxHash (#2884). To isolate the latency change, the load was fixed at 80K operations per second, resulting in a CPU load of ~50%. Complete details of the test setup are in the appendix below.
Results – Extracted from cassandra-stress
Latencies (lower is better)
| |Scylla 2.1|Scylla 2.2|Improvement|
|Mean latency|2.6 ms|1.8 ms|23%|
|95% latency|4.9 ms|3.3 ms|32%|
|99% latency|7.8 ms|4.5 ms|42%|
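For readers who want to reproduce this kind of comparison from raw latency samples, here is a minimal Python sketch of how a mean, a 95th and a 99th percentile can be computed, and how a relative reduction is derived from a "before" and "after" value. The function names `latency_summary` and `reduction_pct` are hypothetical helpers, not part of cassandra-stress, and cassandra-stress may use a different percentile interpolation than the one assumed here.

```python
import statistics


def latency_summary(samples_ms):
    """Return the mean, 95th and 99th percentile of latency samples (ms)."""
    # quantiles(n=100) returns 99 cut points: index 94 is p95, index 98 is p99.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(samples_ms),
        "p95": cuts[94],
        "p99": cuts[98],
    }


def reduction_pct(before_ms, after_ms):
    """Relative latency reduction, as a percentage of the 'before' value."""
    return (before_ms - after_ms) / before_ms * 100.0


if __name__ == "__main__":
    # 99% latency row from the table above: 7.8 ms on 2.1, 4.5 ms on 2.2.
    print(f"{reduction_pct(7.8, 4.5):.1f}%")  # prints 42.3%
```

Tail percentiles are reported alongside the mean because a small number of slow requests can leave the mean almost unchanged while still being visible to users.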
Scylla 2.1 – 99% latency over time: (each line represents one node in the cluster)
Scylla 2.2 – 99% latency over time:
Summary: Running the same workload on Scylla 2.2 results in lower read latency than Scylla 2.1
Write Workload – Throughput Test
The write workload was designed to test the effect of the new CPU controller in Scylla 2.2. The impact of the controller is greatest when Scylla is fully loaded and needs to balance resources between background tasks, such as compactions, and foreground tasks, such as write requests. To test that, we injected the maximum throughput, writing 500GB of data sequentially. Complete details of the test setup are in the appendix below.
Average operations per second for the entire test.
|Scylla 2.1||Scylla 2.2||Improvement|
Throughput over time:
The initial decline in throughput in the first ~15 minutes is expected. As more data accumulates on storage, compactions kick in and take resources away from real-time requests. The difference between the releases is the controller. In Scylla 2.2, it does a better job of stabilizing the system and provides more consistent throughput during compactions. This effect is more evident when looking at the number of concurrent compactions. Compared to Scylla 2.1, Scylla 2.2 runs a more consistent number of compactions, resulting in smoother performance.
Scylla 2.1 – Number of active compactions
Scylla 2.2 – Number of active compactions
Summary: Using the same setup, Scylla 2.2 can handle higher write throughput than Scylla 2.1
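One way to put a number on "more consistent throughput" is the coefficient of variation (standard deviation divided by mean) of the per-interval throughput samples. The sketch below is only an illustration of that metric: the helper name `coefficient_of_variation` is hypothetical, and the sample series are made up for the example, not measurements from this benchmark.

```python
import statistics


def coefficient_of_variation(ops_per_sec):
    """Standard deviation divided by mean; lower means steadier throughput."""
    mean = statistics.fmean(ops_per_sec)
    return statistics.pstdev(ops_per_sec) / mean


if __name__ == "__main__":
    # Made-up per-interval throughput samples, for illustration only.
    steady = [100_000, 98_000, 101_000, 99_000, 102_000]
    bursty = [120_000, 70_000, 115_000, 65_000, 130_000]
    print(f"steady: {coefficient_of_variation(steady):.3f}")
    print(f"bursty: {coefficient_of_variation(bursty):.3f}")
```

Because the metric is normalized by the mean, it lets two runs with different average throughput be compared on smoothness alone.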
Our performance comparison of Scylla 2.2 and Scylla 2.1 demonstrates significant improvements in write throughput and read latency for two simple use cases. Stay tuned for additional benchmarks of Scylla 2.2 and future releases.
Appendix – Test Setup
- Scylla Cluster
- Nodes: 3
- Instance type: I3.8xlarge
- Scylla 2.1.6 AMI (, us-east-1)
- Scylla 2.2rc3 AMI (ami-917521ee, us-east-1)
- Loaders
- Servers: 4
- Instance type: c4.4xlarge
- Workloads (all of them)
- Replication Factor(RF) = 3
- Consistency Level(CL) = QUORUM
- Compaction Strategy: Size-Tiered
- Data: 1,000,000,000 (1 Billion) keys with 1,024 bytes each (raw data 1 TB)
- 4 loaders, each running 150 threads, rate-limited to 20,000 operations per second per loader
- cassandra-stress command used to populate the data:
cassandra-stress write no-warmup cl=QUORUM n=1000000000 -schema 'replication(factor=3)' -port jmx=6868 -mode cql3 native -rate threads=200 -col 'size=FIXED(1024) n=FIXED(1)' -pop seq=1..1000000000
- cassandra-stress command used to read the data:
cassandra-stress read no-warmup cl=QUORUM duration=100m -schema keyspace=keyspace$2 'replication(factor=3)' -port jmx=6868 -mode cql3 native -rate 'threads=150 limit=20000/s' -errors ignore -col 'size=FIXED(1024) n=FIXED(1)' -pop 'dist=gauss(1..750000000,500000000,250000000)'
- Data: 10^9 keys with 10^3 bytes each (raw data 1 TB)
- 4 loaders, each running 1,000 threads
- cassandra-stress command:
cassandra-stress write no-warmup cl=QUORUM duration=120m -schema keyspace=keyspace$2 'replication(factor=3)' -port jmx=6868 -mode cql3 native -rate 'threads=1000' -errors ignore -pop 'seq=1..500000000'
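As a sanity check on the setup listed above, the following Python sketch works through the arithmetic for the read workload: raw data size, replicated size, the share held by each node, the QUORUM size, and the aggregate request rate. The constant names are illustrative. The sizes are a back-of-the-envelope estimate that assumes no compression and ignores per-row overhead and compaction space amplification.

```python
KEYS = 1_000_000_000       # from the appendix: 1 billion keys
VALUE_BYTES = 1_024        # from the appendix: 1,024 bytes per key
REPLICATION_FACTOR = 3     # RF = 3
NODES = 3                  # Scylla cluster size
LOADERS = 4                # loader servers
RATE_PER_LOADER = 20_000   # ops/s, from limit=20000/s


def quorum(rf):
    """Number of replicas that must respond at CL=QUORUM."""
    return rf // 2 + 1


raw_bytes = KEYS * VALUE_BYTES                     # payload only
replicated_bytes = raw_bytes * REPLICATION_FACTOR  # all replicas combined
per_node_bytes = replicated_bytes // NODES         # evenly spread over nodes
total_rate = LOADERS * RATE_PER_LOADER             # aggregate ops/s

if __name__ == "__main__":
    tb = 10 ** 12
    print(f"raw data:       {raw_bytes / tb:.3f} TB")
    print(f"with RF=3:      {replicated_bytes / tb:.3f} TB")
    print(f"per node:       {per_node_bytes / tb:.3f} TB")
    print(f"quorum (RF=3):  {quorum(REPLICATION_FACTOR)} replicas")
    print(f"aggregate rate: {total_rate} ops/s")
```

With a replication factor equal to the number of nodes, every node holds a full copy of the data, and each QUORUM request needs responses from two of the three replicas.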