Jul19

Measuring Performance Improvements in Scylla 2.2



When we released Scylla 2.2, we announced that it includes many performance (throughput, latency) improvements. We’re glad for the opportunity to quantify some of those improvements. In a recent post, we described a large partition use case with the improved query paging of Scylla 2.2. In this blog post, we will put Scylla 2.2 to the test against Scylla 2.1, comparing the two versions with read and write workloads. This post is a collaborative effort between Larisa Ustalov and myself, with the help of many others.

Highlights of Our Results:

  • Read benchmark: 42% reduction in 99th percentile latency with 1kB cells
  • Write benchmark: 18% throughput increase

Workloads Tested

We tested with two Scylla use cases: Write and Read. Details, including a description of the workload and results of each workload, are included below.

Read Workload – Latency Test

We tested the impact of Scylla 2.2 on read workload latencies, and in particular the improvement from changing the row digest hash from MD5 to xxHash (#2884). To isolate the latency change, the load was fixed at 80K reads/s, resulting in a CPU load of ~50%. You can find complete details on the test setup in the appendix below.

Source: http://docs.scylladb.com/architecture/anti-entropy/read-repair/
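To see why the digest hash matters, recall how digest reads work: the coordinator asks replicas for a hash of the row rather than the full row, and only a mismatch triggers a full-data read and repair. A minimal sketch of that comparison, using stdlib MD5 (the hash Scylla 2.1 used; 2.2 swaps in the much faster xxHash, which is not in the Python standard library):

```python
import hashlib

def row_digest(row: bytes) -> bytes:
    """Digest of a row's contents. Scylla 2.1 used MD5 here;
    Scylla 2.2 switched to the faster xxHash (#2884). MD5 is
    used in this sketch only because it ships with the stdlib."""
    return hashlib.md5(row).digest()

def digests_match(replica_rows) -> bool:
    """A digest read compares per-replica digests instead of
    full rows; only a mismatch forces a full-data read repair."""
    return len({row_digest(r) for r in replica_rows}) == 1

# Replicas agree: no read repair needed.
print(digests_match([b"v1", b"v1", b"v1"]))  # True
# One replica is stale: the coordinator must reconcile.
print(digests_match([b"v1", b"v1", b"v0"]))  # False
```

Since this digest is computed for every read at CL=QUORUM, a cheaper hash shaves CPU time off every request, which is where the latency gain comes from.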

Results – Extracted from cassandra-stress

Latencies (lower is better)

               Scylla 2.1   Scylla 2.2   Improvement
Mean latency     2.6 ms       1.8 ms        23%
95% latency      4.9 ms       3.3 ms        32%
99% latency      7.8 ms       4.5 ms        42%

Scylla 2.1 – 99% latency over time: (each line represents one node in the cluster)

Scylla 2.2 – 99% latency over time:

Summary: Running the same workload on Scylla 2.2 results in lower read latency than on Scylla 2.1.

Write Workload

The write workload was designed to test the effect of the new CPU controller in Scylla 2.2. The impact of the controller is greater when Scylla is fully loaded and needs to balance resources between background tasks, like compaction, and foreground tasks, like serving write requests. To test this, we injected the maximum possible throughput, writing 500 GB of data sequentially. Complete details of the test setup are in the appendix below.

Results

Average operations per second for the entire test.

        Scylla 2.1   Scylla 2.2   Improvement
Ops       ~354K        ~418K         +18%

Throughput over time:

Scylla 2.1

Scylla 2.2

The initial decline in throughput in the first ~15 minutes is expected. As more data accumulates on storage, compactions kick in and take resources away from real-time requests. The difference between the releases is the controller: in Scylla 2.2, it does a better job of stabilizing the system and provides more consistent throughput during compactions. This effect is more evident when looking at the number of concurrent compactions. Compared to Scylla 2.1, Scylla 2.2 more consistently runs the same number of compactions, resulting in smoother performance.
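The idea behind such a controller can be sketched as a simple feedback loop: the larger the compaction backlog grows, the more CPU shares compaction receives, taken from foreground work, so the backlog never explodes and throughput stays flat. The sketch below is a toy proportional controller; the constants `k` and `max_shares` are made-up for illustration, and Scylla's real controller is considerably more elaborate:

```python
def compaction_shares(backlog: float, k: float = 100.0,
                      max_shares: float = 1000.0) -> float:
    """Toy proportional controller: `backlog` is a normalized
    measure of data awaiting compaction. More backlog -> more
    CPU shares for compaction (capped at max_shares), fewer
    for foreground writes. k and max_shares are illustrative
    constants, not Scylla's actual tuning."""
    return min(max_shares, k * backlog)

# Small backlog: compaction stays out of the way of writes.
# Large backlog: compaction ramps up so it never falls behind,
# which is what yields the flatter throughput curve in 2.2.
for backlog in (0.1, 0.5, 2.0, 20.0):
    print(backlog, compaction_shares(backlog))
```

A fixed, non-adaptive share split (as in older releases) tends to oscillate: compaction either starves and the backlog grows, or it runs too aggressively and foreground latency spikes.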

Scylla 2.1 – Number of active compactions

Scylla 2.2 – Number of active compactions

Summary: Using the same setup, Scylla 2.2 can handle higher write throughput than Scylla 2.1.

Conclusions

Our performance comparison of Scylla 2.2 and Scylla 2.1 demonstrates significant improvements in write throughput and read latency for two simple use cases. Stay tuned for additional benchmarks as future releases come out.

Next Steps

  • Scylla Summit 2018 is around the corner. Register now!
  • Learn more about Scylla from our product page.
  • See what our users are saying about Scylla.
  • Download Scylla. Check out our download page to run Scylla on AWS, install it locally in a Virtual Machine, or run it in Docker.

Appendix – Test Setup

  • Scylla Cluster
    • Nodes: 3
    • Instance type: I3.8xlarge
    • Scylla 2.1.6 AMI (us-east-1)
    • Scylla 2.2rc3 AMI (ami-917521ee, us-east-1)
  • Loaders
    • Servers: 4
    • Instance type: c4.4xlarge
  • Common to all workloads
    • Replication Factor (RF) = 3
    • Consistency Level (CL) = QUORUM
    • Compaction Strategy: Size-Tiered

Read Workload

  • Data: 1,000,000,000 (1 billion) keys with 1,024 bytes each (raw data 1 TB)
  • 4 loaders, each running 150 threads, rate-limited to 20,000 ops/s per loader (80K/s total)
  • cassandra-stress command used to populate the data:
    cassandra-stress write no-warmup cl=QUORUM n=1000000000 -schema 'replication(factor=3)' -port jmx=6868 -mode cql3 native -rate threads=200 -col 'size=FIXED(1024) n=FIXED(1)' -pop seq=1..1000000000
  • cassandra-stress command used to read the data:
    cassandra-stress read no-warmup cl=QUORUM duration=100m -schema keyspace=keyspace$2 'replication(factor=3)' -port jmx=6868 -mode cql3 native -rate 'threads=150 limit=20000/s' -errors ignore -col 'size=FIXED(1024) n=FIXED(1)' -pop 'dist=gauss(1..750000000,500000000,250000000)'
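The read command's `-pop 'dist=gauss(1..750000000,500000000,250000000)'` draws partition keys from a normal distribution (mean 500M, stddev 250M) restricted to the range 1..750M, so reads cluster around the middle of the keyspace and form a realistic hot set. A sketch of that sampling (assuming out-of-range draws are simply retried, which may differ in detail from cassandra-stress's exact behavior):

```python
import random

LOW, HIGH = 1, 750_000_000          # key range from the -pop option
MU, SIGMA = 500_000_000, 250_000_000  # gauss(mean, stddev)

def next_key(rng=random) -> int:
    """Sample a partition key the way the gauss population
    distribution does: normal(MU, SIGMA), re-drawn until it
    falls inside [LOW, HIGH]."""
    while True:
        k = int(rng.gauss(MU, SIGMA))
        if LOW <= k <= HIGH:
            return k

# Keys cluster around 500M rather than spreading uniformly,
# so some partitions are read far more often than others.
print([next_key() for _ in range(5)])
```

A skewed population like this exercises the cache and digest path more than a uniform one, since hot keys are read repeatedly from multiple replicas.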

Write Workload

  • Data: 10^9 keys with 10^3 bytes each (raw data 1 TB)
  • 4 loaders, each running 1,000 threads
  • cassandra-stress command:
    cassandra-stress write no-warmup cl=QUORUM duration=120m -schema keyspace=keyspace$2 'replication(factor=3)' -port jmx=6868 -mode cql3 native -rate 'threads=1000' -errors ignore -pop 'seq=1..500000000'

About Tzach Livyatan

Tzach Livyatan has a B.A. and an MSc in Computer Science (Technion, Summa Cum Laude) and has had a 15-year career in development, system engineering, and product management. In the past, he worked in the telecom domain, focusing on carrier-grade systems, signaling, policy, and charging applications.


Tags: latency, performance, release, scylla