YCSB cluster benchmark

YCSB testing reveals Scylla latency and performance advantages

In the past, we compared Scylla and Cassandra by running an equal number of servers, using the common cassandra-stress tool. This time we were curious to see what the results would be with a common workload generator for NoSQL databases, the YCSB (Yahoo! Cloud Serving Benchmark). It should also validate our assumption that cluster size can be decreased when migrating from Cassandra to Scylla.

This report presents our results, comparing 3 Scylla nodes against 3, 9, 15 and 30 node Cassandra clusters. We compared both OPS (operations per second) and latency, which is often even more important. For the impatient, our results speak for themselves.

A 3 node Scylla cluster executes 4.6X more OPS than a same-size Cassandra cluster. Only a 30 node Cassandra cluster can match the throughput of a Scylla cluster one tenth its size. Yet the 1:10 gain is not the end of it: latency measurements reveal that Scylla holds a 4X-10X advantage in P99 latency.

The benchmark tests a 50:50 read/write mix. We verified that the gains hold for the other YCSB workloads as well, and we also tested a 9 node Scylla cluster. There was no point in testing a 30 node Scylla cluster, since it would have required 60 loader machines. A nice additional surprise arrived with the results of a larger data set of 250 million rows; we promise to follow up on those in a subsequent report.

Migrating to Scylla allows an immediate reduction in the cost of ownership, better application responsiveness, and significant savings in DevOps labor. Migration is straightforward, as Scylla is fully compatible with Cassandra.

Setup and installation

The benchmark used 10 loading servers, each running 10 concurrent loading processes, for a total of 100 loading processes. The loading tool is YCSB 0.5. All servers are bare metal on the Rackspace cloud. The benchmark configuration of choice is the industry-standard replication factor of 3 with consistency level QUORUM. Needless to say, the data is fully consistent and all writes reach the drive as appropriate. The table size was 10 million rows, with a row size of 1 KB (10 columns). We started hammering the clusters at 10K OPS and increased the load up to 600K OPS in 10K steps.

We wish to thank Sea Data, and Alon Eldi specifically, for the successful execution of this benchmark.

Server hardware

Hardware configurations were the same for Scylla and Cassandra. We used Rackspace OnMetal Servers.

  • Server type: Rackspace Bare Metal IO Class v1
  • CPU: Dual 2.8 GHz, 10 core Intel® Xeon® E5-2680 v2
  • RAM: 128 GB
  • Networking: Redundant 10 Gb/s connections in a high availability bond
  • Data Disks: 2 * 1.6 TB PCIe flash cards
  • System Disk: 32 GB
  • OS: CentOS 7.2.1511
  • Kernel version: 3.10.0-327.10.1.el7.x86_64
  • Java
    • Cassandra – Oracle jdk-8u65
    • Scylla – Open JDK 1.8 (used only for scylla-jmx)

Client Loader hardware

  • Server type: Rackspace Bare Metal Compute Class V1
  • CPU: 2.8 GHz, 10 core Intel® Xeon® E5-2680 v2
  • RAM: 32 GB
  • Networking: Redundant 10 Gb/s connections in a high availability bond
  • System Disk: 32 GB
  • OS: CentOS 7.2.1511
  • Kernel version: 3.10.0-327.10.1.el7.x86_64
  • Java: (JDK version) Openjdk 1.8.0_71

Note: YCSB 0.5 does not support prepared statements, which is against best practices for both Scylla and Cassandra.

DB Versions

Scylla version 0.17. For a list of RPMs used, see this GitHub Gist.

Apache Cassandra 2.2.5

Configuration and Installation

The following step by step setup explains how to set up Scylla and Cassandra clusters in the Rackspace cloud: Cassandra and Scylla setup on Rackspace.

OS and System Disk Tuning

We used Scylla setup scripts to tune common OS and disk configurations, for both databases. Details on the scripts are in the Scylla system configuration guide.

  1. scylla_raid_setup – Create a RAID 0 array from the 2 SSD disks, formatted with the XFS filesystem, for a total size of 3.2 TB.
  2. posix_net_conf.sh – Optimize IRQ handling and raise the listen() socket backlog.
  3. scylla_bootparam_setup – Set common kernel boot options.
  4. scylla_ntp_setup – Configure NTP servers.

Scylla tuning

We added the poll-mode option to SCYLLA_ARGS in /etc/sysconfig/scylla-server.
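On each Scylla node, the change amounts to appending the flag to the existing SCYLLA_ARGS line. A minimal sketch (the exact contents of the file vary between Scylla versions, so back it up first):

```shell
# Append --poll-mode to SCYLLA_ARGS in the service config, then restart
# (illustrative; verify the resulting line before restarting production nodes)
sudo sed -i 's/^SCYLLA_ARGS="\(.*\)"/SCYLLA_ARGS="\1 --poll-mode"/' /etc/sysconfig/scylla-server
sudo systemctl restart scylla-server
```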

Cassandra tuning

The following option was applied in the Cassandra configuration (cassandra.yaml):

concurrent_writes: 80
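A sketch of applying this on each Cassandra node (the path assumes a package install; the key in cassandra.yaml is lowercase):

```shell
# Raise concurrent_writes from the default (32) to 80, then restart
sudo sed -i 's/^concurrent_writes:.*/concurrent_writes: 80/' /etc/cassandra/conf/cassandra.yaml
sudo systemctl restart cassandra
```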

Workload Method

In each iteration of the benchmark, we ran the following steps:

  1. Stop the cluster.
  2. Clean the database state by removing all data files and commit log.
  3. Clean OS cache with the command: echo 1 > /proc/sys/vm/drop_caches
  4. Start the cluster.
  5. Create YCSB keyspace and the table usertable.
  6. Insert 10 million rows into usertable.
  7. Run the OPS load.
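The steps above can be sketched as a shell sequence for the Scylla case (service name, data paths, placeholder node IP, and the YCSB binding name are illustrative assumptions; the Cassandra variant substitutes its own service and paths, and binding names vary between YCSB versions):

```shell
# Steps 1-4: stop, wipe database state, drop the OS page cache, restart
# (run on every node of the cluster)
sudo systemctl stop scylla-server
sudo rm -rf /var/lib/scylla/data/* /var/lib/scylla/commitlog/*
echo 1 | sudo tee /proc/sys/vm/drop_caches
sudo systemctl start scylla-server

# Step 5: create the keyspace and the usertable schema YCSB expects
# (RF=3 as in the benchmark; <node-ip> is a placeholder)
cqlsh <node-ip> -e "
  CREATE KEYSPACE ycsb WITH replication =
    {'class': 'SimpleStrategy', 'replication_factor': 3};
  CREATE TABLE ycsb.usertable (y_id varchar PRIMARY KEY,
    field0 varchar, field1 varchar, field2 varchar, field3 varchar,
    field4 varchar, field5 varchar, field6 varchar, field7 varchar,
    field8 varchar, field9 varchar);"

# Step 6: insert the 10 million rows
bin/ycsb load cassandra-cql -P workloads/workloada \
  -p hosts=<node-ip> -p recordcount=10000000
```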

OPS load

The loaders load each cluster from 10K OPS to 600K OPS incremented by 10K OPS with each measurement, creating 60 observation points. Before each load, data and cache were cleaned. Each load ran for 15 minutes.
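The 60-point sweep can be sketched as a loop over YCSB's -target throttle (the actual benchmark used the Python driver script linked below; the binding name and placeholder host are assumptions, while -target and maxexecutiontime are standard YCSB core options):

```shell
# Ramp the offered load from 10K to 600K OPS in 10K steps, 15 minutes
# (900 s) per step. The per-process target is the cluster-wide target
# divided across the 100 client processes.
for target in $(seq 10000 10000 600000); do
  per_process=$((target / 100))
  bin/ycsb run cassandra-cql -P workloads/workloada \
    -p hosts=<node-ip> -p maxexecutiontime=900 \
    -target "$per_process" > "summary_${target}.log"
done
```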

The following configurations were tested: Scylla with 3 nodes vs. Cassandra with 3, 9, 15, and 30 nodes, using replication factor 3 and consistency level QUORUM.

The operations were 50% reads and 50% writes (see YCSB Workloads).

Load script and config file

The following Python script and config file were used for the load: view on GitHub. We ran 10 client processes on each of 10 client systems, for a total of 100 processes, in order to create a large number of connections so that all of Scylla’s internal shards would be loaded. Load can be monitored using the Scyllatop utility.


Results were gathered from the YCSB summary log files. For latency, an average value was calculated across clients; for OPS, the results of all 100 client processes were summed. Latency is measured in microseconds.
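As a sketch, aggregating a set of per-client summary files could look like this (the [OVERALL] and [READ] line shapes follow YCSB's standard summary output; the file names and values here are made up for illustration):

```shell
# Create two tiny sample summary files in YCSB's output format
cat > client1.summary <<'EOF'
[OVERALL], Throughput(ops/sec), 5500.0
[READ], AverageLatency(us), 900.0
EOF
cat > client2.summary <<'EOF'
[OVERALL], Throughput(ops/sec), 5300.0
[READ], AverageLatency(us), 1100.0
EOF

# Sum throughput across clients; average the read latency
awk -F', ' '
  /^\[OVERALL\], Throughput/  { ops += $3 }
  /^\[READ\], AverageLatency/ { lat += $3; n++ }
  END { printf "total OPS: %.0f, avg read latency: %.0f us\n", ops, lat/n }
' client1.summary client2.summary
# -> total OPS: 10800, avg read latency: 1000 us
```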

OPS – Aggregation from All YCSB client results

Cassandra and Scylla performance

This chart shows the number of OPS we asked the clients to execute (the OPS load) compared to the OPS the cluster actually performed. Comparing offered load to actual OPS is a good way to observe the maximum throughput of a cluster and to see what happens once we pass that point.

3 Scylla nodes scale linearly up to 550K OPS. A matching cluster of 3 Cassandra nodes cannot do more than 120K OPS. We were surprised to discover that even a cluster of 15 Cassandra nodes cannot match the throughput of our small Scylla cluster. It’s important to note that Scylla scales out as well as it scales up, and one can easily create a cluster of 30 Scylla machines. However, in this benchmark we want to measure Scylla vs. Cassandra, so we’ll leave large Scylla cluster tests for the future. Let’s move on to the latency metrics collected during the test above.

Average Read Latency (microseconds)

Cassandra and Scylla performance

The graph shows the average read latency of each cluster, again as a function of throughput load. As expected, latency grows with OPS. Scylla presented outstanding results, a flat line well below 1 ms for almost the entire OPS range. This means that if you run a Scylla cluster below 50% of its maximum load, you can expect the best latencies in the industry.

Average Update Latency (microseconds)

Cassandra and Scylla performance

The latency figures for the update operations are similar.

Read Latency 99 Percentile (microseconds)

Cassandra and Scylla performance

Scylla shines even more when we analyze the 99th-percentile latency (the time within which 99% of requests return, ignoring the 1% of outliers). Cassandra’s 99th-percentile latencies are between 15 ms and 25 ms, while Scylla performs well below that with more OPS and less hardware.

Update Latency 99 Percentile (microseconds)

Cassandra and Scylla performance

Results of 3 Scylla nodes vs 30 Cassandra nodes

Let’s see whether a 30 Cassandra node cluster can be competitive with our rock-solid 3 node Scylla cluster. We repeated the scenario with it:

Cassandra and Scylla performance

Finally, a proper contender was found. The maximum OPS for the 30 node Cassandra cluster is 630K, a little more than the 3 node Scylla cluster, which hit “only” 550K. But is the fight over? Let’s look at the latency measurements and see who has the upper hand.

Average Read Latency

Cassandra and Scylla performance

Through most of the range, Scylla outperforms Cassandra by far; Scylla’s latency is one third of Cassandra’s. Only at the maximum throughput level are the results the same.

Average Update Latency

Cassandra and Scylla performance

Here the situation is similar. Only above about 500K OPS can a 30-node Cassandra cluster match the latency of a 3-node Scylla cluster. Average latency, though, is a misleading metric. Typical latency goals for a real-world application are more likely to focus on worst-case or 99th-percentile latencies. If your site issues 12 queries per page, a user who visits 5 pages in a session will have a 45% chance (1 − 0.99^60) of experiencing that “rare” 99th-percentile latency.
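The arithmetic: 12 queries per page × 5 pages = 60 independent chances to hit the slow tail, so the probability of seeing at least one P99 outlier is 1 − 0.99^60:

```shell
# P(at least one of 60 requests lands in the slowest 1%) = 1 - 0.99^60
awk 'BEGIN { printf "%.4f\n", 1 - 0.99^60 }'
# -> 0.4528
```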

99th Percentile Read and Update Latency

Cassandra and Scylla performance

Again, in the P99 graph the difference is quite large, and it’s clear that Scylla is the big winner of the whole benchmark: even a 30-node Cassandra cluster cannot meet Scylla’s consistently low latency. This is no different from the 3, 9, or 15 node Cassandra clusters.

Cassandra and Scylla performance


A 3-node Scylla cluster is comparable to a 30-node Cassandra cluster in terms of throughput, while providing consistently low latencies, far below Cassandra’s. Scylla allows a 10:1 reduction in the total cost of ownership while providing better latency. Maintenance cost is reduced as well: with one tenth the hardware at a constant per-machine MTBF (Mean Time Between Failures), failures become 10X less frequent. Now, following our 1.0 release, we encourage you to migrate to Scylla today and improve your application latency and end-user satisfaction.