Aug14

Upcoming Improvements to Scylla Streaming Performance

Subscribe to Our Blog

Scylla Streaming

Streaming in Scylla is an internal operation that moves data from node to node over a network. It is the foundation of various Scylla cluster operations. For example, it is used by an “add node” operation to copy data to a new node in a cluster. It is also used by a “decommission” operation that removes a node from a cluster and streams the data it holds to the other nodes. Another example is a “rebuild” operation that rebuilds the data that a node should hold from replicas on other nodes. It is also used by a “repair” operation that is used to synchronize data between nodes.

In this blog post, we will take a closer look at how Scylla streaming works in detail and how the upcoming Scylla 2.4’s new streaming improves streaming bandwidth by 240% and reduces the time it takes to perform a “rebuild” operation by 70%.

How Scylla Streaming Works

The idea behind Scylla streaming is very simple. It’s all about reading data on one node, sending it to another node, and applying the data to disk. The diagram below shows the current path of data streaming. The sender creates sstable readers to read the rows from sstables on disk and sends them over the network. The receiver receives the rows from the network and writes them to the memtable. The rows in memtable are flushed into sstables periodically or when the memtable is full.

How We’re Improving Scylla Streaming

You can see that on the receiver side, the data is applied to a memtable. In normal CQL writes, memtables help by sorting the CQL writes in the form of mutations. When the CQL writes in memtables are flushed to an SSTable, the mutations are sorted. However, when streaming, the mutations are already sorted. This is because they are in the same order when the sender reads the mutations from the SSTable. That’s great! We can remove the memtable from the process. The advantages are:

  • Less memory consumption. The saved memory can be used to handle your CQL workload instead.
  • Less CPU consumption. No CPU cycles are used to insert and sort memtables.
  • Bigger SSTables and fewer compactions. Once the Memtable is full, it is flushed to an SSTable on disk. This happens during the whole streaming process repeatedly, thus generating many smaller SSTable files. This volume of SSTables adds pressure to compaction. By removing the memtable from the streaming process, we can write the mutations to a single SSTable.

If we look at the sender and receiver as a sandwich, the secret sauce is the Seastar RPC framework. To send a row, a Seastar RPC call is invoked. The sender invokes the RPC call repetitively until all the rows are sent. The RPC call is used in request-response models. For streaming, the goal is to send a stream of data with higher throughput and less time and not to request the remote node to process the data and give a response with lower latency for each individual request. Thus, it makes more sense to use the newly introduced Seastar RPC Streaming interface. With the new Seastar RPC Streaming interface, we need to use the RPC call only once to get handlers called Sink and Source, dramatically reducing the number of RPC calls to send streaming data. On the sender side, the rows are pushed to the Sink handler, which sends them over the network. On the receiver side, the rows are pulled from the Source handler. It’s also worth noting that we can remove the batching at the streaming layer since the Seastar RPC streaming interface will do the batching automatically.

With the new workflow in place, the new streaming data path looks like this:

Performance Improvements

In this part, we will run tests to evaluate the streaming performance improvements. For this, we chose to use the rebuild operation that streams data from existing nodes to rebuild the database. Of course, the rebuild operation uses Scylla streaming.

We created a cluster of 3 Scylla nodes on AWS using i3.xlarge instance. (You can find the setup details at the end of this blog post.) Afterward, we created a keyspace with a replication factor of 3 and inserted 1 billion partitions to each of the 3 nodes. After the insertion, each node held 240 GiB of data. Lastly, we removed all the SSTables on one of the nodes and ran the “nodetool rebuild” operation. The sender nodes with the replicas send the data to the receiver node in parallel. Thus, in our test, there were two nodes streaming in parallel to the node that ran the rebuild operation.

We compared the time it took to complete the rebuild operation and the streaming bandwidth on the receiver node before and after the new streaming changes.

Tests Old Scylla Streaming New Scylla Streaming Improvement
Time To Rebuild 170 minutes 50 minutes 70%
Streaming Bandwidth 36 MiB/s 123 MiB/s 240%

To look at the streaming performance on bigger machines, we did the above test again with 3 i3.8xlarge nodes. Since the instance is 8 times larger, we inserted 8 billion partitions to each of the 3 nodes. Each node held 1.89TiB of data. The test results are in the tables below.

Tests Old Scylla Streaming New Scylla Streaming Improvement
Time To Rebuild 218 minutes 66 minutes 70%
Streaming Bandwidth 228 MiB/s 765 MiB/s 235%

Conclusion

With our new Scylla streaming, streaming data is written to the SSTable on disk directly and skips the memtable completely resulting in less memory and CPU usage and less compaction. The data is sent over network utilizing the new Seastar RPC Streaming interface.

The changes described here will be released in the upcoming Scylla 2.4 release. It will make the Scylla cluster operations like add new node, decommission node and repair node even faster.
You can follow our progress at implementing the streaming improvement on GitHub: #3591.

Next Steps

  • Scylla Summit 2018 is around the corner. Register now!
  • Learn more about Scylla from our product page.
  • See what our users are saying about Scylla.
  • Download Scylla. Check out our download page to run Scylla on AWS, install it locally in a Virtual Machine, or run it in Docker.

Setup Details

DB Nodes: 3
Instance Type: i3.xlarge / i3.8xlarge
Replication Factor (RF): 3
Consistency Level (CL): QUORUM
Compaction Strategy: Size-Tiered
Scylla Version: Scylla master commit 31d4d37161bdc26ff6089ca4052408576a4e6ae7 with the new streaming disabled / enabled.

 


Tags: improvements, performance, seastar, sstables, streaming