Before organizations go into production with Scylla, they must ensure that they are getting the best possible performance so their applications and services will run optimally. One of the many ways to optimize your Scylla deployment is to choose the right compaction strategy. One of the more popular talks at Scylla Summit 2017 was on this subject. Based on that talk, I will explain what compaction is and then explore the different strategies available in Scylla.
What is compaction?
When writes occur on Scylla, they first go into an in-memory structure called a memtable. Memtables are periodically flushed to disk as new sorted files called sstables. Over time, many separate sstable files accumulate on disk. This wastes disk space and hurts read performance, because a read may have to consult several sstables. Compaction merges these sstables into a single file that contains only the most recent data, and frees up disk space. However, compaction comes at a cost.
The cost of compaction is performance. Compaction operations are expensive in terms of CPU, memory, and disk I/O. A compaction strategy decides which sstables to compact, and the right choice can reduce read or write amplification, or optimize disk space utilization by reducing the number of temporary files used while merging the sstables. Scylla offers several compaction strategies so that organizations can choose the one that best suits their workload. Let’s explore each one available on Scylla.
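To make the merge step concrete, here is a minimal sketch in Python. It is not Scylla's implementation: the in-memory list representation, the `(key, timestamp, value)` record layout, and the `compact` function name are all assumptions made for illustration, and deletions (tombstones) are ignored.

```python
import heapq

# Illustrative model only: each "sstable" is a list of (key, timestamp, value)
# records sorted by key, with at most one record per key.
def compact(sstables):
    """Merge sorted sstables into one, keeping only the newest record per key."""
    merged = []
    # heapq.merge streams the inputs in sorted order. Sorting by
    # (key, -timestamp) puts the newest version of each key first.
    for key, ts, value in heapq.merge(*sstables, key=lambda r: (r[0], -r[1])):
        if merged and merged[-1][0] == key:
            continue  # an older version of a key we already emitted
        merged.append((key, ts, value))
    return merged


older = [("a", 1, "x"), ("b", 1, "y")]
newer = [("a", 2, "z"), ("c", 2, "w")]
result = compact([older, newer])
```

In this toy run, `result` keeps the newer value for key `a` and drops the obsolete one, which is where the disk space savings come from.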
Size-Tiered Compaction is the default compaction strategy and works very well for write-intensive workloads. Compaction is triggered when the system has enough similarly sized SSTables, which are merged together to form one larger SSTable. This strategy has several size tiers (small SSTables, large SSTables, even-larger SSTables), and each tier holds roughly the same number of files. When one tier is full, the system merges all of its SSTables into one SSTable in the next tier. A disadvantage of this strategy is that very large SSTables stay around for a long time and take up a lot of disk space.
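The "enough similarly sized SSTables" trigger can be sketched roughly as follows. This is a simplified illustration, not Scylla's actual selection logic: the function name is hypothetical, and the threshold of 4 and the 0.5x to 1.5x similarity window are assumed values chosen for the example.

```python
def pick_size_tiered_bucket(sizes, min_threshold=4, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into tiers and return the first tier that is full.

    All parameters are assumptions for this sketch: a size joins a tier when
    it lies within [bucket_low, bucket_high] times that tier's average size.
    """
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if avg * bucket_low <= size <= avg * bucket_high:
                bucket.append(size)
                break
        else:
            # No existing tier is close enough in size, so start a new one.
            buckets.append([size])
    full = [b for b in buckets if len(b) >= min_threshold]
    return full[0] if full else None


# Sizes in MB: four small SSTables form a full tier, the large ones do not.
to_compact = pick_size_tiered_bucket([10, 11, 9, 10, 100, 500])
```

Here the four small SSTables are selected for merging, while the 100 MB and 500 MB files wait until enough peers of similar size exist. That waiting is why very large SSTables can linger for a long time.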
Leveled compaction is recommended for read-intensive workloads. Instead of potentially huge SSTables, the system uses small, fixed-size SSTables divided into different “levels”. With the leveled compaction strategy, SSTable reads are efficient: although there are many more small SSTables, Scylla does not need to look up a key in each one. In the typical case, Scylla needs to read only one SSTable. This strategy is also space-efficient: at most 10% of space is wasted by obsolete rows, and only enough space for ~10x the small SSTable size needs to be reserved for temporary use. The downside is that it at least doubles the I/O on writes, which affects latency. It is also less suitable for workloads that mostly write new data.
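One way to see why a read rarely touches more than one SSTable: within a level, the SSTables cover non-overlapping key ranges, so a key can live in at most one of them. The sketch below assumes that property and uses a hypothetical `(first_key, last_key, name)` tuple per SSTable. It illustrates the idea and is not Scylla's read path.

```python
import bisect

def find_sstable(level, key):
    """Return the name of the only SSTable in this level that may hold key.

    `level` is a list of (first_key, last_key, name) tuples, sorted by
    first_key and assumed non-overlapping (a hypothetical layout).
    """
    starts = [first for first, _, _ in level]
    # Locate the last SSTable whose first key is <= the key we want.
    i = bisect.bisect_right(starts, key) - 1
    if i >= 0 and level[i][0] <= key <= level[i][1]:
        return level[i][2]
    return None  # no SSTable in this level covers the key


level_1 = [("a", "f", "sst-1"), ("g", "m", "sst-2"), ("n", "z", "sst-3")]
hit = find_sstable(level_1, "h")
```

A single binary search picks out the one candidate SSTable, however many small files the level contains.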
Leveled and size-tiered compaction strategies each have advantages and disadvantages, and how much they matter depends on the needs of your workload. But what about workloads that need the benefits of both? For those, ScyllaDB is creating a new strategy called Hybrid Compaction, coming next year in Scylla Enterprise. This new compaction strategy takes the best traits of the leveled and size-tiered compaction strategies.
One of the goals is to have Hybrid Compaction do the cleanup job itself rather than relying on the system administrator to run manual or major compaction operations. Hybrid Compaction can make smart decisions that reduce disk space amplification without hurting system performance.
Looking at the chart below, we can see how the Hybrid Compaction strategy uses far less disk space:
The image below is a comparison of Hybrid versus the other compaction strategies. With Hybrid, users will benefit from optimal performance for read and write workloads without wasting disk I/O.
There are more compaction strategies available in Scylla that were not discussed in this post. To learn more about compaction strategies in-depth and the Hybrid Compaction strategy, check out the video above or view the slide deck from Scylla Summit 2017:
Want to learn more about Scylla? Check out our download page to run Scylla on AWS, install it locally in a virtual machine, or run it in Docker. You can also take Scylla for a test drive. Our Test Drive lets you quickly spin up a running Scylla cluster so you can see for yourself how it performs.