Tips and Tricks for Maximizing ScyllaDB Performance

A Guide to Getting the Most from Your ScyllaDB Database

This guide provides an overview of the best practices for maximizing the performance of ScyllaDB, the monstrously fast and scalable NoSQL database. Even though ScyllaDB auto-tunes itself for optimal performance, users still need to apply best practices to get the most out of their ScyllaDB deployments.

Get me up and running

If you don't have time to read this document in full, here are the most important things to remember:

use the best hardware you can reasonably afford
install the ScyllaDB Monitoring Stack
run the scylla_setup script
use cassandra-stress for load testing
expect at least 12.5K operations per second (OPS) per physical core for simple operations on suitable hardware
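The 12.5K OPS figure above can serve as a quick sanity check for your own benchmark results. The sketch below is a back-of-the-envelope helper, not an official sizing formula: the function name is hypothetical, and it assumes two hardware threads (vCPUs) per physical core, which is typical for x86 cloud instances with hyperthreading enabled but not universal.

```python
# Rule of thumb from this guide: at least 12.5K simple operations per second
# per *physical* core on suitable hardware.
OPS_PER_PHYSICAL_CORE = 12_500


def expected_min_ops_per_node(vcpus: int, threads_per_core: int = 2) -> int:
    """Rough lower bound on simple-operation throughput for one ScyllaDB node.

    threads_per_core is an assumption: 2 for typical hyperthreaded x86
    instances, 1 when SMT is disabled or absent.
    """
    if vcpus <= 0 or threads_per_core <= 0:
        raise ValueError("vcpus and threads_per_core must be positive")
    physical_cores = vcpus // threads_per_core
    return OPS_PER_PHYSICAL_CORE * physical_cores


if __name__ == "__main__":
    # A 32-vCPU instance with hyperthreading has 16 physical cores.
    print(expected_min_ops_per_node(32))  # 200000
```

If your measured throughput is far below this figure, revisit the checklist above before tuning anything else.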

Why should I read this? I already know how to execute a benchmark

ScyllaDB is different from any other NoSQL database. It takes full control of the hardware, using all of the server's cores, to achieve the highest levels of performance and to meet strict SLAs for low-latency operations. If you run ScyllaDB in an over-committed environment, performance won't just degrade linearly; it will tank completely.

This is because ScyllaDB uses a reactor design that runs on all (configured) cores, with a scheduler that assumes a 0.5 ms tick. ScyllaDB does everything it can to manage queues in userspace rather than in the OS or the drives, so it assumes it has the full disk bandwidth that scylla_setup measured. When other workloads take CPU time or disk bandwidth away from it, those assumptions no longer hold.

However, getting the best performance out of ScyllaDB is not difficult: for the most part, it tunes itself automatically. Just make sure you don't work against the system.

Install ScyllaDB Monitoring Stack

Install and use the ScyllaDB Monitoring Stack; its value goes well beyond performance optimization. If you cannot pinpoint a performance bottleneck, the system is most likely misconfigured, and the ScyllaDB Monitoring Stack will help you sort this out.

With the recent addition of the ScyllaDB Advisor to the ScyllaDB Monitoring Stack, it is now even easier to find potential issues.

Install ScyllaDB Manager

Install and use ScyllaDB Manager together with the ScyllaDB Monitoring Stack. ScyllaDB Manager automates backups and repairs of your database, and it can manage multiple ScyllaDB clusters and run cluster-wide tasks in a controlled and predictable way.

Run scylla_setup

Before running ScyllaDB, it is critical that you execute the scylla_setup script. ScyllaDB doesn't require manual optimization; determining the optimal configuration is the job of scylla_setup. If scylla_setup has not been run, the system won't be configured optimally.

Benchmarking best practices

Use a representative environment

Execute benchmarks on an environment that reflects your production environment. Benchmarking on the wrong environment can easily lead to an order-of-magnitude performance difference. For example, on a laptop you might see 20K OPS while on a dedicated server you could easily achieve 200K OPS. Unless you have your production system running on a laptop, do not benchmark on a laptop.

We recommend automating your benchmarking with tools like Terraform/Ansible so you can more easily repeat the benchmark test.

If you are using shared hardware in a containerized/virtualized environment, be aware that one guest can increase latency in other guests.

Also, make sure you do not underprovision the load generators; otherwise the load generators themselves, not the database, will become the bottleneck.

Use a representative data model

Tools such as cassandra-stress use a default data model that may not reflect the operations you will perform in production. For example, the cassandra-stress default data model uses a replication factor of 1 and the LOCAL_ONE consistency level.

Although cassandra-stress is a convenient way to get some initial performance impressions, it is critical to benchmark the same (or a similar) data model to the one you will use in production. We therefore recommend that you use a custom data model. For more information, refer to the section on cassandra-stress user mode in our documentation.
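To see why the defaults matter: with RF 1 and LOCAL_ONE, every request involves a single replica, whereas a common production setup of RF 3 with LOCAL_QUORUM writes three copies of the data and waits for acknowledgements from two replicas. The minimal sketch below renders a keyspace definition with an explicit replication factor and computes the quorum size. The keyspace name `bench`, the datacenter name `dc1` and the helper names are placeholders of our own.

```python
def keyspace_cql(name: str, rf_per_dc: dict[str, int]) -> str:
    """Render a CQL CREATE KEYSPACE statement using NetworkTopologyStrategy."""
    replication = ", ".join(f"'{dc}': {rf}" for dc, rf in rf_per_dc.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {replication}}};"
    )


def quorum(rf: int) -> int:
    """Replica acknowledgements needed for a quorum: floor(rf / 2) + 1.

    For LOCAL_QUORUM pass the local datacenter's RF; for QUORUM across
    datacenters pass the sum of all datacenters' RFs.
    """
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    return rf // 2 + 1


if __name__ == "__main__":
    print(keyspace_cql("bench", {"dc1": 3}))
    # CREATE KEYSPACE IF NOT EXISTS bench WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};
    print(quorum(1), quorum(3))  # 1 2
```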

Use representative datasets

If you run the benchmark with a dataset that is smaller than your production data, your results may be misleading: a smaller dataset fits more readily in memory, so the benchmark performs fewer disk I/O operations than production will. It is therefore critical to size the benchmark dataset to reflect your production dataset size.
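Working backwards from your production size tells you how many rows the population phase should write (for example, via cassandra-stress's `n=` option). The sketch below assumes you already know the average row size from production; it ignores compression, compaction overhead and indexes, so treat its numbers as rough. All function names and example sizes are hypothetical.

```python
GIB = 1024**3


def rows_needed(target_dataset_bytes: int, avg_row_bytes: int) -> int:
    """Rows to write so the logical (unreplicated) dataset reaches the target size."""
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return (target_dataset_bytes + avg_row_bytes - 1) // avg_row_bytes


def per_node_bytes(dataset_bytes: int, replication_factor: int, nodes: int) -> int:
    """Approximate data per node, assuming data is spread evenly across nodes."""
    return dataset_bytes * replication_factor // nodes


def fits_in_ram(node_data_bytes: int, node_ram_bytes: int) -> bool:
    """True if a node's data could fit in RAM, so reads may rarely touch disk."""
    return node_data_bytes <= node_ram_bytes


if __name__ == "__main__":
    dataset = 1024 * GIB  # hypothetical production size: 1 TiB of logical data
    print(rows_needed(dataset, avg_row_bytes=1024))  # 1073741824
    node_data = per_node_bytes(dataset, replication_factor=3, nodes=6)
    print(node_data // GIB, fits_in_ram(node_data, 128 * GIB))  # 512 False
```

If `fits_in_ram` returns True for your benchmark but would return False for production, the benchmark will likely overstate read performance.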

Use a representative load

Run the benchmark using a load that represents, as closely as possible, the load you anticipate in production. This includes the queries the load generator submits: use the same types of queries as in production, spread across the partitions in a similar way, with a read/write ratio that stays close to your production ratio. The read/write ratio matters because of the overhead of compaction and of locating the right data on disk.
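The sketch below generates a mixed load in which every operation targets a randomly chosen partition and reads and writes keep a fixed ratio. The 3:1 read/write ratio, the uniform key distribution and the function name are assumptions for illustration; cassandra-stress expresses a comparable mix with `mixed ratio(write=1,read=3)`.

```python
import random
from collections import Counter


def generate_ops(n_ops: int, read_ratio: float, n_partitions: int, seed: int = 42):
    """Yield (operation, partition_key) pairs for a mixed read/write load.

    Keys are drawn uniformly here; production access patterns are often
    skewed toward hot partitions, so substitute a distribution that matches
    your traffic.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    for _ in range(n_ops):
        op = "read" if rng.random() < read_ratio else "write"
        yield op, rng.randrange(n_partitions)


if __name__ == "__main__":
    ops = list(generate_ops(100_000, read_ratio=0.75, n_partitions=10_000))
    counts = Counter(op for op, _ in ops)
    print(f"read share: {counts['read'] / len(ops):.3f}")  # close to 0.75
    print(f"distinct partitions touched: {len({pk for _, pk in ops})}")
```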

Proper warmup & duration

When benchmarking, it is important to give the system time to warm up. This allows the database to fill the cache. In addition, it is critical to run the benchmarks long enough so that at least one compaction is triggered.
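When you analyze the results, exclude the warm-up period so that cold-cache latencies do not distort the steady-state numbers. A minimal sketch, assuming your load generator can log an (elapsed time, latency) pair per request; the 60-second window and the sample data are invented for illustration. Whether at least one compaction ran is something to confirm separately, for example in the ScyllaDB Monitoring Stack dashboards.

```python
from statistics import mean


def steady_state(samples, warmup_s: float):
    """Return latencies (ms) recorded after the warm-up window.

    samples: iterable of (elapsed_seconds, latency_ms) tuples.
    """
    return [latency for elapsed, latency in samples if elapsed >= warmup_s]


if __name__ == "__main__":
    # Made-up data: cold cache for the first 60 s, then steady state.
    samples = [(t, 20.0 if t < 60 else 2.0) for t in range(0, 600, 10)]
    print(mean(lat for _, lat in samples))           # 3.8, skewed by warm-up
    print(mean(steady_state(samples, warmup_s=60)))  # 2.0
```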

Latency test vs throughput test

When performing a load test you will need to differentiate between a latency test and a throughput test. With a throughput test, you measure the maximum throughput by sending a new request as soon as the previous request completes. With a latency test, you pin the throughput at a fixed rate. In both cases, latency is measured.
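The sketch below builds cassandra-stress command lines for both kinds of test. The option names (`duration=`, `cl=`, `-rate threads=`, `fixed=N/s`, `-node`) reflect the cassandra-stress help output as we know it, but options can differ between versions, so check your version's help before relying on them. The node address, thread count and target rate are placeholders.

```python
import shlex
from typing import List, Optional


def stress_cmd(op: str, duration: str, threads: int, node: str,
               fixed_rate: Optional[int] = None,
               cl: str = "LOCAL_QUORUM") -> List[str]:
    """Build a cassandra-stress argument list.

    fixed_rate=None: throughput test. Each client thread sends its next
    request as soon as the previous one completes.
    fixed_rate=N: latency test. Operations are scheduled at N ops/s in total.
    """
    rate = [f"threads={threads}"]
    if fixed_rate is not None:
        rate.append(f"fixed={fixed_rate}/s")
    return ["cassandra-stress", op, f"duration={duration}", f"cl={cl}",
            "-rate", *rate, "-node", node]


if __name__ == "__main__":
    # 10.0.0.1 is a placeholder node address.
    print(shlex.join(stress_cmd("write", "30m", 200, "10.0.0.1")))
    # cassandra-stress write duration=30m cl=LOCAL_QUORUM -rate threads=200 -node 10.0.0.1
    print(shlex.join(stress_cmd("write", "30m", 200, "10.0.0.1", fixed_rate=100_000)))
    # cassandra-stress write duration=30m cl=LOCAL_QUORUM -rate threads=200 fixed=100000/s -node 10.0.0.1
```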

Most engineers start with a throughput test, but a latency test is often the better choice, because they usually already know the throughput they need, e.g. 1M ops/s. This is especially true if your production system must meet a specific SLA, for example a 99.99th percentile latency below 10 ms.
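Checking such an SLA against recorded latencies is straightforward; the sketch below uses the nearest-rank definition of a percentile. Load generators and monitoring systems often compute percentiles from histograms instead, so their numbers may differ slightly. A 99.99th percentile describes the slowest 1 in 10,000 requests, so it is only meaningful with well over 10,000 samples. The data and function names are illustrative.

```python
import math


def percentile(latencies_ms, p: float) -> float:
    """Nearest-rank percentile, with p in (0, 100]."""
    ordered = sorted(latencies_ms)
    if not ordered:
        raise ValueError("no latency samples")
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def meets_sla(latencies_ms, p: float = 99.99, limit_ms: float = 10.0) -> bool:
    """True if the p-th percentile latency is below limit_ms."""
    return percentile(latencies_ms, p) < limit_ms


if __name__ == "__main__":
    # Made-up data: 100,000 requests at 2 ms, with some slow outliers at 50 ms.
    few_outliers = [2.0] * 99_995 + [50.0] * 5
    many_outliers = [2.0] * 99_950 + [50.0] * 50
    print(meets_sla(few_outliers))   # True
    print(meets_sla(many_outliers))  # False
```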

Coordinated omission

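A common problem when measuring latencies is coordinated omission. When the load generator waits for a slow response before sending its next request, the requests that should have been sent during the stall go out late, and measuring latency from their actual send time omits the worst latencies from the results. As a consequence, the higher percentiles become useless. A tool like cassandra-stress avoids coordinated omission when run at a fixed rate, because it then measures latency from each operation's scheduled start time.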
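The effect is easy to reproduce in a toy model. The simulation below is made up for illustration (a single connection, a request scheduled every 1 ms, 0.5 ms service time and one 100 ms server stall) and is not how cassandra-stress is implemented. Measuring from the actual send time hides the requests that queued up behind the stall; measuring from the intended send time exposes them.

```python
def simulate(n: int = 1000, interval_ms: float = 1.0, service_ms: float = 0.5,
             stall_at: int = 100, stall_ms: float = 100.0):
    """Model a single-connection load generator that waits for each response.

    Requests are scheduled every interval_ms, but a request cannot be sent
    until the previous one has completed. Request number stall_at hits a
    server stall of stall_ms.

    Returns (naive, corrected) latencies in ms:
      naive     - measured from when the request was actually sent
      corrected - measured from when it was scheduled to be sent
    """
    naive, corrected = [], []
    prev_done = 0.0
    for i in range(n):
        intended = i * interval_ms
        sent = max(intended, prev_done)  # blocked behind the previous response
        done = sent + (stall_ms if i == stall_at else service_ms)
        naive.append(done - sent)
        corrected.append(done - intended)
        prev_done = done
    return naive, corrected


if __name__ == "__main__":
    naive, corrected = simulate()
    print(sum(lat > 10 for lat in naive))      # 1
    print(sum(lat > 10 for lat in corrected))  # 180
```

With naive measurement only 1 of 1,000 requests looks slow; measured from the schedule, 180 requests exceed 10 ms.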

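Compaction strategy by workload (+ = good fit, – = poor fit; STCS = Size-Tiered, LCS = Leveled, ICS = Incremental, TWCS = Time-Window Compaction Strategy):

Workload / Compaction strategy	STCS	LCS	ICS	TWCS
Write-only	+	–	+	–
Overwrite	+	–	+	–
Read-mostly, with few updates	–	+	–	+
Read-mostly, with many updates	+	–	+	–
Time series	–	–	–	+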
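Once you have picked a strategy, you set it per table in CQL. The sketch below encodes the workload table as data and renders the corresponding ALTER TABLE statement. The strategy class names are the standard ones, but whether Incremental Compaction Strategy is available depends on your ScyllaDB version and edition; the table name and the TWCS window settings in the example are placeholders.

```python
# Standard compaction strategy class names. ICS availability depends on
# your ScyllaDB version/edition, so check before using it.
STRATEGY_CLASS = {
    "STCS": "SizeTieredCompactionStrategy",
    "LCS": "LeveledCompactionStrategy",
    "ICS": "IncrementalCompactionStrategy",
    "TWCS": "TimeWindowCompactionStrategy",
}

# The workload table, as data: strategies marked "+" for each workload.
RECOMMENDED = {
    "write-only": ["STCS", "ICS"],
    "overwrite": ["STCS", "ICS"],
    "read-mostly, few updates": ["LCS", "TWCS"],
    "read-mostly, many updates": ["STCS", "ICS"],
    "time series": ["TWCS"],
}


def alter_compaction(table: str, strategy: str, **options: str) -> str:
    """Render a CQL ALTER TABLE statement that switches the compaction strategy."""
    params = {"class": STRATEGY_CLASS[strategy], **options}
    body = ", ".join(f"'{k}': '{v}'" for k, v in params.items())
    return f"ALTER TABLE {table} WITH compaction = {{{body}}};"


if __name__ == "__main__":
    print(RECOMMENDED["time series"])  # ['TWCS']
    print(alter_compaction("metrics.readings", "TWCS",
                           compaction_window_unit="DAYS",
                           compaction_window_size="1"))
```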
