ScyllaDB strives to offer its users predictable low latencies. However, in real life, things do not always go according to plan, and sometimes predictable low latencies become unpredictable big latencies. When that happens, it’s time to go into detective mode and go figure out what’s going on.
Last month we gave a talk at Scylla Summit that described the caveats and best practices for monitoring a live Scylla cluster. Once the cluster is ready to serve your requests, you will need to monitor it to understand its performance characteristics, its overall health, and should anything go wrong, understand what was it was that upset the cluster’s behavior.
This is the second and last part of this article. If you haven’t read the first part, you can do it here. In this part, we will look at the design of the Seastar I/O Scheduler that Scylla uses to manage its disk I/O and discuss how it can be used to not only provide predictable latencies as we saw in our previous installment, but to guarantee fairness and proper balancing among different actors.
In a datastore like Scylla, there are many actors competing for disk I/O. Examples of such actors are data writers (in Scylla’s parlance they can be either memtable or commitlog writers), and a disk reader fetching the data to serve a cache miss. To illustrate the role that competition plays, if we are just issuing disk I/O without resorting to any fairness or balancing consideration, a reader, for instance, could find itself behind a storm of writes. By the time it has the opportunity to run, all that wait would have translated into increased latency for the read.
Amazon EC2 is a virtual computer store with all sizes and types of server on display. We researched the top choices to find the best balanced, best-performing server for NoSQL.