Learn about our improvements to high-percentile latencies at Scylla Summit 2017
I was pleased to sit down with Glauber Costa of ScyllaDB to learn more about his upcoming talk at Scylla Summit 2017. At the Summit, participants will join NoSQL developers and users from start-ups and enterprises for two days of sharing ideas, hearing innovative use cases, and getting real-life tips and tricks from their peers and NoSQL idols. Let’s begin the interview and learn more about what Glauber will be presenting at the Summit.
Please tell us about yourself and what you do at ScyllaDB?
I am a Principal Architect. I help drive the development of core database features, and I make sure customers are successful in deploying Scylla and integrating it with their environment. I wrote the Scylla I/O Scheduler, automatic write-rate determination, and the flush priority controller.
What will you be talking about at Scylla Summit 2017?
I will be showing a host of improvements that we have made since last year’s Scylla Summit that will help Scylla provide more consistent latencies.
What type of audience will be interested in your talk?
My talk targets the entire audience of the Scylla Summit. I will show a lot of results backing the improvements I describe, but the description of the improvements themselves will be fairly high level.
Can you please tell me more about your talk?
Scylla has already built a reputation as a database capable of providing very high throughput at low latencies. However, it is easy to achieve good average latencies; it is much harder to achieve latencies that are good at the 99th and 99.9th percentiles. The fact that we don’t have stop-the-world garbage collection helps a lot to get us started, but it is not the end of the story and there is a lot more we can do. For instance, if the database’s memory buffers used for request backlog fill up, we need to stop acknowledging requests. But that means we’ll have a lot of fast requests until the buffer fills, and then one big outlier after that.
Last year we made a commitment to implement a set of algorithms that we collectively call “Workload Conditioning”. These algorithms have the database automatically adjust itself to the load to provide those guarantees. One such algorithm can be applied to the problem above. As soon as we detect that we are about to run out of memory, we can start introducing latency to the existing requests, which makes all of them consistently slower. Now your outliers are gone and you know exactly what to expect. If the new level of latency is not acceptable for the application, it is time to reduce the load or resize the cluster.
We have largely delivered on that promise, although there is always more work to do. The upcoming Scylla 2.0 contains one year’s worth of improvements in that area, with impressive results. I will show how we solved the problem I described above, along with many others, and compare the results you can expect from running Scylla then and now.
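To make the buffer example above concrete, here is a minimal Python sketch of the idea. It is not Scylla's implementation: the single closed-loop client, the `hard_stop` and `gradual` policies, and every number in it (buffer capacity, drain rate, ramp start, maximum delay) are assumptions made up for illustration. It compares a policy that stalls only when the buffer is full with one that starts adding a small delay as the buffer fills, and reports the mean next to the high percentiles.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    ordered = sorted(samples)
    rank = min(len(ordered), max(1, math.ceil(p / 100 * len(ordered))))
    return ordered[rank - 1]


def hard_stop(fill, capacity, drain_per_ms):
    """No delay while there is room; once full, stall until the buffer is half empty."""
    if fill + 1 <= capacity:
        return 0.0
    return (fill - capacity / 2) / drain_per_ms


def gradual(fill, capacity, drain_per_ms, start=0.5, max_delay_ms=4.0):
    """Ramp the delay up linearly once the buffer is more than `start` full."""
    over = max(0.0, fill - start * capacity) / ((1 - start) * capacity)
    ramp = max_delay_ms * min(1.0, over)
    # Keep the hard stop as a last resort in case the ramp is not enough.
    return max(ramp, hard_stop(fill, capacity, drain_per_ms))


def simulate(policy, n_requests, capacity=100.0, drain_per_ms=0.5, base_ms=1.0):
    """One client issuing writes back to back; returns per-request latency in ms."""
    fill, latencies = 0.0, []
    for _ in range(n_requests):
        latency = base_ms + policy(fill, capacity, drain_per_ms)
        # The buffer drains in the background while the request is in flight,
        # then the request itself adds one unit to the buffer.
        fill = max(0.0, fill - drain_per_ms * latency) + 1.0
        latencies.append(latency)
    return latencies


if __name__ == "__main__":
    for name, policy in (("hard stop", hard_stop), ("gradual", gradual)):
        lat = simulate(policy, 10_000)
        print(
            f"{name:>9}: mean={sum(lat) / len(lat):.2f} ms  "
            f"p99.9={percentile(lat, 99.9):.2f} ms  max={max(lat):.2f} ms"
        )
```

In this toy model the two policies end up with nearly the same mean latency, but the hard-stop policy has a far worse 99.9th percentile and maximum. That is the gap between "good on average" and "good at the tail" described above.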
Where can we learn more information about your talk?
I have already published a detailed description of some of this work on the Scylla Users Blog. Between now and the summit, there is more to come. The blog posts are obviously much more detailed than the talk can ever be, but even people familiar with the contents of the articles will benefit from the talk. It’s good to have a high-level overview of how all of those things play together to compose the end result that is now seen in Scylla.
How can people get in touch with you?
Thank you very much, Glauber. We cannot wait to see your talk in person and learn more. If you want to attend Scylla Summit 2017 and enjoy more talks like this one, please register here.
Scylla Summit is taking place in San Francisco, CA on October 24-25. Check out the whole agenda on our website to learn about the rest of the talks, including technical talks from the Scylla team, the Scylla roadmap, and a hands-on workshop where you’ll learn how to get the most out of your Scylla cluster.