Scylla Summit Preview: Scylla Got Slow! Using Tools, Talent and Tracing to Find Out Why
Vladislav “Vlad” Zolotarov is one of our experts at getting the most out of Scylla, having written articles in the past about CQL tracing, tracing slow queries, securing your cluster, and using Hinted Handoffs. He will speak at the Scylla Summit in a talk entitled Scylla Got Slow! Using Tools, Talent, and Tracing to Find Out Why. He took the time to give a sneak peek into his upcoming session.
Outside of work, Vlad enjoys off-road biking and spending time with his family.
“Slow” can mean many things to many people: latency or throughput, storage I/O, process cycle times like compactions, or a raft of low-level tweaks such as CPU scheduling. What specific aspects will you focus on in your talk?
In the context of this talk, “slow” means unsatisfactory Scylla performance: either throughput or latency. We will start from there and drill down to identify the possible reasons for a slowdown. Those reasons may vary from suboptimal OS-level configuration (networking, I/O, etc.) to bad Scylla practices.
Speed improvements can be found within the database, but also in the applications and ecosystem the database is connected to. Will you focus solely on inside-the-box analysis, or also on broader systemic troubleshooting?
We will start by analyzing the server’s state. However, the tools Scylla provides can detect issues caused by problems not only on the server side but on the application side as well.
What are the go-to tools you have in your toolbox?
First of all, Scylla Monitoring, which is a set of Grafana dashboards showing various metrics from a Scylla cluster. This is what you always start with. In many cases this is where it ends too.
But if you need to drill down further, you’ll eventually get to nodetool (the statistics commands), the system log, cqlsh and CQL Tracing.
If you had a single, specific tip to give to Scylla database managers to improve performance, what would it be?
Always try to understand what the problem is before trying to fix anything. Remember that a cluster works only as fast as its slowest node and, in Scylla’s case, only as fast as its slowest shard. Therefore always start by identifying where the bottleneck is. There are a few ways of approaching this, and we’ll discuss various methods during the talk.
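As a rough illustration of the "find the slowest shard first" advice, here is a minimal Python sketch. It is not taken from the talk or from any Scylla tooling: the function name `find_slow_shards`, the `factor` threshold, and the sample latencies are all hypothetical, and in practice the per-shard numbers would come from whatever metrics source you use.

```python
from statistics import median


def find_slow_shards(latency_ms, factor=2.0):
    """Return (node, shard, latency) tuples whose latency exceeds
    `factor` times the median across all shards, slowest first.

    `latency_ms` maps (node, shard) -> latency in milliseconds.
    Both the input shape and the threshold are assumptions made
    for this sketch.
    """
    if not latency_ms:
        return []
    typical = median(latency_ms.values())
    slow = [
        (node, shard, lat)
        for (node, shard), lat in latency_ms.items()
        if lat > factor * typical
    ]
    # Sort so the worst offender comes first.
    return sorted(slow, key=lambda entry: entry[2], reverse=True)


# Hypothetical sample data: one shard on node2 is far slower than the rest.
sample = {
    ("node1", 0): 1.2,
    ("node1", 1): 1.1,
    ("node2", 0): 1.3,
    ("node2", 1): 9.8,
    ("node3", 0): 1.0,
    ("node3", 1): 1.2,
}

print(find_slow_shards(sample))  # [('node2', 1, 9.8)]
```

Comparing each shard against the median rather than the mean keeps a single badly behaving shard from hiding itself by dragging the baseline up.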
Do you have any recommendations for Scylla Summit attendees of articles or topics they should review prior to when they sit down to hear you speak?
I expect people to know Scylla fundamentals and have a basic understanding of how a server like Scylla or Cassandra works: what a cluster, node, keyspace, partition, and row are, what primary, partition, and clustering keys are, and how data is stored and queried. Some familiarity with basic computer-related concepts is an advantage: multi-queue devices, CPU affinity, NUMA, IRQ, etc.
They should also familiarize themselves with Scylla Monitor 2.0, which we just released.
Thanks, Vlad! I am sure a lot of people are going to be interested in your session.
You’re welcome. If anyone hasn’t registered yet, feel free to use the code vlad25monster to get 25% off the current price.