Hello, everyone, and welcome to my talk, “What You Need To Know About ScyllaDB 5.0.” My name is Avi Kivity. I’m the CTO of ScyllaDB and the maintainer of the project, and I have past experience with the Linux Kernel-based Virtual Machine (KVM). So what’s new in ScyllaDB 5.0? I’ve categorized it into four different topics. Many of these features fall into several of the categories, but you have to pick your partition key.
So let’s start with performance, because that’s the one thing everyone loves. One thing we did is remove the penalty from large partitions. In the past, when you had a large partition, you might have seen penalties when accessing the middle of the partition or when scanning it, and over the years we’ve removed many of those penalties. Now we think we’ve removed all of them, and this is done with SSTable index caching. The caching is adaptive: the parts of the index that are in heavy use will be cached, and the parts that are not will be kept on disk. This economizes the use of memory, the parts of the data in frequent use will be heavily cached, and you will get immediate access to the data.

Another area we’ve improved is reversed queries. This is an area where you might not have noticed a problem at all: many users do not do reversed queries, and if you do reversed queries on small partitions, you might not notice anything. But the combination of reversed queries and a large partition had quadratic complexity, which translated to very long query times when partitions were large. You can see how the previous implementation in ScyllaDB 4.5 showed this quadratic complexity, which moreover ended in failure at around 100 megabytes. In ScyllaDB 5.0, reversed queries trail only slightly behind forward queries, with no limitation on partition size. It’s also nice to see that we can query 100 megabytes in half a second. That’s 200 megabytes per second for just one core; multiply that by the number of cores, and you see the immense bandwidth that ScyllaDB can provide.

Next, we’ve done a rework of our IO Scheduler, and this is in two parts. In the first part, we modeled how modern solid-state disks behave.
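To give a feel for what such a model can look like, here is a minimal sketch. This is illustrative only, not ScyllaDB’s actual formula: the per-disk limits, the linear combination, and the headroom value are all made-up assumptions. The idea is that once a disk has been measured offline, the scheduler can treat any mix of random reads and sequential writes as safe while a normalized sum of the two stays under a ceiling, and throttle new requests otherwise:

```python
# Illustrative sketch of an SSD saturation model -- NOT ScyllaDB's actual
# math. Hypothetical per-disk limits obtained from offline measurement:
MAX_READ_IOPS = 400_000        # random reads/s the disk sustains alone
MAX_WRITE_BW = 2_000_000_000   # sequential write bytes/s it sustains alone

def disk_load(read_iops: float, write_bw: float) -> float:
    """Normalized load of a mixed workload: below 1.0 is the
    low-latency region, at or above 1.0 risks high tail latency."""
    return read_iops / MAX_READ_IOPS + write_bw / MAX_WRITE_BW

def admit(read_iops: float, write_bw: float, headroom: float = 0.9) -> bool:
    """Admit more work only while we stay inside the safe region,
    keeping some headroom below the measured ceiling."""
    return disk_load(read_iops, write_bw) < headroom

# A light mixed workload is admitted; a saturating mix is throttled.
print(admit(100_000, 500_000_000))    # True
print(admit(350_000, 1_500_000_000))  # False
```

In this sketch the scheduler would delay or queue requests whenever `admit` returns `False`, which corresponds to steering the workload away from the high-latency region of the measured chart.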
This chart shows how a disk behaves at different combinations of sequential writes and random reads, and we’ve modeled every point on it. We ran the test on a variety of disks, and for every disk we measured the latency you can expect. In the cyan areas, the combination of read IOPS and write throughput is low enough to support fast queries; in the purple areas, you get bad 95th-percentile latencies. The second part of this work adjusts the IO Scheduler so that it knows exactly which area of the chart it is currently operating in, and it steers the workload away from the purple region and into the cyan one. So it keeps latencies low, and it keeps the disk running where it’s happy. We have a nice benchmark that we performed recently on a petabyte-scale dataset, and I’m pretty happy to be able to say that at 5 million operations per second we get 2-millisecond 99th-percentile latency, deteriorating somewhat when you go to 6 million operations per second. I don’t think there are many people who work on databases who can say that. I’m very satisfied with it.

Next, let’s move on to new features. The headline feature of ScyllaDB 5.0 is Raft, which we use to manage topology and schema. In 5.0 we only have the beginning of that, but we plan to base many new ScyllaDB features on top of Raft. In 5.0, schema and topology are managed by Raft.
Raft mode is optional for now because it is so new, so it defaults to the old eventual-consistency model, but you can select the new strongly consistent Raft model. This improves the system’s ability to guarantee that the schema will be consistent and that topology operations will be reconciled. We will build on this to support strongly consistent tables, strongly consistent materialized views and indexes, and data distribution features such as tablets, which will improve the elasticity of the system, enable large and small tables to coexist better, and remove a lot of the problems of the current vnode model.

Another feature we’ve worked on is improving out-of-memory resilience. ScyllaDB delivers very fast performance in part by supporting very high concurrency, so that it can exploit the many cores, many nodes, and very fast disks backing the nodes today. But the price of high concurrency is higher memory consumption, and that consumption is hard to predict, because you cannot tell in advance the size of the data you are reading: you might be reading 100 bytes, or you might be reading 10 megabytes. So we’ve made our tracking of memory consumption more accurate, in order to start restricting concurrency before we run out of memory. ScyllaDB 5.0 will be a lot more resilient under different workloads with high concurrency.

Let’s move on to quality of life: the things that make your life easier as an operator or developer of a ScyllaDB database. Let’s start with TimeWindowCompactionStrategy tables. In the past you had to make a choice, artificially bucketing your data so that each partition fits within a time window. That means spending more effort on modeling your data, and if you make a mistake, it is quite hard to correct.
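To make the old pain concrete, here is a sketch of the kind of artificial bucketing users used to need. The table shape, names, and window size are hypothetical, chosen only for illustration: the partition key becomes a composite of the series id and a computed bucket, so that no single partition outgrows one time window.

```python
# Sketch of manual time-bucketing for a TWCS table (hypothetical names).
# Picking WINDOW_SECONDS wrong was exactly the hard-to-correct modeling
# mistake mentioned above, because the bucket is baked into the key.

WINDOW_SECONDS = 24 * 3600  # one hypothetical time window: a day

def bucket_of(ts_epoch_seconds: int) -> int:
    """Map a timestamp to its time-window bucket number."""
    return ts_epoch_seconds // WINDOW_SECONDS

def partition_key(sensor_id: str, ts_epoch_seconds: int) -> tuple:
    """Composite key (sensor_id, bucket): every read and write must
    now compute the bucket as well as the series id."""
    return (sensor_id, bucket_of(ts_epoch_seconds))

# Two readings a day apart land in different partitions.
print(partition_key("sensor-1", 0))       # ('sensor-1', 0)
print(partition_key("sensor-1", 90_000))  # ('sensor-1', 1)
```

A query spanning several days then has to fan out across all the buckets it touches, which is the extra modeling effort being removed.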
And today, in part because of our improved support for large partitions, but also because of improvements to TimeWindowCompactionStrategy itself, you no longer need to do this artificial bucketing. Each time series can be a single partition that spans as many time windows as it needs, and ScyllaDB will adapt and handle the data even though it spills across multiple time windows. So that’s a simplification for time series users.

Next, repair-based node operations; that’s a bit of a mouthful. Node operations are bootstrapping a new node, decommissioning a node, rebuilding a node that has lost its data, and so on. In the past those operations were based on streaming, moving data from one node to another, but we’ve changed them to be based on repair. The idea is that although repair is more complex and more stressful for the database, it is less stressful for the operators: if an operation fails and you need to continue it, you can just resume; you don’t have to start from scratch. It ensures consistency of the data. It simplifies the operation, because you no longer need to run repair at a specific point. And it’s unified, so there is less code involved in these operations, and fewer ways for them to fail.

Another mouthful is repair-based tombstone garbage collection. You might be familiar with the fact that you must run repair at intervals more frequent than gc_grace_seconds, the parameter that defaults to 10 days. This is quite stressful, because if you do not manage to complete repair within that period, you are at risk of data resurrection, and nobody wants deleted data to be resurrected.
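The classic fixed-window rule just described can be sketched like this. `gc_grace_seconds` is a real table option (defaulting to 10 days); everything else here is a simplified illustration:

```python
# Simplified sketch of the classic fixed-grace-period tombstone rule.
# A tombstone becomes purgeable purely by age -- and if repair has NOT
# propagated the deletion to every replica by then, a stale replica can
# later resurrect the deleted data.

GC_GRACE_SECONDS = 10 * 24 * 3600  # default gc_grace_seconds: 10 days

def purgeable(tombstone_age_seconds: int) -> bool:
    """Old rule: purge by age alone, whether or not repair ran."""
    return tombstone_age_seconds > GC_GRACE_SECONDS

day = 24 * 3600
print(purgeable(3 * day))   # False: still within the grace period
print(purgeable(11 * day))  # True: purged, risky if repair lagged
```

The problem is exactly that the rule never asks whether repair actually completed, which is what the repair-based approach changes.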
So with repair-based tombstone garbage collection, instead of having to complete repairs within an arbitrary time period, the database now tracks when you last completed repair: it will purge tombstones that are older than the last repair, and it will keep tombstones that are newer. With that you get two benefits. The first, of course, is that there is no risk of data resurrection even if your repairs lag behind. Maybe a repair failed, maybe you forgot to run one; whatever the reason, if you lag behind on your repairs, your data is safe. It’s not recommended, of course, and there is a performance penalty, but that brings us to the second advantage: if you do run repair on schedule, or even ahead of schedule, ScyllaDB will purge more tombstones than it would otherwise, which reclaims space and improves performance. Tombstones, as we know, can degrade performance, and by purging them earlier we regain some of the performance lost.

Next we’ll talk about changes around the ScyllaDB ecosystem. We’ll start, of course, with Kubernetes. A lot of improvements have been brought to the operator around performance, not only of ScyllaDB running within Kubernetes but of the operator itself, so it will respond to your commands more quickly. We’ve had some stability improvements, and we’ve also improved security around the operator.

Next, the Rust driver. This is the newest driver in the ScyllaDB family, the first driver that ScyllaDB wrote ourselves, and it’s already the fastest driver out there. We plan to rebase some of the other drivers on top of the Rust driver, because it is such a modern, clean, safe, and fast code base, so this is a very good development.

Another new feature is WebAssembly. WebAssembly is a new way to perform computations across security boundaries.
In this case you’re moving a function from your domain, the user domain, into the database, and WebAssembly allows the database to execute that function safely and with great performance. This is available in experimental mode now. You can provide a WebAssembly function as raw WebAssembly code, as seen on this slide, but you can also write it in C or Rust. So there are many options, and it’s quite an exciting feature.

Next is ARMv8 support: ScyllaDB now runs on ARM machines. It can run on your M1 Mac, in Docker, and it can run on Amazon I4-class instances, which are powered by Graviton2, so you get better price/performance and very high-density nodes, with very large amounts of storage per dollar and per vCPU. These are great if you have workloads that are storage-bound rather than CPU-bound.

And that’s it. Thanks a lot. Stay around for more in-depth talks, as well as lightning talks about ScyllaDB 5.0 and beyond. Enjoy the summit.