How ScyllaDB Powers This Next Tech Cycle

14 minutes

In This NoSQL Presentation

Applications have never been so data-hungry, nor so demanding of scale, speed and availability. Hear from CEO Dor Laor as he shares how ScyllaDB is powering this next tech cycle.

Dor Laor, CEO and Co-Founder, ScyllaDB

Dor is the CEO of ScyllaDB. Previously, Dor was part of the founding team of the KVM hypervisor under Qumranet, which was acquired by Red Hat. At Red Hat, Dor managed KVM and Xen development for several years. Dor holds an MSc from the Technion and a PhD in snowboarding.

Video Transcript

Hey! Hello! I’m Dor Laor, CEO and co-founder of ScyllaDB. Welcome to the seventh ScyllaDB Summit. Yes, it’s virtual again, but these are still COVID times. One day we’ll get to meet again. Now, we’ll talk about how ScyllaDB powers the next tech cycle. This next tech cycle is happening now; we don’t need Zuckerberg to flip the switch. COVID changed our lives, and I guess you, like me, sleep in the office, I mean, the home office. And our habits changed too.

You stopped watching TV and switched to streaming services like Disney+. You track every step you take with Strava. You write, either immediately on Twitter or at more length on Medium. Things changed dramatically, so let’s see how these changes affect the database and your stack. By the way, almost all of these companies use ScyllaDB today.
So how have databases changed, and what is about to change again? Before we talk about databases, let’s talk about the underlying infrastructure, which has improved a lot. We have amazing NVMe drives. We have 100 Gigabit Ethernet switches. We have huge core counts, with Intel, AMD and ARM all competing, and you get to enjoy it. Big machines, and 5G with a chip! So there are big changes in the infrastructure, and databases get to enjoy all of them. We also have a variety of NewSQL flavors, even distributed SQL these days. And the main change in recent years is that databases are consumed as a service.
Your stack probably looks something like this: it needs to scale, it needs to work all the time, it needs to respond quickly, and it needs to run analytics. The thing is, your requirements today won’t necessarily be your requirements tomorrow, in the next tech cycle. And we’re undergoing a real cycle right now. People buy NFTs. People join the metaverse, or take small steps toward it. And of course there’s cryptocurrency. All of them need databases, and by the way, all of these also use ScyllaDB today. But your database needs are going to change when you use 10x the amount of data, and when you adopt strategies like those of many existing customers presenting at this Summit. I’m speaking about telco companies like Amdocs, or cybersecurity companies like Palo Alto Networks.
And many, many others. None of them use the data as is: they do data enrichment, data cleaning, AI/ML, streaming, replication, and caching from multiple sources. The more data you have, the more you use it, and the more opportunity there is to gain an advantage from it in this new world of ours.
Now, what’s the effect on the database? You may have 10x the queries, or 100x the queries. When a single request depends on 100 queries, P99 effectively becomes P36: you take 0.99, raise it to the power of 100, and get about 0.36. So only about a third of those requests avoid the P99 latency, and nearly two thirds of them pay it. Things break at scale, and costs skyrocket.
You need to take all of this into consideration when you make a database decision, and to make it simpler for you, we define a few efficiency scores. First, there’s the basic practical-efficiency score: your ratio of queries to vCPUs, or of workload size to RAM. Of course, you don’t need RAM the size of your workload; that’s irrational, like the old-school in-memory days. And there’s the cost of storage, which can sometimes be dramatic. Beyond that, there’s the overprovisioning score: what portion of the resources the database consumes sits idle? Usually, a lot. Also, what’s the ratio of tables to clusters? If it’s one to one, you’re not utilizing your infrastructure, and most of the time you do that because you cannot isolate one table from the others. Maybe there’s a database that can actually do it. The same goes for data centers that run in read-only mode instead of active-active. There’s the scale score: can your database scale to 10x with the same cost and performance properties? Does it still have a good P99 once it scales? How long does it take to scale, and how easy is it to get there? And there’s the maintenance score: backup, repair, upgradeability, observability and traceability.
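To make the “P99 becomes P36” arithmetic concrete, here is a small Python sketch. It assumes, as the 0.99^100 calculation implicitly does, that one user request fans out into n independent queries and is only as fast as its slowest query. Independence is a simplifying assumption, and the function names are hypothetical, chosen for illustration only.

```python
# Sketch of the "P99 becomes P36" arithmetic from the talk.
# Assumption: one user request fans out into `fanout` independent queries
# and completes only when its slowest query completes.


def fraction_avoiding_tail(percentile: float, fanout: int) -> float:
    """Probability that none of `fanout` independent queries lands above
    the given per-query latency percentile (e.g. 0.99 for P99)."""
    return percentile ** fanout


def fraction_hitting_tail(percentile: float, fanout: int) -> float:
    """Probability that at least one query lands in the tail, so the whole
    request sees that tail latency or worse."""
    return 1.0 - fraction_avoiding_tail(percentile, fanout)


if __name__ == "__main__":
    for n in (1, 10, 100):
        ok = fraction_avoiding_tail(0.99, n)
        bad = fraction_hitting_tail(0.99, n)
        print(f"fan-out {n:>3}: {ok:.1%} avoid the P99 tail, {bad:.1%} hit it")
```

With a fan-out of 100 this prints about 36.6% of requests avoiding the P99 tail and 63.4% hitting it, which is where “P99 becomes P36” comes from. Real queries are rarely fully independent, so treat this as an illustration of the trend rather than a prediction.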
All of these maintenance items are a little bit boring, but if you’re watching, they’re probably your daily job, so they’re important to you and important to us. We sympathize, because we run databases ourselves, maintaining ScyllaDB Cloud. And lastly, there’s the freedom score: licensing and environment. Can you run in every environment, on every cloud? Or are you locked into a particular environment or set of tools? All of these scores are extremely important, and we try to score well on each of them with ScyllaDB.
Now, let’s look at the state of the union for ScyllaDB. Over the years, there have been several generations of ScyllaDB. The first was brute-force performance. We wanted the best utilization and the best efficiency, and we’re still doing it; we’re still excited about it. In this capture you see a screenshot from top, where we run a one-petabyte deployment on 20 servers doing millions of transactions, and there’s a session about it at the Summit. You can see the 96 processors of one node running at 100% CPU utilization. The second generation was about reaching API compatibility with Cassandra: lightweight transactions, and compaction, where we solved the compaction challenge. Later on it was about extras, going the extra mile and doing more with ScyllaDB: implementing change data capture in a really nice way, supporting Kubernetes, supporting both local and global secondary indexes, adding compaction strategies beyond the regular ones in the form of incremental compaction, and implementing Alternator, our DynamoDB-compatible API.
Now we enter generation four. It’s more than just ScyllaDB 5.0, tremendous as that is: it spans the ScyllaDB 5.x releases, and ScyllaDB Cloud and ScyllaDB Operator for Kubernetes will get a lot of benefits from ScyllaDB 5. ScyllaDB 5 has two major building blocks.
The first is a set of improvements across the board. We finally enabled repair-based node operations by default; if you don’t know what that is, we have a session about it. So repair is basically a solved problem, and operations are restartable. We have a new scheduler that brings better efficiency, better control and better latency, which is fabulous. We also solved large partitions: forget about figuring out whether a latency bump is due to a large partition. Large partitions are no longer a problem, and there are many, many other problems that we solved.
The second major building block of ScyllaDB 5 is the Raft consensus protocol. We implemented Raft. It’s done; the protocol is committed. We also enabled transactional schema agreement using Raft, so we have an extremely solid base for ScyllaDB. What’s coming next is transactional topology changes, and tablets, which will make ScyllaDB much more elastic and allow immediate consistency instead of eventual consistency for your data. And I’m not only talking about maintenance and better indexing; there are more changes, some of which we’ll cover throughout this Summit. ScyllaDB 5 will be an amazing enabler for ScyllaDB Cloud and ScyllaDB on Kubernetes, because elasticity will become much better. One part of elasticity is the ability to scale your workload out and back in, but another is dynamically moving load from overloaded shards to other shards. Because everything becomes transactional and controllable, we’ll be able to do that immediately, without waiting for timeouts. Through that, you’ll be able to manage gigantic clusters with ease. That isn’t always the case today, but things will be much easier in the future.
You’ll be able to use mega nodes or tiny pods, and to further consolidate clusters, and tables within a cluster. So lots of things are coming with ScyllaDB 5, and it’s happening now: the release is at the release candidate (RC) stage as we speak.
Turning to ScyllaDB Cloud: these days, I think the entire ecosystem is moving to as-a-service consumption, because you’d like to shed the maintenance burden and complexity of running the database. Even with ScyllaDB 5, there’s still a real learning curve. With service-based consumption, you’re able to concentrate on your application itself. ScyllaDB Cloud today runs on AWS and GCP in several flavors, and it gives you extreme efficiency: it’s 5x or more cost-effective than other solutions. It gives you perfect security and also perfect isolation, I mean performance isolation in that case. And it’s relatively simple to understand, whether it’s the units, which are servers, or the observability and analysis of your data.
What’s coming is multitenancy. We get a lot of requests to separate compute and storage, because workloads don’t always demand compute and storage evenly, so we’re going to separate those. We’re also going to provide a serverless approach where you pay for what you use, in terms of transactions. It’s more elastic, a modern usage model beyond servers. We’re going to keep the existing forms, so don’t worry: all the goodness of ScyllaDB will remain. We’re going to add APIs, and with multitenancy, since we’ll be overcommitting the existing infrastructure, you’ll eventually get better efficiency. All this is coming in 2022, this year, so stay tuned. And of course we have the Summit itself, so make sure you attend the sessions you like, from customers, users and engineers. And make sure you star us on GitHub.
You’re welcome to interact with me or with the entire team. Thanks for listening, and enjoy the show. Bye-bye.
