Whenever ScyllaDB CEO Dor Laor and SADA CTO Miles Ward get together, their mutual obsession over tapping the power of modern cloud infrastructure rings loud and clear. Case in point: a recent episode of the Cloud N Clear podcast.
Here are some highlights from their conversation covering ScyllaDB’s origins in KVM, why and how ScyllaDB built an alternative to Cassandra and then DynamoDB, how ScyllaDB supports business-critical use cases like Discord’s, the rise of database-as-a-service, and more. The transcript is lightly edited for brevity.
Pivoting from KVM to a “Monstrously Fast and Scalable” Database
Dor: We decided to create an operating system for virtualization from scratch with our own kernel because we [Avi Kivity and I] did the KVM hypervisor before. We saw everybody running just one app per OS and we thought, “Let’s create a new OS.” But Docker came along and forced us to pivot. We ran a lot of databases on our OS and we felt that Apache Cassandra was not as fast as it should be. It’s got a super-nice scale-out architecture, but it’s not efficient at all. It’s written in Java, which is a nice language for management apps but not for high-end speed. So, we figured, “Let’s rewrite this from scratch; take the OS knowledge we’ve got and just do it.”
Miles: [chuckling] For everyone in the audience, you should internalize the scope of what he’s saying: “Oh, we think this is a good app – we should just build a faster version of it.” I’m super impressed by the technical depth on this team. This is the same technical group that designed the KVM hypervisor, which among other places, is in use in production at Google Cloud – powering every GCP instance and a huge number of internal services.
Dor: We didn’t know how complicated it would be. We thought, “We wrote a hypervisor, how much more complicated can a database be?” It’s actually much more complicated than a hypervisor… GCP published a benchmark of Cassandra running 330 small virtual machines to run 1 million Cassandra Query Language (CQL) operations per second. When we were at Red Hat and working on the KVM hypervisor, we broke the world record for filesystem IO using a single virtual machine: we did 1.6 million operations per second. It’s an unfair comparison because it’s only filesystem IO, not replicated, versus a million database operations with 3X replication. But really…330? Something is not right. Ahead of time, we were sure that we could 10x the Apache Cassandra performance – and we have.
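To put those figures side by side, here is a quick back-of-the-envelope sketch. It uses only the numbers quoted above and makes two assumptions that the conversation does not state: that load is spread evenly across the 330 VMs, and that every operation is applied on all three replicas. The variable and function names are illustrative only.

```python
# Figures quoted in the conversation above.
CLUSTER_OPS_PER_SEC = 1_000_000   # CQL operations per second across the cluster
VM_COUNT = 330                    # small virtual machines in the benchmark
REPLICATION_FACTOR = 3            # "3X replication"
SINGLE_VM_FS_IOPS = 1_600_000     # filesystem IO record on a single VM


def per_vm_ops(cluster_ops, vm_count, replication_factor=1):
    """Average operations per second handled by each VM.

    Assumes an even load across VMs (an assumption, not a benchmark detail).
    """
    return cluster_ops * replication_factor / vm_count


# Operations as seen by clients, averaged per VM.
client_ops = per_vm_ops(CLUSTER_OPS_PER_SEC, VM_COUNT)

# Operations including replica work, assuming each one touches all 3 replicas.
replica_ops = per_vm_ops(CLUSTER_OPS_PER_SEC, VM_COUNT, REPLICATION_FACTOR)

# How far the single-VM filesystem figure sits above the per-VM database figure.
gap = SINGLE_VM_FS_IOPS / replica_ops

print(f"Client operations per VM:  {client_ops:,.0f}/s")
print(f"Replica operations per VM: {replica_ops:,.0f}/s")
print(f"Single-VM filesystem IO is about {gap:,.0f}x higher")
```

Under those assumptions, each VM handles roughly 3,000 client operations per second, or about 9,000 once replication is counted. As Dor notes, filesystem IO and replicated database operations are not the same kind of work, so the ratio only hints at how much headroom the hardware had.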
Open Source and the Rise of Database as a Service
Miles: ScyllaDB is available not only as an open source project, but also as a managed service. Can you talk about the relationship between those two and how ScyllaDB manages the managed service offering?
Dor: From the very beginning, we’ve been open source fans. We believe in open source; we like open source, we think it’s fun. My management style actually took a lot of processes and values from open source. In addition to an Open Source version of ScyllaDB, we have an Enterprise version based on it (similar to the Red Hat offering). Also, a couple of years ago, we saw the need to have a managed database as a service. This is basically where the entire market is going – in databases and beyond. ScyllaDB Cloud is a managed ScyllaDB Enterprise with a lot of code around it to automate the management aspects of it. When a machine fails, when a data center fails, you don’t need to wake up at night. You don’t need to worry about backups. Not everyone does backups, and even fewer make sure that the restore actually works before a disaster.
Miles: Backups are worthless – it’s restores that are so nice, right? That’s why the “as a service” model is so valuable to so many customers. It aligns incentives properly. If you can make an investment in operational efficiency that pays dividends over all of your customers, of course, you’re incentivized to do that work. It benefits every customer. So, you end up in this spot where you’ve got the right reasons to be making a better and better product over time.
The Responsibility of Being “The Beating Heart in the Middle of a Business” like Discord
Miles: Discord is running ScyllaDB on GCP. Mark Smith, Discord’s Director of Infrastructure, said they’ve got over 150 million monthly active users – a really scaled system for communicating while you’re playing games, working on open source projects, etc. I use it all the time with some of our companies that want to collaborate in that way. His quote is, “We trust ScyllaDB’s speed and reliability to help with some of our most business-critical use cases including all of our core product experience, our anti-abuse systems and more.” Talk to me about that responsibility. You’re building a product that powers other products. SADA is a company that helps companies build their products. But what are you doing at ScyllaDB to make sure that it can keep those kinds of promises?
Dor: That’s definitely not a simple undertaking. It takes a while to gain the trust of customers. Take Discord, for example. They started with us at a small scale. Over time, after a lot of testing, they started to shift their services off Cassandra and onto ScyllaDB. Now, they’ve moved pretty much everything to ScyllaDB, and we’re super proud. We work hard every day to satisfy all customer needs, even minor issues. Sometimes it’s hard to figure out if an issue is coming from the database, or the infrastructure, or the client. We don’t worry about those boundaries; we focus on solving the problems and making the customer happy. One time, an on-prem customer reported that a single node suddenly went to 50% capacity. I had seen this before in my past, so I told them to go check if the second power supply was still on. That turned out to be the issue. We’ve solved driver issues in the OS, bugs in non-ScyllaDB database drivers, and so on. It all needs to work end-to-end.
Miles: You’re exactly right. I think I remember when the term “full stack engineer” made its way into the market of ideas and how we talk about these roles. Every time I met one that said they were full stack I was like, “Oh no, you don’t mean full stack full stack. You don’t know power supplies, you don’t know what’s going on inside of memory registers…” There’s too much to know – the scale of context that’s required to really make all of this technology do what it’s supposed to do is enormously broad. SADA knows that; all day long, we’ve got people that ask us for support for every different Google product, all the ISVs that sit around them, all of the surrounding technologies they layer on top of that. It really is an incredible thing to watch in practice as businesses try to make value out of this crazy set of zeros and ones that they throw around.
Continue Watching/Listening: CAP Theorem, DynamoDB, and More
That’s just the beginning of this lively chat. Watch/listen (you can also get it on your favorite podcast platform) to hear what Dor and Miles have to say about:
- What ScyllaDB has been working on (spoiler: consistency and elasticity feature quite prominently) (12:16)
- Miles’ interactions with Eric Brewer of CAP theorem fame (15:51)
- How ScyllaDB’s open source DynamoDB-compatible API enables customers to take DynamoDB to GCP and on-prem (17:00)
- The “fun” budding relationship between SADA and ScyllaDB (23:40)