Lee Atchison & Dor Laor chat about ScyllaDB’s origin story, why it’s displacing other databases, and different approaches to getting started
From events like ScyllaDB Summit and P99 CONF (starting Wednesday!) to being named Google Cloud Customer of the Year, this has been a remarkable year at ScyllaDB. Lee Atchison – software architect, podcast host, and author of Architecting for Scale (O’Reilly Media) – was curious about the latest developments, so he recently hosted ScyllaDB co-founder and CEO Dor Laor on the Software Engineering Daily podcast.
You can listen to the complete podcast here. For those of you who prefer to read rather than listen, we’re blogging highlights from the conversation. This article focuses on the origins of ScyllaDB, why teams adopt it, and tips for getting started. Future blogs will cover topics such as NoSQL vs SQL and how database consistency models are merging.
ScyllaDB’s “Origin Story”
Lee Atchison (LA): Let’s talk about ScyllaDB. It’s a highly competitive database in a very crowded product space. There are a lot of products that do similar sorts of things. Why another database? And what does ScyllaDB provide over and above DynamoDB, and all the other similar modern NoSQL databases?
Dor Laor (DL): When my co-founder and I sat down and listed multiple ideas for a startup, one of them was the database designed for SSDs. However, we crossed that off the list because we believed that someone had already done it. So, we selected another idea: build an operating system from scratch (OSv). Eventually, we had to pivot away from it. That OS still lives on… but that’s a story for a whole different episode. [More backstory here and here]
The OS that we created was supposed to boost databases and other workloads compared to Linux, so we tried it with Apache Cassandra, Redis, Hadoop (this was 2014) and other products. When we ran Redis on it, we boosted Redis by 70%. But when we ran Cassandra on it, there wasn’t an improvement versus running it on Linux. We realized that the complexity lies within Cassandra: it’s a full-fledged, relatively heavyweight Java process. If you change the underlying engine, it won’t move the needle much. Back then, Cassandra was among the 10 most used databases. After additional research, we decided: let’s rewrite Cassandra from scratch in C++, bringing all of the knowledge we have from our operating system creation. (Avi, my co-founder, also had the novel idea of how to create the KVM hypervisor; he drove the implementation and I managed the team.)
So, we had a lot of opinionated ideas of how to create a low-latency infrastructure and we happened to target the database space. Cassandra was just screaming from inefficiency.
We realized: let’s keep the really good interface that Cassandra has (also derived from others – it stands on the shoulders of giants with the Dynamo paper and Bigtable). Let’s keep all the goodness that comes from this type of wide column structure. Let’s just change the implementation.
“We Solve Problems at Scale”
LA: So your intent was a high-performing Cassandra, and that’s really what you’ve accomplished. But your biggest competitor right now, really, I guess it is still Cassandra, but it’s mostly, I would say, DynamoDB. Correct? Is that your biggest competitor at the moment?
DL: As you said before, there are so many databases today. We don’t target any one in particular. We focus on solving problems at scale.
With some databases, like relational databases, scaling is extremely complex and problematic, if not impossible. Some databases can scale to a certain level, but then problems arise. Others can scale, but they’re extremely expensive.
We solve problems at scale, addressing the pains that teams are having with other platforms. At the end of the day, if they do not have any pain, they can stick with the solution they have.
Dor’s Tips for Getting Started with ScyllaDB
LA: If someone wants to start evaluating ScyllaDB, either for a new software project they’re starting or for an existing project, what advice do you give them and what should they do?
DL: There are two approaches. One approach is to start simple. Just grab a Docker container, run it on your laptop, and play with it a bit. We have a ScyllaDB University with lots of nice, super quick Docker-composed apps that you can run and emulate a complicated environment super quickly with one command. A different approach is to start off with your most complicated use case. Say you need to run a million OPS with low latency. In this case, you probably want to get a couple good machines, provision on the cloud, and test out the hardest problem first, instead of delaying it to the end.
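For readers who want to try the "start simple" route, here is a minimal Python sketch of what that first experiment might look like. It is not from the conversation: the container name `scylla-dev`, the helper names, and the choice of a single shard via `--smp` are illustrative assumptions. It assumes the public `scylladb/scylla` image on Docker Hub, the default CQL port 9042, and, for the optional query step, a CQL Python driver (installed separately) that provides `cassandra.cluster.Cluster`.

```python
import shlex

SCYLLA_IMAGE = "scylladb/scylla"  # public image on Docker Hub
CQL_PORT = 9042                   # default CQL native-protocol port


def scylla_docker_command(name="scylla-dev", smp=1):
    """Build a `docker run` command for a single-node ScyllaDB container.

    `name` and `smp` are illustrative defaults (assumptions), chosen to
    keep the node small enough for a laptop.
    """
    return [
        "docker", "run", "--name", name, "-d",
        "-p", f"{CQL_PORT}:{CQL_PORT}",  # expose CQL to the host
        SCYLLA_IMAGE,
        "--smp", str(smp),               # limit the node to `smp` shards
    ]


def read_release_version(host="127.0.0.1"):
    """Connect over CQL and read the node's release version.

    Needs a running node and a CQL Python driver, so the import is kept
    local and nothing here runs unless this function is called.
    """
    from cassandra.cluster import Cluster

    cluster = Cluster([host], port=CQL_PORT)
    try:
        session = cluster.connect()
        row = session.execute(
            "SELECT release_version FROM system.local"
        ).one()
        return row.release_version
    finally:
        cluster.shutdown()


if __name__ == "__main__":
    # Print the command rather than running it, so the script has no
    # side effects; paste the output into a terminal to start the node.
    print(shlex.join(scylla_docker_command()))
```

Once the container reports that it is up, calling `read_release_version()` is a quick way to confirm the node accepts CQL connections before moving on to a real schema.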
LA: Makes sense. Now, you mentioned the Docker container. You have an open source version, but you also have a cloud-hosted version and you have an enterprise offering as well. Do you want to talk about the difference between those at all?
DL: Open source is the base: you get the fully functional database that can do a lot. You will get community help with it, and it’s an easy starting point. ScyllaDB Enterprise is based on the open source product. It has longer release cycles, we harden it more, and it has additional features. Customers who purchase ScyllaDB Enterprise manage the database (provisioning, responding to system failures, etc.) on their own, with our support. ScyllaDB Cloud is a fully managed SaaS product. Here, we are responsible for everything (like with DynamoDB). It’s easy to use – no installation, just a service. That’s of course the easiest. About 80% of our new customers are interested in the fully managed service because it’s just easier. They get to focus on their app, and not worry about maintaining a distributed database.