See all blog posts

ScyllaDB in 2023: Incremental Changes are Just the Tip of the Iceberg

This article was written by tech journalist George Anadiotis.

Is incremental change a bad thing? The answer, as with most things in life, is “it depends.” In the world of technology specifically, the balance between innovation and tried-and-true concepts and solutions seems to have tipped in favor of the former. Or at least, that’s the impression the headlines give. Good thing there’s more to life than headlines.

Innovation does not happen overnight, and is not applied overnight either. In most creative endeavors, teams work relentlessly for long periods until they are ready to share their achievements with the world. Then they go back to their garage and keep working until the next milestone is achieved. If we were to peek in the garage intermittently, we’d probably call what we’d see most of the time “incremental change.”

The ScyllaDB team works with their garage doors up and are not necessarily after making headlines. They believe that incremental change is nothing to shun if it leads to steady progress. Compared to the release of ScyllaDB 5.0 at ScyllaDB Summit 2022, “incremental change” could be the theme of ScyllaDB Summit 2023 in February. But this is just the tip of the iceberg, as there’s more than meets the eye here.

I caught up with ScyllaDB CEO and co-founder Dor Laor to discuss what kept the team busy in 2022, how people are using ScyllaDB (the fastest NoSQL Database), as well as trends and tradeoffs in the world of high- performance compute and storage.

Note: In addition to reading the article, you can hear the complete conversation in this podcast:

Data is Going to the Cloud in Real Time, and so is ScyllaDB

The ScyllaDB team have their ears tuned to what their clients are doing with their data. What I noted in 2022 was that data is going to the cloud in real time, and so is ScyllaDB 5.0. Following up, I wondered whether those trends have kept pace with the way they manifested previously.

The answer, Laor confirmed, is simple: absolutely yes. ScyllaDB Cloud, the company’s database-as-a-service, has been growing over 100% year over year in 2022. In just three years since its introduction in 2019, ScyllaDB Cloud is now the major source of revenue for ScyllaDB, exceeding 50%.

“Everybody needs to have the core database, but the service is the easiest and safest way to consume it. This theme is very strong not just with ScyllaDB, but also across the board with other databases and sometimes beyond databases with other types of infrastructure. It makes lots of sense”, Laor noted.

Similarly, ScyllaDB’s support for real-time updates via its change data capture (CDC)feature is seeing lots of adoption. All CDC events go to a table that can be read like a regular table. Laor noted that this makes CDC easy to use, also in conjunction with the Kafka connector. Furthermore, CDC opens the door to another possibility: using ScyllaDB not just as a database, but also as an alternative to Kafka.

“It’s not that ScyllaDB is a replacement for Kafka. But if you have a database plus Kafka stack, there are cases that instead of queuing stuff in Kafka and pushing them to the database, you can just also do the queuing within the database itself” Laor said.

This is not because Kafka is bad per se. The motivation here is to reduce the number of moving parts in the infrastructure. Palo Alto Networks did this and others are following suit too. Numberly is another example in which ScyllaDB was used to replace both Kafka and Redis.

High Performance and Seamless Migration

Numberly was one of the many use cases presented in ScyllaDB Summit 2023. Others included the likes of Discord, Epic Games, Optimizely, ShareChat and Strava. Browsing through those, two key themes emerge: high performance and migration.

Migration is a typical adoption path for ScyllaDB as Laor shared. Many users come to ScyllaDB from other databases in search of scalability. As ScyllaDB sees a lot of migrations, it offers support for two compatible APIs, one for Cassandra and one for DynamoDB. There are also several migration tools, such as Spark Migrator, scanning the source database and writing to the target database. CDC may also help there.

While each migration has its own intricacies, when organizations like Discord or ShareChat migrate to ScyllaDB, it’s all about scale. Discord migrated trillions of messages. ShareChat migrated dozens of services. Things can get complicated and users will make their own choices. Some users rewrite their stack without keeping API compatibility, or even rewrite parts of their codebase in another programming language like Go or Rust.

Either way, ScyllaDB is used to dealing with this, Laor said. Another thing that ScyllaDB is used to dealing with is delivering high performance. After all, this was the premise it was built on. Epic Games, Optimizely and Strava all presented high- performance use cases. Laor pointed out that as an avid gamer and mountain bike rider, having ScyllaDB being part of the Epic Games stack and powering Strava was gratifying.

Epic Games are the creators of the Unreal game engine. Its evolution reflects the way that modern software has evolved. Back in the day, using the Unreal game engine was as simple as downloading a single binary file. Nowadays, everything is distributed. Epic Games works with game makers by providing a reference architecture and a recommended stack, and makers choose how to consume it. ScyllaDB is used as a distributed cache in this stack, providing fast access over objects stored in AWS S3.

A Sea of Storage, Raft and Serverless

ScyllaDB increasingly is being used as a cache. For Numberly, it happened because ScyllaDB does not need a cache, so that made Redis obsolete. For Epic Games, the need was to add a fast-serving layer on top of S3.

S3 works great, is elastic and economic, but if your application has stringent latency requirements, then you need a cache, Laor pointed out. This is something a lot of people in the industry are aware of, including ScyllaDB engineering. As Laor shared, there is an ongoing R&D effort in ScyllaDB to use S3 for storage too. As he put it:

“S3 is a sea of extremely cheap storage, but it’s also slow. If you can marry the two, S3 and fast storage, then you manage to break the relationship between compute and storage. That gives you lots of benefits, from extreme flexibility to lower TCO.”

This is a key project that ScyllaDB’s R&D is working on these days, but not the only one. Yaniv Kaul, who just joined ScyllaDB as vice president of R&D coming from Red Hat, where he was senior director of engineering, has a lot to keep him busy. The team is growing, and recently ScyllaDB held its R&D Summit bringing everyone together to discuss what’s next.

ScyllaDB comes in two flavors, open source and enterprise. There are not that many differences between the two, primarily security features and a couple of performance and TCO (total cost of ownership)-based features. However, the enterprise version, based on the DBaaS offering, also comes with 2 ½ years of support, and is the one that the DBaaS offering is based on. The current open source version is 5.1 while the current enterprise version is 2022.2.

In the upcoming open source version 5.2, ScyllaDB will have a consistent transactional schema operation based on Raft. In the next release, 5.3, transactional topology changes will also be supported. Metadata strong consistency is essential for sophisticated users who programmatically scale the cluster and Data Definition Language.

In addition, these changes will enable ScyllaDB to pass the Jepsen test with many topology and schema changes. More importantly, this paves the way toward changes in the way ScyllaDB shards data, making it more dynamic and leading to better load balancing.

Many of these efforts come together in the push toward serverless. There is a big dedicated team in ScyllaDB working on Kubernetes, which is already used in the DBaaS offering. This work will also be leveraged in the serverless offering. This is a major direction ScyllaDB is headed toward and a cross-team project. A free trial based on serverless was made available at the ScyllaDB Summit and will become generally available later this year.

P99, Rust and Beyond

Laor does not live in a ScyllaDB-only world. Being someone with a hypervisor and Linux Red Hat background, he appreciates the nuances of P99. P99 latency is the 99th latency percentile. This means 99% of requests will be faster than the given latency number, or that only 1% of the requests will be slower than your P99 latency.

P99 CONF is also the name of an event on all things performance, organized by the ScyllaDB team but certainly not limited to them. P99 CONF is for developers who obsess over P99 percentiles and high-performance, low-latency applications. It’s where developers can get down in the weeds, share their knowledge and focus on performance in real-time.

P99 CONF is not necessarily a ScyllaDB event or a database event. Laor said that participants are encouraged to present databases, including competitors, as well as talk about operating systems development, programming languages, special libraries and more. It’s not even just about performance. It’s also about performance predictability, availability and tradeoffs. In his talk, Laor emphasized that it’s expensive to have a near-perfect system, and not everybody needs a near-perfect system all the time.

Among many distinguished P99 CONF speakers, one who stood out was Bryan Cantrill. Cantrill has stints at Sun Microsystems, Oracle and Joyent and a loyal following among developers and beyond. In his P99 2021 talk, Cantrill shared his experience and thinking on “Rust, Wright’s Law and The Future of Low-Latency Systems.” In it, Cantrill praised Rust as being everything he has come to expect from C and C++ and then some.

ScyllaDB’s original premise was to be a faster implementation of the Cassandra API, written in C++ as opposed to Java. The Seastar library that was developed as part of this effort has gotten a life of its own and is attracting users and contributors far and wide (such as Redpanda and Ceph). Dor weighed in on the “what is the best programming language today” conversation, always a favorite among developers.

Although ScyllaDB has invested heavily in its C++ codebase and perfected it over time, Laor also gives Rust a lot of credit. So much, in fact, that he said it’s quite likely that if they were to start the implementation effort today, they would have done it using Rust. Not so much for performance reasons, but more for the ease of use. In addition, many ScyllaDB users like Discord and Numberly are moving to Rust.

Even though a codebase that has stood the test of time is not something any wise developer would want to get rid of, ScyllaDB is embracing Rust too. Rust is the language of choice for the new generation of ScyllaDB’s drivers. As Laor explained, going forward the core of ScyllaDB’s drivers will be written in Rust. From that, other language-specific versions will be derived. ScyllaDB is also embracing Go and Wasm for specific parts of its codebase.

To come full circle, there’s a lot of incremental change going on. Perhaps if the garage door wasn’t up, and we only got to look at the finished product in carefully orchestrated demos, those changes would stack up more impressions. Apparently, that’s not what matters more for Laor and ScyllaDB.

george-anadiotis-scylladb-blog

About George Anadiotis

George's got tech, data, and media, and he's not afraid to use them. Coming from an IT background, he's had the chance to learn to play many instruments on the way to becoming a one-man band and an orchestrator: being a Gigaom analyst, serving Fortune 500, startups and NGOs as a consultant, building and managing projects, products and teams of all sizes and shapes, and getting involved in award-winning research among others. George runs Linked Data Orchestration: http://linkeddataorchestration.com

Virtual Workshop

Build Low-Latency Applications in Rust on ScyllaDB