InfoWorld editors conducted independent research into various information technology domains and recognized 12 leading technologies for their 2020 Technology of the Year awards. ScyllaDB is honored to have our Enterprise database product recognized in the same lineup of amazing technologies as Tableau, Atlassian’s Jira, Datadog, Hashicorp’s Terraform, PyTorch, Databricks, Snowflake, Kinetica, Confluent, UiPath and Anaconda.
Scylla was created with the goal of becoming a key tool in the modern infrastructure stack. This award is a result of the hard work we have put into our database over the years. Patch by patch, idea by idea, we took many steps towards gaining more users and greater customer confidence and satisfaction. A team of very experienced experts works day and night to make Scylla better.
In 2020, after five and a half years of work, we reached feature parity with Apache Cassandra, and we have now even started to challenge the industry’s leading player, Amazon’s DynamoDB, with our new compatible API. At times it feels like our brains might explode from so many initiatives, which combined would require 100x more hands to implement. Saying ‘no’ or ‘not now’ to so many good ideas is one of the most frustrating jobs at ScyllaDB. We try to keep our focus and select the ones that are most requested by our customers and that make the most difference.
Scylla is implemented natively in C++ (we started with C++11, moved to C++14, then C++17, and soon enough we expect to adopt C++20 with its coroutines). Yet Scylla’s success is due to much more than its programming language. Every chunk of code in our data path is assigned to a priority class and is measured and monitored for CPU and I/O usage. The schedulers give us full control of every operation, so we can prioritize real-time transactions over administrative tasks. Log-structured merge (LSM) tree compaction is a solved problem with Scylla.
Just this week a new user of Scylla Cloud opened a support call because they noticed 100% CPU usage. Normally that kind of utilization is a warning sign that a system is thrashing. But we showed that CPU usage had jumped from 30% to nearly 100% because we automatically scheduled a repair. The system was actually using all of the CPU the user had available. It wasn’t thrashing. It was working as designed. Unlike many other databases, for Scylla 100% utilization is considered good. Plus, we don’t suck up those CPU cycles mindlessly. Scylla chooses to serve transactions first (important for a high availability database) and, during opportune moments when there are spare CPU cycles, schedules repair execution to fill up the rest of the idle time. That’s how we both keep latency minimal and speed up repair, streaming and compaction.
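The behavior in that support call can be illustrated with a toy model. The sketch below is not Scylla's scheduler: it assumes a strict "foreground first" rule and a hypothetical per-tick CPU budget, and the names `run_tick` and `utilization` are invented for this illustration. It only shows why utilization can go from 30% to 100% without hurting foreground requests.

```python
from collections import deque


def run_tick(capacity, foreground, background):
    """Spend one tick of CPU budget: foreground requests first, background after.

    capacity   -- hypothetical number of work units the CPU can do in this tick
    foreground -- deque of pending latency-sensitive requests (1 unit each)
    background -- deque of pending maintenance tasks such as repair (1 unit each)
    Returns (foreground_done, background_done).
    """
    fg_done = 0
    while foreground and fg_done < capacity:
        foreground.popleft()
        fg_done += 1

    # Only the budget left over by foreground work is given to background work.
    bg_done = 0
    while background and fg_done + bg_done < capacity:
        background.popleft()
        bg_done += 1
    return fg_done, bg_done


def utilization(capacity, fg_done, bg_done):
    """Fraction of the tick's budget that was actually used."""
    return (fg_done + bg_done) / capacity


if __name__ == "__main__":
    # 3 requests arrive in a tick that could handle 10: 30% busy without repair.
    print(utilization(10, *run_tick(10, deque(range(3)), deque())))
    # Same load with a long repair queue: idle budget is filled, requests still go first.
    print(utilization(10, *run_tick(10, deque(range(3)), deque(range(100)))))
```

In this model the foreground queue is always drained before any background task runs, so the extra utilization comes purely from otherwise idle budget.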
The culture at Scylla is to squeeze every cycle of CPU power and every bit in your NVMe, but it is not all just about performance. We try to improve capability and usability as well. Here are some examples of our efforts:
- Really durable and fast Lightweight Transactions (LWT)
Scylla implements the same Paxos protocol as Cassandra and other databases that provide strong consistency. In our development process we discovered a unique method to remove one round trip, which dramatically improves performance. Soon, another round trip will be gone. However, an even more important improvement was implementing an automatic mode that only flushes the commit log if there are pending transactions. That makes our Paxos implementation friendly, easy, fast and durable.
- Global and local secondary indexes
While some of our competitors have only figured out how to implement one or the other, we offer our users both local and global secondary indexes, which gives them the flexibility to implement the data model and queries that best fit their use case.
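The difference between the two index types is easier to see in a small in-memory model. The sketch below is an illustration only, not how Scylla stores indexes: the class `IndexedTable` and its methods are hypothetical. A global index is keyed by the indexed value alone, so it can answer "which rows anywhere have this value"; a local index is keyed by partition key plus value, so it answers the same question within a single partition.

```python
from collections import defaultdict


class IndexedTable:
    """Toy table with one indexed column, kept in both a global and a local index."""

    def __init__(self):
        self.rows = {}                        # (pk, ck) -> indexed column value
        self.global_index = defaultdict(set)  # value -> {(pk, ck)}
        self.local_index = defaultdict(set)   # (pk, value) -> {ck}

    def insert(self, pk, ck, value):
        # On overwrite, remove the stale index entries before adding new ones.
        if (pk, ck) in self.rows:
            old = self.rows[(pk, ck)]
            self.global_index[old].discard((pk, ck))
            self.local_index[(pk, old)].discard(ck)
        self.rows[(pk, ck)] = value
        self.global_index[value].add((pk, ck))
        self.local_index[(pk, value)].add(ck)

    def find_anywhere(self, value):
        """Global index lookup: matching rows across all partitions."""
        return sorted(self.global_index.get(value, ()))

    def find_in_partition(self, pk, value):
        """Local index lookup: matching rows inside one known partition."""
        return sorted(self.local_index.get((pk, value), ()))


if __name__ == "__main__":
    t = IndexedTable()
    t.insert("alice", 1, "red")
    t.insert("alice", 2, "blue")
    t.insert("bob", 1, "red")
    print(t.find_anywhere("red"))
    print(t.find_in_partition("alice", "red"))
```

The trade-off this hints at: a query that already knows the partition key can use the local index and stay within that partition, while a query on the indexed value alone needs the global one.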
- Database streaming – Change Data Capture (CDC)
We implemented CDC in an elegant manner. All changes to a base table are sent to a CDC table with a Time to Live (TTL). You use the same CQL query on the CDC table as on the base table. No special interface or API is required to monitor the changes. The user can choose to receive just the updates, or the updates together with the former values. The data is unified from all replicas. Since we use a regular table (which we create automatically), one can apply all types of CQL manipulations to it, so the client can filter the changes for special events instead of consuming all of them. By the way, this feature is on the DynamoDB wishlist and we already have it! 😉
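As a rough mental model of the design described above, here is a toy in-memory version. It is a sketch under stated assumptions, not Scylla code: the class `TableWithCdc`, its field names, and the injectable `clock` are all hypothetical. It shows the three ideas from the text: every write lands in a log with a TTL, the log can optionally carry the former value, and the consumer filters the log with an ordinary predicate rather than a special API.

```python
import time


class TableWithCdc:
    """Toy base table whose writes are mirrored into an expiring change log."""

    def __init__(self, ttl_seconds, with_preimage=False, clock=time.time):
        self.base = {}                # key -> current value
        self.log = []                 # list of change entries (dicts)
        self.ttl = ttl_seconds        # how long a change entry stays readable
        self.with_preimage = with_preimage
        self.clock = clock            # injectable so the example is testable

    def write(self, key, value):
        entry = {"time": self.clock(), "key": key, "new": value}
        if self.with_preimage:
            entry["old"] = self.base.get(key)  # former value, None if absent
        self.base[key] = value
        self.log.append(entry)

    def changes(self, predicate=lambda entry: True):
        """Return unexpired change entries matching the predicate."""
        cutoff = self.clock() - self.ttl
        self.log = [e for e in self.log if e["time"] > cutoff]  # apply TTL
        return [e for e in self.log if predicate(e)]


if __name__ == "__main__":
    table = TableWithCdc(ttl_seconds=60, with_preimage=True)
    table.write("sensor-1", 10)
    table.write("sensor-1", 99)
    # Consume only the "special events": readings above 50.
    print(table.changes(lambda e: e["new"] > 50))
```

Because the log is queried like any other data, filtering happens where the data lives, and entries older than the TTL simply stop appearing.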
- Redis API
Believe it or not, this feature has already been merged into the Scylla code base. While it’s far from being ready, now’s a good time to give us your feedback! I mention this here because the Redis API is also on that same DynamoDB wishlist.
- HUGE limits
Another one from the DynamoDB wishlist… They are asking for 1 MB maximum items. But users already keep 10 MB or larger objects in Scylla easily, and some even use Scylla as a backend behind an S3-like implementation. So far the Scylla user world record for the largest partition (wide row) is 153 GB! (Note that we do not recommend such large rows.) We also support HUGE nodes with up to 60 TB per node.
- No manual tuning, at all
During install time the scylla_setup script automatically configures the best parameters for the kernel, optionally sets up a RAID array with optimized stripes, and sets NTP and many more parameters. We even run a benchmark to find your disk’s sweet spot so Scylla will queue everything in userspace and impose minimal latency. (If you had to tune Scylla for performance, please open a GitHub issue. Seriously!)
io_uring is a new Linux kernel interface for async I/O completion which allows us to do much more. Although Scylla is already optimized for asynchronous operations, I am excited to say that we actively developed this new API, which should be available in the next few days and will translate to better performance and lower latency.
- 37% reduction in storage space
Incremental compaction, our new algorithm, saves more than a third of your storage! Moreover, it makes operations such as major compaction faster, smoother and cheaper.
- Workload prioritization
This is one of our flagship innovations, allowing disparate workloads to run on the same cluster without any one of them hogging all of the system resources and bottlenecking the others. You can read about it here.
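One way to picture workloads sharing a cluster without starving each other is a proportional-share split. The sketch below assumes a shares-based model; the shares, workload names, and the function `allocate_cpu` are hypothetical and chosen for illustration, not taken from Scylla's implementation. Each workload is guaranteed CPU in proportion to its shares under contention, and whatever a lightly loaded workload does not need is handed to the others.

```python
def allocate_cpu(shares, demand):
    """Split one unit of CPU among workloads.

    shares -- dict: workload name -> relative weight (hypothetical values)
    demand -- dict: workload name -> fraction of CPU the workload could use
    Returns dict: workload name -> fraction of CPU granted.
    """
    alloc = {name: 0.0 for name in shares}
    remaining = 1.0
    active = {name for name in shares if demand.get(name, 0.0) > 0}

    while active and remaining > 1e-12:
        total = sum(shares[name] for name in active)
        budget = remaining
        satisfied = set()
        for name in active:
            offer = budget * shares[name] / total
            need = demand[name] - alloc[name]
            if need <= offer:
                # Workload needs less than its fair offer: grant it in full.
                alloc[name] += need
                remaining -= need
                satisfied.add(name)
        if not satisfied:
            # Everyone wants more than offered: split strictly by shares.
            for name in active:
                alloc[name] += budget * shares[name] / total
            remaining = 0.0
            break
        # Redistribute what the satisfied workloads left on the table.
        active -= satisfied
    return alloc


if __name__ == "__main__":
    shares = {"oltp": 800, "analytics": 200}
    print(allocate_cpu(shares, {"oltp": 1.0, "analytics": 1.0}))
    print(allocate_cpu(shares, {"oltp": 0.3, "analytics": 1.0}))
```

Under full contention the split follows the shares; when the high-priority workload is quiet, the low-priority one picks up the slack instead of leaving the CPU idle.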
- Driver improvements
Our shard-per-core, topology-aware driver is instrumental for better load balancing and lower latency. The bypass-cache option can come into play as well and, last but not least, our GoCQLX simplifies Go code a lot.
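The idea behind a shard-aware driver can be sketched as follows: the client works out which core (shard) on the node owns a partition and sends the request over a connection bound to that shard, so the node does not need to hand the request between cores. The mapping below is an assumption of this sketch, modeled on the token-to-shard formula as described in Scylla's protocol documentation; the function names and the `connections` dictionary are hypothetical, and real drivers should be consulted for the exact details.

```python
MASK_64 = 2**64 - 1


def shard_for_token(token, shard_count, ignore_msb=0):
    """Map a signed 64-bit partition token to a shard index (assumed formula)."""
    biased = (token + 2**63) & MASK_64           # shift signed range to unsigned
    biased = (biased << ignore_msb) & MASK_64    # drop the most significant bits
    return (biased * shard_count) >> 64          # scale into [0, shard_count)


def pick_connection(connections, token, shard_count, ignore_msb=0):
    """Prefer the connection bound to the owning shard, else fall back to any.

    connections -- dict: shard index -> connection object (hypothetical)
    """
    shard = shard_for_token(token, shard_count, ignore_msb)
    if shard in connections:
        return connections[shard]
    # No connection to the owning shard yet: any connection still works,
    # the request is just forwarded between cores on the server side.
    return next(iter(connections.values()))


if __name__ == "__main__":
    conns = {shard: f"conn-to-shard-{shard}" for shard in range(8)}
    for token in (-2**63, 0, 2**63 - 1):
        print(token, pick_connection(conns, token, shard_count=8))
```

The fallback branch reflects that shard awareness is an optimization: a request sent to the "wrong" shard is still served, only with an extra hop inside the node.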
- Integrations with many tools
As I said above, Scylla often collaborates with many of the other winners of this year’s Technology of the Year awards. You can combine the power of Scylla with that of Confluent’s Kafka to stream data, migrate and clone clusters with our Spark migrator, and monitor your Scylla database with Datadog. Beyond these, you can create a highly scalable graph database with JanusGraph, use KairosDB for time-series aggregation, use Presto for SQL queries, or use Kong for API management of your microservices.
The heavy lifting of matching Cassandra’s capabilities is behind us, but Scylla is still far from perfect and there is much more to do. Apart from many small improvements, we have a new, long-term initiative that we will reveal in the not-so-distant future.
But the most important thing that makes Scylla successful is our users. It is your collective desires, visions and ideas that provide us the clarity and priority of the next set of technical challenges to target. We welcome you to continue our journey together towards mechanical and digital simplicity.