Update: Since this post was published, we’ve conducted benchmarks on Cassandra 4.0 vs. Cassandra 3.11 vs. ScyllaDB Open Source 4.4. See the results in our Cassandra benchmark blog.
Amazon definitely piqued our interest by announcing their new Managed Cassandra Service (MCS) yesterday. As a database vendor, we’ve been asked by many for our view, and even whether AWS is running ScyllaDB, a real-time big data database, under the hood, since they promise single-digit millisecond latency. Here is our quick analysis of managed Cassandra on AWS while it is still being hotly debated on Hacker News.
What Is Managed Cassandra on AWS, Exactly?
As with many new services, it’s hard to figure out exactly what is being offered and how it makes running Cassandra on AWS better. An open source blog post was published, but it’s hard to sort out the architecture (and the limitations or advantages) of their implementation. This is about the fourth re-implementation of the Cassandra API (after ScyllaDB, Yugabyte, and CosmosDB), and certainly not the first cloud-hosted, managed Cassandra service (ScyllaDB Cloud, DataStax, Instaclustr, Bitnami, Aiven, and so on), so why keep it a secret? AWS may be doing themselves a disservice by being less transparent.
To AWS’s credit, they continue to surprise the industry with their rigor and their relentless execution. No service is immune from AWS. It doesn’t matter if it’s open or closed source, hardware or software: any worthy technology will eventually be offered as a service on AWS, even if AWS already has a robust alternative offering. Usually AWS prefers to go horizontal, broadening their portfolio as in the Cassandra/MongoDB case, instead of going vertical and focusing on a narrower set of better offerings. And they are always looking to learn and adapt, as with the switch from Xen to KVM+Nitro.
That said, let’s take a look at what this Amazon managed Apache Cassandra is all about. MCS is a form of chimera: the front end of Apache Cassandra (including the CQL API) running on top of a DynamoDB back end (both for the storage engine and the data replication mechanisms). This was confirmed on Twitter.
If you reverse engineer their AWS managed Cassandra pricing, you’ll discover it closely resembles the DynamoDB prices with an extra 16%, probably for the added translation layer. It’s compatible with a mix of Apache Cassandra 2.0 and 3.11 CQL features with many limitations we’ll discuss soon, but let’s be positive and start with the upside:
- It’s ‘serverless’. For anyone used to spinning up nodes and clusters before being able to use them, this is quite refreshing. When you log in to the MCS console, you can create a keyspace and tables without spinning up anything. Very convenient!
- It’s especially convenient for a complex service like Cassandra with all of its tuning challenges and administration overhead.
- It’s integrated with IAM, AWS’s authentication service. That’s very nice and secure too.
- The compute is detached from the storage. That’s a big upside for many Cassandra workloads.
- Fast Lightweight Transactions (LWT): this is actually great. DynamoDB is based on a leader-follower model, so simple transactions do not pay a penalty. Nice!
Looking deeper, the functional differences are significant. There is no multi-region support, no UDT, no ALTER TABLE, no counters — all pretty fundamental for Cassandra users.
The Cassandra JVM is used under the hood, and we are left to speculate whether and how garbage collection (GC) may strike. GC mitigation and Cassandra tuning can be problematic in an “as a service” offering, since the user has no control over the tuning.
There are no materialized views and no ability to load SSTables directly (a capability open source users have gotten used to). It’s unclear whether the DynamoDB storage limit of 1 megabyte per item exists here. ScyllaDB, for example, can store objects tens of megabytes in size.
Yes, there’s no doubt this is expensive. At $1.45 per million writes, a workload averaging 100k IOPS will cost 1.45×0.1×24×3600×365 = $4,572,720/year! To compare, ScyllaDB runs 200k IOPS with three i3.2xlarge nodes. At $0.312 hourly per node, that’s 0.312×3×24×365 ≈ $8,200 annually plus licensing.
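The arithmetic above is easy to reproduce. Here is a minimal Python sketch of it, using only the prices quoted in this post; the function and variable names are ours, for illustration only, and it assumes a sustained, write-only request rate for the whole year and leaves out licensing, storage, and data transfer.

```python
SECONDS_PER_YEAR = 24 * 3600 * 365
HOURS_PER_YEAR = 24 * 365


def annual_request_cost(price_per_million, ops_per_second):
    """Yearly cost of a sustained request rate billed per million requests."""
    millions_per_second = ops_per_second / 1_000_000
    return price_per_million * millions_per_second * SECONDS_PER_YEAR


def annual_instance_cost(hourly_price, node_count):
    """Yearly cost of running a fixed number of nodes billed by the hour."""
    return hourly_price * node_count * HOURS_PER_YEAR


# Prices as quoted in this post: $1.45 per million writes, $0.312 per node-hour.
mcs_cost = annual_request_cost(price_per_million=1.45, ops_per_second=100_000)
scylla_cost = annual_instance_cost(hourly_price=0.312, node_count=3)

print(f"MCS, 100k writes/s:     ${mcs_cost:,.0f} per year")
print(f"Three i3.2xlarge nodes: ${scylla_cost:,.0f} per year (before licensing)")
print(f"Ratio: roughly {mcs_cost / scylla_cost:,.0f}x")
```

Per-request pricing scales linearly with throughput, while the instance-based figure stays flat up to the capacity of the nodes, which is why the gap widens as the workload grows.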
Lastly, there is the matter of flexible replication and consistency levels. Managed Cassandra Service allows only the consistency levels ONE and LOCAL_QUORUM, and gives no control over the replication factor (we assume it’s 3, based on the DynamoDB architecture). ScyllaDB and Cassandra allow any replication factor and many more consistency levels.
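To make the trade-off concrete, here is a small Python sketch of how many replicas must acknowledge a request at a given consistency level. It is an illustration of the standard quorum rule (a majority of replicas), not code from any driver; the function name is hypothetical, only a handful of levels are covered, and the replication factor of 3 is our assumption from above.

```python
def replicas_required(consistency_level, replication_factor):
    """Replicas that must acknowledge a request at the given consistency level.

    For LOCAL_QUORUM, replication_factor means the replication factor of the
    local datacenter. Only a few common levels are covered in this sketch.
    """
    level = consistency_level.upper()
    if level == "ONE":
        return 1
    if level in ("QUORUM", "LOCAL_QUORUM"):
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


ASSUMED_RF = 3  # assumption: not documented, inferred from the DynamoDB architecture

for level in ("ONE", "LOCAL_QUORUM"):
    print(level, "needs", replicas_required(level, ASSUMED_RF), "of", ASSUMED_RF, "replicas")
```

With a fixed replication factor and only these two levels, the usual knobs for trading latency against durability and read consistency (a higher replication factor, or levels such as ALL) are simply not available.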
Who Is AWS Managed Cassandra Aimed at?
As the Managed Cassandra Service is more expensive but less powerful than DynamoDB, Amazon must be aiming its new service at pure Cassandra migrators. However, the current set of limitations is quite a hurdle for those who wish to move to the convenience of a serverless, distributed Cassandra API. The service is still in its infancy and we’re sure it will get better, yet some basic limitations are inherent in their implementation model.
What’s ScyllaDB’s Final Take on Managed Cassandra on AWS?
AWS continues to march on with impressive execution speed. Some products are very impressive, such as the new ARM chip, everything around S3, and Aurora. However, we find MCS to be a hybrid, a ‘chimera’: half-Cassandra, half-Dynamo, semi-compatible, serverless but with a potential for GC, mostly proprietary with a dash of open source.
Yes, AWS promises to contribute back to Cassandra. The question is whether those contributions will be more than self-serving ones such as the AWS IAM authentication, or bigger contributions beyond pluggable backends. Time will tell, but with DynamoDB as the underlying database there is little incentive to make real progress. To Amazon’s credit, they already made a single major contribution in the form of the Dynamo paper, 12 years ago.
We at ScyllaDB can certainly learn from the AWS Managed Cassandra Service implementation. Its integration is solid, the autoscale capability is great, and the GUI is well done. Competition is vital for our brains, for the whole industry, and for you, our users, developers and community members.
In that spirit of competition, if you’ve read this far, we invite you to have a look at our ScyllaDB Cloud (NoSQL DBaaS). It supports both CQL and our new DynamoDB-compatible API (Project Alternator), and offers 5x-20x price/performance gains over other NoSQL databases, plus killer features like workload prioritization, which guarantees a per-workload SLA.