Scylla is a drop-in replacement for Apache Cassandra that is implemented in C++ rather than Java. Scylla Cloud is the industry’s most performant NoSQL database as a service. Since Cassandra derived key concepts from the original Dynamo paper, Scylla can be thought of as the ‘granddaughter’ of Dynamo. As such, Scylla Cloud and Amazon’s DynamoDB share a common class of NoSQL databases: column family with wide rows and tunable consistency. Amazon DynamoDB, along with its Google counterpart, Bigtable, were the first movers in this market and opened up the field of globally distributed, high availability NoSQL databases.
Many IT organizations evaluating Cloud NoSQL databases will evaluate both Scylla Cloud and DynamoDB. For that reason, it is worth taking a close look at Scylla Cloud for a variety of workloads and see how the performance compares to Amazon DynamoDB, and why Scylla Cloud should be the obvious choice for NoSQL applications that require extremely fast storage at a low price.
For this benchmark comparison, we ran the well known standardized benchmark, Yahoo! Cloud Serving Benchmark (YCSB), which provides an industry-standard methodology for simulating a variety of workloads.
We conducted the following tests:
The following benchmark tests show that Scylla Cloud demonstrates superior latencies compared to Amazon DynamoDB with 20x better throughput in the hot-partition test and better latency numbers. DynamoDB failed to meet the required SLA during workload-A (zipfian distribution) due to its hot partition access limitation (read more here). It also failed to achieve the required 120K IOPS (1.1Kb payload) SLA during population phase; it requires provisioning ~320K capacity for writes.
As tested, Scylla Cloud is one-seventh the expense of Amazon DynamoDB when running equivalent workloads. Scylla software can be run in a variety of public cloud providers as well as within your own datacenter. Scylla provides freedom of choice with no cloud vendor lock-in.
Figure 1 – Comparison of 1 year costs of Scylla Cloud compared to Amazon DynamoDB for required SLA.
Tests were conducted using Amazon DynamoDB and Amazon Web Services EC2 instances as loaders. Scylla Cloud also used Amazon Web Services EC2 instances for servers, monitoring tools and loaders. For the benchmark tool we used Yahoo! Cloud Serving Benchmark (YCSB) since it is cross platform as well as an industry standard.
The databases were populated with a 1TB dataset made of 1B partitions (for Scylla Cloud we used RF=3), after which the YCSB workload-A executed twice; first with uniform distribution and second with zipfian distribution (read more here). We then deleted the dataset on both databases and re-populated them with a ~110MB dataset made of 100K partitions (for Scylla Cloud we used RF=3), after which we executed YCSB workload-A two times using Uniform distribution. First using a single client working on a single (hot) partition, and then a second test using 8 clients; each client working on a single (hot) partition, so 8 single (hot) partitions.
We tested a single (hot) partition because, although everyone would always be happy to have their data evenly distributed, hot partition/s situations emerge due to forces beyond our control (e.g. the dress that broke the internet), or due to poorly-modelled data.
For both the population and workload-A tests, we targeted a sustained throughput SLA of ~120K operations per second. Sustaining an SLA of ~120K ops for workload-A (60K WR + 60K RD) using ~1.1Kb payload, which is equal to 2 reserved I/O operations on DynamoDB. The test followed DynamoDB’s recommendation to keep Max load (used capacity) below 75%, requiring provisioned capacity of 160K for write and 80K for reads (see more info here).
The first test was to populate one terabyte (TB) of data into each database. Populating a database with 1 TB of data is a significant but not uncommon task. For Scylla Cloud we used a replication factor of 3 and set up 1 billion partitions. AWS DynamoDB requires that a user provision the capacity of reads and writes in advance of the actual input or output. This is referred to as ‘pre-provisioning’. AWS suggests that a customer overprovision (and thus overpay) by approximately 30% more than they will use. Table 1 describes the setup for the benchmark.
Scylla Cloud Cluster
Table 1 – The setup for running YCSB in Scylla Cloud and Amazon’s DynamoDB.
Duplicating data across regions is critical for data retention in the event of a data center failure. For both Amazon’s DynamoDB and Scylla Cloud, a process was set up to duplicate the data from one datacenter that is part of AWS (US-East) and then the data was copied to another datacenter in AWS (US-West). While the process to set up replication differs with different products, the results in terms of the performance of replication cannot be overstated.
Table 2 summarizes the replication results.
Avg. Replication Latency (ms)
Max Replication Latency (ms)
Scylla Cloud Performance Performance Gain
Table 2 – Replication results in terms of milliseconds.
Duplicating your data fast will enhance the overall performance of the application, while ensuring that your data is safe. You don’t want to have to wait for Amazon DynamoDB to duplicate your data as that will slow your application performance greatly.
While the loading of the database was smooth and non-eventful for Scylla Cloud, DynamoDB generated multiple errors at random times during data population. On multiple occasions, DynamoDB returned an error that the system could not process the request. In a number of cases, it took several hours of loading data before encountering an error. To address this exception, developers of the client application had to implement custom code to handle the retry.
After the data had been loaded into both databases, workloads were run against them for 90 minutes as specified in the YCSB specification. The benchmark was set to perform 50% writes and 50% reads, representing many common database use cases. With a single partition and a single client, Scylla Cloud demonstrated an approximately 20x performance improvement over DynamoDB in the overall throughput of the system. By using only 1 client, the servers were not stressed in terms of I/O. Under these circumstances, Scylla Cloud consistently outperformed DynamoDB by a wide margin.
To reflect more realistic conditions, the next test scenario was more complex. Eight clients would connect to the database clusters simultaneously. Eight YCSB clients were run simultaneously, each working against a single partition of a 110MB dataset (100K partitions). During testing, we observed a DynamoDB limitation when a specific partition key exceeds 3000 read capacity units (RCU) and/or 1000 write capacity units (WCU). Even when using only ~4% of the provisioned capacity (~7K ops/sec), some YCSB clients experienced few ProvisionedThroughputExceededException (code: 400) errors, and throttling was imposed by DynamoDB. This indicates that DynamoDB cannot handle the workloads as advertised or within the SLAs published by AWS.
To simulate real-world conditions, we changed the request distribution in the YCSB loaders to a Zipfian distribution. With other parameters remaining unchanged; loaders are still trying to send 90,000 requests per second (with 50% reads, 50% updates).
Zipfian distribution was originally used to describe word frequency in languages. However, this distribution curve, known as Zipf’s Law, has also shown correlation to many other real-life scenarios. It can often indicate a form of “rich-get-richer” self-reinforcing algorithm, such as a bandwagon or network effect, where the distribution of results is heavily skewed and disproportionately weighted.
Over time, a certain search result becomes popular, and more people click on it. Thus, it becomes an increasingly popular search result. Examples include the number of “likes” different NBA teams have on social media, as shown in Figure 4, or the activity distribution among users of the Quora website.
When these sorts of result skew occur in a database, it can lead to incredibly unequal access to the database and result in poor performance. For database testing, this means we’ll have keys randomly accessed in a heavy-focused distribution pattern that allows us to visualize how the database in question handles hotspot scenarios.
Figure 2 – Example of a Zifian distribution of data access.
Zipfian emulates hot-partition workloads. Here we again observed the same hot partition limitation in DynamoDB. Whenever we pushed more than ~68K ops/sec (~43% used capacity), YCSB experienced multiple, recurring ProvisionedThroughputExceededException (code: 400) errors, and throttling was imposed by DynamoDB. As a result, DynamoDB was not able to fulfill the 120K ops/sec SLA and could not keep up with provisioned capacity.
Screenshot 1: Single partition. Consumed capacity: ~0.6% -> Throttling imposed by DynamoDB
Screenshot 2: 8 single partitions. Consumed capacity: ~4% -> Throttling imposed by DynamoDB
|YCSB Workload||Scylla Cloud (3 x i3.8xlarge) | 1 client||DynamoDB (160K WR | 80K RD) | 1 client|
Single partition (~1.1Kb)
Duration: 90 min.
Overall Throughput(ops/sec): 20.2K
Avg Load (scylla-server): ~5%
READ operations (Avg): ~50M
Avg. 99th Percentile Latency (ms): 9.4
Avg. 95th Percentile Latency (ms): 2.7
Avg. 99th Percentile Latency (ms): 4.5
Overall Throughput(ops/sec): 857
Max Consumed capacity: WR 0.6% | RD 0.6%
READ operations (Avg): ~2.3M
Avg. 99th Percentile Latency (ms): 10.7
Avg. 95th Percentile Latency (ms): 7.7
Avg. 99th Percentile Latency (ms): 607.8
Table 4 – Detailed performance of Scylla Cloud vs. DynamoDB
Last but not least, Scylla provides you freedom of choice with no cloud vendor lock-in (as Scylla can be run on various cloud vendors, or even on-premises).
Getting started takes only a few minutes. Scylla has an installer for every major platform. If you get stuck, we’re here to help.