Scylla Cloud versus Amazon DynamoDB

Performance Report: Scylla Cloud shows superior performance over Amazon's DynamoDB and is significantly less expensive for similar workloads.

Overview

Scylla is a drop-in replacement for Apache Cassandra that is implemented in C++ rather than Java. Scylla Cloud is the industry’s most performant NoSQL database as a service. Since Cassandra derived key concepts from the original Dynamo paper, Scylla can be thought of as the ‘granddaughter’ of Dynamo. As such, Scylla Cloud and Amazon’s DynamoDB belong to the same class of NoSQL databases: column family with wide rows and tunable consistency. Amazon DynamoDB, along with its Google counterpart, Bigtable, was a first mover in this market and opened up the field of globally distributed, high-availability NoSQL databases.

Many IT organizations evaluating cloud NoSQL databases will consider both Scylla Cloud and DynamoDB. For that reason, it is worth taking a close look at Scylla Cloud across a variety of workloads to see how its performance compares to Amazon DynamoDB, and why Scylla Cloud should be the obvious choice for NoSQL applications that require extremely fast storage at a low price.

For this benchmark comparison, we ran the Yahoo! Cloud Serving Benchmark (YCSB), a well-known, industry-standard methodology for simulating a variety of workloads.

We conducted the following tests:

  • Loading the database
  • Querying the database
  • Replicating the data to a different region

Summary of Benchmark Results

The benchmark tests show that Scylla Cloud delivers lower latencies than Amazon DynamoDB, with 20x better throughput in the hot-partition test. DynamoDB failed to meet the required SLA during workload A (Zipfian distribution) due to its hot-partition access limitation. It also failed to achieve the required SLA of 120K IOPS (1.1KB payload) during the population phase; meeting it would require provisioning ~320K capacity for writes.

As tested, Scylla Cloud costs one-seventh as much as Amazon DynamoDB when running equivalent workloads. Scylla software can be run on a variety of public cloud providers as well as within your own datacenter. Scylla provides freedom of choice with no cloud vendor lock-in.

Figure 1 – Comparison of 1 year costs of Scylla Cloud compared to Amazon DynamoDB for required SLA.

Test Methodology and Configuration

Tests were conducted using Amazon DynamoDB and Amazon Web Services EC2 instances as loaders. Scylla Cloud also used Amazon Web Services EC2 instances for servers, monitoring tools and loaders. For the benchmark tool we used Yahoo! Cloud Serving Benchmark (YCSB) since it is cross-platform as well as an industry standard.

The databases were populated with a 1TB dataset made of 1B partitions (for Scylla Cloud we used RF=3), after which YCSB workload A was executed twice: first with a uniform distribution and then with a Zipfian distribution. We then deleted the dataset on both databases and re-populated them with a ~110MB dataset made of 100K partitions (for Scylla Cloud we used RF=3), after which we executed YCSB workload A two more times using a uniform distribution. The first run used a single client working on a single (hot) partition. The second used 8 clients, each working on its own single (hot) partition, for a total of 8 hot partitions.

We tested a single (hot) partition because, although everyone would prefer to have their data evenly distributed, hot-partition situations emerge due to forces beyond our control (e.g. the dress that broke the internet) or due to poorly modelled data.

For both the population and workload-A tests, we targeted a sustained throughput SLA of ~120K operations per second. For workload A this means 60K writes plus 60K reads per second with a ~1.1KB payload, and each write of that size counts as 2 reserved I/O operations (write capacity units) on DynamoDB. The test followed DynamoDB’s recommendation to keep max load (used capacity) below 75%, which required provisioned capacity of 160K for writes and 80K for reads.
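The provisioning figures above can be reproduced with a little arithmetic. The sketch below is a back-of-the-envelope calculation, not the authors' tooling; it assumes DynamoDB's standard capacity unit sizes (1 KB per write unit, 4 KB per strongly consistent read unit) and the 75% max-load target mentioned in the text. The function and variable names are made up for illustration.

```python
import math

def provisioned_units(ops_per_sec, item_kb, unit_kb, max_load=0.75):
    """Capacity units to provision so that ops_per_sec stays under max_load."""
    # Item size is rounded up to a whole number of capacity units per operation.
    units_per_op = math.ceil(item_kb / unit_kb)
    consumed = ops_per_sec * units_per_op
    return math.ceil(consumed / max_load)

ITEM_KB = 1.1  # payload size used in the benchmark

# Workload A: 60K writes/sec and 60K strongly consistent reads/sec.
write_capacity = provisioned_units(60_000, ITEM_KB, unit_kb=1)
read_capacity = provisioned_units(60_000, ITEM_KB, unit_kb=4)

# Population phase: all 120K ops/sec are writes.
population_write_capacity = provisioned_units(120_000, ITEM_KB, unit_kb=1)

print(write_capacity, read_capacity, population_write_capacity)
```

Under these assumptions the workload-A numbers match the 160K write and 80K read capacity used in the test, and the population phase comes out at the ~320K write capacity cited in the summary.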

The first test was to populate one terabyte (TB) of data into each database. Populating a database with 1 TB of data is a significant but not uncommon task. For Scylla Cloud we used a replication factor of 3 and set up 1 billion partitions. AWS DynamoDB requires that a user provision the capacity of reads and writes in advance of the actual input or output. This is referred to as ‘pre-provisioning’. AWS suggests that a customer overprovision (and thus overpay) by approximately 30% more than they will use. Table 1 describes the setup for the benchmark.

Scylla Cloud

Scylla Cloud Cluster

  • i3.8xlarge | 32 vCPU | 244 GiB RAM | 4 x 1.9TB NVMe
  • 3-node cluster on single DC | RF=3
  • Dataset: ~1.1TB (1B partitions / size: ~1.1KB)
  • Total used storage: ~3.3TB
  • Loaders: 4 x m4.2xlarge (8 vCPU | 32 GiB RAM)

Amazon DynamoDB

Provisioned capacity

  • 160K write | 80K read (strong consistency)
  • Partition key: y_id (String)
  • Dataset: ~1.1TB (1B partitions / size: ~1.1KB)
  • Storage size: ~1.1 TB (DynamoDB table metrics)
  • Loaders: 4 x m4.2xlarge (8 vCPU | 32 GiB RAM)

Table 1 – The setup for running YCSB in Scylla Cloud and Amazon’s DynamoDB.

  • Population (1B partitions): 8 YCSB clients, 2 per loader VM, each client populating 1/8 of the full range.
  • Workload-A: 90 min, using 8 YCSB clients, every client runs on its own data range (125M partitions)
  • Scylla Cloud workloads ran with Consistency Level = QUORUM for writes and reads.
  • Scylla Cloud started with a cold cache in all workloads.
  • Scylla Cloud used a private YCSB branch with prepared statements.
  • Amazon DynamoDB workloads ran with dynamodb.consistentReads = true
  • Amazon DynamoDB consumed storage capacity retrieved from DynamoDB table metrics
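Splitting the 1B-partition key space evenly across 8 clients can be sketched as follows. The report does not say how the ranges were passed to YCSB; the sketch assumes the standard YCSB core workload properties `insertstart` and `insertcount`, and the helper name `client_ranges` is hypothetical.

```python
def client_ranges(total_records, clients):
    """Split [0, total_records) into contiguous (start, count) ranges, one per client."""
    per_client = total_records // clients
    ranges = []
    for i in range(clients):
        start = i * per_client
        # The last client absorbs any remainder so no record is skipped.
        count = per_client if i < clients - 1 else total_records - start
        ranges.append((start, count))
    return ranges

ranges = client_ranges(1_000_000_000, 8)
for client_id, (start, count) in enumerate(ranges):
    # Assumed YCSB properties; each client would get its own pair of values.
    print(f"client {client_id}: -p insertstart={start} -p insertcount={count}")
```

With 1B records and 8 clients, each range covers 125M partitions, which matches the per-client data range listed above.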

Duplicating data across regions is critical for data retention in the event of a data center failure. For both Amazon’s DynamoDB and Scylla Cloud, we set up replication from one AWS datacenter (US-East) to another AWS datacenter (US-West). While the process for setting up replication differs between the two products, the difference in replication performance is substantial.

  • Scylla Cloud: Average replication latency of 82ms, maximum replication latency measured was 272ms.
  • Amazon DynamoDB: Average replication latency was 370ms, maximum replication latency measured was 397ms.

Table 2 summarizes the replication results.

| Product | Avg. Replication Latency (ms) | Max Replication Latency (ms) |
| --- | --- | --- |
| Scylla Cloud | 82 | 272 |
| DynamoDB | 370 | 397 |
| Scylla Cloud Performance Gain | 350% faster | 45% faster |

Table 2 – Replication results in terms of milliseconds.
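The report does not state how the performance gain row was derived. One reading that lands close to the published figures is the relative difference between the DynamoDB and Scylla Cloud latencies; the sketch below works under that assumption, and the function name is illustrative.

```python
def relative_gain_percent(slower_ms, faster_ms):
    """How much larger slower_ms is than faster_ms, as a percentage of faster_ms."""
    return (slower_ms - faster_ms) / faster_ms * 100

avg_gain = relative_gain_percent(370, 82)    # average replication latency
max_gain = relative_gain_percent(397, 272)   # maximum replication latency

print(f"average: {avg_gain:.0f}%, maximum: {max_gain:.0f}%")
```

Both results come out within about a percentage point of the values in Table 2.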

Duplicating your data fast will enhance the overall performance of the application, while ensuring that your data is safe. You don’t want to have to wait for Amazon DynamoDB to duplicate your data as that will slow your application performance greatly.

Test Results

While loading the database was smooth and uneventful for Scylla Cloud, DynamoDB generated multiple errors at random times during data population. On multiple occasions, DynamoDB returned an error stating that the system could not process the request. In a number of cases, the error appeared only after several hours of loading data. To handle this exception, developers of the client application had to implement custom retry code.
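The custom retry code used in the benchmark is not shown in the report. A common approach to transient errors of this kind is retrying with exponential backoff and jitter; the following is a minimal sketch of that technique, where `ThrottledError` and `with_retries` are hypothetical names standing in for whatever exception and wrapper the real client used.

```python
import random
import time

class ThrottledError(Exception):
    """Hypothetical stand-in for a retryable 'could not process the request' error."""

def with_retries(operation, max_attempts=5, base_delay=0.05, max_delay=2.0,
                 sleep=time.sleep):
    """Call operation(), retrying on ThrottledError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Double the delay cap on each attempt, then pick a random point
            # below it ("full jitter") so clients do not retry in lockstep.
            delay_cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay_cap))
```

A loader would wrap each write, for example `with_retries(lambda: write_row(row))`, where `write_row` is whatever function issues the request.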

After the data had been loaded into both databases, YCSB workload A was run against them for 90 minutes. The benchmark was set to perform 50% writes and 50% reads, representing many common database use cases. With a single partition and a single client, Scylla Cloud demonstrated an approximately 20x improvement over DynamoDB in overall throughput. With only 1 client, the servers were not stressed in terms of I/O. Under these circumstances, Scylla Cloud consistently outperformed DynamoDB by a wide margin.

Behavior under Real-World Variable Workloads

To reflect more realistic conditions, the next test scenario was more complex. Eight YCSB clients ran simultaneously, each working against a single partition of a 110MB dataset (100K partitions). During testing, we observed a DynamoDB limitation when a specific partition key exceeds 3000 read capacity units (RCU) and/or 1000 write capacity units (WCU). Even when using only ~4% of the provisioned capacity (~7K ops/sec), some YCSB clients experienced a few ProvisionedThroughputExceededException (code: 400) errors, and throttling was imposed by DynamoDB. This indicates that DynamoDB cannot handle the workloads as advertised or within the SLAs published by AWS.
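The per-partition limits quoted above imply a rough throughput ceiling for a single hot partition. The sketch below estimates it for a 50/50 read/write mix. It assumes strongly consistent reads, DynamoDB's standard unit sizes (1 KB per WCU, 4 KB per RCU) and the ~1.1KB item size used in this benchmark; the function name is illustrative and the result is an estimate, not a measured value.

```python
import math

PARTITION_RCU_LIMIT = 3000  # per-partition read limit quoted in the text
PARTITION_WCU_LIMIT = 1000  # per-partition write limit quoted in the text

def hot_partition_ceiling(item_kb, read_fraction=0.5):
    """Estimated max ops/sec on one partition before either limit is reached."""
    wcu_per_write = math.ceil(item_kb / 1)  # writes are metered in 1 KB units
    rcu_per_read = math.ceil(item_kb / 4)   # strongly consistent reads in 4 KB units
    write_fraction = 1 - read_fraction
    limits = []
    if write_fraction > 0:
        limits.append(PARTITION_WCU_LIMIT / (wcu_per_write * write_fraction))
    if read_fraction > 0:
        limits.append(PARTITION_RCU_LIMIT / (rcu_per_read * read_fraction))
    return min(limits)

single = hot_partition_ceiling(1.1)
print(f"1 hot partition: ~{single:.0f} ops/sec, 8 hot partitions: ~{8 * single:.0f} ops/sec")
```

Under these assumptions the write limit is the binding one. The estimate is in the same range as the throughput reported for the single-partition run and as the ~7K ops/sec seen with 8 hot partitions.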

Zipfian Distribution

To simulate real-world conditions, we changed the request distribution in the YCSB loaders to a Zipfian distribution. Other parameters remained unchanged: the loaders were still trying to send 90,000 requests per second (50% reads, 50% updates).

Zipfian distribution was originally used to describe word frequency in languages. However, this distribution curve, known as Zipf’s Law, has also shown correlation to many other real-life scenarios. It can often indicate a form of “rich-get-richer” self-reinforcing algorithm, such as a bandwagon or network effect, where the distribution of results is heavily skewed and disproportionately weighted. 

Over time, a certain search result becomes popular, and more people click on it. Thus, it becomes an increasingly popular search result. Examples include the number of “likes” different NBA teams have on social media, as shown in Figure 2, or the activity distribution among users of the Quora website.

When this sort of skew occurs in a database, it can lead to highly unequal access patterns and result in poor performance. For database testing, this means keys are accessed at random, but with requests heavily concentrated on a small subset, which lets us see how the database in question handles hotspot scenarios.
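To get a feel for how skewed such an access pattern is, the sketch below draws keys with probability proportional to 1/rank^s and measures how much traffic the hottest keys receive. It is a simplified illustration, not YCSB's generator; the exponent 0.99, the key count and the seed are assumptions chosen for the example.

```python
import random
from collections import Counter

def zipfian_weights(num_keys, exponent=0.99):
    """Unnormalized Zipf weights: the key at rank r gets weight 1 / r**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, num_keys + 1)]

def sample_keys(num_keys, num_requests, exponent=0.99, seed=42):
    """Draw num_requests key indexes (0 = hottest) from a Zipfian distribution."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    weights = zipfian_weights(num_keys, exponent)
    return rng.choices(range(num_keys), weights=weights, k=num_requests)

requests = sample_keys(num_keys=1000, num_requests=100_000)
counts = Counter(requests)
top10_share = sum(c for _, c in counts.most_common(10)) / len(requests)
print(f"top 10 of 1000 keys receive {top10_share:.0%} of requests")
```

With a uniform distribution, the 10 hottest keys out of 1000 would receive about 1% of requests; with the Zipfian weights they receive a far larger share, which is what concentrates load on a few partitions.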

Figure 2 – NBA Facebook likes, an example of a Zipfian distribution of data access.

Zipfian emulates hot-partition workloads. Here we again observed the same hot partition limitation in DynamoDB. Whenever we pushed more than ~68K ops/sec (~43% used capacity), YCSB experienced multiple, recurring ProvisionedThroughputExceededException (code: 400) errors, and throttling was imposed by DynamoDB. As a result, DynamoDB was not able to fulfill the 120K ops/sec SLA and could not keep up with provisioned capacity.

Screenshot 1: Single partition. Consumed capacity: ~0.6% -> Throttling imposed by DynamoDB

Screenshot 2: 8 single partitions. Consumed capacity: ~4% -> Throttling imposed by DynamoDB

Table 3 – Comparison of performance of Scylla Cloud vs. DynamoDB

YCSB Workload A: 50% read / 50% write. Range: single partition (~1.1KB). Distribution: uniform. Duration: 90 min.

| Metric | Scylla Cloud (3 x i3.8xlarge), 1 client | DynamoDB (160K WR / 80K RD), 1 client |
| --- | --- | --- |
| Overall throughput (ops/sec) | 20.2K | 857 |
| Avg. load (scylla-server) | ~5% | - |
| Max consumed capacity | - | WR 0.6% / RD 0.6% |
| READ operations (avg.) | ~50M | ~2.3M |
| READ avg. 95th percentile latency (ms) | 7.3 | 5.4 |
| READ avg. 99th percentile latency (ms) | 9.4 | 10.7 |
| UPDATE operations (avg.) | ~50M | ~2.3M |
| UPDATE avg. 95th percentile latency (ms) | 2.7 | 7.7 |
| UPDATE avg. 99th percentile latency (ms) | 4.5 | 607.8 |

Table 4 – Detailed performance of Scylla Cloud vs. DynamoDB

The Bottom Line

  • DynamoDB failed to achieve the required SLA multiple times, especially during the population phase.
  • DynamoDB has 3x-4x the latency of Scylla, even under ideal conditions
  • DynamoDB is 7x more expensive than Scylla
  • DynamoDB was extremely inefficient in a real-life Zipfian distribution. You’d have to buy 3x your capacity, making it 20x more expensive than Scylla
  • Scylla demonstrated up to 20x better throughput in the hot-partition test with better latency numbers

Last but not least, Scylla provides you freedom of choice with no cloud vendor lock-in (as Scylla can be run on various cloud vendors, or even on-premises).
