Project Alternator: The Scylla Open Source Amazon DynamoDB-compatible API

Project Alternator is an open source project for an Amazon DynamoDB™-compatible API. The goal of this project is to deliver an open source alternative to Amazon’s DynamoDB, deployable wherever a user wants: on premises, on other public clouds such as Microsoft Azure or Google Cloud Platform, or on AWS itself (for users who wish to take advantage of other parts of Amazon’s market-leading cloud ecosystem, such as the high-density i3en instances). DynamoDB users can keep their existing client code unchanged. Alternator is written in C++ and is part of Scylla.

Background

Dynamo is the NoSQL database that changed the industry in 2007 when the paper about its architecture was published. The Dynamo paper set out a number of principles for a distributed database that have been emulated in a number of projects since:

  • key-value NoSQL store that doesn’t need a rigid relational or hierarchical data model
  • horizontally scalable to take advantage of economically efficient commodity hardware
  • highly available and reliable through a distributed, replicated peer-to-peer architecture
  • tolerant of failures through hinted handoffs, anti-entropy mechanisms like Merkle trees, and gossip-based cluster membership and adaptation to topology changes
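To make the anti-entropy idea in the last bullet concrete, here is a minimal Python sketch of how two replicas can use a Merkle tree to find which keys disagree without comparing every row. It assumes both replicas agree on a sorted key list, and it recomputes hashes on the fly for brevity; real systems precompute trees over key ranges and exchange only hashes over the network. The function names are illustrative, not any database’s API.

```python
import hashlib


def node_hash(rows: dict, keys: list) -> bytes:
    """Hash of a contiguous, sorted key range: a leaf hash for one key,
    otherwise the hash of the concatenated hashes of its two halves."""
    if len(keys) == 1:
        key = keys[0]
        return hashlib.sha256(f"{key}={rows.get(key)}".encode()).digest()
    mid = len(keys) // 2
    return hashlib.sha256(node_hash(rows, keys[:mid]) + node_hash(rows, keys[mid:])).digest()


def keys_to_repair(a: dict, b: dict, keys: list) -> list:
    """Compare two replicas top-down, descending only into ranges whose hashes differ."""
    if not keys or node_hash(a, keys) == node_hash(b, keys):
        return []
    if len(keys) == 1:
        return list(keys)
    mid = len(keys) // 2
    return keys_to_repair(a, b, keys[:mid]) + keys_to_repair(a, b, keys[mid:])


replica_a = {"k1": "v1", "k2": "v2", "k3": "v3", "k4": "v4"}
replica_b = {"k1": "v1", "k2": "stale", "k3": "v3", "k4": "v4"}
all_keys = sorted(replica_a.keys() | replica_b.keys())
print(keys_to_repair(replica_a, replica_b, all_keys))  # ['k2']
```

Matching subtrees are skipped entirely, so the work is proportional to the number of differing keys times the tree depth rather than to the size of the table.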

However, Dynamo remained an internal proprietary project. In 2012 Amazon offered DynamoDB as a commercial cloud-based service on Amazon Web Services (AWS). It is not the same database, but was “built on the principles” of the original Dynamo. It is also proprietary, only available on Amazon’s own AWS.

DynamoDB can be very costly. Companies spend a great deal of time and engineering effort in their attempts to reduce their DynamoDB bill.

This last aspect became the challenge for ScyllaDB: to create a free, open source alternative that would let users deploy a DynamoDB-compatible database on any cluster of their choosing (whether on-premises or on any public or private cloud), while eliminating the barrier of a high cost of ownership, which in DynamoDB parlance means the costs of on-demand or provisioned capacity mode.

Scylla as a Platform for a DynamoDB-compatible API

The ScyllaDB team began development in 2014, and Scylla was released as an open source project in 2015. It was written for high performance, running close to the metal.

Like DynamoDB, Scylla is also suitable for many key-value and wide-row use cases, while exhibiting the same properties of scalability, flexibility and reliability. Scylla was designed initially to be a drop-in replacement for Apache Cassandra™, a NoSQL system designed at Facebook and later open sourced through the Apache Software Foundation. While there are some significant differences between Dynamo and Cassandra, there are also many similarities, due in no small part to the two sharing a co-inventor, Avinash Lakshman. In this way, Scylla can be seen as a technological descendant of Dynamo and, thus, a distant cousin of DynamoDB.

In benchmarks comparing the two systems, Scylla outperformed DynamoDB in throughput, latency, and price-performance. However, because the two systems use different data access methods, switching between them is non-trivial for users. Scylla uses the Cassandra Query Language (CQL), which is syntactically similar to Structured Query Language (SQL), while DynamoDB queries are expressed in JavaScript Object Notation (JSON).
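For a feel of the difference, the sketch below builds the same single-row read in both styles. The keyspace app, table users, and key column id are hypothetical; the DynamoDB form is the JSON body of a GetItem request, which clients POST over HTTP with an X-Amz-Target header naming the operation.

```python
import json


def cql_get_user(user_id: str) -> str:
    # CQL reads much like SQL. Single quotes are doubled to escape them;
    # production code should use bound parameters instead of string building.
    escaped = user_id.replace("'", "''")
    return f"SELECT * FROM app.users WHERE id = '{escaped}';"


def dynamodb_get_user(user_id: str) -> str:
    # The DynamoDB GetItem request body is JSON; each attribute value
    # carries a type tag ("S" marks a string).
    return json.dumps({"TableName": "users", "Key": {"id": {"S": user_id}}})


print(cql_get_user("alice"))
print(dynamodb_get_user("alice"))
```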

Thus, to switch systems, users would be forced to rewrite their applications and redesign their queries. Based on ScyllaDB’s extensive experience in building API-compatible databases (e.g., Scylla’s existing Apache Cassandra compatibility), a design principle of the Alternator project is that, from the developer’s perspective, no DynamoDB API calls should need to change. The database system should accept input from the client application in the DynamoDB-compatible API format and transparently translate it into the appropriate calls to the underlying database. Query results must likewise be returned to users in the DynamoDB-compatible format.
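A minimal sketch of the translation step, assuming a plain Python dict stands in for a row in the underlying database. DynamoDB’s JSON wraps every attribute value in a type tag (S for string, N for number, BOOL, NULL, L for list, M for map); the sketch handles only that subset and omits the set and binary types. It illustrates the idea and is not Alternator’s internal representation.

```python
from decimal import Decimal


def from_dynamodb(av: dict):
    """Decode one DynamoDB-typed attribute value into a plain Python value."""
    if len(av) != 1:
        raise ValueError(f"expected exactly one type tag, got {av!r}")
    tag, val = next(iter(av.items()))
    if tag == "S":
        return val
    if tag == "N":
        return Decimal(val)  # numbers travel as strings so no precision is lost
    if tag == "BOOL":
        return val
    if tag == "NULL":
        return None
    if tag == "L":
        return [from_dynamodb(v) for v in val]
    if tag == "M":
        return {k: from_dynamodb(v) for k, v in val.items()}
    raise ValueError(f"unsupported type tag: {tag}")


def to_dynamodb(value) -> dict:
    """Encode a plain Python value back into DynamoDB's typed JSON form."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # test bool before int: bool subclasses int
        return {"BOOL": value}
    if isinstance(value, (int, Decimal)):
        return {"N": str(value)}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [to_dynamodb(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_dynamodb(v) for k, v in value.items()}}
    raise TypeError(f"cannot encode {type(value).__name__}")


# A PutItem "Item" as a client would send it ...
item = {"id": {"S": "alice"}, "age": {"N": "42"}, "tags": {"L": [{"S": "admin"}]}}
# ... decoded into a plain row for the storage layer ...
row = {name: from_dynamodb(av) for name, av in item.items()}
# ... and encoded again when results go back to the client.
assert {name: to_dynamodb(v) for name, v in row.items()} == item
```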

All users need to do is deploy Scylla, enable Alternator in scylla.yaml, and point their existing DynamoDB client applications at the Scylla cluster. Additionally, an Apache Spark™-based streaming solution to migrate data from existing DynamoDB instances into Scylla is currently under development.
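From the client’s side, the switch is a change of endpoint. This sketch uses only Python’s standard library to build, without sending, the same GetItem call a DynamoDB client would make, pointed at a Scylla node instead of AWS. The host name scylla-node1 and port 8000 are assumptions about how the cluster and Alternator were configured, and request signing is left out.

```python
import json
import urllib.request

# Assumption: a Scylla node reachable as "scylla-node1" with Alternator on port 8000.
ALTERNATOR_ENDPOINT = "http://scylla-node1:8000"


def build_get_item(endpoint: str, table: str, key: dict) -> urllib.request.Request:
    """Build (but do not send) a DynamoDB GetItem call aimed at `endpoint`.

    Only the endpoint differs from a call to AWS; the body and headers follow
    the DynamoDB JSON protocol. Request signing (SigV4) is omitted here.
    """
    body = json.dumps({"TableName": table, "Key": key}).encode("utf-8")
    return urllib.request.Request(
        endpoint + "/",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-amz-json-1.0",
            "X-Amz-Target": "DynamoDB_20120810.GetItem",
        },
    )


req = build_get_item(ALTERNATOR_ENDPOINT, "users", {"id": {"S": "alice"}})
print(req.get_method(), req.full_url)
```

With an AWS SDK the change is usually a single setting; for example, boto3 clients accept an endpoint_url argument.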

Scylla Alternator provides three key benefits to DynamoDB users:

1. Cost: DynamoDB charges for read and write requests (RRUs and WRUs). A free open source solution eliminates these costs and minimizes other operational expenses. Scylla’s design efficiency allows developers to use significantly fewer resources for the same task or workload. According to our benchmark, users can expect to save 80% – 93% overall to support the same workload (5x-14x less expensive). Because of its high-performance design, Scylla also removes the need for a separate in-memory cache such as DynamoDB Accelerator (DAX), along with its cost.

2. Performance: Scylla is implemented in modern C++ and designed for high instructions per cycle (IPC), with features such as a lock-free, log-structured cache; heat-based load balancing; workload prioritization; userspace schedulers; and much more. This improves latency and throughput and allows Scylla to scale both vertically and horizontally. Scylla can take advantage of high-density nodes, some of which can now support hundreds of cores per server.

3. Openness: Scylla is open source. Users can review the source code and any known defects, and in return can add their own contributions to the project. Operationally, Scylla can run on any suitable server cluster regardless of location (on-premises, in a private or public cloud) or deployment method (bare metal, containerized, virtualized, or deployed in pods via Kubernetes). This contributes to the user’s lower TCO by allowing deployment flexibility in line with their existing operations.
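The two figures in the cost item express the same claim in different units: if Scylla costs 1/k of the DynamoDB bill, the saving is 1 - 1/k. A quick check in Python, using only the ratios quoted above:

```python
def savings_percent(cost_ratio: float) -> float:
    """Percent saved when the new system costs 1/cost_ratio of the old one."""
    return (1 - 1 / cost_ratio) * 100


for ratio in (5, 14):
    print(f"{ratio}x less expensive -> {savings_percent(ratio):.1f}% saved")
# 5x less expensive -> 80.0% saved
# 14x less expensive -> 92.9% saved
```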

The Current Status of Alternator

Alternator is still in development. It is not yet generally available and we haven’t created a product release with it. However, it is part of the Scylla source code right now.

Our alternator.md and design doc provide detailed information on what is and is not yet supported today. In short, most standard applications will just work: the JSON HTTP API is mostly implemented, indexing works, multi-zone support is implemented, and many other features work as well. Some consistency differences arise because DynamoDB uses a leader/follower model, whereas Scylla implements an active-active model. This may be an issue in certain cases, so anyone looking to use Alternator should first take a close look at the documentation and at their own code.
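One way to see where consistency differences come from: in an active-active (leaderless) design, each request chooses how many replicas must respond, and a read is guaranteed to see the latest acknowledged write only if its replica set overlaps the write’s. The brute-force sketch below checks that condition, r + w > n, for a replication factor of 3. It is a simplified model that ignores failures and repair mechanisms; a leader/follower system can instead serve consistent reads from the leader.

```python
from itertools import combinations


def always_overlaps(n: int, w: int, r: int) -> bool:
    """True if every possible set of r read replicas shares at least one
    replica with every possible set of w replicas that acknowledged a write."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


n = 3
quorum = n // 2 + 1  # 2 of 3 replicas
print(always_overlaps(n, w=quorum, r=quorum))  # True: reads see the latest acknowledged write
print(always_overlaps(n, w=quorum, r=1))       # False: a single-replica read can miss it
```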

Roadmap

Within the next few months we plan to harden the code, bringing it to production quality by running it through our robust quality assurance cycles. We will also close all of the remaining nitty-gritty API differences. In the future, we will offer a Scylla Enterprise release containing the Alternator software and also release a version that runs on Scylla Cloud.

Future updates will bring our managed DBaaS to Azure and GCP, but you won’t have to wait for them: as soon as Alternator is released, you will be able to run it on your own Amazon, Azure, and Google Cloud instances. We also plan to release a General Availability (GA) version of our Kubernetes operator so you can fully deploy and manage a DynamoDB-compatible database wherever you wish.

We will also address the load balancing differences between Scylla and DynamoDB clients. Scylla clients are topology-aware and can access any node, whereas DynamoDB clients resolve the service endpoint through DNS and receive a load-balanced IP address with a time-to-live (TTL) of 4 seconds.
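Because Alternator nodes are all peers, requests need to be spread across them somehow. The sketch below is one hypothetical client-side approach: round-robin over a node list that is re-fetched at most every few seconds, mimicking the short DNS TTL mentioned above. The RoundRobinNodes class and the discover callback are illustrative assumptions, not an Alternator or AWS SDK API; in practice discover might wrap a DNS lookup or another node-discovery mechanism.

```python
import itertools
import time
from typing import Callable, List


class RoundRobinNodes:
    """Hypothetical client-side balancer: cycles through known nodes and
    refreshes the node list once `refresh_seconds` have elapsed."""

    def __init__(self, discover: Callable[[], List[str]], refresh_seconds: float = 4.0,
                 clock: Callable[[], float] = time.monotonic):
        self._discover = discover
        self._refresh_seconds = refresh_seconds
        self._clock = clock  # injectable so the refresh logic can be tested
        self._cycle = None
        self._fetched_at = float("-inf")

    def next_endpoint(self) -> str:
        now = self._clock()
        if self._cycle is None or now - self._fetched_at >= self._refresh_seconds:
            nodes = self._discover()
            if not nodes:
                raise RuntimeError("node discovery returned no nodes")
            self._cycle = itertools.cycle(list(nodes))
            self._fetched_at = now
        return next(self._cycle)


fake_now = [0.0]
node_lists = iter([["10.0.0.1", "10.0.0.2"], ["10.0.0.3"]])
balancer = RoundRobinNodes(lambda: next(node_lists), clock=lambda: fake_now[0])
print(balancer.next_endpoint())  # 10.0.0.1
print(balancer.next_endpoint())  # 10.0.0.2
fake_now[0] = 5.0                # past the 4-second refresh interval
print(balancer.next_endpoint())  # 10.0.0.3
```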

When Scylla’s workload prioritization feature is enabled, developers can assign more resources to crucial workloads and cap less important ones, making it possible, for example, to mix analytics and real-time operational workloads on the same cluster.
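A simplified model of the idea, assuming proportional-share scheduling: each workload gets a number of shares, and capacity is split among the workloads that currently have work in proportion to those shares, so an idle workload leaves its capacity to the others. The workload names and share values are hypothetical, and this is not Scylla’s scheduler code.

```python
def capacity_fractions(shares: dict, active: set) -> dict:
    """Divide capacity among the workloads that currently have work queued,
    in proportion to their shares; idle workloads receive nothing."""
    total = sum(shares[name] for name in active)
    if total == 0:
        return {}
    return {name: shares[name] / total for name in active}


shares = {"realtime": 1000, "analytics": 200}  # hypothetical workloads
print(capacity_fractions(shares, {"realtime", "analytics"}))  # realtime gets 5/6, analytics 1/6
print(capacity_fractions(shares, {"analytics"}))              # {'analytics': 1.0}
```

Under contention the analytics job is held to a sixth of the capacity, yet it can still use the whole machine when the real-time workload is idle.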

At our Scylla Summit (November 5-6, 2019) we plan to release Scylla’s first Lightweight Transaction (LWT) feature. Initially based on Paxos, as in Cassandra, LWT will allow us to add stronger consistency to Alternator. In parallel, we are working on a Raft-based version that uses a leader/follower model and offers better performance.
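What LWT adds is an atomic check-then-write, the kind of guarantee DynamoDB conditional writes (for example, a ConditionExpression using attribute_not_exists) rely on. The sketch below is a single-process stand-in for those semantics only: a lock replaces the Paxos round that real LWT runs across replicas, and ConditionalStore is a hypothetical name. Passing None as the expected value means "only if absent".

```python
import threading


class ConditionalStore:
    """Hypothetical in-process store with compare-and-set semantics.
    A local lock stands in for the cross-replica consensus of real LWT."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def compare_and_set(self, key, expected, new_value) -> bool:
        # The check and the write happen atomically under one lock.
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new_value
            return True

    def get(self, key):
        with self._lock:
            return self._data.get(key)


seats = ConditionalStore()
print(seats.compare_and_set("12A", None, "alice"))  # True: seat was free
print(seats.compare_and_set("12A", None, "bob"))    # False: already taken
print(seats.get("12A"))                             # alice
```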

We are also going to announce our own streaming feature, Change Data Capture (CDC), and we may add compatibility with DynamoDB Streams later on.

In another upcoming release, Scylla will introduce User Defined Functions (UDFs) and, soon after, map-reduce computation based on them. UDFs are far more efficient than serverless alternatives because the functions execute on the database servers, next to the data.
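The efficiency argument is about data movement: run the function where the rows live and ship back only partial results. The sketch below models that with a hypothetical dict of per-node rows; the map and reduce callables stand in for user-defined functions and do not reflect Scylla’s actual UDF interface. The combine step is correct only if the reducer is associative and identity is its neutral element.

```python
import operator
from functools import reduce

# Hypothetical data layout: the rows (order amounts) stored on each node.
rows_by_node = {"node1": [10, 20, 5], "node2": [7, 3], "node3": [50]}


def run_on_node(rows, map_fn, reduce_fn, identity):
    # Executed where the rows live: only one partial result leaves the node.
    return reduce(reduce_fn, map(map_fn, rows), identity)


def cluster_map_reduce(rows_by_node, map_fn, reduce_fn, identity):
    # The coordinator combines one partial per node instead of every row.
    partials = [run_on_node(rows, map_fn, reduce_fn, identity)
                for rows in rows_by_node.values()]
    return reduce(reduce_fn, partials, identity)


print(cluster_map_reduce(rows_by_node, lambda x: x, operator.add, 0))                    # 95
print(cluster_map_reduce(rows_by_node, lambda x: 1 if x >= 10 else 0, operator.add, 0))  # 3
```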

To make the switch as easy as possible, we’ve extended our online migration tools so that migrating is relatively simple: start streaming the changes from DynamoDB and run a full scan. The Scylla Spark Migrator project has been enhanced to support the DynamoDB-compatible API.
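The order of operations matters in that recipe: change capture has to start before the full scan, so that any write the scan misses is replayed afterward. The sketch below models this with plain dicts and a list of (op, key, value) change records, a simplified, hypothetical stand-in for DynamoDB Streams records; it is not the Scylla Spark Migrator’s code.

```python
def migrate(scan_snapshot: dict, change_stream: list, target: dict) -> dict:
    """Load a full-table scan, then replay changes captured from the source.

    change_stream holds (op, key, value) tuples in commit order, with op being
    "put" or "delete". Because capture started before the scan, replaying every
    change in order ends in the source's final state, even for rows the scan
    already copied in their newer form.
    """
    target.update(scan_snapshot)
    for op, key, value in change_stream:
        if op == "put":
            target[key] = value
        elif op == "delete":
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return target


# The scan ran while "b" was being updated, "a" deleted, and "c" inserted.
snapshot = {"a": 1, "b": 2}
changes = [("put", "b", 3), ("delete", "a", None), ("put", "c", 4)]
print(migrate(snapshot, changes, {}))  # {'b': 3, 'c': 4}
```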

Contributing to Project Alternator

Developers looking to become open source contributors to the project are encouraged to join the scylladb-dev mailing list, or to visit ScyllaDB’s Slack.

Resources

Alternator Webinar

ScyllaDB co-founders Avi Kivity and Dor Laor will give an in-depth look at Scylla Alternator, including demos, in a webinar on Wednesday, September 25, 2019, at 10 AM Pacific / 1 PM Eastern. You won’t want to miss it!
