Products
- Quick Links
  
  Real-Time AI
  
  ScyllaDB powers real-time AI with low latency, high throughput, and billion-scale data.
  Learn More
  
  Is ScyllaDB right for me?
  
  ScyllaDB is purpose-built for data-intensive apps that require high throughput & predictable low latency.
  Learn More
- Products
- Compare
- Related Technologies
Developers
Customers
Resources
- Featured Resource
  
  ScyllaDB University
  
  Level up your skills with our free NoSQL database courses.
  Take a Course
  
  ScyllaDB Blog
  
  Our blog keeps you up to date with recent news about the ScyllaDB NoSQL database and related technologies, success stories and developer how-tos.
  Read the Blog
- Resource Center
- Events
- Compare
Pricing
Contact Us
Chat Now
Get Started
Sign In
Search
Button Links

See all blog posts

Migrating from DynamoDB to ScyllaDB’s DynamoDB-compatible API

By Glauber Costa

September 12, 2019

Bluesky Reddit X LinkedIn

Subscribe to Receive Blog Updates

Subscription Categories

All Posts
Events
Integrations
News
Performance
Releases
ScyllaDB
Seastar
Tutorials
User Stories
Developers Blog

Bluesky Reddit X LinkedIn

oreilly-book-ads

This week we announced our Project Alternator, open-source software that will enable application- and API-level compatibility between ScyllaDB and Amazon DynamoDB. Available for use with ScyllaDB Open Source, Alternator will let DynamoDB users easily migrate to an open source NoSQL database that runs anywhere — on any cloud platform, on-premises, on bare-metal, virtual machines or Kubernetes — without having to change their client code.

Of course, applications operate on data, and the data that is in DynamoDB has to first be migrated to ScyllaDB so that users can take full advantage of this new capability. ScyllaDB already supported migrations from Apache Cassandra by means of the scylla-migrator project, an Apache Spark-based tool that we’ve previously described.

This article presents the extensions done to the ScyllaDB Migrator to also support data movement between an existing DynamoDB installation and ScyllaDB.The idea is simple: a cluster of Spark workers reads the source database (DynamoDB) in parallel and writes in parallel to the target database, ScyllaDB.

Running the Migrator

The latest copy of the scylla-migrator, which already contains the DynamoDB-compatible API bindings can be found here. Always refer for latest steps to bundled README.dynamodb.md.

The first step is to build it. Make sure a recent copy of sbt is properly installed on your machine, and run build.sh.

Next, the connection properties with DynamoDB have to be configured. The migrator ships with an example file, so we can start by copying the example to a configuration file

cp config.dynamodb.yaml.example config.dynamodb.yaml

Most importantly, we will configure the source and target sections to describe our source DynamoDB cluster and target ScyllaDB cluster:

The first thing to notice is that the target configuration is simpler; since we one will be connecting to a live ScyllaDB cluster, not to an AWS service, there is no need to provide the access keys (for public clusters usernames, and passwords can be provided instead).

The next step is to submit the spark application. The application is submitted in a generic way, the migrator leverages spark.scylla.config variable to pass on parameters.

An Example Migration

To test and demonstrate the migrator, we have created an example table in DynamoDB and filled it with data.

Be sure to pay attention to throughput variables and number of splits and mappers. Do set them up properly based on what capacity you can use from what you’ve provisioned. The scan_segments controls the split size(how many tasks there will be, so this divides your data into smaller chunks for processing) and max_map_tasks controls maximum of map tasks for map reduce job that migrates the data.

Tuning those two variables will generally require more resources on spark worker side, so make sure you allocate them, e.g. by using spark-submit option –conf “spark.executor.memory=100G and similar for CPUs.

Since the migrator is just a spark application, the running application can be monitored using spark web UI:

Figure 1: Workers and Running Applications

Figure 2: Active Stages

Verifying the Migration

To verify the migration is successful, we can first match the count of keys from DynamoDB. In this case, we have 7,869,600 keys:

Figure 3: Table Details

When finished, the spark tasks will print messages to their logs with the number of keys that each task transferred:

Which, as we expect, sums up to 7,869,600, the same number of keys we had in the source.

We have also queried some random keys from both databases, since they now accept the same API. We can see that all keys that are randomly sampled are present in both sides:

Script:

Results:

Future work

Much like the DynamoDB-compatible API in ScyllaDB itself, the migrator is a work in progress. In live migrations without downtime, there is usually the need to perform dual writes, a technique in which the application is temporarily and surgically changed so that new writes are sent to both databases, while the old data is scanned and written to the new database. However, DynamoDB supports a well-rounded stream interfaces, where an application can listen for changes. The streams API can be used to facilitate migrations without explicitly coding for dual writes. That is a work in progress in the migrator, expected to debut soon.

Next Steps

If you want to learn more about Project Alternator, we have a webinar that is scheduled for September 25, 10 AM Pacific, 1 PM Eastern. And you can also check out the Project Alternator Wiki.

REGISTER NOW FOR THE ALTERNATOR WEBINAR

READ THE PROJECT ALTERNATOR WIKI

Apache Spark Alternator (for DynamoDB)ScyllaDB Spark Migrator Project Alternator Amazon DynamoDB

About Glauber Costa

Glauber Costa is the founder and CEO of ChiselStrike, the company enabling developers to create fast, scalable serverless backends with TypeScript. Prior to ChiselStrike, Glauber worked at Datadog, ScyllaDB, RedHat, and IBM.

View all posts by this author | Website

Previous Post Next Post

Related Posts

ScyllaDB Alternator: The Open Source DynamoDB-compatible API

Moving from Cassandra to ScyllaDB via Apache Spark: The ScyllaDB Migrator

Deep Dive into the ScyllaDB Spark Migrator

ScyllaDB Monitoring Stack 3.0.1 – Alternator Dashboard

See you at AWS re:Invent 2019!

Managed Cassandra on AWS, Our Take

Comparing CQL and the DynamoDB API

Migrate Parquet Files with the ScyllaDB Migrator

One-Step Streaming Migration from DynamoDB into ScyllaDB

DynamoDB Autoscaling Dissected: When a Calculator Beats a Robot

Start scaling with the world's best high performance NoSQL database.

Product

Product Overview
Architecture
ScyllaDB Enterprise
ScyllaDB Cloud
Vector Search
Benchmarks
Solutions
Release Notes
Install
Tools
Pricing

Resources

Documentation
Online Training
NoSQL Guides
Resource Center
Events
Customer Support
Customer Portal
Community Projects
Blog

Company

About Us
Team
Customers
Partners
News
Media Kit
Careers
Contact Us

Privacy Policy
Terms of Use
Trust Center
Legal Center
Cookie Policy

2026 ©ScyllaDB | ScyllaDB, and ScyllaDB Cloud, are registered trademarks of ScyllaDB, Inc.