DynamoDB is a popular NoSQL database offered by Amazon Web Services (AWS) as a fully managed database as a service (DBaaS) running on AWS cloud infrastructure.
AWS designed DynamoDB by combining ideas from Amazon's original Dynamo technology with concepts from Google Bigtable (a wide-column and key-value store).
DynamoDB is primarily a key-value store NoSQL database that also supports a document store data model using JSON. DynamoDB is a proprietary, closed source non-relational database.
DynamoDB only runs on AWS cloud infrastructure; it does not run on-premises or in other cloud environments. It is sold through a “serverless” model: customers pay based on database read/write throughput and storage needs, rather than for the servers in a database cluster.
How to Use DynamoDB?
AWS Course on Getting Started with DynamoDB
AWS provides detailed training and certification courses for DynamoDB with self-paced courses that take about 16 hours (plus optional labs). In the courses, you simulate a real application development project with lessons that cover:
- The DynamoDB API and AWS SDKs
- Working with indexes
- Managing capacity, consistency, and performance
- Managing DynamoDB applications at scale
- Advanced monitoring and optimization
- Architecting applications and tables
Amazon Web Services (AWS) Developer Guide
AWS provides a series of hands-on tutorials for users in their developer guide that assist them in getting started with DynamoDB. They cover everything from basic concepts in how to use DynamoDB to deep dives into reading and updating data in tables and creating and querying indexes.
Alex Debrie’s Resources
DynamoDB expert Alex Debrie has written The DynamoDB Book and created DynamoDBGuide.com, an open guide that walks through the basic concepts and advanced features of DynamoDB. He also maintains a GitHub page that collects his top DynamoDB resources. This page includes sections focusing on books, videos, written resources, tools, and uses – plus an entire section on AWS re:Invent talks by Rick Houlihan, known for his DynamoDB deep-dive sessions.
Data Modeling in DynamoDB
Find an example of modeling relational data in DynamoDB from AWS here, and a developer guide to best practices for modeling relational data in DynamoDB here.
Overcoming DynamoDB Challenges
Here are a few first-hand accounts of how teams overcame DynamoDB challenges such as high costs, slow autoscaling, and deployment limitations:
- GE Healthcare: GE Healthcare’s clients needed their solution to run on-premises. The team had two choices: rewrite the Edison Workbench to run against a different datastore, or find a database compatible with DynamoDB.
- iFood: DynamoDB’s autoscaling was not fast enough for their use case. iFood’s bursty traffic spikes around lunch and dinner times. Slow autoscaling meant that they could not meet those daily bursts of demand unless they left a high minimum throughput (which was expensive) or managed scaling themselves (which is work that they were trying to avoid by paying for a fully-managed service).
- Lookout: Lookout needed to dramatically scale their solution; however, DynamoDB alone was going to cost them about $1M per month if they scaled it for the target 100M devices. DynamoDB also lacked the required level of sorting for time series data, forcing them to use a query API that introduced a significant performance hit, impacting latency.
DynamoDB Cost
A free tier of AWS DynamoDB does exist, but its features are limited. DynamoDB pricing is based on six factors:
- Amount of replicated write request units while using DynamoDB global tables
- Backup and restore operations performed
- Data storage
- Data transfer amounts
- DynamoDB streams
- Read and write levels
It is important to consider these factors in the context of the DynamoDB pricing structure to reduce DynamoDB costs and achieve the best performance. There are two pricing structures to choose from: on-demand capacity and provisioned capacity.
DynamoDB On-demand Pricing
The DynamoDB on-demand pricing plan is billed per read and write request unit. This is a truly serverless choice, because users are charged only for the requests they make. For large, sustained workloads, however, this choice can become expensive. On-demand capacity is ideal when it is not clear how much traffic is coming.
DynamoDB Provisioned Capacity
The DynamoDB provisioned capacity pricing plan bills users hourly per read and write capacity unit. Users control costs by provisioning each table’s capacity up front, designating the maximum resources it needs. Provisioned capacity offers autoscaling, dynamically adapting to traffic spikes, but only when enabled.
On-demand capacity is best when:
- The application’s workload is unclear
- The application’s data traffic consistency is unknown
- A pay-as-you-go billing model is preferred
According to blog posts by Yan Cui, on-demand tables cost several times more per request than provisioned tables. For a consistent workload with no unexpected spikes, even when future usage is uncertain, consider provisioned mode with autoscaling enabled.
Provisioned capacity is ideal when:
- The maximum application workload is known
- Application traffic is consistent and does not demand scaling (it costs more to enable the autoscaling feature)
Users who provision more than 100 capacity units can purchase reserved capacity to optimize costs.
Learn more about how AWS calculates costs for both provisioned capacity and on-demand DynamoDB pricing structures here. Or select DynamoDB features, read/write settings, and/or provisioned capacity, and see estimates with the AWS DynamoDB Pricing Calculator.
To see how DynamoDB’s costs compare to alternative databases, including some with DynamoDB-compatible APIs, see this DynamoDB Pricing Comparison Calculator. Alternatives are often around 1/5 the cost of DynamoDB, with 1/3 or less the latency and 10x or greater throughput (see benchmarks).
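The trade-off between the two pricing structures can be made concrete with a back-of-the-envelope calculation. The rates below are illustrative assumptions for the sake of the arithmetic, not current AWS prices; always check the AWS pricing page for your region.

```python
# Rough comparison of on-demand vs. provisioned monthly cost.
# All rates below are ILLUSTRATIVE ASSUMPTIONS, not current AWS prices.
ON_DEMAND_WRITE_PER_MILLION = 1.25   # USD per million write request units (assumed)
ON_DEMAND_READ_PER_MILLION = 0.25    # USD per million read request units (assumed)
PROVISIONED_WCU_PER_HOUR = 0.00065   # USD per WCU-hour (assumed)
PROVISIONED_RCU_PER_HOUR = 0.00013   # USD per RCU-hour (assumed)
HOURS_PER_MONTH = 730

def on_demand_monthly(writes_per_sec: float, reads_per_sec: float) -> float:
    """Monthly cost when every individual request is billed."""
    seconds = HOURS_PER_MONTH * 3600
    writes_millions = writes_per_sec * seconds / 1e6
    reads_millions = reads_per_sec * seconds / 1e6
    return (writes_millions * ON_DEMAND_WRITE_PER_MILLION
            + reads_millions * ON_DEMAND_READ_PER_MILLION)

def provisioned_monthly(wcu: int, rcu: int) -> float:
    """Monthly cost for capacity provisioned around the clock."""
    return HOURS_PER_MONTH * (wcu * PROVISIONED_WCU_PER_HOUR
                              + rcu * PROVISIONED_RCU_PER_HOUR)

# A steady workload of 100 writes/s and 500 reads/s:
od = on_demand_monthly(100, 500)    # on-demand cost
prov = provisioned_monthly(100, 500)  # provisioned, sized exactly to the load
```

With these assumed rates, the steady workload costs several times more on-demand than provisioned, which is consistent with the per-request cost gap Yan Cui describes; the picture reverses for spiky workloads, where provisioned capacity sits idle most of the time.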
DynamoDB Data Types
DynamoDB supports various types of data for different attributes within a table:
Scalar data types. A scalar type may represent one value only. Binary, Boolean, null, number, and string are the scalar types.
Binary type attributes store binary data such as encrypted data, compressed text, or images. DynamoDB treats each binary data byte as unsigned as it compares binary values.
A Boolean type attribute stores either a true or false value.
Null represents an unknown or undefined attribute.
Numbers can be negative, positive, or zero and have precision up to 38 digits. Numbers that exceed 38 digits result in an exception.
Strings are Unicode. A string may be empty (length zero) unless it is used as a key for a table or index, and its length is constrained only by the 400 KB maximum item size.
Document types. The document types are map and list. These data types can represent complex data structures with depths of up to 32 levels by nesting within each other. As long as the item holding the values is within the DynamoDB item size limit (400 KB), there is no limit on the number of values in a map or list.
A list type attribute, or just a list, is enclosed in square brackets: [ … ] and can store an ordered collection of values. Lists are similar to JSON arrays. The elements do not have to be the same type, and there are no restrictions on the data types a list element can store.
A map type attribute, or just a map, is enclosed in curly braces: { … } and can store an unordered collection of name-value pairs. Maps are similar to JSON objects. The map element can store any data types without restrictions, and the elements in a map need not be of the same type.
Set types. A set type represents multiple scalar values. The set types are binary set, number set, and string set. All set elements must be the same type, so a string set must contain only strings, for example. A set can hold a limitless number of values so long as it is within the DynamoDB item size limit (400 KB).
Each value within a set must be unique. Applications must not rely on a specific element order within the set because the value order in a set is not preserved. Although a set may contain empty binary and string values, DynamoDB does not support empty sets.
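In DynamoDB’s low-level JSON format, each of these types appears as a one-letter (or short) type descriptor wrapping the value. The sketch below shows one item using scalar, document, and set types; the attribute names and values are made up for illustration.

```python
# An item in DynamoDB's low-level JSON format: every attribute value is
# wrapped in a type descriptor. Note that numbers are serialized as strings
# to preserve their full 38-digit precision on the wire.
item = {
    "artist":   {"S": "No One You Know"},   # string
    "in_print": {"BOOL": True},             # Boolean
    "retired":  {"NULL": True},             # null: unknown/undefined attribute
    "price":    {"N": "19.99"},             # number, sent as a string
    "cover":    {"B": "dGh1bWJuYWls"},      # binary (base64-encoded bytes)
    "tags":     {"SS": ["rock", "indie"]},  # string set: unique, unordered
    "ratings":  {"NS": ["4", "5"]},         # number set
    "details": {"M": {                      # map: unordered name-value pairs
        "label": {"S": "Example Records"},
        "year":  {"N": "2024"},
    }},
    "tracks": {"L": [                       # list: ordered, mixed types allowed
        {"S": "Intro"},
        {"N": "7"},
    ]},
}
```

The higher-level SDK abstractions (such as boto3’s resource API in Python) accept plain native values and add these type descriptors for you.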
To create a table or secondary index, specify a name and data type for each primary key attribute (partition key and sort key); the key type must be string, number, or binary.
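A minimal sketch of such a table definition, expressed as the parameters of a CreateTable request (the table and attribute names here are hypothetical):

```python
# CreateTable request parameters: only the key attributes are declared up
# front, and each must be typed as string ("S"), number ("N"), or binary ("B").
create_table_params = {
    "TableName": "Music",                                    # illustrative name
    "KeySchema": [
        {"AttributeName": "artist",     "KeyType": "HASH"},  # partition key
        {"AttributeName": "song_title", "KeyType": "RANGE"}, # sort key
    ],
    "AttributeDefinitions": [
        {"AttributeName": "artist",     "AttributeType": "S"},
        {"AttributeName": "song_title", "AttributeType": "S"},
    ],
    "BillingMode": "PAY_PER_REQUEST",  # on-demand capacity mode
}
# With boto3 this would be submitted as:
#   boto3.client("dynamodb").create_table(**create_table_params)
```

Non-key attributes are never declared in the schema; DynamoDB is schemaless beyond the primary key.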
Serverless DynamoDB
DynamoDB is useful for serverless applications because it automatically scales. DynamoDB itself uses a “serverless” cloud model: there is no cluster to size or configure and no number of systems to select. The underlying infrastructure is dynamic; database capacity scales up or down as application read/write traffic increases or decreases.
The Amazon Web Services (AWS) serverless patterns collection is a repository of examples of serverless architecture that demonstrate integrations of multiple AWS services. Each pattern uses either the AWS Cloud Development Kit (AWS CDK) or the AWS Serverless Application Model (AWS SAM).
Four patterns use DynamoDB:
Amazon API Gateway REST API to Amazon DynamoDB
This pattern creates an Amazon API Gateway REST API that includes a usage plan and an API key and integrates with “Music”, an Amazon DynamoDB table. The DynamoDB table includes “Artist-Index”, a global secondary index. The API integrates directly with the DynamoDB API and supports the Query and PutItem actions. Use this pattern to store items in a DynamoDB table that come from the specified API.
AWS Lambda to Amazon DynamoDB
This pattern deploys a DynamoDB table, a Lambda function, and the minimum required IAM permissions for running the application. A Lambda function persists an item to a DynamoDB table using the AWS SDK.
AWS Step Functions to Amazon DynamoDB
This pattern deploys a Step Functions workflow that:
- Accepts a payload and places it in a DynamoDB table
- Shows how to read an item directly from the DynamoDB table
- Contains the minimum required IAM permissions for the application
Amazon DynamoDB to AWS Lambda
This pattern deploys a DynamoDB table, a Lambda function, and the minimum required IAM permissions for running the application. Lambda polls the DynamoDB stream for changes; whenever items are written or updated in the table, the function is invoked with a payload containing the contents of the changed item.
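The “AWS Lambda to Amazon DynamoDB” pattern above can be sketched as a handler that persists the incoming event as an item. This is an illustrative sketch, not the pattern’s actual code: the table object is injected so the handler runs against boto3’s Table resource in Lambda or against an in-memory stand-in locally.

```python
# Minimal sketch of a Lambda handler that persists an event to DynamoDB.
import uuid

def handler(event: dict, table) -> dict:
    """Persist the event payload as an item with a generated id."""
    item = {"id": str(uuid.uuid4()), **event}
    table.put_item(Item=item)  # same call shape as boto3's Table.put_item
    return {"statusCode": 200, "itemId": item["id"]}

class FakeTable:
    """In-memory stand-in for a boto3 DynamoDB Table resource (for local runs)."""
    def __init__(self):
        self.items = []

    def put_item(self, Item):
        self.items.append(Item)

# Local demonstration; in Lambda you would instead use:
#   table = boto3.resource("dynamodb").Table("my-table")
fake = FakeTable()
response = handler({"message": "hello"}, fake)
```

Injecting the table dependency this way also makes the handler straightforward to unit-test without AWS credentials.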
DynamoDB Metrics
DynamoDB sends various metrics and dimensions to users via CloudWatch. Users view metrics on the console here: https://console.aws.amazon.com/cloudwatch/. Open the console, choose “Metrics” in the navigation pane, and select the DynamoDB namespace.
Metrics are grouped first by service namespace, and then by the various combinations of dimensions within each namespace.
Amazon CloudWatch aggregates some key metrics at one-minute intervals:
- ConditionalCheckFailedRequests
- ConsumedReadCapacityUnits
- ConsumedWriteCapacityUnits
- ReadThrottleEvents
- ReturnedBytes
- ReturnedItemCount
- ReturnedRecordsCount
- SuccessfulRequestLatency
- SystemErrors
- TimeToLiveDeletedItemCount
- ThrottledRequests
- TransactionConflict
- UserErrors
- WriteThrottleEvents
It aggregates all other DynamoDB metrics at five-minute intervals.
Although not every statistic is applicable to every metric, all values are available through the CloudWatch console or the Amazon DynamoDB console.
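Programmatically, DynamoDB metrics are read from CloudWatch under the "AWS/DynamoDB" namespace with a "TableName" dimension. A sketch of the request parameters, assuming a hypothetical table named "Music":

```python
# Build GetMetricStatistics parameters for a DynamoDB table metric.
from datetime import datetime, timedelta, timezone

def consumed_reads_query(table_name: str, minutes: int = 60) -> dict:
    """Parameters to fetch ConsumedReadCapacityUnits for the last N minutes."""
    now = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/DynamoDB",   # all DynamoDB metrics live here
        "MetricName": "ConsumedReadCapacityUnits",
        "Dimensions": [{"Name": "TableName", "Value": table_name}],
        "StartTime": now - timedelta(minutes=minutes),
        "EndTime": now,
        "Period": 60,                  # one-minute aggregation, as noted above
        "Statistics": ["Sum", "Average"],
    }

params = consumed_reads_query("Music")
# With boto3: boto3.client("cloudwatch").get_metric_statistics(**params)
```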
List of Available Metrics
Each of these metrics is linked to the complete AWS description.
- AccountMaxReads
- AccountMaxTableLevelReads
- AccountMaxTableLevelWrites
- AccountMaxWrites
- AccountProvisionedReadCapacityUtilization
- AccountProvisionedWriteCapacityUtilization
- AgeOfOldestUnreplicatedRecord
- ConditionalCheckFailedRequests
- ConsumedChangeDataCaptureUnits
- ConsumedReadCapacityUnits
- ConsumedWriteCapacityUnits
- FailedToReplicateRecordCount
- MaxProvisionedTableWriteCapacityUtilization
- OnlineIndexConsumedWriteCapacity
- OnlineIndexPercentageProgress
- OnlineIndexThrottleEvents
- PendingReplicationCount
- ProvisionedReadCapacityUnits
- ProvisionedWriteCapacityUnits
- ReadThrottleEvents
- ReplicationLatency
- ReturnedBytes
- ReturnedItemCount
- ReturnedRecordsCount
- SuccessfulRequestLatency
- SystemErrors
- TimeToLiveDeletedItemCount
- ThrottledPutRecordCount
- ThrottledRequests
- TransactionConflict
- UserErrors
- WriteThrottleEvents
DynamoDB Analytics
Data from DynamoDB can be exported to Amazon S3 (Simple Storage Service), where other AWS services (e.g., Amazon Athena) can analyze it.
Item-level changes in DynamoDB tables can be captured as a Kinesis data stream, enabling you to build advanced streaming applications such as real-time log aggregation, real-time business analytics, and IoT data capture.
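The S3 export path is driven by the ExportTableToPointInTime API (which requires point-in-time recovery to be enabled on the table). A sketch of the request parameters, with a hypothetical table ARN and bucket name:

```python
# Request parameters for exporting a DynamoDB table to S3 for analytics.
# The ARN, bucket, and prefix below are illustrative placeholders.
export_params = {
    "TableArn": "arn:aws:dynamodb:us-east-1:123456789012:table/Music",
    "S3Bucket": "my-analytics-bucket",
    "S3Prefix": "exports/music/",
    "ExportFormat": "DYNAMODB_JSON",  # or "ION"
}
# With boto3:
#   boto3.client("dynamodb").export_table_to_point_in_time(**export_params)
# Athena can then query the exported files directly from S3.
```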
DynamoDB Autoscaling
Many database workloads are cyclical or unpredictable. For example, social networking or gaming apps may experience peak hours when most users are active, or events like launches that demand much higher throughput. This creates a need to scale database resources in response to usage levels.
Amazon DynamoDB uses the Application Auto Scaling service to adjust provisioned throughput capacity dynamically in response to actual traffic patterns. This enables a table or global secondary index to handle sudden increases in traffic without throttling, by increasing its provisioned read and write capacity. Application Auto Scaling decreases throughput as workload decreases, so users don’t pay for unused provisioned capacity.
- DynamoDB auto scaling is enabled by default for users who create a table or a global secondary index with the AWS management console, but auto scaling settings can be modified at any time. Read Using the AWS Management Console with DynamoDB Auto Scaling for more information.
- When a table or global table replica is deleted, this does not automatically delete any associated scaling policies, scalable targets, or CloudWatch alarms.
These steps summarize the autoscaling process:
- Create an application autoscaling policy
- Set up autoscaling in provisioned capacity mode and alter assigned DynamoDB capacity units based on application load
- Consumed capacity metrics publish to Amazon CloudWatch
- Amazon CloudWatch triggers an alarm if the table’s consumed capacity exceeds or falls below target utilization for a sustained period; the target utilization can be configured anywhere between 20% and 90% of the table’s provisioned capacity units
- The CloudWatch alarm can be viewed on the console and via notifications using Amazon Simple Notification Service (Amazon SNS), and invokes Application Auto Scaling to evaluate the policy
- Application Auto Scaling adjusts the table’s provisioned throughput by issuing an UpdateTable request
- DynamoDB processes the request, increasing or decreasing provisioned throughput capacity dynamically to achieve target utilization
- As the table load drops, AWS scales down capacity
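The target-tracking logic behind the steps above can be approximated in a few lines. This is a simplified sketch, not AWS’s actual algorithm: autoscaling aims to keep consumed capacity at a target percentage of provisioned capacity, clamped to the configured minimum and maximum.

```python
# Simplified target-tracking calculation for DynamoDB autoscaling.
import math

def desired_capacity(consumed: float, target_pct: int,
                     min_units: int, max_units: int) -> int:
    """Provisioned units needed so consumed/provisioned ~= target_pct/100,
    clamped to the configured scaling range."""
    needed = math.ceil(consumed * 100 / target_pct)
    return max(min_units, min(max_units, needed))

# A table targeting 70% utilization, allowed to scale between 5 and 500 units:
lunch_spike = desired_capacity(210, 70, 5, 500)  # scales up to 300 units
overnight = desired_capacity(7, 70, 5, 500)      # scales down to 10 units
```

The sketch also shows the robot’s two limits mentioned below: it only reacts after consumption has already changed, and a spike beyond `max_units` (or faster than the alarm period) still throttles.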
It is important to remember that autoscaling isn’t magic. Rather, it’s simply a robot managing capacity for you. However, that robot has two fundamental problems. First, it can only add capacity so fast. Second, its knowledge about the traffic patterns is very limited. As a result, DynamoDB autoscaling cannot deal with unexpected large changes in capacity. Moreover, it is dangerous with respect to costs. The perils of DynamoDB autoscaling are outlined in detail in the DZone article DynamoDB Autoscaling Dissected: When a Calculator Beats a Robot.
DynamoDB Use Cases
App development. DynamoDB enables the development of internet-scale applications that need user-content metadata and caches requiring high concurrency, connections for millions of users, and millions of requests per second.
Media metadata store creation. Scale concurrency and throughput for entertainment and media workloads such as interactive content and real-time video streaming, and deliver lower latency with replication across multiple AWS Regions.
Seamless retail experiences. Use design patterns to deploy customer profiles, inventory tracking, shopping carts, and workflow engines. DynamoDB can handle millions of queries per second and supports extreme-scaled, high-traffic events.
Scale gaming platforms. Build out your game platform with session history, player data, and leaderboards for millions of concurrent users. Fuel innovation while reducing operational overhead.
DynamoDB to SQL
It is much more common for teams to move from a SQL database to DynamoDB (NoSQL) than from DynamoDB to a SQL database. Factors that influence this decision include the need for joins, ACID compliance, and scaling.
For a detailed look at SQL vs NoSQL, see this NoSQL guide or read the white paper examining the architectural differences between SQL and NoSQL.
How to Connect to DynamoDB Using Java?
The DynamoDB developer guide provides instructions and examples for connecting applications to DynamoDB using Java.
It covers topics such as:
- Setting Your AWS Credentials
- Setting the AWS Region and Endpoint
- Working with Items and Attributes
- Working with Tables and Data in DynamoDB
- Query Operations in DynamoDB
- Working with Scans in DynamoDB
- Improving Data Access with Secondary Indexes
How to Connect to DynamoDB Using Python?
The DynamoDB developer guide provides an example of how to use the AWS SDK for Python (Boto) to build and test a Tic-Tac-Toe application that connects to DynamoDB using Python and makes the necessary DynamoDB calls to store game data in a DynamoDB table. It also shows how to use the Python web framework Flask to illustrate end-to-end application development with DynamoDB, including how to model data.
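A minimal sketch of the kind of call such an application makes, expressed as low-level Query parameters; the table, key attribute, and game id are hypothetical stand-ins for the tutorial’s own names:

```python
# Low-level Query request: fetch items for one partition key value.
query_params = {
    "TableName": "TicTacToe",                       # illustrative table name
    "KeyConditionExpression": "game_id = :g",       # partition key condition
    "ExpressionAttributeValues": {":g": {"S": "game-42"}},
    "ConsistentRead": False,  # eventually consistent reads cost half as much
}
# With boto3: boto3.client("dynamodb").query(**query_params)
# The higher-level resource API (boto3.resource("dynamodb").Table(...))
# accepts plain Python values instead of the {"S": ...} type descriptors.
```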
List of Open Source Alternatives to DynamoDB
ScyllaDB (the fastest NoSQL database) is an open source NoSQL database for data-intensive apps that require high performance and low latency. ScyllaDB offers an Amazon DynamoDB-compatible API that enables seamless migration to ScyllaDB for significant cost savings as well as performance improvements [see the ScyllaDB-DynamoDB benchmark]. It frees teams from lock-in by enabling them to deploy as a service, on-premises, or in any public cloud.
Apache Cassandra is among the leading NoSQL platforms powering many modern applications while lowering the overall cost of ownership by offering high performance and scalability, continuous availability, strong security, and operational simplicity.
FerretDB (previously MangoDB) is an open-source proxy using PostgreSQL as a database engine which converts MongoDB wire protocol queries to SQL.
HBase is an open source, distributed non-relational database written in Java and modeled after Google’s BigTable.
OrbitDB is a distributed, serverless, peer-to-peer database that uses IPFS and IPFS Pubsub as data storage and to automatically sync databases with peers, respectively.
RethinkDB is an open-source, scalable database that pushes JSON to apps in real-time.
Titan is a scalable graph database optimized for querying and storing graphs holding hundreds of billions of edges and vertices distributed throughout a multi-machine cluster.
UnQLite is a software library and document store database similar to Redis, MongoDB, and CouchDB which implements a serverless, self-contained, transactional, zero-configuration NoSQL database engine.