
DynamoDB Throttling

DynamoDB Throttling Definition

DynamoDB throttling occurs when the number of requests to a DynamoDB table or global secondary index exceeds the provisioned read or write capacity units (RCUs/WCUs) allocated to it. When this happens, DynamoDB rejects the excess requests and returns a ProvisionedThroughputExceededException error — a DynamoDB throttling exception that applications must handle explicitly. Retry logic with exponential backoff is the standard response.

Throttling is also enforced at the partition level, not just the table level. DynamoDB partition throttling is one of the most operationally disruptive forms: a single hot partition can exhaust its share of provisioned throughput, or hit the per-partition maximum of 3,000 RCUs or 1,000 WCUs, and be throttled even when the table as a whole has unused capacity. This is why DynamoDB throttled requests can show up on a table that, at the aggregate level, appears to be operating within its provisioned limits.

DynamoDB’s on-demand capacity mode defers provisioning to AWS but does not eliminate throttling. DynamoDB on-demand throttling occurs when requests exceed twice the table’s previous peak throughput, a threshold that surprises teams running irregular or bursty workloads. To avoid AWS DynamoDB throttling entirely, teams typically need to either over-provision significantly (using on-demand Warm Throughput to prepare for a new baseline load) or redesign their access patterns.

Database cache architecture is essentially a buffering technique: it stores frequently queried data in temporary memory so that it can be accessed more readily, which reduces database workload. For example, a user may need to retrieve their profile from the database to use the system. The first request has to travel from server to server, but the profile is then stored closer to the user, which reduces the time needed to read it the next time it is queried.

However, database cache systems are not as simple as they are often made out to be. In fact, they can be one of the more problematic components of a distributed application architecture. If your data infrastructure relies on caches, your overall approach may be subject to the downsides of caching.

 

Chart showing DynamoDB throttling

DynamoDB Throttling FAQs

How does DynamoDB throttling work?

DynamoDB throttling is enforced using a token bucket algorithm applied at the partition level. Each partition holds a bucket of tokens representing available read and write capacity. Each request consumes tokens proportional to the number of RCUs or WCUs required. When a partition’s token bucket runs dry, new requests are rejected with a ProvisionedThroughputExceededException. In provisioned mode, AWS DynamoDB throttling begins at the partition level before the table-level limit is reached if access is uneven. In on-demand mode, DynamoDB on demand throttling begins when traffic spikes exceed 2x the previous peak.
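The token bucket behavior described above can be illustrated with a small sketch. This is a simplified model, not DynamoDB's actual implementation: the class name `TokenBucket`, its fields, and the refill rule are assumptions chosen for illustration, and one instance stands in for a single partition.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Simplified token bucket; one instance stands in for one partition."""
    rate: float              # capacity units refilled per second (assumed)
    capacity: float          # maximum units the bucket can hold (assumed)
    tokens: float = 0.0      # units currently available
    last_refill: float = 0.0 # timestamp of the last refill, in seconds

    def try_consume(self, units: float, now: float) -> bool:
        """Return True if the request is admitted, False if it is throttled."""
        # Refill according to elapsed time, never exceeding the capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if units <= self.tokens:
            self.tokens -= units
            return True
        # In DynamoDB the caller would receive a throttling error here.
        return False


bucket = TokenBucket(rate=100.0, capacity=100.0, tokens=100.0)
# Three 40-unit requests at the same instant: the third finds the bucket dry.
results = [bucket.try_consume(40.0, now=0.0) for _ in range(3)]
# Half a second later, 50 units have been refilled and the request is admitted.
later = bucket.try_consume(40.0, now=0.5)
```

In this model a request is rejected as soon as the bucket holds fewer units than the request needs, which mirrors the "rejected before execution" behavior described for throttled requests.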

What are DynamoDB throttling limitations? 

DynamoDB throttling introduces several compounding operational burdens: partition-level hot spots that require partition key restructuring to resolve, DynamoDB throttling errors that must be handled in application code with carefully tuned retry logic, DynamoDB Global Secondary Index (GSI) throttling that operates independently for each global secondary index, DynamoDB on-demand throttling behavior that surprises teams during traffic spikes, and unpredictable cost exposure when throttled requests push teams to switch between provisioned and on-demand modes over time.

How does DynamoDB throttling compare to ScyllaDB?  

ScyllaDB does not use a provisioned capacity unit model and does not have anything like DynamoDB’s throttling mechanism. ScyllaDB does not impose hard limits on querying hot partitions. Instead, it routes each request directly to the CPU shard that owns the relevant data; here, throughput is bounded by hardware rather than configured quotas. Teams migrating from DynamoDB to ScyllaDB eliminate the DynamoDB throttle incidents that characterized their DynamoDB operations.

In DynamoDB vs ScyllaDB benchmarks, DynamoDB delivered as little as 16% of its provisioned throughput under skewed workloads; ScyllaDB sustained target throughput with single-digit millisecond P99 latencies throughout.

 

Key Components of DynamoDB Throttling Architecture  

DynamoDB’s throughput enforcement is built on several interdependent components that together determine when DynamoDB throttled requests occur and how the database responds.

Provisioned capacity mode assigns a fixed number of RCUs and WCUs to each table and distributes that allocation evenly across underlying partitions. As tables grow and partition splits occur, capacity is redistributed — but at any given time, each partition has a bounded throughput ceiling. Requests that push a single partition beyond its ceiling trigger throttling at the partition level, producing DynamoDB throttled read events and DynamoDB throttled write events independently.
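A short worked example may help show why a table can throttle while its aggregate usage looks healthy. The sketch below assumes a strictly even split of table capacity across partitions and ignores adaptive and burst capacity; the function names and the demand figures are hypothetical.

```python
def per_partition_share(table_units: float, partitions: int) -> float:
    """Even split of table-level capacity across partitions (simplified)."""
    return table_units / partitions


def throttled_partitions(table_units: float, demand: dict) -> dict:
    """Map each over-limit partition to the demand it cannot serve."""
    share = per_partition_share(table_units, len(demand))
    return {p: d - share for p, d in demand.items() if d > share}


# Hypothetical table: 2,000 units provisioned, 4 partitions, skewed demand.
demand = {"p0": 900.0, "p1": 50.0, "p2": 50.0, "p3": 50.0}
total_demand = sum(demand.values())        # 1,050 units, well under 2,000
over = throttled_partitions(2000.0, demand)  # only p0 exceeds its 500-unit share
```

Here the table uses roughly half of its provisioned capacity, yet partition `p0` exceeds its 500-unit share by 400 units and would be throttled under this simplified model.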

Adaptive capacity is DynamoDB’s mechanism for redistributing unused capacity from cold partitions to hot ones. It reduces the impact of uneven access patterns over time, but it responds to imbalances over a period of minutes, which is far too slow to prevent throttling from sudden bursts. Adaptive capacity is useful for gradual skew but does not protect against traffic spikes or rapidly shifting hot partitions.

On-demand capacity mode removes the explicit provisioning requirement, allowing DynamoDB to scale partitions automatically. DynamoDB on-demand throttling is still possible when traffic exceeds 2x the previous peak throughput in a short window. For workloads with extreme or irregular spikes (seasonal traffic, event-driven architectures), the on-demand throttling window presents a real operational risk.
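The 2x-previous-peak rule can be expressed as a simple check. This sketch only encodes the rule as stated in this article; the function names are hypothetical, and the real service also takes the time window of the spike into account, which is not modeled here.

```python
def on_demand_ceiling(previous_peak: float, multiplier: float = 2.0) -> float:
    """Throughput above which on-demand throttling becomes possible."""
    return previous_peak * multiplier


def spike_at_risk(previous_peak: float, incoming_rate: float) -> bool:
    """True if the incoming rate exceeds twice the previous peak."""
    return incoming_rate > on_demand_ceiling(previous_peak)


# Hypothetical table whose previous peak was 10,000 requests per second.
safe_spike = spike_at_risk(10_000, 18_000)   # below the 20,000 ceiling
risky_spike = spike_at_risk(10_000, 25_000)  # above the 20,000 ceiling
```

A check like this could be run against a traffic forecast before a planned event to decide whether pre-warming is worth considering.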

DynamoDB GSI throttling applies independently to each global secondary index. Each GSI maintains its own partition structure and token bucket allocation. A table with three GSIs has four independent throttling surfaces. High query rates against a specific GSI can produce DynamoDB throttled read events on that index while the base table operates without issue, making per-index throughput monitoring essential in production.

GSI throttling can also affect the base table. When a GSI lacks the write capacity to keep up with the updates propagated from the base table, replication to the index lags, and DynamoDB throttles the base table’s writes until the GSI replication catches up with the incoming write workload.

ScyllaDB vs DynamoDB Throttling: Predictable Throughput Without Capacity Planning 

DynamoDB’s throttling model forces a recurring capacity planning decision: overprovision to absorb peaks (and pay for unused capacity), or provision tightly and accept throttling errors (and pay with engineering time and application complexity). Neither option is free, and for teams operating at scale, both costs compound.

ScyllaDB’s architecture eliminates this decision. Each CPU core owns a shard of the data and processes requests for that shard independently. The concepts of RCUs, WCUs, and token buckets do not apply. Throughput is bounded by hardware (disk I/O, network bandwidth, and CPU capacity) rather than by configurable quotas that can be exhausted at arbitrary traffic levels.

In practice, this means teams running ScyllaDB do not monitor per-partition or per-GSI capacity consumption and do not restructure partition key design to avoid hot partition throttling. This avoids the engineering overhead that DynamoDB throttling demands (retry tuning, jitter configuration, idempotency enforcement on retried writes).

For workloads with highly variable traffic, ScyllaDB’s hardware-based model can be a better fit. DynamoDB on-demand throttling can kick in if traffic suddenly spikes beyond recent peaks. ScyllaDB can service requests up to whatever the cluster hardware allows, and it can add capacity faster than DynamoDB.

Since ScyllaDB is API-compatible with DynamoDB, applications using AWS SDK DynamoDB clients can point at a ScyllaDB cluster with minimal code changes. Teams can test existing workloads against ScyllaDB before committing to migration, making the throughput comparison verifiable rather than theoretical.
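As a rough illustration of the "minimal code changes" point, the sketch below redirects a boto3 DynamoDB resource to a ScyllaDB endpoint by overriding `endpoint_url`. The host name, the port 8000, and the placeholder credentials are assumptions for illustration; the actual values depend on how the cluster's DynamoDB-compatible API is configured and secured.

```python
def alternator_resource_kwargs(host: str, port: int = 8000) -> dict:
    """Build boto3 keyword arguments that target a ScyllaDB endpoint.

    The port and the placeholder credentials below are assumptions; use the
    values configured for your own cluster.
    """
    return {
        "endpoint_url": f"http://{host}:{port}",
        "region_name": "None",            # placeholder, hypothetical
        "aws_access_key_id": "None",      # placeholder, hypothetical
        "aws_secret_access_key": "None",  # placeholder, hypothetical
    }


def connect(host: str):
    """Create a DynamoDB resource pointed at the given host."""
    import boto3  # third-party dependency, imported lazily

    return boto3.resource("dynamodb", **alternator_resource_kwargs(host))


# Hypothetical host name; only the endpoint differs from a normal AWS setup.
kwargs = alternator_resource_kwargs("scylla.example.internal")
```

The rest of the application code (table lookups, `put_item`, `get_item`, and so on) would stay as it is, which is what makes a side-by-side throughput test practical.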

How DynamoDB Throttling Exceptions Work 

When DynamoDB throttles a request, it returns a ProvisionedThroughputExceededException to the caller. This is a DynamoDB throttling exception that signals the request was rejected before execution — no data was written or read, and the operation is safe to retry. AWS SDKs implement automatic retry with exponential backoff by default, but the default configuration is rarely appropriate for production workloads at scale.

The core challenge of handling DynamoDB throttled requests in production is avoiding a thundering herd: when many application instances retry simultaneously after being throttled, they collectively saturate the partition’s token bucket again immediately, perpetuating the throttle cycle. Production implementations add jitter to retry intervals. AWS documentation recommends full jitter or decorrelated jitter as the most effective approaches for high-concurrency workloads.
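The full jitter approach can be sketched as follows. This is an illustration of the idea rather than the AWS SDK's retry implementation: the `Throttled` exception is a stand-in for a throttling error, and the base delay, cap, and attempt count are assumed values that would need tuning.

```python
import random
import time


class Throttled(Exception):
    """Stand-in for a throttling error such as ProvisionedThroughputExceededException."""


def full_jitter_delay(attempt: int, base: float = 0.05, cap: float = 5.0, rng=random) -> float:
    """Pick a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def call_with_backoff(operation, is_throttle, max_attempts: int = 5,
                      sleep=time.sleep, rng=random):
    """Retry `operation` on throttling errors, sleeping with full jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            # Re-raise anything that is not a throttle, or the final failure.
            if not is_throttle(exc) or attempt == max_attempts - 1:
                raise
            sleep(full_jitter_delay(attempt, rng=rng))
```

Because each client draws its delay at random from the whole interval, retries from many instances spread out instead of arriving together, which is the property that breaks the thundering herd cycle.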

DynamoDB write throttled requests require particular care when designing application logic. Although the ProvisionedThroughputExceededException confirms rejection and is safe to retry, engineers must still apply the general rule for write operations: audit non-idempotent patterns (like conditional writes or side-effecting updates) and explicitly confirm the SDK’s retry behavior is safe. This practice prevents the risk of duplicate writes when a failure is ambiguous (e.g., due to a network timeout).
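One common way to make a side-effecting write safe to retry is to attach a client-generated request ID and apply each ID at most once. The sketch below uses an in-memory class as a stand-in for a table; `CounterStore` and its methods are hypothetical and are not a DynamoDB API.

```python
import uuid


class CounterStore:
    """In-memory stand-in for a table holding one counter (illustrative only)."""

    def __init__(self):
        self.value = 0
        self.applied = set()  # request IDs that have already taken effect

    def increment(self, amount: int, request_id: str) -> int:
        """Apply the increment once per request ID, however often it is retried."""
        if request_id in self.applied:
            return self.value  # duplicate retry: no second side effect
        self.applied.add(request_id)
        self.value += amount
        return self.value


store = CounterStore()
request_id = str(uuid.uuid4())
store.increment(5, request_id)
# The client never saw the response (for example, a timeout) and retries.
store.increment(5, request_id)
```

After the retry the counter still holds 5, not 10. In a real table the same effect is usually achieved by recording the request ID alongside the item and making the write conditional on that ID not being present.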

How ScyllaDB Handles Throughput Without Throttling 

ScyllaDB uses a preemptive task scheduler built on the Seastar framework, which allocates CPU time across concurrent requests without enforcing any external capacity quota. Each shard processes its own requests in a non-blocking event loop, and the scheduler preempts long-running tasks to maintain responsiveness. ScyllaDB avoids artificial throughput ceilings, but it does use backpressure and load shedding to maintain system stability. Rather than returning a ProvisionedThroughputExceededException when a quota is exhausted, ScyllaDB prioritizes requests using its internal scheduler.

For workloads with mixed read and write traffic, ScyllaDB’s I/O scheduler supports configurable shares per workload class, providing priority-based throughput allocation. Background operations like compaction run at reduced priority and do not directly compete with client-facing requests. Heavy maintenance operations slow down gracefully instead of producing hard errors equivalent to DynamoDB throttling errors.

At hardware saturation, ScyllaDB increases tail latency rather than rejecting requests. For most production workloads, p99 latency increases as a leading indicator, giving teams time to scale before request errors occur. The operational profile is fundamentally different from DynamoDB’s hard-limit throttling model, where requests are rejected without warning when partition capacity is exhausted.

DynamoDB Throttling Limitations Explained  

DynamoDB distributes provisioned capacity evenly across partitions, but production traffic is rarely uniform. A single high-traffic partition key can exhaust its partition’s capacity share and be throttled even when the overall table has available throughput. Even though DynamoDB dynamically reallocates throughput to hot partitions, this reallocation can take time to be reflected in available capacity. Resolving hot partition throttling typically requires restructuring partition keys or implementing write sharding, both of which require application changes and can be time-consuming to deploy safely.
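Write sharding can be sketched as appending a calculated suffix to a hot partition key so that writes spread across several partitions. The key format (`base#suffix`), the shard count, and the function names below are assumptions for illustration.

```python
import hashlib


def sharded_key(base_key: str, item_id: str, shards: int = 10) -> str:
    """Derive a stable shard suffix from the item ID and append it to the key."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    suffix = int.from_bytes(digest[:4], "big") % shards
    return f"{base_key}#{suffix}"


def all_shard_keys(base_key: str, shards: int = 10) -> list:
    """Every key a reader must query to reassemble the logical partition."""
    return [f"{base_key}#{i}" for i in range(shards)]


# Hypothetical hot key: all events for one day would otherwise share a partition.
write_key = sharded_key("events-2024-06-01", "order-12345")
read_keys = all_shard_keys("events-2024-06-01")
```

The trade-off is on the read side: a query for the logical key now has to fan out over every shard key and merge the results, which is part of why this change is costly to roll out.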

DynamoDB on-demand throttling still occurs when requests exceed 2x the table’s previous peak throughput. Workloads with infrequent but large traffic spikes (seasonal applications, event-driven architectures, batch jobs) can reach this threshold unexpectedly and with limited warning unless the table’s Warm Throughput was raised ahead of time.

DynamoDB GSI throttling on a heavily queried index can degrade specific query patterns while the base table operates normally. Tables with multiple GSIs require separate per-index capacity monitoring and tuning, multiplying the operational footprint of capacity management and making root cause diagnosis for throttle incidents non-obvious.

Handling DynamoDB throttling errors correctly at high request rates requires careful configuration of retry counts, delay intervals, jitter algorithms, and idempotency guarantees for retried writes. This complexity scales with the number of tables and write patterns in the system. At organizational scale, ensuring consistent retry correctness across all services and teams becomes a significant engineering governance challenge.

How Much Does DynamoDB Throttling Cost? 

DynamoDB’s cost is based on capacity units, either allocated or consumed depending on the capacity mode. In provisioned capacity mode, you pay for allocated RCUs and WCUs regardless of utilization. Under-provisioning minimizes the capacity bill but introduces DynamoDB throttling errors and the engineering cost of handling them; over-provisioning eliminates throttling but results in paying for capacity that is never consumed.

On-demand mode charges per request at a higher per-unit rate than provisioned capacity. For sustained, high-throughput workloads, it is typically more expensive than provisioned mode with reserved capacity. The DynamoDB throttling limit in on-demand mode is determined by the 2x-previous-peak rule rather than an explicit setting, making it harder to predict and plan around.
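The trade-off between the two modes comes down to simple arithmetic. The sketch below uses made-up placeholder prices, not AWS list prices, and assumes each request consumes exactly one capacity unit; substitute current pricing for your region before drawing conclusions.

```python
HOURS_PER_MONTH = 730  # common billing approximation


def provisioned_monthly_cost(units: float, price_per_unit_hour: float) -> float:
    """Cost of allocated capacity, paid regardless of utilization."""
    return units * price_per_unit_hour * HOURS_PER_MONTH


def on_demand_monthly_cost(requests_per_second: float, price_per_million: float) -> float:
    """Cost of requests actually served, assuming one unit per request."""
    requests = requests_per_second * 3600 * HOURS_PER_MONTH
    return requests / 1_000_000 * price_per_million


# Placeholder prices (hypothetical): 0.001 per unit-hour, 1.00 per million requests.
provisioned = provisioned_monthly_cost(1000, 0.001)
on_demand_sustained = on_demand_monthly_cost(1000, 1.0)  # 1,000 req/s all month
on_demand_light = on_demand_monthly_cost(100, 1.0)       # 100 req/s all month
```

With these placeholder numbers, provisioning 1,000 units costs 730 per month, serving a sustained 1,000 requests per second on demand costs 2,628, and serving 100 requests per second on demand costs about 263. The pattern, not the figures, is the point: on-demand tends to win at low utilization and lose at sustained high throughput.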

Reserved capacity is available for provisioned-mode tables, offering 1- and 3-year commitments at reduced rates. This lowers steady-state cost but creates planning risk: workloads that scale faster than anticipated require purchasing additional capacity at full price or accepting throttling until the next reservation cycle.

ScyllaDB Costs vs DynamoDB Pricing

ScyllaDB Cloud pricing is based on the cluster: you pay for compute and storage of cluster nodes, not for individual read or write operations. For workloads with stable throughput requirements, this model is typically more predictable and more cost-efficient than DynamoDB’s per-unit pricing. There is no equivalent of DynamoDB throttling limit management — capacity is provisioned at the node level and scales linearly with hardware.

To compare pricing on different workload patterns, see the DynamoDB cost calculator.


At high throughput (hundreds of thousands of requests per second sustained), the cost difference is significant. DynamoDB costs scale linearly with request volume; ScyllaDB costs scale with cluster size. Teams that have modeled costs at production scale consistently find that ScyllaDB reduces database spend while also eliminating the engineering overhead of throttle management and capacity planning. ScyllaDB’s DynamoDB-compatible API allows cost modeling using existing application code before any migration commitment is made.
