Extreme Elasticity with Tablets, Raft and Kubernetes

Maciej Zimnoch, 10 minutes

In this Monster Scale Summit Presentation

Recent ScyllaDB versions have improved elasticity through Tablets and Raft-based Consistent Topology Changes, which allow fast bootstrapping and parallel scaling. A demo shows a cluster doubling in size, then autoscaling once disk utilization crosses 90%.

Maciej Zimnoch, Senior Software Engineer, ScyllaDB

Maciej Zimnoch is a core developer of ScyllaDB Operator – a Kubernetes Operator for ScyllaDB.

Video Transcript

Maciej Zimnoch shows how tablets, Raft, and the Scylla Operator give ScyllaDB elasticity on Kubernetes. Previously, scaling meant adding one node at a time, coordinating over gossip, and streaming the full data set, which took hours or days. Raft now reaches consensus in seconds, and nodes start serving before the background stream finishes. A demo doubles cluster size in about a minute and rebalances within three. A second demo auto‑scales when disk utilization hits 90%, keeping utilization high while cutting waste.

Topics discussed

What single‑node bootstrap with gossip and full streaming looks like in pre‑tablet ScyllaDB clusters
How tablets, Raft, and consistent topology changes let nodes join in seconds and scale in parallel
How the Scylla Operator on Kubernetes doubles cluster size in one minute and finishes load balancing in three
When a Horizontal Pod Autoscaler adds one node per rack as soon as disk exceeds 90% utilization
Why tablets allow safe 90% SSD usage, lowering hardware costs while keeping clusters responsive
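
The autoscaling rule described in the topics above can be sketched in a few lines of Python. This is only an illustration of the decision, not the Scylla Operator's or the Horizontal Pod Autoscaler's actual logic: the `Rack` type, the `plan_scale_out` function, and the choice to trigger on the busiest rack are all assumptions made for the example. Only the 90% threshold and the "one node per rack" step come from the talk.

```python
from dataclasses import dataclass

# From the talk: scale out once disk utilization exceeds 90%.
DISK_THRESHOLD = 0.90


@dataclass
class Rack:
    """Hypothetical view of one rack; not a Scylla Operator type."""
    name: str
    nodes: int
    disk_utilization: float  # fraction of disk in use, 0.0 to 1.0


def plan_scale_out(racks, threshold=DISK_THRESHOLD):
    """Return the desired node count per rack.

    Assumption: if any rack's disk utilization is strictly above the
    threshold, every rack gains one node so the racks stay balanced.
    Otherwise the node counts are returned unchanged.
    """
    if any(rack.disk_utilization > threshold for rack in racks):
        return {rack.name: rack.nodes + 1 for rack in racks}
    return {rack.name: rack.nodes for rack in racks}


if __name__ == "__main__":
    cluster = [
        Rack("rack-a", nodes=1, disk_utilization=0.92),
        Rack("rack-b", nodes=1, disk_utilization=0.85),
        Rack("rack-c", nodes=1, disk_utilization=0.88),
    ]
    print(plan_scale_out(cluster))
```

In this sketch a single rack above the threshold is enough to grow all racks by one node, which mirrors the per-rack step size mentioned in the talk; a real autoscaler would read the metric from monitoring rather than from in-memory values.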

Takeaways

Tablets and Raft replace gossip and per‑node streaming, so new nodes reach quorum in seconds and start coordinating traffic even while data backfills, slashing bootstrap overhead.
A Kubernetes‑managed cluster scales horizontally in parallel; the demo adds three nodes, doubles capacity in about 60 seconds, and rebalances traffic by the three‑minute mark with only minor latency upticks.
Disk‑aware autoscaling combines tablets’ ability to run safely at 90% disk utilization with the HPA to add capacity without manual action, cutting scaling lag from hours to minutes and avoiding under‑used SSDs.
Combining rapid node readiness with background file‑based streaming keeps the cluster online during expansions, making maintenance windows shorter and protecting P95/P99 latency.
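
To make the point about under-used SSDs concrete, here is a small back-of-the-envelope calculation in Python. The helper `nodes_needed` and all the numbers in the example (100 TB of data, 10 TB of disk per node, and a 50% target used as the comparison point) are assumptions chosen for illustration; only the 90% target comes from the talk. Replication and compaction overhead are ignored.

```python
import math


def nodes_needed(data_tb, disk_per_node_tb, target_utilization):
    """Smallest node count that keeps disk usage at or below the target.

    Illustrative arithmetic only: usable space per node is the raw disk
    size multiplied by the target utilization fraction.
    """
    usable_per_node_tb = disk_per_node_tb * target_utilization
    return math.ceil(data_tb / usable_per_node_tb)


if __name__ == "__main__":
    # Hypothetical workload: 100 TB of data on nodes with 10 TB of disk each.
    for target in (0.5, 0.9):
        count = nodes_needed(100, 10, target)
        print(f"target {target:.0%}: {count} nodes")
```

Under these assumed numbers, a 50% target needs 20 nodes while a 90% target needs 12, which is the kind of saving the takeaway refers to.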

Top takeaway: Tablets and Raft let ScyllaDB clusters on Kubernetes double capacity in under three minutes while keeping latency in check.