Streaming, the process of moving data to other nodes when scaling out or in, used to analyze every partition one by one, which was slow and depended on the schema. File-based streaming is a new feature that significantly optimizes tablet movement. It streams entire SSTable files without deserializing them into mutation fragments and re-serializing them back into SSTables on the receiving nodes. As a result, less data is streamed over the network and less CPU is consumed, especially for data models that contain small cells.
Asias He is a long-time open source developer who previously worked on the Debian Project, the Solaris kernel, KVM virtualization for Linux, and the OSv unikernel. He now works on Seastar and ScyllaDB.
Summary: Asias He explains ScyllaDB’s internal data streaming, which is used when adding or removing nodes and when migrating tablets. Traditional mutation-based streaming reads mutations from multiple SSTables, serializes them, and writes them back into SSTables on the receiving node. The new file-based approach streams whole SSTable files directly, which is possible because each SSTable belongs to a single tablet. This eliminates parsing, reduces CPU use and network bytes, and makes streaming up to 25× faster and 10× more bandwidth-efficient.
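The difference between the two approaches can be illustrated with a toy model. The sketch below is not ScyllaDB's implementation: JSON blobs stand in for the real SSTable format, and every function name (`make_sstable`, `mutation_based_stream`, `file_based_stream`, and so on) is hypothetical. It only shows that the mutation-based path has to parse every row and rewrite the data, while the file-based path forwards the files as opaque bytes.

```python
import json

# Toy stand-in for an SSTable: a bytes blob of sorted (key, value) rows.
# Assumption: real SSTables use a binary, schema-dependent format.
def make_sstable(rows):
    return json.dumps(sorted(rows.items())).encode()

def read_sstable(blob):
    return dict(json.loads(blob.decode()))

def mutation_based_stream(sender_sstables):
    """Parse every SSTable, merge the rows, and re-serialize on the receiver."""
    parsed_rows = 0
    merged = {}
    for blob in sender_sstables:
        for key, value in read_sstable(blob).items():
            merged[key] = value  # in this toy merge, later SSTables win
            parsed_rows += 1
    receiver_sstables = [make_sstable(merged)]
    return receiver_sstables, parsed_rows

def file_based_stream(sender_sstables):
    """Forward each SSTable file as opaque bytes; no row is parsed."""
    receiver_sstables = [bytes(blob) for blob in sender_sstables]
    return receiver_sstables, 0

def read_all(sstables):
    """Read path on the receiver: merge all SSTables in order."""
    merged = {}
    for blob in sstables:
        merged.update(read_sstable(blob))
    return merged

# Two SSTables that both belong to the same (hypothetical) tablet.
tablet = [make_sstable({"a": 1, "b": 2}), make_sstable({"b": 3, "c": 4})]

via_mutations, parsed_m = mutation_based_stream(tablet)
via_files, parsed_f = file_based_stream(tablet)

print("rows parsed (mutation-based):", parsed_m)
print("rows parsed (file-based):", parsed_f)
print("same data visible on receiver:", read_all(via_mutations) == read_all(via_files))
```

Both paths leave the receiver with the same readable data, but only the mutation-based path touches individual rows. In this toy model the per-row work is a dictionary update; in a real system it is the deserialization and re-serialization cost that the talk describes as dominant for data models with small cells.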
Takeaways
Top takeaway: File‑based streaming lets ScyllaDB move entire SSTables between nodes, cutting CPU work and network bytes and slashing streaming time by up to 25×.