How We Made Large Partition Scans Over Two Times Faster

Botond Denes, Software Engineer, ScyllaDB (20:53)

Fetching a large amount of data in a single query is a longstanding pain point for applications. Queries that return a significant amount of data have to be paged, in other words, split into multiple subqueries that return the data little by little. In both ScyllaDB and Apache Cassandra, paging is stateless: each subquery is independent of the others and can even be sent to a different replica. Because of that, the work done for previous subqueries is not reused, which reduces throughput below the expected maximum. In this talk we examine the problems with the previous stateless paging implementation and introduce the new stateful paging implementation, which brings vast improvements in the throughput of large partition scans.
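
To make the difference concrete, here is a minimal toy sketch in Python. It is not ScyllaDB code: the function names, the list-backed "partition", and the cost model (a fresh reader that skips linearly from the start of the partition on every page) are all assumptions chosen only to make the repeated work visible. A real replica can seek using indexes, but it still pays to set up its readers again for each stateless page.

```python
from typing import List, Optional, Tuple


def stateless_scan(partition: List[int], page_size: int) -> Tuple[List[int], int]:
    """Toy model: every page builds a fresh reader and skips to the resume point."""
    rows: List[int] = []
    rows_touched = 0                      # hypothetical cost counter
    last_key: Optional[int] = None        # stands in for the paging state
    while True:
        page: List[int] = []
        for key in partition:             # fresh reader starts from the beginning
            rows_touched += 1
            if last_key is not None and key <= last_key:
                continue                  # work repeated on every page
            page.append(key)
            if len(page) == page_size:
                break
        if not page:
            break
        rows.extend(page)
        last_key = page[-1]
        if len(page) < page_size:         # short page: partition exhausted
            break
    return rows, rows_touched


def stateful_scan(partition: List[int], page_size: int) -> Tuple[List[int], int]:
    """Toy model: the reader is kept alive between pages and simply resumes."""
    rows: List[int] = []
    rows_touched = 0
    reader = iter(partition)              # saved reader, reused across pages
    while True:
        page: List[int] = []
        for key in reader:                # continues where the last page stopped
            rows_touched += 1
            page.append(key)
            if len(page) == page_size:
                break
        if not page:
            break
        rows.extend(page)
        if len(page) < page_size:
            break
    return rows, rows_touched
```

Both functions return the same rows. Under this assumed cost model, the stateless variant touches more rows as the partition grows relative to the page size, while the stateful variant touches each row once.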
