Memory management

Lock-free memory management for multicore systems

Memory management is a time-consuming task for NoSQL data stores, both at the software and at the administrator levels. Typical NoSQL data stores must contend with memory management at the JVM level and the kernel level, resulting in lock-heavy code paths that fail to take advantage of multicore hardware.

Scylla memory management

The Scylla row cache

The Scylla row cache, unlike the original Cassandra cache, is designed to reconcile data in cache with incoming writes. Cassandra’s row cache invalidates the whole partition for a given table on write, but Scylla’s does not. As a result, Scylla can run mixed read/write workloads efficiently. This reduces the need for data model complexity that exists only to work around the Cassandra read-before-write problem. A simpler data model can also indirectly save storage bandwidth.
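
The difference between invalidating on write and reconciling on write can be illustrated with a toy model. The sketch below is not Scylla or Cassandra code: the class names, the last-write-wins rule on integer timestamps, and the `storage_reads` helper are all assumptions made for illustration. It counts how often a mixed read/write loop has to go back to storage under each policy.

```python
def reconcile(old, new):
    """Last-write-wins on (timestamp, value) cells; an assumed merge rule."""
    return new if old is None or new[0] >= old[0] else old


class ToyStore:
    """Stand-in for on-disk data: partition key -> {row key: (timestamp, value)}."""

    def __init__(self):
        self.data = {}
        self.reads = 0  # number of times a partition was loaded from "disk"

    def write(self, pk, rk, ts, value):
        rows = self.data.setdefault(pk, {})
        rows[rk] = reconcile(rows.get(rk), (ts, value))

    def load(self, pk):
        self.reads += 1
        return dict(self.data.get(pk, {}))


class InvalidatingCache:
    """Drops the whole cached partition on every write."""

    def __init__(self, store):
        self.store = store
        self.partitions = {}

    def write(self, pk, rk, ts, value):
        self.store.write(pk, rk, ts, value)
        self.partitions.pop(pk, None)

    def read(self, pk, rk):
        if pk not in self.partitions:
            self.partitions[pk] = self.store.load(pk)  # cache miss
        return self.partitions[pk].get(rk)


class ReconcilingCache(InvalidatingCache):
    """Merges the incoming write into the cached partition instead."""

    def write(self, pk, rk, ts, value):
        self.store.write(pk, rk, ts, value)
        cached = self.partitions.get(pk)
        if cached is not None:
            cached[rk] = reconcile(cached.get(rk), (ts, value))


def storage_reads(cache_cls, rounds=5):
    """Alternate reads and writes on one row; return how often storage was hit."""
    store = ToyStore()
    cache = cache_cls(store)
    cache.write("user:1", "name", 0, "v0")
    for ts in range(1, rounds + 1):
        cache.read("user:1", "name")
        cache.write("user:1", "name", ts, f"v{ts}")
    return store.reads
```

In this model the invalidating cache reloads the partition on every read that follows a write, while the reconciling cache loads it once and then serves every later read from memory.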

Cassandra’s row cache caches only the head of a partition, where the number of rows cached is configurable. The Scylla cache is designed to enable caching of arbitrary rows from a partition. A near-future Scylla release will respond to memory pressure by evicting data from the row cache gradually, starting with the least recently used data.
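
A minimal sketch of row-granular, least-recently-used eviction follows. It is only an illustration of the idea, not Scylla's implementation: `LruRowCache`, its byte budget, and the use of Python's `OrderedDict` are assumptions. Rows are keyed individually, so any row of a partition can be cached, and eviction removes one row at a time until the cache fits its budget again.

```python
from collections import OrderedDict


class LruRowCache:
    """Hypothetical row cache with a byte budget and LRU eviction per row."""

    def __init__(self, budget_bytes):
        self.budget_bytes = budget_bytes
        self.used_bytes = 0
        # (partition key, row key) -> bytes; ordered from least to most recently used
        self.rows = OrderedDict()

    def get(self, pk, rk):
        key = (pk, rk)
        if key not in self.rows:
            return None
        self.rows.move_to_end(key)  # mark as most recently used
        return self.rows[key]

    def put(self, pk, rk, value):
        key = (pk, rk)
        if key in self.rows:
            self.used_bytes -= len(self.rows.pop(key))
        self.rows[key] = value
        self.used_bytes += len(value)
        # Evict gradually, one least recently used row at a time.
        while self.used_bytes > self.budget_bytes and self.rows:
            _, evicted = self.rows.popitem(last=False)
            self.used_bytes -= len(evicted)
```

For example, with a 30-byte budget and three cached 10-byte rows, reading the first row and then inserting a fourth evicts the second row, because it is now the least recently used.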

Because the Cassandra row cache is ineffective for many workloads, some users introduce the additional complexity of running with the row cache disabled and rely on the operating system page cache for serving reads. Scylla does not depend on the system page cache for caching on-disk data; it dedicates that memory to the application instead. Emphasizing the Scylla row cache over the OS page cache has several key advantages.

  • The OS page cache must store data in the on-disk format, sstables. The sstable format consumes more memory, as cell names are repeated in each cell. The Scylla native in-memory format is denser and makes better use of memory.

  • Scylla does not need to parse cached data from sstable format to in-memory format before serving it, because the Scylla row cache already holds data in the needed format.

  • When reading involves multiple sstables, the system page cache will become polluted with stale data. The Scylla row cache holds only reconciled data, which also makes better use of memory and lowers latency on cache hits.

  • When the requested data is significantly shorter than a page (4096 bytes), the OS page cache wastes memory caching unrelated data that happens to sit next to the requested data.
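
The page-granularity point in the last item can be made concrete with a little arithmetic. The helper names and the 100-byte row size below are assumptions chosen for illustration; only the 4096-byte page size comes from the text.

```python
PAGE_SIZE = 4096  # bytes, as stated above


def page_cache_bytes(offset, length, page_size=PAGE_SIZE):
    """Bytes the page cache holds to cover `length` bytes (> 0) starting at `offset`."""
    first_page = offset // page_size
    last_page = (offset + length - 1) // page_size
    return (last_page - first_page + 1) * page_size


def useful_fraction(offset, length, page_size=PAGE_SIZE):
    """Share of the cached bytes that the read actually asked for."""
    return length / page_cache_bytes(offset, length, page_size)
```

Under these assumptions, a 100-byte row at file offset 0 occupies one full 4096-byte page, so about 2.4% of the cached bytes are the requested data. If the same row starts at offset 4050 it straddles a page boundary, and two pages (8192 bytes) are cached to serve it.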

Cassandra compaction thrashes the page cache, because it reads and writes everything, and after compaction the most frequently used data is likely to no longer be in the cache. Cassandra has some workarounds for this problem, but the row cache is the most direct solution: compaction simply doesn’t touch the row cache, which remains populated with relevant data.