Lock-free Memory Management for Multi-core Database Systems

Memory management is a time-consuming task for NoSQL datastores, at both the software and administrator levels. Typical NoSQL datastores must contend with memory management at the JVM level and the kernel level, resulting in lock-heavy code paths that fail to take advantage of multi-core hardware.

How Scylla outperforms the Cassandra row cache

The Scylla row cache, unlike the original Apache Cassandra cache, is designed to reconcile data in cache with incoming writes. The Apache Cassandra row cache invalidates the whole cached partition for a given table on every write; Scylla’s does not. As a result, Scylla can run mixed read/write workloads efficiently. This reduces the need for complex data models that exist only to work around the Apache Cassandra read-before-write problem. Simpler data models can also indirectly save storage bandwidth.
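
The difference between the two write policies can be sketched in a few lines of Python. This is a toy model, not code from either database: the class names, the `populate`/`write`/`read` methods, and the dictionary layout are all assumptions made for illustration.

```python
from typing import Dict, Optional

Row = Dict[str, str]          # column name -> value
Partition = Dict[str, Row]    # clustering key -> fully cached row


class InvalidatingCache:
    """Toy cache that drops the whole cached partition on any write."""

    def __init__(self) -> None:
        self.partitions: Dict[str, Partition] = {}

    def populate(self, pk: str, rows: Partition) -> None:
        # Simulates rows being loaded into the cache after reads from disk.
        self.partitions[pk] = {ck: dict(row) for ck, row in rows.items()}

    def write(self, pk: str, ck: str, row: Row) -> None:
        # Every cached row of the partition is lost, even untouched ones.
        self.partitions.pop(pk, None)

    def read(self, pk: str, ck: str) -> Optional[Row]:
        # None means a cache miss: the caller would have to go to disk.
        partition = self.partitions.get(pk)
        return None if partition is None else partition.get(ck)


class ReconcilingCache(InvalidatingCache):
    """Toy cache that merges an incoming write into the cached row."""

    def write(self, pk: str, ck: str, row: Row) -> None:
        cached = self.partitions.get(pk, {}).get(ck)
        if cached is not None:
            # Only rows that are already fully cached are updated in place;
            # uncached rows are left to be loaded on their next read.
            cached.update(row)
```

With both caches populated with rows `r1` and `r2` of one partition, a write to `r1` leaves the invalidating cache with a miss on `r2`, while the reconciling cache still serves `r2` and returns the updated `r1`.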

The Apache Cassandra row cache caches only the head of a partition, with a configurable number of rows cached. The Scylla cache is designed to cache arbitrary rows from a partition. A near-future Scylla release will evict data from the row cache gradually under memory pressure, starting with the least recently used data.
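
As a rough illustration of row-granular caching with least-recently-used eviction, here is a minimal sketch. It is not Scylla's implementation: `LruRowCache`, its byte-based capacity, and the `(partition key, clustering key)` cache key are hypothetical choices for the example.

```python
from collections import OrderedDict
from typing import Optional, Tuple

RowKey = Tuple[str, int]  # (partition key, clustering key)


class LruRowCache:
    """Caches individual rows and evicts the least recently used first."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        # Ordered from least recently used (front) to most recent (back).
        self.rows: "OrderedDict[RowKey, bytes]" = OrderedDict()

    def get(self, pk: str, ck: int) -> Optional[bytes]:
        key = (pk, ck)
        value = self.rows.get(key)
        if value is not None:
            self.rows.move_to_end(key)  # mark as most recently used
        return value

    def put(self, pk: str, ck: int, value: bytes) -> None:
        key = (pk, ck)
        old = self.rows.pop(key, None)
        if old is not None:
            self.used_bytes -= len(old)
        self.rows[key] = value
        self.used_bytes += len(value)
        self._evict()

    def _evict(self) -> None:
        # Evict one row at a time until the cache fits its budget again,
        # rather than dropping a whole partition at once.
        while self.used_bytes > self.capacity_bytes and self.rows:
            _, victim = self.rows.popitem(last=False)
            self.used_bytes -= len(victim)
```

Because the cache key includes the clustering key, rows 1, 5, and 9 of the same partition can be cached without caching the rows between them, and memory pressure removes only the coldest row.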

Because the Cassandra row cache is ineffective for many workloads, some users take on the additional complexity of running with the row cache disabled and rely on the operating system page cache to serve reads. Scylla does not depend on the system page cache for caching on-disk data; it dedicates that memory to the application instead. Emphasizing the Scylla row cache over the OS page cache has several key advantages:

  • The OS page cache must store data in the on-disk (sstable) format. The sstable format consumes more memory, as cell names are repeated in each cell. The Scylla native in-memory format is denser and makes better use of memory.
  • Scylla does not need to parse cached data from sstable format to in-memory format before serving it, because the Scylla row cache already holds data in the needed format.
  • When reading involves multiple sstables, the system page cache will become polluted with stale data. The Scylla row cache holds only reconciled data, which also makes better use of database memory and lowers latency on cache hits.
  • When the requested data is significantly shorter than a page (4096 bytes), the OS page cache wastes memory caching unrelated data that happens to sit next to the requested data.
  • Apache Cassandra compaction thrashes the page cache, because it reads and writes everything, and after compaction the most frequently used data is likely to no longer be in the cache. Apache Cassandra has some workarounds for this problem, but the row cache is the most direct solution: compaction simply doesn’t touch the row cache, which remains populated with relevant data.
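
Two of the points above lend themselves to a small worked example: holding only reconciled data, and the cost of page granularity. The sketch below is illustrative only. `reconcile` and `page_utilization` are hypothetical helpers, reconciliation is simplified to last-write-wins on a per-cell timestamp with no tombstones or TTLs, and the utilization figure assumes the requested data starts on a page boundary.

```python
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, str]          # (write timestamp, value)
RowVersion = Dict[str, Cell]    # column name -> cell

PAGE_SIZE = 4096  # bytes, the page size quoted in the list above


def reconcile(versions: Iterable[RowVersion]) -> RowVersion:
    """Merge versions of one row, keeping the newest cell per column.

    Simplification: on equal timestamps the first version seen wins.
    """
    merged: RowVersion = {}
    for version in versions:
        for column, cell in version.items():
            current = merged.get(column)
            if current is None or cell[0] > current[0]:
                merged[column] = cell
    return merged


def page_utilization(requested_bytes: int, page_size: int = PAGE_SIZE) -> float:
    """Fraction of cached page memory that holds the requested data."""
    if requested_bytes <= 0:
        raise ValueError("requested_bytes must be positive")
    pages = -(-requested_bytes // page_size)  # ceiling division
    return requested_bytes / (pages * page_size)
```

For instance, if an older sstable holds `name` and `email` at timestamp 1 and a newer one holds only `name` at timestamp 2, a page cache keeps the pages of both files in memory, while a cache of reconciled rows keeps just the single merged row. Similarly, `page_utilization(128)` evaluates to 0.03125: a 128-byte read pins a full 4096-byte page, of which about 3% is the data that was asked for.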
