
Scylla Architecture

Scylla is built on a modern NoSQL database architecture designed from the ground up for superior performance and high availability while offering API compatibility with Apache Cassandra and Amazon DynamoDB.

Compatibilities and Differences

Scylla is API compatible with both Apache Cassandra and DynamoDB, yet it provides significant advantages over both. The sections below summarize the key similarities and differences.

Compatibility with Cassandra

Scylla supports the same CQL commands and drivers and the same persistent SSTable storage format. It uses the same ring architecture and high availability model you’ve come to expect from Cassandra.


Key Differences from Cassandra

Unlike Apache Cassandra, Scylla was written in C++ using the Seastar framework, providing highly asynchronous operations and avoiding Garbage Collection (GC) stalls prevalent in Java. Scylla offers unique features like Incremental Compaction and Workload Prioritization that extend its capabilities beyond Cassandra.


Compatibility with DynamoDB

Scylla supports the same JSON-style queries and the same drivers. It runs over the same HTTP/HTTPS style connection. It can even simulate DynamoDB’s schemaless model via a clever use of maps.


Key Differences from DynamoDB

With Scylla you are not locked in to one cloud provider. Scylla can run on any cloud platform or on-premises. And if you want us to run Scylla for you, Scylla Cloud offers a fully managed database-as-a-service (NoSQL DBaaS) solution. You can even run Scylla Cloud on AWS Outposts.


Scylla is a next-generation distributed NoSQL database, written in C++ to take full advantage of low-level Linux primitives. Its architecture can be understood through both its physical and logical layers.


Project Circe

Scylla’s Project Circe is our 2021 initiative to improve Scylla by adding greater capabilities for consistency, performance, scalability, stability, manageability, and broader and easier deployment.


Scylla Server Architecture

Scylla’s cluster architecture and shard-per-core design
Cluster: The first layer to understand is the cluster, a collection of interconnected nodes organized into a virtual ring architecture across which data is distributed. All nodes are considered equal, in a peer-to-peer sense; with no designated leader, the cluster has no single point of failure. Clusters need not reside in a single location: data can be distributed across a geographically dispersed cluster using multi-datacenter replication.
Node: Nodes can be individual on-premises servers, or virtual servers (public cloud instances) comprised of a subset of the hardware on a larger physical server. The data of the entire cluster is distributed as evenly as possible across these nodes. Scylla also uses logical units, known as Virtual Nodes (vNodes), to distribute data for more even performance. A cluster can store multiple copies of the same data on different nodes for reliability.
Shard: Scylla divides data even further, creating shards by assigning a fragment of the total data in a node to a specific CPU core, along with its associated memory (RAM) and persistent storage (such as NVMe SSD). Shards operate as mostly independent units, a “shared nothing” design that greatly reduces contention and the need for expensive processing locks.
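The node-and-shard placement described above can be sketched in a few lines of Python. This is a simplified illustration, assuming a hash-based mapping of keys to nodes and cores; the hash function and placement arithmetic here are stand-ins, not Scylla’s production algorithm.

```python
# A minimal sketch of shard-per-core placement, assuming a simplified
# hash-based mapping (illustrative, not Scylla's actual algorithm).
import hashlib

NR_NODES = 3    # physical servers in the cluster
NR_SHARDS = 8   # CPU cores per node; each core owns one shard

def token_for(partition_key: str) -> int:
    """Hash a partition key to a 64-bit token (Scylla uses Murmur3)."""
    digest = hashlib.sha256(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big")

def place(partition_key: str) -> tuple:
    """Map a partition key deterministically to a (node, shard) pair."""
    token = token_for(partition_key)
    node = token % NR_NODES                   # which node owns the token
    shard = (token // NR_NODES) % NR_SHARDS   # which core on that node
    return node, shard

# Every request for the same key reaches the same core, so that core
# needs no locks to guard its slice of the data.
node, shard = place("user:42")
```

Because the mapping is deterministic, any client or replica can compute ownership locally without coordination, which is what makes the shared-nothing design lock-free in practice.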

Data Architecture

Scylla’s data model has led to it being called a “wide column” database, though we sometimes refer to it as a “key-key-value” database to reflect the partitioning and clustering keys.

This is an example of a few partitions in Scylla. Notice how each partition may have different numbers of rows.
Keyspace: The top-level container for data; it defines the replication strategy and replication factor (RF) for the data held within Scylla. For example, users may wish to store two, three or even more copies of the same data to ensure that if one or more nodes are lost, their data remains safe.
Table: Within a keyspace, data is stored in separate tables. Tables are two-dimensional data structures comprised of columns and rows. Unlike tables in SQL RDBMSs, tables in Scylla are standalone; you cannot perform a JOIN across tables.
Partition: Tables in Scylla can be quite large, often measured in terabytes, so tables are divided into smaller chunks, called partitions, that can be distributed as evenly as possible across shards.
Rows: Each partition contains one or more rows of data sorted in a particular order. Not every column appears in every row, which makes Scylla efficient at storing what is known as “sparse data.”
Columns: Data in a table row is divided into columns. A specific row-and-column entry is referred to as a cell. Some columns, known as partition and clustering keys, define how data is indexed and sorted.
Scylla includes methods to find particularly large partitions and large rows that can cause performance problems. Scylla also caches the most frequently used rows in a memory-based cache, avoiding expensive disk-based lookups on so-called “hot partitions.”
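The “key-key-value” shape described above can be modeled as a nested mapping: partition key to sorted clustering keys to columns. The sketch below is a toy in-memory illustration of that shape, with hypothetical table and column names, not Scylla’s storage code.

```python
# A toy illustration of the "key-key-value" model: a table is a
# mapping from partition key -> (sorted clustering key -> columns).
# Table and column names are hypothetical examples.
from collections import defaultdict

class Table:
    def __init__(self):
        # partition key -> {clustering key -> {column name -> value}}
        self._partitions = defaultdict(dict)

    def insert(self, pk, ck, **columns):
        row = self._partitions[pk].setdefault(ck, {})
        row.update(columns)  # sparse: only the provided columns are stored

    def select(self, pk):
        """Return a partition's rows, sorted by clustering key."""
        part = self._partitions.get(pk, {})
        return [(ck, part[ck]) for ck in sorted(part)]

sensor_readings = Table()
sensor_readings.insert("sensor-1", "2021-01-01T00:00", temp=20.5)
sensor_readings.insert("sensor-1", "2021-01-01T00:05", temp=21.0, humidity=40)
sensor_readings.insert("sensor-2", "2021-01-01T00:00", temp=19.8)
# Partitions may hold different numbers of rows, and rows may hold
# different columns ("sparse data").
```

Note how the partition key selects which chunk of data to touch, while the clustering key orders rows within that chunk; together they are the “key-key” in key-key-value.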

Learn best practices for data modeling in Scylla.

Ring Architecture

An example of the vNode distribution for a keyspace with a replication factor (RF) of 2: each vNode is replicated twice, and the two replicas are never stored on the same physical node.
Ring: All data in Scylla can be visualized as a ring of token ranges, with each partition mapped to a single hashed token (though a token may be associated with more than one partition). These tokens are used to distribute data around the cluster, balancing it as evenly as possible across nodes and shards.
vNode: The ring is broken into vNodes (Virtual Nodes), each comprising a range of tokens assigned to a physical node or shard. These vNodes are replicated across the physical nodes according to the replication factor (RF) set for the keyspace.
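The ring mechanics above can be sketched as a small consistent-hashing example: each node claims several scattered tokens (its vNodes), and a partition’s replicas are found by walking the ring clockwise until RF distinct nodes are collected. This is a simplified model with made-up node names, not Scylla’s exact placement logic.

```python
# A simplified token ring: vNodes assigned to nodes, with each
# partition replicated on RF distinct nodes found by walking the ring.
import bisect
import hashlib

NODES = ["node-a", "node-b", "node-c"]
VNODES_PER_NODE = 4
RF = 2  # replication factor, set per keyspace

def _hash(s: str) -> int:
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

# Each node claims several scattered tokens on the ring (its vNodes).
ring = sorted((_hash(f"{n}:{i}"), n)
              for n in NODES for i in range(VNODES_PER_NODE))
tokens = [t for t, _ in ring]

def replicas(partition_key: str) -> list:
    """Return the RF distinct nodes responsible for a partition."""
    idx = bisect.bisect(tokens, _hash(partition_key)) % len(ring)
    owners = []
    while len(owners) < RF:  # walk clockwise, skipping repeat nodes
        node = ring[idx % len(ring)][1]
        if node not in owners:
            owners.append(node)
        idx += 1
    return owners
```

Scattering many small vNodes per node, rather than one big range, is what lets data rebalance evenly when nodes join or leave.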

Storage Architecture

Scylla’s write path puts data in an in-memory structure called a memtable and also durably writes the update to a persistent commitlog. In time, the contents of the memtable are flushed to a new, immutable persistent file on disk, known as a Sorted Strings Table (SSTable).
Memtable: In Scylla’s write path, data is first put into a memtable, stored in RAM. In time this data is flushed to disk for persistence.
Commitlog: An append-only log of local node operations, written simultaneously as data is sent to a memtable. It provides persistence (data durability) in case of a node shutdown; when a server restarts, the commitlog can be used to restore a memtable.
SSTables: Data is stored persistently in Scylla on a per-shard basis using immutable (unchangeable, read-only) files called Sorted Strings Tables (SSTables), which use a Log-Structured Merge (LSM) tree format. Once data is flushed from a memtable to an SSTable, the memtable (and the associated commitlog segment) can be deleted. Updates to records are not written to the original SSTable but recorded in a new SSTable; Scylla has mechanisms to determine which version of a specific record is the latest.
Tombstones: When a row is deleted, Scylla writes a marker called a tombstone into a new SSTable. This serves as a reminder to the database that the original data is to be treated as deleted.
Compactions: After a number of SSTables have been written to disk, Scylla runs a compaction, a process that keeps only the latest copy of each record and drops any records marked with tombstones. Once the new compacted SSTable is written, the old, obsolete SSTables are deleted and their disk space is freed.
Compaction Strategy: Scylla uses different algorithms, called strategies, to determine when and how best to run compactions. The strategy determines the trade-offs between write, read and space amplification. Scylla Enterprise even supports a unique method, Incremental Compaction Strategy, which can significantly reduce disk overhead.
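The write path above can be illustrated with a toy LSM sketch: writes and deletes land in a memtable, flushes produce immutable “SSTables,” and compaction merges them, keeping the newest version of each record and discarding tombstones. This is a deliberately simplified in-memory model (real SSTables are sorted on-disk files, and real systems retain tombstones for a grace period before dropping them).

```python
# A toy write path: memtable -> flush to immutable SSTables ->
# compaction keeps the newest version and drops tombstones.
TOMBSTONE = object()  # deletion marker

memtable = {}
sstables = []  # oldest first; each flush appends a new one

def write(key, value):
    memtable[key] = value

def delete(key):
    memtable[key] = TOMBSTONE  # a delete is just a write of a marker

def flush():
    """Persist the memtable as a new immutable SSTable, then clear it."""
    if memtable:
        sstables.append(dict(sorted(memtable.items())))
        memtable.clear()

def read(key):
    if key in memtable:
        v = memtable[key]
        return None if v is TOMBSTONE else v
    for sst in reversed(sstables):  # newest SSTable wins
        if key in sst:
            v = sst[key]
            return None if v is TOMBSTONE else v
    return None

def compact():
    """Merge all SSTables: keep the latest versions, drop tombstones."""
    global sstables
    merged = {}
    for sst in sstables:  # oldest to newest, so later writes win
        merged.update(sst)
    merged = {k: v for k, v in merged.items() if v is not TOMBSTONE}
    sstables = [dict(sorted(merged.items()))] if merged else []

write("a", 1); write("b", 2); flush()
write("a", 3); delete("b"); flush()
compact()  # one SSTable remains, holding only {"a": 3}
```

The key property on display: nothing is ever modified in place, so writes are sequential and fast, and compaction is what pays down the accumulated duplicates and tombstones.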

Client-Server Architecture

Drivers: Scylla communicates with applications via drivers for its Cassandra-compatible CQL interface, or via SDKs for its DynamoDB-compatible API. Multiple drivers or clients can connect to Scylla at the same time for greater overall throughput. Drivers written specifically for Scylla are shard-aware: they communicate directly with the node, and the shard within it, where the data is stored, providing faster access and better performance. A complete list of drivers available for Scylla is published on the Scylla website.
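Shard-aware routing can be sketched as follows: the client hashes the partition key to a token and maps that token onto one of the target node’s cores, so the request lands directly on the shard that owns the data. The token-to-shard mapping below is an illustrative scaling trick, not Scylla’s production algorithm, and the hash is a stand-in for the driver’s Murmur3 function.

```python
# A simplified sketch of client-side shard-aware routing.
import hashlib

NR_SHARDS = 8  # cores on the target node

def token_of(partition_key: bytes) -> int:
    """Stand-in for the driver's Murmur3 token function."""
    return int.from_bytes(hashlib.sha256(partition_key).digest()[:8], "big")

def shard_of(token: int, nr_shards: int = NR_SHARDS) -> int:
    # Scale the 64-bit token into [0, nr_shards): multiply, then keep
    # the high bits, spreading tokens evenly across cores.
    return (token * nr_shards) >> 64

# A shard-aware driver keeps a connection per shard and picks the
# right one per request, avoiding an extra internal hop on the server.
shard = shard_of(token_of(b"user:42"))
```

A non-shard-aware client would reach an arbitrary core, which then forwards the request to the owning shard; computing the shard client-side skips that hop.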