fbpx

Scylla University LIVE Summer School |  July 28 & July 29

Scylla U Sticky Banner icon
Pick Your Timezone >

Scylla: A NoSQL Database Built for High Availability

Scylla at its core is designed to be highly available, meaning that short of a complete systemic outage the database should remain up and available for your mission critical applications.

Peer-to-Peer Architecture

When Scylla starts up, nodes use the gossip protocol to discover peer nodes to establish the cluster. There are no leaders nor followers, the underlying architecture is leaderless. No primaries and no replicas. In fact, with Scylla, we have even removed the concept of seed nodes found in other gossip implementations. It’s completely peer-to-peer. This gossip mechanism is also used in case of topology changes, such as adding or removing a node, or in case of an unexpected node outage, providing strong resiliency to a Scylla cluster.

In gossip peers operate in parallel, each peer communicating with one or more randomly-selected partners to establish the cluster

Automatic Data Replication

Scylla allows users to set a replication factor (RF), meaning that multiple copies of the same data can be stored on multiple nodes across the cluster. In this way, even if a node is lost, the data still resides somewhere in the cluster.

In this write operation, an update to partition 1, the data is passed along from the coordinator node, V, to the three nodes where the data resides: W, X and Z.

For many high availability use cases, setting a replication factor of three (3) is sufficient. In such cases, even if two out of the three copies of the data are unavailable, the data will reside somewhere live in the cluster.

With a properly set replication factor, zero downtime is achievable. Users can determine their own replication factor, based on their use case. There are times when a replication factor of two may be sufficient, and times when a replication factor of five may be called for. Scylla automatically takes care of replicating the data in the background. You just set the replication factor, and the cluster handles the rest.

The CAP theorem states distributed databases can only simultaneously maintain two of three critical properties: consistency, availability or partition tolerance. Scylla focuses on high availability ("A") and partition tolerance ("P"), so is referred to as an "AP"-mode system.

Scylla and the CAP Theorem

The CAP Theorem is based on the hypothesis that systems could choose to offer consistency, availability or partition tolerance, and that database designers would have to choose two of those three characteristics. Any distributed database requires partition tolerance — the ability to continue operating even if part of the system becomes off-line due to network or server failure. Thus, two prevalent modes of databases are common today:

CP
mode systems

which favor consistency and partition tolerance

– or –

AP
mode systems

which favor availability and partition tolerance

Scylla is the latter, focusing on high availability. Learn more about Scylla’s Fault Tolerant architecture

Tunable Consistency

Consistency in Scylla is tunable — users can allow their transactions to have different degrees of consistency. Here are a few examples:

Consistency Level
ONE

means successfully writing an update to any node is sufficient (the system will eventually replicate it to other nodes)

QUORUM
consistency

means a majority of replicas (based on the replication factor) need to acknowledge an update

Consistency Level
ALL

means all replicas must acknowledge an update

Learn more about Scylla’s Tunable Consistency in Scylla University.

Achieving Zero Downtime

Nodes can fail. Racks can fail. Even whole datacenters. Yet your applications cannot. They remain always-on. That’s the goal for high availability database systems. The way Scylla achieves zero downtime is through a few mechanisms, including rack and datacenter awareness, as well as multi-datacenter replication.

A Scylla cluster can span datacenters scattered across any geographic space. Data in Scylla is automatically synchronized across datacenters in an eventually consistent manner without requiring users to create any sort of streaming or batch processing to ensure the clusters communicate changes.

Rack and Datacenter Awareness

Scylla is topology aware. It uses snitches to know which rack and which datacenter a node belongs to. These allow you to spread your data across nodes in different racks in a datacenter, or across different datacenters, availability zones and regions in public clouds. That way, if a rack goes out, or even a whole datacenter, your data is still available.

Multi-Datacenter Replication

Scylla clusters that span different datacenters can employ the NetworkTopologyStrategy and set different replication factors for each datacenter. For instance, the primary datacenter may have a RF of 3, and a separate satellite datacenter may be set to an RF of 2. This allows you to determine the resiliency of your data per site.

Anti-Entropy

Scylla is designed to operate even in the case of temporary node unavailability (when it eventually rejoins the cluster) or a node failure (when it has to be replaced). But when those situations occur, the system has to battle against entropy and bring the cluster back to full operation. The following processes and features are designed to mitigate that

Hinted Handoffs In the case of a temporary node outage (less than three hours), Scylla uses a feature called Hinted Handoffs to keep track of what transactions occurred while the node was unavailable. When the node returns to service, the Hinted Handoffs allow the node to catch up on what transpired while it was offline. (You can think of it like a classmate who takes notes for you in case you miss a class or two.)
Row-level Repair In case you had a more serious loss of availability of a node, Scylla has a background repair process that allows you to get a new node up to speed. This process can be managed with a command-line interface, called nodetool repair, or from within Scylla Manager, which can also restore data from backups.
Nodetool repair is an anti-entropy utility that runs in the background and synchronizes data between nodes.
Scylla University Mascot

Scylla University

Get started on the path to Scylla expertise

Scylla Cloud Mascot

Scylla Cloud

It’s easy to get started with our NoSQL DBaaS