ScyllaDB is the NoSQL Database Built for High Availability


At its core, ScyllaDB is designed to be highly available: short of a complete systemic outage, the database remains up and available for your mission-critical applications.

Peer-to-Peer Architecture

When ScyllaDB starts up, nodes use the gossip protocol to discover their peers and establish the cluster. There are no leaders and no followers; the underlying architecture is leaderless, with no primaries and no read replicas. In fact, ScyllaDB has even removed the special role of seed nodes found in other gossip implementations. It’s completely peer-to-peer. The same gossip mechanism is used for topology changes, such as adding or removing a node, and for unexpected node outages, providing strong resiliency to a ScyllaDB cluster.

In gossip, peers operate in parallel, each communicating with one or more randomly selected partners to establish the cluster.
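As a minimal sketch of what this means for applications (assuming the Python driver, cassandra-driver / scylla-driver, and placeholder node addresses), a client can be pointed at any few nodes; the driver then discovers the rest of the peers from the cluster itself, and any node can coordinate a request:

```python
# Minimal sketch using the Python driver (cassandra-driver / scylla-driver).
# The node addresses below are placeholders; any reachable peers will do,
# since there is no primary node to single out.
from cassandra.cluster import Cluster

cluster = Cluster(contact_points=["10.0.0.1", "10.0.0.2", "10.0.0.3"])
session = cluster.connect()

# The driver learns the full cluster topology from these peers, and any
# node can act as the coordinator for a given request.
for host in cluster.metadata.all_hosts():
    print(host.address, host.datacenter, host.rack)

cluster.shutdown()
```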

Automatic Data Replication

ScyllaDB allows users to set a replication factor (RF), meaning that multiple copies of the same data can be stored on multiple nodes across the cluster. In this way, even if a node is lost, the data still resides somewhere in the cluster.

In this write operation, an update to partition 1, the data is passed along from the coordinator node, V, to the three nodes where the data resides: W, X and Z.

For many high availability use cases, setting a replication factor of three (3) is sufficient. In such cases, even if two of the three replicas are unavailable, at least one live copy of the data still exists in the cluster.

With a properly set replication factor, zero downtime is achievable. Users can determine their own replication factor, based on their use case. There are times when a replication factor of two may be sufficient, and times when a replication factor of five may be called for. ScyllaDB automatically takes care of replicating the data in the background. You just set the replication factor, and the cluster handles the rest.
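For illustration, here is a minimal sketch of setting an RF of 3 when creating a keyspace, assuming the Python driver and placeholder keyspace and node names; SimpleStrategy is used only to keep the single-datacenter case short (per-datacenter replication is covered below):

```python
# Minimal sketch: set the replication factor at keyspace creation.
# The keyspace name and node address are placeholders.
from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.1"])
session = cluster.connect()

# RF=3: every partition is stored on three different nodes; ScyllaDB
# replicates the copies automatically in the background.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS my_app
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
""")
cluster.shutdown()
```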

The CAP theorem states that distributed databases can simultaneously maintain only two of three critical properties: consistency, availability, and partition tolerance. ScyllaDB focuses on availability ("A") and partition tolerance ("P"), so it is referred to as an "AP"-mode system.

ScyllaDB and the CAP Theorem

The CAP Theorem states that a distributed system can provide at most two of three guarantees: consistency, availability, and partition tolerance, so database designers have to choose which two to prioritize. Any distributed database requires partition tolerance, the ability to continue operating even if part of the system goes offline due to a network or server failure. Thus, two modes of databases are prevalent today:

CP-mode systems, which favor consistency and partition tolerance

– or –

AP-mode systems, which favor availability and partition tolerance

ScyllaDB is the latter, focusing on high availability and partition tolerance. Learn more about ScyllaDB’s fault tolerant architecture.

Tunable Consistency

Consistency in ScyllaDB is tunable: users can choose a different consistency level for each read or write operation. Here are a few examples:

Consistency Level ONE means that a single replica acknowledging an update is sufficient (the system will eventually replicate it to the other replicas)

Consistency Level QUORUM means that a majority of replicas (based on the replication factor) must acknowledge an update

Consistency Level ALL means that all replicas must acknowledge an update
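As a rough sketch of how this looks from an application, assuming the Python driver and a placeholder users table, the consistency level can be chosen per statement:

```python
# Minimal sketch: choosing a consistency level per statement with the
# Python driver. Keyspace, table, and node names are placeholders.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["10.0.0.1"])
session = cluster.connect("my_app")

# QUORUM: a majority of the replicas must acknowledge the write.
write = SimpleStatement(
    "INSERT INTO users (id, name) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(write, (1, "alice"))

# ONE: a single replica is enough to answer the read.
read = SimpleStatement(
    "SELECT name FROM users WHERE id = %s",
    consistency_level=ConsistencyLevel.ONE,
)
row = session.execute(read, (1,)).one()
cluster.shutdown()
```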

Learn more about ScyllaDB’s Tunable Consistency in ScyllaDB University.

Achieving Zero Downtime

Nodes can fail. Racks can fail. Even whole datacenters. Yet your applications cannot. They remain always-on. That’s the goal for high availability database systems. The way ScyllaDB achieves zero downtime is through a few mechanisms, including rack and datacenter awareness, as well as multi-datacenter replication.

A ScyllaDB cluster can span datacenters scattered across any geographic space. Data in ScyllaDB is automatically synchronized across datacenters in an eventually consistent manner without requiring users to create any sort of streaming or batch processing to ensure the clusters communicate changes.

Rack and Datacenter Awareness

ScyllaDB is topology aware. It uses snitches to know which rack and which datacenter a node belongs to. These allow you to spread your data across nodes in different racks in a datacenter, or across different datacenters, availability zones and regions in public clouds. That way, if a rack goes out, or even a whole datacenter, your data is still available.

Multi-Datacenter Replication

ScyllaDB clusters that span different datacenters can employ the NetworkTopologyStrategy and set different replication factors for each datacenter. For instance, the primary datacenter may have an RF of 3, and a separate satellite datacenter may be set to an RF of 2. This allows you to determine the resiliency of your data per site.
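A minimal sketch of such a keyspace definition, assuming the Python driver and placeholder datacenter names that would have to match what your snitch reports:

```python
# Minimal sketch: per-datacenter replication factors with
# NetworkTopologyStrategy. 'dc_primary' and 'dc_satellite' are placeholder
# datacenter names, as is the keyspace.
from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.1"])
session = cluster.connect()
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS my_app
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'dc_primary': 3,
        'dc_satellite': 2
    }
""")
cluster.shutdown()
```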

Anti-Entropy

ScyllaDB is designed to operate even in the case of temporary node unavailability (when the node eventually rejoins the cluster) or a node failure (when it has to be replaced). But when those situations occur, the system has to battle against entropy and bring the cluster back to full operation. The following processes and features are designed to mitigate that:

Hinted Handoffs: In the case of a temporary node outage (less than three hours by default), ScyllaDB uses a feature called Hinted Handoffs to keep track of the writes that occurred while the node was unavailable. When the node returns to service, the hints allow it to catch up on what it missed while it was offline. (You can think of it like a classmate who takes notes for you in case you miss a class or two.)
Row-level Repair: In the case of a more serious loss of node availability, ScyllaDB has a background repair process that brings a new or out-of-sync node up to speed. This process can be managed with a command-line tool, nodetool repair, or from within ScyllaDB Manager, which can also restore data from backups.
Nodetool repair is an anti-entropy utility that runs in the background and synchronizes data between nodes.
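As a small, hedged sketch, a repair of a single keyspace could be triggered from a script by shelling out to nodetool; the keyspace name is a placeholder, and in practice scheduled repairs are typically run through ScyllaDB Manager:

```python
# Minimal sketch: run a repair for one keyspace via the nodetool CLI.
# 'my_app' is a placeholder keyspace name.
import subprocess

subprocess.run(["nodetool", "repair", "my_app"], check=True)
```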