When Scylla starts up, nodes use the gossip protocol to discover peer nodes to establish the cluster. There are no leaders nor followers, the underlying architecture is leaderless. No primaries and no replicas. In fact, with Scylla, we have even removed the concept of seed nodes found in other gossip implementations. It’s completely peer-to-peer. This gossip mechanism is also used in case of topology changes, such as adding or removing a node, or in case of an unexpected node outage, providing strong resiliency to a Scylla cluster.
Scylla allows users to set a replication factor (RF), meaning that multiple copies of the same data can be stored on multiple nodes across the cluster. In this way, even if a node is lost, the data still resides somewhere in the cluster.
For many high availability use cases, setting a replication factor of three (3) is sufficient. In such cases, even if two out of the three copies of the data are unavailable, the data will reside somewhere live in the cluster.
With a properly set replication factor, zero downtime is achievable. Users can determine their own replication factor, based on their use case. There are times when a replication factor of two may be sufficient, and times when a replication factor of five may be called for. Scylla automatically takes care of replicating the data in the background. You just set the replication factor, and the cluster handles the rest.
The CAP Theorem is based on the hypothesis that systems could choose to offer consistency, availability or partition tolerance, and that database designers would have to choose two of those three characteristics. Any distributed database requires partition tolerance — the ability to continue operating even if part of the system becomes off-line due to network or server failure. Thus, two prevalent modes of databases are common today:
Consistency in Scylla is tunable — users can allow their transactions to have different degrees of consistency. Here are a few examples:
Nodes can fail. Racks can fail. Even whole datacenters. Yet your applications cannot. They remain always-on. That’s the goal for high availability database systems. The way Scylla achieves zero downtime is through a few mechanisms, including rack and datacenter awareness, as well as multi-datacenter replication.
A Scylla cluster can span datacenters scattered across any geographic space. Data in Scylla is automatically synchronized across datacenters in an eventually consistent manner without requiring users to create any sort of streaming or batch processing to ensure the clusters communicate changes.
Scylla is topology aware. It uses snitches to know which rack and which datacenter a node belongs to. These allow you to spread your data across nodes in different racks in a datacenter, or across different datacenters, availability zones and regions in public clouds. That way, if a rack goes out, or even a whole datacenter, your data is still available.
Scylla clusters that span different datacenters can employ the NetworkTopologyStrategy and set different replication factors for each datacenter. For instance, the primary datacenter may have a RF of 3, and a separate satellite datacenter may be set to an RF of 2. This allows you to determine the resiliency of your data per site.
Scylla is designed to operate even in the case of temporary node unavailability (when it eventually rejoins the cluster) or a node failure (when it has to be replaced). But when those situations occur, the system has to battle against entropy and bring the cluster back to full operation. The following processes and features are designed to mitigate that
|Hinted Handoffs||In the case of a temporary node outage (less than three hours), Scylla uses a feature called Hinted Handoffs to keep track of what transactions occurred while the node was unavailable. When the node returns to service, the Hinted Handoffs allow the node to catch up on what transpired while it was offline. (You can think of it like a classmate who takes notes for you in case you miss a class or two.)|
|Row-level Repair||In case you had a more serious loss of availability of a node, Scylla has a background repair process that allows you to get a new node up to speed. This process can be managed with a command-line interface, called nodetool repair, or from within Scylla Manager, which can also restore data from backups.|
Get started on the path to Scylla expertise
It’s easy to get started with our NoSQL DBaaS