Close-to-the-metal architecture handles millions of OPS with predictable single-digit millisecond latencies.
Learn MoreP99 CONF is the event on all things performance. Join us online Oct 23-24 — Registration is free
Close-to-the-metal architecture handles millions of OPS with predictable single-digit millisecond latencies.
Learn MoreScyllaDB is purpose-built for data-intensive apps that require high throughput & predictable low latency.
Learn MoreLevel up your skills with our free NoSQL database courses.
Take a CourseOur blog keeps you up to date with recent news about the ScyllaDB NoSQL database and related technologies, success stories and developer how-tos.
Read MoreScyllaDB has a symmetric, peer-to-peer architecture. There are no leaders or followers traditionally found in legacy NoSQL and SQL architectures (primaries with replicas), and the nodes remain identical within a zone, region, or globally, without additional external components to support replication. Until ScyllaDB 5.4 and 2024.1, ScyllaDB used the gossip protocol to discover peer nodes, establish the cluster, and communicate health and node status.
However, the Raft consensus algorithm has replaced the gossip protocol for topology changes, such as adding or removing a node or, in case of an unexpected node outage, providing resiliency, strong consistency, and parallel topology changes to a ScyllaDB cluster. Raft also replaces gossip protocol for Data Definition Language (DDL) schema changes.
Raft provides distributed, strongly consistent, replicated logs of all node states across all nodes without a performance penalty. It implements consensus by electing a distinguished leader and then giving the leader complete responsibility for managing the replicated log. The leader accepts log entries from clients, replicates them on other servers, and tells servers when it is safe to apply them to their state machines.
ScyllaDB allows users to set a replication factor (RF), meaning that multiple copies of the same data can be stored on multiple nodes across the cluster. In this way, even if a node is lost, the data still resides somewhere in the cluster.
Setting a replication factor of three (3) is sufficient for many high-availability use cases. In such cases, even if two of the three copies of the data are unavailable, the data will reside somewhere in the cluster.
With a properly set replication factor, zero downtime is achievable. Users can determine their own replication factor based on their use case. There are times when a replication factor of two may be sufficient and times when a replication factor of five or more may be called for. ScyllaDB automatically replicates the data in the background. You just set the replication factor, and the cluster handles the rest.
Consistency in ScyllaDB is tunable — users can allow their transactions to have different degrees of consistency. Here are a few examples:
means successfully writing an update to any node is sufficient (the system will eventually replicate it to other nodes)
means a majority of replicas (based on the replication factor) need to acknowledge an update
means all replicas must acknowledge an update
Nodes, racks, and even whole data centers can fail, but your applications cannot. They must remain “always on.” That’s the goal for high-availability database systems. ScyllaDB achieves zero downtime through a few mechanisms, including rack and data center awareness and multi-datacenter (multi-zone and multi-region in public clouds) replication.
A ScyllaDB cluster can span data centers scattered across any geographic space (global replication). Data in ScyllaDB is automatically synchronized across data centers in a tuneably consistent manner without requiring users to create any sort of streaming or batch processing to ensure the clusters communicate changes. No add-on, costly components such as load balancers or external caches are required.
Rack and Datacenter Awareness
ScyllaDB is topology-aware. It uses snitches to know which rack and data center a node belongs to. These allow you to spread your data across nodes in different racks in a data center or across different data centers, availability zones, and regions in public clouds. That way, your data is still available if a rack or even a whole data center goes out.
Multi-Datacenter Replication
ScyllaDB clusters that span different data centers can employ the NetworkTopologyStrategy and set different replication factors for each data center. For instance, the primary data center may have an RF of 3 and a separate satellite data center may be set to an RF of 2. This allows you to determine the resiliency of your data per site.
ScyllaDB is designed to operate even in the case of temporary node unavailability (when it eventually rejoins the cluster) or a node failure (when it has to be replaced). But when those situations occur, the system has to battle against entropy and bring the cluster back to full operation. The following processes and features are designed to mitigate that.
Hinted Handoffs | In the case of a temporary node outage (less than three hours), ScyllaDB uses a feature called Hinted Handoffs to keep track of what transactions occurred while the node was unavailable. When the node returns to service, the Hinted Handoffs allow the node to catch up on what transpired while it was offline. (You can think of it like a classmate who takes notes for you in case you miss a class or two.) |
Row-level Repair | In case you had a more serious loss of availability of a node, ScyllaDB has a background repair process that allows you to get a new node up to speed. This process can be managed with a command-line interface, called nodetool repair, or from within ScyllaDB Manager, which can also restore data from backups. |
Apache® and Apache Cassandra® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. Amazon DynamoDB® and Dynamo Accelerator® are trademarks of Amazon.com, Inc. No endorsements by The Apache Software Foundation or Amazon.com, Inc. are implied by the use of these marks.