CAP Theorem FAQs
What is CAP Theorem?
Understanding the CAP theorem is simpler when you consider it piece by piece:
What is CAP theorem in distributed systems? A distributed network system stores data across multiple nodes—virtual or physical machines—simultaneously. It’s essential to understand the CAP theorem when designing a cloud application because all cloud apps are distributed systems.
What is CAP theorem consistency? Consistency of CAP theorem means that regardless of the node they connect to, all clients see the same data at once. For the write to one node to succeed, it must also instantly be replicated or forwarded to all the other nodes in the system.
CAP theorem eventual consistency. Some NoSQL databases (for example, ScyllaDB) use a model of tunable eventual consistency to deliver multi datacenter high availability and fast and efficient write and read operations. In this case, all nodes are equal; this means any node can serve any request, there is no single point of coordination, and all nodes in the system continue to cooperatively provide service, even when nodes become unavailable. Eventual consistency supports modern workloads that are less dependent on strong consistency but rely heavily on availability.
What is availability in CAP theorem? CAP theorem availability means that even if one or more nodes are down, any client making a data request receives a response. In other words, when any request is made, without exception, all working nodes in a distributed system return a valid response.
Partition tolerance in CAP theorem explained. In a distributed system, a partition is a break in communications—a temporarily delayed or lost connection between nodes. Partition tolerance means that in spite of any number of breakdowns in communication between nodes in the system, the cluster will continue to work.
What is CAP theorem in NoSQL? NoSQL databases and CAP theorem are closely linked. NoSQL databases are categorized based on which CAP characteristics they support: CP, AP, or CA.
A CP database offers consistency and partition tolerance but sacrifices availability. The practical result is that when a partition occurs, the system must make the inconsistent node unavailable until it can resolve the partition. MongoDB and Redis are examples of CP databases.
An AP database provides availability and partition tolerance but not consistency in the event of a failure. All nodes remain available when a partition occurs, but some might return an older version of the data. CouchDB, Cassandra, and ScyllaDB are examples of AP databases.
A CA database delivers consistency and availability, but it can’t deliver fault tolerance if any two nodes in the system have a partition between them. Clearly, this is where CAP theorem and NoSQL databases collide: there are no NoSQL databases you would classify as CA under the CAP theorem. In a distributed database, there is no way to avoid system partitions. So, although CAP theorem stating a CA distributed database is possible exists, there is currently no true CA distributed database system. The modern goal of CAP theorem analysis should be for system designers to generate optimal combinations of consistency and availability for particular applications.
CAP Theorem vs ACID
Although the “C” in both ACID and CAP theorem refer to consistency, consistency in CAP is different than in ACID. In CAP, consistency means having information that is the most up-to-date. Consistency in ACID refers to the hardness of the database that protects it from corruption despite the addition of new transactions and references different database events.
Database systems such as RDBMS that are designed in part based on traditional ACID guarantees choose consistency over availability. In contrast, systems common in the NoSQL movement designed around the BASE philosophy select availability over consistency.
CAP Theorem NoSQL Database Examples
To better comprehend the role of CAP theorem in database theory, consider some CAP theorem databases examples.
MongoDB is a popular NoSQL database management system that is used for big data applications running across multiple locations. MongoDB resolves network partitions by maintaining consistency, sacrificing availability as necessary in the event of failure. One primary node receives all write operations in MongoDB. The system needs to elect a new primary node if the existing one becomes unavailable, and while it does, clients can’t make any write requests so data remains consistent.
In contrast to MongoDB, Apache Cassandra is an open source NoSQL database with a peer-to-peer architecture and potentially multiple points of failure. CAP theorem in Cassandra reveals an AP database: Cassandra offers availability and partition tolerance but can’t provide consistency all the time. However, by reconciling inconsistencies as quickly as possible and allowing clients to write to any nodes at any time, Cassandra provides eventual consistency. Inconsistencies are resolved quickly in most cases of network partitions, so the constant availability and high performance are often worth the Cassandra CAP theorem trade-off.
CAP Theorem in NoSQL Databases vs Distributed SQL Databases
NoSQL databases are generally considered to be AP systems, providing Availability and Partition tolerance at the expense of Consistency. In contrast, distributed SQL (NewSQL) databases provide Consistency, Availability and Partition tolerance. (According to Eric Brewer’s painstaking analysis, Google Spanner is technically a CP system that can claim to be an “effectively CA” system. Such nuances, while important, are beyond the scope of this glossary entry.)
The fusion of strong consistency with distributed architecture is undeniably attractive. The question is whether distributed SQL can deliver on this promise without compromising in other critical areas – primarily performance. Notably, the traditional CAP theorem makes no provision for performance or latency. For example, according to the CAP theorem, a database can be considered Available if a query returns a response after 30 days. Obviously, such latency would be unacceptable for any real-world application.
PACELC vs CAP Theorem
A newer way to model databases, the PACELC theorem, builds on the advantages of CAP theorem analysis and extends it, accounting for the nuances of modern systems. According to PACELC, Like the CAP theorem, the PACELC theorem states that in case of network partitioning (P) in a distributed computer system, one has to choose between availability (A) and consistency (C). PACELC extends the CAP theorem by introducing latency and consistency as additional attributes of distributed systems. The theorem states that, “else (E), even when the system is running normally in the absence of partitions, one has to choose between latency (L) and consistency (C).”
In other words, PACELC reveals that systems tend either towards strong consistency or latency sensitivity. Even in the absence of partitioning, a trade-off between consistency and latency exists.
Traditional CAP theorem provides for neither latency nor performance. For example, the CAP theorem considers a database available if a query returns a response after 30 days—a level of latency that users of any real-world application would find unacceptable.
Based on PACELC, a distributed SQL database that focuses on being a strongly consistent system such as CockroachDB is categorized as PC/EC. They will not sacrifice consistency for any reason, and performance may suffer as a result.
In contrast, a latency sensitive, highly available NoSQL database system like ScyllaDB is PA/EL. These systems may sacrifice consistency for availability should a partition occur.
Cloud-native applications with more modern, distributed architectures often demand predictable low latency and high availability. Strong consistency requirements are less common.
Does ScyllaDB Offer Solutions for CAP Theorem?
ScyllaDBDB is a PA/EL highly available, partition tolerant, low latency system. ScyllaDB was designed to provide consistent low-latencies, not just be highly available, and it also provides tunable consistency. Under any conditions short of a complete system failure, ScyllaDB will remain highly available with predictable low latencies for mission critical applications.
ScyllaDB outperforms a distributed NewSQL database such as CockroachDB by a wide margin. Built for high availability, ScyllaDB still delivers tunable consistency, zero downtime, and high performance. Learn more.