Distributed Database

Distributed Database Definition

A distributed database is a single logical database that is distributed across multiple physical computers or locations.

Diagram showing a basic Amazon DynamoDB architecture for delivering a online gaming experience DynamoDB Cap Theorem Venn Diagram

Distributed Database FAQs

What is a Distributed Database?

A distributed database (DB) is a single logical database that is distributed across multiple physical databases, servers, data centers, or even separate networks. Distributed database management systems are more resilient, provide lower latency, and protect data more effectively. Modern database systems of all types have moved away from storing data structures as local data, using distributed, public and private cloud architectures to store data more reliably.

What Are Advantages of Distributed Databases?

The advantages of DB include many benefits, but the main objectives of DB systems are scalability, query performance (speed), and availability.

NoSQL distributed databases apply the principles of DB systems the most fully, because in NoSQL databases data is stored in a distributed manner. Data allocation in DB environments can be managed to optimize cost or speed of access internationally.

Do DBs Perform Faster?

DBs can spread the processing work across CPUs and memory of multiple servers or nodes. A NoSQL database like ScyllaDB offers eventual consistency across all nodes. That means it can run at the extreme speeds needed to ingest real-time streaming IoT data, for example. With intelligent sharding that organizes data for quick access, distributed database NoSQL systems can offer much lower latency and locking than a monolithic database.

Are DBs More Reliable?

A distributed key value database can be configured to store the same data in multiple nodes across locations. If a single node fails, the data is still available. You don’t have to wait for the database to be restored. A geo-distributed database maintains concurrent nodes across geographical regions for resilience in case of a regional power or communications outage. The ability to store a single database across multiple computers requires an algorithm for replicating data that is transparent to the users.

How Does a DB Stay in Sync?

ScyllaDB sends all write operations to all nodes, without having to wait for all nodes to report a successful write. The level of data consistency required is configurable. Read operations can query one or multiple nodes, depending on the Consistency Level configured. When the Consistency Level is set to Quorum, for example, a majority of the nodes have to agree on the value returned. The ScyllaDB Repair process runs in the background, updating nodes that are out of sync due to a write failure on that node.

Are Transactions Distributed Too?

It’s one thing to say that a DB management system stores data across multiple computers or nodes for resilience. But the units of work, or transactions, can be shared in a way that optimizes queries. ScyllaDB calls this distributed transaction a Lightweight Transactions, or LWT. LWT relies on conditional statements to ensure that data is already in the node.

How Can I Migrate from a Different Database System?

If you have existing data to migrate from another NoSQL or relational database, migration of data to a DB will either be an online or offline migration. For an offline migration during scheduled downtime, start with the schema, then forklift existing data, validate the move, and finally bring the system online with the new database. If the system must stay online during migration, enable duplicate writes to populate databases with current data during the cutover period.

Masterclass: Data Modeling for NoSQL Databases

Looking for extensive training on about data modeling for NoSQL Databases? Our experts offer a 3-hour masterclass that assists practitioners wanting to migrate from SQL to NoSQL or advance their understanding of NoSQL data modeling. This free, self-paced class covers techniques and best practices on NoSQL data modeling that will help you steer clear of mistakes that could inconvenience any engineering team.

You can access the complete course here.

What Are Some Distributed Database Examples?

DB examples include distributed NoSQL database options like Cassandra, ScyllaDB, and MongoDB. But distributed SQL database options like Google Spanner, CrateDB, CockroachDB, Yugabyte, and Amazon Aurora promise strong consistency with a SQL API syntax that distributes queries across multiple nodes while treating the database as a monolithic logical entity.

If your reasons for building DBs are disaster recovery and replication across geo-redundant datacenters, a distributed cloud database might be all you need. You can’t run Amazon Redshift or SQL Azure locally, but you can get the benefits of replication across nodes and data centers.

Distributed database case study examples include many ecommerce sites, where availability is more important than strong consistency, and the flexibility of NoSQL fits the varied attributes of products well. An online distribution database would be a use case that’s well suited to a NoSQL distributed database. Learn more from our educational page on SQL vs. NoSQL.

How To Implement Distributed Database Architecture?

If you’ve chosen a distributed database environment, you might get started with some best practices for DBs. Pick the simplest DB management system architecture that works, given your applications needs. That is often as simple as a distributed key-value database implementation.

Does ScyllaDB Offer a Distributed Database?

ScyllaDB offers a distributed database system that maps a logical database across a cluster of interconnected database nodes in a ring architecture. The nodes are considered equal, in a peer-to-peer model. Without a defined leader the cluster has no single point of failure. Clusters need not reside in a single location. Data can be distributed across a geographically dispersed cluster using multi-datacenter replication.

Each ScyllaDB node can be an individual on-premise server or a virtual server. Data is distributed as evenly as possible across all nodes in a cluster. ScyllaDB also uses logical units, known as Virtual Nodes (vNodes), to better distribute data for more predictable performance. A cluster can also store multiple copies of the same data on different nodes for reliability.

ScyllaDB segments data across shards, assigning a fragment of the data in a node to a specific CPU, along with its associated memory (RAM) and persistent storage. Shards operate as independent operating units, reducing contention and expensive locking across the distributed database.

Trending NoSQL Resources

ScyllaDB University Mascot

ScyllaDB University

Get started on your path to becoming a ScyllaDB expert.