fbpx

Join us at Scylla Summit 2022 | Free Virtual Conference | February 9-10

Register for Free >

Distributed Database

Distributed Database Definition

A distributed database is a single logical database that is distributed across multiple physical computers or locations.

Distributed Database FAQs

What is a Distributed Database?

A distributed database is a single logical database that is distributed across multiple physical databases, servers, data centers, or even separate networks. Distributed database management systems are more resilient, provide lower latency, and protect data more effectively. Modern database systems of all types have moved away from storing data structures as local data, using distributed, public and private cloud architectures to store data more reliably.

What Are Advantages of Distributed Databases?

The advantages of distributed databases include many benefits, but the main objectives of distributed database systems are scalability, query performance (speed), and availability.

NoSQL distributed databases apply the principles of distributed database systems the most fully, because in NoSQL databases data is stored in a distributed manner. Data allocation in distributed database environments can be managed to optimize cost or speed of access internationally.

Do Distributed Databases Perform Faster?

Distributed databases can spread the processing work across CPUs and memory of multiple servers or nodes. A NoSQL database like Scylla offers eventual consistency across all nodes. That means it can run at the extreme speeds needed to ingest real-time streaming IoT data, for example. With intelligent sharding that organizes data for quick access, distributed database NoSQL systems can offer much lower latency and locking than a monolithic database.

Are Distributed Databases More Reliable?

A distributed key value database can be configured to store the same data in multiple nodes across locations. If a single node fails, the data is still available. You don’t have to wait for the database to be restored. A geo-distributed database maintains concurrent nodes across geographical regions for resilience in case of a regional power or communications outage. The ability to store a single database across multiple computers requires an algorithm for replicating data that is transparent to the users.

How Does a Distributed Database Stay in Sync?

Scylla sends all write operations to all nodes, without having to wait for all nodes to report a successful write. The level of data consistency required is configurable. Read operations can query one or multiple nodes, depending on the Consistency Level configured. When the Consistency Level is set to Quorum, for example, a majority of the nodes have to agree on the value returned. The Scylla Repair process runs in the background, updating nodes that are out of sync due to a write failure on that node.

Are Transactions Distributed Too?

It’s one thing to say that a distributed database management system stores data across multiple computers or nodes for resilience. But the units of work, or transactions, can be shared in a way that optimizes queries. Scylla calls this distributed transaction a Lightweight Transactions, or LWT. LWT relies on conditional statements to ensure that data is already in the node.

How Can I Migrate from a Different Database System?

If you have existing data to migrate from another NoSQL or relational database, migration of data to a distributed database will either be an online or offline migration. For an offline migration during scheduled downtime, start with the schema, then forklift existing data, validate the move, and finally bring the system online with the new database. If the system must stay online during migration, enable duplicate writes to populate databases with current data during the cutover period.

What Are Some Distributed Database Examples?

Distributed database examples include distributed NoSQL database options like Cassandra, Scylla, and MongoDB. But distributed SQL database options like Google Spanner, CrateDB, CockroachDB, Yugabyte, and Amazon Aurora promise strong consistency with a SQL API syntax that distributes queries across multiple nodes while treating the database as a monolithic logical entity.

If your reasons for building distributed databases are disaster recovery and replication across geo-redundant datacenters, a distributed cloud database might be all you need. You can’t run Amazon Redshift or SQL Azure locally, but you can get the benefits of replication across nodes and data centers.

Distributed database case study examples include many ecommerce sites, where availability is more important than strong consistency, and the flexibility of NoSQL fits the varied attributes of products well. An online distribution database would be a use case that’s well suited to a NoSQL distributed database.

How To Implement Distributed Database Architecture?

If you’ve chosen a distributed database environment, you might get started with some best practices for distributed databases. Pick the simplest distributed database management system architecture that works, given your applications needs. That is often as simple as a distributed key-value database implementation.

Does Scylla Offer a Distributed Database?

Scylla offers a distributed database system that maps a logical database across a cluster of interconnected database nodes in a ring architecture. The nodes are considered equal, in a peer-to-peer model. Without a defined leader the cluster has no single point of failure. Clusters need not reside in a single location. Data can be distributed across a geographically dispersed cluster using multi-datacenter replication.

Each Scylla node can be an individual on-premise server or a virtual server. Data is distributed as evenly as possible across all nodes in a cluster. Scylla also uses logical units, known as Virtual Nodes (vNodes), to better distribute data for more predictable performance. A cluster can also store multiple copies of the same data on different nodes for reliability.

Scylla segments data across shards, assigning a fragment of the data in a node to a specific CPU, along with its associated memory (RAM) and persistent storage. Shards operate as independent operating units, reducing contention and expensive locking across the distributed database.

Scylla University Mascot

Scylla University

Get started on your path to becoming a Scylla expert.