NoSQL Database Comparison

The NoSQL revolution in database management systems kicked off a full decade ago. Since then, organizations of all sizes have benefitted from a key feature that the NoSQL architecture introduced: massive scale using relatively inexpensive commodity hardware. Thanks to this innovation, organizations have been able to deploy architectures that would have been prohibitively expensive and impossible to scale using traditional relational database systems.

A number of alternative non-relational database systems have been proposed, including Google Bigtable (2006) and Amazon Dynamo (2007). The papers for these projects paved the way for Cassandra (2008) and MongoDB (2009). Today, a range of mature NoSQL databases are available to help organizations scale data-intensive applications.

Types of NoSQL Databases

NoSQL databases fall into the following categories:

Document databases: Each key is paired with a structured data “document” (data structure). Documents can contain many different key-value pairs in a nested, hierarchical format, such as JavaScript Object Notation (JSON). Examples include MongoDB and Couchbase.

Graph stores: Hold networks of connected data, such as social connections. Examples include Neo4j, Apache Giraph, and JanusGraph.

Key-value stores: Each item is stored as a “key” (unique identifier) along with its value. A key-value store is useful for storing simple collections of mixed types where the data being stored would not logically contain the same columns. Examples include Redis, Aerospike, and DynamoDB. Cassandra and ScyllaDB support key-value stores as well as wide column stores.

Columnar databases and wide column stores: These are actually two different types of databases. In a columnar database (also called a column-oriented database), data is stored in columns instead of rows, which makes it more efficient to query data in frequently referenced columns. Examples include Apache Druid and ClickHouse. Wide column stores are actually row-oriented databases, but they have a partition key to distribute data, a clustering key to define how data is grouped, and then values across cells in multiple columns. For this reason, they may be referred to as “key-key-value” databases. Examples of wide column stores include ScyllaDB, Apache Cassandra, and DataStax Enterprise.

NoSQL Document Database

Document databases are known for being easy for developers to work with and offer teams a simple way to get started with a scalable and highly available distributed database. However, simplicity comes at the cost of performance, and performance degrades as data volume grows.

NoSQL document database design is simple and flexible. All data is stored as documents. Document databases (also called document-oriented databases or document stores) store data in semi-structured document formats such as JSON, YAML, BSON, and XML. They support flexible schemas and allow developers to store and query data using the same model as their application code.
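To make the document model concrete, here is a minimal Python sketch of a nested document and a lookup by path. The user-profile fields and the get_path helper are hypothetical and exist only for illustration; they are not the API of any particular document database.

```python
import json

# Hypothetical user-profile document; field names are illustrative only.
profile = {
    "_id": "user-1001",
    "name": "Ada",
    "emails": ["ada@example.com"],
    "address": {"city": "London", "country": "UK"},
    "preferences": {"newsletter": True, "theme": "dark"},
}


def get_path(document, dotted_path, default=None):
    """Follow a dotted path such as 'address.city' through nested dicts."""
    current = document
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# The same structure the application uses is what gets serialized and stored.
serialized = json.dumps(profile)
restored = json.loads(serialized)

city = get_path(restored, "address.city")
# A flexible schema means a field may simply be absent from a document.
missing = get_path(restored, "address.zip", default="n/a")
```

The application object and the stored document share one shape, which is why no mapping layer appears in the sketch.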

Common NoSQL Document Database use cases include:

  • Catalogs
  • User profiles
  • Content management

Some examples of NoSQL Document databases are:

  • MongoDB
  • Couchbase
  • CouchDB
  • Realm
  • Google Cloud Firestore

NoSQL Graph Database

NoSQL graph databases use graph structures to represent connections between items as nodes, edges/relationships, and properties. They are used to identify and analyze hidden relationships between connected data.

In a graph database, a node record’s main purpose is to point to lists of relationships, labels, and properties. Some graph databases store that data natively. This is called native graph storage. Others store that data in another NoSQL database. Wide column databases such as Cassandra, ScyllaDB, and HBase are commonly used as graph database storage backends.

Beyond the core essentials of nodes, edges/relationships, and properties, common features of NoSQL graph databases include graph traversals, shortest paths, pattern matching, graph views, and support for popular graph query languages such as Apache TinkerPop Gremlin and W3C’s SPARQL.
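As a rough illustration of nodes, edges, and a shortest-path traversal, the following Python sketch runs a breadth-first search over a tiny in-memory graph. The node names, the FOLLOWS relationship, and both helper functions are assumptions made up for this example; a real graph database would express the same query in a language such as Gremlin or SPARQL.

```python
from collections import deque

# Hypothetical edges as (source node, relationship type, target node).
edges = [
    ("alice", "FOLLOWS", "bob"),
    ("bob", "FOLLOWS", "carol"),
    ("alice", "FOLLOWS", "dave"),
    ("dave", "FOLLOWS", "carol"),
    ("carol", "FOLLOWS", "erin"),
]


def build_adjacency(edge_list):
    """Index directed edges by source node so traversal is a dict lookup."""
    adjacency = {}
    for source, _relationship, target in edge_list:
        adjacency.setdefault(source, []).append(target)
        adjacency.setdefault(target, [])
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the first shortest path found, or None."""
    if start not in adjacency or goal not in adjacency:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in adjacency[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


adjacency = build_adjacency(edges)
path = shortest_path(adjacency, "alice", "erin")
```

Because the edges are directed, asking for a path from "erin" back to "alice" returns None in this sketch.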

Use cases for NoSQL graph databases include:

  • Fraud detection and analytics
  • Artificial intelligence and machine learning
  • Social network analytics and management
  • Recommendation engines
  • Customer 360

Some examples of NoSQL graph databases are:

  • Neo4j
  • JanusGraph
  • TigerGraph
  • Dgraph

Key Value NoSQL Database

A key-value NoSQL database associates named keys with values of any type, including complex types. A team using a key-value NoSQL database is typically looking for simplicity and speed. NoSQL key-value database design is highly partitionable and optimized for reading and writing data. This enables key-value databases to achieve horizontal scaling that is impossible for many other types of databases.
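The reason key-value data partitions so easily is that each key can be routed to a node independently of every other key. The Python sketch below illustrates this with simple hash-modulo routing. The class name and the modulo scheme are assumptions chosen for brevity; production systems typically use consistent hashing or token ranges so that adding a node does not remap most keys.

```python
import hashlib


class PartitionedKeyValueStore:
    """Toy key-value store that spreads keys across in-process 'nodes'."""

    def __init__(self, node_count):
        # Each dict stands in for one server in a cluster.
        self.nodes = [{} for _ in range(node_count)]

    def node_for(self, key):
        """Map a key to a node index using a stable hash of the key."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key, value):
        self.nodes[self.node_for(key)][key] = value

    def get(self, key, default=None):
        return self.nodes[self.node_for(key)].get(key, default)


store = PartitionedKeyValueStore(node_count=4)
store.put("cart:42", {"items": ["book", "pen"]})
store.put("session:abc", "logged-in")
cart = store.get("cart:42")
```

A read touches exactly one node, the one the key hashes to, which is what lets capacity grow by adding nodes.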

Use cases for key-value NoSQL databases include:

  • Shopping carts
  • Session stores
  • Blockchain
  • Multimedia storage

Some examples of key-value databases are:

  • Amazon DynamoDB
  • Redis
  • Memcached
  • etcd
  • ScyllaDB
  • Riak KV

In-Memory NoSQL Databases

In-memory NoSQL databases are a subset of key-value databases that are used for in-memory data caching. They generally deliver high performance and low latency by minimizing reads and writes to slower disk-based systems. However, this approach is not suitable for massive volumes of data since it requires data to fit in memory. Moreover, using in-memory databases introduces complexity since data is stored in multiple layers.
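The layering mentioned above can be sketched as a cache-aside read path: check memory first, fall back to the slower store on a miss, and evict when memory is full. In this Python sketch the CacheAsideReader class, the dict standing in for a disk-based database, and the least-recently-used eviction policy are all assumptions for illustration.

```python
from collections import OrderedDict


class CacheAsideReader:
    """Reads through a bounded in-memory cache in front of a slower store."""

    def __init__(self, backing_store, capacity):
        self.backing_store = backing_store  # stands in for a disk-based database
        self.capacity = capacity            # the cache must fit in memory
        self.cache = OrderedDict()          # ordered oldest to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.backing_store[key]     # slow path; raises KeyError if absent
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return value


backing = {"a": 1, "b": 2, "c": 3}
reader = CacheAsideReader(backing, capacity=2)
for key in ["a", "a", "b", "c", "a"]:
    reader.get(key)
```

With a capacity of two and three distinct keys, the final read of "a" misses because "a" was evicted earlier, which shows why a working set larger than memory erodes the benefit.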

Some examples of in-memory databases are:

  • Redis
  • Memcached
  • Amazon MemoryDB

Wide Column NoSQL Database

A wide-column NoSQL database organizes data storage into flexible columns that can be spread across multiple servers or database nodes. Multi-dimensional mapping is used to reference data by column, row, and timestamp. Wide column databases offer high-performance querying, scalability, and a flexible data model.
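One way to picture the “key-key-value” layout described earlier is a two-level map: the partition key selects a partition, and the clustering key orders rows inside it. The Python sketch below models that shape in memory. The WideColumnTable class and the sensor data are hypothetical, and the sketch ignores distribution, timestamps, and persistence entirely.

```python
from bisect import bisect_left, bisect_right, insort


class WideColumnTable:
    """Toy model: partition key -> sorted clustering keys -> column values."""

    def __init__(self):
        self.partitions = {}        # partition key -> {clustering key -> columns}
        self.clustering_order = {}  # partition key -> sorted clustering keys

    def insert(self, partition_key, clustering_key, **columns):
        rows = self.partitions.setdefault(partition_key, {})
        if clustering_key not in rows:
            order = self.clustering_order.setdefault(partition_key, [])
            insort(order, clustering_key)  # keep rows sorted within the partition
            rows[clustering_key] = {}
        rows[clustering_key].update(columns)  # rows may carry different columns

    def range_query(self, partition_key, start, end):
        """Return rows with start <= clustering key <= end in one partition."""
        rows = self.partitions.get(partition_key, {})
        order = self.clustering_order.get(partition_key, [])
        low = bisect_left(order, start)
        high = bisect_right(order, end)
        return [(key, rows[key]) for key in order[low:high]]


table = WideColumnTable()
# Hypothetical sensor readings, inserted out of time order.
table.insert("sensor-1", 30, temperature=21.5)
table.insert("sensor-1", 10, temperature=20.0, humidity=40)
table.insert("sensor-1", 20, temperature=20.7)
table.insert("sensor-2", 10, temperature=18.2)

window = table.range_query("sensor-1", 10, 20)
```

Keeping rows sorted by clustering key is what makes a time-window read within a single partition a contiguous slice, which suits the time-series and IoT use cases listed below.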

Use cases for wide column NoSQL databases include:

  • Log data
  • IoT (Internet of Things) sensor data
  • Time-series data, such as temperature monitoring or financial trading data
  • Attribute-based data, such as user preferences or equipment features
  • Real-time analytics

Some examples of wide column databases are:

  • Apache Cassandra
  • ScyllaDB
  • HBase
  • Google Bigtable

NoSQL Search Engines

NoSQL search engine databases are focused on searching data content. They offer full-text search, complex search expressions, and ranking of search results. Indexes are used to categorize the similar characteristics among data and facilitate search capability.
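The index behind full-text search is typically an inverted index, which maps each term to the documents that contain it. The Python sketch below builds one and ranks results by raw term counts. The tokenizer, the sample documents, and the scoring rule are simplifying assumptions; real search engines use far more sophisticated analysis and relevance scoring.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build term -> {document id -> occurrences of the term}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, query):
    """Rank documents by total occurrences of the query terms."""
    scores = defaultdict(int)
    for term in tokenize(query):
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Highest score first; ties are broken by document id for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical documents for illustration.
documents = {
    "doc-1": "Low latency NoSQL database",
    "doc-2": "Latency benchmarks: database latency under load",
    "doc-3": "Graph traversal basics",
}
index = build_index(documents)
results = search(index, "latency database")
```

Here "doc-2" ranks above "doc-1" because it mentions "latency" twice, and "doc-3" is not returned because it contains neither query term.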

Use cases for search engine NoSQL databases include:

  • Text search
  • Navigational search
  • Logging and analysis
  • Time-series data such as metrics and application events

Some examples of search engine databases are:

  • Elasticsearch
  • Splunk
  • Solr
  • Algolia
  • Microsoft Azure Search

NoSQL Database vs SQL Database

Deciding whether to adopt an RDBMS vs NoSQL database is a fundamental architectural decision that should be driven by your needs for performance, scalability, availability, and consistency.

Most modern relational database management systems (a.k.a. RDBMS) use a rigid structure of tables with columns and rows. There is one entry per row and each column contains a specific piece of information. The highly organized data requires normalization, which reduces data redundancy and improves reliability. In relational databases, the data model is based on the entities, and the SQL queries are mostly an afterthought. Popular relational (SQL) databases include IBM Db2, Oracle Database, Microsoft SQL Server, and MySQL.

Non-relational databases are often implemented as NoSQL systems. There is no single type of NoSQL database. There are many different schemas, including key-value stores, document stores, graph databases, time series databases, and wide-column stores. In NoSQL, you need to consider the application before thinking about the data, because data modeling is query-based. Some NoSQL systems also support “multi-model” schemas, meaning they can support more than one data schema internally.

Unlike the ANSI/ISO processes for the SQL standard, there is no industry standard around implementing NoSQL systems. The exact manner of supporting various NoSQL schemas is up to the software developers. Popular non-relational (NoSQL) databases include Apache Cassandra, Apache HBase, MongoDB, Redis, and ScyllaDB.

Learn more in the detailed SQL vs NoSQL guide.

NoSQL Benchmarks for Performance Comparisons
You can find benchmarks comparing popular NoSQL database throughput and latency at this NoSQL database benchmark page. For example, benchmarks compare the performance of:

  • ScyllaDB
  • Apache Cassandra
  • DynamoDB
  • Google Cloud Bigtable

In addition, there are “apples to oranges” comparisons of NoSQL database performance vs distributed SQL database performance.

Best NoSQL Databases for Analytics vs. the Best NoSQL Database for Transactions

In a typical database, there are numerous workloads running at the same time. Each workload type dictates a different acceptable level of latency and throughput. For example, consider the following two workloads:

OLTP (Online Transaction Processing), the backend database for your application, handles a high volume of requests. It requires fast processing and is latency sensitive.

OLAP (Online Analytical Processing) performs data analytics in the background. It handles a high volume of data. However, slow queries are acceptable, so it is essentially latency agnostic.

Teams traditionally had to separate these database workloads, using isolated clusters, isolated virtual data centers, or time-based segregation (running analytics/reporting overnight). Each of these has associated limitations, risks, and/or significant costs.

An analogy: OLAP is like freight trucks, large eighteen-wheelers hauling a lot of data, where raw throughput (data volume) is what matters. OLTP, on the other hand, is more like a sports car. Built for data velocity, it is latency-sensitive. By using one database for analytics and another for transactions, you are essentially building one data highway for trucks and another highway for sports cars. This approach is inherently inefficient. A better approach is to run all traffic on the same highway, and grant some lanes of traffic priority over others.

The best way to handle both analytical and transactional workloads in the same database is to use an approach called workload prioritization. Workload prioritization splits workloads into groups and prioritizes their resource distribution according to a user-defined ratio. This mechanism kicks in only when there is a conflict about a resource (for example, under extremely high loads).
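The idea of a user-defined ratio that only matters under contention can be sketched as weighted fair sharing. The Python function below is a simplified model, not the scheduler of any specific database: the allocate function, the abstract capacity units, and the share values of 800 and 200 are all assumptions for illustration.

```python
def allocate(capacity, demands, shares):
    """Split capacity among workloads by share ratio, only when contended.

    demands: workload name -> requested units of a resource
    shares:  workload name -> relative weight (the user-defined ratio)
    """
    if sum(demands.values()) <= capacity:
        return dict(demands)  # no conflict: every workload gets what it asked for

    allocation = {name: 0.0 for name in demands}
    remaining = float(capacity)
    active = {name for name, demand in demands.items() if demand > 0}
    while active and remaining > 1e-9:
        total_shares = sum(shares[name] for name in active)
        satisfied = set()
        distributed = 0.0
        for name in active:
            offer = remaining * shares[name] / total_shares
            grant = min(offer, demands[name] - allocation[name])
            allocation[name] += grant
            distributed += grant
            if allocation[name] >= demands[name] - 1e-9:
                satisfied.add(name)
        remaining -= distributed
        if not satisfied:
            break  # everything was handed out strictly by ratio
        active -= satisfied  # leftover capacity is re-split among the rest
    return allocation


shares = {"oltp": 800, "olap": 200}
contended = allocate(100, {"oltp": 80, "olap": 80}, shares)
uncontended = allocate(100, {"oltp": 30, "olap": 40}, shares)
leftover = allocate(100, {"oltp": 20, "olap": 200}, shares)
```

When both workloads ask for more than the system has, the latency-sensitive one receives the larger portion. When total demand fits, the ratio never applies. When the high-priority workload needs little, the unused capacity flows to the analytics workload instead of sitting idle.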

NoSQL Database Comparison: Low Latency

In general, database latency is influenced by factors such as network latency, disk I/O, data size, throughput, workload characteristics, and how the database is built (database architecture and database internals). For a detailed look at the many factors that impact database latency, read the free book Database Performance at Scale.

Here is a quick look at specific databases considered for low-latency use cases.

Cassandra Latency

Cassandra’s distributed architecture aims to minimize latency by spreading data across nodes, enabling parallel processing of requests. However, latency can still vary based on factors like data distribution, consistency levels, and hardware performance. Monitoring and tuning configurations such as replication strategies and consistency levels can help optimize Cassandra’s latency performance.

For a specific example of Cassandra latency based on benchmarks, here is a look at Cassandra latency compared to ScyllaDB latency:

[Read more about this Cassandra latency benchmark]

DynamoDB Latency

DynamoDB achieves low latency through distributed data storage, “autoscaling”, and caching mechanisms. Optimizing table design can help minimize latency in DynamoDB.

For a specific example of DynamoDB latency based on benchmarks, here is a look at DynamoDB latency compared to ScyllaDB latency:

[Read more about this DynamoDB latency benchmark]

MongoDB Latency

MongoDB’s architecture allows for horizontal scaling, distributing data across multiple nodes to minimize latency. However, it is far from linearly scalable. Techniques such as indexing, sharding, and replica sets can be utilized to optimize MongoDB’s latency performance. Monitoring system metrics and adjusting configurations can help maintain low latency in MongoDB deployments.

For a specific example of MongoDB latency based on benchmarks, here is a look at MongoDB latency compared to ScyllaDB latency:

[Read more about this MongoDB latency benchmark]