NoSQL Database Guide

What is NoSQL?

NoSQL is a non-relational database that does not typically use Structured Query Language (SQL) to retrieve information. NoSQL databases were developed for use cases where a traditional relational database is not sufficient due to the size (volume), type (variety) or speed (velocity) of big data.

What is SQL?

Structured Query Language (SQL) is a database management language, and currently the most popular method of accessing data from and inputting data to a relational database management system (RDBMS). SQL databases have been around for decades. First introduced in 1974, SQL has continued to evolve over the years. Because it is such an important industry standard, it is maintained jointly by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). The standard is published as ISO/IEC 9075; the 2016 edition, for example, is ISO/IEC 9075:2016.

SQL databases consist of records organized into tables. In a single table, data is organized into rows, each identified by a primary key, with columns that hold values for different kinds of data. Tables are related to each other through foreign keys, which allow the different tables in the database to be joined.

A schema defines all of the tables, their constituent columns and data types, and the means of joining the tables together. The structure of a relational database can be illustrated by a design referred to as an Entity Relationship Diagram (ERD).
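As a minimal sketch of these ideas, the following Python snippet uses the standard-library sqlite3 module to define a schema with a primary key and a foreign key, then joins the two tables. The table names, column names and sample rows (customers, orders, "Ada") are invented for illustration and do not come from any particular system.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# The schema defines each table, its columns and data types, and how tables relate.
conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total       REAL NOT NULL
);
""")

conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO orders (id, customer_id, total) VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 17.5)],
)

# The foreign key lets a JOIN relate rows across the two tables.
rows = conn.execute("""
    SELECT c.name, o.id, o.total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()
print(rows)
```

With foreign keys enabled, inserting an order that points at a customer id that does not exist is rejected by the database, which is the kind of structural guarantee a relational schema provides.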

Figure 1: An example of a typical Entity Relationship Diagram (ERD) used to model data in a SQL RDBMS; note that the tables are joined using various relationships (these symbols are called “crow’s foot” notations)

What's the Difference Between SQL and NoSQL?

What is a NoSQL Database?

A NoSQL database (NoSQL DB) is designed to handle massive amounts of distributed data. NoSQL database management systems store and retrieve data in a variety of ways other than the joined tabular models of relational database management systems (RDBMS). NoSQL databases can handle different types of data and can accommodate many kinds of data models including key-value stores, document, columnar and graph formats.

NoSQL databases are most advantageous for modern applications, particularly those with extremely large, high-velocity, distributed datasets.

NoSQL Database History

The term “NoSQL” was coined by Carlo Strozzi in 1998. He had invented a form of RDBMS that used a different form of data access than SQL. Note that his database was still a relational database — tables still had joins — but the method of accessing it, using Unix pipes, did not rely on the SQL language.

Modern NoSQL databases came about a decade later with the advent of massive Internet e-commerce platforms and social media. While in the early Internet and web era many SQL databases were adapted for data applications (such as the use of open source MySQL to power WordPress sites), there were architectural limitations to SQL servers that made them too inflexible and hard to scale for entire new classes of problems that needed to be solved. New data models were required. These included massive (but relatively simple) key-value stores, highly distributed and highly available wide-column stores, and more specialized data models such as document-based stores and graph databases.

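NoSQL is sometimes also read as “Not only SQL” to show that these databases may actually support Structured Query Language or can work in tandem with a SQL database. A SQL-based interface can still be useful, and such databases support hybrid operations, doing both SQL and non-SQL data management. The related term “NewSQL” is usually reserved for a different class of systems: distributed relational databases that keep SQL and transactional guarantees while aiming for NoSQL-style scalability.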

What are NoSQL Databases Used For?

Many NoSQL databases are used for applications with very large datasets, and some operate in real time. They are used by enterprises that need more flexibility than a traditional database offers, including the ability to accommodate many kinds of data models in a variety of formats. A NoSQL database typically provides the following data management features, which are harder to achieve with traditional relational databases:

  • Caching — Improves application response when the same information is being sent to many users. NoSQL storage of data provides a solution without having to maintain a custom cache.

  • Highly available, globally-distributed data — With worldwide user bases and demanding requirements for always-on services, data systems cannot just reside in a single internal datacenter. Many NoSQL databases support global distribution of servers and automatic replication of data between them. This is called Cross Datacenter Replication.

  • Fast access for big data — NoSQL software can often handle the massive amounts of data in today’s systems faster and at lower cost than relational databases. “Fast” is usually qualified in two ways: latency, the time taken to respond to a request, often measured in milliseconds or sub-milliseconds; and throughput, the amount of work that can be processed, often measured in operations per second.

  • Consistency — The consistency requirements of many NoSQL database use cases are less rigid than those of relational databases, for instance when handling ephemeral, transient, rapidly changing data. In such cases, some data loss is acceptable in order to maintain system availability. The ACID model is not always necessary when an “eventual consistency” model offers better performance. Some databases offer a range of consistency levels to choose from, including “tunable consistency,” where each operation may have its own consistency level.
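Tunable consistency is often reasoned about with a quorum rule: if a write must be acknowledged by W replicas and a read must consult R replicas out of N, then reads are guaranteed to see the latest acknowledged write when R + W > N. The sketch below illustrates that rule only. The level names ONE, QUORUM and ALL and the replica count of 3 are assumptions chosen for illustration, and real databases add many details (timeouts, repair, hinted writes) that are not modeled here.

```python
def is_strongly_consistent(replicas: int, write_acks: int, read_acks: int) -> bool:
    """Return True when every read quorum overlaps every write quorum."""
    if not (1 <= write_acks <= replicas and 1 <= read_acks <= replicas):
        raise ValueError("acknowledgement counts must be between 1 and the replica count")
    return read_acks + write_acks > replicas


# Hypothetical consistency levels for data kept on 3 replicas.
REPLICAS = 3
LEVELS = {"ONE": 1, "QUORUM": REPLICAS // 2 + 1, "ALL": REPLICAS}

for write_name, w in LEVELS.items():
    for read_name, r in LEVELS.items():
        label = "strong" if is_strongly_consistent(REPLICAS, w, r) else "eventual"
        print(f"write={write_name:<6} read={read_name:<6} -> {label}")
```

Choosing lower levels for both reads and writes trades the overlap guarantee for lower latency and higher availability, which is the trade-off described in the consistency item above.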

Why Use NoSQL Databases?

The flexible architecture of NoSQL databases is among the top reasons why NoSQL is a strong choice for today’s demanding application requirements. Other advantages of NoSQL are as follows:

  • Easier scalability — NoSQL databases can be easily scaled to handle massive amounts of data. Their distributed nature means increases in demand can be met by simply adding servers to the cluster.
  • More efficiency — The looser data consistency models of NoSQL databases can result in better efficiency. When new information doesn’t have to be updated across the entire database in real time, resources can be directed to more critical issues.
  • Simple documents — Document-oriented NoSQL databases store data as simple documents, which leads to easier data manipulation than the data mapping required in SQL databases.
  • Always on — The distributed nature of NoSQL databases allows them to always remain on for enterprises that require non-stop service.
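One common way to make "adding servers to the cluster" cheap is consistent hashing, where keys and servers are placed on a ring and each key belongs to the next server clockwise. The following is a minimal sketch of that technique, not a description of any specific product. The node names, the use of SHA-256 and the choice of 64 virtual nodes per server are assumptions made for illustration.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a stable 64-bit position on the ring."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (token, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Several virtual positions per server spread its share of keys evenly.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (token(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        # First ring position at or after the key's token, wrapping around at the end.
        index = bisect.bisect_left(self.ring, (token(key), ""))
        if index == len(self.ring):
            index = 0
        return self.ring[index][1]


keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {key: ring.node_for(key) for key in keys}

ring.add_node("node-d")  # scale out by one server
after = {key: ring.node_for(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
print(len(moved), "of", len(keys), "keys moved to", {after[key] for key in moved})
```

Only the keys that now belong to the new server change owner; the rest stay where they were, which is why capacity can be added without reshuffling the whole dataset.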

Relational Database Vs. NoSQL Database

The following overview outlines the differences between relational (SQL) and NoSQL databases:

  • Orientation
      SQL: Relational.
      NoSQL: Generally non-relational.

  • Schema
      SQL: Strict and rigid schema design and data normalization.
      NoSQL: Loose and more varied designs for unstructured and semi-structured data; data is generally denormalized.

  • Language
      SQL: Structured Query Language (SQL) for defining, reading and manipulating data. Supports JOIN statements to relate data across tables.
      NoSQL: There are different languages for querying, some quite similar to SQL, such as Cassandra Query Language (CQL) for column store databases, and others radically different, such as JSON-based queries for document databases.

  • Scalability
      SQL: Typically scaled vertically. The capacity of a single server is increased by adding CPU, RAM or SSD.
      NoSQL: Generally designed for horizontal scalability. Increased traffic can be handled by adding more servers to the database cluster. This is useful for large and frequently changing datasets.

  • Structure
      SQL: Table-based, which is efficient for applications using multi-row transactions or systems that were built with a relational structure.
      NoSQL: Structure is variable, and can be based on documents, key-value pairs, graph structures or wide-column stores.

SQL Database Examples

  • Db2: IBM’s relational database, originally released in 1983 and created to run on mainframes; over time it was also ported to Unix and Windows; now supports Linux servers.
  • MariaDB: An open source SQL server, forked from MySQL; created in 2009, when Oracle announced its acquisition of Sun Microsystems, then the owner of MySQL.
  • MemSQL: A distributed SQL server designed to run from memory (RAM) for fastest performance; since renamed SingleStore.
  • Microsoft SQL Server: Microsoft’s SQL server, originally designed to run on OS/2, became the mainstay of databases to run on Microsoft operating systems. In 2017 it was ported to run on Linux.
  • MySQL: An open source SQL server widely adopted by web and application developers. It was acquired by Sun Microsystems in 2008 and passed to Oracle when Oracle acquired Sun in 2010; now available as both open source and proprietary enterprise editions.
  • Oracle: First released in 1979, Oracle became widely adopted in enterprises over the following decades and remains so.
  • PostgreSQL: An open source SQL server that grew out of the POSTGRES project at UC Berkeley, the 1980s successor to the Ingres project.

NoSQL Database Comparison

The NoSQL database structures include the following classifications:

  • Key-value stores

    — Each item is stored as a “key” (unique identifier) along with its value. This is the simplest form of NoSQL database. Examples include Redis and Aerospike.

  • Document databases

    — Each key is paired with a structured data “document” (data structure). Documents can contain many different key-value pairs in a nested, hierarchical format, such as JavaScript Object Notation (JSON).

  • Wide-column stores

    — Designed to handle large dataset queries. Data is grouped into columns and column families rather than fixed-width rows, which makes it more efficient to query data in frequently referenced columns, and to store sparse data (where rows may have only a few data values spread across the many columns). Examples include Apache Cassandra, HBase and Scylla.

  • Graph stores

    — Hold social connections and other networks of related data, stored as nodes and the relationships between them. Examples include Neo4j and Giraph.
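To make the four classifications more concrete, the sketch below models each one with plain Python data structures. It is only an illustration of the shape of the data, not of how any real database stores it, and all keys and values (session:42, user:42, sensor-1, the FOLLOWS edge) are invented.

```python
import json

# Key-value: an opaque value looked up by a unique key.
key_value_store = {"session:42": "logged_in"}

# Document: each key maps to a nested, JSON-like document.
document_store = {
    "user:42": {
        "name": "Ada",
        "emails": ["ada@example.com"],
        "address": {"city": "London"},
    }
}

# Wide-column: each row key maps to its own sparse set of columns.
wide_column_store = {
    "sensor-1": {"temp": 21.5, "humidity": 40},
    "sensor-2": {"temp": 19.0},  # no humidity column is stored for this row
}

# Graph: nodes plus labeled edges between them.
graph_store = {
    "nodes": {"ada", "grace"},
    "edges": [("ada", "FOLLOWS", "grace")],
}

print(json.dumps(document_store["user:42"], indent=2))

# A simple graph traversal: who follows "grace"?
followers_of_grace = [
    source
    for source, label, target in graph_store["edges"]
    if label == "FOLLOWS" and target == "grace"
]
print(followers_of_grace)
```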

Using SQL and NoSQL Together

Modern enterprises do not view SQL and NoSQL as an either/or proposition. A survey at the latest DeveloperWeek conference found that 44 percent of organizations use multiple databases. Among those, 75 percent use a combination of SQL and NoSQL databases.

Many organizations take advantage of the ways that SQL and NoSQL databases can complement each other. Each brings its own strengths, making the sum greater than each part.

SQL databases handle structured data and standardize how elements relate to one another. NoSQL databases are popular when flexibility is a concern. They can create unique data structures that include documents, graphs or columns.

How is NoSQL Data Modeled?

NoSQL database schemas often exhibit a more flexible design, without enforcing the normalized data forms and pre-defined schemas and structures common in SQL databases.

One benefit is that a data model can evolve over time, for example from a simple key-value layout to a more complicated graph model when the NoSQL database is paired with a graph layer.

Since NoSQL is an emerging technology, solutions for NoSQL data modeling are still being developed. As more enterprises use NoSQL databases, the design will mature with the following benefits:

  • Reusable data model design patterns for more efficient application development.
  • An inventory of unified NoSQL data models that can support different NoSQL products.
  • Automated data model extraction from application source code.
  • Automated code-model-data consistency validation.
  • Strong control for application and data model change management.

Are NoSQL Databases Faster than SQL Databases?

NoSQL databases such as key-value systems often have far simpler schemas, and thus, they tend to be faster than SQL systems. SQL databases tend to allow for more complex queries, exemplified by making JOINs across tables. Naturally, there is a computational cost, and time required, for such complexity. There are many additional architectural factors that go into a specific database’s performance, such as the use of in-memory tables or RAM caches, various data structures that can help efficiency (such as bloom filters) and so on. Even the programming language a database is written in can affect system performance, as it may allow for, or prohibit, low-level hardware optimization. Thus it is not possible to categorically say that NoSQL databases are always faster than SQL databases (or vice versa). Users need to create and conduct realistic tests to see how various SQL or NoSQL databases perform under specific conditions, data models, and querying patterns.
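A realistic test usually reports both latency and throughput for the workload in question. The following is a minimal sketch of such a harness. The benchmark function and its result keys are hypothetical names, and the dictionary lookup is only a stand-in workload that would be replaced by calls to the actual database client under test.

```python
import statistics
import time


def benchmark(operation, iterations=10_000):
    """Run `operation` repeatedly and report throughput and latency percentiles."""
    latencies = []
    start = time.perf_counter()
    for i in range(iterations):
        t0 = time.perf_counter()
        operation(i)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    return {
        "ops_per_second": iterations / elapsed,
        "median_ms": statistics.median(latencies) * 1000,
        "p99_ms": latencies[int(0.99 * (len(latencies) - 1))] * 1000,
    }


# Stand-in workload: lookups in an in-memory dict. Replace with real client calls.
store = {i: str(i) for i in range(10_000)}
result = benchmark(lambda i: store[i])
print(result)
```

Reporting a high percentile alongside the median matters because a system with a good average can still have slow outliers under load. Results from a harness like this only apply to the data model and query pattern that were actually measured.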

How are NoSQL Databases Indexed?

NoSQL databases are indexed with keys that correspond to the location of an associated piece of data. Common indexing methods include B-Tree, T-Tree and O2-Tree indexing, as well as Log Structured Merge (LSM) tree indexing.

  • B-Tree Indexing

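    — A self-balancing tree in which each node may have a variable number of child nodes within a fixed range. This means the tree needs rebalancing less often, at the cost of some unused space inside nodes. The B+Tree is a popular version of B-Tree indexing where every key must reside in the leaves.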

  • T-Tree Indexing

    — Features three kinds of nodes: A T-Node that has a right and left child, a leaf node with no children, and a half-leaf node with only one child. Each node stores multiple tuples, or lists. The use of binary search results in more efficient storage.

  • O2-Tree Indexing

    — Created to enhance existing indexing methods by placing tuples within each leaf node. Advances the concept of a Binary-Search tree.

  • Log Structured Merge (LSM) Tree Indexing

    — Buffers writes in an in-memory structure (the memtable) and periodically writes its contents to disk in immutable, sorted form, known as a Sorted Strings Table (SSTable). Over time, these SSTables are merged in a process known as compaction.
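The LSM write path described above can be sketched in a few lines. The TinyLSM class below is a hypothetical toy: a plain dict stands in for the memtable, "disk" is a Python list of immutable sorted tuples, and deletes, write-ahead logging and bloom filters are deliberately left out.

```python
import bisect


class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}   # recent writes, held in memory
        self.sstables = []   # immutable sorted tables, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as an immutable, sorted SSTable."""
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # Newest data wins: check the memtable, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            # (key,) sorts just before any (key, value) pair, so this finds the key's slot.
            index = bisect.bisect_left(table, (key,))
            if index < len(table) and table[index][0] == key:
                return table[index][1]
        return None

    def compact(self):
        """Merge all SSTables into one, keeping only the newest value per key."""
        merged = {}
        for table in self.sstables:  # oldest first, so newer values overwrite older ones
            merged.update(table)
        self.sstables = [tuple(sorted(merged.items()))] if merged else []


db = TinyLSM(memtable_limit=2)
db.put("a", "1")
db.put("b", "2")   # memtable is full, flushed to the first SSTable
db.put("a", "3")
db.put("c", "4")   # flushed to the second SSTable, which shadows the old "a"
db.put("d", "5")   # still in the memtable
tables_before = len(db.sstables)
db.compact()
print(tables_before, len(db.sstables), db.get("a"))
```

Writes never modify an existing SSTable, which keeps them sequential and fast. The cost is that reads may have to consult several tables until compaction merges them.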

NoSQL Database Use Cases

NoSQL databases are most often used where high performance and high data volume are required. Big data and NoSQL databases are a good match because the flexible design allows for many kinds of datasets in many different formats.

The trade-off for speed within a large dataset is often weaker consistency. While SQL databases typically provide strong consistency guarantees, many NoSQL databases do not promise total data consistency.

But for companies like Amazon, where each transaction requires a tremendous amount of read-write activity, a traditional SQL database may not be able to match the speed and scaling required for millions of real-time transactions. That is why NoSQL databases are widely used for e-commerce and transactional data in applications that serve high volumes of customers in real time.

NoSQL Database Examples / NoSQL Database List

Examples of NoSQL databases include the following:

Each entry below is listed as Database (NoSQL data models): Description.

  • Aerospike (key-value): A flash-optimized and in-memory open source NoSQL database.
  • Amazon DynamoDB (key-value, document): Cloud-based database only offered on AWS.
  • Apache Cassandra (column, time series): An open source, highly scalable and distributed database created at Facebook to handle massive amounts of structured data.
  • Apache CouchDB (document): Open source and web-oriented database.
  • Apache HBase (column): Open source column store database developed as a part of Hadoop.
  • Google Bigtable (column): A compressed, high performance, proprietary data storage system.
  • JanusGraph (graph): An open source, distributed graph database under The Linux Foundation; works on top of Scylla or Apache Cassandra.
  • Microsoft Cosmos DB (document, columnar, graph): Proprietary, schema-agnostic and horizontally scalable NoSQL database.
  • MongoDB (document): Document-oriented database by MongoDB.
  • Neo4j (graph): ACID-compliant and transactional graph database with native graph storage and processing.
  • Oracle NoSQL Database (key-value): Scalable, distributed NoSQL database, designed to provide highly reliable, flexible and available data management.
  • Redis (key-value, document, time series): Data structure server whose keys can contain strings, hashes, lists, sets and sorted sets.
  • Scylla (key-value, column, time series): Enterprise and open source database which is a drop-in alternative to Apache Cassandra offering higher performance, lower latency and reduced cost.