NoSQL Data Modeling

NoSQL Data Modeling Definition

A benefit of NoSQL databases is that they don’t require strict schema design, but it’s still important that you follow data modeling best practices specific to them. These vary from technology to technology, but NoSQL data modeling can be generally defined as:

» A query-first data modeling technique that puts emphasis on the application workflow (rather than a “master” view of all data required upfront—like SQL), with the ultimate goal of defining and iterating schemas that match real-world application usage.

Let’s unpack this definition:

  • “Query-first” » Instead of focusing on how data relates to other data, as traditional SQL data modeling does, NoSQL data modeling focuses on the information a user is most likely to request while performing different actions in the application. This data is stored in its own table, independent of other tables, and can be duplicated so it is served faster.
  • “Emphasis on application workflow” » NoSQL data modeling centers on the user’s needs and the app development workflow. It allows for flexible object schema updates and data duplication rather than enforcing the strict schema, relationships, and application architecture of SQL data modeling.
  • “Defining and iterating schemas that match real-world usage” » As you develop your application, you’re likely to deal with user feedback regarding missing features, bugs, and unintuitive data delivery. With NoSQL data modeling, iterating on your schema to meet these needs and ship new app releases is generally faster than with SQL.

As a recap, where NoSQL data modeling is an application-driven, query-first technique, traditional SQL modeling is entity-driven—meaning you first need to define the data itself, then the model, and only at the end the application and the queries made by the user.
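To make the "Application -> Data -> Model" order concrete, here is a minimal sketch in plain Python. It is not ScyllaDB code: the dictionary is an in-memory stand-in for a table, and the query ("list a user's orders, newest first") and all names are hypothetical. The point is that the query is written down first and the storage shape is derived from it.

```python
from collections import defaultdict

# Step 1 (application): the query we need to serve, decided before any table exists.
#   "List a user's orders, newest first"  -> look up by user_id, ordered by date.

# Step 2 (model): one in-memory "table" shaped for exactly that query.
# The dict key plays the role of the lookup key; the list is kept in read order.
orders_by_user = defaultdict(list)

def record_order(user_id, order_id, placed_on, total):
    """Write path: store the order in the shape the query will read it."""
    orders_by_user[user_id].append(
        {"order_id": order_id, "placed_on": placed_on, "total": total}
    )
    # ISO dates sort correctly as strings; newest first matches the query.
    orders_by_user[user_id].sort(key=lambda o: o["placed_on"], reverse=True)

def orders_for(user_id):
    """Read path: a single lookup, no joins or re-sorting at query time."""
    return orders_by_user.get(user_id, [])
```

An entity-driven design would start from "users" and "orders" as separate entities and work out the query last; here the read path stays trivial because the write path did the work.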

This table from ScyllaDB University summarizes the differences:

#  | ScyllaDB                                   | Relational Databases
1  | Query-based: Application -> Data -> Model  | Entity-based: Data -> Model -> Application
4  | Denormalization                            | Support for foreign keys, joins
5  | CAP theorem, eventual consistency          | ACID guarantee
6  | Distributed architecture                   | Mostly single point of failure

[Image: NoSQL data models: key-value, document, graph, and wide column]

NoSQL Data Modeling FAQs

How To Model a NoSQL Database

Every NoSQL database has its own philosophy around data modeling and schema building. For example, ScyllaDB is a wide column database, meaning it organizes data storage into flexible columns that can be spread across multiple servers or database nodes, using multi-dimensional mapping to reference data by column, row, and timestamp.

[Image: wide column database]

This isn’t entirely unlike a traditional SQL database.

However, where NoSQL wide column databases differ is that a row only stores the columns it actually has values for. Empty columns don’t need to be present for rows where data isn’t entered, which saves storage and can make queries more efficient.
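The row/column/timestamp mapping and the "no empty columns" property can be pictured with a small sketch. This is a simplified in-memory illustration in Python, not how ScyllaDB stores data on disk; the `put` and `get` helpers are hypothetical, and keeping the cell with the newest timestamp is a simplification of last-write-wins behavior.

```python
import time

# table[row_key][column_name] = (timestamp, value)
# Columns that were never written for a row simply do not exist in its dict.
table = {}

def put(row_key, column, value, timestamp=None):
    """Write one cell; the cell with the newest timestamp is the one kept."""
    ts = time.time() if timestamp is None else timestamp
    row = table.setdefault(row_key, {})
    current = row.get(column)
    if current is None or ts >= current[0]:
        row[column] = (ts, value)

def get(row_key, column):
    """Read one cell's value, or None if the row or column is absent."""
    cell = table.get(row_key, {}).get(column)
    return None if cell is None else cell[1]
```

A row that only ever received a `name` value holds a single cell; there is no placeholder for `email` or any other column that other rows happen to use.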

So, how do you effectively model a NoSQL database?

By following 3 core guidelines:

  • Don’t worry about duplicated data.
    Modeling NoSQL data means switching the point of view from a data-first to a query-first angle, meaning that as long as the query delivers the relevant information and nothing more, duplicated data across multiple rows is not an issue. For advanced uses, there are cases where duplication is best avoided, but you generally don’t need to worry about it.
  • Store relevant data in a single table, document, set (key/value), etc.
    Depending on the NoSQL database of your choice, you will want to store all data relevant to a specific query in a single table, document, or other store type. This way, you won’t need to make multiple calls to the database when a user requests particular information, increasing performance and promoting application scalability.
  • Make the most of database schema flexibility.
    While versioning your schema is an important consideration even for NoSQL databases, you shouldn’t worry about having the perfect data structure for your application. You’re doing it right when you leave the SQL mentality at the door and instead focus on iteratively adding or removing columns in a table based on the user needs.

There are other important considerations to make with NoSQL data modeling, like schema versioning, but as an introduction, the 3 points above should get you quite far!
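As an illustration of the third guideline, the sketch below shows application code that tolerates a column added in a later release. It is a plain Python example with hypothetical data: the `nickname` column and the `display_name` helper are assumptions made for the example, not part of any ScyllaDB API.

```python
# Rows written by two app releases; the second release added "nickname".
rows = [
    {"user_id": "u1", "name": "Ada"},                     # written before the change
    {"user_id": "u2", "name": "Bo", "nickname": "bobo"},  # written after the change
]

def display_name(row):
    """Prefer the newer column when present, fall back to the original one."""
    return row.get("nickname") or row["name"]

names = [display_name(r) for r in rows]
```

Older rows need no migration here because the reader treats the new column as optional. This is also where schema versioning starts to matter, since the application has to know which shapes it may encounter.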

Why Is Modeling NoSQL Data Important? (2 Benefits)

There are a slew of benefits that come from effective upfront data modeling, even with a NoSQL database like ScyllaDB. But here are two critical ones:

  1. Better scalability from the get-go.
    You hear it all the time: “But will your app scale?” It’s a question as old as modern tech, and it comes from a genuine issue: when badly designed, databases become less performant and more expensive to support as your application’s user base grows. This leads to significant bottlenecks that “don’t scale” with demand.
    By investing some time thinking about your data model upfront and following best practices as you iterate on it, you set the foundations for sustainable growth. Queries return information fast, nodes are added when needed to support high volumes (see ScyllaDB University on how node clusters work), and users are happy.
    [Image: NoSQL nodes]
  2. Reduced cost, higher performance, and better overall app design.
    As a result of good data modeling practices, you can expect queries to perform better even under load, be more efficient, and reduce cost by using less computing power. These are all by-products of good NoSQL data practices, and they lead to a better experience both for those using your app and for the developers serving it.
    Good app design comes mostly from how the data is served to the user. If they have to go through 3 or 4 steps to compile the information they need to achieve one outcome, that is often the result of bad data modeling, where multiple calls need to be made instead of embedding all of the information in one table, ready for use.
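The cost of "multiple calls" versus "one table, ready for use" can be sketched by counting reads. The `CountingStore` class below is a hypothetical in-memory stand-in for a database client, and the table and field names are made up for the example.

```python
class CountingStore:
    """In-memory stand-in for a database that counts read calls."""

    def __init__(self, tables):
        self.tables = tables
        self.calls = 0

    def read(self, table, key):
        self.calls += 1
        return self.tables[table][key]

# Entity-style layout: the order summary is spread over three tables.
normalized = CountingStore({
    "orders": {"o1": {"user_id": "u1", "product_id": "p1"}},
    "users": {"u1": {"name": "Ada"}},
    "products": {"p1": {"title": "Keyboard"}},
})

# Query-first layout: everything the summary needs is embedded in one row.
denormalized = CountingStore({
    "order_summary": {"o1": {"user_name": "Ada", "product_title": "Keyboard"}},
})

def summary_normalized(store, order_id):
    """Three reads: the order, then the user and product it points to."""
    order = store.read("orders", order_id)
    user = store.read("users", order["user_id"])
    product = store.read("products", order["product_id"])
    return {"user_name": user["name"], "product_title": product["title"]}

def summary_denormalized(store, order_id):
    """One read: the row already has the shape the screen needs."""
    return store.read("order_summary", order_id)
```

Both functions return the same summary, but the first one pays for three round trips per request. In a real deployment each of those is a network call, so the gap grows with load.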

There’s more to say on why NoSQL data modeling is important, so take a look at Hackolade’s guide to learn more about the different aspects of good design with NoSQL databases.

NoSQL Data Modeling: From Beginner To Advanced

When you’re ready to take your NoSQL data modeling journey into advanced topics, it’s important that you refer to the technical documentation of each particular database technology. Some things to think about with wide column databases like Apache Cassandra and ScyllaDB are:

  • The application workflow and query analysis (University lesson)
    Ideally, each individual user query should be served from a single partition in ScyllaDB (more on partitions in our “ring” architecture guide). This keeps every query as efficient as it can be, and it ensures a good data model that reflects the user journey in your app.
  • Considering the right data types and collections (University lesson)
    Using the right data type for the information you’re storing applies to most databases, and it’s no different with Cassandra and ScyllaDB. The only difference is with Cassandra Query Language (CQL) collections: maps, sets, and lists. See the full collections reference.
  • Denormalizing when multiple table calls are required (University lesson)
    If you can’t find a way around making multiple calls, you will want to denormalize data by making a copy of it when it’s first written to the database. Since writes are very efficient in ScyllaDB, the extra write typically has little impact on performance, and the data is now easier to retrieve.
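The three points above can be combined in one small sketch: one table per query, each read touching a single partition, and a copy made at write time. It uses plain Python dictionaries as stand-ins for tables, with the dictionary key acting as the partition key. The track data, the helper names, and the use of a `frozenset` to mimic a CQL set are all assumptions for illustration.

```python
from collections import defaultdict

# Each dict models one table; its key models that table's partition key.
tracks_by_artist = defaultdict(list)
tracks_by_genre = defaultdict(list)

def add_track(title, artist, genre, tags):
    """Denormalize at write time: one logical write, one copy per query table."""
    row = {"title": title, "artist": artist, "genre": genre,
           "tags": frozenset(tags)}  # stands in for a CQL set collection
    tracks_by_artist[artist].append(row)
    tracks_by_genre[genre].append(dict(row))  # a separate copy, as in a second table

def titles_by_artist(artist):
    """Served from the single partition keyed by this artist."""
    return [r["title"] for r in tracks_by_artist.get(artist, [])]

def titles_by_genre(genre):
    """Served from the single partition keyed by this genre."""
    return [r["title"] for r in tracks_by_genre.get(genre, [])]
```

The trade-off is visible in `add_track`: every write does twice the work so that neither read has to scan or combine tables. It also means the two copies must be updated together if a track changes.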

As an overview of NoSQL data modeling, all the points and links above should keep you busy for a lot of NoSQL database design and development. If you want to dig even deeper, there is an entire course on NoSQL data modeling—it only takes a few hours to complete!

Does ScyllaDB Offer Resources For NoSQL Data Modeling?

Yes. In addition to extensive ScyllaDB University courses on NoSQL data modeling, ScyllaDB also provides a one-hour “crash course” video on data modeling, Wide Column Store NoSQL vs SQL Data Modeling, and an educational page on SQL vs. NoSQL.

Some NoSQL databases popularized the notion of “loose schema”, often misunderstood as “schemaless” — but there is always a data model in the database, the application, or the mind of the developer.

However, NoSQL schemas are designed with very different goals in mind than SQL schemas. Where SQL normalizes data, NoSQL denormalizes. Where SQL joins ad-hoc, NoSQL pre-joins. And where SQL tries to push performance to the runtime, NoSQL bakes performance into the schema. Adding to the confusion, various NoSQL databases have different ideas on what schemas should enforce.

This short video, by ScyllaDB VP Product, Tzach Livyatan, explores the core concepts of NoSQL schema design, using ScyllaDB as an example to demonstrate the tradeoffs and rationale.


“This piece was created in partnership with Datavid, a semantic data services company with extensive experience in NoSQL databases, data architecture, and knowledge graphs.”

