See all blog posts

The Data Modeling Behind Social Media “Likes”

Did you ever wonder how Instagram, Twitter, Facebook, and  other social media platforms track who liked your posts? This post explains how to do it with ScyllaDB NoSQL. And if you want to go hands-on with these and other NoSQL strategies, join us at ScyllaDB Labs, a free 2 hour interactive event. 

Recently, I was invited to speak at an event called “CityJS.” But here’s the thing: I’m the PHP guy. I don’t do JS at all, but I accepted the challenge. To pull it off, I needed to find a good example to show how a highly scalable and low latency database works.

So, I asked one of my coworkers for examples. He told me to look for high numbers inside any platform, like counters or something like that. At that point, I realized that any type of metrics can fit this example. Likes, views, comments, follows, etc. could be queried as counters. Here’s what I learned about how do proper data modeling for these using ScyllaDB.

 

First things first, right? After deciding what to cover in my talk, I needed to understand how to build this data model.

We’ll need a posts table and also a post_likes table that relates who liked each post. So far, it seems enough to do our likes counter.

My first bet for a query to count all likes was something like:

Ok and if I just do a query with SELECT count(*) FROM social.post_likes it can work, right?

Well, it worked but it was not as performant as expected when I did a test with a couple thousands of likes in a post. As the number of likes grows, the query becomes slower and slower…

“But ScyllaDB can handle thousands of rows easily… why isn’t it performant?” That’s probably what you’re wondering right now.

ScyllaDB – even as a cool database with cool features – will not solve the problem of bad data modeling. We need to think about how to make things faster.

Researching Data Types

Ok, let’s think straight: the data needs to be stored and we need the relation between who liked our post, but we can’t use it for count. So what if I create a new row as integer in the posts table and increment/decrement it every time?

Well, that seems like a good idea, but there’s a problem: we need to keep track of every change on the posts table and if we start to INSERT or UPDATE data there, we’ll probably create a bunch of nonsense records in our database.

Using ScyllaDB, every time that you need to update something, you actually create new data.

You will have to track everything that changes in your data. So, for each increase, there will be one more row unless you don’t change your clustering keys or don’t care about timestamps (a really bad idea).

After that, I went into the ScyllaDB docs and found out that there’s a type called counter that fit our needs and is also ATOMIC!

Ok, it fit our needs but not our data modeling. To use this type, we have to follow a few rules but let’s focus on the ones that are causing trouble for us right now:

  • The only other columns in a table with a counter column can be columns of the primary key (which cannot be updated).
  • No other kinds of columns can be included.
  • You need to use UPDATE queries to handle tables that own a counter data type.
  • You only can INCREMENT or DECREMENT values, setting a specific value is not permitted.

This limitation safeguards correct handling of counter and non-counter updates by not allowing them in the same operation.

So, we can use this counter but not on the posts table… Ok then, it seems that we’re finding a way to get it done.

Proper Data Modeling

With the information that counter type should not be “mixed” with other data types in a table, the only option that is left to us is create a NEW TABLE and store this type of data.

So, I made a new table called post_analytics that will hold only counter types. For the moment, let’s work with only likes since we have a Many to Many relation (post_likes) created already.


These next queries are what you probably will run for this example that we created:

Now you might have new unanswered questions in your mind like: “So every time that I need a new counter related to some data, I’ll need a new table?” Well, it depends on your use case. In the social media case, if you want to store who saw the post, you will probably need a post_viewers table with session_id and a bunch of other stuff.

Having these simple queries that can be done without joins can be way faster than having count(*) queries.

Me talking at CityJS stage

Me on the CityJS stage talking about data modeling using Typescript

Go Hands-On with ScyllaDB and Data Modeling…at the ScyllaDB Labs Event

If you want to go deeper on this topic, with some real hands-on exercises, please join us for ScyllaDB Labs! It’s a free 2 hour virtual event where I’ll be presenting along with Guy Shtub, Head of Training and Felipe Cardeneti Mendes, Solutions Architect.

Join us at the Event

This is an interactive workshop where we’ll go hands-on to build and interact with high-performance apps using ScyllaDB. It will be a  great way to discover the NoSQL strategies used by top teams and apply them in a guided, supportive environment. As you go live with some sample applications, you’ll learn about the features and best practices that will enable your own applications to get the most out of ScyllaDB.

We’ll cover:

  • Understanding if your use case is a good fit for ScyllaDB
  • Achieving and maintaining low-latency NoSQL at scale
  • Building high performance applications with ScyllaDB
  • Data modeling, ingestion, and processing with the sample app
  • Navigating key decisions & logistics when getting started with ScyllaDB

Hope to see you there!

About Daniel Reis

Daniel is a Developer Advocate at ScyllaDB. His goal is to make content about Science and Technology more accessible to everyone and create the best environment for new developers who wants to follow on this journey. In 2018, Daniel founded a developer community called He4rt Developers that is now known by almost every Brazilian developer.