
Mutant Monitoring System (MMS) Day 2 – Building the Tracking System

IMPORTANT: Since the first publication of the Mutant Monitoring System, we have made a number of updates to the concepts and code presented in this blog series. The latest version is now available in ScyllaDB University.

This is part 2 of a series of blog posts that provides a story arc for ScyllaDB Training.

Previously, we learned that Mutants have emerged from the shadows and are wreaking havoc on the earth! Our mission is to help the Government keep the Mutants under control by building a Mutant Monitoring System (MMS). The MMS consists of a database that will contain a Mutant Catalog and a Monitoring system. In the previous exercise, we created the Mutant Catalog, which holds the name, address, and picture of each mutant. Now we will build the Monitoring system to track the mutants: a time-series database of events that collects unique metrics for each mutant.

Building the Monitoring System

The next step is to build the tracking system keyspace and table that will allow us to keep track of the following mutant metrics:

  • Name
  • Timestamp
  • Location
  • Speed
  • Velocity
  • Heat
  • Telepathy powers

First, let’s create the tracking keyspace:

CREATE KEYSPACE tracking WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy','DC1' : 3};

For this keyspace, we will use a replication factor of three with the NetworkTopologyStrategy. With the replication factor set to three, ScyllaDB keeps three replicas of the data in datacenter DC1; in a three-node cluster, that means there is a replica of the data on each node.
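To make the effect of the replication factor concrete, here is a minimal Python sketch of replica placement on a ring. It is a toy model under stated assumptions, not ScyllaDB's implementation: the node names, the `token` and `replicas_for` helpers, and the MD5-based hash are all hypothetical, and the sketch ignores racks, multiple datacenters, and virtual nodes.

```python
import hashlib
from bisect import bisect_right


def token(value):
    """Map a string to a ring position (toy hash, not ScyllaDB's partitioner)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def replicas_for(partition_key, nodes, rf):
    """Return the rf distinct nodes that hold a replica of this partition."""
    if rf > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    # Place every node at one point on the ring and sort by position.
    ring = sorted((token(node), node) for node in nodes)
    positions = [position for position, _ in ring]
    # The first replica is the first node clockwise from the key's token.
    start = bisect_right(positions, token(partition_key)) % len(ring)
    # The remaining replicas are the next nodes around the ring.
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]


# Hypothetical node names, used only for illustration.
three_nodes = ["node1", "node2", "node3"]
five_nodes = three_nodes + ["node4", "node5"]

print(replicas_for("Jim:Jeffries", three_nodes, rf=3))
print(replicas_for("Jim:Jeffries", five_nodes, rf=3))
```

With three nodes and a replication factor of three, every node ends up in the replica list. With five nodes, only three of them hold a copy of any given partition.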

Now we can create the tracking table:

The primary key in ScyllaDB usually consists of two parts: the partition key and the clustering columns. The partition key determines which nodes store a particular row. The clustering key is the second part of the primary key, and its purpose is to store the rows within a partition in sorted order. For this table, we will use the timestamp column as the clustering key so we can search for data on a day-to-day basis for the mutants.

A composite partition key is simply a way to use two or more columns as a partition key. In the tracking_data table, our composite partition key is first_name, last_name. Because first_name and last_name together form the partition key, all of the rows for a given mutant are stored in the same partition, ordered by the clustering key, timestamp.
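The relationship between the composite partition key and the clustering key can be sketched in a few lines of Python. This is only an in-memory illustration, not how ScyllaDB stores data: the `insert`, `rows_between`, and `latest` helpers are hypothetical, and the sample timestamps and locations are made up for the example.

```python
from bisect import insort
from collections import defaultdict
from datetime import datetime

# One entry per partition, keyed by the composite partition key
# (first_name, last_name). Each partition keeps its rows sorted by
# the clustering key, timestamp.
partitions = defaultdict(list)


def insert(first_name, last_name, timestamp, location):
    """Add a row to its partition, keeping the partition sorted by timestamp."""
    insort(partitions[(first_name, last_name)], (timestamp, location))


def rows_between(first_name, last_name, start, end):
    """Return one mutant's rows with start <= timestamp <= end, in time order."""
    rows = partitions.get((first_name, last_name), [])
    return [row for row in rows if start <= row[0] <= end]


def latest(first_name, last_name):
    """Return the most recent row for one mutant, or None if there is none."""
    rows = partitions.get((first_name, last_name), [])
    return rows[-1] if rows else None


# Made-up sample rows, deliberately inserted out of time order.
day = (2017, 11, 11)
insert("Jim", "Jeffries", datetime(*day, 10, 0), "New York")
insert("Jim", "Jeffries", datetime(*day, 8, 0), "New York")
insert("Bob", "Loblaw", datetime(*day, 8, 0), "Cincinnati")
insert("Jim", "Jeffries", datetime(*day, 9, 0), "New York")

print(rows_between("Jim", "Jeffries", datetime(*day, 8, 30), datetime(*day, 10, 0)))
print(latest("Bob", "Loblaw"))
```

Because each partition is already sorted by timestamp, a time-range lookup or a "latest record" lookup for one mutant only touches that mutant's partition.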

ScyllaDB stores data in SSTables on disk. When compaction operations run, they merge multiple SSTables into a single new one. It is important to choose the right compaction strategy for the use case. For the MMS, we chose the “DateTieredCompactionStrategy” because it was designed for time-series data, where the time of a data point is used as the clustering key. In a time-series use case, we see some common features:

  1. Clustering key and write time are correlated.
  2. Data is added in time order, with only a few out-of-order writes that are typically off by just a few seconds.
  3. Data is only deleted through TTL (Time To Live) or by deleting an entire partition.
  4. Data is written at a nearly constant rate.
  5. A query on a time series is usually a range query on a given partition—the most common query is of the form “values from the last hour/day/week”.

Date-tiered compaction also makes TTL (expiration times on data) handling more efficient. ScyllaDB keeps track of each SSTable's most recent expiration time, and when it notices that the oldest SSTable's most recent expiration time has passed, it can drop the entire SSTable without checking the expiration time of the individual cells it contains. This is especially useful when all data is inserted with the same TTL.
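The whole-SSTable drop described above can be illustrated with a small Python model. This is a sketch under assumptions rather than ScyllaDB's actual code: the `SSTable` class, the `drop_fully_expired` helper, and the expiration values are hypothetical, with times expressed as plain integers in seconds.

```python
from dataclasses import dataclass, field


@dataclass
class SSTable:
    name: str
    cell_expirations: list  # expiration time of each cell, in seconds
    max_expiration: int = field(init=False)

    def __post_init__(self):
        # Record the most recent expiration once, so the drop check below
        # never has to look at individual cells again.
        self.max_expiration = max(self.cell_expirations)


def drop_fully_expired(sstables, now):
    """Keep only the SSTables that still contain at least one live cell."""
    return [table for table in sstables if table.max_expiration > now]


# Hypothetical SSTables for illustration.
tables = [
    SSTable("old", [100, 200, 300]),    # every cell has expired by t=500
    SSTable("mixed", [250, 900]),       # one cell expired, one still live
    SSTable("new", [1000, 1100]),       # nothing has expired yet
]

remaining = drop_fully_expired(tables, now=500)
print([table.name for table in remaining])
```

Only the SSTable whose latest expiration time has passed is dropped as a unit. The one that mixes expired and live cells has to stay, which is why inserting all data with the same TTL works so well with this approach.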

The compaction argument “base_time_seconds” is the size of the first time window, which tells ScyllaDB how much of the most recently written data should be compacted together. All data older than the base_time_seconds value will be grouped together with other data of the same age. This helps expensive compaction operations run less frequently, which matters because we will be recording large amounts of data on the mutants.

The compaction argument “max_sstable_age_days” tells ScyllaDB to stop compacting SSTables that only contain data older than this number of days. Setting this value low reduces the total number of times the same value is rewritten to disk and prevents compaction of the largest SSTables.
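To see how the two arguments interact, here is a deliberately simplified Python sketch. It is not the real date-tiered algorithm, which uses time windows that grow in size as data ages. This toy version uses fixed-size windows of base_time_seconds and an age cutoff of max_sstable_age_days. The `compaction_groups` helper, the SSTable names, and the values 3600 and 1 are assumptions chosen for illustration.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86_400


def compaction_groups(sstables, now, base_time_seconds, max_sstable_age_days):
    """Group SSTables by time window, skipping those that are too old.

    sstables maps an SSTable name to the timestamp (in seconds) of its
    newest data. Only windows holding more than one SSTable are returned,
    since a single SSTable has nothing to be merged with.
    """
    cutoff = now - max_sstable_age_days * SECONDS_PER_DAY
    windows = defaultdict(list)
    for name, newest in sorted(sstables.items()):
        if newest < cutoff:
            continue  # all of its data is older than the cutoff: leave it alone
        windows[newest // base_time_seconds].append(name)
    return [names for _, names in sorted(windows.items()) if len(names) > 1]


now = 10 * SECONDS_PER_DAY
sstables = {
    "a": now - 100,                  # newest window
    "b": now - 2_000,                # same window as "a"
    "c": now - 5_000,                # previous window, on its own
    "d": now - 3 * SECONDS_PER_DAY,  # older than the age cutoff
}

groups = compaction_groups(sstables, now, base_time_seconds=3600, max_sstable_age_days=1)
print(groups)
```

In this model, a larger base_time_seconds places more SSTables in the same window, while a smaller max_sstable_age_days excludes more of the old, large SSTables from compaction.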

Now that the table is created, let’s insert data into the tracking_data table for Jim Jeffries:

Let’s also add data for Bob Loblaw:

With data existing in the table, we are now able to query the table:

  • Query all the data that we inserted:

  • Query a mutant’s information between different times on the same day:

  • Query a mutant’s latest record:

Summary

We now have the basic database infrastructure and the initial data needed for the Mutant Monitoring System. In the next few posts, we will discover how to analyze the data by integrating with other systems to monitor the mutants and how to recover from disaster scenarios. The faster we get this system up and running, the better for the fate of our country. Please remain safe out there.