
Mutant Monitoring Systems (MMS) Day 3 – Analyzing Mutant Data with Presto

IMPORTANT: Since the first publication of the Mutant Monitoring System, we have made a number of updates to the concepts and code presented in this blog series. You can find the latest version in ScyllaDB University.

 

This is part 3 of a series of blog posts that provides a story arc for ScyllaDB Training.

On day one, we set up the Mutant Catalog used to gather basic metrics for each mutant, such as name and contact information. On day two, we created the database keyspace for the Mutant Monitoring System with initial data. Now we can use our Monitoring System to analyze the mutants’ behavior. According to government data, a heat reading greater than 20 means that a mutant is actively using their abilities with malicious intent.

Let’s look at mutants Jim Jeffries and Bob Loblaw to see if they have committed malicious acts by analyzing their heat signatures with Presto. Presto is a distributed SQL query engine for Big Data technologies like ScyllaDB. With it, we can run complex queries that include text searches, value comparisons, and joins, and we can query data from multiple data sources.

To get started, we need to build the single-instance Presto container, which is preconfigured to communicate with the ScyllaDB cluster using the Cassandra driver. In production, you will need a Presto cluster that consists of multiple nodes.


cd scylla-code-samples/mms
git pull
cd presto
docker build -t presto .
docker run --name presto --network mms_web -p 8080:8080 -d presto

To verify that Presto is up and running, open the following address in your web browser:
http://127.0.0.1:8080.
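
The container can take a little while to start answering after `docker run` returns. As a minimal sketch, the same check can be done from a terminal by polling that address with `curl`. The function name `wait_for_presto` and its default attempt count and delay are illustrative assumptions, not part of the MMS code samples.

```shell
# Poll the Presto web address until it answers or we run out of attempts.
# Usage: wait_for_presto [url] [attempts] [delay_seconds]
# The name and the defaults below are illustrative, not from the MMS repository.
wait_for_presto() {
  url="${1:-http://127.0.0.1:8080}"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors as failure, -s: silent, -o /dev/null: discard the body
    if curl -fs -o /dev/null "$url"; then
      echo "Presto is answering at $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Presto did not answer at $url after $attempts attempts" >&2
  return 1
}
```

Calling `wait_for_presto` with no arguments checks the address above for up to about a minute and returns a non-zero status if the container never responds.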

In the web console, we can view node and query statistics on a Presto cluster. We will cover more on this later after we run a few queries.

Let’s connect to the Presto container and run some queries against the ScyllaDB database to see if Jim has used his abilities maliciously by analyzing the heat signature readings.


docker exec -it presto bash
./presto-cli.jar --catalog cassandra
use tracking;
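
For scripting, the Presto CLI can also run a single statement without opening a prompt, using its `--execute`, `--schema`, and `--output-format` options. The sketch below wraps that in a hypothetical helper named `presto_query`. It assumes the container is named `presto` and that `./presto-cli.jar` resolves from the container's default working directory, as it does in the interactive session above.

```shell
# Run one SQL statement through the Presto CLI inside the container, printing CSV.
# Assumptions: the container is named "presto" and ./presto-cli.jar is in its
# default working directory. presto_query is a hypothetical helper name.
presto_query() {
  docker exec presto ./presto-cli.jar \
    --catalog cassandra \
    --schema tracking \
    --output-format CSV \
    --execute "$1"
}
```

For example, `presto_query "select count(*) from tracking_data;"` would print the row count and exit. The rest of this post continues in the interactive session.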

Now that we are in Presto, we can begin to query data. Let’s start with Jim Jeffries. Jim has been known to have heat levels of 20 or higher when he receives negative feedback on his comedy performance. When that happens, he is likely to injure his critics out of anger.

select * from tracking.tracking_data where first_name='Jim' and last_name='Jeffries' and timestamp between timestamp '2017-11-11 00:00:00.000' and timestamp '2017-11-11 23:59:00.000' and heat > 20;

Alert! Jim has a reading greater than 20 and likely did something bad! Alert the local authorities!

Bob Loblaw is a shady lawyer who rarely uses his powers at alarming levels. The only power that he uses is brain power to cheat his clients. Let’s see if Bob Loblaw is being a good citizen with the following query:

select * from tracking.tracking_data where first_name='Bob' and last_name='Loblaw' and timestamp between timestamp '2017-11-11 00:00:00.000' and timestamp '2017-11-11 23:59:00.000' and heat > 20;

Bob has no readings above 20, so he is probably innocent and there is no need to contact the authorities.
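
The two heat queries differ only in the mutant's name, so the statement can be generated rather than retyped. Here is a minimal sketch of a hypothetical shell helper, `heat_query`, that prints the same query for any mutant, day, and threshold. It does no escaping, so it is only suitable for trusted input without quote characters.

```shell
# Print the heat-signature query for one mutant on one day.
# Usage: heat_query FIRST_NAME LAST_NAME YYYY-MM-DD [THRESHOLD]
# heat_query is a hypothetical helper; the threshold defaults to 20.
# Values are inserted as-is, so do not pass untrusted or quoted input.
heat_query() {
  printf "select * from tracking.tracking_data where first_name='%s' and last_name='%s' and timestamp between timestamp '%s 00:00:00.000' and timestamp '%s 23:59:00.000' and heat > %s;\n" \
    "$1" "$2" "$3" "$3" "${4:-20}"
}
```

The output of `heat_query Bob Loblaw 2017-11-11` can be pasted at the Presto prompt, or passed to the CLI from the container's shell with `./presto-cli.jar --catalog cassandra --execute "$(heat_query Bob Loblaw 2017-11-11)"`.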

In Presto, we can also search text by using like in our select queries. This makes it easy to search for data by city or name. Let’s look for all activity in Cincinnati.

select * from tracking.tracking_data where location like 'Cincinatti';

Using like queries, we can also search our Mutant Catalog for all the mutants with Bob as their first name:

select * from catalog.mutant_data where first_name like 'Bob';
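
Without wildcards, `like 'Bob'` matches only the exact value Bob. In SQL, `%` stands for any sequence of characters and `_` for any single character, so `like '%Bob%'` also finds values that merely contain Bob. As a sketch, a hypothetical shell helper named `contains_query` can print such a query for any table, column, and search text. Like the statements above, it assumes trusted input with no quote characters.

```shell
# Print a query that finds rows where COLUMN contains TEXT anywhere in the value.
# Usage: contains_query TABLE COLUMN TEXT
# contains_query is a hypothetical helper; input is inserted without escaping.
contains_query() {
  # %% in the printf format emits a literal %, which is the SQL like wildcard
  printf "select * from %s where %s like '%%%s%%';\n" "$1" "$2" "$3"
}
```

For example, `contains_query catalog.mutant_data first_name Bob` prints a statement that would also match a first name such as Bobby.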

You can also see the stats of each query in your web browser by navigating to http://127.0.0.1:8080 and clicking on the query on the bottom left side of the screen to view more information.

Summary

Our Mutant Monitoring System became slightly more advanced today by integrating with Presto. Presto enabled us to quickly scan the metrics from the tracking system so we can respond if needed. Our next mission is to prepare ourselves for disaster scenarios by learning how to recover from node failures. Be safe out there and continue to analyze the data for malicious activity.