Jul 26

Using KairosDB with Scylla

This is Part 2 in a series on Thrift support in Scylla. Part 1 is here.

KairosDB is an open source time series database built on top of Cassandra, which it accesses through the Hector client library, so it still uses the Thrift API. It can be set up by following its getting started instructions, and all of the Cassandra-specific configuration options are valid for Scylla as well. To have KairosDB use Scylla, all we need to do is start Scylla instead of Cassandra before launching KairosDB:

$ /usr/bin/scylla --smp 2 --memory 4GB

$ kairosdb/bin/kairosdb.sh run

Note that if you’re running both Scylla and KairosDB on the same machine, you’ll want to constrain Scylla’s CPU and memory usage, as in the command line above or, preferably, by editing the /etc/sysconfig/scylla-server configuration file.

We can use KairosDB to store a time series of how many clicks a given story receives. KairosDB allows pushing data via telnet, so we can use the following script to populate it with some data, assuming KairosDB is running on the local host:

#!/bin/bash

# Current time in milliseconds
now=$(($(date +%s%N)/1000000))
metric=story_views
story_id=$RANDOM
scrolled=$(($RANDOM % 3))
if [ $scrolled -eq 0 ]
then
    scrolled="start"
elif [ $scrolled -eq 1 ]
then
    scrolled="middle"
else
    scrolled="end"
fi

echo "put $metric $now $story_id scrolled=$scrolled" \
| nc -w 30 localhost 4242

This script adds a data point for the story_views metric at the current time, assigning it an artificial story_id and tagging it with the scrolled tag, which acts as a heuristic about how the user engaged with the story. When pushing the first data point, KairosDB will create its schema in Scylla.
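To populate KairosDB with more than one data point per run, the same idea can be wrapped in a loop. The following is only a sketch: the helper names (make_put_line, scrolled_label, generate_points) are hypothetical, it assumes GNU date for the %N format, and it assumes the telnet listener accepts several put lines over one connection. Data points that share a millisecond timestamp and tag set may overwrite each other.

```shell
#!/bin/bash

# Format one line of the telnet "put" command used above:
#   put <metric> <timestamp_ms> <value> scrolled=<tag_value>
# make_put_line is a hypothetical helper name, not part of KairosDB.
make_put_line() {
    local metric=$1 timestamp_ms=$2 value=$3 scrolled=$4
    printf 'put %s %s %s scrolled=%s\n' "$metric" "$timestamp_ms" "$value" "$scrolled"
}

# Map a number onto the three tag values used in the original script.
scrolled_label() {
    case $(( $1 % 3 )) in
        0) echo "start" ;;
        1) echo "middle" ;;
        *) echo "end" ;;
    esac
}

# Print COUNT put lines stamped with the current time, one per line.
generate_points() {
    local count=$1 i now
    for ((i = 0; i < count; i++)); do
        # Current time in milliseconds (GNU date; %N is nanoseconds)
        now=$(( $(date +%s%N) / 1000000 ))
        make_put_line story_views "$now" "$RANDOM" "$(scrolled_label "$RANDOM")"
    done
}
```

Under those assumptions, running generate_points 1000 | nc -w 30 localhost 4242 would send a thousand data points in one connection.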

After inserting some data into KairosDB, we can use cqlsh to inspect the tables that KairosDB created in Scylla. For example, we can describe the string_index table, which contains the metric names, tag names and tag values:

cqlsh> desc kairosdb.string_index;

CREATE TABLE kairosdb.string_index (
    key blob,
    column1 text,
    value blob,
    PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (column1 ASC);

We can see that this table was created from the definition of a dynamic column family, as it defines a clustering key.

We can also use cqlsh to query this table and see that it contains entries with the metric name and the different values for the scrolled tag:

cqlsh> select * from kairosdb.string_index where key = 0x7461675f76616c756573;

 key                    | column1       | value
------------------------+---------------+-------
 0x7461675f76616c756573 |           end |    0x
 0x7461675f76616c756573 |        middle |    0x
 0x7461675f76616c756573 |   story_views |    0x
 0x7461675f76616c756573 |         start |    0x
...
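The key column is a blob, so cqlsh prints it as hexadecimal. The key used in the query above is simply an ASCII string, and a small bash sketch can convert in both directions. The function names are hypothetical helpers, not part of cqlsh or KairosDB, and the code assumes the key contains only ASCII characters.

```shell
#!/bin/bash

# Decode a CQL blob literal such as 0x7461... into ASCII text.
# hex_to_ascii is a hypothetical helper, not part of cqlsh or KairosDB.
hex_to_ascii() {
    local hex=${1#0x} out="" i
    for ((i = 0; i < ${#hex}; i += 2)); do
        # bash's printf expands \xHH into the byte with that hex value
        out+=$(printf "\\x${hex:i:2}")
    done
    printf '%s\n' "$out"
}

# Encode ASCII text as a blob literal usable in a WHERE clause.
ascii_to_hex() {
    local text=$1 out="0x" i
    for ((i = 0; i < ${#text}; i++)); do
        # A leading single quote makes printf use the character's code
        out+=$(printf '%02x' "'${text:i:1}")
    done
    printf '%s\n' "$out"
}

hex_to_ascii 0x7461675f76616c756573   # prints: tag_values
```

Going the other way, ascii_to_hex tag_values reproduces the literal used in the query.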

All of the KairosDB API works well – and transparently – with Scylla. We can, for example, query the 2 most recent stories (assuming sequentially incremented IDs) viewed in the last day:

curl -XPOST http://localhost:8080/api/v1/datapoints/query -d '{
  "start_relative":{
    "value":"1",
    "unit":"days"
  },
  "metrics":[
    {
      "name":"story_views",
      "order":"desc",
      "limit":"2"
    }
  ]
}
'

> {
  "queries":[
    {
      "results":[
        {
          "group_by":[
            {
              "name":"type",
              "type":"number"
            }
          ],
          "name":"story_views",
          "tags":{
            "scrolled":[
              "end"
            ]
          },
          "values":[
            [
              1469016283663,
              12506
            ],
            [
              1469016283636,
              28348
            ]
          ]
        }
      ],
      "sample_size":2
    }
  ]
}
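Each entry in "values" is a pair of a millisecond timestamp and the stored value (here, the story ID). To read the timestamps, they can be converted back to a date. This is a minimal sketch with hypothetical helper names. It assumes GNU date, as the population script already does.

```shell
#!/bin/bash

# Convert a KairosDB millisecond timestamp to whole seconds since the epoch.
ms_to_seconds() {
    echo $(( $1 / 1000 ))
}

# Render a millisecond timestamp as a UTC date string.
# Assumes GNU date, which accepts "-d @SECONDS".
ms_to_utc() {
    date -u -d "@$(ms_to_seconds "$1")" '+%Y-%m-%d %H:%M:%S'
}
```

With GNU date, ms_to_utc 1469016283663 prints 2016-07-20 12:04:43.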

We can also issue a more complex query, such as counting how many stories were viewed and scrolled until the end during the last day:

curl -XPOST http://localhost:8080/api/v1/datapoints/query -d '{
  "start_relative":{
    "value":"1",
    "unit":"days"
  },
  "metrics":[
    {
      "name":"story_views",
      "tags":{
        "scrolled":[
          "end"
        ]
      },
      "aggregators":[
        {
          "name":"count",
          "sampling":{
            "value":1,
            "unit":"days"
          }
        }
      ]
    }
  ]
}
'

> {
  "queries":[
    {
      "results":[
        {
          "group_by":[
            {
              "name":"type",
              "type":"number"
            }
          ],
          "name":"story_views",
          "tags":{
            "scrolled":[
              "end"
            ]
          },
          "values":[
            [
              1469014929878,
              455
            ]
          ]
        }
      ],
      "sample_size":455
    }
  ]
}
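When scripting several such queries, the request body can be generated instead of typed by hand. The sketch below builds the same count query as above for a given metric, optionally filtered by a value of the scrolled tag. build_count_query is a hypothetical helper, and it does no JSON escaping, so it assumes plain alphanumeric names.

```shell
#!/bin/bash

# Build a count query body covering the last day of a metric.
# Arguments: metric name, then an optional value for the "scrolled" tag.
# build_count_query is a hypothetical helper, not part of KairosDB.
build_count_query() {
    local metric=$1 scrolled=${2:-}
    local tags=""
    if [ -n "$scrolled" ]; then
        # Only emit the "tags" filter when a tag value was given
        tags="\"tags\":{\"scrolled\":[\"$scrolled\"]},"
    fi
    printf '{"start_relative":{"value":"1","unit":"days"},"metrics":[{"name":"%s",%s"aggregators":[{"name":"count","sampling":{"value":1,"unit":"days"}}]}]}\n' \
        "$metric" "$tags"
}
```

The output can be passed straight to the endpoint used above, for example curl -XPOST http://localhost:8080/api/v1/datapoints/query -d "$(build_count_query story_views end)".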

When we’re done, we can delete our metric and all of its data points, using cqlsh to verify data is actually being deleted:

cqlsh> select count(*) from kairosdb.data_points;

 count
-------
  1665

curl -XDELETE http://localhost:8080/api/v1/metric/story_views

cqlsh> select count(*) from kairosdb.data_points;

 count
-------
   681

Thrift and KairosDB at Scylla Summit

Thrift is one of many topics to be covered at the upcoming Scylla Summit. Come to Scylla Summit on September 6th, in San Jose, California, to learn more about Thrift and other new and upcoming Scylla features—along with info on how companies like IBM, Outbrain, Samsung SDS, Appnexus, Hulu, and Mogujie are using Scylla for better performance and faster development. Meet Scylla developers and devops users who will cover Scylla design, best practices, advanced tooling and future roadmap items.

Going to Cassandra Summit? Add another day of NoSQL. Scylla Summit takes place the day before Cassandra Summit begins, at the Hilton San Jose, adjacent to the San Jose Convention Center. Lunch and refreshments are provided.

About Duarte Nunes

Duarte Nunes is a Software Engineer working on ScyllaDB. He has a background in concurrent programming, distributed systems and low-latency software. Prior to ScyllaDB, he worked on MidoNet, an open source distributed network virtualization platform, making it fast and scalable.


Tags: 3rd-party-integration, deep-dive, kairosdb, time-series