Using KairosDB with Scylla
By Duarte Nunes, July 26, 2016
This is Part 2 in a series on Thrift support in Scylla. Part 1 is here.
KairosDB is an open source time series database written on top of Cassandra, which it accesses through the Hector client library. That means it still uses the Thrift API. KairosDB can easily be set up by following the getting started instructions. All the Cassandra-specific configuration options are valid for Scylla as well. To have KairosDB use Scylla, all we need to do is start Scylla instead of Cassandra:
$ /usr/bin/scylla --smp 2 --memory 4GB
$ kairosdb/bin/kairosdb.sh run
Note that if you’re running both Scylla and KairosDB on the same machine, you’ll want to constrain Scylla’s CPU and memory usage, as in the command line above (preferably by editing the /etc/sysconfig/scylla-server configuration file).
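As a rough sketch of the file-based approach: packaged installs typically pass extra command-line options to Scylla through a `SCYLLA_ARGS` variable in that file. The variable name is an assumption based on the stock package layout, so check your own copy of the file first. The `--smp` and `--memory` values simply mirror the command line above.

```shell
# /etc/sysconfig/scylla-server (fragment)
# SCYLLA_ARGS is an assumed variable name; verify it against your own file.
# Keep any options already listed there and add the two resource limits.
SCYLLA_ARGS="--smp 2 --memory 4GB"
```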
We can use KairosDB to store a time series of how many clicks a given story receives. KairosDB allows pushing data via telnet, so we can use the following script to populate it with some data, assuming KairosDB is running on the local host:
#!/bin/bash
# Current time in milliseconds
now=$(($(date +%s%N)/1000000))
metric=story_views
story_id=$RANDOM
scrolled=$(($RANDOM % 3))
if [ $scrolled -eq 0 ]
then
    scrolled="start"
elif [ $scrolled -eq 1 ]
then
    scrolled="middle"
else
    scrolled="end"
fi
echo "put $metric $now $story_id scrolled=$scrolled" \
    | nc -w 30 localhost 4242
This script adds a data point for the story_views metric at the current time, assigning it an artificial story_id and tagging it with the scrolled tag, which acts as a heuristic about how the user engaged with the story. When pushing the first data point, KairosDB will create its schema in Scylla.
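To get more than a single data point, the same `put` line can be generated in a loop. The sketch below is one way to do that; the helper names (`scroll_label`, `put_line`, `generate_points`) are made up for illustration, and it assumes the telnet listener accepts several `put` lines over one connection. Each point gets its own millisecond timestamp so that points sharing the same metric and tag do not land on the same instant.

```shell
#!/bin/bash
# Sketch: emit many "put" lines so they can be sent in one go.

# Map 0, 1, 2 to the tag values used in the script above.
scroll_label() {
    case "$1" in
        0) echo "start" ;;
        1) echo "middle" ;;
        *) echo "end" ;;
    esac
}

# Format one line: put <metric> <timestamp_ms> <value> scrolled=<label>
put_line() {
    local metric=$1 timestamp_ms=$2 value=$3 scrolled=$4
    printf 'put %s %s %s scrolled=%s\n' "$metric" "$timestamp_ms" "$value" "$scrolled"
}

# Print $1 data points with consecutive millisecond timestamps starting at $2.
generate_points() {
    local count=$1 start_ms=$2 i=0
    while [ "$i" -lt "$count" ]; do
        put_line story_views "$((start_ms + i))" "$RANDOM" "$(scroll_label $((RANDOM % 3)))"
        i=$((i + 1))
    done
}

# Usage against a local KairosDB (not run here):
#   generate_points 100 "$(($(date +%s%N) / 1000000))" | nc -w 30 localhost 4242
```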
After inserting some data into KairosDB, we can inspect the CQL tables that Scylla created using cqlsh. For example, we can describe the string_index table, which contains the metric names, tag names and tag values:
cqlsh> desc kairosdb.string_index;
CREATE TABLE kairosdb.string_index (
    key blob,
    column1 text,
    value blob,
    PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (column1 ASC);
We can see that this table was created from the definition of a dynamic column family, as it defines a clustering key.
We can also use cqlsh to query this table and see that it contains entries with the metric name and the different values for the scrolled tag:
cqlsh> select * from kairosdb.string_index where key = 0x7461675f76616c756573;
 key                    | column1     | value
------------------------+-------------+-------
 0x7461675f76616c756573 | end         | 0x
 0x7461675f76616c756573 | middle      | 0x
 0x7461675f76616c756573 | story_views | 0x
 0x7461675f76616c756573 | start       | 0x
...
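The key in the WHERE clause looks opaque because the column is a blob, but it is just the hex encoding of an ASCII string: 0x7461675f76616c756573 spells out tag_values. A small sketch for converting in both directions, with hypothetical helper names (`blob_to_text`, `text_to_blob`) and assuming the keys you inspect are plain ASCII:

```shell
#!/bin/bash
# Sketch: convert between cqlsh blob literals and the ASCII text they encode.

# Decode a blob literal such as 0x7461... into text.
blob_to_text() {
    local hex=${1#0x} escaped="" i
    for ((i = 0; i < ${#hex}; i += 2)); do
        escaped+="\\x${hex:i:2}"
    done
    printf '%b\n' "$escaped"
}

# Encode text into a blob literal usable in a CQL WHERE clause.
text_to_blob() {
    printf '0x'
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
    printf '\n'
}

blob_to_text 0x7461675f76616c756573   # prints "tag_values"
text_to_blob tag_values               # prints "0x7461675f76616c756573"
```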
All of the KairosDB API works well – and transparently – with Scylla. We can, for example, query the 2 most recent stories (assuming sequentially incremented IDs) viewed in the last day:
curl -XPOST http://localhost:8080/api/v1/datapoints/query -d '{
  "start_relative":{
    "value":"1",
    "unit":"days"
  },
  "metrics":[
    {
      "name":"story_views",
      "order":"desc",
      "limit":"2"
    }
  ]
}
'
> {
  "queries":[
    {
      "results":[
        {
          "group_by":[
            {
              "name":"type",
              "type":"number"
            }
          ],
          "name":"story_views",
          "tags":{
            "scrolled":[
              "end"
            ]
          },
          "values":[
            [
              1469016283663,
              12506
            ],
            [
              1469016283636,
              28348
            ]
          ]
        }
      ],
      "sample_size":2
    }
  ]
}
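When experimenting with different metrics, limits or time windows, it can be handy to generate the request body rather than edit the JSON by hand. The following is only a sketch: `recent_query` is a hypothetical helper, and it reproduces the same fields as the request above (including the quoted limit) without validating its arguments.

```shell
#!/bin/bash
# Sketch: build the JSON body for a "most recent N data points" query.

# $1 = metric name, $2 = number of data points, $3 = look-back window in days
recent_query() {
    printf '{"start_relative":{"value":"%s","unit":"days"},' "$3"
    printf '"metrics":[{"name":"%s","order":"desc","limit":"%s"}]}\n' "$1" "$2"
}

# Usage against a local KairosDB (not run here):
#   curl -XPOST http://localhost:8080/api/v1/datapoints/query \
#        -d "$(recent_query story_views 2 1)"
```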
We can also issue a more complex query, such as counting how many stories were viewed and scrolled until the end during the last day:
curl -XPOST http://localhost:8080/api/v1/datapoints/query -d '{
  "start_relative":{
    "value":"1",
    "unit":"days"
  },
  "metrics":[
    {
      "name":"story_views",
      "tags":{
        "scrolled":[
          "end"
        ]
      },
      "aggregators":[
        {
          "name":"count",
          "sampling":{
            "value":1,
            "unit":"days"
          }
        }
      ]
    }
  ]
}
'
> {
  "queries":[
    {
      "results":[
        {
          "group_by":[
            {
              "name":"type",
              "type":"number"
            }
          ],
          "name":"story_views",
          "tags":{
            "scrolled":[
              "end"
            ]
          },
          "values":[
            [
              1469014929878,
              455
            ]
          ]
        }
      ],
      "sample_size":455
    }
  ]
}
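The same count can be taken for each value of the scrolled tag to compare how far readers got. A minimal sketch, assuming the three tag values written by the population script; `count_query` is a hypothetical helper that only prints the request bodies and leaves sending them to curl:

```shell
#!/bin/bash
# Sketch: one daily "count" query body per value of the scrolled tag.

# $1 = metric name, $2 = value of the scrolled tag to filter on
count_query() {
    printf '{"start_relative":{"value":"1","unit":"days"},"metrics":[{"name":"%s",' "$1"
    printf '"tags":{"scrolled":["%s"]},' "$2"
    printf '"aggregators":[{"name":"count","sampling":{"value":1,"unit":"days"}}]}]}\n'
}

# Print the three request bodies; pipe each to curl -d @- to run it.
for position in start middle end; do
    count_query story_views "$position"
done
```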
When we’re done, we can delete our metric and all of its data points, using cqlsh to verify data is actually being deleted:
cqlsh> select count(*) from kairosdb.data_points;
count
-------
1665
curl -XDELETE http://localhost:8080/api/v1/metric/story_views
cqlsh> select count(*) from kairosdb.data_points;
count
-------
681
Follow ScyllaDB on Twitter for updates.
Thrift and KairosDB at Scylla Summit
Thrift is one of many topics to be covered at the upcoming Scylla Summit. Come to Scylla Summit on September 6th, in San Jose, California, to learn more about Thrift and other new and upcoming Scylla features—along with info on how companies like IBM, Outbrain, Samsung SDS, Appnexus, Hulu, and Mogujie are using Scylla for better performance and faster development. Meet Scylla developers and devops users who will cover Scylla design, best practices, advanced tooling and future roadmap items.
Going to Cassandra Summit? Add another day of NoSQL. Scylla Summit takes place the day before Cassandra Summit begins, at the Hilton San Jose, adjacent to the San Jose Convention Center. Lunch and refreshments are provided.
Tags: 3rd-party-integration, deep-dive, kairosdb, time-series