What fascinates me most about databases is how they can be used for storing time series data. This use case is important for Internet of Things (IoT) devices and data analytics. Everyone should at least be able to relate to a time series database use case as many are likely to have devices in their home collecting and sending data such as a smart thermostat or phone or wrist device gathering your fitness activity. Wouldn’t it be nice to know how the infrastructure works behind the scenes and how to create a time series database? In this post, I will go over using KairosDB on top of Scylla based on a talk from Brian Hawkins of the KairosDB project.
KairosDB is a fast and open source time series database that can use Scylla as the primary storage backend. Data is sent to KairosDB via multiple protocols such as Telnet, Rest, and Graphite. It features an intuitive web interface for users to run queries against the database backend. KairosDB can scale to over 10 billion data points per day
Now that we know a little more about KairosDB and what it is used for, let’s learn more about the data side of things. Before applications and devices can use a database, a schema needs to be made for the data inside a keyspace and table. A database schema describes how data is organized in a database and how applications will write and read data in this format. In the above image, we see a sample schema for an application connecting to a time series database..
When creating the schema, it is important to set the primary key correctly for optimal performance. The primary key usually consists of two parts: Partition key and Clustering columns. The intent of a partition key is to identify the node that stores a particular row. The clustering key is the second part of the primary key and its purpose is to store the data in a sorted order. If the primary keys are set incorrectly you can run into hotspot issues where most of the data is being accessed and stored on few nodes, creating lower performance than expected.
So is Scylla the right platform for time series? Short answer, yes! Brian states that Scylla was incredibly fast for ingesting data and scaled well. Many are using Scylla to benefit with better performance and lower latencies for their applications. You do not have to take my word for it, check out our benchmarks!
I encourage you to watch the video above or the slides below to learn more about building a time series database with KairosDB and Scylla in depth.
Want to learn more about Scylla? Check out our download page to run Scylla on AWS, install it locally in a Virtual Machine, or run it in Docker.