In the run-up to ScyllaDB Summit 2018, we’ll be featuring our speakers and providing sneak peeks at their presentations. This interview in our ongoing series is with two speakers holding a joint session: ScyllaDB and KairosDB in Smart Vehicle Diagnostics. The first part of the talk will feature Brian Hawkins speaking on the time-series database (TSDB) KairosDB, which runs atop ScyllaDB or Cassandra. He’ll then turn the session over to Bin Wang of Faraday Future (FF), who will discuss his company’s use case in automotive real-time data collection.
Brian, Bin, thank you for taking the time to speak with me. First, tell our readers a little about yourselves and what you enjoy doing outside of work.
Brian: I recently purchased a new house and I’m still putting in the yard, so I don’t have the luxury of doing anything I enjoy.
Bin: I adopted a small dog. I enjoy taking my family out into nature on the weekends.
Brian, people have been enthusiastic about KairosDB from the start. Out of the many time-series databases available, what do you believe sets it apart?
Brian: Time-series data is addicting. No matter how much you have, you want more. One way Kairos stands out is its ability to scale as you want to store more data. Whether you want to store 100 metrics/sec or 1 million metrics/sec, Kairos can handle it.
Another way Kairos is different is that it is extremely customizable with plugins. You can embed it or customize it to your specific use case.
Bin, I was curious: had you read our 2017 blog post or seen the 2018 webinar about using KairosDB with ScyllaDB? If not, how did you decide on this implementation path and on using these two technologies together?
Bin: I didn’t see that webinar before I made this decision at the beginning of our project. We chose this path because we need to write huge quantities of signal data into persistent storage for later use. We use the data not only for module training, but also so that our engineering team can load historical values.
So we decided to store the signals in Cassandra. After that we found ScyllaDB, which is written in C++ and has much better performance than traditional Cassandra. We switched our underlying database to ScyllaDB.
After running for some time, we found that storing plain signals used a lot of storage, and that there were much greater requirements for signal visualization.
So next I looked at InfluxDB and Grafana. They are perfect for signal visualization, but InfluxDB doesn’t have the same performance as ScyllaDB. My team uses ScyllaDB a lot in our projects. I like it.
After that I found KairosDB. It is perfect for me: it works on ScyllaDB, has great performance, stores signals more efficiently, and can work with Grafana for signal visualization. So my final solution is ScyllaDB, KairosDB and Grafana.
What is the use case for ScyllaDB and KairosDB at Faraday Future? Describe the data you need to manage, and the data ecosystem ScyllaDB/KairosDB would need to integrate with.
Bin: Well, as I described before, KairosDB is the storage interface for the signals sent from FF vehicles. Each vehicle uploads hundreds to thousands of signals each second. We created a pipeline that runs from RabbitMQ to decoding in Spring Cloud, then through a Kafka message queue, and finally stores the signals in KairosDB, which persists them in ScyllaDB.
On the consumer side, we use Grafana for raw signal visualization, Spark for module training and data analysis, and Spring Cloud for the REST APIs behind the UI. KairosDB and ScyllaDB are the core persistence and data access interface of the system.
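To make the last hop of that pipeline concrete, here is a minimal Python sketch of how decoded signals might be shaped for KairosDB. It is not Faraday Future's code: the function name, the metric names, and the `vehicle_id` tag are assumptions chosen for illustration. The body layout (a list of metrics, each with a `name`, `datapoints` as `[timestamp_ms, value]` pairs, and at least one tag) follows KairosDB's REST API for adding data points at `/api/v1/datapoints`.

```python
import json
import time


def build_datapoints_payload(vehicle_id, signals, timestamp_ms=None):
    """Shape decoded signals as a KairosDB-style datapoints body.

    `vehicle_id` and the signal names are hypothetical; KairosDB only
    requires a metric name, millisecond timestamps, and at least one tag.
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # KairosDB expects milliseconds
    return [
        {
            "name": name,
            "datapoints": [[timestamp_ms, value]],
            "tags": {"vehicle_id": vehicle_id},
        }
        for name, value in sorted(signals.items())
    ]


# Hypothetical decoded message from one vehicle.
payload = build_datapoints_payload(
    "ff-demo-001",
    {"battery.voltage": 398.2, "motor.rpm": 5400},
    timestamp_ms=1541462400000,
)
body = json.dumps(payload)  # what a writer would POST to /api/v1/datapoints
```

In a real deployment the resulting `body` would be sent by whatever service consumes from Kafka; the sketch stops short of the network call so it runs without a KairosDB instance.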
Brian, when you listen to developers like Bin putting KairosDB to the test in IoT environments, what do you believe are important considerations for how they manage KairosDB and their real-time data?
Brian: Make sure you scale to keep ahead of your data flow. Have a strategy for either expiring your data or migrating it as things fill up. A lot of users underestimate how much time-series data they will end up collecting. Also make sure each user who sends data to Kairos understands tag cardinality. Basically, if a metric has too many tag combinations, it will take a long time to query and won’t be very useful.
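A rough way to see why tag cardinality matters is to multiply the number of distinct values each tag can take. The Python sketch below does only that arithmetic; the tag names and counts are invented for illustration, and the result is an upper bound on the tag combinations one metric could have, not a measurement of any real system.

```python
from math import prod


def max_tag_combinations(tag_values):
    """Upper bound on distinct tag combinations for a single metric."""
    return prod(len(values) for values in tag_values.values())


# Hypothetical tags: a bounded set of vehicles and signal sources.
bounded_tags = {
    "vehicle_id": [f"v{i}" for i in range(1000)],
    "source": ["can", "gps"],
}

# Adding a tag with an ever-growing set of values multiplies the total.
unbounded_tags = dict(bounded_tags, trip_id=[f"t{i}" for i in range(50000)])

print(max_tag_combinations(bounded_tags))    # 2000
print(max_tag_combinations(unbounded_tags))  # 100000000
```

The point of the comparison is that one high-cardinality tag, such as a per-trip identifier, can turn a few thousand combinations into a hundred million, which is the situation Brian warns will make a metric slow to query.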
Thank you both for your time!
Speaking of time series, if you have the time to see a series of great presentations, it is seriously time to sign up for ScyllaDB Summit 2018, which is right around the corner: November 6-7, 2018.