Scylla Summit Preview: Scalable Stream Processing with KSQL, Kafka and Scylla
In the run-up to Scylla Summit 2018, we’ll be featuring our speakers and providing sneak peeks at their presentations. This interview in our ongoing series is with Hojjat Jafarpour of Confluent. His presentation at Scylla Summit is entitled Scalable Stream Processing with KSQL, Kafka and Scylla.
Hojjat, before we get into your talk, tell us a little about yourself. What do you like to do for fun?
I’m a Software Engineer at Confluent and the creator of KSQL, the Streaming SQL engine for Apache Kafka. In addition to challenging problems in scalable data management, I like traveling and the outdoors. We are so lucky to have spectacular places like Yosemite, Lake Tahoe and many more wonders of nature in California, and I take any opportunity to plan a getaway to such amazing places.
Kafka seems everywhere these days. What do you believe were the critical factors that drove its rapid adoption versus other options?
There are quite a few factors behind the extraordinary success of Kafka. I think one of the main factors is the ability of Kafka to decouple producers and consumers of data. Such decoupling can significantly simplify the data management architecture of enterprises. Kafka, along with its complementary technologies such as KSQL/Streams and Connect, can be the central nervous system for any data-driven enterprise.
We see many organizations going through the transformational phase of breaking their monolith into a microservices architecture, with Kafka as the central part of the new design.
Real-time data and stream processing raise the spectre of bursty data traffic patterns. You can’t control throughput as easily as with batch processing. Will you cover how to ensure you’re not just dumping data into /dev/null?
Yes. KSQL uses Kafka Streams as its physical execution engine, which provides a powerful elasticity model. You can easily add new computing resources as needed for your stream processing applications in KSQL.
KSQL is a powerful tool to find and enrich data that’s coming in from live streams and topics. For people familiar with Cassandra Query Language (CQL), what are some key similarities or differences to note?
The main difference between KSQL and CQL is in their data processing model. Unlike CQL, KSQL is a Streaming SQL engine where our queries are continuous queries. When you run a query in KSQL, it will keep reading input and generating output continuously unless you explicitly terminate the query. On the other hand, similar to CQL, one of our main goals in KSQL is to provide a familiar tool for our users to write stream processing applications. It’s much easier to learn SQL than Java. KSQL aims to make stream processing available to a much broader audience than only hardcore software engineers!
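To make the contrast in processing models concrete, here is a minimal Python sketch. It is not KSQL or CQL code; the record shape and the function names `one_shot_total` and `continuous_totals` are hypothetical, chosen only for illustration. The first function evaluates once against the data that exists at call time, roughly how a request/response query behaves. The second keeps consuming input and emits an updated result for every new event, which is the idea behind a continuous query.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record shape for illustration: (user, amount)
Event = Tuple[str, int]


def one_shot_total(table: List[Event], user: str) -> int:
    """Request/response style: evaluate once over the stored data, then return."""
    return sum(amount for u, amount in table if u == user)


def continuous_totals(stream: Iterable[Event]) -> Iterator[Tuple[str, int]]:
    """Continuous style: keep reading input and emit an updated total per event."""
    totals: Dict[str, int] = {}
    for user, amount in stream:
        totals[user] = totals.get(user, 0) + amount
        # Each incoming event produces a new output record.
        yield user, totals[user]


events = [("alice", 5), ("bob", 3), ("alice", 7)]

# Runs once and finishes.
print(one_shot_total(events, "alice"))

# Produces output for as long as the input keeps arriving.
for update in continuous_totals(events):
    print(update)
```

In this sketch the input is a finite list, so the loop ends. With an unbounded source, such as a consumer reading from a topic, the generator would keep yielding results until the caller stops iterating, which mirrors explicitly terminating a continuous query.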
Besides a general understanding of Kafka and KSQL, is there anything else Scylla Summit attendees should familiarize themselves with to get the most out of your session?
Just a general understanding of Kafka is enough. I will provide a brief introduction to KSQL at the beginning of the talk.
Thanks very much for your time!