Stateful Streaming Applications with Apache Spark | ScyllaDB Summit 2017

Burak Yavuz, Databricks (28:41)

Stateful operations are a common use case when working with streaming data. If you de-duplicate data, calculate aggregations over event-time windows, or track user activity across sessions, you are performing a stateful operation.
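To make "stateful" concrete, here is a minimal plain-Python sketch (not Spark code) of two of the operations named above: de-duplication and counting over tumbling event-time windows. The event shape (`id`, `ts`), the `StreamState` class, and the 60-second window are all assumptions made for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # hypothetical tumbling-window size


class StreamState:
    """State that has to survive between events for the results to be correct."""

    def __init__(self):
        self.seen_ids = set()                   # de-duplication state
        self.window_counts = defaultdict(int)   # per-window aggregation state

    def process(self, event):
        """Process one event; return True if it was new, False if a duplicate."""
        if event["id"] in self.seen_ids:
            return False
        self.seen_ids.add(event["id"])
        # Assign the event to the window containing its event time.
        window_start = event["ts"] - event["ts"] % WINDOW_SECONDS
        self.window_counts[window_start] += 1
        return True


events = [
    {"id": "a", "ts": 5},
    {"id": "b", "ts": 42},
    {"id": "a", "ts": 5},   # duplicate delivery of the first event
    {"id": "c", "ts": 61},
]

state = StreamState()
accepted = [state.process(e) for e in events]
print(accepted)
print(dict(state.window_counts))
```

Neither result can be computed from a single event in isolation: both depend on what the stream has already seen, which is what makes the operations stateful.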

Apache Spark provides a high-level, easy-to-use DataFrame/Dataset API for working with both batch and streaming data. The funny thing about batch workloads is that people tend to run them over and over again. Structured Streaming lets users run those same workloads, with the exact same business logic, in a streaming fashion, so that questions can be answered at lower latency.
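The idea of reusing one piece of business logic for both modes can be illustrated without Spark. The sketch below is plain Python, not the DataFrame/Dataset API; `count_by_user`, the record shape, and the chunk size of 2 are hypothetical. It runs the same function once over the full data set and then incrementally over small chunks, and the totals agree.

```python
from collections import Counter


def count_by_user(records, running=None):
    """Business logic: count events per user, optionally continuing prior totals."""
    totals = Counter(running or {})
    for record in records:
        totals[record["user"]] += 1
    return totals


records = [
    {"user": "u1"},
    {"user": "u2"},
    {"user": "u1"},
    {"user": "u3"},
    {"user": "u1"},
]

# Batch: one pass over everything collected so far.
batch_result = count_by_user(records)

# Streaming-style: the same function applied chunk by chunk,
# carrying the running totals forward as state.
running = Counter()
for start in range(0, len(records), 2):
    running = count_by_user(records[start:start + 2], running)
stream_result = running

print(batch_result == stream_result)
```

In the incremental version an updated answer is available after every chunk instead of only after the next full batch run.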

In this talk, we focus on stateful operations with Structured Streaming. Through live demos, we show how a NoSQL store can be plugged in as a fault-tolerant state store for intermediate state, and also used as a streaming sink where output data is stored indefinitely for downstream applications.
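The two roles described here, state store and sink, can be sketched in plain Python. This is not Spark's state store interface and not a ScyllaDB driver: `KeyValueStore` is a hypothetical in-memory stand-in for an external NoSQL table, and `run_batch` is an illustrative helper.

```python
class KeyValueStore:
    """In-memory stand-in for an external NoSQL table (hypothetical interface)."""

    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key, default=None):
        return self._rows.get(key, default)

    def items(self):
        return dict(self._rows)


def run_batch(batch, state_store, sink):
    """Update per-key running counts in the state store, then upsert them to the sink."""
    for key in batch:
        count = state_store.get(key, 0) + 1
        state_store.put(key, count)   # intermediate state, kept outside the worker
        sink.put(key, count)          # output for downstream applications


state_store = KeyValueStore()
sink = KeyValueStore()

run_batch(["click", "view", "click"], state_store, sink)
# Simulated restart: nothing is carried over in local variables,
# so the next batch resumes from whatever the state store holds.
run_batch(["click"], state_store, sink)

print(sink.items())
```

Because the running counts live in the store rather than in the worker, processing can resume after a failure. The sketch does not handle a replayed batch, which would be counted twice; a real deployment would need some way to make updates idempotent.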
