
ScyllaDB University: New Spark and Kafka Lessons

ScyllaDB University is our free online resource for you to learn and master NoSQL skills. We’re always adding new lessons and updating existing lessons to keep the content fresh and engaging.

We’re also expanding the content to cover data ecosystems, because we understand that your database doesn’t operate in a vacuum. To that end, we recently published two new lessons on ScyllaDB University: “Using Spark with ScyllaDB” and “Kafka and ScyllaDB.”

Using Spark with ScyllaDB

Whether you run on-premises hardware or cloud-based infrastructure, ScyllaDB offers high performance, scalability, and durability for your data. With ScyllaDB, data is stored in a row-and-column, table-like format that is efficient for transactional workloads. In many cases, we see ScyllaDB used for OLTP workloads.

But what about analytics workloads? Many users these days have standardized on Apache Spark, which accepts everything from columnar file formats like Apache Parquet to row-based formats like Apache Avro. It can also be integrated with transactional databases like ScyllaDB.

By using Spark together with ScyllaDB, users can deploy analytics workloads on the information stored in the transactional system.
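
For example, here is a minimal sketch of reading a ScyllaDB table into a Spark DataFrame using the open-source Spark Cassandra Connector (which works with ScyllaDB); the host, keyspace, and table names are placeholders, not part of the lesson.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: load a ScyllaDB table into a Spark DataFrame via the
// Spark Cassandra Connector. Host, keyspace, and table names are placeholders.
val spark = SparkSession.builder()
  .appName("scylla-analytics")
  .config("spark.cassandra.connection.host", "scylla-node1") // ScyllaDB contact point
  .getOrCreate()

val orders = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "shop", "table" -> "orders"))
  .load()

// Run an analytical query over the transactional data.
orders.groupBy("customer_id").count().show()
```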

The new ScyllaDB University lesson “Using Spark with ScyllaDB” covers:

  • An overview of ScyllaDB, Spark, and how they can work together
  • ScyllaDB and analytics workloads
  • ScyllaDB token architecture, data distribution, hashing, and nodes
  • Spark intro: the driver program, RDDs, and data distribution
  • Considerations for writing and reading data using Spark and ScyllaDB
  • What happens when data is written, and which variables are configurable
  • How data is read from ScyllaDB using Spark
  • How to decide whether Spark should be co-located with ScyllaDB
  • Best practices and considerations for configuring Spark to work with ScyllaDB (see the sketch after this list)
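
As a rough illustration of the write path and a couple of the tunables the lesson discusses, here is a hedged sketch. The configuration property names belong to the Spark Cassandra Connector; the values, keyspace, and table names are arbitrary placeholders, not recommendations.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: aggregate a ScyllaDB table with Spark and write the result back.
// The target table must already exist; the tuning values are arbitrary examples.
val spark = SparkSession.builder()
  .appName("scylla-writer")
  .config("spark.cassandra.connection.host", "scylla-node1")
  .config("spark.cassandra.output.concurrent.writes", "10") // batches written in parallel per task
  .config("spark.cassandra.output.batch.size.rows", "auto") // let the connector size batches
  .config("spark.cassandra.input.split.sizeInMB", "64")     // target size of read splits
  .getOrCreate()

val dailyCounts = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "shop", "table" -> "orders"))
  .load()
  .groupBy("order_date")
  .count()

dailyCounts.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "shop", "table" -> "orders_per_day"))
  .mode("append")
  .save()
```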

Using Kafka with ScyllaDB

This lesson provides an intro to Kafka and covers some basic concepts. Apache Kafka is an open-source distributed event streaming system. It allows you to:

  • Ingest data from a multitude of different systems, such as databases, your services, microservices, or other software applications (a minimal producer sketch follows this list)
  • Store the data for future reads
  • Process and transform the incoming streams in real time
  • Consume the stored data stream
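
To make the ingest side concrete, here is a minimal, hedged sketch of publishing an event with the standard Kafka client, used here from Scala; the broker address, topic name, and payload are placeholders.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Minimal sketch: publish a single event to a Kafka topic.
// Broker address, topic name, and payload are placeholders.
val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

val producer = new KafkaProducer[String, String](props)
producer.send(new ProducerRecord("orders", "order-42", """{"customer_id": 7, "total": 19.90}"""))
producer.close()
```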

Some common use cases for Kafka are:

  • A message broker (similar to RabbitMQ and others)
  • The “glue” between different services in your system
  • Replication of data between databases/services
  • Real-time analysis of data (e.g., for fraud detection)

The ScyllaDB Sink Connector is a Kafka Connect connector that reads messages from a Kafka topic and inserts them into ScyllaDB. It supports different data formats (Avro, JSON), can scale across many Kafka Connect nodes, has at-least-once semantics, and periodically saves its current offset in Kafka.
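
As a rough sketch of what registering the connector can look like, the configuration is posted to the Kafka Connect REST API. The connector class and property names below are assumptions based on the kafka-connect-scylladb project and should be verified against its documentation; all hosts, topics, and keyspace names are placeholders.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Sketch: register the ScyllaDB Sink Connector via the Kafka Connect REST API.
// Connector class and property names are assumptions; verify them against the
// kafka-connect-scylladb docs. Hosts, topic, and keyspace are placeholders.
val connectorConfig =
  """{
    |  "name": "scylladb-sink",
    |  "config": {
    |    "connector.class": "io.connect.scylladb.ScyllaDbSinkConnector",
    |    "tasks.max": "1",
    |    "topics": "orders",
    |    "scylladb.contact.points": "scylla-node1",
    |    "scylladb.keyspace": "shop",
    |    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    |    "value.converter.schemas.enable": "true"
    |  }
    |}""".stripMargin

val request = HttpRequest.newBuilder(URI.create("http://localhost:8083/connectors"))
  .header("Content-Type", "application/json")
  .POST(HttpRequest.BodyPublishers.ofString(connectorConfig))
  .build()

val response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
println(s"${response.statusCode()} ${response.body()}")
```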

The ScyllaDB University lesson also provides a brief overview of Change Data Capture (CDC) and the ScyllaDB CDC Source Connector. To learn more about CDC, check out this lesson.

The ScyllaDB CDC Source Connector is a Kafka Connect connector that reads messages from a ScyllaDB table (with ScyllaDB CDC enabled) and writes them to a Kafka topic. It works seamlessly with standard Kafka converters (JSON, Avro), can scale horizontally across many Kafka Connect nodes, and has at-least-once semantics.
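
Since the connector reads from a table’s CDC log, CDC has to be enabled on that table first. Here is a minimal sketch of doing that with a CQL statement executed through the Java driver from Scala; the keyspace and table names are placeholders, and the session connects to a local node by default.

```scala
import com.datastax.oss.driver.api.core.CqlSession

// Sketch: enable CDC on an existing table so the CDC Source Connector can stream its changes.
// Keyspace and table names are placeholders; without explicit contact points the
// driver connects to 127.0.0.1:9042.
val session = CqlSession.builder().build()
session.execute("ALTER TABLE shop.orders WITH cdc = {'enabled': true}")
session.close()
```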

The lesson includes demos for quickly starting Kafka, using the ScyllaDB Sink Connector, viewing changes on a table with CDC enabled, and downloading, installing, configuring, and using the ScyllaDB CDC Source Connector.

To learn more about using Spark with ScyllaDB and about Kafka and ScyllaDB, check out the full lessons on ScyllaDB University. These include quiz questions and hands-on labs.

ScyllaDB University LIVE – Fall Event (November 9th and 10th)

Following the success of our previous ScyllaDB University LIVE events, we’re hosting another event in November! We’ll conduct these live sessions in two different time zones to better support our global community of users. The November 9th training is scheduled at a time convenient for users in North and South America; November 10th repeats the same sessions at a time better suited to users in Europe and Asia.

As a reminder, ScyllaDB University LIVE is a FREE, half-day, instructor-led training event, with training sessions from our top engineers and architects. It will include sessions that cover the basics and how to get started with ScyllaDB, as well as more advanced topics and new features. Following the sessions, we will host a roundtable discussion where you’ll have the opportunity to talk with ScyllaDB experts and network with other users.

The event will be held online. Participants who complete the LIVE training event will receive a certificate of completion.

REGISTER FOR SCYLLA UNIVERSITY LIVE

Next Steps

If you haven’t done so yet, register a user account in ScyllaDB University and start learning. It’s free!

Join the #scylla-university channel on our community Slack for more training-related updates and discussions.

About Guy Shtub

Head of Training: Guy is experienced in creating products that people love. Previously, he co-founded two start-ups. Outside of the office, you can find him climbing, juggling, and generally getting off the beaten path. Guy holds a B.Sc. in Software Engineering from Ben Gurion University.