
ScyllaDB University: New Spark and Kafka Lessons

ScyllaDB University is our free online resource for you to learn and master NoSQL skills. We’re always adding new lessons and updating existing lessons to keep the content fresh and engaging.

We’re also expanding the content to cover data ecosystems, because we understand that your database doesn’t operate in a vacuum. To that end, we recently published two new lessons on ScyllaDB University: “Using Spark with ScyllaDB” and “Kafka and ScyllaDB.”

Using Spark with ScyllaDB

Whether you run on-premises hardware or cloud-based infrastructure, ScyllaDB offers high performance, scalability, and durability for your data. With ScyllaDB, data is stored in a row-and-column, table-like format that is efficient for transactional workloads. In many cases, we see ScyllaDB used for OLTP workloads.

But what about analytics workloads? Many users these days have standardized on Apache Spark, which accepts everything from columnar file formats like Apache Parquet to row-based formats like Apache Avro. It can also be integrated with transactional databases like ScyllaDB.

By using Spark together with ScyllaDB, users can deploy analytics workloads on the information stored in the transactional system.
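
For example, here is a minimal sketch of reading a ScyllaDB table into a Spark DataFrame using the open-source Spark Cassandra Connector (which works with ScyllaDB); the host, keyspace, and table names are placeholders, not part of the lesson.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: load a ScyllaDB table into a Spark DataFrame via the
// Spark Cassandra Connector. Host, keyspace, and table names are placeholders.
val spark = SparkSession.builder()
  .appName("scylla-analytics")
  .config("spark.cassandra.connection.host", "scylla-node1") // ScyllaDB contact point
  .getOrCreate()

val orders = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "shop", "table" -> "orders"))
  .load()

// Run an analytical query over the transactional data.
orders.groupBy("customer_id").count().show()
```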

The new ScyllaDB University lesson “Using Spark with ScyllaDB” covers:

  • An overview of ScyllaDB, Spark, and how they can work together
  • ScyllaDB and analytics workloads
  • ScyllaDB token architecture, data distribution, hashing, and nodes
  • Spark intro: the driver program, RDDs, and data distribution
  • Considerations for writing and reading data using Spark and ScyllaDB
  • What happens when data is written, and which variables are configurable
  • How data is read from ScyllaDB using Spark
  • How to decide whether Spark should be co-located with ScyllaDB
  • Best practices and considerations for configuring Spark to work with ScyllaDB (see the sketch after this list)
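
As a rough illustration of the write path and a couple of the tunables the lesson discusses, here is a hedged sketch. The configuration property names belong to the Spark Cassandra Connector; the values, keyspace, and table names are arbitrary placeholders, not recommendations.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: aggregate a ScyllaDB table with Spark and write the result back.
// The target table must already exist; the tuning values are arbitrary examples.
val spark = SparkSession.builder()
  .appName("scylla-writer")
  .config("spark.cassandra.connection.host", "scylla-node1")
  .config("spark.cassandra.output.concurrent.writes", "10") // batches written in parallel per task
  .config("spark.cassandra.output.batch.size.rows", "auto") // let the connector size batches
  .config("spark.cassandra.input.split.sizeInMB", "64")     // target size of read splits
  .getOrCreate()

val dailyCounts = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "shop", "table" -> "orders"))
  .load()
  .groupBy("order_date")
  .count()

dailyCounts.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "shop", "table" -> "orders_per_day"))
  .mode("append")
  .save()
```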

Using Kafka with ScyllaDB

This lesson provides an intro to Kafka and covers some basic concepts. Apache Kafka is an open-source distributed event streaming system. It allows you to:

  • Ingest data from a multitude of different systems, such as databases, your services, microservices, or other software applications (a minimal producer sketch follows this list)
  • Store the data for future reads
  • Process and transform the incoming streams in real time
  • Consume the stored data stream
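
To make the ingest side concrete, here is a minimal, hedged sketch of publishing an event with the standard Kafka client, used here from Scala; the broker address, topic name, and payload are placeholders.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Minimal sketch: publish a single event to a Kafka topic.
// Broker address, topic name, and payload are placeholders.
val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

val producer = new KafkaProducer[String, String](props)
producer.send(new ProducerRecord("orders", "order-42", """{"customer_id": 7, "total": 19.90}"""))
producer.close()
```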

Some common use cases for Kafka are:

  • A message broker (similar to RabbitMQ and others)
  • The “glue” between different services in your system
  • Replication of data between databases/services
  • Real-time analysis of data (e.g., for fraud detection)

The ScyllaDB Sink Connector is a Kafka Connect connector that reads messages from a Kafka topic and inserts them into ScyllaDB. It supports different data formats (Avro, JSON), can scale across many Kafka Connect nodes, has at-least-once semantics, and periodically saves its current offset in Kafka.
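
As a rough sketch of what registering the connector can look like, the configuration is posted to the Kafka Connect REST API. The connector class and property names below are assumptions based on the kafka-connect-scylladb project and should be verified against its documentation; all hosts, topics, and keyspace names are placeholders.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Sketch: register the ScyllaDB Sink Connector via the Kafka Connect REST API.
// Connector class and property names are assumptions; verify them against the
// kafka-connect-scylladb docs. Hosts, topic, and keyspace are placeholders.
val connectorConfig =
  """{
    |  "name": "scylladb-sink",
    |  "config": {
    |    "connector.class": "io.connect.scylladb.ScyllaDbSinkConnector",
    |    "tasks.max": "1",
    |    "topics": "orders",
    |    "scylladb.contact.points": "scylla-node1",
    |    "scylladb.keyspace": "shop",
    |    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    |    "value.converter.schemas.enable": "true"
    |  }
    |}""".stripMargin

val request = HttpRequest.newBuilder(URI.create("http://localhost:8083/connectors"))
  .header("Content-Type", "application/json")
  .POST(HttpRequest.BodyPublishers.ofString(connectorConfig))
  .build()

val response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
println(s"${response.statusCode()} ${response.body()}")
```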

The ScyllaDB University lesson also provides a brief overview of Change Data Capture (CDC) and the ScyllaDB CDC Source Connector. To learn more about CDC, check out this lesson.

The ScyllaDB CDC Source Connector is a Kafka Connect connector that reads messages from a ScyllaDB table (with ScyllaDB CDC enabled) and writes them to a Kafka topic. It works seamlessly with standard Kafka converters (JSON, Avro), can scale horizontally across many Kafka Connect nodes, and has at-least-once semantics.
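
Since the connector reads from a table’s CDC log, CDC has to be enabled on that table first. Here is a minimal sketch of doing that with a CQL statement executed through the Java driver from Scala; the keyspace and table names are placeholders, and the session connects to a local node by default.

```scala
import com.datastax.oss.driver.api.core.CqlSession

// Sketch: enable CDC on an existing table so the CDC Source Connector can stream its changes.
// Keyspace and table names are placeholders; without explicit contact points the
// driver connects to 127.0.0.1:9042.
val session = CqlSession.builder().build()
session.execute("ALTER TABLE shop.orders WITH cdc = {'enabled': true}")
session.close()
```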

The lesson includes demos for quickly starting Kafka, using the ScyllaDB Sink Connector, viewing changes on a table with CDC enabled, and downloading, installing, configuring, and using the ScyllaDB CDC Source Connector.

To learn more about using Spark with ScyllaDB and about Kafka and ScyllaDB, check out the full lessons on ScyllaDB University. These include quiz questions and hands-on labs.

ScyllaDB University LIVE – Fall Event (November 9th and 10th)

Following the success of our previous ScyllaDB University LIVE events, we’re hosting another event in November! We’ll conduct these live sessions in two different time zones to better support our global community of users. The November 9th training is scheduled at a time convenient for users in North and South America; November 10th repeats the same sessions at a time better suited to users in Europe and Asia.

As a reminder, ScyllaDB University LIVE is a FREE, half-day, instructor-led training event, with training sessions from our top engineers and architects. It will include sessions that cover the basics and how to get started with ScyllaDB, as well as more advanced topics and new features. Following the sessions, we will host a roundtable discussion where you’ll have the opportunity to talk with ScyllaDB experts and network with other users.

The event will be held online. Participants who complete the LIVE training event will receive a certificate of completion.

REGISTER FOR SCYLLA UNIVERSITY LIVE

Next Steps

If you haven’t done so yet, register a user account in ScyllaDB University and start learning. It’s free!

Join the #scylla-university channel on our community Slack for more training-related updates and discussions.

About Guy Shtub

Head of Training: Guy is experienced in creating products that people love. Previously, he co-founded two start-ups. Outside of the office, you can find him climbing, juggling, and generally getting off the beaten path. Guy holds a B.Sc. in Software Engineering from Ben Gurion University.