Best Practices for Running Spark with ScyllaDB

Eyal Gutkind, Head of Solution Architects, ScyllaDB

Spark and ScyllaDB are frequently deployed together. Running analytics workloads against transactional data provides insights to business teams, and ETL workloads that combine Spark and ScyllaDB are common as well. We cover the different workloads we have seen in practice and how we helped optimize both Spark and ScyllaDB deployments for a smooth and efficient workflow. Best practices we discuss include correctly sizing Spark and ScyllaDB nodes, tuning partition sizes, and setting connector concurrency and Spark retry policies. In addition, we cover ways to use Spark and ScyllaDB when migrating between different data models.
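As a companion to the talk, here is a minimal sketch of where these tuning knobs live when Spark talks to ScyllaDB through the Spark Cassandra Connector (ScyllaDB is wire-compatible with it). The hostnames, keyspace/table names, and the specific values are illustrative assumptions, not recommendations from the talk:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: connector settings covered in the talk (partition/split size,
// connector write concurrency, retry policy). Values are placeholders.
val spark = SparkSession.builder()
  .appName("scylla-analytics")
  .config("spark.cassandra.connection.host", "scylla-node1,scylla-node2") // ScyllaDB contact points (assumed)
  .config("spark.cassandra.input.split.sizeInMB", "64")      // size of each read split handled by a Spark task
  .config("spark.cassandra.output.concurrent.writes", "5")   // concurrent writes per Spark task
  .config("spark.cassandra.query.retry.count", "10")         // retries before a query fails the task
  .getOrCreate()

// Reading a ScyllaDB table into a DataFrame (keyspace and table names are hypothetical).
val events = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "analytics", "table" -> "events"))
  .load()
```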
