This blog post is based on a talk I gave last month at the third annual Scylla Summit in San Francisco. It explains how Scylla ensures that ingestion of data proceeds as quickly as possible, but not quicker. It looks into the existing flow-control mechanism for tables without materialized views, and into the new mechanism for tables with materialized views, which is introduced in the upcoming Scylla Open Source 3.0 release. Introduction In this post we look into ingestion of data into a Scylla cluster. What happens when we make a large volume of update (write) requests? We would like […]
Hello again! Following up on our previous post on saving data to Scylla, this time, we’ll discuss using Spark Structured Streaming with Scylla and see how streaming workloads can be written in to ScyllaDB. Our code samples repository for this post contains an example project along with a docker-compose.yaml file with the necessary infrastructure for running the it. We’re going to use the infrastructure to run the code samples throughout the post and run the project itself, so start it up as follows: After that is done, launch the Spark shell as in the previous posts: With that done, let’s […]
In a previous blog post we examined how Scylla’s paging works, explained the problems with it and introduced the new stateful paging in Scylla 2.2 that solves these problems for singular partition queries by making paging stateful. In this second blog post we are going to look into how stateful paging was extended to support range-scans as well. We were able to increase the throughput of range scans by 30% and how we also significantly reduced the amount of data read from the disk by 39% and the amount of disk operations by 73%. A range scan, or a full […]
Welcome back! Last time, we discussed how Spark executes our queries and how Spark’s DataFrame and SQL APIs can be used to read data from Scylla. That concluded the querying data segment of the series; in this post, we will see how data from DataFrames can be written back to Scylla. As always, we have a code sample repository with a docker-compose.yaml file with all the necessary services we’ll need. After you’ve cloned it, start up the services with docker-compose: After that is done, launch the Spark shell as in the previous posts in order to run the samples in […]
In this blog post, we will take a closer look at how Scylla streaming works in detail and how Scylla 2.4’s new streaming improves streaming bandwidth by 240% and reduces the time it takes to perform a “rebuild” operation by 70%.
In this post, we will explore how the Scylla data cache works and will compare the performance results to Cassandra and earlier Scylla releases.
In this blog post, we will look into Scylla’s paging, address some of the earlier problems with it, and describe how we solved those issues in our recently released Scylla 2.2.
Learn how Scylla leverages control theory to keep compactions under control. We’ll discuss the approach ScyllaDB prescribes for solving this problem.
One of the cornerstones of Scylla is the I/O Scheduler, described in details at the moment of its inception in a two-part series that can be found here (part 1) and here (part 2). In the two years in which Scylla has been powering mission-critical workloads in production the importance of the I/O Scheduler was solidified and as our users have attested themselves, it plays a key part in isolating workloads and delivering on our Autonomous Operations promise.
Interested in contributing code to a framework that provides Scylla and other programs with high-throughput I/O and networking? The Scylla team is pleased to announce that the Seastar framework has been accepted as a Google Summer of Code organization. Google Summer of Code with the Seastar project provides students with the opportunity to spend their summer break contributing to an awesome open source project, work under the mentorship of dedicated, brilliant engineers, and in addition receiving a stipend when the project milestones are met.