Following up on our previous post on saving data to Scylla, this time, we’ll discuss using Spark Structured Streaming with Scylla and see how streaming workloads can be written in to ScyllaDB. This is the fourth part of our four part series.
In a previous blog post we examined how Scylla’s paging works, explained the problems with it and introduced the new stateful paging in Scylla 2.2 that solves these problems for singular partition queries by making paging stateful. In this second blog post we are going to look into how stateful paging was extended to support range-scans as well in Scylla Open Source 3.0. We were able to increase the throughput of range scans by 30% and how we also significantly reduced the amount of data read from the disk by 39% and the amount of disk operations by 73%. A […]
Last time, we discussed how Spark executes our queries and how Spark’s DataFrame and SQL APIs can be used to read data from Scylla. That concluded the querying data segment of the series; in this post, we will see how data from DataFrames can be written back to Scylla.
In this blog post, we will take a closer look at how Scylla streaming works in detail and how Scylla 2.4’s new streaming improves streaming bandwidth by 240% and reduces the time it takes to perform a “rebuild” operation by 70%.
In this post, we will explore how the Scylla data cache works and will compare the performance results to Cassandra and earlier Scylla releases.
In this blog post, we will look into Scylla’s paging, address some of the earlier problems with it, and describe how we solved those issues in our recently released Scylla 2.2.
Learn how Scylla leverages control theory to keep compactions under control. We’ll discuss the approach ScyllaDB prescribes for solving this problem.
One of the cornerstones of Scylla is the I/O Scheduler, described in details at the moment of its inception in a two-part series that can be found here (part 1) and here (part 2). In the two years in which Scylla has been powering mission-critical workloads in production the importance of the I/O Scheduler was solidified and as our users have attested themselves, it plays a key part in isolating workloads and delivering on our Autonomous Operations promise.
Interested in contributing code to a framework that provides Scylla and other programs with high-throughput I/O and networking? The Scylla team is pleased to announce that the Seastar framework has been accepted as a Google Summer of Code organization. Google Summer of Code with the Seastar project provides students with the opportunity to spend their summer break contributing to an awesome open source project, work under the mentorship of dedicated, brilliant engineers, and in addition receiving a stipend when the project milestones are met.
Seastar provides a programming environment that abstracts away most of the problems of multi-threaded programming using a thread-per-core model. Locks, atomic variables, memory barriers, lock-free programming, and all of the scaling and complexity that come from them are gone. In their place, Seastar provides a single facility for inter-core communications. This is, of course, great for the developer, who can easily utilize many-core machines, but there is also another side: because Seastar takes care of all inter-core communications, it can apply advanced optimizations to these communications.
This article examines these optimizations and some of the complexity involved.