Welcome back! Last time, we discussed how Spark executes our queries and how Spark’s DataFrame and SQL APIs can be used to read data from Scylla. That concluded the querying data segment of the series; in this post, we will see how data from DataFrames can be written back to Scylla. As always, we have a code sample repository with a docker-compose.yaml file that defines all the services we’ll need. After you’ve cloned it, start up the services with docker-compose. After that is done, launch the Spark shell as in the previous posts in order to run the samples in […]
In this blog post, we will take a closer look at how Scylla streaming works in detail and how Scylla 2.4’s new streaming improves streaming bandwidth by 240% and reduces the time it takes to perform a “rebuild” operation by 70%.
In this post, we will explore how the Scylla data cache works and will compare the performance results to Cassandra and earlier Scylla releases.
In this blog post, we will look into Scylla’s paging, address some of the earlier problems with it, and describe how we solved those issues in our recently released Scylla 2.2.
Learn how Scylla leverages control theory to keep compactions under control. We’ll discuss the approach ScyllaDB prescribes for solving this problem.
One of the cornerstones of Scylla is the I/O Scheduler, described in detail at its inception in a two-part series that can be found here (part 1) and here (part 2). In the two years in which Scylla has been powering mission-critical workloads in production, the importance of the I/O Scheduler has been solidified, and, as our users have attested, it plays a key part in isolating workloads and delivering on our Autonomous Operations promise.
Interested in contributing code to a framework that provides Scylla and other programs with high-throughput I/O and networking? The Scylla team is pleased to announce that the Seastar framework has been accepted as a Google Summer of Code organization. Google Summer of Code with the Seastar project gives students the opportunity to spend their summer break contributing to an awesome open source project, working under the mentorship of dedicated, brilliant engineers, and receiving a stipend when the project milestones are met.
Seastar provides a programming environment that abstracts away most of the problems of multi-threaded programming using a thread-per-core model. Locks, atomic variables, memory barriers, lock-free programming, and all of the scaling and complexity that come from them are gone. In their place, Seastar provides a single facility for inter-core communications. This is, of course, great for the developer, who can easily utilize many-core machines, but there is also another side: because Seastar takes care of all inter-core communications, it can apply advanced optimizations to these communications.
This article examines these optimizations and some of the complexity involved.
This is the second post in a series of four about the different compaction strategies available in Scylla. In the previous post, we introduced the Size-Tiered compaction strategy (STCS) and discussed its most significant drawback – its disk-space waste, a.k.a. space amplification. In this post, we will look at Leveled Compaction Strategy (LCS), the first alternative compaction strategy designed to solve the space amplification problem of STCS, and show that it does solve that problem, but unfortunately introduces a new problem – write amplification. The next post in this series will introduce a new compaction strategy, Hybrid Compaction Strategy, which […]
This is the first post in a series of four about the different compaction strategies available in Scylla. The series will look at the good and the bad properties of each compaction strategy, and how to choose the best compaction strategy for your workload. This first post will focus on Scylla’s default compaction strategy, size-tiered compaction strategy.