Interested in contributing code to a framework that provides Scylla and other programs with high-throughput I/O and networking? The Scylla team is pleased to announce that the Seastar framework has been accepted as a Google Summer of Code organization. Google Summer of Code with the Seastar project provides students with the opportunity to spend their summer break contributing to an awesome open source project, work under the mentorship of dedicated, brilliant engineers, and in addition receiving a stipend when the project milestones are met.
Seastar provides a programming environment that abstracts away most of the problems of multi-threaded programming using a thread-per-core model. Locks, atomic variables, memory barriers, lock-free programming, and all of the scaling and complexity that come from them are gone. In their place, Seastar provides a single facility for inter-core communications. This is, of course, great for the developer, who can easily utilize many-core machines, but there is also another side: because Seastar takes care of all inter-core communications, it can apply advanced optimizations to these communications.
This article examines these optimizations and some of the complexity involved.
This is the second post in a series of four about the different compaction strategies available in Scylla. In the previous post, we introduced the Size-Tiered compaction strategy (STCS) and discussed its most significant drawback – its disk-space waste, a.k.a. space amplification. In this post, we will look at Leveled Compaction Strategy (LCS), the first alternative compaction strategy designed to solve the space amplification problem of STCS, and show that it does solve that problem, but unfortunately introduces a new problem – write amplification. The next post in this series will introduce a new compaction strategy, Hybrid Compaction Strategy, which […]
This is the first post in a series of four about the different compaction strategies available in Scylla. The series will look at the good and the bad properties of each compaction strategy, and how to choose the best compaction strategy for your workload. This first post will focus on Scylla’s default compaction strategy, size-tiered compaction strategy.
This is a cross-post from https://www.alexgallego.org/concurrency/smf/2017/12/16/future.html. On June 8, 2016, Avi Kivity came to NYC to present ScyllaDB. During his search for a quick open desk to do some work, I volunteered open spaces we had at Concord1. We talked lock-free algorithms, memory reclamation techniques, threading models, Concord and distributed streaming engines, even C vs C++. Five hours later I was convinced that Seastar was the best systems framework I’d ever come across.
Before organizations go into production with Scylla, they must ensure that they are getting the best possible performance so their applications and services will run optimally. One of the many ways to optimize your Scylla deployment is to choose the right compaction strategy. One of the more popular talks at Scylla Summit 2017 was on this subject. Based on that talk, I will explain what compaction is and then explore the different strategies available in Scylla.
What’s the deal with prepared statements? A query itself is just a string of text. For example: INSERT INTO tb (key,val) VALUES (“key”, “value”) In this simple example, we inserted two strings in a two-column table. Before that can happen, the CQL statement string (INSERT INTO…) needs to be sent to Scylla, parsed, and assuming no errors in the query, executed. It’s the parsing part that we are concerned with here. Parsing a CQL query is a compute-intensive operation that consumes resources just like anything else you would have a computer do. What if we could do the parsing part […]
What is a User Defined Type? (UDT)? User Defined Types (UDTs) allow a definition of struct that includes multiple typed named fields (including other UDTs). Once a UDT is defined, it can be used as a column type in a table definition. In Scylla, you can define a Column as a frozen<UDT>.
The data model in Scylla and Apache Cassandra partitions data between cluster nodes using a partition key, which is defined by the database schema. Using a partition key provides an efficient way to look up rows using the partition key because you can find the node that owns the row by hashing the partition key. Unfortunately, this also means that finding a row using a non-partition key requires a full table scan which is inefficient. Secondary Indexes are a mechanism in Apache Cassandra that allows efficient searches on non-partition keys by creating an index.
When most server application developers think of I/O, they consider network I/O since most resources these days are accessed over the network: databases, object storage, and other microservices. The developer of a database, however, also has to consider file I/O. This article describes the available choices and their tradeoffs and why Scylla chose asynchronous direct I/O (AIO/DIO) as its access method.