Project Circe August Update

By Tzach Livyatan

September 1, 2021

Project Circe is ScyllaDB’s year-long initiative to improve ScyllaDB consistency and performance. Today we’re sharing our updates for the month of August 2021.

Toward ScyllaDB “Safe Mode”

ScyllaDB is a very powerful tool, with many features and options we’ve added over the years, some of which we modeled after Apache Cassandra and DynamoDB, and others that are unique to ScyllaDB. In many cases, these options, or a combination of them, are not recommended to run in production. We don’t want to disable or remove them, as they are already in use, but we also want to move users away from them. That’s why we’re introducing ScyllaDB Safe Mode.

Safe Mode is a collection of reservations that make it harder for the user to use non-recommended options in production.

Some examples of Safe Mode that we added to ScyllaDB in the last month:

It’s now possible to prevent users from using SimpleReplicationStrategy. The goal is to first default to warning and then default to actual prevention. SimpleReplicationStrategy can make it hard to later grow the cluster by adding data centers.
Warn or prevent usage of DateTieredCompactionStrategy. It has long since been deprecated in favor of TimeWindowCompactionStrategy.
Disable Thrift by default
Ensure that all nodes use the same snitch mode

More Safe Mode restrictions are planned.

Performance Tests

We are constantly improving ScyllaDB performance, but other projects are, of course, doing the same. So it’s interesting to run updated benchmarks comparing performance and other attributes. In a recent 2-part blog series we compared the performance of ScyllaDB with the latest release of its predecessor, Apache Cassandra.

Raft

We continue our initiative to combine strong consistency and high performance with Raft. Some of the latest updates:

Raft now has its own experimental flag in the configuration file: “experimental: raft”
The latest Raft pull request (PR) adds a Group 0 sub-service, which includes all the members of the clusters, and allows other services to update topology changes in a consistent, linearizable way.
This service brings us one step closer to strong consistent topology changes in ScyllaDB.
Followup services will have consistent schema changes, and later data changes (transactions).

20% Projects

You might think that working on a distributed database in cutting edge C++ is already a dream job for most developers, but the ScyllaDB dev team allocates 20% of their time to personal projects.

One such cool project is when Piotr Sarna made a PR for adding WebAssembly to user-defined functions (UDF). While still in very early stages, this has already initiated an interesting discussion in the comment thread.

User Defined Functions are an experimental feature in ScyllaDB since the 3.3 release. We originally supported Lua functions, and have now extended to WASM. More languages can be added in the future.

Below is an example taken from the PR, of a CQL command to create a simple WASM fibonacci function:


CREATE FUNCTION ks.fibonacci (str text)
    CALLED ON NULL INPUT
    RETURNS boolean
    LANGUAGE wasm
    AS ' (module
        (func $fibonacci (param $n i32) (result i32)
            (if
                (i32.lt_s (local.get $n) (i32.const 2))
                (return (local.get $n))
            )
            (i32.add
                (call $fibonacci (i32.sub (local.get $n) (i32.const 1)))
                (call $fibonacci (i32.sub (local.get $n) (i32.const 2)))
            )
        )
        (export "fibonacci" (func $fibonacci))
    ) '

More on the great potential of UDF in ScyllaDB in a talk by Avi

Some Cool Additions to Git Master

These updates will be merged into upcoming ScyllaDB releases, primarily ScyllaDB Open Source 4.6

Repair-based node operations are now enabled by default for the replacenode operation. Repair-based node operations use repair instead of streaming to transfer data, making it resumable and more robust (but slower). A new parameter defines which node operations use repair. (Learn more)
User-Defined Aggregates (UDA) have been implemented. Note UDA is based on User Defined Function (UDF) which is still an experimental feature
If ScyllaDB stalls while reclaiming memory, it will now log memory-related diagnostics so it is easier to understand the root cause.
After adding a node, a cleanup process is run to remove data that was copied to the new node. This is a compaction process that compacts only one SSTable at a time. This fact was used to optimize cleanup. In addition, the check for whether a partition should be removed during cleanup was also improved.
When ScyllaDB starts up, it checks if all SSTables conform to the compaction strategy rules, and if not, it reshapes the data to make it conformant. This helps keep reads fast. It is now possible to abort the reshape process in order to get ScyllaDB to start more quickly.
ScyllaDB uses reader objects to read sequential data. It caches those readers so they can be reused across multiple pages of the result set, eliminating the overhead of starting a new sequential read each time. However, this optimization was missed for internal paging used to implement aggregations (e.g. SUM(column)). ScyllaDB now uses the optimization for aggregates too.
The row cache behavior was quadratic in certain cases where many range tombstones were present. This has been fixed.
The installer now offers to set up RAID 5 on the data disks in addition to RAID 0; this is useful when the disks can have read errors, such as on GCP local disks.
The install script now supports supervisord in addition to systemd. This was brought in from the container image, where systemd is not available, and is useful in some situations where root access is not available.
A limitation of 10,000 connections per shard has been lifted to 50,000 connections per shard, and made tunable.
The docker image base has been switched from CentOS 7 to Ubuntu 20.04 (like the machine images). CentOS 7 is getting old.
The SSTableloader tool now supports Zstd compression.
There is a new type of compaction: validation. A validation compaction will read all SSTables and perform some checks, but write nothing. This is useful to make sure all SSTables can be read and pass sanity checks.
SSTable index files are now cached, both at the page level and at an object level (index entry). This improves large partition workloads as well as intermediate size workloads where the entire SSTable index can be cached.
It was found that the very common single-partition query was treated as an IN query with a 1-element tuple. This caused extra work to be done (to post-process the IN query). We now specialize for this common case and avoid the extra post-processing work.

Monitoring News

ScyllaDB Monitoring Stack continues to move forward fast.

We continue to invest in ScyllaDB Advisor, which takes information from ScyllaDB and OS level metrics (via Prometheus) and Logs (via Loki), combining them using policy rules to advise the user on what he should look at, in a production system.

For example ScyllaDB Monitoring Stack 3.8 now warns about prepared-statements cache eviction.

Other August Releases

ScyllaDB Manager 2.5 is out with improvements for backups and other features (checkout the release notes)
ScyllaDB Operator 1.4 is out with performance, stability and new features (checkout the release notes)
ScyllaDB Monitoring Stack 3.8.2 added a dashboard for ScyllaDB Manager 2.5, as well as bug fixes.