The Scylla team is pleased to announce the release of Scylla 1.5, a production-ready Scylla minor release. Scylla is an open source NoSQL database compatible with Apache Cassandra, with superior performance and consistently low latency. From now on, critical bugs will be fixed in the 1.5 and 1.4 release series only. If you are still using open source Scylla 1.3 or an earlier release, you are encouraged to upgrade. We will continue to fix bugs and add features on the master branch toward 1.6 and beyond.
Scylla 1.5 focuses on stability, bug fixes, and workload conditioning.
Scylla 1.5 takes another step toward workload conditioning: Scylla's ability to self-tune to better handle varying user workloads. In this release, Scylla can automatically reduce the rate of requests it accepts when the disk is not fast enough to write back memtables and commitlog entries. The rate is determined automatically as the highest rate that still allows the disk to keep up. More on workload conditioning.
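To make the idea concrete, here is a toy Python model of the dirty-memory side of this mechanism. It is a sketch under stated assumptions, not Scylla's implementation (Scylla itself is written in C++): the class `DirtyMemoryGate`, its byte limit, and its FIFO wait queue are all hypothetical. Writes are admitted while unflushed memory stays under a limit and are otherwise queued; each completed flush frees memory and releases queued writes. Because writes can only enter as fast as flushes free space, the accepted write rate converges to what the disk can sustain, without a rate being configured by hand.

```python
from collections import deque


class DirtyMemoryGate:
    """Hypothetical toy model: admit writes only while dirty (unflushed) memory fits under a limit."""

    def __init__(self, limit_bytes: int) -> None:
        self.limit_bytes = limit_bytes
        self.dirty_bytes = 0      # memory written but not yet flushed to disk
        self.blocked_total = 0    # monotonic count of delayed writes
        self.waiting = deque()    # sizes of writes currently delayed, oldest first

    def write(self, size: int) -> bool:
        """Return True if the write is admitted now, False if it is queued."""
        if size > self.limit_bytes:
            raise ValueError("this toy model assumes each write fits under the limit")
        # Queue behind earlier delayed writes to keep FIFO order.
        if not self.waiting and self.dirty_bytes + size <= self.limit_bytes:
            self.dirty_bytes += size
            return True
        self.waiting.append(size)
        self.blocked_total += 1
        return False

    def flushed(self, size: int) -> int:
        """Record that `size` bytes reached disk; admit queued writes that now fit.

        Returns the number of queued writes admitted.
        """
        self.dirty_bytes = max(0, self.dirty_bytes - size)
        admitted = 0
        while self.waiting and self.dirty_bytes + self.waiting[0] <= self.limit_bytes:
            self.dirty_bytes += self.waiting.popleft()
            admitted += 1
        return admitted
```

For example, with a 100-byte limit, a 60-byte write is admitted, a following 50-byte write is queued and counted as blocked, and once the first 60 bytes are flushed the queued write is admitted.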
Scylla 1.5 also adds a script that parses scylla.yaml and tunes the disks holding the data file and commitlog directories it references. Tuning includes:
- Disable the Linux I/O scheduler by setting it to noop. Scylla already reorders requests using its own I/O scheduler; further reordering by the kernel will only reduce throughput and increase latency.
- Disable Linux I/O scheduler merging. Merging multiple I/O requests into a single I/O request may hurt Scylla latency.
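Both settings correspond to standard Linux sysfs attributes of a block device's request queue. The sketch below is a hypothetical helper, not the script shipped with Scylla. It shows the writes involved for a single disk and assumes you already know the device name (for example `sda`). It skips the scylla.yaml parsing and the directory-to-device lookup that a complete tool needs. On newer kernels that use the multi-queue block layer, the equivalent scheduler value is `none` rather than `noop`.

```python
import os

# Linux block-queue sysfs attributes.
# nomerges: 0 = all merges enabled, 1 = only simple one-hit merges, 2 = no merging at all.
SCHEDULER = "noop"
NOMERGES = "2"


def queue_settings(device: str) -> dict[str, str]:
    """Return the sysfs path -> value writes that disable reordering and merging for one disk."""
    queue = os.path.join("/sys/block", device, "queue")
    return {
        os.path.join(queue, "scheduler"): SCHEDULER,
        os.path.join(queue, "nomerges"): NOMERGES,
    }


def apply(settings: dict[str, str], dry_run: bool = True) -> None:
    """Print the equivalent shell commands (dry run) or write each value; writing requires root."""
    for path, value in settings.items():
        if dry_run:
            print(f"echo {value} > {path}")
        else:
            with open(path, "w") as f:
                f.write(value)
```

Calling `apply(queue_settings("sda"))` only prints the equivalent `echo` commands; pass `dry_run=False` as root to actually write them. Values written to sysfs this way do not persist across reboots.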
Noteworthy bug fixes
- Range scans, often used by analytics tools like Spark, issue unnecessary parallel queries, making them needlessly slow #1863 (this fix is also part of 1.4.2)
- When the ‘nodetool compact’ command is issued, in some circumstances sstable file descriptors may stay open after the files are deleted by compaction, so their disk space is not reclaimed and the disk may run out of space #1840
- nodetool info returns a negative cache capacity value #1801
- A snapshot operation may not release all of its memory back to Scylla, eventually causing Scylla to run out of memory and exit #1831
- CQL: Using SELECT DISTINCT queries with paging can return duplicate results #1822
- Histogram and moving-average metrics return wrong values, which propagate to the REST and JMX APIs, for example org.apache.cassandra.metrics.Write.Latency #1832, #1836, #1837
- Scylla may not start on a large machine (with many cores) when a column family has many sstables #1812
- CQL: Selecting the same column twice triggers an assertion failure #1367
- Relatively high 99th percentile latencies during compaction on Ubuntu, caused by a different default clock configuration #1794
- When the cache is disabled (non-default), concurrent reads and writes to the same partition may result in a leak, eventually leading to a crash #1753
- Scylla may exit with a core dump during service shutdown #1835
- When Scylla restarts with a large amount of data on disk, it can take a very long time until Scylla starts to accept requests #1856
- A bug in the latest version of systemd stops Scylla from starting on CentOS 7.3 #1846
- Wrong memtable flushing criteria and timing may hurt Scylla performance under pressure #1918, #1919
- Scylla may exit during the init phase when reading an unrecognized sstable component #1922
Noteworthy new and updated metrics in Scylla 1.5
- database_total_operations_requests_blocked_memory: a new monotonic counter of replica writes that were delayed because of too much dirty (unflushed) memory. Typically this is caused by the disk not keeping up with the write rate. It helps correlate increased write latency with this cause of stalls.
- commitlog_total_operations_requests_blocked_memory: the total number of requests, over time, blocked in the commitlog write path because of insufficient disk speed. Part of workload conditioning.
- commitlog_queue_length_pending_allocations: the number of requests currently blocked in the commitlog write path because of insufficient disk speed. Part of workload conditioning.
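The first two metrics are monotonic counters, so how fast they grow matters more than their absolute values; the third is a gauge that can be graphed directly. To line up blocked requests against a write-latency graph, you can compute a per-second rate from two samples of a counter. The minimal sketch below assumes you already collect the metric values by some means (not shown); the function name `counter_rate` and the sample numbers are illustrative only.

```python
def counter_rate(prev: float, curr: float, interval_s: float) -> float:
    """Per-second rate of a monotonic counter between two samples taken interval_s apart.

    If the counter went down (for example, because the process restarted and the
    counter was reset), assume it restarted from zero and use the current value
    as the increase.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    increase = curr - prev if curr >= prev else curr
    return increase / interval_s
```

For example, if commitlog_total_operations_requests_blocked_memory read 100 and then 160 thirty seconds later, `counter_rate(100, 160, 30)` gives 2.0 blocked requests per second. Treating a drop as a restart from zero mirrors how Prometheus `rate()` handles counter resets.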
The following metrics were made obsolete by workload conditioning: