ScyllaDB Open Source 3.1.0

The ScyllaDB team is pleased to announce the availability of ScyllaDB Open Source 3.1.0, a production-ready release of our open source NoSQL database.

ScyllaDB Open Source 3.1 release brings major new features including, Row Level Repair, CQL enhancements, and new nodetool operations.

ScyllaDB is an open source, Apache-Cassandra-compatible NoSQL database, with superior performance and consistently low latencies. Find the ScyllaDB Open Source 3.1 repository for your Linux distribution here.

Please note that only the last two minor releases of ScyllaDB Open Source project are supported. Starting today, only ScyllaDB Open Source 3.1 and 3.0 will be supported; ScyllaDB Open Source 2.3 will no longer be supported.

Related Links

Deployment

I3en Instance Support: ScyllaDB Open Source 3.1 is now also optimized to work with the new AWS EC2 I3en server family.

Ubuntu and Debian Version Support Changes: ScyllaDB Open Source 3.1 is not available on Ubuntu 14.04 or Debian 8. If you are using ScyllaDB with either of these OS/versions please upgrade to Ubuntu 16.04 or 18.04 or Debian 9 accordingly.

Relocatable packaging
ScyllaDB Open Source packaging has been overhauled. The changes are internal to the build system and should have little effect on users.

  • Binaries are built once per release, instead of once per release per OS distribution. This reduces dissimilarities between OS distributions, increases the reliability of less popular distributions, and aids in debugging.
  • Dependencies are bundled in the package file. This simplifies installation for users that are disconnected from the Internet (offline installation), as they can download and install one file instead of having to collect numerous dependencies.
  • The changes also lay the groundwork for rootless and full offline install modes in a future release.

New Features in ScyllaDB Open Source 3.1

SSTable “mc” format is enabled by default. Apache Cassandra 3.x SSTable format (mc) has been available in ScyllaDB Open Source since release 3.0. In ScyllaDB Open Source 3.1, mc formatted tables are enabled by default. Note that you can continue to use the old file format by setting enable_sstables_mc_format: false in scylla.yaml.

  • More on the mc format here

CQL

Add CQL PER PARTITION LIMIT

For example:

SELECT * FROM users PARTITION LIMIT 2;

  • See here for using LIMIT for SELECTs

BYPASS CACHE clause #3770
The new BYPASS CACHE clause on SELECT statements informs the database that the data being read is unlikely to be read again in the near future, and also was unlikely to have been read in the near past; therefore no attempt should be made to read it from the cache or to populate the cache with the data. This is mostly useful for range scans which typically process large amounts of data with no temporal locality and do not benefit from the cache.

For example:

SELECT * from heartrate BYPASS CACHE;

If you are using ScyllaDB Monitoring Stack, you can use the Cache section of the ScyllaDB Per Server dashboard, to see the effect of the BYPASS CACHE command on the cache hit and miss ratio.

  • More on BYPASS CACHE here

ALLOW FILTERING enhancements

  • Support multi-column restrictions in ALLOW FILTERING #3574
    SELECT * FROM t WHERE (c, d) IN ((1, 2), (1,3), (1,4)) ALLOW FILTERING;
    SELECT * FROM t WHERE (c, d) < (1, 3) ALLOW FILTERING;
    SELECT * FROM t WHERE (c, d) < (1, 3) AND (c, d) > (1, 1) ALLOW FILTERING;
  • Support restricted column, not in select clause #3803
    CREATE TABLE t (id int primary key, id_dup int);
    SELECT id FROM t WHERE id_dup = 3 ALLOW FILTERING;

    The restriction is applied to a column, which is not in the select clause. In prior versions this command returned incorrect results, but now it works as intended

  • Support CONTAINS restrictions #3573
    CREATE TABLE t (p frozen<map<text, text>>, c1 frozen<list>, c2 frozen<set>, v map<text, text>, id int, PRIMARY KEY(p, c1, c2));
    SELECT id FROM t WHERE p CONTAINS KEY 'a' ALLOW FILTERING;
    SELECT id FROM t WHERE c1 CONTAINS 3 ALLOW FILTERING;
    SELECT id FROM t WHERE c2 CONTAINS 0.1 ALLOW FILTERING;
    SELECT id FROM t WHERE v CONTAINS KEY 'y1' ALLOW FILTERING;
    SELECT id FROM t WHERE v CONTAINS KEY 'y1' AND c2 CONTAINS 3 ALLOW FILTERING;
    

CQL: Group functions count now works with bytes type #3824

For example:

create table t (p int, c int, v boolean, primary key (p, c));
select count(v) from t where p = 1 and c = 1;

Local Secondary Index #4185
Local Secondary Indexes is an enhancement to Global Secondary Indexes, which allows ScyllaDB to optimize workloads where the partition key of the base table and the index are the same key.

Tooling

Large cell / collection detector
ScyllaDB is not optimized for very large rows or large cells. They require allocation of large, contiguous memory areas and therefore may increase latency. Rows may also grow over time. For example, many insert operations may add elements to the same collection, or a large blob can be inserted in a single operation.

Similar to the large partitions table, the large rows and large cells tables are updated when SSTables are written or deleted, for example, on memtable flush or during compaction.

Examples for using the new Large Rows and Large Cells tables:

SELECT * FROM system.from system.large_rows;
SELECT * FROM system.from system.large_cells;

nodetool toppartitions #2811
Samples cluster writes and reads and reports the most active partitions in a specified table and time frame. For example:

> nodetool toppartitions nba team_roster 5000
WRITES Sampler:
  Cardinality: ~5 (256 capacity)
  Top 10 partitions:
         Partition Count +/-
         Russell Westbrook 100 0
         Jermi Grant 25 0
         Victor Oladipo 17 0
         Andre Roberson 1 0
         Steven Adams 1 0
READS Sampler:
  Cardinality: ~5 (256 capacity)
  Top 10 partitions:
         Partition Count +/-
         Russell Westbrook 100 0
         Victor Oladipo 17 0
         Jermi Grant 12 0
         Andre Roberson 5 0
         Steven Adams 1 0

Nodetool upgradesstables #4245

Rewrites SSTables for keyspace/table to the latest ScyllaDB version. Note that this is *not* required when enabling mc format, or upgrading to a newer ScyllaDB version. In these cases, ScyllaDB will write new SSTable, either in memtable flush or compaction, while keeping the old tables in the old format.

nodetool upgradesstables ( -a | --include-all-sstables ) -- <keyspace> <table> …

By default, the command only rewrites SSTables which are *not* of the latest release, -a | –include-all-sstables option can by use to rewrite *all* the sstables.

  • More on Nodetool upgradesstables here
  • Note: The default node_exporter installed from scylla_setup is now 0.17 (updated from 0.14)

Performance

Row Level Repair #3033

In partition-level repair (the algorithm used in ScyllaDB Open Source 3.0 and earlier), the repair master node splits the ranges to sub-ranges containing 100 partitions, and computes the checksum of those 100 partitions and asks the related peers to do the same.

  • If the checksum matches, the data in this subrange is synced, and no further action is required.
  • If the checksum mismatches, the repair master fetches the data from all the peers and sends back the merged data to peers.

This approach has two major problems:

  1. A mismatch of only a single row in any of the 100 partitions causes 100 partitions to be transferred. A single partition can be very large, even hundreds of MBs. 100 partitions can be way over one gigabyte of data.
  2. Checksum (find the mismatch) and streaming (fix the mismatch) will read the same data twice

To fix the two issues above we introduce the new Row-Level Repair. Row-level repair works on a small range which contains only a few rows (a few megabytes of data at most), reads these rows to memory, finds the mismatches and sends them to the peers. By that, it only reads the data once, and significantly reduces the data volume stream for each row mismatch.In a benchmark done on a three ScyllaDB nodes cluster, on AWS using i3.xlarge instance, each with 1 billion Rows (241 GiB of data), we tested three use cases:

 

Use case Description Time to repair Improvement
ScyllaDB Open Source 3.0 ScyllaDB Open Source 3.1
0% synched One of the nodes has zero data. The other two nodes have 1 billion identical rows. 49.0 min 37.07 min x1.32 faster
100% synched All of the 3 nodes have 1 billion identical rows. 47.6 min 9.8 min x4.85 faster
99.9% synched Each node has 1 billion identical rows and 1 billion * 0.1% distinct rows. 96.4 min 14.2 min x6.78 faster

The new row-level repair shines where a small percent of the data is out of sync – the most likely use case in case of short network issues or a node restart.

For the last use case, the bytes sent over the wire:

ScyllaDB 3.0 ScyllaDB 3.1 Transfer data ratio
TX 120.52 GiB 4.28 GiB 3.6%
RX 60.09 GiB 2.14 GiB 3.6%

As expected, where the actual difference between nodes is small, sending just relevant rows, not 100 partitions at a time, makes a huge difference.

  • More on row level repair implementation and results here

IOCB_CMD_POLL support

On Linux 4.19 or higher, ScyllaDB will use a new method of waiting for network events, IOCB_CMD_POLL.

The new interface is detected and used automatically. To use the old interface, add the command line option

--reactor-backend=epoll

Materialized Views improvements

  • Minimize generated view updates for unselected column updates #3819
  • Indexing within a partition is inefficient #4185

Move truncation records to separate table (#4083)

Metrics Updates from ScyllaDB 3.0 to ScyllaDB 3.1

ScyllaDB Monitoring Stack release 3.0 includes ScyllaDB 3.1 dashboards.

  • All metrics changed for ScyllaDB 3.1 are here

16 October 2019