Scylla Open Source 3.2.0

The Scylla team is pleased to announce the availability of Scylla Open Source 3.2.0, a production-ready release of our open source NoSQL database.

Scylla Open Source 3.2 brings major new features including IPv6 support, CQL GROUP BY and LIKE operations, open range deletions, other CQL enhancements, Zstandard compression, and stability improvements.

Scylla 3.2 also includes major new experimental features, like Lightweight Transactions (LWT) using the Paxos consensus algorithm, Change Data Capture (CDC) and an Amazon DynamoDB-compatible API (known as “Alternator”). These features are experimental and you are welcome to try them out and share feedback. Once stabilized, they will be promoted to General Availability (GA) in a follow-up Scylla release (read more below).

Scylla is an open source, Apache-Cassandra-compatible NoSQL database with superior performance and consistently low latencies. Find the Scylla Open Source 3.2 repository for your Linux distribution here.

Please note that only the last two minor releases of the Scylla Open Source project are supported. Starting today, only Scylla Open Source 3.2 and 3.1 are supported; Scylla Open Source 3.0 is no longer supported.

New features in Scylla 3.2

CQL: Support for GROUP BY in SELECT statements #2206

Example:

SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey;
SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable GROUP BY partitionKey, clustering0, clustering1;

CQL: LIKE Operation #4477

The new CQL LIKE keyword allows matching any column to a search pattern, using % as a wildcard. Note that LIKE only works with ALLOW FILTERING.

LIKE syntax support:

  • '_' matches any single character
  • '%' matches any substring (including an empty string)
  • '\' escapes the next pattern character, so it matches verbatim
  • any other pattern character matches itself
  • an empty pattern matches empty text fields

For example:
INSERT INTO t (id, name) VALUES (17, 'Mircevski');
SELECT * FROM t WHERE name LIKE 'Mirc%' ALLOW FILTERING;
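
The matching rules listed above can be modeled outside the database. The following is a minimal Python sketch, not Scylla's implementation: it translates a LIKE pattern into a regular expression. The function names like_to_regex and like are hypothetical, and treating a trailing lone backslash as a literal backslash is an assumption made here for completeness.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a LIKE pattern into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Backslash escapes the next character, which then matches verbatim.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "_":
            parts.append(".")       # any single character
        elif ch == "%":
            parts.append(".*")      # any substring, including the empty one
        else:
            # Every other character matches itself. A trailing lone backslash
            # also lands here (assumption: it is treated as a literal).
            parts.append(re.escape(ch))
        i += 1
    # DOTALL lets '_' and '%' match newline characters too.
    return re.compile("".join(parts), re.DOTALL)


def like(text: str, pattern: str) -> bool:
    """Return True when the whole text matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(text) is not None
```

With this sketch, like("Mircevski", "Mirc%") is true, while like("ac", "a_c") is false because '_' must consume exactly one character.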

CQL: Support open range deletions in CQL #432

Allows range deletions based on the clustering key, similar to a range query with SELECT.

Example:

CREATE TABLE events ( id text, created_at date, content text, PRIMARY KEY (id, created_at) );
DELETE FROM events WHERE id='concert' AND created_at <= '2019-10-31';
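
To make the visible effect of the DELETE above concrete, here is a small Python sketch that models rows as dictionaries. It only shows which rows survive; it says nothing about how Scylla stores the deletion internally. The function name delete_up_to and the sample rows are hypothetical.

```python
from datetime import date


def delete_up_to(rows, partition_key, upper_bound):
    """Keep every row except those in the given partition whose
    clustering key (created_at) is less than or equal to upper_bound."""
    return [
        row for row in rows
        if not (row["id"] == partition_key and row["created_at"] <= upper_bound)
    ]


events = [
    {"id": "concert", "created_at": date(2019, 10, 30), "content": "soundcheck"},
    {"id": "concert", "created_at": date(2019, 10, 31), "content": "doors open"},
    {"id": "concert", "created_at": date(2019, 11, 1), "content": "encore"},
    {"id": "lecture", "created_at": date(2019, 10, 1), "content": "intro"},
]

remaining = delete_up_to(events, "concert", date(2019, 10, 31))
```

Only the 'concert' row dated after the bound and the unrelated 'lecture' partition remain.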

CQL: Auto-expand replication_factor for NetworkTopologyStrategy #4210

Allows users to set a single Replication Factor (RF) for all existing Data Centers (DCs).

Example:

CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3};

Note that when adding new data centers, you need to set the RF again to include the added DCs. You can again use the auto-expansion option:

ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3};

Or you can set RF for the new DC manually (if you want a different RF, for example), but remember to keep the old DCs’ replication factors:

ALTER KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3, 'new-dc' : 5};
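
As a rough mental model of the expansion described above, the Python sketch below fills in the RF for every known data center while letting an explicitly named DC keep its own value. The expansion itself happens inside Scylla; the function name expand_replication and the DC names 'dc1' and 'dc2' are hypothetical, and the precedence of explicit entries is inferred from the example above.

```python
def expand_replication(options, known_dcs):
    """Expand 'replication_factor' into one entry per known data center.

    Explicit per-DC entries in options take precedence over the default.
    """
    default_rf = options.get("replication_factor")
    expanded = {"class": options["class"]}
    # Copy explicit per-DC settings first.
    for name, value in options.items():
        if name not in ("class", "replication_factor"):
            expanded[name] = value
    # Fill in every remaining known DC with the default RF, if one was given.
    if default_rf is not None:
        for dc in known_dcs:
            expanded.setdefault(dc, default_rf)
    return expanded


result = expand_replication(
    {"class": "NetworkTopologyStrategy", "replication_factor": 3, "new-dc": 5},
    ["dc1", "dc2", "new-dc"],
)
```

Here result maps 'dc1' and 'dc2' to 3 and 'new-dc' to 5.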

CQL: Support non-frozen UDTs #2201

Non-frozen User Defined Types (UDTs), similarly to non-frozen collections, allow updating single fields of a UDT value without knowing the entire value. For example:

CREATE TYPE ut (
  a int,
  b int
);
CREATE TABLE tbl (
  pk int PRIMARY KEY,
  val ut
);
INSERT INTO tbl (pk, val) VALUES (0, {a:0, b:0});
UPDATE tbl SET val.b = 1 WHERE pk = 0;

CQL: Tracing now includes reports on SSTable access #4908

Disk I/O has a strong effect on query latency, but until now, it was not reported in CQL tracing.
This change introduces relevant entries to the CQL trace whenever SSTables are read from disk.

Support IPv6 for client-to-node and node-to-node communication #2027

Scylla now supports IPv6 Global Scope Addresses for all IPs: seeds, listen_address, broadcast_address, etc. Note that Scylla Monitoring Stack 3.0 and above supports IPv6 addresses as well, and IPv6 support for Scylla Manager is coming soon (Scylla Manager 2.0.1). Make sure to enable enable_ipv6_dns_lookup in scylla.yaml (see below).

Example from scylla.yaml

enable_ipv6_dns_lookup: true
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "fcff:69:46::7b"
listen_address: fcff:69:46::7b
broadcast_rpc_address: fcff:69:46::7b
rpc_address: fcff:69:46::7b

  • Documentation for IPv6

Scylla Docker node_exporter

node_exporter is now part of the Scylla Docker image, making it easier to consume OS-level metrics when working with Docker and Scylla Monitoring.

Stability: improve CQL server admission control #4768

Before Scylla 3.2, admission control took a permit when a CQL request started and released it when the reply was sent, but some requests may leave background work behind after that point. In Scylla 3.2, admission control takes these background tasks into account, which improves the way Scylla handles overload conditions.

Performance: Segregate data by timestamp for Time Window Compaction Strategy (TWCS) #2687

Data is now segregated by timestamp, which isolates TWCS from read repair or compaction with an imported SSTable, either of which could otherwise ruin the TWCS property of one file per time range.
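
The idea of segregating data by timestamp can be illustrated with a short Python sketch. This is a simplified model, not Scylla's compaction code: it groups writes into fixed-size time windows keyed by the start of each window, so that late-arriving old data ends up with its own window rather than in the newest one. The function name segregate_by_window and the use of plain integer seconds are assumptions.

```python
from collections import defaultdict


def segregate_by_window(writes, window_seconds):
    """Group (timestamp, payload) pairs by the time window they fall into.

    Each key in the result is the start timestamp of one window.
    """
    buckets = defaultdict(list)
    for timestamp, payload in writes:
        window_start = timestamp - (timestamp % window_seconds)
        buckets[window_start].append((timestamp, payload))
    return dict(buckets)


# A recent write arrives between two writes carrying older timestamps,
# for example data brought back by read repair.
writes = [(10, "a"), (3605, "b"), (20, "c")]
buckets = segregate_by_window(writes, window_seconds=3600)
```

Each bucket can then be written to its own file, which keeps the old data from mixing with the current window.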

Performance: add ZSTD compression #2613

Adds the option to compress SSTables using the Zstandard algorithm from Facebook. To use it, pass 'sstable_compression': 'org.apache.cassandra.io.compress.ZstdCompressor' to the compression argument when creating a table. You can also specify a compression_level (default is 3). See the Zstd documentation for the available compression levels.

Also read our two-part blog series on Compression in Scylla: Part One and Part Two.

Example:
create table a.a (a text primary key, b int) with compression = {'sstable_compression': 'org.apache.cassandra.io.compress.ZstdCompressor', 'compression_level': 1, 'chunk_length_in_kb': '64'};

Performance: Repair improvements in Scylla 3.2:

  • Improved data transfer efficiency between nodes. Repair now uses the Seastar RPC stream interface, which is more efficient for transferring large amounts of repair data #4581. Read our blog on RPC streaming improvements.
  • Increased the internal row buffer size to provide better performance on cross-DC clusters with high-latency links #4581
  • Repair can now adjust the number of ranges repaired in parallel according to memory size #4696

Metric changes from Scylla 3.1 to Scylla 3.2

The full list of metrics is available here.

For the Scylla 3.2 dashboard, pull the master branch of the Scylla Monitoring Stack.

New Experimental features in Scylla 3.2

Scylla 3.2 enables a new per-feature experimental list. #5338

While previous versions of Scylla allowed setting experimental: [true|false] in scylla.yaml, Scylla now supports a per-feature experimental parameter, allowing you to enable a subset of the experimental features of the release. For example, here is how to enable only CDC and LWT in scylla.yaml:
experimental_features:
- cdc
- lwt

Note: both LWT and CDC are at an early stage and are likely to change before they become production-ready. Forward compatibility (ability to upgrade) for these features is *not* guaranteed.

Lightweight Transactions (LWT) [Experimental] #1359

Lightweight transactions (LWT), also known as Compare and Set (CAS), add support for conditional INSERT and UPDATE CQL commands. Scylla supports both equality and non-equality conditions for lightweight transactions (i.e., you can use <, <=, >, >=, != and IN operators in an IF clause).

Example:

INSERT INTO ks.names (id, name) VALUES (17, 'kostja') IF NOT EXISTS;
UPDATE ks.names SET name = 'Gleb' WHERE id = 17 IF name = 'Vladimir';

Under the hood, Scylla uses Paxos as a consensus algorithm to implement a safe read-before-write for LWT. Note that LWT has an impact on performance, as the Paxos algorithm requires more message passing compared to eventually consistent updates. Expect LWT performance to improve in follow-up releases.
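
The compare-and-set behavior of the two statements in the LWT example can be sketched in a few lines of Python. This is a single-process model over a dictionary that only shows when a conditional write is applied. It does not implement Paxos or any distribution, and the function names insert_if_not_exists and update_if are hypothetical.

```python
def insert_if_not_exists(table, key, row):
    """Insert the row only when no row with this key exists yet."""
    if key in table:
        return False
    table[key] = dict(row)
    return True


def update_if(table, key, condition, new_values):
    """Update the row only when it exists and the condition holds for it."""
    row = table.get(key)
    if row is None or not condition(row):
        return False
    row.update(new_values)
    return True


names = {}
# INSERT ... IF NOT EXISTS: applied, because the table is empty.
applied_insert = insert_if_not_exists(names, 17, {"name": "kostja"})
# UPDATE ... IF name = 'Vladimir': not applied, the stored name is 'kostja'.
applied_update = update_if(
    names, 17, lambda row: row["name"] == "Vladimir", {"name": "Gleb"}
)
```

In this model the insert is applied and the update is rejected, so the stored name stays 'kostja'.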

Change Data Capture (CDC) [Experimental] #4985

Change Data Capture is a feature that enables users to monitor data in a selected table and asynchronously consume the change events. You can enable CDC per table.

The CDC implementation in Scylla is significantly different from that of Apache Cassandra. With Scylla’s CDC, you can choose to keep track of the updates, original values and/or new values. The data is stored in regular Scylla tables (SSTables) and can be queried asynchronously using a standard Scylla/Cassandra CQL driver. Please refer to this presentation regarding the way Scylla implemented CDC.

Data in a CDC table is set with a TTL, minimizing the possibility of an overflow.

Example:

CREATE TABLE base_table (
pk text,
ck text,
val1 text,
val2 text,
PRIMARY KEY (pk, ck)
) WITH cdc = {'enabled': 'true', 'preimage': 'true'};

Note that CDC has an impact on performance, as each update to a CDC enabled table is written to two tables (the original table and the CDC table). CDC is still under continuous development and the API and syntax are likely to change.
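
The double write and the TTL on CDC data can be pictured with a toy Python model. This is an illustration under stated assumptions, not Scylla's CDC log format: the class name CdcTableSketch, the field names in each log entry, and the one-day default TTL are all hypothetical, and time is passed in explicitly as plain seconds.

```python
class CdcTableSketch:
    """Toy model: a base table plus a change log whose entries expire."""

    def __init__(self, preimage=False, ttl_seconds=86400):
        self.base = {}   # primary key -> current column values
        self.log = []    # change events, oldest first
        self.preimage = preimage
        self.ttl_seconds = ttl_seconds  # hypothetical default of one day

    def update(self, key, values, now):
        # Capture the row as it was before this update, if requested.
        before = dict(self.base.get(key, {})) if self.preimage else None
        # Write 1: the base table itself.
        self.base.setdefault(key, {}).update(values)
        # Write 2: the change log entry, stamped with its expiry time.
        self.log.append({
            "key": key,
            "delta": dict(values),
            "preimage": before,
            "expires_at": now + self.ttl_seconds,
        })

    def read_log(self, now):
        """Return the change events that have not yet expired."""
        return [entry for entry in self.log if entry["expires_at"] > now]


table = CdcTableSketch(preimage=True, ttl_seconds=10)
table.update(("pk1", "ck1"), {"val1": "a"}, now=0)
table.update(("pk1", "ck1"), {"val1": "b"}, now=5)
```

A consumer reading the log shortly after both updates sees two events, and the second one carries the earlier value as its preimage. Once the TTL of the first event has passed, only the second remains.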

Scylla Alternator: The Open Source DynamoDB-compatible API [Experimental]

Project Alternator is an open-source implementation of an Amazon DynamoDB™-compatible API. The goal of this project is to deliver an open source alternative to Amazon’s DynamoDB, deployable wherever a user would want: on-premises, on other public clouds like Microsoft Azure or Google Cloud Platform, or still on AWS (for users who wish to take advantage of other aspects of Amazon’s market-leading cloud ecosystem, such as the high-density i3en instances). DynamoDB users can keep their client code unchanged. Alternator is written in C++ and is a part of Scylla 3.2.

21 Jan 2020