ScyllaDB Manager Release 1.4

The ScyllaDB Enterprise team is pleased to announce the release of ScyllaDB Manager 1.4, a production-ready release of ScyllaDB Manager for ScyllaDB Enterprise and ScyllaDB Open-Source customers.

This release contains enhancements to the repair process and simplifies configuring SSH connectivity with ScyllaDB nodes. It also improves the user experience of working with sctool, the ScyllaDB Manager CLI.

Useful Links

Upgrade from ScyllaDB Manager 1.3 to ScyllaDB Manager 1.4
ScyllaDB Manager 1.4 documentation
Download ScyllaDB Manager – Enterprise Customers
Download ScyllaDB Manager – Open Source Users (limited to 5-node connections at any time)

Changed scyllamgr_ssh_setup script

The script that simplifies setting up ScyllaDB Manager connectivity to ScyllaDB nodes has been updated. Discovery mode, which finds the IP addresses of all the nodes in the cluster from a single listed node, is now the default behavior. A new --single-node flag runs the setup script only on the nodes listed.

The --manager-user flag, which specified the name of the user to be created, has been renamed to --create-user (the default is scylla-manager), and the --manager-identity-file flag has been dropped. All keys generated by ScyllaDB Manager are stored in the ~/.ssh/ directory.

The scyllamgr_ssh_test script has been integrated into scyllamgr_ssh_setup as a final validation step. This step checks that SSH connectivity works using the key and password that scyllamgr_ssh_setup created.

scyllamgr_ssh_setup -u root -i ~/root.pem 192.168.100.11
> Generating a new SSH key pair ('/home/user/.ssh/scylla-manager.pem' '/home/user/.ssh/scylla-manager.pem.pub')
> Discovering all the cluster nodes
> Creating user 'scylla-manager'
192.168.100.11 OK
192.168.100.12 OK
192.168.100.13 OK
192.168.100.21 OK
192.168.100.22 OK
192.168.100.23 OK
> Testing ScyllaDB API connectivity
192.168.100.11 OK
192.168.100.12 OK
192.168.100.13 OK
192.168.100.21 OK
192.168.100.22 OK
192.168.100.23 OK
> Done!
> To add the cluster to ScyllaDB Manager run: sctool cluster add --host 192.168.100.11 --ssh-user 'scylla-manager' --ssh-identity-file '/home/user/.ssh/scylla-manager.pem'

Full reference of the new scyllamgr_ssh_setup here.

Repair

Dry Run Mode

We have added a dry-run mode to the sctool repair command, which lets you inspect how the keyspace and datacenter patterns are evaluated when executing a repair. It validates and prints repair information without actually scheduling the repair.

sctool repair -c prod-cluster -K '*.data' --dc 'us_*' --dry-run
NOTICE: dry run mode, repair is not scheduled

Data Centers:
  - us_east
  - us_west
Keyspace: banana
  - data
Keyspace: orange
  - data
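
To make the pattern evaluation more concrete, here is a minimal Python sketch of glob-style include/exclude filtering. It is not the ScyllaDB Manager implementation. The function names (normalize, select_units) are hypothetical. Expanding a bare keyspace name to keyspace.* is assumed from the filter list printed in the task progress error output, and the handling of exclude-only filters is an assumption too.

```python
from fnmatch import fnmatchcase


def normalize(pattern: str) -> str:
    # Assumption: a bare keyspace pattern such as "foo" means "foo.*".
    return pattern if "." in pattern else pattern + ".*"


def select_units(units, patterns):
    """Return the keyspace.table units kept by the given glob patterns.

    Patterns starting with "!" exclude; all others include.
    """
    include = [normalize(p) for p in patterns if not p.startswith("!")]
    exclude = [normalize(p[1:]) for p in patterns if p.startswith("!")]
    if not include:
        # Assumption: with only exclusions, start from everything.
        include = ["*"]

    selected = []
    for unit in units:
        if not any(fnmatchcase(unit, p) for p in include):
            continue
        if any(fnmatchcase(unit, p) for p in exclude):
            continue
        selected.append(unit)
    return selected


tables = ["banana.data", "banana.meta", "orange.data", "orange.logs"]
print(select_units(tables, ["*.data"]))
```

With the sample tables above, only the data table of each keyspace is kept, which mirrors the dry-run listing.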

Replication Strategy Aware

Repair is now replication-strategy aware and automatically skips keyspaces that use the LocalStrategy replication strategy.

In addition, we enabled repair of system tables, which was skipped in previous ScyllaDB Manager versions.

Improved support for multi-datacenter cluster repair

We added support for repairing keyspaces that use the SimpleStrategy replication strategy in multi-datacenter clusters. In previous versions of ScyllaDB Manager, repairing such keyspaces would result in a repair failure even if the other keyspaces were set up correctly. Under normal circumstances, ScyllaDB Manager selects the closest datacenter that contains all the tokens for a given keyspace and uses the nodes in that datacenter as coordinator nodes for repairing all the nodes in the cluster. When a keyspace using SimpleStrategy is detected, however, we invoke primary-ranges repair on all the nodes in the cluster.
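
The decision described above can be sketched roughly as follows. This only illustrates the logic in the paragraph and is not ScyllaDB Manager code. The names RepairPlan and plan_keyspace_repair are hypothetical, as are the mode labels and the inputs (datacenters ordered by distance, and the set of datacenters that hold all tokens).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RepairPlan:
    mode: str                      # "primary-ranges" or "coordinator-dc" (hypothetical labels)
    coordinator_dc: Optional[str]  # set only for "coordinator-dc"


def plan_keyspace_repair(strategy, dcs_by_distance, dcs_with_all_tokens):
    """Pick how to repair one keyspace.

    strategy:            replication strategy name of the keyspace
    dcs_by_distance:     datacenter names, closest first
    dcs_with_all_tokens: datacenters that contain all tokens of the keyspace
    """
    if strategy == "SimpleStrategy":
        # No single datacenter is used as coordinator; every node
        # repairs its own primary ranges.
        return RepairPlan("primary-ranges", None)

    for dc in dcs_by_distance:
        if dc in dcs_with_all_tokens:
            return RepairPlan("coordinator-dc", dc)

    raise ValueError("no datacenter contains all tokens for this keyspace")


print(plan_keyspace_repair("NetworkTopologyStrategy", ["dc1", "dc2"], {"dc2"}))
print(plan_keyspace_repair("SimpleStrategy", ["dc1", "dc2"], {"dc1", "dc2"}))
```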

ScyllaDB Manager now uses the ScyllaDB API to terminate streaming of a stopped repair

To avoid overloading the cluster, repair does not start if there is another repair running. It does not matter if the repair was started by nodetool, another ScyllaDB Manager instance, or a different tool.

Users can now limit the number of shards repaired simultaneously on a host to further slow down repair and reduce the load on the cluster. By default, all the shards are repaired in parallel. The setting is global and can be set in the ScyllaDB Manager configuration file.

Full reference of sctool repair here.

Repair progress

The output of the sctool task progress command has been changed to show more relevant information. General task information is now shown in a key-value format for better readability, and the repair task arguments are shown at the top. In case of an error, the cause of the error is displayed to give a better understanding of what went wrong.

sctool task progress -c test-cluster repair/953b8f67-ac5e-411d-8964-47488964deaf
Arguments: -K foo,bar,zoo,!zoo.horse
Status: ERROR
Cause: failed to load units: no matching units found for filters, ks=[foo.* bar.* zoo.* !zoo.horse]
Start time: 22 Mar 19 14:31:56 CET
End time: 22 Mar 19 14:31:56 CET
Duration: 0s
Progress: 0%

Users are informed about the failed segments for each individual keyspace, host, and shard. In case of error, the progress information is displayed as success%/failed% instead of only success%.

For example:

sctool task progress -c test-cluster repair/eb1cc148-ba35-47b8-9def-ed0e7d335c24
Arguments:
Status:      RUNNING
Start time:  22 Mar 19 14:40:27 CET
Duration:    10m40s
Progress:    63.04% / 0.02%
Datacenters:
  - dc1
  - dc2
╭───────────────────────┬────────────────╮
│ system_auth           │ 100.00%        │
│ system_distributed    │ 100.00%        │
│ system_traces         │ 100.00%        │
│ test_keyspace_dc1_rf2 │ 100.00%        │
│ test_keyspace_dc1_rf3 │ 100.00%        │
│ test_keyspace_dc2_rf2 │ 57.90% / 0.23% │
│ test_keyspace_dc2_rf3 │ 0.00%          │
│ test_keyspace_rf2     │ 0.00%          │
│ test_keyspace_rf3     │ 0.00%          │
╰───────────────────────┴────────────────╯
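
If you post-process this output in scripts, the success%/failed% values can be split with a small parser. The sketch below is a hedged example rather than an official tool: the function name parse_progress is hypothetical, and the accepted format is inferred only from the sample output shown here.

```python
import re

# Matches "63.04%" or "63.04% / 0.02%".
PROGRESS_RE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)%(?:\s*/\s*(\d+(?:\.\d+)?)%)?\s*$"
)


def parse_progress(cell: str):
    """Return (success_percent, failed_percent) for one progress value.

    A value without a failed part is treated as 0% failed.
    """
    match = PROGRESS_RE.match(cell)
    if match is None:
        raise ValueError(f"unrecognized progress value: {cell!r}")
    success = float(match.group(1))
    failed = float(match.group(2)) if match.group(2) else 0.0
    return success, failed


print(parse_progress("63.04% / 0.02%"))
print(parse_progress("100.00%"))
```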

--run flag added

It’s now possible to show the progress of a particular historical run with the --run flag.

For example:

sctool task progress -c test-cluster repair/eb1cc148-ba35-47b8-9def-ed0e7d335c24 --run 52f08063-1683-49ac-8afc-ec32e284b5a5

You can display the run IDs by running the sctool task history command.

Filtering capabilities are added

The -K, --keyspace flag has been added to sctool task progress to show the progress of only the specified keyspaces. It uses the same glob pattern matching engine that is used to schedule repairs. The --host flag can be used to show only the progress of the specified coordinator hosts.
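
When calling sctool from automation, the flags above can be assembled as an argument list. Passing a list to subprocess.run (instead of a shell string) means glob characters such as * and ! are not expanded by the shell. This is a minimal sketch: build_progress_cmd is a hypothetical helper, the comma-separated -K value follows the Arguments line shown earlier, and passing a single value to --host is an assumption.

```python
def build_progress_cmd(cluster, task, run=None, keyspaces=None, host=None):
    """Build the argv list for `sctool task progress`.

    Only flags mentioned in these release notes are used.
    """
    cmd = ["sctool", "task", "progress", "-c", cluster, task]
    if run:
        cmd += ["--run", run]
    if keyspaces:
        # Comma-separated glob patterns, e.g. "foo,!foo.bar".
        cmd += ["-K", ",".join(keyspaces)]
    if host:
        # Assumption: one host per invocation.
        cmd += ["--host", host]
    return cmd


cmd = build_progress_cmd(
    "test-cluster",
    "repair/eb1cc148-ba35-47b8-9def-ed0e7d335c24",
    keyspaces=["test_keyspace_*", "!test_keyspace_rf3"],
)
print(cmd)
# To execute it: subprocess.run(cmd, check=True) after `import subprocess`.
```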

Full reference of sctool task progress here.

Task list

The sctool task list command now shows the task arguments in a format that allows them to be copied and used to schedule a new task. For tasks with an ERROR status, the number of remaining retries is printed next to the status. The error cause has been removed from the task list and can only be displayed using the sctool task progress command.

sctool task list -c prod-cluster
╭───────────────────────────────────────────────────────┬───────────────────────┬────────────────────────────────┬────────╮
│ task                                                  │ arguments             │ next run                       │ status │
├───────────────────────────────────────────────────────┼───────────────────────┼────────────────────────────────┼────────┤
│ healthcheck/49079cd0-5bea-4224-80be-ad12d7d5f742      │                       │ 20 May 19 12:06:16 CEST (+15s) │ DONE   │
│ repair/cf62041e-ae19-43ec-a114-14c8716edb2c           │ -K data_\*\!data_skip │ 20 May 19 12:06:24 CEST        │ NEW    │
│ healthcheck_rest/30d17e9e-4f11-4226-97a0-2616ca3c4b94 │                       │ 20 May 19 13:04:46 CEST (+1h)  │ DONE   │
│ repair/166e30ab-ce3a-45fc-a7fa-10c7c9169ad8           │                       │ 21 May 19 00:00:00 CEST (+7d)  │ NEW    │
╰───────────────────────────────────────────────────────┴───────────────────────┴────────────────────────────────┴────────╯

Full reference of the new sctool task list here.

Status

The sctool status command can now be used to check the ScyllaDB REST API status on the ScyllaDB nodes and to help track down nodes that were not set up for SSH connectivity by the SSH setup script. A new REST column has been added. Like the CQL column, it shows the availability and latency of the ScyllaDB REST API. All the nodes should be in the UP state for ScyllaDB Manager to operate with the cluster.

sctool status -c prod-cluster
Datacenter: dc1
╭──────────┬─────┬──────────┬────────────────╮
│ CQL      │ SSL │ REST     │ Host           │
├──────────┼─────┼──────────┼────────────────┤
│ UP (0ms) │ OFF │ UP (3ms) │ 192.168.100.11 │
│ UP (0ms) │ OFF │ UP (0ms) │ 192.168.100.12 │
│ UP (1ms) │ OFF │ UP (0ms) │ 192.168.100.13 │
╰──────────┴─────┴──────────┴────────────────╯
Datacenter: dc2
╭──────────┬─────┬──────────┬────────────────╮
│ CQL      │ SSL │ REST     │ Host           │
├──────────┼─────┼──────────┼────────────────┤
│ UP (2ms) │ OFF │ UP (0ms) │ 192.168.100.21 │
│ UP (3ms) │ OFF │ UP (0ms) │ 192.168.100.22 │
│ UP (8ms) │ OFF │ UP (0ms) │ 192.168.100.23 │
╰──────────┴─────┴──────────┴────────────────╯
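
Because all nodes should be UP, a script may want to flag any node that is not. The sketch below parses the table layout shown above. It is an illustration only: parse_status and nodes_not_up are hypothetical helpers, the column order is taken from this sample, and the DOWN value in the sample input is an assumed example of a non-UP state.

```python
SAMPLE = """\
Datacenter: dc1
╭──────────┬─────┬──────────┬────────────────╮
│ CQL      │ SSL │ REST     │ Host           │
├──────────┼─────┼──────────┼────────────────┤
│ UP (0ms) │ OFF │ UP (3ms) │ 192.168.100.11 │
│ UP (0ms) │ OFF │ DOWN     │ 192.168.100.12 │
╰──────────┴─────┴──────────┴────────────────╯
"""


def parse_status(output: str):
    """Turn `sctool status` text into a list of per-node dicts."""
    nodes = []
    datacenter = None
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Datacenter:"):
            datacenter = line.split(":", 1)[1].strip()
        elif line.startswith("│"):
            cells = [c.strip() for c in line.strip("│").split("│")]
            if cells[0] == "CQL":  # header row
                continue
            cql, ssl, rest, host = cells
            nodes.append(
                {"dc": datacenter, "host": host, "cql": cql, "ssl": ssl, "rest": rest}
            )
    return nodes


def nodes_not_up(nodes):
    """Hosts whose CQL or REST column does not start with UP."""
    return [
        n["host"]
        for n in nodes
        if not (n["cql"].startswith("UP") and n["rest"].startswith("UP"))
    ]


print(nodes_not_up(parse_status(SAMPLE)))
```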

Status metrics

There is a new task, healthcheck_rest, which by default runs every hour to check the REST API connectivity of the cluster nodes. The results are recorded as Prometheus metrics.

Full reference of the new sctool status command here.

Improved error reporting

Errors are now propagated to sctool, so users see the full error in the CLI instead of a “check in the logs” message. In the case of a 404 HTTP status, users get information about the kind of resource that was not found.