May 21, 2019

Scylla Manager 1.4 Release Announcement


Scylla Manager Release Note

The Scylla Enterprise team is pleased to announce the release of Scylla Manager 1.4, a production-ready release of Scylla Manager for Scylla Enterprise customers and Scylla Open Source users.

This release contains enhancements to the repair process and simplifies configuring SSH connectivity with Scylla nodes. It also improves the user experience of working with sctool, the Scylla Manager CLI.

Changed scyllamgr_ssh_setup script

The script that simplifies setting up Scylla Manager connectivity to Scylla nodes has been updated. Discovery mode, which finds the IPs of all the nodes in the cluster from a single listed node, is now the default behavior. A new --single-node flag lets you run the setup script only on the nodes you list.

The --manager-user flag, which specified the user name to be created, has been renamed to --create-user (the default is scylla-manager), and the --manager-identity-file flag has been dropped. All keys generated by Scylla Manager are stored in the ~/.ssh/ directory.
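For example, a hypothetical invocation that skips discovery and configures only the listed node (assuming nodes are passed as positional arguments, as in the discovery example below) might look like:

scyllamgr_ssh_setup -u root -i ~/root.pem --single-node 192.168.100.12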

The scyllamgr_ssh_test script has been integrated into scyllamgr_ssh_setup as a final validation step: it checks that there is SSH connectivity using the key and user that scyllamgr_ssh_setup created.

scyllamgr_ssh_setup -u root -i ~/root.pem 192.168.100.11
> Generating a new SSH key pair ('/home/user/.ssh/scylla-manager.pem' '/home/user/.ssh/scylla-manager.pem.pub')
> Discovering all the cluster nodes
> Creating user 'scylla-manager'
192.168.100.11 OK
192.168.100.12 OK
192.168.100.13 OK
192.168.100.21 OK
192.168.100.22 OK
192.168.100.23 OK
> Testing Scylla API connectivity
192.168.100.11 OK
192.168.100.12 OK
192.168.100.13 OK
192.168.100.21 OK
192.168.100.22 OK
192.168.100.23 OK
> Done!
> To add the cluster to Scylla Manager run: sctool cluster add --host 192.168.100.11 --ssh-user 'scylla-manager' --ssh-identity-file '/home/user/.ssh/scylla-manager.pem'

Full reference of the new scyllamgr_ssh_setup here.

Repair

Dry Run Mode

We have added a dry-run mode to the sctool repair command, which lets you inspect how the keyspace and datacenter patterns are evaluated when executing a repair. It validates and prints repair information without actually scheduling the repair.

sctool repair -c prod-cluster -K '*.data' --dc 'us_*' --dry-run
NOTICE: dry run mode, repair is not scheduled

Data Centers:
  - us_east
  - us_west
Keyspace: banana
  - data
Keyspace: orange
  - data

Replication Strategy Aware

Repair is now replication strategy aware and will automatically skip keyspaces that use the LocalStrategy replication strategy.
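As a rough illustration of the idea (this is not the Manager's actual code), here is a minimal Go sketch using the gocql driver that reads each keyspace's replication class from the schema tables and skips the node-local ones; the contact point address is made up:

package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/gocql/gocql"
)

func main() {
	// Connect to one of the cluster nodes (address is illustrative).
	cluster := gocql.NewCluster("192.168.100.11")
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// Read each keyspace's replication settings from the schema tables.
	iter := session.Query(
		"SELECT keyspace_name, replication FROM system_schema.keyspaces").Iter()

	var name string
	var replication map[string]string
	for iter.Scan(&name, &replication) {
		// Keyspaces using LocalStrategy hold node-local data only,
		// so there is nothing to repair across replicas.
		if strings.HasSuffix(replication["class"], "LocalStrategy") {
			fmt.Println("skipping", name)
			continue
		}
		fmt.Println("repairing", name)
	}
	if err := iter.Close(); err != nil {
		log.Fatal(err)
	}
}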

In addition, we enabled repair of system tables, which were skipped in previous Scylla Manager versions.

Improved support for multi-datacenter cluster repair

We added support for repairing keyspaces that use the SimpleStrategy replication strategy in multi-datacenter clusters. In previous versions of Scylla Manager, repairing such keyspaces would result in a repair failure even if the other keyspaces were correctly set up. In normal circumstances, Scylla Manager selects the closest datacenter that contains all the tokens for a given keyspace and uses nodes in that datacenter as coordinator nodes for repairing all the nodes in the cluster. When a keyspace using SimpleStrategy is detected, however, Scylla Manager invokes primary-range repair on all the nodes in the cluster.
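To make that control flow concrete, here is a small Go sketch; the type and function names are hypothetical and only mirror the behavior described above, not the Manager's internals:

package main

import "fmt"

// Hypothetical types for illustration only.
type Node struct{ Addr, DC string }

type Keyspace struct {
	Name     string
	Strategy string // e.g. "SimpleStrategy" or "NetworkTopologyStrategy"
}

// planCoordinators sketches the decision described above: for
// SimpleStrategy keyspaces every node coordinates its own primary
// ranges; otherwise the "closest" datacenter coordinates the repair.
func planCoordinators(ks Keyspace, nodes []Node, closestDC string) []Node {
	if ks.Strategy == "SimpleStrategy" {
		return nodes // primary-range repair on all nodes
	}
	var coordinators []Node
	for _, n := range nodes {
		if n.DC == closestDC {
			coordinators = append(coordinators, n)
		}
	}
	return coordinators
}

func main() {
	nodes := []Node{{"192.168.100.11", "dc1"}, {"192.168.100.21", "dc2"}}
	fmt.Println(planCoordinators(Keyspace{"data", "SimpleStrategy"}, nodes, "dc1"))
}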

Scylla Manager now uses the Scylla API to terminate streaming of a stopped repair

To avoid overloading the cluster, repair does not start if there is another repair running. It does not matter if the repair was started by nodetool, another Scylla Manager instance, or a different tool.

Users can now limit the number of shards repaired simultaneously on a host to further slow down the repair and reduce the load on the cluster. By default, all shards are repaired in parallel. The setting is global and can be set in the Scylla Manager configuration file.
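For illustration, a minimal configuration sketch; the key name shard_parallel_max is an assumption based on the Scylla Manager 1.x configuration file layout, so check the file shipped with your version:

# /etc/scylla-manager/scylla-manager.yaml (excerpt)
repair:
  # Maximum number of shards repaired in parallel on a host.
  # 0 (the default) repairs all shards in parallel.
  # NOTE: the key name is an assumption; verify against your version.
  shard_parallel_max: 1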

Full reference of sctool repair here.

Repair progress

The sctool task progress command output has been changed to show more relevant information. In addition, the general task information is now shown in a key-value format for better readability. Repair task arguments are now shown at the top. In case of an error, a statement of the error cause is displayed, providing a better understanding of what went wrong.

sctool task progress -c test-cluster repair/953b8f67-ac5e-411d-8964-47488964deaf
Arguments: -K foo,bar,zoo,!zoo.horse
Status: ERROR
Cause: failed to load units: no matching units found for filters, ks=[foo.* bar.* zoo.* !zoo.horse]
Start time: 22 Mar 19 14:31:56 CET
End time: 22 Mar 19 14:31:56 CET
Duration: 0s
Progress: 0%

Users are informed about the failed segments for each individual keyspace, host, and shard. In case of error, the progress information is displayed as success%/failed% instead of only success%.

For example:

sctool task progress -c test-cluster repair/eb1cc148-ba35-47b8-9def-ed0e7d335c24
Arguments:
Status:      RUNNING
Start time:  22 Mar 19 14:40:27 CET
Duration:    10m40s
Progress:    63.04% / 0.02%
Datacenters:
  - dc1
  - dc2
╭───────────────────────┬────────────────╮
│ system_auth           │ 100.00%        │
│ system_distributed    │ 100.00%        │
│ system_traces         │ 100.00%        │
│ test_keyspace_dc1_rf2 │ 100.00%        │
│ test_keyspace_dc1_rf3 │ 100.00%        │
│ test_keyspace_dc2_rf2 │ 57.90% / 0.23% │
│ test_keyspace_dc2_rf3 │ 0.00%          │
│ test_keyspace_rf2     │ 0.00%          │
│ test_keyspace_rf3     │ 0.00%          │
╰───────────────────────┴────────────────╯

--run flag added

It’s now possible to show the progress of a particular historical run with the --run flag.

For example:

sctool task progress -c test-cluster repair/eb1cc148-ba35-47b8-9def-ed0e7d335c24 --run 52f08063-1683-49ac-8afc-ec32e284b5a5

You can display the run IDs by running the sctool task history command.
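For example, reusing the task ID from above:

sctool task history -c test-cluster repair/eb1cc148-ba35-47b8-9def-ed0e7d335c24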

Filtering capabilities added

The -K, --keyspace flag has been added to sctool task progress to show the progress of only the specified keyspaces. It uses the same glob pattern matching engine that is used to schedule repairs. The --host flag can be used to show the progress of only the specified coordinator hosts, as in the example below.
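For example, an illustrative invocation combining both filters (the keyspace pattern and host value are made up for this example):

sctool task progress -c test-cluster repair/eb1cc148-ba35-47b8-9def-ed0e7d335c24 -K 'test_keyspace_dc1_*' --host 192.168.100.11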

Full reference of sctool task progress here.

Task list

The sctool task list command shows the task arguments in a format that allows copying and reusing them to schedule a new task. For tasks with an ERROR status, the number of remaining retries is printed next to the status. The error cause has been removed from the task list and can only be displayed using the sctool task progress command.

sctool task list -c prod-cluster
╭───────────────────────────────────────────────────────┬───────────────────────┬────────────────────────────────┬────────╮
│ task                                                  │ arguments             │ next run                       │ status │
├───────────────────────────────────────────────────────┼───────────────────────┼────────────────────────────────┼────────┤
│ healthcheck/49079cd0-5bea-4224-80be-ad12d7d5f742      │                       │ 20 May 19 12:06:16 CEST (+15s) │ DONE   │
│ repair/cf62041e-ae19-43ec-a114-14c8716edb2c           │ -K data_*,!data_skip  │ 20 May 19 12:06:24 CEST        │ NEW    │
│ healthcheck_rest/30d17e9e-4f11-4226-97a0-2616ca3c4b94 │                       │ 20 May 19 13:04:46 CEST (+1h)  │ DONE   │
│ repair/166e30ab-ce3a-45fc-a7fa-10c7c9169ad8           │                       │ 21 May 19 00:00:00 CEST (+7d)  │ NEW    │
╰───────────────────────────────────────────────────────┴───────────────────────┴────────────────────────────────┴────────╯

Full reference of the new sctool task list here.

Status

The sctool status command can be used to check the Scylla REST API status on the Scylla nodes, and to track down nodes that were not connected over SSH using the SSH setup script. A new REST column has been added; similar to the CQL column, it shows the availability and latency of the Scylla REST API. All the nodes should be in the UP state in order for Scylla Manager to operate with the cluster.

sctool status -c prod-cluster
Datacenter: dc1
╭──────────┬─────┬──────────┬────────────────╮
│ CQL      │ SSL │ REST     │ Host           │
├──────────┼─────┼──────────┼────────────────┤
│ UP (0ms) │ OFF │ UP (3ms) │ 192.168.100.11 │
│ UP (0ms) │ OFF │ UP (0ms) │ 192.168.100.12 │
│ UP (1ms) │ OFF │ UP (0ms) │ 192.168.100.13 │
╰──────────┴─────┴──────────┴────────────────╯
Datacenter: dc2
╭──────────┬─────┬──────────┬────────────────╮
│ CQL      │ SSL │ REST     │ Host           │
├──────────┼─────┼──────────┼────────────────┤
│ UP (2ms) │ OFF │ UP (0ms) │ 192.168.100.21 │
│ UP (3ms) │ OFF │ UP (0ms) │ 192.168.100.22 │
│ UP (8ms) │ OFF │ UP (0ms) │ 192.168.100.23 │
╰──────────┴─────┴──────────┴────────────────╯

Status metrics

There is a new healthcheck_rest task that, by default, runs every hour to check the REST API connectivity of the cluster nodes. The results are recorded as Prometheus metrics.
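For example, assuming a gauge named scylla_manager_healthcheck_rest_status with 1 for reachable and 0 for unreachable (both the metric name and its semantics are assumptions here), a simple Prometheus alert rule could look like:

groups:
  - name: scylla-manager
    rules:
      - alert: ScyllaNodeRESTDown
        # Metric name and value semantics are assumptions; verify them
        # against the metrics exposed by your Scylla Manager version.
        expr: scylla_manager_healthcheck_rest_status == 0
        for: 15m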

Full reference of the new sctool status command here.

Improved error reporting

Errors are now propagated to sctool, so users can see the full error in the interface instead of a "check the logs" message. In the case of a 404 HTTP status, users get information on the kind of resource that was not found.

About Michal Matczuk

Michał is a software engineer working on Scylla management. He's a Go enthusiast and a contributor to many open source projects. He has a background in network programming. Prior to joining ScyllaDB, he worked at StratoScale and NTT.


Tags: Release Note, release notes, Scylla Manager