Scylla Manager 2.2

The Scylla Manager team is pleased to announce the release of Scylla Manager 2.2, a production-ready version of Scylla Manager for Scylla Enterprise and Scylla Open-Source customers. Scylla Manager is a centralized cluster administration and recurrent tasks automation tool.

Scylla Manager 2.2 brings improvements to Scylla Repair performance and adds backup to Google Cloud Storage (GCP), as well as other improvements and bug fixes.

Scylla Enterprise customers are encouraged to upgrade to Scylla Manager 2.2 in coordination with the Scylla support team.

The new release includes upgrades of both Scylla Manager Server, Manager CLI tool (sctool) and Scylla Manager Agent.

Useful Links:

Repair Improvements

Small table optimization

Scylla Manager breaks each table repair task to multiple small chunks. While this is useful for large tables, it proves to add redundant time when repairing small tables, for example system tables, resulting in a long repair time even for clusters with very little data.

In Manager 2.2, tables below configurable threshold (1GB by default) will be treated as small and repaired in a single repair command; this may affect production performance depending on the intensity of the operations.

Parallel repair

Manager 2.2 automatically breaks repair tasks to parallel tasks, running independently on distinct ranges of the token ring. Each node can take part in at most one repair at any given moment. Parallel repair can reduce the repair time in a factor of the number of nodes divided by the replication factor (RF). For example, a repair on a cluster of 12 nodes, Keyspace with RF=3 and default parallelism, can run up to x4 times faster, dependent on the online requests load.

User can control the level of parallelism using the new --parallel parameter.

  • You can disable the parallel repair feature by setting the value to 1.
  • Default parallelism is zero; max parallelism value in the cluster based on the replication factor.

The following represents the saving in repair time in a 9 nodes cluster, all in the same data center, while serving requests in real time with 50% load.

Refer to the Scylla Manager documentation.

Control over parallelism and intensity during repair

A new command sctool repair control is added. It lets users change parallelism and intensity (added in 2.1) of an ongoing repair, without ever restarting the task. In addition to that commands sctool repair update and sctool backup update let you update task specification of existing tasks for future runs.

Note: starting with Scylla Enterprise 2020.1 and Scylla Open Source 3.1, Scylla uses row level repair. This greatly improves repair performance, including Scylla Manager users. As a side effect, the intensity flag while still supported, has a smaller effect on the repair time (see more in the documentation).

Backup to Google Cloud Storage (GCP)

Manager 2.2 now supports backup to GCP in a similar way to the existing AWS s3 support.
To enable, use the sctool backup command, for example

$ sctool backup --cluster prod-cluster --location gcs:manager-prod-backup --retention 7 --interval 1d

Where manager-prod-backup is the target bucket.

GCP back up default rate was tested to provide fast upload rate, with minimal impact on line requests.

Read the documentation here.

Healthcheck

Scylla Manager now supports health check reports for Scylla AWS DynamoDB compatible API (Alternator) .

A new (enabled by default) dynamic timeout parameter for each DC is based on measurements on RTT of recent probes. This fixes the issue of reporting remote DC nodes as not responsive due to longer latency values.

sctool updates

  • Status
  • Repair parameters token-ranges and with-host are removed.
  • Backup parameter force is removed.

Port Updates

Monitoring ports used by Scylla Manager and Manager Agent was overlapping with users’ existing applications. To fix this, we changed the default ports as follow:

Scylla Manager Server:

  • Default Prometheus port changes from 56090 → 5090
  • Default HTTP API port changes from 56080 → 5080
  • Default HTTPS API port changes from 56443 → 5443
  • Default pprof debug port changes from 56112 → 5112

Scylla Manager Agent:

  • Default Prometheus port changes from 56090 → 5090
  • Default pprof debug port changes from 56112 → 5112

For most users, the only ports used are the ones used by Prometheus (highlighted above). Please make sure to either:

  • Update your Prometheus configuration to listen to the new port (already done in Monitoring 3.5)
  • Update Manager and Manager configuration to keep using the old ports

For the first, make sure your firewall / security groups open the new monitoring port, and close the old one.

New OS support

  • Added CentOS 8 support

Configuration

The following parameters has been removed from Manager configuration

  • segments_per_repair
  • shard_parallel_max
  • shard_failed_segments_max
  • error_backoff

Monitoring

You can use Scylla Monitoring Stack 3.5 with -M 2.2 option to get Manager 2.2 dashboards.

02 Nov 2020