Scylla Monitoring Stack Release 2.2

The Scylla team is pleased to announce the release of Scylla Monitoring Stack 2.2.

Scylla Monitoring Stack is an open source stack for monitoring Scylla Enterprise and Scylla Open Source, based on Prometheus and Grafana. Scylla Monitoring Stack 2.2 supports:

  • Scylla Open Source versions 2.3 and 3.0
  • Scylla Enterprise versions 2017.x, 2018.x and 2019.x
  • Scylla Manager 1.3.x

New in Scylla Monitoring Stack 2.2

  • CQL optimization dashboard (#471)
    The CQL optimization dashboard helps identify issues when developing an application with Scylla, such as non-prepared statements, queries that are not token aware, non-paged queries, and requests from a remote DC. Before using the new dashboard, make sure you have correctly defined the DC names (see Align Data Center Names below). More on the new Optimization Dashboard. A blog post on the optimization dashboard and how to use it will be published soon.

Scylla Monitor 2.2 Dashboard

  • Unified target files for Scylla and node_exporter (#378)
    To simplify the Prometheus configuration of Scylla nodes and the node_exporter targets, you only need to configure Scylla targets. Prometheus assumes that there is a node_exporter running on each of the Scylla servers and will use the same IPs as those set in the targets. It is still possible to configure a specific node_exporter target file.
  • Per-machine (node_exporter related) dashboard added to Enterprise (#495)
    The per-machine dashboard shows information about the host disk and network. It is now available for Enterprise.
  • Prometheus container uses the current user ID and group (#487)
    There is an ongoing issue with the volume the Prometheus container uses to store its data. From Scylla Monitoring Stack version 2.2, the container will run with the current user ID and group ID. This means that the data directory should have the current user's permissions. While this does not require any changes, it is recommended to check your Docker installation and make sure you are not running Docker as root.
  • kill-all.sh kills Prometheus instances gracefully (#438)
    The kill-all command will now attempt to shut down Prometheus gracefully. A graceful shutdown lets Prometheus start quickly the next time it is launched, but it also means that the shutdown itself can take longer than anticipated. The kill-all command will wait for up to two minutes for Prometheus to shut down. Once that time has elapsed, the command will forcefully kill the container.
  • start-all.sh now supports --version flag (#374)
    To verify your Monitoring stack version, you can now run ./start-all.sh --version
  • Remove the version from the dashboard names (#486)
    Following the move to Grafana 5 and the use of the dashboard folders, the version was removed from the dashboard names.
  • Dashboards loaded from the API should have the overwrite property set to true (#474)
    For users who upload dashboards with the API, the dashboards now have the overwrite flag set to true, so you can upload the same dashboard twice.
  • Update Alertmanager to 0.16 (#478)
    See the Alertmanager changelog for details of the changes.
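
The graceful-then-forceful shutdown described for kill-all.sh above can be illustrated with a minimal Python sketch. This is not the kill-all.sh implementation, which operates on Docker containers; the function name stop_gracefully and the use of a plain child process are assumptions made for illustration. The sketch asks a process to exit, waits up to a timeout (two minutes by default, matching the wait described above), and only then kills it forcefully.

```python
import subprocess
import sys


def stop_gracefully(proc, timeout_s=120.0):
    """Ask a process to exit; force-kill it if the timeout elapses.

    Hypothetical helper for illustration only. Returns "graceful" if the
    process exited within the timeout, "forced" otherwise.
    """
    proc.terminate()  # SIGTERM on POSIX: request a clean shutdown
    try:
        proc.wait(timeout=timeout_s)
        return "graceful"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: the process cannot ignore this
        proc.wait()  # reap the process so it does not linger as a zombie
        return "forced"


if __name__ == "__main__":
    # A stand-in for a long-running service.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    print(stop_gracefully(child, timeout_s=5.0))
```

The trade-off is the same one noted above: waiting gives the service time to flush its state so that the next start is fast, at the cost of a slower shutdown.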

Align Data Center Names

The new Optimization Dashboard (above) relies on the data center names assigned to nodes in the Monitoring Stack matching the data center names used in the Scylla cluster. For example:

nodetool status
Datacenter: DC1
|/ State=Normal/Leaving/Joining/Moving
-- Address     Load       Tokens   Owns Host ID                              Rack
UN  108.83 KB  256      ?    fae7039a-21ad-4e94-9474-430abcf48158 Rack1
UN  108.86 KB  256      ?    fe2986de-9c8a-44bb-8b3b-923519095a23 Rack1
UN  108.84 KB  256      ?    2a10a36f-365f-455a-85d4-18cd40b6b765 Rack1
Datacenter: DC2
|/ State=Normal/Leaving/Joining/Moving
-- Address     Load       Tokens   Owns Host ID                              Rack
UN  108.8 KB   256      ?    edbc46cb-d948-4745-a90b-28d3bc90c034 Rack1
UN  108.76 KB  256      ?    58a5a43c-8ec1-4369-91d9-6bd79d1d706a Rack1
UN  108.27 KB  256      ?    c5886895-be75-4e18-8fa5-3633a10f9ee8 Rack1

The data center names reported by nodetool status, in this case DC1 and DC2, should match the dc values found in scylla-grafana-monitoring/prometheus/scylla_servers.yml

Such as:

- targets:
  labels:
    cluster: my-cluster
    dc: DC1

- targets:
  labels:
    cluster: my-cluster
    dc: DC2
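
One way to check the alignment described above is to compare the names on both sides programmatically. The following is a minimal sketch, not part of the Monitoring Stack: the function names are hypothetical, and it assumes the nodetool status output and the contents of scylla_servers.yml have already been read into strings. It uses regular expressions rather than a YAML parser to stay dependency-free, so it only recognizes simple "dc: NAME" lines.

```python
import re


def dcs_from_nodetool(status_output):
    """Collect names from 'Datacenter: <name>' lines of nodetool status."""
    return set(
        re.findall(r"^Datacenter:\s*(\S+)", status_output, flags=re.MULTILINE)
    )


def dcs_from_targets(targets_text):
    """Collect values of 'dc: <name>' lines from the targets file text."""
    return set(
        re.findall(
            r"^\s*dc:\s*['\"]?([^'\"\s#]+)", targets_text, flags=re.MULTILINE
        )
    )


def compare_dc_names(status_output, targets_text):
    """Report data center names that appear on one side only.

    The comparison is case sensitive, so 'DC1' and 'dc1' count as different.
    """
    cluster = dcs_from_nodetool(status_output)
    monitoring = dcs_from_targets(targets_text)
    return {
        "missing_in_monitoring": sorted(cluster - monitoring),
        "unknown_to_cluster": sorted(monitoring - cluster),
    }


if __name__ == "__main__":
    status = "Datacenter: DC1\nDatacenter: DC2\n"
    targets = "- targets:\n  labels:\n    cluster: my-cluster\n    dc: DC1\n"
    print(compare_dc_names(status, targets))
```

Two empty lists mean the names are aligned. Any name listed under missing_in_monitoring is a data center that the cluster reports but the targets file does not label.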

Bug Fixes

  • Moved the node_exporter relabeling to metric_relabeling (#497)
  • Fixed units in foreground write (#463)
  • Manager dashboard was missing UUID (#505)