Scylla Monitoring Stack 3.0

The Scylla team is pleased to announce the release of Scylla Monitoring Stack 3.0.

Scylla Monitoring Stack is an open-source stack for monitoring Scylla Enterprise and Scylla Open Source, based on Prometheus and Grafana. Scylla Monitoring Stack 3.0 supports:

  • Scylla Open Source versions 2.3, 3.0 and 3.1
  • Scylla Enterprise versions 2018.x and 2019.x
  • Scylla Manager 1.4.x

Related Links

Scylla Monitoring 3.0 is not fully backward compatible. Make sure to follow the upgrade guide, which covers the details.

New in Scylla Monitoring Stack 3.0

  • A general reorganization of all dashboards
    After the reorganization the dashboard names are:

    • Overview – Quick overview of a cluster
    • Detailed – In-depth look, focusing on the server level
    • CQL – Covers CQL metrics and points out potential problems related to shard-aware drivers, non-paged queries, etc.
    • OS – OS-related disk and network metrics, as reported by the node_exporter agent
    • IO – Scylla IO metrics, focusing on the IO Queue
    • CPU – CPU-related metrics
    • Errors – A single place for errors generated by Scylla
    • Manager – The Scylla Manager Dashboard
  • Metrics cleanup – The move to 3.0 brings both metric changes and label changes:
    • Scylla monitoring uses node_exporter to export OS-related metrics. When installing Scylla Open Source versions 2.3 and above, the installation package installs node_exporter version 0.17, which is not backward compatible with previous versions of node_exporter.
      Make sure you are using node_exporter version 0.17 as explained in the upgrade guide.
    • Label cleanup: Redundant labels were removed in Prometheus, which saves some memory on the Prometheus side. The change has a visible effect on the metrics during the upgrade: the Grafana graphs change color, but historical data is preserved.
  • Removal of the targets configuration files from the repository: There are two target files Scylla monitoring uses, one for Scylla servers (prometheus/scylla_servers.yml) and one for Scylla Manager (prometheus/scylla_manager_servers.yml).
    As these files are different for each deployment, they were removed from the repository. Make sure to follow the upgrade guide and copy the files from their old locations.
  • Switch from Grafana 5 to Grafana 6 – Grafana 6 includes a facelift and changes to its plugin architecture. Switching to the newer version allows us to use newer Grafana features going forward.
  • Switch to Python 3 – Python 2 is approaching its end of life. Python is only used when modifying dashboards with the make_dashboard.sh script or when using the genconfig.py script to generate the scylla_servers.yml file.
  • Switch from Prometheus 2.7.2 to Prometheus 2.10 – see the Prometheus release notes for details.
  • New alerts – Prometheus Alertmanager allows you to set alerts for your Scylla cluster. Scylla Monitoring Stack comes with a few out-of-the-box alerts. Scylla Monitoring 3.0 adds two new default alerts.
  • The Nodes table in the Overview dashboard now shows the state and Scylla version of each node and provides a quick link to the Node Detailed dashboard.
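
For the node_exporter change noted above, one rough way to tell which naming scheme a node is exporting is to look at the metric names themselves. node_exporter 0.16 renamed many metrics to carry unit suffixes, and 0.17 keeps those names. The sketch below is only an illustration: the RENAMED table is a small hand-picked subset, the helper names are hypothetical, and the sample text is made up rather than captured from a real node.

```python
import re

# A small, illustrative subset of the metrics renamed in node_exporter 0.16.
# Old name -> new name with unit suffix. Not a complete list.
RENAMED = {
    "node_cpu": "node_cpu_seconds_total",
    "node_memory_MemTotal": "node_memory_MemTotal_bytes",
    "node_filesystem_avail": "node_filesystem_avail_bytes",
}


def metric_names(exposition_text):
    """Return the set of metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = re.match(r"[A-Za-z_:][A-Za-z0-9_:]*", line)
        if match:
            names.add(match.group(0))
    return names


def uses_old_naming(exposition_text):
    """Return the old-style names present, a hint that the exporter predates 0.16."""
    names = metric_names(exposition_text)
    return sorted(old for old in RENAMED if old in names)


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    sample_old = 'node_cpu{cpu="0",mode="idle"} 1234.5\nnode_memory_MemTotal 8.0e+09\n'
    sample_new = 'node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5\n'
    print(uses_old_naming(sample_old))
    print(uses_old_naming(sample_new))
```

An empty result only suggests the newer naming. The upgrade guide remains the reference for confirming the installed node_exporter version.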

Figure: The Nodes table in Scylla Monitoring 3.0 shows each node's version in real time
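
Since the target files are no longer shipped in the repository, each deployment has to supply its own. As a minimal sketch, assuming the files follow the standard Prometheus file-based service discovery layout (a list of target groups, each with targets and labels), a file such as prometheus/scylla_servers.yml could be rendered like this. The render_targets helper, the cluster and dc label names, and the optional port are assumptions made for illustration. This is not the genconfig.py script, and the upgrade guide is the authority on the exact format.

```python
import json


def render_targets(nodes, cluster, dc, port=None):
    """Render one file-based service discovery target group as YAML text.

    Values are quoted with json.dumps, since a JSON string is also a valid
    double-quoted YAML scalar.
    """
    if not nodes:
        raise ValueError("at least one node address is required")
    lines = ["- targets:"]
    for node in nodes:
        # The port is optional; whether it is needed is an assumption to verify.
        address = f"{node}:{port}" if port is not None else node
        lines.append(f"    - {json.dumps(address)}")
    lines.append("  labels:")
    lines.append(f"    cluster: {json.dumps(cluster)}")
    lines.append(f"    dc: {json.dumps(dc)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical addresses and label values.
    print(render_targets(["10.0.0.1", "10.0.0.2"], cluster="cluster1", dc="dc1"))
```

The sketch prints the result instead of writing it, so the output can be reviewed before it replaces a file copied from the old location.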

Bug Fixes

  • Multiple notice messages when running a clean clone #676
  • Link from Overview::nodetable to node is broken #674
  • Prometheus/Grafana creates non-existing directories with root permissions #669
  • Scylla Manager Metrics dashboard: image broken in air gapped environments #653

27 Aug 2019