ScyllaDB Monitoring Stack 3.0

The ScyllaDB team is pleased to announce the release of ScyllaDB Monitoring Stack 3.0.

ScyllaDB Monitoring Stack is an open-source stack for monitoring ScyllaDB Enterprise and ScyllaDB Open Source, based on Prometheus and Grafana. ScyllaDB Monitoring Stack 3.0 supports:

ScyllaDB Open Source versions 2.3, 3.0 and 3.1
ScyllaDB Enterprise versions 2018.x and 2019.x
ScyllaDB Manager 1.4.x

New in ScyllaDB Monitoring Stack 3.0

A general reorganization of all dashboards
After the reorganization, the dashboards are:
- Overview – Quick overview of a cluster
- Detailed – An in-depth look, focusing on the server level
- CQL – Covers CQL metrics and points out potential problems related to shard-aware drivers, non-paged queries, etc.
- OS – OS-related disk and network metrics, as reported by the node_exporter agent
- IO – ScyllaDB IO metrics, focusing on the IO Queue
- CPU – CPU-related metrics
- Errors – A single place for errors generated by ScyllaDB
- Manager – The ScyllaDB Manager Dashboard
Metrics cleanup – Moving to 3.0 brings both metric changes and label changes:
- ScyllaDB monitoring uses node_exporter to export OS-related metrics. When installing ScyllaDB Open Source versions 2.3 and above, the installation package installs node_exporter version 0.17, which is not backward compatible with previous versions of node_exporter.
  Make sure you are using node_exporter version 0.17 as explained in the upgrade guide.
- Label cleanup: Redundant labels were removed, which saves some memory on the Prometheus side. The change is visible during the upgrade: the Grafana graphs change color at the point of the upgrade, but historical data is preserved.
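As an illustration of the node_exporter incompatibility described above, here is a minimal Python sketch that scans a node_exporter metrics dump for old-style metric names. It assumes the incompatibility comes from the unit-suffix renames that node_exporter introduced in its 0.16 release (for example, node_cpu became node_cpu_seconds_total). The RENAMED table is a small illustrative sample, not a complete list, and the helper names are hypothetical; they are not part of the ScyllaDB Monitoring Stack.

```python
import re

# Illustrative sample of node_exporter renames (old name -> new name).
# Assumption: this is a partial list, chosen only for demonstration.
RENAMED = {
    "node_cpu": "node_cpu_seconds_total",
    "node_filesystem_avail": "node_filesystem_avail_bytes",
    "node_filesystem_size": "node_filesystem_size_bytes",
    "node_memory_MemTotal": "node_memory_MemTotal_bytes",
}

# A metric name in the Prometheus text format starts the sample line.
_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def metric_names(exposition_text):
    """Collect the metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _NAME.match(line)
        if match:
            names.add(match.group(0))
    return names


def legacy_names_found(exposition_text):
    """Return the old-style names (from RENAMED) present in the output."""
    names = metric_names(exposition_text)
    return sorted(old for old in RENAMED if old in names)
```

Feeding this function the text served by a node's node_exporter metrics endpoint gives a quick hint: a non-empty result suggests that the node still runs an older node_exporter and should be upgraded as described in the upgrade guide.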
Removal of the targets configuration files from the repository: ScyllaDB Monitoring uses two target files, one for ScyllaDB servers (prometheus/scylla_servers.yml) and one for ScyllaDB Manager (prometheus/scylla_manager_servers.yml).
As these files are different for each deployment, they were removed from the repository. Make sure to follow the upgrade guide and copy the files from their old locations.
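For deployments that do not have old target files to copy, the following Python sketch shows one way to produce such a file. It assumes the file follows the Prometheus file-based service discovery layout (a list of groups, each with targets and labels). The render_targets helper, the cluster and dc label names, and the addresses are illustrative assumptions; check the upgrade guide for the exact format expected by your version, including whether targets need a port.

```python
import json


def render_targets(groups):
    """Render target groups as YAML text.

    groups is a list of (targets, labels) pairs, where targets is a list
    of address strings and labels is a dict of label name -> value.
    Values are written with json.dumps, because a JSON string is also a
    valid double-quoted YAML scalar.
    """
    lines = []
    for targets, labels in groups:
        lines.append("- targets:")
        for target in targets:
            lines.append(f"    - {json.dumps(target)}")
        lines.append("  labels:")
        for name, value in sorted(labels.items()):
            lines.append(f"    {name}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder addresses and label values, for demonstration only.
    text = render_targets(
        [(["192.0.2.10", "192.0.2.11"], {"cluster": "demo", "dc": "dc1"})]
    )
    print(text, end="")
```

The genconfig.py script shipped with the stack serves the same purpose and is the preferred route; the sketch only makes the shape of the file explicit.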
Switch from Grafana 5 to Grafana 6 – Grafana 6 includes a facelift and changes to its plugin architecture. Switching to the newer version allows us to use newer Grafana features going forward.
Switch to Python 3 – Python 2 is approaching its end of life. Python is only used when modifying dashboards with the make_dashboard.sh script or when using the genconfig.py script to generate the scylla_servers.yml file.
Switch from Prometheus 2.7.2 to Prometheus 2.10 – see the Prometheus release notes for details.
New alerts – Prometheus Alertmanager allows you to set alerts for your ScyllaDB cluster. ScyllaDB Monitoring Stack comes with a few out-of-the-box alerts. ScyllaDB Monitoring 3.0 adds two new default alerts:
- Alert on low free disk space on the root partition
- Alert when a node changes its status to leaving
The Nodes table in the Overview dashboard now shows the state and ScyllaDB version of each node, and provides a quick link to the Node Detailed dashboard.

The Nodes table in ScyllaDB Monitoring 3.0 shows each node's version in real time

Bug Fixes

Multiple notice messages when running a clean clone #676
Link from Overview::nodetable to node is broken #674
Prometheus/Grafana creates non-existing directories with root permissions #669
ScyllaDB Manager Metrics dashboard: image broken in air gapped environments #653

27 Aug 2019