Scylla Monitoring Stack Release 3.0

Scylla Monitoring Stack Release Notes

The Scylla team is pleased to announce the release of Scylla Monitoring Stack 3.0.

Scylla Monitoring Stack is an open-source stack for monitoring Scylla Enterprise and Scylla Open Source, based on Prometheus and Grafana. Scylla Monitoring Stack 3.0 supports:

  • Scylla Open Source versions 2.3, 3.0 and 3.1
  • Scylla Enterprise versions 2018.x and 2019.x
  • Scylla Manager 1.4.x

Scylla Monitoring 3.0 is not fully backward compatible. Make sure to follow the upgrade guide when upgrading.

New in Scylla Monitoring Stack 3.0

  • A general reorganization of all dashboards
    After the reorganization the dashboard names are:

    • Overview – Quick overview of a cluster
    • Detailed – In-depth detailed look, focusing on the server level
    • CQL – Covers CQL metrics and points out potential problems related to shard-aware drivers, non-paged queries, etc.
    • OS – OS-related metrics, about disk and network as reported by the node_exporter agent
    • IO – Scylla IO metrics, focusing on the IO Queue
    • CPU – CPU related metrics
    • Errors – A single place for errors generated by Scylla
    • Manager – The Scylla Manager Dashboard
  • Metrics cleanup – The move to 3.0 brings both metric changes and label changes:
    • Scylla monitoring uses node_exporter to export OS-related metrics. When installing Scylla Open Source versions 2.3 and above, the installation package installs node_exporter version 0.17, which is not backward compatible with previous versions of node_exporter.
      Make sure you are using node_exporter version 0.17 as explained in the upgrade guide.
    • Label cleanup: Redundant labels were removed from Prometheus, which saves some memory on the Prometheus side. The change has a visible effect during the upgrade: the Grafana graphs change color. Historical data is preserved.
  • Removal of the targets configuration files from the repository: There are two target files Scylla monitoring uses, one for Scylla servers (prometheus/scylla_servers.yml) and one for Scylla Manager (prometheus/scylla_manager_servers.yml).
    As these files are different for each deployment, they were removed from the repository. Make sure to follow the upgrade guide and copy the files from their old locations.
  • Switch from Grafana 5 to Grafana 6 – Grafana 6 includes a facelift and changes to its plugin architecture. Switching to the newer version lets us use newer Grafana features going forward.
  • Switch to Python 3 – Python 2 is approaching its end of life. Python is only used when modifying dashboards with the make_dashboard.sh script or when using the genconfig.py script to generate the scylla_servers.yml file.
  • Switch from Prometheus 2.7.2 to Prometheus 2.10 – see the Prometheus release notes for details on what changed between these versions
  • New alerts – Prometheus Alertmanager allows you to set alerts for your Scylla cluster. Scylla Monitoring Stack comes with a few out-of-the-box alerts. Scylla Monitoring 3.0 adds two new default alerts.
  • The Nodes table in the Overview dashboard now shows the state and Scylla version of each node and provides a quick link to the Node Detailed dashboard.

The Nodes table in Scylla Monitoring 3.0 shows each node's version in real time
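
Since the targets files are no longer shipped with the repository, each deployment has to provide its own. The sketch below shows one way to produce a file in the generic Prometheus file-based service discovery layout (a list of groups, each with `targets` and `labels`). It is not the project's genconfig.py script, and several details are assumptions made for illustration: the `cluster` and `dc` label names, the 9180 port, the documentation-range IP addresses, and the helper names `build_targets` and `render_yaml`. Check the upgrade guide for the exact format your version expects.

```python
def build_targets(nodes_by_dc, cluster):
    """Return Prometheus file_sd target groups, one group per datacenter.

    nodes_by_dc maps a datacenter name to a list of "host:port" strings.
    The "cluster" and "dc" label names are assumptions for this sketch.
    """
    groups = []
    for dc, nodes in sorted(nodes_by_dc.items()):
        groups.append({
            "targets": list(nodes),
            "labels": {"cluster": cluster, "dc": dc},
        })
    return groups


def render_yaml(groups):
    """Render the target groups as YAML text using only the standard library."""
    lines = []
    for group in groups:
        lines.append("- targets:")
        for target in group["targets"]:
            # Quote targets so "host:port" is always read as a string.
            lines.append(f'    - "{target}"')
        lines.append("  labels:")
        for key, value in group["labels"].items():
            lines.append(f"    {key}: {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical two-node cluster; addresses and port are placeholders.
    example = build_targets(
        {"dc1": ["192.0.2.10:9180", "192.0.2.11:9180"]},
        cluster="cluster1",
    )
    print(render_yaml(example), end="")
```

Redirecting the printed output to prometheus/scylla_servers.yml would give a starting point that can then be edited by hand. Grouping by datacenter keeps the labels in one place per group rather than repeating them for every node.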

Bug Fixes

  • Multiple notice messages when running a clean clone #676
  • Link from Overview::nodetable to node is broken #674
  • Prometheus/Grafana creates non-existing directories with root permissions #669
  • Scylla Manager Metrics dashboard: image broken in air gapped environments #653

Amnon Heiman

About Amnon Heiman

Amnon has 15 years of experience in software development of large scale systems. Previously he worked at Convergin, which was acquired by Oracle. Amnon holds a BA and MSc in Computer Science from the Technion-Machon Technologi Le' Israel and an MBA from Tel Aviv University.