Scylla Monitoring Stack is a bundle of four components based on de facto industry-standard open-source tools (a Prometheus metric collector, an alert manager, Grafana 6 dashboards, and the Grafana Loki log aggregation system) that can be deployed as containers or directly onto a host. It collects aggregated NoSQL performance metrics, logs, and events through Scylla Manager. Scylla Monitoring Stack is available for Scylla Open Source, Scylla Enterprise, and Scylla Cloud customers.
The Scylla Monitoring Stack empowers DevOps, infrastructure operations teams, and database administrators to quickly find and fix issues impacting the performance of their Scylla clusters. Teams can drill down from high-level to detailed NoSQL dashboards.
Scylla Monitoring Stack includes a set of pre-built dashboards to monitor your Scylla cluster in real-time. Hundreds of different NoSQL metrics populate dashboard components for your operations team to review historical trends and identify anomalous behavior in your cluster.
The Scylla CQL dashboard helps teams identify query issues, poor data models, and unexpected driver behavior. Teams can quickly see, for example, whether their cluster is being hit by many heavy queries that perform full table scans with “ALLOW FILTERING” enabled.
Quickly identify nodes in your cluster and drill down to detailed OS-level metrics such as CPU utilization, I/O, and errors. Teams can then decide whether nodes need to be rebooted or whether a rolling upgrade is needed on nodes running old versions.
Set conditional alerts for your Scylla cluster within the alert manager so your team knows when incidents arise. Out-of-the-box alert triggers are included for conditions such as:
Database administrators are able to annotate heavy tasks such as backup or repair start and finish times. This helps cross-functional teams visually understand why there may be additional latency or reduced throughput at particular times.
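As a rough illustration of what such an annotation might carry, the sketch below builds a time-range annotation payload using the field names of Grafana's annotations HTTP API (`time`, `timeEnd`, `tags`, `text`). The helper name `build_annotation`, the tag, and the timestamps are hypothetical, and the sketch only constructs the payload; how Scylla Monitoring Stack itself records annotations is not described here.

```python
import json
from datetime import datetime, timezone


def build_annotation(text, start, end, tags):
    """Build a region annotation body (hypothetical helper).

    Grafana's annotations API expects epoch timestamps in milliseconds.
    """
    def to_ms(dt):
        return int(dt.timestamp() * 1000)

    return {
        "time": to_ms(start),      # when the heavy task started
        "timeEnd": to_ms(end),     # when the heavy task finished
        "tags": list(tags),        # e.g. used to filter annotations on a dashboard
        "text": text,              # label shown on the graph
    }


# Hypothetical repair window, used only for illustration.
start = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
end = datetime(2024, 1, 1, 3, 30, tzinfo=timezone.utc)
payload = build_annotation("nightly repair", start, end, ["repair"])
print(json.dumps(payload, indent=2))
```

A payload like this would typically be sent with an authenticated POST to the Grafana server's `/api/annotations` endpoint, after which the marked interval appears as a shaded region on time-series panels.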
The Advisor is a new concept in Scylla Monitoring. It identifies potential problems and notifies you of them. The Advisor section in the Overview dashboard has two parts. One is for various issues detected, like unprepared statements. The second is an indication of how balanced the system is. When the cluster works properly, all nodes and shards should act the same. An outlier shard could be a result of a problem. For example, if the number of CQL connections per shard varies between shards, it indicates a driver configuration issue.
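The balance check can be pictured with a small sketch. This is not the Advisor's actual logic, which is not described here; it is an assumed approach that compares each shard's CQL connection count against the median and flags shards that deviate by more than a chosen tolerance. The function name, the tolerance value, and the sample numbers are all hypothetical.

```python
from statistics import median


def find_outlier_shards(connections_per_shard, tolerance=0.5):
    """Return shard ids whose connection count deviates from the median
    by more than `tolerance` times the median (assumed heuristic)."""
    if not connections_per_shard:
        return []
    mid = median(connections_per_shard.values())
    # Use at least 1 as the base so an all-zero median does not flag everything.
    allowed = tolerance * max(mid, 1)
    return sorted(
        shard
        for shard, count in connections_per_shard.items()
        if abs(count - mid) > allowed
    )


# Hypothetical per-shard CQL connection counts for one node.
sample = {0: 12, 1: 11, 2: 12, 3: 40, 4: 12}
print(find_outlier_shards(sample))
```

With the sample above, shard 3 holds far more connections than its peers and is the only one flagged, which is the kind of imbalance that would point to a driver configuration issue.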