Oct 2

Monitoring Scylla with Datadog: A Tale about Datadog – Prometheus integration


As an open-source project, our Scylla NoSQL database has an open-source monitoring stack, Scylla Monitoring Stack. It uses Prometheus as the reporting and collecting layer and Grafana for dashboard creation. Prometheus pulls metrics directly from each Scylla Server.

The Scylla Monitoring Stack

A question we keep getting from customers is: “How can I monitor Scylla with Datadog?” And more generally “How do you integrate Datadog and Prometheus?”

I went to find out and returned with two answers. You can work out the first from the Datadog documentation, but there is an alternative, more surprising answer; read to the end to see it. If you want to use this solution for your other applications, you can replace “Scylla” below with your application.

Before we go any further, a few words about Prometheus.

Prometheus essentials

Prometheus is an open-source monitoring solution. It uses a pull-based approach: the Prometheus server “scrapes” its targets periodically. Prometheus reads metrics directly from applications that expose a Prometheus endpoint (as Scylla does), or uses exporters for applications that do not.

The Prometheus protocol is text-based and human-readable, so if at any point you have an issue with the data, you can point your browser at the endpoint or use curl to fetch the metrics directly.
For example, you can look at Scylla’s metrics by calling:

$ curl http://localhost:9180/metrics
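Because the format is plain text, the output of that curl call is also easy to inspect programmatically. The following is a minimal sketch, not a full parser: it assumes simple sample lines of the form `name{labels} value`, skips `#` comment lines, and uses a made-up payload (the `scylla_example_*` and `node_example_gauge` names are hypothetical, not real Scylla metrics).

```python
import re

# Matches: metric_name{label="value",...} value [timestamp]
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?\s*$'
)
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text exposition into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# HELP" / "# TYPE" comments
        m = _SAMPLE_RE.match(line)
        if m is None:
            continue  # ignore anything this simple sketch cannot parse
        labels = dict(_LABEL_RE.findall(m.group("labels") or ""))
        samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples


# Hypothetical payload standing in for the response of the curl call above.
SAMPLE = """\
# HELP scylla_example_requests_total Illustrative counter
# TYPE scylla_example_requests_total counter
scylla_example_requests_total{shard="0"} 42
scylla_example_requests_total{shard="1"} 17
node_example_gauge 1.5
"""

# Keep only the samples whose name starts with "scylla_".
scylla_only = [s for s in parse_metrics(SAMPLE) if s[0].startswith("scylla_")]
```

In practice you would pass the text fetched from the endpoint to `parse_metrics` instead of the sample string.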

Prometheus Datadog Integration – Local Agent

The first approach, which you will find if you check the Datadog documentation, is to use a local agent (installed on the same node as the Scylla database server).

Datadog agent version 6.1.0 and onwards can read the Prometheus protocol. On each host that runs an application you want to monitor (Scylla in our case), run the Datadog agent and let the agent read from the application.

For this blog post, we assume that you already have a Datadog account. If you don’t, you can register for a free 14-day trial.

Now follow these steps:

  1. Install the Datadog agent on each Scylla server as described here.
  2. Create a Prometheus configuration file for the agent.
    sudo cp \
      /etc/datadog-agent/conf.d/prometheus.d/conf.yaml.example \
      /etc/datadog-agent/conf.d/prometheus.d/conf.yaml
  3. Edit the configuration file:
    Point the prometheus_url to Scylla’s Prometheus endpoint: http://localhost:9180/metrics
    Set the metrics to scylla_* and set the namespace to scylla.

    An example of a minimal config file:

    init_config:

    instances:
      - prometheus_url: http://localhost:9180/metrics
        namespace: scylla
        metrics:
          - scylla_*
  4. Restart the agent as described here.
  5. You should now be able to use the Scylla metrics in your dashboard.
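To get a feel for what the scylla_* pattern and the scylla namespace in the config do, here is a small sketch using Python's shell-style wildcard matching. It only illustrates the glob semantics. It is an assumption that the agent's own matching behaves the same way and that the namespace is prepended with a dot. The metric names are made up for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical metric names, chosen only to illustrate the pattern.
names = [
    "scylla_example_reads_total",
    "scylla_example_writes_total",
    "go_goroutines",
    "process_cpu_seconds_total",
]

pattern = "scylla_*"   # same wildcard as in the config above
namespace = "scylla"   # same namespace as in the config above

# Keep only the names that match the wildcard.
selected = [n for n in names if fnmatchcase(n, pattern)]

# Assumption: the namespace is prepended with a dot when a metric is reported.
reported = [f"{namespace}.{n}" for n in selected]
```

Under these assumptions, only the two scylla_example_* names are selected, and each would be looked up in Datadog under a name beginning with "scylla.".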

Please note that an agent installed on each server can compete with Scylla for resources. If you do choose this method, we recommend limiting the Datadog agent's resources by using systemd slices.

Reading directly from a Prometheus Server

If you have Datadog agents running on each of your nodes, the previous section could be good enough. But if you are already running a Prometheus server, it is sometimes more useful to read the metrics from the Prometheus server rather than from the endpoints.

Happily enough, there is such an option using Prometheus federation. Federation allows one Prometheus server to read metrics that are stored in another server.

To use Prometheus federation with Datadog, you need the Datadog agent only on the machine that runs the Prometheus server. Install and configure the Datadog agent as in the previous section, but this time set the prometheus_url to the federate endpoint.

An example of such a configuration file:

init_config:

instances:
  - prometheus_url: http://localhost:9090/federate?match[]={job=~"scylla"}
    namespace: scylla
    metrics:
      - scylla_*

In the example, we match series whose job label matches scylla (=~ is a regular-expression match), but the same approach works for any other monitored application with a different selector.

Note that you must supply at least one match[] parameter for the federate endpoint to work.
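If you want to test the federate endpoint by hand or from a script, the selector contains characters such as braces and quotes that are usually percent-encoded in a URL. The following is a minimal sketch using Python's standard library. The `federate_url` helper is hypothetical and written for this example. Whether the agent config needs the encoded or the raw form is not something this sketch establishes.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def federate_url(base, selectors):
    """Build a /federate URL with one match[] parameter per selector."""
    if not selectors:
        # The federate endpoint needs at least one match[] parameter.
        raise ValueError("at least one match[] selector is required")
    query = urlencode([("match[]", s) for s in selectors])
    return f"{base.rstrip('/')}/federate?{query}"


# Same server address and selector as in the config example above.
url = federate_url("http://localhost:9090", ['{job=~"scylla"}'])

# Round-trip the query string to confirm the selector survives encoding.
decoded = parse_qs(urlsplit(url).query)["match[]"]
```

Passing several selectors produces several match[] parameters in the same URL.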

A Scylla dashboard example in Datadog

To create a Datadog dashboard, follow the instructions here.

When you add a graph, Datadog will show the available metrics. Scylla’s metrics start with ‘scylla’.

As a reference and suggestion, you can look at the Scylla Monitoring Dashboards for the metrics we think are most useful.

Conclusion (and which option to choose)

There are two alternatives for reporting Scylla metrics to Datadog:

  1. Using a local agent on each Scylla server
  2. Using one agent on Scylla Monitoring Stack, reading from Prometheus

Although both methods work, we recommend the second one for the following reasons:

  • Simpler to deploy (only one agent)
  • Does not put Scylla at risk by running an additional process on each DB server. Such an agent may compete with Scylla for server resources, bandwidth, etc.

Please note that both methods can work alongside Scylla Monitoring Stack.

We recommend having Scylla Monitoring Stack in place, even if you are using Datadog. Scylla Monitoring dashboards are very rich and useful and will help our support team to debug any issue you might have.

About Amnon Heiman

Amnon has 15 years of experience in software development of large scale systems. Previously he worked at Convergin, which was acquired by Oracle. Amnon holds a BA and MSc in Computer Science from the Technion-Machon Technologi Le' Israel and an MBA from Tel Aviv University.


Tags: Datadog, grafana, Prometheus, scylla, Scylla Monitoring Stack, Scylla Open Source