Monitoring

This document describes how to monitor the state and performance of Vespa via an external metrics system. Vespa provides custom integrations with CloudWatch, Datadog and Prometheus, as well as a generic HTTP API to retrieve metrics in JSON format.

There are two main approaches to transfer metrics to an external system:

  • Have the external system pull metrics from Vespa
  • Make Vespa push metrics to the external system

The sections below describe each approach.

Pulling metrics from Vespa

All pull-based solutions use Vespa's metrics API, which provides metrics in JSON format, either for the full system or for a single node. The Prometheus endpoint described below uses a text-based format instead.
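The following minimal Python sketch (standard library only) fetches node-level metrics and flattens the JSON into rows. It makes two assumptions based on typical output, so verify both against your Vespa version: the URL `http://localhost:19092/metrics/v1/values`, and the payload shape `{"services": [{"name": ..., "metrics": [{"values": ..., "dimensions": ...}]}]}`. The names `fetch_metrics`, `iter_metric_values` and `Row` are hypothetical.

```python
import json
import urllib.request
from typing import Iterator

# Assumption: node-level metrics API served on port 19092.
# Replace localhost with the Vespa node to query.
NODE_METRICS_URL = "http://localhost:19092/metrics/v1/values"

# Hypothetical row type: (service name, metric name, value, dimensions)
Row = tuple[str, str, float, dict]


def fetch_metrics(url: str = NODE_METRICS_URL, timeout: float = 10.0) -> dict:
    """Fetch the metrics document over HTTP and decode it as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def iter_metric_values(payload: dict) -> Iterator[Row]:
    """Yield one row per metric value, assuming a payload shaped like
    {"services": [{"name": ..., "metrics": [{"values": {...}, "dimensions": {...}}]}]}.
    """
    for service in payload.get("services", []):
        service_name = service.get("name", "unknown")
        for packet in service.get("metrics", []):
            dimensions = packet.get("dimensions", {})
            for metric_name, value in packet.get("values", {}).items():
                yield service_name, metric_name, float(value), dimensions
```

For example, `for service, name, value, dims in iter_metric_values(fetch_metrics()): print(service, name, value)` prints one line per metric value. Flattening into rows keeps downstream code, such as a forwarder to an external system, independent of how the response is nested.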

CloudWatch

Metrics can be pulled into CloudWatch from both Vespa Cloud and self-hosted Vespa. The recommended solution is to use an AWS Lambda function, as described in Pulling Vespa metrics to Cloudwatch.
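Such a Lambda function must convert Vespa metrics into CloudWatch data points. The Python sketch below illustrates that conversion step only. It is not the function from Pulling Vespa metrics to Cloudwatch. It assumes metrics have already been flattened into hypothetical `(service, metric name, value, dimensions)` rows, and it builds the `MetricData` entries that CloudWatch's PutMetricData call expects. Two choices here are assumptions: using `service` as a dimension, and the `batches` helper.

```python
from typing import Iterable, Iterator

# Hypothetical row type: (service name, metric name, value, dimensions)
Row = tuple[str, str, float, dict]


def to_metric_data(rows: Iterable[Row]) -> list[dict]:
    """Build PutMetricData entries (MetricName, Dimensions, Value) from rows."""
    entries = []
    for service, name, value, dimensions in rows:
        # Keep the Vespa service as a dimension so equally named metrics
        # from different services stay distinguishable (a design choice).
        dims = [{"Name": "service", "Value": service}]
        # Sort for deterministic output; skip empty values, which CloudWatch
        # does not accept as dimension values.
        dims += [
            {"Name": str(k), "Value": str(v)}
            for k, v in sorted(dimensions.items())
            if str(v)
        ]
        entries.append({"MetricName": name, "Dimensions": dims, "Value": float(value)})
    return entries


def batches(entries: list[dict], size: int) -> Iterator[list[dict]]:
    """Split entries into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(entries), size):
        yield entries[start:start + size]
```

Inside a Lambda function, each batch could then be sent with boto3, for example `boto3.client("cloudwatch").put_metric_data(Namespace="my-vespa-metrics", MetricData=batch)`. CloudWatch limits request size and the number of dimensions per metric, so check the current PutMetricData quotas before choosing a batch size.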

Datadog

Note: This method currently works for self-hosted Vespa only.

The Vespa team has created a Datadog Agent integration to allow real-time monitoring of Vespa in Datadog. The Datadog Vespa integration is not packaged with the agent, but is included in Datadog's integrations-extras repository. Clone it and follow the steps in the README.

Prometheus

Note: This method currently works for self-hosted Vespa only.

The metrics API on each host exposes metrics in a text-based format that Prometheus can scrape at http://host:19092/prometheus/v1/values, where host is the hostname of the Vespa node.
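To check this endpoint without running a Prometheus server, you could use a minimal Python sketch like the one below. It assumes the response follows the Prometheus text exposition format and uses `localhost` in place of `host`. The function names are hypothetical. The parser handles only the common case: it skips comment lines and leaves escape sequences in label values as-is.

```python
import re
import urllib.request

# Port and path from the text above; localhost stands in for the Vespa host.
PROMETHEUS_URL = "http://localhost:19092/prometheus/v1/values"

# A sample line: metric name, optional {labels}, value, optional timestamp.
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>.*)\})?"
    r"\s+(?P<value>\S+)(?:\s+\S+)?\s*$"
)
# A single label="value" pair; the value may contain escaped characters.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse exposition text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        # float() also accepts NaN, +Inf and -Inf as written by Prometheus.
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def scrape(url: str = PROMETHEUS_URL, timeout: float = 10.0) -> list[tuple[str, dict[str, str], float]]:
    """Fetch the endpoint once and parse the response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_samples(response.read().decode("utf-8"))
```

For ongoing monitoring, configure a Prometheus scrape job instead. It would set `metrics_path` to `/prometheus/v1/values` and list each Vespa host on port 19092 as a target.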

Pushing metrics to CloudWatch

Note: This method currently works for self-hosted Vespa only.

This is likely the most convenient way to monitor self-hosted Vespa in CloudWatch, because no separate pull mechanism such as a Lambda function is needed. Steps and requirements:

  1. An IAM user or IAM role that has only the cloudwatch:PutMetricData permission.
  2. Store the credentials for the above user or role in a shared credentials file on each Vespa node. If a role is used, provide a mechanism to keep the credentials file updated when keys are rotated.
  3. Configure Vespa to push metrics to CloudWatch. Example configuration for the admin section in services.xml:
    <metrics>
        <consumer id="my-cloudwatch">
            <metric-set id="default" />
            <cloudwatch region="us-east-1" namespace="my-vespa-metrics">
                <shared-credentials file="/path/to/credentials-file" />
            </cloudwatch>
        </consumer>
    </metrics>
    
    This configuration sends the default set of Vespa metrics to the CloudWatch namespace my-vespa-metrics in the us-east-1 region. Refer to the metric list for the metrics included in the default metric set.