Enterprise (not open source): This functionality is only available commercially.

Monitor a Vespa on Kubernetes Deployment

Use the Prometheus Operator to collect metrics from a Vespa on Kubernetes deployment. This guide covers installing the monitoring stack, configuring PodMonitor resources for Vespa components, and forwarding metrics to Grafana Cloud.

Prerequisites

  • A Kubernetes cluster (EKS, GKE, AKS, or Minikube)
  • Helm CLI
  • Kubernetes command-line tool (kubectl)
  • A Grafana Cloud account
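
A quick way to confirm the CLI tools are available before proceeding:

$ helm version --short
$ kubectl version --client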

1. Install Prometheus Operator

The recommended way to install Prometheus on Kubernetes is via the kube-prometheus-stack Helm chart. Add the repository and create a monitoring namespace.

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ kubectl create namespace monitoring
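
You can confirm that the chart is now visible in the local repository index:

$ helm search repo prometheus-community/kube-prometheus-stack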

Configure Grafana Cloud Credentials

If you intend to forward metrics to Grafana Cloud, create a Kubernetes Secret with your credentials. Retrieve your Instance ID (User) and API Token (Password) from the Grafana Cloud Portal under Configure Prometheus.

$ kubectl create secret generic grafana-cloud-prometheus -n monitoring --from-literal=username=$INSTANCE_ID --from-literal=password=$API_TOKEN
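
Verify that the secret contains both keys (values are shown base64-encoded):

$ kubectl get secret grafana-cloud-prometheus -n monitoring -o jsonpath='{.data}'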

Configure Helm Values

Create a prometheus-values.yaml file. This configuration enables remote writing to Grafana Cloud, configures the Prometheus Operator to select all PodMonitors, and disables the local Grafana instance.

prometheus:
  prometheusSpec:
    # Allow Prometheus to discover PodMonitors in other namespaces
    podMonitorSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false

    # Remote write configuration for Grafana Cloud
    remoteWrite:
      - url: https://prometheus-prod-XX-prod-XX.grafana.net/api/prom/push
        basicAuth:
          username:
            name: grafana-cloud-prometheus
            key: username
          password:
            name: grafana-cloud-prometheus
            key: password
        writeRelabelConfigs:
          - sourceLabels: [__address__]
            targetLabel: cluster
            replacement: my-cluster-name

# Disable local Grafana
grafana:
  enabled: false

# Enable Alertmanager
alertmanager:
  enabled: true

# Enable Kube State Metrics
kube-state-metrics:
  enabled: true

Install the stack using Helm:

$ helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring --values prometheus-values.yaml
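
Once the release is installed, check that the stack's pods start up; resource names derive from the release name prometheus:

$ kubectl get pods -n monitoring
$ helm status prometheus -n monitoring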

2. Configure PodMonitors

Vespa exposes metrics on specific ports that differ from standard web traffic ports. We use the PodMonitor Custom Resource to define how Prometheus should scrape these endpoints.

Monitor ConfigServer Pods

ConfigServers expose metrics on port 19071 at the path /configserver-metrics. Apply the following configuration to scrape these metrics.

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: vespa-configserver
  namespace: $NAMESPACE
  labels:
    release: prometheus # Required to be picked up by the operator
spec:
  selector:
    matchLabels:
      app: vespa-configserver
  podMetricsEndpoints:
    - targetPort: 19071
      path: /configserver-metrics
      interval: 30s
      scheme: http
      params:
        format: ['prometheus']
      relabelings:
        # Map Kubernetes pod name to the 'pod' label
        - sourceLabels: [__meta_kubernetes_pod_name]
          targetLabel: pod
        - targetLabel: vespa_role
          replacement: configserver
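
Save the manifest as, for example, podmonitor-configserver.yaml (the file name is arbitrary), then apply it and confirm the resource exists:

$ kubectl apply -f podmonitor-configserver.yaml
$ kubectl get podmonitor -n $NAMESPACE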

Monitor Application Pods

Container and Content Pods expose metrics on the state API port 19092 at /prometheus/v1/values. The following example defines a PodMonitor for Vespa application pods.

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: vespa-application
  namespace: $NAMESPACE
  labels:
    release: prometheus
spec:
  selector:
    matchExpressions:
      # Selects pods that are part of a Vespa application (feed, query, content)
      - key: vespa.ai/cluster-name
        operator: Exists
  podMetricsEndpoints:
    - targetPort: 19092
      path: /prometheus/v1/values
      interval: 30s
      scheme: http
      relabelings:
        - sourceLabels: [__meta_kubernetes_pod_name]
          targetLabel: pod
        - sourceLabels: [__meta_kubernetes_namespace]
          targetLabel: namespace
        # Extract the role from the pod name or labels if needed
        - targetLabel: vespa_role
          replacement: node
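
Apply this manifest the same way, and confirm that the selector actually matches pods, i.e. that the label key exists on your application pods (file name is again an example):

$ kubectl apply -f podmonitor-application.yaml
$ kubectl get pods -n $NAMESPACE -l vespa.ai/cluster-name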

3. Verify Metrics

Once the PodMonitors are applied, verify that Prometheus is successfully scraping the targets.

Check Targets Locally

Port-forward the Prometheus UI to your local machine:

$ kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090

Navigate to http://localhost:9090/targets. You should see targets named $NAMESPACE/vespa-configserver and $NAMESPACE/vespa-application in the UP state.
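
The same information is available from the Prometheus HTTP API while the port-forward is running (jq is optional, used here only for readability):

$ curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'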

Query Metrics

You can verify the data using PromQL queries in the Prometheus UI or Grafana Explore:

# Check availability of Config Servers
up{vespa_role="configserver"}

# Retrieve average maintenance duration
vespa_maintenance_duration_average

# List all metrics coming from Vespa (the job label is <namespace>/<PodMonitor name>)
{job=~".*/vespa-.*"}
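
These queries can also be issued from the command line against the HTTP API, for example:

$ curl -sG http://localhost:9090/api/v1/query --data-urlencode 'query=up{vespa_role="configserver"}'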

Troubleshooting

The targets page shows "No active targets":

This indicates the PodMonitor selector does not match any Pods. Verify the labels on your Vespa pods:

$ kubectl get pods -n $NAMESPACE --show-labels

Ensure the selector.matchLabels in your PodMonitor YAML matches the labels shown in the output above.
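
If the pod labels are correct, also verify that the operator has picked up the PodMonitor resources. The deployment name below derives from the release name prometheus used in this guide; adjust it if your release name differs:

$ kubectl get podmonitors -n $NAMESPACE
$ kubectl logs -n monitoring deploy/prometheus-kube-prometheus-operator --tail=20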

Targets are in DOWN state:

This usually means Prometheus cannot reach the metric endpoint. Verify that the metrics are exposed on the expected port by running a curl command from within the cluster:

$ kubectl run curl-test -n $NAMESPACE --image=curlimages/curl -it --rm -- curl "http://cfg-0.$NAMESPACE.svc.cluster.local:19071/configserver-metrics?format=prometheus"

Network Policies:

If you use NetworkPolicy to restrict traffic, ensure you have a policy allowing ingress traffic from the monitoring namespace to the $NAMESPACE namespace on ports 19071 and 19092.
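
A minimal sketch of such a policy, assuming the monitoring namespace carries the standard kubernetes.io/metadata.name label (set automatically on recent Kubernetes versions); narrow the podSelector if you want to scope it to Vespa pods only:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scrape
  namespace: $NAMESPACE
spec:
  # Applies to all pods in the Vespa namespace
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - protocol: TCP
          port: 19071
        - protocol: TCP
          port: 19092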