# Monitor a Vespa on Kubernetes Deployment

Use the Prometheus Operator to collect metrics from a Vespa on Kubernetes deployment. This guide covers the installation of the monitoring stack, configuration of `PodMonitor` resources for Vespa components, and forwarding metrics to Grafana Cloud.

## Prerequisites

- A Kubernetes cluster (EKS, GKE, AKS, or Minikube)
- [Helm CLI](https://helm.sh/docs/intro/install/)
- Kubernetes command-line tool ([kubectl](https://kubernetes.io/docs/reference/kubectl/))
- A Grafana Cloud account

## 1. Install Prometheus Operator

The recommended way to install Prometheus on Kubernetes is via the [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) Helm chart. Add the repository and create a monitoring namespace.

```
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ kubectl create namespace monitoring
```

### Configure Grafana Cloud Credentials

If you intend to forward metrics to Grafana Cloud, create a Kubernetes Secret with your credentials. Retrieve your **Instance ID** (User) and **API Token** (Password) from the Grafana Cloud Portal under _Configure Prometheus_.

```
$ kubectl create secret generic grafana-cloud-prometheus -n monitoring --from-literal=username=$INSTANCE_ID --from-literal=password=$API_TOKEN
```

### Configure Helm Values

Create a `prometheus-values.yaml` file. This configuration enables remote writing to Grafana Cloud, configures the Prometheus Operator to select all `PodMonitors`, and disables the local Grafana instance.

```
prometheus:
  prometheusSpec:
    # Allow Prometheus to discover PodMonitors in other namespaces
    podMonitorSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false

    # Remote write configuration for Grafana Cloud
    remoteWrite:
      - url: https://prometheus-prod-XX-prod-XX.grafana.net/api/prom/push
        basicAuth:
          username:
            name: grafana-cloud-prometheus
            key: username
          password:
            name: grafana-cloud-prometheus
            key: password
        writeRelabelConfigs:
          - sourceLabels: [__address__]
            targetLabel: cluster
            replacement: my-cluster-name

# Disable the local Grafana instance
grafana:
  enabled: false

# Enable Alertmanager and kube-state-metrics
alertmanager:
  enabled: true

kube-state-metrics:
  enabled: true
```

Install the stack using Helm:

```
$ helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring --values prometheus-values.yaml
```
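After the install completes, confirm that the stack's pods are running and that the `PodMonitor` CRD was registered:

```
$ kubectl get pods -n monitoring
$ kubectl get crd podmonitors.monitoring.coreos.com
```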

## 2. Configure PodMonitors

Vespa exposes metrics on specific ports that differ from standard web traffic ports. We use the `PodMonitor` Custom Resource to define how Prometheus should scrape these endpoints.

### Monitor ConfigServer Pods

ConfigServers expose metrics on port **19071** at the path `/configserver-metrics`. Apply the following configuration to scrape these metrics.

```
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: vespa-configserver
  namespace: $NAMESPACE
  labels:
    release: prometheus # Matches the Helm release; optional when podMonitorSelectorNilUsesHelmValues is false
spec:
  selector:
    matchLabels:
      app: vespa-configserver
  podMetricsEndpoints:
    - targetPort: 19071
      path: /configserver-metrics
      interval: 30s
      scheme: http
      params:
        format: ['prometheus']
      relabelings:
        # Map Kubernetes pod name to the 'pod' label
        - sourceLabels: [__meta_kubernetes_pod_name]
          targetLabel: pod
        - targetLabel: vespa_role
          replacement: configserver
```
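Save the manifest (for example as `vespa-configserver-podmonitor.yaml`, a filename chosen here for illustration), substitute `$NAMESPACE`, and apply it:

```
$ kubectl apply -f vespa-configserver-podmonitor.yaml
$ kubectl get podmonitor vespa-configserver -n $NAMESPACE
```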

### Monitor Application Pods

Container and Content Pods expose metrics on the state API port **19092** at `/prometheus/v1/values`. The following example defines a PodMonitor for Vespa application pods.

```
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: vespa-application
  namespace: default
  labels:
    release: prometheus
spec:
  selector:
    matchExpressions:
      # Selects pods that are part of a Vespa application (feed, query, content)
      - key: vespa.ai/cluster-name
        operator: Exists
  podMetricsEndpoints:
    - targetPort: 19092
      path: /prometheus/v1/values
      interval: 30s
      scheme: http
      relabelings:
        - sourceLabels: [__meta_kubernetes_pod_name]
          targetLabel: pod
        - sourceLabels: [__meta_kubernetes_namespace]
          targetLabel: namespace
        # Extract the role from the pod name or labels if needed
        - targetLabel: vespa_role
          replacement: node
```
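As with the ConfigServer monitor, apply the manifest (the filename below is illustrative) and confirm the resource exists:

```
$ kubectl apply -f vespa-application-podmonitor.yaml
$ kubectl get podmonitors -n default
```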

## 3. Verify Metrics

Once the `PodMonitors` are applied, verify that Prometheus is successfully scraping the targets.

### Check Targets Locally

Port-forward the Prometheus UI to your local machine:

```
$ kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090
```

Navigate to [http://localhost:9090/targets](http://localhost:9090/targets). You should see scrape pools for `vespa-configserver` and `vespa-application` with their targets in the **UP** state.
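Instead of the UI, you can also inspect targets through the Prometheus HTTP API over the same port-forward; this sketch assumes `jq` is installed locally:

```
$ curl -s http://localhost:9090/api/v1/targets | jq -r '.data.activeTargets[] | "\(.labels.job) \(.health)"'
```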

### Query Metrics

You can verify the data using PromQL queries in the Prometheus UI or Grafana Explore:

```
# Check availability of Config Servers
up{vespa_role="configserver"}

# Retrieve average maintenance duration
vespa_maintenance_duration_average

# List all metrics coming from Vespa
{job=~"default/vespa-.*"}
```
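The same queries can be issued against the HTTP API over the port-forward, which is convenient for scripting:

```
$ curl -s http://localhost:9090/api/v1/query --data-urlencode 'query=up{vespa_role="configserver"}'
```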

## Troubleshooting

**The targets page shows no Vespa targets**:

This indicates the `PodMonitor` selector does not match any Pods. Verify the labels on your Vespa pods:

```
$ kubectl get pods -n $NAMESPACE --show-labels
```

Ensure the `selector.matchLabels` in your `PodMonitor` YAML matches the labels shown in the output above.

**Targets are in `DOWN` state**:

This usually means Prometheus cannot reach the metric endpoint. Verify that the metrics are exposed on the expected port by running a curl command from within the cluster:

```
$ kubectl run curl-test -n $NAMESPACE --image=curlimages/curl -it --rm -- curl "http://cfg-0.$NAMESPACE.svc.cluster.local:19071/configserver-metrics?format=prometheus"
```

**Network Policies**:

If you use `NetworkPolicy` to restrict traffic, ensure you have a policy allowing ingress traffic from the `monitoring` namespace to the `$NAMESPACE` namespace on ports 19071 and 19092.
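A minimal policy along these lines could look as follows; it assumes your cluster sets the standard `kubernetes.io/metadata.name` label on namespaces (automatic since Kubernetes 1.21):

```
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scrape
  namespace: $NAMESPACE
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - protocol: TCP
          port: 19071
        - protocol: TCP
          port: 19092
```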

