Use the Prometheus Operator to collect metrics from a Vespa on Kubernetes deployment.
This guide covers the installation of the monitoring stack, configuration of PodMonitor resources for Vespa components, and forwarding metrics to Grafana Cloud.
The recommended way to install Prometheus on Kubernetes is via the kube-prometheus-stack Helm chart. Add the repository and create a monitoring namespace.
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ kubectl create namespace monitoring
If you intend to forward metrics to Grafana Cloud, create a Kubernetes Secret with your credentials. Retrieve your Instance ID (User) and API Token (Password) from the Grafana Cloud Portal under Configure Prometheus.
$ kubectl create secret generic grafana-cloud-prometheus -n monitoring --from-literal=username=$INSTANCE_ID --from-literal=password=$API_TOKEN
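Before installing the chart, you can optionally confirm that the Secret exists and contains both keys:
$ kubectl describe secret grafana-cloud-prometheus -n monitoring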
Create a prometheus-values.yaml file. This configuration enables remote writing to Grafana Cloud,
configures the Prometheus Operator to select all PodMonitors, and disables the local Grafana instance.
prometheus:
  prometheusSpec:
    # Select all PodMonitors/ServiceMonitors, not only those labeled for this Helm release
    podMonitorSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false
    # Remote write configuration for Grafana Cloud
    remoteWrite:
      - url: https://prometheus-prod-XX-prod-XX.grafana.net/api/prom/push
        basicAuth:
          username:
            name: grafana-cloud-prometheus
            key: username
          password:
            name: grafana-cloud-prometheus
            key: password
        writeRelabelConfigs:
          - sourceLabels: [__address__]
            targetLabel: cluster
            replacement: my-cluster-name

# Disable the local Grafana instance
grafana:
  enabled: false

# Enable Alertmanager and kube-state-metrics
alertmanager:
  enabled: true
kube-state-metrics:
  enabled: true
Install the stack using Helm:
$ helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring --values prometheus-values.yaml
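Before moving on, it is worth confirming that the stack is running. With the release name prometheus used above, something like the following should list the operator, Prometheus, Alertmanager and kube-state-metrics pods (exact names vary with the release name and chart version):
$ kubectl get pods -n monitoring
$ kubectl get prometheus -n monitoring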
Vespa exposes metrics on specific ports that differ from standard web traffic ports.
We use the PodMonitor Custom Resource to define how Prometheus should scrape these endpoints.
ConfigServers expose metrics on port 19071 at the path /configserver-metrics.
Apply the following configuration to scrape these metrics.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: vespa-configserver
  namespace: $NAMESPACE
  labels:
    release: prometheus  # Required to be picked up by the operator
spec:
  selector:
    matchLabels:
      app: vespa-configserver
  podMetricsEndpoints:
    - targetPort: 19071
      path: /configserver-metrics
      interval: 30s
      scheme: http
      params:
        format: ['prometheus']
      relabelings:
        # Map the Kubernetes pod name to the 'pod' label
        - sourceLabels: [__meta_kubernetes_pod_name]
          targetLabel: pod
        - targetLabel: vespa_role
          replacement: configserver
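Assuming the manifest above is saved as vespa-configserver-podmonitor.yaml (the filename is arbitrary), apply it with:
$ kubectl apply -f vespa-configserver-podmonitor.yaml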
Container and content pods expose metrics through the metrics proxy on port 19092 at /prometheus/v1/values.
The following example defines a PodMonitor for Vespa application pods.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: vespa-application
  namespace: $NAMESPACE
  labels:
    release: prometheus
spec:
  selector:
    matchExpressions:
      # Selects pods that are part of a Vespa application (feed, query, content)
      - key: vespa.ai/cluster-name
        operator: Exists
  podMetricsEndpoints:
    - targetPort: 19092
      path: /prometheus/v1/values
      interval: 30s
      scheme: http
      relabelings:
        - sourceLabels: [__meta_kubernetes_pod_name]
          targetLabel: pod
        - sourceLabels: [__meta_kubernetes_namespace]
          targetLabel: namespace
        # Extract the role from the pod name or labels if needed
        - targetLabel: vespa_role
          replacement: node
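As before, save the manifest (for example as vespa-application-podmonitor.yaml), apply it, and check that both PodMonitors exist:
$ kubectl apply -f vespa-application-podmonitor.yaml
$ kubectl get podmonitors -n $NAMESPACE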
Once the PodMonitors are applied, verify that Prometheus is successfully scraping the targets.
Port-forward the Prometheus UI to your local machine:
$ kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090
Navigate to http://localhost:9090/targets. You should see targets named $NAMESPACE/vespa-configserver and $NAMESPACE/vespa-application in the UP state.
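The same information is available from the Prometheus HTTP API while the port-forward is active; the jq filter below is just one way to slice the response and assumes jq is installed locally:
$ curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'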
You can verify the data using PromQL queries in the Prometheus UI or Grafana Explore:
# Check availability of Config Servers
up{vespa_role="configserver"}
# Retrieve average maintenance duration
vespa_maintenance_duration_average
# List all metrics coming from Vespa
{job=~"default/vespa-.*"}
The Targets page shows "No active targets":
This indicates the PodMonitor selector does not match any Pods.
Verify the labels on your Vespa pods:
$ kubectl get pods -n $NAMESPACE --show-labels
Ensure the selector.matchLabels in your PodMonitor YAML matches the labels shown in the output above.
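It can also help to inspect the PodMonitor as the operator sees it and to check the operator logs for selector warnings (the deployment name below assumes the release name prometheus):
$ kubectl get podmonitor vespa-application -n $NAMESPACE -o yaml
$ kubectl logs -n monitoring deploy/prometheus-kube-prometheus-operator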
Targets are in DOWN state:
This usually means Prometheus cannot reach the metric endpoint. Verify that the metrics are exposed on the expected port by running a curl command from within the cluster:
$ kubectl run curl-test -n $NAMESPACE --image=curlimages/curl -it --rm -- curl "http://cfg-0.$NAMESPACE.svc.cluster.local:19071/configserver-metrics?format=prometheus"
Network Policies:
If you use NetworkPolicy to restrict traffic, ensure you have a policy allowing ingress traffic
from the monitoring namespace to the $NAMESPACE namespace on ports 19071 and 19092.
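A minimal sketch of such a policy is shown below. It selects all pods in the Vespa namespace and matches the monitoring namespace via the automatic kubernetes.io/metadata.name label; the policy name is arbitrary, and you may want to narrow the pod selector to your own Vespa labels.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scrape
  namespace: $NAMESPACE
spec:
  # Apply to every pod in the Vespa namespace; narrow this selector if desired
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Allow traffic originating from the monitoring namespace
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - protocol: TCP
          port: 19071
        - protocol: TCP
          port: 19092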