Vespa on Kubernetes supports zero-downtime rolling upgrades. An upgrade involves upgrading the
vespa-operator via the Helm chart and the ConfigServer and Application (Container and Content) Pods through
the VespaSet resource.
We do not support version drift between the vespa-operator and the VespaSet. Accordingly,
upgrades should be planned so that all components are updated together. To ensure availability, they should be performed
in the order shown in this guide.
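The no-drift rule can be checked mechanically before starting. Below is a minimal sketch, assuming the operator chart version and the VespaSet target version have already been read into shell variables (for example from helm list and the VespaSet spec); the helper name is hypothetical:

```shell
# check_no_drift OPERATOR_VERSION VESPASET_VERSION
# Fails fast when the two versions would drift apart.
check_no_drift() {
  if [ "$1" = "$2" ]; then
    echo "versions aligned at $1"
  else
    echo "version drift detected: operator=$1 vespaset=$2" >&2
    return 1
  fi
}

check_no_drift "8.577" "8.577"
```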
Some upgrades may introduce changes to the VespaSet CRD definition. These changes should be
applied to the cluster before performing the upgrade. As a rule of thumb, we recommend applying the
latest CRD definition before every upgrade.
Helm does not manage the lifecycle of the CRD after it is installed (see the official documentation).
As a result, CRD updates must be handled manually. With the official Helm chart for Vespa on Kubernetes, this is done by extracting the
CRD definition from the OCI package and applying it directly with kubectl.
$ helm show crds $HELM_CHART_REF --version $VESPA_VERSION > vespaset-crd.yaml
$ kubectl apply -f vespaset-crd.yaml
The operator can be upgraded by running helm upgrade with the new VESPA_VERSION.
Replace $NAMESPACE with the namespace where Vespa is installed. Refer to Factory for the latest VESPA_VERSION.
Note that upgrading the operator does not affect the ConfigServer and Application Pods. Their upgrade will be performed in a subsequent step.
$ helm upgrade vespa-operator vespa/vespa-operator \
    --version $VESPA_VERSION \
    --namespace $NAMESPACE \
    --reuse-values
Wait for the operator to finish rolling out before proceeding.
$ kubectl rollout status deployment/vespa-operator -n $NAMESPACE
To upgrade the ConfigServer and application Pods, patch the spec.version field in the
VespaSet resource. Ensure that the target images are available and pullable on the Kubernetes Nodes at
VESPA_OPERATOR_IMAGE:VESPA_VERSION and VESPA_IMAGE:VESPA_VERSION before proceeding.
For example:
$ cat > vespaset.yaml <<EOF
apiVersion: k8s.ai.vespa/v1
kind: VespaSet
metadata:
  name: vespaset-sample
  namespace: ${NAMESPACE}
spec:
  version: 8.566.7  # Specify the version to upgrade to.
  configServer:
    image: "${VESPA_OPERATOR_IMAGE}"
    storageClass: "gp3"
    generateRbac: false
  application:
    image: "${VESPA_IMAGE}"
    storageClass: "gp3"
  ingress:
    endpointType: "NONE"
EOF
$ kubectl apply -f vespaset.yaml
The ConfigServer Pods detect the change to the VespaSet resource and orchestrate the upgrade of both
themselves and the Application Pods.
The upgrade always proceeds in two phases: ConfigServer Pods are upgraded first, followed by Application Pods. This ordering is required because the Config Servers must be running the new version before they can safely orchestrate Application Pods onto it.
Additionally, the base template used to create a Vespa Pod, whether a ConfigServer or an Application Pod, may itself have changed as part of the upgrade. The ConfigServer Pods therefore base recreated Pods on the latest template rather than a stale one, which avoids having to recreate the Pods a second time after the upgrade just to pick up template changes.
During the upgrade procedure, Pods are upgraded sequentially, one at a time. As each Pod completes its upgrade, the operator records its Converged Version in the VespaSet status.
The cluster remains operational throughout this procedure. The remaining ConfigServer Pods continue serving configuration to Application Pods while each node is upgraded in turn, and the Dataplane layer continues to serve traffic as normal. For Content Pods, the operator waits for data redistribution to complete before moving to the next Pod, ensuring no data loss during the rollout.
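The one-at-a-time sequencing can be sketched as a loop that refuses to advance until the current Pod reports the target version. This is an illustrative mock, not operator code: converged_version stands in for reading a Pod's status from the VespaSet.

```shell
target="8.577"

# Mock of a per-Pod status lookup; in reality this comes from the VespaSet
# status. In this illustration, default-100 has not yet converged.
converged_version() {
  case "$1" in
    default-100) echo "8.576" ;;
    *)           echo "8.577" ;;
  esac
}

# Upgrade strictly in order; halt instead of advancing past a lagging Pod.
for pod in cfg-1 cfg-2 cfg-3 default-100 default-101; do
  if [ "$(converged_version "$pod")" = "$target" ]; then
    echo "$pod converged to $target"
  else
    echo "halting rollout at $pod (still on $(converged_version "$pod"))"
    break
  fi
done
```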
To ensure zero downtime for any applications, ingress should be properly configured so that traffic is correctly load balanced across the Dataplane layer, allowing requests to be seamlessly routed away from Pods undergoing upgrades. Refer to the Ingress page for more details.
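For reference, the relevant section of the VespaSet manifest looks like the fragment below (field names as in the example manifest earlier; the value is illustrative), selecting a load-balanced endpoint for the Dataplane:

```yaml
spec:
  ingress:
    endpointType: "LOAD_BALANCER"  # expose a load-balanced Dataplane endpoint
```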
Throughout the upgrade, each Pod's status is reflected in the VespaSet status. A Pod that
is actively being upgraded reports its phase as UPGRADING. A Pod that has successfully completed
the upgrade reports its Converged Version as the new version.
In the example below, the Config Server Pods have all converged to 8.577, while the Application Pod
default-100 is currently upgrading and has not yet converged from 8.576.
$ kubectl describe vespaset vespaset-sample -n $NAMESPACE
Name:         vespaset-sample
Namespace:    $NAMESPACE
Labels:       <none>
Annotations:  <none>
API Version:  k8s.ai.vespa/v1
Kind:         VespaSet
Metadata:
  Creation Timestamp:  2026-01-29T21:32:27Z
  Finalizers:
    vespasets.k8s.ai.vespa/finalizer
  Generation:        1
  Resource Version:  121822902
  UID:               a70f56e9-6625-4011-acd7-9f7cad29dbc2
Spec:
  Application:
    Image:          $VESPA_IMAGE
    Storage Class:  gp3
  Config Server:
    Generate Rbac:  false
    Image:          $VESPA_IMAGE
    Storage Class:  gp3
  Ingress:
    Endpoint Type:  LOAD_BALANCER
  Version:  8.577
Status:
  Bootstrap Status:
    Pods:
      cfg-1:
        Last Updated:       2026-01-29T21:38:45Z
        Message:            Pod is running
        Phase:              RUNNING
        Converged Version:  8.577
      cfg-2:
        Last Updated:       2026-01-29T21:38:09Z
        Message:            Pod is running
        Phase:              RUNNING
        Converged Version:  8.577
      cfg-3:
        Last Updated:       2026-01-29T21:36:32Z
        Message:            Pod is running
        Phase:              RUNNING
        Converged Version:  8.577
      default-100:
        Last Updated:       2026-01-29T21:38:45Z
        Message:            Pod is upgrading
        Phase:              UPGRADING
        Converged Version:  8.576
      default-101:
        Last Updated:       2026-01-29T21:38:09Z
        Message:            Pod is running
        Phase:              RUNNING
        Converged Version:  8.576
      documentation-102:
        Last Updated:       2026-01-29T21:36:32Z
        Message:            Pod is running
        Phase:              RUNNING
        Converged Version:  8.576
      documentation-103:
        Last Updated:       2026-01-29T21:36:32Z
        Message:            Pod is running
        Phase:              RUNNING
        Converged Version:  8.576
      cluster-controller-104:
        Last Updated:       2026-01-29T21:36:32Z
        Message:            Pod is running
        Phase:              RUNNING
        Converged Version:  8.576
      cluster-controller-105:
        Last Updated:       2026-01-29T21:36:32Z
        Message:            Pod is running
        Phase:              RUNNING
        Converged Version:  8.576
      cluster-controller-106:
        Last Updated:       2026-01-29T21:36:32Z
        Message:            Pod is running
        Phase:              RUNNING
        Converged Version:  8.576
    Last Transition Time:  2026-01-29T21:33:55Z
    Message:               All configservers running
    Phase:                 RUNNING
Events:  <none>
The upgrade is complete when every Pod's Converged Version matches the new version and all
phases report RUNNING.
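This completion criterion can be expressed as a small check over the per-Pod status entries. Below is a sketch in shell, assuming the phase and Converged Version pairs have already been extracted from the VespaSet status (for example with kubectl and jsonpath); the helper name is hypothetical:

```shell
# upgrade_done TARGET reads "PHASE VERSION" pairs on stdin and succeeds
# only when every Pod is RUNNING at the target version.
upgrade_done() {
  target="$1"
  while read -r phase version; do
    if [ "$phase" != "RUNNING" ] || [ "$version" != "$target" ]; then
      return 1
    fi
  done
  return 0
}

# Mirrors the example status above: one Pod is still converging.
printf 'RUNNING 8.577\nUPGRADING 8.576\n' | upgrade_done 8.577 \
  && echo "upgrade complete" || echo "still rolling out"
```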
If a Pod fails to converge to the target version (for example, due to an image pull failure, a crash loop, or a failed health check), the ConfigServer continuously retries the upgrade for that Pod until it either succeeds or an administrator intervenes.
In this scenario, the administrator can diagnose the issue by inspecting the ConfigServer logs or the events of the failing Pod. Once the issue is resolved, the ConfigServer automatically retries the upgrade for that Pod and proceeds with the remaining nodes.
For example, suppose the Pod search-106 is failing to upgrade.
$ kubectl logs cfg-1 -n $NAMESPACE
$ kubectl logs cfg-2 -n $NAMESPACE
$ kubectl logs cfg-3 -n $NAMESPACE
$ kubectl describe pod search-106 -n $NAMESPACE
This design prevents a bad upgrade from cascading to the rest of the Pods. Since the ConfigServer refuses to advance past a Pod that has not converged, the remaining Pods stay on the previous known-good version while the administrator investigates.