Administrative Procedures


Refer to the multinode install for a primer on how to set up a cluster. Required architecture is x86_64.

System status

  • Check logs
  • Use performance graphs, System Activity Report (sar) or status pages to track load
  • Use query tracing
  • Use feed tracing
  • Use the cluster controller status page (below) to track the status of search/storage nodes.

Process PID files

All Vespa processes have a PID file $VESPA_HOME/var/run/{service name}.pid, where {service name} is the Vespa service name, e.g. container or distributor. It is the same name which is used in the administration interface in the config sentinel.

Status pages

Vespa service instances have status pages for debugging and testing. Status pages are subject to change at any time - take care when automating. Procedure

  1. Find the port: The status pages runs on ports assigned by Vespa. To find status page ports, use vespa-model-inspect to list the services run in the application.
    $ vespa-model-inspect services
    To find the status page port for a specific node for a specific service, pick the correct service and run:
    $ vespa-model-inspect service [Options] <service-name>
  2. Get the status and metrics: distributor, storagenode, searchnode and container-clustercontroller are content services with status pages. These ports are tagged HTTP. The cluster controller have multiple ports tagged HTTP, where the port tagged STATE is the one with the status page. Try connecting to the root at the port, or /state/v1/metrics. The distributor and storagenode status pages are available at /:
      $ vespa-model-inspect service searchnode
      searchnode @ : search
      tcp/ (FS4)
      tcp/ (TEST HACK SRMP)
      tcp/ (HEALTH JSON HTTP)
      $ curl
      $ vespa-model-inspect service distributor
      distributor @ : content
      tcp/ (MESSAGING)
      tcp/ (STATUS RPC)
      tcp/ (STATE STATUS HTTP)
      $ curl
      $ curl
    A status page for the cluster controller is available at the status port at http://hostname:port/clustercontroller-status/v1/<clustername>. If clustername is not specified, the available clusters will be listed. The cluster controller leader status page will show if any nodes are operating with differing cluster state versions. It will also show how many data buckets are pending merging (document set reconciliation) due to either missing or being out of sync.
    $ vespa-model-inspect service container-clustercontroller | grep HTTP
    With multiple cluster controllers, look at the one with a "/0" suffix in its config ID; it is the preferred leader.

    The cluster state version is listed under the SSV table column. Divergence here usually points to host or networking issues.

Cluster state

Cluster and node state information is available through the State Rest API. This API can also be used to set a user state for a node - alternatively use:

Also see the cluster controller status page.

Some state is persisted in a ZooKeeper cluster, restarting/changing a cluster controller preserves this:

  • The last cluster state version is stored, such that a cluster controller taking over can continue the version numbers where the old one left of
  • The unit states (set by operators) are stored
In case of state data lost, the cluster state is reset - see cluster controller for implications.

Cluster controller configuration

To configure the cluster controller, use services.xml and/or add configuration under the services element - example:

<services version="1.0">
    <config name="vespa.config.content.fleetcontroller">


Disk and/or memory might be exhausted and block feeding - recover from feed block

Prioritized partition queues on the content nodes. The metric to look out for is the .filestor.alldisks.queuesize metric, giving a total value for all the partitions. While queue size is the most dependable metric, operations can be queued elsewhere too, which may be able to hide the issue. There are some visitor related queues: .visitor.allthreads.queuesize and .visitor.cv_queuesize.

Many applications built on Vespa is used by multiple clients and faces the problem of protecting clients from abuse. Use the RateLimitingSearcher to rate limit load from each client type.

Monitor CPU, memory and IO usage. Note: A fully utilized resource does not necessarily indicate overload. As the content cluster supports prioritized operations, it will typically do as many low priority operations as it is able to when no high priority operations are in the queue. This means, that even if there's just a low priority reprocess or a load rebalancing effort going on after a node went down, the cluster may still use up all available resources to process these low priority tasks.

Track network bandwidth consumption and switch saturation.

Monitor distance to ideal state

Refer to the distribution algorithm. Distributor status pages can be viewed to manually inspect state metrics:

.idealstate.idealstate_diff This metric tries to create a single value indicating distance to the ideal state. A value of zero indicates that the cluster is in the ideal state. Graphed values of this metric gives a good indication for how fast the cluster gets back to the ideal state after changes. Note that some issues may hide other issues, so sometimes the graph may appear to stand still or even go a bit up again, as resolving one issue may have detected one or several others.
.idealstate.buckets_toofewcopies Specifically lists how many buckets have too few copies. Compare to the buckets metric to see how big a portion of the cluster this is.
.idealstate.buckets_toomanycopies Specifically lists how many buckets have too many copies. Compare to the buckets metric to see how big a portion of the cluster this is.
.idealstate.buckets The total number of buckets managed. Used by other metrics reporting bucket counts to know how big a part of the cluster they relate to.
.idealstate.buckets_notrusted Lists how many buckets have no trusted copies. Without trusted buckets operations against the bucket may have poor performance, having to send requests to many copies to try and create consistent replies.
.idealstate.delete_bucket.pending Lists how many buckets that needs to be deleted.
.idealstate.merge_bucket.pending Lists how many buckets there are, where we suspect not all copies store identical document sets.
.idealstate.split_bucket.pending Lists how many buckets are currently being split.
.idealstate.join_bucket.pending Lists how many buckets are currently being joined.
.idealstate.set_bucket_state.pending Lists how many buckets are currently altered for active state. These are high priority requests which should finish fast, so these requests should seldom be seen as pending.

Cluster reconfig / restart


  • Running vespa-deploy prepare will not change served configurations until vespa-deploy activate is run. vespa-deploy prepare will warn about all config changes that require restart.
  • Refer to schemas for how to add/change/remove these.
  • Refer to elastic Vespa for details on how to add/remove capacity from a Vespa cluster.
  • See chained components for how to add or remove searchers and document processors.

Add or remove content node

An alternative to increasing cluster size is building a new cluster, then migrate documents to it. This is supported using visiting.

  1. Node setup: Prepare nodes by installing software, set up the file systems / directories and set configuration server(s). Details. Start the node.
  2. Modify configuration: Add/remove a node-element in services.xml and hosts.xml. Refer to multinode install. It is key that the node's distribution-key is higher than the highest existing index.
  3. Deploy: Observe metrics to track progress as the cluster redistributes documents. Use the cluster controller to monitor the state of the cluster.
  4. Tune performance: Use maxpendingidealstateoperations to tune concurrency of bucket merge operations from distributor nodes. Likewise, tune merges - concurrent merge operations per content node. The tradeoff is speed of bucket replication vs use of resources, which impacts the applications' regular load.
  5. Finish: The cluster is done redistributing when idealstate.merge_bucket.pending is zero on all distributors.

Do not remove more than redundancy-1 nodes at a time. Observe idealstate.merge_bucket.pending to know bucket replica status, when zero on all distributor nodes, it is safe to remove more nodes. If grouped distribution is used to control bucket replicas, remove all nodes in a group if the redundancy settings ensure replicas in each group.

To increase bucket redundancy level before taking nodes out, retire nodes. Again, track idealstate.merge_bucket.pending to know when done. Use the State API or vespa-set-node-state to set a node to retired. The cluster controller's status page lists node states.

Merge content clusters

To merge two content clusters, add nodes to the cluster like add node above, considering:

Add or remove services on a node

It is possible to run multiple Vespa services on the same host. If changing the services on a given host, stop Vespa on the given host before running vespa-deploy activate. This is because the services will be allocated port numbers depending on what is running on the host. Consider if some of the services changed are used by services on other hosts. In that case, restart services on those hosts too. Procedure:

  1. Edit services.xml and hosts.xml
  2. Stop Vespa on the nodes that have changes
  3. Run vespa-deploy prepare and vespa-deploy activate
  4. Start Vespa on the nodes that have changes

Add/remove/modify chains

Searcher, Document processing and Processing chains can be modified at runtime without restarts. Modification includes adding/removing processors in chains and changing names of chains and processors. Make the change and deploy. Some changes require a container restart, refer to reconfiguring document processing.

Configure grouped setup

Refer to the sizing examples for changing from a flat to grouped cluster.

Restart distributor node

When a distributor stops, it will try to respond to any pending cluster state request first. New incoming requests after shutdown is commenced will fail immediately, as the socket is no longer accepting requests. Cluster controllers will thus detect processes stopping almost immediately.

The cluster state will be updated with the new state internally in the cluster controller. Then the cluster controller will wait for maximum min_time_between_new_systemstates before publishing the new cluster state - this to reduce short-term state fluctuations.

The cluster controller has the option of setting states to make other distributors take over ownership of buckets, or mask the change, making the buckets owned by the distributor restarting unavailable for the time being. Distributors restart fast, so the restarting distributor may transition directly from up to initializing. If it doesn't, current default behavior is to set it down immediately.

If transitioning directly from up to initializing, requests going through the remaining distributors will be unaffected. The requests going through the restarting distributor will immediately fail when it shuts down, being resent automatically by the client. The distributor typically restart within seconds, and syncs up with the service layer nodes to get metadata on buckets it owns, in which case it is ready to serve requests again.

If the distributor transitions from up to down, and then later to initializing, other distributors will request metadata from the service layer node to take over ownership of buckets previously owned by the restarting distributor. Until the distributors have gathered this new metadata from all the service layer nodes, requests for these buckets can not be served, and will fail back to client. When the restarting node comes back up and is marked initializing or up in the cluster state again, the additional nodes will dump knowledge of the extra buckets they previously acquired.

For requests with timeouts of several seconds, the transition should be invisible due to automatic client resending. Requests with a lower timeout might fail, and it is up to the application whether to resend or handle failed requests.

Requests to buckets not owned by the restarting distributor will not be affected. The other distributors will start to do some work though, affecting latency, and distributors will refetch metadata for all buckets they own, not just the additional buckets, which may cause some disturbance.

Restart content node

When a content node restarts in a controlled fashion, it marks itself in the stopping state and rejects new requests. It will process its pending request queue before shutting down. Consequently, client requests are typically unaffected by content node restarts. The currently pending requests will typically be completed. New copies of buckets will be created on other nodes, to store new requests in appropriate redundancy. This happens whether node transitions through down or maintenance state. The difference being that if transitioning through maintenance state, the distributor will not start any effort of synchronizing new copies with existing copies. They will just store the new requests until the maintenance node comes back up.

When coming back up, content nodes will start with gathering information on what buckets it has data stored for. While this is happening, the service layer will expose that it is initializing, but not done with the bucket list stage. During this time, the cluster controller will not mark it initializing in cluster state yet. Once the service layer node knows what buckets it has, it reports that it is calculating metadata for the buckets, at which time the node may become visible as initializing in cluster state. At this time it may start process requests, but as bucket checksums have not been calculated for all buckets yet, there will exist buckets where the distributor doesn't know if they are in sync with other copies or not.

The background load to calculate bucket checksums has low priority, but load received will automatically create metadata for used buckets. With an overloaded cluster, the initializing step may not finish before all buckets have been initialized by requests. With a cluster close to max capacity, initializing may take quite some time.

The cluster is mostly unaffected during restart. During the initializing stage, bucket metadata is unknown. Distributors will assume other copies are more appropriate for serving read requests. If all copies of a bucket are in an initializing state at the same time, read requests may be sent to a bucket copy that does not have the most updated state to process it.


Also see the FAQ.

No endpoint Most problems with the quick start guides are due to Docker out of memory. Make sure at least 6G memory is allocated to Docker:
$ docker info | grep "Total Memory"
OOM symptoms include
INFO: Problem with Handshake localhost:8080 ssl=false: localhost:8080 failed to respond
The container is named vespa in the guides, for a shell do:
$ docker exec -it vespa bash
Log viewing Use vespa-logfmt to view the vespa log - example:
$ /opt/vespa/bin/vespa-logfmt -l warning,error
Json For json pretty-print, append
| python -m json.tool
to commands that output json - or use jq.

Distributor or content node not existing

Content cluster nodes will register in the vespa-slobrok naming service on startup. If the nodes have not been set up or fail to start required processes, the naming service will mark them as unavailable.

Effect on cluster: Calculations for how big percentage of a cluster that is available will include these nodes even if they never have been seen. If many nodes are configured, but not in fact available, the cluster may set itself offline due by concluding too many nodes are down.

Content node not available on the network

vespa-slobrok requires nodes to ping it periodically. If they stop sending pings, they will be set as down and the cluster will restore full availability and redundancy by redistribution load and data to the rest of the nodes. There is a time window where nodes may be unavailable but still not set down by slobrok.

Effect on cluster: Nodes that become unavailable will be set as down after a few seconds. Before that, document operations will fail and will need to be resent. After the node is set down, full availability is restored. Data redundancy will start to restore.

Clean mode

There has been rare occasions were we ended up with data that was internally inconsistent. This should be very rare and has only happend once. For those circumstances it is possible to start the node in a special validate_and_sanitize_docstore mode. This will do its best to clean up inconsistent data. However detecting that this is required is not an easy feat. In order for this approach to work, all nodes must be stopped before enabling this feature. If not the poison pill might come back as they have the same redundancy as the rest of your documents.

Distributor or content node crashing

A crashing node restarts in much the same node as a controlled restart. A content node will not finish processing the currently pending requests, causing failed requests. Client resending might hide these failures, as the distributor should be able to process the resent request quickly, using other copies than the recently lost one.

Thrashing nodes

An example is OS disk using excessive amount of time to complete IO requests. Eventually the maximum number of files are open, and as the OS is so dependent on the filesystem, it ends up not being able to do much at all.

get-node-state requests from the cluster controller fetch node metrics from /proc and write this to a temp directory on the disk before responding. This causes a thrashing node to time out get-node-state requests, setting the node down in the cluster state.

Effect on cluster: This will have the same effects like the not available on network issue.

Constantly restarting distributor or service layer node

A broken node may end up with processes constantly restarting. It may die during initialization due to accessing corrupt files, or it may die when it starts receiving requests of a given type triggering a node local bug. This is bad for distributor nodes, as these restarts create constant ownership transfer between distributors, causing windows where buckets are unavailable.

The cluster controller has functionality for detecting such nodes. If a node restarts in a way that is not detected as a controlled shutdown, more than max_premature_crashes, the cluster controller will set the wanted state of this node to be down.

Detecting a controlled restart is currently a bit tricky. A controlled restart is typically initiated by sending a TERM signal to the process. Not having any other sign, the content layer has to assume that all TERM signals are the cause of controlled shutdowns. Thus, if the process keep being killed by kernel due to using too much memory, this will look like controlled shutdowns to the content layer.

Content cluster tradeoffs

Availability vs resources Keeping index structures costs resources. Not all replicas of buckets are necessarily searchable, unless configured using searchable-copies. As Vespa indexes buckets on-demand, the most cost-efficient setting is 1, if one can tolerate temporary coverage loss during node failures. Note that searchable-copies does not apply to streaming search as this does not use index structures.
Data retention vs size

When a document is removed, the document data is not immediately purged. Instead, remove-entries (tombstones of removed documents) are kept for a configurable amount of time. The default is two weeks, refer to pruneremoveddocumentsage. This ensures that removed documents stay removed in a distributed system where nodes change state. Entries are removed periodically after expiry. Hence, if a node comes back up after being down for more than two weeks, removed documents are available again, unless the data on the node is wiped first. A larger pruneremoveddocumentsage will hence grow the storage size as this keeps document and tombstones longer.

Note: The backend does not store remove-entries for nonexistent documents. This to prevent clients sending wrong document identifiers from filling a cluster with invalid remove-entries. A side-effect is that if a problem has caused all replicas of a bucket to be unavailable, documents in this bucket cannot be marked removed until at least one replica is available again. Documents are written in new bucket replicas while the others are down - if these are removed, then older versions of these will not re-emerge, as the most recent change wins.

Transition time See transition-time for tradeoffs for how quickly nodes are set down vs. system stability.
Removing unstable nodes One can configure how many times a node is allowed to crash before it will automatically be removed. The crash count is reset if the node has been up or down continuously for more than the stable state period. If the crash count exceeds max premature crashes, the node will be disabled. Refer to troubleshooting.
Minimal amount of nodes required to be available A cluster is typically sized to handle a given load. A given percentage of the cluster resources are required for normal operations, and the remainder is the available resources that can be used if some of the nodes are no longer usable. If the cluster loses enough nodes, it will be overloaded:

  • Remaining nodes may create disk full situation. This will likely fail a lot of write operations, and if disk is shared with OS, it may also stop the node from functioning.
  • Partition queues will grow to maximum size. As queues are processed in FIFO order, operations are likely to get long latencies.
  • Many operations may time out while being processed, causing the operation to be resent, adding more load to the cluster.
  • When new nodes are added, they cannot serve requests before data is moved to the new nodes from the already overloaded nodes. Moving data puts even more load on the existing nodes, and as moving data is typically not high priority this may never actually happen.
To configure what the minimal cluster size is, use min-distributor-up-ratio and min-storage-up-ratio.