Refer to the multinode-HA sample application for a primer on how to set up a cluster - use this as a starting point. Try the Multinode testing and observability sample app to get familiar with interfaces and behavior.
There is no restart command; to restart, do a stop followed by a start. Start and stop all services on a node:

$ $VESPA_HOME/bin/vespa-start-services
$ $VESPA_HOME/bin/vespa-stop-services
Likewise, for the config server:
$ $VESPA_HOME/bin/vespa-start-configserver
$ $VESPA_HOME/bin/vespa-stop-configserver
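For a node running both a config server and other services, a full restart is the stop and start commands in sequence - a minimal sketch; the ordering (config server up before services) is an assumption based on the start sequence documentation:

# A restart is a stop followed by a start; stop services before the
# config server, and bring the config server up before the services:
$ $VESPA_HOME/bin/vespa-stop-services
$ $VESPA_HOME/bin/vespa-stop-configserver
$ $VESPA_HOME/bin/vespa-start-configserver
$ $VESPA_HOME/bin/vespa-start-services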
To learn more about which processes and services are started at Vespa startup, read the start sequence documentation, and find training videos in the vespaengine YouTube channel.
Use vespa-sentinel-cmd to stop/start individual services.
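For example, to list the services managed by the sentinel on a node and restart one of them - the service name searchnode is illustrative, use a name from the list output:

$ vespa-sentinel-cmd list
$ vespa-sentinel-cmd restart searchnode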
See multinode for systemd/systemctl examples. The Docker containers documentation has relevant start/stop information, too.
All Vespa processes have a PID file $VESPA_HOME/var/run/{service name}.pid, where {service name} is the Vespa service name, e.g. container or distributor. This is the same name used in the administration interface of the config sentinel.
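A quick way to check that a service is running is to look up its PID file - a sketch, using container as the service name:

$ cat $VESPA_HOME/var/run/container.pid
$ ps -p $(cat $VESPA_HOME/var/run/container.pid)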
Learn how to start / stop nodes.
Vespa service instances have status pages for debugging and testing. Status pages are subject to change at any time - take care when automating. Procedure:
$ vespa-model-inspect services

To find the status page port for a specific node for a specific service, pick the correct service and run:
$ vespa-model-inspect service [Options] <service-name>
Example:
$ vespa-model-inspect service searchnode
searchnode @ myhost.mydomain.com : search
search/search/cluster.search/0
    tcp/myhost.mydomain.com:19110 (STATUS ADMIN RTC RPC)
    tcp/myhost.mydomain.com:19111 (FS4)
    tcp/myhost.mydomain.com:19112 (TEST HACK SRMP)
    tcp/myhost.mydomain.com:19113 (ENGINES-PROVIDER RPC)
    tcp/myhost.mydomain.com:19114 (HEALTH JSON HTTP)

$ curl http://myhost.mydomain.com:19114/state/v1/metrics
...

$ vespa-model-inspect service distributor
distributor @ myhost.mydomain.com : content
search/distributor/0
    tcp/myhost.mydomain.com:19116 (MESSAGING)
    tcp/myhost.mydomain.com:19117 (STATUS RPC)
    tcp/myhost.mydomain.com:19118 (STATE STATUS HTTP)

$ curl http://myhost.mydomain.com:19118/state/v1/metrics
...

$ curl http://myhost.mydomain.com:19118/
...
The cluster controller status pages are available at:

http://hostname:port/clustercontroller-status/v1/<clustername>

If clustername is not specified, the available clusters will be listed.
The cluster controller leader status page shows whether any nodes are operating with differing cluster state versions. It also shows how many data buckets are pending merging (document set reconciliation) because replicas are either missing or out of sync.
$ vespa-model-inspect service container-clustercontroller | grep HTTP
With multiple cluster controllers, look at the one with a "/0" suffix in its config ID; it is the preferred leader.
The cluster state version is listed under the SSV table column. Divergence here usually points to host or networking issues.
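Putting this together, the leader's status page can be fetched with curl - host, port, and cluster name below are illustrative; use the HTTP port reported by vespa-model-inspect:

$ curl http://myhost.mydomain.com:19050/clustercontroller-status/v1/mycluster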
Cluster and node state information is available through the /cluster/v2 API. This API can also be used to set a user state for a node - alternatively, use vespa-set-node-state.
Also see the cluster controller status page.
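A hedged example of using /cluster/v2 - host and port are illustrative (use the cluster controller's STATE HTTP port), and mycluster is a placeholder cluster name:

# List content clusters
$ curl http://myhost.mydomain.com:19050/cluster/v2/

# Get state for the storage node with distribution-key 0
$ curl http://myhost.mydomain.com:19050/cluster/v2/mycluster/storage/0

# Set a user state on that node
$ curl -X PUT -H "Content-Type: application/json" \
    --data '{"state": {"user": {"state": "maintenance", "reason": "short admin downtime"}}}' \
    http://myhost.mydomain.com:19050/cluster/v2/mycluster/storage/0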
State is persisted in a ZooKeeper cluster, so restarting or replacing a cluster controller preserves it.
If the state data is lost, the cluster state is reset - see cluster controller for implications.
It is recommended to run the cluster controller on the same hosts as the config servers, as they share a ZooKeeper cluster for state, and deploying three nodes is best practice for both. See the multinode-HA sample app for a working example.
To configure the cluster controller, use services.xml and/or add configuration under the services element - example:
<services version="1.0">
  <config name="vespa.config.content.fleetcontroller">
    <min_time_between_new_systemstates>5000</min_time_between_new_systemstates>
  </config>
  ...
</services>
A broken content node may end up with processes constantly restarting. It may die during initialization due to accessing corrupt files, or it may die when it starts receiving requests of a given type triggering a node local bug. This is bad for distributor nodes, as these restarts create constant ownership transfer between distributors, causing windows where buckets are unavailable.
The cluster controller has functionality for detecting such nodes. If a node restarts in a way that is not detected as a controlled shutdown more than max_premature_crashes times, the cluster controller will set the wanted state of this node to down.
Detecting a controlled restart is currently a bit tricky. A controlled restart is typically initiated by sending a TERM signal to the process. Having no other signal to go by, the content layer has to assume that all TERM signals indicate controlled shutdowns. Thus, if the process keeps being killed by the kernel due to using too much memory, this will look like controlled shutdowns to the content layer.
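The threshold can be tuned with max_premature_crashes in the fleetcontroller config, in the same way as the example above - a sketch, where the value 10 is an arbitrary illustration:

<services version="1.0">
  <config name="vespa.config.content.fleetcontroller">
    <!-- allow up to 10 unexplained restarts before setting the node down -->
    <max_premature_crashes>10</max_premature_crashes>
  </config>
  ...
</services>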
Refer to the distribution algorithm.
Use the distributor status pages to inspect state metrics - see metrics. idealstate.merge_bucket.pending is the best metric to track: it is 0 when the cluster is balanced, and a non-zero value indicates buckets out of sync.
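To check the metric, query a distributor's metrics endpoint - a sketch, where host and port are illustrative; find them with vespa-model-inspect as shown earlier:

$ curl -s http://myhost.mydomain.com:19118/state/v1/metrics | \
    python -m json.tool | grep -A 2 merge_bucket.pending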
vespa prepare will not change the served configuration until vespa activate is run. vespa prepare will warn about all config changes that require a restart.
The cluster is balanced when idealstate.merge_bucket.pending is zero on all distributors.
Do not remove more than redundancy-1 nodes at a time, to avoid data loss. Observe idealstate.merge_bucket.pending to track bucket replica status: when it is zero on all distributor nodes, it is safe to remove more nodes.
If grouped distribution is used to control bucket replicas, a whole group can be removed at a time, as long as the redundancy settings ensure replicas in each group.
To increase the bucket redundancy level before taking nodes out, retire the nodes. Again, track idealstate.merge_bucket.pending to know when the process is done.
Use the /cluster/v2 API or vespa-set-node-state to set a node to retired. The cluster controller's status page lists node states.
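For example, to retire a storage node with vespa-set-node-state - the flags and values below are illustrative; check --help for exact usage:

$ vespa-set-node-state --type storage --index 3 retired 'retiring before removal'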
An alternative to increasing cluster size is building a new cluster, then migrating documents to it. This is supported using visiting.
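A hedged sketch of such a migration with the Vespa CLI, piping visited documents into a feed on the new cluster - the endpoints are placeholders, and the exact flags should be checked against the visiting documentation:

$ vespa visit --target http://old-cluster:8080 | \
    vespa feed - --target http://new-cluster:8080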
To merge two content clusters, add nodes to the cluster as above, considering the following:
- Read changing topology first, and plan the sequence of steps.
- Make sure not to change the distribution-key for nodes in services.xml.
- It is not required to restart nodes as part of this process.
It is possible to run multiple Vespa services on the same host. When changing the services on a given host, stop Vespa on that host before running vespa activate. This is required because services are dynamically allocated port numbers, depending on what else runs on the host. Consider whether some of the changed services are used by services on other hosts; in that case, restart services on those hosts too. Procedure:
1. Stop Vespa services on the host.
2. Run vespa prepare and vespa activate.
3. Start Vespa services on the host.
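As shell commands on the affected host - a sketch, where myapp is a placeholder application package path:

$ vespa-stop-services
$ vespa prepare myapp && vespa activate myapp
$ vespa-start-services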
Also see the FAQ.
**No endpoint**

Most problems with the quick start guides are due to Docker running out of memory. Make sure at least 6G of memory is allocated to Docker:

$ docker info | grep "Total Memory"

or

$ podman info | grep "memTotal"

OOM symptoms include:

INFO: Problem with Handshake localhost:8080 ssl=false: localhost:8080 failed to respond

The container is named vespa in the guides - for a shell, do:

$ docker exec -it vespa bash
**Log viewing**

Use vespa-logfmt to view the Vespa log - example:

$ /opt/vespa/bin/vespa-logfmt -l warning,error
**JSON**

For JSON pretty-printing, append | python -m json.tool to commands that output JSON - or use jq.
**Routing**

Vespa lets applications set up custom document processing / indexing, with different feed endpoints. Refer to indexing for how to configure this in services.xml. #13193 has a summary of problems and solutions.
**Tracing**

Use tracelevel to dump the routes and hops for a write operation - example:

$ curl -H Content-Type:application/json --data-binary @docs.json \
    $ENDPOINT/document/v1/mynamespace/doc/docid/1?tracelevel=4 | jq .
{
    "pathId": "/document/v1/mynamespace/doc/docid/1",
    "id": "id:mynamespace:doc::1",
    "trace": [
        { "message": "[1623413878.905] Sending message (version 7.418.23) from client to ..." },
        { "message": "[1623413878.906] Message (type 100004) received at 'default/container.0' ..." },
        { "message": "[1623413878.907] Sending message (version 7.418.23) from 'default/container.0' ..." },
        { "message": "[1623413878.907] Message (type 100004) received at 'default/container.0' ..." },
        { "message": "[1623413878.909] Selecting route" },
        { "message": "[1623413878.909] No cluster state cached. Sending to random distributor." },
        ...
There have been rare occasions where Vespa stored data that was internally inconsistent. For those circumstances, it is possible to start the node in a validate_and_sanitize_docstore mode. This will do its best to clean up inconsistent data. However, detecting that this is required is not easy - consult the Vespa Team first. For this approach to work, all nodes must be stopped before enabling the feature, to make sure the data is not redistributed.
**Availability vs resources**

Keeping index structures costs resources. Not all replicas of buckets are necessarily searchable, unless configured using searchable-copies. As Vespa indexes buckets on demand, the most cost-efficient setting is 1, if one can tolerate a temporary coverage loss during node failures.
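A sketch of the relevant services.xml elements, with illustrative values - two replicas per bucket, only one kept searchable:

<content version="1.0" id="mycluster">
  <redundancy>2</redundancy>
  <engine>
    <proton>
      <!-- only one replica per bucket is kept indexed/searchable -->
      <searchable-copies>1</searchable-copies>
    </proton>
  </engine>
  ...
</content>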
**Data retention vs size**

When a document is removed, the document data is not immediately purged. Instead, remove-entries (tombstones of removed documents) are kept for a configurable amount of time, with a default of two weeks - refer to removed-db prune age. This ensures that removed documents stay removed in a distributed system where nodes change state. Entries are removed periodically after expiry. Hence, if a node comes back up after being down for more than two weeks, removed documents become available again, unless the data on the node is wiped first. A larger prune age grows the storage size, as it keeps documents and tombstones longer.

Note: The backend does not store remove-entries for nonexistent documents, to prevent clients sending wrong document identifiers from filling a cluster with invalid remove-entries. A side effect is that if a problem has caused all replicas of a bucket to be unavailable, documents in this bucket cannot be marked removed until at least one replica is available again. Documents are written to new bucket replicas while the others are down - if these are removed, older versions of them will not re-emerge, as the most recent change wins.
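A sketch of raising the prune age in services.xml - the element path is an assumption based on the proton tuning reference, and the value (four weeks, in seconds) is illustrative:

<engine>
  <proton>
    <tuning>
      <searchnode>
        <removed-db>
          <prune>
            <age>2419200</age>
          </prune>
        </removed-db>
      </searchnode>
    </tuning>
  </proton>
</engine>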
**Transition time**

See transition-time for the tradeoff between how quickly nodes are set down and system stability.
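A sketch of tuning it under the content cluster in services.xml - 0 disables the grace period; the value is illustrative, in milliseconds:

<tuning>
  <cluster-controller>
    <transition-time>0</transition-time>
  </cluster-controller>
</tuning>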
**Removing unstable nodes**

One can configure how many times a node is allowed to crash before it will automatically be removed. The crash count is reset if the node has been up or down continuously for more than the stable state period. If the crash count exceeds max premature crashes, the node will be disabled. Refer to troubleshooting.
**Minimal amount of nodes required to be available**

A cluster is typically sized to handle a given load. A given percentage of the cluster resources is required for normal operations, and the remainder is the available resources that can be used if some of the nodes are no longer usable. If the cluster loses enough nodes, it will be overloaded.