# Vespa documentation > Vespa is a powerful and scalable engine for low-latency computation over large data sets. It is designed for real-time applications that require a combination of search, recommendation, and personalization. Vespa allows developers to build applications that can handle high volumes of queries and data writes while maintaining fast response times. **Key Features** Vespa offers a rich set of features for building sophisticated search and recommendation applications: * **Real-time Indexing and Search:** Vespa provides low-latency CRUD (Create, Read, Update, Delete) operations, making data searchable in milliseconds after being fed. * **Approximate Nearest Neighbor (ANN) Search:** Vespa includes a highly efficient implementation of the HNSW algorithm for fast and accurate vector search, which can be combined with traditional filters. * **Flexible Ranking and Relevance:** Ranking is a first-class citizen in Vespa. It supports complex, multi-phase ranking expressions and can integrate with machine-learned models (e.g., ONNX, TensorFlow, XGBoost, LightGBM) to deliver highly relevant results. * **Scalability and Elasticity:** Vespa is designed to scale horizontally. Content clusters can be grown or shrunk on the fly without service interruptions, and data is automatically redistributed to maintain a balanced load. * **Rich Data Modeling:** Vespa supports a variety of data types, including structured data, unstructured text, and tensors for vector embeddings. It also supports parent-child relationships to model complex data hierarchies. * **Comprehensive Query Language:** Vespa's query language (YQL) allows for a combination of keyword search, structured filtering, and nearest neighbor search in a single query. * **Component-Based Architecture:** Vespa's container clusters host custom Java components (Searchers, Document Processors) that allow for extensive customization of query and data processing pipelines. **Architecture** A Vespa application consists of two main types of clusters: * **Stateless Container Clusters:** These clusters handle incoming queries and data writes. They host application components that process requests and responses, perform query rewriting, and federate to backend services. * **Stateful Content Clusters:** These clusters are responsible for storing, indexing, and searching data. They automatically manage data distribution and redundancy. Vespa's architecture is designed for high availability and fault tolerance. When a node fails, the system automatically re-routes traffic and re-distributes data to maintain service. **Application Package** A Vespa application is defined by an **application package**, which contains all the necessary configuration, schemas, components, and machine-learned models. This self-contained package allows for atomic deployments and ensures consistency between code and configuration. Key files in an application package include: * **`services.xml`**: Defines the services and clusters that make up the application, including their topology and resource allocation. * **`schemas/*.sd`**: Defines the document types, their fields, and how they should be indexed and searched. Rank profiles are also defined within schemas. **APIs and Interfaces** Vespa provides a comprehensive set of APIs for interacting with the system: * **Document API (`/document/v1/`)**: A REST API for performing CRUD operations on documents. 
* **Query API (`/search/`)**: A powerful API for querying data using YQL, with extensive options for ranking, grouping, and presentation. * **Configuration and Deployment APIs**: REST APIs for deploying application packages and managing system configuration. This overview provides a glimpse into the capabilities of the Vespa Search engine. For more in-depth information, please refer to each of the documentation links below. ## Access Logging ### Access Logging The Vespa access log format allows the logs to be processed by a number of available tools handling JSON based (log) files. #### Access Logging The Vespa access log format allows the logs to be processed by a number of available tools handling JSON based (log) files. With the ability to add custom key/value pairs to the log from any Searcher, you can easily track the decisions done by container components for given requests. ##### Vespa Access Log Format In the Vespa access log, each log event is logged as a JSON object on a single line. The log format defines a list of fields that can be logged with every request. In addition to these fields, [custom key/value pairs](#logging-key-value-pairs-to-the-json-access-log-from-searchers) can be logged via Searcher code. Pre-defined fields: | Name | Type | Description | Always present | | --- | --- | --- | --- | | ip | string | The IP address request came from | yes | | time | number | UNIX timestamp with millisecond decimal precision (e.g. 1477828938.123) when request is received | yes | | duration | number | The duration of the request in seconds with millisecond decimal precision (e.g. 0.123) | yes | | responsesize | number | The size of the response in bytes | yes | | code | number | The HTTP status code returned | yes | | method | string | The HTTP method used (e.g. 'GET') | yes | | uri | string | The request URI from path and beyond (e.g. '/search?query=test') | yes | | version | string | The HTTP version (e.g. 'HTTP/1.1') | yes | | agent | string | The user agent specified in the request | yes | | host | string | The host header provided in the request | yes | | scheme | string | The scheme of the request | yes | | port | number | The IP port number of the interface on which the request was received | yes | | remoteaddr | string | The IP address of the [remote client](#logging-remote-address-port) if specified in HTTP header | no | | remoteport | string | The port used from the [remote client](#logging-remote-address-port) if specified in HTTP header | no | | peeraddr | string | Address of immediate client making request if different from _remoteaddr_ | no | | peerport | string | Port used by immediate client making request if different from _remoteport_ | no | | user-principal | string | The name of the authenticated user (java.security.Principal.getName()) if principal is set | no | | ssl-principal | string | The name of the x500 principal if client is authenticated through SSL/TLS | no | | search | object | Object holding search specific fields | no | | search.totalhits | number | The total number of hits for the query | no | | search.hits | number | The hits returned in this specific response | no | | search.coverage | object | Object holding [query coverage information](graceful-degradation.html) similar to that returned in result set. | no | | connection | string | Reference to the connection log entry. 
See [Connection log](#connection-log) | no |
| attributes | object | Object holding [custom key/value pairs](#logging-key-value-pairs-to-the-json-access-log-from-searchers) logged in searcher. | no |

**Note:** IP addresses can be either IPv4 addresses in standard dotted format (e.g. 127.0.0.1) or IPv6 addresses in standard form with leading zeros omitted (e.g. 2222:1111:123:1234:0:0:0:4321).

An example log line will look like this (here, pretty-printed):

```
{
  "ip": "152.200.54.243",
  "time": 920880005.023,
  "duration": 0.122,
  "responsesize": 9875,
  "code": 200,
  "method": "GET",
  "uri": "/search?query=test&param=value",
  "version": "HTTP/1.1",
  "agent": "Mozilla/4.05 [en] (Win95; I)",
  "host": "localhost",
  "search": {
    "totalhits": 1234,
    "hits": 0,
    "coverage": {
      "coverage": 98,
      "documents": 100,
      "degraded": {
        "non-ideal-state": true
      }
    }
  }
}
```

**Note:** The log format is extendable by design, such that the order of the fields can be changed and new fields can be added between minor versions. Make sure any programmatic log handling uses a proper JSON processor.

Example: Decompress, pretty-print, with human-readable timestamps:

```
$ [jq](https://stedolan.github.io/jq/) '. + {iso8601date:(.time | todateiso8601)}' \
  <(unzstd -c /opt/vespa/logs/vespa/access/JsonAccessLog.default.20210601010000.zst)
```

###### Logging Remote Address/Port

In some cases when a request passes through an intermediate service, this service may add HTTP headers indicating the IP address and port of the real origin client. These values are logged as _remoteaddr_ and _remoteport_ respectively. Vespa will log the contents of any of the following HTTP request headers as _remoteaddr_: _X-Forwarded-For_, _Y-RA_, _YahooRemoteIP_ or _Client-IP_. If more than one of these headers is present, the precedence is in the order listed here, i.e. _X-Forwarded-For_ takes precedence over _Y-RA_. The contents of the _Y-RP_ HTTP request header will be logged as _remoteport_. If the remote address or port differs from that of the immediate client making the HTTP request, the immediate client's address and port are logged as _peeraddr_ and _peerport_ respectively.

##### Configuring Logging

For details on the access logging configuration, see the [accesslog](reference/services-container.html#accesslog) element in the container configuration in _services.xml_. Key configuration options include:

- **fileNamePattern**: Pattern for log file names with time variable support
- **rotationInterval**: Time-based rotation schedule (minutes since midnight)
- **rotationSize**: Size-based rotation threshold in bytes (0 = disabled)
- **rotationScheme**: Either 'sequence' or 'date'
- **compressionFormat**: GZIP or ZSTD compression for rotated files

###### Logging Request Content

Vespa supports logging of request content for specific URI paths. This is useful for inspecting query content of search POST requests or document operations of Document v1 POST/PUT requests. The request content is logged as a base64-encoded string in the JSON access log. To configure request content logging, use the [request-content](reference/services-container.html#request-content) element in the accesslog configuration in _services.xml_. Here is an example of how the request content appears in the JSON access log:

```
{
  ...
  "method": "POST",
  "uri": "/search",
  ...,
  "request-content": {
    "type": "application/json; charset=utf-8",
    "length": 12345,
    "body": ""
  }
}
```

###### File name pattern

The file name pattern is expanded using the time when the file is created.
The following parts in the file name are expanded: | Field | Format | Meaning | Example | | --- | --- | --- | --- | | %Y | YYYY | Year | 2003 | | %m | MM | Month, numeric | 08 | | %x | MMM | Month, textual | Aug | | %d | dd | Date | 25 | | %H | HH | Hour | 14 | | %M | mm | Minute | 30 | | %S | ss | Seconds | 35 | | %s | SSS | Milliseconds | 123 | | %Z | Z | Time zone | -0400 | | %T | Long | System.currentTimeMillis | 1349333576093 | | %% | % | Escape percentage | % | ##### Log rotation Apache httpd style log _rotation_ can be configured by setting the _rotationScheme_. There's two alternatives for the rotationScheme, sequence and date. Rotation can be triggered by time intervals using _rotationInterval_ and/or by file size using _rotationSize_. ###### Sequence rotation scheme The _fileNamePattern_ is used for the active log file name (which in this case will often be a constant string). At rotation, this file is given the name fileNamePattern.N where N is 1 + the largest integer found by extracting the integers from all files ending by .\ in the same directory ``` ``` ###### Date rotation scheme The _fileNamePattern_ is used for the active log file name here too, but the log files are not renamed at rotation. Instead, you must specify a time-dependent fileNamePattern so that each time a new log file is created, the name is unique. In addition, a symlink is created pointing to the active log file. The name of the symlink is specified using _symlinkName_. ``` ``` ###### Rotation interval The time of rotation is controlled by setting _rotationInterval_: ``` ``` The rotationInterval is a list of numbers specifying when to do rotation. Each element represents the number of minutes since midnight. Ending the list with '...' means continuing the [arithmetic progression](https://en.wikipedia.org/wiki/Arithmetic_progression) defined by the two last numbers for the rest of the day. E.g. "0 100 240 480 ..." is expanded to "0 100 240 480 720 960 1200" ###### Log retention Access logs are rotated, but not deleted by Vespa processes. It is up to the application owner to take care of archiving of access logs. ##### Logging Key/Value pairs to the JSON Access Log from Searchers To add a key/value pair to the access log from a searcher, use ``` query/result.getContext(true).logValue(key,value) ``` Such key/value pairs may be added from any thread participating in handling the query without incurring synchronization overhead. If the same key is logged multiple times, the values written will be included in the log as an array of strings rather than a single string value. The key/value pairs are added to the _attributes_ object in the log. 
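For illustration, a minimal Searcher along these lines could produce the attributes shown in the example below. The package, class name and keys are made up for this sketch; `getContext(true).logValue(...)` is the call described above and can be invoked on either the query or the result context:

```java
package com.example;

import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.searchchain.Execution;

/**
 * Illustrative Searcher that adds custom key/value pairs to the JSON access log.
 */
public class AccessLogAttributesSearcher extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        // Logged once - appears as a single string value under "attributes"
        query.getContext(true).logValue("singlevalue", "value1");

        // Logged twice with the same key - appears as an array of strings
        query.getContext(true).logValue("multivalue", "value2");
        query.getContext(true).logValue("multivalue", "value3");

        return execution.search(query);
    }
}
```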
An example log line will then look something like this:

```
{"ip":"152.200.54.243","time":920880005.023,"duration":0.122,"responsesize":9875,"code":200,"method":"GET","uri":"/search?query=test&param=value","version":"HTTP/1.1","agent":"Mozilla/4.05 [en] (Win95; I)","host":"localhost","search":{"totalhits":1234,"hits":0},"attributes":{"singlevalue":"value1","multivalue":["value2","value3"]}}
```

A pretty-printed version of the same example:

```
{
  "ip": "152.200.54.243",
  "time": 920880005.023,
  "duration": 0.122,
  "responsesize": 9875,
  "code": 200,
  "method": "GET",
  "uri": "/search?query=test&param=value",
  "version": "HTTP/1.1",
  "agent": "Mozilla/4.05 [en] (Win95; I)",
  "host": "localhost",
  "search": {
    "totalhits": 1234,
    "hits": 0
  },
  "attributes": {
    "singlevalue": "value1",
    "multivalue": [
      "value2",
      "value3"
    ]
  }
}
```

##### Connection log

In addition to the access log, one entry per connection is written to the connection log. This entry is written on connection close. Available fields:

| Name | Type | Description | Always present |
| --- | --- | --- | --- |
| id | string | Unique ID of the connection, referenced from access log. | yes |
| timestamp | number | Timestamp (ISO8601 format) when the connection was opened | yes |
| duration | number | The duration of the request in seconds with millisecond decimal precision (e.g. 0.123) | yes |
| peerAddress | string | IP address used by immediate client making request | yes |
| peerPort | number | Port used by immediate client making request | yes |
| localAddress | string | The local IP address the request was received on | yes |
| localPort | number | The local port the request was received on | yes |
| remoteAddress | string | Original client ip, if proxy protocol enabled | no |
| remotePort | number | Original client port, if proxy protocol enabled | no |
| httpBytesReceived | number | Number of HTTP bytes received over the connection | no |
| httpBytesSent | number | Number of HTTP bytes sent over the connection | no |
| requests | number | Number of requests sent by the client | no |
| responses | number | Number of responses sent to the client | no |
| ssl | object | Detailed information on ssl connection | no |

##### SSL information

| Name | Type | Description | Always present |
| --- | --- | --- | --- |
| clientSubject | string | Client certificate subject | no |
| clientNotBefore | string | Client certificate valid from | no |
| clientNotAfter | string | Client certificate valid to | no |
| sessionId | string | SSL session id | no |
| protocol | string | SSL protocol | no |
| cipherSuite | string | Name of session cipher suite | no |
| sniServerName | string | SNI server name | no |

---

## Admin Procedures

### Administrative Procedures

Refer to the
[multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA) sample application for a primer on how to set up a cluster - use this as a starting point. #### Administrative Procedures ##### Install Refer to the [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA) sample application for a primer on how to set up a cluster - use this as a starting point. Try the [Multinode testing and observability](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode) sample app to get familiar with interfaces and behavior. ##### Vespa start / stop / restart Start and stop all services on a node: ``` $ $VESPA_HOME/bin/[vespa-start-services](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-start-services)$ $VESPA_HOME/bin/[vespa-stop-services](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-stop-services) ``` Likewise, for the config server: ``` $ $VESPA_HOME/bin/[vespa-start-configserver](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-start-configserver)$ $VESPA_HOME/bin/[vespa-stop-configserver](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-stop-configserver) ``` There is no _restart_ command, do a _stop_ then _start_ for a restart. Learn more about which processes / services are started at [Vespa startup](/en/operations-selfhosted/config-sentinel.html), read the [start sequence](/en/operations-selfhosted/configuration-server.html#start-sequence) and find training videos in the vespaengine [YouTube channel](https://www.youtube.com/@vespaai). Use [vespa-sentinel-cmd](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-sentinel-cmd) to stop/start individual services. **Important:** Running _vespa-stop-services_ on a content node will call[prepareRestart](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-proton-cmd) to optimize restart time, and is the recommended way to stop Vespa on a node. See [multinode](multinode-systems.html#aws-ec2) for _systemd_ /_systemctl_ examples. [Docker containers](/en/operations-selfhosted/docker-containers.html) has relevant start/stop information, too. ###### Content node maintenance mode When stopping a content node _temporarily_ (e.g. for a software upgrade), consider manually setting the node into [maintenance mode](../reference/cluster-v2.html#maintenance) _before_ stopping the node to prevent automatic redistribution of data while the node is down. Maintenance mode must be manually removed once the node has come back online. See also: [cluster state](#cluster-state). Example of setting a node with [distribution key](../reference/services-content.html#node) 42 into `maintenance` mode using [vespa-set-node-state](vespa-cmdline-tools.html#vespa-set-node-state), additionally supplying a reason that will be recorded by the cluster controller: ``` $ vespa-set-node-state --type storage --index 42 maintenance "rebooting for software upgrade" ``` After the node has come back online, clear maintenance mode by marking the node as `up`: ``` $ vespa-set-node-state --type storage --index 42 up ``` Note that if the above commands are executed _locally_ on the host running the services for node 42, `--index 42` can be omitted; `vespa-set-node-state` will use the distribution key of the local node if no `--index` has been explicitly specified. 
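Before and after stopping the node, the state can be read back to verify the change - a sketch using the state tools referenced in this document (output omitted; see the vespa-cmdline-tools reference for the exact options and output format):

```
$ vespa-get-node-state --type storage --index 42
$ vespa-get-cluster-state
```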
##### System status

- Use [vespa-config-status](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-config-status) on a node in [hosts.xml](../reference/hosts.html) to verify all services run with updated config
- Make sure [VESPA\_CONFIGSERVERS](/en/operations-selfhosted/files-processes-and-ports.html#environment-variables) is set and identical on all nodes in hosts.xml
- Use the _cluster controller_ status page (below) to track the status of search/storage nodes.
- Check [logs](../reference/logs.html)
- Use performance graphs, System Activity Report (_sar_) or [status pages](#status-pages) to track load
- Use [query tracing](../reference/query-api-reference.html#trace.level)
- Disk and/or memory might be exhausted and block feeding - recover from [feed block](/en/operations/feed-block.html)

##### Status pages

All Vespa services have status pages for showing health, Vespa version, config, and metrics. Status pages are subject to change at any time - take care when automating. Procedure:

1. **Find the port:** The status pages run on ports assigned by Vespa. To find status page ports, use [vespa-model-inspect](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-model-inspect) to list the services run in the application:

    ```
    $ vespa-model-inspect services
    ```

    To find the status page port for a specific node for a specific service, pick the correct service and run:

    ```
    $ vespa-model-inspect service [Options]
    ```

2. **Get the status and metrics:** _distributor_, _storagenode_, _searchnode_ and _container-clustercontroller_ are content services with status pages. These ports are tagged HTTP. The cluster controller has multiple ports tagged HTTP, where the port tagged STATE is the one with the status page. Try connecting to the root at the port, or /state/v1/metrics. The _distributor_ and _storagenode_ status pages are available at `/`:

    ```
    $ vespa-model-inspect service searchnode
    searchnode @ myhost.mydomain.com : search
    search/search/cluster.search/0
    tcp/myhost.mydomain.com:19110 (STATUS ADMIN RTC RPC)
    tcp/myhost.mydomain.com:19111 (FS4)
    tcp/myhost.mydomain.com:19112 (TEST HACK SRMP)
    tcp/myhost.mydomain.com:19113 (ENGINES-PROVIDER RPC)
    tcp/myhost.mydomain.com:19114 (HEALTH JSON HTTP)

    $ curl http://myhost.mydomain.com:19114/state/v1/metrics
    ...

    $ vespa-model-inspect service distributor
    distributor @ myhost.mydomain.com : content
    search/distributor/0
    tcp/myhost.mydomain.com:19116 (MESSAGING)
    tcp/myhost.mydomain.com:19117 (STATUS RPC)
    tcp/myhost.mydomain.com:19118 (STATE STATUS HTTP)

    $ curl http://myhost.mydomain.com:19118/state/v1/metrics
    ...

    $ curl http://myhost.mydomain.com:19118/
    ...
    ```

3. **Use the cluster controller status page:** A status page for the cluster controller is available at the status port at `http://hostname:port/clustercontroller-status/v1/<clustername>`. If _clustername_ is not specified, the available clusters will be listed. The cluster controller leader status page will show if any nodes are operating with differing cluster state versions. It will also show how many data buckets are pending merging (document set reconciliation) due to replicas either missing or being out of sync.

    ```
    $ [vespa-model-inspect](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-model-inspect) service container-clustercontroller | grep HTTP
    ```

    With multiple cluster controllers, look at the one with a "/0" suffix in its config ID; it is the preferred leader. The cluster state version is listed under the _SSV_ table column. Divergence here usually points to host or networking issues.
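Combining the steps above, locating and opening the cluster controller status page could look like the following sketch. The host name, port and cluster name are illustrative - use the STATE-tagged port reported by vespa-model-inspect and an actual content cluster name:

```
$ vespa-model-inspect service container-clustercontroller | grep HTTP
tcp/myhost.mydomain.com:19050 (STATE STATUS HTTP)

$ curl http://myhost.mydomain.com:19050/clustercontroller-status/v1/
$ curl http://myhost.mydomain.com:19050/clustercontroller-status/v1/mycluster
```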
##### Cluster state

Cluster and node state information is available through the [/cluster/v2 API](../reference/cluster-v2.html). This API can also be used to set a _user state_ for a node - alternatively use:

- [vespa-get-cluster-state](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-get-cluster-state)
- [vespa-get-node-state](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-get-node-state)
- [vespa-set-node-state](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-set-node-state)

Also see the cluster controller [status page](#status-pages). State is persisted in a ZooKeeper cluster, so restarting or changing a cluster controller preserves:

- The last cluster state version number, for cluster controller handover at restarts
- User states set by operators - i.e. nodes manually set to down / maintenance

If the state data is lost, the cluster state is reset - see [cluster controller](../content/content-nodes.html#cluster-controller) for implications.

##### Cluster controller configuration

It is recommended to run cluster controllers on the same hosts as [config servers](/en/operations-selfhosted/configuration-server.html), as they share a ZooKeeper cluster for state, and deploying three nodes is best practice for both. See the [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA) sample app for a working example. To configure the cluster controller, use [services.xml](../reference/services-content.html#cluster-controller) and/or add [configuration](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/fleetcontroller.def) under the _services_ element - example:

```
<services version="1.0">
  <admin version="2.0">
    <config name="vespa.config.content.fleetcontroller">
      <min_time_between_new_systemstates>5000</min_time_between_new_systemstates>
    </config>
    ...
  </admin>
  ...
</services>
```

A broken content node may end up with processes constantly restarting. It may die during initialization due to accessing corrupt files, or it may die when it starts receiving requests of a given type triggering a node-local bug. This is bad for distributor nodes, as these restarts create constant ownership transfer between distributors, causing windows where buckets are unavailable. The cluster controller has functionality for detecting such nodes. If a node restarts in a way that is not detected as a controlled shutdown more than [max\_premature\_crashes](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/fleetcontroller.def) times, the cluster controller will set the wanted state of this node to down. Detecting a controlled restart is currently a bit tricky. A controlled restart is typically initiated by sending a TERM signal to the process. Having no other signal, the content layer has to assume that all TERM signals indicate controlled shutdowns. Thus, if the process keeps being killed by the kernel due to using too much memory, this will look like controlled shutdowns to the content layer.

##### Monitor distance to ideal state

Refer to the [distribution algorithm](../content/idealstate.html). Use distributor [status pages](#status-pages) to inspect state metrics, see [metrics](../content/content-nodes.html#metrics). `idealstate.merge_bucket.pending` is the best metric to track; it is 0 when the cluster is balanced - a non-zero value indicates buckets out of sync.

##### Cluster configuration

- Running `vespa prepare` will not change served configuration until `vespa activate` is run. `vespa prepare` will warn about all config changes that require restart.
- Refer to [schemas](../schemas.html) for how to add/change/remove these.
- Refer to [elasticity](../elasticity.html) for how to add/remove capacity from a Vespa cluster, procedure below. - See [chained components](../components/chained-components.html) for how to add or remove searchers and document processors. - Refer to the [sizing examples](../performance/sizing-examples.html) for changing from a _flat_ to _grouped_ content cluster. ##### Add or remove a content node 1. **Node setup:** Prepare the node by installing software, set up the file systems/directories and set [VESPA\_CONFIGSERVERS](/en/operations-selfhosted/files-processes-and-ports.html#environment-variables). [Start](#vespa-start-stop-restart) the node. 2. **Modify configuration:** Add/remove a [node](../reference/services-content.html#node)-element in _services.xml_ and [hosts.xml](../reference/hosts.html). Refer to [multinode install](multinode-systems.html). Make sure the _distribution-key_ is unique. 3. **Deploy**: [Observe metrics](#monitor-distance-to-ideal-state) to track progress as the cluster redistributes documents. Use the [cluster controller](../content/content-nodes.html#cluster-controller) to monitor the state of the cluster. 4. **Tune performance (optional):** Use [maxpendingidealstateoperations](https://github.com/vespa-engine/vespa/blob/master/storage/src/vespa/storage/config/stor-distributormanager.def) to tune concurrency of bucket merge operations from distributor nodes. Likewise, tune [merges](../reference/services-content.html#merges) - concurrent merge operations per content node. The tradeoff is speed of bucket replication vs use of resources, which impacts the applications' regular load. 5. **Finish:** The cluster is done redistributing when `idealstate.merge_bucket.pending` is zero on all distributors. Do not remove more than _redundancy_-1 nodes at a time, to avoid data loss. Observe `idealstate.merge_bucket.pending` to know bucket replica status, when zero on all distributor nodes, it is safe to remove more nodes. If [grouped distribution](../elasticity.html#grouped-distribution) is used to control bucket replicas, remove all nodes in a group if the redundancy settings ensure replicas in each group. To increase bucket redundancy level before taking nodes out, [retire](../content/content-nodes.html) nodes. Again, track `idealstate.merge_bucket.pending` to know when done. Use the [/cluster/v2 API](../reference/cluster-v2.html) or [vespa-set-node-state](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-set-node-state) to set a node to the _retired_ state. The [cluster controller's](../content/content-nodes.html#cluster-controller) status page lists node states. An alternative to increasing cluster size is building a new cluster, then migrate documents to it. This is supported using [visiting](../visiting.html). To _merge_ two content clusters, add nodes to the cluster like above, considering: - [distribution-keys](../reference/services-content.html#node) must be unique. Modify paths like _$VESPA\_HOME/var/db/vespa/search/mycluster/n3_ before adding the node. - Set [VESPA\_CONFIGSERVERS](/en/operations-selfhosted/files-processes-and-ports.html#environment-variables), then start the node. ##### Topology change Read [changing topology first](/en/elasticity.html#changing-topology), and plan the sequence of steps. Make sure to not change the `distribution-key` for nodes in _services.xml_. It is not required to restart nodes as part of this process ##### Add or remove services on a node It is possible to run multiple Vespa services on the same host. 
If changing the services on a given host, stop Vespa on the given host before running `vespa activate`. This is because the services are dynamically allocated port numbers, depending on what is running on the host. Consider if some of the services changed are used by services on other hosts. In that case, restart services on those hosts too. Procedure: 1. Edit _services.xml_ and _hosts.xml_ 2. Stop Vespa on the nodes that have changes 3. Run `vespa prepare` and `vespa activate` 4. Start Vespa on the nodes that have changes ##### Troubleshooting Also see the [FAQ](../faq.html). | No endpoint | Most problems with the quick start guides are due to Docker out of memory. Make sure at least 6G memory is allocated to Docker: ``` $ docker info | grep "Total Memory" or $ podman info | grep "memTotal" ``` OOM symptoms include ``` INFO: Problem with Handshake localhost:8080 ssl=false: localhost:8080 failed to respond ``` The container is named _vespa_ in the guides, for a shell do: ``` $ docker exec -it vespa bash ``` | | Log viewing | Use [vespa-logfmt](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-logfmt) to view the vespa log - example: ``` $ /opt/vespa/bin/vespa-logfmt -l warning,error ``` | | Json | For json pretty-print, append ``` | python -m json.tool ``` to commands that output json - or use [jq](https://stedolan.github.io/jq/). | | Routing | Vespa lets application set up custom document processing / indexing, with different feed endpoints. Refer to [indexing](../indexing.html) for how to configure this in _services.xml_. [#13193](https://github.com/vespa-engine/vespa/issues/13193) has a summary of problems and solutions. | | Tracing | Use [tracelevel](../reference/document-v1-api-reference.html#request-parameters) to dump the routes and hops for a write operation - example: ``` $ curl -H Content-Type:application/json --data-binary @docs.json \ $ENDPOINT/document/v1/mynamespace/doc/docid/1?tracelevel=4 | jq . { "pathId": "/document/v1/mynamespace/doc/docid/1", "id": "id:mynamespace:doc::1", "trace": [ { "message": "[1623413878.905] Sending message (version 7.418.23) from client to ..." }, { "message": "[1623413878.906] Message (type 100004) received at 'default/container.0' ..." }, { "message": "[1623413878.907] Sending message (version 7.418.23) from 'default/container.0' ..." }, { "message": "[1623413878.907] Message (type 100004) received at 'default/container.0' ..." }, { "message": "[1623413878.909] Selecting route" }, { "message": "[1623413878.909] No cluster state cached. Sending to random distributor." } ``` | ##### Clean start mode There has been rare occasions were Vespa stored data that was internally inconsistent. For those circumstances it is possible to start the node in a [validate\_and\_sanitize\_docstore](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/proton.def) mode. This will do its best to clean up inconsistent data. However, detecting that this is required is not easy, consult the Vespa Team first. In order for this approach to work, all nodes must be stopped before enabling this feature - this to make sure the data is not redistributed. ##### Content cluster configuration | Availability vs resources | Keeping index structures costs resources. Not all replicas of buckets are necessarily searchable, unless configured using [searchable-copies](../reference/services-content.html#searchable-copies). 
As Vespa indexes buckets on-demand, the most cost-efficient setting is 1, if one can tolerate temporary coverage loss during node failures. | | Data retention vs size | When a document is removed, the document data is not immediately purged. Instead, _remove-entries_ (tombstones of removed documents) are kept for a configurable amount of time. The default is two weeks, refer to [removed-db prune age](../reference/services-content.html#removed-db-prune-age). This ensures that removed documents stay removed in a distributed system where nodes change state. Entries are removed periodically after expiry. Hence, if a node comes back up after being down for more than two weeks, removed documents are available again, unless the data on the node is wiped first. A larger _prune age_ will grow the storage size as this keeps document and tombstones longer. **Note:** The backend does not store remove-entries for nonexistent documents. This to prevent clients sending wrong document identifiers from filling a cluster with invalid remove-entries. A side effect is that if a problem has caused all replicas of a bucket to be unavailable, documents in this bucket cannot be marked removed until at least one replica is available again. Documents are written in new bucket replicas while the others are down - if these are removed, then older versions of these will not re-emerge, as the most recent change wins. | | Transition time | See [transition-time](../reference/services-content.html#transition-time) for tradeoffs for how quickly nodes are set down vs. system stability. | | Removing unstable nodes | One can configure how many times a node is allowed to crash before it will automatically be removed. The crash count is reset if the node has been up or down continuously for more than the [stable state period](../reference/services-content.html#stable-state-period). If the crash count exceeds [max premature crashes](../reference/services-content.html#max-premature-crashes), the node will be disabled. Refer to [troubleshooting](#troubleshooting). | | Minimal amount of nodes required to be available | A cluster is typically sized to handle a given load. A given percentage of the cluster resources are required for normal operations, and the remainder is the available resources that can be used if some of the nodes are no longer usable. If the cluster loses enough nodes, it will be overloaded: - Remaining nodes may create disk full situation. This will likely fail a lot of write operations, and if disk is shared with OS, it may also stop the node from functioning. - Partition queues will grow to maximum size. As queues are processed in FIFO order, operations are likely to get long latencies. - Many operations may time out while being processed, causing the operation to be resent, adding more load to the cluster. - When new nodes are added, they cannot serve requests before data is moved to the new nodes from the already overloaded nodes. Moving data puts even more load on the existing nodes, and as moving data is typically not high priority this may never actually happen. To configure what the minimal cluster size is, use [min-distributor-up-ratio](../reference/services-content.html#min-distributor-up-ratio) and [min-storage-up-ratio](../reference/services-content.html#min-storage-up-ratio). 
| Copyright © 2025 - [Cookie Preferences](#) ###### On this page: - [Install](#install) - [Vespa start / stop / restart](#vespa-start-stop-restart) - [Content node maintenance mode](#content-node-maintenance-mode) - [System status](#system-status) - [Status pages](#status-pages) - [Cluster state](#cluster-state) - [Cluster controller configuration](#cluster-controller-configuration) - [Monitor distance to ideal state](#monitor-distance-to-ideal-state) - [Cluster configuration](#cluster-configuration) - [Add or remove a content node](#add-or-remove-a-content-node) - [Topology change](#topology-change) - [Add or remove services on a node](#add-or-remove-services-on-a-node) - [Troubleshooting](#troubleshooting) - [Clean start mode](#clean-start-mode) - [Content cluster configuration](#content-cluster-configuration) --- ## Api ### Vespa API and interfaces - [Deploy API](reference/deploy-rest-api-v2.html): Deploy [application packages](applications.html) to configure a Vespa application #### Vespa API and interfaces ##### Deployment and configuration - [Deploy API](reference/deploy-rest-api-v2.html): Deploy [application packages](applications.html) to configure a Vespa application - [Config API](reference/config-rest-api-v2.html): Get and Set configuration - [Tenant API](reference/application-v2-tenant.html): Configure multiple tenants in the config servers ##### Document API - [Reads and writes](reads-and-writes.html): APIs and binaries to read and update documents - [/document/v1/](reference/document-v1-api-reference.html): REST API for operations based on document ID (get, put, remove, update) - [Feeding API](vespa-feed-client.html): High performance feeding API, the recommended API for feeding data - [JSON feed format](reference/document-json-format.html): The Vespa Document format - [Vespa Java Document API](document-api-guide.html) ##### Query and grouping - [Query API](query-api.html), [Query API reference](reference/query-api-reference.html) - [Query Language](query-language.html), [Query Language reference](reference/query-language-reference.html), [Simple Query Language reference](reference/simple-query-language-reference.html), [Predicate fields](predicate-fields.html) - [Vespa Query Profiles](query-profiles.html) - [Grouping API](grouping.html), [Grouping API reference](reference/grouping-syntax.html) ##### Processing - [Vespa Processing](jdisc/processing.html): Request-Response processing - [Vespa Document Processing](document-processing.html): Feed processing ##### Request processing - [Searcher API](searcher-development.html) - [Federation API](federation.html) - [Web service API](developing-web-services.html) ##### Result processing - [Custom renderer API](result-rendering.html) ##### Status and state - [Health and Metric APIs](operations/metrics.html) - [/cluster/v2 API](reference/cluster-v2.html) Copyright © 2025 - [Cookie Preferences](#) ###### On this page: - [Deployment and configuration](#deployment-and-configuration) - [Document API](#document-api) - [Query and grouping](#query-and-grouping) - [Processing](#processing) - [Request processing](#request-processing) - [Result processing](#result-processing) - [Status and state](#status) --- ## Application Packages Reference ### Application Package Reference This is the [application package](../application-packages.html) reference. #### Application Package Reference This is the [application package](../application-packages.html) reference. An application package is the deployment unit in Vespa. 
To deploy an application, create an application package and [vespa deploy](/en/vespa-cli.html#deployment) or use the [deploy API](deploy-rest-api-v2.html). The application package is a directory of files and subdirectories: | Directory/file | Required | Description | | --- | --- | --- | | [services.xml](services.html) | Yes | Describes which services to run where, and their main configuration. | | [hosts.xml](hosts.html) | No | Vespa Cloud: Not used. See node counts in [services.xml](/en/reference/services). Self-managed: The mapping from logical nodes to actual hosts. | | [deployment.xml](/en/reference/deployment) | Yes, for Vespa Cloud | Specifies which environments and regions the application is deployed to during automated application deployment, as which application instances. This file also specifies other deployment-related configurations like [cloud accounts](/en/cloud/enclave/enclave.html) and [private endpoints](/en/cloud/private-endpoints.html). The file is required when deploying to the [prod environment](/en/cloud/environments.html#prod) - it is ignored (with some exceptions) when deploying to the _dev_ environment. | | [validation-overrides.xml](validation-overrides.html) | No | Override, allowing this package to deploy even if it fails validation. | | [.vespaignore](vespaignore.html) | No | Contains a list of path patterns that should be excluded from the `application.zip` deployed to Vespa. | | [models](stateless-model-reference.html)/ | No | Machine-learned models in the application package. Refer to [stateless model evaluation](../stateless-model-evaluation.html), [Tensorflow](../tensorflow.html), [Onnx](../onnx.html), [XGBoost](../xgboost.html), and [LightGBM](../lightgbm.html), also see [deploying remote models](../application-packages.html#deploying-remote-models) | | [schemas](../schemas.html)/ | No | Contains the \*.sd files describing the document types of the application and how they should be queried and processed. | | [schemas/[schema]](schema-reference.html#rank-profile)/ | No | Contains \*.profile files defining [rank profiles](../ranking.html#rank-profiles). This is an alternative to defining rank profiles inside the schema. | | [security/clients.pem](/en/cloud/security/guide.html) | Yes, for Vespa Cloud | PEM encoded X.509 certificates for data plane access. See the [security guide](/en/cloud/security/guide.html) for how to generate and use. | | [components](../jdisc/container-components.html)/ | No | Contains \*.jar files containing searcher(s) for the JDisc Container. | | [rules](semantic-rules.html)/ | No | Contains \*.sr files containing rule bases for semantic recognition and translation of the query | | [search/query-profiles](query-profile-reference.html)/ | No | Contains \*.xml files containing a named set of search request parameters with values | | [constants](../tensor-user-guide.html#constant-tensors)/ | No | Constant tensors | | [tests](testing.html)/ | No | Test files for automated tests | | ext/ | No | Files that are guaranteed to be ignored by Vespa: They are excluded when processing the application package and cannot be referenced from any other element in it. | Additional files and directories can be placed anywhere in the application package. These will be not be processed explicitly by Vespa when deploying the application package (i.e. they will only be considered if they are referred to from within the application package), but there is no guarantee to how these might be processed in a future release. 
To extend the application package in a way that is guaranteed to be ignored by Vespa in all future releases, use the _ext/_ directory. ##### Deploy | Command | Description | | --- | --- | | upload | Uploads an application package to the config server. Normally not used, as _prepare_ includes _upload_ | | prepare | 1. Verifies that a configuration server is up and running 2. Uploads the application to the configuration server, which stores it in _$VESPA\_HOME/var/db/vespa/config\_server/serverdb/tenants/default/sessions/[sessionid]_. _[sessionid]_ increases for each _prepare_-call. The config server also stores the application in a [ZooKeeper](/en/operations-selfhosted/configuration-server.html) instance at _/config/v2/tenants/default/sessions/[sessionid]_ - this distributes the application to all config servers 3. Creates metadata about the deployed the applications package (which user deployed it, which directory was it deployed from and at what time was it deployed) and stores it in _...sessions/[sessionid]/.applicationMetaData_ 4. Verifies that the application package contains the required files and performs a consistency check 5. Validates the xml config files using the [schema](https://github.com/vespa-engine/vespa/tree/master/config-model/src/main/resources/schema), found in _$VESPA\_HOME/share/vespa/schema_ 6. Checks if there are config changes between the active application and this prepared application that require actions like restart or re-feed (like changes to [schemas](../schemas.html)). These actions are returned as part of the prepare step in the [deployment API](deploy-rest-api-v2.html#prepare-session). This prevents breaking changes to production - also read about [validation overrides](validation-overrides.html) 7. Distributes constant tensors and bundles with [components](../jdisc/container-components.html) to nodes using [file distribution](../application-packages.html#file-distribution). Files are downloaded to _$VESPA\_HOME/var/db/vespa/filedistribution_, URL download starts downloading to _$VESPA\_HOME/var/db/vespa/download_ | | activate | 1. Waits for prepare to complete 2. Activates new configuration version 3. Signals to containers to load new bundles - read more in [container components](../jdisc/container-components.html) | | fetch | Use _fetch_ to download the active application package | An application package can be zipped for deployment: ``` $ zip -r ../app.zip . ``` Use any name for the zip file - then refer to the file instead of the path in [deploy](/en/vespa-cli.html#deployment) commands. **Important:** Using `tar` / `gzip` is not supported.[Details](https://github.com/vespa-engine/vespa/issues/17837). ##### Preprocess directives Use preprocess directives to: - _preprocess:properties_: define properties that one can refer to everywhere in _services.xml_ - _preprocess:include_: split _services.xml_ in smaller chunks Below, _${container.port}_ is replaced by _4099_. The contents of _content.xml_ is placed at the _include_ point. This is applied recursively, one can use preprocess directives in included files, as long as namespaces are defined in the top level file: ``` \ \4099\ \ \ ``` Sample _content.xml_: ``` 1 ``` ##### Versioning application packages An application can be given a user-defined version, available at[/ApplicationStatus](../jdisc/container-components.html#monitoring-the-active-application). Configure the version in [services.xml](services.html) (at top level): ``` 42 ... 
``` Copyright © 2025 - [Cookie Preferences](#) --- ## Application V2 Tenant ### /application/v2/tenant API reference This is the /application/v2/tenant API reference with examples for the HTTP REST API to [list](#list-tenants), [create](#create-tenant) and [delete](#delete-tenant) a tenant, which can be used to [deploy](deploy-rest-api-v2.html) an application. #### /application/v2/tenant API reference This is the /application/v2/tenant API reference with examples for the HTTP REST API to [list](#list-tenants), [create](#create-tenant) and [delete](#delete-tenant) a tenant, which can be used to [deploy](deploy-rest-api-v2.html) an application. The response format is JSON. The tenant value is "default". The current API version is 2. The API port is 19071 - use [vespa-model-inspect](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-model-inspect) service configserver to find config server hosts - example: `http://myconfigserver.mydomain.com:19071/application/v2/tenant/` ##### HTTP requests | HTTP request | application/v2/tenant operation | Description | | --- | --- | --- | | GET | List tenant information. | | | List tenants | ``` /application/v2/tenant/ ``` Example response: ``` ``` [ "default" ] ``` ``` | | | Get tenant | ``` /application/v2/tenant/default ``` Example response: ``` ``` { "message": "Tenant 'default' exists." } ``` ``` | | PUT | Create a new tenant. | | | Create tenant | ``` /application/v2/tenant/default ``` Response: A message with the name of the tenant created - example: ``` ``` { "message" : "Tenant default created." } ``` ``` **Note:** This operation is asynchronous, it will eventually propagate to all config servers. | | DELETE | Delete a tenant. | | | Delete tenant | ``` /application/v2/tenant/default ``` Response: A message with the deleted tenant: ``` ``` { "message" : "Tenant default deleted." } ``` ``` **Note:** This operation is asynchronous, it will eventually propagate to all config servers. | ##### Request parameters None. ##### HTTP status codes Non-exhaustive list of status codes. Any additional info is included in the body of the return call, JSON-formatted. | Code | Description | | --- | --- | | 400 | Bad request. Client error. The error message should indicate the cause. | | 404 | Not found. For example using a session id that does not exist. | | 405 | Method not implemented. E.g. using GET where only POST or PUT is allowed. | | 500 | Internal server error. Generic error. The error message should indicate the cause. | ##### Response format Responses are in JSON format, with the following fields: | Field | Description | | --- | --- | | message | An info/error message. | Copyright © 2025 - [Cookie Preferences](#) --- ## Applications ### Vespa applications You use Vespa by deploying an _application_ to it. #### Vespa applications You use Vespa by deploying an _application_ to it. Why applications? Because Vespa handles both data and the computations you do over them - together an application. An application is specified by an _application package_ - a directory with some files. The application package contains _everything_ that is needed to run your application: Config, schemas, components, ML models, and so on. The _only_ way to change an application is to make the change in the application package and then deploy it again. Vespa will then safely change the running system to match the new application package revision, without impacting queries, writes, or data. 
##### A minimal application package

You can create a complete application package with just a single file: services.xml. This file specifies the clusters that your application should run. It could just be a single stateless cluster - what's called a _container_ cluster; see the first services.xml sketch at the end of this page. Put this in a file called services.xml, and you have created the world's smallest application package. However, this won't do much. Usually you want a `content` cluster as well, which can store data, maintain indexes, and run the distributed part of queries. You'll also want your container cluster to load the necessary middleware for this. With that we get a services file declaring both a container and a content cluster - see the second sketch at the end of this page. This specifies a pretty normal, simple Vespa application, but now we need another file: the schema of the document type we'll use. This goes into the directory `schemas/`, so our application package now looks like this:

```
services.xml
schemas/myschema.sd
```

The schema file describes a kind of data and the computations (such as ranking/scoring) you want to do over it. At minimum it just lists the fields of that data type and whether and how each field should be indexed:

```
schema myschema {
    document myschema {
        field text type string {
            indexing: summary | index
        }
        field embedding type tensor(x[384]) {
            indexing: attribute | index
        }
        field popularity type double {
            indexing: summary | attribute
        }
    }
}
```

With these two files we have specified a fully functional application that can do text, vector and hybrid search with filtering. Rather than creating applications from scratch like this, you can also clone one of our sample applications as a starting point, like we did in [getting started](deploy-an-application.html). To read more on schemas, see the [schemas](schemas.html) guide. To see everything an application package can contain, see the [application package reference](reference/application-packages-reference.html).

##### Deploying applications

To create running instances of an application, or make the changes to one take effect, you _deploy_ it. Deployments to the dev zone and to self-managed clusters set up a single instance, while deployments to production can set up multiple instances in one or more regions. To deploy an application package you use the [deploy command](vespa-cli.html#deployment) in Vespa CLI:

```
$ vespa deploy .
```

This will deploy the application package in the current directory to the current target and the default dev zone (use `vespa deploy -h` to see other options). Deployment to production zones uses a separate command:

```
$ vespa prod deploy .
```

Production deployments also require an additional file in the application package to specify where it should be deployed: deployment.xml. See [production deployment](cloud/production-deployment.html). The recommended way to deploy to production is by setting up a continuous deployment job, see [automated deployments](https://cloud.vespa.ai/en/automated-deployments). Deploying a change to an application package is generally safe to do at any time. It does not disrupt queries and writes, and invalid or destructive changes are rejected before taking effect. You can also add tests that verify the application before deployment to production zones.
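The two services.xml variants referred to above could look roughly like the following sketches. The element names follow the services.xml reference, while the ids, node alias, redundancy value and document type are illustrative; self-managed multi-node setups would map the host alias in hosts.xml.

```
<!-- Sketch 1: container-only services.xml -->
<services version="1.0">
    <container id="default" version="1.0">
        <search/>
        <document-api/>
    </container>
</services>
```

```
<!-- Sketch 2: container plus content cluster for the myschema document type -->
<services version="1.0">
    <container id="default" version="1.0">
        <search/>
        <document-api/>
    </container>
    <content id="mycontent" version="1.0">
        <redundancy>2</redundancy>
        <documents>
            <document type="myschema" mode="index"/>
        </documents>
        <nodes>
            <node hostalias="node1" distribution-key="0"/>
        </nodes>
    </content>
</services>
```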
Copyright © 2025 - [Cookie Preferences](#) --- ## Approximate Nn Hnsw ### Approximate Nearest Neighbor Search using HNSW Index For an introduction to nearest neighbor search, see [nearest neighbor search](nearest-neighbor-search.html) documentation, for practical usage of Vespa's nearest neighbor search, see [nearest neighbor search - a practical guide](nearest-neighbor-search-guide.html), and to have Vespa create vectors for you, see [embedding](embedding.html). #### Approximate Nearest Neighbor Search using HNSW Index For an introduction to nearest neighbor search, see [nearest neighbor search](nearest-neighbor-search.html) documentation, for practical usage of Vespa's nearest neighbor search, see [nearest neighbor search - a practical guide](nearest-neighbor-search-guide.html), and to have Vespa create vectors for you, see [embedding](embedding.html). This document describes how to speed up searches for nearest neighbors by adding a[HNSW index](reference/schema-reference.html#index-hnsw) to the tensor field. Vespa implements a modified version of the Hierarchical Navigable Small World (HNSW) graph algorithm [paper](https://arxiv.org/abs/1603.09320). The implementation in Vespa supports: - **Filtering** - The search for nearest neighbors can be constrained by query filters as the nearest neighbor search in Vespa is expressed as a query operator. The [nearestNeighbor](reference/query-language-reference.html#nearestneighbor) query operator can be combined with other filters or query terms using the [Vespa query language](query-language.html). See many query examples in the [practical guide](nearest-neighbor-search-guide.html#combining-approximate-nearest-neighbor-search-with-query-filters). - **Multi-vector Indexing** - Since Vespa 8.144 multiple vectors per document can be indexed. In this case documents are retrieved by the closest vector in each document compared to the query vector. See the [Multi-vector indexing sample application](https://github.com/vespa-engine/sample-apps/tree/master/multi-vector-indexing)for examples. For use cases and implementation details see the following blog post:[Revolutionizing semantic search with multi-vector HNSW indexing in Vespa](https://blog.vespa.ai/semantic-search-with-multi-vector-indexing/#implementation) - **Real Time Indexing** - CRUD (Create, Add, Update, Remove) vectors in the index with low latency and high throughput. - **Mutable HNSW Graph** - No query or indexing overhead from searching multiple _HNSW_ graphs. In Vespa, there is one graph per tensor field per content node. No segmented or partitioned graph where a query against a content node need to scan multiple HNSW graphs. - **Multithreaded Indexing** - The costly part when performing real time changes to the _HNSW_ graph is distance calculations while searching the graph layers to find which links to change. These distance calculations are performed by multiple indexing threads. ##### Using Vespa's approximate nearest neighbor search The query examples in [nearest neighbor search](nearest-neighbor-search.html) uses exact search, which has perfect accuracy. However, this is computationally expensive for large document volumes as distances are calculated for every document which matches the query filters. To enable fast approximate matching, the tensor field definition needs an `index` directive. A Vespa [document schema](schemas.html) can declare multiple tensor fields with `HNSW` enabled. 
```
field image_embeddings type tensor(i{},x[512]) {
    indexing: summary | attribute | index
    attribute {
        distance-metric: angular
    }
    index {
        hnsw {
            max-links-per-node: 16
            neighbors-to-explore-at-insert: 100
        }
    }
}
field text_embedding type tensor(x[384]) {
    indexing: summary | attribute | index
    attribute {
        distance-metric: prenormalized-angular
    }
    index {
        hnsw {
            max-links-per-node: 24
            neighbors-to-explore-at-insert: 200
        }
    }
}
```

In the schema snippet above, fast approximate search is enabled by building an `HNSW` index for the `image_embeddings` and the `text_embedding` tensor fields. `image_embeddings` indexes multiple vectors per document, while `text_embedding` indexes one vector per document. The two vector fields use different [distance-metric](reference/schema-reference.html#distance-metric) and `HNSW` index settings:

- `max-links-per-node` - a higher value increases recall accuracy, but also memory usage, indexing and search cost.
- `neighbors-to-explore-at-insert` - a higher value increases recall accuracy, but also indexing cost.

The values chosen for these parameters affect accuracy, search performance, memory usage and indexing performance. See [Billion-scale vector search with Vespa - part two](https://blog.vespa.ai/billion-scale-knn-part-two/) for a detailed description of these tradeoffs. See the [HNSW index reference](reference/schema-reference.html#index-hnsw) for details on the index parameters.

###### Indexing throughput

![Real-time indexing throughput](https://blog.vespa.ai/assets/2022-01-27-billion-scale-knn-part-two/throughput.png)

The `HNSW` settings impact indexing throughput: higher values of `max-links-per-node` and `neighbors-to-explore-at-insert` reduce indexing throughput. Example from [Billion-scale vector search with Vespa - part two](https://blog.vespa.ai/billion-scale-knn-part-two/).

###### Memory usage

A higher `max-links-per-node` value means higher memory usage:

![Memory footprint](https://blog.vespa.ai/assets/2022-01-27-billion-scale-knn-part-two/memory.png)

###### Accuracy

![Accuracy](https://blog.vespa.ai/assets/2022-01-27-billion-scale-knn-part-two/ann.png)

Higher `max-links-per-node` and `neighbors-to-explore-at-insert` improve the quality of the graph and recall accuracy. As the search-time parameter [hnsw.exploreAdditionalHits](reference/query-language-reference.html#hnsw-exploreadditionalhits) is increased, the lower combination reaches about 70% recall@10, while the higher combination reaches about 92% recall@10. The improvement in accuracy needs to be weighed against the impact on indexing performance and memory usage.

##### Using approximate nearest neighbor search

With an _HNSW_ index enabled on the tensor field, one can choose between approximate or exact (brute-force) search by using the [approximate query annotation](reference/query-language-reference.html#approximate):

```
{
    "yql": "select * from doc where {targetHits: 100, approximate:false}nearestNeighbor(image_embeddings,query_image_embedding)",
    "hits": 10,
    "input.query(query_image_embedding)": [0.21,0.12,....],
    "ranking.profile": "image_similarity"
}
```

By default, `approximate` is true when searching a tensor field with an `HNSW` index enabled. The `approximate` parameter allows quantifying the accuracy loss of using approximate search. The loss can be calculated by performing an exact neighbor search using `approximate:false`, comparing the retrieved documents with those from `approximate:true`, and calculating the overlap@k metric.
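The overlap@k calculation can be scripted against the [Query API](reference/query-api-reference.html). Below is a minimal sketch, assuming a locally reachable query endpoint and reusing the field and rank-profile names from the example above:

```python
import requests

ENDPOINT = "http://localhost:8080/search/"  # assumption: a locally reachable Vespa query endpoint

def top_k_ids(query_embedding, k=10, approximate=True):
    """Run the nearestNeighbor query with approximate on/off and return the hit ids."""
    body = {
        "yql": ("select * from doc where "
                f"{{targetHits: {k}, approximate: {str(approximate).lower()}}}"
                "nearestNeighbor(image_embeddings, query_image_embedding)"),
        "hits": k,
        "input.query(query_image_embedding)": query_embedding,
        "ranking.profile": "image_similarity",
        "timeout": "5s",  # exact search may need a higher timeout than the 500 ms default
    }
    response = requests.post(ENDPOINT, json=body, timeout=20)
    response.raise_for_status()
    return [hit["id"] for hit in response.json()["root"].get("children", [])]

def overlap_at_k(exact_ids, approx_ids, k=10):
    """Fraction of the exact top-k that the approximate top-k also returned."""
    return len(set(exact_ids[:k]) & set(approx_ids[:k])) / k

query_embedding = [0.21, 0.12] + [0.0] * 510  # assumption: a 512-dimensional query vector
exact = top_k_ids(query_embedding, approximate=False)
approx = top_k_ids(query_embedding, approximate=True)
print(f"overlap@10: {overlap_at_k(exact, approx):.2f}")
```

Averaging overlap@10 over a representative set of query vectors gives an estimate of the recall of the chosen HNSW settings.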
Note that exact searches over a large vector volume require adjustment of the [query timeout](reference/query-api-reference.html#timeout). The default [query timeout](reference/query-api-reference.html#timeout) is 500 ms, which will be too low for an exact search over many vectors. In addition to [targetHits](reference/query-language-reference.html#targethits), there is a [hnsw.exploreAdditionalHits](reference/query-language-reference.html#hnsw-exploreadditionalhits) parameter which controls how many extra nodes in the graph (in addition to `targetHits`) are explored during the graph search. This parameter is used to tune accuracy versus query performance.

##### Combining approximate nearest neighbor search with filters

The [nearestNeighbor](reference/query-language-reference.html#nearestneighbor) query operator can be combined with other query filters using the [Vespa query language](reference/query-language-reference.html) and its query operators. There are two high-level strategies for combining query filters with approximate nearest neighbor search:

- [pre-filtering](https://blog.vespa.ai/constrained-approximate-nearest-neighbor-search/#pre-filtering-strategy) (the default)
- [post-filtering](https://blog.vespa.ai/constrained-approximate-nearest-neighbor-search/#post-filtering-strategy)

These strategies can be configured in a rank profile using [approximate-threshold](reference/schema-reference.html#approximate-threshold) and [post-filter-threshold](reference/schema-reference.html#post-filter-threshold). See [Controlling the filtering behavior with approximate nearest neighbor search](https://blog.vespa.ai/constrained-approximate-nearest-neighbor-search/#controlling-the-filtering-behavior-with-approximate-nearest-neighbor-search) for more details. Note that when using `pre-filtering`, the following query operators are not included when evaluating the filter part of the query:

- [geoLocation](reference/query-language-reference.html#geolocation)
- [predicate](reference/query-language-reference.html#predicate)

These are instead evaluated after the approximate nearest neighbors are retrieved, more like a `post-filter`. This might cause the search to expose fewer hits to ranking than the wanted `targetHits`. Since Vespa 8.78 the `pre-filter` can be evaluated using [multiple threads per query](performance/practical-search-performance-guide.html#multithreaded-search-and-ranking). This can be used to reduce query latency for larger vector datasets where the cost of evaluating the `pre-filter` is significant. Note that searching the `HNSW` index is always single-threaded per query. Multithreaded evaluation when using `post-filtering` has always been supported, but this is less relevant as the `HNSW` index search first reduces the document candidate set based on `targetHits`.

##### Nearest Neighbor Search Considerations

- **targetHits**: The [targetHits](reference/query-language-reference.html#targethits) parameter specifies how many hits one wants to expose to [ranking](ranking.html) _per content node_. Nearest neighbor search is typically used as an efficient retriever in a [phased ranking](phased-ranking.html) pipeline. See [performance sizing](performance/sizing-search.html).
- **Pagination**: Pagination uses the standard [hits](reference/query-api-reference.html#hits) and [offset](reference/query-api-reference.html#offset) query API parameters. There is no caching of results in between pagination requests, so a query for a higher `offset` will cause the search to be performed over again.
This aspect is no different from [sparse search](using-wand-with-vespa.html) not using the nearest neighbor query operator.
- **Total hit count is not accurate**: Technically, all vectors in the searchable index are neighbors; there is no strict boundary between a match and no match. Both exact (`approximate:false`) and approximate (`approximate:true`) usages of the [nearestNeighbor](reference/query-language-reference.html#nearestneighbor) query operator do not produce an accurate `totalCount`. This is the same behavior as with sparse dynamic pruning search algorithms like [weakAnd](reference/query-language-reference.html#weakand) and [wand](reference/query-language-reference.html#wand).
- **Grouping counts are not accurate**: Grouping counts from [grouping](grouping.html) are not accurate when using [nearestNeighbor](reference/query-language-reference.html#nearestneighbor) search. This is the same behavior as with other dynamic pruning search algorithms like [weakAnd](reference/query-language-reference.html#weakand) and [wand](reference/query-language-reference.html#wand). See the [Result diversification](https://blog.vespa.ai/result-diversification-with-vespa/) blog post on how grouping can be combined with nearest neighbor search.

##### Scaling Approximate Nearest Neighbor Search

###### Memory

Vespa tensor fields are [in-memory](attributes.html) data structures, and so is the `HNSW` graph data structure. For large vector datasets the primary memory cost is the raw vector data itself. Using lower tensor cell type precision can reduce the memory footprint significantly; for example, using `bfloat16` instead of `float` saves close to 50% memory usage without significant accuracy loss. Vespa [tensor cell value types](performance/feature-tuning.html#cell-value-types) include:

- `int8` - 1 byte per value. Used to represent binary vectors, for example 64 bits can be represented using 8 `int8` values.
- `bfloat16` - 2 bytes per value. See [bfloat16 floating-point format](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format).
- `float` - 4 bytes per value. Standard float.
- `double` - 8 bytes per value. Standard double.

###### Search latency and document volume

The `HNSW` greedy search algorithm is sublinear (close to log(N), where N is the number of vectors in the graph). This has interesting properties when attempting to add more nodes horizontally using [flat data distribution](performance/sizing-search.html#data-distribution): even if the document volume per node is reduced by a factor of 10, the search latency is only reduced by 50%. Still, flat scaling helps scale document volume, and increases indexing throughput, as vectors are partitioned randomly over a set of nodes. Pure vector search applications (without filtering or re-ranking) should instead attempt to scale up document volume by using larger instance types and maximizing the number of vectors per node. To scale with query throughput, use [grouped data distribution](performance/sizing-search.html#data-distribution) to replicate content. Note that search is not necessarily strongly sublinear if the application uses nearest neighbor search for candidate retrieval in a [multi-phase ranking](phased-ranking.html) pipeline, or combines nearest neighbor search with filters.
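To make the filter combination concrete, a pre-filtered query simply ANDs ordinary query terms with the `nearestNeighbor` operator. A sketch, where the `category` filter field and the `semantic_similarity` rank profile are assumed names:

```
{
    "yql": "select * from doc where category contains 'shoes' and {targetHits: 100}nearestNeighbor(text_embedding, query_embedding)",
    "hits": 10,
    "input.query(query_embedding)": [0.11,0.27,....],
    "ranking.profile": "semantic_similarity"
}
```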
##### HNSW Operations

Changing the [distance-metric](reference/schema-reference.html#distance-metric) for a tensor field with an `hnsw` index requires [restarting](reference/schema-reference.html#changes-that-require-restart-but-not-re-feed), but not re-indexing (re-feeding vectors). Similarly, changing the `max-links-per-node` and `neighbors-to-explore-at-insert` construction parameters requires restarting.

---

## Archive Guide Aws

### AWS Archive guide

Vespa Cloud exports log data, heap dumps, and Java Flight Recorder sessions to buckets in AWS S3.

#### AWS Archive guide

Vespa Cloud exports log data, heap dumps, and Java Flight Recorder sessions to buckets in AWS S3. This guide explains how to access this data. Access to the data must happen through an AWS account controlled by the tenant. Data traffic to access this data is charged to this AWS account. These resources are needed to get started:

- An AWS account
- An IAM Role in that AWS account
- The [AWS command line client](https://aws.amazon.com/cli/)

Access is configured through the Vespa Cloud Console in the tenant account screen. Choose the "archive" tab to see the settings below.

##### Register IAM Role

![Authorize IAM Role](/assets/img/archive-1-aws.png)

First, the IAM Role must be granted access to the S3 buckets in Vespa Cloud. This is done by entering the IAM Role in the setting seen above. Vespa Cloud will then grant that role access to the S3 buckets.

##### Grant access to Vespa Cloud resources

![Allow access to IAM Role](/assets/img/archive-2-aws.png)

Second, the IAM Role must be granted access to resources inside Vespa Cloud. AWS requires the permissions to be registered in both Vespa Cloud's AWS account (step 1) and the tenant's AWS account (step 2). Copy the policy from the user interface and attach it to the IAM Role - or make your own equivalent policy should you have other requirements. For more information, see the [AWS documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html).

##### Access files using AWS CLI

![Download files](/assets/img/archive-3-aws.png)

Once permissions have been granted, the IAM Role can access the contents of the archive buckets. Any AWS S3 client will work, but the AWS command line client is an easy tool to use. The settings page will list all buckets where data is stored, typically one bucket per zone the tenant has applications in. The `--request-payer=requester` parameter is mandatory to make sure network traffic is charged to the correct AWS account.
Refer to [access-log-lambda](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/access-log-lambda/README.md) for how to install and use `aws cli`, which can be used to download logs as in the illustration, or e.g. list objects:

```
$ aws s3 ls --profile=archive --request-payer=requester \
  s3://vespa-cloud-data-prod.aws-us-east-1c-9eb633/vespa-team/
                           PRE album-rec-searcher/
                           PRE cord-19/
                           PRE vespacloud-docsearch/
```

In the example above, the S3 bucket name is _vespa-cloud-data-prod.aws-us-east-1c-9eb633_ and the tenant name is _vespa-team_ (for that particular prod zone). Archiving is per tenant, and a log file is normally stored with a key like:

```
/vespa-team/vespacloud-docsearch/default/h2946a/logs/access/JsonAccessLog.default.20210629100001.zst
```

The URI to this object is hence:

```
s3://vespa-cloud-data-prod.aws-us-east-1c-9eb633/vespa-team/vespacloud-docsearch/default/h2946a/logs/access/JsonAccessLog.default.20210629100001.zst
```

Objects are exported once generated - access log files are compressed and exported at least once per hour. If you are having problems accessing the files, run

```
aws sts get-caller-identity
```

to verify that you are correctly assuming the role which has been granted access.

##### Lambda processing

When processing logs using a lambda function, write a minimal function to list objects first, to sort out access / keys / roles:

```
const aws = require("aws-sdk");
const s3 = new aws.S3({ apiVersion: "2006-03-01" });

// List the keys below a prefix, paying transfer costs as the requester
const findRelevantKeys = ({ Bucket, Prefix }) => {
  console.log(`Finding relevant keys in bucket ${Bucket}`);
  return s3
    .listObjectsV2({ Bucket: Bucket, Prefix: Prefix, RequestPayer: "requester" })
    .promise()
    .then((res) =>
      res.Contents.map((content) => ({ Bucket, Key: content.Key }))
    )
    .catch((err) => Error(err));
};

exports.handler = async (event, context) => {
  const options = { Bucket: "vespa-cloud-data-prod.aws-us-east-1c-9eb633", Prefix: "MY-TENANT-NAME/" };
  return findRelevantKeys(options)
    .then((res) => {
      console.log("response: ", res);
      return { statusCode: 200 };
    })
    .catch((err) => ({ statusCode: 500, message: err }));
};
```

Note: Always set `RequestPayer: "requester"` to access the objects - the transfer cost is assigned to the requester. Once the above lists the log files from S3, review [access-log-lambda](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/access-log-lambda/README.md) for how to write a function to decompress and handle the log data.

---

## Archive Guide Gcp

### GCP Archive guide

Vespa Cloud exports log data, heap dumps, and Java Flight Recorder sessions to buckets in Google Cloud Storage.

#### GCP Archive guide

Vespa Cloud exports log data, heap dumps, and Java Flight Recorder sessions to buckets in Google Cloud Storage. This guide explains how to access this data. Access to the data is through a GCP project controlled by the tenant. Data traffic to access this data is charged to this GCP project. These resources are needed to get started:

- A GCP project
- A Google user account
- The [gcloud command line interface](https://cloud.google.com/sdk/docs/install)

Access is configured through the Vespa Cloud Console in the tenant account screen. Choose the "archive" tab, then the "GCP" tab to see the settings below.

##### Register IAM principal

![Register IAM principal](/assets/img/archive-1-gcp.png)

First, a principal must be granted access to the Cloud Storage bucket in Vespa Cloud.
This is done by entering a [principal](https://cloud.google.com/iam/docs/overview) with a supported prefix. See the accepted format in the description below the input field.

##### Access files using gcloud CLI

![Download files](/assets/img/archive-2-gcp.png)

Once permissions have been granted, the GCP principal can access the contents of the archive buckets. Any Cloud Storage client will work, but the `gsutil` command line client is an easy tool to use. The settings page will list all buckets where data is stored, typically one bucket per zone the tenant has applications in. The `-u` (user project) parameter is mandatory to make sure network traffic is charged to the correct GCP project.

```
$ gsutil -u my-project ls \
  gs://vespa-cloud-data-prod.gcp-us-central1-f-73770f/vespa-team/

gs://vespa-cloud-data-prod.gcp-us-central1-f-73770f/vespa-team/album-rec-searcher/
gs://vespa-cloud-data-prod.gcp-us-central1-f-73770f/vespa-team/cord-19/
gs://vespa-cloud-data-prod.gcp-us-central1-f-73770f/vespa-team/vespacloud-docsearch/
```

In the example above, the bucket name is _vespa-cloud-data-prod.gcp-us-central1-f-73770f_ and the tenant name is _vespa-team_ (for that particular prod zone). Archiving is per tenant, and a log file is normally stored with a key like:

```
/vespa-team/vespacloud-docsearch/default/h7644a/logs/access/JsonAccessLog.20221011080000.zst
```

The URI to this object is hence:

```
gs://vespa-cloud-data-prod.gcp-us-central1-f-73770f/vespa-team/vespacloud-docsearch/default/h7644a/logs/access/JsonAccessLog.20221011080000.zst
```

Objects are exported once generated - access log files are compressed and exported at least once per hour. Note: Always set a user project to access the objects - the transfer cost is assigned to the requester.

---

## Archive Guide

### Archive guide

Vespa Cloud exports log data, heap dumps, and Java Flight Recorder sessions to storage buckets.

#### Archive guide

Vespa Cloud exports log data, heap dumps, and Java Flight Recorder sessions to storage buckets. The bucket system used depends on which cloud provider is backing the zone your application is running in: AWS S3 is used in the AWS zones, and Cloud Storage is used in the GCP zones. How to access and use the storage buckets is found in the documentation for the respective cloud providers:

- [AWS S3](archive-guide-aws)
- [Google Cloud Storage](archive-guide-gcp)

##### Examples

These examples use GCP as the source; replace with AWS commands as needed. Here, _resonant-triode-123456_ is the Google project ID that owns the target bucket _my\_access\_logs_ used for the data copy (and will be charged the data download cost, if any).
Use the CLUSTERS view in the Vespa Cloud Console to find the hostname(s) of the nodes to export logs from - then list contents:

```
$ gsutil -u resonant-triode-123456 ls \
  gs://vespa-cloud-data-prod-gcp-us-central1-f-73770f/mytenant/myapp/

$ gsutil -u resonant-triode-123456 ls \
  gs://vespa-cloud-data-prod-gcp-us-central1-f-73770f/mytenant/myapp/myinstance

$ gsutil -u resonant-triode-123456 ls \
  gs://vespa-cloud-data-prod-gcp-us-central1-f-73770f/mytenant/myapp/myinstance/h404a/logs/access
```

Copy files for a host to the _my\_access\_logs_ bucket:

```
$ gsutil -u resonant-triode-123456 \
  -m -o "GSUtil:parallel_process_count=1" \
  cp -r \
  gs://vespa-cloud-data-prod-gcp-us-central1-f-73770f/vespa-team/vespacloud-docsearch/default/h404a \
  gs://my_access_logs/vespa-files
```

`rsync` can be used to reduce the number of files copied, using `-x` to exclude paths:

```
$ gsutil -u resonant-triode-123456 \
  -m -o "GSUtil:parallel_process_count=1" \
  rsync -r \
  -x '.*/connection/.*|.*/vespa/.*|.*/zookeeper/.*' \
  gs://vespa-cloud-data-prod-gcp-us-central1-f-73770f/vespa-team/vespacloud-docsearch/default/h404a \
  gs://my_access_logs/vespa-files
```

Refer to [cloud-functions](https://github.com/vespa-engine/sample-apps/tree/master/examples/google-cloud/cloud-functions) and [lambda](https://github.com/vespa-engine/sample-apps/tree/master/examples/aws/lambda) for how to write and deploy simple functions to process files in Google Cloud and AWS. For local processing, copy files for a host to the local file system (or use `rsync`):

```
$ gsutil -u resonant-triode-123456 \
  -m -o "GSUtil:parallel_process_count=1" \
  cp -r \
  gs://vespa-cloud-data-prod-gcp-us-central1-f-73770f/vespa-team/vespacloud-docsearch/default/h404a \
  .
```

Use [zstd](https://facebook.github.io/zstd/) to decompress files:

```
$ zstd -d *
```

Example: Filter out healthchecks using [jq](https://stedolan.github.io/jq/):

```
$ cat JsonAccessLog.20230117* | jq '. | select (.uri != "/status.html") | select (.uri != "/state/v1/metrics") | select (.uri != "/state/v1/health")'
```

Add a human-readable date field per access log entry:

```
$ cat JsonAccessLog.20230117* | jq '. | select (.uri != "/status.html") | select (.uri != "/state/v1/metrics") | select (.uri != "/state/v1/health") | . +{iso8601date:(.time|todateiso8601)}'
```

---

## Enclave

### Log archive in Vespa Cloud Enclave

**Warning:** The structure of log archive buckets may change without notice

#### Log archive in Vespa Cloud Enclave

**Warning:** The structure of log archive buckets may change without notice

After an Enclave is established in your cloud provider account using Terraform, the module will have created a storage bucket per Vespa Cloud zone you configured in your Enclave. These storage buckets are used to archive logs from the machines that run Vespa inside your account. There will be one storage bucket per Vespa Cloud zone that is configured in the Enclave. The name of the bucket will depend on the cloud provider you are setting up the Enclave in. Files are synchronized to the archive bucket when a file is rotated by the logging system, or when a virtual machine is deprovisioned from the application. The consequence of this is that the frequency of uploads depends on the activity of the Vespa application.

##### Directory structure

The directory structure in the bucket is as follows:

```
<tenant>/<application>/<instance>/<host>/logs/<logtype>/<logfile>
```

- `tenant` is the tenant ID.
- `application` is the application ID that generated the log.
- `instance` is the instance ID that generated the log, e.g. `default`.
- `host` is the name prefix of the host that generated the log, e.g. `e103a`.
- `logtype` is the type of log in the directory (see below).
- `logfile` is the specific file of the log.

##### Log types

There are three log types that are synced to this bucket:

- `vespa`: [Vespa logs](https://docs.vespa.ai/en/reference/logs.html)
- `access`: [Access logs](https://docs.vespa.ai/en/access-logging.html)
- `connection`: [Connection logs](https://docs.vespa.ai/en/access-logging.html#connection-log)

---

### Vespa Cloud Enclave AWS Architecture

Each Enclave in the tenant AWS account corresponds to a Vespa Cloud [zone](https://cloud.vespa.ai/en/reference/zones.html).

#### Vespa Cloud Enclave AWS Architecture

Each Enclave in the tenant AWS account corresponds to a Vespa Cloud [zone](https://cloud.vespa.ai/en/reference/zones.html). Inside the tenant AWS account, one Enclave is contained within one single [VPC](https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html).

![Enclave architecture](/assets/img/vespa-cloud-enclave-aws.png)

###### EC2 Instances, Load Balancers, and S3 buckets

Configuration Servers inside the Vespa Cloud zone make the decision to create or destroy EC2 instances ("Vespa Hosts" in the diagram) based on the Vespa applications that are deployed. The Configuration Servers also set up the Network Load Balancers needed to communicate with the deployed Vespa application. Each Vespa Host will periodically sync its logs to an S3 bucket ("Log Archive"). This bucket is "local" to the Enclave and provisioned by the Terraform module inside the tenant's AWS account.

###### Networking

The Enclave VPC is very network restricted. Vespa Hosts do not have public IPv4 addresses and there is no [NAT gateway](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html) available in the VPC. Vespa Hosts have public IPv6 addresses and are able to make outbound connections. Inbound connections are not allowed. Outbound IPv6 connections are used to bootstrap communication with the Configuration Servers, and to report operational metrics back to Vespa Cloud. When a Vespa Host is booted, it will set up an encrypted tunnel back to the Configuration Servers. All communication between Configuration Servers and the Vespa Hosts runs over this tunnel after it is set up.

###### Security

The Vespa Cloud operations team does _not_ have any direct access to the resources that are part of the customer account. The only possible access is through the management APIs needed to run Vespa itself. If direct access is needed, e.g. for incident debugging, it can only be granted to the Vespa team by the tenant itself. For further details, see the documentation for the [`ssh`-submodule](https://registry.terraform.io/modules/vespa-cloud/enclave/aws/latest/submodules/ssh). All communication between the Enclave and the Vespa Cloud configuration servers is encrypted, authenticated and authorized using [mTLS](https://en.wikipedia.org/wiki/Mutual_authentication#mTLS) with identities embedded in the certificate. mTLS communication is facilitated with the [Athenz](https://www.athenz.io/) service. All data stored is encrypted at rest using [KMS](https://docs.aws.amazon.com/kms/latest/developerguide/overview.html). All keys are managed by the tenant in the tenant's AWS account.
The resources provisioned in the tenant AWS account are either provisioned by the Terraform module executed by the tenant, or by the orchestration services inside a Vespa Cloud Zone. Resources are provisioned by the Vespa Cloud configuration servers, using the [`provision_policy`](https://github.com/vespa-cloud/terraform-aws-enclave/blob/main/modules/provision/main.tf) AWS IAM policy document defined in the Terraform module. The tenant that registered the AWS account is the only tenant that can deploy applications targeting the Enclave. For more general information about security in Vespa Cloud, see the [whitepaper](https://cloud.vespa.ai/en/security/whitepaper).

---

### Getting started with Vespa Cloud Enclave in AWS

Setting up Vespa Cloud Enclave requires:

#### Getting started with Vespa Cloud Enclave in AWS

Setting up Vespa Cloud Enclave requires:

1. Registration at [Vespa Cloud](https://console.vespa-cloud.com), or use of a pre-existing tenant.
2. Registration of the AWS account ID in Vespa Cloud.
3. Running a [Terraform](https://www.terraform.io/) configuration to provision AWS resources in the account. Go through the [AWS tutorial](https://developer.hashicorp.com/terraform/tutorials/aws-get-started) as needed.
4. Deployment of a Vespa application.

###### 1. Vespa Cloud Tenant setup

Register at [Vespa Cloud](https://console.vespa-cloud.com) or use an existing tenant. Note that the tenant must be on a [paid plan](https://vespa.ai/pricing/).

###### 2. Onboarding

Contact [support@vespa.ai](mailto:support@vespa.ai) stating which tenant should be onboarded to use Vespa Cloud Enclave. Also include the [AWS account ID](https://docs.aws.amazon.com/accounts/latest/reference/manage-acct-identifiers.html#FindAccountId) to associate with the tenant.

**Note:** We recommend using a _dedicated_ account for your Vespa Cloud Enclave. Vespa Cloud will manage resources in the Enclave VPCs created in the AWS resource provisioning step - primarily EC2 instances, load balancers and service endpoints. One account can host all your Vespa applications; there is no need for multiple tenants or accounts.

###### 3. Configure AWS Account

The same AWS account used in step two must be prepared for deploying Vespa applications using either _Terraform_ or _CloudFormation_.

###### Terraform

Use [Terraform](https://www.terraform.io/) to set up the necessary resources using the [modules](https://registry.terraform.io/modules/vespa-cloud/enclave/aws/latest) published by the Vespa team. Modify the [multi-region Terraform files](https://github.com/vespa-cloud/terraform-aws-enclave/blob/main/examples/multi-region/main.tf) for your deployment. If you are unfamiliar with Terraform: it is a tool to manage resources and their configuration in various cloud providers, like AWS and GCP. Terraform has published an [AWS](https://developer.hashicorp.com/terraform/tutorials/aws-get-started) tutorial, and we strongly encourage Enclave users to read and follow the Terraform recommendations for [CI/CD](https://developer.hashicorp.com/terraform/tutorials/automation/automate-terraform). The Terraform module we provide is regularly updated to add new required resources or extra permissions for Vespa Cloud to automate the operations of your applications. In order for your enclave applications to use the new features, you must re-apply your Terraform templates with the latest release. The [notification system](/en/cloud/notifications.html) will let you know when a new release is available.
###### CloudFormation

Vespa also supports CloudFormation if you prefer the AWS-native solution. Download the CloudFormation stacks from our [GitHub repository](https://github.com/vespa-cloud/cloudformation-aws-enclave) and refer to the README for stack-specific instructions.

###### 4. Deploy a Vespa application

By default, all applications are deployed on resources in Vespa Cloud accounts. To deploy in your Enclave account, update [deployment.xml](/en/reference/deployment.html) to reference the account used in step two. Useful resources are [getting started](/en/cloud/getting-started) and [migrating to Vespa Cloud](/en/cloud/migrating-to-cloud.html) - put _deployment.xml_ next to _services.xml_.

##### Next steps

After a successful deployment to the [dev](https://cloud.vespa.ai/en/reference/environments.html#dev) environment, iterate on the configuration to implement your application on Vespa. The _dev_ environment is ideal for this, with rapid deployment cycles. For production serving, deploy to the [prod](https://cloud.vespa.ai/en/reference/environments.html#prod) environment - follow the steps in [production deployment](/en/cloud/production-deployment.html).

##### Enclave teardown

To tear down a Vespa Cloud Enclave system, do the steps above in reverse order:

1. [Undeploy the application(s)](/en/cloud/deleting-applications.html)
2. Undeploy the Terraform changes

It is important to undeploy the Vespa application(s) first: once the Terraform-managed resources are removed, Vespa Cloud can no longer manage the resources that were allocated, so you would have to clean these up yourself.

---

### Vespa Cloud Enclave

![enclave architecture](/assets/img/enclave-architecture.png)

#### Vespa Cloud Enclave

![enclave architecture](/assets/img/enclave-architecture.png)

Vespa Cloud Enclave allows Vespa Cloud applications to run inside the tenant's own cloud accounts while everything is still fully managed by Vespa Cloud's automation, giving the tenant full access to Vespa Cloud features inside their own cloud account. This allows tenant data to always remain within the bounds of services controlled by the tenant, and also makes it possible to build closer integrations with Vespa applications inside those cloud services. Vespa Cloud Enclave is available in AWS and GCP. Azure is on the roadmap.

**Note:** As the Vespa Cloud Enclave resources run in _your_ account, this incurs resource costs from your cloud provider in _addition_ to the Vespa Cloud costs.
##### AWS

- [Getting started](/en/cloud/enclave/aws-getting-started)
- [Architecture and security](/en/cloud/enclave/aws-architecture)

##### GCP

- [Getting started](/en/cloud/enclave/gcp-getting-started)
- [Architecture and security](/en/cloud/enclave/gcp-architecture)

##### Guides

- [Log archive](/en/cloud/enclave/archive)
- [Operations and Support](/en/cloud/enclave/operations)

##### FAQ

**What kind of permissions are needed for the Vespa control plane to access my AWS accounts / GCP projects?** The permissions required are coded into the Terraform modules found at:

- [terraform-aws](https://github.com/vespa-cloud/terraform-aws-enclave/tree/main)
- [terraform-google](https://github.com/vespa-cloud/terraform-google-enclave/tree/main)

Navigate to the _modules_ directory for details.

**How can I configure agents/daemons on Vespa hosts securely?** Use Terraform to grant Vespa hosts access to the necessary secrets, and create an RPM that retrieves them and configures your application. See [enclave-examples](https://github.com/vespa-cloud/enclave-examples/tree/main/systemd-secrets) for a complete example.

---

### Architecture for Vespa Cloud Enclave in GCP

Each Enclave in the tenant GCP project corresponds to a Vespa Cloud [zone](https://cloud.vespa.ai/en/reference/zones.html).

#### Architecture for Vespa Cloud Enclave in GCP

###### Architecture

Each Enclave in the tenant GCP project corresponds to a Vespa Cloud [zone](https://cloud.vespa.ai/en/reference/zones.html). Inside the tenant GCP project, one Enclave is contained within one single [VPC](https://cloud.google.com/vpc/).

![Enclave architecture](/assets/img/vespa-cloud-enclave-gcp.png)

###### Compute Instances, Load Balancers, and Cloud Storage buckets

Configuration Servers inside the Vespa Cloud zone make the decision to create or destroy compute instances ("Vespa Hosts" in the diagram) based on the Vespa applications that are deployed. The Configuration Servers also set up the Network Load Balancers needed to communicate with the deployed Vespa application. Each Vespa Host will periodically sync its logs to a Cloud Storage bucket ("Log Archive"). This bucket is "local" to the Enclave and provisioned by the Terraform module inside the tenant's GCP project.

###### Networking

The Enclave VPC is very network restricted. Vespa Hosts do not have public IPv4 addresses and there is no [NAT gateway](https://cloud.google.com/nat/docs/overview) available in the VPC. Vespa Hosts have public IPv6 addresses and are able to make outbound connections. Inbound connections are not allowed. Outbound IPv6 connections are used to bootstrap communication with the Configuration Servers, and to report operational metrics back to Vespa Cloud. When a Vespa Host is booted, it will set up an encrypted tunnel back to the Configuration Servers. All communication between Configuration Servers and the Vespa Hosts runs over this tunnel after it is set up.

###### Security

The Vespa Cloud operations team does _not_ have any direct access to the resources that are part of the customer account. The only possible access is through the management APIs needed to run Vespa itself. If direct access is needed, e.g. for incident debugging, it can only be granted to the Vespa team by the tenant itself. Enabling direct access is done by setting the `enable_ssh` input to true in the enclave module.
For further details, see the documentation for the [enclave module inputs](https://registry.terraform.io/modules/vespa-cloud/enclave/google/latest/?tab=inputs). All communication between the Enclave and the Vespa Cloud configuration servers is encrypted, authenticated and authorized using [mTLS](https://en.wikipedia.org/wiki/Mutual_authentication#mTLS) with identities embedded in the certificate. mTLS communication is facilitated with the [Athenz](https://www.athenz.io/) service. All data stored is encrypted at rest using [Cloud Key Management](https://cloud.google.com/security-key-management). All keys are managed by the tenant in the tenant's GCP project. The resources provisioned in the tenant GCP project are either provisioned by the Terraform module executed by the tenant, or by the orchestration services inside a Vespa Cloud zone. Resources are provisioned by the Vespa Cloud configuration servers, using the [`vespa_cloud_provisioner_role`](https://github.com/vespa-cloud/terraform-google-enclave/blob/main/main.tf) IAM role defined in the Terraform module. The tenant that registered the GCP project is the only tenant that can deploy applications targeting the Enclave. For more general information about security in Vespa Cloud, see the [whitepaper](https://cloud.vespa.ai/en/security/whitepaper).

---

### Getting started with Vespa Cloud Enclave in GCP

Setting up Vespa Cloud Enclave requires:

#### Getting started with Vespa Cloud Enclave in GCP

Setting up Vespa Cloud Enclave requires:

1. Registration at [Vespa Cloud](https://console.vespa-cloud.com), or use of a pre-existing tenant.
2. Registration of the GCP project in Vespa Cloud.
3. Running a [Terraform](https://www.terraform.io/) configuration to provision the necessary GCP resources in the project.
4. Deployment of a Vespa application.

###### 1. Vespa Cloud Tenant setup

Register at [Vespa Cloud](https://console.vespa-cloud.com) or use an existing tenant. Note that the tenant must be on a [paid plan](https://vespa.ai/pricing/).

###### 2. Onboarding

Contact [support@vespa.ai](mailto:support@vespa.ai) stating which tenant should be onboarded to use Vespa Cloud Enclave. Also include the [GCP Project ID](https://cloud.google.com/resource-manager/docs/creating-managing-projects#identifying_projects) to associate with the tenant.

**Note:** We recommend using a _dedicated_ project for your Vespa Cloud Enclave. Resources in this project will be fully managed by Vespa Cloud. One project can host all your Vespa applications; there is no need for multiple tenants or projects.

###### 3. Configure GCP Project

The same project used in step two must be prepared for deploying Vespa applications. Use [Terraform](https://www.terraform.io/) to set up the necessary resources using the [modules](https://registry.terraform.io/modules/vespa-cloud/enclave/google/latest) published by the Vespa team. Modify the [multi-region example](https://github.com/vespa-cloud/terraform-google-enclave/blob/main/examples/multi-region/main.tf) for your deployment. If you are unfamiliar with Terraform: it is a tool to manage resources and their configuration in various cloud providers, like AWS and GCP. Terraform has published a [GCP](https://developer.hashicorp.com/terraform/tutorials/gcp-get-started) tutorial, and we strongly encourage Enclave users to read and follow the Terraform recommendations for [CI/CD](https://developer.hashicorp.com/terraform/tutorials/automation/automate-terraform).
The Terraform module we provide is regularly updated to add new required resources or extra permissions for Vespa Cloud to automate the operations of your applications. In order for your enclave applications to use the new features, you must re-apply your Terraform templates with the latest release. The [notification system](/en/cloud/notifications.html) will let you know when a new release is available.

###### 4. Deploy a Vespa application

By default, all applications are deployed on resources in Vespa Cloud accounts. To deploy in your Enclave account, update [deployment.xml](/en/reference/deployment.html) to reference the GCP project used in step two. Useful resources are [getting started](/en/cloud/getting-started) and [migrating to Vespa Cloud](/en/cloud/migrating-to-cloud.html) - put _deployment.xml_ next to _services.xml_.

##### Next steps

After a successful deployment to the [dev](https://cloud.vespa.ai/en/reference/environments.html#dev) environment, iterate on the configuration to implement your application on Vespa. The _dev_ environment is ideal for this, with rapid deployment cycles. For production serving, deploy to the [prod](https://cloud.vespa.ai/en/reference/environments.html#prod) environment - follow the steps in [production deployment](/en/cloud/production-deployment.html).

##### Enclave teardown

To tear down a Vespa Cloud Enclave system, do the steps above in reverse order:

1. [Undeploy the application(s)](/en/cloud/deleting-applications.html)
2. Undeploy the Terraform changes

It is important to undeploy the Vespa application(s) first: once the Terraform-managed resources are removed, Vespa Cloud can no longer manage the resources that were allocated, so you would have to clean these up yourself.

##### Troubleshooting

**Identities restricted by domain**: If your GCP organization is using [domain restriction for identities](https://cloud.google.com/resource-manager/docs/organization-policy/restricting-domains), you will need to permit Vespa.ai GCP identities to be added to your project. For Vespa Cloud, the organization ID to allow identities from is _1056130768533_, and the Google Customer ID is _C00u32w3e_.

---

### Operations and Support for Vespa Cloud Enclave

Vespa Cloud Enclave requires that resources provisioned within the VPC are wholly managed by the Vespa Cloud orchestration services, and must not be manually managed by tenant operations.

#### Operations and Support for Vespa Cloud Enclave

Vespa Cloud Enclave requires that resources provisioned within the VPC are wholly managed by the Vespa Cloud orchestration services, and must not be manually managed by tenant operations. Changing or removing the resources created by the Configuration Servers will negatively impact your Vespa application and may prevent Vespa Cloud from properly managing the applications, as well as prevent Vespa engineers from supporting it. The Terraform modules might see occasional backwards-compatible updates. It is recommended that the tenant applies updates to their system on a regular basis. For more information, see the Terraform documentation on [using Terraform in automation](https://developer.hashicorp.com/terraform/tutorials/automation/automate-terraform).
The network access granted to Vespa Hosts must be in place for the Vespa application to operate properly. If network access is restricted, the Vespa application might stop working.

##### Quota

Make sure your organization's AWS or GCP quotas are set high enough to support common Vespa Cloud use cases. A common use case is migrating to new instance types, which temporarily doubles (or more) resource usage during the data migration transition period. Other use cases with temporarily increased resource usage are node replacements. Best practice is to ensure the quota is 3x current resource usage, to also cover capacity expansion. This is not to be confused with the [Vespa Cloud quota](https://cloud.vespa.ai/en/reference/quota).

---

## Attributes

### Attributes

An _attribute_ is a [schema](reference/schema-reference.html#attribute) keyword, specifying the indexing for a field:

#### Attributes

An _attribute_ is a [schema](reference/schema-reference.html#attribute) keyword, specifying the indexing for a field:

```
field price type int {
    indexing: attribute
}
```

Attribute properties and use cases:

- Flexible [match modes](reference/schema-reference.html#match) including exact match, prefix match, and case-sensitive matching, but not text matching (tokenization and linguistic processing).
- High sustained update rates (avoiding read-apply-write patterns). Any mutating operation against an attribute field is written to Vespa's [transaction log](proton.html#transaction-log) and persisted, but appending to the log is sequential access, not random. Read more in [partial updates](partial-updates.html).
- Instant query updates - values are immediately searchable.
- [Document Summaries](document-summaries.html) are memory-only operations if all fields are attributes.
- [Numerical range queries](reference/query-language-reference.html#numeric). ``` where price > 100 ```
- [Grouping](grouping.html) - aggregate results into groups - it is also great for generating diversity in results. ``` all(group(customer) each(max(3) each(output(summary())))) ```
- [Ranking](ranking.html) - use attribute values directly in rank functions. ``` rank-profile rank_fresh { first-phase { expression { freshness(timestamp) } } } ```
- [Sorting](reference/sorting.html) - order results by attribute value. ``` order by price asc, release_date desc ```
- [Parent/child](parent-child.html) - import attribute values from global parent documents. ``` import field advertiser_ref.name as advertiser_name {} ```

The other field option is _index_ - use [index](proton.html#index) for fields used for [text search](text-matching.html), with [stemming](linguistics.html#stemming) and [normalization](linguistics.html#normalization). An attribute is an in-memory data structure. Attributes speed up query execution and [document updates](partial-updates.html), trading off memory. As data structures are regularly optimized, consider both static and temporary resource usage - see [attribute memory usage](#attribute-memory-usage) below. Use attributes in document summaries to limit storage accesses when generating result sets.

![Attribute is an in-memory data structure](/assets/img/attributes-update.svg)

Configuration overview (also see the [reference](reference/schema-reference.html#attribute)):

- **fast-search** -
Add an [index structure](#index-structures) to improve query performance: ``` field titles type array<string> { indexing: summary | attribute attribute: fast-search } ```
- **fast-access** - For high-throughput updates, all nodes with a replica should have the attribute loaded in memory. Depending on replication factor and other configuration, this is not always the case. Use [fast-access](reference/schema-reference.html#attribute) to increase feed rates by having replicas on all nodes in memory - see the [reference](reference/schema-reference.html#attribute) and [sizing feeding](performance/sizing-feeding.html). ``` field titles type array<string> { indexing: summary | attribute attribute: fast-access } ```
- **distance-metric** - Features like [nearest neighbor search](nearest-neighbor-search.html) require a [distance-metric](reference/schema-reference.html#distance-metric), and can also have an `hnsw` index to speed up queries. Read more in [approximate nearest neighbor](approximate-nn-hnsw.html). Pay attention to the field's `index` setting to enable the index: ``` field image_sift_encoding type tensor(x[128]) { indexing: summary | attribute | index attribute { distance-metric: euclidean } index { hnsw { max-links-per-node: 16 neighbors-to-explore-at-insert: 500 } } } ```

##### Data structures

The attribute field's data type decides which data structures are used by the attribute to store values for that field across all documents on a content node. For some data types, a combination of data structures is used:

- _Attribute Multivalue Mapping_ stores arrays of values for array and weighted set types.
- _Attribute Enum Store_ stores unique strings for all string attributes and unique values for attributes with [fast-search](attributes.html#fast-search).
- _Attribute Tensor Store_ stores tensor values for all tensor attributes.

In the following illustration, a row represents a document, while a named column represents an attribute.

![Attribute in-memory stores](/assets/img/attributes.svg)

Attributes can be:

| Type | Size | Description |
| --- | --- | --- |
| Single-valued | Fixed | Like the "A" attribute, example `int`. The element size is the size of the type, like 4 bytes for an integer. A memory buffer (indexed by Local ID) holds all values directly. |
| Multi-valued | Fixed | Like the "B" attribute, example `array<int>`. A memory buffer (indexed by Local ID) holds references (32 bit) to where in the _Multivalue Mapping_ the arrays are stored. The _Multivalue Mapping_ consists of multiple memory buffers, where arrays of the same size are co-located in the same buffer. |
| Multi-valued | Variable | Like the "B" attribute, example `array<string>`. A memory buffer (indexed by Local ID) holds references (32 bit) to where in the _Multivalue Mapping_ the arrays are stored. The unique strings are stored in the _Enum Store_, and the arrays in the _Multivalue Mapping_ store references (32 bit) to the strings in the _Enum Store_. The _Enum Store_ consists of multiple memory buffers. |
| Single-valued | Variable | Like the "C" attribute, example `string`. A memory buffer (indexed by Local ID) holds references (32 bit) to where in the _Enum Store_ the strings are stored. |
| Tensor | Fixed / Variable | Like the "D" attribute, example `tensor(x{},y[64])`. A memory buffer (indexed by Local ID) holds references (32 bit) to where in the _Tensor Store_ the tensor values are stored. The memory layout in the _Tensor Store_ depends on the tensor type. |
| The "A", "B", "C" and "D" attribute memory buffers have attribute values or references in Local ID (LID) order - see [document meta store](#document-meta-store). When updating an attribute, the full value is written. This also applies to [multivalue](schemas.html#field) fields - example adding an item to an array: 1. Space for the new array is reserved in a memory buffer 2. The current value is copied 3. The new element is written This means that larger fields will copy more data at updates. It also implies that updates to [weighted sets](reference/schema-reference.html#weightedset) are faster when using numeric keys (less memory and easier comparisons). Data stored in the _Multivalue Mapping_, _Enum Store_ and _Tensor Store_ is referenced using 32 bit references. This address space can go full, and then feeding is blocked - [learn more](operations/feed-block.html). For array or weighted set attributes, the max limit on the number of documents that can have the same number of values is approx 2 billion per node. For string attributes or attributes with [fast-search](attributes.html#fast-search), the max limit on the number of unique values is approx 2 billion per node. ##### Index structures Without `fast-search`, attribute access is a memory lookup, being one value or all values, depending on query execution. An attribute is a linear array-like data structure - matching documents potentially means scanning _all_ attribute values. Setting [fast-search](reference/schema-reference.html#attribute) creates an index structure for quicker lookup and search. This consists of a [dictionary](reference/schema-reference.html#dictionary) pointing to posting lists. This uses more memory, and also more CPU when updating documents. It increases steady state memory usage for all attribute types and also add initialization overhead for numeric types. The default dictionary is a b-tree of attribute _values_, pointing to an _occurrence_ b-tree (posting list) of local doc IDs for each value, exemplified in the A-attribute below. Using `dictionary: hash` on the attribute generates a hash table of attributes values pointing to the posting lists, as in the C-attribute (short posting lists are represented as arrays instead of b-trees): ![Attribute index structures](/assets/img/attributes-indexes.svg) Notes: - If a value occurs in many documents, the _occurrence_ b-tree grows large. For such values, a boolean-occurrence list (i.e. bitvector) is generated in addition to the b-tree. - Setting `fast-search` is not observable in the files on disk, other than size. - `fast-search` causes a memory increase even for empty fields, due to the extra index structures created. E.g. single value fields will have the "undefined value" when empty, and there is a posting list for this value. - The _value_ b-tree enables fast range-searches in numerical attributes. This is also available for `hash`-based dictionaries, but slower as a full scan is needed. Using `fast-search` has many implications, read more in [when to use fast-search](performance/feature-tuning.html#when-to-use-fast-search-for-attribute-fields). ##### Attribute memory usage Attribute structures are regularly optimized, and this causes temporary resource usage - read more in [maintenance jobs](proton.html#proton-maintenance-jobs). 
The memory footprint of an attribute depends on a few factors, data type being the most important:

- Numeric (int, long, byte, and double) and Boolean (bit) types - fixed length and fixed cost per document
- String type - the footprint depends on the length of the strings and how many unique strings need to be stored

Collection types like array and weighted set increase the memory usage somewhat, but the main factor is the average number of values per document. String attributes are typically the largest attributes, and require the most memory during initialization - use boolean/numeric types where possible. Example, refer to the formulas below:

```
schema foo {
    document bar {
        field titles type array<string> {
            indexing: summary | attribute
        }
    }
}
```

- Assume an average of 10 values per document, average string length 15, 100k unique strings and 20M documents.
- Steady state memory usage is approx 1 GB (20M\*4\*(6/5) + 20M\*10\*4\*(6/5) + 100k\*(15+1+4+4)\*(6/5)).
- During initialization (loading the attribute from disk), an additional 2.4 GB is allocated (20M\*10\*(4+4+4)), i.e. for each value: local document id, enum value and weight.
- Increasing the average number of values per document to 20 (double) will also double the memory footprint during initialization (4.8 GB).

When doing capacity planning, keep in mind the maximum footprint, which occurs during initialization. For the steady state footprint, the number of unique values is important for string attributes. Check the [Example attribute sizing spreadsheet](files/Attribute-memory-Vespa.xls), with various data types and collection types. It also contains estimates for how many documents a 48 GB RAM node can hold, taking initialization into account. [Multivalue](schemas.html#field) attributes use an adaptive approach in how data is stored in memory, and up to 2 billion documents per node are supported.

**Pro-tip:** The proton _/state/v1/_ interface can be explored for attribute memory usage. This is an undocumented debug-interface, subject to change at any moment - example: _http://localhost:19110/state/v1/custom/component/documentdb/music/subdb/ready/attribute/artist_

##### Attribute file usage

Attribute data is stored in two locations on disk:

- The attribute store in memory, which is regularly flushed to disk. At startup, the flushed files are used to quickly populate the memory structures, resulting in a much quicker startup compared to generating the attribute store from the source in the document store.
- The document store on disk. Documents here are used to (re)generate index structures, as well as being the source for replica generation across nodes.

The different field types use various data types for storage - see below; a conservative rule of thumb for steady-state disk usage is hence twice the data size.

##### Sizing

Attribute sizing is not an exact science, but rather an approximation. The reason is that the inputs vary: the number of documents, the number of values per document, and the uniqueness of the values are all variable.
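Before going through the detailed formulas below, the worked example above can be restated as a short calculation - the numbers are simply the assumptions listed in that example:

```python
# Worked example from above: multivalue string attribute, plain (no fast-search)
D = 20_000_000        # documents per node
V = 10                # average values per document
U = 100_000           # unique strings
AVG_STRLEN = 15       # average string length
ROF = 6 / 5           # resize overhead factor

# Steady state: document vector + multivalue mapping + enum store
steady_state_bytes = D * 4 * ROF + D * V * 4 * ROF + U * (AVG_STRLEN + 1 + 4 + 4) * ROF

# Initialization: one (local document id, enum value, weight) triple per value
initialization_extra_bytes = D * V * (4 + 4 + 4)

print(f"steady state:        {steady_state_bytes / 1e9:.2f} GB")        # ~1 GB
print(f"initialization add:  {initialization_extra_bytes / 1e9:.2f} GB")  # ~2.4 GB
```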
The components of the attributes that occupy memory are: | Abbreviation | Concept | Comment | | --- | --- | --- | | D | Number of documents | Number of documents on the node, or rather the maximum number of local document ids allocated | | V | Average number of values per document | Only applicable for arrays and weighted sets | | U | Number of unique values | Only applies for strings or if [fast-search](reference/schema-reference.html#attribute) is set | | FW | Fixed data width | sizeof(T) for numerics, 1 byte for strings, 1 bit for boolean | | WW | Weight width | Width of the weight in a weighted set, 4 bytes. 0 bytes for arrays. | | EIW | Enum index width | Width of the index into the enum store, 4 bytes. Used by all strings and other attributes if [fast-search](reference/schema-reference.html#attribute) is set | | VW | Variable data width | strlen(s) for strings, 0 bytes for the rest | | PW | Posting entry width | Width of a posting list entry, 4 bytes for singlevalue, 8 bytes for array and weighted sets. Only applies if [fast-search](reference/schema-reference.html#attribute) is set. | | PIW | Posting index width | Width of the index into the store of posting lists; 4 bytes | | MIW | Multivalue index width | Width of the index into the multivalue mapping; 4 bytes | | ROF | Resize overhead factor | Default is 6/5. This is the average overhead in any dynamic vector due to resizing strategy. Resize strategy is 50% indicating that structure is 5/6 full on average. | ###### Components | Component | Formula | Approx Factor | Applies to | | --- | --- | --- | --- | | Document vector | D \* ((FW or EIW) or MIW) | ROF | FW for singlevalue numeric attributes and MIW for multivalue attributes. EIW for singlevalue string or if the attribute is singlevalue fast-search | | Multivalue mapping | D \* V \* ((FW or EIW) + WW) | ROF | Applicable only for array or weighted sets. EIW if string or fast-search | | Enum store | U \* ((FW + VW) + 4 + ((EIW + PIW) or EIW)) | ROF | Applicable for strings or if fast-search is set. (EIW + PIW) if fast-search is set, EIW otherwise. 
| | Posting list | D \* V \* PW | ROF | Applicable if fast-search is set | ###### Variants | Type | Components | Formula | | --- | --- | --- | | Numeric singlevalue plain | Document vector | D \* FW \* ROF | | Numeric multivalue plain | Document vector, Multivalue mapping | D \* MIW \* ROF + D \* V \* (FW+WW) \* ROF | | Numeric singlevalue fast-search | Document vector, Enum store, Posting List | D \* EIW \* ROF + U \* (FW+4+EIW+PIW) \* ROF + D \* PW \* ROF | | Numeric multivalue fast-search | Document vector, Multivalue mapping, Enum store, Posting List | D \* MIW \* ROF + D \* V \* (EIW+WW) \* ROF + U \* (FW+4+EIW+PIW) \* ROF + D \* V \* PW \* ROF | | Singlevalue string plain | Document vector, Enum store | D \* EIW \* ROF + U \* (FW+VW+4+EIW) \* ROF | | Singlevalue string fast-search | Document vector, Enum store, Posting List | D \* EIW \* ROF + U \* (FW+VW+4+EIW+PIW) \* ROF + D \* PW \* ROF | | Multivalue string plain | Document vector, Multivalue mapping, Enum store | D \* MIW \* ROF + D \* V \* (EIW+WW) \* ROF + U \* (FW+VW+4+EIW) \* ROF | | Multivalue string fast-search | Document vector, Multivalue mapping, Enum store, Posting list | D \* MIW \* ROF + D \* V \* (EIW+WW) \* ROF + U \* (FW+VW+4+EIW+PIW) \* ROF + D \* V \* PW \* ROF | | Boolean singlevalue | Document vector | D \* FW \* ROF | ##### Paged attributes Regular attribute fields are guaranteed to be in-memory, while the [paged](reference/schema-reference.html#attribute) attribute setting allows paging the attribute data out of memory to disk. The `paged` setting is _not_ supported for the following types: - [tensor](reference/schema-reference.html#tensor) with [fast-rank](reference/schema-reference.html#attribute). - [predicate](reference/schema-reference.html#predicate). For attribute fields using [fast-search](reference/schema-reference.html#attribute), the memory needed for dictionary and index structures is never paged out to disk. Using the `paged` setting for attributes is an alternative when there are memory resource constraints and the attribute data is only accessed by a limited number of hits per query during ranking. E.g. a dense tensor attribute which is only used during a [re-ranking phase](phased-ranking.html), where the number of attribute accesses is limited by the re-ranking phase count. For example, using a second-phase [rerank-count](reference/schema-reference.html#secondphase-rerank-count) of 100 will limit the maximum number of page-ins/disk accesses per query to 100. Running at 100 QPS would need up to 10K disk accesses per second. This is the worst case if none of the accessed attribute data was paged into memory already. This depends on access locality and memory pressure (size of the attribute data versus available memory). In this example, we have a dense tensor with 1024 [int8](reference/tensor.html#tensor-type-spec) values. The tensor attribute is only accessed during re-ranking (second-phase ranking expression): ``` schema foo { document foo { field tensordata type tensor(x[1024]) { indexing: attribute attribute: paged } } rank-profile foo { first-phase {} second-phase { rerank-count: 100 expression: sum(attribute(tensordata)) } } } ``` For some use cases where the serving latency SLA is not strict and query throughput is low, the `paged` attribute setting might be a tuning alternative, as it allows storing more data per node. ###### Paged attributes disadvantages The disadvantages of using _paged_ attributes are many: - Unpredictable query latency as attribute access might touch disk.
- Limited queries per second throughput per node (depends on the locality of document re-ranking requests). - Paged attributes are implemented by file-backed memory mappings. The performance depends on the ability of [Linux virtual memory management](https://tldp.org/LDP/tlk/mm/memory.html) to page data in and out. Using many threads per search/high query throughput might cause high system (kernel) CPU and system unresponsiveness. - The content node's total memory utilization will be close to 100% when using paged attributes. It's up to the Linux kernel to determine what part of the attribute data is paged into memory based on access patterns. A good understanding of how the Linux virtual memory management system works is recommended before enabling paged attributes. - The [memory usage metrics](/en/performance/sizing-search.html#metrics-for-vespa-sizing) from content nodes do not reflect reality when using paged attributes. They can indicate a usage that is much higher than the available memory on the node. This is because attribute memory usage is reported as the amount of data contained in the attribute, and whether this data is paged out to disk is controlled by the Linux kernel. - Using paged attributes doubles the disk usage of attribute data. For example, if the original attribute size is 92 GB (100M documents of the above 1024 int8 per document schema), using the `paged` setting will double the attribute disk usage to close to 200 GB. - Changing the `paged` setting (e.g. removing the option) on a running system might cause hard out-of-memory situations, as without `paged`, the content nodes will attempt to load the attribute into memory without the option of paging out. - Using a paged attribute in [first-phase](phased-ranking.html) ranking can result in extremely high query latency if a large amount of the corpus is retrieved by the query. The number of disk accesses will, in the worst case, be equal to the number of hits the query produces. A similar problem can occur if running a query that searches a paged attribute. - Using `paged` in combination with [HNSW indexing](approximate-nn-hnsw.html) is _strongly_ discouraged. _HNSW_ indexing also searches and reads tensors during indexing, causing random access during feeding. Once the system memory usage reaches 100%, the Linux kernel will start paging pages in and out of memory. This can cause high system (kernel) CPU usage and slow down HNSW indexing throughput significantly. ##### Mutable attributes [Mutable attributes](reference/schema-reference.html#mutate) are per-document metadata for matching and ranking performance. The attribute values are mutated as part of query execution, as defined in rank profiles - see [rank phase statistics](phased-ranking.html#rank-phase-statistics) for details. ##### Document meta store The document meta store is an in-memory data structure for all documents on a node. It is an _implicit attribute_, and is [compacted](proton.html#lid-space-compaction) and [flushed](proton.html#attribute-flush). Memory usage for applications with small documents / no other attributes can be dominated by this attribute. The document meta store scales linearly with the number of documents - using approximately 30 bytes per document.
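A rough estimate of the document meta store size is hence a simple multiplication - a sketch, using the rule of thumb above and the per-document cost observed in the metric example below:

```
# Sketch: estimate document meta store memory from the document count.
def metastore_bytes(num_docs: int, bytes_per_doc: int = 30) -> int:
    return num_docs * bytes_per_doc

print(metastore_bytes(9_000_000))      # ~270 MB with the ~30 bytes/doc rule of thumb
print(metastore_bytes(9_000_000, 52))  # ~470 MB with the ~52 bytes/doc observed below
```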
The metric _content.proton.documentdb.ready.attribute.memory\_usage.allocated\_bytes_ for `"field": "[documentmetastore]"` is the size of the document meta store in memory - use the [metric API](reference/state-v1.html#state-v1-metrics) to find the size - in this example, the node has 9M ready documents with 52 bytes in memory per document: ``` { "name": "content.proton.documentdb.ready.attribute.memory_usage.allocated_bytes", "description": "The number of allocated bytes", "values": { "average": 4.69736008E8, "count": 12, "rate": 0.2, "min": 469736008, "max": 469736008,"last": 469736008}, "dimensions": { "documenttype": "doctype","field": "[documentmetastore]"} }, ``` The above is for the _ready_ documents, also check _removed_ and _notready_ - refer to [sub-databases](proton.html#sub-databases). Copyright © 2025 - [Cookie Preferences](#) ###### On this page: - [Data structures](#data-structures) - [Index structures](#index-structures) - [Attribute memory usage](#attribute-memory-usage) - [Attribute file usage](#attribute-file-usage) - [Sizing](#sizing) - [Components](#components) - [Variants](#variants) - [Paged attributes](#paged-attributes) - [Paged attributes disadvantages](#paged-attributes-disadvantages) - [Mutable attributes](#mutable-attributes) - [Document meta store](#document-meta-store) --- ## Automated Deployments ### Automated Deployments ![Picture of an automated deployment](/assets/img/automated-deployments-overview.png) #### Automated Deployments ![Picture of an automated deployment](/assets/img/automated-deployments-overview.png) Vespa Cloud provides: - A [CD test framework](#cd-tests) for safe deployments to production zones. - [Multi-zone deployments](#deployment-orchestration) with orchestration and test steps. This guide goes through details of an orchestrated deployment. Read / try [production deployment](production-deployment) first to have a baseline understanding. The [developer guide](https://cloud.vespa.ai/en/developer-guide) is useful for writing tests. Use [example GitHub Actions](#automating-with-github-actions) for automation. ##### CD tests Before deployment in production zones, [system tests](#system-tests) and [staging tests](#staging-tests) are run. Tests are run in a dedicated and [downsized](https://cloud.vespa.ai/en/reference/environments) environment. These tests are optional, see details in the sections below. Status and logs of ongoing tests can be found in the _Deployment_ view in the [Vespa Cloud Console](https://console.vespa-cloud.com/): ![Minimal deployment pipeline](/assets/img/deployment-with-system-test.png) These tests are also run during [Vespa Cloud upgrades](#vespa-cloud-upgrades). Find deployable example applications in [CI-CD](https://github.com/vespa-cloud/examples/tree/main/CI-CD). ###### System tests When a system test is run, the application is deployed in the [test environment](https://cloud.vespa.ai/en/reference/environments#test). The system test suite is then run against the endpoints of the test deployment. The test deployment is empty when the test execution begins. The application package and Vespa platform version is the same as that to be deployed to production. A test suite includes at least one [system test](/en/testing.html#system-tests). An application can be deployed to a production zone without system tests - this step will then only test that the application starts successfully. See [production deployment](production-deployment) for an example without tests. 
Read more about [system tests](/en/testing.html#system-tests). ###### Staging tests A staging test verifies the transition of a deployment of a new application package - i.e., from application package `Appold` to `Appnew`. A test suite includes at least one [staging setup](/en/testing.html#staging-tests), and [staging test](/en/testing.html#staging-tests). 1. All production zone deployments are polled for the current versions. As there can be multiple versions already being deployed (i.e. multiple `Appold`), there can be a series of staging test runs. 2. The application at revision `Appold` is deployed in the [staging environment](https://cloud.vespa.ai/en/reference/environments#staging). 3. The staging setup test code is run, typically making the cluster reasonably similar to a production cluster. 4. The test deployment is then upgraded to application revision `Appnew`. 5. Finally, the staging test code is run, to verify the deployment works as expected after the upgrade. An application can be deployed to a production zone without staging tests - this step will then only test that the application starts successfully before and after the change. See [production deployment](production-deployment) for an example without tests. Read more about [staging tests](/en/testing.html#staging-tests). ###### Disabling tests To deploy without testing, remove the test files from the application package. Tests are always run, regardless of _deployment.xml_. To temporarily deploy without testing, do a deploy and hit the "Abort" button (see illustration above, hover over the test step in the Console) - this skips the test step and makes the orchestration progress to the next step. ###### Running tests only To run a system test, without deploying to any nodes after, add a new test instance. In _deployment.xml_, add the instance without `dev` or`prod` elements, like: ``` ``` ... ``` ``` Note that this will leave an empty instance in the console, as the deployment is for testing only, so no resources deployed to after test. Make sure to run `vespa prod deploy` to invoke the pipeline for testing, and use a separate application for this test. ##### Deployment orchestration The _deployment orchestration_ is flexible. One can configure dependencies between deployments to production zones, production verification tests, and configured delays; by ordering these in parallel and serial blocks of steps: ![Picture of a complex automated deployment](/assets/img/automated-deployments-complex.png) On a higher level, instances can also depend on each other in the same way. This makes it easy to configure a deployment process which gradually rolls out changes to increasingly larger subsets of production nodes, as confidence grows with successful production verification tests. Refer to [deployment.xml](https://cloud.vespa.ai/en/reference/deployment) for details. Deployments run sequentially by default, but can be configured to [run in parallel](https://cloud.vespa.ai/en/reference/deployment). Inside each zone, Vespa Cloud orchestrates the deployment, such that the change is applied without disruption to read or write traffic against the application. A production deployment in a zone is complete when the new configuration is active on all nodes. Most changes are instant, making this a quick process. If node restarts are needed, e.g., during platform upgrades, these will happen automatically and safely as part of the deployment. When this is necessary, deployments will take longer to complete. 
System and staging tests, if present, must always be successfully run before the application package is deployed to production zones. ###### Source code repository integration Each new _submission_ is assigned an increasing build number, which can be used to track the roll-out of the new package to the instances and their zones. With the submission, add a source code repository reference for easy integration - this makes it easy to track changes: ![Build numbers and source code repository reference](/assets/img/CI-integration.png) Add the source diff link to the pull request - see example [GitHub Action](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/.github/workflows/deploy-vespa-documentation-search.yaml): ``` $ vespa prod deploy \ --source-url "$(git config --get remote.origin.url | sed 's+git@\(.*\):\(.*\)\.git+https://\1/\2+')/commit/$(git rev-parse HEAD)" ``` ###### Block-windows Use block-windows to block deployments during certain windows throughout the week, e.g., to avoid rolling out changes during peak hours or vacations. Hover over the instance (here "default") to find block status - see [block-change](https://cloud.vespa.ai/en/reference/deployment#block-change): ![Application block window](/assets/img/block-window.png) ###### Validation overrides Some configuration changes are potentially destructive / change the application behavior - examples are removing fields and changing linguistic processing. These changes are disallowed by default, and the deploy command will fail. To override and force a deploy, use a [validation override](/en/reference/validation-overrides.html), e.g. allowing `tensor-type-change`. ###### Production tests Production tests are optional and configured in [deployment.xml](https://cloud.vespa.ai/en/reference/deployment). Production tests do not have access to the Vespa endpoints, for security reasons. Dependent steps in the release pipeline will stop if the tests fail, but upgraded regions will remain on the version where the test failed. A production test is hence used to block deployments to subsequent zones and only makes sense in a multi-zone deployment. ###### Deploying Components Vespa is [backwards compatible](https://vespa.ai/releases.html#versions) within major versions, and major versions rarely change. This means that [Components](/en/jdisc/container-components.html) compiled against an older version of Vespa APIs can always be run on the same major version. However, if the application package is compiled against a newer API version, and then deployed to an older runtime version in production, it might fail. See [vespa:compileVersion](production-deployment.html#production-deployment-with-components) for how to solve this. ##### Automating with GitHub Actions Auto-deploy production applications using GitHub Actions - examples: - [deploy-vector-search.yaml](https://github.com/vespa-cloud/vector-search/blob/main/.github/workflows/deploy-vector-search.yaml) deploys an application to a production environment - a good example to start from! - [deploy.yaml](https://github.com/vespa-cloud/examples/blob/main/.github/workflows/deploy.yaml) deploys an application with basic HTTP tests. - [deploy-vespa-documentation-search.yaml](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/.github/workflows/deploy-vespa-documentation-search.yaml) deploys an application with Java tests.
The automation scripts use an API key to deploy: ``` $ vespa auth api-key ``` This creates a key, or outputs: ``` Error: refusing to overwrite /Users/me/.vespa/mytenant.api-key.pem Hint: Use -f to overwrite it This is your public key: -----BEGIN PUBLIC KEY----- ABCDEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEB2UFsh8ZjoWNtkrDhyuMyaZQe1ze qLB9qquTKUDQTuM2LOr2dawUs02nfSc3UTfC08Lgr/dvnTnHpc0/fY+3Aw== -----END PUBLIC KEY----- Its fingerprint is: 12:34:56:78:65:30:77:90:30:ab:83:ee:a9:67:68:2c To use this key in Vespa Cloud click 'Add custom key' at https://console.vespa-cloud.com/tenant/mytenant/account/keys and paste the entire public key including the BEGIN and END lines. ``` This means that an existing key is not overwritten - the command is safe to run. Make sure to add the deploy key to the tenant using the Vespa Cloud Console. After the deploy key is added, everything is ready for deployment. You can upload or create new Application keys in the console, and store them as a secret in the repository like the GitHub Actions examples above. Some services like [Travis CI](https://travis-ci.com) do not accept multi-line values for Environment Variables in Settings. A workaround is to use the output of ``` $ openssl base64 -A -a < mykey.pem && echo ``` in a variable, say `VESPA_MYAPP_API_KEY`, in Travis Settings. `VESPA_MYAPP_API_KEY` is exported in the Travis environment, example output: ``` Setting environment variables from repository settings $ export VESPA_MYAPP_API_KEY=[secure] ``` Then, before deploying to Vespa Cloud, regenerate the key value: ``` $ MY_API_KEY=`echo $VESPA_MYAPP_API_KEY | openssl base64 -A -a -d` ``` and use `${MY_API_KEY}` in the deploy command. ##### Vespa Cloud upgrades Vespa upgrades follow the same pattern as for new application revisions in [CD tests](#cd-tests), and can be tracked via the version number in the Vespa Cloud Console. System tests are run the same way as for deploying a new application package. A staging test verifies the upgrade from application package `Appold` to `Appnew`, and from Vespa platform version `Vold` to `Vnew`. The staging test then consists of the following steps: 1. All production zone deployments are polled for the current `Vold` / `Appold` versions. As there can be multiple versions already being deployed (i.e. multiple `Vold` / `Appold`), there can be a series of staging test runs. 2. The application at revision `Appold` is deployed on platform version `Vold`, to a zone in the [staging environment](https://cloud.vespa.ai/en/reference/environments#staging). 3. The _staging setup_ test code is run, typically making the cluster reasonably similar to a production cluster. 4. The test deployment is then upgraded to application revision `Appnew` and platform version `Vnew`. 5. Finally, the _staging test_ test code is run, to verify the deployment works as expected after the upgrade. Note that one or both of the application revision and platform may be upgraded during the staging test, depending on what upgrade scenario the test is run to verify. These changes are usually kept separate, but in some cases it is necessary to allow them to roll out together. ##### Next steps - Read more about [feature switches and bucket tests](/en/testing.html#feature-switches-and-bucket-tests). - A challenge with continuous deployment can be integration testing across multiple services: Another service depends on this Vespa application for its own integration testing.
Use a separate [application instance](https://cloud.vespa.ai/en/reference/deployment#instance) for such integration testing. - Set up a deployment badge - available from the console's deployment view - example: ![vespa-team.vespacloud-docsearch.default overview](https://api-ctl.vespa-cloud.com/badge/v1/vespa-team/vespacloud-docsearch/default) - Set up a [global query endpoint](https://cloud.vespa.ai/en/reference/deployment#endpoints-global). Copyright © 2025 - [Cookie Preferences](#) ###### On this page: - [CD tests](#cd-tests) - [System tests](#system-tests) - [Staging tests](#staging-tests) - [Disabling tests](#disabling-tests) - [Running tests only](#running-tests-only) - [Deployment orchestration](#deployment-orchestration) - [Source code repository integration](#source-code-repository-integration) - [Block-windows](#block-windows) - [Validation overrides](#validation-overrides) - [Production tests](#production-tests) - [Deploying Components](#deploying-components) - [Automating with GitHub Actions](#automating-with-github-actions) - [Vespa Cloud upgrades](#vespa-cloud-upgrades) - [Next steps](#next-steps) --- ## Autoscaling ### Autoscaling Autoscaling lets you adjust the hardware resources allocated to application clusters automatically depending on actual usage. #### Autoscaling Autoscaling lets you adjust the hardware resources allocated to application clusters automatically depending on actual usage. It will attempt to keep utilization of all allocated resources close to ideal, and will automatically reconfigure to the cheapest option allowed by the ranges when necessary. You can turn it on by specifying _ranges_ in square brackets for the [nodes](/en/reference/services.html#nodes) and/or [node resource](/en/reference/services.html#resources) values in _services.xml_. Vespa Cloud will monitor the resource utilization of your clusters and automatically choose the cheapest resource allocation within ranges that produces close to optimal utilization. You can see the status and recent actions of the autoscaler in the _Resources_ view under a deployment in the console. Autoscaling is not considering latency differences achieved by different configurations. If your application has certain configurations that produce good throughput but too high latency, you should not include these configurations in your autoscaling ranges. Adjusting the allocation of a cluster may happen quickly for stateless container clusters, and much more slowly for content clusters with a lot of data. Autoscaling will adjust each cluster on the timescale it typically takes to rescale it (including any data redistribution). The ideal utilization takes into account that a node may be down or failing, that another region may be down causing doubling of traffic, and that we need headroom for maintenance operations and handling requests with low latency. It acts on what it has observed on your system in the recent past. If you need much more capacity in the near future than you do currently, you may want to set the lower limit to take this into account. Upper limits should be set to the maximum size that makes business sense. ##### When to use autoscaling Autoscaling is useful in a number of scenarios. Some typical ones are: - You have a new application which you can't benchmark with realistic data and usage, making you unsure what resources to allocate: Set wide ranges for all resource parameters and let the system choose a configuration. Once you gain experience you can consider tightening the configuration space. 
- You have load that varies quickly during the day, or that may suddenly increase quickly due to some event, and want container cluster resources to quickly adjust to the load: Set a range for the number of nodes and/or vcpu on containers. - You expect your data volume to grow over time, but you don't want to allocate resources prematurely, nor constantly worry about whether it is time to increase: Configure ranges for content nodes and/or node resources such that the size of the system grows with the data. ##### Resource tradeoffs Some other considerations when deciding resources: - Making changes to resources/nodes is easy and safe, and one of Vespa Cloud's strengths. We advise you to make controlled changes and observe the effect on latencies, data migration and cost. Everything is automated - just deploy a new application package. This is useful learning when later needed during load peaks and capacity requirement changes. - Node resources cannot be chosen freely in all zones; CPU/memory often comes in increments of 2x. Try to make sure that the resource configuration is a good fit. - CPU is the most expensive component - optimize for this for most applications. - Having few nodes means more overcapacity, as Vespa requires that the system can handle one node being down (or one group, in content clusters having multiple groups). 4-5 nodes minimum is a good rule of thumb (see the calculation sketch further below). Whether 4-5 or 9-10 nodes of half the size is better depends on quicker upgrade cycles vs. smoother resource auto-scale curves. Latencies can be better or worse, depending on static vs dynamic query cost. - Changing a node resource may mean allocating a new node, so it may be faster to scale container nodes by changing the number of nodes. - As a consequence, during resource shortage (say almost full disk), add nodes and keep the rest unchanged. - It is easiest to reason over capacity when changing one thing at a time. It is often safe to follow the _suggested resources_ advice when shown in the console, and feel free to contact us if you have questions. ##### Mixed load A Vespa application must handle a combination of reads and writes, from multiple sources. User load often resembles a sine-like curve. Machine-generated load, like a batch job, can be spiky and abrupt. In the default Vespa configuration, all kinds of load use _one_ default container cluster. Example: An application where daily batch jobs update the corpus at a high rate: ![nodes and resources](/assets/img/load.png) Autoscaling scales _up_ much quicker than _down_, as the probability of a new spike is higher after one has been observed. In this example, see the rapid cluster growth for the daily load spike - followed by a slow decay. The best solution for this case is to slow down the batch job, as it is of short duration. It is not always doable to slow down jobs - in these cases, setting up [multiple container clusters](/en/operations-selfhosted/routing.html#multiple-container-clusters) can be a smart thing - optimize each cluster for its load characteristics. This could be a combination of clusters using autoscaling and clusters with a fixed size. Autoscaling often works best for the user-generated load, whereas the machine-generated load could either be tuned or routed to a different cluster in the same Vespa application.
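To illustrate the single-node-down headroom mentioned under resource tradeoffs above, here is a small back-of-the-envelope sketch (numbers are illustrative only):

```
# Sketch: capacity headroom needed to survive one node (or one group) being down.
# With N nodes, the remaining N-1 nodes must carry the full load, so each node
# needs roughly 1/N spare capacity in normal operation.
for nodes in (2, 3, 4, 5, 10):
    headroom = 1 / nodes
    print(f"{nodes:>2} nodes: reserve ~{headroom:.0%} of each node for node-down headroom")
```

With 4-5 nodes, the reserved headroom drops to 20-25%, which is why that is a reasonable minimum.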
##### Related reading - [Feed sizing](/en/performance/sizing-feeding.html) - [Query sizing](/en/performance/sizing-search.html) Copyright © 2025 - [Cookie Preferences](#) --- ## Batch Delete ### Batch delete Options for batch deleting documents: #### Batch delete Options for batch deleting documents: 1. Use [vespa feed](../vespa-cli.html#documents): ``` $ vespa feed -t my-endpoint deletes.json ``` 2. Find documents using a query, delete, repeat (a scripted variant is sketched further below). Pseudocode:

```
while True; do
    query and read document ids, if empty exit
    delete document ids using [/document/v1](../reference/document-v1-api-reference.html#delete)
    wait a sec  # optional, add wait to reduce load while deleting
```

3. Use a [document selection](../documents.html#document-expiry) to expire documents. This deletes all documents _not_ matching the expression. It is possible to use parent documents and imported fields for expiry of a document set. The content node will iterate over the corpus and delete documents (that are later compacted out). 4. Use [/document/v1](../reference/document-v1-api-reference.html#delete) to delete documents identified by a [document selection](../reference/document-select-language.html) - example dropping all documents from the _my\_doctype_ schema. The _cluster_ value is the ID of the content cluster in _services.xml_: ``` $ curl -X DELETE \ "$ENDPOINT/document/v1/my_namespace/my_doctype/docid?selection=true&cluster=my_cluster" ``` 5. It is possible to drop a schema, with all its content, by removing the mapping to the content cluster. To understand what is happening, here is the status before the procedure: ##### Example This is an end-to-end example of how to track the number of documents, and delete a subset using a [selection string](/en/reference/document-select-language.html). ###### Feed sample documents Feed a batch of documents, e.g. using the [vector-search](https://github.com/vespa-cloud/vector-search) sample application: ``` $ vespa feed <(python3 feed.py 100000 3) ``` See the number of documents for a node using the [content.proton.documentdb.documents.total](/en/reference/searchnode-metrics-reference.html#content_proton_documentdb_documents_total) metric (here 100,000): ``` $ docker exec vespa curl -s http://localhost:19092/prometheus/v1/values | grep ^content.proton.documentdb.documents.total content_proton_documentdb_documents_total_max{metrictype="standard",instance="searchnode",documenttype="vector",clustername="vectors",vespa_service="vespa_searchnode",} 100000.0 1695383025000 content_proton_documentdb_documents_total_last{metrictype="standard",instance="searchnode",documenttype="vector",clustername="vectors",vespa_service="vespa_searchnode",} 100000.0 1695383025000 ``` Using the metric above is useful while feeding this example. Another alternative is [visiting](../visiting.html) all documents to print the ID: ``` $ vespa visit --field-set "[id]" | wc -l 100000 ``` At this point, there are 100,000 documents in the index. ###### Define selection Define the subset of documents to delete - e.g. by age or other criteria. In this example, select a random 1%. Do a test run: ``` $ vespa visit --field-set "[id]" --selection 'id.hash().abs() % 100 == 0' | wc -l 1016 ``` Hence, the selection string `id.hash().abs() % 100 == 0` hits 1,016 documents.
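The manual steps in the following sections can also be scripted. Below is a sketch of a selection-based delete loop over [/document/v1](../reference/document-v1-api-reference.html#delete), following continuation tokens until no more remain - it uses the same endpoint, namespace and selection as the curl examples that follow, so adjust these (and add mTLS credentials if needed) for your own application:

```
# Sketch: delete all documents matching a selection, following continuation tokens.
import requests

ENDPOINT = "http://localhost:8080"
url = f"{ENDPOINT}/document/v1/mynamespace/vector/docid"
params = {"selection": "id.hash().abs() % 100 == 0", "cluster": "vectors"}

total = 0
while True:
    response = requests.delete(url, params=params, timeout=120)
    response.raise_for_status()
    body = response.json()
    total += body.get("documentCount", 0)
    if "continuation" in body:
        params["continuation"] = body["continuation"]  # more to delete - repeat
    else:
        break

print(f"Deleted {total} documents")
```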
###### Delete documents Delete documents, see the number of documents deleted in the response: ``` $ curl -X DELETE \ "http://localhost:8080/document/v1/mynamespace/vector/docid?selection=id.hash%28%29.abs%28%29+%25+100+%3D%3D+0&cluster=vectors" { "pathId":"/document/v1/mynamespace/vector/docid", "documentCount":1016 } ``` In case of a large result set, a continuation token might be returned in the response, too: ``` "continuation": "AAAAEAAAA" ``` If so, add the token and redo the request: ``` $ curl -X DELETE \ "http://localhost:8080/document/v1/mynamespace/vector/docid?selection=id.hash%28%29.abs%28%29+%25+100+%3D%3D+0&cluster=vectors&continuation=AAAAEAAAA" ``` Repeat as long as there are tokens in the output. The token changes in every response. ###### Validate Check that all documents matching the selection criterion are deleted: ``` $ vespa visit --selection 'id.hash().abs() % 100 == 0' --field-set "[id]" | wc -l 0 ``` List remaining documents: ``` $ vespa visit --field-set "[id]" | wc -l 98984 ``` Copyright © 2025 - [Cookie Preferences](#) ###### On this page: - [Example](#example) - [Feed sample documents](#feed-sample-documents) - [Define selection](#define-selection) - [Delete documents](#delete-documents) - [Validate](#validate) --- ## Benchmarking ### Benchmarking This is a step-by-step guide to get started with benchmarking on Vespa Cloud, based on the [Vespa benchmarking guide](/en/performance/vespa-benchmarking.html), using the [sample app](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation). #### Benchmarking This is a step-by-step guide to get started with benchmarking on Vespa Cloud, based on the [Vespa benchmarking guide](/en/performance/vespa-benchmarking.html), using the [sample app](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation). Overview: ![Vespa Cloud Benchmarking](/assets/img/cloud-benchmarks.svg) ##### Set up a performance test instance Use an instance in a [dev zone](/en/cloud/environments.html#dev) for benchmarks. To deploy an instance there, use the [getting started](/en/deploy-an-application.html) guide, and make sure to specify the resources using a `deploy:environment="dev"` attribute: ``` ``` ``` ``` ``` $ vespa deploy --wait 600 ``` Feed documents: ``` $ vespa feed ext/documents.jsonl ``` Query documents to validate the feed: ``` $ vespa query "select * from music where true" ``` Query documents using curl: ``` $ curl \ --cert ~/.vespa/mytenant.myapp.default/data-plane-public-cert.pem \ --key ~/.vespa/mytenant.myapp.default/data-plane-private-key.pem \ -H "Content-Type: application/json" \ --data '{"yql" : "select * from music where true"}' \ https://baaae1db.b68ddc0d.z.vespa-app.cloud/search/ ``` At this point, the instance is ready, with data, and can be queried using data-plane credentials. 
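Queries can also be issued from a small script - a sketch using the same endpoint and data-plane credentials as the curl example above (the endpoint and credential paths are the ones from this guide, adjust to your own application):

```
# Sketch: query the dev instance using the data-plane certificate and key.
from pathlib import Path
import requests

cred_dir = Path.home() / ".vespa" / "mytenant.myapp.default"
cert = (str(cred_dir / "data-plane-public-cert.pem"),
        str(cred_dir / "data-plane-private-key.pem"))

response = requests.post(
    "https://baaae1db.b68ddc0d.z.vespa-app.cloud/search/",
    json={"yql": "select * from music where true"},
    cert=cert,
    timeout=10,
)
response.raise_for_status()
print(response.json()["root"]["fields"]["totalCount"])
```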
##### Test using vespa-fbench The rest of the guide assumes the data-plane credentials are in working directory: ``` $ ls -1 *.pem data-plane-private-key.pem data-plane-public-cert.pem ``` Prepare a query file: ``` $ echo "/search/?yql=select+*+from+music+where+true" > query001.txt ``` Test using [vespa-fbench](/en/operations/tools.html#vespa-fbench) running in a docker container: ``` $ docker run -v $(pwd):/files -w /files \ --entrypoint /opt/vespa/bin/vespa-fbench \ vespaengine/vespa \ -C data-plane-public-cert.pem \ -K data-plane-private-key.pem \ -T /etc/ssl/certs/ca-bundle.crt \ -n 1 -q query001.txt -s 1 -c 0 \ -o output.txt \ baaae1db.b68ddc0d.z.vespa-app.cloud 443 ``` `-o output.txt` is useful when validating the test - remove this option when load testing. Make sure there are no `SSL_do_handshake` errors in the output. Expect HTTP status code 200: ``` Starting clients... Stopping clients Clients stopped. . Clients Joined. ***HTTP keep-alive statistics*** connection reuse count -- 4 *****************Benchmark Summary***************** clients: 1 ran for: 1 seconds cycle time: 0 ms lower response limit: 0 bytes skipped requests: 0 failed requests: 0 successful requests: 5 cycles not held: 5 minimum response time: 128.17 ms maximum response time: 515.35 ms average response time: 206.38 ms 25 percentile: 128.70 ms 50 percentile: 129.60 ms 75 percentile: 130.20 ms 90 percentile: 361.32 ms 95 percentile: 438.36 ms 99 percentile: 499.99 ms actual query rate: 4.80 Q/s utilization: 99.03 % zero hit queries: 5 http request status breakdown: 200 : 5 ``` At this point, running queries using _vespa-fbench_ works well from local laptop. ##### Run queries inside data center Next step is to run this from the same location (data center) as the dev zone. In this example, an AWS [zone](/en/cloud/zones.html). Deduce the AWS zone from Vespa Cloud zone name. Below is an example using a host with Amazon Linux 2023 AMI (HVM) image: 1. Create the host - here assume key pair is named _key.pem_. No need to do anything other than default. 2. Log in, update, install docker: 3. Copy credentials for endpoint access, log in and validate docker setup: 4. Make a dummy query: 5. Run vespa-fbench and verify 200 response: At this point, you are able to benchmark using _vespa-fbench_ in the same zone as the Vespa Cloud dev instance. ##### Run benchmark Use the [Vespa Benchmarking Guide](/en/performance/vespa-benchmarking.html) to plan and run benchmarks. Also see [sizing](#sizing) below. Make sure the client running the benchmark tool has sufficient resources. Export [metrics](/en/operations/metrics.html): ``` $ curl \ --cert data-plane-public-cert.pem \ --key data-plane-private-key.pem \ https://baaae1db.b68ddc0d.z.vespa-app.cloud/prometheus/v1/values ``` Notes: - Periodically dump all metrics using `consumer=Vespa`. - Make sure you will not exhaust your serving threads on your container nodes while in production. This can be verified by making sure this expression stays well below 100% (typically below 50%) for the traffic you expect: `100 * (jdisc.thread_pool.active_threads.sum / jdisc.thread_pool.active_threads.count) / jdisc.thread_pool.size.max` for each `threadpool` value. You can increase the number of threads in the pools by using larger container nodes, more container nodes or by tuning the number of threads as described in [services-search](/en/reference/services-search.html#threadpool). 
If you exhaust a thread pool and its queue, you will experience HTTP 503 responses for requests that are rejected by the container. ##### Making changes Whenever deploying changes to configuration, track progress in the Deployment dashboard. Some changes, like changing [requestthreads](/en/reference/services-content.html#requestthreads), will restart content nodes, and this is done in sequence and takes time. Wait for successful completion in _Wait for services and endpoints to come online_. When changing node type/count, wait for auto data redistribution to complete, watching the `vds.idealstate.merge_bucket.pending.average` metric: ``` $ while true; do curl -s \ --cert data-plane-public-cert.pem \ --key data-plane-private-key.pem \ https://baaae1db.b68ddc0d.z.vespa-app.cloud/prometheus/v1/values?consumer=Vespa | \ grep idealstate.merge_bucket.pending.average; \ sleep 10; done ``` Notes: - Dump all metrics using `consumer=Vespa`. - After changing the number of content nodes, this metric will jump, then decrease (not necessarily linearly) - speed depending on data volume. ##### Sizing Using Vespa Cloud enables the Vespa Team to assist you in optimizing the application to reduce resource spend. Based on 150 applications running on Vespa Cloud today, savings are typically 50%. Cost optimization is hard to do without domain knowledge - but few teams are experts in both their application and its serving platform. Sizing means finding both the right node size and the right cluster topology: ![Resize to fewer and smaller nodes](/assets/img/nodes.svg) Applications use Vespa for their primary business use cases. Availability and performance vs. cost are business decisions. The best sized application can handle all expected load situations, and is configured to degrade quality gracefully for the unexpected. Even though Vespa is cost-efficient out of the box, Vespa experts can usually spot over/under-allocations in CPU, memory and disk space/IO, and discuss trade-offs with the application team. Using [automated deployments](/en/cloud/automated-deployments.html), applications go live with little risk. After launch, right-size the application based on true load after using Vespa’s elasticity features with automated data migration. Use the [Vespa sizing guide](/en/performance/sizing-search.html) to size the application and find metrics used there. Pro-tips: - 60% is a good max memory allocation - 50% is a good max CPU allocation, although application dependent. - 70% is a good max disk allocation Rules of thumb: - Memory and disk scale approximately linearly for indexed fields' data - attributes have a fixed cost for empty fields. - Data variance will impact memory usage. - Undersized instances will [block writes](/en/operations/feed-block.html). - It is often a good idea to use the `dev` zone to test the memory impact of adding large fields, e.g. adding an embedding. ##### Notes - The user running benchmarks must have read access to the endpoint - if you already have access, you can skip this section. Refer to the [Vespa security guide](https://docs.vespa.ai/en/cloud/security/guide.html). - [Monitoring](/en/cloud/monitoring.html) is useful to track metrics when benchmarking.
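As a small companion to the thread pool expression in the Run benchmark notes above, the utilization can be computed directly from the three metric values - a sketch with illustrative numbers:

```
# Sketch: container thread pool utilization, as described in the notes above:
# 100 * (jdisc.thread_pool.active_threads.sum / jdisc.thread_pool.active_threads.count) / jdisc.thread_pool.size.max
def thread_pool_utilization(active_threads_sum: float,
                            active_threads_count: float,
                            size_max: float) -> float:
    return 100.0 * (active_threads_sum / active_threads_count) / size_max

# Example: on average 6 active threads with a max pool size of 32 -> ~19%,
# well below the 50% guideline above.
print(f"{thread_pool_utilization(600, 100, 32):.1f}%")
```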
Copyright © 2025 - [Cookie Preferences](#) ###### On this page: - [Set up a performance test instance](#set-up-a-performance-test-instance) - [Test using vespa-fbench](#test-using-vespa-fbench) - [Run queries inside data center](#run-queries-inside-data-center) - [Run benchmark](#run-benchmark) - [Making changes](#making-changes) - [Sizing](#sizing) - [Notes](#notes) --- ## Binarizing Vectors ### Binarizing Vectors Binarization in this context is mapping numbers in a vector (embedding) to bits (reducing the value range), and representing the vector of bits efficiently using the `int8` data type. #### Binarizing Vectors Binarization in this context is mapping numbers in a vector (embedding) to bits (reducing the value range), and representing the vector of bits efficiently using the `int8` data type. Examples: | input vector | binarized floats | pack\_bits (to INT8) | | --- | --- | --- | | [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0] | [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0] | -1 | | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0 | | [-1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0 | | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | -128 | | [2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0] | -127 | Binarization is key to reducing memory requirements and, therefore, cost. Binarization can also improve feeding performance, as the memory bandwidth requirements go down accordingly. Refer to [embedding](/en/embedding.html) for more details on how to create embeddings from text. ##### Summary This guide maps all the steps required to run a successful binarization project using Vespa only - there is no need to re-feed data. This makes a project feasible with limited incremental resource usage and man-hours required. Approximate Nearest Neighbor vector operations are run using an HNSW index in Vespa, with online data structures. The cluster is operational during the procedure, gradually building the required data structures. This guide is useful to map the steps and tradeoffs made for a successful vector binarization. Other relevant articles on how to reduce vector size in memory are: - [Exploring the potential of OpenAI Matryoshka 🪆 embeddings with Vespa](https://blog.vespa.ai/matryoshka-embeddings-in-vespa/) - [Matryoshka 🤝 Binary vectors: Slash vector search costs with Vespa](https://blog.vespa.ai/combining-matryoshka-with-binary-quantization-using-embedder/) Adding to this, using algorithms like SPANN can solve problems with huge vector data sizes - read more in [Billion-scale vector search using hybrid HNSW-IF](https://blog.vespa.ai/vespa-hybrid-billion-scale-vector-search/). A binarization project normally involves iteration over different configuration settings, measuring quality loss for each iteration - this procedure is built with that in mind. ##### Converters Vespa’s built-in indexing language [converters](/en/reference/indexing-language-reference.html#converters) `binarize` and `pack_bits` let you easily generate binarized vectors.
Example schema definitions used to generate the vectors in the table above: ``` schema doc { document doc { field doc_embedding type tensor(x[8]) { indexing: summary | attribute } } field doc_embedding_binarized_floats type tensor(x[8]) { indexing: input doc_embedding | binarize | attribute } field doc_embedding_binarized type tensor(x[1]) { indexing: input doc_embedding | binarize | pack_bits | attribute } } ``` We see that the `binarize` function itself will not compress vectors to a smaller size, as the output cell type is the same as the input - it is only the values that are mapped to 0 or 1. Above, the vectors are binarized using a threshold value of 0, the Vespa default - any number \> 0 will map to 1 - this threshold is configurable. `pack_bits` reads binarized vectors and represents them using int8. In the example above: - `tensor(x[8])` is 8 x sizeof(float) = 8 x 32 bits = 256 bits = 32 bytes - `tensor(x[1])` is 1 x sizeof(int8) = 1 x 8 bits = 8 bits = 1 byte In other words, a compression factor of 32, which is expected, mapping a 32-bit float into 1 bit. As memory usage often is the cost driver for applications, this has huge potential. However, there is a loss of precision, so the tradeoff must be evaluated. Read more in [billion-scale-knn](https://blog.vespa.ai/billion-scale-knn/) and[combining-matryoshka-with-binary-quantization-using-embedder](https://blog.vespa.ai/combining-matryoshka-with-binary-quantization-using-embedder/). ##### Binarizing an existing embedding field In the example above, we see that `doc_embedding` has the original embedding data, and the fields `doc_embedding_binarized_floats` and `doc_embedding_binarized` are generated from `doc_embedding`. This is configured through the `indexing: input …` statement, and defining the generated fields outside the `document { … }` block. **Note:** The `doc_embedding_binarized_floats` field is just for illustration purposes, as input to the `doc_embedding_binarized` field, which is the target binarized and packed field with low memory requirements. From here, we will call this the binarized embedding. This is a common case for many applications - how to safely binarize and evaluate the binarized data for subsequent use. The process can be broken down into: - Pre-requisites. - Define the new binarized embedding, normally as an addition to the original field. - Deploy and re-index the data to populate the binarized embedding. - Create new ranking profiles with the binarized embeddings. - Evaluate the quality of the binarized embedding. - Remove the original embedding field from memory to save cost. ##### Pre-requisites Adding a new field takes resources, on disk and in memory. A new binarized embedding field is smaller - above, it is 1/32 of the original field. Also note that embedding fields often have an index configured, like: ``` field doc_embeddings type tensor(x[8]) { indexing: summary | attribute | index attribute { distance-metric: angular } index { hnsw { max-links-per-node: 16 neighbors-to-explore-at-insert: 100 } } } ``` The index is used for approximate nearest neighbor (ANN) searches, and also consumes memory. Use the Vespa Cloud console to evaluate the size of original fields and size of indexes to make sure that there is room for the new embedding field, possibly with an index. **Note:** The size of an index is a function of the number of documents, regardless of tensor type. 
In this context, this means that when adding a new field with an index, the new index will have the same size as the index of the existing embedding field. Use status pages to find the index size in memory - example: https://api-ctl.vespa-cloud.com/application/v4/tenant/TENANT\_NAME/application/APP\_NAME/instance/INSTANCE\_NAME/environment/prod/region/REGION/service/searchnode/NODE\_HOSTNAME/state/v1/custom/component/documentdb/SCHEMA/subdb/ready/attribute/ATTRIBUTE\_NAME ###### Example ``` tensor: { compact_generation: 33946879, ref_vector: { memory_usage: { used: 1402202052, dead: 0, allocated: 1600126976, onHold: 0 } }, tensor_store: { memory_usage: { used: 205348904436, dead: 10248636768, allocated: 206719921232, onHold: 0 } }, nearest_neighbor_index: { memory_usage: { all: { used: 10452397992, dead: 360247164, allocated: 13346516304, onHold: 0 } ``` In this example, the index is 13G, the tensor data is 206G, so the index is 6.3% of the tensor data. The original tensor is of type `bfloat16`, a binarized version is 1/16 of this and hence 13G. As an extra index is 13G, the temporary incremental memory usage is approximately 26G during the procedure. ##### Define the binarized embedding field The new field is _added_ to the schema - example schema, before: ``` schema doc { document doc { field doc_embedding type tensor(x[8]) { indexing: summary | attribute } } } ``` After: ``` schema doc { document doc { field doc_embedding type tensor(x[8]) { indexing: summary | attribute } } field doc_embedding_binarized type tensor(x[1]) { indexing: input doc_embedding | binarize | pack_bits | attribute } } ``` The above are simple examples, with no ANN settings on the fields. Following is a more complex example - schema before: ``` schema doc { document doc { field doc_embedding type tensor(x[8]) { indexing: summary | attribute | index attribute { distance-metric: angular } index { hnsw { max-links-per-node: 16 neighbors-to-explore-at-insert: 200 } } } } } ``` Schema after: ``` schema doc { document doc { field doc_embedding type tensor(x[8]) { indexing: summary | attribute | index attribute { distance-metric: angular } index { hnsw { max-links-per-node: 16 neighbors-to-explore-at-insert: 200 } } } } field doc_embedding_binarized type tensor(x[1]) { indexing: input doc_embedding | binarize | pack_bits | attribute | index attribute { distance-metric: hamming } index { hnsw { max-links-per-node: 16 neighbors-to-explore-at-insert: 200 } } } } ``` Note that we replicate the index settings to the new field. ##### Deploy and reindex the binarized embedding field Deploying the field will trigger a reindexing on Vespa Cloud to populate the binarized embedding, fully automated. Self-hosted, the `deploy` operation will output the below - [trigger a reindex](/en/operations/reindexing.html). ``` $ vespa deploy Uploading application package... done Success: Deployed '.' with session ID 3 WARNING Change(s) between active and new application that may require re-index: reindexing: Consider re-indexing document type 'doc' in cluster 'doc' because: 1) Document type 'doc': Non-document field 'doc_embedding_binarized' added; this may be populated by reindexing ``` Depending on the size of the corpus and resources configured, the reindexing process takes time. ##### Create new ranking profiles and queries using the binarized embeddings After reindexing, you can query using the new, binarized embedding field.
Assuming a query using the doc\_embedding field: ``` $ vespa query \ 'yql=select * from doc where {targetHits:5}nearestNeighbor(doc_embedding, q)' \ 'input.query(q)=[1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]' \ 'ranking=app_ranking' ``` The same query, with a binarized query vector, to the binarized field: ``` $ vespa query \ 'yql=select * from doc where {targetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \ 'input.query(q_bin)=[-119]' \ 'ranking=app_ranking_bin' ``` See [tensor-hex-dump](/en/reference/document-json-format.html#tensor-hex-dump)for more information about how to create the int8-typed tensor. ###### Quick Hamming distance intro Example embeddings: | document embedding | binarized floats | pack\_bits (to INT8) | | --- | --- | --- | | [-1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0] | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0 | | **query embedding** | **binarized floats** | **to INT8** | | [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0] | [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0] | -119 | Use [matchfeatures](/en/reference/schema-reference.html#match-features)to debug ranking (see ranking profile `app_ranking_bin` below): ``` "matchfeatures": { "attribute(doc_embedding_binarized)": { "type": "tensor(x[1])", "values": [0] }, "distance(field,doc_embedding_binarized)": 3.0, "query(q_bin)": { "type": "tensor(x[1])", "values": [-119] } } ``` See distance calculated to 3.0, which is the number of bits different in the binarized vectors, which is the hamming distance. ##### Rank profiles and queries Assuming a rank profile like: ``` rank-profile app_ranking { match-features { distance(field, doc_embedding) query(q) attribute(doc_embedding) } inputs { query(q) tensor(x[8]) } first-phase { expression: closeness(field, doc_embedding) } } ``` Query: ``` $ vespa query \ 'yql=select * from doc where {targetHits:5}nearestNeighbor(doc_embedding, q)' \ 'input.query(q)=[2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]' \ 'ranking=app_ranking' ``` A binarized version is like: ``` rank-profile app_ranking_bin { match-features { distance(field, doc_embedding_binarized) query(q_bin) attribute(doc_embedding_binarized) } inputs { query(q_bin) tensor(x[1]) } first-phase { expression: closeness(field, doc_embedding_binarized) } } ``` Query: ``` $ vespa query \ 'yql=select * from doc where {targetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \ 'input.query(q_bin)=[-119]' \ 'ranking=app_ranking_bin' ``` Query with full-precision query vector, against a binarized vector - rank profile: ``` rank-profile app_ranking_bin_full { match-features { distance(field, doc_embedding_binarized) query(q) query(q_bin) attribute(doc_embedding_binarized) } function unpack_to_float() { expression: 2*unpack_bits(attribute(doc_embedding_binarized), float)-1 } function dot_product() { expression: sum(query(q) * unpack_to_float) } inputs { query(q) tensor(x[8]) query(q_bin) tensor(x[1]) } first-phase { expression: closeness(field, doc_embedding_binarized) } second-phase { expression: dot_product } } ``` Notes: - The first-phase ranking is as the binarized query above. - The second-phase ranking is using the full-precision query vector query(q) with a bit-precision vector cast to float for type match. - Both query vectors must be supplied in the query. 
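The values above can be reproduced with a few lines of code - a sketch that recreates the packed query value `-119` and the hamming distance of 3 seen in the match-features:

```
# Sketch: reproduce the binarized query value and the hamming distance above.
import numpy as np

query = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]                   # query embedding
doc   = [-1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0]           # document embedding

def binarize_pack(values):
    """Threshold at 0 and pack the 8 bits into one signed int8 value."""
    bits = np.where(np.array(values) > 0, 1, 0).astype(np.uint8)
    return int(np.packbits(bits).astype(np.int8)[0])

q_bin = binarize_pack(query)   # -119
d_bin = binarize_pack(doc)     # 0

hamming = bin((q_bin & 0xFF) ^ (d_bin & 0xFF)).count("1")  # number of differing bits
print(q_bin, d_bin, hamming)   # -119 0 3
```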
Note the differences when using full values in the query tensor - see the relevance scores for the results: ``` $ vespa query \ 'yql=select * from doc where {targetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \ 'input.query(q)=[1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]' \ 'input.query(q_bin)=[-119]' \ 'ranking=app_ranking_bin_full' ... "relevance": 3.0 ``` ``` $ vespa query \ 'yql=select * from doc where {targetHits:5}nearestNeighbor(doc_embedding_binarized, q_bin)' \ 'input.query(q)=[2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]' \ 'input.query(q_bin)=[-119]' \ 'ranking=app_ranking_bin_full' "relevance": 4.0 ``` Read the [closeness](/en/reference/rank-features.html#closeness(dimension,name)) reference documentation. ###### TargetHits for ANN Given the lower precision with binarization, it might be a good idea to increase the `{targetHits:5}` annotation in the query, to generate more candidates for later ranking phases. ##### Evaluate the quality of the binarized embeddings This exercise is about evaluating a lower-precision retrieval phase, using the original full-sized (here we use floats) query-result pairs as reference. Experiments, query-document precision: 1. float-float 2. binarized-binarized 3. float-binarized 4. float-float, with binarized retrieval To evaluate the precision, compute the differences for each query @10, like:

```
def compute_list_differences(list1, list2):
    set1 = set(list1)
    set2 = set(list2)
    return len(set1 - set2)

list1 = [1, 3, 5, 7, 9, 11, 13, 15, 17, 20]
list2 = [2, 3, 5, 7, 9, 11, 14, 15, 18, 20]
num_hits = compute_list_differences(list1, list2)
print(f"Hits different: {num_hits}")
```

##### Remove the original embedding field from memory The purpose of the binarization is to reduce the memory footprint. Given the results of the evaluation above, store the full-precision embeddings on disk or remove them altogether. Example with paging the attribute to disk-only: ``` schema doc { document doc { field doc_embedding type tensor(x[8]) { indexing: summary | attribute | index attribute: paged } } field doc_embedding_binarized type tensor(x[1]) { indexing: input doc_embedding | binarize | pack_bits | attribute | index attribute { distance-metric: hamming } index { hnsw { max-links-per-node: 16 neighbors-to-explore-at-insert: 200 } } } } ``` This example only indexes the binarized embedding, with data binarized before indexing: ``` schema doc { document doc { field doc_embedding_binarized type tensor(x[1]) { indexing: input doc_embedding | binarize | pack_bits | attribute | index attribute { distance-metric: hamming } index { hnsw { max-links-per-node: 16 neighbors-to-explore-at-insert: 200 } } } } } ``` ##### Appendix: Binarizing from text input To generate the embedding from other data types, like text, use the [converters](/en/reference/indexing-language-reference.html#converters) - example: ``` field doc_embedding type tensor(x[1]) { indexing: (input title || "") . " " . (input content || "") | embed | attribute attribute { distance-metric: hamming } } ``` Find examples in [Matryoshka 🤝 Binary vectors: Slash vector search costs with Vespa](https://blog.vespa.ai/combining-matryoshka-with-binary-quantization-using-embedder/).
##### Appendix: conversion to int8 Find examples of how to binarize values in code:

```
import numpy as np

def floats_to_bits(floats):
    if len(floats) != 8:
        raise ValueError("Input must be a list of 8 floats.")
    bits = [1 if f > 0 else 0 for f in floats]
    return bits

def bits_to_int8(bits):
    bit_string = ''.join(str(bit) for bit in bits)
    int_value = int(bit_string, 2)
    # Interpret the 8 bits as a two's complement (signed) int8 value
    int8_value = np.uint8(int_value).astype(np.int8)
    return int8_value

def floats_to_int8(floats):
    bits = floats_to_bits(floats)
    int8_value = bits_to_int8(bits)
    return int8_value

floats = [0.5, -1.2, 3.4, 0.0, -0.5, 2.3, -4.5, 1.2]
int8_value = floats_to_int8(floats)
print(f"The int8 value is: {int8_value}")
```

```
import numpy as np
import torch

def binarize_tensor(tensor: torch.Tensor) -> str:
    """
    Binarize a floating-point 1-d tensor by thresholding at zero and
    packing the bits into bytes. Returns the hex str representation of the bytes.
    """
    if not tensor.is_floating_point():
        raise ValueError("Input tensor must be of floating-point type.")
    return (
        np.packbits(np.where(tensor > 0, 1, 0), axis=0).astype(np.int8).tobytes().hex()
    )
```

Multivector example, from [ColPali: Efficient Document Retrieval with Vision Language Models](https://pyvespa.readthedocs.io/en/latest/examples/colpali-document-retrieval-vision-language-models-cloud.html):

```
import numpy as np
import torch
from typing import Dict, List
from binascii import hexlify

def binarize_token_vectors_hex(vectors: List[torch.Tensor]) -> Dict[str, str]:
    vespa_tensor = list()
    for page_id in range(0, len(vectors)):
        page_vector = vectors[page_id]
        binarized_token_vectors = np.packbits(
            np.where(page_vector > 0, 1, 0), axis=1
        ).astype(np.int8)
        for patch_index in range(0, len(page_vector)):
            values = str(
                hexlify(binarized_token_vectors[patch_index].tobytes()), "utf-8"
            )
            if (
                values == "00000000000000000000000000000000"
            ):  # skip empty vectors due to padding of batch
                continue
            vespa_tensor_cell = {
                "address": {"page": page_id, "patch": patch_index},
                "values": values,
            }
            vespa_tensor.append(vespa_tensor_cell)
    return vespa_tensor
```

Copyright © 2025 - [Cookie Preferences](#) ###### On this page: - [Summary](#summary) - [Converters](#converters) - [Binarizing an existing embedding field](#binarizing-an-existing-embedding-field) - [Pre-requisites](#pre-requisites) - [Example](#example) - [Define the binarized embedding field](#define-the-binarized-embedding-field) - [Deploy and reindex the binarized embedding field](#deploy-and-reindex-the-binarized-embedding-field) - [Create new ranking profiles and queries using the binarized embeddings](#create-new-ranking-profiles-and-queries-using-the-binarized-embeddings) - [Quick Hamming distance intro](#quick-hamming-distance-intro) - [Rank profiles and queries](#rank-profiles-and-queries) - [TargetHits for ANN](#targethits-for-ann) - [Evaluate the quality of the binarized embeddings](#evaluate-the-quality-of-the-binarized-embeddings) - [Remove the original embedding field from memory](#remove-the-original-embedding-field-from-memory) - [Appendix: Binarizing from text input](#appendix-binarizing-from-text-input) - [Appendix: conversion to int8](#appendix-conversion-to-int8) --- ## Bm25 ### BM25 Reference The [bm25 rank feature](rank-features.html#bm25) implements the [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25) ranking function used to estimate the relevance of a text document given a search query.
#### BM25 Reference

The [bm25 rank feature](rank-features.html#bm25) implements the [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25) ranking function used to estimate the relevance of a text document given a search query. It is a pure text ranking feature which operates over an [indexed string field](schema-reference.html#indexing-index). The feature is cheap to compute, about 3-4 times faster than [nativeRank](nativerank.html), while still providing good rank score quality. It is a good candidate to use in a first-phase ranking function when ranking text documents.

##### Ranking function

The _bm25_ feature calculates a score for how well a query with terms $q_1, \ldots, q_n$ matches an indexed string field _t_ in a document _D_. The score is calculated as follows:

$$\sum_{i=1}^{n} IDF(q_i) \cdot \frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{field\_len}{avg\_field\_len}\right)}$$

Where the components in the function are:

- $IDF(q_i)$: The [inverse document frequency](https://en.wikipedia.org/wiki/Tf%E2%80%93idf#Inverse_document_frequency) (_IDF_) of query term _i_ in field _t_. This is calculated as $\log\left(1 + \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}\right)$, where $N$ is the total number of documents on the content node and $n(q_i)$ is the number of those documents containing query term _i_.
- $f(q_i, D)$: The number of occurrences (term frequency) of query term _i_ in the field _t_ of document _D_. For multi-value fields we use the sum of occurrences over all elements.
- field\_len: The field length (in number of words) of field _t_ in document _D_. For multi-value fields we use the sum of field lengths over all elements.
- avg\_field\_len: The average field length of field _t_ among the documents on the content node. Can be configured using [rank-properties](rank-feature-configuration.html#bm25).
- $k_1$: A parameter used to limit how much a single query term can affect the score for document _D_. With a higher value, the score for a single term can keep growing relatively more as more occurrences of that term exist. Default value is 1.2. Can be configured using [rank-properties](rank-feature-configuration.html#bm25).
- $b$: A parameter used to control the effect of the field length of field _t_ compared to the average field length. Default value is 0.75. Can be configured using [rank-properties](rank-feature-configuration.html#bm25).

##### Example

In the following example we have an indexed string field _content_, and a rank profile using the _bm25_ rank feature. Note that the field must be enabled for use with the bm25 feature by setting the _enable-bm25_ flag in the [index](schema-reference.html#index) section of the field definition.

```
schema example {
    document example {
        field content type string {
            indexing: index | summary
            index: enable-bm25
        }
    }
    rank-profile default {
        first-phase {
            expression {
                bm25(content)
            }
        }
    }
}
```

If the _enable-bm25_ flag is turned on after documents are already fed, then [proton](../proton.html) performs a [memory index flush](../proton.html#memory-index-flush) followed by a [disk index fusion](../proton.html#disk-index-fusion) to prepare the posting lists for use with _bm25_. Use the [custom component state API](../proton.html#custom-component-state-api) on each content node and examine `pending_urgent_flush` to determine if the preparation is still ongoing:

```
/state/v1/custom/component/documentdb/mydoctype/subdb/ready/index
```

---

## Boolean Library

### Predicate Search Java Library

**Important:** The Predicate Search Java Library is **deprecated** for removal in [Vespa 9](vespa9-release-notes.html).

#### Predicate Search Java Library

**Important:** The Predicate Search Java Library is **deprecated** for removal in [Vespa 9](vespa9-release-notes.html).
Use [predicate fields](predicate-fields.html) instead. The rationale for predicate fields is described in the [predicate fields document](predicate-fields.html).

Vespa also has a standalone Java library for searching predicate fields, for boolean matching tightly integrated with a Java program, e.g. running on a grid. The performance is similar to predicate search in Vespa. Find API documentation in the [javadoc](https://javadoc.io/doc/com.yahoo.vespa/predicate-search).

Get started - add a dependency in _pom.xml_:

```
<dependency>
  <groupId>com.yahoo.vespa</groupId>
  <artifactId>predicate-search</artifactId>
</dependency>
```

###### Indexing documents

Unlike Vespa predicate fields, which have dynamic indexes, the Java library requires the entire index to be built before any searches are run. Once built, the index cannot be changed. Build an index using an instance of `PredicateIndexBuilder`. Use `indexDocument(id, predicate)` to index documents. This method takes two arguments, a 32-bit document id and the document itself (in the form of a `Predicate` object). Once all documents are indexed, create the index by invoking `build()`. This method returns a `PredicateIndex` object.

Use [Predicate.fromString()](https://javadoc.io/doc/com.yahoo.vespa/predicate-search-core/latest/com/yahoo/document/predicate/Predicate.html) to parse predicate expressions from strings. The expressions use the [predicate syntax](predicate-fields.html).

###### Index configuration

Just as with Vespa predicate fields, specify the [arity](predicate-fields.html#index-size) and [upper and lower bounds](predicate-fields.html#upper-and-lower-bounds) for the index, to make the index more efficient and trade index size for query performance. This is set when creating a `PredicateIndexBuilder` object, and cannot be changed without creating a new `PredicateIndexBuilder`. Check the [Config](https://javadoc.io/doc/com.yahoo.vespa/predicate-search/latest/com/yahoo/search/predicate/Config.html) class for more information on other configuration parameters.

###### Serializing the index

The predicate index supports serialization. Use `PredicateIndex.writeToOutputStream(out)` to serialize the index to an output stream, and `PredicateIndex.fromInputStream(in)` to deserialize an index from an input stream. Deserializing an index is significantly faster than creating a new index through `PredicateIndexBuilder`.

###### Creating a searcher

`PredicateIndex` has a method called `searcher()`, which creates a new searcher object. The searcher exposes one method, `search(query)`, which searches the index. The index itself is thread-safe, but a searcher is not. When searching from multiple threads, make sure to create a separate searcher object for each thread.

###### Creating a query

A query is represented as a `PredicateQuery` object. The `PredicateQuery` object contains a set of features with `String` values, and a set of range features with `long` values. Each feature in the query may have a 64-bit sub-query bitmap set.

###### Executing a query

The `search()` method on `PredicateIndex.Searcher` returns a stream object which lazily evaluates the query when the results are needed. Each `Hit` in the result stream contains the document id for the hit, as well as a sub-query bitmap indicating which sub-queries the hit was a match for. The hits are returned in the order they were indexed.

##### Performance

As with Vespa, the arity and the upper and lower bound configuration impact performance and index size. E.g., increasing the arity increases QPS at the cost of a larger index.
The library may cache common posting lists to increase performance. The cache consists of the most expensive posting lists based on size and their prevalence in queries. The cache is disabled by default and must be built manually using `PredicateIndex.rebuildPostingListCache()`. It is recommended to rebuild the cache regularly for optimal performance, for instance every 1 millionth query or every 30 minutes. The cache rebuild is thread-safe and can be executed safely during concurrent search operations.

##### Sample code

```
package com.yahoo.example;

import com.yahoo.document.predicate.Predicate;
import com.yahoo.search.predicate.Config;
import com.yahoo.search.predicate.Hit;
import com.yahoo.search.predicate.PredicateIndex;
import com.yahoo.search.predicate.PredicateIndexBuilder;
import com.yahoo.search.predicate.PredicateQuery;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.List;
import java.util.stream.Stream;

import static com.yahoo.document.predicate.Predicates.and;
import static com.yahoo.document.predicate.Predicates.feature;
import static java.util.stream.Collectors.toList;

public class App {
    public static void main( String[] args ) throws IOException {
        // Create index configuration
        Config config = new Config.Builder()
                .setArity(10)
                .setLowerBound(0)    // Minimum value for 'age' range feature
                .setUpperBound(150)  // Maximum value for 'age' range feature
                .build();
        // Create index builder
        PredicateIndexBuilder indexBuilder = new PredicateIndexBuilder(config);
        // Pass document id and predicate. 'age' is a range feature, while 'gender' is a normal feature
        indexBuilder.indexDocument(1, Predicate.fromString("age in [20..40] and gender in ['male', 'female']"));
        indexBuilder.indexDocument(2, and(feature("age").inRange(40, 60), feature("gender").inSet("male")));
        // Create index from builder
        PredicateIndex index = indexBuilder.build();

        // Create query1
        PredicateQuery query1 = new PredicateQuery();
        query1.addFeature("gender", "male", 0b01);   // Subquery 0
        query1.addFeature("gender", "female", 0b10); // Subquery 1
        query1.addRangeFeature("age", 30, 0b11);     // Subquery 0 and 1

        // Run queries using multiple threads
        Runnable searchRunnable = () -> {
            // Create a searcher
            // Note: PredicateIndex.Searcher is not thread safe, so each thread has to use a separate Searcher
            PredicateIndex.Searcher searcher = index.searcher();
            // Search index. A stream of hits is returned
            Stream<Hit> hitStream = searcher.search(query1);
            // Prints document id and subquery bitmap ('[0, 0x3]').
            // Bitmap is 0b11 as the document matches both subqueries
            System.out.println("Hit: " + hitStream.findFirst().get());
        };
        new Thread(searchRunnable).start();
        new Thread(searchRunnable).start();

        // Rebuild posting list cache to improve performance
        index.rebuildPostingListCache();
        new Thread(searchRunnable).start();
        new Thread(searchRunnable).start();

        // Serialize index to a byte array
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        index.writeToOutputStream(new DataOutputStream(baos));
        byte[] serializedIndex = baos.toByteArray();

        // Load the index from byte array
        PredicateIndex deserializedIndex = PredicateIndex.fromInputStream(
                new DataInputStream(
                        new ByteArrayInputStream(serializedIndex)));

        // Create new query (which is not using subqueries this time)
        PredicateQuery query2 = new PredicateQuery();
        query2.addFeature("gender", "male");
        query2.addRangeFeature("age", 40);

        // Search using deserialized index
        List<Hit> hits = deserializedIndex.searcher().search(query2).collect(toList());
        // Prints the id for both documents. No subquery bitmap printed
        System.out.println("Hits from deserialized index: " + hits);
    }
}
```

---

## Buckets

### Buckets

The content layer splits the document space into chunks called _buckets_, and algorithmically maps documents to buckets by their id.

#### Buckets

The content layer splits the document space into chunks called _buckets_, and algorithmically maps documents to buckets by their id. The cluster automatically splits and joins buckets to maintain a uniform distribution across all nodes and to keep bucket sizes within configurable limits.

Documents have string identifiers that map to a 58-bit numeric location. A bucket is defined as all the documents that share a given number of the least significant bits of the location. The number of bits used controls how many buckets will exist. For instance, if a bucket contains all documents whose 8 least significant bits are 0x01, the bucket can be split in two by using the 9th bit of the location to split them. Similarly, buckets can be joined by requiring one fewer bit in common.

##### Distribution

Distribution happens in several layers:

- Documents map to 58-bit numeric locations.
- Locations map to buckets.
- Buckets map to distributors responsible for handling requests related to those buckets.
- Buckets map to content nodes responsible for storing replicas of buckets.

###### Document to location distribution

Document identifiers use [document identifier schemes](../documents.html) to map documents to locations. This way it is possible to co-locate data within buckets by enforcing some documents to have common least significant bits. Specifying a group or numeric value with the n and g options overrides the 32 least significant bits of the location. Only use this when required, e.g. when using streaming search for personal search.

###### Location to bucket distribution

The cluster state contains a distribution bit count, which is the number of location bits used to generate buckets that can be mapped to distributors. The cluster state may change the number of distribution bits to adjust the number of buckets distributed at this level.
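As a rough illustration of the location-to-bucket mapping described above, the sketch below (a deliberate simplification, not Vespa's actual bucket id code) shows how documents whose locations share the configured number of least significant bits end up in the same bucket:

```
// Illustrative sketch: with a distribution bit count of n, a bucket is identified
// by the n least significant bits of the 58-bit document location, so documents
// sharing those bits share a bucket.
public class BucketIdExample {

    static long bucketOf(long location, int distributionBits) {
        long mask = (1L << distributionBits) - 1;   // keep the n least significant bits
        return location & mask;
    }

    public static void main(String[] args) {
        int distributionBits = 16;
        long locationA = 0x123456789ABCDL;  // two hypothetical document locations
        long locationB = 0x0FEDCBA01ABCDL;  // same 16 least significant bits as locationA
        System.out.println(
                bucketOf(locationA, distributionBits) == bucketOf(locationB, distributionBits)); // true
    }
}
```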
When adding more nodes to the cluster, the number of buckets increases in order for the distribution to remain uniform. Altering the distribution bit count causes a redistribution of all buckets. If locations have been overridden to co-localize documents into few units, the distribution of documents into these buckets may be skewed. ###### Bucket to distributor distribution Buckets are mapped to distributors using the ideal state algorithm. ###### Bucket to content node distribution Buckets are mapped to content nodes using the ideal state algorithm. As the content nodes persist data, changing bucket ownership takes more time/resources than on the distributors. There is usually a replica of a bucket on the same content node as the distributor owning the bucket, as the same algorithm is used. The distributors may split the buckets further than the distribution bit count indicates, allowing more units to be distributed among the content nodes to create a more even distribution, while not affecting routing from client to distributors. ##### Maintenance operations The content layer defines a set of maintenance operations to keep the cluster balanced. Distributors schedule maintenance operations and issue them to content nodes. Maintenance operations are typically not high priority requests. Scheduling a maintenance operation does not block any external operations. | Split bucket | Split a bucket in two, by enforcing the documents within the new buckets to have more location bits in common. Buckets are split either because they have grown too big, or because the cluster wants to use more distribution bits. | | Join bucket | Join two buckets into one. If a bucket has been previously split due to being large, but documents have now been deleted, the bucket can be joined again. | | Merge bucket | If there are multiple replicas of a bucket, but they do not store the same set of versioned documents, _merge_ is used to synchronize the replicas. A special case of a merge is a one-way merge, which may be done if some of the replicas are to be deleted right after the merge. Merging is used not only to fix inconsistent bucket replicas, but also to move buckets between nodes. To move a bucket, an empty replica is created on the target node, a merge is executed, and the source bucket is deleted. | | Create bucket | This operation exist merely for the distributor to notify a content node that it is now to store documents for this bucket too. This allows content nodes to refuse operations towards buckets it does not own. The ability to refuse traffic is a safeguard to avoid inconsistencies. If a client talks to a distributor that is no longer working correctly, we rather want its requests to fail than to alter the content cluster in strange ways. | | Delete bucket | Drop stored state for a bucket and reject further requests for it | | (De)activate bucket | Activate bucket for search results - refer to [bucket management](../proton.html#bucket-management) | | Garbage collections | If configured, documents are periodically garbage collected through background maintenance operations. | ###### Bucket split size The distributors may split existing buckets further to keep bucket sizes at manageable levels, or to ensure more units to split among the backends and their partitions. Using small buckets, the distribution will be more uniform and bucket operations will be smaller. Using large buckets, less memory is needed for metadata operations and bucket splitting and joining is less frequent. 
The size limits may be altered by configuring [bucket splitting](../reference/services-content.html#bucket-splitting).

##### Document to bucket distribution

Each document has a document identifier following a document identifier [uri scheme](../documents.html). From this scheme a 58-bit numeric _location_ is generated. Typically, all the bits are created from an MD5 checksum of the whole identifier. Schemes specifying a _groupname_ will have the least significant bits of the location set to a hash of the _groupname_. Thus, all documents belonging to that group will have locations with the same least significant bits, which will put them in the same bucket. If buckets end up split far enough to use more bits than the hash bits overridden by the group, the data will be split into many buckets, but each will typically only contain data for that group.

MD5 checksums map document identifiers to random locations. This creates a uniform bucket distribution, and is the default. For some use cases, it is better to co-locate documents, optimizing grouped access - an example is personal documents. By enforcing some documents to map to similar locations, these documents are likely to end up in the same actual buckets. There are several use cases where this may be useful:

- When migrating documents for some entity between clusters, this may be implemented more efficiently if the entity is contained in just a few buckets rather than having documents scattered across all the existing buckets.
- If operations to the cluster are clustered somehow, clustering the documents similarly in the backend may make better use of caches. For instance, if a service stores data for users, and traffic is typically created for a user at short intervals while the user is actively using the service, clustering user data may allow a lot of the user traffic to be easily cached by generic bucket caches.

If the `n=` option is specified, the 32 least significant bits of the given number override the 32 least significant bits of the location. If the `g=` option is specified, a hash is created of the group name, and the hash value is then used as if it had been specified with `n=` (see the illustrative sketch below).

When the location is calculated, it is mapped to a bucket. Clients map locations to buckets using [distribution bits](#location-to-bucket-distribution). Distributors map locations to buckets by searching their bucket database, which is sorted in inverse location order. The common case is that there is exactly one matching bucket. If there are several, there is currently inconsistent bucket splitting. If there are none, the distributor will create a new bucket for the request if it is a request that may create new data. Typically, new buckets are generated split according to the distribution bit count. Content nodes should rarely need to map documents to buckets, as distributors specify bucket targets for all requests. However, as external operations are not queued during bucket splits and joins, the content nodes remap operations to avoid having to fail them due to a bucket having recently been split or joined.

###### Limitations

One basic limitation of the document to location mapping is that it may never change. If it changed, documents would suddenly be in the wrong buckets in the cluster. This would violate a core invariant of the system, and is not supported. To allow new functionality, document identifier schemes may be extended or created that map to locations in new ways, but the already existing ones must map the same way as they have always done.
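Here is the simplified sketch of co-localization via the `g=`/`n=` options referenced above. The class, the hash function (Java's `String.hashCode()`) and the example locations are made up for illustration; Vespa's actual group hash and location layout differ:

```
// Illustrative sketch: documents with the same g=/n= value share the
// 32 least significant location bits, so they gravitate towards the same buckets.
public class LocationExample {

    static long withGroup(long md5Location, String group) {
        long groupHash = Integer.toUnsignedLong(group.hashCode()); // stand-in for the real group hash
        long upper26 = md5Location & (((1L << 26) - 1) << 32);     // keep the 26 most significant of the 58 bits
        return upper26 | groupHash;                                // override the 32 least significant bits
    }

    public static void main(String[] args) {
        long locationDoc1 = withGroup(0x2AB54C1F123456L, "alice");
        long locationDoc2 = withGroup(0x01D2E9A0FEDCBAL, "alice");
        // Same 32 least significant bits -> this user's documents map to the same buckets:
        System.out.println((locationDoc1 & 0xFFFFFFFFL) == (locationDoc2 & 0xFFFFFFFFL)); // true
    }
}
```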
Current document identifier schemes typically allow the 32 least significant bits to be overridden for co-localization, while the remaining 26 bits are reserved for bits created from the MD5 checksum. ###### Splitting When there are enough documents co-localized to the same bucket, causing the bucket to be split, it will typically need to split past the 32 LSB. At this split-level and beyond, there is no longer a 1-1 relationship between the node owning the bucket and the nodes its replica data will be stored on. The effect of this is that documents sharing a location will be spread across nodes in the entire cluster once they reach a certain size. This enables efficient parallel processing. ##### Bucket space Buckets exist in the _default_ or _global_ bucket space. Copyright © 2025 - [Cookie Preferences](#) ###### On this page: - [Distribution](#distribution) - [Document to location distribution](#documents-to-location-distribution) - [Location to bucket distribution](#location-to-bucket-distribution) - [Bucket to distributor distribution](#bucket-to-distributor-distribution) - [Bucket to content node distribution](#bucket-to-content-node-distribution) - [Maintenance operations](#maintenance-operations) - [Bucket split size](#bucket-split-size) - [Document to bucket distribution](#document-to-bucket-distribution) - [Limitations](#limitations) - [Splitting](#splitting) - [Bucket space](#bucket-space) --- ## Build Install Vespa ### Build / install Vespa To develop with Vespa, follow the [guide](https://github.com/vespa-engine/vespa#building) to set up a development environment on AlmaLinux 8 using Docker. #### Build / install Vespa To develop with Vespa, follow the [guide](https://github.com/vespa-engine/vespa#building) to set up a development environment on AlmaLinux 8 using Docker. Build Vespa Java artifacts with Java \>= 17 and Maven \>= 3.6.3. Once built, Vespa Java artifacts are ready to be used and one can build a Vespa application using the [bundle plugin](components/bundles.html#maven-bundle-plugin). ``` $ export MAVEN_OPTS="-Xms128m -Xmx1024m" $ ./bootstrap.sh java && mvn install ``` See [vespa.ai releases](https://vespa.ai/releases). ##### Container images | Image | Description | | --- | --- | | [docker.io/vespaengine/vespa](https://hub.docker.com/r/vespaengine/vespa) [ghcr.io/vespa-engine/vespa](https://github.com/orgs/vespa-engine/packages/container/package/vespa) | Container image for running Vespa. | | [docker.io/vespaengine/vespa-build-almalinux-8](https://hub.docker.com/r/vespaengine/vespa-build-almalinux-8) | Container image for building Vespa on AlmaLinux 8. | | [docker.io/vespaengine/vespa-dev-almalinux-8](https://hub.docker.com/r/vespaengine/vespa-dev-almalinux-8) | Container image for development of Vespa on AlmaLinux 8. Used for incremental building and system testing. 
| ##### RPMs Dependency graph: ![RPM overview](/assets/img/rpms.svg) Installing Vespa on AlmaLinux 8: ``` $ dnf config-manager \ --add-repo https://raw.githubusercontent.com/vespa-engine/vespa/master/dist/vespa-engine.repo $ dnf config-manager --enable powertools $ dnf install -y epel-release $ dnf install -y vespa ``` Package repository hosting is graciously provided by [Cloudsmith](https://cloudsmith.com) which is a fully hosted, cloud-native and universal package management solution:[![OSS hosting by Cloudsmith](https://img.shields.io/badge/OSS%20hosting%20by-cloudsmith-blue?logo=cloudsmith&style=flat-square)](https://cloudsmith.com) **Important:** Please note that the retention of released RPMs in the repository is limited to the latest 50 releases. Use the Docker images (above) for installations of specific versions older than this. Any problems with released rpm packages will be fixed in subsequent releases, please [report any issues](https://vespa.ai/support) - troubleshoot using the [install example](/en/operations-selfhosted/multinode-systems.html#aws-ec2-singlenode). Refer to [vespa.spec](https://github.com/vespa-engine/vespa/blob/master/dist/vespa.spec). Build RPMs for a given Vespa version X.Y.Z: ``` $ git clone https://github.com/vespa-engine/vespa $ cd vespa $ git checkout vX.Y.Z $ docker run --rm -ti -v $(pwd):/wd:Z -w /wd \ docker.io/vespaengine/vespa-build-almalinux-8:latest \ make -f .copr/Makefile rpms outdir=/wd $ ls *.rpm | grep -v debug vespa-8.599.6-1.el8.src.rpm vespa-8.599.6-1.el8.x86_64.rpm vespa-ann-benchmark-8.599.6-1.el8.x86_64.rpm vespa-base-8.599.6-1.el8.x86_64.rpm vespa-base-libs-8.599.6-1.el8.x86_64.rpm vespa-clients-8.599.6-1.el8.x86_64.rpm vespa-config-model-fat-8.599.6-1.el8.x86_64.rpm vespa-jars-8.599.6-1.el8.x86_64.rpm vespa-libs-8.599.6.el8.x86_64.rpm vespa-malloc-8.599.6-1.el8.x86_64.rpm vespa-node-admin-8.599.6-1.el8.x86_64.rpm vespa-tools-8.599.6-1.el8.x86_64.rpm ``` Find most utilities in the vespa-x.y.z\*.rpm - other RPMs: | RPM | Description | | --- | --- | | vespa-tools | Tools accessing Vespa endpoints for query or document operations: - [vespa-destination](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-destination) - [vespa-fbench](/en/operations/tools.html#vespa-fbench) - [vespa-feeder](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-feeder) - [vespa-get](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-get) - [vespa-query-profile-dump-tool](/en/operations/tools.html#vespa-query-profile-dump-tool) - [vespa-stat](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-stat) - [vespa-summary-benchmark](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-summary-benchmark) - [vespa-visit](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-visit) - [vespa-visit-target](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-visit-target) | | vespa-malloc | Vespa has its own memory allocator, _vespa-malloc_ - refer to _/opt/vespa/etc/vespamalloc.conf_ | | vespa-clients | _vespa-feed-client.jar_ - see [vespa-feed-client](vespa-feed-client.html) | Copyright © 2025 - [Cookie Preferences](#) --- ## Bundles ### Bundles The Container uses [OSGi](https://osgi.org) to provide a modular platform for developing applications that can be composed of many reusable components. #### Bundles The Container uses [OSGi](https://osgi.org) to provide a modular platform for developing applications that can be composed of many reusable components. The user can deploy, upgrade and remove these components at runtime. 
##### OSGi OSGi is a framework for modular development of Java applications, where a set of resources called _bundles_ can be installed. OSGi allows the developer to control which resources (Java packages) in a bundle that should be available to other bundles. Hence, you can explicitly declare a bundle's public API, and also ensure that internal implementation details remain hidden. Unless you're already familiar with OSGi, we recommend reading Richard S. Hall's presentation [Learning to ignore OSGi](https://cwiki.apache.org/confluence/download/attachments/7956/Learning_to_ignore_OSGi.pdf), which explains the most important aspects that you must relate to as a bundle developer. There are other good OSGi tutorials available: - [OSGi for Dummies](https://thiloshon.wordpress.com/2020/03/04/osgi-for-dummies/) - [OSGi Modularity and Services - Tutorial](https://www.vogella.com/tutorials/OSGi/article.html) (You can ignore the part about OSGi services.) JDisc uses OSGi's _module_ and _lifecycle_ layers, and does not provide any functionality from the _service_ layer. ##### OSGi bundles An OSGi bundle is a regular JAR file with a MANIFEST.MF file that describes its content, what the bundle requires (imports) from other bundles, and what it provides (exports) to other bundles. Below is an example of a typical bundle manifest with the most important headers: ``` Bundle-SymbolicName: com.yahoo.helloworld Bundle-Description: A Hello World bundle Bundle-Version: 1.0.0 Export-Package: com.yahoo.helloworld;version="1.0.0" Import-Package: org.osgi.framework;version="1.3.0" ``` The meaning of the headers in this bundle manifest is as follows: - `Bundle-SymbolicName` - The unique identifier of the bundle. - `Bundle-Description` - A human-readable description of the bundle's functionality. - `Bundle-Version` - Designates a version number to the bundle. - `Export-Package` - Expresses which Java packages contained in a bundle will be made available to the outside world. - `Import-Package` - Indicates which Java packages will be required from the outside world to fulfill the dependencies needed in a bundle. Note that OSGi has a strict definition of version numbers that need to be followed for bundles to work correctly. See the [OSGi javadoc](https://docs.osgi.org/javadoc/r4v42/org/osgi/framework/Version.html#Version(java.lang.String)) for details. As a general advice, never use more than three numbers in the version (major, minor, micro). ##### Building an OSGi bundle As long as the project was created by following steps in the [Developer Guide](../developer-guide.html), the code is already being packaged into an OSGi bundle by the [Maven bundle plugin](#maven-bundle-plugin). However, if migrating an existing Maven project, change the packaging statement to: ``` ``` container-plugin ``` ``` and add the plugin to the build instructions: ``` ``` com.yahoo.vespa bundle-plugin 8.599.6 true true ``` ``` Because OSGi introduces a different runtime environment from what Maven provides when running unit tests, one will not observe any loading and linking errors until trying to deploy the application onto a running Container. Errors triggered at this stage will be the likes of `ClassNotFoundException` and `NoClassDefFoundError`. To debug these types of errors, inspect the stack traces in the [error log](../reference/logs.html), and refer to [troubleshooting](#troubleshooting). [vespa-logfmt](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-logfmt) with its _--nldequote_ option is useful when reading logs. 
The test suite needs to cover deployment of the application bundle to ensure that its dynamic loading and linking issues are covered. ##### Depending on non-OSGi ready libraries Unfortunately, many popular Java libraries have yet to be bundled with the appropriate manifest that makes them OSGi-compatible. The simplest solution to this is to set the scope of the problematic dependency to **compile** in your pom.xml file. This will cause the bundle plugin to package the whole library into your bundle's JAR file. Until the offending library becomes available as an OSGi bundle, it means that your bundle will be bigger (in number of bytes), and that classes of that library can not be shared across application bundles. The practical implication of this feature is that the bundle plugin copies the compile-scoped dependency, and its transitive dependencies, into the final JAR file, and adds a `Bundle-ClassPath` instruction to its manifest that references those dependencies. Although this approach works for most non-OSGi libraries, it only works for libraries where the jar file is _self-contained_. If, on the other hand, the library depends on other installed files, it must be treated as if it was a [JNI library](#depending-on-JNI-libraries). ##### Depending on JNI Libraries This section details alternatives for using native code in the container. ###### OSGi bundles containing native code OSGi jars may contain .so files, which can be loaded in the standard way from Java code in the bundle. Note that since only one instance of an .so can be loaded at any time, it is not possible to hot swap a jar containing .so files - when such jars are changed the [new configuration will not take effect until the container is restarted](../jdisc/container-components.html#JNI-requires-restart). Therefore, it is often a good idea to package a .so file and its Java API into a separate bundle from the rest of your code to avoid having to restart the container on all code changes. ###### Add JNI code to the global classpath When the JNI dependency cannot be packaged in a bundle, and you run on an environment where you can install files locally on the container nodes, you can add the dependency to the container's classpath and explicitly export the packages to make them visible to OSGi bundles. Add the following configuration in the top level _services_ element in [services.xml](../reference/services-container.html): ``` ``` /lib/jars/foo.jar:/path/bar.jar com.foo,com.bar ... ``` ``` Adding the config at the top level ensures that it's applied to all jdisc clusters. The packages are now available and visible, but they must still be imported by the application bundle that uses the library. Here is how to configure the bundle plugin to enforce an import of the packages to the bundle: ``` com.yahoo.vespa bundle-plugin true\\com.foo,com.bar\\ ``` When adding a library to the classpath it becomes globally visible, and exempt from the package visibility management of OSGi. If another bundle contains the same library, there will be class loading issues. ##### Maven bundle plugin The _bundle-plugin_ is used to build and package components for the [Vespa Container](../jdisc/container-components.html) with Maven. Refer to the [multiple-bundles sample app](https://github.com/vespa-engine/sample-apps/tree/master/examples/multiple-bundles) for a practical example. 
The minimal Maven _pom.xml_ configuration is: ``` 4.0.0 com.yahoo.example basic-application container-plugin\ 8.599.6 \ ``` ``` **Note:** If the requested document-summary only contains fields that are[attributes](../attributes.html), the summary store (and cache) is not used. ##### Protocol phases caches _ranking.queryCache_ and _groupingSessionCache_described in the [Query API reference](../reference/query-api-reference.html)are only caching data in between phases for a given a query, so other queries do not get any benefits, but these caches saves container - content node(s) round-trips for a _given_ query. Copyright © 2025 - [Cookie Preferences](#) --- ## Chained Components ### Chained Components [Processors](../jdisc/processing.html), [searcher plug-ins](../searcher-development.html) and [document processors](../document-processing.html) are chained components. #### Chained Components [Processors](../jdisc/processing.html), [searcher plug-ins](../searcher-development.html) and [document processors](../document-processing.html) are chained components. They are executed serially, with each providing some service or transform, and other optionally depending on these. In other words, a chain is a set of components with dependencies. Javadoc: [com.yahoo.component.chain.Chain](https://javadoc.io/doc/com.yahoo.vespa/chain/latest/com/yahoo/component/chain/Chain.html) It is useful to read the [federation guide](../federation.html) before this document. A chained component has three basic differences from a component in general: - The named services it _provides_ to other components in the chain. - The list of services or checkpoints which the component itself should be _before_ in a chain, in other words, its dependents. - The list of services or checkpoints which the component itself should be _after_ in a chain, in other words, its dependencies. What a component should be placed before, what it should be placed after and what itself provides, may be either defined using Java annotations directly on the component class, or it may be added specifically to the component declarations in [services.xml](../reference/services-container.html). In general, the implementation should have as many of the necessary annotations as practical, leaving the application specific configuration clean and simple to work with. ##### Ordering Components The execution order of the components in a chain is not defined by the order of the components in the configuration. Instead, the order is defined by adding the _ordering constraints_ to the components: - Any component may declare that it `@Provides` some named functionality (the names are just labels that have no meaning to the container). - Any component may declare that it must be placed `@Before` some named functionality, - or that it must be placed `@After` some functionality. The container will pick any ordering of a chain consistent with the constraints of the components in the chain. Dependencies can be added in two ways. Dependencies which are due to the code should be added as annotations in the code: ``` import com.yahoo.processing.*; import com.yahoo.component.chain.dependencies.*;@Provides("SourceSelection") @Before("Federation") @After("IntentModel")public class SimpleProcessor extends Processor { @Override public Response process(Request request, Execution execution) { //TODO: Implement this } } ``` Multiple functionality names may be specified by using the syntax `@Provides/Before/After({"A", "B"})`. 
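For instance, a processor could declare several provided checkpoints and several ordering constraints at once. The following is a hedged sketch in the style of the SimpleProcessor example above; the class name and the label strings (other than Federation and IntentModel, which appear above) are arbitrary, since the names are just labels:

```
import com.yahoo.processing.Processor;
import com.yahoo.processing.Request;
import com.yahoo.processing.Response;
import com.yahoo.processing.execution.Execution;
import com.yahoo.component.chain.dependencies.After;
import com.yahoo.component.chain.dependencies.Before;
import com.yahoo.component.chain.dependencies.Provides;

// Declares two provided checkpoints and multiple ordering constraints in one go
@Provides({"SpellChecking", "QueryRewriting"})
@Before({"Federation", "Blending"})
@After("IntentModel")
public class RewritingProcessor extends Processor {

    @Override
    public Response process(Request request, Execution execution) {
        // Rewrite the request here, then continue down the chain
        return execution.process(request);
    }
}
```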
Annotations which do not belong in the code may be added in the[configuration](../reference/services-container.html): ``` \ai.vespa.examples.Processor1\ ``` For convenience, components always `Provides` their own fully qualified class name (the package and simple class name concatenated, e.g.`ai.vespa.examples.SimpleProcessor`) and their simple name (that is, only the class name, like`SimpleProcessor` in our searcher case), so it is always possible to declare that one must execute before or after some particular component. This goes for both general processors, searchers and document processors. Finally, note that ordering constraints are just that; in particular they are not used to determine if a given search chain, or set of search chains, is “complete”. ##### Chain Inheritance As implied by examples above, chains may inherit other chains in _services.xml_. ``` ``` ``` ``` A chain will include all components from the chains named in the optional `inherits` attribute, exclude from that set all components named in the also optional`excludes` attribute and add all the components listed inside the defining tag. Both `inherits` and`excludes` are space delimited lists of reference names. For search chains, there are two built-in search chains which are especially useful to inherit from, `native` and `vespa`.`native` is a basic search chain, containing the basic functionality most systems will need anyway,`vespa` inherits from `native` and adds a few extra searchers which most installations containing Vespa backends will need. ``` ``` ``` ``` ##### Unit Tests A component should be unit tested in a chain containing the components it depends on. It is not necessary to run the dependency handling framework to achieve that, as the `com.yahoo.component.chain.Chain` class has several constructors which are easy to use while testing. ``` Chain c = new Chain(new UselessSearcher("first"), new UselessSearcher("second"), new UselessSearcher("third")); Execution e = new Execution(c, Execution.Context.createContextStub(null)); Result r = e.search(new Query()); ``` The above is a rather useless test, but it illustrates how the basic workflow can be simulated. The constructor will create a chain with supplied searchers in the given order (it will not analyze any annotations). ##### Passing Information Between Components When different searchers or document processors depend on shared classes or field names, it is good practice defining the name only in a single place. An [example](../searcher-development.html#passing-information-between-searchers) in the searcher development introduction illustrates an easy way to do that. ##### Invoking a Specific Search Chain The search chain to use can be selected in the request, by adding the request parameter:`searchChain=myChain` If no chain is selected in the query, the chain called`default` will be used. If no chain called`default` has been configured, the chain called`native` will be used. The _native_ chain is always present and contains a basic set of searchers needed in most applications. Custom chains will usually inherit the native chain to include those searchers. The search chain can also be set in a [query profile](../query-profiles.html). 
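Relating to the passing-information practice described above, here is a minimal sketch (class and property names are made up) of keeping a shared property name in one place, so a producing and a consuming searcher in the same chain cannot drift apart:

```
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.searchchain.Execution;

public class SharedNames {
    // Single definition of the property name both searchers use
    public static final String INTENT = "com.example.intent";
}

class IntentTaggingSearcher extends Searcher {
    @Override
    public Result search(Query query, Execution execution) {
        query.properties().set(SharedNames.INTENT, "shopping"); // produce the value
        return execution.search(query);
    }
}

class IntentConsumingSearcher extends Searcher {
    @Override
    public Result search(Query query, Execution execution) {
        Object intent = query.properties().get(SharedNames.INTENT); // consume the value
        // Adjust the query based on the intent here
        return execution.search(query);
    }
}
```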
##### Example: Configuration Annotations which do not belong in the code may be added in the configuration, here a simple example with[search chains](../reference/services-search.html#chain): ``` \Cache\\Statistics\\Logging\\SimpleTest\ ``` And for [document processor chains](../reference/services-docproc.html), it becomes: ``` \TextMetrics\ ``` For searcher plugins the class[com.yahoo.search.searchchain.PhaseNames](https://javadoc.io/doc/com.yahoo.vespa/container-search/latest/com/yahoo/search/searchchain/PhaseNames.html)defines a set of checkpoints third party searchers may use to help order themselves when extending the Vespa search chains. Note that ordering constraints are just that; in particular they are not used to determine if a given search chain, or set of search chains, is “complete”. ##### Example: Cache with async write Use case: In a search chain, do early return and do further search asynchronously using [ExecutorService](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/concurrent/ExecutorService.html). Pseudocode: If cache hit (e.g. using Redis), just return cached data. If cache miss, return null data and let the following searcher finish further query and write back to cache: ``` ``` public Result search(Query query, Execution execution) { // cache lookup if (cache_hit) { return result; } else { execution.search(query); // invoke async cache update searcher next in chain return result; } } ``` ``` Copyright © 2025 - [Cookie Preferences](#) ###### On this page: - [Ordering Components](#ordering-components) - [Chain Inheritance](#chain-inheritance) - [Unit Tests](#unit-tests) - [Passing Information Between Components](#passing-information-between-components) - [Invoking a Specific Search Chain](#invoking-a-specific-search-chain) - [Example: Configuration](#example-configuration) - [Example: Cache with async write](#example-cache-with-async-write) --- ## Chunking Reference ### Chunking Reference Reference configuration for _chunkers_: Components that splits text into pieces in[chunk indexing expressions](indexing-language-reference.html#chunk), as in #### Chunking Reference Reference configuration for _chunkers_: Components that splits text into pieces in[chunk indexing expressions](indexing-language-reference.html#chunk), as in ``` indexing: input myTextField | chunk fixed-length 500 | index ``` See also the [guide to working with chunks](../working-with-chunks.html). ##### Built-in chunkers Vespa provides these built-in chunkers: | Chunker id | Arguments | Description | | --- | --- | --- | | sentence | - | Splits the text into chunks at sentence boundaries. | | fixed-length | target chunk length in characters | Splits the text into chunks with roughly equal length. This will prefer to make chunks of similar length, and to split at reasonable locations over matching the target length exactly. | ##### Chunker components Chunkers are [components](../jdisc/container-components.html), so you can also add your own: ``` ``` foo ``` ``` You create a chunker component by implementing the[com.yahoo.language.process.Chunker](https://github.com/vespa-engine/vespa/blob/master/linguistics/src/main/java/com/yahoo/language/process/Chunker.java)interface, see [these examples](https://github.com/vespa-engine/vespa/tree/master/linguistics/src/main/java/ai/vespa/language/chunker). 
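To give a feel for what the built-in `fixed-length` chunker does, here is a simplified, self-contained Java sketch. It is not Vespa's implementation and does not implement the `Chunker` interface; it just illustrates aiming for a target chunk length while preferring to break at whitespace:

```
import java.util.ArrayList;
import java.util.List;

public class FixedLengthChunkingSketch {

    // Split text into chunks of roughly targetLength characters,
    // preferring to break at the last whitespace before the limit.
    static List<String> chunk(String text, int targetLength) {
        List<String> chunks = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(start + targetLength, text.length());
            if (end < text.length()) {
                int lastSpace = text.lastIndexOf(' ', end);
                if (lastSpace > start) end = lastSpace; // break at a reasonable location
            }
            chunks.add(text.substring(start, end).strip());
            start = end;
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "Vespa splits long text fields into chunks before indexing them for retrieval.";
        System.out.println(chunk(text, 30));
    }
}
```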
Copyright © 2025 - [Cookie Preferences](#) --- ## Cloning Applications And Data ### Cloning applications and data This is a guide on how to replicate a Vespa application in different environments, with or without data. #### Cloning applications and data This is a guide on how to replicate a Vespa application in different environments, with or without data. Use cases for cloning include: - Get a copy of the application and (some) data on a laptop to work offline, or attach a debugger. - Deploy local experiments to the `dev` environment to easily cooperate and share. - Set up a copy of the application and (some) data to test a new major version of Vespa. - Replicate a bug report in a non-production environment. - Set up a copy of the application and (some) data in a `prod` environment to experiment with a CI/CD pipeline, without touching the current production serving. - Onboard a new team member by setting up a copy of the application and test data in a `dev` environment. - Clone to a `dev` environment for load testing. This guide uses _applications_. One can also use _instances_, but that will not work across Vespa major versions on Vespa Cloud - refer to [tenant, applications, instances](tenant-apps-instances) for details. Vespa Cloud has different environments `dev` and `prod`, with different characteristics -[details](https://cloud.vespa.ai/en/reference/environments). Clone to `dev` for short-lived experiments/development/benchmarking, use `prod` for serving applications with a [CI/CD pipeline](automated-deployments.html). As some steps are similar, it is a good idea to read through all, as details are added only the first time for brevity. Examples are based on the[album-recommendation](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation) sample application. **Note:** When done, it is easy to tear down resources in Vespa Cloud. E.g., _https://console.vespa-cloud.com/tenant/mytenant/application/myapp/prod/deploy_ or_https://console.vespa-cloud.com/tenant/mytenant/application/myapp/dev/instance/default_ to find a delete-link. Instances in `dev` environments are auto-expired ([details](https://cloud.vespa.ai/en/reference/environments)), so application cloning is a safe way to work with Vespa. Find more information in [deleting applications](deleting-applications). ##### Cloning - self-hosted to Vespa Cloud **Source setup:** ``` $ docker run --detach --name vespa1 --hostname vespa-container1 \ --publish 8080:8080 --publish 19071:19071 \ vespaengine/vespa $ vespa deploy -t http://localhost:19071 ``` **Target setup:** [Create a tenant](getting-started) in the Vespa Cloud console, in this guide using "mytenant". **Export source application package:** This gets the application package and copies it out of the container to local file system: ``` $ vespa fetch -t http://localhost:19071 && \ unzip application.zip -x application.zip ``` **Deploy target application package** The procedure differs a little whether deploying to dev or prod [environment](https://cloud.vespa.ai/en/reference/environments). The `mvn -U clean package` step is only needed for applications with custom code. Configure application name and create data plane credentials: ``` $ vespa config set target cloud && \ vespa config set application mytenant.myapp $ vespa auth login $ vespa auth cert -f $ mvn -U clean package ``` **Note:** When deploying to a new app, one will often want to generate a new data plane cert/key pair. To do this, use `vespa auth cert -f`. 
If reusing a cert/key pair, drop `-f` and make sure to put the pair in _.vespa_, to avoid errors like`Error: open /Users/me/.vespa/mytenant.myapp.default/data-plane-public-cert.pem: no such file or directory`in the subsequent deploy step. Then deploy the application. Depending on the use case, deploy to `dev` or `prod`: - `dev`: ``` $ vespa deploy ``` Expect something like: ``` Uploading application package ... done Success: Triggered deployment of . with run ID 1 Use vespa status for deployment status, or follow this deployment at https://console.vespa-cloud.com/tenant/mytenant/application/myapp/dev/instance/default/job/dev-aws-us-east-1c/run/1 ``` - Deployments to the `prod` environment requires [deployment.xml](https://cloud.vespa.ai/en/reference/deployment) - select which [zone](https://cloud.vespa.ai/en/reference/zones) to deploy to: ``` $ cat < deployment.xml aws-us-east-1c EOF ``` `prod` deployments also require `resources` specifications in [services.xml](https://cloud.vespa.ai/en/reference/services) - use [vespa-documentation-search](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/src/main/application/services.xml) as an example and add/replace `nodes` elements for `container` and `content` clusters. If in doubt, just add a small config to start with, and change later: ``` ``` Deploy the application package: ``` $ vespa prod deploy ``` Expect something like: ``` Hint: See[production deployment](production-deployment)Success: Deployed . See https://console.vespa-cloud.com/tenant/mytenant/application/myapp/prod/deployment for deployment progress ``` A proper deployment to a `prod` zone should have automated tests, read more in [automated deployments](automated-deployments) **Data copy** Export documents from the local instance and feed to the Vespa Cloud instance: ``` $ vespa visit -t http://localhost:8080 | vespa feed - ``` Add more parameters as needed to `vespa feed` for other endpoints. **Get access log from source:** ``` $ docker exec vespa1 cat /opt/vespa/logs/vespa/access/JsonAccessLog.default ``` ##### Cloning - Vespa Cloud to self-hosted **Download application from Vespa Cloud** Validate the endpoint, and fetch the application package: ``` $ vespa config get application application = mytenant.myapp.default $ vespa fetch Downloading application package... done Success: Application package written to application.zip ``` The application package can also be downloaded from the Vespa Cloud Console: - dev: Navigate to _https://console.vespa-cloud.com/tenant/mytenant/application/myapp/dev/instance/default_, click _Application_ to download: - prod: Navigate to _https://console.vespa-cloud.com/tenant/mytenant1/application/myapp/prod/deployment?tab=builds_ and select the version of the application to download: **Target setup:** Note the name of the application package .zip-file just downloaded. 
If changes are needed, unzip it and use `vespa deploy -t http://localhost:19071 `to deploy from current directory: ``` $ docker run --detach --name vespa1 --hostname vespa-container1 \ --publish 8080:8080 --publish 19071:19071 \ vespaengine/vespa $ vespa config set target local $ vespa deploy -t http://localhost:19071 mytenant.myapp.default.dev.aws-us-east-1c.zip ``` **Data copy** Set config target cloud for `vespa visit` and pipe the jsonl output into `vespa feed` to the local instance: ``` $ vespa config set target cloud $ vespa visit | vespa feed - -t http://localhost:8080 ``` **data copy - minimal** For use cases requiring a few documents, visit just a few documents: ``` $ vespa visit --chunk-count 10 ``` **Get access log from source:** Use the Vespa Cloud Console to get access logs ##### Cloning - Vespa Cloud to Vespa Cloud This is a combination of the procedures above. Download the application package from dev or prod, make note of the source name, like mytenant.myapp.default. Then use `vespa deploy` or `vespa prod deploy` as above to deploy to dev or prod. If cloning from `dev` to `prod`, pay attention to changes in _deployment.xml_ and _services.xml_as in [cloning to Vespa Cloud](#cloning---self-hosted-to-vespa-cloud). **Data copy** Set the feed endpoint name / paths, e.g. mytenant.myapp-new.default: ``` $ vespa config set target cloud $ vespa visit | vespa feed - -t https://default.myapp-new.mytenant.aws-us-east-1c.dev.z.vespa-app.cloud ``` **Data copy 5%**Set the –selection argument to `vespa visit` to select a subset of the documents. ##### Cloning - self-hosted to self-hosted Creating a copy from one self-hosted application to another. Self-hosted means running [Vespa](https://vespa.ai/) on a laptop or a [multinode system](/en/operations/multinode-systems.html). This example sets up a source app and deploys the [application package](https://cloud.vespa.ai/en/developer-guide) - use [album-recommendation](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation)as an example. The application package is then exported from the source and deployed to a new target app. Steps: **Source setup:** ``` $ vespa config set target local $ docker run --detach --name vespa1 --hostname vespa-container1 \ --publish 8080:8080 --publish 19071:19071 \ vespaengine/vespa $ vespa deploy -t http://localhost:19071 ``` **Target setup:** ``` $ docker run --detach --name vespa2 --hostname vespa-container2 \ --publish 8081:8080 --publish 19072:19071 \ vespaengine/vespa ``` **Export source application package** Export files: ``` $ vespa fetch -t http://localhost:19071 ``` **Deploy application package to target** Before deploying, one can make changes to the application package files as needed. 
Deploy to target:

```
$ vespa deploy -t http://localhost:19072 application.zip
```

**Data copy from source to target**

This pipes the source data directly into `vespa feed` - another option is to save the data to files temporarily and feed these individually:

```
$ vespa visit -t http://localhost:8080 | vespa feed - -t http://localhost:8081
```

**Data copy 5%**

This is an example of how to use a [selection](/en/reference/document-select-language.html) to specify a subset of the documents - here a "random" 5% selection:

```
$ vespa visit -t http://localhost:8080 --selection 'id.hash().abs() % 20 = 0' | \
  vespa feed - -t http://localhost:8081
```

**Get access log from source**

Get the current query access log from the source application (there might be more files there):

```
$ docker exec vespa1 cat /opt/vespa/logs/vespa/access/JsonAccessLog.default
```

---

## Cloudconfig Model Plugins

### Developing Cloud Config Model Plugins

The Cloud Config System (CCS) provides the ability to write custom plugins to the config model.

#### Developing Cloud Config Model Plugins

The Cloud Config System (CCS) provides the ability to write custom plugins to the config model. This allows cloud config consumers to provide a custom syntax for their users to configure the system.

Why create a plugin? While creating a config model plugin is not strictly necessary for using CCS, it increases the usability of the platform. With a plugin, one can provide custom syntax that allows the users to configure the system at a higher level of abstraction.

##### What problem does it solve

Imagine that you are developing a platform that requires the user to configure a cluster of servers running some service that can potentially scale to hundreds of servers. A service may be composed of multiple processes running on the same server, and those processes might be containers running many components inside them. The number of configuration parameters for those processes and components can be huge. Often, the differences between the configurations are small, but just large enough that information must be duplicated - information you must later verify does not conflict. Once you have this large number of configuration parameters, the logical step is to create some sort of script that automatically generates a valid configuration based on a few parameters to reduce the number of human errors. It is also common to create some form of validation program that can be run to ensure that the configuration options are valid and do not conflict.

CCS model plugins allow you to take this even further. In CCS, the configuration is represented as an application package, which contains various files that can represent the entire system configuration. However, the configuration is not directly given to the consumer processes and their components. First, the application package is provided as input to the config server, which builds a Java object model of the system, also known as the _config model_. The config model is built from a set of _model plugins_ that are able to produce a Java object model based on the application package contents. The object model can then serve the appropriate configuration to a component within a service.

![The config server assembles app config](/assets/img/config-assembly.svg)

###### Benefits of writing plugins

The model plugins allow you to create abstractions on top of the low-level configuration parameters that are given to the services.
Here are some specific use cases the model plugins solve: - The application can be validated to ensure that there are no syntax errors in the configuration files - The configuration files can be checked for conflicting configuration values - The plugin can allocate resources without the user needing to specify them (typically server ports) - The plugin may support any form of syntax as long as it can generate the object model, though using XML allows you to take advantage of many existing library features that i.e. make node and port allocation automatic or to allow the service to be automatically bootstrapped - The plugin may provide a high level of abstraction that allows the user to change one configuration value, while in reality more than one of the configuration values served to the services may change ###### Example Vespa uses CCS with multiple config model plugins. In Vespa, you can set up a complete system with e.g. this configuration: ``` ``` 1 ``` ``` The Vespa model plugins ensure that the following is set up based on the above XML: - A config server cluster - A log server cluster receiving logs from all the nodes - A service location broker cluster (vespa-slobrok) which maintains the current observed state of all the nodes in the other clusters - A document processing cluster processing incoming data - A container cluster set up to receive searches - A cluster controller cluster which takes singular decisions about the current state of nodes in the content cluster - A distributor cluster managing content distribution - A search+content cluster storing and searching a partition In addition, it sets up the wiring between these clusters, the matching configuration of content and query processing and more. Going from the simple user-facing config shown above to this complete system specification is the task of the config model(s). Technically the models are just Java objects that are instantiated in response to hitting an element immediately below \ in _services.xml_(so the above will create one _admin_, _container_ and _content_ model), which are embedded into the config server at runtime and answers incoming requests for config instances. During construction of the complete system model (consisting of multiple such config models), the models may create additional implicit config models and exchange information between themselves. They may also read other files of any format from the application package. ##### Building a Model In this section, we will build a plugin for a simple echo service. First, we must have a config to serve. An echo server just needs to know which port it must listen to. To keep the initial example simple, we ignore bootstrap and which host the node should run on for now (see [bootstrapping services](#bootstrapping-services) for an extended example). The following config is created and stored in`src/main/resources/configdefinitions` in the plugin source tree: ``` namespace=echo port int ``` See [Using the Cloud config API](configapi-dev.html)and the [Configuration File Reference](../reference/config-files.html)for more information on config definition files. The service will be configured using: ``` ``` 1337 ``` ``` To deal with your custom syntax, you first need to create a parser. The parser class must extend `ConfigModelBuilder`. The builder is required to specify which XML tags it can handle via the `handlesElements` method, and must hook the model into a subclass of `ConfigModel` object in the `doBuild` method. 
Its constructor must forward the class of the model in order for the superclass to instantiate the correct class and pass it to `doBuild`:

```
public class EchoModelBuilder extends ConfigModelBuilder<EchoModel> {

    public EchoModelBuilder() {
        super(EchoModel.class);
    }

    @Override
    public List<ConfigModelId> handlesElements() {
        return Arrays.asList(ConfigModelId.fromName("echo"));
    }

    @Override
    public void doBuild(EchoModel configModel, Element spec, ConfigModelContext modelContext) {
        int port = Integer.parseInt(XML.getValue(XML.getChild(spec, "port")));
        configModel.addServer(new EchoServer(configModel.getConfigProducer(), port));
    }
}
```

The config model represents the model that serves config when requested. The `ConfigModel` object does not itself serve config, but is used to proxy requests to the appropriate config producers for that model. For the echo service, the model is simple:

```
public class EchoModel extends ConfigModel {

    private final List<EchoServer> servers = new ArrayList<>();

    public EchoModel(ConfigModelContext modelContext) {
        super(modelContext);
    }

    public void addServer(EchoServer server) {
        servers.add(server);
    }

    public int numServers() {
        return servers.size();
    }
}
```

The `EchoServer` is the actual object producing the config, and is a subclass of the `AbstractConfigProducer` class, which automatically hooks it into the parent node in the tree, and causes config requests to be relayed to this producer:

```
public class EchoServer extends AbstractConfigProducer implements EchoConfig.Producer {

    private final int port;

    public EchoServer(AbstractConfigProducer parent, int port) {
        super(parent, "server");
        this.port = port;
    }

    public void getConfig(EchoConfig.Builder builder) {
        builder.port(port);
    }
}
```

The `EchoConfig` class is the one generated from the config definition file we specified earlier. Now we have a complete plugin that is able to serve the `echo` config to anyone who asks with the appropriate _config id_. The config id of a config producer is relative to its parent. In this example, the root producer has the id `echo`, while the server has the id `echo/server`.
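A config consumer can then subscribe to this config using that config id, via the config API referenced above. The following is a minimal, illustrative sketch (the class and its main method are hypothetical; it assumes the generated `EchoConfig` class is on the classpath):

```
import com.yahoo.config.subscription.ConfigHandle;
import com.yahoo.config.subscription.ConfigSubscriber;

public class EchoConfigClient {

    public static void main(String[] args) {
        // Subscribe to the echo config served by the model built above,
        // using the config id of the server producer
        ConfigSubscriber subscriber = new ConfigSubscriber();
        ConfigHandle<EchoConfig> handle = subscriber.subscribe(EchoConfig.class, "echo/server");

        if (subscriber.nextConfig()) {        // blocks until the first config generation is available
            EchoConfig config = handle.getConfig();
            System.out.println("Echo server should listen on port " + config.port());
        }
        subscriber.close();
    }
}
```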
##### Depending on other models

To create a modular system and facilitate code reuse, it is possible to depend on and use other models when building your own model. For instance, if we have an `EchoProxyModel` that handles its own XML tag and depends on the `EchoModel` being built first, we can add it as a constructor argument to signal the dependency and also allow us to access the `EchoModel` when building the `EchoProxyModel`:

```
public class EchoProxyModel extends ConfigModel {

    public EchoProxyModel(ConfigModelContext context, EchoModel echoModel) {
        // Store away model for use in builder.
    }
}
```

##### Unit Testing

Creating a unit test for your plugin is easy. Mock classes acting as the config model are available and can be used in unit tests:

```
public class EchoModelTest {

    @Test
    public void testEchoModel() {
        EchoModelBuilder builder = new EchoModelBuilder();
        assertThat(builder.handlesElements().size(), is(1));
        assertThat(builder.handlesElements().get(0).getName(), is("echo"));

        String xml = "<echo>" +
                     "  <port>1337</port>" +
                     "</echo>";

        TestDriver testDriver = new TestDriver().addBuilder(builder);
        TestRoot root = testDriver.buildModel(xml);

        EchoConfig config = root.getConfig(EchoConfig.class, "server");
        assertThat(config.port(), is(1337));

        EchoModel model = testDriver.getConfigModels(EchoModel.class).get(0);
        assertThat(model.numServers(), is(1));
    }
}
```

The `TestDriver` is a helper class designed to make unit testing config model builders a breeze. Any builders required to parse the XML are added to the tester using the `addBuilder` method. To build the entire model, with all dependencies resolved, call `buildModel` with either _services.xml_ or an `ApplicationPackage` as input. The result is a `TestRoot` object which can be used to inspect the entire model. The config retrieved from the `TestRoot` is the same as you would get from the config server in a production system.

The `MockRoot` class mocks the config model producer tree, and can be used to retrieve the config once the producers have been added to the tree. Before using it, however, the topology must be frozen by calling `freezeModelTopology`.

Use [vespa-get-config](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-get-config) to retrieve the config in its payload format:

```
$ vespa-get-config -n echo.echo -i echo/server -a path/to/echo.def
```

##### Bootstrapping Services

To ease the bootstrapping of services that must be run, Vespa provides helper classes that help define which hosts a service should run on, and which perform automatic start/stop of those services based on the config. We will now extend the echo service to support specifying clusters of echo servers that are automatically bootstrapped and run on multiple nodes.

In CCS, the term _node_ is used for any machine capable of running the services. It can be an alias for a physical host, a VM, or simply a set of parameters describing the resource requirements, and CCS will acquire a node. In this example, a traditional host alias specification is used. First, the XML syntax must be changed in order to support the new requirements and for the helper classes to recognize them:

```
<echo>
    <server hostalias="node0" port="1337"/>
    <server hostalias="node0" port="1338"/>
    <server hostalias="node1" port="1337"/>
</echo>
```

This syntax says "Run two echoservers on node0 using port 1337 and 1338, respectively. Run one echoserver on node1 using port 1337". Now we need to change the plugin (only the changed methods are shown):

```
public class EchoModelBuilder extends ConfigModelBuilder<EchoModel> {
    …
    @Override
    public void doBuild(EchoModel configModel, Element spec, ConfigModelContext modelContext) {
        configModel.setCluster(new EchoServerClusterBuilder().build(configModel.getConfigProducer(), spec));
    }
    …
}
```

```
public class EchoModel extends ConfigModel {

    private EchoServerCluster cluster;

    public EchoModel(ConfigModelContext modelContext) {
        super(modelContext);
    }

    public void setCluster(EchoServerCluster cluster) {
        this.cluster = cluster;
    }
}
```

```
public class EchoServerClusterBuilder extends DomConfigProducerBuilder<EchoServerCluster> {

    @Override
    protected EchoServerCluster doBuild(AbstractConfigProducer ancestor, Element producerSpec) {
        final Element repeatElement = XML.getChild(producerSpec, "repeat");
        final int repeat = repeatElement != null ? Integer.parseInt(XML.getValue(repeatElement)) : -1;
        final EchoServerCluster cluster = new EchoServerCluster(ancestor, "echocluster", repeat);
        for (Element server : XML.getChildren(producerSpec, "server")) {
            EchoServerBuilder builder = new EchoServerBuilder(cluster.numServers());
            cluster.addServer(builder.build(cluster, server));
        }
        return cluster;
    }
}
```
```
public class EchoServerCluster extends AbstractConfigProducer implements EchoConfig.Producer {

    private final List<EchoServer> echoServers = new ArrayList<>();
    private final int repeat;

    public EchoServerCluster(AbstractConfigProducer parent, String subId, int repeat) {
        super(parent, subId);
        this.repeat = repeat;
    }

    public void addServer(EchoServer server) {
        echoServers.add(server);
    }

    public int numServers() {
        return echoServers.size();
    }

    @Override
    public void getConfig(EchoConfig.Builder builder) {
        if (repeat >= 0)
            builder.repeat(repeat);
    }
}
```

```
public class EchoServerBuilder extends DomConfigProducerBuilder<EchoServer> {

    private final int serverNumber;

    public EchoServerBuilder(int serverNumber) {
        this.serverNumber = serverNumber;
    }

    @Override
    protected EchoServer doBuild(AbstractConfigProducer parent, Element element) {
        int port = Integer.valueOf(element.getAttribute("port"));
        return new EchoServer(parent, "server." + serverNumber, port);
    }
}
```

```
public class EchoServer extends AbstractService implements EchoConfig.Producer {

    private final int port;

    public EchoServer(AbstractConfigProducer parent, String name, int port) {
        super(parent, name);
        this.port = port;
    }

    @Override
    public String getStartupCommand() {
        return "/mydir/bin/echoserver";
    }

    public void getConfig(EchoConfig.Builder builder) {
        builder.port(port);
    }
}
```

By using the helper classes, there is a minimal amount of extra code, but the gain is huge. Using CCS, all the echoservers will automatically start when Vespa is run on their respective nodes, and they will get their appropriate config id passed as an environment variable. See [Using the Cloud config API](configapi-dev.html) for examples of how the echo server can subscribe to and use the config. In addition to host management, the `DomConfigProducerBuilder` class supports parsing config overrides as well.

##### Installing model plugins

To make use of the model plugins, they must be installed on the configserver host before the config server is started. The default folder for model plugins is `$VESPA_HOME/lib/jars/config-models`. Having installed the model plugin, the configserver needs to be instructed to load the plugin. To load the example `EchoModelBuilder`, add the following to `$VESPA_HOME/conf/configserver-app/config-models/echomodel.xml`:

```
```

---

## Security

### Using Cloudflare Workers with Vespa Cloud

This guide describes how you can access mutual TLS protected Vespa Cloud endpoints using [Cloudflare Workers](https://workers.cloudflare.com/).

#### Using Cloudflare Workers with Vespa Cloud

This guide describes how you can access mutual TLS protected Vespa Cloud endpoints using [Cloudflare Workers](https://workers.cloudflare.com/).
##### Writing and reading from Vespa Cloud Endpoints

Vespa Cloud's endpoints are protected using mutual TLS. This means the client must present a TLS certificate that the Vespa application trusts. The application knows which certificate to trust because the certificate is included in the Vespa application package.

###### mTLS Configuration

Mutual TLS certificates can be created using the [Vespa CLI](/en/vespa-cli.html). For example, for tenant `samples` with application `vsearch` and instance `default`:

```
$ vespa auth cert --application samples.vsearch.default

Success: Certificate written to security/clients.pem
Success: Certificate written to $HOME/.vespa/samples.vsearch.default/data-plane-public-cert.pem
Success: Private key written to $HOME/.vespa/samples.vsearch.default/data-plane-private-key.pem
```

Refer to the [security guide](guide) for details.

###### Creating a Cloudflare Worker to interact with mTLS Vespa Cloud endpoints

In March 2023, Cloudflare announced [Mutual TLS available for Workers](https://blog.cloudflare.com/mtls-workers/), see also [Workers Runtime API mTLS](https://developers.cloudflare.com/workers/runtime-apis/mtls/).

Install wrangler and create a worker project. Wrangler is the Cloudflare command line interface (CLI), refer to the [Workers: Get started guide](https://developers.cloudflare.com/workers/get-started/guide/). Once configured and authenticated, upload the Vespa Cloud data plane mTLS certificates to Cloudflare:

```
$ npx wrangler mtls-certificate upload \
  --cert $HOME/.vespa/samples.vsearch.default/data-plane-public-cert.pem \
  --key $HOME/.vespa/samples.vsearch.default/data-plane-private-key.pem \
  --name vector-search-dev
```

The output will look something like this:

```
Uploading mTLS Certificate vector-search-dev...
Success! Uploaded mTLS Certificate vector-search-dev
ID: 63316464-1404-4462-baf7-9e9f81114d81
Issuer: CN=cloud.vespa.example
Expires on 3/11/2033
```

Notice the `ID` in the output; this is the `certificate_id` of the uploaded mTLS certificate. To use the certificate in the worker code, add an `mtls_certificates` variable to the `wrangler.toml` file in the project to bind a name to the certificate id. In this case, bind to `VESPA_CERT`:

```
mtls_certificates = [
  { binding = "VESPA_CERT", certificate_id = "63316464-1404-4462-baf7-9e9f81114d81" }
]
```

With the above binding in place, you can access the `VESPA_CERT` in Worker code like this:

```
export default {
  async fetch(request, env) {
    return await env.VESPA_CERT.fetch("https://vespa-cloud-endpoint");
  }
}
```

Notice that `env` is a variable passed by the Cloudflare worker infrastructure.

###### Worker example

The following worker example forwards POST and GET HTTP requests to the `/search/` path of the Vespa Cloud endpoint. It rejects other paths and other HTTP methods.
```
/**
 * Simple Vespa proxy that forwards read (POST and GET) requests to the
 * /search/ endpoint.
 * Learn more at https://developers.cloudflare.com/workers/
 */
export default {
  async fetch(request, env, ctx) {

    // Change to your endpoint URL, obtained from the Vespa Cloud Console.
    // Use the global endpoint if you have global routing with multiple Vespa regions.
    const vespaEndpoint = "https://vsearch.samples.aws-us-east-1c.dev.z.vespa-app.cloud";

    async function MethodNotAllowed(request) {
      return new Response(`Method ${request.method} not allowed.`, {
        status: 405,
        headers: {
          Allow: 'GET,POST',
        }
      });
    }

    async function NotAcceptable(request) {
      return new Response(`Path not Acceptable.`, {
        status: 406,
      });
    }

    if (request.method !== 'GET' && request.method !== 'POST') {
      return MethodNotAllowed(request);
    }

    let url = new URL(request.url)
    const { pathname, search } = url;
    if (!pathname.startsWith("/search/")) {
      return NotAcceptable(request);
    }

    const destinationURL = `${vespaEndpoint}${pathname}${search}`;
    let new_request = new Request(destinationURL, request);
    return await env.VESPA_CERT.fetch(new_request)
  },
};
```

To deploy the above to the worldwide global edge network of Cloudflare, use:

```
$ npx wrangler publish
```

To start a local instance, use:

```
$ npx wrangler dev
```

Test using `curl`:

```
$ curl --json '{"yql": "select * from sources * where true"}' http://127.0.0.1:8787/search/
```

After publishing to Cloudflare production:

```
$ curl --json '{"yql": "select * from sources * where true"}' https://your-worker-name.workers.dev/search/
```

##### Data plane access control permissions

Vespa Cloud supports having multiple certificates to separate `read` and `write` access. This way, one can upload the read-only certificate to a Cloudflare worker to limit write access. See [Data plane access control permissions](guide#permissions).

---

### Security Guide

Vespa Cloud has several security mechanisms it is important for developers to understand.

#### Security Guide

Vespa Cloud has several security mechanisms it is important for developers to understand. Vespa Cloud has two different interaction paths, _Data Plane_ and _Control Plane_. Communication with the Vespa application goes through the _Data Plane_, while the _Control Plane_ is used to manage Vespa tenants and applications. The _Control Plane_ and the _Data Plane_ have different security mechanisms, described in this guide.

##### Data Plane

The data plane is protected using mutual TLS or, optionally, tokens.

###### Configuring mTLS

Certificates can be created using the [Vespa CLI](/en/vespa-cli.html):

```
$ vespa auth cert --application <tenant>.<application>.<instance>
```

```
$ vespa auth cert --application scoober.albums.default

Success: Certificate written to security/clients.pem
Success: Certificate written to $HOME/.vespa/scoober.albums.default/data-plane-public-cert.pem
Success: Private key written to $HOME/.vespa/scoober.albums.default/data-plane-private-key.pem
```

The certificates can be created regardless of whether the application exists in Vespa Cloud yet.
One can use this command to generate `security/clients.pem` for an application package:

```
$ cp $HOME/.vespa/scoober.albums.default/data-plane-public-cert.pem security/clients.pem
```

Certificates can also be created using OpenSSL:

```
$ openssl req -x509 -sha256 -days 1825 -newkey rsa:2048 -keyout key.pem -out security/clients.pem
```

The certificate is placed inside the application package in [security/clients.pem](https://cloud.vespa.ai/en/reference/application-package). Make sure `clients.pem` is placed correctly if the certificate is created with OpenSSL; the Vespa CLI handles this automatically. `security/clients.pem` files can contain multiple PEM encoded certificates by concatenating them. This allows you to have multiple clients with separate private keys, making it possible to rotate to a new certificate without any downtime.

###### Permissions

To support different permissions for clients, it is possible to limit the permissions of a client. Only `read` or `write` permissions are supported.

###### Request mapping

The request actions are mapped from the HTTP method. The default mapping rule is:

- GET → `read`
- PUT, POST, DELETE → `write`

For `/search/` this is replaced by:

- GET, POST → `read`

###### Example

Create three different certificates, for three different use cases:

- Serving - `read`
- Ingest - `write`
- Full access - `read, write`

```
$ openssl req -x509 -sha256 -days 1825 -newkey rsa:2048 -keyout key.pem -out security/serve.pem
$ openssl req -x509 -sha256 -days 1825 -newkey rsa:2048 -keyout key.pem -out security/ingest.pem
$ openssl req -x509 -sha256 -days 1825 -newkey rsa:2048 -keyout key.pem -out security/full_access.pem
```

Notes:

- Files must be placed in the _security_ folder inside the application package
- Certificates must be unique
- Certificate chains are currently not supported
- Files must be written using PEM encoding

Reference the certificate files from _services.xml_ using the `clients` element, for example (client ids are illustrative):

```
<container id="default" version="1.0">
    ...
    <clients>
        <client id="serve" permissions="read">
            <certificate file="security/serve.pem"/>
        </client>
        <client id="ingest" permissions="write">
            <certificate file="security/ingest.pem"/>
        </client>
        <client id="full-access" permissions="read,write">
            <certificate file="security/full_access.pem"/>
        </client>
    </clients>
    ...
</container>
```

###### Custom request mapping

The default mapping can be changed by overriding `requestHandlerSpec()`:

```
/**
 * Example overriding acl mapping of POST requests to read
 */
public class CustomAclHandler extends ThreadedHttpRequestHandler {

    private final static RequestHandlerSpec REQUEST_HANDLER_SPEC =
        RequestHandlerSpec.builder().withAclMapping(
            HttpMethodAclMapping.standard()
                .override(Method.POST, AclMapping.Action.READ)
                .build())
            .build();

    @Override
    public RequestHandlerSpec requestHandlerSpec() {
        return REQUEST_HANDLER_SPEC;
    }
}
```

###### Configure tokens

While mTLS continues to be the recommended option, the application can also be configured to accept token based authentication when mTLS is not available for the client (e.g. in case of edge functions). Note that it is still required to define at least one client for mTLS.

**Note:** Token authentication must be explicitly enabled when used in combination with [Private Endpoints](https://cloud.vespa.ai/en/private-endpoints.html).

###### Create tokens using the console

Tokens are managed in the console under **Account > Tokens**. All tokens are identified by a name, and can contain multiple versions to easily support token rotation.

To create a new token:

1. Click **Add token**
2. Enter a name for the token; note that this name must also be referenced in the application later.
3. Select an expiration for the token.
4. Click **Add**. Remember to copy the token value and store it securely. The value is not stored in Vespa Cloud.

To add a new version:

1. Find the existing token, click **Add version**
2. Select expiration and click **Add**. Copy the token value and store it securely.

To revoke a version:

1. Find the existing token version, click **Revoke**

To manually rotate a token:

1. Add a new token version following the above steps
2. Revoke the old version when no clients use it anymore

###### Application configuration

After creating a token in the console, it must be configured for accessing a container cluster, using the [clients](https://cloud.vespa.ai/en/reference/services.html#clients) configuration. Below is a simplified example for an application with two container clusters, one for feeding and document access (i.e. read+write), and another for query access (i.e. read) - one token for each (cluster, client and token ids are illustrative):

```
<container id="documentapi" version="1.0">
    <document-api/>
    <clients>
        <client id="mtls" permissions="read,write">
            <certificate file="security/clients.pem"/>
        </client>
        <client id="feed-token-client" permissions="read,write">
            <token id="feed-token"/>
        </client>
    </clients>
    ...
</container>

<container id="query" version="1.0">
    <search/>
    <clients>
        <client id="mtls-read" permissions="read">
            <certificate file="security/clients.pem"/>
        </client>
        <client id="query-token-client" permissions="read">
            <token id="query-token"/>
        </client>
    </clients>
    ...
</container>
```

Notes:

- Some applications use _one_ container cluster, and the settings will then be like the `documentapi` cluster above.
- If the application also uses the default `security/clients.pem` to configure mTLS, a configuration must be added for this, as above.

###### Security recommendations

The cryptographic properties of token authentication vs mTLS are comparable. There are however a few key differences in how they are used: tokens are sent as a header with every request, and since they are part of the request they are also more easily leaked in log outputs or source code (e.g. curl commands). It is therefore recommended to:

- Create tokens with a short expiry (keep the default of 30 days).
- Keep tokens in a secret provider, and remember to hide output.
- Never commit secret tokens into source code repositories!

###### Use endpoints

###### Using mTLS

Once the application is configured and deployed with a certificate in the application package, requests can be sent to the application. Again, the Vespa CLI can help to use the correct certificate:

```
$ vespa curl --application <tenant>.<application>.<instance> /ApplicationStatus
```

```
$ curl --key $HOME/.vespa/scoober.albums.default/data-plane-private-key.pem \
  --cert $HOME/.vespa/scoober.albums.default/data-plane-public-cert.pem \
  $ENDPOINT
```

###### Using tokens

The token endpoint must be used when using tokens. After deployment is complete, the token endpoint will be available in the token endpoint list (marked "Token"). To use the token endpoint, the token should be sent as a bearer authorization header:

```
$ vespa query \
  --header="Authorization: Bearer $TOKEN" \
  'yql=select * from music where album contains "head"'
```

```
$ curl -H "Authorization: Bearer $TOKEN" $ENDPOINT
```

###### Using a browser

In Vespa guides, curl is used in examples, like:

```
$ curl --cert ./data-plane-public-cert.pem --key ./data-plane-private-key.pem $ENDPOINT
```

To use a browser, install the key/cert pair into Keychain Access (macOS Sonoma), assuming the Certificate Common Name is "cloud.vespa.example" (as in the guides):

1. Install the key/cert pair:

   ```
   $ cat data-plane-public-cert.pem data-plane-private-key.pem > pkcs12.pem
   $ openssl pkcs12 -export -out pkcs12.p12 -in pkcs12.pem
   ```

2. A new password will be requested; it will be used in the next steps.
3. In Keychain Access, with the login keychain:
   - Click "File" -> "Import Items".
   - Choose the pkcs12.p12 file created before and type the password.
   - Double-click the imported certificate, open "Trust" and set "When using this certificate" to "Always Trust".
   - Right-click and "New Certificate Preference...", then add the $ENDPOINT.
4. Open the same URL in Chrome, choose the example.com certificate and allow Chrome to read the private key.
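The same key/certificate pair can also be used programmatically from JVM code. The sketch below is illustrative only, not an official Vespa client; it assumes the `pkcs12.p12` keystore created in step 1 above, with the export password and the endpoint URL passed as environment variables:

```
import java.io.FileInputStream;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;

public class VespaMtlsQuery {

    public static void main(String[] args) throws Exception {
        char[] password = System.getenv("P12_PASSWORD").toCharArray();  // password chosen in step 2
        String endpoint = System.getenv("ENDPOINT");                    // endpoint URL from the Vespa Cloud Console

        // Load the client certificate and private key from the PKCS#12 keystore
        KeyStore keyStore = KeyStore.getInstance("PKCS12");
        try (FileInputStream in = new FileInputStream("pkcs12.p12")) {
            keyStore.load(in, password);
        }
        KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(keyStore, password);

        SSLContext sslContext = SSLContext.getInstance("TLS");
        sslContext.init(kmf.getKeyManagers(), null, null);

        // Send a query to the mTLS-protected endpoint
        HttpClient client = HttpClient.newBuilder().sslContext(sslContext).build();
        String yql = URLEncoder.encode("select * from sources * where true", StandardCharsets.UTF_8);
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(endpoint + "/search/?yql=" + yql))
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```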
###### Using Postman

Many developers prefer interactive tools like [Postman](https://postman.com/). The Vespa blog has an article on [how to use Postman with Vespa](https://blog.vespa.ai/interface-with-vespa-apis-using-postman/).

###### Using Cloudflare Workers

See [Using Cloudflare Workers with Vespa Cloud](cloudflare-workers).

##### Control Plane

The control plane is used to manage the Vespa applications. There are two different ways to access the Control Plane: using `vespa auth login` to log in as a regular user, and using Application Keys. `vespa auth login` is intended for developers deploying manually to dev, while Application Keys are intended for deploying applications to production, typically by a continuous build tool. See more about these two methods below.

###### Managing users

Tenant administrators manage user access through the Vespa Console.

![Vespa Console user management](/assets/img/manage-users.png)

Users have two different privilege levels:

- **Admin:** Can administrate the tenant's metadata and the users of the tenant.
- **Developer:** Can administrate the applications deployed in the tenant.

###### User access to Control Plane

Outside the Vespa Console, communicating with the Control Plane is easiest with the [Vespa CLI](/en/vespa-cli.html):

```
$ vespa auth login

Your Device Confirmation code is: ****-****
If you prefer, you can open the URL directly for verification
Your Verification URL: https://vespa.auth0.com/activate?user_code=****-****
Press Enter to open the browser to log in or ^C to quit...
Waiting for login to complete in browser ... done
Successfully logged in.
```

After logging in with the Vespa CLI, the CLI can be used to deploy applications. Users are logged in with the same privileges as the user described in the Vespa Console.

###### Application Key

If programmatic access to the Control Plane is needed, for example from a CI/CD system like GitHub Actions, the Application Key can be used - see the example [deploy-vector-search.yaml](https://github.com/vespa-cloud/vector-search/blob/main/.github/workflows/deploy-vector-search.yaml).

###### Configuration

The Application Key can be generated in the Console from the Deployment Screen. The key pair is generated in the browser, and the private key appears as a download in the browser. The public key can be downloaded separately from the Deployment Screen. The private key is never persisted in Vespa Cloud, so it is important that the private key is kept securely. If lost, the private key is unrecoverable.

![Vespa Console application key management](/assets/img/application-key.png)

The Application Key can also be generated using the Vespa CLI:

```
$ vespa auth api-key -a <tenant>.<application>.<instance>
```

```
$ vespa auth api-key -a scoober.albums.default

Success: API private key written to $HOME/.vespa/scoober.api-key.pem
This is your public key:
-----BEGIN PUBLIC KEY-----
MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE5fQUq12J/IlQQdE8pWC5596S7x9f
HpPcyxCX2dXBS4aqKxnfN5HEyTkLCNGCo9HQljgLziqW1VFzshAdm3hHQg==
-----END PUBLIC KEY-----

Its fingerprint is:
91:1f:de:e3:9f:d3:21:28:1b:1b:05:40:52:72:81:4f

To use this key in Vespa Cloud click 'Add custom key' at
https://console.vespa-cloud.com/tenant/scoober/keys
and paste the entire public key including the BEGIN and END lines.
```

###### Using the application key

The Application Key can be used from the Vespa CLI to run requests against the Control Plane, such as deploying applications to Vespa Cloud:
```
$ vespa deploy -z dev.aws-us-east-1c
```

##### Dataplane access

Vespa Cloud users on paid plans have access to Vespa Cloud Support. For cases where the Vespa Team needs access to the application's data to provide support, Vespa support personnel can request access after an explicit approval from the customer in the open support case.

---

### Secret Store

Vespa Cloud supports secure storage and management of secrets for use in your application.

#### Secret Store

Vespa Cloud supports secure storage and management of secrets for use in your application. A secret is a text-based value such as an API key, a token or another private configuration value required by your application. By organizing secrets into vaults, setting application-specific access controls, and integrating secrets cleanly into your application code, Vespa Cloud ensures that sensitive data like API keys and tokens are kept safe and are easily updatable.

This guide takes you through secret management for your tenant and how to use secrets in your application. Use the [Retrieval Augmented Generation (RAG) in Vespa](https://github.com/vespa-engine/sample-apps/tree/master/retrieval-augmented-generation#deploying-to-the-vespa-cloud-using-gpu) sample application for a practical example of getting started with the Secret Store. This example uses the Secret Store to store an OpenAI API key.

##### Secret management

In the Vespa Cloud console, the "Account" section of your tenant contains a "Secret store" tab. This is where you configure all secrets for your tenant.

###### Vaults

Secrets are organized into vaults, where each vault can contain a number of secrets. The vault also contains rules for which applications can use the secrets in the vault. You can have any number of vaults. To create a new vault, click the "+ New vault" button. The vault name must match the rule `[.a-zA-Z0-9_-]`, meaning only alphanumeric characters and `.`, `_`, and `-` are allowed. Spaces are not allowed.

![Secret store overview](/assets/img/secret-store.png)

After creation, you can delete the entire vault by clicking the red trash bin button on the top right.

###### Access control

Each vault has an "Access control" section which determines which applications have access to the secrets in the vault. For each application, you can set up which environment - [dev](https://cloud.vespa.ai/en/reference/environments#dev) or [prod](https://cloud.vespa.ai/en/reference/environments#prod) (including test and staging) - the application should have access within. Note that the application must have been created before you can set up access control for it. Use the steps at [Retrieval Augmented Generation (RAG) in Vespa](https://github.com/vespa-engine/sample-apps/tree/master/retrieval-augmented-generation#deploying-to-the-vespa-cloud-using-gpu) to create an application and grant access.

###### Secrets

To add a new secret, click the "+ New secret" button. The same naming rules apply for secrets. You can give any value to the secret. Note that once it is saved, the secret will never be visible again.
You can update the secret to new values, but never retrieve the actual value. The maximum length for a secret is 64K characters. Each tenant has a limit of 15 secrets.

![Creating new secret](/assets/img/secret-store-secret.png)

After the secret has been created, you can update the secret to a new value or delete it. Note that when a secret is updated, applications using it will start using the new value within 60 seconds. Also note that your application will not deploy successfully if it requests a secret that is not available, either because the secret is not defined or because the application does not have access to it.

##### Example: Using an OpenAI API key for RAG

Set up a RAG search chain that uses an OpenAI API key as a secret:

```
apiKey openai
```

Try [Retrieval Augmented Generation (RAG) in Vespa](https://github.com/vespa-engine/sample-apps/tree/master/retrieval-augmented-generation#deploying-to-the-vespa-cloud-using-gpu) for a practical example.

##### Using secrets

To use the secret in an application, add `secrets` to `services.xml`:

```
<container id="default" version="1.0">
    ...
    <secrets>
        <myApiKey vault="my-vault" name="my-api-key"/>
    </secrets>
    ...
</container>
```

In this example, we refer to a secret named `my-api-key` in the vault `my-vault`, with the name `myApiKey` in the application. To access this secret in a custom component, inject the `Secrets` as a constructor parameter in the component, like a Searcher:

```
import ai.vespa.secret.Secret;
import ai.vespa.secret.Secrets;
...

public class MySearcher extends Searcher {

    private final Secret apiKeySecret;

    public MySearcher(Secrets secrets) {
        apiKeySecret = secrets.get("myApiKey");
    }

    @Override
    public Result search(Query query, Execution execution) {
        String apiKey = apiKeySecret.current();
        // ... do something with the current value of the secret ...
        return execution.search(query);
    }
}
```

Typically, store the `Secret` in your class, and when you want to use the secret value itself, call `Secret.current()`. This ensures that you will use the current secret value if it is updated. Note that it can take up to 60 seconds for the current secret value to be updated for your container code. Ensure that you do not store the `current` value itself - then the secret value would not be updated when the configuration is changed.

---

### Vespa Cloud Security

- [**Security Guide**](/en/cloud/security/guide.html) is a practical guide to using the different security features and getting started with them.

#### Vespa Cloud Security

- [**Security Guide**](/en/cloud/security/guide.html) is a practical guide to using the different security features and getting started with them.
- [**Secret Store**](/en/cloud/security/secret-store.html) is a guide on how to integrate AWS Parameter Stores with Vespa Cloud.
- [**Cloudflare Workers**](/en/cloud/security/cloudflare-workers.html) describes how you can access mutual TLS protected Vespa Cloud endpoints using Cloudflare Workers.
- [**Whitepaper**](/en/cloud/security/whitepaper.html) is an in-depth description of the security architecture of Vespa Cloud.
---

### Vespa Cloud Security Whitepaper

_Last updated: 2025-06-04_

#### Vespa Cloud Security Whitepaper

_Last updated: 2025-06-04_

##### Table of Contents

- [Table of Contents](#table-of-contents)
- [Introduction](#introduction)
- [Concepts and architecture](#concepts-and-architecture)
- [Service deployment](#service-deployment)
- [Control plane authentication and authorization](#control-plane-authentication-and-authorization)
- [Control plane API access](#control-plane-api-access)
- [Roles and privileges](#roles-and-privileges)
- [Control plane audit logs](#control-plane-audit-logs)
- [Service isolation](#service-isolation)
- [Access control and service identity](#access-control-and-service-identity)
- [Node isolation](#node-isolation)
- [Host isolation](#host-isolation)
- [Configuration isolation](#configuration-isolation)
- [Network isolation](#network-isolation)
- [Communication](#communication)
- [Data plane](#data-plane)
- [Federation](#federation)
- [Data Storage](#data-storage)
- [Encryption at Rest](#encryption-at-rest)
- [Data classification](#data-classification)
- [Asset types](#asset-types)
- [Logs](#logs)
- [Access management](#access-management)
- [Security Measures](#security-measures)
- [Security Testing](#security-testing)
- [Secure Development](#secure-development)
- [Vulnerability Management](#vulnerability-management)
- [Incident Response](#incident-response)

##### Introduction

This document describes the Vespa Cloud service security features and operational procedures.

##### Concepts and architecture

![Vespa Cloud overall architecture diagram](/assets/img/overall-architecture.png)

The Vespa Cloud consists of a _Control Plane_ and a _Data Plane_. Each has its own web service APIs, respectively managing Vespa applications (Control) and interacting with a deployed Vespa application (Data).

The Control Plane manages deployment of applications in the zones they specify, and lets tenant administrators manage their tenant information in Vespa Cloud. The Control Plane is shared among all tenants in Vespa Cloud and is globally synchronized.

The Data Plane lets the tenants communicate with their deployed Vespa applications. It supports queries, feeding, and any other type of requests the tenant has configured and deployed in their application. The Data Plane is isolated for each tenant, application, and (optionally) service.

The Vespa Cloud is divided into _Zones_. A zone is a combination of an _environment_ and a _region_ and has a name like _prod.aws-us-east-1c_. Zones are stand-alone and do not have critical dependencies on services outside the zone. Tenants can implement service redundancy by specifying that applications be deployed in multiple zones.

A Zone is managed by a _Configuration Server_ cluster. This cluster receives the application packages from the _Control Plane_ on deployment and manages the local deployment process in the zone, including provisioning the node resources required to run the deployed application in the zone. Separately, it is responsible for maintaining those resources - replacing failed nodes, upgrading their OS and similar.

Vespa applications run on _Nodes_ - Linux containers executed on a _Host_. The _Host_ is the actual machine running the containers. Each Host has a management process that receives instructions from the Configuration Server about which containers should run on the Host.
Once started, the containers ask the Configuration Server cluster which Vespa services of which application they should run. It is the individual Node that contains the customer data, such as indexes and documents, and which receives the queries and feeding requests from the customer's authenticated and authorized clients. Each Node is always dedicated to a single Vespa application cluster. Hosts are shared by default, but applications may specify that they require dedicated hosts to obtain an additional level of security isolation.

##### Service deployment

###### Control plane authentication and authorization

###### Control plane API access

All API operations towards the Vespa Cloud control plane require authorization, and no tenant or application information will be presented for unauthorized access. A user can present a valid OAuth2 token which will be verified by the API. If an OAuth2 token is not available, the user can choose to use an API key instead. The intended use for API keys is service automation (e.g. CI/CD workflows or GitHub Actions), but they can also be used by developers.

###### Roles and privileges

Members of tenants in Vespa Cloud can be assigned to three different roles that grant different privileges:

- **Reader:** Can read tenant and application metadata. This is the minimal privilege which is implicitly granted to all members of a tenant.
- **Developer:** Can create applications, deploy to dev and prod zones. These are the privileges needed by members working on applications.
- **Administrator:** Can manage members of a tenant and tenant metadata, such as tenant contact information and billing actions.

All role memberships are stored in an external identity provider.

###### Control plane audit logs

All operations against the control plane are persisted in an audit log capturing _timestamp_, _client_, _principal_ (user), _HTTP method_, _resource_ accessed, and _payload_ (for certain requests). As this data can potentially be sensitive, it is available upon request from Vespa Cloud support.

###### Service isolation

![image](/assets/img/service-isolation.png "Service isolation")

Nodes belonging to the same application are allowed to communicate with each other, while nodes of different applications are isolated on the network layer and through authorization. Communication between Vespa services is encrypted and authenticated using mutual TLS (mTLS). Identities and certificates are provided by infrastructure components that can validate the configuration.

###### Access control and service identity

Each host and node has a unique cryptographic service identity. This identity is required in all inter-service communication, including HTTPS and internal binary RPC protocols. On the host, node, and configuration server level, there are authorization rules in place to ensure that only relevant services can communicate with each other and retrieve resources from shared services, like the configuration server.

###### Node isolation

The identity of the node is based on the tenant, application, and instance the node is part of. The host and configuration server together establish the identity of the node. The configuration server tells the host which nodes it should start, and the host requests a cryptographic identity for the nodes from the identity provider. This node identity is used for all internal communication inside the application. Nodes are implemented as Linux containers on the hosts.
Each node runs in its own container user namespace, and each node has a dedicated IP address.

###### Host isolation

The lowest physical resource in the service architecture is a host. The configuration server is responsible for provisioning hosts and will keep track of known hosts, rejecting any unknown hosts. Hosts only communicate directly with the configuration server and cannot communicate with each other.

###### Configuration isolation

Both nodes and hosts consume application configuration from the configuration server. The configuration server applies authorization rules based on the host and node identity. Authorization rules are based on least privilege: hosts will only see which nodes to run, while the nodes are able to access the application configuration.

###### Network isolation

All communication between services is protected through mTLS. mTLS authorization is based on the identity mentioned above. In addition, network level isolation is used to prevent any unauthorized network access between services. The network rules are configured in the configuration server and applied by the host. Changes to the topology are reflected within minutes.

##### Communication

###### Data plane

All access to application endpoints is secured by mTLS and optionally token authentication. Upon deployment, every application is provided a certificate with SAN DNS names matching the endpoint names. This certificate will be automatically refreshed every 90 days. The application owner must provide a set of trusted Certificate Authorities which will be used by all clients when accessing the endpoints using mTLS.

###### Federation

It is possible for an application owner to federate calls to 3rd party services, either as scheduled jobs or per request. To support this use case, we provide access to a credential storage in the customer's AWS account.

##### Data Storage

###### Encryption at Rest

All customer data is encrypted at rest using the cloud provider's native encryption capabilities (AWS KMS or Google Cloud KMS). Encryption is performed with the following properties:

- Cipher: A strong, industry-standard cipher such as AES-256 (or the provider's default strong cipher)
- Key Management: Customer-managed keys within the respective cloud provider's key management service (AWS KMS or Google Cloud KMS)

Access to the keys is strictly controlled and audited through IAM roles and policies employing least privilege. Key rotation is managed automatically by the cloud provider on a regular basis.

###### Data classification

All data handled by Vespa Cloud is classified into two different classes which have different policies associated with them.

- **Internal data:** Information intended for internal consumption in Vespa Cloud operations. This includes system level logs from services that do not handle customer data. Internal data is readable by authenticated and authorized members of the Vespa Cloud engineering team.
- **Confidential data:** Confidential data is data that is sensitive to Vespa Cloud or Vespa Cloud customers. Access to confidential data is subject to stringent business need-to-know. Access to confidential data is regulated and only granted to Vespa Cloud team members in a peer-approved, time-limited, and audited manner. _All customer data is considered confidential._

###### Asset types

| Asset | Class | Description |
| --- | --- | --- |
| Control Plane data | Internal | The Control Plane maintains a database to facilitate orchestration of Vespa applications in multiple zones. This contains metadata about tenants and applications in Vespa Cloud. |
| Configuration Server data | Confidential | The configuration server database contains the Vespa application model as well as the orchestration database. Since the configuration server is part of establishing node and host identities, the configuration server data is considered confidential. |
| Infrastructure logs | Internal | Logs from infrastructure services like the configuration servers, the control plane services, etc. are considered internal. This includes logs from the Control Plane, Configuration Servers, and Hosts. |
| Application package | Internal | The application.zip file uploaded to Vespa Cloud by the customer is considered internal. The application package contains settings and configuration that Vespa Cloud operations needs insight into to operate the platform. |
| Node logs | Confidential | The logs inside the Node may contain data printed by the customer. Because of this, the logs are classified as confidential, since Vespa Cloud cannot guarantee they are free of confidential data. This includes Data Plane access logs in addition to the node Vespa logs. |
| Core dumps / heap dumps | Confidential | Occasionally, core dumps and heap dumps are generated for running services. These files may contain customer data and are considered confidential. |
| Node data | Confidential | All data on the node itself is considered confidential. This data includes the document data and the indexes of the application. |

###### Logs

All logs are stored on the nodes where they are generated, but are also archived to remote object storage. All logs are kept for a maximum of 30 days. Access to logs is based on the classifications described above. All logs are persisted in the same geographic region as the Vespa application that generated them. Archived logs are encrypted at rest with keys automatically rotated at regular intervals. Logs on the node are encrypted at rest with the same mechanism that encrypts indexes and document databases.

###### Access management

Access to confidential data is only granted on a case-by-case basis. Access is reviewed, time-limited, and audited. No Vespa Cloud team member is allowed to access any confidential data without review.

##### Security Measures

Vespa Cloud employs a multi-layered approach to security, encompassing vulnerability management, secure development practices, and proactive testing. These include:

###### Security Testing

Vespa Cloud proactively assesses its security posture through:

- A vulnerability disclosure program, detailed at https://vespa.ai/responsible-disclosure/, enabling security researchers to responsibly report potential vulnerabilities.
- A yearly hybrid security pentest program, conducted in partnership with Intigriti, to proactively identify and address vulnerabilities.

###### Secure Development

Vespa Cloud follows a CI/CD process with mandatory code review for all commits. Static analysis tools are employed to detect issues in source code and third-party dependencies. In addition, the security team conducts regular internal security reviews of code and infrastructure to identify and address potential vulnerabilities throughout the development lifecycle.

###### Vulnerability Management

Vespa is released up to 4 times a week, and we strive to keep all applications and dependencies updated to the latest versions. Operating system upgrades are rolled out every 90 days to address OS-level vulnerabilities.
In case of a severe security issue, fixes are applied and rolled out as quickly as possible.

###### Incident Response

Any unexpected production issue, including security incidents, is handled through our incident management process. Non-security incidents are announced through our console. Security incidents are communicated directly to affected customers. A post-mortem review process is initiated after every incident. In the event of a potential security breach, a forensic investigation is conducted.

---

## Cluster V2

### /cluster/v2 API reference

The cluster controller has a /cluster/v2 API for viewing and modifying a content cluster state.

#### /cluster/v2 API reference

The cluster controller has a /cluster/v2 API for viewing and modifying a content cluster state. To find the URL to access this API, identify the [cluster controller services](../content/content-nodes.html#cluster-controller) in the application. Only the master cluster controller will be able to respond. The master cluster controller is the cluster controller that is alive and has the lowest index. Thus, one will typically use cluster controller 0, but if contacting it fails, try number 1 and so on. Using [vespa-model-inspect](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-model-inspect):

```
$ vespa-model-inspect service -u container-clustercontroller

container-clustercontroller @ hostname.domain.com : admin
admin/cluster-controllers/0
    http://hostname.domain.com:19050/ (STATE EXTERNAL QUERY HTTP)
    http://hostname.domain.com:19117/ (EXTERNAL HTTP)
    tcp/hostname.domain.com:19118 (MESSAGING RPC)
    tcp/hostname.domain.com:19119 (ADMIN RPC)
```

In this example, there is only one cluster controller, and the State REST API is available on the port marked STATE and HTTP, 19050 in this example. This information can also be retrieved through the model config in the config server. Find examples of API usage in [content nodes](../content/content-nodes.html#cluster-v2-API-examples).

##### HTTP requests

| HTTP request | cluster/v2 operation | Description |
| --- | --- | --- |
| GET | List cluster and nodes. Get cluster, node or disk states. | |
| | List content clusters | `/cluster/v2/` |
| | Get cluster state and list service types within cluster | `/cluster/v2/<cluster>` |
| | List nodes per service type for cluster | `/cluster/v2/<cluster>/<service-type>` |
| | Get node state | `/cluster/v2/<cluster>/<service-type>/<node>` |
| PUT | Set node state | |
| | Set node user state | `/cluster/v2/<cluster>/<service-type>/<node>` |

##### Node state

Content and distributor nodes have state:

| State | Description |
| --- | --- |
| `Up` | The node is up and available to keep buckets and serve requests. |
| `Down` | The node is not available, and cannot be used. |
| `Stopping` | This node is stopping and is expected to be down soon. This state is typically only exposed to the cluster controller to tell why the node stopped. The cluster controller will expose the node as down or in maintenance mode for the rest of the cluster. This state is thus not seen by the distribution algorithm. |
| `Maintenance` | This node is temporarily unavailable. The node is available for bucket placement, so redundancy is lower. In this mode, new replicas of the documents stored on this node will not be created, allowing the node to be down with less of a performance impact on the rest of the cluster. This mode is typically used to mask a down state during controlled node restarts, or by an administrator that needs to do some short maintenance work, like upgrading software or restarting the node. |
| `Retired` | A retired node is available and serves requests. This state is used to remove nodes while keeping redundancy. Buckets are moved to other nodes (with low priority), until empty. Special considerations apply when using [grouped distribution](../elasticity.html#grouped-distribution), as buckets are not necessarily removed. |

Distributor nodes start / transfer buckets quickly and are hence not in `maintenance` or `retired`. Refer to [examples](../content/content-nodes.html#cluster-v2-API-examples) of manipulating states.

##### Types

| Type | Spec | Description |
| --- | --- | --- |
| cluster | `<identifier>` | The name given to a content cluster in a Vespa application. |
| description | `.*` | Description can contain anything that is valid JSON. However, as the information is presented in various interfaces, some of which may present reasons for all the states in a cluster or similar, keeping it short and to the point makes it easier to fit the information neatly into a table and get a better cluster overview. |
| group-spec | `<identifier>(\.<identifier>)*` | The hierarchical group assignment of a given content node. This is a dot separated list of identifiers given in the application services.xml configuration. |
| node | `[0-9]+` | The index or distribution key identifying a given node within the context of a content cluster and a service type. |
| service-type | `(distributor\|storage)` | The type of the service to look at state for, within the context of a given content cluster. |
| state-disk | `(up\|down)` | One of the valid disk states. |
| state-unit | [up](#up) \| [stopping](#stopping) \| [down](#down) | The cluster controller fetches states from all nodes, called _unit states_. States reported from the nodes are either `up` or `stopping`. If the node cannot be reached, a `down` state is assumed. This means the cluster controller detects failed nodes. The subsequent _generated states_ will have the nodes in `down`, and the [ideal state algorithm](../content/idealstate.html) will redistribute [buckets](../content/buckets.html) of documents. |
| state-user | [up](#up) \| [down](#down) \| [maintenance](#maintenance) \| [retired](#retired) | Use tools for [user state management](/en/operations-selfhosted/admin-procedures.html#cluster-state): retire a node from a cluster by using `retired` to move buckets to other nodes; use `maintenance` for short-lived maintenance work to avoid merging buckets to other nodes; fail a bad node, where the cluster controller or an operator can set a node `down`. |
| state-generated | [up](#up) \| [down](#down) \| [maintenance](#maintenance) \| [retired](#retired) | The cluster controller generates the cluster state from the `unit` and `user` states, over time. The generated state is called the _cluster state_. |

##### Request parameters

| Parameter | Type | Description |
| --- | --- | --- |
| recursive | number | Number of levels, or `true` for all levels. For example, use `recursive=1` for a node request to also see all data, and `recursive=2` to see all the node data within each service type. In recursive mode, you will see the same output as found in the spec below. However, where there is a `{ "link" : "" }` element, this element will be replaced by the content of that request, given a recursive value of one less than the request above. |

##### HTTP status codes

Non-exhaustive list of status codes:

| Code | Description |
| --- | --- |
| 200 | OK. |
| 303 | Cluster controller not master - master known. This error means communicating with the wrong cluster controller. This returns a standard HTTP redirect, so the HTTP client can automatically redo the request on the correct cluster controller. As the cluster controller available with the lowest index will be the master, the cluster controllers are normally queried in index order. Hence, it is unlikely to ever get this error; it is more likely to fail to connect to the cluster controller if it is not the current master. |
| 503 | Cluster controller not master - unknown or no master. This error is used if the cluster controller asked is not master, and it doesn't know who the master is. This can happen, e.g. in a network split, where cluster controller 0 can no longer reach cluster controllers 1 and 2, in which case cluster controller 0 knows it is not master, as it can't see the majority, and cluster controllers 1 and 2 will vote 1 to master. |

Example 303 response:

```
HTTP/1.1 303 See Other
Location: http://<master-host>:<port>/<request-path>
Content-Type: application/json

{ "message" : "Cluster controller <index> not master. Use master at index <index>." }
```

Example 503 response:

```
HTTP/1.1 503 Service Unavailable
Content-Type: application/json

{ "message" : "No known master cluster controller currently exist." }
```

##### Response format

Responses are in JSON format, with the following fields:

| Field | Description |
| --- | --- |
| message | An error message, included for failed requests. |
| ToDo | Add more fields here. |
---

## Clustercontroller Metrics Reference

### ClusterController Metrics

#### ClusterController Metrics

| Name | Unit | Description |
| --- | --- | --- |
| cluster-controller.down.count | node | Number of content nodes down |
| cluster-controller.initializing.count | node | Number of content nodes initializing |
| cluster-controller.maintenance.count | node | Number of content nodes in maintenance |
| cluster-controller.retired.count | node | Number of content nodes that are retired |
| cluster-controller.stopping.count | node | Number of content nodes currently stopping |
| cluster-controller.up.count | node | Number of content nodes up |
| cluster-controller.cluster-state-change.count | node | Number of nodes changing state |
| cluster-controller.nodes-not-converged | node | Number of nodes not converging to the latest cluster state version |
| cluster-controller.stored-document-count | document | Total number of unique documents stored in the cluster |
| cluster-controller.stored-document-bytes | byte | Combined byte size of all unique documents stored in the cluster (not including replication) |
| cluster-controller.cluster-buckets-out-of-sync-ratio | fraction | Ratio of buckets in the cluster currently in need of syncing |
| cluster-controller.busy-tick-time-ms | millisecond | Time busy |
| cluster-controller.idle-tick-time-ms | millisecond | Time idle |
| cluster-controller.work-ms | millisecond | Time used for actual work |
| cluster-controller.is-master | binary | 1 if this cluster controller is currently the master, or 0 if not |
| cluster-controller.remote-task-queue.size | operation | Number of remote tasks queued |
| cluster-controller.node-event.count | operation | Number of node events |
| cluster-controller.resource\_usage.nodes\_above\_limit | node | The number of content nodes above resource limit, blocking feed |
| cluster-controller.resource\_usage.max\_memory\_utilization | fraction | Current memory utilisation, for the content node with the highest value |
| cluster-controller.resource\_usage.max\_disk\_utilization | fraction | Current disk space utilisation, for the content node with the highest value |
| cluster-controller.resource\_usage.memory\_limit | fraction | Memory space limit as a fraction of available memory |
| cluster-controller.resource\_usage.disk\_limit | fraction | Disk space limit as a fraction of available disk space |
| reindexing.progress | fraction | Re-indexing progress |

---

## Component Reference

### Component Reference

A component is any Java class whose lifetime is controlled by the container, see the [Developer Guide](../developer-guide.html) for an introduction.

#### Component Reference

A component is any Java class whose lifetime is controlled by the container, see the [Developer Guide](../developer-guide.html) for an introduction. Components are specified and configured in services.xml and can have other components and config (represented by generated "Config" classes) [injected](../jdisc/injecting-components.html) at construction time, and can in turn be injected into other components.

Whenever a component or a resource your component depends on is changed by a redeployment, your component is reconstructed.
Once all changed components are reconstructed, new requests are atomically switched to use the new set and the old ones are destructed. If you have multiple constructors in your component, annotate the one to use for injection by `@com.yahoo.component.annotation.Inject`. Identifiable components must implement `com.yahoo.component.Component`, and components that need to destruct resources at removal must subclass `com.yahoo.component.AbstractComponent` and implement `deconstruct()`. See the [example](../operations/metrics.html#example-qa) for common questions about component uniqueness / lifetime. ##### Component Types Vespa defined various component types (superclasses) for common tasks: | Component type | Description | | --- | --- | | Request handler | [Request handlers](../jdisc/developing-request-handlers.html) allow applications to implement arbitrary HTTP APIs. A request handler accepts a request and returns a response. Custom request handlers are subclasses of [ThreadedHttpRequestHandler](https://javadoc.io/doc/com.yahoo.vespa/container-core/latest/com/yahoo/container/jdisc/ThreadedHttpRequestHandler.html). | | Processor | The [processing framework](../jdisc/processing.html) can be used to create general composable synchronous request-response systems. Searchers and search chains are an instantiation (through subclasses) of this general framework for a specific domain. Processors are invoked synchronously and the response is a tree of arbitrary data elements. Custom output formats can be defined by adding [renderers](#renderers). | | Renderer | Renderers convert a Response (or query Result) into a serialized form sent over the network. Renderers are subclasses of [com.yahoo.processing.rendering.Renderer](https://github.com/vespa-engine/vespa/blob/master/container-core/src/main/java/com/yahoo/processing/rendering/Renderer.java). | | Searcher | Searchers processes Queries and their Results. Since they are synchronous, they can issue multiple queries serially or in parallel to e.g. implement federation or decorate queries with information fetched from a content cluster. Searchers are composed into _search chains_ defined in services.xml. A query request selects a particular search chain which implements the logic of that query. [Read more](../searcher-development.html). | | Document processor | Document processors processes incoming document operations. Similar to Searchers and Processors they can be composed in chains, but document processors are asynchronous. [Read more](../document-processing.html). | | Binding | A binding matches a request URI to the correct [filter chain](#filter) or [request handler](#request-handlers), and route outgoing requests to the correct [client](#client). For instance, the binding _http://\*/\*_ would match any HTTP request, while _http://\*/processing_ would only match that specific path. If several bindings match, the most specific one is chosen. | Server binding | A server binding is a rule for matching incoming requests to the correct request handler, basically the JDisc building block for implementing RESTful APIs. | | Client binding | A client binding is a pattern which is used to match requests originating inside the container, e.g. when doing federation, to a client provider. That is, it is a rule which determines what code should handle a given outgoing request. | | | Filter | A filter is a lightweight request checker. 
It may set some specific request property, or it may do security checking and simply block requests missing some mandatory property or header. | | Client | Clients, or client providers, are implementations of clients for different protocols, or special rules for given protocols. When a JDisc application acts as a client, e.g. fetches a web page from another host, it is a client provider that handles the transaction. Bindings are used, as with request handlers and filters, to choose the correct client, matching protocol, server, etc., and then hands off the request to the client provider. There is no problem in using arbitrary other types of clients for external services in processors and request handlers. | ##### Component configurations This illustrates a typical component configuration set up by the Vespa container: ![Vespa container component configuration](/assets/img/container-components.svg) The network layer associates a Request with a _response handler_ and routes it to the correct type of [request handler](#request-handlers) (typically based on URI binding patterns). If an application needs lightweight request-response processing using decomposition by a series of chained logical units, the [processing framework](../jdisc/processing.html) is the correct family of components to use. The request will be routed from ProcessingHandler through one or more chains of [Processor](#processors) instances. The exact format of the output is customizable using a [Renderer](#renderers). If doing queries, SearchHandler will create a Query object, route that to the pertinent chain of [Searcher](#searchers) instances, and associate the returned Result with the correct [Renderer](#renderers) instance for optional customization of the output format. The DocumentProcessingHandler is usually invoked from messagebus, and used for feeding documents into an index or storage. The incoming data is used to build a Document object, and this is then feed through a chain of [DocumentProcessor](#document-processors) instances. If building an application with custom HTTP APIs, for instance arbitrary REST APIs, the easiest way is building a custom [RequestHandler](#request-handlers). This gets the Request, which is basically a set of key-value pairs, and returns a stream of arbitrary data back to the network. ##### Injectable Components These components are available from Vespa for [injection](../jdisc/injecting-components.html) into applications in various contexts: | Component | Description | | --- | --- | | Always available | | --- | | [AthenzIdentityProvider](https://github.com/vespa-engine/vespa/blob/master/container-disc/src/main/java/com/yahoo/container/jdisc/athenz/AthenzIdentityProvider.java) | Provides the application's Athenz-identity and gives access to identity/role certificate and tokens. | | [BertBaseEmbedder](https://github.com/vespa-engine/vespa/blob/master/model-integration/src/main/java/ai/vespa/embedding/BertBaseEmbedder.java) | A BERT-Base compatible embedder, see [BertBase embedder](../embedding.html#bert-embedder). | | [ConfigInstance](https://github.com/vespa-engine/vespa/blob/master/config-lib/src/main/java/com/yahoo/config/ConfigInstance.java) | Configuration is injected into components as `ConfigInstance` components - see [configuring components](../configuring-components.html). 
| | [Executor](https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Executor.html) | Default threadpool for processing requests in threaded request handler | | [Linguistics](https://github.com/vespa-engine/vespa/blob/master/linguistics/src/main/java/com/yahoo/language/Linguistics.java) | Inject a Linguistics component like [SimpleLinguistics](https://github.com/vespa-engine/vespa/blob/master/linguistics/src/main/java/com/yahoo/language/simple/SimpleLinguistics.java) or provide a custom implementation - see [linguistics](../linguistics.html). | | [Metric](https://github.com/vespa-engine/vespa/blob/master/jdisc_core/src/main/java/com/yahoo/jdisc/Metric.java) | Jdisc core interface for metrics. Required by all subclasses of ThreadedRequestHandler. | | [MetricReceiver](https://github.com/vespa-engine/vespa/blob/master/container-core/src/main/java/com/yahoo/metrics/simple/MetricReceiver.java) | Use to emit metrics from a component. Find an example in the [metrics](../operations/metrics.html#metrics-from-custom-components) guide. | | [ModelsEvaluator](https://github.com/vespa-engine/vespa/blob/master/model-evaluation/src/main/java/ai/vespa/models/evaluation/ModelsEvaluator.java) | Evaluates machine-learned models added to Vespa applications and available as config form. | | [SentencePieceEmbedder](https://github.com/vespa-engine/vespa/blob/master/linguistics-components/src/main/java/com/yahoo/language/sentencepiece/SentencePieceEmbedder.java) | A native Java implementation of SentencePiece, see [SentencePiece embedder](embedding-reference.html#sentencepiece-embedder). | | [VespaCurator](https://github.com/vespa-engine/vespa/blob/master/zkfacade/src/main/java/com/yahoo/vespa/curator/api/VespaCurator.java) | A client for ZooKeeper. For use in container clusters that have ZooKeeper enabled. See [using ZooKeeper](../using-zookeeper.html). | | [VipStatus](https://github.com/vespa-engine/vespa/blob/master/container-core/src/main/java/com/yahoo/container/handler/VipStatus.java) | Use this to gain control over the service status (up/down) to be emitted from this container. | | [WordPieceEmbedder](https://github.com/vespa-engine/vespa/blob/master/linguistics-components/src/main/java/com/yahoo/language/wordpiece/WordPieceEmbedder.java) | An implementation of the WordPiece embedder, usually used with BERT models. Refer to [WordPiece embedder](embedding-reference.html#wordpiece-embedder). | | [SystemInfo](https://github.com/vespa-engine/vespa/blob/master/hosted-zone-api/src/main/java/ai/vespa/cloud/SystemInfo.java) | Vespa Cloud: Provides information about the environment the component is running in. [Read more](/en/jdisc/container-components.html#the-systeminfo-injectable-component). | | Available in containers having `search` | | --- | | [DocumentAccess](https://github.com/vespa-engine/vespa/blob/master/documentapi/src/main/java/com/yahoo/documentapi/DocumentAccess.java) | To use the [Document API](../document-api-guide.html). | | [ExecutionFactory](https://github.com/vespa-engine/vespa/blob/master/container-search/src/main/java/com/yahoo/search/searchchain/ExecutionFactory.java) | To execute new queries from code. [Read more](../developing-web-services.html#queries). | | [Map\](https://github.com/vespa-engine/vespa/blob/master/model-evaluation/src/main/java/ai/vespa/models/evaluation/Model.java) | Use to inject a set of Models, see [Stateless Model Evaluation](../stateless-model-evaluation.html). 
| | Available in containers having `document-api` or `document-processing` | | --- | | [DocumentAccess](https://github.com/vespa-engine/vespa/blob/master/documentapi/src/main/java/com/yahoo/documentapi/DocumentAccess.java) | To use the [Document API](../document-api-guide.html). | ##### Component Versioning Components as well as many other artifacts in the container can be versioned. This document explains the format and semantics of these versions and how they are referenced. ###### Format Versions are of the form: ``` version ::= major ["." minor [ "." micro [ "." qualifier]]] ``` where `major`, `minor`, and `micro` are integers and `qualifier` is any string. A version is appended to an id, separated by a colon. In cases where a file is created for each component version, the colon is replaced by a dash in the file name. ###### Ordering Versions are ordered first by major, then minor, then micro, and finally by a lexical ordering on the qualifier. This means that `a:1 < a:1.0 < a:1.0.0 < a:1.1 < a:1.1.0 < a:2` ###### Referencing a versioned Component Whenever a component is referenced by id (in code or configuration), a fully or partially specified version may be included in the reference by using the form `id:versionSpecification`. Such references are resolved using the following rules: - An id without any version specification resolves to the highest version not having a qualifier. - A partially or fully specified version resolves to the highest version not having a qualifier which matches the specification. - Versions with qualifiers are matched only by exact match. Example: Given a component with id `a` having these versions: `[1.1, 1.2, 1.3.test, 2.0]` - The reference `a` will resolve to `a:2.0` - The reference `a:1` will resolve to `a:1.2` - The only way to resolve to the "test" qualified version is by using the exact reference `a:1.3.test` - These references will not resolve: `a:1.3`, `a:3`, `a:1.2.3` ###### Merging specifications for chained Components In some cases, there is a need for merging multiple references into one. An example is inheritance of chains of version references, where multiple inherited chains may reference the same component. Two version references are said to be _compatible_ if one is a prefix of the other. In this case the most specific version is used. If they are not compatible, they are _conflicting_. For example, the references `a:1` and `a:1.2` are compatible and merge to the most specific one, `a:1.2`, while the references `a:1.2` and `a:1.3` are conflicting. --- ## Concrete Documents ### Concrete document types In [document processing](document-processing.html), `setFieldValue()` and `getFieldValue()` are used to access fields in a `Document`. #### Concrete document types In [document processing](document-processing.html), `setFieldValue()` and `getFieldValue()` are used to access fields in a `Document`. The data for each of the fields in the document instance is wrapped in field values. If the documents use structs, they are handled the same way.
Example: ``` book.setFieldValue("title", new StringFieldValue("Moby Dick")); ``` Alternatively, use code generation to get a _concrete document type_, a `Document` subclass that represents the exact document type (defined for example in the file `book.sd`). To generate, include it in the build, plugins section in _pom.xml_: ``` com.yahoo.vespa vespa-documentgen-plugin 8.599.6 \etc/schemas\ document-gen document-gen ``` `schemasDirectory` contains the[schemas](reference/schema-reference.html). Generated classes will be in _target/generated-sources_. The document type `book` will be represented as the Java class `Book`, and it will have native methods for data access, so the code example above becomes: ``` book.setTitle("Moby Dick"); ``` | Configuration | Description | | --- | --- | | Java package | Specify the Java package of the generated types by using the following configuration: ``` com.yahoo.mypackage ``` | | User provided annotation types | To provide the Java implementation of a given annotation type, yielding _behaviour of annotations_ (implementing additional interfaces may be one scenario): ``` etc/schemas NodeImpl com.yahoo.vespa.document.NodeImpl DocumentImpl com.yahoo.vespa.document.DocumentImpl ``` Here, the plugin will not generate a type for `NodeImpl` and `DocumentImpl`, but the `ConcreteDocumentFactory` will support them, so that code depending on this will work. | | Abstract annotation types | Make a generated annotation type abstract: ``` myabstractannotationtype ``` | ##### Inheritance If input document types use single inheritance, the generated Java types will inherit accordingly. However, if a document type inherits from more than one type (example: `document myDoc inherits base1, base2`), the Java type for `myDoc` will just inherit from `Document`, since Java has single inheritance. Refer to [schema inheritance](schemas.html#schema-inheritance) for examples. ##### Feeding Concrete types are often used in a docproc, used for feeding data into stateful clusters. To make Vespa use the correct type during feeding and serialization, include in `` in [services.xml](reference/services.html ): ``` in your pom.xml"class="com.yahoo.mypackage.Book"/> ``` Vespa will make the type `Book` and all other concrete document, annotation and struct types from the bundle available to the docproc(s) in the container. The specified bundle must be the `Bundle-SymbolicName`. It will also use the given Java type when feeding through a docproc chain. If the class is not in the specified bundle, the container will emit an error message about not being able to load`ConcreteDocumentFactory` as a component, and not start. There is no need to `Export-Package` the concrete document types from the bundle, a `package-info.java` is generated that does that. ##### Factory and copy constructor Along with the actual types, the Maven plugin will also generate a class `ConcreteDocumentFactory`, which holds information about the actual concrete types present. It can be used to initialize an object given the document type: ``` Book b = (Book) ConcreteDocumentFactory.getDocument("book", new DocumentId("id:book:book::0")); ``` This can be done for example during deserialization, when a document is created. The concrete types also have copy constructors that can take a generic`Document` object of the same type. 
The contents will be deep-copied: ``` Document bookGeneric; // … Book book = new Book(bookGeneric, bookGeneric.getId()); ``` All the accessor and mutator methods on `Document` will work as expected on concrete types. Note that `getFieldValue()` will _generate_ an ad-hoc `FieldValue` _every time_, since concrete types don't use them to store data.`setFieldValue()` will pack the data into the native Java field of the type. ##### Document processing In a document processor, cast the incoming document base into the concrete document type before accessing it. Example: ``` public class ConcreteDocDocProc extends DocumentProcessor { public Progress process(Processing processing) { DocumentPut put = (DocumentPut) processing.getDocumentOperations().get(0); Book b = (Book) (put.getDocument()); b.setTitle("The Title"); return Progress.DONE; } } ``` Concrete document types are not supported for document updates or removes. Copyright © 2025 - [Cookie Preferences](#) --- ## Config Files ### Custom Configuration File Reference This is the reference for config file definitions. #### Custom Configuration File Reference This is the reference for config file definitions. It is useful for developing applications that has[configurable components](../configuring-components.html)for the [Vespa Container](../jdisc/index.html), where configuration for individual components may be provided by defining[``](#generic-configuration-in-services-xml)elements within the component's scope in services.xml. ##### Config definition files Config definition files are part of the source code of your application and have a _.def_ suffix. Each file defines and documents the content and semantics of one configuration type. Vespa's builtin _.def_ files are found in`$VESPA_HOME/share/vespa/configdefinitions/`. ###### Package Package is a mandatory statement that is used to define the package for the java class generated to represent the file. For [container component](../jdisc/container-components.html) developers, it is recommended to use a separate package for each bundle that needs to export config classes, to avoid conflicts between bundles that contain configurable components. Package must be the first non-comment line, and can only contain lower-case characters and dots: ``` package=com.mydomain.mypackage ``` ###### Parameter names Config definition files contain lines on the form: ``` parameterName type [default=value] [range=[min,max]] ``` camelCase in parameter names is recommended for readability. ###### Parameter types Supported types for variables in the _.def_ file: | int | 32 bit signed integer value | | long | 64 bit signed integer value | | double | 64 bit IEEE float value | | enum | Enumerated types. A set of strings representing the valid values for the parameter, e.g: ``` foo enum {BAR, BAZ, QUUX} default=BAR ``` | | bool | A boolean (true/false) value | | string | A String value. Default values must be enclosed in quotation marks (" "), and any internal quotation marks must be escaped by backslash. Likewise, newlines must be escaped to `\n` | | path | A path to a physical file or directory in the application package. This makes it possible to access files from the application package in container components. The path is relative to the root of the [application package](../applications.html). A path parameter cannot have a default value, but may be optional (using the _optional_ keyword after the type). An optional path does not have to be set, in which case it will be an empty value. 
The content will be available as a `java.nio.file.Path` instance when the component accessing this config is constructed, or an `Optional` if the _optional_ keyword is used. | | url | Similar to `path`, an arbitrary URL of a file that should be downloaded and made available to container components. The file content will be available as a java.io.File instance when the component accessing this config is constructed. Note that if the file takes a long time to download, it will also take a long time for the container to come up with the configuration referencing it. See also the [note about changing contents for such an url](../configuring-components.html#adding-files-to-the-component-configuration). | | model | A pointer to a machine-learned model. This can be a model-id, url or path, and multiple of these can be specified as a single config value, where one is used depending on the deployment environment: - If a model-id is specified and the application is deployed on Vespa Cloud, the model-id is used. - Otherwise, if a URL is specified, it is used. - Otherwise, path is used. You may also use remote URLs protected by bearer-token authentication by supplying the optional `secret-ref` attribute. See [using private Huggingface models](../reference/embedding-reference#private-model-hub). On the receiving side, this config value is simply represented as a file path regardless of how it is resolved. This makes it easy to refer to models in multiple ways such that the appropriate one is used depending on the context. The special syntax for setting these config values is documented in [adding files to the configuration](../configuring-components.html#adding-files-to-the-component-configuration). | | reference | A config id to another configuration (only for internal vespa usage) | ###### Structs Structs are used to group a number of parameters that naturally belong together. A struct is declared by adding a '.' between the struct name and each member's name: ``` basicStruct.foo string basicStruct.bar int ``` ###### Arrays Arrays are declared by appending square brackets to the parameter name. Arrays can either contain simple values, or have children. Children can be simple parameters and/or structs and/or other arrays. Arbitrarily complex structures can be built to any depth. Examples: ``` intArr[] int # Integer value array row[].column[] int # Array of integer value arrays complexArr[].foo string # Complex array that contains complexArr[].bar double # … two simple parameters complexArr[].coord.x int # … and a struct called 'coord' complexArr[].coord.y int complexArr[].coord.depths[] double # … that contains a double array ``` Note that arrays cannot have default values, even for simple value arrays. An array that has children cannot contain simple values, and vice versa. In the example above, `intArr` and `row.column` could not have children, while `row` and `complexArr` are not allowed to contain values. ###### Maps Maps are declared by appending curly brackets to the parameter name. Arbitrarily complex structures are supported also here. Examples: ``` myMap{} int complexMap{}.nestedMap{}.id int complexMap{}.nestedMap{}.name string ``` ##### Generic configuration in services.xml `services.xml`has four types of elements: | individual service elements | (e.g. _searcher_, _handler_, _searchnode_) - creates a service, but has no child elements that create services | | service group elements | (e.g. 
_content_, _container_, _document-processing_) - creates a group of services and can have all types of child elements | | dedicated config elements | (e.g. _accesslog_) - configures a service or a group of services and can only have other dedicated config elements as children | | generic config elements | always named _config_ | Generic config elements can be added to most elements that lead to one or more services being created - i.e. service group elements and individual service elements. The config is then applied to all services created by that element and all descendant elements. For example, by adding _config_ for _container_, the config will be applied to all container components in that cluster. Config at a deeper level has priority, so this config can be overridden for individual components by setting the same config values in e.g. _handler_ or _server_ elements. Given the following config definition, let's say its name is `type-examples.def`: ``` package=com.mydomain stringVal string myArray[].name string myArray[].type enum {T1, T2, T3} default=T1 myArray[].intArr[] int myMap{} string basicStruct.foo string basicStruct.bar int default=0 range=[-100,100] boolVal bool myFile path myUrl url myOptionalPath path optional ``` To set all the values for this config in `services.xml`, add the following xml at the desired element. The config name is the definition's package followed by the file name without the `.def` suffix - here `com.mydomain.type-examples`:

```
<config name="com.mydomain.type-examples">
    <stringVal>val</stringVal>
    <myArray>
        <item>
            <name>elem_0</name>
            <type>T2</type>
            <intArr>
                <item>0</item>
                <item>1</item>
            </intArr>
        </item>
        <item>
            <name>elem_1</name>
            <type>T3</type>
            <intArr>
                <item>0</item>
                <item>1</item>
            </intArr>
        </item>
    </myArray>
    <myMap>
        <item key="key1">val1</item>
        <item key="key2">val2</item>
    </myMap>
    <basicStruct>
        <foo>str</foo>
        <bar>3</bar>
    </basicStruct>
    <boolVal>true</boolVal>
    <myFile>components/file1.txt</myFile>
    <myUrl>https://docs.vespa.ai/en/reference/query-api-reference.html</myUrl>
</config>
```

Note that each '.' in the parameter's definition corresponds to a child element in the xml. It is not necessary to set values that already have a default in the _.def_ file, if you want to keep the default value. Hence, in the example above, `basicStruct.bar` and `myArray[].type` could have been omitted in the xml without generating any errors when deploying the application. ###### Configuring arrays Assigning values to _arrays_ is done by using the `<item>` element. This ensures that the given config values do not overwrite any existing array elements from higher-level xml elements in services, or from Vespa itself. --- ## Config Proxy ### Configuration proxy Read [application packages](/en/application-packages.html) for an overview of the cloud config system. #### Configuration proxy Read [application packages](/en/application-packages.html) for an overview of the cloud config system. The _config proxy_ runs on every Vespa node. It has a set of config sources, defined in [VESPA\_CONFIGSERVERS](/en/operations-selfhosted/files-processes-and-ports.html#environment-variables). The config proxy acts as a proxy for config clients on the same machine, so that all clients can ask for config on _localhost:19090_. The _config source_ that the config proxy uses is set in [VESPA\_CONFIGSERVERS](/en/operations-selfhosted/files-processes-and-ports.html#environment-variables) and consists of one or more config sources (the addresses of [config servers](/en/operations-selfhosted/configuration-server.html)). The proxy has a memory cache that is used to serve configs when possible.
In default mode, the proxy will have an outstanding request to the config server that will return when the config has changed (a new generation of config). This means that every time config changes on the config server, the proxy will get a response, update its cache and respond to all its clients with the changed config. The config proxy has two modes: | Mode | Description | | --- | --- | | default | Gets config from server and stores in memory cache. The config proxy will always be started in _default_ mode. Serves from cache if possible. Always uses a config source. If restarted, it will lose all configs that were cached in memory. | | memorycache | Serves config from memory cache only. Never uses a config source. A restart will lose all cached configs. Setting the mode to _memorycache_ will make all applications on the node work as before (given that they have previously been running and requested config), since the config proxy will serve config from cache and work without connection to any config server. Applications on this node will not work if the config proxy stops, is restarted or crashes. | Use [vespa-configproxy-cmd](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-configproxy-cmd)to inspect cached configs, mode, config sources etc., there are also some commands to change some of the settings. Run the command as: ``` $ vespa-configproxy-cmd -m ``` to see all possible commands. ##### Detaching from config servers ``` $ vespa-configproxy-cmd -m setmode memorycache ``` ##### Inspecting config To inspect the configuration for a service, in this example a searchnode (proton) instance, do: 1. Find the active config generation used by the service, using [/state/v1/config](/en/reference/state-v1.html#state-v1-config) - example for _http://localhost:19110/state/v1/config_, here the generation is 2: ``` ``` { "config": { "generation": 2, "proton": { "generation": 2 }, "proton.documentdb.music": { "generation": 2 } } } ``` ``` 2. Find the relevant _config definition name_, _config id_ and _config generation_ using [vespa-configproxy-cmd](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-configproxy-cmd) - e.g.: ``` $ vespa-configproxy-cmd | grep protonvespa.config.search.core.proton,music/search/cluster.music/0,2,MD5:40087d6195cedb1840721b55eb333735,XXHASH64:43829e79cea8e714 ``` `vespa.config.search.core.proton` is the _config definition name_ for this particular config, `music/search/cluster.music/0` is the _config id_ used by the proton service instance on this node and `2` is the active config generation. This means, the service is using the correct config generation as it is matching the /state/v1/config response (a restart can be required for some config changes). 3. Get the generated config using [vespa-get-config](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-get-config) - e.g.: ``` $ vespa-get-config -n vespa.config.search.core.proton -i music/search/cluster.music/0 basedir "/opt/vespa/var/db/vespa/search/cluster.music/n0" rpcport 19106 httpport 19110 ... ``` **Important:** Omitting `-i` will return the default configuration, meaning not generated for the active service instance. Copyright © 2025 - [Cookie Preferences](#) --- ## Config Rest Api V2 ### Config API Vespa provides a REST API for listing and retrieving config - alternatives are the programmatic [C++](../contributing/configapi-dev-cpp.html) or [Java](../contributing/configapi-dev-java.html) APIs. 
#### Config API Vespa provides a REST API for listing and retrieving config - alternatives are the programmatic [C++](../contributing/configapi-dev-cpp.html) or [Java](../contributing/configapi-dev-java.html) APIs. The Config API provides a way to inspect and retrieve all the config that can be generated by the config model for a given [tenant's active application](deploy-rest-api-v2.html). Some, but not necessarily all, of those configs are used by services by [subscribing](../contributing/configapi-dev.html) to them. The response format is JSON. The current API version is 2. All config servers provide the REST API. The API port is 19071 - use [vespa-model-inspect](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-model-inspect) service configserver to find config server hosts. Example: `http://myconfigserver.mydomain.com:19071/config/v2/tenant/msbe/application/articlesearch/` The API is available after an application has been [deployed and activated](../applications.html#deploy). ##### The application id The API provides two ways to identify your application, given a tenant: one using only an application name, and one using application name, environment, region and instance. For the former, "short" form, a default environment, region and instance is used. More formally, an _application id_ is a tuple of the form (_application_, _environment_, _region_, _instance_). The system currently provides shorthand to the id (_application_, "default", "default", "default"). Note: Multiple environments, regions and instances are not currently supported for application deployments, _default_ is always used. Example URL using only application name: `http://myconfigserver.mydomain.com:19071/config/v2/tenant/media/application/articlesearch/media.config.server-list/clusters/0` | Part | Description | | --- | --- | | media | Tenant | | articlesearch | Application | | media.config | Namespace of the requested config | | server-list | Name of the requested config | | clusters/0 | Config id of the requested config | Example URL using full application id:`http://myconfigserver.mydomain.com:19071/config/v2/tenant/media/application/articlesearch/environment/test/region/us/instance/staging/media.config.server-list/clusters/0` | Part | Description | | --- | --- | | media | Tenant | | articlesearch | Name of the application | | test | Environment | | us | Region | | staging | Instance | | media.config | Namespace of the requested config | | server-list | Name of the requested config | | clusters/0 | Config id of the requested config | In this API specification, the short form of the application id, i.e. only the application name, is used. The tenant `mytenant` and the application name `myapplication` is used throughout in examples. ##### GET /config/v2/tenant/mytenant/application/myapplication/ List the configs in the model, as [config id](../contributing/configapi-dev.html#config-id) specific URLs. | Parameters | | Parameter | Default | Description | | --- | --- | --- | | recursive | false | If true, include each config id in the model which produces the config, and list only the links to the config payload. If false, include the first level of the config ids in the listing of new list URLs, as explained above. | | | Request body | None | | Response | A list response includes two arrays: - List-links to descend one level down in the config id hierarchy, named `children`. - [Config payload](#payload) links for the current (top) level, named `configs`. 
| | Error Response | N/A | Examples: `GET /config/v2/tenant/mytenant/application/myapplication/` ``` ``` { "children": [ "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/config.sentinel/myconfigserver.mydomain.com/", "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/config.sentinel/hosts/", "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/config.model/admin/", "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/container.components/search/" ], "configs": [ "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/config.sentinel", "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/config.model", "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/container.components" ] ``` ``` `GET /config/v2/tenant/mytenant/application/myapplication/?recursive=true` ``` ``` { "configs": [ "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/config.sentinel/myconfigserver.mydomain.com", "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/config.sentinel/hosts/myconfigserver.mydomain.com" ``` ``` ##### GET /config/v2/tenant/mytenant/application/myapplication/[namespace.name]/ | Parameters | Same as above. | | Request body | None | | Response | List the configs in the model with the given namespace and name. List semantics as above. | | Error Response | 404 if the given namespace.name is not known to the config model. | Examples: `GET /config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/` ``` ``` { "children": [ "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/search/", "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/clients/", "http://myconfigserver.mydomain.com:19071/config/v1/vespaclient.config.feeder/docproc/" ] "configs": [ "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder", ] } ``` ``` `GET /config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/?recursive=true` ``` ``` { "configs": [ "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/search/qrsclusters/default", "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/clients/gateways", "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/clients/gateways/gateway", ``` ``` ##### GET /config/v2/tenant/mytenant/application/myapplication/[namespace.name]/[config/subid]/ | Parameters | Same as above. | | Request body | None | | Response | List the configs in the model with the given namespace and name, and for which the given config id segment is a prefix. | | Error Response | - 404 if the given namespace.name is not known to the config model. - 404 if the given config id is not in the model. 
| Examples: `GET /config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/search/` ``` ``` { "children": [ "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/search/qrsclusters/" ] "configs": [ "http://myconfigserver.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/search" ] } ``` ``` `GET /config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/search/?recursive=true` ``` ``` { "configs": [ "http://myhost.mydomain.com:19071/config/v2/tenant/mytenant/application/myapplication/vespaclient.config.feeder/search/qrsclusters/default" ] } ``` ``` ##### GET /config/v2/tenant/mytenant/application/myapplication/[namespace.name]/[config/id] | Parameters | None | | Request body | None | | Response | Returns the config payload of the given `namespace.name/config/id`, formatted as JSON. | | Error Response | Same as above. | Example: `GET /config/v2/tenant/mytenant/application/myapplication/container.core.container-http/search/qrsclusters/default/qrserver.0` ``` ``` { "enabled": "true", "requestbuffersize": "65536", "port": { "search": "8080", "host": "" }, "fileserver": { "throughsearch": "true" } } ``` ``` Copyright © 2025 - [Cookie Preferences](#) ###### On this page: - [The application id](#application-id) - [GET /config/v2/tenant/mytenant/application/myapplication/](#list-configs) - [GET /config/v2/tenant/mytenant/application/myapplication/[namespace.name]/](#list-namespace) - [GET /config/v2/tenant/mytenant/application/myapplication/[namespace.name]/[config/subid]/](#list-prefix) - [GET /config/v2/tenant/mytenant/application/myapplication/[namespace.name]/[config/id]](#payload) --- ## Config Sentinel ### Config sentinel The config sentinel starts and stops services - and restart failed services unless they are manually stopped. #### Config sentinel The config sentinel starts and stops services - and restart failed services unless they are manually stopped. All nodes in a Vespa system have at least these running processes: | Process | Description | | --- | --- | | [config-proxy](/en/operations-selfhosted/config-proxy.html) | Proxies config requests between Vespa applications and the configserver node. All configuration is cached locally so that this node can maintain its current configuration, even if the configserver shuts down. | | config-sentinel | Registers itself with the _config-proxy_ and subscribes to and enforces node configuration, meaning the configuration of what services should be run locally, and with what parameters. | | [vespa-logd](../reference/logs.html#logd) | Monitors _$VESPA\_HOME/logs/vespa/vespa.log_, which is used by all other services, and relays everything to the [log-server](/en/reference/logs.html#log-server). | | [metrics-proxy](/en/operations-selfhosted/monitoring.html#metrics-proxy) | Provides APIs for metrics access to all nodes and services. | ![Vespa node configuration, startup and logs](/assets/img/config-sentinel.svg) Start sequence: 1. _config server(s)_ are started and application config is deployed to them - see [config server operations](/en/operations-selfhosted/configuration-server.html). 2. _config-proxy_ is started. 
The environment variables [VESPA\_CONFIGSERVERS](/en/operations-selfhosted/files-processes-and-ports.html#environment-variables) and [VESPA\_CONFIGSERVER\_RPC\_PORT](/en/operations-selfhosted/files-processes-and-ports.html#environment-variables) are used to connect to the [config-server(s)](/en/operations-selfhosted/configuration-server.html). It will retry all config servers in case some are down. 3. _config-sentinel_ is started, and subscribes to node configuration (i.e. a service list) from _config-proxy_ using its hostname as the [config id](/en/contributing/configapi-dev.html#config-id). See [Node and network setup](/en/operations-selfhosted/node-setup.html) for details about how the hostname is detected and how to override it. The config for the config-sentinel (the service list) lists the processes to be started, along with the _config id_ to assign to each, typically the logical name of that service instance. 4. _config-proxy_ subscribes to node configuration from _config-server_, caches it, and returns the result to _config-sentinel_ 5. _config-sentinel_ starts the services given in the node configuration, with the config id as argument. See example output below, like _id="search/qrservers/qrserver.0"_. _logd_ and _metrics-proxy_ are always started, regardless of configuration. Each service: 1. Subscribes to configuration from _config-proxy_. 2. _config-proxy_ subscribes to configuration from _config-server_, caches it and returns result to the service. 3. The service runs according to its configuration, logging to _$VESPA\_HOME/logs/vespa/vespa.log_. The processes instantiate internal components, each assigned the same or another config id, and instantiating further components. Also see [cluster startup](#cluster-startup) for a minimum nodes-up start setting. When new config is deployed to _config-servers_ they propagate the changed configuration to nodes subscribing to it. In turn, these nodes reconfigure themselves accordingly. ##### User interface The config sentinel runs an RPC service which can be used to list, start and stop the services supposed to run on that node. This can be useful for testing and debugging. Use [vespa-sentinel-cmd](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-sentinel-cmd) to trigger these actions. Example output from `vespa-sentinel-cmd list`: ``` vespa-sentinel-cmd 'sentinel.ls' OK. container state=RUNNING mode=AUTO pid=27993 exitstatus=0 id="default/container.0" container-clustercontroller state=RUNNING mode=AUTO pid=27997 exitstatus=0 id="admin/cluster-controllers/0" distributor state=RUNNING mode=AUTO pid=27996 exitstatus=0 id="search/distributor/0" logd state=RUNNING mode=AUTO pid=5751 exitstatus=0 id="hosts/r6-3/logd" logserver state=RUNNING mode=AUTO pid=27994 exitstatus=0 id="admin/logserver" searchnode state=RUNNING mode=AUTO pid=27995 exitstatus=0 id="search/search/cluster.search/0" slobrok state=RUNNING mode=AUTO pid=28000 exitstatus=0 id="admin/slobrok.0" ``` To learn more about the processes and services, see [files and processes](/en/operations-selfhosted/files-processes-and-ports.html). Use [vespa-model-inspect host _hostname_](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-model-inspect) to list services running on a node. ##### Cluster startup The config sentinel will not start services on a node unless it has connectivity to a minimum of other nodes, default 50%. 
Find an example of this feature in the [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA#start-the-admin-server) example application. Example configuration: ``` ``` 20 1 ``` ``` Example: `minOkPercent 10` means that services will be started only if more than or equal to 10% of nodes are up. If there are 11 nodes in the application, the first node started will not start its services - when the second node is started, services will be started on both. `maxBadCount` is for connectivity checks where the other node is up, but we still do not have proper two-way connectivity. Normally, one-way connectivity means network configuration is broken and needs looking into, so this may be set low (1 or even 0 are the recommended values). If there are some temporary problems (in the example below non-responding DNS which leads to various issues at startup) the config sentinel will loop and retry, so the service startup will just be slightly delayed. Example log: ``` [2021-06-15 14:33:25] EVENT : starting/1 name="sbin/vespa-config-sentinel -c hosts/le40808.ostk (pid 867)" [2021-06-15 14:33:25] EVENT : started/1 name="config-sentinel" [2021-06-15 14:33:25] CONFIG : Sentinel got 4 service elements [tenant(footest), application(bartest), instance(default)] for config generation 1001 [2021-06-15 14:33:25] CONFIG : Booting sentinel 'hosts/le40808.ostk' with [stateserver port 19098] and [rpc port 19097] [2021-06-15 14:33:25] CONFIG : listening on port 19097 [2021-06-15 14:33:25] CONFIG : Sentinel got model info [version 7.420.21] for 35 hosts [config generation 1001] [2021-06-15 14:33:25] CONFIG : connectivity.maxBadCount = 3 [2021-06-15 14:33:25] CONFIG : connectivity.minOkPercent = 40 [2021-06-15 14:33:28] INFO : Connectivity check details: 2086533.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le01287.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le23256.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le23267.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le23297.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le23312.ostk -> connect OK, but reverse check FAILED [2021-06-15 14:33:28] INFO : Connectivity check details: le23317.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le23319.ostk -> connect OK, but reverse check FAILED [2021-06-15 14:33:28] INFO : Connectivity check details: le30550.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le30553.ostk -> connect OK, but reverse check FAILED [2021-06-15 14:33:28] INFO : Connectivity check details: le30556.ostk -> unreachable from me, but up [2021-06-15 14:33:28] INFO : Connectivity check details: le30560.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le30567.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40387.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40389.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40808.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: 
le40817.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40833.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40834.ostk -> unreachable from me, but up [2021-06-15 14:33:28] INFO : Connectivity check details: le40841.ostk -> connect OK, but reverse check FAILED [2021-06-15 14:33:28] INFO : Connectivity check details: le40858.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40860.ostk -> unreachable from me, but up [2021-06-15 14:33:28] INFO : Connectivity check details: le40863.ostk -> connect OK, but reverse check FAILED [2021-06-15 14:33:28] INFO : Connectivity check details: le40873.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40892.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40900.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40905.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40914.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: sm02318.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: sm02324.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: sm02340.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: zt40672.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: zt40712.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: zt40728.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: zt41329.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] WARNING : 8 of 35 nodes up but with network connectivity problems (max is 3) [2021-06-15 14:33:28] WARNING : Bad network connectivity (try 1) [2021-06-15 14:33:30] WARNING : slow resolve time: 'le30556.ostk' -> '1234:5678:90:123::abcd' (5.00528 s) [2021-06-15 14:33:30] WARNING : slow resolve time: 'le40834.ostk' -> '1234:5678:90:456::efab' (5.00527 s) [2021-06-15 14:33:30] WARNING : slow resolve time: 'le40860.ostk' -> '1234:5678:90:789::cdef' (5.00459 s) [2021-06-15 14:33:31] INFO : Connectivity check details: le23312.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:31] INFO : Connectivity check details: le23319.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:31] INFO : Connectivity check details: le30553.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:31] INFO : Connectivity check details: le30556.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:31] INFO : Connectivity check details: le40834.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:31] INFO : Connectivity check details: le40841.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:31] INFO : Connectivity check details: le40860.ostk -> connect OK, but reverse check FAILED [2021-06-15 14:33:31] INFO : Connectivity check details: le40863.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:31] INFO : Enough connectivity checks OK, proceeding with service startup [2021-06-15 14:33:31] EVENT : starting/1 name="searchnode" ... 
``` Copyright © 2025 - [Cookie Preferences](#) --- ## Config System ### The Config System The config system in Vespa is responsible for turning the application package into live configuration of all the nodes, processes and components that realizes the running system. #### The Config System The config system in Vespa is responsible for turning the application package into live configuration of all the nodes, processes and components that realizes the running system. Here we deep dive into various aspects of how this works. ##### Node configuration The problem of configuring nodes can be divided into three parts, each addressed by different solutions: - **Node system level configuration:** Configure OS level settings such as time zone as well as user privileges on the node. - **Package management**: Ensure that the correct set of software packages is installed on the nodes. This functionality is provided by three tools working together. - **Vespa configuration:** Starts the configured set of processes on each node with their configured startup parameters and provides dynamic configuration to the modules run by these services. _Configuration_ here is any data which: - can not be fixed at compile time - is static most of the time Note that by these definitions, this allows all the nodes to have the same software packages (disregarding version differences, discussed later), as variations in what services are run on each node and in their behavior is achieved entirely by using Vespa Configuration. This allows managing the complexity of node variations completely within the configuration system, rather than across multiple systems. Configuring a system can be divided into: - **Configuration assembly:** Assembly of a complete set of configurations for delivery from the inputs provided by the parties involved in configuring the system - **Configuration delivery:** Definition of individual configurations, APIs for requesting and accessing configuration, and the mechanism for delivering configurations from their source to the receiving components This division allows the problem of reliable configuration delivery in large distributed systems to be addressed in configuration delivery, while the complexities of assembling complete configurations can be treated as a vm-local design problem. An important feature of Vespa Configuration is the nature of the interface between the delivery and assembly subsystems. The assembly subsystem creates as output a (Java) object model of the distributed system. The delivery subsystem queries this model to obtain concrete configurations of all the components of the system. This allows the assembly subsystem to accept higher level, and simpler to use, abstractions as input and automatically derive detailed configurations with the correct interdependencies. This division insulates the external interface and the components being configured from changes in each other. In addition, the system model provides the home for logic implementing node/component instance variations of configuration. ##### Configuration assembly Config assembly is the process of turning the configuration input sources into an object model of the desired system, which can respond to queries for configs given a name and config id. 
Config assembly for Vespa systems can become complex, because it involves merging information owned by multiple parties: - **Vespa operations** own the nodes and controls assignment of nodes to services/applications - **Vespa service providers** own services which hosts multiple applications running on Vespa - **Vespa applications** define the final applications running on nodes and shared services The current config model assembly procedure uses a single source - the _application package_. The application package is a directory structure containing defined files and subdirectories which together completely defines the system - including which nodes belong in the system, which services they should run and the configuration of these services and their components. When the application deployer wants to change the application,[vespa prepare](#deploy) is issued to a config server, with the application package as argument. At this point the system model is assembled and validated and any feedback is issued to the deployer. If the deployer decides to make the new configuration active, a [vespa activate](#deploy) is then issued, causing the config server cluster to switch to the new system model and respond with new configs on any active subscriptions where the new system model caused the config to change. This ensures that subscribers gets new configs timely on changes, and that the changes propagated are the minimal set such that small changes to an application package causes correspondingly small changes to the system. ![The config server assembles app config](/assets/img/config-assembly.svg) The config model itself is pluggable, so that service providers may write plugins for assembling a particular service. The plugins are written in Java, and is installed together with the Vespa Configuration. Service plugins define their own syntax for specifying services that may be configured by Vespa applications. This allows the applications to be specified in an abstract manner, decoupled from the configuration that is delivered to the components. ##### Configuration delivery Configuration delivery encompasses the following aspects: - Definition of configurations - The component view (API) of configuration - Configuration delivery mechanism These aspects work together to realize the following goals: - Eliminate inconsistency between code and configuration. - Eliminate inconsistency between the desired configuration and the state on each node. - Limit temporary inconsistencies after reconfiguration. The next three subsections discusses the three aspects above, followed by subsections on two special concerns - bootstrapping and system upgrades. ###### Configuration definitions A _configuration_ is a set of simple or array key-values with a name and a type, which can possibly be nested - example: ``` myProperty "myvalue" myArray[1] myArray[0].key1 "someValue" myArray[0].key2 1337 ``` The _type definition_ (or class) of a configuration object defines and documents the set of fields a configuration may contain with their types and default values. It has a name as well as a namespace. For example, the above config instance may have this definition: ``` namespace=foo.bar #### Documentation of this key myProperty string default="foo" #### etc. myArray[].key1 string myArray[].key2 int default=0 ``` An individual config typically contains a coherent set of settings regarding some topic, such as _logging_ or _indexing_. A complete system consists of many instances of many config types. 
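To make the connection between such a definition and code concrete, here is a minimal sketch of a component reading the values above through the class generated from the definition. The definition file name, and hence the class name `MyAppConfig` and its package, are assumptions for illustration; the accessor methods follow the field names in the definition above:

```java
import com.mydomain.mypackage.MyAppConfig; // generated from the config definition; class name and package assumed

public class MyComponent {

    public MyComponent(MyAppConfig config) {
        // Simple value - "foo" unless overridden
        String property = config.myProperty();

        // Array fields are exposed as indexed accessors over generated inner types
        for (int i = 0; i < config.myArray().size(); i++) {
            String key1 = config.myArray(i).key1();
            int key2 = config.myArray(i).key2();
            System.out.println(property + ": " + key1 + " -> " + key2);
        }
    }
}
```

Because the class is generated from the definition, a mismatch between the definition and the code consuming it is caught at compile time, as elaborated in the next section.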
###### Component view

Individual components of a system consume one or more such configs and use their values to influence their behavior. APIs are needed for _requesting_ configs and for _accessing_ the values of those configs as they are provided.

_Access_ to configs happens through a (Java or C++) class generated from the config definition file. This ensures that any inconsistency between the fields declared in a config type and the expectations of the code accessing it is caught at compile time. The config definition is best viewed as another class with an alternative form of source syntax belonging to the components consuming it. A Maven target is provided for generating such classes from config definition types.

Components may use two different methods for _requesting_ configurations (refer to [Config API](/en/contributing/configapi-dev-cpp.html) for C++ code) - subscription and dependency injection:

**Subscription:** The component sets up a _ConfigSubscriber_, then subscribes to one or more configs. This is the simple approach; there are [other ways of](/en/contributing/configapi-dev-java.html) getting configs too:

```
ConfigSubscriber subscriber = new ConfigSubscriber();
ConfigHandle<MyConfig> handle = subscriber.subscribe(MyConfig.class, "myId");
if (!subscriber.nextConfig()) throw new RuntimeException("Config timed out.");
if (handle.isChanged()) {
    String message = handle.getConfig().myKey();
    // ... consume the rest of this config
}
```

**Dependency injection:** The component declares its config dependencies in the constructor and subscriptions are set up on its behalf. When changed configs are available a new instance of the component is created. The advantage of this method is that configs are immutable throughout the lifetime of the component such that no thread coordination is required. This method is currently only available in Java using the [Container](/en/jdisc/index.html).

```
public MyComponent(MyConfig config) {
    String myKey = config.myKey();
    // ... consume the rest of this config
}
```

For unit testing, [configs can be created with Builders](/en/contributing/configapi-dev-java.html#unit-testing), submitted directly to components.

###### Delivery mechanism

The config delivery mechanism is responsible for ensuring that a new config instance is delivered to subscribing components, each time there is a change to the system model causing that config instance to change. A config subscription is identified by two parameters, the _config definition name and namespace_ and the [config id](/en/contributing/configapi-dev.html#config-id) used to identify the particular component instance making the subscription.

The in-process config library will forward these subscription requests to a node-local [config proxy](/en/operations-selfhosted/config-proxy.html), which provides caching and fan-in from processes to node. The proxy in turn issues these subscriptions to a node in the configuration server cluster, each of which hosts a copy of the system model and resolves config requests by querying the system model.

To provide config server failover, the config subscriptions are implemented as long-timeout gets, which are immediately resent when they time out, but conceptually this is best understood as push subscriptions:

![Nodes get config from a config server cluster](/assets/img/config-delivery.svg)

As configs are not stored as files locally on the nodes, there is no possibility of inconsistencies due to local edits, or of nodes coming out of maintenance with a stale configuration.
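As a rough illustration of how the component view and the delivery mechanism above fit together, the following is a minimal, non-authoritative sketch of the common pattern of running the subscription loop in a dedicated thread, so a component is reconfigured whenever a new config generation is delivered. `MyConfig`, the config id `"myId"` and the `applyConfig` method are placeholder names, not part of the Vespa API:

```
import com.yahoo.config.subscription.ConfigHandle;
import com.yahoo.config.subscription.ConfigSubscriber;

// Sketch of a component that keeps itself reconfigured from a dedicated thread.
public class ReconfigurableComponent {

    private final ConfigSubscriber subscriber = new ConfigSubscriber();

    public ReconfigurableComponent() {
        ConfigHandle<MyConfig> handle = subscriber.subscribe(MyConfig.class, "myId");
        Thread configThread = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                // nextConfig() returns true when a new generation with changes has
                // arrived, delivered through the node-local config proxy described above
                if (subscriber.nextConfig() && handle.isChanged()) {
                    applyConfig(handle.getConfig());
                }
            }
        });
        configThread.setDaemon(true);
        configThread.start();
    }

    private void applyConfig(MyConfig config) {
        // ... update the component's state from the config values
    }
}
```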
As configuration changes are pushed as soon as the config server cluster allows, time inconsistencies during reconfigurations are minimized, although not avoided, as there is no global transaction.

Application code and config are generally pulled from the config server - it is, however, possible to use the [url](/en/reference/config-files.html#url) config type to refer to any resource to download to nodes.

###### Bootstrapping

Each Vespa node runs a [config-sentinel](/en/operations-selfhosted/config-sentinel.html) process which starts and maintains the services run on the node.

###### System upgrades

The configuration server will up/downgrade between config versions on the fly on minor upgrades which cause discrepancies between the config definitions requested and those produced by the configuration model. Major upgrades, which involve incompatible changes to the configuration protocol or the system model, require a [procedure](/en/operations-selfhosted/config-proxy.html).

##### Notes

Find more information for using the Vespa config API in the [reference doc](/en/contributing/configapi-dev.html).

Vespa Configuration makes the following assumptions about the nodes using it:

- All nodes have the software packages needed to run the configuration system and any services which will be configured to run on the node. This usually means that all nodes have the same software, although this is not a requirement
- All nodes have [VESPA\_CONFIGSERVERS](/en/operations-selfhosted/files-processes-and-ports.html#environment-variables) set
- All nodes know their fully qualified domain name

Reading this document is not necessary in order to use Vespa or to develop Java components for the Vespa container - for this purpose, refer to [Configuring components](/en/configuring-components.html).

##### Further reads

- [Configuration server operations](/en/operations-selfhosted/configuration-server.html) is a good resource for troubleshooting.
- Refer to the [bundle plugin](/en/components/bundles.html#maven-bundle-plugin) for how to build an application package with Java components.
- During development on a local instance it can be handy to just wipe the state completely and start over:
  1. [Delete all config server state](/en/operations-selfhosted/configuration-server.html#zookeeper-recovery) on all config servers
  2. Run [vespa-remove-index](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-remove-index) to wipe content nodes

---

## Configapi Dev Cpp

### Using the C++ Cloud config API

This document describes how to use the C++ cloud config API.

#### Using the C++ Cloud config API

This document describes how to use the C++ cloud config API. We are assuming you have created a [config definition file](configapi-dev.html) (def file), which is the schema for one of your configs.
Developing with the C++ Cloud Config API requires you to:

- Generate C++ code from your config definitions
- Subscribe to the config using the API

##### Generating Config

In my example application, I have the following hierarchy related to config:

```
src/config-defs/motd.def
```

Generate code while standing in the `src/` folder:

```
$ make-config.pl $(pwd) $(pwd)/config-defs/motd.def
```

This will generate _config-defs/config-motd.h_ and _config-defs/config-motd.cpp_. These classes are immutable pure data objects that you can use to configure your application. The objects may be copied. In this example, the class MotdConfig is the generated config object.

##### Subscribing and Getting Config

To retrieve the config in your application, create a ConfigSubscriber. A ConfigSubscriber is capable of subscribing to one or more configs. The subscribe method also takes an optional parameter, a timeout. If the ConfigSubscriber was unable to subscribe within the timeout, it will throw a ConfigRuntimeException. To use the API, you must include the header of the generated classes as well as the header shown in the example below. The config API resides in the `config` namespace:

```
#include <config/config.h>

using namespace config;
…
ConfigSubscriber s;
try {
    std::unique_ptr<ConfigHandle<MotdConfig>> handle = s.subscribe<MotdConfig>("my.config.id");
} catch (ConfigRuntimeException & exception) {
    // Handle exception
}
```

Note that a ConfigSubscriber is **NOT thread safe**. It is up to the API user to ensure that the ConfigSubscriber is not used by multiple threads.

Once you have subscribed to all the configs you need, you may invoke the nextConfig() call on the ConfigSubscriber:

```
s.nextConfig(1000);
```

Given N subscriptions, the nextConfig call will wait up to 1 second for 1 to N configs to change. If they have changed, it returns true. If not, it returns false. See [config API guidelines](configapi-dev.html#guidelines) for more advanced usage of the ConfigSubscriber.

**Important:** One **cannot** subscribe to or unsubscribe from more configs once nextConfig() has been called. This means that in order to change the set of subscribed configs, one must create a new ConfigSubscriber with the new set.

Having called nextConfig(), the ConfigHandle can be asked for the current config:

```
std::unique_ptr<MotdConfig> cfg = handle->getConfig();
```

This will retrieve the currently available config. If the subscribe calls succeeded and valid configs were returned by the config server, you are guaranteed that it will give you a correct config. For getting updates, the `nextConfig` method can be used like:

```
if (s.nextConfig(3000)) {
    std::unique_ptr<MotdConfig> cfg = handle->getConfig();
}
```

The method ensures that whatever getConfig() returns next will be the latest config available. nextConfig has a timeout parameter, and will return false if the timeout was reached, or true if a new generation of configs was deployed and at least one of them changed.

When subscribing to multiple configs, a natural use case is to check which of the configs changed. Therefore, the ConfigHandle class also contains an `isChanged` method. This method returns true if the previous call to nextConfig() resulted in a change, false if not.

###### Selecting Config Source

The ConfigSubscriber constructor may also be passed a `ConfigContext` object. A context can be used to share resources with multiple ConfigSubscriber objects and select the config source. The context is passed a `SourceSpec` parameter, which specifies the source of the config.
The different spec types are:

- `ServerSpec`, the default spec
- `ConfigSet`, used for subscribing to a set of config objects
- `DirSpec`, used for subscribing to config files in a directory
- `FileSpec`, used for subscribing to a single file
- `RawSpec`, used for subscribing to a config value specified directly

Most users will use the default ServerSpec, or ConfigSet.

##### Unit Testing

To help with unit testing, each config type has a corresponding builder type. For instance, given the generated config classes `FooConfig` and `BarConfig`, `FooConfigBuilder` and `BarConfigBuilder` should also be available as mutable versions. The builders can then be added to a `ConfigSet`:

```
ConfigSet set;
FooConfigBuilder fooBuilder;
BarConfigBuilder barBuilder;
set.addBuilder("id1", &fooBuilder);
set.addBuilder("id1", &barBuilder);
fooBuilder.foobar = 13;
barBuilder.barfoo = 12;
```

Having populated the set and set values on the builders, one must create a context containing the set:

```
IConfigContext::SP ctx(new ConfigContext(set));
```

Once the context is created, it can be passed to the ConfigSubscriber:

```
ConfigSubscriber subscriber(ctx);
ConfigHandle<FooConfig>::UP fooHandle = subscriber.subscribe<FooConfig>("id1");
ConfigHandle<BarConfig>::UP barHandle = subscriber.subscribe<BarConfig>("id1");
subscriber.nextConfig(); // returns true first time
```

Once having subscribed, the nextConfig and nextGeneration methods will work as normal. If you need to update a field to test reload, you can change the field of one of the builders and call reload on the context:

```
subscriber.nextConfig(); // should return false if called before
fooBuilder.foobar = 188;
ctx->reload();
subscriber.nextConfig(); // should return true now
```

How the config id relates to the application package deployed is covered in the main [Config API](configapi-dev.html) document. We also provide some helper classes such as the `ConfigGetter` to test the config itself.

**Note:** When using builders for unit testing, there is an underlying assumption that the configured application has subscribed to all configs before the builders are mutated. Otherwise, the application may try to retrieve an inconsistent configuration. In general, try to design the application so that one can verify configuration changes in tests.

##### Printing Config

All config objects can be printed, and the API supports several ways of doing so. To print config, you need to include the appropriate print header. Config can be printed with any class implementing the `ConfigWriter` interface. A `ConfigWriter` has a write method that takes any config class as input, and writes it somewhere. The following classes are provided:

- `FileConfigWriter` - Can write a config to a file
- `OstreamConfigWriter` - Can write a config to a C++ ostream

A `ConfigWriter` also supports another parameter in the write method, a `ConfigFormatter`. Currently, we provide two formatters:

- `FileConfigFormatter` - Formats the config as the old config payload format. This is the **default** formatter
- `JsonConfigFormatter` - Formats the config as JSON

The `FileConfigFormatter` is the default formatter if none is specified.
Example: Writing the config `MyConfig` to a file as JSON:

```
MyConfig foo;
FileConfigWriter writer("myfile.json");
writer.write(foo, JsonConfigFormatter());
```

---

## Configapi Dev Java

### Developing with the Java Cloud Config API

Assumption: a [def file](configapi-dev.html), which is the schema for one of your configs, is created and put in `src/main/resources/configdefinitions/`.

#### Developing with the Java Cloud Config API

Assumption: a [def file](configapi-dev.html), which is the schema for one of your configs, is created and put in `src/main/resources/configdefinitions/`.

To generate source code for the def-file, invoke the `config-class-plugin` from _pom.xml_, in the `<build>`, `<plugins>` section:

```
<plugin>
    <groupId>com.yahoo.vespa</groupId>
    <artifactId>config-class-plugin</artifactId>
    <version>${vespa.version}</version>
    <executions>
        <execution>
            <id>config-gen</id>
            <goals>
                <goal>config-gen</goal>
            </goals>
        </execution>
    </executions>
</plugin>
```

The generated classes will be saved to `target/generated-sources/vespa-configgen-plugin`, when the `generate-sources` phase of the build is executed. The def-file [`motd.def`](configapi-dev.html) is used in this tutorial, and a class called `MotdConfig` was generated (in the package `myproject`). It is a subtype of `ConfigInstance`.

When using only the config system (and not other parts of Vespa or the JDisc container), pull it in by using this in pom.xml:

```
<dependency>
    <groupId>com.yahoo.vespa</groupId>
    <artifactId>config</artifactId>
    <version>${vespa.version}</version>
    <scope>provided</scope>
</dependency>
```

##### Subscribing and getting config

To retrieve the config in the application, create a `ConfigSubscriber`. A `ConfigSubscriber` is capable of subscribing to one or more configs. The example shown here uses simplified error handling:

```
ConfigSubscriber subscriber = new ConfigSubscriber();
ConfigHandle<MotdConfig> handle = subscriber.subscribe(MotdConfig.class, "motdserver2/0");
if (!subscriber.nextConfig()) throw new RuntimeException("Config timed out.");
if (handle.isChanged()) {
    String message = handle.getConfig().message();
    int port = handle.getConfig().port();
}
```

Note that `isChanged()` will always be true after the first call to `nextConfig()`; it is included here to illustrate the API. In many cases one will do this from a thread which loops the `nextConfig()` call, and reconfigures your application if `isChanged()` is true.

The second parameter to `subscribe()`, _"motdserver2/0"_, is the [config id](configapi-dev.html#config-id). If one `ConfigSubscriber` subscribes to multiple configs, `nextConfig()` will only return true if the configs are of the same generation, i.e. they are "in sync". See the [com.yahoo.config](https://javadoc.io/doc/com.yahoo.vespa/config-lib) javadoc for details. Example:

```
ConfigSubscriber subscriber = new ConfigSubscriber();
ConfigHandle<MotdConfig> motdHandle = subscriber.subscribe(MotdConfig.class, "motdserver2/0");
ConfigHandle<AnotherConfig> anotherHandle = subscriber.subscribe(AnotherConfig.class, "motdserver2/0");
if (!subscriber.nextConfig()) throw new RuntimeException("Config timed out.");
// We now have a synchronized new generation for these two configs.
if (motdHandle.isChanged()) {
    String message = motdHandle.getConfig().message();
    int port = motdHandle.getConfig().port();
}
if (anotherHandle.isChanged()) {
    String myField = anotherHandle.getConfig().myField();
}
```

##### Simplified subscription

In cases like the first example above, where you only subscribe to one config, you may also subscribe using the `ConfigSubscriber.SingleSubscriber` interface. In this case, you define a `configure()` method from the interface, and call a special `subscribe()`. The method will start a dedicated config fetcher thread for you. The method will throw an exception in the user thread if initial configuration fails, and print a warning in the config thread if it fails afterwards. Example:

```
public class MyConfigSubscriber implements ConfigSubscriber.SingleSubscriber<MotdConfig> {

    public MyConfigSubscriber(String configId) {
        new ConfigSubscriber().subscribe(this, MotdConfig.class, configId);
    }

    @Override
    public void configure(MotdConfig config) {
        // configuration logic here
    }
}
```

The disadvantage to using this is that one cannot implement custom error handling or otherwise track config changes. If needed, use the generic method above.

##### Unit testing config

When instantiating a [ConfigSubscriber](https://javadoc.io/doc/com.yahoo.vespa/config/latest/com/yahoo/config/subscription/ConfigSubscriber.html), one can give it a [ConfigSource](https://javadoc.io/doc/com.yahoo.vespa/config/latest/com/yahoo/config/subscription/ConfigSource.html). One such source is a `ConfigSet`. It consists of a set of `Builder`s. This is an example of instantiating a subscriber using this - it uses two types of config, generated from the files `app.def` and `string.def`:

```
ConfigSet myConfigs = new ConfigSet();
AppConfig.Builder a0builder = new AppConfig.Builder().message("A message, 0").times(88);
AppConfig.Builder a1builder = new AppConfig.Builder().message("A message, 1").times(89);
myConfigs.add("app/0", a0builder);
myConfigs.add("app/1", a1builder);
myConfigs.add("bar", new StringConfig.Builder().stringVal("StringVal"));
ConfigSubscriber subscriber = new ConfigSubscriber(myConfigs);
```

To help with unit testing, each config type has a corresponding builder type. The `Builder` is mutable whereas the `ConfigInstance` is not. Use this to set up config fixtures for unit tests. The `ConfigSubscriber` has a `reload()` method which is used in tests to force the subscriptions into a new generation. It emulates a `vespa activate` operation after you have updated the `ConfigSet`. A full example can be found in [ConfigSetSubscriptionTest.java](https://github.com/vespa-engine/vespa/blob/master/config/src/test/java/com/yahoo/config/subscription/ConfigSetSubscriptionTest.java).

---

## Configapi Dev

### Cloud Config API

This document describes how to use the C++ and Java versions of the Cloud config API (the 'config API').

#### Cloud Config API

This document describes how to use the C++ and Java versions of the Cloud config API (the 'config API'). This API is used internally in Vespa, and reading this document is not necessary in order to use Vespa or to develop Java components for the Vespa container. For this purpose, please refer to [Configuring components](../configuring-components.html) instead.

Throughout this document, we will use as an example an application serving up a configurable message.
##### Creating a Config Definition

The first thing to do when deciding to use the config API is to define the config you want to use in your application. This is described in the [configuration file reference](../reference/config-files.html). Here we will use the definition `motd.def` from the complete example at the end of the document:

```
namespace=myproject
message string default="NO MESSAGE"
port int default=1337
```

##### Generating Source Code and Accessing Config in Code

Before you can access config in your program you will need to generate source code for the config definition. Simple steps for how you can generate API code and use the API are provided for:

- [C++](configapi-dev-cpp.html)
- [Java](configapi-dev-java.html) (see also the [javadoc](https://javadoc.io/doc/com.yahoo.vespa/config-lib))

We also recommend that you read the [general guidelines](#guidelines) for examples of advanced usage and recommendations for how to use the API.

##### Config ID

The config id specified when requesting config is essentially an identifier of the component requesting config. The config server contains a config object model, which maps a request for a given config name and config id to the correct configproducer instance, which will merge default values from the config definition with config from the object model and config set in `services.xml` to produce the final config instance.

The config id is given to a service via the VESPA\_CONFIG\_ID environment variable. The [config sentinel](/en/operations-selfhosted/config-sentinel.html) - see [bootstrapping](/en/application-packages.html#bootstrapping) - sets the environment variable to the id given by the config model. This id should then be used by the service to subscribe for config. If you are running multiple services, each of them will be assigned a **unique config id** for that service, and a service should not subscribe using any config id other than its own.

If you need to get config for a service that is not part of the model (i.e. it is not specified in services.xml), but that you want to specify values for in services.xml, use the config id `client`.

##### Schema Compatibility Rules

A schema incompatibility occurs if the config class (for example `MotdConfig` in the C++ and Java sections above) was built from a different def-file than the one the server is seeing and using to serve config. Some such incompatibilities are automatically handled by the config system, others lead to errors. This is useful to know during development/testing of a config schema.

Let _S_ denote a config definition called _motd_ which the server is using, and _C_ denote a config definition also called _motd_ which the client is using, i.e. the one that created `MotdConfig` used when subscribing. The following is the system's behavior:

**Compatible changes** - these schema mismatches are handled automatically by the config server:

- C is missing a config value that S has: The server will omit that value from the response.
- C has an additional config value with a default value: The server will include that value in the response.
- C and S both have a config value, but the default values differ: The server will use C's default value.
**Incompatible changes** - these schema mismatches are not handled by the config server, and will typically lead to errors in the subscription API because of missing values (though in principle some consumers of config may tolerate them):

- C has an additional config value without a default value: The server will not include anything for that value.
- C has the type of a config value changed, for example from string to int: The server will print an error message, and not include anything for that value. The user must use an entirely new name for the config if such a change must be made.

As with any data schema, it is wise to be conservative about changing it if the system will have new versions in the future. For a `def` schema, removing a config value constitutes a semantic change that may lead to problems when an older version of some config subscriber asks for config. In large deployments, the risk associated with this increases, because of the higher cost of a full restart of everything. Consequently, one should prefer creating a new config name over removing a config value from a schema.

##### The Config Server and Object Model

Currently, the object model in the server is created from a series of input files (`services.xml`). The model is pluggable, and can generate config id mappings based on your own custom syntax. See [Developing Cloud Config Model plugins](cloudconfig-model-plugins.html) for information on how to create model plugins.

##### Creating a Deployable Application Package

The application package consists of the following files:

```
app/services.xml
app/hosts.xml
```

The services file contains the services that are handled by the config model plugin. The hosts file contains:

```
node0
```

##### Setting Up a Running System

To get a running system, first install the cloudconfig package, start the config server, then deploy the application:

Prepare the application:

```
$ vespa prepare /path/to/app/folder
```

Activate the application:

```
$ vespa activate /path/to/app/folder
```

Then, start Vespa. This will start the application and pass it its config id via the VESPA\_CONFIG\_ID environment variable.

##### Advanced Usage of the Config API

For a simple application, having only one config may suffice. In a typical server application, however, the number of config settings can become large. Therefore, we **encourage** you to split the config settings into multiple logical classes. This section covers how you can use a ConfigSubscriber to subscribe to multiple configs and how you should group configs based on their dependencies. Configs can either be:

- Independent static configs
- Dependent static configs
- Dependent dynamic configs

We will give a few examples of how you can cope with these different scenarios. The code examples are given in a pseudo format common to C++ and Java, but they should be easy to convert to their language-specific equivalents.

###### Independent Static Configs

Independent configs mean that it does not matter if one of them is updated independently of the other. In this case, you might as well use one ConfigSubscriber for each of the configs, but it might become tedious to check all of them. Therefore, the recommended way is to manage all of these configs using one ConfigSubscriber. In this setup, it is also typical to split the subscription phase from the config check/retrieval part.
The subscribing part:

C++:

```
ConfigSubscriber subscriber;
ConfigHandle<FooConfig>::UP fooHandle = subscriber.subscribe<FooConfig>(…);
ConfigHandle<BarConfig>::UP barHandle = subscriber.subscribe<BarConfig>(…);
ConfigHandle<BazConfig>::UP bazHandle = subscriber.subscribe<BazConfig>(…);
```

Java:

```
ConfigSubscriber subscriber;
ConfigHandle<FooConfig> fooHandle = subscriber.subscribe(FooConfig.class, …);
ConfigHandle<BarConfig> barHandle = subscriber.subscribe(BarConfig.class, …);
ConfigHandle<BazConfig> bazHandle = subscriber.subscribe(BazConfig.class, …);
```

And the retrieval part:

```
if (subscriber.nextConfig()) {
    if (fooHandle->isChanged()) {
        // Reconfigure foo
    }
    if (barHandle->isChanged()) {
        // Reconfigure bar
    }
    if (bazHandle->isChanged()) {
        // Reconfigure baz
    }
}
```

This allows you to perform the config fetch part either in its own thread or as part of some other event thread in your application.

###### Dependent Static Configs

Dependent configs mean that one of your configs depends on the value in another config. The most common case is that you have one config which contains the config id to use when subscribing to the second config. In addition, your system may need the configs to be updated to the same **generation**.

**Note:** A generation is a monotonically increasing number which is increased each time an application is deployed with `vespa deploy`. Certain applications may require that all configs are of the same generation to ensure consistency, especially container-like applications. All configs subscribed to by a ConfigSubscriber are guaranteed to be of the same generation.

The configs are static in the sense that the config id used does not change. The recommended way to approach this is to use a two-phase setup, where you fetch the initial configs in the first phase, and then subscribe to both the initial and derived configs in order to ensure that they are of the same generation. Assume that the InitialConfig config contains two fields named _derived1_ and _derived2_:

C++:

```
ConfigSubscriber initialSubscriber;
ConfigHandle<InitialConfig>::UP initialHandle = initialSubscriber.subscribe<InitialConfig>(…);
while (!initialSubscriber.nextConfig()); // Ensure that we actually get initial config.
std::auto_ptr<InitialConfig> initialConfig = initialHandle->getConfig();

ConfigSubscriber subscriber;
… = subscriber.subscribe<InitialConfig>(…);
… = subscriber.subscribe<DerivedConfig>(initialConfig->derived1);
… = subscriber.subscribe<DerivedConfig>(initialConfig->derived2);
```

Java:

```
ConfigSubscriber initialSubscriber;
ConfigHandle<InitialConfig> initialHandle = initialSubscriber.subscribe(InitialConfig.class, …);
while (!initialSubscriber.nextConfig()); // Ensure that we actually get initial config.
InitialConfig initialConfig = initialHandle.getConfig();

ConfigSubscriber subscriber;
… = subscriber.subscribe(InitialConfig.class, …);
… = subscriber.subscribe(DerivedConfig.class, initialConfig.derived1());
… = subscriber.subscribe(DerivedConfig.class, initialConfig.derived2());
```

You can then check the configs in the same way as for independent static configs, and be sure that all your configs are of the same generation. The reason why you need to create a new ConfigSubscriber is that **once you have called nextConfig(), you cannot add or remove new subscribers**.

###### Dependent Dynamic Configs

Dynamic configs mean that the set of configs that you subscribe for may change between each deployment. This is the hardest case to solve, and how hard it is depends on how many levels of configs you have.
The most common case is to have a set of bootstrap configs, and another set of configs that may change depending on the bootstrap configs (typically in an application that has plugins). To cover this case, you can use a class named `ConfigRetriever`. Currently, it is **only available in the C++ API**.

The ConfigRetriever uses the same mechanisms as the ConfigSubscriber to ensure that you get a consistent set of configs. In addition, two more classes called `ConfigKeySet` and `ConfigSnapshot` are added. The ConfigRetriever takes in a set of configs used to bootstrap the system in its constructor. This set does not change. It then provides one method, `getConfigs(ConfigKeySet)`. The method returns a ConfigSnapshot of the next generation of bootstrap configs or derived configs.

To create the ConfigRetriever, you must first populate a set of bootstrap configs:

```
ConfigKeySet bootstrapKeys;
bootstrapKeys.add(configId);
bootstrapKeys.add(configId);
```

The bootstrap configs are typically configs that will always be needed by your application. Once you have defined your set, you can create the retriever and fetch a ConfigSnapshot of the bootstrap configs:

```
ConfigRetriever retriever(bootstrapKeys);
ConfigSnapshot bootstrapConfigs = retriever.getConfigs();
```

The ConfigSnapshot contains the bootstrap config, and you may use that to fetch the individual configs. You need to provide the config id and the type in order for the snapshot to know which config to look for:

```
if (!bootstrapConfigs.empty()) {
    std::auto_ptr bootstrapFoo = bootstrapConfigs.getConfig(configId);
    std::auto_ptr bootstrapBar = bootstrapConfigs.getConfig(configId);
}
```

The snapshot returned is empty if the retriever was unable to get the configs. In that case, you can try calling the same method again. Once you have the bootstrap configs, you know the config ids for the other components that you should subscribe for, and you can define a new key set. Let's assume that bootstrapFoo contains an array of config ids we should subscribe for:

```
ConfigKeySet pluginKeySet;
for (size_t i = 0; i < (*bootstrapFoo).pluginConfigId.size(); i++) {
    pluginKeySet.add((*bootstrapFoo).pluginConfigId[i]);
}
```

In this example we know the type of config requested, but this could be done in another way, letting the plugin add keys to the set. Now that the derived configs have been added to the pluginKeySet, we can request a snapshot of them:

```
ConfigSnapshot pluginConfigs = retriever.getConfigs(pluginKeySet);
if (!pluginConfigs.empty()) {
    // Configure each plugin with a config picked from the snapshot.
}
```

And that's it. When calling the method without any key parameters, the snapshot returned by this method may be empty if **the config could not be fetched within the timeout**, or **the generation of configs has changed**. To check if you should call getBootstrapConfigs() again, you can use the `bootstrapRequired()` method. If it returns true, you will have to call getBootstrapConfigs() again, because the plugin configs have been updated, and you need a new bootstrap generation to match it. If it returns false, you may call getConfigs() again to try and get a new generation of plugin configs.

We recommend that you use the retriever API if you have a use case like this. The alternative is to create your own mechanism using two ConfigSubscriber classes, but this is **not** recommended.
###### Advice on Config Modelling

Regardless of which of these types of configs you have, it is recommended that you always fetch all the configs you need **before** you start configuring your system. This is because the user may deploy multiple different versions of the config that may cause your components to get conflicting config values. A common pitfall is to treat dependent configs as independent, thereby causing inconsistency in your application when a config update for config A arrives before config B. The ConfigSubscriber was created to minimize the possibility of making this mistake, by ensuring that all of the configs come from the same config reload.

**Tip:** Set up your entire _tree_ of configs in one thread to ensure consistency, and configure your system once all of the configs have arrived. This also maps best to the ConfigSubscriber, since it is not thread safe.

---

## Configserver Metrics Reference

### ConfigServer Metrics

| Name | Unit | Description |

#### ConfigServer Metrics

| Name | Unit | Description | | --- | --- | --- | | configserver.requests | request | Number of requests processed | | configserver.failedRequests | request | Number of requests that failed | | configserver.latency | millisecond | Time to complete requests | | configserver.cacheConfigElems | item | Number of config elements in the cache | | configserver.cacheChecksumElems | item | Number of checksum elements in the cache | | configserver.hosts | node | The number of nodes being served configuration from the config server cluster | | configserver.tenants | instance | The number of tenants being served configuration from the config server cluster | | configserver.applications | instance | The number of applications being served configuration from the config server cluster | | configserver.delayedResponses | response | Number of delayed responses | | configserver.sessionChangeErrors | session | Number of session change errors | | configserver.unknownHostRequests | request | Config requests from unknown hosts | | configserver.newSessions | session | New config sessions | | configserver.preparedSessions | session | Prepared config sessions | | configserver.activeSessions | session | Active config sessions | | configserver.inactiveSessions | session | Inactive config sessions | | configserver.addedSessions | session | Added config sessions | | configserver.removedSessions | session | Removed config sessions | | configserver.rpcServerWorkQueueSize | item | Number of elements in the RPC server work queue | | maintenanceDeployment.transientFailure | operation | Number of maintenance deployments that failed with a transient failure | | maintenanceDeployment.failure | operation | Number of maintenance deployments that failed with a permanent failure | | maintenance.successFactorDeviation | fraction | Configserver: Maintenance
Success Factor Deviation | | maintenance.duration | millisecond | Configserver: Maintenance Duration | | configserver.zkConnectionLost | connection | Number of ZooKeeper connections lost | | configserver.zkReconnected | connection | Number of ZooKeeper reconnections | | configserver.zkConnected | node | Number of ZooKeeper nodes connected | | configserver.zkSuspended | node | Number of ZooKeeper nodes suspended | | configserver.zkZNodes | node | Number of ZooKeeper nodes present | | configserver.zkAvgLatency | millisecond | Average latency for ZooKeeper requests | | configserver.zkMaxLatency | millisecond | Max latency for ZooKeeper requests | | configserver.zkConnections | connection | Number of ZooKeeper connections | | configserver.zkOutstandingRequests | request | Number of ZooKeeper requests in flight | | orchestrator.lock.acquire-latency | second | Time to acquire zookeeper lock | | orchestrator.lock.acquire-success | operation | Number of times zookeeper lock has been acquired successfully | | orchestrator.lock.acquire-timedout | operation | Number of times zookeeper lock couldn't be acquired within timeout | | orchestrator.lock.acquire | operation | Number of attempts to acquire zookeeper lock | | orchestrator.lock.acquired | operation | Number of times zookeeper lock was acquired | | orchestrator.lock.hold-latency | second | Time zookeeper lock was held before it was released | | nodes.active | node | The number of active nodes in a cluster | | nodes.nonActive | node | The number of non-active nodes in a cluster | | nodes.nonActiveFraction | node | The fraction of non-active nodes vs total nodes in a cluster | | nodes.exclusiveSwitchFraction | fraction | The fraction of nodes in a cluster on exclusive network switches | | nodes.emptyExclusive | node | The number of exclusive hosts that do not have any nodes allocated to them | | nodes.expired.deprovisioned | node | The number of deprovisioned nodes that have expired | | nodes.expired.dirty | node | The number of dirty nodes that have expired | | nodes.expired.inactive | node | The number of inactive nodes that have expired | | nodes.expired.provisioned | node | The number of provisioned nodes that have expired | | nodes.expired.reserved | node | The number of reserved nodes that have expired | | cluster.cost | dollar\_per\_hour | The cost of the nodes allocated to a certain cluster, in $/hr | | cluster.load.ideal.cpu | fraction | The ideal cpu load of a certain cluster | | cluster.load.ideal.memory | fraction | The ideal memory load of a certain cluster | | cluster.load.ideal.disk | fraction | The ideal disk load of a certain cluster | | cluster.load.peak.cpu | fraction | The peak cpu load in the period considered of a certain cluster | | cluster.load.peak.memory | fraction | The peak memory load in the period considered of a certain cluster | | cluster.load.peak.disk | fraction | The peak disk load in the period considered of a certain cluster | | zone.working | binary | The value 1 if zone is considered healthy, 0 if not. 
This is decided by considering the number of non-active nodes vs the number of active nodes in a zone | | cache.nodeObject.hitRate | fraction | The fraction of cache hits vs cache lookups for the node object cache | | cache.nodeObject.evictionCount | item | The number of cache elements evicted from the node object cache | | cache.nodeObject.size | item | The number of cache elements in the node object cache | | cache.curator.hitRate | fraction | The fraction of cache hits vs cache lookups for the curator cache | | cache.curator.evictionCount | item | The number of cache elements evicted from the curator cache | | cache.curator.size | item | The number of cache elements in the curator cache | | wantedRestartGeneration | generation | Wanted restart generation for tenant node | | currentRestartGeneration | generation | Current restart generation for tenant node | | wantToRestart | binary | One if node wants to restart, zero if not | | wantedRebootGeneration | generation | Wanted reboot generation for tenant node | | currentRebootGeneration | generation | Current reboot generation for tenant node | | wantToReboot | binary | One if node wants to reboot, zero if not | | retired | binary | One if node is retired, zero if not | | wantedVespaVersion | version | Wanted vespa version for the node, in the form MINOR.PATCH. Major version is not included here | | currentVespaVersion | version | Current vespa version for the node, in the form MINOR.PATCH. Major version is not included here | | wantToChangeVespaVersion | binary | One if node want to change Vespa version, zero if not | | hasWireguardKey | binary | One if node has a WireGuard key, zero if not | | wantToRetire | binary | One if node wants to retire, zero if not | | wantToDeprovision | binary | One if node wants to be deprovisioned, zero if not | | failReport | binary | One if there is a fail report for the node, zero if not | | suspended | binary | One if the node is suspended, zero if not | | suspendedSeconds | second | The number of seconds the node has been suspended | | activeSeconds | second | The number of seconds the node has been active | | numberOfServicesUp | instance | The number of services confirmed to be running on a node | | numberOfServicesNotChecked | instance | The number of services supposed to run on a node, that has not checked | | numberOfServicesDown | instance | The number of services confirmed to not be running on a node | | someServicesDown | binary | One if one or more services has been confirmed to not run on a node, zero if not | | numberOfServicesUnknown | instance | The number of services the config server does not know is running on a node | | nodeFailerBadNode | binary | One if the node is failed due to being bad, zero if not | | downInNodeRepo | binary | One if the node is registered as being down in the node repository, zero if not | | numberOfServices | instance | Number of services supposed to run on a node | | lockAttempt.acquireMaxActiveLatency | second | Maximum duration for keeping a lock, ending during the metrics snapshot, or still being kept at the end or this snapshot period | | lockAttempt.acquireHz | operation\_per\_second | Average number of locks acquired per second the snapshot period | | lockAttempt.acquireLoad | operation | Average number of locks held concurrently during the snapshot period | | lockAttempt.lockedLatency | second | Longest lock duration in the snapshot period | | lockAttempt.lockedLoad | operation | Average number of locks held concurrently during the snapshot period | | 
lockAttempt.acquireTimedOut | operation | Number of locking attempts that timed out during the snapshot period | | lockAttempt.deadlock | operation | Number of lock grab deadlocks detected during the snapshot period | | lockAttempt.errors | operation | Number of other lock related errors detected during the snapshot period | | hostedVespa.docker.totalCapacityCpu | vcpu | Total number of VCPUs on tenant hosts managed by hosted Vespa in a zone | | hostedVespa.docker.totalCapacityMem | gigabyte | Total amount of memory on tenant hosts managed by hosted Vespa in a zone | | hostedVespa.docker.totalCapacityDisk | gigabyte | Total amount of disk space on tenant hosts managed by hosted Vespa in a zone | | hostedVespa.docker.freeCapacityCpu | vcpu | Total number of free VCPUs on tenant hosts managed by hosted Vespa in a zone | | hostedVespa.docker.freeCapacityMem | gigabyte | Total amount of free memory on tenant hosts managed by hosted Vespa in a zone | | hostedVespa.docker.freeCapacityDisk | gigabyte | Total amount of free disk space on tenant hosts managed by hosted Vespa in a zone | | hostedVespa.docker.allocatedCapacityCpu | vcpu | Total number of allocated VCPUs on tenant hosts managed by hosted Vespa in a zone | | hostedVespa.docker.allocatedCapacityMem | gigabyte | Total amount of allocated memory on tenant hosts managed by hosted Vespa in a zone | | hostedVespa.docker.allocatedCapacityDisk | gigabyte | Total amount of allocated disk space on tenant hosts managed by hosted Vespa in a zone | | hostedVespa.pendingRedeployments | task | The number of hosted Vespa re-deployments pending | | hostedVespa.docker.skew | fraction | A number in the range 0..1 indicating how well allocated resources are balanced with availability on hosts | | hostedVespa.activeHosts | host | The number of managed hosts that are in state "active" | | hostedVespa.breakfixedHosts | host | The number of managed hosts that are in state "breakfixed" | | hostedVespa.deprovisionedHosts | host | The number of managed hosts that are in state "deprovisioned" | | hostedVespa.dirtyHosts | host | The number of managed hosts that are in state "dirty" | | hostedVespa.failedHosts | host | The number of managed hosts that are in state "failed" | | hostedVespa.inactiveHosts | host | The number of managed hosts that are in state "inactive" | | hostedVespa.parkedHosts | host | The number of managed hosts that are in state "parked" | | hostedVespa.provisionedHosts | host | The number of managed hosts that are in state "provisioned" | | hostedVespa.readyHosts | host | The number of managed hosts that are in state "ready" | | hostedVespa.reservedHosts | host | The number of managed hosts that are in state "reserved" | | hostedVespa.activeNodes | host | The number of managed nodes that are in state "active" | | hostedVespa.breakfixedNodes | host | The number of managed nodes that are in state "breakfixed" | | hostedVespa.deprovisionedNodes | host | The number of managed nodes that are in state "deprovisioned" | | hostedVespa.dirtyNodes | host | The number of managed nodes that are in state "dirty" | | hostedVespa.failedNodes | host | The number of managed nodes that are in state "failed" | | hostedVespa.inactiveNodes | host | The number of managed nodes that are in state "inactive" | | hostedVespa.parkedNodes | host | The number of managed nodes that are in state "parked" | | hostedVespa.provisionedNodes | host | The number of managed nodes that are in state "provisioned" | | hostedVespa.readyNodes | host | The number of managed nodes that 
are in state "ready" | | hostedVespa.reservedNodes | host | The number of managed nodes that are in state "reserved" | | overcommittedHosts | host | The number of hosts with over-committed resources | | spareHostCapacity | host | The number of spare hosts | | throttledHostFailures | host | Number of host failures stopped due to throttling | | throttledNodeFailures | host | Number of node failures stopped due to throttling | | nodeFailThrottling | binary | Metric indicating when node failure throttling is active. The value 1 means active, 0 means inactive | | clusterAutoscaled | operation | Number of times a cluster has been rescaled by the autoscaler | | clusterAutoscaleDuration | second | The currently predicted duration of a rescaling of this cluster | | deployment.prepareMillis | millisecond | Duration of deployment preparations | | deployment.activateMillis | millisecond | Duration of deployment activations | | throttledHostProvisioning | binary | Value 1 if host provisioning is throttled, 0 if not |

---

## Configuration Server

### Configuration Servers

Vespa Configuration Servers host the endpoint where application packages are deployed - and serve generated configuration to all services - see the [overview](/en/overview.html) and [application packages](/en/application-packages.html) for details.

#### Configuration Servers

Vespa Configuration Servers host the endpoint where application packages are deployed - and serve generated configuration to all services - see the [overview](/en/overview.html) and [application packages](/en/application-packages.html) for details. I.e. one cannot configure Vespa without config servers, and services cannot run without them.

It is useful to understand the [Vespa start sequence](/en/operations-selfhosted/config-sentinel.html). Refer to the sample applications [multinode](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode) and [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA) for practical examples of multi-configserver configuration.

Vespa configuration is set up using one or more configuration servers (config servers). A config server uses [Apache ZooKeeper](https://zookeeper.apache.org/) as a distributed data storage for the configuration system. In addition, each node runs a config proxy to cache configuration data - find an overview at [services start](/en/operations-selfhosted/config-sentinel.html).

##### Status and config generation

Check the health of a running config server using (replace localhost with the hostname):

```
$ curl http://localhost:19071/state/v1/health
```

Note that the config server is a service itself, and runs with file-based configuration. The application packages deployed will not change the config server - the config server serves this configuration to all other Vespa nodes. Its config generation will hence always be 0:

```
$ curl http://localhost:19071/state/v1/config
```

Details in [start-configserver](https://github.com/vespa-engine/vespa/blob/master/configserver/src/main/sh/start-configserver).
##### Redundancy

The config servers are defined in [VESPA\_CONFIGSERVERS](/en/operations-selfhosted/files-processes-and-ports.html#environment-variables), [services.xml](/en/reference/services.html) and [hosts.xml](/en/reference/hosts.html):

```
$ VESPA_CONFIGSERVERS=myserver0.mydomain.com,myserver1.mydomain.com,myserver2.mydomain.com
```

```
<admin version="2.0">
    <configservers>
        <configserver hostalias="admin0" />
        <configserver hostalias="admin1" />
        <configserver hostalias="admin2" />
    </configservers>
</admin>
```

```
<hosts>
    <host name="myserver0.mydomain.com">
        <alias>admin0</alias>
    </host>
    <host name="myserver1.mydomain.com">
        <alias>admin1</alias>
    </host>
    <host name="myserver2.mydomain.com">
        <alias>admin2</alias>
    </host>
</hosts>
```

[VESPA\_CONFIGSERVERS](/en/operations-selfhosted/files-processes-and-ports.html#environment-variables) must be set on all nodes. This is a comma- or whitespace-separated list with the hostnames of all config servers, like _myhost1.mydomain.com,myhost2.mydomain.com,myhost3.mydomain.com_.

When there are multiple config servers, the [config proxy](/en/operations-selfhosted/config-proxy.html) will pick a config server randomly (to achieve load balancing between config servers). The config proxy is fault-tolerant and will switch to another config server (if there is more than one) if the one it is using becomes unavailable or there is an error in the configuration it receives.

For the system to tolerate _n_ failures, [ZooKeeper](#zookeeper) by design requires using _(2\*n)+1_ nodes. Consequently, only an odd number of nodes is useful, so you need a minimum of 3 nodes to have a fault-tolerant config system.

Even when using just one config server, the application will work if the server goes down (but deploying application changes will not work). Since the _config proxy_ runs on every node and caches configs, it will continue to serve config to the services on that node. However, restarting a node when config servers are unavailable means that services on the node will be unable to start since the cache will be destroyed when restarting the config proxy.

Refer to the [admin model reference](/en/reference/services-admin.html#configservers) for more details on _services.xml_.

##### Start sequence

To bootstrap a Vespa application instance, the high-level steps are:

- Start config servers
- Deploy config
- Start Vespa nodes

[multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA) is a great guide on how to start a multinode Vespa application instance - try this first.

Detailed steps for config server startup:

1. Set [VESPA\_CONFIGSERVERS](/en/operations-selfhosted/files-processes-and-ports.html#environment-variables) on all nodes, using fully qualified hostnames and the same value on all nodes, including the config servers.

2. Start the config server on the nodes configured in _services/hosts.xml_. Make sure the startup is successful by inspecting [/state/v1/health](/en/reference/state-v1.html#state-v1-health), default on port 19071:

```
$ curl http://localhost:19071/state/v1/health
```

```
{
    "time" : 1651147368066,
    "status" : {
        "code" : "up"
    },
    "metrics" : {
        "snapshot" : {
            "from" : 1.651147308063E9,
            "to" : 1.651147367996E9
        }
    }
}
```

If there is no response on the health API, two things can have happened:

- The config server process did not start - inspect logs using `vespa-logfmt`, or check _$VESPA\_HOME/logs/vespa/vespa.log_, normally _/opt/vespa/logs/vespa/vespa.log_.
- The config server process started, and is waiting for [ZooKeeper quorum](#zookeeper):

```
$ vespa-logfmt -S configserver
```

```
configserver Container.com.yahoo.vespa.zookeeper.ZooKeeperRunner Starting ZooKeeper server with /opt/vespa/var/zookeeper/conf/zookeeper.cfg.
Trying to establish ZooKeeper quorum (members: [node0.vespanet, node1.vespanet, node2.vespanet], attempt 1)configserver Container.com.yahoo.container.handler.threadpool.ContainerThreadpoolImpl Threadpool 'default-pool': min=12, max=600, queue=0 configserver Container.com.yahoo.vespa.config.server.tenant.TenantRepository Adding tenant 'default', created 2022-04-28T13:02:24.182Z. Bootstrapping in PT0.175576S configserver Container.com.yahoo.vespa.config.server.rpc.RpcServer Rpc server will listen on port 19070 configserver Container.com.yahoo.container.jdisc.state.StateMonitor Changing health status code from 'initializing' to 'up' configserver Container.com.yahoo.jdisc.http.server.jetty.Janitor Creating janitor executor with 2 threads configserver Container.com.yahoo.jdisc.http.server.jetty.JettyHttpServer Threadpool size: min=22, max=22 configserver Container.org.eclipse.jetty.server.Server jetty-9.4.46.v20220331; built: 2022-03-31T16:38:08.030Z; git: bc17a0369a11ecf40bb92c839b9ef0a8ac50ea18; jvm 11.0.14.1+1- configserver Container.org.eclipse.jetty.server.handler.ContextHandler Started o.e.j.s.ServletContextHandler@341c0dfc{19071,/,null,AVAILABLE} configserver Container.org.eclipse.jetty.server.AbstractConnector Started configserver@3cd6d147{HTTP/1.1, (http/1.1, h2c)}{0.0.0.0:19071} configserver Container.org.eclipse.jetty.server.Server Started @21955ms configserver Container.com.yahoo.container.jdisc.ConfiguredApplication Switching to the latest deployed set of configurations and components.Application config generation: 0 ``` It will hang until quorum is reached, and the second highlighted log line is emitted. Root causes for missing quorum can be: - No connectivity between the config servers. Zookeeper logs the members like `(members: [node0.vespanet, node1.vespanet, node2.vespanet], attempt 1)`. Verify that the nodes running config server can reach each other on port 2181. - No connectivity can be wrong network config. [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA) uses a docker network, make sure there are no underscores in the hostnames. 3. Once all config servers return `up` on _state/v1/health_, an application package can be deployed. This means, if deploy fails, it is always a good idea to verify the config server health first - if config servers are up, and deploy fails, it is most likely an issue with the application package - if so, refer to [application packages](/en/application-packages.html). 4. A successful deployment logs the following, for the _prepare_ and _activate_ steps: ``` Container.com.yahoo.vespa.config.server.ApplicationRepository Session 2 prepared successfully. Container.com.yahoo.vespa.config.server.deploy.Deployment Session 2 activated successfully using no host provisioner. Config generation 2. File references: [file '9cfc8dc57f415c72'] Container.com.yahoo.vespa.config.server.session.SessionRepository Session activated: 2 ``` 5. Start the Vespa nodes. Technically, they can be started at any time. When troubleshooting, it is easier to make sure the config servers are started successfully, and deployment was successful - before starting any other nodes. Refer to the [Vespa start sequence](/en/operations-selfhosted/config-sentinel.html) and [Vespa start / stop / restart](/en/operations-selfhosted/admin-procedures.html#vespa-start-stop-restart). Make sure to look for logs on all config servers when debugging. 
##### Scaling up

Add a config server node for increased fault tolerance or when replacing a node. Read up on [ZooKeeper configuration](#zookeeper-configuration) before continuing. Although it is _possible_ to add more than one config server at a time, doing it one by one is recommended, to keep the ZooKeeper quorum intact. Due to the ZooKeeper majority vote, use one or three config servers.

1. Install _vespa_ on the new config server node.
2. Append the config server node's hostname to VESPA\_CONFIGSERVERS on all nodes, then (re)start all config servers in sequence to update the ZooKeeper config. By appending, the current config server nodes keep their current ZooKeeper index. Restart the existing config server(s) first. The config server logs which servers are configured to the vespa log when starting up.
3. Update _services.xml_ and _hosts.xml_ with the new set of config servers, then _vespa prepare_ and _vespa activate_.
4. Restart the other nodes one by one to start using the new config servers. This lets the Vespa nodes use the updated set of config servers. The config servers will automatically redistribute the application data to new nodes.

##### Scaling down

This is the inverse of scaling up, following the same procedure. Remove config servers from the end of _VESPA\_CONFIGSERVERS_; here one can remove two nodes in one go, if going from three to one.

##### Replacing nodes

- Make sure to replace only one node at a time.
- If you have only one config server, you need to first scale up with a new node, then scale down by removing the old node.
- If you have 3 or more, you can replace one of the old nodes in VESPA\_CONFIGSERVERS with the new one instead of adding one; otherwise the procedure is the same as in [Scaling up](#scaling-up). Repeat for each node you want to replace.

##### Tools

Tools to access config:

- [vespa-get-config](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-get-config)
- [vespa-configproxy-cmd](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-configproxy-cmd)
- [Config API](/en/reference/config-rest-api-v2.html)

##### ZooKeeper

[ZooKeeper](https://zookeeper.apache.org/) handles data consistency across multiple config servers. The config server Java application runs a ZooKeeper server, embedded with an RPC frontend that the other nodes use. ZooKeeper stores data internally in _nodes_ that can have _sub-nodes_, similar to a file system.

At [vespa prepare](/en/application-packages.html#deploy), the application's files, along with global configurations, are stored in ZooKeeper. The application data is stored under _/config/v2/tenants/default/sessions/[sessionid]/userapp_. At [vespa activate](/en/application-packages.html#deploy), the newest application is activated _live_ by writing the session id into _/config/v2/tenants/default/applications/default:default:default_. It is at that point the other nodes get configured.

Use _vespa-zkcli_ to inspect the state, replacing _sessionid_ with the actual session id:

```
$ vespa-zkcli ls /config/v2/tenants/default/sessions/sessionid/userapp
$ vespa-zkcli get /config/v2/tenants/default/sessions/sessionid/userapp/services.xml
```

The ZooKeeper server logs to _$VESPA\_HOME/logs/vespa/zookeeper.configserver.0.log_ (files are rotated with a sequence number).

###### ZooKeeper configuration

The members of the ZooKeeper cluster are generated based on the contents of [VESPA\_CONFIGSERVERS](/en/operations-selfhosted/files-processes-and-ports.html#environment-variables).
_$VESPA\_HOME/var/zookeeper/conf/zookeeper.cfg_ is written when (re)starting the config server. Hence, all config servers must be restarted when `VESPA_CONFIGSERVERS` changes. The order of the nodes is used to create indexes in _zookeeper.cfg_ - do not change the node order.

###### ZooKeeper recovery

If the config server(s) should experience data corruption, for instance from a hardware failure, use the following recovery procedure. One example of such a scenario is if _$VESPA\_HOME/logs/vespa/zookeeper.configserver.0.log_ says _java.io.IOException: Negative seek offset at java.io.RandomAccessFile.seek(Native Method)_, which indicates ZooKeeper has not been able to recover after a full disk. There is no need to restart Vespa on other nodes during the procedure:

1. [vespa-stop-configserver](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-stop-configserver)
2. [vespa-configserver-remove-state](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-configserver-remove-state)
3. [vespa-start-configserver](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-start-configserver)
4. [vespa](/en/vespa-cli.html#deployment) prepare
5. [vespa](/en/vespa-cli.html#deployment) activate

This procedure completely cleans out ZooKeeper's internal data snapshots and deploys from scratch. Note that by default the [cluster controller](../content/content-nodes.html#cluster-controller) that maintains the state of the content cluster will use the same shared ZooKeeper instance, so the content cluster state is also reset when removing state. Manually set state will be lost (e.g. a node with user state _down_). It is possible to run cluster-controllers in standalone zookeeper mode - see [standalone-zookeeper](/en/reference/services-admin.html#cluster-controllers).

###### ZooKeeper barrier timeout

If the config servers are heavily loaded, or the applications being deployed are big, the internals of the server may time out when synchronizing with the other servers during deploy. To work around this, increase the timeout by setting [VESPA\_CONFIGSERVER\_ZOOKEEPER\_BARRIER\_TIMEOUT](/en/operations-selfhosted/files-processes-and-ports.html#environment-variables) to 600 (seconds) or higher, and restart the config servers.

##### Configuration

To access config from a node not running the config system (e.g. doing feeding via the Document API), use the environment variable [VESPA\_CONFIG\_SOURCES](/en/operations-selfhosted/files-processes-and-ports.html#environment-variables):

```
$ export VESPA_CONFIG_SOURCES="myadmin0.mydomain.com:19071,myadmin1.mydomain.com:19071"
```

Alternatively, for Java programs, use the system property _configsources_ and set it programmatically or on the command line with the _-D_ option to Java. The syntax for the value is the same as for _VESPA\_CONFIG\_SOURCES_.

###### System requirements

The minimum heap size for the config server JVM is 128 MB and the maximum heap size is 2 GB (which can be changed with a [setting](/en/performance/container-tuning.html#config-server-and-config-proxy)). It writes a transaction log that is regularly purged of old items, so little disk space is required. Note that running on a server that has a lot of disk I/O will adversely affect performance and is not recommended.

###### Ports

The config server RPC port can be changed by setting [VESPA\_CONFIGSERVER\_RPC\_PORT](/en/operations-selfhosted/files-processes-and-ports.html#environment-variables) on all nodes in the system.
Changing the HTTP port requires changing the port in _$VESPA\_HOME/conf/configserver-app/services.xml_. When deploying, use the _-p_ option if the port is changed from the default.

##### Troubleshooting

| Problem | Description |
| --- | --- |
| Health checks | Verify that a config server is up and running using [/state/v1/health](/en/reference/state-v1.html#state-v1-health), see [start sequence](#start-sequence). The status code is `up` if the server is up and has finished bootstrapping. Alternatively, use [http://localhost:19071/status.html](http://localhost:19071/status.html), which will return response code 200 if the server is up and has finished bootstrapping. Metrics are found at [/state/v1/metrics](/en/reference/state-v1.html#state-v1-metrics). Use [vespa-model-inspect](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-model-inspect) to find host and port number, port is 19071 by default. |
| Consistency | When there is more than one config server, consistency between the servers is crucial. [http://localhost:19071/status](http://localhost:19071/status) can be used to check that settings for config servers are the same for all servers. [vespa-config-status](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-config-status) can be used to check config on nodes. [http://localhost:19071/application/v2/tenant/default/application/default](http://localhost:19071/application/v2/tenant/default/application/default) displays the active config generation, which should be the same on all servers, and the same as in the response from running [vespa deploy](/en/vespa-cli.html#deployment). |
| Bad Node | If running with more than one config server and one of these goes down or has a hardware failure, the cluster will still work and serve config as usual (clients will switch to use one of the good servers). It is not necessary to remove a bad server from the configuration. Deploying applications will take longer, as [vespa deploy](/en/vespa-cli.html#deployment) will not be able to complete a deployment on all servers when one of them is down. If this is troublesome, lower the [barrier timeout](#zookeeper-barrier-timeout) (default value is 120 seconds). Note also that if you have not configured [cluster controllers](/en/reference/services-admin.html#cluster-controller) explicitly, these will run on the config server nodes and their operation might be affected. This is another reason for not trying to manually remove a bad node from the config server setup. |
| Stuck filedistribution | The config system distributes binary files (such as jar bundle files) using [file-distribution](/en/application-packages.html#file-distribution) - use [vespa-status-filedistribution](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-status-filedistribution) to see detailed status if it gets stuck. |
| Memory | Insufficient memory on the host / in the container running the config server will cause startup or deploy / configuration problems - see [Docker containers](/en/operations-selfhosted/docker-containers.html). |
| ZooKeeper | The following can be caused by a full disk on the config server, or clocks out of sync: ``` at com.yahoo.vespa.zookeeper.ZooKeeperRunner.startServer(ZooKeeperRunner.java:92) Caused by: java.io.IOException: The accepted epoch, 10 is less than the current epoch, 48 ``` Users have reported that "Copying the currentEpoch to acceptedEpoch fixed the problem". |
---

## Configuring Components

### Configuring Java components

Any Java component might require some sort of configuration, be it simple strings or integers, or more complex structures.

#### Configuring Java components

Any Java component might require some sort of configuration, be it simple strings or integers, or more complex structures. Because of all the boilerplate code that commonly goes into classes to hold such configuration, this often degenerates into a collection of key-value string pairs (e.g. [javax.servlet.FilterConfig](https://docs.oracle.com/javaee/6/api/javax/servlet/FilterConfig.html)). To avoid this, Vespa has custom, type-safe configuration for all [Container](jdisc/) components. Get started with the [Developer Guide](developer-guide.html), try the [album-recommendation-java](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation-java) sample application. Configurable components in short:

- Create a [config definition](reference/config-files.html#config-definition-files) file
- Use the Vespa [bundle plugin](components/bundles.html#maven-bundle-plugin) to generate a config class from the definition
- Inject config objects in the application code

The application code interfaces with config through the generated code - code and config are always in sync. This configuration should be used for all state which is assumed to stay constant for the _lifetime of the component instance_. Use [deploy](applications.html) to push and activate code and config changes.

##### Config definition

Write a [config definition](reference/config-files.html#config-definition-files) file and place it in the application's `src/main/resources/configdefinitions/` directory, e.g. `src/main/resources/configdefinitions/my-component.def`:

```
package=com.mydomain.mypackage

myCode int default=42
myMessage string default=""
```

##### Generating config classes

Generating config classes is done by the _bundle plugin_:

```
$ mvn generate-resources
```

The generated config classes are written to `target/generated-sources/vespa-configgen-plugin/`. In the above example, the config definition file was named _my-component.def_ and its package declaration is _com.mydomain.mypackage_. The full name of the generated Java class will be _com.mydomain.mypackage.MyComponentConfig_. It is a good idea to generate the config classes first, _then_ resolve dependencies and compile in the IDE.
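For orientation, the generated class has one accessor per field in the definition file, plus a nested `Builder`. A rough, hand-written sketch of its shape is shown below - this is for illustration only; the class actually produced by the bundle plugin extends Vespa's config base classes and contains additional members:

```
package com.mydomain.mypackage;

// Hand-written illustration of the *shape* of the generated config class - not the real generated code.
public class MyComponentConfig {

    private final int myCode;
    private final String myMessage;

    private MyComponentConfig(Builder builder) {
        this.myCode = builder.myCode;
        this.myMessage = builder.myMessage;
    }

    public int myCode() { return myCode; }          // default=42, from my-component.def
    public String myMessage() { return myMessage; } // default="", from my-component.def

    public static class Builder {
        private int myCode = 42;
        private String myMessage = "";

        public Builder myCode(int value) { this.myCode = value; return this; }
        public Builder myMessage(String value) { this.myMessage = value; return this; }
        public MyComponentConfig build() { return new MyComponentConfig(this); }
    }
}
```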
##### Using config in code

The generated config class is now available for the component through [constructor injection](jdisc/injecting-components.html), which means that the component can declare the generated class as one of its constructor arguments:

```
package com.mydomain.mypackage;

import com.yahoo.component.annotation.Inject;

public class MyComponent {

    private final int code;
    private final String message;

    @Inject
    public MyComponent(MyComponentConfig config) {
        code = config.myCode();
        message = config.myMessage();
    }
}
```

The Container will create and inject the config instance. To override the default values of the config, [specify](reference/config-files.html#generic-configuration-in-services-xml) values in `src/main/application/services.xml`, like:

```
<config name="com.mydomain.mypackage.my-component">
    <myCode>132</myCode>
    <myMessage>Hello, World!</myMessage>
</config>
```

and the deployed instance of `MyComponent` is constructed using a corresponding instance of `MyComponentConfig`.

##### Unit testing configurable components

The generated config class provides a builder API that makes it easy to create config objects for unit testing. Example that sets up a unit test for the `MyComponent` class from the example above:

```
import static com.mydomain.mypackage.MyComponentConfig.*;

public class MyComponentTest {

    @Test
    public void requireThatMyComponentGetsConfig() {
        MyComponentConfig config = new MyComponentConfig.Builder()
                .myCode(668)
                .myMessage("Neighbour of the beast")
                .build();
        MyComponent component = new MyComponent(config);
        …
    }
}
```

The config class used here is simple — see a separate example of [building a complex configuration object](unit-testing.html#unit-testing-configurable-components).

##### Adding files to the component configuration

This section describes what to do if the component needs larger configuration objects that are stored in files, e.g. machine-learned models, [automata](/en/operations/tools.html#vespa-makefsa) or large tables. Before proceeding, take a look at how to create [provider components](jdisc/injecting-components.html#special-components) — instead of integrating large objects into e.g. a searcher or processor, it might be better to split the resource-demanding part of the component's configuration into a separate provider component. The procedure described below can be applied to any component type.

Files can be transferred using either [file distribution](applications.html#file-distribution) or URL download. File distribution is used when the files are added to the application package. If for some reason this is not convenient, e.g. due to size, origin of file or update frequency, Vespa can download the file and make it available for the component. Both types are set up in the config definition file. File distribution uses the `path` config type, and URL downloading the `url` type. You can also use the `model` type for machine-learned models that can be referenced by both model-id, used on Vespa Cloud, and url/path, used on self-hosted deployments. See [the config file reference](reference/config-files.html) for details. In the following example we will show the usage of all three types.
Assume this config definition, named `my-component.def`:

```
package=com.mydomain.mypackage

myFile path
myUrl url
myModel model
```

The file must reside in the application package, and the path (relative to the application package root) must be given in the component's configuration in `services.xml`:

```
<config name="com.mydomain.mypackage.my-component">
    <myFile>my-files/my-file.txt</myFile>
    <myUrl>https://docs.vespa.ai/en/reference/query-api-reference.html</myUrl>
</config>
```

An example component that uses these files:

```
package com.mydomain.mypackage;

import java.io.File;
import java.nio.file.Path;

public class MyComponent {

    private final Path pathFromFileDistribution;
    private final File fileFromUrlDownload;
    private final Path modelFilePath;

    public MyComponent(MyComponentConfig config) {
        pathFromFileDistribution = config.myFile();
        fileFromUrlDownload = config.myUrl();
        modelFilePath = config.myModel();
    }
}
```

The `myFile()` and `myModel()` getters return a `java.nio.Path` object, while the `myUrl()` getter returns a `java.io.File` object. The container framework guarantees that these files are fully present at the given location before the component constructor is invoked, so they can always be accessed right away.

When the client asks for config that uses the `url` or `model` config type with a URL, the content will be downloaded and cached on the nodes that need it. If you want to change the content, the application package needs to be updated with a new URL for the changed content and the application [deployed](applications.html), otherwise the cached content will still be used. This avoids unintended changes to the application if the content of a URL changes.

---

## Consistency

### Vespa Consistency Model

Vespa offers configurable data redundancy with eventual consistency across replicas.

#### Vespa Consistency Model

Vespa offers configurable data redundancy with eventual consistency across replicas. It's designed for high efficiency under workloads where eventual consistency is an acceptable tradeoff. This document aims to go into some detail on what these tradeoffs are, and what you, as a user, can expect.

###### Vespa and CAP

Vespa may be considered a limited subset of AP under the [CAP theorem](https://en.wikipedia.org/wiki/CAP_theorem). Under CAP, there is a fundamental limitation of whether any distributed system can offer guarantees on consistency (C) or availability (A) in scenarios where nodes are partitioned (P) from each other. Since there is no escaping that partitions can and will happen, we talk of systems that are either CP or AP.

Consistency (C) in CAP implies that reads and writes are strongly consistent, i.e. the system offers _linearizability_. Weaker forms such as causal consistency or "read your writes" consistency are _not_ sufficient. As mentioned initially, Vespa is an eventually consistent data store and therefore does not offer this property. In practice, Consistency requires the use of a majority consensus algorithm, which Vespa does not currently use.

Availability (A) in CAP implies that _all requests_ receive a non-error response regardless of how the network may be partitioned. Vespa is dependent on a centralized (but fault tolerant) node health checker and coordinator.
A network partition may take place between the coordinator and a subset of nodes. Operations to nodes in this subset aren't guaranteed to succeed until the partition heals. As a consequence, Vespa is not _guaranteed_ to be strongly available, so we treat this as a "limited subset" of AP (though this is not technically part of the CAP definition). In _practice_, the best-effort semantics of Vespa have proven to be both robust and highly available in common datacenter networks. ###### Write durability and consistency When a client receives a successful [write](../reads-and-writes.html) response, the operation has been written and synced to disk. The replication level is configurable. Operations are by default written on _all_ available replica nodes before sending a response. "Available" here means being Up in the [cluster state](content-nodes.html#cluster-state), which is determined by the fault-tolerant, centralized Cluster Controller service. If a cluster has a total of 3 nodes, 2 of these are available and the replication factor is 3, writes will be ACKed to the client if both the available nodes ACK the operation. On each replica node, operations are persisted to a write-ahead log before being applied. The system will automatically recover after a crash by replaying logged operations. Writes are guaranteed to be synced to durable storage prior to sending a successful response to the client, so acknowledged writes are retained even in the face of sudden power loss. If a client receives a failure response for a write operation, the operation may or may not have taken place on a subset of the replicas. If not all replicas could be written to, they are considered divergent (out of sync). The system detects and reconciles divergent replicas. This happens without any required user intervention. Each document write assigns a new wall-clock timestamp to the resulting document version. As a consequence, configure servers with NTP to keep clock drift as small as possible. Large clock drifts may result in timestamp collisions or unexpected operation orderings. Vespa has support for conditional writes for individual documents through test-and-set operations. Multi-document transactions are not supported. After a successful response, changes to the search indexes are immediately visible by default. ###### Read consistency Reads are consistent on a best-effort basis and are not guaranteed to be linearizable. When using a [Get](../reference/document-v1-api-reference.html#get) or [Visit](../visiting.html) operation, the client will never observe a partially updated document. For these read operations, writes behave as if they are atomic. Searches may observe partial updates, as updates are not atomic across index structures. This can only happen _after_ a write has started, but _before_ it's complete. Once a write is complete, all index updates are visible. Searches may observe transient loss of coverage when nodes go down. Vespa will restore coverage automatically when this happens. How fast this happens depends on the configured [searchable-copies](../reference/services-content.html#searchable-copies) value. If replicas diverge during a Get, Vespa performs a read-repair. This fetches the requested document from all divergent replicas. The client then receives the version with the newest timestamp. 
If replicas diverge during a Visit, the behavior is slightly different between the Document V1 API and [vespa-visit](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-visit):

- Document V1 will prefer immediately visiting the replica that contains the most documents. This means it's possible for a subset of documents in a bucket to not be returned.
- `vespa-visit` will by default retry visiting the bucket until it is in sync. This may take a long time if large parts of the system are out of sync.

The rationale for this difference in behavior is that Document V1 is usually called in a real-time request context, whereas `vespa-visit` is usually called in a background/batch processing context.

Visitor operations iterate over the document corpus in an implementation-specific order. Any given document is returned in the state it was in at the time the visitor iterated over the data bucket containing the document. This means there is _no snapshot isolation_—a document mutation happening concurrently with a visitor may or may not be reflected in the returned document set, depending on whether the mutation happened before or after iteration of the bucket containing the document.

###### Replica reconciliation

Reconciliation is the act of bringing divergent replicas back into sync. This usually happens after a node restarts or fails. It will also happen after network partitions. Unlike several other eventually consistent databases, Vespa doesn't use distributed replica operation logs. Instead, reconciling replicas involves exchanging sets of timestamped documents. Reconciliation is complete once the union set of documents is present on all replicas. Metadata is checksummed to determine whether replicas are in sync with each other.

When reconciling replicas, the newest available version of a document will "win" and become visible. This version may be a remove (tombstone). Tombstones are replicated in the same way as regular documents. Reconciliation happens at the document level, not at the field level. I.e. there is no merging of individual fields across different versions. If a test-and-set operation updates at least one replica, it will eventually become visible on the other replicas. The reconciliation operation is referred to as a "merge" in the rest of the Vespa documentation.

Tombstone entries have a configurable time-to-live before they are compacted away. Nodes that have been partitioned away from the network for a longer period of time than this TTL should ideally have their indexes removed before being allowed back into the cluster. If not, there is a risk of resurrecting previously removed documents. Vespa does not currently detect or handle this scenario automatically. See the documentation on [data-retention-vs-size](/en/operations-selfhosted/admin-procedures.html#data-retention-vs-size).

###### Q/A

###### How does Vespa perform read-repair for Get-operations, and how many replicas are consulted?

When the distributor process that is responsible for a particular data bucket receives a Get operation, it checks its locally cached replica metadata state for inconsistencies. If all replicas have consistent metadata, the operation is routed to a single replica—preferably located on the same host as the distributor, if present. This is the normal case when the bucket replicas are in sync. If there is at least one replica metadata mismatch, the distributor automatically initiates a read-repair process:
1. The distributor splits the bucket replicas into subsets based on their metadata, where all replicas in each subset have the same metadata. It then sends a lightweight metadata-only Get to one replica in each subset. The core assumption is that all these replicas have the same set of document versions, and that it suffices to consult one replica in the set. If a metadata read fails, the distributor will automatically fail over to another replica in the subset.
2. It then sends one full Get to a node in the replica set that returned the _highest_ timestamp.

This means that if you have 100 replicas and 1 has different metadata from the remaining 99, only 2 nodes in total will be initially queried, and only 1 will receive the actual (full) Get read. Similar algorithms are used by other operations that may trigger read/write-repair.

###### Since Vespa performs read-repair when inconsistencies are detected, does this mean replies are strongly consistent?

Unfortunately not. Vespa does not offer any cross-document transactions, so in this case strong consistency implies single-object _linearizability_ (as opposed to _strict serializability_ across multiple objects). Linearizability requires the ability to reach a majority consensus amongst a particular known and stable configuration of replicas (side note: replica sets can be reconfigured in strongly consistent algorithms like Raft and Paxos, but such a reconfiguration must also be threaded through the consensus machinery). The active replica set for a given data bucket (and thus the documents it logically contains) is ephemeral and dynamic based on the nodes that are currently available in the cluster (as seen from the cluster controller). This precludes having a stable set of replicas that can be used for reaching majority decisions. See also [Vespa and CAP](#vespa-and-cap).

###### In what kind of scenario might Vespa return a stale version of a document?

Stale document versions may be returned when all replicas containing the most recent document version have become unavailable. Example scenario (for simplicity—but without loss of generality—assuming redundancy 1) in a cluster with two nodes {A, B}:

1. Document X is stored in a replica on node A with timestamp 100.
2. Node A goes down; node B takes over ownership.
3. A write request is received for document X; it is stored on node B with timestamp 200 and ACKed to the client.
4. Node B goes down.
5. Node A comes back up.
6. A read request arrives for document X. The only visible replica is on node A, which ends up serving the request.
7. The document version at timestamp 100 is returned to the client.

Since the write at `t=200` _happens-after_ the write at `t=100`, returning the version at `t=100` violates linearizability.

---

## Constant Tensor Json Format

### Constant Tensor JSON Format

This document describes with examples the JSON formats accepted when reading tensor constants from a file.

#### Constant Tensor JSON Format

This document describes with examples the JSON formats accepted when reading tensor constants from a file.
For convenience, compactness, and readability there are various formats that can be used depending on the detailed tensor type:

- [Dense tensors](#dense-tensors): indexed dimensions only
- [Sparse tensors](#sparse-tensors): mapped dimensions only
- [Mixed tensors](#mixed-tensors): both indexed and mapped dimensions

##### Canonical type

A tensor type can be declared with its dimensions in any order, but internally they will always be sorted in alphabetical order. So the type "`tensor(category{}, brand{}, a[3], x[768], d0[1])`" has the canonical string representation "`tensor(a[3],brand{},category{},d0[1],x[768])`" and the "x" dimension with size 768 is the innermost. For constants, all indexed dimensions must have a known size.

##### Dense tensors

Tensors using only indexed dimensions are used for storing a vector, a matrix, and so on, and are collectively known as "dense" tensors. These are particularly easy to handle, as they always have a known number of cells in a well-defined order. They can be input as nested arrays of numerical values. Example with a vector of size 5:

```
{ "type": "tensor(x[5])", "values": [13.25, -22, 0.4242, 0, -17.0] }
```

The "type" field is optional, but must match the [canonical form of the tensor type](#canonical-type) if present. This format is similar to "Indexed tensors short form" in the [document JSON format](document-json-format.html#tensor-short-form-indexed). Example of a 3x4 matrix; note that the dimension names will always be processed in [alphabetical order](#canonical-type) from outermost to innermost:

```
{
    "type": "tensor(bar[3],foo[4])",
    "values": [
        [2.5, 1.0, 2.0, 3.0],
        [1.0, 2.0, 3.0, 2.0],
        [2.0, 3.0, 2.0, 1.5]
    ]
}
```

Note that the arrays must have exactly the declared number of elements for each dimension, and be correctly nested. Example of an ONNX model input where we have an extra "batch" dimension which is unused (size 1) for this particular input, but still requires extra brackets:

```
{
    "type": "tensor(d0[1],d1[5],d2[2])",
    "values": [ [ [1.1, 1.2], [2.1, 2.2], [3.1, 3.2], [4.1, 4.2], [5.1, 5.2] ] ]
}
```

##### Sparse tensors

Tensors using only mapped dimensions are collectively known as "sparse" tensors. JSON input for these will list the cells directly. Tensors with only one mapped dimension can use a simple JSON object as input:

```
{ "type": "tensor(category{})", "cells": { "tag": 2.5, "another": 2.75 } }
```

The "type" field is optional. This format is similar to "Short form for tensors with a single mapped dimension" in the [document JSON format](document-json-format.html#tensor-short-form-mapped). Tensors with multiple mapped dimensions must use an array of objects, where each object has an "address" containing the labels for all dimensions, and a "value" with the cell value:

```
{
    "type": "tensor(category{},product{})",
    "cells": [
        { "address": { "category": "foo", "product": "bar" }, "value": 1.5 },
        { "address": { "category": "qux", "product": "zap" }, "value": 3.5 },
        { "address": { "category": "pop", "product": "rip" }, "value": 6.5 }
    ]
}
```

Again, the "type" field is optional, but must match the [canonical form of the tensor type](#canonical-type) if present. This format is also known as the [general verbose form](document-json-format.html#tensor), and it's possible to use it for any tensor type.

##### Mixed tensors

Tensors with both mapped and indexed dimensions can use a "blocks" format; this is similar to the "cells" formats for sparse tensors, but instead of a single cell value you get a block of values for each address.
With one mapped dimension and two indexed dimensions:

```
{
    "type": "tensor(a{},x[3],y[4])",
    "blocks": {
        "bar": [ [1.0, 2.0, 0.0, 3.0], [2.0, 2.5, 2.0, 0.5], [3.0, 6.0, 9.0, 9.0] ],
        "foo": [ [1.0, 0.0, 2.0, 3.0], [2.0, 2.5, 2.0, 0.5], [3.0, 3.0, 6.0, 9.0] ]
    }
}
```

The "type" field is optional, but must match the [canonical form of the tensor type](#canonical-type) if present. This format is similar to the first variant of "Mixed tensors short form" in the [document JSON format](document-json-format.html#tensor-short-form-mixed). With two mapped dimensions and one indexed dimension:

```
{
    "type": "tensor(a{},b{},x[3])",
    "blocks": [
        { "address": { "a": "qux", "b": "zap" }, "values": [2.5, 3.5, 4.5] },
        { "address": { "a": "foo", "b": "bar" }, "values": [1.5, 2.5, 3.5] },
        { "address": { "a": "pop", "b": "rip" }, "values": [3.5, 4.5, 5.5] }
    ]
}
```

Again, the "type" field is optional. This format is similar to the second variant of "Mixed tensors short form" in the [document JSON format](document-json-format.html#tensor-short-form-mixed).

---

## Container Components

### Container Components

This document explains the common concepts necessary to develop all types of Container components.

#### Container Components

This document explains the common concepts necessary to develop all types of Container components. A basic knowledge of the Vespa Container is required. All components must extend a base class from the Container code module. For example, searchers must extend the class `com.yahoo.search.Searcher`. The main available component types are:

- [processors](processing.html)
- [searchers](../searcher-development.html)
- [document processors](../document-processing.html)
- [search result renderers](../result-rendering.html)
- [provider components](injecting-components.html#special-components).

Searchers and document processors belong to a subclass of components called [chained components](../components/chained-components.html). For an introduction to how the different component types interact, refer to the [overview of component types](../reference/component-reference.html#component-types). The components of the search container are usually deployed as part of an [OSGi bundle](../components/bundles.html). Build the bundles using maven and the [bundle plugin](../components/bundles.html#maven-bundle-plugin). Refer to the [multiple-bundles sample app](https://github.com/vespa-engine/sample-apps/tree/master/examples/multiple-bundles) for a multi-bundle example.

##### Concurrency

Components will be executed concurrently by multiple threads. This places an important constraint on all component classes: _non-final instance variables are not safe._ They must be eliminated, or made thread-safe somehow.

##### Resource management

Components that use threads, file handles or other native resources that need to be released when the component falls out of scope must override a method called `deconstruct`. Here is an example implementation from a component that uses a thread pool named 'executor':

```
@Override
public void deconstruct() {
    super.deconstruct();
    try {
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}
```

Note that it is always advisable to call the super-method first.
Also see [SharedResource.java](https://github.com/vespa-engine/vespa/blob/master/jdisc_core/src/main/java/com/yahoo/jdisc/SharedResource.java) for how to configure [debug options](../reference/services-container.html#jvm) for use in tools like YourKit - this can be used to track component lifetime / (de)construction issues. Read more in [container profiling](../performance/profiling.html#profiling-the-query-container).

##### Dependency injection

The components might need to access resources, such as other components or config. These are injected directly into the constructor. The following types of constructor dependencies are allowed:

- [Config objects](../configuring-components.html)
- [Other components](injecting-components.html)
- [The Linguistics library](../linguistics.html)
- [System info](#the-systeminfo-injectable-component)

The [Component Reference](../reference/component-reference.html#injectable-components) contains a complete list of built-in injectable components. If your component class needs more than one public constructor, the one to be used by the container must be annotated with `@com.yahoo.component.annotation.Inject` from [annotations](https://search.maven.org/artifact/com.yahoo.vespa/annotations).

###### The SystemInfo Injectable Component

This component provides information about the environment that the component is running in, for example:

- The zone in the Vespa Cloud, if applicable.
- The number of nodes in the container cluster, and their indices.
- The index of the node this is running on.

The latter two can be used e.g. for [bucket testing](../testing.html#feature-switches-and-bucket-tests) new features on a subset of nodes. Please note that the node indices are not necessarily contiguous or starting from zero.

##### Deploying a Component

The container will create one or more instances of the component, as specified in [the application package](#adding-component-to-application-package). The container will create a new instance of this component only when it is reconfigured, so any data needed by the component can be read and prepared from a constructor in the component. See the full API available to components at the [Container Javadoc](https://javadoc.io/doc/com.yahoo.vespa/container-core/latest/com/yahoo/container/package-summary.html).

Once the component passes unit tests, it can be deployed. The steps involved are building the component jar file, adding it to the Vespa application package and deploying the application package. These steps are described in the following sections, using a searcher as example.

###### Building the Plugin .jar

To build the plugin jar, call `mvn install` in the project directory. It can then be found in the target directory, and will have the suffix _-deploy.jar_. Assume for the rest of the document that the artifactId is `com.yahoo.search.example.SimpleSearcher` and the version is `1.0`. The plugin built will then have the name _com.yahoo.search.example.SimpleSearcher-1.0-deploy.jar_.

###### Adding the Plugin to the Vespa Application Package

The previous step should produce a plugin jar file, which may now be deployed to Vespa by adding it to an [application package](../applications.html): a directory containing at minimum _hosts.xml_ and _services.xml_.
- put `com.yahoo.search.example.SimpleSearcher-1.0-deploy.jar` in the `components/` directory under the application package root
- modify [services.xml](../reference/services.html) to include the Searcher

To include the searcher, define a search chain and add the searcher to it. Example:

```
<container id="default" version="1.0">
    <search>
        <chain id="default" inherits="vespa">
            <searcher id="com.yahoo.search.example.SimpleSearcher" bundle="com.yahoo.search.example.SimpleSearcher" />
        </chain>
    </search>
    <nodes>
        <node hostalias="node1" />
    </nodes>
</container>
```

The searcher id above is resolved to the plugin jar we added by the `Bundle-SymbolicName` ([a field in the manifest of the jar file](../components/bundles.html)), which is determined by the `artifactId`, and to the right class within the bundle by the class name. By keeping the `searcher id`, `class name` and `artifactId` the same, we keep things simple, but more advanced use where these differ is also supported. This will be explained in later sections. For a reference to these tags, see [the search chains reference](../reference/services-search.html#chain).

Example `hosts.xml`:

```
<hosts>
    <host name="localhost">
        <alias>node1</alias>
    </host>
</hosts>
```

By creating a directory containing this `services.xml`, `hosts.xml` and `components/com.yahoo.search.example.SimpleSearcher-1.0-deploy.jar`, that directory becomes a complete application package containing a bundle, which can now be deployed.

###### Deploying the Application Package

Set up a Vespa instance using the [quick start](../deploy-an-application-local.html). Once the component and the config are added to the application package, it can be [deployed](../applications.html#deploy) by running `vespa deploy`. These steps will copy any changed bundles to the nodes in the cluster which need them and switch queries over to running the new component versions. This works safely without requiring any processes to be restarted, even if the application package contains changes to classes which are already running queries. The switch is atomic from the point of view of the query - all queries will execute to completion, either using only the components of the last version of the application package or only the new ones, so interdependent changes in multiple searcher components can be deployed without problems.

###### JNI requires restart

The exception to the above is bundles containing JNI packages. There can only be one instance of the native library, so such bundles cannot be reloaded. Best practice is to load the JNI library in the constructor; this will cause the new bundle _not_ to load, and the container will continue on the current version. A subsequent restart will load the new bundle, so this does not cause failures. Alternatively, if the JNI library is initialized lazily (e.g. on first invocation), bundle reloads will succeed, but subsequent invocations of code using the JNI library will fail. Hence, the new version will run, but fail. A warning is issued in the log when deploying, rather than the normal _Switching to the latest deployed set of handlers_ - example:

```
[2016-09-21 14:22:05.387] WARNING : container stderr Cannot load mylib native library
```

To minimize restarts, it is recommended to put JNI components in minimal, separate bundles. This will prevent reload of the JNI bundles, unless the JNI bundle itself is changed.

###### Monitoring the active Application

All containers also provide a built-in handler that outputs JSON formatted information about the active application, including its components and chains (it can also be configured to show [a user-defined version](../reference/application-packages-reference.html#versioning-application-packages)). The handler answers requests with the path `/ApplicationStatus`.
For example, if 'localhost' runs a container with HTTP configured on port 8080:

```
http://localhost:8080/ApplicationStatus
```

###### Including third-party libraries

External dependencies [can be included into the bundle](../components/bundles.html#maven-bundle-plugin).

###### Exporting, importing and including packages in bundles

OSGi features information hiding - by default, all the classes used inside a bundle are [invisible from the outside](../components/bundles.html).

###### Global and exported packages

The JDisc Container has one set of _global_ packages. These are packages that are available with no import, and constitute the supported API of the JDisc Container. Backwards incompatible changes are not made to these packages. There is also a set of _exported_ packages. These are available for import, and include all legacy packages, plus extension packages which are not part of the core API. Note that these are not considered to be "public" APIs, as global packages are, and backwards incompatible changes _can_ be made to these packages, or they may be removed. The list of exported and global packages is available in the [container-core pom.xml](https://github.com/vespa-engine/vespa/blob/master/container-core/pom.xml), in `project/properties/exportedPackages` and `project/properties/globalPackages`.

###### Versions

All the elements of the search container which may be referenced by an id may be _versioned_; that includes chains, components and query profiles. This allows multiple versions of these elements to be used at the same time, including multiple versions of the same classes, which is handy for [bucket testing](../testing.html#feature-switches-and-bucket-tests) new versions. An id or id reference may include a version by using the following syntax: `name:version`. This works with ids in search requests, services.xml, code and query profiles. A version has the format:

```
major.minor.micro.qualifier
```

where major, minor and micro are integers and qualifier is a string. Any right-hand portion of the version string may be skipped. In _versions_, skipped values mean "0" (and _empty_ for the qualifier). In _version references_, skipped values mean "unspecified". Any unspecified number will be matched to the highest number available, while a qualifier, if specified, _must_ be matched exactly (qualifiers are rarely used). To specify the version of a bundle, specify the version in pom.xml (we recommend not using _qualifier_):

```
<groupId>com.yahoo.example</groupId>
<artifactId>MyPlugin</artifactId>
<version>major.minor.micro</version>
```

This will automatically be used to set the `Bundle-Version` in the bundle manifest. For more details, see [component versioning](../reference/component-reference.html#component-versioning).

##### Troubleshooting

###### Container start

If there is some error in the application package, it will usually be detected during the `vespa prepare` step and cause an error message. However, some classes of errors are only detected once the application is deployed. When redeploying an application, it is therefore recommended to watch the vespa log by running:

```
vespa-logfmt -N
```

The new application is active after the INFO message:

```
Switched to the latest deployed set of handlers...;
```

If this message does not appear within a reasonable amount of time after completion of `vespa activate`, one will see errors or warnings instead that will help debug the application.

###### Component load

At deployment or container start, components are constructed.
Construction can fail - to debug, enable more logging using [vespa-logctl](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-logctl) (replace "container" as needed with the container id):

```
$ vespa-logctl container:com.yahoo.container.di.componentgraph.core.ComponentNode debug=on
.com.yahoo.container.di.componentgraph.core.ComponentNode ON ON ON ON ON ON ON OFF
```

Look for "Constructing" and "Finished constructing" in _vespa.log_ - this identifies components that did not construct.

Model downloading failures look like the below, and are caused by a failure to download the model to the container:

```
ERROR container Container.com.yahoo.jdisc.core.StandaloneMain JDisc exiting: Throwable caught: exception=
java.lang.RuntimeException: Not able to create config builder for payload '{
    "tokenizerPath": "\\"\\" https://huggingface.co/Snowflake/snowflake-arctic-embed-l/raw/main/tokenizer.json \\"\\"",
    "transformerModel": "\\"\\" https://huggingface.co/Snowflake/snowflake-arctic-embed-l/resolve/main/onnx/model_int8.onnx \\"\\"",
    "transformerMaxTokens": 512,
    "transformerInputIds": "input_ids",
    "transformerAttentionMask": "attention_mask",
    "transformerTokenTypeIds": "token_type_ids",
    "transformerOutput": "last_hidden_state",
    "normalize": true,
    "poolingStrategy": "cls",
    "transformerExecutionMode": "sequential",
    "transformerInterOpThreads": 1,
    "transformerIntraOpThreads": -4,
    "transformerGpuDevice": 0
}
```

Check URLs / names, and that the model can be downloaded from the network the Vespa Container runs in.

---

## Container Http

### HTTP Performance Testing of the Container using Gatling

For container testing, more flexibility and more detailed checking is often required than simply saturating an interface with HTTP requests.

#### HTTP Performance Testing of the Container using Gatling

For container testing, more flexibility and more detailed checking is often required than simply saturating an interface with HTTP requests. The stress test tool [Gatling](https://gatling.io/) provides such capabilities in a flexible manner, with the possibility of writing arbitrary plug-ins and a DSL for the most common cases. This document shows how to get started using Gatling with Vespa. Experienced Gatling users should find there is nothing special about testing Vespa versus other HTTP services.

##### Install Gatling

Refer to Gatling's [documentation for getting started](https://gatling.io/docs/gatling/reference/current/), or simply get the newest version from the [Gatling front page](https://gatling.io/), unpack the tar ball and jump straight into it.
The tool runs happily from the directory created when unpacking it. This tutorial is written with Gatling 2 in mind.

##### Configure the First Test with a Query Log

Refer to the Gatling documentation on how to set up the recorder. This tool acts as a browser proxy, recording what you do in the browser, allowing you to replay that as a test scenario. After running _bin/recorder.sh_ and setting the package to _com.vespa.example_ and the class name to _VespaTutorial_, running a simple query against your node _mynode_ (running e.g. [album-recommendation-java](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation-java)) should create a basic simulation looking something like the following in _user-files/simulations/com/vespa/example/VespaTutorial.scala_:

```
package com.vespa.example

import io.gatling.core.Predef._
import io.gatling.core.session.Expression
import io.gatling.http.Predef._
import io.gatling.jdbc.Predef._
import io.gatling.http.Headers.Names._
import io.gatling.http.Headers.Values._
import scala.concurrent.duration._
import bootstrap._
import assertions._

class VespaTutorial extends Simulation {

  val httpProtocol = http
    .baseURL("http://mynode:8080")
    .acceptHeader("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
    .acceptEncodingHeader("gzip, deflate")
    .connection("keep-alive")
    .userAgentHeader("Mozilla/5.0 (X11; Linux x86_64; rv:27.0) Gecko/20100101 Firefox/27.0")

  val headers_1 = Map("""Cache-Control""" -> """max-age=0""")

  val scn = scenario("Scenario Name")
    .exec(http("request_1")
      .get("""/search/?query=bad""")
      .headers(headers_1))

  setUp(scn.inject(atOnce(1 user))).protocols(httpProtocol)
}
```

Running a single query over and over again is not useful, so we have a tiny query log in a CSV file we want to run in our test, _user-files/data/userinput.csv_:

```
userinput bad religion bad lucky oops radiohead bad jackson
```

As usual for CSV files, the first line names the parameters. A literal comma may be escaped with backslash as "\,". Gatling takes care of URL quoting; there is no need to e.g. encode space as "%20". Add a feeder:

```
package com.vespa.example

import io.gatling.core.Predef._
import io.gatling.core.session.Expression
import io.gatling.http.Predef._
import io.gatling.jdbc.Predef._
import io.gatling.http.Headers.Names._
import io.gatling.http.Headers.Values._
import scala.concurrent.duration._
import bootstrap._
import assertions._

class VespaTutorial extends Simulation {

  val httpProtocol = http
    .baseURL("http://mynode:8080")
    .acceptHeader("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
    .acceptEncodingHeader("gzip, deflate")
    .connection("keep-alive")
    .userAgentHeader("Mozilla/5.0 (X11; Linux x86_64; rv:27.0) Gecko/20100101 Firefox/27.0")

  val headers_1 = Map("""Cache-Control""" -> """max-age=0""")

  val scn = scenario("Scenario Name")
    .feed(csv("userinput.csv").random)
    .exec(http("request_1")
      .get("/search/")
      .queryParam("query", "${userinput}")
      .headers(headers_1))

  setUp(scn.inject(constantRate(100 usersPerSec) during (10 seconds)))
    .protocols(httpProtocol)
}
```

Now, we have done a couple of changes to the original scenario. First, we have added the feeder. Since we do not have enough queries available to run long enough to get a scenario for some traffic, we chose the "random" strategy. This means a random user input string will be chosen for each invocation, and it might be reused. Also, we have changed how the test is run, from just a single query to a constant rate of 100 users per second for 10 seconds.
We should expect something as close as possible to 100 QPS in our test report.

##### Running a Benchmark

We now have something we can run both on a headless node and on a personal laptop, sample run output:

```
$ ./bin/gatling.sh
GATLING_HOME is set to ~/tmp/gatling-charts-highcharts-2.0.0-M3a
Choose a simulation number:
     [0] advanced.AdvancedExampleSimulation
     [1] basic.BasicExampleSimulation
     [2] com.vespa.example.VespaTutorial
2
Select simulation id (default is 'vespatutorial'). Accepted characters are a-z, A-Z, 0-9, - and _
Select run description (optional)
Simulation com.vespa.example.VespaTutorial started...
================================================================================
2014-04-09 11:54:33 0s elapsed
---- Scenario Name -------------------------------------------------------------
[-] 0%
waiting: 998 / running: 2 / done:0
---- Requests ------------------------------------------------------------------
> Global (OK=0 KO=0 )
================================================================================
================================================================================
2014-04-09 11:54:38 5s elapsed
---- Scenario Name -------------------------------------------------------------
[####################################] 49%
waiting: 505 / running: 0 / done:495
---- Requests ------------------------------------------------------------------
> Global (OK=495 KO=0 )
> request_1 (OK=495 KO=0 )
================================================================================
================================================================================
2014-04-09 11:54:43 10s elapsed
---- Scenario Name -------------------------------------------------------------
[#########################################################################] 99%
waiting: 8 / running: 0 / done:992
---- Requests ------------------------------------------------------------------
> Global (OK=992 KO=0 )
> request_1 (OK=992 KO=0 )
================================================================================
================================================================================
2014-04-09 11:54:43 10s elapsed
---- Scenario Name -------------------------------------------------------------
[##########################################################################]100%
waiting: 0 / running: 0 / done:1000
---- Requests ------------------------------------------------------------------
> Global (OK=1000 KO=0 )
> request_1 (OK=1000 KO=0 )
================================================================================
Simulation finished.
Generating reports...
Parsing log file(s)...
Parsing log file(s) done
================================================================================
---- Global Information --------------------------------------------------------
> numberOfRequests 1000 (OK=1000 KO=0 )
> minResponseTime 10 (OK=10 KO=- )
> maxResponseTime 30 (OK=30 KO=- )
> meanResponseTime 10 (OK=10 KO=- )
> stdDeviation 2 (OK=2 KO=- )
> percentiles1 10 (OK=10 KO=- )
> percentiles2 10 (OK=10 KO=- )
> meanNumberOfRequestsPerSecond 99 (OK=99 KO=- )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms 1000 (100%)
> 800 ms < t < 1200 ms 0 ( 0%)
> t > 1200 ms 0 ( 0%)
> failed 0 ( 0%)
================================================================================
Reports generated in 0s.
Please open the following file : ~/tmp/gatling-charts-highcharts-2.0.0-M3a/results/vespatutorial-20140409115432/index.html ``` The report gives graphs showing how the test progressed and summaries for failures and time spent. Copyright © 2025 - [Cookie Preferences](#) --- ## Container Metrics Reference ### Container Metrics | Name | Unit | Description | #### Container Metrics | Name | Unit | Description | | --- | --- | --- | | http.status.1xx | response | Number of responses with a 1xx status | | http.status.2xx | response | Number of responses with a 2xx status | | http.status.3xx | response | Number of responses with a 3xx status | | http.status.4xx | response | Number of responses with a 4xx status | | http.status.5xx | response | Number of responses with a 5xx status | | application\_generation | version | The currently live application config generation (aka session id) | | in\_service | binary | This will have the value 1 if the node is in service, 0 if not. | | jdisc.gc.count | operation | Number of JVM garbage collections done | | jdisc.gc.ms | millisecond | Time spent in JVM garbage collection | | jdisc.jvm | version | JVM runtime version | | cpu | thread | Container service CPU pressure | | jdisc.memory\_mappings | operation | JDISC Memory mappings | | jdisc.open\_file\_descriptors | item | JDISC Open file descriptors | | jdisc.thread\_pool.unhandled\_exceptions | thread | Number of exceptions thrown by tasks | | jdisc.thread\_pool.work\_queue.capacity | thread | Capacity of the task queue | | jdisc.thread\_pool.work\_queue.size | thread | Size of the task queue | | jdisc.thread\_pool.rejected\_tasks | thread | Number of tasks rejected by the thread pool | | jdisc.thread\_pool.size | thread | Size of the thread pool | | jdisc.thread\_pool.max\_allowed\_size | thread | The maximum allowed number of threads in the pool | | jdisc.thread\_pool.active\_threads | thread | Number of threads that are active | | jdisc.deactivated\_containers.total | item | JDISC Deactivated container instances | | jdisc.deactivated\_containers.with\_retained\_refs.last | item | JDISC Deactivated container nodes with retained refs | | jdisc.application.failed\_component\_graphs | item | JDISC Application failed component graphs | | jdisc.application.component\_graph.creation\_time\_millis | millisecond | JDISC Application component graph creation time | | jdisc.application.component\_graph.reconfigurations | item | JDISC Application component graph reconfigurations | | jdisc.singleton.is\_active | item | JDISC Singleton is active | | jdisc.singleton.activation.count | operation | JDISC Singleton activations | | jdisc.singleton.activation.failure.count | operation | JDISC Singleton activation failures | | jdisc.singleton.activation.millis | millisecond | JDISC Singleton activation time | | jdisc.singleton.deactivation.count | operation | JDISC Singleton deactivations | | jdisc.singleton.deactivation.failure.count | operation | JDISC Singleton deactivation failures | | jdisc.singleton.deactivation.millis | millisecond | JDISC Singleton deactivation time | | jdisc.http.ssl.handshake.failure.missing\_client\_cert | operation | JDISC HTTP SSL Handshake failures due to missing client certificate | | jdisc.http.ssl.handshake.failure.expired\_client\_cert | operation | JDISC HTTP SSL Handshake failures due to expired client certificate | | jdisc.http.ssl.handshake.failure.invalid\_client\_cert | operation | JDISC HTTP SSL Handshake failures due to invalid client certificate | | 
jdisc.http.ssl.handshake.failure.incompatible\_protocols | operation | JDISC HTTP SSL Handshake failures due to incompatible protocols | | jdisc.http.ssl.handshake.failure.incompatible\_chifers | operation | JDISC HTTP SSL Handshake failures due to incompatible chifers | | jdisc.http.ssl.handshake.failure.connection\_closed | operation | JDISC HTTP SSL Handshake failures due to connection closed | | jdisc.http.ssl.handshake.failure.unknown | operation | JDISC HTTP SSL Handshake failures for unknown reason | | jdisc.http.request.prematurely\_closed | request | HTTP requests prematurely closed | | jdisc.http.request.requests\_per\_connection | request | HTTP requests per connection | | jdisc.http.request.uri\_length | byte | HTTP URI length | | jdisc.http.request.content\_size | byte | HTTP request content size | | jdisc.http.requests | request | HTTP requests | | jdisc.http.requests.status | request | Number of requests to the built-in status handler | | jdisc.http.filter.rule.blocked\_requests | request | Number of requests blocked by filter | | jdisc.http.filter.rule.allowed\_requests | request | Number of requests allowed by filter | | jdisc.http.filtering.request.handled | request | Number of filtering requests handled | | jdisc.http.filtering.request.unhandled | request | Number of filtering requests unhandled | | jdisc.http.filtering.response.handled | request | Number of filtering responses handled | | jdisc.http.filtering.response.unhandled | request | Number of filtering responses unhandled | | jdisc.http.handler.unhandled\_exceptions | request | Number of unhandled exceptions in handler | | jdisc.tls.capability\_checks.succeeded | operation | Number of TLS capability checks succeeded | | jdisc.tls.capability\_checks.failed | operation | Number of TLS capability checks failed | | jdisc.http.jetty.threadpool.thread.max | thread | Configured maximum number of threads | | jdisc.http.jetty.threadpool.thread.min | thread | Configured minimum number of threads | | jdisc.http.jetty.threadpool.thread.reserved | thread | Configured number of reserved threads or -1 for heuristic | | jdisc.http.jetty.threadpool.thread.busy | thread | Number of threads executing internal and transient jobs | | jdisc.http.jetty.threadpool.thread.idle | thread | Number of idle threads | | jdisc.http.jetty.threadpool.thread.total | thread | Current number of threads | | jdisc.http.jetty.threadpool.queue.size | thread | Current size of the job queue | | jdisc.http.jetty.http\_compliance.violation | failure | Number of HTTP compliance violations | | serverNumOpenConnections | connection | The number of currently open connections | | serverNumConnections | connection | The total number of connections opened | | serverBytesReceived | byte | The number of bytes received by the server | | serverBytesSent | byte | The number of bytes sent from the server | | handled.requests | operation | The number of requests handled per metrics snapshot | | handled.latency | millisecond | The time used for requests during this metrics snapshot | | httpapi\_latency | millisecond | Duration for requests to the HTTP document APIs | | httpapi\_pending | operation | Document operations pending execution | | httpapi\_num\_operations | operation | Total number of document operations performed | | httpapi\_num\_updates | operation | Document update operations performed | | httpapi\_num\_removes | operation | Document remove operations performed | | httpapi\_num\_puts | operation | Document put operations performed | | httpapi\_ops\_per\_sec 
| operation\_per\_second | Document operations per second | | httpapi\_succeeded | operation | Document operations that succeeded | | httpapi\_failed | operation | Document operations that failed | | httpapi\_parse\_error | operation | Document operations that failed due to document parse errors | | httpapi\_condition\_not\_met | operation | Document operations not applied due to condition not met | | httpapi\_not\_found | operation | Document operations not applied due to document not found | | httpapi\_failed\_unknown | operation | Document operations failed by unknown cause | | httpapi\_failed\_timeout | operation | Document operations failed by timeout | | httpapi\_failed\_insufficient\_storage | operation | Document operations failed by insufficient storage | | httpapi\_queued\_operations | operation | Document operations queued for execution in /document/v1 API handler | | httpapi\_queued\_bytes | byte | Total operation bytes queued for execution in /document/v1 API handler | | httpapi\_queued\_age | second | Age in seconds of the oldest operation in the queue for /document/v1 API handler | | httpapi\_mbus\_window\_size | operation | The window size of Messagebus's dynamic throttle policy for /document/v1 API handler | | mem.heap.total | byte | Total available heap memory | | mem.heap.free | byte | Free heap memory | | mem.heap.used | byte | Currently used heap memory | | mem.direct.total | byte | Total available direct memory | | mem.direct.free | byte | Currently free direct memory | | mem.direct.used | byte | Direct memory currently used | | mem.direct.count | byte | Number of direct memory allocations | | mem.native.total | byte | Total available native memory | | mem.native.free | byte | Currently free native memory | | mem.native.used | byte | Native memory currently used | | athenz-tenant-cert.expiry.seconds | second | Time remaining until Athenz tenant certificate expires | | container-iam-role.expiry.seconds | second | Time remaining until IAM role expires | | peak\_qps | query\_per\_second | The highest number of qps for a second for this metrics snapshot | | search\_connections | connection | Number of search connections | | feed.operations | operation | Number of document feed operations | | feed.latency | millisecond | Feed latency | | feed.http-requests | operation | Feed HTTP requests | | queries | operation | Query volume | | query\_container\_latency | millisecond | The query execution time consumed in the container | | query\_latency | millisecond | The overall query latency as seen by the container | | query\_timeout | millisecond | The amount of time allowed for query execution, from the client | | failed\_queries | operation | The number of failed queries | | degraded\_queries | operation | The number of degraded queries, e.g. due to some content nodes not responding in time | | hits\_per\_query | hit\_per\_query | The number of hits returned | | query\_hit\_offset | hit | The offset for hits returned | | documents\_covered | document | The combined number of documents considered during query evaluation | | documents\_total | document | The number of documents to be evaluated if all requests had been fully executed | | documents\_target\_total | document | The target number of total documents to be evaluated when all data is in sync | | jdisc.render.latency | nanosecond | The time used by the container to render responses | | query\_item\_count | item | The number of query items (terms, phrases, etc.) 
| | docproc.proctime | millisecond | Time spent processing document | | docproc.documents | document | Number of processed documents | | totalhits\_per\_query | hit\_per\_query | The total number of documents found to match queries | | empty\_results | operation | Number of queries matching no documents | | requestsOverQuota | operation | The number of requests rejected due to exceeding quota | | relevance.at\_1 | score | The relevance of hit number 1 | | relevance.at\_3 | score | The relevance of hit number 3 | | relevance.at\_10 | score | The relevance of hit number 10 | | error.timeout | operation | Requests that timed out | | error.backends\_oos | operation | Requests that failed due to no available backends nodes | | error.plugin\_failure | operation | Requests that failed due to plugin failure | | error.backend\_communication\_error | operation | Requests that failed due to backend communication error | | error.empty\_document\_summaries | operation | Requests that failed due to missing document summaries | | error.illegal\_query | operation | Requests that failed due to illegal queries | | error.invalid\_query\_parameter | operation | Requests that failed due to invalid query parameters | | error.internal\_server\_error | operation | Requests that failed due to internal server error | | error.misconfigured\_server | operation | Requests that failed due to misconfigured server | | error.invalid\_query\_transformation | operation | Requests that failed due to invalid query transformation | | error.results\_with\_errors | operation | The number of queries with error payload | | error.unspecified | operation | Requests that failed for an unspecified reason | | error.unhandled\_exception | operation | Requests that failed due to an unhandled exception | | serverRejectedRequests | operation | Deprecated. Use jdisc.thread\_pool.rejected\_tasks instead. | | serverThreadPoolSize | thread | Deprecated. Use jdisc.thread\_pool.size instead. | | serverActiveThreads | thread | Deprecated. Use jdisc.thread\_pool.active\_threads instead. | | jrt.transport.tls-certificate-verification-failures | failure | TLS certificate verification failures | | jrt.transport.peer-authorization-failures | failure | TLS peer authorization failures | | jrt.transport.server.tls-connections-established | connection | TLS server connections established | | jrt.transport.client.tls-connections-established | connection | TLS client connections established | | jrt.transport.server.unencrypted-connections-established | connection | Unencrypted server connections established | | jrt.transport.client.unencrypted-connections-established | connection | Unencrypted client connections established | | max\_query\_latency | millisecond | Deprecated. Use query\_latency.max instead | | mean\_query\_latency | millisecond | Deprecated. 
Use the expression (query\_latency.sum / query\_latency.count) instead | | jdisc.http.filter.athenz.accepted\_requests | request | Number of requests accepted by the AthenzAuthorization filter | | jdisc.http.filter.athenz.rejected\_requests | request | Number of requests rejected by the AthenzAuthorization filter | | jdisc.http.filter.athenz.grid\_requests | request | Number of grid requests | | serverConnectionsOpenMax | connection | Maximum number of open connections | | serverConnectionDurationMax | millisecond | Longest duration a connection is kept open | | serverConnectionDurationMean | millisecond | Average duration a connection is kept open | | serverConnectionDurationStdDev | millisecond | Standard deviation of open connection duration | | serverNumRequests | request | Number of requests | | serverNumSuccessfulResponses | request | Number of successful responses | | serverNumFailedResponses | request | Number of failed responses | | serverNumSuccessfulResponseWrites | request | Number of successful response writes | | serverNumFailedResponseWrites | request | Number of failed response writes | | serverTotalSuccessfulResponseLatency | millisecond | Total duration for execution of successful responses | | serverTotalFailedResponseLatency | millisecond | Total duration for execution of failed responses | | serverTimeToFirstByte | millisecond | Time from request has been received by the server until the first byte is returned to the client | | serverStartedMillis | millisecond | Time since the service was started | | embedder.latency | millisecond | Time spent creating an embedding | | embedder.sequence\_length | byte | Size of sequence produced by tokenizer | | jvm.buffer.count | buffer | An estimate of the number of buffers in the pool | | jvm.buffer.memory.used | byte | An estimate of the memory that the Java virtual machine is using for this buffer pool | | jvm.buffer.total.capacity | byte | An estimate of the total capacity of the buffers in this pool | | jvm.classes.loaded | class | The number of classes that are currently loaded in the Java virtual machine | | jvm.classes.unloaded | class | The total number of classes unloaded since the Java virtual machine has started execution | | jvm.gc.concurrent.phase.time | second | Time spent in concurrent phase | | jvm.gc.live.data.size | byte | Size of long-lived heap memory pool after reclamation | | jvm.gc.max.data.size | byte | Max size of long-lived heap memory pool | | jvm.gc.memory.allocated | byte | Incremented for an increase in the size of the (young) heap memory pool after one GC to before the next | | jvm.gc.memory.promoted | byte | Count of positive increases in the size of the old generation memory pool before GC to after GC | | jvm.gc.overhead | percentage | An approximation of the percent of CPU time used by GC activities | | jvm.gc.pause | second | Time spent in GC pause | | jvm.memory.committed | byte | The amount of memory in bytes that is committed for the Java virtual machine to use | | jvm.memory.max | byte | The maximum amount of memory in bytes that can be used for memory management | | jvm.memory.usage.after.gc | percentage | The percentage of long-lived heap pool used after the last GC event | | jvm.memory.used | byte | The amount of used memory | | jvm.threads.daemon | thread | The current number of live daemon threads | | jvm.threads.live | thread | The current number of live threads including both daemon and non-daemon threads | | jvm.threads.peak | thread | The peak live thread count since the Java virtual 
machine started or peak was reset | | jvm.threads.started | thread | The total number of application threads started in the JVM | | jvm.threads.states | thread | The current number of threads (in each state) |

Copyright © 2025 - [Cookie Preferences](#)

---

## Container Tuning

### Container Tuning

A collection of configuration parameters to tune the Container as used in Vespa.

#### Container Tuning

A collection of configuration parameters to tune the Container as used in Vespa. Some configuration parameters have native [services.xml](../application-packages.html) support while others are configured through [generic config overrides](../reference/config-files.html#generic-configuration-in-services-xml).

##### Container worker threads

The container uses multiple thread pools for its operations. Most components including request handlers use the container's _default thread pool_, which is controlled by a shared executor instance. Any component can utilize the default pool by injecting a `java.util.concurrent.Executor` instance. Some built-in components have dedicated thread pools - such as the Jetty server and the search handler. These thread pools are injected through special wiring in the config model and are not easily accessible from other components. The thread pools are by default scaled on the system resources as reported by the JVM (`Runtime.getRuntime().availableProcessors()`). It's paramount that the `-XX:ActiveProcessorCount`/`jvm_availableProcessors` configuration is correct for the container to work optimally. The default thread pool configuration can be overridden through services.xml. We recommend you keep the default configuration as it's tuned to work across a variety of workloads. Note that the default configuration and pool usage may change between minor versions. The container will pre-start the minimum number of worker threads, so even an idle container may report running several hundred threads. The thread pool is pre-started with the number of threads specified in the [`threads`](../reference/services-search.html#threadpool-threads) parameter. Note that tuning the capacity upwards increases the risk of high GC pressure, as concurrency becomes higher with more in-flight requests. The GC pressure is a function of the number of in-flight requests, the time it takes to complete each request and the amount of garbage produced per request. Increasing the queue size will allow the application to handle shorter traffic bursts without rejecting requests, although it increases the average latency for the requests that are queued up. Large queues will also increase heap consumption in overload situations. Extra threads will be created once the queue is full (when [`boost`](../reference/services-search.html#threads.boost) is specified), and are destroyed after an idle timeout. If all threads are occupied, requests are rejected with a 503 response. The effective thread pool configuration and utilization statistics can be observed through the [Container Metrics](/en/operations/metrics.html#container-metrics). See [Thread Pool Metrics](/en/operations/metrics.html#thread-pool-metrics) for a list of metrics exported. **Note:** If the queue size is set to 0, the metric measuring the queue size - `jdisc.thread_pool.work_queue.size` - will instead switch to measure how many threads are active.
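To make this concrete, the following is a hedged sketch of what such an override could look like in services.xml. The element names follow the services-search reference linked above, but the numeric values are placeholders for illustration, not recommendations:

```
<container version="1.0">
    <search>
        <!-- Override the default search handler thread pool:
             40 base threads, boosted up to 100 threads under load,
             with a task queue of capacity 1000 -->
        <threadpool>
            <threads boost="100">40</threads>
            <queue>1000</queue>
        </threadpool>
    </search>
</container>
```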
###### Recommendation

A fixed size pool is preferable for stable latency during peak load, at the cost of a higher static memory footprint and increased context-switching overhead if an excessive number of threads is configured. A variable size pool is mostly beneficial for minimizing memory consumption during low-traffic periods, and in general when the size of the peak load is somewhat unknown. The downside is that once all core threads are active, latency will increase as additional tasks are queued, and launching extra threads is relatively expensive as it involves system calls to the OS.

###### Lower limit

The container will override any configuration if the effective value is below a fixed minimum. This is to reduce the risk of certain deadlock scenarios and improve concurrency for low-resource environments.

- Minimum 8 threads.
- Minimum 650 queue capacity (if the queue is not disabled).

###### Example

```
40 100 1000 1000
```

##### Container memory usage

> Help, my container nodes are using more than 70% memory!

It's common to observe the container process utilizing its maximum configured heap size. This, by itself, is not necessarily an indication of a problem. The Java Virtual Machine (JVM) manages memory within the allocated heap, and it's designed to use as much of it as possible to reduce the frequency of garbage collection. To understand whether enough memory is allocated, look at the garbage collection activity. If GC is running frequently and using significant CPU or causing long pauses, it might indicate that the heap size is too small for the workload. In such cases, consider increasing the maximum heap size. However, if the garbage collector is running infrequently and efficiently, it's perfectly normal for the container to utilize most or all of its allocated heap, and even more (as some memory will also be allocated outside the heap; e.g. direct buffers for efficient data transfer). Vespa exports several metrics to allow you to monitor JVM GC performance, such as [jvm.gc.overhead](../reference/container-metrics-reference.html#jvm_gc_overhead) - if this exceeds 8-10%, you should consider increasing heap memory and/or tuning GC settings.

##### JVM heap size

Change the default JVM heap size settings used by Vespa to better suit the specific hardware settings or application requirements. By setting the relative size of the total JVM heap in [percentage of available memory](../reference/services-container.html#nodes), one does not know exactly what the heap size will be, but the configuration will be adaptable and ensure that the container can start even in environments with less available memory. The example below allocates 50% of available memory on the machine to the JVM heap:

```
```

##### JVM Tuning

Use _gc-options_ for controlling GC related parameters and _options_ for tuning other parameters. See the [reference documentation](../reference/services-container.html#nodes). Example: Running with a 4 GB heap using the G1 garbage collector, NewRatio = 1 (equal size of old and new generation) and verbose GC logging (logged to stdout, ending up in the vespa.log file):

```
```

The default heap size with the Docker image is 1.5g, which can be on the low side for high-throughput applications, causing frequent garbage collection. By default, the G1GC collector is used.

###### Config Server and Config Proxy

The config server and proxy are not executed based on the model in _services.xml_. On the contrary, they are used to bootstrap the services in that model.
Consequently, one must use configuration variables to set the JVM parameters for the config server and config proxy. They also need to be restarted (_services_ in the config proxy's case) after a change, but one does _not_ need to _vespa prepare_ or _vespa activate_ first. Example:

```
VESPA_CONFIGSERVER_JVMARGS -Xlog:gc
VESPA_CONFIGPROXY_JVMARGS -Xlog:gc -Xmx256m
```

Refer to [Setting Vespa variables](/en/operations-selfhosted/files-processes-and-ports.html#environment-variables).

##### Container warmup

Some applications observe that the first queries made to a freshly started container take a long time to complete. This is typically due to some components performing lazy setup of data structures or connections. Lazy initialization should be avoided in favor of eager initialization in the component constructor, but this is not always possible. A way to avoid problems with the first queries in such cases is to perform warmup queries at startup. This is done by issuing queries from the constructor of the Handler that handles regular queries. If using the default handler, [com.yahoo.search.handler.SearchHandler](https://github.com/vespa-engine/vespa/blob/master/container-search/src/main/java/com/yahoo/search/handler/SearchHandler.java), subclass this and configure your subclass as the handler of query requests in _services.xml_. Add a call to a warmupQueries() method as the last line of your handler constructor. The method can look something like this:

```
private void warmupQueries() {
    // Issue a fixed set of warmup requests before external queries are accepted
    String[] requestUris = new String[] {"warmupRequestUri1", "warmupRequestUri2"};
    int warmupIterations = 50;
    for (int i = 0; i < warmupIterations; i++) {
        for (String requestUri : requestUris) {
            handle(HttpRequest.createTestRequest(requestUri, com.yahoo.jdisc.http.HttpRequest.Method.GET));
        }
    }
}
```

Since these queries will be executed before the container starts accepting external queries, they will cause the first external queries to observe a warmed up container instance. Use [metrics.ignore](../reference/query-api-reference.html#metrics.ignore) in the warmup queries to eliminate them from being reported in metrics.

Copyright © 2025 - [Cookie Preferences](#)

###### On this page: - [Container worker threads](#container-worker-threads) - [Recommendation](#recommendation) - [Lower limit](#container-worker-threads-min) - [Example](#container-worker-threads-example) - [Container memory usage](#container-memory-usage) - [JVM heap size](#jvm-heap-size) - [JVM Tuning](#jvm-tuning) - [Config Server and Config Proxy](#config-server-and-config-proxy) - [Container warmup](#container-warmup)

---

## Container

### Container

This is the Container service operational guide.

#### Container

This is the Container service operational guide.

![Vespa Overview](/assets/img/vespa-overview.svg)

Note that "container" is an overloaded concept in Vespa - in this guide it refers to the service instance nodes in blue. Refer to [container metrics](/en/operations/metrics.html#container-metrics).

##### Endpoints

Container services host the query and feed endpoints - examples:

- [album-recommendation](https://github.com/vespa-engine/sample-apps/blob/master/album-recommendation/app/services.xml) configures _both_ query and feed in the same container cluster (i.e. service):

```
```

- [multinode-HA](https://github.com/vespa-engine/sample-apps/blob/master/examples/operations/multinode-HA/services.xml) configures query and feed in separate container clusters (i.e.
services):

```
```

Observe that the query and feed endpoints are located in separate clusters in the second example, and the endpoints are therefore different.

**Important:** The first thing to validate when troubleshooting query errors is to make sure that the endpoint is correct, i.e. that query requests hit the correct nodes. A query will be written to the [access log](/en/access-logging.html) on one of the nodes in the container cluster.

##### Inspecting Vespa Java Services using JConsole

Determine the state of each running Java Vespa service using JConsole. JConsole is distributed along with the Java developer kit. Start JConsole:

```
$ jconsole <host>:<port>
```

where the host and port determine which service to attach to. For security purposes, the JConsole tool cannot directly attach to Vespa services from external machines.

###### Connecting to a Vespa instance

To attach a JConsole to a Vespa service running on another host, create a tunnel from the JConsole host to the Vespa service host. This can for example be done by setting up two SSH tunnels as follows:

```
$ ssh -N -L<port1>:localhost:<port1> <service-host> &
$ ssh -N -L<port2>:localhost:<port2> <service-host> &
```

where port1 and port2 are determined by the type of service (see below). A JConsole can then be attached to the service as follows:

```
$ jconsole localhost:<port1>
```

Port numbers:

| Service | Port 1 | Port 2 |
| --- | --- | --- |
| QRS | 19015 | 19016 |
| Docproc | 19123 | 19124 |

Updated port information can be found by running:

```
$ [vespa-model-inspect](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-model-inspect) service <service-name>
```

where the resulting RMIREGISTRY and JMX lines determine port1 and port2, respectively.

###### Examining thread states

The state of each container is available in JConsole by pressing the Threads tab and selecting the thread of interest in the threads list. Threads of interest include _search_, _connector_, _closer_, _transport_ and _acceptor_ (the latter four are used for backend communications).

Copyright © 2025 - [Cookie Preferences](#)

---

## Content Node Recovery

### Content node recovery

In exceptional cases, one or more content nodes may end up with corrupted data, causing them to fail to restart.

#### Content node recovery

In exceptional cases, one or more content nodes may end up with corrupted data, causing them to fail to restart. Possible reasons are:

- the application configuring a higher memory or disk limit such that the node is allowed to accept more data than it can manage,
- hardware failure, or
- a bug in Vespa.

Normally a corrupted node can just be wiped of all data or removed from the cluster, but when this happens simultaneously to multiple nodes, or redundancy 1 is used, it may be necessary to recover the node(s) to avoid data loss. This document explains the procedure.

##### Recovery steps

On each of the nodes needing recovery:

1. [Stop services](/en/operations-selfhosted/admin-procedures.html#vespa-start-stop-restart) on the node if running.
2. Repair the node:
   - If the node cannot start due to needing more memory than available: Increase the memory available to the node, or if not possible, stop all non-essential processes on the node using `vespa-sentinel-cmd list` and `vespa-sentinel-cmd stop [name]`, and (if necessary) start only the content node process using `vespa-sentinel-cmd start searchnode`. When the node is successfully started, issue delete operations or increase the cluster size to reduce the amount of data on the node if necessary.
   - If the node cannot start due to needing more disk than available: Increase the disk available to the node, or if not possible, delete non-essential data such as logs and cached packages. When the node is successfully started, issue delete operations or increase the cluster size to reduce the amount of data on the node if necessary.
   - If the node cannot start for any other reason, repair the data manually as needed. This procedure will depend on the specific nature of the data corruption.
3. [Start services](/en/operations-selfhosted/admin-procedures.html#vespa-start-stop-restart) on the node.
4. Verify that the node is fully up before doing the next node - metrics/interfaces to be used to evaluate if the next node can be stopped:
   - Check if a node is up using [/state/v1/health](/en/reference/state-v1.html#state-v1-health).
   - Check the `vds.idealstate.merge_bucket.pending.average` metric on content nodes. When 0, all buckets are in sync - see [example](/en/operations/metrics.html).

Copyright © 2025 - [Cookie Preferences](#)

---

## Content Nodes

### Content nodes, states and metrics

![Content cluster overview](/assets/img/elastic-feed.svg)

#### Content nodes, states and metrics

![Content cluster overview](/assets/img/elastic-feed.svg)

Content cluster processes are _distributor_, _proton_ and _cluster controller_. The distributor calculates the correct content node using the distribution algorithm and the [cluster state](#cluster-state). With no known cluster state, the client library will send requests to a random node, which replies with the updated cluster state if the node was incorrect. Cluster states are versioned, such that clients hitting outdated distributors do not override updated states with old states. The [distributor](#distributor) keeps track of which content nodes store replicas of each bucket (maximum one replica each), based on [redundancy](../reference/services-content.html#redundancy) and information from the _cluster controller_. A bucket maps to one distributor only. A distributor keeps a bucket database with bucket metadata. The metadata holds which content nodes store replicas of the buckets, the checksum of the bucket content and the number of documents and meta entries within the bucket. Each document is algorithmically mapped to a bucket and forwarded to the correct content nodes. The distributors detect whether there are enough bucket replicas on the content nodes and add/remove as needed. Write operations wait for replies from every replica and fail if fewer replicas than the configured redundancy are persisted within the timeout. The [cluster controller](#cluster-controller) manages the state of the distributor and content nodes. This _cluster state_ is used by the document processing chains to know which distributor to send documents to, as well as by the distributor to know which content nodes should have which bucket.

##### Cluster state

There are three kinds of state: [unit state](../reference/cluster-v2.html#state-unit), [user state](../reference/cluster-v2.html#state-user) and [generated state](../reference/cluster-v2.html#state-generated) (a.k.a. _cluster state_). For new cluster states, the cluster state version is incremented, and the new cluster state is broadcast to all nodes. There is a minimum time between each cluster state change. It is possible to set a minimum capacity for the cluster state to be `up`.
If a cluster has so many nodes unavailable that it is considered down, the state of each node is irrelevant, and thus new cluster states will not be created and broadcast before enough nodes are back for the cluster to come back up. A cluster state indicating that the entire cluster is down may thus have outdated data on the node level.

##### Cluster controller

The main task of the cluster controller is to maintain the [cluster state](#cluster-state). This is done by _polling_ nodes for state and _generating_ a cluster state, which is then _broadcast_ to all the content nodes in the cluster. Note that clients do not interface with the cluster controller - they get the cluster state from the distributors - [details](#distributor).

| Task | Description |
| --- | --- |
| Node state polling | The cluster controller polls nodes, sending the current cluster state. If the cluster state is no longer correct, the node returns correct information immediately. If the state is correct, the request lingers on the node, such that the node can reply to it immediately if its state changes. After a while, the cluster controller will send a new state request to the node, even with one pending. This triggers a reply to the lingering request and makes the new one linger instead. Hence, nodes have a pending state request. During a controlled node shutdown, the node starts the shutdown process by responding to the pending state request that it is now stopping. **Note:** As controlled restarts or shutdowns are implemented as TERM signals from the [config-sentinel](/en/operations-selfhosted/config-sentinel.html), the cluster controller is not able to differentiate between controlled and other shutdowns. |
| Cluster state generation | The cluster controller translates unit and user states into the generated _cluster state_. |
| Cluster state broadcast | When node unit states are received, a cluster controller internal cluster state is updated. New cluster states are distributed with a minimum interval between them. There is also a grace period per unit state - e.g., distributors and content nodes that are on the same node often stop at the same time. The version number is incremented, and the new cluster state is broadcast. If the cluster state version is [reset](../operations-selfhosted/admin-procedures.html#cluster-state), distributors and content node processes may have to be restarted in order for the system to converge to the new state. Nodes will reject lower cluster state versions to prevent race conditions caused by overlapping cluster controller leadership periods. |

See [cluster controller configuration](../operations-selfhosted/admin-procedures.html#cluster-controller-configuration).

###### Master election

Vespa can be configured with one cluster controller. Reads and writes will keep working if the cluster controller is down, but other changes to the cluster (like a content node going down) will not be handled. It is hence recommended to configure a set of cluster controllers. The cluster controller nodes elect a master, which does the node polling and cluster state broadcast. The other cluster controller nodes only exist to do master election and potentially take over if the master dies. All cluster controllers will vote for the cluster controller with the lowest index that says it is ready. If a cluster controller has more than half of the votes, it will be elected master. As a majority vote is required, the number of cluster controllers should be an odd number of 3 or greater.
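For a self-hosted deployment, this is expressed in services.xml under the admin section. A hedged sketch (hostaliases are placeholders; see the multinode-HA sample application and the services.xml reference for the full layout):

```
<admin version="2.0">
    <!-- Three cluster controllers: a majority (two) can still elect a master if one is down -->
    <cluster-controllers>
        <cluster-controller hostalias="node0"/>
        <cluster-controller hostalias="node1"/>
        <cluster-controller hostalias="node2"/>
    </cluster-controllers>
</admin>
```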
A fresh master will not broadcast states before a transition time has passed, allowing an old master to have some time to realize it is no longer the master.

##### Distributor

Buckets are mapped to distributors using the [ideal state algorithm](idealstate.html). As the cluster state changes, buckets are re-mapped immediately. The mapping does not overlap - a bucket is owned by one distributor. Distributors do not persist the bucket database; the bucket-to-content-node mapping is kept in memory in the distributor. Document count, persisted size and a metadata checksum per bucket are stored as well. At distributor (re)start, content nodes are polled for bucket information, and return which buckets are owned by this distributor (using the ideal state algorithm). There is no centralized bucket directory node. Likewise, at any distributor cluster state change, content nodes are polled for bucket handover - a distributor will then handle a new set of buckets. Document operations are mapped to content nodes based on bucket locations - each put/update/get/remove is mapped to a [bucket](buckets.html) and sent to the right content nodes. To manage the document set as it grows and nodes change, buckets move between content nodes. Document API clients (i.e. container nodes with [`<document-api>`](../reference/services-container.html#document-api)) do not communicate directly with the cluster controller, and do not know the cluster state at startup. Clients therefore start out by sending requests to a random distributor. If the document operation hits the wrong distributor, `WRONG_DISTRIBUTION` is returned, with the current cluster state in the response. `WRONG_DISTRIBUTION` is hence expected and normal at cold start / state change events.

###### Timestamps

[Write operations](../reads-and-writes.html) have a _last modified time_ timestamp assigned when passing through the distributor. The timestamp is guaranteed to be unique within the [bucket](buckets.html) where it is stored. The timestamp is used by the content layer to decide which operation is newest. These timestamps can be used when [visiting](../visiting.html), to process/retrieve documents within a given time range. To guarantee unique timestamps, they are in microseconds - the microsecond part is generated to avoid conflicts with other documents. If documents are migrated _between_ clusters, the target cluster will have new timestamps for their entries. Also, when [reprocessing documents](../document-processing.html) _within_ a cluster, documents will have new timestamps, even if not modified.

###### Ordering

The Document API uses the [document ID](../documents.html#document-ids) to order operations. A Document API client ensures that only one operation is pending at the same time. This ensures that if a client sends multiple operations for the same document, they will be processed in a defined order. This is done by queueing pending operations _locally_ at the client. **Note:** If sending two write operations to the same document, and the first operation fails, the enqueued operation is sent. In other words, the client does not assume there exists any kind of dependency between separate operations to the same document. If you need to enforce this, use [test-and-set conditions](../document-v1-api-guide.html#conditional-writes) for writes. If _different_ clients have pending operations on the same document, the order is unspecified.

###### Maintenance operations

Distributors track which content nodes have which buckets in their bucket database.
Distributors then use the [ideal state algorithm](idealstate.html) to generate bucket _maintenance operations_. A stable system has all buckets located per the ideal state:

- If buckets have too few replicas, new ones are generated on other content nodes.
- If the replicas differ, a bucket merge is issued to get the replicas consistent.
- If a bucket has too many replicas, the superfluous ones are deleted. Buckets are merged, if inconsistent, before deletion.
- If two buckets exist such that both may contain the same document, the buckets are split or joined to remove such overlapping buckets. Read more on [inconsistent buckets](buckets.html).
- If buckets are too small/large, they will be joined or split.

The maintenance operations have different priorities. If no maintenance operations are needed, the cluster is said to be in the _ideal state_. The distributors synchronize maintenance load with user load, e.g. to remap requests to other buckets after bucket splitting and joining.

###### Restart

When a distributor stops, it will try to respond to any pending cluster state request first. New incoming requests after shutdown has commenced will fail immediately, as the socket is no longer accepting requests. Cluster controllers will thus detect processes stopping almost immediately. The cluster state will be updated with the new state internally in the cluster controller. Then the cluster controller will wait for maximum [min\_time\_between\_new\_systemstates](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/fleetcontroller.def) before publishing the new cluster state - this is to reduce short-term state fluctuations. The cluster controller has the option of setting states to make other distributors take over ownership of buckets, or mask the change, making the buckets owned by the restarting distributor unavailable for the time being. If the distributor transitions from `up` to `down`, other distributors will request metadata from the content nodes to take over ownership of buckets previously owned by the restarting distributor. Until the distributors have gathered this new metadata from all the content nodes, requests for these buckets cannot be served, and will fail back to the client. When the restarting node comes back up and is marked `up` in the cluster state again, the additional nodes will discard knowledge of the extra buckets they previously acquired. For requests with timeouts of several seconds, the transition should be invisible due to automatic client resending. Requests with a lower timeout might fail, and it is up to the application whether to resend or handle failed requests. Requests to buckets not owned by the restarting distributor will not be affected.

##### Content node

The content node runs _proton_, which is the query backend.

###### Restart

When a content node does a controlled restart, it marks itself in the `stopping` state and rejects new requests. It will process its pending request queue before shutting down. Consequently, client requests are typically unaffected by content node restarts. The currently pending requests will typically be completed. New copies of buckets will be created on other nodes, to store new requests in appropriate redundancy. This happens whether the node transitions through the `down` or `maintenance` state. The difference is that when transitioning through `maintenance`, the distributor will not start any effort of synchronizing the new copies with the existing copies.
They will just store the new requests until the maintenance node comes back up. When starting, content nodes will start with gathering information on what buckets it has data stored for. While this is happening, the service layer will expose that it is `down`. ##### Metrics | Metric | Description | | --- | --- | | .idealstate.idealstate\_diff | This metric tries to create a single value indicating distance to the ideal state. A value of zero indicates that the cluster is in the ideal state. Graphed values of this metric gives a good indication for how fast the cluster gets back to the ideal state after changes. Note that some issues may hide other issues, so sometimes the graph may appear to stand still or even go a bit up again, as resolving one issue may have detected one or several others. | | .idealstate.buckets\_toofewcopies | Specifically lists how many buckets have too few copies. Compare to the _buckets_ metric to see how big a portion of the cluster this is. | | .idealstate.buckets\_toomanycopies | Specifically lists how many buckets have too many copies. Compare to the _buckets_ metric to see how big a portion of the cluster this is. | | .idealstate.buckets | The total number of buckets managed. Used by other metrics reporting bucket counts to know how big a part of the cluster they relate to. | | .idealstate.buckets\_notrusted | Lists how many buckets have no trusted copies. Without trusted buckets operations against the bucket may have poor performance, having to send requests to many copies to try and create consistent replies. | | .idealstate.delete\_bucket.pending | Lists how many buckets that needs to be deleted. | | .idealstate.merge\_bucket.pending | Lists how many buckets there are, where we suspect not all copies store identical document sets. | | .idealstate.split\_bucket.pending | Lists how many buckets are currently being split. | | .idealstate.join\_bucket.pending | Lists how many buckets are currently being joined. | | .idealstate.set\_bucket\_state.pending | Lists how many buckets are currently altered for active state. These are high priority requests which should finish fast, so these requests should seldom be seen as pending. | Example, using the [quickstart](../deploy-an-application-local.html) - find the distributor port (look for HTTP): ``` $ docker exec vespa vespa-model-inspect service distributor distributor @ vespa-container : content music/distributor/0 tcp/vespa-container:19112 (MESSAGING) tcp/vespa-container:19113 (STATUS RPC) tcp/vespa-container:19114 (STATE STATUS HTTP) ``` Get the metric value: ``` $ docker exec vespa curl -s http://localhost:19114/state/v1/metrics | jq . | \ grep -A 10 idealstate.merge_bucket.pending "name": "vds.idealstate.merge_bucket.pending", "description": "The number of operations pending", "values": { "average": 0, "sum": 0, "count": 1, "rate": 0.016666, "min": 0, "max": 0, "last": 0 }, ``` ##### /cluster/v2 API examples Examples of state manipulation using the [/cluster/v2 API](../reference/cluster-v2.html). 
List content clusters: ``` $ curl http://localhost:19050/cluster/v2/ ``` ``` ``` { "cluster": { "music": { "link": "/cluster/v2/music" }, "books": { "link": "/cluster/v2/books" } } } ``` ``` Get cluster state and list service types within cluster: ``` $ curl http://localhost:19050/cluster/v2/music ``` ``` ``` { "state": { "generated": { "state": "state-generated", "reason": "description" } } "service": { "distributor": { "link": "/cluster/v2/music/distributor" }, "storage": { "link": "/cluster/v2/music/storage" } } } ``` ``` List nodes per service type for cluster: ``` $ curl http://localhost:19050/cluster/v2/music/storage ``` ``` ``` { "node": { "0": { "link": "/cluster/v2/music/storage/0" }, "1": { "link": "/cluster/v2/music/storage/1" } } } ``` ``` Get node state: ``` $ curl http://localhost:19050/cluster/v2/music/storage/0 ``` ``` ``` { "attributes": { "hierarchical-group": "group0" }, "state": { "generated": { "state": "up", "reason": "" }, "unit": { "state": "up", "reason": "" }, "user": { "state": "up", "reason": "" } }, "metrics": { "bucket-count": 0, "unique-document-count": 0, "unique-document-total-size": 0 } } ``` ``` Get all nodes, including topology information (see `hierarchical-group`): ``` $ curl http://localhost:19050/cluster/v2/music/?recursive=true ``` ``` ``` { "state": { "generated": { "state": "up", "reason": "" } }, "service": { "storage": { "node": { "0": { "attributes": { "hierarchical-group": "group0" }, "state": { "generated": { "state": "up", "reason": "" }, "unit": { "state": "up", "reason": "" }, "user": { "state": "up", "reason": "" } }, "metrics": { "bucket-count": 0, "unique-document-count": 0, "unique-document-total-size": 0 } ``` ``` Set node user state: ``` curl -X PUT -H "Content-Type: application/json" --data ' { "state": { "user": { "state": "retired", "reason": "This node will be removed soon" } } }' \ http://localhost:19050/cluster/v2/music/storage/0 ``` ``` ``` { "wasModified": true, "reason": "ok" } ``` ``` ##### Further reading - Refer to [administrative procedures](../operations-selfhosted/admin-procedures.html) for configuration and state monitoring / management. - Try the [Multinode testing and observability](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode) sample app to get familiar with interfaces and behavior. Copyright © 2025 - [Cookie Preferences](#) ###### On this page: - [Cluster state](#cluster-state) - [Cluster controller](#cluster-controller) - [Master election](#master-election) - [Distributor](#distributor) - [Timestamps](#timestamps) - [Ordering](#ordering) - [Maintenance operations](#maintenance-operations) - [Restart](#distributor-restart) - [Content node](#content-node) - [Restart](#content-node-restart) - [Metrics](#metrics) - [/cluster/v2 API examples](#cluster-v2-API-examples) - [Further reading](#further-reading) --- ## Contributing ### Contributing to Vespa Contributions to [Vespa](http://github.com/vespa-engine/vespa)and the [Vespa documentation](http://github.com/vespa-engine/documentation)are welcome. #### Contributing to Vespa Contributions to [Vespa](http://github.com/vespa-engine/vespa)and the [Vespa documentation](http://github.com/vespa-engine/documentation)are welcome. This document tells you what you need to know to contribute. ##### Open development All work on Vespa happens directly on GitHub, using the [GitHub flow model](https://docs.github.com/en/get-started/quickstart/github-flow). 
We release the master branch a few times a week, and you should expect it to almost always work. In addition to the [builds seen on factory.vespa.ai](https://factory.vespa.ai), we have a large acceptance and performance test suite which is also run continuously.

###### Pull requests

All pull requests are reviewed by a member of the Vespa Committers team. You can find a suitable reviewer in the OWNERS file upward in the source tree from where you are making the change (the OWNERS have a special responsibility for ensuring the long-term integrity of a portion of the code). If you want to become a committer/OWNER, making some quality contributions is the way to start. We require all pull request checks to pass.

##### Versioning

Vespa uses semantic versioning - see [vespa versions](https://vespa.ai/releases#versions). Notice in particular that any Java API in a package having a @PublicAPI annotation in the package-info file cannot be changed in an incompatible way between major versions: Existing types and method signatures must be preserved (but can be marked deprecated).

##### Issues

We track issues in [GitHub issues](https://github.com/vespa-engine/vespa/issues). It is also fine to submit issues for feature requests and ideas, whether you intend to work on them or not. There is also a [ToDo list](https://github.com/vespa-engine/vespa/blob/master/TODO.md) for larger things which no one is working on yet.

##### Community

If you have questions, want to share your experience or help others, please join our community on the [Vespa Slack](http://slack.vespa.ai), or see Vespa on [Stack Overflow](http://stackoverflow.com/questions/tagged/vespa).

Copyright © 2025 - [Cookie Preferences](#)

###### On this page: - [Open development](#open-development) - [Pull requests](#pull-requests) - [Versioning](#versioning) - [Issues](#issues) - [Community](#community)

---

## Cpu Support

### CPU Support

For maximum performance, the current version of Vespa for x86\_64 is compiled only for [Haswell (2013)](https://en.wikipedia.org/wiki/Haswell_(microarchitecture)) or later CPUs.

#### CPU Support

For maximum performance, the current version of Vespa for x86\_64 is compiled only for [Haswell (2013)](https://en.wikipedia.org/wiki/Haswell_(microarchitecture)) or later CPUs. If trying to run on an older CPU, you will likely see error messages like the following:

```
Problem running program /opt/vespa/bin/vespa-runserver => died with signal: illegal instruction (you probably have an older CPU than required)
```

or, in older versions of Vespa, something like:

```
/usr/local/bin/start-container.sh: line 67: 10 Illegal instruction /opt/vespa/bin/vespa-start-configserver
```

If you would like to run Vespa on an older CPU, we provide a [generic x86 container image](https://hub.docker.com/r/vespaengine/vespa-generic-intel-x86_64/). This image is slower, receives less testing than the regular image, and is less frequently updated.
**To start a Vespa Docker container using this image:**

```
$ docker run --detach --name vespa --hostname vespa-container \
  --publish 8080:8080 --publish 19071:19071 \
  vespaengine/vespa-generic-intel-x86_64
```

Copyright © 2025 - [Cookie Preferences](#)

---

## Cross Encoders

### Ranking With Transformer Cross-Encoder Models

[Cross-Encoder Transformer](https://blog.vespa.ai/pretrained-transformer-language-models-for-search-part-4/) based text ranking models are generally more effective than [text embedding](embedding.html) models as they take both the query and the document as input with full cross-attention between all the query and document tokens.

#### Ranking With Transformer Cross-Encoder Models

[Cross-Encoder Transformer](https://blog.vespa.ai/pretrained-transformer-language-models-for-search-part-4/) based text ranking models are generally more effective than [text embedding](embedding.html) models as they take both the query and the document as input with full cross-attention between all the query and document tokens. The downside of cross-encoder models is the computational complexity. This document is a guide on how to export cross-encoder Transformer based models from [huggingface](https://huggingface.co/), and how to configure them for use in Vespa.

##### Exporting cross-encoder models

For exporting models from HF to [ONNX](onnx.html), we recommend the [Optimum](https://huggingface.co/docs/optimum/main/en/index) library. Example usage for two relevant ranking models follows.

Export [intfloat/simlm-msmarco-reranker](https://huggingface.co/intfloat/simlm-msmarco-reranker), which is a BERT-based transformer model for English texts:

```
$ optimum-cli export onnx --task text-classification -m intfloat/simlm-msmarco-reranker ranker
```

Export [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base), which is a ROBERTA-based transformer model for English and Chinese texts (multilingual):

```
$ optimum-cli export onnx --task text-classification -m BAAI/bge-reranker-base ranker
```

These two example ranking models use different language model [tokenization](reference/embedding-reference.html#huggingface-tokenizer-embedder) and also different transformer inputs. After the above Optimum export command you have two important files that are needed for importing the model to Vespa:

```
├── ranker
│   └── model.onnx
└── tokenizer.json
```

The Optimum tool also supports various Transformer optimizations, including quantization to optimize the model for faster inference.

##### Importing ONNX and tokenizer model files to Vespa

Add the generated `model.onnx` and `tokenizer.json` files from the `ranker` directory created by Optimum to the Vespa [application package](applications.html):

```
├── models
│   ├── model.onnx
│   └── tokenizer.json
├── schemas
│   └── doc.sd
└── services.xml
```

##### Configure tokenizer embedder

To speed up inference, Vespa avoids re-tokenizing the document tokens, so we need to configure the [huggingface-tokenizer-embedder](reference/embedding-reference.html#huggingface-tokenizer-embedder) in the `services.xml` file:

```
..
..
```

This allows us to use the tokenizer while indexing documents in Vespa and also at query time to map (embed) query text to language model tokens.
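As a hedged sketch, the component can be declared in the container cluster like this (the `hugging-face-tokenizer` type and `model` element follow the huggingface-tokenizer-embedder reference linked above; the `tokenizer` id matches the `embed tokenizer` expressions used below, and the path assumes the application package layout shown earlier):

```
<container version="1.0" id="default">
    <!-- Exposes models/tokenizer.json as an embedder with id "tokenizer" -->
    <component id="tokenizer" type="hugging-face-tokenizer">
        <model path="models/tokenizer.json"/>
    </component>
    <document-api/>
    <search/>
</container>
```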
##### Using tokenizer in schema Assuming we have two fields that we want to index and use for re-ranking (title, body), we can use the `embed` indexing expression to invoke the tokenizer configured above: ``` schema my_document { document my_document { field title type string {..} field body type string {..} } field tokens type tensor(d0[512]) { indexing: (input title || "") . " " . (input body || "") | embed tokenizer | attribute } } ``` The above concatenates the title and body input document fields and feeds the result to the `hugging-face-tokenizer` embedder, which saves the output token ids as floats (e.g. 101.0). To use the generated `tokens` tensor in ranking, the tensor field must be defined with [attribute](attributes.html). ##### Using the cross-encoder model in ranking Cross-encoder models are not practical for _retrieval_ over large document volumes due to their complexity, so we configure them using [phased ranking](phased-ranking.html). ###### Bert-based model Bert-based models have three inputs: - input\_ids - token\_type\_ids - attention\_mask The [onnx-model](reference/schema-reference.html#onnx-model) configuration specifies the input names of the model and how to calculate them. It also specifies the file `models/model.onnx`. Notice also the `gpu-device` setting, see [GPU](/en/operations-selfhosted/vespa-gpu-container.html). GPU inference is not required, and Vespa will fall back to CPU if no GPU device is found. See the section on [performance](#performance). ``` rank-profile bert-ranker inherits default { inputs { query(q_tokens) tensor(d0[32]) } onnx-model cross_encoder { file: models/model.onnx input input_ids: my_input_ids input attention_mask: my_attention_mask input token_type_ids: my_token_type_ids gpu-device: 0 } function my_input_ids() { expression: tokenInputIds(256, query(q_tokens), attribute(tokens)) } function my_token_type_ids() { expression: tokenTypeIds(256, query(q_tokens), attribute(tokens)) } function my_attention_mask() { expression: tokenAttentionMask(256, query(q_tokens), attribute(tokens)) } first-phase { expression: #depends on the retriever used } # The output of this model is a tensor of size ["batch", 1] global-phase { rerank-count: 25 expression: onnx(cross_encoder){d0:0,d1:0} } } ``` The example above limits the sequence length to `256` using the built-in [convenience functions](reference/rank-features.html#tokenInputIds(length,%20input_1,%20input_2,%20...)) for generating token sequence input to Transformer models. Note that `tokenInputIds` uses 101 as start of sequence and 102 as padding. This is only compatible with BERT-based tokenizers. See the section on [performance](#performance) about sequence length and its impact on inference performance. ###### Roberta-based model ROBERTA-based models only have two inputs (input\_ids and attention\_mask). In addition, the default tokenizer start of sequence token is 1 and end of sequence is 2. In this case we use the `customTokenInputIds` function in the `my_input_ids` function. See [customTokenInputIds](reference/rank-features.html#customTokenInputIds(start_sequence_id, sep_sequence_id, length, input_1, input_2, ...)).
``` rank-profile roberta-ranker inherits default { inputs { query(q_tokens) tensor(d0[32]) } onnx-model cross_encoder { file: models/model.onnx input input_ids: my_input_ids input attention_mask: my_attention_mask gpu-device: 0 } function my_input_ids() { expression: customTokenInputIds(1, 2, 256, query(q_tokens), attribute(tokens)) } function my_attention_mask() { expression: tokenAttentionMask(256, query(q_tokens), attribute(tokens)) } first-phase { expression: #depends on the retriever used } # The output of this model is a tensor of size ["batch", 1] global-phase { rerank-count: 25 expression: onnx(cross_encoder){d0:0,d1:0} } } ``` ##### Using the cross-encoder model at query time At query time, we need to tokenize the user query using the [embed](embedding.html#embedding-a-query-text) support. The `embed` of the query text sets the `query(q_tokens)` tensor that we defined in the ranking profile. ``` { "yql": "select title,body from doc where userQuery()", "query": "semantic search", "input.query(q_tokens)": "embed(tokenizer, \"semantic search\")", "ranking": "bert-ranker" } ``` The retriever (query + first-phase ranking) can be anything, including [nearest neighbor search](nearest-neighbor-search.html) a.k.a. dense retrieval using bi-encoders. ##### Performance There are three major scaling dimensions: - The number of hits that are re-ranked, see [rerank-count](reference/schema-reference.html#globalphase-rerank-count). Complexity is linear with the number of hits that are re-ranked. - The size of the transformer model used. - The input sequence length. Transformer models scale quadratically with the input sequence length. For models larger than 30-40M parameters, we recommend using GPU to accelerate inference. Quantization of model weights can drastically improve serving efficiency on CPU. See [Optimum Quantization](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/quantization). ##### Examples The [MS Marco](https://github.com/vespa-engine/sample-apps/tree/master/msmarco-ranking) sample application demonstrates using cross-encoders. ##### Using cross-encoders with multi-vector indexing When using [multi-vector indexing](https://blog.vespa.ai/semantic-search-with-multi-vector-indexing/), we can feed the tokens of the best (closest) paragraph, found using the [closest()](reference/rank-features.html#closest(name)) feature, into re-ranking with the cross-encoder model. ``` schema my_document { document my_document { field paragraphs type array<string> {..} } field tokens type tensor(p{}, d0[512]) { indexing: input paragraphs | embed tokenizer | attribute } field embedding type tensor(p{}, x[768]) { indexing: input paragraphs | embed embedder | attribute } } ``` Notice that both fields use the same mapped tensor dimension name `p`.
``` rank-profile max-paragraph-into-cross-encoder inherits default { inputs { query(tokens) tensor(d0[32]) query(q) tensor(x[768]) } first-phase { expression: closeness(field, embedding) } function best_input() { expression: reduce(closest(embedding)*attribute(tokens), max, p) } function my_input_ids() { expression: tokenInputIds(256, query(tokens), best_input) } function my_token_type_ids() { expression: tokenTypeIds(256, query(tokens), best_input) } function my_attention_mask() { expression: tokenAttentionMask(256, query(tokens), best_input) } match-features: best_input my_input_ids my_token_type_ids my_attention_mask global-phase { rerank-count: 25 expression: onnx(cross_encoder){d0:0,d1:0} #Slice } } ``` The `best_input` uses a tensor join between the `closest(embedding)` tensor and the `tokens` tensor, which then returns the tokens of the best-matching (closest) paragraph. This tensor is used in the other Transformer-related functions (`tokenTypeIds tokenAttentionMask tokenInputIds`) as the document tokens. Copyright © 2025 - [Cookie Preferences](#) ###### On this page: - [Exporting cross-encoder models](#exporting-cross-encoder-models) - [Importing ONNX and tokenizer model files to Vespa](#importing-onnx-and-tokenizer-model-files-to-vespa) - [Configure tokenizer embedder](#configure-tokenizer-embedder) - [Using tokenizer in schema](#using-tokenizer-in-schema) - [Using the cross-encoder model in ranking](#using-the-cross-encoder-model-in-ranking) - [Bert-based model](#bert-based-model) - [Roberta-based model](#roberta-based-model) - [Using the cross-encoder model at query time](#using-the-cross-encoder-model-at-query-time) - [Performance](#performance) - [Examples](#examples) - [Using cross-encoders with multi-vector indexing](#using-cross-encoders-with-multi-vector-indexing) --- ## Data Management And Backup ### Data management and backup This guide documents how to export data from a Vespa cloud application and how to do mass updates or removals. #### Data management and backup This guide documents how to export data from a Vespa cloud application and how to do mass updates or removals. See [cloning applications and data](https://cloud.vespa.ai/en/cloning-applications-and-data) for how to copy documents from one application to another. Prerequisite: Use the latest version of the [vespa](/en/vespa-cli.html) command-line client. ##### Export documents To export documents, configure the application to export from, then select zone, container cluster and schema - example: ``` $ vespa config set application vespa-team.vespacloud-docsearch.default $ vespa visit --zone prod.aws-us-east-1c --cluster default --selection doc | head ``` Some of the parameters above are redundant if unambiguous. Here, the application is set up using a template found in [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA) with multiple container clusters. This example [visit](/en/content/visiting.html) documents from the `doc` schema. Use a [fieldset](/en/documents.html#fieldsets) to export document IDs only: ``` $ vespa visit --zone prod.aws-us-east-1c --cluster default --selection doc --field-set '[id]' | head ``` As the name implies, fieldsets are useful to select a subset of fields to export. Note that this normally does not speed up the exporting process, as the same amount of data is read from the index. The data transfer out of the Vespa application is smaller with fewer fields. 
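To keep a full export around, the visit output can be piped straight to a compressed JSON Lines file; a minimal sketch reusing the options shown above (the output file name is arbitrary):

```
$ vespa visit --zone prod.aws-us-east-1c --cluster default --selection doc \
    | gzip > doc-export.jsonl.gz
```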
##### Backup Use the _visit_ operations above to extract documents for backup. To back up documents to your own Google Cloud Storage, see [backup](https://github.com/vespa-engine/sample-apps/tree/master/examples/google-cloud/cloud-functions#backup---experimental) for a Google Cloud Function example. ##### Feed If a document feed is generated with `vespa visit` (above), it is already in [JSON Lines](https://jsonlines.org/) feed-ready format by default: ``` $ vespa visit | vespa feed - -t $ENDPOINT ``` Find more examples in [cloning applications and data](https://cloud.vespa.ai/en/cloning-applications-and-data). A document export generated using [/document/v1](/en/document-v1-api-guide.html) is slightly different from the .jsonl output from `vespa visit` (e.g., fields like a continuation token are added). Extract the `document` objects with [jq](https://stedolan.github.io/jq/) before feeding: ``` $ gunzip -c docs.gz | jq '.documents[]' | \ vespa feed - -t $ENDPOINT ``` ##### Delete To remove all documents in a Vespa deployment—or a selection of them—run a _deletion visit_. Use the `DELETE` HTTP method, and fetch only the continuation token from the response: ``` #!/bin/bash set -x #### The ENDPOINT must be a regional endpoint, do not use '*.g.vespa-app.cloud/' ENDPOINT="https://vespacloud-docsearch.vespa-team.aws-us-east-1c.z.vespa-app.cloud" NAMESPACE=open DOCTYPE=doc CLUSTER=documentation #### doc.path =~ "^/old/" -- all documents under the /old/ directory: SELECTION='doc.path%3D~%22%5E%2Fold%2F%22' continuation="" while token=$( curl -X DELETE -s \ --cert data-plane-public-cert.pem \ --key data-plane-private-key.pem \ "${ENDPOINT}/document/v1/${NAMESPACE}/${DOCTYPE}/docid?selection=${SELECTION}&cluster=${CLUSTER}&${continuation}" \ | tee >( jq . > /dev/tty ) | jq -re .continuation ) do continuation="continuation=${token}" done ``` Each request will return a response after roughly one minute—change this by specifying _timeChunk_ (default 60). To purge all documents in a document export (above), generate a feed with `remove`-entries for each document ID, like: ``` $ gunzip -c docs.gz | jq '[.documents[] | {remove: .id} ]' | head [ { "remove": "id:open:doc::open/documentation/schemas.html" }, { "remove": "id:open:doc::open/documentation/securing-your-vespa-installation.html" }, ``` Complete example for a single chunk: ``` $ gunzip -c docs.gz | jq '[.documents[] | {remove: .id} ]' | \ vespa feed - -t $ENDPOINT ``` ##### Update To update all documents in a Vespa deployment—or a selection of them—run an _update visit_. Use the `PUT` HTTP method, and specify a partial update in the request body: ``` #!/bin/bash set -x #### The ENDPOINT must be a regional endpoint, do not use '*.g.vespa-app.cloud/' ENDPOINT="https://vespacloud-docsearch.vespa-team.aws-us-east-1c.z.vespa-app.cloud" NAMESPACE=open DOCTYPE=doc CLUSTER=documentation #### doc.inlinks == "some-url" -- the weightedset inlinks has the key "some-url" SELECTION='doc.inlinks%3D%3D%22some-url%22' continuation="" while token=$( curl -X PUT -s \ --cert data-plane-public-cert.pem \ --key data-plane-private-key.pem \ --data '{ "fields": { "inlinks": { "remove": { "some-url": 0 } } } }' \ "${ENDPOINT}/document/v1/${NAMESPACE}/${DOCTYPE}/docid?selection=${SELECTION}&cluster=${CLUSTER}&${continuation}" \ | tee >( jq . > /dev/tty ) | jq -re .continuation ) do continuation="continuation=${token}" done ``` Each request will return a response after roughly one minute—change this by specifying _timeChunk_ (default 60).
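For example, to have each chunk return after roughly ten seconds instead of one minute, add the parameter to the request URL used in the scripts above; a sketch assuming the same shell variables:

```
"${ENDPOINT}/document/v1/${NAMESPACE}/${DOCTYPE}/docid?selection=${SELECTION}&cluster=${CLUSTER}&timeChunk=10&${continuation}"
```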
##### Using /document/v1/ api To get started with a document export, find the _namespace_ and _document type_ by listing a few IDs. Hit the [/document/v1/](/en/reference/document-v1-api-reference.html) ENDPOINT. Restrict to one CLUSTER, see [content clusters](/en/reference/services-content.html): ``` $ curl \ --cert data-plane-public-cert.pem \ --key data-plane-private-key.pem \ "$ENDPOINT/document/v1/?cluster=$CLUSTER" ``` For ID export only, use a [fieldset](/en/documents.html#fieldsets): ``` $ curl \ --cert data-plane-public-cert.pem \ --key data-plane-private-key.pem \ "$ENDPOINT/document/v1/?cluster=$CLUSTER&fieldSet=%5Bid%5D" ``` From an ID, like _id:open:doc::open/documentation/schemas.html_, extract - NAMESPACE: open - DOCTYPE: doc Example script: ``` #!/bin/bash set -x #### The ENDPOINT must be a regional endpoint, do not use '*.g.vespa-app.cloud/' ENDPOINT="https://vespacloud-docsearch.vespa-team.aws-us-east-1c.z.vespa-app.cloud" NAMESPACE=open DOCTYPE=doc CLUSTER=documentation continuation="" idx=0 while ((idx+=1)) echo "$continuation" printf -v out "%05g" $idx filename=${NAMESPACE}-${DOCTYPE}-${out}.data.gz echo "Fetching data..." token=$( curl -s \ --cert data-plane-public-cert.pem \ --key data-plane-private-key.pem \ "${ENDPOINT}/document/v1/${NAMESPACE}/${DOCTYPE}/docid?wantedDocumentCount=1000&concurrency=4&cluster=${CLUSTER}&${continuation}" \ | tee >( gzip > ${filename} ) | jq -re .continuation ) do continuation="continuation=${token}" done ``` If only a few documents are returned per response, _wantedDocumentCount_ (default 1, max 1024) can be specified for a lower bound on the number of documents per response, if that many documents still remain. Specifying _concurrency_ (default 1, max 100) increases throughput, at the cost of resource usage. This also increases the number of documents per response, and _could_ lead to excessive memory usage in the HTTP container when many large documents are buffered to be returned in the same response. Copyright © 2025 - [Cookie Preferences](#) ###### On this page: - [Export documents](#export-documents) - [Backup](#backup) - [Feed](#feed) - [Delete](#delete) - [Update](#update) - [Using /document/v1/ api](#using-document-v1-api) --- ## Default Result Format ### Default JSON Result Format The default Vespa query response format is used when [presentation.format](../reference/query-api-reference.html#presentation.format) is unset or set to `json`. #### Default JSON Result Format The default Vespa query response format is used when [presentation.format](../reference/query-api-reference.html#presentation.format) is unset or set to `json`. Results are rendered with one or more objects: - `root`: mandatory object with the tree of returned data - `timing`: optional object with query timing information - `trace`: optional object for metadata about query execution Refer to the [query API guide](../query-api.html#result-examples) for result and tracing examples. All object names are literal strings, the node `root` is the map key "root" in the return JSON object, in other words, only strings are used as map keys. | Element | Parent | Mandatory | Type | Description | | --- | --- | --- | --- | --- | | ##### root | | root | | yes | Map of string to object | The root of the tree of returned data. | | children | root | no | Array of objects | Array of JSON objects with the same structure as `root`. | | fields | root | no | Map of string to object | | | totalCount | fields | no | Integer | Number of documents matching the query. 
Not accurate when using _nearestNeighbor_, _wand_ or _weakAnd_ query operators. The value is the number of hits after [first-phase dropping](schema-reference.html#rank-score-drop-limit). | | coverage | root | no | Map of string to string and number | Map of metadata about how much of the total corpus has been searched to return the given documents. | | coverage | coverage | yes | Integer | Percentage of total corpus searched (when lower than 100 this is an approximation and is a lower bound, as no info from nodes down is known) | | documents | coverage | yes | Long | The number of active documents searched. | | full | coverage | yes | Boolean | Whether the full corpus was searched. | | nodes | coverage | yes | Integer | The number of search nodes returning results. | | results | coverage | yes | Integer | The number of results merged creating the final rendered result. | | resultsFull | coverage | yes | Integer | The number of full result sets merged, e.g. when there are several sources/clusters for the results. | | degraded | coverage | no | Map of string to object | Map of match-phase degradation elements. | | match-phase | degraded | no | Boolean | Indicator whether [match-phase degradation](schema-reference.html#match-phase) has occurred. | | timeout | degraded | no | Boolean | Indicator whether the query [timed out](query-api-reference.html#timeout) before completion. | | adaptive-timeout | degraded | no | Boolean | Indicator whether the query timed out with [adaptive timeout](query-api-reference.html#ranking.softtimeout.enable) before completion. | | non-ideal-state | degraded | no | Boolean | Indicator whether the content cluster is in [ideal state](../content/idealstate.html). | | errors | root | no | Array of objects | Array of error messages with the fields given below. [Example](../query-api.html#error-result). | | code | errors | yes | Integer | Numeric identifier used by the container application. See [error codes](https://github.com/vespa-engine/vespa/blob/master/container-core/src/main/java/com/yahoo/container/protect/Error.java) and [ErrorMessage.java](https://github.com/vespa-engine/vespa/blob/master/container-search/src/main/java/com/yahoo/search/result/ErrorMessage.java) for a short description. | | message | errors | no | String | Full error message. | | source | errors | no | String | Which [data provider](../federation.html) logged the error condition. | | stackTrace | errors | no | String | Stack trace if an exception was involved. | | summary | errors | yes | String | Short description of error. | | transient | errors | no | Boolean | Whether the system is expected to recover from the faulty state on its own. If the flag is not present, this may or may not be the case, or the flag is not applicable. | | fields | root | no | Map of string to object | The named document (schema) [fields](schema-reference.html#field). Fields without value are not rendered. In addition to the fields defined in the schema, the following might be returned: | Fieldname | Description | | --- | --- | | sddocname | Schema name. Returned in the [default document summary](../document-summaries.html). | | documentid | Document ID. Returned in the [default document summary](../document-summaries.html). | | summaryfeatures | Refer to [summary-features](schema-reference.html#summary-features) and [observing values used in ranking](../getting-started-ranking.html#observing-values-used-in-ranking). 
| | matchfeatures | Refer to [match-features](schema-reference.html#match-features) and [example use](../nearest-neighbor-search-guide.html#strict-filters-and-distant-neighbors). | | | id | root | no | String | String identifying the hit, document or other data type. For document hits, this is the full string document id if the hit is filled with a document summary from disk. If it is not filled or only filled with data from memory (attributes), it is an internally generated unique id on the form `index:[source]/[node-index]/[hex-gid]`. Also see the [/document/v1/ guide](../document-v1-api-guide.html#troubleshooting) and [receiving-responses-of-different-formats-for-the-same-query-in-vespa](https://stackoverflow.com/questions/74033383/receiving-responses-of-different-formats-for-the-same-query-in-vespa). | | label | root | no | String | The label of a grouping list. | | limits | root | no | Object | Used in grouping, the limits of a bucket in histogram style data. | | from | limits | no | String | Lower bound of a bucket group. | | to | limits | no | String | Upper bound of a bucket group. | | relevance | root | yes | Double | Double value representing the rank score. | | source | root | no | String | Which data provider created this node. | | types | root | no | Array of string | Metadata about what kind of document or other kind of node in the result set this object is. | | value | root | no | String | Used in grouping for value groups, the argument for the grouping data which is in the fields. | | | | ##### timing | | timing | | no | Map of string to object | Query timing information, enabled by [presentation.timing](query-api-reference.html#presentation.timing). The [query performance guide](/en/performance/practical-search-performance-guide.html#basic-text-search-query-performance) is a useful resource to understand the values in its child elements. | | querytime | timing | no | Double | Time to execute the first protocol phase/matching phase, in seconds. | | summaryfetchtime | timing | no | Double | [Document summary](../document-summaries.html) fetch time, in seconds. This is the time to execute the summary fill protocol phase for the globally ordered top-k hits. | | searchtime | timing | no | Double | Approximately the sum of `querytime` and `summaryfetchtime` and is close to what a client will observe (except network latency). In seconds. | | | | ##### trace **Note:** The tracing elements below is a subset of all elements. Refer to the [search performance guide](../performance/practical-search-performance-guide.html#advanced-query-tracing) for examples. | | trace | | no | Map of string to object | Metadata about query execution. | | children | trace | no | Array of object | Array of maps with exactly the same structure as `trace` itself. | | timestamp | children | no | Long | Number of milliseconds since the start of query execution this node was added to the trace. | | message | children | no | String | Descriptive trace text regarding this step of query execution. | | message | children | no | Array of objects | Array of messages | | start\_time | message | no | String | Timestamp, e.g. 2022-07-27 09:51:21.938 UTC | | traces | message or threads | no | Array of traces or objects | | | distribution-key | message | no | Integer | The [distribution key](services-content.html#node) of the content node creating this span. | | duration\_ms | message | no | float | duration of span | | timestamp\_ms | traces | no | float | time since start of parent, see `start_time`. 
| | event | traces | no | String | Description of span | | tag | traces | no | String | Name of span | | threads | traces | no | Array of objects | Array of object that again has traces elements. | ##### JSON Schema Formal schema for the query API default result format: ``` ``` { "$schema": "http://json-schema.org/draft-04/schema#", "title": "Result", "description": "Schema for Vespa results", "type": "object", "properties": { "root": { "type": "document_node", "required": true }, "trace": { "type": "trace_node", "required": false } }, "definitions": { "document_node": { "properties": { "children": { "type": "array", "items": { "type": "document_node" }, "required": false }, "coverage": { "type": "coverage", "required": false }, "errors": { "type": "array", "items": { "type": "error" }, "required": false }, "fields": { "type": "object", "additionalProperties": true, "required": false }, "id": { "type": "string", "required": false }, "relevance": { "type": "number", "required": true }, "types": { "type": "array", "items": { "type": "string" }, "required": false }, "source": { "type": "string", "required": false }, "value": { "type": "string", "required": false }, "limits": { "type": "object", "required": false }, "label": { "type": "string", "required": false } }, "additionalProperties": true, }, "trace_node": { "properties": { "children": { "type": "array", "items": { "type": "trace_node" }, "required": false }, "timestamp": { "type": "number", "required": false }, "message": { "type": "string", "required": false } } }, "fields": { "properties": { "totalCount": { "type": "number", "required": true } } }, "coverage": { "properties": { "coverage": { "type": "number", "required": true }, "documents": { "type": "number", "required": true }, "full": { "type": "boolean", "required": true }, "nodes": { "type": "number", "required": true }, "results": { "type": "number", "required": true }, "resultsFull": { "type": "number", "required": true } } }, "error": { "properties": { "code": { "type": "number", "required": true }, "message": { "type": "string", "required": false }, "source": { "type": "string", "required": false }, "stackTrace": { "type": "string", "required": false }, "summary": { "type": "string", "required": true }, "transient": { "type": "boolean", "required": false } } } } } ``` ``` ##### Appendix: Legacy Vespa 7 JSON rendering There were some inconsistencies between search results and document rendering in Vespa 7, which are fixed in Vespa 8. This appendix describes the old behavior, what the changes are, and how to configure to select a specific rendering. ###### Inconsistent weightedset rendering Fields with various weightedset types has a JSON input representation (for feeding) as a JSON object; for example `{"one":1, "two":2,"three":3}` for the value of a a `weightedset` field. The same format is used when rendering a document (for example when visiting). In search results however, there are intermediate processing steps during which the field value is represented as an array of item/weight pairs, so in a search result the field value would render as `[ {"item":"one", "weight":1}, {"item":"two", "weight":2}, {"item":"three", "weight":3} ]` In Vespa 8, the default JSON renderer for search results outputs the same format as document rendering. If you have code that depends on the old format you can turn off this by setting `renderer.json.jsonWsets=false` in the query (usually via a [query profile](../query-profiles.html)). 
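For a one-off query, the parameter can also be passed directly; a sketch using the Vespa CLI, where the YQL is just a placeholder:

```
$ vespa query 'select * from sources * where true' \
    'renderer.json.jsonWsets=false'
```

The same parameter can be appended to an HTTP query URL as `&renderer.json.jsonWsets=false`.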
###### Inconsistent map rendering Fields with various map types has a JSON input representation (for feeding) as a JSON object; for example `{"1001":1.0, "1002":2.0, "1003":3.0}` for the value of a a `map` field. The same format is used when rendering a document (for example when visiting). In search results however, there are intermediate processing steps and the field value is represented as an array of key/value pairs, so in a search results the field value would (in some cases) render as `[ {"key":1001, "value":1.0}, {"key":1002, "value":2.0}, {"key":1003, "value":3.0} ]` In Vespa 8, the default JSON renderer for search results output the same format as document rendering. For code that depends on the old format one can turn off this by setting `renderer.json.jsonMaps=false` in the query (usually via a [query profile](../query-profiles.html)). ###### Geo position rendering Fields with the type `position` would in Vespa 7 be rendered using the internal fields "x" and "y". These are integers representing microdegrees, aka geographical degrees times 1 million, of longitude (for x) and latitude (for y). Also, any field _foo_ of type `position` would trigger addition of two extra synthetic summary fields _foo.position_ and _foo.distance_ (see below for details). In Vespa 8, positions are rendered with two JSON fields "lat" and "lng", both having a floating-point value. The "lat" field is latitude (going from -90.0 at the South Pole to +90.0 at the North Pole). The "lng" field is longitude (going from -180.0 at the dateline seen as extreme west, via 0.0 at the Greenwich meridian, to +180.0 at the dateline again, now as extreme east). The field names are chosen so the format is the same as used in the Google "places" API. A closely related change is the removal of two synthetic summary fields which would be returned in search results. For example with this in schema: ``` field mainloc type position { indexing: attribute | summary } ``` Vespa 7 would include the _mainloc_ summary field, but also _mainloc.position_ and _mainloc.distance_; the latter only when the query actually had a position to take the distance from. The first of these (_mainloc.position_ in this case) was mainly useful for producing XML output in older Vespa versions, and now contains just the same information as the _mainloc_ summary field. The second (_mainloc.distance_ in this case) would return a distance in internal units, and can be replaced by a summary feature - here `distance(mainloc)` would give the same number, while `distance(mainloc).km` would be the recommended replacement with suitable code changes. ###### Summary-features wrapped in "rankingExpression" In Vespa 7, if a rank profile wanted a function `foobar` returned in summary-features (or match-features), it would be rendered as `rankingExpression(foobar)` in the output. For programmatic use, the `FeatureData` class has extra checking to allow lookup with `getDouble("foobar")` or `getTensor("foobar")`, but now it's present and rendered with just the original name as specified. If applications needs the JSON rendering to look exactly as in Vespa 7, one can specify that in the rank profile. For example, with this in the schema: ``` rank-profile whatever { function lengthScore() { expression: matchCount(title)/fieldLength(title) } summary-features { matchCount(title) lengthScore ... ``` could, in Vespa 7, yield JSON output containing: ``` summaryfeatures: { matchCount(title): 1, rankingExpression(lengthScore): 0.25, ... 
``` in Vespa 8, you instead get the expected: ``` summaryfeatures: { matchCount(title): 1, lengthScore: 0.25, ... ``` But to get the old behavior one can specify: ``` rank-profile whatever { function lengthScore() { expression: matchCount(title)/fieldLength(title) } summary-features { matchCount(title) rankingExpression(lengthScore) ... ``` which gives you the same output as before. Copyright © 2025 - [Cookie Preferences](#) ###### On this page: - [root](#root-header) - [timing](#timing-header) - [trace](#trace-heading) - [JSON Schema](#json-schema) - [Appendix: Legacy Vespa 7 JSON rendering](#appendix-legacy-vespa-7-json-rendering) - [Inconsistent weightedset rendering](#inconsistent-weightedset-rendering) - [Inconsistent map rendering](#inconsistent-map-rendering) - [Geo position rendering](#geo-position-rendering) - [Summary-features wrapped in "rankingExpression"](#summary-features-wrapped-in-rankingexpression) --- ## Default Set Metrics Reference ### Default Metric Set This document provides reference documentation for the Default metric set, including suffixes present per metric. #### Default Metric Set This document provides reference documentation for the Default metric set, including suffixes present per metric. If the suffix column contains "N/A" then the base name of the corresponding metric is used with no suffix. ##### ClusterController Metrics | Name | Unit | Suffixes | Description | | --- | --- | --- | --- | | cluster-controller.down.count | node | last, max | Number of content nodes down | | cluster-controller.maintenance.count | node | last, max | Number of content nodes in maintenance | | cluster-controller.up.count | node | last, max | Number of content nodes up | | cluster-controller.is-master | binary | last, max | 1 if this cluster controller is currently the master, or 0 if not | | cluster-controller.resource\_usage.nodes\_above\_limit | node | last, max | The number of content nodes above resource limit, blocking feed | | cluster-controller.resource\_usage.max\_memory\_utilization | fraction | last, max | Current memory utilisation, for content node with the highest value | | cluster-controller.resource\_usage.max\_disk\_utilization | fraction | last, max | Current disk space utilisation, for content node with the highest value | ##### Container Metrics | Name | Unit | Suffixes | Description | | --- | --- | --- | --- | | http.status.1xx | response | rate | Number of responses with a 1xx status | | http.status.2xx | response | rate | Number of responses with a 2xx status | | http.status.3xx | response | rate | Number of responses with a 3xx status | | http.status.4xx | response | rate | Number of responses with a 4xx status | | http.status.5xx | response | rate | Number of responses with a 5xx status | | jdisc.gc.ms | millisecond | average, max | Time spent in JVM garbage collection | | jdisc.thread\_pool.work\_queue.capacity | thread | max | Capacity of the task queue | | jdisc.thread\_pool.work\_queue.size | thread | count, max, min, sum | Size of the task queue | | jdisc.thread\_pool.size | thread | max | Size of the thread pool | | jdisc.thread\_pool.active\_threads | thread | count, max, min, sum | Number of threads that are active | | jdisc.application.failed\_component\_graphs | item | rate | JDISC Application failed component graphs | | jdisc.singleton.is\_active | item | last, max | JDISC Singleton is active | | jdisc.http.ssl.handshake.failure.missing\_client\_cert | operation | rate | JDISC HTTP SSL Handshake failures due to missing client certificate | | 
jdisc.http.ssl.handshake.failure.incompatible\_protocols | operation | rate | JDISC HTTP SSL Handshake failures due to incompatible protocols | | jdisc.http.ssl.handshake.failure.incompatible\_chifers | operation | rate | JDISC HTTP SSL Handshake failures due to incompatible chifers | | jdisc.http.ssl.handshake.failure.unknown | operation | rate | JDISC HTTP SSL Handshake failures for unknown reason | | mem.heap.free | byte | average | Free heap memory | | athenz-tenant-cert.expiry.seconds | second | last, max, min | Time remaining until Athenz tenant certificate expires | | feed.operations | operation | rate | Number of document feed operations | | feed.latency | millisecond | count, sum | Feed latency | | queries | operation | rate | Query volume | | query\_latency | millisecond | average, count, max, sum | The overall query latency as seen by the container | | failed\_queries | operation | rate | The number of failed queries | | degraded\_queries | operation | rate | The number of degraded queries, e.g. due to some content nodes not responding in time | | hits\_per\_query | hit\_per\_query | average, count, max, sum | The number of hits returned | | docproc.documents | document | sum | Number of processed documents | | totalhits\_per\_query | hit\_per\_query | average, count, max, sum | The total number of documents found to match queries | | serverActiveThreads | thread | average | Deprecated. Use jdisc.thread\_pool.active\_threads instead. | ##### Distributor Metrics | Name | Unit | Suffixes | Description | | --- | --- | --- | --- | | vds.distributor.docsstored | document | average | Number of documents stored in all buckets controlled by this distributor | | vds.bouncer.clock\_skew\_aborts | operation | count | Number of client operations that were aborted due to clock skew between sender and receiver exceeding acceptable range | ##### NodeAdmin Metrics | Name | Unit | Suffixes | Description | | --- | --- | --- | --- | | endpoint.certificate.expiry.seconds | second | N/A | Time until node endpoint certificate expires | | node-certificate.expiry.seconds | second | N/A | Time until node certificate expires | ##### SearchNode Metrics | Name | Unit | Suffixes | Description | | --- | --- | --- | --- | | content.proton.documentdb.documents.total | document | last, max | The total number of documents in this documents db (ready + not-ready) | | content.proton.documentdb.documents.ready | document | last, max | The number of ready documents in this document db | | content.proton.documentdb.documents.active | document | last, max | The number of active / searchable documents in this document db | | content.proton.documentdb.disk\_usage | byte | last | The total disk usage (in bytes) for this document db | | content.proton.documentdb.memory\_usage.allocated\_bytes | byte | last | The number of allocated bytes | | content.proton.search\_protocol.query.latency | second | average, count, max, sum | Query request latency (seconds) | | content.proton.search\_protocol.docsum.latency | second | average, count, max, sum | Docsum request latency (seconds) | | content.proton.search\_protocol.docsum.requested\_documents | document | rate | Total requested document summaries | | content.proton.resource\_usage.disk | fraction | average | The relative amount of disk used by this content node (transient usage not included, value in the range [0, 1]). 
Same value as reported to the cluster controller | | content.proton.resource\_usage.memory | fraction | average | The relative amount of memory used by this content node (transient usage not included, value in the range [0, 1]). Same value as reported to the cluster controller | | content.proton.resource\_usage.feeding\_blocked | binary | last, max | Whether feeding is blocked due to resource limits being reached (value is either 0 or 1) | | content.proton.transactionlog.disk\_usage | byte | last | The disk usage (in bytes) of the transaction log | | content.proton.documentdb.matching.docs\_matched | document | rate | Number of documents matched | | content.proton.documentdb.matching.docs\_reranked | document | rate | Number of documents re-ranked (second phase) | | content.proton.documentdb.matching.rank\_profile.query\_latency | second | average, count, max, sum | Total average latency (sec) when matching and ranking a query | | content.proton.documentdb.matching.rank\_profile.query\_setup\_time | second | average, count, max, sum | Average time (sec) spent setting up and tearing down queries | | content.proton.documentdb.matching.rank\_profile.rerank\_time | second | average, count, max, sum | Average time (sec) spent on 2nd phase ranking | ##### Sentinel Metrics | Name | Unit | Suffixes | Description | | --- | --- | --- | --- | | sentinel.totalRestarts | restart | last, max, sum | Total number of service restarts done by the sentinel since the sentinel was started | ##### Storage Metrics | Name | Unit | Suffixes | Description | | --- | --- | --- | --- | | vds.filestor.allthreads.put.count | operation | rate | Number of requests processed. | | vds.filestor.allthreads.remove.count | operation | rate | Number of requests processed. | | vds.filestor.allthreads.update.count | request | rate | Number of requests processed. | Copyright © 2025 - [Cookie Preferences](#) ###### On this page: - [ClusterController Metrics](#clustercontroller-metrics) - [Container Metrics](#container-metrics) - [Distributor Metrics](#distributor-metrics) - [NodeAdmin Metrics](#nodeadmin-metrics) - [SearchNode Metrics](#searchnode-metrics) - [Sentinel Metrics](#sentinel-metrics) - [Storage Metrics](#storage-metrics) --- ## Deleting Applications ### Deleting Applications **Warning:** Following these steps will remove production instances or regions and all data within them. #### Deleting Applications **Warning:** Following these steps will remove production instances or regions and all data within them. Data will be unrecoverable. ##### Deleting an application To delete an application, use the console: - navigate to the _application_ view at http://console.vespa.ai/tenant/tenant-name/application where you can find the trash can icon to the far right, as an `ACTION`. - navigate to the _deploy_ view at_http://console.vespa.ai/tenant/tenant-name/application/app-name/prod/deploy_. ![delete production deployment](/assets/img/console/delete-production-deployment.png) When the application deployments are deleted, delete the application in the [console](http://console.vespa.ai). Remove the CI job that builds and deploys application packages, if any. ##### Deleting an instance / region To remove an instance or a deployment to a region from an application: 1. Remove the `region` from `prod`, or the `instance` from `deployment`in [deployment.xml](https://cloud.vespa.ai/en/reference/deployment#instance): 2. 
Add or modify [validation-overrides.xml](/en/reference/validation-overrides.html), allowing Vespa Cloud to remove production instances: 3. Build and deploy the application package. Copyright © 2025 - [Cookie Preferences](#) --- ## Deploy An Application Java ### Deploy an application having Java components Follow these steps to deploy a Vespa application which includes Java components to the [dev zone](cloud/environments.html#dev) on Vespa Cloud (for free). #### Deploy an application having Java components Follow these steps to deploy a Vespa application which includes Java components to the [dev zone](cloud/environments.html#dev) on Vespa Cloud (for free). Alternative versions of this guide: - [Deploy an application using pyvespa](https://pyvespa.readthedocs.io/en/latest/getting-started-pyvespa-cloud.html) - for Python developers - [Deploy an application without Java components](deploy-an-application.html) - [Deploy an application without Vespa CLI](deploy-an-application-shell.html) - [Deploy an application locally](deploy-an-application-local.html). - [Deploy an application having Java components locally](deploy-an-application-local-java.html). **Prerequisites:** - [Java 17](https://openjdk.org/projects/jdk/17/). - [Apache Maven](https://maven.apache.org/install.html) to build the application. Steps: 1. **Create a [tenant](cloud/tenant-apps-instances.html) on Vespa Cloud:** 2. **Install the [Vespa CLI](/en/vespa-cli.html)** using [Homebrew](https://brew.sh/): 3. **Configure the Vespa client:** 4. **Get Vespa Cloud control plane access:** 5. **Clone a sample [application](applications.html):** 6. **Add a certificate for [data plane access](https://cloud.vespa.ai/en/security/guide#data-plane) to the application:** 7. **Build the application:** 8. **[Deploy](applications.html#deploying-applications) the application:** 9. **[Feed](reads-and-writes.html) [documents](documents.html):** 10. **Run [queries](/en/query-api.html):** Congratulations, you have deployed your first Vespa application! Application instances in the [dev zone](cloud/environments.html#dev)will by default keep running for 14 days after the last deployment. You can control this in the[console](https://console.vespa-cloud.com/). ##### Next steps - Read the [developer guide](https://docs.vespa.ai/en/developer-guide). - [Set up deployment to production](cloud/production-deployment.html). - Go to the [Vespa documentation](/). - Follow the [Vespa Blog](https://blog.vespa.ai/) for product updates and use cases. Copyright © 2025 - [Cookie Preferences](#) --- ## Deploy An Application Local Java ### Deploy an application having Java components locally Follow these steps to deploy a Vespa application having Java components on your own machine. #### Deploy an application having Java components locally Follow these steps to deploy a Vespa application having Java components on your own machine. Alternative versions of this guide: - [Deploy an application using pyvespa](https://pyvespa.readthedocs.io/en/latest/getting-started-pyvespa-cloud.html) - for Python developers - [Deploy an application](deploy-an-application.html) - [Deploy an application having Java components](deploy-an-application-java.html) - [Deploy an application without Vespa CLI](deploy-an-application-shell.html) - [Deploy an application without Java components locally](deploy-an-application-local.html). This is tested with _vespaengine/vespa:8.599.6_ container image. 
**Prerequisites:** - Linux, macOS or Windows 10 Pro on x86\_64 or arm64, with Podman or [Docker](https://docs.docker.com/engine/install/) installed. See [Docker Containers](/en/operations-selfhosted/docker-containers.html) for system limits and other settings. For CPUs older than Haswell (2013), see [CPU Support](/en/cpu-support.html) - Memory: Minimum 4 GB RAM dedicated to Docker/Podman. [Memory recommendations](/en/operations-selfhosted/node-setup.html#memory-settings). - Disk: Avoid `NO_SPACE` - the vespaengine/vespa container image + headroom for data requires disk space. [Read more](/en/operations/feed-block.html). - [Homebrew](https://brew.sh/) to install the [Vespa CLI](/en/vespa-cli.html), or download the Vespa CLI from [Github releases](https://github.com/vespa-engine/vespa/releases). - [Java 17](https://openjdk.org/projects/jdk/17/). - [Apache Maven](https://maven.apache.org/install.html) is used to build the application. Steps: 1. **Validate the environment:** 2. **Install the [Vespa CLI](/en/vespa-cli.html)** using [Homebrew](https://brew.sh/): 3. **Set local target:** 4. **Start a Vespa Docker container:** 5. **Clone a sample [application](applications.html):** 6. **Build it:** 7. **[Deploy](applications.html#deploying-applications) the application:** 8. **[Feed](reads-and-writes.html) [documents](documents.html):** 9. **Run [queries](/en/query-api.html):** Congratulations, you have deployed your first Vespa application! ##### Next steps - Read the [developer guide](https://docs.vespa.ai/en/developer-guide). - [Set up deployment to production](cloud/production-deployment.html). - Go to the [Vespa documentation](/). - Follow the [Vespa Blog](https://blog.vespa.ai/) for product updates and use cases. ``` $ docker rm -f vespa ``` Copyright © 2025 - [Cookie Preferences](#) --- ## Deploy An Application Local ### Deploy an application locally Follow these steps to deploy a Vespa application on your own machine. #### Deploy an application locally Follow these steps to deploy a Vespa application on your own machine. Alternative versions of this guide: - [Deploy an application using pyvespa](https://pyvespa.readthedocs.io/en/latest/getting-started-pyvespa-cloud.html) - for Python developers - [Deploy an application](deploy-an-application.html) - [Deploy an application having Java components](deploy-an-application-java.html) - [Deploy an application without Vespa CLI](deploy-an-application-shell.html) - [Deploy an application having Java components locally](deploy-an-application-local-java.html). This is tested with _vespaengine/vespa:8.599.6_ container image. **Prerequisites:** - Linux, macOS or Windows 10 Pro on x86\_64 or arm64, with Podman or [Docker](https://docs.docker.com/engine/install/) installed. See [Docker Containers](/en/operations-selfhosted/docker-containers.html) for system limits and other settings. For CPUs older than Haswell (2013), see [CPU Support](/en/cpu-support.html) - Memory: Minimum 4 GB RAM dedicated to Docker/Podman. [Memory recommendations](/en/operations-selfhosted/node-setup.html#memory-settings). - Disk: Avoid `NO_SPACE` - the vespaengine/vespa container image + headroom for data requires disk space. [Read more](/en/operations/feed-block.html). - [Homebrew](https://brew.sh/) to install the [Vespa CLI](/en/vespa-cli.html), or download the Vespa CLI from [Github releases](https://github.com/vespa-engine/vespa/releases). Steps: 1. **Validate the environment:** 2. **Install the [Vespa CLI](/en/vespa-cli.html)** using [Homebrew](https://brew.sh/): 3. 
**Set local target:** 4. **Start a Vespa Docker container:** 5. **Clone a sample [application](applications.html):** 6. **[Deploy](applications.html#deploying-applications) the application:** 7. **[Feed](reads-and-writes.html) [documents](documents.html):** 8. **Run [queries](/en/query-api.html):** 9. **Get documents:** Congratulations, you have deployed your first Vespa application! ##### Next steps - Read the [developer guide](https://docs.vespa.ai/en/developer-guide). - [Set up deployment to production](cloud/production-deployment.html). - Go to the [Vespa documentation](/). - Follow the [Vespa Blog](https://blog.vespa.ai/) for product updates and use cases. ``` $ docker rm -f vespa ``` Copyright © 2025 - [Cookie Preferences](#) --- ## Deploy An Application Shell ### Deploy an application without Vespa CLI This lets you deploy an application to the [dev zone](cloud/environments.html#dev)on Vespa Cloud (for free). #### Deploy an application without Vespa CLI This lets you deploy an application to the [dev zone](cloud/environments.html#dev)on Vespa Cloud (for free). Alternative versions of this guide: - [Deploy an application using pyvespa](https://pyvespa.readthedocs.io/en/latest/getting-started-pyvespa-cloud.html) - for Python developers - [Deploy an application](deploy-an-application.html) - [Deploy an application having Java components](deploy-an-application-java.html) - [Deploy an application locally](deploy-an-application-local.html). - [Deploy an application with Java components locally](deploy-an-application-local-java.html). **Prerequisites:** - git - or download the files from [album-recommendation](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation) - zip - or other tool to create a .zip file - curl - or other tool to send HTTP requests with security credentials - OpenSSL Steps: 1. **Create a [tenant](cloud/tenant-apps-instances.html) on Vespa Cloud:** 2. **Clone a sample [application](applications.html):** 3. **Add a certificate for [data plane access](https://cloud.vespa.ai/en/security/guide#data-plane) to the application:** 4. **Create a deployable application package zip:** 5. **Deploy the application:** 6. **Verify the application endpoint:** 7. **[Feed](reads-and-writes.html) [documents](documents.html):** 8. **Run [queries](/en/query-api.html):** Congratulations, you have deployed your first Vespa application! Application instances in the [dev zone](cloud/environments.html#dev)will by default keep running for 14 days after the last deployment. You can control this in the[console](https://console.vespa-cloud.com/). ##### Next steps - Read the [developer guide](https://docs.vespa.ai/en/developer-guide). - [Set up deployment to production](cloud/production-deployment.html). - Go to the [Vespa documentation](/). - Follow the [Vespa Blog](https://blog.vespa.ai/) for product updates and use cases. Copyright © 2025 - [Cookie Preferences](#) --- ## Deploy An Application ### Deploy an application Follow these steps to deploy a Vespa application to the [dev zone](cloud/environments.html#dev)on Vespa Cloud (for free). #### Deploy an application Follow these steps to deploy a Vespa application to the [dev zone](cloud/environments.html#dev)on Vespa Cloud (for free). 
Alternative versions of this guide: - [Deploy an application using pyvespa](https://pyvespa.readthedocs.io/en/latest/getting-started-pyvespa-cloud.html) - for Python developers - [Deploy an application having Java components](deploy-an-application-java.html) - [Deploy an application without Vespa CLI](deploy-an-application-shell.html) - [Deploy an application locally](deploy-an-application-local.html). - [Deploy an application having Java components locally](deploy-an-application-local-java.html). Steps: 1. **Create a [tenant](cloud/tenant-apps-instances.html) on Vespa Cloud:** 2. **Install the [Vespa CLI](/en/vespa-cli.html)** using [Homebrew](https://brew.sh/): 3. **Configure the Vespa client:** 4. **Get Vespa Cloud control plane access:** 5. **Clone a sample [application](applications.html):** 6. **Add a certificate for [data plane access](https://cloud.vespa.ai/en/security/guide#data-plane) to the application:** 7. **[Deploy](applications.html#deploying-applications) the application:** 8. **[Feed](reads-and-writes.html) [documents](documents.html):** 9. **Run [queries](/en/query-api.html):** Congratulations, you have deployed your first Vespa application! Application instances in the [dev zone](cloud/environments.html#dev)will by default keep running for 14 days after the last deployment. You can control this in the[console](https://console.vespa-cloud.com/). ##### Next steps - Read the [developer guide](https://docs.vespa.ai/en/developer-guide). - [Set up deployment to production](cloud/production-deployment.html). - Go to the [Vespa documentation](/). - Follow the [Vespa Blog](https://blog.vespa.ai/) for product updates and use cases. Copyright © 2025 - [Cookie Preferences](#) --- ## Deploy Rest Api V2 ### Deploy API This is the API specification and some examples for the HTTP Deploy API that can be used to deploy an application: #### Deploy API This is the API specification and some examples for the HTTP Deploy API that can be used to deploy an application: - [upload](#create-session) - [prepare](#prepare-session) - [activate](#activate-session) The response format is JSON. Examples are found in the [use-cases](#use-cases). Also see the [deploy guide](../application-packages.html#deploy). **Note:** To build a multi-application system, use one or three config server(s) per application. Best practise is using a [containerized](/en/operations-selfhosted/docker-containers.html) architecture, also see [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA). The current API version is 2. The API port is 19071 - use [vespa-model-inspect](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-model-inspect) service configserver to find config server hosts. Example: `http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session`. Write operations return successfully after a majority of config servers have persisted changes (e.g. 2 out of 3 config servers). Entities: | session-id | The session-id used in this API is generated by the server and is required for all operations after [creating](#create-session) a session. The session-id is valid if it is an active session, or it was created before [session lifetime](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/configserver.def) has expired, the default value being 1 hour. | | path | An application file path in a request URL or parameter refers to a relative path in the application package. A path ending with "/" refers to a directory. 
| Use [Vespa CLI](../vespa-cli.html) to deploy from the command line. **Note:** Use [convergence](../application-packages.html#convergence)to confirm configuration activation on all nodes. ##### POST /application/v2/tenant/default/prepareandactivate Creates a new session with the application package that is included in the request, prepares it and then activates it. See details in the steps later in this document | Parameters | | Name | Default | Description | | --- | --- | --- | | | | | | | Request body | | Required | Content | Note | | --- | --- | --- | | Yes | A compressed [application package](../application-packages.html) (with gzip or zip compression) | Set `Content-Type` HTTP header to `application/x-gzip` or `application/zip`. | | | Response | See [active](#activate-session). | Example: ``` $ (cd src/main/application && zip -r - .) | \ curl --header Content-Type:application/zip --data-binary @- \ localhost:19071/application/v2/tenant/default/prepareandactivate ``` ``` ``` { "log": [ { "time": 1619448107299, "level": "WARNING", "message": "Host named 'vespa-container' may not receive any config since it is not a canonical hostname. Disregard this warning when testing in a Docker container." } ], "tenant": "default", "session-id": "3", "url": "http://localhost:19071/application/v2/tenant/default/application/default/environment/prod/region/default/instance/default", "message": "Session 3 for tenant 'default' prepared and activated.", "configChangeActions": { "restart": [], "refeed": [], "reindex": [] } } ``` ``` ##### POST /application/v2/tenant/default/session Creates a new session with the application package that is included in the request. | Parameters | | Name | Default | Description | | --- | --- | --- | | from | N/A | Use when you want to create a new session based on an active application. The value supplied should be a URL to an active application. | | | Request body | | Required | Content | Note | | --- | --- | --- | | Yes, unless `from` parameter is used | A compressed [application package](../application-packages.html) (with gzip or zip compression) | It is required to set the `Content-Type` HTTP header to `application/x-gzip` or `application/zip`, unless the `from` parameter is used. | | | Response | The response contains: - A [session-id](#session-id) to the application that was created. - A [prepared](#prepare-session) URL for preparing the application. | Examples (both requests return the same response): - `POST /application/v2/tenant/default/session` - `POST /application/v2/tenant/default/session?from=http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/application/default/environment/default/region/default/instance/default` ``` { "tenant": "default", "session-id": "1", "prepared": "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/session-id/prepared/", "content": "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/session-id/content/", "message": "Session 1 for tenant 'default' created." } ``` ##### PUT /application/v2/tenant/default/session/[[session-id](#session-id)]/content/[[path](#path)] Writes the content to the given path, or creates a directory if the path ends with '/'. | Parameters | None | | Request body | - If path is a directory, none. - If path is a file, the contents of the file. | | Response | None - Any errors or warnings from writing the file/creating the directory. 
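For example, to overwrite a single file in an existing session; a sketch where the session id `3` and the file name are placeholders:

```
$ curl -X PUT --data-binary @services.xml \
    "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/services.xml"
```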
| ##### GET /application/v2/tenant/default/session/[[session-id](#session-id)]/content/[[path](#path)] Returns the content of the file at this path, or lists files and directories if `path` ends with '/'. | Parameters | | Name | Default | Description | | --- | --- | --- | | recursive | false | If _true_, directory content will be listed recursively. | | return | content | - If set to content and path refers to a file, the content will be returned. - If set to content and path refers to a directory, the files and subdirectories in the directory will be listed. - If set to status and path refers to a file, the file status and hash will be returned. - If set to status and path refers to a directory, a list of file/subdirectory statuses and hashes will be returned. | | | Request body | None. | | Response | - If path is a directory: a JSON array of URLs to the files and subdirectories of that directory. - If path is a file: the contents of the file. - If status parameter is set, the status and hash will be returned. | Examples: `GET /application/v2/tenant/default/session/3/content/` ``` ``` [ "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/hosts.xml", "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/services.xml", "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/schemas/" ] ``` ``` `GET /application/v2/tenant/default/session/3/content/?recursive=true` ``` ``` [ "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/hosts.xml", "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/services.xml", "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/schemas/", "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/schemas/music.sd", "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/schemas/video.sd" ] ``` ``` `GET /application/v2/tenant/default/session/3/content/hosts.xml` ``` ``` vespa1 vespa2 ``` ``` `GET /application/v2/tenant/default/session/3/content/hosts.xml?return=status` ``` ``` { "name": "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/hosts.xml", "status": "new", "md5": "03d7cff861fcc2d88db70b7857d4d452" } ``` ``` `GET /application/v2/tenant/default/session/3/content/schemas/?return=status` ``` ``` [ { "name": "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/schemas/music.sd", "status": "new", "md5": "03d7cff861fcc2d88db70b7857d4d452" }, { "name": "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/schemas/video.sd", "status": "changed", "md5": "03d7cff861fcc2d88db70b7857d4d452" }, { "name": "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/content/schemas/book.sd", "status": "deleted", "md5": "03d7cff861fcc2d88db70b7857d4d452" } ] ``` ``` ##### DELETE /application/v2/tenant/default/session/[[session-id](#session-id)]/content/[[path](#path)] Deletes the resource at the given path. | Parameters | None | | Request body | None | | Response | Any errors or warnings from deleting the resource. | ##### PUT /application/v2/tenant/default/session/[[session-id](#session-id)]/prepared Prepares an application with the [session-id](#session-id) given. 
| Parameters | | Parameter | Default | Description | | --- | --- | --- | | applicationName | N/A | Name of the application to be deployed | | environment | default | Environment where application should be deployed | | region | default | Region where application should be deployed | | instance | default | Name of application instance | | debug | false | If true, include stack trace in response if prepare fails. | | timeout | 360 seconds | Timeout in seconds to wait for session to be prepared. | | | Request body | None | | Response | Returns a [session-id](#session-id) and a link to activate the session. - Log with any errors or warnings from preparing the application. - An [activate](#activate-session) URL for activating the application with this [session-id](#session-id), if there were no errors. - A list of actions (possibly empty) that must be performed in order to apply some config changes between the current active application and this next prepared application. These actions are organized into three categories; _restart_, _reindex_, and _refeed_: - _Restart_ actions are done after the application has been activated and are handled by restarting all listed services. See [schemas](schema-reference.html#modifying-schemas) for details. - _Reindex_ actions are special refeed actions that Vespa [handles automatically](../operations/reindexing.html), if the [reindex](#reindex) endpoint below is used. - _Refeed_ actions require several steps to handle. See [schemas](schema-reference.html#modifying-schemas) for details. | Example: `PUT /application/v2/tenant/default/session/3/prepared` ``` ``` { "tenant": "default", "session-id": "3", "activate": "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/session/3/active", "message": "Session 3 for tenant 'default' prepared.", "log": [ { "level": "WARNING", "message": "Warning message 1", "time": 1430134091319 }, { "level": "WARNING", "message": "Warning message 2", "time": 1430134091320 } ], "configChangeActions": { "restart": [ { "clusterName": "mycluster", "clusterType": "search", "serviceType": "searchnode", "messages": ["Document type 'test': Field 'f1' changed: add attribute aspect"], "services": [ { "serviceName": "searchnode", "serviceType": "searchnode", "configId": "mycluster/search/cluster.mycluster/0", "hostName": "myhost.mydomain.com" } ] } ], "reindex": [ { "documentType": "test", "clusterName": "mycluster", "messages": ["Document type 'test': Field 'f1' changed: add index aspect"], "services": [ { "serviceName": "searchnode", "serviceType": "searchnode", "configId": "mycluster/search/cluster.mycluster/0", "hostName": "myhost.mydomain.com" } ] } ] } } ``` ``` ##### GET /application/v2/tenant/default/session/[[session-id](#session-id)]/prepared Returns the state of a prepared session. The response is the same as a successful [prepare](#prepare-session) operation (above), however the _configChangeActions_ element will be empty. ##### PUT /application/v2/tenant/default/session/[[session-id](#session-id)]/active Activates an application with the [session-id](#session-id) given. The [session-id](#session-id) must be for a [prepared session](#prepare-session). The operation will make sure the session is activated on all config servers. | Parameters | | Parameter | Default | Description | | --- | --- | --- | | timeout | 60 seconds | Timeout in seconds to wait for session to be activated (when several config servers are used, they might need to sync before activate can be done). 
| | | Request body | None | | Response | Returns a [session-id](#session-id), a message and a URL to the activated application. - [session-id](#session-id) - Message | Example: `PUT /application/v2/tenant/default/session/3/active` ``` ``` { "tenant": "default", "session-id": "3", "message": "Session 3 for tenant 'default' activated.", "url": "http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/application/default/environment/default/region/default/instance/default" } ``` ``` ##### GET /application/v2/tenant/default/application/ Returns a list of the currently active applications for the given tenant. | Parameters | None | | Request body | None | | Response | Returns a list of applications - Array of active applications | Example: `GET /application/v2/tenant/default/application/` ``` ``` { ["http://myconfigserver.mydomain.com:19071/application/v2/tenant/default/application/default/environment/default/region/default/instance/default"] } ``` ``` ##### GET /application/v2/tenant/default/application/default Gets info about the application. | Parameters | None | | Request body | None | | Response | Returns information about the application specified. - config generation | Example: `GET /application/v2/tenant/default/application/default` ``` ``` { "generation": 2 } ``` ``` ##### GET /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/reindexing Returns [reindexing](../operations/reindexing.html) status for the given application. | Parameters | N/A | | Request body | N/A | | Response | JSON detailing current reindexing status for the application, with all its clusters and document types. - Status for each content cluster in the application, by name: - Status of each document type in the cluster, by name: - Last time reindexing was triggered for this document type. - Current status of reindexing. - Optional start time of reindexing. - Optional end time of reindexing. - Optional progress of reindexing, from 0 to 1. - Pseudo-speed of reindexing. | Example: `GET /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/reindexing` ``` ``` { "clusters": { "db": { "ready": { "test_artifact": { "readyMillis": 1607937250998, "startedMillis": 1607940060012, "state": "running", "speed": 1.0, "progress": 0.04013824462890625 }, "test_result": { "readyMillis": 1607688477294, "startedMillis": 1607690520026, "endedMillis": 1607709294236, "speed": 0.1, "state": "successful" }, "test_run": { "readyMillis": 1607937250998, "state": "pending" } } } } } ``` ``` ##### POST /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/reindex Marks specified document types in specified clusters of an application as ready for [reindexing](../operations/reindexing.html). Reindexing itself starts with the next redeployment of the application. To stop an ongoing reindexing, see [updating reindexing](#update-reindexing) below. All document types in all clusters are reindexed unless restricted, using parameters as specified: | Parameters | | Name | Description | | --- | --- | | clusterId | A comma-separated list of content clusters to limit reindexing to. All clusters are reindexed if this is not present. | | documentType | A comma-separated list of document types to limit reindexing to. All document types are reindexed if this is not present. 
| | indexedOnly | Boolean: whether to mark reindexing ready only for document types with indexing mode _index_ and at least one field with the indexing statement `index`. Default is `false`. | | speed | Number (0–10], default 1: Indexing pseudo speed - balance speed vs. resource use. Example: speed=0.1 | | | Request body | N/A | | Response | A human-readable message indicating what reindexing was marked as ready. | Example: `POST /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/reindex?clusterId=foo,bar&documentType=moo,baz&indexedOnly=true` ``` ``` { "message": "Reindexing document types [moo, baz] in 'foo', [moo] in 'bar' of application default.default" } ``` ``` ##### PUT /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/reindex Modifies [reindexing](../operations/reindexing.html) of specified document types in specified clusters of an application. Specifically, this can be used to alter the pseudo-speed of the reindexing, optionally halting it by specifying a speed of `0`; reindexing for the specified types will remain dormant until either speed is increased again, or a new reindexing is triggered (see [trigger reindexing](#reindex)). Speed changes become effective with the next redeployment of the application. Reindexing for all document types in all clusters are affected if no other parameters are specified: | Parameters | | Name | Description | | --- | --- | | clusterId | A comma-separated list of content clusters to limit the changes to. Reindexing for all clusters are modified if this is not present. | | documentType | A comma-separated list of document types to limit the changes to. Reindexing for all document types are modified if this is not present. | | indexedOnly | Boolean: whether to modify reindexing only for document types with indexing mode _index_ and at least one field with the indexing statement `index`. Default is `false`. | | speed | Number [0–10], required: Indexing pseudo speed - balance speed vs. resource use. Example: speed=0.1 | | | Request body | N/A | | Response | A human-readable message indicating what reindexing was modified. | Example: `PUT /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/reindex?clusterId=foo,bar&documentType=moo,baz&speed=0.618` ``` ``` { "message": "Set reindexing speed to '0.618' for document types [moo, baz] in 'foo', [moo] in 'bar' of application default.default" } ``` ``` ##### GET /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/content/[[path](#path)] Returns content at the given path for an application. See [getting content](#content-get) for usage and response. ##### DELETE /application/v2/tenant/default/application/default Deletes an active application. | Parameters | None | | Request body | None | | Response | Returns a message stating if the operation was successful or not | Example: `DELETE /application/v2/tenant/default/application/default` ``` ``` { "message": "Application 'default' was deleted" } ``` ``` ##### GET /application/v2/host/[hostname] Gets information about which tenant and application a hostname is used by. | Parameters | None | | Request body | None | | Response | Returns a message with tenant and application details. 
| Example: `GET /application/v2/host/myhost.mydomain.com` ``` ``` { "tenant": "default" "application": "default" "environment": "default" "region": "default" "instance": "default" } ``` ``` ##### Error Handling Errors are returned using standard HTTP status codes. Any additional info is included in the body of the return call, JSON-formatted. The general format for an error response is: ``` ``` { "error-code": "ERROR_CODE", "message": "An error message" } ``` ``` | HTTP status code | Error code | Description | | --- | --- | --- | | 400 | BAD\_REQUEST | Bad request. Client error. The error message should indicate the cause. | | 400 | INVALID\_APPLICATION\_PACKAGE | There is an error in the application package. The error message should indicate the cause. | | 400 | OUT\_OF\_CAPACITY | Not enough nodes available for the request to be fulfilled. | | 401 | | Not authorized. The error message should indicate the cause. | | 404 | NOT\_FOUND | Not found. E.g. when using a session-id that doesn't exist. | | 405 | METHOD\_NOT\_ALLOWED | Method not implemented. E.g. using GET where only POST or PUT is allowed. | | 409 | ACTIVATION\_CONFLICT | Conflict, returned when activating an application fails due to a conflict with other changes to the same application (in another session). Client should retry. | | 500 | INTERNAL\_SERVER\_ERROR | Internal server error. Generic error. The error message should indicate the cause. | ##### Access log Requests are logged in the [access log](../access-logging.html) which can be found at _$VESPA\_HOME/logs/vespa/configserver/access-json.log_, example: ``` ``` { "ip": "172.17.0.2", "time": 1655665104.751, "duration": 1.581, "responsesize": 230, "requestsize": 0, "code": 200, "method": "PUT", "uri": "/application/v2/tenant/default/session/2/prepared", "version": "HTTP/2.0", "agent": "vespa-deploy", "host": "b614c9ff04d7:19071", "scheme": "https", "localport": 19071, "peeraddr": "172.17.0.2", "peerport": 47480, "attributes": { "http2-stream-id":"1" } } ``` ``` ##### Use Cases It is assumed that the tenant _default_ is already created in these use cases, and the application package is in _app_. ###### Create, prepare and activate an application Create a session with the application package: ``` $ (cd app && zip -r - .) 
| \ curl -s --header Content-Type:application/zip --data-binary @- \ "http://host:19071/application/v2/tenant/default/session" ``` Prepare the application with the URL in the _prepared_ link from the response: ``` $ curl -s -X PUT "http://host:19071/application/v2/tenant/default/session/1/prepared?applicationName=default" ``` Activate the application with the URL in the _activate_ link from the response: ``` $ curl -s -X PUT "http://host:19071/application/v2/tenant/default/session/1/active" ``` ###### Modify the application package Dump _services.xml_ from session 1: ``` $ curl -s -X GET "http://host:19071/application/v2/tenant/default/session/1/content/services.xml" ``` ``` ``` 12345 ``` ``` Session 1 is activated and cannot be changed - create a new session based on the active session: ``` $ curl -s -X POST "http://host:19071/application/v2/tenant/default/session?from=http://host:19071/application/v2/tenant/default/application/default/environment/default/region/default/instance/default" ``` Modify rpcport to 12346 in _services.xml_, deploy the change: ``` $ curl -s -X PUT --data-binary @app/services.xml \ "http://host:19071/application/v2/tenant/default/session/2/content/services.xml" ``` Get _services.xml_ from session 2 to validate: ``` $ curl -s -X GET "http://host:19071/application/v2/tenant/default/session/2/content/services.xml" ``` ``` ``` 12346 ``` ``` To add the file _files/test1.txt_, first create the directory, then add the file: ``` $ curl -s -X PUT "http://host:19071/application/v2/tenant/default/session/2/content/files/" $ curl -s -X PUT --data-binary @app/files/test1.txt \ "http://host:19071/application/v2/tenant/default/session/2/content/files/test1.txt" ``` Prepare and activate the session: ``` $ curl -s -X PUT "http://host:19071/application/v2/tenant/default/session/2/prepared?applicationName=fooapp" $ curl -s -X PUT "http://host:19071/application/v2/tenant/default/session/2/active" ``` ###### Rollback If you need to roll back to a previous version of the application package this can be achieved by creating a new session based on the previous known working version by passing the corresponding session-id in the _from_ argument, see [creating a session](#create-session) Also see [rollback](/en/application-packages.html#rollback). 
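A hedged sketch of such a rollback, assuming the known-good package is still available on disk in _app-v1_ and that the config server assigns session-id 3 to the new session:

```
# Upload the known-good package as a new session
$ (cd app-v1 && zip -r - .) | \
  curl -s --header Content-Type:application/zip --data-binary @- \
  "http://host:19071/application/v2/tenant/default/session"

# Prepare and activate it (session-id 3 is assumed from the response above)
$ curl -s -X PUT "http://host:19071/application/v2/tenant/default/session/3/prepared?applicationName=default"
$ curl -s -X PUT "http://host:19071/application/v2/tenant/default/session/3/active"
```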
Copyright © 2025 - [Cookie Preferences](#) ###### On this page: - [POST /application/v2/tenant/default/prepareandactivate](#prepareandactivate) - [POST /application/v2/tenant/default/session](#create-session) - [PUT /application/v2/tenant/default/session/[](#content-put) - [GET /application/v2/tenant/default/session/[](#content-get) - [DELETE /application/v2/tenant/default/session/[](#content-delete) - [PUT /application/v2/tenant/default/session/[](#prepare-session) - [GET /application/v2/tenant/default/session/[](#get-prepare-session) - [PUT /application/v2/tenant/default/session/[](#activate-session) - [GET /application/v2/tenant/default/application/](#get-application) - [GET /application/v2/tenant/default/application/default](#get-application-info) - [GET /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/reindexing](#reindexing) - [POST /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/reindex](#reindex) - [PUT /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/reindex](#update-reindexing) - [GET /application/v2/tenant/default/application/default/environment/default/region/default/instance/default/content/[](#get-application-content) - [DELETE /application/v2/tenant/default/application/default](#delete-application) - [GET /application/v2/host/[hostname]](#get-host-info) - [Error Handling](#error-handling) - [Access log](#access-log) - [Use Cases](#use-cases) - [Create, prepare and activate an application](#use-case-start) - [Modify the application package](#use-case-modify) - [Rollback](#rollback) --- ## Deployment Variants ### Instance, region, cloud and environment variants Sometimes it is useful to create configuration that varies depending on properties of the deployment, for example to set region specific endpoints of services used by [Searchers](/en/searcher-development.html), or use smaller clusters for a "beta" instance. #### Instance, region, cloud and environment variants Sometimes it is useful to create configuration that varies depending on properties of the deployment, for example to set region specific endpoints of services used by [Searchers](/en/searcher-development.html), or use smaller clusters for a "beta" instance. This is supported both for [services.xml](#services.xml-variants) and [query profiles](#query-profile-variants). ##### services.xml variants [services.xml](services.html) files support different configuration settings for different _tags_, _instances_, _environments_, _clouds_ and _regions_. To use this, import the _deploy_ namespace: ``` ``` ``` ``` Deploy directives are used to specify with which tags, and in which instance, environment, cloud and/or [region](https://cloud.vespa.ai/en/reference/zones) an XML element should be included: ``` ``` 2 ``` ``` The example above configures different node counts/configurations depending on the deployment target. Deploying the application in the _dev_ environment gives: ``` ``` 2 ``` ``` Whereas in `aws-us-west-2a` it is: ``` ``` 2 ``` ``` This can be used to modify any config by deployment target. The `deploy` directives have a set of override rules: - A directive specifying more conditions will override one specifying fewer. - Directives are inherited in child elements. - When multiple XML elements with the same name is specified (e.g. 
when specifying search or docproc chains), the _id_ attribute or the _idref_ attribute of the element is used together with the element name when applying directives. Some overrides are applied by default in some environments, see [environments](https://cloud.vespa.ai/en/reference/environments). Any override made explicitly for an environment will override the defaults for it. ###### Specifying multiple targets More than one tag, instance, region or environment can be specified in the attribute, separated by space. Note that `tags` by default only apply in production instances, and are matched whenever the tags of the element and the tags of the instance intersect. To match tags in other environments, an explicit `deploy:environment` directive for that environment must also match. Use tags if you have a complex instance structure which you want config to vary by. The namespace can be applied to any element. Example: ``` ``` Hello from application config Hello from east colo! ``` ``` Above, the `container` element is configured for the 3 environments only (it will not apply to `dev`) - and in region `aws-us-east-1c`, the config is different. ##### Query profile variants [Query profiles](/en/query-profiles.html) support different configuration settings for different _instances_, _environments_ and _regions_ through [query profile variants](/en/query-profiles.html#query-profile-variants). This allows you to set different query parameters for a query type depending on these deployment attributes. To use this feature, create a regular query profile variant with any of `instance`, `environment` and `region` as dimension names and let your query profile vary by that. For example: ``` ``` instance, environment, region My default value My beta value My dev value My main instance prod value ``` ``` You can pick and combine these dimensions in any way you want with other dimensions sent as query parameters, e.g: ``` ``` device, instance, usecase ``` ``` Copyright © 2025 - [Cookie Preferences](#) --- ## Deployment ### deployment.xml _deployment.xml_ controls how an application is deployed. #### deployment.xml _deployment.xml_ controls how an application is deployed. _deployment.xml_ is placed in the root of the [application package](/en/applications.html) and specifies which environments and regions the application is deployed to during [automated application deployment](/en/cloud/automated-deployments.html), as which application instances. Deployment progresses through the `test` and `staging` environments to the `prod` environments listed in _deployment.xml_. Simple example: ``` ``` aws-us-east-1c aws-us-west-2a ``` ``` More complex example: ``` ``` aws-us-east-1c aws-us-east-1c aws-us-west-1c aws-eu-west-1a aws-us-west-2a aws-us-east-1c beta ``` ``` Some of the elements can be declared _either_ under the `` root, **or**, if one or more `` tags are listed, under these. These have a bold **or** when listing where they may be present. ##### deployment The root element. | Attribute | Mandatory | Values | | --- | --- | --- | | version | Yes | 1.0 | | major-version | No | The major version number this application is valid for. | | cloud-account | No | Account to deploy to with [Enclave](/en/cloud/enclave/enclave.html). | ##### instance In `` or `` (which must be a direct descendant of the root). An instance of the application; several of these may be simultaneously deployed in the same zone. 
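As a hedged illustration of the structure (instance ids and region names are arbitrary), a _deployment.xml_ declaring two explicit instances could look like:

```
<deployment version="1.0">
  <instance id="beta">
    <prod>
      <region>aws-us-east-1c</region>
    </prod>
  </instance>
  <instance id="default">
    <prod>
      <region>aws-us-east-1c</region>
      <region>aws-us-west-2a</region>
    </prod>
  </instance>
</deployment>
```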
If no `` is specified, all children of the root are implicitly children of an `` with `id="default"`, as in the simple example at the top. | Attribute | Mandatory | Values | | --- | --- | --- | | id | Yes | The unique name of the instance. | | tags | No | Space-separated tags which can be referenced to make [deployment variants](deployment-variants.html). | | cloud-account | No | Account to deploy to with [Enclave](/en/cloud/enclave/enclave.html). Overrides parent's use of cloud-account. | ##### block-change In ``, **or** ``. This blocks changes from being deployed to production in the matching time interval. Changes are nevertheless tested while blocked. By default, both application revision changes and Vespa platform changes (upgrades) are blocked. It is possible to block just one kind of change using the `revision` and `version` attributes. Any combination of the attributes below can be specified. Changes on a given date will be blocked if all conditions are met. Invalid `` tags (i.e. that contains conditions that never match an actual date) are rejected by the system. This tag must be placed after any `` and `` tags, and before ``. It can be declared multiple times. | Attribute | Mandatory | Values | | --- | --- | --- | | revision | No, default `true` | Set to `false` to allow application deployments | | version | No, default `true` | Set to `false` to allow Vespa platform upgrades | | days | No, default `mon-sun` | List of days this block is effective - a comma-separated list of single days or day intervals where the start and end day are separated by a dash and are inclusive. Each day is identified by its english name or three-letter abbreviation. | | hours | No, default `0-23` | List of hours this block is effective - a comma-separated list of single hours or hour intervals where the start and end hour are separated by a dash and are inclusive. Each hour is identified by a number in the range 0 to 23. | | time-zone | No, default UTC | The name of the time zone used to interpret the hours attribute. Time zones are full names or short forms, when the latter is unambiguous. See [ZoneId.of](https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html#of-java.lang.String-) for the full spec of acceptable values. | | from-date | No | The inclusive starting date of this block (ISO-8601, `YYYY-MM-DD`). | | to-date | No | The inclusive ending date of this block (ISO-8601, `YYYY-MM-DD`). | The below example blocks all changes on weekends, and blocks revisions outside working hours, in the PST time zone: ``` ``` ``` ``` The below example blocks: - all changes on Sundays starting on 2022-03-01 - all changes in the hours 16-23 between 2022-02-10 and 2022-02-15 - all changes until 2022-01-05 ``` ``` ``` ``` ##### upgrade In ``, or ``. Determines the strategy for upgrading the application, or one of its instances. By default, application revision changes and Vespa platform changes are deployed separately. The exception is when an upgrade fails; then, the latest application revision is deployed together with the upgrade, as these may be necessary to fix the upgrade failure. | Attribute | Mandatory | Values | | --- | --- | --- | | rollout | No, default `separate` | - `separate` is the default. When a revision catches up to a platform upgrade, it stays behind, unless the upgrade alone fails. - `simultaneous` favors revision roll-out. When a revision catches up to a platform upgrade, it joins, and then passes the upgrade. | | revision-target | No, default `latest` | - `latest` is the default. 
When rolling out a new revision to an instance, the latest available revision is chosen. - `next` trades speed for smaller changes. When rolling out a new revision to an instance, the next available revision is chosen. The available revisions for an instance are revisions which are not yet deployed, or revisions which have rolled out in previous instances. | | revision-change | No, default `when-failing` | - `always` is the most aggressive setting. A new, available revision may always replace the one which is currently rolling out. - `when-failing` is the default. A new, available revision may replace the one which is currently rolling out if this is failing. - `when-clear` is the most conservative setting. A new, available revision may never replace one which is currently rolling out. Revision targets will never automatically change inside [revision block window](#block-change), but may be set by manual intervention at any time. | | max-risk | No, default `0` | May only be used with `revision-change="when-clear"` and `revision-target="next"`. The maximum amount of [risk](https://cloud.vespa.ai/en/reference/vespa-cloud-api#submission-properties) to roll out per new revision target. The default of `0` results in the next build always being chosen, while a higher value allows skipping intermediate builds, as long as the cumulative risk does not exceed what is configured here. | | min-risk | No, default `0` | Must be less than or equal to the configured `max-risk`. The minimum amount of [risk](https://cloud.vespa.ai/en/reference/vespa-cloud-api#submission-properties) to start rolling out a new revision. The default of `0` results in a new revision rolling out as soon as anything is ready, while a higher value lets the system wait until enough cumulative risk is available. This can be used to avoid blocking a lengthy deployment process with trivial changes. | | max-idle-hours | No, default `8` | May only be used when `min-risk` is specified, and greater than `0`. The maximum number of hours to wait for enough cumulative risk to be available, before rolling out a new revision. | ##### test Meaning depends on where it is located: | Parent | Description | | --- | --- | | `` `` | If present, the application is deployed to the [`test`](https://cloud.vespa.ai/en/reference/environments.html#test) environment, and system tested there, even if no prod zones are deployed to. Also, when specified, system tests _must_ be present in the application test package. See guides for [getting to production](/en/cloud/production-deployment). If present in an `` element, system tests are run for that specific instance before any production deployments of the instance may proceed — otherwise, previous system tests for any instance are acceptable. | | `` `` `` | If present, production tests are run against the production region with id contained in this element. A test must be _after_ a corresponding [region](#region) element. When specified, production tests _must_ be preset in the application test package. See guides for [getting to production](/en/cloud/production-deployment). | | Attribute | Mandatory | Values | | --- | --- | --- | | cloud-account | No | For [system tests](/en/cloud/automated-deployments.html#system-tests) only: account to deploy to with [Enclave](/en/cloud/enclave/enclave.html). Overrides parent's use of cloud-account. 
Cloud account _must not_ be specified for [production tests](/en/cloud/automated-deployments.html#production-tests), which always run in the account of the corresponding deployment. | ##### staging In ``, or ``. If present, the application is deployed to the [`staging`](https://cloud.vespa.ai/en/reference/environments.html#staging) environment, and tested there, even if no prod zones are deployed to. If present in an `` element, staging tests are run for that specific instance before any production deployments of the instance may proceed — otherwise, previous staging tests for any instance are acceptable. When specified, staging tests _must_ be present in the application test package. See guides for [getting to production](/en/cloud/production-deployment.html). | Attribute | Mandatory | Values | | --- | --- | --- | | cloud-account | No | Account to deploy to with [Enclave](/en/cloud/enclave/enclave.html). Overrides parent's use of cloud-account. | ##### prod In ``, **or** in ``. If present, the application is deployed to the production regions listed inside this element, under the specified instance, after deployments and tests in the `test` and `staging` environments. | Attribute | Mandatory | Values | | --- | --- | --- | | cloud-account | No | Account to deploy to with [Enclave](/en/cloud/enclave/enclave.html). Overrides parent's use of cloud-account. | ##### region In ``, ``, ``, or ``. The application is deployed to the production [region](https://cloud.vespa.ai/en/reference/zones.html) with the id contained in this element. | Attribute | Mandatory | Values | | --- | --- | --- | | fraction | No | Only when this region is inside a group: The fractional membership in the group. | | cloud-account | No | Account to deploy to with [Enclave](/en/cloud/enclave/enclave.html). Overrides parent's use of cloud-account. | ##### dev In ``. Optionally used to control deployment settings for the [dev environment](https://cloud.vespa.ai/en/reference/environments.html). This can be used to specify a different cloud account, tags, and private endpoints. | Attribute | Mandatory | Values | | --- | --- | --- | | tags | No | Space-separated tags which can be referenced to make [deployment variants](deployment-variants.html). | | cloud-account | No | Account to deploy to with [Enclave](/en/cloud/enclave/enclave.html). Overrides parent's use of cloud-account. | ##### delay In ``, ``, ``, ``, or ``. Introduces a delay which must pass after completion of all previous steps, before subsequent steps may proceed. This may be useful to allow some grace time to discover errors before deploying a change in additional zones, or to gather higher-level metrics for a production deployment for a while, before evaluating these in a production test. The maximum total delay for the whole deployment spec is 48 hours. The delay is specified by any combination of the `hours`, `minutes` and `seconds` attributes. ##### parallel In ``, ``, or ``. Runs the contained steps in parallel: instances if in ``, or primitive steps (deployments, tests or delays) or a series of these (see [steps](#steps)) otherwise. Multiple `` elements are permitted. The following example will deploy to `us-west-1` first, then to `us-east-3` and `us-central-1` simultaneously, and, finally, to `eu-west-1`, once both parallel deployments have completed: ``` us-west-1 us-east-3 us-central-1 eu-west-1 ``` ##### steps In ``. Runs the contained parallel or primitive steps (deployments, tests or delays) serially. The following example will, in parallel: 1.
deploy to `us-east-3`, 2. deploy to `us-west-1`, then delay 1 hour, and run tests for `us-west-1`, and 3. delay for two hours. Thus, the parallel block is complete when both deployments are complete, tests are successful for the second deployment, and at least two hours have passed since the block began executing. ``` ``` us-east-3 us-west-1 us-west-1 ``` ``` ##### tester In ``, `` and ``. Specifies container settings for the tester application container, which is used to run system, staging and production verification tests. The allowed elements inside this are [``](services.html#nodes). ``` ``` ``` ``` ##### endpoints (global) In ``, without any ``declared **or** in ``: This allows_global_ endpoints, via one or more [``](#endpoint-global) elements; and [zone endpoint](#endpoint-zone) and [private endpoint](#endpoint-private)elements for cloud-native private network configuration. ##### endpoints (dev) In ``. This allows[zone endpoint](#endpoint-zone) elements for cloud-native private network configuration for[dev](https://cloud.vespa.ai/en/reference/environments.html#dev) deployments. Note that [private endpoints](#endpoint-private) are only supported in `prod`. ##### endpoint (global) In `` or ``. Specifies a global endpoint for this application. Each endpoint will point to the regions that are declared in the endpoint. If no regions are specified, the endpoint defaults to the regions declared in the `` element. The following example creates a default endpoint to all regions, and a _us_ endpoint pointing only to US regions. ``` ``` aws-us-east-1c aws-us-west-2a ``` ``` | Attribute | Mandatory | Values | | --- | --- | --- | | id | No | The identifier for the endpoint. This will be part of the endpoint name that is generated. If not specified, the endpoint will be the default global endpoint for the application. | | container-id | Yes | The id of the [container cluster](/en/reference/services-container.html) to which requests to the global endpoint is forwarded. | Global endpoints are implemented using Route 53 and healthchecks, to keep active zones in rotation. See [BCP](#bcp) for advanced configurations. ##### endpoint (zone) In `` or ``, with `type='zone'`. Used to disable public zone endpoints. _Non-public endpoints can not be used in global endpoints, which require that all constituent endpoints are public._The example disables the public zone endpoint for the `my-container`container cluster in all regions, except where it is explicitly enabled, in `region-1`. Changing endpoint visibility will make the service unavailable for a short period of time. ``` ``` region-1 ``` ``` | Attribute | Mandatory | Values | | --- | --- | --- | | type | Yes | Private endpoints are specified with `type='zone'`. | | container-id | Yes | The id of the [container cluster](/en/reference/services-container.html) to disable public endpoints for. | | enabled | No | Whether a public endpoint for this container cluster should be enabled; default `true`. | ##### endpoint (private) In `` or ``, with `type='private'`. Specifies a private endpoint service for this application. Each service will be launched in the regions that are declared in the endpoint. If no regions are specified, the service is launched in all regions declared in the`` element, that support any of the declared [access types](#allow). The following example creates a private endpoint in two specific regions. 
``` aws-us-east-1c gcp-us-central1-f ``` | Attribute | Mandatory | Values | | --- | --- | --- | | type | Yes | Private endpoints are specified with `type='private'`. | | container-id | Yes | The id of the [container cluster](/en/reference/services-container.html) to which requests to the private endpoint service are forwarded. | | auth-method | No | The authentication method to use with this [private endpoint](/en/cloud/private-endpoints.html). Must be either `mtls` or `token`. Defaults to mTLS if not included. | ##### allow In ``. Allows a principal identified by the URN to set up a connection to the declared private endpoint service. This element must be repeated for each additional URN. An endpoint service will only consider allowed URNs of a compatible type, and will only be created if at least one compatible access type-and-URN is given: - For AWS deployments, specify `aws-private-link`, and an _ARN_. - For GCP deployments, specify `gcp-service-connect`, and a _project ID_. | Attribute | Mandatory | Values | | --- | --- | --- | | with | Yes | The private endpoint access type; must be `aws-private-link` or `gcp-service-connect`. | | arn | Maybe | Must be specified with `aws-private-link`. See the [AWS documentation](https://docs.aws.amazon.com/vpc/latest/privatelink/configure-endpoint-service.html) for more details. | | project | Maybe | Must be specified with `gcp-service-connect`. See the [GCP documentation](https://cloud.google.com/vpc/docs/configure-private-service-connect-services) for more details. | ##### bcp In `` or ``. Defines the BCP (Business Continuity Planning) structure of this instance: which zones should take over for which others during the outage of a zone, and how fast they must have the capacity ready. Autoscaling uses this information to decide the ideal CPU load of a zone. If this element is not defined, it is assumed that all regions cover for an equal share of the traffic of all other regions and must have that capacity ready at all times. If a bcp element is specified at the root, and explicit instances are used, that bcp element becomes the default for all instances that do not contain a bcp element themselves. If a BCP element contains no group elements, it will implicitly define a single group of all the regions of the instance in which it is used. See [BCP test](https://cloud.vespa.ai/en/reference/bcp-test.html) for a procedure to verify that your BCP configuration is correct. | Attribute | Mandatory | Values | | --- | --- | --- | | deadline | No | The maximum time after a region becomes unreachable until the other regions in its BCP group must be able to handle its traffic, given as a number followed by 'm', 'h' or 'd' (for minutes, hours or days). The default deadline is 0: Regions must at all times have capacity to handle BCP traffic immediately. By providing a deadline, autoscaling can avoid the cost of provisioning additional resources for BCP capacity if it predicts that it can grow to handle the traffic faster than the deadline in a given cluster. This is the default deadline to be used for all groups that don't specify one themselves. | Example: ``` us-east1 us-east2 us-central1 us-west1 us-west2 us-central1 ``` ##### group In ``. Defines a bcp group: a set of regions whose members cover for each other during a regional outage. Each region in a group will (as allowed, when autoscaling ranges are configured) provision resources sufficient to handle the outage of any other single region in the group.
The traffic of the region is assumed to be rerouted in equal amounts to the remaining regions in the group. That is, if a group has one member, no resources will be provisioned to handle an outage in that member. If a group has two members, each will aim to provision sufficient resources to handle the actual traffic of the other. If a group has three members, each will provision to handle half of the traffic observed in whichever of the other two regions receives the most traffic. A region may have fractional membership in multiple groups, meaning it will handle just that fraction of the traffic of the remaining members, and vice versa. A region's total membership among groups must always sum to exactly 1. A group may also define global endpoints for the region members in the group. This is exactly the same as defining the endpoint separately and repeating the regions of the group under the endpoint. Endpoints under a group cannot contain explicit region sub-elements. | Attribute | Mandatory | Values | | --- | --- | --- | | deadline | No | The deadline of this BCP group. See deadline on the BCP element. | ###### On this page: - [deployment](#deployment) - [instance](#instance) - [block-change](#block-change) - [upgrade](#upgrade) - [test](#test) - [staging](#staging) - [prod](#prod) - [region](#region) - [dev](#dev) - [delay](#delay) - [parallel](#parallel) - [steps](#steps) - [tester](#tester) - [endpoints (global)](#endpoints-global) - [endpoints (dev)](#endpoints-dev) - [endpoint (global)](#endpoint-global) - [endpoint (zone)](#endpoint-zone) - [endpoint (private)](#endpoint-private) - [allow](#allow) - [bcp](#bcp) - [group](#group) --- ### Deployment In this document we explain various aspects of application deployment in detail. #### Deployment In this document we explain various aspects of application deployment in detail. Refer to [application deployment](applications.html#deploying-applications) for an overview. ##### Convergence After the deployment command has succeeded, the application package will take effect, but this does not complete immediately in the distributed system that is your running application; it happens through a distributed _convergence_ process that you can track from the command line or console. Refer to the [deploy reference](reference/application-packages-reference.html#deploy) for the detailed steps run when deploying an application. You can get the status of the last deployment by using the status command: ``` $ vespa status deployment ``` ##### Rollback To roll back an application package change, deploy again with the previous version to roll back to - one of: 1. With automation: Revert the code in the source code repository, and let the automation roll out the new version. You can speed up the deployment by skipping tests and clicking "deploy now" in the deployment graph in the console. 2. If you have trouble rebuilding a good package (you should not), you can download a previous package from Vespa Cloud: Use the [console](https://cloud.vespa.ai/en/automated-deployments.html#source-code-repository-integration) to pick the good version, download it and deploy again. Hover over the [instance](https://cloud.vespa.ai/en/automated-deployments.html#block-windows) (normally called "default") to skip the system and staging tests to speed up the deployment, if needed. 3.
On self-managed instances, regenerate the good version from source for a new deployment; see also the [deploy API](/en/reference/deploy-rest-api-v2.html#rollback). ##### File distribution The application package can have components and other large files. When an app is deployed, these files are distributed to the nodes: - Components (i.e. bundles) - Files with type _path_ and _url_ in config, see [Adding files to the component configuration](configuring-components.html#adding-files-to-the-component-configuration) - Machine-learned models - [Constant tensors](reference/schema-reference.html#constant) When new components or files specified in config are distributed, the container gets a new file reference, waits for it to be available and switches to the new config when all files are available. ![Nodes get config from a config server cluster](/assets/img/config-delivery.svg) ##### Deploying remote models Most application packages are stored as source code in a code repository. However, some resources are generated or too large to store in a code repository, like models or an [FSA](/en/operations/tools.html#vespa-makefsa). Machine-learned models in Vespa are stored in the application package under the _models_ directory. This might be inconvenient for some applications, for instance for models that are frequently retrained on some remote system. Also, models might be too large to fit within the constraints of the version control system. The solution is to download the models from the remote location during the application package build. This is simply implemented by adding a step in _pom.xml_ (see [example](https://github.com/vespa-cloud/cord-19-search/blob/main/pom.xml)): ``` org.codehaus.mojo exec-maven-plugin 1.4.0 download-model generate-resources exec bin/download_models.sh target/application/models MODEL-URL ``` _bin/download\_models.sh_ example:

```
#!/bin/bash
DIR="$1"
URL="$2"
echo "[INFO] Downloading $URL into $DIR"
mkdir -p $DIR
pushd $DIR
curl -O $URL
popd
```

Any necessary credentials for authentication and authorization should be added to this script, as well as any unpacking of archives (for TensorFlow models, for instance). Also see the [model](reference/config-files.html#model) config type to specify resources that should be downloaded by container nodes during convergence. --- ## Developer Guide ### Developer Guide See [getting started](/en/getting-started.html) to deploy a basic sample application, or its Java variant to deploy an application with custom Java components. #### Developer Guide See [getting started](/en/getting-started.html) to deploy a basic sample application, or its Java variant to deploy an application with custom Java components. Keep reading for more details on how to develop applications, including basic terminology, tips on using the Vespa Cloud Console, and how to benchmark and size your application. [Automated deployments](/en/cloud/automated-deployments.html) make production deployments safe and simple. ##### Manual deployments Developers will typically deploy their application to the `dev` [zone](/en/cloud/zones.html) during development. Each deployment is owned by a _tenant_, and each specified _instance_ is a separate copy of the application; this lets developers work on independent copies of the same application, or collaborate on a shared one, as they prefer. More details [here](/en/cloud/tenant-apps-instances.html).
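For example, a minimal Vespa CLI session for a manual dev deployment could look like the following sketch (the tenant, application and instance names are placeholders):

```
$ vespa config set target cloud
$ vespa config set application mytenant.myapp.myinstance
$ vespa auth login     # control plane access
$ vespa auth cert      # data plane certificate
$ vespa deploy --wait 600
```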
These values can be set in the Vespa Cloud UI when deploying, or with each of the build and deploy tools, as shown in the respective getting-started guides. Additionally, a deployment may specify a different [zone](/en/cloud/zones.html) to deploy to, instead of the default `dev` zone. ###### Auto downsizing Deployments to `dev` are downscaled to one small node by default, so that applications can be deployed there without changing `services.xml`. See [performance testing](#performance-testing) for how to disable auto downsizing using `deploy:environment="dev"`. ###### Availability The `dev` zone is a sandbox and not for production serving; It has no uptime guarantees. An automated Vespa software upgrade can be triggered at any time, and this may lead to some downtime if you have only one node per cluster (as with the default [auto downsizing](#auto-downsizing)). ##### Performance testing For performance testing, to avoid auto downsizing, lock the [resources](/en/reference/services.html) using `deploy:environment="dev"`: ``` ``` ``` ``` Read more in [benchmarking](/en/cloud/benchmarking.html) and [variants in services.xml](/en/reference/deployment-variants.html). ##### Component overview ![Vespa Overview](/assets/img/vespa-overview.svg) Application packages can contain Java components to be run in container clusters. The most common component types are: - [Searchers](searcher-development.html), which can modify or build the query, modify the result, implement workflows issuing multiple queries etc. - [Document processors](document-processing.html) that can modify incoming write operations. - [Handlers](jdisc/developing-request-handlers.html) that can implement custom web service APIs. - [Renderers](result-rendering.html) that are used to define custom result formats. Components are constructed by dependency injection and are reloaded safely on deployment without restarts. See the [container documentation](jdisc/index.html) for more details. See the sample applications in [getting started](getting-started.html), to find examples of applications containing Java components. Also see [troubleshooting](/en/operations-selfhosted/admin-procedures.html#troubleshooting). ##### Developing Components The development cycle consists of creating the component, deploying the application package to Vespa, writing tests, and iterating. These steps refer to files in [album-recommendation-java](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation-java): | Build | All the Vespa sample applications use the [bundle plugin](components/bundles.html#maven-bundle-plugin) to build the components. | | | Configure | A key Vespa feature is code and configuration consistency, deployed using an [application package](applications.html). This ensures that code and configuration is in sync, and loaded atomically when deployed. This is done by generating config classes from config definition files. In Vespa and application code, configuration is therefore accessed through generated config classes. The Maven target `generate-sources` (invoked by `mvn install`) uses [metal-names.def](https://github.com/vespa-engine/sample-apps/blob/master/album-recommendation-java/src/main/resources/configdefinitions/metal-names.def) to generate `target/generated-sources/vespa-configgen-plugin/com/mydomain/example/MetalNamesConfig.java`. After generating config classes, they will resolve in tools like [IntelliJ IDEA](https://www.jetbrains.com/idea/download/). 
| | | Tests | Example unit tests are found in [MetalSearcherTest.java](https://github.com/vespa-engine/sample-apps/blob/master/album-recommendation-java/src/test/java/ai/vespa/example/album/MetalSearcherTest.java). `testAddedOrTerm1` and `testAddedOrTerm2` illustrate two ways of doing the same test: - The first sets up the minimal search chain for [YQL](query-language.html) programmatically. - The second uses `com.yahoo.application.Application`, which sets up the application package and simplifies testing. Read more in [unit testing](unit-testing.html). | ##### Debugging Components **Important:** The debugging procedure only works for endpoints with an open debug port - most managed services don't open debug ports for security reasons. Vespa Cloud does not allow debugging over the _Java Debug Wire Protocol (JDWP)_ due to the protocol's inherent lack of security measures. If you need interactive debugging, deploy your application to a self-hosted Vespa installation (below) and manually [add the _JDWP_ agent to the JVM options](/en/developer-guide.html#debugging-components). You may debug your Java code by requesting either a JVM heap dump or a Java Flight Recorder recording through the [Vespa Cloud Console](https://console.vespa-cloud.com/). Go to your application's cluster overview and select _export JVM artifact_ on any _container_ node. The process will take up to a few minutes. You'll find the steps to download the dump on the Console once it's completed. Extract the files from the downloaded Zstandard-compressed archive, and use the free [JDK Mission Control](https://www.oracle.com/java/technologies/jdk-mission-control.html) utility to inspect the dump/recording. ![Generate JVM dump](/assets/img/jvm-dump.png) To debug a [Searcher](searcher-development.html) / [Document Processor](document-processing.html) / [Component](jdisc/container-components.html) running in a self-hosted container, set up a remote debugging configuration in the IDE - IntelliJ example: 1. Run -\> Edit Configurations... 2. Click `+` to add a new configuration. 3. Select the "Remote JVM Debug" option in the left-most pane. 4. Set hostname to the host running the container, change the port if needed. 5. Set the container's [jvm options](reference/services-container.html#jvm) to the value in "Command line arguments for remote JVM", typically:

```
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005
```

6. Re-deploy the application, then restart Vespa on the node that runs the container. Make sure the port is published if using a Docker/Podman container, e.g.:

```
$ docker run --detach --name vespa --hostname vespa-container \
  --publish 127.0.0.1:8080:8080 --publish 127.0.0.1:19071:19071 \
  --publish 127.0.0.1:5005:5005 \
  vespaengine/vespa
```

7. Start debugging! Check _vespa.log_ for errors. [![Video thumbnail](/assets/img/video-thumbs/deploying-a-vespa-searcher.png)](https://www.youtube.com/embed/dUCLKtNchuE) **Vespa videos:** Find _Debugging a Vespa Searcher_ in the vespaengine [youtube channel](https://www.youtube.com/@vespaai)! ##### Developing system and staging tests When using Vespa Cloud, system and staging tests are most easily developed using a test deployment in a `dev` zone to run the tests against. Refer to the [general testing guide](/en/testing.html) for a discussion of the different test types, and the [basic HTTP tests](/en/reference/testing.html) or [Java JUnit tests](/en/reference/testing-java.html) reference for how to write the relevant tests.
If using the [Vespa CLI](/en/vespa-cli.html) to deploy and run [basic HTTP tests](/en/reference/testing.html), the same commands as in the test reference will just work, provided the CLI is configured to use the `cloud` target. ###### Running Java tests With Maven, and [Java Junit tests](/en/reference/testing-java.html), some additional configuration is required, to infuse the test runtime on the local machine with API and data plane credentials: ``` $ mvn test \ -D test.categories=system \ -D dataPlaneKeyFile=data-plane-private-key.pem -D dataPlaneCertificateFile=data-plane-public-cert.pem \ -D apiKey="$API_KEY" ``` The `apiKey` is used to fetch the _dev_ instance's endpoints. The data plane key and certificate pair is used by [ai.vespa.hosted.cd.Endpoint](https://github.com/vespa-engine/vespa/blob/master/tenant-cd-api/src/main/java/ai/vespa/hosted/cd/Endpoint.java) to access the application endpoint. See the [Vespa Cloud API reference](https://cloud.vespa.ai/en/reference/vespa-cloud-api) for details on configuring Maven invocations. Note that the `-D vespa.test.config` argument is gone; this configuration is automatically fetched from the Vespa Cloud API—hence the need for the API key. When running Vespa self-hosted like in the [sample application](/en/deploy-an-application-local.html), no authentication is required by default, to either API or container, and specifying a data plane key and certificate will instead cause the test to fail, since the correct SSL context is the Java default in this case. Make sure the TestRuntime is able to start. As it will init an SSL context, make sure to remove config when running locally, in order to use a default context. Remove properties from _pom.xml_ and IDE debug configuration. Developers can also set these parameters in the IDE run configuration to debug system tests: ``` -D test.categories=system -D tenant=my_tenant -D application=my_app -D instance=my_instance -D apiKeyFile=/path/to/myname.mytenant.pem -D dataPlaneCertificateFile=data-plane-public-cert.pem -D dataPlaneKeyFile=data-plane-private-key.pem ``` ##### Tips and troubleshooting - Vespa Cloud upgrades daily, and applications in `dev` also have their Vespa platform upgraded. This usually happens at the opposite time of day of when deployments are made to each instance, and takes some minutes. Deployments without redundancy will be unavailable during the upgrade. - Failure to deploy, due to authentication (HTTP code 401) or authorization (HTTP code 403), is most often due to wrong configuration of `tenant` and/or `application`, when using command line tools to deploy. Ensure the values set with Vespa CLI or in `pom.xml` match what is configured in the UI. For Maven, also see [here](https://cloud.vespa.ai/en/reference/vespa-cloud-api) for details. - In case of data plane failure, remember to copy the public certificate to `src/main/application/security/clients.pem` before building and deploying. This is handled by the Vespa CLI `vespa auth cert` command. 
- To run Java [system and staging tests](/en/reference/testing-java.html) in an IDE, ensure all API and data plane keys and certificates are configured in the IDE as well; not all IDEs pick up all settings from `pom.xml` correctly.

---

## Developing Request Handlers

### Developing request handlers

This document explains how to implement and deploy a custom request handler.

In most cases, implementing your own request handlers is unnecessary, as both searchers and processors can access the request data directly. However, there are a few cases where custom request handlers are useful:

1. You need to implement a custom REST API.
2. Your application needs to control which parameters are used to route requests to a particular search or processing chain.

##### Implementing a request handler

Upon receiving a request, the request handler must consume its content, process it, and then return a response. The most convenient way to implement a request handler is by subclassing the [ThreadedHttpRequestHandler](https://javadoc.io/doc/com.yahoo.vespa/container-core/latest/com/yahoo/container/jdisc/ThreadedHttpRequestHandler.html). This utility base class uses a synchronous API and a multithreaded execution model. It also implements a lot of functionality that is needed by most request handlers:

- queries are automatically written to the access log
- an HTTP date header is added to the response (if your own code adds a date header, it will not be overwritten, though)
- logging of exceptions and queries that time out
- automatic shutdown when an Error is thrown

###### Example request handler implementations

The [Vespa sample apps](https://github.com/vespa-engine/sample-apps) on GitHub contain a few example request handler implementations:

| Handler | Description |
| --- | --- |
| [DemoHandler](https://github.com/vespa-engine/sample-apps/blob/master/examples/http-api-using-request-handlers-and-processors/src/main/java/ai/vespa/examples/DemoHandler.java) | A handler that modifies a request before dispatching it to the `ProcessingHandler`. This handler is also used in the [HTTP API tutorial](http-api-tutorial.html). Note that since this depends on ProcessingHandler, you must add `<processing/>` to your `<container>` tag to use it. If you want to issue Queries instead, have `com.yahoo.search.searchchain.ExecutionFactory` injected and use it to create executions and call search/fill on them. |

##### Deploying a request handler

To deploy a request handler in an application, use the [handler](../reference/services-container.html#handler) element in _services.xml_ (the id and bundle values below are illustrative):

```
<handler id="ai.vespa.examples.DemoHandler" bundle="my-bundle">
    <binding>http://*/*</binding>
</handler>
```

A request handler may be bound to zero or more URI patterns by adding a [binding](../reference/services-container.html#binding) element for each pattern.
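As a rough, self-contained sketch of the approach described above (the class, package and parameter names are made up for illustration and are not from the sample apps), a minimal handler could look like:

```java
package ai.vespa.examples;

import com.yahoo.component.annotation.Inject;
import com.yahoo.container.jdisc.HttpRequest;
import com.yahoo.container.jdisc.HttpResponse;
import com.yahoo.container.jdisc.ThreadedHttpRequestHandler;

import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executor;

// Minimal sketch: echoes a request parameter. Names are illustrative only.
public class EchoHandler extends ThreadedHttpRequestHandler {

    @Inject
    public EchoHandler(Executor executor) {
        super(executor); // the base class runs handle() in the injected executor
    }

    @Override
    public HttpResponse handle(HttpRequest request) {
        String message = request.getProperty("message"); // query parameter, e.g. ?message=hello
        return new HttpResponse(200) {
            @Override
            public void render(OutputStream outputStream) throws IOException {
                outputStream.write(("echo: " + message).getBytes(StandardCharsets.UTF_8));
            }
        };
    }
}
```

Such a handler would then be declared in _services.xml_ with a `handler` element and one or more bindings, as shown above.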
---

## Developing Server Providers

### Developing server providers

The [com.yahoo.jdisc.service.ServerProvider](https://javadoc.io/doc/com.yahoo.vespa/jdisc_core/latest/com/yahoo/jdisc/service/ServerProvider.html) interface defines a component that is capable of acting as a server for an external client. This document explains how to implement and deploy a custom server provider.

All requests that are processed in a JDisc application are created by server providers. These are the parts of the JDisc Container that accept incoming connections. Upon accepting a request from an external client, the server provider must create and dispatch a corresponding `com.yahoo.jdisc.Request` instance. Upon receiving the `com.yahoo.jdisc.Response`, the server needs to respond to the client.

To implement a server provider, either implement the [ServerProvider](https://javadoc.io/doc/com.yahoo.vespa/jdisc_core/latest/com/yahoo/jdisc/service/ServerProvider.html) interface directly, or subclass the more convenient [AbstractServerProvider](https://javadoc.io/doc/com.yahoo.vespa/jdisc_core/latest/com/yahoo/jdisc/service/AbstractServerProvider.html). Please note the following:

- All server providers require a local reference to `CurrentContainer`. Declare that as a constructor argument (which triggers [injection](../jdisc/injecting-components.html)), and store it locally.
- All requests dispatched by a server provider should be "server" requests (i.e. requests whose `isServerRequest()` method returns `true`). To create such a request, use [this constructor](https://javadoc.io/doc/com.yahoo.vespa/jdisc_core/latest/com/yahoo/jdisc/Request.html#Request-com.yahoo.jdisc.service.CurrentContainer-java.net.URI-).
- The code necessary to dispatch a request and write its content into the returned `ContentChannel` is the same as for [dispatching a client request](low-level-request-handlers.html#dispatching-a-client-request) from a request handler.
- The code necessary to handle the response and its content is the same as for [handling a client response](low-level-request-handlers.html#handling-a-client-response) in a request handler.

To install a server provider in a container, use the [server](../reference/services-container.html#server) element in _services.xml_, e.g. (the id and bundle values are illustrative):

```
<server id="ai.vespa.examples.MyServerProvider" bundle="my-bundle" />
```

---

## Developing Web Services

### Developing Web Service Applications

This document explains how to develop (REST) web service type applications on the container - design options, accessing the request path, returning a status code etc.
There are two types of web service APIs:

- Fine-grained APIs with closed semantics – for example _return the number of stars of an article_
- Coarse-grained APIs with open semantics – for example _return a page containing the most relevant mixture of stuff for this user and action_

With coarse-grained APIs, the container can help handle the complexity typically involved in the implementation of such APIs by providing a way to compose and federate the components contributing to processing the request and to providing and modifying the returned data, and a way to allow such requests to start returning before they are finished, to reduce latency with large responses. This is the [processing](jdisc/processing.html) framework (or, in the case of search-like applications, the [searcher](searcher-development.html) specialization).

In addition, the [container](reference/component-reference.html#component-types) features a generic mechanism allowing a [request handler](jdisc/developing-request-handlers.html) to be [bound](reference/component-reference.html#binding) to a URI pattern and invoked to handle all requests matching that pattern. This is useful where there is no need to handle complexity and/or federation of various kinds of data in the response. Both the approaches above are actually implemented as built-in request handlers.

A custom request handler may be written to parse the URL path/method and dispatch to an appropriate chain of processing components. A "main" processing chain may be written to do the same by dispatching to other chains. The simplest way to invoke a specific chain of processors is to forward a query to the `ProcessingHandler` with the request property `chain` set to the name of the chain to invoke:

```
import com.yahoo.component.annotation.Inject;

public class DemoHandler extends com.yahoo.container.jdisc.ThreadedHttpRequestHandler {

    ...

    @Inject
    public DemoHandler(Executor executor, ProcessingHandler processingHandler) {
        super(executor);
        this.processingHandler = processingHandler;
    }

    ...

    @Override
    public HttpResponse handle(HttpRequest request) {
        HttpRequest processingRequest = new HttpRequest.Builder(request)
                .put(com.yahoo.processing.Request.CHAIN, "theProcessingChainIWant")
                .createDirectRequest();
        HttpResponse r = processingHandler.handle(processingRequest);
        return r;
    }

    ...

}
```

##### Accessing the HTTP request

Custom [request handlers](jdisc/developing-request-handlers.html) are given a [com.yahoo.container.jdisc.HttpRequest](https://javadoc.io/doc/com.yahoo.vespa/container-core/latest/com/yahoo/container/jdisc/HttpRequest.html), with direct access to associated properties and request data.

In [Processing](jdisc/processing.html), the Processors are given a [com.yahoo.processing.Request](https://javadoc.io/doc/com.yahoo.vespa/container-core/latest/com/yahoo/processing/Request.html) containing the HTTP URL parameters:

```
// url parameters are added to properties
String urlParameter = request.properties().get("urlParameterName");

// jdisc request context is added with prefix context
Object contextValue = request.properties().get("context.contextKey");
```

If needed, a Processor can retrieve the entire HTTP request via a utility function:

```
import com.yahoo.container.jdisc.HttpRequest;
...
// Retrieve the underlying HTTP request:
Optional<HttpRequest> httpRequest = HttpRequest.getHttpRequest(request);
if (httpRequest.isPresent()) {
    // The POST data input stream:
    InputStream in = httpRequest.get().getData();
    // The HTTP method:
    Method method = httpRequest.get().getMethod();
}
```

###### Setting the HTTP status and HTTP headers

In Processing, the return status can be set by adding a special Data item to the Response:

```
response.data().add(new com.yahoo.processing.handler.ResponseStatus(404, request));
```

If no such data element is present, the status will be determined by the container. If the response contains data able to render, it will be 200; otherwise it will be determined by any ErrorMessage present in the response.

###### Setting response headers from Processors

Response headers may be added to any Response by adding instances of `com.yahoo.processing.handler.ResponseHeaders` to the Response (ResponseHeaders is a kind of response Data). Multiple instances of this may be added to the Response, and the complete set of headers returned is the superset of all such objects. Example Processor:

```
processingResponse.data().add(new com.yahoo.processing.handler.ResponseHeaders(myHeaders, request));
```

Request handlers may in general set their return status and manipulate headers directly on the HttpRequest.

##### Queries

Sometimes all that is needed is letting the standard query framework reply for more paths than the default. This is possible by adding extra [binding](reference/services-search.html#binding)s inside the `<search>` element in `services.xml`. Writing a custom [request handler](jdisc/developing-request-handlers.html) is recommended if the application is a standalone HTTP API, and especially if there are properties used with the same name as those in the [Query API](reference/query-api-reference.html). A request handler may query the search components running in the same container without any appreciable overhead:

###### Invoking Vespa queries from a component

To invoke Vespa queries from a component, have an instance of [ExecutionFactory](https://github.com/vespa-engine/vespa/blob/master/container-search/src/main/java/com/yahoo/search/searchchain/ExecutionFactory.java) injected in the constructor and use its API to construct and issue the query. The container this runs in must include the `<search>` tag for the ExecutionFactory to be available. Example:

```
import com.yahoo.component.annotation.Inject;
import com.yahoo.component.ComponentId;
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.component.chain.Chain;
import com.yahoo.search.searchchain.Execution;
import com.yahoo.search.searchchain.ExecutionFactory;

public class MyComponent {

    private final ExecutionFactory executionFactory;

    @Inject
    public MyComponent(ExecutionFactory executionFactory) {
        this.executionFactory = executionFactory;
    }

    Result executeQuery(Query query, String chainId) {
        Chain searchChain = executionFactory.searchChainRegistry().getChain(new ComponentId(chainId));
        Execution execution = executionFactory.newExecution(searchChain);
        query.getModel().setExecution(execution);
        return execution.search(query);
    }

}
```

ExecutionFactory depends on the search chains, so it cannot be injected into any component which is itself part of the search chains. But from within a Searcher it is not needed, as the Execution passed gives what is needed:

- Access the search chains: execution.context().searchChainRegistry().
- Create a new Execution: `new Execution(mySearchChain, execution.context())`. This is the right way, since it ties that execution to the one you're in.

Hence, one cannot execute a search chain from a search chain component constructor, e.g. to refresh a cache - the search chains can't be constructed until that constructor returns. An alternative is to extract the refreshing into a separate component which has both the client and the execution factory injected into it.

---

## Distributor Metrics Reference

### Distributor Metrics

| Name | Unit | Description | | --- | --- | --- | | vds.idealstate.buckets\_rechecking | bucket | The number of buckets that we are rechecking for ideal state operations | | vds.idealstate.idealstate\_diff | bucket | A number representing the current difference from the ideal state. This is a number that decreases steadily as the system is getting closer to the ideal state | | vds.idealstate.buckets\_toofewcopies | bucket | The number of buckets the distributor controls that have less than the desired redundancy | | vds.idealstate.buckets\_toomanycopies | bucket | The number of buckets the distributor controls that have more than the desired redundancy | | vds.idealstate.buckets | bucket | The number of buckets the distributor controls | | vds.idealstate.buckets\_notrusted | bucket | The number of buckets that have no trusted copies. | | vds.idealstate.bucket\_replicas\_moving\_out | bucket | Bucket replicas that should be moved out, e.g. retirement case or node added to cluster that has higher ideal state priority. | | vds.idealstate.bucket\_replicas\_copying\_out | bucket | Bucket replicas that should be copied out, e.g. node is in ideal state but might have to provide data to other nodes in a merge | | vds.idealstate.bucket\_replicas\_copying\_in | bucket | Bucket replicas that should be copied in, e.g. node does not have a replica for a bucket that it is in ideal state for | | vds.idealstate.bucket\_replicas\_syncing | bucket | Bucket replicas that need syncing due to mismatching metadata | | vds.idealstate.max\_observed\_time\_since\_last\_gc\_sec | second | Maximum time (in seconds) since GC was last successfully run for a bucket. Aggregated max value across all buckets on the distributor.
| | vds.idealstate.delete\_bucket.done\_ok | operation | The number of operations successfully performed | | vds.idealstate.delete\_bucket.done\_failed | operation | The number of operations that failed | | vds.idealstate.delete\_bucket.pending | operation | The number of operations pending | | vds.idealstate.delete\_bucket.blocked | operation | The number of operations blocked by blocking operation starter | | vds.idealstate.delete\_bucket.throttled | operation | The number of operations throttled by throttling operation starter | | vds.idealstate.merge\_bucket.done\_ok | operation | The number of operations successfully performed | | vds.idealstate.merge\_bucket.done\_failed | operation | The number of operations that failed | | vds.idealstate.merge\_bucket.pending | operation | The number of operations pending | | vds.idealstate.merge\_bucket.blocked | operation | The number of operations blocked by blocking operation starter | | vds.idealstate.merge\_bucket.throttled | operation | The number of operations throttled by throttling operation starter | | vds.idealstate.merge\_bucket.source\_only\_copy\_changed | operation | The number of merge operations where source-only copy changed | | vds.idealstate.merge\_bucket.source\_only\_copy\_delete\_blocked | operation | The number of merge operations where delete of unchanged source-only copies was blocked | | vds.idealstate.merge\_bucket.source\_only\_copy\_delete\_failed | operation | The number of merge operations where delete of unchanged source-only copies failed | | vds.idealstate.split\_bucket.done\_ok | operation | The number of operations successfully performed | | vds.idealstate.split\_bucket.done\_failed | operation | The number of operations that failed | | vds.idealstate.split\_bucket.pending | operation | The number of operations pending | | vds.idealstate.split\_bucket.blocked | operation | The number of operations blocked by blocking operation starter | | vds.idealstate.split\_bucket.throttled | operation | The number of operations throttled by throttling operation starter | | vds.idealstate.join\_bucket.done\_ok | operation | The number of operations successfully performed | | vds.idealstate.join\_bucket.done\_failed | operation | The number of operations that failed | | vds.idealstate.join\_bucket.pending | operation | The number of operations pending | | vds.idealstate.join\_bucket.blocked | operation | The number of operations blocked by blocking operation starter | | vds.idealstate.join\_bucket.throttled | operation | The number of operations throttled by throttling operation starter | | vds.idealstate.garbage\_collection.done\_ok | operation | The number of operations successfully performed | | vds.idealstate.garbage\_collection.done\_failed | operation | The number of operations that failed | | vds.idealstate.garbage\_collection.pending | operation | The number of operations pending | | vds.idealstate.garbage\_collection.documents\_removed | document | Number of documents removed by GC operations | | vds.idealstate.garbage\_collection.blocked | operation | The number of operations blocked by blocking operation starter | | vds.idealstate.garbage\_collection.throttled | operation | The number of operations throttled by throttling operation starter | | vds.distributor.puts.latency | millisecond | The latency of put operations | | vds.distributor.puts.ok | operation | The number of successful put operations performed | | vds.distributor.puts.failures.total | operation | Sum of all failures | | 
vds.distributor.puts.failures.notfound | operation | The number of operations that failed because the document did not exist | | vds.distributor.puts.failures.test\_and\_set\_failed | operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | | vds.distributor.puts.failures.concurrent\_mutations | operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | | vds.distributor.puts.failures.notconnected | operation | The number of operations discarded because there were no available storage nodes to send to | | vds.distributor.puts.failures.notready | operation | The number of operations discarded because distributor was not ready | | vds.distributor.puts.failures.wrongdistributor | operation | The number of operations discarded because they were sent to the wrong distributor | | vds.distributor.puts.failures.safe\_time\_not\_reached | operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | | vds.distributor.puts.failures.storagefailure | operation | The number of operations that failed in storage | | vds.distributor.puts.failures.timeout | operation | The number of operations that failed because the operation timed out towards storage | | vds.distributor.puts.failures.busy | operation | The number of messages from storage that failed because the storage node was busy | | vds.distributor.puts.failures.inconsistent\_bucket | operation | The number of operations failed due to buckets being in an inconsistent state or not found | | vds.distributor.removes.latency | millisecond | The latency of remove operations | | vds.distributor.removes.ok | operation | The number of successful removes operations performed | | vds.distributor.removes.failures.total | operation | Sum of all failures | | vds.distributor.removes.failures.notfound | operation | The number of operations that failed because the document did not exist | | vds.distributor.removes.failures.test\_and\_set\_failed | operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | | vds.distributor.removes.failures.concurrent\_mutations | operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | | vds.distributor.removes.failures.busy | operation | The number of messages from storage that failed because the storage node was busy | | vds.distributor.removes.failures.inconsistent\_bucket | operation | The number of operations failed due to buckets being in an inconsistent state or not found | | vds.distributor.removes.failures.notconnected | operation | The number of operations discarded because there were no available storage nodes to send to | | vds.distributor.removes.failures.notready | operation | The number of operations discarded because distributor was not ready | | vds.distributor.removes.failures.safe\_time\_not\_reached | operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | | vds.distributor.removes.failures.storagefailure | operation | The number of operations that failed in storage | | vds.distributor.removes.failures.timeout | operation | The number of operations that failed because the 
operation timed out towards storage | | vds.distributor.removes.failures.wrongdistributor | operation | The number of operations discarded because they were sent to the wrong distributor | | vds.distributor.updates.latency | millisecond | The latency of update operations | | vds.distributor.updates.ok | operation | The number of successful updates operations performed | | vds.distributor.updates.failures.total | operation | Sum of all failures | | vds.distributor.updates.failures.notfound | operation | The number of operations that failed because the document did not exist | | vds.distributor.updates.failures.test\_and\_set\_failed | operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | | vds.distributor.updates.failures.concurrent\_mutations | operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | | vds.distributor.updates.diverging\_timestamp\_updates | operation | Number of updates that report they were performed against divergent version timestamps on different replicas | | vds.distributor.updates.failures.busy | operation | The number of messages from storage that failed because the storage node was busy | | vds.distributor.updates.failures.inconsistent\_bucket | operation | The number of operations failed due to buckets being in an inconsistent state or not found | | vds.distributor.updates.failures.notconnected | operation | The number of operations discarded because there were no available storage nodes to send to | | vds.distributor.updates.failures.notready | operation | The number of operations discarded because distributor was not ready | | vds.distributor.updates.failures.safe\_time\_not\_reached | operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | | vds.distributor.updates.failures.storagefailure | operation | The number of operations that failed in storage | | vds.distributor.updates.failures.timeout | operation | The number of operations that failed because the operation timed out towards storage | | vds.distributor.updates.failures.wrongdistributor | operation | The number of operations discarded because they were sent to the wrong distributor | | vds.distributor.updates.fast\_path\_restarts | operation | Number of safe path (write repair) updates that were restarted as fast path updates because all replicas returned documents with the same timestamp in the initial read phase | | vds.distributor.removelocations.ok | operation | The number of successful removelocations operations performed | | vds.distributor.removelocations.failures.total | operation | Sum of all failures | | vds.distributor.removelocations.failures.busy | operation | The number of messages from storage that failed because the storage node was busy | | vds.distributor.removelocations.failures.concurrent\_mutations | operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | | vds.distributor.removelocations.failures.inconsistent\_bucket | operation | The number of operations failed due to buckets being in an inconsistent state or not found | | vds.distributor.removelocations.failures.notconnected | operation | The number of operations discarded because there were no available storage nodes to send to | | 
vds.distributor.removelocations.failures.notfound | operation | The number of operations that failed because the document did not exist | | vds.distributor.removelocations.failures.notready | operation | The number of operations discarded because distributor was not ready | | vds.distributor.removelocations.failures.safe\_time\_not\_reached | operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | | vds.distributor.removelocations.failures.storagefailure | operation | The number of operations that failed in storage | | vds.distributor.removelocations.failures.test\_and\_set\_failed | operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | | vds.distributor.removelocations.failures.timeout | operation | The number of operations that failed because the operation timed out towards storage | | vds.distributor.removelocations.failures.wrongdistributor | operation | The number of operations discarded because they were sent to the wrong distributor | | vds.distributor.removelocations.latency | millisecond | The average latency of removelocations operations | | vds.distributor.gets.latency | millisecond | The average latency of gets operations | | vds.distributor.gets.ok | operation | The number of successful gets operations performed | | vds.distributor.gets.failures.total | operation | Sum of all failures | | vds.distributor.gets.failures.notfound | operation | The number of operations that failed because the document did not exist | | vds.distributor.gets.failures.busy | operation | The number of messages from storage that failed because the storage node was busy | | vds.distributor.gets.failures.concurrent\_mutations | operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | | vds.distributor.gets.failures.inconsistent\_bucket | operation | The number of operations failed due to buckets being in an inconsistent state or not found | | vds.distributor.gets.failures.notconnected | operation | The number of operations discarded because there were no available storage nodes to send to | | vds.distributor.gets.failures.notready | operation | The number of operations discarded because distributor was not ready | | vds.distributor.gets.failures.safe\_time\_not\_reached | operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | | vds.distributor.gets.failures.storagefailure | operation | The number of operations that failed in storage | | vds.distributor.gets.failures.test\_and\_set\_failed | operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | | vds.distributor.gets.failures.timeout | operation | The number of operations that failed because the operation timed out towards storage | | vds.distributor.gets.failures.wrongdistributor | operation | The number of operations discarded because they were sent to the wrong distributor | | vds.distributor.visitor.latency | millisecond | The average latency of visitor operations | | vds.distributor.visitor.ok | operation | The number of successful visitor operations performed | | vds.distributor.visitor.failures.total | operation | Sum of all failures | | 
vds.distributor.visitor.failures.notready | operation | The number of operations discarded because distributor was not ready | | vds.distributor.visitor.failures.notconnected | operation | The number of operations discarded because there were no available storage nodes to send to | | vds.distributor.visitor.failures.wrongdistributor | operation | The number of operations discarded because they were sent to the wrong distributor | | vds.distributor.visitor.failures.safe\_time\_not\_reached | operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | | vds.distributor.visitor.failures.storagefailure | operation | The number of operations that failed in storage | | vds.distributor.visitor.failures.timeout | operation | The number of operations that failed because the operation timed out towards storage | | vds.distributor.visitor.failures.busy | operation | The number of messages from storage that failed because the storage node was busy | | vds.distributor.visitor.failures.inconsistent\_bucket | operation | The number of operations failed due to buckets being in an inconsistent state or not found | | vds.distributor.visitor.failures.notfound | operation | The number of operations that failed because the document did not exist | | vds.distributor.visitor.bytes\_per\_visitor | operation | The number of bytes visited on content nodes as part of a single client visitor command | | vds.distributor.visitor.docs\_per\_visitor | operation | The number of documents visited on content nodes as part of a single client visitor command | | vds.distributor.visitor.failures.concurrent\_mutations | operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | | vds.distributor.visitor.failures.test\_and\_set\_failed | operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | | vds.distributor.docsstored | document | Number of documents stored in all buckets controlled by this distributor | | vds.distributor.bytesstored | byte | Number of bytes stored in all buckets controlled by this distributor | | metricmanager.periodichooklatency | millisecond | Time in ms used to update a single periodic hook | | metricmanager.resetlatency | millisecond | Time in ms used to reset all metrics. 
| | metricmanager.sleeptime | millisecond | Time in ms worker thread is sleeping | | metricmanager.snapshothooklatency | millisecond | Time in ms used to update a single snapshot hook | | metricmanager.snapshotlatency | millisecond | Time in ms used to take a snapshot | | vds.distributor.activate\_cluster\_state\_processing\_time | millisecond | Elapsed time where the distributor thread is blocked on merging pending bucket info into its bucket database upon activating a cluster state | | vds.distributor.bucket\_db.memory\_usage.allocated\_bytes | byte | The number of allocated bytes | | vds.distributor.bucket\_db.memory\_usage.dead\_bytes | byte | The number of dead bytes (\<= used\_bytes) | | vds.distributor.bucket\_db.memory\_usage.onhold\_bytes | byte | The number of bytes on hold | | vds.distributor.bucket\_db.memory\_usage.used\_bytes | byte | The number of used bytes (\<= allocated\_bytes) | | vds.distributor.getbucketlists.failures.busy | operation | The number of messages from storage that failed because the storage node was busy | | vds.distributor.getbucketlists.failures.concurrent\_mutations | operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | | vds.distributor.getbucketlists.failures.inconsistent\_bucket | operation | The number of operations failed due to buckets being in an inconsistent state or not found | | vds.distributor.getbucketlists.failures.notconnected | operation | The number of operations discarded because there were no available storage nodes to send to | | vds.distributor.getbucketlists.failures.notfound | operation | The number of operations that failed because the document did not exist | | vds.distributor.getbucketlists.failures.notready | operation | The number of operations discarded because distributor was not ready | | vds.distributor.getbucketlists.failures.safe\_time\_not\_reached | operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | | vds.distributor.getbucketlists.failures.storagefailure | operation | The number of operations that failed in storage | | vds.distributor.getbucketlists.failures.test\_and\_set\_failed | operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | | vds.distributor.getbucketlists.failures.timeout | operation | The number of operations that failed because the operation timed out towards storage | | vds.distributor.getbucketlists.failures.total | operation | Total number of failures | | vds.distributor.getbucketlists.failures.wrongdistributor | operation | The number of operations discarded because they were sent to the wrong distributor | | vds.distributor.getbucketlists.latency | millisecond | The average latency of getbucketlists operations | | vds.distributor.getbucketlists.ok | operation | The number of successful getbucketlists operations performed | | vds.distributor.recoverymodeschedulingtime | millisecond | Time spent scheduling operations in recovery mode after receiving new cluster state | | vds.distributor.set\_cluster\_state\_processing\_time | millisecond | Elapsed time where the distributor thread is blocked on processing its bucket database upon receiving a new cluster state | | vds.distributor.state\_transition\_time | millisecond | Time it takes to complete a cluster state transition. 
If a state transition is preempted before completing, its elapsed time is counted as part of the total time spent for the final, completed state transition | | vds.distributor.stats.failures.busy | operation | The number of messages from storage that failed because the storage node was busy | | vds.distributor.stats.failures.concurrent\_mutations | operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | | vds.distributor.stats.failures.inconsistent\_bucket | operation | The number of operations failed due to buckets being in an inconsistent state or not found | | vds.distributor.stats.failures.notconnected | operation | The number of operations discarded because there were no available storage nodes to send to | | vds.distributor.stats.failures.notfound | operation | The number of operations that failed because the document did not exist | | vds.distributor.stats.failures.notready | operation | The number of operations discarded because distributor was not ready | | vds.distributor.stats.failures.safe\_time\_not\_reached | operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | | vds.distributor.stats.failures.storagefailure | operation | The number of operations that failed in storage | | vds.distributor.stats.failures.test\_and\_set\_failed | operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | | vds.distributor.stats.failures.timeout | operation | The number of operations that failed because the operation timed out towards storage | | vds.distributor.stats.failures.total | operation | The total number of failures | | vds.distributor.stats.failures.wrongdistributor | operation | The number of operations discarded because they were sent to the wrong distributor | | vds.distributor.stats.latency | millisecond | The average latency of stats operations | | vds.distributor.stats.ok | operation | The number of successful stats operations performed | | vds.distributor.update\_gets.failures.busy | operation | The number of messages from storage that failed because the storage node was busy | | vds.distributor.update\_gets.failures.concurrent\_mutations | operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | | vds.distributor.update\_gets.failures.inconsistent\_bucket | operation | The number of operations failed due to buckets being in an inconsistent state or not found | | vds.distributor.update\_gets.failures.notconnected | operation | The number of operations discarded because there were no available storage nodes to send to | | vds.distributor.update\_gets.failures.notfound | operation | The number of operations that failed because the document did not exist | | vds.distributor.update\_gets.failures.notready | operation | The number of operations discarded because distributor was not ready | | vds.distributor.update\_gets.failures.safe\_time\_not\_reached | operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | | vds.distributor.update\_gets.failures.storagefailure | operation | The number of operations that failed in storage | | vds.distributor.update\_gets.failures.test\_and\_set\_failed | operation | The number of mutating 
operations that failed because they specified a test-and-set condition that did not match the existing document | | vds.distributor.update\_gets.failures.timeout | operation | The number of operations that failed because the operation timed out towards storage | | vds.distributor.update\_gets.failures.total | operation | The total number of failures | | vds.distributor.update\_gets.failures.wrongdistributor | operation | The number of operations discarded because they were sent to the wrong distributor | | vds.distributor.update\_gets.latency | millisecond | The average latency of update\_gets operations | | vds.distributor.update\_gets.ok | operation | The number of successful update\_gets operations performed | | vds.distributor.update\_metadata\_gets.failures.busy | operation | The number of messages from storage that failed because the storage node was busy | | vds.distributor.update\_metadata\_gets.failures.concurrent\_mutations | operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | | vds.distributor.update\_metadata\_gets.failures.inconsistent\_bucket | operation | The number of operations failed due to buckets being in an inconsistent state or not found | | vds.distributor.update\_metadata\_gets.failures.notconnected | operation | The number of operations discarded because there were no available storage nodes to send to | | vds.distributor.update\_metadata\_gets.failures.notfound | operation | The number of operations that failed because the document did not exist | | vds.distributor.update\_metadata\_gets.failures.notready | operation | The number of operations discarded because distributor was not ready | | vds.distributor.update\_metadata\_gets.failures.safe\_time\_not\_reached | operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | | vds.distributor.update\_metadata\_gets.failures.storagefailure | operation | The number of operations that failed in storage | | vds.distributor.update\_metadata\_gets.failures.test\_and\_set\_failed | operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | | vds.distributor.update\_metadata\_gets.failures.timeout | operation | The number of operations that failed because the operation timed out towards storage | | vds.distributor.update\_metadata\_gets.failures.total | operation | The total number of failures | | vds.distributor.update\_metadata\_gets.failures.wrongdistributor | operation | The number of operations discarded because they were sent to the wrong distributor | | vds.distributor.update\_metadata\_gets.latency | millisecond | The average latency of update\_metadata\_gets operations | | vds.distributor.update\_metadata\_gets.ok | operation | The number of successful update\_metadata\_gets operations performed | | vds.distributor.update\_puts.failures.busy | operation | The number of messages from storage that failed because the storage node was busy | | vds.distributor.update\_puts.failures.concurrent\_mutations | operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | | vds.distributor.update\_puts.failures.inconsistent\_bucket | operation | The number of operations failed due to buckets being in an inconsistent state or not found | | 
vds.distributor.update\_puts.failures.notconnected | operation | The number of operations discarded because there were no available storage nodes to send to | | vds.distributor.update\_puts.failures.notfound | operation | The number of operations that failed because the document did not exist | | vds.distributor.update\_puts.failures.notready | operation | The number of operations discarded because distributor was not ready | | vds.distributor.update\_puts.failures.safe\_time\_not\_reached | operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | | vds.distributor.update\_puts.failures.storagefailure | operation | The number of operations that failed in storage | | vds.distributor.update\_puts.failures.test\_and\_set\_failed | operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | | vds.distributor.update\_puts.failures.timeout | operation | The number of operations that failed because the operation timed out towards storage | | vds.distributor.update\_puts.failures.total | operation | The total number of put failures | | vds.distributor.update\_puts.failures.wrongdistributor | operation | The number of operations discarded because they were sent to the wrong distributor | | vds.distributor.update\_puts.latency | millisecond | The average latency of update\_puts operations | | vds.distributor.update\_puts.ok | operation | The number of successful update\_puts operations performed | | vds.idealstate.nodes\_per\_merge | node | The number of nodes involved in a single merge operation. | | vds.idealstate.set\_bucket\_state.blocked | operation | The number of operations blocked by blocking operation starter | | vds.idealstate.set\_bucket\_state.done\_failed | operation | The number of operations that failed | | vds.idealstate.set\_bucket\_state.done\_ok | operation | The number of operations successfully performed | | vds.idealstate.set\_bucket\_state.pending | operation | The number of operations pending | | vds.idealstate.set\_bucket\_state.throttled | operation | The number of operations throttled by throttling operation starter | | vds.bouncer.clock\_skew\_aborts | operation | Number of client operations that were aborted due to clock skew between sender and receiver exceeding acceptable range |

---

## Docker Containers

### Docker containers

This document describes tuning and adaptations for running Vespa Docker containers, for developer use on a laptop, and in production.

##### Mounting persistent volumes

The [quick start](/en/deploy-an-application-local.html) and [AWS ECS multinode](/en/operations-selfhosted/multinode-systems.html#aws-ecs) guides show how to run Vespa in Docker containers. In these examples, all the data is stored inside the container - the data is lost if the container is deleted. When running Vespa inside Docker containers in production, volume mappings to the parent host should be added to persist data and logs:
- /opt/vespa/var
- /opt/vespa/logs

```
$ mkdir -p /tmp/vespa/var;  export VESPA_VAR_STORAGE=/tmp/vespa/var
$ mkdir -p /tmp/vespa/logs; export VESPA_LOG_STORAGE=/tmp/vespa/logs
$ docker run --detach --name vespa --hostname vespa-container \
  --volume $VESPA_VAR_STORAGE:/opt/vespa/var \
  --volume $VESPA_LOG_STORAGE:/opt/vespa/logs \
  --publish 8080:8080 \
  vespaengine/vespa
```

##### Start Vespa container with Vespa user

You can start the container directly as the _vespa_ user. The _vespa_ user and group within the container are configured with user id _1000_ and group id _1000_. The vespa user and group must be the owner of the _/opt/vespa/var_ and _/opt/vespa/logs_ volumes that are mounted in the container for Vespa to start. This is required for Vespa to create the required directories and files within those directories. The start script will check that the correct owner uid and gid are set and fail if the wrong user or group is set as the owner.

When using an isolated user namespace for the Vespa container, you must set the uid and gid of the directories on the host to the subordinate uid and gid, depending on your mapping. See the [Docker documentation](https://docs.docker.com/engine/security/userns-remap/) for more details.

```
$ mkdir -p /tmp/vespa/var;  export VESPA_VAR_STORAGE=/tmp/vespa/var
$ mkdir -p /tmp/vespa/logs; export VESPA_LOG_STORAGE=/tmp/vespa/logs
$ sudo chown -R 1000:1000 $VESPA_VAR_STORAGE $VESPA_LOG_STORAGE
$ docker run --detach --name vespa --user vespa:vespa --hostname vespa-container \
  --volume $VESPA_VAR_STORAGE:/opt/vespa/var \
  --volume $VESPA_LOG_STORAGE:/opt/vespa/logs \
  --publish 8080:8080 \
  vespaengine/vespa
```

##### System limits

When Vespa starts inside Docker containers, the startup scripts will set [system limits](/en/operations-selfhosted/files-processes-and-ports.html#vespa-system-limits). Make sure that the environment starting the Docker engine is set up in such a way that these limits can be set inside the containers. For a CentOS/RHEL base host, Docker is usually started by [systemd](https://www.freedesktop.org/software/systemd/man/systemd.exec.html). In this case, `LimitNOFILE`, `LimitNPROC` and `LimitCORE` should be set to meet the minimum requirements in [system limits](/en/operations-selfhosted/files-processes-and-ports.html#vespa-system-limits).

In general, when using Docker or Podman to run Vespa, the `--ulimit` option should be used to set limits according to [system limits](/en/operations-selfhosted/files-processes-and-ports.html#vespa-system-limits). The `--pids-limit` should be set to unlimited (`-1` for Docker and `0` for Podman).

##### Transparent Huge Pages

Vespa performance improves significantly by enabling [Transparent Huge Pages (THP)](https://www.kernel.org/doc/html/latest/admin-guide/mm/transhuge.html), especially for memory-intensive applications with large dense tensors with concurrent query and write workloads. One application improved query p99 latency from 950 ms to 150 ms during concurrent query and write by enabling THP. Using THP is even more important when running in virtualized environments like AWS and GCP due to nested page tables. When running Vespa using the container image, _THP_ settings must be set on the base host OS (Linux).
The recommended settings are:

```
$ echo 1 > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
$ echo always > /sys/kernel/mm/transparent_hugepage/enabled
$ echo never > /sys/kernel/mm/transparent_hugepage/defrag
```

To verify that the setting is active, check that _AnonHugePages_ is non-zero. In this case, 75 GB has been allocated using AnonHugePages:

```
$ cat /proc/meminfo | grep AnonHuge
AnonHugePages:  75986944 kB
```

Note that the Vespa container needs to be restarted after modifying the base host OS settings to make the changes effective. Vespa uses `MADV_HUGEPAGE` for memory allocations done by the [content node process (proton)](/en/proton.html).

##### Controlling which services to start

The Docker image _vespaengine/vespa_'s [start script](https://github.com/vespa-engine/docker-image/blob/master/include/start-container.sh) takes a parameter that controls which services are started inside the container.

Starting a _configserver_ container:

```
$ docker run \
  --env VESPA_CONFIGSERVERS= \
  vespaengine/vespa configserver
```

Starting a _services_ container (configserver will not be started):

```
$ docker run \
  --env VESPA_CONFIGSERVERS= \
  vespaengine/vespa services
```

Starting a container with _both configserver and services_:

```
$ docker run \
  --env VESPA_CONFIGSERVERS= \
  vespaengine/vespa configserver,services
```

This is required in the case where the configserver container should run other services like an adminserver or logserver (see [services.html](/en/reference/services.html)).

If the [VESPA\_CONFIGSERVERS](/en/operations-selfhosted/files-processes-and-ports.html#environment-variables) environment variable is not specified, it will be set to the container hostname, also see [node setup](/en/operations-selfhosted/node-setup.html#hostname). Use the [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA) sample application as a blueprint for how to set up config servers and services.

##### Graceful stop

Stopping a running _vespaengine/vespa_ container triggers a graceful shutdown, which saves time when starting the container again (i.e., data structures are flushed). If the container is shut down forcefully, the content nodes might need to restore the state from the transaction log, which might be time-consuming. There is no chance of data loss or data corruption as the data is always written and synced to persistent storage.

The default timeout for the Docker daemon to wait for the shutdown might be too low for a larger number of documents per node. The stop command below waits up to 120 seconds before terminating the running container forcefully; if the stop completes before the timeout has passed, the command returns sooner:

```
$ docker stop name -t 120
```

It is also possible to configure the default Docker daemon timeout, see [--shutdown-timeout](https://docs.docker.com/reference/cli/dockerd/).

A clean content node shutdown looks like:

```
[2025-05-02 10:07:52.052] EVENT searchnode proton.node.server stopping/1 name="storagenode" why="Stopped"
[2025-05-02 10:07:52.056] EVENT searchnode proton stopping/1 name="servicelayer" why="clean shutdown"
[2025-05-02 10:07:52.056] INFO searchnode proton.proton.server.rtchooks shutting down monitoring interface
[2025-05-02 10:07:52.058] INFO searchnode proton.searchlib.docstore.logdatastore Flushing. Disk bloat is now at 0 of 8832 at 0.00 percent
[2025-05-02 10:07:52.059] INFO searchnode proton.searchlib.docstore.logdatastore Flushing. Disk bloat is now at 0 of 8832 at 0.00 percent
[2025-05-02 10:07:52.060] INFO searchnode proton.searchlib.docstore.logdatastore Flushing. Disk bloat is now at 0 of 8840 at 0.00 percent
[2025-05-02 10:07:52.066] INFO searchnode proton.transactionlog.server Stopping TLS
[2025-05-02 10:07:52.066] INFO searchnode proton.transactionlog.server TLS Stopped
[2025-05-02 10:07:52.071] EVENT searchnode proton stopping/1 name="proton" why="clean shutdown"
[2025-05-02 10:07:52.078] EVENT config-sentinel sentinel.sentinel.service stopped/1 name="searchnode" pid=354 exitcode=0
```

##### Memory

The [sample applications](https://github.com/vespa-engine/sample-apps) and [getting started guides](/en/getting-started.html) indicate the minimum memory requirements for the Docker containers.

**Note:** Too little memory is a very common problem when testing Vespa in Docker containers. Use the below to troubleshoot before making a support request, and also see the [FAQ](/en/faq.html).

As a rule of thumb, a single-node Vespa application requires a minimum of 4 GB for the Docker container. Using `docker stats` can be useful to track memory usage:

```
$ docker stats
CONTAINER ID   NAME      CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O       PIDS
589bf5801b22   node0     213.25%   697.3MiB / 3.84GiB    17.73%    14.2kB / 11.5kB   617MB / 976MB   253
e108dde84679   node1     213.52%   492.7MiB / 3.84GiB    12.53%    15.7kB / 12.7kB   74.3MB / 924MB  252
be43aacd0bbb   node2     191.22%   497.8MiB / 3.84GiB    12.66%    19.6kB / 21.6kB   64MB / 949MB    261
```

It is not necessarily easy to verify that Vespa has started all services successfully. Symptoms of errors due to insufficient memory vary, depending on where it fails. Example: Inspect restart logs in a container named _vespa_, running the [quickstart](/en/deploy-an-application-local.html) with only 2G:

```
$ docker exec -it vespa sh -c "/opt/vespa/bin/vespa-logfmt -S config-sentinel -c sentinel.sentinel.service"
INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 2.000 seconds
INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 6.000 seconds
INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 14.000 seconds
INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 30.000 seconds
INFO : config-sentinel sentinel.sentinel.service container: will delay start by 25.173 seconds
INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 62.000 seconds
INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 126.000 seconds
INFO : config-sentinel sentinel.sentinel.service container: will delay start by 119.515 seconds
INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 254.000 seconds
INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 510.000 seconds
INFO : config-sentinel sentinel.sentinel.service container: will delay start by 501.026 seconds
INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 1022.000 seconds
INFO : config-sentinel sentinel.sentinel.service container: incremented restart penalty to 1800.000 seconds
INFO : config-sentinel sentinel.sentinel.service container: will delay start by 1793.142 seconds
```

Observe that the _container_ service restarts in a loop, with increasing pauses.
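If the container is memory-constrained, one option is to rerun it with an explicit memory limit of at least 4 GB - `--memory` is a standard Docker flag, and the ports below are just the ones used elsewhere in this guide:

```
$ docker run --detach --name vespa --hostname vespa-container \
  --memory=6g \
  --publish 8080:8080 --publish 19071:19071 \
  vespaengine/vespa
```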
A common problem is [config servers](/en/operations-selfhosted/configuration-server.html) not starting or running properly due to a lack of memory. This manifests itself as nothing listening on 19071, or deployment failures. Some guides/sample applications have specific configurations to minimize resource usage. Example from [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA):

```
$ docker run --detach --name node0 --hostname node0.vespanet \
  -e VESPA_CONFIGSERVERS=node0.vespanet,node1.vespanet,node2.vespanet \
  -e VESPA_CONFIGSERVER_JVMARGS="-Xms32M -Xmx128M" \
  -e VESPA_CONFIGPROXY_JVMARGS="-Xms32M -Xmx32M" \
  --network vespanet \
  --publish 19071:19071 --publish 19100:19100 --publish 19050:19050 --publish 20092:19092 \
  vespaengine/vespa
```

Here [VESPA\_CONFIGSERVER\_JVMARGS](/en/operations-selfhosted/files-processes-and-ports.html#environment-variables) and [VESPA\_CONFIGPROXY\_JVMARGS](/en/operations-selfhosted/files-processes-and-ports.html#environment-variables) are tweaked to the minimum for a functional test only.

**Important:** For production use, do not reduce memory settings in `VESPA_CONFIGSERVER_JVMARGS` and `VESPA_CONFIGPROXY_JVMARGS` unless you know what you are doing - the Vespa defaults are set for regular production use, and rarely need changing.

Container memory settings are done in _services.xml_ - see the example in [multinode-HA](https://github.com/vespa-engine/sample-apps/blob/master/examples/operations/multinode-HA/services.xml), sketched here with illustrative values:

```
<nodes>
    <!-- illustrative low-memory setting for functional testing - see the multinode-HA services.xml -->
    <jvm options="-Xms32M -Xmx128M"/>
    <node hostalias="node0" />
</nodes>
```

Make sure that the settings match the Docker container Vespa is running in. Also see [node memory settings](/en/operations-selfhosted/node-setup.html#memory-settings) for more settings.

##### Network

Vespa processes communicate over both fixed and ephemeral ports - in general, all ports must be accessible. See [example ephemeral use](/en/visiting.html#handshake-failed). Find an example application using a Docker network in [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA).

##### Resource usage

Note that CPU usage will not be zero even if there are zero documents and zero queries. Starting the _vespaengine/vespa_ container image means starting the [configuration server](/en/operations-selfhosted/configuration-server.html) and the [configuration sentinel](/en/operations-selfhosted/config-sentinel.html). When deploying an application, the sentinel starts the configured service processes, and they all listen for work to do, changes in the config, and so forth. Therefore, an "idle" container instance consumes CPU and memory.

##### Troubleshooting

The Vespa documentation examples use `docker`. The Vespa Team has good experience with using `podman`, too - in the examples, just change `docker` to `podman`. We recommend using Podman v5, see the [release notes](https://github.com/containers/podman/blob/main/RELEASE_NOTES.md). [emulating-docker-cli-with-podman](https://podman-desktop.io/docs/migrating-from-docker/emulating-docker-cli-with-podman) is a useful resource.

Many startup failures are caused by configuration or download errors when the Vespa container starts. Use `docker logs vespa` to show the log (this example assumes a Docker container named `vespa`; use `docker ps` to list containers).
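A few first checks could look like the following, assuming a container named `vespa`:

```
$ docker ps                                                 # is the container running?
$ docker logs vespa                                         # container start-up output
$ docker exec vespa /opt/vespa/bin/vespa-logfmt -S config-sentinel   # Vespa log from the config sentinel
```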
###### Docker image

Make sure to use a recent Vespa release (check [releases](https://factory.vespa.ai/releases)) and validate the downloaded image:

```
$ docker images
REPOSITORY                    TAG     IMAGE ID      CREATED       SIZE
docker.io/vespaengine/vespa   latest  8cfb0da22c01  35 hours ago  1.2 GB
```

###### Model download failures

If the application package depends on downloaded models, look for `RuntimeException: Not able to create config builder for payload` - [details](/en/jdisc/container-components.html#component-load).

---

## Document Api Guide

### Document API

This is an introduction to how to build and compile Vespa clients using the Document API.

#### Document API

This is an introduction to how to build and compile Vespa clients using the Document API. It can be used for feeding, updating and retrieving documents, or removing documents from the repository. See also the [Java reference](https://javadoc.io/doc/com.yahoo.vespa/documentapi).

Use the [VESPA\_CONFIG\_SOURCES](/en/operations-selfhosted/files-processes-and-ports.html#environment-variables) environment variable to set config servers to interface with.

The most common use case is using the async API in a [document processor](document-processing.html) - from the sample apps:

- Async GET in [LyricsDocumentProcessor.java](https://github.com/vespa-engine/sample-apps/blob/master/examples/document-processing/src/main/java/ai/vespa/example/album/LyricsDocumentProcessor.java)
- Async UPDATE in [ReviewProcessor.java](https://github.com/vespa-engine/sample-apps/blob/master/use-case-shopping/src/main/java/ai/vespa/example/shopping/ReviewProcessor.java)

##### Documents

All data fed, indexed and searched in Vespa are instances of the `Document` class. A [document](documents.html) is a composite object that consists of:

- A `DocumentType` that defines the set of fields that can exist in a document. A document can only have a single _document type_, but document types can inherit the content of another. All fields of an inherited type are available in all its descendants. The document type is defined in the [schema](reference/schema-reference.html), which is converted into a configuration file to be read by the `DocumentManager`.
- A `DocumentId`, which is a unique document identifier. The document distribution uses the document identifier, see the [reference](content/buckets.html#distribution) for details.
- A set of `(Field, FieldValue)` pairs, or "fields" for short. The `Field` class has methods for getting its name, data type and internal identifier. The field object for a given field name can be retrieved using the `getField()` method in the `DocumentType`.

See the [DocumentAccess](https://javadoc.io/doc/com.yahoo.vespa/documentapi/latest/com/yahoo/documentapi/DocumentAccess.html) javadoc.
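As a minimal illustration of the composition above, the fragment below looks up a `Field` from a `DocumentType` and inspects it. This is a sketch only - it assumes a deployed application with a `music` schema containing an `artistname` field, and a `DocumentAccess` set up as in the full client below:

```
// Sketch: inspect a document type and one of its fields.
// Assumes a "music" schema with an "artistname" field is deployed.
DocumentAccess access = DocumentAccess.createForNonContainer();
DocumentType type = access.getDocumentTypeManager().getDocumentType("music");

Field artistName = type.getField("artistname");          // look up the Field by name
System.out.println(artistName.getName());                // the field name
System.out.println(artistName.getDataType().getName());  // the field's data type

access.shutdown();
```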
Sample app - the `documentapi` dependency in _pom.xml_:

```
<dependency>
    <groupId>com.yahoo.vespa</groupId>
    <artifactId>documentapi</artifactId>
    <version>8.599.6</version>
</dependency>
```

```
import com.yahoo.document.DataType;
import com.yahoo.document.Document;
import com.yahoo.document.DocumentId;
import com.yahoo.document.DocumentPut;
import com.yahoo.document.DocumentType;
import com.yahoo.document.DocumentUpdate;
import com.yahoo.document.datatypes.StringFieldValue;
import com.yahoo.document.datatypes.WeightedSet;
import com.yahoo.document.update.FieldUpdate;
import com.yahoo.documentapi.DocumentAccess;
import com.yahoo.documentapi.SyncParameters;
import com.yahoo.documentapi.SyncSession;

public class DocClient {

    public static void main(String[] args) {
        // DocumentAccess is injectable in Vespa containers, but not in command line tools, etc.
        DocumentAccess access = DocumentAccess.createForNonContainer();
        DocumentType type = access.getDocumentTypeManager().getDocumentType("music");
        DocumentId id = new DocumentId("id:namespace:music::0");
        Document docIn = new Document(type, id);
        SyncSession session = access.createSyncSession(new SyncParameters.Builder().build());

        // Put document with a1,1
        WeightedSet<StringFieldValue> wset = new WeightedSet<>(DataType.getWeightedSet(DataType.STRING));
        wset.put(new StringFieldValue("a1"), 1);
        docIn.setFieldValue("aWeightedset", wset);
        DocumentPut put = new DocumentPut(docIn);
        System.out.println(docIn.toJson());
        session.put(put);

        // Update document with a1,10 and a2,20
        DocumentUpdate upd1 = new DocumentUpdate(type, id);
        WeightedSet<StringFieldValue> wset1 = new WeightedSet<>(DataType.getWeightedSet(DataType.STRING));
        wset1.put(new StringFieldValue("a1"), 10);
        wset1.put(new StringFieldValue("a2"), 20);
        upd1.addFieldUpdate(FieldUpdate.createAddAll(type.getField("aWeightedset"), wset1));
        System.out.println(upd1.toString());
        session.update(upd1);

        Document docOut = session.get(id);
        System.out.println("document get:" + docOut.toJson());

        session.destroy();
        access.shutdown();
    }
}
```

To test using the [sample apps](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation), enable more ports for the client to connect to the config server and other processes on localhost - change the docker command:

```
$ docker run --detach --name vespa --hostname localhost --privileged \
    --volume $VESPA_SAMPLE_APPS:/vespa-sample-apps --publish 8080:8080 \
    --publish 19070:19070 --publish 19071:19071 --publish 19090:19090 --publish 19099:19099 \
    --publish 19101:19101 --publish 19112:19112 \
    vespaengine/vespa
```

##### Fields

Examples:

```
doc.setFieldValue("aByte", (byte)1);
doc.setFieldValue("aInt", (int)1);
doc.setFieldValue("aLong", (long)1);
doc.setFieldValue("aFloat", 1.0);
doc.setFieldValue("aDouble", 1.0);
doc.setFieldValue("aBool", new BoolFieldValue(true));
doc.setFieldValue("aString", "Hello Field!");
doc.setFieldValue("unknownField", "Will not see me!");

Array<IntegerFieldValue> intArray = new Array<>(doc.getField("aArray").getDataType());
intArray.add(new IntegerFieldValue(11));
intArray.add(new IntegerFieldValue(12));
doc.setFieldValue("aArray", intArray);

Struct pos = PositionDataType.valueOf(1,2);
pos = PositionDataType.fromString("N0.000002;E0.000001");  // two ways to set same value
doc.setFieldValue("aPosition", pos);

doc.setFieldValue("aPredicate", new PredicateFieldValue("aLong in [10..20]"));

byte[] rawBytes = new byte[100];
for (int i = 0; i < rawBytes.length; i++) {
    rawBytes[i] = (byte)i;
}
doc.setFieldValue("aRaw", new Raw(ByteBuffer.wrap(rawBytes)));

Tensor tensor = Tensor.Builder.of(TensorType.fromSpec("tensor(x[2],y[2])")).
        cell().label("x", 0).label("y", 0).value(1.0).
        cell().label("x", 0).label("y", 1).value(2.0).
        cell().label("x", 1).label("y", 0).value(3.0).
        cell().label("x", 1).label("y", 1).value(5.0).build();
doc.setFieldValue("aTensor", new TensorFieldValue(tensor));

MapFieldValue<StringFieldValue, StringFieldValue> map =
        new MapFieldValue<>(new MapDataType(DataType.STRING, DataType.STRING));
map.put(new StringFieldValue("key1"), new StringFieldValue("foo"));
map.put(new StringFieldValue("key2"), new StringFieldValue("bar"));
doc.setFieldValue("aMap", map);

WeightedSet<StringFieldValue> wset = new WeightedSet<>(DataType.getWeightedSet(DataType.STRING));
wset.put(new StringFieldValue("strval 1"), 5);
wset.put(new StringFieldValue("strval 2"), 10);
doc.setFieldValue("aWeightedset", wset);
```

##### Document updates

A document update is a request to modify a document, see [reads and writes](reads-and-writes.html).

Primitive, and some multivalue fields (`WeightedSet` and `Array<primitive>`), are updated using a [FieldUpdate](https://javadoc.io/doc/com.yahoo.vespa/document/latest/com/yahoo/document/update/FieldUpdate.html). Complex, multivalue fields like `Map` and `Array<struct>` are updated using [AddFieldPathUpdate](https://javadoc.io/doc/com.yahoo.vespa/document/latest/com/yahoo/document/fieldpathupdate/AddFieldPathUpdate.html), [AssignFieldPathUpdate](https://javadoc.io/doc/com.yahoo.vespa/document/latest/com/yahoo/document/fieldpathupdate/AssignFieldPathUpdate.html) and [RemoveFieldPathUpdate](https://javadoc.io/doc/com.yahoo.vespa/document/latest/com/yahoo/document/fieldpathupdate/RemoveFieldPathUpdate.html). Field path updates are only supported on non-attribute [fields](reference/schema-reference.html#field), [index](reference/schema-reference.html#index) fields, or fields containing [struct field](reference/schema-reference.html#struct-field) attributes.

If a field is both an index field and an attribute, then the document is updated in the document store and the index is updated, but the attribute is not updated. Thus, you can get old values in document summary requests and old values being used in ranking and grouping.
A [field path](reference/document-field-path.html) string identifies fields to update - example:

```
upd.addFieldPathUpdate(new AssignFieldPathUpdate(type, "myMap{key2}", new StringFieldValue("abc")));
```

_FieldUpdate_ examples:

```
// Simple assignment
Field intField = type.getField("aInt");
IntegerFieldValue intFieldValue = new IntegerFieldValue(2);
FieldUpdate assignUpdate = FieldUpdate.createAssign(intField, intFieldValue);
upd.addFieldUpdate(assignUpdate);

// Arithmetic
FieldUpdate addUpdate = FieldUpdate.createIncrement(type.getField("aLong"), 3);
upd.addFieldUpdate(addUpdate);

// Composite - add one array element
upd.addFieldUpdate(FieldUpdate.createAdd(type.getField("aArray"),
        new IntegerFieldValue(13)));

// Composite - add two array elements
upd.addFieldUpdate(FieldUpdate.createAddAll(type.getField("aArray"),
        List.of(new IntegerFieldValue(14), new IntegerFieldValue(15))));

// Composite - add weightedset element
upd.addFieldUpdate(FieldUpdate.createAdd(type.getField("aWeightedset"),
        new StringFieldValue("add_me"), 101));

// Composite - add set to set
WeightedSet<StringFieldValue> wset = new WeightedSet<>(DataType.getWeightedSet(DataType.STRING));
wset.put(new StringFieldValue("a1"), 3);
wset.put(new StringFieldValue("a2"), 4);
upd.addFieldUpdate(FieldUpdate.createAddAll(type.getField("aWeightedset"), wset));

// Composite - update array element
upd.addFieldUpdate(FieldUpdate.createMap(type.getField("aArray"),
        new IntegerFieldValue(1),                         // array index
        new AssignValueUpdate(new IntegerFieldValue(2)))); // value at index

// Composite - increment weight
upd.addFieldUpdate(FieldUpdate.createIncrement(type.getField("aWeightedset"),
        new StringFieldValue("a1"), 1));

// Composite - match element and assign weight
upd.addFieldUpdate(FieldUpdate.createMap(type.getField("aWeightedset"),
        new StringFieldValue("element1"),                 // value
        new AssignValueUpdate(new IntegerFieldValue(30))));
```

_FieldPathUpdate_ examples:

```
// Add an element to a map
Array<StringFieldValue> stringArray = new Array<>(DataType.getArray(DataType.STRING));
stringArray.add(new StringFieldValue("my-val"));
AddFieldPathUpdate addElement = new AddFieldPathUpdate(type, "aMap{key1}", stringArray);
upd.addFieldPathUpdate(addElement);

// Modify an element in a map
upd.addFieldPathUpdate(new AssignFieldPathUpdate(type, "aMap{key2}", new StringFieldValue("abc")));
```

###### Update reply semantics

Sending in an update for which the system can not find a corresponding document to update is _not_ considered an error. These are returned with a successful status code (assuming that no actual error occurred during the update processing). Use [UpdateDocumentReply.wasFound()](https://javadoc.io/doc/com.yahoo.vespa/documentapi/latest/com/yahoo/documentapi/UpdateResponse.html#wasFound()) to check if the update was known to have been applied.

If the update returns with an error reply, the update _may or may not_ have been applied, depending on where in the platform stack the error occurred.

##### Document Access

The starting point for passing documents and updates to Vespa is the `DocumentAccess` class. This is a singleton (see the `get()` method) session factory (see the `createXSession()` methods) that provides three distinct access types:

- **Synchronous random access**: provided by the class `SyncSession`. Suitable for low-throughput proof-of-concept applications.
- [**Asynchronous random access**](#asyncsession): provided by the class `AsyncSession`. It allows for document repository writes and random access with **high throughput**.
- [**Visiting**](#visitorsession): provided by the class `VisitorSession`. Allows a set of documents to be accessed in order decided by the document repository, which gives higher read throughput than random access. ###### AsyncSession This class represents a session for asynchronous access to a document repository. It is created by calling`myDocumentAccess.createAsyncSession(myAsyncSessionParams)`, and provides document repository writes and random access with high throughput. The usage pattern for an asynchronous session is like: 1. `put()`, `update()`, `get()` or `remove()` is invoked on the session, and it returns a synchronous `Result` object that indicates whether the request was successful or not. The `Result` object also contains a _request identifier_. 2. The client polls the session for a `Response` through its `getNext()` method. Any operation accepted by an asynchronous session will produce exactly one response within the configured timeout. 3. Once a response is available, it is matched to the request by inspecting the response's request identifier. The response may also contain data, either a retrieved document or a failed document put or update that needs to be handled. 4. Note that the client must process the response queue or your JVM will run into garbage collection issues, as the underlying session keeps track of all responses and unless they are consumed they will be kept alive and not be garbage collected. Example: ``` import com.yahoo.document.*; import com.yahoo.documentapi.*; public class MyClient { // DocumentAccess is injectable in Vespa containers, but not in command line tools, etc. private final DocumentAccess access = DocumentAccess.createForNonContainer(); private final AsyncSession session = access.createAsyncSession(new AsyncParameters()); private boolean abort = false; private int numPending = 0; /** * Implements application entry point. * * @param args Command line arguments. */ public static void main(String[] args) { MyClient app = null; try { app = new MyClient(); app.run(); } catch (Exception e) { e.printStackTrace(); } finally { if (app != null) { app.shutdown(); } } if (app == null || app.abort) { System.exit(1); } } /** * This is the main entry point of the client. This method will not return until all available documents * have been fed and their responses have been returned, or something signaled an abort. */ public void run() { System.out.println("client started"); while (!abort) { flushResponseQueue(); Document doc = getNextDocument(); if (doc == null) { System.out.println("no more documents to put"); break; } System.out.println("sending doc " + doc); while (!abort) { Result res = session.put(doc); if (res.isSuccess()) { System.out.println("put has request id " + res.getRequestId()); ++numPending; break; // step to next doc. } else if (res.type() == Result.ResultType.TRANSIENT_ERROR) { System.out.println("send queue full, waiting for some response"); processNext(9999); } else { res.getError().printStackTrace(); abort = true; // this is a fatal error } } } if (!abort) { waitForPending(); } System.out.println("client stopped"); } /** * Shutdown the underlying api objects. */ public void shutdown() { System.out.println("shutting down document api"); session.destroy(); access.shutdown(); } /** * Returns the next document to feed to Vespa. This method should only return null when the end of the * document stream has been reached, as returning null terminates the client. 
This is the point at which * your application logic should block if it knows more documents will eventually become available. * * @return The next document to put, or null to terminate. */ public Document getNextDocument() { return null; // TODO: Implement at your discretion. } /** * Processes all immediately available responses. */ void flushResponseQueue() { System.out.println("flushing response queue"); while (processNext(0)) { // empty } } /** * Wait indefinitely for the responses of all sent operations to return. This method will only return * early if the abort flag is set. */ void waitForPending() { while (numPending != 0) { if (abort) { System.out.println("waiting aborted, " + numPending + " still pending"); break; } System.out.println("waiting for " + numPending + " responses"); processNext(9999); } } /** * Retrieves and processes the next response available from the underlying asynchronous session. If no * response becomes available within the given timeout, this method returns false. * * @param timeout The maximum number of seconds to wait for a response. * @return True if a response was processed, false otherwise. */ boolean processNext(int timeout) { Response res; try { res = session.getNext(timeout); } catch (InterruptedException e) { e.printStackTrace(); abort = true; return false; } if (res == null) { return false; } System.out.println("got response for request id " + res.getRequestId()); --numPending; if (!res.isSuccess()) { System.err.println(res.getTextMessage()); abort = true; return false; } return true; } } ``` ###### VisitorSession This class represents a session for sequentially visiting documents with high throughput. A visitor is started when creating the `VisitorSession`through a call to `createVisitorSession`. A visitor target, that is a receiver of visitor data, can be created through a call to `createVisitorDestinationSession`. The `VisitorSession` is a receiver of visitor data. See [visiting reference](visiting.html) for details. The `VisitorSession`: - Controls the operation of the visiting process - Handles the data resulting from visiting data in the system Those two different tasks may be set up to be handled by a `VisitorControlHandler` and a `VisitorDataHandler` respectively. These handlers may be supplied to the `VisitorSession` in the `VisitorParameters` object, together with a set of other parameters for visiting. Example: To increase performance, let more separate visitor destinations handle visitor data, then specify the addresses to remote data handlers. The default `VisitorDataHandler` used by the `VisitorSession` returned from`DocumentAccess` is `VisitorDataQueue` which queues up incoming documents and implements a polling API. The documents can be extracted by calls to the session's `getNext()` methods and can be ack-ed by the `ack()` method. The default `VisitorControlHandler` can be accessed through the session's `getProgress()`,`isDone()`, and `waitUntilDone()` methods. Implement custom `VisitorControlHandler`and `VisitorDataHandler` by subclassing them and supplying these to the `VisitorParameters` object. The `VisitorParameters` object controls how and what data will be visited - refer to the [javadoc](https://javadoc.io/doc/com.yahoo.vespa/documentapi/latest/com/yahoo/documentapi/VisitorParameters.html). Configure the[document selection](reference/document-select-language.html) string to select what data to visit - the default is all data. 
You can specify what fields to return in a result by specifying a [fieldSet](https://javadoc.io/doc/com.yahoo.vespa/documentapi/latest/com/yahoo/documentapi/VisitorParameters.html) - see [document field sets](documents.html#fieldsets). Specifying only the fields you need may improve performance a lot, especially if you can make do with only in-memory fields, or if you have large fields you don't need returned. Example:

```
import com.yahoo.document.Document;
import com.yahoo.document.DocumentId;
import com.yahoo.documentapi.DocumentAccess;
import com.yahoo.documentapi.DumpVisitorDataHandler;
import com.yahoo.documentapi.ProgressToken;
import com.yahoo.documentapi.VisitorControlHandler;
import com.yahoo.documentapi.VisitorParameters;
import com.yahoo.documentapi.VisitorSession;

import java.util.concurrent.TimeoutException;

public class MyClient {

    public static void main(String[] args) throws Exception {
        VisitorParameters params = new VisitorParameters("true");
        params.setLocalDataHandler(new DumpVisitorDataHandler() {
            @Override
            public void onDocument(Document doc, long timeStamp) {
                System.out.print(doc.toXML(""));
            }
            @Override
            public void onRemove(DocumentId id) {
                System.out.println("id=" + id);
            }
        });
        params.setControlHandler(new VisitorControlHandler() {
            @Override
            public void onProgress(ProgressToken token) {
                System.err.format("%.1f %% finished.\n", token.percentFinished());
                super.onProgress(token);
            }
            @Override
            public void onDone(CompletionCode code, String message) {
                System.err.println("Completed visitation, code " + code + ": " + message);
                super.onDone(code, message);
            }
        });
        params.setRoute(args.length > 0 ? args[0] : "[Storage:cluster=storage;clusterconfigid=storage]");
        params.setFieldSet(args.length > 1 ? args[1] : "[document]");

        // DocumentAccess is injectable in Vespa containers, but not in command line tools, etc.
        DocumentAccess access = DocumentAccess.createForNonContainer();
        VisitorSession session = access.createVisitorSession(params);
        if (!session.waitUntilDone(0)) {
            throw new TimeoutException();
        }
        session.destroy();
        access.shutdown();
    }
}
```

The first optional argument to this client is the [route](/en/operations-selfhosted/routing.html) of the cluster to visit. The second is the [field set](documents.html#fieldsets) to retrieve.

---

## Document Field Path

### Document Field Path Syntax

The field path syntax is used in several places in Vespa to traverse documents through arrays, structs, maps and sets and generate a set of values matching the expression.

#### Document Field Path Syntax

The field path syntax is used in several places in Vespa to traverse documents through arrays, structs, maps and sets and generate a set of values matching the expression.
Examples:

If the document contains the field `mymap`, and it has a key `mykey`, the expression returns the value of the map for that key:

```
mymap{mykey}
```

Returns the value at index 3 of the `myarray` field, if set:

```
myarray[3]
```

Returns the value of the `value1` field in the struct field `mystruct`, if set:

```
mystruct.value1
```

If `mystructarray` is an array field containing structs, returns the values of `value1` for each of those structs:

```
mystructarray.value1
```

The following syntax can be used for the different field types, and can be combined recursively as required:

##### Maps/weighted Sets

| Syntax | Description |
| --- | --- |
| `{<key>}` | Retrieve the value of a specific key |
| `{$<variable>}` | Retrieve all values, setting the [variable](#variables) to the key value for each |
| `.key` | Retrieve all key values |
| `.value` | Retrieve all values |
| (nothing) | Retrieve all keys |

In the case of weighted sets, the value referenced above is the weight of the item.

##### Array

| Syntax | Description |
| --- | --- |
| `[<index>]` | Retrieve the value at a specific index |
| `[$<variable>]` | Retrieve all values in the array, setting the [variable](#variables) to the index of each |
| (nothing) | Retrieve all values in the array |

##### Struct

| Syntax | Description |
| --- | --- |
| `.<subfield>` | Return the value of the struct field |
| (nothing) | Return the value of all subfields |

Note that when specifying values of subscripts of maps, weighted sets and arrays, only primitive types (numbers and strings) may be used.

##### Variables

It can be useful to reference several field paths using a common variable. For instance, if you have an array of structs, you may want to use document selection on fields within the same array index together. This could be done by an expression like:

```
mydoctype.mystructarray{$x}.field1=="foo" AND mydoctype.mystructarray{$x}.field2=="bar"
```

Variables either have a `key` value (for maps and weighted sets), or an `index` value (for arrays). Variables cannot be used across such contexts (that is, a map key cannot be used to index into an array).

---

## Document Json Format

### Document JSON Format

This document describes the JSON format used for sending document operations to Vespa.

#### Document JSON Format

This document describes the JSON format used for sending document operations to Vespa. Field types are defined in the [schema reference](schema-reference.html#field). This is a reference for:

- JSON representation of [document operations](#document-operations) (put, get, remove, update)
- JSON representation of [field types](#field-types) in Vespa documents
- JSON representation of addressing fields for update, and [update operations](#update-operations)

Also refer to [encoding troubleshooting](../troubleshooting-encoding.html).
###### Cell values as binary data (hex dump format)

For tensors with `int8` cell value type, the cell values of a dense subspace can be given as a string with a hex dump of the binary cell values - e.g. `"FF00118022FE"` for a field of type `tensor<int8>(x[6])` holding the values `[-1,0,17,-128,34,-2]`. For other cell types, it's possible to take the bits of the floating-point value, interpreted directly as an unsigned integer of appropriate width (16, 32, or 64 bits) and use the hex dump (respectively 4, 8, or 16 hex digits per cell) in a string. For "float" cells (32-bit IEEE 754 floating-point) a simple snippet for converting a cell could look like this:

```
import struct

def float_to_hex(f: float):
    return format(struct.unpack('=I', struct.pack('=f', f))[0], '08X')
```

As an advanced combination example, if you have a tensor with type `tensor<float>(tag{},x[3])`, this input could be used, shown with corresponding output:

```
"mixedtensor": {
    "foo": "3DE38E393E638E393EAAAAAB",
    "bar": "3EE38E393F0E38E43F2AAAAB",
    "baz": "3F471C723F638E393F800000"
}

"mixedtensor": {
    "type": "tensor<float>(tag{},x[3])",
    "blocks": {
        "foo": [0.1111111119389534, 0.2222222238779068, 0.3333333432674408],
        "bar": [0.4444444477558136, 0.5555555820465088, 0.6666666865348816],
        "baz": [0.7777777910232544, 0.8888888955116272, 1.0]
    }
}
```

**Verbose:** [Tensor](../tensor-user-guide.html) fields may be represented as an array of cells:

```
"tensorfield": [
    { "address": { "x": "a", "y": "0" }, "value": 2.0 },
    { "address": { "x": "a", "y": "1" }, "value": 3.0 },
    { "address": { "x": "b", "y": "0" }, "value": 4.0 },
    { "address": { "x": "b", "y": "1" }, "value": 5.0 }
]
```

This works for any tensor but is verbose, so shorter forms specific to various tensor types are also supported. Use the shortest form applicable to your tensor type for the best possible performance. The cells array can optionally be nested in an object under the key "cells". This is how tensor values are returned [by default](document-v1-api-reference.html#format.tensors), along with another key "type" containing the tensor type.

**struct:**

```
"mystruct": {
    "intfield": 123,
    "stringfield": "foo"
}
```

**map:** The JSON dictionary key must be a string, even if the map key type in the schema is not a string:

```
"int_to_string_map": {
    "123": "foo",
    "456": "bar",
    "789": "foobar"
}
```

Feeding in an empty map (`{}`) for a field will have the same effect as not feeding a value for that field, and the field will not be rendered in the document API and in document summaries.

**reference:** String with document ID referring to a [parent document](../parent-child.html):

```
"artist_ref": "id:mynamespace:artists::artist-1"
```

##### Empty fields

In general, fields that have not received a value during feeding will be ignored when rendering the document. They are considered as empty fields.
However, certain field types have some values which cause them to be considered empty. For instance, the empty string (`""`) is considered empty, as well as the empty array (`[]`). See the above table for more information for each type.

##### Document operations

Refer to [reads and writes](../reads-and-writes.html) for details - alternatives:

- Use the [Vespa CLI](../vespa-cli.html#documents).
- [/document/v1/](document-v1-api-reference.html): This API accepts one operation per request, with the document ID encoded in the URL.
- [Vespa feed client](../vespa-feed-client.html): Java APIs / command line tool to feed document operations asynchronously to Vespa, over HTTP.

###### Put

The "put" payload has a "put" operation and ["fields"](#field-types) containing field values ([/document/v1/ example](../document-v1-api-guide.html#post)):

```
{
    "put": "id:mynamespace:music::123",
    "fields": {
        "title": "Best of Bob Dylan"
    }
}
```

###### Get

"get" does not have a payload - the response has the same "fields" object as in "put", and also "id" and "pathId" fields ([/document/v1/ example](../document-v1-api-guide.html#get)):

```
{
    "pathId": "/document/v1/mynamespace/music/docid/123",
    "id": "id:mynamespace:music::123",
    "fields": {
        "title": "Best of Bob Dylan"
    }
}
```

###### Remove

The "remove" payload only has a "remove" operation ([/document/v1/ example](../document-v1-api-guide.html#delete)):

```
{
    "remove": "id:mynamespace:music::123"
}
```

###### Update

The "update" payload has an "update" operation and "fields". Note: each field must contain an [update operation](#update-operations), not just the field value directly ([/document/v1/ example](../document-v1-api-guide.html#put)):

```
{
    "update": "id:mynamespace:music::123",
    "fields": {
        "title": {
            "assign": "The best of Bob Dylan"
        }
    }
}
```

Flags can be added to add a [test and set](#test-and-set) condition, or to allow the update to [create](#create) a new document (a so-called "upsert" operation).

###### Test and set

An optional _condition_ can be added to operations to specify a _test and set_ condition - see [conditional writes](../document-v1-api-guide.html#conditional-writes). The value of the _condition_ is a [document selection](document-select-language.html), encoded as a string. Example: increment the _sales_ field only if it is already equal to 999 ([/document/v1/ example](../document-v1-api-guide.html#conditional-writes)):

```
{
    "update": "id:mynamespace:music::bob/BestOf",
    "condition": "music.sales==999",
    "fields": {
        "sales": {
            "increment": 1
        }
    }
}
```

**Note:** Use _documenttype.fieldname_ in the condition, not only _fieldname_. If the condition is not met, a 412 response code is returned.
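The same conditional write can also be issued from the Java Document API described earlier. Below is a minimal, untested sketch - it assumes the `music` schema with a `sales` field from the example above, a `SyncSession` as in the Document API guide, and that `TestAndSetCondition` is used for the condition (verify the exact usage against the [Document API javadoc](https://javadoc.io/doc/com.yahoo.vespa/documentapi)):

```
import com.yahoo.document.DocumentId;
import com.yahoo.document.DocumentType;
import com.yahoo.document.DocumentUpdate;
import com.yahoo.document.TestAndSetCondition;
import com.yahoo.document.update.FieldUpdate;
import com.yahoo.documentapi.DocumentAccess;
import com.yahoo.documentapi.SyncParameters;
import com.yahoo.documentapi.SyncSession;

public class ConditionalUpdateClient {

    public static void main(String[] args) {
        DocumentAccess access = DocumentAccess.createForNonContainer();
        SyncSession session = access.createSyncSession(new SyncParameters.Builder().build());

        DocumentType type = access.getDocumentTypeManager().getDocumentType("music");
        DocumentUpdate upd = new DocumentUpdate(type, new DocumentId("id:mynamespace:music::bob/BestOf"));

        // Increment 'sales' only if the condition holds - mirrors the JSON example above
        upd.addFieldUpdate(FieldUpdate.createIncrement(type.getField("sales"), 1));
        upd.setCondition(new TestAndSetCondition("music.sales==999"));

        session.update(upd);
        session.destroy();
        access.shutdown();
    }
}
```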
###### create (create if nonexistent)

**Updates** to nonexistent documents are supported using _create_ ([/document/v1/ example](../document-v1-api-guide.html#create-if-nonexistent)):

```
{
    "update": "id:mynamespace:music::bob/BestOf",
    "create": true,
    "fields": {
        "title": {
            "assign": "The best of Bob Dylan"
        }
    }
}
```

Since Vespa 8.178, _create_ can also be used together with conditional **Put** operations ([/document/v1/ example](../document-v1-api-guide.html#conditional-updates-and-puts-with-create) - review the notes there before using):

```
{
    "put": "id:mynamespace:music::123",
    "condition": "music.sales==999",
    "create": true,
    "fields": {
        "title": "Best of Bob Dylan"
    }
}
```

##### Update operations

The update operations are: [`assign`](#assign), [`add`](#add), [`remove`](#composite-remove), [arithmetics](#arithmetic) (`increment`, `decrement`, `multiply`, `divide`), [`match`](#match), [`modify`](#tensor-modify).

##### assign

`assign` is used to replace the value of a field (or an element of a collection) with a new value. When assigning, one can generally use the same syntax and structure as when feeding that field's value in a `put` operation.

###### Single value field

```
field title type string {
    indexing: summary
}
```

```
{
    "update": "id:mynamespace:music::example",
    "fields": {
        "title": {
            "assign": "The best of Bob Dylan"
        }
    }
}
```

###### Tensor field

```
field tensorfield type tensor(x{},y{}) {
    indexing: attribute | summary
}
```

```
{
    "update": "id:mynamespace:tensordoctype::example",
    "fields": {
        "tensorfield": {
            "assign": {
                "cells": [
                    { "address": { "x": "a", "y": "b" }, "value": 2.0 },
                    { "address": { "x": "c", "y": "d" }, "value": 3.0 }
                ]
            }
        }
    }
}
```

This will fully replace the entire tensor stored in this field.

###### Struct field

###### Replacing all fields in a struct

A full struct is replaced by assigning an object of struct key/value pairs.

```
struct person {
    field first_name type string {}
    field last_name type string {}
}
field contact type person {
    indexing: summary
}
```

```
{
    "update": "id:mynamespace:workers::example",
    "fields": {
        "contact": {
            "assign": {
                "first_name": "Bob",
                "last_name": "The Plumber"
            }
        }
    }
}
```

###### Individual struct fields

Individual struct fields are updated using [field path](#fieldpath) syntax. Refer to the [reference](schema-reference.html#struct-name) for restrictions on using structs.

```
{
    "update": "id:mynamespace:workers::example",
    "fields": {
        "contact.first_name": { "assign": "Bob" },
        "contact.last_name": { "assign": "The Plumber" }
    }
}
```

###### Map field

Individual map entries can be updated using [field path](document-field-path.html) syntax. The following declaration defines a `map` where the `key` is an Integer and the value is a `person` struct:

```
struct person {
    field first_name type string {}
    field last_name type string {}
}
field contact type map<int, person> {
    indexing: summary
}
```

Example updating part of an entry in the `contact` map:

- `contact` is the name of the map field to be updated
- `{0}` is the key that is going to be updated
- `first_name` is the struct field to be updated inside the `person` struct

```
{
    "update": "id:mynamespace:workers::example",
    "fields": {
        "contact{0}.first_name": { "assign": "John" }
    }
}
```

Assigning an element to a key in a map will insert the key/value mapping if it does not already exist, or overwrite it with the new value if it does exist. Refer to the [reference](schema-reference.html#map) for restrictions on using maps.
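For completeness, the same partial map update can be expressed with the Java field path updates shown in the Document API guide above - a minimal sketch, assuming the `workers` schema above and a `DocumentAccess`/`SyncSession` set up as in that guide:

```
// Sketch: assign a new value to one struct field inside a map entry -
// the Java equivalent of the "contact{0}.first_name" JSON example above.
DocumentType type = access.getDocumentTypeManager().getDocumentType("workers");
DocumentUpdate upd = new DocumentUpdate(type, new DocumentId("id:mynamespace:workers::example"));
upd.addFieldPathUpdate(new AssignFieldPathUpdate(type, "contact{0}.first_name", new StringFieldValue("John")));
session.update(upd);
```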
###### Map to primitive value

```
field my_food_scores type map<string, string> {
    indexing: summary
}
```

```
{
    "update": "id:mynamespace:food::example",
    "fields": {
        "my_food_scores{Strawberries}": { "assign": "Delicious!" }
    }
}
```

###### Map to struct

```
struct contact_info {
    field phone_number type string {}
    field email type string {}
}
field contacts type map<string, contact_info> {
    indexing: summary
}
```

```
{
    "update": "id:mynamespace:people::d_duck",
    "fields": {
        "contacts{\"Uncle Scrooge\"}": {
            "assign": {
                "phone_number": "555-123-4567",
                "email": "number_one_dime_luvr1877@example.com"
            }
        }
    }
}
```

###### Array field

###### Array of primitive values

```
field ingredients type array<string> {
    indexing: summary
}
```

Assign full array:

```
{
    "update": "id:mynamespace:cakes::tasty_chocolate_cake",
    "fields": {
        "ingredients": { "assign": ["sugar", "butter", "vanilla", "flour"] }
    }
}
```

Assign existing elements in the array:

```
{
    "update": "id:mynamespace:cakes::tasty_chocolate_cake",
    "fields": {
        "ingredients[3]": { "assign": "2 cups of flour (editor's update: NOT asbestos!)" }
    }
}
```

Note that the element at index 3 needs to exist. Alternative using match:

```
{
    "update": "id:mynamespace:cakes::tasty_chocolate_cake",
    "fields": {
        "ingredients": {
            "match": {
                "element": 3,
                "assign": "2 cups of flour (editor's update: NOT asbestos!)"
            }
        }
    }
}
```

Individual array elements may be updated using [field path](document-field-path.html) or [match](#match) syntax.

###### Array of struct

Refer to the reference for restrictions on using [arrays of structs](schema-reference.html#array).

```
struct person {
    field first_name type string {}
    field last_name type string {}
}
field people type array<person> {
    indexing: summary
}
```

```
{
    "update": "id:mynamespace:students::example",
    "fields": {
        "people[34]": {
            "assign": {
                "first_name": "Bobby",
                "last_name": "Tables"
            }
        }
    }
}
```

Note that the element index needs to exist. Use [add](#add-array-elements) to add a new element. Alternative syntax using match:

```
{
    "update": "id:mynamespace:students::example",
    "fields": {
        "people": {
            "match": {
                "element": 34,
                "assign": {
                    "first_name": "Bobby",
                    "last_name": "Tables"
                }
            }
        }
    }
}
```

###### Weighted set field

Adding new elements to a weighted set can be done using [add](#add-weighted-set), or by assigning with `field{key}` syntax. Example of the latter:

```
field int_weighted_set type weightedset<int> {
    indexing: summary
}
field string_weighted_set type weightedset<string> {
    indexing: summary
}
```

```
{
    "update": "id:mynamespace:weightedsetdoctype::example1",
    "fields": {
        "int_weighted_set{123}": { "assign": 123 },
        "int_weighted_set{456}": { "assign": 100 },
        "string_weighted_set{\"item 1\"}": { "assign": 144 },
        "string_weighted_set{\"item 2\"}": { "assign": 7 }
    }
}
```

Note that using the `field{key}` syntax for weighted sets _may_ be less efficient than using [add](#add-weighted-set).

###### Clearing a field

To clear a field, assign a `null` value to it.

```
{
    "update": "id:mynamespace:music::example",
    "fields": {
        "title": {
            "assign": null
        }
    }
}
```

##### add

`add` is used to add entries to arrays, weighted sets or to the mapped dimensions of tensors.

###### Adding array elements

The added entries are appended to the end of the array in the order specified.
``` field tracks type array { indexing: summary } ``` ``` ``` { "update": "id:mynamespace:music::https://music.yahoo.com/bobdylan/BestOf", "fields": { "tracks": { "add": [ "Lay Lady Lay", "Every Grain of Sand" ] } } } ``` ``` ###### Add weighted set entries Add weighted set elements by using a JSON key/value syntax, where the value is the weight of the element. Adding a key/weight mapping that already exists will overwrite the existing weight with the new one. ``` field int_weighted_set type weightedset { indexing: summary } field string_weighted_set type weightedset { indexing: summary } ``` ``` ``` { "update":"id:mynamespace:weightedsetdoctype::example1", "fields": { "int_weighted_set": { "add": { "123": 123, "456": 100 } }, "string_weighted_set": { "add": { "item 1": 144, "item 2": 7 } } } } ``` ``` ###### Add tensor cells Add cells to mapped or mixed tensors. Invalid for tensors with only indexed dimensions. Adding a cell that already exists will overwrite the cell value with the new value. The address must be fully specified, but cells with bound indexed dimensions not specified will receive the default value of `0.0`. See system test[tensor add update](https://github.com/vespa-engine/system-test/tree/master/tests/search/tensor_feed/tensor_add_remove_update)for more examples. ``` field tensorfield type tensor(x{},y[3]) { indexing: attribute | summary } ``` ``` ``` { "update": "id:mynamespace:tensordoctype::example", "fields": { "tensorfield": { "add": { "cells": [ { "address": { "x": "b", "y": "0" }, "value": 2.0 }, { "address": { "x": "b", "y": "1" }, "value": 3.0 } ] } } } } ``` ``` In this example, cell `{"x":"b","y":"2"}` will implicitly be set to 0.0. So if you started with the following tensor: ``` { {"x": "a", "y": "0"}: 0.2, {"x": "a", "y": "1"}: 0.3, {"x": "a", "y": "2"}: 0.5, } ``` You now end up with this tensor after the above add operation was applied: ``` { {"x": "a", "y": "0"}: 0.2, {"x": "a", "y": "1"}: 0.3, {"x": "a", "y": "2"}: 0.5, {"x": "b", "y": "0"}: 2.0, {"x": "b", "y": "1"}: 3.0, {"x": "b", "y": "2"}: 0.0, } ``` Prefer the _block short form_ for mixed tensors instead. This also avoids the problem where cells with indexed dimensions are not specified: ``` ``` { "update": "id:mynamespace:tensordoctype::example", "fields": { "tensorfield": { "add": { "blocks": [ { "address": { "x": "b" }, "values": [2.0, 3.0, 5.0] } ] } } } } ``` ``` ##### remove Remove elements from weighted sets, maps and tensors with `remove`. ###### Weighted set field ``` field string_weighted_set type weightedset { indexing: summary } ``` ``` ``` { "update":"id:mynamespace:weightedsetdoctype::example1", "fields": { "string_weighted_set": { "remove": { "item 2": 0 } } } } ``` ``` ###### Map field ``` field string_map type map { indexing: summary } ``` ``` ``` { "update":"id:mynamespace:mapdoctype::example1", "fields": { "string_map{item 2}": { "remove": 0 } } } ``` ``` ###### Tensor field Removes cells from mapped or mixed tensors. Invalid for tensors with only indexed dimensions. Only mapped dimensions should be specified for tensors with both mapped and indexed dimensions, as all indexed cells the mapped dimensions point to will be removed implicitly. See system test[tensor remove update](https://github.com/vespa-engine/system-test/tree/master/tests/search/tensor_feed/tensor_add_remove_update)for more examples. 
``` field tensorfield type tensor(x{},y[2]) { indexing: attribute | summary } ``` ``` ``` { "update": "id:mynamespace:tensordoctype::example", "fields": { "tensorfield": { "remove": { "addresses": [ {"x": "b"}, {"x": "c"} ] } } } } ``` ``` In this example, cells `{x:b,y:0},{x:b,y:1},{x:c,y:0},{x:c,y:1}` will be removed. It is also supported to specify only a subset of the mapped dimensions in the addresses. In that case, all cells that match the label values of the specified dimensions are removed. In the given example, all cells having label `b` for dimension `x` are removed. ``` field tensorfield type tensor(x{},y{},z[2]) { indexing: attribute | summary } ``` ``` ``` { "update": "id:mynamespace:tensordoctype::example", "fields": { "tensorfield": { "remove": { "addresses": [ {"x": "b"} ] } } } } ``` ``` ##### Arithmetic The four arithmetic operators `increment`, `decrement`,`multiply` and `divide` are used to modify _single value_ numeric values without having to look up the current value before applying the update. Example: ``` field sales type int { indexing: summary | attribute } ``` ``` ``` { "update": "id:mynamespace:music::https://music.yahoo.com/bobdylan/BestOf", "fields": { "sales": { "increment": 1 } } } ``` ``` ##### match If an arithmetic operation is to be done for a specific key in a _weighted set or array_, use the `match` operation: ``` field track_popularity type weightedset { indexing: summary | attribute } ``` ``` ``` { "update": "id:mynamespace:music::https://music.yahoo.com/bobdylan/BestOf", "fields": { "track_popularity": { "match": { "element": "Lay Lady Lay", "increment": 1 } } } } ``` ``` In other words, for the weighted set "track\_popularity",`match` the element "Lay Lady Lay", then `increment` its weight by 1. See the [weightedset properties](schema-reference.html#weightedset-properties)reference for how to make incrementing a non-existing key trigger auto-create of the key. If the updated field is an array, the `element` value would be a positive integer. **Note:** Only oneelement can be matched per operation. ##### Modify tensors Individual cells in tensors can be modified using the `modify` update. The cells are modified according to the given operation: - `replace` - replaces a single cell value - `add` - adds a value to the existing cell value - `multiply` - multiples a value with the existing cell value The addresses of cells must be fully specified. If the cell does not exist, the update for that cell will be ignored. Use `"create": true` (see example below) to create non-existing cells before the modify update is applied. See system test[tensor modify update](https://github.com/vespa-engine/system-test/tree/master/tests/search/tensor_feed/tensor_modify_update)for more examples. ``` field tensorfield type tensor(x[3]) { indexing: attribute | summary } ``` ``` ``` { "update": "id:mynamespace:tensordoctype::example", "fields": { "tensorfield": { "modify": { "operation": "replace", "addresses": [ { "address": { "x": "1" }, "value": 7.0 }, { "address": { "x": "2" }, "value": 8.0 } ] } } } } ``` ``` In this example, cell `{"x":"1"}` is replaced with value 7.0 and `{"x":"2"}` with value 8.0. If operation `add` or `multiply` was used instead, 7.0 and 8.0 would be added or multiplied to the current values of cells `{"x":"1"}` and `{"x":"2"}`. 
For tensors with a single mapped dimension, the _cells short form_ can also be used:

```
field tensorfield type tensor(x{}) {
    indexing: attribute | summary
}
```

```
{
    "update": "id:mynamespace:tensordoctype::example",
    "fields": {
        "tensorfield": {
            "modify": {
                "operation": "add",
                "create": true,
                "cells": {
                    "b": 5.0,
                    "c": 6.0
                }
            }
        }
    }
}
```

In this example, 5.0 is added to cell `{"x":"b"}` and 6.0 is added to cell `{"x":"c"}`. With `"create": true`, non-existing cells in the input tensor are created before applying the modify update. The default cell value is 0.0 for `replace` and `add`, and 1.0 for `multiply`. This means a non-existing cell ends up with the value specified in the operation.

For mixed tensors, the _block short form_ can also be used to modify entire dense subspaces:

```
field tensorfield type tensor(x{},y[3]) {
    indexing: attribute | summary
}
```

```
{
    "update": "id:mynamespace:tensordoctype::example",
    "fields": {
        "tensorfield": {
            "modify": {
                "operation": "replace",
                "blocks": {
                    "a": [1, 2, 3],
                    "b": [4, 5, 6]
                }
            }
        }
    }
}
```

##### Fieldpath

Fieldpath is for accessing fields within composite structures - for structures that are not part of index or attribute, it is possible to access elements directly using field paths. This is done by adding more information to the field name. For map structures, specify the key (see [example](#assign)):

```
mymap{mykey}
```

and then do the operation on the element which is keyed by "mykey". Arrays can be accessed as well (see [details](#assign)):

```
myarray[3]
```

And this is also true for structs (see [details](#assign)). **Note:** Struct updates do not work for [index](services-content.html#document) mode:

```
mystruct.value1
```

This also works for nested structures, e.g. a `map` of `map` to `array` of `struct`:

```
{
    "update": "id:mynamespace:complexdoctype::foo",
    "fields": {
        "nested_structure{firstMapKey}{secondMapKey}[4].title": {
            "assign": "Look at me, mom! I'm hiding deep in a nested type!"
        }
    }
}
```

---

## Document Processing

### Document Processing

This document describes how to develop and deploy _Document Processors_, often called _docproc_ in this documentation.

#### Document Processing

This document describes how to develop and deploy _Document Processors_, often called _docproc_ in this documentation. Document processing is a framework to create [chains](components/chained-components.html) of configurable [components](jdisc/container-components.html) that read and modify document operations. The input source splits the input data into logical units called [documents](documents.html).
A [feeder application](reads-and-writes.html) sends the documents into a document processing chain. This chain is an ordered list of document processors. Document processing examples range from language detection, HTML removal and natural language processing to mail attachment processing, character set transcoding and image thumbnailing. At the end of the processing chain, extracted data will typically be set in some fields in the document.

The motivation for document processing is that code and configuration are deployed atomically, as for all Vespa components. It is also easy to build components that access data in Vespa as part of processing.

To get started, see the [sample application](https://github.com/vespa-engine/sample-apps/tree/master/examples/document-processing). Read [indexing](indexing.html) to understand deployment and routing. As document processors are chained components just like Searchers, read [Searcher Development](searcher-development.html). For reference, see the [Javadoc](https://javadoc.io/doc/com.yahoo.vespa/docproc), and [services.xml](reference/services-docproc.html).

![Document Processing component in Vespa overview](/assets/img/vespa-overview-docproc.svg)

##### Deploying a Document Processor

Refer to [album-recommendation-docproc](https://github.com/vespa-engine/sample-apps/tree/master/examples/document-processing) to get started - [LyricsDocumentProcessor.java](https://github.com/vespa-engine/sample-apps/blob/master/examples/document-processing/src/main/java/ai/vespa/example/album/LyricsDocumentProcessor.java) is a document processor example. Add the document processor in [services.xml](reference/services-docproc.html), and then add it to a [chain](#chains). The type of processing done by the processor dictates what chain it should be part of:

- If it does general data-processing, such as populating some document fields from others, looking up data in external services, etc., it should be added to a general docproc chain.
- If, and only if, it does processing required for _indexing_ - or requires this to have already been run - it should be added to a chain which inherits the _indexing_ chain, and which is used for indexing by a content cluster.

An example that adds a general document processor to the "default" chain, and an indexing related processor to the chain used by a particular content cluster (component ids are placeholders):

```
<container id="default" version="1.0">
    <document-processing>
        <chain id="default">
            <documentprocessor id="ai.vespa.example.MyGeneralProcessor" bundle="my-bundle"/>
        </chain>
        <chain id="my-indexing-chain" inherits="indexing">
            <documentprocessor id="ai.vespa.example.MyIndexingProcessor" bundle="my-bundle"/>
        </chain>
    </document-processing>
    ...
</container>

<content id="my-content" version="1.0">
    <documents>
        <document type="music" mode="index"/>
        <document-processing cluster="default" chain="my-indexing-chain"/>
    </documents>
    ...
</content>
```

The "default" chain, if it exists, is run by default, before the chain used for indexing. The default indexing chain is called "indexing", and _must_ be inherited by any chain that is to replace it.

To run through any chain, specify a [route](/en/operations-selfhosted/routing.html) which includes the chain. For example, the route `default/chain.my-chain indexing` would route feed operations through the chain "my-chain" in the "default" container cluster, and then to the "indexing" hop, which resolves to the specified indexing chain for each content cluster the document should be sent to. More details can be found in [indexing](/en/operations-selfhosted/routing.html#document-processing).

##### Document Processors

A document processor is a component extending `com.yahoo.docproc.DocumentProcessor`. All document processors must implement `process()`:

```
public Progress process(Processing processing);
```

When the container receives a document operation, it will create a new `Processing`, and add the `DocumentPut`s, `DocumentUpdate`s or `DocumentRemove`s to the `List<DocumentOperation>` accessible through `Processing.getDocumentOperations()`.
The latter is useful also where a processing should be stopped by doing `Processing.getDocumentOperations().clear()` before `Progress.DONE`, say for blocklist use, to stop a `DocumentPut/Update`. Furthermore, the call stack of the document processing chain in question will be _copied_ to `Processing.callStack()`, so that document processors may freely modify the flow of control for this processing without affecting all other processings going on. After creation, the `Processing` is added to an internal queue. A worker thread will retrieve a `Processing` from the input queue, and run its document operations through its call stack. A minimal, no-op document processor implementation is thus: ``` ``` import com.yahoo.docproc.*; public class SimpleDocumentProcessor extends DocumentProcessor { public Progress process(Processing processing) { return Progress.DONE; } } ``` ``` The `process()` method should loop through all document operations in `Processing.getDocumentOperations()`, do whatever it sees fit to them, and return a `Progress`: ``` ``` public Progress process(Processing processing) { for (DocumentOperation op : processing.getDocumentOperations()) { if (op instanceof DocumentPut) { DocumentPut put = (DocumentPut) op; // TODO do something to 'put here } else if (op instanceof DocumentUpdate) { DocumentUpdate update = (DocumentUpdate) op; // TODO do something to 'update' here } else if (op instanceof DocumentRemove) { DocumentRemove remove = (DocumentRemove) op; // TODO do something to 'remove' here } } return Progress.DONE; } ``` ``` | Return code | Description | | --- | --- | | `Progress.DONE` | Returned if a document processor has successfully processed a `Processing`. | | `Progress.FAILED` | Processing failed and the input message should return a _fatal_ failure back to the feeding application, meaning that this application will not try to re-feed this document operation. Return an error message/reason by calling `withReason()`: This result is represented as a `500 Internal Server Error` response in [Document v1](document-v1-api-guide.html). ``` ``` if (op instanceof DocumentPut) { return Progress.FAILED.withReason("PUT is not supported"); } ``` ``` | | `Progress.INVALID_INPUT` | Available since 8.584. Processing failed due to invalid input, like a malformed document operation. This result is represented as a `400 Bad Request` response in [Document v1](document-v1-api-guide.html). | | `Progress.LATER` | See [execution model](#execution-model). The document processor wants to release the calling thread and be called again later. This is useful if e.g. calling an external service with high latency. The document processor may then save its state in the `Processing` and resume when called again later. There are no guarantees as to _when_ the processor is called again with this `Processing`; it is simply appended to the back of the input queue. By the use of `Progress.LATER`, this is an asynchronous model, where the processing of a document operation does not need to consume one thread for its entire lifespan. Note, however, that the document processors themselves are shared between all processing operations in a chain, and must thus be implemented [thread-safe](#state). | | Exception | Description | | --- | --- | | `com.yahoo.docproc.TransientFailureException` | Processing failed and the input message should return a _transient_ failure back to the feeding application, meaning that this application _may_ try to re-feed this document operation. 
| | `RuntimeException` | Throwing any other `RuntimeException` means same behavior as for `Progress.FAILED`. | ##### Chains The call stack mentioned above is another name for a _document processor chain_. Document processor chains are a special case of the general [component chains](components/chained-components.html) - to avoid confusion some concepts are explained here as well. A document processor chain is nothing more than a list of document processor instances, having an id, and represented as a stack. The document processor chains are typically not created for every processing, but are part of the configuration. Multiple ones may exist at the same time, the chain to execute will be specified by the message bus destination of the incoming message. The same document processor instance may exist in multiple document processor chains, which is why the `CallStack` of the `Processing` is responsible for knowing the next document processor to invoke in a particular message. The execution order of the document processors in a chain are not ordered explicitly, but by [ordering constraints](components/chained-components.html#ordering-components) declared in the document processors or their configuration. ##### Execution model The Document Processing Framework works like this: 1. A thread from the message bus layer appends an incoming message to an internal priority queue, shared between all document processing chains configured on a node. The priority is set based on the message bus priority of the message. Messages of the same priority are ordered FIFO. 2. One worker thread from the docproc thread pool picks one message from the head of the queue, deserializes it, copies the call stack (chain) in question, and runs it through the document processors. 3. Processing finishes if **(a)** the document(s) has passed successfully through the whole chain, or **(b)** a document processor in the chain has returned `Progress.FAILED` or thrown an exception. 4. The same thread passes the message on to the message bus layer for further transport on to its destination. There is a single instance of each document processor chain. In every chain, there is a single instance of each document processor - unless a chain is configured with multiple, identical document processors - this is a rare case. As is evident from the model above, multiple worker threads execute the document processors in a chain concurrently. Thus, many threads of execution can be going through `process()` in a document processor, at the same time. This model places an important constraint on document processor classes: _instance variables are not safe._ They must be eliminated, or made thread-safe somehow. Also see [Resource management](jdisc/container-components.html#resource-management), use `deconstruct()` in order to not leak resources. ###### Asynchronous execution The execution model outlined above also shows one important restriction: If a document processor performs any high-latency operation in its process() method, a docproc worker thread will be occupied. With all _n_ worker threads blocking on an external resource, throughput will be limited. This can be fixed by saving the state in the Processing object, and returning `Progress.LATER`. A document processor doing a high-latency operation should use a pattern like this: 1. Check a self-defined context variable in Processing for status. Basically, _have we seen this Processing before?_ 2. If no: 1. 
##### State

Any state in the document processor for the particular Processing should be kept as local variables in the `process()` method, while state which should be shared by all Processings should be kept as member variables. As the latter kind will be accessed by multiple threads at any one time, the state of such member variables must be _thread-safe_. This critical restriction is similar to those of e.g. the Servlet API. Options for implementing a multithread-safe document processor with instance variables:

1. Use immutable (and preferably final) objects: they never change after they are constructed; no modifications to their state occur after the DocumentProcessor constructor returns.
2. Use a single instance of a thread-safe class.
3. Create a single instance and synchronize access to it across all threads (but this will severely limit scalability).
4. Arrange for each thread to have its own instance, e.g. with a `ThreadLocal`.

###### Processing Context Variables

`Processing` has a `String -> Object` map that can be used to pass information between document processors. It is also useful when using `Progress.LATER` to save the state of a processing - see [Processing.java](https://github.com/vespa-engine/vespa/blob/master/docproc/src/main/java/com/yahoo/docproc/Processing.java) for `get/setVariable` and more. The [sample application](https://github.com/vespa-engine/sample-apps/tree/master/examples/document-processing) uses such context variables, too.

##### Operation ordering

###### Feed ordering

Ordering of feed operations is not guaranteed. Operations on different documents are done concurrently and are therefore not ordered. However, Vespa guarantees that operations on the same document are processed in the order they were fed, if they enter Vespa at the _same_ feed endpoint.

###### Document processing ordering

Document operations that are produced inside a document processor obey the same rules as at feed time. Whether you split the input into other documents or into multiple operations on the same document, Vespa will ensure that operations to the same document id are sequenced and delivered in the order they enter.

##### (Re)configuring Document Processing

Consider the following configuration:

```
<container id="default" version="1.0">
    <document-processing>
        <chain id="default">
            <documentprocessor id="SomeDocumentProcessor">
                <config name="foo.something">
                    <variable>value</variable>
                </config>
            </documentprocessor>
        </chain>
    </document-processing>
</container>
```

Changing chain ids, components in a chain, component configuration and schema mapping all take effect after [vespa activate](applications.html#deploy) - no restart required. Changing a _cluster name_ (i.e. the container id) requires a restart of docproc services after _vespa activate_. Note when adding or modifying a processing chain in a running cluster: if at the same time deploying a _new_ document processor (i.e.
a document processor that was unknown to Vespa at the time the cluster was started), the container must be restarted using [vespa-sentinel-cmd](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-sentinel-cmd):

```
$ vespa-sentinel-cmd restart container
```

##### Class diagram

![Document processing core class diagram](/assets/img/document-processing-class-diagram.svg)

The framework core supports asynchronous processing, processing one or multiple documents or document updates at the same time, document processors that make dynamic decisions about the processing flow, and passing of information between processors outside the document or document update:

- One or more named `Docproc Services` may be created. One of the services is the _default_.
- A service accepts subclasses of `DocumentOperation` for processing, meaning `DocumentPuts`, `DocumentUpdates` and `DocumentRemoves`. It has a `Call Stack` which lists the calls to make to various `DocumentProcessors` to process each DocumentOperation handed to the service.
- Call Stacks consist of `Calls`, which refer to the Document Processor instance to call.
- Document puts and document updates are processed asynchronously; the state is kept in a `Processing` for its duration (instead of in a thread or process). A Document Processor may make some asynchronous calls (typically to remote services) and signal to the framework that it should be called again later for the same Processing to handle the outcome of the calls.
- A Processing contains its own copy of the Call Stack of the Docproc Service to keep track of what to call next. Document Processors may modify this Call Stack to dynamically decide the processing steps required to process a DocumentOperation.
- A Processing may contain one or more DocumentOperations to be processed as a unit.
- A Processing has a `context`, which is a Map of named values that can be used to pass arguments between processors.
- Processings are prepared to be stored to disk, to allow a high number of ongoing long-term processings per node.

---

## Document Select Language

### Document Selector Language

This document describes the _document selector language_, used to select a subset of documents when feeding, dumping and garbage collecting data.

#### Document Selector Language

This document describes the _document selector language_, used to select a subset of documents when feeding, dumping and garbage collecting data. It defines a text string format that can be parsed to build a parse tree, which in turn can answer whether a given document is contained within the subset or not.

##### Examples

Match all documents in the `music` schema:

`music`

As applications can have multiple schemas, match document type (schema) and then a specific value in the `artistname` field:

`music and music.artistname == "Coldplay"`

Below, the first condition states that the documents should be of type music, and the author field must exist.
The second states that the length field must be set, and be less than or equal to 1000:

`music.author and music.length <= 1000`

The next expression selects all documents where either of the subexpressions is true. The first one states that the author field should include the name John Doe, with anything in between or in front. The `\n` escape is converted to a newline before the field comparison is done, thus requiring the field to end with Doe and a newline for a match to be true. The second subexpression selects all books where no author is defined:

`book.author = "*John*Doe\n" or not book.author`

Here is an example of how parentheses are used to group expressions; a constant value false is also used. Note that the `(false or music.test)` sub-expression could be exchanged with just `music.test` without altering the result of the selection. The `not` clause inverts the first sub-expression, matching all documents where the length field is not above 1000; combined with `and`, the whole expression selects documents with length less than or equal to 1000 where the test field is also defined:

`not (music.length > 1000) and (false or music.test)`

Other examples:

- `music.version() == 3 and (music.givenname + " " + music.surname).lowercase() = "bruce spring*"`
- `id.user.hash().abs() % 300 % 7 = 1`
- `music.wavstream.hash() == music.checksum`
- `music.size / music.length > 10`
- `music.expire > now() - 7200`

##### Case sensitiveness

The identifiers used in this language (`and or not true false null id scheme namespace specific user group`) are not case-sensitive. It is recommended to use lower-case identifiers for consistency with the documentation.

##### Branch operators / precedence

The branch operators are used to combine other nodes in the parse tree generated from the text format. The branch node types are listed in the table below, in order of precedence:

| Operator | Description |
| --- | --- |
| NOT | Unary prefix operator inverting the selection of the child node |
| AND | Binary infix operator, which is true if all its children are |
| OR | Binary infix operator, which is true if any of its children are |

Use parentheses to define your own precedence. `a and b or c and d` is equivalent to `(a and b) or (c and d)` since `and` has higher precedence than `or`. The expression `a and (b or c) and d` is not equivalent to the previous two, since parentheses have been used to force the or-expression to be evaluated first.

Parentheses can also be used in value calculations. Modulo `%` has the highest precedence, multiplication `*` and division `/` come next, while addition `+` and subtraction `-` have the lowest precedence.

##### Primitives

| Primitive | Description |
| --- | --- |
| Boolean constant | The boolean constants `true` and `false` can be used to match all/nothing |
| Null constant | Referencing a field that is not present in a document returns a special `null` value. The expression `music.title` is shorthand for `music.title != null`. There are potentially subtle interactions with null values when used with comparisons, see [comparisons with missing fields (null values)](#comparisons-with-missing-fields-null-values). |
| Document type | A document type can be used as a primitive to select a given type of documents - [example](/en/visiting.html#analyzing-field-values). |
| Document field specification | A document field specification (`doctype.field`) can be used as a primitive to select all documents that have the field set - a shorter form of `doctype.field != null` |
| Comparison | The comparison is a primitive used to compare two values |

##### Comparison

Comparison operators compare two values using an operator. All the operators are infix and take two arguments.

| Operator | Description |
| --- | --- |
| \> | True if the left argument is greater than the right one. Operators using greater-than or less-than notation only make sense where both arguments are either numbers or strings. In the case of strings, they are ordered by their binary (byte-wise) representation, with the first character being the most significant and the last character the least significant. If the arguments are of mixed type, or one of the arguments is not a number or a string, the comparison is invalid and does not match. |
| \< | Matches if the left argument is less than the right one |
| \<= | Matches if the left argument is less than or equal to the right one |
| \>= | Matches if the left argument is greater than or equal to the right one |
| == | Matches if both arguments are exactly the same. Both arguments must be of the same type for a match |
| != | Matches if both arguments are not the same |
| = | String matching using a glob pattern. Matches only if the pattern given as the right argument matches the whole string given by the left argument. Asterisk `*` can be used to match zero or more of any character. Question mark `?` can be used to match any one character. The pattern-matching operators, regex `=~` and glob `=`, only make sense if both arguments are strings. The regex operator will never match anything else. The glob operator reverts to the behaviour of `==` if both arguments are not strings. |
| =~ | String matching using a regular expression. Matches if the regular expression given as the right argument matches the string given as the left argument. Regex notation is like Perl. Use '^' to indicate start of value, '$' to indicate end of value |

###### Comparisons with missing fields (null values)

The only comparison operators that are well-defined when one or both operands may be `null` (i.e. the field is not present) are `==` and `!=`. Using any other comparison operator on a `null` value yields a special _invalid_ value. Invalid values may "poison" any logical expression they are part of:

- `AND` returns invalid if none of its operands are false and at least one is invalid
- `OR` returns invalid if none of its operands are true and at least one is invalid
- `NOT` returns invalid if the operand is invalid

If an invalid value is propagated as the root result of a selection expression, the document is not considered a match. This is usually the behavior you want; if a field does not exist, any selection requiring it should not match either. However, in garbage collection, documents which result in an invalid selection are _not_ removed, as that could be dangerous. One example where this may have _unexpected_ behavior:

1. You have many documents of type `foo` already fed into a cluster.
2. You add a new field `expires_at_time` to the document type and update a subset of the documents that you wish to keep.
3. You add a garbage collection selection to the `foo` document declaration to only keep non-expired documents: `foo.expires_at_time > now()`

At this point, the old documents that _do not_ contain an `expires_at_time` field will _not_ be removed, as the expression will evaluate to invalid instead of `false`. To work around this issue, "short-circuiting" using a field presence check may be used: `(foo.expires_at_time != null) and (foo.expires_at_time > now())`.

##### Null behavior with imported fields

If your selection references imported fields, `null` will be returned for any imported field when the selection is evaluated in a context where the referenced document can't be retrieved. For GC expressions this will happen in the client as part of the feed routing logic, and it may also happen on backend nodes whose parent document set is incomplete (in case of node failures etc.). It is therefore important to have this in mind when writing GC selections using imported fields.

When you specify a selection criterion in a `<document>` tag, you're stating what a document must satisfy in order to be fed into the content cluster and to be kept there. As an example, imagine a document type `music_recording` with an imported field `artist_is_cool` that points to a boolean field `is_cool` in a parent `artist` document. If you only want your cluster to retain recordings from artists that are certifiably cool, you might be tempted to write a selection like the following:

```
<documents>
    <document type="music_recording" selection="music_recording.artist_is_cool == true"/>
</documents>
```

**This won't work as expected**, because this expression is evaluated as part of the feeding pipeline to figure out if a cluster should accept a given document. At that point in time, there is no access to the parent document. Consequently, the field will return `null` and the document won't be routed to the cluster. Instead, write your expressions to handle the case where the parent document _may not exist_:

```
<documents>
    <document type="music_recording"
              selection="music_recording.artist_is_cool == null or music_recording.artist_is_cool == true"/>
</documents>
```

With this selection, we explicitly let a document be accepted into the cluster if its imported field is _not_ available. However, if it _is_ available, we allow it to be used for GC.

##### Locale / Character sets

The language currently does not support character sets other than ASCII. Glob and regex matching of single characters is not guaranteed to match exactly one character, but might match a part of a character represented by multiple byte values.

##### Values

The comparison operators compare two values. A value can be any of the following:

| Document field specification | Syntax: `<doctype>.<fieldpath>` Documents have a set of fields defined, depending on the document type. The field name is the identifier used for the field. This expression returns the value of the field, which can be an integer, a floating point number, a string, an array, or a map of these types. For multivalue fields, only the _equals_ operator is supported for comparison. The semantics is that the array returned by the field value must _contain_ at least one element that matches the other side of the comparison. For maps, there must exist a key matching the comparison. The simplest use of the field path is to specify a field, but for complex types please refer to [the field path syntax documentation](document-field-path.html). | | Id | Syntax: ` id.[scheme|namespace|type|specific|user|group] ` Each document has a Document Id, uniquely identifying that document within a Vespa installation. The id operator returns the string identifier, or if an optional argument is given, a part of the id.
- scheme (id) - namespace (to separate different users' data) - type (specified in the id scheme) - specific (User specified part to distinguish documents within a namespace) - user (The number specified in document ids using the n= modifier) - group (The string group specified in document ids using the g= modifier) | | null | The value null can be given to specify nothingness. For instance, a field specification for a document not containing the field will evaluate to null, so the comparison 'music.artist == null' will select all documents that don't have the artist field set. 'id.user == null' will match all documents that don't use the `n=`[document id scheme](../documents.html#id-scheme). Tensor fields can _only_ be compared against null. It's not possible to write a document selection that uses the _contents_ of tensor fields—only their presence can be checked. | | Number | A value can be a number, either an integer or a floating point number. Type of number is insignificant. You don't have to use the same type of number on both sides of a comparison. For instance '3.0 \< 4' will match, and '3.0 == 3' will probably match (operator == is generally not advised for floating point numbers due to rounding issues). Numbers can be written in multiple ways - examples: ``` 1234 -234 +53 +534.34 543.34e4 -534E-3 0.2343e-8 ``` | | Strings | A string value is given quoted with double quotes (i.e. "mystring"). The string is interpreted as an ASCII string. that is, only ASCII values 32 to 126 can be used unescaped, apart from the characters \ and " which also needs to be escaped. Escape common special characters like: | Character | Escaped character | | --- | --- | | Newline | \n | | Carriage return | \r | | Tab | \t | | Form feed | \f | | " | \" | | Any other character | \x## (where ## is a two digit hexadecimal number specifying the ASCII value. | | ###### Value arithmetics You can do arithmetics on values. The common arithmetics operators addition `+`, subtraction `-`, multiplication `*`, division `/` and modulo `%` are supported. ###### Functions Functions are called on something and returns a value that can be used in comparison expressions: | Value functions | A value function takes a value, does something with it and returns a value which can be of any type. - _abs()_ Called on a numeric type, returns the absolute value of that numeric type. That is -3 returns 3 and -4.3 returns 4.3. - _hash()_ Calculates an MD5 hash of whatever value it is called on. The result is a signed 64-bit integer. (Use abs() after if you want to only get positive hash values). - _lowercase()_ Called on a string value to turn upper case characters into lower case ones. **NOTE:** This only works for the characters 'a' through 'z', no locale support. | | Document type functions | Some functions can take a document type instead of a value, and return a value based on the type. - _version()_ The `version()` function returns the version number of a document type. | ###### Now function Document selection provides a _now()_ function, which returns the current date timestamp. Use this to filter documents by age, typically for [garbage collection](services-content.html#documents). **Example**: If you have a long field _inserttimestamp_ in your `music` schema, this expression will only match documents from the last two hours: `music.inserttimestamp > now() - 7200` ##### Using imported fields in selections When using [parent-child](../parent-child.html) you can refer to simple imported fields (i.e. 
top-level primitive fields) in selections as if they were regular fields in the child document type. Complex fields (collections, structures etc.) are not supported. **Important:** special care needs to be taken when using document selections referencing imported fields, especially if these are used as part of garbage collection expressions. If an imported field references a document that cannot be accessed at evaluation time, the imported field behaves as if it had been a regular, non-present field in the child document. In other words, it will return the special `null` value. See [comparisons with missing fields (null values)](#comparisons-with-missing-fields-null-values) for a more detailed discussion of null-semantics and how to write selections that handle these in a well-defined manner. In particular, read [null behavior with imported fields](#null-behavior-with-imported-fields) if you're writing GC selections.

###### Example

The following is an example of a 3-level parent-child hierarchy.

Grandparent schema:

```
schema grandparent {
    document grandparent {
        field a1 type int {
            indexing: attribute | summary
        }
    }
}
```

Parent schema, with reference to grandparent:

```
schema parent {
    document parent {
        field a2 type int {
            indexing: attribute | summary
        }
        field ref type reference<grandparent> {
            indexing: attribute | summary
        }
    }
    import field ref.a1 as a1 {}
}
```

Child schema, with reference to parent and (transitively) grandparent:

```
schema child {
    document child {
        field a3 type int {
            indexing: attribute | summary
        }
        field ref type reference<parent> {
            indexing: attribute | summary
        }
    }
    import field ref.a1 as a1 {}
    import field ref.a2 as a2 {}
}
```

Using these in document selection expressions is easy:

Find all child docs whose grandparents have an `a1` greater than 5: `child.a1 > 5`

Find all child docs whose grandparents have an `a1` of 10 and whose parents have an `a2` of 4: `child.a1 == 10 and child.a2 == 4`

Find all child docs where the parent document cannot be found (or where the referenced field is not set in the parent): `child.a2 == null`

Note that when visiting `child` documents we only ever access imported fields via the **child** document type itself. A much more complete list of usage examples for the above document schemas and reference relations can be found in the [imported fields in selections](https://github.com/vespa-engine/system-test/blob/master/tests/search/parent_child/imported_fields_in_selections.rb) system test. This test covers both the visiting and GC cases.

##### Constraints

Language identifiers restrict what can be used as document type names. The following values are not valid document type names: _true, false, and, or, not, id, null_

##### Grammar - EBNF of the language

To simplify, double casing of strings has not been included. The identifiers "null", "true", "false" etc. can be written in any case, including mixed case.

```
nil = "null" ;
bool = "true" | "false" ;
posdigit = '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
digit = '0' | posdigit ;
hexdigit = digit | 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'A' | 'B' | 'C' | 'D' | 'E' | 'F' ;
integer = ['-' | '+'], posdigit, { digit } ;
float = ['-' | '+'], digit, { digit }, ['.' , { digit }, [ ('e' | 'E'), posdigit, { digit }] ] ;
number = float | integer ;
stdchars = ? All ASCII chars except '\\', '"', 0 - 31 and 127 - 255 ? ;
alpha = ? ASCII characters in the range a-z and A-Z ? ;
alphanum = alpha | digit ;
space = ( ' ' | '\t' | '\f' | '\r' | '\n' ) ;
string = '"', { stdchars | ( '\\', ( 't' | 'n' | 'f' | 'r' | '"' ) ) | ( "\\x", hexdigit, hexdigit ) }, '"' ;
doctype = alpha, { alphanum } ;
fieldname = { alphanum | '{' | '}' | '[' | ']' | '.' } ;
function = alpha, { alphanum } ;
idarg = "scheme" | "namespace" | "type" | "specific" | "user" | "group" ;
searchcolumnarg = integer ;
operator = ">=" | ">" | "==" | "=~" | "=" | "<=" | "<" | "!=" ;
idspec = "id", ['.', idarg] ;
searchcolumnspec = "searchcolumn", ['.', searchcolumnarg] ;
fieldspec = doctype, ( function | ('.', fieldname) ) ;
value = ( valuegroup | nil | number | string | idspec | searchcolumnspec | fieldspec ), { function } ;
valuefuncmod = ( valuegroup | value ), '%', ( valuefuncmod | valuegroup | value ) ;
valuefuncmul = ( valuefuncmod | valuegroup | value ), ( '*' | '/' ), ( valuefuncmul | valuefuncmod | valuegroup | value ) ;
valuefuncadd = ( valuefuncmul | valuefuncmod | valuegroup | value ), ( '+' | '-' ), ( valuefuncadd | valuefuncmul | valuefuncmod | valuegroup | value ) ;
valuegroup = '(', arithmvalue, ')' ;
arithmvalue = ( valuefuncadd | valuefuncmul | valuefuncmod | valuegroup | value ) ;
comparison = arithmvalue, { space }, operator, { space }, arithmvalue ;
leaf = bool | comparison | fieldspec | doctype ;
not = "not", { space }, ( group | leaf ) ;
and = ( not | group | leaf ), { space }, "and", { space }, ( and | not | group | leaf ) ;
or = ( and | not | group | leaf ), { space }, "or", { space }, ( or | and | not | group | leaf ) ;
group = '(', { space }, ( or | and | not | group | leaf ), { space }, ')' ;
expression = ( or | and | not | group | leaf ) ;
```

---

## Document Summaries

### Document Summaries

A _document summary_ is the information that is shown for each document in a query result.

#### Document Summaries

A _document summary_ is the information that is shown for each document in a query result. What information to include is determined by a _document summary class_: a named set of fields with config on which information they should contain. A special document summary named `default` is always present and used by default. This contains:

- all fields which specify in their indexing statements that they may be included in summaries
- all fields specified in any document summary
- [sddocname](reference/default-result-format.html#sddocname)
- [documentid](reference/default-result-format.html#documentid)
Summary classes are defined in the schema:

```
schema music {
    document music {
        field artist type string {
            indexing: summary | index
        }
        field album type string {
            indexing: summary | index
            index: enable-bm25
        }
        field year type int {
            indexing: summary | attribute
        }
        field category_scores type tensor(cat{}) {
            indexing: summary | attribute
        }
    }
    document-summary my-short-summary {
        summary artist {}
        summary album {}
    }
}
```

See the [schema reference](reference/schema-reference.html#summary) for details.

The summary class to use for a query is determined by the [presentation.summary](reference/query-api-reference.html#presentation.summary) parameter:

```
$ vespa query "select * from music where album contains 'head'" \
    "presentation.summary=my-short-summary"
```

A common reason to define a document summary class is [performance](#performance): by configuring a document summary which only contains attributes, the result can be generated without disk accesses. Note that this is needed to ensure only memory is accessed even if all fields are attributes, because the [document id](documents.html#document-ids) is not stored as an attribute. Document summaries may also contain [dynamic snippets and highlighted terms](#dynamic-snippets). The document summary class to use can also be specified programmatically in the `fill()` method from a Searcher, and multiple fill operations interleaved with programmatic filtering can be used to optimize data access and transfer.

##### Selecting summary fields in YQL

A [YQL](query-language.html) statement can also be used to filter which fields from a document summary to include in results. Note that this is just a field filter in the container - a summary containing all fields of a summary class is always fetched from content nodes, so to optimize performance it is necessary to create custom summary classes.

```
$ vespa query "select artist, album, documentid, sddocname from music where album contains 'head'"
```

```
{
    "root": {
        "children": [
            {
                "id": "id:mynamespace:music::a-head-full-of-dreams",
                "relevance": 0.16343879032006284,
                "source": "mycontentcluster",
                "fields": {
                    "sddocname": "music",
                    "documentid": "id:mynamespace:music::a-head-full-of-dreams",
                    "artist": "Coldplay",
                    "album": "A Head Full of Dreams"
                }
            }
        ]
    }
}
```

Use `*` to select all the fields of the chosen document summary class (which is `default` by default):

```
$ vespa query "select * from music where album contains 'head'"
```

```
{
    "root": {
        "children": [
            {
                "id": "id:mynamespace:music::a-head-full-of-dreams",
                "relevance": 0.16343879032006284,
                "source": "mycontentcluster",
                "fields": {
                    "sddocname": "music",
                    "documentid": "id:mynamespace:music::a-head-full-of-dreams",
                    "artist": "Coldplay",
                    "album": "A Head Full of Dreams",
                    "year": 2015,
                    "category_scores": {
                        "type": "tensor(cat{})",
                        "cells": {
                            "pop": 1.0,
                            "rock": 0.20000000298023224,
                            "jazz": 0.0
                        }
                    }
                }
            }
        ]
    }
}
```

##### Summary field rename

Summary classes may define fields by names not used in the document type:

```
document-summary rename-summary {
    summary artist_name {
        source: artist
    }
}
```

Refer to the [schema reference](reference/schema-reference.html#source) for adding [attribute](reference/schema-reference.html#add-or-remove-an-existing-document-field-from-document-summary) and [non-attribute](reference/schema-reference.html#add-or-remove-a-new-non-attribute-document-field-from-document-summary) fields - some changes require re-indexing.
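As noted above, a Searcher can also request a specific document summary class programmatically through `fill()`. A minimal sketch, assuming the `my-short-summary` class from the schema example above (the searcher name is illustrative):

```
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.searchchain.Execution;

public class ShortSummarySearcher extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        Result result = execution.search(query);      // execute the rest of the chain
        // Fill hits from a dedicated, attribute-only summary class instead of 'default'
        execution.fill(result, "my-short-summary");
        return result;
    }
}
```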
##### Dynamic snippets

Use [dynamic](reference/schema-reference.html#summary) to generate dynamic snippets from fields, based on the query keywords. Example from Vespa Documentation Search - see the [schema](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/src/main/application/schemas/doc.sd):

```
document doc {
    field content type string {
        indexing: summary | index
        summary: dynamic
    }
```

A query for _document summary_ returns:

> Use **document summaries** to configure which fields ... indexing: **summary** | index } } **document-summary** titleyear { **summary** title ...

The example above creates a dynamic summary with the matched terms highlighted. Highlighting the matched terms is called [bolding](reference/schema-reference.html#bolding) and can be enabled independently of dynamic summaries. Refer to the [reference](reference/schema-reference.html#summary) for the response format.

###### Dynamic snippet configuration

You can configure generation of dynamic snippets by adding an instance of the [vespa.config.search.summary.juniperrc config](https://github.com/vespa-engine/vespa/blob/master/searchsummary/src/vespa/searchsummary/config/juniperrc.def) in services.xml, inside the `<content>` cluster tag for the content cluster in question. E.g.:

```
<content id="music" version="1.0">
    ...
    <config name="vespa.config.search.summary.juniperrc">
        <max_matches>2</max_matches>
        <length>1000</length>
        <surround_max>500</surround_max>
        <min_length>300</min_length>
    </config>
    ...
</content>
```

Numbers here are in bytes.

##### Performance

[Attribute](attributes.html) fields are held in memory. This means summaries are memory-only operations if all fields requested are attributes, and this is the optimal way to get high query throughput. The other document fields are stored as blobs in the [document store](proton.html#document-store). Requesting these fields may therefore require a disk access, increasing latency.

**Important:** The default summary class will access the document store, as it includes the [documentid](reference/default-result-format.html#documentid) field which is stored there. For maximum query throughput using memory-only access, use a dedicated summary class with attributes only.

When using additional summary classes to increase performance, only the network data size is changed - the data read from storage is unchanged. Having "debug" fields with summary enabled will hence also affect the amount of information that needs to be read from disk.

See [query execution](query-api.html#query-execution) - breakdown of the summary (a.k.a. result processing, rendering) phase:

- The document summary latency on the content node, tracked by [content\_proton\_search\_protocol\_docsum\_latency\_average](operations/metrics.html).
- Getting data across from content nodes to containers.
- Deserialization from internal binary formats (potentially) to Java objects if touched in a [Searcher](searcher-development.html), and finally serialization to JSON (default rendering) + rendering and network.

The work, and thus latency, increases with more [hits](reference/query-api-reference.html#hits). Use [query tracing](query-api.html#query-tracing) to analyze performance. Refer to [content node summary cache](performance/caches-in-vespa.html#content-node-summary-cache).

---

## Document V1 Api Guide

### /document/v1 API guide

Use the _/document/v1/_ API to read, write, update and delete documents.
#### /document/v1 API guide Use the _/document/v1/_ API to read, write, update and delete documents. Refer to the [document/v1 API reference](reference/document-v1-api-reference.html) for API details. [Reads and writes](reads-and-writes.html) has an overview of alternative tools and APIs as well as the flow through the Vespa components when accessing documents. See [getting started](#getting-started) for how to work with the _/document/v1/ API_. Examples: | GET | | Get | ``` $ curl http://localhost:8080/document/v1/my_namespace/music/docid/love-id-here-to-stay ``` | | Visit | [Visit](visiting.html) all documents with given namespace and document type: ``` $ curl http://localhost:8080/document/v1/namespace/music/docid ``` Visit all documents using continuation: ``` $ curl http://localhost:8080/document/v1/namespace/music/docid?continuation=AAAAEAAAAAAAAAM3AAAAAAAAAzYAAAAAAAEAAAAAAAFAAAAAAABswAAAAAAAAAAA ``` Visit using a _selection_: ``` $ curl http://localhost:8080/document/v1/namespace/music/docid?selection=music.genre=='blues' ``` Visit documents across all _non-global_ document types and namespaces stored in content cluster `mycluster`: ``` $ curl http://localhost:8080/document/v1/?cluster=mycluster ``` Visit documents across all _[global](reference/services-content.html#document)_ document types and namespaces stored in content cluster `mycluster`: ``` $ curl http://localhost:8080/document/v1/?cluster=mycluster&bucketSpace=global ``` Read about [visiting throughput](#visiting-throughput) below. | | | POST | Post data in the [document JSON format](reference/document-json-format.html). ``` $ curl -X POST -H "Content-Type:application/json" --data ' { "fields": { "artist": "Coldplay", "album": "A Head Full of Dreams", "year": 2015 } }' \ http://localhost:8080/document/v1/mynamespace/music/docid/a-head-full-of-dreams ``` | | PUT | Do a [partial update](partial-updates.html) for a document. ``` $ curl -X PUT -H "Content-Type:application/json" --data ' { "fields": { "artist": { "assign": "Warmplay" } } }' \ http://localhost:8080/document/v1/mynamespace/music/docid/a-head-full-of-dreams ``` | | DELETE | Delete a document by ID: ``` $ curl -X DELETE http://localhost:8080/document/v1/mynamespace/music/docid/a-head-full-of-dreams ``` Delete all documents in the `music` schema: ``` $ curl -X DELETE \ "http://localhost:8080/document/v1/mynamespace/music/docid?selection=true&cluster=my_cluster" ``` | ##### Conditional writes A _test-and-set_ [condition](reference/document-select-language.html) can be added to Put, Remove and Update operations. Example: ``` $ curl -X PUT -H "Content-Type:application/json" --data ' { "condition": "music.artist==\"Warmplay\"", "fields": { "artist": { "assign": "Coldplay" } } }' \ http://localhost:8080/document/v1/mynamespace/music/docid/a-head-full-of-dreams ``` **Important:** Use _documenttype.fieldname_ (e.g. music.artist) in the condition, not only _fieldname_. If the condition is not met, a _412 Precondition Failed_ is returned: ``` ``` { "pathId": "/document/v1/mynamespace/music/docid/a-head-full-of-dreams", "id": "id:mynamespace:music::a-head-full-of-dreams", "message": "[UNKNOWN(251013) @ tcp/vespa-container:19112/default]: ReturnCode(TEST_AND_SET_CONDITION_FAILED, Condition did not match document nodeIndex=0 bucket=20000000000000c4 ) " } ``` ``` Also see the [condition reference](reference/document-json-format.html#test-and-set). 
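The same conditional update can be issued from Java. A minimal sketch using the JDK HTTP client against the endpoint and document id from the curl example above:

```
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConditionalUpdate {

    public static void main(String[] args) throws Exception {
        // Partial update applied only if the condition matches the stored document
        String json = """
                {
                    "condition": "music.artist==\\"Warmplay\\"",
                    "fields": {
                        "artist": { "assign": "Coldplay" }
                    }
                }
                """;
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/document/v1/mynamespace/music/docid/a-head-full-of-dreams"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body()); // 412 if the condition does not match
    }
}
```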
##### Create if nonexistent

###### Upserts

Updates to nonexistent documents are supported using [create](reference/document-json-format.html#create). This is often called an _upsert_ — insert a document if it does not already exist, or update it if it exists. An empty document is created on the content nodes before the update is applied. This simplifies client code in the case of multiple writers. Example:

```
$ curl -X PUT -H "Content-Type:application/json" --data '
  {
      "fields": {
          "artist": { "assign": "Coldplay" }
      }
  }' \
  "http://localhost:8080/document/v1/mynamespace/music/docid/a-head-full-of-thoughts?create=true"
```

###### Conditional updates and puts with create

Conditional updates and puts can be combined with [create](reference/document-json-format.html#create). This has the following semantics:

- If the document already exists, the condition is evaluated against the most recent document version available. The operation is applied if (and only if) the condition matches.
- Otherwise (i.e. the document does not exist or the newest document version is a tombstone), the condition is _ignored_ and the operation is applied as if no condition was provided.

Support for conditional puts with create was added in Vespa 8.178.

```
$ curl -X POST -H "Content-Type:application/json" --data '
  {
      "fields": {
          "artist": { "assign": "Coldplay" }
      }
  }' \
  "http://localhost:8080/document/v1/mynamespace/music/docid/a-head-full-of-thoughts?create=true&condition=music.title%3D%3D%27best+of%27"
```

**Warning:** If all existing replicas of a document are missing when an operation with `"create": true` is executed, a new document will always be created. This happens even if a condition has been given. If the existing replicas become available later, their version of the document will be overwritten by the newest update, since it has a higher timestamp.

**Note:** See [document expiry](documents.html#document-expiry) for auto-created documents — it is possible to create documents that do not match the selection criterion.

**Note:** Specifying _create_ for a Put operation _without_ a condition has no observable effect, as unconditional Put operations will always write a new version of a document regardless of whether it existed already.

##### Data dump

To iterate over documents, use [visiting](visiting.html) — sample output:

```
{
    "pathId": "/document/v1/namespace/doc/docid",
    "documents": [
        {
            "id": "id:namespace:doc::id-1",
            "fields": {
                "title": "Document title 1"
            }
        }
    ],
    "continuation": "AAAAEAAAAAAAAAM3AAAAAAAAAzYAAAAAAAEAAAAAAAFAAAAAAABswAAAAAAAAAAA"
}
```

Note the _continuation_ token — use this in the next request for more data. Below is a sample script dumping all data, using [jq](https://stedolan.github.io/jq/) for JSON parsing. It splits the corpus into 8 slices by default; using a number of slices at least four times the number of container nodes is recommended for high throughput. The timeout can be set lower for benchmarking. (Each request has a maximum timeout of 60s to ensure progress is saved at regular intervals.)

```
#!/bin/bash

set -eo pipefail

if [ $# -gt 2 ]
then
  echo "Usage: $0 [number of slices, default 8] [timeout in seconds, default 31536000 (1 year)]"
  exit 1
fi

endpoint="https://my.vespa.endpoint"
cluster="db"
selection="true"

slices="${1:-8}"
timeout="${2:-31536000}"
curlTimeout="$((timeout > 60 ? 60 : timeout))"
url="$endpoint/document/v1/?cluster=$cluster&selection=$selection&stream=true&timeout=$curlTimeout&concurrency=8&slices=$slices"
##### auth can be something like auth='--key data-plane-private-key.pem --cert data-plane-public-cert.pem'
auth="--key my-key --cert my-cert -H 'Authorization: my-auth'"
curl="curl -sS $auth"
start=$(date '+%s')
doom=$((start + timeout))

function visit {
  sliceId="$1"
  documents=0
  continuation=""
  while
    printf -v filename "data-%03g-%012g.json.gz" $sliceId $documents
    json="$(eval "$curl '$url&sliceId=$sliceId$continuation'" | tee >( gzip > $filename ) | jq '{ documentCount, continuation, message }')"
    message="$(jq -re .message <<< $json)" && echo "Failed visit for sliceId $sliceId: $message" >&2 && exit 1
    documentCount="$(jq -re .documentCount <<< $json)" && ((documents += $documentCount))
    [ "$(date '+%s')" -lt "$doom" ] && token="$(jq -re .continuation <<< $json)"
  do
    echo "$documentCount documents retrieved from slice $sliceId; continuing at $token"
    continuation="&continuation=$token"
  done
  time=$(($(date '+%s') - start))
  echo "$documents documents total retrieved in $time seconds ($((documents / time)) docs/s) from slice $sliceId" >&2
}

for ((sliceId = 0; sliceId < slices; sliceId++))
do
  visit $sliceId &
done
wait
```

###### Visiting throughput

Note that a visit with a selection is a linear scan over all the music documents in the request examples at the start of this guide. Each complete visit thus requires the selection expression to be evaluated for all documents. Running concurrent visits with selections that match disjoint subsets of the document corpus is therefore a poor way of increasing throughput, as work is duplicated across each such visit. Fortunately, the API offers other options for increasing throughput (a minimal sliced-visit sketch follows below the list):

- Split the corpus into any number of smaller [slices](reference/document-v1-api-reference.html#slices), each to be visited by a separate, independent series of HTTP requests. This is by far the most effective setting to change, as it allows visiting through all HTTP containers simultaneously, and from any number of clients—either of which is typically the bottleneck for visits through _/document/v1_. A good value for this setting is at least a handful per container.
- Increase backend [concurrency](reference/document-v1-api-reference.html#concurrency) so each visit HTTP response is promptly filled with documents. When using this together with slicing (above), take care to also stream the HTTP responses (below), to avoid buffering too much data in the container layer. When a high number of slices is specified, this setting may have no effect.
- [Stream](reference/document-v1-api-reference.html#stream) the HTTP responses. This lets you receive data earlier, and more of it per request, reducing HTTP overhead. It also minimizes memory usage due to buffering in the container, allowing higher concurrency per container. It is recommended to always use this, but the default is not to, due to backwards compatibility.
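A minimal Java sketch of one such slice, following continuation tokens until the slice is exhausted. The endpoint and the `music` cluster are taken from the examples in this guide; the token extraction is deliberately naive and a real client would use a JSON parser:

```
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VisitSlice {

    // Visit one slice of the corpus, following continuation tokens until the slice is exhausted.
    static void visitSlice(HttpClient client, int sliceId, int slices) throws Exception {
        String base = "http://localhost:8080/document/v1/?cluster=music&stream=true"
                + "&slices=" + slices + "&sliceId=" + sliceId;
        String continuation = "";
        while (true) {
            HttpRequest request = HttpRequest.newBuilder(URI.create(base + continuation)).build();
            String body = client.send(request, HttpResponse.BodyHandlers.ofString()).body();
            // ... process the "documents" array in 'body' ...
            String token = continuationTokenOf(body);
            if (token == null) break;                      // no token in the response: slice is done
            continuation = "&continuation=" + token;
        }
    }

    // Naive token extraction for the sketch; use a JSON library in real code.
    static String continuationTokenOf(String responseBody) {
        Matcher m = Pattern.compile("\"continuation\"\\s*:\\s*\"([^\"]+)\"").matcher(responseBody);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws Exception {
        visitSlice(HttpClient.newHttpClient(), 0, 8);      // visit slice 0 of 8
    }
}
```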
##### Getting started

Pro-tip: It is easy to generate a `/document/v1` request by using the [Vespa CLI](vespa-cli.html) with the `-v` option, which outputs the generated `/document/v1` request - example:

```
$ vespa document -v ext/A-Head-Full-of-Dreams.json
curl -X POST -H 'Content-Type: application/json' --data-binary @ext/A-Head-Full-of-Dreams.json http://127.0.0.1:8080/document/v1/mynamespace/music/docid/a-head-full-of-dreams
Success: put id:mynamespace:music::a-head-full-of-dreams
```

See the [document JSON format](reference/document-json-format.html) for creating JSON payloads. This is a quick guide to dumping random documents from a cluster to get started:

1. To get documents from a cluster, look up the content cluster name from the configuration, like `<content id="music" version="1.0">` in the [album-recommendation](https://github.com/vespa-engine/sample-apps/blob/master/album-recommendation/app/services.xml) example.
2. Use the cluster name to start dumping document IDs (skip `jq` for full JSON):

   ```
   $ curl -s 'http://localhost:8080/document/v1/?cluster=music&wantedDocumentCount=10&timeout=60s' | \
     jq -r .documents[].id
   ```

   ```
   id:mynamespace:music::love-is-here-to-stay
   id:mynamespace:music::a-head-full-of-dreams
   id:mynamespace:music::hardwired-to-self-destruct
   ```

   `wantedDocumentCount` is useful to let the operation run longer to find documents, to avoid an empty result. This operation is a scan through the corpus, and it is normal to get an empty result and a [continuation token](#data-dump).
3. Look up the document with id `id:mynamespace:music::love-is-here-to-stay`:

   ```
   $ curl -s 'http://localhost:8080/document/v1/mynamespace/music/docid/love-is-here-to-stay' | jq .
   ```

   ```
   {
       "pathId": "/document/v1/mynamespace/music/docid/love-is-here-to-stay",
       "id": "id:mynamespace:music::love-is-here-to-stay",
       "fields": {
           "artist": "Diana Krall",
           "year": 2018,
           "category_scores": {
               "type": "tensor(cat{})",
               "cells": {
                   "pop": 0.4000000059604645,
                   "rock": 0,
                   "jazz": 0.800000011920929
               }
           },
           "album": "Love Is Here To Stay"
       }
   }
   ```
4. Read more about [document IDs](documents.html).

##### Troubleshooting

- When troubleshooting documents not found using the query API, use [vespa visit](vespa-cli.html#documents) to export the documents. Then compare the `id` field with other user-defined `id` fields in the query.
- Document not found responses look like:
- Query results can have results like:
- Delete _all_ documents in the _music_ schema, with security credentials:

##### Request size limit

Starting from version 8.577.16, Vespa returns 413 (Content Too Large) as a response to POST and PUT requests that are above the request size limit. To avoid this, automatically check document size and truncate or split large documents before feeding. For optimal performance, it is recommended to keep the document size below 10 MB.

##### Backpressure

Vespa returns response code 429 (Too Many Requests) as a backpressure signal whenever client feed throughput exceeds system capacity. Clients should implement retry strategies as described in the [HTTP best practices](cloud/http-best-practices.html) document. Instead of implementing your own retry logic, consider using Vespa's feed clients, which automatically handle retries and backpressure. See the [feed command](vespa-cli.html#documents) of the Vespa CLI and the [vespa-feed-client](vespa-feed-client.html). The `/document/v1` API includes a configurable operation queue that by default is tuned to balance latency, throughput and memory.
Applications can adjust this balance by overriding the parameters defined in the [document-operation-executor](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/document-operation-executor.def) config definition. To optimize for higher throughput at the cost of increased latency and higher memory usage on the container, increase any of the `maxThrottled` (maximum queue capacity in number of operations), `maxThrottledAge` (maximum time in queue in seconds) and `maxThrottledBytes` (maximum memory usage in bytes) parameters. This allows the container to buffer more operations during temporary spikes in load, reducing the number of 429 responses while increasing request latency. Make sure to increase operation and client timeouts to accommodate the increased latency. See the [config definition](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/document-operation-executor.def) for a detailed explanation of each parameter.

Set the values to `0` for the opposite effect, i.e. to optimize for latency. Operations will be dispatched directly, and failed out immediately if the number of pending operations exceeds the dynamic window size of the document processing pipeline.

_Example: overriding the default value of all 3 parameters to `0`:_

```
<config name="com.yahoo.document.restapi.document-operation-executor">
    <maxThrottled>0</maxThrottled>
    <maxThrottledAge>0</maxThrottledAge>
    <maxThrottledBytes>0</maxThrottledBytes>
</config>
```

The effective operation queue configuration is logged when the container starts up, see the example below.

```
INFO container Container.com.yahoo.document.restapi.resource.DocumentV1ApiHandler Operation queue: max-items=256, max-age=3000 ms, max-bytes=100 MB
```

You can observe the state of the operation queue through the metrics `httpapi_queued_operations`, `httpapi_queued_bytes` and `httpapi_queued_age`.

##### Using number and group id modifiers

Do not use group or number modifiers with regular indexed mode document types. These are special cases that only work as expected for document types with [mode=streaming or mode=store-only](reference/services-content.html#document). Examples:

| Get | Get a document in a group: ``` $ curl http://localhost:8080/document/v1/mynamespace/music/number/23/some_key ``` ``` $ curl http://localhost:8080/document/v1/mynamespace/music/group/mygroupname/some_key ``` | | Visit | Visit all documents for a group: ``` $ curl http://localhost:8080/document/v1/namespace/music/number/23/ ``` ``` $ curl http://localhost:8080/document/v1/namespace/music/group/mygroupname/ ``` |

---

## Document V1 Api Reference

### /document/v1 API reference

This is the /document/v1 API reference documentation.

#### /document/v1 API reference

This is the /document/v1 API reference documentation. Use this API for synchronous [Document](../documents.html) operations to a Vespa endpoint - refer to [reads and writes](../reads-and-writes.html) for other options. The [document/v1 API guide](../document-v1-api-guide.html) has examples and use cases.
**Note:** Mapping from document IDs to /document/v1/ URLs is found in [document IDs](../documents.html#id-scheme) - also see [troubleshooting](../document-v1-api-guide.html#troubleshooting). Some examples use _number_ and _group_ [document id](../documents.html#document-ids) modifiers. These are special cases that only work as expected for document types with [mode=streaming or mode=store-only](services-content.html#document). Do not use group or number modifiers with regular indexed mode document types.

##### Configuration

To enable the API, add `document-api` in the serving container cluster - [services.xml](services-container.html):

```
<container id="default" version="1.0">
    <document-api/>
    ...
</container>
```

##### HTTP requests

| HTTP request | document/v1 operation | Description | | --- | --- | --- | | GET | _Get_ a document by ID or _Visit_ a set of documents by selection. | | | Get | Get a document: ``` /document/v1/<namespace>/<document-type>/docid/<document-id> /document/v1/<namespace>/<document-type>/number/<number>/<document-id> /document/v1/<namespace>/<document-type>/group/<group>/<document-id> ``` Optional parameters: - [cluster](#cluster) - [fieldSet](#fieldset) - [timeout](#timeout) - [tracelevel](#tracelevel) | | Visit | Iterate over and get all documents, or a [selection](#selection) of documents, in chunks, using [continuation](#continuation) tokens to track progress. Visits are a linear scan over the documents in the cluster. ``` /document/v1/ ``` It is possible to specify namespace and document type with the visit path: ``` /document/v1/<namespace>/<document-type>/docid ``` Documents can be grouped to limit accesses to a subset. A group is defined by a numeric ID or string — see [id scheme](../documents.html#id-scheme). ``` /document/v1/<namespace>/<document-type>/group/<group> /document/v1/<namespace>/<document-type>/number/<number> ``` Mandatory parameters: - [cluster](#cluster) - Visits can only retrieve data from _one_ content cluster, so `cluster` **must** be specified for requests at the root `/document/v1/` level, or when there is ambiguity. This is required even if the application has only one content cluster. Optional parameters: - [bucketSpace](#bucketspace) - Parent documents are [global](services-content.html#document) and in the `global` [bucket space](#bucketspace). By default, visit will visit non-global documents in the `default` bucket space, unless the indicated document type is a global document type. - [concurrency](#concurrency) - Use to configure backend parallelism for each visit HTTP request. - [continuation](#continuation) - [fieldSet](#fieldset) - [selection](#selection) - [sliceId](#sliceid) - [slices](#slices) - To split visiting of the document corpus across more than one HTTP request, thus allowing the concurrent use of more HTTP containers, use the `slices` and `sliceId` parameters. - [stream](#stream) - It is recommended to enable streamed HTTP responses, with the [stream](#stream) parameter, as this reduces memory consumption and reduces HTTP overhead. - [timeout](#timeout) - [tracelevel](#tracelevel) - [wantedDocumentCount](#wanteddocumentcount) - [fromTimestamp](#fromtimestamp) - [toTimestamp](#totimestamp) - [includeRemoves](#includeRemoves) Optional request headers: - [Accept](#accept) - specify the desired response format. | | POST | _Put_ a given document, by ID, or _Copy_ a set of documents by selection from one content cluster to another. | | | Put | Write the document contained in the request body in JSON format. ``` /document/v1/<namespace>/<document-type>/docid/<document-id> /document/v1/<namespace>/<document-type>/group/<group>/<document-id> /document/v1/<namespace>/<document-type>/number/<number>/<document-id> ``` Optional parameters: - [condition](#condition) - Use for conditional writes.
- [route](#route) - [timeout](#timeout) - [tracelevel](#tracelevel) | | | Copy | Write documents visited in the source [cluster](#cluster) to the [destinationCluster](#destinationcluster) in the same application. A [selection](#selection) is mandatory — typically the document type. Supported paths (see [visit](#visit) above for semantics): ``` /document/v1/ /document/v1/<namespace>/<document-type>/docid/ /document/v1/<namespace>/<document-type>/group/<group> /document/v1/<namespace>/<document-type>/number/<number> ``` Mandatory parameters: - [cluster](#cluster) - [destinationCluster](#destinationcluster) - [selection](#selection) Optional parameters: - [bucketSpace](#bucketspace) - [continuation](#continuation) - [timeChunk](#timechunk) - [timeout](#timeout) - [tracelevel](#tracelevel) | | PUT | _Update_ a document with the given partial update, by ID, or _Update where_ the given selection is true. | | | Update | Update a document with the partial update contained in the request body in the [document update JSON format](document-json-format.html#update). ``` /document/v1/<namespace>/<document-type>/docid/<document-id> ``` Optional parameters: - [condition](#condition) - use for conditional writes - [create](#create) - use to create empty documents when updating non-existent ones - [route](#route) - [timeout](#timeout) - [tracelevel](#tracelevel) | | | Update where | Update visited documents in [cluster](#cluster) with the partial update contained in the request body in the [document update JSON format](document-json-format.html#update). Supported paths (see [visit](#visit) above for semantics): ``` /document/v1/<namespace>/<document-type>/docid/ /document/v1/<namespace>/<document-type>/group/<group> /document/v1/<namespace>/<document-type>/number/<number> ``` Mandatory parameters: - [cluster](#cluster) - [selection](#selection) Optional parameters: - [bucketSpace](#bucketspace) - See [visit](#visit), `default` or `global` bucket space - [continuation](#continuation) - [stream](#stream) - [timeChunk](#timechunk) - [timeout](#timeout) - [tracelevel](#tracelevel) | | DELETE | _Remove_ a document, by ID, or _Remove where_ the given selection is true. | | | Remove | Remove a document. ``` /document/v1/<namespace>/<document-type>/docid/<document-id> ``` Optional parameters: - [condition](#condition) - [route](#route) - [timeout](#timeout) - [tracelevel](#tracelevel) | | | Delete where | Delete visited documents from [cluster](#cluster). Supported paths (see [visit](#visit) above for semantics): ``` /document/v1/ /document/v1/<namespace>/<document-type>/docid/ /document/v1/<namespace>/<document-type>/group/<group> /document/v1/<namespace>/<document-type>/number/<number> ``` Mandatory parameters: - [cluster](#cluster) - [selection](#selection) Optional parameters: - [bucketSpace](#bucketspace) - See [visit](#visit), `default` or `global` bucket space - [continuation](#continuation) - [stream](#stream) - [timeChunk](#timechunk) - [timeout](#timeout) - [tracelevel](#tracelevel) | ##### Request parameters | Parameter | Type | Description | | --- | --- | --- | | bucketSpace | String | Specify the bucket space to visit. Document types marked as `global` exist in a separate _bucket space_ from non-global document types. When visiting a particular document type, the bucket space is automatically deduced based on the provided type name. When visiting at the root `/document/v1/` level this information is not available, and the non-global ("default") bucket space is visited by default. Specify `global` to visit global documents instead. Supported values: `default` (for non-global documents) and `global`. | | cluster | String | Name of the [content cluster](../content/content-nodes.html) to GET from, or visit. | | concurrency | Integer | Sends the given number of visitors in parallel to the backend, improving throughput at the cost of resource usage. Default is 1.
When `stream=true`, concurrency limits the maximum concurrency, which is otherwise unbounded, but controlled by a dynamic throttle policy. **Important:** Given a concurrency parameter of _N_, the worst case for memory used while processing the request grows linearly with _N_, unless [stream](#stream) mode is turned on. This is because the container currently buffers all response data in memory before sending them to the client, and all sent visitors must complete before the response can be sent. | | condition | String | For test-and-set. Run a document operation conditionally — if the condition fails, a _412 Precondition Failed_ is returned. See [example](../document-v1-api-guide.html#conditional-writes). | | continuation | String | When visiting, a continuation token is returned as the `"continuation"` field in the JSON response, as long as more documents remain. Use this token as the `continuation` parameter to visit the next chunk of documents. See [example](../document-v1-api-guide.html#data-dump). | | create | Boolean | If `true`, updates to non-existent documents will create an empty document to update. See [create if nonexistent](../document-v1-api-guide.html#create-if-nonexistent). | | destinationCluster | String | Name of [content cluster](../content/content-nodes.html) to copy to, during a copy visit. | | dryRun | Boolean | Used by the [vespa-feed-client](../vespa-feed-client.html) using `--speed-test` for bandwidth testing, by setting to `true`. | | fieldSet | String | A [field set string](../documents.html#fieldsets) with the set of document fields to fetch from the backend. Default is the special `[document]` fieldset, returning all _document_ fields. To fetch specific fields, use the name of the document type, followed by a comma-separated list of fields (for example `music:artist,song` to fetch two fields declared in `music.sd`). | | route | String | The route for single document operations, and for operations generated by [copy](#copy), [update](#update-where) or [deletion](#delete-where) visits. Default value is `default`. See [routes](/en/operations-selfhosted/routing.html). | | selection | String | Select only a subset of documents when [visiting](../visiting.html) — details in [document selector language](document-select-language.html). | | sliceId | Integer | The slice number of the visit represented by this HTTP request. This number must be non-negative and less than the number of [slices](#slices) specified for the visit - e.g., if the number of slices is 10, `sliceId` is in the range [0-9]. **Note:** If the number of distribution bits change during a sliced visit, the results are undefined. Thankfully, this is a very rare occurrence and is only triggered when adding content nodes. | | slices | Integer | Split the document corpus into this number of independent slices. This lets multiple, concurrent series of HTTP requests advance the same logical visit independently, by specifying a different [sliceId](#sliceid) for each. | | stream | Boolean | Whether to stream the HTTP response, allowing data to flow as soon as documents arrive from the backend. This obsoletes the [wantedDocumentCount](#wanteddocumentcount) parameter. The HTTP status code will always be 200 if the visit is successfully initiated. Default value is false. | | format.tensors | String | Controls how tensors are rendered in the result. | Value | Description | | --- | --- | | `short` | **Default**. 
Render the tensor value in an object having two keys, "type" containing the value, and "cells"/"blocks"/"values" ([depending on the type](document-json-format.html#tensor)) containing the tensor content. Render the tensor content in the [type-appropriate short form](document-json-format.html#tensor). | | `long` | Render the tensor value in an object having two keys, "type" containing the value, and "cells" containing the tensor content. Render the tensor content in the [general verbose form](document-json-format.html#tensor). | | `short-value` | Render the tensor content directly. Render the tensor content in the [type-appropriate short form](document-json-format.html#tensor). | | `long-value` | Render the tensor content directly. Render the tensor content in the [general verbose form](document-json-format.html#tensor). | | | timeChunk | String | Target time to spend on one chunk of a copy, update or remove visit; with optional ks, s, ms or µs unit. Default value is 60. | | timeout | String | Request timeout in seconds, or with optional ks, s, ms or µs unit. Default value is 180s. | | tracelevel | Integer | Number in the range [0,9], where higher gives more details. The trace dumps which nodes and chains the document operation has touched. See [routes](/en/operations-selfhosted/routing.html). | | wantedDocumentCount | Integer | Best effort attempt to not respond to the client before `wantedDocumentCount` number of documents have been visited. Response may still contain fewer documents if there are not enough matching documents left to visit in the cluster, or if the visiting times out. This parameter is intended for the case when you have relatively few documents in your cluster and where each visit request would otherwise process only a handful of documents. The maximum value of `wantedDocumentCount` is bounded by an implementation-specific limit to prevent excessive resource usage. If the cluster has many documents (on the order of tens of millions), there is no need to set this value. | | fromTimestamp | Integer | Filters the returned document set to only include documents that were last modified at a time point equal to or higher to the specified value, in microseconds from UTC epoch. Default value is 0 (include all documents). | | toTimestamp | Integer | Filters the returned document set to only include documents that were last modified at a time point lower than the specified value, in microseconds from UTC epoch. Default value is 0 (sentinel value; include all documents). If non-zero, must be greater than, or equal to, `fromTimestamp`. | | includeRemoves | Boolean | Include recently removed document IDs, along with the set of returned documents. By default, only documents currently present in the corpus are returned in the `"documents"` array of the response; when this parameter is set to `"true"`, documents that were recently removed, and whose tombstones still exist, are also included in that array, as entries on the form `{ "remove": "id:ns:type::foobar" }`. See [here](/en/operations-selfhosted/admin-procedures.html#data-retention-vs-size) for specifics on tombstones, including their lifetime. 
| ##### HTTP request headers | Header | Values | Description | | --- | --- | --- | | Accept | `application/json` or `application/jsonl` | The [Accept](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Accept) header lets the client specify to the server what [media (MIME) types](https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/MIME_types) it accepts as the response format. All Document V1 API calls support `application/json` for returning [JSON](#json) responses. [Streaming visiting](#stream) additionally supports `application/jsonl` for returning [JSON Lines](#json-lines) (JSONL) since Vespa 8.593. To ensure compatibility with older versions, make sure to check the `Content-Type`[HTTP response header](#http-response-headers). A JSONL response will always have a `Content-Type` media type of `application/jsonl`, and JSON wil always have a media type of `application/json`. Multiple acceptable types can be specified. JSONL will be returned if (and only if) `application/jsonl` is part of the list _and_ no other media types have a higher [quality value](https://httpwg.org/specs/rfc9110.html#quality.values). Example: ``` Accept: application/jsonl ``` If the client accepts both JSON and JSONL, the server will respond with JSONL: ``` Accept: application/json, application/jsonl ``` For backwards compatibility, if no `Accept` header is provided (or if no provided media types are acceptable) `application/json` is assumed. | ##### Request body POST and PUT requests must include a body for single document operations; PUT must also include a body for [update where](#update-where) visits. A field has a _value_ for a POST and an _update operation object_ for PUT. Documents and operations use the [document JSON format](document-json-format.html). The document fields must match the [schema](../schemas.html): ``` ``` { "fields": { "": "" } } ``` ``` ``` ``` { "fields": { "": { "" : "" } } } ``` ``` The _update-operation_ is most often `assign` - see [update operations](document-json-format.html#update-operations) for the full list. Values for `id` / `put` / `update` in the request body are silently dropped. The ID is generated from the request path, regardless of request body data - example: ``` ``` { "put" : "id:mynamespace:music::123", "fields": { "title": "Best of" } } ``` ``` This makes it easier to generate a feed file that can be used for both the [vespa-feed-client](../vespa-feed-client.html) and this API. ##### HTTP status codes | Code | Description | | --- | --- | | 200 | OK. Attempts to remove or update a non-existent document also yield this status code (see 412 below). | | 204 | No Content. Successful response to OPTIONS request. | | 400 | Bad request. Returned for undefined document types + other request errors. See [13465](https://github.com/vespa-engine/vespa/issues/13465) for defined document types not assigned to a content cluster when using PUT. Inspect `message` for details. | | 404 | Not found; the document was not found. This is only used when getting documents. | | 405 | Method Not Allowed. HTTP method is not supported by the endpoint. Valid combinations are listed [above](#http-requests) | | 412 | [condition](#condition) is not met. Inspect `message` for details. This is also the result when a condition if specified, but the document does not exist. | | 413 | Content too large; used for POST and PUT requests that are above the [request size limit](../document-v1-api-guide.html#request-size-limit). 
| | 429 | Too many requests; the document API has too many inflight feed operations, retry later. | | 500 | Server error; an unspecified error occurred when processing the request/response. | | 503 | Service unavailable; the document API was unable to produce a response at this time. | | 504 | Gateway timeout; the document API failed to respond within the given (or default 180s) timeout. | | 507 | Insufficient storage; the content cluster is out of memory or disk space. | ##### HTTP response headers | Header | Values | Description | | --- | --- | --- | | X-Vespa-Ignored-Fields | true | Will be present and set to 'true' only when a put or update contains one or more fields which were [ignored since they are not present in the document type](services-container.html#ignore-undefined-fields). Such operations will be applied exactly as if they did not contain the field operations referencing non-existing fields. References to non-existing fields in field _paths_ are not detected. | | Content-Type | `application/json` or `application/jsonl` | The [media type](https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/MIME_types) (MIME type) of the response body. Either `application/json` for [JSON](#json) responses or `application/jsonl` for [JSON Lines](#json-lines) (JSONL) responses. The content type may include additional parameters such as `charset`. Example header: ``` Content-Type: application/json; charset=UTF-8 ``` | ##### Response formats Responses are by default in JSON format. [Streaming visiting](#stream)supports an optional [JSON Lines](#json-lines) (JSONL) response format since Vespa 8.593. ###### JSON JSON responses have the following fields: | Field | Description | | --- | --- | | pathId | Request URL path — always included. | | message | An error message — included for all failed requests. | | id | Document ID — always included for single document operations, including _Get_. | | fields | The requested document fields — included for successful _Get_ operations. | | documents[] | Array of documents in a visit result — each document has the _id_ and _fields_. | | documentCount | Number of visited and selected documents. If [includeRemoves](#includeRemoves) is `true`, this also includes the number of returned removes (tombstones). | | continuation | Token to be used to get the next chunk of the corpus - see [continuation](#continuation). | GET can include a `fields` object if a document was found in a _GET_ request ``` ``` { "pathId": "", "id": "", "fields": { } } ``` ``` A GET _visit_ result can include an array of `documents`plus a [continuation](#continuation): ``` ``` { "pathId": "", "documents": [ { "id": "", "fields": { } } ], "continuation": "", "documentCount": 123 } ``` ``` A continuation indicates the client should make further requests to get more data, while lack of a continuation indicates an error occurred, and that visiting should cease, or that there are no more documents. A `message` can be returned for failed operations: ``` ``` { "pathId": "", "message": "" } ``` ``` ###### JSON Lines A JSON Lines (JSONL) response is a stream of newline-separated JSON objects. Each line contains exactly one JSON object, and each JSON object takes up exactly one line. No line breaks are allowed within an object. JSONL is an optional response format for [streaming visiting](#stream), enabling efficient client-side parsing and fine-grained, continuous tracking of visitor progress. The JSONL response format is currently not supported for any other operations than streaming visiting. 
The JSONL response format is enabled by providing a HTTP [Accept](#accept) request header that specifies `application/jsonl` as the preferred response type, and will have a [Content-Type](#content-type) of `application/jsonl` if the server is on a version that supports JSONL visiting. Clients must check the `Content-Type` header to ensure they are getting the format they expect. JSONL support requires Vespa 8.593 or newer. Example response body: ``` ``` {"put":"id:ns:music::one","fields":{"foo":"bar"}} {"put":"id:ns:music::two","fields":{"foo":"baz"}} {"continuation":{"token":"...","percentFinished":40.0}} {"put":"id:ns:music::three","fields":{"foo":"zoid"}} {"remove":"id:ns:music::four"} {"continuation":{"token":"...","percentFinished":50.0}} {"continuation":{"token":"...","percentFinished":60.0}} {"put":"id:ns:music::five","fields":{"foo":"berg"}} {"continuation":{"token":"...","percentFinished":70.0}} {"sessionStats":{"documentCount":5}} {"continuation":{"percentFinished":100.0}} ``` ``` Note that the `"..."` values are placeholders for (from a client's perspective) opaque string values. ###### JSONL response objects **Note:** To be forwards compatible with future extensions to the response format, ignore unknown objects and fields. | Object | Description | | --- | --- | | put | A document [Put](document-json-format.html#put) operation in the same format as that accepted by Vespa's JSONL feed API. | | remove | A document [Remove](document-json-format.html#remove) operation in the same format as that accepted by Vespa's JSONL feed API. Only present if [includeRemoves](#includeRemoves) is `true`. | | continuation | A visitor [continuation](#continuation). Possible sub-object fields: | Field name | Description | | --- | --- | | `token` | An opaque string value representing the current visitor progress through the data space. This value can be provided as part of a subsequent visitor request to continue visiting from where the last request left off. Clients should not attempt to parse the contents of this string, as it's considered an internal implementation detail and may be changed (in a backwards compatible way) without any prior announcement. | | `percentFinished` | A floating point number between 0 and 100 (inclusive) that gives an approximation of how far the visitor has progressed through the data space. | The last line of a successful request should always be a `continuation` object. If (and only if) visiting has completed, the last `continuation` object will have a `percentFinished` value of `100` and will _not_ have a `token` field. | | message | A message received from the backend visitor session. Can be used by clients to report problems encountered during visiting. Possible sub-object fields: | Field name | Description | | --- | --- | | `text` | The actual message, in unstructured text | | `severity` | The severity of the message. One of `info`, `warning` or `error`. | | | sessionStats | Statistics from the backend visitor session. Possible sub-object fields: | Field name | Description | | --- | --- | | `documentCount` | The number of visited and selected documents. If [includeRemoves](#includeRemoves) is `true`, this also includes the number of returned removes (tombstones). | | Note that it's possible for a successful response to contain zero `put` or `remove` objects if the [selection](#selection) did not match any documents. 
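As an illustration of consuming a streamed JSONL visit, the sketch below issues a visit with streaming enabled and keeps track of the most recent continuation token. The endpoint, cluster name and the use of `jq` are assumptions made for this example, not part of the API:

```
# Stream a visit as JSON Lines and remember the latest continuation token,
# so an interrupted visit can be resumed with the continuation parameter.
# Assumes a local container on port 8080 and a content cluster named "music".
curl -s -H "Accept: application/jsonl" \
  "http://localhost:8080/document/v1/?cluster=music&stream=true" \
  | tee visit.jsonl \
  | jq -r 'select(.continuation.token != null) | .continuation.token' \
  | tail -n 1 > last-continuation.txt
```

If the request fails before completion, the token saved in `last-continuation.txt` can be supplied as the [continuation](#continuation) parameter of the next request.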
###### Differences from the JSON format The biggest difference in semantics between the JSON and JSONL response formats is when, and how, [continuation](#continuation) objects are returned. In the JSON format a continuation is included _once_ at the very end of the response object and covers the progress made by the entire request. If the request somehow fails after receiving 99% of all documents but prior to receiving the continuation field, the client must retry the entire request from the previously known continuation value. This can result in getting many requested documents twice; once from the incomplete first request and once more from the second request that covers the same part of the data space. In the JSON Lines format, a contination object is emitted to the stream _every time_ a backend data [bucket](../content/buckets.html) has been fully visited, as well as at the end of the response stream. This may happen many times in a response. Each continuation object _subsumes_ the progress of previously emitted continuations, meaning that a client only needs to remember the _most recent_ continuation value it observed in the response. If the request fails prior to completion, the client can specify the most recent continuation in the next request; it will then only receive duplicates for the data buckets that were actively being processed when the request failed. Copyright © 2025 - [Cookie Preferences](#) ###### On this page: - [Configuration](#configuration) - [HTTP requests](#http-requests) - [Request parameters](#request-parameters) - [HTTP request headers](#http-request-headers) - [Request body](#request-body) - [HTTP status codes](#http-status-codes) - [HTTP response headers](#http-response-headers) - [Response formats](#response-formats) - [JSON](#json) - [JSON Lines](#json-lines) --- ## Documents ### Documents Vespa models data as _documents_. #### Documents Vespa models data as _documents_. A document has a string identifier, set by the application, unique across all documents. A document is a set of [key-value pairs](document-api-guide.html). A document has a schema (i.e. type), defined in the [schema](schemas.html). When configuring clusters, a [documents](reference/services-content.html#documents) element sets what document types a cluster is to store. This configuration is used to configure the garbage collector if it is enabled. Additionally, it is used to define default routes for documents sent into the application. By default, a document will be sent to all clusters having the document type defined. Refer to [routing](/en/operations-selfhosted/routing.html) for details. Vespa uses the document ID to distribute documents to nodes. From the document identifier, the content layer calculates a numeric location. A bucket contains all the documents, where a given amount of least-significant bits of the location are all equal. This property is used to enable co-localized storage of documents - read more in [buckets](content/buckets.html) and [content cluster elasticity](elasticity.html). Documents can be [global](reference/services-content.html#document), see [parent/child](parent-child.html). ##### Document IDs The document identifiers are URIs, represented by a string, which must conform to a defined URI scheme for document identifiers. The document identifier string may only contain _text characters_, as defined by `isTextCharacter` in [com.yahoo.text.Text](https://github.com/vespa-engine/vespa/blob/master/vespajlib/src/main/java/com/yahoo/text/Text.java). 
###### id scheme Vespa currently has only one defined scheme, the _id scheme_: `id::::` **Note:** An example mapping from ID to the URL used in [/document/v1/](document-v1-api-guide.html) is from`id:mynamespace:mydoctype::user-defined-id` to`/document/v1/mynamespace/mydoctype/docid/user-defined-id`. Find examples and tools in [troubleshooting](document-v1-api-guide.html#document-not-found). Find examples in the [/document/v1/](document-v1-api-guide.html) guide. | Part | Required | Description | | --- | --- | --- | | namespace | Yes | Not used by Vespa, see [below](#namespace). | | document-type | Yes | Document type as defined in [services.xml](reference/services-content.html#document) and the [schema](reference/schema-reference.html). | | key/value-pair | Optional | Modifiers to the id scheme, used to configure document distribution to [buckets](content/buckets.html#document-to-bucket-distribution). With no modifiers, the id scheme distributes all documents uniformly. The key/value-pair field contains one of two possible key/value pairs; **n** and **g** are mutually exclusive: | n=_\_ | Number in the range [0,2^63-1] - only for testing of abnormal bucket distributions | | g=_\_ | The _groupname_ string is hashed and used to select the storage location | **Important:** This is only useful for document types with [mode=streaming or mode=store-only](reference/services-content.html#document). Do not use modifiers for regular indexed document types. See [streaming search](streaming-search.html). Using modifiers for regular indexed document will cause unpredictable feeding performance, in addition, search dispatch does not have support to limit the search to modifiers/buckets. | | user-specified | Yes | A unique ID string. | ###### Document IDs in search results The full Document ID (as a string) will often contain redundant information and be quite long; a typical value may look like "id:mynamespace:mydoctype::user-specified-identifier" where only the last part is useful outside Vespa. The Document ID is therefore not stored in memory, and it **not always present** in [search results](reference/default-result-format.html#id). It is therefore recommended to put your own unique identifier (usually the "user-specified-identifier" above) in a document field, typically named "myid" or "shortid" or similar: ``` field shortid type string { indexing: attribute | summary } ``` This enables using a [document-summary](document-summaries.html) with only in-memory fields while still getting the identifier you actually care about. If the "user-specified-identifier" is just a simple number you could even use "type int" for this field for minimal memory overhead. ###### Namespace The namespace in document ids is useful when you have multiple document collections that you want to be sure never end up with the same document id. It has no function in Vespa beyond this, and can just be set to any short constant value like for example "doc". Consider also letting synthetic documents used for testing use namespace "test" so it's easy to detect and remove them if they are present outside the test by mistake. Example - if feeding - document A by `curl -X POST https:.../document/v1/first_namespace/my_doc_type/docid/shakespeare` - document B by `curl -X POST https:.../document/v1/second_namespace/my_doc_type/docid/shakespeare` then those will be separate documents, both searchable, with different document IDs. 
The two document IDs differ not in the user-specified part (this is `shakespeare` for both documents), but in the namespace part (`first_namespace` vs `second_namespace`). The full document ID for document A is `id:first_namespace:my_doc_type::shakespeare`. The namespace has no relation to other configuration elsewhere, like in _services.xml_ or in schemas. It is just like the user-specified part of each document ID in that sense. Namespace cannot be used in queries, other than as part of the full document ID. However, it can be used for [document selection](reference/document-select-language.html), where `id.namespace` can be accessed and compared to a given string, for instance. An example use case is [visiting](visiting.html) a subset of documents.

##### Fields

Documents can have fields, see the [schema reference](reference/schema-reference.html#field). A field cannot be defined with a default value. Use a [choice ('||') indexing statement](indexing.html#choice-example) or a [document processor](document-processing.html) to assign a default value in document put/update operations.

##### Fieldsets

Use _fieldset_ to limit the fields that are returned from a read operation, like _get_ or _visit_ - see [examples](vespa-cli.html#documents). Vespa may return more fields than specified if this does not impact performance.

**Note:** Document field sets are not the same as [searchable fieldsets](reference/schema-reference.html#fieldset).

There are two options for specifying a fieldset:

- Built-in fieldset
- Name of a document type, then a colon ":", followed by a comma-separated list of fields (for example `music:artist,song` to fetch two fields declared in `music.sd`)

Built-in fieldsets:

| Fieldset | Description |
| --- | --- |
| [all] | Returns all fields in the schema (generated fields included) and the document ID. |
| [document] | Returns original fields in the document, including the document ID. |
| [none] | Returns no fields at all, not even the document ID. _Internal, do not use_ |
| [id] | Returns only the document ID |
| \<document type\>:[document] | **Deprecated:** Use `[document]`. Same as the `[document]` fieldset above: returns only the original document fields (generated fields not included) together with the document ID. |

If a built-in field set is not used, a list of fields can be specified. Syntax:

```
<document type>:field1,field2,…
```

Example:

```
music:title,artist
```

##### Document expiry

To auto-expire documents, use a [selection](reference/services-content.html#documents.selection) with [now](reference/indexing-language-reference.html#now). Example, set time-to-live (TTL) for _music_-documents to one day, using a field called _timestamp_:

```
<documents garbage-collection="true">
    <document type="music"
              mode="index"
              selection="music.timestamp &gt; now() - 86400" />
</documents>
```

Note: The `selection` expression says which documents to _keep_, not which ones to delete. The _timestamp_ field must have a value in seconds since EPOCH:

```
field timestamp type long {
    indexing: attribute
    attribute {
        fast-access
    }
}
```

When `garbage-collection="true"`, Vespa iterates over the document space to purge expired documents. Vespa will invoke the configured GC selection for each stored document once every [garbage-collection-interval](reference/services-content.html#documents.selection) seconds. It is unspecified when a particular document will be processed within the configured interval.

**Important:** This is a best-effort garbage collection feature to conserve CPU and space. Use query filters if it is important to exclude documents based on a criterion.
- Using a _selection_ with _now_ can have side effects when re-feeding or re-processing documents, as timestamps can be stale. A common problem is feeding with too old timestamps, resulting in no documents being indexed. - Normally, documents that are already expired at write time are not persisted. When using [create](document-v1-api-guide.html#create-if-nonexistent) (Create if nonexistent), it is possible to create documents that are expired and will be removed in next cycle. - Deploying a configuration where the selection string selects no documents will cause all documents to be garbage collected. Use [visit](visiting.html) to test the selection string. Garbage collected documents are not to be expected to be recoverable. - The fields that are referenced in the selection expression should be attributes. Also, either the fields should be set with _"fast-access"_ or the number of [searchable copies](reference/services-content.html#searchable-copies) in the content cluster should be the same as the [redundancy](reference/services-content.html#redundancy). Otherwise, the document selection maintenance will be slow and have a major performance impact on the system. - [Imported fields](reference/schema-reference.html#import-field) can be used in the selection string to expire documents, but special care needs to be taken when using these. See [using imported fields in selections](reference/document-select-language.html#using-imported-fields-in-selections) for more information and restrictions. - Document garbage collection is a low priority background operation that runs continuously unless preempted by higher priority operations. If the cluster is too heavily loaded by client feed operations, there's a risk of starving GC from running. To verify that garbage collection is not starved, check the [vds.idealstate.max\_observed\_time\_since\_last\_gc\_sec.average](operations/metrics.html) distributor metric. If it significantly exceeds `garbage-collection-interval` it is an indication that GC is starved. To batch remove, set a selection that matches no documents, like _"not music"_ Use [vespa visit](visiting.html) to test the selection. Dump the IDs of all documents that would be _preserved_: ``` ``` $ vespa visit --selection 'music.timestamp > now() - 86400' --field-set "music.timestamp" ``` ``` Negate the expression by wrapping it in a `not` to dump the IDs of all the documents that would be _removed_ during GC: ``` ``` $ vespa visit --selection 'not (music.timestamp > now() - 86400)' --field-set "music.timestamp" ``` ``` ##### Processing documents To process documents, use [Document processing](document-processing.html). Examples are enriching documents (look up data from other sources), transform content (like linguistic transformations, tokenization), filter data and trigger other events based on the input data. 
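As a rough sketch of what such a component can look like (the class and field names below are made up for illustration), a document processor extends `DocumentProcessor` and modifies the document operations passing through the chain:

```
import com.yahoo.docproc.DocumentProcessor;
import com.yahoo.docproc.Processing;
import com.yahoo.document.DocumentOperation;
import com.yahoo.document.DocumentPut;
import com.yahoo.document.datatypes.StringFieldValue;

// Hypothetical processor: lower-cases a "title" field on every put.
public class TitleNormalizer extends DocumentProcessor {

    @Override
    public Progress process(Processing processing) {
        for (DocumentOperation op : processing.getDocumentOperations()) {
            if (op instanceof DocumentPut) {
                var document = ((DocumentPut) op).getDocument();
                var title = document.getFieldValue("title");
                if (title != null) {
                    document.setFieldValue("title",
                            new StringFieldValue(title.toString().toLowerCase()));
                }
            }
        }
        return Progress.DONE;
    }
}
```

The processor is then referenced from a document processing chain in services.xml, as described in [Document processing](document-processing.html).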
See the sample app [album-recommendation-docproc](https://github.com/vespa-engine/sample-apps/tree/master/examples/document-processing) for use of Vespa APIs like: - [Document API](document-api-guide.html) - work on documents and fields in documents, and create unit tests using the Application framework - [Document Processing](document-processing.html) - chain independent processors with ordering constraints The sample app [vespa-documentation-search](https://github.com/vespa-cloud/vespa-documentation-search) has examples of processing PUTs or UPDATEs (using [create-if-nonexistent](document-v1-api-guide.html#create-if-nonexistent)) of documents in [OutLinksDocumentProcessor](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/src/main/java/ai/vespa/cloud/docsearch/OutLinksDocumentProcessor.java). It is also in introduction to using [multivalued fields](schemas.html#field) like arrays, maps and tensors. Use the [VespaDocSystemTest](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/src/test/java/ai/vespa/cloud/docsearch/VespaDocSystemTest.java) to build code that feeds and tests an instance in the Vespa Developer Cloud / local Docker instance. Both sample apps also use the Document API to GET/PUT/UPDATE other documents as part of processing, using asynchronous [DocumentAccess](https://github.com/vespa-engine/vespa/blob/master/documentapi/src/main/java/com/yahoo/documentapi/DocumentAccess.java). Use this as a starting point for applications that enrich data when writing. Copyright © 2025 - [Cookie Preferences](#) ###### On this page: - [Document IDs](#document-ids) - [id scheme](#id-scheme) - [Document IDs in search results](#docid-in-results) - [Namespace](#namespace) - [Fields](#fields) - [Fieldsets](#fieldsets) - [Document expiry](#document-expiry) - [Processing documents](#processing-documents) --- ## Elasticity ### Content Cluster Elasticity Vespa clusters can be grown and shrunk while serving queries and writes. #### Content Cluster Elasticity Vespa clusters can be grown and shrunk while serving queries and writes. Documents in content clusters are automatically redistributed on changes to maintain an even distribution with minimal data movement. To resize, just change the [nodes](reference/services-content.html#nodes) and redeploy the application - no restarts needed. ![A cluster growing in two dimensions](/assets/img/elastic-grow.svg) Documents are managed by Vespa in chunks called [buckets](#buckets). The size and number of buckets are completely managed by Vespa and there is never any need to manually control sharding. The elasticity mechanism is also used to recover from a node loss: New replicas of documents are created automatically on other nodes to maintain the configured redundancy. Failed nodes is therefore not a problem that requires immediate attention - clusters will self-heal from node failures as long as there are sufficient resources. ![A cluster with a node failure](/assets/img/elastic-fail.svg) When you want to remove nodes from a content cluster, you can have the system migrate data off them in an orderly fashion prior to removal. This is done by marking nodes as _retired_. This is useful to remove nodes that should be retired, but also to migrate a cluster to entirely new nodes while online: Add the new nodes, mark the old nodes retired, wait for the data to be redistributed and remove the old nodes. The auto-elasticity is configured for a normal fail-safe operation, but there are tradeoffs like recovery speed and resource usage. 
Learn more in [procedures](/en/operations-selfhosted/admin-procedures.html#content-cluster-configuration). ##### Adding nodes To add or remove nodes from a content cluster, just `nodes` tag of the [content](reference/services-content.html) cluster in [services.xml](reference/services.html) and [redeploy](applications.html#deploy). Read more in [procedures](/en/operations-selfhosted/admin-procedures.html). When adding a new node, a new _ideal state_ is calculated for all buckets. The buckets mapped to the new node are moved, the superfluous are removed. See redistribution example - add a new node to the system, with redundancy n=2: ![Bucket migration as a node is added to the cluster](/assets/img/add-node-move-buckets.svg) The distribution algorithm generates a random node sequence for each bucket. In this example with n=2, replicas map to the two nodes sorted first. The illustration shows how placement onto two nodes changes as a third node is added. The new node takes over as primary for the buckets where it got sorted first, and as secondary for the buckets where it got sorted second. This ensures minimal data movement when nodes come and go, and allows capacity to be changed easily. No buckets are moved between the existing nodes when a new node is added. Based on the pseudo-random sequences, some buckets change from primary to secondary, or are removed. Multiple nodes can be added in the same deployment. ##### Removing nodes Whether a node fails or is _retired_, the same redistribution happens. If the node is retired, replicas are generated on the other nodes and the node stays up, but with no active replicas. Example of redistribution after node failure, n=2: ![Bucket migration as a node is removed from the cluster](/assets/img/lose-node-move-buckets.svg) Here, node 2 fails. This node held the active replicas of bucket 2 and 6. Once the node fails the secondary replicas are set active. If they were already in a _ready_ state, they start serving queries immediately, otherwise they will index replicas, see [searchable-copies](reference/services-content.html#searchable-copies). All buckets that no longer have secondary replicas are merged to the remaining nodes according to the ideal state. ##### Grouped distribution Nodes in content clusters can be placed in [groups](reference/services-content.html#group). A group of nodes in a content cluster will have one or more complete replicas of the entire document corpus. ![A cluster changes from using one to many groups](/assets/img/query-groups.svg) This is useful in the cases listed below: | Cluster upgrade | With multiple groups it becomes safe to take out a full group for upgrade instead of just one node at a time. [Read more](/en/operations-selfhosted/live-upgrade.html). | | Query throughput | Applications with high query rates and/or high static query cost can use groups to scale to higher query rates since Vespa will automatically send a query to just a single group. [Read more](performance/sizing-search.html). | | Topology | By using groups you can control replica placement over network switches or racks to ensure there is redundancy at the switch and rack level. | Tuning group sizes and node resources enables applications to easily find the latency/cost sweet spot, the elasticity operations are automatic and queries and writes work as usual with no downtime. ##### Changing topology A Vespa elasticity feature is the ability to change topology (i.e. grouped distribution) without service disruption. 
This is a live change, and will auto-redistribute documents to the new topology. Also read [topology change](/en/operations-selfhosted/admin-procedures.html#topology-change) if running Vespa self-hosted - the below steps are general for all hosting options. ###### Replicas When changing topology, pay attention to the [min-redundancy](/en/reference/services-content.html#min-redundancy) setting - this setting configures a _minimum_ number of replicas in a cluster, the _actual_ number is topology dependent - example: A flat cluster with min-redundancy n=2 and 15 nodes is changed into a grouped cluster with 3 groups with 5 nodes each (total node count and n is kept unchanged). In this case, the actual redundancy will be 3 after the change, as each of the 3 groups will have at least 1 replica for full query coverage. The practical consequence is that disk and memory requirements per node _increases_ due to the change to topology. It is therefore important to calculate the actual replica count before reconfiguring topology. ###### Query coverage Changing topology might cause query coverage loss in the transition, unless steps taken in the right order. If full coverage is not important, just make the change and wait for document redistribution to complete. To keep full query coverage, make sure not to change both group size and number of groups at the same time: 1. To add nodes for more data, or to have less data per node, increase group size. E.g., in a 2-group cluster with 8 nodes per group, add 4 nodes for a 25% capacity increase with 10 nodes per group. 2. If the goal is to add query capacity, add one or more groups, with the same node count as existing group(s). A flat cluster is the same as one group - if the flat cluster has 8 nodes, change to a grouped cluster with 2 groups of 8 nodes per group. This will add an empty group, which is put in query serving once populated. In short, if the end-state means both changing number of groups and node count per group, do this as separate steps, as a combination of the above. Between each step, wait for document redistribution to complete using the `merge_bucket.pending` metric - see [example](https://cloud.vespa.ai/en/index-bootstrap). ##### Buckets To manage documents, Vespa groups them in _buckets_, using hashing or hints in the [document id](documents.html). A document Put or Update is sent to all replicas of the bucket with the document. If bucket replicas are out of sync, a bucket merge operation is run to re-sync the bucket. A bucket contains [tombstones](/en/operations-selfhosted/admin-procedures.html#data-retention-vs-size) of recently removed documents. Buckets are split when they grow too large, and joined when they shrink. This is a key feature for high performance in small to large instances, and eliminates need for downtime or manual operations when scaling. Buckets are purely a content management concept, and data is not stored or indexed in separate buckets, nor does queries relate to buckets in any way. Read more in [buckets](content/buckets.html). ##### Ideal state distribution algorithm The [ideal state distribution algorithm](content/idealstate.html) uses a variant of the [CRUSH algorithm](https://ceph.com/assets/pdfs/weil-crush-sc06.pdf) to decide bucket placement. It makes a minimal number of documents move when nodes are added or removed. 
Central to the algorithm is the assignment of a node sequence to each bucket: ![Assignment of a node sequence to each bucket](/assets/img/bucket-node-sequence.svg) Steps to assign a bucket to a set of nodes: 1. Seed a random generator with the bucket ID to generate a pseudo-random sequence of numbers. Using the bucket ID as seed will then always generate the same sequence for the bucket. 2. Nodes are ordered by [distribution-key](reference/services-content.html#node), assign the random number in that order. E.g. a node with distribution-key 0 will get the first random number, node 1 the second. 3. Sort the node list by the random number. 4. Select nodes in descending random number order - above, node 1, 3 and 0 will store bucket 0x3c000000000000a0 with n=3 (redundancy). For n=2, node 1 and 3 will store the bucket. This specification of where to place a bucket is called the bucket's _ideal state_. Repeat this for all buckets in the system. ##### Consistency Consistency is maintained at bucket level. Content nodes calculate local checksums based on the bucket contents, and the distributors compare checksums across the bucket replicas. A _bucket merge_ is issued to resolve inconsistency, when detected. While there are inconsistent bucket replicas, operations are routed to the "best" replica. As buckets are split and joined, it is possible for replicas of a bucket to be split at different levels. A node may have been down while its buckets have been split or joined. This is called _inconsistent bucket splitting_. Bucket checksums can not be compared across buckets with different split levels. Consequently, content nodes do not know whether all documents exist in enough replicas in this state. Due to this, inconsistent splitting is one of the highest maintenance priorities. After all buckets are split or joined back to the same level, the content nodes can verify that all the replicas are consistent and fix any detected issues with a merge. [Read more](content/consistency.html). ##### Further reading - [content nodes](content/content-nodes.html) - [proton](proton.html) - see _ready_ state Copyright © 2025 - [Cookie Preferences](#) ###### On this page: - [Adding nodes](#adding-nodes) - [Removing nodes](#removing-nodes) - [Grouped distribution](#grouped-distribution) - [Changing topology](#changing-topology) - [Replicas](#replicas) - [Query coverage](#query-coverage) - [Buckets](#buckets) - [Ideal state distribution algorithm](#ideal-state-distribution-algorithm) - [Consistency](#consistency) - [Further reading](#further-reading) --- ## Embedding Reference ### Embedding Reference Reference configuration for [embedders](../embedding.html). #### Embedding Reference Reference configuration for [embedders](../embedding.html). ##### Model config reference Embedder models use the [model](config-files.html#model) type configuration which accepts the attributes `model-id`, `url` or `path`. Multiple of these can be specified as a single config value, where one is used depending on the deployment environment: - If a `model-id` is specified and the application is deployed on Vespa Cloud, the `model-id` is used. - Otherwise, if a `url` is specified, it is used - Otherwise, `path` is used. When using `path`, the model files must be supplied in the Vespa [application package](../application-packages.html#deploying-remote-models). 
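As an example of how these attributes can be combined on a single model element (the model id, URL and file name below are illustrative placeholders):

```
<transformer-model model-id="e5-small-v2"
                   url="https://example.com/models/e5-small-v2.onnx"
                   path="models/e5-small-v2.onnx"/>
```

On Vespa Cloud the `model-id` is used; in other environments the `url` is used if present, falling back to the `path` into the application package.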
##### Huggingface Embedder An embedder using any [Huggingface tokenizer](https://huggingface.co/docs/tokenizers/index), including multilingual tokenizers, to produce tokens which is then input to a supplied transformer model in ONNX model format. The Huggingface embedder is configured in [services.xml](services.html), within the `container` tag: ``` ``` query: passage: ... ``` ``` ###### Private Model Hub You may also use models hosted in a[private Huggingface model hub](https://huggingface.co/docs/hub/en/repositories-settings#private-repositories). Retrieve an API key from Huggingface with the appropriate permissions, and add it to the [vespa secret store.](/en/cloud/security/secret-store)Add the secret to the container `` and refer to it in your Huggingface model configuration: ``` ``` ``` ``` ###### Huggingface embedder reference config In addition to [embedder ONNX parameters](#embedder-onnx-reference-config): | Name | Occurrence | Description | Type | Default | | --- | --- | --- | --- | --- | | transformer-model | One | Use to point to the transformer ONNX model file | [model-type](#model-config-reference) | N/A | | tokenizer-model | One | Use to point to the `tokenizer.json` Huggingface tokenizer configuration file | [model-type](#model-config-reference) | N/A | | max-tokens | One | The maximum number of tokens accepted by the transformer model | numeric | 512 | | transformer-input-ids | One | The name or identifier for the transformer input IDs | string | input\_ids | | transformer-attention-mask | One | The name or identifier for the transformer attention mask | string | attention\_mask | | transformer-token-type-ids | One | The name or identifier for the transformer token type IDs. If the model does not use `token_type_ids` use `` | string | token\_type\_ids | | transformer-output | One | The name or identifier for the transformer output | string | last\_hidden\_state | | pooling-strategy | One | How the output vectors of the ONNX model is pooled to obtain a single vector representation. Valid values are `mean`,`cls` and `none` | string | mean | | normalize | One | A boolean indicating whether to normalize the output embedding vector to unit length (length 1). Useful for `prenormalized-angular`[distance-metric](schema-reference.html#distance-metric) | boolean | false | | prepend | Optional | Prepend instructions that are prepended to the text input before tokenization and inference. Useful for models that have been trained with specific prompt instructions. The instructions are prepended to the input text. - Element \ - Optional query prepend instruction. - Element \ - Optional document prepend instruction. ``` ``` query: passage: ``` ``` | Optional \ \ elements. | | ##### Bert embedder The Bert embedder is configured in [services.xml](services.html), within the `container` tag: ``` ``` ``` ``` ###### Bert embedder reference config In addition to [embedder ONNX parameters](#embedder-onnx-reference-config): | Name | Occurrence | Description | Type | Default | | --- | --- | --- | --- | --- | | transformer-model | One | Use to point to the transformer ONNX model file | [model-type](#model-config-reference) | N/A | | tokenizer-vocab | One | Use to point to the Huggingface `vocab.txt` tokenizer file with valid wordpiece tokens. Does not support `tokenizer.json` format. 
| [model-type](#model-config-reference) | N/A |
| max-tokens | One | The maximum number of tokens allowed in the input | integer | 384 |
| transformer-input-ids | One | The name or identifier for the transformer input IDs | string | input\_ids |
| transformer-attention-mask | One | The name or identifier for the transformer attention mask | string | attention\_mask |
| transformer-token-type-ids | One | The name or identifier for the transformer token type IDs. If the model does not use `token_type_ids`, use an empty `<transformer-token-type-ids/>` element | string | token\_type\_ids |
| transformer-output | One | The name or identifier for the transformer output | string | output\_0 |
| transformer-start-sequence-token | One | The start of sequence token | numeric | 101 |
| transformer-end-sequence-token | One | The end of sequence token | numeric | 102 |
| pooling-strategy | One | How the output vectors of the ONNX model are pooled to obtain a single vector representation. Valid values are `mean` and `cls` | string | mean |

##### colbert embedder

The colbert embedder is configured in [services.xml](services.html), within the `container` tag:

```
<component id="colbert" type="colbert-embedder">
    <transformer-model path="models/model.onnx"/>
    <tokenizer-model path="models/tokenizer.json"/>
    <max-query-tokens>32</max-query-tokens>
    <max-document-tokens>256</max-document-tokens>
</component>
```

The Vespa colbert implementation works with default configurations for transformer models that use WordPiece tokenization.

###### colbert embedder reference config

In addition to [embedder ONNX parameters](#embedder-onnx-reference-config):

| Name | Occurrence | Description | Type | Default |
| --- | --- | --- | --- | --- |
| transformer-model | One | Use to point to the transformer ColBERT ONNX model file | [model-type](#model-config-reference) | N/A |
| tokenizer-model | One | Use to point to the `tokenizer.json` Huggingface tokenizer configuration file | [model-type](#model-config-reference) | N/A |
| max-tokens | One | Max length of token sequence the transformer-model can handle | numeric | 512 |
| max-query-tokens | One | The maximum number of ColBERT query token embeddings. Queries are padded to this length. Must be lower than max-tokens | numeric | 32 |
| max-document-tokens | One | The maximum number of ColBERT document token embeddings. Documents are not padded. Must be lower than max-tokens | numeric | 512 |
| transformer-input-ids | One | The name or identifier for the transformer input IDs | string | input\_ids |
| transformer-attention-mask | One | The name or identifier for the transformer attention mask | string | attention\_mask |
| transformer-mask-token | One | The mask token id used for ColBERT query padding | numeric | 103 |
| transformer-start-sequence-token | One | The start of sequence token id | numeric | 101 |
| transformer-end-sequence-token | One | The end of sequence token id | numeric | 102 |
| transformer-pad-token | One | The pad sequence token id | numeric | 0 |
| query-token-id | One | The colbert query token marker id | numeric | 1 |
| document-token-id | One | The colbert document token marker id | numeric | 2 |
| transformer-output | One | The name or identifier for the transformer output | string | contextual |

The Vespa colbert-embedder uses `[unused0]` (token id 1) for `query-token-id`, and `[unused1]` (token id 2) for `document-token-id`, the document marker. Document punctuation chars are filtered (not configurable). The following characters are removed: `` !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ ``.
###### splade embedder reference config In addition to [embedder ONNX parameters](#embedder-onnx-reference-config): | Name | Occurrence | Description | Type | Default | | --- | --- | --- | --- | --- | | transformer-model | One | Use to point to the transformer ONNX model file | [model-type](#model-config-reference) | N/A | | tokenizer-model | One | Use to point to the `tokenizer.json` Huggingface tokenizer configuration file | [model-type](#model-config-reference) | N/A | | term-score-threshold | One | An optional threshold to increase sparseness, tokens/terms with a score lower than this is not retained. | numeric | N/A | | max-tokens | One | The maximum number of tokens accepted by the transformer model | numeric | 512 | | transformer-input-ids | One | The name or identifier for the transformer input IDs | string | input\_ids | | transformer-attention-mask | One | The name or identifier for the transformer attention mask | string | attention\_mask | | transformer-token-type-ids | One | The name or identifier for the transformer token type IDs. If the model does not use `token_type_ids` use `` | string | token\_type\_ids | | transformer-output | One | The name or identifier for the transformer output | string | logits | ##### Huggingface tokenizer embedder The Huggingface tokenizer embedder is configured in [services.xml](services.html), within the `container` tag: ``` ``` ``` ``` ###### Huggingface tokenizer reference config | Name | Occurrence | Description | Type | Default | | --- | --- | --- | --- | --- | | model | One To Many | Use to point to the `tokenizer.json` Huggingface tokenizer configuration file. Also supports `language`, which is only relevant if one wants to tokenize differently based on the document language. Use "unknown" for a model to be used for any language (i.e. by default). | [model-type](#model-config-reference) | N/A | ##### Embedder ONNX reference config Vespa uses [ONNX Runtime](https://onnxruntime.ai/) to accelerate inference of embedding models. These parameters are valid for both [Bert embedder](#bert-embedder) and [Huggingface embedder](#huggingface-embedder). | Name | Occurrence | Description | Type | Default | | --- | --- | --- | --- | --- | | onnx-execution-mode | One | Low level ONNX execution model. Valid values are `parallel` or `sequential`. Only relevant for inference on CPU. See [ONNX runtime documentation](https://onnxruntime.ai/docs/performance/tune-performance/threading.html) on threading. | string | sequential | | onnx-interop-threads | One | Low level ONNX setting.Only relevant for inference on CPU. | numeric | 1 | | onnx-intraop-threads | One | Low level ONNX setting. Only relevant for inference on CPU. | numeric | 4 | | onnx-gpu-device | One | The GPU device to run the model on. See [configuring GPU for Vespa container image](/en/operations-selfhosted/vespa-gpu-container.html). Use `-1` to not use GPU for the model, even if the instance has available GPUs. | numeric | 0 | ##### SentencePiece embedder A native Java implementation of [SentencePiece](https://github.com/google/sentencepiece). SentencePiece breaks text into chunks independent of spaces, which is robust to misspellings and works with CJK languages. Prefer the [Huggingface tokenizer embedder](#huggingface-tokenizer-embedder) over this for better compatibility with Huggingface models. This is suitable to use in conjunction with [custom components](../jdisc/container-components.html), or the resulting tensor can be used in [ranking](../ranking.html). 
To use the [SentencePiece embedder](https://github.com/vespa-engine/vespa/blob/master/linguistics-components/src/main/java/com/yahoo/language/sentencepiece/SentencePieceEmbedder.java), add it to [services.xml](services.html): ``` ``` ; unknown model/en.wiki.bpe.vs10000.model ``` ``` See the options available for configuring SentencePiece in [the full configuration definition](https://github.com/vespa-engine/vespa/blob/master/linguistics-components/src/main/resources/configdefinitions/language.sentencepiece.sentence-piece.def). ##### WordPiece embedder A native Java implementation of [WordPiece](https://github.com/google-research/bert#tokenization), which is commonly used with BERT models. Prefer the [Huggingface tokenizer embedder](#huggingface-tokenizer-embedder) over this for better compatibility with Huggingface models. This is suitable to use in conjunction with [custom components](../jdisc/container-components.html), or the resulting tensor can be used in [ranking](../ranking.html). To use the [WordPiece embedder](https://github.com/vespa-engine/vespa/blob/master/linguistics-components/src/main/java/com/yahoo/language/wordpiece/WordPieceEmbedder.java), add it to [services.xml](services.html) within the `container` tag: ``` ``` class="com.yahoo.language.wordpiece.WordPieceEmbedder" bundle="linguistics-components"> unknown models/bert-base-uncased-vocab.txt ``` ``` See the options available for configuring WordPiece in [the full configuration definition](https://github.com/vespa-engine/vespa/blob/master/linguistics-components/src/main/resources/configdefinitions/language.wordpiece.word-piece.def). WordPiece is suitable to use in conjunction with custom components, or the resulting tensor can be used in [ranking](../ranking.html). ##### Using an embedder from Java When writing custom Java components (such as [Searchers](../searcher-development.html) or [Document processors](../document-processing.html#document-processors)), use embedders you have configured by [having them injected in the constructor](../jdisc/injecting-components.html), just as any other component: ``` ``` class MyComponent { @Inject public MyComponent(ComponentRegistry embedders) { // embedders contains all the embedders configured in your services.xml } } ``` ``` See a concrete example of using an embedder in a custom searcher in[LLMSearcher](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/src/main/java/ai/vespa/cloud/docsearch/LLMSearcher.java). ##### Custom Embedders Vespa provides a Java interface for defining components which can provide embeddings of text:[com.yahoo.language.process.Embedder](https://github.com/vespa-engine/vespa/blob/master/linguistics/src/main/java/com/yahoo/language/process/Embedder.java). 
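A bare-bones implementation of this interface could look like the following sketch; the class name and the trivial embedding logic are placeholders for illustration, not a real model:

```
import com.yahoo.language.process.Embedder;
import com.yahoo.tensor.Tensor;
import com.yahoo.tensor.TensorType;
import java.util.List;

// Hypothetical example: maps text to a fixed-size vector of character codes.
public class MyEmbedder implements Embedder {

    @Override
    public List<Integer> embed(String text, Context context) {
        return text.chars().boxed().toList();
    }

    @Override
    public Tensor embed(String text, Context context, TensorType type) {
        Tensor.Builder builder = Tensor.Builder.of(type);
        long size = type.dimensions().get(0).size().orElse(0L);
        for (long i = 0; i < size; i++) {
            double value = i < text.length() ? text.charAt((int) i) : 0.0;
            builder.cell(value, i);
        }
        return builder.build();
    }
}
```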
To define a custom embedder in an application and make it usable by Vespa (see [embedding a query text](../embedding.html#embedding-a-query-text)), implement this interface and add it as a [component](../developer-guide.html#developing-components) to [services.xml](services-container.html) - a minimal example (the class name, bundle and config values are illustrative):

```
<component id="myEmbedder"
           class="com.example.MyEmbedder"
           bundle="the name in <artifactId> in pom.xml">
    <config name="com.example.my-embedder">
        <myValue>foo</myValue>
    </config>
</component>
```

---

## Embedding

### Embedding

A common technique is to map unstructured data - say, text or images - to points in an abstract vector space and then do the computation in that space.

#### Embedding

A common technique is to map unstructured data - say, text or images - to points in an abstract vector space and then do the computation in that space. For example, retrieve similar data by [finding nearby points in the vector space](approximate-nn-hnsw.html), or [using the vectors as input to a neural net](onnx.html). This mapping is referred to as _embedding_. Read more about embedding and embedding management in this [blog post](https://blog.vespa.ai/tailoring-frozen-embeddings-with-vespa/).

Embedding vectors can be sent to Vespa in queries and writes:

![document- and query-embeddings](/assets/img/vespa-overview-embeddings-1.svg)

Alternatively, you can use the `embed` function to generate the embeddings inside Vespa to reduce vector transfer costs and make clients simpler:

![Vespa's embedding feature, creating embeddings from text](/assets/img/vespa-overview-embeddings-2.svg)

Adding embeddings to schemas will change the characteristics of an application; memory usage will grow, and feeding latency might increase. Read more on how to address this in [binarizing vectors](/en/binarizing-vectors.html).

##### Configuring embedders

Embedders are [components](jdisc/container-components.html) which must be configured in your [services.xml](reference/services.html). Components are shared and can be used across schemas. For example (the embedder id and model paths are illustrative):

```
<container version="1.0">
    <component id="myEmbedderId" type="hugging-face-embedder">
        <transformer-model path="models/model.onnx"/>
        <tokenizer-model path="models/tokenizer.json"/>
        <prepend>
            <query>query:</query>
            <document>passage:</document>
        </prepend>
    </component>
    ...
</container>
```

You can [write your own](https://javadoc.io/doc/com.yahoo.vespa/linguistics/latest/com/yahoo/language/process/Embedder.html), or use [embedders provided in Vespa](#provided-embedders).

##### Embedding a query text

Where you would otherwise supply a tensor in a query request, you can (with an embedder configured) instead supply any text enclosed in `embed()`:

```
input.query(q)=embed(myEmbedderId, "Hello%20world")
```

Both single and double quotes are permitted, and if you have only configured a single embedder, you can skip the embedder id argument and the quotes.
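For instance, assuming a single configured embedder, the shorthand form would look like:

```
input.query(q)=embed(Hello world)
```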
The text argument can be supplied by a referenced parameter instead, using the `@parameter` syntax:

```
{
    "yql": "select * from doc where {targetHits:10}nearestNeighbor(embedding_field, query_embedding)",
    "text": "my text to embed",
    "input.query(query_embedding)": "embed(@text)"
}
```

Remember that regardless of whether you are using embedders, input tensors must always be [defined in the schema's rank-profile](reference/schema-reference.html#inputs).

##### Embedding a document field

Use the `embed` function of the [indexing language](reference/indexing-language-reference.html#indexing-statement) to convert strings into embeddings:

```
schema doc {

    document doc {

        field title type string {
            indexing: summary | index
        }

    }

    field embeddings type tensor(x[384]) {
        indexing {
            input title | embed embedderId | attribute | index
        }
    }

}
```

Notice that the embedding field is defined outside the `document` clause in the schema. If you have only configured a single embedder, you can skip the embedder id argument.

The input field can also be an array, where the output becomes a rank two tensor, see [this blog post](https://blog.vespa.ai/semantic-search-with-multi-vector-indexing/):

```
schema doc {

    document doc {

        field chunks type array<string> {
            indexing: index | summary
        }

    }

    field embeddings type tensor(p{},x[5]) {
        indexing: input chunks | embed embedderId | attribute | index
    }

}
```

##### Provided embedders

Vespa provides several embedders as part of the platform.

###### Huggingface Embedder

An embedder using any [Huggingface tokenizer](https://huggingface.co/docs/tokenizers/index), including multilingual tokenizers, to produce tokens which are then input to a supplied transformer model in [ONNX](https://onnx.ai/) model format - for example (the component id and model paths are illustrative):

```
<container version="1.0">
    <component id="hf-embedder" type="hugging-face-embedder">
        <transformer-model path="models/model.onnx"/>
        <tokenizer-model path="models/tokenizer.json"/>
    </component>
    ...
</container>
```

The huggingface-embedder supports all [Huggingface tokenizer implementations](https://huggingface.co/docs/tokenizers/index).

- The `transformer-model` specifies the embedding model in [ONNX](https://onnx.ai/) format. See [exporting models to ONNX](onnx.html#using-optimum-to-export-models-to-onnx-format) for how to export embedding models from Huggingface to be compatible with Vespa's `hugging-face-embedder`. See [Limitations on Model Size and Complexity](onnx.html#limitations-on-model-size-and-complexity) for details on the ONNX model format supported by Vespa.
- The `tokenizer-model` specifies the Huggingface `tokenizer.json` formatted file. See [HF loading tokenizer from a JSON file](https://huggingface.co/transformers/v4.8.0/fast_tokenizers.html#loading-from-a-json-file).

Use `path` to supply the model files from the application package, `url` to supply them from a remote server, or `model-id` to use a [model supplied by Vespa Cloud](https://cloud.vespa.ai/en/model-hub#hugging-face-embedder). You can also use a model hosted in a private Huggingface Model Hub by adding your Huggingface API token to the [secret store](/en/cloud/security/secret-store.html) and referring to the secret using `secret-ref` in the model tag. See the [model config reference](reference/embedding-reference.html#model-config-reference) for more details.

See the [reference](reference/embedding-reference.html#huggingface-embedder-reference-config) for all configuration parameters.

###### Huggingface embedder models

The following are examples of text embedding models that can be used with the hugging-face-embedder and their output [tensor](tensor-user-guide.html) dimensionality.
The resulting [tensor type](reference/tensor.html#tensor-type-spec) can be `float`, `bfloat16`, or `int8` using binarized quantization. See the blog post [Combining matryoshka with binary-quantization](https://blog.vespa.ai/combining-matryoshka-with-binary-quantization-using-embedder/) for more examples of using the Huggingface embedder with binary quantization.

The following models use `pooling-strategy` `mean`, which is the default [pooling-strategy](reference/embedding-reference.html#huggingface-embedder-reference-config):

- [intfloat/e5-small-v2](https://huggingface.co/intfloat/e5-small-v2) produces `tensor(x[384])`
- [intfloat/e5-base-v2](https://huggingface.co/intfloat/e5-base-v2) produces `tensor(x[768])`
- [intfloat/e5-large-v2](https://huggingface.co/intfloat/e5-large-v2) produces `tensor(x[1024])`
- [intfloat/multilingual-e5-base](https://huggingface.co/intfloat/multilingual-e5-base) produces `tensor(x[768])`

The following models are useful for binarization and Matryoshka dimensionality flexibility, where only the first k dimensions are retained. [Matryoshka 🤝 Binary vectors: Slash vector search costs with Vespa](https://blog.vespa.ai/combining-matryoshka-with-binary-quantization-using-embedder/) is a great read on this subject. When enabling binarization with `int8`, use [distance-metric hamming](reference/schema-reference.html#hamming):

- [mxbai-embed-large-v1](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) produces `tensor(x[1024])`. This model is also useful for binarization, which can be triggered by using destination `tensor<int8>(x[128])`. Use `pooling-strategy` `cls` and `normalize` `true`.
- [nomic-embed-text-v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) produces `tensor(x[768])`. This model is also useful for binarization, which can be triggered by using destination `tensor<int8>(x[96])`. Use `normalize` `true`.

Snowflake arctic model series:

- [snowflake-arctic-embed-xs](https://huggingface.co/Snowflake/snowflake-arctic-embed-xs) produces `tensor(x[384])`. Use `pooling-strategy` `cls` and `normalize` `true`.
- [snowflake-arctic-embed-m](https://huggingface.co/Snowflake/snowflake-arctic-embed-m) produces `tensor(x[768])`. Use `pooling-strategy` `cls` and `normalize` `true`.

All of these example text embedding models can be used in combination with Vespa's [nearest neighbor search](nearest-neighbor-search.html) using the appropriate [distance-metric](reference/schema-reference.html#distance-metric). Notice that to use [distance-metric: prenormalized-angular](/en/reference/schema-reference.html#prenormalized-angular), the `normalize` configuration must be set to `true`. Check the [Massive Text Embedding Benchmark](https://huggingface.co/blog/mteb) (MTEB) and the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for help with choosing an embedding model.

###### Bert embedder

DEPRECATED: prefer using the [Huggingface Embedder](#huggingface-embedder) instead of the Bert embedder.

An embedder using the [WordPiece](reference/embedding-reference.html#wordpiece-embedder) embedder to produce tokens which are then input to a supplied [ONNX](https://onnx.ai/) model in the form expected by a BERT base model - for example (the component id and model paths are illustrative):

```
<component id="myBert" type="bert-embedder">
    <transformer-model path="models/model.onnx"/>
    <tokenizer-vocab path="models/vocab.txt"/>
    <max-tokens>128</max-tokens>
    <transformer-output>last_hidden_state</transformer-output>
</component>
```

- The `transformer-model` specifies the embedding model in [ONNX](https://onnx.ai/) format. See [exporting models to ONNX](onnx.html#using-optimum-to-export-models-to-onnx-format) for how to export embedding models from Huggingface to a compatible [ONNX](https://onnx.ai/) format.
- The `tokenizer-vocab` specifies the Huggingface `vocab.txt` file, with one valid token per line. Note that the Bert embedder does not support the `tokenizer.json` formatted tokenizer configuration files. This means that tokenization settings like max tokens should be set explicitly.
- The `transformer-output` specifies the name given to the embedding output in the model.onnx file; this will differ depending on how the model is exported to ONNX format. One common name is `last_hidden_state`, especially in transformer-based models. Other common names are `output` or `output_0`, `embedding` or `embeddings`, `sentence_embedding`, `pooled_output`, or `encoder_last_hidden_state`. The default is `output_0`.

The Bert embedder is limited to English ([WordPiece](reference/embedding-reference.html#wordpiece-embedder)) and BERT-styled transformer models with three model inputs (_input\_ids, attention\_mask, token\_type\_ids_). Prefer using the [Huggingface Embedder](#huggingface-embedder) instead of the Bert embedder.

See the [configuration reference](reference/embedding-reference.html#bert-embedder-reference-config) for all configuration options.

###### ColBERT embedder

An embedder supporting [ColBERT](https://github.com/stanford-futuredata/ColBERT) models. The ColBERT embedder maps text to _token_ embeddings, representing a text as multiple contextualized embeddings. This produces better quality than reducing all tokens into a single vector. Read more about ColBERT and the ColBERT embedder in the blog posts [Announcing the Vespa ColBERT embedder](https://blog.vespa.ai/announcing-colbert-embedder-in-vespa/) and [Announcing Vespa Long-Context ColBERT](https://blog.vespa.ai/announcing-long-context-colbert-in-vespa/). A configuration example (the component id and model paths are illustrative):

```
<component id="colbert" type="colbert-embedder">
    <transformer-model path="models/model.onnx"/>
    <tokenizer-model path="models/tokenizer.json"/>
    <max-query-tokens>32</max-query-tokens>
    <max-document-tokens>128</max-document-tokens>
</component>
```

- The `transformer-model` specifies the ColBERT embedding model in [ONNX](https://onnx.ai/) format. See [exporting models to ONNX](onnx.html#using-optimum-to-export-models-to-onnx-format) for how to export embedding models from Huggingface to a compatible [ONNX](https://onnx.ai/) format. The [vespa-engine/col-minilm](https://huggingface.co/vespa-engine/col-minilm) page on the HF model hub has a detailed example of how to export a colbert checkpoint to ONNX format for accelerated inference.
- The `tokenizer-model` specifies the Huggingface `tokenizer.json` formatted file. See [HF loading tokenizer from a JSON file](https://huggingface.co/transformers/v4.8.0/fast_tokenizers.html#loading-from-a-json-file).
- The `max-query-tokens` controls the maximum number of query text tokens that are represented as vectors, and similarly, `max-document-tokens` controls the document side. These parameters can be used to control resource usage.

See the [configuration reference](reference/embedding-reference.html#colbert-embedder-reference-config) for all configuration options and defaults.

The ColBERT token embeddings are represented as a [mixed tensor](tensor-user-guide.html#tensor-concepts): `tensor(token{}, x[dim])`, where `dim` is the vector dimensionality of the contextualized token embeddings. The [colbert model checkpoint](https://huggingface.co/colbert-ir/colbertv2.0) on Hugging Face hub uses 128 dimensions.
The embedder destination tensor is defined in the [schema](schemas.html), and depending on the target [tensor cell precision](reference/tensor.html#tensor-type-spec) definition, the embedder can compress the representation: if the target tensor cell type is `int8`, the ColBERT embedder compresses the token embeddings with binarization for the document to reduce storage to 1 bit per value, reducing the token embedding storage footprint by 32x compared to using `float`. The _query_ representation is not compressed with binarization.

The following demonstrates two ways to use the ColBERT embedder in the document schema to [embed a document field](#embedding-a-document-field).

```
schema doc {

    document doc {
        field text type string {..}
    }

    field colbert_tokens type tensor(token{}, x[128]) {
        indexing: input text | embed colbert | attribute
    }

    field colbert_tokens_compressed type tensor(token{}, x[16]) {
        indexing: input text | embed colbert | attribute
    }

}
```

The first field, `colbert_tokens`, stores the original representation, as the tensor destination cell type is `float`. The second field, `colbert_tokens_compressed`, is compressed; when using `int8` tensor cell precision, divide the original vector size by 8 (128/8 = 16).

You can also use `bfloat16` instead of `float` to reduce storage by 2x compared to `float`:

```
field colbert_tokens type tensor<bfloat16>(token{}, x[128]) {
    indexing: input text | embed colbert | attribute
}
```

You can also use the ColBERT embedder with an array of strings (representing chunks):

```
schema doc {

    document doc {
        field chunks type array<string> {..}
    }

    field colbert_tokens_compressed type tensor(chunk{}, token{}, x[16]) {
        indexing: input chunks | embed colbert chunk | attribute
    }

}
```

Here, we need a second mapped dimension in the target tensor and a second argument to embed, telling the ColBERT embedder the name of the tensor dimension to use for the chunks. Notice that the examples above did not specify the `index` function for creating an [HNSW](approximate-nn-hnsw.html) index. The colbert representation is intended to be used as a ranking model and not for retrieval with Vespa's nearestNeighbor query operator, where you can, e.g., use a document-level vector and/or lexical matching. To reduce memory footprint, use [paged attributes](attributes.html#paged-attributes).

###### ColBERT ranking

See the sample applications for using ColBERT in ranking with variants of the MaxSim similarity operator expressed using Vespa tensor computation expressions. See [colbert](https://github.com/vespa-engine/sample-apps/tree/master/colbert) and [colbert-long](https://github.com/vespa-engine/sample-apps/tree/master/colbert-long).

###### SPLADE embedder

An embedder supporting [SPLADE](https://github.com/naver/splade) models. The SPLADE embedder maps text to a mapped tensor, representing a text as a sparse vector of unique tokens and their weights. A configuration example (the component id and model paths are illustrative):

```
<component id="splade" type="splade-embedder">
    <transformer-model path="models/model.onnx"/>
    <tokenizer-model path="models/tokenizer.json"/>
</component>
```

- The `transformer-model` specifies the SPLADE embedding model in [ONNX](https://onnx.ai/) format. See [exporting models to ONNX](onnx.html#using-optimum-to-export-models-to-onnx-format) for how to export embedding models from Huggingface to a compatible [ONNX](https://onnx.ai/) format.
- The `tokenizer-model` specifies the Huggingface `tokenizer.json` formatted file. See [HF loading tokenizer from a JSON file](https://huggingface.co/transformers/v4.8.0/fast_tokenizers.html#loading-from-a-json-file).
See the [configuration reference](reference/embedding-reference.html#splade-embedder-reference-config) for all configuration options and defaults.

The splade token weights are represented as a [mapped tensor](tensor-user-guide.html#tensor-concepts): `tensor(token{})`. The embedder destination tensor is defined in the [schema](schemas.html). The following demonstrates how to use the SPLADE embedder in the document schema to [embed a document field](#embedding-a-document-field).

```
schema doc {

    document doc {
        field text type string {..}
    }

    field splade_tokens type tensor(token{}) {
        indexing: input text | embed splade | attribute
    }

}
```

You can also use the SPLADE embedder with an array of strings (representing chunks), here also using the lower tensor cell precision `bfloat16`:

```
schema doc {

    document doc {
        field chunks type array<string> {..}
    }

    field splade_tokens type tensor<bfloat16>(chunk{}, token{}) {
        indexing: input chunks | embed splade chunk | attribute
    }

}
```

Here, we need a second mapped dimension in the target tensor and a second argument to embed, telling the splade embedder the name of the tensor dimension to use for the chunks. To reduce memory footprint, use [paged attributes](attributes.html#paged-attributes).

###### SPLADE ranking

See the [splade](https://github.com/vespa-engine/sample-apps/tree/master/splade) sample application for how to use SPLADE in ranking, including how to use the SPLADE embedder with an array of strings (representing chunks).

##### Embedder performance

Embedding inference can be resource-intensive for larger embedding models. Factors that impact performance:

- The embedding model parameters. Larger models are more expensive to evaluate than smaller models.
- The sequence input length. Transformer models scale quadratically with input length. Since queries are typically shorter than documents, embedding queries is less computationally intensive than embedding documents.
- The number of inputs to the `embed` call. When encoding arrays, consider how many inputs a single document can have. For CPU inference, increasing the [feed timeout](reference/document-v1-api-reference.html#timeout) settings might be required when documents have many `embed` inputs.

Using a [GPU](reference/embedding-reference.html#embedder-onnx-reference-config), especially for longer sequence lengths (documents), can dramatically improve performance and reduce cost. See the blog post on [GPU-accelerated ML inference in Vespa Cloud](https://blog.vespa.ai/gpu-accelerated-ml-inference-in-vespa-cloud/). With GPU-accelerated instances, using fp16 models instead of fp32 can increase throughput by as much as 3x.

Refer to [binarizing vectors](/en/binarizing-vectors.html) for how to reduce vector size.

##### Metrics

Vespa's built-in embedders emit metrics for computation time and token sequence length. These metrics are prefixed with `embedder.` and listed in the [Container Metrics](reference/container-metrics-reference.html) reference documentation. Third-party embedder implementations may inject the `ai.vespa.embedding.Embedder.Runtime` component to easily emit the same predefined metrics, although emitting custom metrics is perfectly fine.
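If you do emit custom metrics, a minimal sketch of a component doing so through the injected jdisc `Metric` API is shown below (the class name and metric name are hypothetical):

```
import com.yahoo.component.AbstractComponent;
import com.yahoo.jdisc.Metric;

public class MyEmbedderMetrics extends AbstractComponent {

    private final Metric metric;

    public MyEmbedderMetrics(Metric metric) {
        this.metric = metric;
    }

    void sample(double latencyMillis) {
        // Record the time spent producing one embedding (hypothetical metric name)
        metric.set("embedder.my_embedding_latency_millis", latencyMillis, null);
    }

}
```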
##### Sample applications

These sample applications use embedders:

- [commerce-product-ranking](https://github.com/vespa-engine/sample-apps/tree/master/commerce-product-ranking) demonstrates using multiple embedders
- [multi-vector-indexing](https://github.com/vespa-engine/sample-apps/tree/master/multi-vector-indexing) demonstrates how to use embedders with multiple document field inputs
- [colbert](https://github.com/vespa-engine/sample-apps/tree/master/colbert) demonstrates how to use the colbert-embedder
- [colbert-long](https://github.com/vespa-engine/sample-apps/tree/master/colbert-long) demonstrates how to use the colbert-embedder with long contexts (array input)
- [splade](https://github.com/vespa-engine/sample-apps/tree/master/splade) demonstrates how to use the splade-embedder

##### Tricks and tips

Various tricks that are useful with embedders.

###### Adding a fixed string to a query text

Embedding models might require text to be prepended with a fixed string, e.g. (the component id and model paths are illustrative):

```
<component id="myEmbedderId" type="hugging-face-embedder">
    <transformer-model path="models/model.onnx"/>
    <tokenizer-model path="models/tokenizer.json"/>
    <prepend>
        <query>query:</query>
        <document>passage:</document>
    </prepend>
</component>
```

The above configuration prepends text in queries and field data. Find a complete example in the [ColBERT](https://github.com/vespa-engine/sample-apps/tree/master/colbert) sample application.

An alternative approach is using query profiles to prepend query data. If you need to add a standard wrapper or a prefix instruction around the input text you want to embed, use parameter substitution to supply the text, as in `embed(myEmbedderId, @text)`, and let the parameter (`text` here) be defined in a [query profile](query-profiles.html), which in turn uses [value substitution](query-profiles.html#value-substitution) to place another query request parameter with a supplied text value within it.

The following is a concrete example where queries should have a prefix instruction before being embedded in a vector representation. The following defines a `text` field in `search/query-profiles/default.xml` (a minimal query-profile example):

```
<query-profile id="default">
    <field name="text">Represent this sentence for searching relevant passages: %{user_query}</field>
</query-profile>
```

Then, at query request time, we can pass `user_query` as a request parameter; this parameter is then used to produce the `text` value, which is then embedded:

```
{
    "yql": "select * from doc where userQuery() or ({targetHits: 100}nearestNeighbor(embedding, e))",
    "input.query(e)": "embed(mxbai, @text)",
    "user_query": "space contains many suns"
}
```

The text that is embedded by the embedder is then: _Represent this sentence for searching relevant passages: space contains many suns_.

###### Concatenating input fields

You can concatenate values in indexing using "`.`", and handle missing field values using [choice](/en/indexing.html#choice-example) to produce a single input for an embedder:

```
schema doc {

    document doc {

        field title type string {
            indexing: summary | index
        }

        field body type string {
            indexing: summary | index
        }

    }

    field embeddings type tensor(x[384]) {
        indexing {
            (input title || "") . " " . (input body || "") | embed embedderId | attribute | index
        }
        index: hnsw
    }

}
```

You can also use concatenation to add a fixed preamble to the string to embed.

###### Combining with foreach

The indexing expression can also use `for_each` and include other document fields. For example, the _E5_ family of embedding models uses instructions along with the input. The following expression prefixes the input with _passage:_ followed by a concatenation of the title and a text chunk.
```
schema doc {

    document doc {

        field title type string {
            indexing: summary | index
        }

        field chunks type array<string> {
            indexing: index | summary
        }

    }

    field embedding type tensor(p{}, x[384]) {
        indexing {
            input chunks |
                for_each { "passage: " . (input title || "") . " " . ( _ || "") } |
                embed e5 | attribute | index
        }
        attribute {
            distance-metric: prenormalized-angular
        }
    }

}
```

See [Indexing language execution value](/en/indexing.html#execution-value-example) for details.

##### Troubleshooting

This section covers common issues and how to resolve them.

###### Model download failure

If models fail to download, the Vespa stateless container service will not start, with `RuntimeException: Not able to create config builder for payload` - see [example](/en/jdisc/container-components.html#component-load). Check the Vespa log for more details. The most common reasons for download failure are network issues or incorrect URLs. This will also be visible in the Vespa status output, as the container will not listen to its port:

```
vespa status -t http://127.0.0.1:8080
Container at http://127.0.0.1:8080 is not ready: unhealthy container at http://127.0.0.1:8080/status.html: Get "http://127.0.0.1:8080/status.html": EOF
Error: services not ready: http://127.0.0.1:8080
```

###### Tensor shape mismatch

The native embedder implementations expect the output tensor to have a specific shape. If the shape is incorrect, you will see an error message during feeding like:

```
feed: got status 500 ({"pathId":"..","..","message":"[UNKNOWN(252001) @ tcp/vespa-container:19101/chain.indexing]: Processing failed. Error message: java.lang.IllegalArgumentException: Expected 3 output dimensions for output name 'sentence_embedding': [batch, sequence, embedding], got 2 -- See Vespa log for details. "}) for put xx:not retryable
```

This means that the exported ONNX model output tensor does not have the expected shape. For example, the above is logged by the [hf-embedder](#huggingface-embedder), which expects the output shape to be [batch, sequence, embedding] (a 3D tensor). This is because the embedder implementation performs the [pooling-strategy](reference/embedding-reference.html#huggingface-embedder) over the sequence dimension to produce a single embedding vector. The batch size is always 1 for Vespa embeddings. See [onnx export](onnx.html#using-optimum-to-export-models-to-onnx-format) for how to export models to ONNX format with the correct output shapes and [onnx debug](onnx.html#debugging-onnx-models) for debugging input and output names.

###### Input names

The native embedder implementations expect the ONNX model to accept certain input names. If the names are incorrect, the Vespa container service will not start, and you will see an error message in the vespa log like:

```
WARNING container Container.com.yahoo.container.di.Container
Caused by: java.lang.IllegalArgumentException: Model does not contain required input: 'input_ids'. Model contains: my_input
```

This means that the ONNX model accepts "my\_input", while our configuration attempted to use "input\_ids". The default input names for the [hf-embedder](#huggingface-embedder) are "input\_ids", "attention\_mask" and "token\_type\_ids". These are overridable in the configuration ([reference](reference/embedding-reference.html#huggingface-embedder)). Some embedding models do not use the "token\_type\_ids" input.
We can specify this in the configuration by setting `transformer-token-type-ids` to empty, illustrated by the following example (the component id and model paths are illustrative):

```
<component id="hf-embedder" type="hugging-face-embedder">
    <transformer-model path="models/model.onnx"/>
    <tokenizer-model path="models/tokenizer.json"/>
    <transformer-token-type-ids/>
</component>
```

###### Output names

The native embedder implementations expect the ONNX model to produce certain output names. If the names are incorrect, the Vespa stateless container service will not start, and you will see an error message in the vespa log like:

```
Model does not contain required output: 'test'. Model contains: last_hidden_state
```

This means that the ONNX model produces "last\_hidden\_state", while our configuration attempted to use "test". The default output name for the [hf-embedder](#huggingface-embedder) is "last\_hidden\_state". This is overridable in the configuration, see the [reference](reference/embedding-reference.html#huggingface-embedder).

###### EOF

If vespa status shows that the container is healthy, but you observe an EOF error during feeding, the stateless container service has crashed and stopped listening to its port. This could be related to the embedder ONNX model size, docker container memory resource constraints, or the configured JVM heap size of the Vespa stateless container service.

```
vespa feed ext/1.json
feed: got error "Post "http://127.0.0.1:8080/document/v1/doc/doc/docid/1": unexpected EOF" (no body) for put id:doc:doc::1: giving up after 10 attempts
```

This could be caused by insufficient stateless container (JVM) memory - check the container logs for OOM errors. See [jvm-tuning](performance/container-tuning.html#jvm-tuning) for JVM tuning options (the default heap size is 1.5GB). Container crashes could also be caused by too little memory allocated to the docker or podman container, which can cause the Linux kernel to kill processes to free memory. See the [docker containers memory](operations-selfhosted/docker-containers.html#memory) documentation.

---

## Environments

### Environments

Vespa Cloud has two kinds of environments:

#### Environments

Vespa Cloud has two kinds of environments:

- Manual environment for rapid development and test: `dev`
- Automated environment with integrated CD pipeline: `prod`

An application is deployed to one or more _zones_ (see [zone list](/en/cloud/zones.html)), which is a combination of an _environment_ and a _region_, like `vespa deploy -z dev.aws-us-east-1c`.

##### Dev

The dev environment is built for rapid development cycles, with auto-downscaling and auto-expiry for ease of use and cost control. The dev environment is the default; to deploy to it, use `vespa deploy`.
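For example, using the zone shown above:

```
$ vespa deploy                         # deploys to the default dev environment
$ vespa deploy -z dev.aws-us-east-1c   # deploys to an explicit dev zone
```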
###### Auto downscaling

One use case for the dev environment is to take an application package from a prod environment and deploy it to the dev environment to debug. To minimize cost and make this speedy, Vespa Cloud will by default ignore [nodes](/en/reference/services.html#nodes) and [resources](/en/reference/services.html#resources) settings. With this, you can safely download an application package from prod (which is normally large) and deploy it to dev, with no changes. To override this behavior and control the resources, specify them explicitly for the dev environment as described in [deployment variants](/en/reference/deployment-variants.html#services.xml-variants). Example (a sketch; the resource values are illustrative):

```
<nodes deploy:environment="dev" count="1">
    <resources vcpu="2" memory="8Gb" disk="50Gb"/>
</nodes>
```

**Important:** The `dev` environment has redundancy 1 by default, and there are no availability or data persistence guarantees. Do not use applications deployed to these zones for production serving use cases.

###### Auto expiry

Deployments to `dev` expire after 14 days of inactivity, that is, 14 days after the last [deployment](/en/application-packages.html#deploy). **This applies to all plans**. To add 7 more days to the expiry period, redeploy the application or use the Vespa Cloud Console.

###### Vespa version

The latest active Vespa version is used when deploying to the dev environment. The deployment is upgraded at a time which is most likely night-time for the developer (based on the time when the last deployments were made), in order to minimize downtime. An upgrade will be skipped if metrics indicate ongoing feed or query load, but will still be done if the current version is more than a week old.

##### Prod

Applications are deployed to the `prod` environment for production serving. Deployments are passed through an integrated CD pipeline for system tests and staging tests. Read more in [automated deployments](/en/cloud/automated-deployments.html).

##### Test

The `test` environment is used by the integrated CD pipeline for prod deployments, to run [system tests](/en/cloud/automated-deployments.html#system-tests). The test capacity is ephemeral and only used during test. Nodes in test and staging environments do not have access to data in prod environments. Note that one cannot deploy directly to test and staging environments. For long-lived test applications (e.g., a QA system that is integrated with other services), use the prod environment.

System tests are always invoked, even if there are no tests defined. In this case, an instance is just started and then stopped. This has value in itself, as it ensures that the application is able to start. Test runs can be [aborted](/en/cloud/automated-deployments.html#disabling-tests).

##### Staging

See system tests above; this applies to staging, too. [Staging tests](/en/cloud/automated-deployments.html#staging-tests) use a fraction of the configured prod capacity. This can be overridden to use 1 node regardless of prod cluster size - see [deployment variants](/en/reference/deployment-variants.html#services.xml-variants).

##### Reference

Environment settings:

| Name | Description | Expiry | Cluster sizes |
| --- | --- | --- | --- |
| `dev` | Used for manual development testing. | 14 days | `1` |
| `test` | Used for [automated system tests](/en/testing.html#system-tests). | - | `1` |
| `staging` | Used for [automated staging tests](/en/testing.html#staging-tests). | - | `min(max(2, 0.05 * spec), spec)` |
| `prod` | Hosts all production deployments. | No expiry | `max(2, spec)` |
---

## Exposing Schema Information

### Exposing schema information

Some applications need to expose information about schemas to data plane clients.

#### Exposing schema information

Some applications need to expose information about schemas to data plane clients. This document explains how to add an API for that to your application. You need to know two things:

- Your application can expose any custom API by implementing a [handler](/en/jdisc/developing-request-handlers.html).
- Information about the deployed schemas is available in the component _com.yahoo.search.schema.SchemaInfo_.

With this information, we can add an API exposing schema information through the following steps.

##### 1. Make sure your application package can contain Java components

Application packages containing Java components must follow Maven layout. If your application package root contains a `pom.xml` and `src/main`, you're good; otherwise, convert it to this layout by copying the pom.xml from [the album-recommendation.java](https://github.com/vespa-engine/sample-apps/tree/master/album-recommendation-java) sample app and moving the files to follow this layout before moving on.

##### 2. Add a handler exposing schema info

Add the following handler (to a package of your choosing):

```
package ai.vespa.example;

import com.yahoo.container.jdisc.HttpRequest;
import com.yahoo.container.jdisc.HttpResponse;
import com.yahoo.container.jdisc.ThreadedHttpRequestHandler;
import com.yahoo.jdisc.Metric;
import com.yahoo.search.schema.SchemaInfo;

import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.util.concurrent.Executor;

public class SchemaInfoHandler extends ThreadedHttpRequestHandler {

    private final SchemaInfo schemaInfo;

    public SchemaInfoHandler(Executor executor, Metric metric, SchemaInfo schemaInfo) {
        super(executor, metric);
        this.schemaInfo = schemaInfo;
    }

    @Override
    public HttpResponse handle(HttpRequest httpRequest) {
        // Creating JSON, handling different paths etc. left as an exercise for the reader
        StringBuilder response = new StringBuilder();
        for (var schema : schemaInfo.schemas().values()) {
            response.append("schema: " + schema.name() + "\n");
            for (var field : schema.fields().values())
                response.append("  field: " + field.name() + "\n");
        }
        return new Response(200, response.toString());
    }

    private static class Response extends HttpResponse {

        private final byte[] data;

        Response(int code, byte[] data) {
            super(code);
            this.data = data;
        }

        Response(int code, String data) {
            this(code, data.getBytes(Charset.forName(DEFAULT_CHARACTER_ENCODING)));
        }

        @Override
        public String getContentType() {
            return "application/json";
        }

        @Override
        public void render(OutputStream outputStream) throws IOException {
            outputStream.write(data);
        }

    }

    private static class ErrorResponse extends Response {

        ErrorResponse(int code, String message) {
            super(code, "{\"error\":\"" + message + "\"}");
        }

    }

}
```

##### 3. Add the new API handler to your container cluster

In your `services.xml` file, under `<container>`, add a handler binding like:

```
<handler id="ai.vespa.example.SchemaInfoHandler">
    <binding>http://*/schema/v1/*</binding>
</handler>
```

##### 4. Deploy the modified application

```
$ mvn install
$ vespa deploy
```

##### 5. Verify that it works
```
$ vespa curl "schema/v1/"
```

---

## Faq

### FAQ - frequently asked questions

Refer to [Vespa Support](https://vespa.ai/support) for more support options.

#### FAQ - frequently asked questions

Refer to [Vespa Support](https://vespa.ai/support) for more support options.

* * *

##### Ranking

###### Does Vespa support a flexible ranking score?

[Ranking](ranking.html) is perhaps the primary Vespa feature - we like to think of it as scalable, online computation. A rank profile is where the application's logic is implemented, supporting simple types like `double` and complex types like `tensor`. Supply ranking data in queries as query features (e.g. different weights per customer), or look it up in a [Searcher](searcher-development.html). Typically, a document (e.g. product) "feature vector"/"weights" will be compared to a user-specific vector (tensor).

###### Where would customer specific weightings be stored?

Vespa doesn't have specific support for storing customer data as such. You can store this data as a separate document type in Vespa and look it up before passing the query, or store this customer meta-data as part of the other meta-data for the customer (i.e. login information) and pass it along with the query when you send it to the backend. Find an example of how to look up data in [album-recommendation-docproc](https://github.com/vespa-engine/sample-apps/tree/master/examples/document-processing).

###### How to create a tensor on the fly in the ranking expression?

Create a tensor in the ranking function from arrays or weighted sets using `tensorFrom...` functions - see [document features](reference/rank-features.html#document-features).

###### How to set a dynamic (query time) ranking drop threshold?

Pass a ranking feature like `query(threshold)` and use an `if` statement in the ranking expression - see [retrieval and ranking](getting-started-ranking.html#retrieval-and-ranking). Example:

```
rank-profile drop-low-score {
    function my_score() {
        expression: ..... # custom first phase score
    }
    rank-score-drop-limit: 0.0
    first-phase {
        expression: if(my_score() < query(threshold), -1, my_score())
    }
}
```

###### Are ranking expressions or functions evaluated lazily?

No - ranking expressions are not evaluated lazily; that would require lambda arguments. Only doubles and tensors are passed between functions. Example:

```
function inline foo(tensor, defaultVal) {
    expression: if (count(tensor) == 0, defaultVal, sum(tensor))
}

function bar() {
    expression: foo(tensor, sum(tensor1 * tensor2))
}
```

Here, `sum(tensor1 * tensor2)` is computed before `foo` is invoked, even if `foo` ends up not using it.

###### Does Vespa support early termination of matching and ranking?

Yes, this can be accomplished by configuring [match-phase](reference/schema-reference.html#match-phase) in the rank profile, or by adding a range query item using _hitLimit_ to the query tree, see [capped numeric range search](reference/query-language-reference.html#numeric). Both methods require an _attribute_ field with _fast-search_. The capped range query is faster, but beware that if there are other restrictive filters in the query, one might end up with 0 hits. The additional filters are applied as a post filtering step over the hits from the capped range query.
_match-phase_ on the other hand, is safe to use with filters or other query terms, and also supports diversification which the capped range query term does not support. ###### What could cause the relevance field to be -Infinity The returned [relevance](reference/default-result-format.html#relevance) for a hit can become "-Infinity" instead of a double. This can happen in two cases: - The [ranking](ranking.html) expression used a feature which became `NaN` (Not a Number). For example, `log(0)` would produce -Infinity. One can use [isNan](reference/ranking-expressions.html#isnan-x) to guard against this. - Surfacing low scoring hits using [grouping](grouping.html), that is, rendering low ranking hits with `each(output(summary()))` that are outside of what Vespa computed and caches on a heap. This is controlled by the [keep-rank-count](reference/schema-reference.html#keep-rank-count). ###### How to pin query results? To hard-code documents to positions in the result set, see the [pin results example](/en/multivalue-query-operators.html#pin-results-example). ##### Documents ###### What limits apply to document size? There is a [maximum document size](/en/reference/services-content.html#max-document-size) of 128 MiB, which is configurable per content cluster in services.xml. See also [field size](/en/schemas.html#field-size). ###### Is there any size limitation for multivalued fields? No enforced limit, except resource usage (memory). See [field size](/en/schemas.html#field-size). ###### Can a document have lists (key value pairs)? E.g. a product is offered in a list of stores with a quantity per store. Use [multivalue fields](schemas.html#field) (array of struct) or [parent child](parent-child.html). Which one to chose depends on use case, see discussion in the latter link. ###### Does a whole document need to be updated and re-indexed? E.g. price and quantity available per store may often change vs the actual product attributes. Vespa supports [partial updates](reads-and-writes.html) of documents. Also, the parent/child feature is implemented to support use-cases where child elements are updated frequently, while a more limited set of parent elements are updated less frequently. ###### What ACID guarantees if any does Vespa provide for single writes / updates / deletes vs batch operations etc? See the [Vespa Consistency Model](content/consistency.html). Vespa is not transactional in the traditional sense, it doesn't have strict ACID guarantees. Vespa is designed for high performance use-cases with eventual consistency as an acceptable (and to some extent configurable) trade-off. ###### Does vespa support wildcard fields? Wildcard fields are not supported in vespa. Workaround would be to use maps to store the wildcard fields. Map needs to be defined with `indexing: attribute` and hence will be stored in memory. Refer to [map](reference/schema-reference.html#map). ###### Can we set a limit for the number of elements that can be stored in an array? Implement a [document processor](document-processing.html) for this. ###### How to auto-expire documents / set up garbage collection? Set a selection criterion on the `document` element in `services.xml`. The criterion selects documents to keep. I.e. to purge documents "older than two weeks", the expression should be "newer than two weeks". Read more about [document expiry](documents.html#document-expiry). ###### How to increase redundancy and track data migration progress? 
Changing redundancy is a live and safe change (assuming there is headroom on disk / memory - e.g. from 2 to 3 is 50% more). The time to migrate will be quite similar to what it took to feed initially - a bit hard to say generally, and depends on IO and index settings, like if building an HNSW index. To monitor progress, take a look at the[multinode](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode)sample application for the _clustercontroller_ status page - this shows buckets pending, live. Finally, use the `.idealstate.merge_bucket.pending` metric to track progress - when 0, there are no more data syncing operations - see[monitor distance to ideal state](/en/operations-selfhosted/admin-procedures.html#monitor-distance-to-ideal-state). Nodes will work as normal during data sync, and query coverage will be the same. ###### How does namespace relate to schema? It does not,_namespace_ is a mechanism to split the document space into parts that can be used for document selection - see [documentation](documents.html#namespace). The namespace is not indexed and cannot be searched using the query api, but can be used by [visiting](visiting.html). ###### Visiting does not dump all documents, and/or hangs. There are multiple things that can cause this, see [visiting troubleshooting](visiting.html#troubleshooting). ###### How to find number of documents in the index? Run a query like `vespa query "select * from sources * where true"` and see the `totalCount` field. Alternatively, use metrics or `vespa visit` - see [examples](/en/operations/batch-delete.html#example). ###### Can I define a default value for a field? Not in the field definition, but it's possible to do this with the [choice](/en/indexing.html#choice-example)expression in an indexing statement. ##### Query ###### Are hierarchical facets supported? Facets is called [grouping](grouping.html) in Vespa. Groups can be multi-level. ###### Are filters supported? Add filters to the query using [YQL](query-language.html)using boolean, numeric and [text matching](text-matching.html). Query terms can be annotated as filters, which means that they are not highlighted when bolding results. ###### How to query for similar items? One way is to describe items using tensors and query for the[nearest neighbor](reference/query-language-reference.html#nearestneighbor) - using full precision or approximate (ANN) - the latter is used when the set is too large for an exact calculation. Apply filters to the query to limit the neighbor candidate set. Using [dot products](multivalue-query-operators.html) or [weak and](using-wand-with-vespa.html) are alternatives. ###### Does Vespa support stop-word removal? Vespa does not have a stop-word concept inherently. See the [sample app](https://github.com/vespa-engine/sample-apps/pull/335/files)for how to use [filter terms](reference/query-language-reference.html#annotations). ###### How to extract more than 400 hits / query and get ALL documents? Trying to request more than 400 hits in a query, getting this error:`{'code': 3, 'summary': 'Illegal query', 'message': '401 hits requested, configured limit: 400.'}`. - To increase max result set size (i.e. allow a higher [hits](reference/query-api-reference.html#hits)), configure `maxHits` in a [query profile](reference/query-api-reference.html#queryprofile), e.g. `500` in `search/query-profiles/default.xml` (create as needed). 
The [query timeout](reference/query-api-reference.html#timeout) can be increased, but it will still be costly and likely impact other queries - a large limit more so than a large offset. It can be made cheaper by using a smaller [document summary](document-summaries.html), and avoiding fields on disk if possible.
- Using _visit_ in the [document/v1/ API](document-v1-api-guide.html) is usually a better option for dumping all the data.

###### How to make a sub-query to get data to enrich the query, like get a user profile?

See the [UserProfileSearcher](https://github.com/vespa-engine/sample-apps/blob/master/news/app-6-recommendation-with-searchers/src/main/java/ai/vespa/example/UserProfileSearcher.java) for how to create a new query to fetch data - this creates a new Query, sets a new root and parameters - then `fill`s the Hits.

###### How to create a cache that refreshes itself regularly

See the sub-query question above; in addition, add something like:

```
public class ConfigCacheRefresher extends AbstractComponent {

    private final ScheduledExecutorService configFetchService = Executors.newSingleThreadScheduledExecutor();
    private Chain<Searcher> searcherChain;

    void initialize() {
        Runnable task = () -> refreshCache();
        configFetchService.scheduleWithFixedDelay(task, 1, 1, TimeUnit.MINUTES);
        searcherChain = executionFactory.searchChainRegistry().getChain(new ComponentId("configDefaultProvider"));
    }

    public void refreshCache() {
        Execution execution = executionFactory.newExecution(searcherChain);
        Query query = createQuery(execution);
        // ... run the query and store the result in the cache
    }

    public void deconstruct() {
        super.deconstruct();
        try {
            configFetchService.shutdown();
            configFetchService.awaitTermination(1, TimeUnit.MINUTES);
        } catch (Exception e) {
            // ignore
        }
    }
}
```

###### Is it possible to query Vespa using a list of document ids?

Yes, using the [in query operator](reference/query-language-reference.html#in). Example:

```
select * from data where user_id in (10, 20, 30)
```

The best article on the subject is [multi-lookup set filtering](performance/feature-tuning.html#multi-lookup-set-filtering). Refer to the [in operator example](multivalue-query-operators.html#in-example) on how to use it programmatically in a [Java Searcher](searcher-development.html).

###### How to query documents where one field matches any values in a list?

This is similar to using the SQL IN operator - use the [in query operator](reference/query-language-reference.html#in). Example:

```
select * from data where category in ('cat1', 'cat2', 'cat3')
```

See [multi-lookup set filtering](#is-it-possible-to-query-vespa-using-a-list-of-document-ids) above for more details.

###### How to count hits / all documents without returning results?

Count all documents using a query like [select \* from doc where true](query-language.html) - this counts all documents from the "doc" source. Using `select * from doc where true limit 0` will return the count and no hits; alternatively, add [hits=0](reference/query-api-reference.html#hits). Pass [ranking.profile=unranked](reference/query-api-reference.html#ranking.profile) to make the query less expensive to run. If an _estimate_ is good enough, use [hitcountestimate=true](reference/query-api-reference.html#hitcountestimate).

###### Must all fields in a fieldset have compatible type and matching settings?

Yes - a deployment warning with _This may lead to recall and ranking issues_ is emitted when fields with conflicting tokenization are put in the same [fieldset](reference/schema-reference.html#fieldset).
This is because a given query item searching one fieldset is tokenized just once, so there's no right choice of tokenization in this case. If you have user input that you want to apply to multiple fields with different tokenization, include the userInput multiple times in the query:

```
select * from sources * where ({defaultIndex: 'fieldsetOrField1'}userInput(@query)) or ({defaultIndex: 'fieldsetOrField2'}userInput(@query))
```

More details on [stack overflow](https://stackoverflow.com/questions/72784136/why-vepsa-easily-warning-me-this-may-lead-to-recall-and-ranking-issues).

###### How is the query timeout computed?

Find query timeout details in the [Query API Guide](query-api.html#timeout) and the [Query API Reference](reference/query-api-reference.html#timeout).

###### How do backslash escapes work?

Backslash is used to escape special characters in YQL. For example, to query with a literal backslash, which is useful in regexes, you need to escape it with another backslash: `\\`. Unescaped backslashes in YQL will lead to "token recognition error at: ''". In addition, Vespa CLI unescapes double backslashes to single (while single backslashes are left alone), so if you query with Vespa CLI you need to escape each backslash once more: `\\\\`. The same applies to strings in Java. Also note that both log messages and JSON results escape backslashes, so a single `\` is printed as `\\`.

###### Is it possible to have multiple SELECT statements in a single call (subqueries)?

E.g. two select queries with slightly different filtering conditions and a limit operator for each subquery. This makes it impossible to do via OR conditions to select both collections of documents - something equivalent to:

```
SELECT 1 AS x UNION ALL SELECT 2 AS y;
```

This isn't possible; you need to run two queries. Alternatively, split a single incoming query into two running in parallel in a [Searcher](searcher-development.html) - example:

```
FutureResult futureResult = new AsyncExecution(settings).search(query);
FutureResult otherFutureResult = new AsyncExecution(settings).search(otherQuery);
```

###### Is it possible to query for the number of elements in an array?

No, there is no index or attribute data structure that allows efficient searching for documents where an array field has a certain number of elements or items.

###### Is it possible to query for fields with NaN/no value set/null/none?

The [visiting](visiting.html#analyzing-field-values) API using document selections supports it, with a linear scan over all documents. If the field is an _attribute_, one can query using grouping to identify NaN values, see count and list [fields with NaN](/en/grouping.html#count-fields-with-nan).

###### How to retrieve random documents using YQL?

Functionality similar to MySQL "ORDER BY rand()": see the [random.match](reference/rank-features.html#random.match) rank feature - example:

```
rank-profile random {
    first-phase {
        expression: random.match
    }
}
```

Run queries, seeding the random generator:

```
$ vespa query 'select * from music where true' \
    ranking=random \
    rankproperty.random.match.seed=2
```

###### Some of the query results have too many hits from the same source - how to create a diverse result set?

See [result diversity](/en/result-diversity.html) for strategies on how to create result sets from different sources.

###### How to find the most distant neighbor in an embedding field called clip\_query\_embedding?
If you want to search for the most dissimilar items, you can with angular distance multiply your `clip_query_embedding` by the scalar -1. Then you are searching for the points that are closest to the point which is the farthest away from your `clip_query_embedding`. Also see a [pyvespa example](https://pyvespa.readthedocs.io/en/latest/examples/pyvespa-examples.html#Neighbors). ##### Feeding ###### How to debug a feeding 400 response? The best option is to use `--verbose` option, like `vespa feed --verbose myfile.jsonl` - see [documentation](/en/vespa-cli.html#documents). A common problem is a mismatch in schema names and [document IDs](/en/documents.html#document-ids) - a schema like: ``` schema article { document article { ... } } ``` will have a document feed like: ``` {"put": "id:mynamespace:article::1234", "fields": { ... }} ``` Note that the [namespace](/en/glossary.html#namespace) is not mentioned in the schema, and the schema name is the same as the document name. ###### How to debug document processing chain configuration? This configuration is a combination of content and container cluster configuration, see [indexing](indexing.html) and [feed troubleshooting](/en/operations-selfhosted/admin-procedures.html#troubleshooting). ###### I feed documents with no error, but they are not in the index This is often a problem if using [document expiry](documents.html#document-expiry), as documents already expired will not be persisted, they are silently dropped and ignored. Feeding stale test data with old timestamps in combination with document-expiry can cause this behavior. ###### How to feed many files, avoiding 429 error? Using too many HTTP clients can generate a 429 response code. The Vespa sample apps use [vespa feed](vespa-cli.html#documents) which uses HTTP/2 for high throughput - it is better to stream the feed files through this client. ###### Can I use Kafka to feed to Vespa? Vespa does not have a Kafka connector. Refer to third-party connectors like [kafka-connect-vespa](https://github.com/vinted/kafka-connect-vespa). ##### Text Search ###### Does Vespa support addition of flexible NLP processing for documents and search queries? E.g. integrating NER, word sense disambiguation, specific intent detection. Vespa supports these things well: - [Query (and result) processing](searcher-development.html) - [Document processing](document-processing.html)and document processors working on semantic annotations of text ###### Does Vespa support customization of the inverted index? E.g. instead of using terms or n-grams as the unit, we might use terms with specific word senses - e.g. bark (dog bark) vs. bark (tree bark), or BCG (company) vs. BCG (vaccine name). Creating a new index _format_ means changing the core. However, for the examples above, one just need control over the tokens which are indexed (and queried). That is easily done in some Java code. The simplest way to do this is to plug in a [custom tokenizer](linguistics.html). That gets called from the query parser and bundled linguistics processing [Searchers](searcher-development.html)as well as the [Document Processor](document-processing.html)creating the annotations that are consumed by the indexing operation. Since all that is Searchers and Docprocs which you can replace and/or add custom components before and after, you can also take full control over these things without modifying the platform itself. ###### Does vespa provide any support for named entity extraction? 
It provides the building blocks but not an out-of-the-box solution. We can write a [Searcher](searcher-development.html) to detect query-side entities and rewrite the query, and a [DocProc](document-processing.html) if we want to handle them in some special way on the indexing side. ###### Does vespa provide support for text extraction? You can write a document processor for text extraction, Vespa does not provide it out of the box. ###### How to do Text Search in an imported field? [Imported fields](parent-child.html) from parent documents are defined as [attributes](attributes.html), and have limited text match modes (i.e. `indexing: index` cannot be used).[Details](https://stackoverflow.com/questions/71936330/parent-child-mode-cannot-be-searched-by-parent-column). ##### Semantic search ###### Why is closeness 1 for all my vectors? If you have added vectors to your documents and queries, and see that the rank feature closeness(field, yourEmbeddingField) produces 1.0 for all documents, you are likely using[distance-metric](reference/schema-reference.html#distance-metric): innerproduct/prenormalized-angular, but your vectors are not normalized, and the solution is normally to switch to[distance-metric: angular](reference/schema-reference.html#angular)or use[distance-metric: dotproduct](reference/schema-reference.html#dotproduct)(available from Vespa 8.170.18). With non-normalized vectors, you often get negative distances, and those are capped to 0, leading to closeness 1.0. Some embedding models, such as models from sbert.net, claim to output normalized vectors but might not. ##### Programming Vespa ###### Is Python plugins supported / is there a scripting language? Plugins have to run in the JVM - [jython](https://www.jython.org/) might be an alternative, however Vespa Team has no experience with it. Vespa does not have a language like[painless](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting-painless.html) - it is more flexible to write application logic in a JVM-supported language, using[Searchers](searcher-development.html) and [Document Processors](document-processing.html). ###### How can I batch-get documents by ids in a Searcher A [Searcher](searcher-development.html) intercepts a query and/or result. To get a number of documents by id in a Searcher or other component like a [Document processor](document-processing.html), you can have an instance of [com.yahoo.documentapi.DocumentAccess](reference/component-reference.html#injectable-components)injected and use that to get documents by id instead of the HTTP API. ###### Does Vespa work with Java 20? Vespa uses Java 17 - it will support 20 some time in the future. ###### How to write debug output from a custom component? Use `System.out.println` to write text to the [vespa.log](reference/logs.html). ##### Performance ###### What is the latency of documents being ingested vs indexed and available for search? Vespa has a near real-time indexing core with typically sub-second latencies from document ingestion to being indexed. This depends on the use-case, available resources and how the system is tuned. Some more examples and thoughts can be found in the [scaling guide](performance/sizing-search.html). ###### Is there a batch ingestion mode, what limits apply? Vespa does not have a concept of "batch ingestion" as it contradicts many of the core features that are the strengths of Vespa, including [serving elasticity](elasticity.html) and sub-second indexing latency. 
That said, we have numerous use-cases in production that do high throughput updates to large parts of the (sometimes entire) document set. In cases where feed throughput is more important than indexing latency, you can tune this to meet your requirements. Some of this is detailed in the [feed sizing guide](performance/sizing-feeding.html). ###### Can the index support up to 512 GB index size in memory? Yes. The [content node](proton.html) is implemented in C++ and not memory constrained other than what the operating system does. ###### Get request for a document when document is not in sync in all the replica nodes? If the replicas are in sync the request is only sent to the primary content node. Otherwise, it's sent to several nodes, depending on replica metadata. Example: if a bucket has 3 replicas A, B, C and A & B both have metadata state X and C has metadata state Y, a request will be sent to A and C (but not B since it has the same state as A and would therefore not return a potentially different document). ###### How to keep indexes in memory? [Attribute](attributes.html) (with or without `fast-search`) is always in memory, but does not support tokenized matching. It is for structured data.[Index](schemas.html#indexing) (where there’s no such thing as fast-search since it is always fast) is in memory to the extent there is available memory and supports tokenized matching. It is for unstructured text. It is possible to guarantee that fields that are defined with `index`have both the dictionary and the postings in memory by changing from `mmap` to `populate`, see [index \> io \> search](reference/services-content.html#index-io-search). Make sure that the content nodes run on nodes with plenty of memory available, during index switch the memory footprint will 2x. Familiarity with Linux tools like `pmap` can help diagnose what is mapped and if it’s resident or not. Fields that are defined with `attribute` are in-memory, fields that have both `index` and `attribute` have separate data structures, queries will use the default mapped on disk data structures that supports `text` matching, while grouping, summary and ranking can access the field from the `attribute` store. A Vespa query is executed in two phases as described in [sizing search](performance/sizing-search.html), and summary requests can touch disk (and also uses `mmap` by default). Due to their potential size there is no populate option here, but one can define [dedicated document summary](document-summaries.html#performance)containing only fields that are defined with `attribute`. The [practical performance guide](performance/practical-search-performance-guide.html)can be a good starting point as well to understand Vespa query execution, difference between `index` and `attribute` and summary fetching performance. ###### Is memory freed when deleting documents? Deleting documents, by using the [document API](reads-and-writes.html)or [garbage collection](documents.html#document-expiry) will increase the capacity on the content nodes. However, this is not necessarily observable in system metrics - this depends on many factors, like what kind of memory that is released, when [flush](proton.html#proton-maintenance-jobs) jobs are run and document [schema](schemas.html). In short, Vespa is not designed to release memory once used. It is designed for sustained high throughput, low latency, keeping maximum memory used under control using features like [feed block](operations/feed-block.html). 
When deleting documents, one can observe a slight increase in memory. A deleted document is represented using a [tombstone](/en/operations-selfhosted/admin-procedures.html#content-cluster-configuration), that will later be removed, see [removed-db-prune-age](reference/services-content.html#removed-db-prune-age). When running garbage collection, the summary store is scanned using mmap and both VIRT and page cache memory usage increases. Read up on [attributes](attributes.html) to understand more of how such fields are stored and managed.[Paged attributes](attributes.html#paged-attributes) trades off memory usage vs. query latency for a lower max memory usage. ##### Administration ###### Can one do a partial deploy to the config server / update the schema without deploying all the node configs? Yes, deployment is using this web service API, which allows you to create an edit session from the currently deployed package, make modifications, and deploy (prepare+activate) it: [deploy-rest-api-v2.html](reference/deploy-rest-api-v2.html). However, this is only useful in cases where you want to avoid transferring data to the config server unnecessarily. When you resend everything, the config server will notice that you did not actually change e.g. the node configs and avoid unnecessary noop changes. ###### How fast can nodes be added and removed from a running cluster? [Elasticity](elasticity.html) is a core Vespa strength - easily add and remove nodes with minimal (if any) serving impact. The exact time needed depends on how much data will need to be migrated in the background for the system to converge to [ideal data distribution](content/idealstate.html). ###### Should Vespa API search calls be load balanced or does Vespa do this automatically? You will need to load balance incoming requests between the nodes running the[stateless Java container cluster(s)](overview.html). This can typically be done using a simple network load balancer available in most cloud services. This is included when using [Vespa Cloud](https://cloud.vespa.ai/), with an HTTPS endpoint that is already load balanced - both locally within the region and globally across regions. ###### Supporting index partitions [Search sizing](performance/sizing-search.html) is the intro to this. Topology matters, and this is much used in the high-volume Vespa applications to optimise latency vs. cost. ###### Can a running cluster be upgraded with zero downtime? With [Vespa Cloud](https://cloud.vespa.ai/), we do automated background upgrades daily without noticeable serving impact. If you host Vespa yourself, you can do this, but need to implement the orchestration logic necessary to handle this. The high level procedure is found in [live-upgrade](/en/operations-selfhosted/live-upgrade.html). ###### Can Vespa be deployed multi-region? [Vespa Cloud](https://cloud.vespa.ai/en/reference/zones) has integrated support - query a global endpoint. Writes will have to go to each zone. There is no auto-sync between zones. ###### Can Vespa serve an Offline index? Building indexes offline requires the partition layout to be known in the offline system, which is in conflict with elasticity and auto-recovery (where nodes can come and go without service impact). It is also at odds with realtime writes. For these reasons, it is not recommended, and not supported. ###### Does vespa give us any tool to browse the index and attribute data? Use [visiting](visiting.html) to dump all or a subset of the documents. 
See [data-management-and-backup](https://cloud.vespa.ai/en/data-management-and-backup) for more information. ###### What is the response when data is written only on some nodes and not on all replica nodes (Based on the redundancy count of the content cluster)? Failure response will be given in case the document is not written on some replica nodes. ###### When the doc is not written to some nodes, will the document become available due to replica reconciliation? Yes, it will be available, eventually. Also try [Multinode testing and observability](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode). ###### Does vespa provide soft delete functionality? Yes just add a `deleted` attribute, add [fast-search](attributes.html#fast-search) on it and create a searcher which adds an `andnot deleted` item to queries. ###### Can we configure a grace period for bucket distribution so that buckets are not redistributed as soon as a node goes down? You can set a [transition-time](reference/services-content.html#transition-time) in services.xml to configure the cluster controller how long a node is to be kept in maintenance mode before being automatically marked down. ###### What is the recommended redundant/searchable-copies config when using grouping distribution? Grouped distribution is used to reduce search latency. Content is distributed to a configured set of groups, such that the entire document collection is contained in each group. Setting the redundancy and searchable-copies equal to the number of groups ensures that data can be queried from all groups. ###### How to set up for disaster recovery / backup? Refer to [#17898](https://github.com/vespa-engine/vespa/issues/17898) for a discussion of options. ###### How to check Vespa version for a running instance? Use [/state/v1/version](reference/state-v1.html#state-v1-version) to find Vespa version. ###### Deploy rollback See [rollback](/en/applications.html#rollback) for options. ##### Troubleshooting ###### Deployment fails with response code 413 If deployment fails with error message "Deployment failed, code: 413 ("Payload Too Large.")" you might need to increase the config server's JVM heap size. The config server has a default JVM heap size of 2 Gb. When deploying an app with e.g. large models this might not be enough, try increasing the heap to e.g. 4 Gb when executing 'docker run …' by adding an environment variable to the command line: ``` docker run --env VESPA_CONFIGSERVER_JVMARGS=-Xmx4g ``` ###### The endpoint does not come up after deployment When deploying an application package, with some kind of error, the endpoints might fail, like: ``` $ vespa deploy --wait 300 Uploading application package ... done Success: Deployed target/application.zip Waiting up to 5m0s for query service to become available ... Error: service 'query' is unavailable: services have not converged ``` Another example: ``` [INFO] [03:33:48] Failed to get 100 consecutive OKs from endpoint ... ``` There are many ways this can fail, the first step is to check the Vespa Container: ``` $ docker exec vespa vespa-logfmt -l error [2022-10-21 10:55:09.744] ERROR container Container.com.yahoo.container.jdisc.ConfiguredApplication Reconfiguration failed, your application package must be fixed, unless this is a JNI reload issue: Could not create a component with id 'ai.vespa.example.album.MetalSearcher'. Tried to load class directly, since no bundle was found for spec: album-recommendation-java. 
If a bundle with the same name is installed, there is a either a version mismatch or the installed bundle's version contains a qualifier string. ... ``` [Bundle plugin troubleshooting](components/bundles.html#bundle-plugin-troubleshooting) is a good resource to analyze Vespa container startup / bundle load problems. ###### Starting Vespa using Docker on M1 fails Using an M1 MacBook Pro / AArch64 makes the Docker run fail: ``` WARNING: The requested image’s platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested ``` Make sure you are running a recent version of the Docker image, do `docker pull vespaengine/vespa`. ###### Deployment fails / nothing is listening on 19071 Make sure all [Config servers](/en/operations-selfhosted/configuration-server.html#troubleshooting) are started, and are able to establish ZooKeeper quorum (if more than one) - see the [multinode](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode) sample application. Validate that the container has [enough memory](/en/operations-selfhosted/docker-containers.html). ###### Startup problems in multinode Kubernetes cluster - readinessProbe using 19071 fails The Config Server cluster with 3 nodes fails to start. The ZooKeeper cluster the Config Servers use waits for hosts on the network, the hosts wait for ZooKeeper in a catch 22 - see [sampleapp troubleshooting](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations#troubleshooting). ###### How to display vespa.log? Use [vespa-logfmt](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-logfmt) to dump logs. If Vespa is running in a local container (named "vespa"), run `docker exec vespa vespa-logfmt`. ###### How to fix encoding problems in document text? See [encoding troubleshooting](/en/troubleshooting-encoding.html)for how to handle and remove control characters from the document feed. 
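As a quick illustration (a generic shell one-liner, not taken from the referenced guide - the file names are assumptions), ASCII control characters other than tab and newline can be stripped from a JSONL feed file before feeding:

```
# Hypothetical cleanup step: remove control characters (keeping tab and newline)
# from feed.jsonl before running 'vespa feed'
$ tr -d '\000-\010\013\014\016-\037' < feed.jsonl > feed-clean.jsonl
```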
--- ## Feature Tuning ### Vespa Serving Tuning This document describes how to tune certain features of an application for high query serving performance, where the main focus is on content cluster search features; see [Container tuning](container-tuning.html) for tuning of container clusters. #### Vespa Serving Tuning This document describes how to tune certain features of an application for high query serving performance, where the main focus is on content cluster search features; see [Container tuning](container-tuning.html) for tuning of container clusters. The [search sizing guide](sizing-search.html) is about _scaling_ an application deployment. ##### Attribute vs index The [attribute](../attributes.html) documentation summarizes when to use [attribute](../reference/schema-reference.html#attribute) in the [indexing](../reference/schema-reference.html#indexing) statement. Also see the [procedure](/en/schemas.html#schema-modifications) for changing from attribute to index and vice-versa. ``` field timestamp type long { indexing: summary | attribute } ``` If both index and attribute are configured for string-type fields, Vespa will search and match against the index with default match `text`. All numeric type fields and tensor fields are attribute (in-memory) fields in Vespa. ##### When to use fast-search for attribute fields By default, Vespa does not build any posting list index structures over _attribute_ fields. Adding _fast-search_ to the attribute definition as shown below will add an in-memory B-tree posting list structure which enables faster search for some cases (but not all, see next paragraph): ``` field timestamp type long { indexing: summary | attribute attribute: fast-search rank: filter } ``` When Vespa runs a query with multiple query items, it builds a query execution plan. It tries to optimize the plan so that the temporary result set is as small as possible. To do this, restrictive query tree items (matching few documents) are evaluated early. The query execution plan looks at hit count estimates for each part of the query tree using the index and B-tree dictionaries, which track the number of documents in which a given term occurs. However, for attribute fields without [fast-search](../attributes.html#fast-search) there is no hit count estimate, so the estimate becomes the total number of documents (matches all) and the query tree item is moved to the end of the query evaluation. A query with only one query term searching an attribute field without `fast-search` would be a linear scan over all documents and thus expensive: ``` select * from sources * where range(timestamp, 0, 100) ``` But if this query term is _and_-ed with another term that matches fewer documents, that term will determine the cost instead, and fast-search won't be necessary, e.g.: ``` select * from sources * where range(timestamp, 0, 100) and uuid contains "123e4567-e89b-12d3-a456-426655440000" ``` The general rules of thumb for when to use fast-search for an attribute field are: - Use _fast-search_ if the attribute field is searched without any other query terms - Use _fast-search_ if the attribute field could limit the total number of hits efficiently Changing the fast-search aspect of the attribute is a [live change](../reference/schema-reference.html#modifying-schemas) which does not require any re-feeding, so testing the performance with and without is low effort. Adding or removing _fast-search_ requires restart.
Note that _attribute_ fields with _fast-search_ that are not used in term based [ranking](../ranking.html) should use _rank: filter_for optimal performance. See reference [rank: filter](../reference/schema-reference.html#rank). See optimization for sorting on a _single-value numeric attribute with fast-search_ using [sorting.degrading](../reference/query-api-reference.html#sorting.degrading). ##### Tuning query performance for lexical search Lexical search (or keyword-based search) is a method that matches query terms as they appear in indexed documents. It relies on the lexical representation of words rather than their meaning, and is one of the two retrieval methods used in [hybrid search](../tutorials/hybrid-search.html). Lexical search in Vespa is done by querying string (text) [index](../schemas.html#indexing) fields, typically using the [weakAnd](../using-wand-with-vespa.html#weakand) query operator with [BM25](../reference/bm25.html) ranking. The following schema represents a simple article document with _title_ and _content_ fields, that can represent Wikipedia articles as an example. A _default_ fieldset is specified such that user queries are matched against both the _title_ and _content_ fields. BM25 ranking combines the scores of both fields in the _default_ rank profile. In addition, the _optimized_ rank profile specifies tuning parameters to improve query performance: ``` schema article { document article { field title type string { indexing: index | summary index: enable-bm25 } field content type string { indexing: index | summary index: enable-bm25 } } fieldset default { fields: title, content } rank-profile default { first-phase { expression: bm25(title) + bm25(content) } } rank-profile optimized inherits default { filter-threshold: 0.05 weakand { stopword-limit: 0.6 adjust-target: 0.01 } } } ``` The following shows an example question-answer query against a collection of articles, using the _weakAnd_ query operator and the _optimized_ rank profile. Question-answer queries are often written in full sentences, and as a consequence, they tend to contain many stopwords that are present in many documents and of less relevance when it comes to ranking. E.g., terms as "the", "in", and "are" are typically present in more the 60% of the documents: ``` ``` { "yql": "select * from article where userQuery()", "ranking.profile": "optimized", "query": "what are the three highest mountains in the world" } ``` ``` The cost of evaluating such a query is primarily linear with the number of matched documents. The _AND_ operator is most effective, but often ends up being too restrictive by not returning enough matches. The _OR_ operator is less restrictive, but has the problem of returning too many matches, which is very costly. The _weakAnd_ operator is somewhere in between the two in cost. ###### Posting Lists To find matching documents, the query operator uses the _posting lists_ associated with each query term. A posting list is part of the inverted index and contains all occurrences of a term within a collection of documents. It consists of document IDs for documents that contain the term, and additional information such as the positions of the term within those documents (used for ranking purposes). For common terms (e.g., stopwords), the posting lists are very large and can be expensive to use during evaluation and ranking. CPU work is required to iterate them, and I/O work is required to load portions of them from disk to memory with MMAP. 
The last part is especially problematic when all posting lists of a disk index cannot fit into physical memory, and the system must constantly swap parts of them in and out of memory, leading to high I/O wait times. To improve query performance, the following tuning parameters are available, as seen used in the _optimized_ rank profile. These are used to make tradeoffs between performance and quality. - **Use more compact posting lists for common terms**: Setting [filter-threshold](../reference/schema-reference.html#filter-threshold) to 0.05 ensures that all terms that are estimated to occur in more than 5% of the documents are handled with [compact posting lists (bitvectors)](../proton.html#index) instead of the full posting lists. This makes matching faster at the cost of producing less information for BM25 ranking (only a boolean signal is available). - **Avoid using large posting lists all together**: Setting [stopword-limit](../reference/schema-reference.html#weakand-stopword-limit) to 0.6, ensures that all terms that are estimated to occur in more than 60% of the documents are considered stopwords and dropped entirely from the query and also from ranking. - **Reduce the number of hits produced by _weakAnd_**: Setting [adjust-target](../reference/schema-reference.html#weakand-adjust-target) ensures that documents that only match terms that occur very frequently in the documents are not considered hits. This also removes the need to calculate _first-phase_ ranking for these documents, which is beneficial if _first-phase_ ranking is more complex and expensive. ###### Performance The tuning parameters used in the _optimized_ rank profile have been shown to provide a good tradeoff between performance and quality in testing. A Wikipedia dataset with [SQuAD](https://nlp.stanford.edu/pubs/rajpurkar2016squad.pdf) (Stanford Question Answering Dataset) queries was used to analyze performance, and [trec-covid](https://ir.nist.gov/trec-covid/), [MS MARCO](https://microsoft.github.io/msmarco/) and [nfcorpus](https://huggingface.co/datasets/BeIR/nfcorpus) from the BEIR dataset to analyze quality implications. For instance, the query performance was tripled without any measurable drop in quality with the Wikipedia dataset, using the tuning parameters in the _optimized_ rank profile. See the blog post [Tripling the query performance of lexical search](https://blog.vespa.ai/tripling-the-query-performance-of-lexical-search/) for more details. Note that testing should be conducted on your particular dataset to find the right tradeoff between performance and quality. ##### Hybrid TAAT and DAAT query evaluation Vespa supports **hybrid** query evaluation over inverted indexes, combining _TAAT_ and _DAAT_ evaluation to combine the best of both query evaluation techniques. Hybrid is not enabled per default and is triggered by a run-time query parameter. - **TAAT:** _Term At A Time_ scores documents one query term at a time. The entire posting iterator can be read per query term, and the score of a document is accumulated. It is CPU cache friendly as posting data is read sequentially without randomly seeking the posting list iterator. The downside is that _TAAT_ limits the term-based ranking function to be a linear sum of term scores. This downside is one reason why most search engines use _DAAT_. - **DAAT:** _Document At A Time_ scores documents completely one at a time. This requires multiple seeks in the term posting lists, which is CPU cache unfriendly but allows non-linear ranking functions. 
Generally, Vespa does _DAAT_ (document-at-a-time) query evaluation and not _TAAT_ (term-at-a-time) for the reason listed above. Ranking (score calculation) and matching (does the document match the query logic?) are not two fully separate, disjunct phases where matches are found first and ranking scores are calculated later. Matching and _first-phase_ score calculation are interleaved when using _DAAT_. The _first-phase_ ranking score is assigned to the hit when it satisfies the query constraints. At that point, the term iterators are positioned at the document id and one can unpack additional data from the term posting lists - e.g., for term proximity scoring used by the [nativeRank](../nativerank.html) ranking feature, which also requires unpacking of positions of the term within the document. The way hybrid query evaluation is done is that _TAAT_ is used for sub-branches of the overall query tree which are not used for term-based ranking. Using _TAAT_ can speed up query matching significantly (up to 30-50%) in cases where the query tree is large and complex, and where only parts of the query tree are used for term-based ranking. Examples of query tree branches that would require _DAAT_ are branches using text ranking features like [bm25 or nativeRank](../reference/rank-features.html). The list of ranking features which can handle _TAAT_ is long, but a query using only [attribute or tensor](../tensor-user-guide.html) features can have the entire tree evaluated using _TAAT_. For example, for a query containing free text from an end user, one can use the _userQuery()_ YQL syntax and combine it with application-level constraints. The application-level filter constraints in the query could benefit from using _TAAT_. Given the following document schema: ``` search news { document news { field title type string {} field body type string {} field popularity type float {} field market type string { rank:filter indexing: attribute attribute: fast-search } field language type string { rank:filter indexing: attribute attribute: fast-search } } fieldset default { fields: title,body } rank-profile text-and-popularity { first-phase { expression: attribute(popularity) + log10(bm25(title)) + log10(bm25(body)) } } } ``` In this case, the rank profile only uses two ranking features, the popularity attribute and the [bm25](../reference/bm25.html) score of the userQuery(). These are used in the default fieldset containing the title and body. Notice how neither _market_ nor _language_ is used in the ranking expression. In this query example, there is a language constraint and a market constraint, where both language and market are queried with a long list of valid values using OR, meaning that the document should match any of the market constraints and any of the language constraints: ``` ``` { "hits": 10, "ranking.profile": "text-and-popularity", "yql": "select * from sources * where userQuery() and (language contains \"en\" or language contains \"br\") and (market contains \"us\" or market contains \"eu\" or market contains \"apac\" or market contains \"..\" )", "query": "cat video", "ranking.matching.termwiselimit": 0.1 } ``` ``` The language and the market constraints in the query tree are not used in the ranking score, and that part of the query tree could be evaluated using _TAAT_. See also [multi lookup set filter](#multi-lookup-set-filtering) for how to most efficiently search with large set filters.
The subtree result is then passed as a bit vector into the _DAAT_ query evaluation, which could significantly speed up the overall evaluation. Enabling hybrid _TAAT_ is done by passing `ranking.matching.termwiselimit=0.1` as a request parameter. It's possible to evaluate the performance impact by changing this limit. Setting the limit to 0 will force termwise evaluation, which might hurt performance. One can evaluate if using the hybrid evaluation improves search performance by adding the above parameter. The limit is compared to the hit fraction estimate of the entire query tree. If the hit fraction estimate is higher than the limit, the termwise evaluation is used to evaluate the sub-branch of the query. ##### Indexing uuids When configuring [string](../reference/schema-reference.html#string) type fields with `index`, the default [match](../reference/schema-reference.html#match) mode is `text`. This means Vespa will [tokenize](../linguistics.html#tokenization) the content and index the tokens. The string representation of an [Universally unique identifier](https://en.wikipedia.org/wiki/Universally_unique_identifier) (UUID) is 32 hexadecimal (base 16) digits, in five groups, separated by hyphens, in the form 8-4-4-4-12, for a total of 36 characters (32 alphanumeric characters and four hyphens). Example: Indexing `123e4567-e89b-12d3-a456-426655440000` with the above document definition, Vespa will tokenize this into 5 tokens: `[123e4567,e89b,12d3,a456,426655440000]`, each of which could be matched independently, leading to possible incorrect matches. To avoid this, change the mode to [match: word](../reference/schema-reference.html#word) to treat the entire uuid as _one_ token/word: ``` field uuid type string { indexing: summary | index match: word rank: filter } ``` In addition, configure the `uuid` as a [rank: filter](../reference/schema-reference.html#rank) field - the field will then be represented as efficiently as possible during search and ranking. The `rank:filter` behavior can also be triggered at query time on a per-query item basis by the `com.yahoo.prelude.query.Item.setRanked()` in a [custom searcher](../searcher-development.html). ##### Parent child and search performance When searching imported attribute fields (with `fast-search`) from parent document types, there is an additional indirection that can be reduced significantly if the imported field is defined with `rank:filter` and [visibility-delay](../reference/services-content.html#visibility-delay) is configured to \> 0. The [rank:filter](../reference/schema-reference.html#rank) setting impacts posting list granularity and `visibility-delay` enables a cache for the indirection between the child and parent document. ##### Ranking and ML Model inferences Vespa [scales](sizing-search.html) with the number of hits the query retrieves per node/search thread, and which needs to be evaluated by the first-phase ranking function. Read more on [phased ranking](../phased-ranking.html). Phased ranking enables using more resources during the second phase ranking step than in the first phase. The first phase should focus on getting decent recall (retrieving relevant documents in the top k), while the second phase should tune precision. For [text search](../nativerank.html) applications, consider using the [WAND](../using-wand-with-vespa.html) query operator - WAND can efficiently (sublinear) find the top-k documents using an inner scoring function. 
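To illustrate phased ranking, here is a minimal sketch of a rank profile with a cheap first phase and a more expensive second phase (the field names, the embedding tensors and the rerank-count value are illustrative assumptions, not taken from this guide):

```
rank-profile cheap-then-precise {
    first-phase {
        # inexpensive expression, evaluated for every document matching the query
        expression: bm25(title) + bm25(body)
    }
    second-phase {
        # more expensive expression, only evaluated for the best 100 hits per node
        rerank-count: 100
        expression: sum(query(q_embedding) * attribute(doc_embedding))
    }
}
```

Restricting the expensive computation to the top-ranked candidates from the first phase is what keeps per-query cost bounded as the document count grows.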
##### Multi Lookup - Set filtering Several real-world search use cases are built around limiting or filtering based on a set filter. If the contents of a field in the document match any of the values in the query set, it should be retrieved. E.g., searching data for a set of users: ``` select * from sources * where user_id = 1 or user_id = 2 or user_id = 3 or user_id = 4 or user_id = 5 ... ``` For OR filters over the same field, it is strongly recommended to use the [in query operator](../reference/query-language-reference.html#in) instead. It has considerably better performance than plain OR for set filtering: ``` select * from sources * where user_id in (1, 2, 3, 4, 5) ``` **Note:** Large sets can slow down YQL-parsing of the query - see [parameter substitution](../reference/query-language-reference.html#parameter-substitution) for how to send the set in a compact, performance-effective way. Attribute fields used like the above without other stronger query terms should have `fast-search` and `rank: filter`. If there is a large number of unique values in the field, it is also faster to use a `hash` dictionary instead of `btree`, which is the default dictionary data structure for attribute fields with `fast-search`: ``` field user_id type long { indexing: summary | attribute attribute: fast-search dictionary: hash rank: filter } ``` For `string` fields, we also need to include [match](/en/reference/schema-reference.html#match) settings if using the `hash` dictionary: ``` field user_id_str type string { indexing: summary | attribute attribute: fast-search match: cased rank: filter dictionary { hash cased } } ``` With 10M unique user\_ids in the dictionary and 1000 users searched per query, the _btree_ dictionary requires 1000 lookups, each O(log 10M), while the _hash_ based dictionary requires 1000 lookups, each O(1). Still, the _btree_ dictionary offers more flexibility in terms of [match](/en/reference/schema-reference.html#match) settings. The `in` query set filtering approach can be used in combination with hybrid _TAAT_ evaluation to further improve performance. See the [hybrid TAAT/DAAT](#hybrid-taat-daat) section. Also see the [dictionary schema reference](../reference/schema-reference.html#dictionary). **Note:** For most use cases, the time spent on dictionary traversal is negligible compared to the time spent on query evaluation (matching and ranking). If the query is very selective, for example, using Vespa as a key-value lookup store with ranking support, the dictionary traversal time can be significant. ##### Document summaries - hits If queries request many (thousands) of hits from a content cluster with few content nodes, increasing the [summary cache](caches-in-vespa.html) might reduce latency and cost. Using [explicit document summaries](../document-summaries.html), Vespa can support memory-only summary fetching if all fields referenced in the document summary are defined with `attribute`. Dedicated in-memory summaries avoid (potential) disk read and summary chunk decompression. Vespa document summaries are stored using compressed [chunks](../reference/services-content.html#summary-store-logstore-chunk). See also the [practical search performance guide on hits fetching](practical-search-performance-guide.html#hits-and-summaries). ##### Boolean, numeric, text attribute When using the attribute field type, these are the performance rules of thumb: 1. Use boolean if a field is a boolean (max two values) 2.
Use a string attribute if there is a set of values - only unique strings are stored 3. Use a numeric attribute for range searches 4. Use a numeric attribute if the data really is numeric; don't replace numeric with string numeric Refer to [attributes](../attributes.html) for details. ##### Tensor ranking The ranking workload can be significant for large tensors - it is important to understand both the potential memory and computational cost for each query. ###### Memory Assume the dot product of two tensors with 1000 values of 8 bytes each, as in `tensor(x[1000])`. With one query tensor and one document tensor, the dot product is `sum(query(tensor1) * attribute(tensor2))`. Given a Haswell CPU architecture, where the theoretical upper memory bandwidth is 68 GB/sec, this gives 68 GB/sec / 8 KB = 9M ranking evaluations/sec. In other words, for a 1 M index, 9 queries per second before being memory bound. See below for using smaller [cell value types](#cell-value-types), and read more about [quantization](https://blog.vespa.ai/from-research-to-production-scaling-a-state-of-the-art-machine-learning-system/#model-quantization). ###### Compute When using tensor types with at least one mapped dimension (sparse or mixed tensor), [attribute: fast-rank](../reference/schema-reference.html#attribute) can be used to optimize the tensor attribute for ranking expression evaluation at the cost of using more memory. This is a good tradeoff if benchmarking indicates significant latency improvements with `fast-rank`. When optimizing ranking functions with tensors, try to avoid temporary objects. Use the [Tensor Playground](https://docs.vespa.ai/playground/) to evaluate what the expressions map to, using the execution details to list the detailed steps - find examples below. ###### Multiphase ranking To save both memory and compute resources, use [multiphase ranking](../phased-ranking.html). In short, use less expensive ranking evaluations to find the most promising candidates, then a high-precision evaluation for the top-k candidates. The blog post series on [Building Billion-Scale Vector Search](https://blog.vespa.ai/building-billion-scale-vector-search/) is a good read. ###### Cell value types | Type | Description | | --- | --- | | double | The default tensor cell type is the 64-bit floating-point `double` format. It gives the best precision at the cost of high memory usage and somewhat slower calculations. Using a smaller value type increases performance, trading off precision, so consider changing to one of the cell types below before scaling the application. | | float | The 32-bit floating-point format `float` should usually be used for all tensors when scaling for production. Note that some frameworks like TensorFlow prefer 32-bit floats. A vector with 1000 dimensions, `tensor(x[1000])` uses approximately 4K memory per tensor value. | | bfloat16 | This type has the range as a normal 32-bit float but only 8 bits of precision and can be thought of as a "float with lossy compression" - see [Wikipedia](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format). If memory (or memory bandwidth) is a concern, change the most space-consuming tensors to use the `bfloat16` cell type. Some careful analysis of the data is required before using this type. When doing calculations, `bfloat16` will act as if it was a 32-bit float, but the smaller size comes with a potential computational overhead. 
In most cases, the `bfloat16` needs conversion to a 32-bit float before the actual calculation can occur, adding an extra conversion step. In some cases, having tensors with `bfloat16` cells might bypass some built-in optimizations (like matrix multiplication) that will be hardware-accelerated only if the cells are of the same type. To avoid this, use the [cell\_cast](../reference/ranking-expressions.html#cell_cast) tensor operation to make sure the cells are of the right type before doing the more expensive operations. | | int8 | If using machine learning to generate a model with data quantization, one can target the `int8` cell value type, which is a signed integer with a range from -128 to +127 only. This is also treated like a "float with limited range and lossy compression" by the Vespa tensor framework, and gives results as if it were a 32-bit float when any calculation is done. This type is also suitable when representing boolean values (0 or 1). **Note:** If the input for an `int8` cell is not directly representable, the resulting cell value is undefined, so take care to only input numbers in the `[-128,127]` range. It's also possible to use `int8` representing binary data for [hamming distance](../reference/schema-reference.html#distance-metric) Nearest-Neighbor search. Refer to [billion-scale-knn](https://blog.vespa.ai/billion-scale-knn/) for example use. | ###### Inner/outer products The following is a primer into inner/outer products and execution details: | tensor a | tensor b | product | sum | comment | | --- | --- | --- | --- | --- | | tensor(x[3]):[1.0, 2.0, 3.0] | tensor(x[3]):[4.0, 5.0, 6.0] | tensor(x[3]):[4.0, 10.0, 18.0] | 32 | [Playground example](https://docs.vespa.ai/playground/#N4KABGBEBmkFxgNrgmUrWQPYAd5QFNIAaFDSPBdDTAO30gEMSybIiFIAXA2gZywAnABQAPRAGYAugEo4iAIzEwAJmXTIrCAF9W20hmrlcDIgbaU0WugwBGLGpg5Qe-IWMmz5AFmUBWZQA2KU1HXQx9ViNME04zawp8aPJ6TlhzR3YGRgAqe2tw1EjDBNjCBwsk6whIVKhoAFdaAGMKzOdIPgaAW2Fc2xlQmkKdFCkQbSA). The dimension name and size are the same in both tensors - this is an inner product with a scalar result. | | tensor(x[3]):[1.0, 2.0, 3.0] | tensor(y[3]):[4.0, 5.0, 6.0] | tensor(x[3],y[3]):[ [4.0, 5.0, 6.0], [8.0, 10.0, 12.0], [12.0, 15.0, 18.0] ] | 90 | [Playground example](https://docs.vespa.ai/playground/#N4KABGBEBmkFxgNrgmUrWQPYAd5QFNIAaFDSPBdDTAO30gEMSybIiFIAXA2gZywAnABQAPRAGYAugEo4iAIzEwAJmXTIrCAF9W20hmrlcDIgbaU0WugwBGLGpg5Qe-IWMmz5AFmUBWZQA2KU1HXQx9ViNME04zawp8aPJ6TlhzR3YGRgAqe2tw1EjDBNjCBwsk6whIVKhoAFdaAGMKzOdIPgaAW2Fc2xlQmkKdFCkQbSA). The dimension size is the same in both tensors, but dimensions have different names -\> this is an outer product; the result is a two-dimensional tensor. | | tensor(x[3]):[1.0, 2.0, 3.0] | tensor(x[2]):[4.0, 5.0] | undefined | | [Playground example](https://docs.vespa.ai/playground/#N4KABGBEBmkFxgNrgmUrWQPYAd5QFNIAaFDSPBdDTAO30gEMSybIiFIAXA2gZywAnABQAPRAGYAugEo4iAIzEwAJmXTIrCAF9W20hmrlcDIgbaU0WugwBGLGpg5Qe-IWMQrZ8gCzKArFKajroY+qxGmCacZtYU+JHk9Jyw5o7sDIwAVPbWoajhhnHRhA4WCdYQkMlQ0ACutADGZenOkHx1ALbC2bYywTT5OihSINpAA). Two tensors in the same dimension but with different lengths -\> undefined. | | tensor(x[3]):[1.0, 2.0, 3.0] | tensor(y[2]):[4.0, 5.0] | tensor(x[3],y[2]):[ [4.0, 5.0], [8.0, 10.0], [12.0, 15.0] ] | 54 | [Playground example](https://docs.vespa.ai/playground/#N4KABGBEBmkFxgNrgmUrWQPYAd5QFNIAaFDSPBdDTAO30gEMSybIiFIAXA2gZywAnABQAPRAGYAugEo4iAIzEwAJmXTIrCAF9W20hmrlcDIgbaU0WugwBGLGpg5Qe-IcICeiFbPkAWZQBWKU1HXQx9ViNME04zawp8aPJ6TlhzR3YGRgAqe2tw1EjDBNjCBwsk6whIVKhoAFdaAGMKzOdIPgaAW2Fc2xlQmkKdFCkQbSA). 
Two tensors with different names and dimensions -\> this is an outer product; the result is a two-dimensional tensor. | Inner product - observe optimized into `DenseDotProductFunction` with no temporary objects: ``` ``` [ { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::DenseDotProductFunction", "symbol": "vespalib::eval::(anonymous namespace)::my_cblas_double_dot_product_op(vespalib::eval::InterpretedFunction::State&, unsigned long)" } ] ``` ``` Outer product, parsed into a tensor multiplication (`DenseSimpleExpandFunction`), followed by a `Reduce` operation: ``` ``` [ { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::DenseSimpleExpandFunction", "symbol": "void vespalib::eval::(anonymous namespace)::my_simple_expand_op, true>(vespalib::eval::InterpretedFunction::State&, unsigned long)" }, { "class": "vespalib::eval::tensor_function::Reduce", "symbol": "void vespalib::eval::instruction::(anonymous namespace)::my_full_reduce_op >(vespalib::eval::InterpretedFunction::State&, unsigned long)" } ] ``` ``` Note that an inner product can also be run on mapped tensors ([Playground example](https://docs.vespa.ai/playground/#N4KABGBEBmkFxgNrgmUrWQPYAd5QFNIAaFDSPBdDTAO30gEMSybIiFIAXA2gZywAnABQAPYAF8AlHGCiAjHHnEwogExw1K0QGY4OiZFYQJrCaQzVyuBkQttKaY3QYAjFjUwcoPfkLGSMnKKACwAdAAM2hoArJHaegBskYbOphjmrFaYNpx2zhT42eT0nACWtLQEgjiCWAAmAK4AxlwenoQMjABU7mlmKAC6IBJAA)): ``` ``` [ { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::SparseFullOverlapJoinFunction", "symbol": "void vespalib::eval::(anonymous namespace)::my_sparse_full_overlap_join_op, true>(vespalib::eval::InterpretedFunction::State&, unsigned long)" } ] ``` ``` ###### Mapped lookups `sum(model_id * models, m_id)` | tensor name | tensor type | | --- | --- | | model\_id | `tensor(m_id{})` | | models | `tensor(m_id{}, x[3])` | Using a mapped dimension to select an indexed tensor can be considered a [mapped lookup](../tensor-examples.html#using-a-tensor-as-a-lookup-structure). This is similar to creating a slice but optimized into a single `MappedLookup` - see [Tensor Playground](https://docs.vespa.ai/playground/#N4KABGBEBmkFxgNrgmUrWQPYAd5QFNIAaFDSPBdDTAO30gFssATAgGwH0BLFksmpCIJIAFwK0AzlgBOACkY8WwAL4BKOMEYAmOAEYAdAAYVkARBUCVpDNXK4GRG4MppzdBszbtJ-GpmEocSlZBSVVYgAPRABmAF0NLT04RAAWY2IwAFYMsAA2YzjMnRSAdlyADlyATkLTd0sMawE7TAcRJ3cKfFbyehFJAFdGBVYOJTAAKjAvDklipTU-f0IGIZHZrl4pmbGfBd4lhqtnCF6odtXTzFdziEh+qEl2bgBjTpXVkVHvSUnNxZaJRwHT1fyNVDNWxdS5CZbkW7ue6PJgAQxwOAILE47CwWAA1oMcJwZFjBu94YJApBSSxyQQfnN-nslMR1sRFIczOCrCg4iAVEA) example. 
``` ``` [ { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::MappedLookup", "symbol": "void vespalib::eval::(anonymous namespace)::my_mapped_lookup_op(vespalib::eval::InterpretedFunction::State&, unsigned long)" } ] ``` ``` ###### Three-way dot product - mapped `sum(query(model_id) * model_weights * model_features)` | tensor name | tensor type | | --- | --- | | query(model\_id) | `tensor(model{})` | | model\_weights | `tensor(model{}, feature{})` | | model\_features | `tensor(feature{})` | Three-way mapped (sparse) dot product: [Tensor Playground](https://docs.vespa.ai/playground/#N4KABGBEBmkFxgNrgmUrWQPYAd5QFNIAaFDSPBdDTAO30gEcBXAgJwE8AKAWywBMCAGwD6AS34BKEmRqQiCSABcCtAM5Y2AHmhCsAQyUA+XgOHAAvpLjAeABjgBGC5FkQLsi6QzVyuBkTecpRobnQMfIKiAO4EYgDmABZKajI0mApQKuqaOnqGJpHmXmDQBIbMbASW1sBgADq0tmZCcPbEpeVKlQRw0HYWTsSNzVFtdh1lFVV9znAATMNNRa08jpNdPX0DcADMS6PCbeud073QcwAsYC5hHhhesr6Y-oqBYRT4z+T0iisiU26VVSQXS8gY2Q02l0BmMXEBPRqNn6cAArJNHHAAGy3dL3VCPHwfV6ENLBL5hCCQX5QFjsbj-CSSMAAKjA-1iCWSalZ7JaAM2wLJYMyTFYnFMUXEUl5HLiSRSsv5CKFd08oNCYJJ4I1VJC33CijUzB4XDpEsZMrZcq5iutysFBDU0l1GQYxtN5oZ-KZSqlnIVPPtUpVTukaoeKAAuiALEA) ``` ``` [ { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::Sparse112DotProduct", "symbol": "void vespalib::eval::(anonymous namespace)::my_sparse_112_dot_product_op(vespalib::eval::InterpretedFunction::State&, unsigned long)" } ] ``` ``` ###### Three-way dot product - mixed `sum(query(model_id) * model_weights * model_features)` | tensor name | tensor type | | --- | --- | | query(model\_id) | `tensor(model{})` | | model\_weights | `tensor(model{}, feature[2])` | | model\_features | `tensor(feature[2])` | Three-way mapped (mixed) dot product: [Tensor Playground](https://docs.vespa.ai/playground/#N4KABGBEBmkFxgNrgmUrWQPYAd5QFNIAaFDSPBdDTAO30gEcBXAgJwE8AKAWywBMCAGwD6AS34BKEmRqQiCSABcCtAM5Y2AHmhCsAQyUA+XgOHAAvpLjAeABjgBGC5FkQLsi6QzVyuBkTecpRobnQMfIKiAO4EYgDmABZKajI0mApQKuqaOnqGJpHmXmDQBIbMbASIAEwAutbAYAA6tLZmQnD2xKXlSpUEcHYWTsSt7VFddj1lFVVOIzVjbUWdPI4zfQNDIwDMyxPCXRu9c4POcAAsYC5hHhhesr6Y-oqBYRT4z+T0iqsis36VVSQXS8gY2Q02l0BmMXEBA1qDTgiAArMQAGx1Vzpe6oR4+D6vQhpYJfMIQSC-KAsdjcf4SSRgABUYH+sQSyTULLZHQBW2BpLBmSYrE4pii4ikPPZcSSKRlfIRgrunlBoTBxPB6spIW+4UUamYPC4tPFDOlrNlnIVVqVAoIamkOoyDCNJrN9L5jMVko58u5dslysd0lVDxQdRAFiAA) ``` ``` [ { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::tensor_function::Inject", "symbol": "" }, { "class": "vespalib::eval::Mixed112DotProduct", "symbol": "void vespalib::eval::(anonymous namespace)::my_mixed_112_dot_product_op(vespalib::eval::InterpretedFunction::State&, unsigned long)" } ] ``` ``` Copyright © 2025 - [Cookie Preferences](#) ###### On this page: - [Attribute vs index](#attribute-vs-index) - [When to use fast-search for attribute fields](#when-to-use-fast-search-for-attribute-fields) - [Tuning query performance for lexical search](#tuning-query-performance-for-lexical-search) - [Posting Lists](#posting-lists) - [Performance](#performance) - [Hybrid TAAT and DAAT query evaluation](#hybrid-taat-daat) - [Indexing uuids](#indexing-uuids) - [Parent child and search performance](#parent-child-and-search-performance) - [Ranking and ML Model 
inferences](#ranking-and-ml-model-inferences)
- [Multi Lookup - Set filtering](#multi-lookup-set-filtering)
- [Document summaries - hits](#document-summaries-hits)
- [Boolean, numeric, text attribute](#boolean-numeric-text-attribute)
- [Tensor ranking](#tensor-ranking)
- [Memory](#memory)
- [Compute](#compute)
- [Multiphase ranking](#multiphase-ranking)
- [Cell value types](#cell-value-types)
- [Inner/outer products](#Inner-outer-products)
- [Mapped lookups](#mapped-lookups)
- [Three-way dot product - mapped](#three-way-dot-product-mapped)
- [Three-way dot product - mixed](#three-way-dot-product-mixed)

---

## Features

### Features

Vespa is a platform for applications which need low-latency computation over large data sets.

#### Features

##### What is Vespa?

Vespa is a platform for applications which need low-latency computation over large data sets. It allows you to write and persist any amount of data, and execute high volumes of queries over the data which typically complete in tens of milliseconds.

Queries can use structured filter conditions, text and nearest neighbor vector search to select data. All the matching data is then ranked according to ranking functions - typically machine learned - to implement such use cases as search relevance, recommendation, targeting and personalization.

All the matching data can also be grouped into groups and subgroups where data is aggregated for each group to implement features like graphs, tag clouds, navigational tools, result diversity and so on.

Application-specific behavior can be included by adding Java components for processing queries, results and writes to the application package.

Vespa is real time. It is architected to maintain constant response times with any data volume by executing queries in parallel over many data shards and cores, and with added query volume by executing queries in parallel over many copies of the same data (groups). It is optimized to return responses in tens of milliseconds. Writes become visible in a few milliseconds and can be handled at a rate of thousands to tens of thousands per node per second.

A lot of work has gone into making Vespa easy to set up and operate. Any Vespa application - from single-node systems to systems running on hundreds of nodes in data centers - is fully configured by a single artifact called an _application package_. Low-level configuration of nodes, processes and components is done by the system itself based on the desired traits specified in the application package.

Vespa is scalable. Systems of up to hundreds of nodes, handling tens of billions of documents and tens of thousands of queries per second, are not uncommon, and are no harder to set up and modify than single-node systems. Since all system components, as well as the stored data, are redundant and self-correcting, hardware failures are not operational emergencies and can be handled by re-adding capacity when convenient.

Vespa is self-repairing and dynamic. When machines are lost or new ones are added, data is automatically redistributed over the machines, while serving and accepting writes continue. Changes to configuration and Java components can be made while serving by deploying a changed application package - no downtime or restarts required.

##### Features

This section provides an overview of the main features of Vespa. The remainder of the documentation goes into full detail.

###### Data and writes

- Documents in Vespa may be added, replaced, modified (single fields or any subset) and removed.
- Writes are acknowledged back to the client issuing them when they are durable and visible in queries, in a few milliseconds.
- Writes can be issued at a sustained volume of thousands to tens of thousands per node per second while serving queries.
- Data is replicated with a configurable redundancy.
- An even data distribution, with the desired redundancy, is automatically maintained when nodes are added, removed or lost unexpectedly.
- Data corruption is automatically repaired from an uncorrupted replica of the data.
- Data is written over a simple HTTP/2 API, or (for high volume) using a small, standalone client.
- Document data schemas allow fields of any of the usual primitive types as well as collections, structs and tensors.
- Any number of data schemas can be used at the same time.
- Documents may reference each other, and fields from referenced documents may be used in queries without performance penalty.
- Write operations can be processed by adding custom Java components.
- Data can be streamed out of the system for batch reprocessing.

###### Queries

- Queries may contain any combination of structured filters, free text and vector search operators.
- Queries may contain large tensors and vectors (to represent e.g. a user).
- Queries choose how results should be ranked and specify how they should be organized (see sections below).
- Queries and results may be processed by adding custom Java components - or any HTTP request may be turned into a query by custom request handlers.
- Query response times are typically in tens of milliseconds and can be maintained given any load and data size by adding more hardware.
- A _streaming search_ mode is available where search/selection is only supported on predefined groups of documents (e.g. a user's documents). In this mode each node can store and serve billions of documents while maintaining low response times.

###### Ranking and inference

- All results are ranked using a configured ranking function, selected in the query.
- A ranking function may be any mathematical function over scalars or tensors (multidimensional arrays).
- Scalar functions include an "if" function to express business logic and decision trees.
- Tensor functions include a powerful set of primitives and composite functions which allow expression of advanced machine-learned ranking functions such as deep neural nets.
- Functions can also refer to ONNX models invoked locally on the content nodes.
- Multiple ranking phases are supported to allocate more CPU to ranking promising candidates.
- A powerful set of text ranking features using positional information from the documents is provided out of the box.
- Other ranking features include 2D distance and freshness.

###### Organizing data and presenting results

- Matches to a query can be grouped and aggregated according to a specification in the query (see the sketch after this list).
- All the matches are included, even though they reside on multiple machines executing in parallel.
- Matches can be grouped by a unique value or by a numerical bucket.
- Any level of groups and subgroups is supported, and multiple parallel groupings can be specified in one query.
- Data can be aggregated (counted, averaged etc.) and selected within each group and subgroup.
- Any selection of data from documents can be included with the final result returned to the client.
- Search engine style keyword highlighting in matching fields is supported.
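As a rough illustration of grouping and aggregation (referenced in the list above), a single query can both select documents and specify how the matches should be organized. A sketch in YQL, with made-up document and field names:

```
select * from sources * where artist contains "coldplay" |
  all(group(year) each(output(count())))
```

Here matches are selected by the filter, grouped by the value of the `year` field, and a hit count is output for each group; see the grouping documentation for the full expression language.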
##### Configuration and operations

- Vespa can be installed using rpm files or a Docker image - on personal laptops, owned datacenters or in AWS.
- A Vespa application is fully specified as a separate buildable artifact: an _application package_ - individual machines or processes need never be configured individually.
- Systems may contain multiple clusters of each type (stateless and stateful), each containing any number of nodes.
- Systems of any size may be specified by two short configuration files in the application package.
- Document schemas, Java components and ranking functions/models are also configured in the application package.
- An application package is deployed as a single unit to Vespa to realize the system desired by the application.
- Most application changes (including Java component changes) can be performed by deploying a changed application package - the system will manage its own change process while serving and handling writes.
- Most document schema changes (excluding field type changes) can be made while the system is live.
- Application package changes are validated on deployment to prevent destructive changes to live systems.
- Vespa has no single points of failure and automatically routes around failing nodes.
- System logs are collected to a central server in real time.
- Selected metrics may be emitted to a third-party metrics/alerting system from all the nodes.

###### On this page:

- [What is Vespa?](#what-is-vespa)
- [Features](#features)
- [Data and writes](#data-and-writes)
- [Queries](#queries)
- [Ranking and inference](#ranking-and-inference)
- [Organizing data and presenting results](#organizing-data-and-presenting-results)
- [Configuration and operations](#configuration-and-operations)

---

## Ja

### Vespa Features

Vespa is an engine for executing and serving computations over large data sets in real time. You can write and persist any amount of data, and execute high volumes of queries over the data which typically complete in tens of milliseconds.

#### Vespa Features

##### What is Vespa?
Vespa is an engine for executing and serving computations over large data sets in real time. You can write and persist any amount of data, and execute high volumes of queries over that data, typically completing in tens of milliseconds. Queries can specify both structured filters and unstructured text search to select data. All matching data is then ranked according to ranking functions (typically machine-learned) to implement use cases such as search relevance, recommendation, targeting and personalization. All matching documents can also be grouped into groups and subgroups, where data is aggregated for each group to implement features like graphs, tag clouds, navigational tools and result diversity. Application-specific behavior for processing queries, results and writes can be added by including Java components in the application package.

Vespa is real time. It is designed to maintain constant response times with any data volume by executing queries in parallel over many data shards and cores, and with added query volume by executing queries in parallel over many copies of the same data (groups). It is optimized to return responses within tens of milliseconds. Writes become visible within a few milliseconds and can be handled at a rate of thousands to tens of thousands per node per second.

A lot of work has gone into making Vespa easy to set up and operate. Any Vespa application - from single-node systems to systems running hundreds of nodes across multiple data centers - is fully configured by a single artifact called an _application package_. Low-level configuration of nodes, processes and components is derived from the traits specified in the application package.

Vespa is scalable. Systems of up to hundreds of nodes handling tens of billions of documents are not uncommon, and are no harder to set up and modify than single-node systems. Since all system components, as well as the stored data, are redundant and self-correcting, hardware failures are not operational emergencies and can be handled by re-adding capacity when convenient.

Vespa is self-repairing and dynamic. When machines are lost or new ones are added, data is automatically redistributed over the machines while serving and accepting writes continue. Configuration and Java components can be changed while serving by deploying a changed application package - no downtime or restarts required.

##### Features

This section gives an overview of the main features of Vespa. The remainder of the documentation goes into full detail.

###### Data and writes

- Documents in Vespa can be added, replaced, modified (single fields or any subset) and removed.
- Writes are acknowledged back to the client once they are durable and (by default) visible in queries.
- Write requests can be issued at a sustained volume of thousands to tens of thousands per node per second while continuing to serve queries.
- Data is replicated with a configurable redundancy level.
- An even data distribution with the configured redundancy is maintained when nodes are added, removed or lost unexpectedly.
- Data corruption is automatically repaired from an uncorrupted replica of the data.
- Data can be written over a simple HTTP API or (for high volume) with a small standalone Java client.
- Document data schemas allow fields of the usual primitive types as well as collections, structs and tensors.
- Any number of data schemas can be used at the same time.
- Documents can reference each other, and fields from referenced documents can be used in queries without a performance penalty.
- Write operations can be processed by adding custom Java components.
- Data can be streamed out of the system for batch reprocessing.

###### Queries

- Queries can contain any combination of structured filters and unstructured search operators.
- Queries can contain large tensors and vectors (for example to represent a user).
- Queries specify how results should be ranked and how they should be organized (see the sections below).
- Queries and results can be processed by adding custom Java components - or any HTTP request can be turned into a query by a custom request handler.
- Query response times are typically within tens of milliseconds and can be maintained for any load and data size by adding hardware.
- A _streaming search_ mode is available where search/selection is only supported on predefined groups of documents (e.g. a user's documents). In this mode, each node can store and serve billions of documents while maintaining low response times.

###### Ranking

- All results are ranked by a configured ranking function, selected in the query.
- A ranking function can be any mathematical function over scalars or tensors (multidimensional arrays).
- Scalar functions include an "if" function for expressing business logic and decision trees.
- Tensor functions include a powerful set of primitive and composite functions which allow expression of advanced machine-learned ranking functions such as deep neural networks.
- Multiple ranking phases are supported to allocate more CPU to ranking promising candidates.
- A powerful set of text ranking features using positional information from the documents is available out of the box.
- Other ranking features include 2D distance and freshness.

###### Organizing and presenting results

- Documents matching a query can be grouped and aggregated according to a specification in the query.
- All matching documents are included, even when they are spread over multiple machines executing in parallel.
- Matching documents can be grouped by a unique value or by a numerical bucket.
- Any level of groups and subgroups is supported, and multiple parallel groupings can be specified in one query.
- Data can be aggregated (counted, averaged, etc.) and selected within each group and subgroup.
- Any selection of data from documents can be included in the final result returned to the client.
- Search-engine-style keyword highlighting in matching fields is supported.

##### Configuration and operations

- Vespa can be installed from RPM packages or as a Docker image - on personal laptops, in owned data centers, or on AWS.
- A Vespa application is fully described by an independent buildable artifact, the _application package_ - individual machines and processes never need to be configured individually.
- Systems can consist of multiple clusters of each type (stateless and stateful), each containing any number of nodes.
- Systems of any size can be described by two short configuration files in the application package.
- Document schemas, Java components, and ranking functions/models are also configured in the application package.
- An application package is deployed to Vespa as a single unit to realize the system intended by the application.
- Most application changes (including Java component changes) can be applied by deploying a changed application package; the system manages its own change process while continuing to serve and handle writes.
- Most document schema changes (excluding field type changes) can be applied while the system is live.
- Application package changes are validated on deployment to prevent destructive changes to live systems.
- Vespa has no single point of failure and automatically routes around failed nodes.
- System logs are collected to a central server in real time.
- Selected metrics can be emitted from all nodes to a third-party metrics/alerting system.

###### On this page:

- [What is Vespa?](#)
- [Features](#)
- [Data and writes](#)
- [Queries](#)
- [Ranking](#)
- [Organizing and presenting results](#)
- [Configuration and operations](#)

---

### Introduction to the Vespa documentation

This documentation is for users, or potential users, of Vespa (application owners / PMs, engineers, and operators). It covers a conceptual overview of the product, including background theory and explanations of why the product was developed the way it was. This is followed by detailed descriptions of the features developers will use first.

#### Introduction to the Vespa documentation

This documentation is for users, or potential users, of Vespa (application owners / PMs, engineers, and operators). It covers a conceptual overview of the product, including background theory and explanations of why the product was developed the way it was. This is followed by detailed descriptions of the features developers will use first.

For developing Vespa plugins, this documentation targets experienced Java developers. It is not aimed at beginners and does not cover general programming techniques or programming language basics. For parts of the documentation, readers should be familiar with Unix-like platforms, since Vespa is available on Linux.

Vespa's APIs are designed so that you can start using them without deep knowledge of Vespa's inner workings. Readers never need to be experts, but a solid understanding of Vespa's basic characteristics will make the text and examples easier to follow.

If you find errors, spelling mistakes or flawed code, or want to improve the documentation, please send a pull request or [create an issue](https://github.com/vespa-engine/vespa/issues).

_Italics_ are used for:

- path names, file names, program names, host names and URLs
- places where new terms are defined

`Monospace` is used for:

- programming language elements, code examples, keywords, functions, classes, interfaces, methods and the like
- commands and command-line output

Notes and other important information are shown like this:

**Note:** information you should pay attention to

Commands to run on the command line are shown starting with a $ prompt:

```
$ export PATH=$VESPA_HOME/bin:$PATH
```

---

### Vespa Overview

Vespa is a platform that makes it easy to develop and run scalable, low-latency, stateful or stateless backend services. This document gives an overview of the platform's features and main components.

#### Vespa Overview

Vespa is a platform that makes it easy to develop and run scalable, low-latency, stateful or stateless backend services. This document gives an overview of the platform's features and main components.

##### Introduction

With Vespa, you can build backend and middleware systems that scale to large amounts of data and high load without sacrificing latency or reliability. A Vespa instance consists of a number of _stateless Java container clusters_ and zero or more _content_ clusters storing data.

![Vespa Overview](../assets/img/vespa-overview.svg)

[Stateless **container** clusters](../en/jdisc/) host the components which process both incoming data and requests/queries together with their responses. These components provide not only platform functionality (such as indexing stages and all stages of query execution), but also the application's middleware logic. Application developers can configure a Vespa system as a single stateless cluster performing all of these functions, or as multiple clusters specialized for different kinds of tasks. Container clusters then pass queries and data operations on to the appropriate content clusters - or, for data the application does not store itself, they can integrate with external services supplying that data.

[**Content** clusters](../en/elasticity.html) in Vespa store the data (documents) and are responsible for lookups as well as distributed query processing - selection, grouping and aggregation. Content clusters can act as simple key-value serving systems, or they can execute complex searches over structured and unstructured data, order the results according to relevance models, and group and aggregate them. Great care is taken to make these operations work at low latency, so that end-user applications can use them directly over large data sets without precomputing result data. To provide scalability, content clusters automatically rebalance data in the background to maintain the configured level of redundancy. They also fail over unreachable nodes, making them elastic and self-recovering.

Data is written to content clusters after intermediate processing in a container cluster. Writes take effect after a few milliseconds, are guaranteed to either succeed or provide failure information within a given time, and scale with the available resources. Writes can be sent directly over HTTP, or by using a Java client - see the [API documentation](../en/api.html).

Document instances stored in Vespa must have a configured [schema](../en/schemas.html). Each content cluster in the system can handle multiple document types at the same time;
applications can assign different types of data to different content clusters, or assign multiple data types to the same content cluster.

Container and content clusters handle all of Vespa's end-user traffic, but there is a third type of cluster: the [**admin** and config cluster](../en/application-packages.html), which manages the other clusters and handles requests to change the system configuration.

A Vespa application is completely described by an [_application package_](../en/application-packages.html) - a directory containing declarations of the clusters that run as part of the system, content schemas, Java components and other configuration or data files needed by the application. Application owners put an application into production by _deploying_ the application package to the single admin cluster, and make changes to a running application using the same procedure. In addition to managing application configuration, the admin cluster collects logs in real time from all nodes in the system. Once Vespa is installed and started on a node, it is managed by the admin system, so the whole system can be treated as a single unit and application owners never need to perform administration tasks locally on the system's nodes.

The rest of this document describes the functions Vespa performs in more detail.

##### Vespa operations

Vespa accepts the following operations:

- Writes: putting documents (additions and replacements) and removing them, and updating their fields.
- Lookup of a document (or a subset of it) by id.
- [_Selection_](../en/query-language.html); matching documents can be [_sorted_](../en/reference/sorting.html), [_ranked_](../en/ranking.html) and [_grouped_](../en/grouping.html). Ranking of results is done according to a _[ranking expression](../en/reference/ranking-expressions.html)_, which can be a simple mathematical function, complex business logic, or a machine-learned search ranking model. Grouping is done by field values, as a hierarchical set of groups where each group can contain aggregated values over the data in the group. Grouping can be combined with aggregation to compute values for, e.g., navigation aids, tag clouds, graphs or clustering - all processed in a distributed fashion, without sending all the data back to the container cluster, which would be prohibitively expensive for large data sets.
- Data dumps: content matching a criterion can be streamed out using the [_visit_](../en/visiting.html) operation, for background reprocessing, backup and the like.
- [Any other custom network request](../en/reference/component-reference.html) can be handled by application components deployed on a container cluster.

These operations let developers build feature-rich applications written as Java middleware logic working on stored content, where selection, keyword search, organization and processing of the content can be expressed as declarative queries.

##### The stateless container

[Container clusters](../en/jdisc/) host the application components which work on the operations listed above and on their returned data. Vespa provides a component infrastructure along with a set of components that can be used out of the box: dependency injection built on [Guice](https://github.com/google/guice), extended with support for injecting configuration from the admin server or the application package; a component model based on OSGi; and a shared mechanism for chaining components into handler chains for modularity, as well as metrics and logging. The container also provides the network layer for handling and issuing remote requests - HTTP is available out of the box, and other protocols/transports can be plugged in transparently as components.

Developers can change the set of components (and of course their configuration) simply by redeploying the application package - the system manages copying to the cluster nodes and loading/unloading of components on the fly, without affecting request serving.

##### Content clusters

[Content clusters](../en/elasticity.html) store data reliably and maintain distributed indexes of the data for searching and selection. Data is replicated over multiple nodes, with the number of copies specified by the application, so that the cluster repairs itself automatically on loss of nodes or disks. Using the same mechanism, clusters can also be grown or shrunk simply by changing the set of available nodes declared in the application package.

Lookups of individual documents are routed directly to the node storing that document, while queries are spread over the subset of nodes which store the target documents. Complex queries are handled as distributed algorithms with multiple steps back and forth between the container and the content nodes; this is done to achieve the low latency which is one of Vespa's design goals.

##### Administration and developer support

[A single admin and config cluster](../en/application-packages.html) controls the other clusters of the system. The low-level configuration of each cluster, down to its processes and components, is derived from a high-level declaration of the desired system, so that application developers do not need to concern themselves with the details. Whenever an application package is redeployed, the system computes the necessary configuration changes and pushes them to the distributed components. For efficiency, changed components and data files are distributed via BitTorrent.

Application packages can be [changed, redeployed](../en/reference/deploy-rest-api-v2.html) and [inspected](../en/reference/config-rest-api-v2.html) over an HTTP REST API or through a [command-line interface](../en/application-packages.html#deploy). To make configuration changes singular and consistent, and to avoid a single point of failure, the admin cluster runs on top of [ZooKeeper](https://zookeeper.apache.org/).

An application package looks the same, and is deployed the same way, whether it describes a large system of hundreds of nodes or a single node running all the services. The only change needed is the list of nodes making up the clusters. A container cluster can also be started inside a single Java
VM by "deploying" the application package through a method call. This is convenient for testing applications from an IDE or in unit tests. Application packages, including components, can be [developed](../en/developer-guide.html) in an IDE using Maven, starting from the sample applications.

##### Summary

Vespa lets developers build applications which are richly featured, highly available and scalable to high standards, without having to deal with the low-level complexity involved. Applications can be evolved and grown over time without taking the system offline. It also becomes possible to avoid complex precomputation of data and pages - which leaves data stale and rules out personalization - since such precomputation often requires complex queries over user data that changes concurrently and regularly.

###### On this page:

- [Introduction](#introduction)
- [Vespa operations](#vespa-operations)
- [The stateless container](#the-stateless-container)
- [Content clusters](#content_clusters)
- [Administration and developer support](#administration_and_developer_support)
- [Summary](#summary)

---

### Vespa quick start using Docker

This guide describes how to install and run Vespa on a single machine using Docker.

#### Vespa quick start using Docker

This guide describes how to install and run Vespa on a single machine using Docker.

**Prerequisites**:

- [Docker](https://docs.docker.com/engine/install/) installed.
- [Git](https://git-scm.com/downloads) installed.
- Operating system: macOS or Linux
- Architecture: x86\_64
- At least 2 GB of memory allocated to the container instance.

1. **Clone the Vespa sample applications from [GitHub](https://github.com/vespa-engine/sample-apps):**

   ```
   $ git clone https://github.com/vespa-engine/sample-apps.git
   $ export VESPA_SAMPLE_APPS=`pwd`/sample-apps
   ```

2. **Start a Vespa Docker container:**

   ```
   $ docker run --detach --name vespa --hostname vespa-container --privileged \
     --volume $VESPA_SAMPLE_APPS:/vespa-sample-apps --publish 8080:8080 vespaengine/vespa
   ```

   The `volume` option makes the source code downloaded above available inside the Docker container as `/vespa-sample-apps`. Port `8080` is published outside the Docker container so the search and feed interfaces can be reached. Only one Docker container named `vespa` can run at a time; change the name if needed. If you are interested in the exact steps performed by the command above, see the [Dockerfile](https://github.com/vespa-engine/docker-image/blob/master/Dockerfile) and the [start script](https://github.com/vespa-engine/docker-image/blob/master/include/start-container.sh).

3. **Wait for the configuration server to start - wait for a 200 OK response:**

   ```
   $ docker exec vespa bash -c 'curl -s --head http://localhost:19071/ApplicationStatus'
   ```

4. **Deploy and activate the sample application:**

   ```
   $ docker exec vespa bash -c 'vespa-deploy prepare /vespa-sample-apps/basic-search/src/main/application/ && \
     vespa-deploy activate'
   ```

   More sample applications can be found in [sample-apps](https://github.com/vespa-engine/sample-apps/tree/master). See [application packages](../en/application-packages.html) for more on the contents of an application.

5. **Verify that the application is active - wait for a 200 OK response:**

   ```
   $ curl -s --head http://localhost:8080/ApplicationStatus
   ```

6. **Feed documents:**

   ```
   $ curl -s -H "Content-Type:application/json" --data-binary @${VESPA_SAMPLE_APPS}/basic-search/music-data-1.json \
     http://localhost:8080/document/v1/music/music/docid/1 | python -m json.tool
   $ curl -s -H "Content-Type:application/json" --data-binary @${VESPA_SAMPLE_APPS}/basic-search/music-data-2.json \
     http://localhost:8080/document/v1/music/music/docid/2 | python -m json.tool
   ```

   This example uses the [Document API](../en/reference/document-v1-api-reference.html). To feed large amounts of data quickly, use the [Java feeding API](../en/vespa-feed-client.html).

7. **Run a query and a document get request:**

   ```
   $ curl -s http://localhost:8080/search/?query=bad | python -m json.tool
   ```

   ```
   $ curl -s http://localhost:8080/document/v1/music/music/docid/2 | python -m json.tool
   ```

   View the results of [localhost:8080/search/?query=bad](http://localhost:8080/search/?query=bad) in a browser. See the [Query API](../en/query-api.html) for details.

8.
   **Clean up when done**

##### Next steps

- This application is fully functional and can be used in production, but you may want to [add more nodes](../en/operations-selfhosted/multinode-systems.html) for redundancy.
- To add your own Java components to a Vespa application, see [application development](../en/developer-guide.html).
- The [Vespa APIs](../en/api.html) are useful for understanding how to interface with Vespa.
- Browse the [sample applications](https://github.com/vespa-engine/sample-apps/tree/master).
- [Secure your Vespa installation](../en/operations-selfhosted/securing-your-vespa-installation.html).
- To run on AWS, see the [multi-node quick start for AWS EC2](../en/operations-selfhosted/multinode-systems.html#aws-ec2) or the [multi-node quick start for AWS ECS](../en/operations-selfhosted/multinode-systems.html#aws-ecs).

---

## Federation

### Federation

![Federation example](/assets/img/federation-simple.svg)

#### Federation

![Federation example](/assets/img/federation-simple.svg)

The Vespa Container allows multiple sources of data to be _federated_ to a common search service. The sources of data may be search clusters belonging to the same application, or external services backed by Vespa or any other kind of service. The container may be used as a pure _federation platform_ by setting up a system consisting solely of container nodes federating to external services.

This document gives a short intro to federation, explains how to create an application package doing federation, and shows what support is available for choosing the sources given a query, and for producing the final result given the query and some source-specific results.

_Federation_ allows users to access data from multiple sources of various kinds through one interface. This is useful to:

- enrich the results returned from an application with auxiliary data, like finding appropriate images to accompany news articles.
- provide more comprehensive results by finding data from alternative sources in the cases where the application has none, like back-filling web results.
- create applications whose main purpose is not to provide access to some data set but to provide users or frontend applications a single starting point to access many kinds of data from various sources. Examples are browse pages created dynamically for any topic by pulling together data from external sources.

The main tasks in creating a federation solution are:

1. creating connectors to the various sources
2. selecting the data sources which will receive a given query
3. rewriting the received request to an executable query returning the desired data from each source
4. creating the final result by selecting from, organizing and combining the returned data from each selected source

The container aids with these tasks by providing a way to organize a federated execution as a set of search chains which can be configured through the application package.

Read the [Container intro](jdisc/) and [Chained components](components/chained-components.html) before proceeding. Read about using [multiple schemas](schemas.html#multiple-schemas). Refer to the `com.yahoo.search.federation` [Javadoc](https://javadoc.io/doc/com.yahoo.vespa/container-search/latest/com/yahoo/search/federation/package-summary.html).

##### Configuring Providers

A _provider_ is a search chain that produces data (in the form of a Result) from a data source. The provider must contain a Searcher which connects to the data source and produces a Result from the returned data. A provider is configured in _services.xml_; you can add multiple searchers in the provider just like in other chains.
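For illustration, a provider declaration inside the container's `<search>` element could look like the following sketch. The provider id, searcher class and bundle name are made up, and attributes should be checked against the [services-search reference](reference/services-search.html):

```
<container id="default" version="1.0">
  <search>
    <!-- a provider chain whose searcher talks to an external data source -->
    <provider id="my-provider">
      <searcher id="com.example.ExampleProviderSearcher" bundle="example-bundle" />
    </provider>
  </search>
</container>
```

Source-specific chains can then be declared with nested `<source>` elements inside the provider, as described in the next section.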
Search chains that provide data from some content cluster in the same application are also _providers_. To explicitly configure a provider talking to internal content clusters, set the attribute type="local" on the provider. That will automatically add the searchers necessary to talk to internal content clusters to the search chain. For example, a local provider can be set up so that querying it will not lowercase / stem terms.

##### Configuring Sources

A single provider may be used to produce multiple kinds of results. To implement and present each kind of result, we can use _sources_. A _source_ is a search chain that provides a specific kind of result by extending or modifying the behavior of one or more providers.

Suppose that we want to retrieve two kinds of results from my-provider: web results and java API documentation. Adding a source for each inside the provider results in two _source search chains_ being created, `web@my-provider` and `java-api@my-provider`. Each of them constitutes a source, namely `web` and `java-api` respectively. As the example suggests, these search chains are named after the source and the enclosing provider. The @-sign in the name should be read as _in_, so `web@my-provider` should for example be read as _web in my-provider_.

The JavaApiSearcher is responsible for modifying the query so that we only get hits from the java API documentation. We added this searcher directly inside the source element; source search chains and providers are both instances of search chains. All the options for configuring regular search chains are therefore also available for them.

How do the `web@my-provider` and `java-api@my-provider` source search chains use the `my-provider` provider to send queries to the external service? Internally, the source search chains _inherit_ from the enclosing provider. Since the provider contains searchers that know how to talk to the external service, the sources will also contain the same searchers. As an example, consider the "web" search chain; it will contain exactly the same searcher instances as the `my-provider` search chain.

By organizing chains for talking to data providers, we can reuse the same connections and logic for talking to remote services ("providers") for multiple purposes ("sources"). The provider search chain `my-provider` is _not modified_ by adding sources. To verify this, try to send queries to the three search chains `my-provider`, `web@my-provider` and `java-api@my-provider`.

###### Multiple Providers per Source

You can create a source that consists of source search chains from several providers. Effectively, this lets you vary which provider should be used to satisfy each request to the source. In such a setup, two source search chains `common-search@news-search` and `common-search@my-provider` constitute a single source `common-search`. The source search chains using the `idref` attribute are called participants, while the ones using the `id` attribute are called leaders. Each source must consist of a single leader and zero or more participants.

By default, only the leader search chain is used when _federating_ to a source. To use one of the participants instead, use [sources](reference/query-api-reference.html#model.sources) and _source_:

```
http://[host]:[port]/?sources=common-search&source.common-search.provider=news-search
```

##### Federation

Now we can search both the web and the java API documentation at the same time, and get a combined result set back.
We achieve this by setting up a _federation_ searcher. Inside the Federation element of its configuration, we list the sources we want to use. Do not let the name _source_ fool you; if it behaves like a source, then you can use it as a source (i.e. all types of search chains including providers are accepted). As an example, try replacing the _web_ reference with _my-provider_.

When searching, select a subset of the sources specified in the federation element by specifying the [sources](reference/query-api-reference.html#model.sources) query parameter.

##### Built-in Federation

The built-in search chains _native_ and _vespa_ contain a federation searcher named _federation_. This searcher has been configured to federate to:

- All sources
- All providers that do not contain a source

If configuring your own federation searcher, you are not limited to a subset of these sources - you can use any provider, source or search chain.

##### Inheriting default Sources

To get the same sources as the built-in federation searcher, inherit the default source set in your federation searcher's configuration.

##### Changing content cluster chains

With the information above, we can create a configuration where we modify the search chain sending queries to, and receiving queries from, a single content cluster (for example, removing a searcher and adding another).

##### Timeout behavior

What if we want to limit how much time a provider is allowed to use to answer a query? By setting a timeout in the provider's federation options (for example 100 ms), the provider search chain will be limited to that time to execute each query. The Federation layer allows all providers to continue until the non-optional provider with the longest timeout is finished or canceled.

In some cases it is useful to be able to keep executing the request to a provider longer than we are willing to wait for it in that particular query. This allows us to populate caches inside sources which can only meet the timeout after caches are populated. To use this option, specify a [request timeout](reference/services-search.html#federationoptions) for the provider. Also see [Searcher timeouts](searcher-development.html#timeouts).

##### Non-essential Providers

Now let us add a provider that retrieves ads. Suppose that it is more important to return the result to the user as fast as possible than to retrieve ads. To signal this, we mark the ads provider as _optional_. The Federation searcher will then only wait for ads as long as it waits for mandatory providers. If the ads are available in time, they are used, otherwise they are dropped.

If only optional providers are selected for Federation, they will all be treated as mandatory. Otherwise, they would not get a chance to return any results.

##### Federation options inheritance

The sources automatically use the same federation options as the enclosing provider. _Override_ one or more of the federation options in the sources as needed. You can use a single source in different Federation searchers. If you send queries with different cost to the same source from different federation searchers, you might also want to _override_ the federation options for when they are used.

##### Selecting Search Chains programmatically

If we have complicated rules for when a search chain should be used, we can select search chains programmatically instead of setting up sources under federation in services.xml. The selection code is implemented as a [TargetSelector](https://javadoc.io/doc/com.yahoo.vespa/container-search/latest/com/yahoo/search/federation/selection/TargetSelector.html).
This TargetSelector is used by registering it on a federation searcher.

```
package com.yahoo.example;

import com.google.common.base.Preconditions;
import com.yahoo.component.chain.Chain;
import com.yahoo.processing.execution.chain.ChainRegistry;
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.result.Hit;
import com.yahoo.search.Searcher;
import com.yahoo.search.federation.selection.FederationTarget;
import com.yahoo.search.federation.selection.TargetSelector;
import com.yahoo.search.searchchain.model.federation.FederationOptions;

import java.util.Arrays;
import java.util.Collection;

class MyTargetSelector implements TargetSelector<Object> {

    @Override
    public Collection<FederationTarget<Object>> getTargets(Query query, ChainRegistry<Searcher> searcherChainRegistry) {
        Chain<Searcher> searchChain = searcherChainRegistry.getComponent("my-chain");
        Preconditions.checkNotNull(searchChain, "No search chain named 'my-chain' exists in services.xml");
        return Arrays.asList(new FederationTarget<>(searchChain, new FederationOptions(), null));
    }

    @Override
    public void modifyTargetQuery(FederationTarget<Object> target, Query query) {
        query.setHits(10);
    }

    @Override
    public void modifyTargetResult(FederationTarget<Object> target, Result result) {
        for (Hit hit : result.hits()) {
            hit.setField("my-field", "hello-world");
        }
    }
}
```

The target selector chooses search chains for the federation searcher. In this example, MyTargetSelector.getTargets returns a single chain named "my-chain" that has been set up in `services.xml`.

Before executing each search chain, the federation searcher allows the target selector to modify the query by calling modifyTargetQuery. In the example, the number of hits to retrieve is set to 10. After the search chain has been executed, the federation searcher allows the target selector to modify the result by calling modifyTargetResult. In the example, each hit gets a field called "my-field" with the value "hello-world".

Configure a federation searcher to use a target selector in `services.xml`; only a single target selector is supported. We can also set up both a target-selector and normal sources. The federation searcher will then send queries both to programmatically selected sources and to those that would normally be selected without the target selector.

##### Example: Setting up a Federated Service

A federation application is created by providing custom searcher components performing the basic federation tasks and combining these into chains in a federation setup in [services.xml](applications.html#services.xml). For example, a complete configuration can set up a cluster of container nodes (with a single node) which federates to another Vespa service (news) and to some web service.

This creates a configuration of search chains like:

![Federation example](/assets/img/federation.svg)

Each provider _is_ a search chain ending in a Searcher forwarding the query to a remote service. In addition, there is a main chain (included by default) ending in a FederationSearcher, which by default forwards the query to all the providers in parallel. The provider chains return their results upwards to the federation searcher, which merges them into a complete result that is returned up the main chain.

This services file, an implementation of the `example` classes (see below), and _[hosts.xml](reference/hosts.html)_ listing the container nodes, is all that is needed to set up and [deploy](applications.html#deploy) an application federating to multiple sources.
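The complete example configuration is not reproduced here, but a minimal sketch of such a setup could look like the following. The provider ids match the `news` and `webService` sources used later in this document, while the searcher class names, bundle name and node alias are illustrative only; check the [services-search reference](reference/services-search.html#chain) for the exact syntax:

```
<services version="1.0">
  <container id="default" version="1.0">
    <search>
      <!-- provider chain federating to another Vespa service -->
      <provider id="news">
        <searcher id="com.yahoo.example.NewsProviderSearcher" bundle="example-bundle" />
      </provider>
      <!-- provider chain federating to an external web service -->
      <provider id="webService">
        <searcher id="com.yahoo.example.WebProviderSearcher" bundle="example-bundle" />
      </provider>
    </search>
    <nodes>
      <node hostalias="node1" />
    </nodes>
  </container>
</services>
```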
For a reference to these XML sections, see the [chains reference](reference/services-search.html#chain). The following sections outline how this can be elaborated into a solution producing more user-friendly federated results.

###### Selecting Sources

To do the best possible job of bringing relevant data to the user, we should send every query to all sources and decide what data to include when all the results are available, when we have as much information as possible at hand. In general this is not advisable because of the resource cost involved, so we must select a subset based on information in the query. This is best viewed as a probabilistic optimization problem: the selected sources should be the ones having a high enough probability of being useful to offset the cost of querying them.

Any Searcher which is involved in selecting sources or processing the entire result should be added to the main search chain, which was created implicitly in the examples above. To do this, the main chain should be created explicitly in services.xml. An explicit main chain can add further searchers in addition to those inherited from the `native` chain, which includes the FederationSearcher. Note that if the full Vespa functionality is needed, the `vespa` chain should be inherited rather than `native`. The chain called `default` will be invoked if no searchChain parameter is given in the query. To learn more about creating Searcher components, see [searcher development](searcher-development.html).

###### Rewriting Queries to Individual Providers

The _provider_ searchers are responsible for accepting a Query object, translating it to a suitable request to the backend in question and deserializing the response into a Result object. There is often a need to modify the query to match the particulars of a provider before passing it on:

- To get results from the provider which match the determined interpretation and intent as well as possible, the query may need to be rewritten using detailed information about the provider
- Parameters beyond the basic ones supported by each provider searcher may need to be translated to the provider
- There may be a need for provider-specific business rules

These query changes may range in complexity from setting a query parameter, applying some source-specific information to the query, or transferring all the relevant query state into a new object representation which is consumed by the provider searcher. This example shows a searcher adding a customer id to the `news` request:

```
package com.yahoo.example;

import com.yahoo.search.searchchain.Execution;
import com.yahoo.search.*;

public class NewsCustomerIdSearcher extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        String customerId = "provider.news.custid";
        if (query.properties().get(customerId) == null)
            query.properties().set(customerId, "yahoo/test");
        if (query.getTraceLevel() >= 3)
            query.trace("News provider: Will use " + customerId + "=" + query.properties().get(customerId), false, 3);
        return execution.search(query);
    }
}
```

This searcher should be added to the `news` source chain as shown above.

You may have noticed that we have referred to the search chains talking to a service as a **provider** while referring to selection of **sources**. The reason for making this distinction is that it is sometimes useful to treat different kinds of processing of queries and results to/from the same service as different sources.
Hence, it is possible to create `source` search chains in addition to the provider chains in _services.xml_. Each such source will refer to a provider (by inheriting the provider chain) but include some searchers specific to that source. Selection and routing of the query from the federation searchers is always to sources, not providers. By default, if no source tags are added in the provider, each provider implicitly creates a source by the same name.

###### Processing Results

When we have selected the sources, created suitable queries for each source and executed those queries, we have produced a result which contains a HitGroup per source holding the list of hits from that source. These results may be returned in XML as is, preserving the structure as XML, by requesting the [page](reference/page-result-format.html) result format:

```
http://[host]:[port]/search/?query=test&presentation.format=page
```

However, this is not suitable for presenting to the user in most cases. What we want to do is select the subset of the hits having the highest probable utility to the user, organized in a way that maximizes the user experience. This is not an easy task, and we will not attempt to solve it here, other than noting that any solution should make use of both the information in the intent model and the information within the results from each source, and that this is a highly connected optimization problem because the utility of including some data in the result depends on what other data is included.

Here we will just use a searcher which shows how this is done in principle; it flattens the news and web service hit groups into a single list of hits, where only the highest ranked news hits are included:

```
package com.yahoo.example;

import com.yahoo.search.*;
import com.yahoo.search.result.*;
import com.yahoo.search.searchchain.Execution;

public class ResultBlender extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        Result result = execution.search(query);
        HitGroup news = (HitGroup) result.hits().remove("source:news");
        HitGroup webService = (HitGroup) result.hits().remove("source:webService");
        if (webService == null) return result;
        result.hits().addAll(webService.asList());
        if (news == null) return result;
        for (Hit hit : news.asList())
            if (shouldIncludeNewsHit(hit))
                result.hits().add(hit);
        return result;
    }

    private boolean shouldIncludeNewsHit(Hit hit) {
        if (hit.isMeta()) return true;
        if (hit.getRelevance().getScore() > 0.7) return true;
        return false;
    }
}
```

The optimal result to return to the user is not necessarily one flattened list. In some cases it may be better to keep the source organization, or to pick some other organization. The [page result format](reference/page-result-format.html) requested in the query above is able to represent any hierarchical organization as XML. A more realistic version of this searcher would use that to choose between some predefined layouts which the frontend in question knows how to handle, and choose some way of grouping the available hits suitable for the selected layout. This searcher should be added to the main (`default`) search chain in _services.xml_ together with the SourceSelector (the order does not matter).
###### Unit Testing the Result Processor

Unit test example for the Searcher above:

```
package com.yahoo.search.example.test;

import org.junit.Test;
import static org.junit.Assert.assertEquals;

import com.yahoo.component.chain.Chain;
import com.yahoo.search.searchchain.*;
import com.yahoo.search.example.ResultBlender;
import com.yahoo.search.*;
import com.yahoo.search.result.*;

public class ResultBlenderTestCase {

    @Test
    public void testBlending() {
        Chain<Searcher> chain = new Chain<>(new ResultBlender(), new MockBackend());
        Execution.Context context = Execution.Context.createContextStub(null);
        Result result = new Execution(chain, context).search(new Query("?query=test"));
        assertEquals(4, result.hits().size());
        assertEquals("webService:1", result.hits().get(0).getId().toString());
        assertEquals("news:1", result.hits().get(1).getId().toString());
        assertEquals("webService:2", result.hits().get(2).getId().toString());
        assertEquals("webService:3", result.hits().get(3).getId().toString());
    }

    private static class MockBackend extends Searcher {

        @Override
        public Result search(Query query, Execution execution) {
            Result result = new Result(query);
            HitGroup webService = new HitGroup("source:webService");
            webService.add(new Hit("webService:1", 0.9));
            webService.add(new Hit("webService:2", 0.7));
            webService.add(new Hit("webService:3", 0.5));
            result.hits().add(webService);
            HitGroup news = new HitGroup("source:news");
            news.add(new Hit("news:1", 0.8));
            news.add(new Hit("news:2", 0.6));
            news.add(new Hit("news:3", 0.4));
            result.hits().add(news);
            return result;
        }
    }
}
```

This shows how a search chain can be created programmatically, with a mock backend producing results suitable for exercising the functionality of the searcher being tested.

###### On this page:

- [Configuring Providers](#configuring-providers)
- [Configuring Sources](#configuring-sources)
- [Multiple Providers per Source](#multiple-providers-per-source)
- [Federation](#federation)
- [Built-in Federation](#built-in-federation)
- [Inheriting default Sources](#inheriting-default-sources)
- [Changing content cluster chains](#changing-content-cluster-chains)
- [Timeout behavior](#timeout-behavior)
- [Non-essential Providers](#non-essential-providers)
- [Federation options inheritance](#federation-options-inheritance)
- [Selecting Search Chains programmatically](#selecting-search-chains-programmatically)
- [Example: Setting up a Federated Service](#setting-up-a-federated-service)
- [Selecting Sources](#selecting-sources)
- [Rewriting Queries to Individual Providers](#rewriting-queries-to-individual-providers)
- [Processing Results](#processing-results)
- [Unit Testing the Result Processor](#unit-testing-the-result-processor)

---

## Feed Block

### Feed block

A content cluster blocks external write operations when at least one content node has reached the [resource limit](../reference/services-content.html#resource-limits) of disk or memory.

#### Feed block

A content cluster blocks external write operations when at least one content node has reached the [resource limit](../reference/services-content.html#resource-limits) of disk or memory. This is done to avoid saturating resource usage on content nodes. The _Cluster controller_ monitors the resource usage of the content nodes and decides whether to block feeding. Transient resource usage (see details in the metrics below) is not included in the monitored usage. This ensures that transient resource usage is covered by the resource headroom on the content nodes, instead of leading to feed being blocked by natural fluctuations.
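For self-managed deployments, these limits are configured per content cluster in services.xml (raising them is generally discouraged, as discussed below). A sketch with illustrative values; check the [resource-limits reference](../reference/services-content.html#resource-limits) for the exact element placement:

```
<content id="example" version="1.0">
  <tuning>
    <resource-limits>
      <disk>0.80</disk>     <!-- block feed when disk usage exceeds 80% -->
      <memory>0.75</memory> <!-- block feed when memory usage exceeds 75% -->
    </resource-limits>
  </tuning>
  <!-- redundancy, documents, nodes ... -->
</content>
```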
**Note:** When running Vespa in a Docker image on a laptop, one can easily get `[UNKNOWN(251009) @ tcp/vespa-host:19112/default]: ReturnCode(NO_SPACE, External feed is blocked due to resource exhaustion: in content cluster 'example': disk on node 0 [vespa-host] is 76.7% full (the configured limit is 75.0%, effective limit lowered to 74.0% until feed unblocked)`. Fix this by increasing allocated storage for the Docker daemon, cleaning up unused volumes, or removing unused Docker images. HTTP clients will see _507 Server Error: Insufficient Storage_ when this happens.

When feed is blocked, write operations are rejected by _Distributors_. All Put operations and most Update operations are rejected. These operations are still allowed:

- Remove operations
- Update [assign](../reference/document-json-format.html#assign) operations to numeric single-value fields

To remedy, add nodes to the content cluster. The data will [auto-redistribute](../elasticity.html), and feeding is unblocked when all content nodes are below the limits. For self-managed Vespa you can configure [resource-limits](../reference/services-content.html#resource-limits), although this is not recommended. Increasing them too much might lead to OOM and content nodes being unable to start.

**Important:** Always **add** nodes, do not change node capacity - this is in practice safer and quicker. As most Vespa applications are set up on homogeneous nodes, changing node capacity can cause a full node set swap and more data copying than just adding more nodes of the same kind. Copying data will in itself stress nodes; adding one node is normally the smallest and safest change.

These [metrics](metrics.html) are used to monitor resource usage and whether feeding is blocked:

| Metric | Description |
| --- | --- |
| cluster-controller.resource\_usage.nodes\_above\_limit | The number of content nodes that are above one or more resource limits. When above 0, feeding is blocked. |
| content.proton.resource\_usage.disk | A number between 0 and 1, indicating how much disk (of total available) is used on the content node. Transient disk used during [disk index fusion](../proton.html#disk-index-fusion) is not included. |
| content.proton.resource\_usage.memory | A number between 0 and 1, indicating how much memory (of total available) is used on the content node. Transient memory used by [memory indexes](../proton.html#memory-index-flush) is not included. |

When feeding is blocked, error messages are returned in write operation replies - example:

```
ReturnCode(NO_SPACE, External feed is blocked due to resource exhaustion: in content cluster 'example': memory on node 0 [my-vespa-node-0.example.com] is 82.0% full (the configured limit is 80.0%, effective limit lowered to 79.0% until feed unblocked))
```

Note that when feeding is blocked, resource usage needs to decrease below another, lower limit before getting unblocked. This is to avoid flip-flopping between blocking and unblocking feed when near the limit. This lower limit is 1% lower than the configured limit.

The address space used by data structures in attributes (_Multivalue Mapping_, _Enum Store_, and _Tensor Store_) can also become full and block feeding - see [attribute data structures](../attributes.html#data-structures) for details. This will rarely happen. The following metric is used to monitor address space usage: | content.proton.documentdb.attribute.resource\_usage.address\_space.max | A number between 0 and 1, indicating how much address space is used by the worst attribute data structure on the content node.
| An error is returned when the address space limit (default value is 0.90) is exceeded: ``` ReturnCode(NO_SPACE, External feed is blocked due to resource exhaustion: in content cluster 'example': attribute-address-space:example.ready.a1.enum-store on node 0 [my-vespa-node-0.example.com] is 91.0% full (the configured limit is 90.0%)) ``` To remedy, add nodes to the content cluster to distribute documents with attributes over more nodes. Copyright © 2025 - [Cookie Preferences](#) --- ## Files Processes And Ports ### Files, Processes, Ports, Environment This is a reference of directories used in a Vespa installation, processes that run on the Vespa nodes and ports / environment variables used. #### Files, Processes, Ports, Environment This is a reference of directories used in a Vespa installation, processes that run on the Vespa nodes and ports / environment variables used. Also see [log files](/en/reference/logs.html). ##### Directories | Directory | Description | | --- | --- | | $VESPA\_HOME/bin/ | Command line utilities and scripts | | $VESPA\_HOME/libexec/vespa/ | Command line utilities and scripts | | $VESPA\_HOME/sbin/ | Server programs, daemons, etc | | $VESPA\_HOME/lib64/ | Dynamically linked libraries, typically third-party libraries | | $VESPA\_HOME/lib/jars/ | Java archives | | $VESPA\_HOME/logs/vespa/ | Log files | | $VESPA\_HOME/var/db/vespa/config\_server/serverdb/ | Config server database and user applications | | $VESPA\_HOME/share/vespa/ | A directory with config definitions and XML schemas for application package validation | | $VESPA\_HOME/conf/vespa | Various config files used by Vespa or libraries Vespa depend on | ##### Processes and ports The following is an overview of which ports and port ranges are used by the different services in a Vespa system. Note that for services capable of running multiple instances on the same node, all instances will run within the listed port range. Processes are run as user `vespa`. Many services are allocated ports dynamically. So even though the allocation is deterministic, i.e. the same system will get the same ports on subsequent startups, a particular service instance may get different ports when the overall system setup is changed through [services.xml](/en/reference/services.html). Use [vespa-model-inspect](/en/operations-selfhosted/vespa-cmdline-tools.html#vespa-model-inspect) to see port allocations. - The number of ports used in a range depends on number of instances that are running - Not all ports within a range are used, but they are assigned each service to support future extensions - The range from 19100 is used for internal communication ports, i.e. ports that are not necessary to use from an external API - See [Configuring Http Servers and Filters](../jdisc/http-server-and-filters.html) for how to configure Container ports and [services.xml](/en/reference/services.html) for how to configure other ports | Process | Host | Port/range | ps | Function | | --- | --- | --- | --- | --- | | [Config server](/en/operations-selfhosted/configuration-server.html) | Config server nodes | 19070-19071 | java (...) 
-jar $VESPA\_HOME/lib/jars/standalone-container-jar-with-dependencies.jar | Vespa Configuration server | | 2181-2183 | | Embedded Zookeeper cluster ports, see [zookeeper-server.def](https://github.com/vespa-engine/vespa/blob/master/configdefinitions/src/vespa/zookeeper-server.def) | | [Config sentinel](/en/operations-selfhosted/config-sentinel.html) | All nodes | 19098 | $VESPA\_HOME/sbin/vespa-config-sentinel | Sentinel that starts and stops vespa services and makes sure they are running unless they are manually stopped | | [Config proxy](/en/operations-selfhosted/config-proxy.html) | All nodes | 19090 | java (…) com.yahoo.vespa.config.proxy.ProxyServer | Communication liaison between Vespa processes and config server. Caches config in memory | | [Slobrok](/en/operations-selfhosted/slobrok.html) | Admin nodes | 19099 for RPC port, HTTP port dynamically allocated in the 19100-19899 range | $VESPA\_HOME/sbin/vespa-slobrok | Service location object broker | | [logd](/en/reference/logs.html#logd) | All nodes | 19089 | $VESPA\_HOME/sbin/vespa-logd | Reads local log files and sends them to log server | | [Log server](/en/reference/logs.html#log-server) | Log server node | 19080 | java (...) -jar lib/jars/logserver-jar-with-dependencies.jar | Vespa Log server | | [Metrics proxy](/en/operations-selfhosted/monitoring.html#metrics-proxy) | All nodes | 19092-19095 | java (...) -jar $VESPA\_HOME/lib/jars/container-disc-with-dependencies.jar | Provides a single access point for metrics from all services on a Vespa node | | [Distributor](/en/content/content-nodes.html#distributor) | Content cluster | dynamically allocated in the 19100-19899 range | $VESPA\_HOME/sbin/vespa-distributord-bin | Content layer distributor processes | | [Cluster controller](/en/content/content-nodes.html#cluster-controller) | Content cluster | 19050, plus ports dynamically allocated in the 19100-19899 range | java (...) -jar $VESPA\_HOME/lib/jars/container-disc-jar-with-dependencies.jar | Cluster controller processes, manages state for content nodes | | [proton](/en/proton.html) | Content cluster | dynamically allocated in the 19100-19899 range | $VESPA\_HOME/sbin/vespa-proton-bin | Searchnode process, receives queries from the container and returns results from the indexes. Also receives feed and indexes documents | | [container](/en/jdisc/index.html) | Container cluster | 8080 | java (...) -jar $VESPA\_HOME/lib/jars/container-disc-with-dependencies.jar | Container running servers, handlers and processing components | ##### System limits The [startup scripts](/en/operations-selfhosted/admin-procedures.html#vespa-start-stop-restart) checks that system limits are set, failing startup if not. Refer to [vespa-configserver.service](https://github.com/vespa-engine/vespa/blob/master/vespabase/src/vespa-configserver.service.in) and [vespa.service](https://github.com/vespa-engine/vespa/blob/master/vespabase/src/vespa.service.in) for minimum values. ##### Core dumps Example settings: ``` $ mkdir -p /tmp/cores && chmod a+rwx /tmp/cores $ echo "/tmp/cores/core.%e.%p.%h.%t" > /proc/sys/kernel/core_pattern ``` This will write files like _/tmp/cores/core.vespa-proton-bi.1721.localhost.1580387387_. ##### Environment variables Vespa configuration is set in [application packages](/en/application-packages.html). Some configuration is used to bootstrap nodes - this is set in environment variables. Environment variables are only read at startup. 
_$VESPA\_HOME/conf/vespa/default-env.txt_ is read in Vespa start scripts - use this to modify variables ([example](/en/operations-selfhosted/multinode-systems.html#aws-ec2)). Each line has the format `action variablename value` where the items are: | Item | Description | | --- | --- | | action | One of `fallback`, `override`, or `unset`. `fallback` sets the variable if it is unset (or empty). `override` set the value regardless. `unset` unsets the variable. | | variablename | The name of the variable, e.g. `VESPA_CONFIGSERVERS` | | value | The rest of the line is the variable's value. | Refer to the [template](https://github.com/vespa-engine/vespa/blob/master/vespabase/conf/default-env.txt.in) for format. | Environment variable | Description | | --- | --- | | VESPA\_CONFIGSERVERS | A comma-separated list of hosts to run configservers, use fully qualified hostnames. Should always be set to the same value on all hosts in a multi-host setup. If not set, `localhost` is assumed. Refer to [configuration server operations](/en/operations-selfhosted/configuration-server.html). | | VESPA\_HOSTNAME | Vespa uses `hostname` for node identity. But sometimes this doesn't work properly, either because that name can't be used to find an IP address which works for connecting to services running on the node, or it's just that the name doesn't agree with what the config server thinks the node's host name is. In this case, override by setting the `VESPA_HOSTNAME`, to be used instead of running the `hostname` command. Note that `VESPA_HOSTNAME` will be used _both_ when a node identifies itself to the config server _and_ when a service on that node registers a network connection point that other services can connect to. An error message with "hostname detection failed" is emitted if the `VESPA_HOSTNAME` isn't set and the hostname isn't usable. If `VESPA_HOSTNAME` is set to something that cannot work, an error with "hostname validation failed" is emitted instead. | | VESPA\_CONFIG\_SOURCES | Used by libraries like the [Document API](/en/document-api-guide.html) to set config server endpoints. Refer to [configuration server operations](/en/operations-selfhosted/configuration-server.html#configuration) for example use. | | VESPA\_WEB\_SERVICE\_PORT | The port number where REST apis will run, default `8080`. This isn't strictly needed, as the port number can be set for each HTTP server in `services.xml`, but with a big application it can be easier to set the default port number just once. Also note that this needs to be set when starting the _configserver_, since the REST api implementation gets its port number from there. | | VESPA\_TLS\_CONFIG\_FILE | Absolute path to [TLS configuration file](/en/operations-selfhosted/mtls.html). | | VESPA\_CONFIGSERVER\_JVMARGS | JVM arguments for the config server - see [tuning](/en/performance/container-tuning.html#config-server-and-config-proxy). | | VESPA\_CONFIGPROXY\_JVMARGS | JVM arguments for the config proxy - see [tuning](/en/performance/container-tuning.html#config-server-and-config-proxy). | | VESPA\_LOG\_LEVEL | Tuning of log output from tools, see [controlling log levels](/en/reference/logs.html#controlling-log-levels). 
| --- ## Geo Search ### Geo Search To model a geographical position in documents, use a field where the type is [position](reference/schema-reference.html#position) for a single, required position. #### Geo Search To model a geographical position in documents, use a field where the type is [position](reference/schema-reference.html#position) for a single, required position. To allow any number of positions (including none at all) use `array` instead. This can be used to limit hits (only those documents with a position inside a circular area will be hits), the distance from a point can be used as input to ranking functions, or both. A geographical point in Vespa is specified using the geographical [latitude](https://en.wikipedia.org/wiki/Latitude) and [longitude](https://en.wikipedia.org/wiki/Longitude). As an example, a location in [Sunnyvale, California](https://www.google.com/maps/place/721+1st+Ave,+Sunnyvale,+CA+94089/@37.4181488,-122.0256157,12z) could be latitude 37.4181488 degrees North, longitude 122.0256157 degrees West. This would be represented as `{ "lat": 37.4181488, "lng": -122.0256157 }` in JSON. As seen above, positive numbers are used for north (latitudes) and east (longitudes); negative numbers are used for south and west. This is the usual convention. **Note:** Old formats for position (those used in Vespa 5, 6, and 7) are still accepted as feed input; enabling legacy output is also temporarily possible. See [legacy flag v7-geo-positions](reference/default-result-format.html#geo-position-rendering). ##### Sample schema and document A sample schema could be a business directory, where every business has a position (for its main office or contact point): ``` schema biz { document biz { field title type string { indexing: index } field mainloc type position { indexing: attribute | summary } } fieldset default { fields: title } } ``` Using this schema, here is one possible business entry with its location: ``` { "put": "id:mynamespace:biz::business-1", "fields": { "title": "Yahoo Inc (main office)", "mainloc": { "lat": 37.4181488, "lng": -122.0256157 } } } ``` ##### Restrict The API for adding a geographical restriction is to use a [geoLocation](reference/query-language-reference.html#geolocation) clause in the YQL statement, specifying a point and a maximum distance from that point: ``` $ curl -H "Content-Type: application/json" \ --data '{"yql" : "select * from sources * where title contains \"office\" and geoLocation(mainloc, 37.416383, -122.024683, \"20 miles\")"}' \ http://localhost:8080/search/ ``` One can also build or modify the query programmatically by adding a [GeoLocationItem](https://javadoc.io/doc/com.yahoo.vespa/container-search/latest/com/yahoo/prelude/query/GeoLocationItem.html) anywhere in the query tree. To use a position for ranking only (without _any_ requirement for a matching position), specify it as a ranking-only term. Use the [rank()](reference/query-language-reference.html#rank) operation in YQL for this, or a [RankItem](https://javadoc.io/doc/com.yahoo.vespa/container-search/latest/com/yahoo/prelude/query/RankItem.html) when building the query programmatically. At the _same time_, specify a negative radius (for example `-1 m`). This matches any position, and computes distance etc. 
for the closest position in the document. Example: ``` $ curl -H "Content-Type: application/json" \ --data '{"yql" : "select * from sources * where rank(title contains \"office\", geoLocation(mainloc, 37.416383, -122.024683, \"-1 m\"))"}' \ http://localhost:8080/search/ ``` ##### Ranking from a position match The main rank feature to use for the example above would be [distance(mainloc).km](reference/rank-features.html#distance(name).km), doing further calculation on it to give a better rank to documents that are closer to the wanted (query) position. Here one needs to take into consideration what sort of distances are practical; traveling on foot, by car, or by plane should have quite different ranking scales - using different rank profiles would be one natural way to support that. If the query specifies a maximum distance, that could be sent as an input to ranking as well, and used for scaling. There is also a [closeness(mainloc)](reference/rank-features.html#closeness(name)) which goes from 1.0 at the exact location to 0.0 at a tunable maximum distance, which is enough for many needs. ###### Useful summary-features To do further processing, it may be useful to get the computed distance back. The preferred way to do this is to use the associated rank features as [summary-features](reference/schema-reference.html#summary-features). In particular, [distance(_fieldname_).km](reference/rank-features.html#distance(name).km) gives the geographical distance in kilometers, while [distance(_fieldname_).latitude](reference/rank-features.html#distance(name).latitude) and [distance(_fieldname_).longitude](reference/rank-features.html#distance(name).longitude) give the geographical coordinates for the best location directly, in degrees. These are easy to use programmatically from a searcher, accessing [feature values in results](ranking-expressions-features.html#accessing-feature-function-values-in-results) for further processing. **Note:** `geoLocation` doesn't do proper great-circle-distance calculations. It works well for 'local' search (city or metro area), using simpler distance calculations. For positions which are very distant or close to the international date line (e.g. the Bering sea), the computed results may be inaccurate. ##### Using multiple position fields For some applications, it can be useful to have several position attributes that may be searched. For example, we could expand the above examples with the locations of subsidiary offices: ``` schema biz { document biz { field title type string { indexing: index } field mainloc type position { indexing: attribute | summary } field otherlocs type array<position> { indexing: attribute } } fieldset default { fields: title } } ``` Expanding the example business with an office in Australia and one in Norway could look like: ``` { "put": "id:mynamespace:biz::business-1", "fields": { "title": "Yahoo Inc (some offices)", "mainloc": { "lat": 37.4, "lng": -122.0 }, "otherlocs": [ { "lat": -33.9, "lng": 151.2 }, { "lat": 63.4, "lng": 10.4 } ] } } ``` A single query item can only search in one of the position attributes. For a search that spans several fields, use YQL to combine several `geoLocation` items inside an `or` clause, or combine several fields into a combined array field (so in the above example, one could duplicate the "mainloc" position into the "otherlocs" array as well, possibly changing the name from "otherlocs" to "all\_locs"). 
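To build such a restriction programmatically instead of via YQL, a custom [Searcher](searcher-development.html) can combine several `GeoLocationItem`s under an `OrItem`. The following is a minimal, illustrative sketch only - it assumes a `Location` configured as a geo circle and the two position fields from the schema above; verify the exact constructors against the `GeoLocationItem` javadoc linked earlier before relying on it:

```
import com.yahoo.prelude.Location;
import com.yahoo.prelude.query.AndItem;
import com.yahoo.prelude.query.GeoLocationItem;
import com.yahoo.prelude.query.Item;
import com.yahoo.prelude.query.OrItem;
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.searchchain.Execution;

/**
 * Illustrative sketch: restrict hits to documents with a position in either
 * "mainloc" or "otherlocs" near a given point.
 */
public class GeoRestrictSearcher extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        // A circle around the Sunnyvale point used above; radius is in degrees
        // (roughly 0.18 degrees for ~20 km) - an assumption for this sketch.
        Location circle = new Location();
        circle.setGeoCircle(37.416383, -122.024683, 0.18);

        OrItem anyLocation = new OrItem();
        anyLocation.addItem(new GeoLocationItem(circle, "mainloc"));
        anyLocation.addItem(new GeoLocationItem(circle, "otherlocs"));

        // AND the location restriction with whatever query tree the request produced
        Item oldRoot = query.getModel().getQueryTree().getRoot();
        AndItem newRoot = new AndItem();
        newRoot.addItem(oldRoot);
        newRoot.addItem(anyLocation);
        query.getModel().getQueryTree().setRoot(newRoot);

        return execution.search(query);
    }
}
```

This mirrors the YQL `or` combination described above, just expressed as query-tree items in a container component.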
##### Example with airport positions To give some more example positions, here is a list of some airports with their locations in JSON format: | Airport code | City | Location | | --- | --- | --- | | SFO | San Francisco, USA | { "lat": 37.618806, "lng": -122.375416 } | | LAX | Los Angeles, USA | { "lat": 33.942496, "lng": -118.408048 } | | JFK | New York, USA | { "lat": 40.639928, "lng": -73.778692 } | | LHR | London, UK | { "lat": 51.477500, "lng": -0.461388 } | | SYD | Sydney, Australia | { "lat": -33.946110, "lng": 151.177222 } | | TRD | Trondheim, Norway | { "lat": 63.457556, "lng": 10.924250 } | | OSL | Oslo, Norway | { "lat": 60.193917, "lng": 11.100361 } | | GRU | São Paulo, Brazil | { "lat": -23.435555, "lng": -46.473055 } | | GIG | Rio de Janeiro, Brazil | { "lat": -22.809999, "lng": -43.250555 } | | BLR | Bangalore, India | { "lat": 13.198867, "lng": 77.705472 } | | FCO | Rome, Italy | { "lat": 41.804475, "lng": 12.250797 } | | NRT | Tokyo, Japan | { "lat": 35.765278, "lng": 140.385556 } | | PEK | Beijing, China | { "lat": 40.073, "lng": 116.598 } | | CPT | Cape Town, South Africa | { "lat": -33.971368, "lng": 18.604292 } | | ACC | Accra, Ghana | { "lat": 5.605186, "lng": -0.166785 } | | TBU | Nuku'alofa, Tonga | { "lat": -21.237999, "lng": -175.137166 } | ##### Distance to path This example provides an overview of the [DistanceToPath](reference/rank-features.html#distanceToPath(name).distance) rank feature. This feature matches _document locations_ to a path given in the query. Not only does this feature return the closest distance for each document to the path, it also includes the length traveled _along_ the path before reaching the closest point, or _intersection_. This feature has been nick named the _gas_ feature because of its obvious use case of finding gas stations along a planned trip. In this example we have been traveling from the US to Bangalore, and we are now planning our trip back. We have decided to rent a car in Bangalore that we are to return upon arrival at the airport in Chennai. We are already quite hungry and wish to stop for a meal once we are outside of town. To avoid having to pay an additional fueling premium, we also wish to refuel just before reaching the airport. We need to figure out what roads to take, what restaurants are available outside of Bangalore, and what fuel stations are available once we get close to Chennai. Here we have plotted our trip from Bangalore to the airport: ![Trip from Bangalore to the airport](/assets/img/geo/path1.png) If we search for restaurants along the path, we only see a small subset of all restaurants present in the window of our quite large map. Here you see how the most relevant results are actually all in Bangalore or Chennai: ![Most relevant results](/assets/img/geo/path2.png) To find the best results, move the map window to just about where we expect to be eating, and redo the search: ![redo search with adjusted map](/assets/img/geo/path3.png) This has to be done similarly for finding a gas station near the airport. This illustrates searching for restaurants in a smaller window along the planned trip without _DistanceToPath_. Next, we outline how _DistanceToPath_ can be used to quickly and easily improve this type of planning to be more convenient for the user. The nature of this feature requires that the search corpus contains documents with position data. 
A [searcher component](searcher-development.html) needs to be written that is able to pass paths with the queries that lie in the same coordinate space as the searchable documents. Finally, a [rank-profile](ranking.html) needs to be defined that scores documents according to how they match some target distance traveled and at the same time lie close "enough" to the path. ###### Query Syntax This document does not describe how to write a searcher plugin for the Container; refer to the [container documentation](searcher-development.html). However, let us review the syntax expected by _DistanceToPath_. As noted in the [rank features reference](reference/rank-features.html#distanceToPath(name).distance), the path is supplied as a query parameter by the name of the feature and the `path` keyword: ``` yql=(…)&rankproperty.distanceToPath(_name_).path=(x1,y1,x2,y2,…,xN,yN) ``` Here `name` has to match the name of the position attribute that holds the position data. The path itself is parsed as a list of `N` coordinate pairs that together form `N-1` line segments: $$(x\_1,y\_1) \rightarrow (x\_2,y\_2), (x\_2,y\_2) \rightarrow (x\_3,y\_3), \ldots, (x\_{N-1},y\_{N-1}) \rightarrow (x\_N,y\_N)$$ **Note:** The path is _not_ in a readable (latitude, longitude) format, but consists of integer pairs in the internal format (degrees multiplied by 1 million). If a transform is required from geographic coordinates to this, the search plugin must do it; note that the first number in each pair (the 'x') is longitude (degrees East or West) while the second ('y') is latitude (degrees North or South), corresponding to the usual orientation for maps - _opposite_ to the usual order of latitude/longitude. ###### Rank profile If we were to disregard our scenario for a few moments, we could suggest the following rank profile: ``` rank-profile default { first-phase { expression: nativeRank } second-phase { expression: firstPhase * if (distanceToPath(ll).distance < 10000, 1, 0) } } ``` This profile will first rank all documents according to Vespa's _nativeRank_ feature, and then do a second pass over the top 100 results and order these based on their distance to our path. If a document lies within 10000 internal units (roughly one kilometre) of our path it retains its relevancy, otherwise its relevancy is set to 0. Such a rank profile would indeed solve the current problem, but Vespa's ranking model allows us to take this a lot further. The following is a rank profile that ranks documents according to a query-specified target distance to path and distance traveled: ``` rank-profile default { first-phase { expression { max(0, query(distance) - distanceToPath(ll).distance) * (1 - fabs(query(traveled) - distanceToPath(ll).traveled)) } } } ``` The expression is twofold: the first component determines a rank based on the document's distance to the given path as compared to the [query parameter](reference/ranking-expressions.html) `distance`. If the allowed distance is exceeded, this component's contribution is 0. The distance contribution is then multiplied by the difference of the actual distance traveled as compared to the query parameter `traveled`. In short, this profile will include all documents that lie close enough to the path, ranked according to their actual distance and traveled measure. **Note:** _DistanceToPath_ is only compatible with _2D coordinates_ because pathing in 1 dimension makes no sense. 
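As a sketch of what such a searcher plugin might look like, the hypothetical `PathSearcher` below converts (latitude, longitude) waypoints into the internal integer format described above and passes them as the `distanceToPath(ll).path` rank property (field name `ll` as in the profiles above). It is a minimal illustration under stated assumptions, not the documented implementation - in particular, where the waypoints come from in the request is up to the application:

```
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.searchchain.Execution;

/** Illustrative sketch: pass a path to the distanceToPath rank feature. */
public class PathSearcher extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        // Waypoints as (latitude, longitude) pairs - hypothetical values; a real
        // application would take these from the incoming request.
        double[][] waypoints = { {12.97, 77.59}, {13.07, 77.89}, {12.99, 80.17} };

        StringBuilder path = new StringBuilder("(");
        for (int i = 0; i < waypoints.length; i++) {
            long x = Math.round(waypoints[i][1] * 1_000_000); // x is longitude, degrees * 1e6
            long y = Math.round(waypoints[i][0] * 1_000_000); // y is latitude,  degrees * 1e6
            if (i > 0) path.append(",");
            path.append(x).append(",").append(y);
        }
        path.append(")");

        // Corresponds to the rankproperty.distanceToPath(ll).path query parameter
        query.getRanking().getProperties().put("distanceToPath(ll).path", path.toString());
        return execution.search(query);
    }
}
```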
###### Results For the sake of this example, assume that we have implemented a custom path searcher that is able to pass the path found by the user's initial directions query to Vespa's [query syntax](#query-syntax). There are then two more parameters that must be supplied by the user: `distance` and `traveled`. Vespa expects these parameters to be supplied in a scale compatible with the feature's output, so they should probably also be mapped by the container plugin. The feature's _distance_ output is given in Vespa's internal resolution, which is approximately 10 units per meter. The _traveled_ output is a normalized number between 0 and 1, where 0 represents the beginning of the path, and 1 is the end of the path. This illustrates how these parameters can be used to return the most appropriate hits for our scenario. Note that the figures only show the top hit for each query: ![Top hit 1](/assets/img/geo/path4.png) ![Top hit 2](/assets/img/geo/path5.png) 1. Searching for restaurants with the DistanceToPath feature. `distance = 1000, traveled = 0.1` 2. Searching for gas stations with the DistanceToPath feature. `distance = 1000, traveled = 0.9` --- ## Getting Started Ranking ### Getting started with ranking Learn how [ranking](ranking.html) works in Vespa by using the open [query API](query-api.html) of [vespa-documentation-search](https://github.com/vespa-cloud/vespa-documentation-search). #### Getting started with ranking Learn how [ranking](ranking.html) works in Vespa by using the open [query API](query-api.html) of [vespa-documentation-search](https://github.com/vespa-cloud/vespa-documentation-search). In this article, find a set of queries invoking different `rank-profiles`, which is the ranking definition. Ranking is the user-defined computation that scores documents for a query, here configured in [doc.sd](https://github.com/vespa-cloud/vespa-documentation-search/blob/main/src/main/application/schemas/doc.sd), also see [schema documentation](schemas.html). This schema has a set of (contrived) ranking functions, to help learn Vespa ranking. ##### Ranking using document features only Let's start with something simple: _Irrespective of the query, score each document by the number of in-links to it_. That is, for any query, return the documents with the most in-links first in the result set (these queries are clickable!): yql=select \* from doc where true&ranking=inlinks The score, named `relevance` in query results, is the size of the `inlinks` attribute array in the document, as configured in the `expression`: ``` rank-profile inlinks { first-phase { expression: attribute(inlinks).count } summary-features { attribute(inlinks).count } } ``` Count the number of entries in `inlinks` in the result and compare with `relevance` - it will be the same. Observe that the ranking expression does not use any features from the query; it only uses `attribute(inlinks).count`, which is a [document feature](reference/rank-features.html#document-features). 
##### Observing values used in ranking When developing ranking expressions, it is useful to observe the input values. Output the input values using [summary-features](reference/schema-reference.html#summary-features). In this experiment, we will use another rank function, still counting in-links but scoring older documents lower: $$ num\_inlinks \* {decay\_const}^{doc\_age\_seconds/3600} $$ Notes: - use of the `now` [ranking feature](reference/rank-features.html) - use of `pow`, a mathematical function in [ranking expressions](reference/ranking-expressions.html) - use of constants and functions to write better code yql=select \* from doc where true&ranking=inlinks\_age ``` rank-profile inlinks_age { first-phase { expression: rank_score } summary-features { attribute(inlinks).count attribute(last_updated) now doc_age_seconds age_decay num_inlinks rank_score } constants { decay_const: 0.9 } function doc_age_seconds() { expression: now - attribute(last_updated) } function age_decay() { expression: pow(decay_const, doc_age_seconds/3600) } function num_inlinks() { expression: attribute(inlinks).count } function rank_score() { expression: num_inlinks * age_decay } } ``` In the query results, we observe that a document with 27 in-links, 9703 seconds old, gets a relevance of 20.32 (the age of documents will vary with query time): ``` "relevance": 20.325190122213748, ... "summaryfeatures": { "attribute(inlinks).count": 27.0, "attribute(last_updated)": 1.615971522E9, "now": 1.615981225E9, "rankingExpression(age_decay)": 0.7527848193412499, "rankingExpression(doc_age_seconds)": 9703.0, "rankingExpression(num_inlinks)": 27.0, "rankingExpression(rank_score)": 20.325190122213748, } ``` Using `summary-features` makes it easy to validate and develop the ranking expression. ##### Ranking with query features Let's assume we want to find similar documents, and we define document similarity as having the same number of words. From most perspectives, this is a poor similarity function; better functions are described later. The documents have a `term_count` field - so let's add an [input.query()](reference/query-api-reference.html#ranking.features) for term count: yql=select \* from doc where true;&ranking=term\_count\_similarity&input.query(q\_term\_count)=1000 $$ 1 - \frac{fabs(attribute(term\_count) - query(q\_term\_count))}{1 + attribute(term\_count) + query(q\_term\_count)} $$ ``` rank-profile term_count_similarity { first-phase { expression { 1 - fabs( attribute(term_count) - query(q_term_count) ) / (1 + attribute(term_count) + query(q_term_count) ) } } summary-features { attribute(term_count) query(q_term_count) } } ``` This rank function will score documents in the range [0, 1>, where closer to 1 means more similar: ``` "relevance": 0.9985029940119761, ... "summaryfeatures": { "attribute(term_count)": 1003.0, "query(q_term_count)": 1000.0, } ``` The key learning here is how to transfer ranking features in the query, using `input.query()`. Use different names for more query features. ##### Ranking with a query tensor Another similarity function can be overlap in in-links. We will map the inlinks [weightedset](reference/schema-reference.html#weightedset) into a [tensor](reference/schema-reference.html#tensor), query with a tensor of the same type and create a scalar using a tensor product as the rank score. 
We use a [mapped](reference/tensor.html#general-literal-form) query tensor, where the document name is the address in the tensor, using a value of 1 for each in-link: ``` { {links:/en/query-profiles.html}:1, {links:/en/page-templates.html}:1, {links:/en/overview.html}:1 } ``` **Important:** Vespa cannot know the query tensor type from looking at it - it must be configured using [inputs](reference/schema-reference.html#inputs). As the in-link data is represented in a weightedset, we use the [tensorFromWeightedSet](reference/rank-features.html#document-features) rank feature to transform it into a tensor named _links_: ``` rank-profile inlink_similarity { inputs { query(links) tensor(links{}) } first-phase { expression: sum(tensorFromWeightedSet(attribute(inlinks), links) * query(links)) } summary-features { query(links) tensorFromWeightedSet(attribute(inlinks), links) } } ``` yql=select \* from doc where true&ranking=inlink\_similarity&input.query(links)={ {links:/en/query-profiles.html}:1, {links:/en/page-templates.html}:1, {links:/en/overview.html}:1 } Inspect relevance and summary-features: ``` "relevance": 2.0 ... "summaryfeatures": { "query(links)": { "type": "tensor(links{})", "cells": [ { "address": { "links": "/en/query-profiles.html" }, "value": 1 }, { "address": { "links": "/en/page-templates.html" }, "value": 1 }, { "address": { "links": "/en/overview.html" }, "value": 1 } ] }, "tensorFromWeightedSet(attribute(inlinks),links)": { "type": "tensor(links{})", "cells": [ { "address": { "links": "/en/page-templates.html" }, "value": 1 }, { "address": { "links": "/en/jdisc/container-components.html" }, "value": 1 }, { "address": { "links": "/en/query-profiles.html" }, "value": 1 } ] } } ``` Here, the tensors have one dimension, so they are vectors - the sum of the tensor product is hence the dot product. As all values are 1, all products are 1 and the sum is 2: | document | query | value | | --- | --- | --- | | /en/jdisc/container-components.html | | 0 | | | /en/overview.html | 0 | | /en/page-templates.html | /en/page-templates.html | 1 | | /en/query-profiles.html | /en/query-profiles.html | 1 | Change values in the query tensor to see the difference in rank score, setting different weights for links. Summary: The problem of comparing two lists of links is transformed into a numerical problem of multiplying two occurrence vectors, summing co-occurrences and ranking by this sum: ``` sum(tensorFromWeightedSet(attribute(inlinks), links) * query(links)) ``` Notes: - Query tensors can grow large. Applications will normally create the tensor in code using a [Searcher](searcher-development.html), also see [example](ranking-expressions-features.html#query-feature-types). - Here the document tensor is created from a weighted set - a better way would be to store this in a tensor field in the document to avoid the transformation. ##### Retrieval and ranking So far in this guide, we have run the ranking function over _all_ documents. This is a valid use case for many applications. However, ranking documents is generally CPU-expensive; optimizing by reducing the candidate set will increase performance. Example query using text matching, dumping [calculated rank features](reference/query-api-reference.html#ranking.listfeatures): yql=select \* from doc where title contains "document"&ranking.listFeatures See the **long** list of rank features calculated per result. However, the query filters on documents with "document" in the title, so the features are only calculated for the small set of matching documents. 
Running a filter like this is _document retrieval_. Another good example is web search - the user query terms are used to _retrieve_ the candidate set cheaply (from billions of documents), then one or more _ranking functions_ are applied to the much smaller candidate set to generate the ranked top-ten. Another way to look at it is: - In the retrieval (recall) phase, _find all relevant documents_ - In the ranking phase, _show only the most relevant documents_. Still, the candidate set after retrieval can be big - a query can hit all documents. Ranking all candidates is not possible in many applications. Splitting the ranking into two phases is another optimization - use an inexpensive ranking expression to sort out the least promising candidates before spending most resources on the highest ranking candidates. In short, use increasingly more power per document as the candidate set shrinks: ![Retrieval and ranking](/assets/img/retrieval-ranking.svg) Let's try the same query again, with a two-phase rank-profile that also does an explicit rank score cutoff: yql=select \* from doc where title contains "attribute"&ranking=inlinks\_twophase ``` rank-profile inlinks_twophase inherits inlinks_age { first-phase { keep-rank-count : 50 rank-score-drop-limit : 10 expression : num_inlinks } second-phase { expression : rank_score } } ``` Note how using rank-profile `inherits` is a smart way to define functions once, then use them in multiple rank-profiles. Read more about [schema inheritance](schemas.html#schema-inheritance). Here, `num_inlinks` and `rank_score` are defined in a rank profile we used earlier: ``` function num_inlinks() { expression: attribute(inlinks).count } ``` In the results, observe that no document has a _rankingExpression(num\_inlinks)_ less than or equal to 10.0, meaning all such documents were purged in the first ranking phase due to the `rank-score-drop-limit`. Normally, the `rank-score-drop-limit` is not used, as the `keep-rank-count` is most important. Read more in the [reference](reference/schema-reference.html#rank-score-drop-limit). For a dynamic limit, pass a ranking feature like `query(threshold)` and use an `if` statement to check if the score is above the threshold or not - if below, assign -1 (something lower than the `rank-score-drop-limit`) and have it dropped. Read more in [ranking expressions](ranking-expressions-features.html#the-if-function-and-string-equality-tests). Two-phased ranking is a performance optimization - this guide is about functionality, so the rest of the examples will only be using one ranking phase. Read more in [first-phase](reference/schema-reference.html#firstphase-rank). ##### Retrieval: AND, OR, weakAnd This guide will not go deep into query operators in the retrieval phase, see [query-api](query-api.html) for details. Consider a query like _"vespa documents about ranking and retrieval"_. 
A query AND-ing these terms hits less than 3% of the document corpus, missing some of the documents about ranking and retrieval: yql=select \* from doc where (default contains "vespa" AND default contains "documents" AND default contains "about" AND default contains "ranking" AND default contains "and" AND default contains "retrieval") Alternatively, OR-ing the terms hits more than 95% of the documents, making it impossible to filter out irrelevant documents in the retrieval phase: yql=select \* from doc where (default contains "vespa" OR default contains "documents" OR default contains "about" OR default contains "ranking" OR default contains "and" OR default contains "retrieval") Using a "weak AND" can address the problems of too few (AND) or too many (OR) hits in the retrieval phase. Think of it as an _optimized OR_, where the least relevant candidates are discarded from further evaluation. To find the least relevant candidates, a simple scoring function is used: ``` rank_score = sum_n(term(n).significance * term(n).weight) ``` As the point of [weakAnd](reference/query-language-reference.html#weakand) is to discard the worst candidates early, _totalCount_ is an approximation: yql=select \* from doc where {scoreThreshold: 0, targetHits: 10}weakAnd( default contains "vespa", default contains "documents", default contains "about", default contains "ranking", default contains "and", default contains "retrieval") Note that this blurs the distinction between filtering (retrieval) and ranking a little - here the `weakAnd` does both filtering and ranking to optimize the number of candidates for the later rank phases. The default rank-profile is used: ``` rank-profile documentation inherits default { inputs { query(titleWeight): 2.0 query(contentsWeight): 1.0 } first-phase { expression: query(titleWeight) * bm25(title) + query(contentsWeight) * bm25(content) } } ``` Observe that we are using text matching rank features here, which fit well with weakAnd's scoring function, which also uses text matching features. Read more in [using weakAnd with Vespa](using-wand-with-vespa.html). ##### Next steps - Read more about custom re-ranking of the final result set in [reranking in searcher](reranking-in-searcher.html). --- ## Getting Started ### Getting Started Welcome to Vespa, the open big data serving engine! Here you'll find resources for getting started. #### Getting Started Welcome to Vespa, the open big data serving engine! Here you'll find resources for getting started. 
| Quick Start | [**Quick start: Create and run a minimal Vespa application**](/en/cloud/getting-started) Other ways to get started: - [Quick start, application with Java components](/en/cloud/getting-started-java) - [Quick start, using the Pyvespa Python API](https://vespa-engine.github.io/pyvespa/) - Docker Desktop: [Install and run Vespa locally](deploy-an-application-local.html) - Docker Desktop: [Install and run Vespa locally, with Java components](deploy-an-application-local-java.html) The [developer guide](/en/developer-guide.html) is an intro to developing, testing, and deploying applications. Until you add multiple nodes an application can be deployed both on cloud and locally with no modifications. | | Tutorials and Use Cases | Moving from the minimal quick start to more advanced use cases **Search** - [Tutorial: Text Search](tutorials/text-search.html). A text search tutorial and introduction to text ranking with Vespa using traditional information retrieval techniques like BM25. - [Tutorial: Hybrid Text Search](tutorials/hybrid-search.html). A search tutorial and introduction to hybrid text ranking with Vespa, combining BM25 with text embedding models. - [Tutorial: Improving Text Search with Machine Learning](tutorials/text-search-ml.html). This tutorial builds on the [text search tutorial](tutorials/text-search.html) but introduces Learning to Rank to improve relevance. **Vector Search** Learn how to use Vespa Vector Search in the [practical nearest neighbor search guide](nearest-neighbor-search-guide.html). It uses Vespa's support for [nearest neighbor search](nearest-neighbor-search.html), there is also support for fast [approximate nearest neighbor search](approximate-nn-hnsw.html) in Vespa. The guide covers combining vector search with filters and how to perform hybrid search, combining retrieval over inverted index structures with vector search. **RAG (Retrieval-Augmented Generation)** - [Tutorial: RAG Blueprint](tutorials/rag-blueprint.html). A tutorial that provides a blueprint for building high-quality RAG applications with Vespa. Includes evaluation and learning-to-rank (LTR). - [Retrieval-augmented generation (RAG) in Vespa](llms-rag.html). **Recommendation** Learn how to use Vespa for content recommendation/personalization in the [News Search and Recommendation](tutorials/news-1-getting-started.html) tutorial set. **ML Model Serving** Learn how to use Vespa for ML model serving in [Stateless Model Evaluation](stateless-model-evaluation.html). Vespa supports running inference with models from many popular ML frameworks, which can be used for ranking, query classification, question answering, multi-modal retrieval, and more. - [Ranking with ONNX models](onnx.html). Export models from popular deep learning frameworks such as [PyTorch](https://pytorch.org/docs/stable/onnx.html) to [ONNX](https://onnx.ai/) format for serving in Vespa. Vespa integrates with [ONNX-Runtime](https://blog.vespa.ai/stateful-model-serving-how-we-accelerate-inference-using-onnx-runtime/) for [accelerated inference](https://blog.vespa.ai/stateless-model-evaluation/). Many ML frameworks support exporting models to ONNX, including [sklearn](http://onnx.ai/sklearn-onnx/). 
- [Ranking with LightGBM models](lightgbm.html) - [Ranking with XGBoost models](xgboost.html) - [Ranking with TensorFlow models](tensorflow.html) **Embedding Model Inference** Vespa supports integrating [embedding](embedding.html) models, which avoids transferring large amounts of embedding vector data over the network and allows for efficient serving of embedding models. - [Huggingface Embedder](embedding.html#huggingface-embedder) Use single-vector embedding models from Hugging face - [ColBERT Embedder](embedding.html#colbert-embedder) Use multi-vector embedding models - [Splade Embedder](embedding.html#splade-embedder) Use sparse learned single vector embedding models **ML Model Lifecycle** The [Models hot swap tutorial](tutorials/models-hot-swap.html) shows a solution for changing the vector embedding model atomically while serving. It also extends the application to support multiple recommendation models while minimizing data duplication. Lastly, it demonstrates how to efficiently garbage collect obsolete content in an application. **E-Commerce Search** The [e-commerce shopping sample application](use-case-shopping.html) demonstrates Vespa grouping, true in-place partial updates, custom ranking, and more. **Examples and starting sample applications** There are many examples and starting applications on[GitHub](https://github.com/vespa-engine/sample-apps/) and [Pyvespa examples](https://vespa-engine.github.io/pyvespa/index.html). | | Production deployment environments | Vespa can be deployed in multiple ways. These guides show how to deploy [multi-node applications](/en/operations-selfhosted/multinode-systems.html) in various environments. - [Production deployments on Vespa Cloud](https://cloud.vespa.ai/en/production-deployment) - [Vespa high-availability multi-node template](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA) - [Vespa multinode testing and observability](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode) - [Using Kubernetes with Vespa](/en/operations-selfhosted/using-kubernetes-with-vespa.html) - [AWS EC2 multinode](/en/operations-selfhosted/multinode-systems.html#aws-ec2) - [AWS ECS multinode](/en/operations-selfhosted/multinode-systems.html#aws-ecs) See also [monitoring Vespa](/en/operations-selfhosted/monitoring.html). | | Custom component development | Vespa applications can contain custom components that are run by Vespa, for example, when receiving queries or documents. The applications must be able to run on a JVM. While all the built-in behavior of Vespa can be invoked by a YQL query, advanced applications often choose to use plugin components to build queries from frontend requests as doing this closer to the data is faster and simpler. See the quick starts with Java above to get started. The [Developer Guide](developer-guide.html) has more details. | ##### Next Steps - [Performance and scaling on Vespa](performance/). - [Vespa query performance - practical guide](performance/practical-search-performance-guide.html). - Overview of [Vespa APIs](api.html). - [Frequently asked questions](faq.html). - [Sample applications GitHub repo](https://github.com/vespa-engine/sample-apps). - [Securing a Vespa installation](/en/operations-selfhosted/securing-your-vespa-installation.html). - Follow the [Vespa Blog](https://blog.vespa.ai/) for product updates and use cases. 
--- ## Glossary ### Glossary This is a glossary of both Vespa-specific terminology, and general terms useful in this context. #### Glossary This is a glossary of both Vespa-specific terminology, and general terms useful in this context. * * * - **Application** - **Attribute** - **Boolean Search** - **Cluster** - **Component** - **Configuration Server** - **Container** - **Content Node** - **Control Plane** - **Data Plane** - **Deploy** - **Deployment** - **Diversity** - **Docker** - **Document** - **Document frequency (normalized)** - **Document summary** - **Document Processor** - **Document Type** - **Elasticity** - **Enclave** - **Embedding** - **Estimated hit ratio** - **Federation** - **Field** - **Fieldset** - **Garbage Collection** - **Grouping** - **Handler** - **Indexing** - **Instance** - **Namespace** - **Nearest neighbor search** - **Node** - **Parent / Child** - **Partial Update** - **Posting List** - **Quantization** - **Query** - **Ranking** - **Schema** - **Searcher** - **Semantic search** - **Service** - **Streaming search** - **Tenant** - **Tensor** - **Visit** --- ## Graceful Degradation ### Graceful Query Coverage Degradation Ideally one wants to query all data indexed in a Vespa cluster within the specified timeout, but that might not be possible for different reasons: #### Graceful Query Coverage Degradation Ideally one wants to query all data indexed in a Vespa cluster within the specified timeout, but that might not be possible for different reasons: - The system might be overloaded due to capacity constraints, and queries do not complete within the timeout, as they are sitting in a queue waiting for a resource. - A complex query might take longer to execute than the specified timeout, or the timeout is too low given the complexity of the query and available resource capacity. This document describes how Vespa can gracefully degrade the result set if the query cannot be completed within the specified timeout. Definitions: - **Coverage**: The percentage of documents indexed which were evaluated by the query. The ideal coverage is 100%. - **Timeout**: The total time a query is allowed to run for, see [timeout](reference/query-api-reference.html#timeout) (default 500 ms). Vespa is a distributed system where multiple components are involved in the query execution. - **Soft Timeout**: Soft timeout allows coverage to be less than 100%, but larger than 0% if the query is approaching timeout. Soft timeout might also be considered as an _early termination_ technique, and is enabled by default. Refer to [ranking.softtimeout.enable](reference/query-api-reference.html#ranking.softtimeout.enable). ##### Detection The default JSON renderer template will always render a _coverage_ element below the root element, which has a _degraded_ element if the query execution was degraded in some way; in that case the _coverage_ field will be less than 100. 
Example request with a query timeout of 200 ms and _ranking.softtimeout.enable=true_: ``` /search/?searchChain=vespa&yql=select * from sources * where foo contains bar&presentation.format=json&timeout=200ms&ranking.softtimeout.enable=true ``` ``` { "root": { "coverage": { "coverage": 99, "degraded": { "adaptive-timeout": false, "match-phase": false, "non-ideal-state": false, "timeout": true }, "documents": 167006201, "full": false, "nodes": 11, "results": 1, "resultsFull": 0 }, "fields": { "totalCount": 16469732 } } } ``` The result was delivered in 200 ms but the query was degraded as coverage is less than 100. In this case, 167,006,201 out of x documents were queried, and 16,469,732 documents were matched and ranked, using the first-phase ranking expression in the default rank profile. The _degraded_ field contains the following fields which explain why the result had coverage less than 100: - _adaptive-timeout_ is true if [adaptive node timeout](#adaptive-node-timeout) has been enabled, and one or more nodes fail to produce a result at all within the timeout. This could be caused by nodes with degraded hardware making them slower than peers in the cluster. - _match-phase_ is true if the rank profile has defined [match phase ranking degradation](reference/schema-reference.html#match-phase). Match-phase can be used to control which documents are ranked within the timeout. - _non-ideal-state_ is true in cases where the system is not in [ideal state](content/idealstate.html). This case is extremely rare. - _timeout_ is true if softtimeout was enabled, and not all documents could be matched and ranked within the query timeout. Note that the degraded reasons are not mutually exclusive. In the example, the softtimeout was triggered and only 99% of the documents were queried before the time budget ran out. One could imagine scenarios where 10 out of 11 nodes involved in the query execution were healthy and triggered soft timeout and delivered a result, while the last node was in a bad state (e.g. hardware issues) and could not produce a result at all, and that would cause both _timeout_ and _adaptive-timeout_ to be true. When working on Results in a [Searcher](searcher-development.html), get the coverage information programmatically: ``` @Override public Result search(Query query, Execution execution) { Result result = execution.search(query); Coverage coverage = result.getCoverage(false); if (coverage != null && coverage.isDegraded()) { logger.warning("Got a degraded result for query " + query + " : " + coverage.getResultPercentage() + "% was searched"); } return result; } ``` ##### Adaptive node timeout For a content cluster with [flat](performance/sizing-search.html#data-distribution) data distribution, query performance is no better than the slowest node. The worst case scenario happens when a node in the cluster is experiencing underlying HW issues. In such a state, a node might answer health checks and pings, but still not be able to serve queries within the timeout. Using [adaptive coverage](reference/services-content.html#coverage) allows ignoring slow node(s). The following example demonstrates how to use adaptive timeout. The example uses a flat content cluster with 10 nodes: ``` <coverage> <minimum>0.9</minimum> <min-wait-after-coverage-factor>0.2</min-wait-after-coverage-factor> <max-wait-after-coverage-factor>0.3</max-wait-after-coverage-factor> </coverage> ``` - Assuming the default Vespa timeout of 500 ms, the stateless container dispatches the query to all 10 nodes in parallel and waits until 9 out of 10 have replied (minimum coverage 0.9). - Assuming 9 could respond in 100 ms, there is 400 ms left. 
The dispatcher then waits at least 80 ms (0.2\*400 ms) for the last node to respond, and at most 120 ms (0.3\*400 ms), before giving up on the slowest node and returning the result. - The min wait setting is used to allow some per-node response time variance. Using min wait 0 will cause the query to return immediately when min coverage has been reached (9 out of 10 nodes replied). A value higher than 0 allows a node to be slightly slower than its peers and still reach 100% coverage overall. ##### Match phase degradation Refer to the [match-phase reference](reference/schema-reference.html#match-phase). Concrete examples of using match phase are found in the [practical performance guide](performance/practical-search-performance-guide.html#match-phase-limit---early-termination). Match-phase works by specifying an `attribute` that measures document quality in some way (popularity, click-through rate, pagerank, ad bid value, price, text quality). In addition, a `max-hits` value specifies how many hits are "more than enough" for the application. Then an estimate is made after collecting a reasonable number of hits for the query, and if the estimate is higher than the configured `max-hits` value, an extra limitation is added to the query, ensuring that only the highest quality documents can become hits. In effect, this limits the documents actually queried to the highest quality documents, a subset of the full corpus, where the size of the subset is calculated in such a way that the query is estimated to give `max-hits` hits. Since some (low-quality) hits will already have been collected to do the estimation, the actual number of hits returned will usually be higher than max-hits. But since the distribution of documents isn't perfectly smooth, you risk sometimes getting fewer than the configured `max-hits` hits back. Note that limiting hits in the match phase also affects [aggregation/grouping](grouping.html) and the total hit count, since the query actually matches fewer hits. Also note that it doesn't really make sense to use this feature together with a [WAND operator](using-wand-with-vespa.html) that also limits hits, since they both operate in the same manner, and you would get interference between them that could cause unpredictable results. The graph shows possible hits versus actual hits in a corpus with 100 000 documents, where `max-hits` is configured to 10 000. The corpus is a synthetic (slightly randomized) data set; in practice the graph will be less smooth: ![Plot of possible vs. actual hits](/assets/img/relevance/match-phase-max-hits.png) There is a content node metric per rank-profile named _content.proton.documentdb.matching.rank\_profile.limited\_queries_ which can be used to see how many of the queries are actually affected by these settings; compare with the corresponding _content.proton.documentdb.matching.rank\_profile.queries_ metric to measure the percentage. ###### Match Phase Tradeoffs There are some important things to consider before using _match-phase_. In a normal query scenario, latency is directly proportional to the number of hits the query matches: a query that matches few documents will have low latency and a query that matches many documents will have high latency. Match-phase has the **opposite** effect. This means that if you have queries that match few documents, match-phase might make these queries significantly slower. It might actually be faster to run these queries without match-phase. 
Example: Let's say you have a corpus with a document attribute named _created\_time_. For all queries you want the newest content surfaced, so you enable match-phase on _created\_time_. So far, so good - you get a great latency and always get your top-k hits. The problem might come if you introduce a filter. If you have a filter saying you only want documents from the last day, then match-phase can become suboptimal and in some cases much worse than running without match-phase. By design, Vespa will evaluate potential matches for a query in the order of their internal document id. This means it will start evaluating documents in the order they were indexed on the node, and for most use-cases that means the oldest documents first. Without a filter, every document is a potential match, and match-phase will quickly figure out how it can optimize. With the filter, on the other hand, the algorithm needs to evaluate almost the full corpus before it reaches potential matches (the one-day-old part of the corpus), and because of the way the algorithm is implemented, it ends up doing a lot of unnecessary work and can have orders of magnitude higher latencies than running the query without the filter. Another important thing to mention is that the reported total-hits will be different when doing queries with match-phase enabled. This is because match-phase works on an estimated "virtual" corpus, which might have far fewer hits than the full corpus actually has. If used correctly, match-phase can be a life-saver; however, it is not a straightforward fix-all silver bullet. Please test and measure your use of match-phase, and contact the Vespa team if your results are not what you expect. --- ## Grouping Syntax ### Grouping Reference Read the [Vespa grouping guide](../grouping.html) first, for examples and an introduction to grouping - this is the Vespa grouping reference. #### Grouping Reference Read the [Vespa grouping guide](../grouping.html) first, for examples and an introduction to grouping - this is the Vespa grouping reference. Also note that using a [multivalued](../schemas.html#field) attribute (such as an array of doubles) in a grouping expression affects performance. Such operations can hit a memory bandwidth bottleneck, particularly if the set of hits to be processed is large, as more data is evaluated. ##### Group Group query results using a custom expression (using the `group` clause): - A numerical or string constant (e.g., `group(1)` or `group("all")`) which makes one bucket with everything - A document [attribute](../attributes.html) - A function over another expression (`xorbit`, `md5`, `cat`, `xor`, `and`, `or`, `add`, `sub`, `mul`, `div`, `mod`) or any other [expression](#expressions) - The datatype of an expression is resolved using best-effort, similarly to how common programming languages resolve arithmetic on operands of different data types - The results of any expression are either scalar or single dimension arrays - `add(A)` adds all elements of the array A together to produce a scalar - `add(A, B)` adds the arrays A and B element by element, producing a new array whose size is `max(|A|, |B|)` Groups can contain subgroups (by using `each` and `group` operations), and may be nested to any level. Multiple sub-groupings or outputs can be created under the same group level, using multiple parallel `each` or `all` clauses, and each one may be labelled using [as(mylabel)](#labels). When grouping results, _groups_ that contain _outputs_, _group lists_ and _hit lists_ are generated. 
Group lists contain subgroups, and hit lists contain hits that are part of the owning group. The identity of a group is held by its _id_. Scalar identities such as long, double and string are directly available from the _id_, whereas range identities used for bucket aggregation are separated into the sub-nodes _from_ and _to_. Refer to the [result format reference](default-result-format.html).

###### Multivalue attributes

A [multivalue](../schemas.html#field) attribute is a [weighted set](schema-reference.html#weightedset), [array](schema-reference.html#array) or [map](schema-reference.html#map). Most grouping functions will just handle the elements of multivalued attributes separately, as if they were all individual values in separate documents. If you are grouping over arrays of structs or maps, scoping will be used to preserve structure. Each entry in the array/map will be treated as a separate sub-document.

The following syntax can be used when grouping on _map_ attribute fields.

Group on map keys:

```
all( group(mymap.key) each(output(count())) )
```

Group on map keys, then on map values:

```
all( group(mymap.key) each( group(mymap.value) each(output(count())) ))
```

Group on values for key _my\_key_:

```
all( group(my_map{"my_key"}) each(output(count())) )
```

Group on struct field _my\_field_ referenced in map element _my\_key_:

```
all( group(my_map{"my_key"}.my_field) each(output(count())) )
```

The key can either be specified directly (above) or indirectly via a key source attribute. The key is retrieved from the key source attribute for each document. Note that the key source attribute must be single-value and have the same data type as the key type of the map:

```
all( group(my_map{attribute(my_key_source)}) each(output(count())) )
```

Group on an array of integers field:

```
all( group(my_array) each(output(count())) )
```

Group on struct field _my\_field_ in the _my\_array_ array of structs:

```
all( group(my_array.my_field) each(output(count())) )
```

[Tensors](schema-reference.html#tensor) cannot be used in grouping.

##### Filtering groups

When grouping on multivalue attributes, it may be useful to filter the groups so that only some specific values are collected. This can be done by adding a filter. The `filter` clause expects a filter _predicate_:

- [regex("regular expression", input-expression)](#regex-filter)
- [range(min-limit, max-limit, input-expression)](#range-filter)
- [range(min-limit, max-limit, input-expression, bool, bool)](#range-filter)
- [not _predicate_](#logical-predicates-filter)
- [_predicate_ and _predicate_](#logical-predicates-filter)
- [_predicate_ or _predicate_](#logical-predicates-filter)

###### Regex filter

Use a regular expression to match the input, and include only documents that match in the grouping. The input will usually be the same expression as in the `group` clause. Example:

```
all( group(my_array) filter(regex("foo.*", my_array)) ...)
```

Here, each value in _my\_array_ is considered, but only the values that start with a "foo" prefix are collected in groups; all others are ignored. See [example](/en/grouping.html#structured-grouping).

###### Range filter

Use a `range` filter to match documents where a field value is between a lower and an upper bound. Example:

```
all( group(some_field) filter(range(1990, 2012, year)) ...)
```

Here, the lower bound is _inclusive_ (year ≥ 1990) and the upper bound is _exclusive_ (year \< 2012). Use optional bools at the end to control whether the lower and upper bounds are inclusive, respectively.
The first bool sets the lower bound inclusive, and the second sets the upper bound inclusive.

```
all( group(some_field) filter(range(1990, 2012, year, true, true)) ...)
```

Here, both the lower and upper bound are inclusive.

###### Logical predicates

Use `not` to negate another filter expression. It takes a single sub-filter and matches when the sub-filter does not. Example:

```
all( group(my_field) filter( not regex("bar.*", my_other_field)) ...)
```

Use `or` to perform a logical disjunction across two sub-filters. The combined filter matches if any of the sub-filters evaluate to true. Example:

```
all( group(my_field) filter( regex("bar.*", my_field) or regex("baz.*", my_third_field) ) ...)
```

Use `and` to perform a logical conjunction across two sub-filters. The combined filter matches only if all of the sub-filters evaluate to true. Example:

```
all( group(my_field) filter( regex("bar.*", my_other_field) and regex("baz.*", my_third_field) ) ...)
```

These logical predicates can be nested to create complex filter conditions. Filter expressions follow _conventional precedence_ rules: `not` is evaluated before `and`, and `and` is evaluated before `or`. Operators of the same precedence are evaluated left-to-right. Use parentheses `(...)` to force a different grouping when needed. Example:

```
all( group(my_field) filter( (regex("bar.*", some_field) or regex("baz.*", other_field)) and not regex(".*foo", some_field)) each(...) )
```

##### Order / max

Each level of grouping may specify how to order its groups (using `order`):

- Ordering can be done using any of the available aggregates
- Multi-level grouping allows strict ordering where primary aggregates may be equal
- Ordering is either ascending or descending, specified per level of ordering
- Groups are sorted using [locale aware sorting](#uca)

Limit the number of groups returned for each level using `max`, returning only the first _n_ groups as specified by `order`:

- `order` changes the ordering of groups after a merge operation for the following aggregators: `count`, `avg` and `sum`
- `order` **will not** change the ordering of groups after a merge operation when `max` or `min` is used
- The default order, `-max(relevance())`, **does not** require use of [precision](#precision)

##### Continuations

Pagination of grouping results is managed by `continuations`. These are opaque objects that can be combined and re-submitted using the `continuations` annotation on the grouping step of the query to move to the previous or next page in a result list.

All root groups contain a single _this_ continuation per `select`. That continuation represents the current view, and if submitted as the sole continuation, it will reproduce the exact same result as the one that contained it.

There is zero or one _prev_/_next_ continuation per group- and hit-list. Submit any number of these to retrieve the next/previous pages of the corresponding lists.

Any number of continuations can be combined in a query, but the first must always be the _this_-continuation. E.g. one may simultaneously move to the next page of one list and the previous page of another.

**Note:** If more than one continuation object is provided for the same group- or hit-list, the one given last is the one that takes effect. This is because continuations are processed in the order given, and they replace whatever continuations they collide with.
If working programmatically with grouping, find the [Continuation](https://javadoc.io/doc/com.yahoo.vespa/container-search/latest/com/yahoo/search/grouping/Continuation.html) objects within the [RootGroup](https://javadoc.io/doc/com.yahoo.vespa/container-search/latest/com/yahoo/search/grouping/result/RootGroup.html), [GroupList](https://javadoc.io/doc/com.yahoo.vespa/container-search/latest/com/yahoo/search/grouping/result/GroupList.html) and [HitList](https://javadoc.io/doc/com.yahoo.vespa/container-search/latest/com/yahoo/search/grouping/result/HitList.html) result objects. These can then be added back into the continuation list of the [GroupingRequest](https://javadoc.io/doc/com.yahoo.vespa/container-search/latest/com/yahoo/search/grouping/GroupingRequest.html) to paginate. Refer to the [grouping guide](../grouping.html#pagination) for an example.

##### Labels

Lists created using the `each` keyword can be assigned a label using the construct `each(...) as(mylabel)`. The outputs created by that `each` clause will be identified by this label.

##### Aliases

Grouping expressions can be tagged with an _alias_. An alias allows the expression to be reused without having to repeat the expression verbatim. For example,

```
all(group(a) alias(myalias, count()) each(output($myalias)))
```

is equivalent to

```
all(group(a) each(output(count())))
```

and

```
all(group(a) order($myalias=count()) each(output($myalias)))
```

is equivalent to

```
all(group(a) order(count()) each(output(count())))
```

##### Precision

The number of intermediate groups returned from each content node during expression evaluation, to give the container node more data to consider when selecting the groups that are to be evaluated further: `each(...) precision(1000)`. A higher number costs more bandwidth, but leads to higher accuracy in some cases.

##### Query parameters

The following _query parameters_ are relevant for grouping. See the [Query API Reference](query-api-reference.html#parameters) for descriptions.

- [select](query-api-reference.html#select)
- [groupingSessionCache](query-api-reference.html#groupingsessioncache)
- [grouping.defaultMaxGroups](query-api-reference.html#grouping.defaultmaxgroups)
- [grouping.defaultMaxHits](query-api-reference.html#grouping.defaultmaxhits)
- [grouping.globalMaxGroups](query-api-reference.html#grouping.globalmaxgroups)
- [grouping.defaultPrecisionFactor](query-api-reference.html#grouping.defaultprecisionfactor)

##### Grouping Session Cache

**Important:** The grouping session cache is **only useful if** the grouping expression uses default ordering. The **drawback** is that when `max` is specified in the grouping expression, it might cause inaccuracies in aggregated values such as `count`. It is recommended to test whether this is an issue, and if so, adjust the `precision` parameter to still get correct counts.

The session cache stores intermediate grouping results on the content nodes when using multi-level grouping expressions, in order to speed up grouping at a potential loss of accuracy. This causes the query and grouping expression to be run only once. With multi-level grouping expressions, the search query is normally re-run for each level; the drawback of this is that, with an expensive ranking function, the query will take more time than strictly necessary.
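The cache is controlled per request through the `groupingSessionCache` query parameter listed above. As a sketch (the query and attribute name are illustrative), it can be disabled for a single request to favor accuracy over speed:

```
/search/?yql=select * from sources * where true limit 0 |
    all( group(myattribute) max(5) order(-count()) each(output(count())) )
&groupingSessionCache=false
```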
##### Aggregators

Each level of grouping specifies a set of aggregates to collect for all documents that belong to that group (using the `output` operation):

- The documents in a group, retrieved using a specified summary class
- The count of documents in a group
- The sum, average, min, max, xor or standard deviation of an expression
- Multiple quantiles of an expression's value

When all arguments are numeric, the result type is resolved by looking at the argument types. If all arguments are longs, the result is a long. If at least one argument is a double, the result is a double.

When using `order`, aggregators can also be used in expressions in order to get increased control over group sorting. This does not work with expressions that take attributes as arguments, unless the expression is enclosed within an aggregator.

Using sum or max on a multivalued attribute: doing an operation such as `output(sum(myarray))` will run the sum over each element value in each document. The result is the sum of sums of values. Similarly, `max(myarray)` will yield the maximal element over all elements in all documents, and so on.

Compute quantiles by listing the desired quantile values (comma-separated) in brackets, followed by a comma and the expression (e.g., a field):

```
all( group(city) each(output(quantiles([0.5], delivery_days) as(median_delivery_days) ) ) )
```

to compute the median, or

```
all( group(city) each(output(quantiles([0.5, 0.9], delivery_days))) )
```

This computes the median (p50) and 90th percentile (p90) time to delivery in days per city. Note that quantiles are computed using a [KLL Sketch](https://datasketches.apache.org/docs/KLL/KLLSketch.html), so they are approximate.

Multivalue fields such as maps and arrays can be used for grouping. However, using aggregation functions such as sum() on such fields can give misleading results. Assume a map from strings to integers (`map<string, int>`), where the strings are some sort of key to use for grouping. The following expression will provide the sum of the values for all keys:

```
all( group(mymap.key) each(output(sum(mymap.value))) )
```

and not the sum of the values within each key, as one would expect. It is still, however, possible to run the following expression to get the sum of values within a specific key:

```
all( group("my_group") each(output(sum(mymap{"foo"}))) )
```

Refer to the system test for [grouping on struct and map types](https://github.com/vespa-engine/system-test/blob/master/tests/search/struct_and_map_types/struct_and_map_grouping.rb) for more examples.

| ###### Group list aggregators | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | count | Counts the number of unique groups (as produced by `group`).
Note that `count` operates independently of `max` and that this count is an estimate using HyperLogLog++ which is an algorithm for the count-distinct problem | None | Long | | ###### Group aggregators | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | count | Increments a long counter every time it is invoked | None | Long | | sum | Sums the argument over all selected documents | Numeric | Numeric | | avg | Computes the average over all selected documents | Numeric | Numeric | | min | Keeps the minimum value of selected documents | Numeric | Numeric | | max | Keeps the maximum value of selected documents | Numeric | Numeric | | xor | XOR the values (their least significant 64 bits) of all selected documents | Any | Long | | stddev | Computes the population standard deviation over all selected documents | Numeric | Double | | quantiles | Computes one or multiple quantiles of the values of an expression. Quantiles must be a number between 0 and 1 inclusive. | [Numeric+], Expr | [{"quantile":Double,"value":Double}+] | | ###### Hit aggregators | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | summary | Produces a summary of the requested [summary class](/en/reference/schema-reference.html#document-summary) | Name of summary class | Summary | ##### Expressions | ###### Arithmetic expressions | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | add | Add the arguments together | Numeric+ | Numeric | | + | Add left and right argument | Numeric, Numeric | Numeric | | mul | Multiply the arguments together | Numeric+ | Numeric | | \* | Multiply left and right argument | Numeric, Numeric | Numeric | | sub | Subtract second argument from first, third from result, etc | Numeric+ | Numeric | | - | Subtract right argument from left | Numeric, Numeric | Numeric | | div | Divide first argument by second, result by third, etc | Numeric+ | Numeric | | / | Divide left argument by right | Numeric, Numeric | Numeric | | mod | Modulo first argument by second, result by third, etc | Numeric+ | Numeric | | % | Modulo left argument by right | Numeric, Numeric | Numeric | | neg | Negate argument | Numeric | Numeric | | - | Negate right argument | Numeric | Numeric | | ###### Bitwise expressions | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | and | AND the arguments in order | Long+ | Long | | or | OR the arguments in order | Long+ | Long | | xor | XOR the arguments in order | Long+ | Long | | ###### String expressions | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | strlen | Count the number of bytes in argument | String | Long | | strcat | Concatenate arguments in order | String+ | String | | ###### Type conversion expressions | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | todouble | Convert argument to double | Any | Double | | tolong | Convert argument to long | Any | Long | | tostring | Convert argument to string | Any | String | | toraw | Convert argument to raw | Any | Raw | | ###### Raw data expressions | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | cat | Cat the binary representation of the arguments together | Any+ | Raw | | md5 | Does an MD5 over the binary representation of the argument, and keeps the lowest 'width' bits | Any, Numeric(width) | Raw | | xorbit | Does an XOR of 'width' bits over the binary representation of the argument. 
Width is rounded up to a multiple of 8 | Any, Numeric(width) | Raw | | ###### Accessor expressions | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | relevance | Return the computed rank of a document | None | Double | | \ | Return the value of the named attribute | None | Any | | array.at | Array element access. The expression `array.at(myarray, idx)` returns one value per document by evaluating the `idx` expression and using it as an index into the array. The expression can then be used to build bigger expressions such as `output(sum(array.at(myarray, 0)))` which will sum the first element in the array of each document. - The `idx` expression is capped to `[0, size(myarray)-1]` - If \> array size, the last element is returned - If \< 0, the first element is returned | Array, Numeric | Any | | interpolatedlookup | Counts elements in a sorted array that are less than an expression, with linear interpolation if the expression is between element values. The operation `interpolatedlookup(myarray, expr)` is intended for generic graph/function lookup. The data in `myarray` should be numerical values sorted in ascending order. The operation will then scan from the start of the array to find the position where the element values become equal to (or greater than) the value of the `expr` lookup argument, and return the index of that position. When the lookup argument's value is between two consecutive array element values, the returned position will be a linear interpolation between their respective indexes. The return value is always in the range `[0, size(myarray)-1]` of the valid index values for an array. Assume `myarray` is a sorted array of type `array` in each document: The expression `interpolatedlookup(myarray, 4.2)` is now a per-document expression that first evaluates the lookup argument, here a constant expression 4.2, and then looks at the contents of `myarray` in the document. The scan starts at the first element and proceeds until it hits an element value greater than 4.2 in the array. This means that: - If the first element in the array is greater than 4.2, the expression returns 0 - If the first element in the array is exactly 4.2, the expression still returns 0 - If the first element in the array is 1.7 while the **second** element value is exactly 4.2, the expression return 1.0 - the index of the second element - If **all** the elements in the array are less than 4.2, the last valid array index `size(myarray)-1` is returned - If the 5 first elements in the array have values smaller than the lookup argument, and the lookup argument is halfway between the fifth and sixth element, a value of 4.5 is returned - halfway between the array indexes of the fifth and sixth elements - Similarly, if the elements in the array are `{0, 1, 2, 4, 8}` then passing a lookup argument of "5" would return 3.25 (linear interpolation between `indexOf(4)==3` and `indexOf(8)==4`) | Array, Numeric | Numeric | | ###### Bucket expressions | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | fixedwidth | Maps the value of the first argument into consecutive buckets whose width equals the second argument | Any, Numeric | NumericBucketList | | predefined | Maps the value of the first argument into the given buckets. - Standard mathematical start and end specifiers may be used to define the width of a `bucket`. The `(` and `)` evaluates to `[` and `>` by default. - The buckets assume the type of the start/end specifiers (`string`, `long`, `double` or `raw`). 
Values are converted to this type before being compared with these specifiers. (e.g. `double` values are rounded to the nearest integer for buckets of type `long`). - The end specifier can be skipped. The buckets `bucket(3)`/`bucket[3]` are the same as `bucket[3,4>`. This is allowed for string expressions as well; `bucket("c")` is identical to `bucket["c", "c ">`. | Any, Bucket+ | BucketList | | ###### Time expressions The field must be a [long](schema-reference.html#long), with second resolution (unix timestamp/epoch) - [examples](../grouping.html#time-and-date). Each of the time-functions will respect the [timezone](query-api-reference.html#timezone) query parameter. | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | time.dayofmonth | Returns the day of month (1-31) for the given timestamp | Long | Long | | time.dayofweek | Returns the day of week (0-6) for the given timestamp, Monday being 0 | Long | Long | | time.dayofyear | Returns the day of year (0-365) for the given timestamp | Long | Long | | time.hourofday | Returns the hour of day (0-23) for the given timestamp | Long | Long | | time.minuteofhour | Returns the minute of hour (0-59) for the given timestamp | Long | Long | | time.monthofyear | Returns the month of year (1-12) for the given timestamp | Long | Long | | time.secondofminute | Returns the second of minute (0-59) for the given timestamp | Long | Long | | time.year | Returns the full year (e.g. 2009) of the given timestamp | Long | Long | | time.date | Returns the date (e.g. 2009-01-10) of the given timestamp | Long | Long | | ###### List expressions | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | size | Return the number of elements in the argument if it is a list. If not return 1 | Any | Long | | sort | Sort the elements in argument in ascending order if argument is a list If not it is a NOP | Any | Any | | reverse | Reverse the elements in the argument if argument is a list If not it is a NOP | Any | Any | | ###### Other expressions | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | zcurve.x | Returns the X component of the given [zcurve](https://en.wikipedia.org/wiki/Z-order_curve) encoded 2d point. All fields of type "position" have an accompanying "\\_zcurve" attribute that can be decoded using this expression, e.g. `zcurve.x(foo_zcurve)` | Long | Long | | zcurve.y | Returns the Y component of the given zcurve encoded 2d point | Long | Long | | uca | Converts the attribute string using [unicode collation algorithm](https://www.unicode.org/reports/tr10/). Groups are sorted using locale aware sorting, with the default and primary strength values, respectively: ``` all( group(s) order(max(uca(s, "sv"))) each(output(count())) ) ``` ``` all( group(s) order(max(uca(s, "sv", "PRIMARY"))) each(output(count())) ) ``` | Any, Locale(String), Strength(String) | Raw | | ###### Single argument standard mathematical expressions These are the standard mathematical functions as found in the Java [Math](https://docs.oracle.com/javase/8/docs/api/java/lang/Math.html) class. 
| | Name | Description | Arguments | Result | | --- | --- | --- | --- | | math.exp |   | Double | Double | | math.log |   | Double | Double | | math.log1p |   | Double | Double | | math.log10 |   | Double | Double | | math.sqrt |   | Double | Double | | math.cbrt |   | Double | Double | | math.sin |   | Double | Double | | math.cos |   | Double | Double | | math.tan |   | Double | Double | | math.asin |   | Double | Double | | math.acos |   | Double | Double | | math.atan |   | Double | Double | | math.sinh |   | Double | Double | | math.cosh |   | Double | Double | | math.tanh |   | Double | Double | | math.asinh |   | Double | Double | | math.acosh |   | Double | Double | | math.atanh |   | Double | Double | | ###### Dual argument standard mathematical expressions | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | math.pow | Return X^Y. | Double, Double | Double | | math.hypot | Return length of hypotenuse given X and Y sqrt(X^2 + Y^2) | Double, Double | Double | ##### Filters | ###### String filters | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | regex | Matches a field against a regular expression string. | String, Expression | Bool | | ###### Numeric filters | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | range | Matches when a field is between a lower and upper bound. | Numeric, Numeric, Expression, Bool?, Bool? | Bool | | ###### Predicate filters | | Name | Description | Arguments | Result | | --- | --- | --- | --- | | and | Logical `and` between the arguments. | Filter, Filter | Bool | | not | Logical `not` on the argument. | Filter | Bool | | or | Logical `or` between the arguments. | Filter, Filter | Bool | ##### Grouping language grammar ``` request ::= "all(" operations ")" group ::= ( "all" | "each") "(" operations ")" ["as" "(" identifier ")"] operations ::= ["group" "(" exp ")"] ( ( "alias" "(" identifier "," exp ")" ) | ( "filter" "(" filterOp ")" ) | ( "max" "(" ( number | "inf" ) ")" ) | ( "order" "(" expList | aggrList ")" ) | ( "output" "(" aggrList ")" ) | ( "precision" "(" number ")" ) )* group* aggrList ::= aggr ( "," aggr )* aggr ::= ( ( "count" "(" ")" ) | ( "sum" "(" exp ")" ) | ( "avg" "(" exp ")" ) | ( "max" "(" exp ")" ) | ( "min" "(" exp ")" ) | ( "xor" "(" exp ")" ) | ( "stddev" "(" exp ")" ) | ( "summary" "(" [identifier] ")" ) ) ["as" "(" identifier ")"] expList ::= exp ( "," exp )* exp ::= ( "+" | "-") ( "$" identifier ["=" math] ) | ( math ) | ( aggr ) filterOp ::= "regex" "(" string "," exp ")" math ::= value [( "+" | "-" | "*" | "/" | "%" ) value] value ::= ( "(" exp ")" ) | ( "add" "(" expList ")" ) | ( "and" "(" expList ")" ) | ( "cat" "(" expList ")" ) | ( "div" "(" expList ")" ) | ( "docidnsspecific" "(" ")" ) | ( "fixedwidth" "(" exp "," number ")" ) | ( "interpolatedlookup" "(" attributeName "," exp ")") | ( "math" "." 
( ( "exp" | "log" | "log1p" | "log10" | "sqrt" | "cbrt" | "sin" | "cos" | "tan" | "asin" | "acos" | "atan" | "sinh" | "cosh" | "tanh" | "asinh" | "acosh" | "atanh" ) "(" exp ")" | ( "pow" | "hypot" ) "(" exp "," exp ")" )) | ( "max" "(" expList ")" ) | ( "md5" "(" exp "," number "," number ")" ) | ( "min" "(" expList ")" ) | ( "mod" "(" expList ")" ) | ( "mul" "(" expList ")" ) | ( "or" "(" expList ")" ) | ( "predefined" "(" exp "," "(" bucket ( "," bucket )* ")" ")" ) | ( "reverse" "(" exp ")" ) | ( "relevance" "(" ")" ) | ( "sort" "(" exp ")" ) | ( "strcat" "(" expList ")" ) | ( "strlen" "(" exp ")" ) | ( "size" "(" exp")" ) | ( "sub" "(" expList ")" ) | ( "time" "." ( "date" | "year" | "monthofyear" | "dayofmonth" | "dayofyear" | "dayofweek" | "hourofday" | "minuteofhour" | "secondofminute" ) "(" exp ")" ) | ( "todouble" "(" exp ")" ) | ( "tolong" "(" exp ")" ) | ( "tostring" "(" exp ")" ) | ( "toraw" "(" exp ")" ) | ( "uca" "(" exp "," string ["," string] ")" ) | ( "xor" "(" expList ")" ) | ( "xorbit" "(" exp "," number ")" ) | ( "zcurve" "." ( "x" | "y" ) "(" exp ")" ) | ( attributeName "." "at" "(" number ")") | ( attributeName ) bucket ::= "bucket" ( "(" | "[" | "<" ) ( "-inf" | rawvalue | number | string ) ["," ( "inf" | rawvalue | number | string )] ( ")" | "]" | ">" ) rawvalue ::= "{" ( ( string | number ) "," )* "}"
```

---

## Grouping

### Grouping Information in Results

Try running requests on the [grouping example data](https://github.com/vespa-engine/sample-apps/blob/master/examples/part-purchases-demo/ext/feed.jsonl), e.g. `all( group(customer) each(output(sum(price))) )`.

The Vespa grouping language is a list-processing language which describes how the query hits should be grouped, aggregated, and presented in result
sets. A grouping statement takes the list of all matches to a query as input and groups/aggregates it, possibly in multiple nested and parallel ways to produce the output. This is a logical specification and does not indicate how it is executed, as instantiating the list of all matches to the query somewhere would be too expensive, and execution is distributed instead. Refer to the [Query API reference](reference/query-api-reference.html#select) for how to set the _select_ parameter, and the [Grouping reference](reference/grouping-syntax.html) for details. Fields used in grouping must be defined as [attribute](attributes.html) in the document schema. Grouping supports continuation objects for [pagination](#pagination). The [Grouping Results](https://github.com/vespa-engine/sample-apps/tree/master/examples/part-purchases-demo) sample application is a practical example. ##### The grouping language structure The operations defining the structure of a grouping are: - `all(statement)`: Execute the nested statement once on the input list as a whole. - `each(statement)`: Execute the nested statement on each element of the input list. - `group(specification)`: Turn the input list into a list of lists according to the grouping specification. - `output`: Output some value(s) at the current location in the structure. The parallel and nested collection of these operations defines both the structure of the computation and of the result it produces. For example, `all(group(customer) each(output(count())))` will take all matches, group them by customer id, and for each group, output the count of hits in the group. Vespa distributes and executes the grouping program on content nodes and merges results on container nodes - in multiple phases, as needed. As realizing such programs over a distributed data set requires more network round-trips than a regular search query, these queries may be more expensive than regular queries - see [defaultMaxGroups](reference/query-api-reference.html#grouping.defaultmaxgroups) and the likes for how to control resource usage. ##### Grouping by example For the entirety of this document, assume an index of engine part purchases: | Date | Price | Tax | Item | Customer | | --- | --- | --- | --- | --- | | 2006-09-06 09:00:00 | $1 000 | 0.24 | Intake valve | Smith | | 2006-09-07 10:00:00 | $1 000 | 0.12 | Rocker arm | Smith | | 2006-09-07 11:00:00 | $2 000 | 0.24 | Spring | Smith | | 2006-09-08 12:00:00 | $3 000 | 0.12 | Valve cover | Jones | | 2006-09-08 10:00:00 | $5 000 | 0.24 | Intake port | Jones | | 2006-09-08 11:00:00 | $8 000 | 0.12 | Head | Brown | | 2006-09-09 12:00:00 | $1 300 | 0.24 | Coolant | Smith | | 2006-09-09 10:00:00 | $2 100 | 0.12 | Engine block | Jones | | 2006-09-09 11:00:00 | $3 400 | 0.24 | Oil pan | Brown | | 2006-09-09 12:00:00 | $5 500 | 0.12 | Oil sump | Smith | | 2006-09-10 10:00:00 | $8 900 | 0.24 | Camshaft | Jones | | 2006-09-10 11:00:00 | $1 440 | 0.12 | Exhaust valve | Brown | | 2006-09-10 12:00:00 | $2 330 | 0.24 | Rocker arm | Brown | | 2006-09-10 10:00:00 | $3 770 | 0.12 | Spring | Brown | | 2006-09-10 11:00:00 | $6 100 | 0.24 | Spark plug | Smith | | 2006-09-11 12:00:00 | $9 870 | 0.12 | Exhaust port | Jones | | 2006-09-11 10:00:00 | $1 597 | 0.24 | Piston | Brown | | 2006-09-11 11:00:00 | $2 584 | 0.12 | Connection rod | Smith | | 2006-09-11 12:00:00 | $4 181 | 0.24 | Rod bearing | Jones | | 2006-09-11 13:00:00 | $6 765 | 0.12 | Crankshaft | Jones | ##### Basic Grouping Example: _Return the total sum of purchases per customer_ - steps: 1. 
Select all documents:

```
/search/?yql=select * from sources * where true
```

2. Take the list of all hits:

```
all(...)
```

3. Turn it into a list of lists of all hits having the same customer id:

```
group(customer)
```

4. For each of those lists of same-customer hits: `each(...)`

5. Output the sum (an aggregator) of the price over all items in that list of hits:

```
output(sum(price))
```

Final query, producing the sum of the price of all purchases for each customer:

```
/search/?yql=select * from sources * where true limit 0 | all( group(customer) each(output(sum(price))) )
```

Here, `limit` is set to zero to get the grouping output only. URL encoded equivalent:

```
/search/?yql=select%20%2A%20from%20sources%20%2A%20where%20true%20limit%200%20%7C%20 all%28%20group%28customer%29%20each%28output%28sum%28price%29%29%29%20%29
```

Result:

| GroupId | Sum(price) |
| --- | --- |
| Brown | $20 537 |
| Jones | $39 816 |
| Smith | $19 484 |

Example: _Sum price of purchases [per date](#time-and-date):_

```
select (…) | all(group(time.date(date)) each(output(sum(price))))
```

Note: in the examples above, _all_ documents are evaluated. Modify the query to add filters (and thus cut latency), like (remember to URL encode):

```
/search/?yql=select * from sources * where customer contains "smith"
```

##### Ordering and Limiting Groups

In many scenarios, a large collection of groups is produced, possibly too large to display or process. This is handled by ordering groups, then limiting the number of groups to return.

The `order` clause accepts a list of one or more expressions. Each of the arguments to `order` is prefixed by a plus or minus for ascending or descending order.

Limit the number of groups using `max` and `precision` - the latter is the number of groups returned per content node to be merged into the global result. Larger skews in the document distribution hence require a higher `precision` for accurate results.

An implicit limit can be specified through the [grouping.defaultMaxGroups](reference/query-api-reference.html#grouping.defaultmaxgroups) query parameter. This value will always be overridden if `max` is explicitly specified in the query. Use `max(inf)` to retrieve all groups when the query parameter is set.

If `precision` is not specified, it will default to a factor times `max`. This factor can be overridden through the [grouping.defaultPrecisionFactor](reference/query-api-reference.html#grouping.defaultprecisionfactor) query parameter.

Example: To find the 2 globally best groups, make an educated guess on how many samples need to be fetched from each node in order to get the right groups. This is the `precision`. An initial factor of 3 has proven to be quite good in most use cases. If, however, the data for customer 'Jones' were spread over 3 different content nodes, 'Jones' might be among the 2 best on only one node. But based on the distribution of the data, we have concluded from earlier tests that if we fetch 5.67 times as many groups as we need, we will have a correct answer with at least 99.999% confidence. So then we just use 6 times as many groups when doing the merge.

However, there is one exception. Without an `order` constraint, `precision` is not required. Then, local ordering will be the same as global ordering, and ordering will not change after a merge operation.
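A sketch of the `max(inf)` case mentioned above, using the fields from the purchase example (this only makes sense when an implicit limit such as `grouping.defaultMaxGroups` is in effect, as it asks for every group to be returned):

```
/search/?yql=select * from sources * where true limit 0 | all( group(customer) max(inf) each(output(sum(price))) )
```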
###### Example

Example: _The two customers with most purchases, returning the sum for each:_

```
select (…) | all(group(customer) max(2) precision(12) order(-count()) each(output(sum(price))))
```

##### Hits per Group

Use `summary` to print the fields for a hit, and `max` to limit the number of hits per group.

An implicit limit can be specified through the [grouping.defaultMaxHits](reference/query-api-reference.html#grouping.defaultmaxhits) query parameter. This value will always be overridden if `max` is explicitly specified in the query. Use `max(inf)` to retrieve all hits when the query parameter is set.

###### Example

Example: Return the three most expensive parts per customer:

```
/search/?yql=select * from sources * where true | all(group(customer) each(max(3) each(output(summary()))))
```

Notes on ordering in the example above:

- The `order` clause is a directive for _group_ ordering, not _hit_ ordering. Here, there is no order clause on the groups, so the default ordering `-max(relevance())` is used. The _-_ denotes the sorting order; _-_ means descending (higher score first). In this case, the query is "all documents", so all groups are equally relevant and the group order is random.
- To order hits inside groups, use ranking. Add `ranking=pricerank` to the query to use the `pricerank` [rank profile](ranking.html) to rank by price:

  ```
  rank-profile pricerank inherits default {
      first-phase {
          expression: attribute(price)
      }
  }
  ```

##### Filter within a group

Use the `filter` clause to select which values to keep in a group. See the [reference](reference/grouping-syntax.html#filtering-groups) for details.

###### Example

Example: Sum the price per customer where `sales_rep` matches `Bonn.*` and the price was over 1000:

```
/search/?yql=select * from sources * where true | all(group(customer) filter(regex("Bonn.*", attributes{"sales_rep"}) and not range(0, 1000, price)) each(output(sum(price)) each(output(summary()))))
```

##### Global limit for grouping queries

Use the [grouping.globalMaxGroups](reference/query-api-reference.html#grouping.globalmaxgroups) query parameter to restrict execution of queries that are potentially too expensive in terms of compute and bandwidth. Queries that may return a result exceeding this threshold are failed preemptively. This limit is compared against the total number of groups and hits that the query could return in the worst case.

###### Examples

The following query may return 5 groups and 0 hits. It will be rejected when `grouping.globalMaxGroups < 5`.

```
select (…) | all(group(item) max(5) each(output(count())))
```

The following query may return 5 groups and 35 hits. It will be rejected when `grouping.globalMaxGroups < 5+5*7`.

```
select (…) | all( group(customer) max(5) each( output(count()) max(7) each(output(summary())) ) )
```

The following query may return 6 groups and 30 hits. It will be rejected when `grouping.globalMaxGroups < 2*(3+3*5)`.

```
select (…) | all( all(group(item) max(3) each(output(count()) max(5) each(output(summary())))) all(group(customer) max(3) each(output(count()) max(5) each(output(summary())))))
```

###### Combining with default limits for groups/hits

The `grouping.globalMaxGroups` restriction will use the [grouping.defaultMaxGroups](reference/query-api-reference.html#grouping.defaultmaxgroups)/[grouping.defaultMaxHits](reference/query-api-reference.html#grouping.defaultmaxhits) values for grouping statements without a `max`.
The two queries below are identical, assuming `defaultMaxGroups=5` and `defaultMaxHits=7`, and both will be rejected when `globalMaxGroups < 5+5*7`.

```
select (…) | all( group(customer) max(5) each( output(count()) max(7) each(output(summary())) ) )
```

```
select (…) | all( group(customer) each( output(count()) each(output(summary())) ) )
```

A grouping without `max` combined with `defaultMaxGroups=-1`/`defaultMaxHits=-1` will be rejected unless `globalMaxGroups=-1`. This is because the query produces an unbounded result: an infinite number of groups if `defaultMaxGroups=-1`, or an infinite number of summaries if `defaultMaxHits=-1`. An unintentional DoS (Denial of Service) could be the consequence if a query returns thousands of groups and summaries. This is why setting `globalMaxGroups=-1` is risky.

###### Recommended settings

The best practice is to always specify `max` in groupings, making it easy to reason about the worst-case cardinality of the query results. Performance will also benefit. Set `globalMaxGroups` to the overall worst-case result cardinality with some margin. The `defaultMaxGroups`/`defaultMaxHits` values should be overridden in a query profile if some groupings do not use `max` and the default values are too low, for example:

```
<query-profile id="default">
    <field name="grouping.defaultMaxGroups">20</field>
    <field name="grouping.defaultMaxHits">100</field>
    <field name="grouping.globalMaxGroups">8000</field>
</query-profile>
```

##### Performance and Correctness

Grouping is, by default, tuned to favor performance over correctness. Perfect correctness may not be achievable; results of queries using [non-default ordering](#ordering-and-limiting-groups) can be approximate, and correctness can only be partially achieved by a larger `precision` value that sacrifices performance.

The [grouping session cache](reference/grouping-syntax.html#grouping-session-cache) is enabled by default. Disabling it will improve correctness, especially for queries using `order` and `max`. The cost of multi-level grouping expressions will increase, though.

Consider increasing the [precision](#ordering-and-limiting-groups) value when using `max` in combination with `order`. The default precision may not achieve the required correctness for your use case.

##### Nested Groups

Groups can be nested. This offers great drill-down capabilities, as there are no limits to nesting depth or the information presented at any level.

Example: How much each customer has spent per day, by grouping on customer, then date:

```
select (…) | all(group(customer) each(group(time.date(date)) each(output(sum(price)))))
```

Use this to query for all items on a per-customer basis, displaying the most expensive hit for each customer, with subgroups of purchases on a per-date basis. Use the [summary](#hits-per-group) clause to show hits inside any group at any nesting level.
Include the sum price for each customer, both as a grand total and broken down on a per-day basis: ``` /search/?yql=select * from sources * where true limit 0| all(group(customer) each(max(1) output(sum(price)) each(output(summary()))) each(group(time.date(date)) each(max(10) output(sum(price)) each(output(summary()))))) &ranking=pricerank ``` | GroupId | sum(price) | | | | | | | --- | --- | --- | --- | --- | --- | --- | | Brown | $20 537 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-08 11:00 | $8 000 | 0.12 | Head | Brown | | | | GroupId | Sum(price) | | | | | | | 2006-09-08 | $8 000 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-08 11:00 | $8 000 | 0.12 | Head | Brown | | | 2006-09-09 | $3 400 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-09 11:00 | $3 400 | 0.12 | Oil pan | Brown | | | 2006-09-10 | $7 540 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-10 10:00 | $3 770 | 0.12 | Spring | Brown | | | | 2006-09-10 12:00 | $2 330 | 0.24 | Rocker arm | Brown | | | | 2006-09-10 11:00 | $1 440 | 0.12 | Exhaust valve | Brown | | | 2006-09-11 | $1 597 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-11 10:00 | $1 597 | 0.24 | Piston | Brown | | Jones | $39 816 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-11 12:00 | $9 870 | 0.12 | Exhaust port | Jones | | | | GroupId | Sum(price) | | | | | | | 2006-09-08 | $8 000 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-08 10:00 | $5 000 | 0.24 | Intake port | Jones | | | | 2006-09-08 12:00 | $3 000 | 0.12 | Valve cover | Jones | | | 2006-09-09 | $2 100 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-09 10:00 | $2 100 | 0,12 | Engine block | Jones | | | 2006-09-10 | $8 900 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-10 10:00 | $8 900 | 0.24 | Camshaft | Jones | | | 2006-09-11 | $20 816 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-11 12:00 | $9 870 | 0.12 | Exhaust port | Jones | | | | 2006-09-11 13:00 | $6 765 | 0.12 | Crankshaft | Jones | | | | 2006-09-11 12:00 | $4 181 | 0.24 | Rod bearing | Jones | | Smith | $19 484 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-10 11:00 | $6 100 | 0.24 | Spark plug | Smith | | | | GroupId | Sum(price) | | | | | | | 2006-09-06 | $1 000 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-06 09:00 | $1 000 | 0.24 | Intake valve | Smith | | | 2006-09-07 | $3 000 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-07 11:00 | $2 000 | 0.24 | Spring | Smith | | | | 2006-09-07 10:00 | $1 000 | 0.12 | Rocker arm | Smith | | | 2006-09-09 | $6 800 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-09 12:00 | $5 500 | 0.12 | Oil sump | Smith | | | | 2006-09-09 12:00 | $1 300 | 0.24 | Coolant | Smith | | | 2006-09-10 | $6 100 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-10 11:00 | $6 100 | 0.24 | Spark plug | Smith | | | 2006-09-11 | $2 584 | | | | | | | | Date | Price | Tax | Item | Customer | | | | 2006-09-11 11:00 | $2 584 | 0.12 | Connection rod | Smith | ##### Structured grouping Structured grouping is nested grouping over an array of structs or maps. In this case, each array element is treated as a sub-document and may be grouped separately. See the reference for grouping on[multivalue attributes](reference/grouping-syntax.html#multivalue-attributes)for details. 
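As a minimal sketch (assuming the `attributes` map used elsewhere in these examples), grouping on the map keys treats each map entry as its own sub-document:

```
select (…) | all( group(attributes.key) each(output(count())) )
```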
It is also possible to [filter the groups](reference/grouping-syntax.html#filtering-groups) so that only matching elements are considered. An example could be:

```
select (…) | all(group(attributes.value) filter(regex("delivery_method",attributes.key)) each(output(sum(price)) each(output(summary()))))
```

##### Range grouping

In the examples above, results are grouped on distinct values, like customer or date. To group on price:

```
select (…) | all(group(price) each(each(output(summary()))))
```

This gives one group per price. To group on price _ranges_, one could compress the price range. This gives prices in $0 - $999 in bucket 0, $1 000 - $1 999 in bucket 1, and so on:

```
select (…) | all(group(price/1000) each(each(output(summary()))))
```

An alternative is using [bucket expressions](reference/grouping-syntax.html#bucket-expressions) - think of a bucket as the range per group. Group on price, making the groups have a width of 1000:

```
select (…) | all(group(fixedwidth(price,1000)) each(each(output(summary()))))
```

Use `predefined` to configure group sizes individually:

```
select (…) | all( group(predefined(price, bucket(0,1000), bucket(1000,2000), bucket(2000,5000), bucket(5000,inf))) each(each(output(summary()))) )
```

This works with strings as well - put Jones and Smith in the second group:

```
select (…) | all(group(predefined(customer, bucket(-inf,"Jones"), bucket("Jones", inf))) each(each(output(summary()))))
```

... or have Jones in his own group:

```
select (…) | all(group(predefined(customer, bucket<-inf,"Jones">, bucket["Jones"], bucket<"Jones", inf>)) each(each(output(summary()))))
```

Use decimal numbers in bucket definitions if the expression evaluates to a double or float:

```
select (…) | all( group(predefined(tax, bucket(0.0, 0.2), bucket(0.2, 0.5), bucket(0.5, inf))) each( each(output(summary())) ) )
```

##### Pagination

Grouping supports [continuation](reference/grouping-syntax.html#continuations) objects that are passed as annotations to the grouping statement. The `continuations` annotation is a list of zero or more continuation strings, returned in the grouping result. For example, given the result:

```
{
    "root": {
        "children": [
            {
                "children": [
                    {
                        "children": [
                            {
                                "fields": { "count()": 7 },
                                "value": "Jones",
                                "id": "group:string:Jones",
                                "relevance": 1.0
                            }
                        ],
                        "continuation": {
                            "next": "BGAAABEBEBC",
                            "prev": "BGAAABEABC"
                        },
                        "id": "grouplist:customer",
                        "label": "customer",
                        "relevance": 1.0
                    }
                ],
                "continuation": { "this": "BGAAABEBCA" },
                "id": "group:root:0",
                "relevance": 1.0
            }
        ],
        "fields": { "totalCount": 20 },
        "id": "toplevel",
        "relevance": 1.0
    }
}
```

reproduce the same result by passing the _this_-continuation along with the original select:

```
select (…) | { 'continuations':['BGAAABEBCA'] }all(…)
```

To display the next page of customers, pass the _this_-continuation of the root group and the _next_ continuation of the customer list:

```
select (…) | { 'continuations':['BGAAABEBCA', 'BGAAABEBEBC'] }all(…)
```

To display the previous page of customers, pass the _this_-continuation of the root group and the _prev_ continuation of the customer list:

```
select (…) | { 'continuations':['BGAAABEBCA', 'BGAAABEABC'] }all(…)
```

The `continuations` annotation is an ordered list of continuation strings. These are combined by replacement, so that a continuation given later will replace any shared state with a continuation given before. Also, when using the `continuations` annotation, always pass the _this_-continuation as its first element.
**Note:** Continuations work best when the ordering of hits is stable - which can be achieved by using [ranking](ranking.html) or [ordering](reference/grouping-syntax.html#order). Adding a tie-breaker might be needed - like [random.match](reference/rank-features.html#random) or a random double value stored in each document - to keep the ordering stable in case of multiple documents that would otherwise get the same rank score or the same value used for ordering.

##### Expressions

Instead of just grouping on some attribute value, the `group` clause may contain arbitrarily complex expressions - see `group` in the [grouping reference](reference/grouping-syntax.html) for an exhaustive list. Examples:

- Select everything. For example, `group("all") each(output(sum(price)))` gives the total revenue
- Select the minimum or maximum of sub-expressions
- Addition, subtraction, multiplication, division, and even modulo of sub-expressions
- Bitwise operations on sub-expressions
- Concatenation of the results of sub-expressions

Sum the prices of purchases on a per-hour-of-day basis:

```
select (…) | all(group(mod(div(date,mul(60,60)),24)) each(output(sum(price))))
```

These types of expressions may also be used inside `output` operations, so instead of simply calculating the sum price of the grouped purchases, calculate the sum income after taxes per customer:

```
select (…) | all(group(customer) each(output(sum(mul(price,sub(1,tax))))))
```

Note that the validity of an expression depends on the current nesting level. For instance, while `sum(price)` would be a valid expression for a group of hits, `price` would not. As a general rule, each operator within an expression either applies to a single hit or aggregates values across a group.

##### Search Container API

As an alternative to the textual representation, one can use the programmatic API to execute grouping requests. This allows multiple grouping requests to run in parallel, and does not collide with the `yql` parameter - example:

```
@Override
public Result search(Query query, Execution execution) {
    // Create grouping request.
    GroupingRequest request = GroupingRequest.newInstance(query);
    request.setRootOperation(new AllOperation()
            .setGroupBy(new AttributeValue("foo"))
            .addChild(new EachOperation()
                    .addOutput(new CountAggregator().setLabel("count"))));

    // Perform grouping request.
    Result result = execution.search(query);

    // Process grouping result.
    Group root = request.getResultGroup(result);
    GroupList foo = root.getGroupList("foo");
    for (Hit hit : foo) {
        Group group = (Group)hit;
        Long count = (Long)group.getField("count");
        // TODO: Process group and count.
    }

    // Pass results back to calling searcher.
    return result;
}
```

Refer to the [API documentation](https://javadoc.io/doc/com.yahoo.vespa/container-search/latest/com/yahoo/search/grouping/package-summary.html) for the complete reference.

##### TopN / Full corpus

Simple grouping: count the number of documents in each group:

```
select * from purchase where true | all( group(customer) each(output(count())) )
```

Two parallel groupings:

```
select * from purchase where true | all( all( group(customer) each(output(count())) ) all( group(item) each(output(count())) ) )
```

Only the 1000 best hits will be grouped at each content node.
Lower accuracy, but higher speed: ``` select * from purchase where true limit 0 | all( max(1000) all( group(customer) each(output(count())) ) ) ``` ##### Selecting groups Do a modulo 3 operation before selecting the group: ``` select * from purchase where true limit 0 | all( group(price % 3) each(output(count())) ) ``` Do `price + tax * price` before selecting the group: ``` select * from purchase where true limit 0 | all( group(price + tax * price) each(output(count())) ) ``` ##### Ordering groups Do a modulo 5 operation before selecting the group - the groups are then ordered by their aggregated sum of attribute "tax": ``` select * from purchase where true limit 0 | all( group(price % 5) order(sum(tax)) each(output(count())) ) ``` Do `price + tax * price` before selecting the group. Ordering is given by the maximum value of attribute "price" in each group: ``` select * from purchase where true limit 0 | all( group(price + tax * price) order(max(price)) each(output(count())) ) ``` Take the average relevance of the groups and multiply it with the number of groups to get a cumulative count: ``` select * from purchase where true limit 0 | all( group(customer) order(avg(relevance()) * count()) each(output(count())) ) ``` One can not directly reference an attribute in the order clause, as this: ``` select * from purchase where true limit 0 | all( group(customer) order(price * count()) each(output(count())) ) ``` However, one can do this: ``` select * from purchase where true limit 0 | all( group(customer) order(max(price) * count()) each(output(count())) ) ``` ##### Collecting aggregates Simple grouping to count the number of documents in each group and return the best hit in each group: ``` select * from purchase where true limit 0 | all( group(customer) each( max(1) each(output(summary())) ) ) ``` Also return the sum of attribute "price": ``` select * from purchase where true limit 0 | all( group(customer) each(max(1) output(count(), sum(price)) each(output(summary()))) ) ``` Also, return an XOR of the 64 most significant bits of an MD5 over the concatenation of attributes "customer", "price" and "tax": ``` select * from purchase where true limit 0 | all(group(customer) each(max(1) output(count(), sum(price), xor(md5(cat(customer, price, tax), 64))) each(output(summary())))) ``` It is also possible to return quantiles, for instance, the p50 and p90 of the price. ``` select * from purchase where true limit 0 | all(group(customer) each(output(quantiles([0.5,0.9], price)))) ``` ##### Grouping Single-level grouping on "customer" attribute, returning at most 5 groups with full hit count as well as the 69 best hits. 
``` select * from purchase where true limit 0 | all(group(customer) max(5) each(max(69) output(count()) each(output(summary())))) ``` Two level grouping on "customer" and "item" attribute: ``` select * from purchase where true limit 0 | all(group(customer) max(5) each(output(count()) all(group(item) max(5) each(max(69) output(count()) each(output(summary())))))) ``` Three-level grouping on "customer", "item" and "attributes.key(coupon)" attribute: ``` select * from purchase where true limit 0 | all(group(customer) max(1) each(output(count()) all(group(item) max(1) each(output(count()) max(1) all(group(attributes.key) max(1) each(output(count()) each(output(summary())))))))) ``` As above, but also collect best hit in level 2: ``` select * from purchase where true limit 0 | all(group(customer) max(5) each(output(count()) all(group(item) max(5) each(output(count()) all(max(1) each(output(summary()))) all(group(attributes.key) max(5) each(max(69) output(count()) each(output(summary())))))))) ``` As above, but also collect best hit in level 1: ``` select * from purchase where true limit 0 | all(group(customer) max(5) each(output(count()) all(max(1) each(output(summary()))) all(group(item) max(5) each(output(count()) all(max(1) each(output(summary()))) all(group(attributes.key) max(5) each(max(69) output(count()) each(output(summary())))))))) ``` As above, but using different document summaries on each level: ``` select * from purchase where true limit 0 | all( group(customer) max(5) each(output(count()) all(max(1) each(output(summary(complexsummary)))) all(group(item) max(5) each(output(count()) all(max(1) each(output(summary(simplesummary)))) all(group(price) max(5) each(max(69) output(count()) each(output(summary(fastsummary)))))) ) ``` Deep grouping with counting and hit collection on all levels: ``` select * from purchase where true limit 0 | all( group(customer) max(5) each(output(count()) all(max(1) each(output(summary()))) all(group(item) each(output(count()) all(max(1) each(output(summary()))) all(group(price) each(output(count()) all(max(1) each(output(summary())))))))) ) ``` ##### Time and date The field (`time` below, but can have any name) must be a [long](reference/schema-reference.html#long), with second resolution (unix timestamp/epoch). See the [reference](reference/grouping-syntax.html#time-expressions) for all time-functions. Group by year: ``` select * from purchase where true limit 0 | all(group(time.year(date)) each(output(count()))) ``` Group by year, then by month: ``` select * from purchase where true limit 0 | all( group(time.year(date)) each(output(count()) all(group(time.monthofyear(date)) each(output(count())))) ) ``` Groups _today_, _yesterday_, _lastweek_, and _lastmonth_ using `predefined` aggregator, and groups each day within each of these separately: ``` select * from purchase where true limit 0 | all( group( predefined((now() - date) / (60 * 60 * 24), bucket(0,1), bucket(1,2), bucket(3,7), bucket(8,31)) ) each(output(count()) all(max(2) each(output(summary()))) all(group((now() - date) / (60 * 60 * 24)) each(output(count()) all(max(2) each(output(summary()))) ) ) ) ) ``` ###### Timezones in grouping The `timezone` query parameter can be used to rewrite each time-function with a timezone offset. See the [reference](reference/query-api-reference.html#timezone). 
Example: ``` $ vespa query "select * from purchase where true | \ all( group(time.hourofday(date)) each(output(count())) )" \ "timezone=America/Los_Angeles" ``` This query selects all documents from `purchase`, groups them by the hour they were made (adjusted to the local time in `America/Los_Angeles`), and counts how many purchases fall into each hour. ##### Counting unique groups The `count` aggregator can be applied on a list of groups to determine the number of unique groups without having to explicitly retrieve all groups. Note that this count is an estimate using HyperLogLog++, which is an algorithm for the count-distinct problem. To get an accurate count, one needs to explicitly retrieve all groups and count them in a custom component or in the middle tier calling out to Vespa. This is network intensive and might not be feasible in cases with many unique groups. Another use case for this aggregator is counting the number of unique instances matching a given expression. Output an estimate of the number of groups, which is equivalent to the number of unique values for attribute "customer": ``` select * from purchase where true limit 0 | all( group(customer) output(count()) ) ``` Output an estimate of the number of unique string lengths for the attribute "item": ``` select * from purchase where true limit 0 | all(group(strlen(item)) output(count())) ``` Output the sum of the "price" attribute for each group, in addition to the accurate count of the overall number of unique groups, as the inner `each` causes all groups to be returned: ``` select * from purchase where true limit 0 | all(group(customer) output(count()) each(output(sum(price)))) ``` The `max` clause is used to restrict the number of groups returned. The query outputs the sum for the 3 best groups. The `count` clause outputs the estimated number of groups (potentially >3). The `count` becomes an estimate here because the number of groups is limited by `max`, while in the above example it is not limited by `max`: ``` select * from purchase where true limit 0 | all(group(customer) max(3) output(count()) each(output(sum(price)))) ``` Output the number of top-level groups, and for the 10 best groups, output the number of unique values for attribute "item": ``` select * from purchase where true limit 0 | all(group(customer) max(10) output(count()) each(group(item) output(count()))) ``` ##### Counting unique groups - multivalue fields A [multivalue](/en/schemas.html#multivalue-field) attribute is a [weighted set](/en/reference/schema-reference.html#weightedset), [array](/en/reference/schema-reference.html#array) or [map](/en/reference/schema-reference.html#map). Most grouping functions will just handle the elements of multivalued attributes separately, as if they were all individual values in separate documents. If you are grouping over arrays of structs or maps, scoping will be used to preserve structure. Each entry in the array/map will be treated as a separate sub-document, so documents can be counted twice or more - see [#33646](https://github.com/vespa-engine/vespa/issues/33646) for details. This can be solved by adding an additional level of grouping, where you group on a field that is unique for each document (grouping on document id is not supported).
You may then count the unique groups to determine the unique document count: ``` select * from purchase where true limit 0 | all(group(customer) each(group(item) output(count()))) ``` ##### Impression forecasting Using impression logs for a given user, one can make a function that maps from rank score to the number of impressions an advertisement would get - example: ``` Score Integer (# impressions for this user) 0.200 0 0.210 1 0.220 2 0.240 3 0.320 4 0.420 5 0.560 6 0.700 7 0.800 8 0.880 9 0.920 10 0.940 11 0.950 12 ``` Storing just the first column (the rank scores, including a rank score for 0 impressions) in an array attribute named _impressions_, the grouping operation[interpolatedlookup(impressions, relevance())](reference/grouping-syntax.html#interpolatedlookup)can be used to figure out how many times a given advertisement would have been shown to this particular user. So if the rank score is 0.420 for a specific user/ad/bid combination, then `interpolatedlookup(impressions, relevance())` would return 5.0. If the bid is increased so the rank score gets to 0.490, it would get 5.5 as the return value instead. In this context, a count of 5.5 isn't meaningful for the past of a single user, but it gives more information that may be used as a forecast. Summing this across more, different users may then be used to forecast the total of future impressions for the advertisement. ##### Aggregating over all documents Grouping is useful for analyzing data. To aggregate over the full document set, create _one_ group (which will have _all_ documents) by using a constant (here 1) - example: ``` select rating from restaurant where true | all(group(1) each(output(avg(price)))) ``` Make sure all documents have a value for the given field, if not, NaN is used, and the final result is also NaN: ``` ``` { "id": "group:long:1", "relevance": 0.0, "value": "1", "fields": { "avg(rating)": "NaN" } } ``` ``` ##### Count fields with NaN Count number of documents missing a value for an [attribute](/en/attributes.html) field (actually, in this example, unset or less than 0, see the bucket expression below). Set a higher query timeout, just in case. Example, analyzing a field called _price_: ``` select rating from restaurant where true | all( group(predefined(price, bucket[-inf, 0>, bucket[0, inf>)) each(output(count())) ) ``` Example output, counting 2 documents with `-inf` in _rating_: ``` ``` "children": [ { "id": "group:long_bucket:-9223372036854775808:0", "relevance": 0.0, "limits": { "from": "-9223372036854775808", "to": "0" }, "fields": { "count()": 2 } }, { "id": "group:long_bucket:0:9223372036854775807", "relevance": 0.0, "limits": { "from": "0", "to": "9223372036854775807" }, "fields": { "count()": 8 } } ] ``` ``` See [analyzing field values](visiting.html#analyzing-field-values) for how to export ids of documents meeting given criteria from the full corpus. ##### List fields with NaN This is similar to the counting of NaN above, but instead of aggregating the count, for each hit, print a [document summary](/en/reference/schema-reference.html#document-summary): ``` select rating from restaurant where true | all( group(predefined(price, bucket[-inf, 0>, bucket[0, inf>)) order(max(price)) max(1) each( max(100) each(output(summary()))) ) ``` Notes: - We are only interested in the first group, so order by `max(price)` and use `max(1)` to get only the first - Uses `max(100)` in order to limit result set sizes. 
Read more about [grouping.defaultmaxhits](/en/reference/query-api-reference.html#grouping.defaultmaxhits). - Use the [continuation token](#pagination) to iterate over the result set. ##### Grouping over a Map field In the example data, a record looks like: ``` ``` { "fields": { "attributes": { "delivery_method": "Curbside Pickup", "sales_rep": "Bonnie", "coupon": "SAVE10" }, "customer": "Smith", "date": 1157526000, "item": "Intake valve", "price": "1000", "tax": "0.24" } } ``` ``` The map field [schema definition](/en/reference/schema-reference.html#map) is: ``` field attributes type map { indexing: summary struct-field key { indexing: attribute } struct-field value { indexing: attribute } } ``` With this, one can group on both key (`delivery_method`, `sales_rep`, and `coupon`) and values (here counting each value). Try the link to see the output: ``` select * from purchase where true limit 0 | all( group(attributes.key) each( group(attributes.value) each(output(count()))) ) ``` A more interesting example is to see the sum per sales rep: ``` select * from purchase where true limit 0 | all( group(attributes.key) each( group(attributes.value) each(output(sum(price)))) ) ``` Copyright © 2025 - [Cookie Preferences](#) ###### On this page: - [Grouping Interface](#) - [The grouping language structure](#the-grouping-language-structure) - [Grouping by example](#grouping-by-example) - [Basic Grouping](#basic-grouping) - [Ordering and Limiting Groups](#ordering-and-limiting-groups) - [Example](#ordering-and-limiting-groups-example) - [Hits per Group](#hits-per-group) - [Example](#hits-per-group-example) - [Filter within a group](#hits-per-group) - [Example](#filter-example) - [Global limit for grouping queries](#global-limit) - [Examples](#resource-control-example) - [Combining with default limits for groups/hits](#global-limit-combining) - [Recommended settings](#global-limit-recommendation) - [Performance and Correctness](#performance-and-correctness) - [Nested Groups](#nested-groups) - [Structured grouping](#structured-grouping) - [Range grouping](#range-grouping) - [Pagination](#pagination) - [Expressions](#expressions) - [Search Container API](#search-container-api) - [TopN / Full corpus](#topn-full-corpus) - [Selecting groups](#selecting-groups) - [Ordering groups](#ordering-groups) - [Collecting aggregates](#collecting-aggregates) - [Grouping](#grouping) - [Time and date](#time-and-date) - [Timezones in grouping](#timezone-grouping) - [Counting unique groups](#counting-unique-groups) - [Counting unique groups - multivalue fields](#counting-unique-groups-multivalue-fields) - [Impression forecasting](#impression-forecasting) - [Aggregating over all documents](#aggregating-over-all-documents) - [Count fields with NaN](#count-fields-with-nan) - [List fields with NaN](#list-fields-with-nan) - [Grouping over a Map field](#grouping-over-a-map-field) --- ## Healthchecks ### Healthchecks This is the reference for loadbalancer healthchecks to [containers](../jdisc). #### Healthchecks This is the reference for loadbalancer healthchecks to [containers](../jdisc). By default, a container configures an instance of [VipStatusHandler](https://github.com/vespa-engine/vespa/blob/master/container-core/src/main/java/com/yahoo/container/handler/VipStatusHandler.java) to serve `/status.html`. This will respond with status code 200 and text _OK_ if content clusters are UP. 
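For illustration, an external monitoring probe of this endpoint can be as simple as the following Python sketch (assuming a container listening on localhost port 8080; the path and the 200/_OK_ response are the defaults described above):

```
import requests

def container_is_up(host="localhost", port=8080) -> bool:
    """Probe the default VipStatusHandler endpoint at /status.html."""
    try:
        response = requests.get(f"http://{host}:{port}/status.html", timeout=2)
    except requests.RequestException:
        return False  # connection refused, timeout etc. - treat as down
    # 200 with body "OK" means the container considers itself in rotation
    return response.status_code == 200 and "OK" in response.text

if __name__ == "__main__":
    print("UP" if container_is_up() else "DOWN")
```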
See [VipStatus.java](https://github.com/vespa-engine/vespa/blob/master/container-core/src/main/java/com/yahoo/container/handler/VipStatus.java) for details. Applications with multiple content clusters should implement custom handlers for healthchecks, if the built-in logic is inadequate for the usage. Also refer to [federation](../federation.html) for how to manage data sources. ##### Override using a status file Use `container.core.vip-status` to make `VipStatusHandler` use a file for health status: ``` true /full-path-to/status-response.html ``` If the file exists, its contents will be served on `/status.html`, otherwise an error message will be generated. To remove a container from service, delete or rename the file to serve. ##### Alternative / multiple paths `VipStatusHandler` only looks at a single file path by default. As it is independent of the URI path, it is possible to configure multiple handler instances to serve alternative or custom messages - example: ``` http://*:*/docproc/freshness-data.xml true /full-path-to/freshness-data.xml http://*:*/docproc/ClusteringDocproc.status true /full-path-to/ClusteringDocproc.status ``` The paths `/docproc/freshness-data.xml` and `/docproc/ClusteringDocproc.status` serves the files located at `/full-path-to/freshness-data.xml` and `/full-path-to/ClusteringDocproc.status`, respectively. As the handler instances are independent, a container can be taken out of one type of rotation without affecting another. Copyright © 2025 - [Cookie Preferences](#) --- ## Hosts ### hosts.xml _hosts.xml_ is a configuration file in an [application package](../application-packages.html). #### hosts.xml _hosts.xml_ is a configuration file in an [application package](../application-packages.html). Elements: ``` hosts[host [name]](#host)[alias](#alias) ``` The purpose of _hosts.xml_ is to add aliases for real hostnames to self-defined aliases. The aliases are used in [services.xml](services.html) to map service instances to hosts. It is only needed when deploying to multiple hosts. ##### host Sub-elements: - [`alias`](#alias) Example: ``` ``` SEARCH0 CONTAINER0 SEARCH1 CONTAINER1 ``` ``` ##### alias Alias used in [services.xml](services.html) to refer to the host. Copyright © 2025 - [Cookie Preferences](#) --- ## Http Api Tutorial ### Building an HTTP API using request handlers and processors This tutorial builds a simple application consisting of these pieces: #### Building an HTTP API using request handlers and processors This tutorial builds a simple application consisting of these pieces: - A custom REST API - implemented in a _request handler_. - Two pieces of request/response processing logic - implemented as two chained _processors_. - A _component_ shared by the above processors. - A custom output format - a _renderer_. The end result is to process incoming request of the form: ``` http://hostname:port/demo?terms=something%20completely%20different ``` into a nested structure response produced by the processors and serialized by the renderer. Use the sample application found at [http-api-using-request-handlers-and-processors](https://github.com/vespa-engine/sample-apps/tree/master/examples/http-api-using-request-handlers-and-processors). ##### Request handler The custom request handler is required to implement a custom API. In many cases it is not necessary to add a custom handler as the Processors can access the request data directly. However, it is needed if e.g. 
your application wants more control over exactly which parameters are used to route to a particular processing chain. In this case, the request handler will simply add the request URI as a property and then forward to the built-in processing handler for processing. Review the code in [DemoHandler.java](https://github.com/vespa-engine/sample-apps/blob/master/examples/http-api-using-request-handlers-and-processors/src/main/java/ai/vespa/examples/DemoHandler.java) ##### Processors This application contains two processors, one for annotating the incoming request (using default values from config) and checking the result, and one for creating the result using the shared component. ###### AnnotatingProcessor Review the code in [AnnotatingProcessor.java](https://github.com/vespa-engine/sample-apps/blob/master/examples/http-api-using-request-handlers-and-processors/src/main/java/ai/vespa/examples/AnnotatingProcessor.java) ###### DataProcessor The other processor creates some structured Response Data from data handled to it in the request. This is done in cases where the web service is a processing service. In cases where the service is implementing some middleware on top of other services, similar processors will instead make outgoing requests to downstream web services to produce Response Data. Review the code in [DataProcessor.java](https://github.com/vespa-engine/sample-apps/blob/master/examples/http-api-using-request-handlers-and-processors/src/main/java/ai/vespa/examples/DataProcessor.java) Notice how the task of the server is decomposed into separate Processing steps which can be composed by chaining at configuration time and which communicates through the Request and Response only. This structure enhances sharing, reuse and modularity and makes it easy to create variations where some logic encapsulated in a Processor is added, removed or modified. The order of the processors is decided by the @Before and @After annotations - refer to [chained components](../components/chained-components.html). ###### Custom configuration The default terms used by the AnnotatingProcessor are placed in user configuration, where the definition is in [demo.def](https://github.com/vespa-engine/sample-apps/blob/master/examples/http-api-using-request-handlers-and-processors/src/main/resources/configdefinitions/demo.def): ``` package=com.mydomain.demo demo[].term string ``` In other words, a configuration class containing a single array named _demo_, containing a class Demo which only contains single string named _term_. ##### Renderer The responsibility of the renderer is to serialize the structured result into bytes for transport back to the client. Rendering works by first creating a single instance of the renderer, invoking the constructor, then cloning a new renderer for each result set to be rendered. `init()` will be invoked once on each new clone before `render()` is invoked. Review the code in [DemoRenderer.java](https://github.com/vespa-engine/sample-apps/blob/master/examples/http-api-using-request-handlers-and-processors/src/main/java/ai/vespa/examples/DemoRenderer.java) ##### Shared component The responsibility of this custom component is to decouple some parts of the application from the Searcher. This makes it possible to reconfigure the Searcher without rebuilding the potentially costly custom component. In this case, what the component does is more than a little silly. More typical use would be an [FSA](/en/operations/tools.html#vespa-makefsa) or complex, shared helper functionality. 
Review the code in [DemoComponent.java](https://github.com/vespa-engine/sample-apps/blob/master/examples/http-api-using-request-handlers-and-processors/src/main/java/ai/vespa/examples/DemoComponent.java) ##### Application Review the application's configuration in [services.xml](https://github.com/vespa-engine/sample-apps/blob/master/examples/http-api-using-request-handlers-and-processors/src/main/application/services.xml) ##### Try it! Build the project, then [run a test](../developer-guide.html), querying [http://localhost:8080/demo?terms=1%202%203%204](http://localhost:8080/demo?terms=1%202%203%204) gives: ``` OK Renderer initialized: 1369733374898 http://localhost:8080/demo?terms=1%202%203%204 1 2 3 4 Rendering finished work: 1369733374902 ``` --- ## Http Best Practices ### HTTP Best Practices As connections to a JDisc container cluster are terminated at the individual container nodes, the cost of connection overhead will impact their serving capability. #### HTTP Best Practices ##### Always re-use connections As connections to a JDisc container cluster are terminated at the individual container nodes, the cost of connection overhead will impact their serving capability. This is especially important for HTTPS/TLS as full TLS handshakes are expensive in terms of CPU cycles. A handshake also entails multiple network round-trips that noticeably degrade request latency for new connections. A client instance should therefore re-use HTTPS connections if possible for subsequent requests. Note that some client implementations may not re-use connections by default. For instance, _Apache HttpClient (Java)_ [will by default not re-use connections when configured with a client X.509 certificate](https://stackoverflow.com/a/13049131/1615280). Most programmatic clients require the response content to be fully consumed/read for a connection to be reused. ##### Use multiple connections Clients performing feed/query must use a sufficient number of connections to spread the load evenly among all containers in a cluster. This is due to container clusters being served through a layer 4 load balancer (_Network Load Balancer_). Too few connections overall may result in an unbalanced workload, and some containers may not receive any traffic at all. This aspect is particularly relevant for applications with large container clusters and/or few client instances. ##### Be aware of server-initiated connection termination Vespa Cloud will terminate idle connections after a timeout and active connections after a max age threshold is exceeded. The latter is performed gracefully through mechanisms in the HTTP protocol. - _HTTP/1.1_: A `Connection: close` header is added to the response for the subsequent request received after timeout. - _HTTP/2_: A `GOAWAY` frame with error code `NO_ERROR (0x0)` is returned for the subsequent request received after timeout. Be aware that some client implementations may not handle this scenario gracefully. Both the idle timeout and the max age threshold are aggressive in order to rebalance traffic regularly.
This ensures that new container nodes quickly receives traffic from existing client instances, for example when new resources are introduced by the [autoscaler](autoscaling.html). To avoid connection termination issues, clients should either set the `Connection: close` header to explicitly close connections after each request, or configure client-side idle timeouts to **30 seconds or less**. Doing so proactively closes idle connections before the server does and helps prevent errors caused by server-initiated terminations. ##### Prefer HTTP/2 We recommend _HTTP/2_ over _HTTP/1.1_. _HTTP/2_ multiplexes multiple concurrent requests over a single connection, and its binary protocol is more compact and efficient. See Vespa's documentation on [HTTP/2](/en/performance/http2.html) for more details. ##### Be deliberate with timeouts and retries Make sure to configure your clients with sensible timeouts and retry policies. Too low timeouts combined with aggressive retries may cause havoc on your Vespa application if latency increases due to overload. Handle _transient failures_ and _partial failures_ through a retry strategy with backoff, for instance _capped exponential backoff_ with a random _jitter_. Consider implementing a [_circuit-breaker_](https://martinfowler.com/bliki/CircuitBreaker.html) for failures persisting over a longer time-span. Only retry requests on _server errors_ - not on _client errors_. A client should typically not retry requests after receiving a `400 Bad Request` response, or retry a TLS connection after handshake fails with client's X.509 certificate being expired. Be careful when handling 5xx responses, especially `503 Service Unavailable` and `504 Gateway Timeout`. These responses typically indicate an overloaded system, and blindly retrying without backoff will only worsen the situation. Clients should reduce overall throughput when receiving such responses. The same principle applies to `429 Too Many Requests` responses from the [Document v1 API](/en/document-v1-api-guide.html), which indicates that the client is exceeding the system's feed capacity. Clients should implement strategies such as reducing the request rate by a specific percentage, introducing exponential backoff, or pausing requests for a short duration before retrying. These adjustments help prevent further overload and allow the system to recover. For more general advise on retries and timeouts see _Amazon Builder's Library_'s[excellent article](https://aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter/) on the subject. Copyright © 2025 - [Cookie Preferences](#) ###### On this page: - [Always re-use connections](#always-re-use-connections) - [Use multiple connections](#use-multiple-connections) - [Be aware of server-initiated connection termination](#be-aware-of-server-initiated-connection-termination) - [Prefer HTTP/2](#prefer-http2) - [Be deliberate with timeouts and retries](#be-deliberate-with-timeouts-and-retries) --- ## Http Server And Filters ### Configuring Http Servers and Filters This document explains how to set up http servers and filters in the Container. #### Configuring Http Servers and Filters This document explains how to set up http servers and filters in the Container. Before proceeding, familiarize with the [Developer Guide](../developer-guide.html). ##### Set up Http servers To accept http requests on e.g. 
port 8090, add an `http` section with a server to _services.xml_: ``` ``` ``` ``` To verify that the new server is running, check the default handler on the root path, which will return a list of all http servers: ``` $ curl http://localhost:8090/ ``` Adding an `http` section to _services.xml_**disables the default http server** at port 8080. Binding to privileged ports (\< 1024) is supported. Note that this **only** works when running as a standalone container, and **not** when running as a Vespa cluster. ###### Configure the HTTP Server Configuration settings for the server can be modified by setting values for the `jdisc.http.connector` config inside the `server` element: ``` ``` false ``` ``` Note that it is not allowed to set the `listenPort` in the http-server config, as it conflicts with the port that is set in the _port_ attribute in the _server_ element. For a complete list of configuration fields that can be set, refer to the config definition schema in [jdisc.http.connector.def](https://github.com/vespa-engine/vespa/blob/master/container-core/src/main/resources/configdefinitions/jdisc.http.jdisc.http.connector.def). ###### TLS TLS can be configured using either the [ssl](../reference/services-http.html#ssl) or the [ssl-provider](../reference/services-http.html#ssl-provider) element. ``` ``` /path/to/private-key.pem /path/to/certificate.pem /path/to/ca-certificates.pem want TLS_AES_128_GCM_SHA256, TLS_AES_256_GCM_SHA384, TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 TLSv1.2,TLSv1.3 ``` ``` Refer to the [multinode-HA](https://github.com/vespa-engine/sample-apps/tree/master/examples/operations/multinode-HA) sample application for an example. ##### Set up Filter Chains There are two main types of filters: - request filters - response filters Request filters run before the handler that processes the request, and response filters run after. They are used for tasks such as authentication, error checking and modifying headers. ###### Using Filter Chains Filter chains are set up by using the `request-chain` and `response-chain` elements inside the [filtering](../reference/services-http.html#filtering) element. Example setting up two request filter chains, and one response filter chain: ``` ``` ``` ``` Filters that should be used in more than one chain, must be defined directly in the `filtering` element, as shown with `request-filter1` in the example above. To actually use a filter chain, add one or more URI [bindings](../reference/services-http.html#binding): ``` ``` http://*/* http://*/* ``` ``` These bindings say that both the request chain and the response chain should be used when the request URI matches `http://*/*`. So both a request filter chain and a response filter chain can be used on a single request. However, only one request chain will be used if there are multiple request chains that have a binding that matches a request. And vice versa for response chains. Refer to the [javadoc](https://javadoc.io/doc/com.yahoo.vespa/jdisc_core/latest/com/yahoo/jdisc/application/UriPattern.html) for information about which chain that will be used in such cases. In order to bind a filter chain to a specific _server_, add the server port to the binding: ``` ``` http://*:8080/* http://*:9000/* ``` ``` A request must match a filter chain if any filter is configured. A 403 response is returned for non-matching request. This semantic can be disabled - see [strict-mode](../reference/services-http.html#filtering). 
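A simple way to verify filter bindings from the outside is to compare status codes for a covered and an uncovered path. The following Python sketch assumes a hypothetical setup where a request chain is bound only to `http://*/search/*` and strict mode is left at its default:

```
import requests

BASE = "http://localhost:8080"  # assumed container endpoint

# Path covered by the hypothetical binding http://*/search/* -
# the request passes through the filter chain and on to the handler
covered = requests.get(BASE + "/search/", params={"yql": "select * from sources * where true"})
print("covered path:", covered.status_code)

# Path not covered by any filter chain binding - with filters configured
# and strict mode enabled, the container responds with 403
uncovered = requests.get(BASE + "/status.html")
print("uncovered path:", uncovered.status_code)
```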
###### Excluding Filters from an Inherited Chain Say you have a request filter chain that you are binding to most of your URIs. Now, you want to run almost the same chain on another URI, but you need to exclude one of the filters. This is done by adding `excludes`, which takes a space separated list of filter ids, to the [chain element](../reference/services-http.html#chain). Example where a security filter is excluded from an inherited chain for _status.html_: ``` ``` http://*/status.html ``` ``` ###### Creating a custom Filter Create an [application package](../developer-guide.html) with artifactId `filter-bundle`. Create a new file `filter-bundle/components/src/main/java/com/yahoo/demo/TestRequestFilter.java`: ``` ``` package com.yahoo.demo; import com.yahoo.jdisc.*; import com.yahoo.jdisc.handler.*; import com.yahoo.jdisc.http.*; import com.yahoo.jdisc.http.filter.RequestFilter; import java.net.*; import java.nio.ByteBuffer; public class TestRequestFilter extends AbstractResource implements RequestFilter { @Override public void filter(HttpRequest httpRequest, ResponseHandler responseHandler) { if (isLocalAddress(httpRequest.getRemoteAddress())) { rejectRequest(httpRequest, responseHandler); } else { httpRequest.context().put("X-NOT-LOCALHOST", "true"); } } private boolean isLocalAddress(SocketAddress socketAddress) { if (socketAddress instanceof InetSocketAddress) { InetAddress address = ((InetSocketAddress)socketAddress).getAddress(); return address.isAnyLocalAddress() || address.isLoopbackAddress(); } else { return false; } } private void rejectRequest(HttpRequest request, ResponseHandler responseHandler) { HttpResponse response = HttpResponse.newInstance(request, Response.Status.FORBIDDEN); ContentChannel channel = responseHandler.handleResponse(response); channel.write(ByteBuffer.wrap("Not accessible by localhost.".getBytes()), null); channel.close(null); } } ``` ``` Build a bundle, and place it in the [application package](../application-packages.html)'s _components_ directory. Copyright © 2025 - [Cookie Preferences](#) ###### On this page: - [Set up Http servers](#set-up-http-servers) - [Configure the HTTP Server](#configure-the-http-server) - [TLS](#tls) - [Set up Filter Chains](#set-up-filter-chains) - [Using Filter Chains](#using-filter-chains) - [Creating a custom Filter](#creating-a-custom-filter) --- ## Http2 ### HTTP/2 This document contains HTTP/2 performance considerations on the container—see [Container tuning](container-tuning.html) for general tuning of container clusters. #### HTTP/2 This document contains HTTP/2 performance considerations on the container—see [Container tuning](container-tuning.html) for general tuning of container clusters. ##### Enabling HTTP/2 on container HTTP/2 is enabled by default on a container for all connectors. We recommend HTTP/2 with TLS, both for added security, but also for a more robust connection upgrade mechanism. Web browsers will typically only allow HTTP/2 over TLS. ###### HTTP/2 with TLS Both HTTP/1.1 and HTTP/2 will be served over the same connector using the [TLS ALPN Extension](https://datatracker.ietf.org/doc/html/rfc7301). The Application-Layer Protocol Negotiation (ALPN) extension allows the client to send a list of supported protocols during TLS handshake. The container selects a supported protocol from that list. The [HTTP/2 specification](https://datatracker.ietf.org/doc/html/rfc7540) dictates multiple requirements for the TLS connection. Vespa may enforce some or all of these restrictions. 
See the HTTP/2 specification for the full list. The most significant are listed below: - Client must use at least TLSv1.2. - Client must provide target domain with the TLS Server Name Indication (SNI) Extension. - Client must not use any of the banned [TLSv1.2 ciphers](https://datatracker.ietf.org/doc/html/rfc7540#appendix-A). ###### HTTP/2 without TLS The jdisc container supports both mechanism for HTTP/2 without TLS - see [testing](#testing): 1. Upgrading to HTTP/2 from HTTP/1 2. HTTP/2 with prior knowledge ##### Feeding over HTTP/2 One of the major improvements with HTTP/2 is multiplexing of multiple concurrent requests over a single TCP connection. This allows for high-throughput feeding through the [/document/v1/](../reference/document-v1-api-reference.html) HTTP API, with a simple one-operation–one-request model, but without the overhead of hundreds of parallel connections that HTTP/1.1 would require for sufficient concurrency. `vespa feed` in the [Vespa CLI](../vespa-cli.html#documents) and [vespa-feed-client](../vespa-feed-client.html) use /document/v1/ over HTTP/2. ##### Performance tuning ###### Client The number of multiple concurrent requests per connection is typically adjustable in HTTP/2 clients/libraries. Document v1 API is designed for high concurrency and can easily handle thousands of concurrent requests. Its implementation is asynchronous and max concurrency is not restricted by a thread pool size, so configure your client to allow enough concurrent requests/streams to saturate the feed container. Other APIs such as the [Query API](../query-api.html) is backed by a synchronous implementation, and max concurrency is restricted by the [underlying thread pool size](container-tuning.html#container-worker-threads). Too many concurrent streams may result in the container rejecting requests with 503 responses. There are also still some reasons to use multiple TCP connections—even with HTTP/2: - **Utilize multiple containers**. A single container may not saturate the content layer. A client may have to use more connections than container nodes if the containers are behind a load balancer. - **Higher throughput**. Many clients allow only for a single thread to operate each connection. Multiple connections may be required for utilizing several CPU cores. ##### Client recommendations Use [vespa-feed-client](../vespa-feed-client.html) for feeding through Document v1 API (JDK8+). We recommend the [h2load benchmarking tool](https://nghttp2.org/documentation/h2load-howto.html) for load testing. [vespa-fbench](/en/operations/tools.html#vespa-fbench) does not support HTTP/2 at the moment. For Java there are 4 good alternatives: 1. [Jetty Client](https://javadoc.jetty.org/jetty-11/org/eclipse/jetty/client/HttpClient.html) 2. [OkHttp](https://square.github.io/okhttp/) 3. [Apache HttpClient 5.x](https://hc.apache.org/httpcomponents-client-5.1.x/) 4. [java.net.http.HttpClient (JDK11+)](https://docs.oracle.com/en/java/javase/11/docs/api/java.net.http/java/net/http/HttpClient.html) ##### Testing The server does not perform a protocol upgrade if a request contains content (POST, PUT, PATCH with payload). This might be a limitation in Jetty, the HTTP server used in Vespa. 
Any client should assume HTTP/2 supported - example using `curl --http2-prior-knowledge`: ``` $ curl -i --http2-prior-knowledge \ -X POST -H 'Content-Type: application/json' \ --data-binary @ext/A-Head-Full-of-Dreams.json \ http://127.0.0.1:8080/document/v1/mynamespace/music/docid/a-head-full-of-dreamsHTTP/2 200date: Tue, 06 Dec 2022 11:04:13 GMT content-type: application/json;charset=utf-8 vary: Accept-Encoding content-length: 122 ``` Copyright © 2025 - [Cookie Preferences](#) ###### On this page: - [Enabling HTTP/2 on container](#enabling-http-2-on-container) - [HTTP/2 with TLS](#http-2-with-tls) - [HTTP/2 without TLS](#http-2-without-tls) - [Feeding over HTTP/2](#feeding-over-http-2) - [Performance tuning](#performance-tuning) - [Client](#client) - [Client recommendations](#client-recommendations) - [Testing](#testing) --- ## Hybrid Search ### Hybrid Text Search Tutorial Hybrid search combines different retrieval methods to improve search quality. #### Hybrid Text Search Tutorial Hybrid search combines different retrieval methods to improve search quality. This tutorial distinguishes between two core components of search: - **Retrieval**: Identifying a subset of potentially relevant documents from a large corpus. Traditional lexical methods like [BM25](../reference/bm25.html) excel at this, as do modern, embedding-based [vector search](../vector-search.html) approaches. - **Ranking**: Ordering retrieved documents by relevance to refine the results. Vespa's flexible [ranking framework](../ranking.html) enables complex scoring mechanisms. This tutorial demonstrates building a hybrid search application with Vespa that leverages the strengths of both lexical and embedding-based approaches. We'll use the [NFCorpus](https://www.cl.uni-heidelberg.de/statnlpgroup/nfcorpus/) dataset from the [BEIR](https://github.com/beir-cellar/beir) benchmark and explore various hybrid search techniques using Vespa's query language and ranking features. The main goal is to set up a text search app that combines simple text scoring features such as [BM25](../reference/bm25.html) [1](#fn:1) with vector search in combination with text-embedding models. We demonstrate how to obtain text embeddings within Vespa using Vespa's [embedder](/en/embedding.html#huggingface-embedder)functionality. In this guide, we use [snowflake-arctic-embed-xs](https://huggingface.co/Snowflake/snowflake-arctic-embed-xs) as the text embedding model. It is a small model that is fast to run and has a small memory footprint. **Prerequisites:** - Linux, macOS or Windows 10 Pro on x86\_64 or arm64, with Podman or [Docker](https://docs.docker.com/engine/install/) installed. See [Docker Containers](/en/operations-selfhosted/docker-containers.html) for system limits and other settings. For CPUs older than Haswell (2013), see [CPU Support](/en/cpu-support.html) - Memory: Minimum 4 GB RAM dedicated to Docker/Podman. [Memory recommendations](/en/operations-selfhosted/node-setup.html#memory-settings). - Disk: Avoid `NO_SPACE` - the vespaengine/vespa container image + headroom for data requires disk space. [Read more](/en/operations/feed-block.html). - [Homebrew](https://brew.sh/) to install the [Vespa CLI](/en/vespa-cli.html), or download the Vespa CLI from [Github releases](https://github.com/vespa-engine/vespa/releases). - Python3 - `curl` ##### Installing vespa-cli and ir\_datasets This tutorial uses [Vespa-CLI](../vespa-cli.html) to deploy, feed, and query Vespa. 
We also use [ir-datasets](https://ir-datasets.com/) to obtain the NFCorpus relevance dataset. ``` $ pip3 install --ignore-installed vespacli ir_datasets ir_measures requests ``` We can quickly look at a document from [nfcorpus](https://ir-datasets.com/beir.html#beir/nfcorpus): ``` $ ir_datasets export beir/nfcorpus docs --format jsonl | head -1 ``` Which outputs: ``` ``` {"doc_id": "MED-10", "text": "Recent studies have suggested that statins, an established drug group in the prevention of cardiovascular mortality, could delay or prevent breast cancer recurrence but the effect on disease-specific mortality remains unclear. We evaluated risk of breast cancer death among statin users in a population-based cohort of breast cancer patients. The study cohort included all newly diagnosed breast cancer patients in Finland during 1995\u20132003 (31,236 cases), identified from the Finnish Cancer Registry. Information on statin use before and after the diagnosis was obtained from a national prescription database. We used the Cox proportional hazards regression method to estimate mortality among statin users with statin use as time-dependent variable. A total of 4,151 participants had used statins. During the median follow-up of 3.25 years after the diagnosis (range 0.08\u20139.0 years) 6,011 participants died, of which 3,619 (60.2%) was due to breast cancer. After adjustment for age, tumor characteristics, and treatment selection, both post-diagnostic and pre-diagnostic statin use were associated with lowered risk of breast cancer death (HR 0.46, 95% CI 0.38\u20130.55 and HR 0.54, 95% CI 0.44\u20130.67, respectively). The risk decrease by post-diagnostic statin use was likely affected by healthy adherer bias; that is, the greater likelihood of dying cancer patients to discontinue statin use as the association was not clearly dose-dependent and observed already at low-dose/short-term use. The dose- and time-dependence of the survival benefit among pre-diagnostic statin users suggests a possible causal effect that should be evaluated further in a clinical trial testing statins\u2019 effect on survival in breast cancer patients.", "title": "Statin Use and Breast Cancer Survival: A Nationwide Cohort Study from Finland", "url": "http://www.ncbi.nlm.nih.gov/pubmed/25329299"} ``` ``` The NFCorpus documents have four fields: - The `doc_id` and `url` - The `text` and the `title` We are interested in the title and the text, and we want to be able to search across these two fields. We also need to store the `doc_id` to evaluate [ranking](../ranking.html)accuracy. We will create a small script that converts the above output to [Vespa JSON document](../reference/document-json-format.html) format. Create a `convert.py` file: ``` ``` import sys import json for line in sys.stdin: doc = json.loads(line) del doc['url'] vespa_doc = { "put": "id:hybrid-search:doc::%s" % doc['doc_id'], "fields": { **doc } } print(json.dumps(vespa_doc)) ``` ``` With this script, we convert the document dump to Vespa JSON format. Use the following command to convert the entire dataset to Vespa JSON format: ``` $ ir_datasets export beir/nfcorpus docs --format jsonl | python3 convert.py > vespa-docs.jsonl ``` Now, we will create the Vespa application package and schema to index the documents. ##### Create a Vespa Application Package A [Vespa application package](../application-packages.html) is a set of configuration files and optional Java components that together define the behavior of a Vespa system. 
Let us define the minimum set of required files to create our hybrid text search application: `doc.sd` and `services.xml`. ``` $ mkdir -p app/schemas ``` ###### Schema A [schema](../schemas.html) is a document-type configuration; a single Vespa application can have multiple schemas with document types. For this application, we define a schema `doc`, which must be saved in a file named `schemas/doc.sd` in the application package directory. Write the following to `app/schemas/doc.sd`: ``` schema doc { document doc { field language type string { indexing: "en" | set_language } field doc_id type string { indexing: attribute | summary match: word } field title type string { indexing: index | summary match: text index: enable-bm25 } field text type string { indexing: index | summary match: text index: enable-bm25 } } fieldset default { fields: title, text } field embedding type tensor(v[384]) { indexing: input title." ".input text | embed | attribute attribute { distance-metric: angular } } rank-profile bm25 { first-phase { expression: bm25(title) + bm25(text) } } rank-profile semantic { inputs { query(e) tensor(v[384]) } first-phase { expression: closeness(field, embedding) } } } ``` A lot is happening here; let us go through it in detail. ###### Document type and fields The `document` section contains the fields of the document, their types, and how Vespa should index and [match](/en/reference/schema-reference.html#match) them. The field property `indexing` configures the _indexing pipeline_ for a field. For more information, see [schemas - indexing](../schemas.html#indexing). The [string](../reference/schema-reference.html#string) data type represents both unstructured and structured texts, and there are significant differences between [index and attribute](../text-matching.html#index-and-attribute). The above schema includes default `match` modes for `attribute` and `index` property for visibility. Note that we are enabling [BM25](../reference/bm25.html) for `title` and `text`by including `index: enable-bm25`. The language field is the only field that is not the NFCorpus dataset. We hardcode its value to "en" since the dataset is English. Using `set_language` avoids automatic language detection and uses the value when processing the other text fields. Read more in [linguistics](../linguistics.html). ###### Fieldset for matching across multiple fields [Fieldset](../reference/schema-reference.html#fieldset) allows searching across multiple fields. Defining `fieldset` does not add indexing/storage overhead. String fields grouped using fieldsets must share the same [match](../reference/schema-reference.html#match) and [linguistic processing](../linguistics.html) settings because the query processing that searches a field or fieldset uses _one_ type of transformation. ###### Embedding inference Our `embedding` vector field is of [tensor](../tensor-user-guide.html) type with a single named dimension (`v`) of 384 values. ``` field embedding type tensor(v[384]) { indexing: input title." ".input text | embed arctic | attribute attribute { distance-metric: angular } } ``` The `indexing` expression creates the input to the `embed` inference call (in our example the concatenation of the title and the text field). Since the dataset is small, we do not specify `index` which would build [HNSW](../approximate-nn-hnsw.html) data structures for faster (but approximate) vector search. 
This guide uses [snowflake-arctic-embed-xs](https://huggingface.co/Snowflake/snowflake-arctic-embed-xs) as the text embedding model. The model is trained with cosine similarity, which maps to Vespa's `angular` [distance-metric](../reference/schema-reference.html#distance-metric) for nearestNeighbor search. ###### Ranking to determine matched documents ordering You can define many [rank profiles](../ranking.html), named collections of score calculations, and ranking phases. In this starting point, we have two simple rank-profile's: - a `bm25` rank-profile that uses [BM25](../reference/bm25.html). We sum the two field-level BM25 scores using a Vespa [ranking expression](../ranking-expressions-features.html). - a `semantic` rank-profile which is used in combination Vespa's nearestNeighbor query operator (vector search). Both profiles specify a single [ranking phase](../phased-ranking.html). ###### Services Specification The [services.xml](../reference/services.html) defines the services that make up the Vespa application — which services to run and how many nodes per service. Write the following to `app/services.xml`: ``` ``` cls Represent this sentence for searching relevant passages: 1 ``` ``` Some notes about the elements above: - `` defines the [container cluster](../jdisc/index.html) for document, query and result processing. - `` sets up the [query endpoint](../query-api.html). The default port is 8080. - `` sets up the [document endpoint](../reference/document-v1-api-reference.html) for feeding. - `` with type `hugging-face-embedder` configures the embedder in the application package. This includes where to fetch the model files from, the prepend instructions, and the pooling strategy. See [huggingface-embedder](../embedding.html#huggingface-embedder) for details and other embedders supported. - `` defines how documents are stored and searched. - `` denotes how many copies to keep of each document. - `` assigns the document types in the _schema_ to content clusters. ##### Deploy the application package Once we have finished writing our application package, we can deploy it. We use settings similar to those in the [Vespa quick start guide](../deploy-an-application-local.html). Start the Vespa container: ``` $ docker run --detach --name vespa-hybrid --hostname vespa-container \ --publish 8080:8080 --publish 19071:19071 \ vespaengine/vespa ``` Notice that we publish two ports: 8080 is the data-plane where we write and query documents, and 19071 is the control-plane where we can deploy the application. Note that the data-plane port is inactive before deploying the application. Configure the Vespa CLI to use the local container: ``` $ vespa config set target local ``` Starting the container can take a short while. Make sure that the configuration service is running by using `vespa status`. ``` $ vespa status deploy --wait 300 ``` Now, deploy the Vespa application from the `app` directory: ``` $ vespa deploy --wait 300 app ``` ##### Feed the data The data fed to Vespa must match the document type in the schema. This step performs embed inference inside Vespa using the snowflake arctic embedding model. Remember the `component` definition in `services.xml` and the `embed` call in the schema. 
``` $ vespa feed -t http://localhost:8080 vespa-docs.jsonl ``` The output should look like this (rates may vary depending on your machine HW): ``` ``` { "feeder.operation.count": 3633, "feeder.seconds": 148.515, "feeder.ok.count": 3633, "feeder.ok.rate": 24.462, "feeder.error.count": 0, "feeder.inflight.count": 0, "http.request.count": 3633, "http.request.bytes": 2985517, "http.request.MBps": 0.020, "http.exception.count": 0, "http.response.count": 3633, "http.response.bytes": 348320, "http.response.MBps": 0.002, "http.response.error.count": 0, "http.response.latency.millis.min": 316, "http.response.latency.millis.avg": 787, "http.response.latency.millis.max": 1704, "http.response.code.counts": { "200": 3633 } } ``` ``` Notice: - `feeder.ok.rate` which is the throughput (Note that this step includes embedding inference). See [embedder-performance](../embedding.html#embedder-performance) for details on embedding inference performance. In this case, embedding inference is the bottleneck for overall indexing throughput. - `http.response.code.counts` matches with `feeder.ok.count`. The dataset has 3633 documents. Note that if you observe any `429` responses, these are harmless. Vespa asks the client to slow down the feed speed because of resource contention. ##### Sample queries We can now run a few sample queries to demonstrate various ways to perform searches over this data using the [Vespa query language](../query-language.html). ``` $ ir_datasets export beir/nfcorpus/test queries --fields query_id text | head -1 ``` ``` PLAIN-2 Do Cholesterol Statin Drugs Cause Breast Cancer? ``` If you see a pipe related error from the above command, you can safely ignore it. Here, `PLAIN-2` is the query id of the first test query. We'll use this test query to demonstrate querying Vespa. ###### Lexical search with BM25 scoring The following query uses [weakAnd](../using-wand-with-vespa.html) and where `targetHits` is a hint of how many documents we want to expose to configurable [ranking phases](../phased-ranking.html). Refer to [text search tutorial](text-search.html#querying-the-data) for more on querying with `userInput`. ``` $ vespa query \ 'yql=select * from doc where {targetHits:10}userInput(@user-query)' \ 'user-query=Do Cholesterol Statin Drugs Cause Breast Cancer?' \ 'hits=1' \ 'language=en' \ 'ranking=bm25' ``` Notice that we choose `ranking` to specify which rank profile to rank the documents retrieved by the query. This query returns the following [JSON result response](../reference/default-result-format.html): ``` ``` { "root": { "id": "toplevel", "relevance": 1.0, "fields": { "totalCount": 46 }, "coverage": { "coverage": 100, "documents": 3633, "full": true, "nodes": 1, "results": 1, "resultsFull": 1 }, "children": [ { "id": "id:doc:doc::MED-10", "relevance": 25.521817426330887, "source": "content", "fields": { "sddocname": "doc", "documentid": "id:doc:doc::MED-10", "doc_id": "MED-10", "title": "Statin Use and Breast Cancer Survival: A Nationwide Cohort Study from Finland", "text": "Recent studies have suggested that statins, an established drug group in the prevention of cardiovascular mortality, could delay or prevent breast cancer recurrence but the effect on disease-specific mortality remains unclear. We evaluated risk of breast cancer death among statin users in a population-based cohort of breast cancer patients. The study cohort included all newly diagnosed breast cancer patients in Finland during 1995–2003 (31,236 cases), identified from the Finnish Cancer Registry. 
Information on statin use before and after the diagnosis was obtained from a national prescription database. We used the Cox proportional hazards regression method to estimate mortality among statin users with statin use as time-dependent variable. A total of 4,151 participants had used statins. During the median follow-up of 3.25 years after the diagnosis (range 0.08–9.0 years) 6,011 participants died, of which 3,619 (60.2%) was due to breast cancer. After adjustment for age, tumor characteristics, and treatment selection, both post-diagnostic and pre-diagnostic statin use were associated with lowered risk of breast cancer death (HR 0.46, 95% CI 0.38–0.55 and HR 0.54, 95% CI 0.44–0.67, respectively). The risk decrease by post-diagnostic statin use was likely affected by healthy adherer bias; that is, the greater likelihood of dying cancer patients to discontinue statin use as the association was not clearly dose-dependent and observed already at low-dose/short-term use. The dose- and time-dependence of the survival benefit among pre-diagnostic statin users suggests a possible causal effect that should be evaluated further in a clinical trial testing statins’ effect on survival in breast cancer patients." } } ] } } ``` ``` The query retrieves and ranks `MED-10` as the most relevant document—notice the `totalCount` which is the number of documents that were retrieved for ranking phases. In this case, we exposed about 50 documents to first-phase ranking, it is higher than our target, but also fewer than the total number of documents that match any query terms. In the example below, we change the grammar from the default `weakAnd` to `any`, and the query matches 1780, or almost 50% of the indexed documents. ``` $ vespa query \ 'yql=select * from doc where {targetHits:10, grammar:"any"}userInput(@user-query)' \ 'user-query=Do Cholesterol Statin Drugs Cause Breast Cancer?' \ 'hits=1' \ 'language=en' \ 'ranking=bm25' ``` The bm25 rank profile calculates the relevance score (~25.521), which is configured in the schema as: ``` rank-profile bm25 { first-phase { expression: bm25(title) + bm25(text) } } ``` So, in this case, `relevance` is the sum of the two BM25 scores. The retrieved document looks relevant; we can look at the graded judgment for this query `PLAIN-2`. The following exports the query relevance judgments (we grep for the query id that we are interested in): ``` $ ir_datasets export beir/nfcorpus/test qrels | grep "PLAIN-2 " ``` The following is the output from the above command. Notice line two, the `MED-10` document retrieved above, is judged as very relevant with the grade 2 (perfect) for the query\_id PLAIN-2. This dataset has graded relevance judgments where a grade of 1 is less relevant than 2. Documents retrieved by the system without a relevance judgment are assumed to be irrelevant (grade 0). 
``` PLAIN-2 0 MED-2427 2 PLAIN-2 0 MED-10 2 PLAIN-2 0 MED-2429 2 PLAIN-2 0 MED-2430 2 PLAIN-2 0 MED-2431 2 PLAIN-2 0 MED-14 2 PLAIN-2 0 MED-2432 2 PLAIN-2 0 MED-2428 1 PLAIN-2 0 MED-2440 1 PLAIN-2 0 MED-2434 1 PLAIN-2 0 MED-2435 1 PLAIN-2 0 MED-2436 1 PLAIN-2 0 MED-2437 1 PLAIN-2 0 MED-2438 1 PLAIN-2 0 MED-2439 1 PLAIN-2 0 MED-3597 1 PLAIN-2 0 MED-3598 1 PLAIN-2 0 MED-3599 1 PLAIN-2 0 MED-4556 1 PLAIN-2 0 MED-4559 1 PLAIN-2 0 MED-4560 1 PLAIN-2 0 MED-4828 1 PLAIN-2 0 MED-4829 1 PLAIN-2 0 MED-4830 1 ``` ###### Dense search using text embedding Now, we turn to embedding-based retrieval, where we embed the query text using the configured text-embedding model and perform an exact `nearestNeighbor` search. We use [embed query](../embedding.html#embedding-a-query-text) to produce the input tensor `query(e)`, defined in the `semantic` rank-profile in the schema. ``` $ vespa query \ 'yql=select * from doc where {targetHits:10}nearestNeighbor(embedding,e)' \ 'user-query=Do Cholesterol Statin Drugs Cause Breast Cancer?' \ 'input.query(e)=embed(@user-query)' \ 'hits=1' \ 'ranking=semantic' ``` This query returns the following [JSON result response](../reference/default-result-format.html): ``` ``` { "root": { "id": "toplevel", "relevance": 1.0, "fields": { "totalCount": 64 }, "coverage": { "coverage": 100, "documents": 3633, "full": true, "nodes": 1, "results": 1, "resultsFull": 1 }, "children": [ { "id": "id:doc:doc::MED-2429", "relevance": 0.6061378635706601, "source": "content", "fields": { "sddocname": "doc", "documentid": "id:doc:doc::MED-2429", "doc_id": "MED-2429", "title": "Statin use and risk of breast cancer: a meta-analysis of observational studies.", "text": "Emerging evidence suggests that statins' may decrease the risk of cancers. However, available evidence on breast cancer is conflicting. We, therefore, examined the association between statin use and risk of breast cancer by conducting a detailed meta-analysis of all observational studies published regarding this subject. PubMed database and bibliographies of retrieved articles were searched for epidemiological studies published up to January 2012, investigating the relationship between statin use and breast cancer. Before meta-analysis, the studies were evaluated for publication bias and heterogeneity. Combined relative risk (RR) and 95 % confidence interval (CI) were calculated using a random-effects model (DerSimonian and Laird method). Subgroup analyses, sensitivity analysis, and cumulative meta-analysis were also performed. A total of 24 (13 cohort and 11 case-control) studies involving more than 2.4 million participants, including 76,759 breast cancer cases contributed to this analysis. We found no evidence of publication bias and evidence of heterogeneity among the studies. Statin use and long-term statin use did not significantly affect breast cancer risk (RR = 0.99, 95 % CI = 0.94, 1.04 and RR = 1.03, 95 % CI = 0.96, 1.11, respectively). When the analysis was stratified into subgroups, there was no evidence that study design substantially influenced the effect estimate. Sensitivity analysis confirmed the stability of our results. Cumulative meta-analysis showed a change in trend of reporting risk of breast cancer from positive to negative in statin users between 1993 and 2011. Our meta-analysis findings do not support the hypothesis that statins' have a protective effect against breast cancer. 
More randomized clinical trials and observational studies are needed to confirm this association with underlying biological mechanisms in the future." } } ] } } ``` ``` The result of this vector-based search differs from the previous sparse keyword search, with a different relevant document at position 1. In this case, the relevance score is 0.606, calculated by the `closeness` function in the `semantic` rank-profile. Note that more documents were retrieved than the `targetHits`. ``` rank-profile semantic { inputs { query(e) tensor(v[384]) } first-phase { expression: closeness(field, embedding) } } ``` Here, [closeness(field, embedding)](../reference/rank-features.html#attribute-match-features-normalized) is a ranking feature that measures how similar the query and document embeddings are, using the configured distance metric. It returns the inverse of the distance between the two vectors, so a small distance gives a high closeness score. This matters because Vespa sorts results in descending order of relevance: the highest score appears at the top of the ranked list. Note that similarity scores of embedding vectors are often optimized via contrastive or ranking losses, which makes them difficult to interpret. ##### Evaluate ranking accuracy The previous section demonstrated how to combine the Vespa query language with rank profiles to implement two different retrieval and ranking strategies. In the following section, we evaluate all 323 test queries with both models to compare their overall effectiveness, measured using [nDCG@10](https://en.wikipedia.org/wiki/Discounted_cumulative_gain). `nDCG@10` is the official evaluation metric of the BEIR benchmark and is an appropriate metric for test sets with graded relevance judgments. For this evaluation task, we need to write a small script. The following script iterates over the queries in the test set, executes each query against the Vespa instance, and reads the response from Vespa. It then evaluates and prints the metric. The overall effectiveness is measured as the average of the per-query `nDCG@10` scores.
```
import requests
import ir_datasets
from ir_measures import calc_aggregate, nDCG, ScoredDoc
from enum import Enum
from typing import List


class RModel(Enum):
    SPARSE = 1
    DENSE = 2
    HYBRID = 3


def parse_vespa_response(response: dict, qid: str) -> List[ScoredDoc]:
    # Convert the Vespa JSON response to the ScoredDoc format used by ir_measures
    result = []
    hits = response['root'].get('children', [])
    for hit in hits:
        doc_id = hit['fields']['doc_id']
        relevance = hit['relevance']
        result.append(ScoredDoc(qid, doc_id, relevance))
    return result


def search(query: str, qid: str, ranking: str,
           hits=10, language="en", mode=RModel.SPARSE) -> List[ScoredDoc]:
    # Choose the retrieval (matching) strategy; the ranking profile is passed separately
    yql = "select doc_id from doc where ({targetHits:100}userInput(@user-query))"
    if mode == RModel.DENSE:
        yql = "select doc_id from doc where ({targetHits:10}nearestNeighbor(embedding, e))"
    elif mode == RModel.HYBRID:
        yql = "select doc_id from doc where ({targetHits:100}userInput(@user-query)) OR ({targetHits:10}nearestNeighbor(embedding, e))"
    query_request = {
        'yql': yql,
        'user-query': query,
        'ranking.profile': ranking,
        'hits': hits,
        'language': language
    }
    if mode == RModel.DENSE or mode == RModel.HYBRID:
        query_request['input.query(e)'] = "embed(@user-query)"
    response = requests.post("http://localhost:8080/search/", json=query_request)
    if response.ok:
        return parse_vespa_response(response.json(), qid)
    else:
        print("Search request failed with response " + str(response.json()))
        return []


def main():
    import argparse
    parser = argparse.ArgumentParser(description='Evaluate ranking models')
    parser.add_argument('--ranking', type=str, required=True, help='Vespa ranking profile')
    parser.add_argument('--mode', type=str, default="sparse",
                        help='retrieval mode, valid values are sparse, dense, hybrid')
    args = parser.parse_args()
    mode = RModel.HYBRID
    if args.mode == "sparse":
        mode = RModel.SPARSE
    elif args.mode == "dense":
        mode = RModel.DENSE

    dataset = ir_datasets.load("beir/nfcorpus/test")
    results = []
    metrics = [nDCG@10]
    for query in dataset.queries_iter():
        qid = query.query_id
        query_text = query.text
        results.extend(search(query_text, qid, args.ranking, mode=mode))
    metrics = calc_aggregate(metrics, dataset.qrels, results)
    print("Ranking metric NDCG@10 for rank profile {}: {:.4f}".format(args.ranking, metrics[nDCG@10]))


if __name__ == "__main__":
    main()
```
Then execute the script: ``` $ python3 evaluate_ranking.py --ranking bm25 --mode sparse ``` The script will produce the following output: ``` Ranking metric NDCG@10 for rank profile bm25: 0.3210 ``` Now, we can evaluate the dense model using the same script: ``` $ python3 evaluate_ranking.py --ranking semantic --mode dense ``` ``` Ranking metric NDCG@10 for rank profile semantic: 0.3077 ``` Note that the _average_ `nDCG@10` score is computed across all the 323 test queries. You can also experiment beyond a single metric and modify the script to calculate more [measures](https://ir-measur.es/en/latest/measures.html), for example, including precision with a relevance label cutoff of 2: ``` metrics = [nDCG@10, P(rel=2)@10] ``` Also note that the exact nDCG@10 values may vary slightly between runs. ##### Hybrid Search & Ranking We demonstrated and evaluated two independent retrieval and ranking strategies in the previous sections. Now, we want to explore hybrid search techniques where we combine: - traditional lexical keyword matching with a text scoring method (BM25) - embedding-based search using a text embedding model With Vespa, there is a distinction between retrieval (matching) and configurable [ranking](../ranking.html).
In the Vespa ranking phases, we can express arbitrary scoring complexity with the full power of the Vespa [ranking](../ranking.html) framework. Meanwhile, top-k retrieval relies on simple built-in functions associated with Vespa's top-k query operators. These top-k operators aim to avoid scoring all documents in the collection for a query by using a simplistic scoring function to identify the top-k documents. These top-k query operators use `index` structures to accelerate the query evaluation, avoiding scoring all documents using heuristics. In the context of hybrid text search, the following Vespa top-k query operators are relevant: - YQL `{targetHits:k}nearestNeighbor()` for dense representations (text embeddings) using a configured [distance-metric](../reference/schema-reference.html#distance-metric) as the scoring function. - YQL `{targetHits:k}userInput(@user-query)` which by default uses [weakAnd](../using-wand-with-vespa.html) for sparse representations. We can combine these operators using boolean query operators like AND/OR/RANK to express a hybrid search query. Then, there is a wild number of ways that we can combine various signals in [ranking](../ranking.html). ###### Define our first simple hybrid rank profile First, we can add our first simple hybrid rank profile that combines the dense and sparse components using multiplication to combine them into a single score. ``` closeness(field, embedding) * (1 + bm25(title) + bm25(text)) ``` - the [closeness(field, embedding)](../reference/rank-features.html#attribute-match-features-normalized) rank-feature returns a normalized score in the range 0 to 1 inclusive - Any of the per-field BM25 scores are in the range of 0 to infinity We add a bias constant (1) to avoid the overall score becoming 0 if the document does not match any query terms, as the BM25 scores would be 0. We also add `match-features` to be able to debug each of the scores. ``` schema doc { document doc { field language type string { indexing: "en" | set_language } field doc_id type string { indexing: attribute | summary match: word } field title type string { indexing: index | summary match: text index: enable-bm25 } field text type string { indexing: index | summary match: text index: enable-bm25 } } fieldset default { fields: title, text } field embedding type tensor(v[384]) { indexing: input title." ".input text | embed | attribute attribute { distance-metric: angular } } rank-profile hybrid { inputs { query(e) tensor(v[384]) } first-phase { expression: closeness(field, embedding) * (1 + (bm25(title) + bm25(text))) } match-features: bm25(title) bm25(text) closeness(field, embedding) } } ``` Now, re-deploy the Vespa application from the `app` directory: ``` $ vespa deploy --wait 300 app ``` After that, we can start experimenting with how to express hybrid queries using the Vespa query language. ###### Hybrid query examples The following demonstrates combining the two top-k query operators using the Vespa query language. In a later section, we will show how to combine the two retrieval strategies using the Vespa ranking framework. This section focuses on the top-k retrieval part that exposes matched documents to the Vespa [ranking](../ranking.html) phase(s). ###### Hybrid query using the OR operator The following query exposes documents to ranking that match the query using _either (OR)_ the sparse or dense representation. 
``` $ vespa query \ 'yql=select * from doc where ({targetHits:10}userInput(@user-query)) or ({targetHits:10}nearestNeighbor(embedding,e))' \ 'user-query=Do Cholesterol Statin Drugs Cause Breast Cancer?' \ 'input.query(e)=embed(@user-query)' \ 'hits=1' \ 'language=en' \ 'ranking=hybrid' ``` The documents retrieved into ranking is scored by the `hybrid` rank-profile. Note that both top-k query operators might expose more than the the `targetHits` setting. The above query returns the following [JSON result response](../reference/default-result-format.html): ``` ``` { "root": { "id": "toplevel", "relevance": 1.0, "fields": { "totalCount": 87 }, "coverage": { "coverage": 100, "documents": 3633, "full": true, "nodes": 1, "results": 1, "resultsFull": 1 }, "children": [ { "id": "id:doc:doc::MED-10", "relevance": 15.898915593367988, "source": "content", "fields": { "matchfeatures": { "bm25(text)": 17.35556767018612, "bm25(title)": 8.166249756144769, "closeness(field,embedding)": 0.5994655395517325 }, "sddocname": "doc", "documentid": "id:doc:doc::MED-10", "doc_id": "MED-10", "title": "Statin Use and Breast Cancer Survival: A Nationwide Cohort Study from Finland", "text": "Recent studies have suggested that statins, an established drug group in the prevention of cardiovascular mortality, could delay or prevent breast cancer recurrence but the effect on disease-specific mortality remains unclear. We evaluated risk of breast cancer death among statin users in a population-based cohort of breast cancer patients. The study cohort included all newly diagnosed breast cancer patients in Finland during 1995–2003 (31,236 cases), identified from the Finnish Cancer Registry. Information on statin use before and after the diagnosis was obtained from a national prescription database. We used the Cox proportional hazards regression method to estimate mortality among statin users with statin use as time-dependent variable. A total of 4,151 participants had used statins. During the median follow-up of 3.25 years after the diagnosis (range 0.08–9.0 years) 6,011 participants died, of which 3,619 (60.2%) was due to breast cancer. After adjustment for age, tumor characteristics, and treatment selection, both post-diagnostic and pre-diagnostic statin use were associated with lowered risk of breast cancer death (HR 0.46, 95% CI 0.38–0.55 and HR 0.54, 95% CI 0.44–0.67, respectively). The risk decrease by post-diagnostic statin use was likely affected by healthy adherer bias; that is, the greater likelihood of dying cancer patients to discontinue statin use as the association was not clearly dose-dependent and observed already at low-dose/short-term use. The dose- and time-dependence of the survival benefit among pre-diagnostic statin users suggests a possible causal effect that should be evaluated further in a clinical trial testing statins’ effect on survival in breast cancer patients." } } ] } } ``` ``` What is going on here is that we are combining the two top-k query operators using a boolean OR (disjunction). The `totalCount` is the number of documents retrieved into ranking (About 90, which is higher than 10 + 10). The `relevance` is the score assigned by `hybrid` rank-profile. Notice that the `matchfeatures` field shows all the feature scores. This is useful for debugging and understanding the ranking behavior, also for feature logging. 
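If you want to collect these match features programmatically, a small script can pull them out of the response. The following is a minimal sketch (not part of the sample application) that runs the same hybrid query with the Python `requests` library and prints the per-hit features; the endpoint, field names and rank profile are the ones used by this tutorial's local Docker setup:
```
import requests

# Run the hybrid OR query and print relevance plus match features per hit.
query_request = {
    "yql": "select * from doc where ({targetHits:10}userInput(@user-query)) "
           "or ({targetHits:10}nearestNeighbor(embedding,e))",
    "user-query": "Do Cholesterol Statin Drugs Cause Breast Cancer?",
    "input.query(e)": "embed(@user-query)",
    "ranking": "hybrid",
    "hits": 10,
    "language": "en",
}
response = requests.post("http://localhost:8080/search/", json=query_request).json()

for hit in response["root"].get("children", []):
    # matchfeatures is present because the hybrid profile declares match-features
    features = hit["fields"]["matchfeatures"]
    print(
        hit["fields"]["doc_id"],
        round(hit["relevance"], 3),
        {name: round(value, 3) for name, value in features.items()},
    )
```
Logged this way, the per-hit feature values can later be joined with relevance judgments when experimenting with ranking expressions offline.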
###### Hybrid query with AND operator The following combines the two top-k operators using AND, meaning that the retrieved documents must match both the sparse and dense top-k operators: ``` $ vespa query \ 'yql=select * from doc where ({targetHits:10}userInput(@user-query)) and ({targetHits:10}nearestNeighbor(embedding,e))' \ 'user-query=Do Cholesterol Statin Drugs Cause Breast Cancer?' \ 'input.query(e)=embed(@user-query)' \ 'hits=1' \ 'language=en' \ 'ranking=hybrid' ``` For the sparse keyword query matching, the `weakAnd` operator is used by default and it requires that at least one term in the query matches the document (fieldset searched). ###### Hybrid query with rank query operator The following combines the two top-k operators using the [rank](../reference/query-language-reference.html#rank) query operator, which allows us to retrieve using only the first operand of the rank operator, but where the remaining operands allow computing (match) features that can be used in ranking phases. This query is meaningful because we can use the computed features in the ranking expressions but retrieve only by the dense representation. This is usually the most resource-effective way to combine the two representations. ``` $ vespa query \ 'yql=select * from doc where rank(({targetHits:10}nearestNeighbor(embedding,e)), ({targetHits:10}userInput(@user-query)))' \ 'user-query=Do Cholesterol Statin Drugs Cause Breast Cancer?' \ 'input.query(e)=embed(@user-query)' \ 'hits=1' \ 'language=en' \ 'ranking=hybrid' ``` We can also invert the order of the operands to the `rank` query operator that retrieves by the sparse representation but uses the dense representation to compute features for ranking. This is very useful in cases where we do not want to build HNSW indexes (adds memory and slows down indexing), but still be able to use semantic signals in ranking phases. ``` $ vespa query \ 'yql=select * from doc where rank(({targetHits:10}userInput(@user-query)),({targetHits:10}nearestNeighbor(embedding,e)))' \ 'user-query=Do Cholesterol Statin Drugs Cause Breast Cancer?' \ 'input.query(e)=embed(@user-query)' \ 'hits=1' \ 'language=en' \ 'ranking=hybrid' ``` This way of performing hybrid retrieval allows retrieving only by the sparse representation and uses the dense vector representation to compute features for ranking. ##### Hybrid ranking In the previous section, we demonstrated combining the two top-k query operators using boolean query operators. This section will show combining the two retrieval strategies using the Vespa ranking framework. We can first start evaluating the effectiveness of the hybrid rank profile that combines the two retrieval strategies. ``` $ python3 evaluate_ranking.py --ranking hybrid --mode hybrid ``` Which outputs ``` Ranking metric NDCG@10 for rank profile hybrid: 0.3287 ``` The `nDCG@10` score is slightly higher than the profiles that only use one of the ranking strategies. Now, we can experiment with more complex ranking expressions that combine the two retrieval strategies. We add a few more rank profiles to the schema that combine the two retrieval strategies in different ways. 
``` schema doc { document doc { field language type string { indexing: "en" | set_language } field doc_id type string { indexing: attribute | summary match: word } field title type string { indexing: index | summary match: text index: enable-bm25 } field text type string { indexing: index | summary match: text index: enable-bm25 } } fieldset default { fields: title, text } field embedding type tensor(v[384]) { indexing: input title." ".input text | embed | attribute attribute { distance-metric: angular } } rank-profile hybrid { inputs { query(e) tensor(v[384]) } first-phase { expression: closeness(field, embedding) * (1 + (bm25(title) + bm25(text))) } match-features: bm25(title) bm25(text) closeness(field, embedding) } rank-profile hybrid-sum inherits hybrid { first-phase { expression: closeness(field, embedding) + ((bm25(title) + bm25(text))) } } rank-profile hybrid-normalize-bm25-with-atan inherits hybrid { function scale(val) { expression: 2*atan(val/8)/(3.14159) } function normalized_bm25() { expression: scale(bm25(title) + bm25(text)) } function cosine() { expression: cos(distance(field, embedding)) } first-phase { expression: normalized_bm25 + cosine } match-features { normalized_bm25 cosine bm25(title) bm25(text) } } rank-profile hybrid-rrf inherits hybrid-normalize-bm25-with-atan { function bm25_score() { expression: bm25(title) + bm25(text) } global-phase { rerank-count: 100 expression: reciprocal_rank(bm25_score) + reciprocal_rank(cosine) } match-features: bm25(title) bm25(text) bm25_score cosine } rank-profile hybrid-linear-normalize inherits hybrid-normalize-bm25-with-atan { function bm25_score() { expression: bm25(title) + bm25(text) } global-phase { rerank-count: 100 expression: normalize_linear(bm25_score) + normalize_linear(cosine) } match-features: bm25(title) bm25(text) bm25_score cosine } } ``` Now, re-deploy the Vespa application from the `app` directory: ``` $ vespa deploy --wait 300 app ``` Let us break down the new rank profiles: - `hybrid-sum` adds the two scores. This is the simplest way to combine them, but since the BM25 scores are unbounded while the closeness score is normalized (0-1), the BM25 scores will dominate the sum. - `hybrid-normalize-bm25-with-atan` combines a normalized BM25 score with the cosine similarity. The BM25 sum is squashed into a bounded range using the `atan` function. - `hybrid-rrf` fuses the two strategies with reciprocal rank fusion: the BM25 score and the cosine similarity are each converted to a reciprocal rank, and the two reciprocal ranks are summed. - `hybrid-linear-normalize` normalizes both scores to a comparable range with `normalize_linear` before summing them. The last two profiles use `global-phase` to rerank the top 100 documents using the reciprocal rank and linear normalization functions. This can only be done in the global phase, as it requires access to the scores of all documents retrieved into ranking; in a multi-node setup, this requires communication between the nodes and knowledge of the score distribution across all of them. In addition, each ranking phase can only order the documents by a single score. ###### Evaluate the new rank profiles Adding new rank-profiles is a hot change.
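Before running the evaluation, it can help to see what the two `global-phase` expressions compute in isolation. The following toy Python sketch (an illustration only, not how Vespa evaluates rank profiles) applies reciprocal rank fusion and min-max normalization to two made-up score lists; the constant `k=60` is an assumption matching a common default for reciprocal rank fusion:
```
# Toy illustration of the two fusion ideas used in the global-phase expressions above.

def reciprocal_rank(scores, k=60.0):
    """Map each score to 1/(k + rank), where rank 1 is the highest score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    rr = [0.0] * len(scores)
    for rank, i in enumerate(order, start=1):
        rr[i] = 1.0 / (k + rank)
    return rr

def normalize_linear(scores):
    """Min-max normalize scores to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    return [0.0 if hi == lo else (s - lo) / (hi - lo) for s in scores]

bm25_scores = [25.5, 12.3, 7.9, 0.0]      # unbounded lexical scores
cosine_scores = [0.61, 0.58, 0.43, 0.55]  # bounded semantic scores

rrf_fused = [a + b for a, b in zip(reciprocal_rank(bm25_scores), reciprocal_rank(cosine_scores))]
linear_fused = [a + b for a, b in zip(normalize_linear(bm25_scores), normalize_linear(cosine_scores))]
print(rrf_fused)
print(linear_fused)
```
The point to notice is that both approaches remove the scale difference between the unbounded BM25 sum and the bounded cosine score, which is exactly why they are applied in the global phase where all retrieved hits can be compared.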
Once we have deployed the application, we can evaluate the new hybrid profiles using the script: ``` $ python3 evaluate_ranking.py --ranking hybrid-sum --mode hybrid ``` ``` Ranking metric NDCG@10 for rank profile hybrid-sum: 0.3244 ``` ``` $ python3 evaluate_ranking.py --ranking hybrid-normalize-bm25-with-atan --mode hybrid ``` ``` Ranking metric NDCG@10 for rank profile hybrid-normalize-bm25-with-atan: 0.3410 ``` ``` $ python3 evaluate_ranking.py --ranking hybrid-rrf --mode hybrid ``` ``` Ranking metric NDCG@10 for rank profile hybrid-rrf: 0.3207 ``` ``` $ python3 evaluate_ranking.py --ranking hybrid-linear-normalize --mode hybrid ``` ``` Ranking metric NDCG@10 for rank profile hybrid-linear-normalize: 0.3387 ``` On this particular dataset, the `hybrid-normalize-bm25-with-atan` rank profile performs the best, but the differences are small. This also demonstrates that hybrid search and ranking is a complex problem and that the effectiveness of a hybrid model depends on the dataset and the retrieval strategies. These results (which profile is best) might not transfer to your specific retrieval use case and dataset, so it is important to evaluate the effectiveness of a hybrid model on your specific dataset. See [Improving retrieval with LLM-as-a-judge](https://blog.vespa.ai/improving-retrieval-with-llm-as-a-judge/) for more information on how to collect relevance judgments for your dataset. ###### Summary We showed how to express hybrid queries using the Vespa query language and how to combine the two retrieval strategies using the Vespa ranking framework. We also showed how to evaluate the effectiveness of the hybrid ranking models using one of the datasets that are part of the BEIR benchmark. We hope this tutorial has given you a good understanding of how to combine different retrieval strategies using Vespa, and that there is no single silver bullet for all retrieval problems. ##### Cleanup ``` $ docker rm -f vespa-hybrid ``` 1. Robertson, Stephen and Zaragoza, Hugo and others, 2009. The probabilistic relevance framework: BM25 and beyond. Foundations and Trends in Information Retrieval.
[↩](#fnref:1) --- ## Ide Support ### IDE support Vespa provides plugins for working with schemas and rank profiles in IDEs: #### IDE support Vespa provides plugins for working with schemas and rank profiles in IDEs: - VSCode: [VS Code extension](https://marketplace.visualstudio.com/items?itemName=vespaai.vespa-language-support) - IntelliJ, PyCharm or WebStorm: [Jetbrains plugin](https://plugins.jetbrains.com/plugin/18074-vespa-schema-language-support) - Vim: [neovim](https://blog.vespa.ai/interns-languageserver/#neovim-plugin) If you are working with non-trivial Vespa applications, installing a plugin is highly recommended! ![IDE demo](/assets/img/ide.gif) --- ## Idealstate ### Distribution Algorithm The distribution algorithm decides what nodes should be responsible for a given bucket. #### Distribution Algorithm The distribution algorithm decides what nodes should be responsible for a given bucket. This is used directly in the clients to calculate which distributor to talk to. Content nodes need time to move buckets when the distribution is changing, so routing to content nodes is done using the tracked current state. The distribution algorithm decides which content nodes should store the bucket copies, though, and due to this, the algorithm is also referred to as the ideal state algorithm. The input to the distribution algorithm is a bucket identifier, together with knowledge about what nodes are available, and what their capacities are. The output of the distribution algorithm is a sorted list of the available nodes. The first node in the order is the node most preferred to handle a given bucket. Currently, the highest order distributor node will be the owning distributor, and the redundancy factor decides how many of the highest order content nodes are preferred to store copies for a bucket. To enable minimal transfer of buckets when the list of available nodes changes, the removal or addition of nodes should not alter the sort order of the remaining nodes. Desired qualities for the ideal state algorithm: | Minimal reassignment on cluster state change | - If a node goes down, only buckets that resided on that node should be reassigned. - If a node comes up, only buckets that are moved to the new node should relocate. - Increasing the capacity of a single node should only move buckets to that node. - Reducing the capacity of a single node should only move buckets away from that node. | | No skew in distribution | - Nodes should get an amount of data relative to their capacity.
| | Lightweight | - A simple algorithm that is easy to understand is a plus. Being lightweight to calculate is also a plus, giving more options for how to use it, without needing to cache results. | ##### Computational cost When considering how efficient the algorithm has to be, it is important to consider how often we need to calculate the ideal locations. Calculations are needed for the following tasks: - A client needs to map buckets to the distributors. If few buckets exist, all the results can be cached in clients, but for larger clusters, a lot of buckets may need to exist to create an even distribution, and caching becomes more memory intensive. Preferably the computational cost is cheap enough that no caching is needed. Currently, no caching is done by clients, but there are typically fewer than a million buckets, so caching all results would still have been viable. - Distributors need to calculate the ideal state for a single bucket to verify that incoming operations are mapped to the correct distributor (clients have a cluster state matching the distributor). This could be eliminated for buckets pre-existing in the bucket database, which would be true in almost all cases. Currently, the calculation is done for all requests. - Distributors need to calculate the correct content nodes to create bucket copies on when operations to currently non-existing buckets come in. This typically only happens at the start of the cluster lifetime, though. Normally, buckets are created through splitting or joining existing buckets. - Distributors need to calculate the ideal state to check if any maintenance operations need to be done for a bucket. - Content nodes need to calculate the ideal state for a single bucket to verify that the correct distributor sent the request. This could be cached or served through the bucket database, but currently there is no need. As long as the algorithm is cheap, we can avoid needing to cache the result. The cache will then not limit scalability, and we have fewer dependencies and less complexity within the content layer. The current algorithm has proven cheap enough that little caching has been needed. ##### A simple example: Modulo A simple approach would be to use a modulo operation to find the most preferred node, and then just order the nodes in configured order from there, skipping nodes that are currently not available: $$\text{most preferred node} = \text{bucket id} \bmod \text{node count}$$ Properties: - Computationally lightweight and easy to understand. - Perfect distribution among nodes. - Total redistribution on state change. By just skipping currently unavailable nodes, nodes can go down and up with minimal movement. However, if the number of configured nodes changes, practically all buckets will be redistributed. As the content layer is intended to be scalable, this breaks one of the intentions, and this algorithm has thus not been considered. ##### Weighted random election This is the algorithm currently used for distribution in the content layer, as it fits our use case well. To avoid a total redistribution on state change, the mapping cannot be heavily dependent on the number of nodes in the cluster. By using random numbers, we can distribute the buckets randomly between the nodes in such a fashion that altering the cluster state has a small impact. As we need the result to be reproducible, we obviously need to use a pseudo-random number generator and not real random numbers. The idea is as follows.
To find the location of a given bucket, seed a random number generator with the bucket identifier, then draw one number for each node. The drawn numbers decide the preferred node order for that specific bucket. For this to be reproducible, all nodes need to draw the same numbers each time. Each node is assigned a distribution key in the configuration. This key decides which random number the node will be assigned. For instance, a node with distribution key 13 will be assigned the 14th random number generated (as the first goes to the node with key 0). The existence of this node then also requires us to always generate at least 14 random numbers to do the calculation. Thus, one may end up calculating random numbers for nodes that are currently not available, either because they are temporarily down, or because the configuration has left holes in the distribution key space. It is recommended not to leave large holes in the distribution key space, to avoid wasting computation. Using this approach, if you add another node to the cluster, it will draw a number for each bucket. It will thus steal ownership of some of the buckets. As all the numbers are random, it will steal buckets from all the other nodes; thus, given that the bucket count is large compared to the number of nodes, it will steal on average 1/n of the buckets from each pre-existing node, where n is the number of nodes in the current cluster. Likewise, if a node is removed from the cluster, the remaining nodes will divide the extra load between them. ###### Weighting nodes By enforcing all the numbers drawn to be floating point numbers between 0 and 1, we can introduce node weights using the following formula: $${r}^{\frac{1}{c}}$$ where r is the floating point number between 0 and 1 that was drawn for a given node, and c is the node capacity, which is the weight of the node. The proof is not included here, but this ends up giving each node, on average, an amount of data relative to its capacity. That is, for any two nodes X and Y, the number of buckets given to X should equal the number of buckets given to Y multiplied by capacity(X)/capacity(Y) (given a perfect random distribution). Altering the weight in a running system will also create a minimal redistribution of data. If we reduce a node's capacity, its weighted numbers will be lower, and some of its buckets will be taken over by the other nodes; vice versa if the capacity is increased. Properties: - Minimum data movement on state changes. - Some skew, depending on how good the random number generator is, the number of nodes we have to divide buckets between, and the number of buckets we have to divide between them. - Fairly cheap to compute given a reasonable number of nodes and an inexpensive pseudo-random number generator. ###### Distribution skew The algorithm does generate a bit of skew in the distribution, as it is essentially random. The following attributes decrease the skew: - Having more buckets to distribute. - Having fewer targets (nodes and partitions) to distribute buckets to. - Having a more uniform pseudo-random function. The more buckets exist, the more metadata needs to be tracked in the distributors though, and operations that want to scan all the buckets will take longer. Additionally, the backend may want buckets above a given size to improve performance, storage efficiency or similar. Consequently, we typically want to enforce enough buckets for a decent distribution, but not more.
When the number of nodes increases, more buckets need to exist to keep the distribution even. If the number of nodes is doubled, the number of buckets must typically more than double to keep the distribution equally even. Thus, this scales worse than linearly. It does not scale much worse, though, and this has not proved to be a practical problem for the cluster sizes we have used up until now. (A cluster size of a thousand nodes does not seem to be any issue here.) Having a good and uniform pseudo-random function makes the distribution more even. However, this may require more computationally heavy generators. Currently, we are using a simple and fast algorithm, and it has proved more than sufficient for our needs. The distribution to distributors is done to create an even distribution between the nodes. The distributors are free to split the buckets further if the backend wants buckets to contain less data. They cannot use fewer buckets than are needed for distribution, though. By using a minimum number of buckets for distribution, the distributors have more freedom to control the sizes of buckets. ###### Distribution waste To measure how many buckets are needed to create a decent distribution, a metric is needed. We have defined a waste metric for this purpose as follows: distribute the buckets across all the units (nodes or partitions), assuming all buckets are of identical size, and assume the unit with the most buckets assigned to it is at 100% capacity. The wasted space is then the percentage of unused capacity compared to the used capacity. This definition seems useful as a cluster is considered at full capacity once one of its partitions is at full capacity. Having one node with more buckets than the rest is thus damaging, while having one node with fewer buckets than the rest is just fine. Example: There are 4 nodes distributing 18 buckets. The node with the most buckets has 6. Distribution waste is `100% * (4 * 6 - 18) / (4 * 6) = 25%`. Below we have calculated waste based on the number of nodes and the number of buckets to distribute between them. Bits refer to distribution bits used. A distribution bit count of 16 indicates that there will be \(2^{16}\) buckets. The calculations assume all buckets have the same size. This is normally close to true, as documents are randomly assigned to buckets. There will be lots of buckets per node too, so a little variance typically evens out fairly well. The tables below assume only one partition exists on each node. If you have 4 partitions on 16 nodes, you should rather use the values for `4 * 16 = 64` nodes. A higher redundancy factor indicates more buckets to distribute between the same number of nodes, resulting in a more even distribution. Doubling the redundancy has the same effect as adding one to the distribution bit count. To get values for redundancy 4, the redundancy 2 values can be used, and then the waste will be equal to the value with one less distribution bit used. ###### Calculated waste from various cluster sizes A value of 1 indicates 100% waste. A value of 0.1 indicates 10% waste. A waste below 1% is shown green, below 10% as yellow and below 30% as orange. Red indicates more than 30% waste.
###### Distribution with redundancy 1: | Bits \ Nodes | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | | 1 | 0.0000 | 0.0000 | 0.3333 | 0.5000 | 0.6000 | 0.6667 | 0.7143 | 0.7500 | 0.7778 | 0.8000 | 0.8182 | 0.8333 | 0.8462 | 0.8571 | 0.8667 | | 2 | 0.0000 | 0.3333 | 0.3333 | 0.5000 | 0.2000 | 0.3333 | 0.4286 | 0.5000 | 0.5556 | 0.6000 | 0.6364 | 0.6667 | 0.6923 | 0.7143 | 0.7333 | | 3 | 0.0000 | 0.2000 | 0.1111 | 0.3333 | 0.2000 | 0.3333 | 0.6190 | 0.6667 | 0.8222 | 0.8400 | 0.8545 | 0.8333 | 0.6923 | 0.7143 | 0.7333 | | 4 | 0.0000 | 0.1111 | 0.1111 | 0.3333 | 0.3600 | 0.3333 | 0.4286 | 0.5000 | 0.7778 | 0.8000 | 0.8182 | 0.8095 | 0.6923 | 0.7143 | 0.6444 | | 5 | - | 0.0588 | 0.1111 | 0.2727 | 0.2889 | 0.4074 | 0.2381 | 0.3333 | 0.8129 | 0.8316 | 0.8469 | 0.8519 | 0.8359 | 0.8367 | 0.8359 | | 6 | - | 0.0000 | 0.0725 | 0.1579 | 0.1467 | 0.1111 | 0.1688 | 0.3846 | 0.7037 | 0.7217 | 0.7470 | 0.7460 | 0.7265 | 0.6952 | 0.6718 | | 7 | - | 0.0725 | 0.0519 | 0.0857 | 0.0857 | 0.1111 | 0.2050 | 0.2000 | 0.4530 | 0.4667 | 0.5152 | 0.5152 | 0.4530 | 0.3905 | 0.3436 | | 8 | - | 0.0000 | 0.0078 | 0.0725 | 0.0857 | 0.0922 | 0.1293 | 0.1351 | 0.1634 | 0.1742 | 0.1688 | 0.2381 | 0.2426 | 0.2967 | 0.3173 | | 9 | - | 0.0039 | 0.0192 | 0.1467 | 0.1607 | 0.1203 | 0.1080 | 0.1111 | 0.1380 | 0.1322 | 0.1218 | 0.1795 | 0.1962 | 0.2381 | 0.2580 | | 10 | - | 0.0019 | 0.0275 | 0.0922 | 0.0898 | 0.0623 | 0.0741 | 0.0922 | 0.1111 | 0.1018 | 0.1218 | 0.1203 | 0.1438 | 0.1688 | 0.1675 | | 11 | - | 0.0019 | 0.0234 | 0.0430 | 0.0385 | 0.0248 | 0.0248 | 0.0483 | 0.0636 | 0.0648 | 0.0737 | 0.0725 | 0.0894 | 0.0800 | 0.0958 | | 12 | - | - | 0.0121 | 0.0285 | 0.0282 | 0.0121 | 0.0149 | 0.0571 | 0.0577 | 0.0562 | 0.0549 | 0.0412 | 0.0510 | 0.0439 | 0.0616 | | 13 | - | - | 0.0074 | 0.0019 | 0.0070 | 0.0177 | 0.0304 | 0.0303 | 0.0337 | 0.0189 | 0.0252 | 0.0358 | 0.0409 | 0.0501 | 0.0385 | | 14 | - | - | 0.0041 | 0.0024 | 0.0037 | 0.0027 | 0.0145 | 0.0073 | 0.0101 | 0.0130 | 0.0220 | 0.0234 | 0.0290 | 0.0248 | 0.0195 | | 15 | - | - | 0.0019 | 0.0021 | 0.0036 | 0.0083 | 0.0059 | 0.0056 | 0.0101 | 0.0097 | 0.0123 | 0.0163 | 0.0150 | 0.0186 | 0.0173 | | 16 | - | - | 0.0010 | 0.0007 | 0.0010 | 0.0030 | 0.0049 | 0.0039 | 0.0085 | 0.0072 | 0.0097 | 0.0108 | 0.0135 | 0.0141 | 0.0115 | | 17 | - | - | - | - | - | 0.0030 | 0.0033 | 0.0024 | 0.0036 | 0.0030 | 0.0055 | 0.0091 | 0.0135 | 0.0156 | 0.0143 | | 18 | - | - | - | - | - | - | 0.0019 | - | 0.0029 | 0.0027 | 0.0043 | 0.0040 | 0.0066 | 0.0061 | 0.0060 | | 19 | - | - | - | - | - | - | - | - | 0.0019 | - | 0.0021 | 0.0030 | 0.0023 | 0.0031 | 0.0042 | | 20 | - | - | - | - | - | - | - | - | - | - | - | 0.0029 | 0.0025 | 0.0037 | 0.0044 | | 21 | - | - | - | - | - | - | - | - | - | - | - | - | 0.0026 | 0.0035 | 0.0040 | ###### Distribution with redundancy 2: | Bits \ Nodes | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | | 1 | 0.0000 | 0.0000 | 0.3333 | 0.5000 | 0.6000 | 0.6667 | 0.4286 | 0.5000 | 0.5556 | 0.6000 | 0.6364 | 0.6667 | 0.6923 | 0.7143 | 0.7333 | | 2 | 0.0000 | 0.0000 | 0.3333 | 0.3333 | 0.2000 | 0.3333 | 0.4286 | 0.5000 | 0.5556 | 0.6000 | 0.6364 | 0.6667 | 0.6923 | 0.4286 | 0.4667 | | 3 | 0.0000 | 0.0000 | 0.1111 | 0.2000 | 0.2000 | 0.3333 | 0.4286 | 0.5000 | 0.7037 | 0.7333 | 0.7576 | 0.7778 | 0.7949 | 0.7714 | 0.7333 | | 4 | 0.0000 | 0.0000 | 0.1111 | 0.2000 | 0.2000 | 0.3333 | 0.3469 | 0.2000 | 0.7460 | 0.7714 | 0.7762 | 0.7778 | 0.7949 | 0.7714 | 0.7630 | | 5 | - | - | 0.0725 | 0.1579 | 0.2471 | 0.2381 | 0.2967 | 0.2727 | 0.7265 | 
0.7538 | 0.7673 | 0.7778 | 0.7949 | 0.7922 | 0.7968 | | 6 | - | - | 0.0519 | 0.1111 | 0.1742 | 0.1467 | 0.2050 | 0.2381 | 0.6908 | 0.7023 | 0.7016 | 0.7117 | 0.7265 | 0.7229 | 0.7247 | | 7 | - | - | 0.0303 | 0.0154 | 0.0340 | 0.0303 | 0.0857 | 0.1111 | 0.4921 | 0.4880 | 0.4828 | 0.4797 | 0.5077 | 0.4622 | 0.4667 | | 8 | - | - | 0.0078 | 0.0303 | 0.0248 | 0.0623 | 0.0857 | 0.0725 | 0.0970 | 0.1322 | 0.1049 | 0.1293 | 0.1620 | 0.1873 | 0.2242 | | 9 | - | - | 0.0019 | 0.0266 | 0.0519 | 0.0466 | 0.0682 | 0.0791 | 0.0824 | 0.0519 | 0.0691 | 0.0519 | 0.0623 | 0.0741 | 0.0898 | | 10 | - | - | 0.0063 | 0.0173 | 0.0154 | 0.0275 | 0.0116 | 0.0340 | 0.0558 | 0.0294 | 0.0452 | 0.0466 | 0.0567 | 0.0501 | 0.0584 | | 11 | - | - | 0.0078 | 0.0049 | 0.0154 | 0.0177 | 0.0149 | 0.0210 | 0.0275 | 0.0177 | 0.0252 | 0.0303 | 0.0305 | 0.0344 | 0.0317 | | 12 | - | - | - | 0.0073 | 0.0112 | 0.0192 | 0.0231 | 0.0312 | 0.0296 | 0.0177 | 0.0278 | 0.0358 | 0.0245 | 0.0312 | 0.0385 | | 13 | - | - | - | 0.0061 | 0.0049 | 0.0096 | 0.0112 | 0.0201 | 0.0218 | 0.0088 | 0.0077 | 0.0199 | 0.0138 | 0.0304 | 0.0317 | | 14 | - | - | - | 0.0059 | 0.0058 | 0.0058 | 0.0057 | 0.0092 | 0.0128 | 0.0082 | 0.0139 | 0.0081 | 0.0096 | 0.0199 | 0.0213 | | 15 | - | - | - | - | 0.0014 | 0.0039 | 0.0052 | 0.0034 | 0.0051 | 0.0085 | 0.0044 | 0.0072 | 0.0107 | 0.0101 | 0.0082 | | 16 | - | - | - | - | 0.0016 | 0.0030 | 0.0026 | 0.0036 | 0.0065 | 0.0051 | 0.0061 | 0.0084 | 0.0065 | 0.0083 | 0.0100 | | 17 | - | - | - | - | - | - | 0.0010 | 0.0020 | 0.0028 | - | 0.0040 | 0.0049 | 0.0067 | 0.0071 | 0.0062 | | 18 | - | - | - | - | - | - | - | - | 0.0032 | - | 0.0024 | - | 0.0034 | 0.0056 | 0.0041 | | 19 | - | - | - | - | - | - | - | - | - | - | - | - | 0.0025 | 0.0018 | - | ###### Distribution with redundancy 2: | Bits \ Nodes | 16 | 20 | 32 | 48 | 64 | 100 | 128 | 160 | 200 | 256 | 350 | 500 | 800 | 1000 | 5000 | | 8 | 0.2000 | 0.3081 | 0.2727 | 0.5152 | 0.5294 | 0.5733 | 0.6364 | 0.7091 | 0.7673 | 0.8000 | 0.8537 | 0.8862 | 0.8933 | 0.8976 | 0.9659 | | 9 | 0.0725 | 0.2242 | 0.1795 | 0.1795 | 0.3043 | 0.3173 | 0.3846 | 0.5077 | 0.5345 | 0.6364 | 0.7340 | 0.7952 | 0.8400 | 0.8720 | 0.9317 | | 10 | 0.0725 | 0.1322 | 0.1233 | 0.2099 | 0.1579 | 0.2415 | 0.3333 | 0.5733 | 0.4611 | 0.5789 | 0.6558 | 0.7269 | 0.8293 | 0.8425 | 0.8976 | | 11 | 0.0340 | 0.0857 | 0.0922 | 0.1111 | 0.1233 | 0.1969 | 0.2558 | 0.5937 | 0.5643 | 0.5897 | 0.5965 | 0.6099 | 0.6587 | 0.7591 | 0.8830 | | 12 | 0.0448 | 0.0385 | 0.0623 | 0.1065 | 0.0986 | 0.1285 | 0.3725 | 0.3831 | 0.4064 | 0.4074 | 0.4799 | 0.4880 | 0.5124 | 0.8328 | 0.8976 | | 13 | 0.0340 | 0.0328 | 0.0554 | 0.0699 | 0.0623 | 0.0948 | 0.1049 | 0.2183 | 0.2344 | 0.3191 | 0.3498 | 0.4539 | 0.5733 | 0.6656 | 0.8870 | | 14 | 0.0140 | 0.0189 | 0.0376 | 0.0452 | 0.0466 | 0.0717 | 0.0986 | 0.1057 | 0.1047 | 0.2242 | 0.2853 | 0.2798 | 0.4064 | 0.4959 | 0.8830 | | 15 | 0.0094 | 0.0118 | 0.0385 | 0.0268 | 0.0331 | 0.0638 | 0.0708 | 0.0775 | 0.0898 | 0.1322 | 0.2133 | 0.2104 | 0.3550 | 0.4446 | 0.8752 | | 16 | 0.0097 | 0.0081 | 0.0380 | 0.0303 | 0.0362 | 0.0577 | 0.0501 | 0.0627 | 0.0717 | 0.1033 | 0.1733 | 0.1678 | 0.2586 | 0.3101 | 0.8511 | | 17 | 0.0075 | 0.0066 | 0.0346 | 0.0293 | 0.0154 | 0.0258 | 0.0466 | 0.0546 | 0.0704 | 0.1041 | 0.1469 | 0.1983 | 0.2702 | 0.2972 | 0.7740 | | 18 | 0.0053 | 0.0057 | 0.0098 | 0.0098 | 0.0122 | 0.0149 | 0.0238 | 0.0300 | 0.0394 | 0.0353 | 0.0434 | 0.0553 | 0.0611 | 0.1782 | 0.6334 | | 19 | - | 0.0022 | 0.0050 | 0.0162 | 0.0098 | 0.0133 | 0.0149 | 0.0220 | 0.0242 | 0.0252 | 0.0333 | 0.0398 
| 0.0495 | 0.0999 | 0.5145 | | 20 | - | - | 0.0030 | 0.0107 | 0.0088 | 0.0098 | 0.0144 | 0.0140 | 0.0148 | 0.0203 | 0.0195 | 0.0255 | 0.0348 | 0.1133 | 0.4481 | | 21 | - | - | 0.0043 | 0.0063 | 0.0051 | 0.0074 | 0.0079 | 0.0085 | 0.0086 | 0.0113 | 0.0147 | 0.0170 | 0.0237 | 0.1068 | 0.4422 | | 22 | - | - | - | 0.0026 | 0.0035 | 0.0037 | 0.0082 | 0.0061 | 0.0077 | 0.0087 | 0.0101 | 0.0134 | 0.0193 | 0.1140 | 0.4635 | | 23 | - | - | - | 0.0019 | - | 0.0026 | 0.0080 | 0.0055 | 0.0056 | 0.0057 | 0.0063 | 0.0096 | 0.0155 | 0.1294 | 0.4982 | | 24 | - | - | - | 0.0013 | - | - | 0.0074 | 0.0060 | 0.0058 | 0.0053 | 0.0049 | 0.0068 | 0.0112 | 0.0471 | 0.3219 | | 25 | - | - | - | - | - | - | - | - | - | 0.0043 | 0.0043 | 0.0058 | 0.0067 | 0.0512 | 0.2543 | | 26 | - | - | - | - | - | - | - | - | - | - | 0.0040 | 0.0042 | 0.0043 | 0.0051 | 0.0210 | | 27 | - | - | - | - | - | - | - | - | - | - | - | - | 0.0028 | 0.0157 | 0.0814 | ###### Default number of distribution bits used Note that changing the amount of distribution bits used will change what buckets exist, which will change the distribution considerably. We thus do not want to alter the distribution bit count too often. Ideally, the users would be allowed to configure minimal and maximal acceptable waste, and the current amount of distribution bits could then just be calculated on the fly. But as computing the waste values above are computational heavy, especially with many nodes and many distribution bits, currently only a couple of profiles are available for you to configure. **Vespa Cloud note:** Vespa Cloud locks distribution bit count to 16. This is because Vespa Cloud offers auto-scaling of nodes, and such a scaling decision should not implicitly lead to a full redistribution of data by crossing a distribution bit node count boundary. 16 bits strikes a good balance of low skew and high performance for most production deployments. ###### Loose mode (default) The loose mode allows for more waste, allowing the amount of nodes to change considerably without altering the distribution bit counts. | Node count | 1-4 | 5-199 | 200-\> | | Distribution bit count | 8 | 16 | 24 | | Max calculated waste \*) | 3.03 % | 7.17 % | ? | | Minimum buckets/node \*\*) | 256 - 64 | 13108 - 329 | 83886 - | ###### Strict mode (not default) The strict mode attempts to keep the waste below 1.0 %. When it needs to increase the bit count it increases the bit count significantly to allow considerable more growth before having to adjust the count again. | Node count | 1-4 | 5-14 | 15-199 | 200-799 | 800-1499 | 1500-4999 | 5000-\> | | Distribution bit count | 8 | 16 | 21 | 25 | 28 | 30 | 32 | | Max calculated waste \*) | 3 % | 0.83 % | 0.86 % | 0.67 % | ? | ? | ? | | Minimum buckets/node \*\*) | 256 - 64 | 13107 - 4681 | 139810 - 10538 | 167772 - 41995 | 335544 - 179076 | 715827 - 214791 | 858993 - | \*) Max calculated waste, given redundancy 2 and the max node count in the given range, as shown in the table above. (Note that this assumes equal sized buckets, and that every possible bucket exist. In a real system there will be random variation). \*\*) Given a node count and distribution bits, there is a minimum number of buckets enforced to exist. However, splitting due to bucket size may increase this count beyond this number. This value shows the maximum value of the minimum. 
(That is the number of buckets per node enforced for the lowest node count in the range) Ideally one wants to have few buckets enforced by distribution and rather let bucket size split buckets, as that leaves more freedom to users. ##### Q/A **Q: I have a cluster with multiple groups, with the same number of nodes (more than one) in each group. Why does the first node in the first group store a slightly different number of documents than the first node in the second group (and so on)?** A: This is both expected and intentional—to see why we must look at how the ideal state algorithm works. As previously outlined, the ideal state algorithm requires 3 distinct inputs: 1. The ID of the bucket to be replicated across content nodes. 2. The set of all nodes (i.e. unique distribution keys) in the cluster _across_ all groups, and their current availability state (Down, Up, Maintenance etc.). 3. The cluster topology and replication configuration. The topology includes knowledge of all groups. From this the algorithm returns a deterministic, ordered sequence of nodes (i.e. distribution keys) across all configured groups. The ordering of nodes is given by their individual pseudo-random node _score_, where higher scoring nodes are considered more _ideal_ for storing replicas for a given bucket. The set of nodes in this sequence respects the constraints given by the configured group topology and replication level. When computing node scores within a group, the _absolute_ distribution keys are used rather than a node's _relative_ ordering within the group. This means the individual node scores—and consequently the distribution of bucket replicas—within one group is different (with a very high probability) from all other groups. What the ideal state algorithm ensures is that there exists a deterministic, configurable number of replicas per bucket within each group and that they are evenly distributed across each group's nodes—the exact mapping can be considered an unspecified "implementation detail". The rationale for using absolute distribution keys rather than relative ordering is closely related to the earlier discussion about why [modulo distribution](#a-simple-example-modulo) is a poor choice. Let \(N\_g \gt 1\) be the number of nodes in a given group: - A relative ordering means that removing—or just reordering—a single node from the configuration can potentially lead to a full redistribution of all data within that group, not just \( \frac{1}{N\_g} \) of the data. Imagine for instance moving a node from being first in the group to being the last. - If we require nodes with the same relative index in each group to store the same data set (i.e. a row-column strategy), this immediately suffers in failure scenarios even when just a single node becomes unavailable. Data coverage in the group remains reduced until the node is replaced, as no other nodes can take over responsibility for the data. This is because removing the node leads to the problem in the previous point, where a disproportionally large amount of data must be moved due to the relative ordering changing. With the ideal state algorithm, the remaining nodes in the group will transparently assume ownership of the data, with each node receiving an expected \( \frac{1}{N\_g - 1} \) of the unavailable node's buckets. 
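To make the algorithm above concrete, here is a minimal Python sketch of the weighted random election (an illustration under simplifying assumptions, not Vespa's actual implementation, which uses its own pseudo-random generator): it seeds a generator with the bucket id, draws one number per distribution key, weights each draw by node capacity as \(r^{1/c}\), and orders the available nodes by the resulting score.
```
import random

def ideal_node_order(bucket_id, capacities, up_nodes):
    """capacities maps distribution key -> node capacity; returns distribution keys, most preferred first."""
    rng = random.Random(bucket_id)          # reproducible: same bucket id gives the same draws
    scores = {}
    for key in range(max(capacities) + 1):  # draw for every key, also holes and unavailable nodes
        r = rng.random()
        if key in capacities:
            scores[key] = r ** (1.0 / capacities[key])
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [key for key in ranked if key in up_nodes]

# Example: three nodes where node 2 has twice the capacity; with redundancy 2,
# the two most preferred available nodes store the bucket copies.
capacities = {0: 1.0, 1: 1.0, 2: 2.0}
print(ideal_node_order(0x42, capacities, up_nodes={0, 1, 2})[:2])
```
With redundancy 2, the two highest-scoring available nodes hold the bucket copies; adding a node only adds one more draw per bucket, so on average it takes over roughly 1/n of the buckets, which is the minimal-movement property discussed above.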
--- ## Index Bootstrap ### Index bootstrap When bootstrapping an index, one must consider node resource configuration and number of nodes. #### Index bootstrap When bootstrapping an index, one must consider node resource configuration and number of nodes. The strategy is to iterate: ![Growing a Vespa cluster in steps](/assets/img/index-bootstrap.svg) 1. Feed smaller chunks of data 2. Evaluate 3. Deploy new node counts / node resource configuration 4. Wait for data migration to complete 5. Evaluate While doing this, ensure the cluster is **never more than 50% full** - this gives headroom to later increase/shrink the index and change schema configuration easily using automatic reindexing. It is easy to downscale resources after the bootstrap, and it saves a lot of time keeping the clusters within limits - hence max 50%. Review the [Vespa Overview](/en/overview.html) to understand the difference between _container_ and _content_ clusters before continuing. ##### Preparations The content node resource configuration should not have ranges for index bootstrap, as autoscaling will interfere with the evaluation in this step. This is a good starting point, **make sure there are no ranges like [2,3]**: ``` ``` ``` ``` To evaluate how full the content cluster is, use [metrics](monitoring) from content nodes - example: ``` $ curl \ --cert data-plane-public-cert.pem \ --key data-plane-private-key.pem \ https://vespacloud-docsearch.vespa-team.aws-us-east-1c.z.vespa-app.cloud/prometheus/v1/values | \ egrep 'disk.util|mem.util' | egrep 'clusterId="content/' ``` Once able to get the metrics above, you are ready to bootstrap the index. ##### Bootstrap | Step | Description | | --- | --- | | **1% feed** | The purpose of this step is to feed a tiny chunk of the corpus to: 1. Estimate the memory and disk resource configuration. 2. Estimate the number of nodes required for the 10% step. Feed a small data set, using `vespa feed` as in [getting started](https://cloud.vespa.ai/en/getting-started). Observe the util metrics, and stop no later than 50% memory/disk util. The resource configuration should be modified so disk is in the 50-80% range of memory. Example: if memory util is 50%, disk util should be 30-45%. The reasoning is that memory is a more expensive component than disk, so it is better to over-allocate on disk and just track memory usage. Look at memory util. Say the 1% feed caused a 15% memory util - this means that the 10% feed will take 150%, or 3X the 50% max. There are two options: either increase memory/disk or add more nodes. A good rule of thumb at this stage is that the final 100% feed could fit on 4 or more nodes, and there is a 2-node minimum for redundancy. The default configuration at the start of this document is quite small, so a 3X at this stage means triple the disk and memory, and add more nodes in later steps. Deploy changes (if needed). Whenever node count increases or resource configuration is modified, new nodes are added, and data is migrated to new nodes.
Example: growing from 2 to 3 nodes means each of the 2 current nodes will migrate 33% of their data to the new node. Read more in [elasticity](/en/elasticity.html). It saves time to let the cluster finish data migration before feeding more data. In this step it will be fast as the data volume is small, but nevertheless check the [vds.idealstate.merge\_bucket.pending.average](/en/reference/distributor-metrics-reference.html#vds_idealstate_merge_bucket_pending) metric. Wait for 0 for all nodes - this means data migration is completed: ``` $ curl \ --cert ~/.vespa/mytenant.myapp.default/data-plane-public-cert.pem \ --key ~/.vespa/mytenant.myapp.default/data-plane-private-key.pem \ https://vespacloud-docsearch.vespa-team.aws-us-east-1c.z.vespa-app.cloud/prometheus/v1/values?consumer=Vespa | \ egrep 'vds_idealstate_merge_bucket_pending_average' ``` At this point, you can validate that both memory and disk util is less than 5%, so the 10% feed will fit. | | **10% feed** | Feed the 10% corpus, still observing util metrics. As the content cluster capacity is increased, it is normal to eventually be CPU bound in the container or content cluster. Grep for `cpu_util` in metrics (like in the example above) to evaluate. A 10% feed is a great baseline for the full capacity requirements. Fine tune the resource config and number of hosts as needed. If you deploy changes, wait for the `vds.idealstate.merge_bucket.pending.average` metric to go to zero again. This now takes longer time as nodes are configured larger, it normally completes within a few hours. Again validate memory and disk util is less than 5% before the full feed. | | **100% feed** | Feed the full data set, observing the metrics. You should be able to estimate timing by extrapolation, this is linear at this scale. At feed completion, observe the util metrics for the final fine-tuning. A great exercise at this point is to add a node then reduce a node, and take the time to completion (`vds.idealstate.merge_bucket.pending.average` to 0). This is useful information when the application is in production, as you know the time to add or shrink capacity in advance. It can be a good idea to reduce node count to get the memory util closer to 70% at this step, to optimize for cost. However, do not spend too much time optimizing in this step, next step is normally [sizing for query load](/en/performance/sizing-search.html). This will again possibly alter resource configuration and node counts / topology, but now you have a good grasp at how to easily bootstrap the index for these experiments. | ##### Troubleshooting Make sure you are able to feed and access documents as the example in [preparations](#preparations). Read [security guide](/en/cloud/security/guide.html) for cert/key usage. Feeding too much will cause a [feed blocked](/en/operations/feed-block.html) state. Add a node to the full content cluster in services.xml, and wait for data migration to complete - i.e. wait for the `vds.idealstate.merge_bucket.pending.average` metric to go to zero. It is better to add a node than increasing node resources, as data migration is quicker. 
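Instead of eyeballing the curl output, the wait for data migration can be scripted. Below is a rough sketch (assumptions: the Prometheus endpoint and metric name from the examples above, plain-text exposition lines of the form `name{labels} value`, and your own certificate and key file paths) that polls until every node reports zero pending bucket merges:
```
import time
import requests

ENDPOINT = ("https://vespacloud-docsearch.vespa-team.aws-us-east-1c.z.vespa-app.cloud"
            "/prometheus/v1/values?consumer=Vespa")                  # replace with your endpoint
CERT = ("data-plane-public-cert.pem", "data-plane-private-key.pem")  # your data-plane cert/key
METRIC = "vds_idealstate_merge_bucket_pending_average"

def pending_merges():
    """Return the metric value reported for each node, assuming 'name{labels} value' lines."""
    text = requests.get(ENDPOINT, cert=CERT, timeout=30).text
    return [float(line.split()[1]) for line in text.splitlines() if line.startswith(METRIC)]

while True:
    values = pending_merges()
    print("pending merges per node:", values)
    if values and all(v == 0 for v in values):
        break  # data migration is complete
    time.sleep(60)
```
This is just a convenience wrapper around the same metric check described above; adapt the parsing if your metrics endpoint uses a different output format.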
##### Further reading - [Reads and Writes](/en/reads-and-writes.html) - [Vespa Feed Sizing Guide](/en/performance/sizing-feeding.html) - [Vespa Cloud Benchmarking](https://cloud.vespa.ai/en/benchmarking) - [Monitoring](https://cloud.vespa.ai/en/monitoring) --- ## Jdisc ### Java Data Intensive Serving Container - JDisc Vespa's Java container, JDisc, hosts all application components as well as the stateless logic of Vespa itself. #### Java Data Intensive Serving Container - JDisc Vespa's Java container, JDisc, hosts all application components as well as the stateless logic of Vespa itself. Which particular components are hosted by a container cluster is configured in services.xml. The main features of JDisc are: - HTTP serving out of the box from an embedded Jetty server, and support for plugging in other transport mechanisms. - Integration with the config system of Vespa, which allows components to [receive up-to-date config](../configuring-components.html) (by constructor injection) resulting from application deployment. - [Dependency injection based on Guice](injecting-components.html) (Felix), but extended for configs and component collections. - A component model based on [OSGi](../components/bundles.html) which allows components to be (re)deployed to running servers, and to control which APIs they expose to others. - The features above combine to allow application package changes (changes to components, configuration or data) to be applied by Vespa without disrupting request serving or requiring restarts. - Standard component types exist for - [general request handling](developing-request-handlers.html) - [chained request-response processing](processing.html) - [processing document writes](../document-processing.html) - [intercepting queries and results](../searcher-development.html) - [rendering responses](../result-rendering.html) Application components can be of any other type as well and do not need to reference any Vespa API to be loaded and managed by the container. - A general [chain composition](../components/chained-components.html) mechanism for components. ##### Developing Components - The JDisc container provides a framework for processing requests and responses, named _Processing_ - its building blocks are: - [Chains](../components/chained-components.html) of other components that are to be executed serially, with each providing some service or transform - [Processors](processing.html) that change the request and / or the response.
    They may also make multiple forward requests, in series or parallel, or manufacture the response content themselves
  - [Renderers](processing.html#response-rendering) that are used to serialize a Processor's response before returning it to a client
- Application Lifecycle and unit testing:
  - [Configuring components](../configuring-components.html) with custom configuration
  - [Component injection](injecting-components.html) allows components to access other application components
  - Learn how to [build OSGi bundles](../components/bundles.html) and how to [troubleshoot](../components/bundles.html#troubleshooting) classloading issues
  - Using [Libraries for Pluggable Frameworks](pluggable-frameworks.html) from a component may result in class loading issues that require extra setup in the application
  - [Unit testing configurable components](../unit-testing.html#unit-testing-configurable-components)
- Handlers and filters:
  - [Http servers and security filters](http-server-and-filters.html) for incoming connections on HTTP and HTTPS
  - [Request handlers](developing-request-handlers.html) to process incoming requests and generate responses
- Searchers and Document Processors:
  - [Searcher](../searcher-development.html) and [search result renderer](../result-rendering.html) development
  - [Document processing](../document-processing.html)

##### Reference documentation

- [services.xml](../reference/services-container.html)

##### Other related documents

- [Designing RESTful web services](../developing-web-services.html) as Vespa Components
- [healthchecks](../reference/healthchecks.html) - using the Container with a VIP
- [Vespa Component Reference](../reference/component-reference.html): The Container's request processing lifecycle

---

## Performance

### Performance

See the [practical search performance guide](practical-search-performance-guide.html).

#### Performance

##### Practical search performance guide

See the [practical search performance guide](practical-search-performance-guide.html). The guide walks through a music search use case and gives a practical introduction to Vespa search performance.

##### Sizing and capacity planning

Sizing and capacity planning involves figuring out how many nodes are needed and what kind of hardware flavor best fits the use case:

- [Sizing Vespa search](sizing-search.html): How to size a Vespa search cluster
- [Caching in Vespa](caches-in-vespa.html): How to enable caches in Vespa
- [Attributes and memory usage](../attributes.html): How attributes impact the memory footprint, and how to find attribute memory usage
- [Proton maintenance jobs](../proton.html#proton-maintenance-jobs): Impact on resource usage
- [Coverage degradation](../graceful-degradation.html): Timeout handling and degraded coverage

##### Benchmarking and tuning

Benchmarking is important both during sizing and for testing new features.
What tools to use for benchmarking and how to tune system aspects of Vespa:

- [Benchmarking Vespa](vespa-benchmarking.html): Test Vespa performance
- [Search features and performance](feature-tuning.html)
- [Feed performance](sizing-feeding.html)
- [Container Http performance testing using Gatling](container-http.html)
- [Container tuning](container-tuning.html): JVM, container, docproc
- [vespa-fbench](/en/operations/tools.html#vespa-fbench): Reference documentation
- [HTTP/2](http2.html): improve HTTP performance using HTTP/2

##### Profiling

Do a deep performance analysis - how to profile the application as well as Vespa:

- [Profiling](profiling.html): Generic profiling tips
- [Valgrind](valgrind.html): Run Vespa with Valgrind

---

## Root

### Vespa documentation

Welcome to the Vespa documentation site!

#### Vespa documentation

Welcome to the Vespa documentation site! To see what Vespa is about, visit the [Vespa home page](https://vespa.ai).

- [Getting started](en/getting-started.html) - Create your first app
- [Table of contents](sitemap.html) - Browse documentation
- [FAQ](en/faq.html) - Frequently asked questions
- [Vespa Slack](http://slack.vespa.ai) - chat with users and developers
- [GitHub Issues](https://github.com/vespa-engine/vespa/issues) - browse and create issues
- [Stack Overflow](https://stackoverflow.com/questions/tagged/vespa) - questions tagged vespa
- [Vespa Blog](https://blog.vespa.ai/) - the Vespa tech blog
- [vespaengine@](https://twitter.com/vespaengine) - Vespa on Twitter

---

## Tutorials

### Tutorials

The [News Search Tutorial](news-1-getting-started.html) is a set of articles to explore Vespa features.

#### Tutorials

##### News Search and recommendation

The [News Search Tutorial](news-1-getting-started.html) is a set of articles to explore Vespa features. This is the best tutorial to start with:

1. [Getting started](news-1-getting-started.html)
2. [A basic news search application](news-2-basic-feeding-and-query.html) - application packages, feeding, query
3. [News search](news-3-searching.html) - sorting, grouping, and ranking
4. [Generating embeddings for users and news articles](news-4-embeddings.html)
5. [News recommendation](news-5-recommendation.html) - partial updates (news embeddings), ANNs, filtering
6. [News recommendation with searchers](news-6-recommendation-with-searchers.html) - custom searchers, doc processors
7. [News recommendation with parent-child](news-7-recommendation-with-parent-child.html) - parent-child, tensor ranking

##### Text search

[Text Search](text-search.html) is a set of tutorials:

1. [Text Search](text-search.html)
2. [Text Search ML](text-search-ml.html)

##### Models hot swap

The [Models hot swap tutorial](models-hot-swap.html) builds on the news recommendation tutorial. It is a guide on how to manage an application with multiple model versions.

---

## Indexing Language Reference

### Indexing Language Reference

This reference documents the full Vespa _indexing language_.

#### Indexing Language Reference

This reference documents the full Vespa _indexing language_. If more complex processing of input data is required, implement a [document processor](../document-processing.html). The indexing language is analogous to UNIX pipes, in that statements consist of expressions separated by the _pipe_ symbol, where the output of each expression is the input of the next. Statements are terminated by a semicolon and are independent of each other (except when using variables). Find examples in the [indexing](/en/indexing.html) guide.
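For a first impression of the syntax, here is a small, hypothetical script of two independent statements (the field names are illustrative; the `to_int` converter and the `.` concatenation operator are described below):

```
input year | to_int | attribute year;
input title . " " . input artist | attribute title_and_artist;
```

Each statement reads one or more input fields, transforms the execution value, and ends in an output expression that writes it to a field.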
##### Indexing script

An indexing script is a sequence of [indexing statements](#indexing-statement) separated by a semicolon (`;`). A script is executed statement-by-statement, in order, one document at a time.

Vespa derives one indexing script per search cluster based on the search definitions assigned to that cluster. As a document is fed to a search cluster, it passes through the corresponding [indexing cluster](services-content.html#document-processing), which runs the document through its indexing script. Note that this also happens whenever the document is [reindexed](../operations/reindexing.html), so expressions such as [now](#now) must be thought of as the time the document was (last) _indexed_, not when it was _fed_.

You can examine the indexing script generated for a specific search cluster by retrieving the configuration of the indexing document processor:

```
$ vespa-get-config -i search/cluster. -n vespa.configdefinition.ilscripts
```

The current _execution value_ is set to `null` prior to executing a statement.

##### Indexing statement

An indexing statement is a sequence of [indexing expressions](#indexing-expression) separated by a pipe (`|`). A statement is executed expression-by-expression, in order. Within a statement, the execution value is passed from one expression to the next.

The simplest of statements passes the value of an input field into an attribute:

```
input year | attribute year;
```

The above statement consists of two expressions: `input year` and `attribute year`. The former sets the execution value to the value of the "year" field of the input document. The latter writes the current execution value into the attribute "year".

##### Indexing expression

###### Primitives

A string, a numeric literal, or true/false can be used as an expression to explicitly set the execution value. Examples: `"foo"`, `69`, `true`.

###### Outputs

An output expression is an expression that writes the current execution value to a document field. These expressions also double as the indicator for the type of field to construct (i.e. attribute, index or summary). It is important to note that you cannot assign different values to the same field in a single document (e.g. `attribute | lowercase | index` is **illegal** and will not deploy).

| Expression | Description |
| --- | --- |
| `attribute` | Writes the execution value to the current field. During deployment, this indicates that the field should be stored as an attribute. |
| `index` | Writes the execution value to the current field. During deployment, this indicates that the field should be stored as an index field. |
| `summary` | Writes the execution value to the current field. During deployment, this indicates that the field should be included in the document summary. |

###### Arithmetics

Indexing statements can contain any combination of arithmetic operations, as long as the operands are numeric values. In case you need to convert from string to numeric, or from one numeric type to another, use the applicable [converter](#converters) expression. The supported arithmetic operators are:

| Operator | Description |
| --- | --- |
| ` + ` | Sets the execution value to the result of adding the execution value of the `lhs` expression to that of the `rhs` expression. |
| ` - ` | Sets the execution value to the result of subtracting the execution value of the `rhs` expression from that of the `lhs` expression. |
| ` * ` | Sets the execution value to the result of multiplying the execution value of the `lhs` expression by that of the `rhs` expression. |
| ` / ` | Sets the execution value to the result of dividing the execution value of the `lhs` expression by that of the `rhs` expression. |
| ` % ` | Sets the execution value to the remainder of dividing the execution value of the `lhs` expression by that of the `rhs` expression. |
| ` . ` | Sets the execution value to the concatenation of the execution value of the `lhs` expression with that of the `rhs` expression. If _both_ `lhs` and `rhs` are collection types, this operator will append `rhs` to `lhs` (if any operand is null, it is treated as an empty collection). If not, this operator concatenates the string representations of `lhs` and `rhs` (if any operand is null, the result is null). |

You may use parentheses to declare precedence of execution (e.g. `(1 + 2) * 3`). This also works for more advanced array concatenation statements such as `(input str_a | split ',') . (input str_b | split ',') | index arr`.

###### Converters

These expressions let you convert from one data type to another.

| Converter | Input | Output | Description |
| --- | --- | --- | --- |
| `binarize [threshold]` | Any tensor | Any tensor | Replaces all values in a tensor by 0 or 1. This takes an optional argument specifying the threshold a value needs to be larger than to be replaced by 1 instead of 0. The default threshold is 0. This is useful to create a suitable input to [pack\_bits](#pack_bits). |
| `embed [id] [args]` | String | A tensor | Invokes an [embedder](../embedding.html) to convert a text to one or more vector embeddings. The type of the output tensor is what is required by the following expression (as supported by the specific embedder). Arguments are given space separated, as in `embed colbert chunk`. The first argument is the id of the embedder to use, and can be omitted when only a single embedder is configured. Any additional arguments are passed to the embedder implementation. If the same embed expression with the same input occurs multiple times in a schema, its value will only be computed once. |
| `chunk id [args]` | String | An array of strings | Invokes a chunker, which converts a string into an array of strings. Arguments are given space separated, as in `chunk fixed-length 512`. The id of the chunker to use is required and can be a chunker bundled with Vespa, or any chunker component added in services.xml, see the [chunking reference](chunking-reference.html). Any additional arguments are passed to the chunker implementation. If the same chunk expression with the same input occurs multiple times in a schema, its value will only be computed once. |
| `hash` | String | int or long | Converts the input to a hash value (using SipHash). The hash will be int or long depending on the target field. |
| `pack_bits` | A tensor | A tensor | Packs the values of a binary tensor into bytes with 1 bit per value in big-endian order. The input tensor must have a single dense dimension. It can have any value type and any number of sparse dimensions. Values that are not 0 or 1 will be binarized with 0 as the threshold. The output tensor will have: - `int8` as the value type. - The dense dimension size divided by 8 (rounded upwards to integer). - The same sparse dimensions as before. The resulting tensor can be unpacked during ranking using [unpack\_bits](ranking-expressions.html#unpack-bits). A tensor can be converted to binary form suitable as input to this by the [binarize function](#binarize). |
| `to_array` | Any | Array | Converts the execution value to a single-element array. |
| `to_byte` | Any | Byte | Converts the execution value to a byte. This will throw a NumberFormatException if the string representation of the execution value does not contain a parseable number. |
| `to_double` | Any | Double | Converts the execution value to a double. This will throw a NumberFormatException if the string representation of the execution value does not contain a parseable number. |
| `to_float` | Any | Float | Converts the execution value to a float. This will throw a NumberFormatException if the string representation of the execution value does not contain a parseable number. |
| `to_int` | Any | Integer | Converts the execution value to an int. This will throw a NumberFormatException if the string representation of the execution value does not contain a parseable number. |
| `to_long` | Any | Long | Converts the execution value to a long. This will throw a NumberFormatException if the string representation of the execution value does not contain a parseable number. |
| `to_bool` | Any | Bool | Converts the execution value to a boolean type. If the input is a string, it will become true if it is not empty. If the input is a number, it will become true if it is != 0. |
| `to_pos` | String | Position | Converts the execution value to a position struct. The input format must be either a) `[N|S];[E|W]`, or b) `x;y`. |
| `to_string` | Any | String | Converts the execution value to a string. |
| `to_uri` | String | Uri | Converts the execution value to a URI struct. |
| `to_wset` | Any | WeightedSet | Converts the execution value to a single-element weighted set with default weight. |
| `to_epoch_second` | String | Long | Converts an ISO-8601 instant formatted String to Unix epoch (or Unix time or POSIX time or Unix timestamp), which is the number of seconds elapsed since January 1, 1970, UTC. The converter uses [java.time.Instant.parse](https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/time/Instant.html#parse(java.lang.CharSequence)) to parse the input string value. This will throw a DateTimeParseException if the input cannot be parsed. Examples: - `2023-12-24T17:00:43.000Z` is converted to `1703437243L` - `2023-12-24T17:00:43Z` is converted to `1703437243L` - `2023-12-24T17:00:43.431Z` is converted to `1703437243L` - `2023-12-24T17:00:43.431+00:00` is converted to `1703437243L` |
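To illustrate how converters chain inside a statement (the field names below are hypothetical, not part of the reference): a float embedding can be binarized and bit-packed into a compact `int8` attribute, and an ISO-8601 timestamp string can be stored as an epoch attribute:

```
input embedding | binarize | pack_bits | attribute embedding_bits;
input published_at | to_epoch_second | attribute published_epoch;
```

For the first statement to deploy, the `embedding_bits` attribute would need an `int8` tensor type whose dense dimension is one eighth of the input's, as described for `pack_bits` above.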
###### Other expressions

The following are the unclassified expressions available:

| Expression | Description |
| --- | --- |
| `_` | Returns the current execution value. This is useful, e.g., to prepend some other value to the current execution value, see [this example](/en/indexing.html#execution-value-example). |
| `attribute <fieldname>` | Writes the execution value to the named attribute field. |
| `base64decode` | If the execution value is a string, it is base-64 decoded to a long integer. If it is not a string, the execution value is set to `Long.MIN_VALUE`. |
| `base64encode` | If the execution value is a long integer, it is base-64 encoded to a string. If it is not a long integer, the execution value is set to `null`. |
| `echo` | Prints the execution value to standard output, for debug purposes. |
| `flatten` | **Deprecated:** Use [tokens](/en/reference/schema-reference.html#tokens) in the schema instead. |
| `for_each {