Refer to Vespa Support for more support options.

Ranking

Does Vespa support a flexible ranking score?

Ranking is arguably the primary Vespa feature - think of it as scalable, online computation. A rank profile is where the application's ranking logic is implemented, supporting simple types like double and complex types like tensor. Supply ranking data in queries as query features (e.g. different weights per customer), or look it up in a Searcher. Typically, a document (e.g. product) "feature vector"/"weights" will be compared to a user-specific vector (tensor).
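As an illustrative sketch (the field name topic_weights and the query feature query(user_profile) are hypothetical), a rank profile comparing a per-document tensor to a user-specific query tensor could look like:

```
rank-profile personal {
    inputs {
        query(user_profile) tensor<float>(topic{})
    }
    first-phase {
        # dot product of the user profile and the document's topic weights
        expression: sum(query(user_profile) * attribute(topic_weights))
    }
}
```

The query tensor is passed with the request, e.g. as input.query(user_profile).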

Where would customer specific weightings be stored?

Vespa doesn’t have specific support for storing customer data as such. You can store this data as a separate document type in Vespa and look it up before passing the query, or store this customer meta-data as part of the other meta-data for the customer (i.e. login information) and pass it along the query when you send it to the backend. Find an example on how to look up data in album-recommendation-docproc.

How to create a tensor on the fly in the ranking expression?

Create a tensor in the ranking function from arrays or weighted sets using tensorFrom... functions - see document features.
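A sketch, assuming a weighted-set attribute category_scores and a query tensor query(category_weights) with a matching mapped dimension c:

```
rank-profile from-weightedset {
    first-phase {
        # convert the weighted set to a tensor with dimension c, then combine with the query tensor
        expression: sum(tensorFromWeightedSet(attribute(category_scores), c) * query(category_weights))
    }
}
```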

How to set a dynamic (query time) ranking drop threshold?

Pass a ranking feature like query(threshold) and use an if statement in the ranking expression - see retrieval and ranking. Example:

rank-profile drop-low-score {
    function my_score() {
        expression: ..... # custom first-phase score
    }
    rank-score-drop-limit: 0.0
    first-phase {
        expression: if(my_score() < query(threshold), -1, my_score())
    }
}


Does Vespa support early termination of matching and ranking?

Yes, this can be accomplished by configuring match-phase in the rank profile, or by adding a range query item using hitLimit to the query tree, see capped numeric range search.
Both methods require an attribute field with fast-search. The capped range query is faster but beware that if there are other restrictive filters in the query one might end up with 0 hits. The additional filters are applied as a post filtering step over the hits from the capped range query. match-phase on the other hand is safe to use with filters or other query terms, and also supports diversification which the capped range query term does not support.
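A minimal match-phase sketch, assuming a numeric popularity attribute with fast-search:

```
rank-profile with-match-phase {
    match-phase {
        attribute: popularity   # attribute used to select the best documents when capping
        order: descending       # keep documents with the highest values
        max-hits: 10000         # approximate number of hits to rank per node
    }
}
```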

Documents

What limits apply to json document size?

There is no hard limit. Vespa requires that a document can be loaded into memory in serialized form. Vespa is not optimized for huge documents.

Can a document have lists (key value pairs)?

E.g. a product is offered in a list of stores with a quantity per store. Use multivalue fields (array of struct) or parent child. Which one to choose depends on the use case - see the discussion in the latter link.
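A schema sketch with hypothetical names, modeling per-store quantities as an array of struct:

```
struct store_offer {
    field store_id type string {}
    field quantity type int {}
}
field offers type array<store_offer> {
    indexing: summary
    struct-field store_id { indexing: attribute }
    struct-field quantity { indexing: attribute }
}
```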

Does a whole document need to be updated and re-indexed?

E.g. price and quantity available per store may often change vs the actual product attributes. Vespa supports partial updates of documents. Also, the parent/child feature is implemented to support use-cases where child elements are updated frequently, while a more limited set of parent elements are updated less frequently.
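For example, a partial update via the document/v1 API can assign a new price and adjust the quantity without re-feeding the rest of the document (names and ids here are hypothetical):

```
PUT /document/v1/mynamespace/product/docid/p123
{
    "fields": {
        "price":    { "assign": 129.0 },
        "quantity": { "increment": -1 }
    }
}
```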

What ACID guarantees if any does Vespa provide for single writes / updates / deletes vs batch operations etc?

See the Vespa Consistency Model. Vespa is not transactional in the traditional sense, it doesn’t have strict ACID guarantees. Vespa is designed for high performance use-cases with eventual consistency as an acceptable (and to some extent configurable) trade-off.

Does Vespa support wildcard fields?

Wildcard fields are not supported in Vespa. A workaround is to use maps to store the wildcard fields. The map needs to be defined with indexing: attribute and will hence be stored in memory. Refer to map.
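A map-field sketch (the field name is hypothetical), where key and value are attributes so they can be matched:

```
field properties type map<string, string> {
    indexing: summary
    struct-field key   { indexing: attribute }
    struct-field value { indexing: attribute }
}
```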

Is there any size limitation in multivalued fields?

No limit, except memory.

Can we set a limit for the number of elements that can be stored in an array?

Implement a document processor for this.

How to auto-expire documents / set up garbage collection?

Set a selection criterion on the document element in services.xml. The criterion selects documents to keep. I.e. to purge documents “older than two weeks”, the expression should be “newer than two weeks”. Read more about document expiry.
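For example, to keep only documents newer than two weeks (assuming a numeric timestamp attribute holding seconds since epoch - 1209600 seconds is two weeks), services.xml could contain:

```xml
<documents garbage-collection="true">
    <document type="doc" mode="index"
              selection="doc.timestamp &gt; now() - 1209600"/>
</documents>
```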

How to increase redundancy and track data migration progress?

Changing redundancy is a live and safe change (assuming there is headroom on disk / memory - e.g. going from 2 to 3 means 50% more data). The time to migrate will be quite similar to what it took to feed initially - a bit hard to say generally, as it depends on IO and index settings, like whether an index for ANN is built. To monitor progress, take a look at the multinode sample application for the clustercontroller status page - this shows pending buckets live. Finally, use the .idealstate.merge_bucket.pending metric to track progress - when it reaches 0, there are no more data-syncing operations - see monitor distance to ideal state. Nodes will work as normal during data sync, and query coverage will be the same. Nodes will be busier, though.

How does namespace relate to schema?

It does not, namespace is a mechanism to split the document space into parts that can be used for document selection - see documentation.

Visiting does not dump all documents, and/or hangs.

There are multiple things that can cause this, see visiting troubleshooting.

Query

Are hierarchical facets supported?

Faceting is called grouping in Vespa. Groups can be multi-level.

Are filters supported?

Add filters to the query in YQL, using boolean, numeric and text matching.
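A sketch of a YQL query combining text matching with numeric and boolean filters (field names are hypothetical):

```
select * from sources * where default contains "running shoes"
    and price < 100
    and in_stock = true
```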

How to query for similar items?

One way is to describe items using tensors and query for the nearest neighbors - using full precision or approximate (ANN) - the latter is used when the set is too large for an exact calculation. Apply filters to the query to limit the neighbor candidate set. Dot products or weakAnd are alternatives.
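A nearest-neighbor query sketch (field and parameter names are hypothetical) - the query tensor is passed as a ranking feature:

```
yql=select * from doc where {targetHits: 100}nearestNeighbor(embedding, q_embedding)
    and category contains "books"
&ranking.features.query(q_embedding)=[0.12, 0.44, ...]
```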

Stop-word support?

Vespa does not have a stop-word concept inherently. See the sample app for how to use filter terms.

How to extract more than 400 hits / query and get ALL documents?

Requesting more than 400 hits in a query gives this error: {'code': 3, 'summary': 'Illegal query', 'message': '401 hits requested, configured limit: 400.'}.

• To increase the max result set size (i.e. allow a higher hits value), configure maxHits in a query profile, e.g. <field name="maxHits">500</field> in search/query-profiles/default.xml (create as needed). The query timeout can be increased, but the query will still be costly and likely impact other queries - a large limit more so than a large offset. It can be made cheaper by using a smaller document summary, and avoiding fields on disk if possible.
• Using visit in the document/v1/ API is usually a better option for dumping all the data.
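The query profile mentioned above could look like this in search/query-profiles/default.xml:

```xml
<query-profile id="default">
    <field name="maxHits">500</field>
    <field name="maxOffset">1000</field>
</query-profile>
```

maxOffset limits paging depth in the same way and can be tuned in the same file.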

How to make a sub-query to get data to enrich the query, like get a user profile?

See the UserProfileSearcher for how to create a new query to fetch data - this creates a new Query, sets a new root and parameters - then fills the Hits.

How to create a cache that refreshes itself regularly

public class ConfigCacheRefresher extends AbstractComponent {
    ...
    private final ScheduledExecutorService configFetchService = Executors.newSingleThreadScheduledExecutor();
    private Chain<Searcher> searcherChain;
    ...
    void initialize() {
        Runnable task = () -> refreshCache();
        configFetchService.scheduleWithFixedDelay(task, 1, 1, TimeUnit.MINUTES); // refresh once a minute
        searcherChain = executionFactory.searchChainRegistry().getChain(new ComponentId("configDefaultProvider"));
    }
    ...
    public void refreshCache() {
        Execution execution = executionFactory.newExecution(searcherChain);
        Query query = createQuery(execution);
        ...
    }

    @Override
    public void deconstruct() {
        super.deconstruct();
        try {
            configFetchService.shutdown();
            configFetchService.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}


Is it possible to query Vespa using a list of document ids?

The best article on the subject is multi-lookup set filtering. Refer to the weightedset-example - also see weightedset for writing a YQL query to select multiple IDs. The ID must be a field in the document type.

How to count hits / all documents without returning results?

Count all documents using a query like select * from doc where true - this counts all documents from the “doc” source. Using select * from doc where true limit 0 will return the count and no hits, alternatively add hits=0. Pass ranking.profile=unranked to make the query less expensive to run. If an estimate is good enough, use hitcountestimate=true.

Must all fields in a fieldset have the same type?

Yes - a deployment warning containing “This may lead to recall and ranking issues” can be emitted if not. This warning is emitted whenever fields with different kinds of tokenization are put in the same fieldset. This is because a given piece of text searching one fieldset is tokenized just once, so there is no right choice of tokenization in this case. More details on Stack Overflow.

Searchers and timeout questions

During multi-phase searching, is the query timeout set for each individual searcher or is the query timeout set for the entire search chain? Also, if we asynchronously execute several search chains, can we set different query timeouts for each of these chains plus a separate overall timeout for the searcher that performs the asynchronous executions?

The timeout is for the entire query (and most Searchers don’t check timeout - use getTimeLeft). E.g. if a search chain has 3 searchers, it is OK for 1 searcher to take 497 ms and 2 searchers to each take 1 ms for a query timeout of 500 ms. You can set a different timeout in each cloned query you send to any of those chains, and you can specify the timeout when waiting for responses from them.

How does backslash escapes work?

Backslash is used to escape special characters in YQL. For example, to query with a literal backslash, which is useful in regexes, you need to escape it with another backslash: \\. Unescaped backslashes in YQL will lead to a “token recognition error”.

In addition, Vespa CLI unescapes double backslashes to single (while single backslashes are left alone), so if you query with Vespa CLI you need to escape each backslash once more: \\\\. The same applies to strings in Java.

Also note that both log messages and JSON results escape backslashes, so any \ becomes \\.

Feeding

How to debug document processing chain configuration?

This configuration is a combination of content and container cluster configuration, see indexing and feed troubleshooting.

I feed documents with no error, but they are not in the index

This is often a problem when using document expiry, as documents that are already expired are not persisted - they are silently dropped. Feeding stale test data with old timestamps can cause this.

How to feed many files, avoiding 429 error?

Using too many clients can generate a 429 response code. The Vespa sample apps use the vespa-feed-client which uses HTTP/2 for high throughput - it is better to stream the feed files through this client.

Does Vespa support addition of flexible NLP processing for documents and search queries?

E.g. integrating NER, word sense disambiguation, specific intent detection. Vespa supports this well - see the following questions for examples.

Does Vespa support customization of the inverted index?

E.g. instead of using terms or n-grams as the unit, we might use terms with specific word senses - e.g. bark (dog bark) vs. bark (tree bark), or BCG (company) vs. BCG (vaccine name). Creating a new index format means changing the core. However, for the examples above, one just needs control over the tokens which are indexed (and queried). That is easily done in some Java code. The simplest way to do this is to plug in a custom tokenizer. That gets called from the query parser and bundled linguistics processing Searchers, as well as from the Document Processor creating the annotations that are consumed by the indexing operation. Since all of this is Searchers and Docprocs, which you can replace and/or surround with custom components, you can take full control over these things without modifying the platform itself.

Does vespa provide any support for named entity extraction?

It provides the building blocks but not an out-of-the-box solution. We can write a Searcher to detect query-side entities and rewrite the query, and a DocProc if we want to handle them in some special way on the indexing side.

Does vespa provide support for text extraction?

You can write a document processor for text extraction, Vespa does not provide it out of the box.

How to do Text Search in an imported field?

Imported fields from parent documents are defined as attributes, and have limited text match modes (i.e. indexing: index cannot be used). Details.

Why is closeness 1 for all my vectors?

If you have added vectors to your documents and queries, and see that the rank feature closeness(field, yourEmbeddingField) produces 1.0 for all documents, you are likely using distance-metric: innerproduct, but your vectors are not normalized, and the solution is normally to switch to distance-metric: angular.

With non-normalized vectors, you often get negative distances, and those are capped to 0, leading to closeness 1.0. Some models, such as models from sbert.net, claim to be normalized but are not.
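To illustrate the arithmetic, here is a sketch using the closeness definition closeness = 1 / (1 + distance), with negative distances capped at 0 as described above:

```java
public class ClosenessDemo {

    // closeness = 1 / (1 + distance); negative distances, which can occur with
    // distance-metric: innerproduct on non-normalized vectors, are capped at 0
    static double closeness(double distance) {
        double capped = Math.max(0.0, distance);
        return 1.0 / (1.0 + capped);
    }

    public static void main(String[] args) {
        System.out.println(closeness(-3.5)); // capped to 0 -> closeness 1.0
        System.out.println(closeness(1.0));  // 0.5
    }
}
```

With an angular distance metric, distances are non-negative, so closeness values spread out below 1.0 again.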

Programming Vespa

Are Python plugins supported / is there a scripting language?

Plugins have to run in the JVM - Jython might be an alternative, though the Vespa team has no experience with it. Vespa does not have a language like Painless - it is more flexible to write application logic in a JVM-supported language, using Searchers and Document Processors.

How can I batch-get documents by ids in a Searcher

A Searcher intercepts a query and/or result. To get a number of documents by id in a Searcher or other component like a Document processor, you can have an instance of com.yahoo.documentapi.DocumentAccess injected and use that to get documents by id instead of the HTTP API.

Performance

What is the latency from feeding a document until it is searchable?

Vespa has a near real-time indexing core with typically sub-second latencies from document ingestion to being indexed. This depends on the use-case, available resources and how the system is tuned. Some more examples and thoughts can be found in the scaling guide.

Is there a batch ingestion mode, what limits apply?

Vespa does not have a concept of “batch ingestion” as it contradicts many of the core features that are the strengths of Vespa, including serving elasticity and sub-second indexing latency. That said, we have numerous use-cases in production that do high throughput updates to large parts of the (sometimes entire) document set. In cases where feed throughput is more important than indexing latency, you can tune this to meet your requirements. Some of this is detailed in the feed sizing guide.

Can the index support up to 512 GB index size in memory?

Yes. The content node is implemented in C++ and is not memory constrained beyond what the operating system imposes.

Get request for a document when document is not in sync in all the replica nodes?

If the replicas are in sync the request is only sent to the primary content node. Otherwise, it’s sent to several nodes, depending on replica metadata. Example: if a bucket has 3 replicas A, B, C and A & B both have metadata state X and C has metadata state Y, a request will be sent to A and C (but not B since it has the same state as A and would therefore not return a potentially different document).

How to keep indexes in memory?

Attribute (with or without fast-search) is always in memory, but does not support tokenized matching. It is for structured data. Index (where there’s no such thing as fast-search since it is always fast) is in memory to the extent there is available memory and supports tokenized matching. It is for unstructured text.

It is possible to guarantee that fields that are defined with index have both the dictionary and the postings in memory by changing from mmap to populate, see index > io > search. Make sure that the content nodes run on nodes with plenty of memory available - during index switch, the memory footprint will double. Familiarity with Linux tools like pmap can help diagnose what is mapped and whether it is resident.
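A sketch of the populate setting in services.xml (the cluster name is hypothetical, and other required elements are omitted):

```xml
<content id="mycluster" version="1.0">
    <tuning>
        <searchnode>
            <index>
                <io>
                    <search>populate</search>
                </io>
            </index>
        </searchnode>
    </tuning>
    ...
</content>
```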

Fields defined with attribute are in memory. Fields that have both index and attribute have separate data structures: queries use the default mapped-on-disk data structures that support text matching, while grouping, summary and ranking access the field from the attribute store.

A Vespa query is executed in two phases as described in sizing search, and summary requests can touch disk (and also uses mmap by default). Due to their potential size there is no populate option here, but one can define dedicated document summary containing only fields that are defined with attribute.
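A sketch of such a dedicated document summary (field names are hypothetical; the listed fields must be defined with attribute for the summary to stay memory-only):

```
document-summary attributes-only {
    summary title { source: title }
    summary price { source: price }
}
```

Request it with &summary=attributes-only in the query.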

The practical performance guide can be a good starting point as well to understand Vespa query execution, difference between index and attribute and summary fetching performance.

Is memory freed when deleting documents?

Deleting documents, using the document API or garbage collection, will increase the capacity on the content nodes. However, this is not necessarily observable in system metrics - it depends on many factors, like what kind of memory is released, when flush jobs run, and the document schema.

In short, Vespa is not designed to release memory once used. It is designed for sustained high throughput, low latency, keeping maximum memory used under control using features like feed block.

When deleting documents, one can observe a slight increase in memory. A deleted document is represented by a tombstone, which is later removed - see removed-db-prune-age. When running garbage collection, the summary store is scanned using mmap, and both VIRT and page cache memory usage increase.

Read up on attributes to understand more of how such fields are stored and managed. Paged attributes trade query latency for a lower max memory usage.

Can one do a partial deploy to the config server / update the schema without deploying all the node configs?

Yes - deployment uses a web service API which allows you to create an edit session from the currently deployed package, make modifications, and deploy (prepare+activate) it: deploy-rest-api-v2.html. However, this is only useful where you want to avoid transferring data to the config server unnecessarily. When you resend everything, the config server will notice that you did not actually change e.g. the node configs and avoid unnecessary noop changes.

How fast can nodes be added and removed from a running cluster?

Elasticity is a core Vespa strength - easily add and remove nodes with minimal (if any) serving impact. The exact time needed depends on how much data will need to be migrated in the background for the system to converge to ideal data distribution.

Should Vespa API search calls be load balanced or does Vespa do this automatically?

You will need to load balance incoming requests between the nodes running the stateless Java container cluster(s). This can typically be done using a simple network load balancer available in most cloud services. This is included when using Vespa Cloud, with an HTTPS endpoint that is already load balanced - both locally within the region and globally across regions.

Supporting index partitions

Search sizing is the intro to this. Topology matters, and this is much used in the high-volume Vespa applications to optimise latency vs. cost.

Can a running cluster be upgraded with zero downtime?

With Vespa Cloud, we do automated background upgrades daily without noticeable serving impact. If you host Vespa yourself, you can do this, but need to implement the orchestration logic necessary to handle this. The high level procedure is found in live-upgrade.

Can Vespa be deployed multi-region?

Vespa Cloud has integrated support - query a global endpoint. Writes will have to go to each zone. There is no auto-sync between zones.

Can Vespa serve an Offline index?

Building indexes offline requires the partition layout to be known in the offline system, which is in conflict with elasticity and auto-recovery (where nodes can come and go without service impact). It is also at odds with realtime writes. For these reasons, it is not recommended, and not supported.

Does vespa give us any tool to browse the index and attribute data?

No. Use visiting to dump all or a subset of documents. See dumping-data for a sample script.

What is the response when data is written only on some nodes and not on all replica nodes (Based on the redundancy count of the content cluster)?

A failure response is returned if the document could not be written to some replica nodes.

When the doc is not written to some nodes, will the document become available due to replica reconciliation?

Yes, it will be available, eventually. Also try Multinode testing and observability.

Does vespa provide soft delete functionality?

Yes: add a deleted attribute with fast-search on it, and create a Searcher which adds an andnot deleted term to queries.
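A sketch of the schema field (the name is hypothetical); a Searcher would then add a filter like and !(deleted = true) to each query:

```
field deleted type bool {
    indexing: attribute | summary
    attribute: fast-search
}
```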

Can we configure a grace period for bucket distribution so that buckets are not redistributed as soon as a node goes down?

You can set a transition-time in services.xml to configure how long the cluster controller keeps a node in maintenance mode before automatically marking it down.
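For example (the content cluster id is hypothetical, transition-time is in milliseconds, and other required elements are omitted):

```xml
<content id="music" version="1.0">
    <tuning>
        <cluster-controller>
            <transition-time>60000</transition-time>
        </cluster-controller>
    </tuning>
    ...
</content>
```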

Grouped distribution is used to reduce search latency. Content is distributed to a configured set of groups, such that the entire document collection is contained in each group. Setting the redundancy and searchable-copies equal to the number of groups ensures that data can be queried from all groups.

How to set up for disaster recovery / backup?

Refer to #17898 for a discussion of options.

How to check Vespa version for a running instance?

Use /state/v1/version to find Vespa version.

Troubleshooting

The endpoint does not come up after deployment

When deploying an application package with some kind of error, the endpoint might fail to come up, like:

$ vespa deploy --wait 300
Uploading application package ... done
Success: Deployed target/application.zip
Waiting up to 5m0s for query service to become available ...
Error: service 'query' is unavailable: services have not converged

Another example:

[INFO] [03:33:48] Failed to get 100 consecutive OKs from endpoint ...

There are many ways this can fail; the first step is to check the Vespa container log:

$ docker exec vespa /opt/vespa/bin/vespa-logfmt -l error

[2022-10-21 10:55:09.744] ERROR   container
Container.com.yahoo.container.jdisc.ConfiguredApplication
Reconfiguration failed, your application package must be fixed, unless this is a JNI reload issue:
Could not create a component with id 'ai.vespa.example.album.MetalSearcher'.
Tried to load class directly, since no bundle was found for spec: album-recommendation-java.
If a bundle with the same name is installed,
there is a either a version mismatch or the installed bundle's version contains a qualifier string.
...


Bundle plugin troubleshooting is a good resource to analyze Vespa container startup / bundle load problems.

Starting Vespa using Docker on M1 fails

Using an M1 MacBook Pro / AArch64 makes the Docker run fail:

WARNING: The requested image’s platform (linux/amd64) does not match the detected host platform (linux/arm64/v8)
and no specific platform was requested


Make sure you are running a recent version of the Docker image, do docker pull vespaengine/vespa.

Deployment fails / nothing is listening on 19071

Make sure all Config servers are started, and are able to establish ZooKeeper quorum (if more than one) - see the multinode sample application. Validate that the container has enough memory.

Startup problems in multinode Kubernetes cluster - readinessProbe using 19071 fails

The Config Server cluster with 3 nodes fails to start. The ZooKeeper cluster the Config Servers use waits for hosts on the network, while the hosts wait for ZooKeeper - a catch-22 - see sampleapp troubleshooting.