FAQ - frequently asked questions

Refer to Vespa Support for more support options.


Does Vespa support a flexible ranking score?

Ranking is arguably Vespa’s primary feature - we like to think of it as scalable, online computation. A rank profile is where the application’s logic is implemented, supporting simple types like double and complex types like tensor. Supply ranking data with the query as query features (e.g. different weights per customer), or look it up in a Searcher. Typically, a document (e.g. product) “feature vector”/“weights” will be compared to a user-specific vector (tensor).
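As a rough illustration of comparing a document’s weights with a user-specific vector, the plain-Java sketch below computes a sparse dot product outside Vespa. The class and method names are hypothetical; in a real application this computation would be a tensor expression in a rank profile.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Vespa API: sparse dot product between a
// document's weights and a user's weights, keyed by feature name.
final class FeatureDotProduct {

    // Sums doc[k] * user[k] over the keys present in both maps.
    static double score(Map<String, Double> doc, Map<String, Double> user) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : doc.entrySet()) {
            Double weight = user.get(e.getKey());
            if (weight != null) {
                sum += e.getValue() * weight;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Double> product = new HashMap<>();
        product.put("sports", 0.5);
        product.put("outdoor", 2.0);
        Map<String, Double> customer = new HashMap<>();
        customer.put("outdoor", 3.0);
        customer.put("books", 1.0);
        System.out.println(score(product, customer));
    }
}
```

Only features present on both sides contribute, which is why per-customer weights can be sent with the query without touching the stored documents.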

Where would customer specific weightings be stored?

Vespa doesn’t have specific support for storing customer data as such. You can store this data as a separate document type in Vespa and look it up before passing the query on, or store it with the customer’s other metadata (e.g. login information) and pass it along with the query when you send it to the backend. Find an example of how to look up data in album-recommendation-docproc.

How to create a tensor on the fly in the ranking expression?

Create a tensor in the ranking function from arrays or weighted sets using tensorFrom... functions - see document features.

How to set a dynamic (query time) ranking drop threshold?

Pass a ranking feature like query(threshold) and use an if statement in the ranking expression - see retrieval and ranking.

For example:

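rank-profile drop-low-score {
    function my_score() {
        expression: ..... # custom first-phase score
    }
    # hits with a first-phase score at or below this limit are dropped
    rank-score-drop-limit: 0.0
    first-phase {
        expression: if(my_score() < query(threshold), -1, my_score())
    }
}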
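To make the effect of the expression concrete, here is a plain-Java sketch of the same logic applied to a list of scores. It assumes that hits whose first-phase score is at or below a drop limit of 0.0 are discarded; the class and method names are hypothetical and not part of Vespa.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation of a query-time drop threshold.
final class DropThreshold {

    // Mirrors if(my_score() < query(threshold), -1, my_score()).
    static double firstPhase(double myScore, double threshold) {
        return myScore < threshold ? -1.0 : myScore;
    }

    // Keeps only first-phase scores above dropLimit (an assumed setting).
    static List<Double> retained(List<Double> scores, double threshold, double dropLimit) {
        List<Double> kept = new ArrayList<>();
        for (double score : scores) {
            double ranked = firstPhase(score, threshold);
            if (ranked > dropLimit) {
                kept.add(ranked);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(retained(Arrays.asList(0.2, 0.7, 0.9), 0.5, 0.0));
    }
}
```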

Does Vespa support early termination of matching and ranking?

Yes, this can be accomplished by configuring match-phase in the rank profile, or by adding a range query item using hitLimit to the query tree - see capped numeric range search.
Both methods require an attribute field with fast-search. The capped range query is faster but beware that if there are other restrictive filters in the query one might end up with 0 hits. The additional filters are applied as a post filtering step over the hits from the capped range query. match-phase on the other hand is safe to use with filters or other query terms, and also supports diversification which the capped range query term does not support.


What limits apply to JSON document size?

There is no hard limit. Vespa requires a document to be able to load into memory in serialized form. Vespa is not optimized for huge documents.

Can a document have lists (key value pairs)?

E.g. a product is offered in a list of stores with a quantity per store. Use multivalue fields (array of struct) or parent/child. Which one to choose depends on the use case, see the discussion in the latter link.

Does a whole document need to be updated and re-indexed?

E.g., price and quantity available per store may change often compared to the actual product attributes. Vespa supports partial updates of documents. Also, the parent/child feature is implemented to support use cases where child elements are updated frequently, while a more limited set of parent elements are updated less frequently.

What ACID guarantees, if any, does Vespa provide for single writes / updates / deletes vs. batch operations?

See the Vespa Consistency Model. Vespa is not transactional in the traditional sense and does not provide strict ACID guarantees. Vespa is designed for high-performance use cases with eventual consistency as an acceptable (and to some extent configurable) trade-off.

Does Vespa support wildcard fields?

Wildcard fields are not supported in Vespa. A workaround is to use maps to store the wildcard fields. A map needs to be defined with indexing attribute and will hence be stored in memory. Refer to map.

Is there any size limitation in multivalued fields?

No limit, except memory.

Can we set a limit for the number of elements that can be stored in an array?

Implement a document processor for this.
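The core check such a document processor could perform is sketched below in plain Java. The Vespa document processor API is deliberately left out; the class name and the choice to truncate rather than reject the document are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: enforce an upper bound on array length.
final class ArrayLimit {

    // Returns a copy holding at most maxElements leading elements.
    static <T> List<T> truncate(List<T> values, int maxElements) {
        if (maxElements < 0) {
            throw new IllegalArgumentException("maxElements must be >= 0");
        }
        int end = Math.min(values.size(), maxElements);
        return new ArrayList<>(values.subList(0, end));
    }

    public static void main(String[] args) {
        System.out.println(truncate(Arrays.asList("a", "b", "c", "d"), 3));
    }
}
```

Rejecting the operation instead of truncating is an equally valid policy; which one fits depends on the application.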

How to auto-expire documents / set up garbage collection?

Set a selection criterion on the document element in services.xml. The criterion selects documents to keep. I.e. to purge documents “older than two weeks”, the expression should be “newer than two weeks”. Read more about document expiry.
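The inverted logic is easy to get wrong, so here is a plain-Java sketch of it: the predicate answers “should this document be kept?”, not “should it be removed?”. This is not the services.xml selection syntax; the class, the method, and the epoch-seconds timestamp are hypothetical.

```java
import java.time.Duration;

// Hypothetical illustration of a "keep" selection for document expiry.
final class ExpirySelection {

    static final long TWO_WEEKS_SECONDS = Duration.ofDays(14).getSeconds();

    // True when the document is newer than two weeks and should be kept.
    static boolean keep(long docTimestampSeconds, long nowSeconds) {
        return docTimestampSeconds > nowSeconds - TWO_WEEKS_SECONDS;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000;
        System.out.println(keep(now - 3600, now));             // one hour old
        System.out.println(keep(now - 30L * 24 * 3600, now));  // thirty days old
    }
}
```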


Are hierarchical facets supported?

Facets are called grouping in Vespa. Groups can be multi-level.

Are filters supported?

Add filters to the query in YQL, using boolean, numeric and text matching.

How to query for similar items?

One way is to describe items using tensors and query for the nearest neighbor - using full precision or approximate (ANN) - the latter is used when the set is too large for an exact calculation. Apply filters to the query to limit the neighbor candidate set. Using dot products or weakAnd are alternatives.
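For intuition, exact nearest neighbor search amounts to a scan over all candidates that pass the filter, keeping the closest one. The sketch below does this in plain Java with squared Euclidean distance; the names are hypothetical, and this is not how Vespa implements it internally.

```java
import java.util.function.IntPredicate;

// Hypothetical brute-force nearest neighbor with a candidate filter.
final class ExactNearestNeighbor {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns the index of the closest item accepted by the filter,
    // or -1 if the filter accepts no item.
    static int nearest(double[][] items, double[] query, IntPredicate filter) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < items.length; i++) {
            if (!filter.test(i)) {
                continue;
            }
            double distance = squaredDistance(items[i], query);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] items = { {0.0, 0.0}, {1.0, 1.0}, {5.0, 5.0} };
        System.out.println(nearest(items, new double[] {0.9, 1.2}, i -> true));
    }
}
```

The scan is linear in the number of filtered candidates, which is the cost that approximate search avoids on large sets.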

Stop-word support?

Vespa does not have a stop-word concept inherently. See the sample app for how to use filter terms.

How to extract more than 400 hits / query and get ALL documents?

Requesting more than 400 hits in a query returns this error: {'code': 3, 'summary': 'Illegal query', 'message': '401 hits requested, configured limit: 400.'}.

  • To increase max result set size, configure maxHits in a query profile, e.g. <field name="maxHits">500</field> in search/query-profiles/default.xml (create as needed). Query timeout can be increased, but it will still be costly and likely impact other queries - large limit more so than a large offset. It can be made cheaper by using a smaller document summary, and avoiding fields on disk if possible.
  • Using visit in the document/v1/ API is usually a better option for dumping all the data.

How to make a sub-query to get data to enrich the query, like get a user profile?

See the UserProfileSearcher for how to create a new query to fetch data - this creates a new Query, sets a new root and parameters - then fills the Hits.

How to create a cache that refreshes itself regularly?

See the sub-query question above, in addition add something like:

public class ConfigCacheRefresher extends AbstractComponent {
    private final ScheduledExecutorService configFetchService = Executors.newSingleThreadScheduledExecutor();
    private Chain<Searcher> searcherChain;

    // executionFactory and createQuery(...) are assumed to be provided elsewhere in this class
    void initialize() {
        Runnable task = () -> refreshCache();
        configFetchService.scheduleWithFixedDelay(task, 1, 1, TimeUnit.MINUTES);
        searcherChain = executionFactory.searchChainRegistry().getChain(new ComponentId("configDefaultProvider"));
    }

    public void refreshCache() {
        Execution execution = executionFactory.newExecution(searcherChain);
        Query query = createQuery(execution);
    }

    public void deconstruct() {
        try {
            configFetchService.shutdown(); // stop scheduling new runs before waiting
            configFetchService.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}


How to debug document processing chain configuration?

This configuration is a combination of content and container cluster configuration, see indexing and feed troubleshooting.

I feed documents with no error, but they are not in the index

This is often a problem when using document expiry: documents that have already expired are not persisted, but silently dropped. Feeding stale test data with old timestamps can cause this.

Does Vespa support addition of flexible NLP processing for documents and search queries?

E.g. integrating NER, word sense disambiguation, specific intent detection. Vespa supports these things well.

Does Vespa support customization of the inverted index?

E.g. instead of using terms or n-grams as the unit, we might use terms with specific word senses, e.g. bark (dog bark) vs. bark (tree bark), or BCG (company) vs. BCG (vaccine name). Creating a new index format means changing the core. However, for the examples above, one just needs control over the tokens which are indexed (and queried). That is easily done in some Java code. The simplest way to do this is to plug in a custom tokenizer. It gets called from the query parser and the bundled linguistics processing Searchers, as well as from the Document Processor creating the annotations that are consumed by the indexing operation. Since all of these are Searchers and Docprocs which you can replace, and you can add custom components before and after them, you can also take full control over these things without modifying the platform itself.

Does Vespa provide any support for named entity extraction?

It provides the building blocks but not an out-of-the-box solution. You can write a Searcher to detect query-side entities and rewrite the query, and a DocProc to handle them in some special way on the indexing side.

Does Vespa provide support for text extraction?

Vespa does not provide text extraction out of the box, but you can write a document processor for it.

Programming Vespa

Are Python plugins supported / is there a scripting language?

Plugins have to run in the JVM - Jython might be an alternative, but the Vespa Team has no experience with it. Vespa does not have a scripting language like Painless - it is more flexible to write application logic in a JVM-supported language, using Searchers and Document Processors.


Vespa has a near real-time indexing core with typically sub-second latencies from ingest to indexed. This depends on the use-case, available resources and how the system is tuned. Some more examples and thoughts can be found in the scaling guide.

Is there a batch ingestion mode, what limits apply?

Vespa does not have a concept of “batch ingestion” as it contradicts many of the core features that are the strengths of Vespa, including serving elasticity and sub-second indexing latency. That said, we have numerous use-cases in production that do high throughput updates to large parts of the (sometimes entire) document set. In cases where feed throughput is more important than indexing latency, you can tune this to meet your requirements. Some of this is detailed in the feed sizing guide.

Can the index support up to 512GB index size in memory?

Yes. The content node is implemented in C++ and is not memory constrained beyond what the operating system imposes.

How is a Get request for a document handled when the document is not in sync across all replica nodes?

If the replicas are in sync the request is only sent to the primary content node. Otherwise it’s sent to several nodes, depending on replica metadata. Example: if a bucket has 3 replicas A, B, C and A & B both have metadata state X and C has metadata state Y, a request will be sent to A and C (but not B since it has the same state as A and would therefore not return a potentially different document).
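The rule in the example can be read as: group replicas by metadata state and send the request to one replica per distinct state. The plain-Java code below illustrates only that rule; the names are hypothetical and the actual routing logic in Vespa is more involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of picking Get targets from replica metadata.
final class GetTargets {

    // replicaStates maps replica name -> metadata state, in replica order.
    // Returns the first replica seen for each distinct state.
    static List<String> select(Map<String, String> replicaStates) {
        Map<String, String> firstPerState = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : replicaStates.entrySet()) {
            firstPerState.putIfAbsent(e.getValue(), e.getKey());
        }
        return new ArrayList<>(firstPerState.values());
    }

    public static void main(String[] args) {
        Map<String, String> replicas = new LinkedHashMap<>();
        replicas.put("A", "X");
        replicas.put("B", "X");
        replicas.put("C", "Y");
        System.out.println(select(replicas));
    }
}
```

When all replicas share one state, the selection collapses to a single node, matching the in-sync case described above.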


How fast can nodes be added and removed from a running cluster?

Elasticity is a core Vespa strength - easily add and remove nodes with minimal (if any) serving impact. The exact time needed depends on how much data will need to be migrated in the background for the system to converge to ideal data distribution.

Should Vespa API search calls be load balanced or does Vespa do this automatically?

You will need to load balance incoming requests between the nodes running the stateless Java container cluster(s). This can typically be done using a simple network load balancer available in most cloud services. This is included when using Vespa Cloud, with an already load balanced HTTPS endpoint - both locally within the region and globally across regions.

Supporting index partitions

Search sizing is the intro to this. Topology matters, and this is much used in the high-volume Vespa applications to optimise latency vs. cost.

Can a running cluster be upgraded with zero downtime?

With Vespa Cloud, we do automated background upgrades daily without noticeable serving impact. If you host Vespa yourself, you can do this, but need to implement the orchestration logic necessary to handle this. The high level procedure is found in live-upgrade.

Can Vespa be deployed multi-region?

Vespa Cloud has integrated support - query a global endpoint. Writes will have to go to each zone. There is no auto-sync between zones.

Can Vespa serve an offline index?

Building indexes offline requires the partition layout to be known in the offline system, which is in conflict with elasticity and auto-recovery (where nodes can come and go without service impact). It is also at odds with realtime writes. For these reasons, it is not recommended, and not supported.

Does Vespa provide a tool to browse the index and attribute data?

No. Use visiting to dump all or a subset of documents. See dumping-data for a sample script.

What is the response when data is written to only some of the replica nodes and not all of them (based on the redundancy count of the content cluster)?

A failure response is returned if the document is not written to some of the replica nodes.

When the doc is not written to some of the nodes, will the document become available due to replica reconciliation?

Yes, it will be available, eventually. Also try Multinode testing and observability.

Does Vespa provide soft delete functionality?

Yes, add a “deleted” attribute with fast-search, and create a Searcher which adds an “andnot deleted” item to queries.
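The effect of such a Searcher is sketched below in plain Java as a filter over matching documents. The Vespa Searcher and query tree APIs are left out on purpose; the Doc class and its fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of soft delete as a query-time filter.
final class SoftDelete {

    static final class Doc {
        final String id;
        final boolean deleted;

        Doc(String id, boolean deleted) {
            this.id = id;
            this.deleted = deleted;
        }
    }

    // Equivalent in spirit to "<original query> ANDNOT deleted".
    static List<String> visibleIds(List<Doc> matches) {
        List<String> ids = new ArrayList<>();
        for (Doc doc : matches) {
            if (!doc.deleted) {
                ids.add(doc.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Doc> matches = Arrays.asList(new Doc("a", false), new Doc("b", true), new Doc("c", false));
        System.out.println(visibleIds(matches));
    }
}
```

Because the document stays stored, clearing the flag restores it without re-feeding.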

Can we configure a grace period for bucket distribution so that buckets are not redistributed as soon as a node goes down?

You can set a transition-time in services.xml to configure how long the cluster controller keeps a node in maintenance mode before automatically marking it down.

Grouping is used to reduce search latency. When using grouped distribution, content is distributed to a configured set of groups, such that the entire document collection is contained in each group. Setting the redundancy and searchable-copies equal to the number of groups ensures that data can be queried from all of the groups.

How to set up for disaster recovery / backup?

Refer to #17898 for a discussion of options.