Vespa Serving Tuning

This document describes how to tune certain features of an application for high performance. The main focus is on content cluster search features; see Container tuning for tuning of container clusters. The search sizing guide covers scaling an application deployment.

Attribute vs. index

The attribute documentation summarizes when to use attribute in the indexing statement.

field timestamp type long {
    indexing: summary | attribute
    rank:     filter
}

If both index and attribute are configured for a string type field, Vespa searches and matches against the index, using the default match mode text. All numeric type fields and tensor fields are attribute (in-memory) fields in Vespa.

By default Vespa does not build any posting list index structures over attribute fields. Adding fast-search to the attribute definition as shown below will add an in-memory B-tree posting list structure which enables faster search for some cases (but not all, see next paragraph):

field timestamp type long {
    indexing:  summary | attribute
    attribute: fast-search
    rank:      filter
}

When Vespa executes a query with multiple query items, it builds a query execution plan and tries to optimize it so that restrictive query tree items, meaning those that match few documents, are evaluated early and the temporary result set stays as small as possible. To do so, the planner looks at hit count estimates for each part of the query tree, using the index and B-tree dictionaries, which track the number of documents a given term occurs in. For attribute fields without fast-search there is no hit count estimate, so the estimate equals the total number of documents (matches all) and the term is moved to the end of the query evaluation. A query with only one query term, searching an attribute field without fast-search, is a linear scan over all documents and thus very slow:

select * from sources * where range(timestamp, 0, 100);

But if this query term is and-ed with a more restrictive query term (see the section on indexing strings below), the query optimizer finds that the second query term has a lower hit count estimate than the timestamp range. The query then becomes fast, and its cost scales with the number of documents that match the uuid field:

select * from sources * where range(timestamp, 0, 100) and uuid contains "123e4567-e89b-12d3-a456-426655440000";
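
The effect of hit count estimates on evaluation order can be illustrated with a small Python model. This is a simplified sketch, not Vespa's query planner: the names Term, estimate and evaluate_and are hypothetical, and the "checks" counter is only a rough stand-in for per-document work.

```python
from dataclasses import dataclass


@dataclass
class Term:
    name: str
    matches: set          # document ids the term actually matches
    has_estimate: bool    # True when an index or B-tree dictionary gives a hit count


def estimate(term, total_docs):
    # Without a dictionary there is no hit count, so assume the term matches everything.
    return len(term.matches) if term.has_estimate else total_docs


def evaluate_and(terms, total_docs):
    """Evaluate an AND of terms, lowest estimate first.

    Returns the matching ids and the number of per-document checks performed.
    """
    ordered = sorted(terms, key=lambda t: estimate(t, total_docs))
    first, rest = ordered[0], ordered[1:]
    if first.has_estimate:
        # A dictionary lookup yields the hits directly.
        candidates, checks = set(first.matches), 0
    else:
        # No dictionary: every document must be inspected.
        candidates = {d for d in range(total_docs) if d in first.matches}
        checks = total_docs
    for term in rest:
        checks += len(candidates)
        candidates = {d for d in candidates if d in term.matches}
    return candidates, checks


TOTAL_DOCS = 1000
timestamp = Term("timestamp range", set(range(0, 101)), has_estimate=False)
uuid = Term("uuid", {42}, has_estimate=True)

print(evaluate_and([timestamp], TOTAL_DOCS)[1])        # 1000 checks: linear scan
print(evaluate_and([timestamp, uuid], TOTAL_DOCS)[1])  # 1 check: uuid drives the plan
```

In this model the range term alone touches every document, while the combined query only checks the single document that the uuid term produces.
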
The general rules of thumb for when to use fast-search for an attribute field are:
  • Use fast-search if the attribute field is searched without any other query terms
  • Use fast-search if the attribute field could limit the total number of hits efficiently
Changing the fast-search setting of an attribute is a live change that does not require any re-feeding, so testing the performance with and without it is low effort. Adding or removing fast-search does, however, require a restart.

Indexing strings

When configuring string type fields with index, the default match mode is text. This means Vespa will tokenize the content and index the tokens. Example document definition:

schema foo {
    document foo {
        field title type string {
            indexing: summary | index
        }
        field uuid type string {
            indexing: summary | index
        }
    }
}

The string representation of a universally unique identifier (UUID) is 32 hexadecimal (base 16) digits, displayed in five groups separated by hyphens, in the form 8-4-4-4-12, for a total of 36 characters (32 alphanumeric characters and four hyphens).

Example: when indexing 123e4567-e89b-12d3-a456-426655440000 with the above document definition, Vespa tokenizes it into 5 tokens: [123e4567,e89b,12d3,a456,426655440000].

Phrase search is evaluated over positional indices and has a higher cost than searching for a single word term. Vespa creates implicit phrases when terms are joined by hyphens. Hence /search/?query=uuid:123e4567-e89b-12d3-a456-426655440000 becomes a phrase query: uuid:"123e4567 e89b 12d3 a456 426655440000".

Change the mode to match: word to disable tokenization. This stores the input 123e4567-e89b-12d3-a456-426655440000 as one token and avoids implicit phrase search:

field uuid type string {
    indexing: summary | index
    match:    word
    rank:     filter
}

Also configure uuid as a rank: filter field. The field will then be represented as efficiently as possible during search and ranking.
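
The difference between the two match modes can be sketched in Python. This is a rough model under stated assumptions: splitting on non-alphanumeric characters stands in for real linguistic tokenization, and index_tokens and query_shape are hypothetical helper names, not Vespa APIs.

```python
import re


def index_tokens(value, match="text"):
    """Return the tokens stored for a string field value.

    match="text" splits on non-alphanumeric characters (a simplification);
    match="word" keeps the whole value as a single token.
    """
    if match == "word":
        return [value]
    return [t for t in re.split(r"[^0-9A-Za-z]+", value) if t]


def query_shape(value, match="text"):
    # More than one token means the query has to be evaluated as a phrase.
    tokens = index_tokens(value, match)
    return ("phrase", tokens) if len(tokens) > 1 else ("term", tokens)


uuid = "123e4567-e89b-12d3-a456-426655440000"
print(query_shape(uuid, "text"))  # a five-token phrase
print(query_shape(uuid, "word"))  # a single term
```

With match: word the UUID is looked up as one term, so no positional phrase evaluation is needed.
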

Summary: Review the string fields in the application:

  • tokenized matching or not
  • used in ranking or not

The rank:filter behavior can also be triggered at query time, on a per query item basis, in a custom searcher.

Parent child and search performance

Searching imported attribute fields from parent document types carries an additional cost. This cost can be reduced significantly if the imported field is defined with rank:filter and visibility-delay is set to a value greater than 0.

Ranking and ML Model inferences

Ranking cost scales with the number of hits the query retrieves per node and search thread, since each of these hits must be evaluated by the first-phase ranking function. Read more on phased ranking. Phased ranking makes it possible to spend more resources per hit in a second-phase ranking step than in the first phase. The first phase should focus on decent recall (retrieving the relevant documents in the top k), while the second phase is used to tune precision at k, bringing the relevant documents to the top of the result set.
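
The idea of phased ranking can be sketched as follows. This is a minimal Python model, not how Vespa implements it: phased_rank and its parameters are hypothetical names, and hits outside the reranked set are simply dropped here for brevity.

```python
import heapq


def phased_rank(hits, first_phase, second_phase, rerank_count):
    """Score every hit cheaply, then rescore only the best rerank_count hits."""
    # First phase: cheap function evaluated for every retrieved hit.
    scored = [(first_phase(h), h) for h in hits]
    top = heapq.nlargest(rerank_count, scored, key=lambda pair: pair[0])
    # Second phase: expensive function evaluated only for the survivors.
    reranked = [(second_phase(h), h) for _, h in top]
    reranked.sort(key=lambda pair: pair[0], reverse=True)
    return [h for _, h in reranked]


expensive_calls = []


def expensive(hit):
    expensive_calls.append(hit)
    return -hit  # toy model that reverses the first-phase order


result = phased_rank(range(100), first_phase=lambda h: h,
                     second_phase=expensive, rerank_count=10)
print(result)                # hits 90..99, reordered by the second phase
print(len(expensive_calls))  # 10 expensive evaluations instead of 100
```

The expensive function runs a fixed number of times regardless of how many hits the query retrieves, which is why the first phase should be kept cheap.
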

For text ranking applications, consider using the WAND query operator - WAND can efficiently find the top k documents using a linear ranking function (inner dot product).
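
For reference, the scoring that such a linear ranking function performs can be written as an exhaustive baseline in Python. This sketch deliberately does not implement WAND's skipping logic. It only shows the dot product and the top k selection that an exhaustive evaluation would produce, and the function names and sample data are hypothetical.

```python
import heapq


def dot_product(query_weights, doc_weights):
    # Sum of weight products over the terms present in both query and document.
    return sum(w * doc_weights[t] for t, w in query_weights.items() if t in doc_weights)


def top_k_exhaustive(query_weights, docs, k):
    """Score every document and keep the k best (doc_id, score) pairs."""
    scored = ((doc_id, dot_product(query_weights, weights))
              for doc_id, weights in docs.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


query = {"vespa": 2.0, "tuning": 1.0}
docs = {
    "a": {"vespa": 1.0},
    "b": {"vespa": 1.0, "tuning": 3.0},
    "c": {"search": 5.0},
}
print(top_k_exhaustive(query, docs, k=2))
```

The baseline scores every document; the benefit of WAND is avoiding full scoring of documents that cannot reach the top k.
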

Document summaries - hits

If queries request many hits from a few content nodes, a summary cache might reduce latency and cost.

Using Document summaries, Vespa can support memory-only operations if fields in the summary are defined as attribute.

Boolean, numeric, text attribute

When selecting an attribute field type with performance in mind, use these rules of thumb:

  1. Use boolean if a field is a boolean (max two values)
  2. Use a string attribute if there is a set of values - only unique strings are stored
  3. Use a numeric attribute for range searches
Refer to attribute memory usage for details.
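
Why a string attribute suits a field with a limited set of values can be illustrated with a small Python model of unique-value storage. This is a conceptual sketch only: the class name and layout are hypothetical and do not describe Vespa's actual data structures.

```python
class EnumeratedStringAttribute:
    """Store each distinct string once and keep a small integer per document."""

    def __init__(self):
        self._values = []  # unique strings, indexed by enum id
        self._ids = {}     # string -> enum id
        self._docs = []    # per-document enum id

    def append(self, value):
        enum_id = self._ids.get(value)
        if enum_id is None:
            # First occurrence: store the string and assign the next id.
            enum_id = len(self._values)
            self._ids[value] = enum_id
            self._values.append(value)
        self._docs.append(enum_id)

    def get(self, doc_id):
        return self._values[self._docs[doc_id]]

    def unique_count(self):
        return len(self._values)


attr = EnumeratedStringAttribute()
for color in ["red", "green", "red", "red", "blue"]:
    attr.append(color)

print(attr.unique_count())  # 3 distinct strings stored for 5 documents
print(attr.get(2))          # red
```

In this model the per-document cost is one integer, and the string data grows only with the number of distinct values.
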