Vespa Serving Tuning

This document describes how to tune certain features of an application for high performance. The main focus is on content cluster search features; see Container tuning for tuning of container clusters. The search sizing guide covers scaling an application deployment.

Attribute vs. index

The attribute documentation summarizes when to use attribute in the indexing statement.

field timestamp type long {
    indexing: summary | attribute
    rank:     filter
}
If both index and attribute are configured for a string type field, Vespa will search and match against the index with the default match mode text. All numeric type fields and tensor fields are attribute (in-memory) fields in Vespa.

By default Vespa does not build any posting list index structures over attribute fields. Adding fast-search to the attribute definition as shown below will add an in-memory B-tree posting list structure which enables faster search for some cases (but not all, see next paragraph):

field timestamp type long {
    indexing:  summary | attribute
    attribute: fast-search
    rank:      filter
}

When Vespa executes a query with multiple query items, it builds a query execution plan and tries to optimize it so that restrictive query tree items, meaning items that match few documents, are evaluated early and the temporary result set stays as small as possible. To do so, the planner looks at hit count estimates for each part of the query tree, using the index and B-tree dictionaries, which track the number of documents a given term occurs in. For attribute fields without fast-search there is no hit count estimate, so the estimate becomes equal to the total number of documents (matches all) and the item is moved to the end of the query evaluation. A query with only one query term searching an attribute field without fast-search is a linear scan over all documents and thus very slow:

select * from sources * where range(timestamp, 0, 100);
But if this query term is and-ed with another term which matches fewer documents, that term will determine the cost instead, and fast-search won't be necessary, for example:
select * from sources * where range(timestamp, 0, 100) and uuid contains "123e4567-e89b-12d3-a456-426655440000";
The general rules of thumb for when to use fast-search for an attribute field are:
  • Use fast-search if the attribute field is searched without any other query terms
  • Use fast-search if the attribute field could limit the total number of hits efficiently
Changing the fast-search setting of an attribute does not require re-feeding, so testing the performance with and without it is low effort. Adding or removing fast-search does require a restart.
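
The effect of hit count estimates on evaluation order, as described above, can be illustrated with a minimal Python sketch. It is a toy model and not Vespa's query planner: the Term class, the plan_and function and the corpus size are hypothetical names and numbers chosen for illustration. A term without a posting list gets the total document count as its estimate and is therefore ordered last.

```python
from dataclasses import dataclass

TOTAL_DOCS = 1_000_000  # hypothetical corpus size


@dataclass
class Term:
    name: str
    has_posting_list: bool  # index field, or attribute with fast-search
    matching_docs: int      # number of documents the term really matches

    def estimate(self) -> int:
        # Without a dictionary or posting list there is nothing to consult,
        # so the planner must assume the term may match every document.
        return self.matching_docs if self.has_posting_list else TOTAL_DOCS


def plan_and(terms):
    """Order the children of an AND so the most restrictive estimate comes first."""
    return sorted(terms, key=lambda t: t.estimate())


uuid_term = Term("uuid contains ...", has_posting_list=True, matching_docs=1)
range_plain = Term("range(timestamp, 0, 100)", has_posting_list=False, matching_docs=5_000)
range_fast = Term("range(timestamp, 0, 100)", has_posting_list=True, matching_docs=5_000)

plan = plan_and([range_plain, uuid_term])
print([t.name for t in plan])
```

In this model the uuid term drives the evaluation and the range term is only checked for the few documents that survive, which is why fast-search adds little when a more restrictive term is present.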

Hybrid TAAT (term-at-a-time) and DAAT (document-at-a-time) query evaluation

Generally Vespa does DAAT (document-at-a-time) query evaluation and not TAAT (term-at-a-time), so matching and ranking are not two fully separate phases where one first finds matches and later ranks them. With DAAT, matching and first-phase score calculation are interleaved.

The first-phase ranking score is assigned to a hit when it satisfies the query constraints. At that point the term iterators are positioned at the document id, and additional data can be unpacked from the term posting lists, for example the term positions within the document, which the nativeRank ranking feature needs for term proximity scoring.

Vespa can also do a hybrid query evaluation that combines TAAT and DAAT, using TAAT for sub-trees of the query tree that are not used for ranking but only for document filtering, where unpacking of posting lists is not needed. With rank:filter on a field, unpacking is avoided on a per term/field basis, while the TAAT evaluation applies to a whole query sub-tree. Using TAAT can speed up query matching when the query tree is large and complex and only small parts of it are used for ranking hits, or when the ranking expression only uses attribute or tensor features.

For example, consider a query request that takes input from an end user through the userQuery() syntax and combines it with business logic filters, in this example a language constraint and a market constraint:

  'hits': 10,
  'ranking.profile': "text-and-popularity",
  'yql': 'select * from sources * where userQuery() and \
   (language contains "en" or language contains "br") and \
   (market contains "us" or market contains "eu" or market contains "apac" or market contains ".." ... ..... ..);',
  'query': 'cat video'
With ranking profile text-and-popularity defined as:
rank-profile text-and-popularity {
  first-phase {
    expression: attribute(popularity) + log10(bm25(title))
  }
}
In this case the ranking profile only uses two signals, the popularity value and the bm25 score of the title as calculated from the input userQuery() terms. The language and the market constraints in the query tree are not used in the ranking score and that part of the query tree could be evaluated using TAAT. The sub-tree result is then passed as a bit vector into the DAAT query evaluation which could speed up the overall evaluation. Enabling hybrid TAAT is done by passing ranking.matching.termwiselimit=0.01 as a request parameter so one can evaluate if using the hybrid evaluation improves performance.
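
The hybrid evaluation described above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not Vespa's implementation: the posting lists, document ids and the function names taat_or and daat_and are made up for illustration, and sets stand in for real posting list iterators. The filter sub-trees are evaluated one term at a time into a bit vector, which the document-at-a-time loop over the ranked terms then consults.

```python
# Toy posting lists: (field, term) -> sorted list of matching document ids.
POSTINGS = {
    ("language", "en"): [0, 1, 2, 5, 7],
    ("language", "br"): [3, 6],
    ("market", "us"): [0, 2, 3, 7],
    ("market", "eu"): [1, 6],
    ("title", "cat"): [0, 1, 3, 4, 7],
    ("title", "video"): [0, 3, 4, 5, 7],
}
NUM_DOCS = 8


def taat_or(terms):
    """Term-at-a-time OR: walk one posting list fully, then the next, setting bits."""
    bits = 0
    for term in terms:
        for doc_id in POSTINGS[term]:
            bits |= 1 << doc_id
    return bits


def daat_and(terms, filter_bits):
    """Document-at-a-time AND over the ranked terms, restricted by a bit vector."""
    lists = [set(POSTINGS[t]) for t in terms]  # stand-in for posting list iterators
    hits = []
    for doc_id in range(NUM_DOCS):
        if not (filter_bits >> doc_id) & 1:
            continue  # rejected by the filter without touching the ranked terms
        if all(doc_id in plist for plist in lists):
            hits.append(doc_id)  # first-phase ranking would be computed here
    return hits


language = taat_or([("language", "en"), ("language", "br")])
market = taat_or([("market", "us"), ("market", "eu")])
filter_bits = language & market
hits = daat_and([("title", "cat"), ("title", "video")], filter_bits)
print(hits)
```

Only the title terms would need position unpacking for ranking in this model; the language and market terms contribute a single bit per document.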

Indexing uuids

When configuring string type fields with index, the default match mode is text. This means Vespa will tokenize the content and index the tokens.

The string representation of a universally unique identifier (UUID) is 32 hexadecimal (base 16) digits, in five groups separated by hyphens, in the form 8-4-4-4-12, for a total of 36 characters (32 alphanumeric characters and four hyphens).

Example: With the default match mode text, indexing 123e4567-e89b-12d3-a456-426655440000 produces 5 tokens: [123e4567,e89b,12d3,a456,426655440000]. Each token can be matched independently, which can lead to incorrect matches.

To avoid this, change the mode to match: word to treat the entire uuid as one token/word:

field uuid type string {
    indexing: summary | index
    match:    word
    rank:     filter
}
In addition, configure the uuid as a rank: filter field - the field will then be represented as efficiently as possible during search and ranking.

The rank:filter behavior can also be triggered at query time on a per query item basis in a custom searcher.
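
The difference between text and word matching for a UUID can be illustrated with a short Python sketch. The two functions are simplified stand-ins and not Vespa's linguistics processing: tokenize_text is assumed to split on any non-alphanumeric character, and tokenize_word keeps the value as a single token.

```python
import re


def tokenize_text(value):
    # Simplified stand-in for match: text, splitting on non-alphanumerics.
    return [t for t in re.split(r"[^0-9A-Za-z]+", value.lower()) if t]


def tokenize_word(value):
    # Stand-in for match: word, the whole value is one token.
    return [value]


indexed = "123e4567-e89b-12d3-a456-426655440000"

text_tokens = tokenize_text(indexed)
word_tokens = tokenize_word(indexed)

print(text_tokens)
print("e89b" in text_tokens)  # a fragment alone matches in text mode
print("e89b" in word_tokens)  # but not in word mode
```

In this model a query for the fragment e89b would match any document whose uuid contains that group when the field uses text matching, while word matching only matches the complete identifier.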

Parent child and search performance

When searching imported attribute fields from parent document types there is an additional cost, which can be reduced significantly if the imported field is defined with rank:filter and visibility-delay is set to a value greater than 0.

Ranking and ML Model inferences

Query cost scales with the number of hits that the query retrieves per node/search thread and that must be evaluated by the first-phase ranking function. Read more on phased ranking. Phased ranking makes it possible to spend more resources per hit in a second-phase ranking step than in the first phase. The first phase should focus on getting decent recall (retrieving the relevant documents in the top k), while the second phase is used to tune precision at k, bringing the relevant documents to the top of the result set.
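
The division of work between the two phases can be sketched in Python as follows. This is a toy model, not Vespa's ranking framework: the scoring functions, the field names popularity and model_score, and the rerank_count parameter are assumptions made for illustration. A cheap function scores every matched hit, and an expensive function re-scores only the best candidates.

```python
import heapq


def first_phase(doc):
    # Cheap score, computed for every hit that matches the query.
    return doc["popularity"]


def second_phase(doc):
    # Stand-in for an expensive score, such as an ML model inference.
    return doc["popularity"] + 10.0 * doc["model_score"]


def phased_rank(docs, rerank_count):
    # Keep only the best candidates from the first phase ...
    candidates = heapq.nlargest(rerank_count, docs, key=first_phase)
    # ... and spend the expensive computation on those alone.
    return sorted(candidates, key=second_phase, reverse=True)


docs = [
    {"id": "a", "popularity": 0.9, "model_score": 0.1},
    {"id": "b", "popularity": 0.8, "model_score": 0.9},
    {"id": "c", "popularity": 0.7, "model_score": 0.5},
    {"id": "d", "popularity": 0.1, "model_score": 1.0},
]

ranked_ids = [doc["id"] for doc in phased_rank(docs, rerank_count=3)]
print(ranked_ids)
```

Document d has the highest second-phase score in this data but never reaches the second phase, which shows why the first-phase function has to provide decent recall.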

For text ranking applications, consider using the WAND query operator - WAND can efficiently find the top k documents using a linear ranking function (inner dot product).

Document summaries - hits

If queries request many hits from a few content nodes, a summary cache might reduce latency and cost.

Using Document summaries, Vespa can support memory-only operations if fields in the summary are defined as attribute.

Boolean, numeric, text attribute

When selecting an attribute field type with performance in mind, use these rules of thumb:

  1. Use boolean if a field is a boolean (max two values)
  2. Use a string attribute if there is a set of values - only unique strings are stored
  3. Use a numeric attribute for range searches
Refer to attributes for details.
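
The remark that only unique strings are stored can be illustrated with a small Python sketch of dictionary encoding. It is a simplified illustration and not Vespa's attribute implementation; the class name EnumeratedStrings and its members are hypothetical. Each distinct string is kept once, and every document stores only a small integer that refers to it.

```python
class EnumeratedStrings:
    """Toy string column that stores each distinct value once."""

    def __init__(self):
        self.unique = []    # each distinct string, stored once
        self.index_of = {}  # string -> position in self.unique
        self.per_doc = []   # one small integer per document

    def add(self, value):
        idx = self.index_of.get(value)
        if idx is None:
            idx = len(self.unique)
            self.unique.append(value)
            self.index_of[value] = idx
        self.per_doc.append(idx)

    def get(self, doc_id):
        return self.unique[self.per_doc[doc_id]]


column = EnumeratedStrings()
for market in ["us", "eu", "us", "apac", "us"]:
    column.add(market)

print(len(column.unique), column.per_doc, column.get(2))
```

With a small set of possible values, the per-document cost in this model is one integer regardless of how long the strings are.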