
Vespa Serving Tuning

This document describes tuning certain features of an application for high performance. The main focus is on content cluster search features; see Container tuning for tuning of container clusters. The search sizing guide is about scaling an application deployment.

Attribute vs. index

The attribute documentation summarizes when to use attribute in the indexing statement. Also see the procedure for changing from attribute to index and vice versa.

field timestamp type long {
    indexing: summary | attribute
    rank:     filter
}

If both index and attribute are configured for a string type field, Vespa will search and match against the index, using the default match mode text. All numeric type fields and tensor fields are attribute (in-memory) fields in Vespa.

When to use fast-search for attribute fields

By default, Vespa does not build any posting list index structures over attribute fields. Adding fast-search to the attribute definition as shown below will add an in-memory B-tree posting list structure which enables faster search for some cases (but not all, see next paragraph):

field timestamp type long {
    indexing:  summary | attribute
    attribute: fast-search
    rank:      filter
}

When Vespa runs a query with multiple query items, it builds a query execution plan and tries to optimize it so the temporary result set is as small as possible. To do this, restrictive query tree items (those matching few documents) are evaluated early. The query execution plan uses hit count estimates for each part of the query tree, taken from the index and B-tree dictionaries, which track the number of documents each term occurs in.

However, for attribute fields without fast-search there is no hit count estimate, so the estimate becomes the total number of documents (matches all), and the term is moved to the end of the query evaluation. A query with only one query term searching an attribute field without fast-search would be a linear scan over all documents, and thus expensive:

select * from sources * where range(timestamp, 0, 100);

But if this query term is and-ed with another term which matches fewer documents, that term will determine the cost instead, and fast-search won't be necessary, e.g.:

select * from sources * where range(timestamp, 0, 100) and uuid contains "123e4567-e89b-12d3-a456-426655440000";
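The ordering effect can be sketched in a few lines of Python. This is a simplified model, not Vespa's query planner: the `Term` class, its `has_dictionary` flag, the document counts and `TOTAL_DOCS` are all hypothetical. A term without a dictionary (an attribute without fast-search) gets the total document count as its estimate, so it is always ordered last.

```python
from dataclasses import dataclass

TOTAL_DOCS = 1_000_000  # hypothetical corpus size


@dataclass
class Term:
    name: str
    has_dictionary: bool  # index field, or attribute with fast-search
    doc_frequency: int    # true number of matching documents (toy data)

    def estimate(self) -> int:
        # A dictionary tracks how many documents contain the term.
        # Without one there is no estimate, so assume "matches all".
        return self.doc_frequency if self.has_dictionary else TOTAL_DOCS


def plan(terms: list) -> list:
    # Evaluate the most restrictive (lowest estimate) AND-ed terms first,
    # keeping the temporary result set small.
    return [t.name for t in sorted(terms, key=Term.estimate)]


timestamp = Term("range(timestamp, 0, 100)", has_dictionary=False, doc_frequency=50_000)
uuid = Term('uuid contains "123e4567-e89b-12d3-a456-426655440000"',
            has_dictionary=True, doc_frequency=1)
print(plan([timestamp, uuid]))  # the uuid term comes first
```

With fast-search on timestamp, its estimate would be its real document frequency instead of the corpus size.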

The general rules of thumb for when to use fast-search for an attribute field are:

  • Use fast-search if the attribute field is searched without any other query terms
  • Use fast-search if the attribute field could limit the total number of hits efficiently

Changing the fast-search aspect of an attribute does not require any re-feeding, so testing performance with and without it is low effort. Adding or removing fast-search does, however, require a restart.

Note that attribute fields with fast-search that are not used in term based ranking should use rank: filter for optimal performance. See reference rank: filter.

Hybrid TAAT and DAAT query evaluation

Vespa supports hybrid query evaluation over inverted indexes, combining TAAT and DAAT evaluation to get the best of both techniques. Hybrid evaluation is not enabled by default; it is triggered by a runtime query parameter.

  • TAAT: Term At A Time scores documents one query term at a time. The entire posting list can be read per query term while the score of each document is accumulated. It is CPU cache friendly, as posting data is read sequentially without random seeks in the posting list iterator. The downside is that TAAT limits the term-based ranking function to a linear sum of term scores. This downside is one reason why most search engines use DAAT.
  • DAAT: Document At A Time scores documents completely one at a time. This requires multiple seeks in the term posting lists, which is CPU cache unfriendly, but allows non-linear ranking functions.

Generally, Vespa does DAAT (document-at-a-time) query evaluation and not TAAT (term-at-a-time), for the reasons listed above.
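The difference between the two strategies can be sketched on toy in-memory posting lists. The data and function names are hypothetical, and real posting lists are compressed structures iterated with seeks rather than Python dicts. The sketch only shows that TAAT accumulates per-term contributions, while DAAT sees all term scores of a document at once and can therefore apply a non-linear ranking function.

```python
import heapq
from collections import defaultdict
from itertools import groupby

# Toy posting lists: term -> {doc_id: term score}. Hypothetical data.
postings = {
    "cat": {1: 1.2, 3: 0.4, 7: 2.0},
    "video": {3: 1.0, 5: 0.7, 7: 0.5},
}


def taat(postings):
    # Term at a time: read each posting list sequentially and accumulate
    # partial scores per document. Only a linear sum can be expressed.
    acc = defaultdict(float)
    for plist in postings.values():
        for doc, score in plist.items():
            acc[doc] += score
    return dict(acc)


def daat(postings, rank):
    # Document at a time: walk all posting lists in doc-id order and score
    # each document completely, so `rank` may be any function of the
    # per-term scores, including non-linear ones.
    streams = [
        [(doc, term, score) for doc, score in sorted(plist.items())]
        for term, plist in postings.items()
    ]
    merged = heapq.merge(*streams)  # global doc-id order across all terms
    return {
        doc: rank({term: score for _, term, score in group})
        for doc, group in groupby(merged, key=lambda entry: entry[0])
    }


linear = daat(postings, lambda s: sum(s.values()))
non_linear = daat(postings, lambda s: max(s.values()) * len(s))  # not a linear sum
print(taat(postings), linear, non_linear)
```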

Ranking (score calculation) and matching (whether the document matches the query logic) are not two fully separate phases, where one first finds matches and later calculates the ranking score. With DAAT, matching and first-phase score calculation are interleaved.

The first-phase ranking score is assigned to the hit when it satisfies the query constraints. At that point, the term iterators are positioned at the document id and one can unpack additional data from the term posting lists - e.g. for term proximity scoring used by the nativeRank ranking feature, which also requires unpacking of positions of the term within the document.

In hybrid query evaluation, TAAT is used for the sub-branches of the overall query tree that are not used for term-based ranking.

Using TAAT can speed up query matching significantly (up to 30-50%) in cases where the query tree is large and complex, and where only parts of the query tree are used for term-based ranking. Query tree branches that use text ranking features such as bm25 or nativeRank require DAAT. The list of ranking features that can handle TAAT is long, and if ranking uses only attribute or tensor features, the entire query tree can be evaluated using TAAT.

For example, for a query where there is a user text query from an end user, one can use userQuery() YQL syntax and combine it with application level constraints. The application level filter constraints in the query could benefit from using TAAT. Given the following document schema:

search news {
  document news {
    field title type string {
      indexing: summary | index
      index: enable-bm25
    }
    field body type string {
      indexing: summary | index
      index: enable-bm25
    }
    field popularity type float {
      indexing: summary | attribute
    }
    field market type string {
      indexing: attribute
      attribute: fast-search
    }
    field language type string {
      indexing: attribute
      attribute: fast-search
    }
  }
  fieldset default {
    fields: title, body
  }
  rank-profile text-and-popularity {
    first-phase {
      expression: attribute(popularity) + log10(bm25(title)) + log10(bm25(body))
    }
  }
}

In this case the rank profile uses only two ranking features: the popularity attribute and the bm25 score for the userQuery() terms, which search the default fieldset containing title and body. Notice that neither market nor language is used in the ranking expression.

In this query example, there is a language constraint and a market constraint, where both language and market are queried with a long list of valid values using OR, meaning that the document should match any of the market constraints and any of the language constraints.

{
  'hits': 10,
  'ranking.profile': 'text-and-popularity',
  'yql': 'select * from sources * where userQuery() and \
   (language contains "en" or language contains "br") and \
   (market contains "us" or market contains "eu" or market contains "apac" or market contains ".." ... ..... ..)',
  'query': 'cat video',
  'ranking.matching.termwiselimit': 0.01
}

The language and the market constraints in the query tree are not used in the ranking score and that part of the query tree could be evaluated using TAAT. See also multi lookup set filter for how to most efficiently search with large set filters. The sub-tree result is then passed as a bit vector into the DAAT query evaluation, which could speed up the overall evaluation significantly.

Enabling hybrid TAAT is done by passing ranking.matching.termwiselimit=0.01 as a request parameter.

One can evaluate whether hybrid evaluation improves search performance by adding the above parameter. The limit is compared to the hit fraction estimate of the sub-branch of the query tree; if the estimate is higher than the limit, termwise evaluation is used for that sub-branch.
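A toy model of this decision and of the bit vector hand-off to DAAT may help. Everything here is hypothetical: the documents, the helper names, and the estimate (the exact hit fraction of each OR branch, with the minimum taken for the AND). Vespa's real estimates and bit vectors differ; the sketch only shows that a filter sub-branch whose estimated hit fraction exceeds the limit is resolved once into a bit vector, and is otherwise checked per candidate document, with identical results.

```python
NUM_DOCS = 8
# Hypothetical attribute values per document id.
language = {0: "en", 1: "br", 2: "de", 3: "en", 4: "fr", 5: "en", 6: "br", 7: "de"}
market = {0: "us", 1: "eu", 2: "us", 3: "apac", 4: "eu", 5: "jp", 6: "us", 7: "eu"}
LANGS, MARKETS = {"en", "br"}, {"us", "eu", "apac"}


def or_estimate(field, values):
    # Assumed estimate: fraction of documents matching the OR branch.
    return sum(1 for v in field.values() if v in values) / NUM_DOCS


def or_bitvector(field, values):
    # TAAT-style: scan the branch once, setting bit `doc` for each match.
    bits = 0
    for doc, value in field.items():
        if value in values:
            bits |= 1 << doc
    return bits


def search(text_scores, termwiselimit):
    # AND of two OR branches: use the smaller hit fraction as the estimate.
    estimate = min(or_estimate(language, LANGS), or_estimate(market, MARKETS))
    if estimate > termwiselimit:
        # Termwise: resolve the filter sub-tree into one bit vector up front.
        bits = or_bitvector(language, LANGS) & or_bitvector(market, MARKETS)
        accept = lambda doc: bool((bits >> doc) & 1)
    else:
        # Otherwise the filter is checked per candidate document (DAAT).
        accept = lambda doc: language[doc] in LANGS and market[doc] in MARKETS
    # The ranked part (precomputed toy text scores) only sees accepted docs.
    return sorted(((s, d) for d, s in text_scores.items() if accept(d)), reverse=True)


text_scores = {0: 1.5, 2: 0.9, 3: 2.1, 5: 0.3}  # hypothetical bm25-based scores
print(search(text_scores, termwiselimit=0.01))
```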

Indexing uuids

When configuring string type fields with index, the default match mode is text. This means Vespa will tokenize the content and index the tokens.

The string representation of a universally unique identifier (UUID) is 32 hexadecimal (base 16) digits in five groups separated by hyphens, in the form 8-4-4-4-12, for a total of 36 characters (32 alphanumeric characters and four hyphens).

Example: When indexing 123e4567-e89b-12d3-a456-426655440000 in a string field using the default text match mode, Vespa will tokenize it into 5 tokens: [123e4567, e89b, 12d3, a456, 426655440000], each of which could be matched independently, possibly leading to incorrect matches.
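The tokenization difference can be approximated in Python. `text_tokens` is a rough stand-in that splits on non-alphanumeric characters; it is an assumption for illustration, not Vespa's linguistics implementation. `word_tokens` models the whole value being kept as one token.

```python
import re

UUID = "123e4567-e89b-12d3-a456-426655440000"


def text_tokens(value):
    # Rough stand-in for text-mode tokenization: split on characters that
    # are not letters or digits. Vespa's linguistics processing differs.
    return [t for t in re.split(r"[^0-9A-Za-z]+", value) if t]


def word_tokens(value):
    # Word matching keeps the entire field value as a single token.
    return [value]


print(text_tokens(UUID))  # ['123e4567', 'e89b', '12d3', 'a456', '426655440000']
# A query for one group alone matches in text mode, but not in word mode:
print("a456" in text_tokens(UUID), "a456" in word_tokens(UUID))  # True False
```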

To avoid this, change the mode to match: word to treat the entire uuid as one token/word:

field uuid type string {
    indexing: summary | index
    match:    word
    rank:     filter
}

In addition, configure the uuid as a rank: filter field; the field will then be represented as efficiently as possible during search and ranking. The rank: filter behavior can also be triggered at query time, per query item, by com.yahoo.prelude.query.Item.setRanked() in a custom searcher.

Parent/child and search performance

When searching imported attribute fields (with fast-search) from parent document types, there is an additional indirection. Its cost can be reduced significantly if the imported field is defined with rank: filter and visibility-delay is configured to > 0. The rank: filter setting impacts posting list granularity, and visibility-delay enables a cache for the indirection between the child and parent document.

Ranking and ML Model inferences

Query cost in Vespa scales with the number of hits the query retrieves per node/search thread, all of which need to be evaluated by the first-phase ranking function. Read more on phased ranking. Phased ranking makes it possible to spend more resources in a second-phase ranking step than in the first phase. The first phase should focus on getting decent recall (retrieving relevant documents into the top k), while the second phase is used to tune precision.
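The split between a cheap first phase and an expensive second phase can be sketched with `heapq.nlargest`. The scoring functions, the document set and the `rerank_count` parameter (named after the second-phase rerank-count concept) are illustrative assumptions, not Vespa's implementation; the point is that the expensive function runs only on the first-phase top candidates.

```python
import heapq


def phased_rank(docs, first_phase, second_phase, rerank_count, hits):
    # First phase: cheap score for every matched document; keep the best
    # rerank_count candidates (recall).
    candidates = heapq.nlargest(rerank_count, docs, key=first_phase)
    # Second phase: expensive score only for those candidates (precision).
    return heapq.nlargest(hits, candidates, key=second_phase)


evaluated = []


def expensive_score(doc):
    evaluated.append(doc)   # record each run of the "expensive model"
    return -abs(doc - 990)  # hypothetical: prefers documents near 990


top = phased_rank(range(1000), first_phase=lambda d: d,
                  second_phase=expensive_score, rerank_count=100, hits=3)
print(top, len(evaluated))  # expensive_score ran 100 times, not 1000
```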

For text ranking applications, consider using the WAND query operator; WAND can efficiently (sub-linearly) find the top k documents using an inner scoring function.
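To illustrate how WAND avoids scoring every document, here is a compact version of the classic WAND algorithm (Broder et al.) for a top-k sum of term scores. It is a textbook sketch under simplifying assumptions (in-memory posting lists, each term's upper bound taken as its maximum score), not Vespa's wand implementation, and the posting lists are made up. A document is fully scored only when the summed upper bounds of the terms that may contain it can beat the current k-th best score; otherwise cursors skip ahead.

```python
import heapq
from bisect import bisect_left

END = float("inf")


def wand_top_k(postings, k):
    """postings: term -> doc-id-sorted list of (doc_id, score).
    Returns (top-k (score, doc) pairs, number of fully scored documents)."""
    terms = list(postings)
    docs = {t: [d for d, _ in postings[t]] for t in terms}
    score_of = {t: dict(postings[t]) for t in terms}
    upper = {t: max(s for _, s in postings[t]) for t in terms}
    pos = {t: 0 for t in terms}

    def cur(t):  # doc id under the term's cursor, or END when exhausted
        return docs[t][pos[t]] if pos[t] < len(docs[t]) else END

    heap, evaluated = [], 0  # min-heap holding the current top k
    while True:
        threshold = heap[0][0] if len(heap) == k else -END
        order = sorted(terms, key=cur)
        acc, pivot = 0.0, None
        for i, t in enumerate(order):
            if cur(t) == END:
                break
            acc += upper[t]
            if acc > threshold:  # first point where a doc could enter the top k
                pivot = i
                break
        if pivot is None:
            return sorted(heap, reverse=True), evaluated
        pivot_doc = cur(order[pivot])
        if cur(order[0]) == pivot_doc:
            # All cursors up to the pivot sit on pivot_doc: score it fully.
            score = sum(score_of[t][pivot_doc] for t in terms if cur(t) == pivot_doc)
            evaluated += 1
            if len(heap) < k:
                heapq.heappush(heap, (score, pivot_doc))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, pivot_doc))
            for t in terms:
                if cur(t) == pivot_doc:
                    pos[t] += 1
        else:
            # No document before pivot_doc can beat the threshold: skip ahead.
            t = order[0]
            pos[t] = bisect_left(docs[t], pivot_doc, pos[t])


postings = {
    "cat": [(1, 2.0), (3, 0.2), (5, 0.1), (8, 1.0)],
    "video": [(1, 1.5), (2, 0.3), (6, 0.2), (8, 1.2)],
    "funny": [(2, 0.1), (4, 0.3), (7, 0.2), (8, 0.5)],
}
top, evaluated = wand_top_k(postings, k=1)
print(top, evaluated)  # only 2 of the 8 matching documents are fully scored
```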

Multi Lookup - Set filtering

Several real-world search use cases are built around limiting or filtering based on a set filter: if the contents of a field in the document match any of the values in the query set, the document should be retrieved. For example, searching data for a set of users:

select * from sources * where user_id = 1 or user_id = 2 or user_id = 3 or user_id = 4 or user_id = 5 ...

For OR filters over the same field, it is strongly recommended to use the weightedSet query operator. It has considerably less overhead than plain OR for set filtering:

select * from sources * where weightedSet(user_id, {"1":1, "2":1, "3":1})
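When the set of values comes from application code, the weightedSet filter can be generated rather than written by hand. The helper below is hypothetical (not part of any Vespa client library); it gives every token weight 1, as in the query above, and duplicate values collapse into a single token.

```python
import json


def weighted_set_filter(field, values, weight=1):
    # Hypothetical helper: one token per distinct value, all with the same weight.
    tokens = {str(v): weight for v in values}
    return f"weightedSet({field}, {json.dumps(tokens, separators=(', ', ':'))})"


yql = "select * from sources * where " + weighted_set_filter("user_id", [1, 2, 3, 3])
print(yql)  # select * from sources * where weightedSet(user_id, {"1":1, "2":1, "3":1})
```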

Attribute fields used in weightedSet set filters without other, stronger query terms should have fast-search and rank: filter. If there is a large number of unique values in the field, it is faster to use a hash dictionary instead of a B-tree, which is the default dictionary data structure for attribute fields with fast-search:

field user_id type long {
  indexing: summary | attribute
  attribute: fast-search
  dictionary: hash
  rank: filter
}

For example, with 10M unique user_ids in the dictionary and a query searching for 1,000 users, a B-tree dictionary needs 1,000 lookups each costing on the order of log(10M), while a hash dictionary needs 1,000 lookups each costing constant time. The weightedSet set filtering approach works well in combination with TAAT; see the hybrid TAAT/DAAT section.
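The arithmetic can be made concrete with a back-of-envelope model. It is an assumption for illustration: it counts log2(n) comparison steps per B-tree lookup and a single probe per hash lookup, ignoring fanout, caching and hash collisions, which change the constants but not the logarithmic versus constant shape.

```python
import math

unique_values = 10_000_000  # unique user_ids in the dictionary
lookups = 1_000             # user_ids in one query

# Crude cost model: ~log2(n) comparisons per B-tree lookup, 1 probe per hash lookup.
btree_steps = lookups * math.log2(unique_values)
hash_steps = lookups * 1

print(f"B-tree ~ {btree_steps:,.0f} steps, hash ~ {hash_steps:,} steps, "
      f"ratio ~ {btree_steps / hash_steps:.0f}x")
```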

Also see the dictionary schema reference.

Document summaries - hits

If queries request many (thousands of) hits from a content cluster with few content nodes, increasing the summary cache might reduce latency and cost.

Using explicit document summaries, Vespa can support memory-only summary fetching if all fields referenced in the document summary are defined as attributes. Vespa document summaries are stored in compressed chunks, so dedicated in-memory summaries avoid (potential) disk reads and summary chunk decompression. See also the practical search performance guide on hits fetching.

Boolean, numeric, text attribute

When selecting an attribute field type with performance in mind, use these rules of thumb:

  1. Use boolean if a field is a boolean (max two values)
  2. Use a string attribute if there is a set of values - only unique strings are stored
  3. Use a numeric attribute for range searches
  4. Use a numeric attribute if the data is really numeric; don't replace numeric values with numeric strings
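A generic Python illustration of why rules 3 and 4 matter follows; it is not a model of Vespa's string attribute matching. Numbers stored as strings compare character by character, so both ordering and range filtering go wrong.

```python
values = [9, 10, 100, 25]
as_strings = [str(v) for v in values]

# Numeric order and a numeric range 10..50 behave as expected.
print(sorted(values))                        # [9, 10, 25, 100]
print([v for v in values if 10 <= v <= 50])  # [10, 25]

# The same data as strings is compared lexicographically.
print(sorted(as_strings))                            # ['10', '100', '25', '9']
print([s for s in as_strings if "10" <= s <= "50"])  # ['10', '100', '25']
```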

Refer to attributes for details.