Vespa Search Performance Tuning

This document describes how to tune an application for high performance, while the search sizing guide discusses how to scale an application.

Attribute vs. index

The attribute documentation summarizes when to use attribute in the indexing statement. Adding attribute:fast-search speeds up searches over attribute fields by building an in-memory index over the values in the attribute field.

field timestamp type long {
    indexing:  summary | attribute
    attribute: fast-search
    rank:      filter
}
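The following is a conceptual Python model, not Vespa's implementation. It contrasts scanning every value of an attribute column with looking values up in a sorted in-memory dictionary that maps each value to a posting list, which is roughly the kind of structure fast-search adds. The data and function names are hypothetical.

```python
import bisect

# Hypothetical attribute column: the value at position i belongs to local document id i.
timestamps = [1700000300, 1700000100, 1700000200, 1700000100]

def scan_equal(values, target):
    """Without fast-search (conceptually): inspect every document's value."""
    return [doc_id for doc_id, v in enumerate(values) if v == target]

def build_dictionary(values):
    """With fast-search (conceptually): sorted unique values, each mapping to
    the ids of the documents that hold that value."""
    postings = {}
    for doc_id, v in enumerate(values):
        postings.setdefault(v, []).append(doc_id)
    return sorted(postings), postings

def lookup_range(keys, postings, low, high):
    """Find documents with low <= value <= high by binary search over sorted keys."""
    start = bisect.bisect_left(keys, low)
    end = bisect.bisect_right(keys, high)
    hits = []
    for key in keys[start:end]:
        hits.extend(postings[key])
    return sorted(hits)

keys, postings = build_dictionary(timestamps)
print(lookup_range(keys, postings, 1700000100, 1700000200))  # [1, 2, 3]
```

The scan touches every document regardless of how few match, while the dictionary lookup only touches the matching values, which is why the benefit grows with corpus size.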
If both index and attribute are configured for a string type field, Vespa searches and matches against the index, using the default match mode, text.

Indexing strings

When a string type field is configured with index, the default match mode is text. This means Vespa tokenizes the content and indexes the tokens. Example document definition:

search foo {
    document foo {
        field title type string {
            indexing: summary | index
        }
        field uuid type string {
            indexing: summary | index
        }
    }
}
The string representation of a universally unique identifier (UUID) is 32 hexadecimal (base 16) digits, displayed in five groups separated by hyphens in the form 8-4-4-4-12, for a total of 36 characters (32 hexadecimal digits and four hyphens).

Example: When 123e4567-e89b-12d3-a456-426655440000 is indexed with the above document definition, Vespa tokenizes it into five tokens: [123e4567, e89b, 12d3, a456, 426655440000].
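A minimal sketch of the tokenization in this example, assuming a hypothetical simplified tokenizer that splits on every run of non-alphanumeric characters. Vespa's real linguistics processing is more involved (it also normalizes tokens), so treat this only as an illustration of why the hyphens produce five tokens.

```python
import re

def tokenize_text(value):
    """Hypothetical simplification of text-mode tokenization:
    split on runs of characters that are not ASCII letters or digits."""
    return [token for token in re.split(r"[^0-9A-Za-z]+", value) if token]

tokens = tokenize_text("123e4567-e89b-12d3-a456-426655440000")
print(tokens)  # ['123e4567', 'e89b', '12d3', 'a456', '426655440000']
```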

Phrase search is evaluated over positional indices and costs more than searching for a single word term. Vespa creates implicit phrases when terms are joined by hyphens. Hence /search/?query=uuid:123e4567-e89b-12d3-a456-426655440000 becomes the phrase query uuid:"123e4567 e89b 12d3 a456 426655440000".
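To show why the implicit phrase is more expensive, here is a toy Python model (hypothetical function names, not Vespa's query parser or matching engine). A hyphen-joined term is turned into a phrase, and a phrase match needs per-document token positions so that every part can be checked at consecutive positions, whereas a single word term only needs to know whether the token occurs.

```python
def parse_term(field, term):
    """Toy parsing (assumption): a hyphen-joined term becomes a phrase over its
    parts; anything else stays a single word term."""
    parts = [p for p in term.split("-") if p]
    if len(parts) > 1:
        return ("PHRASE", field, parts)
    return ("WORD", field, parts[0])

def phrase_matches(positions, parts):
    """positions maps token -> list of positions within one document's field.
    Every part must occur at consecutive positions starting somewhere."""
    for start in positions.get(parts[0], []):
        if all(start + i in positions.get(part, []) for i, part in enumerate(parts)):
            return True
    return False

kind, field, parts = parse_term("uuid", "123e4567-e89b-12d3-a456-426655440000")
doc_positions = {"123e4567": [0], "e89b": [1], "12d3": [2], "a456": [3], "426655440000": [4]}
print(kind, phrase_matches(doc_positions, parts))  # PHRASE True
```

Every candidate document requires this positional check across all five tokens, which is the extra work that match: word avoids.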

Set match: word to disable tokenization. The input 123e4567-e89b-12d3-a456-426655440000 is then stored as one token, which avoids the implicit phrase search:

field uuid type string {
    indexing: summary | index
    match:    word
    rank:     filter
}
Also configure uuid as a rank: filter field; the field will then be represented as efficiently as possible during search and ranking.
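The difference between the two match modes can be sketched with a toy inverted index in Python. This is an assumption-laden model, not how Vespa stores its index: text mode splits on non-alphanumeric characters, word mode keeps the whole value as one token, and normalization is ignored. With word mode, a query for the full UUID becomes a single lookup in one posting list instead of five lookups plus a positional check.

```python
import re

def index_tokens(value, match):
    """Toy model (assumption): 'word' keeps the whole value as one token,
    'text' splits it into alphanumeric tokens."""
    if match == "word":
        return [value]
    return [t for t in re.split(r"[^0-9A-Za-z]+", value) if t]

def build_index(docs, match):
    """Map each token to the ascending list of document ids containing it."""
    index = {}
    for doc_id, value in enumerate(docs):
        for token in set(index_tokens(value, match)):
            index.setdefault(token, []).append(doc_id)
    return index

docs = ["123e4567-e89b-12d3-a456-426655440000", "123e4567-0000-12d3-a456-426655440001"]
text_index = build_index(docs, "text")
word_index = build_index(docs, "word")
print(word_index["123e4567-e89b-12d3-a456-426655440000"])  # [0]
```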

Summary: Review the string fields in the application for:

  • whether tokenized matching is needed
  • whether the field is used in ranking
The rank: filter behavior can also be triggered at query time, per query item, by calling com.yahoo.prelude.query.Item.setRanked() in a custom searcher.
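A real custom searcher is written in Java against Vespa's com.yahoo.prelude.query classes. To stay self-contained, the sketch below only models the idea in Python: the Term and And classes and the mark_unranked function are hypothetical stand-ins, not Vespa's API. It walks a query tree and turns off ranking for every term in a given field, so those terms only filter.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Term:
    """Hypothetical stand-in for a query item searching one field."""
    index_name: str
    word: str
    ranked: bool = True  # models the flag that Item.setRanked() controls

@dataclass
class And:
    """Hypothetical composite item holding child items."""
    children: List[object] = field(default_factory=list)

def mark_unranked(item, field_name):
    """Recursively disable ranking for terms searching field_name."""
    if isinstance(item, And):
        for child in item.children:
            mark_unranked(child, field_name)
    elif isinstance(item, Term) and item.index_name == field_name:
        item.ranked = False
    return item

query = And([Term("title", "vespa"), Term("uuid", "123e4567-e89b-12d3-a456-426655440000")])
mark_unranked(query, "uuid")
print([t.ranked for t in query.children])  # [True, False]
```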

Parent-child and search performance

Searching imported attribute fields from parent document types carries an additional cost penalty. This penalty can be reduced significantly if the imported field is defined with rank: filter and visibility-delay is configured to a value greater than 0.

Ranking

Ranking cost in Vespa scales with the number of hits the query recalls, since each of them must be ranked on the node that holds it. The ranking cost per recalled document is determined by the complexity of the ranking expression in use and of the rank features it computes.
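The relationship can be made concrete with a deliberately simple linear cost model. This is an assumption for illustration, not a Vespa formula: hits are spread evenly across nodes, every recalled document costs the same to rank, and the per-document cost is the sum of the rank feature costs plus the arithmetic in the expression, in hypothetical unitless numbers.

```python
import math

def cost_per_document(feature_costs, expression_ops, cost_per_op=1.0):
    """Assumed per-document cost: rank features computed plus expression arithmetic."""
    return sum(feature_costs) + expression_ops * cost_per_op

def ranking_cost_per_node(total_recalled_hits, nodes, per_document_cost):
    """Assumed even split: each node ranks its share of the recalled hits."""
    hits_per_node = math.ceil(total_recalled_hits / nodes)
    return hits_per_node * per_document_cost

per_doc = cost_per_document([0.5, 1.0], expression_ops=4, cost_per_op=0.125)
print(per_doc)                                        # 2.0
print(ranking_cost_per_node(1_000_000, 10, per_doc))  # 200000.0
```

Under this model, either recalling fewer hits (tighter matching) or using cheaper features reduces cost proportionally, and adding nodes reduces the per-node share.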

Document summaries

If queries request many hits from a few content nodes, a summary cache might reduce cost.
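As a generic illustration of why a cache helps when the same hits are fetched repeatedly, here is a least-recently-used cache keyed by document id. The class and its load callback are hypothetical: load stands in for reading the document store, and Vespa's own summary cache and its eviction policy are not modeled here.

```python
from collections import OrderedDict

class SummaryCache:
    """Hypothetical LRU cache of document summaries keyed by document id."""

    def __init__(self, capacity, load):
        self.capacity = capacity
        self.load = load              # called on a miss, e.g. a disk read
        self.entries = OrderedDict()
        self.misses = 0

    def get(self, doc_id):
        if doc_id in self.entries:
            self.entries.move_to_end(doc_id)  # mark as most recently used
            return self.entries[doc_id]
        self.misses += 1
        summary = self.load(doc_id)
        self.entries[doc_id] = summary
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return summary

cache = SummaryCache(2, lambda doc_id: {"id": doc_id})
for doc_id in [1, 2, 1, 3, 2]:
    cache.get(doc_id)
print(cache.misses)  # 4
```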

Document summaries can be memory-only operations if all fields are attributes. Use a summary class that contains only attribute fields to get this behavior.
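The rule above can be expressed as a subset check. The sketch below uses hypothetical summary class and field names; it simply lists which summary classes contain only attribute fields and could therefore be served from memory under that rule.

```python
def memory_only_classes(summary_classes, attribute_fields):
    """Return the names of summary classes whose fields are all attributes."""
    attrs = set(attribute_fields)
    return sorted(name for name, fields in summary_classes.items() if set(fields) <= attrs)

# Hypothetical schema: only timestamp is an attribute; title is index-only.
classes = {"default": ["title", "timestamp"], "attributes-only": ["timestamp"]}
print(memory_only_classes(classes, ["timestamp"]))  # ['attributes-only']
```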