Ranking

Ranking is where Vespa does computing, or inference, over documents. The computations to be done are expressed in functions called ranking expressions, bundled into rank profiles defined in schemas. These can range from simple math expressions combining some rank features to tensor expressions or large machine-learned ONNX models.

Two-phase ranking

Rank profiles can define two phases that are evaluated locally on content nodes, which means that no data needs to be transferred to container nodes to make inferences over documents:

schema myapp {

    rank-profile my-rank-profile {

        num-threads-per-search: 4

        first-phase {
            expression {
                attribute(quality) * freshness(timestamp)
            }
        }

        second-phase {
            expression: sum(onnx(my_onnx_model))
            rerank-count: 50
        }

    }
}

The first phase is executed for all matching documents, while the second is executed for the best rerank-count documents per content node, according to the first-phase function. This is useful for directing more computation towards the most promising candidate documents; see phased ranking.

Global-phase ranking

It is also possible to define an additional phase that runs on the stateless container nodes after hits from the content nodes have been merged. See global-phase in the phased ranking documentation for more details. This can use CPU more efficiently (especially with many content nodes) and can be used instead of second-phase, or in addition to a moderately expensive second-phase, as in the example below.

schema myapp {
    rank-profile my-rank-profile {
        first-phase {
            expression: attribute(quality) * freshness(timestamp)
        }
        second-phase {
            expression {
                my_combination_of(fieldMatch(title), bm25(abstract), attribute(quality), freshness(timestamp))
            }
        }
        global-phase {
            expression: sum(onnx(my_onnx_model))
            rerank-count: 100
        }
    }
}

Machine-learned model inference

Vespa supports ML models in these formats:

- ONNX
- XGBoost
- LightGBM

As these are exposed as rank features, it is possible to rank using a model ensemble. Deploy multiple model instances and write a ranking expression that combines the results:

schema myapp {

    onnx-model my_model_1 {
        ...
    }
    onnx-model my_model_2 {
        ...
    }

    rank-profile my-rank-profile {
    ...
        second-phase {
            expression: max(sum(onnx(my_model_1)), sum(onnx(my_model_2)))
        }
    }
}

Model training and deployment

Models are deployed in application packages. Read more on how to automate training, deployment and re-training in a closed loop using Vespa Cloud.

Rank profiles

Ranking expressions are defined in rank profiles - either inside the schema or equivalently in their own files in the application package, named schemas/[schema-name]/[profile-name].profile.
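For illustration, here is a minimal sketch of such a profile kept in its own file, following the naming scheme above (the file path and expression are illustrative):

# schemas/myapp/my-rank-profile.profile
rank-profile my-rank-profile {
    first-phase {
        expression: attribute(quality) * freshness(timestamp)
    }
}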

One schema can have any number of rank profiles, e.g. for implementing different use cases or bucket testing variations. If no profile is specified, the default text ranking profile is used.

Rank profiles can inherit other profiles. This makes it possible to define complex profiles and variants without duplication.
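As a sketch, a profile can inherit another and override parts of it (the profile names and expressions here are illustrative):

schema myapp {
    rank-profile base {
        first-phase {
            expression: nativeRank + attribute(quality)
        }
    }
    # Inherits everything from base, overriding only the first phase
    rank-profile freshness-variant inherits base {
        first-phase {
            expression: nativeRank + attribute(quality) + freshness(timestamp)
        }
    }
}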

Queries select a rank profile using the ranking.profile argument in requests or in a query profile, or equivalently in Searcher code:

query.getRanking().setProfile("my-rank-profile");

If no profile is specified in the query, the one called default is used. This profile is available even if not defined explicitly.

Another special rank profile called unranked is also always available. Specifying it boosts performance in queries which do not need ranking, either because random order is fine or because explicit sorting is used.
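For example, a request relying on explicit sorting could select it like this (the query and the timestamp attribute are illustrative, and the parameters are shown unencoded for readability):

/search/?yql=select * from sources * where title contains "heads" order by timestamp desc&ranking.profile=unranked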

Evaluation

Rank profiles are not evaluated lazily. Example:

function inline foo(tensor, defaultValue) {
    expression: if (count(tensor) == 0, defaultValue, sum(tensor))
}

function bar() {
    expression: foo(tensor, sum(tensor1 * tensor2))
}

Will the sum in the bar function be computed lazily, meaning only if tensor is empty?

No, this would require lambda arguments. Only doubles and tensors are passed between functions.

Text ranking

The default ranking is the first-phase function nativeRank (a function returning the value of the nativeRank rank feature), with no second-phase.

A good simple alternative to nativeRank for text ranking is using the BM25 rank feature.
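A minimal sketch of such a profile, assuming title and body are indexed string fields with BM25 enabled on their indexes:

schema myapp {
    document myapp {
        field title type string {
            indexing: index | summary
            index: enable-bm25
        }
        field body type string {
            indexing: index | summary
            index: enable-bm25
        }
    }
    rank-profile my-bm25-profile {
        first-phase {
            expression: bm25(title) + bm25(body)
        }
    }
}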

If the expression is written manually, it might be most convenient to stick to the fieldMatch(name) feature for each field. This feature combines the more basic fieldMatch features in a reasonable way. A good way to combine the fieldMatch scores of each field is a weighted average, as explained above. Another way is to combine them using the fieldMatch(name).weight/significance/importance features, which take term weight, rareness, or both into account, and allow a normalized score to be produced by simply summing, for each field, the product of this feature and any other normalized per-field score. In addition, some attribute value(s) must usually be included to determine the a priori quality of each document.

For example, assuming the title field is more important than the body field, create a ranking expression which gives more weight to that field, as in the example above. Vespa contains built-in convenience support for this: weights can be set on the individual fields by weight: <number>, and the match feature can be used to get a weighted average of the fieldMatch scores of all fields, as sketched below. The overall ranking expression might contain other ranking dimensions than just text match, like freshness, the quality of the document, or any other property of the document or query.
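A sketch of both approaches, with illustrative weights and assuming title and body fields plus a quality attribute:

rank-profile manual-weights {
    first-phase {
        # Hand-tuned weighted average of per-field text match, plus document quality
        expression: 0.7 * fieldMatch(title) + 0.3 * fieldMatch(body) + 0.2 * attribute(quality)
    }
}

rank-profile field-weights {
    # Assumes weight: 200 is set on the title field in the schema;
    # match is a weighted average of the fieldMatch scores of all fields
    first-phase {
        expression: match + 0.2 * attribute(quality)
    }
}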

Weight, significance and connectedness

Modify the values of the match features from the query by sending weight, significance and connectedness with the query:

Weight

Set query term weight. Example: ... where (title contains ({weight:200}"heads") AND title contains "tails") specifies that heads is twice as important for the final rank score as tails (the default weight is 100).

Weight is used in fieldMatch(name).weight, which can be multiplied with fieldMatch(name) to yield a weighted score for the field, and in fieldMatch(name).weightedOccurrence to get an occurrence score which is higher when higher-weighted terms occur most.

Configure static field weights in the schema.

Significance

Significance is an indication of how rare a term is in the corpus of the language, used by a number of text matching rank features. This can be set explicitly for each term in the query, or by calling item.setSignificance() in a Searcher.

With indexed search, default significance values are calculated automatically during indexing. However, unless the indexed corpus is representative of the word frequencies in the user's language, relevance can be improved by passing significances derived from a representative corpus. Relative significance is accessible in ranking through the fieldMatch(name).significance feature. Weight and significance are also averaged into fieldMatch(name).importance for convenience.

Streaming search does not compute term significance; queries should pass it with the query terms. Read more.
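A minimal Searcher sketch setting significance explicitly (the value and the single-term assumption are illustrative):

import com.yahoo.prelude.query.Item;
import com.yahoo.prelude.query.WordItem;
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.searchchain.Execution;

public class SignificanceSearcher extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        Item root = query.getModel().getQueryTree().getRoot();
        if (root instanceof WordItem) {
            // Significance derived from a representative corpus:
            // rarer terms get values closer to 1
            ((WordItem) root).setSignificance(0.8);
        }
        return execution.search(query);
    }
}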

Connectedness

Signify the degree of connection between adjacent terms in the query by setting a query term's connectivity to another term.

For example, the query new york newspaper should have a higher connectedness between the terms "new" and "york" than between "york" and "newspaper" to rank documents higher if they contain "new york" as a phrase.

Term connectedness is taken into account by fieldMatch(name).proximity, which is also an important contribution to fieldMatch(name). Connectedness is a normalized value, 0.1 by default. It must be set by a custom Searcher that looks up connectivity information from some source; there is no query syntax for it.
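A sketch of such a Searcher, assuming an AND query of word terms and an illustrative connectivity value:

import com.yahoo.prelude.query.AndItem;
import com.yahoo.prelude.query.Item;
import com.yahoo.prelude.query.WordItem;
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.searchchain.Execution;

public class ConnectivitySearcher extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        Item root = query.getModel().getQueryTree().getRoot();
        if (root instanceof AndItem) {
            AndItem and = (AndItem) root;
            // Mark the second term as strongly connected to the first,
            // e.g. "york" following "new" (the default connectivity is 0.1)
            if (and.getItemCount() >= 2 && and.getItem(1) instanceof WordItem) {
                ((WordItem) and.getItem(1)).setConnectivity(and.getItem(0), 0.9);
            }
        }
        return execution.search(query);
    }
}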