Ranking is where Vespa does computing, or inference, over the documents retrieved by a query. The goal is to order (rank) the retrieved documents.
The computations are expressed in functions called ranking expressions, bundled into rank profiles defined in schemas. These can range from simple math expressions combining some rank features, to tensor expressions or large machine-learned ONNX models.
Rank profiles can define two phases that are evaluated locally on content nodes, which means that no data needs to be transferred to container nodes to make inferences over data:
```
schema myapp {
    rank-profile my-rank-profile {
        num-threads-per-search: 4

        first-phase {
            expression {
                attribute(quality) * freshness(timestamp)
            }
        }

        second-phase {
            expression: sum(onnx(my_onnx_model))
            rerank-count: 50
        }
    }
}
```
The first phase is executed for all matching documents while the second is executed for the top-scoring rerank-count documents per content node as scored by the first-phase function. This is useful to direct more computation towards the most promising candidate documents, see phased ranking.
It's also possible to define an additional phase that runs on the stateless container nodes after merging hits from the content nodes. See the global-phase section of the phased ranking documentation for more details. This can be a more efficient use of CPU (especially with many content nodes) and can be used instead of second-phase, or in addition to a moderately expensive second-phase as in the example below. This phase also supports GPU acceleration.
```
schema myapp {
    rank-profile my-rank-profile {
        first-phase {
            expression: attribute(quality) * freshness(timestamp)
        }

        second-phase {
            expression {
                my_combination_of(fieldMatch(title), bm25(abstract),
                                  attribute(quality), freshness(timestamp))
            }
        }

        global-phase {
            expression: sum(onnx(my_onnx_model))
            rerank-count: 100
        }
    }
}
```
Vespa supports ML models in formats such as ONNX. As these are exposed as rank features, it is possible to rank using a model ensemble: deploy multiple model instances and write a ranking expression that combines their results:
```
schema myapp {
    onnx-model my_model_1 {
        ...
    }
    onnx-model my_model_2 {
        ...
    }
    rank-profile my-rank-profile {
        ...
        second-phase {
            expression: max(sum(onnx(my_model_1)), sum(onnx(my_model_2)))
        }
    }
}
```
Models are deployed in application packages. Read more on how to automate training, deployment and re-training in a closed loop using Vespa Cloud.
Ranking expressions are defined in rank profiles, either inside the schema or, equivalently, in their own files in the application package, named `schemas/[schema-name]/[profile-name].profile`.
One schema can have any number of rank profiles for implementing e.g. different use cases or bucket testing variations. If no profile is specified, the default text ranking profile is used.
Rank profiles can inherit other profiles. This makes it possible to define complex profiles and variants without duplication.
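As an illustrative sketch (the profile names and the freshness term are assumptions, not from a specific application), a profile can inherit another and override only the parts that differ:

```
rank-profile base {
    first-phase {
        expression: nativeRank
    }
}

rank-profile fresh inherits base {
    first-phase {
        expression: nativeRank * freshness(timestamp)
    }
}
```

Queries selecting fresh get the overridden first-phase, while anything not overridden is taken from base.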
Queries select a rank profile using the ranking.profile argument in requests or a query profile, or equivalently in Searcher code, by
query.getRanking().setProfile("my-rank-profile");
If no profile is specified in the query, the one called default is used.
This profile is available even if not defined explicitly. Another special rank profile, called unranked, is also always available. Specifying it boosts performance in queries that do not need ranking, either because random order is fine or because explicit sorting is used.
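For illustration, a profile is selected per request with the ranking.profile parameter (the query text and profile name here are placeholders):

```
# Use a custom rank profile
/search/?query=blues&ranking.profile=my-rank-profile

# Skip ranking entirely when document order does not matter
/search/?query=blues&ranking.profile=unranked
```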
The default ranking is the first-phase function nativeRank (a function returning the value of the nativeRank rank feature) and no second-phase. This default text scoring feature only considers how well the query matches the searched field/fieldset.
The overall ranking expression might contain other ranking dimensions than just text match, like freshness, the quality of the document, or any other property of the document or query.
A simple alternative to nativeRank for text scoring is the BM25 feature.
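A minimal sketch of a profile ranking by BM25 (the field names title and body are assumptions; use the indexed fields of your own schema):

```
rank-profile bm25-ranking {
    first-phase {
        expression: bm25(title) + bm25(body)
    }
}
```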
Another text matching feature is fieldMatch(field), a string segment match. This feature combines the more basic fieldMatch sub-features in a reasonable way, but has a high computational cost compared to nativeRank and BM25, and is therefore only suitable for second-phase evaluation.
Modify the values of the match features from the query by sending weight, significance and connectedness with the query:
| Feature input | Description |
|---|---|
| Weight | Set query term weight. The term weight is used in several text scoring features, including fieldMatch(name).weight and nativeRank. Note that the term weight is not applicable to all text scoring features; for example, bm25 does not use the term weight. Configure static field weights in the schema. |
| Significance | Significance is an indication of how rare a term is in the corpus of the language, and is used by a number of text matching rank features. It can be set explicitly for each term in the query, or by calling item.setSignificance() in a Searcher. With indexed search, default significance values are calculated automatically during indexing. However, unless the indexed corpus is representative of the word frequencies in the user's language, relevance can be improved by passing significances derived from a representative corpus. Relative significance is accessible in ranking through the fieldMatch(name).significance feature. Weight and significance are also averaged into fieldMatch(name).importance for convenience. Streaming search does not compute term significance; queries should pass it with the query terms. Read more. |
| Connectedness | Signify the degree of connection between adjacent terms in the query by setting a query term's connectivity to another term. Term connectedness is taken into account by fieldMatch(name).proximity, which is also an important contribution to fieldMatch(name). Connectedness is a normalized value, 0.1 by default. It must be set by a custom Searcher, looking up connectivity information from somewhere; there is no query syntax for it. |
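As a sketch of the Searcher approach (the class name is illustrative, and the assumption is a query whose root is a composite item with at least two word terms; setSignificance and setConnectivity are methods on taggable query items in Vespa's query API), significance and connectedness can be set like this:

```java
import com.yahoo.prelude.query.CompositeItem;
import com.yahoo.prelude.query.Item;
import com.yahoo.prelude.query.WordItem;
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.searchchain.Execution;

// Illustrative sketch: annotate the first two query terms with
// significance and connectedness before passing the query on.
public class TermAnnotationSearcher extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        Item root = query.getModel().getQueryTree().getRoot();
        if (root instanceof CompositeItem composite
                && composite.getItemCount() >= 2
                && composite.getItem(0) instanceof WordItem first
                && composite.getItem(1) instanceof WordItem second) {
            first.setSignificance(0.8);        // from external corpus statistics (assumption)
            second.setConnectivity(first, 0.9); // strong connection between adjacent terms
        }
        return execution.search(query);
    }
}
```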