Ranking is where Vespa does computing, or inference, over documents. The computations to be done are expressed in functions called ranking expressions, bundled into rank profiles defined in schemas. These can range from simple math expressions combining some rank features, to tensor expressions or large machine-learned ONNX models.
Rank profiles can define two phases that are evaluated locally on content nodes, meaning that no data needs to be transferred to container nodes to run inference over the data:
schema myapp {
    rank-profile my-rank-profile {
        num-threads-per-search: 4
        first-phase {
            expression {
                attribute(quality) * freshness(timestamp)
            }
        }
        second-phase {
            expression: sum(onnx(my_onnx_model))
            rerank-count: 50
        }
    }
}
The first phase is executed for all matching documents while the second is executed for the best rerank-count documents per content node according to the first-phase function. This is useful to direct more computation towards the most promising candidate documents, see phased ranking.
It's also possible to define an additional phase that runs on the stateless container nodes after merging hits from the content nodes. This can be more efficient use of CPU (especially with many content nodes) and can be used instead of second-phase, or in addition to a moderately expensive second-phase.
schema myapp {
    rank-profile my-rank-profile {
        num-threads-per-search: 4
        first-phase {
            expression {
                attribute(quality) * freshness(timestamp)
            }
        }
        second-phase {
            expression {
                my_combination_of(fieldMatch(title), bm25(abstract), attribute(quality), freshness(timestamp))
            }
        }
        global-phase {
            expression: sum(onnx(my_onnx_model))
            rerank-count: 100
        }
    }
}

In contrast to first-phase and second-phase ranking, the global-phase expression needs its input data transferred from content nodes to the container; for the example above, this means that any inputs used by my_onnx_model would be automatically added as match-features so the ONNX model can get the per-hit data it needs.
Vespa supports ML models in these formats:

- ONNX
- XGBoost
- LightGBM
As these are exposed as rank features, it is possible to rank using a model ensemble. Deploy multiple model instances and write a ranking expression that combines the results:
schema myapp {
    onnx-model my_model_1 { ... }
    onnx-model my_model_2 { ... }
    rank-profile my-rank-profile {
        ...
        second-phase {
            expression: max(sum(onnx(my_model_1)), sum(onnx(my_model_2)))
        }
    }
}
Models are deployed in application packages. Read more on how to automate training, deployment and re-training in a closed loop using Vespa Cloud.
Ranking expressions are defined in rank profiles, either inside the schema or equivalently in their own files in the application package, named schemas/[schema-name]/[profile-name].profile.
One schema can have any number of rank profiles, e.g. to implement different use cases or bucket testing variations. If no profile is specified, the default text ranking profile is used.
Rank profiles can inherit other profiles. This makes it possible to define complex profiles and variants without duplication.
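A minimal sketch of inheritance (the profile names, the second-phase expression, and the timestamp field are illustrative, not from the original):

schema myapp {
    rank-profile base {
        first-phase {
            expression: nativeRank
        }
    }
    rank-profile my-variant inherits base {
        second-phase {
            expression: firstPhase * freshness(timestamp)
        }
    }
}

Here my-variant reuses the first-phase of base and only adds a second phase, so common logic is defined once.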
Queries select a rank profile using the ranking.profile argument in requests or a query profile, or equivalently in Searcher code:
query.getRanking().setProfile("my-rank-profile");
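The same selection can be made directly in an HTTP request (the endpoint path and YQL query here are illustrative):

/search/?yql=select * from myapp where userQuery()&ranking.profile=my-rank-profile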
If no profile is specified in the query, the one called default is used. This profile is available even if not defined explicitly. Another special rank profile called unranked is also always available. Specifying it boosts performance in queries that do not need ranking, because random order is fine or explicit sorting is used.
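For example, a request that relies on explicit sorting can skip ranking entirely (the query itself is illustrative):

/search/?yql=select * from myapp where userQuery() order by timestamp desc&ranking.profile=unranked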
Function arguments in ranking expressions are not evaluated lazily. Example:

function inline foo(tensor, defaultVal) {
    expression: if (count(tensor) == 0, defaultVal, sum(tensor))
}
function bar() {
    expression: foo(tensor, sum(tensor1 * tensor2))
}

Will the sum in the bar function be computed lazily, meaning only if tensor is empty? No: this would require lambda arguments, and only doubles and tensors are passed between functions.
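The eager-evaluation behavior is the same as in most programming languages. A Python analogy (illustrative only, not Vespa code) shows that an argument expression runs before the function body, even when its result goes unused:

```python
# Record whether the "default value" computation actually runs.
calls = []

def expensive_default():
    calls.append(1)  # side effect marks that this was evaluated
    return 0

def foo(values, default):
    # Analogous to: if (count(tensor) == 0, defaultVal, sum(tensor))
    return default if len(values) == 0 else sum(values)

# The argument expensive_default() is evaluated eagerly, before foo runs.
result = foo([1, 2, 3], expensive_default())
print(result)      # 6: the non-empty branch was taken
print(len(calls))  # 1: expensive_default still ran, despite being unused
```

Avoiding that wasted work would require passing the expression itself (a lambda) rather than its value, which Vespa ranking functions do not support.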