Vespa defines Big Data Serving as:

Selection, organization and machine-learned model inference,

  • over many, constantly changing data items (thousands to billions),
  • with low latency (~100 ms) and high load (thousands of queries/second)

Ranking enables organization and ML inference, and multi-phased ranking addresses latency and load:

schema myapp {

    rank-profile my-rank-profile {
        first-phase {
            expression: attribute(quality) * freshness(timestamp)
        }
        second-phase {
            expression: sum(onnx("my-model.onnx", "add"))
        }
    }
}
Applications use the query API to select the documents to evaluate using a query language, and choose a rank profile for the ranking.

Ranking means evaluating ranking expressions over rank features - values, or computed values, sourced from the query, the document and application constants.

Note: Vespa also supports stateless model evaluation - making inferences without documents (i.e. query to model).
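As a sketch, a model deployed for stateless evaluation can be invoked over HTTP on a container node (the port and model name below are illustrative; see the Vespa stateless model evaluation documentation for the exact API):

```
# List the models available for stateless evaluation:
GET http://localhost:8080/model-evaluation/v1/
```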


Rank profiles can have one or two phases:

  • Phase one should use a computationally inexpensive function to rank all candidates. This phase is about recall: selecting the best candidates.
  • Phase two runs on a small candidate set, so more resources can be spent per document. This phase is about precision: fine-tuning the ranking of the top candidates.
In short, query selection and first-phase ranking reduce the size of the computation - machine-learned models can then be used in second-phase ranking on rerank-count documents per node. This makes the ranking scalable (see sizing):
  • Control the second-phase candidate set size
  • Add content nodes to rank fewer documents per node
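The two-phase flow above can be sketched in plain Python. This is illustrative only, not Vespa code: the field names and both scoring functions are hypothetical stand-ins for a cheap first-phase expression and an expensive second-phase model inference.

```python
def cheap_score(doc):
    # First phase: inexpensive function, evaluated for every candidate
    # (stands in for e.g. attribute(quality) * freshness(timestamp)).
    return doc["quality"] * doc["freshness"]

def expensive_score(doc):
    # Second phase: stands in for a machine-learned model inference.
    return doc["quality"] * doc["freshness"] + 0.1 * doc["clicks"]

def rank(docs, rerank_count=2):
    # Phase one: rank all candidates with the cheap function (recall).
    by_cheap = sorted(docs, key=cheap_score, reverse=True)
    # Phase two: re-rank only the top rerank-count documents (precision);
    # the expensive function is never evaluated for the rest.
    top = sorted(by_cheap[:rerank_count], key=expensive_score, reverse=True)
    return top + by_cheap[rerank_count:]

docs = [
    {"id": 1, "quality": 0.9, "freshness": 0.5, "clicks": 60},
    {"id": 2, "quality": 0.8, "freshness": 0.7, "clicks": 50},
    {"id": 3, "quality": 0.2, "freshness": 0.9, "clicks": 90},
]
print([d["id"] for d in rank(docs)])  # → [1, 2, 3]
```

Note that document 3 is never scored by the expensive function: the first phase cuts it from the rerank-count candidate set, which is what bounds the per-node cost of model inference.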
Ranking runs in the content cluster. This means online inference (a.k.a. real-time or dynamic inference) is executed for the candidate document set. Read more about phased ranking.

Machine-learned model inference

Vespa supports ML models in formats such as ONNX, XGBoost and LightGBM.

As these are exposed as rank features, it is possible to rank using a model ensemble. Deploy multiple model instances and write a rank expression that combines the results (max, avg, custom, ...) - example:
schema myapp {

    rank-profile my-rank-profile {
        second-phase {
            expression: max( sum(onnx("my-model-1.onnx", "add")), sum(onnx("my-model-2.onnx", "add")) )
        }
    }
}
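Models can also be declared in the schema and referenced by name instead of by file path in the expression - a sketch, where the model name and file path are illustrative (see the Vespa documentation on ranking with ONNX models for the exact syntax, including input/output mapping):

```
onnx-model my_model_1 {
    file: models/my-model-1.onnx
}
```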

Model Training and Deployment

To use data in Vespa to train a model, refer to the Learning to Rank guide.

Models are deployed in application packages. Read more on how to automate training, deployment and re-training in a closed loop using Vespa Cloud.

Rank profile

Ranking expressions are stored in rank profiles. An application can have multiple rank profiles - this can be used for implementing different use cases, or bucket testing ranking variations. If not specified, the default text ranking profile is used.

A rank profile can inherit another rank profile.

Queries select a rank profile using the ranking.profile request parameter, or in Searcher code via query.getRanking().setProfile().
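For example, a request setting the rank profile as a query parameter (hostname and query are illustrative):

```
http://localhost:8080/search/?query=vespa&ranking.profile=my-rank-profile
```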

Note that some use cases (where hits can be in any order, or are explicitly sorted) perform better using the unranked rank profile.