Ranking

Vespa defines Big Data Serving as:

Selection, organization and machine-learned model inference,
  • over many, constantly changing data items (thousands to billions),
  • with low latency (~100 ms) and high load (thousands of queries/second)
Ranking enables organization and ML inference, and multi-phased ranking addresses latency and load:
search myapp {

    rank-profile my-rank-profile {
        first-phase {
            expression: attribute(quality) * freshness(timestamp)
        }
        second-phase {
            expression: sum(onnx("my-model.onnx", "add"))
        }
    }
}
Applications use the query API to select the documents to evaluate using a query language, and choose a rank profile for the ranking. Rank profiles can have one or two phases:
  • Phase one should use a computationally inexpensive function to rank candidates
  • Phase two is run on a small candidate set
Read more about phased ranking.

In short, query selection and first-phase ranking reduce the size of the computation - machine-learned models can then be used in second-phase ranking on the top rerank-count documents per node. This makes ranking scalable (see sizing):

  • Control the second phase candidate set size (see the sketch below)
  • Add content nodes to rank fewer documents per node
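
A minimal sketch of the first knob (profile name, model file and count are illustrative) - rerank-count caps the per-node second-phase candidate set:
rank-profile scalable {
    first-phase {
        # inexpensive candidate ranking
        expression: attribute(quality) * freshness(timestamp)
    }
    second-phase {
        # rerank at most 200 hits per content node
        rerank-count: 200
        expression: sum(onnx("my-model.onnx", "add"))
    }
}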

Ranking runs in the content cluster. This means online inference (a.k.a. real-time or dynamic inference) is executed for the candidate document set. Ranking evaluates ranking expressions using rank features - values and computed values from queries, documents and constants.
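
As an illustrative sketch (the query input and field names are assumptions), a ranking expression can combine a value sent with the query, a document attribute and a computed match feature:
rank-profile feature-example {
    inputs {
        # value passed with the query request as input.query(freshness_weight)
        query(freshness_weight) double
    }
    first-phase {
        expression: query(freshness_weight) * freshness(timestamp) + attribute(quality)
    }
}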

Note: Vespa also supports stateless model evaluation - making inferences without documents (i.e. query to model).
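
A minimal sketch of enabling this in services.xml (the container id is an assumption) - models in the application package are then evaluated directly in the container cluster, with no documents involved:
<container id="default" version="1.0">
    <!-- exposes the application's models for stateless evaluation -->
    <model-evaluation/>
    <search/>
</container>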

Machine-Learned model inference

Vespa supports the following ML models:

  • ONNX
  • XGBoost
  • LightGBM
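
For example, a hedged sketch of ranking with a GBDT model exported from XGBoost (profile, field and file names are illustrative; the model file is placed in the application package):
rank-profile with-gbdt {
    first-phase {
        expression: nativeRank(title)
    }
    second-phase {
        expression: xgboost("my-xgboost-model.json")
    }
}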

As these are exposed as rank features, it is possible to rank using a model ensemble. Deploy multiple model instances and write a rank expression that combines the results (max, avg, custom, ...) - example:
search myapp {

    rank-profile my-rank-profile {
    ...
        second-phase {
            expression: max( sum(onnx("my-model-1.onnx", "add")), sum(onnx("my-model-2.onnx", "add")) )
        }
    }
}

Model Training

To use data in Vespa to train a model, refer to the Learning to Rank guide.

Rank profile

Ranking expressions are stored in rank profiles. An application can have multiple rank profiles - this can be used for implementing different use cases, or bucket testing ranking variations. If not specified, the default text ranking profile is used.

A rank profile can inherit another rank profile.
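
A minimal sketch of inheritance (profile and field names are assumptions) - the child keeps the parent's first phase and adds a second phase on top of it:
rank-profile base {
    first-phase {
        expression: nativeRank(title, body)
    }
}

rank-profile experiment inherits base {
    second-phase {
        expression: firstPhase * attribute(quality)
    }
}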

Queries select rank profile using ranking.profile, or in Searcher code:

query.getRanking().setProfile("my-rank-profile");
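
As a sketch, the same selection via the ranking.profile request parameter (endpoint and document type are assumptions):
curl -s -H "Content-Type: application/json" \
    --data '{
        "yql": "select * from myapp where userQuery()",
        "query": "vespa ranking",
        "ranking.profile": "my-rank-profile"
    }' \
    http://localhost:8080/search/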
Note that some use cases (where hits can be in any order, or are explicitly sorted) perform better using the unranked rank profile.

Tracking relevance variations over time

Vespa comes with a few simple relevance metrics that enable applications to see how relevance changes over time, whether as a result of changes to how relevance is computed, changes to query construction, changes to the ingested content, or changing user behavior.

The relevance metrics are relevance.at_1, relevance.at_3 and relevance.at_10. See metrics for more information.