Ranking Introduction

Vespa computes one or more ranking expressions for documents matching a query. The results of the computations can be returned with the documents and used to order and further select the documents that should be returned.

Ranking expressions are mathematical functions over tensors or scalars. The function may contain anything from a single reference to a built-in feature to machine-learned models from TensorFlow, ONNX or XGBoost - refer to advanced ranking for details on using dot products, tensors and wand.

Ranking expressions and rank features

Vespa computes the rank score of each document by evaluating a configured expression called a ranking expression. Ranking expressions look like standard mathematical expressions and support the usual operators and functions, as well as an if function - enabling decision trees and conditional business logic. They also support a comprehensive set of tensor functions, which makes it possible to express machine-learned functions such as deep neural nets.

The primitive values which are combined by ranking expressions are called rank features. Rank features can be scalars or tensors, and are one of the following (a combined example follows the list):

  • Constants set in the application package
  • Values sent with the query or set in the document
  • Values computed by Vespa from the query and the document, providing information about how well the query matched the document
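For example, a first-phase expression may combine all three kinds of values. A minimal sketch, where the constant, the query feature and the attribute names are hypothetical:

rank-profile example inherits default {
    constants {
        titleBoost: 2.0
    }
    first-phase {
        expression: titleBoost * fieldMatch(title) + query(popularityWeight) * attribute(popularity)
    }
}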

Two-phase ranking

Rank scores in general become more accurate by using complex expressions which use many features or large tensors. Even though Vespa is optimized for such calculations, complex expressions become expensive when calculated for each selected document. For this reason Vespa can be configured to run two ranking expressions - a smaller and less accurate one on all matches as they are found (first-phase ranking) and a more expensive and accurate one only on the best documents (second-phase ranking). This provides a better CPU budget distribution by dedicating more resources to the best candidates - see ranking.softtimeout.

By default, second-phase ranking (if specified) is run on the 100 best hits per content node, after matching and before information is returned to the container. The number of hits to rerank can be configured in the rank profile, as shown below.

Rank expressions are configured in the rank-profile section of search definitions. Example:

search myapp {
    …
    rank-profile default inherits default {
        first-phase {
            expression: nativeRank + query(deservesFreshness) * freshness(timestamp)
        }
        second-phase {
            expression {
                xgboost("my_model.json") 
            }
            rerank-count: 200
        }
    }
}
In this example, the first phase uses the nativeRank feature plus a freshness component as the ranking expression. The contribution from the freshness feature is controlled by a query parameter. The second phase uses a trained XGBoost model.

Ranking expressions can be large. They can be wrapped in curly braces, or put in files in the application package, using file:filename in place of the expression.
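For illustration, a sketch of the file form described above, assuming a hypothetical expression file my-second-phase.expression placed in the application package:

second-phase {
    expression: file:my-second-phase.expression
}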

See the rank profiles reference for details.

It is possible to configure multiple rank profiles and choose between them in the query, for example to use different ranking expressions for different use cases or to bucket test new revisions. A rank profile may also inherit another to allow specifying only the differences between two profiles.
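A sketch of such a setup with made-up profile names: a variant profile inherits a base profile and overrides only the first-phase expression, and a query can then select it with the ranking query parameter (e.g. &ranking=freshness-boost):

rank-profile base {
    first-phase {
        expression: nativeRank
    }
}
rank-profile freshness-boost inherits base {
    first-phase {
        expression: nativeRank + freshness(timestamp)
    }
}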

Choosing functions for the first and second phase

A good ranking expression will, for most applications, be too expensive in CPU to evaluate over the entire result set, so in most cases the desired rank function should be used as the second phase function. The task then becomes to find a first phase function which correlates sufficiently well with the second phase function, to ensure that relevance is not hurt too much by not evaluating the second phase function on all the hits. In many cases, nativeRank will work well as the first phase function. The impact of using a cheaper function in the first phase can be assessed by deploying the second phase function as the first phase function in a test rank profile, and comparing the results to the production rank profile, as sketched below.
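A sketch of such a test setup, using placeholder expressions: the test profile inherits the production profile and promotes the expensive second-phase expression to the first phase, so result quality with and without the cheap first phase can be compared across the two profiles.

rank-profile production {
    first-phase {
        expression: nativeRank
    }
    second-phase {
        expression: nativeRank + 10 * freshness(timestamp)
    }
}
rank-profile full-first-phase inherits production {
    first-phase {
        expression: nativeRank + 10 * freshness(timestamp)
    }
}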

nativeRank

The default ranking is nativeRank in the first phase and no second-phase re-ranking. nativeRank is a feature which gives a reasonably good rank score, while being cheap enough to compute to be suitable for first phase ranking. See the native rank reference and native rank introduction for more information.

Choosing ranking expressions

There are two methods for deciding ranking expressions for the application.

The first method is to hand write a ranking expression. For example, assuming the title field is more important than the body field, create a ranking expression which gives more weight to that field, as in the example above. Vespa contains some built-in convenience support for this - weights can be set on the individual fields by weight: <number>, and the feature match can be used to get a weighted average of the fieldMatch scores of each field. The built-in text matching feature nativeRank has a set of tunable parameters (including field weights) to control the text matching component of the overall ranking expression. The overall ranking expression may contain other ranking dimensions than just text match, like freshness, the quality of the document, or any other property of the document or query. Hand writing works well if the intended ranking is well defined enough to be easily mapped into a ranking expression, or if the alternative below is too expensive.
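A sketch of per-field weighting with hypothetical field names; the default field weight is 100, so title matches here count twice as much as body matches in the text match features:

field title type string {
    indexing: summary | index
    weight: 200
}
field body type string {
    indexing: summary | index
}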

The second method is to produce the ranking expression automatically from a training set - a set of document and query pairs where a human has assessed the quality of the document as an answer to the query. Given a large number (tens of thousands) of such judgements, and the rank feature values for each, a machine learning algorithm can be used to produce the ranking expression and/or the weights of the expression (often represented as a constant tensor).

Choosing features

Vespa's rank feature set contains a large set of low level features, as well as some higher level features. If automated training is used, all features can often just be handed to the training algorithm to let it choose which ones to use. Depending on the algorithm, it can be a good idea to leave out the unnormalized features to avoid spending learning power on having to learn to normalize these features and determine that they really represent the same information as some of the normalized features.

If the expression is written manually, it might be most convenient to stick with using the fieldMatch(name) feature for each field. This feature combines the more basic fieldMatch features in a reasonable way. A good way to combine the fieldMatch score of each field is to use a weighted average as explained above. Another way is to combine the field match scores using the fieldMatch(name).weight/significance/importance features which takes term weight or rareness or both into account and allows a normalized score to be produced by simply summing the product of this feature and any other normalized per-field score for each field. In addition, some attribute value(s) must usually be included to determine the a priori quality of each document.
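A hand-written sketch along these lines, where the field names, the quality attribute and the coefficients are purely illustrative:

first-phase {
    expression {
        0.4 * fieldMatch(title) +
        0.2 * fieldMatch(description) +
        0.4 * attribute(quality)
    }
}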

Feature contribution functions

The ranking features in Vespa are linear. For example, the earliness feature is 0 if the match is at the very end of the field, 1 if the match is at the very start of the field, and 0.5 if the match is exactly in the middle of the field. In many cases, we do not want the contribution of a feature to be linear with its "goodness". For example, we may want earliness to decay quickly in the beginning, as the match moves further out, but decay very slowly as it nears the end of the field, from the intuition that it matters a lot if the match is of the first word or the twentieth in the field, but it doesn't matter as much if the match is at the thousands or thousand-and-twentieths.

To achieve this, we need to pass the feature value through a function which turns the line into a curve matching our intent. This is easiest if you stick to normalized features. Then we are looking for any function which begins and ends in the same points, f(0)=0 and f(1)=1, but which curves in between. To get the effect described above, we need a curve which starts almost flat and ends very steep. One example of such a function is:

pow(0-x,2)
The second argument (the exponent) decides how pronounced the curving is. A larger number makes changes to higher x values even more important relative to the same change to lower x values.

Normalization

The rank features provided include both features normalized to the range 0-1, and un-normalized features like counts and positions. Whenever possible, prefer the normalized features. They capture the same information (and more), but are easier to use because they can be combined more readily with other features. In addition, try to write ranking expressions such that the combined rank score is also normalized, for example by taking averages rather than sums. Normalized rank scores make it possible to implement relevance-based blending, search assistance triggering when there are no good hits, and so on.
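A sketch of keeping the combined score normalized by averaging normalized features (the feature choices are illustrative; each of these features is in the range 0-1, so their average is too):

first-phase {
    expression: (fieldMatch(title) + fieldMatch(body) + freshness(timestamp)) / 3
}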

The if function and string equality tests

The if function can be used for other purposes than encoding MLR-trained decision trees. One use is to choose different ranking functions for different types of documents in the same search. Ranking expressions can do string equality tests, so to choose between different ranking sub-functions based on the value of a string attribute (say, "category"), use an expression like:

if (attribute(category)=="restaurant",…restaurant function, if (attribute(category)=="hotel",…hotel function, …))
This method is also used automatically when multiple search definitions are deployed to the same cluster and all are searched in the same query, to choose the ranking expression from the correct search definition for each document.

By using if functions, one can also implement strict tiering, ensuring that documents meeting some criteria always get a higher score than other documents. Example:

if (fieldMatch(business).fieldCompleteness==1, 0.8+document.distance*0.2,
                                               if (attribute(category)=="shop", 0.6+fieldMatch(title)*0.2,
                                                                                 match*attribute(popularity)*0.6 ) )
This function puts all exact matches on business names first, sorted by geographical distance, followed by all shops sorted by title match, followed by everything else sorted by the overall match quality and popularity.

Weight, significance and connectedness

It is possible to influence the values of the match features calculated by Vespa from the query by sending weight, significance and connectedness with the query:

Weight

Signify that some query terms are more or less important than others in matches. For example, query=large shoes!200 specifies that the term "shoes" should be twice as important for the final rank score as "large" (since the default weight is 100).

Weight is used in fieldMatch(name).weight, which can be multiplied with fieldMatch(name) to yield a weighted score for the field, and in fieldMatch(name).weightedOccurrence to get an occurrence score which is higher if the higher weighted terms occur most often. Configure static field weights in the search definition.
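As a sketch, a per-field score taking the query term weights into account could look like this (the field name is illustrative):

first-phase {
    expression: fieldMatch(title).weight * fieldMatch(title)
}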

Significance

How rare a particular term is in the corpus or the language. This is sometimes valuable information because if a document matches a rare word, it might mean the document is more important than one which matches a common word. Significance is calculated automatically by Vespa during indexing, but can also be overridden by setting the significance values on the query terms in a Searcher component. Significance is accessible in fieldMatch(name).significance, which can be used the same way as weight. Weight and significance are also averaged into fieldMatch(name).importance for convenience.

Connectedness

Signify the degree of connection between adjacent terms in the query. For example, the query new york newspaper should have a higher connectedness between the terms "new" and "york" than between "york" and "newspaper" to rank documents higher if they contain "new york" as a phrase. Term connectedness is taken into account by fieldMatch(name).proximity, which is also an important contribution to fieldMatch(name). Connectedness is a normalized value which is 0.1 by default. It must be set by a custom Searcher, looking up connectivity information from somewhere - there is no query syntax for it.

Rank features: dumping ranking information for tuning

"All" rank features can be included in the results for each document by adding rankfeatures to the query. This is useful for tasks like recording the rank feature values for automated training. Since the set of actual feature computable are in general infinite, "all" features really means a large default set. If more rank features than is available in the default set is wanted, they can be added to the set in the rank profile:

rank-features: feature1 feature2 …
This list can also be enclosed in curly brackets and span multiple lines. It is also possible to take full control over which features will be dumped by adding
ignore-default-rank-features
to the rank profile. This will make the explicitly listed rank features the only ones dumped when requesting rankfeatures in the query. The features are dumped in JSON format.
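A sketch of a profile used only for feature dumping, with a fully controlled feature set (the listed features are illustrative):

rank-profile training inherits default {
    ignore-default-rank-features
    rank-features {
        nativeRank
        fieldMatch(title)
        fieldMatch(body)
        attribute(popularity)
    }
}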

Dumping rank features for specific documents

When you have a training set containing judgements for certain documents, it is useful to select those documents in the query by adding a term matching the document id, but without impacting the values of any rank features. To do this, add that term with ranked set to false. In YQL:

select * from mydocumenttype where myidfield contains ([ {"ranked": false} ] "mydocumentid") and ...;

Summary features: getting match information in the results

As rankfeatures dumps too many features to be usable in production, Vespa also allows a smaller number of feature values to be returned with each result by specifying a list of summary-features. This can be used, for example, to write custom Searcher or presentation logic which depends on which fields were matched and how well they matched - see inspecting structured data.

A list of summary features is set by adding to the rank profile(s):

summary-features: feature1 feature2 …
The list can also be enclosed by curly brackets and split into multiple lines.

Summary features are useful for dumping tensor results from ranking expressions:

rank-profile test {
    summary-features: output_indexed_tensor output_mapped_tensor output_mixed_tensor

    function output_indexed_tensor() {
        expression: attribute(indexed_tensor)
    }
    function output_mapped_tensor() {
        expression: attribute(mapped_tensor)
    }
    function output_mixed_tensor() {
        expression: attribute(mixed_tensor)
    }
}

Feature configuration

Some features, most notably the fieldMatch features, contain configuration parameters which allow the feature calculation to be tweaked per field for performance or relevance. Feature configuration values are set by adding:

rank-properties {
    featureName.configurationProperty: "value"
}
to the rank profile. These values are set per field, so for example to set some values for the title field and some others for the description field, add:
rank-properties {
    fieldMatch(title).maxAlternativeSegmentations: 10
    fieldMatch(title).maxOccurrences: 5
    fieldMatch(description).maxOccurrences: 20
}
The full list of configuration features is found in the rank feature configuration reference.

Using constants

Ranking expressions can refer to constants defined in a constants clause:

first-phase {
    expression: myConst1 + myConst2
}
constants {
    myConst1: 1.5
    myConst2: 2.5
    ...
}
Constants lists are inherited and can be overridden in sub-profiles. This is useful to create a set of rank profiles that use the same broad ranking but differ in some constant values.

For performance, always prefer constants to query variables (see below) whenever the constant values to use can be enumerated in a set of rank profiles. Constants are applied to ranking expressions at configuration time, and the constant parts of the expressions are precomputed, which may reduce execution cost, especially with tensor constants.

Using query variables

As ranking expressions can refer to any feature by name, one can use query features as ranking variables. These variables can be assigned default values in the rank profile by adding:

rank-properties {
    query(name): 0.5 
}
to the ranking profile. Also, these variables can be overridden in the query by adding:
rankfeature.query(name)=0.1
to the query - see the Search API. These variables can be used, for example, to let the query specify the relative importance of various parts of a ranking expression, or to quickly search large parameter spaces for a good ranking by trying different values in each query.
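A sketch tying this together, with a hypothetical query(titleWeight) variable used to balance two field scores; a query can then override the default with rankfeature.query(titleWeight)=0.9:

rank-profile blend inherits default {
    first-phase {
        expression: query(titleWeight) * fieldMatch(title) + (1 - query(titleWeight)) * fieldMatch(body)
    }
    rank-properties {
        query(titleWeight): 0.7
    }
}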

Function snippets

When using machine learned ranking, we are searching a function space which is much more limited than the space of functions supported by ranking expressions. We can increase the space of functions available to MLR because the primitive features used in MLR training do not need to be primitive features in Vespa - they can just as well be ranking expression snippets. If there are certain mathematical combinations of features believed to be useful in an application, these can be pre-calculated from the actual primitive features of Vespa and given to MLR as primitives. Such primitives can then be replaced textually by the corresponding rank expression snippet, before the learned expression is deployed on Vespa.

Vespa supports expression functions. Functions having zero arguments can be used as summary- and rank-features. For example, the function "myfeature":

rank-profile myrankprofile inherits default {
    function myfeature() {
      expression: fieldMatch(title).completeness * pow(0 - fieldMatch(title).earliness, 2)
    }
}
becomes available as a feature as follows:
summary-features {
    myfeature
}

Tracking relevance variations over time

Vespa comes with a few simple relevance metrics that enable applications to see how relevance changes over time, whether as a result of changes to how relevance is computed, changes to query construction, changes to the content ingested, or changing user behavior.

The relevance metrics are relevance.at_1, relevance.at_3 and relevance.at_10. See metrics for more information.