Ranking

Vespa will order the documents selected by a query using a ranking expression, which is a mathematical function returning a score for each document.

Ranking expressions and rank features

Vespa computes the rank score by a configured mathematical expression called a ranking expression. Ranking expressions look like standard mathematical expressions and support the usual operators and functions, as well as an if function which allows decision trees and conditional business logic. In addition to scalars, ranking expressions can compute over tensors, which makes it possible to use machine-learned functions such as deep neural nets.

The primitive values which are combined by ranking expressions are called rank features. Rank features are constants set in the application package, values sent with the query or set in the document, or values computed by Vespa from the query and the document to indicate how well the query matched the document. Rank features may be both scalars and tensors. See the full list of rank features.
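For illustration, a simple scalar ranking expression combining three such features could look like this (a sketch only; the query(userBoost) variable and the timestamp field are hypothetical):

0.6 * fieldMatch(title) + 0.3 * freshness(timestamp) + 0.1 * query(userBoost)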

Two-phase ranking

Rank scores in general become more accurate when using complex expressions with many features or large tensors. But even though Vespa is heavily optimized for such calculations, complex expressions become expensive when they must be calculated for each selected document. For this reason Vespa can be configured to run two ranking expressions - a smaller and less accurate one on all matches as they are found (first-phase ranking) and a more expensive and accurate one only on the best documents (second-phase ranking). This often makes better use of the CPU budget by dedicating more of the total CPU to the best candidates.

Second-phase ranking, if specified, will by default be run on the 100 best hits on each content node, after matching and before information is returned upwards to the container. The number of hits to rerank can be configured in the rank profile.

Configuring rank expressions

Rank expressions are configured in the rank profile section of search definitions. Example:

search myapp {
    …
    rank-profile default inherits default {
        first-phase {
            expression: nativeRank + query(deservesFreshness) * freshness(timestamp)
        }
        second-phase {
            expression {
                0.7 * ( 0.7*fieldMatch(title) + 0.2*fieldMatch(description) + 0.1*fieldMatch(body) ) +
                0.3 * attributeMatch(keywords)
            }
            rerank-count: 200
        }
    }
}
In this example, the first phase uses the nativeRank feature plus a freshness component, where the contribution from the freshness feature is determined by a query parameter. The second phase uses a ranking expression which combines the match features of a few fields, and also specifies that the second phase should happen for the 200 best hits per node instead of the default 100 (in a real application this expression would usually be very large, or use tensors).

If a ranking expression becomes too large to write on a single line, it can be wrapped in curly braces, or put in a separate file in the application package and referred to by file:filename in place of the expression. It is also possible to set configuration values to change how the rank features are calculated, and to specify which rank feature values should be included in the hit information sent to the search container. See the rank profiles reference for details.

When ranking expressions are changed or added, the changes can be deployed by redeploying the application package. There is no need for any restarting or reindexing.

Note that it is possible to specify multiple rank profiles and choose between them in the query, for example to use different ranking expressions for different use cases or to bucket test new revisions. A rank profile may also inherit another to allow specifying only the differences between two profiles.
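For example, a variant profile for bucket testing might inherit the default profile and change only the first phase (a sketch; the profile name and the attribute(popularity) field are hypothetical):

rank-profile experiment inherits default {
    first-phase {
        expression: nativeRank + 0.2 * attribute(popularity)
    }
}

The profile to use can then be selected per query with the ranking query parameter, e.g. &ranking=experiment in the HTTP query.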

Advanced ranking functions using tensors

Many modern machine-learning methods such as neural nets and large logistic regressions produce functions with thousands to millions of parameters. Vespa features a native tensor model and operator set which makes it possible to express such functions succinctly as ranking expressions, which Vespa will execute as optimized code to compute a score for each document. See the tensor user guide for more.
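For instance, a recommendation score could be the dot product between a tensor sent with the query and a tensor stored in each document, expressed like this (a sketch; the tensor names are hypothetical, and both tensors must be declared with compatible types as described in the tensor user guide):

sum(query(user_profile) * attribute(document_profile))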

Choosing ranking expressions

There are two methods for deciding on a ranking expression for your application.

The first method is to hand-write an expression which captures the ranking you intend. For example, if you know that the title field is more important than the body field, you can create a ranking expression which gives more weight to that field, as in the example above. Vespa contains some built-in convenience support for this - weights can be set on individual fields by weight: <number> (as sketched below), and the match feature can be used to get a weighted average of the fieldMatch scores of all fields. The built-in text matching feature nativeRank has a set of tunable parameters (including field weight) to control the text matching component of the overall ranking expression. The overall ranking expression might contain other ranking dimensions than just text match, like freshness, the quality of the document, or any other property of the document or query. Hand-writing is fine if the intended ranking is well defined enough to be easily mappable into a ranking expression, or if the alternative below is too expensive.
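A minimal sketch of setting a static field weight (the field name is just an example; the default weight is 100):

field title type string {
    indexing: index
    weight:   200
}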

The second method is to produce the ranking expression automatically from a training set - a set of document and query pairs where a human has assessed the quality of the document as an answer to the query. Given many (tens of thousands) of such judgements, and the rank feature values for each, a machine learning algorithm can be used to produce the ranking expression and/or the weights of the expression (often represented in a constant tensor).

Choosing features

The rank feature set of Vespa contains a large set of low level features as well as some higher level features. If automated training is used, all features can often just be handed to the training algorithm to let it choose which ones to use. Depending on the algorithm, it may be a good idea to leave out the unnormalized features to avoid spending learning power on having to learn to normalize these features and determine that they really represent the same information as some of the normalized features.

If the expression is written manually, it might be most convenient to stick with using the fieldMatch(name) feature for each field. This feature combines the more basic fieldMatch features in a reasonable way. A good way to combine the fieldMatch scores of each field is to use a weighted average as explained above. Another way is to combine the field match scores using the fieldMatch(name).weight/significance/importance features, which take term weight or rareness or both into account and allow a normalized score to be produced by simply summing, for each field, the product of this feature and any other normalized per-field score. In addition, some attribute value(s) must usually be included to determine the a priori quality of each document, as in the sketch below.
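A hand-written first phase along these lines might look as follows (a sketch only; the quality attribute is hypothetical):

first-phase {
    expression {
        fieldMatch(title).importance * fieldMatch(title) +
        fieldMatch(body).importance  * fieldMatch(body)  +
        0.2 * attribute(quality)
    }
}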

Feature contribution functions

The ranking features in Vespa are linear. For example, the earliness feature is 0 if the match is at the very end of the field, 1 if the match is at the very start of the field, and 0.5 if the match is exactly in the middle of the field. In many cases, we do not want the contribution of a feature to be linear with its "goodness". For example, we may want earliness to fall off quickly as the match moves away from the start of the field, but fall off only very slowly as it nears the end of the field, from the intuition that it matters a lot whether the match is on the first word or the twentieth in the field, but it doesn't matter as much whether it is on the thousandth word or the thousand-and-twentieth.

To achieve such things, we need to pass the feature value through a function which turns the line into a curve matching our intent. This is easiest if you stick to normalized features. Then we are looking for any function which begins and ends in the same points, f(0)=0 and f(1)=1, but which curves in between. To get the effect described above, we need a curve which starts almost flat and ends very steep. One example of a function like that is:

pow(0-x,2)
The second argument decides how pronounced the curving is. A larger number makes changes to higher x values even more important relative to the same change to lower x values.
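Applied inside a rank profile, this could look as follows (a sketch, mirroring the macro example later in this document):

first-phase {
    expression: fieldMatch(title).completeness * pow(0 - fieldMatch(title).earliness, 2)
}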

Normalization

The rank features provided include both features normalized to the range 0-1 and unnormalized features like counts and positions. Whenever possible, prefer the normalized features. They capture the same information (and more), but are easier to use because they can be combined more readily with other features. In addition, try to write ranking expressions such that the combined rank score is also normalized, for example by taking averages rather than sums. Normalized rank scores make it possible to implement relevance-based blending, triggering search assistance when there are no good hits, and so on.

Literal boosting

By default, Vespa does stemming and normalization of the words in the indexes and queries to increase recall. Sometimes it helps relevance to write a ranking expression which gives a higher score to matches where the literal form used in the query matches the literal form used in the document. This can be done by adding rank: literal to the fields in question to turn this feature on, and writing a ranking expression which accesses features for a field named as the original field with _literal appended. For example, if you have a field:

field title type string {
    indexing: index
    rank:     literal
}
You can write this ranking expression: 0.9*fieldMatch(title) + 0.1*fieldMatch(title_literal)

The if function and string equality tests

The if function can be used for other purposes than encoding MLR-trained decision trees. Another use is to choose different ranking functions for different types of documents in the same search. Ranking expressions can perform string equality tests, so to choose between different ranking sub-functions based on the value of a string attribute (say, "category"), we can write an expression like this:

if (attribute(category)=="restaurant",…restaurant function, if (attribute(category)=="hotel",…hotel function, …))
This method is also used automatically when multiple search definitions are deployed to the same cluster and all are searched in the same query, to choose the ranking expression from the correct search definition for each document.

By using if functions, we can also implement strict tiering, ensuring that documents satisfying some criteria always get a higher score than the other documents. An example:

if (fieldMatch(business).fieldCompleteness==1, 0.8 + document.distance*0.2,
    if (attribute(category)=="shop", 0.6 + fieldMatch(title)*0.2,
        match * attribute(popularity) * 0.6))
This function puts all exact matches on business names first, sorted by geographical distance, followed by all shops sorted by title match, followed by everything else sorted by the overall match quality and popularity.

Weight, significance and connectedness

It is possible to influence the values of the match features calculated by Vespa from the query by sending weight, significance and connectedness with the query:

Weight: Signifies that some query terms are more or less important than others in matches. For example, query=large shoes!200 specifies that the term "shoes" should be twice as important for the final rank score as "large" (since the default weight is 100). Weight is used in fieldMatch(name).weight, which can be multiplied with fieldMatch(name) to yield a weighted score for the field, and in fieldMatch(name).weightedOccurrence to get an occurrence score which is higher if higher-weighted terms occur most. Configure static field weights in the search definition.
Significance: How rare a particular term is in the corpus or the language. This is sometimes valuable information because if a document matches a rare word, it might mean the document is more important than one which matches a common word. Significance is calculated automatically by Vespa during indexing, but can also be overridden by setting the significance values on the query terms in a Searcher component. Significance is accessible in fieldMatch(name).significance, which can be used the same way as weight. Weight and significance are also averaged into fieldMatch(name).importance for convenience.
Connectedness: Signifies the degree of connection between adjacent terms in the query. For example, the query new york newspaper should have a higher connectedness between the terms "new" and "york" than between "york" and "newspaper", to rank documents higher if they contain "new york" as a phrase. Term connectedness is taken into account by fieldMatch(name).proximity, which is also an important contribution to fieldMatch(name). Connectedness is a normalized value which is 0.1 by default. It must be set by a custom Searcher, looking up connectivity information from somewhere - there is no query syntax for it.
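A minimal sketch of such a Searcher, setting connectivity between adjacent word items in the query tree (the class name and the 0.9 value are made up for illustration; a real implementation would look connectivity up in some external source):

import com.yahoo.prelude.query.CompositeItem;
import com.yahoo.prelude.query.Item;
import com.yahoo.prelude.query.WordItem;
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.searchchain.Execution;

public class ConnectivitySearcher extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        Item root = query.getModel().getQueryTree().getRoot();
        if (root instanceof CompositeItem) {
            CompositeItem composite = (CompositeItem) root;
            // Set connectivity between each pair of adjacent word items
            for (int i = 0; i < composite.getItemCount() - 1; i++) {
                Item current = composite.getItem(i);
                Item next = composite.getItem(i + 1);
                if (current instanceof WordItem && next instanceof WordItem) {
                    // 0.9 is an arbitrary example value
                    ((WordItem) current).setConnectivity(next, 0.9);
                }
            }
        }
        return execution.search(query);
    }
}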

Rank features: dumping ranking information for tuning

"All" ranking features can be included in the results for each document by adding the rankfeatures flag to the query. This is useful for tasks like recording the rank feature values for automated training. Since the set of actual feature computable are in general infinite, "all" features really means a large default set. If more rank features than is available in the default set is wanted, they can be added to the set in the rank profile:

rank-features: feature1 feature2 …
This list can also be enclosed in curly brackets and span multiple lines. It is also possible to take full control over which features will be dumped by adding
ignore-default-rank-features
to the rank profile. This will make the explicitly listed rank features the only ones dumped when requesting rankfeatures in the query. The features are dumped in JSON format.
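Put together, a rank profile dedicated to producing training data might look like this (a sketch; the profile name and feature selection are just examples):

rank-profile training inherits default {
    ignore-default-rank-features
    rank-features {
        fieldMatch(title)
        fieldMatch(body)
        nativeRank
    }
}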

Summary features: getting match information in the results

As rankfeatures dumps way too many features to be usable in production, Vespa also allows a smaller number of feature values to be returned with each result in production by specifying a list of summary features in the search definition. This can be used, for example, to write custom Searcher or presentation logic which depends on which fields were matched and how well they matched. See the inspecting structured data documentation for details about how to access summary features in a custom Searcher. A list of summary features is set per rank profile by adding:

summary-features: feature1 feature2 …
to the rank profile. The list can also be enclosed by curly brackets and split into multiple lines.
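A concrete example (a sketch; the choice of features is arbitrary):

summary-features: fieldMatch(title) fieldMatch(title).completeness attributeMatch(keywords)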

Feature configuration

Some features, most notably the fieldMatch features, contain configuration parameters which allow the feature calculation to be tweaked per field for performance or relevance. Feature configuration values are set by adding:

rank-properties {
    featureName.configurationProperty: "value"
}
to the rank profile. These values are set per field, so for example to set some values for the title field and some others for the description field, add:
rank-properties {
    fieldMatch(title).maxAlternativeSegmentations: 10
    fieldMatch(title).maxOccurrences: 5
    fieldMatch(description).maxOccurrences: 20
}
The full list of configuration features is found in the rank feature configuration reference.

Choosing functions for the first and second phase

A good ranking expression will for most applications consume too much CPU to be evaluated over the entire result set, so in most cases the desired rank function should be used as the second-phase function. The task then becomes to find a first-phase function which correlates sufficiently well with the second-phase function to ensure that relevance is not hurt too much by not evaluating the second-phase function on all the hits. In many cases, nativeRank will work well as the first-phase function. The impact of using a cheaper function in the first phase can be assessed by deploying the second-phase function as the first-phase function in a test rank profile and comparing the results to the production rank profile.
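Such a test profile could be as simple as the following sketch, reusing the second-phase expression of the two-phase example earlier in this document:

rank-profile test-first-phase inherits default {
    first-phase {
        expression {
            0.7 * ( 0.7*fieldMatch(title) + 0.2*fieldMatch(description) + 0.1*fieldMatch(body) ) +
            0.3 * attributeMatch(keywords)
        }
    }
}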

Performance notes

Evaluating ranking expressions is in itself not expensive, but expressions returning the value of a single rank feature are cheaper still, so they should be preferred as first-phase functions when possible. The total latency is typically linear with the number of hits the query matches.

The ranking framework will ensure that only values actually used by the rank expressions of the chosen profile are calculated, and only in the phase where they are used. Because of this, features cost nothing until they are actually used, even though they are always available for use.

The fieldMatch (and hence match) features contain information about the best matches of the query to the field text and are quite expensive to calculate - probably not suitable for a first-phase function in most applications. It is possible to make the fieldMatch features less expensive (and less accurate) by setting the maxAlternativeSegmentations configuration value - see the string segmentation match configuration parameters.

If the rerank count is turned up, more CPU will be needed to do the reranking, and in addition more memory will be needed per query being executed to hold temporary information between the first and second phase.

Raw scores and query item labeling

Vespa ranking is very flexible and relatively decoupled from document matching. The output from the matching pipeline typically indicates how the different words in the query matches a specific document and lets the ranking framework figure out how this translates to match quality.

However, some of the more complex match operators will produce scores directly rather than expose underlying match information. A good example is the wand operator. During ranking, a wand will look like a single word that has no detailed match information, but rather a numeric score attached to it. This is called a raw score and can be included in ranking expressions using the rawScore feature.

The rawScore feature takes a field name as parameter and will give the sum of all raw scores produced by the query for that field. If more fine-grained control is needed (the query contains multiple operators producing raw scores for the same field, but we want to handle those scores separately in the ranking expression) the itemRawScore feature may be used. This feature takes a query item label as parameter and gives the raw score produced by that item only.

Query item labeling is a generic mechanism that can be used to attach symbolic names to query items. A query item is labeled by using the setLabel method on a query item in the search container query API.
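A minimal sketch of such labeling in a Searcher (the label, the class name, and the assumption that the root item is taggable are all illustrative):

import com.yahoo.prelude.query.Item;
import com.yahoo.prelude.query.TaggableItem;
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.searchchain.Execution;

public class LabelingSearcher extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        // Label the query tree root so a ranking expression can refer
        // to its raw score as itemRawScore(myLabel)
        Item root = query.getModel().getQueryTree().getRoot();
        if (root instanceof TaggableItem) {
            ((TaggableItem) root).setLabel("myLabel");
        }
        return execution.search(query);
    }
}

A ranking expression could then use itemRawScore(myLabel) to access the raw score produced by that item only.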

Using constants

Ranking expressions may refer to constants defined in a constants clause:

first-phase {
    expression: myConst1 + myConst2
}
constants {
    myConst1: 1.5
    myConst2: 2.5
    ...
}
Constants lists are inherited and can be overridden in sub-profiles. This is useful for creating a set of rank profiles that share the same broad ranking but differ in constant values.

For performance, always prefer constants to query variables (see below) whenever the constant values to use can be enumerated in a set of ranking profiles. Constants are applied to ranking expressions at configuration time and the resulting constant parts of expressions calculated, which may lead to reduced execution cost, especially with tensor constants.
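A sketch of such a profile family (profile and constant names are hypothetical):

rank-profile base {
    first-phase {
        expression: titleWeight * fieldMatch(title) + bodyWeight * fieldMatch(body)
    }
    constants {
        titleWeight: 0.7
        bodyWeight: 0.3
    }
}

rank-profile title-heavy inherits base {
    constants {
        titleWeight: 0.9
        bodyWeight: 0.1
    }
}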

Using query variables

Because ranking expressions may refer to any feature by name, one can use query features as ranking variables. These variables can be assigned default values in the rank profile by adding:

rank-properties {
    query(name): "value"
}
to the ranking profile. In addition, these variables can be overridden in the query by adding:
rankfeature.query(name)=value
to the HTTP query - see the Search API. Such variables can be used, for example, to let the query specify the importance of various parts of a ranking expression, or to quickly search large parameter spaces for a good ranking by trying different values in each query.
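For example, the freshness contribution in the two-phase example earlier could be exposed as a query variable with a default (a sketch):

rank-profile default inherits default {
    first-phase {
        expression: nativeRank + query(deservesFreshness) * freshness(timestamp)
    }
    rank-properties {
        query(deservesFreshness): "0.2"
    }
}

A query could then override it with &rankfeature.query(deservesFreshness)=0.8.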

Function Snippets as MLR Level Features

When using machine-learned ranking, we are searching a function space which is much more limited than the space of functions supported by ranking expressions. We can increase the space of functions available to MLR because the primitive features used in MLR training do not need to be primitive features in Vespa - they can just as well be ranking expression snippets. If certain mathematical combinations of features are believed to be useful in an application, these can be pre-calculated from the actual primitive features of Vespa and given to MLR as primitives. Such primitives can then be replaced textually by the corresponding ranking expression snippet before the learned expression is deployed on Vespa.

Vespa supports the concept of expression macros, see reference. As a by-product of supporting this, you can write macros that accept zero arguments and use their output as both a summary feature and a rank feature. The idea of macros was to allow easier creation of complex ranking expressions, not to produce rank features for MLR, so the generated feature names are not very accessible. E.g. a macro "myfeature" written as:

rank-profile myrankprofile inherits default {
    macro myfeature() {
      expression: fieldMatch(title).completeness * pow(0 - fieldMatch(title).earliness, 2)
    }
    first-phase {
      expression: nativeRank + freshness(date) + myfeature
    }
}
The macro can be retrieved in summary features by telling the system to include the feature it would generate:
summary-features {
    rankingExpression(myfeature@)
}
where the "rankingExpression" prefix, the parenthesis and the trailing at-sign are all required.

nativeRank

The default ranking is nativeRank in the first phase and no second-phase reranking. nativeRank is a feature which gives a reasonably good rank score while being fast enough to be suitable for first-phase ranking. See the native rank reference and native rank introduction for more information.