Text Ranking

Refer to the ranking introduction for an overview of ranking in Vespa. See the text search and text search through ML tutorials. The guide for Semantic Retrieval for Question Answering Applications is also relevant.

The default ranking uses nativeRank in the first phase, with no second-phase re-ranking. nativeRank is a feature that gives a reasonably good rank score while being fast enough for first-phase ranking. See the native rank reference and native rank introduction for more information.

An alternative to nativeRank is using the BM25 rank feature.

If the ranking expression is written manually, it is often most convenient to use the fieldMatch(name) feature for each field. This feature combines the more basic fieldMatch features in a reasonable way. A good way to combine the fieldMatch scores of the fields is a weighted average. Another way is to use the fieldMatch(name).weight, fieldMatch(name).significance or fieldMatch(name).importance features, which take term weight, term rareness, or both into account. A normalized score can then be produced by summing, over all fields, the product of this feature and any other normalized per-field score. In addition, one or more attribute values must usually be included to capture the a priori quality of each document.

For example, if the title field is more important than the body field, create a ranking expression that gives more weight to the title. Vespa has built-in convenience support for this: set the weight of each field with weight: <number>, and use the match feature to get a weighted average of the fieldMatch scores of the fields. The overall ranking expression may also contain ranking dimensions other than text match, such as freshness, document quality, or any other property of the document or query.
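The weighted average described above can be illustrated outside Vespa with a few lines of Python. This is only a sketch of the arithmetic, not how Vespa evaluates rank features: the function name, the per-field scores and the field weights are all hypothetical values chosen for illustration.

```python
def weighted_average(scores, weights):
    """Combine normalized per-field scores into one normalized score.

    scores:  field name -> match score in [0, 1] (hypothetical values)
    weights: field name -> static field weight (hypothetical values)
    """
    total_weight = sum(weights[field] for field in scores)
    if total_weight == 0:
        return 0.0
    return sum(scores[field] * weights[field] for field in scores) / total_weight


# Assumed example: the title matches well and counts twice as much as the body.
field_scores = {"title": 0.8, "body": 0.4}
field_weights = {"title": 200, "body": 100}

combined = weighted_average(field_scores, field_weights)
print(round(combined, 4))
```

Because the weights are divided by their sum, the combined score stays in the same range as the per-field scores, so it can be mixed with other normalized ranking dimensions.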

Weight, significance and connectedness

Modify the values of the match features from the query by sending weight, significance and connectedness with the query:

Weight

Set the query term weight. Example: ... where (title contains ([{"weight":200}]"heads") AND title contains "tails") specifies that heads is twice as important for the final rank score as tails (the default weight is 100).
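As a rough sketch, a client could assemble such a where clause programmatically. The helper below is hypothetical (it is not part of any Vespa client library) and simply reproduces the annotation syntax shown in the example above. It assumes the terms contain no quote characters that would need escaping.

```python
import json


def weighted_term(field, term, weight=None):
    """Return a 'contains' clause, annotated with a term weight if one is given.

    Hypothetical helper: assumes `term` needs no escaping.
    """
    if weight is None:
        # No annotation: the default weight applies.
        return f'{field} contains "{term}"'
    annotation = json.dumps({"weight": weight}, separators=(",", ":"))
    return f'{field} contains ([{annotation}]"{term}")'


where = " AND ".join([
    weighted_term("title", "heads", weight=200),
    weighted_term("title", "tails"),
])
print(where)
```

Leaving the weight out for a term keeps the clause unannotated, so only terms that deviate from the default need extra syntax.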

Weight is used in fieldMatch(name).weight, which can be multiplied with fieldMatch(name) to yield a weighted score for the field, and in fieldMatch(name).weightedOccurrence to get an occurrence score which is higher if the higher weighted terms occur most. Configure static field weights in the search definition.

Significance

How rare a particular term is in the corpus or the language. This can be valuable information: if a document matches a rare word, it might mean the document is more important than one which matches a common word. Significance is calculated automatically by Vespa during indexing, but can also be overridden by setting the significance values on the query terms in a Searcher component. Significance is accessible in fieldMatch(name).significance, which can be used the same way as weight. Weight and significance are also averaged into fieldMatch(name).importance for convenience.
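To make the idea of summing importance-weighted per-field scores concrete, here is a toy model in Python. It is a simplification under stated assumptions, not Vespa's implementation: the numbers are invented, and weight and significance are assumed to be normalized shares that each sum to 1 across the fields, so that the summed score stays normalized.

```python
def importance(weight, significance):
    """Average of weight and significance, as the text describes for importance."""
    return (weight + significance) / 2


# Hypothetical per-field values; "match" stands in for any normalized per-field score.
fields = {
    "title": {"weight": 0.6, "significance": 0.8, "match": 0.9},
    "body": {"weight": 0.4, "significance": 0.2, "match": 0.5},
}

# Sum, over all fields, the product of importance and the per-field score.
score = sum(
    importance(f["weight"], f["significance"]) * f["match"]
    for f in fields.values()
)
print(round(score, 2))
```

In this model the title dominates because it carries both the heavier and the rarer share of the query, which is the effect the weight and significance features are meant to capture.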

Connectedness

The degree of connection between adjacent terms in the query. For example, the query new york newspaper should have a higher connectedness between the terms "new" and "york" than between "york" and "newspaper", so that documents rank higher if they contain "new york" as a phrase. Term connectedness is taken into account by fieldMatch(name).proximity, which is also an important contribution to fieldMatch(name). Connectedness is a normalized value which is 0.1 by default. There is no query syntax for it: it must be set by a custom Searcher that looks up connectedness information from an external source.
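The lookup step such a Searcher would perform can be sketched in Python. This is not the Vespa Searcher API (Searchers are container components); it only models the data involved. The QueryTerm class, the lookup table and the value 0.9 are hypothetical. The default of 0.1 is the one stated above.

```python
from dataclasses import dataclass

DEFAULT_CONNECTEDNESS = 0.1  # default value stated in the text


@dataclass
class QueryTerm:
    text: str
    # Connectedness to the preceding query term (hypothetical field name).
    connectedness: float = DEFAULT_CONNECTEDNESS


# Hypothetical lookup table: pairs of adjacent terms known to form a phrase.
PHRASE_CONNECTEDNESS = {("new", "york"): 0.9}


def annotate(words):
    """Attach a connectedness value to each term based on the term before it."""
    terms = []
    for i, word in enumerate(words):
        previous = words[i - 1] if i > 0 else None
        value = PHRASE_CONNECTEDNESS.get((previous, word), DEFAULT_CONNECTEDNESS)
        terms.append(QueryTerm(word, value))
    return terms


terms = annotate(["new", "york", "newspaper"])
for term in terms:
    print(term.text, term.connectedness)
```

Only "york" receives a raised value here, reflecting its strong connection to "new". The other terms keep the default.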