The bm25 rank feature implements the Okapi BM25 ranking function used to estimate the relevance of a text document given a search query. It is a pure text ranking feature that operates on an indexed string field. The feature is cheap to compute, about 3-4 times faster than nativeRank, while still providing good ranking quality. It is a good candidate for a first-phase ranking function when ranking text documents.
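The bm25 feature calculates a score for how well a query Q with terms q_i matches an indexed string field t in a document D. The score is calculated as follows:

$$\text{bm25}(t) = \sum_{q_i \in Q} \text{IDF}(q_i) \cdot \frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{\text{fieldLen}}{\text{avgFieldLen}}\right)}$$

where:
f(q_i, D): The number of occurrences of query term q_i in field t of document D.
fieldLen: The number of terms in field t of document D.
avgFieldLen: The average length of field t among the documents on the content node.
k_1: A parameter limiting how much a single query term can affect the score (default 1.2).
b: A parameter controlling how strongly the score is normalized by field length (default 0.75).
IDF(q_i): The inverse document frequency (IDF) of query term q_i in field t. This is calculated as:

$$\text{IDF}(q_i) = \log\left(1 + \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}\right)$$

where N is the total number of documents on the content node and n(q_i) is the number of those documents whose field t contains q_i.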
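To make the computation concrete, here is a minimal Python sketch of Okapi BM25 for a single field. It assumes the standard formulation: k1 = 1.2, b = 0.75, a natural logarithm, and IDF(q_i) = log(1 + (N - n(q_i) + 0.5) / (n(q_i) + 0.5)). It illustrates the math only and is not Vespa's implementation. The names `idf` and `bm25` are hypothetical.

```python
import math
from collections import Counter

K1 = 1.2   # assumed default for k1
B = 0.75   # assumed default for b


def idf(num_docs: int, docs_with_term: int) -> float:
    """IDF(q_i) = log(1 + (N - n(q_i) + 0.5) / (n(q_i) + 0.5)), natural log assumed."""
    return math.log(1 + (num_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))


def bm25(query_terms, field_terms, num_docs, doc_freq, avg_field_len, k1=K1, b=B):
    """Score field t of one document D against the query terms.

    field_terms:   tokens of field t in document D
    num_docs:      N, number of documents on the (hypothetical) content node
    doc_freq:      term -> n(q_i), documents whose field t contains the term
    avg_field_len: average length of field t across those documents
    """
    tf = Counter(field_terms)          # f(q_i, D) for every token in the field
    field_len = len(field_terms)
    # Length normalization is the same for every term in this field.
    norm = k1 * (1 - b + b * field_len / avg_field_len)
    score = 0.0
    for term in query_terms:
        f = tf[term]
        if f == 0:
            continue                   # term absent from the field: contributes 0
        score += idf(num_docs, doc_freq.get(term, 0)) * f * (k1 + 1) / (f + norm)
    return score
```

When the field length equals the average and the term occurs once, the fraction reduces to 1, so the term contributes exactly its IDF. Rare terms (small n(q_i)) get a larger IDF than common ones. Repeated occurrences saturate because f appears in both the numerator and the denominator.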
Because the IDF is calculated per content node and index, the same query term can get slightly different IDF values on different content nodes. To use the same IDF across all content nodes, set it as the significance of each query term using query term annotations.
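This hypothetical Python sketch shows why node-local IDF values differ. It assumes the same IDF formula as the bm25 definition. Two content nodes hold the same number of documents, but the term is spread unevenly between them, so each node computes its own IDF. An IDF computed from the summed, corpus-wide counts gives one shared value. Treating such a value as the per-term significance is an assumption; check the query annotation documentation for the expected scale.

```python
import math


def idf(num_docs: int, docs_with_term: int) -> float:
    """IDF = log(1 + (N - n + 0.5) / (n + 0.5)), natural log assumed."""
    return math.log(1 + (num_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))


# Hypothetical statistics for one term on each content node: (N, n(q_i)).
node_stats = {"node0": (1000, 10), "node1": (1000, 40)}

# IDF as each node would compute it from its own documents only.
local_idf = {node: idf(n_docs, n_hits) for node, (n_docs, n_hits) in node_stats.items()}

# IDF computed from corpus-wide counts: one value usable on every node.
total_docs = sum(n_docs for n_docs, _ in node_stats.values())
total_hits = sum(n_hits for _, n_hits in node_stats.values())
global_idf = idf(total_docs, total_hits)

for node, value in local_idf.items():
    print(f"{node}: local IDF {value:.3f}")
print(f"shared IDF {global_idf:.3f}")
```

The rarer the term on a node, the higher that node's IDF, so identical documents can score differently depending on where they are stored.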
The following example has an indexed string field, content, and a rank profile that uses the bm25 rank feature. Note that the field must be enabled for use with the bm25 feature by setting the enable-bm25 flag in the index section of the field definition.
schema example {
    document example {
        field content type string {
            indexing: index | summary
            index: enable-bm25
        }
    }
    rank-profile default {
        first-phase {
            expression {
                bm25(content)
            }
        }
    }
}
If the enable-bm25 flag is turned on after documents are already fed, some extra steps must be executed to prepare the posting lists in the memory and disk indexes for this field. For each content node do the following:
vespa-proton-cmd --local triggerFlush
vespa-proton-cmd --local triggerFlush