Phased Ranking

Rank scores in general become more accurate by using complex expressions which use many features, complex features or large tensors. Even though Vespa is optimized for such calculations, complex expressions become expensive when calculated for each selected document. For this reason Vespa can be configured to run multiple phases of ranking expressions - a smaller and less accurate one on all matches as they are found (first-phase ranking) and a more expensive and accurate one only on the best documents (second-phase ranking). This provides a better CPU budget distribution by dedicating more resources to the best candidates as calculated per ranking phase.

Read the query API guide to get an overview of how queries are executed in Vespa, before continuing this guide.

Vespa supports multiple rank phases:

  • Re-ranking of the top hits from the content nodes using a custom searcher component
  • Re-ranking at each content node using two phase ranking configured by rank profiles with ranking expressions working on scalars, tensors or matching features
  • Retrieval and Ranking with query operators like WAND/Weak And which speeds up retrieval of top matches of a disjunction query (OR) or Nearest Neighbor Search.

Container top matches re-ranking

It's possible to write custom searcher plugin which can re-rank hits using any custom scoring function. The searcher can also blend and re-order hits from multiple sources when using federation of content sources.

public class ReRankerSearcher extends Searcher {
    
    @Override
    public Result search(Query query, Execution execution) {
        // Pass on to the next searcher to execute the query
        Result result = execution.search(query);
        // Fill the result including summary-features 
        execution.fill(result); 
        for(Hit h: result.hits()) {
            h.getRelevance().setScore(myReranker(h));
        }
        return result;
    }

    double myReranker(Hit h) {
        double newScore = 0.1d;
        //Do the magic
        return newScore;
    }
}
The searcher might request rank features calculated by the content nodes to be returned along with the hit fields using summary-features. The features returned can be configured in the rank-profile as summary-features. As the example above is executed in a single thread iterating over the top hits, the number of hits and complexity of the re-ranking need to be adjusted per the given latency budget. The number of hits is limited by the query api hits parameter and maxHits setting. The hits available for container level re-ranking is the global top ranking hits after content nodes have retrieved and ranked the hits and global top ranking hits have been found by merging the responses from the content nodes.

Two-phase ranking content nodes

Rank expressions are configured in the rank-profile section of schema. The rank-profile supports having two phases of ranking; first-phase and second-phase:

By default, second-phase ranking (if specified) is run on the 100 best hits from the first-phase ranking per content node, after matching and before information is returned to the container. The number of hits to rerank can be configured as well in the rank-profile. Example:

schema myapp {
    …
    rank-profile title-freshness inherits default {
        first-phase {
            expression {
                nativeRank(title) + freshness(timestamp)
            }
        }
        second-phase {
            expression {
                xgboost("my-model.json")
            }
            rerank-count: 50
        }
    }
}
In this example, the first phase uses the text matching feature nativeRank scoped to the title field plus one of the built-in document ranking features named freshness over a timestamp field which stores the epoch time in second resolution. For each content node, the top 50 hits from the first phase function is re-ranked using a trained xgboost model.

Top-K Query Operators

If the first-phase ranking function can be approximated as a simple linear function, and the query mode is any, the Weak And/WAND implementations in Vespa allows avoiding fully evaluating all the documents matching the query with the first-phase function. Instead, only the top K hits using the internal operator ranking will be exposed to the first-phase ranking function.

Both wand implementations accepts a targeted number of hits (top N). Each content node will expose all hits that were evaluated to the first-phase ranking function, while skipped documents will not. The skipped set are the ones that matches the query, but which cannot outperform any of the already collected hits on the internal scoring heap. Due to not fully evaluating all documents matching the query, the total hit count becomes inaccurate when using weakAnd/Wand.

The nearest neighbor search operator is also a top k retrieval operator which does ranking (by distance) and retrieval in one process and exposes only a small subset of the matching documents to the first-phase ranking function.

Choosing phased ranking functions

A good ranking expression will for most applications consume too much CPU to be runnable on the entire result set within the latency budget/SLA. The application rank function should hence in most cases be a second phase function. The task then becomes to find a first phase function, which correlates sufficiently well with the second phase function - this to ensure that relevance is not hurt too much by not evaluating the second phase function on all the hits.

The impact of using a cheaper function in the first phase can be assessed by deploying the second phase function as the first phase function in a test rank profile, and comparing the results to the production rank profile. During such evaluation, the timeout should be adjusted for the increased resource usage with using the second order function over all matching candidates.

With a dataset with query and document judgements one can evaluate the first-phase ranking function by it's effectiveness to retrieve relevant documents at a given position (e.g Recall@100.