Dot Product Search Operator

DotProductItem is a query item available in the search container. It matches documents containing a subset of the weighted tokens in the query item and, at the same time, calculates the sparse dot product between the token weights in the query item and those in each matching document. This is the query operator for which Parallel Wand is an optimization.
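
To make the scoring concrete, here is a minimal plain-Java sketch of a sparse dot product over two token-to-weight maps. It does not use the Vespa API; the class and method names are hypothetical, and the matching rule (at least one shared token) is an assumption based on the description above.

```java
import java.util.Map;

/** Illustrative only: a sparse dot product over two token-weight maps. */
final class SparseDotProduct {

    /**
     * Sums queryWeight * documentWeight for every token present in both maps.
     * Tokens that occur on only one side contribute nothing to the score.
     */
    static long dotProduct(Map<String, Integer> queryTokens, Map<String, Integer> documentTokens) {
        long score = 0;
        for (Map.Entry<String, Integer> entry : queryTokens.entrySet()) {
            Integer documentWeight = documentTokens.get(entry.getKey());
            if (documentWeight != null) {
                score += (long) entry.getValue() * documentWeight;
            }
        }
        return score;
    }

    /** Assumption: a document matches when it shares at least one token with the query. */
    static boolean matches(Map<String, Integer> queryTokens, Map<String, Integer> documentTokens) {
        for (String token : queryTokens.keySet()) {
            if (documentTokens.containsKey(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Integer> query = Map.of("sports", 3, "music", 1);
        Map<String, Integer> document = Map.of("sports", 2, "cooking", 5);
        System.out.println(matches(query, document));    // true: "sports" is shared
        System.out.println(dotProduct(query, document)); // 6: only "sports" contributes (3 * 2)
    }
}
```

Only the shared token contributes to the score, which is what makes the product sparse: the cost depends on the overlap between the two token sets, not on the full vocabulary.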

When to Use

You have a collection of weighted tokens produced by an algorithm and want to perform matching against a corpus containing weighted tokens produced by another algorithm in order to implement personalized content exploration.

Generally, the use cases for dot product are the same as for wand. The raw scores produced by dot product operators are equivalent to those produced by Parallel Wand.

The difference is that Parallel Wand will perform local optimizations in order to retrieve the top-k results that would be returned by the dot product operator. This optimization will only yield correct results if the overall ranking is equal to the score produced by the dot product operator itself.
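
The following plain-Java sketch illustrates why the final ranking matters. It is not how Parallel Wand is implemented; it only models the general effect of keeping the top-k hits by dot product score and then ranking them with a different expression. The `Hit` class, the `bonus` signal, and all numbers are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: pruning by one score and ranking by another can change the result. */
final class PruningExample {

    /** Hypothetical hit: a dot product score plus an unrelated ranking signal. */
    static final class Hit {
        final String id;
        final double dotProduct;
        final double bonus;

        Hit(String id, double dotProduct, double bonus) {
            this.id = id;
            this.dotProduct = dotProduct;
            this.bonus = bonus;
        }

        /** The overall rank expression, which differs from the dot product alone. */
        double finalRank() {
            return dotProduct + bonus;
        }
    }

    /** Evaluates the final rank for every hit and returns the id of the best one. */
    static String bestExhaustive(List<Hit> hits) {
        Hit best = hits.get(0);
        for (Hit hit : hits) {
            if (hit.finalRank() > best.finalRank()) {
                best = hit;
            }
        }
        return best.id;
    }

    /** Keeps only the k hits with the highest dot product, then ranks those by final rank. */
    static String bestAfterPruning(List<Hit> hits, int k) {
        List<Hit> byDotProduct = new ArrayList<>(hits);
        byDotProduct.sort((a, b) -> Double.compare(b.dotProduct, a.dotProduct));
        return bestExhaustive(byDotProduct.subList(0, Math.min(k, byDotProduct.size())));
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("A", 10.0, 0.0),  // highest dot product, final rank 10
                new Hit("B", 8.0, 5.0),   // lower dot product, final rank 13
                new Hit("C", 2.0, 1.0));  // final rank 3
        System.out.println(bestExhaustive(hits));      // B
        System.out.println(bestAfterPruning(hits, 1)); // A: B was pruned before ranking
    }
}
```

When the final rank is the dot product score itself, the two approaches agree, which is the condition stated above.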

It might make sense to start out with dot product and switch to wand later if the performance gain outweighs the reduction in flexibility and correctness. Benchmark before making such a switch to quantify the change in performance, which might turn out to be negative.

Here follows a list of cases where dot product might be preferable to wand ("better" here means more correct):

  • Might be more efficient with few tokens or few total results.
  • Works better with arbitrary rank expressions and compound queries.
  • Works better with grouping.
  • Scales better when partitioning the problem space.

How to Use

DotProductItem is an advanced feature. First, you need a corpus with weighted tokens to match against. Next, you may need a custom searcher to prepare the query. Finally, it is all tied together by using the score produced by the dot product operator in a ranking expression.

Prepare the Corpus

Use a weighted set field in the document to store the tokens. Use an attribute vector for best performance:

field features type weightedset<string> {
    indexing: summary | attribute
    attribute: fast-search
}

Prepare the Query

The query needs to be prepared by a custom searcher or sent using YQL. The code below shows the relevant part of a custom searcher. If you use multiple dot products in the same query, it is a good idea to label them. This makes it possible to use the individual dot product scores when ranking results later.

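import com.yahoo.prelude.query.DotProductItem;
import com.yahoo.prelude.query.Item;
import java.util.Map;

// Builds a labeled dot product item over the given field from token/weight pairs.
Item makeDotProduct(String label, String field, Map<String, Integer> tokens) {
    DotProductItem item = new DotProductItem(field);
    item.setLabel(label); // the label lets a rank expression address this item's raw score
    for (Map.Entry<String, Integer> entry : tokens.entrySet()) {
        item.addToken(entry.getKey(), entry.getValue());
    }
    return item;
}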
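
For the YQL route mentioned above, a client can assemble the query string itself. The sketch below is plain Java with a hypothetical helper class; it assumes the YQL form `dotProduct(field, {"token":weight, ...})`, so check the YQL reference for your Vespa version before relying on it. Labels are left out of this sketch.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

/** Illustrative only: builds a YQL fragment for a dot product over a weighted set field. */
final class DotProductYql {

    /** Returns a fragment such as dotProduct(features, {"music":1,"sports":3}). */
    static String dotProduct(String field, Map<String, Integer> tokens) {
        StringJoiner weights = new StringJoiner(",", "{", "}");
        // Copying into a TreeMap gives a stable token order, so the output is reproducible.
        for (Map.Entry<String, Integer> entry : new TreeMap<>(tokens).entrySet()) {
            weights.add("\"" + escape(entry.getKey()) + "\":" + entry.getValue());
        }
        return "dotProduct(" + field + ", " + weights + ")";
    }

    /** Escapes backslashes and double quotes so a token cannot break out of its string literal. */
    static String escape(String token) {
        return token.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        String where = dotProduct("features", Map.of("sports", 3, "music", 1));
        System.out.println("select * from sources * where " + where);
    }
}
```

Escaping the tokens matters when they come from user input or an upstream model, since an unescaped quote would otherwise change the structure of the query.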

Ranking

The dot product operator produces raw scores that may be used in a ranking expression. The simplest approach is to use the sum of all raw scores for the field containing the tokens:

rank-profile default {
    first-phase {
        expression: rawScore(features)
    }
}

For better control, label each dot product in the query and use their scores individually:

rank-profile default {
    first-phase {
        expression: itemRawScore(dp1) + itemRawScore(dp2)
    }
}