Advanced Search Operators

This document describes the characteristics of, and differences between, Vespa's advanced search operators: Parallel Wand, Dot Product, Weighted Set, and Vespa Wand.

A basic description is given for each operator in addition to the following characteristics:

  • Field type: Which field types this operator supports.
  • Query model: What data is passed down with the query when using this operator.
  • Matching: The criteria for a document to be a match.
  • Ranking: What ranking this operator uses internally (if any) and what is exposed to the ranking framework.
  • YQL operator: How this operator is used in YQL.
  • Java Query Item: How this operator is used in the Java Query API.

Parallel Wand

Parallel Wand is a search operator that can be used for efficient top-k retrieval. It implements the "Weak AND"/"Weighted AND" algorithm as described by Broder et al. in "Efficient query evaluation using a two-level retrieval process", and it scales adaptively from OR to AND.

Parallel Wand can be used to search for documents where the weighted tokens in a field match a subset of the weighted tokens in the query. At the same time, it internally calculates the dot product between the token weights in the query and in the field. Parallel Wand is guaranteed to return the top-k hits according to its internal dot product rank score. It is also optimized for performance when using multiple threads per search in the backend. See the Wand Search Operators reference for more information on how to use this operator.

  • Field type: Weighted set attribute with fast-search. (Note: Also supported for regular attribute or index fields, but with much weaker performance.)
  • Query model: Weighted set with {token, weight} pairs.
  • Matching: Documents where the weighted set field contains at least one of the tokens in the query and where the internal dot product score for the document is larger than the worst score among the current top-k best hits. This means that typically more than top-k documents are matched and returned for ranking. It also means that many documents are skipped even though they match several of the tokens in the query, because their dot product score is too low. This skipping is what makes Parallel Wand faster than the Dot Product Operator in some cases.
  • Ranking: Dot product score between the weights of the matched query tokens and field tokens. This score is available using the rawScore or itemRawScore rank features. Note that the top-k best hits are only guaranteed to be returned when this internal score is used as the final ranking expression.
  • YQL operator: wand()
  • Java Query Item: WandItem
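
The matching rule above can be illustrated with a small, self-contained Python sketch. It is not Vespa's implementation: it scores every document in a single thread, whereas the real operator uses the threshold to skip documents without fully scoring them. The function names, the dict-based document layout and the sample data are all hypothetical.

```python
import heapq

def dot_product(query, field):
    # Sum of query weight * field weight over the tokens present in both.
    return sum(w * field[t] for t, w in query.items() if t in field)

def parallel_wand_sketch(query, docs, target_num_hits, score_threshold=0.0):
    """Illustrative only: returns (matched doc ids, top-k as (score, id))."""
    if target_num_hits < 1:
        raise ValueError("target_num_hits must be at least 1")
    heap = []                     # min-heap holding the current best hits
    matched = []                  # every document that counted as a match
    threshold = score_threshold   # rises once the heap is full
    for doc_id, field in docs.items():
        if not (query.keys() & field.keys()):
            continue              # no query token in the field
        score = dot_product(query, field)
        if score <= threshold:
            continue              # not better than the worst current hit
        matched.append(doc_id)
        heapq.heappush(heap, (score, doc_id))
        if len(heap) > target_num_hits:
            heapq.heappop(heap)   # drop the worst hit
        if len(heap) == target_num_hits:
            threshold = heap[0][0]
    return matched, sorted(heap, reverse=True)

# Hypothetical sample data: query and per-document weighted sets.
QUERY = {"a": 2, "b": 1}
DOCS = {
    "d1": {"a": 1},
    "d2": {"a": 3, "b": 1},
    "d3": {"b": 1},
    "d4": {"a": 5},
    "d5": {"c": 9},
    "d6": {"a": 1, "b": 1},
}

matched, top_k = parallel_wand_sketch(QUERY, DOCS, target_num_hits=2)
```

With this data, three documents (d1, d2, d4) are matched although only two hits were requested, and d6 is skipped even though it contains both query tokens, because its score of 3 does not beat the threshold of 7 that applies when it is evaluated.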

Tuning target number of hits

When using Parallel Wand via YQL or a Java Searcher plugin, you can specify a target for the minimum number of hits the operator should produce. As a starting point, set targetNumHits equal to the number of hits you are going to display to the user. If second-phase ranking with rerank-count is used, do not set targetNumHits lower than the rank profile's configured rerank-count. Note that if you combine Parallel Wand with a ranking expression that does not use its raw score, the tuning of targetNumHits should follow the guidelines given for Vespa Wand.
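
As a minimal sketch of the rule of thumb above, the starting value can be computed as follows. The helper name and its parameters are hypothetical and not part of any Vespa API; it only applies when the final ranking uses the operator's raw score.

```python
def parallel_wand_target_hits(displayed_hits, rerank_count=0):
    # Start from the number of hits shown to the user, but never go below
    # the rerank-count of the rank profile (0 means no second phase).
    return max(displayed_hits, rerank_count)
```

For example, a page showing 10 hits with a rerank-count of 100 would start at 100, while the same page without second-phase ranking would start at 10.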

Tuning score threshold

You can also specify a score threshold when using Parallel Wand. The internal dot product score for a document must be larger than scoreThreshold for the document to be considered a match. The default value is 0.0.

Dot Product

The Dot Product Operator is the brute-force equivalent of Parallel Wand. Both are used to search for documents where the weighted tokens in a field match a subset of the weighted tokens in the query, and both produce the exact same dot product score. In some simple cases, the Dot Product Operator is preferable to Parallel Wand. See the Dot Product Search Operator reference for more information on these use cases.

  • Field type: Weighted set attribute with fast-search. (Note: Also supported for regular attribute or index fields, but with much weaker performance.)
  • Query model: Weighted set with {token, weight} pairs.
  • Matching: Documents where the weighted set field contains at least one of the tokens in the query.
  • Ranking: Dot product score between the weights of the matched query tokens and field tokens. This score is available using the rawScore or itemRawScore rank features.
  • YQL operator: dotProduct()
  • Java Query Item: DotProductItem
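
The brute-force behaviour can be sketched in a few lines of Python. This is an illustration of the matching and scoring semantics described above, not Vespa code; the function name, the dict-based layout and the sample data are hypothetical.

```python
def dot_product_search(query, docs):
    """Illustrative only: score every document sharing a token with the query."""
    hits = []
    for doc_id, field in docs.items():
        shared = query.keys() & field.keys()
        if not shared:
            continue  # matching requires at least one common token
        score = sum(query[t] * field[t] for t in shared)
        hits.append((doc_id, score))
    # Best score first.
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits

# Hypothetical sample data: query and per-document weighted sets.
QUERY = {"a": 2, "b": 1}
DOCS = {
    "d1": {"a": 1},
    "d2": {"a": 3, "b": 1},
    "d3": {"b": 1},
    "d4": {"a": 5},
    "d5": {"c": 9},
    "d6": {"a": 1, "b": 1},
}

hits = dot_product_search(QUERY, DOCS)
```

Every document that shares a token with the query is returned (five of the six here), with no score-based skipping; only d5, which has no token in common with the query, is left out.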

Weighted Set

The Weighted Set Operator is used to search for documents by reverse matching all tokens in the searched field against the tokens of the weighted set in the query. This means that using the Weighted Set Operator to search a single-value attribute field has semantics similar to using a normal term to search a weighted set field. See the Weighted Set Search Operator reference for more information on how to use this operator.

  • Field type: Single-value or multi-value attribute or index field. (Note: Most use cases operate on a single-value field.)
  • Query model: Weighted set with {token, weight} pairs.
  • Matching: Documents where the field contains at least one of the tokens in the query.
  • Ranking: The operator acts as a single term in the backend. The query term weight is the weight assigned to the operator itself, and the match weight is the largest weight among the matching tokens from the weighted set. This operator does not produce a raw score. Because of its better ranking and performance, we recommend using the Dot Product Operator instead.
  • YQL operator: weightedSet()
  • Java Query Item: WeightedSetItem
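
The match weight rule can be sketched as follows. This is a simplified illustration, not Vespa code; the function name and the sample data are hypothetical, and the field is modelled as a plain list of tokens.

```python
def weighted_set_match_weight(query_set, field_tokens):
    """Illustrative only: None means no match, otherwise the match weight."""
    # Reverse matching: look up each field token in the query's weighted set.
    weights = [query_set[t] for t in field_tokens if t in query_set]
    # The match weight is the largest weight among the matching tokens.
    return max(weights) if weights else None

# Hypothetical query-side weighted set.
QUERY_SET = {"red": 10, "blue": 30, "green": 20}
```

A single-value field holding "blue" matches with weight 30, a multi-value field holding "green" and "red" matches with weight 20, and a field holding only "black" does not match.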

Vespa Wand

Vespa Wand is a search operator that also implements the "Weak AND"/"Weighted AND" algorithm. Unlike Parallel Wand, Vespa Wand can be used to search across several fields of various types, but it does not guarantee that the top-k best hits are returned. It can be combined with any ranking expression, but keep in mind that this expression should correlate with the operator's simple internal ranking score, which uses query term weight and inverse document frequency for matching terms.

  • Field type: Multiple fields of all types (both attribute and index).
  • Query model: Arbitrary number of query items searching across different fields.
  • Matching: Documents that match at least one of the tokens in the query and where the internal operator score for the document is larger than the worst score among the current top-k best hits. As with Parallel Wand, this means that typically more than top-k documents are matched and many documents are skipped.
  • Ranking: An internal ranking score based on query term weight and inverse document frequency for matching terms is used to find the top-k hits. This score is currently not available to the ranking framework. Matching terms are exposed to the ranking framework (the same as when using AND or OR), so an arbitrary ranking expression can be used in combination with this operator. Note that the ranking expression used should correlate with this internal ranking score. nativeFieldMatch and nativeDotProduct are good starting points.
  • YQL operator: weakAnd()
  • Java Query Item: WeakAndItem

Tuning target number of hits

When using Vespa Wand via YQL or a Java Searcher plugin, you can specify a target for the minimum number of hits the operator should produce. The effect of tuning targetNumHits may not be intuitive. To get the best possible hits with Vespa Wand, set the target number somewhat higher than the number of hits displayed to the user; setting it 10 times higher should be more than enough. The reason for increasing the target number is that Vespa Wand uses a very simple ranking function internally to filter away bad hits. If your normal rank expression correlates poorly with this internal filtering formula, you need to increase the target number. The easiest approach is probably to use the default number (100) and see whether that gives good results, tuning only if you see a problem. Also note that if your rank expression correlates poorly with the filtering criteria, Vespa Wand may not be the appropriate operator.
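
Why a poorly correlated rank expression calls for a higher target can be seen in a toy model. The sketch below is a deliberate simplification and not how Vespa works internally: it assumes the operator keeps exactly the target number of documents with the best internal score, and all names, scores and document ids are made up for illustration.

```python
def recall_of_best_hits(internal, final, displayed_hits, target_num_hits):
    """Illustrative only: fraction of the truly best hits that survive."""
    # Toy assumption: only the target_num_hits best documents by the
    # internal filtering score are kept for ranking.
    kept = set(sorted(internal, key=internal.get, reverse=True)[:target_num_hits])
    # The hits the user should see according to the final rank expression.
    best = sorted(final, key=final.get, reverse=True)[:displayed_hits]
    return sum(1 for doc_id in best if doc_id in kept) / displayed_hits

# Hypothetical scores: the final rank correlates poorly with the internal one.
INTERNAL = {"d1": 9, "d2": 8, "d3": 7, "d4": 6, "d5": 5, "d6": 4}
FINAL = {"d1": 0.2, "d2": 0.9, "d3": 0.1, "d4": 0.8, "d5": 0.3, "d6": 0.7}

low = recall_of_best_hits(INTERNAL, FINAL, displayed_hits=2, target_num_hits=2)
high = recall_of_best_hits(INTERNAL, FINAL, displayed_hits=2, target_num_hits=4)
```

In this toy data, a target equal to the two displayed hits keeps only one of the two best documents (recall 0.5), while doubling the target keeps both (recall 1.0).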

Selecting a rank function

Anything similar to classic vector model ranking will work well with Vespa Wand. We suggest using the nativeFieldMatch or nativeDotProduct feature as a starting point. Note that Vespa Wand relies on feedback identifying which hits are used for first-phase ranking to increase its threshold for what is considered a good hit. As a result, the special unranked rank profile (which turns off ranking completely) may cause Vespa Wand queries to become slower than they are with a normal rank profile.