This guide is a practical introduction to Vespa’s nearest neighbor search query operator and how to combine nearest
neighbor search with other Vespa query operators. The guide also covers
diverse, efficient candidate retrievers which can be used in a multi-phase ranking funnel.
The guide uses the Last.fm tracks dataset for illustration.
Latency numbers mentioned in the guide are obtained from running this guide on a MacBook Pro x86.
See also the generic Vespa performance - a practical guide.
The guide requires curl to download the dataset and zstd to decompress the published embedding data.
Installing vespa-cli
This tutorial uses Vespa-CLI,
the official command-line client for Vespa.ai.
It is a single binary without any runtime dependencies and is available for Linux, macOS and Windows:
$ brew install vespa-cli
Dataset
This guide uses the Last.fm tracks dataset.
Note that the dataset is released under the following terms:
Research only, strictly non-commercial. For details, or if you are unsure, please contact Last.fm.
Also, Last.fm has the right to advertise and refer to any work derived from the dataset.
To download the dataset directly (About 120 MB zip file):
This python script can be used to traverse
the dataset files and create a JSONL formatted feed file with Vespa put operations.
The schema used with this feed format is introduced in the next section.
The number of unique tags is used as a proxy for the popularity of the track.
import os
import sys
import json
import unicodedata

directory = sys.argv[1]
seen_tracks = set()

def remove_control_characters(s):
    return "".join(ch for ch in s if unicodedata.category(ch)[0]!="C")

def process_file(filename):
    global seen_tracks
    with open(filename) as fp:
        doc = json.load(fp)
        title = doc['title']
        artist = doc['artist']
        hash = title + artist
        if hash in seen_tracks:
            return
        else:
            seen_tracks.add(hash)

        track_id = doc['track_id']
        tags = doc['tags']
        tags_dict = dict()
        for t in tags:
            k,v = t[0],int(t[1])
            tags_dict[k] = v
        n = len(tags_dict)

        vespa_doc = {
            "put": "id:music:track::%s" % track_id,
            "fields": {
                "title": remove_control_characters(title),
                "track_id": track_id,
                "artist": remove_control_characters(artist),
                "tags": tags_dict,
                "popularity": n
            }
        }
        print(json.dumps(vespa_doc))

sorted_files = []
for root, dirs, files in os.walk(directory):
    for filename in files:
        filename = os.path.join(root, filename)
        sorted_files.append(filename)
sorted_files.sort()
for filename in sorted_files:
    process_file(filename)
A Vespa application package is the set
of configuration files and Java plugins that together define the behavior of a Vespa system:
what functionality to use, the available document types, how ranking will be done,
and how data will be processed during feeding and indexing.
The minimum required files to create the basic search application are track.sd and services.xml.
Create directories for the configuration files:
A schema is a configuration of a document type and what we should compute over it.
For this application we define a document type called track.
Write the following to app/schemas/track.sd:
This document schema is explained in the practical search performance guide;
the additions are the embedding field and the various closeness rank profiles. Note
that the closeness rank profile defines the query tensor inputs, which need to be declared to be
used with Vespa’s nearestNeighbor query operator.
field embedding type tensor<float>(x[384]) {
    indexing: attribute | index
    attribute {
        distance-metric: euclidean
    }
    index {
        hnsw {
            max-links-per-node: 16
            neighbors-to-explore-at-insert: 50
        }
    }
}
The services.xml defines the services that make up
the Vespa application — which services to run and how many nodes per service.
Write the following to app/services.xml:
The following sections use the Vespa query api and
formulate queries using Vespa query language. The examples use the
vespa-cli command which supports running queries.
The CLI uses the Vespa query api.
Use vespa query -v to see the curl equivalent:
$ vespa query -v 'yql=select ..'
The first example is searching and ranking using the bm25 rank profile defined in the schema.
It uses the bm25 rank feature as the first-phase relevance score:
$ vespa query \
'yql=select artist, title, track_id from track where userQuery()' \
'query=total eclipse of the heart' \
'hits=1' \
'type=all' \
'ranking=bm25'
The above query example searches for total AND eclipse AND of AND the AND heart
in the default fieldset, which in the schema includes the title and artist fields.
The result
for the above query will look something like this:
{"timing":{"querytime":0.007,"summaryfetchtime":0.002,"searchtime":0.01},"root":{"id":"toplevel","relevance":1.0,"fields":{"totalCount":1},"coverage":{"coverage":100,"documents":95666,"full":true,"nodes":1,"results":1,"resultsFull":1},"children":[{"id":"index:tracks/0/f13697952a0d5eaeb2c43ffc","relevance":22.590392521579684,"source":"tracks","fields":{"track_id":"TRKLIXH128F42766B6","title":"Total Eclipse Of The Heart","artist":"Bonnie Tyler"}}]}}
This query only matched one document because the query terms were ANDed.
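The same request can also be sent as a plain HTTP GET to the query api. The sketch below only builds the URL in Python and does not send it. It assumes a local deployment answering on http://localhost:8080/search/, and build_query_url is a hypothetical helper, not part of any Vespa client.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: a local Vespa container serving the query api on this endpoint.
ENDPOINT = "http://localhost:8080/search/"


def build_query_url(endpoint, **params):
    """Return a GET URL carrying the query api parameters, URL-encoded."""
    return endpoint + "?" + urlencode(params)


url = build_query_url(
    ENDPOINT,
    yql="select artist, title, track_id from track where userQuery()",
    query="total eclipse of the heart",
    hits=1,
    type="all",
    ranking="bm25",
)
print(url)

# Round-trip check: decoding the query string gives back the parameters.
decoded = parse_qs(urlsplit(url).query)
print(decoded["ranking"])
```

Each quoted argument given to vespa query corresponds to one key and value pair in the query string.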
We can change matching to use type=any instead of the default type=all. See
supported query types.
$ vespa query \
'yql=select artist, title, track_id from track where userQuery()' \
'query=total eclipse of the heart' \
'hits=1' \
'ranking=bm25' \
'type=any'
Now, the query matches 24,053 documents and is considerably slower than the previous query.
Comparing the querytime of these two examples, the query that matches the most documents has the highest querytime.
In the worst case, the search query matches all documents.
Query matching performance is greatly impacted by the number of documents that match the query specification.
Queries using type=any require more compute resources than queries using type=all.
There is an optimization available for type=any queries, using
the weakAnd query operator which implements the WAND algorithm.
See the using wand with Vespa guide for more details.
Run the same query, but instead of type=any use type=weakAnd,
see supported query types:
$ vespa query \
'yql=select artist, title, track_id from track where userQuery()' \
'query=total eclipse of the heart' \
'hits=1' \
'ranking=bm25' \
'type=weakAnd'
Compared to the type=any query which fully ranked 24,053 documents,
this query only exposes about 3,600 documents to the first-phase ranking expression.
Also notice that the faster search returns the same document at the first position.
$ vespa query \
'yql=select artist, title, track_id from track where userQuery()' \
'query="total eclipse of the heart"' \
'hits=1' \
'ranking=bm25' \
'type=weakAnd'
In this case, the query input “total eclipse of the heart” is parsed as a phrase query, and the
search only finds 1 document matching the exact phrase.
Maximum Inner Product Search using Vespa WAND
The previous section introduced the weakAnd query operator which integrates
with linguistic processing and string matching using match: text.
The following examples use the
wand() query operator.
The wand query operator calculates the maximum inner product
between the sparse query and document feature integer
weights. The inner product ranking score calculated by the wand query operator
can be used in a ranking expression by the rawScore(name)
rank feature.
This query searches the track document type using a learned sparse userProfile representation,
performing a maximum inner product search over the tags weightedset field.
$ vespa query \
'yql=select track_id, title, artist from track where {targetHits:10}wand(tags, @userProfile)' \
'userProfile={"pop":1, "love songs":1,"romantic":10, "80s":20 }' \
'hits=2' \
'ranking=tags'
The query asks for 2 hits to be returned, and uses the tags rank profile.
The result
for the above query will look something like this:
{"timing":{"querytime":0.051000000000000004,"summaryfetchtime":0.004,"searchtime":0.057},"root":{"id":"toplevel","relevance":1.0,"fields":{"totalCount":66},"coverage":{"coverage":100,"documents":95666,"full":true,"nodes":1,"results":1,"resultsFull":1},"children":[{"id":"index:tracks/0/57037bdeb9caadebd8c235e1","relevance":2500.0,"source":"tracks","fields":{"track_id":"TRMIBBE128E078B487","title":"The Rose ***","artist":"Bonnie Tyler"}},{"id":"index:tracks/0/8eb2e19ee627b054113ba4c9","relevance":2344.0,"source":"tracks","fields":{"track_id":"TRKDRVK128F421815B","title":"Nothing's Gonna Change My Love For You","artist":"Glenn Medeiros"}}]}}
The wand query operator exposed a total of about 60 documents to the first-phase ranking which
uses the rawScore(tags) rank-feature directly, so the relevance is the
result of the sparse dot product between the sparse user profile and the document tags.
The wand query operator is safe, meaning it returns the same top-k results as
the brute-force dotProduct query operator. wand is a type of query operator which
interleaves matching and ranking, skipping documents
which cannot compete for a place in the final top-k results.
See the using wand with Vespa guide for more details on
using wand and weakAnd query operators.
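To make the scoring concrete, the following sketch computes the sparse inner product between a user profile and document tags, and finds the top-k by scoring every document. This is the brute-force baseline whose result wand is said to reproduce, not the wand algorithm itself. The documents and their tag weights are made up for illustration, and the function names are hypothetical.

```python
import heapq


def dot_product(user_profile, tags):
    """Sparse inner product over the keys shared by both maps."""
    return sum(weight * tags[tag] for tag, weight in user_profile.items() if tag in tags)


def brute_force_top_k(user_profile, docs, k):
    """Score every document and keep the k best; wand avoids scoring all of them."""
    scored = ((dot_product(user_profile, tags), doc_id) for doc_id, tags in docs.items())
    return heapq.nlargest(k, (pair for pair in scored if pair[0] > 0))


user_profile = {"pop": 1, "love songs": 1, "romantic": 10, "80s": 20}

# Hypothetical documents; the tag weights are invented for this example.
docs = {
    "track-a": {"80s": 100, "romantic": 50},
    "track-b": {"pop": 80, "rock": 100},
    "track-c": {"80s": 90, "love songs": 100},
}

top = brute_force_top_k(user_profile, docs, k=2)
print(top)  # [(2500, 'track-a'), (1900, 'track-c')]
```

Only tags present in both the profile and the document contribute, which is why track-b, with a single overlapping tag, scores far below the others.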
Exact nearest neighbor search
Vespa’s nearest neighbor search operator supports doing exact brute force nearest neighbor search
using dense representations. This guide uses
the sentence-transformers/all-MiniLM-L6-V2
embedding model. Download the pre-generated document embeddings and feed them to Vespa.
The feed file uses partial updates to add the vector embedding.
The following query examples use a static query vector embedding for the
query string Total Eclipse Of The Heart. The query embedding was obtained by the
following snippet using sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
print(model.encode("Total Eclipse Of The Heart").tolist())
$ vespa query \
'yql=select title, artist from track where {approximate:false,targetHits:10}nearestNeighbor(embedding,q)' \
'hits=1' \
'ranking=closeness' \
"input.query(q)=$Q"
Query breakdown:
Search for ten (targetHits:10) nearest neighbors of the query(q) query tensor over the embedding
document tensor field.
The annotation approximate:false tells Vespa to perform exact search.
The hits parameter controls how many results are returned in the response. Number of hits
requested does not impact targetHits. Notice that targetHits is per content node involved in the query.
ranking=closeness tells Vespa which rank-profile to use for scoring documents. One must
specify how to rank the targetHits documents retrieved and exposed to first-phase ranking expression
in the rank-profile.
input.query(q) points to the input query vector.
Not specifying ranking will cause
Vespa to use nativeRank which does not use the vector similarity, causing
results to be randomly sorted.
The above exact nearest neighbor search will return the following
result:
{"timing":{"querytime":0.051,"summaryfetchtime":0.001,"searchtime":0.051},"root":{"id":"toplevel","relevance":1.0,"fields":{"totalCount":118},"coverage":{"coverage":100,"documents":95666,"full":true,"nodes":1,"results":1,"resultsFull":1},"children":[{"id":"index:tracks/0/f13697952a0d5eaeb2c43ffc","relevance":0.5992897741908119,"source":"tracks","fields":{"title":"Total Eclipse Of The Heart","artist":"Bonnie Tyler"}}]}}
The exact search takes approximately 51ms, performing 95,666 distance calculations.
A total of about 120 documents were exposed to the first-phase ranking during the search as can be seen from
totalCount.
The exact search is using a scoring heap during chunked distance calculations, and documents which at some time
were put on the top-k heap are exposed to first phase ranking. Splitting the vectors into chunks
reduces the computational complexity as one can rule out distant neighbors just by comparing a few
chunks with the lowest scoring vector on the heap.
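The effect on totalCount can be illustrated with a small brute-force search that keeps the best candidates on a heap and counts every document that was placed on the heap at some point. This is a simplified sketch on random vectors: it computes full distances rather than chunked ones, and the function name and data are assumptions made for the example.

```python
import heapq
import math
import random


def exact_nearest_neighbors(query, vectors, target_hits):
    """Brute-force search keeping the target_hits closest vectors on a heap.

    Returns the hits sorted by distance and the number of documents that
    were placed on the heap at some point during the scan.
    """
    heap = []  # max-heap on distance, stored as (-distance, doc_id)
    entered = 0
    for doc_id, vec in enumerate(vectors):
        dist = math.dist(query, vec)
        if len(heap) < target_hits:
            heapq.heappush(heap, (-dist, doc_id))
            entered += 1
        elif dist < -heap[0][0]:
            # Closer than the worst candidate on the heap: replace it.
            heapq.heapreplace(heap, (-dist, doc_id))
            entered += 1
    hits = sorted((-neg_dist, doc_id) for neg_dist, doc_id in heap)
    return hits, entered


random.seed(7)
vectors = [[random.gauss(0, 1) for _ in range(8)] for _ in range(5000)]
query = [random.gauss(0, 1) for _ in range(8)]

hits, entered = exact_nearest_neighbors(query, vectors, target_hits=10)
print(len(hits), entered)
```

The count of documents that entered the heap is at least target_hits and typically larger, which mirrors why totalCount exceeds targetHits for exact search.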
It is possible to reduce search latency of the exact search by throwing more CPU resources at it.
Changing the rank-profile to closeness-t4 makes Vespa use four threads per query:
$ vespa query \
'yql=select title, artist from track where {approximate:false,targetHits:10}nearestNeighbor(embedding,q)' \
'hits=1' \
'ranking=closeness-t4' \
"input.query(q)=$Q"
This section covers using the faster, but approximate, nearest neighbor search. The
track schema’s embedding field has the index property, which means Vespa builds a
HNSW index to support fast, approximate vector search. See
Approximate Nearest Neighbor Search using HNSW Index
for an introduction to HNSW and the tuning parameters.
The default query behavior is using approximate:true when the embedding
field has index:
$ vespa query \
'yql=select title, artist from track where {targetHits:10,hnsw.exploreAdditionalHits:20}nearestNeighbor(embedding,q)' \
'hits=1' \
'ranking=closeness' \
"input.query(q)=$Q"
Which returns the following response:
{"timing":{"querytime":0.004,"summaryfetchtime":0.001,"searchtime":0.004},"root":{"id":"toplevel","relevance":1.0,"fields":{"totalCount":10},"coverage":{"coverage":100,"documents":95666,"full":true,"nodes":1,"results":1,"resultsFull":1},"children":[{"id":"index:tracks/0/f13697952a0d5eaeb2c43ffc","relevance":0.5992897837210658,"source":"tracks","fields":{"title":"Total Eclipse Of The Heart","artist":"Bonnie Tyler"}}]}}
Now, the query is significantly faster, and also uses fewer resources during the search. To get latency down
to 20 ms with the exact search one had to use 4 matching threads. In this case the
result latency is down to 4ms with a single matching thread.
For this query example, the approximate search returned the exact same top-1 hit and there was
no accuracy loss for the top-1 position.
A few key differences between exact and approximate neighbor search:
totalCount is different. When using the approximate version, Vespa exposes exactly targetHits hits to the
configurable first-phase rank expression in the chosen rank-profile.
The exact search is using a scoring heap during chunked distance calculations, and documents which at some time
were put on the top-k heap are exposed to first phase ranking.
The search is approximate and might not return the exact top 10 closest vectors as with exact search. This
is a complex tradeoff between accuracy, query performance, and memory usage.
See Billion-scale vector search with Vespa - part two
for a deep-dive into these trade-offs.
With the support for setting approximate:false|true a developer can quantify accuracy loss by comparing the
results of exact nearest neighbor search with the results of the approximate search.
By doing so, developers can quantify the recall@k or overlap@k,
and find the right balance between search performance and accuracy. Increasing hnsw.exploreAdditionalHits
improves accuracy (recall@k) at the cost of a slower query.
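A minimal sketch of the overlap@k computation follows. It assumes the document ids from an approximate:false run and an approximate:true run of the same query have already been collected into two lists; the ids below are hypothetical.

```python
def overlap_at_k(exact_ids, approximate_ids, k):
    """Fraction of the exact top-k that also appears in the approximate top-k."""
    exact_top = set(exact_ids[:k])
    approximate_top = set(approximate_ids[:k])
    if not exact_top:
        return 0.0
    return len(exact_top & approximate_top) / len(exact_top)


# Hypothetical document ids, as returned by two runs of the same query.
exact = ["a", "b", "c", "d", "e"]
approximate = ["a", "c", "b", "x", "e"]

print(overlap_at_k(exact, approximate, k=5))  # 0.8
print(overlap_at_k(exact, approximate, k=3))  # 1.0
```

Averaging this value over a representative set of queries gives a single number to track while tuning hnsw.exploreAdditionalHits.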
Combining approximate nearest neighbor search with query filters
Vespa allows the search for nearest neighbors to be constrained by regular query filters.
In this query example the title field must contain the term heart:
$ vespa query \
'yql=select title, artist from track where {targetHits:10}nearestNeighbor(embedding,q) and title contains "heart"' \
'hits=2' \
'ranking=closeness-t4' \
"input.query(q)=$Q"
Which returns the following response:
{"timing":{"querytime":0.005,"summaryfetchtime":0.001,"searchtime":0.007},"root":{"id":"toplevel","relevance":1.0,"fields":{"totalCount":55},"coverage":{"coverage":100,"documents":95666,"full":true,"nodes":1,"results":1,"resultsFull":1},"children":[{"id":"index:tracks/0/f13697952a0d5eaeb2c43ffc","relevance":0.5992897741908119,"source":"tracks","fields":{"title":"Total Eclipse Of The Heart","artist":"Bonnie Tyler"}},{"id":"index:tracks/0/cb79ca7f404071e95561ca38","relevance":0.5259774715154759,"source":"tracks","fields":{"title":"Heart Of My Heart","artist":"Quest"}}]}}
When using filtering, it is important for performance reasons that the fields that are included in the filters have
been defined with index or attribute:fast-search.
See searching attribute fields.
The optimal performance for pure filtering, where the query term(s) does not influence ranking, is achieved
using rank: filter in the schema.
field popularity type int {
indexing: summary | attribute
rank: filter
attribute: fast-search
}
Matching against the popularity field does not influence ranking and Vespa can use the most efficient posting
list representation. Note that one can still access the value of
the attribute in ranking expressions.
In the following example, since the title field does not have rank: filter one can instead
flag that the term should not be used by any ranking expression by
using the ranked query annotation.
The following disables term based ranking and
the matching against the title field can use the most efficient posting list representation.
$ vespa query \
'yql=select title, artist from track where {targetHits:10}nearestNeighbor(embedding,q) and title contains ({ranked:false}"heart")' \
'hits=2' \
'ranking=closeness-t4' \
"input.query(q)=$Q"
In the previous examples, since the rank-profile only used the closeness rank feature,
the matching would not impact the score anyway.
Vespa also allows combining the nearestNeighbor query operator with any other Vespa query operator.
$ vespa query \
'yql=select title, popularity, artist from track where {targetHits:10}nearestNeighbor(embedding,q) and popularity > 20 and artist contains "Bonnie Tyler"' \
'hits=2' \
'ranking=closeness-t4' \
"input.query(q)=$Q"
In this case, the nearest neighbor search is restricted to tracks by Bonnie Tyler with popularity > 20.
Strict filters and distant neighbors
When combining nearest neighbor search with strict filters which match less than 5 percent of the total number of documents,
Vespa will, instead of searching the HNSW graph constrained by the filter, fall back to using exact nearest neighbor search.
See Controlling filter behavior for how to adjust the threshold that determines which strategy is used.
When falling back to exact search users will observe that totalCount increases and is higher than targetHits.
As seen from previous examples, more hits are exposed to the first-phase ranking expression when using
exact search. When using exact search with filters, the search can also use multiple threads to evaluate the query, which
helps reduce the latency impact.
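The decision described above can be sketched as a simple ratio check. This is only an illustration of the 5 percent rule as stated in this guide, not Vespa's actual implementation, which has more tuning parameters. The function name, the parameter name, and the filter hit counts are assumptions.

```python
def choose_search_strategy(filter_hit_count, total_documents, approximate_threshold=0.05):
    """Pick exact search when the filter matches too small a fraction of the corpus."""
    if total_documents == 0:
        return "exact"
    hit_ratio = filter_hit_count / total_documents
    return "exact" if hit_ratio < approximate_threshold else "approximate"


TOTAL_DOCUMENTS = 95666  # corpus size reported in the responses in this guide

# Hypothetical filter hit counts, for illustration only.
print(choose_search_strategy(1000, TOTAL_DOCUMENTS))   # exact
print(choose_search_strategy(20000, TOTAL_DOCUMENTS))  # approximate
```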
With strict filters that remove many hits, the hits (nearest neighbors) might not be near in the embedding space, but far,
or distant neighbors. Technically, all document vectors are a neighbor of the query,
but with a varying distance, some are close, others are distant.
With strict filters, the neighbors that are returned might be of low quality (far distance).
One way to combat this is to use the distanceThreshold
query annotation parameter of the nearestNeighbor query operator.
The value of the distance depends on the distance-metric used.
By adding the distance(field,embedding) rank-feature to
the match-features of the closeness rank-profiles, it is possible to analyze what distance
could be considered too far.
See match-features reference.
Note that a distance of 0 is perfect, while a distance of 1 is distant. The distanceThreshold
removes hits that have a higher distance(field, embedding) than distanceThreshold. The
distanceThreshold is applied regardless of performing exact or approximate search.
The following query with a restrictive filter on popularity is used for illustration:
$ vespa query \
'yql=select matchfeatures, title, popularity, artist from track where {targetHits:10}nearestNeighbor(embedding,q) and popularity > 80' \
'hits=2' \
'ranking=closeness-t4' \
"input.query(q)=$Q"
The above query returns
{"timing":{"querytime":0.008,"summaryfetchtime":0.002,"searchtime":0.011},"root":{"id":"toplevel","relevance":1.0,"fields":{"totalCount":63},"coverage":{"coverage":100,"documents":95666,"full":true,"nodes":1,"results":1,"resultsFull":1},"children":[{"id":"index:tracks/0/f13697952a0d5eaeb2c43ffc","relevance":0.5992897875290117,"source":"tracks","fields":{"matchfeatures":{"distance(field,embedding)":0.6686418170467985},"title":"Total Eclipse Of The Heart","artist":"Bonnie Tyler","popularity":100}},{"id":"index:tracks/0/3517728cc88356c8ca6de0d9","relevance":0.5005276509131764,"source":"tracks","fields":{"matchfeatures":{"distance(field,embedding)":0.9978916213231626},"title":"Closer To The Heart","artist":"Rush","popularity":100}}]}}
By using a distanceThreshold of 0.7, the Closer To The Heart track will be removed from the result
because its distance(field, embedding) is close to 1.
$ vespa query \
'yql=select matchfeatures, title, popularity, artist from track where {distanceThreshold:0.7,targetHits:10}nearestNeighbor(embedding,q) and popularity > 80' \
'hits=2' \
'ranking=closeness-t4' \
"input.query(q)=$Q"
{"timing":{"querytime":0.008,"summaryfetchtime":0.001,"searchtime":0.011},"root":{"id":"toplevel","relevance":1.0,"fields":{"totalCount":1},"coverage":{"coverage":100,"documents":95666,"full":true,"nodes":1,"results":1,"resultsFull":1},"children":[{"id":"index:tracks/0/f13697952a0d5eaeb2c43ffc","relevance":0.5992897875290117,"source":"tracks","fields":{"matchfeatures":{"distance(field,embedding)":0.6686418170467985},"title":"Total Eclipse Of The Heart","artist":"Bonnie Tyler","popularity":100}}]}}
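The relevance and distance(field,embedding) values in the responses above are consistent with closeness being computed as 1 / (1 + distance). The sketch below checks that relation against the two hits and applies the same cut-off that distanceThreshold:0.7 expresses. The helper names are hypothetical; the distances are copied from the responses.

```python
def closeness(distance):
    """Map a distance in [0, inf) to a closeness score in (0, 1]."""
    return 1.0 / (1.0 + distance)


def passes_threshold(distance, distance_threshold):
    """A hit is kept when its distance does not exceed the threshold."""
    return distance <= distance_threshold


# distance(field,embedding) values copied from the response without a threshold.
distances = {
    "Total Eclipse Of The Heart": 0.6686418170467985,
    "Closer To The Heart": 0.9978916213231626,
}

for title, distance in distances.items():
    print(title, round(closeness(distance), 4), passes_threshold(distance, 0.7))
```

Since closeness decreases as distance grows, a distance threshold of 0.7 corresponds to requiring a closeness of at least 1 / 1.7, roughly 0.588.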
Setting appropriate distanceThreshold is best handled by supervised learning as
the distance threshold should be calibrated based on the query complexity
and possibly also the feature distributions of the returned top-k hits.
Having the distance rank feature returned as match-features,
enables post-processing of the result using a custom
re-ranking/filtering searcher.
The post-processing searcher can analyze the score distributions of the returned top-k hits
(using the features returned with match-features),
remove low scoring hits before presenting the result to the end user,
or not return any results at all.
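As a rough client-side illustration of such post-processing, the sketch below drops hits whose relevance falls well below the best hit in a parsed query response. The relative cut-off is a made-up heuristic, prune_low_scoring_hits is a hypothetical helper, and a real searcher would run as a component inside the Vespa container rather than in the client. The response is trimmed to the fields the sketch needs.

```python
def prune_low_scoring_hits(response, min_ratio=0.9):
    """Keep hits whose relevance is at least min_ratio times the best relevance."""
    hits = response.get("root", {}).get("children", [])
    if not hits:
        return []
    best = max(hit["relevance"] for hit in hits)
    return [hit for hit in hits if hit["relevance"] >= min_ratio * best]


# Trimmed copy of a query response; only the fields used here are kept.
response = {
    "root": {
        "children": [
            {"relevance": 0.5992897875290117,
             "fields": {"title": "Total Eclipse Of The Heart"}},
            {"relevance": 0.5005276509131764,
             "fields": {"title": "Closer To The Heart"}},
        ]
    }
}

kept = prune_low_scoring_hits(response)
print([hit["fields"]["title"] for hit in kept])
```

With min_ratio=0.9 the cut-off is about 0.539, so only the first hit is kept; an empty list models the case of returning no results at all.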
Hybrid sparse and dense retrieval methods with Vespa
In the previous filtering examples the ranking was not impacted by the filters.
They were only used to impact recall, not the order of the results. The following examples
demonstrate how to perform hybrid retrieval combining the efficient query operators in
a single query. Hybrid retrieval can be used as the first phase in a multi-phase ranking funnel, see
Vespa’s phased ranking.
The first query example combines the nearestNeighbor operator with the weakAnd operator,
combining them using logical disjunction (OR). This type of query enables retrieving
both based on semantic (vector distance) and traditional sparse (exact) matching.
$ vespa query \
'yql=select title, matchfeatures, artist from track where {targetHits:100}nearestNeighbor(embedding,q) or userQuery()' \
'query=total eclipse of the heart' \
'type=weakAnd' \
'hits=2' \
'ranking=hybrid' \
"input.query(q)=$Q"
The query combines the sparse weakAnd and the dense nearestNeighbor query operators
using logical disjunction. Both query operators retrieve the target number of hits (or more), ranked by their inner
raw score/distance function.
The hits exposed to the configurable first-phase ranking expression are a combination
of the best hits from the two different retrieval strategies.
The ranking is performed using the following hybrid rank profile which serves as an example
of how to combine the different efficient retrievers.
{"timing":{"querytime":0.007,"summaryfetchtime":0.001,"searchtime":0.009000000000000001},"root":{"id":"toplevel","relevance":1.0,"fields":{"totalCount":1176},"coverage":{"coverage":100,"documents":95666,"full":true,"nodes":1,"results":1,"resultsFull":1},"children":[{"id":"index:tracks/0/f13697952a0d5eaeb2c43ffc","relevance":123.18970542319387,"source":"tracks","fields":{"matchfeatures":{"attribute(popularity)":100.0,"bm25(title)":22.590415639472816,"closeness(field,embedding)":0.5992897837210658,"rawScore(tags)":0.0},"title":"Total Eclipse Of The Heart","artist":"Bonnie Tyler"}},{"id":"index:tracks/0/57c74bd2d466b7cafe30c14d","relevance":112.03224663886917,"source":"tracks","fields":{"matchfeatures":{"attribute(popularity)":100.0,"bm25(title)":12.032246638869161,"closeness(field,embedding)":0.0,"rawScore(tags)":0.0},"title":"Eclipse","artist":"Kyoto Jazz Massive"}}]}}
The result hits also include match-features which
can be used for feature logging for learning to rank, or to simply
debug the components in the final score.
In the below query, the weight of the embedding similarity (closeness) is increased by overriding
the query(wVector) weight:
$ vespa query \
'yql=select title, matchfeatures, artist from track where {targetHits:100}nearestNeighbor(embedding,q) or userQuery()' \
'query=total eclipse of the heart' \
'type=weakAnd' \
'hits=2' \
'ranking=hybrid' \
"input.query(q)=$Q" \
'ranking.features.query(wVector)=40'
Which changes the order and a different hit is surfaced at position two:
{"timing":{"querytime":0.011,"summaryfetchtime":0.001,"searchtime":0.014},"root":{"id":"toplevel","relevance":1.0,"fields":{"totalCount":1176},"coverage":{"coverage":100,"documents":95666,"full":true,"nodes":1,"results":1,"resultsFull":1},"children":[{"id":"index:tracks/0/f13697952a0d5eaeb2c43ffc","relevance":146.56200698831543,"source":"tracks","fields":{"matchfeatures":{"attribute(popularity)":100.0,"bm25(title)":22.590415639472816,"closeness(field,embedding)":0.5992897837210658,"rawScore(tags)":0.0},"title":"Total Eclipse Of The Heart","artist":"Bonnie Tyler"}},{"id":"index:tracks/0/3517728cc88356c8ca6de0d9","relevance":126.74309103465859,"source":"tracks","fields":{"matchfeatures":{"attribute(popularity)":100.0,"bm25(title)":6.7219852584615865,"closeness(field,embedding)":0.5005276444049249,"rawScore(tags)":0.0},"title":"Closer To The Heart","artist":"Rush"}}]}}
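The relevance values in the two responses above are consistent with a first-phase expression that is a weighted sum of the four match features, with every weight defaulting to 1. The sketch below reproduces the numbers under that assumption. The weighted-sum form is inferred from the responses and is not taken from the rank profile definition; wVector appears in the query above, while the other weight names are hypothetical.

```python
def hybrid_score(features, w_tags=1.0, w_popularity=1.0, w_title=1.0, w_vector=1.0):
    """Weighted sum of match features (assumed form of the hybrid first phase)."""
    return (
        w_tags * features["rawScore(tags)"]
        + w_popularity * features["attribute(popularity)"]
        + w_title * features["bm25(title)"]
        + w_vector * features["closeness(field,embedding)"]
    )


# matchfeatures of the top hit, copied from the responses above.
top_hit = {
    "attribute(popularity)": 100.0,
    "bm25(title)": 22.590415639472816,
    "closeness(field,embedding)": 0.5992897837210658,
    "rawScore(tags)": 0.0,
}

print(hybrid_score(top_hit))               # compare with relevance 123.1897...
print(hybrid_score(top_hit, w_vector=40))  # compare with relevance 146.5620...
```

Raising w_vector scales only the closeness term, which is why a hit with a non-zero closeness overtakes one that was retrieved by text matching alone.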
One can also throw the personalization component using the sparse
user profile into the retriever mix. For example having a user profile:
userProfile={"love songs":1, "love":1,"80s":1}
Which can be used with the wand query operator to retrieve personalized hits.
$ vespa query \
'yql=select title, matchfeatures, artist from track where {targetHits:100}nearestNeighbor(embedding,q) or userQuery() or ({targetHits:10}wand(tags, @userProfile))' \
'query=total eclipse of the heart' \
'type=weakAnd' \
'hits=2' \
'ranking=hybrid' \
"input.query(q)=$Q" \
'ranking.features.query(wVector)=340' \
'userProfile={"love songs":1, "love":1,"80s":1}'
In this case, another document is surfaced at position 2, which has a non-zero personalized score.
Notice that totalCount increases as the wand query operator brought more hits into first-phase ranking.
{"timing":{"querytime":0.014,"summaryfetchtime":0.001,"searchtime":0.017},"root":{"id":"toplevel","relevance":1.0,"fields":{"totalCount":1244},"coverage":{"coverage":100,"documents":95666,"full":true,"nodes":1,"results":1,"resultsFull":1},"children":[{"id":"index:tracks/0/f13697952a0d5eaeb2c43ffc","relevance":326.34894210463517,"source":"tracks","fields":{"matchfeatures":{"attribute(popularity)":100.0,"bm25(title)":22.590415639472816,"closeness(field,embedding)":0.5992897837210658,"rawScore(tags)":0.0},"title":"Total Eclipse Of The Heart","artist":"Bonnie Tyler"}},{"id":"index:tracks/0/8eb2e19ee627b054113ba4c9","relevance":281.0,"source":"tracks","fields":{"matchfeatures":{"attribute(popularity)":100.0,"bm25(title)":0.0,"closeness(field,embedding)":0.0,"rawScore(tags)":181.0},"title":"Nothing's Gonna Change My Love For You","artist":"Glenn Medeiros"}}]}}
In the examples above, some of the hits had
"closeness(field,embedding)": 0.0
This means that the hit was not retrieved by the nearestNeighbor operator. Similarly, rawScore(tags) might
also be 0 if the hit was not retrieved by the wand query operator.
It is nevertheless possible to calculate the semantic distance/similarity using
tensor computations for the hits that were not retrieved by the nearestNeighbor
query operator. See also tensor functions.
For example to compute the euclidean distance one can add a
function to the rank-profile:
Changing from logical OR to AND will instead intersect the results of the two efficient retrievers.
The search for nearest neighbors is then constrained to documents which match at least one of
the query terms in the weakAnd.
$ vespa query \
'yql=select title, matchfeatures, artist from track where {targetHits:100}nearestNeighbor(embedding,q) and userQuery()' \
'query=total eclipse of the heart' \
'type=weakAnd' \
'hits=2' \
'ranking=hybrid' \
"input.query(q)=$Q" \
'ranking.features.query(wVector)=340'
In this case, the documents exposed to ranking must match at least one of the query terms (for WAND to retrieve it).
It is also possible to combine hybrid search with filters:
$ vespa query \
'yql=select title, matchfeatures, artist from track where {targetHits:100}nearestNeighbor(embedding,q) and userQuery() and popularity < 75' \
'query=total eclipse of the heart' \
'type=weakAnd' \
'hits=2' \
'ranking=hybrid' \
"input.query(q)=$Q" \
'ranking.features.query(wVector)=340'
Another interesting approach for hybrid retrieval is to use Vespa’s
rank() query operator. The first operand
of the rank() operator is used for retrieval, and the remaining operands are only used to compute
rank features for those hits retrieved by the first operand.
$ vespa query \
'yql=select title, matchfeatures, artist from track where rank({targetHits:100}nearestNeighbor(embedding,q), userQuery())' \
'query=total eclipse of the heart' \
'type=weakAnd' \
'hits=2' \
'ranking=hybrid' \
"input.query(q)=$Q" \
'ranking.features.query(wVector)=340'
This query exposes 100 documents to ranking, since only the first operand of the rank query operator is used for
retrieval. The sparse userQuery() representation is only used to calculate sparse
rank features, such as bm25(title), for
the results retrieved by the nearestNeighbor operator.
{"timing":{"querytime":0.01,"summaryfetchtime":0.002,"searchtime":0.015},"root":{"id":"toplevel","relevance":1.0,"fields":{"totalCount":100},"coverage":{"coverage":100,"documents":95666,"full":true,"nodes":1,"results":1,"resultsFull":1},"children":[{"id":"index:tracks/0/f13697952a0d5eaeb2c43ffc","relevance":326.34896241725517,"source":"tracks","fields":{"matchfeatures":{"attribute(popularity)":100.0,"bm25(title)":22.590435952092836,"closeness(field,embedding)":0.5992897837210658,"rawScore(tags)":0.0},"title":"Total Eclipse Of The Heart","artist":"Bonnie Tyler"}},{"id":"index:tracks/0/3517728cc88356c8ca6de0d9","relevance":276.90138973270746,"source":"tracks","fields":{"matchfeatures":{"attribute(popularity)":100.0,"bm25(title)":6.721990635032981,"closeness(field,embedding)":0.5005276444049249,"rawScore(tags)":0.0},"title":"Closer To The Heart","artist":"Rush"}}]}}
One can also do this the other way around, retrieve using the sparse representation, and have
Vespa calculate the closeness(field, embedding) or related rank features for the hits
retrieved by the sparse query representation.
$ vespa query \
'yql=select title, matchfeatures, artist from track where rank(userQuery(),{targetHits:100}nearestNeighbor(embedding,q))' \
'query=total eclipse of the heart' \
'type=weakAnd' \
'hits=2' \
'ranking=hybrid' \
"input.query(q)=$Q" \
'ranking.features.query(wVector)=340'
The weakAnd query operator exposes more hits to ranking than approximate nearest neighbor search, similar
to the wand query operator. Generally, using the rank query operator is more efficient than combining
query retriever operators using or. See also the
Vespa passage ranking
for complete examples of different retrieval strategies for multi-phase ranking funnels.
Multiple nearest neighbor search operators in the same query
This section looks at how to use multiple nearestNeighbor query operator instances in the same Vespa query request.
First, the query embeddings for Total Eclipse Of The Heart and Summer of '69:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
print(model.encode("Total Eclipse Of The Heart").tolist())
print(model.encode("Summer of '69").tolist())
The following Vespa query combines two nearestNeighbor query operators
using logical disjunction (OR) and referencing two different
query tensor inputs:
input.query(q) holding the Total Eclipse Of The Heart query vector.
input.query(qa) holding the Summer of '69 query vector.
$ vespa query \
'yql=select title from track where ({targetHits:10}nearestNeighbor(embedding,q)) or ({targetHits:10}nearestNeighbor(embedding,qa))' \
'hits=2' \
'ranking=closeness-t4' \
"input.query(q)=$Q" \
"input.query(qa)=$QA"
The above query exposes 20 documents to first-phase ranking, ten from each nearestNeighbor query operator, as seen from totalCount:
{"timing":{"querytime":0.007,"summaryfetchtime":0.001,"searchtime":0.01},"root":{"id":"toplevel","relevance":1.0,"fields":{"totalCount":20},"coverage":{"coverage":100,"documents":95666,"full":true,"nodes":1,"results":1,"resultsFull":1},"children":[{"id":"index:tracks/0/f13697952a0d5eaeb2c43ffc","relevance":0.5992897917249415,"source":"tracks","fields":{"title":"Total Eclipse Of The Heart"}},{"id":"index:tracks/0/5b1c2ae1024d88451c2f1c5a","relevance":0.5794361034642413,"source":"tracks","fields":{"title":"Summer of 69"}}]}}
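To make the totalCount arithmetic concrete, the sketch below models each nearestNeighbor operator as an exact top-k search over a handful of made-up 2-dimensional vectors and combines the two result sets with a set union, as or does. The documents, the query vectors, and the top_k helper are hypothetical. Closeness is computed as 1 / (1 + distance) using Euclidean distance.

```python
import math

# Hypothetical toy corpus: document id -> 2-dimensional embedding
docs = {
    "a": (0.0, 0.0),
    "b": (1.0, 0.0),
    "c": (0.0, 1.0),
    "d": (5.0, 5.0),
    "e": (6.0, 5.0),
    "f": (5.0, 6.0),
}


def closeness(u, v):
    """Closeness derived from Euclidean distance: 1 / (1 + distance)."""
    return 1.0 / (1.0 + math.dist(u, v))


def top_k(query, k):
    """Exact (brute force) top-k document ids by closeness to the query."""
    ranked = sorted(docs, key=lambda d: closeness(docs[d], query), reverse=True)
    return set(ranked[:k])


q = (0.2, 0.1)   # made-up query vector near documents a, b, c
qa = (5.2, 5.1)  # made-up query vector near documents d, e, f

hits_q = top_k(q, 2)
hits_qa = top_k(qa, 2)

# 'or' exposes the union of the two top-k sets to ranking
union = hits_q | hits_qa
print(sorted(union), len(union))
```

When the two top-k sets are disjoint, as here, the union holds 2 * k documents. Any document that appears in both sets is counted once, so the total can be lower.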
One can also use the label annotation when there are multiple nearestNeighbor operators in the same query
to differentiate which of them produced the match.
$ vespa query \
'yql=select title, matchfeatures from track where ({ label:"q", targetHits:10}nearestNeighbor(embedding,q)) or ({label:"qa",targetHits:10}nearestNeighbor(embedding,qa))' \
'hits=2' \
'ranking=closeness-label' \
"input.query(q)=$Q" \
"input.query(qa)=$QA"
The above query annotates the two nearestNeighbor query operators using the
label query annotation. The result includes
match-features, so the closeness(label, ..) feature output shows which query operator
retrieved each document:
{"timing":{"querytime":0.011,"summaryfetchtime":0.001,"searchtime":0.014},"root":{"id":"toplevel","relevance":1.0,"fields":{"totalCount":20},"coverage":{"coverage":100,"documents":95666,"full":true,"nodes":1,"results":1,"resultsFull":1},"children":[{"id":"index:tracks/0/f13697952a0d5eaeb2c43ffc","relevance":0.5992897917249415,"source":"tracks","fields":{"matchfeatures":{"closeness(label,q)":0.5992897917249415,"closeness(label,qa)":0.0},"title":"Total Eclipse Of The Heart"}},{"id":"index:tracks/0/5b1c2ae1024d88451c2f1c5a","relevance":0.5794361034642413,"source":"tracks","fields":{"matchfeatures":{"closeness(label,q)":0.0,"closeness(label,qa)":0.5794361034642413},"title":"Summer of 69"}}]}}
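Client code can use these features to attribute each hit to an operator. The sketch below parses a trimmed copy of the response above (only the fields needed here are kept) and reports every label whose closeness is non-zero. The matching_labels helper is hypothetical.

```python
import json
import re

# Trimmed copy of the response above
response = json.loads("""
{"root": {"children": [
  {"relevance": 0.5992897917249415,
   "fields": {"matchfeatures": {"closeness(label,q)": 0.5992897917249415,
                                "closeness(label,qa)": 0.0},
              "title": "Total Eclipse Of The Heart"}},
  {"relevance": 0.5794361034642413,
   "fields": {"matchfeatures": {"closeness(label,q)": 0.0,
                                "closeness(label,qa)": 0.5794361034642413},
              "title": "Summer of 69"}}
]}}
""")

# Matches feature names such as closeness(label,q) and captures the label
LABEL_FEATURE = re.compile(r"closeness\(label,(\w+)\)")


def matching_labels(hit):
    """Return the labels of the nearestNeighbor operators that matched this hit."""
    labels = []
    for name, value in hit["fields"]["matchfeatures"].items():
        match = LABEL_FEATURE.fullmatch(name)
        if match and value > 0.0:
            labels.append(match.group(1))
    return labels


for hit in response["root"]["children"]:
    print(hit["fields"]["title"], matching_labels(hit))
```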
Note that the previous examples used or to combine the two operators. Using and instead requires
that a document appears in the top-k results of both operators. Increasing targetHits to 500
finds 9 tracks that overlap. In this case, both closeness labels have a non-zero score.
$ vespa query \
'yql=select title, matchfeatures from track where ({label:"q", targetHits:500}nearestNeighbor(embedding,q)) and ({label:"qa",targetHits:500}nearestNeighbor(embedding,qa))' \
'hits=2' \
'ranking=closeness-label' \
"input.query(q)=$Q" \
"input.query(qa)=$QA"
This returns the following top two hits. Note that the closeness-label rank profile
uses closeness(field, embedding), which, when there are multiple nearestNeighbor operators,
takes the maximum of the per-operator scores as the unlabeled closeness(field,embedding). This
can be seen by comparing the relevance value with the labeled closeness() rank features.
{"timing":{"querytime":0.015,"summaryfetchtime":0.001,"searchtime":0.017},"root":{"id":"toplevel","relevance":1.0,"fields":{"totalCount":9},"coverage":{"coverage":100,"documents":95666,"full":true,"nodes":1,"results":1,"resultsFull":1},"children":[{"id":"index:tracks/0/99a2a380cac4830bfee63ae0","relevance":0.5174298300948759,"source":"tracks","fields":{"matchfeatures":{"closeness(label,q)":0.4755796429687308,"closeness(label,qa)":0.5174298300948759},"title":"Summer Of Love"}},{"id":"index:tracks/0/a373d26938a20dbdda8fc7c1","relevance":0.5099393361432658,"source":"tracks","fields":{"matchfeatures":{"closeness(label,q)":0.5099393361432658,"closeness(label,qa)":0.47990179066646654},"title":"Midnight Heartache"}}]}}
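The relevance values in the response above can be checked against the labeled features. The sketch below copies the two hits and confirms that each relevance equals the larger of the two labeled closeness scores. It only re-checks values already shown in the response.

```python
# Values copied from the two hits in the response above
hits = [
    {
        "title": "Summer Of Love",
        "relevance": 0.5174298300948759,
        "q": 0.4755796429687308,
        "qa": 0.5174298300948759,
    },
    {
        "title": "Midnight Heartache",
        "relevance": 0.5099393361432658,
        "q": 0.5099393361432658,
        "qa": 0.47990179066646654,
    },
]

for hit in hits:
    # The unlabeled closeness is the maximum over the labeled operators
    unlabeled = max(hit["q"], hit["qa"])
    winner = "q" if hit["q"] >= hit["qa"] else "qa"
    print(hit["title"], winner, unlabeled == hit["relevance"])
```

The first hit takes its score from the qa operator and the second from the q operator.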
Vespa also supports multiple document-side embedding fields, each of which
can be searched with its own nearestNeighbor operator in the same query.
field embedding type tensor<float>(x[384]) {
    indexing: attribute | index
    attribute {
        distance-metric: euclidean
    }
    index {
        hnsw {
            max-links-per-node: 16
            neighbors-to-explore-at-insert: 50
        }
    }
}
field embedding_two type tensor<float>(x[768]) {
    indexing: attribute | index
    attribute {
        distance-metric: euclidean
    }
    index {
        hnsw {
            max-links-per-node: 16
            neighbors-to-explore-at-insert: 50
        }
    }
}
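A query over both fields then uses one nearestNeighbor operator per field, each with its own query tensor. The helper below is a hypothetical sketch that only assembles such a YQL string. The query tensor names q and q2 are assumptions, and a matching rank profile would need to declare both inputs with the right dimensions (384 and 768).

```python
def nearest_neighbor(field, query_tensor, target_hits=10, label=None):
    """Build one annotated nearestNeighbor operator as a YQL fragment."""
    annotations = []
    if label is not None:
        annotations.append(f'label:"{label}"')
    annotations.append(f"targetHits:{target_hits}")
    return "({" + ",".join(annotations) + "}" + f"nearestNeighbor({field},{query_tensor}))"


def build_yql(operators, combine="or"):
    """Join the operator fragments with 'or' or 'and' into a full YQL statement."""
    where = f" {combine} ".join(operators)
    return f"select title, matchfeatures from track where {where}"


yql = build_yql([
    nearest_neighbor("embedding", "q", label="q"),           # 384-dimensional field
    nearest_neighbor("embedding_two", "q2", label="q2"),     # 768-dimensional field
])
print(yql)
```

The resulting string can be passed as the yql parameter, together with one input.query(...) parameter per query tensor.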
Controlling filter behavior
Vespa allows developers to control how filters are combined with the nearestNeighbor query operator. See
Query Time Constrained Approximate Nearest Neighbor Search
for a detailed description of pre-filtering and post-filtering strategies.
The following query examples explore the two query-time parameters
that control the filtering behavior:
ranking.matching.postFilterThreshold (default 1.0)
ranking.matching.approximateThreshold (default 0.05)
These parameters can be set per query or configured in the rank-profile in the
document schema.
The following query runs with the default ranking.matching.postFilterThreshold of 1.0, which means
that Vespa does not perform post-filtering and uses the pre-filtering strategy:
$ vespa query \
'yql=select title, artist, tags from track where {targetHits:10}nearestNeighbor(embedding,q) and tags contains "rock"' \
'hits=2' \
'ranking=closeness' \
'ranking.matching.postFilterThreshold=1.0' \
'ranking.matching.approximateThreshold=0.05' \
"input.query(q)=$Q"
The query exposes targetHits documents to ranking, as seen from totalCount. Now repeat the query, but
force post-filtering by setting ranking.matching.postFilterThreshold=0.0:
$ vespa query \
'yql=select title, artist, tags from track where {targetHits:10}nearestNeighbor(embedding,q) and tags contains "rock"' \
'hits=2' \
'ranking=closeness' \
'ranking.matching.postFilterThreshold=0.0' \
'ranking.matching.approximateThreshold=0.05' \
"input.query(q)=$Q"
In this case, Vespa estimates how many documents the filter matches and automatically adjusts targetHits internally to a
higher number, attempting to expose targetHits documents to first-phase ranking.
The query exposes 14 documents to ranking, as can be seen from totalCount. There are 8,420 documents in the collection
that are tagged with the rock tag, or roughly 8.8% of the 95,666 documents.
Changing to a less frequent tag, for example 90s, which
matches 1,695 documents or roughly 1.8%, causes Vespa to fall back to exact search because the estimated filter hit ratio
is below the approximateThreshold of 0.05.
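The behavior described in this section can be summarized as a small decision rule. The function below is a simplified model inferred from the cases described here, not Vespa's actual implementation: choose_strategy is a hypothetical name, and the order of the two checks is an assumption chosen to reproduce the outcomes this section reports.

```python
TOTAL_DOCS = 95666  # document count from the coverage section of the responses


def choose_strategy(filter_hits, post_filter_threshold=1.0, approximate_threshold=0.05):
    """Pick a search strategy from the estimated filter hit ratio (simplified model)."""
    hit_ratio = filter_hits / TOTAL_DOCS
    if hit_ratio < approximate_threshold:
        # Very restrictive filter: fall back to exact nearest neighbor search
        return "exact"
    if hit_ratio > post_filter_threshold:
        # Filter matches more than the threshold: filter after the ANN search
        return "post-filter"
    # Otherwise: apply the filter during the approximate search
    return "pre-filter"


print(choose_strategy(8420))                             # rock, default thresholds
print(choose_strategy(8420, post_filter_threshold=0.0))  # rock, forced post-filtering
print(choose_strategy(1695, post_filter_threshold=0.0))  # 90s, below approximateThreshold
```

Under this model the rock filter (about 8.8%) stays approximate in both configurations, while the 90s filter (about 1.8%) drops below the 0.05 threshold and triggers exact search.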
$ vespa query \
'yql=select title, artist, tags from track where {targetHits:10}nearestNeighbor(embedding,q) and tags contains "90s"' \
'hits=2' \
'ranking=closeness' \
'ranking.matching.postFilterThreshold=0.0' \
'ranking.matching.approximateThreshold=0.05' \
"input.query(q)=$Q"