This guide is a practical introduction to Vespa's nearest neighbor search query operator and to combining nearest neighbor search with other Vespa query operators. The guide uses Vespa's embedding support to map text to vectors. It also covers diverse, efficient retrievers that can serve as candidate retrievers in a multi-phase ranking funnel.
The guide uses the Last.fm tracks dataset for illustration. Latency numbers mentioned in the guide were obtained by running this guide on an M1. See also the generic Vespa performance - a practical guide.
This guide covers the following:
The guide includes step-by-step instructions on how to reproduce the experiments.
Prerequisites:
- NO_SPACE: the vespaengine/vespa container image + headroom for data requires disk space. Read more.
- curl to download the dataset.

This tutorial uses Vespa-CLI, the official command-line client for Vespa.ai. It is a single binary without any runtime dependencies and is available for Linux, macOS and Windows:
$ brew install vespa-cli
This guide uses the Last.fm tracks dataset. Note that the dataset is released under the following terms:
Research only, strictly non-commercial. For details, or if you are unsure, please contact Last.fm. Also, Last.fm has the right to advertise and refer to any work derived from the dataset.
To download the dataset execute the following (120 MB zip file):
$ curl -L -o lastfm_test.zip \
    http://millionsongdataset.com/sites/default/files/lastfm/lastfm_test.zip
$ unzip lastfm_test.zip
The downloaded data must be converted to the Vespa JSON feed format. This Python script can be used to traverse the dataset files and create a JSONL formatted feed file with Vespa put operations. The [schema](schemas.html) is covered in the next section. The number of unique tags is used as a proxy for the popularity of the track.
Process the dataset and convert it to Vespa JSON document operation format.
$ python3 create-vespa-feed.py lastfm_test > feed.jsonl
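The conversion script itself is not reproduced in this guide. The following is a minimal sketch of the per-track conversion step only, not the actual create-vespa-feed.py. It assumes each Last.fm record is a JSON object with track_id, title, artist and a tags list of [tag, weight] pairs; the `music` namespace in the document id and the function name `to_vespa_put` are hypothetical choices for illustration.

```python
import json


def to_vespa_put(track: dict) -> dict:
    """Convert one track record into a Vespa put operation (sketch)."""
    # Assumed input layout: "tags" is a list of [tag, weight] pairs,
    # where the weight may be stored as a string.
    tags = {}
    for tag, weight in track.get("tags", []):
        tags[tag] = int(weight)
    return {
        # The "music" namespace is a hypothetical choice.
        "put": f"id:music:track::{track['track_id']}",
        "fields": {
            "track_id": track["track_id"],
            "title": track["title"],
            "artist": track["artist"],
            # A weightedset field is fed as a JSON object of key -> weight.
            "tags": tags,
            # Number of unique tags, used as a popularity proxy.
            "popularity": len(tags),
        },
    }


# Made-up sample record, for illustration only.
sample = {
    "track_id": "TR0001",
    "title": "Example Title",
    "artist": "Example Artist",
    "tags": [["80s", "100"], ["pop", "57"]],
}
operation = to_vespa_put(sample)
print(json.dumps(operation))  # one JSONL line of the feed file
```

Writing one such JSON object per line produces a JSONL feed file of the kind that vespa feed consumes.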
A Vespa application package is the set of configuration files and Java plugins that together define the behavior of a Vespa system: what functionality to use, the available document types, how ranking will be done, and how data will be processed during feeding and indexing.
The minimum required files to create the basic search application are track.sd and services.xml.
Create directories for the configuration files and embedding model:
$ mkdir -p app/schemas; mkdir -p app/search/query-profiles/; mkdir -p app/model
A schema is a configuration of a document type and additional synthetic fields and ranking configuration.
For this application, we define a track document type.
Write the following to app/schemas/track.sd:
schema track {

    document track {

        field track_id type string {
            indexing: summary | attribute
            match: word
        }

        field title type string {
            indexing: summary | index
            index: enable-bm25
        }

        field artist type string {
            indexing: summary | index
        }

        field tags type weightedset<string> {
            indexing: summary | attribute
            attribute: fast-search
        }

        field popularity type int {
            indexing: summary | attribute
            attribute: fast-search
            rank: filter
        }
    }

    field embedding type tensor<float>(x[384]) {
        indexing: input title | embed e5 | attribute | index
        attribute {
            distance-metric: angular
        }
        index {
            hnsw {
                max-links-per-node: 32
                neighbors-to-explore-at-insert: 200
            }
        }
    }

    fieldset default {
        fields: title, artist
    }

    document-summary track_id {
        summary track_id { }
    }

    rank-profile tags {
        first-phase {
            expression: rawScore(tags)
        }
    }

    rank-profile bm25 {
        first-phase {
            expression: bm25(title)
        }
    }

    rank-profile closeness {
        num-threads-per-search: 1
        match-features: distance(field, embedding)

        inputs {
            query(q) tensor<float>(x[384])
            query(q1) tensor<float>(x[384])
        }

        first-phase {
            expression: closeness(field, embedding)
        }
    }

    rank-profile closeness-t4 inherits closeness {
        num-threads-per-search: 4
    }

    rank-profile closeness-label inherits closeness {
        match-features: closeness(label, q) closeness(label, q1)
    }

    rank-profile hybrid inherits closeness {
        inputs {
            query(wTags) : 1.0
            query(wPopularity) : 1.0
            query(wTitle) : 1.0
            query(wVector) : 1.0
        }
        first-phase {
            expression {
                query(wTags) * rawScore(tags) +
                query(wPopularity) * log(attribute(popularity)) +
                query(wTitle) * log(bm25(title)) +
                query(wVector) * closeness(field, embedding)
            }
        }
        match-features {
            rawScore(tags)
            attribute(popularity)
            bm25(title)
            closeness(field, embedding)
            distance(field, embedding)
        }
    }
}
This document schema is explained in the practical search performance guide. The addition is the embedding field, which is defined as a synthetic field outside of the document. This field is populated by Vespa's embedding functionality, using the E5 text embedding model (described in this blog post).
Note that the closeness rank-profile defines two query input tensors using inputs.
field embedding type tensor<float>(x[384]) {
    indexing: input title | embed e5 | attribute | index
    attribute {
        distance-metric: angular
    }
    index {
        hnsw {
            max-links-per-node: 32
            neighbors-to-explore-at-insert: 200
        }
    }
}
See Approximate Nearest Neighbor Search using HNSW Index for an introduction to HNSW and the HNSW tuning parameters.
The services.xml defines the services that make up the Vespa application: which services to run and how many nodes per service.
Write the following to app/services.xml:
<?xml version="1.0" encoding="UTF-8"?>
<services version="1.0">

    <container id="default" version="1.0">
        <search/>
        <document-api/>
        <component id="e5" type="hugging-face-embedder">
            <transformer-model path="model/e5-small-v2-int8.onnx"/>
            <tokenizer-model path="model/tokenizer.json"/>
        </component>
    </container>

    <content id="tracks" version="1.0">
        <engine>
            <proton>
                <tuning>
                    <searchnode>
                        <requestthreads>
                            <persearch>4</persearch>
                        </requestthreads>
                    </searchnode>
                </tuning>
            </proton>
        </engine>
        <redundancy>1</redundancy>
        <documents>
            <document type="track" mode="index"></document>
        </documents>
        <nodes>
            <node distribution-key="0" hostalias="node1"></node>
        </nodes>
    </content>

</services>
The default query profile can be used to override default query api settings for all queries.
The following enables presentation.timing and renders weightedset fields as JSON maps.
<query-profile id="default">
    <field name="presentation.timing">true</field>
    <field name="renderer.json.jsonWsets">true</field>
</query-profile>
The final step is to download the embedding model files:
$ curl -L -o app/model/e5-small-v2-int8.onnx \
    https://github.com/vespa-engine/sample-apps/raw/master/simple-semantic-search/model/e5-small-v2-int8.onnx
$ curl -L -o app/model/tokenizer.json \
    https://github.com/vespa-engine/sample-apps/raw/master/simple-semantic-search/model/tokenizer.json
The application package can now be deployed to a running Vespa instance. See also the Vespa quick start guide.
Start the Vespa container image using Docker:
$ docker run --detach --name vespa --hostname vespa-container \
    --publish 8080:8080 --publish 19071:19071 \
    vespaengine/vespa
Starting the container can take a short while. Before continuing, make sure that the configuration service is running by using vespa status deploy.
$ vespa config set target local
$ vespa status deploy --wait 300
Once ready, deploy the application using vespa deploy:
$ vespa deploy --wait 300 app
Feed the dataset. During indexing, Vespa will invoke the embedding model (which is relatively computationally expensive), so feeding and indexing this dataset takes about 180 seconds on an M1 laptop (535 inserts/s).
$ vespa feed -t http://localhost:8080 feed.jsonl
The following sections use the Vespa query api and formulate queries using the Vespa query language. The examples use the vespa-cli command, which supports running queries.
The CLI uses the Vespa query api. Use vespa query -v to see the curl equivalent:
$ vespa query -v 'yql=select ..'
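Since the CLI is a thin client over the HTTP query api, the same request can be expressed as a URL. The sketch below only builds the URL and does not send it. It assumes the container from this guide is listening on localhost:8080 and uses the /search/ endpoint; the helper name `build_query_url` is hypothetical.

```python
from urllib.parse import urlencode


def build_query_url(params: dict,
                    endpoint: str = "http://localhost:8080/search/") -> str:
    """Build a GET URL for the Vespa query api from request parameters."""
    # urlencode percent-encodes values and joins them with '&'.
    return endpoint + "?" + urlencode(params)


query_url = build_query_url({
    "yql": "select artist, title, track_id from track where userQuery()",
    "query": "total eclipse of the heart",
    "hits": 1,
    "type": "all",
    "ranking": "bm25",
})
print(query_url)
```

The resulting URL can be passed to curl or any HTTP client, which is roughly what vespa query -v prints.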
The first example is searching and ranking using the bm25 rank profile defined in the schema. It uses the bm25 rank feature as the first-phase relevance score:
$ vespa query \
    'yql=select artist, title, track_id from track where userQuery()' \
    'query=total eclipse of the heart' \
    'hits=1' \
    'type=all' \
    'ranking=bm25'
This query combines YQL userQuery() with Vespa's simple query language. The query type is all, requiring that all the terms match.
The above query example searches for total AND eclipse AND of AND the AND heart in the default fieldset, which in the schema includes the title and artist fields.
The result for the above query will look something like this:
This query only matched one document because the query terms were ANDed. We can change matching to use type=any instead of the default type=all. See supported query types.
$ vespa query \
    'yql=select artist, title, track_id from track where userQuery()' \
    'query=total eclipse of the heart' \
    'hits=1' \
    'ranking=bm25' \
    'type=any'
Now, the query matches 24,053 documents and is considerably slower than the previous query. Comparing the querytime of these two query examples, the one that matches the most documents has the highest querytime. In the worst case, the search query matches all documents.
Query matching performance is greatly impacted by the number of documents that match the query specification. Type any queries require more compute resources than type all queries.
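To illustrate why type=any matches so many more documents than type=all, here is a toy sketch over a made-up inverted index. It is not how Vespa evaluates queries internally; the index contents and the function name `match` are hypothetical.

```python
def match(index: dict, terms: list, query_type: str) -> set:
    """Return the ids of documents matching the terms under 'all' or 'any'."""
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        return set()
    if query_type == "all":
        # Every term must be present: intersect the posting lists.
        return set.intersection(*postings)
    # At least one term must be present: union of the posting lists.
    return set.union(*postings)


# Made-up posting lists: term -> set of document ids.
toy_index = {
    "total": {1, 2},
    "eclipse": {1, 3},
    "heart": {1, 2, 3, 4},
}
terms = ["total", "eclipse", "heart"]
all_matches = match(toy_index, terms, "all")
any_matches = match(toy_index, terms, "any")
print(len(all_matches), len(any_matches))  # prints "1 4"
```

Every document in the union has to be scored by the first-phase expression, which is why a single frequent term can dominate the cost of an any query.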
There is an optimization available for type=any queries, using the weakAnd query operator, which implements the WAND algorithm. See the using wand with Vespa guide for more details.
Run the same query, but instead of type=any use type=weakAnd (see supported query types):
$ vespa query \
    'yql=select artist, title, track_id from track where userQuery()' \
    'query=total eclipse of the heart' \
    'hits=1' \
    'ranking=bm25' \
    'type=weakAnd'
Compared to the type any query, which fully ranked 24,053 documents, this query exposes only about 3,600 documents to the first-phase ranking expression. Also notice that the faster search returns the same document at the first position.
$ vespa query \
    'yql=select artist, title, track_id from track where userQuery()' \
    'query="total eclipse of the heart"' \
    'hits=1' \
    'ranking=bm25' \
    'type=weakAnd'
In this case, the query input "total eclipse of the heart" is parsed as a phrase query, and the search only finds 1 document matching the exact phrase.
The previous section introduced the weakAnd query operator, which integrates with linguistic processing and string matching using match: text.
The following examples use the wand() query operator. The wand query operator calculates the maximum inner product between the sparse query and document feature integer weights. The inner product ranking score calculated by the wand query operator can be used in a ranking expression through the rawScore(name) rank feature.
rank-profile tags {
    first-phase {
        expression: rawScore(tags)
    }
}
This query searches the track document type using a learned sparse userProfile representation, performing a maximum inner product search over the tags weightedset field.
$ vespa query \
    'yql=select track_id, title, artist from track where {targetHits:10}wand(tags, @userProfile)' \
    'userProfile={"pop":1, "love songs":1,"romantic":10, "80s":20 }' \
    'hits=2' \
    'ranking=tags'
The query asks for 2 hits to be returned, and uses the tags rank profile. The result for the above query will look something like this:
The wand query operator exposed a total of about 60 documents to the first-phase ranking, which uses the rawScore(tags) rank feature directly, so the relevance is the result of the sparse dot product between the sparse user profile and the document tags.
The wand query operator is safe, meaning it returns the same top-k results as the brute-force dotProduct query operator. wand is a type of query operator that interleaves matching and ranking, skipping documents that cannot compete for the final top-k results. See the using wand with Vespa guide for more details on using the wand and weakAnd query operators.
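The score that wand produces can be illustrated with a brute-force reference. The sketch below computes the sparse dot product for every document and keeps the top k; it deliberately omits the skipping that makes WAND fast, and the document tags are made up for illustration.

```python
def dot_product(profile: dict, tags: dict) -> int:
    """Sparse inner product over the keys shared by profile and tags."""
    return sum(weight * tags[tag] for tag, weight in profile.items() if tag in tags)


def top_k(profile: dict, docs: dict, k: int) -> list:
    """Brute-force top-k by dot product; documents scoring 0 do not match."""
    scored = [(dot_product(profile, tags), doc_id) for doc_id, tags in docs.items()]
    matched = [entry for entry in scored if entry[0] > 0]
    return sorted(matched, reverse=True)[:k]


user_profile = {"pop": 1, "love songs": 1, "romantic": 10, "80s": 20}
# Made-up documents: id -> tags weightedset.
documents = {
    "a": {"80s": 100, "pop": 50},   # 20*100 + 1*50 = 2050
    "b": {"romantic": 100},         # 10*100 = 1000
    "c": {"rock": 100},             # no shared tags, score 0
}
best = top_k(user_profile, documents, 2)
print(best)  # [(2050, 'a'), (1000, 'b')]
```

A safe WAND implementation returns the same top-k as this exhaustive scan while scoring far fewer documents.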
Vespa's nearest neighbor search operator supports doing exact brute force nearest neighbor search using dense representations. The first query example uses exact nearest neighbor search and Vespa embed functionality:
$ vespa query \
    'yql=select title, artist from track where {approximate:false,targetHits:10}nearestNeighbor(embedding,q)' \
    'hits=1' \
    'ranking=closeness' \
    'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
Query breakdown:
- Search for the ten (targetHits:10) nearest neighbors of the query(q) query tensor over the embedding document tensor field.
- approximate:false tells Vespa to perform exact search.
- The hits parameter controls how many results are returned in the response. The number of hits requested does not impact targetHits. Notice that targetHits is per content node involved in the query.
- ranking=closeness tells Vespa which rank-profile to use to score documents. One must specify how to rank the targetHits documents retrieved and exposed to the first-phase ranking expression in the rank-profile.
- input.query(q) is the query vector produced by the embedder.

Not specifying ranking will cause Vespa to use nativeRank, which does not use the vector similarity, causing results to be randomly sorted.
The above exact nearest neighbor search will return the following result:
The exact search takes approximately 14ms, performing 95,666 distance calculations. A total of about 101 documents were exposed to the first-phase ranking during the search, as can be seen from totalCount. The relevance is the result of the rank-profile scoring.
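As a rough mental model of what the exact search computes, the sketch below scans every vector, computes the angular distance to the query, and converts distance to closeness as 1 / (1 + distance). It uses tiny made-up 2-dimensional vectors instead of the 384-dimensional embeddings, and it ignores the chunked heap evaluation described in this guide; the function names are hypothetical.

```python
import math


def angular_distance(a: list, b: list) -> float:
    """Angle in radians between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.acos(cosine)


def closeness(distance: float) -> float:
    """Map a distance in [0, inf) to a score in (0, 1]."""
    return 1.0 / (1.0 + distance)


def exact_nearest(query: list, docs: dict, target_hits: int) -> list:
    """Brute force: one distance calculation per document."""
    scored = [(angular_distance(query, vec), doc_id) for doc_id, vec in docs.items()]
    scored.sort()
    return [(doc_id, closeness(dist)) for dist, doc_id in scored[:target_hits]]


toy_vectors = {"same": [1.0, 0.0], "orthogonal": [0.0, 1.0], "opposite": [-1.0, 0.0]}
nearest = exact_nearest([1.0, 0.0], toy_vectors, 2)
print(nearest)
```

The cost grows linearly with the number of documents, which is why the guide reports one distance calculation per document for the exact search.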
It is possible to reduce search latency of the exact search by throwing more CPU resources at it.
Changing the rank-profile to closeness-t4 makes Vespa use four threads per query:
$ vespa query \
    'yql=select title, artist from track where {approximate:false,targetHits:10}nearestNeighbor(embedding,q)' \
    'hits=1' \
    'ranking=closeness-t4' \
    'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
Now, the exact search latency is reduced by using more threads. See multi-threaded searching and ranking for more on this topic.
This section covers using the faster, but approximate, nearest neighbor search. The track schema's embedding field has the index property, which means Vespa builds an HNSW index to support fast, approximate vector search. See Approximate Nearest Neighbor Search using HNSW Index for an introduction to HNSW and the tuning parameters.
The default query behavior is using approximate:true when the embedding field has index:
$ vespa query \
    'yql=select title, artist from track where {targetHits:10,hnsw.exploreAdditionalHits:20}nearestNeighbor(embedding,q)' \
    'hits=1' \
    'ranking=closeness' \
    'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
Which returns the following response:
Now, the query is faster and also uses fewer resources during the search. To get latency down to 20 ms with the exact search one had to use 4 matching threads. In this case the result latency is down to 4ms with a single matching thread. For this query example, the approximate search returned the exact same top-1 hit and there was no accuracy loss for the top-1 position. Note that the overall query time is dominated by the embed inference.
A few key differences between exact and approximate neighbor search:
- totalCount is different. When using the approximate version, Vespa exposes exactly targetHits to the configurable first-phase rank expression in the chosen rank-profile. The exact search uses a scoring heap during evaluation (chunked distance calculations), and documents which at some time were put on the top-k heap are exposed to first-phase ranking.
- The search is approximate and might not return the exact top 10 closest vectors as with exact search. This is a complex tradeoff between accuracy, query performance, and memory usage. See Billion-scale vector search with Vespa - part two for a deep-dive into these trade-offs.
With the support for setting approximate:false|true, a developer can quantify accuracy loss by comparing the results of exact nearest neighbor search with the results of the approximate search. By doing so, developers can quantify the recall@k or overlap@k, and find the right balance between search performance and accuracy. Increasing hnsw.exploreAdditionalHits improves accuracy (recall@k) at the cost of a slower query.
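A minimal sketch of the overlap@k comparison described above: given the ranked document ids returned by the exact search and by the approximate search for the same query, compute the fraction of the exact top-k that the approximate search also found. The id lists here are made up; in practice they would come from running the same query with approximate:false and approximate:true.

```python
def overlap_at_k(exact_ids: list, approx_ids: list, k: int) -> float:
    """Fraction of the exact top-k that also appears in the approximate top-k."""
    exact_top = set(exact_ids[:k])
    approx_top = set(approx_ids[:k])
    return len(exact_top & approx_top) / k


# Made-up ranked result lists for one query.
exact_result = ["t1", "t2", "t3", "t4", "t5"]
approx_result = ["t1", "t3", "t2", "t9", "t5"]
print(overlap_at_k(exact_result, approx_result, 5))  # 0.8
```

Averaging this number over a representative set of queries gives an estimate of how much accuracy a given HNSW setting gives up.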
Vespa allows the search for nearest neighbors to be constrained by regular query filters. In this query example the title field must contain the term heart:
$ vespa query \
    'yql=select title, artist from track where {targetHits:10}nearestNeighbor(embedding,q) and title contains "heart"' \
    'hits=2' \
    'ranking=closeness' \
    'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
Which returns the following response:
The query term heart does not in this case impact the ordering (ranking) of the results, as the rank-profile used only uses the vector similarity closeness.
When using filtering, it is important for performance reasons that the fields that are included in the filters have been defined with index or attribute:fast-search. See searching attribute fields.
The optimal performance for combining nearestNeighbor search with filtering, where the query terms do not influence ranking, is achieved using rank: filter in the schema (see blog post for details):
field popularity type int {
    indexing: summary | attribute
    rank: filter
    attribute: fast-search
}
Matching against the popularity field does not influence ranking, and Vespa can use the most efficient posting list representation. Note that one can still access the value of the popularity attribute in ranking expressions.
rank-profile popularity {
    first-phase {
        expression: attribute(popularity)
    }
}
In the following example, since the title field does not have rank: filter, one can instead flag that the term should not be used by any ranking expression by using the ranked query annotation. The following disables term based ranking, so the matching against the title field can use the most efficient posting list representation.
$ vespa query \
    'yql=select title, artist from track where {targetHits:10}nearestNeighbor(embedding,q) and title contains ({ranked:false}"heart")' \
    'hits=2' \
    'ranking=closeness' \
    'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
In the previous examples, since the rank-profile used only the closeness rank feature, the matching would not impact the score anyway.
Vespa also allows combining the nearestNeighbor query operator with any other Vespa query operator.
$ vespa query \
    'yql=select title, popularity, artist from track where {targetHits:10}nearestNeighbor(embedding,q) and popularity > 20 and artist contains "Bonnie Tyler"' \
    'hits=2' \
    'ranking=closeness' \
    'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
This query example restricts the search to tracks by Bonnie Tyler with popularity > 20.
When combining nearest neighbor search with strict filters that match less than 5 percent of the total number of documents, Vespa will, instead of searching the HNSW graph constrained by the filter, fall back to using exact nearest neighbor search. See Controlling filter behavior for how to adjust the threshold that decides which strategy is used. When falling back to exact search, users will observe that totalCount increases and is higher than targetHits.
As seen from previous examples, more hits are exposed to the first-phase ranking expression when using exact search. When using exact search with filters, the search can also use multiple threads to evaluate the query, which helps reduce the latency impact.
With strict filters that remove many hits, the hits (nearest neighbors) might not be near in the embedding space, but far, or distant neighbors. Technically, all document vectors are a neighbor of the query vector, but with a varying distance.
With restrictive filters, the neighbors that are returned might be of low quality (far distance).
One way to combat this effect is to use the distanceThreshold query annotation parameter of the nearestNeighbor query operator. The value of the distance depends on the distance-metric used. By adding the distance(field,embedding) rank-feature to the match-features of the closeness rank-profiles, it is possible to analyze what distance could be considered too far. See match-features reference.
Note that a distance of 0 is perfect, while a distance of 1 is distant. The distanceThreshold removes hits that have a higher distance(field, embedding) than distanceThreshold. The distanceThreshold is applied regardless of whether the search is exact or approximate.
The following query with a restrictive filter on popularity is used for illustration:
$ vespa query \
    'yql=select title, popularity, artist from track where {targetHits:10}nearestNeighbor(embedding,q) and popularity > 80' \
    'hits=2' \
    'ranking=closeness-t4' \
    'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
The above query returns
By using a distanceThreshold of 0.2, the Eclipse track will be removed from the result because its distance(field, embedding) is close to 0.5.
$ vespa query \
    'yql=select title, popularity, artist from track where {distanceThreshold:0.2,targetHits:10}nearestNeighbor(embedding,q) and popularity > 80' \
    'hits=2' \
    'ranking=closeness' \
    'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
Setting an appropriate distanceThreshold is best handled by supervised learning, as the distance threshold should be calibrated based on the query complexity and possibly also the feature distributions of the returned top-k hits. Having the distance rank feature returned as match-features enables post-processing of the result using a custom re-ranking/filtering searcher. The post-processing searcher can analyze the score distributions of the returned top-k hits (using the features returned with match-features), remove low scoring hits before presenting the result to the end user, or not return any results at all.
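The post-processing idea can be sketched on the client side. This is a stand-in for illustration, not a Vespa Searcher (which would be written in Java and deployed in the container). It assumes each hit carries a matchfeatures object under fields and that the distance feature is keyed as distance(field,embedding); both the key spelling and the function name `drop_distant_hits` are assumptions.

```python
# Assumed key for the distance match-feature in the JSON result.
DISTANCE_FEATURE = "distance(field,embedding)"


def drop_distant_hits(hits: list, max_distance: float) -> list:
    """Keep only hits whose reported distance is within max_distance."""
    kept = []
    for hit in hits:
        features = hit.get("fields", {}).get("matchfeatures", {})
        distance = features.get(DISTANCE_FEATURE)
        # Hits without the feature are dropped rather than guessed at.
        if distance is not None and distance <= max_distance:
            kept.append(hit)
    return kept


# Made-up hits shaped like entries of the result's children list.
sample_hits = [
    {"fields": {"title": "Near", "matchfeatures": {DISTANCE_FEATURE: 0.1}}},
    {"fields": {"title": "Far", "matchfeatures": {DISTANCE_FEATURE: 0.5}}},
    {"fields": {"title": "Unknown"}},
]
filtered = drop_distant_hits(sample_hits, 0.2)
print([hit["fields"]["title"] for hit in filtered])  # ['Near']
```

A learned threshold could replace the fixed max_distance, for example one that depends on the distribution of distances among the returned hits.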
In the previous filtering examples the ranking was not impacted by the filters. They were only used to impact recall, not the order of the results. The following examples demonstrate how to perform hybrid retrieval combining the efficient query operators in a single query. Hybrid retrieval can be used as the first phase in a multi-phase ranking funnel, see Vespa's phased ranking.
The first query example combines the nearestNeighbor operator with the weakAnd operator using logical disjunction (OR). This type of query enables retrieving based on both semantic (vector distance) and traditional sparse (exact) matching.
$ vespa query \
    'yql=select title, artist from track where {targetHits:100}nearestNeighbor(embedding,q) or userQuery()' \
    'query=total eclipse of the heart' \
    'type=weakAnd' \
    'hits=2' \
    'ranking=hybrid' \
    'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
The query combines the sparse weakAnd and dense nearestNeighbor query operators using logical disjunction. Both query operators retrieve the target number of hits (or more), ranked by their inner raw score function. The hits exposed to the configurable first-phase ranking expression are a combination of the best hits from the two different retrieval strategies. The ranking is performed using the hybrid rank profile, which serves as an example of how to combine the different efficient retrievers.
rank-profile hybrid inherits closeness {
    inputs {
        query(wTags) : 1
        query(wPopularity) : 1
        query(wTitle) : 1
        query(wVector) : 1
    }
    first-phase {
        expression {
            query(wTags) * rawScore(tags) +
            query(wPopularity) * log(attribute(popularity)) +
            query(wTitle) * log(bm25(title)) +
            query(wVector) * closeness(field, embedding)
        }
    }
    match-features {
        rawScore(tags)
        attribute(popularity)
        bm25(title)
        closeness(field, embedding)
    }
}
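To make the first-phase expression concrete, the sketch below recomputes it from a hit's feature values. The feature values are made up, the dictionary keys are illustrative, and treating log of a non-positive value as negative infinity is an assumption made here so the example does not raise an error; it is not a statement about Vespa's own handling.

```python
import math


def safe_log(value: float) -> float:
    """Natural log; returns -inf for non-positive input (assumption of this sketch)."""
    return math.log(value) if value > 0 else float("-inf")


def hybrid_score(features: dict, weights: dict) -> float:
    """Weighted linear combination mirroring the hybrid first-phase expression."""
    return (
        weights["wTags"] * features["rawScore(tags)"]
        + weights["wPopularity"] * safe_log(features["attribute(popularity)"])
        + weights["wTitle"] * safe_log(features["bm25(title)"])
        + weights["wVector"] * features["closeness(field,embedding)"]
    )


default_weights = {"wTags": 1.0, "wPopularity": 1.0, "wTitle": 1.0, "wVector": 1.0}
# Made-up feature values for a single hit.
hit_features = {
    "rawScore(tags)": 0.0,
    "attribute(popularity)": 100.0,
    "bm25(title)": 1.0,
    "closeness(field,embedding)": 0.5,
}
score_default = hybrid_score(hit_features, default_weights)
score_low_popularity = hybrid_score(hit_features, {**default_weights, "wPopularity": 0.1})
print(score_default, score_low_popularity)
```

With all weights at 1, the popularity term log(100) dominates this hit's score; lowering wPopularity to 0.1 shrinks that term by a factor of ten, which is how a query-time weight can reorder results.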
The query returns the following result:
The result hits also include match-features, which can be used for feature logging for learning to rank, or simply to debug the various feature components used to calculate the relevance score.
In the query below, we lower the weight of the popularity factor by adjusting query(wPopularity) to 0.1:
$ vespa query \
    'yql=select title, artist from track where {targetHits:100}nearestNeighbor(embedding,q) or userQuery()' \
    'query=total eclipse of the heart' \
    'type=weakAnd' \
    'hits=2' \
    'ranking=hybrid' \
    'input.query(q)=embed(e5, "Total Eclipse Of The Heart")' \
    'input.query(wPopularity)=0.1'
Which changes the order and a different hit is surfaced at position two:
The following query adds the personalization component to the retriever mix, using the sparse user profile:
userProfile={"love songs":1, "love":1,"80s":1}
This profile can be used with the wand query operator to retrieve personalized hits for ranking.
$ vespa query \
    'yql=select title, artist from track where {targetHits:100}nearestNeighbor(embedding,q) or userQuery() or ({targetHits:10}wand(tags, @userProfile))' \
    'query=total eclipse of the heart' \
    'type=weakAnd' \
    'hits=2' \
    'ranking=hybrid' \
    'input.query(q)=embed(e5, "Total Eclipse Of The Heart")' \
    'input.query(wPopularity)=0.1' \
    'userProfile={"love songs":1, "love":1,"80s":1}'
Now we have new top ranking documents. Notice that totalCount increases as the wand query operator retrieved more hits into first-phase ranking. Also notice that the relevance score changes.
Changing from logical OR to AND will instead intersect the results of the two efficient retrievers. The search for nearest neighbors is constrained to documents that match at least one of the query terms in the weakAnd.
$ vespa query \
    'yql=select title, artist from track where {targetHits:100}nearestNeighbor(embedding,q) and userQuery()' \
    'query=total eclipse of the heart' \
    'type=weakAnd' \
    'hits=2' \
    'ranking=hybrid' \
    'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
In this case, the documents exposed to ranking must match at least one of the query terms (for WAND to retrieve them). It is also possible to combine hybrid search with filters. The following query filters both the sparse and dense retrieval on popularity:
$ vespa query \
    'yql=select title, artist from track where {targetHits:100}nearestNeighbor(embedding,q) and userQuery() and popularity < 75' \
    'query=total eclipse of the heart' \
    'type=weakAnd' \
    'hits=2' \
    'ranking=hybrid' \
    'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
Another interesting approach for hybrid retrieval is to use Vespa's rank() query operator. The first operand of the rank() operator is used for retrieval, and the remaining operands are only used to compute rank features for those hits retrieved by the first operand.
$ vespa query \
    'yql=select title, artist from track where rank({targetHits:100}nearestNeighbor(embedding,q), userQuery())' \
    'query=total eclipse of the heart' \
    'type=weakAnd' \
    'hits=2' \
    'ranking=hybrid' \
    'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
This query returns 100 documents, since only the first operand of the rank query operator was used for retrieval. The sparse userQuery() representation was only used to calculate sparse rank features, such as bm25(title), for the results retrieved by the nearestNeighbor operator.
One can also do this the other way around: retrieve using the sparse representation, and have Vespa calculate the closeness(field, embedding) or related rank features for the hits retrieved by the sparse query representation.
$ vespa query \
    'yql=select title, artist from track where rank(userQuery(),{targetHits:100}nearestNeighbor(embedding,q))' \
    'query=total eclipse of the heart' \
    'type=weakAnd' \
    'hits=2' \
    'ranking=hybrid' \
    'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
The weakAnd query operator exposes more hits to ranking than approximate nearest neighbor search, similar to the wand query operator. Generally, using the rank query operator is more efficient than combining query retriever operators using or. See also the Vespa passage ranking for complete examples of different retrieval strategies for multi-phase ranking funnels.
One can also use the rank operator to first retrieve by some filter logic, and then compute distance or similarity for the retrieved documents.
$ vespa query \
    'yql=select title, popularity, artist from track where rank(popularity>99,{targetHits:10}nearestNeighbor(embedding,q))' \
    'hits=2' \
    'ranking=closeness' \
    'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
Applications whose queries only use the nearestNeighbor operator as the second operand of rank do not need HNSW indexing on the field, which saves a lot of indexing and memory resources.
This section looks at how to use multiple nearestNeighbor query operator instances in the same Vespa query request. The following Vespa query combines two nearestNeighbor query operators using logical disjunction (OR) and referencing two different query tensor inputs:
- input.query(q) holding the Total Eclipse Of The Heart query vector.
- input.query(q1) holding the Summer of '69 query vector.

$ vespa query \
    'yql=select title from track where ({targetHits:10}nearestNeighbor(embedding,q)) or ({targetHits:10}nearestNeighbor(embedding,q1))' \
    'hits=2' \
    'ranking=closeness' \
    'input.query(q)=embed(e5, "Total Eclipse Of The Heart")' \
    'input.query(q1)=embed(e5, "Summer of 69")'
The query exposes 20 hits to first-phase ranking, as seen from totalCount, ten from each nearest neighbor query operator:
Utilizing a combination of various query embeddings within a single query request holds numerous applications, particularly in cases involving shorter queries with inherent ambiguity. In such scenarios, employing query expansion and query rewrites can facilitate retrieval by accommodating multiple interpretations.
One can also use the label query term annotation when there are multiple nearestNeighbor operators in the same query to get the distance or closeness per query vector. Notice we use the closeness-label rank-profile defined in the schema:
rank-profile closeness-label inherits closeness {
    match-features: closeness(label, q) closeness(label, q1)
}
$ vespa query \
    'yql=select title from track where ({ label:"q", targetHits:10}nearestNeighbor(embedding,q)) or ({label:"q1",targetHits:10}nearestNeighbor(embedding,q1))' \
    'hits=2' \
    'ranking=closeness-label' \
    'input.query(q)=embed(e5, "Total Eclipse Of The Heart")' \
    'input.query(q1)=embed(e5, "Summer of 69")'
The above query annotates the two nearestNeighbor query operators using the label query annotation.
Note that the previous examples used or to combine the two operators. Using and instead requires that there are documents that are in both top-k results. Increasing targetHits to 500 finds a few tracks that overlap.
$ vespa query \
    'yql=select title from track where ({label:"q", targetHits:500}nearestNeighbor(embedding,q)) and ({label:"q1",targetHits:500}nearestNeighbor(embedding,q1))' \
    'hits=2' \
    'ranking=closeness-label' \
    'input.query(q)=embed(e5, "Total Eclipse Of The Heart")' \
    'input.query(q1)=embed(e5, "Summer of 69")'
Note that the closeness-label rank profile uses closeness(field, embedding), which in the case of multiple nearest neighbor search operators uses the maximum score to represent the unlabeled closeness(field,embedding). This can be seen from the relevance value, compared with the labeled closeness() rank features.
Vespa also supports having multiple document side embedding fields, which can also be searched using multiple nearestNeighbor operators in the query.
field embedding type tensor<float>(x[384]) {
    indexing: attribute | index
    attribute {
        distance-metric: euclidean
    }
    index {
        hnsw {
            max-links-per-node: 16
            neighbors-to-explore-at-insert: 50
        }
    }
}
field embedding_two type tensor<float>(x[768]) {
    indexing: attribute | index
    attribute {
        distance-metric: euclidean
    }
    index {
        hnsw {
            max-links-per-node: 16
            neighbors-to-explore-at-insert: 50
        }
    }
}
Vespa allows developers to control how filters are combined with the nearestNeighbor query operator; see Query Time Constrained Approximate Nearest Neighbor Search for a detailed description of pre-filtering and post-filtering strategies. The following query examples explore the two query-time parameters which can be used to control the filtering behavior. The parameters are:
- ranking.matching.postFilterThreshold
- ranking.matching.approximateThreshold

These parameters can be used per query or configured in the rank-profile in the document schema.
The following query runs with the default setting for ranking.matching.postFilterThreshold, which is 1. This means that Vespa does not perform post-filtering and uses the pre-filtering strategy:
$ vespa query \
    'yql=select title, artist, tags from track where {targetHits:10}nearestNeighbor(embedding,q) and tags contains "rock"' \
    'hits=2' \
    'ranking=closeness' \
    'ranking.matching.postFilterThreshold=1.0' \
    'ranking.matching.approximateThreshold=0.05' \
    'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
The query exposes targetHits to ranking, as seen from the totalCount. Now, repeat the query, but force post-filtering instead by setting ranking.matching.postFilterThreshold=0.0:
$ vespa query \
    'yql=select title, artist, tags from track where {targetHits:10}nearestNeighbor(embedding,q) and tags contains "rock"' \
    'hits=2' \
    'ranking=closeness' \
    'ranking.matching.postFilterThreshold=0.0' \
    'ranking.matching.approximateThreshold=0.05' \
    'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
In this case, Vespa will estimate how many documents the filter matches and auto-adjust targetHits internally to a higher number, attempting to expose the targetHits to first-phase ranking:
The query exposes 16 documents to ranking, as can be seen from totalCount. There are 8420 documents in the collection that are tagged with the rock tag, so roughly 8%.
Auto-adjusting targetHits upwards for post-filtering is not always what you want, because it is slower than just retrieving from the HNSW index without constraints. We can change the targetHits adjustment factor with the ranking.matching.targetHitsMaxAdjustmentFactor parameter. In this case, we set it to 1, which disables adjusting the targetHits upwards.
$ vespa query \
    'yql=select title, artist, tags from track where {targetHits:10}nearestNeighbor(embedding,q) and tags contains "rock"' \
    'hits=2' \
    'ranking=closeness' \
    'ranking.matching.postFilterThreshold=0.0' \
    'ranking.matching.approximateThreshold=0.05' \
    'ranking.matching.targetHitsMaxAdjustmentFactor=1' \
    'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
Since we are post-filtering without upward adjusting the targetHits, we end up with fewer hits.
Changing the query to filter on a less frequent tag, for example 90s, which matches 1,695 documents or roughly 1.7%, will cause Vespa to fall back to exact search, as the estimated filter hit count is less than the approximateThreshold.
$ vespa query \
    'yql=select title, artist, tags from track where {targetHits:10}nearestNeighbor(embedding,q) and tags contains "90s"' \
    'hits=2' \
    'ranking=closeness' \
    'ranking.matching.postFilterThreshold=0.0' \
    'ranking.matching.approximateThreshold=0.05' \
    'input.query(q)=embed(e5, "Total Eclipse Of The Heart")'
The fallback to exact search will expose more than targetHits documents to ranking.
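The behavior observed in the three filter examples can be summarized as a small decision function. This is a simplified sketch inferred from the examples in this section, not Vespa's actual implementation: it assumes the exact-search fallback is checked first and that post-filtering is chosen when the estimated filter hit ratio exceeds postFilterThreshold. The function name `choose_strategy` is hypothetical; the collection size is the 95,666 documents reported earlier in this guide.

```python
def choose_strategy(filter_hit_ratio: float,
                    approximate_threshold: float = 0.05,
                    post_filter_threshold: float = 1.0) -> str:
    """Pick a nearest neighbor evaluation strategy from the estimated filter hit ratio."""
    if filter_hit_ratio < approximate_threshold:
        # Very restrictive filter: brute force over the filtered documents.
        return "exact"
    if filter_hit_ratio > post_filter_threshold:
        # Search the HNSW graph unconstrained, then apply the filter.
        return "post-filter"
    # Default: search the HNSW graph constrained by the filter.
    return "pre-filter"


TOTAL_DOCS = 95666
rock_ratio = 8420 / TOTAL_DOCS      # the "rock" tag
nineties_ratio = 1695 / TOTAL_DOCS  # the "90s" tag

print(choose_strategy(rock_ratio))                                 # pre-filter
print(choose_strategy(rock_ratio, post_filter_threshold=0.0))      # post-filter
print(choose_strategy(nineties_ratio, post_filter_threshold=0.0))  # exact
```

The three printed strategies correspond, in order, to the three tag-filter queries shown in this section.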
Read more about combining filters with nearest neighbor search in the Query Time Constrained Approximate Nearest Neighbor Search blog post.
This concludes this tutorial.
The following removes the container and the data:
$ docker rm -f vespa