Searching for the nearest neighbors of a data point in a high-dimensional vector space is a
fundamental problem in many real-world applications.
Many of these applications also require query-time constrained vector search.
For example, real-time recommender systems using vector search for candidate retrieval
need to filter recommendations by hard constraints like regional availability or age group suitability.
Likewise, search systems using vector search need to support filtering;
typical e-commerce search interfaces, for instance, let users navigate and filter search results using result facets.
Using Vespa’s nearest neighbor search
The following sections demonstrate how to use the Vespa nearestNeighbor
query operator. To learn how Vespa can create the vectors for you, see embedding.
See also the practical nearest neighbor search guide
for many query examples.
These examples use exact nearest neighbor search, which has perfect accuracy
but is computationally expensive for large document volumes, since
the distance metric must be calculated for every document that matches
the boolean query filters.
Exact nearest neighbor search scales close to linearly
with the number of threads used per query,
so using multiple threads per search can bring
exact nearest neighbor search down to an acceptable serving latency.
Using more threads per search to reduce latency is still costly for larger vector volumes
or high query throughput applications, as the number of distance calculations
involved in the query does not change with the number of threads performing the search.
A more cost-efficient approach is to use approximate search instead of exact search.
See how to use approximate nearest neighbor search with HNSW in
the Approximate Nearest Neighbor Search document.
The following Vespa document schema is used to illustrate Vespa’s support for
vector search, or nearest neighbor search:
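(A sketch; the embedding dimensionalities, 512 for the image and 384 for the text, are illustrative assumptions.)

schema product {
    document product {
        field in_stock type bool {
            indexing: summary | attribute
        }
        field popularity type float {
            indexing: summary | attribute
        }
        field image_embedding type tensor<float>(d0[512]) {
            indexing: summary | attribute
            attribute {
                distance-metric: angular
            }
        }
        field text_embedding type tensor<float>(d0[384]) {
            indexing: summary | attribute
            attribute {
                distance-metric: angular
            }
        }
    }
}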
The product document schema has four fields. Fields like popularity are self-explanatory, but the
fields of tensor type need more explanation.
The schema defines two first-order dense tensor types (vectors):
image_embedding and text_embedding. <float> specifies the
tensor value type and d0 denotes the name of the dimension.
The image_embedding field stores a dense vectorized embedding representation
of the product image and uses angular
as its distance-metric. See for example text to image search using CLIP with Vespa.
The rank-profile specifies the query input tensor names and types. The query input tensors
must be of the same dimensionality as the document vector and have the same dimension name.
Skipping the query tensor definition will cause a query time error:
Expected a tensor value of 'query(query_embedding)' but has [...]
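A sketch of such a rank-profile, defining the query input tensor and a first-phase ranking on closeness (the 512-dimensional input matches the assumed image embedding above):

rank-profile image_similarity {
    inputs {
        query(image_query_embedding) tensor<float>(d0[512])
    }
    first-phase {
        expression: closeness(field, image_embedding)
    }
}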
closeness(field, image_embedding) is a rank-feature calculated
by the nearestNeighbor query operator.
The closeness rank feature calculates a score in the range [0, 1], where 0 is infinite distance,
and 1 is zero distance. This is convenient because Vespa sorts hits by decreasing relevancy score,
and one usually wants the closest hits to be ranked highest.
The first-phase is part of Vespa’s phased ranking support. In this example
the closeness feature is re-used and documents are not re-ordered.
Phased ranking enables re-ranking of the top-k best scoring hits as ranked
or retrieved from the previous ranking phase. The computed ranking score is rendered as relevance in
the default Vespa JSON result format. If the relevance field
of the hit is 0.0, one has usually forgotten to specify the correct ranking profile.
An example of a rank-profile also specifying an additional re-ranking phase:
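(A sketch; the profile name is an illustrative assumption, and it builds on the image_similarity profile above.)

rank-profile image_similarity_with_popularity inherits image_similarity {
    second-phase {
        expression: attribute(popularity) * closeness(field, image_embedding)
    }
}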
In this case, hits retrieved by the nearest neighbor search operator are re-scored using
the product's popularity as an additional signal. The value of the popularity field can be read by the
attribute(popularity) rank-feature. The second-phase ranking expression
combines the popularity with the closeness(field, image_embedding) rank-feature using multiplication.
Indexing product data
After deploying the application package with the document schema, one
can index the product data using the
Vespa JSON feed format.
In the example below there are two documents;
the vector fields use the indexed tensor short form.
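(A sketch; the document IDs, namespace, and field values are illustrative, and the full-length vectors are truncated.)

[
    {
        "put": "id:shopping:product::1",
        "fields": {
            "in_stock": true,
            "popularity": 0.342,
            "image_embedding": { "values": [0.225, 0.117, 0.942, ...] },
            "text_embedding": { "values": [0.871, 0.026, 0.563, ...] }
        }
    },
    {
        "put": "id:shopping:product::2",
        "fields": {
            "in_stock": false,
            "popularity": 0.538,
            "image_embedding": { "values": [0.361, 0.438, 0.192, ...] },
            "text_embedding": { "values": [0.067, 0.441, 0.238, ...] }
        }
    }
]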
The above JSON-formatted data can be fed to Vespa using any of the
Vespa feeding APIs.
Querying using nearestNeighbor query operator
To query the product dataset one uses the
nearestNeighbor query operator.
The operator expects two arguments: the document tensor field to be searched and the name of the input query tensor.
The targetHits
query annotation specifies the number of results that one wants to expose to first-phase
ranking per content node involved in the query.
targetHits is a required parameter and the query will fail if not specified.
targetHits is a lower bound per content node,
and with exact search more hits than targetHits are exposed to first-phase ranking.
The query tensor is sent as a query tensor input
and the query tensor name is referenced in the second argument of the nearestNeighbor operator.
In the following example, the nearestNeighbor operator
is used to recommend similar products based on image similarity.
For a given image (e.g. a product image shown on the product search result page), one can find
products with a similar product image.
The search for nearest neighbors in the CLIP image embedding space is limited to products
where in_stock is true.
{
    "yql": "select * from product where {targetHits: 100}nearestNeighbor(image_embedding, image_query_embedding) and in_stock = true",
    "input.query(image_query_embedding)": [
        0.22507139604882176,
        0.11696498718517367,
        0.9418422036734729,
        ...
    ],
    "ranking.profile": "image_similarity",
    "hits": 10
}
The YQL query uses logical conjunction and to filter the nearest neighbors
by a constraint on the in_stock field.
targetHits specifies
the number of hits one wants to expose to ranking. The query request
also specifies hits, which determines how
many hits are returned to the client in the JSON result format.
The total number of hits ranked by the ranking profile
depends on the query filters and how fast the nearest neighbor search algorithm converges (for exact search).
The ranking.profile parameter
controls which profile is used. In this case, it simply ranks documents based on how close they are in the CLIP embedding space.
Using nearest neighbor from a searcher component
As with all query operators in Vespa, one can also build the query tree programmatically
in a custom searcher component.
The following example executes the incoming query and reads the image_embedding tensor
from the top-ranking hit, as ranked by whatever rank-profile was specified in the search request.
That tensor is then used as the input
for the nearestNeighbor query operator
to find related products (using the rank profile described in earlier sections).
package ai.vespa.example.searcher;

import com.yahoo.prelude.query.NearestNeighborItem;
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.result.Hit;
import com.yahoo.search.searchchain.Execution;
import com.yahoo.tensor.Tensor;

public class FeedbackSearcher extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        Result result = execution.search(query);
        if (result.getTotalHitCount() == 0)
            return result;
        execution.fill(result); // fetch the summary data of the hits if needed
        Hit bestHit = result.hits().get(0);
        Tensor clipEmbedding = (Tensor) bestHit.getField("image_embedding");
        Query relatedQuery = new Query();
        relatedQuery.getRanking().setProfile("image_similarity");
        relatedQuery.getRanking().getFeatures().put("query(image_query_embedding)", clipEmbedding);
        NearestNeighborItem nn = new NearestNeighborItem("image_embedding", "image_query_embedding");
        nn.setTargetNumHits(10);
        relatedQuery.getModel().getQueryTree().setRoot(nn);
        return execution.search(relatedQuery);
    }
}
It is also possible to embed the vectorization components in Vespa using
stateless ML model evaluation.
Using binary embeddings with hamming distance
The following packs a 128-bit embedding representation into a 16-dimensional dense tensor
using int8 tensor value precision (16 x 8 = 128 bits):
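(A sketch; the field name binary_embedding is an illustrative assumption.)

field binary_embedding type tensor<int8>(d0[16]) {
    indexing: summary | attribute
    attribute {
        distance-metric: hamming
    }
}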
Hamming distance search over binary vectors is implemented with xor and popcount CPU instructions.
The rank-profile specifies
num-threads-per-search to
reduce serving latency (but not cost), as in the sketch below.
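(A sketch; the profile name, thread count, and query input name are illustrative assumptions.)

rank-profile hamming_similarity {
    num-threads-per-search: 12
    inputs {
        query(binary_query_embedding) tensor<int8>(d0[16])
    }
    first-phase {
        expression: closeness(field, binary_embedding)
    }
}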