Nearest neighbor search, or vector search, is a technique used to find the closest data points to a given query point in a high-dimensional vector space. This is supported in Vespa using the nearestNeighbor query operator. This operator can be combined with other filters or query terms using the Vespa query language, making it easy to create hybrid solutions that combine modern vector-based techniques with traditional information retrieval.
Also try pyvespa examples.
A nearest neighbor search has at least these components: a document vector, a query vector, a rank profile using closeness(), and a query with the nearestNeighbor operator:
# Schema definition of the vector in documents
document doc {
    field d_vector type tensor<float>(d[3]) {
        indexing: summary | attribute
    }
}

# Rank profile definition in schema:
# - using the closeness() rank feature
# - defining the q_vector type that must match the d_vector type
rank-profile rank_docs inherits default {
    inputs {
        query(q_vector) tensor<float>(d[3])
    }
    first-phase {
        expression: closeness(field, d_vector)
    }
}

# A query with
# - a nearestNeighbor operator with document and query vectors
# - selecting the rank_docs rank profile
$ vespa query 'select * from docs where {targetHits: 3}nearestNeighbor(d_vector, q_vector)' \
    ranking=rank_docs \
    'input.query(q_vector)'='[1,2,3]'

# Documents with vectors
{
    "put": "id:mynamespace:music::a-head-full-of-dreams",
    "fields": {
        "d_vector": [0,1,2]
    }
}
The nearestNeighbor query operator calculates the values used by the closeness() rank feature.
closeness(field, d_vector) means that the closeness rank feature shall use the d_vector field.
A query can have multiple nearestNeighbor operators,
and the closeness() rank feature can refer to the different operators (using different fields).
See other examples of using closeness(label, q) in the nearest neighbor search guide.
Read more in this guide on tensor types, distance metrics, rank profiles and approximate nearest neighbor search.
A vector is represented by a tensor with one indexed dimension. Example tensor type representing a float vector with 384 dimensions:
tensor<float>(x[384])
Document vectors are stored in a tensor field defined in the document schema. A tensor type (dense) with one indexed dimension stores a single vector per document:
field doc_embedding type tensor<float>(x[384]) { indexing: attribute }
A tensor type (mixed) with one or more mapped dimensions and one indexed dimension stores multiple vectors per document:
field doc_embeddings type tensor<float>(m{},x[384]) { indexing: attribute }
Similarly, the type of a query vector is defined in a rank-profile:
rank-profile my_profile { inputs { query(query_embedding) tensor<float>(x[384]) } ... }
This all ties together with the nearestNeighbor query operator,
which expects two arguments: the name of the document tensor field to search, and the name of the input query tensor.
The operator finds the documents that are closest to the query vector
using the distance-metric defined in the tensor field.
Note that the document schema can have multiple tensor fields storing vectors,
and the query can have multiple nearestNeighbor
operators searching different tensor fields.
Support for using the nearestNeighbor operator with a mixed tensor with multiple vectors per document is available since Vespa 8.144.19.
To learn how Vespa can create the vectors for you, see embedding.
The following sections demonstrate how to use the nearestNeighbor query operator.
These examples use exact nearest neighbor search, which has perfect accuracy but is computationally expensive for large document volumes, since the distance metric must be calculated for every document that matches the boolean query filters. See also the practical nearest neighbor search guide for more examples.
Exact nearest neighbor search scales close to linearly with the number of threads used per query. This can be used to make exact nearest neighbor search run with acceptable serving latency. Using more threads per search to reduce latency is still costly for larger vector volumes or high query throughput applications, as the number of distance calculations involved in the query does not change with the number of threads performing the search.
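To make the cost concrete: exact search is conceptually a scan that computes one distance per candidate document. A minimal Python sketch of this idea, using hypothetical data (this is not Vespa's implementation):

```python
import math

# Sketch of what exact nearest neighbor search does conceptually:
# one distance calculation per document passing the filters, O(n * d).
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Illustrative mini-corpus of 3-dimensional document vectors
documents = {
    "doc1": [0.0, 1.0, 2.0],
    "doc2": [1.0, 2.0, 3.0],
    "doc3": [9.0, 9.0, 9.0],
}

def exact_nn(query, docs, target_hits):
    # Score every candidate, then keep the target_hits closest
    scored = [(euclidean(query, v), doc_id) for doc_id, v in docs.items()]
    return [doc_id for _, doc_id in sorted(scored)[:target_hits]]

print(exact_nn([1.0, 2.0, 3.0], documents, 2))  # ['doc2', 'doc1']
```

Multiple threads can partition the scan, which reduces latency but not the total number of distance computations.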
A cost-efficient approach is to use approximate search instead.
See how to use approximate nearest neighbor search with HNSW
in
the Approximate Nearest Neighbor Search document.
The following document schema is used to illustrate Vespa's support for vector search, or nearest neighbor search:
schema product {

    document product {

        field in_stock type bool {
            indexing: summary | attribute
            rank: filter
            attribute: fast-search
        }

        field popularity type float {
            indexing: summary | attribute
        }

        field text_embedding type tensor<float>(x[384]) {
            indexing: summary | attribute
            attribute {
                distance-metric: prenormalized-angular
            }
        }

        field image_embeddings type tensor<float>(i{},x[512]) {
            indexing: summary | attribute
            attribute {
                distance-metric: angular
            }
        }
    }
}
The product document schema has four fields. The fields of type tensor represent vector embeddings:

text_embedding - a float vector with 384 dimensions.
image_embeddings - multiple float vectors with 512 dimensions.

The text_embedding field stores a dense vectorized embedding representation of the product description and uses prenormalized-angular as distance-metric. See for example Dense Retrieval using bi-encoders over Transformer models.

The image_embeddings field stores multiple dense vectorized embedding representations of the product images and uses angular as distance-metric. See for example text to image search using CLIP with Vespa.
Vespa supports six different distance-metrics:
euclidean
angular
dotproduct
prenormalized-angular
hamming
geodegrees
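As a rough illustration of what some of these metrics compute, here are sketches of three of them in Python. These are conceptual definitions only, not Vespa's internal implementation; prenormalized-angular, hamming and geodegrees follow similar patterns with their own formulas.

```python
import math

# Illustrative definitions of three of the distance metrics above.
def euclidean(a, b):
    # Straight-line distance between the two vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dotproduct_distance(a, b):
    # A larger dot product means closer, so the distance is its negation
    return -sum(x * y for x, y in zip(a, b))

def angular(a, b):
    # Angle between the vectors, derived from the cosine similarity
    cos = sum(x * y for x, y in zip(a, b)) / (
        math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    )
    return math.acos(max(-1.0, min(1.0, cos)))

print(euclidean([0, 0], [3, 4]))          # 5.0
print(round(angular([1, 0], [0, 1]), 2))  # 1.57 (90 degrees, in radians)
```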
The distance-metric is a property of the field.
This is obvious when index
is set on the vector field,
as the distance metric is used when building the index structure.
Without index
, no extra data structures are built for the field,
and the distance-metric setting is used when calculating the distance at query-time.
It still makes sense to have the metric as a field-property,
as the field values are often produced using a specific distance metric.
Lastly, one needs to configure how to rank products which are retrieved by the nearest neighbor search:
rank-profile semantic_similarity {
    inputs {
        query(query_embedding) tensor<float>(x[384])
    }
    first-phase {
        expression: closeness(field, text_embedding)
    }
}

rank-profile image_similarity {
    inputs {
        query(image_query_embedding) tensor<float>(x[512])
    }
    first-phase {
        expression: closeness(field, image_embeddings)
    }
}
The rank-profile
specifies the query input tensor names and types. The query input tensors
must be of the same dimensionality as the document vector and have the same dimension name.
Skipping the query tensor definition will cause a query time error:
Expected a tensor value of 'query(query_embedding)' but has [...]
The closeness(field, text_embedding)
is a rank-feature
calculated by the nearestNeighbor query operator.
This calculates a score in the range [0, 1], where 0 is infinite distance,
and 1 is zero distance. This is convenient because Vespa sorts hits by decreasing relevancy score,
and one usually wants the closest hits to be ranked highest.
The closeness(field, image_embeddings)
rank-feature
operates over a tensor field that stores multiple vectors per document.
For each document, the vector that is closest to the query vector is used in the calculation.
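These semantics can be sketched in Python. The closeness-from-distance mapping below assumes the common 1 / (1 + distance) definition used for most distance metrics; the data is illustrative.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closeness(distance):
    # Maps a distance in [0, inf) to a score in (0, 1];
    # 1 means zero distance, and the score approaches 0 as distance grows
    return 1.0 / (1.0 + distance)

def multi_vector_closeness(query, doc_vectors):
    # For a field with multiple vectors per document, the vector
    # closest to the query determines the document's closeness score
    return closeness(min(euclidean(query, v) for v in doc_vectors))

doc_vectors = [[0.0, 0.0], [3.0, 4.0]]
print(closeness(euclidean([0.0, 0.0], [3.0, 4.0])))   # ~0.167
print(multi_vector_closeness([0.0, 0.0], doc_vectors))  # 1.0 (exact match)
```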
The first-phase
is part of Vespa's phased ranking support.
Phased ranking enables re-ranking of the top-k best scoring hits as ranked
or retrieved from the previous ranking phase. The computed ranking score is rendered as relevance
in the default Vespa JSON result format. If the relevance field
of the hit is 0.0, one has usually forgotten to specify the correct ranking profile.
An example of a rank-profile
also specifying an additional re-ranking phase:
rank-profile image_similarity_with_reranking {
    inputs {
        query(image_query_embedding) tensor<float>(x[512])
    }
    first-phase {
        expression: closeness(field, image_embeddings)
    }
    second-phase {
        rerank-count: 1000
        expression: closeness(field, image_embeddings) * attribute(popularity)
    }
}
In this case, hits retrieved by the
nearestNeighbor
query operator are re-scored also using
the product's popularity as a signal. The value of the popularity
field can be read by the
attribute(popularity)
rank-feature. The second-phase
ranking expression
combines the popularity with the closeness(field, image_embeddings)
rank-feature using multiplication.
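The two-phase flow can be sketched as follows, with illustrative scores standing in for the closeness and popularity values Vespa would compute:

```python
# Sketch of phased ranking: first-phase scores all retrieved hits by
# closeness; second-phase re-scores only the top hits, multiplying
# closeness by popularity. Data is illustrative.
hits = [
    # (doc_id, closeness, popularity)
    ("a", 0.90, 0.10),
    ("b", 0.80, 0.90),
    ("c", 0.50, 0.95),
]

RERANK_COUNT = 2  # mirrors rerank-count: only the best first-phase hits

first_phase = sorted(hits, key=lambda h: h[1], reverse=True)
top = first_phase[:RERANK_COUNT]

# second-phase expression: closeness * popularity
second_phase = sorted(top, key=lambda h: h[1] * h[2], reverse=True)
print([doc_id for doc_id, _, _ in second_phase])  # ['b', 'a']
```

Note that document c never reaches the second phase despite its high popularity, because it was cut by rerank-count after the first phase.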
After deploying the application package with the document schema, one can index the product data using the Vespa JSON feed format.
In the example below there are two documents. The vector embedding fields are using indexed tensor short form and mixed tensor short form:
[
{
"put": "id:shopping:product::998211",
"fields": {
"in_stock": true,
"popularity": 0.342,
"text_embedding": [
0.16766378547490635,
0.3737005826272204,
0.040492891373747675,
..
],
"image_embeddings": {
"0": [
0.9147281579191466,
0.5453696694173961,
0.7529545687063771,
..
],
"1": [0.3737005826272204, ..]
}
}
},
{
"put": "id:shopping:product::97711",
"fields": {
"in_stock": false,
"popularity": 0.538,
"text_embedding": [
0.03515862084651311,
0.24585168798559187,
0.6123057708571111,
..
],
"image_embeddings": {
"0": [
0.9785931815169806,
0.5697209315543527,
0.5352198004501647,
..
],
"1": [0.24585168798559187, ..]
}
}
}
]
The above JSON formatted data can be fed to Vespa using any of the Vespa feeding APIs.
The nearestNeighbor query operator is used to search the product dataset. The operator expects two arguments: the document tensor field which is searched and the input query tensor name.
The targetHits
query annotation specifies the number of results to expose to first-phase
ranking per content node involved in the query.
targetHits
is a required parameter and the query will fail if not specified.
The targetHits
is a lower bound per content node,
and with exact search more hits than targetHits
are exposed to first-phase
ranking.
The query tensor is sent as a query input
and the query tensor name is referenced in the second argument of the nearestNeighbor
operator.
In the following example, the nearestNeighbor
operator
is used to recommend similar products based on image similarity.
For a given image (e.g. a product image shown in the product search result page) one can find
products which have similar product images.
Note that the nearest neighbors search is limited to products where in_stock
is true
.
The overall query is specified using the Vespa query language using the Query API:
{
"yql": "select * from product where {targetHits: 100}nearestNeighbor(image_embeddings, image_query_embedding) and in_stock = true",
"input.query(image_query_embedding)": [
0.22507139604882176,
0.11696498718517367,
0.9418422036734729,
..
],
"ranking.profile": "image_similarity",
"hits": 10
}
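A request body like the one above can also be built programmatically. The sketch below constructs the same JSON structure in Python; the endpoint URL in the comment is an assumption for a local Vespa instance, and the query vector is illustrative:

```python
def build_query(query_vector, target_hits=100, hits=10):
    # Builds the Vespa query request body shown above
    return {
        "yql": (
            f"select * from product where "
            f"{{targetHits: {target_hits}}}"
            f"nearestNeighbor(image_embeddings, image_query_embedding) "
            f"and in_stock = true"
        ),
        "input.query(image_query_embedding)": query_vector,
        "ranking.profile": "image_similarity",
        "hits": hits,
    }

body = build_query([0.1] * 512)
print(body["yql"])
# POST this body as JSON to the Vespa query endpoint, e.g.
#   requests.post("http://localhost:8080/search/", json=body)
```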
The YQL query uses logical conjunction and
to filter the nearestNeighbor
by a constraint on the in_stock
field.
The query request also specifies hits, which determines how many hits are returned to the client using the JSON result format.
The total number of hits which is ranked by the ranking profile depends on the query filters and how fast the nearest neighbor search algorithm converges (for exact search).
The ranking.profile parameter controls which ranking profile is used. In this case, it simply ranks documents based on how close they are in the CLIP embedding space.
As with all query operators in Vespa, one can build the query tree programmatically
in a custom searcher component.
See
Centroids
in Billion-Scale Image Search
for an example of how the NearestNeighborItem
is used.
The following packs a 128-bit embedding representation into a 16-dimensional dense tensor
using int8 tensor value precision (16 x 8 = 128 bits):
document vector {
    field vector type tensor<int8>(x[16]) {
        indexing: summary | attribute
        attribute {
            distance-metric: hamming
        }
    }
}

rank-profile hamming-nn {
    num-threads-per-search: 12
    first-phase {
        expression: closeness(field, vector)
    }
}
Hamming distance search over binary vectors is implemented with XOR and popcount CPU instructions. The rank-profile specifies num-threads-per-search to reduce serving latency (but not cost).
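The bit-packing and the XOR + popcount distance can be sketched in Python. This is an illustration of the concept with made-up bit patterns, not Vespa's implementation:

```python
def pack_bits(bits):
    """Pack 128 0/1 values into 16 signed int8 values (8 bits per cell)."""
    assert len(bits) == 128
    packed = []
    for i in range(0, 128, 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        # int8 cells are signed, so map 0..255 into -128..127
        packed.append(byte - 256 if byte > 127 else byte)
    return packed

def hamming(a, b):
    # Hamming distance between packed vectors: XOR each byte pair,
    # then count the set bits (popcount)
    return sum(bin((x ^ y) & 0xFF).count("1") for x, y in zip(a, b))

v1 = pack_bits([1, 0] * 64)
v2 = pack_bits([0, 1] * 64)
print(hamming(v1, v1))  # 0
print(hamming(v1, v2))  # 128: every bit differs
```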
See the Billion Scale Vector Search with Vespa blog post for a detailed introduction to using binary vectors with hamming distance.