Nearest Neighbor Search

Nearest neighbor search, or vector search, is a technique used to find the closest data points to a given query point in a high-dimensional vector space. This is supported in Vespa using the nearestNeighbor query operator. This operator can be combined with other filters or query terms using the Vespa query language, making it easy to create hybrid solutions that combine modern vector based techniques with traditional information retrieval.

Also try pyvespa examples.

Minimal example

A nearest neighbor search has at least these components: a document vector, a query vector, a rank profile using closeness() and a query with the nearestNeighbor operator:

# Schema definition of the vector in documents
    document doc {

        field d_vector type tensor<float>(d[3]) {
            indexing: summary | attribute
        }

    }

# Rank profile definition in schema:
#  - using the closeness() rank feature
#  - defining the q_vector type that must match the d_vector type
    rank-profile rank_docs inherits default {
        inputs {
            query(q_vector) tensor<float>(d[3])
        }
        first-phase {
            expression: closeness(field, d_vector)
        }
    }

# A query with
#  - a nearestNeighbor operator with document and query vectors
#  - selecting the rank_docs rank profile
$ vespa query 'select * from docs where {targetHits: 3}nearestNeighbor(d_vector, q_vector)' \
  ranking=rank_docs \
  'input.query(q_vector)'='[1,2,3]'

# Documents with vectors
{
    "put": "id:mynamespace:music::a-head-full-of-dreams",
    "fields": {
        "d_vector": [0,1,2]
    }
}

The nearestNeighbor query operator will calculate values used by the closeness() rank feature.

Read more in this guide on tensor types, distance metrics, rank profiles and approximate nearest neighbor search.

Vectors

A vector is represented by a tensor with one indexed dimension. Example tensor type representing a float vector with 384 dimensions:

tensor<float>(x[384])

Document vectors are stored in a tensor field defined in the document schema. A tensor type (dense) with one indexed dimension stores a single vector per document:

field doc_embedding type tensor<float>(x[384]) {
    indexing: attribute
}

A tensor type (mixed) with one or more mapped dimensions and one indexed dimension stores multiple vectors per document:

field doc_embeddings type tensor<float>(m{},x[384]) {
    indexing: attribute
}

Similarly, the type of a query vector is defined in a rank-profile:

rank-profile my_profile {
    inputs {
        query(query_embedding) tensor<float>(x[384])
    }
    ...
}

This all ties together with the nearestNeighbor query operator that expects two arguments; the document tensor field name which is searched and the input query tensor name. The operator finds the documents that are closest to the query vector using the distance-metric defined in the tensor field. Note that the document schema can have multiple tensor fields storing vectors, and the query can have multiple nearestNeighbor operators searching different tensor fields.

Support for using the nearestNeighbor operator with a mixed tensor with multiple vectors per document is available in Vespa 8.144.19 .

To learn how Vespa can create the vectors for you, see embedding.

The following sections demonstrates how to use the nearestNeighbor query operator.

These examples use exact nearest neighbor search with perfect accuracy, but which is computationally expensive for large document volumes since the distance metric must be calculated for every document that matches the boolean query filters. See also the practical nearest neighbor search guide for more examples.

Exact nearest neighbor search scales close to linearly with number of threads used per query. This can be used to make exact nearest neighbor search run with acceptable serving latency. Using more threads per search to reduce latency is still costly for larger vector volumes or high query throughput applications, as the number of distance calculations involved in the query does not change by changing number of threads performing the search.

A cost-efficient approach is to use approximate search instead. See how to use approximate nearest neighbor search with HNSW in the Approximate Nearest Neighbor Search document.

The following document schema is used to illustrate Vespa's support for vector search, or nearest neighbor search:

schema product {

    document product {

        field in_stock type bool {
            indexing: summary | attribute
            rank: filter
            attribute: fast-search
        }

        field popularity type float {
            indexing: summary | attribute
        }

        field text_embedding type tensor<float>(x[384]) {
            indexing: summary | attribute
            attribute {
                distance-metric: prenormalized-angular
            }
        }

        field image_embeddings type tensor<float>(i{},x[512]) {
            indexing: summary | attribute
            attribute {
                distance-metric: angular
            }
        }

    }

}

The product document schema has 4 fields. The fields of type tensor represent vector embeddings:

  • text_embedding - float vector with 384 dimensions.
  • image_embeddings - multiple float vectors with 512 dimensions.

The text_embedding field stores a dense vectorized embedding representation of the product description and which use prenormalized-angular as distance-metric. See for example Dense Retrieval using bi-encoders over Transformer models.

The image_embeddings field stores multiple dense vectorized embedding representations of the product images and which use angular as distance-metric. See for example text to image search using CLIP with Vespa.

Vespa supports six different distance-metrics:

  • euclidean
  • angular
  • dotproduct
  • prenormalized-angular
  • hamming
  • geodegrees

The distance-metric is a property of the field. This is obvious when index is set on the vector field, as the distance metric is used when building the index structure. Without index, no extra data structures are built for the field, and the distance-metric setting is used when calculating the distance at query-time. It still makes sense to have the metric as a field-property, as the field values are often produced using a specific distance metric.

Lastly, one need to configure how to rank products which are retrieved by the nearest neighbor search:

rank-profile semantic_similarity {
    inputs {
        query(query_embedding) tensor<float>(x[384])
    }
    first-phase {
        expression: closeness(field, text_embedding)
    }
}

rank-profile image_similarity {
    inputs {
        query(image_query_embeddings) tensor<float>(x[512]])
    }
    first-phase {
        expression: closeness(field, image_embeddings)
    }
}

The rank-profile specifies the query input tensor names and types. The query input tensors must be of the same dimensionality as the document vector and have the same dimension name.

Skipping the query tensor definition will cause a query time error:

Expected a tensor value of 'query(query_embedding)' but has [...]

The closeness(field, text_embedding) is a rank-feature calculated by the nearestNeighbor query operator. This calculates a score in the range [0, 1], where 0 is infinite distance, and 1 is zero distance. This is convenient because Vespa sorts hits by decreasing relevancy score, and one usually want the closest hits to be ranked highest.

The closeness(field, image_embeddings) rank-feature operators over a tensor field that stores multiple vectors per document. For each document, the vector that is closest to the query vector is used in the calculation.

The first-phase is part of Vespa's phased ranking support. Phased ranking enables re-ranking of the top-k best scoring hits as ranked or retrieved from the previous ranking phase. The computed ranking score is rendered as relevance in the default Vespa JSON result format. If the relevance field of the hit becomes 0.0 one usually have forgotten to specify the correct ranking profile.

An example of a rank-profile also specifying an additional re-ranking phase:

rank-profile image_similarity_with_reranking {
    inputs {
        query(image_query_embedding) tensor<float>(x[512]])
    }
    first-phase {
        expression: closeness(field, image_embeddings)
    } 
    second-phase {
        rerank-count: 1000
        expression: closeness(field, image_embeddings) * attribute(popularity)
    }
}

In this case, hits retrieved by the nearestNeighbor query operator are re-scored also using the product's popularity as a signal. The value of the popularity field can be read by the attribute(popularity) rank-feature. The second-phase ranking expression combines the popularity with the closeness(field, image_embeddings) rank-feature using multiplication.

Indexing product data

After deploying the application package with the document schema, one can index the product data using the Vespa JSON feed format.

In the example below there are two documents. The vector embedding fields are using indexed tensor short form and mixed tensor short form:

[
    {
        "put": "id:shopping:product::998211",
        "fields": {
            "in_stock": true,
            "popularity": 0.342,
            "text_embedding": [
                0.16766378547490635,
                0.3737005826272204,
                0.040492891373747675,
                ..
            ],
            "image_embeddings": {
                "0": [
                    0.9147281579191466,
                    0.5453696694173961,
                    0.7529545687063771,
                    ..
                ],
                "1": [0.3737005826272204, ..]
            }
        }
    },
    {
        "put": "id:shopping:product::97711",
        "fields": {
            "in_stock": false,
            "popularity": 0.538,
            "text_embedding": [
                0.03515862084651311,
                0.24585168798559187,
                0.6123057708571111,
                ..
            ],
            "image_embeddings": {
                "0": [
                    0.9785931815169806,
                    0.5697209315543527,
                    0.5352198004501647,
                    ..
                ],
                "1": [0.24585168798559187, ..]
            }
        }
    }
]

The above JSON formatted data can be fed to Vespa using any of the Vespa feeding APIs.

Querying using nearestNeighbor query operator

The nearestNeighbor query operator is used to search the product dataset. The operator expects two arguments; the document tensor field which is searched and the input query tensor name.

The targetHits query annotation specifies the number of results to expose to first-phase ranking per content node involved in the query. targetHits is a required parameter and the query will fail if not specified. The targetHits is a lower bound per content node, and with exact search more hits than targetHits are exposed to first-phase ranking.

The query tensor is sent as a query input and the query tensor name is referenced in the second argument of the nearestNeighbor operator. In the following example, the nearestNeighbor operator is used to recommend similar products based on image similarity. For a given image (e.g. a product image shown in the product search result page) one can find products which have similar product images.

Note that the nearest neighbors search is limited to products where in_stock is true.

The overall query is specified using the Vespa query language using the Query API:

{
    "yql": "select * from product where {targetHits: 100}nearestNeighbor(image_embeddings, image_query_embedding) and in_stock = true",
    "input.query(image_query_embedding)": [
        0.22507139604882176,
        0.11696498718517367,
        0.9418422036734729,
        ..
    ],
    "ranking.profile": "image_similarity",
    "hits": 10
}

The YQL query uses logical conjunction and to filter the nearestNeighbor by a constraint on the in_stock field.

The query request also specifies hits, which determines how many hits are returned to the client using the JSON result format.

The total number of hits which is ranked by the ranking profile depends on the query filters and how fast the nearest neighbor search algorithm converges (for exact search).

The ranking.profile parameter controls which ranking profile is used. In this case, it simply ranks documents based on how close they are in the CLIP embedding space.

Using Nearest Neighbor from a Searcher Component

As with all query operators in Vespa, one can build the query tree programmatically in a custom searcher component. See Centroids in Billion-Scale Image Search for an example of how the NearestNeighborItem is used.

Using binary embeddings with hamming distance

The following packs a 128 bit embedding representation into a 16 dimensional dense tensor using int8 tensor value precision (16 x 8 = 128 bit):

document vector {
    field vector type tensor<int8>(x[16]) {
        indexing: summary | attribute
        attribute {
            distance-metric: hamming
        }
    }
}
rank-profile hamming-nn {
    num-threads-per-search: 12
    first-phase {
        expression: closeness(field,vector)
    }
}

Hamming distance search over binary vectors is implemented with xor and pop count cpu instructions. The rank-profile specifies num-threads-per-search to reduce serving latency (but not cost).

See the Billion Scale Vector Search with Vespa blog post for a detailed introduction to using binary vectors with hamming distance.