
Approximate Nearest Neighbor Search using HNSW Index

For an introduction to nearest neighbor search, see the Nearest Neighbor Search documentation, and see the practical nearest neighbor search guide for a step-by-step walkthrough. This document describes how to speed up searches for nearest neighbors by adding an HNSW index to the tensor field. Vespa implements a modified version of the Hierarchical Navigable Small World (HNSW) graph algorithm (paper). The implementation in Vespa supports:

  • Metadata Filtering - The search for nearest neighbors can be constrained by query filters, as the nearest neighbor search in Vespa is expressed as a query operator. The query operator can be combined with other filters or query terms using the Vespa query language.
  • Real-Time Indexing - Add, remove and update vectors in the index with low latency, using the document JSON format for tensors (see the feed sketch after this list).
  • Mutable HNSW Index - The index is updated in place, so there is no query overhead from having to search multiple HNSW indexes.
  • Multi-threaded Indexing - The costly part of changing the index is searching the graph layers to find which parts to change. This work is done by multiple threads, controlled by the feeding concurrency setting.
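
As a minimal sketch of real-time indexing, a vector can be fed using the document JSON format for tensors. The example below targets the image_sift_encoding field defined in the schema further down; the document id and document type are placeholders:

{
  "put": "id:mynamespace:mydoctype::image-1",
  "fields": {
    "image_sift_encoding": {
      "values": [0.21, 0.12, 0.65, ....]
    }
  }
}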

The query examples in nearest neighbor search use the brute-force variant, which has perfect accuracy but is computationally expensive for large document volumes, as the distance must be calculated for every document that matches the query filters. To enable fast approximate matching, the tensor field needs an index directive. The same document type can have multiple HNSW indexes:

field image_sift_encoding type tensor<float>(x[128]) {
  indexing: summary | attribute | index
  attribute {
    # Distance metric used both when building the index and when searching it
    distance-metric: euclidean
  }
  index {
    hnsw {
      # Max number of links per node in the graph; higher values give
      # better accuracy at the cost of memory and indexing speed
      max-links-per-node: 16
      # Number of neighbors explored when inserting a vector; higher values
      # give better accuracy at the cost of indexing speed
      neighbors-to-explore-at-insert: 500
    }
  }
}

field bert_embedding type tensor<float>(x[384]) {
  indexing: summary | attribute | index
  attribute {
    distance-metric: angular
  }
  index {
    hnsw {
      max-links-per-node: 24
      neighbors-to-explore-at-insert: 500
    }
  }
}

In the example above, fast approximate search is enabled by building an HNSW index over the image_sift_encoding and the bert_embedding tensor fields. The two fields use different distance-metric and max-links-per-node settings. See the HNSW index reference for details on the index parameters. The values chosen for these parameters affect accuracy, query speed, and indexing performance.

The tensors used in queries must have their type declared in a query profile in the application package as discussed in the exact nearest neighbor search document.
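
A minimal sketch of such a declaration, assuming the query tensor used in the examples below is named query_vector_sift and matches the 128-dimensional image_sift_encoding field (file locations follow the standard application package layout):

search/query-profiles/types/root.xml:

<query-profile-type id="root">
  <field name="ranking.features.query(query_vector_sift)"
         type="tensor&lt;float&gt;(x[128])" />
</query-profile-type>

search/query-profiles/default.xml:

<query-profile id="default" type="root" />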

Using fast approximate nearest neighbor search in queries

With an HNSW index enabled on the tensor field, one can choose between approximate and exact (brute-force) search by using the approximate annotation:

{
  "yql": "select title,price from sources * where ({targetHits: 10, approximate:false}nearestNeighbor(image_sift_encoding,query_vector_sift)) and in_stock = true",
  "hits": 100
  "ranking.features.query(query_vector_sift)": [0.21,0.12,....],
  "ranking.profile": "image_similarity" 
}

By default, approximate is true when searching a tensor field with an HNSW index.

The approximate parameter makes it possible to measure the accuracy loss of the approximation: run the same query with exact search, compare the two result sets, and compute an accuracy metric such as recall@k.

In addition to targetHits, the hnsw.exploreAdditionalHits parameter controls how many extra nodes in the graph (in addition to targetHits) are explored during the graph search. Exploring more nodes gives better quality (accuracy compared with brute force) but worse performance.

{
  "yql": "select title,price from sources * where ({targetHits: 10, hnsw.exploreAdditionalHits:100}nearestNeighbor(image_sift_encoding,query_vector_sift)) and in_stock = true",
  "hits": 10,
  "ranking.features.query(query_vector_sift)": [0.21,0.12,....],
  "ranking.profile": "image_similarity" 
}

Nearest Neighbor Search Considerations

  • When using the nearestNeighbor operator the total hit count is not accurate. This is similar to the behavior of weakAnd and wand, and applies to both approximate and exact (brute-force) search.
  • When using the nearestNeighbor operator the grouping counts are not accurate.
  • Changing the distance-metric for a tensor field with an HNSW index requires re-starting. Similarly, changing max-links-per-node or neighbors-to-explore-at-insert requires re-starting.
  • Vespa tensor fields are in-memory data structures, and so is the HNSW index. For large datasets, the raw tensor field is the primary driver of memory footprint.

The HNSW search algorithm is sub-linear (close to log(N)). This has interesting properties when adding more nodes horizontally: even if the document volume per node is reduced by a factor of 10, the latency is only reduced by about 50%. Pure HNSW search applications should therefore scale up by using a larger node flavor, maximizing the number of vectors per node. To scale with throughput (query volume), add groups (replicas). See Sizing search.

Using lower precision, for example bfloat16 instead of float, reduces memory usage by 50% and enables putting more documents per node.
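
For example, the bert_embedding field above could be declared with bfloat16 cells instead of float. A sketch, keeping all other settings from the earlier example:

field bert_embedding type tensor<bfloat16>(x[384]) {
  indexing: summary | attribute | index
  attribute {
    distance-metric: angular
  }
  index {
    hnsw {
      max-links-per-node: 24
      neighbors-to-explore-at-insert: 500
    }
  }
}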