
Nearest Neighbor Search

Searching for the nearest neighbors of a data point in high dimensional vector space is a fundamental problem for many real-time serving applications. In many of these real-world applications of nearest neighbor search, search is constrained by query filters applied over the data points metadata. Expressing the search for nearest neighbors as a native Vespa query operator allows combining the search with traditional query filters using boolean logic.

The following sections demonstrate how to use the nearestNeighbor query operator in Vespa.

These examples use exact nearest neighbor search with perfect accuracy. Exact search is computationally expensive for large document volumes, since the distance metric must be calculated for every document that matches the boolean query filters.

Exact nearest neighbor search scales linearly with the number of threads used per query. Using multiple threads per search can bring exact search down to acceptable serving latency. Using more threads per search to reduce latency is still costly, but the results are exact without any accuracy loss. See the Sizing search documentation.

See how to use approximate nearest neighbor search with HNSW in the Approximate Nearest Neighbor Search guide.

The examples below with the nearestNeighbor query operator use data and inspiration from an E-Commerce use case.

  • Related products by product image similarity using SIFT image encodings
  • Related products using text (title) embeddings similarity from a Transformer model (BERT)
  • Related products using a hybrid combination of the above
  • Hybrid Search using traditional keyword matching in combination with dense vector retrieval

The following is the document schema:

schema product {

    document product {

        field title type string {
            indexing: summary | index
        }

        field in_stock type bool {
            indexing: summary | attribute
        }

        field price type float {
            indexing: summary | attribute
        }

        field popularity type float {
            indexing: summary | attribute
        }

        field image_sift_encoding type tensor<float>(x[128]) {
            indexing: summary | attribute
            attribute {
                distance-metric: euclidean
            }
        }

        field bert_embedding type tensor<float>(x[384]) {
            indexing: summary | attribute
            attribute {
                distance-metric: angular
            }
        }
    }
}

The product document type is made up of six fields. Fields like title, in_stock, price and popularity are self-explanatory, but the tensor type fields need more explanation.

The image_sift_encoding field stores the scale-invariant feature transform (SIFT) encoding of a product catalog image. The tensor definition specifies an indexed (dense) tensor with exactly 128 dimensions, and <float> defines the tensor cell value precision type. When searching for nearest neighbors in the SIFT embedding vector space, one uses the euclidean distance metric.

The bert_embedding field stores a dense vectorized embedding representation of the product, using angular as the distance metric. See Dense Retrieval using bi-encoders over Transformer models.

Vespa supports five distance metrics to use when searching for nearest neighbors:

  • euclidean
  • angular
  • innerproduct
  • geodegrees
  • hamming

The first two are general distance metrics; the last three may only be used for specific types of data. hamming is for example useful for binary embeddings using int8 tensor precision, where the hamming distance is implemented using efficient bitwise operations. See the distance-metric reference documentation for details.

A user also needs to define the query tensors in the application package; see query feature types for details on query tensor feature types.

The search/query-profiles/types/root.xml file defines the query tensor inputs used by the nearest neighbor search operator.

<query-profile-type id="root">
    <field name="ranking.features.query(query_vector_sift)" type="tensor<float>(x[128])" />
    <field name="ranking.features.query(query_vector_bert)" type="tensor<float>(x[384])" />
</query-profile-type>

Naturally, these use the same number of dimensions as the corresponding document side field types. To make use of these tensor definitions, a type reference is needed in search/query-profiles/default.xml.

<query-profile id="default" type="root">
</query-profile>

Skipping these steps will cause a query time error:

Expected a tensor value of 'query(query_vector_sift)' but has [...]

Lastly, one needs to configure how to rank the products, which is done in a rank profile in the schema:

rank-profile image-similarity inherits default {
    first-phase {
        expression: closeness(field, image_sift_encoding)
    }
}

The closeness(field, image_sift_encoding) Vespa rank feature returns a value in the range [0, 1], where 0 is infinite distance and 1 is zero distance. This is convenient because Vespa sorts hits by decreasing relevancy score, and one usually wants the closest hits to be ranked highest. The first-phase is part of Vespa's phased ranking support. Phased ranking, or multi-stage document ranking, enables re-ranking of the top-k subset of the hits retrieved from the previous stage. The computed ranking score is found in the relevance field of the rendered search result. If this is 0.0, one has usually forgotten to specify the correct ranking profile.
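As a sketch of the mapping this feature implements (assuming the 1/(1 + distance) transform documented for the closeness rank feature):

```python
def closeness(distance: float) -> float:
    """Map a non-negative distance to a score in (0, 1],
    sketching the 1/(1 + distance) transform: zero distance
    gives the maximum score 1, larger distances approach 0."""
    return 1.0 / (1.0 + distance)

scores = [closeness(d) for d in (0.0, 1.0, 9.0)]
```

This illustrates why sorting by decreasing closeness ranks the nearest documents highest.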

Indexing product data

After deploying the application package with the document schema, one can index the data using the json feed format. In the example below there are two documents, the tensor fields are represented using indexed tensor short form.

[
  {
    "put": "id:shopping:product::998211",
    "fields": {
      "title": "Linen Blend Wide Leg Trousers",
      "in_stock": true,
      "price": 29.95,
      "popularity": 0.342,
      "image_sift_encoding": {
        "values": [
          0.9147281579191466,
          0.5453696694173961,
          0.7529545687063771,
          ..
         ]
      },
       "bert_embedding": {
        "values": [
          0.16766378547490635,
          0.3737005826272204,
          0.040492891373747675,
          ..
         ]
        }
      }
  },
  {
    "put": "id:shopping:product::97711",
    "fields": {
      "title": "Loose-fit White Pens",
      "in_stock": false,
      "price": 99.99,
      "popularity": 0.538,
      "image_sift_encoding": {
        "values": [
          0.9785931815169806,
          0.5697209315543527,
          0.5352198004501647,
          ..
        ]
      },
      "bert_embedding": {
        "values": [
          0.03515862084651311,
          0.24585168798559187,
          0.6123057708571111,
          ..
        ]
      }
    }
  }
]

The above JSON data can be fed to Vespa using any of the Vespa feeding APIs. Vespa only supports real-time indexing, so there is no explicit batch support; documents become searchable as they are fed to the system. Tensor fields can be partially updated.
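As an illustration, a single document like the ones above could be put over HTTP with the /document/v1 API; a minimal Python sketch that only builds the request (host, port and the shortened vectors are placeholder assumptions):

```python
import json

# /document/v1 endpoint shape: /document/v1/<namespace>/<doctype>/docid/<id>
doc_id = "998211"
url = f"http://localhost:8080/document/v1/shopping/product/docid/{doc_id}"  # host/port assumed

payload = {
    "fields": {
        "title": "Linen Blend Wide Leg Trousers",
        "in_stock": True,
        "price": 29.95,
        "popularity": 0.342,
        # Vectors shortened here for brevity; real ones have 128 and 384 values.
        "image_sift_encoding": {"values": [0.91, 0.54, 0.75]},
        "bert_embedding": {"values": [0.16, 0.37, 0.04]},
    }
}
body = json.dumps(payload)
# An HTTP client would POST `body` to `url`, e.g. with urllib or requests.
```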

Querying product data

To query the data, one uses the nearestNeighbor query operator. This operator expects two arguments: the document tensor field to be searched and the input query tensor. The targetHits annotation specifies the targeted number of hits per content node involved in the query. This annotation is a required parameter, and the query will fail if it is not specified. The targetHits is a lower bound per node, and with exact search more hits can be exposed to first-phase ranking.

The query tensor is sent as a query ranking feature, and the query tensor name is referenced in the second argument of the query operator.

Related Products using Image Similarity

In this example, the nearestNeighbor query operator is used to recommend similar products based on image similarity. For a given image (e.g. a product image shown in the product search result page) one can find products which have a similar product image.

The search for nearest neighbors in the SIFT embedding space is limited to products where in_stock is true. The overall query is specified using the YQL query language and the query api request becomes:

{
    "yql": "select title, price from sources products where ([{\"targetHits\": 10}]nearestNeighbor(image_sift_encoding,query_vector_sift)) and in_stock = true;",
    "ranking.features.query(query_vector_sift)": [
        0.22507139604882176,
        0.11696498718517367,
        0.9418422036734729,
        ...
    ],
    "ranking.profile": "image-similarity",
    "hits": 10
}

The YQL query uses a logical conjunction (and) so that the nearest neighbor search is restricted to documents where in_stock is true. The targetHits is the desired number of neighbors to expose to the first-phase ranking function declared in the image-similarity rank profile.

The total number of hits ranked by the ranking profile depends on the query filters and how fast the nearest neighbor search algorithm converges; the operator might expose more than 10 hits to the first-phase ranking expression. Since there is filtering on in_stock, the query might return zero results if the entire product catalog is out of stock.

The ranking.profile parameter selects the rank profile described earlier, which simply ranks documents by how similar they are in the SIFT vector encoding space.

Finally, the hits parameter controls how many hits are returned in the response. If the documents are partitioned over multiple content nodes, targetHits applies per node, so with e.g. 10 nodes at least 100 nearest neighbors are fully evaluated by the first-phase ranking function.
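A request like the one above can also be assembled programmatically before being sent to the query API; a hedged Python sketch (the helper name and the placeholder vector are assumptions, not part of any Vespa API):

```python
def build_image_similarity_query(query_vector, target_hits=10, hits=10):
    """Assemble the query API request body for in-stock image similarity."""
    yql = (
        'select title, price from sources products where '
        f'([{{"targetHits": {target_hits}}}]'
        'nearestNeighbor(image_sift_encoding,query_vector_sift)) '
        'and in_stock = true;'
    )
    return {
        "yql": yql,
        "ranking.features.query(query_vector_sift)": query_vector,
        "ranking.profile": "image-similarity",
        "hits": hits,
    }

request_body = build_image_similarity_query([0.1] * 128)
# The JSON-serialized body would be POSTed to Vespa's /search/ endpoint.
```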

In this example, the first-phase expression is the closeness score calculated by the nearestNeighbor operator. One could re-order hits by changing the first-phase ranking function; the following re-ranks the nearest neighbors which are in stock by price:

rank-profile image-similarity-price {
    first-phase {
        expression: if(isNan(attribute(price)) == 1, 0.0, attribute(price))
    }
}

Here the closeness score calculated by the nearestNeighbor query operator is ignored; documents are instead re-ranked by price.
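The effect of this expression can be sketched outside Vespa: hits retrieved by the nearest neighbor search are re-ordered by price, with a missing (NaN) price scoring 0.0 (the titles and prices below are made-up placeholders):

```python
import math

def price_score(price: float) -> float:
    # Mirrors if(isNan(attribute(price)) == 1, 0.0, attribute(price))
    return 0.0 if math.isnan(price) else price

hits = [("trousers", 29.95), ("jacket", 99.99), ("no-price", float("nan"))]
reranked = sorted(hits, key=lambda h: price_score(h[1]), reverse=True)
```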

Related Products using BERT Embedding Similarity

As an alternative to using product image similarity to find related items one could also use the textual BERT embedding representation from the product:

rank-profile similarity-using-bert {
    first-phase {
        expression: closeness(field, bert_embedding)
    }
}

Related Products using a hybrid combination

It is also possible to combine the two methods for retrieving related products. In this example a single query request uses the logical disjunction (or) operator to retrieve products which are close in either vector space. The ranking function uses a weighted linear combination of the closeness scores produced by the two nearestNeighbor query operators:

rank-profile hybrid-similarity {
    first-phase {
        expression {
            query(bert_weight)*closeness(field, bert_embedding) +
            (1 - query(bert_weight))*closeness(field, image_sift_encoding)
        }
    }
}
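The linear combination computed by this expression can be illustrated in plain Python (the closeness values below are made-up placeholders):

```python
def hybrid_score(closeness_bert: float, closeness_sift: float,
                 bert_weight: float) -> float:
    """Weighted linear combination of the two closeness scores,
    mirroring the hybrid-similarity first-phase expression."""
    return bert_weight * closeness_bert + (1 - bert_weight) * closeness_sift

# With bert_weight = 0.1, the SIFT closeness dominates the final score.
score = hybrid_score(closeness_bert=0.8, closeness_sift=0.4, bert_weight=0.1)
```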

The search request needs to pass both query embedding tensors. The query uses logical disjunction (or) to combine the two nearestNeighbor query operators. The search request becomes:

{
    "yql": "select title, price from sources products where (([{\"targetHits\": 10}]nearestNeighbor(bert_embedding,query_vector_bert))
    or ([{\"targetHits\": 10}]nearestNeighbor(image_sift_encoding,query_vector_sift))) and in_stock = true;",
    "hits": 10,
    "ranking.features.query(query_vector_sift)": [
        0.5158057404746615,
        0.6441591144976925,
        0.24075880767944935,
        ...
    ],
    "ranking.features.query(query_vector_bert)": [
        0.708719978302217,
        0.4800850502044147,
        0.03310770782193717,
        ...
    ],
    "ranking.features.query(bert_weight)": 0.1,
    "ranking.profile": "hybrid-similarity"
}

Using Nearest Neighbor from a Searcher Component

As with all query operators in Vespa, one can also build the query tree programmatically in a custom Searcher component.

In the example below, the searcher executes the incoming query and fetches the top ranking hit, as ranked by whatever rank profile was specified in the search request. The image_sift_encoding tensor of the top ranking hit is then read and used as the input for the nearestNeighbor query operator to find related products (using the rank profile described in earlier sections).

package ai.vespa.example.searcher;

import com.yahoo.prelude.query.NearestNeighborItem;
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.result.Hit;
import com.yahoo.search.searchchain.Execution;
import com.yahoo.tensor.Tensor;

public class FeedbackSearcher extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        Result result = execution.search(query);
        if (result.getTotalHitCount() == 0)
            return result;
        execution.fill(result); // fetch the summary data of the hits if needed
        Hit bestHit = result.hits().get(0);
        Tensor sift_encoding = (Tensor) bestHit.getField("image_sift_encoding");
        Query relatedQuery = new Query();
        relatedQuery.getRanking().setProfile("image-similarity");
        relatedQuery.getRanking().getFeatures().put("query(query_vector_sift)", sift_encoding);
        NearestNeighborItem nn = new NearestNeighborItem("image_sift_encoding",  "query_vector_sift");
        nn.setTargetNumHits(10);
        relatedQuery.getModel().getQueryTree().setRoot(nn);
        return execution.search(relatedQuery);
    }
}

Using binary embeddings with hamming distance

If one for example has a 128-bit binary embedding (hash), one can pack the 128 bits into a 16-dimensional dense tensor using int8 cell precision:

document vector {
    field vector type tensor<int8>(x[16]) {
        indexing: summary | attribute
        attribute {
            distance-metric: hamming
        }
    }
}
rank-profile hamming-nn {
    num-threads-per-search: 12
    first-phase {
        expression: closeness(field, vector)
    }
}

Hamming distance over binary fields is implemented with XOR and popcount CPU instructions, making it highly efficient. The example also uses multi-threaded search, which can likewise be applied to the previous exact brute-force nearest neighbor examples to reduce serving latency.
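To illustrate the idea, a 128-bit binary hash can be packed into 16 signed int8 cell values, and the hamming distance between two such vectors computed with XOR and popcount; a Python sketch (the helper names are illustrative, not a Vespa API):

```python
def pack_bits(bits: int) -> list[int]:
    """Pack a 128-bit integer into 16 signed int8 cell values (big-endian)."""
    out = []
    for i in range(16):
        byte = (bits >> (8 * (15 - i))) & 0xFF
        out.append(byte - 256 if byte > 127 else byte)  # two's complement int8
    return out

def hamming(a: list[int], b: list[int]) -> int:
    """Bitwise hamming distance: XOR each cell pair and count the set bits."""
    return sum(bin((x ^ y) & 0xFF).count("1") for x, y in zip(a, b))

v1 = pack_bits((1 << 128) - 1)  # all 128 bits set
v2 = pack_bits(0)               # no bits set
```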