Vespa nearest neighbor search - a practical guide

This guide is a practical introduction to Vespa's nearest neighbor search query operator and to combining nearest neighbor search with other Vespa query operators. The guide also covers diverse, efficient candidate retrievers that can be used in a multi-phase ranking funnel.

The guide uses the Last.fm tracks dataset for illustration. Latency numbers mentioned in the guide are obtained from running this guide on a MacBook Pro x86. See also the generic Vespa performance - a practical guide.

This guide covers the following:

The guide includes step-by-step instructions on how to reproduce the experiments.

Installing vespa-cli

This tutorial uses Vespa-CLI, the official command-line client for Vespa.ai. It is a single binary without any runtime dependencies and is available for Linux, macOS and Windows:

$ brew install vespa-cli 

Dataset

This guide uses the Last.fm tracks dataset. Note that the dataset is released under the following terms:

Research only, strictly non-commercial. For details, or if you are unsure, please contact Last.fm. Also, Last.fm has the right to advertise and refer to any work derived from the dataset.

To download the dataset directly (a zip file of about 120 MB):

$ curl -L -o lastfm_test.zip \
    http://millionsongdataset.com/sites/default/files/lastfm/lastfm_test.zip 
$ unzip lastfm_test.zip

The downloaded data needs to be converted to the Vespa JSON format.

The following Python script, saved as create-vespa-feed.py (the file name used when running it), traverses the dataset files and creates a JSONL-formatted feed file with Vespa put operations. The schema used with this feed format is introduced in the next section. The number of unique tags is used as a proxy for the popularity of the track.

import os
import sys
import json
import unicodedata

directory = sys.argv[1]
seen_tracks = set() 

def remove_control_characters(s):
    return "".join(ch for ch in s if unicodedata.category(ch)[0]!="C")

def process_file(filename):
    with open(filename) as fp:
        doc = json.load(fp)
        title = doc['title']
        artist = doc['artist']
        # Skip tracks whose title and artist combination was already emitted.
        key = title + artist
        if key in seen_tracks:
            return
        seen_tracks.add(key)

        track_id = doc['track_id']
        tags = doc['tags']
        tags_dict = dict()
        for t in tags:
            k,v = t[0],int(t[1])
            tags_dict[k] = v
        n = len(tags_dict)

        vespa_doc = {
            "put": "id:music:track::%s" % track_id,
            "fields": {
                "title": remove_control_characters(title),
                "track_id": track_id,
                "artist": remove_control_characters(artist),
                "tags": tags_dict,
                "popularity": n
            }
        }
        print(json.dumps(vespa_doc))

sorted_files = []
for root, dirs, files in os.walk(directory):
    for filename in files:
        filename = os.path.join(root, filename)
        sorted_files.append(filename)
sorted_files.sort()
for filename in sorted_files:
    process_file(filename)

Process the dataset and convert it to Vespa JSON document operation format.

$ python3 create-vespa-feed.py lastfm_test > feed.jsonl
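Before feeding, it can be useful to sanity-check the generated file. The following is a minimal sketch, not part of the original guide, that counts the put operations and verifies that each one carries the fields the script emits. The helper name summarize_feed and the sample document are hypothetical.

```python
import json

EXPECTED_FIELDS = {"title", "track_id", "artist", "tags", "popularity"}

def summarize_feed(lines):
    """Count put operations and fail on any line missing an expected field."""
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        op = json.loads(line)
        missing = EXPECTED_FIELDS - set(op.get("fields", {}))
        if "put" not in op or missing:
            raise ValueError("malformed operation: %s" % line[:80])
        count += 1
    return count

# Hypothetical sample line in the same shape as the script output.
sample = json.dumps({
    "put": "id:music:track::TR0000",
    "fields": {"title": "t", "track_id": "TR0000", "artist": "a",
               "tags": {"rock": 100}, "popularity": 1},
})
print(summarize_feed([sample]))

# To check the real file, pass an open file handle instead:
#   with open("feed.jsonl") as fp:
#       print(summarize_feed(fp))
```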

Create a Vespa Application Package

A Vespa application package is the set of configuration files and Java plugins that together define the behavior of a Vespa system: what functionality to use, the available document types, how ranking will be done, and how data will be processed during feeding and indexing.

The minimum required files to create the basic search application are track.sd and services.xml. Create directories for the configuration files:

$ mkdir -p app/schemas; mkdir -p app/search/query-profiles/

Schema

A schema is a configuration of a document type and what we should compute over it. For this application we define a document type called track. Write the following to app/schemas/track.sd:

schema track {

    document track {

        field track_id type string {
            indexing: summary | attribute
            match: word
        }

        field title type string {
            indexing: summary | index
            index: enable-bm25
        }

        field artist type string {
            indexing: summary | index
        }

        field tags type weightedset<string> {
            indexing: summary | attribute
            attribute: fast-search
        }

        field embedding type tensor<float>(x[384]) {
            indexing: attribute | index
            attribute {
                distance-metric: euclidean
            }
            index {
                hnsw {
                    max-links-per-node: 16
                    neighbors-to-explore-at-insert: 50
                }
            }
        }

        field popularity type int {
            indexing: summary | attribute
            attribute: fast-search
            rank: filter
        }
    }

    fieldset default {
        fields: title, artist
    }

    document-summary track_id {
        summary track_id type string { 
            source: track_id
        }
    }

    rank-profile tags {
        first-phase {
            expression: rawScore(tags)
        }
    }

    rank-profile bm25 {
        first-phase {
            expression: bm25(title)
        }
    }

    rank-profile closeness {
        num-threads-per-search: 1
        match-features: distance(field, embedding)

        inputs {
            query(q)  tensor<float>(x[384])
            query(qa) tensor<float>(x[384])
        } 

        first-phase {
            expression: closeness(field, embedding)
        }
    }

    rank-profile closeness-t4 inherits closeness {
        num-threads-per-search: 4
    }

    rank-profile closeness-label inherits closeness {
        match-features: closeness(label, q) closeness(label, qa)
    }

    rank-profile hybrid inherits closeness {
        inputs {
            query(wTags) : 1.0
            query(wPopularity) :  1.0
            query(wTitle) : 1.0
            query(wVector) : 1.0
        }
        first-phase {
            expression {
                query(wTags) * rawScore(tags) + 
                query(wPopularity) * attribute(popularity) + 
                query(wTitle) * bm25(title) + 
                query(wVector) * closeness(field, embedding)
            }
        }
        match-features {
            rawScore(tags)
            attribute(popularity)
            bm25(title)
            closeness(field, embedding)
            distance(field, embedding)
        }
    }
}

This document schema is explained in the practical search performance guide; the additions are the embedding field and the various closeness rank profiles. Note that the closeness rank profile declares the query tensor inputs, which must be declared before they can be used with Vespa’s nearestNeighbor query operator.

field embedding type tensor<float>(x[384]) {
    indexing: attribute | index
    attribute {
        distance-metric: euclidean
    }
    index {
        hnsw {
            max-links-per-node: 16
            neighbors-to-explore-at-insert: 50
        }
    }
 }

See Approximate Nearest Neighbor Search using HNSW Index for an introduction to HNSW and the HNSW tuning parameters.

Services Specification

The services.xml defines the services that make up the Vespa application — which services to run and how many nodes per service. Write the following to app/services.xml:

<?xml version="1.0" encoding="UTF-8"?>
<services version="1.0">

    <container id="default" version="1.0">
        <search/>
        <document-api/>
    </container>

    <content id="tracks" version="1.0">
        <engine>
            <proton>
                <tuning>
                    <searchnode>
                        <requestthreads>
                            <persearch>4</persearch>
                        </requestthreads>
                    </searchnode>
                </tuning>
            </proton>
        </engine>
        <redundancy>1</redundancy>
        <documents>
            <document type="track" mode="index"></document>
        </documents>
        <nodes>
            <node distribution-key="0" hostalias="node1"></node>
        </nodes>
    </content>
</services>

The default query profile can be used to override default query API settings for all queries.

The following enables presentation.timing and renders weightedset fields as JSON maps. Write it to app/search/query-profiles/default.xml:

<query-profile id="default">
    <field name="presentation.timing">true</field>
    <field name="renderer.json.jsonWsets">true</field>
</query-profile>

Deploy the application package

The application package can now be deployed to a running Vespa instance. See also the Vespa quick start guide.

Start the Vespa container image using Docker:

$ docker run --detach --name vespa --hostname vespa-container \
  --publish 8080:8080 --publish 19071:19071 \
  vespaengine/vespa

Starting the container can take a short while. Before continuing, make sure that the configuration service is running by using vespa status deploy.

$ vespa config set target local
$ vespa status deploy --wait 300 

Once ready, deploy the application using vespa deploy:

$ vespa deploy --wait 300 app

Index the dataset

Feed the dataset using Vespa feed client:

$ curl -L -o vespa-feed-client-cli.zip \
    'https://search.maven.org/remotecontent?filepath=com/yahoo/vespa/vespa-feed-client-cli/8.86.28/vespa-feed-client-cli-8.86.28-zip.zip'
$ unzip vespa-feed-client-cli.zip
$ ./vespa-feed-client-cli/vespa-feed-client \
  --verbose --file feed.jsonl --endpoint http://localhost:8080

Free-text search using Vespa weakAnd

The following sections use the Vespa query API and formulate queries using the Vespa query language. The examples use the vespa-cli command, which supports running queries.

The CLI uses the Vespa query api. Use vespa query -v to see the curl equivalent:

$ vespa query -v 'yql=select ..'
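The same request can also be built without the CLI. The sketch below only constructs the GET URL for the query API under /search/ and does not send it. The endpoint http://localhost:8080 matches the port published when starting the container in this guide; the helper name build_query_url is hypothetical, and the parameters are those of a bm25 query of the kind used in this guide.

```python
from urllib.parse import urlencode

def build_query_url(params, endpoint="http://localhost:8080"):
    """Return the GET URL for the Vespa query API with URL-encoded parameters."""
    return "%s/search/?%s" % (endpoint, urlencode(params))

url = build_query_url({
    "yql": "select artist, title, track_id from track where userQuery()",
    "query": "total eclipse of the heart",
    "hits": 1,
    "type": "all",
    "ranking": "bm25",
})
print(url)
```

Keeping URL construction separate from sending the request makes it easy to log or inspect exactly what would be sent.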

The first example is searching and ranking using the bm25 rank profile defined in the schema. It uses the bm25 rank feature as the first-phase relevance score:

$ vespa query \
    'yql=select artist, title, track_id from track where userQuery()' \
    'query=total eclipse of the heart' \
    'hits=1' \
    'type=all' \
    'ranking=bm25'

This query combines YQL userQuery() with Vespa’s simple query language. The query type is all, which requires that all the terms match.

The above query example searches for total AND eclipse AND of AND the AND heart in the default fieldset, which in the schema includes the title and artist fields.

The result for the above query will look something like this:

{
    "timing": {
        "querytime": 0.007,
        "summaryfetchtime": 0.002,
        "searchtime": 0.01
    },
    "root": {
        "id": "toplevel",
        "relevance": 1.0,
        "fields": {
            "totalCount": 1
        },
        "coverage": {
            "coverage": 100,
            "documents": 95666,
            "full": true,
            "nodes": 1,
            "results": 1,
            "resultsFull": 1
        },
        "children": [
            {
                "id": "index:tracks/0/f13697952a0d5eaeb2c43ffc",
                "relevance": 22.590392521579684,
                "source": "tracks",
                "fields": {
                    "track_id": "TRKLIXH128F42766B6",
                    "title": "Total Eclipse Of The Heart",
                    "artist": "Bonnie Tyler"
                }
            }
        ]
    }
}
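When comparing queries, the interesting parts of the response are totalCount, the timing block, and the relevance of each hit. The following sketch pulls those values out of a parsed response; the helper name summarize_response is hypothetical, and the dictionary is a trimmed copy of the response shown above.

```python
def summarize_response(response):
    """Extract total match count, query time, and (relevance, title) per hit."""
    root = response["root"]
    hits = [(hit["relevance"], hit["fields"].get("title"))
            for hit in root.get("children", [])]
    return {
        "totalCount": root["fields"]["totalCount"],
        "querytime": response.get("timing", {}).get("querytime"),
        "hits": hits,
    }

# Trimmed copy of the response shown above.
response = {
    "timing": {"querytime": 0.007, "summaryfetchtime": 0.002, "searchtime": 0.01},
    "root": {
        "fields": {"totalCount": 1},
        "children": [{
            "relevance": 22.590392521579684,
            "fields": {
                "track_id": "TRKLIXH128F42766B6",
                "title": "Total Eclipse Of The Heart",
                "artist": "Bonnie Tyler",
            },
        }],
    },
}
summary = summarize_response(response)
print(summary)
```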

This query matched only one document because the query terms were ANDed. We can change matching to use type=any instead of type=all. See supported query types.

$ vespa query \
    'yql=select artist, title, track_id from track where userQuery()' \
    'query=total eclipse of the heart' \
    'hits=1' \
    'ranking=bm25' \
    'type=any'

Now, the query matches 24,053 documents and is considerably slower than the previous query. Comparing the querytime of these two examples, the query that matches the most documents has the highest querytime. In the worst case, a search query matches all documents.

Query matching performance is greatly impacted by the number of documents that match the query specification. Queries of type any require more compute resources than queries of type all.

There is an optimization available for type=any queries, using the weakAnd query operator which implements the WAND algorithm. See the using wand with Vespa guide for more details.

Run the same query, but use type=weakAnd instead of type=any (see supported query types):

$ vespa query \
    'yql=select artist, title, track_id from track where userQuery()' \
    'query=total eclipse of the heart' \
    'hits=1' \
    'ranking=bm25' \
    'type=weakAnd'

Compared to the type any query, which fully ranked 24,053 documents, this query exposes only about 3,600 documents to the first-phase ranking expression. Also notice that the faster search returns the same document at the first position.

Enclosing the query string in double quotes changes how it is parsed:

$ vespa query \
    'yql=select artist, title, track_id from track where userQuery()' \
    'query="total eclipse of the heart"' \
    'hits=1' \
    'ranking=bm25' \
    'type=weakAnd'

In this case, the query input “total eclipse of the heart” is parsed as a phrase query, and the search only finds 1 document matching the exact phrase.

Maximum Inner Product Search using Vespa WAND

The previous section introduced the weakAnd query operator which integrates with linguistic processing and string matching using match: text.

The following examples use the wand() query operator. The wand query operator calculates the maximum inner product between the sparse query and document feature integer weights. The inner product ranking score calculated by the wand query operator can be used in a ranking expression by the rawScore(name) rank feature.

rank-profile tags {
    first-phase {
        expression: rawScore(tags)
    }
}

This query searches the track document type using a learned sparse userProfile representation, performing a maximum inner product search over the tags weightedset field.

$ vespa query \
    'yql=select track_id, title, artist from track where {targetHits:10}wand(tags, @userProfile)' \
    'userProfile={"pop":1, "love songs":1,"romantic":10, "80s":20 }' \
    'hits=2' \
    'ranking=tags'

The query asks for 2 hits to be returned, and uses the tags rank profile. The result for the above query will look something like this:

{
    "timing": {
        "querytime": 0.051000000000000004,
        "summaryfetchtime": 0.004,
        "searchtime": 0.057
    },
    "root": {
        "id": "toplevel",
        "relevance": 1.0,
        "fields": {
            "totalCount": 66
        },
        "coverage": {
            "coverage": 100,
            "documents": 95666,
            "full": true,
            "nodes": 1,
            "results": 1,
            "resultsFull": 1
        },
        "children": [
            {
                "id": "index:tracks/0/57037bdeb9caadebd8c235e1",
                "relevance": 2500.0,
                "source": "tracks",
                "fields": {
                    "track_id": "TRMIBBE128E078B487",
                    "title": "The Rose   ***",
                    "artist": "Bonnie Tyler"
                }
            },
            {
                "id": "index:tracks/0/8eb2e19ee627b054113ba4c9",
                "relevance": 2344.0,
                "source": "tracks",
                "fields": {
                    "track_id": "TRKDRVK128F421815B",
                    "title": "Nothing's Gonna Change My Love For You",
                    "artist": "Glenn Medeiros"
                }
            }
        ]
    }
}

The wand query operator exposed a total of 66 documents to the first-phase ranking, which uses the rawScore(tags) rank feature directly, so the relevance is the sparse dot product between the user profile and the document tags.
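To make the score concrete, the sketch below computes a sparse dot product in plain Python. The user profile is the one from the query above; the document tag weights are hypothetical, chosen only to show how a score such as 2500 can arise, since the actual tag weights of the returned tracks are not shown in the response.

```python
def sparse_dot(query_weights, doc_weights):
    """Inner product over the keys the two sparse vectors have in common."""
    return sum(weight * doc_weights[tag]
               for tag, weight in query_weights.items()
               if tag in doc_weights)

user_profile = {"pop": 1, "love songs": 1, "romantic": 10, "80s": 20}

# Hypothetical document tags; "ballad" is ignored because the query lacks it.
doc_tags = {"80s": 100, "romantic": 50, "ballad": 30}

score = sparse_dot(user_profile, doc_tags)
print(score)  # 20*100 + 10*50 = 2500
```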

The wand query operator is safe, meaning that it returns the same top-k results as the brute-force dotProduct query operator. wand interleaves matching and ranking, and skips documents that cannot compete for the final top-k results. See the using wand with Vespa guide for more details on using wand and weakAnd query operators.

Vespa’s nearest neighbor search operator supports exact, brute-force nearest neighbor search using dense representations. This guide uses the sentence-transformers/all-MiniLM-L6-v2 embedding model. Download the pre-generated document embeddings and feed them to Vespa. The feed file uses partial updates to add the vector embedding.

$ curl -L -o lastfm_embeddings.jsonl.zst \
    https://data.vespa.oath.cloud/sample-apps-data/lastfm_embeddings.jsonl.zst
$ zstdcat lastfm_embeddings.jsonl.zst | ./vespa-feed-client-cli/vespa-feed-client \
  --verbose --stdin --endpoint http://localhost:8080
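For readers generating their own embeddings, a partial update operation for the embedding field might look like the following. This is a sketch under assumptions: it uses the short-form values layout for an indexed tensor, and the exact layout of the downloaded file is not shown in this guide. The helper name embedding_update and the all-zero vector are hypothetical.

```python
import json

def embedding_update(track_id, vector):
    """Build a partial update that assigns a dense vector to the embedding field."""
    if len(vector) != 384:
        raise ValueError("the embedding field is declared as tensor<float>(x[384])")
    return {
        "update": "id:music:track::%s" % track_id,
        "fields": {"embedding": {"assign": {"values": vector}}},
    }

# Placeholder vector; a real one would come from the embedding model.
op = embedding_update("TRKLIXH128F42766B6", [0.0] * 384)
print(json.dumps(op)[:100])
```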

The following query examples use a static query vector embedding for the query string Total Eclipse Of The Heart. The query embedding was obtained by the following snippet using sentence-transformers:

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
print(model.encode("Total Eclipse Of The Heart").tolist())

$ export Q='[-0.008,0.085,0.05,-0.009,-0.038,-0.003,0.019,-0.085,0.123,-0.11,0.029,-0.032,-0.059,-0.005,-0.022,0.031,0.007,0.003,0.006,0.041,-0.094,-0.044,-0.004,0.045,-0.016,0.101,-0.029,-0.028,-0.044,-0.012,0.025,-0.011,0.016,0.031,-0.037,-0.027,0.007,0.026,-0.028,0.049,-0.041,-0.041,-0.018,0.033,0.034,-0.01,-0.038,-0.052,0.02,0.029,-0.029,-0.043,-0.143,-0.055,0.052,-0.021,-0.012,-0.058,0.017,-0.017,0.023,0.017,-0.074,0.067,-0.043,-0.065,-0.028,0.066,-0.048,0.034,0.026,-0.034,0.085,-0.082,-0.043,0.054,-0.0,-0.075,-0.012,-0.056,0.027,-0.027,-0.088,0.01,0.01,0.071,0.007,0.022,-0.032,0.068,-0.003,-0.109,-0.005,0.07,-0.017,0.006,-0.007,-0.034,-0.062,0.096,0.038,0.038,-0.031,-0.023,0.064,-0.046,0.055,-0.011,0.016,-0.016,-0.007,-0.083,0.061,-0.037,0.04,0.099,0.063,0.032,0.019,0.099,0.105,-0.046,0.084,0.041,-0.088,-0.015,-0.002,-0.0,0.045,0.02,0.109,0.031,0.02,0.012,-0.043,0.034,-0.053,-0.023,-0.073,-0.052,-0.006,0.004,-0.018,-0.033,-0.067,0.126,0.018,-0.006,-0.03,-0.044,-0.085,-0.043,-0.051,0.057,0.048,0.042,-0.013,0.041,-0.017,-0.039,0.06,0.015,-0.031,0.043,-0.049,0.008,-0.008,0.028,-0.014,0.035,-0.08,-0.052,0.017,0.02,0.059,0.049,0.048,0.033,0.024,0.009,0.021,-0.042,-0.021,0.048,0.015,0.042,-0.004,-0.012,0.041,0.053,0.015,-0.034,-0.005,0.068,-0.053,-0.107,-0.051,0.03,-0.063,-0.036,0.032,-0.054,0.085,0.022,0.08,0.054,-0.045,-0.058,-0.161,0.066,0.065,-0.043,0.084,0.043,-0.01,-0.01,-0.084,-0.021,0.041,0.026,-0.011,-0.065,-0.046,0.0,-0.046,-0.014,-0.009,-0.08,0.063,0.02,-0.082,0.088,0.046,0.058,0.005,-0.024,0.047,0.019,0.051,-0.021,0.02,-0.003,-0.019,0.08,0.031,0.021,0.041,-0.01,-0.018,0.07,0.076,-0.021,0.027,-0.086,0.059,-0.068,-0.126,0.025,-0.037,0.036,-0.028,0.035,-0.068,0.005,-0.032,0.023,0.012,0.074,0.028,-0.02,0.054,0.124,0.022,-0.021,-0.099,-0.044,-0.044,0.093,0.004,-0.006,-0.037,0.034,-0.021,-0.046,-0.031,-0.034,0.015,-0.041,0.001,0.022,0.015,0.02,-0.16,0.065,-0.016,0.059,-0.249,0.023,0.031,0.047,0.063,-0.06,-0.002,-0.049,-0.06,-0.014,0.013,0.004,0.019,-0.039,0.007,0.024,-0.004,0.045,-0.026,0.078,-0.014,-0.038,0.003,-0.0,0.019,0.04,-0.017,-0.088,-0.04,-0.029,0.05,0.012,-0.042,0.052,0.035,0.061,0.011,0.03,-0.068,0.015,0.032,-0.028,-0.046,-0.032,0.094,0.006,0.082,-0.103,0.013,-0.054,0.038,0.01,0.029,-0.025,0.119,0.034,0.024,-0.034,-0.055,-0.014,0.026,0.068,-0.009,0.085,0.028,-0.086,0.038,0.01,-0.024,0.01,0.071,-0.078,-0.033,-0.024,0.023,-0.005,-0.002,-0.047,0.031,0.023,0.004,0.069,-0.018,0.034,0.109,0.036,0.009,0.029]'

The first query example uses exact nearest neighbor search:

$ vespa query \
    'yql=select title, artist from track where {approximate:false,targetHits:10}nearestNeighbor(embedding,q)' \
    'hits=1' \
    'ranking=closeness' \
    "input.query(q)=$Q"

Query breakdown:

  • Search for ten (targetHits:10) nearest neighbors of the query(q) query tensor over the embedding document tensor field.
  • The annotation approximate:false tells Vespa to perform exact search.
  • The hits parameter controls how many results are returned in the response. Number of hits requested does not impact targetHits. Notice that targetHits is per content node involved in the query.
  • ranking=closeness tells Vespa which rank-profile to use for scoring documents. One must specify how to rank the targetHits documents retrieved and exposed to the first-phase ranking expression in the rank-profile.
  • input.query(q) points to the input query vector.

Not specifying ranking will cause Vespa to use nativeRank which does not use the vector similarity, causing results to be randomly sorted.

The above exact nearest neighbor search will return the following result:

{
    "timing": {
        "querytime": 0.051,
        "summaryfetchtime": 0.001,
        "searchtime": 0.051
    },
    "root": {
        "id": "toplevel",
        "relevance": 1.0,
        "fields": {
            "totalCount": 118
        },
        "coverage": {
            "coverage": 100,
            "documents": 95666,
            "full": true,
            "nodes": 1,
            "results": 1,
            "resultsFull": 1
        },
        "children": [
            {
                "id": "index:tracks/0/f13697952a0d5eaeb2c43ffc",
                "relevance": 0.5992897741908119,
                "source": "tracks",
                "fields": {
                    "title": "Total Eclipse Of The Heart",
                    "artist": "Bonnie Tyler"
                }
            }
        ]
    }
}

The exact search takes approximately 51ms, performing 95,666 distance calculations. A total of about 120 documents were exposed to the first-phase ranking during the search as can be seen from totalCount.

The exact search uses a scoring heap during chunked distance calculations, and documents that at some point were put on the top-k heap are exposed to first-phase ranking. Splitting the vectors into chunks reduces the computational cost, as one can rule out distant neighbors just by comparing a few chunks with the lowest scoring vector on the heap.
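The effect of the top-k heap on totalCount can be illustrated with a small brute-force search in Python. This is a simplified sketch, not Vespa's implementation: it computes full Euclidean distances rather than chunked ones, and it counts how many documents ever entered the heap, which is the analogue of the documents exposed to first-phase ranking. The function name and the random vectors are hypothetical.

```python
import heapq
import math
import random

def exact_nearest_neighbors(query, vectors, target_hits):
    """Return (ids of the closest vectors, count of docs that entered the heap)."""
    heap = []      # max-heap on distance, stored as (-distance, doc_id)
    exposed = 0    # documents that were on the top-k heap at some point
    for doc_id, vector in enumerate(vectors):
        distance = math.dist(query, vector)
        if len(heap) < target_hits:
            heapq.heappush(heap, (-distance, doc_id))
            exposed += 1
        elif distance < -heap[0][0]:
            # Closer than the current worst candidate: replace it.
            heapq.heapreplace(heap, (-distance, doc_id))
            exposed += 1
    ranked = sorted((-neg, doc_id) for neg, doc_id in heap)
    return [doc_id for _, doc_id in ranked], exposed

random.seed(42)
vectors = [[random.random() for _ in range(8)] for _ in range(1000)]
query = [random.random() for _ in range(8)]
top_ids, exposed = exact_nearest_neighbors(query, vectors, target_hits=10)
print(top_ids, exposed)
```

The exposed count is at least target_hits and usually larger, because early candidates are later pushed out by closer vectors.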

It is possible to reduce search latency of the exact search by throwing more CPU resources at it. Changing the rank-profile to closeness-t4 makes Vespa use four threads per query:

$ vespa query \
    'yql=select title, artist from track where {approximate:false,targetHits:10}nearestNeighbor(embedding,q)' \
    'hits=1' \
    'ranking=closeness-t4' \
    "input.query(q)=$Q"

Now, the exact search latency is reduced by using more threads. See multi-threaded searching and ranking for more on this topic.

{
    "timing": {
        "querytime": 0.019,
        "summaryfetchtime": 0.001,
        "searchtime": 0.021
    }
}

This section covers using the faster, but approximate, nearest neighbor search. The track schema’s embedding field has the index property, which means Vespa builds a HNSW index to support fast, approximate vector search. See Approximate Nearest Neighbor Search using HNSW Index for an introduction to HNSW and the tuning parameters.

The default query behavior is using approximate:true when the embedding field has index:

$ vespa query \
    'yql=select title, artist from track where {targetHits:10,hnsw.exploreAdditionalHits:20}nearestNeighbor(embedding,q)' \
    'hits=1' \
    'ranking=closeness' \
    "input.query(q)=$Q"

Which returns the following response:

{
    "timing": {
        "querytime": 0.004,
        "summaryfetchtime": 0.001,
        "searchtime": 0.004
    },
    "root": {
        "id": "toplevel",
        "relevance": 1.0,
        "fields": {
            "totalCount": 10
        },
        "coverage": {
            "coverage": 100,
            "documents": 95666,
            "full": true,
            "nodes": 1,
            "results": 1,
            "resultsFull": 1
        },
        "children": [
            {
                "id": "index:tracks/0/f13697952a0d5eaeb2c43ffc",
                "relevance": 0.5992897837210658,
                "source": "tracks",
                "fields": {
                    "title": "Total Eclipse Of The Heart",
                    "artist": "Bonnie Tyler"
                }
            }
        ]
    }
}

Now, the query is significantly faster and also uses fewer resources during the search. To get latency down to 20 ms with the exact search, one had to use 4 matching threads. In this case, the latency is down to 4 ms with a single matching thread. For this query example, the approximate search returned the exact same top-1 hit, so there was no accuracy loss for the top-1 position.

A few key differences between exact and approximate neighbor search:

  • totalCount is different. When using the approximate version, Vespa exposes exactly targetHits documents to the configurable first-phase rank expression in the chosen rank-profile. The exact search uses a scoring heap during chunked distance calculations, and documents that at some point were put on the top-k heap are exposed to first-phase ranking.

  • The search is approximate and might not return the exact top 10 closest vectors as with exact search. This is a complex tradeoff between accuracy, query performance, and memory usage. See Billion-scale vector search with Vespa - part two for a deep-dive into these trade-offs.

With the support for setting approximate:false|true a developer can quantify accuracy loss by comparing the results of exact nearest neighbor search with the results of the approximate search. By doing so, developers can quantify the recall@k or overlap@k, and find the right balance between search performance and accuracy. Increasing hnsw.exploreAdditionalHits improves accuracy (recall@k) at the cost of a slower query.
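The comparison can be scripted once the document ids from both runs are collected. The following is a minimal sketch of overlap@k; the function name and the id lists are hypothetical placeholders for ids gathered from an exact and an approximate query.

```python
def overlap_at_k(exact_ids, approximate_ids, k):
    """Fraction of the exact top-k that also appears in the approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    exact_top = set(exact_ids[:k])
    approximate_top = set(approximate_ids[:k])
    return len(exact_top & approximate_top) / k

# Hypothetical ids: the approximate run missed "d" and returned "f" instead.
exact = ["a", "b", "c", "d", "e"]
approximate = ["a", "c", "b", "f", "e"]
print(overlap_at_k(exact, approximate, 5))  # 4 of 5 shared -> 0.8
```

Averaging this value over a representative set of queries gives a more stable estimate than a single query.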

Combining approximate nearest neighbor search with query filters

Vespa allows constraining the nearest neighbor search with regular query filters. In this query example, the title field must contain the term heart:

$ vespa query \
    'yql=select title, artist from track where {targetHits:10}nearestNeighbor(embedding,q) and title contains "heart"' \
    'hits=2' \
    'ranking=closeness-t4' \
    "input.query(q)=$Q"

Which returns the following response:

{
    "timing": {
        "querytime": 0.005,
        "summaryfetchtime": 0.001,
        "searchtime": 0.007
    },
    "root": {
        "id": "toplevel",
        "relevance": 1.0,
        "fields": {
            "totalCount": 55
        },
        "coverage": {
            "coverage": 100,
            "documents": 95666,
            "full": true,
            "nodes": 1,
            "results": 1,
            "resultsFull": 1
        },
        "children": [
            {
                "id": "index:tracks/0/f13697952a0d5eaeb2c43ffc",
                "relevance": 0.5992897741908119,
                "source": "tracks",
                "fields": {
                    "title": "Total Eclipse Of The Heart",
                    "artist": "Bonnie Tyler"
                }
            },
            {
                "id": "index:tracks/0/cb79ca7f404071e95561ca38",
                "relevance": 0.5259774715154759,
                "source": "tracks",
                "fields": {
                    "title": "Heart Of My Heart",
                    "artist": "Quest"
                }
            }
        ]
    }
}

When using filtering, it is important for performance reasons that the fields that are included in the filters have been defined with index or attribute:fast-search. See searching attribute fields.

The optimal performance for pure filtering, where the query term(s) does not influence ranking, is achieved using rank: filter in the schema.

field popularity type int {
    indexing: summary | attribute
    rank: filter
    attribute: fast-search
}

Matching against the popularity field does not influence ranking and Vespa can use the most efficient posting list representation. Note that one can still access the value of the attribute in ranking expressions.

rank-profile popularity {
    first-phase {
        expression: attribute(popularity)
    }
}

In the following example, since the title field does not have rank: filter one can instead flag that the term should not be used by any ranking expression by using the ranked query annotation.

The following disables term based ranking and the matching against the title field can use the most efficient posting list representation.

$ vespa query \
    'yql=select title, artist from track where {targetHits:10}nearestNeighbor(embedding,q) and title contains ({ranked:false}"heart")' \
    'hits=2' \
    'ranking=closeness-t4' \
    "input.query(q)=$Q"

In the previous examples, since the rank-profile only used the closeness rank feature, the matching would not impact the score anyway.

Vespa also allows combining the nearestNeighbor query operator with any other Vespa query operator.

$ vespa query \
    'yql=select title, popularity, artist from track where {targetHits:10}nearestNeighbor(embedding,q) and popularity > 20 and artist contains "Bonnie Tyler"' \
    'hits=2' \
    'ranking=closeness-t4' \
    "input.query(q)=$Q"

In this case, the nearest neighbor search is restricted to tracks by Bonnie Tyler with popularity > 20.

Strict filters and distant neighbors

When combining nearest neighbor search with strict filters that match less than 5 percent of the total number of documents, Vespa falls back to exact nearest neighbor search instead of searching the HNSW graph constrained by the filter. See Controlling filter behavior for how to adjust the threshold that determines which strategy is used. When falling back to exact search, users will observe that totalCount increases and is higher than targetHits. As seen from previous examples, more hits are exposed to the first-phase ranking expression when using exact search. When using exact search with filters, the search can also use multiple threads to evaluate the query, which helps reduce the latency impact.
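The fallback rule can be pictured as a simple ratio check. The sketch below is a simplification for illustration only: the function and parameter names are hypothetical, and the real engine works from hit estimates and has more tuning options than this single threshold.

```python
def choose_search_strategy(filter_hits, total_docs, approximate_threshold=0.05):
    """Pick "exact" when the filter keeps fewer docs than the threshold fraction."""
    if total_docs <= 0:
        raise ValueError("total_docs must be positive")
    ratio = filter_hits / total_docs
    return "exact" if ratio < approximate_threshold else "approximate"

total = 95666  # number of documents reported in the coverage block
print(choose_search_strategy(1000, total))   # about 1 percent of the corpus
print(choose_search_strategy(24053, total))  # about 25 percent of the corpus
```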

With strict filters that remove many hits, the hits (nearest neighbors) might not be near in the embedding space, but far away: distant neighbors. Technically, every document vector is a neighbor of the query, but at a varying distance; some are close, others are distant.

With strict filters, the neighbors that are returned might be of low quality (far distance). One way to combat this is to use the distanceThreshold query annotation parameter of the nearestNeighbor query operator. The value of the distance depends on the distance-metric used. By adding the distance(field,embedding) rank-feature to the match-features of the closeness rank-profiles, it is possible to analyze what distance could be considered too far. See match-features reference.

Note that a distance of 0 is a perfect match, while a distance of 1 is distant. The distanceThreshold removes hits that have a higher distance(field, embedding) than distanceThreshold. The distanceThreshold is applied regardless of whether exact or approximate search is performed.
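The closeness scores shown in this guide are consistent with closeness = 1 / (1 + distance), so a threshold can be reasoned about on either scale. The following is a minimal Python sketch of that relation and of the thresholding rule described above. The function names and the sample distances are illustrative assumptions and not part of Vespa.

```python
def closeness_from_distance(distance: float) -> float:
    """Map a distance (0 = identical) to a closeness score in (0, 1]."""
    return 1.0 / (1.0 + distance)


def apply_distance_threshold(hits, distance_threshold):
    """Keep only hits whose distance does not exceed the threshold.

    `hits` is a list of dicts with a "distance" key (hypothetical shape,
    chosen for this sketch only).
    """
    return [hit for hit in hits if hit["distance"] <= distance_threshold]


# Illustrative distances, not taken from a real query.
sample_hits = [
    {"title": "near neighbor", "distance": 0.2},
    {"title": "distant neighbor", "distance": 0.95},
]

for hit in apply_distance_threshold(sample_hits, distance_threshold=0.7):
    print(hit["title"], round(closeness_from_distance(hit["distance"]), 3))
```

With a threshold of 0.7, only the first sample hit survives. A distance of 0 maps to a closeness of 1, and a distance of 1 maps to a closeness of 0.5.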

The following query with a restrictive filter on popularity is used for illustration:

$ vespa query \
    'yql=select matchfeatures, title, popularity, artist from track where {targetHits:10}nearestNeighbor(embedding,q) and popularity > 80' \
    'hits=2' \
    'ranking=closeness-t4' \
    "input.query(q)=$Q"

The above query returns

{
    "timing": {
        "querytime": 0.008,
        "summaryfetchtime": 0.002,
        "searchtime": 0.011
    },
    "root": {
        "id": "toplevel",
        "relevance": 1.0,
        "fields": {
            "totalCount": 63
        },
        "coverage": {
            "coverage": 100,
            "documents": 95666,
            "full": true,
            "nodes": 1,
            "results": 1,
            "resultsFull": 1
        },
        "children": [
            {
                "id": "index:tracks/0/f13697952a0d5eaeb2c43ffc",
                "relevance": 0.5992897875290117,
                "source": "tracks",
                "fields": {
                    "matchfeatures": {
                        "distance(field,embedding)": 0.6686418170467985
                    },
                    "title": "Total Eclipse Of The Heart",
                    "artist": "Bonnie Tyler",
                    "popularity": 100
                }
            },
            {
                "id": "index:tracks/0/3517728cc88356c8ca6de0d9",
                "relevance": 0.5005276509131764,
                "source": "tracks",
                "fields": {
                    "matchfeatures": {
                        "distance(field,embedding)": 0.9978916213231626
                    },
                    "title": "Closer To The Heart",
                    "artist": "Rush",
                    "popularity": 100
                }
            }
        ]
    }
}

By using a distanceThreshold of 0.7, the Closer To The Heart track is removed from the result because its distance(field, embedding) is close to 1.

$ vespa query \
    'yql=select matchfeatures, title, popularity, artist from track where {distanceThreshold:0.7,targetHits:10}nearestNeighbor(embedding,q) and popularity > 80' \
    'hits=2' \
    'ranking=closeness-t4' \
    "input.query(q)=$Q"
{
    "timing": {
        "querytime": 0.008,
        "summaryfetchtime": 0.001,
        "searchtime": 0.011
    },
    "root": {
        "id": "toplevel",
        "relevance": 1.0,
        "fields": {
            "totalCount": 1
        },
        "coverage": {
            "coverage": 100,
            "documents": 95666,
            "full": true,
            "nodes": 1,
            "results": 1,
            "resultsFull": 1
        },
        "children": [
            {
                "id": "index:tracks/0/f13697952a0d5eaeb2c43ffc",
                "relevance": 0.5992897875290117,
                "source": "tracks",
                "fields": {
                    "matchfeatures": {
                        "distance(field,embedding)": 0.6686418170467985
                    },
                    "title": "Total Eclipse Of The Heart",
                    "artist": "Bonnie Tyler",
                    "popularity": 100
                }
            }
        ]
    }
}

Setting an appropriate distanceThreshold is best handled by supervised learning, as the distance threshold should be calibrated based on the query complexity and possibly also the feature distributions of the returned top-k hits. Having the distance rank feature returned as match-features enables post-processing of the result using a custom re-ranking/filtering searcher. The post-processing searcher can analyze the score distributions of the returned top-k hits (using the features returned with match-features), remove low scoring hits before presenting the result to the end user, or not return any results at all.
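The post-processing idea can be sketched on the client side as well. The following Python sketch is a stand-in for a custom searcher, not the searcher itself. It assumes the response has the JSON shape shown above, and the function name and the max_distance parameter are hypothetical.

```python
def filter_hits_by_distance(response, max_distance,
                            feature="distance(field,embedding)"):
    """Return the hits whose match-feature distance is at most max_distance.

    Hits that lack the feature are dropped, since their distance is unknown.
    An empty list means no result should be presented to the user.
    """
    children = response.get("root", {}).get("children", [])
    kept = []
    for hit in children:
        features = hit.get("fields", {}).get("matchfeatures", {})
        distance = features.get(feature)
        if distance is not None and distance <= max_distance:
            kept.append(hit)
    return kept


# Abbreviated copy of the two hits returned by the query above.
response = {
    "root": {
        "children": [
            {"fields": {"title": "Total Eclipse Of The Heart",
                        "matchfeatures": {"distance(field,embedding)": 0.6686418170467985}}},
            {"fields": {"title": "Closer To The Heart",
                        "matchfeatures": {"distance(field,embedding)": 0.9978916213231626}}},
        ]
    }
}

for hit in filter_hits_by_distance(response, max_distance=0.7):
    print(hit["fields"]["title"])
```

A fixed cut-off is used here only for simplicity. A calibrated model could replace the comparison without changing the surrounding structure.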

Hybrid sparse and dense retrieval methods with Vespa

In the previous filtering examples the ranking was not impacted by the filters. They were only used to impact recall, not the order of the results. The following examples demonstrate how to perform hybrid retrieval combining the efficient query operators in a single query. Hybrid retrieval can be used as the first phase in a multi-phase ranking funnel, see Vespa’s phased ranking.

The first query example combines the nearestNeighbor operator with the weakAnd operator, combining them using logical disjunction (OR). This type of query retrieves documents based on both semantic (vector distance) matching and traditional sparse (exact) matching.

$ vespa query \
    'yql=select title, matchfeatures, artist from track where {targetHits:100}nearestNeighbor(embedding,q) or userQuery()' \
    'query=total eclipse of the heart' \
    'type=weakAnd' \
    'hits=2' \
    'ranking=hybrid' \
    "input.query(q)=$Q"

The query combines the sparse weakAnd and the dense nearestNeighbor query operators using logical disjunction. Both query operators retrieve the target number of hits (or more), ranked by their inner raw score/distance function. The hits exposed to the configurable first-phase ranking expression are a combination of the best hits from the two retrieval strategies. The ranking is performed using the following hybrid rank profile, which serves as an example of how to combine the different efficient retrievers.

rank-profile hybrid inherits closeness {
        inputs {
            query(wTags) : 1
            query(wPopularity) : 1
            query(wTitle) : 1
            query(wVector) : 1
        }
        first-phase {
            expression {
                query(wTags) * rawScore(tags) + 
                query(wPopularity) * attribute(popularity) + 
                query(wTitle) * bm25(title) + 
                query(wVector) * closeness(field, embedding)
            }
        }
        match-features {
            rawScore(tags)
            attribute(popularity)
            bm25(title)
            closeness(field, embedding)
        }
    }

The query returns the following result:

{
    "timing": {
        "querytime": 0.007,
        "summaryfetchtime": 0.001,
        "searchtime": 0.009000000000000001
    },
    "root": {
        "id": "toplevel",
        "relevance": 1.0,
        "fields": {
            "totalCount": 1176
        },
        "coverage": {
            "coverage": 100,
            "documents": 95666,
            "full": true,
            "nodes": 1,
            "results": 1,
            "resultsFull": 1
        },
        "children": [
            {
                "id": "index:tracks/0/f13697952a0d5eaeb2c43ffc",
                "relevance": 123.18970542319387,
                "source": "tracks",
                "fields": {
                    "matchfeatures": {
                        "attribute(popularity)": 100.0,
                        "bm25(title)": 22.590415639472816,
                        "closeness(field,embedding)": 0.5992897837210658,
                        "rawScore(tags)": 0.0
                    },
                    "title": "Total Eclipse Of The Heart",
                    "artist": "Bonnie Tyler"
                }
            },
            {
                "id": "index:tracks/0/57c74bd2d466b7cafe30c14d",
                "relevance": 112.03224663886917,
                "source": "tracks",
                "fields": {
                    "matchfeatures": {
                        "attribute(popularity)": 100.0,
                        "bm25(title)": 12.032246638869161,
                        "closeness(field,embedding)": 0.0,
                        "rawScore(tags)": 0.0
                    },
                    "title": "Eclipse",
                    "artist": "Kyoto Jazz Massive"
                }
            }
        ]
    }
}

The result hits also include match-features which can be used for feature logging for learning to rank, or to simply debug the components in the final score.
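As a way to debug the score components, the first-phase expression of the hybrid rank profile can be recomputed from the returned match-features. The following Python sketch mirrors that expression. The function and parameter names are illustrative, and the feature values are copied from the top hit above.

```python
def hybrid_score(features, w_tags=1.0, w_popularity=1.0,
                 w_title=1.0, w_vector=1.0):
    """Recompute the hybrid first-phase score from a hit's match-features."""
    return (
        w_tags * features["rawScore(tags)"]
        + w_popularity * features["attribute(popularity)"]
        + w_title * features["bm25(title)"]
        + w_vector * features["closeness(field,embedding)"]
    )


# Match-features of the top hit in the result above.
top_hit_features = {
    "attribute(popularity)": 100.0,
    "bm25(title)": 22.590415639472816,
    "closeness(field,embedding)": 0.5992897837210658,
    "rawScore(tags)": 0.0,
}

# With all weights at their default of 1, this reproduces the hit's relevance
# (about 123.19), up to floating point rounding.
print(hybrid_score(top_hit_features))
```

The same function shows how much each retriever contributes. Here popularity dominates the score, which is why the closeness weight has to be raised considerably before the dense signal changes the ordering.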

In the below query, the weight of the embedding similarity (closeness) is increased by overriding the query(wVector) weight:

$ vespa query \
    'yql=select title, matchfeatures, artist from track where {targetHits:100}nearestNeighbor(embedding,q) or userQuery()' \
    'query=total eclipse of the heart' \
    'type=weakAnd' \
    'hits=2' \
    'ranking=hybrid' \
    "input.query(q)=$Q" \
    'ranking.features.query(wVector)=40'

This changes the order, and a different hit is surfaced at position two:

{
    "timing": {
        "querytime": 0.011,
        "summaryfetchtime": 0.001,
        "searchtime": 0.014
    },
    "root": {
        "id": "toplevel",
        "relevance": 1.0,
        "fields": {
            "totalCount": 1176
        },
        "coverage": {
            "coverage": 100,
            "documents": 95666,
            "full": true,
            "nodes": 1,
            "results": 1,
            "resultsFull": 1
        },
        "children": [
            {
                "id": "index:tracks/0/f13697952a0d5eaeb2c43ffc",
                "relevance": 146.56200698831543,
                "source": "tracks",
                "fields": {
                    "matchfeatures": {
                        "attribute(popularity)": 100.0,
                        "bm25(title)": 22.590415639472816,
                        "closeness(field,embedding)": 0.5992897837210658,
                        "rawScore(tags)": 0.0
                    },
                    "title": "Total Eclipse Of The Heart",
                    "artist": "Bonnie Tyler"
                }
            },
            {
                "id": "index:tracks/0/3517728cc88356c8ca6de0d9",
                "relevance": 126.74309103465859,
                "source": "tracks",
                "fields": {
                    "matchfeatures": {
                        "attribute(popularity)": 100.0,
                        "bm25(title)": 6.7219852584615865,
                        "closeness(field,embedding)": 0.5005276444049249,
                        "rawScore(tags)": 0.0
                    },
                    "title": "Closer To The Heart",
                    "artist": "Rush"
                }
            }
        ]
    }
}

The personalization component, using the sparse user profile, can also be added to the retriever mix. For example, given a user profile:

userProfile={"love songs":1, "love":1,"80s":1}

This profile can be used with the wand query operator to retrieve personalized hits.

$ vespa query \
    'yql=select title, matchfeatures, artist from track where {targetHits:100}nearestNeighbor(embedding,q) or userQuery() or ({targetHits:10}wand(tags, @userProfile))' \
    'query=total eclipse of the heart' \
    'type=weakAnd' \
    'hits=2' \
    'ranking=hybrid' \
    "input.query(q)=$Q" \
    'ranking.features.query(wVector)=340' \
    'userProfile={"love songs":1, "love":1,"80s":1}' 

In this case, another document, which has a non-zero personalized score, is surfaced at position 2. Notice that totalCount increases as the wand query operator brought more hits into first-phase ranking.

{
    "timing": {
        "querytime": 0.014,
        "summaryfetchtime": 0.001,
        "searchtime": 0.017
    },
    "root": {
        "id": "toplevel",
        "relevance": 1.0,
        "fields": {
            "totalCount": 1244
        },
        "coverage": {
            "coverage": 100,
            "documents": 95666,
            "full": true,
            "nodes": 1,
            "results": 1,
            "resultsFull": 1
        },
        "children": [
            {
                "id": "index:tracks/0/f13697952a0d5eaeb2c43ffc",
                "relevance": 326.34894210463517,
                "source": "tracks",
                "fields": {
                    "matchfeatures": {
                        "attribute(popularity)": 100.0,
                        "bm25(title)": 22.590415639472816,
                        "closeness(field,embedding)": 0.5992897837210658,
                        "rawScore(tags)": 0.0
                    },
                    "title": "Total Eclipse Of The Heart",
                    "artist": "Bonnie Tyler"
                }
            },
            {
                "id": "index:tracks/0/8eb2e19ee627b054113ba4c9",
                "relevance": 281.0,
                "source": "tracks",
                "fields": {
                    "matchfeatures": {
                        "attribute(popularity)": 100.0,
                        "bm25(title)": 0.0,
                        "closeness(field,embedding)": 0.0,
                        "rawScore(tags)": 181.0
                    },
                    "title": "Nothing's Gonna Change My Love For You",
                    "artist": "Glenn Medeiros"
                }
            }
        ]
    }
}

In the examples above, some of the hits had

"closeness(field,embedding)": 0.0

This means that the hit was not retrieved by the nearestNeighbor operator. Similarly, rawScore(tags) might be 0 if the hit was not retrieved by the wand query operator.

It is nevertheless possible to calculate the semantic distance/similarity using tensor computations for the hits that were not retrieved by the nearestNeighbor query operator. See also tensor functions. For example to compute the euclidean distance one can add a function to the rank-profile:

rank-profile compute-also-for-sparse inherits closeness {
    function euclidean() {
        expression: sqrt(sum(map(query(q) - attribute(embedding), f(x)(x * x))))
    }
    function match_closeness() {
        expression: 1/(1 + euclidean())
    }
    first-phase {
        expression {
         bm25(title) + 
         if(closeness(field, embedding) == 0, match_closeness(), closeness(field, embedding))
        }
    }
}
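For readers less familiar with the tensor expression syntax, the same computation can be written in plain Python. This is a sketch of what the euclidean() and match_closeness() functions above evaluate to for a single document. The short vectors are illustrative and stand in for the 384-dimensional embeddings.

```python
import math


def euclidean(query, doc):
    """sqrt(sum((q - d)^2)), matching the tensor expression in the profile."""
    return math.sqrt(sum((q - d) ** 2 for q, d in zip(query, doc)))


def match_closeness(query, doc):
    """1 / (1 + euclidean distance)."""
    return 1.0 / (1.0 + euclidean(query, doc))


def closeness_with_fallback(closeness, query, doc):
    """Use the operator's closeness if present, else compute it directly.

    A closeness of 0 is treated as "not retrieved by nearestNeighbor",
    as in the first-phase expression above.
    """
    return match_closeness(query, doc) if closeness == 0 else closeness


query_vector = [0.0, 0.0]
doc_vector = [3.0, 4.0]
print(euclidean(query_vector, doc_vector))        # 5.0
print(match_closeness(query_vector, doc_vector))  # 1/6
```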

Changing from logical OR to AND will instead intersect the results of the two efficient retrievers. The search for nearest neighbors is then constrained to documents that match at least one of the query terms in the weakAnd.

$ vespa query \
    'yql=select title, matchfeatures, artist from track where {targetHits:100}nearestNeighbor(embedding,q) and userQuery()' \
    'query=total eclipse of the heart' \
    'type=weakAnd' \
    'hits=2' \
    'ranking=hybrid' \
    "input.query(q)=$Q" \
    'ranking.features.query(wVector)=340'

In this case, the documents exposed to ranking must match at least one of the query terms (for WAND to retrieve it). It is also possible to combine hybrid search with filters:

$ vespa query \
    'yql=select title, matchfeatures, artist from track where {targetHits:100}nearestNeighbor(embedding,q) and userQuery() and popularity < 75' \
    'query=total eclipse of the heart' \
    'type=weakAnd' \
    'hits=2' \
    'ranking=hybrid' \
    "input.query(q)=$Q" \
    'ranking.features.query(wVector)=340' 

Another interesting approach for hybrid retrieval is to use Vespa’s rank() query operator. The first operand of the rank() operator is used for retrieval, and the remaining operands are only used to compute rank features for those hits retrieved by the first operand.

$ vespa query \
    'yql=select title, matchfeatures, artist from track where rank({targetHits:100}nearestNeighbor(embedding,q), userQuery())' \
    'query=total eclipse of the heart' \
    'type=weakAnd' \
    'hits=2' \
    'ranking=hybrid' \
    "input.query(q)=$Q" \
    'ranking.features.query(wVector)=340' 

This query exposes 100 documents to ranking, since only the first operand of the rank query operator is used for retrieval. The sparse userQuery() representation is only used to calculate sparse rank features, such as bm25(title), for the results retrieved by the nearestNeighbor operator.

{
    "timing": {
        "querytime": 0.01,
        "summaryfetchtime": 0.002,
        "searchtime": 0.015
    },
    "root": {
        "id": "toplevel",
        "relevance": 1.0,
        "fields": {
            "totalCount": 100
        },
        "coverage": {
            "coverage": 100,
            "documents": 95666,
            "full": true,
            "nodes": 1,
            "results": 1,
            "resultsFull": 1
        },
        "children": [
            {
                "id": "index:tracks/0/f13697952a0d5eaeb2c43ffc",
                "relevance": 326.34896241725517,
                "source": "tracks",
                "fields": {
                    "matchfeatures": {
                        "attribute(popularity)": 100.0,
                        "bm25(title)": 22.590435952092836,
                        "closeness(field,embedding)": 0.5992897837210658,
                        "rawScore(tags)": 0.0
                    },
                    "title": "Total Eclipse Of The Heart",
                    "artist": "Bonnie Tyler"
                }
            },
            {
                "id": "index:tracks/0/3517728cc88356c8ca6de0d9",
                "relevance": 276.90138973270746,
                "source": "tracks",
                "fields": {
                    "matchfeatures": {
                        "attribute(popularity)": 100.0,
                        "bm25(title)": 6.721990635032981,
                        "closeness(field,embedding)": 0.5005276444049249,
                        "rawScore(tags)": 0.0
                    },
                    "title": "Closer To The Heart",
                    "artist": "Rush"
                }
            }
        ]
    }
}

One can also do this the other way around, retrieve using the sparse representation, and have Vespa calculate the closeness(field, embedding) or related rank features for the hits retrieved by the sparse query representation.

$ vespa query \
    'yql=select title, matchfeatures, artist from track where rank(userQuery(),{targetHits:100}nearestNeighbor(embedding,q))' \
    'query=total eclipse of the heart' \
    'type=weakAnd' \
    'hits=2' \
    'ranking=hybrid' \
    "input.query(q)=$Q" \
    'ranking.features.query(wVector)=340' 

The weakAnd query operator exposes more hits to ranking than approximate nearest neighbor search, similar to the wand query operator. Generally, using the rank query operator is more efficient than combining query retriever operators using or. See also the Vespa passage ranking for complete examples of different retrieval strategies for multi-phase ranking funnels.
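The difference in the number of hits exposed to ranking can be illustrated with a toy model. The Python sketch below treats each retriever as a set of document ids and is only meant to show which operands decide retrieval. It does not model how Vespa evaluates the operators internally, and all names and ids are hypothetical.

```python
def candidates_or(dense_ids, sparse_ids):
    """OR: a document is exposed to ranking if either operator retrieves it."""
    return set(dense_ids) | set(sparse_ids)


def candidates_rank(first_operand_ids, other_operand_ids):
    """rank(): only the first operand retrieves documents.

    The remaining operands contribute rank features for those documents,
    so other_operand_ids does not change the candidate set.
    """
    return set(first_operand_ids)


dense = {"d1", "d2", "d3"}          # ids retrieved by nearestNeighbor
sparse = {"d3", "d4", "d5", "d6"}   # ids retrieved by weakAnd

print(len(candidates_or(dense, sparse)))    # 6 candidates are ranked
print(len(candidates_rank(dense, sparse)))  # 3 candidates are ranked
print(len(candidates_rank(sparse, dense)))  # 4 candidates are ranked
```

In this model, rank() never ranks more candidates than its first operand retrieves, which is one reason it tends to be cheaper than or.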

Multiple nearest neighbor search operators in the same query

This section looks at how to use multiple nearestNeighbor query operator instances in the same Vespa query request.

First, the query embedding for Total Eclipse Of The Heart:

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
print(model.encode("Total Eclipse Of The Heart").tolist())
$ export Q='[-0.008,0.085,0.05,-0.009,-0.038,-0.003,0.019,-0.085,0.123,-0.11,0.029,-0.032,-0.059,-0.005,-0.022,0.031,0.007,0.003,0.006,0.041,-0.094,-0.044,-0.004,0.045,-0.016,0.101,-0.029,-0.028,-0.044,-0.012,0.025,-0.011,0.016,0.031,-0.037,-0.027,0.007,0.026,-0.028,0.049,-0.041,-0.041,-0.018,0.033,0.034,-0.01,-0.038,-0.052,0.02,0.029,-0.029,-0.043,-0.143,-0.055,0.052,-0.021,-0.012,-0.058,0.017,-0.017,0.023,0.017,-0.074,0.067,-0.043,-0.065,-0.028,0.066,-0.048,0.034,0.026,-0.034,0.085,-0.082,-0.043,0.054,-0.0,-0.075,-0.012,-0.056,0.027,-0.027,-0.088,0.01,0.01,0.071,0.007,0.022,-0.032,0.068,-0.003,-0.109,-0.005,0.07,-0.017,0.006,-0.007,-0.034,-0.062,0.096,0.038,0.038,-0.031,-0.023,0.064,-0.046,0.055,-0.011,0.016,-0.016,-0.007,-0.083,0.061,-0.037,0.04,0.099,0.063,0.032,0.019,0.099,0.105,-0.046,0.084,0.041,-0.088,-0.015,-0.002,-0.0,0.045,0.02,0.109,0.031,0.02,0.012,-0.043,0.034,-0.053,-0.023,-0.073,-0.052,-0.006,0.004,-0.018,-0.033,-0.067,0.126,0.018,-0.006,-0.03,-0.044,-0.085,-0.043,-0.051,0.057,0.048,0.042,-0.013,0.041,-0.017,-0.039,0.06,0.015,-0.031,0.043,-0.049,0.008,-0.008,0.028,-0.014,0.035,-0.08,-0.052,0.017,0.02,0.059,0.049,0.048,0.033,0.024,0.009,0.021,-0.042,-0.021,0.048,0.015,0.042,-0.004,-0.012,0.041,0.053,0.015,-0.034,-0.005,0.068,-0.053,-0.107,-0.051,0.03,-0.063,-0.036,0.032,-0.054,0.085,0.022,0.08,0.054,-0.045,-0.058,-0.161,0.066,0.065,-0.043,0.084,0.043,-0.01,-0.01,-0.084,-0.021,0.041,0.026,-0.011,-0.065,-0.046,0.0,-0.046,-0.014,-0.009,-0.08,0.063,0.02,-0.082,0.088,0.046,0.058,0.005,-0.024,0.047,0.019,0.051,-0.021,0.02,-0.003,-0.019,0.08,0.031,0.021,0.041,-0.01,-0.018,0.07,0.076,-0.021,0.027,-0.086,0.059,-0.068,-0.126,0.025,-0.037,0.036,-0.028,0.035,-0.068,0.005,-0.032,0.023,0.012,0.074,0.028,-0.02,0.054,0.124,0.022,-0.021,-0.099,-0.044,-0.044,0.093,0.004,-0.006,-0.037,0.034,-0.021,-0.046,-0.031,-0.034,0.015,-0.041,0.001,0.022,0.015,0.02,-0.16,0.065,-0.016,0.059,-0.249,0.023,0.031,0.047,0.063,-0.06,-0.002,-0.049,-0.06,-0.014,0.013,0.004,0.019,-0.039,0.007,0.024,-0.004,0.045,-0.026,0.078,-0.014,-0.038,0.003,-0.0,0.019,0.04,-0.017,-0.088,-0.04,-0.029,0.05,0.012,-0.042,0.052,0.035,0.061,0.011,0.03,-0.068,0.015,0.032,-0.028,-0.046,-0.032,0.094,0.006,0.082,-0.103,0.013,-0.054,0.038,0.01,0.029,-0.025,0.119,0.034,0.024,-0.034,-0.055,-0.014,0.026,0.068,-0.009,0.085,0.028,-0.086,0.038,0.01,-0.024,0.01,0.071,-0.078,-0.033,-0.024,0.023,-0.005,-0.002,-0.047,0.031,0.023,0.004,0.069,-0.018,0.034,0.109,0.036,0.009,0.029]'

Secondly, the query embedding for Summer of ‘69:

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
print(model.encode("Summer of '69").tolist())
$ export QA='[-0.043,0.027,-0.017,0.018,0.034,0.067,0.037,-0.046,-0.014,-0.114,0.033,-0.028,0.02,0.024,0.025,0.019,0.045,0.007,0.018,-0.035,-0.126,0.024,0.005,0.05,-0.005,0.048,0.059,0.07,-0.041,0.006,-0.008,0.113,-0.046,-0.007,0.065,-0.02,-0.007,-0.067,-0.099,0.069,-0.068,-0.013,0.054,0.029,-0.031,-0.018,0.036,-0.015,0.027,0.011,0.04,0.038,-0.046,-0.025,-0.042,0.028,-0.006,-0.091,0.033,-0.016,-0.079,-0.058,-0.044,-0.022,0.086,-0.107,0.002,-0.037,-0.058,-0.039,-0.028,0.037,-0.015,0.035,0.0,0.072,-0.021,-0.01,0.044,-0.094,0.116,-0.109,-0.04,0.01,0.012,-0.031,0.087,0.005,-0.035,0.049,-0.088,-0.02,-0.023,-0.01,-0.063,-0.018,-0.024,-0.05,-0.009,0.115,0.049,0.017,-0.05,0.017,0.084,-0.053,0.051,0.033,-0.001,-0.087,-0.031,-0.019,0.132,0.006,0.056,-0.117,0.043,0.01,-0.03,0.176,0.055,0.042,0.051,0.025,-0.041,-0.027,0.041,-0.0,0.01,-0.016,0.048,-0.031,0.103,-0.044,-0.003,-0.005,-0.029,-0.032,-0.046,-0.095,-0.074,-0.094,0.111,-0.042,0.004,0.048,0.006,0.042,-0.092,0.109,0.016,-0.04,-0.01,0.033,-0.034,0.049,0.03,0.02,0.04,0.015,0.007,0.03,0.018,0.017,-0.029,-0.082,0.015,0.002,-0.048,0.047,-0.03,-0.029,-0.008,0.088,0.04,0.023,0.052,-0.034,0.006,0.003,-0.048,-0.094,-0.014,-0.086,-0.052,-0.01,0.062,-0.03,0.062,0.058,-0.027,-0.04,-0.084,-0.061,0.09,-0.049,-0.032,0.007,-0.071,-0.052,0.055,-0.064,0.041,-0.008,0.076,-0.018,-0.025,-0.034,0.016,-0.007,0.041,0.023,-0.021,-0.046,0.01,-0.022,-0.019,-0.027,-0.039,-0.037,0.014,0.004,0.017,0.0,0.034,0.003,0.015,-0.019,0.02,0.025,-0.05,0.056,-0.047,-0.088,0.004,-0.116,0.07,-0.057,0.032,0.006,-0.021,0.09,-0.02,0.035,-0.114,-0.006,-0.01,-0.005,0.025,-0.046,0.054,-0.002,-0.003,0.028,-0.025,0.001,-0.003,0.09,-0.084,0.058,0.091,-0.025,-0.034,-0.032,0.026,-0.032,0.054,0.039,0.033,-0.029,0.015,0.076,-0.054,0.021,-0.069,-0.049,-0.051,-0.006,0.002,-0.058,-0.021,-0.011,0.025,-0.003,-0.001,-0.018,-0.064,-0.023,-0.013,0.029,-0.022,0.023,-0.019,-0.028,-0.072,-0.044,-0.082,0.074,0.086,-0.016,0.041,0.004,-0.047,-0.029,-0.137,0.005,-0.075,0.136,0.054,0.024,0.052,0.01,0.024,-0.038,0.078,0.005,0.013,-0.034,-0.051,-0.0,0.03,-0.007,0.025,-0.042,0.065,0.02,0.05,0.045,0.004,0.095,0.044,0.044,0.091,0.024,0.0,0.022,0.027,0.011,-0.011,0.009,-0.056,-0.026,0.173,-0.019,0.024,-0.014,-0.064,0.079,0.083,-0.033,0.051,-0.005,-0.056,-0.043,-0.061,-0.034,0.112,0.072,0.042,-0.047,0.055,0.058,0.015,0.017,0.015,0.083,0.024,-0.023,-0.024,0.007,0.043,0.042,0.025,0.011,0.042,-0.032,-0.044,0.021,-0.064,-0.065,0.078,0.051,-0.028,-0.136]'
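The exported vectors show three decimals, while the Python snippets print the raw model output. The following sketch shows one way such a compact literal could be produced, assuming the values were rounded with round(x, 3); the guide does not state how the rounding was done. The helper names are hypothetical, and encode_for_export needs the sentence-transformers package, which is imported lazily so the formatting helper can be used on its own.

```python
def to_query_tensor_literal(vector, decimals=3):
    """Format a vector as a compact JSON-style list, e.g. [-0.008,0.085]."""
    return "[" + ",".join(str(round(float(v), decimals)) for v in vector) + "]"


def encode_for_export(text, model_name="sentence-transformers/all-MiniLM-L6-v2"):
    """Encode text with the model used in this guide and format it for export."""
    # Imported here so the formatting helper works without the dependency.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return to_query_tensor_literal(model.encode(text).tolist())


# Formatting only; no model download is required for this line.
print(to_query_tensor_literal([-0.0081, 0.0849, 0.05]))
```

The output of encode_for_export("Total Eclipse Of The Heart") can then be assigned to a shell variable in the same way as Q above.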

The following Vespa query combines two nearestNeighbor query operators using logical disjunction (OR) and referencing two different query tensor inputs:

  • input.query(q) holding the Total Eclipse Of The Heart query vector.
  • input.query(qa) holding the Summer of ‘69 query vector.
$ vespa query \
    'yql=select title from track where ({targetHits:10}nearestNeighbor(embedding,q)) or ({targetHits:10}nearestNeighbor(embedding,qa))' \
    'hits=2' \
    'ranking=closeness-t4' \
    "input.query(q)=$Q" \
    "input.query(qa)=$QA" 

The above query exposes 20 documents to first-phase ranking, as seen from totalCount, ten from each nearest neighbor query operator:

{
    "timing": {
        "querytime": 0.007,
        "summaryfetchtime": 0.001,
        "searchtime": 0.01
    },
    "root": {
        "id": "toplevel",
        "relevance": 1.0,
        "fields": {
            "totalCount": 20
        },
        "coverage": {
            "coverage": 100,
            "documents": 95666,
            "full": true,
            "nodes": 1,
            "results": 1,
            "resultsFull": 1
        },
        "children": [
            {
                "id": "index:tracks/0/f13697952a0d5eaeb2c43ffc",
                "relevance": 0.5992897917249415,
                "source": "tracks",
                "fields": {
                    "title": "Total Eclipse Of The Heart"
                }
            },
            {
                "id": "index:tracks/0/5b1c2ae1024d88451c2f1c5a",
                "relevance": 0.5794361034642413,
                "source": "tracks",
                "fields": {
                    "title": "Summer of 69"
                }
            }
        ]
    }
}

One can also use the label annotation when there are multiple nearestNeighbor operators in the same query to differentiate which of them produced the match.

$ vespa query \
    'yql=select title, matchfeatures from track where ({ label:"q", targetHits:10}nearestNeighbor(embedding,q)) or ({label:"qa",targetHits:10}nearestNeighbor(embedding,qa))' \
    'hits=2' \
    'ranking=closeness-label' \
    "input.query(q)=$Q" \
    "input.query(qa)=$QA" 

The above query annotates the two nearestNeighbor query operators using the label query annotation. The result includes match-features, so one can see which query operator retrieved the document from the closeness(label, ..) feature output:

{
    "timing": {
        "querytime": 0.011,
        "summaryfetchtime": 0.001,
        "searchtime": 0.014
    },
    "root": {
        "id": "toplevel",
        "relevance": 1.0,
        "fields": {
            "totalCount": 20
        },
        "coverage": {
            "coverage": 100,
            "documents": 95666,
            "full": true,
            "nodes": 1,
            "results": 1,
            "resultsFull": 1
        },
        "children": [
            {
                "id": "index:tracks/0/f13697952a0d5eaeb2c43ffc",
                "relevance": 0.5992897917249415,
                "source": "tracks",
                "fields": {
                    "matchfeatures": {
                        "closeness(label,q)": 0.5992897917249415,
                        "closeness(label,qa)": 0.0
                    },
                    "title": "Total Eclipse Of The Heart"
                }
            },
            {
                "id": "index:tracks/0/5b1c2ae1024d88451c2f1c5a",
                "relevance": 0.5794361034642413,
                "source": "tracks",
                "fields": {
                    "matchfeatures": {
                        "closeness(label,q)": 0.0,
                        "closeness(label,qa)": 0.5794361034642413
                    },
                    "title": "Summer of 69"
                }
            }
        ]
    }
}

Note that the previous examples used or to combine the two operators. Using and instead requires that there are documents that are in both top-k results. Increasing targetHits to 500 finds 9 tracks that overlap. In this case, both closeness labels have a non-zero score.

$ vespa query \
    'yql=select title, matchfeatures from track where ({label:"q", targetHits:500}nearestNeighbor(embedding,q)) and ({label:"qa",targetHits:500}nearestNeighbor(embedding,qa))' \
    'hits=2' \
    'ranking=closeness-label' \
    "input.query(q)=$Q" \
    "input.query(qa)=$QA" 

Which returns the following top two hits. Note that the closeness-label rank profile uses closeness(field, embedding) which in the case of multiple nearest neighbor search operators uses the maximum score to represent the unlabeled closeness(field,embedding). This can be seen from the relevance value, compared with the labeled closeness() rank features.

{
    "timing": {
        "querytime": 0.015,
        "summaryfetchtime": 0.001,
        "searchtime": 0.017
    },
    "root": {
        "id": "toplevel",
        "relevance": 1.0,
        "fields": {
            "totalCount": 9
        },
        "coverage": {
            "coverage": 100,
            "documents": 95666,
            "full": true,
            "nodes": 1,
            "results": 1,
            "resultsFull": 1
        },
        "children": [
            {
                "id": "index:tracks/0/99a2a380cac4830bfee63ae0",
                "relevance": 0.5174298300948759,
                "source": "tracks",
                "fields": {
                    "matchfeatures": {
                        "closeness(label,q)": 0.4755796429687308,
                        "closeness(label,qa)": 0.5174298300948759
                    },
                    "title": "Summer Of Love"
                }
            },
            {
                "id": "index:tracks/0/a373d26938a20dbdda8fc7c1",
                "relevance": 0.5099393361432658,
                "source": "tracks",
                "fields": {
                    "matchfeatures": {
                        "closeness(label,q)": 0.5099393361432658,
                        "closeness(label,qa)": 0.47990179066646654
                    },
                    "title": "Midnight Heartache"
                }
            }
        ]
    }
}
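The relationship between the labeled and the unlabeled closeness can be checked against the first hit above. The sketch below assumes nothing beyond the maximum rule described in the text, and the function name is illustrative.

```python
def unlabeled_closeness(labeled_scores):
    """Unlabeled closeness as the maximum over the labeled operator scores."""
    return max(labeled_scores.values(), default=0.0)


# Match-features of the "Summer Of Love" hit above.
summer_of_love = {
    "closeness(label,q)": 0.4755796429687308,
    "closeness(label,qa)": 0.5174298300948759,
}

# Equals the hit's relevance, 0.5174298300948759.
print(unlabeled_closeness(summer_of_love))
```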

Vespa also supports having multiple document side embedding fields, which also can be searched using multiple nearestNeighbor operators in the query.

field embedding type tensor<float>(x[384]) {
    indexing: attribute | index
    attribute {
        distance-metric: euclidean
    }
    index {
        hnsw {
            max-links-per-node: 16
            neighbors-to-explore-at-insert: 50
        }
    }
 }
 field embedding_two type tensor<float>(x[768]) {
    indexing: attribute | index
    attribute {
        distance-metric: euclidean
    }
    index {
        hnsw {
            max-links-per-node: 16
            neighbors-to-explore-at-insert: 50
        }
    }
 }

Controlling filter behavior

Vespa allows developers to control how filters are combined with the nearestNeighbor query operator, see Query Time Constrained Approximate Nearest Neighbor Search for a detailed description of pre-filtering and post-filtering strategies. The following query examples explore the two query-time parameters which can be used to control the filtering behavior. The parameters are:

  • ranking.matching.postFilterThreshold
  • ranking.matching.approximateThreshold

These parameters can be used per query or configured in the rank-profile in the document schema.

The following query runs with the default setting for ranking.matching.postFilterThreshold, which is 1. This setting means: do not perform post-filtering, use the pre-filtering strategy:

$ vespa query \
  'yql=select title, artist, tags from track where {targetHits:10}nearestNeighbor(embedding,q) and tags contains "rock"' \
  'hits=2' \
  'ranking=closeness' \
  'ranking.matching.postFilterThreshold=1.0' \
  'ranking.matching.approximateThreshold=0.05' \
  "input.query(q)=$Q"

The query exposes targetHits to ranking as seen from the totalCount. Now, repeating the query, but forcing post-filtering instead by setting ranking.matching.postFilterThreshold=0.0:

$ vespa query \
  'yql=select title, artist, tags from track where {targetHits:10}nearestNeighbor(embedding,q) and tags contains "rock"' \
  'hits=2' \
  'ranking=closeness' \
  'ranking.matching.postFilterThreshold=0.0' \
  'ranking.matching.approximateThreshold=0.05' \
  "input.query(q)=$Q"

In this case, Vespa estimates how many documents the filter matches and auto-adjusts targetHits internally to a higher number, attempting to expose targetHits documents to first-phase ranking.

The query exposes 14 documents to ranking, as can be seen from totalCount. There are 8420 documents in the collection that are tagged with the rock tag, so roughly 8.8%.

Changing to a tag which is less frequent, for example, 90s, which matches 1,695 documents or roughly 1.7% will cause Vespa to fall back to exact search as the estimated filter hit count is less than the approximateThreshold.

$ vespa query \
  'yql=select title, artist, tags from track where {targetHits:10}nearestNeighbor(embedding,q) and tags contains "90s"' \
  'hits=2' \
  'ranking=closeness' \
  'ranking.matching.postFilterThreshold=0.0' \
  'ranking.matching.approximateThreshold=0.05' \
  "input.query(q)=$Q"

The exact search exposes more documents to ranking. Read more about combining filters with nearest neighbor search in the Query Time Constrained Approximate Nearest Neighbor Search blog post.
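The behavior observed in the three queries above can be summarized as a small decision function. This Python sketch is a simplified model inferred from those examples, not Vespa's actual implementation, which may take more inputs into account. The function name and the returned labels are hypothetical.

```python
def choose_strategy(estimated_hit_ratio,
                    post_filter_threshold=1.0,
                    approximate_threshold=0.05):
    """Pick a search strategy from the estimated fraction of documents
    that pass the filter (0.0 to 1.0)."""
    if estimated_hit_ratio < approximate_threshold:
        # Very restrictive filter: exact search over the filtered documents.
        return "exact"
    if estimated_hit_ratio > post_filter_threshold:
        # Filter is applied after the approximate search.
        return "approximate with post-filtering"
    # Default: the approximate search is constrained by the filter.
    return "approximate with pre-filtering"


total_documents = 95666
rock_ratio = 8420 / total_documents      # about 0.088
nineties_ratio = 1695 / total_documents  # about 0.018

print(choose_strategy(rock_ratio))                                 # pre-filtering
print(choose_strategy(rock_ratio, post_filter_threshold=0.0))      # post-filtering
print(choose_strategy(nineties_ratio, post_filter_threshold=0.0))  # exact
```

In this model the exact-search check comes first, which matches the 90s example where exact search is used even though post-filtering was requested.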

Tear down the container

This concludes this tutorial. The following removes the container and the data:

$ docker rm -f vespa