SPANN Billion Scale Vector Search

SPANN (Space Partitioned ANN) is an approach to approximate nearest neighbor search described in SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search. It combines graph and inverted index methods.

We recommend you read Billion-scale vector search using hybrid HNSW-IF for details on how SPANN is implemented using Vespa, before running this example application. Excerpt:

SPANN searches for the k closest centroid vectors of the query vector in the in-memory ANN search data structure. Then, it reads the k associated posting lists for the retrieved centroids and computes the distance between the query vector and the vector data read from the posting list.
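
The two steps in the excerpt can be sketched in a few lines of Python. This is a toy illustration only, not how the Vespa application implements it: the centroids are searched by brute force rather than with an in-memory ANN structure, and the function names, data layout and squared Euclidean distance are assumptions made for the example.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def spann_search(query, centroids, posting_lists, k_centroids, top_n):
    """Return the top_n (distance, vector_id) pairs closest to query.

    centroids:     {centroid_id: vector}, searched by brute force here
                   (a stand-in for the in-memory ANN structure)
    posting_lists: {centroid_id: [(vector_id, vector), ...]}
    """
    # Step 1: find the k closest centroids to the query.
    closest = heapq.nsmallest(
        k_centroids, centroids, key=lambda cid: squared_l2(query, centroids[cid])
    )
    # Step 2: read the posting lists of those centroids and score each vector.
    best = {}
    for cid in closest:
        for vector_id, vector in posting_lists.get(cid, []):
            # A vector may appear in several posting lists; score it once.
            if vector_id not in best:
                best[vector_id] = squared_l2(query, vector)
    # Step 3: keep the top_n closest candidates.
    return heapq.nsmallest(top_n, ((d, vid) for vid, d in best.items()))


# Hypothetical toy data: three centroids, five vectors.
centroids = {"c0": [0, 0], "c1": [10, 10], "c2": [-10, 5]}
posting_lists = {
    "c0": [(1, [1, 1]), (2, [0, 2])],
    "c1": [(3, [9, 9]), (4, [11, 10])],
    "c2": [(5, [-9, 4])],
}
hits = spann_search([1, 0], centroids, posting_lists, k_centroids=2, top_n=3)
print(hits)
```

Only the posting lists of the selected centroids are read, which is why searching more centroids tends to trade extra work for better recall.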

This sample application demonstrates how to represent SPANN using Vespa.

Setup:

  1. Create a tenant on Vespa Cloud:

    Go to console.vespa-cloud.com and create your tenant (unless you already have one).

  2. Install the Vespa CLI using Homebrew:

    $ brew install vespa-cli
    

    Windows/No Homebrew? See the Vespa CLI page to download directly.

  3. Configure the Vespa client:

    $ export VESPA_CLI_HOME=$PWD/.vespa
    
    $ vespa config set target cloud
    $ vespa config set application vespa-team.autotest
    

    Use the tenant name from step 1 instead of "vespa-team", here and in the other steps of this guide.

  4. Get Vespa Cloud control plane access:

    $ vespa auth login
    

    Follow the instructions from the command to authenticate.

  5. Clone a sample application:

    $ vespa clone billion-scale-vector-search myapp && cd myapp
    

    See sample-apps for other sample apps you can clone.

  6. Add a certificate for data plane access to the application:

    $ vespa auth cert app
    

    It is a good idea to take note of the path to the .pem files written here.

Download Vector Data

This sample app uses the Microsoft SPACEV vector dataset from big-ann-benchmarks.com. It uses the first 10M vectors of the 100M slice sample. This sample file is about 1GB (10M vectors):

$ curl -L -o spacev10m_base.i8bin \
  https://data.vespa-cloud.com/sample-apps-data/spacev10m_base.i8bin
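
To inspect the downloaded file before generating the feed, a small reader can help. The sketch below assumes the binary layout used by big-ann-benchmarks: an 8-byte header with two little-endian unsigned 32-bit integers (number of vectors, dimensionality), followed by the vectors as row-major signed 8-bit integers. The function name is hypothetical and the layout is an assumption, so check it against the dataset documentation.

```python
import os
import struct
import tempfile


def read_i8bin(path, limit=None):
    """Read up to `limit` vectors; returns (num_vectors, dim, vectors)."""
    with open(path, "rb") as f:
        # Assumed header: two little-endian uint32 values.
        num_vectors, dim = struct.unpack("<II", f.read(8))
        count = num_vectors if limit is None else min(limit, num_vectors)
        # Each vector is `dim` signed bytes.
        vectors = [list(struct.unpack(f"<{dim}b", f.read(dim))) for _ in range(count)]
    return num_vectors, dim, vectors


# Demo on a tiny synthetic file, so the 1GB download is not needed to try it.
demo_path = os.path.join(tempfile.mkdtemp(), "tiny.i8bin")
with open(demo_path, "wb") as f:
    f.write(struct.pack("<II", 2, 3))  # 2 vectors, 3 dimensions
    f.write(struct.pack("<6b", 1, -2, 3, 4, 5, -128))

num_vectors, dim, vectors = read_i8bin(demo_path, limit=2)
print(num_vectors, dim, vectors)
```

Calling `read_i8bin("spacev10m_base.i8bin", limit=5)` on the real file would show the header values and the first few vectors without loading the whole file.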

Install dependencies and create the feed files for the first 10M vectors from the 100M sample:

$ pip3 install numpy requests tqdm
$ python3 app/src/main/python/create-vespa-feed.py spacev10m_base.i8bin

Output:

  • graph-vectors.jsonl
  • if-vectors.jsonl

Build and deploy Vespa app

Build the application:

$ mvn clean package -U -f app

Deploy the application:

$ vespa deploy --wait 900 ./app

Wait for the application endpoint to become available:

$ vespa status --wait 300

Test basic functionality:

$ vespa test app/src/test/application/tests/system-test/feed-and-search-test.json

See CD tests for details.

Feed data

The graph vectors must be fed before the IF vectors:

$ vespa feed graph-vectors.jsonl
$ vespa feed if-vectors.jsonl
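
The ordering makes sense if each IF vector is assigned to its closest centroids (the graph vectors) at feed time, as in a hybrid graph and inverted-file design: the centroids must already be searchable when the IF vectors arrive. The sketch below illustrates that dependency under this assumption. It uses brute-force search in place of the graph index, and the function name and the `n_closest` parameter are hypothetical.

```python
import heapq
from collections import defaultdict


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_to_centroids(if_vectors, centroids, n_closest=2):
    """Build posting lists {centroid_id: [vector_id, ...]}."""
    if not centroids:
        # Without centroids there is nothing to assign the vectors to.
        raise ValueError("feed the graph vectors (centroids) first")
    posting_lists = defaultdict(list)
    for vector_id, vector in if_vectors.items():
        # Brute-force stand-in for a nearest-centroid search in the graph.
        nearest = heapq.nsmallest(
            n_closest, centroids, key=lambda cid: squared_l2(vector, centroids[cid])
        )
        for cid in nearest:
            posting_lists[cid].append(vector_id)
    return dict(posting_lists)


# Hypothetical toy data.
centroids = {"c0": [0, 0], "c1": [10, 10], "c2": [-10, 5]}
if_vectors = {1: [1, 1], 2: [9, 8], 3: [-8, 4]}
assignments = assign_to_centroids(if_vectors, centroids, n_closest=1)
print(assignments)
```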

Now is a good time to open the Vespa Cloud Dashboard to track progress.

Feeding speed depends on the <resources> configuration; more CPU is better, e.g.:

<resources vcpu="8" memory="16Gb" disk="50Gb"/>

Use the instance type reference to find good combinations. Run time for a 2 VCPU deployment vs. 8 VCPU:

[Figures: feed duration with 2 vCPU; feed duration with 8 vCPU]

Observe the feed and query phases (below) of this guide:

[Figure: feed and queries]

Recall Evaluation

Download the query vectors and the ground truth for the first 10M vectors:

$ curl -L -o query.i8bin \
  https://github.com/microsoft/SPTAG/raw/main/datasets/SPACEV1B/query.bin
$ curl -L -o spacev10m_gt100.i8bin \
  https://data.vespa-cloud.com/sample-apps-data/spacev10m_gt100.i8bin

Find the path to the credentials from the vespa auth cert step above, for example:

/Users/username/.vespa/tenant_name.autotest.default/data-plane-public-cert.pem

Replace the certificate and key filenames in the command below with your own paths. (This is not needed when running a local test.)

Run the first 1K queries and evaluate recall@10. A higher number of clusters gives higher recall:

$ ENDPOINT=$(vespa status --format=plain)
$ python3 app/src/main/python/recall.py \
  --endpoint ${ENDPOINT}/search/ \
  --query_file query.i8bin \
  --query_gt_file spacev10m_gt100.i8bin \
  --certificate $PWD/../.vespa/vespa-team.autotest.default/data-plane-public-cert.pem \
  --key         $PWD/../.vespa/vespa-team.autotest.default/data-plane-private-key.pem

See the blog post for details about this script.
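
For reference, recall@k is the fraction of the true k nearest neighbors that appear among the k returned hits, averaged over all queries. The following is a minimal sketch of that metric only. It is not the code of recall.py, and the function names are hypothetical.

```python
def recall_at_k(retrieved_ids, ground_truth_ids, k=10):
    """Fraction of the true k nearest neighbors found in the top-k results."""
    truth = set(ground_truth_ids[:k])  # ground truth may list more than k ids
    found = set(retrieved_ids[:k])
    return len(truth & found) / k


def mean_recall(all_retrieved, all_ground_truth, k=10):
    """Average recall@k over a list of queries."""
    scores = [recall_at_k(r, g, k) for r, g in zip(all_retrieved, all_ground_truth)]
    return sum(scores) / len(scores)


# Two hypothetical queries: the first misses 2 of the 10 true neighbors.
truth = list(range(1, 101))  # 100 ground-truth ids, closest first
retrieved_a = [1, 2, 3, 4, 5, 6, 7, 8, 50, 60]
retrieved_b = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(recall_at_k(retrieved_a, truth))
print(mean_recall([retrieved_a, retrieved_b], [truth, truth]))
```

The order of the hits within the top k does not matter for this metric, which is why the second query scores 1.0 even though its hits are reversed.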

When done, remove the deployment:

$ vespa destroy --force

Local test with OCI image

Verify memory limits:

$ docker info | grep "Total Memory"
or
$ podman info | grep "memTotal"

Install Vespa CLI:

$ brew install vespa-cli

For local deployment:

$ vespa config set target local

Download this sample application:

$ vespa clone billion-scale-vector-search myapp && cd myapp

Pull and start the Vespa image:

$ docker pull vespaengine/vespa
$ docker run --detach --name vespa --hostname vespa-container \
  --publish 127.0.0.1:8080:8080 --publish 127.0.0.1:19071:19071 \
  vespaengine/vespa

Verify that the configuration service (deploy api) is ready:

$ vespa status deploy --wait 300

At this point, you can continue the guide from the Download Vector Data section.


Cleanup

When done, remove the container:

$ docker rm -f vespa