# SPANN Billion Scale Vector Search


The SPANN (Space Partitioned ANN) approach for approximate nearest neighbor search is described in [SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search](https://arxiv.org/abs/2111.08566). SPANN uses a hybrid combination of graph and inverted index methods for approximate nearest neighbor search.

We recommend you read [Billion-scale vector search using hybrid HNSW-IF](https://blog.vespa.ai/vespa-hybrid-billion-scale-vector-search/) for details on how SPANN is implemented using Vespa, before running this example application. Excerpt:

> SPANN searches for the k closest centroid vectors of the query vector in the in-memory ANN search data structure. Then, it reads the k associated posting lists for the retrieved centroids and computes the distance between the query vector and the vector data read from the posting list:

![](https://blog.vespa.ai/assets/2022-06-07-vespa-spann-billion-scale-vector-search/spann-posting-lists.excalidraw.png)

This sample application demonstrates how to represent SPANN using Vespa.
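To make the flow in the excerpt concrete, here is a minimal, self-contained NumPy sketch of SPANN-style search. This is a toy under simplifying assumptions: a brute-force scan over centroids stands in for the in-memory graph index (Vespa uses HNSW here), and clustering is plain nearest-centroid assignment without SPANN's balancing, boundary replication, or disk-resident posting lists:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_vectors, n_centroids = 8, 1000, 32

vectors = rng.standard_normal((n_vectors, dim)).astype(np.float32)

# Pick centroids (here: a random subset; SPANN uses balanced clustering).
centroid_ids = rng.choice(n_vectors, n_centroids, replace=False)
centroids = vectors[centroid_ids]

# Build posting lists: assign every vector to its closest centroid.
assignments = np.argmin(
    np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2), axis=1)
posting_lists = {c: np.flatnonzero(assignments == c) for c in range(n_centroids)}

def search(query, k=10, n_probe=4):
    """Probe the n_probe closest centroids, then rank their posting lists."""
    centroid_dist = np.linalg.norm(centroids - query, axis=1)
    probe = np.argsort(centroid_dist)[:n_probe]
    candidates = np.concatenate([posting_lists[c] for c in probe])
    dist = np.linalg.norm(vectors[candidates] - query, axis=1)
    return candidates[np.argsort(dist)[:k]]

query = rng.standard_normal(dim).astype(np.float32)
approx = search(query, k=10)
exact = np.argsort(np.linalg.norm(vectors - query, axis=1))[:10]
print("recall@10:", len(set(approx) & set(exact)) / 10)
```

Raising `n_probe` reads more posting lists per query, trading I/O and latency for recall; this is the same knob the evaluation section below refers to as the number of clusters searched.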

Setup:

1. **Create a [tenant](/en/learn/tenant-apps-instances.html) on Vespa Cloud**

2. **Install the [Vespa CLI](/en/clients/vespa-cli.html)** using [Homebrew](https://brew.sh/)

3. **Configure the Vespa client**

4. **Get Vespa Cloud control plane access**

5. **Clone a sample [application](/en/basics/applications.html)**

6. **Add a certificate for [data plane access](/en/security/guide#data-plane) to the application**

## Download Vector Data

This sample app uses the Microsoft SPACEV vector dataset from [big-ann-benchmarks.com](https://big-ann-benchmarks.com/), restricted to the first 10M vectors of the 100M slice. The sample file is about 1GB:

```
$ curl -L -o spacev10m_base.i8bin \
  https://data.vespa-cloud.com/sample-apps-data/spacev10m_base.i8bin
```
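The `.i8bin` file follows the big-ann-benchmarks binary layout: a 4-byte vector count, a 4-byte dimensionality, then the raw int8 vector data in row-major order. A small reader sketch, round-tripping a synthetic buffer so it runs without the download:

```python
import io
import struct

import numpy as np

def read_i8bin(stream):
    """Read the big-ann-benchmarks .i8bin layout: a uint32 vector
    count, a uint32 dimensionality, then row-major int8 data."""
    n, dim = struct.unpack("<II", stream.read(8))
    data = np.frombuffer(stream.read(n * dim), dtype=np.int8)
    return data.reshape(n, dim)

# Round-trip a tiny synthetic file to show the layout.
vectors = np.arange(12, dtype=np.int8).reshape(3, 4)
buf = io.BytesIO(struct.pack("<II", *vectors.shape) + vectors.tobytes())
print(read_i8bin(buf).shape)  # (3, 4)
```

To read the real file, pass an open file object instead of the `BytesIO` buffer.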

Install dependencies and create the feed files for the first 10M vectors from the 100M sample:

```
$ pip3 install numpy requests tqdm
```

```
$ python3 app/src/main/python/create-vespa-feed.py spacev10m_base.i8bin
```

Output:

- `graph-vectors.jsonl`
- `if-vectors.jsonl`
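Each line in these files is one Vespa feed operation in JSON. The shape is roughly as follows, though the document type and field names here are illustrative assumptions, not the app's exact schema (see the schema files in the cloned application for the real definitions):

```json
{"put": "id:spann:vector::123",
 "fields": {"id": 123, "vector": {"values": [12, -3, 77, 5]}}}
```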

## Build and deploy Vespa app

Build the application:

```
$ mvn clean package -U -f app
```

Deploy the application:

```
$ vespa deploy --wait 900 ./app
```

Wait for the application endpoint to become available:

```
$ vespa status --wait 300
```

Test [basic functionality](https://github.com/vespa-engine/sample-apps/blob/master/billion-scale-vector-search/app/src/test/application/tests/system-test/feed-and-search-test.json):

```
$ vespa test app/src/test/application/tests/system-test/feed-and-search-test.json
```

See [CD tests](https://docs.vespa.ai/en/operations/automated-deployments.html#cd-tests) for details.

## Feed data

The _graph_ vectors must be fed before the _if_ vectors:

```
$ vespa feed graph-vectors.jsonl
```

```
$ vespa feed if-vectors.jsonl
```

Now is a good time to open the [Vespa Cloud Dashboard](https://console.vespa-cloud.com/link/application/autotest/dev/instance/default?default.dev.aws-us-east-1c=metrics) to track progress.

Refer to the [\<resources\>](https://github.com/vespa-engine/sample-apps/blob/master/billion-scale-vector-search/app/src/main/application/services.xml) configuration to manage feeding speed; more CPU means faster feeding, e.g.:

```
<resources vcpu="8" memory="16Gb" disk="50Gb"/>
```

Use the [instance type reference](https://cloud.vespa.ai/en/reference/aws-flavors.html) to find good combinations. Run times for a 2 vCPU deployment vs. an 8 vCPU deployment:

![duration 2vcpu](assets/billion-vector-2vcpu.png) ![duration 8vcpu](assets/billion-vector-8vcpu.png)

The feed and query phases of this guide are clearly visible in the metrics:

![feed and queries](assets/billion-vector-feed-queries.png)

## Recall Evaluation

Download the query vectors and the ground truth for the first 10M vectors:

```
$ curl -L -o query.i8bin \
  https://github.com/microsoft/SPTAG/raw/main/datasets/SPACEV1B/query.bin
$ curl -L -o spacev10m_gt100.i8bin \
  https://data.vespa-cloud.com/sample-apps-data/spacev10m_gt100.i8bin
```

Find the path to the credentials created in the `vespa auth cert` step above, for example:

```
/Users/username/.vespa/tenant_name.autotest.default/data-plane-public-cert.pem
```

Replace the two filenames in the command below accordingly. (This is not needed when running a [local test](#local-test-with-oci-image).)

Run the first 1K queries and evaluate recall@10. A higher number of clusters gives higher recall:

```
$ ENDPOINT=$(vespa status --format=plain)
$ python3 app/src/main/python/recall.py \
  --endpoint ${ENDPOINT}/search/ \
  --query_file query.i8bin \
  --query_gt_file spacev10m_gt100.i8bin \
  --certificate $PWD/../.vespa/vespa-team.autotest.default/data-plane-public-cert.pem \
  --key $PWD/../.vespa/vespa-team.autotest.default/data-plane-private-key.pem
```

See the [blog post](https://blog.vespa.ai/vespa-hybrid-billion-scale-vector-search/#hnsw-if-accuracy) for details about this script.
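For orientation, recall@10 is the fraction of each query's true 10 nearest neighbors that appear in the 10 results returned. A minimal sketch of the metric (the actual evaluation lives in `recall.py`):

```python
import numpy as np

def recall_at_k(retrieved, ground_truth, k=10):
    """Fraction of the true top-k neighbors present in the retrieved top-k."""
    hits = sum(
        len(set(r[:k]) & set(g[:k])) for r, g in zip(retrieved, ground_truth))
    return hits / (k * len(ground_truth))

# Two queries: the first retrieves all 3 true neighbors, the second 2 of 3.
retrieved    = np.array([[1, 2, 3], [4, 5, 9]])
ground_truth = np.array([[1, 2, 3], [4, 5, 6]])
print(recall_at_k(retrieved, ground_truth, k=3))  # 5/6 = 0.8333...
```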

Remove the application from Vespa Cloud when done:

```
$ vespa destroy --force
```

## Local test with OCI image

**Prerequisites:**

- Linux, macOS or Windows 10 Pro on x86\_64 or arm64, with [Podman Desktop](https://podman.io/) or [Docker Desktop](https://www.docker.com/products/docker-desktop/) installed, with an engine running. 
  - Alternatively, start the Podman daemon:

    ```
    $ podman machine init --memory 6000
    $ podman machine start
    ```

  - See [Docker Containers](/en/operations/self-managed/docker-containers.html) for system limits and other settings.

- For CPUs older than Haswell (2013), see [CPU Support](/en/cpu-support.html).
- Memory: Minimum 4 GB RAM dedicated to Docker/Podman. [Memory recommendations](/en/operations/self-managed/node-setup.html#memory-settings). 
- Disk: Make sure there is enough free space for the vespaengine/vespa container image plus headroom for data, to avoid feeds being blocked with `NO_SPACE`. [Read more](/en/writing/feed-block.html).
- [Homebrew](https://brew.sh/) to install the [Vespa CLI](/en/clients/vespa-cli.html), or download the Vespa CLI from [Github releases](https://github.com/vespa-engine/vespa/releases). 
- [Java 17](https://openjdk.org/projects/jdk/17/).
- [Apache Maven](https://maven.apache.org/install.html) is used to build the application.

Verify memory limits:

```
$ docker info | grep "Total Memory"
```
or

```
$ podman info | grep "memTotal"
```

Install the [Vespa CLI](/en/clients/vespa-cli.html):

```
$ brew install vespa-cli
```

For local deployment:

```
$ vespa config set target local
```

Download this sample application:

```
$ vespa clone billion-scale-vector-search myapp && cd myapp
```

Pull and start the Vespa image:

```
$ docker pull vespaengine/vespa
$ docker run --detach --name vespa --hostname vespa-container \
  --publish 127.0.0.1:8080:8080 --publish 127.0.0.1:19071:19071 \
  vespaengine/vespa
```

Verify that the configuration service (deploy API) is ready:

```
$ vespa status deploy --wait 300
```

At this point, you can continue the guide from [download vector data](#download-vector-data).

* * *

### Cleanup

When done, remove the container:

```
$ docker rm -f vespa
```

