Indexing paged vectors

Most of the data in a vector (tensor) index is the vectors themselves. The vector data must be accessed to calculate true distances, both when querying the index and when adding vectors to it, and because of the high dimensionality these accesses are effectively random. Paging indexed vector attributes to disk is viable for queries if somewhat higher latency can be tolerated, but it does not allow a large vector index to be built at reasonable speed: to create a high-quality index, each vector insert must make many distance calculations, which results in low write throughput when the vectors in the index do not reside in RAM.

To efficiently build vector indexes that are larger than available memory, use the procedure described here. It is suitable when:

You want to build an index for vector retrieval (not just store the vectors for ranking/brute force NN), with a vector data set that doesn't fit in memory across the content nodes you want to deploy for it.
The vector data in question is mostly write-once (frequent writes to other fields are fine), and rescaling of the content cluster will not be necessary.

Steps

Declare the vector field(s) to be indexed as paged.

    schema docs {
        document docs {
            field myVectors type tensor<bfloat16>(chunk{}, x[384]) {
                indexing: attribute | index
                attribute: paged
            }
        }
    }

Calculate how much data you can fit in memory:

Calculate the raw attribute data size per document (counting just the vector field is close enough unless you have many other attribute fields),
multiply by the number of searchable-copies you want,
multiply by 1.2 to add room for the index over the vectors,
divide by 0.65 to leave room for working memory,
multiply by your total number of documents.

This gives you the total memory needed across all the nodes in your content cluster (or across one group if you have multiple).

Example using the type above, with 1B documents, an average of 10 chunks per document and 2 searchable copies:
10 * 384 * 2 bytes * 2 * 1.2 / 0.65 * 1B ≈ 28,357 GB (about 28.4 TB) total cluster memory.
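The calculation steps above can be sketched in Python. This is only an illustration, not part of Vespa: the function name `cluster_memory_bytes` and its parameter names are hypothetical, and the 1.2 and 0.65 factors are the rules of thumb from the steps above.

```python
def cluster_memory_bytes(
    bytes_per_doc: float,
    searchable_copies: int,
    documents: int,
    index_overhead: float = 1.2,    # room for the index over the vectors
    usable_fraction: float = 0.65,  # share of memory left after working memory
) -> float:
    """Estimate total memory (bytes) across the content cluster, or one group."""
    per_doc = bytes_per_doc * searchable_copies * index_overhead / usable_fraction
    return per_doc * documents


# Inputs for the example type: 10 chunks per document on average,
# 384 bfloat16 values (2 bytes each) per chunk, 2 searchable copies,
# 1 billion documents.
bytes_per_doc = 10 * 384 * 2
total = cluster_memory_bytes(bytes_per_doc, searchable_copies=2, documents=1_000_000_000)
print(f"{total / 1e12:.2f} TB total cluster memory")
```

Dividing the result by the memory you are willing to allocate gives a rough idea of how many data subsets you will need in the next step.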
Create one document type per data subset which fits in memory under the calculation above.

Example: Suppose you want to create a vector index over four years' worth of documents of type docs and that you only want to allocate enough memory to fit 25% of the vector data across the cluster. Create four subtypes of docs, one for each year: docs2021, docs2022, docs2023 and docs2024, in four different schema files. Each of these can inherit the parent type and otherwise be empty:
```
schema docs2021 inherits docs {
    document docs2021 inherits docs {
    }
}
```
You can of course also add time-period-specific fields and ranking here.
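Since the subtype schemas are identical apart from the name, they can be generated rather than written by hand. The following is a minimal sketch under some assumptions: the helper names `subtype_schema` and `write_subtype_schemas` are hypothetical, and the output directory `schemas` is assumed to be where your application package keeps its schema files.

```python
from pathlib import Path

# Template for an otherwise empty subtype that inherits the parent type.
SCHEMA_TEMPLATE = """\
schema {name} inherits {parent} {{
    document {name} inherits {parent} {{
    }}
}}
"""


def subtype_schema(parent: str, suffix: str) -> tuple[str, str]:
    """Return (file name, schema text) for one subtype, e.g. docs + 2021."""
    name = f"{parent}{suffix}"
    return f"{name}.sd", SCHEMA_TEMPLATE.format(name=name, parent=parent)


def write_subtype_schemas(parent: str, suffixes: list[str], schema_dir: str = "schemas") -> list[Path]:
    """Write one schema file per suffix into schema_dir (an assumed location)."""
    out_dir = Path(schema_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for suffix in suffixes:
        filename, text = subtype_schema(parent, suffix)
        path = out_dir / filename
        path.write_text(text)
        paths.append(path)
    return paths


# Preview one generated schema without touching the file system.
print(subtype_schema("docs", "2021")[1])
```

Calling `write_subtype_schemas("docs", ["2021", "2022", "2023", "2024"])` would write the four files for the example above.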

Add all the subtypes to the content cluster you want in services.xml:

    <content id="myClusterId" version="1.0">
        <documents>
            <document type="docs2021" mode="index" />
            <document type="docs2022" mode="index" />
            ...
        </documents>
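With many subtypes, the document entries can also be generated. This is a small sketch using Python's standard library; the function name `documents_element` is hypothetical, and it only produces the documents element, which you would still paste into the content cluster in your own services.xml.

```python
import xml.etree.ElementTree as ET


def documents_element(doc_types: list[str]) -> str:
    """Build a <documents> element with one indexed <document> per type."""
    documents = ET.Element("documents")
    for doc_type in doc_types:
        ET.SubElement(documents, "document", {"type": doc_type, "mode": "index"})
    ET.indent(documents, space="    ")  # pretty-print; requires Python 3.9+
    return ET.tostring(documents, encoding="unicode")


# One document type per year, matching the subtype names used above.
xml_text = documents_element([f"docs{year}" for year in range(2021, 2025)])
print(xml_text)
```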

Feed each type to completion, one type at a time, without running queries at the same time.
Once all the types are written, you can apply query traffic. Vespa will search across all the types by default, but it is possible to restrict to a subset using the restrict query parameter.
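As an illustration of restricting a query to some of the types, the sketch below builds a query URL with the restrict parameter. Several things here are assumptions rather than facts from this page: the endpoint `http://localhost:8080`, the helper name `search_url`, the placeholder YQL statement, and passing the type names as a comma-separated list. The code only constructs the URL and does not contact a Vespa instance.

```python
from typing import Optional
from urllib.parse import urlencode


def search_url(endpoint: str, yql: str, restrict: Optional[list[str]] = None) -> str:
    """Build a search URL, optionally restricted to the given document types."""
    params = {"yql": yql}
    if restrict:
        # Assumption: type names are passed as one comma-separated value.
        params["restrict"] = ",".join(restrict)
    return f"{endpoint.rstrip('/')}/search/?{urlencode(params)}"


# Hypothetical endpoint and a placeholder query matching all documents.
url = search_url(
    "http://localhost:8080",
    "select * from sources * where true",
    restrict=["docs2023", "docs2024"],
)
print(url)
```

Omitting the `restrict` argument produces a URL that searches across all the types, which is the default behavior described above.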