Vespa basics
Learn more
Applications and components
Schemas and documents
Reading and writing
Querying
Ranking and inference
RAG and embedding
Linguistics and text processing
Content and elasticity
Performance
Operations
- Quota
- Environments
- Zones
- Production deployment
- Deployment variants
- Automated deployments
- Autoscaling
- Enclave: Bring your own cloud
- Reindexing
- Data management and backup
- Cloning applications and data
- Monitoring
- Metrics
- Notifications
- Support
- Deployment patterns
- Private endpoints
- Endpoint routing
- Access logging
- Artifact archive
- Deleting applications
- Self-managed
- Kubernetes
  - Introduction
  - Architecture
  - Deployment
    
    Installation
    
    Minikube Setup
    
    Setup ECR Pull-through Cache
    
    Setup Dev Environment
    
    Permissions
  - Operations
    
    Operations
    
    Upgrade Vespa on Kubernetes
    
    Delete a VespaSet
    
    Monitor a Vespa on Kubernetes Deployment
  - Configuration
    
    Configure Local Storage Type
    
    Configure Log Collections
    
    Configure External Access Layer
    
    Provide Custom Overrides
    
    Enable TLS Encryption
Security
Clients
Modules
- E-commerce
Reference
- APIs
- Applications and components
- Schemas and documents
- Reading and writing
  - Indexing language
  - Document selector language
- Querying
- Ranking and inference
- RAG and embedding
  - Chunking
  - Embedding
- Operations
  - Health checks
  - Log files
  - Tools
  - Metrics
    
    Metrics
    
    Default metric set
    
    Vespa metric set
    
    Metric units
    
    Container metrics
    
    Distributor metrics
    
    Search node metrics
    
    Storage metrics
    
    Configserver metrics
    
    Logd metrics
    
    Node Admin metrics
    
    Slobrok metrics
    
    Cluster controller metrics
    
    Sentinel metrics
  - Self-managed
    
    Tools
- Security
  - Mtls
- Clients
  - Vespa CLI
    
    vespa
    
    vespa activate
    
    vespa auth
    
    vespa clone
    
    vespa config
    
    vespa curl
    
    vespa deploy
    
    vespa destroy
    
    vespa document
    
    vespa feed
    
    vespa fetch
    
    vespa inspect
    
    vespa log
    
    vespa prepare
    
    vespa prod
    
    vespa query
    
    vespa status
    
    vespa test
    
    vespa version
    
    vespa visit
- Release notes

Reindexing

When the indexing pipeline of a Vespa application changes, Vespa may automatically refeed stored data such that the index is updated according to the new specification. Changes in the indexing pipeline may be due to changes in external libraries, e.g. for linguistics, or due to changes in the configuration done by the user, such as the indexing script in a document's schema, or the indexing mode of a document type in a content cluster. Reindexing can be done for an application's full corpus, for only certain content clusters, or for only certain document types in certain clusters, using the reindex endpoint, and inspected at the reindexing endpoint, details are described below.

Start reindexing

When a change in the indexing pipeline of an application is deployed, this is discovered by the config server (see the prepare endpoint for details). If the change is to be deployed, a validation override might have to be added to the application package (e.g. if changing match settings for a field). Deployment output will then list the reindex actions required to make the index reflect the new indexing pipeline. Use the reindex endpoint to mark reindexing as ready for affected document types, but only after the new indexing pipeline is successfully deployed, i.e. when the application has converged on the config generation that introduced the change. Reindexing then commences with the next deployment of the application. Summary of steps needed to enable and start reindexing:

Change indexing pipeline in application package, adding validation overrides if needed
Wait until config has converged on new config generation
Mark reindexing change as ready by POSTing to reindex endpoint
Start reindexing job by deploying application package one more time

Reindexing progress

Reindexing is done by a component in each content cluster that visits all documents of the indicated types, and re-feeds these through the indexing chain of the cluster. (Note that only the document fields are re-fed — all derived fields, produced by the indexing pipeline, are recomputed.) The reindexing process avoids write races with concurrent feed by locking small subsets of the corpus when reindexing them; this may cause elevated write latencies for a fraction of concurrent write operations, but does not impact general throughput. Moreover, since reindexing can be both lengthy and resource consuming, depending on the corpus, the process is tuned to yield resources to other tasks, such as external feed and serving, and is generally safe to run in the background.

Reindexing is done for one document type at a time, in parallel across content clusters. Detailed progress can be found at the reindexing endpoint. If state is failed, reindexing attempts to resume from the position where it failed after a grace period of some minutes. State pending indicates reindexing will start, or resume, when the cluster is ready, while running means it's currently progressing. Finally, successful means all documents of that type were successfully reindexed. Additionally, if the speed of a reindexing is 0.0—set by users—that reindexing is halted until the speed is either set to a positive value again, or it is replaced by a new reindexing of that document type.

Procedure

Refer to schema changes for a procedure / way to test the reindexing feature, and tools to validate the data.

Use cases

Below are sample changes to the schema for different use cases, or examples of operational steps for data manipulation.

Use case	Description
clear field	To clear a field, do a partial update of all documents with the value, say an empty string. It is also possible to use reindexing, but there is a twist - intuitively, this would work: field artist type string { indexing: "" \| summary \| index } However, the reset only works for synthetic fields. A solution is to deploy a document processor that empties the field, to the default indexing chain - then trigger a reprocessing.
change indexing settings	As reindexing takes time, a field's data can be in one state or another, while the queries to it have the most current state. This is OK for many changes and applications. If not, it is possible to reindex to a new field for a more atomic change. Add a synthetic field outside the document definition and pipe the content of the current field to it: search mydocs { field title_non_stemmed type string { indexing: input title \| index \| summary stemming: none } document mydocs { field title type string { indexing: index \| summary } Once reindexing is completed, switch queries to use the new field. This solution naturally increases memory and disk requirements in the transition. Going back to using the original field with the new settings can be done by changing the index settings for the original field, wait for reindexing to be finished and start using the original field again in queries, then remove the temporary synthetic field.

Use case

Description

clear field

To clear a field, do a partial update of all documents with the value, say an empty string.

It is also possible to use reindexing, but there is a twist - intuitively, this would work:

field artist type string {
    indexing: "" | summary | index
}

However, the reset only works for synthetic fields.

A solution is to deploy a document processor that empties the field, to the default indexing chain - then trigger a reprocessing.

change indexing settings

As reindexing takes time, a field's data can be in one state or another, while the queries to it have the most current state. This is OK for many changes and applications.

If not, it is possible to reindex to a new field for a more atomic change. Add a synthetic field outside the document definition and pipe the content of the current field to it:

search mydocs {

    field title_non_stemmed type string {
        indexing: input title | index | summary
        stemming: none
    }

    document mydocs {
        field title type string {
            indexing: index | summary
        }

Once reindexing is completed, switch queries to use the new field. This solution naturally increases memory and disk requirements in the transition.

Going back to using the original field with the new settings can be done by changing the index settings for the original field, wait for reindexing to be finished and start using the original field again in queries, then remove the temporary synthetic field.

Relevant pointers: