Operations

A deployed Vespa application is a self-contained, highly available, distributed, stateful system. Operating such systems at scale is difficult, so Vespa automates operations as far as the deployment environment it runs in allows.

Deployment environment	Automated operations	Suitable for
Vespa self-managed/open source	Application deployment (single application, single instance), application change (except rolling restarts), data redistribution, failover	Development
Vespa Kubernetes Operator	Application deployment (single application, single instance), application change, data redistribution, failover, node provisioning, failed node replacement, node type change, autoscaling, endpoint routing, encryption	Production in environments outside hyperscalers
Vespa Cloud	Application deployment (multiple applications, instances, regions, clouds), application change, data redistribution, failover, node provisioning, failed node replacement, node type change, autoscaling, endpoint routing, encryption, Vespa platform and OS upgrades, continuous deployment pipeline with verification, metrics and management console	Development, production on hyperscalers (including in customer accounts and VPCs).

Vespa is designed to let applications evolve in production. This includes the following aspects:

Application package changes are carried out by Vespa's built-in control plane without impacting queries or writes. If a change cannot be made without impacting queries or writes, the deployment is rejected unless a validation override explicitly allows it.
The operations supported by Vespa are those that can be scaled to hundreds of nodes, billions of documents and hundreds of thousands of queries per second. If you can run it on a single machine, you can scale it.
The hardware resources available in a cluster can be changed both up and down. Data is redistributed automatically in the background, with limited resource usage to avoid impacting queries and writes.
When possible (on Vespa Cloud), new revisions of applications are deployed in test zones where they can be verified by application-supplied functional tests before being allowed to progress to production.
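Validation overrides, mentioned in the first point above, are declared in a validation-overrides.xml file in the application package. The following is a minimal Python sketch that writes such a file with the standard library. The 30-day limit on the until date reflects our understanding of the Vespa reference and should be verified there; the helper name validation_overrides_xml is hypothetical.

```python
import datetime as dt
import xml.etree.ElementTree as ET

# Assumption: Vespa rejects override dates more than 30 days in the future.
MAX_DAYS_AHEAD = 30


def validation_overrides_xml(validation_ids, until, comment=None, today=None):
    """Return the text of a validation-overrides.xml that allows the given
    validation ids (for example "content-cluster-removal") until `until`."""
    today = today or dt.date.today()
    # An override dated in the past would no longer apply, so fail early.
    if until < today:
        raise ValueError("until date is in the past")
    if (until - today).days > MAX_DAYS_AHEAD:
        raise ValueError(f"until date must be at most {MAX_DAYS_AHEAD} days ahead")

    root = ET.Element("validation-overrides")
    for validation_id in validation_ids:
        allow = ET.SubElement(root, "allow", until=until.isoformat())
        if comment:
            allow.set("comment", comment)
        allow.text = validation_id
    ET.indent(root)  # pretty-print (Python 3.9+)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(validation_overrides_xml(
        ["content-cluster-removal"],
        until=dt.date(2024, 6, 10),
        comment="Removing the old cluster on purpose",
        today=dt.date(2024, 6, 1),
    ))
```

Keeping overrides short-lived means a one-off destructive change does not silently stay allowed for later deployments.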

Performance and scaling

Content clusters in Vespa can be scaled to any amount of content by adding more nodes (horizontal scaling). Data will redistribute automatically, and there's no need for manual tuning of the process. To scale to large amounts of queries, content clusters can also be scaled by adding multiple groups of nodes (vertical scaling). Each group contains a single copy of the corpus and container clusters will automatically load balance over groups.
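To illustrate why adding nodes moves only part of the data, and why any single group can answer a query, here is a toy Python model. It is a sketch under stated assumptions, not Vespa's implementation: Vespa places data with its own ideal state algorithm, and rendezvous (highest-random-weight) hashing stands in for it here, while simple round-robin stands in for the container's actual load balancing over groups. The names score, owner and ToyContentCluster are hypothetical.

```python
import hashlib
from itertools import cycle


def score(bucket: int, node: str) -> int:
    """Deterministic pseudo-random weight for a (bucket, node) pair."""
    digest = hashlib.sha256(f"{bucket}:{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def owner(bucket: int, nodes: list[str]) -> str:
    """Rendezvous hashing: the node with the highest score stores the bucket."""
    return max(nodes, key=lambda node: score(bucket, node))


class ToyContentCluster:
    """Hypothetical model: each group holds one full copy of the corpus,
    spread over that group's nodes; each query goes to one group."""

    def __init__(self, groups: list[list[str]]):
        self.groups = groups
        self._round_robin = cycle(range(len(groups)))

    def placement(self, buckets) -> list[dict[int, str]]:
        """For each group, map every bucket to the node in that group holding it."""
        return [{b: owner(b, nodes) for b in buckets} for nodes in self.groups]

    def route_query(self) -> int:
        """Pick the next group; a single group suffices since it has all data."""
        return next(self._round_robin)


if __name__ == "__main__":
    buckets = range(1000)
    before = {b: owner(b, ["n0", "n1", "n2"]) for b in buckets}
    after = {b: owner(b, ["n0", "n1", "n2", "n3"]) for b in buckets}
    moved = [b for b in buckets if before[b] != after[b]]
    print(f"{len(moved)} of {len(buckets)} buckets moved; all to n3:",
          all(after[b] == "n3" for b in moved))
```

With rendezvous hashing, adding a fourth node moves only the buckets the new node now wins (about a quarter on average), and every moved bucket lands on the new node; existing nodes never shuffle data among themselves. Adding a group, by contrast, adds a full copy of the corpus and another target for queries.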

A Vespa application can consist of any number of stateless and stateful clusters. In larger applications it can be beneficial to split different functions into separate clusters that can be optimized separately. For example, you can use one stateless container cluster for feeding and another for querying, or put different schemas in different content clusters.
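As an illustration of splitting functions into separate clusters, the Python sketch below generates a services.xml skeleton with one container cluster for feeding (document-api), one for querying (search), and one content cluster per set of schemas. It assumes Vespa Cloud-style node sizing with a nodes element carrying a count attribute; the cluster ids, node counts, min-redundancy value, schema names and the helper services_xml are placeholders, and the output should be checked against the services.xml reference before use.

```python
import xml.etree.ElementTree as ET


def services_xml(feed_nodes: int, query_nodes: int,
                 content_clusters: dict[str, list[str]]) -> str:
    """Build a services.xml skeleton with separate feed and query container
    clusters and one content cluster per entry in `content_clusters`
    (content cluster id -> schema names stored in that cluster)."""
    services = ET.Element("services", version="1.0")

    # Stateless cluster that accepts writes through the Document API.
    feed = ET.SubElement(services, "container", id="feed", version="1.0")
    ET.SubElement(feed, "document-api")
    ET.SubElement(feed, "nodes", count=str(feed_nodes))

    # Stateless cluster that serves queries.
    query = ET.SubElement(services, "container", id="query", version="1.0")
    ET.SubElement(query, "search")
    ET.SubElement(query, "nodes", count=str(query_nodes))

    # Stateful clusters, each sized and tuned for its own schemas.
    for cluster_id, schemas in content_clusters.items():
        content = ET.SubElement(services, "content", id=cluster_id, version="1.0")
        ET.SubElement(content, "min-redundancy").text = "2"  # illustrative value
        documents = ET.SubElement(content, "documents")
        for schema in schemas:
            ET.SubElement(documents, "document", type=schema, mode="index")
        ET.SubElement(content, "nodes", count="2")  # illustrative value

    ET.indent(services)  # pretty-print (Python 3.9+)
    return ET.tostring(services, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical schemas "product" and "event" in two content clusters.
    print(services_xml(2, 4, {"products": ["product"], "logs": ["event"]}))
```

Separating the clusters lets feed and query capacity be scaled independently. Depending on the setup, you may also need to state explicitly which container cluster runs document processing (indexing); check the content cluster reference for how to do that.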

Read more in the elasticity documentation and the performance guide.