Operations

A deployed Vespa application is a self-contained, highly available, distributed stateful system. Operating such systems at scale is difficult, so Vespa automates operations to the extent possible in the deployment environment it runs in.

Deployment environment	Automated operations	Suitable for
Vespa self-managed/open source	Application deployment (single application, single instance), application change (except rolling restarts), data redistribution, failover	Development
Vespa Kubernetes Operator	Application deployment (single application, single instance), application change, data redistribution, failover, node provisioning, failed node replacement, node type change, autoscaling, endpoint routing, encryption	Production in environments outside hyperscalers
Vespa Cloud	Application deployment (multiple applications, instances, regions, clouds), application change, data redistribution, failover, node provisioning, failed node replacement, node type change, autoscaling, endpoint routing, encryption, Vespa platform and OS upgrades, continuous deployment pipeline with verification, metrics and management console	Development, production on hyperscalers (including in customer accounts and VPCs)

Vespa is designed to enable applications to evolve in production. This includes these aspects:

Application package changes are managed by Vespa's built-in control plane and carried out without impacting queries or writes. If a change cannot be made without impacting queries or writes, it is rejected on deployment (and requires a validation override to be allowed).
The operations supported by Vespa are those that can be scaled to hundreds of nodes, billions of documents and hundreds of thousands of queries per second. If you can run it on a single machine, you can scale it.
The hardware resources available in a cluster can be changed both up and down. Redistribution happens automatically in the background, with limited resource usage to avoid impacting queries and writes.
When possible (on Vespa Cloud), new revisions of applications are deployed in test zones where they can be verified by application-supplied functional tests before being allowed to progress to production.
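
The validation override mentioned in the first point above is declared in validation-overrides.xml in the application package. A minimal sketch, assuming the rejected change is a field type change; the validation ID and the date are placeholders chosen for illustration, and the deployment error message names the ID that actually applies:

```xml
<!-- validation-overrides.xml: illustrative sketch, not taken from this page -->
<validation-overrides>
    <!-- Allow the named disruptive change until the given date.
         Both the date and the validation ID are placeholder assumptions. -->
    <allow until="2025-01-31">field-type-change</allow>
</validation-overrides>
```

Because the override carries an expiry date, it does not stay in effect indefinitely, so later deployments are again protected against the same kind of change.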

Performance and scaling

Content clusters in Vespa can be scaled to any amount of content by adding more nodes (horizontal scaling). Data redistributes automatically, and there is no need to tune the process manually. To scale to high query volumes, content clusters can also be scaled by adding multiple groups of nodes (vertical scaling). Each group contains a single copy of the corpus, and container clusters automatically load balance over groups.
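
As a rough illustration of grouping, the content cluster below spreads six nodes over two groups, so each group of three nodes holds one copy of the corpus. This sketch assumes the Vespa Cloud form of services.xml, where nodes are declared by count; the cluster ID, the document type `music`, and the node and group counts are hypothetical:

```xml
<!-- services.xml fragment: illustrative sketch with hypothetical names and sizes -->
<content id="music" version="1.0">
    <!-- Two copies in total, matching one copy per group with two groups -->
    <min-redundancy>2</min-redundancy>
    <documents>
        <document type="music" mode="index"/>
    </documents>
    <!-- 6 nodes divided into 2 groups gives 3 nodes per group -->
    <nodes count="6" groups="2"/>
</content>
```

In this form, adding groups while keeping the nodes per group constant adds query capacity, while adding nodes per group adds room for content. Self-managed deployments declare groups and nodes explicitly rather than by count.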

A Vespa application can consist of any number of stateless and stateful clusters. In larger applications it can be beneficial to split different functions into separate clusters that can be optimized separately, for example, one stateless container cluster for feeding and another for querying, or different content clusters for different data schemas.
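
The feed and query split described above could look roughly like the following services.xml. This is a sketch under assumptions: the cluster IDs `feed`, `query`, and `music`, the document type, and all node counts are hypothetical, and the count-based `nodes` element is the Vespa Cloud form:

```xml
<?xml version="1.0" encoding="utf-8" ?>
<!-- services.xml: illustrative sketch with hypothetical cluster IDs and sizes -->
<services version="1.0">

    <!-- Stateless container cluster that only accepts document writes -->
    <container id="feed" version="1.0">
        <document-api/>
        <nodes count="2"/>
    </container>

    <!-- Stateless container cluster that only serves queries -->
    <container id="query" version="1.0">
        <search/>
        <nodes count="2"/>
    </container>

    <!-- Stateful content cluster holding the documents -->
    <content id="music" version="1.0">
        <min-redundancy>2</min-redundancy>
        <documents>
            <document type="music" mode="index"/>
        </documents>
        <nodes count="2"/>
    </content>

</services>
```

With this layout, each container cluster can be sized for its own load, so a feed burst does not compete for container resources with query traffic.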

Read more in elasticity and the performance guide.