A deployed Vespa application is a self-contained, highly available, distributed, stateful system. Operating such systems at scale is difficult, so Vespa automates operations to the extent possible in the deployment environment it is running in.
| Deployment environment | Automated operations | Suitable for |
|---|---|---|
| Vespa self-managed/open source | Application deployment (single application, single instance), application change (except rolling restarts), data redistribution, failover | Development |
| Vespa Kubernetes Operator | Application deployment (single application, single instance), application change, data redistribution, failover, node provisioning, failed node replacement, node type change, autoscaling, endpoint routing, encryption | Production in environments outside hyperscalers |
| Vespa Cloud | Application deployment (multiple applications, instances, regions, clouds), application change, data redistribution, failover, node provisioning, failed node replacement, node type change, autoscaling, endpoint routing, encryption, Vespa platform and OS upgrades, continuous deployment pipeline with verification, metrics and management console | Development, production on hyperscalers (including in customer accounts and VPCs) |
Vespa is designed to let applications evolve while in production. This includes the following aspects:
Content clusters in Vespa can be scaled to any amount of content by adding more nodes (horizontal scaling). Data is redistributed automatically, with no need for manual tuning of the process. To handle large query volumes, content clusters can also be scaled by adding more groups of nodes. Each group holds a complete copy of the corpus, and container clusters automatically load balance queries over the groups.
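
The two scaling dimensions can be illustrated with a toy model. The sketch below is not Vespa's distribution algorithm; it assumes a simple hash-modulo placement of documents inside a group and round-robin balancing of queries over groups, and the function names are hypothetical. It only shows the intended effect: more nodes per group means fewer documents per node, and more groups means fewer queries per group.

```python
import hashlib
from collections import Counter


def node_for(doc_id: str, nodes_per_group: int) -> int:
    """Toy placement (assumption): hash a document id onto one node in a group."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % nodes_per_group


def docs_per_node(doc_ids, nodes_per_group: int) -> list:
    """Count how many documents each node in a single group would hold."""
    counts = Counter(node_for(d, nodes_per_group) for d in doc_ids)
    return [counts.get(n, 0) for n in range(nodes_per_group)]


def queries_per_group(num_queries: int, groups: int) -> list:
    """Round-robin balancing (assumption): each query goes to exactly one group."""
    counts = [0] * groups
    for q in range(num_queries):
        counts[q % groups] += 1
    return counts


if __name__ == "__main__":
    ids = [f"id:shop:product::{i}" for i in range(1000)]  # hypothetical ids
    # Adding nodes to a group spreads the same corpus more thinly.
    print(docs_per_node(ids, 4))
    print(docs_per_node(ids, 8))
    # Adding groups spreads the same query load over more full copies.
    print(queries_per_group(900, 1))  # [900]
    print(queries_per_group(900, 3))  # [300, 300, 300]
```

In this model every group stores all 1000 documents, so adding groups raises hardware cost in proportion to the number of copies, while adding nodes to a group does not duplicate data.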
A Vespa application can consist of any number of stateless and stateful clusters. In larger applications it can be beneficial to split different functions into separate clusters that can be sized and optimized independently. Examples include having one stateless container cluster for feeding and another for querying, or using different content clusters for different schemas.
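
As a rough illustration of such a split, the sketch below renders a `services.xml` skeleton with two container clusters and two content clusters. The cluster ids (`feed`, `query`, `products`, `reviews`), the document types, and the host aliases are hypothetical, and a redundancy of 1 is used only for brevity. The element names follow the self-managed `services.xml` format as commonly documented, but should be checked against the services reference for the Vespa version in use.

```python
import xml.etree.ElementTree as ET


def add_container(services, cluster_id, api_tag, hostalias):
    """Add one stateless container cluster that exposes a single API."""
    cluster = ET.SubElement(services, "container", id=cluster_id, version="1.0")
    # <document-api/> enables feeding, <search/> enables querying.
    ET.SubElement(cluster, api_tag)
    nodes = ET.SubElement(cluster, "nodes")
    ET.SubElement(nodes, "node", hostalias=hostalias)
    return cluster


def add_content(services, cluster_id, doc_type, hostalias):
    """Add one stateful content cluster holding a single document type."""
    cluster = ET.SubElement(services, "content", id=cluster_id, version="1.0")
    ET.SubElement(cluster, "redundancy").text = "1"  # brevity only (assumption)
    documents = ET.SubElement(cluster, "documents")
    ET.SubElement(documents, "document", type=doc_type, mode="index")
    nodes = ET.SubElement(cluster, "nodes")
    ET.SubElement(nodes, "node", {"hostalias": hostalias, "distribution-key": "0"})
    return cluster


def build_services():
    """Build the skeleton: separate feed/query containers, one content cluster per schema."""
    services = ET.Element("services", version="1.0")
    add_container(services, "feed", "document-api", "feed1")
    add_container(services, "query", "search", "query1")
    add_content(services, "products", "product", "content1")
    add_content(services, "reviews", "review", "content2")
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(services)
    return services


if __name__ == "__main__":
    print(ET.tostring(build_services(), encoding="unicode"))
```

With this layout, the feed and query container clusters can be given different node counts and resources, and each content cluster can be sized for the volume and access pattern of its own schema.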
Read more in elasticity and the performance guide.