Vespa Overview

Vespa is a platform for developing and running scalable, low-latency backend services, both stateful and stateless. This document provides an overview of the features and main components of the platform.


Vespa allows application developers to create backend and middleware systems which scale to accommodate large amounts of data and high loads without sacrificing latency or reliability. A Vespa instance consists of a number of stateless Java container clusters and zero or more content clusters storing data.

The stateless container clusters host components which process incoming data as well as requests/queries and their responses. These components provide not only the functionality belonging to the platform, like indexing and the global stages of query execution, but also the middleware logic of the application. Application developers can configure their Vespa system with a single stateless cluster which performs all such functions, or create different clusters within their system for each kind of task. The container clusters then pass queries and data operations on to the appropriate content clusters. If the application uses data it does not own, the container clusters can also federate with external services which supply this data.

Content clusters in Vespa are responsible for storing data (documents) and performing lookups and distributed select/group/aggregate queries over it. They can function as a simple key-value serving store, and can also perform complex searches over structured and unstructured data, order the data by relevance models, and group and aggregate the results of a search. Great care has been taken to make such operations work with low latency, such that these features can be used directly in end-user applications on large data sets without requiring precomputation of result data.

To provide scalability, content clusters automatically rebalance data, in the background, to maintain a given redundancy level. They also fail over unreachable nodes, and thus are both elastic and auto-recovering.

After intermediate processing in a container cluster, data is written to a content cluster. Writes take effect after a few milliseconds, are guaranteed to either succeed or provide failure information within a given time limit, and they scale with the available resources. Writes can be sent directly over HTTP, or by using a Java client — refer to the API documentation.
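As a rough illustration of a write sent over HTTP, the sketch below builds (but does not send) a request with Java's standard HTTP client. The host and port, the /document/v1 path layout, the namespace "mynamespace", the document type "music" and the field name are all assumptions made for this example; consult the API documentation for the exact endpoint and payload format of your Vespa version.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class DocumentWriteSketch {

    /**
     * Builds a request that puts one document.
     * All arguments are illustrative; none of them come from a real application.
     */
    static HttpRequest putRequest(String endpoint, String namespace, String docType,
                                  String id, String fieldsJson) {
        // Assumed path layout: /document/v1/<namespace>/<document type>/docid/<id>
        URI uri = URI.create(endpoint + "/document/v1/" + namespace + "/" + docType + "/docid/" + id);
        // Assumed payload layout: the document fields wrapped in a "fields" object
        String body = "{\"fields\": " + fieldsJson + "}";
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = putRequest("http://localhost:8080", "mynamespace", "music",
                "1", "{\"title\": \"Example\"}");
        // Only prints the request line; nothing is sent over the network
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request would be a matter of passing it to java.net.http.HttpClient; the response would then carry the success or failure information described above.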

Each document instance stored in Vespa must have a configured schema. Each content cluster in a system can handle data of many types at the same time; applications can separate different types of data into different content clusters, or put multiple data types in the same content cluster.

Container and content clusters handle all the end user traffic of a Vespa application. A third type of cluster, the admin and config cluster, manages the other cluster types and handles requests for changes to the configuration of the system.

A Vespa application is completely described by an application package, which is a directory containing a declaration of the clusters which should run as part of the system, the content schemas, Java components and other configuration or data files needed by the application. Application owners make their application go live by deploying the application package to the single admin cluster, and make changes to a live application in the same way. In addition to controlling application configuration, the admin cluster also collects logs from all the nodes of the system in real time. Once Vespa is installed and started on a node, it is managed by the admin system such that the entire system can be treated as a single unit, and application owners do not need to perform administration tasks locally on the nodes of the system.

The rest of this document provides some more detail on the functions Vespa performs.

Vespa operations

Vespa accepts the following operations:

  • Writes: Put (add and replace) and remove documents, and update fields in these.
  • Lookup of a document (or some subset of it) by id.
  • Queries: Select documents whose fields match a boolean combination of conditions; matches are either sorted, ranked or grouped. Ranking is done according to a ranking expression, which can be a simple mathematical function, express complex business logic, or evaluate a machine-learned search ranking model. Grouping is done by field values, in a set of hierarchical groups, where each group can contain aggregated values over the data in the group. Grouping can be combined with aggregation to calculate values for, e.g., navigation aids, tag clouds, graphs or clustering. All of this happens in a distributed fashion, without having to ship all the data back up to a container cluster, which is prohibitively expensive with large data sets.
  • Data dumps: Content matching some criterion can be streamed out for background reprocessing, backup, etc., by using the visit operation.
  • Any other custom network request which can be handled by application components deployed on a container cluster.

These operations allow developers to build feature-rich applications expressed as Java middleware logic working on stored content, where selection, keyword searches, and the organization and processing of the content can be expressed by the middleware logic as declarative queries.
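To make the point about distributed grouping concrete, here is a minimal sketch of the general idea: each node computes small partial aggregates over its own documents, and only those partials are merged centrally. The classes, the "category" and "price" fields and the in-memory "nodes" are hypothetical and chosen for illustration; this is not Vespa's grouping API or implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GroupingSketch {

    /** A toy document with one grouping field and one numeric field. */
    static final class Doc {
        final String category;
        final double price;
        Doc(String category, double price) { this.category = category; this.price = price; }
    }

    /** Partial aggregate for one group: cheap to ship compared to the documents themselves. */
    static final class Partial {
        long count;
        double sum;
        void add(double value) { count++; sum += value; }
        void merge(Partial other) { count += other.count; sum += other.sum; }
        double average() { return count == 0 ? 0.0 : sum / count; }
    }

    /** Would run on each (hypothetical) content node, over its local documents only. */
    static Map<String, Partial> groupLocally(List<Doc> localDocs) {
        Map<String, Partial> groups = new TreeMap<>();
        for (Doc doc : localDocs) {
            groups.computeIfAbsent(doc.category, k -> new Partial()).add(doc.price);
        }
        return groups;
    }

    /** Would run in the (hypothetical) container: merges the per-node partials. */
    static Map<String, Partial> mergePartials(List<Map<String, Partial>> perNode) {
        Map<String, Partial> merged = new TreeMap<>();
        for (Map<String, Partial> nodeResult : perNode) {
            nodeResult.forEach((group, partial) ->
                    merged.computeIfAbsent(group, k -> new Partial()).merge(partial));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Doc> node0 = List.of(new Doc("books", 10.0), new Doc("music", 5.0));
        List<Doc> node1 = List.of(new Doc("books", 30.0));
        Map<String, Partial> result =
                mergePartials(List.of(groupLocally(node0), groupLocally(node1)));
        result.forEach((group, p) ->
                System.out.println(group + " count=" + p.count + " avg=" + p.average()));
    }
}
```

The amount of data sent to the merging step grows with the number of groups, not with the number of documents, which is why this style of aggregation stays affordable on large data sets.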

The stateless container

Container clusters host the application components which employ the operations listed above and process their return data. Vespa provides a set of components out of the box, together with component infrastructure: dependency injection built on top of Guice, with added support for injection of config from the admin server or the application package; a component model based on OSGi; a shared mechanism to chain components into handler chains for modularity; and metrics and logging. The container also provides the network layer for handling and issuing remote requests. HTTP is provided out of the box, and other protocols/transports can be transparently plugged in as components.
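The idea of chaining components can be sketched in a few lines. The Handler and Chain types below are hypothetical stand-ins written for this example, not the container's actual interfaces; they only show how each component may act on a request before passing it on, and on the response on the way back.

```java
import java.util.List;
import java.util.Locale;

class ChainSketch {

    /** Hypothetical component interface: handle a request, optionally delegating to the rest. */
    interface Handler {
        String handle(String request, Chain rest);
    }

    /** An ordered chain of handlers; the end of the chain produces the raw result. */
    static final class Chain {
        private final List<Handler> handlers;
        private final int index;

        Chain(List<Handler> handlers) { this(handlers, 0); }

        private Chain(List<Handler> handlers, int index) {
            this.handlers = handlers;
            this.index = index;
        }

        String process(String request) {
            if (index >= handlers.size()) {
                return "result(" + request + ")"; // stand-in for the backend call
            }
            return handlers.get(index).handle(request, new Chain(handlers, index + 1));
        }
    }

    static Chain exampleChain() {
        // Rewrites the request before passing it on
        Handler normalize = (request, rest) -> rest.process(request.toLowerCase(Locale.ROOT));
        // Leaves the request alone but decorates the response on the way back
        Handler tag = (request, rest) -> rest.process(request) + " [tagged]";
        return new Chain(List.of(normalize, tag));
    }

    public static void main(String[] args) {
        System.out.println(exampleChain().process("Hello"));
    }
}
```

Because each handler only knows about the remainder of the chain, components can be added, removed or reordered without changing the others, which is the modularity benefit referred to above.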

Developers can make changes to components (and of course their configuration) simply by redeploying their application package - the system takes care of copying the components to the nodes of the cluster and loading/unloading components on the fly without impacting request serving.

Content clusters

Content clusters reliably store data and maintain distributed indices of data for searches and selects. The data is replicated over multiple nodes, with a number of copies specified by the application, such that the cluster can automatically repair itself on loss of a node or a disk. Using the same mechanism, clusters can also be grown or shrunk while online, simply by changing the set of available nodes declared in the application package.
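One way to picture how a cluster can keep a fixed number of copies while nodes come and go is rendezvous (highest-random-weight) hashing, sketched below. This technique is swapped in purely for illustration and is not claimed to be the distribution algorithm Vespa uses; the node names, the hash function and the redundancy value are likewise assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class PlacementSketch {

    /** Deterministic pseudo-random score for a (document, node) pair. */
    static long score(String docId, String node) {
        long h = 1125899906842597L;
        for (char c : (docId + "|" + node).toCharArray()) {
            h = 31 * h + c;
        }
        // Final bit mixing so that similar strings get unrelated scores
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        return h;
    }

    /** The nodes that should hold copies of a document: the highest-scoring ones. */
    static List<String> replicas(String docId, List<String> nodes, int redundancy) {
        List<String> sorted = new ArrayList<>(nodes);
        sorted.sort((a, b) -> {
            int byScore = Long.compare(score(docId, b), score(docId, a)); // highest first
            return byScore != 0 ? byScore : a.compareTo(b);               // stable tie-break
        });
        return new ArrayList<>(sorted.subList(0, Math.min(redundancy, sorted.size())));
    }

    public static void main(String[] args) {
        List<String> all = List.of("n0", "n1", "n2", "n3", "n4");
        List<String> withoutN2 = List.of("n0", "n1", "n3", "n4");
        for (int i = 0; i < 5; i++) {
            String doc = "doc" + i;
            System.out.println(doc + ": " + replicas(doc, all, 2)
                    + " -> " + replicas(doc, withoutN2, 2));
        }
    }
}
```

In this scheme, removing a node only affects documents that had a copy on that node: their surviving copy stays where it is and one new copy is created elsewhere, while all other documents are untouched. Adding a node works the same way in reverse.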

Lookup of an individual document is routed directly to a node storing that document, while queries are spread over a subset of nodes which contain the queried documents. Complex queries are performed as distributed algorithms with multiple steps back and forth between the container and the content nodes; this is to achieve the low latency which is one of the main design goals of Vespa.
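The "multiple steps" can be illustrated with a two-phase query, sketched below under simplifying assumptions: in the first phase each node returns only ids and relevance scores for its best local matches, and in the second phase the full content is fetched only for the global winners. The Node and Hit classes are hypothetical and the real protocol between containers and content nodes is more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class TwoPhaseQuerySketch {

    /** A lightweight first-phase hit: just enough information to rank globally. */
    static final class Hit {
        final int node;
        final String id;
        final double relevance;
        Hit(int node, String id, double relevance) {
            this.node = node;
            this.id = id;
            this.relevance = relevance;
        }
    }

    /** A toy content node; its index must equal its position in the node list. */
    static final class Node {
        final int index;
        final Map<String, Double> relevanceById;
        final Map<String, String> textById;
        Node(int index, Map<String, Double> relevanceById, Map<String, String> textById) {
            this.index = index;
            this.relevanceById = relevanceById;
            this.textById = textById;
        }

        /** Phase 1: ids and scores of the best local matches, not the documents. */
        List<Hit> topHits(int k) {
            List<Hit> hits = new ArrayList<>();
            relevanceById.forEach((id, relevance) -> hits.add(new Hit(index, id, relevance)));
            hits.sort((a, b) -> Double.compare(b.relevance, a.relevance));
            return new ArrayList<>(hits.subList(0, Math.min(k, hits.size())));
        }

        /** Phase 2: the full content of a single document. */
        String fetch(String id) {
            return textById.get(id);
        }
    }

    /** Would run in the container: merge phase 1 results, then fetch only the winners. */
    static List<String> query(List<Node> nodes, int k) {
        List<Hit> candidates = new ArrayList<>();
        for (Node node : nodes) {
            candidates.addAll(node.topHits(k));
        }
        candidates.sort((a, b) -> Double.compare(b.relevance, a.relevance));
        List<String> results = new ArrayList<>();
        for (Hit hit : candidates.subList(0, Math.min(k, candidates.size()))) {
            results.add(nodes.get(hit.node).fetch(hit.id));
        }
        return results;
    }

    public static void main(String[] args) {
        Node n0 = new Node(0, Map.of("a", 0.9, "b", 0.2), Map.of("a", "Doc A", "b", "Doc B"));
        Node n1 = new Node(1, Map.of("c", 0.5, "d", 0.4), Map.of("c", "Doc C", "d", "Doc D"));
        System.out.println(query(List.of(n0, n1), 2));
    }
}
```

The extra round trip pays off because full documents are transferred only for the hits that end up in the final result.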

Administration and developer support

The single administration and configuration cluster controls the other clusters of a system. It derives the low level configuration of each individual cluster, including process and component instances, such that the application developer can declare the desired system on a higher level without worrying about its detailed realization. Whenever the application package is redeployed, the system will compute the necessary changes in configuration and push only these to the distributed components. Changed components and data files are distributed over BitTorrent for efficiency.
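Computing "only the necessary changes" amounts to a diff between the configuration currently in effect and the newly derived one. The sketch below shows that idea on flat key/value maps; the representation and the convention of mapping removed keys to null are assumptions for this example and say nothing about how Vespa models configuration internally.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

class ConfigDiffSketch {

    /**
     * Returns the entries that are new or changed in next.
     * Keys present in current but missing from next are mapped to null.
     */
    static Map<String, String> changes(Map<String, String> current, Map<String, String> next) {
        Map<String, String> delta = new TreeMap<>();
        for (Map.Entry<String, String> entry : next.entrySet()) {
            if (!Objects.equals(current.get(entry.getKey()), entry.getValue())) {
                delta.put(entry.getKey(), entry.getValue()); // added or changed
            }
        }
        for (String key : current.keySet()) {
            if (!next.containsKey(key)) {
                delta.put(key, null); // removed
            }
        }
        return delta;
    }

    public static void main(String[] args) {
        Map<String, String> current = Map.of("threads", "8", "port", "8080", "cache", "on");
        Map<String, String> next = Map.of("threads", "16", "port", "8080", "timeout", "5s");
        // Unchanged keys such as "port" are left out of the delta
        System.out.println(changes(current, next));
    }
}
```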

Application packages may be changed, redeployed and inspected over an HTTP REST API, or through a command line interface. The administration cluster runs over ZooKeeper to make changes to configuration singular and consistent, and to avoid having a single point of failure.

An application package looks the same, and is deployed the same way, whether it specifies a large system with hundreds of nodes or a single node running all services. The only change needed is to the lists of nodes making up the cluster. The container clusters may also be started within a single Java VM by "deploying" the application package from a method call. This is useful for testing applications in an IDE and in unit tests. Application packages with components can be developed in an IDE using Maven starting from sample applications.


Vespa allows functionally rich and highly available applications to be developed to scale and perform to high standards without burdening developers with the considerable low level complexity this requires. It allows developers to evolve and grow their applications over time without taking the system offline. It also lets them avoid complex data and page precomputing schemes, which lead to stale data that cannot be personalized: personalization often requires complex queries to complete in real user time over data which is constantly changing.

To learn more about the Vespa capabilities, run the Blog search tutorial.