Glossary of both Vespa-specific terminology, and general terms useful in this context.
The unit of deployment and management. It can contain any number of clusters and schemas etc., but all deployed together. The files defining the application is called Application Package.
An attribute is a field with properties other than an indexed field. Attribute fields have flexible match modes, including exact match, prefix match, and case-sensitive matching. Attributes enable high sustained update rates by writing directly to memory without disk access. Features like Grouping, Sorting, and Parent/Child use attributes.
A set of homogenous nodes which all perform the same task. Vespa has two types: Container clusters are stateless, and content clusters store and process the data.
Components extend a base class from the Container code module; some are Chained for execution. The main available component types are:
Vespa is available as a container image from hub.docker.com. Products to run this image include Docker, podman and runC, and it enables users to run Vespa in a well-defined environment on multiple platforms.
Vespa models data as documents. A document has a string identifier, set by the application, unique across all documents. A document is a set of key-value pairs. A document has a Schema. Read more in Documents.
Document processing is a framework to create chains of configurable Components
that read and modify document operations.
A Document Processor uses
setFieldValue() to process fields,
alternatively using generated code from Concrete Documents.
The data type part of a Schema - a collection of fields.
Vespa's clusters are elastic - a user can add or remove nodes on running applications without service disruption. For the stateful content nodes, this causes data sync between nodes for uniform distribution, with minimal data re-distribution. Read more in Elasticity.
A common technique in modern big data serving applications is to map the subject data - say, text or images - to points in an abstract vector space and then do computation in that vector space. For example, retrieve similar data by finding nearby points in the vector space, or using the vectors as input to a neural net. This mapping is usually referred to as embedding, and Vespa provides built-in support for this.
Documents have Fields. A field has a type, and a field contained in a document can be written to, read from and queried. A field can also be generated (i.e. a synthetic field) - in this case, the field definition is outside the document. A field can be single value, like a string, or multivalue, like an array of strings.
Vespa Grouping is a list processing language which describes how the query hits should be grouped, aggregated and presented in result sets. A grouping statement takes the list of all matches to a query as input and groups/aggregates it, possibly in multiple nested and parallel ways to produce the output. Read more.
Parent / Child
Using document references, documents can have parent/child relationships. Use this to join data by importing fields from parent documents. Parent documents are replicated to all nodes in the cluster.
A partial update is an update to one or more fields in a document. It also includes updating all index structures, so the effect of the partial update is immediately observable in queries. Partial updates do not require the full document, and enables a high write throughput with memory-only operations. Read more.
Ranking is where Vespa does computing, or inference over documents. The computations to be done are expressed in functions called Ranking Expressions, bundled into Rank Profiles defined in a Schema. These can range from simple math expressions combining some rank features, to tensor expressions or large machine-learned models. Ranking can be single- or multiphased.
A description of a particular type of data and how to process/rank it. See the Schema guide.
A searcher is a Component - usually deployed as part of an OSGi bundle.
All Searchers must implement a single method
Developers implement application query logic in Searchers.
A Tensor is a data structure which generalizes scalars, vectors and matrices to any number of dimensions: A scalar is a tensor of rank 0, a vector is a tensor of rank 1, a matrix is a tensor of rank 2. Tensors consist of a set of scalar valued cells, with each cell having a unique address. A cell's address is specified by its index or label in all the dimensions of that tensor. The number of dimensions in a tensor is the rank of the tensor, each dimension can be either mapped or indexed.
Visit is a feature to efficiently get or process a set of / all documents, identified by a Document Selection Expression. Visit iterates over all, or a set of, buckets and sends documents to a (set of) targets.