Glossary of Vespa-specific terminology and general terms useful in this context.
Application
The unit of deployment and management. It can contain any number of clusters, schemas, etc., all deployed together. The files defining the application are called the Application Package.
Attribute
An attribute is a field whose properties differ from those of an indexed field. Attribute fields have flexible match modes, including exact match, prefix match, and case-sensitive matching. Attributes enable high sustained update rates by writing directly to memory without disk access. Features like Grouping, Sorting, and Parent/Child use attributes.
Cluster
A set of homogeneous nodes which all perform the same task. Vespa has two types: container clusters are stateless, and content clusters store and process the data.
Component
Components extend a base class from the Container code module; some are Chained for execution. The main available component types are:
Container
Vespa's Java container, hosting all application components as well as the stateless logic of Vespa itself. Read more in Container. Not to be confused with Docker Containers.
Docker
Vespa is available as a container image from hub.docker.com. Products to run this image include Docker, podman and runC, and it enables users to run Vespa in a well-defined environment on multiple platforms.
Document
Vespa models data as documents. A document has a string identifier, set by the application, unique across all documents. A document is a set of key-value pairs. A document has a Schema. Read more in Documents.
Document Processor
Document processing is a framework to create chains of configurable Components that read and modify document operations. A Document Processor uses getFieldValue() and setFieldValue() to process fields, alternatively using generated code from Concrete Documents.
Document Type
The data type part of a Schema - a collection of fields.
Elasticity
Vespa's clusters are elastic - a user can add or remove nodes on running applications without service disruption. For the stateful content nodes, this causes data sync between nodes for uniform distribution, with minimal data re-distribution. Read more in Elasticity.
Embedding
A common technique in modern big data serving applications is to map the subject data - say, text or images - to points in an abstract vector space and then do computation in that vector space. For example, retrieve similar data by finding nearby points in the vector space, or using the vectors as input to a neural net. This mapping is usually referred to as embedding, and Vespa provides built-in support for this.
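As a rough illustration of the "finding nearby points" idea, the Python sketch below ranks a few stored vectors by cosine similarity to a query vector. The document names, the tiny 3-dimensional vectors, and both helper functions are invented for this example; it is not Vespa's API or its nearest-neighbor implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, embeddings, k=2):
    """Return the k names whose embeddings are most similar to the query."""
    ranked = sorted(
        embeddings,
        key=lambda name: cosine_similarity(query, embeddings[name]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical embeddings; real ones typically have hundreds of dimensions.
embeddings = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
}
```

This brute-force scan compares the query against every vector, which is only practical for small collections.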
Field
Documents have Fields. A field has a type, and a field contained in a document can be written to, read from and queried. A field can also be generated (i.e. a synthetic field) - in this case, the field definition is outside the document. A field can be single value, like a string, or multivalue, like an array of strings.
Garbage Collection
Use a Document Selection to auto-expire documents by time or any other criterion.
Grouping
Vespa Grouping is a list processing language which describes how the query hits should be grouped, aggregated and presented in result sets. A grouping statement takes the list of all matches to a query as input and groups/aggregates it, possibly in multiple nested and parallel ways, to produce the output.
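To make "group and aggregate the list of matches" concrete, here is a minimal Python sketch that groups hits by one field and computes a count and a sum per group. The hit data and the function name are hypothetical, and this plain-Python loop only mimics the idea; the comment shows what an equivalent Vespa grouping expression is assumed to look like.

```python
from collections import defaultdict

def group_hits(hits, key, value):
    """Group hits by `key`, aggregating a count and a sum of `value` per group."""
    groups = defaultdict(lambda: {"count": 0, "sum": 0})
    for hit in hits:
        group = groups[hit[key]]
        group["count"] += 1
        group["sum"] += hit[value]
    return dict(groups)

# Hypothetical matches to a query.
hits = [
    {"customer": "Smith", "price": 10},
    {"customer": "Jones", "price": 20},
    {"customer": "Smith", "price": 5},
]

# Assumed Vespa counterpart: all(group(customer) each(output(sum(price))))
result = group_hits(hits, "customer", "price")
```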
Namespace
A segment of Document IDs which helps you generate unique ids even if you have multiple sources of unique values. A namespace can be used to Visit a subspace of the corpus.
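The sketch below shows how a namespace keeps ids distinct when two sources reuse the same key. It assumes the id layout id:<namespace>:<document-type>::<user-specified>; the namespaces, document type, and helper names are made up for illustration, so check the Documents guide for the authoritative format.

```python
def make_document_id(namespace, document_type, user_specified):
    """Build an id, assuming the layout id:<namespace>:<document-type>::<user-specified>."""
    return f"id:{namespace}:{document_type}::{user_specified}"

def namespace_of(document_id):
    """Extract the namespace segment from an id built by make_document_id."""
    return document_id.split(":", 4)[1]

# Two hypothetical sources both use the key "42"; the namespace keeps the ids apart.
id_a = make_document_id("catalog-a", "product", "42")
id_b = make_document_id("catalog-b", "product", "42")
```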
Node
A Node is a host / container instance running one or more Services. The mapping from logical to actual name is configured in hosts.xml.
Parent / Child
Using document references, documents can have parent/child relationships. Use this to join data by importing fields from parent documents. Parent documents are replicated to all nodes in the cluster.
Partial Update
A partial update is an update to one or more fields in a document. It also includes updating all index structures, so the effect of the partial update is immediately observable in queries. Partial updates do not require the full document, and enable a high write throughput with memory-only operations.
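As an illustration of "does not require the full document", this Python sketch builds an update operation that names only the fields being changed. It assumes Vespa's JSON document format with an "update" id and per-field "assign" operations; the document id, the field name, and the helper function are hypothetical.

```python
import json

def partial_update(document_id, assignments):
    """Build an update operation that assigns new values to the given fields only."""
    return {
        "update": document_id,
        "fields": {name: {"assign": value} for name, value in assignments.items()},
    }

# Only "price" is sent; all other fields of the document are left untouched.
operation = partial_update("id:mynamespace:music::a-head-full-of-dreams", {"price": 10})
payload = json.dumps(operation)
```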
Query
Use the Query API to query the corpus. Queries are written in YQL, or can be created programmatically in a Searcher. Configure query execution in a Query Profile.
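A minimal sketch of a query request body, assuming the Query API accepts a JSON object with a "yql" string and a "hits" count. The schema name music, the field album, and the helper function are placeholders, and the code only builds the body; it does not send it anywhere.

```python
import json

def build_query(yql, hits=10):
    """Serialize a query request body with a YQL statement and a hit limit."""
    return json.dumps({"yql": yql, "hits": hits})

# Hypothetical schema "music" with a searchable field "album".
body = build_query("select * from music where album contains 'head'", hits=5)
```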
Ranking
Ranking is where Vespa does computation, or inference, over documents. The computations to be done are expressed in functions called Ranking Expressions, bundled into Rank Profiles defined in a Schema. These can range from simple math expressions combining some rank features, to tensor expressions or large machine-learned models. Ranking can be single- or multiphased.
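The idea behind multiphased ranking can be sketched as follows: a cheap function scores every match, and a costlier function re-scores only the best candidates. This is a simplified Python model, not Vespa's ranking framework; the field names, both scoring functions, and the rerank_count parameter name are assumptions made for the example.

```python
def rank(documents, first_phase, second_phase, rerank_count):
    """Order documents with a cheap first phase, then rerank only the top candidates."""
    # First phase: score every matched document with the cheap function.
    scored = sorted(documents, key=first_phase, reverse=True)
    top, rest = scored[:rerank_count], scored[rerank_count:]
    # Second phase: re-score only the best candidates with the costly function.
    reranked = sorted(top, key=second_phase, reverse=True)
    return reranked + rest

# Hypothetical documents with two precomputed signals.
documents = [
    {"id": "a", "text_score": 3.0, "freshness": 0.1},
    {"id": "b", "text_score": 2.0, "freshness": 0.9},
    {"id": "c", "text_score": 1.0, "freshness": 1.0},
]

def cheap(doc):
    return doc["text_score"]

def costly(doc):
    return doc["text_score"] * doc["freshness"]

order = [doc["id"] for doc in rank(documents, cheap, costly, rerank_count=2)]
```

Limiting the second phase to a fixed number of candidates bounds the cost of the expensive function regardless of how many documents match.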
Schema
A description of a particular type of data and how to process/rank it. See the Schema guide.
Searcher
A searcher is a Component - usually deployed as part of an OSGi bundle. All Searchers must implement a single method search(query). Developers implement application query logic in Searchers.
Service
A Service runs in a Cluster of container or content nodes, configured in services.xml.
Tensor
A Tensor is a data structure which generalizes scalars, vectors and matrices to any number of dimensions: A scalar is a tensor of rank 0, a vector is a tensor of rank 1, a matrix is a tensor of rank 2. Tensors consist of a set of scalar valued cells, with each cell having a unique address. A cell's address is specified by its index or label in all the dimensions of that tensor. The number of dimensions in a tensor is the rank of the tensor. Each dimension can be either mapped or indexed.
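One way to picture cells and addresses is a dictionary keyed by the label or index in each dimension. The Python sketch below models a rank-2 tensor with a mapped dimension topic (string labels) and an indexed dimension x (integer positions); the dimension names, values, and helper functions are invented here, and this is a teaching model rather than Vespa's tensor implementation.

```python
def address(**labels):
    """A cell address: one (dimension, label-or-index) pair per dimension."""
    return tuple(sorted(labels.items()))

# Assumed to correspond to a Vespa type like tensor<float>(topic{}, x[2]).
cells = {
    address(topic="sports", x=0): 0.5,
    address(topic="sports", x=1): 1.5,
    address(topic="news", x=0): 2.0,
    address(topic="news", x=1): 3.0,
}

def rank_of(cells):
    """The rank is the number of dimensions in any cell address."""
    return len(next(iter(cells)))

def reduce_sum(cells, dimension):
    """Sum cells over one dimension, producing a tensor of one lower rank."""
    out = {}
    for addr, value in cells.items():
        remaining = tuple((dim, label) for dim, label in addr if dim != dimension)
        out[remaining] = out.get(remaining, 0.0) + value
    return out

per_topic = reduce_sum(cells, "x")
```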
Visit
Visit is a feature to efficiently get or process all documents, or a subset of them, identified by a Document Selection Expression. Visit iterates over all, or a set of, buckets and sends documents to one or more targets.