# Proton

Proton is Vespa's search core. Proton maintains disk and memory structures for documents. As the data is dynamic, these structures are periodically optimized by maintenance jobs; the resource footprint of these background jobs is mainly controlled by the concurrency setting.
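The effect of such a setting can be pictured as a shared worker pool for the background jobs. A minimal sketch, assuming a hypothetical MaintenanceScheduler that derives a thread count from a fractional concurrency setting (illustrative only, not proton's implementation):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration: maintenance jobs share a worker pool whose
// size is derived from a concurrency setting, bounding their footprint.
public class MaintenanceScheduler {
    private final ExecutorService workers;

    public MaintenanceScheduler(double concurrency, int cores) {
        // e.g. concurrency 0.5 on 8 cores -> at most 4 background threads
        int threads = Math.max(1, (int) (concurrency * cores));
        this.workers = Executors.newFixedThreadPool(threads);
    }

    public void schedule(List<Runnable> jobs) {
        jobs.forEach(workers::submit); // e.g. compaction and flush jobs
    }
}
```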

## Internal structure

The internal structure of a proton node: the search node consists of a bucket management system that sends requests to a set of document databases, each of which consists of three sub-databases.

### Bucket management

When the node starts up, it first needs to get an overview of which documents, and hence buckets, it has. While this happens, it is in initializing mode: able to handle load, but the distributors do not yet know bucket metadata for all the buckets, and thus cannot know whether buckets are consistent with copies on other nodes. Once metadata for all buckets is known, the content node transitions from initializing to up. As the distributors want quick access to bucket metadata, the node keeps an in-memory bucket database to serve these requests efficiently.
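A minimal sketch of the bucket database idea - a map from bucket id to the metadata distributors ask for (class and field names are illustrative, not Vespa's API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative in-memory bucket database: bucket id -> metadata.
public class BucketDatabase {
    public record BucketMeta(int documentCount, long checksum, long usedBytes) {}

    private final Map<Long, BucketMeta> buckets = new ConcurrentHashMap<>();

    public void update(long bucketId, BucketMeta meta) { buckets.put(bucketId, meta); }

    // Distributors compare this metadata across nodes to detect
    // inconsistent bucket copies that need merging.
    public BucketMeta lookup(long bucketId) { return buckets.get(bucketId); }
}
```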

Bucket management implements elasticity support in terms of the SPI. Operations are ordered by priority, and only one operation per bucket can be in flight at a time. Below bucket management sits the persistence engine, which implements the SPI in terms of Vespa search. The persistence engine reads the document type from the document id and dispatches each request to the correct document database.
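A sketch of this dispatch step, assuming document ids on the standard form id:&lt;namespace&gt;:&lt;document-type&gt;::&lt;user-specific&gt; (the class names are hypothetical):

```java
import java.util.Map;

// Illustrative dispatch: parse the document type out of the id and
// route the request to the document database for that type.
public class PersistenceEngine {
    public interface DocumentDatabase {
        void handlePut(String documentId, byte[] document);
    }

    private final Map<String, DocumentDatabase> databases; // one per document type

    public PersistenceEngine(Map<String, DocumentDatabase> databases) {
        this.databases = databases;
    }

    public void put(String documentId, byte[] document) {
        String docType = documentId.split(":")[2]; // id:namespace:doctype::...
        databases.get(docType).handlePut(documentId, document);
    }
}
```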

### Document database

Each document database is responsible for a single document type. It has a component called FeedHandler which takes care of incoming documents, updates, and remove requests. All requests are first written to a transaction log, then handed to the appropriate sub-database, based on the request type.
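A minimal sketch of that ordering - append to the transaction log first, then route by request type (names are hypothetical, and the real routing also considers bucket readiness):

```java
// Illustrative feed path: every mutating operation is made durable in the
// transaction log before it is applied to a sub-database.
public class FeedHandler {
    public enum Type { PUT, UPDATE, REMOVE }
    public record FeedOperation(Type type, String documentId, byte[] payload) {}

    public interface TransactionLog { long append(FeedOperation op); } // returns serial number
    public interface SubDatabase { void apply(long serialNum, FeedOperation op); }

    private final TransactionLog log;
    private final SubDatabase ready, removed;

    public FeedHandler(TransactionLog log, SubDatabase ready, SubDatabase removed) {
        this.log = log;
        this.ready = ready;
        this.removed = removed;
    }

    public void handle(FeedOperation op) {
        long serial = log.append(op); // durable before any in-memory structure changes
        // Simplified routing by request type only:
        SubDatabase target = (op.type() == Type.REMOVE) ? removed : ready;
        target.apply(serial, op);
    }
}
```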

### Sub-databases

There are three types of sub-databases, each with its own document meta store and document store. The document meta store holds a map from the document id to a local id. This local id is used to address the document in the document store. The document meta store also maintains information on the state of the buckets that are present in the sub-database.
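A minimal sketch of the id-to-local-id mapping idea (illustrative, not proton's data structure):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative document meta store: map a document id to a dense local id
// (lid), which is then used to address the document in the document store.
public class DocumentMetaStore {
    private final Map<String, Integer> gidToLid = new HashMap<>();
    private int nextLid = 1; // lid 0 reserved as "invalid" in this sketch

    public int put(String documentId) {
        return gidToLid.computeIfAbsent(documentId, id -> nextLid++);
    }

    public Integer lookup(String documentId) { return gidToLid.get(documentId); }
}
```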

The sub-databases are maintained by the index maintainer. The document distribution changes as the system is resized: when the number of nodes in the system changes, the index maintainer moves documents between the Ready and Not Ready sub-databases to reflect the new distribution (see the sketch after the list below). When an entry in the Removed sub-database gets old, it is purged. The sub-databases are:

• Not Ready: Holds the redundant documents that are not searchable. Documents that are not ready are only stored, not indexed; it takes some processing to move them from this state to the ready state.
• Ready: Maintains an index of all ready documents and keeps them searchable. One of the ready copies is active while the rest are not:
  • Active: There should always be exactly one active copy of each document in the system, though intermittently there may be more. Only active documents produce results when queries are evaluated. The ready copies that are not active are indexed but do not produce results; by being indexed, they are ready to take over immediately if the node holding the active copy becomes unavailable. Read more in searchable-copies.
• Removed: Keeps track of documents that have been removed. The id and timestamp of each removed document are kept. This information is used when buckets from two nodes are merged; if the removed document exists on another node with a different timestamp, the most recent entry prevails.
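A sketch of such a move, assuming a hypothetical IndexMaintainer reacting to a bucket changing between ready and not-ready state (illustrative only):

```java
// Illustrative redistribution step: when a bucket should no longer be
// searchable on this node, move its documents from Ready to Not Ready,
// and vice versa when it becomes searchable.
public class IndexMaintainer {
    public interface SubDatabase {
        Iterable<String> documentsInBucket(long bucketId);
        void move(String documentId, SubDatabase target);
    }

    private final SubDatabase ready, notReady;

    public IndexMaintainer(SubDatabase ready, SubDatabase notReady) {
        this.ready = ready;
        this.notReady = notReady;
    }

    public void onBucketStateChange(long bucketId, boolean shouldBeReady) {
        SubDatabase from = shouldBeReady ? notReady : ready;
        SubDatabase to = shouldBeReady ? ready : notReady;
        for (String docId : from.documentsInBucket(bucketId)) {
            from.move(docId, to); // indexing or de-indexing happens as part of the move
        }
    }
}
```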

## Transaction log

Content nodes have a transaction log to persist mutating operations. The transaction log persists operations by file append. Having a transaction log simplifies proton's in-memory index structures and enables steady-state high performance; read more below.

All operations are written and synced to the transaction log. This is sequential (not random) IO, but it might impact overall feed performance when running on network-attached storage (NAS), where the sync operation has a much higher cost than on locally attached storage (e.g. SSD). See sync-transactionlog.
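A minimal sketch of this append-and-sync pattern using a plain Java FileChannel (illustrative; proton's actual log format and batching are not shown). The append keeps the IO sequential; only the cost of the sync differs between local disks and NAS:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative transaction log file: sequential appends, synced to
// stable storage before the feed operation is acknowledged.
public class TransactionLogFile implements AutoCloseable {
    private final FileChannel channel;

    public TransactionLogFile(Path path) throws IOException {
        channel = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    public void append(byte[] serializedOperation) throws IOException {
        channel.write(ByteBuffer.wrap(serializedOperation)); // sequential append
        channel.force(false); // the sync step that dominates cost on NAS
    }

    @Override public void close() throws IOException { channel.close(); }
}
```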

By default, proton flushes components like attribute vectors and the memory index on shutdown, for quicker startup after scheduled restarts.

## Index

Index fields are string fields used for text search. Other field types are attributes and summary fields.

Proton stores position information in text indices by default, for proximity relevance - posocc (below). All occurrences of a term are stored in the posting list, each with its position. This provides superior ranking features, but is somewhat more expensive than storing a single occurrence per document. For most applications it is the correct tradeoff, since most of the cost is usually elsewhere and relevance is valuable.

Applications that only need occurrence information for filtering can use rank: filter to optimize performance, using only boolocc files (below).
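To make the tradeoff concrete, here is an illustrative contrast between the two posting formats - positions per occurrence (posocc-style) versus one bit per document (boolocc-style). This is a sketch, not Vespa's on-disk format:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative posting formats: positional postings support proximity
// ranking; occurrence-only postings support filtering at lower cost.
public class Postings {
    // posocc-style: every occurrence of the term, with its position
    public record Occurrence(int docId, int position) {}
    private final Map<String, List<Occurrence>> positional = new HashMap<>();

    // boolocc-style: one bit per document, no positions
    private final Map<String, BitSet> occurrenceOnly = new HashMap<>();

    public void index(String term, int docId, int position) {
        positional.computeIfAbsent(term, t -> new ArrayList<>())
                  .add(new Occurrence(docId, position));
        occurrenceOnly.computeIfAbsent(term, t -> new BitSet())
                      .set(docId);
    }
}
```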

The memory index has a dictionary per index field. It contains all unique words in that field, each mapping to a posting list with position information. The position information is used during ranking; see nativeRank for details on how a text match score is calculated.
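A minimal sketch of this shape - one sorted dictionary per field, each word mapping to a posting list (structure only, not Vespa code):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative memory index layout: field -> dictionary -> postings.
public class MemoryIndex {
    public record Posting(int docId, int[] positions) {}

    // TreeMap keeps the words of each field's dictionary sorted
    private final Map<String, TreeMap<String, List<Posting>>> fields = new HashMap<>();

    public List<Posting> lookup(String field, String word) {
        TreeMap<String, List<Posting>> dictionary = fields.get(field);
        return dictionary == null ? List.of() : dictionary.getOrDefault(word, List.of());
    }
}
```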

The disk index stores the content of each index field in separate folders. Each folder contains:

• Dictionary. Files: dictionary.pdat, dictionary.spdat, dictionary.ssdat.
• Compressed posting lists with position information. File: posocc.dat.compressed.
• Posting lists with only occurrence information (bitvector). These are generated for common words. Files: boolocc.bdat, boolocc.idx.

## Retrieving documents

Retrieving documents is done by specifying an id to get, or by using a selection expression to visit a range of documents - refer to the Document API. Overview:

• Get: When the content node receives a get request, it scans through all the document databases, and for each one checks all three sub-databases. Once the document is found, the scan stops and the document is returned. If the document is found in a Ready sub-database, the document retriever applies any changes stored in the attributes before returning the document.
• Visit: A visit request creates an iterator over each candidate bucket. This iterator retrieves matching documents from all sub-databases of all document databases. As for get, attribute values are applied to document fields in the Ready sub-database.
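A sketch of that scan order for get (names are illustrative):

```java
import java.util.List;
import java.util.Optional;

// Illustrative get path: scan every document database, and within each,
// the three sub-databases, stopping at the first hit.
public class GetHandler {
    public interface SubDatabase { Optional<byte[]> get(String documentId); }
    public record DocumentDatabase(SubDatabase ready, SubDatabase notReady, SubDatabase removed) {}

    private final List<DocumentDatabase> databases;

    public GetHandler(List<DocumentDatabase> databases) { this.databases = databases; }

    public Optional<byte[]> get(String documentId) {
        for (DocumentDatabase db : databases) {
            for (SubDatabase sub : List.of(db.ready(), db.notReady(), db.removed())) {
                Optional<byte[]> doc = sub.get(documentId); // Ready also applies attribute values
                if (doc.isPresent()) return doc;
            }
        }
        return Optional.empty();
    }
}
```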

## Queries

Queries have a separate pathway through the system: they do not use the distributor, and have nothing to do with the SPI. Queries are orthogonal to the elasticity set up by the storage and retrieval described above. How queries move through the system:

A query enters the system through the QR-server (query rewrite server) in the Vespa Container. The QR-server issues one query per document type to the search nodes:

• Container: The Container knows all the document types and rewrites each incoming query as a collection of queries, one for each type. Queries may have a restrict parameter, in which case the container sends the query only to the specified document types. The container sends the query to content nodes and collects partial results. It pings all content nodes every second to know whether they are alive, and keeps open TCP connections to each one. If a node goes down, the elastic system will make the documents available on other nodes.
• Search node: The match engine receives queries and routes them to the right document database based on the document type. The query is passed to the Ready sub-database, where the searchable documents are. Based on information stored in the document meta store, the query is augmented with a blocklist that ensures only active documents are matched.
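A sketch of the container's fan-out step - one query per document type, optionally narrowed by restrict, issued in parallel and merged (names and types are illustrative, not the Container's API):

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Illustrative query fan-out: one query per document type, issued in
// parallel to the content cluster, partial results merged at the end.
public class QueryFanout {
    public interface ContentCluster {
        CompletableFuture<List<String>> search(String docType, String query);
    }

    private final Set<String> documentTypes;
    private final ContentCluster cluster;

    public QueryFanout(Set<String> documentTypes, ContentCluster cluster) {
        this.documentTypes = documentTypes;
        this.cluster = cluster;
    }

    public List<String> execute(String query, Set<String> restrict) {
        Set<String> targets = restrict.isEmpty() ? documentTypes : restrict;
        List<CompletableFuture<List<String>>> partials = targets.stream()
                .map(type -> cluster.search(type, query)) // issue all before joining
                .collect(Collectors.toList());
        return partials.stream()
                .flatMap(f -> f.join().stream()) // merge partial results
                .collect(Collectors.toList());
    }
}
```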