Reads and writes

This guide covers reading and writing documents in Vespa. Documents are stored in content clusters. Writes (PUT, UPDATE, DELETE) and reads (GET) pass through a container cluster. A more detailed flow is described at the end of this article.

Highlights:

Vespa's indexing structures are built for high-rate field updates. Refer to the feed sizing guide for write performance, in particular the section on partial updates.
Vespa supports parent/child for de-normalized data. This can be used to simplify the code to update application data, as one write will update all child documents.
Applications can add custom feed document processors and multiple container clusters - see indexing for details.
Writes in Vespa are consistent in a stable cluster, but Vespa will prioritize availability over consistency when there is a conflict. See the elasticity documentation and the Vespa consistency model. It is recommended to use the same client instance for updating a given document when possible - for data consistency, but also performance (see concurrent mutations). Read more on write operation ordering. For performance, group field updates to the same document into one update operation.
Applications can auto-expire documents. This feature also blocks PUTs to documents that are already expired - see indexing and document selection. This is a common problem when feeding test data with timestamps, as the writes are silently dropped.

Also see troubleshooting.
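The advice above to group field updates to the same document into one update operation can be sketched as follows. This is a minimal illustration, not part of any Vespa client: `group_updates` is a hypothetical helper, and the document IDs and field names (`title`, `plays`) are made up. The output uses the `update` / `fields` layout of the JSON Document format.

```python
import json
from collections import defaultdict


def group_updates(updates):
    """Merge (doc_id, field, operation) tuples into one update per document.

    Hypothetical helper for illustration. Each operation is a dict such as
    {"assign": "x"} or {"increment": 1}.
    """
    fields_by_doc = defaultdict(dict)
    for doc_id, field, operation in updates:
        if field in fields_by_doc[doc_id]:
            # Two operations on the same field cannot share one update here.
            raise ValueError(f"duplicate update for {field!r} in {doc_id}")
        fields_by_doc[doc_id][field] = operation
    return [
        {"update": doc_id, "fields": fields}
        for doc_id, fields in fields_by_doc.items()
    ]


# Assumed example data: three field updates touching two documents.
pending = [
    ("id:mynamespace:music::1", "title", {"assign": "New title"}),
    ("id:mynamespace:music::1", "plays", {"increment": 1}),
    ("id:mynamespace:music::2", "title", {"assign": "Other title"}),
]

grouped = group_updates(pending)
print(json.dumps(grouped, indent=2))
```

Three field updates become two update operations, one per document, which can then be fed with any of the clients listed in this guide.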

Operations

Operation

Description

Get

Get a document by ID.

Put

Write a document by ID - a document is overwritten if a document with the same document ID exists.

Puts can have conditions for test-and-set use cases. Conditions can be combined with create if nonexistent, which causes the condition to be ignored if the document does not already exist.

Remove

Remove a document by ID. If the document to be removed is not found, it is not considered a failure. Read more about data-retention. Also see batch deletes.

Removes can have conditions for test-and-set use cases.

A removed document is written as a tombstone, and later garbage collected - see removed-db / prune / age. Vespa does not retain, nor return, the document data of removed documents.

Update

Also referred to as partial updates, as it updates one or more fields of a document by ID - the document v1 API can be used to perform updates in the JSON Document format. If the document to update is not found, it is not considered a failure.

Updates support create if nonexistent (upsert).

Updates can have conditions for test-and-set use cases.

All data structures (attribute, index and summary) are updatable. Note that only assign and remove are idempotent - message re-sending can apply updates more than once. Use conditional writes for stronger consistency.

Field type	Update operations
All field types	assign (may also be used to clear fields)
Numeric field types	increment (also see auto-generate weightedset keys), decrement, multiply, divide
Composite types	add (for array and weighted set; to put into a map, see the assign section), remove, match (pick an element from the collection, then apply the given operation to the matched element), accessing elements within a composite field using fieldpaths
Tensor types	modify (modify individual cells in a tensor; can replace, add or multiply cell values), add (add cells to mapped or mixed tensors), remove (remove cells from mapped or mixed tensors)
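To illustrate why only some update operations are idempotent, the following toy model applies the same update twice, as could happen when a message is re-sent. It is a sketch on a plain in-memory dict and does not use Vespa. `apply_update` is a hypothetical function that models only `assign` and `increment`, and the field names are made up.

```python
def apply_update(document, update_fields):
    """Apply assign/increment operations to a copy of a dict (toy model only)."""
    result = dict(document)
    for field, operation in update_fields.items():
        if "assign" in operation:
            result[field] = operation["assign"]
        elif "increment" in operation:
            result[field] = result.get(field, 0) + operation["increment"]
        else:
            raise ValueError(f"unsupported operation in this sketch: {operation}")
    return result


doc = {"title": "Old", "plays": 10}
assign_update = {"title": {"assign": "New"}}
increment_update = {"plays": {"increment": 1}}

# Re-applying an assign leaves the document unchanged.
assigned_once = apply_update(doc, assign_update)
assigned_twice = apply_update(assigned_once, assign_update)
print(assigned_once == assigned_twice)

# Re-applying an increment changes the value again.
incremented_once = apply_update(doc, increment_update)
incremented_twice = apply_update(incremented_once, increment_update)
print(incremented_once["plays"], incremented_twice["plays"])
```

In this model, a duplicated `assign` is harmless while a duplicated `increment` counts twice, which is the case where a conditional write can help.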

API and utilities

Also see the JSON Document format:

API / util	Description
Vespa CLI	Command-line tool to `get`, `put`, `remove`, `update`, `feed`, `visit`.
/document/v1/	API for `get`, `put`, `remove`, `update`, `visit`.
Java Document API	Provides direct read and write access to Vespa documents using Vespa's internal communication layer. Use this when accessing documents from Java components in Vespa such as searchers and document processors. See the Document class.
pyvespa	Python client library for reading and writing documents to Vespa. Provides convenient methods for feeding, querying, and visiting documents. Expect less performance than Vespa CLI and vespa-feed-client for heavy batch feed operations.
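As a rough sketch of how the four operations map onto /document/v1/ requests, the helper below builds the HTTP method, URL and JSON body without sending anything. `document_v1_request` is a hypothetical function written for this example. The endpoint `http://localhost:8080`, the namespace, document type, field name and condition expression are all assumptions. Check the /document/v1/ reference for the full set of parameters.

```python
import json
from urllib.parse import quote, urlencode

# Document operation to HTTP method, as used by /document/v1/.
HTTP_METHODS = {"get": "GET", "put": "POST", "update": "PUT", "remove": "DELETE"}


def document_v1_request(endpoint, operation, namespace, doctype, user_id,
                        fields=None, condition=None, create=False):
    """Return (method, url, body) for a single-document request (sketch only)."""
    path = "/document/v1/{}/{}/docid/{}".format(
        quote(namespace, safe=""), quote(doctype, safe=""), quote(user_id, safe="")
    )
    params = {}
    if condition is not None:
        params["condition"] = condition  # test-and-set condition
    if create:
        params["create"] = "true"  # create if nonexistent
    url = endpoint.rstrip("/") + path
    if params:
        url += "?" + urlencode(params)
    # Only put and update carry a JSON body.
    body = json.dumps({"fields": fields or {}}) if operation in ("put", "update") else None
    return HTTP_METHODS[operation], url, body


# Assumed example: a conditional update that creates the document if missing.
method, url, body = document_v1_request(
    "http://localhost:8080", "update", "mynamespace", "music", "1",
    fields={"title": {"assign": "New title"}},
    condition="music.year==2020",
    create=True,
)
print(method, url)
print(body)
```

The resulting triple could be sent with any HTTP client, for example `urllib.request.Request(url, data=body.encode(), method=method)` with a JSON content type header. For heavy feeding, the dedicated clients in the table above are a better fit.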

Advanced / debugging tools:

vespa-feed-client: Java library and command line client for feeding document operations using /document/v1/.
vespa-feeder is a utility for feeding over the Message Bus.
vespa-get gets single documents over the Message Bus.
vespa-visit gets multiple documents over the Message Bus.

Feed flow

Use the Vespa CLI, vespa-feed-client, the pyvespa Python client, or the /document/v1/ API to read and write documents.

Alternatively, use vespa-feeder to feed files or the Java Document API.

Indexing and/or document processing is a chain of processors that manipulate documents before they are stored. Document processors can be user defined. When using indexed search, the final step in the chain prepares documents for indexing.
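The chain idea can be pictured with a small conceptual model. This is not Vespa's document processor API, which consists of Java components. It is only a sketch in which each hypothetical processor is a function that takes a document and returns a modified copy, and `run_chain` applies them in order before the document would be stored.

```python
def lowercase_title(doc):
    """Hypothetical processor: normalize the title field to lower case."""
    fields = dict(doc["fields"])
    fields["title"] = fields["title"].lower()
    return {**doc, "fields": fields}


def add_title_length(doc):
    """Hypothetical processor: derive a new field from an existing one."""
    fields = dict(doc["fields"])
    fields["title_length"] = len(fields["title"])
    return {**doc, "fields": fields}


def run_chain(doc, processors):
    """Pass the document through each processor in order."""
    for processor in processors:
        doc = processor(doc)
    return doc


# Assumed example document and chain.
incoming = {"put": "id:mynamespace:music::1", "fields": {"title": "A Head Full of Dreams"}}
processed = run_chain(incoming, [lowercase_title, add_title_length])
print(processed)
```

The order of the processors matters: each one sees the output of the previous one, and the last step hands the document on for storage.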

The Document API forwards requests to distributors on content nodes. For more information, read about content nodes and the search core.