Writing to Vespa

Documents are stored in content clusters. Writes (PUT, UPDATE, DELETE) and reads (GET) pass through a container cluster. Refer to elastic Vespa for a more detailed flow.

This guide covers the functional aspects of updating documents in Vespa.

Vespa's indexing structures are built for high-rate, memory-only field updates. Refer to the feed sizing guide for write performance.

Vespa supports parent/child for de-normalized data. This can simplify the code that updates application data, as one write to a parent updates all its child documents.

Read more on write operation ordering and consistency.

Reads and writes

Get

Returns the newest document instance.

Put

Write a document by ID - a document is overwritten if a document with the same document ID exists.

Remove

Remove a document by ID. Later requests to access the document will not find it - read more about remove-entries. If the document to be removed is not found, this is returned in the reply. This is not considered a failure.

Update

Also referred to as partial update, as it updates some/all fields of a document by ID. If the document to update does not exist, the update returns a reply stating that no document was found.

Update supports create if nonexistent.

All data structures (attribute, index and summary) are updatable. Note that only assign and remove are idempotent: if a message is re-sent, other update operations can be applied more than once. Use conditional writes for stronger consistency.

All field types
  • assign (may also be used to clear fields)
Numeric field types
Composite types
Tensor types
  • modify: modify individual cells in a tensor - can replace, add or multiply cell values
  • add: add cells to mapped or mixed tensors
  • remove: remove cells from mapped or mixed tensors
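
To illustrate why only assign and remove are idempotent, the sketch below models a re-sent message by applying the same operation twice to a plain integer. It does not use any Vespa API; the class and method names are made up for this illustration.

```java
import java.util.function.IntUnaryOperator;

final class IdempotencySketch {
    // Applies the same operation twice, as happens when a message is re-sent.
    static int applyTwice(IntUnaryOperator op, int value) {
        return op.applyAsInt(op.applyAsInt(value));
    }

    public static void main(String[] args) {
        IntUnaryOperator assign1000 = v -> 1000;   // models an assign of 1000
        IntUnaryOperator increment1 = v -> v + 1;  // models an increment of 1
        System.out.println(applyTwice(assign1000, 999)); // prints 1000, same as applying once
        System.out.println(applyTwice(increment1, 999)); // prints 1001, one more than applying once
    }
}
```

A duplicated assign leaves the field unchanged, while a duplicated increment does not, which is the case conditional writes can guard against.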

Conditional writes

A test-and-set condition can be added to Put, Remove and Update operations. The condition is a document selection. Refer to the test-and-set reference.

Example: Increment the sales field only if it is already equal to 999:

{
    "update": "id:music:music::BestOf",
    "condition": "music.sales==999",
    "fields": {
        "sales": {
            "increment": 1
        }
    }
}
Note: Use documenttype.fieldname (e.g. music.sales) in the condition, not only fieldname.

Note: If the condition is not met, an error is returned. Whether to instead return a condition-not-met status in the response, without an error, is under discussion.
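
The same conditional update can also be expressed as an HTTP request. The sketch below only builds the request and does not send it. It assumes that the /document/v1/ API takes the condition as a URL-encoded `condition` request parameter and maps HTTP PUT to an update; the endpoint `http://localhost:8080` and the class and method names are assumptions for this illustration.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class ConditionalUpdateSketch {
    // Assumed endpoint: a container listening on localhost:8080.
    static final String ENDPOINT = "http://localhost:8080";

    // Builds (but does not send) an update request guarded by a test-and-set condition.
    static HttpRequest conditionalUpdate(String documentPath, String condition, String bodyJson) {
        String query = "?condition=" + URLEncoder.encode(condition, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(ENDPOINT + documentPath + query))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(bodyJson)) // assumed: HTTP PUT is an update
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = conditionalUpdate(
                "/document/v1/music/music/docid/BestOf",
                "music.sales==999",
                "{\"fields\": {\"sales\": {\"increment\": 1}}}");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Encoding the condition matters because `==` would otherwise be misread as part of the query string syntax.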

Create if nonexistent

Updates to nonexistent documents are supported using create. An empty document is created on the content nodes, before the update is applied. This simplifies client code in the case of multiple writers. Java example using the Document API:

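import com.yahoo.document.DocumentType;
import com.yahoo.document.DocumentUpdate;

// Builds an update that first creates an empty document if none exists.
public DocumentUpdate createUpdate(DocumentType musicType) {
    DocumentUpdate update = new DocumentUpdate(musicType, "id:mynamespace:music::http://music.yahoo.com/bobdylan/BestOf");
    update.setCreateIfNonExistent(true);
    return update;
}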
create can be used in combination with a condition. If the document does not exist, the condition will be ignored and a new document with the update applied is automatically created. Otherwise, the condition must match for the update to take place.

Caution: if all existing replicas of a document are missing when an update with "create": true is executed, a new document will always be created. This happens even if a condition has been given. If the existing replicas become available later, their version of the document will be overwritten by the newest update since it has a higher timestamp.
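
The interaction between create and a condition, including the caution above, can be summarized as a small decision function. This is a model of the described behavior under the assumption that the update is sent with "create": true; it is not Vespa code, and the names are made up for this illustration.

```java
final class CreateWithConditionSketch {
    enum Outcome { CREATED_WITH_UPDATE, UPDATED, CONDITION_NOT_MET }

    // Models an update sent with "create": true.
    // replicaAvailable: whether any replica of the document is currently reachable.
    // conditionMatches: null when no condition was given.
    static Outcome apply(boolean replicaAvailable, Boolean conditionMatches) {
        if (!replicaAvailable) {
            return Outcome.CREATED_WITH_UPDATE; // the condition, if any, is ignored
        }
        if (conditionMatches != null && !conditionMatches) {
            return Outcome.CONDITION_NOT_MET;
        }
        return Outcome.UPDATED;
    }

    public static void main(String[] args) {
        System.out.println(apply(false, false)); // prints CREATED_WITH_UPDATE
        System.out.println(apply(true, false));  // prints CONDITION_NOT_MET
        System.out.println(apply(true, true));   // prints UPDATED
    }
}
```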

API and utilities

Documents are created using JSON or in Java:

  • /document/v1/: API for get, put, remove, update and visit.
  • Vespa HTTP client: a Jar for writing to Vespa, either by method calls in Java or from the command line. It provides a simple API while achieving high performance by using multiplexing and multiple parallel async connections. It is recommended in all cases when feeding from a node outside the Vespa cluster.
  • Java Document API: provides direct read and write access to Vespa documents using Vespa's internal communication layer. Use this when accessing documents from Java components in Vespa, such as searchers and document processors.
  • vespa-feeder: utility to feed data with high performance. vespa-get gets single documents, vespa-visit gets multiple.
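
When using /document/v1/, the document ID has to be mapped to a request path. The sketch below assumes the path layout `/document/v1/<namespace>/<document-type>/docid/<user-specified part>` and only handles IDs with an empty key/value section, like the IDs used in this guide. The class and method names are made up for this illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class DocumentPathSketch {
    // Maps "id:<namespace>:<document-type>::<user-specified>" to an assumed /document/v1/ path.
    static String toPath(String documentId) {
        String[] parts = documentId.split(":", 5); // limit 5 keeps colons in the last part
        if (parts.length != 5 || !parts[0].equals("id") || !parts[3].isEmpty()) {
            throw new IllegalArgumentException("Unsupported document id: " + documentId);
        }
        // URLEncoder targets form encoding, so spaces come out as "+"; use "%20" in a path.
        String encoded = URLEncoder.encode(parts[4], StandardCharsets.UTF_8).replace("+", "%20");
        return "/document/v1/" + parts[1] + "/" + parts[2] + "/docid/" + encoded;
    }

    public static void main(String[] args) {
        System.out.println(toPath("id:music:music::BestOf"));
        // prints /document/v1/music/music/docid/BestOf
    }
}
```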

Feed block

Write operations fail when a cluster is at disk or memory capacity. The attribute multivalue mapping and enum store can also fill up and block feeding.

To remedy, add nodes to the content cluster, or use nodes with higher capacity. The data will auto-redistribute, and feeding is unblocked. Configure resource-limits to tune this. Temporarily increasing resource-limits (e.g. from 0.8 to 0.9) is an alternative if the cluster is stuck.

These metrics indicate whether feeding is blocked (set to 1 when blocked):

  • content.proton.resource_usage.feeding_blocked: disk or memory
  • content.proton.documentdb.attribute.resource_usage.feeding_blocked: attribute enum store or multivalue
When feeding is blocked, events are logged - examples:
java.lang.RuntimeException: ReturnCode(NO_SPACE, Put operation rejected for document 'id:test:test::0': 'diskLimitReached: {
  action: \"add more content nodes\",
  reason: \"disk used (0.85) > disk limit (0.8)\",
  capacity: 100000000000,
  free: 85000000000,
  available: 85000000000,
  diskLimit: 0.8
}'
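
A client that wants to report how far over the limit a node is can extract the numbers from the reason text. The sketch below assumes the reason keeps the shape shown in the example above ("... used (x) > ... limit (y)"); the log format is not a stable interface, and the class and method names are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FeedBlockReasonSketch {
    // Matches e.g. "disk used (0.85) > disk limit (0.8)".
    private static final Pattern REASON =
            Pattern.compile("used \\(([0-9.]+)\\) > \\w+ limit \\(([0-9.]+)\\)");

    // Returns {used, limit}, or null if the text has no reason of the assumed shape.
    static double[] parse(String logText) {
        Matcher m = REASON.matcher(logText);
        if (!m.find()) {
            return null;
        }
        return new double[] { Double.parseDouble(m.group(1)), Double.parseDouble(m.group(2)) };
    }

    public static void main(String[] args) {
        double[] usage = parse("reason: disk used (0.85) > disk limit (0.8)");
        System.out.println("used=" + usage[0] + " limit=" + usage[1]); // prints used=0.85 limit=0.8
    }
}
```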

Dropped writes - document expiry

Applications can auto-expire documents. This feature also blocks PUTs to documents that are already expired - see indexing and document selection. This is a common problem when feeding test data with timestamps, and the writes are silently dropped.
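
To check test data before feeding, the expiry rule can be evaluated on the client. The sketch below assumes a hypothetical selection of the form `doc.timestamp > now() - maxAgeSeconds`, with timestamps in seconds since epoch; the actual selection is application-specific, and the names are made up for this illustration.

```java
final class ExpirySketch {
    // Models a hypothetical selection "doc.timestamp > now() - maxAgeSeconds".
    // A put is kept only when the document matches the selection.
    static boolean wouldBeKept(long docTimestampSeconds, long nowSeconds, long maxAgeSeconds) {
        return docTimestampSeconds > nowSeconds - maxAgeSeconds;
    }

    public static void main(String[] args) {
        long now = 1_700_000_000L; // fixed "current time" for the example
        long oneDay = 86_400L;
        System.out.println(wouldBeKept(now - 3_600L, now, oneDay));  // prints true
        System.out.println(wouldBeKept(1_000_000_000L, now, oneDay)); // prints false: would be dropped
    }
}
```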

Batch delete

Options for batch deleting documents:

  • Find documents using search, delete, repeat. Pseudocode:
    while true; do
       query and read document ids, if empty exit
       delete document ids using /document/v1
       wait a sec # optional, add wait to reduce load while deleting
    done
    
  • Like the first option, but use the Java client. Instead of deleting one-by-one, stream remove operations to the API (write a Java program for this), or append to a JSON file and use the binary:
    $ java -jar $VESPA_HOME/lib/jars/vespa-http-client-jar-with-dependencies.jar --host document-api-host < deletes.json
    
  • Use a document selection to expire documents. This deletes all documents not matching the expression. The content node will iterate over the corpus and delete documents (that are later compacted out):
    <documents garbage-collection="true">
        <document type="mytype" selection="mytype.version &gt; 4" />
    </documents>
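
The query-delete-repeat loop from the first option can be sketched in Java as follows. The query and the delete are passed in as functions, so the loop itself makes no assumption about which client is used; `queryIds` and `deleteById` are hypothetical hooks, not Vespa APIs.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

final class BatchDeleteSketch {
    // Repeats query-then-delete until the query returns no ids.
    // Returns the number of delete calls issued.
    static int deleteAll(Supplier<List<String>> queryIds, Consumer<String> deleteById, long pauseMillis) {
        int deleted = 0;
        while (true) {
            List<String> ids = queryIds.get();
            if (ids.isEmpty()) {
                return deleted;
            }
            for (String id : ids) {
                deleteById.accept(id);
                deleted++;
            }
            try {
                Thread.sleep(pauseMillis); // optional pause to reduce load while deleting
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return deleted;
            }
        }
    }
}
```

In a real program, `queryIds` would run a search that returns a page of matching document IDs, and `deleteById` would issue a remove for one ID. The loop only terminates if deleted documents stop appearing in the query results.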