Partial Updates

A partial update is an update to one or more fields in a document. It also includes updating all index structures so the effect of the partial update is immediately observable in queries.

In Vespa, all fields can be partially updated by default. A field is index, attribute or summary, or a combination of these, and both index and attribute fields can be queried. For index and summary fields, an update means a read-modify-write to the document store, which limits throughput. Overview:

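| Field Setting | Searchable | Fast searchable | Matching | Ranking | Display in results |
|---|---|---|---|---|---|
| index | Y | Y | Text and Exact matching | Y | N |
| attribute | Y | Y with attribute:fast-search | Exact matching | Y | Y |
| summary | N | N | N | N | Y |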

Attribute fields do not require the document store read/write. This increases write throughput by orders of magnitude. This guide goes through the details of this feature and use cases for it.

Examples:

field score type int {
    indexing: summary
}
Summary-only field. The field is stored in the document store, so a partial update to the field triggers a read and a write. The field is not searchable.

field score type int {
    indexing: attribute
}
Attribute-only field. The field is stored in the attribute (in memory), so a partial update changes the value in place and is immediately visible to queries, ranking, grouping and sorting.
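The difference between the two update paths can be illustrated with a toy model. This is only a sketch, not Vespa's implementation: `ToyDocumentStore`, `update_summary_field` and `update_attribute_field` are hypothetical names, and a plain dict stands in for both the document store and the attribute.

```python
class ToyDocumentStore:
    """Hypothetical stand-in for the document store; counts accesses."""

    def __init__(self):
        self.docs = {}
        self.reads = 0
        self.writes = 0

    def read(self, doc_id):
        self.reads += 1
        return dict(self.docs[doc_id])

    def write(self, doc_id, doc):
        self.writes += 1
        self.docs[doc_id] = dict(doc)


def update_summary_field(store, doc_id, field, value):
    # Summary-only field: read the whole document, modify it, write it back.
    doc = store.read(doc_id)
    doc[field] = value
    store.write(doc_id, doc)


def update_attribute_field(attribute, doc_id, value):
    # Attribute-only field: overwrite the in-memory value in place.
    attribute[doc_id] = value


store = ToyDocumentStore()
store.docs["doc1"] = {"title": "hello", "score": 1}  # seeded directly, not counted
score_attribute = {"doc1": 1}

update_summary_field(store, "doc1", "score", 2)
update_attribute_field(score_attribute, "doc1", 2)
print(store.reads, store.writes)  # document store accesses caused by the updates
```

In this model only the summary-field update touches the document store; the attribute update never does.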

Use cases

Partial updates have many use cases. Functionally, they enable updating a document knowing nothing but its ID, which simplifies logic in the upper levels of the serving stack. Performance-wise, partial updates enable applications with a real-time update flow of tens of thousands of updates per second. Examples:

Filtering:
Inventory updates: Update product price and inventory count in real time. Do not show items out of stock.
Update relations: Add a "this person likes me" to the "likes me" set - display candidates based on sets of likes/dislikes/other relations.

Ranking:
Update click / views / non-clicks: Feed usage data to use in ranking - rank popular items higher.
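As a sketch of what such updates could look like, the snippet below builds partial update operations in the same JSON shape as the example later in this guide. The helper `partial_update`, the document ID and the field names `price`, `in_stock` and `clicks` are assumptions made for illustration; `assign` and `increment` are update operations in the Vespa document JSON format.

```python
import json


def partial_update(doc_id, **field_ops):
    """Hypothetical helper: build one partial update operation as a dict."""
    return {"update": doc_id, "fields": field_ops}


# Filtering use case: set a new price and mark the item as out of stock.
inventory = partial_update(
    "id:shop:product::sku-123",
    price={"assign": 1999},
    in_stock={"assign": 0},
)

# Ranking use case: count one more click without knowing the current value.
click = partial_update("id:shop:product::sku-123", clicks={"increment": 1})

print(json.dumps(inventory, indent=4))
print(json.dumps(click, indent=4))
```

Note that neither operation needs any part of the document other than its ID.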

Write pipeline

Refer to proton for an overview of the write pipeline and the Transaction Log Server (TLS). Because an update targets a document that already exists, the Document Meta Store already has an entry for it, so the FeedView only needs to consider how each updated field is configured: index, attribute or summary.

Write behavior per field setting:
index

For all indexed fields, a memory index is used for the recent changes, implemented using B-trees. This is periodically flushed to a disk-based posting list index. Disk-based indexes are subsequently merged.

Updating the in-memory B-trees is lock-free, implemented using copy-on-write semantics. This gives high performance, with a predictable steady-state CPU/memory use. The driver for this design is the requirement for a sustained, high change rate, with stable, predictable read latencies and small temporary increases in CPU/memory. This contrasts with index hierarchies, where smaller real-time indices are merged into larger ones, causing temporary hot spots.
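The copy-on-write idea can be sketched in a few lines. This is a simplified illustration, not the memory index itself: `CopyOnWriteIndex` is a hypothetical class, it assumes a single writer, and it copies a whole dict on every change, where a B-tree would copy only the nodes on the modified path.

```python
class CopyOnWriteIndex:
    """Toy term -> doc ids index where published snapshots are never mutated."""

    def __init__(self):
        self._snapshot = {}

    def snapshot(self):
        # Readers grab the current snapshot and use it without locking.
        return self._snapshot

    def add(self, term, doc_id):
        old = self._snapshot
        new = dict(old)                              # copy instead of mutating
        new[term] = old.get(term, ()) + (doc_id,)    # new immutable posting tuple
        self._snapshot = new                         # publish by swapping the reference


index = CopyOnWriteIndex()
index.add("vespa", 1)
reader_view = index.snapshot()   # a query starts here
index.add("vespa", 2)            # a write arrives while the query runs

print(reader_view["vespa"])        # the running query still sees its own snapshot
print(index.snapshot()["vespa"])   # new queries see the update
```

A reader that started before the write keeps a consistent view, and the writer never waits for readers.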

When updating an indexed field, the document is read from the document store, the field is updated, and the full document is written back to the store. At this point, the change is searchable, and an ACK is returned to the client. Use attributes to avoid such document disk accesses and hence increase performance for partial updates. Find more details in feed performance.

attribute

Attribute fields are in-memory fields, see attributes. Updates are memory operations only: they are persisted in the TLS at write time, and the attribute is flushed later, see attribute-flush. This makes updates inexpensive and fast.

Note there is no transaction support. To support a high update rate, there is no coordination between threads. Example:

{
    "update" : "id:namespace:doctype::1",
    "fields" : {
        "firstName" : { "assign" : "John" },
        "lastName"  : { "assign" : "Smith" }
    }
}

Above, the attributes firstName and lastName are updated in the same operation from the client, whereas the update in the search core is non-transactional. This is a throughput vs consistency tradeoff that enables the extreme update rates without being a practical limitation for many applications. More details in attributes.
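What non-transactional means for a reader can be shown with a toy model. It is a sketch under simplifying assumptions: the hypothetical `apply_update` writes one field at a time and records what a concurrent query could observe after each write, and the previous values ("Jane", "Doe") are made up for the example.

```python
def apply_update(attributes, doc_id, fields):
    """Apply field assignments one by one, yielding the visible state after each."""
    for name, value in fields.items():
        attributes[name][doc_id] = value
        # What a query running at this instant could read for the document.
        yield {n: column[doc_id] for n, column in attributes.items()}


# One in-memory column per attribute, keyed by a local document id.
attributes = {"firstName": {1: "Jane"}, "lastName": {1: "Doe"}}
update = {"firstName": "John", "lastName": "Smith"}

observed = list(apply_update(attributes, 1, update))
print(observed[0])  # intermediate state: only firstName has changed
print(observed[1])  # final state: both fields updated
```

The intermediate state mixes the new first name with the old last name, which is the kind of inconsistency an application must be able to tolerate.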

Updating multivalue attributes (arrays, maps, sets, tensors) means reading the current value, making the update and writing it back.
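A minimal sketch of this read-modify-write for an array field follows. The helpers `add_to_array` and `apply_add` and the field name `tags` are hypothetical, and a dict stands in for the attribute; the `add` operation itself is part of the Vespa document JSON format.

```python
import json


def add_to_array(doc_id, field, values):
    """Hypothetical helper: build an update that appends values to an array field."""
    return {"update": doc_id, "fields": {field: {"add": list(values)}}}


def apply_add(multivalue_attribute, doc_id, values):
    """Toy model of the update: read the current value, modify, write back."""
    current = multivalue_attribute.get(doc_id, [])   # read
    updated = current + list(values)                 # modify
    multivalue_attribute[doc_id] = updated           # write back


op = add_to_array("id:namespace:doctype::1", "tags", ["sale"])
tags = {"id:namespace:doctype::1": ["new"]}

apply_add(tags, op["update"], op["fields"]["tags"]["add"])
print(json.dumps(op))
print(tags)
```

Unlike a single-value assign, the cost of this update grows with the size of the existing value.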

Query execution time can be improved by adding an in-memory B-tree posting list structure using fast-search. This increases the work done per update, as both the value and the posting list are updated, and hence decreases update throughput.
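The tradeoff can be sketched with a toy attribute. This is an illustration only: `ToyAttribute` is a hypothetical class, and a dict of sets stands in for the B-tree posting lists.

```python
from collections import defaultdict


class ToyAttribute:
    """Toy single-value attribute with an optional value -> doc ids structure."""

    def __init__(self, fast_search=False):
        self.values = {}
        self.postings = defaultdict(set) if fast_search else None

    def update(self, doc_id, value):
        if self.postings is not None:
            # Extra work per update: move the doc id between posting lists.
            old = self.values.get(doc_id)
            if old is not None:
                self.postings[old].discard(doc_id)
            self.postings[value].add(doc_id)
        self.values[doc_id] = value

    def search(self, value):
        if self.postings is not None:
            return sorted(self.postings.get(value, ()))   # direct lookup
        # Without posting lists, every value is scanned.
        return sorted(d for d, v in self.values.items() if v == value)


plain = ToyAttribute()
fast = ToyAttribute(fast_search=True)
for attribute in (plain, fast):
    attribute.update(1, 10)
    attribute.update(2, 10)
    attribute.update(1, 20)

print(plain.search(10), fast.search(10))
```

Both variants return the same hits; the difference is where the work is done, at query time or at update time.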

See sizing-feeding for how to ensure an attribute is in memory on all nodes with a replica (searchable-copies or fast-access).

summary

Refer to document summaries. An update to a summary field is a read-modify-write against the document store: the current version of the document is read, modified and written back.

Attribute fields that are also in summary get their values from the memory structures, not the summary store. For applications with high write/query rates, use a summary class that contains attribute fields only, so that results are served from memory only. See https://docs.vespa.ai/documentation/document-summaries.html#summary-classes-in-queries

Further reading