
Partial Updates

A partial update is an update to one or more fields of a document. Applying it also updates all index structures, so the effect of the partial update is immediately observable in queries.

Figure: Attribute is an in-memory data structure

In Vespa, all fields can be partially updated by default. A field can be an index, attribute or summary field, or a combination of these; both index and attribute fields can be queried.

Field setting  Searchable  Fast searchable                Matching                 Ranking  Display in results
index          Y           Y                              Text and exact matching  Y        N
attribute      Y           Y, with attribute:fast-search  Exact matching           Y        Y
summary        N           N                              N                        N        Y

Examples:

field user type string {
    indexing: summary | index
}
Summary + index field. The field is stored in the document store, so a partial update to it triggers a read and a write of the stored document.
field user type string {
    indexing: attribute
}
Attribute-only field. The field is stored in the attribute (in memory), and a partial update modifies the value in place. The change is immediately visible to queries, ranking, grouping and sorting.

Use cases

Partial updates have many use cases. Functionally, they enable updating a document with nothing more than its ID and the fields to change, which simplifies logic in the upper levels of the serving stack. Performance-wise, partial updates enable applications with a real-time update flow of tens of thousands of updates per second. Examples:

Use case          Description
Filtering         Inventory updates: update product price and inventory count in real time, and do not show items that are out of stock.
Update relations  Add a "this person likes me" entry to the "likes me" set, and display candidates based on sets of likes, dislikes or other relations.
Ranking           Update clicks, views and non-clicks: feed usage data to use in ranking, so popular items rank higher.
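To illustrate updating a document knowing only its ID, the sketch below builds (but does not send) a partial update for the inventory use case against the /document/v1 HTTP API, where PUT updates an existing document. The endpoint http://localhost:8080, the namespace mynamespace, the document type product and the fields price and inventory_count are hypothetical; only the fields being changed are sent.

```python
import json
import urllib.request

# Hypothetical endpoint; adjust to your application.
ENDPOINT = "http://localhost:8080"


def build_inventory_update(namespace, doctype, doc_id, price, sold):
    """Build a partial update for one product, identified only by its ID."""
    url = f"{ENDPOINT}/document/v1/{namespace}/{doctype}/docid/{doc_id}"
    body = {
        "fields": {
            "price": {"assign": price},              # overwrite the price
            "inventory_count": {"decrement": sold},  # arithmetic update of the count
        }
    }
    # PUT on /document/v1 is an update; POST would put a complete document.
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_inventory_update("mynamespace", "product", "123", 19.99, 2)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it to a running Vespa instance.
```

No other fields of the product are read or sent by the client; the content node applies the change to the stored document.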

Write pipeline

Refer to proton for an overview of the write pipeline and the Transaction Log Server (TLS).

The write path depends on the field setting:
index

For all indexed fields, recent changes are held in an in-memory index implemented using B-trees. This memory index is periodically flushed to a disk-based posting list index, and disk-based indexes are subsequently merged.
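The lifecycle described above (recent changes in a memory index, periodic flushes to immutable disk indexes, later merges) can be sketched as a toy model. It assumes a dict-based memory index and in-memory tuples standing in for disk posting lists; the class and method names are hypothetical, it ignores removals, and it does not model Vespa's B-trees or file formats.

```python
from collections import defaultdict


class ToyIndex:
    """Toy model: a mutable memory index plus immutable 'disk' segments."""

    def __init__(self):
        self.memory = defaultdict(set)  # term -> doc IDs (recent changes)
        self.disk = []                  # list of {term: sorted tuple of doc IDs}

    def add(self, doc_id, terms):
        for term in terms:
            self.memory[term].add(doc_id)

    def flush(self):
        """Write the memory index out as an immutable segment and start a new one."""
        if self.memory:
            self.disk.append({t: tuple(sorted(ids)) for t, ids in self.memory.items()})
            self.memory = defaultdict(set)

    def merge(self):
        """Merge all disk segments into a single segment."""
        merged = defaultdict(set)
        for segment in self.disk:
            for term, ids in segment.items():
                merged[term].update(ids)
        self.disk = [{t: tuple(sorted(ids)) for t, ids in merged.items()}] if merged else []

    def search(self, term):
        """A query sees both the memory index and every disk segment."""
        hits = set(self.memory.get(term, ()))
        for segment in self.disk:
            hits.update(segment.get(term, ()))
        return sorted(hits)


idx = ToyIndex()
idx.add(1, ["red", "shoe"])
idx.flush()
idx.add(2, ["red"])
idx.flush()
idx.add(3, ["red"])        # still only in the memory index
idx.merge()                # two disk segments become one
print(idx.search("red"))   # [1, 2, 3]
```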

Updating the in-memory B-trees is lock-free, implemented using copy-on-write semantics. This gives high performance with predictable steady-state CPU and memory use. The driver for this design is the requirement for a sustained, high change rate with stable, predictable read latencies and only small, temporary increases in CPU and memory use. This contrasts with index hierarchies that merge smaller real-time indices into larger ones, causing temporary hot spots.
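Copy-on-write lets readers proceed without locks: the writer never modifies a version a reader may be traversing, but builds a new version and publishes it with a single reference assignment. The sketch shows the idea on a sorted posting list, assuming a single writer. It copies the whole list for brevity, whereas a copy-on-write B-tree copies only the nodes on the path to the change. Names are hypothetical; this is not Vespa's implementation.

```python
import bisect


class CowPostingList:
    """Single-writer, copy-on-write sorted list of doc IDs."""

    def __init__(self):
        self._current = ()  # immutable version that readers see

    def snapshot(self):
        # Readers take the current version; it never changes underneath them.
        return self._current

    def insert(self, doc_id):
        # Writer: copy, modify the copy, then publish it with one reference assignment.
        new = list(self._current)
        i = bisect.bisect_left(new, doc_id)
        if i == len(new) or new[i] != doc_id:
            new.insert(i, doc_id)
        self._current = tuple(new)


postings = CowPostingList()
postings.insert(7)
postings.insert(3)
reader_view = postings.snapshot()  # a query starts here
postings.insert(5)                 # a concurrent write publishes a new version
print(reader_view)                 # (3, 7): the in-flight query is unaffected
print(postings.snapshot())         # (3, 5, 7): new queries see the change
```

Python's garbage collector reclaims old versions once no reader holds them; a C++ implementation needs an explicit reclamation scheme for this.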

When an indexed field is updated, the document is read from the document store, the field is updated, and the full document is written back to the store. At this point the change is searchable, and an ACK is returned to the client. Use attributes to avoid these document disk accesses and increase partial update performance. See feed performance for more details.
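The cost difference can be made concrete with a toy model that counts document-store accesses. It assumes a dict of serialized JSON blobs standing in for the disk-based document store and a plain dict standing in for an attribute; the class and function names are hypothetical.

```python
import json


class ToyDocumentStore:
    """Stand-in for the disk-based document store; counts blob reads and writes."""

    def __init__(self):
        self.blobs = {}
        self.reads = 0
        self.writes = 0

    def get(self, doc_id):
        self.reads += 1
        return json.loads(self.blobs[doc_id])

    def put(self, doc_id, doc):
        self.writes += 1
        self.blobs[doc_id] = json.dumps(doc)


def update_index_field(store, doc_id, field, value):
    # Index field: read the whole document, change one field, write it all back.
    doc = store.get(doc_id)
    doc[field] = value
    store.put(doc_id, doc)


def update_attribute(attribute, doc_id, value):
    # Attribute field: overwrite the in-memory value in place; no document-store access.
    attribute[doc_id] = value


store = ToyDocumentStore()
store.put("1", {"title": "Blue shoe", "user": "alice"})
update_index_field(store, "1", "user", "bob")
popularity = {}
update_attribute(popularity, "1", 42)
print(store.reads, store.writes)  # prints "1 2": one read, plus the initial put and the write-back
```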

attribute

Attribute fields are in-memory fields, see attributes. This makes updates inexpensive and fast. Attribute data is periodically flushed, see attribute-flush. Operations are also persisted to the Transaction Log Server (TLS), so in the rare case of a power failure or unclean shutdown, operations not yet flushed are replayed from the TLS.
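This durability scheme follows the general write-ahead-log pattern: log each operation before applying it in memory, flush a snapshot periodically, and on restart load the snapshot and replay the operations logged after it. A minimal sketch of that pattern, with hypothetical names and Python lists and dicts standing in for the TLS and the flushed attribute file; it does not reflect Vespa's actual on-disk formats.

```python
class ToyAttribute:
    """In-memory attribute with a write-ahead log and periodic flush."""

    def __init__(self):
        self.values = {}         # live in-memory state (lost on a crash)
        self.flushed = {}        # last flushed snapshot (survives a crash)
        self.flushed_serial = 0  # last serial number covered by the snapshot
        self.log = []            # (serial, doc_id, value) entries (survive a crash)
        self.serial = 0

    def assign(self, doc_id, value):
        self.serial += 1
        self.log.append((self.serial, doc_id, value))  # persist the operation first
        self.values[doc_id] = value                     # then apply it in memory

    def flush(self):
        self.flushed = dict(self.values)
        self.flushed_serial = self.serial
        # Entries already covered by the snapshot can be pruned from the log.
        self.log = [e for e in self.log if e[0] > self.flushed_serial]


def recover(flushed, flushed_serial, log):
    """Rebuild in-memory state from durable data only: snapshot plus replay."""
    values = dict(flushed)
    for serial, doc_id, value in log:
        if serial > flushed_serial:
            values[doc_id] = value
    return values


attr = ToyAttribute()
attr.assign("a", 1)
attr.assign("b", 2)
attr.flush()
attr.assign("a", 3)  # logged but not yet flushed
# Simulated crash: attr.values is lost; only the snapshot and the log survive.
print(recover(attr.flushed, attr.flushed_serial, attr.log))  # {'a': 3, 'b': 2}
```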

Note that there is no transactional support for updates across fields. To support high update rates, there is no coordination between threads. Example:

{
    "update" : "id:namespace:doctype::1",
    "fields" : {
        "firstName" : { "assign" : "John" },
        "lastName"  : { "assign" : "Smith" }
    }
}

Above, the client updates the attributes firstName and lastName in the same operation, but the search core applies the two field updates non-transactionally, so a query running concurrently may observe one field updated and not the other. This is a throughput versus consistency tradeoff: it enables very high update rates and is not a practical limitation for many applications. More details in attributes.
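The effect of applying field updates one at a time can be shown deterministically. The sketch assumes each field lives in its own attribute and uses a generator to pause between per-field steps; a query that runs at that point observes a mix of old and new values. All names are hypothetical.

```python
def apply_update_stepwise(attributes, doc_id, fields):
    """Apply each field assignment separately, pausing after each step."""
    for name, value in fields.items():
        attributes[name][doc_id] = value
        yield name  # a point where a concurrent query could run


def query_full_name(attributes, doc_id):
    return f"{attributes['firstName'][doc_id]} {attributes['lastName'][doc_id]}"


attributes = {
    "firstName": {"1": "Jane"},
    "lastName": {"1": "Doe"},
}
steps = apply_update_stepwise(attributes, "1", {"firstName": "John", "lastName": "Smith"})

next(steps)                              # firstName applied
print(query_full_name(attributes, "1"))  # prints John Doe: a mixed state the client never fed
next(steps)                              # lastName applied
print(query_full_name(attributes, "1"))  # prints John Smith
```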

Updating multivalue attributes (arrays, maps, sets, tensors) means reading the current value, making the update and writing the whole value back.
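For a multivalue attribute, an update such as adding one element to a weighted set (conceptually, an "add" update on a weighted set field) involves the whole value. A minimal sketch, assuming the attribute is a dict from document ID to a {key: weight} dict; the function and field names are hypothetical.

```python
def add_to_weighted_set(attribute, doc_id, key, weight):
    """Read the current value, update a copy, and write the whole value back."""
    current = attribute.get(doc_id, {})  # read
    updated = dict(current)              # copy, so holders of the old value are unaffected
    updated[key] = weight                # modify
    attribute[doc_id] = updated          # write back
    return updated


likes_me = {"user-1": {"user-7": 1}}
add_to_weighted_set(likes_me, "user-1", "user-9", 1)
print(likes_me["user-1"])  # {'user-7': 1, 'user-9': 1}
```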

Query execution time can be improved by adding an in-memory B-tree posting list structure using fast-search. This increases the work per update, as both the value and the posting list are updated, and hence decreases update throughput.
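The extra update cost of fast-search comes from keeping a value-to-documents posting structure in sync with the per-document values. The sketch counts the structure modifications made by assigns, with and without such a posting index. It uses a dict of sets in place of B-trees, and its names are hypothetical.

```python
from collections import defaultdict


class ToySingleValueAttribute:
    def __init__(self, fast_search):
        self.values = {}                                            # doc ID -> value
        self.postings = defaultdict(set) if fast_search else None  # value -> doc IDs
        self.touched = 0                                            # structure modifications

    def assign(self, doc_id, value):
        old = self.values.get(doc_id)
        self.values[doc_id] = value
        self.touched += 1
        if self.postings is not None:
            if old is not None:
                self.postings[old].discard(doc_id)  # remove from the old value's list
                self.touched += 1
            self.postings[value].add(doc_id)        # add to the new value's list
            self.touched += 1

    def find(self, value):
        if self.postings is not None:
            return sorted(self.postings.get(value, ()))                # direct lookup
        return sorted(d for d, v in self.values.items() if v == value)  # full scan


plain = ToySingleValueAttribute(fast_search=False)
fast = ToySingleValueAttribute(fast_search=True)
for attr in (plain, fast):
    attr.assign("1", "red")
    attr.assign("1", "blue")
print(plain.touched, fast.touched)  # 2 5
```

Both variants return the same matches; the posting index replaces a scan at query time with extra bookkeeping at update time.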

See sizing-feeding for how to ensure an attribute is in memory on all nodes with a replica (searchable-copies or fast-access).

summary

An update to the document store reads the current version, modifies it and writes back a new blob. Refer to document summaries.

Attribute fields that are also in a summary get their values from the in-memory structures, not the document store. For applications with high write and query rates, use a summary class containing only attribute fields, so summaries are served from memory only.
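Why an attribute-only summary class avoids disk access can be illustrated by a toy summary filler: fields backed by an attribute are read from memory, and only the remaining fields require fetching the stored document. The store, field and function names are hypothetical.

```python
def fill_summary(doc_id, summary_fields, attributes, doc_store, counters):
    """Build a summary; only non-attribute fields force a document-store read."""
    summary = {}
    stored_doc = None
    for field in summary_fields:
        if field in attributes:
            summary[field] = attributes[field][doc_id]  # served from memory
        else:
            if stored_doc is None:
                counters["store_reads"] += 1            # at most one store read per document
                stored_doc = doc_store[doc_id]
            summary[field] = stored_doc[field]
    return summary


attributes = {"price": {"p1": 19.99}, "inventory_count": {"p1": 4}}
doc_store = {"p1": {"title": "Blue shoe", "price": 19.99, "inventory_count": 4}}
counters = {"store_reads": 0}

fill_summary("p1", ["price", "inventory_count"], attributes, doc_store, counters)
print(counters["store_reads"])  # 0: attribute-only summary, memory only
fill_summary("p1", ["title", "price"], attributes, doc_store, counters)
print(counters["store_reads"])  # 1: "title" had to come from the document store
```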

Further reading