# Vespa Feed Sizing Guide

Vespa is optimized to sustain a high feed load while serving, also during planned and unplanned changes to the instance. Vespa supports feed rates at memory speed. This guide covers how to configure, test and size the application for optimal feed performance.

Read the Writing to Vespa guide first. It gives an overview of Vespa, where the key takeaway is the distinction between the stateless container cluster and the stateful content cluster. The processing of documents PUT to Vespa runs in the container cluster, and includes both Vespa-internal processing like tokenization and custom application code in document processing. The container cluster is primarily CPU bound; see indexing for how to separate search and write into different container clusters. Other than that, make sure the container cluster has enough memory to avoid excessive GC, i.e. the heap must be big enough. Allocate enough CPU for the indexing load.

All updates are written to the transaction log. This is sequential IO and rarely impacts write performance, and is hence not covered in this guide.

The remainder of this document concerns how to configure the application to optimize feed performance on the content node.

## Content node

The content node runs the proton process - overview:

As there are multiple components and data structures involved, this guide starts with the simplest examples and then adds optimizations and complexity. Flushing of index structures is covered at the end of this guide.

The simplest form of indexing is no indexing. In Vespa, this is called streaming search indexing mode. Streaming search is optimized for write performance; the tradeoff is slow search. Streaming search is hence great for applications like personal search, where the amount of data searched per query is small. Read more in the streaming search guide.

Documents are written to the document store in all indexing modes; this is where the copy of the PUT document is persisted. See Summary Manager + Document store in the illustration above. In short, these are append operations to large files, and (simplified) each PUT is one write.

PUT-ing documents to the document store can hence be thought of as appending to files using sequential IO: expect a high write rate, using little memory and CPU. Writing a new version of a document (a PUT to a document that already exists) costs the same as writing a new document: in both cases, the document store's index entry for the document is updated to point to the latest version.

A partial UPDATE to a document incurs a read from the document store to get the current fields. Then the new field values are applied and the new version of the document is written. Hence, an UPDATE is like a PUT with an extra read.
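
The cost model above can be illustrated with a toy, in-memory stand-in for the document store. This is a simplified sketch, not proton's implementation: it assumes, as the simplified description does, that each PUT is one append and that an index maps each document ID to its latest version. `ToyDocumentStore` and its read/write counters are hypothetical.

```python
import json


class ToyDocumentStore:
    """Hypothetical stand-in for the document store: an append-only log of
    serialized documents plus an index from document ID to the latest version."""

    def __init__(self):
        self.log = []    # stands in for the large, append-only files
        self.index = {}  # doc_id -> position of the latest version in self.log
        self.reads = 0
        self.writes = 0

    def put(self, doc_id, fields):
        # One sequential append, whether or not doc_id already exists;
        # the index is simply repointed to the newest version.
        self.log.append(json.dumps(fields))
        self.index[doc_id] = len(self.log) - 1
        self.writes += 1

    def get(self, doc_id):
        self.reads += 1
        return json.loads(self.log[self.index[doc_id]])

    def update(self, doc_id, changes):
        # Partial UPDATE: read the current version, apply the new field
        # values, append the result. That is, a PUT plus one extra read.
        current = self.get(doc_id)
        current.update(changes)
        self.put(doc_id, current)


store = ToyDocumentStore()
store.put("id:ns:music::1", {"artist": "A", "year": 1990})
store.put("id:ns:music::1", {"artist": "B", "year": 1991})  # overwrite is an append
store.update("id:ns:music::1", {"year": 2000})
print(store.writes, store.reads, len(store.log))  # 3 1 3
```

The toy never reclaims old versions; it only shows why an overwrite costs the same as a new PUT and why an UPDATE adds one read.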

## Index

The majority of Vespa use cases operate on the full data set, using indexed mode:

```
schema music {
    document music {
        field artist type string {
            indexing: summary | index
        }
    }
}
```

Observe that documents are written to summary (i.e. the document store) as in streaming mode, but there is also an index. See Index Manager + Index in the illustration above.

Refer to proton index for the index write. In short, it updates the memory index, which is flushed regularly. The PUT to the index is hence a memory-only operation, but uses CPU to update the index.
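
A toy model of this write path, for intuition only: each PUT tokenizes the field and updates an in-memory inverted index (CPU work, no IO), and the memory index is written out once it grows past a threshold. The class name, the whitespace tokenizer and the document-count flush trigger are assumptions of the sketch and are not proton's actual flush strategy.

```python
from collections import defaultdict


class ToyMemoryIndex:
    """Hypothetical model of the index write path: PUTs update an in-memory
    inverted index, which is flushed once it holds `flush_threshold` documents."""

    def __init__(self, flush_threshold):
        self.flush_threshold = flush_threshold  # hypothetical flush trigger
        self.memory = defaultdict(set)          # term -> local document ids
        self.docs_in_memory = 0
        self.flushed = []                       # stands in for on-disk indexes

    def put(self, lid, text):
        # Memory-only: tokenize and add postings. This costs CPU, not IO.
        # (The toy ignores removing postings of an overwritten document.)
        for term in text.lower().split():
            self.memory[term].add(lid)
        self.docs_in_memory += 1
        if self.docs_in_memory >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write out a frozen copy of the memory index, then start a new one.
        self.flushed.append({t: sorted(ids) for t, ids in self.memory.items()})
        self.memory = defaultdict(set)
        self.docs_in_memory = 0


idx = ToyMemoryIndex(flush_threshold=2)
idx.put(1, "Bob Dylan")
idx.put(2, "Bob Marley")  # second document reaches the threshold: flush
idx.put(3, "Nina Simone")
print(len(idx.flushed), sorted(idx.memory))  # 1 ['nina', 'simone']
```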

A partial UPDATE is as in streaming search, plus the memory index update.

## Attribute

Some applications have a limited set of documents, with a high change rate to fields in the documents (e.g. stock prices: the number of stocks is almost fixed, while prices change constantly). Such applications are easily write bound.

To update fields in real time at high volume, use attribute fields:

```
schema ticker {
    document ticker {
        field price type float {
            indexing: summary | attribute
        }
    }
}
```

Attribute fields are not stored in the document store, so updating them involves no IO (except sequential flushing). This enables applications to write to Vespa at memory speed; an update rate of 10k per node is possible.

### Redundancy settings

To achieve memory-only updates, make sure all attributes to update are ready, meaning the content node has loaded the attribute into memory:

• One way to ensure this is to set searchable copies equal to redundancy, i.e. all nodes that have a replica of the document have loaded it as searchable
• Another way is by setting fast-access on each attribute to update
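
The two options can be summarized as a small predicate, useful when reviewing a configuration. This is a hypothetical helper, not a Vespa API; its parameters mirror the redundancy, searchable copies and fast-access settings named above.

```python
def attribute_updates_are_memory_only(redundancy, searchable_copies, fast_access):
    """Hypothetical check: updates to an attribute are memory-only on every
    replica only if every replica has the attribute loaded in memory."""
    # Either every replica is searchable (ready), or the attribute is marked
    # fast-access, which keeps it loaded on replicas that are not ready too.
    return searchable_copies == redundancy or fast_access


print(attribute_updates_are_memory_only(3, 2, fast_access=False))  # False
print(attribute_updates_are_memory_only(3, 2, fast_access=True))   # True
```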

### Debugging performance

When debugging update performance, it is useful to know whether an update hits the document store. Enable the spam log level and look for SummaryAdapter::put, then do an update:

```
$ vespa-logctl searchnode:proton.server.summaryadapter spam=on
.proton.server.summaryadapter    ON  ON  ON  ON  ON  ON OFF  ON

$ vespa-logfmt -l all -f | grep 'SummaryAdapter::put'
[2019-10-10 12:16:47.339] SPAM    : searchnode       proton.proton.server.summaryadapter    summaryadapter.cpp:45 SummaryAdapter::put(serialnum = '12', lid = 1, stream size = '199')
```

Existence of such log messages indicates that the update was accessing the document store.
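
When checking many updates, counting matches is easier than reading the log by eye. A minimal sketch, assuming the SPAM line format shown above; the script name `count_puts.py` and the `count_summary_puts` helper are hypothetical.

```python
import re
import sys

# Matches the SummaryAdapter::put SPAM line shown above and captures the lid.
PUT_PATTERN = re.compile(r"SummaryAdapter::put\(serialnum = '\d+', lid = (\d+)")


def count_summary_puts(lines):
    """Return {lid: number of document store writes} found in the log lines."""
    counts = {}
    for line in lines:
        match = PUT_PATTERN.search(line)
        if match:
            lid = int(match.group(1))
            counts[lid] = counts.get(lid, 0) + 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: vespa-logfmt -l all | python3 count_puts.py /dev/stdin
    with open(sys.argv[1]) as log:
        for lid, n in sorted(count_summary_puts(log).items()):
            print(f"lid={lid}: {n} document store write(s)")
```

If a batch of attribute-only updates stayed in memory, the script should print nothing for the documents in that batch.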

## Multivalue attribute

Multivalued attributes are weightedset, array of struct/map, map of struct/map and tensor. They have different characteristics, which affect write performance. Generally, updates to multivalue fields become more expensive as the field size grows:

• weightedset: Updates are memory-only: read the full set, update, write back. Make the update as inexpensive as possible by using numeric types instead of strings where possible. Example: in a weighted set of string with many (1000+) elements, adding an element means an enum store lookup/add and an add/sort of the attribute multivalue map (details in attributes). Using a numeric type instead speeds this up, as it involves no string comparisons.
• array of struct/map and map of struct/map: Updates require a read from the document store and will hence reduce the update rate; see #10892.
• tensor: Updating tensor cell values is a memory-only operation: copy the tensor, update, write back. For large tensors, this implies reading and writing a large chunk of memory for single-cell updates.
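
One way to follow the advice to use a numeric type when the keys are naturally strings is to map each string to a stable 64-bit integer in the feeding client, so the field can be declared as a weighted set of long. This is an assumption of the sketch, not something Vespa does for you: the hashing scheme, the helper names and the field name `tags_hashed` are hypothetical, the application must apply the same mapping at query time, and hash collisions are possible, though unlikely with 64 bits. The update shape assumes the document JSON `add` operation for weighted sets.

```python
import hashlib
import json


def term_to_long(term):
    """Map a string to a stable signed 64-bit integer (hypothetical scheme)."""
    digest = hashlib.blake2b(term.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True)


def weightedset_add(doc_id, field, weights):
    """Build a partial update that adds {term: weight} entries to a numeric
    weighted set. JSON object keys are strings, so the integers are stringified."""
    keys = {str(term_to_long(term)): weight for term, weight in weights.items()}
    return {"update": doc_id, "fields": {field: {"add": keys}}}


print(json.dumps(weightedset_add("id:namespace:mydoc::1", "tags_hashed", {"jazz": 10})))
```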

## Parent/child

Parent documents are global, i.e. they have a replica on all nodes. Writing to fields in parent documents often simplifies logic, compared to the de-normalized case where all (child) documents are updated. Write performance depends on the average number of child documents vs. the number of nodes in the cluster. Examples:

• 10-node cluster, avg number of children=100, redundancy=2: A parent write means 10 writes, compared to 200 writes, or 20x better
• 50-node cluster, avg number of children=10, redundancy=2: A parent write means 50 writes, compared to 20 writes, or 2.5x worse
Hence, the more children per parent relative to the number of nodes, the larger the performance gain from writing to the parent.
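
The examples above follow from a simple count, assuming every replica write costs the same: a parent write lands on every node, while the de-normalized alternative writes each child once per replica. The helper name is hypothetical.

```python
def writes_per_change(nodes, avg_children, redundancy):
    """Return (parent_writes, denormalized_writes) for one logical field change.

    A parent (global) document has a replica on every node, so one parent
    write costs `nodes` writes. Storing the field in every child instead
    costs one write per child per replica."""
    parent_writes = nodes
    denormalized_writes = avg_children * redundancy
    return parent_writes, denormalized_writes


for nodes, children in [(10, 100), (50, 10)]:
    parent, denorm = writes_per_change(nodes, children, redundancy=2)
    print(f"{nodes} nodes, {children} children: "
          f"parent {parent} vs de-normalized {denorm} writes")
```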

## Conditional updates

A conditional update looks like:

```
{
    "update" : "id:namespace:myDoc::1",
    "condition" : "myDoc.myField == \"abc\"",
    "fields" : { "myTimestamp" : { "assign" : 1570187817 } }
}
```
If the document store is accessed when evaluating the condition, performance drops. Conditions should be evaluated using attribute values for high performance - in the example above, myField should be an attribute.

Note: If the condition uses struct or map, values are read from the document store:

```
"condition" : "myDoc.myMap{1} == 3"
```

This is true even though all struct fields are defined as attribute. Improvements to this are tracked in #10892.
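
These two rules can be checked before feeding with a rough lint. This is a heuristic sketch, not a parser for Vespa's document selection language: it only looks for `doctype.field` references, treats any `{...}` or `.member` accessor as struct/map access, and takes the set of attribute fields as input. `condition_reads_document_store` is a hypothetical name.

```python
import re

# doctype.field, optionally followed by a map key {..} or a struct member .name
FIELD_REF = re.compile(r"\b(\w+)\.(\w+)(\{[^}]*\}|\.\w+)?")


def condition_reads_document_store(condition, doctype, attributes):
    """Heuristic: True if the condition likely needs the document store,
    i.e. it references a non-attribute field or accesses a struct/map."""
    for match in FIELD_REF.finditer(condition):
        prefix, field, accessor = match.groups()
        if prefix != doctype:
            continue  # not a field reference, e.g. a number like 10.5
        if accessor or field not in attributes:
            return True
    return False


print(condition_reads_document_store('myDoc.myField == "abc"', "myDoc", {"myField"}))  # False
print(condition_reads_document_store("myDoc.myMap{1} == 3", "myDoc", {"myMap"}))        # True
```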

## Client roundtrips

Consider the difference when sending two field assignments to the same document:

```
{
    "update" : "id:namespace:doctype::1",
    "fields" : {
        "myMap{1}" : { "assign" : { "timestamp" : 1570187817 } },
        "myMap{2}" : { "assign" : { "timestamp" : 1570187818 } }
    }
}
```
```
{
    "update" : "id:namespace:doctype::1",
    "fields" : {
        "myMap{1}" : { "assign" : { "timestamp" : 1570187817 } }
    }
}
{
    "update" : "id:namespace:doctype::1",
    "fields" : {
        "myMap{2}" : { "assign" : { "timestamp" : 1570187818 } }
    }
}
```
In the first case, one update operation is sent from the vespa-http-client. In the latter, the client sends the second update operation only after receiving an ack for the first. When updating multiple fields, put the updates in as few operations as possible. See Ordering details.
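
Where client code produces many small updates, it can combine them before sending. A minimal sketch, assuming only unconditional `assign` operations on non-overlapping field paths are merged: for those, applying the combined field map matches applying the separate operations in order, since a later assignment to the same path overwrites an earlier one. Anything else (e.g. increments, or operations with a condition) is rejected rather than merged. The function name is hypothetical.

```python
import json


def merge_assign_updates(operations):
    """Combine unconditional 'assign' updates per document ID into one
    operation per document. Assumes field paths within a document do not
    overlap (e.g. not both myMap and myMap{1})."""
    merged = {}  # doc_id -> {field path: change}; dicts keep insertion order
    for op in operations:
        if "condition" in op:
            raise ValueError("conditional updates are not merged")
        for path, change in op["fields"].items():
            if set(change) != {"assign"}:
                raise ValueError(f"only 'assign' is merged, got {path}: {change}")
        # A later assignment to the same path replaces the earlier one,
        # as it would if the operations were sent one after the other.
        merged.setdefault(op["update"], {}).update(op["fields"])
    return [{"update": doc_id, "fields": fields} for doc_id, fields in merged.items()]


ops = [
    {"update": "id:namespace:doctype::1",
     "fields": {"myMap{1}": {"assign": {"timestamp": 1570187817}}}},
    {"update": "id:namespace:doctype::1",
     "fields": {"myMap{2}": {"assign": {"timestamp": 1570187818}}}},
]
print(json.dumps(merge_assign_updates(ops), indent=2))
```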

A content node normally has a fixed set of resources (CPU, memory, disk). Configure the CPU allocation for feeding vs. searching with concurrency, a value from 0 to 1.0, where a higher value means more CPU resources for feeding.

## Feed testing

When testing for feeding capacity:

1. Use the vespa-http-client in asynchronous mode.
2. Test using one content node to find its capacity
3. Test feeding performance by adding feeder instances. Make sure network and CPU (content and container node) usage increases, until saturation.
4. See troubleshooting at the end to make sure there are no errors

Other scenarios: test feed capacity for sustained load in a system in steady state, during state changes, and during query load.
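
Step 3 can be automated as a ramp-up loop that adds feeder instances until throughput stops growing. A sketch under stated assumptions: `measure_throughput(n)` is a hypothetical callback that runs n feeder instances for a while and returns the observed documents per second, and the 5% gain threshold is arbitrary. The loop only finds where throughput flattens; checking network and CPU usage and errors is still needed to know which resource saturated.

```python
def find_saturation(measure_throughput, max_feeders=16, min_gain=0.05):
    """Add feeder instances one at a time. Stop at the first step where one
    more feeder improves throughput by less than `min_gain` (relative), and
    return (feeders, throughput) for the last step that did improve it."""
    best_n, best = 1, measure_throughput(1)
    for n in range(2, max_feeders + 1):
        current = measure_throughput(n)
        if current < best * (1 + min_gain):
            break
        best_n, best = n, current
    return best_n, best


# Example with a synthetic throughput curve that flattens at 3500 docs/s:
print(find_saturation(lambda n: min(1000 * n, 3500)))  # (4, 3500)
```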