Vespa Feed Sizing Guide

When updating (put/update/remove), consider:

These data structures might or might not be updated, depending on the search definition. The transaction log is always written. General notes:
  1. Putting a document will always update the document store, a file append.
  2. A partial update on an indexed field will read the document from the document store, change the field, and write it back - and also update the memory index.
  3. To improve feed throughput, trading off freshness, increase visibility-delay to batch writes on the content nodes for higher write performance. This trades off latency - writes will take effect in search results after visibility-delay seconds. This is particularly useful when batch feeding, like initial bootstrap or grid jobs.
  4. Configure the CPU allocation for feeding vs. searching in concurrency.
  5. The feeding endpoint is run in a stateless container cluster. Documents flow from this cluster to the content node cluster. Hence both this cluster and the content node cluster must be evaluated to find the bottleneck. For the stateless cluster, check CPU usage and GC metrics - the heap must not be too low. Some applications use separate container clusters for feeding (<document-api>) and searching (<search>).

Document updates

Consider the difference when sending two fields assignments two the same document:

{
    "update" : "id:namespace:doctype::1",
    "fields" : {
        "myMap{1}" : {
            "assign" : {
                "timestamp" : 1570187817
            }
        }
        "myMap{2}" : {
            "assign" : {
                "timestamp" : 1570187818
            }
        }
    }
}
{
    "update" : "id:namespace:doctype::1",
    "fields" : {
        "myMap{1}" : {
            "assign" : {
                "timestamp" : 1570187817
            }
        }
    }
}
{
    "update" : "id:namespace:doctype::1",
    "fields" : {
        "myMap{2}" : {
            "assign" : {
                "timestamp" : 1570187818
            }
        }
    }
}
In the first case, one update operation is sent from the vespa-http-client - in the latter, the client will send the second update operation after receiving and ack for the first. Ordering details.

Feed updates with high throughput

A Vespa content node can sustain a high document update rate, given that all data to be updated is in memory. Complete step 3-5 in previous section - then:

  1. Make sure all fields that are updated are attributes.
  2. Make sure all attributes to update are ready, meaning the content node has loaded the attribute into memory. One way to ensure this is to set searchable copies equal to redundancy - i.e. all nodes that has a replica of the document has loaded it as searchable. Another way is by setting fast-access on each attribute to update.
  3. Update to array of struct/map and map of struct/map requires a read from the document store and will hence reduce update rate - see #10892.
  4. Updates to multivalue fields are more expensive as the size grows. Example: a weighted set of string with many (1000+) elements. Adding an element to the set means an enum store lookup/add and add/sort of the attribute multivalue map - details in attributes. Use a numeric type instead to speed this up - this has no string comparisons.

Feed testing

When testing for feeding capacity:

  1. Use the Java HTTP client in asynchronous mode.
  2. Test using one content node to find its capacity
  3. Test feeding performance by adding feeder instances. Make sure network and CPU (content and container node) usage increases, until saturation.
  4. See troubleshooting at end to make sure there are no errors
Other scenarios: Feed testing for capacity for sustained load in a system in steady state, during state changes, during query load.

Troubleshooting

Find a non-exhaustive list of things to check below:

Metrics

Use metrics from content nodes and look at queues - queue wait time and queue size (all metrics in milliseconds):

vds.filestor.alldisks.averagequeuewait.sum
vds.filestor.alldisks.queuesize
Check content node metrics across all nodes to see if there are any outliers. Also check
vds.filestor.alldisks.allthreads.update.sum.latency

Failure rates

vds.distributor.updates.sum.latency
vds.distributor.updates.sum.ok
vds.distributor.updates.sum.failures.total
vds.distributor.puts.sum.latency
vds.distributor.puts.sum.ok
vds.distributor.puts.sum.failures.total
vds.distributor.removes.sum.latency
vds.distributor.removes.sum.ok
vds.distributor.removes.sum.failures.total

Blocked feeding

Should be 0 everywhere - refer to feed block:

content.proton.resource_usage.feeding_blocked

Concurrent mutations

vds.distributor.updates.sum.failures.concurrent_mutations
Mutating client operations towards a given document ID are sequenced on the distributors. If an operation is already active towards a document, a subsequently arriving one will be bounced back to the client with a transient failure code. Usually this happens when users send feed from multiple clients concurrently without synchronisation. Note that feed operations sent by a single client are sequenced client-side, so this should not be observed with a single client only. Bounced operations are never sent on to the backends and should not cause elevated latencies there, although the client will observe higher latencies due to automatic retries with back-off.

Wrong distribution

vds.distributor.updates.sum.failures.wrongdistributor
Indicates that clients keep sending to the wrong distributor. Normally this happens infrequently (but is does happen on client startup or distributor state transitions), as clients update and cache all state required to route directly to the correct distributor (Vespa uses a deterministic CRUSH-based algorithmic distribution). Some potential reasons for this:
  1. Clients are being constantly re-created with no cached state.
  2. The system is in some kind of flux where the underlying state keeps changing constantly.
  3. The client distribution policy has received so many errors that it throws away its cached state to start with a clean slate to e.g. avoid the case where it only has cached information for the bad side of a network partition.
  4. The system has somehow failed to converge to a shared cluster state, causing parts of the cluster to have a different idea of the correct state than others.

Cluster out of sync

update_puts/gets indicate "two-phase" updates:

vds.distributor.update_puts.sum.latency
vds.distributor.update_puts.sum.ok
vds.distributor.update_gets.sum.latency
vds.distributor.update_gets.sum.ok
vds.distributor.update_gets.sum.failures.total
vds.distributor.update_gets.sum.failures.notfound
If replicas are out of sync, updates cannot be applied directly on the replica nodes as they risk ending up with diverging state. In this case, Vespa performs an explicit read-consolidate-write (write repair) operation on the distributors. This is usually a lot slower than the regular update path because it doesn’t happen in parallel. It also happens in the write path of other operations, so risks blocking these if the updates are expensive in terms of CPU.

Replicas being out of sync is by definition not the expected steady state of the system. For example, replica divergence can happen if one or more replica nodes are unable to process or persist operations. Track (pending) merges:

vds.idealstate.buckets
vds.idealstate.merge_bucket.pending
vds.idealstate.merge_bucket.done_ok
vds.idealstate.merge_bucket.done_failed

Document Store

When debugging update performance, it is useful to know if an update hits the document store or not. Enable spam log level and look for SummaryAdapter::put - then do an update:

$ vespa-logctl searchnode:proton.server.summaryadapter spam=on
.proton.server.summaryadapter    ON  ON  ON  ON  ON  ON OFF  ON

$ vespa-logfmt -l all -f | grep 'SummaryAdapter::put'
[2019-10-10 12:16:47.339] SPAM    : searchnode       proton.proton.server.summaryadapter	summaryadapter.cpp:45 SummaryAdapter::put(serialnum = '12', lid = 1, stream size = '199')
Existence of such log messages indicates that the update was accessing the document store.

Conditional updates

A conditional update looks like:

{
    "condition" : "myDoc.myField == \"abc\"",
    "update" : "id:namespace:myDoc::1",
    "fields" : {
        "myTimestamp" : {
            "assign" : 1570187817
        }
    }
}
If the document store is accessed when evaluating the condition, performance drops. Conditions should be evaluated using attribute values for high performance - in the example above, myField should be an attribute.

Note: If the condition uses struct or map, values are read from the document store:

    "condition": "myDoc.myMap{1} == 3"
This is true even though all struct fields are defined as attribute. Improvements to this is tracked in #10892.