A content cluster blocks external write operations when at least one content node has reached its resource limit for disk or memory. This is done to avoid saturating content nodes. The cluster controller monitors the resource usage of the content nodes and decides whether to block feeding. Transient resource usage (see details in the metrics below) is not included in the monitored usage. Transient usage is thereby covered by the resource headroom on the content nodes, rather than causing feed to be blocked due to natural fluctuations.
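The blocking rule described above can be sketched as follows. This is a minimal illustration, not the cluster controller's implementation: `NodeUsage`, `monitored_usage`, and `feed_is_blocked` are hypothetical names, and the limits are passed in as parameters rather than taken from any Vespa default.

```python
from dataclasses import dataclass


@dataclass
class NodeUsage:
    """Usage of one content node as fractions of total capacity (hypothetical type)."""
    disk_total: float        # all disk in use, including transient usage
    disk_transient: float    # e.g. disk used temporarily during disk index fusion
    memory_total: float      # all memory in use, including transient usage
    memory_transient: float  # e.g. memory used by memory indexes


def monitored_usage(node: NodeUsage) -> dict[str, float]:
    # Transient usage is subtracted, so natural fluctuations are
    # absorbed by the headroom above the limit instead of blocking feed.
    return {
        "disk": node.disk_total - node.disk_transient,
        "memory": node.memory_total - node.memory_transient,
    }


def feed_is_blocked(nodes: list[NodeUsage], limits: dict[str, float]) -> bool:
    # Feed is blocked as soon as one node exceeds one of its limits.
    return any(
        usage > limits[resource]
        for node in nodes
        for resource, usage in monitored_usage(node).items()
    )
```

With a limit of 0.80 for both resources, a node at 0.85 total memory of which 0.10 is transient does not block feed, while a node at 0.82 non-transient memory does.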
When feed is blocked, write operations are rejected by Distributors. All Put operations and most Update operations are rejected. These operations are still allowed:
assign operations to numeric single-value fields
To remedy, add nodes to the content cluster, or use nodes with higher capacity. The data will auto-redistribute, and feeding is unblocked when all content nodes are below the limits. Configure resource-limits to tune this.
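As a rough aid for sizing the remedy, the sketch below estimates how many nodes bring the most loaded node back under a limit. It rests on two loud assumptions that are not from the text above: that data redistributes evenly, and that usage scales inversely with the node count (no fixed per-node overhead). `nodes_needed` and `target_margin` are hypothetical names.

```python
import math


def nodes_needed(current_nodes: int, highest_usage: float, limit: float,
                 target_margin: float = 0.0) -> int:
    """Estimate the node count that puts per-node usage at or below limit - target_margin.

    Assumes even redistribution and usage inversely proportional to node count.
    Uses the most loaded node as a pessimistic stand-in for every node.
    """
    target = limit - target_margin
    if not 0.0 < target <= 1.0:
        raise ValueError("limit minus target_margin must be in (0, 1]")
    # Total load expressed in "full nodes", spread over enough nodes to hit the target.
    estimate = math.ceil(current_nodes * highest_usage / target)
    # Never suggest shrinking the cluster.
    return max(current_nodes, estimate)
```

For example, a 4-node cluster whose worst node is at 0.82 against a limit of 0.80 yields an estimate of 5 nodes. Treat the result as a starting point and verify against the resource usage metrics after redistribution.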
These metrics are used to monitor resource usage and whether feeding is blocked:
| Metric | Description |
|---|---|
| cluster-controller.resource_usage.nodes_above_limit | The number of content nodes that are above one or more resource limits. When above 0, feeding is blocked. |
| content.proton.resource_usage.disk | A number between 0 and 1, indicating how much disk (of total available) is used on the content node. Transient disk used during disk index fusion is not included. |
| content.proton.resource_usage.memory | A number between 0 and 1, indicating how much memory (of total available) is used on the content node. Transient memory used by memory indexes is not included. |
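These metrics can be used to warn before feed is blocked. The sketch below assumes the metric values have already been collected into a dictionary keyed by host; how they are fetched is out of scope. `headroom_report` and `warn_margin` are hypothetical names, and the limits are supplied by the caller.

```python
DISK = "content.proton.resource_usage.disk"
MEMORY = "content.proton.resource_usage.memory"


def headroom_report(samples, limits, warn_margin=0.05):
    """Return (host, metric, usage, headroom) for each value within warn_margin of its limit.

    samples: {host: {metric_name: value between 0 and 1}}
    limits:  {metric_name: limit between 0 and 1}
    """
    findings = []
    for host, metrics in sorted(samples.items()):
        for metric, limit in limits.items():
            usage = metrics[metric]
            headroom = limit - usage  # negative once the node is above the limit
            if headroom < warn_margin:
                findings.append((host, metric, usage, round(headroom, 4)))
    return findings
```

A negative headroom in the report corresponds to a node that would count towards cluster-controller.resource_usage.nodes_above_limit.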
When feeding is blocked, error messages are returned in write operation replies, for example:
ReturnCode(NO_SPACE, External feed is blocked due to resource exhaustion: memory on node 0 [my-vespa-node-0.example.com] (0.82 > 0.80))
The address space used by data structures in attributes (Multivalue Mapping, Enum Store, and Tensor Store) can also fill up and block feeding - see attribute data structures for details. This rarely happens. The following metric is used to monitor address space usage:
| Metric | Description |
|---|---|
| content.proton.documentdb.attribute.resource_usage.address_space.max | A number between 0 and 1, indicating how much address space is used by the worst attribute data structure on the content node. |
An error is returned when the address space limit (default value is 0.90) is exceeded:
ReturnCode(NO_SPACE, External feed is blocked due to resource exhaustion: attribute-address-space:test.ready.a1.enum-store on node 0 [my-vespa-node-0.example.com] (0.91 > 0.90))
To remedy, add nodes to the content cluster to distribute documents with attributes over more nodes.
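A feed client can extract the exhausted resource, node, and values from the error messages shown above. The following is a minimal sketch: the pattern is derived only from the two example messages in this section, so it is an assumption that other messages follow the same layout. `FeedBlock` and `parse_feed_block` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeedBlock:
    resource: str    # e.g. "memory" or "attribute-address-space:<attribute path>"
    node_index: int
    host: str
    usage: float
    limit: float


# Matches "<resource> on node <index> [<host>] (<usage> > <limit>)".
_PATTERN = re.compile(
    r"resource exhaustion: (?P<resource>\S+) on node (?P<index>\d+) "
    r"\[(?P<host>[^\]]+)\] \((?P<usage>[\d.]+) > (?P<limit>[\d.]+)\)"
)


def parse_feed_block(message: str) -> Optional[FeedBlock]:
    """Return the parsed details, or None if the message does not match the assumed layout."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    return FeedBlock(
        resource=match["resource"],
        node_index=int(match["index"]),
        host=match["host"],
        usage=float(match["usage"]),
        limit=float(match["limit"]),
    )
```

Returning None for unrecognized messages lets the caller fall back to logging the raw reply instead of failing on an unexpected format.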