Vespa Metric Set

This document provides reference documentation for the Vespa metric set, including suffixes present per metric. If the suffix column contains "N/A" then the base name of the corresponding metric is used with no suffix.
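
As an illustration of the naming rule above, the following Python sketch expands one table row into the full metric names it implies. It assumes that rows are tab-separated and that a suffix is appended to the base name with a dot; the helper name `expand_metric_names` is hypothetical and not part of Vespa.

```python
from typing import List


def expand_metric_names(row: str) -> List[str]:
    """Expand one tab-separated table row into full metric names.

    Assumption: a suffix is joined to the base name with a dot.
    """
    name, _unit, suffixes, _description = row.split("\t", 3)
    if suffixes.strip() == "N/A":
        # "N/A" means the base name is used as-is, with no suffix.
        return [name]
    return [f"{name}.{suffix.strip()}" for suffix in suffixes.split(",")]


row = "cluster-controller.down.count\tnode\tlast, max\tNumber of content nodes down"
print(expand_metric_names(row))
# ['cluster-controller.down.count.last', 'cluster-controller.down.count.max']
```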

ClusterController Metrics

Name	Unit	Suffixes	Description
cluster-controller.down.count	node	last, max	Number of content nodes down
cluster-controller.initializing.count	node	last, max	Number of content nodes initializing
cluster-controller.maintenance.count	node	last, max	Number of content nodes in maintenance
cluster-controller.retired.count	node	last, max	Number of content nodes that are retired
cluster-controller.stopping.count	node	last	Number of content nodes currently stopping
cluster-controller.up.count	node	last, max	Number of content nodes up
cluster-controller.nodes-not-converged	node	max	Number of nodes not converging to the latest cluster state version
cluster-controller.stored-document-count	document	max	Total number of unique documents stored in the cluster
cluster-controller.stored-document-bytes	byte	max	Combined byte size of all unique documents stored in the cluster (not including replication)
cluster-controller.cluster-buckets-out-of-sync-ratio	fraction	max	Ratio of buckets in the cluster currently in need of syncing
cluster-controller.busy-tick-time-ms	millisecond	count, last, max, sum	Time busy
cluster-controller.idle-tick-time-ms	millisecond	count, last, max, sum	Time idle
cluster-controller.work-ms	millisecond	count, last, sum	Time used for actual work
cluster-controller.is-master	binary	last, max	1 if this cluster controller is currently the master, or 0 if not
cluster-controller.remote-task-queue.size	operation	last	Number of remote tasks queued
cluster-controller.resource_usage.nodes_above_limit	node	last, max	The number of content nodes above resource limit, blocking feed
cluster-controller.resource_usage.max_memory_utilization	fraction	last, max	Current memory utilisation, for content node with the highest value
cluster-controller.resource_usage.max_disk_utilization	fraction	last, max	Current disk space utilisation, for content node with the highest value
cluster-controller.resource_usage.memory_limit	fraction	last, max	Memory space limit as a fraction of available memory
cluster-controller.resource_usage.disk_limit	fraction	last, max	Disk space limit as a fraction of available disk space
reindexing.progress	fraction	last, max	Re-indexing progress
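
The resource usage metrics above can be combined to estimate how far the most loaded content node is from its limit. The sketch below is only an illustration: the snapshot dictionary, its values and the helper `headroom` are hypothetical, and it assumes that a node counts as above the limit once its utilization exceeds the corresponding limit fraction.

```python
from typing import Mapping

PREFIX = "cluster-controller.resource_usage."


def headroom(snapshot: Mapping[str, float], resource: str) -> float:
    """Return limit minus highest utilization for 'memory' or 'disk'.

    A negative result means the most loaded node is above the limit
    (assumed here to be the condition that blocks feed).
    """
    utilization = snapshot[f"{PREFIX}max_{resource}_utilization.last"]
    limit = snapshot[f"{PREFIX}{resource}_limit.last"]
    return limit - utilization


# Hypothetical values, not taken from a real cluster.
snapshot = {
    PREFIX + "max_disk_utilization.last": 0.62,
    PREFIX + "disk_limit.last": 0.75,
    PREFIX + "max_memory_utilization.last": 0.81,
    PREFIX + "memory_limit.last": 0.80,
}

for resource in ("disk", "memory"):
    value = headroom(snapshot, resource)
    status = "above limit" if value < 0 else "ok"
    print(f"{resource}: headroom {value:+.2f} ({status})")
```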

Container Metrics

Name	Unit	Suffixes	Description
http.status.1xx	response	rate	Number of responses with a 1xx status
http.status.2xx	response	rate	Number of responses with a 2xx status
http.status.3xx	response	rate	Number of responses with a 3xx status
http.status.4xx	response	rate	Number of responses with a 4xx status
http.status.5xx	response	rate	Number of responses with a 5xx status
application_generation	version	N/A	The currently live application config generation (aka session id)
jdisc.gc.count	operation	average, last, max	Number of JVM garbage collections done
jdisc.gc.ms	millisecond	average, last, max	Time spent in JVM garbage collection
jdisc.jvm	version	last	JVM runtime version
jdisc.memory_mappings	operation	max	JDISC Memory mappings
jdisc.open_file_descriptors	item	max	JDISC Open file descriptors
jdisc.thread_pool.unhandled_exceptions	thread	count, last, max, min, sum	Number of exceptions thrown by tasks
jdisc.thread_pool.work_queue.capacity	thread	count, last, max, min, sum	Capacity of the task queue
jdisc.thread_pool.work_queue.size	thread	count, last, max, min, sum	Size of the task queue
jdisc.thread_pool.rejected_tasks	thread	count, last, max, min, sum	Number of tasks rejected by the thread pool
jdisc.thread_pool.size	thread	count, last, max, min, sum	Size of the thread pool
jdisc.thread_pool.max_allowed_size	thread	count, last, max, min, sum	The maximum allowed number of threads in the pool
jdisc.thread_pool.active_threads	thread	count, last, max, min, sum	Number of threads that are active
jdisc.deactivated_containers.total	item	last, sum	JDISC Deactivated container instances
jdisc.deactivated_containers.with_retained_refs.last	item	last	JDISC Deactivated container nodes with retained refs
jdisc.application.failed_component_graphs	item	rate	JDISC Application failed component graphs
jdisc.application.component_graph.creation_time_millis	millisecond	last	JDISC Application component graph creation time
jdisc.application.component_graph.reconfigurations	item	rate	JDISC Application component graph reconfigurations
jdisc.singleton.is_active	item	last, max, min	JDISC Singleton is active
jdisc.singleton.activation.count	operation	last	JDISC Singleton activations
jdisc.singleton.activation.failure.count	operation	last	JDISC Singleton activation failures
jdisc.singleton.activation.millis	millisecond	last	JDISC Singleton activation time
jdisc.singleton.deactivation.count	operation	last	JDISC Singleton deactivations
jdisc.singleton.deactivation.failure.count	operation	last	JDISC Singleton deactivation failures
jdisc.singleton.deactivation.millis	millisecond	last	JDISC Singleton deactivation time
jdisc.http.ssl.handshake.failure.missing_client_cert	operation	rate	JDISC HTTP SSL Handshake failures due to missing client certificate
jdisc.http.ssl.handshake.failure.expired_client_cert	operation	rate	JDISC HTTP SSL Handshake failures due to expired client certificate
jdisc.http.ssl.handshake.failure.invalid_client_cert	operation	rate	JDISC HTTP SSL Handshake failures due to invalid client certificate
jdisc.http.ssl.handshake.failure.incompatible_protocols	operation	rate	JDISC HTTP SSL Handshake failures due to incompatible protocols
jdisc.http.ssl.handshake.failure.incompatible_chifers	operation	rate	JDISC HTTP SSL Handshake failures due to incompatible ciphers
jdisc.http.ssl.handshake.failure.connection_closed	operation	rate	JDISC HTTP SSL Handshake failures due to connection closed
jdisc.http.ssl.handshake.failure.unknown	operation	rate	JDISC HTTP SSL Handshake failures for unknown reason
jdisc.http.request.prematurely_closed	request	rate	HTTP requests prematurely closed
jdisc.http.request.requests_per_connection	request	average, count, max, min, sum	HTTP requests per connection
jdisc.http.request.uri_length	byte	count, max, sum	HTTP URI length
jdisc.http.request.content_size	byte	count, max, sum	HTTP request content size
jdisc.http.requests	request	count, rate	HTTP requests
jdisc.http.filter.rule.blocked_requests	request	rate	Number of requests blocked by filter
jdisc.http.filter.rule.allowed_requests	request	rate	Number of requests allowed by filter
jdisc.http.filtering.request.handled	request	rate	Number of filtering requests handled
jdisc.http.filtering.request.unhandled	request	rate	Number of filtering requests unhandled
jdisc.http.filtering.response.handled	request	rate	Number of filtering responses handled
jdisc.http.filtering.response.unhandled	request	rate	Number of filtering responses unhandled
jdisc.http.handler.unhandled_exceptions	request	rate	Number of unhandled exceptions in handler
jdisc.tls.capability_checks.succeeded	operation	rate	Number of TLS capability checks succeeded
jdisc.tls.capability_checks.failed	operation	rate	Number of TLS capability checks failed
jdisc.http.jetty.threadpool.thread.max	thread	count, last, max, min, sum	Configured maximum number of threads
jdisc.http.jetty.threadpool.thread.min	thread	count, last, max, min, sum	Configured minimum number of threads
jdisc.http.jetty.threadpool.thread.reserved	thread	count, last, max, min, sum	Configured number of reserved threads or -1 for heuristic
jdisc.http.jetty.threadpool.thread.busy	thread	count, last, max, min, sum	Number of threads executing internal and transient jobs
jdisc.http.jetty.threadpool.thread.total	thread	count, last, max, min, sum	Current number of threads
jdisc.http.jetty.threadpool.queue.size	thread	count, last, max, min, sum	Current size of the job queue
jdisc.http.jetty.http_compliance.violation	failure	rate	Number of HTTP compliance violations
serverNumOpenConnections	connection	average, last, max	The number of currently open connections
serverNumConnections	connection	average, last, max	The total number of connections opened
serverBytesReceived	byte	count, sum	The number of bytes received by the server
serverBytesSent	byte	count, sum	The number of bytes sent from the server
handled.requests	operation	count	The number of requests handled per metrics snapshot
handled.latency	millisecond	count, max, sum	The time used for requests during this metrics snapshot
httpapi_latency	millisecond	count, max, sum	Duration for requests to the HTTP document APIs
httpapi_pending	operation	count, max, sum	Document operations pending execution
httpapi_num_operations	operation	rate	Total number of document operations performed
httpapi_num_updates	operation	rate	Document update operations performed
httpapi_num_removes	operation	rate	Document remove operations performed
httpapi_num_puts	operation	rate	Document put operations performed
httpapi_succeeded	operation	rate	Document operations that succeeded
httpapi_failed	operation	rate	Document operations that failed
httpapi_parse_error	operation	rate	Document operations that failed due to document parse errors
httpapi_condition_not_met	operation	rate	Document operations not applied due to condition not met
httpapi_not_found	operation	rate	Document operations not applied due to document not found
httpapi_failed_unknown	operation	rate	Document operations failed by unknown cause
httpapi_failed_timeout	operation	rate	Document operations failed by timeout
httpapi_failed_insufficient_storage	operation	rate	Document operations failed by insufficient storage
httpapi_queued_operations	operation	last	Document operations queued for execution in /document/v1 API handler
httpapi_queued_bytes	byte	last	Total operation bytes queued for execution in /document/v1 API handler
httpapi_queued_age	second	last	Age in seconds of the oldest operation in the queue for /document/v1 API handler
httpapi_mbus_window_size	operation	last	The window size of Messagebus's dynamic throttle policy for /document/v1 API handler
mem.heap.total	byte	average	Total available heap memory
mem.heap.free	byte	average	Free heap memory
mem.heap.used	byte	average, max	Currently used heap memory
mem.direct.total	byte	average	Total available direct memory
mem.direct.free	byte	average	Currently free direct memory
mem.direct.used	byte	average, max	Direct memory currently used
mem.direct.count	byte	max	Number of direct memory allocations
mem.native.total	byte	average	Total available native memory
mem.native.free	byte	average	Currently free native memory
mem.native.used	byte	average	Native memory currently used
athenz-tenant-cert.expiry.seconds	second	last, max, min	Time remaining until Athenz tenant certificate expires
container-iam-role.expiry.seconds	second	N/A	Time remaining until IAM role expires
peak_qps	query_per_second	max	The highest number of queries per second (QPS) observed during this metrics snapshot
search_connections	connection	count, max, sum	Number of search connections
feed.operations	operation	rate	Number of document feed operations
feed.latency	millisecond	count, max, sum	Feed latency
feed.http-requests	operation	count, rate	Feed HTTP requests
queries	operation	rate	Query volume
query_container_latency	millisecond	count, max, sum	The query execution time consumed in the container
query_latency	millisecond	count, max, sum	The overall query latency as seen by the container
query_timeout	millisecond	count, max, min, sum	The amount of time allowed for query execution, from the client
failed_queries	operation	rate	The number of failed queries
degraded_queries	operation	rate	The number of degraded queries, e.g. due to some content nodes not responding in time
hits_per_query	hit_per_query	count, max, sum	The number of hits returned
query_hit_offset	hit	count, max, sum	The offset for hits returned
documents_covered	document	count	The combined number of documents considered during query evaluation
documents_total	document	count	The number of documents to be evaluated if all requests had been fully executed
documents_target_total	document	count	The target number of total documents to be evaluated when all data is in sync
jdisc.render.latency	nanosecond	average, count, last, max, min, sum	The time used by the container to render responses
query_item_count	item	count, max, sum	The number of query items (terms, phrases, etc.)
docproc.proctime	millisecond	count, max, sum	Time spent processing document
docproc.documents	document	count, max, min, sum	Number of processed documents
totalhits_per_query	hit_per_query	count, max, sum	The total number of documents found to match queries
empty_results	operation	rate	Number of queries matching no documents
requestsOverQuota	operation	count, rate	The number of requests rejected due to exceeding quota
relevance.at_1	score	count, sum	The relevance of hit number 1
relevance.at_3	score	count, sum	The relevance of hit number 3
relevance.at_10	score	count, sum	The relevance of hit number 10
error.timeout	operation	rate	Requests that timed out
error.backends_oos	operation	rate	Requests that failed due to no available backend nodes
error.plugin_failure	operation	rate	Requests that failed due to plugin failure
error.backend_communication_error	operation	rate	Requests that failed due to backend communication error
error.empty_document_summaries	operation	rate	Requests that failed due to missing document summaries
error.invalid_query_parameter	operation	rate	Requests that failed due to invalid query parameters
error.internal_server_error	operation	rate	Requests that failed due to internal server error
error.misconfigured_server	operation	rate	Requests that failed due to misconfigured server
error.invalid_query_transformation	operation	rate	Requests that failed due to invalid query transformation
error.results_with_errors	operation	rate	The number of queries with error payload
error.unspecified	operation	rate	Requests that failed for an unspecified reason
error.unhandled_exception	operation	rate	Requests that failed due to an unhandled exception
serverRejectedRequests	operation	count, rate	Deprecated. Use jdisc.thread_pool.rejected_tasks instead.
serverThreadPoolSize	thread	last, max	Deprecated. Use jdisc.thread_pool.size instead.
serverActiveThreads	thread	count, last, max, min, sum	Deprecated. Use jdisc.thread_pool.active_threads instead.
jrt.transport.tls-certificate-verification-failures	failure	N/A	TLS certificate verification failures
jrt.transport.peer-authorization-failures	failure	N/A	TLS peer authorization failures
jrt.transport.server.tls-connections-established	connection	N/A	TLS server connections established
jrt.transport.client.tls-connections-established	connection	N/A	TLS client connections established
jrt.transport.server.unencrypted-connections-established	connection	N/A	Unencrypted server connections established
jrt.transport.client.unencrypted-connections-established	connection	N/A	Unencrypted client connections established
embedder.latency	millisecond	count, max, sum	Time spent creating an embedding
embedder.sequence_length	byte	count, max, sum	Size of sequence produced by tokenizer
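
Many of the metrics above expose `count` and `sum` suffixes but no `average`. A mean for the snapshot period can be derived from them, as in the sketch below. It assumes that `sum` is the total of all observed values, that `count` is the number of observations in the same snapshot, and that suffixes are appended with a dot; the snapshot values and the helper `mean_from_snapshot` are hypothetical.

```python
from typing import Mapping, Optional


def mean_from_snapshot(snapshot: Mapping[str, float], base_name: str) -> Optional[float]:
    """Derive the mean of a metric from its .sum and .count values.

    Returns None when nothing was observed, to avoid dividing by zero.
    """
    count = snapshot.get(f"{base_name}.count", 0)
    if not count:
        return None
    return snapshot[f"{base_name}.sum"] / count


# Hypothetical snapshot values for illustration only.
snapshot = {
    "query_latency.sum": 1250.0,
    "query_latency.count": 50,
    "query_latency.max": 180.0,
}

print(mean_from_snapshot(snapshot, "query_latency"))  # 25.0
print(mean_from_snapshot(snapshot, "feed.latency"))   # None (no observations)
```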

Distributor Metrics

Name	Unit	Suffixes	Description
vds.idealstate.buckets_rechecking	bucket	average	The number of buckets that we are rechecking for ideal state operations
vds.idealstate.idealstate_diff	bucket	average	A number representing the current difference from the ideal state. This is a number that decreases steadily as the system is getting closer to the ideal state
vds.idealstate.buckets_toofewcopies	bucket	average	The number of buckets the distributor controls that have less than the desired redundancy
vds.idealstate.buckets_toomanycopies	bucket	average	The number of buckets the distributor controls that have more than the desired redundancy
vds.idealstate.buckets	bucket	average	The number of buckets the distributor controls
vds.idealstate.buckets_notrusted	bucket	average	The number of buckets that have no trusted copies.
vds.idealstate.bucket_replicas_moving_out	bucket	average	Bucket replicas that should be moved out, e.g. retirement case or node added to cluster that has higher ideal state priority.
vds.idealstate.bucket_replicas_copying_out	bucket	average	Bucket replicas that should be copied out, e.g. node is in ideal state but might have to provide data to other nodes in a merge
vds.idealstate.bucket_replicas_copying_in	bucket	average	Bucket replicas that should be copied in, e.g. node does not have a replica for a bucket that it is in ideal state for
vds.idealstate.bucket_replicas_syncing	bucket	average	Bucket replicas that need syncing due to mismatching metadata
vds.idealstate.max_observed_time_since_last_gc_sec	second	average	Maximum time (in seconds) since GC was last successfully run for a bucket. Aggregated max value across all buckets on the distributor.
vds.idealstate.delete_bucket.done_ok	operation	rate	The number of operations successfully performed
vds.idealstate.delete_bucket.done_failed	operation	rate	The number of operations that failed
vds.idealstate.delete_bucket.pending	operation	average	The number of operations pending
vds.idealstate.merge_bucket.done_ok	operation	rate	The number of operations successfully performed
vds.idealstate.merge_bucket.done_failed	operation	rate	The number of operations that failed
vds.idealstate.merge_bucket.pending	operation	average	The number of operations pending
vds.idealstate.merge_bucket.blocked	operation	rate	The number of operations blocked by blocking operation starter
vds.idealstate.merge_bucket.throttled	operation	rate	The number of operations throttled by throttling operation starter
vds.idealstate.merge_bucket.source_only_copy_changed	operation	rate	The number of merge operations where source-only copy changed
vds.idealstate.merge_bucket.source_only_copy_delete_blocked	operation	rate	The number of merge operations where delete of unchanged source-only copies was blocked
vds.idealstate.merge_bucket.source_only_copy_delete_failed	operation	rate	The number of merge operations where delete of unchanged source-only copies failed
vds.idealstate.split_bucket.done_ok	operation	rate	The number of operations successfully performed
vds.idealstate.split_bucket.done_failed	operation	rate	The number of operations that failed
vds.idealstate.split_bucket.pending	operation	average	The number of operations pending
vds.idealstate.join_bucket.done_ok	operation	rate	The number of operations successfully performed
vds.idealstate.join_bucket.done_failed	operation	rate	The number of operations that failed
vds.idealstate.join_bucket.pending	operation	average	The number of operations pending
vds.idealstate.garbage_collection.done_ok	operation	rate	The number of operations successfully performed
vds.idealstate.garbage_collection.done_failed	operation	rate	The number of operations that failed
vds.idealstate.garbage_collection.pending	operation	average	The number of operations pending
vds.idealstate.garbage_collection.documents_removed	document	count, rate	Number of documents removed by GC operations
vds.distributor.puts.latency	millisecond	count, max, sum	The latency of put operations
vds.distributor.puts.ok	operation	rate	The number of successful put operations performed
vds.distributor.puts.failures.total	operation	rate	Sum of all failures
vds.distributor.puts.failures.notfound	operation	rate	The number of operations that failed because the document did not exist
vds.distributor.puts.failures.test_and_set_failed	operation	rate	The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document
vds.distributor.puts.failures.concurrent_mutations	operation	rate	The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID
vds.distributor.puts.failures.notconnected	operation	rate	The number of operations discarded because there were no available storage nodes to send to
vds.distributor.puts.failures.notready	operation	rate	The number of operations discarded because distributor was not ready
vds.distributor.puts.failures.wrongdistributor	operation	rate	The number of operations discarded because they were sent to the wrong distributor
vds.distributor.puts.failures.safe_time_not_reached	operation	rate	The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed
vds.distributor.puts.failures.storagefailure	operation	rate	The number of operations that failed in storage
vds.distributor.puts.failures.timeout	operation	rate	The number of operations that failed because the operation timed out towards storage
vds.distributor.puts.failures.busy	operation	rate	The number of messages from storage that failed because the storage node was busy
vds.distributor.puts.failures.inconsistent_bucket	operation	rate	The number of operations failed due to buckets being in an inconsistent state or not found
vds.distributor.removes.latency	millisecond	count, max, sum	The latency of remove operations
vds.distributor.removes.ok	operation	rate	The number of successful remove operations performed
vds.distributor.removes.failures.total	operation	rate	Sum of all failures
vds.distributor.removes.failures.notfound	operation	rate	The number of operations that failed because the document did not exist
vds.distributor.removes.failures.test_and_set_failed	operation	rate	The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document
vds.distributor.removes.failures.concurrent_mutations	operation	rate	The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID
vds.distributor.updates.latency	millisecond	count, max, sum	The latency of update operations
vds.distributor.updates.ok	operation	rate	The number of successful update operations performed
vds.distributor.updates.failures.total	operation	rate	Sum of all failures
vds.distributor.updates.failures.notfound	operation	rate	The number of operations that failed because the document did not exist
vds.distributor.updates.failures.test_and_set_failed	operation	rate	The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document
vds.distributor.updates.failures.concurrent_mutations	operation	rate	The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID
vds.distributor.updates.diverging_timestamp_updates	operation	rate	Number of updates that report they were performed against divergent version timestamps on different replicas
vds.distributor.removelocations.ok	operation	rate	The number of successful removelocation operations performed
vds.distributor.removelocations.failures.total	operation	rate	Sum of all failures
vds.distributor.gets.latency	millisecond	count, max, sum	The latency of get operations
vds.distributor.gets.ok	operation	rate	The number of successful get operations performed
vds.distributor.gets.failures.total	operation	rate	Sum of all failures
vds.distributor.gets.failures.notfound	operation	rate	The number of operations that failed because the document did not exist
vds.distributor.visitor.latency	millisecond	count, max, sum	The latency of visitor operations
vds.distributor.visitor.ok	operation	rate	The number of successful visitor operations performed
vds.distributor.visitor.failures.total	operation	rate	Sum of all failures
vds.distributor.visitor.failures.notready	operation	rate	The number of operations discarded because distributor was not ready
vds.distributor.visitor.failures.notconnected	operation	rate	The number of operations discarded because there were no available storage nodes to send to
vds.distributor.visitor.failures.wrongdistributor	operation	rate	The number of operations discarded because they were sent to the wrong distributor
vds.distributor.visitor.failures.safe_time_not_reached	operation	rate	The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed
vds.distributor.visitor.failures.storagefailure	operation	rate	The number of operations that failed in storage
vds.distributor.visitor.failures.timeout	operation	rate	The number of operations that failed because the operation timed out towards storage
vds.distributor.visitor.failures.busy	operation	rate	The number of messages from storage that failed because the storage node was busy
vds.distributor.visitor.failures.inconsistent_bucket	operation	rate	The number of operations failed due to buckets being in an inconsistent state or not found
vds.distributor.visitor.failures.notfound	operation	rate	The number of operations that failed because the document did not exist
vds.distributor.docsstored	document	average	Number of documents stored in all buckets controlled by this distributor
vds.distributor.bytesstored	byte	average	Number of bytes stored in all buckets controlled by this distributor
vds.bouncer.clock_skew_aborts	operation	count	Number of client operations that were aborted due to clock skew between sender and receiver exceeding acceptable range

Logd Metrics

Name	Unit	Suffixes	Description
logd.processed.lines	item	count	Number of log lines processed

NodeAdmin Metrics

Name	Unit	Suffixes	Description
endpoint.certificate.expiry.seconds	second	N/A	Time until node endpoint certificate expires
node-certificate.expiry.seconds	second	N/A	Time until node certificate expires

SearchNode Metrics

Name	Unit	Suffixes	Description
content.proton.config.generation	version	last	The oldest config generation used by this search node
content.proton.documentdb.documents.total	document	last, max	The total number of documents in this document db (ready + not-ready)
content.proton.documentdb.documents.ready	document	last, max	The number of ready documents in this document db
content.proton.documentdb.documents.active	document	last, max	The number of active / searchable documents in this document db
content.proton.documentdb.documents.removed	document	last, max	The number of removed documents in this document db
content.proton.documentdb.index.docs_in_memory	document	last, max	Number of documents in memory index
content.proton.documentdb.disk_usage	byte	last	The total disk usage (in bytes) for this document db
content.proton.documentdb.memory_usage.allocated_bytes	byte	max	The number of allocated bytes
content.proton.documentdb.heart_beat_age	second	last, min	How long ago (in seconds) the heartbeat maintenance job was run
content.proton.docsum.docs	document	rate	Total docsums returned
content.proton.docsum.latency	millisecond	count, max, sum	Docsum request latency
content.proton.search_protocol.query.latency	second	count, max, sum	Query request latency (seconds)
content.proton.search_protocol.query.request_size	byte	count, max, sum	Query request size (network bytes)
content.proton.search_protocol.query.reply_size	byte	count, max, sum	Query reply size (network bytes)
content.proton.search_protocol.docsum.latency	second	average, count, max, sum	Docsum request latency (seconds)
content.proton.search_protocol.docsum.request_size	byte	count, max, sum	Docsum request size (network bytes)
content.proton.search_protocol.docsum.reply_size	byte	count, max, sum	Docsum reply size (network bytes)
content.proton.search_protocol.docsum.requested_documents	document	count, max, sum	Total requested document summaries
content.proton.executor.proton.queuesize	task	count, max, sum	Size of executor proton task queue
content.proton.executor.proton.accepted	task	rate	Number of accepted executor proton tasks
content.proton.executor.proton.wakeups	wakeup	rate	Number of times an executor proton worker thread has been woken up
content.proton.executor.proton.utilization	fraction	count, max, sum	Ratio of time the executor proton worker threads have been active
content.proton.executor.flush.queuesize	task	count, max, sum	Size of executor flush task queue
content.proton.executor.flush.accepted	task	rate	Number of accepted executor flush tasks
content.proton.executor.flush.wakeups	wakeup	rate	Number of times an executor flush worker thread has been woken up
content.proton.executor.flush.utilization	fraction	count, max, sum	Ratio of time the executor flush worker threads have been active
content.proton.executor.match.queuesize	task	count, max, sum	Size of executor match task queue
content.proton.executor.match.accepted	task	rate	Number of accepted executor match tasks
content.proton.executor.match.wakeups	wakeup	rate	Number of times an executor match worker thread has been woken up
content.proton.executor.match.utilization	fraction	count, max, sum	Ratio of time the executor match worker threads have been active
content.proton.executor.docsum.queuesize	task	count, max, sum	Size of executor docsum task queue
content.proton.executor.docsum.accepted	task	rate	Number of accepted executor docsum tasks
content.proton.executor.docsum.wakeups	wakeup	rate	Number of times an executor docsum worker thread has been woken up
content.proton.executor.docsum.utilization	fraction	count, max, sum	Ratio of time the executor docsum worker threads have been active
content.proton.executor.shared.queuesize	task	count, max, sum	Size of executor shared task queue
content.proton.executor.shared.accepted	task	rate	Number of accepted executor shared tasks
content.proton.executor.shared.wakeups	wakeup	rate	Number of times an executor shared worker thread has been woken up
content.proton.executor.shared.utilization	fraction	count, max, sum	Ratio of time the executor shared worker threads have been active
content.proton.executor.warmup.queuesize	task	count, max, sum	Size of executor warmup task queue
content.proton.executor.warmup.accepted	task	rate	Number of accepted executor warmup tasks
content.proton.executor.warmup.wakeups	wakeup	rate	Number of times a warmup executor worker thread has been woken up
content.proton.executor.warmup.utilization	fraction	count, max, sum	Ratio of time the executor warmup worker threads have been active
content.proton.executor.field_writer.queuesize	task	count, max, sum	Size of executor field writer task queue
content.proton.executor.field_writer.accepted	task	rate	Number of accepted executor field writer tasks
content.proton.executor.field_writer.wakeups	wakeup	rate	Number of times an executor field writer worker thread has been woken up
content.proton.executor.field_writer.utilization	fraction	count, max, sum	Ratio of time the executor field writer worker threads have been active
content.proton.executor.field_writer.saturation	fraction	count, max, sum	Ratio indicating the max saturation of underlying worker threads. A higher saturation than utilization indicates a bottleneck in one of the worker threads.
content.proton.documentdb.job.total	fraction	average	The job load average total of all job metrics
content.proton.documentdb.job.attribute_flush	fraction	average	Flushing of attribute vector(s) to disk
content.proton.documentdb.job.memory_index_flush	fraction	average	Flushing of memory index to disk
content.proton.documentdb.job.disk_index_fusion	fraction	average	Fusion of disk indexes
content.proton.documentdb.job.document_store_flush	fraction	average	Flushing of document store to disk
content.proton.documentdb.job.document_store_compact	fraction	average	Compaction of document store on disk
content.proton.documentdb.job.bucket_move	fraction	average	Moving of buckets between 'ready' and 'notready' sub databases
content.proton.documentdb.job.lid_space_compact	fraction	average	Compaction of lid space in document meta store and attribute vectors
content.proton.documentdb.job.removed_documents_prune	fraction	average	Pruning of removed documents in 'removed' sub database
content.proton.documentdb.threading_service.master.queuesize	task	count, max, sum	Size of threading service master task queue
content.proton.documentdb.threading_service.master.accepted	task	rate	Number of accepted threading service master tasks
content.proton.documentdb.threading_service.master.wakeups	wakeup	rate	Number of times a threading service master worker thread has been woken up
content.proton.documentdb.threading_service.master.utilization	fraction	count, max, sum	Ratio of time the threading service master worker threads have been active
content.proton.documentdb.threading_service.index.queuesize	task	count, max, sum	Size of threading service index task queue
content.proton.documentdb.threading_service.index.accepted	task	rate	Number of accepted threading service index tasks
content.proton.documentdb.threading_service.index.wakeups	wakeup	rate	Number of times a threading service index worker thread has been woken up
content.proton.documentdb.threading_service.index.utilization	fraction	count, max, sum	Ratio of time the threading service index worker threads have been active
content.proton.documentdb.threading_service.summary.queuesize	task	count, max, sum	Size of threading service summary task queue
content.proton.documentdb.threading_service.summary.accepted	task	rate	Number of accepted threading service summary tasks
content.proton.documentdb.threading_service.summary.wakeups	wakeup	rate	Number of times a threading service summary worker thread has been woken up
content.proton.documentdb.threading_service.summary.utilization	fraction	count, max, sum	Ratio of time the threading service summary worker threads have been active
content.proton.documentdb.ready.lid_space.lid_bloat_factor	fraction	average	The bloat factor of this lid space, indicating the total amount of holes in the allocated lid space ((lid_limit - used_lids) / lid_limit)
content.proton.documentdb.ready.lid_space.lid_fragmentation_factor	fraction	average	The fragmentation factor of this lid space, indicating the amount of holes in the currently used part of the lid space ((highest_used_lid - used_lids) / highest_used_lid)
content.proton.documentdb.ready.lid_space.lid_limit	documentid	last, max	The size of the allocated lid space
content.proton.documentdb.ready.lid_space.highest_used_lid	documentid	last, max	The highest used lid
content.proton.documentdb.ready.lid_space.used_lids	documentid	last, max	The number of lids used
content.proton.documentdb.notready.lid_space.lid_bloat_factor	fraction	average	The bloat factor of this lid space, indicating the total amount of holes in the allocated lid space ((lid_limit - used_lids) / lid_limit)
content.proton.documentdb.notready.lid_space.lid_fragmentation_factor	fraction	average	The fragmentation factor of this lid space, indicating the amount of holes in the currently used part of the lid space ((highest_used_lid - used_lids) / highest_used_lid)
content.proton.documentdb.notready.lid_space.lid_limit	documentid	last, max	The size of the allocated lid space
content.proton.documentdb.notready.lid_space.highest_used_lid	documentid	last, max	The highest used lid
content.proton.documentdb.notready.lid_space.used_lids	documentid	last, max	The number of lids used
content.proton.documentdb.removed.lid_space.lid_bloat_factor	fraction	average	The bloat factor of this lid space, indicating the total amount of holes in the allocated lid space ((lid_limit - used_lids) / lid_limit)
content.proton.documentdb.removed.lid_space.lid_fragmentation_factor	fraction	average	The fragmentation factor of this lid space, indicating the amount of holes in the currently used part of the lid space ((highest_used_lid - used_lids) / highest_used_lid)
content.proton.documentdb.removed.lid_space.lid_limit	documentid	last, max	The size of the allocated lid space
content.proton.documentdb.removed.lid_space.highest_used_lid	documentid	last, max	The highest used lid
content.proton.documentdb.removed.lid_space.used_lids	documentid	last, max	The number of lids used
content.proton.documentdb.bucket_move.buckets_pending	bucket	last, max, sum	The number of buckets left to move
content.proton.resource_usage.disk	fraction	average	The relative amount of disk used by this content node (transient usage not included, value in the range [0, 1]). Same value as reported to the cluster controller
content.proton.resource_usage.disk_usage.total	fraction	max	The total relative amount of disk used by this content node (value in the range [0, 1])
content.proton.resource_usage.disk_usage.total_utilization	fraction	max	The relative amount of disk used compared to the content node disk resource limit
content.proton.resource_usage.disk_usage.transient	fraction	max	The relative amount of transient disk used by this content node (value in the range [0, 1])
content.proton.resource_usage.memory	fraction	average	The relative amount of memory used by this content node (transient usage not included, value in the range [0, 1]). Same value as reported to the cluster controller
content.proton.resource_usage.memory_usage.total	fraction	max	The total relative amount of memory used by this content node (value in the range [0, 1])
content.proton.resource_usage.memory_usage.total_utilization	fraction	max	The relative amount of memory used compared to the content node memory resource limit
content.proton.resource_usage.memory_usage.transient	fraction	max	The relative amount of transient memory used by this content node (value in the range [0, 1])
content.proton.resource_usage.memory_mappings	file	max	The number of memory mapped files
content.proton.resource_usage.open_file_descriptors	file	max	The number of open files
content.proton.resource_usage.feeding_blocked	binary	last, max	Whether feeding is blocked due to resource limits being reached (value is either 0 or 1)
content.proton.resource_usage.malloc_arena	byte	max	Size of malloc arena
content.proton.documentdb.attribute.resource_usage.address_space	fraction	max	The max relative address space used among components in all attribute vectors in this document db (value in the range [0, 1])
content.proton.documentdb.attribute.resource_usage.feeding_blocked	binary	max	Whether feeding is blocked due to attribute resource limits being reached (value is either 0 or 1)
content.proton.resource_usage.cpu_util.setup	fraction	count, max, sum	CPU used by system init and (re-)configuration
content.proton.resource_usage.cpu_util.read	fraction	count, max, sum	CPU used by reading data from the system
content.proton.resource_usage.cpu_util.write	fraction	count, max, sum	CPU used by writing data to the system
content.proton.resource_usage.cpu_util.compact	fraction	count, max, sum	CPU used by internal data re-structuring
content.proton.resource_usage.cpu_util.other	fraction	count, max, sum	CPU used by work not classified as a specific category
content.proton.transactionlog.entries	record	average	The current number of entries in the transaction log
content.proton.transactionlog.disk_usage	byte	average	The disk usage (in bytes) of the transaction log
content.proton.transactionlog.replay_time	second	last, max	The replay time (in seconds) of the transaction log during start-up
content.proton.documentdb.ready.document_store.disk_usage	byte	average	Disk space usage in bytes
content.proton.documentdb.ready.document_store.disk_bloat	byte	average	Disk space bloat in bytes
content.proton.documentdb.ready.document_store.max_bucket_spread	fraction	average	Max bucket spread in underlying files (sum(unique buckets in each chunk)/unique buckets in file)
content.proton.documentdb.ready.document_store.memory_usage.allocated_bytes	byte	average	The number of allocated bytes
content.proton.documentdb.ready.document_store.memory_usage.used_bytes	byte	average	The number of used bytes (<= allocated_bytes)
content.proton.documentdb.ready.document_store.memory_usage.onhold_bytes	byte	average	The number of bytes on hold
content.proton.documentdb.notready.document_store.disk_usage	byte	average	Disk space usage in bytes
content.proton.documentdb.notready.document_store.disk_bloat	byte	average	Disk space bloat in bytes
content.proton.documentdb.notready.document_store.max_bucket_spread	fraction	average	Max bucket spread in underlying files (sum(unique buckets in each chunk)/unique buckets in file)
content.proton.documentdb.notready.document_store.memory_usage.allocated_bytes	byte	average	The number of allocated bytes
content.proton.documentdb.notready.document_store.memory_usage.used_bytes	byte	average	The number of used bytes (<= allocated_bytes)
content.proton.documentdb.notready.document_store.memory_usage.dead_bytes	byte	average	The number of dead bytes (<= used_bytes)
content.proton.documentdb.notready.document_store.memory_usage.onhold_bytes	byte	average	The number of bytes on hold
content.proton.documentdb.removed.document_store.disk_usage	byte	average	Disk space usage in bytes
content.proton.documentdb.removed.document_store.disk_bloat	byte	average	Disk space bloat in bytes
content.proton.documentdb.removed.document_store.max_bucket_spread	fraction	average	Max bucket spread in underlying files (sum(unique buckets in each chunk)/unique buckets in file)
content.proton.documentdb.removed.document_store.memory_usage.allocated_bytes	byte	average	The number of allocated bytes
content.proton.documentdb.removed.document_store.memory_usage.used_bytes	byte	average	The number of used bytes (<= allocated_bytes)
content.proton.documentdb.removed.document_store.memory_usage.dead_bytes	byte	average	The number of dead bytes (<= used_bytes)
content.proton.documentdb.removed.document_store.memory_usage.onhold_bytes	byte	average	The number of bytes on hold
content.proton.documentdb.ready.document_store.cache.memory_usage	byte	average	Memory usage of the cache (in bytes)
content.proton.documentdb.ready.document_store.cache.hit_rate	fraction	average	Rate of hits in the cache compared to number of lookups
content.proton.documentdb.ready.document_store.cache.lookups	operation	rate	Number of lookups in the cache (hits + misses)
content.proton.documentdb.ready.document_store.cache.invalidations	operation	rate	Number of invalidations (erased elements) in the cache.
content.proton.documentdb.notready.document_store.cache.memory_usage	byte	average	Memory usage of the cache (in bytes)
content.proton.documentdb.notready.document_store.cache.hit_rate	fraction	average	Rate of hits in the cache compared to number of lookups
content.proton.documentdb.notready.document_store.cache.lookups	operation	rate	Number of lookups in the cache (hits + misses)
content.proton.documentdb.notready.document_store.cache.invalidations	operation	rate	Number of invalidations (erased elements) in the cache.
content.proton.documentdb.ready.attribute.memory_usage.allocated_bytes	byte	average	The number of allocated bytes
content.proton.documentdb.ready.attribute.memory_usage.used_bytes	byte	average	The number of used bytes (<= allocated_bytes)
content.proton.documentdb.ready.attribute.memory_usage.dead_bytes	byte	average	The number of dead bytes (<= used_bytes)
content.proton.documentdb.ready.attribute.memory_usage.onhold_bytes	byte	average	The number of bytes on hold
content.proton.documentdb.ready.attribute.disk_usage	byte	average	Disk space usage (in bytes) of the flushed snapshot of this attribute for this document type
content.proton.documentdb.notready.attribute.memory_usage.allocated_bytes	byte	average	The number of allocated bytes
content.proton.documentdb.notready.attribute.memory_usage.used_bytes	byte	average	The number of used bytes (<= allocated_bytes)
content.proton.documentdb.notready.attribute.memory_usage.dead_bytes	byte	average	The number of dead bytes (<= used_bytes)
content.proton.documentdb.notready.attribute.memory_usage.onhold_bytes	byte	average	The number of bytes on hold
content.proton.index.cache.postinglist.memory_usage	byte	average	Memory usage of the cache (in bytes). Contains disk index posting list files across all document types
content.proton.index.cache.postinglist.hit_rate	fraction	average	Rate of hits in the cache compared to number of lookups. Contains disk index posting list files across all document types
content.proton.index.cache.postinglist.lookups	operation	rate	Number of lookups in the cache (hits + misses). Contains disk index posting list files across all document types
content.proton.index.cache.postinglist.invalidations	operation	rate	Number of invalidations (erased elements) in the cache. Contains disk index posting list files across all document types
content.proton.index.cache.bitvector.memory_usage	byte	average	Memory usage of the cache (in bytes). Contains disk index bitvector files across all document types
content.proton.index.cache.bitvector.hit_rate	fraction	average	Rate of hits in the cache compared to number of lookups. Contains disk index bitvector files across all document types
content.proton.index.cache.bitvector.lookups	operation	rate	Number of lookups in the cache (hits + misses). Contains disk index bitvector files across all document types
content.proton.index.cache.bitvector.invalidations	operation	rate	Number of invalidations (erased elements) in the cache. Contains disk index bitvector files across all document types
content.proton.documentdb.index.memory_usage.allocated_bytes	byte	average	The number of allocated bytes for the memory index for this document type
content.proton.documentdb.index.memory_usage.used_bytes	byte	average	The number of used bytes (<= allocated_bytes) for the memory index for this document type
content.proton.documentdb.index.memory_usage.dead_bytes	byte	average	The number of dead bytes (<= used_bytes) for the memory index for this document type
content.proton.documentdb.index.memory_usage.onhold_bytes	byte	average	The number of bytes on hold for the memory index for this document type
content.proton.documentdb.index.io.search.read_bytes	byte	count, sum	Bytes read from disk index posting list and bitvector files as part of search for this document type
content.proton.documentdb.index.io.search.cached_read_bytes	byte	count, sum	Bytes read from cached disk index posting list and bitvector files as part of search for this document type
content.proton.documentdb.ready.index.disk_usage	byte	average	Disk space usage (in bytes) of this index field in all disk indexes for this document type
content.proton.documentdb.matching.queries	query	rate	Number of queries executed
content.proton.documentdb.matching.soft_doomed_queries	query	rate	Number of queries hitting the soft timeout
content.proton.documentdb.matching.query_latency	second	count, max, sum	Total average latency (sec) when matching and ranking a query
content.proton.documentdb.matching.query_setup_time	second	count, max, sum	Average time (sec) spent setting up and tearing down queries
content.proton.documentdb.matching.docs_matched	document	count, rate	Number of documents matched
content.proton.documentdb.matching.rank_profile.queries	query	rate	Number of queries executed
content.proton.documentdb.matching.rank_profile.soft_doomed_queries	query	rate	Number of queries hitting the soft timeout
content.proton.documentdb.matching.rank_profile.soft_doom_factor	fraction	count, max, min, sum	Factor used to compute soft-timeout
content.proton.documentdb.matching.rank_profile.query_latency	second	count, max, sum	Total average latency (sec) when matching and ranking a query
content.proton.documentdb.matching.rank_profile.query_setup_time	second	count, max, sum	Average time (sec) spent setting up and tearing down queries
content.proton.documentdb.matching.rank_profile.grouping_time	second	count, max, sum	Average time (sec) spent on grouping
content.proton.documentdb.matching.rank_profile.rerank_time	second	count, max, sum	Average time (sec) spent on 2nd phase ranking
content.proton.documentdb.matching.rank_profile.docs_matched	document	count, rate	Number of documents matched
content.proton.documentdb.matching.rank_profile.limited_queries	query	rate	Number of queries limited in match phase
content.proton.documentdb.feeding.commit.operations	operation	count, max, rate, sum	Number of operations included in a commit
content.proton.documentdb.feeding.commit.latency	second	count, max, sum	Latency for commit in seconds
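
The lid space rows above define the bloat factor as (lid_limit - used_lids) / lid_limit and the fragmentation factor as (highest_used_lid - used_lids) / highest_used_lid. The following is a minimal Python sketch of that arithmetic, assuming the three raw values have already been read from the corresponding lid_limit, highest_used_lid and used_lids metrics. The function names, the handling of a zero denominator, and the sample numbers are illustrative assumptions, not part of Vespa.

```python
def lid_bloat_factor(lid_limit: int, used_lids: int) -> float:
    """Share of the allocated lid space that is holes:
    (lid_limit - used_lids) / lid_limit, as stated in the table."""
    if lid_limit == 0:
        # Assumption: report an empty lid space as having no bloat.
        return 0.0
    return (lid_limit - used_lids) / lid_limit


def lid_fragmentation_factor(highest_used_lid: int, used_lids: int) -> float:
    """Share of the currently used part of the lid space that is holes:
    (highest_used_lid - used_lids) / highest_used_lid, as stated in the table."""
    if highest_used_lid == 0:
        # Assumption: report an unused lid space as having no fragmentation.
        return 0.0
    return (highest_used_lid - used_lids) / highest_used_lid


# Hypothetical sample values for one sub database.
bloat = lid_bloat_factor(lid_limit=1000, used_lids=600)
fragmentation = lid_fragmentation_factor(highest_used_lid=800, used_lids=600)
```

With these sample values, 400 of the 1000 allocated lids are holes, and 200 of the 800 lids in the used range are holes.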

Sentinel Metrics

Name	Unit	Suffixes	Description
sentinel.restarts	restart	count	Number of service restarts done by the sentinel
sentinel.totalRestarts	restart	last, max, sum	Total number of service restarts done by the sentinel since the sentinel was started
sentinel.uptime	second	last	Time the sentinel has been running
sentinel.running	instance	count, last	Number of services the sentinel currently has running

Slobrok Metrics

Name	Unit	Suffixes	Description
slobrok.heartbeats.failed	request	count	Number of heartbeat requests failed
slobrok.missing.consensus	second	count	Number of seconds without full consensus with all other brokers

Storage Metrics

Name	Unit	Suffixes	Description
vds.datastored.alldisks.buckets	bucket	average	Number of buckets managed
vds.datastored.alldisks.docs	document	average	Number of documents stored
vds.datastored.alldisks.bytes	byte	average	Number of bytes stored
vds.visitor.allthreads.averagevisitorlifetime	millisecond	count, max, sum	Average lifetime of a visitor
vds.visitor.allthreads.averagequeuewait	millisecond	count, max, sum	Average time an operation spends in input queue.
vds.visitor.allthreads.queuesize	operation	count, max, sum	Size of input message queue.
vds.visitor.allthreads.completed	operation	rate	Number of visitors completed
vds.visitor.allthreads.created	operation	rate	Number of visitors created.
vds.visitor.allthreads.failed	operation	rate	Number of visitors failed
vds.visitor.allthreads.averagemessagesendtime	millisecond	count, max, sum	Average time it takes for messages to be sent to their target (and be replied to)
vds.visitor.allthreads.averageprocessingtime	millisecond	count, max, sum	Average time used to process visitor requests
vds.filestor.queuesize	operation	count, max, sum	Size of input message queue.
vds.filestor.averagequeuewait	millisecond	count, max, sum	Average time an operation spends in input queue.
vds.filestor.active_operations.size	operation	count, max, sum	Number of concurrent active operations
vds.filestor.active_operations.latency	millisecond	count, max, sum	Latency (in ms) for completed operations
vds.filestor.throttle_window_size	operation	count, max, sum	Current window size of the async operation throttler
vds.filestor.throttle_waiting_threads	thread	count, max, sum	Number of threads waiting to acquire a throttle token
vds.filestor.throttle_active_tokens	instance	count, max, sum	Current number of active throttle tokens
vds.filestor.allthreads.mergemetadatareadlatency	millisecond	count, max, sum	Time spent in a merge step to check metadata of current node to see what data it has.
vds.filestor.allthreads.mergedatareadlatency	millisecond	count, max, sum	Time spent in a merge step to read data other nodes need.
vds.filestor.allthreads.mergedatawritelatency	millisecond	count, max, sum	Time spent in a merge step to write the data needed by the current node.
vds.filestor.allthreads.merge_put_latency	millisecond	count, max, sum	Latency of individual puts that are part of merge operations
vds.filestor.allthreads.merge_remove_latency	millisecond	count, max, sum	Latency of individual removes that are part of merge operations
vds.filestor.allstripes.throttled_rpc_direct_dispatches	instance	rate	Number of times an RPC thread could not dispatch an async operation directly to Proton because it was disallowed by the throttle policy
vds.filestor.allstripes.throttled_persistence_thread_polls	instance	rate	Number of times a persistence thread could not immediately dispatch a queued async operation because it was disallowed by the throttle policy
vds.filestor.allstripes.timeouts_waiting_for_throttle_token	instance	rate	Number of times a persistence thread timed out waiting for an available throttle policy token
vds.filestor.allthreads.put.count	operation	rate	Number of requests processed.
vds.filestor.allthreads.put.failed	operation	rate	Number of failed requests.
vds.filestor.allthreads.put.test_and_set_failed	operation	rate	Number of operations that were skipped due to a test-and-set condition not met
vds.filestor.allthreads.put.latency	millisecond	count, max, sum	Latency of successful requests.
vds.filestor.allthreads.put.request_size	byte	count, max, sum	Size of requests, in bytes
vds.filestor.allthreads.remove.count	operation	rate	Number of requests processed.
vds.filestor.allthreads.remove.failed	operation	rate	Number of failed requests.
vds.filestor.allthreads.remove.test_and_set_failed	operation	rate	Number of operations that were skipped due to a test-and-set condition not met
vds.filestor.allthreads.remove.latency	millisecond	count, max, sum	Latency of successful requests.
vds.filestor.allthreads.remove.request_size	byte	count, max, sum	Size of requests, in bytes
vds.filestor.allthreads.get.count	operation	rate	Number of requests processed.
vds.filestor.allthreads.get.failed	operation	rate	Number of failed requests.
vds.filestor.allthreads.get.latency	millisecond	count, max, sum	Latency of successful requests.
vds.filestor.allthreads.get.request_size	byte	count, max, sum	Size of requests, in bytes
vds.filestor.allthreads.update.count	request	rate	Number of requests processed.
vds.filestor.allthreads.update.failed	request	rate	Number of failed requests.
vds.filestor.allthreads.update.test_and_set_failed	request	rate	Number of requests that were skipped due to a test-and-set condition not met
vds.filestor.allthreads.update.latency	millisecond	count, max, sum	Latency of successful requests.
vds.filestor.allthreads.update.request_size	byte	count, max, sum	Size of requests, in bytes
vds.filestor.allthreads.createiterator.count	request	rate	Number of requests processed.
vds.filestor.allthreads.createiterator.latency	millisecond	count, max, sum	Latency of successful requests.
vds.filestor.allthreads.visit.count	request	rate	Number of requests processed.
vds.filestor.allthreads.visit.latency	millisecond	count, max, sum	Latency of successful requests.
vds.filestor.allthreads.remove_location.count	request	rate	Number of requests processed.
vds.filestor.allthreads.remove_location.latency	millisecond	count, max, sum	Latency of successful requests.
vds.filestor.allthreads.splitbuckets.count	request	rate	Number of requests processed.
vds.filestor.allthreads.joinbuckets.count	request	rate	Number of requests processed.
vds.filestor.allthreads.deletebuckets.count	request	rate	Number of requests processed.
vds.filestor.allthreads.deletebuckets.failed	request	rate	Number of failed requests.
vds.filestor.allthreads.deletebuckets.latency	millisecond	count, max, sum	Latency of successful requests.
vds.filestor.allthreads.remove_by_gid.count	request	rate	Number of requests processed.
vds.filestor.allthreads.remove_by_gid.failed	request	rate	Number of failed requests.
vds.filestor.allthreads.remove_by_gid.latency	millisecond	count, max, sum	Latency of successful requests.
vds.filestor.allthreads.setbucketstates.count	request	rate	Number of requests processed.
vds.mergethrottler.averagequeuewaitingtime	millisecond	count, max, sum	Time merges spent in the throttler queue
vds.mergethrottler.queuesize	instance	count, max, sum	Length of merge queue
vds.mergethrottler.active_window_size	instance	count, max, sum	Number of merges active within the pending window size
vds.mergethrottler.estimated_merge_memory_usage	byte	count, max, sum	An estimated upper bound of the memory usage (in bytes) of the merges currently in the active window
vds.mergethrottler.bounced_due_to_back_pressure	instance	rate	Number of merges bounced due to resource exhaustion back-pressure
vds.mergethrottler.locallyexecutedmerges.ok	instance	rate	The number of successful merges for 'locallyexecutedmerges'
vds.mergethrottler.mergechains.ok	operation	rate	The number of successful merges for 'mergechains'
vds.mergethrottler.mergechains.failures.busy	operation	rate	The number of merges that failed because the storage node was busy
vds.mergethrottler.mergechains.failures.total	operation	rate	Sum of all failures
vds.server.network.tls-handshakes-failed	operation	count	Number of client or server connection attempts that failed during TLS handshaking
vds.server.network.peer-authorization-failures	failure	count	Number of TLS connection attempts failed due to bad or missing peer certificate credentials
vds.server.network.client.tls-connections-established	connection	count	Number of secure mTLS connections established
vds.server.network.server.tls-connections-established	connection	count	Number of secure mTLS connections established
vds.server.network.client.insecure-connections-established	connection	count	Number of insecure (plaintext) connections established
vds.server.network.server.insecure-connections-established	connection	count	Number of insecure (plaintext) connections established
vds.server.network.tls-connections-broken	connection	count	Number of TLS connections broken due to failures during frame encoding or decoding
vds.server.network.failed-tls-config-reloads	failure	count	Number of times background reloading of TLS config has failed
vds.server.network.rpc-capability-checks-failed	failure	count	Number of RPC operations that failed due to one or more missing capabilities
vds.server.network.status-capability-checks-failed	failure	count	Number of status page operations that failed due to one or more missing capabilities
vds.server.fnet.num-connections	connection	count	Total number of connection objects
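
Many latency and size metrics in these tables list the suffixes count, max and sum rather than an average. A mean over the reporting period can be derived as sum divided by count. The sketch below shows that calculation in Python. It assumes the values are available as a mapping keyed by base name plus ".suffix"; that key format, the helper name `mean_from_suffixes`, and the sample numbers are assumptions made for illustration, not a description of a specific Vespa API.

```python
from typing import Mapping, Optional


def mean_from_suffixes(values: Mapping[str, float], base_name: str) -> Optional[float]:
    """Return sum / count for a metric, or None when no samples were recorded."""
    count = values.get(f"{base_name}.count", 0.0)
    total = values.get(f"{base_name}.sum")
    if total is None or count <= 0:
        # No samples in this period, so a mean is undefined.
        return None
    return total / count


# Hypothetical snapshot of one metric with its three suffixes.
snapshot = {
    "vds.filestor.allthreads.put.latency.count": 200.0,
    "vds.filestor.allthreads.put.latency.sum": 500.0,
    "vds.filestor.allthreads.put.latency.max": 12.0,
}

mean_put_latency_ms = mean_from_suffixes(snapshot, "vds.filestor.allthreads.put.latency")
```

Returning None rather than 0 when count is zero keeps "no operations in this period" distinct from "operations completed with zero latency".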