
Vespa Metric Set

Clustercontroller Metrics

Name Description Unit Suffixes
cluster-controller.down.count Number of content nodes down node last
cluster-controller.initializing.count Number of content nodes initializing node last
cluster-controller.maintenance.count Number of content nodes in maintenance node last
cluster-controller.retired.count Number of content nodes that are retired node last
cluster-controller.stopping.count Number of content nodes currently stopping node last
cluster-controller.up.count Number of content nodes up node last
cluster-controller.cluster-state-change.count Number of nodes changing state node N/A
cluster-controller.busy-tick-time-ms Time busy millisecond last, max, sum, count
cluster-controller.idle-tick-time-ms Time idle millisecond last, max, sum, count
cluster-controller.work-ms Time used for actual work millisecond last, sum, count
cluster-controller.is-master 1 if this cluster controller is currently the master, or 0 if not binary last
cluster-controller.remote-task-queue.size Number of remote tasks queued operation last
cluster-controller.node-event.count Number of node events operation N/A
cluster-controller.resource_usage.nodes_above_limit The number of content nodes above resource limit, blocking feed node last, max
cluster-controller.resource_usage.max_memory_utilization Current memory utilisation, per content node fraction last, max
cluster-controller.resource_usage.max_disk_utilization Current disk space utilisation, per content node fraction last, max
cluster-controller.resource_usage.memory_limit Memory space limit as a fraction of available memory fraction last
cluster-controller.resource_usage.disk_limit Disk space limit as a fraction of available disk space fraction last
reindexing.progress Re-indexing progress fraction last
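
The resource usage metrics above can be combined to estimate how close a content cluster is to blocking feed. The following is a minimal sketch, not part of the Vespa tooling: it assumes the metric values have already been collected into a plain dictionary keyed by `<metric name>.<suffix>`, and the helper names `resource_headroom` and `feed_blocked` are hypothetical.

```python
PREFIX = "cluster-controller.resource_usage."


def resource_headroom(values):
    """Return limit minus peak utilization for memory and disk.

    A negative number means at least one content node is above the limit.
    The key layout '<metric>.<suffix>' is an assumption of this sketch.
    """
    headroom = {}
    for resource in ("memory", "disk"):
        limit = values[f"{PREFIX}{resource}_limit.last"]
        peak = values[f"{PREFIX}max_{resource}_utilization.max"]
        headroom[resource] = limit - peak
    return headroom


def feed_blocked(values):
    """True if any content node was above a resource limit in the snapshot."""
    return values[f"{PREFIX}nodes_above_limit.max"] > 0


# Made-up sample values, for illustration only.
sample = {
    PREFIX + "memory_limit.last": 0.75,
    PREFIX + "disk_limit.last": 0.75,
    PREFIX + "max_memory_utilization.max": 0.5,
    PREFIX + "max_disk_utilization.max": 0.875,
    PREFIX + "nodes_above_limit.max": 1,
}

print(resource_headroom(sample))
print(feed_blocked(sample))
```

Since all of these metrics use the unit fraction, the headroom is also a fraction of the available resource.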

Configserver Metrics

Name Description Unit Suffixes
configserver.requests Number of requests processed request count
configserver.failedRequests Number of requests that failed request count
configserver.latency Time to complete requests millisecond max, sum, count
configserver.cacheConfigElems Number of config elements in the cache item last
configserver.cacheChecksumElems Number of checksum elements in the cache item last
configserver.hosts The number of nodes being served configuration from the config server cluster node last
configserver.delayedResponses Number of delayed responses response count
configserver.sessionChangeErrors Number of session change errors session count
configserver.zkZNodes Number of ZooKeeper nodes present node last
configserver.zkAvgLatency Average latency for ZooKeeper requests millisecond last
configserver.zkMaxLatency Max latency for ZooKeeper requests millisecond last
configserver.zkConnections Number of ZooKeeper connections connection last
configserver.zkOutstandingRequests Number of ZooKeeper requests in flight request last

Container Metrics

Name Description Unit Suffixes
jrt.transport.tls-certificate-verification-failures TLS certificate verification failures failure N/A
jrt.transport.peer-authorization-failures TLS peer authorization failures failure N/A
jrt.transport.server.tls-connections-established TLS server connections established connection N/A
jrt.transport.client.tls-connections-established TLS client connections established connection N/A
jrt.transport.server.unencrypted-connections-established Unencrypted server connections established connection N/A
jrt.transport.client.unencrypted-connections-established Unencrypted client connections established connection N/A
application_generation The currently live application config generation (aka session id) version N/A
handled.requests The number of requests handled per metrics snapshot operation count
handled.latency The time used for requests during this metrics snapshot millisecond sum, count, max
serverNumOpenConnections The number of currently open connections connection max, last, average
serverNumConnections The total number of connections opened connection max, last, average
serverBytesReceived The number of bytes received by the server byte sum, count
serverBytesSent The number of bytes sent from the server byte sum, count
jdisc.thread_pool.unhandled_exceptions Number of exceptions thrown by tasks thread sum, count, last, min, max
jdisc.thread_pool.work_queue.capacity Capacity of the task queue thread sum, count, last, min, max
jdisc.thread_pool.work_queue.size Size of the task queue thread sum, count, last, min, max
jdisc.thread_pool.rejected_tasks Number of tasks rejected by the thread pool thread sum, count, last, min, max
jdisc.thread_pool.size Size of the thread pool thread sum, count, last, min, max
jdisc.thread_pool.max_allowed_size The maximum allowed number of threads in the pool thread sum, count, last, min, max
jdisc.thread_pool.active_threads Number of threads that are active thread sum, count, last, min, max
jdisc.http.jetty.threadpool.thread.max Configured maximum number of threads thread sum, count, last, min, max
jdisc.http.jetty.threadpool.thread.min Configured minimum number of threads thread sum, count, last, min, max
jdisc.http.jetty.threadpool.thread.reserved Configured number of reserved threads or -1 for heuristic thread sum, count, last, min, max
jdisc.http.jetty.threadpool.thread.busy Number of threads executing internal and transient jobs thread sum, count, last, min, max
jdisc.http.jetty.threadpool.thread.total Current number of threads thread sum, count, last, min, max
jdisc.http.jetty.threadpool.queue.size Current size of the job queue thread sum, count, last, min, max
httpapi_latency Duration for requests to the HTTP document APIs millisecond max, sum, count
httpapi_pending Document operations pending execution operation max, sum, count
httpapi_num_operations Total number of document operations performed operation rate
httpapi_num_updates Document update operations performed operation rate
httpapi_num_removes Document remove operations performed operation rate
httpapi_num_puts Document put operations performed operation rate
httpapi_succeeded Document operations that succeeded operation rate
httpapi_failed Document operations that failed operation rate
httpapi_parse_error Document operations that failed due to document parse errors operation rate
httpapi_condition_not_met Document operations not applied due to condition not met operation rate
httpapi_not_found Document operations not applied due to document not found operation rate
httpapi_failed_unknown Document operations failed by unknown cause operation rate
httpapi_failed_insufficient_storage Document operations failed by insufficient storage operation rate
httpapi_failed_timeout Document operations failed by timeout operation rate
mem.heap.total Total available heap memory byte average
mem.heap.free Free heap memory byte average
mem.heap.used Currently used heap memory byte average, max
mem.direct.total Total available direct memory byte average
mem.direct.free Currently free direct memory byte average
mem.direct.used Direct memory currently used byte average, max
mem.direct.count Number of direct memory allocations byte max
mem.native.total Total available native memory byte average
mem.native.free Currently free native memory byte average
mem.native.used Native memory currently used byte average
jdisc.memory_mappings JDISC Memory mappings operation max
jdisc.open_file_descriptors JDISC Open file descriptors item max
jdisc.gc.count Number of JVM garbage collections done operation average, max, last
jdisc.gc.ms Time spent in JVM garbage collection millisecond average, max, last
jdisc.deactivated_containers.total JDISC Deactivated container instances item last
jdisc.deactivated_containers.with_retained_refs.last JDISC Deactivated container nodes with retained refs item last
jdisc.singleton.is_active JDISC Singleton is active item last
jdisc.singleton.activation.count JDISC Singleton activations operation last
jdisc.singleton.activation.failure.count JDISC Singleton activation failures operation last
jdisc.singleton.activation.millis JDISC Singleton activation time millisecond last
jdisc.singleton.deactivation.count JDISC Singleton deactivations operation last
jdisc.singleton.deactivation.failure.count JDISC Singleton deactivation failures operation last
jdisc.singleton.deactivation.millis JDISC Singleton deactivation time millisecond last
athenz-tenant-cert.expiry.seconds Time remaining until Athenz tenant certificate expires second last
container-iam-role.expiry.seconds Time remaining until IAM role expires second N/A
http.status.1xx Number of responses with a 1xx status response rate
http.status.2xx Number of responses with a 2xx status response rate
http.status.3xx Number of responses with a 3xx status response rate
http.status.4xx Number of responses with a 4xx status response rate
http.status.5xx Number of responses with a 5xx status response rate
jdisc.http.request.prematurely_closed HTTP requests prematurely closed request rate
jdisc.http.request.requests_per_connection HTTP requests per connection request sum, count, min, max, average
jdisc.http.request.uri_length HTTP URI length byte sum, count, max
jdisc.http.request.content_size HTTP request content size byte sum, count, max
jdisc.http.requests HTTP requests request rate, count
jdisc.http.ssl.handshake.failure.missing_client_cert JDISC HTTP SSL Handshake failures due to missing client certificate operation rate
jdisc.http.ssl.handshake.failure.expired_client_cert JDISC HTTP SSL Handshake failures due to expired client certificate operation rate
jdisc.http.ssl.handshake.failure.invalid_client_cert JDISC HTTP SSL Handshake failures due to invalid client certificate operation rate
jdisc.http.ssl.handshake.failure.incompatible_protocols JDISC HTTP SSL Handshake failures due to incompatible protocols operation rate
jdisc.http.ssl.handshake.failure.incompatible_chifers JDISC HTTP SSL Handshake failures due to incompatible ciphers operation rate
jdisc.http.ssl.handshake.failure.connection_closed JDISC HTTP SSL Handshake failures due to connection closed operation rate
jdisc.http.ssl.handshake.failure.unknown JDISC HTTP SSL Handshake failures for unknown reason operation rate
jdisc.http.filter.rule.blocked_requests Number of requests blocked by filter request rate
jdisc.http.filter.rule.allowed_requests Number of requests allowed by filter request rate
jdisc.http.filtering.request.handled Number of filtering requests handled request rate
jdisc.http.filtering.request.unhandled Number of filtering requests unhandled request rate
jdisc.http.filtering.response.handled Number of filtering responses handled request rate
jdisc.http.filtering.response.unhandled Number of filtering responses unhandled request rate
jdisc.http.handler.unhandled_exceptions Number of unhandled exceptions in handler request rate
jdisc.application.failed_component_graphs JDISC Application failed component graphs item rate
jdisc.jvm JVM runtime version version last
serverRejectedRequests Deprecated. Use jdisc.thread_pool.rejected_tasks instead. operation rate, count
serverThreadPoolSize Deprecated. Use jdisc.thread_pool.size instead. thread max, last
serverActiveThreads Deprecated. Use jdisc.thread_pool.active_threads instead. thread min, max, sum, count, last
jdisc.tls.capability_checks.succeeded Number of TLS capability checks succeeded operation rate
jdisc.tls.capability_checks.failed Number of TLS capability checks failed operation rate
peak_qps The highest number of queries per second (QPS) observed in any one second of this metrics snapshot query/second max
search_connections Number of search connections connection sum, count, max
feed.latency Feed latency millisecond sum, count, max
feed.http-requests Feed HTTP requests operation count, rate
queries Query volume operation rate
query_container_latency The query execution time consumed in the container millisecond sum, count, max
query_latency The overall query latency as seen by the container millisecond sum, count, max, 95percentile, 99percentile
query_timeout The amount of time allowed for query execution, from the client millisecond sum, count, max, min, 95percentile, 99percentile
failed_queries The number of failed queries operation rate
degraded_queries The number of degraded queries, e.g. due to some content nodes not responding in time operation rate
hits_per_query The number of hits returned hit/query sum, count, max, 95percentile, 99percentile
query_hit_offset The offset for hits returned hit sum, count, max
documents_covered The combined number of documents considered during query evaluation document count
documents_total The number of documents to be evaluated if all requests had been fully executed document count
documents_target_total The target number of total documents to be evaluated when all data is in sync document count
jdisc.render.latency The time used by the container to render responses nanosecond min, max, count, sum, last, average
query_item_count The number of query items (terms, phrases, etc) item max, sum, count
totalhits_per_query The total number of documents found to match queries hit/query sum, count, max, 95percentile, 99percentile
empty_results Number of queries matching no documents operation rate
requestsOverQuota The number of requests rejected due to exceeding quota operation rate, count
docproc.proctime Time spent processing document millisecond sum, count, max
docproc.documents Number of processed documents document sum, count, max, min
relevance.at_1 The relevance of hit number 1 score sum, count
relevance.at_3 The relevance of hit number 3 score sum, count
relevance.at_10 The relevance of hit number 10 score sum, count
error.timeout Requests that timed out operation rate
error.backends_oos Requests that failed due to no available backend nodes operation rate
error.plugin_failure Requests that failed due to plugin failure operation rate
error.backend_communication_error Requests that failed due to backend communication error operation rate
error.empty_document_summaries Requests that failed due to missing document summaries operation rate
error.invalid_query_parameter Requests that failed due to invalid query parameters operation rate
error.internal_server_error Requests that failed due to internal server error operation rate
error.misconfigured_server Requests that failed due to misconfigured server operation rate
error.invalid_query_transformation Requests that failed due to invalid query transformation operation rate
error.results_with_errors The number of queries with error payload operation rate
error.unspecified Requests that failed for an unspecified reason operation rate
error.unhandled_exception Requests that failed due to an unhandled exception operation rate
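
Metrics that expose both a `sum` and a `count` suffix, such as `query_latency`, allow a mean to be derived for the snapshot period. The sketch below illustrates that arithmetic, together with a document coverage ratio built from `documents_covered` and `documents_total`. It assumes the values are available in a dictionary keyed by `<metric name>.<suffix>`; the function names are hypothetical, and reading the ratio as coverage is an interpretation of the descriptions above rather than a documented formula.

```python
def mean_from_suffixes(values, metric):
    """Mean value for the snapshot: <metric>.sum divided by <metric>.count.

    Returns None when nothing was recorded in the snapshot.
    """
    count = values.get(f"{metric}.count", 0)
    if not count:
        return None
    return values[f"{metric}.sum"] / count


def coverage(values):
    """Fraction of documents considered out of those that would have been
    evaluated if every request had been fully executed."""
    total = values.get("documents_total.count", 0)
    if not total:
        return None
    return values["documents_covered.count"] / total


# Made-up sample values, for illustration only.
sample = {
    "query_latency.sum": 1200.0,
    "query_latency.count": 48,
    "documents_covered.count": 900,
    "documents_total.count": 1000,
}

print(mean_from_suffixes(sample, "query_latency"))  # mean latency in milliseconds
print(coverage(sample))
```

Returning `None` for an empty snapshot avoids a division by zero and keeps "no traffic" distinct from "zero latency".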

Distributor Metrics

Name Description Unit Suffixes
vds.idealstate.buckets_rechecking The number of buckets that we are rechecking for ideal state operations bucket average
vds.idealstate.idealstate_diff A number representing the current difference from the ideal state. This is a number that decreases steadily as the system is getting closer to the ideal state bucket average
vds.idealstate.buckets_toofewcopies The number of buckets the distributor controls that have less than the desired redundancy bucket average
vds.idealstate.buckets_toomanycopies The number of buckets the distributor controls that have more than the desired redundancy bucket average
vds.idealstate.buckets The number of buckets the distributor controls bucket average
vds.idealstate.buckets_notrusted The number of buckets that have no trusted copies. bucket average
vds.idealstate.bucket_replicas_moving_out Bucket replicas that should be moved out, e.g. retirement case or node added to cluster that has higher ideal state priority. bucket average
vds.idealstate.bucket_replicas_copying_out Bucket replicas that should be copied out, e.g. node is in ideal state but might have to provide data to other nodes in a merge bucket average
vds.idealstate.bucket_replicas_copying_in Bucket replicas that should be copied in, e.g. node does not have a replica for a bucket that it is in ideal state for bucket average
vds.idealstate.bucket_replicas_syncing Bucket replicas that need syncing due to mismatching metadata bucket average
vds.idealstate.max_observed_time_since_last_gc_sec Maximum time (in seconds) since GC was last successfully run for a bucket. Aggregated max value across all buckets on the distributor. second average
vds.idealstate.delete_bucket.done_ok The number of operations successfully performed operation rate
vds.idealstate.delete_bucket.done_failed The number of operations that failed operation rate
vds.idealstate.delete_bucket.pending The number of operations pending operation average
vds.idealstate.merge_bucket.done_ok The number of operations successfully performed operation rate
vds.idealstate.merge_bucket.done_failed The number of operations that failed operation rate
vds.idealstate.merge_bucket.pending The number of operations pending operation average
vds.idealstate.merge_bucket.blocked The number of operations blocked by blocking operation starter operation rate
vds.idealstate.merge_bucket.throttled The number of operations throttled by throttling operation starter operation rate
vds.idealstate.merge_bucket.source_only_copy_changed The number of merge operations where source-only copy changed operation rate
vds.idealstate.merge_bucket.source_only_copy_delete_blocked The number of merge operations where delete of unchanged source-only copies was blocked operation rate
vds.idealstate.merge_bucket.source_only_copy_delete_failed The number of merge operations where delete of unchanged source-only copies failed operation rate
vds.idealstate.split_bucket.done_ok The number of operations successfully performed operation rate
vds.idealstate.split_bucket.done_failed The number of operations that failed operation rate
vds.idealstate.split_bucket.pending The number of operations pending operation average
vds.idealstate.join_bucket.done_ok The number of operations successfully performed operation rate
vds.idealstate.join_bucket.done_failed The number of operations that failed operation rate
vds.idealstate.join_bucket.pending The number of operations pending operation average
vds.idealstate.garbage_collection.done_ok The number of operations successfully performed operation rate
vds.idealstate.garbage_collection.done_failed The number of operations that failed operation rate
vds.idealstate.garbage_collection.pending The number of operations pending operation average
vds.idealstate.garbage_collection.documents_removed Number of documents removed by GC operations document count, rate
vds.distributor.puts.latency The latency of put operations millisecond max, sum, count
vds.distributor.puts.ok The number of successful put operations performed operation rate
vds.distributor.puts.failures.total Sum of all failures operation rate
vds.distributor.puts.failures.notfound The number of operations that failed because the document did not exist operation rate
vds.distributor.puts.failures.test_and_set_failed The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document operation rate
vds.distributor.puts.failures.concurrent_mutations The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID operation rate
vds.distributor.puts.failures.notconnected The number of operations discarded because there were no available storage nodes to send to operation rate
vds.distributor.puts.failures.notready The number of operations discarded because distributor was not ready operation rate
vds.distributor.puts.failures.wrongdistributor The number of operations discarded because they were sent to the wrong distributor operation rate
vds.distributor.puts.failures.safe_time_not_reached The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed operation rate
vds.distributor.puts.failures.storagefailure The number of operations that failed in storage operation rate
vds.distributor.puts.failures.timeout The number of operations that failed because the operation timed out towards storage operation rate
vds.distributor.puts.failures.busy The number of messages from storage that failed because the storage node was busy operation rate
vds.distributor.puts.failures.inconsistent_bucket The number of operations failed due to buckets being in an inconsistent state or not found operation rate
vds.distributor.removes.latency The latency of remove operations millisecond max, sum, count
vds.distributor.removes.ok The number of successful remove operations performed operation rate
vds.distributor.removes.failures.total Sum of all failures operation rate
vds.distributor.removes.failures.notfound The number of operations that failed because the document did not exist operation rate
vds.distributor.removes.failures.test_and_set_failed The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document operation rate
vds.distributor.removes.failures.concurrent_mutations The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID operation rate
vds.distributor.updates.latency The latency of update operations millisecond max, sum, count
vds.distributor.updates.ok The number of successful update operations performed operation rate
vds.distributor.updates.failures.total Sum of all failures operation rate
vds.distributor.updates.failures.notfound The number of operations that failed because the document did not exist operation rate
vds.distributor.updates.failures.test_and_set_failed The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document operation rate
vds.distributor.updates.failures.concurrent_mutations The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID operation rate
vds.distributor.updates.diverging_timestamp_updates Number of updates that report they were performed against divergent version timestamps on different replicas operation rate
vds.distributor.removelocations.ok The number of successful removelocations operations performed operation rate
vds.distributor.removelocations.failures.total Sum of all failures operation rate
vds.distributor.gets.latency The average latency of get operations millisecond max, sum, count
vds.distributor.gets.ok The number of successful get operations performed operation rate
vds.distributor.gets.failures.total Sum of all failures operation rate
vds.distributor.gets.failures.notfound The number of operations that failed because the document did not exist operation rate
vds.distributor.visitor.latency The average latency of visitor operations millisecond max, sum, count
vds.distributor.visitor.ok The number of successful visitor operations performed operation rate
vds.distributor.visitor.failures.total Sum of all failures operation rate
vds.distributor.visitor.failures.notready The number of operations discarded because distributor was not ready operation rate
vds.distributor.visitor.failures.notconnected The number of operations discarded because there were no available storage nodes to send to operation rate
vds.distributor.visitor.failures.wrongdistributor The number of operations discarded because they were sent to the wrong distributor operation rate
vds.distributor.visitor.failures.safe_time_not_reached The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed operation rate
vds.distributor.visitor.failures.storagefailure The number of operations that failed in storage operation rate
vds.distributor.visitor.failures.timeout The number of operations that failed because the operation timed out towards storage operation rate
vds.distributor.visitor.failures.busy The number of messages from storage that failed because the storage node was busy operation rate
vds.distributor.visitor.failures.inconsistent_bucket The number of operations failed due to buckets being in an inconsistent state or not found operation rate
vds.distributor.visitor.failures.notfound The number of operations that failed because the document did not exist operation rate
vds.distributor.docsstored Number of documents stored in all buckets controlled by this distributor document average
vds.distributor.bytesstored Number of bytes stored in all buckets controlled by this distributor byte average
vds.bouncer.clock_skew_aborts Number of client operations that were aborted due to clock skew between sender and receiver exceeding acceptable range operation count
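
The `ok` and `failures.total` rates listed for each distributor operation can be turned into a failure fraction. The sketch below is illustrative only. It assumes the values are held in a dictionary keyed by `<metric name>.<suffix>`, and that `ok` and `failures.total` together account for all attempted operations, which the table above does not state explicitly. The function name is hypothetical.

```python
def failure_fraction(values, operation):
    """Failed share of attempted operations, e.g. for 'puts' or 'gets'.

    Assumes ok + failures.total equals the number of attempts.
    Returns None when no operations were seen.
    """
    base = f"vds.distributor.{operation}"
    ok = values.get(f"{base}.ok.rate", 0.0)
    failed = values.get(f"{base}.failures.total.rate", 0.0)
    attempted = ok + failed
    if not attempted:
        return None
    return failed / attempted


# Made-up sample values, for illustration only.
sample = {
    "vds.distributor.puts.ok.rate": 30.0,
    "vds.distributor.puts.failures.total.rate": 10.0,
}

for op in ("puts", "removes", "updates", "gets"):
    print(op, failure_fraction(sample, op))
```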

Logd Metrics

Name Description Unit Suffixes
logd.processed.lines Number of log lines processed item count

Nodeadmin Metrics

Name Description Unit Suffixes
endpoint.certificate.expiry.seconds Time until node endpoint certificate expires second N/A
node-certificate.expiry.seconds Time until node certificate expires second N/A

Searchnode Metrics

Name Description Unit Suffixes
content.proton.config.generation The oldest config generation used by this search node version last
content.proton.documentdb.documents.total The total number of documents in this document db (ready + not-ready) document last
content.proton.documentdb.documents.ready The number of ready documents in this document db document last
content.proton.documentdb.documents.active The number of active / searchable documents in this document db document last
content.proton.documentdb.documents.removed The number of removed documents in this document db document last
content.proton.documentdb.index.docs_in_memory Number of documents in memory index document last
content.proton.documentdb.disk_usage The total disk usage (in bytes) for this document db byte last
content.proton.documentdb.memory_usage.allocated_bytes The number of allocated bytes byte max
content.proton.documentdb.heart_beat_age How long ago (in seconds) the heartbeat maintenance job was run second last
content.proton.docsum.docs Total docsums returned document rate
content.proton.docsum.latency Docsum request latency millisecond max, sum, count
content.proton.search_protocol.query.latency Query request latency (seconds) second max, sum, count
content.proton.search_protocol.query.request_size Query request size (network bytes) byte max, sum, count
content.proton.search_protocol.query.reply_size Query reply size (network bytes) byte max, sum, count
content.proton.search_protocol.docsum.latency Docsum request latency (seconds) second max, sum, average
content.proton.search_protocol.docsum.request_size Docsum request size (network bytes) byte max, sum, count
content.proton.search_protocol.docsum.reply_size Docsum reply size (network bytes) byte max, sum, count
content.proton.search_protocol.docsum.requested_documents Total requested document summaries document count
content.proton.executor.proton.queuesize Size of executor proton task queue task max, sum, count
content.proton.executor.proton.accepted Number of executor proton accepted tasks task rate
content.proton.executor.proton.wakeups Number of times an executor proton worker thread has been woken up wakeup rate
content.proton.executor.proton.utilization Ratio of time the executor proton worker threads have been active fraction max, sum, count
content.proton.executor.flush.queuesize Size of executor flush task queue task max, sum, count
content.proton.executor.flush.accepted Number of accepted executor flush tasks task rate
content.proton.executor.flush.wakeups Number of times an executor flush worker thread has been woken up wakeup rate
content.proton.executor.flush.utilization Ratio of time the executor flush worker threads have been active fraction max, sum, count
content.proton.executor.match.queuesize Size of executor match task queue task max, sum, count
content.proton.executor.match.accepted Number of accepted executor match tasks task rate
content.proton.executor.match.wakeups Number of times an executor match worker thread has been woken up wakeup rate
content.proton.executor.match.utilization Ratio of time the executor match worker threads have been active fraction max, sum, count
content.proton.executor.docsum.queuesize Size of executor docsum task queue task max, sum, count
content.proton.executor.docsum.accepted Number of executor accepted docsum tasks task rate
content.proton.executor.docsum.wakeups Number of times an executor docsum worker thread has been woken up wakeup rate
content.proton.executor.docsum.utilization Ratio of time the executor docsum worker threads have been active fraction max, sum, count
content.proton.executor.shared.queuesize Size of executor shared task queue task max, sum, count
content.proton.executor.shared.accepted Number of executor shared accepted tasks task rate
content.proton.executor.shared.wakeups Number of times an executor shared worker thread has been woken up wakeup rate
content.proton.executor.shared.utilization Ratio of time the executor shared worker threads have been active fraction max, sum, count
content.proton.executor.warmup.queuesize Size of executor warmup task queue task max, sum, count
content.proton.executor.warmup.accepted Number of accepted executor warmup tasks task rate
content.proton.executor.warmup.wakeups Number of times a warmup executor worker thread has been woken up wakeup rate
content.proton.executor.warmup.utilization Ratio of time the executor warmup worker threads have been active fraction max, sum, count
content.proton.executor.field_writer.queuesize Size of executor field writer task queue task max, sum, count
content.proton.executor.field_writer.accepted Number of accepted executor field writer tasks task rate
content.proton.executor.field_writer.wakeups Number of times an executor field writer worker thread has been woken up wakeup rate
content.proton.executor.field_writer.utilization Ratio of time the executor field writer worker threads have been active fraction max, sum, count
content.proton.documentdb.job.total The job load average total of all job metrics fraction average
content.proton.documentdb.job.attribute_flush Flushing of attribute vector(s) to disk fraction average
content.proton.documentdb.job.memory_index_flush Flushing of memory index to disk fraction average
content.proton.documentdb.job.disk_index_fusion Fusion of disk indexes fraction average
content.proton.documentdb.job.document_store_flush Flushing of document store to disk fraction average
content.proton.documentdb.job.document_store_compact Compaction of document store on disk fraction average
content.proton.documentdb.job.bucket_move Moving of buckets between 'ready' and 'notready' sub databases fraction average
content.proton.documentdb.job.lid_space_compact Compaction of lid space in document meta store and attribute vectors fraction average
content.proton.documentdb.job.removed_documents_prune Pruning of removed documents in 'removed' sub database fraction average
content.proton.documentdb.threading_service.master.queuesize Size of threading service master task queue task max, sum, count
content.proton.documentdb.threading_service.master.accepted Number of accepted threading service master tasks task rate
content.proton.documentdb.threading_service.master.wakeups Number of times a threading service master worker thread has been woken up wakeup rate
content.proton.documentdb.threading_service.master.utilization Ratio of time the threading service master worker threads have been active fraction max, sum, count
content.proton.documentdb.threading_service.index.queuesize Size of threading service index task queue task max, sum, count
content.proton.documentdb.threading_service.index.accepted Number of accepted threading service index tasks task rate
content.proton.documentdb.threading_service.index.wakeups Number of times a threading service index worker thread has been woken up wakeup rate
content.proton.documentdb.threading_service.index.utilization Ratio of time the threading service index worker threads have been active fraction max, sum, count
content.proton.documentdb.threading_service.summary.queuesize Size of threading service summary task queue task max, sum, count
content.proton.documentdb.threading_service.summary.accepted Number of accepted threading service summary tasks task rate
content.proton.documentdb.threading_service.summary.wakeups Number of times a threading service summary worker thread has been woken up wakeup rate
content.proton.documentdb.threading_service.summary.utilization Ratio of time the threading service summary worker threads have been active fraction max, sum, count
content.proton.documentdb.ready.lid_space.lid_bloat_factor The bloat factor of this lid space, indicating the total amount of holes in the allocated lid space ((lid_limit - used_lids) / lid_limit) fraction average
content.proton.documentdb.ready.lid_space.lid_fragmentation_factor The fragmentation factor of this lid space, indicating the amount of holes in the currently used part of the lid space ((highest_used_lid - used_lids) / highest_used_lid) fraction average
content.proton.documentdb.ready.lid_space.lid_limit The size of the allocated lid space documentid last
content.proton.documentdb.ready.lid_space.highest_used_lid The highest used lid documentid last
content.proton.documentdb.ready.lid_space.used_lids The number of lids used documentid last
content.proton.documentdb.notready.lid_space.lid_bloat_factor The bloat factor of this lid space, indicating the total amount of holes in the allocated lid space ((lid_limit - used_lids) / lid_limit) fraction average
content.proton.documentdb.notready.lid_space.lid_fragmentation_factor The fragmentation factor of this lid space, indicating the amount of holes in the currently used part of the lid space ((highest_used_lid - used_lids) / highest_used_lid) fraction average
content.proton.documentdb.notready.lid_space.lid_limit The size of the allocated lid space documentid last
content.proton.documentdb.notready.lid_space.highest_used_lid The highest used lid documentid last
content.proton.documentdb.notready.lid_space.used_lids The number of lids used documentid last
content.proton.documentdb.removed.lid_space.lid_bloat_factor The bloat factor of this lid space, indicating the total amount of holes in the allocated lid space ((lid_limit - used_lids) / lid_limit) fraction average
content.proton.documentdb.removed.lid_space.lid_fragmentation_factor The fragmentation factor of this lid space, indicating the amount of holes in the currently used part of the lid space ((highest_used_lid - used_lids) / highest_used_lid) fraction average
content.proton.documentdb.removed.lid_space.lid_limit The size of the allocated lid space documentid last
content.proton.documentdb.removed.lid_space.highest_used_lid The highest used lid documentid last
content.proton.documentdb.removed.lid_space.used_lids The number of lids used documentid last
content.proton.documentdb.bucket_move.buckets_pending The number of buckets left to move bucket last
content.proton.resource_usage.disk The relative amount of disk used by this content node (transient usage not included, value in the range [0, 1]). Same value as reported to the cluster controller fraction average
content.proton.resource_usage.disk_usage.total The total relative amount of disk used by this content node (value in the range [0, 1]) fraction max
content.proton.resource_usage.disk_usage.total_utilization The relative amount of disk used compared to the content node disk resource limit fraction max
content.proton.resource_usage.disk_usage.transient The relative amount of transient disk used by this content node (value in the range [0, 1]) fraction max
content.proton.resource_usage.memory The relative amount of memory used by this content node (transient usage not included, value in the range [0, 1]). Same value as reported to the cluster controller fraction average
content.proton.resource_usage.memory_usage.total The total relative amount of memory used by this content node (value in the range [0, 1]) fraction max
content.proton.resource_usage.memory_usage.total_utilization The relative amount of memory used compared to the content node memory resource limit fraction max
content.proton.resource_usage.memory_usage.transient The relative amount of transient memory used by this content node (value in the range [0, 1]) fraction max
content.proton.resource_usage.memory_mappings The number of mapped memory areas area max
content.proton.resource_usage.open_file_descriptors The number of open files file max
content.proton.resource_usage.feeding_blocked Whether feeding is blocked due to resource limits being reached (value is either 0 or 1) binary max
content.proton.resource_usage.malloc_arena Size of malloc arena byte max
content.proton.documentdb.attribute.resource_usage.address_space The max relative address space used among components in all attribute vectors in this document db (value in the range [0, 1]) fraction max
content.proton.documentdb.attribute.resource_usage.feeding_blocked Whether feeding is blocked due to attribute resource limits being reached (value is either 0 or 1) binary max
content.proton.resource_usage.cpu_util.setup CPU used by system init and (re-)configuration fraction max, sum, count
content.proton.resource_usage.cpu_util.read CPU used by reading data from the system fraction max, sum, count
content.proton.resource_usage.cpu_util.write CPU used by writing data to the system fraction max, sum, count
content.proton.resource_usage.cpu_util.compact CPU used by internal data re-structuring fraction max, sum, count
content.proton.resource_usage.cpu_util.other CPU used by work not classified as a specific category fraction max, sum, count
content.proton.transactionlog.entries The current number of entries in the transaction log record average
content.proton.transactionlog.disk_usage The disk usage (in bytes) of the transaction log byte average
content.proton.transactionlog.replay_time The replay time (in seconds) of the transaction log during start-up second last
content.proton.documentdb.ready.document_store.disk_usage Disk space usage in bytes byte average
content.proton.documentdb.ready.document_store.disk_bloat Disk space bloat in bytes byte average
content.proton.documentdb.ready.document_store.max_bucket_spread Max bucket spread in underlying files (sum(unique buckets in each chunk)/unique buckets in file) fraction average
content.proton.documentdb.ready.document_store.memory_usage.allocated_bytes The number of allocated bytes byte average
content.proton.documentdb.ready.document_store.memory_usage.used_bytes The number of used bytes (<= allocated_bytes) byte average
content.proton.documentdb.ready.document_store.memory_usage.onhold_bytes The number of bytes on hold byte average
content.proton.documentdb.notready.document_store.disk_usage Disk space usage in bytes byte average
content.proton.documentdb.notready.document_store.disk_bloat Disk space bloat in bytes byte average
content.proton.documentdb.notready.document_store.max_bucket_spread Max bucket spread in underlying files (sum(unique buckets in each chunk)/unique buckets in file) fraction average
content.proton.documentdb.notready.document_store.memory_usage.allocated_bytes The number of allocated bytes byte average
content.proton.documentdb.notready.document_store.memory_usage.used_bytes The number of used bytes (<= allocated_bytes) byte average
content.proton.documentdb.notready.document_store.memory_usage.dead_bytes The number of dead bytes (<= used_bytes) byte average
content.proton.documentdb.notready.document_store.memory_usage.onhold_bytes The number of bytes on hold byte average
content.proton.documentdb.removed.document_store.disk_usage Disk space usage in bytes byte average
content.proton.documentdb.removed.document_store.disk_bloat Disk space bloat in bytes byte average
content.proton.documentdb.removed.document_store.max_bucket_spread Max bucket spread in underlying files (sum(unique buckets in each chunk)/unique buckets in file) fraction average
content.proton.documentdb.removed.document_store.memory_usage.allocated_bytes The number of allocated bytes byte average
content.proton.documentdb.removed.document_store.memory_usage.used_bytes The number of used bytes (<= allocated_bytes) byte average
content.proton.documentdb.removed.document_store.memory_usage.dead_bytes The number of dead bytes (<= used_bytes) byte average
content.proton.documentdb.removed.document_store.memory_usage.onhold_bytes The number of bytes on hold byte average
content.proton.documentdb.ready.document_store.cache.memory_usage Memory usage of the cache (in bytes) byte average
content.proton.documentdb.ready.document_store.cache.hit_rate Rate of hits in the cache compared to number of lookups fraction average
content.proton.documentdb.ready.document_store.cache.lookups Number of lookups in the cache (hits + misses) operation rate
content.proton.documentdb.ready.document_store.cache.invalidations Number of invalidations (erased elements) in the cache. operation rate
content.proton.documentdb.notready.document_store.cache.memory_usage Memory usage of the cache (in bytes) byte average
content.proton.documentdb.notready.document_store.cache.hit_rate Rate of hits in the cache compared to number of lookups fraction average
content.proton.documentdb.notready.document_store.cache.lookups Number of lookups in the cache (hits + misses) operation rate
content.proton.documentdb.notready.document_store.cache.invalidations Number of invalidations (erased elements) in the cache. operation rate
content.proton.documentdb.ready.attribute.memory_usage.allocated_bytes The number of allocated bytes byte average
content.proton.documentdb.ready.attribute.memory_usage.used_bytes The number of used bytes (<= allocated_bytes) byte average
content.proton.documentdb.ready.attribute.memory_usage.dead_bytes The number of dead bytes (<= used_bytes) byte average
content.proton.documentdb.ready.attribute.memory_usage.onhold_bytes The number of bytes on hold byte average
content.proton.documentdb.notready.attribute.memory_usage.allocated_bytes The number of allocated bytes byte average
content.proton.documentdb.notready.attribute.memory_usage.used_bytes The number of used bytes (<= allocated_bytes) byte average
content.proton.documentdb.notready.attribute.memory_usage.dead_bytes The number of dead bytes (<= used_bytes) byte average
content.proton.documentdb.notready.attribute.memory_usage.onhold_bytes The number of bytes on hold byte average
content.proton.documentdb.index.memory_usage.allocated_bytes The number of allocated bytes byte average
content.proton.documentdb.index.memory_usage.used_bytes The number of used bytes (<= allocated_bytes) byte average
content.proton.documentdb.index.memory_usage.dead_bytes The number of dead bytes (<= used_bytes) byte average
content.proton.documentdb.index.memory_usage.onhold_bytes The number of bytes on hold byte average
content.proton.documentdb.matching.queries Number of queries executed query rate
content.proton.documentdb.matching.soft_doomed_queries Number of queries hitting the soft timeout query rate
content.proton.documentdb.matching.query_latency Total average latency (sec) when matching and ranking a query second max, sum, count
content.proton.documentdb.matching.query_setup_time Average time (sec) spent setting up and tearing down queries second max, sum, count
content.proton.documentdb.matching.docs_matched Number of documents matched document rate, count
content.proton.documentdb.matching.rank_profile.queries Number of queries executed query rate
content.proton.documentdb.matching.rank_profile.soft_doomed_queries Number of queries hitting the soft timeout query rate
content.proton.documentdb.matching.rank_profile.soft_doom_factor Factor used to compute soft-timeout fraction min, max, sum, count
content.proton.documentdb.matching.rank_profile.query_latency Total average latency (sec) when matching and ranking a query second max, sum, count
content.proton.documentdb.matching.rank_profile.query_setup_time Average time (sec) spent setting up and tearing down queries second max, sum, count
content.proton.documentdb.matching.rank_profile.grouping_time Average time (sec) spent on grouping second max, sum, count
content.proton.documentdb.matching.rank_profile.rerank_time Average time (sec) spent on 2nd phase ranking second max, sum, count
content.proton.documentdb.matching.rank_profile.docs_matched Number of documents matched document rate, count
content.proton.documentdb.matching.rank_profile.limited_queries Number of queries limited in match phase query rate
content.proton.documentdb.feeding.commit.operations Number of operations included in a commit operation max, sum, count, rate
content.proton.documentdb.feeding.commit.latency Latency for commit in seconds second max, sum, count
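
The two lid space factors in the table above are defined by formulas over `lid_limit`, `highest_used_lid` and `used_lids`. As a minimal sketch, assuming those three values have already been read from a metrics snapshot, the factors can be recomputed as shown below. The function names, the sample values, and the choice to return 0.0 for an empty lid space are illustrative assumptions and not part of Vespa.

```python
def lid_bloat_factor(lid_limit: int, used_lids: int) -> float:
    """Share of the allocated lid space that is unused:
    (lid_limit - used_lids) / lid_limit."""
    if lid_limit <= 0:
        # Assumption: report no bloat for an empty lid space.
        return 0.0
    return (lid_limit - used_lids) / lid_limit


def lid_fragmentation_factor(highest_used_lid: int, used_lids: int) -> float:
    """Share of holes in the used part of the lid space:
    (highest_used_lid - used_lids) / highest_used_lid."""
    if highest_used_lid <= 0:
        # Assumption: report no fragmentation when no lid is in use.
        return 0.0
    return (highest_used_lid - used_lids) / highest_used_lid


# Hypothetical sample values for one sub database (not real measurements).
sample = {"lid_limit": 1000, "highest_used_lid": 800, "used_lids": 750}

bloat = lid_bloat_factor(sample["lid_limit"], sample["used_lids"])
fragmentation = lid_fragmentation_factor(
    sample["highest_used_lid"], sample["used_lids"]
)

print(f"lid_bloat_factor={bloat}")
print(f"lid_fragmentation_factor={fragmentation}")
```

With these sample values, 250 of the 1000 allocated lids are unused and 50 of the first 800 lids are holes, so the bloat factor is larger than the fragmentation factor.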

Sentinel Metrics

Name Description Unit Suffixes
sentinel.restarts Number of service restarts done by the sentinel restart count
sentinel.totalRestarts Total number of service restarts done by the sentinel since the sentinel was started restart last
sentinel.uptime Time the sentinel has been running second last
sentinel.running Number of services the sentinel has running currently instance count, last

Slobrok Metrics

Name Description Unit Suffixes
slobrok.heartbeats.failed Number of heartbeat requests failed request count
slobrok.missing.consensus Number of seconds without full consensus with all other brokers second count

Storage Metrics

Name Description Unit Suffixes
vds.server.network.tls-handshakes-failed Number of client or server connection attempts that failed during TLS handshaking operation count
vds.server.network.peer-authorization-failures Number of TLS connection attempts failed due to bad or missing peer certificate credentials failure count
vds.server.network.client.tls-connections-established Number of secure mTLS connections established connection count
vds.server.network.server.tls-connections-established Number of secure mTLS connections established connection count
vds.server.network.client.insecure-connections-established Number of insecure (plaintext) connections established connection count
vds.server.network.server.insecure-connections-established Number of insecure (plaintext) connections established connection count
vds.server.network.tls-connections-broken Number of TLS connections broken due to failures during frame encoding or decoding connection count
vds.server.network.failed-tls-config-reloads Number of times background reloading of TLS config has failed failure count
vds.server.network.rpc-capability-checks-failed Number of RPC operations that failed due to one or more missing capabilities failure count
vds.server.network.status-capability-checks-failed Number of status page operations that failed due to one or more missing capabilities failure count
vds.server.fnet.num-connections Total number of connection objects connection count
vds.datastored.alldisks.buckets Number of buckets managed bucket average
vds.datastored.alldisks.docs Number of documents stored document average
vds.datastored.alldisks.bytes Number of bytes stored byte average
vds.visitor.allthreads.averagevisitorlifetime Average lifetime of a visitor millisecond max, sum, count
vds.visitor.allthreads.averagequeuewait Average time an operation spends in input queue. millisecond max, sum, count
vds.visitor.allthreads.queuesize Size of input message queue. operation max, sum, count
vds.visitor.allthreads.completed Number of visitors completed operation rate
vds.visitor.allthreads.created Number of visitors created. operation rate
vds.visitor.allthreads.failed Number of visitors failed operation rate
vds.visitor.allthreads.averagemessagesendtime Average time it takes for messages to be sent to their target (and be replied to) millisecond max, sum, count
vds.visitor.allthreads.averageprocessingtime Average time used to process visitor requests millisecond max, sum, count
vds.filestor.queuesize Size of input message queue. operation max, sum, count
vds.filestor.averagequeuewait Average time an operation spends in input queue. millisecond max, sum, count
vds.filestor.active_operations.size Number of concurrent active operations operation max, sum, count
vds.filestor.active_operations.latency Latency (in ms) for completed operations millisecond max, sum, count
vds.filestor.throttle_window_size Current size of the async operation throttler window operation max, sum, count
vds.filestor.throttle_waiting_threads Number of threads waiting to acquire a throttle token thread max, sum, count
vds.filestor.throttle_active_tokens Current number of active throttle tokens instance max, sum, count
vds.filestor.allthreads.mergemetadatareadlatency Time spent in a merge step to check metadata of current node to see what data it has. millisecond max, sum, count
vds.filestor.allthreads.mergedatareadlatency Time spent in a merge step to read data other nodes need. millisecond max, sum, count
vds.filestor.allthreads.mergedatawritelatency Time spent in a merge step to write data needed by the current node. millisecond max, sum, count
vds.filestor.allthreads.put_latency Latency of individual puts that are part of merge operations millisecond max, sum, count
vds.filestor.allthreads.remove_latency Latency of individual removes that are part of merge operations millisecond max, sum, count
vds.filestor.allstripes.throttled_rpc_direct_dispatches Number of times an RPC thread could not dispatch an async operation directly to Proton because it was disallowed by the throttle policy instance rate
vds.filestor.allstripes.throttled_persistence_thread_polls Number of times a persistence thread could not immediately dispatch a queued async operation because it was disallowed by the throttle policy instance rate
vds.filestor.allstripes.timeouts_waiting_for_throttle_token Number of times a persistence thread timed out waiting for an available throttle policy token instance rate
vds.filestor.allthreads.put.count Number of requests processed. operation rate
vds.filestor.allthreads.put.failed Number of failed requests. operation rate
vds.filestor.allthreads.put.test_and_set_failed Number of operations that were skipped due to a test-and-set condition not met operation rate
vds.filestor.allthreads.put.latency Latency of successful requests. millisecond max, sum, count
vds.filestor.allthreads.put.request_size Size of requests, in bytes byte max, sum, count
vds.filestor.allthreads.remove.count Number of requests processed. operation rate
vds.filestor.allthreads.remove.failed Number of failed requests. operation rate
vds.filestor.allthreads.remove.test_and_set_failed Number of operations that were skipped due to a test-and-set condition not met operation rate
vds.filestor.allthreads.remove.latency Latency of successful requests. millisecond max, sum, count
vds.filestor.allthreads.remove.request_size Size of requests, in bytes byte max, sum, count
vds.filestor.allthreads.get.count Number of requests processed. operation rate
vds.filestor.allthreads.get.failed Number of failed requests. operation rate
vds.filestor.allthreads.get.latency Latency of successful requests. millisecond max, sum, count
vds.filestor.allthreads.get.request_size Size of requests, in bytes byte max, sum, count
vds.filestor.allthreads.update.count Number of requests processed. request rate
vds.filestor.allthreads.update.failed Number of failed requests. request rate
vds.filestor.allthreads.update.test_and_set_failed Number of requests that were skipped due to a test-and-set condition not met request rate
vds.filestor.allthreads.update.latency Latency of successful requests. millisecond max, sum, count
vds.filestor.allthreads.update.request_size Size of requests, in bytes byte max, sum, count
vds.filestor.allthreads.createiterator.count Number of requests processed. request rate
vds.filestor.allthreads.createiterator.latency Latency of successful requests. millisecond max, sum, count
vds.filestor.allthreads.visit.count Number of requests processed. request rate
vds.filestor.allthreads.visit.latency Latency of successful requests. millisecond max, sum, count
vds.filestor.allthreads.remove_location.count Number of requests processed. request rate
vds.filestor.allthreads.remove_location.latency Latency of successful requests. millisecond max, sum, count
vds.filestor.allthreads.splitbuckets.count Number of requests processed. request rate
vds.filestor.allthreads.joinbuckets.count Number of requests processed. request rate
vds.filestor.allthreads.deletebuckets.count Number of requests processed. request rate
vds.filestor.allthreads.deletebuckets.failed Number of failed requests. request rate
vds.filestor.allthreads.deletebuckets.latency Latency of successful requests. millisecond max, sum, count
vds.filestor.allthreads.setbucketstates.count Number of requests processed. request rate
vds.mergethrottler.averagequeuewaitingtime Time merges spent in the throttler queue millisecond max, sum, count
vds.mergethrottler.queuesize Length of merge queue instance max, sum, count
vds.mergethrottler.active_window_size Number of merges active within the pending window instance max, sum, count
vds.mergethrottler.bounced_due_to_back_pressure Number of merges bounced due to resource exhaustion back-pressure instance rate
vds.mergethrottler.locallyexecutedmerges.ok The number of successful merges for 'locallyexecutedmerges' instance rate
vds.mergethrottler.mergechains.ok The number of successful merges for 'mergechains' operation rate
vds.mergethrottler.mergechains.failures.busy The number of merges that failed because the storage node was busy operation rate
vds.mergethrottler.mergechains.failures.total Sum of all failures operation rate