
Vespa Metric Set

This document is the reference for the Vespa metric set. Each entry gives the metric name, followed by its description, unit, and the suffixes the metric is available with. If the suffix column contains "N/A", the base name of the metric is used with no suffix.
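The sketch below expands one table row into the metric names a consumer would look for. It assumes that a suffix is appended to the base name with a `.` separator (as in a name like `cluster-controller.down.count.last`), and that "N/A" means the base name is used on its own. The helper name `expand_metric_names` is hypothetical.

```python
from __future__ import annotations


def expand_metric_names(base_name: str, suffixes: str) -> list[str]:
    """Return the full metric names for one row of the tables below.

    Assumption: each suffix is joined to the base name with a '.';
    "N/A" means the base name is used with no suffix.
    """
    if suffixes.strip() == "N/A":
        return [base_name]
    return [f"{base_name}.{s.strip()}" for s in suffixes.split(",") if s.strip()]


print(expand_metric_names("cluster-controller.down.count", "last, max"))
# ['cluster-controller.down.count.last', 'cluster-controller.down.count.max']
print(expand_metric_names("jrt.transport.peer-authorization-failures", "N/A"))
# ['jrt.transport.peer-authorization-failures']
```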

ClusterController Metrics

Name | Description | Unit | Suffixes

cluster-controller.down.count

Number of content nodes down node last, max

cluster-controller.initializing.count

Number of content nodes initializing node last, max

cluster-controller.maintenance.count

Number of content nodes in maintenance node last, max

cluster-controller.retired.count

Number of content nodes that are retired node last, max

cluster-controller.stopping.count

Number of content nodes currently stopping node last

cluster-controller.up.count

Number of content nodes up node last, max

cluster-controller.nodes-not-converged

Number of nodes not converging to the latest cluster state version node max

cluster-controller.cluster-buckets-out-of-sync-ratio

Ratio of buckets in the cluster currently in need of syncing fraction max

cluster-controller.busy-tick-time-ms

Time spent busy millisecond count, last, max, sum

cluster-controller.idle-tick-time-ms

Time spent idle millisecond count, last, max, sum

cluster-controller.work-ms

Time used for actual work millisecond count, last, sum

cluster-controller.is-master

1 if this cluster controller is currently the master, or 0 if not binary last, max

cluster-controller.remote-task-queue.size

Number of remote tasks queued operation last

cluster-controller.resource_usage.nodes_above_limit

The number of content nodes above a resource limit, which blocks feed node last, max

cluster-controller.resource_usage.max_memory_utilization

Current memory utilization of the content node with the highest value fraction last, max

cluster-controller.resource_usage.max_disk_utilization

Current disk space utilization of the content node with the highest value fraction last, max

cluster-controller.resource_usage.memory_limit

Memory limit as a fraction of available memory fraction last, max

cluster-controller.resource_usage.disk_limit

Disk space limit as a fraction of available disk space fraction last, max

reindexing.progress

Re-indexing progress fraction last, max
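The cluster controller metrics can be combined into two simple health indicators: the share of tick time spent busy (from the busy and idle tick-time sums) and the headroom between the highest per-node memory or disk utilization and the configured limit. The sketch below assumes a snapshot is available as a flat dict keyed by suffixed metric names; the dict layout, the helper names and the sample values are hypothetical, and a negative headroom is only a proxy for what `cluster-controller.resource_usage.nodes_above_limit` reports directly.

```python
from __future__ import annotations


def busy_fraction(snapshot: dict[str, float]) -> float | None:
    """Share of tick time the cluster controller spent busy in this snapshot."""
    busy = snapshot.get("cluster-controller.busy-tick-time-ms.sum", 0.0)
    idle = snapshot.get("cluster-controller.idle-tick-time-ms.sum", 0.0)
    total = busy + idle
    return busy / total if total > 0 else None


def resource_headroom(snapshot: dict[str, float], resource: str) -> float | None:
    """Limit minus the highest per-node utilization, for "memory" or "disk".

    A negative result means at least one content node is above the limit.
    """
    base = "cluster-controller.resource_usage"
    utilization = snapshot.get(f"{base}.max_{resource}_utilization.max")
    limit = snapshot.get(f"{base}.{resource}_limit.last")
    if utilization is None or limit is None:
        return None  # metric not present in this snapshot
    return limit - utilization


# Hypothetical values for one metrics snapshot.
snapshot = {
    "cluster-controller.busy-tick-time-ms.sum": 1200.0,
    "cluster-controller.idle-tick-time-ms.sum": 58800.0,
    "cluster-controller.resource_usage.max_memory_utilization.max": 0.62,
    "cluster-controller.resource_usage.memory_limit.last": 0.8,
    "cluster-controller.resource_usage.max_disk_utilization.max": 0.78,
    "cluster-controller.resource_usage.disk_limit.last": 0.75,
}

print(f"busy: {busy_fraction(snapshot):.1%}")  # busy: 2.0%
for resource in ("memory", "disk"):
    headroom = resource_headroom(snapshot, resource)
    print(resource, "above limit" if headroom < 0 else f"headroom {headroom:.2f}")
```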

Container Metrics

Name | Description | Unit | Suffixes

http.status.1xx

Number of responses with a 1xx status response rate

http.status.2xx

Number of responses with a 2xx status response rate

http.status.3xx

Number of responses with a 3xx status response rate

http.status.4xx

Number of responses with a 4xx status response rate

http.status.5xx

Number of responses with a 5xx status response rate

application_generation

The currently live application config generation (aka session id) version N/A

jdisc.gc.count

Number of JVM garbage collections done operation average, last, max

jdisc.gc.ms

Time spent in JVM garbage collection millisecond average, last, max

jdisc.jvm

JVM runtime version version last

jdisc.memory_mappings

JDISC Memory mappings operation max

jdisc.open_file_descriptors

JDISC Open file descriptors item max

jdisc.thread_pool.unhandled_exceptions

Number of exceptions thrown by tasks thread count, last, max, min, sum

jdisc.thread_pool.work_queue.capacity

Capacity of the task queue thread count, last, max, min, sum

jdisc.thread_pool.work_queue.size

Size of the task queue thread count, last, max, min, sum

jdisc.thread_pool.rejected_tasks

Number of tasks rejected by the thread pool thread count, last, max, min, sum

jdisc.thread_pool.size

Size of the thread pool thread count, last, max, min, sum

jdisc.thread_pool.max_allowed_size

The maximum allowed number of threads in the pool thread count, last, max, min, sum

jdisc.thread_pool.active_threads

Number of threads that are active thread count, last, max, min, sum

jdisc.deactivated_containers.total

JDISC Deactivated container instances item last, sum

jdisc.deactivated_containers.with_retained_refs.last

JDISC Deactivated container nodes with retained refs item last

jdisc.application.failed_component_graphs

JDISC Application failed component graphs item rate

jdisc.application.component_graph.creation_time_millis

JDISC Application component graph creation time millisecond last

jdisc.application.component_graph.reconfigurations

JDISC Application component graph reconfigurations item rate

jdisc.singleton.is_active

JDISC Singleton is active item last, max, min

jdisc.singleton.activation.count

JDISC Singleton activations operation last

jdisc.singleton.activation.failure.count

JDISC Singleton activation failures operation last

jdisc.singleton.activation.millis

JDISC Singleton activation time millisecond last

jdisc.singleton.deactivation.count

JDISC Singleton deactivations operation last

jdisc.singleton.deactivation.failure.count

JDISC Singleton deactivation failures operation last

jdisc.singleton.deactivation.millis

JDISC Singleton deactivation time millisecond last

jdisc.http.ssl.handshake.failure.missing_client_cert

JDISC HTTP SSL Handshake failures due to missing client certificate operation rate

jdisc.http.ssl.handshake.failure.expired_client_cert

JDISC HTTP SSL Handshake failures due to expired client certificate operation rate

jdisc.http.ssl.handshake.failure.invalid_client_cert

JDISC HTTP SSL Handshake failures due to invalid client certificate operation rate

jdisc.http.ssl.handshake.failure.incompatible_protocols

JDISC HTTP SSL Handshake failures due to incompatible protocols operation rate

jdisc.http.ssl.handshake.failure.incompatible_chifers

JDISC HTTP SSL Handshake failures due to incompatible ciphers operation rate

jdisc.http.ssl.handshake.failure.connection_closed

JDISC HTTP SSL Handshake failures due to connection closed operation rate

jdisc.http.ssl.handshake.failure.unknown

JDISC HTTP SSL Handshake failures for unknown reason operation rate

jdisc.http.request.prematurely_closed

HTTP requests prematurely closed request rate

jdisc.http.request.requests_per_connection

HTTP requests per connection request average, count, max, min, sum

jdisc.http.request.uri_length

HTTP URI length byte count, max, sum

jdisc.http.request.content_size

HTTP request content size byte count, max, sum

jdisc.http.requests

HTTP requests request count, rate

jdisc.http.filter.rule.blocked_requests

Number of requests blocked by filter request rate

jdisc.http.filter.rule.allowed_requests

Number of requests allowed by filter request rate

jdisc.http.filtering.request.handled

Number of filtering requests handled request rate

jdisc.http.filtering.request.unhandled

Number of filtering requests unhandled request rate

jdisc.http.filtering.response.handled

Number of filtering responses handled request rate

jdisc.http.filtering.response.unhandled

Number of filtering responses unhandled request rate

jdisc.http.handler.unhandled_exceptions

Number of unhandled exceptions in handler request rate

jdisc.tls.capability_checks.succeeded

Number of TLS capability checks that succeeded operation rate

jdisc.tls.capability_checks.failed

Number of TLS capability checks that failed operation rate

jdisc.http.jetty.threadpool.thread.max

Configured maximum number of threads thread count, last, max, min, sum

jdisc.http.jetty.threadpool.thread.min

Configured minimum number of threads thread count, last, max, min, sum

jdisc.http.jetty.threadpool.thread.reserved

Configured number of reserved threads or -1 for heuristic thread count, last, max, min, sum

jdisc.http.jetty.threadpool.thread.busy

Number of threads executing internal and transient jobs thread count, last, max, min, sum

jdisc.http.jetty.threadpool.thread.total

Current number of threads thread count, last, max, min, sum

jdisc.http.jetty.threadpool.queue.size

Current size of the job queue thread count, last, max, min, sum

serverNumOpenConnections

The number of currently open connections connection average, last, max

serverNumConnections

The total number of connections opened connection average, last, max

serverBytesReceived

The number of bytes received by the server byte count, sum

serverBytesSent

The number of bytes sent from the server byte count, sum

handled.requests

The number of requests handled per metrics snapshot operation count

handled.latency

The time used for requests during this metrics snapshot millisecond count, max, sum

httpapi_latency

Duration for requests to the HTTP document APIs millisecond count, max, sum

httpapi_pending

Document operations pending execution operation count, max, sum

httpapi_num_operations

Total number of document operations performed operation rate

httpapi_num_updates

Document update operations performed operation rate

httpapi_num_removes

Document remove operations performed operation rate

httpapi_num_puts

Document put operations performed operation rate

httpapi_succeeded

Document operations that succeeded operation rate

httpapi_failed

Document operations that failed operation rate

httpapi_parse_error

Document operations that failed due to document parse errors operation rate

httpapi_condition_not_met

Document operations not applied due to condition not met operation rate

httpapi_not_found

Document operations not applied due to document not found operation rate

httpapi_failed_unknown

Document operations failed by unknown cause operation rate

httpapi_failed_timeout

Document operations failed by timeout operation rate

httpapi_failed_insufficient_storage

Document operations failed by insufficient storage operation rate

mem.heap.total

Total available heap memory byte average

mem.heap.free

Free heap memory byte average

mem.heap.used

Currently used heap memory byte average, max

mem.direct.total

Total available direct memory byte average

mem.direct.free

Currently free direct memory byte average

mem.direct.used

Direct memory currently used byte average, max

mem.direct.count

Number of direct memory allocations byte max

mem.native.total

Total available native memory byte average

mem.native.free

Currently free native memory byte average

mem.native.used

Native memory currently used byte average

athenz-tenant-cert.expiry.seconds

Time remaining until Athenz tenant certificate expires second last, max, min

container-iam-role.expiry.seconds

Time remaining until IAM role expires second N/A

peak_qps

The highest number of queries per second observed in any single second during this metrics snapshot query_per_second max

search_connections

Number of search connections connection count, max, sum

feed.operations

Number of document feed operations operation rate

feed.latency

Feed latency millisecond count, max, sum

feed.http-requests

Feed HTTP requests operation count, rate

queries

Query volume operation rate

query_container_latency

The query execution time consumed in the container millisecond count, max, sum

query_latency

The overall query latency as seen by the container millisecond count, max, sum

query_timeout

The amount of time allowed for query execution, from the client millisecond count, max, min, sum

failed_queries

The number of failed queries operation rate

degraded_queries

The number of degraded queries, e.g. due to some content nodes not responding in time operation rate

hits_per_query

The number of hits returned hit_per_query count, max, sum

query_hit_offset

The offset for hits returned hit count, max, sum

documents_covered

The combined number of documents considered during query evaluation document count

documents_total

The number of documents to be evaluated if all requests had been fully executed document count

documents_target_total

The target number of total documents to be evaluated when all data is in sync document count

jdisc.render.latency

The time used by the container to render responses nanosecond average, count, last, max, min, sum

query_item_count

The number of query items (terms, phrases, etc.) item count, max, sum

docproc.proctime

Time spent processing document millisecond count, max, sum

docproc.documents

Number of processed documents document count, max, min, sum

totalhits_per_query

The total number of documents found to match queries hit_per_query count, max, sum

empty_results

Number of queries matching no documents operation rate

requestsOverQuota

The number of requests rejected due to exceeding quota operation count, rate

relevance.at_1

The relevance of hit number 1 score count, sum

relevance.at_3

The relevance of hit number 3 score count, sum

relevance.at_10

The relevance of hit number 10 score count, sum

error.timeout

Requests that timed out operation rate

error.backends_oos

Requests that failed due to no available backend nodes operation rate

error.plugin_failure

Requests that failed due to plugin failure operation rate

error.backend_communication_error

Requests that failed due to backend communication error operation rate

error.empty_document_summaries

Requests that failed due to missing document summaries operation rate

error.invalid_query_parameter

Requests that failed due to invalid query parameters operation rate

error.internal_server_error

Requests that failed due to internal server error operation rate

error.misconfigured_server

Requests that failed due to misconfigured server operation rate

error.invalid_query_transformation

Requests that failed due to invalid query transformation operation rate

error.results_with_errors

The number of queries with error payload operation rate

error.unspecified

Requests that failed for an unspecified reason operation rate

error.unhandled_exception

Requests that failed due to an unhandled exception operation rate

serverRejectedRequests

Deprecated. Use jdisc.thread_pool.rejected_tasks instead. operation count, rate

serverThreadPoolSize

Deprecated. Use jdisc.thread_pool.size instead. thread last, max

serverActiveThreads

Deprecated. Use jdisc.thread_pool.active_threads instead. thread count, last, max, min, sum

jrt.transport.tls-certificate-verification-failures

TLS certificate verification failures failure N/A

jrt.transport.peer-authorization-failures

TLS peer authorization failures failure N/A

jrt.transport.server.tls-connections-established

TLS server connections established connection N/A

jrt.transport.client.tls-connections-established

TLS client connections established connection N/A

jrt.transport.server.unencrypted-connections-established

Unencrypted server connections established connection N/A

jrt.transport.client.unencrypted-connections-established

Unencrypted client connections established connection N/A

embedder.latency

Time spent creating an embedding millisecond count, max, sum

embedder.sequence_length

Size of sequence produced by tokenizer byte count, max, sum
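Many container metrics above are exported with `sum` and `count` suffixes, so a mean over the snapshot is `sum / count`; similarly, `documents_covered` relative to `documents_total` gives a rough coverage indicator, and the `http.status.*` rates give the share of responses per status class. The sketch below assumes a snapshot as a flat dict keyed by suffixed metric names; the helper names and sample values are hypothetical.

```python
from __future__ import annotations


def average_from_sum_count(snapshot: dict[str, float], base_name: str) -> float | None:
    """Mean value in the snapshot for a metric exported with 'sum' and 'count' suffixes."""
    count = snapshot.get(f"{base_name}.count", 0.0)
    if count == 0:
        return None  # no samples were recorded in this snapshot
    return snapshot.get(f"{base_name}.sum", 0.0) / count


def coverage_ratio(snapshot: dict[str, float]) -> float | None:
    """documents_covered relative to documents_total (1.0 means full coverage)."""
    total = snapshot.get("documents_total.count", 0.0)
    if total == 0:
        return None
    return snapshot.get("documents_covered.count", 0.0) / total


def status_share(snapshot: dict[str, float], status_class: str = "5xx") -> float | None:
    """Share of HTTP responses in one status class, based on the 'rate' suffixes."""
    classes = ("1xx", "2xx", "3xx", "4xx", "5xx")
    rates = {c: snapshot.get(f"http.status.{c}.rate", 0.0) for c in classes}
    total = sum(rates.values())
    return rates[status_class] / total if total > 0 else None


# Hypothetical values for one metrics snapshot.
snapshot = {
    "query_latency.sum": 4500.0,
    "query_latency.count": 300.0,
    "documents_covered.count": 950.0,
    "documents_total.count": 1000.0,
    "http.status.2xx.rate": 98.0,
    "http.status.5xx.rate": 2.0,
}

print(average_from_sum_count(snapshot, "query_latency"))  # 15.0 (milliseconds)
print(coverage_ratio(snapshot))  # 0.95
print(status_share(snapshot))  # 0.02
```

Keeping `sum` and `count` separate, rather than exporting a precomputed average, means snapshots from several containers can be added together before dividing.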

Distributor Metrics

Name | Description | Unit | Suffixes

vds.idealstate.buckets_rechecking

The number of buckets that we are rechecking for ideal state operations bucket average

vds.idealstate.idealstate_diff

A number representing the current difference from the ideal state. This is a number that decreases steadily as the system is getting closer to the ideal state bucket average

vds.idealstate.buckets_toofewcopies

The number of buckets the distributor controls that have fewer replicas than the desired redundancy bucket average

vds.idealstate.buckets_toomanycopies

The number of buckets the distributor controls that have more replicas than the desired redundancy bucket average

vds.idealstate.buckets

The number of buckets the distributor controls bucket average

vds.idealstate.buckets_notrusted

The number of buckets that have no trusted copies. bucket average

vds.idealstate.bucket_replicas_moving_out

Bucket replicas that should be moved out, e.g. because the node is retiring or because a node with higher ideal state priority has been added to the cluster bucket average

vds.idealstate.bucket_replicas_copying_out

Bucket replicas that should be copied out, e.g. the node is in the ideal state but might have to provide data to other nodes in a merge bucket average

vds.idealstate.bucket_replicas_copying_in

Bucket replicas that should be copied in, e.g. node does not have a replica for a bucket that it is in ideal state for bucket average

vds.idealstate.bucket_replicas_syncing

Bucket replicas that need syncing due to mismatching metadata bucket average

vds.idealstate.max_observed_time_since_last_gc_sec

Maximum time (in seconds) since GC was last successfully run for a bucket. Aggregated max value across all buckets on the distributor. second average

vds.idealstate.delete_bucket.done_ok

The number of operations successfully performed operation rate

vds.idealstate.delete_bucket.done_failed

The number of operations that failed operation rate

vds.idealstate.delete_bucket.pending

The number of operations pending operation average

vds.idealstate.merge_bucket.done_ok

The number of operations successfully performed operation rate

vds.idealstate.merge_bucket.done_failed

The number of operations that failed operation rate

vds.idealstate.merge_bucket.pending

The number of operations pending operation average

vds.idealstate.merge_bucket.blocked

The number of operations blocked by blocking operation starter operation rate

vds.idealstate.merge_bucket.throttled

The number of operations throttled by throttling operation starter operation rate

vds.idealstate.merge_bucket.source_only_copy_changed

The number of merge operations where source-only copy changed operation rate

vds.idealstate.merge_bucket.source_only_copy_delete_blocked

The number of merge operations where delete of unchanged source-only copies was blocked operation rate

vds.idealstate.merge_bucket.source_only_copy_delete_failed

The number of merge operations where delete of unchanged source-only copies failed operation rate

vds.idealstate.split_bucket.done_ok

The number of operations successfully performed operation rate

vds.idealstate.split_bucket.done_failed

The number of operations that failed operation rate

vds.idealstate.split_bucket.pending

The number of operations pending operation average

vds.idealstate.join_bucket.done_ok

The number of operations successfully performed operation rate

vds.idealstate.join_bucket.done_failed

The number of operations that failed operation rate

vds.idealstate.join_bucket.pending

The number of operations pending operation average

vds.idealstate.garbage_collection.done_ok

The number of operations successfully performed operation rate

vds.idealstate.garbage_collection.done_failed

The number of operations that failed operation rate

vds.idealstate.garbage_collection.pending

The number of operations pending operation average

vds.idealstate.garbage_collection.documents_removed

Number of documents removed by GC operations document count, rate

vds.distributor.puts.latency

The latency of put operations millisecond count, max, sum

vds.distributor.puts.ok

The number of successful put operations performed operation rate

vds.distributor.puts.failures.total

Sum of all failures operation rate

vds.distributor.puts.failures.notfound

The number of operations that failed because the document did not exist operation rate

vds.distributor.puts.failures.test_and_set_failed

The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document operation rate

vds.distributor.puts.failures.concurrent_mutations

The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID operation rate

vds.distributor.puts.failures.notconnected

The number of operations discarded because there were no available storage nodes to send to operation rate

vds.distributor.puts.failures.notready

The number of operations discarded because distributor was not ready operation rate

vds.distributor.puts.failures.wrongdistributor

The number of operations discarded because they were sent to the wrong distributor operation rate

vds.distributor.puts.failures.safe_time_not_reached

The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed operation rate

vds.distributor.puts.failures.storagefailure

The number of operations that failed in storage operation rate

vds.distributor.puts.failures.timeout

The number of operations that failed because the operation timed out towards storage operation rate

vds.distributor.puts.failures.busy

The number of messages from storage that failed because the storage node was busy operation rate

vds.distributor.puts.failures.inconsistent_bucket

The number of operations that failed due to buckets being in an inconsistent state or not found operation rate

vds.distributor.removes.latency

The latency of remove operations millisecond count, max, sum

vds.distributor.removes.ok

The number of successful remove operations performed operation rate

vds.distributor.removes.failures.total

Sum of all failures operation rate

vds.distributor.removes.failures.notfound

The number of operations that failed because the document did not exist operation rate

vds.distributor.removes.failures.test_and_set_failed

The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document operation rate

vds.distributor.removes.failures.concurrent_mutations

The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID operation rate

vds.distributor.updates.latency

The latency of update operations millisecond count, max, sum

vds.distributor.updates.ok

The number of successful update operations performed operation rate

vds.distributor.updates.failures.total

Sum of all failures operation rate

vds.distributor.updates.failures.notfound

The number of operations that failed because the document did not exist operation rate

vds.distributor.updates.failures.test_and_set_failed

The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document operation rate

vds.distributor.updates.failures.concurrent_mutations

The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID operation rate

vds.distributor.updates.diverging_timestamp_updates

Number of updates that report they were performed against divergent version timestamps on different replicas operation rate

vds.distributor.removelocations.ok

The number of successful remove-location operations performed operation rate

vds.distributor.removelocations.failures.total

Sum of all failures operation rate

vds.distributor.gets.latency

The latency of get operations millisecond count, max, sum

vds.distributor.gets.ok

The number of successful get operations performed operation rate

vds.distributor.gets.failures.total

Sum of all failures operation rate

vds.distributor.gets.failures.notfound

The number of operations that failed because the document did not exist operation rate

vds.distributor.visitor.latency

The latency of visitor operations millisecond count, max, sum

vds.distributor.visitor.ok

The number of successful visitor operations performed operation rate

vds.distributor.visitor.failures.total

Sum of all failures operation rate

vds.distributor.visitor.failures.notready

The number of operations discarded because distributor was not ready operation rate

vds.distributor.visitor.failures.notconnected

The number of operations discarded because there were no available storage nodes to send to operation rate

vds.distributor.visitor.failures.wrongdistributor

The number of operations discarded because they were sent to the wrong distributor operation rate

vds.distributor.visitor.failures.safe_time_not_reached

The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed operation rate

vds.distributor.visitor.failures.storagefailure

The number of operations that failed in storage operation rate

vds.distributor.visitor.failures.timeout

The number of operations that failed because the operation timed out towards storage operation rate

vds.distributor.visitor.failures.busy

The number of messages from storage that failed because the storage node was busy operation rate

vds.distributor.visitor.failures.inconsistent_bucket

The number of operations that failed due to buckets being in an inconsistent state or not found operation rate

vds.distributor.visitor.failures.notfound

The number of operations that failed because the document did not exist operation rate

vds.distributor.docsstored

Number of documents stored in all buckets controlled by this distributor document average

vds.distributor.bytesstored

Number of bytes stored in all buckets controlled by this distributor byte average

vds.bouncer.clock_skew_aborts

Number of client operations that were aborted due to clock skew between sender and receiver exceeding acceptable range operation count
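Each distributor operation type reports successes (`.ok`) and the sum of all failures (`.failures.total`) as rates, plus a breakdown by failure reason. The sketch below derives a failure ratio for one operation type and ranks the individual failure reasons. It assumes a snapshot as a flat dict keyed by suffixed metric names (e.g. `vds.distributor.puts.ok.rate`); the helper names and sample values are hypothetical.

```python
from __future__ import annotations


def failure_ratio(snapshot: dict[str, float], op: str) -> float | None:
    """failures.total / (ok + failures.total) for an operation type such as 'puts'."""
    ok = snapshot.get(f"vds.distributor.{op}.ok.rate", 0.0)
    failed = snapshot.get(f"vds.distributor.{op}.failures.total.rate", 0.0)
    total = ok + failed
    return failed / total if total > 0 else None


def top_failure_reasons(snapshot: dict[str, float], op: str, n: int = 3) -> list[tuple[str, float]]:
    """The n largest per-reason failure rates, excluding the 'total' aggregate."""
    prefix = f"vds.distributor.{op}.failures."
    suffix = ".rate"
    reasons = {}
    for name, value in snapshot.items():
        if name.startswith(prefix) and name.endswith(suffix):
            reason = name[len(prefix):-len(suffix)]
            if reason != "total":
                reasons[reason] = value
    return sorted(reasons.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical values for one metrics snapshot.
snapshot = {
    "vds.distributor.puts.ok.rate": 190.0,
    "vds.distributor.puts.failures.total.rate": 10.0,
    "vds.distributor.puts.failures.test_and_set_failed.rate": 6.0,
    "vds.distributor.puts.failures.timeout.rate": 3.0,
    "vds.distributor.puts.failures.busy.rate": 1.0,
}

print(failure_ratio(snapshot, "puts"))  # 0.05
print(top_failure_reasons(snapshot, "puts"))
# [('test_and_set_failed', 6.0), ('timeout', 3.0), ('busy', 1.0)]
```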

Logd Metrics

Name | Description | Unit | Suffixes

logd.processed.lines

Number of log lines processed item count

NodeAdmin Metrics

Name | Description | Unit | Suffixes

endpoint.certificate.expiry.seconds

Time until node endpoint certificate expires second N/A

node-certificate.expiry.seconds

Time until node certificate expires second N/A
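The certificate expiry metrics above have no suffix ("N/A"), so they are read under their base names and report seconds remaining. The sketch below converts them to days and lists the certificates that fall below a warning threshold. The 14-day threshold, the flat-dict snapshot layout and the helper names are hypothetical choices for illustration.

```python
from __future__ import annotations

SECONDS_PER_DAY = 86_400
CERT_METRICS = ("endpoint.certificate.expiry.seconds", "node-certificate.expiry.seconds")


def days_until_expiry(expiry_seconds: float) -> float:
    """Convert a seconds-until-expiry value to days."""
    return expiry_seconds / SECONDS_PER_DAY


def expiring_certificates(snapshot: dict[str, float], warn_days: float = 14.0) -> list[str]:
    """Names of certificate metrics whose remaining lifetime is below warn_days."""
    return sorted(
        name
        for name in CERT_METRICS
        if name in snapshot and days_until_expiry(snapshot[name]) < warn_days
    )


# Hypothetical values: 30 days and 5 days remaining.
snapshot = {
    "endpoint.certificate.expiry.seconds": 30 * SECONDS_PER_DAY,
    "node-certificate.expiry.seconds": 5 * SECONDS_PER_DAY,
}

print(expiring_certificates(snapshot))  # ['node-certificate.expiry.seconds']
```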

SearchNode Metrics
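The lid-space metrics in this section (`lid_bloat_factor` and `lid_fragmentation_factor`) are defined by the formulas given in their descriptions: bloat is `(lid_limit - used_lids) / lid_limit` and fragmentation is `(highest_used_lid - used_lids) / highest_used_lid`. The sketch below evaluates both for sample inputs; the function names and the sample numbers are hypothetical, and `used_lids` is simply taken as an input.

```python
def lid_bloat_factor(lid_limit: int, used_lids: int) -> float:
    """(lid_limit - used_lids) / lid_limit: holes in the whole allocated lid space."""
    return (lid_limit - used_lids) / lid_limit if lid_limit > 0 else 0.0


def lid_fragmentation_factor(highest_used_lid: int, used_lids: int) -> float:
    """(highest_used_lid - used_lids) / highest_used_lid: holes below the highest used lid."""
    return (highest_used_lid - used_lids) / highest_used_lid if highest_used_lid > 0 else 0.0


# Hypothetical lid space: 1000 lids allocated, highest used lid 800, 600 lids in use.
print(lid_bloat_factor(1000, 600))          # 0.4
print(lid_fragmentation_factor(800, 600))   # 0.25
```

With these numbers, 400 of the 1000 allocated lids are unused (bloat), while 200 of those holes lie below the highest used lid (fragmentation); the remaining 200 are free lids above it.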

Name | Description | Unit | Suffixes

content.proton.config.generation

The oldest config generation used by this search node version last

content.proton.documentdb.documents.total

The total number of documents in this document db (ready + not-ready) document last, max

content.proton.documentdb.documents.ready

The number of ready documents in this document db document last, max

content.proton.documentdb.documents.active

The number of active / searchable documents in this document db document last, max

content.proton.documentdb.documents.removed

The number of removed documents in this document db document last, max

content.proton.documentdb.index.docs_in_memory

Number of documents in memory index document last, max

content.proton.documentdb.disk_usage

The total disk usage (in bytes) for this document db byte last

content.proton.documentdb.memory_usage.allocated_bytes

The number of allocated bytes byte max

content.proton.documentdb.heart_beat_age

How long ago (in seconds) the heart beat maintenance job was run second last, min

content.proton.docsum.docs

Total docsums returned document rate

content.proton.docsum.latency

Docsum request latency millisecond count, max, sum

content.proton.search_protocol.query.latency

Query request latency (seconds) second count, max, sum

content.proton.search_protocol.query.request_size

Query request size (network bytes) byte count, max, sum

content.proton.search_protocol.query.reply_size

Query reply size (network bytes) byte count, max, sum

content.proton.search_protocol.docsum.latency

Docsum request latency (seconds) second average, count, max, sum

content.proton.search_protocol.docsum.request_size

Docsum request size (network bytes) byte count, max, sum

content.proton.search_protocol.docsum.reply_size

Docsum reply size (network bytes) byte count, max, sum

content.proton.search_protocol.docsum.requested_documents

Total requested document summaries document count, max, sum

content.proton.executor.proton.queuesize

Size of executor proton task queue task count, max, sum

content.proton.executor.proton.accepted

Number of accepted executor proton tasks task rate

content.proton.executor.proton.wakeups

Number of times an executor proton worker thread has been woken up wakeup rate

content.proton.executor.proton.utilization

Ratio of time the executor proton worker threads have been active fraction count, max, sum

content.proton.executor.flush.queuesize

Size of executor flush task queue task count, max, sum

content.proton.executor.flush.accepted

Number of accepted executor flush tasks task rate

content.proton.executor.flush.wakeups

Number of times an executor flush worker thread has been woken up wakeup rate

content.proton.executor.flush.utilization

Ratio of time the executor flush worker threads have been active fraction count, max, sum

content.proton.executor.match.queuesize

Size of executor match task queue task count, max, sum

content.proton.executor.match.accepted

Number of accepted executor match tasks task rate

content.proton.executor.match.wakeups

Number of times an executor match worker thread has been woken up wakeup rate

content.proton.executor.match.utilization

Ratio of time the executor match worker threads have been active fraction count, max, sum

content.proton.executor.docsum.queuesize

Size of executor docsum task queue task count, max, sum

content.proton.executor.docsum.accepted

Number of accepted executor docsum tasks task rate

content.proton.executor.docsum.wakeups

Number of times an executor docsum worker thread has been woken up wakeup rate

content.proton.executor.docsum.utilization

Ratio of time the executor docsum worker threads have been active fraction count, max, sum

content.proton.executor.shared.queuesize

Size of executor shared task queue task count, max, sum

content.proton.executor.shared.accepted

Number of accepted executor shared tasks task rate

content.proton.executor.shared.wakeups

Number of times an executor shared worker thread has been woken up wakeup rate

content.proton.executor.shared.utilization

Ratio of time the executor shared worker threads have been active fraction count, max, sum

content.proton.executor.warmup.queuesize

Size of executor warmup task queue task count, max, sum

content.proton.executor.warmup.accepted

Number of accepted executor warmup tasks task rate

content.proton.executor.warmup.wakeups

Number of times a warmup executor worker thread has been woken up wakeup rate

content.proton.executor.warmup.utilization

Ratio of time the executor warmup worker threads have been active fraction count, max, sum

content.proton.executor.field_writer.queuesize

Size of executor field writer task queue task count, max, sum

content.proton.executor.field_writer.accepted

Number of accepted executor field writer tasks task rate

content.proton.executor.field_writer.wakeups

Number of times an executor field writer worker thread has been woken up wakeup rate

content.proton.executor.field_writer.utilization

Ratio of time the executor field writer worker threads have been active fraction count, max, sum

content.proton.executor.field_writer.saturation

Ratio indicating the max saturation of underlying worker threads. A higher saturation than utilization indicates a bottleneck in one of the worker threads. fraction count, max, sum

content.proton.documentdb.job.total

The job load average total of all job metrics fraction average

content.proton.documentdb.job.attribute_flush

Flushing of attribute vector(s) to disk fraction average

content.proton.documentdb.job.memory_index_flush

Flushing of memory index to disk fraction average

content.proton.documentdb.job.disk_index_fusion

Fusion of disk indexes fraction average

content.proton.documentdb.job.document_store_flush

Flushing of document store to disk fraction average

content.proton.documentdb.job.document_store_compact

Compaction of document store on disk fraction average

content.proton.documentdb.job.bucket_move

Moving of buckets between 'ready' and 'notready' sub databases fraction average

content.proton.documentdb.job.lid_space_compact

Compaction of lid space in document meta store and attribute vectors fraction average

content.proton.documentdb.job.removed_documents_prune

Pruning of removed documents in 'removed' sub database fraction average

content.proton.documentdb.threading_service.master.queuesize

Size of threading service master task queue task count, max, sum

content.proton.documentdb.threading_service.master.accepted

Number of accepted threading service master tasks task rate

content.proton.documentdb.threading_service.master.wakeups

Number of times a threading service master worker thread has been woken up wakeup rate

content.proton.documentdb.threading_service.master.utilization

Ratio of time the threading service master worker threads have been active fraction count, max, sum

content.proton.documentdb.threading_service.index.queuesize

Size of threading service index task queue task count, max, sum

content.proton.documentdb.threading_service.index.accepted

Number of accepted threading service index tasks task rate

content.proton.documentdb.threading_service.index.wakeups

Number of times a threading service index worker thread has been woken up wakeup rate

content.proton.documentdb.threading_service.index.utilization

Ratio of time the threading service index worker threads have been active fraction count, max, sum

content.proton.documentdb.threading_service.summary.queuesize

Size of threading service summary task queue task count, max, sum

content.proton.documentdb.threading_service.summary.accepted

Number of accepted threading service summary tasks task rate

content.proton.documentdb.threading_service.summary.wakeups

Number of times a threading service summary worker thread has been woken up wakeup rate

content.proton.documentdb.threading_service.summary.utilization

Ratio of time the threading service summary worker threads have been active fraction count, max, sum

content.proton.documentdb.ready.lid_space.lid_bloat_factor

The bloat factor of this lid space, indicating the total amount of holes in the allocated lid space ((lid_limit - used_lids) / lid_limit) fraction average

content.proton.documentdb.ready.lid_space.lid_fragmentation_factor

The fragmentation factor of this lid space, indicating the amount of holes in the currently used part of the lid space ((highest_used_lid - used_lids) / highest_used_lid) fraction average

content.proton.documentdb.ready.lid_space.lid_limit

The size of the allocated lid space documentid last, max

content.proton.documentdb.ready.lid_space.highest_used_lid

The highest used lid documentid last, max

content.proton.documentdb.ready.lid_space.used_lids

The number of lids used documentid last, max

content.proton.documentdb.notready.lid_space.lid_bloat_factor

The bloat factor of this lid space, indicating the total amount of holes in the allocated lid space ((lid_limit - used_lids) / lid_limit) fraction average

content.proton.documentdb.notready.lid_space.lid_fragmentation_factor

The fragmentation factor of this lid space, indicating the amount of holes in the currently used part of the lid space ((highest_used_lid - used_lids) / highest_used_lid) fraction average

content.proton.documentdb.notready.lid_space.lid_limit

The size of the allocated lid space documentid last, max

content.proton.documentdb.notready.lid_space.highest_used_lid

The highest used lid documentid last, max

content.proton.documentdb.notready.lid_space.used_lids

The number of lids used documentid last, max

content.proton.documentdb.removed.lid_space.lid_bloat_factor

The bloat factor of this lid space, indicating the total amount of holes in the allocated lid space ((lid_limit - used_lids) / lid_limit) fraction average

content.proton.documentdb.removed.lid_space.lid_fragmentation_factor

The fragmentation factor of this lid space, indicating the amount of holes in the currently used part of the lid space ((highest_used_lid - used_lids) / highest_used_lid) fraction average

content.proton.documentdb.removed.lid_space.lid_limit

The size of the allocated lid space documentid last, max

content.proton.documentdb.removed.lid_space.highest_used_lid

The highest used lid documentid last, max

content.proton.documentdb.removed.lid_space.used_lids

The number of lids used documentid last, max

content.proton.documentdb.bucket_move.buckets_pending

The number of buckets left to move bucket last, max, sum

content.proton.resource_usage.disk

The relative amount of disk used by this content node (transient usage not included, value in the range [0, 1]). Same value as reported to the cluster controller fraction average

content.proton.resource_usage.disk_usage.total

The total relative amount of disk used by this content node (value in the range [0, 1]) fraction max

content.proton.resource_usage.disk_usage.total_utilization

The relative amount of disk used compared to the content node disk resource limit fraction max

content.proton.resource_usage.disk_usage.transient

The relative amount of transient disk used by this content node (value in the range [0, 1]) fraction max

content.proton.resource_usage.memory

The relative amount of memory used by this content node (transient usage not included, value in the range [0, 1]). Same value as reported to the cluster controller fraction average

content.proton.resource_usage.memory_usage.total

The total relative amount of memory used by this content node (value in the range [0, 1]) fraction max

content.proton.resource_usage.memory_usage.total_utilization

The relative amount of memory used compared to the content node memory resource limit fraction max

content.proton.resource_usage.memory_usage.transient

The relative amount of transient memory used by this content node (value in the range [0, 1]) fraction max

content.proton.resource_usage.memory_mappings

The number of memory mapped files file max

content.proton.resource_usage.open_file_descriptors

The number of open file descriptors file max

content.proton.resource_usage.feeding_blocked

Whether feeding is blocked due to resource limits being reached (value is either 0 or 1) binary last, max

content.proton.resource_usage.malloc_arena

Size of malloc arena byte max

content.proton.documentdb.attribute.resource_usage.address_space

The max relative address space used among components in all attribute vectors in this document db (value in the range [0, 1]) fraction max

content.proton.documentdb.attribute.resource_usage.feeding_blocked

Whether feeding is blocked due to attribute resource limits being reached (value is either 0 or 1) binary max

content.proton.resource_usage.cpu_util.setup

CPU used by system initialization and (re-)configuration fraction count, max, sum

content.proton.resource_usage.cpu_util.read

CPU used for reading data from the system fraction count, max, sum

content.proton.resource_usage.cpu_util.write

CPU used for writing data to the system fraction count, max, sum

content.proton.resource_usage.cpu_util.compact

CPU used for internal data restructuring fraction count, max, sum

content.proton.resource_usage.cpu_util.other

CPU used by work not classified into a specific category fraction count, max, sum

content.proton.transactionlog.entries

The current number of entries in the transaction log record average

content.proton.transactionlog.disk_usage

The disk usage (in bytes) of the transaction log byte average

content.proton.transactionlog.replay_time

The replay time (in seconds) of the transaction log during start-up second last, max

content.proton.documentdb.ready.document_store.disk_usage

Disk space usage in bytes byte average

content.proton.documentdb.ready.document_store.disk_bloat

Disk space bloat in bytes byte average

content.proton.documentdb.ready.document_store.max_bucket_spread

Max bucket spread in underlying files (sum(unique buckets in each chunk)/unique buckets in file) fraction average

content.proton.documentdb.ready.document_store.memory_usage.allocated_bytes

The number of allocated bytes byte average

content.proton.documentdb.ready.document_store.memory_usage.used_bytes

The number of used bytes (<= allocated_bytes) byte average

content.proton.documentdb.ready.document_store.memory_usage.onhold_bytes

The number of bytes on hold byte average

content.proton.documentdb.notready.document_store.disk_usage

Disk space usage in bytes byte average

content.proton.documentdb.notready.document_store.disk_bloat

Disk space bloat in bytes byte average

content.proton.documentdb.notready.document_store.max_bucket_spread

Max bucket spread in underlying files (sum(unique buckets in each chunk)/unique buckets in file) fraction average

content.proton.documentdb.notready.document_store.memory_usage.allocated_bytes

The number of allocated bytes byte average

content.proton.documentdb.notready.document_store.memory_usage.used_bytes

The number of used bytes (<= allocated_bytes) byte average

content.proton.documentdb.notready.document_store.memory_usage.dead_bytes

The number of dead bytes (<= used_bytes) byte average

content.proton.documentdb.notready.document_store.memory_usage.onhold_bytes

The number of bytes on hold byte average

content.proton.documentdb.removed.document_store.disk_usage

Disk space usage in bytes byte average

content.proton.documentdb.removed.document_store.disk_bloat

Disk space bloat in bytes byte average

content.proton.documentdb.removed.document_store.max_bucket_spread

Max bucket spread in underlying files (sum(unique buckets in each chunk)/unique buckets in file) fraction average

content.proton.documentdb.removed.document_store.memory_usage.allocated_bytes

The number of allocated bytes byte average

content.proton.documentdb.removed.document_store.memory_usage.used_bytes

The number of used bytes (<= allocated_bytes) byte average

content.proton.documentdb.removed.document_store.memory_usage.dead_bytes

The number of dead bytes (<= used_bytes) byte average

content.proton.documentdb.removed.document_store.memory_usage.onhold_bytes

The number of bytes on hold byte average

content.proton.documentdb.ready.document_store.cache.memory_usage

Memory usage of the cache (in bytes) byte average

content.proton.documentdb.ready.document_store.cache.hit_rate

Ratio of cache hits to the number of lookups fraction average

content.proton.documentdb.ready.document_store.cache.lookups

Number of lookups in the cache (hits + misses) operation rate

content.proton.documentdb.ready.document_store.cache.invalidations

Number of invalidations (erased elements) in the cache. operation rate

content.proton.documentdb.notready.document_store.cache.memory_usage

Memory usage of the cache (in bytes) byte average

content.proton.documentdb.notready.document_store.cache.hit_rate

Ratio of cache hits to the number of lookups fraction average

content.proton.documentdb.notready.document_store.cache.lookups

Number of lookups in the cache (hits + misses) operation rate

content.proton.documentdb.notready.document_store.cache.invalidations

Number of invalidations (erased elements) in the cache. operation rate

content.proton.documentdb.ready.attribute.memory_usage.allocated_bytes

The number of allocated bytes byte average

content.proton.documentdb.ready.attribute.memory_usage.used_bytes

The number of used bytes (<= allocated_bytes) byte average

content.proton.documentdb.ready.attribute.memory_usage.dead_bytes

The number of dead bytes (<= used_bytes) byte average

content.proton.documentdb.ready.attribute.memory_usage.onhold_bytes

The number of bytes on hold byte average

content.proton.documentdb.notready.attribute.memory_usage.allocated_bytes

The number of allocated bytes byte average

content.proton.documentdb.notready.attribute.memory_usage.used_bytes

The number of used bytes (<= allocated_bytes) byte average

content.proton.documentdb.notready.attribute.memory_usage.dead_bytes

The number of dead bytes (<= used_bytes) byte average

content.proton.documentdb.notready.attribute.memory_usage.onhold_bytes

The number of bytes on hold byte average

content.proton.documentdb.index.memory_usage.allocated_bytes

The number of allocated bytes byte average

content.proton.documentdb.index.memory_usage.used_bytes

The number of used bytes (<= allocated_bytes) byte average

content.proton.documentdb.index.memory_usage.dead_bytes

The number of dead bytes (<= used_bytes) byte average

content.proton.documentdb.index.memory_usage.onhold_bytes

The number of bytes on hold byte average

content.proton.documentdb.matching.queries

Number of queries executed query rate

content.proton.documentdb.matching.soft_doomed_queries

Number of queries hitting the soft timeout query rate

content.proton.documentdb.matching.query_latency

Total average latency (sec) when matching and ranking a query second count, max, sum

content.proton.documentdb.matching.query_setup_time

Average time (sec) spent setting up and tearing down queries second count, max, sum

content.proton.documentdb.matching.docs_matched

Number of documents matched document count, rate

content.proton.documentdb.matching.rank_profile.queries

Number of queries executed query rate

content.proton.documentdb.matching.rank_profile.soft_doomed_queries

Number of queries hitting the soft timeout query rate

content.proton.documentdb.matching.rank_profile.soft_doom_factor

Factor used to compute the soft timeout fraction count, max, min, sum

content.proton.documentdb.matching.rank_profile.query_latency

Total average latency (sec) when matching and ranking a query second count, max, sum

content.proton.documentdb.matching.rank_profile.query_setup_time

Average time (sec) spent setting up and tearing down queries second count, max, sum

content.proton.documentdb.matching.rank_profile.grouping_time

Average time (sec) spent on grouping second count, max, sum

content.proton.documentdb.matching.rank_profile.rerank_time

Average time (sec) spent on second-phase ranking second count, max, sum

content.proton.documentdb.matching.rank_profile.docs_matched

Number of documents matched document count, rate

content.proton.documentdb.matching.rank_profile.limited_queries

Number of queries limited in match phase query rate

content.proton.documentdb.feeding.commit.operations

Number of operations included in a commit operation count, max, rate, sum

content.proton.documentdb.feeding.commit.latency

Latency for commit in seconds second count, max, sum
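Several Proton metrics above are derived ratios whose formulas are given in their descriptions (lid space bloat and fragmentation, document store bucket spread, cache hit rate), and the latency metrics are exported with sum and count suffixes. The Python sketch below restates those relationships so raw values can be sanity-checked. It is an illustration, not Vespa code: all function names are hypothetical, the zero-denominator handling is an assumption, max_bucket_spread is assumed to be the largest per-file value of the stated formula, average = sum / count is assumed for sum/count suffixes, and the utilization/saturation helper assumes utilization is the mean and saturation the maximum of per-thread busy fractions.

```python
# Illustrative sketch only: these helpers are hypothetical, not part of Vespa.
# They restate formulas from the metric descriptions so raw values can be checked.


def lid_bloat_factor(lid_limit: int, used_lids: int) -> float:
    """(lid_limit - used_lids) / lid_limit: holes in the whole allocated lid space."""
    if lid_limit == 0:  # assumption: an empty lid space reports no bloat
        return 0.0
    return (lid_limit - used_lids) / lid_limit


def lid_fragmentation_factor(highest_used_lid: int, used_lids: int) -> float:
    """(highest_used_lid - used_lids) / highest_used_lid: holes below the highest used lid."""
    if highest_used_lid == 0:  # assumption: no used lids means no fragmentation
        return 0.0
    return (highest_used_lid - used_lids) / highest_used_lid


def bucket_spread(chunks: list[set[int]]) -> float:
    """sum(unique buckets in each chunk) / unique buckets in file, for one file.

    Each chunk is modelled as a set of hypothetical integer bucket ids.
    """
    unique_in_file = set().union(*chunks)
    if not unique_in_file:  # assumption: an empty file has no spread
        return 0.0
    return sum(len(chunk) for chunk in chunks) / len(unique_in_file)


def max_bucket_spread(files: list[list[set[int]]]) -> float:
    """Assumed: the metric is the largest per-file spread among the files."""
    return max((bucket_spread(chunks) for chunks in files), default=0.0)


def cache_hit_rate(hits: int, lookups: int) -> float:
    """Hits as a fraction of lookups, where lookups = hits + misses."""
    return hits / lookups if lookups else 0.0


def average_from_suffixes(sum_value: float, count: int) -> float:
    """Average of a metric exported with 'sum' and 'count' suffixes (sum / count)."""
    return sum_value / count if count else 0.0


def utilization_and_saturation(busy_fractions: list[float]) -> tuple[float, float]:
    """Assumed model: utilization = mean per-thread busy fraction, saturation = max.

    A saturation well above utilization means one thread is much busier than the rest.
    """
    if not busy_fractions:
        return 0.0, 0.0
    return sum(busy_fractions) / len(busy_fractions), max(busy_fractions)


if __name__ == "__main__":
    print(lid_bloat_factor(lid_limit=1000, used_lids=600))  # 0.4
    print(lid_fragmentation_factor(highest_used_lid=800, used_lids=600))  # 0.25
    print(max_bucket_spread([[{1, 2}, {2, 3}, {3}], [{4}, {5}]]))  # 1.6666666666666667
    print(cache_hit_rate(hits=90, lookups=100))  # 0.9
    print(average_from_suffixes(sum_value=12.0, count=4))  # 3.0
    print(utilization_and_saturation([0.25, 0.25, 1.0]))  # (0.5, 1.0)
```

The inputs in the usage lines are made up for illustration. In the last example one thread is fully busy while the mean is only 0.5, which is the kind of gap the saturation description points to as a single-thread bottleneck.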

Sentinel Metrics

Name Description Unit Suffixes

sentinel.restarts

Number of service restarts done by the sentinel restart count

sentinel.totalRestarts

Total number of service restarts done by the sentinel since the sentinel was started restart last, max, sum

sentinel.uptime

Time the sentinel has been running second last

sentinel.running

Number of services currently run by the sentinel instance count, last

Slobrok Metrics

Name Description Unit Suffixes

slobrok.heartbeats.failed

Number of heartbeat requests failed request count

slobrok.missing.consensus

Number of seconds without full consensus with all other brokers second count

Storage Metrics

Name Description Unit Suffixes

vds.datastored.alldisks.buckets

Number of buckets managed bucket average

vds.datastored.alldisks.docs

Number of documents stored document average

vds.datastored.alldisks.bytes

Number of bytes stored byte average

vds.visitor.allthreads.averagevisitorlifetime

Average lifetime of a visitor millisecond count, max, sum

vds.visitor.allthreads.averagequeuewait

Average time an operation spends in input queue. millisecond count, max, sum

vds.visitor.allthreads.queuesize

Size of input message queue. operation count, max, sum

vds.visitor.allthreads.completed

Number of visitors completed operation rate

vds.visitor.allthreads.created

Number of visitors created. operation rate

vds.visitor.allthreads.failed

Number of visitors failed operation rate

vds.visitor.allthreads.averagemessagesendtime

Average time it takes for messages to be sent to their target (and be replied to) millisecond count, max, sum

vds.visitor.allthreads.averageprocessingtime

Average time used to process visitor requests millisecond count, max, sum

vds.filestor.queuesize

Size of input message queue. operation count, max, sum

vds.filestor.averagequeuewait

Average time an operation spends in input queue. millisecond count, max, sum

vds.filestor.active_operations.size

Number of concurrent active operations operation count, max, sum

vds.filestor.active_operations.latency

Latency (in ms) for completed operations millisecond count, max, sum

vds.filestor.throttle_window_size

Current async operation throttler window size operation count, max, sum

vds.filestor.throttle_waiting_threads

Number of threads waiting to acquire a throttle token thread count, max, sum

vds.filestor.throttle_active_tokens

Current number of active throttle tokens instance count, max, sum

vds.filestor.allthreads.mergemetadatareadlatency

Time spent in a merge step checking the metadata of the current node to see what data it has. millisecond count, max, sum

vds.filestor.allthreads.mergedatareadlatency

Time spent in a merge step reading data that other nodes need. millisecond count, max, sum

vds.filestor.allthreads.mergedatawritelatency

Time spent in a merge step writing data needed by the current node. millisecond count, max, sum

vds.filestor.allthreads.merge_put_latency

Latency of individual puts that are part of merge operations millisecond count, max, sum

vds.filestor.allthreads.merge_remove_latency

Latency of individual removes that are part of merge operations millisecond count, max, sum

vds.filestor.allstripes.throttled_rpc_direct_dispatches

Number of times an RPC thread could not dispatch an async operation directly to Proton because it was disallowed by the throttle policy instance rate

vds.filestor.allstripes.throttled_persistence_thread_polls

Number of times a persistence thread could not immediately dispatch a queued async operation because it was disallowed by the throttle policy instance rate

vds.filestor.allstripes.timeouts_waiting_for_throttle_token

Number of times a persistence thread timed out waiting for an available throttle policy token instance rate

vds.filestor.allthreads.put.count

Number of requests processed. operation rate

vds.filestor.allthreads.put.failed

Number of failed requests. operation rate

vds.filestor.allthreads.put.test_and_set_failed

Number of operations that were skipped because a test-and-set condition was not met operation rate

vds.filestor.allthreads.put.latency

Latency of successful requests. millisecond count, max, sum

vds.filestor.allthreads.put.request_size

Size of requests, in bytes byte count, max, sum

vds.filestor.allthreads.remove.count

Number of requests processed. operation rate

vds.filestor.allthreads.remove.failed

Number of failed requests. operation rate

vds.filestor.allthreads.remove.test_and_set_failed

Number of operations that were skipped because a test-and-set condition was not met operation rate

vds.filestor.allthreads.remove.latency

Latency of successful requests. millisecond count, max, sum

vds.filestor.allthreads.remove.request_size

Size of requests, in bytes byte count, max, sum

vds.filestor.allthreads.get.count

Number of requests processed. operation rate

vds.filestor.allthreads.get.failed

Number of failed requests. operation rate

vds.filestor.allthreads.get.latency

Latency of successful requests. millisecond count, max, sum

vds.filestor.allthreads.get.request_size

Size of requests, in bytes byte count, max, sum

vds.filestor.allthreads.update.count

Number of requests processed. request rate

vds.filestor.allthreads.update.failed

Number of failed requests. request rate

vds.filestor.allthreads.update.test_and_set_failed

Number of requests that were skipped because a test-and-set condition was not met request rate

vds.filestor.allthreads.update.latency

Latency of successful requests. millisecond count, max, sum

vds.filestor.allthreads.update.request_size

Size of requests, in bytes byte count, max, sum

vds.filestor.allthreads.createiterator.count

Number of requests processed. request rate

vds.filestor.allthreads.createiterator.latency

Latency of successful requests. millisecond count, max, sum

vds.filestor.allthreads.visit.count

Number of requests processed. request rate

vds.filestor.allthreads.visit.latency

Latency of successful requests. millisecond count, max, sum

vds.filestor.allthreads.remove_location.count

Number of requests processed. request rate

vds.filestor.allthreads.remove_location.latency

Latency of successful requests. millisecond count, max, sum

vds.filestor.allthreads.splitbuckets.count

Number of requests processed. request rate

vds.filestor.allthreads.joinbuckets.count

Number of requests processed. request rate

vds.filestor.allthreads.deletebuckets.count

Number of requests processed. request rate

vds.filestor.allthreads.deletebuckets.failed

Number of failed requests. request rate

vds.filestor.allthreads.deletebuckets.latency

Latency of successful requests. millisecond count, max, sum

vds.filestor.allthreads.remove_by_gid.count

Number of requests processed. request rate

vds.filestor.allthreads.remove_by_gid.failed

Number of failed requests. request rate

vds.filestor.allthreads.remove_by_gid.latency

Latency of successful requests. millisecond count, max, sum

vds.filestor.allthreads.setbucketstates.count

Number of requests processed. request rate

vds.mergethrottler.averagequeuewaitingtime

Time merges spent in the throttler queue millisecond count, max, sum

vds.mergethrottler.queuesize

Length of merge queue instance count, max, sum

vds.mergethrottler.active_window_size

Number of merges active within the pending window size instance count, max, sum

vds.mergethrottler.estimated_merge_memory_usage

An estimated upper bound of the memory usage (in bytes) of the merges currently in the active window byte count, max, sum

vds.mergethrottler.bounced_due_to_back_pressure

Number of merges bounced due to resource exhaustion back-pressure instance rate

vds.mergethrottler.locallyexecutedmerges.ok

The number of successful merges for 'locallyexecutedmerges' instance rate

vds.mergethrottler.mergechains.ok

The number of successful merges for 'mergechains' operation rate

vds.mergethrottler.mergechains.failures.busy

The number of merges that failed because the storage node was busy operation rate

vds.mergethrottler.mergechains.failures.total

Sum of all failures operation rate

vds.server.network.tls-handshakes-failed

Number of client or server connection attempts that failed during TLS handshaking operation count

vds.server.network.peer-authorization-failures

Number of TLS connection attempts that failed due to bad or missing peer certificate credentials failure count

vds.server.network.client.tls-connections-established

Number of secure mTLS connections established connection count

vds.server.network.server.tls-connections-established

Number of secure mTLS connections established connection count

vds.server.network.client.insecure-connections-established

Number of insecure (plaintext) connections established connection count

vds.server.network.server.insecure-connections-established

Number of insecure (plaintext) connections established connection count

vds.server.network.tls-connections-broken

Number of TLS connections broken due to failures during frame encoding or decoding connection count

vds.server.network.failed-tls-config-reloads

Number of times background reloading of TLS config has failed failure count

vds.server.network.rpc-capability-checks-failed

Number of RPC operations that failed due to one or more missing capabilities failure count

vds.server.network.status-capability-checks-failed

Number of status page operations that failed due to one or more missing capabilities failure count

vds.server.fnet.num-connections

Total number of connection objects connection count