Vespa Metric Set

This document is the reference for the Vespa metric set, including the suffixes available for each metric. If the suffix column contains "N/A", the base name of the corresponding metric is used with no suffix.
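To illustrate how the suffix column can be read, here is a minimal sketch that expands a base name and its suffix list into full metric names. It assumes a suffix is appended to the base name with a "." separator; the helper name `full_metric_names` is hypothetical and not part of any Vespa API.

```python
from typing import List


def full_metric_names(base: str, suffixes: str) -> List[str]:
    """Expand a base metric name and its suffix column into full names.

    Assumption: a suffix is appended to the base name with a "." separator.
    An "N/A" suffix column means the base name is used unchanged.
    """
    cleaned = suffixes.strip()
    if cleaned.upper() == "N/A":
        return [base]
    # Split the comma-separated suffix column and drop empty entries.
    return [f"{base}.{s.strip()}" for s in cleaned.split(",") if s.strip()]


print(full_metric_names("cluster-controller.down.count", "last, max"))
print(full_metric_names("application_generation", "N/A"))
```

The first call yields one name per listed suffix; the second returns the base name alone because its suffix column is "N/A".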

ClusterController Metrics

Name Unit Suffixes Description

cluster-controller.down.count

node last, max Number of content nodes down

cluster-controller.initializing.count

node last, max Number of content nodes initializing

cluster-controller.maintenance.count

node last, max Number of content nodes in maintenance

cluster-controller.retired.count

node last, max Number of content nodes that are retired

cluster-controller.stopping.count

node last Number of content nodes currently stopping

cluster-controller.up.count

node last, max Number of content nodes up

cluster-controller.nodes-not-converged

node max Number of nodes not converging to the latest cluster state version

cluster-controller.cluster-buckets-out-of-sync-ratio

fraction max Ratio of buckets in the cluster currently in need of syncing

cluster-controller.busy-tick-time-ms

millisecond count, last, max, sum Time busy

cluster-controller.idle-tick-time-ms

millisecond count, last, max, sum Time idle

cluster-controller.work-ms

millisecond count, last, sum Time used for actual work

cluster-controller.is-master

binary last, max 1 if this cluster controller is currently the master, or 0 if not

cluster-controller.remote-task-queue.size

operation last Number of remote tasks queued

cluster-controller.resource_usage.nodes_above_limit

node last, max The number of content nodes above resource limit, blocking feed

cluster-controller.resource_usage.max_memory_utilization

fraction last, max Current memory utilisation, for content node with highest value

cluster-controller.resource_usage.max_disk_utilization

fraction last, max Current disk space utilisation, for content node with highest value

cluster-controller.resource_usage.memory_limit

fraction last, max Memory space limit as a fraction of available memory

cluster-controller.resource_usage.disk_limit

fraction last, max Disk space limit as a fraction of available disk space

reindexing.progress

fraction last, max Re-indexing progress
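The resource usage metrics above can be combined to estimate how close a cluster is to blocking feed. The following is a rough sketch, not an official calculation: it assumes the values have already been collected into a dictionary keyed by base name plus a ".last" suffix, and the function name `feed_block_headroom` is hypothetical.

```python
from typing import Dict, Mapping, Union

PREFIX = "cluster-controller.resource_usage."


def feed_block_headroom(snapshot: Mapping[str, float]) -> Dict[str, Union[float, bool]]:
    """Return remaining headroom (as fractions) before the resource limits.

    Assumption: `snapshot` maps "<base name>.last" to the reported value.
    """
    memory = snapshot[PREFIX + "memory_limit.last"] - snapshot[PREFIX + "max_memory_utilization.last"]
    disk = snapshot[PREFIX + "disk_limit.last"] - snapshot[PREFIX + "max_disk_utilization.last"]
    # Any node above its limit blocks feed, per the metric description.
    blocked = snapshot.get(PREFIX + "nodes_above_limit.last", 0) > 0
    return {"memory": memory, "disk": disk, "feed_blocked": blocked}


example = {
    PREFIX + "memory_limit.last": 0.75,
    PREFIX + "max_memory_utilization.last": 0.5,
    PREFIX + "disk_limit.last": 0.75,
    PREFIX + "max_disk_utilization.last": 0.25,
    PREFIX + "nodes_above_limit.last": 0,
}
print(feed_block_headroom(example))
```

A headroom value at or below zero for the most utilized node would be consistent with `nodes_above_limit` being non-zero.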

Container Metrics

Name Unit Suffixes Description

http.status.1xx

response rate Number of responses with a 1xx status

http.status.2xx

response rate Number of responses with a 2xx status

http.status.3xx

response rate Number of responses with a 3xx status

http.status.4xx

response rate Number of responses with a 4xx status

http.status.5xx

response rate Number of responses with a 5xx status

application_generation

version N/A The currently live application config generation (aka session id)

jdisc.gc.count

operation average, last, max Number of JVM garbage collections done

jdisc.gc.ms

millisecond average, last, max Time spent in JVM garbage collection

jdisc.jvm

version last JVM runtime version

jdisc.memory_mappings

operation max JDISC Memory mappings

jdisc.open_file_descriptors

item max JDISC Open file descriptors

jdisc.thread_pool.unhandled_exceptions

thread count, last, max, min, sum Number of exceptions thrown by tasks

jdisc.thread_pool.work_queue.capacity

thread count, last, max, min, sum Capacity of the task queue

jdisc.thread_pool.work_queue.size

thread count, last, max, min, sum Size of the task queue

jdisc.thread_pool.rejected_tasks

thread count, last, max, min, sum Number of tasks rejected by the thread pool

jdisc.thread_pool.size

thread count, last, max, min, sum Size of the thread pool

jdisc.thread_pool.max_allowed_size

thread count, last, max, min, sum The maximum allowed number of threads in the pool

jdisc.thread_pool.active_threads

thread count, last, max, min, sum Number of threads that are active

jdisc.deactivated_containers.total

item last, sum JDISC Deactivated container instances

jdisc.deactivated_containers.with_retained_refs.last

item last JDISC Deactivated container nodes with retained refs

jdisc.application.failed_component_graphs

item rate JDISC Application failed component graphs

jdisc.application.component_graph.creation_time_millis

millisecond last JDISC Application component graph creation time

jdisc.application.component_graph.reconfigurations

item rate JDISC Application component graph reconfigurations

jdisc.singleton.is_active

item last, max, min JDISC Singleton is active

jdisc.singleton.activation.count

operation last JDISC Singleton activations

jdisc.singleton.activation.failure.count

operation last JDISC Singleton activation failures

jdisc.singleton.activation.millis

millisecond last JDISC Singleton activation time

jdisc.singleton.deactivation.count

operation last JDISC Singleton deactivations

jdisc.singleton.deactivation.failure.count

operation last JDISC Singleton deactivation failures

jdisc.singleton.deactivation.millis

millisecond last JDISC Singleton deactivation time

jdisc.http.ssl.handshake.failure.missing_client_cert

operation rate JDISC HTTP SSL Handshake failures due to missing client certificate

jdisc.http.ssl.handshake.failure.expired_client_cert

operation rate JDISC HTTP SSL Handshake failures due to expired client certificate

jdisc.http.ssl.handshake.failure.invalid_client_cert

operation rate JDISC HTTP SSL Handshake failures due to invalid client certificate

jdisc.http.ssl.handshake.failure.incompatible_protocols

operation rate JDISC HTTP SSL Handshake failures due to incompatible protocols

jdisc.http.ssl.handshake.failure.incompatible_chifers

operation rate JDISC HTTP SSL Handshake failures due to incompatible ciphers

jdisc.http.ssl.handshake.failure.connection_closed

operation rate JDISC HTTP SSL Handshake failures due to connection closed

jdisc.http.ssl.handshake.failure.unknown

operation rate JDISC HTTP SSL Handshake failures for unknown reason

jdisc.http.request.prematurely_closed

request rate HTTP requests prematurely closed

jdisc.http.request.requests_per_connection

request average, count, max, min, sum HTTP requests per connection

jdisc.http.request.uri_length

byte count, max, sum HTTP URI length

jdisc.http.request.content_size

byte count, max, sum HTTP request content size

jdisc.http.requests

request count, rate HTTP requests

jdisc.http.filter.rule.blocked_requests

request rate Number of requests blocked by filter

jdisc.http.filter.rule.allowed_requests

request rate Number of requests allowed by filter

jdisc.http.filtering.request.handled

request rate Number of filtering requests handled

jdisc.http.filtering.request.unhandled

request rate Number of filtering requests unhandled

jdisc.http.filtering.response.handled

request rate Number of filtering responses handled

jdisc.http.filtering.response.unhandled

request rate Number of filtering responses unhandled

jdisc.http.handler.unhandled_exceptions

request rate Number of unhandled exceptions in handler

jdisc.tls.capability_checks.succeeded

operation rate Number of TLS capability checks succeeded

jdisc.tls.capability_checks.failed

operation rate Number of TLS capability checks failed

jdisc.http.jetty.threadpool.thread.max

thread count, last, max, min, sum Configured maximum number of threads

jdisc.http.jetty.threadpool.thread.min

thread count, last, max, min, sum Configured minimum number of threads

jdisc.http.jetty.threadpool.thread.reserved

thread count, last, max, min, sum Configured number of reserved threads or -1 for heuristic

jdisc.http.jetty.threadpool.thread.busy

thread count, last, max, min, sum Number of threads executing internal and transient jobs

jdisc.http.jetty.threadpool.thread.total

thread count, last, max, min, sum Current number of threads

jdisc.http.jetty.threadpool.queue.size

thread count, last, max, min, sum Current size of the job queue

serverNumOpenConnections

connection average, last, max The number of currently open connections

serverNumConnections

connection average, last, max The total number of connections opened

serverBytesReceived

byte count, sum The number of bytes received by the server

serverBytesSent

byte count, sum The number of bytes sent from the server

handled.requests

operation count The number of requests handled per metrics snapshot

handled.latency

millisecond count, max, sum The time used for requests during this metrics snapshot

httpapi_latency

millisecond count, max, sum Duration for requests to the HTTP document APIs

httpapi_pending

operation count, max, sum Document operations pending execution

httpapi_num_operations

operation rate Total number of document operations performed

httpapi_num_updates

operation rate Document update operations performed

httpapi_num_removes

operation rate Document remove operations performed

httpapi_num_puts

operation rate Document put operations performed

httpapi_succeeded

operation rate Document operations that succeeded

httpapi_failed

operation rate Document operations that failed

httpapi_parse_error

operation rate Document operations that failed due to document parse errors

httpapi_condition_not_met

operation rate Document operations not applied due to condition not met

httpapi_not_found

operation rate Document operations not applied due to document not found

httpapi_failed_unknown

operation rate Document operations failed by unknown cause

httpapi_failed_timeout

operation rate Document operations failed by timeout

httpapi_failed_insufficient_storage

operation rate Document operations failed by insufficient storage

mem.heap.total

byte average Total available heap memory

mem.heap.free

byte average Free heap memory

mem.heap.used

byte average, max Currently used heap memory

mem.direct.total

byte average Total available direct memory

mem.direct.free

byte average Currently free direct memory

mem.direct.used

byte average, max Direct memory currently used

mem.direct.count

byte max Number of direct memory allocations

mem.native.total

byte average Total available native memory

mem.native.free

byte average Currently free native memory

mem.native.used

byte average Native memory currently used

athenz-tenant-cert.expiry.seconds

second last, max, min Time remaining until Athenz tenant certificate expires

container-iam-role.expiry.seconds

second N/A Time remaining until IAM role expires

peak_qps

query_per_second max The highest number of qps for a second for this metrics snapshot

search_connections

connection count, max, sum Number of search connections

feed.operations

operation rate Number of document feed operations

feed.latency

millisecond count, max, sum Feed latency

feed.http-requests

operation count, rate Feed HTTP requests

queries

operation rate Query volume

query_container_latency

millisecond count, max, sum The query execution time consumed in the container

query_latency

millisecond count, max, sum The overall query latency as seen by the container

query_timeout

millisecond count, max, min, sum The amount of time allowed for query execution, from the client

failed_queries

operation rate The number of failed queries

degraded_queries

operation rate The number of degraded queries, e.g. due to some content nodes not responding in time

hits_per_query

hit_per_query count, max, sum The number of hits returned

query_hit_offset

hit count, max, sum The offset for hits returned

documents_covered

document count The combined number of documents considered during query evaluation

documents_total

document count The number of documents to be evaluated if all requests had been fully executed

documents_target_total

document count The target number of total documents to be evaluated when all data is in sync

jdisc.render.latency

nanosecond average, count, last, max, min, sum The time used by the container to render responses

query_item_count

item count, max, sum The number of query items (terms, phrases, etc)

docproc.proctime

millisecond count, max, sum Time spent processing document

docproc.documents

document count, max, min, sum Number of processed documents

totalhits_per_query

hit_per_query count, max, sum The total number of documents found to match queries

empty_results

operation rate Number of queries matching no documents

requestsOverQuota

operation count, rate The number of requests rejected due to exceeding quota

relevance.at_1

score count, sum The relevance of hit number 1

relevance.at_3

score count, sum The relevance of hit number 3

relevance.at_10

score count, sum The relevance of hit number 10

error.timeout

operation rate Requests that timed out

error.backends_oos

operation rate Requests that failed due to no available backend nodes

error.plugin_failure

operation rate Requests that failed due to plugin failure

error.backend_communication_error

operation rate Requests that failed due to backend communication error

error.empty_document_summaries

operation rate Requests that failed due to missing document summaries

error.invalid_query_parameter

operation rate Requests that failed due to invalid query parameters

error.internal_server_error

operation rate Requests that failed due to internal server error

error.misconfigured_server

operation rate Requests that failed due to misconfigured server

error.invalid_query_transformation

operation rate Requests that failed due to invalid query transformation

error.results_with_errors

operation rate The number of queries with error payload

error.unspecified

operation rate Requests that failed for an unspecified reason

error.unhandled_exception

operation rate Requests that failed due to an unhandled exception

serverRejectedRequests

operation count, rate Deprecated. Use jdisc.thread_pool.rejected_tasks instead.

serverThreadPoolSize

thread last, max Deprecated. Use jdisc.thread_pool.size instead.

serverActiveThreads

thread count, last, max, min, sum Deprecated. Use jdisc.thread_pool.active_threads instead.

jrt.transport.tls-certificate-verification-failures

failure N/A TLS certificate verification failures

jrt.transport.peer-authorization-failures

failure N/A TLS peer authorization failures

jrt.transport.server.tls-connections-established

connection N/A TLS server connections established

jrt.transport.client.tls-connections-established

connection N/A TLS client connections established

jrt.transport.server.unencrypted-connections-established

connection N/A Unencrypted server connections established

jrt.transport.client.unencrypted-connections-established

connection N/A Unencrypted client connections established

embedder.latency

millisecond count, max, sum Time spent creating an embedding

embedder.sequence_length

byte count, max, sum Size of sequence produced by tokenizer
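Several latency metrics in the table above, such as query_latency, list count, max, and sum suffixes but no average. A mean over a snapshot can be derived as sum divided by count. The sketch below assumes the values are available in a dictionary keyed by "<base name>.<suffix>"; the helper name `mean_from_snapshot` is hypothetical.

```python
from typing import Mapping, Optional


def mean_from_snapshot(values: Mapping[str, float], base: str) -> Optional[float]:
    """Derive the mean of a metric from its .sum and .count values.

    Assumption: `values` is keyed by "<base name>.<suffix>".
    Returns None when there were no samples or a value is missing.
    """
    count = values.get(f"{base}.count")
    total = values.get(f"{base}.sum")
    if not count or total is None:
        return None
    return total / count


snapshot = {"query_latency.count": 4, "query_latency.sum": 100.0, "query_latency.max": 40.0}
print(mean_from_snapshot(snapshot, "query_latency"))
```

Returning None rather than zero for an empty snapshot keeps "no traffic" distinguishable from "zero latency".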

Distributor Metrics

Name Unit Suffixes Description

vds.idealstate.buckets_rechecking

bucket average The number of buckets that we are rechecking for ideal state operations

vds.idealstate.idealstate_diff

bucket average A number representing the current difference from the ideal state. It decreases steadily as the system gets closer to the ideal state

vds.idealstate.buckets_toofewcopies

bucket average The number of buckets the distributor controls that have less than the desired redundancy

vds.idealstate.buckets_toomanycopies

bucket average The number of buckets the distributor controls that have more than the desired redundancy

vds.idealstate.buckets

bucket average The number of buckets the distributor controls

vds.idealstate.buckets_notrusted

bucket average The number of buckets that have no trusted copies.

vds.idealstate.bucket_replicas_moving_out

bucket average Bucket replicas that should be moved out, e.g. retirement case or node added to cluster that has higher ideal state priority.

vds.idealstate.bucket_replicas_copying_out

bucket average Bucket replicas that should be copied out, e.g. node is in ideal state but might have to provide data to other nodes in a merge

vds.idealstate.bucket_replicas_copying_in

bucket average Bucket replicas that should be copied in, e.g. node does not have a replica for a bucket that it is in ideal state for

vds.idealstate.bucket_replicas_syncing

bucket average Bucket replicas that need syncing due to mismatching metadata

vds.idealstate.max_observed_time_since_last_gc_sec

second average Maximum time (in seconds) since GC was last successfully run for a bucket. Aggregated max value across all buckets on the distributor.

vds.idealstate.delete_bucket.done_ok

operation rate The number of operations successfully performed

vds.idealstate.delete_bucket.done_failed

operation rate The number of operations that failed

vds.idealstate.delete_bucket.pending

operation average The number of operations pending

vds.idealstate.merge_bucket.done_ok

operation rate The number of operations successfully performed

vds.idealstate.merge_bucket.done_failed

operation rate The number of operations that failed

vds.idealstate.merge_bucket.pending

operation average The number of operations pending

vds.idealstate.merge_bucket.blocked

operation rate The number of operations blocked by blocking operation starter

vds.idealstate.merge_bucket.throttled

operation rate The number of operations throttled by throttling operation starter

vds.idealstate.merge_bucket.source_only_copy_changed

operation rate The number of merge operations where source-only copy changed

vds.idealstate.merge_bucket.source_only_copy_delete_blocked

operation rate The number of merge operations where delete of unchanged source-only copies was blocked

vds.idealstate.merge_bucket.source_only_copy_delete_failed

operation rate The number of merge operations where delete of unchanged source-only copies failed

vds.idealstate.split_bucket.done_ok

operation rate The number of operations successfully performed

vds.idealstate.split_bucket.done_failed

operation rate The number of operations that failed

vds.idealstate.split_bucket.pending

operation average The number of operations pending

vds.idealstate.join_bucket.done_ok

operation rate The number of operations successfully performed

vds.idealstate.join_bucket.done_failed

operation rate The number of operations that failed

vds.idealstate.join_bucket.pending

operation average The number of operations pending

vds.idealstate.garbage_collection.done_ok

operation rate The number of operations successfully performed

vds.idealstate.garbage_collection.done_failed

operation rate The number of operations that failed

vds.idealstate.garbage_collection.pending

operation average The number of operations pending

vds.idealstate.garbage_collection.documents_removed

document count, rate Number of documents removed by GC operations

vds.distributor.puts.latency

millisecond count, max, sum The latency of put operations

vds.distributor.puts.ok

operation rate The number of successful put operations performed

vds.distributor.puts.failures.total

operation rate Sum of all failures

vds.distributor.puts.failures.notfound

operation rate The number of operations that failed because the document did not exist

vds.distributor.puts.failures.test_and_set_failed

operation rate The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document

vds.distributor.puts.failures.concurrent_mutations

operation rate The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID

vds.distributor.puts.failures.notconnected

operation rate The number of operations discarded because there were no available storage nodes to send to

vds.distributor.puts.failures.notready

operation rate The number of operations discarded because distributor was not ready

vds.distributor.puts.failures.wrongdistributor

operation rate The number of operations discarded because they were sent to the wrong distributor

vds.distributor.puts.failures.safe_time_not_reached

operation rate The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed

vds.distributor.puts.failures.storagefailure

operation rate The number of operations that failed in storage

vds.distributor.puts.failures.timeout

operation rate The number of operations that failed because the operation timed out towards storage

vds.distributor.puts.failures.busy

operation rate The number of messages from storage that failed because the storage node was busy

vds.distributor.puts.failures.inconsistent_bucket

operation rate The number of operations failed due to buckets being in an inconsistent state or not found

vds.distributor.removes.latency

millisecond count, max, sum The latency of remove operations

vds.distributor.removes.ok

operation rate The number of successful remove operations performed

vds.distributor.removes.failures.total

operation rate Sum of all failures

vds.distributor.removes.failures.notfound

operation rate The number of operations that failed because the document did not exist

vds.distributor.removes.failures.test_and_set_failed

operation rate The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document

vds.distributor.removes.failures.concurrent_mutations

operation rate The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID

vds.distributor.updates.latency

millisecond count, max, sum The latency of update operations

vds.distributor.updates.ok

operation rate The number of successful update operations performed

vds.distributor.updates.failures.total

operation rate Sum of all failures

vds.distributor.updates.failures.notfound

operation rate The number of operations that failed because the document did not exist

vds.distributor.updates.failures.test_and_set_failed

operation rate The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document

vds.distributor.updates.failures.concurrent_mutations

operation rate The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID

vds.distributor.updates.diverging_timestamp_updates

operation rate Number of updates that report they were performed against divergent version timestamps on different replicas

vds.distributor.removelocations.ok

operation rate The number of successful removelocations operations performed

vds.distributor.removelocations.failures.total

operation rate Sum of all failures

vds.distributor.gets.latency

millisecond count, max, sum The average latency of get operations

vds.distributor.gets.ok

operation rate The number of successful get operations performed

vds.distributor.gets.failures.total

operation rate Sum of all failures

vds.distributor.gets.failures.notfound

operation rate The number of operations that failed because the document did not exist

vds.distributor.visitor.latency

millisecond count, max, sum The average latency of visitor operations

vds.distributor.visitor.ok

operation rate The number of successful visitor operations performed

vds.distributor.visitor.failures.total

operation rate Sum of all failures

vds.distributor.visitor.failures.notready

operation rate The number of operations discarded because distributor was not ready

vds.distributor.visitor.failures.notconnected

operation rate The number of operations discarded because there were no available storage nodes to send to

vds.distributor.visitor.failures.wrongdistributor

operation rate The number of operations discarded because they were sent to the wrong distributor

vds.distributor.visitor.failures.safe_time_not_reached

operation rate The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed

vds.distributor.visitor.failures.storagefailure

operation rate The number of operations that failed in storage

vds.distributor.visitor.failures.timeout

operation rate The number of operations that failed because the operation timed out towards storage

vds.distributor.visitor.failures.busy

operation rate The number of messages from storage that failed because the storage node was busy

vds.distributor.visitor.failures.inconsistent_bucket

operation rate The number of operations failed due to buckets being in an inconsistent state or not found

vds.distributor.visitor.failures.notfound

operation rate The number of operations that failed because the document did not exist

vds.distributor.docsstored

document average Number of documents stored in all buckets controlled by this distributor

vds.distributor.bytesstored

byte average Number of bytes stored in all buckets controlled by this distributor

vds.bouncer.clock_skew_aborts

operation count Number of client operations that were aborted due to clock skew between sender and receiver exceeding acceptable range
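The ok and failures.total rates listed above for each distributor operation type can be combined into a failure fraction. This is an illustrative sketch only: it assumes the rates are available in a dictionary keyed by "<base name>.rate", and the function name `failure_fraction` is hypothetical.

```python
from typing import Mapping, Optional


def failure_fraction(snapshot: Mapping[str, float], operation: str) -> Optional[float]:
    """Fraction of distributor operations of one type that failed.

    `operation` is one of the operation groups in the table, e.g. "puts".
    Assumption: `snapshot` is keyed by "<base name>.rate".
    """
    ok = snapshot.get(f"vds.distributor.{operation}.ok.rate", 0.0)
    failed = snapshot.get(f"vds.distributor.{operation}.failures.total.rate", 0.0)
    total = ok + failed
    if total <= 0:
        return None  # no traffic in this snapshot
    return failed / total


rates = {
    "vds.distributor.puts.ok.rate": 9.0,
    "vds.distributor.puts.failures.total.rate": 1.0,
}
print(failure_fraction(rates, "puts"))
```

The specific failure metrics (notfound, timeout, busy, and so on) can then be inspected to see which cause dominates.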

Logd Metrics

Name Unit Suffixes Description

logd.processed.lines

item count Number of log lines processed

NodeAdmin Metrics

Name Unit Suffixes Description

endpoint.certificate.expiry.seconds

second N/A Time until node endpoint certificate expires

node-certificate.expiry.seconds

second N/A Time until node certificate expires

SearchNode Metrics

Name Unit Suffixes Description

content.proton.config.generation

version last The oldest config generation used by this search node

content.proton.documentdb.documents.total

document last, max The total number of documents in this document db (ready + not-ready)

content.proton.documentdb.documents.ready

document last, max The number of ready documents in this document db

content.proton.documentdb.documents.active

document last, max The number of active / searchable documents in this document db

content.proton.documentdb.documents.removed

document last, max The number of removed documents in this document db

content.proton.documentdb.index.docs_in_memory

document last, max Number of documents in memory index

content.proton.documentdb.disk_usage

byte last The total disk usage (in bytes) for this document db

content.proton.documentdb.memory_usage.allocated_bytes

byte max The number of allocated bytes

content.proton.documentdb.heart_beat_age

second last, min How long ago (in seconds) the heartbeat maintenance job was run

content.proton.docsum.docs

document rate Total docsums returned

content.proton.docsum.latency

millisecond count, max, sum Docsum request latency

content.proton.search_protocol.query.latency

second count, max, sum Query request latency (seconds)

content.proton.search_protocol.query.request_size

byte count, max, sum Query request size (network bytes)

content.proton.search_protocol.query.reply_size

byte count, max, sum Query reply size (network bytes)

content.proton.search_protocol.docsum.latency

second average, count, max, sum Docsum request latency (seconds)

content.proton.search_protocol.docsum.request_size

byte count, max, sum Docsum request size (network bytes)

content.proton.search_protocol.docsum.reply_size

byte count, max, sum Docsum reply size (network bytes)

content.proton.search_protocol.docsum.requested_documents

document count, max, sum Total requested document summaries

content.proton.executor.proton.queuesize

task count, max, sum Size of executor proton task queue

content.proton.executor.proton.accepted

task rate Number of executor proton accepted tasks

content.proton.executor.proton.wakeups

wakeup rate Number of times an executor proton worker thread has been woken up

content.proton.executor.proton.utilization

fraction count, max, sum Ratio of time the executor proton worker threads have been active

content.proton.executor.flush.queuesize

task count, max, sum Size of executor flush task queue

content.proton.executor.flush.accepted

task rate Number of accepted executor flush tasks

content.proton.executor.flush.wakeups

wakeup rate Number of times an executor flush worker thread has been woken up

content.proton.executor.flush.utilization

fraction count, max, sum Ratio of time the executor flush worker threads have been active

content.proton.executor.match.queuesize

task count, max, sum Size of executor match task queue

content.proton.executor.match.accepted

task rate Number of accepted executor match tasks

content.proton.executor.match.wakeups

wakeup rate Number of times an executor match worker thread has been woken up

content.proton.executor.match.utilization

fraction count, max, sum Ratio of time the executor match worker threads have been active

content.proton.executor.docsum.queuesize

task count, max, sum Size of executor docsum task queue

content.proton.executor.docsum.accepted

task rate Number of executor accepted docsum tasks

content.proton.executor.docsum.wakeups

wakeup rate Number of times an executor docsum worker thread has been woken up

content.proton.executor.docsum.utilization

fraction count, max, sum Ratio of time the executor docsum worker threads have been active

content.proton.executor.shared.queuesize

task count, max, sum Size of executor shared task queue

content.proton.executor.shared.accepted

task rate Number of executor shared accepted tasks

content.proton.executor.shared.wakeups

wakeup rate Number of times an executor shared worker thread has been woken up

content.proton.executor.shared.utilization

fraction count, max, sum Ratio of time the executor shared worker threads have been active

content.proton.executor.warmup.queuesize

task count, max, sum Size of executor warmup task queue

content.proton.executor.warmup.accepted

task rate Number of accepted executor warmup tasks

content.proton.executor.warmup.wakeups

wakeup rate Number of times an executor warmup worker thread has been woken up

content.proton.executor.warmup.utilization

fraction count, max, sum Ratio of time the executor warmup worker threads have been active

content.proton.executor.field_writer.queuesize

task count, max, sum Size of executor field writer task queue

content.proton.executor.field_writer.accepted

task rate Number of accepted executor field writer tasks

content.proton.executor.field_writer.wakeups

wakeup rate Number of times an executor field writer worker thread has been woken up

content.proton.executor.field_writer.utilization

fraction count, max, sum Ratio of time the executor field writer worker threads have been active

content.proton.executor.field_writer.saturation

fraction count, max, sum Ratio indicating the max saturation of underlying worker threads. A higher saturation than utilization indicates a bottleneck in one of the worker threads.

content.proton.documentdb.job.total

fraction average The job load average total of all job metrics

content.proton.documentdb.job.attribute_flush

fraction average Flushing of attribute vector(s) to disk

content.proton.documentdb.job.memory_index_flush

fraction average Flushing of memory index to disk

content.proton.documentdb.job.disk_index_fusion

fraction average Fusion of disk indexes

content.proton.documentdb.job.document_store_flush

fraction average Flushing of document store to disk

content.proton.documentdb.job.document_store_compact

fraction average Compaction of document store on disk

content.proton.documentdb.job.bucket_move

fraction average Moving of buckets between 'ready' and 'notready' sub databases

content.proton.documentdb.job.lid_space_compact

fraction average Compaction of lid space in document meta store and attribute vectors

content.proton.documentdb.job.removed_documents_prune

fraction average Pruning of removed documents in 'removed' sub database

content.proton.documentdb.threading_service.master.queuesize

task count, max, sum Size of threading service master task queue

content.proton.documentdb.threading_service.master.accepted

task rate Number of accepted threading service master tasks

content.proton.documentdb.threading_service.master.wakeups

wakeup rate Number of times a threading service master worker thread has been woken up

content.proton.documentdb.threading_service.master.utilization

fraction count, max, sum Ratio of time the threading service master worker threads have been active

content.proton.documentdb.threading_service.index.queuesize

task count, max, sum Size of threading service index task queue

content.proton.documentdb.threading_service.index.accepted

task rate Number of accepted threading service index tasks

content.proton.documentdb.threading_service.index.wakeups

wakeup rate Number of times a threading service index worker thread has been woken up

content.proton.documentdb.threading_service.index.utilization

fraction count, max, sum Ratio of time the threading service index worker threads have been active

content.proton.documentdb.threading_service.summary.queuesize

task count, max, sum Size of threading service summary task queue

content.proton.documentdb.threading_service.summary.accepted

task rate Number of accepted threading service summary tasks

content.proton.documentdb.threading_service.summary.wakeups

wakeup rate Number of times a threading service summary worker thread has been woken up

content.proton.documentdb.threading_service.summary.utilization

fraction count, max, sum Ratio of time the threading service summary worker threads have been active

content.proton.documentdb.ready.lid_space.lid_bloat_factor

fraction average The bloat factor of this lid space, indicating the total amount of holes in the allocated lid space ((lid_limit - used_lids) / lid_limit)

content.proton.documentdb.ready.lid_space.lid_fragmentation_factor

fraction average The fragmentation factor of this lid space, indicating the amount of holes in the currently used part of the lid space ((highest_used_lid - used_lids) / highest_used_lid)

content.proton.documentdb.ready.lid_space.lid_limit

documentid last, max The size of the allocated lid space

content.proton.documentdb.ready.lid_space.highest_used_lid

documentid last, max The highest used lid

content.proton.documentdb.ready.lid_space.used_lids

documentid last, max The number of lids used

content.proton.documentdb.notready.lid_space.lid_bloat_factor

fraction average The bloat factor of this lid space, indicating the total amount of holes in the allocated lid space ((lid_limit - used_lids) / lid_limit)

content.proton.documentdb.notready.lid_space.lid_fragmentation_factor

fraction average The fragmentation factor of this lid space, indicating the amount of holes in the currently used part of the lid space ((highest_used_lid - used_lids) / highest_used_lid)

content.proton.documentdb.notready.lid_space.lid_limit

documentid last, max The size of the allocated lid space

content.proton.documentdb.notready.lid_space.highest_used_lid

documentid last, max The highest used lid

content.proton.documentdb.notready.lid_space.used_lids

documentid last, max The number of lids used

content.proton.documentdb.removed.lid_space.lid_bloat_factor

fraction average The bloat factor of this lid space, indicating the total amount of holes in the allocated lid space ((lid_limit - used_lids) / lid_limit)

content.proton.documentdb.removed.lid_space.lid_fragmentation_factor

fraction average The fragmentation factor of this lid space, indicating the amount of holes in the currently used part of the lid space ((highest_used_lid - used_lids) / highest_used_lid)

content.proton.documentdb.removed.lid_space.lid_limit

documentid last, max The size of the allocated lid space

content.proton.documentdb.removed.lid_space.highest_used_lid

documentid last, max The highest used lid

content.proton.documentdb.removed.lid_space.used_lids

documentid last, max The number of lids used

content.proton.documentdb.bucket_move.buckets_pending

bucket last, max, sum The number of buckets left to move

content.proton.resource_usage.disk

fraction average The relative amount of disk used by this content node (transient usage not included, value in the range [0, 1]). Same value as reported to the cluster controller

content.proton.resource_usage.disk_usage.total

fraction max The total relative amount of disk used by this content node (value in the range [0, 1])

content.proton.resource_usage.disk_usage.total_utilization

fraction max The relative amount of disk used compared to the content node disk resource limit

content.proton.resource_usage.disk_usage.transient

fraction max The relative amount of transient disk used by this content node (value in the range [0, 1])

content.proton.resource_usage.memory

fraction average The relative amount of memory used by this content node (transient usage not included, value in the range [0, 1]). Same value as reported to the cluster controller

content.proton.resource_usage.memory_usage.total

fraction max The total relative amount of memory used by this content node (value in the range [0, 1])

content.proton.resource_usage.memory_usage.total_utilization

fraction max The relative amount of memory used compared to the content node memory resource limit

content.proton.resource_usage.memory_usage.transient

fraction max The relative amount of transient memory used by this content node (value in the range [0, 1])

content.proton.resource_usage.memory_mappings

file max The number of memory mapped files

content.proton.resource_usage.open_file_descriptors

file max The number of open files

content.proton.resource_usage.feeding_blocked

binary last, max Whether feeding is blocked due to resource limits being reached (value is either 0 or 1)

content.proton.resource_usage.malloc_arena

byte max Size of malloc arena

content.proton.documentdb.attribute.resource_usage.address_space

fraction max The max relative address space used among components in all attribute vectors in this document db (value in the range [0, 1])

content.proton.documentdb.attribute.resource_usage.feeding_blocked

binary max Whether feeding is blocked due to attribute resource limits being reached (value is either 0 or 1)

content.proton.resource_usage.cpu_util.setup

fraction count, max, sum CPU used by system init and (re-)configuration

content.proton.resource_usage.cpu_util.read

fraction count, max, sum CPU used by reading data from the system

content.proton.resource_usage.cpu_util.write

fraction count, max, sum CPU used by writing data to the system

content.proton.resource_usage.cpu_util.compact

fraction count, max, sum CPU used by internal data re-structuring

content.proton.resource_usage.cpu_util.other

fraction count, max, sum CPU used by work not classified as a specific category

content.proton.transactionlog.entries

record average The current number of entries in the transaction log

content.proton.transactionlog.disk_usage

byte average The disk usage (in bytes) of the transaction log

content.proton.transactionlog.replay_time

second last, max The replay time (in seconds) of the transaction log during start-up

content.proton.documentdb.ready.document_store.disk_usage

byte average Disk space usage in bytes

content.proton.documentdb.ready.document_store.disk_bloat

byte average Disk space bloat in bytes

content.proton.documentdb.ready.document_store.max_bucket_spread

fraction average Max bucket spread in underlying files (sum(unique buckets in each chunk)/unique buckets in file)

content.proton.documentdb.ready.document_store.memory_usage.allocated_bytes

byte average The number of allocated bytes

content.proton.documentdb.ready.document_store.memory_usage.used_bytes

byte average The number of used bytes (<= allocated_bytes)

content.proton.documentdb.ready.document_store.memory_usage.onhold_bytes

byte average The number of bytes on hold

content.proton.documentdb.notready.document_store.disk_usage

byte average Disk space usage in bytes

content.proton.documentdb.notready.document_store.disk_bloat

byte average Disk space bloat in bytes

content.proton.documentdb.notready.document_store.max_bucket_spread

fraction average Max bucket spread in underlying files (sum(unique buckets in each chunk)/unique buckets in file)

content.proton.documentdb.notready.document_store.memory_usage.allocated_bytes

byte average The number of allocated bytes

content.proton.documentdb.notready.document_store.memory_usage.used_bytes

byte average The number of used bytes (<= allocated_bytes)

content.proton.documentdb.notready.document_store.memory_usage.dead_bytes

byte average The number of dead bytes (<= used_bytes)

content.proton.documentdb.notready.document_store.memory_usage.onhold_bytes

byte average The number of bytes on hold

content.proton.documentdb.removed.document_store.disk_usage

byte average Disk space usage in bytes

content.proton.documentdb.removed.document_store.disk_bloat

byte average Disk space bloat in bytes

content.proton.documentdb.removed.document_store.max_bucket_spread

fraction average Max bucket spread in underlying files (sum(unique buckets in each chunk)/unique buckets in file)

content.proton.documentdb.removed.document_store.memory_usage.allocated_bytes

byte average The number of allocated bytes

content.proton.documentdb.removed.document_store.memory_usage.used_bytes

byte average The number of used bytes (<= allocated_bytes)

content.proton.documentdb.removed.document_store.memory_usage.dead_bytes

byte average The number of dead bytes (<= used_bytes)

content.proton.documentdb.removed.document_store.memory_usage.onhold_bytes

byte average The number of bytes on hold

content.proton.documentdb.ready.document_store.cache.memory_usage

byte average Memory usage of the cache (in bytes)

content.proton.documentdb.ready.document_store.cache.hit_rate

fraction average Rate of hits in the cache compared to number of lookups

content.proton.documentdb.ready.document_store.cache.lookups

operation rate Number of lookups in the cache (hits + misses)

content.proton.documentdb.ready.document_store.cache.invalidations

operation rate Number of invalidations (erased elements) in the cache.

content.proton.documentdb.notready.document_store.cache.memory_usage

byte average Memory usage of the cache (in bytes)

content.proton.documentdb.notready.document_store.cache.hit_rate

fraction average Rate of hits in the cache compared to number of lookups

content.proton.documentdb.notready.document_store.cache.lookups

operation rate Number of lookups in the cache (hits + misses)

content.proton.documentdb.notready.document_store.cache.invalidations

operation rate Number of invalidations (erased elements) in the cache.

content.proton.documentdb.ready.attribute.memory_usage.allocated_bytes

byte average The number of allocated bytes

content.proton.documentdb.ready.attribute.memory_usage.used_bytes

byte average The number of used bytes (<= allocated_bytes)

content.proton.documentdb.ready.attribute.memory_usage.dead_bytes

byte average The number of dead bytes (<= used_bytes)

content.proton.documentdb.ready.attribute.memory_usage.onhold_bytes

byte average The number of bytes on hold

content.proton.documentdb.ready.attribute.disk_usage

byte average Disk space usage (in bytes) of the flushed snapshot of this attribute for this document type

content.proton.documentdb.notready.attribute.memory_usage.allocated_bytes

byte average The number of allocated bytes

content.proton.documentdb.notready.attribute.memory_usage.used_bytes

byte average The number of used bytes (<= allocated_bytes)

content.proton.documentdb.notready.attribute.memory_usage.dead_bytes

byte average The number of dead bytes (<= used_bytes)

content.proton.documentdb.notready.attribute.memory_usage.onhold_bytes

byte average The number of bytes on hold

content.proton.index.cache.postinglist.memory_usage

byte average Memory usage of the cache (in bytes). Contains disk index posting list files across all document types

content.proton.index.cache.postinglist.hit_rate

fraction average Rate of hits in the cache compared to number of lookups. Contains disk index posting list files across all document types

content.proton.index.cache.postinglist.lookups

operation rate Number of lookups in the cache (hits + misses). Contains disk index posting list files across all document types

content.proton.index.cache.postinglist.invalidations

operation rate Number of invalidations (erased elements) in the cache. Contains disk index posting list files across all document types

content.proton.index.cache.bitvector.memory_usage

byte average Memory usage of the cache (in bytes). Contains disk index bitvector files across all document types

content.proton.index.cache.bitvector.hit_rate

fraction average Rate of hits in the cache compared to number of lookups. Contains disk index bitvector files across all document types

content.proton.index.cache.bitvector.lookups

operation rate Number of lookups in the cache (hits + misses). Contains disk index bitvector files across all document types

content.proton.index.cache.bitvector.invalidations

operation rate Number of invalidations (erased elements) in the cache. Contains disk index bitvector files across all document types

content.proton.documentdb.index.memory_usage.allocated_bytes

byte average The number of allocated bytes for the memory index for this document type

content.proton.documentdb.index.memory_usage.used_bytes

byte average The number of used bytes (<= allocated_bytes) for the memory index for this document type

content.proton.documentdb.index.memory_usage.dead_bytes

byte average The number of dead bytes (<= used_bytes) for the memory index for this document type

content.proton.documentdb.index.memory_usage.onhold_bytes

byte average The number of bytes on hold for the memory index for this document type

content.proton.documentdb.index.io.search.read_bytes

byte count, sum Bytes read from disk index posting list and bitvector files as part of search for this document type

content.proton.documentdb.index.io.search.cached_read_bytes

byte count, sum Bytes read from cached disk index posting list and bitvector files as part of search for this document type

content.proton.documentdb.ready.index.disk_usage

byte average Disk space usage (in bytes) of this index field in all disk indexes for this document type

content.proton.documentdb.matching.queries

query rate Number of queries executed

content.proton.documentdb.matching.soft_doomed_queries

query rate Number of queries hitting the soft timeout

content.proton.documentdb.matching.query_latency

second count, max, sum Total average latency (sec) when matching and ranking a query

content.proton.documentdb.matching.query_setup_time

second count, max, sum Average time (sec) spent setting up and tearing down queries

content.proton.documentdb.matching.docs_matched

document count, rate Number of documents matched

content.proton.documentdb.matching.rank_profile.queries

query rate Number of queries executed

content.proton.documentdb.matching.rank_profile.soft_doomed_queries

query rate Number of queries hitting the soft timeout

content.proton.documentdb.matching.rank_profile.soft_doom_factor

fraction count, max, min, sum Factor used to compute soft-timeout

content.proton.documentdb.matching.rank_profile.query_latency

second count, max, sum Total average latency (sec) when matching and ranking a query

content.proton.documentdb.matching.rank_profile.query_setup_time

second count, max, sum Average time (sec) spent setting up and tearing down queries

content.proton.documentdb.matching.rank_profile.grouping_time

second count, max, sum Average time (sec) spent on grouping

content.proton.documentdb.matching.rank_profile.rerank_time

second count, max, sum Average time (sec) spent on 2nd phase ranking

content.proton.documentdb.matching.rank_profile.docs_matched

document count, rate Number of documents matched

content.proton.documentdb.matching.rank_profile.limited_queries

query rate Number of queries limited in match phase

content.proton.documentdb.feeding.commit.operations

operation count, max, rate, sum Number of operations included in a commit

content.proton.documentdb.feeding.commit.latency

second count, max, sum Latency for commit in seconds
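The lid space rows above define the bloat and fragmentation factors by formula. As a minimal sketch, the same arithmetic can be reproduced from the raw `lid_limit`, `highest_used_lid` and `used_lids` values. The function names, the zero-denominator guard and the sample numbers are assumptions made for illustration; they are not part of Vespa.

```python
def lid_bloat_factor(lid_limit: int, used_lids: int) -> float:
    """(lid_limit - used_lids) / lid_limit, as given in the table above."""
    if lid_limit <= 0:
        # Assumption: report 0.0 for an empty lid space instead of dividing by zero.
        return 0.0
    return (lid_limit - used_lids) / lid_limit


def lid_fragmentation_factor(highest_used_lid: int, used_lids: int) -> float:
    """(highest_used_lid - used_lids) / highest_used_lid, as given in the table above."""
    if highest_used_lid <= 0:
        # Assumption: same guard as above for an unused lid space.
        return 0.0
    return (highest_used_lid - used_lids) / highest_used_lid


# Hypothetical sample values, not taken from a real content node.
bloat = lid_bloat_factor(lid_limit=1000, used_lids=750)
fragmentation = lid_fragmentation_factor(highest_used_lid=800, used_lids=750)
print(bloat, fragmentation)
```

With these sample values, a quarter of the allocated lid space is unused, while only a small share of the lids below the highest used lid are holes.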

Sentinel Metrics

Name Unit Suffixes Description

sentinel.restarts

restart count Number of service restarts done by the sentinel

sentinel.totalRestarts

restart last, max, sum Total number of service restarts done by the sentinel since the sentinel was started

sentinel.uptime

second last Time the sentinel has been running

sentinel.running

instance count, last Number of services the sentinel has running currently

Slobrok Metrics

Name Unit Suffixes Description

slobrok.heartbeats.failed

request count Number of heartbeat requests failed

slobrok.missing.consensus

second count Number of seconds without full consensus with all other brokers

Storage Metrics

Name Unit Suffixes Description

vds.datastored.alldisks.buckets

bucket average Number of buckets managed

vds.datastored.alldisks.docs

document average Number of documents stored

vds.datastored.alldisks.bytes

byte average Number of bytes stored

vds.visitor.allthreads.averagevisitorlifetime

millisecond count, max, sum Average lifetime of a visitor

vds.visitor.allthreads.averagequeuewait

millisecond count, max, sum Average time an operation spends in input queue.

vds.visitor.allthreads.queuesize

operation count, max, sum Size of input message queue.

vds.visitor.allthreads.completed

operation rate Number of visitors completed

vds.visitor.allthreads.created

operation rate Number of visitors created.

vds.visitor.allthreads.failed

operation rate Number of visitors failed

vds.visitor.allthreads.averagemessagesendtime

millisecond count, max, sum Average time it takes for messages to be sent to their target (and be replied to)

vds.visitor.allthreads.averageprocessingtime

millisecond count, max, sum Average time used to process visitor requests

vds.filestor.queuesize

operation count, max, sum Size of input message queue.

vds.filestor.averagequeuewait

millisecond count, max, sum Average time an operation spends in input queue.

vds.filestor.active_operations.size

operation count, max, sum Number of concurrent active operations

vds.filestor.active_operations.latency

millisecond count, max, sum Latency (in ms) for completed operations

vds.filestor.throttle_window_size

operation count, max, sum Current size of the async operation throttler window

vds.filestor.throttle_waiting_threads

thread count, max, sum Number of threads waiting to acquire a throttle token

vds.filestor.throttle_active_tokens

instance count, max, sum Current number of active throttle tokens

vds.filestor.allthreads.mergemetadatareadlatency

millisecond count, max, sum Time spent in a merge step to check metadata of current node to see what data it has.

vds.filestor.allthreads.mergedatareadlatency

millisecond count, max, sum Time spent in a merge step to read data other nodes need.

vds.filestor.allthreads.mergedatawritelatency

millisecond count, max, sum Time spent in a merge step to write data needed by the current node.

vds.filestor.allthreads.merge_put_latency

millisecond count, max, sum Latency of individual puts that are part of merge operations

vds.filestor.allthreads.merge_remove_latency

millisecond count, max, sum Latency of individual removes that are part of merge operations

vds.filestor.allstripes.throttled_rpc_direct_dispatches

instance rate Number of times an RPC thread could not dispatch an async operation directly to Proton because it was disallowed by the throttle policy

vds.filestor.allstripes.throttled_persistence_thread_polls

instance rate Number of times a persistence thread could not immediately dispatch a queued async operation because it was disallowed by the throttle policy

vds.filestor.allstripes.timeouts_waiting_for_throttle_token

instance rate Number of times a persistence thread timed out waiting for an available throttle policy token

vds.filestor.allthreads.put.count

operation rate Number of requests processed.

vds.filestor.allthreads.put.failed

operation rate Number of failed requests.

vds.filestor.allthreads.put.test_and_set_failed

operation rate Number of operations that were skipped due to a test-and-set condition not met

vds.filestor.allthreads.put.latency

millisecond count, max, sum Latency of successful requests.

vds.filestor.allthreads.put.request_size

byte count, max, sum Size of requests, in bytes

vds.filestor.allthreads.remove.count

operation rate Number of requests processed.

vds.filestor.allthreads.remove.failed

operation rate Number of failed requests.

vds.filestor.allthreads.remove.test_and_set_failed

operation rate Number of operations that were skipped due to a test-and-set condition not met

vds.filestor.allthreads.remove.latency

millisecond count, max, sum Latency of successful requests.

vds.filestor.allthreads.remove.request_size

byte count, max, sum Size of requests, in bytes

vds.filestor.allthreads.get.count

operation rate Number of requests processed.

vds.filestor.allthreads.get.failed

operation rate Number of failed requests.

vds.filestor.allthreads.get.latency

millisecond count, max, sum Latency of successful requests.

vds.filestor.allthreads.get.request_size

byte count, max, sum Size of requests, in bytes

vds.filestor.allthreads.update.count

request rate Number of requests processed.

vds.filestor.allthreads.update.failed

request rate Number of failed requests.

vds.filestor.allthreads.update.test_and_set_failed

request rate Number of requests that were skipped due to a test-and-set condition not met

vds.filestor.allthreads.update.latency

millisecond count, max, sum Latency of successful requests.

vds.filestor.allthreads.update.request_size

byte count, max, sum Size of requests, in bytes

vds.filestor.allthreads.createiterator.count

request rate Number of requests processed.

vds.filestor.allthreads.createiterator.latency

millisecond count, max, sum Latency of successful requests.

vds.filestor.allthreads.visit.count

request rate Number of requests processed.

vds.filestor.allthreads.visit.latency

millisecond count, max, sum Latency of successful requests.

vds.filestor.allthreads.remove_location.count

request rate Number of requests processed.

vds.filestor.allthreads.remove_location.latency

millisecond count, max, sum Latency of successful requests.

vds.filestor.allthreads.splitbuckets.count

request rate Number of requests processed.

vds.filestor.allthreads.joinbuckets.count

request rate Number of requests processed.

vds.filestor.allthreads.deletebuckets.count

request rate Number of requests processed.

vds.filestor.allthreads.deletebuckets.failed

request rate Number of failed requests.

vds.filestor.allthreads.deletebuckets.latency

millisecond count, max, sum Latency of successful requests.

vds.filestor.allthreads.remove_by_gid.count

request rate Number of requests processed.

vds.filestor.allthreads.remove_by_gid.failed

request rate Number of failed requests.

vds.filestor.allthreads.remove_by_gid.latency

millisecond count, max, sum Latency of successful requests.

vds.filestor.allthreads.setbucketstates.count

request rate Number of requests processed.

vds.mergethrottler.averagequeuewaitingtime

millisecond count, max, sum Time merges spent in the throttler queue

vds.mergethrottler.queuesize

instance count, max, sum Length of merge queue

vds.mergethrottler.active_window_size

instance count, max, sum Number of merges active within the pending window size

vds.mergethrottler.estimated_merge_memory_usage

byte count, max, sum An estimated upper bound of the memory usage (in bytes) of the merges currently in the active window

vds.mergethrottler.bounced_due_to_back_pressure

instance rate Number of merges bounced due to resource exhaustion back-pressure

vds.mergethrottler.locallyexecutedmerges.ok

instance rate The number of successful merges for 'locallyexecutedmerges'

vds.mergethrottler.mergechains.ok

operation rate The number of successful merges for 'mergechains'

vds.mergethrottler.mergechains.failures.busy

operation rate The number of merges that failed because the storage node was busy

vds.mergethrottler.mergechains.failures.total

operation rate Sum of all failures

vds.server.network.tls-handshakes-failed

operation count Number of client or server connection attempts that failed during TLS handshaking

vds.server.network.peer-authorization-failures

failure count Number of TLS connection attempts failed due to bad or missing peer certificate credentials

vds.server.network.client.tls-connections-established

connection count Number of secure mTLS connections established

vds.server.network.server.tls-connections-established

connection count Number of secure mTLS connections established

vds.server.network.client.insecure-connections-established

connection count Number of insecure (plaintext) connections established

vds.server.network.server.insecure-connections-established

connection count Number of insecure (plaintext) connections established

vds.server.network.tls-connections-broken

connection count Number of TLS connections broken due to failures during frame encoding or decoding

vds.server.network.failed-tls-config-reloads

failure count Number of times background reloading of TLS config has failed

vds.server.network.rpc-capability-checks-failed

failure count Number of RPC operations that failed due to one or more missing capabilities

vds.server.network.status-capability-checks-failed

failure count Number of status page operations that failed due to one or more missing capabilities

vds.server.fnet.num-connections

connection count Total number of connection objects
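Many latency metrics above list the suffixes count, max and sum rather than an average. The sketch below shows one way a mean could be derived from them. It assumes that `sum` is the total of the observed values and `count` the number of observations in the reporting interval, and that a suffix is appended to the base name with a dot. The helper name and the sample values are hypothetical.

```python
from typing import Mapping, Optional


def mean_from_suffixes(values: Mapping[str, float], base_name: str) -> Optional[float]:
    """Return sum / count for a metric, or None when no observations were recorded."""
    count = values.get(f"{base_name}.count", 0)
    total = values.get(f"{base_name}.sum", 0.0)
    if not count:
        # Without observations there is no meaningful average.
        return None
    return total / count


# Hypothetical snapshot of one reporting interval, not real measurements.
sample = {
    "vds.filestor.allthreads.put.latency.count": 40,
    "vds.filestor.allthreads.put.latency.sum": 100.0,
    "vds.filestor.allthreads.put.latency.max": 12.5,
}

mean_put_latency_ms = mean_from_suffixes(sample, "vds.filestor.allthreads.put.latency")
print(mean_put_latency_ms)
```

Returning `None` for an empty interval keeps "no traffic" distinguishable from a true latency of zero.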