Default Metric Set

This document provides reference documentation for the Default metric set, including suffixes present per metric. If the suffix column contains "N/A" then the base name of the corresponding metric is used with no suffix.
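The suffix rule above can be illustrated with a minimal sketch. It assumes that a suffix is appended to the base name with a dot separator (for example `cluster-controller.down.count.last`); the helper name `full_metric_names` is hypothetical and is not part of any Vespa API.

```python
def full_metric_names(base_name, suffixes):
    """Expand a base metric name into the names it is reported under.

    Assumption: a suffix is joined to the base name with a dot.
    A suffix entry of "N/A" (or no suffixes at all) means the base
    name is used unchanged.
    """
    if not suffixes or suffixes == ["N/A"]:
        return [base_name]
    return [f"{base_name}.{suffix}" for suffix in suffixes]


# A metric with two suffixes expands to two names.
print(full_metric_names("cluster-controller.down.count", ["last", "max"]))

# A metric whose suffix column is "N/A" keeps its base name.
print(full_metric_names("node-certificate.expiry.seconds", ["N/A"]))
```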

ClusterController Metrics

| Name | Unit | Suffixes | Description |
|---|---|---|---|
| cluster-controller.down.count | node | last, max | Number of content nodes down |
| cluster-controller.maintenance.count | node | last, max | Number of content nodes in maintenance |
| cluster-controller.up.count | node | last, max | Number of content nodes up |
| cluster-controller.is-master | binary | last, max | 1 if this cluster controller is currently the master, or 0 if not |
| cluster-controller.resource_usage.nodes_above_limit | node | last, max | The number of content nodes above resource limit, blocking feed |
| cluster-controller.resource_usage.max_memory_utilization | fraction | last, max | Current memory utilisation, for content node with highest value |
| cluster-controller.resource_usage.max_disk_utilization | fraction | last, max | Current disk space utilisation, for content node with highest value |

Container Metrics

| Name | Unit | Suffixes | Description |
|---|---|---|---|
| http.status.1xx | response | rate | Number of responses with a 1xx status |
| http.status.2xx | response | rate | Number of responses with a 2xx status |
| http.status.3xx | response | rate | Number of responses with a 3xx status |
| http.status.4xx | response | rate | Number of responses with a 4xx status |
| http.status.5xx | response | rate | Number of responses with a 5xx status |
| jdisc.gc.ms | millisecond | average, max | Time spent in JVM garbage collection |
| jdisc.thread_pool.work_queue.capacity | thread | max | Capacity of the task queue |
| jdisc.thread_pool.work_queue.size | thread | count, max, min, sum | Size of the task queue |
| jdisc.thread_pool.size | thread | max | Size of the thread pool |
| jdisc.thread_pool.active_threads | thread | count, max, min, sum | Number of threads that are active |
| jdisc.application.failed_component_graphs | item | rate | JDISC Application failed component graphs |
| jdisc.singleton.is_active | item | last, max | JDISC Singleton is active |
| jdisc.http.ssl.handshake.failure.missing_client_cert | operation | rate | JDISC HTTP SSL Handshake failures due to missing client certificate |
| jdisc.http.ssl.handshake.failure.incompatible_protocols | operation | rate | JDISC HTTP SSL Handshake failures due to incompatible protocols |
| jdisc.http.ssl.handshake.failure.incompatible_chifers | operation | rate | JDISC HTTP SSL Handshake failures due to incompatible ciphers |
| jdisc.http.ssl.handshake.failure.unknown | operation | rate | JDISC HTTP SSL Handshake failures for unknown reason |
| mem.heap.free | byte | average | Free heap memory |
| athenz-tenant-cert.expiry.seconds | second | last, max, min | Time remaining until Athenz tenant certificate expires |
| feed.operations | operation | rate | Number of document feed operations |
| feed.latency | millisecond | count, sum | Feed latency |
| queries | operation | rate | Query volume |
| query_latency | millisecond | average, count, max, sum | The overall query latency as seen by the container |
| failed_queries | operation | rate | The number of failed queries |
| degraded_queries | operation | rate | The number of degraded queries, e.g. due to some content nodes not responding in time |
| hits_per_query | hit_per_query | average, count, max, sum | The number of hits returned |
| docproc.documents | document | sum | Number of processed documents |
| totalhits_per_query | hit_per_query | average, count, max, sum | The total number of documents found to match queries |
| serverActiveThreads | thread | average | Deprecated. Use jdisc.thread_pool.active_threads instead. |
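Some container metrics, such as feed.latency, are listed with only the count and sum suffixes. A mean over a reporting interval can then be derived by dividing the sum by the count. The sketch below shows that arithmetic only. It assumes dot-separated suffixed names, and the sample values and the helper name `mean_from_sum_and_count` are hypothetical, not taken from a real deployment or from a Vespa API.

```python
def mean_from_sum_and_count(total, count):
    """Return total / count, or None when no samples were recorded."""
    if count == 0:
        return None
    return total / count


# Hypothetical snapshot of suffixed metric values (milliseconds and samples).
snapshot = {
    "feed.latency.sum": 1250.0,
    "feed.latency.count": 50,
}

mean_ms = mean_from_sum_and_count(
    snapshot["feed.latency.sum"], snapshot["feed.latency.count"]
)
print(mean_ms)  # 25.0 for the hypothetical values above
```

Returning None for a zero count avoids a division error in intervals where no feed operations were recorded.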

Distributor Metrics

| Name | Unit | Suffixes | Description |
|---|---|---|---|
| vds.distributor.docsstored | document | average | Number of documents stored in all buckets controlled by this distributor |
| vds.bouncer.clock_skew_aborts | operation | count | Number of client operations that were aborted due to clock skew between sender and receiver exceeding acceptable range |

NodeAdmin Metrics

| Name | Unit | Suffixes | Description |
|---|---|---|---|
| endpoint.certificate.expiry.seconds | second | N/A | Time until node endpoint certificate expires |
| node-certificate.expiry.seconds | second | N/A | Time until node certificate expires |

SearchNode Metrics

| Name | Unit | Suffixes | Description |
|---|---|---|---|
| content.proton.documentdb.documents.total | document | last, max | The total number of documents in this document db (ready + not-ready) |
| content.proton.documentdb.documents.ready | document | last, max | The number of ready documents in this document db |
| content.proton.documentdb.documents.active | document | last, max | The number of active / searchable documents in this document db |
| content.proton.documentdb.disk_usage | byte | last | The total disk usage (in bytes) for this document db |
| content.proton.documentdb.memory_usage.allocated_bytes | byte | last | The number of allocated bytes |
| content.proton.search_protocol.query.latency | second | average, count, max, sum | Query request latency (seconds) |
| content.proton.search_protocol.docsum.latency | second | average, count, max, sum | Docsum request latency (seconds) |
| content.proton.search_protocol.docsum.requested_documents | document | rate | Total requested document summaries |
| content.proton.resource_usage.disk | fraction | average | The relative amount of disk used by this content node (transient usage not included, value in the range [0, 1]). Same value as reported to the cluster controller |
| content.proton.resource_usage.memory | fraction | average | The relative amount of memory used by this content node (transient usage not included, value in the range [0, 1]). Same value as reported to the cluster controller |
| content.proton.resource_usage.feeding_blocked | binary | last, max | Whether feeding is blocked due to resource limits being reached (value is either 0 or 1) |
| content.proton.transactionlog.disk_usage | byte | last | The disk usage (in bytes) of the transaction log |
| content.proton.documentdb.matching.docs_matched | document | rate | Number of documents matched |
| content.proton.documentdb.matching.docs_reranked | document | rate | Number of documents re-ranked (second phase) |
| content.proton.documentdb.matching.rank_profile.query_latency | second | average, count, max, sum | Total average latency (sec) when matching and ranking a query |
| content.proton.documentdb.matching.rank_profile.query_setup_time | second | average, count, max, sum | Average time (sec) spent setting up and tearing down queries |
| content.proton.documentdb.matching.rank_profile.rerank_time | second | average, count, max, sum | Average time (sec) spent on 2nd phase ranking |

Sentinel Metrics

| Name | Unit | Suffixes | Description |
|---|---|---|---|
| sentinel.totalRestarts | restart | last, max, sum | Total number of service restarts done by the sentinel since the sentinel was started |

Storage Metrics

| Name | Unit | Suffixes | Description |
|---|---|---|---|
| vds.filestor.allthreads.put.count | operation | rate | Number of requests processed. |
| vds.filestor.allthreads.remove.count | operation | rate | Number of requests processed. |
| vds.filestor.allthreads.update.count | request | rate | Number of requests processed. |