Name | Description | Unit | Suffixes |
---|---|---|---|
cluster-controller.down.count | Number of content nodes down | node | last |
cluster-controller.initializing.count | Number of content nodes initializing | node | last |
cluster-controller.maintenance.count | Number of content nodes in maintenance | node | last |
cluster-controller.retired.count | Number of content nodes that are retired | node | last |
cluster-controller.stopping.count | Number of content nodes currently stopping | node | last |
cluster-controller.up.count | Number of content nodes up | node | last |
cluster-controller.cluster-state-change.count | Number of nodes changing state | node | N/A |
cluster-controller.busy-tick-time-ms | Time busy | millisecond | last, max, sum, count |
cluster-controller.idle-tick-time-ms | Time idle | millisecond | last, max, sum, count |
cluster-controller.work-ms | Time used for actual work | millisecond | last, sum, count |
cluster-controller.is-master | 1 if this cluster controller is currently the master, or 0 if not | binary | last |
cluster-controller.remote-task-queue.size | Number of remote tasks queued | operation | last |
cluster-controller.node-event.count | Number of node events | operation | N/A |
cluster-controller.resource_usage.nodes_above_limit | The number of content nodes above resource limit, blocking feed | node | last, max |
cluster-controller.resource_usage.max_memory_utilization | Current memory utilisation, per content node | fraction | last, max |
cluster-controller.resource_usage.max_disk_utilization | Current disk space utilisation, per content node | fraction | last, max |
cluster-controller.resource_usage.memory_limit | Memory space limit as a fraction of available memory | fraction | last |
cluster-controller.resource_usage.disk_limit | Disk space limit as a fraction of available disk space | fraction | last |
reindexing.progress | Re-indexing progress | fraction | last |
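
The `resource_usage` rows pair a per-resource utilization with a configured limit, and both are expressed as fractions. The following is a minimal sketch of how the two might be combined to estimate the remaining headroom before feed is blocked. It assumes the metric values have already been collected into a Python dictionary keyed by `<metric name>.<suffix>`. That key layout and the `resource_headroom` function are hypothetical and only mirror the Name and Suffixes columns above.

```python
def resource_headroom(values: dict) -> dict:
    """Return limit minus max utilization for memory and disk.

    `values` is assumed (hypothetically) to map "<metric>.<suffix>" to a number.
    A negative result means the most-utilized content node is above the limit.
    """
    prefix = "cluster-controller.resource_usage."
    headroom = {}
    for resource in ("memory", "disk"):
        used = values[f"{prefix}max_{resource}_utilization.max"]
        limit = values[f"{prefix}{resource}_limit.last"]
        headroom[resource] = limit - used
    return headroom


sample = {
    "cluster-controller.resource_usage.max_memory_utilization.max": 0.50,
    "cluster-controller.resource_usage.memory_limit.last": 0.80,
    "cluster-controller.resource_usage.max_disk_utilization.max": 0.90,
    "cluster-controller.resource_usage.disk_limit.last": 0.75,
}
print(resource_headroom(sample))  # disk headroom is negative: above the limit
```

A non-zero `cluster-controller.resource_usage.nodes_above_limit` is a useful cross-check for a negative headroom value, since that metric counts the content nodes that are actually blocking feed.
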

Name | Description | Unit | Suffixes |
---|---|---|---|
configserver.requests | Number of requests processed | request | count |
configserver.failedRequests | Number of requests that failed | request | count |
configserver.latency | Time to complete requests | millisecond | max, sum, count |
configserver.cacheConfigElems | Number of config elements in the cache | item | last |
configserver.cacheChecksumElems | Number of checksum elements in the cache | item | last |
configserver.hosts | The number of nodes being served configuration from the config server cluster | node | last |
configserver.delayedResponses | Number of delayed responses | response | count |
configserver.sessionChangeErrors | Number of session change errors | session | count |
configserver.zkZNodes | Number of ZooKeeper nodes present | node | last |
configserver.zkAvgLatency | Average latency for ZooKeeper requests | millisecond | last |
configserver.zkMaxLatency | Max latency for ZooKeeper requests | millisecond | last |
configserver.zkConnections | Number of ZooKeeper connections | connection | last |
configserver.zkOutstandingRequests | Number of ZooKeeper requests in flight | request | last |
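
Rows whose Suffixes column lists both `sum` and `count` can be turned into a mean by dividing one by the other. Likewise, the `count` suffixes of `configserver.requests` and `configserver.failedRequests` give a failure ratio. The following minimal sketch uses the same hypothetical `<metric name>.<suffix>` dictionary layout. The name `configserver_summary` is illustrative only.

```python
def configserver_summary(values: dict) -> dict:
    """Derive a failure ratio and mean latency from config server metrics."""
    requests = values["configserver.requests.count"]
    failed = values["configserver.failedRequests.count"]
    latency_sum = values["configserver.latency.sum"]
    latency_count = values["configserver.latency.count"]
    return {
        # Guard against empty snapshots where no requests were processed.
        "failure_ratio": failed / requests if requests else 0.0,
        "mean_latency_ms": latency_sum / latency_count if latency_count else 0.0,
    }


snapshot = {
    "configserver.requests.count": 200,
    "configserver.failedRequests.count": 5,
    "configserver.latency.sum": 1000.0,
    "configserver.latency.count": 200,
}
print(configserver_summary(snapshot))  # {'failure_ratio': 0.025, 'mean_latency_ms': 5.0}
```
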

Name | Description | Unit | Suffixes |
---|---|---|---|
jrt.transport.tls-certificate-verification-failures | TLS certificate verification failures | failure | N/A |
jrt.transport.peer-authorization-failures | TLS peer authorization failures | failure | N/A |
jrt.transport.server.tls-connections-established | TLS server connections established | connection | N/A |
jrt.transport.client.tls-connections-established | TLS client connections established | connection | N/A |
jrt.transport.server.unencrypted-connections-established | Unencrypted server connections established | connection | N/A |
jrt.transport.client.unencrypted-connections-established | Unencrypted client connections established | connection | N/A |
application_generation | The currently live application config generation (aka session id) | version | N/A |
handled.requests | The number of requests handled per metrics snapshot | operation | count |
handled.latency | The time used for requests during this metrics snapshot | millisecond | sum, count, max |
serverNumOpenConnections | The number of currently open connections | connection | max, last, average |
serverNumConnections | The total number of connections opened | connection | max, last, average |
serverBytesReceived | The number of bytes received by the server | byte | sum, count |
serverBytesSent | The number of bytes sent from the server | byte | sum, count |
jdisc.thread_pool.unhandled_exceptions | Number of exceptions thrown by tasks | thread | sum, count, last, min, max |
jdisc.thread_pool.work_queue.capacity | Capacity of the task queue | thread | sum, count, last, min, max |
jdisc.thread_pool.work_queue.size | Size of the task queue | thread | sum, count, last, min, max |
jdisc.thread_pool.rejected_tasks | Number of tasks rejected by the thread pool | thread | sum, count, last, min, max |
jdisc.thread_pool.size | Size of the thread pool | thread | sum, count, last, min, max |
jdisc.thread_pool.max_allowed_size | The maximum allowed number of threads in the pool | thread | sum, count, last, min, max |
jdisc.thread_pool.active_threads | Number of threads that are active | thread | sum, count, last, min, max |
jdisc.http.jetty.threadpool.thread.max | Configured maximum number of threads | thread | sum, count, last, min, max |
jdisc.http.jetty.threadpool.thread.min | Configured minimum number of threads | thread | sum, count, last, min, max |
jdisc.http.jetty.threadpool.thread.reserved | Configured number of reserved threads or -1 for heuristic | thread | sum, count, last, min, max |
jdisc.http.jetty.threadpool.thread.busy | Number of threads executing internal and transient jobs | thread | sum, count, last, min, max |
jdisc.http.jetty.threadpool.thread.total | Current number of threads | thread | sum, count, last, min, max |
jdisc.http.jetty.threadpool.queue.size | Current size of the job queue | thread | sum, count, last, min, max |
httpapi_latency | Duration for requests to the HTTP document APIs | millisecond | max, sum, count |
httpapi_pending | Document operations pending execution | operation | max, sum, count |
httpapi_num_operations | Total number of document operations performed | operation | rate |
httpapi_num_updates | Document update operations performed | operation | rate |
httpapi_num_removes | Document remove operations performed | operation | rate |
httpapi_num_puts | Document put operations performed | operation | rate |
httpapi_succeeded | Document operations that succeeded | operation | rate |
httpapi_failed | Document operations that failed | operation | rate |
httpapi_parse_error | Document operations that failed due to document parse errors | operation | rate |
httpapi_condition_not_met | Document operations not applied because the condition was not met | operation | rate |
httpapi_not_found | Document operations not applied because the document was not found | operation | rate |
httpapi_failed_unknown | Document operations that failed due to an unknown cause | operation | rate |
httpapi_failed_insufficient_storage | Document operations that failed due to insufficient storage | operation | rate |
httpapi_failed_timeout | Document operations that failed due to timeout | operation | rate |
mem.heap.total | Total available heap memory | byte | average |
mem.heap.free | Free heap memory | byte | average |
mem.heap.used | Currently used heap memory | byte | average, max |
mem.direct.total | Total available direct memory | byte | average |
mem.direct.free | Currently free direct memory | byte | average |
mem.direct.used | Direct memory currently used | byte | average, max |
mem.direct.count | Number of direct memory allocations | byte | max |
mem.native.total | Total available native memory | byte | average |
mem.native.free | Currently free native memory | byte | average |
mem.native.used | Native memory currently used | byte | average |
jdisc.memory_mappings | JDISC Memory mappings | operation | max |
jdisc.open_file_descriptors | JDISC Open file descriptors | item | max |
jdisc.gc.count | Number of JVM garbage collections done | operation | average, max, last |
jdisc.gc.ms | Time spent in JVM garbage collection | millisecond | average, max, last |
jdisc.deactivated_containers.total | JDISC Deactivated container instances | item | last |
jdisc.deactivated_containers.with_retained_refs.last | JDISC Deactivated container nodes with retained refs | item | last |
jdisc.singleton.is_active | JDISC Singleton is active | item | last |
jdisc.singleton.activation.count | JDISC Singleton activations | operation | last |
jdisc.singleton.activation.failure.count | JDISC Singleton activation failures | operation | last |
jdisc.singleton.activation.millis | JDISC Singleton activation time | millisecond | last |
jdisc.singleton.deactivation.count | JDISC Singleton deactivations | operation | last |
jdisc.singleton.deactivation.failure.count | JDISC Singleton deactivation failures | operation | last |
jdisc.singleton.deactivation.millis | JDISC Singleton deactivation time | millisecond | last |
athenz-tenant-cert.expiry.seconds | Time remaining until Athenz tenant certificate expires | second | last |
container-iam-role.expiry.seconds | Time remaining until IAM role expires | second | N/A |
http.status.1xx | Number of responses with a 1xx status | response | rate |
http.status.2xx | Number of responses with a 2xx status | response | rate |
http.status.3xx | Number of responses with a 3xx status | response | rate |
http.status.4xx | Number of responses with a 4xx status | response | rate |
http.status.5xx | Number of responses with a 5xx status | response | rate |
jdisc.http.request.prematurely_closed | HTTP requests prematurely closed | request | rate |
jdisc.http.request.requests_per_connection | HTTP requests per connection | request | sum, count, min, max, average |
jdisc.http.request.uri_length | HTTP URI length | byte | sum, count, max |
jdisc.http.request.content_size | HTTP request content size | byte | sum, count, max |
jdisc.http.requests | HTTP requests | request | rate, count |
jdisc.http.ssl.handshake.failure.missing_client_cert | JDISC HTTP SSL Handshake failures due to missing client certificate | operation | rate |
jdisc.http.ssl.handshake.failure.expired_client_cert | JDISC HTTP SSL Handshake failures due to expired client certificate | operation | rate |
jdisc.http.ssl.handshake.failure.invalid_client_cert | JDISC HTTP SSL Handshake failures due to invalid client certificate | operation | rate |
jdisc.http.ssl.handshake.failure.incompatible_protocols | JDISC HTTP SSL Handshake failures due to incompatible protocols | operation | rate |
jdisc.http.ssl.handshake.failure.incompatible_chifers | JDISC HTTP SSL Handshake failures due to incompatible ciphers | operation | rate |
jdisc.http.ssl.handshake.failure.connection_closed | JDISC HTTP SSL Handshake failures due to connection closed | operation | rate |
jdisc.http.ssl.handshake.failure.unknown | JDISC HTTP SSL Handshake failures for unknown reason | operation | rate |
jdisc.http.filter.rule.blocked_requests | Number of requests blocked by filter | request | rate |
jdisc.http.filter.rule.allowed_requests | Number of requests allowed by filter | request | rate |
jdisc.http.filtering.request.handled | Number of filtering requests handled | request | rate |
jdisc.http.filtering.request.unhandled | Number of filtering requests unhandled | request | rate |
jdisc.http.filtering.response.handled | Number of filtering responses handled | request | rate |
jdisc.http.filtering.response.unhandled | Number of filtering responses unhandled | request | rate |
jdisc.http.handler.unhandled_exceptions | Number of unhandled exceptions in handler | request | rate |
jdisc.application.failed_component_graphs | JDISC Application failed component graphs | item | rate |
jdisc.jvm | JVM runtime version | version | last |
serverRejectedRequests | Deprecated. Use jdisc.thread_pool.rejected_tasks instead. | operation | rate, count |
serverThreadPoolSize | Deprecated. Use jdisc.thread_pool.size instead. | thread | max, last |
serverActiveThreads | Deprecated. Use jdisc.thread_pool.active_threads instead. | thread | min, max, sum, count, last |
jdisc.tls.capability_checks.succeeded | Number of TLS capability checks that succeeded | operation | rate |
jdisc.tls.capability_checks.failed | Number of TLS capability checks that failed | operation | rate |
peak_qps | The highest number of queries per second, measured over a single second, in this metrics snapshot | query/second | max |
search_connections | Number of search connections | connection | sum, count, max |
feed.latency | Feed latency | millisecond | sum, count, max |
feed.http-requests | Feed HTTP requests | operation | count, rate |
queries | Query volume | operation | rate |
query_container_latency | The query execution time consumed in the container | millisecond | sum, count, max |
query_latency | The overall query latency as seen by the container | millisecond | sum, count, max, 95percentile, 99percentile |
query_timeout | The amount of time allowed for query execution, as specified by the client | millisecond | sum, count, max, min, 95percentile, 99percentile |
failed_queries | The number of failed queries | operation | rate |
degraded_queries | The number of degraded queries, e.g. due to some content nodes not responding in time | operation | rate |
hits_per_query | The number of hits returned | hit/query | sum, count, max, 95percentile, 99percentile |
query_hit_offset | The offset for hits returned | hit | sum, count, max |
documents_covered | The combined number of documents considered during query evaluation | document | count |
documents_total | The number of documents to be evaluated if all requests had been fully executed | document | count |
documents_target_total | The target number of total documents to be evaluated when all data is in sync | document | count |
jdisc.render.latency | The time used by the container to render responses | nanosecond | min, max, count, sum, last, average |
query_item_count | The number of query items (terms, phrases, etc.) | item | max, sum, count |
totalhits_per_query | The total number of documents found to match queries | hit/query | sum, count, max, 95percentile, 99percentile |
empty_results | Number of queries matching no documents | operation | rate |
requestsOverQuota | The number of requests rejected due to exceeding quota | operation | rate, count |
docproc.proctime | Time spent processing documents | millisecond | sum, count, max |
docproc.documents | Number of processed documents | document | sum, count, max, min |
relevance.at_1 | The relevance of hit number 1 | score | sum, count |
relevance.at_3 | The relevance of hit number 3 | score | sum, count |
relevance.at_10 | The relevance of hit number 10 | score | sum, count |
error.timeout | Requests that timed out | operation | rate |
error.backends_oos | Requests that failed due to no available backend nodes | operation | rate |
error.plugin_failure | Requests that failed due to plugin failure | operation | rate |
error.backend_communication_error | Requests that failed due to backend communication error | operation | rate |
error.empty_document_summaries | Requests that failed due to missing document summaries | operation | rate |
error.invalid_query_parameter | Requests that failed due to invalid query parameters | operation | rate |
error.internal_server_error | Requests that failed due to internal server error | operation | rate |
error.misconfigured_server | Requests that failed due to misconfigured server | operation | rate |
error.invalid_query_transformation | Requests that failed due to invalid query transformation | operation | rate |
error.results_with_errors | The number of queries with error payload | operation | rate |
error.unspecified | Requests that failed for an unspecified reason | operation | rate |
error.unhandled_exception | Requests that failed due to an unhandled exception | operation | rate |
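
The `http.status.*` rows are reported as rates per status class, so their relative share is often more telling than the raw values. The following minimal sketch computes the fraction of 4xx and 5xx responses. It assumes the rates sit in a dictionary keyed `<metric name>.rate` (a hypothetical layout) and treats a missing status class as zero.

```python
STATUS_CLASSES = ("1xx", "2xx", "3xx", "4xx", "5xx")


def error_shares(rates: dict) -> tuple:
    """Return (client_error_share, server_error_share) of all HTTP responses."""
    per_class = {c: rates.get(f"http.status.{c}.rate", 0.0) for c in STATUS_CLASSES}
    total = sum(per_class.values())
    if total == 0:
        return 0.0, 0.0  # No responses in this snapshot.
    return per_class["4xx"] / total, per_class["5xx"] / total


rates = {
    "http.status.2xx.rate": 90.0,
    "http.status.4xx.rate": 8.0,
    "http.status.5xx.rate": 2.0,
}
print(error_shares(rates))  # (0.08, 0.02)
```

Dividing by the total rate rather than by `jdisc.http.requests` keeps the sketch self-consistent, because numerator and denominator come from the same family of metrics.
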

Name | Description | Unit | Suffixes |
---|---|---|---|
vds.idealstate.buckets_rechecking | The number of buckets that we are rechecking for ideal state operations | bucket | average |
vds.idealstate.idealstate_diff | A number representing the current difference from the ideal state. It decreases steadily as the system gets closer to the ideal state | bucket | average |
vds.idealstate.buckets_toofewcopies | The number of buckets the distributor controls that have less than the desired redundancy | bucket | average |
vds.idealstate.buckets_toomanycopies | The number of buckets the distributor controls that have more than the desired redundancy | bucket | average |
vds.idealstate.buckets | The number of buckets the distributor controls | bucket | average |
vds.idealstate.buckets_notrusted | The number of buckets that have no trusted copies. | bucket | average |
vds.idealstate.bucket_replicas_moving_out | Bucket replicas that should be moved out, e.g. retirement case or node added to cluster that has higher ideal state priority. | bucket | average |
vds.idealstate.bucket_replicas_copying_out | Bucket replicas that should be copied out, e.g. node is in ideal state but might have to provide data to other nodes in a merge | bucket | average |
vds.idealstate.bucket_replicas_copying_in | Bucket replicas that should be copied in, e.g. node does not have a replica for a bucket that it is in ideal state for | bucket | average |
vds.idealstate.bucket_replicas_syncing | Bucket replicas that need syncing due to mismatching metadata | bucket | average |
vds.idealstate.max_observed_time_since_last_gc_sec | Maximum time (in seconds) since GC was last successfully run for a bucket. Aggregated max value across all buckets on the distributor. | second | average |
vds.idealstate.delete_bucket.done_ok | The number of operations successfully performed | operation | rate |
vds.idealstate.delete_bucket.done_failed | The number of operations that failed | operation | rate |
vds.idealstate.delete_bucket.pending | The number of operations pending | operation | average |
vds.idealstate.merge_bucket.done_ok | The number of operations successfully performed | operation | rate |
vds.idealstate.merge_bucket.done_failed | The number of operations that failed | operation | rate |
vds.idealstate.merge_bucket.pending | The number of operations pending | operation | average |
vds.idealstate.merge_bucket.blocked | The number of operations blocked by blocking operation starter | operation | rate |
vds.idealstate.merge_bucket.throttled | The number of operations throttled by throttling operation starter | operation | rate |
vds.idealstate.merge_bucket.source_only_copy_changed | The number of merge operations where source-only copy changed | operation | rate |
vds.idealstate.merge_bucket.source_only_copy_delete_blocked | The number of merge operations where delete of unchanged source-only copies was blocked | operation | rate |
vds.idealstate.merge_bucket.source_only_copy_delete_failed | The number of merge operations where delete of unchanged source-only copies failed | operation | rate |
vds.idealstate.split_bucket.done_ok | The number of operations successfully performed | operation | rate |
vds.idealstate.split_bucket.done_failed | The number of operations that failed | operation | rate |
vds.idealstate.split_bucket.pending | The number of operations pending | operation | average |
vds.idealstate.join_bucket.done_ok | The number of operations successfully performed | operation | rate |
vds.idealstate.join_bucket.done_failed | The number of operations that failed | operation | rate |
vds.idealstate.join_bucket.pending | The number of operations pending | operation | average |
vds.idealstate.garbage_collection.done_ok | The number of operations successfully performed | operation | rate |
vds.idealstate.garbage_collection.done_failed | The number of operations that failed | operation | rate |
vds.idealstate.garbage_collection.pending | The number of operations pending | operation | average |
vds.idealstate.garbage_collection.documents_removed | Number of documents removed by GC operations | document | count, rate |
vds.distributor.puts.latency | The latency of put operations | millisecond | max, sum, count |
vds.distributor.puts.ok | The number of successful put operations performed | operation | rate |
vds.distributor.puts.failures.total | Sum of all failures | operation | rate |
vds.distributor.puts.failures.notfound | The number of operations that failed because the document did not exist | operation | rate |
vds.distributor.puts.failures.test_and_set_failed | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | operation | rate |
vds.distributor.puts.failures.concurrent_mutations | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | operation | rate |
vds.distributor.puts.failures.notconnected | The number of operations discarded because there were no available storage nodes to send to | operation | rate |
vds.distributor.puts.failures.notready | The number of operations discarded because distributor was not ready | operation | rate |
vds.distributor.puts.failures.wrongdistributor | The number of operations discarded because they were sent to the wrong distributor | operation | rate |
vds.distributor.puts.failures.safe_time_not_reached | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | operation | rate |
vds.distributor.puts.failures.storagefailure | The number of operations that failed in storage | operation | rate |
vds.distributor.puts.failures.timeout | The number of operations that failed because the operation timed out towards storage | operation | rate |
vds.distributor.puts.failures.busy | The number of messages from storage that failed because the storage node was busy | operation | rate |
vds.distributor.puts.failures.inconsistent_bucket | The number of operations failed due to buckets being in an inconsistent state or not found | operation | rate |
vds.distributor.removes.latency | The latency of remove operations | millisecond | max, sum, count |
vds.distributor.removes.ok | The number of successful remove operations performed | operation | rate |
vds.distributor.removes.failures.total | Sum of all failures | operation | rate |
vds.distributor.removes.failures.notfound | The number of operations that failed because the document did not exist | operation | rate |
vds.distributor.removes.failures.test_and_set_failed | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | operation | rate |
vds.distributor.removes.failures.concurrent_mutations | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | operation | rate |
vds.distributor.updates.latency | The latency of update operations | millisecond | max, sum, count |
vds.distributor.updates.ok | The number of successful update operations performed | operation | rate |
vds.distributor.updates.failures.total | Sum of all failures | operation | rate |
vds.distributor.updates.failures.notfound | The number of operations that failed because the document did not exist | operation | rate |
vds.distributor.updates.failures.test_and_set_failed | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | operation | rate |
vds.distributor.updates.failures.concurrent_mutations | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | operation | rate |
vds.distributor.updates.diverging_timestamp_updates | Number of updates that report they were performed against divergent version timestamps on different replicas | operation | rate |
vds.distributor.removelocations.ok | The number of successful removelocations operations performed | operation | rate |
vds.distributor.removelocations.failures.total | Sum of all failures | operation | rate |
vds.distributor.gets.latency | The latency of get operations | millisecond | max, sum, count |
vds.distributor.gets.ok | The number of successful get operations performed | operation | rate |
vds.distributor.gets.failures.total | Sum of all failures | operation | rate |
vds.distributor.gets.failures.notfound | The number of operations that failed because the document did not exist | operation | rate |
vds.distributor.visitor.latency | The latency of visitor operations | millisecond | max, sum, count |
vds.distributor.visitor.ok | The number of successful visitor operations performed | operation | rate |
vds.distributor.visitor.failures.total | Sum of all failures | operation | rate |
vds.distributor.visitor.failures.notready | The number of operations discarded because distributor was not ready | operation | rate |
vds.distributor.visitor.failures.notconnected | The number of operations discarded because there were no available storage nodes to send to | operation | rate |
vds.distributor.visitor.failures.wrongdistributor | The number of operations discarded because they were sent to the wrong distributor | operation | rate |
vds.distributor.visitor.failures.safe_time_not_reached | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | operation | rate |
vds.distributor.visitor.failures.storagefailure | The number of operations that failed in storage | operation | rate |
vds.distributor.visitor.failures.timeout | The number of operations that failed because the operation timed out towards storage | operation | rate |
vds.distributor.visitor.failures.busy | The number of messages from storage that failed because the storage node was busy | operation | rate |
vds.distributor.visitor.failures.inconsistent_bucket | The number of operations failed due to buckets being in an inconsistent state or not found | operation | rate |
vds.distributor.visitor.failures.notfound | The number of operations that failed because the document did not exist | operation | rate |
vds.distributor.docsstored | Number of documents stored in all buckets controlled by this distributor | document | average |
vds.distributor.bytesstored | Number of bytes stored in all buckets controlled by this distributor | byte | average |
vds.bouncer.clock_skew_aborts | Number of client operations that were aborted due to clock skew between sender and receiver exceeding acceptable range | operation | count |
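
Latency rows such as `vds.distributor.puts.latency` expose `sum` and `count` rather than a precomputed mean. When combining snapshots from several distributors, summing the sums and the counts before dividing weights every operation equally. Averaging each node's mean instead would let a lightly loaded node skew the result. The following is a minimal sketch, assuming each snapshot is a dictionary keyed `<metric name>.<suffix>` (a hypothetical layout).

```python
def combined_mean(snapshots: list, metric: str) -> float:
    """Cluster-wide mean of a sum/count metric across per-node snapshots."""
    total_sum = sum(s.get(f"{metric}.sum", 0.0) for s in snapshots)
    total_count = sum(s.get(f"{metric}.count", 0) for s in snapshots)
    return total_sum / total_count if total_count else 0.0


busy = {"vds.distributor.puts.latency.sum": 900.0, "vds.distributor.puts.latency.count": 90}
quiet = {"vds.distributor.puts.latency.sum": 200.0, "vds.distributor.puts.latency.count": 10}
# Per-node means are 10 ms and 20 ms; weighting by operation count gives 11 ms.
print(combined_mean([busy, quiet], "vds.distributor.puts.latency"))  # 11.0
```

The same function applies unchanged to `removes.latency`, `updates.latency`, `gets.latency` and `visitor.latency`, since they share the `max, sum, count` suffixes.
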

Name | Description | Unit | Suffixes |
---|---|---|---|
logd.processed.lines | Number of log lines processed | item | count |

Name | Description | Unit | Suffixes |
---|---|---|---|
endpoint.certificate.expiry.seconds | Time until node endpoint certificate expires | second | N/A |
node-certificate.expiry.seconds | Time until node certificate expires | second | N/A |

Name | Description | Unit | Suffixes |
---|---|---|---|
content.proton.config.generation | The oldest config generation used by this search node | version | last |
content.proton.documentdb.documents.total | The total number of documents in this document db (ready + not-ready) | document | last |
content.proton.documentdb.documents.ready | The number of ready documents in this document db | document | last |
content.proton.documentdb.documents.active | The number of active / searchable documents in this document db | document | last |
content.proton.documentdb.documents.removed | The number of removed documents in this document db | document | last |
content.proton.documentdb.index.docs_in_memory | Number of documents in memory index | document | last |
content.proton.documentdb.disk_usage | The total disk usage (in bytes) for this document db | byte | last |
content.proton.documentdb.memory_usage.allocated_bytes | The number of allocated bytes | byte | max |
content.proton.documentdb.heart_beat_age | How long ago (in seconds) the heartbeat maintenance job was run | second | last |
content.proton.docsum.docs | Total docsums returned | document | rate |
content.proton.docsum.latency | Docsum request latency | millisecond | max, sum, count |
content.proton.search_protocol.query.latency | Query request latency (seconds) | second | max, sum, count |
content.proton.search_protocol.query.request_size | Query request size (network bytes) | byte | max, sum, count |
content.proton.search_protocol.query.reply_size | Query reply size (network bytes) | byte | max, sum, count |
content.proton.search_protocol.docsum.latency | Docsum request latency (seconds) | second | max, sum, average |
content.proton.search_protocol.docsum.request_size | Docsum request size (network bytes) | byte | max, sum, count |
content.proton.search_protocol.docsum.reply_size | Docsum reply size (network bytes) | byte | max, sum, count |
content.proton.search_protocol.docsum.requested_documents | Total requested document summaries | document | count |
content.proton.executor.proton.queuesize | Size of executor proton task queue | task | max, sum, count |
content.proton.executor.proton.accepted | Number of executor proton accepted tasks | task | rate |
content.proton.executor.proton.wakeups | Number of times a executor proton worker thread has been woken up | wakeup | rate |
content.proton.executor.proton.utilization | Ratio of time the executor proton worker threads has been active | fraction | max, sum, count |
content.proton.executor.flush.queuesize | Size of executor flush task queue | task | max, sum, count |
content.proton.executor.flush.accepted | Number of accepted executor flush tasks | task | rate |
content.proton.executor.flush.wakeups | Number of times a executor flush worker thread has been woken up | wakeup | rate |
content.proton.executor.flush.utilization | Ratio of time the executor flush worker threads has been active | fraction | max, sum, count |
content.proton.executor.match.queuesize | Size of executor match task queue | task | max, sum, count |
content.proton.executor.match.accepted | Number of accepted executor match tasks | task | rate |
content.proton.executor.match.wakeups | Number of times a executor match worker thread has been woken up | wakeup | rate |
content.proton.executor.match.utilization | Ratio of time the executor match worker threads has been active | fraction | max, sum, count |
content.proton.executor.docsum.queuesize | Size of executor docsum task queue | task | max, sum, count |
content.proton.executor.docsum.accepted | Number of executor accepted docsum tasks | task | rate |
content.proton.executor.docsum.wakeups | Number of times a executor docsum worker thread has been woken up | wakeup | rate |
content.proton.executor.docsum.utilization | Ratio of time the executor docsum worker threads has been active | fraction | max, sum, count |
content.proton.executor.shared.queuesize | Size of executor shared task queue | task | max, sum, count |
content.proton.executor.shared.accepted | Number of executor shared accepted tasks | task | rate |
content.proton.executor.shared.wakeups | Number of times a executor shared worker thread has been woken up | wakeup | rate |
content.proton.executor.shared.utilization | Ratio of time the executor shared worker threads has been active | fraction | max, sum, count |
content.proton.executor.warmup.queuesize | Size of executor warmup task queue | task | max, sum, count |
content.proton.executor.warmup.accepted | Number of accepted executor warmup tasks | task | rate |
content.proton.executor.warmup.wakeups | Number of times a warmup executor worker thread has been woken up | wakeup | rate |
content.proton.executor.warmup.utilization | Ratio of time the executor warmup worker threads has been active | fraction | max, sum, count |
content.proton.executor.field_writer.queuesize | Size of executor field writer task queue | task | max, sum, count |
content.proton.executor.field_writer.accepted | Number of accepted executor field writer tasks | task | rate |
content.proton.executor.field_writer.wakeups | Number of times an executor field writer worker thread has been woken up | wakeup | rate |
content.proton.executor.field_writer.utilization | Ratio of time the executor field writer worker threads have been active | fraction | max, sum, count |
content.proton.documentdb.job.total | The job load average total of all job metrics | fraction | average |
content.proton.documentdb.job.attribute_flush | Flushing of attribute vector(s) to disk | fraction | average |
content.proton.documentdb.job.memory_index_flush | Flushing of memory index to disk | fraction | average |
content.proton.documentdb.job.disk_index_fusion | Fusion of disk indexes | fraction | average |
content.proton.documentdb.job.document_store_flush | Flushing of document store to disk | fraction | average |
content.proton.documentdb.job.document_store_compact | Compaction of document store on disk | fraction | average |
content.proton.documentdb.job.bucket_move | Moving of buckets between 'ready' and 'notready' sub databases | fraction | average |
content.proton.documentdb.job.lid_space_compact | Compaction of lid space in document meta store and attribute vectors | fraction | average |
content.proton.documentdb.job.removed_documents_prune | Pruning of removed documents in 'removed' sub database | fraction | average |
content.proton.documentdb.threading_service.master.queuesize | Size of threading service master task queue | task | max, sum, count |
content.proton.documentdb.threading_service.master.accepted | Number of accepted threading service master tasks | task | rate |
content.proton.documentdb.threading_service.master.wakeups | Number of times a threading service master worker thread has been woken up | wakeup | rate |
content.proton.documentdb.threading_service.master.utilization | Ratio of time the threading service master worker threads have been active | fraction | max, sum, count |
content.proton.documentdb.threading_service.index.queuesize | Size of threading service index task queue | task | max, sum, count |
content.proton.documentdb.threading_service.index.accepted | Number of accepted threading service index tasks | task | rate |
content.proton.documentdb.threading_service.index.wakeups | Number of times a threading service index worker thread has been woken up | wakeup | rate |
content.proton.documentdb.threading_service.index.utilization | Ratio of time the threading service index worker threads have been active | fraction | max, sum, count |
content.proton.documentdb.threading_service.summary.queuesize | Size of threading service summary task queue | task | max, sum, count |
content.proton.documentdb.threading_service.summary.accepted | Number of accepted threading service summary tasks | task | rate |
content.proton.documentdb.threading_service.summary.wakeups | Number of times a threading service summary worker thread has been woken up | wakeup | rate |
content.proton.documentdb.threading_service.summary.utilization | Ratio of time the threading service summary worker threads have been active | fraction | max, sum, count |
content.proton.documentdb.ready.lid_space.lid_bloat_factor | The bloat factor of this lid space, indicating the total amount of holes in the allocated lid space ((lid_limit - used_lids) / lid_limit) | fraction | average |
content.proton.documentdb.ready.lid_space.lid_fragmentation_factor | The fragmentation factor of this lid space, indicating the amount of holes in the currently used part of the lid space ((highest_used_lid - used_lids) / highest_used_lid) | fraction | average |
content.proton.documentdb.ready.lid_space.lid_limit | The size of the allocated lid space | documentid | last |
content.proton.documentdb.ready.lid_space.highest_used_lid | The highest used lid | documentid | last |
content.proton.documentdb.ready.lid_space.used_lids | The number of lids used | documentid | last |
content.proton.documentdb.notready.lid_space.lid_bloat_factor | The bloat factor of this lid space, indicating the total amount of holes in the allocated lid space ((lid_limit - used_lids) / lid_limit) | fraction | average |
content.proton.documentdb.notready.lid_space.lid_fragmentation_factor | The fragmentation factor of this lid space, indicating the amount of holes in the currently used part of the lid space ((highest_used_lid - used_lids) / highest_used_lid) | fraction | average |
content.proton.documentdb.notready.lid_space.lid_limit | The size of the allocated lid space | documentid | last |
content.proton.documentdb.notready.lid_space.highest_used_lid | The highest used lid | documentid | last |
content.proton.documentdb.notready.lid_space.used_lids | The number of lids used | documentid | last |
content.proton.documentdb.removed.lid_space.lid_bloat_factor | The bloat factor of this lid space, indicating the total amount of holes in the allocated lid space ((lid_limit - used_lids) / lid_limit) | fraction | average |
content.proton.documentdb.removed.lid_space.lid_fragmentation_factor | The fragmentation factor of this lid space, indicating the amount of holes in the currently used part of the lid space ((highest_used_lid - used_lids) / highest_used_lid) | fraction | average |
content.proton.documentdb.removed.lid_space.lid_limit | The size of the allocated lid space | documentid | last |
content.proton.documentdb.removed.lid_space.highest_used_lid | The highest used lid | documentid | last |
content.proton.documentdb.removed.lid_space.used_lids | The number of lids used | documentid | last |
content.proton.documentdb.bucket_move.buckets_pending | The number of buckets left to move | bucket | last |
content.proton.resource_usage.disk | The relative amount of disk used by this content node (transient usage not included, value in the range [0, 1]). Same value as reported to the cluster controller | fraction | average |
content.proton.resource_usage.disk_usage.total | The total relative amount of disk used by this content node (value in the range [0, 1]) | fraction | max |
content.proton.resource_usage.disk_usage.total_utilization | The relative amount of disk used compared to the content node disk resource limit | fraction | max |
content.proton.resource_usage.disk_usage.transient | The relative amount of transient disk used by this content node (value in the range [0, 1]) | fraction | max |
content.proton.resource_usage.memory | The relative amount of memory used by this content node (transient usage not included, value in the range [0, 1]). Same value as reported to the cluster controller | fraction | average |
content.proton.resource_usage.memory_usage.total | The total relative amount of memory used by this content node (value in the range [0, 1]) | fraction | max |
content.proton.resource_usage.memory_usage.total_utilization | The relative amount of memory used compared to the content node memory resource limit | fraction | max |
content.proton.resource_usage.memory_usage.transient | The relative amount of transient memory used by this content node (value in the range [0, 1]) | fraction | max |
content.proton.resource_usage.memory_mappings | The number of mapped memory areas | area | max |
content.proton.resource_usage.open_file_descriptors | The number of open files | file | max |
content.proton.resource_usage.feeding_blocked | Whether feeding is blocked due to resource limits being reached (value is either 0 or 1) | binary | max |
content.proton.resource_usage.malloc_arena | Size of malloc arena | byte | max |
content.proton.documentdb.attribute.resource_usage.address_space | The max relative address space used among components in all attribute vectors in this document db (value in the range [0, 1]) | fraction | max |
content.proton.documentdb.attribute.resource_usage.feeding_blocked | Whether feeding is blocked due to attribute resource limits being reached (value is either 0 or 1) | binary | max |
content.proton.resource_usage.cpu_util.setup | CPU used by system initialization and (re-)configuration | fraction | max, sum, count |
content.proton.resource_usage.cpu_util.read | CPU used for reading data from the system | fraction | max, sum, count |
content.proton.resource_usage.cpu_util.write | CPU used for writing data to the system | fraction | max, sum, count |
content.proton.resource_usage.cpu_util.compact | CPU used for internal data restructuring | fraction | max, sum, count |
content.proton.resource_usage.cpu_util.other | CPU used by work not classified as a specific category | fraction | max, sum, count |
content.proton.transactionlog.entries | The current number of entries in the transaction log | record | average |
content.proton.transactionlog.disk_usage | The disk usage (in bytes) of the transaction log | byte | average |
content.proton.transactionlog.replay_time | The replay time (in seconds) of the transaction log during start-up | second | last |
content.proton.documentdb.ready.document_store.disk_usage | Disk space usage in bytes | byte | average |
content.proton.documentdb.ready.document_store.disk_bloat | Disk space bloat in bytes | byte | average |
content.proton.documentdb.ready.document_store.max_bucket_spread | Max bucket spread in underlying files (sum(unique buckets in each chunk)/unique buckets in file) | fraction | average |
content.proton.documentdb.ready.document_store.memory_usage.allocated_bytes | The number of allocated bytes | byte | average |
content.proton.documentdb.ready.document_store.memory_usage.used_bytes | The number of used bytes (<= allocated_bytes) | byte | average |
content.proton.documentdb.ready.document_store.memory_usage.onhold_bytes | The number of bytes on hold | byte | average |
content.proton.documentdb.notready.document_store.disk_usage | Disk space usage in bytes | byte | average |
content.proton.documentdb.notready.document_store.disk_bloat | Disk space bloat in bytes | byte | average |
content.proton.documentdb.notready.document_store.max_bucket_spread | Max bucket spread in underlying files (sum(unique buckets in each chunk)/unique buckets in file) | fraction | average |
content.proton.documentdb.notready.document_store.memory_usage.allocated_bytes | The number of allocated bytes | byte | average |
content.proton.documentdb.notready.document_store.memory_usage.used_bytes | The number of used bytes (<= allocated_bytes) | byte | average |
content.proton.documentdb.notready.document_store.memory_usage.dead_bytes | The number of dead bytes (<= used_bytes) | byte | average |
content.proton.documentdb.notready.document_store.memory_usage.onhold_bytes | The number of bytes on hold | byte | average |
content.proton.documentdb.removed.document_store.disk_usage | Disk space usage in bytes | byte | average |
content.proton.documentdb.removed.document_store.disk_bloat | Disk space bloat in bytes | byte | average |
content.proton.documentdb.removed.document_store.max_bucket_spread | Max bucket spread in underlying files (sum(unique buckets in each chunk)/unique buckets in file) | fraction | average |
content.proton.documentdb.removed.document_store.memory_usage.allocated_bytes | The number of allocated bytes | byte | average |
content.proton.documentdb.removed.document_store.memory_usage.used_bytes | The number of used bytes (<= allocated_bytes) | byte | average |
content.proton.documentdb.removed.document_store.memory_usage.dead_bytes | The number of dead bytes (<= used_bytes) | byte | average |
content.proton.documentdb.removed.document_store.memory_usage.onhold_bytes | The number of bytes on hold | byte | average |
content.proton.documentdb.ready.document_store.cache.memory_usage | Memory usage of the cache (in bytes) | byte | average |
content.proton.documentdb.ready.document_store.cache.hit_rate | Rate of hits in the cache compared to number of lookups | fraction | average |
content.proton.documentdb.ready.document_store.cache.lookups | Number of lookups in the cache (hits + misses) | operation | rate |
content.proton.documentdb.ready.document_store.cache.invalidations | Number of invalidations (erased elements) in the cache. | operation | rate |
content.proton.documentdb.notready.document_store.cache.memory_usage | Memory usage of the cache (in bytes) | byte | average |
content.proton.documentdb.notready.document_store.cache.hit_rate | Rate of hits in the cache compared to number of lookups | fraction | average |
content.proton.documentdb.notready.document_store.cache.lookups | Number of lookups in the cache (hits + misses) | operation | rate |
content.proton.documentdb.notready.document_store.cache.invalidations | Number of invalidations (erased elements) in the cache. | operation | rate |
content.proton.documentdb.ready.attribute.memory_usage.allocated_bytes | The number of allocated bytes | byte | average |
content.proton.documentdb.ready.attribute.memory_usage.used_bytes | The number of used bytes (<= allocated_bytes) | byte | average |
content.proton.documentdb.ready.attribute.memory_usage.dead_bytes | The number of dead bytes (<= used_bytes) | byte | average |
content.proton.documentdb.ready.attribute.memory_usage.onhold_bytes | The number of bytes on hold | byte | average |
content.proton.documentdb.notready.attribute.memory_usage.allocated_bytes | The number of allocated bytes | byte | average |
content.proton.documentdb.notready.attribute.memory_usage.used_bytes | The number of used bytes (<= allocated_bytes) | byte | average |
content.proton.documentdb.notready.attribute.memory_usage.dead_bytes | The number of dead bytes (<= used_bytes) | byte | average |
content.proton.documentdb.notready.attribute.memory_usage.onhold_bytes | The number of bytes on hold | byte | average |
content.proton.documentdb.index.memory_usage.allocated_bytes | The number of allocated bytes | byte | average |
content.proton.documentdb.index.memory_usage.used_bytes | The number of used bytes (<= allocated_bytes) | byte | average |
content.proton.documentdb.index.memory_usage.dead_bytes | The number of dead bytes (<= used_bytes) | byte | average |
content.proton.documentdb.index.memory_usage.onhold_bytes | The number of bytes on hold | byte | average |
content.proton.documentdb.matching.queries | Number of queries executed | query | rate |
content.proton.documentdb.matching.soft_doomed_queries | Number of queries hitting the soft timeout | query | rate |
content.proton.documentdb.matching.query_latency | Total average latency (sec) when matching and ranking a query | second | max, sum, count |
content.proton.documentdb.matching.query_setup_time | Average time (sec) spent setting up and tearing down queries | second | max, sum, count |
content.proton.documentdb.matching.docs_matched | Number of documents matched | document | rate, count |
content.proton.documentdb.matching.rank_profile.queries | Number of queries executed | query | rate |
content.proton.documentdb.matching.rank_profile.soft_doomed_queries | Number of queries hitting the soft timeout | query | rate |
content.proton.documentdb.matching.rank_profile.soft_doom_factor | Factor used to compute soft-timeout | fraction | min, max, sum, count |
content.proton.documentdb.matching.rank_profile.query_latency | Total average latency (sec) when matching and ranking a query | second | max, sum, count |
content.proton.documentdb.matching.rank_profile.query_setup_time | Average time (sec) spent setting up and tearing down queries | second | max, sum, count |
content.proton.documentdb.matching.rank_profile.grouping_time | Average time (sec) spent on grouping | second | max, sum, count |
content.proton.documentdb.matching.rank_profile.rerank_time | Average time (sec) spent on 2nd phase ranking | second | max, sum, count |
content.proton.documentdb.matching.rank_profile.docs_matched | Number of documents matched | document | rate, count |
content.proton.documentdb.matching.rank_profile.limited_queries | Number of queries limited in match phase | query | rate |
content.proton.documentdb.feeding.commit.operations | Number of operations included in a commit | operation | max, sum, count, rate |
content.proton.documentdb.feeding.commit.latency | Latency for commit in seconds | second | max, sum, count |
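
The lid-space and document-store formulas quoted in the table above can be checked by hand. The following is a minimal Python sketch, assuming you already have the raw `lid_limit`, `highest_used_lid` and `used_lids` values for one sub-database and, for bucket spread, a hypothetical list of the bucket IDs held in each chunk of a file. `LidSpace`, `bucket_spread` and `max_bucket_spread` are illustrative names, not part of Vespa.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LidSpace:
    """Hypothetical snapshot of one sub-database lid space (ready, notready or removed)."""

    lid_limit: int         # size of the allocated lid space
    highest_used_lid: int  # highest lid currently in use
    used_lids: int         # number of lids in use

    def bloat_factor(self) -> float:
        # Holes in the whole allocated space: (lid_limit - used_lids) / lid_limit
        if self.lid_limit == 0:
            return 0.0
        return (self.lid_limit - self.used_lids) / self.lid_limit

    def fragmentation_factor(self) -> float:
        # Holes in the used part only: (highest_used_lid - used_lids) / highest_used_lid
        if self.highest_used_lid == 0:
            return 0.0
        return (self.highest_used_lid - self.used_lids) / self.highest_used_lid


def bucket_spread(chunks: list[set[int]]) -> float:
    """sum(unique buckets in each chunk) / unique buckets in file, for one file.

    `chunks` holds the set of bucket IDs found in each chunk (an assumed input shape).
    A value of 1.0 means no bucket is split across chunks.
    """
    unique_in_file = set().union(*chunks)
    if not unique_in_file:
        return 0.0
    return sum(len(chunk) for chunk in chunks) / len(unique_in_file)


def max_bucket_spread(files: list[list[set[int]]]) -> float:
    # Assumption: the metric reports the largest spread among the underlying files.
    return max((bucket_spread(chunks) for chunks in files), default=0.0)


if __name__ == "__main__":
    ready = LidSpace(lid_limit=1000, highest_used_lid=800, used_lids=600)
    print(ready.bloat_factor())          # 0.4
    print(ready.fragmentation_factor())  # 0.25
    print(max_bucket_spread([[{1}, {2}], [{1, 2}, {2, 3}, {3}]]))
```

Because `lid_limit - used_lids` equals `(lid_limit - highest_used_lid) + (highest_used_lid - used_lids)`, a high bloat factor combined with a low fragmentation factor means most free lids lie above `highest_used_lid` rather than as holes inside the used range.
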

Name | Description | Unit | Suffixes |
---|---|---|---|
sentinel.restarts | Number of service restarts done by the sentinel | restart | count |
sentinel.totalRestarts | Total number of service restarts done by the sentinel since the sentinel was started | restart | last |
sentinel.uptime | Time the sentinel has been running | second | last |
sentinel.running | Number of services currently running under the sentinel | instance | count, last |

Name | Description | Unit | Suffixes |
---|---|---|---|
slobrok.heartbeats.failed | Number of heartbeat requests failed | request | count |
slobrok.missing.consensus | Number of seconds without full consensus with all other brokers | second | count |

Name | Description | Unit | Suffixes |
---|---|---|---|
vds.server.network.tls-handshakes-failed | Number of client or server connection attempts that failed during TLS handshaking | operation | count |
vds.server.network.peer-authorization-failures | Number of TLS connection attempts failed due to bad or missing peer certificate credentials | failure | count |
vds.server.network.client.tls-connections-established | Number of secure mTLS connections established with this node acting as client | connection | count |
vds.server.network.server.tls-connections-established | Number of secure mTLS connections established with this node acting as server | connection | count |
vds.server.network.client.insecure-connections-established | Number of insecure (plaintext) connections established with this node acting as client | connection | count |
vds.server.network.server.insecure-connections-established | Number of insecure (plaintext) connections established with this node acting as server | connection | count |
vds.server.network.tls-connections-broken | Number of TLS connections broken due to failures during frame encoding or decoding | connection | count |
vds.server.network.failed-tls-config-reloads | Number of times background reloading of TLS config has failed | failure | count |
vds.server.network.rpc-capability-checks-failed | Number of RPC operations that failed due to one or more missing capabilities | failure | count |
vds.server.network.status-capability-checks-failed | Number of status page operations that failed due to one or more missing capabilities | failure | count |
vds.server.fnet.num-connections | Total number of connection objects | connection | count |
vds.datastored.alldisks.buckets | Number of buckets managed | bucket | average |
vds.datastored.alldisks.docs | Number of documents stored | document | average |
vds.datastored.alldisks.bytes | Number of bytes stored | byte | average |
vds.visitor.allthreads.averagevisitorlifetime | Average lifetime of a visitor | millisecond | max, sum, count |
vds.visitor.allthreads.averagequeuewait | Average time an operation spends in input queue. | millisecond | max, sum, count |
vds.visitor.allthreads.queuesize | Size of input message queue. | operation | max, sum, count |
vds.visitor.allthreads.completed | Number of visitors completed | operation | rate |
vds.visitor.allthreads.created | Number of visitors created. | operation | rate |
vds.visitor.allthreads.failed | Number of visitors failed | operation | rate |
vds.visitor.allthreads.averagemessagesendtime | Average time it takes for messages to be sent to their target (and be replied to) | millisecond | max, sum, count |
vds.visitor.allthreads.averageprocessingtime | Average time used to process visitor requests | millisecond | max, sum, count |
vds.filestor.queuesize | Size of input message queue. | operation | max, sum, count |
vds.filestor.averagequeuewait | Average time an operation spends in input queue. | millisecond | max, sum, count |
vds.filestor.active_operations.size | Number of concurrent active operations | operation | max, sum, count |
vds.filestor.active_operations.latency | Latency (in ms) for completed operations | millisecond | max, sum, count |
vds.filestor.throttle_window_size | Current size of the async operation throttler window | operation | max, sum, count |
vds.filestor.throttle_waiting_threads | Number of threads waiting to acquire a throttle token | thread | max, sum, count |
vds.filestor.throttle_active_tokens | Current number of active throttle tokens | instance | max, sum, count |
vds.filestor.allthreads.mergemetadatareadlatency | Time spent in a merge step checking the current node's metadata to see what data it has. | millisecond | max, sum, count |
vds.filestor.allthreads.mergedatareadlatency | Time spent in a merge step reading data that other nodes need. | millisecond | max, sum, count |
vds.filestor.allthreads.mergedatawritelatency | Time spent in a merge step writing data needed by the current node. | millisecond | max, sum, count |
vds.filestor.allthreads.put_latency | Latency of individual puts that are part of merge operations | millisecond | max, sum, count |
vds.filestor.allthreads.remove_latency | Latency of individual removes that are part of merge operations | millisecond | max, sum, count |
vds.filestor.allstripes.throttled_rpc_direct_dispatches | Number of times an RPC thread could not dispatch an async operation directly to Proton because it was disallowed by the throttle policy | instance | rate |
vds.filestor.allstripes.throttled_persistence_thread_polls | Number of times a persistence thread could not immediately dispatch a queued async operation because it was disallowed by the throttle policy | instance | rate |
vds.filestor.allstripes.timeouts_waiting_for_throttle_token | Number of times a persistence thread timed out waiting for an available throttle policy token | instance | rate |
vds.filestor.allthreads.put.count | Number of requests processed. | operation | rate |
vds.filestor.allthreads.put.failed | Number of failed requests. | operation | rate |
vds.filestor.allthreads.put.test_and_set_failed | Number of operations that were skipped because a test-and-set condition was not met | operation | rate |
vds.filestor.allthreads.put.latency | Latency of successful requests. | millisecond | max, sum, count |
vds.filestor.allthreads.put.request_size | Size of requests, in bytes | byte | max, sum, count |
vds.filestor.allthreads.remove.count | Number of requests processed. | operation | rate |
vds.filestor.allthreads.remove.failed | Number of failed requests. | operation | rate |
vds.filestor.allthreads.remove.test_and_set_failed | Number of operations that were skipped because a test-and-set condition was not met | operation | rate |
vds.filestor.allthreads.remove.latency | Latency of successful requests. | millisecond | max, sum, count |
vds.filestor.allthreads.remove.request_size | Size of requests, in bytes | byte | max, sum, count |
vds.filestor.allthreads.get.count | Number of requests processed. | operation | rate |
vds.filestor.allthreads.get.failed | Number of failed requests. | operation | rate |
vds.filestor.allthreads.get.latency | Latency of successful requests. | millisecond | max, sum, count |
vds.filestor.allthreads.get.request_size | Size of requests, in bytes | byte | max, sum, count |
vds.filestor.allthreads.update.count | Number of requests processed. | request | rate |
vds.filestor.allthreads.update.failed | Number of failed requests. | request | rate |
vds.filestor.allthreads.update.test_and_set_failed | Number of requests that were skipped because a test-and-set condition was not met | request | rate |
vds.filestor.allthreads.update.latency | Latency of successful requests. | millisecond | max, sum, count |
vds.filestor.allthreads.update.request_size | Size of requests, in bytes | byte | max, sum, count |
vds.filestor.allthreads.createiterator.count | Number of requests processed. | request | rate |
vds.filestor.allthreads.createiterator.latency | Latency of successful requests. | millisecond | max, sum, count |
vds.filestor.allthreads.visit.count | Number of requests processed. | request | rate |
vds.filestor.allthreads.visit.latency | Latency of successful requests. | millisecond | max, sum, count |
vds.filestor.allthreads.remove_location.count | Number of requests processed. | request | rate |
vds.filestor.allthreads.remove_location.latency | Latency of successful requests. | millisecond | max, sum, count |
vds.filestor.allthreads.splitbuckets.count | Number of requests processed. | request | rate |
vds.filestor.allthreads.joinbuckets.count | Number of requests processed. | request | rate |
vds.filestor.allthreads.deletebuckets.count | Number of requests processed. | request | rate |
vds.filestor.allthreads.deletebuckets.failed | Number of failed requests. | request | rate |
vds.filestor.allthreads.deletebuckets.latency | Latency of successful requests. | millisecond | max, sum, count |
vds.filestor.allthreads.setbucketstates.count | Number of requests processed. | request | rate |
vds.mergethrottler.averagequeuewaitingtime | Time merges spent in the throttler queue | millisecond | max, sum, count |
vds.mergethrottler.queuesize | Length of merge queue | instance | max, sum, count |
vds.mergethrottler.active_window_size | Number of merges active within the pending window size | instance | max, sum, count |
vds.mergethrottler.bounced_due_to_back_pressure | Number of merges bounced due to resource exhaustion back-pressure | instance | rate |
vds.mergethrottler.locallyexecutedmerges.ok | The number of successful merges for 'locallyexecutedmerges' | instance | rate |
vds.mergethrottler.mergechains.ok | The number of successful merges for 'mergechains' | operation | rate |
vds.mergethrottler.mergechains.failures.busy | The number of merges that failed because the storage node was busy | operation | rate |
vds.mergethrottler.mergechains.failures.total | Sum of all failures | operation | rate |
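
Many metrics above, such as `vds.filestor.allthreads.put.latency`, are listed only with the `max`, `sum` and `count` suffixes, so an average has to be derived as `sum / count`. When combining several nodes (or several snapshot periods), the sums and counts should be added separately before dividing; averaging per-node averages over-weights lightly loaded nodes. The sketch below assumes the snapshot has already been flattened into a Python dict keyed `<metric name>.<suffix>`; that key layout and the sample numbers are hypothetical, not a documented Vespa output format.

```python
from __future__ import annotations


def average(snapshot: dict[str, float], metric: str) -> float | None:
    """Mean value for one node: <metric>.sum / <metric>.count, or None if no samples."""
    count = snapshot.get(f"{metric}.count", 0.0)
    if count == 0:
        return None
    return snapshot.get(f"{metric}.sum", 0.0) / count


def pooled_average(snapshots: list[dict[str, float]], metric: str) -> float | None:
    """Mean across nodes: add sums and counts first, then divide once."""
    total_sum = sum(s.get(f"{metric}.sum", 0.0) for s in snapshots)
    total_count = sum(s.get(f"{metric}.count", 0.0) for s in snapshots)
    if total_count == 0:
        return None
    return total_sum / total_count


def pooled_max(snapshots: list[dict[str, float]], metric: str) -> float | None:
    """Worst case across nodes: the maximum of the per-node maxima."""
    maxima = [s[f"{metric}.max"] for s in snapshots if f"{metric}.max" in s]
    return max(maxima) if maxima else None


if __name__ == "__main__":
    metric = "vds.filestor.allthreads.put.latency"
    # Hypothetical numbers for two content nodes (milliseconds).
    node_a = {f"{metric}.sum": 300.0, f"{metric}.count": 100.0, f"{metric}.max": 40.0}
    node_b = {f"{metric}.sum": 50.0, f"{metric}.count": 10.0, f"{metric}.max": 12.0}
    print(average(node_a, metric), average(node_b, metric))  # 3.0 5.0
    print(pooled_average([node_a, node_b], metric))  # about 3.18, not (3.0 + 5.0) / 2
    print(pooled_max([node_a, node_b], metric))  # 40.0
```

The same pattern applies to any metric in these tables that carries the `sum` and `count` suffixes, for example `content.proton.documentdb.matching.query_latency`.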