This document is a reference for the Vespa metric set, including the suffixes available for each metric. If the Suffixes column contains "N/A", the base name of the metric is used with no suffix.
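
The suffix rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of Vespa: the helper name `expand_metric_names` is hypothetical, and joining the base name and suffix with a dot is an assumption about how the full names are formed.

```python
def expand_metric_names(base_name, suffixes):
    """Return the full metric names for one table row.

    `suffixes` is the text of the Suffixes column, e.g. "last, max" or "N/A".
    Assumption: base name and suffix are joined with a dot.
    """
    cleaned = [s.strip() for s in suffixes.split(",") if s.strip()]
    if cleaned == ["N/A"]:
        # "N/A" means the base name is used on its own.
        return [base_name]
    return [f"{base_name}.{suffix}" for suffix in cleaned]


with_suffixes = expand_metric_names("cluster-controller.down.count", "last, max")
without_suffix = expand_metric_names("application_generation", "N/A")
```

Under these assumptions, the first call yields one name per listed suffix, and the second yields only the base name.
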
Name | Unit | Suffixes | Description |
---|---|---|---|
cluster-controller.down.count |
node | last, max | Number of content nodes down |
cluster-controller.initializing.count |
node | last, max | Number of content nodes initializing |
cluster-controller.maintenance.count |
node | last, max | Number of content nodes in maintenance |
cluster-controller.retired.count |
node | last, max | Number of content nodes that are retired |
cluster-controller.stopping.count |
node | last | Number of content nodes currently stopping |
cluster-controller.up.count |
node | last, max | Number of content nodes up |
cluster-controller.nodes-not-converged |
node | max | Number of nodes not converging to the latest cluster state version |
cluster-controller.cluster-buckets-out-of-sync-ratio |
fraction | max | Ratio of buckets in the cluster currently in need of syncing |
cluster-controller.busy-tick-time-ms |
millisecond | count, last, max, sum | Time busy |
cluster-controller.idle-tick-time-ms |
millisecond | count, last, max, sum | Time idle |
cluster-controller.work-ms |
millisecond | count, last, sum | Time used for actual work |
cluster-controller.is-master |
binary | last, max | 1 if this cluster controller is currently the master, or 0 if not |
cluster-controller.remote-task-queue.size |
operation | last | Number of remote tasks queued |
cluster-controller.resource_usage.nodes_above_limit |
node | last, max | The number of content nodes above resource limit, blocking feed |
cluster-controller.resource_usage.max_memory_utilization |
fraction | last, max | Current memory utilisation, for content node with highest value |
cluster-controller.resource_usage.max_disk_utilization |
fraction | last, max | Current disk space utilisation, for content node with highest value |
cluster-controller.resource_usage.memory_limit |
fraction | last, max | Memory space limit as a fraction of available memory |
cluster-controller.resource_usage.disk_limit |
fraction | last, max | Disk space limit as a fraction of available disk space |
reindexing.progress |
fraction | last, max | Re-indexing progress |
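
The resource usage metrics in the table above can be combined to estimate how close the most loaded content node is to the limit at which feed is blocked. The following is a rough sketch only: the function name and the sample values are hypothetical, and it assumes the utilization and limit metrics are fractions on the same scale, as their descriptions suggest.

```python
def feed_block_headroom(max_utilization, limit):
    """Return the remaining headroom as a fraction.

    A value at or below zero suggests the most loaded node has reached
    the limit. Both arguments are fractions between 0 and 1.
    """
    return limit - max_utilization


# Hypothetical sample values, not real measurements.
sample = {
    "cluster-controller.resource_usage.max_disk_utilization.last": 0.5,
    "cluster-controller.resource_usage.disk_limit.last": 0.75,
}

headroom = feed_block_headroom(
    sample["cluster-controller.resource_usage.max_disk_utilization.last"],
    sample["cluster-controller.resource_usage.disk_limit.last"],
)
```

The same calculation applies to the memory metrics by swapping in `max_memory_utilization` and `memory_limit`.
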
Name | Unit | Suffixes | Description |
---|---|---|---|
http.status.1xx |
response | rate | Number of responses with a 1xx status |
http.status.2xx |
response | rate | Number of responses with a 2xx status |
http.status.3xx |
response | rate | Number of responses with a 3xx status |
http.status.4xx |
response | rate | Number of responses with a 4xx status |
http.status.5xx |
response | rate | Number of responses with a 5xx status |
application_generation |
version | N/A | The currently live application config generation (aka session id) |
jdisc.gc.count |
operation | average, last, max | Number of JVM garbage collections done |
jdisc.gc.ms |
millisecond | average, last, max | Time spent in JVM garbage collection |
jdisc.jvm |
version | last | JVM runtime version |
jdisc.memory_mappings |
operation | max | JDISC Memory mappings |
jdisc.open_file_descriptors |
item | max | JDISC Open file descriptors |
jdisc.thread_pool.unhandled_exceptions |
thread | count, last, max, min, sum | Number of exceptions thrown by tasks |
jdisc.thread_pool.work_queue.capacity |
thread | count, last, max, min, sum | Capacity of the task queue |
jdisc.thread_pool.work_queue.size |
thread | count, last, max, min, sum | Size of the task queue |
jdisc.thread_pool.rejected_tasks |
thread | count, last, max, min, sum | Number of tasks rejected by the thread pool |
jdisc.thread_pool.size |
thread | count, last, max, min, sum | Size of the thread pool |
jdisc.thread_pool.max_allowed_size |
thread | count, last, max, min, sum | The maximum allowed number of threads in the pool |
jdisc.thread_pool.active_threads |
thread | count, last, max, min, sum | Number of threads that are active |
jdisc.deactivated_containers.total |
item | last, sum | JDISC Deactivated container instances |
jdisc.deactivated_containers.with_retained_refs.last |
item | last | JDISC Deactivated container nodes with retained refs |
jdisc.application.failed_component_graphs |
item | rate | JDISC Application failed component graphs |
jdisc.application.component_graph.creation_time_millis |
millisecond | last | JDISC Application component graph creation time |
jdisc.application.component_graph.reconfigurations |
item | rate | JDISC Application component graph reconfigurations |
jdisc.singleton.is_active |
item | last, max, min | JDISC Singleton is active |
jdisc.singleton.activation.count |
operation | last | JDISC Singleton activations |
jdisc.singleton.activation.failure.count |
operation | last | JDISC Singleton activation failures |
jdisc.singleton.activation.millis |
millisecond | last | JDISC Singleton activation time |
jdisc.singleton.deactivation.count |
operation | last | JDISC Singleton deactivations |
jdisc.singleton.deactivation.failure.count |
operation | last | JDISC Singleton deactivation failures |
jdisc.singleton.deactivation.millis |
millisecond | last | JDISC Singleton deactivation time |
jdisc.http.ssl.handshake.failure.missing_client_cert |
operation | rate | JDISC HTTP SSL Handshake failures due to missing client certificate |
jdisc.http.ssl.handshake.failure.expired_client_cert |
operation | rate | JDISC HTTP SSL Handshake failures due to expired client certificate |
jdisc.http.ssl.handshake.failure.invalid_client_cert |
operation | rate | JDISC HTTP SSL Handshake failures due to invalid client certificate |
jdisc.http.ssl.handshake.failure.incompatible_protocols |
operation | rate | JDISC HTTP SSL Handshake failures due to incompatible protocols |
jdisc.http.ssl.handshake.failure.incompatible_chifers |
operation | rate | JDISC HTTP SSL Handshake failures due to incompatible ciphers |
jdisc.http.ssl.handshake.failure.connection_closed |
operation | rate | JDISC HTTP SSL Handshake failures due to connection closed |
jdisc.http.ssl.handshake.failure.unknown |
operation | rate | JDISC HTTP SSL Handshake failures for unknown reason |
jdisc.http.request.prematurely_closed |
request | rate | HTTP requests prematurely closed |
jdisc.http.request.requests_per_connection |
request | average, count, max, min, sum | HTTP requests per connection |
jdisc.http.request.uri_length |
byte | count, max, sum | HTTP URI length |
jdisc.http.request.content_size |
byte | count, max, sum | HTTP request content size |
jdisc.http.requests |
request | count, rate | HTTP requests |
jdisc.http.filter.rule.blocked_requests |
request | rate | Number of requests blocked by filter |
jdisc.http.filter.rule.allowed_requests |
request | rate | Number of requests allowed by filter |
jdisc.http.filtering.request.handled |
request | rate | Number of filtering requests handled |
jdisc.http.filtering.request.unhandled |
request | rate | Number of filtering requests unhandled |
jdisc.http.filtering.response.handled |
request | rate | Number of filtering responses handled |
jdisc.http.filtering.response.unhandled |
request | rate | Number of filtering responses unhandled |
jdisc.http.handler.unhandled_exceptions |
request | rate | Number of unhandled exceptions in handler |
jdisc.tls.capability_checks.succeeded |
operation | rate | Number of TLS capability checks succeeded |
jdisc.tls.capability_checks.failed |
operation | rate | Number of TLS capability checks failed |
jdisc.http.jetty.threadpool.thread.max |
thread | count, last, max, min, sum | Configured maximum number of threads |
jdisc.http.jetty.threadpool.thread.min |
thread | count, last, max, min, sum | Configured minimum number of threads |
jdisc.http.jetty.threadpool.thread.reserved |
thread | count, last, max, min, sum | Configured number of reserved threads or -1 for heuristic |
jdisc.http.jetty.threadpool.thread.busy |
thread | count, last, max, min, sum | Number of threads executing internal and transient jobs |
jdisc.http.jetty.threadpool.thread.total |
thread | count, last, max, min, sum | Current number of threads |
jdisc.http.jetty.threadpool.queue.size |
thread | count, last, max, min, sum | Current size of the job queue |
serverNumOpenConnections |
connection | average, last, max | The number of currently open connections |
serverNumConnections |
connection | average, last, max | The total number of connections opened |
serverBytesReceived |
byte | count, sum | The number of bytes received by the server |
serverBytesSent |
byte | count, sum | The number of bytes sent from the server |
handled.requests |
operation | count | The number of requests handled per metrics snapshot |
handled.latency |
millisecond | count, max, sum | The time used for requests during this metrics snapshot |
httpapi_latency |
millisecond | count, max, sum | Duration for requests to the HTTP document APIs |
httpapi_pending |
operation | count, max, sum | Document operations pending execution |
httpapi_num_operations |
operation | rate | Total number of document operations performed |
httpapi_num_updates |
operation | rate | Document update operations performed |
httpapi_num_removes |
operation | rate | Document remove operations performed |
httpapi_num_puts |
operation | rate | Document put operations performed |
httpapi_succeeded |
operation | rate | Document operations that succeeded |
httpapi_failed |
operation | rate | Document operations that failed |
httpapi_parse_error |
operation | rate | Document operations that failed due to document parse errors |
httpapi_condition_not_met |
operation | rate | Document operations not applied due to condition not met |
httpapi_not_found |
operation | rate | Document operations not applied due to document not found |
httpapi_failed_unknown |
operation | rate | Document operations failed by unknown cause |
httpapi_failed_timeout |
operation | rate | Document operations failed by timeout |
httpapi_failed_insufficient_storage |
operation | rate | Document operations failed by insufficient storage |
mem.heap.total |
byte | average | Total available heap memory |
mem.heap.free |
byte | average | Free heap memory |
mem.heap.used |
byte | average, max | Currently used heap memory |
mem.direct.total |
byte | average | Total available direct memory |
mem.direct.free |
byte | average | Currently free direct memory |
mem.direct.used |
byte | average, max | Direct memory currently used |
mem.direct.count |
byte | max | Number of direct memory allocations |
mem.native.total |
byte | average | Total available native memory |
mem.native.free |
byte | average | Currently free native memory |
mem.native.used |
byte | average | Native memory currently used |
athenz-tenant-cert.expiry.seconds |
second | last, max, min | Time remaining until Athenz tenant certificate expires |
container-iam-role.expiry.seconds |
second | N/A | Time remaining until IAM role expires |
peak_qps |
query_per_second | max | The highest number of queries per second observed during this metrics snapshot |
search_connections |
connection | count, max, sum | Number of search connections |
feed.operations |
operation | rate | Number of document feed operations |
feed.latency |
millisecond | count, max, sum | Feed latency |
feed.http-requests |
operation | count, rate | Feed HTTP requests |
queries |
operation | rate | Query volume |
query_container_latency |
millisecond | count, max, sum | The query execution time consumed in the container |
query_latency |
millisecond | count, max, sum | The overall query latency as seen by the container |
query_timeout |
millisecond | count, max, min, sum | The amount of time allowed for query execution, from the client |
failed_queries |
operation | rate | The number of failed queries |
degraded_queries |
operation | rate | The number of degraded queries, e.g. due to some content nodes not responding in time |
hits_per_query |
hit_per_query | count, max, sum | The number of hits returned |
query_hit_offset |
hit | count, max, sum | The offset for hits returned |
documents_covered |
document | count | The combined number of documents considered during query evaluation |
documents_total |
document | count | The number of documents to be evaluated if all requests had been fully executed |
documents_target_total |
document | count | The target number of total documents to be evaluated when all data is in sync |
jdisc.render.latency |
nanosecond | average, count, last, max, min, sum | The time used by the container to render responses |
query_item_count |
item | count, max, sum | The number of query items (terms, phrases, etc) |
docproc.proctime |
millisecond | count, max, sum | Time spent processing document |
docproc.documents |
document | count, max, min, sum | Number of processed documents |
totalhits_per_query |
hit_per_query | count, max, sum | The total number of documents found to match queries |
empty_results |
operation | rate | Number of queries matching no documents |
requestsOverQuota |
operation | count, rate | The number of requests rejected due to exceeding quota |
relevance.at_1 |
score | count, sum | The relevance of hit number 1 |
relevance.at_3 |
score | count, sum | The relevance of hit number 3 |
relevance.at_10 |
score | count, sum | The relevance of hit number 10 |
error.timeout |
operation | rate | Requests that timed out |
error.backends_oos |
operation | rate | Requests that failed due to no available backend nodes |
error.plugin_failure |
operation | rate | Requests that failed due to plugin failure |
error.backend_communication_error |
operation | rate | Requests that failed due to backend communication error |
error.empty_document_summaries |
operation | rate | Requests that failed due to missing document summaries |
error.invalid_query_parameter |
operation | rate | Requests that failed due to invalid query parameters |
error.internal_server_error |
operation | rate | Requests that failed due to internal server error |
error.misconfigured_server |
operation | rate | Requests that failed due to misconfigured server |
error.invalid_query_transformation |
operation | rate | Requests that failed due to invalid query transformation |
error.results_with_errors |
operation | rate | The number of queries with error payload |
error.unspecified |
operation | rate | Requests that failed for an unspecified reason |
error.unhandled_exception |
operation | rate | Requests that failed due to an unhandled exception |
serverRejectedRequests |
operation | count, rate | Deprecated. Use jdisc.thread_pool.rejected_tasks instead. |
serverThreadPoolSize |
thread | last, max | Deprecated. Use jdisc.thread_pool.size instead. |
serverActiveThreads |
thread | count, last, max, min, sum | Deprecated. Use jdisc.thread_pool.active_threads instead. |
jrt.transport.tls-certificate-verification-failures |
failure | N/A | TLS certificate verification failures |
failure | N/A | TLS peer authorization failures | |
jrt.transport.server.tls-connections-established |
connection | N/A | TLS server connections established |
jrt.transport.client.tls-connections-established |
connection | N/A | TLS client connections established |
jrt.transport.server.unencrypted-connections-established |
connection | N/A | Unencrypted server connections established |
jrt.transport.client.unencrypted-connections-established |
connection | N/A | Unencrypted client connections established |
embedder.latency |
millisecond | count, max, sum | Time spent creating an embedding |
embedder.sequence_length |
byte | count, max, sum | Size of sequence produced by tokenizer |
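
Many metrics in the table above expose `count` and `sum` suffixes but no `average`. A mean for the snapshot period can then be derived as sum divided by count. The sketch below assumes a snapshot is available as a plain dictionary keyed by full metric name; the helper name and the sample values are hypothetical.

```python
def average_from_snapshot(values, base_name):
    """Derive the mean of a metric from its .sum and .count values.

    Returns None when no samples were recorded in the snapshot.
    """
    count = values.get(f"{base_name}.count", 0)
    total = values.get(f"{base_name}.sum", 0.0)
    if count == 0:
        return None
    return total / count


# Hypothetical snapshot: 200 queries taking 5000 ms in total.
snapshot = {
    "query_latency.count": 200,
    "query_latency.sum": 5000.0,
    "query_latency.max": 120.0,
}

avg_query_latency_ms = average_from_snapshot(snapshot, "query_latency")
```

Returning `None` for an empty snapshot avoids reporting a misleading zero latency when no queries were served.
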
Name | Unit | Suffixes | Description |
---|---|---|---|
vds.idealstate.buckets_rechecking |
bucket | average | The number of buckets that we are rechecking for ideal state operations |
vds.idealstate.idealstate_diff |
bucket | average | A number representing the current difference from the ideal state. It decreases steadily as the system approaches the ideal state |
vds.idealstate.buckets_toofewcopies |
bucket | average | The number of buckets the distributor controls that have less than the desired redundancy |
vds.idealstate.buckets_toomanycopies |
bucket | average | The number of buckets the distributor controls that have more than the desired redundancy |
vds.idealstate.buckets |
bucket | average | The number of buckets the distributor controls |
vds.idealstate.buckets_notrusted |
bucket | average | The number of buckets that have no trusted copies. |
vds.idealstate.bucket_replicas_moving_out |
bucket | average | Bucket replicas that should be moved out, e.g. retirement case or node added to cluster that has higher ideal state priority. |
vds.idealstate.bucket_replicas_copying_out |
bucket | average | Bucket replicas that should be copied out, e.g. node is in ideal state but might have to provide data to other nodes in a merge |
vds.idealstate.bucket_replicas_copying_in |
bucket | average | Bucket replicas that should be copied in, e.g. node does not have a replica for a bucket that it is in ideal state for |
vds.idealstate.bucket_replicas_syncing |
bucket | average | Bucket replicas that need syncing due to mismatching metadata |
vds.idealstate.max_observed_time_since_last_gc_sec |
second | average | Maximum time (in seconds) since GC was last successfully run for a bucket. Aggregated max value across all buckets on the distributor. |
vds.idealstate.delete_bucket.done_ok |
operation | rate | The number of operations successfully performed |
vds.idealstate.delete_bucket.done_failed |
operation | rate | The number of operations that failed |
vds.idealstate.delete_bucket.pending |
operation | average | The number of operations pending |
vds.idealstate.merge_bucket.done_ok |
operation | rate | The number of operations successfully performed |
vds.idealstate.merge_bucket.done_failed |
operation | rate | The number of operations that failed |
vds.idealstate.merge_bucket.pending |
operation | average | The number of operations pending |
vds.idealstate.merge_bucket.blocked |
operation | rate | The number of operations blocked by blocking operation starter |
vds.idealstate.merge_bucket.throttled |
operation | rate | The number of operations throttled by throttling operation starter |
vds.idealstate.merge_bucket.source_only_copy_changed |
operation | rate | The number of merge operations where source-only copy changed |
vds.idealstate.merge_bucket.source_only_copy_delete_blocked |
operation | rate | The number of merge operations where delete of unchanged source-only copies was blocked |
vds.idealstate.merge_bucket.source_only_copy_delete_failed |
operation | rate | The number of merge operations where delete of unchanged source-only copies failed |
vds.idealstate.split_bucket.done_ok |
operation | rate | The number of operations successfully performed |
vds.idealstate.split_bucket.done_failed |
operation | rate | The number of operations that failed |
vds.idealstate.split_bucket.pending |
operation | average | The number of operations pending |
vds.idealstate.join_bucket.done_ok |
operation | rate | The number of operations successfully performed |
vds.idealstate.join_bucket.done_failed |
operation | rate | The number of operations that failed |
vds.idealstate.join_bucket.pending |
operation | average | The number of operations pending |
vds.idealstate.garbage_collection.done_ok |
operation | rate | The number of operations successfully performed |
vds.idealstate.garbage_collection.done_failed |
operation | rate | The number of operations that failed |
vds.idealstate.garbage_collection.pending |
operation | average | The number of operations pending |
vds.idealstate.garbage_collection.documents_removed |
document | count, rate | Number of documents removed by GC operations |
vds.distributor.puts.latency |
millisecond | count, max, sum | The latency of put operations |
vds.distributor.puts.ok |
operation | rate | The number of successful put operations performed |
vds.distributor.puts.failures.total |
operation | rate | Sum of all failures |
vds.distributor.puts.failures.notfound |
operation | rate | The number of operations that failed because the document did not exist |
vds.distributor.puts.failures.test_and_set_failed |
operation | rate | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document |
vds.distributor.puts.failures.concurrent_mutations |
operation | rate | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID |
vds.distributor.puts.failures.notconnected |
operation | rate | The number of operations discarded because there were no available storage nodes to send to |
vds.distributor.puts.failures.notready |
operation | rate | The number of operations discarded because distributor was not ready |
vds.distributor.puts.failures.wrongdistributor |
operation | rate | The number of operations discarded because they were sent to the wrong distributor |
vds.distributor.puts.failures.safe_time_not_reached |
operation | rate | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed |
vds.distributor.puts.failures.storagefailure |
operation | rate | The number of operations that failed in storage |
vds.distributor.puts.failures.timeout |
operation | rate | The number of operations that failed because the operation timed out towards storage |
vds.distributor.puts.failures.busy |
operation | rate | The number of messages from storage that failed because the storage node was busy |
vds.distributor.puts.failures.inconsistent_bucket |
operation | rate | The number of operations failed due to buckets being in an inconsistent state or not found |
vds.distributor.removes.latency |
millisecond | count, max, sum | The latency of remove operations |
vds.distributor.removes.ok |
operation | rate | The number of successful remove operations performed |
vds.distributor.removes.failures.total |
operation | rate | Sum of all failures |
vds.distributor.removes.failures.notfound |
operation | rate | The number of operations that failed because the document did not exist |
vds.distributor.removes.failures.test_and_set_failed |
operation | rate | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document |
vds.distributor.removes.failures.concurrent_mutations |
operation | rate | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID |
vds.distributor.updates.latency |
millisecond | count, max, sum | The latency of update operations |
vds.distributor.updates.ok |
operation | rate | The number of successful update operations performed |
vds.distributor.updates.failures.total |
operation | rate | Sum of all failures |
vds.distributor.updates.failures.notfound |
operation | rate | The number of operations that failed because the document did not exist |
vds.distributor.updates.failures.test_and_set_failed |
operation | rate | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document |
vds.distributor.updates.failures.concurrent_mutations |
operation | rate | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID |
vds.distributor.updates.diverging_timestamp_updates |
operation | rate | Number of updates that report they were performed against divergent version timestamps on different replicas |
vds.distributor.removelocations.ok |
operation | rate | The number of successful removelocations operations performed |
vds.distributor.removelocations.failures.total |
operation | rate | Sum of all failures |
vds.distributor.gets.latency |
millisecond | count, max, sum | The latency of get operations |
vds.distributor.gets.ok |
operation | rate | The number of successful get operations performed |
vds.distributor.gets.failures.total |
operation | rate | Sum of all failures |
vds.distributor.gets.failures.notfound |
operation | rate | The number of operations that failed because the document did not exist |
vds.distributor.visitor.latency |
millisecond | count, max, sum | The latency of visitor operations |
vds.distributor.visitor.ok |
operation | rate | The number of successful visitor operations performed |
vds.distributor.visitor.failures.total |
operation | rate | Sum of all failures |
vds.distributor.visitor.failures.notready |
operation | rate | The number of operations discarded because distributor was not ready |
vds.distributor.visitor.failures.notconnected |
operation | rate | The number of operations discarded because there were no available storage nodes to send to |
vds.distributor.visitor.failures.wrongdistributor |
operation | rate | The number of operations discarded because they were sent to the wrong distributor |
vds.distributor.visitor.failures.safe_time_not_reached |
operation | rate | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed |
vds.distributor.visitor.failures.storagefailure |
operation | rate | The number of operations that failed in storage |
vds.distributor.visitor.failures.timeout |
operation | rate | The number of operations that failed because the operation timed out towards storage |
vds.distributor.visitor.failures.busy |
operation | rate | The number of messages from storage that failed because the storage node was busy |
vds.distributor.visitor.failures.inconsistent_bucket |
operation | rate | The number of operations failed due to buckets being in an inconsistent state or not found |
vds.distributor.visitor.failures.notfound |
operation | rate | The number of operations that failed because the document did not exist |
vds.distributor.docsstored |
document | average | Number of documents stored in all buckets controlled by this distributor |
vds.distributor.bytesstored |
byte | average | Number of bytes stored in all buckets controlled by this distributor |
vds.bouncer.clock_skew_aborts |
operation | count | Number of client operations that were aborted due to clock skew between sender and receiver exceeding acceptable range |
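
The `ok` and `failures.total` rates in the table above can be combined into a failure ratio per operation type. This is a sketch under one stated assumption: that successful and failed operations are disjoint and together account for all operations of that type. The function name and sample values are hypothetical.

```python
def failure_ratio(ok_rate, failed_rate):
    """Return the share of operations that failed, between 0 and 1.

    Returns 0.0 when there was no traffic at all.
    """
    total = ok_rate + failed_rate
    if total == 0:
        return 0.0
    return failed_rate / total


# Hypothetical sample values, not real measurements.
sample = {
    "vds.distributor.puts.ok.rate": 95.0,
    "vds.distributor.puts.failures.total.rate": 5.0,
}

put_failure_ratio = failure_ratio(
    sample["vds.distributor.puts.ok.rate"],
    sample["vds.distributor.puts.failures.total.rate"],
)
```

The individual `failures.*` metrics can then be consulted to see which cause dominates when the ratio rises.
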
Name | Unit | Suffixes | Description |
---|---|---|---|
logd.processed.lines |
item | count | Number of log lines processed |
Name | Unit | Suffixes | Description |
---|---|---|---|
endpoint.certificate.expiry.seconds |
second | N/A | Time until node endpoint certificate expires |
node-certificate.expiry.seconds |
second | N/A | Time until node certificate expires |
Name | Unit | Suffixes | Description |
---|---|---|---|
content.proton.config.generation |
version | last | The oldest config generation used by this search node |
content.proton.documentdb.documents.total |
document | last, max | The total number of documents in this document db (ready + not-ready) |
content.proton.documentdb.documents.ready |
document | last, max | The number of ready documents in this document db |
content.proton.documentdb.documents.active |
document | last, max | The number of active / searchable documents in this document db |
content.proton.documentdb.documents.removed |
document | last, max | The number of removed documents in this document db |
content.proton.documentdb.index.docs_in_memory |
document | last, max | Number of documents in memory index |
content.proton.documentdb.disk_usage |
byte | last | The total disk usage (in bytes) for this document db |
content.proton.documentdb.memory_usage.allocated_bytes |
byte | max | The number of allocated bytes |
content.proton.documentdb.heart_beat_age |
second | last, min | How long ago (in seconds) the heart beat maintenance job was run |
content.proton.docsum.docs |
document | rate | Total docsums returned |
content.proton.docsum.latency |
millisecond | count, max, sum | Docsum request latency |
content.proton.search_protocol.query.latency |
second | count, max, sum | Query request latency (seconds) |
content.proton.search_protocol.query.request_size |
byte | count, max, sum | Query request size (network bytes) |
content.proton.search_protocol.query.reply_size |
byte | count, max, sum | Query reply size (network bytes) |
content.proton.search_protocol.docsum.latency |
second | average, count, max, sum | Docsum request latency (seconds) |
content.proton.search_protocol.docsum.request_size |
byte | count, max, sum | Docsum request size (network bytes) |
content.proton.search_protocol.docsum.reply_size |
byte | count, max, sum | Docsum reply size (network bytes) |
content.proton.search_protocol.docsum.requested_documents |
document | count, max, sum | Total requested document summaries |
content.proton.executor.proton.queuesize |
task | count, max, sum | Size of executor proton task queue |
content.proton.executor.proton.accepted |
task | rate | Number of accepted executor proton tasks |
content.proton.executor.proton.wakeups |
wakeup | rate | Number of times an executor proton worker thread has been woken up |
content.proton.executor.proton.utilization |
fraction | count, max, sum | Ratio of time the executor proton worker threads have been active |
content.proton.executor.flush.queuesize |
task | count, max, sum | Size of executor flush task queue |
content.proton.executor.flush.accepted |
task | rate | Number of accepted executor flush tasks |
content.proton.executor.flush.wakeups |
wakeup | rate | Number of times an executor flush worker thread has been woken up |
content.proton.executor.flush.utilization |
fraction | count, max, sum | Ratio of time the executor flush worker threads have been active |
content.proton.executor.match.queuesize |
task | count, max, sum | Size of executor match task queue |
content.proton.executor.match.accepted |
task | rate | Number of accepted executor match tasks |
content.proton.executor.match.wakeups |
wakeup | rate | Number of times an executor match worker thread has been woken up |
content.proton.executor.match.utilization |
fraction | count, max, sum | Ratio of time the executor match worker threads have been active |
content.proton.executor.docsum.queuesize |
task | count, max, sum | Size of executor docsum task queue |
content.proton.executor.docsum.accepted |
task | rate | Number of accepted executor docsum tasks |
content.proton.executor.docsum.wakeups |
wakeup | rate | Number of times an executor docsum worker thread has been woken up |
content.proton.executor.docsum.utilization |
fraction | count, max, sum | Ratio of time the executor docsum worker threads have been active |
task | count, max, sum | Size of executor shared task queue | |
task | rate | Number of executor shared accepted tasks | |
wakeup | rate | Number of times an executor shared worker thread has been woken up | |
fraction | count, max, sum | Ratio of time the executor shared worker threads have been active | |
content.proton.executor.warmup.queuesize |
task | count, max, sum | Size of executor warmup task queue |
content.proton.executor.warmup.accepted |
task | rate | Number of accepted executor warmup tasks |
content.proton.executor.warmup.wakeups |
wakeup | rate | Number of times an executor warmup worker thread has been woken up |
content.proton.executor.warmup.utilization |
fraction | count, max, sum | Ratio of time the executor warmup worker threads have been active |
content.proton.executor.field_writer.queuesize |
task | count, max, sum | Size of executor field writer task queue |
content.proton.executor.field_writer.accepted |
task | rate | Number of accepted executor field writer tasks |
content.proton.executor.field_writer.wakeups |
wakeup | rate | Number of times an executor field writer worker thread has been woken up |
content.proton.executor.field_writer.utilization |
fraction | count, max, sum | Ratio of time the executor field writer worker threads have been active |
content.proton.executor.field_writer.saturation |
fraction | count, max, sum | Ratio indicating the max saturation of underlying worker threads. A higher saturation than utilization indicates a bottleneck in one of the worker threads. |
content.proton.documentdb.job.total |
fraction | average | The job load average total of all job metrics |
content.proton.documentdb.job.attribute_flush |
fraction | average | Flushing of attribute vector(s) to disk |
content.proton.documentdb.job.memory_index_flush |
fraction | average | Flushing of memory index to disk |
content.proton.documentdb.job.disk_index_fusion |
fraction | average | Fusion of disk indexes |
content.proton.documentdb.job.document_store_flush |
fraction | average | Flushing of document store to disk |
content.proton.documentdb.job.document_store_compact |
fraction | average | Compaction of document store on disk |
content.proton.documentdb.job.bucket_move |
fraction | average | Moving of buckets between 'ready' and 'notready' sub databases |
content.proton.documentdb.job.lid_space_compact |
fraction | average | Compaction of lid space in document meta store and attribute vectors |
content.proton.documentdb.job.removed_documents_prune |
fraction | average | Pruning of removed documents in 'removed' sub database |
content.proton.documentdb.threading_service.master.queuesize |
task | count, max, sum | Size of threading service master task queue |
content.proton.documentdb.threading_service.master.accepted |
task | rate | Number of accepted threading service master tasks |
content.proton.documentdb.threading_service.master.wakeups |
wakeup | rate | Number of times a threading service master worker thread has been woken up |
content.proton.documentdb.threading_service.master.utilization |
fraction | count, max, sum | Ratio of time the threading service master worker threads have been active |
content.proton.documentdb.threading_service.index.queuesize |
task | count, max, sum | Size of threading service index task queue |
content.proton.documentdb.threading_service.index.accepted |
task | rate | Number of accepted threading service index tasks |
content.proton.documentdb.threading_service.index.wakeups |
wakeup | rate | Number of times a threading service index worker thread has been woken up |
content.proton.documentdb.threading_service.index.utilization |
fraction | count, max, sum | Ratio of time the threading service index worker threads have been active |
content.proton.documentdb.threading_service.summary.queuesize |
task | count, max, sum | Size of threading service summary task queue |
content.proton.documentdb.threading_service.summary.accepted |
task | rate | Number of accepted threading service summary tasks |
content.proton.documentdb.threading_service.summary.wakeups |
wakeup | rate | Number of times a threading service summary worker thread has been woken up |
content.proton.documentdb.threading_service.summary.utilization |
fraction | count, max, sum | Ratio of time the threading service summary worker threads have been active |
content.proton.documentdb.ready.lid_space.lid_bloat_factor |
fraction | average | The bloat factor of this lid space, indicating the total amount of holes in the allocated lid space ((lid_limit - used_lids) / lid_limit) |
content.proton.documentdb.ready.lid_space.lid_fragmentation_factor |
fraction | average | The fragmentation factor of this lid space, indicating the amount of holes in the currently used part of the lid space ((highest_used_lid - used_lids) / highest_used_lid) |
content.proton.documentdb.ready.lid_space.lid_limit |
documentid | last, max | The size of the allocated lid space |
content.proton.documentdb.ready.lid_space.highest_used_lid |
documentid | last, max | The highest used lid |
content.proton.documentdb.ready.lid_space.used_lids |
documentid | last, max | The number of lids used |
content.proton.documentdb.notready.lid_space.lid_bloat_factor |
fraction | average | The bloat factor of this lid space, indicating the total amount of holes in the allocated lid space ((lid_limit - used_lids) / lid_limit) |
content.proton.documentdb.notready.lid_space.lid_fragmentation_factor |
fraction | average | The fragmentation factor of this lid space, indicating the amount of holes in the currently used part of the lid space ((highest_used_lid - used_lids) / highest_used_lid) |
content.proton.documentdb.notready.lid_space.lid_limit |
documentid | last, max | The size of the allocated lid space |
content.proton.documentdb.notready.lid_space.highest_used_lid |
documentid | last, max | The highest used lid |
content.proton.documentdb.notready.lid_space.used_lids |
documentid | last, max | The number of lids used |
content.proton.documentdb.removed.lid_space.lid_bloat_factor |
fraction | average | The bloat factor of this lid space, indicating the total amount of holes in the allocated lid space ((lid_limit - used_lids) / lid_limit) |
content.proton.documentdb.removed.lid_space.lid_fragmentation_factor |
fraction | average | The fragmentation factor of this lid space, indicating the amount of holes in the currently used part of the lid space ((highest_used_lid - used_lids) / highest_used_lid) |
content.proton.documentdb.removed.lid_space.lid_limit |
documentid | last, max | The size of the allocated lid space |
content.proton.documentdb.removed.lid_space.highest_used_lid |
documentid | last, max | The highest used lid |
content.proton.documentdb.removed.lid_space.used_lids |
documentid | last, max | The number of lids used |
content.proton.documentdb.bucket_move.buckets_pending |
bucket | last, max, sum | The number of buckets left to move |
content.proton.resource_usage.disk |
fraction | average | The relative amount of disk used by this content node (transient usage not included, value in the range [0, 1]). Same value as reported to the cluster controller |
content.proton.resource_usage.disk_usage.total |
fraction | max | The total relative amount of disk used by this content node (value in the range [0, 1]) |
content.proton.resource_usage.disk_usage.total_utilization |
fraction | max | The relative amount of disk used compared to the content node disk resource limit |
content.proton.resource_usage.disk_usage.transient |
fraction | max | The relative amount of transient disk used by this content node (value in the range [0, 1]) |
content.proton.resource_usage.memory |
fraction | average | The relative amount of memory used by this content node (transient usage not included, value in the range [0, 1]). Same value as reported to the cluster controller |
content.proton.resource_usage.memory_usage.total |
fraction | max | The total relative amount of memory used by this content node (value in the range [0, 1]) |
content.proton.resource_usage.memory_usage.total_utilization |
fraction | max | The relative amount of memory used compared to the content node memory resource limit |
content.proton.resource_usage.memory_usage.transient |
fraction | max | The relative amount of transient memory used by this content node (value in the range [0, 1]) |
content.proton.resource_usage.memory_mappings |
file | max | The number of memory mapped files |
content.proton.resource_usage.open_file_descriptors |
file | max | The number of open files |
content.proton.resource_usage.feeding_blocked |
binary | last, max | Whether feeding is blocked due to resource limits being reached (value is either 0 or 1) |
content.proton.resource_usage.malloc_arena |
byte | max | Size of malloc arena |
content.proton.documentdb.attribute.resource_usage.address_space |
fraction | max | The max relative address space used among components in all attribute vectors in this document db (value in the range [0, 1]) |
content.proton.documentdb.attribute.resource_usage.feeding_blocked |
binary | max | Whether feeding is blocked due to attribute resource limits being reached (value is either 0 or 1) |
content.proton.resource_usage.cpu_util.setup |
fraction | count, max, sum | CPU used by system init and (re-)configuration |
content.proton.resource_usage.cpu_util.read |
fraction | count, max, sum | CPU used by reading data from the system |
content.proton.resource_usage.cpu_util.write |
fraction | count, max, sum | CPU used by writing data to the system |
content.proton.resource_usage.cpu_util.compact |
fraction | count, max, sum | CPU used by internal data re-structuring |
content.proton.resource_usage.cpu_util.other |
fraction | count, max, sum | CPU used by work not classified as a specific category |
content.proton.transactionlog.entries |
record | average | The current number of entries in the transaction log |
content.proton.transactionlog.disk_usage |
byte | average | The disk usage (in bytes) of the transaction log |
content.proton.transactionlog.replay_time |
second | last, max | The replay time (in seconds) of the transaction log during start-up |
content.proton.documentdb.ready.document_store.disk_usage |
byte | average | Disk space usage in bytes |
content.proton.documentdb.ready.document_store.disk_bloat |
byte | average | Disk space bloat in bytes |
content.proton.documentdb.ready.document_store.max_bucket_spread |
fraction | average | Max bucket spread in underlying files (sum(unique buckets in each chunk)/unique buckets in file) |
content.proton.documentdb.ready.document_store.memory_usage.allocated_bytes |
byte | average | The number of allocated bytes |
content.proton.documentdb.ready.document_store.memory_usage.used_bytes |
byte | average | The number of used bytes (<= allocated_bytes) |
content.proton.documentdb.ready.document_store.memory_usage.onhold_bytes |
byte | average | The number of bytes on hold |
content.proton.documentdb.notready.document_store.disk_usage |
byte | average | Disk space usage in bytes |
content.proton.documentdb.notready.document_store.disk_bloat |
byte | average | Disk space bloat in bytes |
content.proton.documentdb.notready.document_store.max_bucket_spread |
fraction | average | Max bucket spread in underlying files (sum(unique buckets in each chunk)/unique buckets in file) |
content.proton.documentdb.notready.document_store.memory_usage.allocated_bytes |
byte | average | The number of allocated bytes |
content.proton.documentdb.notready.document_store.memory_usage.used_bytes |
byte | average | The number of used bytes (<= allocated_bytes) |
content.proton.documentdb.notready.document_store.memory_usage.dead_bytes |
byte | average | The number of dead bytes (<= used_bytes) |
content.proton.documentdb.notready.document_store.memory_usage.onhold_bytes |
byte | average | The number of bytes on hold |
content.proton.documentdb.removed.document_store.disk_usage |
byte | average | Disk space usage in bytes |
content.proton.documentdb.removed.document_store.disk_bloat |
byte | average | Disk space bloat in bytes |
content.proton.documentdb.removed.document_store.max_bucket_spread |
fraction | average | Max bucket spread in underlying files (sum(unique buckets in each chunk)/unique buckets in file) |
content.proton.documentdb.removed.document_store.memory_usage.allocated_bytes |
byte | average | The number of allocated bytes |
content.proton.documentdb.removed.document_store.memory_usage.used_bytes |
byte | average | The number of used bytes (<= allocated_bytes) |
content.proton.documentdb.removed.document_store.memory_usage.dead_bytes |
byte | average | The number of dead bytes (<= used_bytes) |
content.proton.documentdb.removed.document_store.memory_usage.onhold_bytes |
byte | average | The number of bytes on hold |
content.proton.documentdb.ready.document_store.cache.memory_usage |
byte | average | Memory usage of the cache (in bytes) |
content.proton.documentdb.ready.document_store.cache.hit_rate |
fraction | average | Rate of hits in the cache compared to number of lookups |
content.proton.documentdb.ready.document_store.cache.lookups |
operation | rate | Number of lookups in the cache (hits + misses) |
content.proton.documentdb.ready.document_store.cache.invalidations |
operation | rate | Number of invalidations (erased elements) in the cache. |
content.proton.documentdb.notready.document_store.cache.memory_usage |
byte | average | Memory usage of the cache (in bytes) |
content.proton.documentdb.notready.document_store.cache.hit_rate |
fraction | average | Rate of hits in the cache compared to number of lookups |
content.proton.documentdb.notready.document_store.cache.lookups |
operation | rate | Number of lookups in the cache (hits + misses) |
content.proton.documentdb.notready.document_store.cache.invalidations |
operation | rate | Number of invalidations (erased elements) in the cache. |
content.proton.documentdb.ready.attribute.memory_usage.allocated_bytes |
byte | average | The number of allocated bytes |
content.proton.documentdb.ready.attribute.memory_usage.used_bytes |
byte | average | The number of used bytes (<= allocated_bytes) |
content.proton.documentdb.ready.attribute.memory_usage.dead_bytes |
byte | average | The number of dead bytes (<= used_bytes) |
content.proton.documentdb.ready.attribute.memory_usage.onhold_bytes |
byte | average | The number of bytes on hold |
content.proton.documentdb.ready.attribute.disk_usage |
byte | average | Disk space usage (in bytes) of the flushed snapshot of this attribute for this document type |
content.proton.documentdb.notready.attribute.memory_usage.allocated_bytes |
byte | average | The number of allocated bytes |
content.proton.documentdb.notready.attribute.memory_usage.used_bytes |
byte | average | The number of used bytes (<= allocated_bytes) |
content.proton.documentdb.notready.attribute.memory_usage.dead_bytes |
byte | average | The number of dead bytes (<= used_bytes) |
content.proton.documentdb.notready.attribute.memory_usage.onhold_bytes |
byte | average | The number of bytes on hold |
content.proton.index.cache.postinglist.memory_usage |
byte | average | Memory usage of the cache (in bytes). Contains disk index posting list files across all document types |
content.proton.index.cache.postinglist.hit_rate |
fraction | average | Rate of hits in the cache compared to number of lookups. Contains disk index posting list files across all document types |
content.proton.index.cache.postinglist.lookups |
operation | rate | Number of lookups in the cache (hits + misses). Contains disk index posting list files across all document types |
content.proton.index.cache.postinglist.invalidations |
operation | rate | Number of invalidations (erased elements) in the cache. Contains disk index posting list files across all document types |
content.proton.index.cache.bitvector.memory_usage |
byte | average | Memory usage of the cache (in bytes). Contains disk index bitvector files across all document types |
content.proton.index.cache.bitvector.hit_rate |
fraction | average | Rate of hits in the cache compared to number of lookups. Contains disk index bitvector files across all document types |
content.proton.index.cache.bitvector.lookups |
operation | rate | Number of lookups in the cache (hits + misses). Contains disk index bitvector files across all document types |
content.proton.index.cache.bitvector.invalidations |
operation | rate | Number of invalidations (erased elements) in the cache. Contains disk index bitvector files across all document types |
content.proton.documentdb.index.memory_usage.allocated_bytes |
byte | average | The number of allocated bytes for the memory index for this document type |
content.proton.documentdb.index.memory_usage.used_bytes |
byte | average | The number of used bytes (<= allocated_bytes) for the memory index for this document type |
content.proton.documentdb.index.memory_usage.dead_bytes |
byte | average | The number of dead bytes (<= used_bytes) for the memory index for this document type |
content.proton.documentdb.index.memory_usage.onhold_bytes |
byte | average | The number of bytes on hold for the memory index for this document type |
content.proton.documentdb.index.io.search.read_bytes |
byte | count, sum | Bytes read from disk index posting list and bitvector files as part of search for this document type |
content.proton.documentdb.index.io.search.cached_read_bytes |
byte | count, sum | Bytes read from cached disk index posting list and bitvector files as part of search for this document type |
content.proton.documentdb.ready.index.disk_usage |
byte | average | Disk space usage (in bytes) of this index field in all disk indexes for this document type |
content.proton.documentdb.matching.queries |
query | rate | Number of queries executed |
content.proton.documentdb.matching.soft_doomed_queries |
query | rate | Number of queries hitting the soft timeout |
content.proton.documentdb.matching.query_latency |
second | count, max, sum | Total average latency (sec) when matching and ranking a query |
content.proton.documentdb.matching.query_setup_time |
second | count, max, sum | Average time (sec) spent setting up and tearing down queries |
content.proton.documentdb.matching.docs_matched |
document | count, rate | Number of documents matched |
content.proton.documentdb.matching.rank_profile.queries |
query | rate | Number of queries executed |
content.proton.documentdb.matching.rank_profile.soft_doomed_queries |
query | rate | Number of queries hitting the soft timeout |
content.proton.documentdb.matching.rank_profile.soft_doom_factor |
fraction | count, max, min, sum | Factor used to compute soft-timeout |
content.proton.documentdb.matching.rank_profile.query_latency |
second | count, max, sum | Total average latency (sec) when matching and ranking a query |
content.proton.documentdb.matching.rank_profile.query_setup_time |
second | count, max, sum | Average time (sec) spent setting up and tearing down queries |
content.proton.documentdb.matching.rank_profile.grouping_time |
second | count, max, sum | Average time (sec) spent on grouping |
content.proton.documentdb.matching.rank_profile.rerank_time |
second | count, max, sum | Average time (sec) spent on 2nd phase ranking |
content.proton.documentdb.matching.rank_profile.docs_matched |
document | count, rate | Number of documents matched |
content.proton.documentdb.matching.rank_profile.limited_queries |
query | rate | Number of queries limited in match phase |
content.proton.documentdb.feeding.commit.operations |
operation | count, max, rate, sum | Number of operations included in a commit |
content.proton.documentdb.feeding.commit.latency |
second | count, max, sum | Latency for commit in seconds |
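
The `lid_bloat_factor` and `lid_fragmentation_factor` descriptions above give their formulas inline. The following is a minimal Python sketch of those two formulas, useful for checking reported values against the raw `lid_limit`, `highest_used_lid` and `used_lids` metrics. The function names and the zero-denominator handling are assumptions made for illustration; they are not part of Vespa.

```python
def lid_bloat_factor(lid_limit: int, used_lids: int) -> float:
    """(lid_limit - used_lids) / lid_limit, as given in the metric description."""
    if lid_limit <= 0:
        # Assumption: treat an empty lid space as having no bloat.
        return 0.0
    return (lid_limit - used_lids) / lid_limit


def lid_fragmentation_factor(highest_used_lid: int, used_lids: int) -> float:
    """(highest_used_lid - used_lids) / highest_used_lid, as given in the metric description."""
    if highest_used_lid <= 0:
        # Assumption: treat an empty lid space as having no fragmentation.
        return 0.0
    return (highest_used_lid - used_lids) / highest_used_lid


# Illustrative values only, not taken from a real node.
bloat = lid_bloat_factor(lid_limit=1000, used_lids=750)
fragmentation = lid_fragmentation_factor(highest_used_lid=800, used_lids=750)
print(bloat, fragmentation)
```

With these sample numbers, 250 of the 1000 allocated lids are holes (bloat 0.25), while only 50 of the 800 lids up to the highest used one are holes (fragmentation 0.0625).
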
Name | Unit | Suffixes | Description |
---|---|---|---|
sentinel.restarts |
restart | count | Number of service restarts done by the sentinel |
sentinel.totalRestarts |
restart | last, max, sum | Total number of service restarts done by the sentinel since the sentinel was started |
sentinel.uptime |
second | last | Time the sentinel has been running |
sentinel.running |
instance | count, last | Number of services the sentinel has running currently |
Name | Unit | Suffixes | Description |
---|---|---|---|
slobrok.heartbeats.failed |
request | count | Number of heartbeat requests failed |
slobrok.missing.consensus |
second | count | Number of seconds without full consensus with all other brokers |
Name | Unit | Suffixes | Description |
---|---|---|---|
vds.datastored.alldisks.buckets |
bucket | average | Number of buckets managed |
vds.datastored.alldisks.docs |
document | average | Number of documents stored |
vds.datastored.alldisks.bytes |
byte | average | Number of bytes stored |
vds.visitor.allthreads.averagevisitorlifetime |
millisecond | count, max, sum | Average lifetime of a visitor |
vds.visitor.allthreads.averagequeuewait |
millisecond | count, max, sum | Average time an operation spends in input queue. |
vds.visitor.allthreads.queuesize |
operation | count, max, sum | Size of input message queue. |
vds.visitor.allthreads.completed |
operation | rate | Number of visitors completed |
vds.visitor.allthreads.created |
operation | rate | Number of visitors created. |
vds.visitor.allthreads.failed |
operation | rate | Number of visitors failed |
vds.visitor.allthreads.averagemessagesendtime |
millisecond | count, max, sum | Average time it takes for messages to be sent to their target (and be replied to) |
vds.visitor.allthreads.averageprocessingtime |
millisecond | count, max, sum | Average time used to process visitor requests |
vds.filestor.queuesize |
operation | count, max, sum | Size of input message queue. |
vds.filestor.averagequeuewait |
millisecond | count, max, sum | Average time an operation spends in input queue. |
vds.filestor.active_operations.size |
operation | count, max, sum | Number of concurrent active operations |
vds.filestor.active_operations.latency |
millisecond | count, max, sum | Latency (in ms) for completed operations |
vds.filestor.throttle_window_size |
operation | count, max, sum | Current window size of the async operation throttler |
vds.filestor.throttle_waiting_threads |
thread | count, max, sum | Number of threads waiting to acquire a throttle token |
vds.filestor.throttle_active_tokens |
instance | count, max, sum | Current number of active throttle tokens |
vds.filestor.allthreads.mergemetadatareadlatency |
millisecond | count, max, sum | Time spent in a merge step to check metadata of current node to see what data it has. |
vds.filestor.allthreads.mergedatareadlatency |
millisecond | count, max, sum | Time spent in a merge step to read data other nodes need. |
vds.filestor.allthreads.mergedatawritelatency |
millisecond | count, max, sum | Time spent in a merge step to write needed data to the current node. |
vds.filestor.allthreads.merge_put_latency |
millisecond | count, max, sum | Latency of individual puts that are part of merge operations |
vds.filestor.allthreads.merge_remove_latency |
millisecond | count, max, sum | Latency of individual removes that are part of merge operations |
vds.filestor.allstripes.throttled_rpc_direct_dispatches |
instance | rate | Number of times an RPC thread could not dispatch an async operation directly to Proton because it was disallowed by the throttle policy |
vds.filestor.allstripes.throttled_persistence_thread_polls |
instance | rate | Number of times a persistence thread could not immediately dispatch a queued async operation because it was disallowed by the throttle policy |
vds.filestor.allstripes.timeouts_waiting_for_throttle_token |
instance | rate | Number of times a persistence thread timed out waiting for an available throttle policy token |
vds.filestor.allthreads.put.count |
operation | rate | Number of requests processed. |
vds.filestor.allthreads.put.failed |
operation | rate | Number of failed requests. |
vds.filestor.allthreads.put.test_and_set_failed |
operation | rate | Number of operations that were skipped due to a test-and-set condition not met |
vds.filestor.allthreads.put.latency |
millisecond | count, max, sum | Latency of successful requests. |
vds.filestor.allthreads.put.request_size |
byte | count, max, sum | Size of requests, in bytes |
vds.filestor.allthreads.remove.count |
operation | rate | Number of requests processed. |
vds.filestor.allthreads.remove.failed |
operation | rate | Number of failed requests. |
vds.filestor.allthreads.remove.test_and_set_failed |
operation | rate | Number of operations that were skipped due to a test-and-set condition not met |
vds.filestor.allthreads.remove.latency |
millisecond | count, max, sum | Latency of successful requests. |
vds.filestor.allthreads.remove.request_size |
byte | count, max, sum | Size of requests, in bytes |
vds.filestor.allthreads.get.count |
operation | rate | Number of requests processed. |
vds.filestor.allthreads.get.failed |
operation | rate | Number of failed requests. |
vds.filestor.allthreads.get.latency |
millisecond | count, max, sum | Latency of successful requests. |
vds.filestor.allthreads.get.request_size |
byte | count, max, sum | Size of requests, in bytes |
vds.filestor.allthreads.update.count |
request | rate | Number of requests processed. |
vds.filestor.allthreads.update.failed |
request | rate | Number of failed requests. |
vds.filestor.allthreads.update.test_and_set_failed |
request | rate | Number of requests that were skipped due to a test-and-set condition not met |
vds.filestor.allthreads.update.latency |
millisecond | count, max, sum | Latency of successful requests. |
vds.filestor.allthreads.update.request_size |
byte | count, max, sum | Size of requests, in bytes |
vds.filestor.allthreads.createiterator.count |
request | rate | Number of requests processed. |
vds.filestor.allthreads.createiterator.latency |
millisecond | count, max, sum | Latency of successful requests. |
vds.filestor.allthreads.visit.count |
request | rate | Number of requests processed. |
vds.filestor.allthreads.visit.latency |
millisecond | count, max, sum | Latency of successful requests. |
vds.filestor.allthreads.remove_location.count |
request | rate | Number of requests processed. |
vds.filestor.allthreads.remove_location.latency |
millisecond | count, max, sum | Latency of successful requests. |
vds.filestor.allthreads.splitbuckets.count |
request | rate | Number of requests processed. |
vds.filestor.allthreads.joinbuckets.count |
request | rate | Number of requests processed. |
vds.filestor.allthreads.deletebuckets.count |
request | rate | Number of requests processed. |
vds.filestor.allthreads.deletebuckets.failed |
request | rate | Number of failed requests. |
vds.filestor.allthreads.deletebuckets.latency |
millisecond | count, max, sum | Latency of successful requests. |
vds.filestor.allthreads.remove_by_gid.count |
request | rate | Number of requests processed. |
vds.filestor.allthreads.remove_by_gid.failed |
request | rate | Number of failed requests. |
vds.filestor.allthreads.remove_by_gid.latency |
millisecond | count, max, sum | Latency of successful requests. |
vds.filestor.allthreads.setbucketstates.count |
request | rate | Number of requests processed. |
vds.mergethrottler.averagequeuewaitingtime |
millisecond | count, max, sum | Time merges spent in the throttler queue |
vds.mergethrottler.queuesize |
instance | count, max, sum | Length of merge queue |
vds.mergethrottler.active_window_size |
instance | count, max, sum | Number of merges active within the pending window size |
vds.mergethrottler.estimated_merge_memory_usage |
byte | count, max, sum | An estimated upper bound of the memory usage (in bytes) of the merges currently in the active window |
vds.mergethrottler.bounced_due_to_back_pressure |
instance | rate | Number of merges bounced due to resource exhaustion back-pressure |
vds.mergethrottler.locallyexecutedmerges.ok |
instance | rate | The number of successful merges for 'locallyexecutedmerges' |
vds.mergethrottler.mergechains.ok |
operation | rate | The number of successful merges for 'mergechains' |
vds.mergethrottler.mergechains.failures.busy |
operation | rate | The number of merges that failed because the storage node was busy |
vds.mergethrottler.mergechains.failures.total |
operation | rate | Sum of all failures |
vds.server.network.tls-handshakes-failed |
operation | count | Number of client or server connection attempts that failed during TLS handshaking |
failure | count | Number of TLS connection attempts failed due to bad or missing peer certificate credentials | |
vds.server.network.client.tls-connections-established |
connection | count | Number of secure mTLS connections established |
vds.server.network.server.tls-connections-established |
connection | count | Number of secure mTLS connections established |
vds.server.network.client.insecure-connections-established |
connection | count | Number of insecure (plaintext) connections established |
vds.server.network.server.insecure-connections-established |
connection | count | Number of insecure (plaintext) connections established |
vds.server.network.tls-connections-broken |
connection | count | Number of TLS connections broken due to failures during frame encoding or decoding |
vds.server.network.failed-tls-config-reloads |
failure | count | Number of times background reloading of TLS config has failed |
vds.server.network.rpc-capability-checks-failed |
failure | count | Number of RPC operations that failed due to one or more missing capabilities |
vds.server.network.status-capability-checks-failed |
failure | count | Number of status page operations that failed due to one or more missing capabilities |
vds.server.fnet.num-connections |
connection | count | Total number of connection objects |
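
Many metrics in these tables expose `count`, `max` and `sum` suffixes rather than a ready-made average. A minimal Python sketch of deriving a mean from them follows. It assumes that each suffix is appended to the base name with a dot, that `sum` is the sum of the sampled values and that `count` is the number of samples in the snapshot. The helper name and the sample values are hypothetical.

```python
from typing import Mapping, Optional


def mean_from_suffixes(values: Mapping[str, float], base: str) -> Optional[float]:
    """Return sum / count for a metric, or None when no samples were recorded."""
    count = values.get(f"{base}.count", 0)
    total = values.get(f"{base}.sum", 0.0)
    if not count:
        # No samples in this snapshot, so a mean is undefined.
        return None
    return total / count


# Hypothetical snapshot values, not taken from a real node.
snapshot = {
    "vds.filestor.allthreads.put.latency.count": 200,
    "vds.filestor.allthreads.put.latency.sum": 500.0,
    "vds.filestor.allthreads.put.latency.max": 12.0,
}
mean_ms = mean_from_suffixes(snapshot, "vds.filestor.allthreads.put.latency")
print(mean_ms)
```

Returning `None` for an empty snapshot keeps an idle interval distinguishable from a genuine mean of zero.
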