Name | Description | Unit |
---|---|---|
vds.idealstate.buckets_rechecking |
The number of buckets that we are rechecking for ideal state operations | bucket |
vds.idealstate.idealstate_diff |
A number representing the current difference from the ideal state. It decreases steadily as the system gets closer to the ideal state | bucket |
vds.idealstate.buckets_toofewcopies |
The number of buckets the distributor controls that have fewer copies than the desired redundancy | bucket |
vds.idealstate.buckets_toomanycopies |
The number of buckets the distributor controls that have more copies than the desired redundancy | bucket |
vds.idealstate.buckets |
The number of buckets the distributor controls | bucket |
vds.idealstate.buckets_notrusted |
The number of buckets that have no trusted copies. | bucket |
vds.idealstate.bucket_replicas_moving_out |
Bucket replicas that should be moved out, e.g. because a node is being retired or a node with higher ideal state priority was added to the cluster | bucket |
vds.idealstate.bucket_replicas_copying_out |
Bucket replicas that should be copied out, e.g. node is in ideal state but might have to provide data to other nodes in a merge | bucket |
vds.idealstate.bucket_replicas_copying_in |
Bucket replicas that should be copied in, e.g. node does not have a replica for a bucket that it is in ideal state for | bucket |
vds.idealstate.bucket_replicas_syncing |
Bucket replicas that need syncing due to mismatching metadata | bucket |
vds.idealstate.max_observed_time_since_last_gc_sec |
Maximum time (in seconds) since GC was last successfully run for a bucket. Aggregated max value across all buckets on the distributor. | second |
vds.idealstate.delete_bucket.done_ok |
The number of operations successfully performed | operation |
vds.idealstate.delete_bucket.done_failed |
The number of operations that failed | operation |
vds.idealstate.delete_bucket.pending |
The number of operations pending | operation |
vds.idealstate.delete_bucket.blocked |
The number of operations blocked by blocking operation starter | operation |
vds.idealstate.delete_bucket.throttled |
The number of operations throttled by throttling operation starter | operation |
vds.idealstate.merge_bucket.done_ok |
The number of operations successfully performed | operation |
vds.idealstate.merge_bucket.done_failed |
The number of operations that failed | operation |
vds.idealstate.merge_bucket.pending |
The number of operations pending | operation |
vds.idealstate.merge_bucket.blocked |
The number of operations blocked by blocking operation starter | operation |
vds.idealstate.merge_bucket.throttled |
The number of operations throttled by throttling operation starter | operation |
vds.idealstate.merge_bucket.source_only_copy_changed |
The number of merge operations where source-only copy changed | operation |
vds.idealstate.merge_bucket.source_only_copy_delete_blocked |
The number of merge operations where delete of unchanged source-only copies was blocked | operation |
vds.idealstate.merge_bucket.source_only_copy_delete_failed |
The number of merge operations where delete of unchanged source-only copies failed | operation |
vds.idealstate.split_bucket.done_ok |
The number of operations successfully performed | operation |
vds.idealstate.split_bucket.done_failed |
The number of operations that failed | operation |
vds.idealstate.split_bucket.pending |
The number of operations pending | operation |
vds.idealstate.split_bucket.blocked |
The number of operations blocked by blocking operation starter | operation |
vds.idealstate.split_bucket.throttled |
The number of operations throttled by throttling operation starter | operation |
vds.idealstate.join_bucket.done_ok |
The number of operations successfully performed | operation |
vds.idealstate.join_bucket.done_failed |
The number of operations that failed | operation |
vds.idealstate.join_bucket.pending |
The number of operations pending | operation |
vds.idealstate.join_bucket.blocked |
The number of operations blocked by blocking operation starter | operation |
vds.idealstate.join_bucket.throttled |
The number of operations throttled by throttling operation starter | operation |
vds.idealstate.garbage_collection.done_ok |
The number of operations successfully performed | operation |
vds.idealstate.garbage_collection.done_failed |
The number of operations that failed | operation |
vds.idealstate.garbage_collection.pending |
The number of operations pending | operation |
vds.idealstate.garbage_collection.documents_removed |
Number of documents removed by GC operations | document |
vds.idealstate.garbage_collection.blocked |
The number of operations blocked by blocking operation starter | operation |
vds.idealstate.garbage_collection.throttled |
The number of operations throttled by throttling operation starter | operation |
vds.distributor.puts.latency |
The latency of put operations | millisecond |
vds.distributor.puts.ok |
The number of successful put operations performed | operation |
vds.distributor.puts.failures.total |
Sum of all failures | operation |
vds.distributor.puts.failures.notfound |
The number of operations that failed because the document did not exist | operation |
vds.distributor.puts.failures.test_and_set_failed |
The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | operation |
vds.distributor.puts.failures.concurrent_mutations |
The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | operation |
vds.distributor.puts.failures.notconnected |
The number of operations discarded because there were no available storage nodes to send to | operation |
vds.distributor.puts.failures.notready |
The number of operations discarded because distributor was not ready | operation |
vds.distributor.puts.failures.wrongdistributor |
The number of operations discarded because they were sent to the wrong distributor | operation |
vds.distributor.puts.failures.safe_time_not_reached |
The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | operation |
vds.distributor.puts.failures.storagefailure |
The number of operations that failed in storage | operation |
vds.distributor.puts.failures.timeout |
The number of operations that failed because the operation timed out towards storage | operation |
vds.distributor.puts.failures.busy |
The number of messages from storage that failed because the storage node was busy | operation |
vds.distributor.puts.failures.inconsistent_bucket |
The number of operations failed due to buckets being in an inconsistent state or not found | operation |
vds.distributor.removes.latency |
The latency of remove operations | millisecond |
vds.distributor.removes.ok |
The number of successful remove operations performed | operation |
vds.distributor.removes.failures.total |
Sum of all failures | operation |
vds.distributor.removes.failures.notfound |
The number of operations that failed because the document did not exist | operation |
vds.distributor.removes.failures.test_and_set_failed |
The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | operation |
vds.distributor.removes.failures.concurrent_mutations |
The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | operation |
vds.distributor.removes.failures.busy |
The number of messages from storage that failed because the storage node was busy | operation |
vds.distributor.removes.failures.inconsistent_bucket |
The number of operations failed due to buckets being in an inconsistent state or not found | operation |
vds.distributor.removes.failures.notconnected |
The number of operations discarded because there were no available storage nodes to send to | operation |
vds.distributor.removes.failures.notready |
The number of operations discarded because distributor was not ready | operation |
vds.distributor.removes.failures.safe_time_not_reached |
The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | operation |
vds.distributor.removes.failures.storagefailure |
The number of operations that failed in storage | operation |
vds.distributor.removes.failures.timeout |
The number of operations that failed because the operation timed out towards storage | operation |
vds.distributor.removes.failures.wrongdistributor |
The number of operations discarded because they were sent to the wrong distributor | operation |
vds.distributor.updates.latency |
The latency of update operations | millisecond |
vds.distributor.updates.ok |
The number of successful update operations performed | operation |
vds.distributor.updates.failures.total |
Sum of all failures | operation |
vds.distributor.updates.failures.notfound |
The number of operations that failed because the document did not exist | operation |
vds.distributor.updates.failures.test_and_set_failed |
The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | operation |
vds.distributor.updates.failures.concurrent_mutations |
The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | operation |
vds.distributor.updates.diverging_timestamp_updates |
Number of updates that report they were performed against divergent version timestamps on different replicas | operation |
vds.distributor.updates.failures.busy |
The number of messages from storage that failed because the storage node was busy | operation |
vds.distributor.updates.failures.inconsistent_bucket |
The number of operations failed due to buckets being in an inconsistent state or not found | operation |
vds.distributor.updates.failures.notconnected |
The number of operations discarded because there were no available storage nodes to send to | operation |
vds.distributor.updates.failures.notready |
The number of operations discarded because distributor was not ready | operation |
vds.distributor.updates.failures.safe_time_not_reached |
The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | operation |
vds.distributor.updates.failures.storagefailure |
The number of operations that failed in storage | operation |
vds.distributor.updates.failures.timeout |
The number of operations that failed because the operation timed out towards storage | operation |
vds.distributor.updates.failures.wrongdistributor |
The number of operations discarded because they were sent to the wrong distributor | operation |
vds.distributor.updates.fast_path_restarts |
Number of safe path (write repair) updates that were restarted as fast path updates because all replicas returned documents with the same timestamp in the initial read phase | operation |
vds.distributor.removelocations.ok |
The number of successful removelocations operations performed | operation |
vds.distributor.removelocations.failures.total |
Sum of all failures | operation |
vds.distributor.removelocations.failures.busy |
The number of messages from storage that failed because the storage node was busy | operation |
vds.distributor.removelocations.failures.concurrent_mutations |
The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | operation |
vds.distributor.removelocations.failures.inconsistent_bucket |
The number of operations failed due to buckets being in an inconsistent state or not found | operation |
vds.distributor.removelocations.failures.notconnected |
The number of operations discarded because there were no available storage nodes to send to | operation |
vds.distributor.removelocations.failures.notfound |
The number of operations that failed because the document did not exist | operation |
vds.distributor.removelocations.failures.notready |
The number of operations discarded because distributor was not ready | operation |
vds.distributor.removelocations.failures.safe_time_not_reached |
The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | operation |
vds.distributor.removelocations.failures.storagefailure |
The number of operations that failed in storage | operation |
vds.distributor.removelocations.failures.test_and_set_failed |
The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | operation |
vds.distributor.removelocations.failures.timeout |
The number of operations that failed because the operation timed out towards storage | operation |
vds.distributor.removelocations.failures.wrongdistributor |
The number of operations discarded because they were sent to the wrong distributor | operation |
vds.distributor.removelocations.latency |
The average latency of removelocations operations | millisecond |
vds.distributor.gets.latency |
The average latency of get operations | millisecond |
vds.distributor.gets.ok |
The number of successful get operations performed | operation |
vds.distributor.gets.failures.total |
Sum of all failures | operation |
vds.distributor.gets.failures.notfound |
The number of operations that failed because the document did not exist | operation |
vds.distributor.gets.failures.busy |
The number of messages from storage that failed because the storage node was busy | operation |
vds.distributor.gets.failures.concurrent_mutations |
The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | operation |
vds.distributor.gets.failures.inconsistent_bucket |
The number of operations failed due to buckets being in an inconsistent state or not found | operation |
vds.distributor.gets.failures.notconnected |
The number of operations discarded because there were no available storage nodes to send to | operation |
vds.distributor.gets.failures.notready |
The number of operations discarded because distributor was not ready | operation |
vds.distributor.gets.failures.safe_time_not_reached |
The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | operation |
vds.distributor.gets.failures.storagefailure |
The number of operations that failed in storage | operation |
vds.distributor.gets.failures.test_and_set_failed |
The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | operation |
vds.distributor.gets.failures.timeout |
The number of operations that failed because the operation timed out towards storage | operation |
vds.distributor.gets.failures.wrongdistributor |
The number of operations discarded because they were sent to the wrong distributor | operation |
vds.distributor.visitor.latency |
The average latency of visitor operations | millisecond |
vds.distributor.visitor.ok |
The number of successful visitor operations performed | operation |
vds.distributor.visitor.failures.total |
Sum of all failures | operation |
vds.distributor.visitor.failures.notready |
The number of operations discarded because distributor was not ready | operation |
vds.distributor.visitor.failures.notconnected |
The number of operations discarded because there were no available storage nodes to send to | operation |
vds.distributor.visitor.failures.wrongdistributor |
The number of operations discarded because they were sent to the wrong distributor | operation |
vds.distributor.visitor.failures.safe_time_not_reached |
The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | operation |
vds.distributor.visitor.failures.storagefailure |
The number of operations that failed in storage | operation |
vds.distributor.visitor.failures.timeout |
The number of operations that failed because the operation timed out towards storage | operation |
vds.distributor.visitor.failures.busy |
The number of messages from storage that failed because the storage node was busy | operation |
vds.distributor.visitor.failures.inconsistent_bucket |
The number of operations failed due to buckets being in an inconsistent state or not found | operation |
vds.distributor.visitor.failures.notfound |
The number of operations that failed because the document did not exist | operation |
vds.distributor.visitor.bytes_per_visitor |
The number of bytes visited on content nodes as part of a single client visitor command | operation |
vds.distributor.visitor.docs_per_visitor |
The number of documents visited on content nodes as part of a single client visitor command | operation |
vds.distributor.visitor.failures.concurrent_mutations |
The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | operation |
vds.distributor.visitor.failures.test_and_set_failed |
The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | operation |
vds.distributor.docsstored |
Number of documents stored in all buckets controlled by this distributor | document |
vds.distributor.bytesstored |
Number of bytes stored in all buckets controlled by this distributor | byte |
metricmanager.periodichooklatency |
Time in ms used to update a single periodic hook | millisecond |
metricmanager.resetlatency |
Time in ms used to reset all metrics. | millisecond |
metricmanager.sleeptime |
Time in ms worker thread is sleeping | millisecond |
metricmanager.snapshothooklatency |
Time in ms used to update a single snapshot hook | millisecond |
metricmanager.snapshotlatency |
Time in ms used to take a snapshot | millisecond |
vds.distributor.activate_cluster_state_processing_time |
Elapsed time during which the distributor thread is blocked on merging pending bucket info into its bucket database upon activating a cluster state | millisecond |
vds.distributor.bucket_db.memory_usage.allocated_bytes |
The number of allocated bytes | byte |
vds.distributor.bucket_db.memory_usage.dead_bytes |
The number of dead bytes (<= used_bytes) | byte |
vds.distributor.bucket_db.memory_usage.onhold_bytes |
The number of bytes on hold | byte |
vds.distributor.bucket_db.memory_usage.used_bytes |
The number of used bytes (<= allocated_bytes) | byte |
vds.distributor.getbucketlists.failures.busy |
The number of messages from storage that failed because the storage node was busy | operation |
vds.distributor.getbucketlists.failures.concurrent_mutations |
The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | operation |
vds.distributor.getbucketlists.failures.inconsistent_bucket |
The number of operations failed due to buckets being in an inconsistent state or not found | operation |
vds.distributor.getbucketlists.failures.notconnected |
The number of operations discarded because there were no available storage nodes to send to | operation |
vds.distributor.getbucketlists.failures.notfound |
The number of operations that failed because the document did not exist | operation |
vds.distributor.getbucketlists.failures.notready |
The number of operations discarded because distributor was not ready | operation |
vds.distributor.getbucketlists.failures.safe_time_not_reached |
The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | operation |
vds.distributor.getbucketlists.failures.storagefailure |
The number of operations that failed in storage | operation |
vds.distributor.getbucketlists.failures.test_and_set_failed |
The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | operation |
vds.distributor.getbucketlists.failures.timeout |
The number of operations that failed because the operation timed out towards storage | operation |
vds.distributor.getbucketlists.failures.total |
Total number of failures | operation |
vds.distributor.getbucketlists.failures.wrongdistributor |
The number of operations discarded because they were sent to the wrong distributor | operation |
vds.distributor.getbucketlists.latency |
The average latency of getbucketlists operations | millisecond |
vds.distributor.getbucketlists.ok |
The number of successful getbucketlists operations performed | operation |
vds.distributor.recoverymodeschedulingtime |
Time spent scheduling operations in recovery mode after receiving new cluster state | millisecond |
vds.distributor.set_cluster_state_processing_time |
Elapsed time during which the distributor thread is blocked on processing its bucket database upon receiving a new cluster state | millisecond |
vds.distributor.state_transition_time |
Time it takes to complete a cluster state transition. If a state transition is preempted before completing, its elapsed time is counted as part of the total time spent for the final, completed state transition | millisecond |
vds.distributor.stats.failures.busy |
The number of messages from storage that failed because the storage node was busy | operation |
vds.distributor.stats.failures.concurrent_mutations |
The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | operation |
vds.distributor.stats.failures.inconsistent_bucket |
The number of operations failed due to buckets being in an inconsistent state or not found | operation |
vds.distributor.stats.failures.notconnected |
The number of operations discarded because there were no available storage nodes to send to | operation |
vds.distributor.stats.failures.notfound |
The number of operations that failed because the document did not exist | operation |
vds.distributor.stats.failures.notready |
The number of operations discarded because distributor was not ready | operation |
vds.distributor.stats.failures.safe_time_not_reached |
The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | operation |
vds.distributor.stats.failures.storagefailure |
The number of operations that failed in storage | operation |
vds.distributor.stats.failures.test_and_set_failed |
The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | operation |
vds.distributor.stats.failures.timeout |
The number of operations that failed because the operation timed out towards storage | operation |
vds.distributor.stats.failures.total |
The total number of failures | operation |
vds.distributor.stats.failures.wrongdistributor |
The number of operations discarded because they were sent to the wrong distributor | operation |
vds.distributor.stats.latency |
The average latency of stats operations | millisecond |
vds.distributor.stats.ok |
The number of successful stats operations performed | operation |
vds.distributor.update_gets.failures.busy |
The number of messages from storage that failed because the storage node was busy | operation |
vds.distributor.update_gets.failures.concurrent_mutations |
The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | operation |
vds.distributor.update_gets.failures.inconsistent_bucket |
The number of operations failed due to buckets being in an inconsistent state or not found | operation |
vds.distributor.update_gets.failures.notconnected |
The number of operations discarded because there were no available storage nodes to send to | operation |
vds.distributor.update_gets.failures.notfound |
The number of operations that failed because the document did not exist | operation |
vds.distributor.update_gets.failures.notready |
The number of operations discarded because distributor was not ready | operation |
vds.distributor.update_gets.failures.safe_time_not_reached |
The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | operation |
vds.distributor.update_gets.failures.storagefailure |
The number of operations that failed in storage | operation |
vds.distributor.update_gets.failures.test_and_set_failed |
The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | operation |
vds.distributor.update_gets.failures.timeout |
The number of operations that failed because the operation timed out towards storage | operation |
vds.distributor.update_gets.failures.total |
The total number of failures | operation |
vds.distributor.update_gets.failures.wrongdistributor |
The number of operations discarded because they were sent to the wrong distributor | operation |
vds.distributor.update_gets.latency |
The average latency of update_gets operations | millisecond |
vds.distributor.update_gets.ok |
The number of successful update_gets operations performed | operation |
vds.distributor.update_metadata_gets.failures.busy |
The number of messages from storage that failed because the storage node was busy | operation |
vds.distributor.update_metadata_gets.failures.concurrent_mutations |
The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | operation |
vds.distributor.update_metadata_gets.failures.inconsistent_bucket |
The number of operations failed due to buckets being in an inconsistent state or not found | operation |
vds.distributor.update_metadata_gets.failures.notconnected |
The number of operations discarded because there were no available storage nodes to send to | operation |
vds.distributor.update_metadata_gets.failures.notfound |
The number of operations that failed because the document did not exist | operation |
vds.distributor.update_metadata_gets.failures.notready |
The number of operations discarded because distributor was not ready | operation |
vds.distributor.update_metadata_gets.failures.safe_time_not_reached |
The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | operation |
vds.distributor.update_metadata_gets.failures.storagefailure |
The number of operations that failed in storage | operation |
vds.distributor.update_metadata_gets.failures.test_and_set_failed |
The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | operation |
vds.distributor.update_metadata_gets.failures.timeout |
The number of operations that failed because the operation timed out towards storage | operation |
vds.distributor.update_metadata_gets.failures.total |
The total number of failures | operation |
vds.distributor.update_metadata_gets.failures.wrongdistributor |
The number of operations discarded because they were sent to the wrong distributor | operation |
vds.distributor.update_metadata_gets.latency |
The average latency of update_metadata_gets operations | millisecond |
vds.distributor.update_metadata_gets.ok |
The number of successful update_metadata_gets operations performed | operation |
vds.distributor.update_puts.failures.busy |
The number of messages from storage that failed because the storage node was busy | operation |
vds.distributor.update_puts.failures.concurrent_mutations |
The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID | operation |
vds.distributor.update_puts.failures.inconsistent_bucket |
The number of operations failed due to buckets being in an inconsistent state or not found | operation |
vds.distributor.update_puts.failures.notconnected |
The number of operations discarded because there were no available storage nodes to send to | operation |
vds.distributor.update_puts.failures.notfound |
The number of operations that failed because the document did not exist | operation |
vds.distributor.update_puts.failures.notready |
The number of operations discarded because distributor was not ready | operation |
vds.distributor.update_puts.failures.safe_time_not_reached |
The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed | operation |
vds.distributor.update_puts.failures.storagefailure |
The number of operations that failed in storage | operation |
vds.distributor.update_puts.failures.test_and_set_failed |
The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document | operation |
vds.distributor.update_puts.failures.timeout |
The number of operations that failed because the operation timed out towards storage | operation |
vds.distributor.update_puts.failures.total |
The total number of put failures | operation |
vds.distributor.update_puts.failures.wrongdistributor |
The number of operations discarded because they were sent to the wrong distributor | operation |
vds.distributor.update_puts.latency |
The average latency of update_puts operations | millisecond |
vds.distributor.update_puts.ok |
The number of successful update_puts operations performed | operation |
vds.idealstate.nodes_per_merge |
The number of nodes involved in a single merge operation. | node |
vds.idealstate.set_bucket_state.blocked |
The number of operations blocked by blocking operation starter | operation |
vds.idealstate.set_bucket_state.done_failed |
The number of operations that failed | operation |
vds.idealstate.set_bucket_state.done_ok |
The number of operations successfully performed | operation |
vds.idealstate.set_bucket_state.pending |
The number of operations pending | operation |
vds.idealstate.set_bucket_state.throttled |
The number of operations throttled by throttling operation starter | operation |
vds.bouncer.clock_skew_aborts |
Number of client operations that were aborted due to clock skew between sender and receiver exceeding acceptable range | operation |
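
The counters in this table can be combined into derived health signals. The following is a minimal Python sketch, not part of any Vespa API: it assumes the metrics have already been collected into a plain dictionary mapping metric name to value (the `snapshot` dictionary and both helper functions are hypothetical). It also assumes that `ok` plus `failures.total` approximates the number of attempted operations for an operation type. The memory check simply restates the relations given in the descriptions above (`dead_bytes <= used_bytes <= allocated_bytes`).

```python
def failure_ratio(snapshot, operation):
    """Return failures.total / (ok + failures.total) for one operation type.

    `operation` is the metric group name, e.g. "puts", "removes", "gets".
    Metrics missing from the snapshot are treated as 0 (an assumption).
    """
    prefix = f"vds.distributor.{operation}"
    ok = snapshot.get(f"{prefix}.ok", 0)
    failed = snapshot.get(f"{prefix}.failures.total", 0)
    attempts = ok + failed
    return failed / attempts if attempts else 0.0


def bucket_db_memory_consistent(snapshot):
    """Check dead_bytes <= used_bytes <= allocated_bytes for the bucket DB."""
    prefix = "vds.distributor.bucket_db.memory_usage"
    dead = snapshot[f"{prefix}.dead_bytes"]
    used = snapshot[f"{prefix}.used_bytes"]
    allocated = snapshot[f"{prefix}.allocated_bytes"]
    return dead <= used <= allocated


# Hypothetical values, for illustration only.
snapshot = {
    "vds.distributor.puts.ok": 990,
    "vds.distributor.puts.failures.total": 10,
    "vds.distributor.bucket_db.memory_usage.dead_bytes": 128,
    "vds.distributor.bucket_db.memory_usage.used_bytes": 4096,
    "vds.distributor.bucket_db.memory_usage.allocated_bytes": 8192,
}

print(failure_ratio(snapshot, "puts"))
print(bucket_db_memory_consistent(snapshot))
```

Whether a given failure counter should count against an operation type depends on the use case; for example, `failures.notfound` may be expected behavior for gets, so a stricter ratio could subtract it from the total.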