Name | Unit | Description |
---|---|---|
vds.idealstate.buckets_rechecking |
bucket | The number of buckets that we are rechecking for ideal state operations |
vds.idealstate.idealstate_diff |
bucket | A number representing the current difference from the ideal state. It decreases steadily as the system approaches the ideal state |
vds.idealstate.buckets_toofewcopies |
bucket | The number of buckets the distributor controls that have fewer copies than the desired redundancy |
vds.idealstate.buckets_toomanycopies |
bucket | The number of buckets the distributor controls that have more copies than the desired redundancy |
vds.idealstate.buckets |
bucket | The number of buckets the distributor controls |
vds.idealstate.buckets_notrusted |
bucket | The number of buckets that have no trusted copies. |
vds.idealstate.bucket_replicas_moving_out |
bucket | Bucket replicas that should be moved out, e.g. because a node is being retired or a node with higher ideal state priority was added to the cluster |
vds.idealstate.bucket_replicas_copying_out |
bucket | Bucket replicas that should be copied out, e.g. node is in ideal state but might have to provide data to other nodes in a merge |
vds.idealstate.bucket_replicas_copying_in |
bucket | Bucket replicas that should be copied in, e.g. node does not have a replica for a bucket that it is in ideal state for |
vds.idealstate.bucket_replicas_syncing |
bucket | Bucket replicas that need syncing due to mismatching metadata |
vds.idealstate.max_observed_time_since_last_gc_sec |
second | Maximum time (in seconds) since GC was last successfully run for a bucket. Aggregated max value across all buckets on the distributor. |
vds.idealstate.delete_bucket.done_ok |
operation | The number of operations successfully performed |
vds.idealstate.delete_bucket.done_failed |
operation | The number of operations that failed |
vds.idealstate.delete_bucket.pending |
operation | The number of operations pending |
vds.idealstate.delete_bucket.blocked |
operation | The number of operations blocked by blocking operation starter |
vds.idealstate.delete_bucket.throttled |
operation | The number of operations throttled by throttling operation starter |
vds.idealstate.merge_bucket.done_ok |
operation | The number of operations successfully performed |
vds.idealstate.merge_bucket.done_failed |
operation | The number of operations that failed |
vds.idealstate.merge_bucket.pending |
operation | The number of operations pending |
vds.idealstate.merge_bucket.blocked |
operation | The number of operations blocked by blocking operation starter |
vds.idealstate.merge_bucket.throttled |
operation | The number of operations throttled by throttling operation starter |
vds.idealstate.merge_bucket.source_only_copy_changed |
operation | The number of merge operations where source-only copy changed |
vds.idealstate.merge_bucket.source_only_copy_delete_blocked |
operation | The number of merge operations where delete of unchanged source-only copies was blocked |
vds.idealstate.merge_bucket.source_only_copy_delete_failed |
operation | The number of merge operations where delete of unchanged source-only copies failed |
vds.idealstate.split_bucket.done_ok |
operation | The number of operations successfully performed |
vds.idealstate.split_bucket.done_failed |
operation | The number of operations that failed |
vds.idealstate.split_bucket.pending |
operation | The number of operations pending |
vds.idealstate.split_bucket.blocked |
operation | The number of operations blocked by blocking operation starter |
vds.idealstate.split_bucket.throttled |
operation | The number of operations throttled by throttling operation starter |
vds.idealstate.join_bucket.done_ok |
operation | The number of operations successfully performed |
vds.idealstate.join_bucket.done_failed |
operation | The number of operations that failed |
vds.idealstate.join_bucket.pending |
operation | The number of operations pending |
vds.idealstate.join_bucket.blocked |
operation | The number of operations blocked by blocking operation starter |
vds.idealstate.join_bucket.throttled |
operation | The number of operations throttled by throttling operation starter |
vds.idealstate.garbage_collection.done_ok |
operation | The number of operations successfully performed |
vds.idealstate.garbage_collection.done_failed |
operation | The number of operations that failed |
vds.idealstate.garbage_collection.pending |
operation | The number of operations pending |
vds.idealstate.garbage_collection.documents_removed |
document | Number of documents removed by GC operations |
vds.idealstate.garbage_collection.blocked |
operation | The number of operations blocked by blocking operation starter |
vds.idealstate.garbage_collection.throttled |
operation | The number of operations throttled by throttling operation starter |
vds.distributor.puts.latency |
millisecond | The latency of put operations |
vds.distributor.puts.ok |
operation | The number of successful put operations performed |
vds.distributor.puts.failures.total |
operation | Sum of all failures |
vds.distributor.puts.failures.notfound |
operation | The number of operations that failed because the document did not exist |
vds.distributor.puts.failures.test_and_set_failed |
operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document |
vds.distributor.puts.failures.concurrent_mutations |
operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID |
vds.distributor.puts.failures.notconnected |
operation | The number of operations discarded because there were no available storage nodes to send to |
vds.distributor.puts.failures.notready |
operation | The number of operations discarded because distributor was not ready |
vds.distributor.puts.failures.wrongdistributor |
operation | The number of operations discarded because they were sent to the wrong distributor |
vds.distributor.puts.failures.safe_time_not_reached |
operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed |
vds.distributor.puts.failures.storagefailure |
operation | The number of operations that failed in storage |
vds.distributor.puts.failures.timeout |
operation | The number of operations that failed because the operation timed out towards storage |
vds.distributor.puts.failures.busy |
operation | The number of messages from storage that failed because the storage node was busy |
vds.distributor.puts.failures.inconsistent_bucket |
operation | The number of operations failed due to buckets being in an inconsistent state or not found |
vds.distributor.removes.latency |
millisecond | The latency of remove operations |
vds.distributor.removes.ok |
operation | The number of successful remove operations performed |
vds.distributor.removes.failures.total |
operation | Sum of all failures |
vds.distributor.removes.failures.notfound |
operation | The number of operations that failed because the document did not exist |
vds.distributor.removes.failures.test_and_set_failed |
operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document |
vds.distributor.removes.failures.concurrent_mutations |
operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID |
vds.distributor.removes.failures.busy |
operation | The number of messages from storage that failed because the storage node was busy |
vds.distributor.removes.failures.inconsistent_bucket |
operation | The number of operations failed due to buckets being in an inconsistent state or not found |
vds.distributor.removes.failures.notconnected |
operation | The number of operations discarded because there were no available storage nodes to send to |
vds.distributor.removes.failures.notready |
operation | The number of operations discarded because distributor was not ready |
vds.distributor.removes.failures.safe_time_not_reached |
operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed |
vds.distributor.removes.failures.storagefailure |
operation | The number of operations that failed in storage |
vds.distributor.removes.failures.timeout |
operation | The number of operations that failed because the operation timed out towards storage |
vds.distributor.removes.failures.wrongdistributor |
operation | The number of operations discarded because they were sent to the wrong distributor |
vds.distributor.updates.latency |
millisecond | The latency of update operations |
vds.distributor.updates.ok |
operation | The number of successful update operations performed |
vds.distributor.updates.failures.total |
operation | Sum of all failures |
vds.distributor.updates.failures.notfound |
operation | The number of operations that failed because the document did not exist |
vds.distributor.updates.failures.test_and_set_failed |
operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document |
vds.distributor.updates.failures.concurrent_mutations |
operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID |
vds.distributor.updates.diverging_timestamp_updates |
operation | Number of updates that report they were performed against divergent version timestamps on different replicas |
vds.distributor.updates.failures.busy |
operation | The number of messages from storage that failed because the storage node was busy |
vds.distributor.updates.failures.inconsistent_bucket |
operation | The number of operations failed due to buckets being in an inconsistent state or not found |
vds.distributor.updates.failures.notconnected |
operation | The number of operations discarded because there were no available storage nodes to send to |
vds.distributor.updates.failures.notready |
operation | The number of operations discarded because distributor was not ready |
vds.distributor.updates.failures.safe_time_not_reached |
operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed |
vds.distributor.updates.failures.storagefailure |
operation | The number of operations that failed in storage |
vds.distributor.updates.failures.timeout |
operation | The number of operations that failed because the operation timed out towards storage |
vds.distributor.updates.failures.wrongdistributor |
operation | The number of operations discarded because they were sent to the wrong distributor |
vds.distributor.updates.fast_path_restarts |
operation | Number of safe path (write repair) updates that were restarted as fast path updates because all replicas returned documents with the same timestamp in the initial read phase |
vds.distributor.removelocations.ok |
operation | The number of successful removelocations operations performed |
vds.distributor.removelocations.failures.total |
operation | Sum of all failures |
vds.distributor.removelocations.failures.busy |
operation | The number of messages from storage that failed because the storage node was busy |
vds.distributor.removelocations.failures.concurrent_mutations |
operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID |
vds.distributor.removelocations.failures.inconsistent_bucket |
operation | The number of operations failed due to buckets being in an inconsistent state or not found |
vds.distributor.removelocations.failures.notconnected |
operation | The number of operations discarded because there were no available storage nodes to send to |
vds.distributor.removelocations.failures.notfound |
operation | The number of operations that failed because the document did not exist |
vds.distributor.removelocations.failures.notready |
operation | The number of operations discarded because distributor was not ready |
vds.distributor.removelocations.failures.safe_time_not_reached |
operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed |
vds.distributor.removelocations.failures.storagefailure |
operation | The number of operations that failed in storage |
vds.distributor.removelocations.failures.test_and_set_failed |
operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document |
vds.distributor.removelocations.failures.timeout |
operation | The number of operations that failed because the operation timed out towards storage |
vds.distributor.removelocations.failures.wrongdistributor |
operation | The number of operations discarded because they were sent to the wrong distributor |
vds.distributor.removelocations.latency |
millisecond | The average latency of removelocations operations |
vds.distributor.gets.latency |
millisecond | The average latency of get operations |
vds.distributor.gets.ok |
operation | The number of successful get operations performed |
vds.distributor.gets.failures.total |
operation | Sum of all failures |
vds.distributor.gets.failures.notfound |
operation | The number of operations that failed because the document did not exist |
vds.distributor.gets.failures.busy |
operation | The number of messages from storage that failed because the storage node was busy |
vds.distributor.gets.failures.concurrent_mutations |
operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID |
vds.distributor.gets.failures.inconsistent_bucket |
operation | The number of operations failed due to buckets being in an inconsistent state or not found |
vds.distributor.gets.failures.notconnected |
operation | The number of operations discarded because there were no available storage nodes to send to |
vds.distributor.gets.failures.notready |
operation | The number of operations discarded because distributor was not ready |
vds.distributor.gets.failures.safe_time_not_reached |
operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed |
vds.distributor.gets.failures.storagefailure |
operation | The number of operations that failed in storage |
vds.distributor.gets.failures.test_and_set_failed |
operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document |
vds.distributor.gets.failures.timeout |
operation | The number of operations that failed because the operation timed out towards storage |
vds.distributor.gets.failures.wrongdistributor |
operation | The number of operations discarded because they were sent to the wrong distributor |
vds.distributor.visitor.latency |
millisecond | The average latency of visitor operations |
vds.distributor.visitor.ok |
operation | The number of successful visitor operations performed |
vds.distributor.visitor.failures.total |
operation | Sum of all failures |
vds.distributor.visitor.failures.notready |
operation | The number of operations discarded because distributor was not ready |
vds.distributor.visitor.failures.notconnected |
operation | The number of operations discarded because there were no available storage nodes to send to |
vds.distributor.visitor.failures.wrongdistributor |
operation | The number of operations discarded because they were sent to the wrong distributor |
vds.distributor.visitor.failures.safe_time_not_reached |
operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed |
vds.distributor.visitor.failures.storagefailure |
operation | The number of operations that failed in storage |
vds.distributor.visitor.failures.timeout |
operation | The number of operations that failed because the operation timed out towards storage |
vds.distributor.visitor.failures.busy |
operation | The number of messages from storage that failed because the storage node was busy |
vds.distributor.visitor.failures.inconsistent_bucket |
operation | The number of operations failed due to buckets being in an inconsistent state or not found |
vds.distributor.visitor.failures.notfound |
operation | The number of operations that failed because the document did not exist |
vds.distributor.visitor.bytes_per_visitor |
operation | The number of bytes visited on content nodes as part of a single client visitor command |
vds.distributor.visitor.docs_per_visitor |
operation | The number of documents visited on content nodes as part of a single client visitor command |
vds.distributor.visitor.failures.concurrent_mutations |
operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID |
vds.distributor.visitor.failures.test_and_set_failed |
operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document |
vds.distributor.docsstored |
document | Number of documents stored in all buckets controlled by this distributor |
vds.distributor.bytesstored |
byte | Number of bytes stored in all buckets controlled by this distributor |
metricmanager.periodichooklatency |
millisecond | Time in ms used to update a single periodic hook |
metricmanager.resetlatency |
millisecond | Time in ms used to reset all metrics. |
metricmanager.sleeptime |
millisecond | Time in ms worker thread is sleeping |
metricmanager.snapshothooklatency |
millisecond | Time in ms used to update a single snapshot hook |
metricmanager.snapshotlatency |
millisecond | Time in ms used to take a snapshot |
vds.distributor.activate_cluster_state_processing_time |
millisecond | Elapsed time where the distributor thread is blocked on merging pending bucket info into its bucket database upon activating a cluster state |
vds.distributor.bucket_db.memory_usage.allocated_bytes |
byte | The number of allocated bytes |
vds.distributor.bucket_db.memory_usage.dead_bytes |
byte | The number of dead bytes (<= used_bytes) |
vds.distributor.bucket_db.memory_usage.onhold_bytes |
byte | The number of bytes on hold |
vds.distributor.bucket_db.memory_usage.used_bytes |
byte | The number of used bytes (<= allocated_bytes) |
vds.distributor.getbucketlists.failures.busy |
operation | The number of messages from storage that failed because the storage node was busy |
vds.distributor.getbucketlists.failures.concurrent_mutations |
operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID |
vds.distributor.getbucketlists.failures.inconsistent_bucket |
operation | The number of operations failed due to buckets being in an inconsistent state or not found |
vds.distributor.getbucketlists.failures.notconnected |
operation | The number of operations discarded because there were no available storage nodes to send to |
vds.distributor.getbucketlists.failures.notfound |
operation | The number of operations that failed because the document did not exist |
vds.distributor.getbucketlists.failures.notready |
operation | The number of operations discarded because distributor was not ready |
vds.distributor.getbucketlists.failures.safe_time_not_reached |
operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed |
vds.distributor.getbucketlists.failures.storagefailure |
operation | The number of operations that failed in storage |
vds.distributor.getbucketlists.failures.test_and_set_failed |
operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document |
vds.distributor.getbucketlists.failures.timeout |
operation | The number of operations that failed because the operation timed out towards storage |
vds.distributor.getbucketlists.failures.total |
operation | Total number of failures |
vds.distributor.getbucketlists.failures.wrongdistributor |
operation | The number of operations discarded because they were sent to the wrong distributor |
vds.distributor.getbucketlists.latency |
millisecond | The average latency of getbucketlists operations |
vds.distributor.getbucketlists.ok |
operation | The number of successful getbucketlists operations performed |
vds.distributor.recoverymodeschedulingtime |
millisecond | Time spent scheduling operations in recovery mode after receiving new cluster state |
vds.distributor.set_cluster_state_processing_time |
millisecond | Elapsed time where the distributor thread is blocked on processing its bucket database upon receiving a new cluster state |
vds.distributor.state_transition_time |
millisecond | Time it takes to complete a cluster state transition. If a state transition is preempted before completing, its elapsed time is counted as part of the total time spent for the final, completed state transition |
vds.distributor.stats.failures.busy |
operation | The number of messages from storage that failed because the storage node was busy |
vds.distributor.stats.failures.concurrent_mutations |
operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID |
vds.distributor.stats.failures.inconsistent_bucket |
operation | The number of operations failed due to buckets being in an inconsistent state or not found |
vds.distributor.stats.failures.notconnected |
operation | The number of operations discarded because there were no available storage nodes to send to |
vds.distributor.stats.failures.notfound |
operation | The number of operations that failed because the document did not exist |
vds.distributor.stats.failures.notready |
operation | The number of operations discarded because distributor was not ready |
vds.distributor.stats.failures.safe_time_not_reached |
operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed |
vds.distributor.stats.failures.storagefailure |
operation | The number of operations that failed in storage |
vds.distributor.stats.failures.test_and_set_failed |
operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document |
vds.distributor.stats.failures.timeout |
operation | The number of operations that failed because the operation timed out towards storage |
vds.distributor.stats.failures.total |
operation | The total number of failures |
vds.distributor.stats.failures.wrongdistributor |
operation | The number of operations discarded because they were sent to the wrong distributor |
vds.distributor.stats.latency |
millisecond | The average latency of stats operations |
vds.distributor.stats.ok |
operation | The number of successful stats operations performed |
vds.distributor.update_gets.failures.busy |
operation | The number of messages from storage that failed because the storage node was busy |
vds.distributor.update_gets.failures.concurrent_mutations |
operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID |
vds.distributor.update_gets.failures.inconsistent_bucket |
operation | The number of operations failed due to buckets being in an inconsistent state or not found |
vds.distributor.update_gets.failures.notconnected |
operation | The number of operations discarded because there were no available storage nodes to send to |
vds.distributor.update_gets.failures.notfound |
operation | The number of operations that failed because the document did not exist |
vds.distributor.update_gets.failures.notready |
operation | The number of operations discarded because distributor was not ready |
vds.distributor.update_gets.failures.safe_time_not_reached |
operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed |
vds.distributor.update_gets.failures.storagefailure |
operation | The number of operations that failed in storage |
vds.distributor.update_gets.failures.test_and_set_failed |
operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document |
vds.distributor.update_gets.failures.timeout |
operation | The number of operations that failed because the operation timed out towards storage |
vds.distributor.update_gets.failures.total |
operation | The total number of failures |
vds.distributor.update_gets.failures.wrongdistributor |
operation | The number of operations discarded because they were sent to the wrong distributor |
vds.distributor.update_gets.latency |
millisecond | The average latency of update_gets operations |
vds.distributor.update_gets.ok |
operation | The number of successful update_gets operations performed |
vds.distributor.update_metadata_gets.failures.busy |
operation | The number of messages from storage that failed because the storage node was busy |
vds.distributor.update_metadata_gets.failures.concurrent_mutations |
operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID |
vds.distributor.update_metadata_gets.failures.inconsistent_bucket |
operation | The number of operations failed due to buckets being in an inconsistent state or not found |
vds.distributor.update_metadata_gets.failures.notconnected |
operation | The number of operations discarded because there were no available storage nodes to send to |
vds.distributor.update_metadata_gets.failures.notfound |
operation | The number of operations that failed because the document did not exist |
vds.distributor.update_metadata_gets.failures.notready |
operation | The number of operations discarded because distributor was not ready |
vds.distributor.update_metadata_gets.failures.safe_time_not_reached |
operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed |
vds.distributor.update_metadata_gets.failures.storagefailure |
operation | The number of operations that failed in storage |
vds.distributor.update_metadata_gets.failures.test_and_set_failed |
operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document |
vds.distributor.update_metadata_gets.failures.timeout |
operation | The number of operations that failed because the operation timed out towards storage |
vds.distributor.update_metadata_gets.failures.total |
operation | The total number of failures |
vds.distributor.update_metadata_gets.failures.wrongdistributor |
operation | The number of operations discarded because they were sent to the wrong distributor |
vds.distributor.update_metadata_gets.latency |
millisecond | The average latency of update_metadata_gets operations |
vds.distributor.update_metadata_gets.ok |
operation | The number of successful update_metadata_gets operations performed |
vds.distributor.update_puts.failures.busy |
operation | The number of messages from storage that failed because the storage node was busy |
vds.distributor.update_puts.failures.concurrent_mutations |
operation | The number of operations that were transiently failed due to a mutating operation already being in progress for its document ID |
vds.distributor.update_puts.failures.inconsistent_bucket |
operation | The number of operations failed due to buckets being in an inconsistent state or not found |
vds.distributor.update_puts.failures.notconnected |
operation | The number of operations discarded because there were no available storage nodes to send to |
vds.distributor.update_puts.failures.notfound |
operation | The number of operations that failed because the document did not exist |
vds.distributor.update_puts.failures.notready |
operation | The number of operations discarded because distributor was not ready |
vds.distributor.update_puts.failures.safe_time_not_reached |
operation | The number of operations that were transiently failed due to them arriving before the safe time point for bucket ownership handovers has passed |
vds.distributor.update_puts.failures.storagefailure |
operation | The number of operations that failed in storage |
vds.distributor.update_puts.failures.test_and_set_failed |
operation | The number of mutating operations that failed because they specified a test-and-set condition that did not match the existing document |
vds.distributor.update_puts.failures.timeout |
operation | The number of operations that failed because the operation timed out towards storage |
vds.distributor.update_puts.failures.total |
operation | The total number of put failures |
vds.distributor.update_puts.failures.wrongdistributor |
operation | The number of operations discarded because they were sent to the wrong distributor |
vds.distributor.update_puts.latency |
millisecond | The average latency of update_puts operations |
vds.distributor.update_puts.ok |
operation | The number of successful update_puts operations performed |
vds.idealstate.nodes_per_merge |
node | The number of nodes involved in a single merge operation. |
vds.idealstate.set_bucket_state.blocked |
operation | The number of operations blocked by blocking operation starter |
vds.idealstate.set_bucket_state.done_failed |
operation | The number of operations that failed |
vds.idealstate.set_bucket_state.done_ok |
operation | The number of operations successfully performed |
vds.idealstate.set_bucket_state.pending |
operation | The number of operations pending |
vds.idealstate.set_bucket_state.throttled |
operation | The number of operations throttled by throttling operation starter |
vds.bouncer.clock_skew_aborts |
operation | Number of client operations that were aborted due to clock skew between sender and receiver exceeding acceptable range |
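
Several of the metrics above are easiest to read in combination. The sketch below shows one way to derive a failure ratio from the `ok` and `failures.total` counters of an operation type, and to check the ordering that the `bucket_db.memory_usage` descriptions state (`dead_bytes <= used_bytes <= allocated_bytes`). The `snapshot` dictionary, the helper names, and the assumption that `ok` plus `failures.total` covers all completed operations in the same interval are hypothetical and for illustration only; this is not an API provided by the system.

```python
from typing import Dict, Optional


def failure_ratio(snapshot: Dict[str, float], operation: str) -> Optional[float]:
    """Return failures / (ok + failures) for one distributor operation type.

    `snapshot` is a hypothetical mapping from metric name to counter value,
    collected over a single interval. Returns None when no operations were seen.
    """
    prefix = f"vds.distributor.{operation}"
    ok = snapshot.get(f"{prefix}.ok", 0.0)
    failed = snapshot.get(f"{prefix}.failures.total", 0.0)
    total = ok + failed
    if total == 0:
        return None
    return failed / total


def bucket_db_memory_is_consistent(snapshot: Dict[str, float]) -> bool:
    """Check dead_bytes <= used_bytes <= allocated_bytes, as the table states."""
    prefix = "vds.distributor.bucket_db.memory_usage"
    dead = snapshot[f"{prefix}.dead_bytes"]
    used = snapshot[f"{prefix}.used_bytes"]
    allocated = snapshot[f"{prefix}.allocated_bytes"]
    return dead <= used <= allocated


# Example with made-up values.
example = {
    "vds.distributor.puts.ok": 90,
    "vds.distributor.puts.failures.total": 10,
    "vds.distributor.bucket_db.memory_usage.dead_bytes": 128,
    "vds.distributor.bucket_db.memory_usage.used_bytes": 1024,
    "vds.distributor.bucket_db.memory_usage.allocated_bytes": 4096,
}
puts_ratio = failure_ratio(example, "puts")
memory_ok = bucket_db_memory_is_consistent(example)
```

The same `failure_ratio` helper works for any operation prefix listed above that exposes both counters, such as `removes`, `updates`, `gets`, or `visitor`.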