Example getting a metric value using the Prometheus endpoint:
$ curl -s http://ENDPOINT/prometheus/v1/values/?consumer=vespa | \
  grep "vds.idealstate.merge_bucket.pending.average" | egrep -v 'HELP|TYPE'
Example getting a metric value using /metrics/v2/values:
$ curl ENDPOINT/metrics/v2/values | \
  jq -r -c '
    .nodes[] |
    .hostname as $h |
    .services[].metrics[] |
    select(.values."content.proton.documentdb.documents.total.last") |
    [$h, .dimensions.documenttype, .values."content.proton.documentdb.documents.total.last"] |
    @tsv'
node9.vespanet music 0
node8.vespanet music 0
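The same extraction can be sketched in Python for use in scripts. This is a minimal illustration, not an official client: it assumes the response from /metrics/v2/values has already been fetched and parsed, and that it has the shape the jq filter implies (a list of nodes, each with a hostname and a list of services, each service with a list of metrics entries holding values and dimensions). The function name `document_counts` and the service name in the sample are hypothetical.

```python
METRIC = "content.proton.documentdb.documents.total.last"


def document_counts(payload, metric=METRIC):
    """Return (hostname, documenttype, value) rows for one metric.

    Assumes the payload layout implied by the jq filter:
    nodes -> services -> metrics -> {values, dimensions}.
    """
    rows = []
    for node in payload.get("nodes", []):
        hostname = node.get("hostname")
        for service in node.get("services", []):
            for entry in service.get("metrics", []):
                values = entry.get("values", {})
                # Like jq's select(): skip entries where the metric is
                # absent or null, but keep a value of 0.
                if values.get(metric) is None:
                    continue
                doctype = entry.get("dimensions", {}).get("documenttype")
                rows.append((hostname, doctype, values[metric]))
    return rows


# Hand-written sample in the assumed layout; "searchnode" is a placeholder.
SAMPLE = {
    "nodes": [
        {
            "hostname": "node9.vespanet",
            "services": [
                {
                    "name": "searchnode",
                    "metrics": [
                        {
                            "values": {METRIC: 0},
                            "dimensions": {"documenttype": "music"},
                        },
                        {"values": {"some.other.metric.last": 1}, "dimensions": {}},
                    ],
                }
            ],
        }
    ]
}

for row in document_counts(SAMPLE):
    print("\t".join(str(field) for field in row))
```

Entries that do not carry the requested metric are skipped, so the result has one row per node and document type, matching the tab-separated output of the jq command.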
Metrics in Vespa are generated from services running on the individual nodes, and in many cases have many recordings per metric, from within each node, with unique tag / dimension combinations. These recordings need to be put together to contribute to the overall picture of how the system is behaving. If this is done the right way you will be able to “zoom out” to get the bigger picture, or to “zoom in” to see how things behave in more detail. This is very useful when looking into possible production issues. Unfortunately it is easy to combine metrics the wrong way, resulting in potentially significantly distorted graphs.
For each of the values (suffixes) available for the different metrics, here is how we recommend aggregating them to get the most out of them. Use these guidelines both for aggregation over time (multiple snapshot intervals) and for aggregation over tag combinations.
Use the highest value available
Use the lowest value available
Use the sum of all values
Use the sum of all values
To generate an average value you want to do
Avoid this except for metrics you expect to be stable, such as amount of memory available on a node, etc. This value is the last from a metrics snapshot period, hence basically a single value picked from all values during the snapshot period. Typically very noisy for volatile metrics. It does not make sense to aggregate on this value at all, but if you must then choose a value with the same combination of tags over time.
This value cannot be aggregated in a way that gives a mathematically correct value. But where you have to
either compute the average value for the most realistic value,
Same as for the
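As an illustration of the guidelines above, the sketch below combines several recordings of one metric, for example from different snapshot intervals or tag combinations. It rests on assumptions not spelled out in this text: that the four "use the ..." rules apply to the max, min, sum and count suffixes respectively, and that an average is best derived by dividing the combined sum by the combined count rather than by averaging per-recording averages. The function name `aggregate` and the dictionary layout are hypothetical.

```python
def aggregate(recordings):
    """Combine recordings of one metric into a single set of values.

    Each recording is a dict with the keys "max", "min", "sum" and "count"
    (assumed suffix names).
    """
    if not recordings:
        raise ValueError("need at least one recording")
    total_sum = sum(r["sum"] for r in recordings)
    total_count = sum(r["count"] for r in recordings)
    return {
        "max": max(r["max"] for r in recordings),   # highest value available
        "min": min(r["min"] for r in recordings),   # lowest value available
        "sum": total_sum,                           # sum of all values
        "count": total_count,                       # sum of all values
        # Assumption: weight the average by count instead of averaging averages.
        "average": total_sum / total_count if total_count else None,
    }


# Two recordings: ten fast operations and one slow one.
recordings = [
    {"max": 2.0, "min": 0.5, "sum": 10.0, "count": 10},
    {"max": 100.0, "min": 100.0, "sum": 100.0, "count": 1},
]
combined = aggregate(recordings)

# Averaging the per-recording averages over-weights the single slow operation.
naive_average = sum(r["sum"] / r["count"] for r in recordings) / len(recordings)

print(combined["average"])  # count-weighted average
print(naive_average)        # distorted average of averages
```

With these numbers the count-weighted average is 10.0, while the average of averages is 50.5, which is the kind of distortion the guidelines warn about.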
Node metrics in /metrics/v1/values are listed per service, with a set of system metrics - example:
Vespa metric-set has a richer set of metrics, see
Example minimal metric-set; system metric-set + a specific metric:
Example default metric-set and more; system metric-set + default metric-set + a built-in metric:
The names of metrics emitted by Vespa typically follow this naming scheme: <prefix>.<service>.<component>.<suffix>. The separator (. here) may differ for different metrics integrations. Similarly, the <prefix> string may differ depending on your configuration. Further, some metrics have several levels of component names. Each metric will have a number of values associated with it, one for each suffix provided by the metric. Typical suffixes include
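To make the naming scheme concrete, here is a small sketch that separates a metric name into its base name and its suffix. It only splits off the last element, because the number of component levels varies and the prefix depends on configuration. The function name `split_metric_name` is hypothetical, and the separator is a parameter since it may differ between integrations.

```python
def split_metric_name(name, separator="."):
    """Split a metric name into (base name, suffix) at the last separator."""
    base, found, suffix = name.rpartition(separator)
    if not found or not base or not suffix:
        raise ValueError(f"not a <name>{separator}<suffix> metric name: {name!r}")
    return base, suffix


# Metric names taken from the examples earlier in this document.
for full_name in (
    "vds.idealstate.merge_bucket.pending.average",
    "content.proton.documentdb.documents.total.last",
):
    base, suffix = split_metric_name(full_name)
    print(f"{base}  ->  {suffix}")
```

Grouping values by the base name returned here is one way to collect all suffixes that belong to the same metric.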
Metrics from the container with description and unit can be found in the container metrics reference. The most commonly used metrics are mentioned below.
These metrics are output for the server as a whole, e.g. related to resources.
Some metrics indicate memory usage, such as
Other metrics are related to the JVM garbage collection,
Metrics for the container thread pools.
jdisc.thread_pool.* metrics have a dimension threadpool with the thread pool name, e.g. default-pool for the container's default thread pool. See Container Tuning for details.
These metrics are specific to HTTP. Metrics that are specific to a connector have a dimension containing the TCP listen port.
Refer to Container Metrics for metrics on HTTP status response codes, http.status.* or more detailed requests related to the handling of requests,
Other relevant metrics include
For metrics related to queries please start with the
httpapi_* metrics for more insights.
For metrics related to feeding into Vespa,
we recommend using the
Each of the services running in a Vespa installation maintains and reports a number of metrics.
Metrics from the container services are the most commonly used, and are listed in Container Metrics. You will find the metrics available there, with description and unit.
Find a full example in the album-recommendation-java sample application.
I have two different libraries that are running as components with their own threads within the vespa container. We are injecting MetricReceiver to each library. After injecting the receiver we store the reference to this receiver in a container-wide object so that they can be used inside these libraries (the libraries each have several classes and such, so it is not possible to inject the receiver every time and we need to use the stored reference). Questions:
Q: Is the MetricReceiver object unique within the container? That is, if I am injecting the receiver to two different components, is always the same object getting injected?
A: Yes, you get the same object.
Q: How long does an object remain valid? Does the same object remain valid for the life of the container (meaning from container booting up to the point of restart/shutdown) or can the object change? I ask this because we store the reference to the receiver at a common place so that it can be used to emit metrics elsewhere in the library where we can’t inject it, so I am wondering how frequently we need to update this reference.
A: It remains valid for the lifetime of the component into which it was injected. Therefore, if you share component references through some means other than direct or indirect injection, you may end up with invalid references. A "container-wide object" sounds like trouble. You should have the receiver injected into all the components that need it instead. Or, if you feel that will be too fine-grained, create one large object which gets these things injected, and then have that object injected into all components that need the common stuff.