Statistics API - Generating statistics in a plugin

This document describes how to use the statistics module in Vespa and includes an example Container plugin which uses the statistics module.

The goal of the statistics module is making it possible to gather measurements of a component with minimal effort. Collected data are logged at given intervals. Processing and logging is done in the background by the library, measurements are sent to the library by Handle instances. A client obtains and initializes a (subclass of) Handle, then collects data using that handle.

Events are registered by name. If two events are registered using the same name, the last one registered will be used. All statistics objects are thread-safe.

Refer to Statistics API doc for JavaDoc of the different classes.

Write a plugin using statistics

Using the statistics library is supposed to be as non-intrusive in code as possible. Sampling a value is supposed to be cheap. It is necessary to supply a Statistics manager into a handle during construction, the manager instance is acquired through dependency injection in the container. Example searcher:

package com.mydomain.example;

import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.searchchain.Execution;
import com.yahoo.prelude.query.*;
import com.yahoo.statistics.Value;
import com.yahoo.statistics.HistogramType;
import com.yahoo.statistics.Statistics;
import java.util.Iterator;

public class QueryDepthSearcher extends Searcher {

    private final Value queryDepth;

    public QueryDepthSearcher(Statistics manager) {
        // Initialize new statistics value handle:
        // - the event name is "query_depth"
        // - Do not read config
        // - Do use a histogram split at 0.0, 1.0, 2.0 and 3.0
        // - Use a regular histogram, not some sort of cumulative histogram
        queryDepth = new Value("query_depth", manager,
                new Value.Parameters()
                        .setLogHistogram(true)
                        .setLimits(new Limits(new double[] { 0, 1, 2, 3 })
                        .setHistogramId(HistogramType.REGULAR)
                        // Do log mean value pr logging interval
                        .setLogMean(true)
                        // Log mean with the name "query_depth", not "query_depth.mean"
                        .setNameExtension(false)));
    }

    @Override
    public Result search(Query query, Execution execution) {
        int depth = stackDepth(0, query.getModel().getQueryTree().getRoot());
        queryDepth.put(depth);

        return execution.search(query);
    }

    /**
     * Helper function to find depth of query trees. Just to have
     * something to sample.
     */
    private int stackDepth(int i, Item root) {
        if (root == null) {
            return i;
        }
        if (root instanceof CompositeItem) {
            int maxDepth = i;
            for (Iterator<Item> j = ((CompositeItem) root).getItemIterator();
                    j.hasNext();) {
                int newDepth = stackDepth(i+1, j.next());
                maxDepth = Math.max(maxDepth, newDepth);
            }
            return maxDepth;
        } else {
            return i;
        }
    }
}
Above is a complete Container plugin. The only thing it does is sample the depth of the query tree, and logging the mean depth along with a histogram for the depths.

Note that it is often cleaner to get the parameters for a Value or Counter instance from the custom configuration classes than by overriding the contents of the statistics configuration. I.e. simply read the desired values from the custom configuration, and initialize the Value (or other class) using a standard constructor ignoring configuration.

Value Class

The Value class is a Handle subclass which represents sampled values which will fluctuate, as contradistinguished from monotone growth, over time. The values may be handled as raw values which the surrounding code simply sets to some appropriate level, or a bucket that the surrounding code fills with samples – for which derived values, such as the mean value, is logged.

The default behavior is logging nothing. Everything emitted to the log must be stated in the Value.Parameters instance or config.

A Value instance can be initialized with a set of limits to split a histogram for the samples. In other words, using the limits { 0, 1, 2 } will lead to the histogram having four buckets, covering the intervals { <-∞, 0>, [0, 1>, [1, 2>, [2, ∞> }. Samples put into the Value instance will be counted in the correct bucket automatically, and the histogram will be sent to the log and reset for each logging interval.

Apart from the raw/last value sampled and a histogram for the value distribution, the minimum, maximum and mean value can be logged.

Value instances use the log events value and histogram.

Counter class

The Counter class is a subclass of Handle which represents integer values increasing either over the lifetime of a process, or for each logging interval. A Counter may be incremented by 1, or by an arbitrary integer.

ResetCounter

A counter may be reset between each logging interval. It is then output to the log as a value event instead of a count. This is chosen either by using a constructor argument or using config. Counter events use the log events count and value.

ValueGroup class

A ValueGroup represents a set of associated Value instances. Each time a variable is sampled, the name of the Value in question is passed to the ValueGroup along with the new sample.

A ValueGroup only supports logging the raw value, not derived statistics (histogram, min, etc). In the log, the name for each value will be nameOfValueGroup.nameOfValue.

A ValueGroup is marginally more expensive than a Value, since each sample is a Map lookup to find the correct Value instance.

CounterGroup class

A CounterGroup represents a set of associated Counter instances. Each time a Counter for a CounterGroup is incremented, the name of the Counter in question is passed to the CounterGroup along with the increment.

A CounterGroup is marginally more expensive than a Counter, since each sample is a Map lookup to find the correct Counter instance.

Callbacks

It is sometimes useful to do some processing right before data is dumped to log, or simply only then perform measurements. The statistics library supports this by having a callback slot in all objects used when collecting data. The callback is called immediately before the object logs its values. As the callback is run in the worker thread of the statistics library, a callback must be lightweight.

Example code snippet which obtains the number of currently running worker threads:

private final Value activeQueries; // raw measure every 5 minutes

// Callback to measure queries in flight every five minutes
private static class ActivitySampler implements Callback {
  public void run(Handle h, boolean firstRun) {
        if (firstRun) {
            // don't insert a value, ensure counter is reset at reconfig
            return;
        }
        // Sample the current number of currently active queries
        // in progress in the container.
        ((Value) h).put(Server.get().searchQueriesInFlight());
  }
}

public StatisticsSearcher(Statistics manager) {
  activeQueries = new Value("active_queries", manager,
          new Value.Parameters().setLogRaw(true)
                  .setCallback(new ActivitySampler()));
First a Value instance is obtained, then a callback is inserted, and the value is ready for use. The callback will be activated automatically every 5 minutes (or every configured n minutes).

The Callback interface defines two arguments which are passed to the implementing object. The Handle argument is the statistics handle which called the callback, typically a Value or Counter owning the callback object. The second, the boolean, is set to true the first time, and only the first time, a callback is called from the given handle. The reason for adding this information is in case the callback emits a log event to the log itself (even though that would be unusual) and the first emitted value should be a well-defined special case, e.g. a hard-coded zero value.

Configuring statistics components

Statistics events can be configured either using custom config classes in conjunction with programmatic setup, or overriding global arrays. For most cases, using custom config classes is desirable.

Custom config classes

Using custom configuration classes has two huge advantages over overriding values in the global configuration:

  • It is easier to maintain small, specifically modelled custom config classes than the contents of a very general array in the configuration.
  • By using custom config classes, fewer components have to be re-initialized during reconfiguration. A very large number of components depend on Statistics, changing some global statistics configuration, will force all these instances to be constructed once more. Changing a configuration only used a single place, will only lead to that component (and any dependants) being reconstructed.
Assume a searcher plugin which tracks the number of hits returned each logging interval, and optionally tracks the mean (number of hits returned). First, one would create config definition as outlined in the documentation for general configurable components. The config definition file could be named hitrates.def and might look like:

package=com.mydomain.example

## Switch to control whether to log mean number of hits per query
logmean bool default=false
Then, we need a search which actually does the measurements, and also reads the configuration:
package com.mydomain.example;

import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.searchchain.Execution;
import com.yahoo.statistics.Statistics;
import com.yahoo.statistics.Value;

/**
 * An example searcher measuring the total number of documents fetched each
 * logging interval, and optionally the mean number of documents per query.
 */
public class HitratesMeasurer extends Searcher {
    private final Value rates;

    public HitratesMeasurer(final Statistics manager,
            final HitratesConfig config) {
        // Create a Value object which always logs the sum, but whether the mean
        // is to be logged, is fetched from config. Activate name extension to
        // allow for multiple values being logged, using a suffix to distinguish
        // them.
        rates = new Value("hit_rates", manager, new Value.Parameters()
                .setNameExtension(true).setLogSum(true)
                .setLogMean(config.logmean()));
    }

    @Override
    public Result search(final Query query, final Execution execution) {
        final Result result = execution.search(query);
        // Measure the number of documents returned to the user
        rates.put(result.getConcreteHitCount());
        return result;
    }
}
It is then possible to turn logging of the mean event on and off, without having to maintain the entire control array in the statistics configuration.

Using the global configuration

In general, one should usually employ smaller custom configuration classes as described above. If a single part of the global statistics config changes, all components emitting statistics events have to be constructed once more, making reconfiguration very expensive. Also, maintaining the arrays inside the global configuration is very cumbersome, compared to changing specific structures inside custom configuration classes.

Below follows a version of the config definition for the statistics library, found in statistics.def

<config name="container.statistics">
<!-- Interval between internal sample points measured in minutes -->
<collectionintervalsec>5</collectionintervalsec>

<!-- Interval between each dump to log measured in minutes -->
<loggingintervalsec>5</loggingintervalsec>

<!-- Configuration for measured values -->
<values index="0">
  <!-- A string which uniquely identifies the event in the log -->
  <name>example_event</name>

  <!-- An array of what operations should be performed on gathered data -->
  <operations index="0">
    <!-- An enum with the name of the operation to perform in the event. Legal
         values RAW, MEAN, MIN, MAX, REGULAR, CUMULATIVE, REVERSE_CUMULATIVE.
         -->
    <name>RAW</name>

    <!-- Arguments for the operations, typically limits for arrays. -->
    <arguments index="0">
      <key>some_string</key>
      <value>some_string</value>
    </arguments>
  </operations>
</values>

<!-- The counters which should be reset between each collection interval,
and therefore be logged as values. -->
<counterresets index="0">
  <name>name_of_count_to_reset</name>
</counterresets>
</config>
Reconfiguring a component is done by adding the config tag with the name statistics to the appropriate component in services.xml. The following example shows how to change the measurement and logging intervals to get measurements each minute from all containers:
<!-- other config… -->
<container>
  <search>
    <config name="container.statistics">
      <collectionintervalsec>1</collectionintervalsec>
      <loggingintervalsec>1</loggingintervalsec>
    </config>
    <!-- other config… -->
  </search>
<!-- other config… -->
</container>
This is the complete content of statistics.cfg when the only desired change is getting the statistics events every minute. Obviously this will only affect components using this library.

Logging intervals

Data is dumped to log every n minutes. The first logging event is scheduled as soon as possible, but at m×n minutes after 1970-00-00 00:00, where m is an integer and leap seconds are ignored. The default value of n is 5, leading to data being logged 0, 5, 10, 15, 20, … minutes past each hour.

Reconfiguring logging intervals

At the time of writing, collectionintervalsec has to be set equal to loggingintervalsec, as there currently is no internal extra data aggregation. So, to make all containers log events every minute, the following example in services.xml should do the trick:

<!-- other config… -->
<container>
  <config name="container.statistics">
    <collectionintervalsec>1</collectionintervalsec>
    <loggingintervalsec>1</loggingintervalsec>
  </config>
   <!-- other config… -->
</container>
<!-- other config… -->

Deactivating event logging

The one exception to the rule of logging interval always being equal to the collection interval is when deactivating all events, this is done by setting loggingintervalsec to 0. So, to e.g. deactivate all event logging from docproc, something like the following would be appropriate:

<docproc version="3.0">
  <config name="container.statistics">
    <loggingintervalsec>0</loggingintervalsec>
  </config>
<!-- other config… -->
</docproc>