Container Tuning

A collection of configuration parameters to tune the Container as used in Vespa. Some configuration parameters have native services.xml support while others are configured through generic config overrides.

Container worker threads

The container uses multiple thread pools for its operations. Most components, including request handlers, use the container's default thread pool, which is controlled by a shared executor instance. Any component can use the default pool by injecting a java.util.concurrent.Executor instance. Some built-in components, such as the Jetty server and the search handler, have dedicated thread pools. These thread pools are injected through special wiring in the config model and are not easily accessible from other components.
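
As a minimal sketch of the injection pattern described above, the class below receives a java.util.concurrent.Executor through its constructor and submits work to it. The class name BackgroundRefresher and its methods are hypothetical. In a real component the container's dependency injection would supply the executor; here a same-thread executor stands in so the example runs on its own.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical component: only the Executor type comes from the text above.
final class BackgroundRefresher {

    private final Executor executor;
    private final AtomicInteger completed = new AtomicInteger();

    // In a container component, the executor argument would be injected.
    BackgroundRefresher(Executor executor) {
        this.executor = executor;
    }

    // Hands the work to whichever pool backs the injected executor.
    void refreshAsync() {
        executor.execute(completed::incrementAndGet);
    }

    int completedRefreshes() {
        return completed.get();
    }

    public static void main(String[] args) {
        // Runnable::run executes each task on the calling thread (stand-in only).
        BackgroundRefresher refresher = new BackgroundRefresher(Runnable::run);
        refresher.refreshAsync();
        System.out.println("completed refreshes: " + refresher.completedRefreshes());
    }
}
```

Depending only on the Executor interface keeps the component independent of how the pool behind it is sized or implemented.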

The thread pools are by default scaled on the system resources as reported by the JVM (Runtime.getRuntime().availableProcessors()). It's paramount that the -XX:ActiveProcessorCount/jvm_availableProcessors configuration is correct for the container to work optimally. The default thread pool configuration can be overridden through services.xml. We recommend you keep the default configuration as it's tuned to work across a variety of workloads. Note that the default configuration and pool usage may change between minor versions.
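
To check what the JVM actually reports, a small program such as the following sketch can be run with the same JVM flags as the container. The per-CPU multiplier in scaledThreads is an assumption made for illustration and is not the container's actual sizing formula.

```java
// Sketch: inspect the CPU count the JVM reports and derive a pool size from it.
final class CpuScaledPool {

    // Hypothetical sizing rule: a fixed number of threads per reported CPU.
    static int scaledThreads(double threadsPerCpu, int cpus) {
        return (int) Math.ceil(threadsPerCpu * cpus);
    }

    public static void main(String[] args) {
        // This value is influenced by -XX:ActiveProcessorCount when that flag is set.
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("CPUs reported by the JVM: " + cpus);
        System.out.println("Threads at 4 per CPU: " + scaledThreads(4.0, cpus));
    }
}
```

If the reported CPU count does not match the resources actually available to the process, every pool derived from it will be over- or under-sized.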

The container pre-starts the minimum number of worker threads, as given by the threads parameter, so even an idle container may report several hundred running threads. Note that tuning the capacity upwards increases the risk of high GC pressure, as concurrency rises with more in-flight requests. GC pressure is a function of the number of in-flight requests, the time it takes to complete each request, and the amount of garbage produced per request. Increasing the queue size allows the application to absorb short traffic bursts without rejecting requests, at the cost of higher average latency for the requests that are queued. Large queues also increase heap consumption in overload situations. When boost is specified, extra threads are created once the queue is full and are destroyed after an idle timeout. If all threads are occupied and no queue capacity remains, requests are rejected with a 503 response.
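
The queue, extra-thread, and rejection behavior described above resembles that of a standard java.util.concurrent.ThreadPoolExecutor with a bounded queue. The sketch below uses that class as an analogy and is not the container's own implementation. The parameter names are hypothetical, and unlike the container it does not pre-start its threads.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Analogy only: a plain ThreadPoolExecutor, not the container's thread pool.
final class PoolOverloadSketch {

    // Submits `tasks` tasks that all block until released; returns the number rejected.
    static int countRejections(int threads, int maxThreadsWithBoost, int queueSize, int tasks)
            throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, maxThreadsWithBoost,
                1, TimeUnit.SECONDS,                    // idle timeout for extra threads
                new ArrayBlockingQueue<>(queueSize));   // bounded request queue
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await();            // simulate a slow in-flight request
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++;                         // the container would answer 503 here
                }
            }
        } finally {
            release.countDown();
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
        return rejected;
    }

    public static void main(String[] args) throws InterruptedException {
        // 2 regular threads + 1 queued task + 1 extra thread accept 4 tasks in total.
        System.out.println("rejected: " + countRejections(2, 3, 1, 5));
    }
}
```

With these numbers, the first two tasks occupy the regular threads, the third waits in the queue, the fourth triggers an extra thread because the queue is full, and the fifth is rejected.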

The effective thread pool configuration and utilization statistics can be observed through the Container Metrics. See Thread Pool Metrics for a list of metrics exported.

Lower limit

The container will override any configuration if the effective value is below a fixed minimum. This is to reduce the risk of certain deadlock scenarios and improve concurrency for low-resource environments.
  • Minimum 8 threads.
  • Minimum 650 queue capacity (if queue is not disabled).
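
The lower limits listed above amount to a simple clamp, sketched below. The class and method names are hypothetical, and treating a queue size of 0 as "disabled" is an assumption made for this example.

```java
// Sketch of the lower-limit rule: configured values below the minimum are raised.
final class LowerLimit {

    static final int MIN_THREADS = 8;
    static final int MIN_QUEUE_CAPACITY = 650;

    static int effectiveThreads(int configuredThreads) {
        return Math.max(configuredThreads, MIN_THREADS);
    }

    // Assumption: 0 means the queue is disabled, in which case no minimum applies.
    static int effectiveQueueCapacity(int configuredCapacity) {
        return configuredCapacity == 0 ? 0 : Math.max(configuredCapacity, MIN_QUEUE_CAPACITY);
    }

    public static void main(String[] args) {
        System.out.println("threads: " + effectiveThreads(4));
        System.out.println("queue:   " + effectiveQueueCapacity(200));
    }
}
```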


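<container id="container" version="1.0">

    <search>
        <!-- Search handler thread pool -->
        <threadpool>
            <threads boost="12">4</threads>
        </threadpool>
    </search>

    <!-- Default thread pool -->
    <config name="container.handler.threadpool">
        <!-- parameter overrides go here -->
    </config>

</container>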

JVM heap size

Change the default JVM heap size settings used by Vespa to better suit the specific hardware settings or application requirements.

Setting the JVM heap size as a percentage of available memory means the exact heap size is not known in advance, but the configuration adapts to the host and ensures that the container can start even in environments with less available memory. The example below allocates 50% of available memory on the machine to the JVM heap:

<container id="container" version="1.0">
    <nodes>
        <jvm allocated-memory="50%" />
        <node hostalias="node0" />
    </nodes>
</container>
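
To make the percentage concrete, the sketch below computes the heap size that a given percentage yields for a given amount of available memory, and prints the maximum heap of the JVM it runs in. The helper name heapBytes is hypothetical, and how the container measures "available memory" is not covered here.

```java
// Sketch: what a percentage-based heap setting works out to in bytes.
final class HeapShare {

    static long heapBytes(long availableBytes, int percent) {
        if (percent <= 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be in 1..100");
        }
        return availableBytes * percent / 100;
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        System.out.println("50% of 16 GiB = " + heapBytes(16 * gib, 50) / gib + " GiB");
        System.out.println("50% of  4 GiB = " + heapBytes(4 * gib, 50) / gib + " GiB");
        // The heap limit of the JVM running this program, for comparison.
        System.out.println("this JVM max heap bytes: " + Runtime.getRuntime().maxMemory());
    }
}
```

The same setting therefore gives very different heap sizes on different hosts, which is the trade-off the text describes.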

JVM Tuning

Use gc-options to control GC-related parameters and options to tune other JVM parameters. See the reference documentation. Example: running with a 4 GB heap, the G1 garbage collector and a maximum tenuring threshold of 15, with verbose GC logging enabled (written to stdout, which ends up in the vespa.log file).

<container id="default" version="1.0">
    <nodes>
        <jvm options="-Xms4g -Xmx4g -XX:+PrintCommandLineFlags -XX:+PrintGC"
             gc-options="-XX:+UseG1GC -XX:MaxTenuringThreshold=15" />
        <node hostalias="node0" />
    </nodes>
</container>

The default heap size in the Docker image is 1.5g, which can be on the low side for high-throughput applications and cause frequent garbage collection. By default, the G1GC collector is used.
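
One way to verify that JVM settings took effect is to ask the running JVM through the standard java.lang.management API. The sketch below is a standalone program and not part of Vespa. The class name JvmSettingsReport is hypothetical, and the collector names printed depend on the JVM and its flags.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Sketch: report the heap limit, JVM arguments, and active garbage collectors.
final class JvmSettingsReport {

    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("max heap bytes: " + maxHeapBytes());
        System.out.println("JVM arguments:  " + jvmArguments());
        System.out.println("collectors:     " + collectorNames());
    }
}
```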

Config Server and Config Proxy

The config server and config proxy are not executed based on the model in services.xml. On the contrary, they are used to bootstrap the services in that model. Consequently, one must use configuration variables to set the JVM parameters for the config server and config proxy. Both must be restarted after a change (for the config proxy, restart services), but one does not need to run vespa prepare or vespa activate first.


Refer to Setting Vespa variables.

Container warmup

Some applications observe that the first queries made to a freshly started container take a long time to complete. This is typically due to some components performing lazy setup of data structures or connections. Lazy initialization should be avoided in favor of eager initialization in component constructor, but this is not always possible.
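
As an illustration of eager initialization, the sketch below builds its lookup structure in the constructor, so the cost is paid at startup and not by the first request. The class EagerDictionary is hypothetical and stands in for any component with expensive setup.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: all setup work happens in the constructor; lookups never trigger initialization.
final class EagerDictionary {

    private final Map<String, Integer> termIds;

    EagerDictionary(String... terms) {
        Map<String, Integer> ids = new HashMap<>();
        for (String term : terms) {
            ids.putIfAbsent(term, ids.size()); // assign ids in first-seen order
        }
        this.termIds = Map.copyOf(ids);        // immutable, safe to share across threads
    }

    // Returns the id of a known term, or -1 for an unknown one.
    int idOf(String term) {
        return termIds.getOrDefault(term, -1);
    }

    public static void main(String[] args) {
        EagerDictionary dictionary = new EagerDictionary("alpha", "beta", "alpha");
        System.out.println("id of beta: " + dictionary.idOf("beta"));
    }
}
```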

A way to avoid problems with the first queries in such cases is to perform warmup queries at startup. This is done by issuing queries from the constructor of the Handler of regular queries. If using the default handler, com.yahoo.search.handler.SearchHandler, subclass this and configure your subclass as the handler of query requests in services.xml.

Add a call to a warmupQueries() method as the last line of your handler constructor. The method can look something like this:

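private void warmupQueries() {
    // Replace these placeholders with request URIs representative of real traffic.
    String[] requestUris = new String[] {"warmupRequestUri1", "warmupRequestUri2"};
    int warmupIterations = 50;

    // Run every warmup request repeatedly before external traffic is accepted.
    for (int i = 0; i < warmupIterations; i++) {
        for (String requestUri : requestUris) {
            handle(HttpRequest.createTestRequest(requestUri, com.yahoo.jdisc.http.HttpRequest.Method.GET));
        }
    }
}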

Since these queries will be executed before the container starts accepting external queries, they will cause the first external queries to observe a warmed up container instance.

Set metrics.ignore in the warmup queries to exclude them from the reported metrics.