A collection of configuration parameters to tune the Container as used in Vespa. Some configuration parameters have native services.xml support while others are configured through generic config overrides.
The container uses multiple thread pools for its operations.
Most components, including request handlers, use the container's default thread pool,
which is controlled by a shared executor instance.
Any component can utilize the default pool by injecting a java.util.concurrent.Executor
instance.
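As a minimal sketch, a component could receive the shared executor through constructor injection like this (MyComponent and runInBackground are hypothetical names, not Vespa classes):

    import java.util.concurrent.Executor;
    import com.yahoo.component.AbstractComponent;

    // Hypothetical component illustrating injection of the container's default thread pool.
    public class MyComponent extends AbstractComponent {

        private final Executor executor;

        public MyComponent(Executor executor) {
            this.executor = executor;  // shared executor backing the default thread pool
        }

        public void runInBackground(Runnable task) {
            executor.execute(task);    // the task runs on the container's default thread pool
        }
    }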
Some built-in components, such as the Jetty server and the search handler, have dedicated thread pools.
These thread pools are injected through special wiring in the config model and
are not easily accessible from other components.
The thread pools are by default scaled on the system resources as reported by the JVM
(Runtime.getRuntime().availableProcessors()).
It is paramount that the -XX:ActiveProcessorCount / jvm_availableProcessors
configuration is correct for the container to work optimally.
The default thread pool configuration can be overridden through services.xml.
We recommend keeping the default configuration, as it is tuned to work across a variety of workloads.
Note that the default configuration and pool usage may change between minor versions.
The container will pre-start the minimum number of worker threads,
so even an idle container may report running several hundred threads.
The thread pool is pre-started with the number of threads specified in the threads parameter.
Note that tuning the capacity upwards increases the risk of high GC pressure
as concurrency becomes higher with more in-flight requests.
The GC pressure is a function of the number of in-flight requests, the time it takes to complete a request,
and the amount of garbage produced per request.
Increasing the queue size allows the application to handle shorter traffic bursts without rejecting requests,
although it increases the average latency for the requests that are queued up.
Large queues will also increase heap consumption in overload situations.
Extra threads will be created once the queue is full (when boost is specified), and are destroyed after an idle timeout.
If all threads are occupied, requests are rejected with a 503 response.
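As an illustration, a services.xml override for the search handler thread pool could look something like the sketch below, using the threads, boost and queue parameters discussed above. The exact element names and their placement should be verified against the services.xml reference for the Vespa version in use:

    <container version="1.0">
        <search>
            <threadpool>
                <!-- 8 pre-started threads, up to 32 with boost, queue of 100 tasks -->
                <threads boost="32">8</threads>
                <queue>100</queue>
            </threadpool>
        </search>
    </container>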
The effective thread pool configuration and utilization statistics can be observed through the Container Metrics. See Thread Pool Metrics for a list of metrics exported.
If the queue is disabled (queue size 0), the jdisc.thread_pool.work_queue.size metric
will instead switch to measure how many threads are active.
Change the default JVM heap size settings used by Vespa to better suit the specific hardware settings or application requirements.
By setting the relative size of the total JVM heap as a percentage of available memory, one does not know exactly what the heap size will be, but the configuration becomes adaptable and ensures that the container can start even in environments with less available memory. The example below allocates 50% of the available memory on the machine to the JVM heap:
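A sketch of such a configuration, assuming the allocated-memory attribute on the jvm element under nodes in services.xml (hostalias is a placeholder):

    <container version="1.0">
        <nodes>
            <jvm allocated-memory="50%"/>
            <node hostalias="node1"/>
        </nodes>
    </container>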
Use gc-options for controlling GC-related parameters and options for tuning other parameters. See the reference documentation. Example: running with a 4 GB heap using the G1 garbage collector, with NewRatio = 1 (equal size of old and new generation) and verbose GC logging enabled (logged to stdout, which ends up in the vespa.log file).
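A sketch of that example, assuming the options and gc-options attributes on the jvm element (hostalias is a placeholder):

    <container version="1.0">
        <nodes>
            <jvm options="-Xms4g -Xmx4g -verbose:gc"
                 gc-options="-XX:+UseG1GC -XX:NewRatio=1"/>
            <node hostalias="node1"/>
        </nodes>
    </container>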
The default heap size with the Docker image is 1.5 GB, which can be on the low side for high-throughput applications, causing frequent garbage collection. By default, the G1GC collector is used.
The config server and config proxy are not executed based on the model in services.xml; rather, they are used to bootstrap the services in that model. Consequently, one must use configuration variables to set the JVM parameters for the config server and config proxy. They also need to be restarted after a change (for the config proxy, by restarting services), but one does not need to run vespa prepare or vespa activate first. Example:
VESPA_CONFIGSERVER_JVMARGS   -Xlog:gc
VESPA_CONFIGPROXY_JVMARGS    -Xlog:gc -Xmx256m
Refer to Setting Vespa variables.
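As a sketch, assuming the default-env.txt mechanism described in Setting Vespa variables, the variables above could be set in ${VESPA_HOME}/conf/vespa/default-env.txt; the file location and the override action should be verified in that reference:

    override VESPA_CONFIGSERVER_JVMARGS -Xlog:gc
    override VESPA_CONFIGPROXY_JVMARGS  -Xlog:gc -Xmx256m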
Some applications observe that the first queries made to a freshly started container take a long time to complete. This is typically due to some components performing lazy setup of data structures or connections. Lazy initialization should be avoided in favor of eager initialization in the component constructor, but this is not always possible.
A way to avoid problems with the first queries in such cases is to perform warmup queries at startup. This is done by issuing queries from the constructor of the handler that serves regular queries. If using the default handler, com.yahoo.search.handler.SearchHandler, subclass it and configure your subclass as the handler of query requests in services.xml.
Add a call to a warmupQueries() method as the last line of your handler constructor. The method can look something like this:
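A minimal sketch, assuming the handler subclasses com.yahoo.search.handler.SearchHandler; the warmup URI below is a placeholder to replace with queries representative of the application:

    private void warmupQueries() {
        // Placeholder warmup URIs - replace with queries representative of production traffic.
        // metrics.ignore keeps these queries out of the reported metrics.
        String[] requestUris = new String[] {
                "/search/?query=warmup&metrics.ignore=true"
        };
        int warmupIterations = 50;
        for (int i = 0; i < warmupIterations; i++) {
            for (String requestUri : requestUris) {
                handle(com.yahoo.container.jdisc.HttpRequest.createTestRequest(
                        requestUri, com.yahoo.jdisc.http.HttpRequest.Method.GET));
            }
        }
    }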
Since these queries will be executed before the container starts accepting external queries, they will cause the first external queries to observe a warmed up container instance.
Use metrics.ignore in the warmup queries to exclude them from being reported in the metrics.