Vespa Benchmarking

Benchmarking a Vespa application shows how well a given configuration performs, which makes it an essential part of sizing a search cluster. Benchmarking a cluster can answer the following questions:

  • What throughput and latency can you expect from a search node?
  • Which resource is the bottleneck in the system?
These in turn indirectly answer other questions, such as how many nodes are needed and whether upgrading disk or CPU will help. Benchmarking therefore helps you find a Vespa configuration that uses all resources well, which in turn lowers your costs.

But when should you benchmark? A good rule is to benchmark whenever your workload changes. Benchmark before you set up your system initially, and benchmark again when adding new features to your queries.

Before you start benchmarking, consider:

  • What is your expected query mix? A representative query mix is essential for getting useful results. Splitting the mix into different query types is also a useful way to find out which queries are the heaviest.
  • What is the expected SLA, both in terms of latency and throughput?
  • How important is real-time behavior for you? What is the rate of incoming documents, if any?
Understanding the query mix and the SLA will help you set the parameters of the benchmarking tools.

Benchmarking tools

Vespa provides a load generator tool, vespa-fbench, to run queries and generate statistics, much like a traditional web server load generator. vespa-fbench lets you run any number of clients (the more clients, the higher the load) for any length of time, and adjust how long each client waits before issuing its next query. As output, vespa-fbench reports the throughput, the maximum, minimum and average latency, and the 25th, 50th, 75th, 90th, 95th and 99th latency percentiles, giving you a fairly accurate picture of how well the system handles the workload.

Preparing queries

vespa-fbench uses query files, which are files where each line is a query following this pattern: /search/?query=foo&parameter=blabla
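A query file in this format can be generated from a list of user queries. The following is a minimal Python sketch, not part of Vespa: the function names make_query_line and write_query_file are hypothetical, and the only thing taken from the text above is the /search/?query=... line pattern. It URL-encodes each query so that spaces and special characters do not break the line.

```python
from urllib.parse import urlencode


def make_query_line(user_query, **params):
    """Build one query-file line: the /search/ path plus an encoded query string.

    `user_query` goes into the `query` parameter; any extra keyword
    arguments are appended as additional request parameters.
    """
    fields = {"query": user_query, **params}
    return "/search/?" + urlencode(fields)


def write_query_file(path, user_queries, **params):
    """Write one query line per user query to `path` (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as out:
        for user_query in user_queries:
            out.write(make_query_line(user_query, **params) + "\n")


if __name__ == "__main__":
    # Prints the example line from the text above.
    print(make_query_line("foo", parameter="blabla"))
```

Which extra request parameters make sense depends on your application; the sketch only handles the encoding.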

A common way to make query files is to use the queries from production installations, or to generate queries from the document feed or from expected queries. vespa-fbench runs each client in a separate thread, so to get a realistic query load, split the queries into one file per client. The vespa-fbench-split-file utility can assist in processing a query file for vespa-fbench. With the query files prepared, you can move on to running the benchmark.
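To illustrate what "one file per client" means, here is a small Python sketch that distributes the lines of a single query file across per-client files. It is not a reimplementation of vespa-fbench-split-file, whose exact behavior is not described here. The function name split_query_file, the round-robin distribution, and the default query%03d.txt naming (chosen to match the command further down) are all assumptions made for the example.

```python
from pathlib import Path


def split_query_file(source, num_clients, pattern="query%03d.txt", out_dir="."):
    """Split `source` into `num_clients` files, one per client.

    Lines are dealt out round robin so every client gets a similar mix.
    Output files are named by applying the zero-based client number to
    `pattern`, e.g. query000.txt, query001.txt, ...
    """
    if num_clients < 1:
        raise ValueError("num_clients must be at least 1")

    # Read all queries, skipping blank lines.
    text = Path(source).read_text(encoding="utf-8")
    queries = [line for line in text.splitlines() if line.strip()]

    buckets = [[] for _ in range(num_clients)]
    for index, query in enumerate(queries):
        buckets[index % num_clients].append(query)

    paths = []
    for client, bucket in enumerate(buckets):
        path = Path(out_dir) / (pattern % client)
        path.write_text("".join(q + "\n" for q in bucket), encoding="utf-8")
        paths.append(path)
    return paths
```

Round robin keeps the per-client mixes similar when the source file is ordered by query type; splitting into contiguous chunks would instead give each client a different slice of the file.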


A typical vespa-fbench command looks like:

$ vespa-fbench -n 8 -q query%03d.txt -s 300 -c 0 <hostname> 8080
This starts 8 clients, each reading queries from a file whose name is query followed by the zero-padded client number, so the first client uses query000.txt, the second query001.txt, and so on. The -s parameter makes the benchmark run for 300 seconds. The -c parameter makes each client wait 0 milliseconds between queries; raising this value lets you simulate users who pause between queries. The last two parameters are the container hostname and port. Multiple hosts and ports may be given, in which case the clients are distributed evenly across the containers in round-robin fashion.

Run vespa-fbench with no arguments for more help.

Post Processing

When the benchmark completes, vespa-fbench prints a summary, for example:

***************** Benchmark Summary *****************
clients:                      30
ran for:                    1800 seconds
cycle time:                    0 ms
lower response limit:          0 bytes
skipped requests:              0
failed requests:               0
successful requests:    12169514
cycles not held:        12169514
minimum response time:      0.82 ms
maximum response time:   3010.53 ms
average response time:      4.44 ms
25 percentile:              3.00 ms
50 percentile:              4.00 ms
75 percentile:              6.00 ms
90 percentile:              7.00 ms
95 percentile:              8.00 ms
99 percentile:             11.00 ms
actual query rate:       6753.90 Q/s
utilization:               99.93 %
The summary covers the main aspects of latency and throughput. Also take note of the number of failed requests, as a high number can indicate that the system is overloaded or that the queries are invalid.
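As a sanity check, the reported query rate can be compared with what the client count and average latency imply. With a cycle time of 0, each client has roughly one request in flight at all times, so throughput should be close to the number of clients divided by the average response time (Little's law). The Python sketch below applies this to the numbers in the summary above; the function name expected_query_rate is hypothetical, and the result is only an approximation because the reported average is rounded.

```python
def expected_query_rate(clients, avg_response_ms):
    """Estimate queries per second for closed-loop clients with no wait time.

    Each client completes one query every `avg_response_ms` milliseconds,
    so the total rate is clients / average response time in seconds.
    """
    if avg_response_ms <= 0:
        raise ValueError("avg_response_ms must be positive")
    return clients / (avg_response_ms / 1000.0)


if __name__ == "__main__":
    # Values taken from the example summary: 30 clients, 4.44 ms average.
    estimate = expected_query_rate(30, 4.44)
    reported = 6753.90
    print(f"estimated {estimate:.0f} Q/s, reported {reported:.0f} Q/s")
```

For the example summary the estimate is roughly 6757 Q/s, within a fraction of a percent of the reported 6753.90 Q/s. A large gap between the two would suggest that the clients themselves, rather than the Vespa nodes, are limiting the load.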

There are also tools that convert the vespa-fbench output into something more manageable for plotting, reformatting the summary above into a space-separated format.
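To show what such a conversion involves, here is a minimal Python sketch that parses a summary like the one above and emits selected values as one space-separated line. It is an illustration only, not the tool shipped with Vespa: the function names parse_summary and to_space_separated, the choice of columns, and the two-decimal formatting are assumptions.

```python
import re

# Matches lines such as "average response time:      4.44 ms":
# a label, a colon, whitespace, then a number (any unit is ignored).
_METRIC_LINE = re.compile(r"^\s*(?P<name>[^:]+):\s+(?P<value>-?\d+(?:\.\d+)?)\b")


def parse_summary(text):
    """Return a dict mapping each summary label to its numeric value."""
    metrics = {}
    for line in text.splitlines():
        match = _METRIC_LINE.match(line)
        if match:
            metrics[match.group("name").strip()] = float(match.group("value"))
    return metrics


def to_space_separated(metrics, columns):
    """Format the chosen metrics, in the given order, as one plot-friendly line."""
    return " ".join(f"{metrics[column]:.2f}" for column in columns)
```

Running each benchmark's summary through parse_summary and writing one to_space_separated line per run, for example with the columns "clients", "actual query rate" and "99 percentile", gives a file that most plotting tools can read directly.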