Benchmarking a Vespa application shows how well a given configuration performs, which makes it an essential part of sizing a search cluster. Benchmarking a cluster can answer the following questions:
- What throughput and latency can you expect from a search node?
- Which resource is the bottleneck in the system?
Before you start benchmarking, consider:
- What is your expected query mix? A representative query mix is essential for useful results. Splitting the mix into different query types also helps identify which queries are the heaviest.
- What is the expected SLA, both in terms of latency and throughput?
- How important is real-time behavior for you? What is the rate of incoming documents, if any?
Vespa provides a load generator tool, vespa-fbench, to run queries and generate statistics, much like a traditional web server load generator. vespa-fbench lets you run any number of clients (the more clients, the higher the load) for any length of time, and adjust how long each client waits before issuing another query. As output, vespa-fbench reports the throughput, the maximum, minimum, and average latency, and the 25th, 50th, 75th, 90th, 95th and 99th percentiles, giving a fairly accurate picture of how well the system handles the workload.
vespa-fbench uses query files, which are files where each line is a query
following this pattern:
A common way to make query files is to use the queries from production installations, or to generate the queries from the document feed or from expected queries. vespa-fbench runs each client in a separate thread, and to get a realistic query load, one should split the query files into one file per client. The vespa-fbench-split-file utility can assist in processing a query file for vespa-fbench. Having prepared the query files, one can move on to running the benchmark.
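As an illustration of the one-file-per-client layout, the sketch below deals the lines of a single query file round-robin into per-client files. It is not a replacement for vespa-fbench-split-file and does not claim to mirror its behavior. The function name, the out_dir and first_index parameters, and the assumption that file numbering starts at 0 are all hypothetical; adjust first_index if your setup numbers clients differently.

```python
from pathlib import Path


def split_query_file(source, clients, pattern="query%03d.txt",
                     out_dir=".", first_index=0):
    """Deal the lines of `source` round-robin into one query file per client.

    `pattern` is a printf-style file name pattern such as query%03d.txt.
    `first_index` is an assumption: the number used for the first file.
    """
    if clients < 1:
        raise ValueError("clients must be at least 1")

    with open(source, encoding="utf-8") as handle:
        # Keep one query per line and drop blank lines.
        queries = [line.rstrip("\n") for line in handle if line.strip()]

    # Query i goes to client i % clients, so each file gets a similar mix.
    buckets = [queries[i::clients] for i in range(clients)]

    paths = []
    for offset, bucket in enumerate(buckets):
        path = Path(out_dir) / (pattern % (first_index + offset))
        path.write_text("".join(q + "\n" for q in bucket), encoding="utf-8")
        paths.append(path)
    return paths
```

Dealing queries round-robin, rather than cutting the file into contiguous chunks, keeps each client's file close to the overall query mix even when the source file is sorted by query type.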
A typical vespa-fbench command looks like:
$ vespa-fbench -n 8 -q query%03d.txt -s 300 -c 0 myhost.mydomain.com 8080
This starts 8 clients, each reading queries from its own file: the file name is the prefix query followed by the zero-padded client number, as given by the query%03d.txt pattern. The -s parameter indicates that the benchmark will run for 300 seconds. The -c parameter states that each client should wait for 0 milliseconds between each query, which lets you model the pauses of interactive users. The last two parameters are the container hostname and port. Multiple hosts and ports may be provided, and the clients will be uniformly distributed to query the containers round robin.
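When benchmarks are scripted, the same command line can be assembled programmatically. The sketch below builds the argument list using only the options described above (-n, -q, -s, -c, then hostname and port). The function name is hypothetical, and the code assumes that additional containers are given as further hostname and port pairs at the end of the command line.

```python
def fbench_command(clients, query_pattern, seconds, cycle_ms, targets):
    """Build the vespa-fbench argument list for the options described above.

    `targets` is a list of (hostname, port) pairs. Appending several pairs
    is an assumption about how multiple containers are passed.
    """
    if not targets:
        raise ValueError("at least one (hostname, port) pair is required")

    args = [
        "vespa-fbench",
        "-n", str(clients),      # number of clients
        "-q", query_pattern,     # per-client query file pattern
        "-s", str(seconds),      # run time in seconds
        "-c", str(cycle_ms),     # wait between queries, in milliseconds
    ]
    for host, port in targets:
        args += [host, str(port)]
    return args


command = fbench_command(8, "query%03d.txt", 300, 0,
                         [("myhost.mydomain.com", 8080)])
print(" ".join(command))
```

The returned list can be handed to subprocess.run(command, check=True) on a machine where vespa-fbench is installed. Passing a list rather than a single string avoids shell quoting issues with the % in the file pattern.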
Run vespa-fbench with no arguments for more help.
Having completed the benchmark, vespa-fbench will provide you with a summary - example:
***************** Benchmark Summary *****************
clients:                      30
ran for:                    1800 seconds
cycle time:                    0 ms
lower response limit:          0 bytes
skipped requests:              0
failed requests:               0
successful requests:    12169514
cycles not held:        12169514
minimum response time:      0.82 ms
maximum response time:   3010.53 ms
average response time:      4.44 ms
25 percentile:              3.00 ms
50 percentile:              4.00 ms
75 percentile:              6.00 ms
90 percentile:              7.00 ms
95 percentile:              8.00 ms
99 percentile:             11.00 ms
actual query rate:       6753.90 Q/s
utilization:               99.93 %

The various aspects of latency and throughput are covered. It is also important to take note of the number of failed requests, as a high number here can indicate that the system is overloaded or that the queries are invalid.
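To check the failed request count automatically, the summary can be read into a dictionary. The sketch below assumes each metric appears on its own line as "name: value unit", as in the example summary. The function names and the regular expression are illustrative and not part of vespa-fbench.

```python
import re

# One metric per line: a name, a colon, then a number (the unit is ignored).
SUMMARY_LINE = re.compile(r"^\s*([A-Za-z0-9 ]+?):\s+([0-9]+(?:\.[0-9]+)?)")

# Excerpt of the example summary, used here as sample input.
SAMPLE = """\
***************** Benchmark Summary *****************
clients:                      30
ran for:                    1800 seconds
failed requests:               0
successful requests:    12169514
average response time:      4.44 ms
99 percentile:             11.00 ms
actual query rate:       6753.90 Q/s
"""


def parse_summary(text):
    """Map each 'name: value' line of a summary to a float."""
    values = {}
    for line in text.splitlines():
        match = SUMMARY_LINE.match(line)
        if match:
            values[match.group(1)] = float(match.group(2))
    return values


def failure_ratio(values):
    """Fraction of completed requests that failed (0.0 if nothing ran)."""
    total = values["failed requests"] + values["successful requests"]
    return values["failed requests"] / total if total else 0.0


summary = parse_summary(SAMPLE)
print(summary["99 percentile"], failure_ratio(summary))
```

A benchmark script could stop, or flag the run as suspect, when failure_ratio exceeds a threshold chosen for the application.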
There are also tools to format the vespa-fbench output into something more manageable for plotting. resultfilter.pl formats the above output into a space-separated format.