Vespa HTTP Benchmarking Tool - vespa-fbench

This is the HTML version of the man page for the command line tool vespa-fbench - a tool used for benchmarking the capacity of a Vespa system. Description:

  • Several hostnames and ports can be listed
  • The listed hosts are distributed to the clients in a round-robin manner


vespa-fbench [-n numClients] [-c cycleTime] [-l limit] [-i ignoreCount] [-s seconds] [-q queryFilePattern] [-o outputFilePattern] [-r restartLimit] [-m maxLineSize] [-p seconds] [-k] [-x] [-y] [-z] <hostname> <port>


Some options are described below; use vespa-fbench -h for a complete list:

-n numClients Run vespa-fbench with numClients clients in parallel. If not specified, vespa-fbench will use a default value of 10 clients.
-c cycleTime each client will make a request every <num> milliseconds [1000] ('-1' -> cycle time should be twice the response time)
-l limit minimum response size for successful requests [0]
-i ignoreCount do not log the first <num> results. -1 means no logging [0]
-s seconds run the test for <num> seconds. -1 means forever [60]
-q queryFilePattern pattern defining input query files ['query%03d.txt'] (the pattern is used with sprintf to generate filenames)
-o outputFilePattern save query results to output files with the given pattern (default is not saving.)
-r restartLimit number of times to re-use each query file. -1 means no limit [-1]
-m maxLineSize max line size in input query files [8192]. Cannot be less than the minimum [1024].
-p seconds print summary every <num> seconds. Only available when vespa-fbench is installed from the test branch.
-k enable HTTP keep-alive.
-x write benchmarkdata-reporting to output file.
-y write data on coverage to output file (must be used with -x).
-z use single query file to be distributed between clients.
-P use POST for requests instead of GET
-T CA certificate file to verify the peer against. Use this to benchmark an HTTPS-enabled port (e.g. -T /etc/ssl/certs/ca-bundle.crt)

Preparing query files

vespa-fbench uses query files, which are files where each line is a query following the pattern:


When using the -P (HTTP POST) option, vespa-fbench expects the following format:

{"yql" : "select * from sources * where default contains \"bad\" order by year desc;"} 
Any line starting with "/" is taken as a URL path, and the lines that follow it are taken as the content (these lines can NOT start with "/"). With a JSON body payload you also need to specify the content type with -H "Content-Type: application/json"
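The POST file layout described above can be generated with a short script. The following is a minimal sketch, not part of vespa-fbench itself: the function name format_post_entries is hypothetical, and the "/search/" URL path is an assumption that should be replaced with the path your application serves queries on.

```python
import json


def format_post_entries(yql_queries, url_path="/search/"):
    """Return query-file text: one URL path line followed by one JSON body line per query.

    url_path is an assumed default; adjust it to your own endpoint.
    """
    if not url_path.startswith("/"):
        raise ValueError("the URL path line must start with '/'")
    lines = []
    for yql in yql_queries:
        body = json.dumps({"yql": yql})
        # Content lines must not start with "/"; a JSON object always starts with "{".
        lines.append(url_path)
        lines.append(body)
    return "\n".join(lines) + "\n"


example = format_post_entries(['select * from sources * where default contains "bad"'])
print(example, end="")
```

The returned text can be written to a file matching the -q pattern, one file per client. Using json.dumps takes care of escaping the quotes inside the YQL string.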

Run vespa-fbench

It is recommended to run multiple tests, changing the value of the -n parameter between runs, to measure how much query load the system can sustain while still meeting the QPS and/or latency requirements. For example, start with 1 client, then 10, 20, etc.
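One way to organize such a series of runs is to generate the command lines up front. This is only a sketch under stated assumptions: the helper names are hypothetical, the host name myhost.example.com is a placeholder, and only options listed in this document are used. Each command still needs query files available for every client.

```python
import shlex


def build_fbench_command(num_clients, hostname, port, seconds=60,
                         cycle_time=0, query_pattern="query%03d.txt"):
    """Build one vespa-fbench argument list for a given client count."""
    return [
        "vespa-fbench",
        "-n", str(num_clients),
        "-q", query_pattern,
        "-s", str(seconds),
        "-c", str(cycle_time),
        hostname, str(port),
    ]


def sweep_commands(client_counts, hostname, port, **options):
    """Return one command per client count, in the order given."""
    return [build_fbench_command(n, hostname, port, **options) for n in client_counts]


# Placeholder host; replace with the container host under test.
for command in sweep_commands([1, 10, 20], "myhost.example.com", 8080, seconds=300):
    print(shlex.join(command))
```

The printed lines can be run one after another (for example with subprocess.run), keeping the summary of each run to compare QPS and latency as the client count grows.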

A typical vespa-fbench command looks like:

$ vespa-fbench -n 10 -q query%03d.txt -s 300 -c 0 -o output%03d.txt -xy <hostname> 8080

This creates 10 clients that run for 300 seconds (5 minutes). The -c parameter states that each client waits 0 milliseconds between requests. Each client uses the query file and output file given by the respective pattern and its client number, i.e. client 1 will use query file query001.txt and output file output001.txt.
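The filename expansion can be illustrated in a few lines. This sketch relies on Python's % operator, which follows the same printf-style rules as sprintf for a pattern such as %03d; the function name files_for_client is hypothetical.

```python
def files_for_client(client_number,
                     query_pattern="query%03d.txt",
                     output_pattern="output%03d.txt"):
    """Expand the -q and -o patterns for one client number, sprintf-style."""
    return query_pattern % client_number, output_pattern % client_number


print(files_for_client(1))
```

With the patterns from the command above, client 1 maps to query001.txt and output001.txt, matching the description.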

The options -xy make the vespa-fbench clients write benchmarking data to their output files.

It is possible to list several hostnames and ports. The hosts are distributed to the clients in a round-robin manner, such that, with two hosts and 40 clients, clients 0, 2, …, 38 make requests to the first host while clients 1, 3, …, 39 make requests to the second host.
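The round-robin assignment amounts to taking the client number modulo the number of hosts. A minimal sketch, with hypothetical host names and a hypothetical function name, assuming clients are numbered from 0 as in the example above:

```python
def assign_hosts(num_clients, hosts):
    """Return the (hostname, port) pair used by each client, round-robin."""
    if not hosts:
        raise ValueError("at least one host is required")
    return [hosts[client % len(hosts)] for client in range(num_clients)]


# Hypothetical hosts, 40 clients.
assignment = assign_hosts(40, [("host-a", 8080), ("host-b", 8080)])
print(assignment[0], assignment[1], assignment[38], assignment[39])
```

Even-numbered clients end up on the first host and odd-numbered clients on the second, so the load is split evenly when the client count is a multiple of the host count.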

Note that saving all responses to disk might impact the performance of the benchmarking itself. If only the summary is needed it is recommended to not use output files.

Using vespa-fbench results

After running vespa-fbench you will have a summary written to stdout and, if the -o option was used, an output file from each client.

Understanding benchmarking results

After a test run has completed, vespa-fbench outputs various test results. This section explains what each of these numbers means. Notes:

  • 'system utilization' provides no information about the Vespa system under test and should not be used for benchmarking
  • In some modes of operation, vespa-fbench waits before sending the next query
  • 'system utilization' represents the time that vespa-fbench is sending queries and waiting for responses. For example, a 'system utilization' of 50% means that vespa-fbench is stress testing the system 50% of the time, and is doing nothing the remaining 50% of the time
  • Do not run vespa-fbench on the same machine as the container or the search node, because it will impact the performance of the Vespa system
  • vespa-fbench latency results include network latency. Measure and subtract network latency to obtain the true Vespa query latency. See also the extended results for this information from Vespa
  • If many of the queries return zero results, the average latency will be low

Basic results

connection reuse count This value indicates how many times HTTP connections were reused to issue another request. Note that this number will only be displayed if the -k switch (enable HTTP keep-alive) is used.
clients Echo of the -n parameter.
cycle time Echo of the -c parameter.
lower response limit Echo of the -l parameter.
skipped requests Number of requests that were skipped by vespa-fbench. vespa-fbench will typically skip a request if the line containing the query URL exceeds a pre-defined limit. Skipped requests will have minimal impact on the statistical results.
failed requests The number of failed requests. A request will be marked as failed if an error occurred while reading the result or if the result contained fewer bytes than 'lower response limit'.
successful requests Number of successful requests. Each performed request is counted as either successful or failed. Skipped requests (see above) are not performed and therefore not counted.
cycles not held Number of cycles not held. The cycle time is specified with the -c parameter. It defines how often a client should perform a new request. However, a client may not perform another request before the result from the previous request has been obtained. Whenever a client is unable to initiate a new request 'on time' due to not being finished with the previous request, this value will be increased.
minimum response time The minimum response time. The response time is measured as the time period from just before the request is sent to the server, till the result is obtained from the server.
maximum response time The maximum response time. The response time is measured as the time period from just before the request is sent to the server, till the result is obtained from the server.
average response time The average response time. The response time is measured as the time period from just before the request is sent to the server, till the result is obtained from the server.
X percentile The X percentile of the response time samples; a value selected such that X percent of the response time samples are below this value. In order to calculate percentiles, a histogram of response times is maintained for each client at runtime and merged after the test run ends. If a percentile value exceeds the upper bound of this histogram, it will be approximated (and thus less accurate) and marked with '(approx)'.
actual query rate The average number of queries per second; QPS.
utilization The percentage of time used waiting for the server to complete (successful) requests. Note that if a request fails, the utilization will drop since the client has 'wasted' the time spent on the failed request.
zero hit queries The number of queries that gave zero hits in Vespa
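The percentile description above (per-client histograms, merged after the run, with an '(approx)' marker beyond the upper bound) can be sketched as follows. This is an illustration only, not the vespa-fbench implementation: the 1 ms bucket width, the upper bound, and the choice of reporting the maximum observed response time for approximated values are all assumptions.

```python
def new_histogram(upper_bound_ms):
    """One bucket per millisecond below the bound, plus an overflow counter."""
    return {"buckets": [0] * upper_bound_ms, "overflow": 0, "max_ms": 0.0}


def record(hist, response_ms):
    """Add one non-negative response time sample (in milliseconds)."""
    hist["max_ms"] = max(hist["max_ms"], response_ms)
    index = int(response_ms)
    if index < len(hist["buckets"]):
        hist["buckets"][index] += 1
    else:
        hist["overflow"] += 1


def merge(histograms):
    """Sum per-client histograms that share the same upper bound."""
    merged = new_histogram(len(histograms[0]["buckets"]))
    for hist in histograms:
        for i, count in enumerate(hist["buckets"]):
            merged["buckets"][i] += count
        merged["overflow"] += hist["overflow"]
        merged["max_ms"] = max(merged["max_ms"], hist["max_ms"])
    return merged


def percentile(hist, x):
    """Return (value_ms, is_approx) for the X percentile."""
    total = sum(hist["buckets"]) + hist["overflow"]
    if total == 0:
        raise ValueError("no samples recorded")
    target = total * x / 100.0
    seen = 0
    for i, count in enumerate(hist["buckets"]):
        seen += count
        if seen >= target:
            return float(i + 1), False  # upper edge of the bucket
    # The percentile lies beyond the histogram: fall back to a coarse estimate.
    return hist["max_ms"], True


client_a, client_b = new_histogram(10), new_histogram(10)
for sample in (1.2, 2.5, 3.1):
    record(client_a, sample)
for sample in (4.0, 50.0):
    record(client_b, sample)
merged = merge([client_a, client_b])
print(percentile(merged, 50), percentile(merged, 99))
```

Keeping a fixed-size histogram per client avoids storing every sample, at the cost of precision for values that fall outside the histogram range.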

Extended results

-x Activate benchmarkdata-reporting

These results are added to the output file if the -x switch (activate benchmarkdata-reporting) is used.

NumHits Number of hits returned
NumFastHits Number of actual document hits returned
TotalHitCount Total number of hits for query
QueryHits Hits as specified in query
QueryOffset Offset as specified in query
NumErrors Number of error hits returned
NumGroupHits Number of grouping hits returned
SearchTime Time used for searching. Entire query time for one phase search, first phase for two-phase search
AttributeFetchTime Time used for attribute fetching, or 0 for one phase search
FillTime Time used for summary fetching, or 0 for one phase search

-y Activate additional data on coverage

This uses the report coverage query feature. These results will be added to the output file if the -y switch is active.

DocsSearched Total number of documents in nodes searched
NodesSearched Total number of search nodes which were used
FullCoverage 1 if true, 0 if false