Profiling

Guidelines when profiling:

  • Define clearly what to profile.
  • Find a load that is representative of what to profile. This is often the hardest part, as stressing other components adds a lot of noise.
  • Make sure that no other bottleneck prevents stressing the profiled component. It makes little sense to do CPU profiling if the network is the limiting factor.
  • If possible, write dedicated benchmark programs, similar to unit tests, that stress exactly what to profile.
  • If the system is multithreaded:
    • Always profile single-threaded first; that gives a baseline for the scaling tests. When scaling up, verify that as many cores as expected are utilized.
    • Increase the thread count gradually to at least 2x the number of cores, or until throughput degrades.
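The scaling guideline above can be sketched as a small shell helper. This is a minimal sketch under stated assumptions, not a Vespa tool: `thread_counts` is a hypothetical function, doubling is just one way to "increase gradually", and `./my-benchmark --threads` in the usage comment is a placeholder for your own benchmark program.

```shell
# Print the thread counts to try in a scaling test: 1 (the single-threaded
# baseline), then doubling, and finally exactly 2x the number of cores.
# thread_counts is a hypothetical helper, not part of Vespa.
thread_counts() {
  cores=$1
  limit=$((cores * 2))
  n=1
  while [ "$n" -lt "$limit" ]; do
    echo "$n"
    n=$((n * 2))
  done
  # Always end at 2x cores, even when that is not a power of two.
  echo "$limit"
}

# Usage (placeholder benchmark command, replace with your own):
#   for t in $(thread_counts "$(nproc)"); do ./my-benchmark --threads "$t"; done
```

Compare the throughput of each run against the single-threaded baseline, and stop early if throughput starts to degrade.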

Also see using valgrind with Vespa.

CPU profiling

vmstat

vmstat can be used to figure out what kind of resources are used:

  • CPU usage split into user, system, idle, and I/O wait: system should be low (< 10%).
  • swap in/out: should be zero.

Note: A maxed-out system should have either maxed-out disks or maxed-out CPU (idle == 0). If neither is the case, there might be lock contention.

Example:

$ vmstat 1

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
0  0   5628 3315460 304024 23008616    0    0    14    34    0     0  0  0 99  0
1  0   5628 3298884 304024 23008640    0    0     0   396   33  4615  9  1 90  0
0  0   5628 3316336 304028 23008644    0    0     0     0   15  4469  4  1 95  0
0  0   5628 3316592 304028 23008644    0    0     0     0   24  4364  0  0 100  0
0  0   5628 3316592 304028 23008644    0    0     0  2948   20  4305  0  0 100  0
0  0   5628 3316468 304028 23008644    0    0     0     0   22  4259  0  0 100  0
0  0   5628 3316468 304028 23008644    0    0     0   180   20  4279  0  0 100  0
0  0   5628 3316468 304028 23008644    0    0     0     0   26  4349  0  0 100  0
16  0   5628 3284236 304056 23008688    0    0    12   188   17  9196 38  2 60  0
19  0   5628 3267020 304056 23008732    0    0     8   128   44  6408 99  1  0  0
16  0   5628 3245472 304060 23008840    0    0    20     0    9  7191 99  1  0  0
17  0   5628 3227784 304060 23008872    0    0    20     0   27  6420 99  1  0  0
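The rules of thumb above can be checked mechanically. The following is a minimal sketch, assuming the column layout shown in the example (si and so in columns 7 and 8, sy in column 14); `check_vmstat` is a hypothetical helper name, and the threshold of 10 is taken from the guideline above.

```shell
# Flag vmstat samples that break the rules of thumb:
# any swap activity (si/so != 0) or system CPU time of 10% or more.
# Assumes the column layout of the example: si=$7, so=$8, sy=$14.
check_vmstat() {
  # Only data rows start with a number; the two header rows are skipped.
  awk '$1 ~ /^[0-9]+$/ {
    if ($7 != 0 || $8 != 0) print "swap activity: si=" $7 " so=" $8
    if ($14 >= 10)          print "high system cpu: sy=" $14
  }'
}

# Usage: take 10 samples at 1-second intervals and check them.
#   vmstat 1 10 | check_vmstat
```

Note that the first sample vmstat prints reports averages since boot, so it may be flagged even when the system is currently healthy.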

top

Use top to see which applications consume CPU and memory.

CPU Profiling using perf

When debugging CPU usage or performance in a remote cluster, it can be useful to capture a performance profile snapshot. To use perf, install the vespa-debuginfo-<vespa-version> package matching the installed Vespa version, like:

$ rpm -q vespa-debuginfo
$ sudo yum install vespa-debuginfo-7.147.12

The pid of the vespa-proton-bin process can be obtained using vespa-sentinel-cmd, or top/ps.

Record a profile for 60 seconds:

$ sudo perf record -g --pid=<pid-of-proton-process> sleep 60

View a performance profile report:

$ sudo perf report

Sometimes it's useful to have kernel debug info installed to get symbol info for the Linux kernel:

$ sudo yum install kernel-debuginfo

It's important that the kernel-debuginfo version matches the version of the installed kernel package as closely as possible.
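One way to check the versions is to compare the running kernel release with the installed debuginfo package. This is a sketch under assumptions: it relies on the usual RPM naming, where `rpm -q kernel-debuginfo` prints `kernel-debuginfo-<release>` and `uname -r` prints `<release>`. It tests for an exact match, which is stricter than strictly necessary, and `debuginfo_matches` is a hypothetical helper name.

```shell
# Succeed only if the kernel-debuginfo package name, with its
# "kernel-debuginfo-" prefix removed, equals the running kernel release.
# debuginfo_matches is a hypothetical helper, not a standard tool.
debuginfo_matches() {
  kernel_release=$1
  debuginfo_pkg=$2
  [ "${debuginfo_pkg#kernel-debuginfo-}" = "$kernel_release" ]
}

# Usage on the host being profiled:
#   debuginfo_matches "$(uname -r)" "$(rpm -q kernel-debuginfo)" \
#     || echo "kernel-debuginfo does not match the running kernel"
```

The check also fails when the package is not installed, or when several versions are installed, since `rpm -q` then prints something other than a single matching package name.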

Container privileges

When debugging an unprivileged docker container, perf commands can be executed from inside a privileged container sharing pid space:

$ CONTAINER=host002-09
$ sudo docker run -ti --rm --privileged --pid container:$CONTAINER \
  --entrypoint bash $(sudo docker ps --filter name=$CONTAINER --format "{{.Image}}")

This starts a privileged container that shares the pid namespace, using the same Docker image as the container to debug. Run perf record ... inside this privileged container.