Querying Vespa

This is an overview of the components and data flow for Vespa queries.

Queries are sent to Vespa using the Search API. Query strings are written in YQL.
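As an illustration, a Search API request can be assembled as an HTTP GET URL that carries the YQL string in the yql parameter. The sketch below only builds the URL and sends nothing. The host, the port 8080, the hits value and the field set name "default" are assumptions for the example, and build_search_url is a hypothetical helper, not part of Vespa.

```python
from urllib.parse import urlencode


def build_search_url(yql: str, host: str = "localhost", port: int = 8080,
                     hits: int = 10) -> str:
    """Return a Search API URL with the YQL string URL-encoded in `yql`.

    host, port and hits are placeholder values for this sketch.
    """
    params = urlencode({"yql": yql, "hits": hits})
    return f"http://{host}:{port}/search/?{params}"


# Assumed example query: match "vespa" in a field set named "default".
url = build_search_url('select * from sources * where default contains "vespa";')
print(url)
```

The resulting URL can be passed to any HTTP client; the search container answers with the result set.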

Query flow

  1. A query is sent from a front-end application to a Search Container node, implemented using the Vespa container. To find search endpoints, run vespa-model-inspect service container.
  2. Query processing, such as stemming and geo-tagging, is done in search chains.
  3. The query is sent from the container to all content clusters in the system (see federation). Content clusters have top-level dispatchers (TLDs), which dispatch queries to proton search nodes. For better caching, queries have affinity to TLDs.

At this point the query enters one or more content clusters:

  1. The query arrives at each content node's mid-level dispatcher (MLD; the binary is the same as for the TLD). Grouped clusters are an exception: there, only a subset of the content nodes is queried. The MLD forwards the query to vespa-proton.
  2. vespa-proton searches its index and returns its results to the MLD.
  3. When all MLDs have responded to the TLD, the TLD merges their partial results and responds to the search container.
  4. The search container processes the results in search chains, requerying data if needed, before returning them to the querying entity.
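The merge in step 3 can be pictured as a scatter-gather step: each node returns a partial hit list already sorted by relevance, and the dispatcher keeps the best hits overall. The following is a minimal sketch of that idea, not Vespa's actual implementation. The (relevance, document id) hit representation and the function name merge_partial_results are assumptions made for the example.

```python
import heapq
from itertools import islice
from typing import Iterable, List, Tuple

Hit = Tuple[float, str]  # (relevance score, document id), assumed representation


def merge_partial_results(partials: Iterable[List[Hit]], hits: int) -> List[Hit]:
    """Merge per-node hit lists and keep the `hits` best entries.

    Each input list must already be sorted by descending relevance.
    """
    merged = heapq.merge(*partials, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, hits))


# Partial results from two hypothetical search nodes.
node_a = [(0.9, "doc1"), (0.4, "doc7")]
node_b = [(0.8, "doc3"), (0.5, "doc2")]
top = merge_partial_results([node_a, node_b], hits=3)
print(top)
```

Because every partial list is pre-sorted, the merge reads only as many entries as it needs to fill the requested number of hits.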


The primary task of the vespa-dispatch service is to dispatch queries so that they cover the entire document base while keeping the maximum load on the relevant search nodes as low as possible. The TLD knows which queries are being processed and on which nodes, and it balances load by dispatching to the node (MLD) with the lowest number of running queries.
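The selection rule described here, picking the node with the fewest running queries, can be sketched as follows. This is an illustration under stated assumptions rather than the vespa-dispatch code: the Dispatcher class, its method names, the node names and the tie-break by node name are all hypothetical.

```python
from typing import Dict, Iterable


class Dispatcher:
    """Tracks running queries per node and dispatches to the least loaded one."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self.running: Dict[str, int] = {node: 0 for node in nodes}

    def dispatch(self) -> str:
        # Lowest number of running queries wins; ties are broken by node name
        # (the tie-break is an assumption made for determinism in this sketch).
        node = min(self.running, key=lambda n: (self.running[n], n))
        self.running[node] += 1
        return node

    def complete(self, node: str) -> None:
        # Called when a node has answered, freeing one slot.
        self.running[node] -= 1


dispatcher = Dispatcher(["node0", "node1", "node2"])
first_round = [dispatcher.dispatch() for _ in range(3)]
dispatcher.complete("node1")
next_node = dispatcher.dispatch()
print(first_round, next_node)
```

After node1 completes its query it has the fewest running queries, so the next query goes there.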


The two dispatch modes can be distinguished in the log files by their entries. The MLD writes:

(...) searchnode.dispatch0.dispatch vespa-dispatch (...)

whereas the TLD writes:

topleveldispatch vespa-dispatch (...)
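
When scanning logs, the two markers shown above are enough to label an entry. The sketch below is a hypothetical helper that relies only on those substrings. It assumes the MLD marker always begins with "searchnode.dispatch" (the index after it may differ per node) and it makes no assumption about the rest of the log line format.

```python
from typing import Optional


def dispatch_mode(log_line: str) -> Optional[str]:
    """Label a vespa-dispatch log entry as "TLD" or "MLD", or None if neither."""
    if "vespa-dispatch" not in log_line:
        return None
    if "topleveldispatch" in log_line:
        return "TLD"
    if "searchnode.dispatch" in log_line:  # assumed common prefix of the MLD marker
        return "MLD"
    return None


# The sample lines reuse the elided entries from the text, not real log output.
mld_line = "(...) searchnode.dispatch0.dispatch vespa-dispatch (...)"
tld_line = "topleveldispatch vespa-dispatch (...)"
print(dispatch_mode(mld_line), dispatch_mode(tld_line))
```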