Querying Vespa

Queries are written in YQL and sent to the Vespa Search API. The following diagram illustrates the components and data flow for typical Vespa queries.

Query Flow

  1. A query is sent from a front-end application to a Search Container node, implemented using the Vespa container. You can find specific search endpoints by running vespa-model-inspect service container.
  2. Query processing, such as stemming and geo-tagging, is done in search chains.
  3. The query is sent from the container to all content clusters in the system (see federation for more details). Content clusters have top-level dispatchers (TLDs), which dispatch queries to proton search nodes. For better caching, queries have affinity to TLDs.
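
Step 1 above, seen from the front-end application, can be sketched as follows. This is a minimal illustration, not a definitive client: the host and port (localhost:8080) and the YQL statement are assumptions made for the example, and build_search_url is a hypothetical helper. It only builds the request URL for the /search/ endpoint, with the query passed in the yql parameter, and does not send it.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, yql: str, hits: int = 10) -> str:
    """Build a GET URL for the search endpoint, URL-encoding the YQL statement."""
    params = {"yql": yql, "hits": hits}
    return f"{base_url.rstrip('/')}/search/?{urlencode(params)}"


# Assumed endpoint and an illustrative query; adjust both to your application.
url = build_search_url(
    "http://localhost:8080",
    'select * from sources * where default contains "vespa";',
)
print(url)
```

The resulting URL can be fetched with any HTTP client. Use the endpoint reported by vespa-model-inspect in place of the assumed one.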

Once dispatched, the query is processed inside each content cluster it was sent to:

  1. The query arrives at each content node's mid-level dispatcher (MLD), which is the same software as the TLD, and the MLD forwards it to vespa-proton. Grouped clusters are an exception: there, only a subset of the content nodes is queried.
  2. vespa-proton searches its index and returns its results to the MLD.
  3. When all MLDs have responded to the TLD, the TLD merges the partial results and responds to the search container.
  4. The search container processes the results in search chains, making further queries if needed, before returning them to the querying entity.
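
The merge in step 3 can be pictured with a short sketch. It assumes, for illustration only, that each MLD returns a list of (relevance, document id) pairs already sorted by descending relevance; the Hit shape and the function name are hypothetical, and this is not the dispatcher's actual implementation.

```python
import heapq
from itertools import islice
from typing import Iterable, List, Tuple

Hit = Tuple[float, str]  # (relevance score, document id); assumed shape


def merge_partial_results(partials: Iterable[List[Hit]], hits: int) -> List[Hit]:
    """Merge per-node hit lists (each sorted by descending relevance), keeping the best `hits`."""
    merged = heapq.merge(*partials, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, hits))


node_a = [(0.9, "doc-3"), (0.4, "doc-7")]
node_b = [(0.8, "doc-1"), (0.5, "doc-9"), (0.1, "doc-2")]
top = merge_partial_results([node_a, node_b], hits=3)
print(top)  # [(0.9, 'doc-3'), (0.8, 'doc-1'), (0.5, 'doc-9')]
```

Because every partial list is already sorted, the merge only needs to compare the current head of each list, so the work grows with the number of hits returned rather than with the total number of hits found.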


The primary task of the vespa-dispatch service is to dispatch queries so that they cover the entire document base with the lowest possible maximum load on the relevant search nodes. The TLD knows which queries are being processed at each node, and balances the load by dispatching each query to the MLD with the fewest running queries.
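
The balancing rule described above, picking the MLD with the fewest running queries, can be sketched as follows. The class, its methods, and the node names are hypothetical and only illustrate the selection rule among a set of candidate MLDs; they do not reflect how vespa-dispatch is actually written.

```python
from typing import Dict, Iterable


class LeastLoadedDispatcher:
    """Track running queries per node and pick the least-loaded node for each new query."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self.running: Dict[str, int] = {node: 0 for node in nodes}

    def dispatch(self) -> str:
        # min() returns the first node with the smallest count, so ties go to the earliest node.
        node = min(self.running, key=self.running.get)
        self.running[node] += 1
        return node

    def complete(self, node: str) -> None:
        # Call when a node has returned its results for a query.
        self.running[node] -= 1


dispatcher = LeastLoadedDispatcher(["mld-0", "mld-1"])  # hypothetical node names
first = dispatcher.dispatch()
second = dispatcher.dispatch()
dispatcher.complete(second)
third = dispatcher.dispatch()
print(first, second, third)  # mld-0 mld-1 mld-1
```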


The two dispatch modes can be told apart in the log files by their entries. The MLD writes:

(...) searchnode.dispatch0.dispatch vespa-dispatch (...)

whereas the TLD writes:

topleveldispatch vespa-dispatch (...)
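
When scanning logs, the two entry forms above can be separated with a small filter. This is a sketch under assumptions: it matches only the fragments quoted above, assumes the digit in dispatch0 may differ between nodes, and treats everything else as unrelated. The function name is hypothetical, and the sample lines are the elided fragments from this page, not complete log lines.

```python
import re

# Assumed pattern: the index after "dispatch" may vary between nodes.
MLD_PATTERN = re.compile(r"searchnode\.dispatch\d+\.dispatch")


def dispatch_mode(log_line: str) -> str:
    """Classify a log line as written by a TLD, an MLD, or something else."""
    if "topleveldispatch" in log_line:
        return "TLD"
    if MLD_PATTERN.search(log_line):
        return "MLD"
    return "other"


mld_line = "(...) searchnode.dispatch0.dispatch vespa-dispatch (...)"
tld_line = "topleveldispatch vespa-dispatch (...)"
print(dispatch_mode(mld_line), dispatch_mode(tld_line))  # MLD TLD
```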