Streaming Search

A search engine normally implements indexing structures like reverse indexes to reduce query latency. It does the indexing up-front, so later matching and ranking are quick. It also normally keeps a copy of the original document for later retrieval and for use in search summaries.

Simplified, the engine keeps the original data plus auxiliary data structures to reduce query latency. Compared to storing only the raw data, this induces both extra work (indexing) and extra static resource usage (disk, memory) to keep these structures.
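To make the trade-off concrete, here is a minimal Python sketch of an inverted (reverse) index. It is a conceptual model only, not Vespa's index format: tokenization is assumed to be lowercase whitespace splitting, and the names `build_inverted_index` and `indexed_search` are hypothetical. The up-front loop is the extra indexing work, and the returned mapping is the extra static data kept alongside the raw documents.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase whitespace-separated token to the IDs of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():  # indexing cost is paid once, up-front
        for token in text.lower().split():
            index[token].add(doc_id)
    return dict(index)


def indexed_search(index: dict[str, set[str]], term: str) -> set[str]:
    # Query time is a single dictionary lookup; no document text is read.
    return index.get(term.lower(), set())


docs = {
    "doc1": "Vespa streaming search",
    "doc2": "indexed search with posting lists",
}
index = build_inverted_index(docs)
print(sorted(indexed_search(index, "search")))  # ['doc1', 'doc2']
```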

Streaming search is an alternative to indexed search. It is useful when the document corpus is statically split into many subsets and every search goes to just one (or a few) of these small subsets. The canonical example is personal indexes, where users only search their own data. Read more on document identifier schemes to learn how to specify subsets.

In streaming mode, only the raw data of the documents is stored, in the document store. Only data structures for document IDs are in memory, not attributes. It matches documents to queries by streaming through them, similar to a grep. This is too costly for a global search but works fine for searching small subsets of the data. This means Vespa can avoid the overhead of maintaining reverse indexes. Streaming mode is suitable when subsets are on average very small compared to the entire corpus. Vespa maintains low latency also for the occasional large subset (say, users with huge amounts of data) by automatically sharding the data over many content nodes, searched in parallel.
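The grep-like matching described above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not Vespa's matcher: the corpus layout keyed by user, the document contents and the function name `streaming_search` are hypothetical, and a term matches a whole lowercase whitespace-separated token. The point is that no index exists; the cost of a query is proportional to the size of the one subset it scans, not the whole corpus.

```python
from typing import Iterable


def streaming_search(subset: Iterable[tuple[str, str]], term: str) -> list[str]:
    """Scan every document in the selected subset, grep-style; no index is consulted."""
    term = term.lower()
    return [doc_id for doc_id, text in subset if term in text.lower().split()]


# Hypothetical corpus keyed by user: a query only ever touches one user's documents.
corpus = {
    "alice": [("a1", "holiday photos from Oslo"), ("a2", "tax return 2023")],
    "bob": [("b1", "photos of the cat")],
}
print(streaming_search(corpus["alice"], "photos"))  # ['a1']
```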

Streaming search shares the implementation of most Vespa features, including ranking, matching and grouping, and supports the same features. However, streaming search does not support stemming, but it supports a wider range of term matching options, which can be specified either at query time or at configuration time:

Feature     Query language syntax   Search definition syntax   Example
substring   *bla*                   match:substring            /search/?query=*bla*
prefix      bla*                    match:prefix               /search/?query=bla*
suffix      *bla                    match:suffix               /search/?query=*bla
exact       Exact string item       match:exact                -
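The semantics of the four match modes in the table can be summarized with a small Python sketch. It is a simplification for illustration: it compares one query term against one already-tokenized field value, ignores case folding and tokenization, and the function name `term_matches` is hypothetical rather than part of any Vespa API.

```python
def term_matches(mode: str, term: str, token: str) -> bool:
    """Return True if `token` matches `term` under the given match mode."""
    if mode == "substring":
        return term in token
    if mode == "prefix":
        return token.startswith(term)
    if mode == "suffix":
        return token.endswith(term)
    if mode == "exact":
        return token == term
    raise ValueError(f"unknown match mode: {mode}")


tokens = ["blab", "ablab", "abla", "bla"]
for mode in ("substring", "prefix", "suffix", "exact"):
    print(mode, [t for t in tokens if term_matches(mode, "bla", t)])
# substring ['blab', 'ablab', 'abla', 'bla']
# prefix ['blab', 'bla']
# suffix ['abla', 'bla']
# exact ['bla']
```

Because streaming search reads the raw text at query time, none of these modes needs a dedicated precomputed structure, which is why the mode can be chosen per query.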


  • Streaming search has low latency if the amount of data searched per node is small. The total data volume can be huge, since each query only searches the subset selected by a predicate.
  • Streaming search is highly flexible as it does not create precomputed indexes, and hence supports more matching options.
  • Streaming search uses less disk space and memory, and zero CPU for indexing. It uses more CPU for search.
  • Streaming search does not have linguistic features like stemming and normalizations.

Using streaming mode

To enable streaming, set mode=streaming on the document type:

<content id="mycluster" version="1.0">
    <documents>
        <document type="mytype" mode="streaming" />
    </documents>
</content>
Note: Switching indexing mode requires data re-feed.

To run a streaming search query:
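A minimal Python sketch that builds such a query URL, assuming the documents use group-based document IDs and that the query is restricted to one group with the `streaming.groupname` query parameter. The endpoint `http://localhost:8080`, the group name `user1234`, the `title` field and the helper name `streaming_query_url` are all hypothetical placeholders; only the URL is built here, nothing is sent.

```python
from urllib.parse import urlencode


def streaming_query_url(endpoint: str, yql: str, group: str) -> str:
    """Build a /search/ URL whose query is restricted to one document group."""
    params = {"yql": yql, "streaming.groupname": group}
    return f"{endpoint}/search/?{urlencode(params)}"


url = streaming_query_url(
    "http://localhost:8080",  # hypothetical container endpoint
    'select * from sources * where title contains "bla"',  # hypothetical field name
    "user1234",  # hypothetical group name taken from the document IDs
)
print(url)
```

Restricting each query to a single group is what keeps the streamed data volume small; without it, the query would have to stream through the whole corpus.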


Disk sizing

Disk sizing for streaming search must account for the space used to store IDs in the document meta store and the space used by the document store (summary). Example:

$ du -sh $VESPA_HOME/var/db/vespa/search/cluster.mystream/n1/documents/doctype/0.ready/*
  4.0K	attribute
  216M	documentmetastore
  4.0K	index
  1.5G	summary
Both scale linearly with the number of documents: the document meta store uses approximately 30 bytes per document, while the document store depends on document size. Hence, to estimate disk usage, feed X% of the corpus and extrapolate.
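The extrapolation step is plain linear scaling. The Python sketch below assumes hypothetical numbers: a 50M-document corpus of which 10% (5M documents) has been fed, a document meta store at the approximate 30 bytes per document stated above, and a measured summary size of 1.5 GiB. Substitute the `du` figures from your own partial feed.

```python
def extrapolate_disk(fed_docs: int, total_docs: int,
                     meta_store_bytes: int, summary_bytes: int) -> int:
    """Scale disk usage measured after a partial feed linearly to the full corpus."""
    scale = total_docs / fed_docs
    return round((meta_store_bytes + summary_bytes) * scale)


fed = 5_000_000  # hypothetical: 10% of the corpus fed
estimate = extrapolate_disk(
    fed_docs=fed,
    total_docs=50_000_000,  # hypothetical full corpus size
    meta_store_bytes=30 * fed,  # approx. 30 bytes per document
    summary_bytes=3 * 2**29,  # hypothetical measured summary size: 1.5 GiB
)
print(f"{estimate / 2**30:.1f} GiB")  # 16.4 GiB
```

Linear scaling is reasonable only if the fed sample has the same document size distribution as the rest of the corpus.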

Memory sizing

Two data structures are loaded into memory in a streaming search:

This is the memory used to keep documents searchable; in addition, memory is used per search. Applications with a low query rate can optimize for static memory use by presizing the document meta store so that it is never resized. Example setting 5M documents per node:
Note: This is a hard limit if the node does not have memory to keep more than one attribute in memory!

Streaming search query tuning

Streaming search is a visit operation. Parallelism is configured using persistence-threads:

<persistence-threads>
  <thread count='8'/>
  <thread count='4' lowest-priority='VERY_HIGH'/>   <!-- Threads dedicated to streaming search iterators -->
</persistence-threads>
<visitors thread-count='8'/>
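Conceptually, more threads let one visit stream through more of the selected data at the same time. The Python sketch below only models that idea with a thread pool. It is not how Vespa schedules persistence threads: the split of a subset into "buckets" of plain strings, the substring test and the function names are simplifying assumptions, and CPython's GIL means this toy version gains no real CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_bucket(bucket: list[str], term: str) -> list[str]:
    # Each worker streams through one bucket of the selected subset.
    return [doc for doc in bucket if term in doc]


def parallel_stream(buckets: list[list[str]], term: str, threads: int) -> list[str]:
    """Scan buckets concurrently and merge the hits, keeping bucket order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = pool.map(lambda bucket: scan_bucket(bucket, term), buckets)
        return [hit for hits in results for hit in hits]


buckets = [["bla one", "two"], ["three bla"], ["four", "blab five"]]
print(parallel_stream(buckets, "bla", threads=4))  # ['bla one', 'three bla', 'blab five']
```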

Summary store: Direct IO and cache

For better control of memory usage, use direct IO for reads when the summary cache is enabled. This keeps the OS buffer cache smaller and makes performance more predictable. The summary cache caches recent entries and improves performance for users or groups that make repeated accesses. For example, the summary cache can be given 1 GB.
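Why a cache with a fixed byte budget helps repeated accesses can be illustrated with a least-recently-used (LRU) cache bounded by total bytes. This Python sketch is a generic LRU, assumed here for illustration; it is not Vespa's summary cache implementation, and the class name `ByteBudgetLRU` and the string keys are hypothetical. In this analogy, a 1 GB budget would be `max_bytes=2**30`.

```python
from collections import OrderedDict
from typing import Optional


class ByteBudgetLRU:
    """LRU cache that evicts least recently used entries once total value size exceeds max_bytes."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used = 0
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        value = self.entries.get(key)
        if value is not None:
            self.entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value: bytes) -> None:
        if key in self.entries:
            self.used -= len(self.entries.pop(key))
        self.entries[key] = value
        self.used += len(value)
        # Evict from the least recently used end until back within budget;
        # a single value larger than the budget is evicted as well.
        while self.used > self.max_bytes and self.entries:
            _, evicted = self.entries.popitem(last=False)
            self.used -= len(evicted)


cache = ByteBudgetLRU(max_bytes=10)
cache.put("u1:doc1", b"12345")
cache.put("u1:doc2", b"12345")
cache.get("u1:doc1")  # touch doc1, so doc2 becomes least recently used
cache.put("u1:doc3", b"12345")  # exceeds 10 bytes, so doc2 is evicted
print(sorted(cache.entries))  # ['u1:doc1', 'u1:doc3']
```

A user who keeps querying the same documents keeps them at the most recently used end, so their reads are served from the cache instead of disk.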


Searchable copies

Vespa has a concept of searchable and ready copies for indexed search. In short, indices are generated for replicas used in search, while other replicas do not have the indices generated. This does not apply to streaming search, where the point is to have no indices. When nodes stop, replicas transfer to the active database. For streaming, disable this by setting searchable copies to the same level as redundancy:

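  <content id="mycluster" version="1.0">
      <redundancy>2</redundancy>
      <engine>
          <proton>
              <searchable-copies>2</searchable-copies>
          </proton>
      </engine>
  </content>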
If the two numbers differ, nodes see higher load, and hence worse latency, during state transitions (nodes going up and down).

When redundancy = searchable copies, all documents are found in the 0.ready database.