
Streaming Search

Search engines make queries fast by creating indexes over the stored data. While the indexes cost extra resources to build and maintain, this is usually a good tradeoff because they make queries so much cheaper. However, this does not hold for use cases where the data is split into many small subsets where each query just searches one (or a few) of these subsets, the canonical example being personal indexes where a user only searches their own data.

For such use cases, Vespa provides streaming search - a mode where only the raw data of the documents is stored and searches are implemented by streaming - no indexes required. In addition, attributes are stored only on disk, so the only data needed in memory is 45 bytes per document. This lets streaming mode store billions of documents on each node.

This is especially important in personal data applications using vector embeddings, which otherwise require a lot of memory and approximate nearest neighbor (ANN) search to perform well. ANN is often unsuited for searching personal data because it does not guarantee that all the most relevant documents are surfaced.
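To illustrate the difference from ANN, here is a minimal Python sketch of exact (brute-force) nearest-neighbor search over the documents of a single user, which is the kind of search streaming mode makes affordable. It assumes cosine similarity and uses a hypothetical in-memory list of (document id, embedding) pairs; it is not Vespa's implementation, only a demonstration that scoring every document in a small subset always returns the true top-k.

```python
import heapq
import math

# Hypothetical stand-in for one user's documents: (doc_id, embedding).
Doc = tuple[str, list[float]]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_top_k(docs: list[Doc], query: list[float], k: int) -> list[tuple[float, str]]:
    """Score every document in the subset and keep the k best.

    Because every document is visited, the result is the true top-k,
    unlike an approximate index that may skip relevant documents.
    """
    scored = ((cosine(vec, query), doc_id) for doc_id, vec in docs)
    return heapq.nlargest(k, scored)


user_docs = [
    ("mail-1", [1.0, 0.0]),
    ("mail-2", [0.7, 0.7]),
    ("mail-3", [0.0, 1.0]),
]
print([doc_id for _, doc_id in exact_top_k(user_docs, [1.0, 0.1], k=2)])  # ['mail-1', 'mail-2']
```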

Streaming mode is suitable when subsets are on average small compared to the entire corpus. Vespa delivers low query latency also for the occasional large subset (say, users with huge amounts of data) by automatically sharding such data groups over multiple content nodes, searched in parallel.
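The sketch below is a simplified Python model of searching one large group that has been spread over several content nodes: each shard is scanned independently (threads stand in for nodes), and the per-shard top-k lists are merged. The shard layout and scores are hypothetical, and how Vespa actually places a group's documents is not modeled.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Toy model (assumption): each shard is the part of one large group held by a
# single content node, as (doc_id, score) pairs. The score stands in for the
# rank score a node would compute while streaming through its documents.
Shard = list[tuple[str, float]]


def search_shard(shard: Shard, k: int) -> list[tuple[float, str]]:
    """Scan one shard and keep its local top-k hits."""
    return heapq.nlargest(k, ((score, doc_id) for doc_id, score in shard))


def search_group(shards: list[Shard], k: int) -> list[tuple[float, str]]:
    """Search all shards concurrently, then merge the local top-k lists.

    Every global top-k hit is in its own shard's local top-k, so merging
    the partial results yields the exact global top-k.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, k), shards))
    return heapq.nlargest(k, (hit for part in partials for hit in part))


large_group = [
    [("a", 0.2), ("b", 0.9)],
    [("c", 0.5), ("d", 0.8)],
    [("e", 0.1)],
]
print([doc_id for _, doc_id in search_group(large_group, k=3)])  # ['b', 'd', 'c']
```

Merging only the local top-k lists keeps the amount of data sent back from each node small, while still giving the same result as scanning the whole group in one place.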

Streaming search uses the same implementation of most features in Vespa, including ranking, matching and grouping, and supports the same features, with these differences:

  • Stemming is not supported with streaming search.
  • Since there is no index, the content nodes do not collect term statistics, so term significance should be passed explicitly with query terms if text-matching features that benefit from it are used.
  • Streaming search supports a wider range of matching options (such as substring and prefix), and these can be specified either at query time or at configuration time.
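As a sketch of passing term significance explicitly, the Python below derives a value per term from document frequencies the application would have to collect itself, and writes it into YQL as a term annotation. The IDF-style formula, the `body` field, and the frequency numbers are assumptions for illustration; the `significance` annotation name follows the YQL term-annotation syntax but should be verified against your Vespa version.

```python
import math


def significance(doc_freq: int, corpus_size: int) -> float:
    """Hypothetical IDF-style significance scaled to [0, 1].

    Terms found in every document get 0; a term found in a single document
    gets 1. This formula is an illustrative assumption, not Vespa's own.
    """
    if corpus_size <= 1:
        return 0.0
    idf = math.log(corpus_size / max(doc_freq, 1))
    return max(0.0, idf / math.log(corpus_size))


def yql_with_significance(field: str, sigs: dict[str, float]) -> str:
    """Build a YQL query annotating each term with an explicit significance."""
    clauses = [
        f'{field} contains ({{significance:{sig:.3f}}}"{term}")'
        for term, sig in sigs.items()
    ]
    return "select * from sources * where " + " and ".join(clauses)


# Hypothetical document frequencies from a reference corpus of 1000 documents.
doc_freqs = {"vespa": 10, "the": 1000}
sigs = {term: significance(df, corpus_size=1000) for term, df in doc_freqs.items()}
print(yql_with_significance("body", sigs))
```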

These are the steps required to use streaming search:

  1. Set indexing mode to streaming:
    <content id="mycluster" version="1.0">
        <documents>
            <document type="myType" mode="streaming" />
        </documents>
    </content>
  2. Use document IDs that contain a group value specifying the small subset the document belongs to (usually a user ID). These have the form id:myNamespace:myType:g=myUserId:myLocalId, which in document/v1 request paths becomes document/v1/myNamespace/myType/group/myUserId/myLocalId
  3. Specify the subset to search using the query parameter streaming.groupname.
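The ID and path formats in steps 2 and 3 can be sketched in Python as string builders. The namespace `mail`, document type `email`, group `alice`, and local ID `42` are hypothetical; percent-encoding each path segment is a precaution assumed here for IDs containing characters such as `/`, and the request host is left out.

```python
from urllib.parse import quote, urlencode


def document_id(namespace: str, doc_type: str, group: str, local_id: str) -> str:
    """Document ID carrying the group value in its g= part."""
    return f"id:{namespace}:{doc_type}:g={group}:{local_id}"


def document_v1_path(namespace: str, doc_type: str, group: str, local_id: str) -> str:
    """The same document addressed as a document/v1 path.

    Each segment is percent-encoded (an assumed precaution) so that
    characters like '/' in a local ID do not split the path.
    """
    parts = [namespace, doc_type, "group", group, local_id]
    return "/document/v1/" + "/".join(quote(p, safe="") for p in parts)


def streaming_query_params(yql: str, group: str) -> str:
    """Query string restricting the search to one group via streaming.groupname."""
    return urlencode({"yql": yql, "streaming.groupname": group})


print(document_id("mail", "email", "alice", "42"))       # id:mail:email:g=alice:42
print(document_v1_path("mail", "email", "alice", "42"))  # /document/v1/mail/email/group/alice/42
print(streaming_query_params("select * from sources * where true", "alice"))
```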

See the vector streaming search sample application for a complete example.

Indexing statements are - as the name indicates - mostly used for indexing, and so they are not executed by default with streaming search.

However, sometimes it is convenient to run indexing statements also when using streaming, for example to use the embed function to turn text into an embedding vector, as in

indexing: input myTextField | embed | attribute

Indexing statements are run by a document processor, so to enable them with streaming, enable document processing on a container cluster and point the content cluster to it as the cluster that does indexing processing:

<services version="1.0">
    <container id="myContainers" version="1.0">
        <document-processing />
    </container>

    <content id="mail" version="1.0">
        <documents>
            <document type="myType" mode="streaming" />
            <document-processing chain="indexing" cluster="myContainers" />
        </documents>
    </content>
</services>

Streaming search offers more flexibility in matching text fields: match settings can be specified at query time on any text field, and fields marked with indexing: index support prefix and substring matching.

To specify match settings at query time in YQL:

select * from sources * where artist contains ({prefix:true}"col")
select * from sources * where artist contains ({substring:true}"old")
select * from sources * where artist contains ({suffix:true}"play")
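What these annotations mean for each document can be modeled roughly in Python: while streaming, every field value is checked against the term using the requested mode. This is a simplified sketch, assuming lowercase whitespace tokenization and per-token matching; Vespa's actual normalization and tokenization differ.

```python
MATCH_MODES = ("exact", "prefix", "substring", "suffix")


def term_matches(field_value: str, term: str, mode: str = "exact") -> bool:
    """Simplified per-token matching, evaluated on each streamed document.

    Assumption: tokens are produced by lowercasing and splitting on
    whitespace, which ignores Vespa's real linguistic processing.
    """
    if mode not in MATCH_MODES:
        raise ValueError(f"unknown match mode: {mode}")
    term = term.lower()
    for token in field_value.lower().split():
        if mode == "exact" and token == term:
            return True
        if mode == "prefix" and token.startswith(term):
            return True
        if mode == "substring" and term in token:
            return True
        if mode == "suffix" and token.endswith(term):
            return True
    return False


# Hypothetical artist values, scanned one by one as a streaming search would.
artists = ["Coldplay", "Cold War Kids", "Pink Floyd"]
print([a for a in artists if term_matches(a, "col", "prefix")])     # ['Coldplay', 'Cold War Kids']
print([a for a in artists if term_matches(a, "old", "substring")])  # ['Coldplay', 'Cold War Kids']
print([a for a in artists if term_matches(a, "play", "suffix")])    # ['Coldplay']
```

Since every document is inspected anyway, the match mode is just a different comparison per token; no index has to be built for each mode.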

To specify a default match setting for a field in the schema:

field artist type string {
    indexing: summary | index
    match: prefix
}

Streaming search grouping extension

Grouping works as normal with streaming search but offers two additional features, explained here.

Grouping over all documents

Since streaming search "looks at" all documents matching the group name/selection regardless of the query, it is possible to group over all those documents and not just the ones matching the query. This is done by using where(true) in the grouping expression:

all( where(true) all(group(myfield) each(output(count()))) )

When doing this, relevancy is not calculated for groups, as only matched hits have relevance.
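The difference between normal grouping and where(true) can be sketched in Python with a counter over the documents of one group. The documents, their `myfield` values, and which ones match the query are hypothetical; the point is only that where(true) also counts documents that were visited but did not match.

```python
from collections import Counter

# Hypothetical documents of one group: (myfield value, matched the query?)
docs: list[tuple[str, bool]] = [
    ("inbox", True),
    ("inbox", False),
    ("spam", False),
    ("sent", True),
]


def group_counts(docs: list[tuple[str, bool]], include_unmatched: bool) -> Counter:
    """Count documents per myfield value.

    include_unmatched=False mirrors normal grouping (matched hits only);
    include_unmatched=True mirrors where(true), which groups every
    document visited while streaming through the group.
    """
    return Counter(value for value, matched in docs if matched or include_unmatched)


print(dict(group_counts(docs, include_unmatched=False)))  # {'inbox': 1, 'sent': 1}
print(dict(group_counts(docs, include_unmatched=True)))   # {'inbox': 2, 'spam': 1, 'sent': 1}
```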

The docidnsspecific function

The docidnsspecific function returns the docid without namespace.

all( group(docidnsspecific()) each(output(count())) )

Memory: Streaming search requires 45 bytes of memory per document regardless of the document content.

Disk: Streaming search requires disk space to store the raw document data in compressed form. The size is dependent on the actual data but can be extrapolated linearly with the number of documents.
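Both sizing rules are linear, so a quick estimate is simple arithmetic. In this Python sketch the 45 bytes per document comes from this section, while the disk sample (100,000 documents using 250 MB) is a hypothetical measurement you would replace with your own.

```python
BYTES_PER_DOC_IN_MEMORY = 45  # memory per document stated for streaming search


def memory_bytes(num_docs: int) -> int:
    """Memory needed by streaming search, independent of document content."""
    return BYTES_PER_DOC_IN_MEMORY * num_docs


def extrapolate_disk_bytes(sample_docs: int, sample_disk_bytes: int, num_docs: int) -> float:
    """Linear extrapolation of compressed disk usage from a measured sample."""
    return sample_disk_bytes / sample_docs * num_docs


print(memory_bytes(1_000_000_000) / 1e9)  # 45.0 (GB for one billion documents)
print(extrapolate_disk_bytes(100_000, 250_000_000, 1_000_000_000) / 1e12)  # 2.5 (TB)
```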

Streaming search is a visit operation. Parallelism is configured using the persistence thread count and the visitor thread count:

<persistence-threads count='8'/>
<visitors thread-count='8'/>

On Vespa Cloud, this number is set automatically to match the number of vCPUs set in resources. If increasing vCPUs does not lower latency, your streaming searches have become IO bound.

Tuning summary store: Direct IO and cache

For better control of memory usage, use direct IO for reads when the summary cache is enabled. This makes the OS buffer cache smaller and performance more predictable. The summary cache caches recent entries and increases performance for users or groups doing repeated accesses. This sets aside 1 GB for the summary cache.
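To illustrate why a cache of recent entries helps repeated accesses by the same user or group, here is a byte-bounded least-recently-used (LRU) cache in Python. LRU eviction is an assumption made for this sketch; it is not a description of Vespa's summary cache implementation.

```python
from collections import OrderedDict
from typing import Optional


class SummaryCacheSketch:
    """Hypothetical byte-bounded LRU cache of document summaries."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used = 0
        self.entries = OrderedDict()  # doc_id -> summary bytes, oldest first

    def get(self, doc_id: str) -> Optional[bytes]:
        summary = self.entries.get(doc_id)
        if summary is not None:
            self.entries.move_to_end(doc_id)  # mark as most recently used
        return summary

    def put(self, doc_id: str, summary: bytes) -> None:
        if doc_id in self.entries:
            self.used -= len(self.entries.pop(doc_id))
        self.entries[doc_id] = summary
        self.used += len(summary)
        # Evict least recently used entries until back within the byte budget.
        while self.used > self.max_bytes and self.entries:
            _, evicted = self.entries.popitem(last=False)
            self.used -= len(evicted)


cache = SummaryCacheSketch(max_bytes=10)
cache.put("a", b"12345")
cache.put("b", b"12345")
cache.get("a")            # "a" becomes most recently used
cache.put("c", b"12345")  # over budget: evicts "b", the least recently used
print(sorted(cache.entries))  # ['a', 'c']
```

With `max_bytes=1 << 30`, this sketch would hold about 1 GiB of summaries.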