Search engines make queries fast by creating indexes over the stored data. While the indexes cost extra resources to build and maintain, this is usually a good tradeoff because they make queries so much cheaper. However, this does not hold for use cases where the data is split into many small subsets where each query just searches one (or a few) of these subsets, the canonical example being personal indexes where a user only searches their own data.
For such use cases, Vespa provides streaming search: a mode where only the raw data of the documents is stored and searches are implemented by streaming, with no indexes required. Attributes are also stored only on disk, so the only data needed in memory is 45 bytes per document, which means streaming mode lets you store billions of documents on each node.
This is especially important in personal data applications using vector embeddings, which otherwise require a lot of memory and need approximate nearest neighbor (ANN) search to perform well. ANN is often unsuited for searching personal data, as it does not surface all the most relevant documents.
Streaming mode is suitable when subsets are on average small compared to the entire corpus. Vespa delivers low query latency also for the occasional large subset (say, users with huge amounts of data) by automatically sharding such data groups over multiple content nodes, searched in parallel.
Streaming search uses the same implementation of most features in Vespa, including matching, ranking, grouping and sorting, and mostly supports the same features. A schema used in indexed mode can in most cases be used in streaming search without any changes. The following differences however apply:
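These are the steps required to use streaming search:
Set the document mode to streaming in the content cluster:
<content id="mycluster" version="1.0">
    <documents>
        <document type="myType" mode="streaming" />
Use document IDs that contain a group value identifying the subset the document belongs to (typically a user ID). These have the form
id:myNamespace:myType:g=myUserId:myLocalId
and, when represented as paths in document/v1 requests,
document/v1/myNamespace/myType/group/myUserId/myLocalId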
See the vector streaming search sample application for a complete example.
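The ID and path forms above can be generated from the same four parts. The following is a minimal sketch, not part of any Vespa client library: the helper names are hypothetical, and percent-encoding each path segment is an assumption about what is safe for user-supplied IDs.

```python
from urllib.parse import quote


def streaming_document_id(namespace: str, doc_type: str, group: str, local_id: str) -> str:
    """Return a document ID of the form id:<namespace>:<type>:g=<group>:<local id>."""
    return f"id:{namespace}:{doc_type}:g={group}:{local_id}"


def streaming_document_path(namespace: str, doc_type: str, group: str, local_id: str) -> str:
    """Return the matching document/v1 path.

    Each segment is percent-encoded (an assumption of this sketch) so that
    characters such as '/' or spaces in an ID do not change the path structure.
    """
    segments = [namespace, doc_type, "group", group, local_id]
    return "document/v1/" + "/".join(quote(segment, safe="") for segment in segments)


# Example with the placeholder names used in the text.
print(streaming_document_id("myNamespace", "myType", "myUserId", "myLocalId"))
print(streaming_document_path("myNamespace", "myType", "myUserId", "myLocalId"))
```

Keeping both forms behind one pair of helpers makes it harder for the group value in the ID and in the path to drift apart.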
<document-processing/>
tags.
The configuration is identical to using indexed mode.
Indexing statements are - as the name indicates - mostly used for indexing, and so they are not executed by default with streaming search.
However, it is sometimes convenient to run indexing statements also when using streaming, for example to use the embed function to turn text into an embedding vector, as in:
indexing: input myTextField | embed | attribute
Indexing statements are run by a document processor, so to enable them with streaming, enable document processing on a container cluster and point to that cluster from the content cluster as the one that does indexing processing:
<services version="1.0">
    <container id="myContainers" version="1.0">
        ...
        <document-processing/>
        ...
    </container>
    <content id="mail" version="1.0">
        ...
        <documents>
            <document type="myType" mode="streaming" />
            <document-processing chain="indexing" cluster="myContainers" />
        </documents>
        ...
    </content>
</services>
Streaming search offers more flexibility in matching text fields: match settings can be specified at query time on any text field, and fields marked with indexing: index support suffix and substring matching.
To specify match settings at query time in YQL:
select * from sources * where artist contains ({prefix:true}"col")
select * from sources * where artist contains ({substring:true}"old")
select * from sources * where artist contains ({suffix:true}"play")
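As an illustration of sending such a query, here is a minimal sketch that builds a query URL carrying one of the YQL statements above. Several things here are assumptions not stated in this section: the endpoint `http://localhost:8080`, the `/search/` path, the use of the `streaming.groupname` request parameter to select the group, and the hypothetical helper name. The sketch does no escaping of the search term, so it is only suitable for terms without double quotes.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ALLOWED_MATCH_SETTINGS = ("prefix", "substring", "suffix")


def streaming_query_url(endpoint: str, group: str, field: str, term: str, match: str) -> str:
    """Build a query URL with a query-time match setting on one field."""
    if match not in ALLOWED_MATCH_SETTINGS:
        raise ValueError(f"match must be one of {ALLOWED_MATCH_SETTINGS}")
    # Produces e.g.: ... where artist contains ({prefix:true}"col")
    yql = f'select * from sources * where {field} contains ({{{match}:true}}"{term}")'
    params = {
        "yql": yql,
        "streaming.groupname": group,  # assumed parameter selecting the group to search
    }
    return f"{endpoint.rstrip('/')}/search/?{urlencode(params)}"


url = streaming_query_url("http://localhost:8080", "myUserId", "artist", "col", "prefix")
print(url)
# Decode the query string again to show the YQL that would be sent.
print(parse_qs(urlsplit(url).query)["yql"][0])
```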
To specify a default match setting for a field in the schema:
field artist type string {
    indexing: summary | index
    match: substring
}
Grouping works as normal with streaming search but offers two additional features, explained here.
Since streaming search "looks at" all documents matching the group name/selection regardless of the query, it is possible to group over all those documents and not just the ones matching the query. This is done by using where(true) in the grouping expression:
all( where(true) all(group(myfield) each(output(count()))) )
When doing this, relevancy is not calculated for groups, as only matched hits have relevance.
The docidnsspecific function returns the docid without namespace:
all( group(docidnsspecific()) each(output(count())) )
Memory: Streaming search requires 45 bytes of memory per document regardless of the document content.
Disk: Streaming search requires disk space to store the raw document data in compressed form. The size is dependent on the actual data but can be extrapolated linearly with the number of documents.
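The two sizing rules above reduce to simple arithmetic. The sketch below uses the 45 bytes per document figure from the text; the function names and the sample measurement (1 million documents occupying 2 GB on disk) are hypothetical and only illustrate the linear extrapolation.

```python
BYTES_PER_DOCUMENT = 45  # memory per document in streaming mode, as stated above


def streaming_memory_bytes(num_documents: int) -> int:
    """Memory needed for a given number of documents, independent of their content."""
    return num_documents * BYTES_PER_DOCUMENT


def extrapolate_disk_bytes(sample_documents: int, sample_disk_bytes: int, target_documents: int) -> int:
    """Linearly extrapolate disk usage from a measured sample of real documents."""
    return sample_disk_bytes * target_documents // sample_documents


# One billion documents need 45 GB of memory.
print(streaming_memory_bytes(1_000_000_000) / 1e9)  # 45.0

# Hypothetical measurement: 1 million documents took 2 GB of disk,
# so one billion similar documents would take about 2 TB.
print(extrapolate_disk_bytes(1_000_000, 2_000_000_000, 1_000_000_000) / 1e12)  # 2.0
```

The disk estimate is only as good as the sample: it assumes the measured documents are representative of the full corpus in size and compressibility.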
Streaming search is a visit operation. Parallelism is configured using persistence-threads:
<persistence-threads count='8'/>
<visitors thread-count='8'/>
On Vespa Cloud, this number is set automatically to match the number of VCPUs set in resources. If increasing the number of VCPUs does not lower latency, your streaming searches have become IO bound.
For better control of memory usage, use direct IO for reads when the document store cache is enabled. This keeps the OS buffer cache smaller and makes performance more predictable. The document store cache holds recent entries and improves performance for users or groups making repeated accesses. The following configuration sets aside 1 GB for the document store cache:
<engine>
    <proton>
        <tuning>
            <searchnode>
                <summary>
                    <io>
                        <write>directio</write>
                        <read>directio</read>
                    </io>
                    <store>
                        <cache>
                            <maxsize>1073741824</maxsize>
                        </cache>
                    </store>
                </summary>
            </searchnode>
        </tuning>
    </proton>
</engine>
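The maxsize value is given in bytes, and 1073741824 is 1024³, that is, 1 GiB. A small sketch for deriving the value for other cache sizes follows; the helper name is hypothetical.

```python
def cache_maxsize_bytes(gibibytes: float) -> int:
    """Convert a cache size in GiB to the byte count used for maxsize."""
    return int(gibibytes * 1024 ** 3)


print(cache_maxsize_bytes(1))  # 1073741824, the value used in the configuration
print(cache_maxsize_bytes(4))  # 4294967296
```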