A document summary is the information that is shown for each document in a query result. What information to include is determined by a document summary class: A named set of fields with config on which information they should contain.
A special document summary named default is always present and used by default.
This contains:
Summary classes are defined in the schema:
schema music {
    document music {
        field artist type string {
            indexing: summary | index
        }
        field album type string {
            indexing: summary | index
            index: enable-bm25
        }
        field year type int {
            indexing: summary | attribute
        }
        field category_scores type tensor<float>(cat{}) {
            indexing: summary | attribute
        }
    }
    document-summary my-short-summary {
        summary artist {}
        summary album {}
    }
}
See the schema reference for details.
The summary class to use for a query is determined by the presentation.summary parameter:
$ vespa query "select * from music where album contains 'head'" \
    "presentation.summary=my-short-summary"
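The same request can be sent over HTTP by passing presentation.summary as a query parameter. The following is a minimal sketch, not an official client: the endpoint http://localhost:8080/search/ is an assumed local container address, and build_query_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_query_url(yql, summary=None, endpoint="http://localhost:8080/search/"):
    """Build a query URL. The endpoint default is an assumption, adjust as needed."""
    params = {"yql": yql}
    if summary is not None:
        # presentation.summary selects the document summary class for the query.
        params["presentation.summary"] = summary
    return endpoint + "?" + urlencode(params)


url = build_query_url(
    "select * from music where album contains 'head'",
    summary="my-short-summary",
)
print(url)
```

Leaving out the summary argument omits the parameter, so the default summary class is used.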
A common reason to define a document summary class is performance: a document summary that contains only attribute fields can be generated without disk access. A dedicated summary class is needed for this even if all fields in the schema are attributes, because the default summary includes the document id, which is not stored as an attribute.
Document summaries may also contain dynamic snippets and highlighted terms.
The document summary class to use can also be passed programmatically to the fill() method from a Searcher. Multiple fill operations interleaved with programmatic filtering can be used to optimize data access and transfer.
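The idea behind interleaving fill operations with filtering can be illustrated with a toy model. This is not the Vespa Searcher API: STORE, SUMMARY_CLASSES and this fill() function are hypothetical stand-ins, and the documents are made-up sample data. The sketch only counts how many field values are transferred.

```python
# Toy model of multi-phase fill. Nothing here is the Vespa Searcher API:
# STORE, SUMMARY_CLASSES and fill() are hypothetical stand-ins.

SUMMARY_CLASSES = {
    "my-short-summary": ["artist", "album"],
    "default": ["artist", "album", "year", "category_scores"],
}

# Made-up sample documents keyed by document id.
STORE = {
    "id:music:music::1": {"artist": "Artist A", "album": "Head first",
                          "year": 1999, "category_scores": {"pop": 0.5}},
    "id:music:music::2": {"artist": "Artist B", "album": "Head on",
                          "year": 2005, "category_scores": {"rock": 0.9}},
    "id:music:music::3": {"artist": "Artist A", "album": "Big head",
                          "year": 2012, "category_scores": {"jazz": 0.2}},
    "id:music:music::4": {"artist": "Artist C", "album": "Head room",
                          "year": 2020, "category_scores": {"pop": 0.7}},
}


def fill(hits, summary_class):
    """Copy the fields of summary_class into each hit; return values transferred."""
    fields = SUMMARY_CLASSES[summary_class]
    for hit in hits:
        for name in fields:
            hit["fields"][name] = STORE[hit["id"]][name]
    return len(hits) * len(fields)


hits = [{"id": doc_id, "fields": {}} for doc_id in STORE]

# Phase 1: fetch only the fields the filter needs.
transferred = fill(hits, "my-short-summary")

# Programmatic filtering on the cheaply filled hits.
kept = [hit for hit in hits if hit["fields"]["artist"] == "Artist B"]

# Phase 2: fetch the full summary only for the hits that survived.
transferred += fill(kept, "default")

# Baseline: filling every hit with the full summary up front.
baseline = len(STORE) * len(SUMMARY_CLASSES["default"])
print(len(kept), transferred, baseline)
```

In this model, the two-phase approach moves fewer field values than filling every hit with the full summary, and the gap grows with the share of hits the filter discards.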
A YQL statement can also be used to filter which fields from a document summary to include in results. Note that this is just a field filter in the container - a summary containing all fields of a summary class is always fetched from content nodes, so to optimize performance it is necessary to create custom summary classes.
$ vespa query "select artist, album, documentid, sddocname from music where album contains 'head'"
Use * to select all fields of the document summary class in use (the default class unless another is specified).
$ vespa query "select * from music where album contains 'head'"
Summary classes may define fields by names not used in the document type:
document-summary rename-summary {
    summary artist_name {
        source: artist
    }
}
Refer to the schema reference for adding attribute and non-attribute fields - some changes require re-indexing.
Use dynamic to generate dynamic snippets from fields based on the query keywords. Example from Vespa Documentation Search - see the schema:
document doc {
    field content type string {
        indexing: summary | index
        summary: dynamic
    }
}
A query for the terms document summary returns a snippet like:
Use document summaries to configure which fields ... indexing: summary | index } } document-summary titleyear { summary title ...
The example above creates a dynamic summary with the matched terms highlighted. The latter is called bolding and can be enabled independently of dynamic summaries.
Refer to the reference for the response format.
You can configure generation of dynamic snippets by adding an instance of the vespa.config.search.summary.juniperrc config in services.xml, inside the <content> cluster tag for the content cluster in question. For example:
<content ...>
    ...
    <config name="vespa.config.search.summary.juniperrc">
        <max_matches>2</max_matches>
        <length>1000</length>
        <surround_max>500</surround_max>
        <min_length>300</min_length>
    </config>
    ...
</content>
Numbers here are in bytes.
Attribute fields are held in memory. This means a summary is a memory-only operation if all requested fields are attributes, which is the optimal way to get high query throughput. The other document fields are stored as blobs in the document store. Requesting these fields may therefore require a disk access, increasing latency.
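The rule above can be expressed as a small check. This is an illustrative sketch only: IS_ATTRIBUTE is a hand-written table derived from the music schema shown earlier, and is_memory_only is a hypothetical helper, not a Vespa function. Unknown names such as documentid are treated as non-attributes.

```python
# Field name -> True if the field is declared as an attribute in the music schema.
IS_ATTRIBUTE = {
    "artist": False,           # indexing: summary | index
    "album": False,            # indexing: summary | index
    "year": True,              # indexing: summary | attribute
    "category_scores": True,   # indexing: summary | attribute
}


def is_memory_only(summary_fields):
    """Return True if every field in the summary class is an attribute."""
    # Names missing from the table (for example documentid) count as non-attributes.
    return all(IS_ATTRIBUTE.get(name, False) for name in summary_fields)


print(is_memory_only(["year", "category_scores"]))
print(is_memory_only(["year", "documentid"]))
print(is_memory_only(["artist", "album"]))
```

Only the first summary class in this sketch qualifies as memory-only. The other two include at least one field that has to come from the document store.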
When using additional summary classes to increase performance, only the network data size is changed - the data read from storage is unchanged. Having "debug" fields with summary enabled will hence also affect the amount of information that needs to be read from disk.
See query execution - breakdown of the summary (a.k.a. result processing, rendering) phase:
The work, and thus latency, increases with more hits. Use query tracing to analyze performance.
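As a rough sketch of how tracing and timing output might be requested and read, the snippet below builds a query string and computes the share of time spent fetching summaries. The parameter names trace.level and presentation.timing and the timing field names (searchtime, summaryfetchtime, querytime) are assumptions about the query API and the default JSON result format, so verify them against the reference. The response excerpt is invented sample data, and summary_share is a hypothetical helper.

```python
import json
from urllib.parse import urlencode

# Query parameters; trace.level and presentation.timing are assumed names.
params = {
    "yql": "select * from music where album contains 'head'",
    "presentation.summary": "my-short-summary",
    "trace.level": 3,
    "presentation.timing": "true",
}
query_string = urlencode(params)

# Invented response excerpt, used only to exercise the helper below.
sample_response = json.loads(
    '{"timing": {"querytime": 0.004, "summaryfetchtime": 0.002,'
    ' "searchtime": 0.007}, "root": {}}'
)


def summary_share(response):
    """Return summaryfetchtime / searchtime, or None if timing is unavailable."""
    timing = response.get("timing", {})
    total = timing.get("searchtime")
    fetch = timing.get("summaryfetchtime")
    if not total or fetch is None:
        return None
    return fetch / total


print(query_string)
print(summary_share(sample_response))
```

A high share suggests that the summary phase dominates latency, in which case fewer hits or an attribute-only summary class are the first things to try.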
Refer to content node summary cache.