A document summary is the information that is shown for each document in a query result.
What information to include is determined by a document summary class:
a named set of fields, with configuration of what information each field should contain.
A special document summary named default is always present and is used unless another class is selected.
It contains:
all fields whose indexing statements specify that they may be included in summaries
schema music {
    document music {
        field artist type string {
            indexing: summary | index
        }
        field album type string {
            indexing: summary | index
            index: enable-bm25
        }
        field year type int {
            indexing: summary | attribute
        }
        field category_scores type tensor<float>(cat{}) {
            indexing: summary | attribute
        }
    }
    document-summary my-short-summary {
        summary artist {}
        summary album {}
    }
}
The summary class to use for a query is determined by the parameter
presentation.summary:
$ vespa query "select * from music where album contains 'head'" \
"presentation.summary=my-short-summary"
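The same request can also be sent over HTTP. The sketch below only builds the request URL with the Python standard library. The endpoint http://localhost:8080/search/ is an assumption (a container running locally on the default port), and build_query_url is a hypothetical helper name, not part of any Vespa client.

```python
from urllib.parse import urlencode

# Assumed endpoint: a Vespa container listening locally on the default port.
ENDPOINT = "http://localhost:8080/search/"


def build_query_url(yql, summary_class=None, endpoint=ENDPOINT):
    """Return a query URL, optionally selecting a document summary class."""
    params = {"yql": yql}
    if summary_class is not None:
        # Same parameter as passed to the vespa CLI above.
        params["presentation.summary"] = summary_class
    # urlencode percent-encodes the YQL string (spaces, quotes, '*').
    return endpoint + "?" + urlencode(params)


url = build_query_url(
    "select * from music where album contains 'head'",
    summary_class="my-short-summary",
)
print(url)
```

Leaving out summary_class omits the parameter, in which case the default summary class is used.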
A common reason to define a document summary class is performance:
By configuring a document summary which contains only attributes, the result can be generated
without disk accesses. Note that a dedicated class is needed to ensure only memory is accessed, even if all fields are
attributes, because the document id is not stored as an attribute.
The document summary class to use can also be passed programmatically to the fill()
method from a Searcher. Multiple fill operations can be interleaved with programmatic filtering
in the Searcher to optimize data access and transfer.
Selecting summary fields in YQL
A YQL statement can also be used to filter which fields from a document summary
to include in results. Note that this is just a field filter in the container:
a summary containing all fields of the summary class is always
fetched from the content nodes, so optimizing performance requires custom summary classes.
$ vespa query "select artist, album, documentid, sddocname from music where album contains 'head'"
{"root":{"children":[{"id":"id:mynamespace:music::a-head-full-of-dreams","relevance":0.16343879032006284,"source":"mycontentcluster","fields":{"sddocname":"music","documentid":"id:mynamespace:music::a-head-full-of-dreams","artist":"Coldplay","album":"A Head Full of Dreams"}}]}}
Use * to select all fields of the chosen document summary class
(which is default unless another class is specified).
$ vespa query "select * from music where album contains 'head'"
{"root":{"children":[{"id":"id:mynamespace:music::a-head-full-of-dreams","relevance":0.16343879032006284,"source":"mycontentcluster","fields":{"sddocname":"music","documentid":"id:mynamespace:music::a-head-full-of-dreams","artist":"Coldplay","album":"A Head Full of Dreams","year":2015,"category_scores":{"type":"tensor<float>(cat{})","cells":{"pop":1.0,"rock":0.20000000298023224,"jazz":0.0}}}}]}}
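To check which fields a summary class or YQL field list actually returns, it can help to list the field names of each hit. This is a minimal sketch, assuming the JSON result layout used in the examples above (hits under root.children, summary fields under fields). field_names_per_hit is a hypothetical helper, and the sample response is abridged from the output above.

```python
import json


def field_names_per_hit(response_text):
    """Map each hit id to the sorted names of the summary fields it carries."""
    root = json.loads(response_text).get("root", {})
    return {
        hit["id"]: sorted(hit.get("fields", {}))
        for hit in root.get("children", [])
    }


# Abridged response, following the layout of the examples above.
sample = """
{"root": {"children": [{
    "id": "id:mynamespace:music::a-head-full-of-dreams",
    "relevance": 0.16343879032006284,
    "source": "mycontentcluster",
    "fields": {
        "sddocname": "music",
        "documentid": "id:mynamespace:music::a-head-full-of-dreams",
        "artist": "Coldplay",
        "album": "A Head Full of Dreams"
    }
}]}}
"""

print(field_names_per_hit(sample))
```

Because the YQL field list is only a filter in the container, this shows what reaches the client, not what was fetched from the content nodes.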
Summary field rename
Summary classes may define fields by names not used in the document type:
document-summary rename-summary {
    summary artist_name {
        source: artist
    }
}
Use dynamic to generate dynamic snippets from fields based on the query keywords.
Example from Vespa Documentation Search - see the schema:
document doc {
    field content type string {
        indexing: summary | index
        summary: dynamic
    }
}
A query for document summary returns:
Use document summaries to configure which fields ...
indexing: summary | index } } document-summary titleyear { summary title ...
The example above creates a dynamic summary with the matched terms highlighted.
The latter is called bolding and can be enabled independently of dynamic summaries.
You can configure generation of dynamic snippets by adding an instance of the
vespa.config.search.summary.juniperrc config
in services.xml, inside the <content> tag of the content cluster in question.
Attribute fields are held in memory.
This means summaries are memory-only operations if all fields requested are attributes,
which is the optimal way to get high query throughput.
The other document fields are stored as blobs in the
document store.
Requesting these fields may therefore require a disk access, increasing latency.
Important:
The default summary class will access the document store
as it includes the documentid field
which is stored there.
For maximum query throughput using memory-only access, use a dedicated summary class with attributes only.
When using additional summary classes to increase performance,
only the network data size is changed - the data read from storage is unchanged.
Having "debug" fields with summary enabled will hence also affect the
amount of information that needs to be read from disk.
See query execution for a breakdown of the summary (a.k.a. result processing, rendering) phase:
Getting data across from content nodes to containers.
Deserialization from internal binary formats to Java objects, if the fields are touched in a Searcher.
Serialization to JSON (the default rendering), followed by rendering and network transfer.
The work, and thus latency, increases with more hits.
Use query tracing to analyze performance.