# Document Summaries

Use YQL to select which fields to include in results:

    $ vespa query "select artist, album from music where album contains 'head'"

    {
        "root": {
            "children": [
                {
                    "id": "index:mycontentcluster/0/de97c3f0cf0d1122b3494a44",
                    "relevance": 0.16343879032006284,
                    "source": "mycontentcluster",
                    "fields": {
                        "artist": "Coldplay",
                        "album": "A Head Full of Dreams"
                    }
                }
            ]
        }
    }

In addition to the schema fields, sddocname and documentid can also be selected:

    $ vespa query "select artist, album, documentid, sddocname from music where album contains 'head'"

    {
        "root": {
            "children": [
                {
                    "id": "id:mynamespace:music::a-head-full-of-dreams",
                    "relevance": 0.16343879032006284,
                    "source": "mycontentcluster",
                    "fields": {
                        "sddocname": "music",
                        "documentid": "id:mynamespace:music::a-head-full-of-dreams",
                        "artist": "Coldplay",
                        "album": "A Head Full of Dreams"
                    }
                }
            ]
        }
    }
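As a rough illustration of consuming such a response, the sketch below pulls the selected fields out of each hit. It assumes hits are nested under root.children as in Vespa's default JSON rendering; the helper name hit_fields is hypothetical, and the sample values are copied from the example output.

```python
import json

# Sample response using values from the example output; real responses
# carry additional metadata that is omitted here.
RESPONSE = """
{
  "root": {
    "children": [
      {
        "relevance": 0.16343879032006284,
        "source": "mycontentcluster",
        "fields": {
          "sddocname": "music",
          "artist": "Coldplay",
          "album": "A Head Full of Dreams"
        }
      }
    ]
  }
}
"""


def hit_fields(response_text):
    """Return the "fields" object of every hit under root.children.

    hit_fields is a hypothetical helper, not part of any Vespa client.
    """
    root = json.loads(response_text).get("root", {})
    # Default to an empty list so a response without hits yields [].
    return [hit.get("fields", {}) for hit in root.get("children", [])]


fields = hit_fields(RESPONSE)
print(fields[0]["artist"], "-", fields[0]["album"])
```

Only the fields named in the select statement (or all fields of the summary class, for a wildcard) are expected to appear in each returned dictionary.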

Use * to select all fields. As no summary class is given, the default summary class with all fields is used:

    $ vespa query "select * from music where album contains 'head'"

    {
        "root": {
            "children": [
                {
                    "id": "id:mynamespace:music::a-head-full-of-dreams",
                    "relevance": 0.16343879032006284,
                    "source": "mycontentcluster",
                    "fields": {
                        "sddocname": "music",
                        "documentid": "id:mynamespace:music::a-head-full-of-dreams",
                        "artist": "Coldplay",
                        "album": "A Head Full of Dreams",
                        "year": 2015,
                        "category_scores": {
                            "type": "tensor<float>(cat{})",
                            "cells": {
                                "pop": 1.0,
                                "rock": 0.20000000298023224,
                                "jazz": 0.0
                            }
                        }
                    }
                }
            ]
        }
    }

Query performance depends on the fields returned, see the performance section.

## Summary classes

Use summary classes (see the reference) to simplify the YQL by naming the fields in a class:

    schema music {

        document music {

            field artist type string {
                indexing: summary | index
            }

            field album type string {
                indexing: summary | index
                index: enable-bm25
            }

            field year type int {
                indexing: summary | attribute
            }

            field category_scores type tensor<float>(cat{}) {
                indexing: summary | attribute
            }
        }

        document-summary short-summary {
            summary artist type string {}
            summary album type string {}
        }
    }

Use presentation.summary=[summary name] in queries to select the summary class:

    $ vespa query "select * from music where album contains 'head'" \
      "presentation.summary=short-summary"


The select statement in YQL lists the set of fields to return. Vespa generally makes a best effort to return those fields, and only those fields, unless a wildcard ("*") is given as the argument. The wildcard implies returning the full set of fields included in the given summary class.

In conjunction with YQL statements, the summary class argument defines the set of fields from which the YQL select then chooses a subset. In other words, if the YQL expression is select * … and the summary class argument is short-summary, all the fields in the summary class short-summary will be returned.
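The two request arguments discussed here, the YQL statement and the summary class, can also be sent as HTTP query parameters. The sketch below only builds the request URL. The host and port (localhost:8080) are assumptions about a local deployment, and build_query_url is a hypothetical helper rather than a Vespa client function.

```python
from urllib.parse import urlencode

# Assumption: a Vespa container reachable on localhost:8080.
ENDPOINT = "http://localhost:8080/search/"


def build_query_url(yql, summary=None, endpoint=ENDPOINT):
    """Build a query URL carrying the YQL statement and, optionally,
    the summary class in the presentation.summary parameter."""
    params = {"yql": yql}
    if summary is not None:
        params["presentation.summary"] = summary
    # urlencode percent-encodes the YQL text (spaces, quotes, asterisk).
    return endpoint + "?" + urlencode(params)


url = build_query_url(
    "select * from music where album contains 'head'",
    summary="short-summary",
)
print(url)
```

Leaving out the summary argument omits the presentation.summary parameter, in which case the default summary class applies.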

The default summary class contains all schema fields plus sddocname and documentid.
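The interplay between the select list and the summary class can be modelled in a few lines. This is a simplified illustration of the behaviour described in this section, not Vespa's implementation; the SUMMARY_CLASSES table and the returned_fields function are hypothetical, with field names taken from the music schema.

```python
# Hypothetical model: each summary class is the list of fields it contains.
# "default" follows the description in the text: all schema fields plus
# sddocname and documentid.
SUMMARY_CLASSES = {
    "default": ["sddocname", "documentid", "artist", "album",
                "year", "category_scores"],
    "short-summary": ["artist", "album"],
}


def returned_fields(select, summary="default"):
    """Return the fields a hit would carry under this simplified model.

    select is the list of names in the YQL select clause, or ["*"].
    """
    available = SUMMARY_CLASSES[summary]
    if select == ["*"]:
        # The wildcard returns every field in the summary class.
        return list(available)
    # Otherwise select picks a subset of the summary class.
    return [name for name in select if name in available]


print(returned_fields(["*"], "short-summary"))
print(returned_fields(["artist", "year"], "short-summary"))
```

Under this model, selecting year together with short-summary yields no year field, because the summary class does not contain it.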

### Summary field rename

Use a summary class to give a field another name in query results:

    document-summary rename-summary {
        summary artist_name type string {
            source: artist
        }
    }
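To make the effect of source concrete, the sketch below models a summary class as a mapping from summary field name to source field. It is an illustration of the renaming idea only; RENAME_SUMMARY and render_summary are hypothetical names, and the document values are taken from the music example.

```python
# Hypothetical model of rename-summary: summary field name -> source field.
RENAME_SUMMARY = {"artist_name": "artist"}


def render_summary(document, summary_class):
    """Build the summary fields for one document under this model.

    Each output field takes its value from the configured source field;
    sources missing from the document are skipped.
    """
    return {
        name: document[source]
        for name, source in summary_class.items()
        if source in document
    }


doc = {"artist": "Coldplay", "album": "A Head Full of Dreams", "year": 2015}
print(render_summary(doc, RENAME_SUMMARY))  # prints {'artist_name': 'Coldplay'}
```

The result carries artist_name rather than artist, and no other fields, since the summary class lists only that one field.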


Refer to the schema reference for adding attribute and non-attribute fields - some changes require re-indexing.

## Dynamic summaries

Use dynamic to generate dynamic abstracts of fields, based on query keywords. Example from Vespa Documentation Search - see the schema:

    document doc {

        field content type string {
            indexing: summary | index
            summary: dynamic
        }
    }


A query for document summary returns:

Use document summaries to configure which fields ... indexing: summary | index } } document-summary titleyear { summary title type string ...

The example above creates a dynamic summary with the matched terms highlighted. The latter is called bolding and can be enabled independently of dynamic summaries.

Refer to the reference for the response format.

## Performance

Attribute fields are held in memory. Summaries are therefore memory-only operations if all requested fields are attributes, which is the optimal way to get high query throughput. The other document fields are stored as blobs/records in the document store. Requesting these fields therefore requires disk access, which increases latency.
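A quick way to reason about this rule is to list which requested fields are not attributes. The sketch below does so for the music schema, where attribute status is read off the indexing statements. IS_ATTRIBUTE and fields_from_document_store are hypothetical names, and the built-in fields sddocname and documentid are not modelled.

```python
# Attribute status per field, taken from the indexing statements in the
# music schema: fields with "attribute" are held in memory.
IS_ATTRIBUTE = {
    "artist": False,
    "album": False,
    "year": True,
    "category_scores": True,
}


def fields_from_document_store(requested, is_attribute=IS_ATTRIBUTE):
    """Return the requested fields that are not attributes.

    An empty result means the summary can be served from memory alone,
    according to the rule described above.
    """
    return [name for name in requested if not is_attribute[name]]


print(fields_from_document_store(["year", "category_scores"]))
print(fields_from_document_store(["artist", "year"]))
```

Under this model, a summary class containing only year and category_scores would be memory-only, while short-summary would not, because artist and album are not attributes.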

When using additional summary classes to increase performance, only the network data size is changed - the data read from storage is unchanged. Having "debug" fields with summary enabled will hence also affect the amount of information that needs to be read from disk.

See query execution for context. The summary phase (a.k.a. result processing or rendering) breaks down as follows:

• The document summary latency on the content node, tracked by content_proton_search_protocol_docsum_latency_average.
• Getting data across from content nodes to containers.
• Deserialization from internal binary formats, potentially to Java objects if touched in a Searcher, and finally serialization to JSON (the default rendering), plus rendering and network transfer.

The work, and thus the latency, increases with the number of hits. Use query tracing to analyze performance.

See also the content node summary cache.