Search Definition Reference
This document lists the syntax and content of search definitions, document types and fields. This is a reference; read search definitions first for an overview. Find an example at the end.
There must be at least one search definition (.sd) file containing a search element in an application package.
Syntax
Throughout this document, a string in square brackets represents some argument. The whole string, including the brackets, is replaced by a concrete string in a search definition.
Constructs in search definitions have a regular syntax. Each element starts with the element identifier, possibly followed by the name of this particular occurrence of the element, possibly followed by a space-separated list of interleaved attribute names and attribute values, possibly followed by the element body. Thus, one will find elements of these varieties:
[element-identifier] : [element-body]
[element-identifier] [element-name] : [element-body]
[element-identifier] [element-name] [attribute-name] [attribute-value]
[element-identifier] [element-name] [attribute-name] [attribute-value] { [element-body] }
One-line element values start with a colon and end with a newline. Multiline values (for fields supporting them) are any block of text enclosed in curly brackets. Comments may be inserted anywhere and start with a hash (#). Names are identifiers and must match:
["a"-"z","A"-"Z", "_"]["a"-"z","A"-"Z","0"-"9","_"]*
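The identifier rule above can be checked with a regular expression. The following Python sketch uses the same character classes; the function name is_valid_name is hypothetical and not part of Vespa, and element-specific restrictions (such as struct names not allowing underscores) are not covered:

```python
import re

# Same character classes as the identifier rule: a letter or underscore,
# followed by any number of letters, digits or underscores.
NAME_PATTERN = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_valid_name(name: str) -> bool:
    """Return True if the whole string matches the identifier rule."""
    return NAME_PATTERN.fullmatch(name) is not None


# Example usage (hypothetical names):
# is_valid_name("title")    -> True
# is_valid_name("_private") -> True
# is_valid_name("2field")   -> False (starts with a digit)
# is_valid_name("my-field") -> False (hyphen not allowed)
```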
Elements
A search definition must contain no more than one search clause. The elements, indented to show nesting:
search
    document
        struct
            field
                match
        field
            alias
            attribute
            bolding
            id
            index
            indexing
            indexing-rewrite
            match
            normalizing
            query-command
            rank
            rank-type
            sorting
            stemming
            struct-field
                indexing
                match
                query-command
                struct-field
                    …
                summary
                summary-to (DEPRECATED)
            summary
            summary-to (DEPRECATED)
            weight
            weightedset
        compression
    index
    field
    fieldset
    rank-profile
        match-phase
            attribute
            order
            max-hits
            diversity
                attribute
                min-groups
        first-phase
            keep-rank-count
            rank-score-drop-limit
            expression
        ignore-default-rank-features
        num-threads-per-search
        rank
        rank-type
        rank-features
        constants
        rank-properties
        second-phase
            expression
            rerank-count
        summary-features
    constant
    stemming
    document-summary
        summary
    annotation
        field
    import field
search
The root element of search definitions.
A search definition describes how some data should be stored, indexed, ranked
and presented in results.
A search definition must be defined in a file named [search-definition-name].sd.
search [name] { [body] }
The body is mandatory and may contain:
Name | Description | Occurrence |
---|---|---|
document | A document defined in this search definition | One |
field | A field not contained in the document. Use fields outside documents to derive new field values to be placed in the indexing structure from document fields | Zero to many |
fieldset | Group document fields together for searching | Zero to many |
rank-profile | An explicitly defined set of ranking settings | Zero to many |
constant | A constant tensor located in a file used for ranking | Zero to many |
stemming | The default stemming setting. Not applicable to streaming search | Zero or one |
document-summary | An explicitly defined document summary | Zero to many |
annotation | Defines an annotation type | Zero to many |
import field | Import a field value from a global document | Zero to many |
document
Contained in search.
Describes a document type. This can also be the root of the search definition,
if the document is not to be searched directly.
A document type may inherit the fields of one or more other document types.
If no document types are explicitly inherited,
the document inherits the generic document
type.
document [name] inherits [name-list] { [body] }
The document name is optional; it defaults to the name of the containing search element. If there is no containing search element, the document name is required.
The inherits attribute is optional and takes as its value a comma-separated list of names of other document types.
The body of a document type is optional and may contain:
Name | Description | Occurrence |
---|---|---|
struct | A struct type definition for this document. | Zero to many |
field | A field of this document. | Zero to many |
compression | Specifies compression options for documents of this document type in storage. | Zero to one |
struct
Contained in document.
Defines a composite type.
A struct consists of zero or more fields that the user can access together as one.
The struct has to be defined before it is used as a type in a field specification.
struct [name] { [body] }
The struct name cannot contain underscores.
Note that struct types are supported differently in indexed search and streaming search mode. Take a look at struct type, struct array type and map type for more details.
The body of a struct is optional and may contain:
Name | Description | Occurrence |
---|---|---|
field | A field of this struct. | Zero to many |
field
Contained in search, document, struct or annotation.
Defines a named value with a type and (optionally) how this field
should be stored, indexed, searched, presented and how it should influence ranking.
field [name] type [type-name] { [body] }
Do not use names that are used for other purposes in the indexing language or elsewhere in the search definition file. Reserved names are:
- attribute
- body
- case
- context
- documentid
- else
- header
- hit
- host
- if
- index
- position
- reference
- relevancy
- sddocname
- summary
- switch
- tokenize
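A minimal Python sketch of a pre-deployment check for field names, combining the identifier rule from the Syntax section with the reserved names listed above. The helper name field_name_problems is hypothetical, and the sketch assumes reserved names are compared case-sensitively, which this reference does not state:

```python
import re
from typing import List

# Reserved names as listed in this reference.
RESERVED_FIELD_NAMES = frozenset({
    "attribute", "body", "case", "context", "documentid", "else",
    "header", "hit", "host", "if", "index", "position", "reference",
    "relevancy", "sddocname", "summary", "switch", "tokenize",
})

# Identifier rule from the Syntax section.
IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def field_name_problems(name: str) -> List[str]:
    """Return the problems found with a proposed field name (empty if none)."""
    problems: List[str] = []
    if IDENTIFIER.fullmatch(name) is None:
        problems.append("not a valid identifier")
    if name in RESERVED_FIELD_NAMES:  # assumption: case-sensitive comparison
        problems.append("reserved name")
    return problems
```

For example, field_name_problems("summary") reports a reserved name, while field_name_problems("title") returns an empty list.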
The type attribute is mandatory - see field type for details and indexing restrictions. Supported types:
Name | Singular/Multi | Type |
---|---|---|
annotationreference<annotationtype> | singlevalue | Reference to a string annotation |
array<type> | multivalue | Array of type |
weightedset<element-type> | multivalue | Like array, but each element is also assigned an integer weight |
bool | singlevalue | true or false |
byte | singlevalue | Signed 8-bit integer |
double | singlevalue | 64-bit IEEE 754 floating point |
float | singlevalue | 32-bit IEEE 754 floating point |
int | singlevalue | Signed 32-bit integer |
long | singlevalue | Signed 64-bit integer |
position | singlevalue | Position in geographical coordinates, e.g. latitude and longitude |
predicate | singlevalue | Boolean expression in predicate logic |
raw | singlevalue | Binary data |
string | singlevalue | Text |
structname | singlevalue | Declares a field with a specific struct type, given by the struct name |
map<key-type,value-type> | multivalue | Map using the given types as keys and values. Keys and values can be any type |
tensor(dimension-1,...,dimension-N) | multivalue | Tensor with a set of named dimensions and a set of values located in the space of those dimensions |
uri | singlevalue | Uniform Resource Identifier (a URL or any other unique string id) |
reference<document-type> | singlevalue | Reference to an instance of a document-type used in a parent-child relationship |
The body of a field is optional for search, document and struct, and disallowed for annotation. It may contain the following elements:
Name | Description | Occurrence |
---|---|---|
alias | Make an index or attribute available in searches under an additional name | Zero to many |
attribute | Specify an attribute setting. | Zero to many |
bolding | Specifies whether content of this field should be bolded. | Zero to one |
id | Explicitly decide the numerical id of this field. Is normally not necessary, but can be used to save some disk space. | Zero to one |
index | Specify a parameter of an index. Not applicable to streaming search | Zero to many |
indexing | The indexing statements used to create index structure additions from this field. | Zero to one |
indexing-rewrite | Determines the rewriting Vespa is allowed to do on the indexing statements of this field. Not applicable to streaming search | Zero to one |
match | Set the matching type to use for this field. | Zero to one |
normalizing | Specifies the kind of spelling normalizing to do on this field. | Zero or one. |
query-command | Specifies a command which can be received by a plugin searcher in the Search Container. | Zero to many |
rank | The high level ranking method to use for the field | Zero or one |
rank-type | Selects the set of low-level rank settings to be used for this field when using default nativeRank. | Zero to one |
sorting | The sort specification for this field. | Zero or one. |
stemming | Specifies the kind of stemming to use for this field. Not applicable to streaming search | Zero or one. |
struct-field | A subfield of a field of type struct. The struct must have been defined to contain this subfield in the struct definition. If you want the subfield to be handled differently from the rest of the struct, you may specify it within the body of the struct-field. | Zero to many. |
summary | Sets a summary setting of this field; set to dynamic to make a dynamic summary. | Zero to many |
summary-to | DEPRECATED. The list of document summary names this field should be included in. Not applicable to streaming search. Instead of using this element, declare non-standard summaries in a document-summary element outside of the document declaration. | Zero to one |
weight | The importance of a term boost field, a positive integer. | Zero to one |
weightedset | Attributes of a weighted set type. | Zero to one |
If the field is part of a struct definition, i.e. contained in the struct element, only match may be specified.
If the field is of type struct, only indexing, match and query-command may be specified.
A field declared outside of a document element (i.e. immediately within a search element) is referred to as an extra-field. Such fields cannot be set directly, neither programmatically nor through a feed; doing so will cause the document to be rejected by the indexer.
Extra-fields can only be populated using indexing statements that input the value of proper document fields (e.g. indexing: input my_document_field | normalize | summary | index).
struct-field
Contained in field or struct-field.
Defines how this struct field (a subfield of a struct) should be stored,
indexed, searched, presented and how it should influence ranking.
The field in which this struct field is contained must be of
type struct or a collection of type struct.
Note that struct fields are supported differently in indexed search and streaming search; see the Supported in column of the table below.
struct-field [name] { [body] }
The body of a struct field is optional and may contain the following elements:
Name | Description | Supported in | Occurrence |
---|---|---|---|
indexing | The indexing statements used to create index structure additions from this field. For indexed search, only attribute is supported, which makes the struct field a searchable in-memory attribute. For streaming search, only index and summary are supported. | Indexed and streaming | Zero to one |
attribute | Specifies an attribute setting. | Indexed | Zero to many |
match | Set the matching type to use for this field. | Streaming | Zero to one |
query-command | Specifies a command which can be received by a plugin searcher in the Search Container. | Streaming | Zero to many |
struct-field | A subfield of a field of type struct. The struct must have been defined to contain this subfield in the struct definition. If you want the subfield to be handled differently from the rest of the struct, you may specify it within the body of the struct-field. | Streaming | Zero to many. |
summary | Sets a summary setting of this field; set to dynamic to make a dynamic summary. | Streaming | Zero to many |
summary-to | DEPRECATED The list of document summary names this should be included in. | Streaming | Zero to one |
If this struct field is itself of type struct, only indexing, match and query-command may be specified.
fieldset
Contained in search.
Note: this is not related to the Document fieldset.
Fieldsets provide a way to group fields together so that multiple fields can be searched at once. Example:
fieldset myfieldset { fields: a,b,c }
Using the query
yql=select+*+from+sources+*+where+myfieldset+contains+"foo"%3B
will return all documents for which one or more of the fields a, b or c contain "foo".
If the field set is named 'default', its fields are searched without specifying the field set in unstructured queries: query=foo.
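The field set behavior described above amounts to an OR over the member fields. The following Python sketch is a simplified model, not Vespa's matching code: the function name is hypothetical, and it assumes lowercase, whitespace-based tokenization while ignoring stemming, normalization and attribute (exact) matching:

```python
from typing import Dict, List


def matches_fieldset(document: Dict[str, str], fieldset: List[str], term: str) -> bool:
    """Return True if any field in the field set contains the term as a word.

    Simplified model: tokens come from lowercasing and splitting on
    whitespace; real indexing also normalizes and may stem.
    """
    needle = term.lower()
    for field in fieldset:
        tokens = document.get(field, "").lower().split()
        if needle in tokens:
            return True
    return False


doc = {"a": "Foo fighters", "b": "bar", "c": ""}
print(matches_fieldset(doc, ["a", "b", "c"], "foo"))  # True: field a contains foo
print(matches_fieldset(doc, ["b", "c"], "foo"))       # False
```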
The fields making up the field set should be as similar as possible in terms of indexing clause, matching etc. If they are not, test the application thoroughly. For example, it will work for a mix of attributes and indexes, but the matching for attribute fields will always be exact unless in streaming mode.
If specific match settings are needed for the field set, such as exact, specify them using a match clause:
fieldset myfieldset { fields: a,b,c match { exact } }
Use query-commands in the field set to set search settings. Example:
fieldset myfieldset { fields: a,b,c query-command:"exact @@" }
compression
DEPRECATED - DO NOT USE - see deprecations.
Contained in document. If a compression level is set within this element, LZ4 compression is enabled for whole documents.
compression { [body] }
The body of a compression specification is optional and may contain:
Name | Description | Occurrence |
---|---|---|
type | LZ4 is the only valid compression method. | Zero to one |
level | Enable compression. LZ4 is linear and 9 means HC (high compression). | Zero to one |
threshold | A percentage (multiplied by 100) giving the maximum size that compressed data can have to keep the compressed value. If the resulting compressed data is larger than this, the document will be stored uncompressed. Default value is 95. | Zero to one |
rank-profile
Contained in search.
A rank profile is a named set of rank settings which can be specified
during queries (see the ranking
parameter in the
search API).
Rank profiles are used to specify an alternative ranking of the same data for different purposes, and to experiment with new rank settings. If no explicit rank profile is specified, one called "default" is implicitly created to hold the rank settings from each field. The "default" rank profile is always selected for queries that do not specify one. It is possible to add additional settings to the default rank profile by explicitly defining it.
rank-profile [name] inherits [rank-profile] { [body] }
The inherits attribute is optional. If defined, it contains the name of one other rank profile in the same search definition. Values not defined in this rank profile are then inherited from that profile. It is possible to inherit the default rank profile, even if it is not explicitly defined.
In addition to the default
rank profile, a profile named unranked
is implicitly created.
This rank-profile makes sure that the rank phases in the search backend are skipped and
should be used for queries that only require matching and do not use ranking.
If you are sorting on something different than rank score this is also the profile to use.
Note that this profile should not be used if the query contains Wand
search operators.
Also note that using this profile will give better performance as the rank phases are skipped.
The body of a rank-profile may contain:
Name | Description | Occurrence |
---|---|---|
match-phase | Ranking configuration to be used for hit limitation during matching. | Zero or one |
first-phase | The ranking config to be used for first-phase ranking. | Zero or one |
rank-features | The rank features to be dumped when using the query-argument rankfeatures. | Zero or more |
second-phase | The ranking config to be used for second-phase ranking. | Zero or one |
summary-features | The rank features to be dumped for all queries. | Zero or more |
ignore-default-rank-features | Do not dump the default set of rank features, only those explicitly specified with the rank-features command. | Zero or one |
num-threads-per-search | Overrides the global persearch threads to a lower value. | Zero or one |
constants | List of constant key/value pairs available in ranking expressions. | Zero or one |
rank-properties | List of any rank property key-values to be used by rank features. | Zero or one |
function [name] | Define named functions that can be referenced during ranking phase(s) and (if without arguments) as part of the summary-features. | Zero or more |
rank | The high level ranking method to use for a field in this profile. | Zero or more |
rank-type | The rank type of a field in this profile. | Zero or more |
match-phase
Contained in rank-profile.
The config specifying ranking to be used during matching.
This is used to limit the result set in order to cut latency.
It is particularly useful if the first-phase ranking is expensive.
It can be used for sorting on numeric values to limit the evaluated result set.
match-phase { attribute: [numeric single value attribute] order: [ascending | descending] max-hits: [integer] diversity }
Name | Description |
---|---|
attribute | Which attribute to use as the quality signal. The attribute referenced must be a single valued numeric attribute with fast-search enabled. No default. |
order | Whether the attribute should be used in descending order (prefer documents with a high score) or ascending order (prefer documents with a low value in the attribute). Usually it is not necessary to specify this, as the default value descending is by far the most common. |
max-hits | Requested hits per search node. Usually a number like 10000 works well here. |
diversity | Guarantee a minimum result set diversity. |
diversity
Contained in match-phase.
Diversity is used to specify diversity in different phases; it is supported in match-phase. It is used to guarantee a minimum result set diversity.
Specify the name of an attribute that will be used to provide diversity. Result sets from this phase are guaranteed to get at least min-groups unique values from the diversity attribute.
A document is considered as a candidate if:
- The query has not yet reached the max-hits number produced from this phase.
- The query has not yet reached the max number of candidates in one group. This is computed by the max-hits of the phase divided by min-groups.
diversity { attribute: [numeric attribute] min-groups: [integer] }
Name | Description |
---|---|
attribute | Which attribute to use when deciding diversity. The attribute referenced must be a single valued numeric or string attribute. |
min-groups | Specifies the minimum number of groups returned from the phase. Using this with match-phase often means one can reduce max-hits. |
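The two candidate conditions can be sketched in Python as a single pass over hits that are already ordered by the match-phase attribute. This is a simplified model of the documented rule, not Vespa's implementation: the function name is hypothetical, and it assumes the per-group limit is max-hits divided by min-groups using integer division, since this reference does not specify rounding:

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def select_diverse_candidates(
    hits: Iterable[Tuple[str, Hashable]], max_hits: int, min_groups: int
) -> List[str]:
    """Apply the two candidate conditions to hits given best-first.

    hits: (document id, diversity attribute value) pairs, already ordered
    by the match-phase attribute.
    """
    per_group_limit = max_hits // min_groups  # assumption: integer division
    accepted: List[str] = []
    per_group: Counter = Counter()
    for doc_id, group in hits:
        if len(accepted) >= max_hits:
            break  # condition 1: the phase has produced max-hits documents
        if per_group[group] >= per_group_limit:
            continue  # condition 2: this group already has its maximum
        per_group[group] += 1
        accepted.append(doc_id)
    return accepted


hits = [("d1", "x"), ("d2", "x"), ("d3", "x"), ("d4", "y"), ("d5", "z"), ("d6", "y")]
print(select_diverse_candidates(hits, max_hits=4, min_groups=2))  # ['d1', 'd2', 'd4', 'd5']
```

Here d3 is skipped because group x has already reached its limit of 4 // 2 = 2 candidates, which leaves room for documents from groups y and z.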
first-phase
Contained in rank-profile.
The config specifying the first phase of ranking. This is the initial ranking performed on all hits, so you should avoid doing heavy rank calculations here. By default, this uses the ranking feature nativeRank.
first-phase { [body] }
The body of a first-phase ranking statement consists of:
Name | Description |
---|---|
expression | Specify the ranking expression to be used for first phase of ranking - see ranking expressions. |
keep-rank-count | How many documents to keep the first phase top rank values for. Default value is 10000. |
rank-score-drop-limit | Drop all hits with a first phase rank score less than or equal to this floating point number. Use this to implement a rank cutoff. Default is -Double.MAX_VALUE. |
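The rank-score-drop-limit rule can be sketched in Python as a filter over (document id, first-phase score) pairs. This only models the documented comparison, where hits scoring less than or equal to the limit are dropped; the function name is hypothetical:

```python
import sys
from typing import List, Tuple

# Default limit: -Double.MAX_VALUE, i.e. the most negative finite double.
DEFAULT_DROP_LIMIT = -sys.float_info.max


def apply_rank_score_drop_limit(
    hits: List[Tuple[str, float]], limit: float = DEFAULT_DROP_LIMIT
) -> List[Tuple[str, float]]:
    """Keep only hits whose first-phase score is strictly greater than limit."""
    return [(doc_id, score) for doc_id, score in hits if score > limit]


hits = [("a", 0.5), ("b", 0.2), ("c", 0.9)]
print(apply_rank_score_drop_limit(hits, 0.5))  # [('c', 0.9)]: 0.5 itself is dropped
```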
expression
Contained in first-phase or second-phase.
Specify a ranking expression. The expression can either be written directly or loaded from a file. When writing it directly, the syntax is:
expression: [ranking expression]
or
expression { [ranking expression] [ranking expression] [ranking expression] }
The second format is primarily a convenience feature when using long expressions, enabling them to be split over multiple lines.
Expressions can also be loaded from a separate file. This is useful when dealing with the very long expressions generated by e.g. MLR. The syntax is:
expression: file:[path-to-expressionfile]
The path is relative to the location of the search definition file (note: directories are not allowed in the path). The file itself must end with .expression, but this suffix is optional in the sd-file. Therefore expression: file:mlrranking.expression and expression: file:mlrranking are identical. Both refer to a file called mlrranking.expression in the search definition directory.
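The file-reference rules above can be sketched as a small path-resolution function. The function name and the directory name used in the example are hypothetical; the sketch only encodes the three documented rules (no directories, optional .expression suffix, path relative to the sd-file):

```python
from pathlib import PurePosixPath

SUFFIX = ".expression"


def resolve_expression_file(sd_file: str, reference: str) -> PurePosixPath:
    """Map the argument of `expression: file:<reference>` to a file path.

    - directories are rejected, since the path may not contain them
    - the .expression suffix is appended when it is omitted
    - the result is relative to the directory of the .sd file
    """
    if "/" in reference or "\\" in reference:
        raise ValueError(f"directories are not allowed: {reference!r}")
    name = reference if reference.endswith(SUFFIX) else reference + SUFFIX
    return PurePosixPath(sd_file).parent / name


# Both forms resolve to the same file (hypothetical directory name):
print(resolve_expression_file("searchdefinitions/music.sd", "mlrranking"))
print(resolve_expression_file("searchdefinitions/music.sd", "mlrranking.expression"))
```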
rank-features
Contained in rank-profile.
List of extra rank features to be dumped when using the query-argument rankfeatures.
rank-features: [feature] [feature]
or
rank-features { [feature] [feature] }
Any number of ranking features can be listed on each line, separated by space.
constants
Contained in rank-profile.
List of constants available in ranking expressions, resolved and optimized at configuration time.
constants { key: value }
Name | Description |
---|---|
key | Name of the constant. |
value | A number or any string. Must be quoted if it contains spacing. |
rank-properties
Contained in rank-profile.
List of generic properties, in the form of key/value pairs to be used by ranking features.
rank-properties { key: value }
Name | Description |
---|---|
key | Name of the property. |
value | A number or any string. Must be quoted if it contains spacing. |
function (inline)? [name]
Contained in rank-profile.
Define a named function that can be referenced as a part of the ranking expression,
or (if having no arguments) as a feature.
A function accepts any number of arguments.
function [name]([arg1], [arg2], [arg3]) { expression: … }
or
function [name]([arg1], [arg2], [arg3]) { expression { [ranking expression] [ranking expression] … } }
Note that the parentheses are required after the name. A rank-profile example is shown below:
rank-profile default inherits default {
    function myfeature() {
        expression: fieldMatch(title) + freshness(timestamp)
    }
    function otherfeature(foo) {
        expression {
            nativeRank(foo, body)
        }
    }
    first-phase {
        expression: myfeature * 10
    }
    second-phase {
        expression: otherfeature(title) * myfeature
    }
    summary-features: myfeature
}
Functions that accept arguments cannot be included in summary-features.
Adding the inline
modifier will inline this function in the calling expression
if it also has no arguments.
This is faster for very small and cheap functions (and more expensive for others).
second-phase
Contained in rank-profile.
The config specifying the second phase of ranking. This is the optional reranking performed on the best hits from the first phase, and where you should put any advanced ranking calculations (e.g. MLR).
By default, no second-phase ranking is performed.
In streaming search, second-phase ranking is performed on all hits. You can therefore put all the rank calculations in the first-phase rank expression and skip the second phase.
second-phase { [body] }
The body of a second-phase ranking statement consists of:
Name | Description |
---|---|
expression | Specify the ranking expression to be used for second phase of ranking (for a description, see the ranking expression documentation). |
rerank-count | Optional argument. Specifies the number of hits to be reranked. Default value is 100 |
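The interplay between first-phase ranking and rerank-count can be sketched in Python with plain scoring functions. This is a simplified model under a stated assumption: reranked hits are always placed ahead of the hits that were not reranked, and Vespa's exact handling of scores across that boundary is not modeled. The function name is hypothetical. Passing a rerank_count at least as large as the number of hits corresponds to the streaming search behavior described above:

```python
from typing import Callable, List, TypeVar

Hit = TypeVar("Hit")


def two_phase_rank(
    hits: List[Hit],
    first_phase: Callable[[Hit], float],
    second_phase: Callable[[Hit], float],
    rerank_count: int = 100,
) -> List[Hit]:
    """Order hits by first-phase score, then rerank the best rerank_count
    hits by second-phase score. Simplification: reranked hits always come
    before the hits that were not reranked."""
    by_first = sorted(hits, key=first_phase, reverse=True)
    head, tail = by_first[:rerank_count], by_first[rerank_count:]
    return sorted(head, key=second_phase, reverse=True) + tail


first = {"a": 3.0, "b": 2.0, "c": 1.0}.get
second = {"a": 1.0, "b": 5.0, "c": 100.0}.get
print(two_phase_rank(["a", "b", "c"], first, second, rerank_count=2))  # ['b', 'a', 'c']
```

Hit c would win the second phase, but it is not reranked because it falls outside the top two first-phase hits.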
summary-features
Contained in rank-profile.
List of rank features to be dumped for every query. Using many items will have a performance impact; a larger list, returned only when requested, can be specified in rank-features.
summary-features: [feature] [feature]…
or
summary-features { [feature] [feature] }
Any number of ranking features can be listed on each line, separated by space.
constant
Contained in search.
This defines a named constant tensor located in a file with a given type that can be used in ranking expressions via the rank feature constant.
A constant with a given name is defined as follows:
constant [name] { [body] }
The body of a constant must contain:
Name | Description | Occurrence |
---|---|---|
file | Path to the location of the file containing the constant tensor. The path is relative to the root of the application package containing this sd-file. The format of the file is JSON and is the same as when specifying a tensor field in a document put or update. Refer to the Document JSON Format for reference. Compression is supported - if the filename ends with ".json.lz4", Vespa assumes the tensor is LZ4 compressed. | One |
type | The type of the constant tensor, refer to tensor-type-spec for reference. | One |
constant my_constant_tensor {
    file: constants/my_constant_tensor_file.json
    type: tensor<float>(x{},y{})
}
This example has a constant tensor with two mapped dimensions, x and y.
An example JSON file with such a tensor constant:
{
    "cells": [
        { "address": { "x": "a", "y": "b" }, "value": 2.0 },
        { "address": { "x": "c", "y": "d" }, "value": 3.0 }
    ]
}
When an application with tensor constants is deployed, the files are distributed to the content nodes before the new configuration is used by the search nodes. Incremental changes to constant tensors are not supported. When a constant changes, either replace the old file with a new one and redeploy the application, or create a new constant with a new name in a new file.
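Before deploying, it can be useful to sanity-check a constant file. The following Python sketch parses the "cells" JSON form shown above into a dictionary keyed by address. The function name is hypothetical; it does not validate cells against the declared tensor type and does not handle LZ4-compressed ".json.lz4" files:

```python
import json
from typing import Dict, Tuple

Address = Tuple[Tuple[str, str], ...]


def load_mapped_tensor(text: str) -> Dict[Address, float]:
    """Parse the "cells" JSON form into {address: value}.

    Each address is stored as sorted (dimension, label) pairs so that
    lookups do not depend on the key order used in the file.
    """
    cells = json.loads(text)["cells"]
    return {
        tuple(sorted(cell["address"].items())): float(cell["value"])
        for cell in cells
    }


example = """{ "cells": [
  { "address": { "x": "a", "y": "b" }, "value": 2.0 },
  { "address": { "x": "c", "y": "d" }, "value": 3.0 } ] }"""
tensor = load_mapped_tensor(example)
print(tensor[(("x", "a"), ("y", "b"))])  # 2.0
```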
document-summary
Contained in search.
An explicitly defined document summary. By default, a document summary named default is created. Using this element, other document summaries containing a different set of fields can be created.
document-summary [name] { [body] }
The body of a document summary consists of:
Name | Description | Occurrence |
---|---|---|
from-disk | Marks this summary as accessing fields on disk | Zero or one |
summary | A summary field in this document summary. | Zero to many |
stemming
Contained in field, search or index.
Sets how to stem a field or an index, or how to stem by default.
Read more on stemming.
stemming: [stemming-type]
The stemming types are:
Type | Description |
---|---|
none | No stemming: Keep words unchanged |
best | Use the 'best' stem of each word according to some heuristic scoring. This is the default setting |
shortest | Use the shortest stem of each word |
multiple | Use multiple stems. Retains all stems returned from the linguistics library |
normalizing
Contained in field.
Sets normalizing to be done on this field.
Default is to normalize.
normalizing: [normalizing-type]
Type | Description |
---|---|
none | No normalizing. |
alias
Contained in attribute, field or index.
Makes an index or attribute available under an additional name:
alias [index/attr-name]: [alias]
If the index/attribute name is skipped, the containing field or index name is used. Alias names can be any name string; dots are allowed as well.
attribute
Contained in field or struct-field.
Specifies a property of an index structure attribute:
attribute [attribute-name]: [property]
or
attribute [attribute-name] { [property] [property] … }
Read the introduction to attributes. The attribute name can be skipped, in which case the field name is used. See also the actions required when adding or modifying attributes. The following properties are available:
Property | Description |
---|---|
fast-search | Create a B-tree index with B-tree posting lists for the attribute. This speeds up search in the attribute, trading off memory. |
fast-access | If searchable-copies < redundancy, use fast-access to load the attribute in memory on all nodes with a document replica. Use this for fast access when doing partial updates and when used in a selection expression for garbage collection. If redundancy == searchable-copies (default), this property is a no-op. |
alias | An alias for the attribute. Add an attribute name before the colon to specify an alias for another attribute than the one given by field name. |
sorting | The sort specification for this attribute. |
Note that normalizing and tokenization are not enabled by default for attribute fields, so queries in attribute fields are not normalized. Use index on fields to enable them. Both index and attribute can be set on a field.
sorting
Contained in attribute or field.
Specifies how sorting should be done.
sorting: [property]
or
sorting { [property] [property] … }
Property | Description |
---|---|
order | Either ascending or descending. Default is ascending. Used unless overridden in sortspec in query. |
function | The sort function to be used. Implemented functions are raw, lowercase and uca. The default is uca, but note that if no language or locale is specified in the query sortspec, the field, or generally for the query, lowercase will be used instead. Used unless overridden in sortspec in query. |
strength | Sort strength to be used. Implemented levels are primary, secondary, tertiary, quaternary and identical. The default is primary. Used unless overridden in sortspec in query. Only applicable if function is set to uca. |
locale | Locale to be used. The default is none, indicating that it is inferred from the query. It should only be set here if the attribute is filled with data in one language only. Used unless overridden in sortspec in query. Only applicable if function is set to uca. |
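The function property and its uca fallback can be illustrated with a small Python model. This is a sketch under loudly stated assumptions, not Vespa's sorting code: raw is modeled as comparing UTF-8 bytes, lowercase as comparing str.lower() values with ties keeping their input order, and uca is not modeled at all because it requires a collation library such as ICU. The function names are hypothetical:

```python
from typing import Callable, Dict, List, Optional


def effective_sort_function(function: str = "uca", locale: Optional[str] = None) -> str:
    """uca is the default, but without a known language or locale,
    lowercase is used instead."""
    if function == "uca" and not locale:
        return "lowercase"
    return function


def _raw_key(value: str) -> bytes:
    return value.encode("utf-8")  # assumption: raw means byte-wise order


SORT_KEYS: Dict[str, Callable[[str], object]] = {
    "raw": _raw_key,
    "lowercase": str.lower,  # case-insensitive; ties keep their input order
}


def sort_strings(
    values: List[str],
    function: str = "uca",
    order: str = "ascending",
    locale: Optional[str] = None,
) -> List[str]:
    key = SORT_KEYS.get(effective_sort_function(function, locale))
    if key is None:
        raise NotImplementedError("uca collation needs a library such as ICU")
    return sorted(values, key=key, reverse=(order == "descending"))


print(sort_strings(["b", "A", "a", "B"], function="raw"))  # ['A', 'B', 'a', 'b']
print(sort_strings(["b", "A", "a", "B"]))  # ['A', 'a', 'b', 'B'] (uca falls back to lowercase)
```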
bolding
Contained in field.
Highlight matching query terms in the summary:
bolding: on
Not applicable to streaming search; use summary: dynamic instead.
The default is no bolding; set bolding: on to enable it. Note that this setting is overridden by summary: dynamic: if both are specified, bolding is ignored. The difference between bolding and summary: dynamic is that the latter provides a dynamic abstract in addition to highlighting search terms, while the former only does highlighting.
The default XML element used to highlight the search terms is <hi>. To override it, set the container.qr-searchers configuration. Example using <strong>:
<container>
  <search>
    <config name="container.qr-searchers">
      <tag>
        <bold>
          <open>&lt;strong&gt;</open>
          <close>&lt;/strong&gt;</close>
        </bold>
        <separator>...</separator>
      </tag>
    </config>
  </search>
</container>
id
Contained in field.
Sets the numerical id of this field.
All fields have a document-internal id for transfer and storage. IDs are usually determined programmatically as a 31-bit number. Some storage and transfer space can be saved by instead explicitly setting IDs to a 7-bit number.
id: [positive integer]
An id must satisfy these requirements:
- Must be a positive integer
- Must be less than 100 or larger than 127
- Must be unique within the document and all documents this document inherits
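The three id requirements above can be checked mechanically. The following Python sketch is a hypothetical helper, not part of Vespa; it expects a mapping from field name to explicit id that already includes the fields of every inherited document, since uniqueness is required across all of them:

```python
from typing import Dict, List


def validate_field_ids(ids: Dict[str, int]) -> List[str]:
    """Check explicit field ids; return a list of error messages (empty if valid)."""
    errors: List[str] = []
    owner: Dict[int, str] = {}
    for field, field_id in ids.items():
        if field_id <= 0:
            errors.append(f"{field}: id must be a positive integer")
        elif 100 <= field_id <= 127:
            # valid ids are below 100 or above 127
            errors.append(f"{field}: id {field_id} is in the disallowed range 100-127")
        if field_id in owner:
            errors.append(f"{field}: id {field_id} already used by {owner[field_id]}")
        else:
            owner[field_id] = field
    return errors


print(validate_field_ids({"title": 1, "body": 2, "year": 128}))  # []
print(validate_field_ids({"a": 5, "b": 5}))  # ['b: id 5 already used by a']
```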
index
Contained in field or search.
Sets index parameters. Content in fields with index is normalized and tokenized by default.
This element can be single- or multivalued:
index [index-name]: [property]
or
index [index-name] { [property] [property] … }
The index name can be skipped inside fields, causing the index name to be the field name. Parameters:
Property | Description | Occurrence |
---|---|---|
alias | Specify an alias to this index to be available in searches. | Zero to many |
stemming | Set the stemming of this index. Indexes without a stemming setting get their stemming setting from the fields added to the index. Setting this explicitly is useful if fields with conflicting stemming settings are added to this index. | Zero to one |
arity | Set the arity value for a predicate field. The data type for the containing field must be predicate. | One (mandatory for predicate fields), else zero. |
lower-bound | Set the lower bound value for a predicate field. The data type for the containing field must be predicate. | Zero to one. |
upper-bound | Set the upper bound value for predicate fields. The data type for the containing field must be predicate. | Zero to one. |
dense-posting-list-threshold | Set the dense posting list threshold value for predicate fields. The data type for the containing field must be predicate. | Zero to one. |
enable-bm25 | Enable this index field to be used with the bm25 rank feature. This creates posting lists for the indexes for this field that have interleaved features in the document id streams. This makes it very fast to compute the bm25 score. | Zero to one. |
indexing
Contained in field or struct-field.
One or more Indexing Language instructions used to produce index, attribute and summary data from this field. Indexing instructions have pipeline semantics similar to unix shell commands. The value of the field enters the pipeline during indexing, and the pipeline puts the value into the desired index structures, possibly doing transformations and pulling in other values along the way.
indexing: [indexing-statement]
or
indexing { [indexing-statement]; [indexing-statement]; … }
If the field containing this is defined outside the document, it must start with an indexing statement which outputs a value (either "input [fieldname]" to fetch a field value, or a literal, e.g. "some-value"). Fields in documents use the value of the enclosing field as input (input [fieldname]) if none is explicitly provided.
Specify the operations separated by the pipe (|) character. For advanced processing needs, use the indexing language, or write a document processor.
Supported expressions for fields are:
Expression | Description |
---|---|
attribute | Attribute is used to make a field available for sorting, grouping, ranking and searching using match mode word. |
index | Creates a searchable index for the values of this field. All strings are lower-cased before being stored in the index. By default, the index name is the same as the name of the search definition field. Use a fieldset to combine fields in the same set for searching. |
set_language | Sets the document language - see details. |
summary | Includes the value of this field in a summary field. Modify summary output by using summary: (e.g. to generate dynamic teasers). |
When combining both index and attribute in the indexing statement for a field, e.g. indexing: summary | attribute | index, the match mode becomes text for the field. Searches in this field will then search the index, not the contents of the attribute.
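A minimal sketch contrasting a document field with a field defined outside the document. The field names title and title_lowercase are hypothetical, and the sketch assumes the lowercase expression of the indexing language. The outer field has no implicit input, so its statement begins with input title.

```
search example {
    document example {
        field title type string {
            # Inside the document, the field value is the implicit input
            indexing: summary | index
        }
    }
    # Outside the document: the statement must begin with an expression that outputs a value
    field title_lowercase type string {
        indexing: input title | lowercase | attribute
    }
}
```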
indexing-rewrite
Contained in field.
Vespa will normally rewrite indexing statements extensively to implement the technical tasks required to carry out the intentions of the indexing statement. The rewriting can be controlled using this element.
indexing-rewrite: none
Include this to let an indexing statement pass through unaltered. Note that such statements must begin with an input <fieldname>, get_var or constant expression. Only use this if you understand which rewrites Vespa does and are certain that your indexing statement can do without them. This statement must be placed somewhere below the indexing statement in the field.
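As an illustration only, the sketch below spells out by hand the indexing script that Vespa generates for an indexed string field f1 (taken from the vespa-deploy prepare output shown under "Changes that require re-feed") and turns off further rewriting. Whether this exact script is sufficient without any rewrites in a given Vespa version is an assumption to verify before use.

```
search test {
    document test {
        field f1 type string {
            # Starts with an input expression, as required when rewriting is off
            indexing: input f1 | tokenize normalize stem:"SHORTEST" | index f1 | summary f1
            # Must be placed below the indexing statement
            indexing-rewrite: none
        }
    }
}
```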
match
Contained in field, fieldset or struct-field.
Sets the matching method to use for this field to something other than the default token matching.
match: [property]
or
match { [property] [property] … }
Whether the match type is text, word or exact, all term matching will be done after normalization and locale independent lowercasing (in that order). Also see search using regular expressions.
Property | Valid with | Description |
---|---|---|
text | Indexes, streaming | Default for indexes. Can not be combined with exact matching. The field is matched per token. |
exact | Indexes, attributes, streaming | Can not be combined with text matching. The field is matched exactly: Strings containing any characters whatsoever will be indexed and matched as-is. In queries, the exact match string ends at the exact match terminator (below). A field with … |
exact-terminator | Indexes, attributes, streaming | Only valid for exact matching. Sets the exact match terminator; the default is @@. Example: use match { exact exact-terminator: "@%" } on a field called tag to make the query tag:a b c!@% match documents with the string a b c! Example using the default terminator: the query someword AND (tag:!*!@@ OR tag:(kanoo)@@) matches documents with someword and either !*! or (kanoo) as a tag. Note that without the @@ terminating the second tag string, the second tag value would be (kanoo)). |
word | Indexes, attributes | This is the default matching mode for string attributes. Can not be combined with text matching. Word matching is like exact matching, but with more advanced query parsing. The query terms are heuristically parsed, taking into account some usual query syntax characters; one can also use double quotes to include space, star, or exclamation marks. Example: the query foo AND (artist:"'N Sync" OR artist:"*NSYNC" OR artist:A*teens OR artist:"Wham!") matches documents with foo and at least one of 'N Sync, *NSYNC, A*teens or Wham! in the artist field. Note that without the quotes, the space in … |
prefix | Attributes, streaming | Set default match mode to prefix for the field, i.e. queries do not need to specify prefix matching. As the data structures in attributes and streaming search support prefix searches, one can always set prefix matching in the query, without setting the field to prefix default. Also see regular expressions. |
substring | Streaming | Set default match mode to substring for the field. Only available in streaming search. As the data structures in streaming search support substring searches, one can always set substring matching in the query, without setting the field to substring default. Also see regular expressions. |
suffix | Streaming | Like substring (above). |
max-length | Indexes, streaming | Limit the length of the field that will be used for matching. |
gram | Indexes | This field is matched using n-grams. For example, with the default gram size 2 the string "hi blue" is tokenized to "hi bl lu ue" both in the index and in queries to the index. N-gram matching is useful mainly as an alternative to segmentation in CJK languages. Typically it results in increased recall and lower precision. However, as Vespa usually uses proximity in ranking, the loss of precision may not be of much importance. Grams consume more resources than other matching methods because both indexes and queries will have more terms, and the terms contain repetitions of the same letters. On the other hand, CPU intensive CJK segmentation is avoided. It may also be used for substring matching in general. |
gram-size | Indexes | A positive, nonzero number, default 2. Sets the gram size when gram matching is used. Example: match { gram gram-size: 3 } |
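The two multivalued match forms from the table above can be combined with field declarations as sketched below. The field names tag and cjk_text are hypothetical; the match settings themselves are the ones used in the table's own examples.

```
search example {
    document example {
        field tag type string {
            indexing: summary | attribute
            match {
                exact
                # Query tag:a b c!@% matches the value "a b c!"
                exact-terminator: "@%"
            }
        }
        field cjk_text type string {
            indexing: summary | index
            match {
                # 3-grams instead of the default gram size 2
                gram
                gram-size: 3
            }
        }
    }
}
```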
rank
Contained in field or rank-profile.
Sets the kind of ranking calculations which will be done for the field. Even though the actual ranking expressions decide the ranking, this setting tells Vespa which preparatory calculations and which data structures are needed for the field.
rank [field-name]: [ranking settings]
or
rank { [ranking setting] }
The field name should only be specified when used inside a rank-profile. The following ranking settings are supported in addition to the default:
Ranking setting | Description |
---|---|
filter | Indicates that matching in this field should use fast bit vector data structures only. This saves a lot of CPU during matching, but only a few simple ranking features will be available for the field. This setting is appropriate for fields typically used for filtering or simple boosting purposes, like filtering or boosting on the language of the document. Example. |
normal | The reverse of filter. Matching in this field will use normal data structures and give normal match information for ranking. Used to turn off implicit rank: filter when using match: exact. If both filter and normal are set, the effect is as if only normal was specified. |
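A sketch of both placements of the rank element, assuming hypothetical fields language and title and a hypothetical rank profile title_as_filter. In the field, the field name is skipped; inside the rank profile, it is given.

```
search example {
    document example {
        field language type string {
            indexing: summary | attribute
            attribute: fast-search
            # Bit vectors only: cheap matching, few rank features available
            rank: filter
        }
        field title type string {
            indexing: summary | index
        }
    }
    rank-profile title_as_filter {
        # Treat title as a filter in this profile only
        rank title: filter
        first-phase {
            expression: nativeRank
        }
    }
}
```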
query-command
Contained in fieldset, field or struct-field.
Specifies a function to be performed on query terms to the indexes of this field when searching. The Search Container server supports writing Vespa Searcher plugins which process these commands.
query-command: [any string]
If you write a plugin searcher which needs some index-specific configuration parameter, that parameter can be set here.
rank-type
Contained in field or rank-profile.
Selects the low-level rank settings to be used for this field when using nativeRank.
rank-type [field-name]: [rank-type-name]
The field name can be skipped inside fields. Defined rank types are:
Type | Description |
---|---|
identity | Used for fields which contain only what this document is, e.g. "Title". Complete identity hits will get a very high rank. |
about | Some text which is (only) about this document, e.g. "Description". About hits get high rank on partial matches and higher for matches early in the text and repetitive matches. This is the default rank type. |
tags | Used for simple tag fields of type tag. The tags rank type uses a logarithmic table to give more relative boost in the low range: As tags are added they should have significant impact on rank score, but as more and more tags are added, each new tag should contribute less. |
empty | Gives no relevancy effect on matches. Used for fields you just want to treat as filters. |
With nativeRank, you can specify a rank type per field. If the supported rank types do not meet your requirements, you can explicitly configure the native rank features using rank-properties. See the native rank reference for more information.
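A sketch of rank-type in both contexts, assuming hypothetical fields title and description and a hypothetical rank profile named strict_description. The field name is skipped inside fields and given inside the rank profile.

```
search example {
    document example {
        field title type string {
            indexing: summary | index
            # Complete identity hits get a very high rank
            rank-type: identity
        }
        field description type string {
            indexing: summary | index
            # about is the default rank type, so this line could be omitted
            rank-type: about
        }
    }
    rank-profile strict_description {
        # Override the rank type of description in this profile only
        rank-type description: identity
        first-phase {
            expression: nativeRank
        }
    }
}
```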
summary-to
DEPRECATED
Contained in field or struct-field.
Specifies the name of the document summaries which should contain this field.
summary-to: [summary-name], [summary-name], …
Fields with summary will always be part of the default summary regardless of this setting. Use explicit document-summary instead. See also document summaries.
summary
Contained in field, document-summary or struct-field.
Declares a summary field.
summary: [property]
or
summary [name] type [type] { [body] }
The summary name can be skipped if this is set inside a field. The name will then be the same as the name of the source field. In fields, the summary type can also be skipped, in which case the type will be determined by the field type. The summary data types available are the same as the document field data types. The full summary is the default. Long field values (like document content fields) should be made dynamic. The body of a summary may contain:
Name | Description | Occurrence |
---|---|---|
full | Returns the full field value in the summary (the default). | Zero to one |
dynamic | Make the value returned in results from this summary field be a dynamic abstract of the source summary field by extracting fragments of text around matching words. Matching words will also be highlighted, similar to the bolding feature. This highlighting is not affected by the query-argument bolding. The default XML element used to highlight query terms is <hi> - refer to bolding for how to configure. | Zero to one |
source | Specifies the name of the field or fields from which the value of this summary field should be fetched. If multiple fields are specified, the value will be taken from the first field if that has a value, from the second if the first one is empty, and so on. Syntax: source: [field-name], [field-name], … When this is not specified, the source field is assumed to be the field with the same name as the summary field. | Zero to one |
to | Specifies the name of the document summaries this should be included in. Syntax: to: [document-summary-name], [document-summary-name], … This can only be specified in fields, not in explicit document summaries. When this is not specified, the field will go to the default document summary. | Zero to one |
matched-elements-only | Specifies that only the matched elements in a searchable array of struct or map type field are returned as part of the document summary. Typically used together with the sameElement operator, but can also be used when searching directly on a sub struct field. Also supported in streaming search, and when the array of struct or map type field is imported. See the example .sd file from the array of struct and map type system test. | Zero to one |
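A sketch of a dynamic summary on a long field and an explicit document summary using source fallback. The field names title, headline and content and the summary name short are hypothetical.

```
search example {
    document example {
        field title type string {
            indexing: summary | index
        }
        field headline type string {
            indexing: summary | index
        }
        field content type string {
            indexing: summary | index
            # Return an abstract around matching words, with query terms highlighted
            summary: dynamic
        }
    }
    # Explicit document summary containing a single summary field
    document-summary short {
        summary title type string {
            # Use title if it has a value, otherwise fall back to headline
            source: title, headline
        }
    }
}
```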
weight
Contained in field.
The weight of a field - the default is 100.
The field weight is used when calculating the rank scores.
weight: [positive integer]
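For illustration, a hypothetical title field given twice the default weight of 100. How much this changes rank scores depends on the rank features used.

```
search example {
    document example {
        field title type string {
            indexing: summary | index
            # Default is 100
            weight: 200
        }
    }
}
```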
weightedset
Contained in field of type weightedset.
Properties of a weighted set.
weightedset: [property]
or
weightedset { [property] [property] … }
Property | Description | Occurrence |
---|---|---|
create-if-nonexistent | If the weight of a key is adjusted in a document using a partial update increment or decrement command, but the key is currently not present, the command will be ignored by default. Set this to create the key in this case instead. This is useful when the weight is used to represent the count of the key. | Zero to one |
remove-if-zero | This is the companion of create-if-nonexistent for the converse case: By default keys may have zero as weight. With this turned on, keys whose weight is adjusted (or set) to zero will be removed. | Zero to one |
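A sketch of a weighted set used as a counter, assuming a hypothetical tags field. With both properties set, an increment on a missing key creates it, and a key whose weight reaches zero is removed.

```
search example {
    document example {
        field tags type weightedset<string> {
            indexing: attribute | summary
            weightedset {
                # Increment/decrement on a missing key creates it
                create-if-nonexistent
                # Keys whose weight becomes zero are removed
                remove-if-zero
            }
        }
    }
}
```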
annotation
Contained in search.
Defines an annotation type, to be used by the Annotations API.
A name of the annotation is mandatory, the body is optional.
annotation [name] { [body] }
import field
Contained in search.
Using a reference to a document type, import a field from that document type into this search definition to be used for matching, ranking, grouping and sorting. Refer to parent/child.
Only attribute fields can be imported. The imported field inherits all but the following properties from the parent field:
attribute: fast-access
Field type | Restriction |
---|---|
array of struct | Can be imported if at least one of the struct fields has an attribute. Only the struct fields with attributes will be visible. |
map of struct | Can be imported if the key field has an attribute and at least one of the struct fields has an attribute. Only the key field and the struct fields with attributes will be visible. |
map | Can be imported if both key and value fields have attributes. |
position | Can be imported if it has an attribute. |
array of position | Can be imported if it has an attribute. |
To use an imported field in summary, create an explicit document summary containing the field.
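A sketch of importing a parent field, assuming hypothetical document types album and artist, where artist is a global document type with a string attribute field name. The explicit document summary makes the imported field available in results.

```
search album {
    document album {
        # Reference to a global artist document
        field artist_ref type reference<artist> {
            indexing: attribute
        }
    }
    # Import artist.name through the reference, under the local name artist_name
    import field artist_ref.name as artist_name {}

    document-summary album_with_artist {
        summary artist_name type string {}
    }
}
```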
Field types
string
Use for a text field of any length. String fields may only contain text characters, as defined by isTextCharacter in com.yahoo.text.Text.
field surname type string { indexing: summary | index }
int
Use for single 32-bit integers.
field release_year type int { indexing: summary | attribute }
long
Use for single 64-bit integers.
field bignumber type long { indexing: summary | attribute }
bool
Use for boolean values.
field alive type bool { indexing: summary | attribute }
byte
Use for single 8-bit numbers.
field smallnumber type byte { indexing: summary | attribute }
float
Use for floating point numbers (32-bit IEEE 754 float).
field myfloat type float { indexing: summary | attribute }
double
Use for high precision floating point numbers (64-bit IEEE 754 double).
field mydouble type double { indexing: summary | attribute }
position
Used to filter and/or rank documents by distance to a position in the query, see Geo search.
field location type position { indexing: attribute }
predicate
Use to match queries to a set of boolean constraints. See querying predicate fields.
field predicate_field type predicate {
    indexing: attribute
    index {
        arity: 2 # mandatory
        lower-bound: 3
        upper-bound: 200
        dense-posting-list-threshold: 0.25
    }
}
raw
Use for binary data.
field rawfield type raw { indexing: summary }
uri
Use for URL type matching.
array<type>
For single-value (primitive) types, use to create an array field of the element type:
struct person {
    field first_name type string {}
    field last_name type string {}
}
field people type array<person> {
    indexing: summary
    struct-field first_name { indexing: attribute }
    struct-field last_name { indexing: attribute }
}
The entire people field is part of the document summary, and the struct fields first_name and last_name are defined as attributes available for searching. Note that you can define only a subset of the struct fields as attributes. Use the sameElement operator to ensure matches in the same struct field instance. Use matched-elements-only to reduce the amount of data that is returned in the document summary. Restrictions:
weightedset<element-type>
Use to create a multivalue field of the element type, where each element is assigned a signed 32-bit integer weight.
field tag type weightedset<string> { indexing: attribute | summary }
The element type can be any single value type. The weights may be assigned any semantics by the application, default 1. Two main use cases:
… nativeRank directly as the rank score of the field. It is also possible to create a rank type which uses a rank boost table, weightboost, to calculate the rank value from the weight (the tags rank type does this by default).
It is possible to specify that a new key should be created if it does not exist before the update, and that it should be removed if the weight is set to zero. This is only usable together with the …
tensor(dimension-1,...,dimension-N)
Use to create a tensor field with the given tensor type spec that can be used for ranking - a tensor field is not searchable. See tensor evaluation reference for definition, the tensor user guide and the JSON feed format.
field tensorfield type tensor<float>(x{},y{}) { indexing: attribute | summary }
field tensorfield type tensor<float>(x[2],y[2]) { indexing: attribute | summary }
struct
Use to define a field with a struct datatype. Create a struct type inside the document definition and declare the struct field in a document or struct using the struct type name as the field type:
struct person {
    field first_name type string {}
    field last_name type string {}
}
field my_person type person { indexing: summary }
Restrictions:
map<key-type,value-type>
Use to create a map where each unique key is mapped to a single value. Any primitive type can be used as key-type, and any Vespa type as value-type. A map entry is handled as a struct with a key and a value field, with key-type and value-type as their types. Example:
struct person {
    field first_name type string {}
    field last_name type string {}
}
field identities type map<string, person> {
    indexing: summary
    struct-field key { indexing: attribute }
    struct-field value.last_name { indexing: attribute }
}
The entire identities field is part of the document summary, and the struct fields key and value.last_name are attributes available for searching using the sameElement operator, and for grouping. Note that you can define only a subset of the struct fields as attributes. Use matched-elements-only to reduce the amount of data that is returned in the document summary.
The next example shows a map of primitive types, where the key and value struct fields are specified as attributes:
field my_map type map<string, int> {
    indexing: summary
    struct-field key { indexing: attribute }
    struct-field value { indexing: attribute }
}
The previous example is similar to the following, the difference being that an array can contain the same element multiple times and maintains order:
struct mystruct {
    field key type string { }
    field value type int { }
}
field my_array type array<mystruct> {
    indexing: summary
    struct-field key { indexing: attribute }
    struct-field value { indexing: attribute }
}
Restrictions:
annotationreference
Use to define a field (inside annotation, or inside e.g. a struct used by a field in an annotation) with a reference to another annotation. Should only be used for fields declared inside annotation, or as a base type by the use of any of the compound types listed above, inside annotation. To define such a field, you must first create an annotation type. The struct must be defined inside the search definition. To declare an annotationreference field in an annotation, use the annotation name to identify the field type:
annotation foo {
    field baz type annotationreference<bar> { }
}
annotation bar { }
reference<document-type>
A reference<document-type> field is a reference to an instance of a document-type, i.e. a foreign key.
field artist_ref type reference<artist> { indexing: attribute }
The reference is the document id of the document-type instance. References are used to join documents in a parent-child relationship. A reference can only be made to global documents. The following types of references are not supported:
Document and search field types
Note that it is possible to make a document field of one type into one or more instances of another search field, by declaring a field outside the document which uses other fields as input. For example, to create an integer attribute for a string containing a comma-separated list of integers in the document, do as follows:
search example {
    document example {
        field yearlist type string { # Comma-separated years
            …
        }
        …
    }
    field year type array<int> { # Search field using the yearlist value
        indexing: input yearlist | split "," | attribute
    }
}
Example
search example {
    document example {
        field title type string {
            indexing: summary | index
        }
        field description type string {
            indexing: summary | index
        }
        field author type string {
            indexing: summary | index
            # author name only, so no stemming
            stemming: none
        }
        field category type string {
            indexing: summary | attribute
            attribute: fast-search
            match: exact # Don't tokenize
            rank: filter # Only for matching. Most efficient search of a string type
        }
        field popularity type int {
            indexing: summary | attribute
            attribute: fast-search
        }
        field measurement type int {
            indexing: summary | attribute
        }
        # Categories as an array - preferable
        field morecategories type array<string> {
            indexing: index
            match: exact
        }
    }
    fieldset default {
        fields: title, description
    }
}
Modify Search Definitions
This section describes how a search definition in a live application can be modified - categories:
- Valid changes without restart or re-feed
- Changes that require restart but not re-feed
- Changes that require re-feed
When running vespa-deploy prepare on a new application package, the changes in the search definition files are compared with the files in the current active package. If some of the changes require restart or re-feed, the output from vespa-deploy prepare specifies which actions are needed.
The document mode(s) (index, streaming) for which a change is applicable are also listed in the categories below. Treat mode store-only the same as mode streaming.
NOTE: If there are changes to perform on a live system that are not covered by this document and no output is given from vespa-deploy prepare, their impact is undefined and in no way guaranteed to allow a system to stay live until re-feeding.
Changes not related to the search definition are discussed in admin procedures. It is best practice to try changes in a staging system first.
Valid changes without restart or re-feed
Procedure:
- Run vespa-deploy prepare on the changed application
- Run vespa-deploy activate. The changes will take effect immediately
Change | Applicable for mode | Description |
---|---|---|
Add a new document field | index, streaming | Add a new document field as index, attribute, summary or any combination of these. Existing documents will implicitly get the new field with no content. Documents fed after the change can specify the new field. If the field has existed with the same type earlier, old content may or may not reappear |
Remove a document field | index, streaming | Existing documents will no longer see the removed field, but the field data is not completely removed from the search node |
Add or remove an existing document field from document summary | index, streaming | Add an existing field to summary or any number of summary classes, and remove an existing field from summary or any number of summary classes |
Remove the attribute aspect from a field that is also an index field | index | This is the only scenario of changing the attribute aspect of a document field that is allowed without restart |
Change the attribute aspect of a document field | streaming | Add or remove a field as attribute. In mode streaming this only indicates that the field is used for grouping, sorting, ranking and matching. No changes to underlying data structures |
Change the index aspect of a document field | streaming | Add or remove a field as index. In mode streaming this only indicates that the field is used for matching. No changes to underlying data structures |
Add, change or remove match settings for a field | streaming | In mode streaming such a change does not affect the documents stored in the backend and can be done without restart and re-feed |
Add, change or remove field sets | index, streaming | Change fieldsets used to group fields together for searching |
Change the alias or sorting attribute settings for an attribute field | index, streaming | |
Add, change or remove rank profiles | index, streaming | |
Change document field weights | index, streaming | |
Add, change or remove field aliases | index, streaming | |
Add, change or remove rank settings for a field | index, streaming | Exception: Changing rank: filter on an attribute field in mode index requires restart. See details in next section |
Add or remove a search definition | index, streaming | Removing a search definition file will make proton drop all documents of that type - subsequently releasing memory and disk. |
Changes that require restart but not re-feed
Procedure:
- Run vespa-deploy prepare on the changed application. Output specifies which restart actions are needed
- Run vespa-deploy activate
- Restart services on the nodes specified in the prepare output
Change | Applicable for mode | Description |
---|---|---|
Change the attribute aspect of a document field | index | Add or remove a field as attribute. When adding, the attribute is populated based on the field value in stored documents during restart. When removing, the field value in stored documents is updated based on the content in the attribute during restart |
Change the attribute settings for an attribute field | index, streaming | For mode index: Change the following attribute settings: fast-search, fast-access. For mode streaming: Change the following attribute settings: fast-access (the other settings are not used) |
Change the rank filter setting for an attribute field | index | Add or remove rank: filter on an attribute field. |
Example: Start with the following search definition:
search test {
    document test {
        field f1 type string {
            indexing: summary
        }
    }
}
Then add field f1 as an attribute:
search test {
    document test {
        field f1 type string {
            indexing: attribute | summary
        }
    }
}
The following is output from vespa-deploy prepare, showing which restart actions are needed:
WARNING: Change(s) between active and new application that require restart: In cluster 'mycluster' of type 'search': Restart services of type 'searchnode' because: 1) Document type 'test': Field 'f1' changed: add attribute aspect
Changes that require re-feed
All of the changes listed below require re-feeding of all documents. Unless a change is listed in the above sections treat it as if it was listed here. Until re-feed is complete, affected fields will be empty or have potentially wrong annotations not matching the query processing. Procedure:
- Run vespa-deploy prepare on the changed application. Output specifies which re-feed actions are needed
- Stop feeding, wait until done
- Run vespa-deploy activate
- Re-feed all documents
Change | Applicable for mode | Description |
---|---|---|
Change the data type or collection type of a document field | index, streaming | Existing documents will no longer have any content for this field. To populate the field, re-feed the existing documents using the new type for this field. There will be no automatic conversion from old to new field type. NOTE: If not re-feeding after such a change, serving works, but searching this field will not give any results |
Change index aspect of a document field | index | This changes the document processing pipeline before documents arrive in the backend. Only documents fed after index aspect was added will have annotations and be present in the reverse index. Only documents fed after index aspect was removed will avoid disk bloat due to unneeded annotations |
Change fields from static to dynamic summary, or vice versa | index | |
Switch stemming/normalizing on or off | index | This changes the document processing pipeline before documents arrive in the backend, and what annotations are made for an indexed field. NOTE: If not re-feeding after such a change, serving works, but recall is undefined as the index has been produced using a different setting than the one used when doing stemming/normalizing of the query terms |
Switch bolding on or off | index | |
Add, change or remove match settings for a field | index | Example: Adding match: word to a field. This changes the document processing pipeline before documents arrive in the backend, and what annotations are made for an indexed field. NOTE: If not re-feeding after such a change, serving works, but recall is undefined as the index has been produced using one match mode while run-time is using a different match mode |
Change the tensor type of a tensor attribute | index | |
Example: Start with the following search definition:
search test {
    document test {
        field f1 type string {
            indexing: summary
        }
    }
}
Then add field f1 as an index:
search test {
    document test {
        field f1 type string {
            indexing: index | summary
        }
    }
}
The following is output from vespa-deploy prepare, showing which re-feed actions are needed:
WARNING: Change(s) between active and new application that require re-feed: Re-feed document type 'test' in cluster 'mycluster' because: 1) Document type 'test': Field 'f1' changed: add index aspect, indexing script: '{ input f1 | summary f1; }' -> '{ input f1 | tokenize normalize stem:"SHORTEST" | index f1 | summary f1; }'