This documents the syntax and content of schemas, document types and fields. This is a reference; see schemas for an overview. Find an example at the end.
Throughout this document, a string in square brackets represents some argument. The whole string, including the brackets, is replaced by a concrete string in a schema.
Constructs in schemas have a regular syntax. Each element starts with the element identifier, possibly followed by the name of this particular occurrence of the element, possibly followed by a space-separated list of interleaved attribute names and attribute values, possibly followed by the element body. Thus, one will find elements of these varieties:
[element-identifier] : [element-body]
[element-identifier] [element-name] : [element-body]
[element-identifier] [element-name] [attribute-name] [attribute-value]
[element-identifier] [element-name] [attribute-name] [attribute-value] {
[element-body]
}
One-line element values start with a colon and end with a newline.
Multiline values (for fields supporting them) are any block of text enclosed in curly brackets.
Comments may be inserted anywhere and start with a hash (#).
Names are identifiers; they must match ["a"-"z","A"-"Z", "_"]["a"-"z","A"-"Z","0"-"9","_"]*.
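The identifier rule above can be checked mechanically. The following is a minimal sketch in Python, assuming the rule translates directly to a standard regular expression; the helper name `is_valid_name` is hypothetical and not part of any Vespa tooling.

```python
import re

# Regular-expression form of the identifier rule quoted above:
# first character is a letter or underscore, the rest may also be digits.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_name(name: str) -> bool:
    """Return True if the whole string matches the identifier rule."""
    return IDENTIFIER.fullmatch(name) is not None
```

Under this sketch, `my_field1` is accepted, while names such as `1field` or `my-field` are rejected.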
Elements and structure of a schema file:
schema document struct field match field alias attribute distance-metric bolding dictionary id index hnsw indexing indexing-rewrite match normalizing query-command rank rank-type sorting stemming struct-field indexing match query-command struct-field … summary summary-to DEPRECATED summary summary-to DEPRECATED weight weightedset compression index field fieldset rank-profile match-phase attribute order max-hits diversity attribute min-groups first-phase keep-rank-count rank-score-drop-limit expression second-phase expression rerank-count function [name] inputs constants onnx-model rank-properties match-features mutate on-match on-first-phase on-second-phase on-summary summary-features rank-features ignore-default-rank-features num-threads-per-search num-search-partitions min-hits-per-thread termwise-limit post-filter-threshold approximate-threshold rank rank-type constant onnx-model stemming document-summary summary annotation field import field raw-as-base64-in-summary
The root element of schemas.
A schema describes a type of data and what we should compute over it.
A schema must be defined in a file named [schema-name].sd.
schema [name] inherits [name] { [body] }
The inherits attribute is optional.
If this schema inherits another schema, it includes all the constructs of the parent as if they were defined in this schema (except the parent document type).
The document type in this schema must declare that it inherits the document type of the parent schema.
The body is mandatory and may contain:
Name | Description | Occurrence |
---|---|---|
document | A document type defined in this schema | One |
field | A field not contained in the document. Use synthetic fields (outside document) to derive new field values to be placed in the indexing structure from document fields. Find examples in reindexing. | Zero to many |
fieldset | Group document fields together for searching | Zero to many |
rank-profile | A bundle of ranking functions and settings, selectable in a query. | Zero to many |
constant | A constant tensor located in a file used for ranking | Zero to many |
onnx-model | An ONNX model located in the application package used for ranking | Zero to many |
stemming | The default stemming setting. | Zero or one |
raw-as-base64-in-summary | Base64 encode raw fields in summary rather than using an escaped string. Default is true. | Zero or one |
document-summary | An explicitly defined document summary | Zero to many |
annotation | Defines an annotation type | Zero to many |
import field | Import a field value from a global document | Zero to many |
Contained in schema and describes a document type.
This can also be the root of the schema, if the document is not to be queried directly.
document [name] inherits [name-list] { [body] }
The document name is optional; it defaults to the containing schema element's name. If there is no containing schema element, the document name is required.
The inherits attribute is optional and has as value a comma-separated list of names of other document types.
A document type may inherit the fields of one or more other document types; see document inheritance for examples.
If no document types are explicitly inherited, the document inherits the generic document type.
The body of a document type is optional and may contain:
Name | Description | Occurrence |
---|---|---|
struct | A struct type definition for this document. | Zero to many |
field | A field of this document. | Zero to many |
compression | Specifies compression options for documents of this document type in storage. | Zero to one |
Contained in document.
Defines a composite type.
A struct consists of zero or more fields that the user can access together as one.
The struct has to be defined before it is used as a type in a field specification.
struct [name] { [body] }
The body of a struct is optional and may contain:
Name | Description | Occurrence |
---|---|---|
field | A field of this struct. | Zero to many |
Contained in schema, document, struct or annotation.
Defines a named value with a type and (optionally) how this field
should be stored, indexed, searched, presented and how it should influence ranking.
field [name] type [type-name] { [body] }
Do not use names that are used for other purposes in the indexing language or other places in the schema file. Reserved names are:
Other names not to use include any words that start with a number or include special characters.
The type attribute is mandatory - see field type for details and indexing restrictions. Supported types:
Name | Singular/Multi | Type |
---|---|---|
annotationreference<annotationtype> | singlevalue | Reference to a string annotation |
array<type> | multivalue | Array of type |
weightedset<element-type> | multivalue | Like array , but each element is also assigned an integer weight |
bool | singlevalue | true or false |
byte | singlevalue | Signed 8-bit integer |
double | singlevalue | 64-bit IEEE 754 floating point |
float | singlevalue | 32-bit IEEE 754 floating point |
int | singlevalue | Signed 32-bit integer |
long | singlevalue | Signed 64-bit integer |
position | singlevalue | Position in geographical coordinates, e.g. latitude and longitude |
predicate | singlevalue | Boolean expression in predicate logic |
raw | singlevalue | Binary data |
string | singlevalue | Text |
structname | singlevalue | Declares a field with a specific struct type, given by the struct name |
map<key-type,value-type> | multivalue | Map using the given types as keys and values. Keys and values can be any type |
tensor(dimension-1,...,dimension-N) | multivalue | Tensor with a set of named dimensions and a set of values located in the space of those dimensions |
uri | singlevalue | Uniform Resource Identifier (a URL or any other unique string id) |
reference<document-type> | singlevalue | Reference to an instance of a document-type used in a parent-child relationship |
The body of a field is optional for schema, document and struct, and disallowed for annotation. It may contain the following elements:
Name | Description | Occurrence |
---|---|---|
alias | Make an index or attribute available in queries under an additional name. This has minimal performance impact and can safely be added to running applications. | Zero to many |
attribute | Specify an attribute setting. | Zero to many |
bolding | Specifies whether the content of this field should be bolded. Only supported for index fields of type string or array<string>. | Zero to one |
id | Explicitly decide the numerical id of this field. Is normally not necessary, but can be used to save some disk space. | Zero to one |
index | Specify a parameter of an index. | Zero to many |
indexing | The indexing statements used to create index structure additions from this field. | Zero to one |
indexing-rewrite | Determines the rewriting Vespa is allowed to do on the indexing statements of this field. | Zero to one |
match | Set the matching type to use for this field. | Zero to one |
normalizing | Specifies the kind of text normalizing to do on a string field. | Zero or one. |
query-command | Specifies a command which can be received by a plugin searcher in the Search Container. | Zero to many |
rank | Specify if the field is used for ranking. | Zero or one |
rank-type | Selects the set of low-level rank settings to be used for this field when using default nativeRank. | Zero to one |
sorting | The sort specification for this field. | Zero or one. |
stemming | Specifies stemming options to use for this field. | Zero or one. |
struct-field | A subfield of a field of type struct. The struct must have been defined to contain this subfield in the struct definition. If you want the subfield to be handled differently from the rest of the struct, you may specify it within the body of the struct-field. | Zero to many. |
summary | Sets a summary setting of this field, set to dynamic to make a dynamic summary. | Zero to many |
summary-to | Deprecated: Use document-summary instead. The list of document summary names this should be included in. | Zero to one |
weight | The importance of a field when searching multiple fields and using nativeRank. | Zero to one |
weightedset | Properties of a weightedset<element-type> field. | Zero to one |
Fields can not have default values. See the document guide for how to auto-set field values.
It is not possible to query for fields without value (i.e. query for NULL) - see the query language reference. Fields without value are not returned in query results.
Fields can be declared outside the document block in the schema.
These fields are not part of the document type but behave like regular fields for queries.
Since they are not part of the document they cannot be written directly, but instead take their values from document fields, using the input expression:
indexing: input my_document_field | embed | summary | index
This is useful e.g. to index a field in multiple ways, or to change the field value, something which is not allowed with document fields. When the document field(s) used as input are updated, these fields are updated with them.
Contained in field or struct-field.
Defines how this struct field (a subfield of a struct) should be stored,
indexed, searched, presented and how it should influence ranking.
The field in which this struct field is contained must be of
type struct or a collection of type struct:
struct-field [name] { [body] }
The body of a struct field is optional and may contain the following elements:
Name | Description | Occurrence |
---|---|---|
indexing | The indexing statements used to create index structure additions from this field. For indexed search only attribute is supported, which makes the struct field a searchable in-memory attribute. | Zero to one |
attribute | Specifies an attribute setting. For example attribute:fast-search. | Zero to many |
If this struct field is of type struct (i.e. a nested struct), only indexing:summary may be specified.
Contained in schema.
Fieldsets provide a way to group fields with the same match
settings together for searching.
To search multiple fields - example:
fieldset myfieldset { fields: a,b,c }
Using the query yql=select * from sources * where myfieldset contains "foo" will return all the documents for which one or more of the fields a, b or c contain "foo".
By naming the field set default, those fields are searched without specifying the field set in unstructured queries: query=foo.
The fields making up the fieldset should be as similar as possible in terms of indexing clause and match mode. If they are not, test the application thoroughly. Having different match modes for the fields in the fieldset will generate a warning during application deployment.
Adding a fieldset will not create extra index structures in memory / on disk, it is just a mapping.
If specific match settings for the field set are needed, such as exact, specify them using a match clause:
fieldset myfieldset { fields: a,b,c match { exact } }
Use query-commands in the field set to set search settings. Example:
fieldset myfieldset { fields: a,b,c query-command:"exact @@" }
Contained in document.
If a compression level is set within this element, lz4 compression is enabled for whole documents.
compression { [body] }
The body of a compression specification is optional and may contain:
Name | Description | Occurrence |
---|---|---|
type | LZ4 is the only valid compression method. | Zero to one |
level | Enable compression. LZ4 is linear and 9 means HC (high compression). | Zero to one |
threshold | A percentage (multiplied by 100) giving the maximum size that compressed data can have to keep the compressed value. If the resulting compressed data is larger than this, the document will be stored uncompressed. Default value is 95. | Zero to one |
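The threshold rule can be illustrated with a small Python sketch. It assumes the percentage is measured relative to the uncompressed size, which is one reading of the description above and not confirmed by it; the function name `keep_compressed` is hypothetical.

```python
def keep_compressed(original_size: int, compressed_size: int, threshold: int = 95) -> bool:
    """Return True if the compressed form is small enough to be kept.

    Assumption: threshold is a percentage of the uncompressed size.
    The compressed value is discarded only when it is larger than
    threshold percent of the original.
    """
    # Integer arithmetic avoids floating point rounding at the boundary.
    return compressed_size * 100 <= original_size * threshold
```

With the default of 95, a 1000 byte document compressed to 900 bytes would stay compressed, while one compressed to 960 bytes would be stored uncompressed.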
Contained in schema or equivalently in separate files in the application package, named [profile-name].profile in any directory below schemas/[schema-name]/.
A rank profile is a named set of ranking expression functions and settings which can be selected in the query.
Whether defined inline in the schema or in a separate .profile file, the syntax of a rank profile is
rank-profile [name] inherits [rank-profile1], [rank-profile2], ... { [body] }
The inherits list is optional and may contain the names of other rank profiles in this schema or one it inherits.
Elements not defined in this rank profile will then be inherited from those profiles.
Inheriting multiple profiles which define the same elements leads to an error at deployment.
The body of a rank-profile may contain:
Name | Description | Occurrence |
---|---|---|
strict | true/false: Whether to use strict or loose type checking. | Zero or one |
match-phase | Ranking configuration to be used for hit limitation during matching. | Zero or one |
first-phase | The ranking config to be used for first-phase ranking. | Zero or one |
second-phase | The ranking config to be used for second-phase ranking. | Zero or one |
function [name] | Defines a named function that can be referenced during ranking phase(s) and (if without arguments) as part of match- and summary-features. | Zero or more |
inputs | List of query features used in ranking expressions. | Zero or many |
constants | List of constant features available in ranking expressions. | Zero or many |
mutate | Specification of mutations you can apply after different phases of a query. | Zero or many |
onnx-model | An onnx model to make available in this profile. | Zero or many |
rank-properties | List of any rank property key-values to be used by rank features. | Zero or one |
match-features | The rank features to be returned with each hit, computed in the match phase. | Zero or more |
summary-features | The rank features to be returned with each hit, computed in the fill phase. | Zero or more |
rank-features | The rank features to be dumped when using the query-argument rankfeatures. | Zero or more |
ignore-default-rank-features | Do not dump the default set of rank features, only those explicitly specified with the rank-features command. | Zero or one |
num-threads-per-search | Overrides the global persearch threads to a lower value. | Zero or one |
min-hits-per-thread | After estimating the number of hits for a query prior to query evaluation, this number is used to decide how many threads to use for the query. Current default is 1. If you suspect the fixed cost per thread is too high, increasing this number might be a good idea, especially if most of your queries are cheap but you have increased num-threads-per-search to reduce latency for costly queries covering a lot of documents. The default might change, or the optimal value might become adaptive, rendering overrides ignored or counterproductive. | Zero or one |
num-search-partitions | Number of logical partitions the corpus on a search node is divided into. By default, this is the same as num-threads-per-search. A partition is the smallest unit a search thread will handle. If you have a locality in time when searching and feeding documents, you might want to split it into more, smaller partitions. That way you avoid one costly partition leaving some threads idle while others are working hard. If you have 8 threads per search, you might use 10x as many partitions (80), reducing the maximum skew by a similar factor. Note that a value of zero turns on adaptive partitioning, which tries to solve this optimally. Note: If num-search-partitions is set to 0 (work sharing is enabled), make sure termwise-limit is set to 1.0 (termwise evaluation is disabled). This is to avoid redoing termwise evaluation when work is passed from one thread to another. | Zero or one |
termwise-limit | If estimated number of hits > corpus * termwise-limit, candidates are pruned with a CPU cache friendly TAAT pass over the terms not needed for ranking, prior to doing DAAT. Current default is 1.0, which turns it off. A value between 0.05 and 0.20 can be a good starting point. This is particularly useful if you have many weak filters. Note that this is a manual override. The default might change, or the optimal value might become adaptive, rendering overrides ignored or counterproductive. | Zero or one |
post-filter-threshold | Threshold value (in the range [0.0, 1.0]) deciding if a query with an approximate nearestNeighbor operator combined with filters is evaluated using post-filtering instead of the default pre-filtering. Post-filtering is chosen when the estimated filter hit ratio of the query is larger than this threshold. The default value is 1.0, which disables post-filtering. See Controlling the filtering behavior with approximate nearest neighbor search for more details. | Zero or one |
approximate-threshold | Threshold value (in the range [0.0, 1.0]) deciding if a query with an approximate nearestNeighbor operator combined with filters is evaluated by searching the HNSW graph for approximate neighbors with pre-filtering, or performing an exact nearest neighbor search with pre-filtering. The fallback to exact search is chosen when the estimated filter hit ratio of the query is less than this threshold. The default value is 0.05. See Controlling the filtering behavior with approximate nearest neighbor search for more details. | Zero or one |
rank | Specify if the field is used for ranking. | Zero or more |
rank-type | The rank-type of a field in this profile. | Zero or more |
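How post-filter-threshold and approximate-threshold interact can be sketched in Python. This is an illustration of the two rules as described in the table, not the engine's implementation; the function name `ann_strategy`, the returned labels, and the order in which the two thresholds are checked are assumptions.

```python
def ann_strategy(estimated_filter_hit_ratio: float,
                 post_filter_threshold: float = 1.0,
                 approximate_threshold: float = 0.05) -> str:
    """Pick an evaluation strategy for a filtered approximate nearestNeighbor query."""
    # Post-filtering is chosen when the ratio is larger than post-filter-threshold.
    # (Checking this rule first is an assumption of this sketch.)
    if estimated_filter_hit_ratio > post_filter_threshold:
        return "approximate, post-filtering"
    # Fall back to exact search when the ratio is less than approximate-threshold.
    if estimated_filter_hit_ratio < approximate_threshold:
        return "exact, pre-filtering"
    # Otherwise search the HNSW graph with pre-filtering.
    return "approximate, pre-filtering"
```

With the defaults, a ratio can never exceed 1.0, so post-filtering is never selected, which matches the statement that a threshold of 1.0 disables it.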
Contained in rank-profile.
The match-phase feature lets you increase performance by limiting hits
exposed to first-phase ranking to the highest (lowest) values of some attribute.
The performance gain may be substantial, especially with an expensive first-phase function.
The quality loss is dependent on how well the chosen attribute correlates with the
first-phase score.
Documents which have no value of the chosen attribute will be taken as having the value 0.
See also graceful degradation.
match-phase { attribute: [numeric single value attribute] order: [ascending | descending] max-hits: [integer] diversity }
Name | Description |
---|---|
attribute | The quality attribute that decides which documents are a match if the match phase estimates that there will be more than max-hits hits. The attribute must be single-value numeric with fast-search enabled. It should correlate with the order which would be produced by a full query evaluation. No default. |
order | Whether the attribute should be used in ascending or descending order. |
max-hits | The max hits each content node should attempt to produce in the match phase. Usually a number like 10000 works well here. |
diversity | Guarantee a minimum result set diversity among the hits chosen by match-phase. |
Contained in rank-profile.
True or false. By default, Vespa uses loose type checking, where any query feature used but not defined in a query profile type is assumed to be a float.
Set true to cause a deploy failure on missing query property type definitions instead.
strict: true
Contained in match-phase.
Diversity is used to specify diversity in different phases - supported in match-phase.
It is used to guarantee a minimum result set diversity among the hits selected by match-phase and is only relevant
in that context - not as a general way of achieving diversity.
Specify the name of an attribute that will be used to provide diversity.
Result sets are guaranteed to get at least min-groups unique values from the diversity attribute from this phase.
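A document is considered as a candidate if:
- The query has not yet reached the max-hits number produced from this phase.
- The query has not yet reached the maximum number of candidates in one group, computed as the max-hits of the phase divided by min-groups.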
diversity { attribute: [numeric attribute] min-groups: [integer] }
Name | Description |
---|---|
attribute | Which attribute to use when deciding diversity. The attribute referenced must be a single-valued numeric or string attribute. |
min-groups | Specifies the minimum number of groups returned from the phase. Using this with |
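The candidate rule for diversity can be sketched in Python. This is a simplified illustration, assuming a hit is accepted while the phase has produced fewer than max-hits hits and its group holds fewer than max-hits divided by min-groups hits; the function name `select_diverse`, the input shape, and the rounding of the per-group cap are assumptions.

```python
from collections import defaultdict


def select_diverse(hits, max_hits, min_groups):
    """Select hits under a per-group cap.

    hits: iterable of (doc_id, group_value) pairs, best candidates first.
    """
    # Assumed cap per group: max-hits divided by min-groups (at least 1).
    per_group_cap = max(1, max_hits // min_groups)
    taken = defaultdict(int)
    selected = []
    for doc_id, group in hits:
        if len(selected) >= max_hits:
            break  # the phase has produced max-hits hits
        if taken[group] >= per_group_cap:
            continue  # this group is already full
        taken[group] += 1
        selected.append(doc_id)
    return selected
```

For example, with max-hits 4 and min-groups 2, no single group can contribute more than two hits, so later hits from other groups are let in.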
Contained in rank-profile.
The config specifying the first phase of ranking. See phased ranking with Vespa.
This is the initial ranking performed on all matching documents; you should therefore avoid doing computationally expensive relevancy calculations here.
By default, this will use the ranking feature nativeRank.
first-phase { [body] }
The body of a first-phase ranking statement consists of:
Name | Description |
---|---|
expression | Specify the ranking expression to be used for first phase of ranking - see ranking expressions. |
keep-rank-count | How many documents to keep the first phase top rank values for. Default value is 10000. |
rank-score-drop-limit | Drop all hits with a first phase rank score less than or equal to this floating point number. Use this to implement a rank cutoff. Default is |
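The rank cutoff described for rank-score-drop-limit can be sketched in Python. This only illustrates the comparison (hits scoring less than or equal to the limit are dropped); the function name and the hit representation are hypothetical.

```python
def apply_rank_score_drop_limit(hits, drop_limit):
    """Keep only hits whose first-phase score is strictly above the limit.

    hits: list of (doc_id, first_phase_score) pairs.
    """
    # A score equal to the limit is dropped too, per "less than or equal to".
    return [(doc_id, score) for doc_id, score in hits if score > drop_limit]
```

Note that a hit scoring exactly the limit is removed, so a limit of 0.0 drops hits with a score of 0.0.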
Contained in first-phase or second-phase.
Specify a ranking expression.
The expression can either be written directly or loaded from a file.
When writing it directly the syntax is:
expression: [ranking expression]
or
expression { [ranking expression] [ranking expression] [ranking expression] }
The second format is primarily a convenience feature when using long expressions, enabling them to be split over multiple lines.
Expressions can also be loaded from a separate file. This is useful when dealing with the long expressions generated by e.g. MLR. The syntax is:
expression: file:[path-to-expressionfile]
The path is relative to the location of the schema definition file.
The file itself must end with .expression. This suffix is optional in the sd-file.
Therefore expression: file:mlrranking.expression and expression: file:mlrranking are identical.
Both refer to a file called mlrranking.expression in the schemas directory.
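The suffix rule for expression files can be sketched in Python. This is an illustration of the naming rule only, assuming the schema file lives in a directory called schemas; the function name `resolve_expression_file` and its parameters are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_expression_file(reference: str, schema_dir: str = "schemas") -> PurePosixPath:
    """Map a file: reference from a schema to the expression file it names."""
    # Strip the file: prefix used in the schema.
    if reference.startswith("file:"):
        reference = reference[len("file:"):]
    # The .expression suffix is optional in the schema but required on disk.
    if not reference.endswith(".expression"):
        reference += ".expression"
    # The path is relative to the location of the schema definition file.
    return PurePosixPath(schema_dir) / reference
```

Both `file:mlrranking` and `file:mlrranking.expression` resolve to the same path under this sketch.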
Contained in rank-profile.
List of extra rank features to be dumped when using the query-argument rankfeatures.
rank-features: [feature] [feature]
or
rank-features { [feature] [feature] }
Any number of ranking features can be listed on each line, separated by space.
Contained in rank-profile.
List of inputs: Query features consumed by ranking expressions in this profile.
Query features are set either as a request property, or equivalently from a Searcher, by calling query.getRanking().getFeatures().put("query(myInput)", myValue).
Query feature types can also be declared in query profile types, but declaring inputs in the profile needing them is usually preferable.
Inputs are inherited from inherited profiles.
inputs { name [type]? (: value)? }
Name | Description |
---|---|
name | The name of the input, written either as the full feature name query(myName), or just as name. |
type | The type of the input, either double or a tensor type. If omitted, the type is double. |
value | An optional default value, used if this input is not set in the query. A number, or a tensor in literal form. |
Input examples:
inputs {
    myDouble: 0.5
    query(myOtherDouble) double
    query(myArray) tensor(x[3])
    query(myMap) tensor(key{}):{key1: 1.0, key2: 2.0}
}
Contained in rank-profile.
List of constants available in ranking expressions, resolved and optimized at configuration time.
Constants are inherited from inherited profiles, and from the schema itself.
constants { name [type]?: value|file:[path] }
Name | Description |
---|---|
name | The name of the constant, written either as the full feature name constant(myName), or just as name. |
type | The type of the constant, either double or a tensor type. If omitted, the type is double. |
value | A number, a tensor in literal form, or file: followed by a path from the application package root to a file containing the constant. The file must be stored in the tensor JSON format and end with .json. The file may be lz4 compressed, in which case the ending must be .json.lz4. |
Constant examples:
constants {
    myDouble: 0.5
    constant(myOtherDouble) double: 0.6
    constant(myArray) tensor(x[3]):[1, 2, 3]
    constant(myMap) tensor(key{}):{key1: 1.0, key2: 2.0}
    constant(myLargeTensor) tensor(x[10000]): file:constants/myTensor.json.lz4
}
Contained in rank-profile.
List of generic properties, in the form of key/value pairs to be used by ranking features.
Examples.
rank-properties { key: value }
Name | Description |
---|---|
key | Name of the property. |
value | A number or any string. Must be quoted if it contains spacing. |
Contained in rank-profile.
Define a named function that can be referenced as a part of the ranking expression,
or (if having no arguments) as a feature.
A function accepts any number of arguments.
function [name]([arg1], [arg2], [arg3]) { expression: … }
or
function [name] ([arg1], [arg2], [arg3]) { expression { [ranking expression] [ranking expression] … } }
Note that the parentheses are required after the name. A rank-profile example is shown below:
rank-profile default inherits default {
    function myfeature() {
        expression: fieldMatch(title) + freshness(timestamp)
    }
    function otherfeature(foo) {
        expression{ nativeRank(foo, body) }
    }
    first-phase {
        expression: myfeature * 10
    }
    second-phase {
        expression: otherfeature(title) * myfeature
    }
    summary-features: myfeature
}
You cannot include functions that accept arguments in summary features.
Adding the inline modifier will inline this function in the calling expression if it also has no arguments.
This is faster for small and cheap functions (and more expensive for others).
Contained in rank-profile.
The config specifying the second phase of ranking. See phased ranking with Vespa.
This is the optional re-ranking phase performed on the top ranking hits from the first-phase, and where you should put any advanced relevancy calculations, for example Machine Learned Ranking (MLR) models.
By default, no second-phase ranking is performed.
second-phase { [body] }
The body of a second-phase ranking statement consists of:
Name | Description |
---|---|
expression | Specify the ranking expression to be used for second phase of ranking. For a description, see the ranking expression documentation. |
rerank-count | Optional argument. Specifies the number of hits to be re-ranked in the second phase. Default value is 100. This can also be set in the query. Note that this value is local to each node involved in a query. |
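The relationship between the two phases and rerank-count can be sketched in Python for a single node. This is a conceptual illustration, not the engine's algorithm; the function name `two_phase_rank` is hypothetical, and placing the hits that are not re-ranked after the re-ranked ones is an assumption of this sketch.

```python
def two_phase_rank(doc_ids, first_phase, second_phase, rerank_count=100):
    """Rank documents on one node using a cheap and an expensive scoring function.

    first_phase and second_phase are callables mapping a doc id to a score.
    """
    # First phase: score and order every matching document.
    ranked = sorted(doc_ids, key=first_phase, reverse=True)
    # Second phase: re-rank only the top rerank_count hits.
    top, rest = ranked[:rerank_count], ranked[rerank_count:]
    reranked = sorted(top, key=second_phase, reverse=True)
    # Assumption: hits outside the re-ranked window keep their first-phase order.
    return reranked + rest
```

Because rerank-count is local to each node, a query spanning several nodes would run the expensive expression on up to rerank-count hits per node.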
Contained in rank-profile.
List of rank features to be included with each result hit, in the summaryfeatures field.
Also see feature values in results.
If not specified, the features are as specified in the parent profile (if any). To inherit the features from the parent profile and specify additional features, specify explicitly that the features should be inherited from the parent as shown below. Refer to schema inheritance for examples.
The rank features specified here are computed in the fill phase of multiphased queries.
summary-features: [feature] [feature]…
or
summary-features [inherits parent-profile]? { [feature] [feature] }
Any number of rank features separated by space can be listed on each line.
Contained in rank-profile.
List of rank features to be included with each result hit, in the matchfeatures field.
Also see feature values in results.
If not specified, the features are as specified in the parent profile (if any). To inherit the features from the parent profile and specify additional features, specify explicitly that the features should be inherited from the parent as shown below, also see schema inheritance.
match-features is similar to summary-features, but the rank features specified here are computed in the first phase of multiphase searching, also called the match phase. This gives a different performance trade-off, for details see feature values in results.
match-features: [feature] [feature]…
or
match-features [inherits parent-profile]? { [feature] [feature] }
Any number of ranking features separated by space can be listed on each line.
Contained in rank-profile.
Specifies mutating operations you can apply to each of the documents that make it through the 4 query phases: on-match, on-first-phase, on-second-phase and on-summary.
mutate { [phase name] { [attribute name] [operation] [numeric_value] } }
The phases are:
Name | Description |
---|---|
on-match | All documents that satisfy the query. |
on-first-phase | All documents from on-match that are not dropped due to the optional rank-score-drop-limit. |
on-second-phase | All documents from on-first-phase that make it onto the second-phase heap. |
on-summary | All documents for which a summary is requested. |
The attribute must be a single value numeric attribute, enabled as mutable. It must also be defined outside of the document clause.
Operation | Description |
---|---|
= | Set the value of the attribute to the given value. |
+= | Add the given value to the attribute |
-= | Subtract the given value from the attribute |
Find examples and use cases in rank phase statistics.
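The three mutate operations can be illustrated with a small Python sketch. It models only the arithmetic on a single numeric attribute value; the function name `apply_mutation` is hypothetical.

```python
def apply_mutation(value, operation, operand):
    """Apply one mutate operation to a numeric attribute value."""
    if operation == "=":
        return operand          # set the attribute to the given value
    if operation == "+=":
        return value + operand  # add the given value
    if operation == "-=":
        return value - operand  # subtract the given value
    raise ValueError(f"unsupported operation: {operation}")
```

For instance, applying `+=` with 1 on every on-match pass would turn the attribute into a counter of how often a document matched.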
Prefer to define constants in the rank profiles that need them, with rank profile inheritance to avoid repetition. See constants.
Contained in schema.
This defines a named constant tensor located in a file with a given type that can be used in ranking expressions using the rank feature constant(name):
constant [name] { [body] }
The body of a constant must contain:
Name | Description | Occurrence |
---|---|---|
file | Path to the file containing this constant, relative to the application package root. The file must be stored in the tensor JSON format and end with .json. The file may be lz4 compressed, in which case the ending must be .json.lz4. | One |
type | The type of the constant tensor, refer to tensor-type-spec for reference. | One |
constant my_constant_tensor {
    file: constants/my_constant_tensor_file.json
    type: tensor<float>(x{},y{})
}
This example has a constant tensor with two mapped dimensions, x and y.
An example JSON file with such tensor constant:
{ "cells": [ { "address": { "x": "a", "y": "b"}, "value": 2.0 }, { "address": { "x": "c", "y": "d"}, "value": 3.0 } ] }
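To make the cell layout concrete, the following Python sketch parses the JSON shown above into a dictionary keyed by dimension labels. It is only an illustration of the file structure, using the standard json module; the function name `cells_by_address` is hypothetical and unrelated to how Vespa loads constants.

```python
import json

# The example tensor file content from above.
TENSOR_JSON = """
{ "cells": [ { "address": { "x": "a", "y": "b"}, "value": 2.0 },
             { "address": { "x": "c", "y": "d"}, "value": 3.0 } ] }
"""


def cells_by_address(tensor_json, dimensions=("x", "y")):
    """Map each cell's (x, y) labels to its value."""
    data = json.loads(tensor_json)
    return {
        tuple(cell["address"][dim] for dim in dimensions): cell["value"]
        for cell in data["cells"]
    }
```

Each cell names one label per mapped dimension, so the tensor here holds two values: one at (a, b) and one at (c, d).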
When an application with tensor constants is deployed, the files are distributed to the content nodes before the new configuration is used by the search nodes. Incremental changes to constant tensors are not supported. When changed, replace the old file with a new one and re-deploy the application or create a new constant with a new name in a new file.
Contained in schema.
Whether raw fields should be rendered as a base64 encoded string in summary, the same way as in json feed format, rather than an escaped string. The default is true.
Contained in rank-profile or schema.
This defines a named ONNX model located in a file
that can be used in ranking expressions using the "onnx" rank feature.
Prefer to define onnx models in the rank profiles using them. Onnx models are inherited from parent profiles, and from the schema.
onnx-model [name] { [body] }
The body of an ONNX model must contain:
Name | Description | Occurrence |
---|---|---|
file | Path to the location of the file containing the ONNX model. The path is relative to the root of the application package containing this sd-file. | One |
input | An input to the ONNX model. The ONNX name as given in the model as well as the source for the input is specified. | Zero to many |
output | An output of the ONNX model. The ONNX name as given in the model as well as the name for use in Vespa is specified. If no outputs are defined and none is referred to from the rank feature, the first output defined in the model is used. | Zero to many |
For more details including examples, see ranking with ONNX models.
Contained in schema.
An explicitly defined document summary. By default, a document summary
named default is created. Using this element, other document
summaries containing a different set of fields can be created.
document-summary [name] inherits [document-summary] { [body] }
The inherits attribute is optional.
If defined, it contains the name of one other document summary in the same schema.
This will cause summary fields to be inherited from the referenced document summary.
Refer to schema inheritance for examples.
The body of a document summary consists of:
Name | Description | Occurrence |
---|---|---|
from-disk | Marks this summary as accessing fields on disk | Zero or one |
summary | A summary field in this document summary. | Zero to many |
omit-summary-features | Specifies that summary-features should be omitted from this document summary. Use this to reduce CPU cost in multiphase searching when using multiple document summaries to fill hits, and only some of them need the summary features that are specified in the rank-profile. | Zero or one |
Use the summary query parameter to choose a document summary in searches. See also document summaries.
Contained in field, schema or index.
Sets how to stem a field or an index, or how to stem by default.
Read more on stemming.
stemming: [stemming-type]
The stemming types are:
Type | Description |
---|---|
none | No stemming: Keep words unchanged |
best | Use the 'best' stem of each word according to some heuristic scoring. This is the default setting |
shortest | Use the shortest stem of each word |
multiple | Use multiple stems. Retains all stems returned from the linguistics library |
Contained in field.
Sets normalizing to be done on this field.
Default is to normalize.
normalizing: [normalizing-type]
Type | Description |
---|---|
none | No normalizing. |
Contained in attribute, field or index.
Makes an index or attribute available under an additional name:
alias [index/attr-name]: [alias]
If the index/attribute name is skipped, the containing field or index name is used. Alias names can be any name string, dots are allowed as well.
Contained in field. Specifies details of the dictionary used in the inverted index of the field.
Applies only to attributes annotated with fast-search.
You can specify either btree or hash, or both.
Note that prefix search for strings and range search for numeric fields will fall back to full scan
if using hash.
hash is primarily intended for use when you have many unique terms with few occurrences (short posting lists),
where the dictionary lookup cost is significant.
Use hash for fields with high uniqueness (high cardinality),
for example an 'id' field which is unique in the corpus, where the posting list is always of size 1.
Normally, btree is your best choice as it offers reasonable performance
for exact, prefix and range dictionary lookups.
This is also the default.
Find more details in attribute index structures.
In addition, one can specify an uncased or cased dictionary for string attributes;
the default is uncased.
This setting is sanity checked against match casing.
In an uncased dictionary,
casing is normalized by lowercasing so that 'bear' equals 'Bear' equals 'BEAR'.
In a cased dictionary, they will all be different.
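The effect of the casing setting can be pictured with a toy dictionary that maps terms to posting lists. This Python sketch is only a model of the behavior described above, not the attribute implementation; build_dictionary and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_dictionary(docs, cased):
    """Map each term to the sorted list of document ids containing it."""
    postings = defaultdict(set)
    for doc_id, term in docs:
        # An uncased dictionary normalizes casing by lowercasing the term.
        key = term if cased else term.lower()
        postings[key].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


docs = [(1, "bear"), (2, "Bear"), (3, "BEAR")]
uncased = build_dictionary(docs, cased=False)  # one entry shared by all three documents
cased = build_dictionary(docs, cased=True)     # three separate entries
```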
Contained in field or struct-field.
Specifies a property of an index structure attribute:
attribute [attribute-name]: [property]
or
attribute [attribute-name] { [property] [property] … }
Read the introduction to attributes. The attribute name can be skipped, in which case the field name is used. See the actions required when adding or modifying attributes. Properties:
Property | Description |
---|---|
fast-search | Create a dictionary / index structure to speed up search in the attribute. Read more. |
fast-access | If searchable-copies < redundancy, use fast-access to load the attribute in memory on all nodes with a document replica. Use this for fast access when doing partial updates and when used in a selection expression for garbage collection. If searchable-copies == redundancy (default), this property is a no-op. Read more. |
fast-rank | Only supported for tensor field types with at least one mapped dimension. Ensures that the per-document tensors are stored in-memory using a format that is more optimal for ranking expression evaluation. This comes at the cost of using more memory. Without this setting these tensors are serialized in-memory, which requires de-serialization as part of ranking expression evaluation. See tensor performance. |
paged | This can reduce memory footprint by allowing paging the attribute data out of memory to disk. Not supported for tensor with fast-rank and predicate types. See paged attributes for details. Do not enable paged before fully understanding the consequences. |
alias | An alias for the attribute. Add an attribute name before the colon to specify an alias for another attribute than the one given by field name. |
sorting | The sort specification for this attribute. |
distance-metric | Specifies the distance metric to use with the nearestNeighbor query operator. Only relevant for tensor attribute fields. |
mutable | Marks the attribute as a special mutable attribute that can be updated by a mutate operation during query evaluation. |
An attribute is multivalued if it is assigned multiple values during indexing, either by using a multivalued field type like array or map, by using e.g. split / for_each, or by letting multiple fields write their value to the attribute field.
Note that normalizing and tokenization are not supported for attribute fields. Queries in attribute fields are neither normalized nor stemmed. Use index on fields to enable this. Both index and attribute can be set on a field.
Contained in attribute or field.
Specifies how sorting should be done.
sorting: [property]
or
sorting { [property] … }
Property | Description |
---|---|
order | ascending (default) or descending. Used unless overridden using order by in query. |
function | Sort function: uca (default), lowercase or raw. Note that if no language or locale is specified in the query, the field, or generally for the query, lowercase will be used instead of uca. See order by for details. |
strength | UCA sort strength, default primary - see strength for values. Values set in the query override the schema definition. |
locale | UCA locale, default none, indicating that it is inferred from the query. It should only be set here if the attribute is filled with data in one language only. See locale for details. Values set in the query override the schema definition. |
Contained in attribute.
Specifies the distance metric to use with the nearestNeighbor query operator
to calculate the distance between document positions and the query position.
Only relevant for tensor attribute fields.
Distance metrics used with HNSW must obey all the defining properties of a metric space:
distance(q,d) = distance(d,q)
distance(q,q) = 0
distance(q,d) <= distance(q,r) + distance(r,d)
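The three properties above can be spot-checked numerically for a candidate distance function. The Python sketch below tests them on a small set of sample points; passing such a check is evidence rather than proof, and the helper names (obeys_metric_properties, negative_dot) are made up for this illustration.

```python
import itertools
import math


def euclidean(q, d):
    """Plain Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, d)))


def negative_dot(q, d):
    """Negated inner product; included as an example that is not a metric."""
    return -sum(a * b for a, b in zip(q, d))


def obeys_metric_properties(distance, points, tol=1e-9):
    """Check identity, symmetry and the triangle inequality on sample points."""
    for q in points:
        if abs(distance(q, q)) > tol:           # distance(q,q) = 0
            return False
    for q, d in itertools.permutations(points, 2):
        if abs(distance(q, d) - distance(d, q)) > tol:   # symmetry
            return False
    for q, r, d in itertools.permutations(points, 3):
        if distance(q, d) > distance(q, r) + distance(r, d) + tol:  # triangle inequality
            return False
    return True


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
euclidean_ok = obeys_metric_properties(euclidean, points)
negative_dot_ok = obeys_metric_properties(negative_dot, points)
```

On these points the Euclidean distance passes, while the negated inner product already fails the distance(q,q) = 0 property for any non-zero vector.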
The distance metric used must match the distance metric used during representation learning (when the model was trained). If you are using an "off-the-shelf" model to vectorize your data, ensure that the distance metric matches the one recommended for use with the model. Different types of vectorization models use different types of distance metrics.
It is possible to do an order-preserving transformation from a maximum inner product space to a metric space when the maximum vector length (max norm) is known. See this high level guide based on section 3.1 Order Preserving Transformations in this paper. The downside of this transformation is that the maximum length or magnitude needs to be known prior to indexing the vectors in Vespa, making it practical only for batch oriented vector indexing.
When changing the distance-metric, the content nodes must be restarted to rebuild the HNSW index -
see changes that require restart but not re-feed.
The Dense passage retrieval sample app uses the mentioned transformation as the original text to vector model was trained using MIP. The MS Marco ranking sample app uses a text to vector model which used cosine similarity during training so no transformation was required.
distance-metric: [metric]
Metric | Description |
---|---|
euclidean | The Euclidean distance, calculated by distance(q,d) = sqrt(sum((q - d)^2)). This is the default distance-metric used if not specified. |
angular | Computes the angle between the vectors and uses that as the distance metric. The range is [0,pi], which is the angular distance. Note that this is closely related to the cosine similarity, which is just cos(angle). If possible, it is slightly better for performance to normalize both query and document vectors to unit vectors and use the innerproduct metric instead. Using the angle or the cosine similarity produces the same ordering. One can get the cosine score for returned hits by calculating the cosine of the angle returned by the distance rank feature: rank-profile cosine { first-phase { expression: cos(distance(field, embedding)) } } |
innerproduct |
Must only be used when query and document vectors are normalized to unit vectors. There is no runtime validation which checks that vectors are unit vectors as it would impact performance negatively. Using innerproduct with vectors that are not unit vectors violates the mentioned metric space properties and causes unpredictable nearest neighbor search.
The length, magnitude, or norm of a vector x is calculated as sqrt(sum(x^2)).
The Vespa innerproduct computes the
cosine similarity
and uses |
geodegrees |
Only valid for geographical coordinates (two-dimensional vectors containing latitude and longitude on Earth, in degrees). Computes the great-circle distance (in kilometers) between two geographical points using the Haversine formula. See geodegrees system test for an example. |
hamming |
Hamming distance metric.
This isn't useful for floating-point data since it only counts the number of dimensions
where the vectors have different coordinates, meaning you only get 1 bit of information
from each floating-point number. Instead, it should be used for binary data where each
bit is considered a separate coordinate. Practically, this means you should use the
int8 cell value type
for your tensor, with the usual encoding from bit pattern to numerical value, for example:
[-128,127] both to have a more natural format
for representing binary data, and to avoid the overhead of parsing a large json array of numbers.
|
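To make the angular and hamming descriptions concrete, here is a small Python sketch. It is a stand-alone illustration, not Vespa code: the function names are hypothetical, and bits_to_int8 assumes the usual two's complement mapping from an 8-bit pattern to a value in [-128,127].

```python
import math


def angular(q, d):
    """Angle in radians between two vectors, in the range [0, pi]."""
    dot = sum(a * b for a, b in zip(q, d))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_d = math.sqrt(sum(b * b for b in d))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_q * norm_d)))
    return math.acos(cosine)


def bits_to_int8(bits):
    """Pack 8 bits (most significant first) into a signed int8 value."""
    value = int("".join(str(b) for b in bits), 2)
    return value - 256 if value > 127 else value


def hamming_int8(q, d):
    """Count differing bits between two vectors of int8 cells."""
    return sum(bin((a ^ b) & 0xFF).count("1") for a, b in zip(q, d))


right_angle = angular((1.0, 0.0), (0.0, 1.0))   # orthogonal vectors
all_ones = bits_to_int8([1] * 8)                # bit pattern 11111111
low_nibble = bits_to_int8([0, 0, 0, 0, 1, 1, 1, 1])
differing_bits = hamming_int8([all_ones], [low_nibble])
```

Taking math.cos of the angle gives back the cosine similarity, which is why ordering hits by angle or by cosine yields the same ranking.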
Contained in field or summary.
Highlight matching query terms in the summary:
bolding: on
The default is no bolding; set bolding: on to enable it.
Note that this command is overridden by summary: dynamic.
If both are specified, bolding will be ignored.
The difference between bolding and summary: dynamic
is that the latter provides a dynamic abstract in addition to highlighting query terms,
while the former only highlights.
Bolding is only supported for index fields of type string or array<string>.
The default XML element used to highlight the search terms is <hi> -
to override, set container.qr-searchers configuration. Example using <strong>:
<container>
  <search>
    <config name="container.qr-searchers">
      <tag>
        <bold>
          <open>&lt;strong&gt;</open>
          <close>&lt;/strong&gt;</close>
        </bold>
        <separator>...</separator>
      </tag>
    </config>
  </search>
</container>
Maximum field byte length for bolding is 64 MB -
field values larger than this will be represented as a snippet, as in summary: dynamic.
Contained in field.
Sets the numerical id of this field.
All fields have a document-internal id for transfer and storage.
Ids are usually determined programmatically as a 31-bit number.
Some storage and transfer space can be saved by instead explicitly setting ids to a 7-bit number.
id: [positive integer]
An id must satisfy these requirements:
Contained in field or schema.
Sets index parameters.
Content in fields with index is normalized and
tokenized by default.
This element can be single- or multivalued:
index [index-name]: [property]
or
index [index-name] { [property] [property] … }
The index name can be skipped inside fields, causing the index name to be the field name. Parameters:
Property | Description | Occurrence |
---|---|---|
alias | Specify an alias to this index to be available in searches. | Zero to many |
stemming | Set the stemming of this index. Indexes without a stemming setting get their stemming setting from the fields added to the index. Setting this explicitly is useful if fields with conflicting stemming settings are added to this index. | Zero to one |
arity | Set the arity value for a predicate field. The data type for the containing field must be predicate. | One (mandatory for predicate fields), else zero. |
lower-bound | Set the lower bound value for a predicate field. The data type for the containing field must be predicate. | Zero to one. |
upper-bound | Set the upper bound value for predicate fields. The data type for the containing field must be predicate. | Zero to one. |
dense-posting-list-threshold | Set the dense posting list threshold value for predicate fields. The data type for the containing field must be predicate. | Zero to one. |
enable-bm25 | Enable this index field to be used with the bm25 rank feature. This creates posting lists for the indexes for this field that have interleaved features in the document id streams. This makes it fast to compute the bm25 score. | Zero to one. |
hnsw | Specifies that an HNSW index should be built to speed up approximate nearest neighbor search. Only useful for tensor attribute fields with one indexed dimension using the nearestNeighbor query operator. | Zero to one. |
Contained in index.
Specifies that an HNSW index should be built to speed up approximate nearest neighbor search.
Only useful for tensor attribute fields with one indexed dimension using the
nearestNeighbor query operator.
This implements a modified version of the
Hierarchical Navigable Small World (HNSW) graphs algorithm (paper).
hnsw { [parameter]: [value] [parameter]: [value] ... }
The following parameters are used when building the index graph:
Parameter | Description |
---|---|
max-links-per-node | Specifies how many links per HNSW node to select when building the graph. Default value is 16. In HNSWlib (implementation based on the paper) this parameter is known as M. |
neighbors-to-explore-at-insert | Specifies how many neighbors to explore when inserting a document in the HNSW graph. Default value is 200. In HNSWlib this parameter is known as ef_construction. |
The distance metric specified on the attribute is used when building and searching the graph. Example:
index {
    hnsw {
        max-links-per-node: 16
        neighbors-to-explore-at-insert: 100
    }
}
See Approximate Nearest Neighbor Search using HNSW Index for examples of use, and see Approximate Nearest Neighbor Search in Vespa - Part 1 blog post for how the Vespa team selected HNSW as the baseline algorithm for extension and integration in Vespa.
Contained in field or struct-field.
One or more Indexing Language instructions used to produce index, attribute
and summary data from this field. Indexing instructions have pipeline
semantics similar to unix shell commands. The value of the field
enters the pipeline during indexing and the pipeline puts the value
into the desired index structures, possibly doing transformations and
pulling in other values along the way.
indexing: [indexing-statement]
or
indexing { [indexing-statement]; [indexing-statement]; … }
If the field containing this is defined outside the document, it must start with an indexing statement which outputs a value (either "input [fieldname]" to fetch a field value, or a literal, e.g. "some-value"). Fields in documents will use the value of the enclosing field as input (input [fieldname]) if one isn't explicitly provided.
Specify the operations separated by the pipe (|) character.
For advanced processing needs,
use the indexing language,
or write a document processor.
Supported expressions for fields are:
expression | description |
---|---|
attribute | Attribute is used to make a field available for sorting, grouping, ranking and searching using match mode word. |
index |
Creates a searchable index for the values of this field
using match mode |
set_language | Sets document language - details. |
summary | Includes the value of this field in a summary field. Modify summary output by using summary: (e.g. to generate dynamic teasers). |
When combining both index and attribute in the indexing statement for a field,
e.g. indexing: summary | attribute | index,
the match mode becomes text for the field.
Searches in this field will then search the index, not the contents of the attribute.
Find examples and more details in the Text Matching and Ranking guide.
Contained in field.
Vespa will normally rewrite indexing statements extensively to
implement the technical tasks which are required to carry out the
intentions of the indexing statement. The rewriting done can be
controlled using this element.
indexing-rewrite: none
Include this to let an indexing statement pass through
unaltered. Note that such statements must begin with an
input <fieldname>, get_var or constant expression.
To use this, you should understand which rewrites Vespa
does, and be certain that your indexing statement can do without them.
This statement must be placed somewhere below the
indexing statement in the field.
Contained in field, fieldset or struct-field.
Sets the matching method to use for this field to something other than the default token matching.
match: [property]
or
match { [property] [property] … }
Whether the match type is text, word or exact,
all term matching will be done after normalization
and locale independent lowercasing (in that order).
Find examples and more details in the Text Matching and Ranking guide. Also see search using regular expressions.
Property | Valid with | Description |
---|---|---|
text |
index |
Default for string fields with |
exact |
index, attribute | Can not be combined with text matching. The field is matched exactly: Strings containing any characters whatsoever will be indexed and matched as-is. In queries, the exact match string ends at the exact match terminator (below).
A field with |
exact-terminator |
index, attribute |
Only valid for match { exact exact-terminator: "@%" }
on a field called
Example using the default terminator: If someword AND (tag:!*!@@ OR tag:(kanoo)@@)matches documents with someword
and either !*! or (kanoo) as a tag.
Note that without the @@ terminating the second tag string,
the second tag value would be (kanoo)) .
|
word |
index, attribute | This is the default matching mode for string attributes. Can not be combined with text matching. Word matching is like exact matching, but with more advanced query parsing. The query terms are heuristically parsed, taking into account some usual query syntax characters; one can also use double quotes to include space, star, or exclamation marks.
Example: If foo AND (artist:"'N Sync" OR artist:"*NSYNC" OR artist:A*teens OR artist:"Wham!")
matches documents with
Note that without the quotes, the space in |
prefix |
attribute | Has no effect as attributes always support prefix searches. Prefix matching must be specified in the query. See also regular expressions. |
cased |
attribute | Enable case-sensitive matching. Only relevant for string attributes. |
uncased |
index, attribute | Enable case-insensitive matching. This is the default for all string fields. |
max-length |
index | Limit the length of the field that will be used for matching. |
gram |
index | This field is matched using n-grams. For example, with the default gram size 2 the string "hi blue" is tokenized to "hi bl lu ue" both in the index and in queries to the index. N-gram matching is useful mainly as an alternative to segmentation in CJK languages. Typically, it results in increased recall and lower precision. However, as Vespa usually uses proximity in ranking, the precision offset may not be of much importance. Grams consume more resources than other matching methods because both indexes and queries will have more terms, and the terms contain repetitions of the same letters. On the other hand, CPU intensive CJK segmentation is avoided. It may also be used for substring matching in general. |
gram-size |
index | A positive, nonzero, number, default 2. Sets the gram size when gram matching is used. Example: match { gram gram-size: 3 } |
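The gram tokenization described in the table can be sketched in a few lines of Python. This is an illustration only, not the Vespa tokenizer: the function name grams is hypothetical, and keeping words that are no longer than the gram size as single tokens is an assumption that happens to reproduce the "hi blue" example.

```python
def grams(text, gram_size=2):
    """Split text on whitespace and emit overlapping n-grams per word."""
    tokens = []
    for word in text.split():
        if len(word) <= gram_size:
            # Assumption: short words are kept as a single token.
            tokens.append(word)
        else:
            tokens.extend(
                word[i:i + gram_size]
                for i in range(len(word) - gram_size + 1)
            )
    return tokens


default_grams = grams("hi blue")        # gram size 2
trigrams = grams("hi blue", gram_size=3)
```

With the default gram size the tokens are hi, bl, lu and ue, matching the example in the gram row. Each word of length n longer than the gram size yields n - gram-size + 1 terms, which is why grams increase the number of terms in both index and queries.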
Contained in field or rank-profile.
Set the kind of ranking calculations which will be done for the field. Even though the
actual ranking expressions decide the ranking, this setting tells Vespa which preparatory calculations
and which data structures are needed for the field.
rank [field-name]: [ranking settings]
or
rank { [ranking setting] }
The field name should only be specified when used inside a rank-profile. The following ranking settings are supported in addition to the default:
Ranking setting | Description |
---|---|
filter |
Indicates that matching in this field should use fast bit vector data structures only. This saves CPU during matching, but only a few simple ranking features will be available for the field. This setting is appropriate for fields typically used for filtering or simple boosting purposes, like filtering or boosting on the language of the document. For index fields, this setting does not change index formats but helps choose the most compact representation when matching against the field. For attribute fields with fast-search this setting builds additional posting list representations (bit vectors) which can speed up query evaluation significantly. See feature tuning and the practical search performance guide. |
normal |
The reverse of |
Related: See the filter query annotation for how to annotate query terms as filters.
Contained in fieldset, field or struct-field.
Specifies a function to be performed on query terms to the indexes of this field when searching.
The Search Container server has support for writing Vespa Searcher plugins which process these commands.
query-command: [an identifier or quoted string]
If you write a plugin searcher which needs some index-specific configuration parameter, that parameter can be set here.
There is one built-in query-command available: phrase-segmenting.
If this is set, terms connected by non-word characters in user queries (such as "a.b")
will be parsed to a phrase item instead of the default, an AND item where these terms have connectivity
set to 1.
Contained in field or rank-profile.
Selects the low-level rank settings to be used for this field when using nativeRank.
rank-type [field-name]: [rank-type-name]
The field name can be skipped inside fields. Defined rank types are:
Type | Description |
---|---|
identity | Used for fields which contain only what this document is, e.g. "Title". Complete identity hits will get a high rank. |
about | Some text which is (only) about this document, e.g. "Description". About hits get high rank on partial matches and higher for matches early in the text and repetitive matches. This is the default rank type. |
tags | Used for simple tag fields of type tag. The tags rank type uses a logarithmic table to give more relative boost in the low range: As tags are added they should have significant impact on rank score, but as more and more tags are added, each new tag should contribute less. |
empty | Gives no relevancy effect on matches. Used for fields you just want to treat as filters. |
For nativeRank one can specify a rank type per field.
If the supported rank types do not meet requirements,
one can explicitly configure the native rank features using rank-properties.
See the native rank reference for more information.
Contained in field or struct-field.
Specifies the name of the document summaries which should contain this field.
summary-to: [summary-name], [summary-name], …
Fields with summary will always be part of the default summary regardless of this setting. Use explicit document-summary instead. See also document summaries.
Contained in field, document-summary or struct-field.
Declares a summary field.
summary: [property]
or
summary [name] type [type] { [body] }
The summary name can be skipped if this is set inside a field. The name will then be the same as the name of the source field. In fields, the summary type can also be skipped, in which case the type will be determined by the field type. The summary data types available are the same as the document field data types. full summary is the default. Long field values (like document content fields) should be made dynamic. The body of a summary may contain:
Name | Description | Occurrence |
---|---|---|
full |
Returns the full field value in the summary (the default). | Zero to one |
bolding: on |
Specifies whether the content of this field should be bolded. Only supported for index fields of type string or array<string>. | Zero to one |
dynamic | Make the value returned in results from this summary field a dynamic abstract of the source field by extracting fragments of text around matching query terms. Matching query terms will also be highlighted, similar to the bolding feature. This highlighting is not affected by the query-argument bolding. The default XML element used to highlight query terms is <hi> - refer to bolding for how to configure. dynamic is only supported for index fields of type string or array<string>. For array<string> fields, a dynamic abstract is created per string item in the array. | Zero to one |
source |
Specifies the name of the field or fields from which the value of this summary field should be fetched. If multiple fields are specified, the value will be taken from the first field if that has a value, from the second if the first one is empty and so on. source: [field-name], [field-name], … When this is not specified, the source field is assumed to be the field with the same name as the summary field. Refer to attribute and non-attribute fields for modifying a schema. |
Zero to one |
to | Specifies the name of the document summaries this should be included in. to: [document-summary-name], [document-summary-name], … This can only be specified in fields, not in explicit document summaries. When this is not specified, the field will go to the default document summary. | Zero to one |
matched-elements-only |
Specifies that only the matched elements in a searchable array of primitive, weightedset, array of struct or map type field are returned as part of document summary. For array of struct or map type fields this is typically used in accordance with the sameElement operator, but it can also be used when searching directly on a sub struct field. Is also supported when the field is imported. Is not supported for index fields in indexed search. Example .sd files from system tests: |
Zero to one |
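The fallback rule described for source (use the first listed field that has a value) can be modelled as follows. This Python sketch is a simplification for illustration: resolve_summary_source is a hypothetical name, the document is a plain dict, and treating None, empty strings and empty collections as "no value" is an assumption.

```python
def resolve_summary_source(document, sources):
    """Return the value of the first source field that is not empty."""
    for name in sources:
        value = document.get(name)
        # Assumption: missing fields and empty values count as "no value".
        if value not in (None, "", [], {}):
            return value
    return None


doc = {"title": "", "headline": "Breaking", "body": "Full text"}
chosen = resolve_summary_source(doc, ["title", "headline", "body"])
nothing = resolve_summary_source(doc, ["missing"])
```

Here title is empty, so the value is taken from headline, the second field in the source list.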
Read more about document summaries.
Contained in field.
The weight of a field - the default is 100.
The field weight is used when calculating the rank scores.
weight: [positive integer]
Contained in field of type weightedset.
Properties of a weighted set.
weightedset: [property]
or
weightedset { [property] [property] … }
Property | Description | Occurrence |
---|---|---|
create-if-nonexistent | If the weight of a key is adjusted in a document using a partial update increment or decrement command, but the key is currently not present, the command will be ignored by default. Set this to have the key created in this case instead. This is useful when the weight is used to represent the count of the key. Example: field tag type weightedset<string> { indexing: attribute | summary weightedset { create-if-nonexistent remove-if-zero } } | Zero to one |
remove-if-zero |
This is the companion of create-if-nonexistent for the converse case:
By default keys may have zero as weight.
With this turned on, keys whose weight is adjusted (or set) to zero, will be removed. |
Zero to one |
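The interplay of the two properties can be modelled with a plain dict of key weights. The Python sketch below is a simplified model, not the update implementation; the function name increment is hypothetical, and starting a newly created key at weight 0 before applying the change is an assumption.

```python
def increment(weights, key, delta, create_if_nonexistent=False, remove_if_zero=False):
    """Apply a partial-update style weight change to a weighted set (a dict)."""
    if key not in weights:
        if not create_if_nonexistent:
            return weights          # default: the update is ignored
        weights[key] = 0            # assumption: new keys start at weight 0
    weights[key] += delta
    if remove_if_zero and weights[key] == 0:
        del weights[key]            # drop keys whose weight reaches zero
    return weights


ignored = increment({}, "vespa", 1)
created = increment({}, "vespa", 1, create_if_nonexistent=True)
kept_at_zero = increment({"vespa": 1}, "vespa", -1)
removed = increment({"vespa": 1}, "vespa", -1, remove_if_zero=True)
```

With both properties set, the weight behaves like a counter: the first increment creates the key and the decrement that brings it back to zero removes it.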
Contained in schema.
Defines an annotation type, to be used by the Annotations API.
The name of the annotation is mandatory; the body is optional.
annotation [name] { [body] }
Contained in schema.
Using a reference to a document type,
import a field from that document type into this schema to be used for matching, ranking, grouping and sorting.
Only attribute fields can be imported.
The imported field inherits all but the following properties from the parent field:
Refer to parent/child for a complete example. Note that the imported field is put outside of the document type:
schema myschema {
    document myschema {
        field parentschema_ref type reference<parentschema> {
            indexing: attribute
        }
    }
    import field parentschema_ref.name as parent_name {}
}
Extra restrictions apply for some of the field types:
Field type | Restriction |
---|---|
array of struct | Can be imported if at least one of the struct fields has an attribute. All struct fields with attributes must have primitive types. Only the struct fields with attributes will be visible. |
map of struct | Can be imported if the key field has an attribute and at least one of the struct fields has an attribute. All struct fields with attributes must have primitive types. Only the key field and the struct fields with attributes will be visible. |
map | Can be imported if both key and value fields have primitive types and have attributes. |
position | Can be imported if it has an attribute. |
array of position | Can be imported if it has an attribute. |
To use an imported field in summary, create an explicit document summary containing the field.
Imported fields can be used to expire documents, but read this first.
Type | Description |
---|---|
string | Use for a text field of any length. String fields may only contain text characters. Example: field surname type string { indexing: summary | index } |
int | Use for single 32-bit integers. Example: field release_year type int { indexing: summary | attribute } |
long | Use for single 64-bit integers. Example: field bignumber type long { indexing: summary | attribute } |
bool | Use for boolean values. Example: field alive type bool { indexing: summary | attribute } Important: Defaults to false if not specified. |
byte | Use for single 8-bit numbers. Example: field smallnumber type byte { indexing: summary | attribute } |
float | Use for floating point numbers (32-bit IEEE 754 float). Example: field myfloat type float { indexing: summary | attribute } |
double | Use for high precision floating point numbers (64-bit IEEE 754 double). Example: field mydouble type double { indexing: summary | attribute } |
position | Used to filter and/or rank documents by distance to a position in the query, see Geo search. Example: field location type position { indexing: attribute } |
predicate |
Use to match queries to a set of boolean constraints. See querying predicate fields. Example:
field predicate_field type predicate {
    indexing: attribute
    index {
        arity: 2    # mandatory
        lower-bound: 3
        upper-bound: 200
        dense-posting-list-threshold: 0.25
    }
}
|
raw | Use for binary data. Example: field rawfield type raw { indexing: summary } |
uri | Use for URL type matching. |
array<type> |
For single-value (primitive) types, use array<type> to create an array field of the element type:
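For illustration, an array field of a primitive type could be declared as below. The field name tags and the indexing choices are assumptions, not taken from this reference:

```
field tags type array<string> {
    indexing: summary | attribute
}
```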
Also use to create an array field of the given struct type. The struct type must be defined separately. Example: struct person { field first_name type string {} field last_name type string {} } field people type array<person> { indexing: summary struct-field first_name { indexing: attribute attribute: fast-search } } The people field is part of the document summary. The struct field first_name is defined as an attribute for searching, with fast-search. A subset, or all, of the struct fields can be defined as attributes. Use the sameElement operator to ensure that matches are in the same struct instance (array element). Use matched-elements-only to reduce the amount of data returned in the document summary.
Important: key and value are reserved words in an array<struct>, as these are used to implement map. Do not use these as struct-field names.
Restrictions:
| ||||||||||||||||||
weightedset<element-type> | Use to create a multivalue field of the element type, where each element is assigned a signed 32-bit integer weight. field tag type weightedset<string> { indexing: attribute | summary } The element type can be any single value type. Prefer not to use floating point number types like float or double. The weights may be assigned any semantics by the application. Two main use cases:
The weight of a matching value is by default used in ranking. It is possible to specify that a new key should be created if it does not exist before the update, and that it should be removed if the weight is set to zero - see the reference for an example. The weightedset field does not support filtering on weight. Solve this using the map type and sameElement query operator - see example.
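One way to sketch the map-based workaround: store the weight as the map value and make both struct fields attributes, so that a query can constrain key and value within the same entry. The field name tag_weights and the query values are hypothetical:

```
field tag_weights type map<string, int> {
    indexing: summary
    struct-field key   { indexing: attribute }
    struct-field value { indexing: attribute }
}

# Assumed YQL filter matching entries with key "rock" and a value above 10:
#   where tag_weights contains sameElement(key contains "rock", value > 10)
```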
| ||||||||||||||||||
tensor(dimension-1,...,dimension-N) |
Use to create a tensor field with the given tensor type spec that can be used for ranking - a tensor field is not searchable. See tensor evaluation reference for definition, the tensor user guide and the JSON feed format. field tensorfield type tensor<float>(x{},y{}) { indexing: attribute | summary } field tensorfield type tensor<float>(x[2],y[2]) { indexing: attribute | summary }
| ||||||||||||||||||
struct |
Use to define a field with a struct datatype. Create a struct type inside the document definition and declare the struct field in a document or struct using the struct type name as the field type: struct person { field first_name type string {} field last_name type string {} } field my_person type person { indexing: summary } Restrictions:
| ||||||||||||||||||
map<key-type,value-type> |
Use to create a map where each unique key is mapped to a single value. Any primitive type can be used as key-type and any Vespa primitive type as value-type. A map entry is handled as a struct with a key and value field with key-type and value-type as types. Example: struct person { field first_name type string {} field last_name type string {} } field identities type map<string, person> { indexing: summary struct-field key { indexing: attribute attribute: fast-search } }
The entire identities field is part of the document summary, and the struct field key is defined as an attribute, available for searching using the sameElement operator, and for grouping. The next example shows a map of primitive types, where the key and value struct fields are specified as attributes: field my_map type map<string, int> { indexing: summary struct-field key { indexing: attribute } struct-field value { indexing: attribute } } The following array of struct example is similar to the above, the difference being that an array can contain the same element multiple times and maintains order. struct mystruct { field key type string { } field value type int { } } field my_array type array<mystruct> { indexing: summary struct-field key { indexing: attribute attribute: fast-search } } Restrictions:
| ||||||||||||||||||
annotationreference |
Use to define a field (inside annotation, or inside e.g. a struct used by a field in an annotation) with a reference to another annotation. Should only be used for fields declared inside annotation, or as a base type by the use of any of the compound types listed above, inside annotation. To define such a field, you must first create an annotation type. The struct must be defined inside the schema. To declare an annotationreference field in an annotation, use the annotation name to identify the field type: annotation foo { field baz type annotationreference<bar> { } } annotation bar { }
| ||||||||||||||||||
reference<document-type> |
A reference<document-type> field is a reference to an instance of a document-type - i.e. a foreign key. field artist_ref type reference<artist> { indexing: attribute } The reference is the document id of the document-type instance. References are used to join documents in a parent-child relationship. A reference can only be made to global documents. The following types of references are not supported:
|
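As a sketch of how a reference is typically combined with an imported field in a parent-child setup, assume two hypothetical schemas: artist (the parent, deployed as a global document type) and album (the child). All names are illustrative:

```
# artist.sd (parent, assumed to be configured as a global document type)
schema artist {
    document artist {
        field name type string {
            indexing: summary | attribute
        }
    }
}

# album.sd (child)
schema album {
    document album {
        field artist_ref type reference<artist> {
            indexing: attribute
        }
    }
    # Makes the parent's name attribute available in the child as artist_name
    import field artist_ref.name as artist_name {}
}
```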
Note that it is possible to turn a document field of one type into one or more instances of another search field, by declaring a field outside the document that uses other fields as input. For example, to create an integer attribute from a string containing a comma-separated list of integers in the document:

    schema example {
        document example {
            field yearlist type string {
                # Comma-separated years
            }
        }
        field year type array<int> {
            # Search field using the yearlist value
            indexing: input yearlist | split "," | attribute
        }
    }

Example schema:

    schema example {
        document example {
            field title type string {
                indexing: summary | index
                index: enable-bm25
            }
            field description type string {
                indexing: summary | index
                index: enable-bm25
            }
            field author type string {
                indexing: summary | index
                stemming: none  # disable stemming
                rank: filter
            }
            field category type string {
                indexing: summary | attribute
                attribute: fast-search
                match: word
                rank: filter
            }
            field popularity type int {
                indexing: summary | attribute
                attribute: fast-search
            }
            field measurement type int {
                indexing: summary | attribute
            }
            field morecategories type array<string> {
                indexing: index
                match: word
            }
        }
        fieldset default {
            fields: title, description
        }
        rank-profile bm25 inherits default {
            first-phase {
                expression: bm25(title) + bm25(description) + attribute(popularity)
            }
        }
    }

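This section describes how a schema in a live application can be modified. Changes fall into four categories: changes that take effect immediately, changes that require restart, changes that require reindexing, and changes that require re-feed.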
When running vespa-deploy prepare on a new application package, the changes in the schema files are compared with the files in the current active package. If some of the changes require restart or re-feed, the output from vespa-deploy prepare specifies which actions are needed.
If a schema change is not reported by vespa-deploy prepare, the impact is undefined and in no way guaranteed to allow a system to stay live until re-feeding.
Changes not related to the schema are discussed in admin procedures.
Procedure:

1. Run vespa-deploy prepare on the changed application
2. Run vespa-deploy activate. The changes will take effect immediately

Change | Description |
---|---|
Add a new document field | Add a new document field as index, attribute, summary or any combination of these. Existing documents will implicitly get the new field with no content. Documents fed after the change can specify the new field. If the field has existed with the same type earlier, then old content may or may not reappear |
Remove a document field | Existing documents will no longer see the removed field, but the field data is not completely removed from the search node |
Add or remove an existing document field from document summary |
Add an existing field to summary or any number of summary classes, and remove an existing field from summary or any number of summary classes. Example: document-summary short-summary { summary artist type string {} } A change adding an attribute field with a new name to a summary class using source does not require restart or re-feed: field artist type string { indexing: summary | attribute } document-summary rename-summary { summary artist_name type string { source: artist } } Also see non-attribute fields. |
Remove the attribute aspect from a field that is also an index field | This is the only scenario of changing the attribute aspect of a document field that is allowed without restart |
Add, change or remove field sets | Change fieldsets used to group fields together for searching |
Change the alias or sorting attribute settings for an attribute field | |
Add, change or remove rank profiles | |
Change document field weights | |
Add, change or remove field aliases | |
Add, change or remove rank settings for a field |
Exception: Changing rank: filter on an attribute field in mode index requires restart.
See details in next section
|
Add or remove a schema | Removing a schema definition file will make proton drop all documents of that type - subsequently releasing memory and disk. |

Procedure:

1. Run vespa-deploy prepare on the changed application. Output specifies which restart actions are needed
2. Run vespa-deploy activate
3. Restart services on the services specified in the prepare output

Change | Description |
---|---|
Change the attribute aspect of a document field | Add or remove a field as attribute. When adding, the attribute is populated based on the field value in stored documents during restart. When removing, the field value in stored documents is updated based on the content in the attribute during restart. Procedure. |
Change the attribute settings for an attribute field |
Change the following attribute settings: fast-search , fast-access , fast-rank , paged .
|
Change the rank filter setting for an attribute field |
Add or remove rank: filter on an attribute field.
|
Change the hnsw index settings for a tensor attribute field | Add or remove the hnsw index on a tensor attribute field, or change the settings of the index. |
Change the distance metric for a tensor attribute field | Change, add or remove the distance metric on a tensor attribute field. If no distance metric is specified, euclidean is used as the default. |
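
For orientation, a tensor attribute field touching the settings in the last two rows could be declared roughly as follows. The field name, tensor dimension, and parameter values are illustrative assumptions, not recommendations:

```
field embedding type tensor<float>(x[384]) {
    indexing: attribute | index
    attribute {
        distance-metric: angular
    }
    index {
        hnsw {
            max-links-per-node: 16
            neighbors-to-explore-at-insert: 200
        }
    }
}
```

Per the table above, adding, changing or removing the hnsw block or the distance-metric setting in such a field requires a restart.
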
Example: Given a content cluster mycluster with mode index:

    schema test {
        document test {
            field f1 type string {
                indexing: summary
            }
        }
    }

Then add field f1 as an attribute:

    schema test {
        document test {
            field f1 type string {
                indexing: attribute | summary
            }
        }
    }

The following is output from vespa-deploy prepare, showing which restart actions are needed:

    WARNING: Change(s) between active and new application that require restart:
    In cluster 'mycluster' of type 'search':
        Restart services of type 'searchnode' because:
            1) Document type 'test': Field 'f1' changed: add attribute aspect

All of the changes listed below require reindexing of all documents. Unlike re-feed, which requires an external source of data, reindexing is done using documents stored in Vespa, and is automatic (once triggered). It can also run concurrently with feed and serving, but until reindexing is complete, affected fields will be empty or have potentially wrong annotations not matching the query processing. Procedure:

1. Run vespa-deploy prepare on the changed application. Output specifies which reindexing actions are needed
2. Run vespa-deploy activate

Changes:

Change | Description |
---|---|
Change index aspect of a document field | This changes the document processing pipeline before documents arrive in the backend. Only documents fed after index aspect was added will have annotations and be present in the reverse index. Only documents fed after index aspect was removed will avoid disk bloat due to unneeded annotations. Procedure. |
Change fields from static to dynamic summary, or vice versa | |
Switch stemming/normalizing on or off |
This changes the document processing pipeline before documents arrive in the backend, and what annotations are made for an indexed field.
Important:
If not re-feeding after such a change, serving works,
but recall is undefined as the index has been produced using a different setting
than the one used when doing stemming/normalizing of the query terms.
|
Switch bolding on or off | |
Add, change or remove match settings for a field |
This changes the document processing pipeline before documents arrive in the backend, and what annotations are made for an indexed field.
Important:
If not reindexing after such a change, serving works,
but recall is undefined as the index has been produced using one match mode
while run-time is using a different match mode.
|
Add or remove a new non-attribute document field from document summary |
A change adding an index or summary field (without attribute) with a new name to a summary class using source requires re-index: field artist type string { indexing: summary | index } document-summary rename-summary { summary artist_name type string { source: artist } } Also see attribute fields. |

Example: Given a content cluster mycluster with mode index:

    schema test {
        document test {
            field f1 type string {
                indexing: summary
            }
        }
    }

Then add field f1 as an index:

    schema test {
        document test {
            field f1 type string {
                indexing: index | summary
            }
        }
    }

The following is output from vespa-deploy prepare, showing which reindex actions are needed:

    WARNING: Change(s) between active and new application that require re-index:
    Reindex document type 'test' in cluster 'mycluster' because:
        1) Document type 'test': Field 'f1' changed: add index aspect, indexing script: '{ input f1 | summary f1; }' -> '{ input f1 | tokenize normalize stem:"SHORTEST" | index f1 | summary f1; }'

All of the changes listed below require re-feeding of all documents. Unless a change is listed in the above sections, treat it as if it were listed here. Until re-feed is complete, affected fields will be empty or have potentially wrong annotations not matching the query processing. Procedure:

1. Run vespa-deploy prepare on the changed application. Output specifies which re-feed actions are needed
2. Run vespa-deploy activate

Change | Description |
---|---|
Change a document field's data type or collection type |
Existing documents will no longer have any content for this field. To populate the field, re-feed the existing documents using the new type for this field. There will be no automatic conversion from old to new field type.
Important:
If not re-feeding after such a change, serving works,
but searching this field will not give any results
|
Change a tensor attribute's tensor type |

Example: Given a content cluster mycluster with mode index:

    schema test {
        document test {
            field f1 type string {
                indexing: summary
            }
        }
    }

Then change field f1 to hold an int:

    schema test {
        document test {
            field f1 type int {
                indexing: summary
            }
        }
    }

The following is output from vespa-deploy prepare, showing which re-feed actions are needed:

    WARNING: Change(s) between active and new application that require re-feed:
    Re-feed document type 'test' in cluster 'mycluster' because:
        1) Document type 'test': Field 'f1' changed: data type: 'string' -> 'int'