A schema defines a document type and what we want to compute over it.
Schemas are stored in files named the same as the schema, with the ending ".sd" (for schema definition), in the schemas/ directory of the application package.
Refer to the schema reference.
Document types, rank profiles and document summaries in schemas can be inherited.
Schema example:
schema music {
    document music {
        field artist type string {
            indexing: summary | index
        }
        field artistId type string {
            indexing: summary | attribute
            match: word
            rank: filter
        }
        field title type string {
            indexing: summary | index
        }
        field album type string {
            indexing: index
        }
        field duration type int {
            indexing: summary
        }
        field year type int {
            indexing: summary | attribute
        }
        field popularity type int {
            indexing: summary | attribute
        }
    }
    fieldset default {
        fields: artist, title, album
    }
    rank-profile song inherits default {
        first-phase {
            expression {
                nativeRank(artist,title) +
                if(isNan(attribute(popularity)) == 1, 0, attribute(popularity))
            }
        }
    }
}
Note: Older application packages may store schemas in the searchdefinitions/ directory and use search instead of schema as the top level tag. This is deprecated.
The following is an overview of the most important schema concepts; see the schema reference for a complete list.
A document is the unit the rank profile evaluates, and the unit returned in query results. Documents have fields - reads and writes operate on full documents or on some of their fields. Refer to the schema reference.
Documents can have relations: field values can be imported from parent documents.
Note that the document id is not a field of the document - add this explicitly if needed.
A field has a type; see the field reference for a full list.
A field contained in a document can be written to, read from and queried - this is the normal field use. A field can also be generated (i.e. a synthetic field) - in this case, the field definition is outside the document. See reindexing for examples.
A field can be single value, like a string, or multivalue, like an array of strings - see the field type list. Most multivalue fields can be used in grouping. Accessing attributes in maps and arrays of struct in ranking is not possible. The rank feature attribute(name).count can be used in ranking to rank based on number of elements in a multivalue attribute. To filter based on number of elements, create a strict tiering rank function combined with a rank-score-drop-limit. Then use a query variable for number of elements. Note that doing this filtering is more expensive to evaluate than just having a separate field for the count.
indexing configures how to process data of a field during indexing - the most important ones are:
index | For unstructured text: Create a text index for this field. Text matching and all text ranking features become available. Indexes are disk backed and do not need to fit in memory. Reference / index details |
attribute | For structured data: Keep this field in memory in a forward structure. This makes the field available for grouping, sorting and ranking. Attributes may also be searched by complete match (word or exact), or (for numerical fields) by range. Optionally a B-tree in memory can also be created by adding the fast-search option - this improves performance if the attribute is a strong criterion in queries (i.e. filters out many documents). Reference / attribute details |
summary | Include this field in the document summary in search result sets. Reference / document summary details |
Indexing instructions have pipeline semantics similar to unix shell commands, with data flowing from left to right. They can perform complex transformations on field values, or just send the field value unchanged to the next sections of the index structure. Example: The data is first added to the document summary, then added as an in-memory attribute and finally indexed:
indexing: summary | attribute | index
When both attribute and index are set on a field, queries to this field use index mode. The normal case for setting both is to run queries (using index) with grouping (which requires attribute).
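The left-to-right flow described above can be illustrated with a toy model. This is a minimal sketch, not how Vespa executes indexing statements: the function name `run_indexing` is hypothetical, and each stage is assumed to simply record the value and pass it on unchanged.

```python
def run_indexing(statement: str, value: str) -> dict[str, str]:
    """Toy model of pipeline semantics (an assumption for illustration only).

    The statement is split on '|' and the stages are visited left to right.
    Each stage stores the value it receives and hands it on unchanged.
    """
    stores: dict[str, str] = {}
    for stage in (part.strip() for part in statement.split("|")):
        stores[stage] = value  # the stage consumes the value and passes it on
    return stores


stores = run_indexing("summary | attribute | index", "Bob Dylan")
print(list(stores))  # stages in the order they were applied
```

Because Python dictionaries preserve insertion order, the printed keys reflect the order of the stages in the statement.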
The match mode configures how query items are matched to fields (e.g. exact or prefix matching), and is tightly coupled with indexing. Find more details in text matching.
When searching in an array or map of struct, sameElement() is a useful query operator to restrict matches to the same struct element (e.g. first_name contains 'Joe', last_name contains 'Smith' - both must match in the same field value). Note that the document summary will not contain which element(s) matched.
A fieldset groups fields together for querying. If no field/field set name is given for a query term, the fieldset named default is queried. Example:
fieldset default {
    fields: artist, title, album
}
$ENDPOINT/search/?
yql=select * from sources * where default contains "bob" and title contains "best"
The rank profile defines the computation to be made over documents of this type when matching a query. Learn more in getting started with ranking.
If you use IntelliJ, you can install the Vespa IntelliJ plugin to simplify working with schema files.
Vespa is built for safe schema modifications, like adding a field or changing indexing or match modes. A new version of the schema is deployed in an application package. As some changes are potentially destructive (e.g. changing a field's index settings), the deploy command will by default not accept such changes.
Example output from deploy (change from index to attribute):
Invalid application package: Error loading default.default: indexing-change: Document type 'music': Field 'artist' changed: remove index aspect, matching: 'text' -> 'word', stemming: 'best' -> 'none', normalizing: 'ACCENT' -> 'LOWERCASE', summary field 'artist' transform: 'none' -> 'attribute', indexing script: '{ input artist | tokenize normalize stem:\"BEST\" | summary artist | index artist; }' -> '{ input artist | summary artist | attribute artist; }' To allow this add <allow until='yyyy-mm-dd'>indexing-change</allow> to validation-overrides.xml
To accept such changes, add a validation-override:
<validation-overrides>
    <allow until="2021-08-30">indexing-change</allow>
</validation-overrides>
Because destructive changes are blocked by default, it is safe and easy to automate deployments of an evolving schema. Many schema changes, like adding a field, are non-destructive and do not require a validation override. Read more in modifying-schemas.
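When deployments are automated, the validation override file can be generated rather than edited by hand. The following is a minimal sketch using Python's standard library; the helper name `validation_overrides` and the seven-day window are assumptions for illustration, not part of Vespa's tooling.

```python
from datetime import date, timedelta
import xml.etree.ElementTree as ET


def validation_overrides(validation_id: str, until: date) -> str:
    """Return a validation-overrides document allowing one validation id."""
    root = ET.Element("validation-overrides")
    allow = ET.SubElement(root, "allow", {"until": until.isoformat()})
    allow.text = validation_id
    return ET.tostring(root, encoding="unicode")


# Assumption: allow the change for one week from today.
print(validation_overrides("indexing-change", date.today() + timedelta(days=7)))
```

The validation id (here indexing-change) is the one named in the deploy error message.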
An application can define multiple document types, each in their own schema. Multiple schemas can either be mapped to a single content cluster, or one can define separate content clusters for schemas to be able to scale differently for the document types. A single container cluster can be used to query all the document types in both these configurations.
In an application with multiple document types, a query can restrict which document types are searched. By default, Vespa queries all document types and all clusters in parallel, and blends results based on score - find details in federation.
To limit the query to a subset of the document types, set restrict to a comma-separated list of schema names:
$ENDPOINT/search/?
yql=select * from sources * where title contains "bob"&
restrict=music,books
A schema is mapped to a content cluster in services.xml. The content cluster stores and computes over documents in the schema using the rank profile. Applications can map many schemas to one content cluster. Use multiple content clusters if documents of different schemas have different performance characteristics - read more in the serving scaling guide.
<content id="items" version="1.0">
    <documents>
        <document type="music" mode="index" />
        <document type="books" mode="index" />
    </documents>
</content>
To limit a query to a subset of the content clusters, set sources to a comma-separated list of content cluster ids, e.g.:
$ENDPOINT/search/?
yql=select * from sources * where title contains "bob"&
sources=items,news
Both restrict and sources can be combined to search both a subset of document types and content clusters.
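As an illustration of combining the two parameters, the sketch below builds a percent-encoded query URL with Python's standard library. The function name `build_search_url` and the endpoint value are hypothetical; only the yql, restrict and sources parameter names come from the text above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(endpoint, yql, restrict=(), sources=()):
    """Build a /search/ URL, optionally limiting document types and clusters."""
    params = {"yql": yql}
    if restrict:
        params["restrict"] = ",".join(restrict)  # comma-separated schema names
    if sources:
        params["sources"] = ",".join(sources)  # comma-separated cluster ids
    return f"{endpoint.rstrip('/')}/search/?{urlencode(params)}"


ENDPOINT = "http://localhost:8080"  # assumption: replace with your own endpoint
url = build_search_url(
    ENDPOINT,
    'select * from sources * where title contains "bob"',
    restrict=["music", "books"],
    sources=["items", "news"],
)
print(url)
```

Encoding the parameters avoids problems with the spaces, quotes and asterisks in the YQL string when the URL is sent by an HTTP client.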
See content nodes and schemas for more details.
Both document types and full schemas can be inherited to make it easy to design a structured application package with little duplication. Document type inheritance defines a type hierarchy, which is also useful for applications that federate queries, as queries can be written to the common supertype. This guide covers the different elements in the schema that support inheritance.
A schema that inherits another gets all the content of the parent schema as if it was defined inside the inheriting schema. A schema that inherits another must also (explicitly) inherit its document type.
Both schemas music and books have the title field through inheritance:
schema items {
    document items {
        field title type string {
            indexing: summary | index
        }
    }
}
schema books {
    document books inherits items {
        field author type string {
            indexing: summary | index
        }
    }
}
schema music {
    document music inherits items {
        field artist type string {
            indexing: summary | index
        }
    }
}
This is equivalent to:
schema books {
    document books {
        field title type string {
            indexing: summary | index
        }
        field author type string {
            indexing: summary | index
        }
    }
}
schema music {
    document music {
        field title type string {
            indexing: summary | index
        }
        field artist type string {
            indexing: summary | index
        }
    }
}
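The equivalence above can be mimicked with a small model in which a document type's effective fields are its parent's fields followed by its own. This is only a sketch of the idea, not Vespa's implementation; the dictionary layout and the name `resolve_fields` are assumptions made for the example.

```python
# Hypothetical in-memory description of the three document types above.
DOCUMENT_TYPES = {
    "items": {"inherits": None, "fields": {"title": "string"}},
    "books": {"inherits": "items", "fields": {"author": "string"}},
    "music": {"inherits": "items", "fields": {"artist": "string"}},
}


def resolve_fields(doc_type: str, types: dict) -> dict[str, str]:
    """Return the fields of doc_type, including those inherited from its parent."""
    parent = types[doc_type]["inherits"]
    inherited = resolve_fields(parent, types) if parent else {}
    # Inherited fields come first, then the fields declared on the type itself.
    return {**inherited, **types[doc_type]["fields"]}


for name in ("books", "music"):
    print(name, resolve_fields(name, DOCUMENT_TYPES))
```

Both books and music resolve to a field set that contains title, matching the expanded schemas shown above.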
Where fields define the document types, rank profiles define the computations over the documents. Rank profiles can be inherited:
schema items {
    rank-profile items_ranking_base {
        function title_score() {
            expression: fieldLength(title)
        }
        first-phase {
            expression: title_score
        }
        summary-features {
            title_score
        }
    }
}
schema books {
    rank-profile items_ranking inherits items_ranking_base {}
    rank-profile items_subschema_ranking inherits items_ranking_base {
        first-phase {
            expression: title_score + fieldMatch(author)
        }
        summary-features inherits items_ranking_base {
            fieldMatch(author)
        }
    }
}
schema music {
    rank-profile items_ranking inherits items_ranking_base {}
    rank-profile items_subschema_ranking inherits items_ranking_base {
        first-phase {
            expression: title_score + fieldMatch(artist)
        }
        summary-features inherits items_ranking_base {
            fieldMatch(artist)
        }
    }
}
items_ranking can be considered the "base" ranking. Pro-tip: Set this as the default rank profile by modifying the default query profile:
<query-profile id="default">
    <field name="ranking.profile">items_ranking</field>
</query-profile>
Queries using ranking.profile=default will then use the first-phase ranking defined in items.sd.
Another way to inherit behavior is to override the first-phase ranking in the sub-schemas, still using functions defined in the super-schema (e.g. title_score).
Summary-features and match-features are rank features computed during ranking, to be included in results. These features can be inherited - the above will include scores from features in super- and sub-schema - example:
"summaryfeatures": {
    "fieldMatch(author)": 0,
    "rankingExpression(title_score)": 4
}
Here, both the books and music schemas implement rank profiles with the same names (e.g. items_subschema_ranking), so they can be used in queries spanning both. If a query's rank profile cannot be found in a given schema, Vespa's default rank profile nativerank is used.
Document summaries can inherit others defined in the same or an inherited schema.
schema books {
    document-summary items_summary_tiny {
        summary title type string {}
    }
    document-summary items_summary_full inherits items_summary_tiny {
        summary author type string {}
    }
}