Schemas

A schema defines a document type and what we want to compute over it. Schemas are stored in files named the same as the schema, with the ending ".sd" (for schema definition), in the schemas/ directory of the application package. Refer to the schema reference.

Document types, rank profiles and document summaries in schemas can be inherited, see the schema inheritance guide for examples.

Compatibility note: Schemas may also be stored in the searchdefinitions/ directory and use search instead of schema as the top-level tag. Use of searchdefinitions and search will be deprecated; use schema instead.
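The naming rule above can be checked mechanically before deploying. The following is a minimal sketch, not part of Vespa's tooling: the helper name schema_name_matches_file is hypothetical, and the regular expression only looks for the first line starting with schema or search, so it does not handle comments or other edge cases.

```python
import re
from pathlib import PurePosixPath

# Matches the top-level tag: "schema music {" or the older "search music {".
_TOP_LEVEL = re.compile(r"^\s*(?:schema|search)\s+(\w+)", re.MULTILINE)


def schema_name_matches_file(path: str, content: str) -> bool:
    """Return True if the file is named <name>.sd and declares schema <name>."""
    file_path = PurePosixPath(path)
    match = _TOP_LEVEL.search(content)
    return (
        match is not None
        and file_path.suffix == ".sd"
        and file_path.stem == match.group(1)
    )


print(schema_name_matches_file("schemas/music.sd", "schema music {\n}"))
```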

Schema example:

schema music {
    document music {
        field artist type string {
            indexing: summary | index
        }

        field artistId type string {
            indexing: summary | attribute
            match   : exact
        }

        field title type string {
            indexing: summary | index
        }

        field album type string {
            indexing: index
        }

        field duration type int {
            indexing: summary
        }

        field year type int {
            indexing: summary | attribute
        }

        field popularity type int {
            indexing: summary | attribute
        }
    }

    fieldset default {
        fields: artist, title, album
    }

    rank-profile song inherits default {
        first-phase {
            expression:nativeRank(artist,title,album) + if(isNan(attribute(popularity)) == 1, 0,attribute(popularity))
        }
    }
}
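To make the first-phase expression of the song rank profile easier to read, here is a plain Python mirror of its logic. It is only an illustration: the function name song_first_phase is hypothetical, and the nativeRank score is passed in as a number rather than computed.

```python
import math


def song_first_phase(native_rank: float, popularity: float) -> float:
    """Mirror of: nativeRank(...) + if(isNan(attribute(popularity)) == 1, 0, attribute(popularity))."""
    # A missing popularity shows up as NaN; treat it as 0 so it does not poison the sum.
    boost = 0.0 if math.isnan(popularity) else popularity
    return native_rank + boost


print(song_first_phase(0.25, 10))            # popularity is added to the text score
print(song_first_phase(0.25, float("nan")))  # NaN popularity contributes nothing
```

Without the isNan guard, a NaN popularity would make the whole score NaN.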

document

A document is the unit the rank-profile evaluates, and is returned in query results. Documents have fields - reads and writes operate on full documents or on some fields of documents. Refer to the schema reference.

Documents can have relations, and field values can be imported from parent documents.

Note that the document id is not a field of the document - add this explicitly if needed.

field

A field has a type; see the field reference for a full list.

A field contained in a document can be written to, read from and queried - this is the normal field use. A field can also be generated (i.e. a synthetic field) - in this case, the field definition is outside the document. See reindexing for examples.

A field can be single value, like a string, or multivalue, like an array of strings - see the field type list. Most multivalue fields can be used in grouping. Attributes in maps and arrays of struct cannot be accessed in ranking. The rank feature attribute(name).count gives the number of elements in a multivalue attribute and can be used in ranking. To filter on the number of elements, create a strict tiering rank function that reads the required number of elements from a query variable, and combine it with a rank-score-drop-limit. Note that this filtering is more expensive to evaluate than keeping the count in a separate field.
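The filtering idea above can be sketched in plain Python. This is a simulation of the logic only, not Vespa schema syntax: the names tier_score, filter_by_count and DROP_LIMIT are hypothetical, and the sketch assumes that hits scoring at or below the drop limit are removed.

```python
from typing import Iterable, List, Tuple

DROP_LIMIT = 0.0  # stands in for rank-score-drop-limit (assumed value)


def tier_score(element_count: int, min_elements: int) -> float:
    """Strict tiering: 1.0 if the document has enough elements, otherwise 0.0."""
    return 1.0 if element_count >= min_elements else 0.0


def filter_by_count(docs: Iterable[Tuple[str, int]], min_elements: int) -> List[str]:
    """Keep ids of documents whose tier score is above the drop limit.

    docs holds (document id, element count) pairs; min_elements plays the
    role of the query variable.
    """
    return [
        doc_id
        for doc_id, count in docs
        if tier_score(count, min_elements) > DROP_LIMIT
    ]


print(filter_by_count([("a", 1), ("b", 3), ("c", 5)], min_elements=3))
```

Every candidate document still has to be scored before it can be dropped, which is why a separate count field is the cheaper option.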

indexing

indexing configures how the data of a field is processed during indexing - the most important indexing instructions are:

index: For unstructured text. Creates a text index for this field. Text matching and all text ranking features become available. Indexes are disk backed and do not need to fit in memory. Reference / index details
attribute: For structured data. Keeps this field in memory in a forward structure. This makes the field available for grouping, sorting and ranking. Attributes may also be searched by complete match (word or exact), or (for numerical fields) by range. Optionally, a B-tree in memory can also be created by adding the fast-search option - this improves performance if the attribute is a strong criterion in queries (i.e. filters out many documents). Reference / attribute details
summary: Includes this field in the document summary in search result sets. Reference / document summary details

Indexing instructions have pipeline semantics similar to unix shell commands, with data flowing from left to right. They can perform complex transformations on field values, or just send the field value unchanged to the next sections of the index structure. In the following example, the data is first added to the document summary, then added as an in-memory attribute, and finally indexed:

indexing: summary | attribute | index
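The left-to-right flow can be pictured with a toy model in Python. This is not how Vespa implements indexing; it is a hypothetical sketch in which summary, attribute and index are modelled as stores that record the value and pass it on unchanged, and lowercase is used as one example of a transforming step.

```python
from typing import Dict


def run_indexing(script: str, field: str, value: str) -> Dict[str, Dict[str, str]]:
    """Apply each stage of a pipe-separated script to the value, left to right."""
    sinks: Dict[str, Dict[str, str]] = {"summary": {}, "attribute": {}, "index": {}}
    transforms = {"lowercase": str.lower}
    for stage in (part.strip() for part in script.split("|")):
        if stage in transforms:
            # A transforming stage changes the value seen by all later stages.
            value = transforms[stage](value)
        elif stage in sinks:
            # A storing stage records the current value and passes it on unchanged.
            sinks[stage][field] = value
        else:
            raise ValueError(f"unsupported stage in this sketch: {stage}")
    return sinks


result = run_indexing("summary | lowercase | index", "artist", "Bob Dylan")
print(result)
```

In this toy run the summary keeps the original casing, because the lowercase step sits to its right, while the index receives the lowercased value.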

match

The match mode configures how query items are matched to fields (e.g. exact or prefix matching), and is tightly coupled with indexing. Find more details in text matching.

When searching in an array or map of struct, sameElement() is a useful query operator to restrict matches to the same struct element (e.g. first_name contains 'Joe', last_name contains 'Smith' - both must match in the same field value). Note that the document summary will not contain which element(s) matched.
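As an illustration of the operator above, a small Python helper can assemble a YQL string that uses sameElement(). The field name persons and the helper same_element_yql are assumptions made for this sketch, and values are not escaped, so it is only suitable for values without quote characters.

```python
from typing import Dict


def same_element_yql(field: str, conditions: Dict[str, str]) -> str:
    """Build a YQL query that requires all conditions to match in one struct element."""
    clauses = ", ".join(
        f'{name} contains "{value}"' for name, value in conditions.items()
    )
    return f"select * from sources * where {field} contains sameElement({clauses})"


# "persons" is a hypothetical array-of-struct field with first_name and last_name.
query = same_element_yql("persons", {"first_name": "Joe", "last_name": "Smith"})
print(query)
```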

fieldset

A fieldset groups fields together for querying. If no field or fieldset name is given for a query term, the fieldset named default is queried. Example:

fieldset default {
    fields: artist, title, album
}

$ENDPOINT/search/?
  yql=select * from sources * where default contains "bob" and title contains "best";
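The query above is shown unencoded for readability; in a real request, the spaces and quotes in the YQL string need URL encoding. A minimal sketch using Python's standard library follows, assuming a hypothetical local endpoint at http://localhost:8080.

```python
from urllib.parse import urlencode

ENDPOINT = "http://localhost:8080"  # hypothetical endpoint, replace with your own

# The first term searches the "default" fieldset, the second only the title field.
yql = 'select * from sources * where default contains "bob" and title contains "best";'

# urlencode percent-encodes quotes, asterisks and the semicolon, and turns spaces into "+".
url = f"{ENDPOINT}/search/?{urlencode({'yql': yql})}"
print(url)
```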

rank-profile

The rank profile defines the computation over the documents, given a query, and is hence the core of the application logic. Vespa has built-in rank profiles for text ranking, and the default uses nativeRank, so defining a rank profile is optional. Learn more in getting started with ranking.

Schema modifications

Vespa is built for safe schema modifications, like adding a field or changing indexing or match modes. A new version of the schema is deployed in an application package. As some changes are potentially destructive (e.g. changing a field's index settings), the deploy will by default not accept such changes. Example output from deploy (change from index to attribute):

Invalid application package: Error loading default.default: indexing-change:
Document type 'music': Field 'artist' changed:
remove index aspect,
  matching: 'text' -> 'word',
  stemming: 'best' -> 'none', normalizing: 'ACCENT' -> 'LOWERCASE',
  summary field 'artist' transform: 'none' -> 'attribute',
  indexing script:
    '{ input artist | tokenize normalize stem:\"BEST\" | summary artist | index artist; }' ->
    '{ input artist | summary artist | attribute artist; }'
To allow this add indexing-change to validation-overrides.xml

To accept such changes, add a validation-override:

<validation-overrides>
    <allow until="2021-08-30">indexing-change</allow>
</validation-overrides>
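Since the override carries an expiry date, it can be convenient to generate it rather than edit it by hand. The sketch below builds the same XML with Python's standard library; the helper name validation_overrides and the 14-day default window are assumptions of this example, not Vespa defaults.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta


def validation_overrides(validation_id: str, today: date, days: int = 14) -> str:
    """Return validation-overrides XML allowing validation_id until today + days."""
    root = ET.Element("validation-overrides")
    until = (today + timedelta(days=days)).isoformat()  # formatted as YYYY-MM-DD
    allow = ET.SubElement(root, "allow", {"until": until})
    allow.text = validation_id
    return ET.tostring(root, encoding="unicode")


xml_text = validation_overrides("indexing-change", today=date(2021, 8, 16))
print(xml_text)
```

Keeping the window short means the override expires soon after the intended deployment, so later destructive changes are blocked again.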

Because destructive changes are blocked by default, deployments of an evolving schema are safe and easy to automate. Many schema changes are non-destructive and do not require a validation override, like adding a field. Read more in modifying-schemas.

Multiple schemas

An application can define multiple document types, each in their own schema. Multiple schemas can either be mapped to a single content cluster, or separate content clusters can be defined per schema so that the document types can scale independently. A single container cluster can be used to query all the document types in both these configurations.

In an application with multiple document types, the query can restrict which document types are searched. By default, Vespa queries all document types and all clusters in parallel and blends results based on score - find details in federation.

To limit the query to a subset of the document types, set restrict to a comma-separated list of schema names:

$ENDPOINT/search/?
  yql=select * from sources * where title contains "bob";&
  restrict=music,books

Content cluster mapping

A schema is mapped to a content cluster in services.xml. The content cluster stores and computes over documents in the schema using the rank profile. Applications can map many schemas to one content cluster. Use multiple content clusters if documents of different schemas have different performance characteristics - read more in the serving scaling guide.

<content id="items" version="1.0">
    <documents>
        <document type="music" mode="index" />
        <document type="books" mode="index" />
    </documents>
</content>
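To see which document types each content cluster serves, the mapping above can be read with a standard XML parser. This is a minimal sketch over the fragment shown; the helper name document_types_by_cluster is hypothetical, and a complete services.xml contains more elements than this example covers.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

SERVICES_FRAGMENT = """
<content id="items" version="1.0">
    <documents>
        <document type="music" mode="index" />
        <document type="books" mode="index" />
    </documents>
</content>
"""


def document_types_by_cluster(xml_text: str) -> Dict[str, List[str]]:
    """Map each content cluster id to the document types it declares."""
    root = ET.fromstring(xml_text)
    # iter() also yields the root itself when the root is a <content> element.
    return {
        cluster.get("id"): [
            doc.get("type") for doc in cluster.findall("./documents/document")
        ]
        for cluster in root.iter("content")
    }


mapping = document_types_by_cluster(SERVICES_FRAGMENT)
print(mapping)
```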

To limit a query to a subset of the content clusters, set sources to a comma-separated list of content cluster ids, e.g.:

$ENDPOINT/search/?
  yql=select * from sources * where title contains "bob";&
  sources=items,news

restrict and sources can be combined to search a subset of both document types and content clusters.
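A combined request could be assembled as in the sketch below. The helper search_url and the endpoint http://localhost:8080 are hypothetical; the parameter names yql, restrict and sources are the ones used in the examples above.

```python
from typing import Sequence
from urllib.parse import urlencode


def search_url(
    endpoint: str,
    yql: str,
    restrict: Sequence[str] = (),
    sources: Sequence[str] = (),
) -> str:
    """Build a search URL, optionally limited by schema names and cluster ids."""
    params = {"yql": yql}
    if restrict:
        params["restrict"] = ",".join(restrict)  # comma-separated schema names
    if sources:
        params["sources"] = ",".join(sources)    # comma-separated content cluster ids
    return f"{endpoint}/search/?{urlencode(params)}"


combined = search_url(
    "http://localhost:8080",  # hypothetical endpoint
    'select * from sources * where title contains "bob";',
    restrict=["music", "books"],
    sources=["items", "news"],
)
print(combined)
```

The commas are percent-encoded as %2C in the resulting URL and are decoded again on the receiving side.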