Schemas

A schema defines a document type and what we want to compute over it. Schemas are mapped to one or more content clusters in services.xml; these content clusters then store documents of that type and compute over them. Each schema is stored in a file named after the schema, with the ending ".sd" (for schema definition), in the schemas/ directory of the application package.

See also the schema reference.

Schemas may also be stored in the searchdefinitions/ directory, and may use search instead of schema as the top-level tag.

Example:

schema music {
    document music {
        field artist type string {
            indexing: summary | index
        }

        field artistId type string {
            indexing: summary | attribute
        }

        field title type string {
            indexing: summary | index
        }

        field album type string {
            indexing: index
        }

        field duration type int {
            indexing: summary
        }

        field year type int {
            indexing: summary | attribute
        }

        field popularity type int {
            indexing: summary | attribute
        }
    }

    fieldset default {
        fields: artist, title, album
    }

    rank-profile song inherits default {
        first-phase {
            expression: nativeRank(artist, title, album) + if(isNan(attribute(popularity)) == 1, 0, attribute(popularity))
        }
    }
}

field

A field has a type, like string or double - see field reference for a full list.

Documents can have relations, and field values can be imported from parent documents.
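
As a minimal sketch of such a relation, the two schemas below let a child document reference a parent and import one of its fields. The schema names (artist, song) and field names (name, artist_ref, artist_name) are hypothetical and not part of the music example above. The sketch also assumes the parent document type is configured as global in services.xml, which is not shown here.

```
# schemas/artist.sd - hypothetical parent document type
schema artist {
    document artist {
        # Fields to be imported must be attributes
        field name type string {
            indexing: summary | attribute
        }
    }
}

# schemas/song.sd - hypothetical child document type
schema song {
    document song {
        # Holds the document id of the parent artist document
        field artist_ref type reference<artist> {
            indexing: attribute
        }
    }

    # Makes the parent's name field available in song as artist_name
    import field artist_ref.name as artist_name {}
}
```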

indexing

The indexing statement configures how the data of a field is processed during indexing. The most important instructions are:

  • index: For unstructured text. Creates a text index for this field, which makes text matching and all text ranking features available. Indexes are disk-backed and do not need to fit in memory. See the index reference and index details.
  • attribute: For structured data. Keeps this field in memory in a forward structure, which makes the field available for grouping, sorting and ranking. Attributes may also be searched by complete match (word or exact) or, for numerical fields, by range. Optionally, an in-memory B-tree can be created by adding the fast-search option. This improves performance if the attribute is a strong criterion in queries (i.e., it filters out many documents). See the attribute reference and attribute details.
  • summary: Includes this field in the document summary in search result sets. See the summary reference and document summary details.

Indexing instructions have pipeline semantics similar to Unix shell commands, with data flowing from left to right. They can perform complex transformations on field values, or just send the field value unchanged to the next sections of the index structure - example:

indexing: summary | attribute | index

The data is first added to the document summary, then added as an in-memory attribute, and finally indexed. The indexing language offers more functionality than this, such as filtering field values, combining field values, and selecting on different values. Learn more in the indexing language reference.
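
The following fragment is a sketch of two of the options mentioned above, based on the music example: the fast-search option on an attribute, and an indexing statement that combines two field values. The field artist_title is a hypothetical addition and not part of the original schema.

```
schema music {
    document music {
        field artist type string {
            indexing: summary | index
        }

        field title type string {
            indexing: summary | index
        }

        # fast-search adds an in-memory B-tree on top of the attribute
        field year type int {
            indexing: summary | attribute
            attribute: fast-search
        }
    }

    # Hypothetical field defined outside the document:
    # its value is built from two document fields at indexing time
    field artist_title type string {
        indexing: input artist . " " . input title | summary | index
    }
}
```

A field declared outside the document block is not part of the fed document, so it reads its values from document fields using input.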

Matching

Text matching describes the different ways / modes to match queries to documents per field. Query tracing and match configuration inspection are useful for analyzing matching.

When analyzing matches, make sure the query did not time out. On a soft timeout, only the matches found before the timeout are returned, so some matches may be missing.

Ensure that the configuration of chains and providers is correct with respect to indexing. Example: querying this provider will not lowercase / stem terms:

<provider id="myProvider" type="local" cluster="mydocs" inherits="vespa"
    excludes="com.yahoo.prelude.searcher.BlendingSearcher
              com.yahoo.prelude.querytransform.StemmingSearcher
              com.yahoo.search.querytransform.VespaLowercasingSearcher">
</provider>

When using multiple document types, review the restrict / sources settings to make sure queries hit the right document types.

Multivalue fields

A field can be single value, like a string, or multivalue, like an array of strings - see the field type list.

Most multivalue fields can be used in grouping.

When searching in array or map of struct, sameElement() is a useful query operator to restrict matches to same struct element. Note that the document summary will not contain which element(s) matched.
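
As an illustration, the sketch below defines an array of struct whose struct fields are attributes, so they can be matched with sameElement(). The schema name company, the struct person and the field names are hypothetical.

```
schema company {
    document company {
        struct person {
            field first_name type string {}
            field last_name type string {}
        }

        # Multivalue field: each element is one person struct
        field staff type array<person> {
            indexing: summary
            struct-field first_name {
                indexing: attribute
            }
            struct-field last_name {
                indexing: attribute
            }
        }
    }
}

# A YQL query restricting both terms to the same array element:
#   select * from company where staff contains
#       sameElement(first_name contains "Joe", last_name contains "Smith")
```

Without sameElement(), a document with one person named Joe and a different person named Smith would also match.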

Accessing attributes in maps and arrays of struct in ranking is not possible.

The rank feature attribute(name).count can be used in ranking to rank based on number of elements in a multivalue attribute. To filter based on number of elements, create a strict tiering rank function combined with a rank-score-drop-limit. Then use a query variable for number of elements. Note that doing this filtering is more expensive to evaluate than just having a separate field for the count.
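
One possible shape of such a filter is sketched below. It assumes a hypothetical multivalue attribute named tags and a hypothetical query variable query(minElements). Neither is part of the music example.

```
schema music {
    document music {
        # Hypothetical multivalue attribute
        field tags type array<string> {
            indexing: summary | attribute
        }
    }

    # Tiering expression: 1 when the document has at least
    # query(minElements) tags, otherwise -1.
    # Hits scoring at or below the drop limit are removed from the result.
    rank-profile min_elements inherits default {
        first-phase {
            expression: if(attribute(tags).count >= query(minElements), 1, -1)
            rank-score-drop-limit: 0
        }
    }
}
```

A query would then select the profile and pass the variable, for example with ranking=min_elements and input.query(minElements)=3.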

fieldset

A fieldset groups fields together for searching - example:

search/?query=title:sometitle default:someword

This query returns documents having sometitle in the field title, and someword in one or more of the fields in the fieldset default. If no field or fieldset name is given for a search term, the fieldset named default is searched. Find details in the fieldset reference.

rank-profile

Vespa has built-in rank profiles, and custom profiles can be configured, either by hand or using machine learning. Read more in the ranking documentation.

Multiple schemas

Many applications define multiple kinds of data, each in its own schema. Multiple schemas can either be mapped to a single content cluster, or one can define a separate content cluster for each schema to be able to scale differently for each kind of data. A single container cluster can be used to query all the data types in both these configurations.

Document Inheritance

Fields shared by multiple document types can be defined in inherited supertypes.

This is convenient, but also ensures that such fields will be defined consistently, which is crucial when a field is accessed in a query to be evaluated over multiple types.

To let a document inherit another, add inherits [document-name] after the document name. Multiple inheritance and multiple levels of inheritance are supported. Example:

document cod inherits food, fish {
    …
}

Inheriting a document type defined in another content cluster is allowed.

Overriding fields defined in supertypes is not allowed. Imported fields defined in supertypes are not inherited.
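
A fuller sketch of the cod example could look like the following, with one schema file per document type. The field names (calories, habitat, weight_grams) are hypothetical and chosen only for illustration.

```
# schemas/food.sd
schema food {
    document food {
        field calories type int {
            indexing: summary | attribute
        }
    }
}

# schemas/fish.sd
schema fish {
    document fish {
        field habitat type string {
            indexing: summary | index
        }
    }
}

# schemas/cod.sd
schema cod {
    # cod documents get calories and habitat from the supertypes
    document cod inherits food, fish {
        # New field; it must not reuse a field name from a supertype
        field weight_grams type int {
            indexing: summary | attribute
        }
    }
}
```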

Querying multiple document types

In an application with multiple types of data, each query can select which types to search. The details of how this works are described in federation. In summary, the following always applies:

  • Vespa will by default query all document types and all clusters in parallel, and blend results based on score.
  • To limit the query to a subset of the types inside a content cluster, set restrict to a comma-separated list of schema names:
    /search/?query=lotr&restrict=music,book
    
  • To limit the query to a subset of the content clusters, set sources to a comma-separated list of content cluster names.
    /search/?query=lotr&sources=music_cluster,book_cluster
    
    The two parameters can be combined to search a subset of both types and clusters.

Queries are evaluated in parallel over all selected document types and all clusters.