Query result sorting

A sorting specification in a query consists of one or more sorting expressions. Each sorting expression is an optional sort order followed by an attribute name or a function over an attribute. Multiple expressions are separated by a single SPACE character.

Using more than one sort expression will give you multilevel sorting. In this case, the most significant expression is the first, while subsequent expressions detail sorting within groups of equal values for the previous expression.
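The multilevel behaviour described above can be imitated on an in-memory list. This is a minimal sketch, not how the search backend implements it; the records, the field names and the helper `multilevel_sort` are hypothetical and exist only for illustration.

```python
# Hypothetical in-memory records; field names are illustrative only.
people = [
    {"surname": "Hansen", "yearofbirth": 1970},
    {"surname": "Berg", "yearofbirth": 1985},
    {"surname": "Hansen", "yearofbirth": 1992},
    {"surname": "Berg", "yearofbirth": 1961},
]


def multilevel_sort(records, levels):
    """Sort by several (field, descending) levels, most significant first."""
    result = list(records)
    # Python's sort is stable, so sorting by the least significant level first
    # and the most significant level last yields a multilevel ordering.
    for field, descending in reversed(levels):
        result.sort(key=lambda r: r[field], reverse=descending)
    return result


# Comparable in spirit to the sort specification "+surname -yearofbirth".
ordered = multilevel_sort(people, [("surname", False), ("yearofbirth", True)])
for person in ordered:
    print(person["surname"], person["yearofbirth"])
```

The second level only decides the order among records that share the same surname; it never moves a record out of its surname group.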

Sorting       ::= SortExpr ( ' ' SortExpr )*
SortExpr      ::= [SortOrder] SortObject
SortObject    ::= SortAttribute | Function
Function      ::= LOWERCASE '(' SortAttribute ')' |
                  RAW '(' SortAttribute ')' |
                  UCA '(' SortAttribute [ ',' Locale [ ',' Strength] ] ')'
LOWERCASE     ::= 'lowercase'
UCA           ::= 'uca'
RAW           ::= 'raw'
Locale        ::= An identifier following Unicode locale identifiers, e.g. 'en_US'.
Strength      ::= 'PRIMARY' | 'SECONDARY' | 'TERTIARY' | 'QUATERNARY' | 'IDENTICAL'
SortOrder     ::= '+' | '-'
SortAttribute ::= ID                          /* A valid attribute name */
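To make the grammar concrete, here is a small sketch of a client-side parser for it. The function `parse_sorting` and the character sets accepted for attribute names and locales are assumptions made for this example; the grammar above only says that an attribute is a valid attribute name.

```python
import re

# One sort expression: optional +/-, then either a bare attribute or
# lowercase(attr), raw(attr), uca(attr[,locale[,strength]]).
# The attribute and locale character sets are assumptions for this sketch.
_SORT_EXPR = re.compile(
    r"""
    (?P<order>[+-])?
    (?:
        (?P<func>lowercase|raw|uca)\(
            (?P<fattr>[\w.\[\]]+)
            (?:,(?P<locale>[A-Za-z0-9_-]+)
                (?:,(?P<strength>PRIMARY|SECONDARY|TERTIARY|QUATERNARY|IDENTICAL))?
            )?
        \)
      | (?P<attr>[\w.\[\]]+)
    )
    """,
    re.VERBOSE,
)


def parse_sorting(spec):
    """Parse a sort specification into a list of dicts, one per expression."""
    parsed = []
    for token in spec.split(" "):  # expressions are separated by one space
        match = _SORT_EXPR.fullmatch(token)
        if match is None:
            raise ValueError(f"invalid sort expression: {token!r}")
        func = match["func"]
        if func in ("lowercase", "raw") and match["locale"] is not None:
            raise ValueError(f"{func}() takes only an attribute: {token!r}")
        parsed.append({
            "order": match["order"],  # None means "use the default order"
            "function": func,         # None means a bare attribute
            "attribute": match["fattr"] or match["attr"],
            "locale": match["locale"],
            "strength": match["strength"],
        })
    return parsed


for expr in parse_sorting("+uca(surname,nb_NO,TERTIARY) -yearofbirth"):
    print(expr)
```

Because strength is only matched after a locale, the parser also enforces that strength cannot be given without a locale.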


Find sorting examples in the Blog search tutorial. See Geo search for sorting by distance. Refer to YQL Vespa reference for how to set sorting parameters in YQL.

Sort order

For sort order, + denotes ascending sorting order, while - gives descending order.

Ascending order is defined as lowest values first for numerical attributes. Strings are sorted according to the sort function chosen. Descending order is the reverse of ascending order.

If +/- is omitted, the default will be used, either the system wide default of + or any override in searchdefinition. Also note that when composing the query URL, + has to be encoded as %2B. For consistency, - can be encoded as %2D.
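As an illustration of the encoding rule, Python's standard `urllib.parse.quote` can percent-encode a sort specification before it is placed in a URL. This is a minimal sketch; how the encoded value is attached to the query URL is not covered here.

```python
from urllib.parse import quote

sort_spec = "+surname -yearofbirth"

# safe="" makes quote() escape every reserved character, including '+'.
# An unescaped '+' in a URL query string is commonly decoded as a space.
encoded = quote(sort_spec, safe="")
print(encoded)  # %2Bsurname%20-yearofbirth
```

`quote` leaves `-` as it is, which is valid since `-` has no special meaning in a URL; encoding it as %2D is optional.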

Default sorting order

Default sort order is + or ascending, except for the special builtin [rank] which has - or descending.

Sort attribute

The sorting attribute in a sort expression is the name of an attribute in the index. Attribute names will often be the same as field names. In the search definition, an attribute is indicated by the indexing language fragment for a field having an attribute statement.

When sorting on attributes, it is recommended to use the built-in unranked rank-profile. This allows the search kernel to execute the query significantly faster than running the ranking framework for many hits, only to ignore the resulting score and sort by the sorting specification.
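A request that combines a sort specification with the unranked rank-profile might be built as in the sketch below. The parameter names `query`, `sorting` and `ranking`, as well as the query text, are assumptions not stated in this section; check them against the query API reference for your deployment.

```python
from urllib.parse import quote, urlencode

# Assumed parameter names and values; verify them before use.
params = {
    "query": "surname:hansen",           # hypothetical query
    "sorting": "+surname -yearofbirth",  # sort specification
    "ranking": "unranked",               # skip ranking, order comes from sorting
}

# quote_via=quote encodes spaces as %20 and '+' as %2B.
query_string = urlencode(params, quote_via=quote)
print(query_string)
```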

Sort function

It is possible to specify how sorting should be done. The default depends on the configuration of attributes and on the language, if one is specified in the query. If no sorting can be done, the fallback is the order in which the documents were indexed on the backend search node.

You can specify that you want to sort on raw binary values, or a simple and cheap lowercase, but usually the more expensive, but linguistically correct, UCA sorting is used. Default sort function for strings is uca, unless overridden in searchdefinition, or given explicitly in sortspec. Numeric fields are numerically sorted.

Raw byteorder: A simple and fast ordering based on memcmp of the UTF-8 bytes for strings, and on a binary representation that preserves sort order for other fields. Because it looks only at the binary representation, the resulting order is not linguistically correct.
Lowercase: Improves on raw sorting by lowercasing and normalising the strings before sorting. This is slightly more correct and might be enough for what you want. It is not much more costly than raw sort.
UCA: This sorting is based on the ICU library, which follows the Unicode Collation Algorithm. Locale and strength are specified the same way ICU specifies them. The default strength is PRIMARY, which only sorts on primary differentiating characteristics; this means that letters differing only in uppercase/lowercase or in accents are considered equal.
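The difference between raw and lowercase ordering can be seen in a small sketch. It only approximates the two functions: the raw variant compares UTF-8 bytes, and the lowercase variant uses `str.lower` without the normalisation step. UCA is left out because it needs an ICU binding.

```python
names = ["bob", "Alice", "alice", "Zoe"]

# Raw: compare the UTF-8 bytes, as memcmp would. Uppercase ASCII letters
# have lower byte values than lowercase ones, so they sort first.
raw_order = sorted(names, key=lambda s: s.encode("utf-8"))

# Lowercase: compare lowercased strings (normalisation is omitted here).
lowercase_order = sorted(names, key=str.lower)

print(raw_order)        # ['Alice', 'Zoe', 'alice', 'bob']
print(lowercase_order)  # ['Alice', 'alice', 'bob', 'Zoe']
```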

Note that both locale and strength are optional, but if you need to set strength, you must also specify locale, as in uca(surname,nb_NO,TERTIARY).

Default sort locale: By default, the locale is derived from the query unless overridden in searchdefinition or given explicitly in sortspec. If neither the sortspec, the searchdefinition, nor the query itself specifies a language or locale, UCA sorting is not used by default; the fallback is the lowercase function instead. Strength is PRIMARY unless overridden in searchdefinition or given explicitly in sortspec.
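The default rules above can be summarised as a small resolution function. This is a sketch under assumptions: `resolve_string_sort` and its dict-based inputs are hypothetical, and it assumes that an explicit sortspec value takes precedence over a searchdefinition value when both are present.

```python
def resolve_string_sort(sortspec=None, searchdefinition=None, query_locale=None):
    """Pick function, locale and strength for sorting a string attribute.

    sortspec and searchdefinition are dicts that may contain the keys
    'function', 'locale' and 'strength'. Earlier sources win (an assumption).
    """
    sources = [sortspec or {}, searchdefinition or {}]

    def first(key, fallback):
        # Return the first explicitly set value, else the fallback.
        for source in sources:
            if source.get(key) is not None:
                return source[key]
        return fallback

    locale = first("locale", query_locale)
    # Without any locale, the default function falls back from uca to lowercase.
    function = first("function", "uca" if locale is not None else "lowercase")
    strength = first("strength", "PRIMARY")
    if function != "uca":
        return {"function": function}
    return {"function": function, "locale": locale, "strength": strength}


print(resolve_string_sort())                      # no locale anywhere
print(resolve_string_sort(query_locale="en_US"))  # locale derived from query
```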

Special sorting attributes

Three special attributes are available for sorting in addition to the index specific attributes:

[relevance]: The document's relevance score for this query. This is the same as the default ordering when no sort specification is given ([rank] is a legacy alias for the same thing).
[source]: The document's source name. This is only relevant when querying multiple sources.
[docid]: The document's identification in the search backend. This will typically give you the documents in indexing order. Keep in mind that this id is unique only within a backend node: the same document might have a different id on another node, and a different document might have the same id on another node. It is intended only as a cheap way of getting an almost stable sort order.

These special attributes are most useful as secondary sort expressions in a multilevel sort. This will allow you to sort groups of equal values for the primary expression in either relevancy or indexing order. Without this additional sort expression, the order within each equal group is not deterministic.

Limitations

Note that it is only possible to sort on attributes. Trying to sort on a plain field, without an associated attribute, will not work. Trying to sort on a multivalued attribute will also not work; the sort expression will be ignored.

Also note that match-phase is enabled when sorting.

Examples

Sort by surname in ascending order:

+surname

Sort by surname in ascending order after lowercasing the surname:

+lowercase(surname)

Sort by surname in ascending English US locale collation order:

+uca(surname,en_US)

Sort by surname in ascending Norwegian 'Bokmål' locale collation order:

+uca(surname,nb_NO)

Sort by surname in ascending Norwegian 'Bokmål' locale collation order, using TERTIARY strength so that more attributes of each character are used to distinguish values. The sort is now case sensitive, and 'aa' and 'å' are different:

+uca(surname,nb_NO,TERTIARY)

Sort by surname, with the youngest ones first when the names are equal:

+surname -yearofbirth

Sort by year of birth in ascending order, and by relevancy within each group of equal values, with the highest rank first:

+yearofbirth -[rank]