Sorting Reference
A sorting specification in a query consists of one or more sorting expressions. Each sorting expression is an optional sort order followed by an attribute name or a function over an attribute. Multiple expressions are separated by a single SPACE character.
Using more than one sort expression will give you multilevel sorting. In this case, the most significant expression is the first, while subsequent expressions detail sorting within groups of equal values for the previous expression.
Sorting       ::= SortExpr ( ' ' SortExpr )*
SortExpr      ::= [SortOrder] SortObject
SortObject    ::= SortAttribute | Function
Function      ::= LOWERCASE '(' SortAttribute ')'
                | RAW '(' SortAttribute ')'
                | UCA '(' SortAttribute [ ',' Locale [ ',' Strength] ] ')'
LOWERCASE     ::= 'lowercase'
UCA           ::= 'uca'
RAW           ::= 'raw'
Locale        ::= An identifier following unicode locale identifiers, e.g. 'en_US'.
Strength      ::= 'PRIMARY' | 'SECONDARY' | 'TERTIARY' | 'QUATERNARY' | 'IDENTICAL'
SortOrder     ::= '+' | '-'
SortAttribute ::= ID /* A valid attribute name */
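To make the grammar concrete, here is a minimal Python sketch of a parser for a sorting specification. It is an illustration only, not the parser used by the search engine. The names `SortExpr` and `parse_sorting` are hypothetical, and the pattern for attribute names (word characters, optionally wrapped in square brackets for the special attributes) is an assumption, since the grammar only says "a valid attribute name".

```python
import re
from typing import List, NamedTuple, Optional


class SortExpr(NamedTuple):
    """One parsed sort expression (hypothetical helper type)."""
    order: Optional[str]           # '+', '-' or None when omitted
    function: Optional[str]        # 'lowercase', 'raw', 'uca' or None
    attribute: str
    locale: Optional[str] = None
    strength: Optional[str] = None


STRENGTHS = {"PRIMARY", "SECONDARY", "TERTIARY", "QUATERNARY", "IDENTICAL"}

# Assumption: an attribute name is a run of word characters, or such a run
# wrapped in [] for special attributes like [relevance].
_EXPR = re.compile(
    r"^(?P<order>[+-])?"
    r"(?:(?P<func>lowercase|raw|uca)\((?P<args>[^()]*)\)"
    r"|(?P<attr>\[\w+\]|\w+))$"
)


def parse_sorting(spec: str) -> List[SortExpr]:
    """Parse a sorting specification into a list of SortExpr tuples."""
    result = []
    # Expressions are separated by a single space character.
    for token in spec.split(" "):
        m = _EXPR.match(token)
        if m is None:
            raise ValueError(f"invalid sort expression: {token!r}")
        order, func = m.group("order"), m.group("func")
        if func is None:
            result.append(SortExpr(order, None, m.group("attr")))
            continue
        args = [a.strip() for a in m.group("args").split(",")]
        # Only uca accepts the optional locale and strength arguments.
        max_args = 3 if func == "uca" else 1
        if not args[0] or len(args) > max_args:
            raise ValueError(f"bad arguments in: {token!r}")
        locale = args[1] if len(args) >= 2 else None
        strength = args[2] if len(args) == 3 else None
        if strength is not None and strength not in STRENGTHS:
            raise ValueError(f"unknown strength: {strength!r}")
        result.append(SortExpr(order, func, args[0], locale, strength))
    return result
```

For instance, `parse_sorting("+uca(surname,nb_NO,TERTIARY) -yearofbirth")` yields two expressions, the first carrying the function, locale and strength, the second only an order and an attribute.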
Find sorting examples in the Blog search tutorial. See Geo search for sorting by distance. Refer to YQL Vespa reference for how to set sorting parameters in YQL.
Sort order
+ denotes ascending sorting order, while - gives descending order.
Ascending order is defined as lowest values first for numerical attributes.
Strings are sorted according to the sort function chosen.
Descending order is the reverse of ascending order.
Note: + in query URLs must be encoded as %2B. For consistency, - can be encoded as %2D.
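The encoding matters because a literal + in a URL query string is commonly decoded as a space. As a small sketch, Python's standard `urllib.parse.quote` can produce the encoded value. The helper name `encode_sorting` and its `encode_minus` flag are hypothetical, and the name of the query parameter that carries the value is not shown here.

```python
from urllib.parse import quote


def encode_sorting(spec: str, encode_minus: bool = False) -> str:
    """Percent-encode a sorting specification for use in a query URL."""
    # safe="" makes quote() encode '+' as %2B and the separating space as %20.
    encoded = quote(spec, safe="")
    if encode_minus:
        # quote() never encodes '-', so the optional %2D form is applied by hand.
        encoded = encoded.replace("-", "%2D")
    return encoded


print(encode_sorting("+surname -yearofbirth"))
print(encode_sorting("+surname -yearofbirth", encode_minus=True))
```

The first call prints `%2Bsurname%20-yearofbirth`, the second `%2Bsurname%20%2Dyearofbirth`.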
Default sort order
If + or - is omitted, the default is used: either the system-wide default of + or any override in the search definition. The default sort order is + (ascending), except for [rank] and the special built-in [relevance], which default to - (descending).
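The system-wide defaults can be sketched as a small Python helper that makes the implied order explicit. This is an illustration under simplifying assumptions: the function name `with_default_order` is hypothetical, and overrides from the search definition are not modelled.

```python
# Special attributes that sort descending when no order is given.
DESCENDING_BY_DEFAULT = {"[rank]", "[relevance]"}


def with_default_order(expr: str) -> str:
    """Return the sort expression with its default order made explicit."""
    if expr.startswith(("+", "-")):
        return expr  # an explicit order always wins
    default = "-" if expr in DESCENDING_BY_DEFAULT else "+"
    return default + expr


print(with_default_order("surname"))      # +surname
print(with_default_order("[relevance]"))  # -[relevance]
```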
Sort attribute
The sorting attribute in a sort expression is the name of an attribute in the index. Attribute names will often be the same as field names. In the search definition, an attribute is indicated by the indexing language fragment for a field having an attribute statement.
When sorting on attributes, using the built-in unranked rank-profile is recommended. It lets the search kernel execute the query significantly faster than running the ranking framework for many hits, only to discard the score and order the hits by the sorting specification.
Sort function
It is possible to specify how sorting should be done. The default depends on the configuration of the attributes and on the language, if one is specified in the query. When no sorting can be done, the fallback is the order in which the documents were indexed on the backend search node.
You can specify that you want to sort on raw binary values, or with a simple and cheap lowercase function, but usually the more expensive, linguistically correct UCA sorting is used.
The default sort function for strings is uca, unless overridden in the search definition or given explicitly in the sort specification.
Numeric fields are sorted numerically.
Raw byteorder | A simple and fast ordering: strings are compared with memcmp on their UTF-8 bytes, and other field types use a binary representation that preserves their sort order. Because only the binary representation is considered, the resulting order for text is not linguistically correct. |
---|---|
Lowercase | Strings are lowercased and normalised before sorting. This is somewhat more correct than raw ordering, may be enough for what you want, and costs little more than raw sort. |
UCA | Sorting based on the ICU library, which follows the Unicode Collation Algorithm. locale and strength are specified exactly as ICU specifies them. Both are optional and a default strength is used when none is given, but if you need to set strength you must also specify locale, as in uca(surname,nb_NO,TERTIARY). |
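The difference between raw and lowercase ordering can be illustrated with a short Python sketch. It only approximates the behaviour described in the table: the key functions `raw_key` and `lowercase_key` are hypothetical, the normalisation step of the lowercase function is not modelled, and UCA ordering is left out because it requires an ICU binding rather than the standard library.

```python
names = ["apple", "Zebra", "cherry", "Banana"]


def raw_key(s: str) -> bytes:
    # Compare the UTF-8 bytes directly, as a memcmp-style ordering would.
    return s.encode("utf-8")


def lowercase_key(s: str) -> bytes:
    # Lowercase first, then compare bytes (normalisation is not modelled).
    return s.lower().encode("utf-8")


print(sorted(names, key=raw_key))        # ['Banana', 'Zebra', 'apple', 'cherry']
print(sorted(names, key=lowercase_key))  # ['apple', 'Banana', 'cherry', 'Zebra']
```

In the raw ordering every uppercase ASCII letter sorts before every lowercase one, which is why "Zebra" lands ahead of "apple".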
Special sorting attributes
Three special attributes are available for sorting in addition to the index specific attributes:
[relevance] | The document's relevance score for this query. This is the same as the default ordering when no sort specification is given ([rank] is a legacy alias for the same thing). |
---|---|
[source] | The document's source name. This is only relevant when querying multiple sources. |
[docid] | The document's identification in the search backend. This will typically give you the documents in indexing order. Keep in mind that this id is unique only within a backend node: the same document might have a different id on a different node, and a different document might have the same id on another node. It is intended only as a cheap way of getting an almost stable sort order. |
Limitations
Note that it is only possible to sort on attributes. Trying to sort on a plain field, without an associated attribute, will not work. Trying to sort on a multivalued attribute will also not work; the sort expression will be ignored.
Also note that match-phase is enabled when sorting.
Examples
Sort by surname in ascending order:
+surname
Sort by surname in ascending order after lowercasing the surname:
+lowercase(surname)
Sort by surname in ascending English US locale collation order:
+uca(surname,en_US)
Sort by surname in ascending Norwegian 'Bokmål' locale collation order:
+uca(surname,nb_NO)
Sort by surname in ascending Norwegian 'Bokmål' locale collation order, using more attributes of each character to distinguish values. The sort is now case sensitive, and 'aa' and 'å' are treated as different:
+uca(surname,nb_NO,TERTIARY)
Sort by surname, with the youngest ones first when the names are equal:
+surname -yearofbirth
Sort by year of birth in ascending order, and within each group of equal values sort by relevance with the highest relevance first:
+yearofbirth -[relevance]
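The effect of a multilevel specification such as +surname -yearofbirth can be mimicked in a few lines of Python. This is a sketch of the ordering semantics only, not of how the search backend implements sorting. The sample records and the helper `multilevel_sort` are made up for illustration, and strings are compared by code point here rather than with UCA.

```python
from operator import itemgetter

# Made-up sample data for illustration.
people = [
    {"surname": "Olsen", "yearofbirth": 1970},
    {"surname": "Hansen", "yearofbirth": 1985},
    {"surname": "Olsen", "yearofbirth": 1992},
    {"surname": "Hansen", "yearofbirth": 1960},
]


def multilevel_sort(rows, levels):
    """Sort rows by (attribute, descending) pairs, most significant first."""
    result = list(rows)
    # Apply the least significant level first. Python's sort is stable, so
    # the earlier passes survive as tie-breakers within equal values.
    for attribute, descending in reversed(levels):
        result.sort(key=itemgetter(attribute), reverse=descending)
    return result


# Equivalent of "+surname -yearofbirth".
ordered = multilevel_sort(people, [("surname", False), ("yearofbirth", True)])
for person in ordered:
    print(person["surname"], person["yearofbirth"])
```

This prints Hansen 1985, Hansen 1960, Olsen 1992 and Olsen 1970, one per line: surnames ascending, and the youngest first within each surname.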