Simple Query Language Reference

The simple query language is a legacy query language; please use YQL instead.

The simple query type has four subtypes:

  • all - All the words of the query must match a document for it to be a match. This is the default.
  • web - Like the all type, but with the following differences:
    • + in front of a term means “search for this term as-is”
    • term1 OR term2 (capital OR) means match either term1 or term2
    • The special syntax for url matching used in the other languages is not supported
  • any - It is enough that one of the words of the query matches for it to be a match. Set type=any to use this.
  • phrase - The words of the query are treated as a phrase: the words must match in the exact order given for it to be a match, and colon, plus and so on are ignored. Set type=phrase to use this.
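
As an illustration of selecting a subtype, the following minimal sketch builds a search URL carrying the type parameter. The endpoint http://localhost:8080/search/ and the query parameter name are assumptions made for the example, not taken from this reference; only the type values come from the list above.

```python
from urllib.parse import quote, urlencode

# The four subtypes listed above.
SIMPLE_TYPES = {"all", "web", "any", "phrase"}


def build_search_url(base_url, query, query_type="all"):
    """Build a search URL for a simple query.

    Assumption: the query string is passed in a parameter named 'query'.
    """
    if query_type not in SIMPLE_TYPES:
        raise ValueError("unknown simple query type: " + query_type)
    params = {"query": query, "type": query_type}
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return base_url + "?" + urlencode(params, quote_via=quote)


# Hypothetical endpoint, used only for illustration.
print(build_search_url("http://localhost:8080/search/", "chess kasparov", "any"))
```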

Simple Query Syntax

Query      ::= Expr ( SPACE Expr )*
Expr       ::= Term | Prefix? '(' SimpleTerm+ ')'
Term       ::= Prefix? Field? CoreTerm Weight?
SimpleTerm ::= Field? CoreTerm Weight?
Prefix     ::= '+' | '-'
Field      ::= ID ':'                              /* A valid field name or alias */
Weight     ::= '!'+ | '!' NUM                      /* NUM is a percentage. */
CoreTerm   ::= WORD | Phrase | NumTerm | PrefixTerm | SubstringTerm | SuffixTerm
Phrase     ::= '"' WORD+ '"'
NumTerm    ::= NUM | '<' NUM | '>' NUM | '[' NUM? ';' NUM? ( ';' HITLIMIT )? ']'
                                                   /* NUM is any numeric type including floating point */
                                                   /* HITLIMIT is an optional count of how many hits you want as a minimum from this range */
PrefixTerm    ::= WORD '*'
SubstringTerm ::= '*' WORD '*'
SuffixTerm    ::= '*' WORD

Prefix searching

Prefix searching is only available for streaming and attributes. A prefix search term (e.g. 'car*') behaves like a pattern match on the given field: Documents that have at least one word beginning with the given prefix are returned (or not returned if the '-' syntax is used). A prefix search term does not add or change the ranking of the documents in the result set.

If match:prefix is specified in the search definition, then this is the default match mode for this field. If it is not specified, then tokenized search is the default matching mode for streaming, and exact match for attributes.

An example of using prefix search with streaming:

car* golf
This query will get all documents with words beginning with “car”, and also containing the word “golf”.

Substring searching

Substring searching is only available for streaming. When using a substring search term (e.g. '*esp*'), documents that have at least one word containing the substring term are returned. A substring search term does not add or change the ranking of the documents in the result set.

The match type of the field does not have to be substring in order to use substring searching. By specifying a substring search term in the query you override the match type.

An example of using substring search:

*esp*
This query will return all documents with words containing “esp”, for instance “vespa”.

Suffix searching

Suffix searching is only available for streaming. When using a suffix search term (e.g. '*spa'), documents that have at least one word ending with the suffix term are returned. A suffix search term does not add or change the ranking of the documents in the result set.

The match type of the field does not have to be suffix in order to use suffix searching. By specifying a suffix search term in the query you override the match type.

An example of using suffix search:

*spa
This query will return all documents with words ending with “spa”, for instance “vespa”.
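
The three wildcard forms can be summarized with a small sketch. It only models the matching rule described in the sections above (at least one word in the field must match); the function names are hypothetical, and case handling and tokenization are left out as simplifying assumptions.

```python
def classify(term):
    """Return (kind, text) for a term written with '*' wildcards."""
    starts, ends = term.startswith("*"), term.endswith("*")
    core = term.strip("*")
    if not core or not (starts or ends):
        return "word", term
    if starts and ends:
        return "substring", core
    if ends:
        return "prefix", core
    return "suffix", core


def word_matches(term, word):
    """Check a single word against a prefix, substring, suffix or plain term."""
    kind, text = classify(term)
    if kind == "prefix":
        return word.startswith(text)
    if kind == "suffix":
        return word.endswith(text)
    if kind == "substring":
        return text in word
    return word == text


def document_matches(term, words):
    """A document matches if at least one of its words matches the term."""
    return any(word_matches(term, w) for w in words)


print(classify("car*"), classify("*esp*"), classify("*spa"))
print(document_matches("*esp*", ["search", "with", "vespa"]))
```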

Term weight

The weight is either one or more ! characters, or a ! followed by an integer. The integer is a fixed point scaling number with decimal factor 100, i.e. it can be regarded as a percentage. When using repeated ! characters, the weight is increased by 50 (from a default value of 100) for each !. A weight expression may also be applied to a phrase.
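
The arithmetic can be written out as a short sketch. The function name is hypothetical and the code only restates the rule in the paragraph above; it is not the engine's parser.

```python
def term_weight(weight_expr=""):
    """Return the weight (as a percentage) for a weight expression.

    '' -> 100 (default), '!' -> 150, '!!' -> 200, '!300' -> 300.
    """
    if not weight_expr:
        return 100
    if not weight_expr.startswith("!"):
        raise ValueError("a weight expression must start with '!'")
    rest = weight_expr[1:]
    if rest.isdigit():
        # '!' followed by an integer: the integer is the weight itself.
        return int(rest)
    if set(rest) <= {"!"}:
        # Repeated '!': each one adds 50 to the default of 100.
        return 100 + 50 * len(weight_expr)
    raise ValueError("malformed weight expression: " + weight_expr)


for expr in ["", "!", "!!", "!300"]:
    print(repr(expr), term_weight(expr))
```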

A term weight is used to modify the relative importance of the terms in your query. The term score is only one part of the overall rank calculation, but by adding weight to the most important terms, you can ensure that they contribute more. For more details on rank calculation, see Ranking guide.

Numerical terms

[x;y] matches any number between x and y, including the endpoints x and y. Note that >number is the same as [number+1;] and <number is the same as [;number-1].

A few examples using numerical terms:

perl size:<100
This query will get all documents with the word “perl” and with size less than 100Kb.

chess kasparov -karpov date:[19990101;19991231]
This query will get all documents last modified in 1999 containing chess and kasparov, but not karpov.
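
The relation between the <, > and bracket forms can be sketched as a small parser that turns a numerical term into inclusive bounds. This is an illustration under the assumption of integer values (the +1 and -1 equivalences only hold for integers); the function names are hypothetical and an optional hit limit is ignored.

```python
def parse_num_term(term):
    """Return inclusive (low, high) bounds; None means unbounded.

    Assumption: integer values only.
    """
    if term.startswith("<"):
        return None, int(term[1:]) - 1
    if term.startswith(">"):
        return int(term[1:]) + 1, None
    if term.startswith("[") and term.endswith("]"):
        parts = term[1:-1].split(";")
        low = int(parts[0]) if parts[0] else None
        high = int(parts[1]) if len(parts) > 1 and parts[1] else None
        return low, high
    value = int(term)
    return value, value


def in_range(value, bounds):
    """Check a value against inclusive bounds from parse_num_term."""
    low, high = bounds
    return (low is None or value >= low) and (high is None or value <= high)


print(parse_num_term("<100"))
print(in_range(19990615, parse_num_term("[19990101;19991231]")))
```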

Capped range search is a very efficient way to quickly fetch the best documents given a simple range. For it to be efficient, the attribute used for the range search must have fast-search enabled.

It is fast because it only scans enough terms in the dictionary to satisfy the number of documents requested. A positive number starts from the left of your range and works its way right. A negative number starts from the right and goes left.
date:[0;21000101;10]
This will give you at least the first 10 documents since the birth of Jesus.
date:[0;21000101;-10]
This will give you at least the last 10 documents since the birth of Jesus.
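
The following sketch illustrates why the result holds "at least" the requested number of documents: whole dictionary terms are scanned one at a time until the limit is reached, so every document sharing the last scanned value is included. It models the semantics only, using a plain in-memory mapping; the function name and data layout are assumptions, and it does not reproduce the efficiency of a real fast-search dictionary.

```python
from collections import defaultdict


def capped_range_search(docs, low, high, hit_limit):
    """Return doc ids in [low, high], scanning values until the limit is met.

    docs maps doc id -> attribute value. A positive hit_limit scans from the
    low end of the range, a negative one from the high end.
    """
    # Group documents by value, mimicking a dictionary of distinct terms.
    postings = defaultdict(list)
    for doc_id, value in docs.items():
        if low <= value <= high:
            postings[value].append(doc_id)

    hits = []
    for value in sorted(postings, reverse=hit_limit < 0):
        if len(hits) >= abs(hit_limit):
            break  # enough documents collected, stop scanning terms
        hits.extend(postings[value])
    return hits


docs = {"a": 1, "b": 2, "c": 2, "d": 3, "e": 4}
print(capped_range_search(docs, 0, 10, 2))   # ['a', 'b', 'c']
print(capped_range_search(docs, 0, 10, -2))  # ['e', 'd']
```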

Grouping in the simple query language

There is only one level of parentheses supported; any use of additional parentheses within the expression will be ignored. In addition, note that the terms within should not be prefixed with + or -.

When the parentheses are prefixed by a + (which may be omitted for the all type, because expressions are + by default), the search requires that at least one of the included terms is present in the document. This effectively gives you a way of having alternative terms expressing the same intent, while requiring that the concept is covered in the document.

When the parentheses are prefixed by a -, the search excludes all documents that include all the terms, but allows documents that only use some of the terms in the expression. It is a bit more difficult to find good use for this syntax; it could for instance be used to remove documents that compare two different products, while still allowing documents only discussing one of them.
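
The two group semantics can be stated compactly in code. This is a minimal sketch with a hypothetical function name, treating a document as a set of words and the group as a list of plain terms.

```python
def group_matches(prefix, terms, doc_words):
    """Evaluate a parenthesized group against the words of a document.

    '+' (or no prefix for the all type): at least one term must be present.
    '-': the document is rejected only if all terms are present.
    """
    present = [term in doc_words for term in terms]
    if prefix == "-":
        return not all(present)
    return any(present)


doc_one = {"us", "foreign", "politics", "clinton"}
doc_both = {"us", "foreign", "politics", "clinton", "trump"}

print(group_matches("+", ["clinton", "trump"], doc_one))   # True
print(group_matches("-", ["clinton", "trump"], doc_one))   # True
print(group_matches("-", ["clinton", "trump"], doc_both))  # False
```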

More examples using simple query language can be seen in Searching with Vespa.

Search in URLs

Create a URL-field in the index by creating a field of type uri - refer to this for how to build queries. The indexer will report an ERROR in the log for invalid URLs. Notes:

  • Finding documents matching a full URL does not behave like exact matching in e.g. string fields, but more like substring matching. A search for myurlfield:http://www.mydomain.com/ will match documents where myurlfield is set to any of http://www.mydomain.com/, http://www.mydomain.com/test, and http://redirect.com/?goto=http://www.mydomain.com/
  • Hostname searches have an anchoring mechanism to limit which URLs match. By default, queries are anchored at the end, which means that a search for mydomain.com will match www.mydomain.com, but not mydomain.com.au. Adding a ^ (caret) to the start will turn on anchoring at the start, meaning that the query will only return exact matches. Adding a * at the end will turn off anchoring at the end. The query ^mydomain.com* will match mydomain.com.au, but not www.mydomain.com.
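
The anchoring rules for hostname searches can be modelled as follows. This sketch assumes that matching works on whole dot-separated hostname components, which this reference does not state explicitly; the function name is hypothetical.

```python
def host_matches(query, host):
    """Model hostname anchoring: '^' anchors at the start, '*' un-anchors the end.

    Assumption: hostnames are compared component by component.
    """
    anchor_start = query.startswith("^")
    anchor_end = not query.endswith("*")
    q = query.lstrip("^").rstrip("*").split(".")
    h = host.split(".")
    if len(q) > len(h):
        return False
    if anchor_start and anchor_end:
        return h == q                  # exact match only
    if anchor_start:
        return h[:len(q)] == q         # host must begin with the query
    if anchor_end:
        return h[-len(q):] == q        # host must end with the query
    # No anchoring: the query may appear anywhere in the host.
    return any(h[i:i + len(q)] == q for i in range(len(h) - len(q) + 1))


print(host_matches("mydomain.com", "www.mydomain.com"))   # True
print(host_matches("mydomain.com", "mydomain.com.au"))    # False
print(host_matches("^mydomain.com*", "mydomain.com.au"))  # True
print(host_matches("^mydomain.com*", "www.mydomain.com")) # False
```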

Field Path Syntax

Streaming search supports the field path syntax of the document selection language when searching structs and maps. Special for the map type is the ability to select a subset of map entries to search using the mymap{"foo"} syntax.

See the field path documentation for use-cases of the map data type.

In the result output, a map is represented in the same way as in the Document XML:

<field name="mymap">
  <item><key>foo</key><value>bar</value></item>
  <item><key>fuz</key><value>baz</value></item>
</field>
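
A client that receives this representation could turn it into a dictionary along these lines. The sketch uses only the Python standard library and the element names shown above; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<field name="mymap">
  <item><key>foo</key><value>bar</value></item>
  <item><key>fuz</key><value>baz</value></item>
</field>
"""


def map_field_to_dict(xml_text):
    """Convert a <field> element holding <item> key/value pairs to a dict."""
    field = ET.fromstring(xml_text.strip())
    return {
        item.findtext("key"): item.findtext("value")
        for item in field.findall("item")
    }


entries = map_field_to_dict(SAMPLE)
print(entries)         # {'foo': 'bar', 'fuz': 'baz'}
print(entries["foo"])  # the entry that mymap{"foo"} would select
```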

Removing syntax characters from queries

It will sometimes be more robust to remove characters which are used in the query syntax from a user's search terms. An example could be URLs containing parentheses. Comma ("," ASCII 0x2C) may be used as a safe replacement character in these cases.

(x url:http://site.com/a)b) y
The URL http://site.com/a)b in this example could be made safe as follows:
(x url:http://site.com/a,b) y
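
A replacement step like this could be applied to user-supplied terms before they are placed in a query. The sketch below handles only parentheses, the case shown in the example; the set of characters to replace and the function name are assumptions, and an application may need a different set.

```python
# Assumption: only parentheses are treated as unsafe here.
SYNTAX_CHARS = "()"


def sanitize_term(term, replacement=","):
    """Replace query syntax characters in a user-supplied term with a comma."""
    return "".join(replacement if ch in SYNTAX_CHARS else ch for ch in term)


url = "http://site.com/a)b"
print("(x url:" + sanitize_term(url) + ") y")  # (x url:http://site.com/a,b) y
```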

Examples

The simple query language syntax accepts any input string and makes the most of it. A basic query consists of words separated by spaces (encoded as %20). In addition,

  • A phrase can be searched by enclosing it in quotes, like "match exactly this"
  • Phrases and words may be preceded by -, meaning documents must not contain this
  • Phrases and words may be preceded by +, meaning documents must contain this, currently only in use for subtype any
  • Words and phrases may be grouped using parentheses, like -(do not match if all of these words match)
  • Each word or phrase may be preceded by an index or attribute name and a colon, like indexname:word, to match in that index. If the index name is omitted the index named default is searched.
Any noise (characters not in indexes or attributes, and with no query language meaning) is ignored, so all query strings are valid. The exception is queries which have no meaningful interpretation. An example is -a, which one would expect to return all documents not containing a. Vespa, however, will return the error message Null query. All the following examples are of type all.

Get all documents with the word word, having microsoft but not bug in the title:

word title:microsoft -title:bug
Search for all documents having the phrase "to be or not to be", but excluding those having shakespeare in the title:
"to be or not to be" -title:shakespeare
Get all documents with the word Christmas in the title that were last modified Christmas Day 2009:
title:Christmas date:20091225
Get documents on US Foreign politics, excluding those matching both rival presidential candidates:
"us foreign politics" -(clinton trump)
Get documents on US Foreign politics, including only those matching at least one of the rival presidential candidates:
"us foreign politics" (clinton trump)