Simple Query Language Reference
The simple query language allows application end users to issue more complex queries than a list of tokens. It is a heuristic, non-structured language, which attempts to do something probably-useful with any input given. It is combined with the structured YQL by using the userQuery operator. Types:
Type | Description |
---|---|
all | Default. All the words of the query must match a document for it to be a match. |
web | Like the all type, with the following differences: |
any | It suffices that one of the words of the query matches for it to be a match. |
phrase | The words of the query are considered a phrase; the words must match in the given order. Colons, plus signs and so on are ignored. |
Simple Query Syntax
Query ::= Expr ( SPACE Expr )*
Expr ::= Term | Prefix? '(' SimpleTerm+ ')'
Term ::= Prefix? Field? CoreTerm Weight?
SimpleTerm ::= Field? CoreTerm Weight?
Prefix ::= '+' | '-'
Field ::= ID ':' /* A valid field name or alias */
Weight ::= '!'+ | '!' NUM /* NUM is a percentage. */
CoreTerm ::= WORD | Phrase | NumTerm | PrefixTerm | SubstringTerm | SuffixTerm
Phrase ::= '"' WORD+ '"'
NumTerm ::= NUM | '<' NUM | '>' NUM | '[' NUM? ';' NUM? ( ';' HITLIMIT )? ']'
/* NUM is any numeric type, including floating point */
/* HITLIMIT is an optional count of how many hits you want as a minimum from this range */
PrefixTerm ::= WORD '*'
SubstringTerm ::= '*' WORD '*'
SuffixTerm ::= '*' WORD
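To make the Term production above more concrete, here is a minimal sketch of a recognizer for a single term. It is a hypothetical helper, not part of Vespa: it only handles WORD-based core terms (plain words, PrefixTerm, SubstringTerm and SuffixTerm) and deliberately ignores phrases, numeric terms and parentheses.

```python
import re

# Hypothetical, simplified recognizer for:  Term ::= Prefix? Field? CoreTerm Weight?
# Only WORD, WORD*, *WORD* and *WORD core terms are handled here.
TERM_RE = re.compile(
    r"""^(?P<prefix>[+-])?          # optional '+' or '-'
        (?:(?P<field>\w+):)?        # optional field name or alias followed by ':'
        (?P<core>\*?\w+\*?)         # WORD, WORD*, *WORD* or *WORD
        (?P<weight>!\d+|!+)?$       # optional '!NUM' or one or more '!'
    """,
    re.VERBOSE,
)


def parse_term(text):
    """Split one simple term into its grammar parts."""
    m = TERM_RE.match(text)
    if m is None:
        raise ValueError(f"not a simple term: {text!r}")
    core = m.group("core")
    # \w+ guarantees at least one word character, so '*x*' has length >= 3.
    if core.startswith("*") and core.endswith("*"):
        kind, word = "substring", core[1:-1]
    elif core.startswith("*"):
        kind, word = "suffix", core[1:]
    elif core.endswith("*"):
        kind, word = "prefix", core[:-1]
    else:
        kind, word = "word", core
    return {
        "prefix": m.group("prefix"),
        "field": m.group("field"),
        "kind": kind,
        "word": word,
        "weight": m.group("weight"),
    }
```

For example, parse_term("-title:bug") yields prefix "-", field "title" and the plain word "bug", while parse_term("car*") is classified as a prefix term on the default index.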
Prefix searching
Prefix searching is only available for streaming search and attribute fields. A prefix search term (e.g. 'car*') behaves like a pattern match on the given field: documents that have at least one word beginning with the given prefix are returned (or not returned if the '-' syntax is used). A prefix search term does not add or change the ranking of the documents in the result set.
If match:prefix is specified in the schema, then this is the default match mode for this field. If it is not specified, then tokenized search is the default matching mode for streaming, and exact match for attributes.
An example of using prefix search with streaming:
car* golf
This query will get all documents with words beginning with “car”, and also containing the word “golf”.
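The prefix semantics above can be modeled as a plain filter. This is a minimal sketch, assuming each document is just a list of words in one field; the docs data and the has_prefix helper are hypothetical. Like a real prefix term, it only decides which documents are returned and computes no rank score.

```python
# Hypothetical toy corpus: document id -> words in the searched field.
docs = {
    1: ["cars", "and", "golf"],
    2: ["carpet", "cleaning"],
    3: ["golf", "clubs"],
}


def has_prefix(words, prefix):
    """True if at least one word begins with prefix (the 'car*' semantics)."""
    return any(w.startswith(prefix) for w in words)


# "car* golf" with type all: the prefix term and the word must both match.
hits = [d for d, words in docs.items() if has_prefix(words, "car") and "golf" in words]

# "-car*": exclude documents with any word starting with "car".
excluded = [d for d, words in docs.items() if not has_prefix(words, "car")]
```

Here hits is [1] and excluded is [3]: document 2 has "carpet" but no "golf", and document 3 is the only one without a word starting with "car".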
Substring searching
Substring searching is only available for streaming search. When using a substring search term (e.g. '*esp*'), documents that have at least one word where a prefix of a suffix of the word equals the substring term are returned; in other words, at least one word must contain the term. A substring search term does not add or change the ranking of the documents in the result set.
The match type of the field does not have to be substring in order to use substring searching. By specifying a substring search term in the query you override the match type.
An example of using substring search:
*esp*
This query will return all documents with words containing “esp”, for instance “vespa”.
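The definition "a prefix of a suffix of the word equals the term" is another way of saying that the word contains the term. The sketch below, using a hypothetical helper, enumerates that definition literally and checks it against Python's in operator.

```python
def substring_match_by_definition(word, term):
    """Literal reading: some prefix word[i:][:j] of some suffix word[i:] equals term."""
    return any(
        word[i:][:j] == term
        for i in range(len(word) + 1)       # every suffix, including the empty one
        for j in range(len(word) - i + 1)   # every prefix of that suffix
    )


# The literal definition agrees with plain containment for these samples.
samples = [("vespa", "esp"), ("vespa", "spy"), ("espresso", "esp"), ("vespa", "vespa")]
agrees = all(substring_match_by_definition(w, t) == (t in w) for w, t in samples)
```

The brute-force form is only useful for checking the wording; an actual matcher would simply test containment.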
Suffix searching
Suffix searching is only available for streaming search. When using a suffix search term (e.g. '*spa'), documents that have at least one word where a suffix of the word equals the suffix term are returned. A suffix search term does not add or change the ranking of the documents in the result set.
The match type of the field does not have to be suffix in order to use suffix searching. By specifying a suffix search term in the query you override the match type.
An example of using suffix search:
*spa
This query will return all documents with words ending with “spa”, for instance “vespa”.
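Likewise, "a suffix of the word equals the suffix term" is exactly what str.endswith tests. A minimal sketch with a hypothetical helper that spells out the definition:

```python
def suffix_match_by_definition(word, term):
    """Literal reading: some suffix word[i:] of the word equals the term."""
    return any(word[i:] == term for i in range(len(word) + 1))


samples = [("vespa", "spa"), ("spam", "spa"), ("spa", "spa")]
agrees = all(suffix_match_by_definition(w, t) == w.endswith(t) for w, t in samples)
```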
Term weight
The weight is either one or more ! characters, or a ! followed by an integer. The integer is a fixed-point scaling number with decimal factor 100, i.e. it can be regarded as a percentage. When using repeated ! characters, the weight is increased by 50 (from a default value of 100) for each !. A weight expression may also be applied to a phrase.
A term weight is used to modify the relative importance of the terms in your query. The term score is only one part of the overall rank calculation, but by adding weight to the most important terms, you can ensure that they contribute more. For more details on rank calculation, see the Ranking guide.
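The weight arithmetic can be worked through with a small sketch. The term_weight function is hypothetical; it takes the Weight suffix of a term ('', '!', '!!', ... or '!NUM') and returns the percentage described above.

```python
DEFAULT_WEIGHT = 100  # the default term weight, in percent


def term_weight(weight_suffix):
    """Convert a Weight suffix to a percentage weight."""
    if weight_suffix == "":
        return DEFAULT_WEIGHT
    if weight_suffix[1:].isdigit():           # '!NUM' form, e.g. '!150'
        return int(weight_suffix[1:])
    if set(weight_suffix) == {"!"}:           # one or more '!' characters
        return DEFAULT_WEIGHT + 50 * len(weight_suffix)
    raise ValueError(f"not a weight: {weight_suffix!r}")
```

So perl! and perl!150 give the same weight (150), perl!! gives 200, i.e. twice the default, and perl!!! gives 250.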
Numerical terms
[x;y] matches any number between x and y, including the endpoints x and y. For integer values, note that >number is the same as [number+1;] and <number is the same as [;number-1].
A few examples using numerical terms:
perl size:<100
This query will get all documents with the word “perl” and with size less than 100Kb.
chess kasparov -karpov date:[19990101;19991231]
This query will get all documents last modified in 1999 containing chess and kasparov, but not karpov.
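The equivalences between <, > and [;] ranges can be checked with a small sketch. It is a hypothetical helper that assumes integer values only, maps a NumTerm to inclusive (low, high) bounds with None meaning unbounded, and ignores any HITLIMIT part.

```python
def num_term_bounds(term):
    """Map an integer NumTerm to inclusive (low, high) bounds; None = unbounded."""
    if term.startswith("<"):
        return (None, int(term[1:]) - 1)
    if term.startswith(">"):
        return (int(term[1:]) + 1, None)
    if term.startswith("[") and term.endswith("]"):
        parts = term[1:-1].split(";")         # a third part (HITLIMIT) is ignored
        low = int(parts[0]) if parts[0] else None
        high = int(parts[1]) if parts[1] else None
        return (low, high)
    value = int(term)                         # a bare NUM matches exactly that value
    return (value, value)


def in_range(value, bounds):
    """True if value lies within the inclusive bounds."""
    low, high = bounds
    return (low is None or value >= low) and (high is None or value <= high)
```

With this model, num_term_bounds("<100") and num_term_bounds("[;99]") both give (None, 99), and the 1999 date range above accepts 19990615 but not 20000101.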
Advanced range search
Capped range search lets you fetch the best (first or last) documents in a simple range very efficiently. For it to be efficient, the attribute used for the range search must use fast-search.
It is fast because it only scans enough terms in the dictionary to satisfy the number of documents requested. A positive number starts from the left (low) end of your range and works its way right; a negative number starts from the right (high) end and works its way left.
date:[0;21000101;10]
This will give you at least the first 10 documents since the birth of Jesus.
date:[0;21000101;-10]
This will give you at least the last 10 documents since the birth of Jesus.
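The scanning behavior can be illustrated with a sketch. It is hypothetical and assumes the attribute dictionary is a sorted list of unique values with a posting list of document ids per value; it locates the range with binary search and then stops as soon as the hit limit is reached, which is why the result contains "at least" that many documents.

```python
from bisect import bisect_left, bisect_right


def capped_range(values, postings, low, high, hit_limit):
    """values: sorted dictionary terms; postings[i]: doc ids having values[i].

    Scans from the low end when hit_limit > 0 and from the high end when
    hit_limit < 0, stopping once at least abs(hit_limit) docs are collected.
    Returns (hits, number of dictionary terms scanned).
    """
    if hit_limit == 0:
        raise ValueError("hit_limit must be non-zero")
    start = bisect_left(values, low)
    end = bisect_right(values, high)  # exclusive
    if hit_limit > 0:
        indexes = range(start, end)
    else:
        indexes = range(end - 1, start - 1, -1)
    hits, scanned = [], 0
    for i in indexes:
        hits.extend(postings[i])
        scanned += 1
        if len(hits) >= abs(hit_limit):
            break
    return hits, scanned


# Hypothetical dictionary for a date attribute.
values = [19990101, 20050505, 20100101, 20200202]
postings = [["a"], ["b", "c"], ["d"], ["e"]]
```

With a limit of 2, capped_range(values, postings, 0, 21000101, 2) scans two terms and returns three documents, a, b and c, since a whole posting list is taken at a time; a limit of -2 scans from the other end and returns e and d.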
Grouping in the simple query language
There is only one level of parentheses supported; any use of additional parentheses within the expression will be ignored. In addition, note that the terms within should not be prefixed with + or -.
When the parentheses are prefixed by a + (which may be omitted for the all type, because expressions are + by default), the search requires that at least one of the included terms is present in the document. This effectively gives you a way of having alternative terms expressing the same intent, while requiring that the concept is covered in the document.
When the parentheses are prefixed by a -, the search excludes all documents that include all the terms, but allows documents that only use some of the terms in the expression. It is a bit more difficult to find good use for this syntax; it could for instance be used to remove documents that compare two different products, while still allowing documents only discussing one of them.
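In boolean terms, +( ) acts as an OR over the grouped terms and -( ) excludes a document only when the AND of the grouped terms holds. A minimal sketch with hypothetical helper names, treating a document as a set of words:

```python
def plus_group(words, group):
    """+(a b c): at least one of the grouped terms must be present."""
    return any(term in words for term in group)


def minus_group(words, group):
    """-(a b c): keep the document unless it contains all of the grouped terms."""
    return not all(term in words for term in group)


candidates = ["clinton", "trump"]
```

A document discussing only one of the candidates passes minus_group, while a document mentioning both is removed; plus_group keeps any document mentioning at least one of them.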
Search in URLs
Create a URL field in the index by creating a field of type uri; refer to the uri field type documentation for how to build queries. The indexer will report an ERROR in the log for invalid URLs. Notes:
- Finding documents matching a full URL does not behave like exact matching in, e.g., string fields, but more like substring matching. A search for myurlfield:http://www.mydomain.com/ will match documents where myurlfield is set to any of http://www.mydomain.com/, http://www.mydomain.com/test and http://redirect.com/?goto=http://www.mydomain.com/.
- Hostname searches have an anchoring mechanism to limit which URLs match. By default, queries are anchored at the end, which means that a search for mydomain.com will match www.mydomain.com, but not mydomain.com.au. Adding a ^ (caret) at the start turns on anchoring at the start, meaning that the query will only return exact matches. Adding a * at the end turns off anchoring at the end. The query ^mydomain.com* will match mydomain.com.au, but not www.mydomain.com.
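The anchoring rules for hostnames can be modeled as a contiguous match over dot-separated labels. This is a hypothetical sketch: treating a hostname as a list of labels is an assumption made for illustration, not a description of how Vespa tokenizes URLs.

```python
def host_matches(query, host):
    """Hypothetical model of hostname anchoring over dot-separated labels.

    By default the query is anchored at the end of the hostname; a leading '^'
    also anchors it at the start, and a trailing '*' removes the end anchor.
    """
    anchor_start = query.startswith("^")
    anchor_end = not query.endswith("*")
    core = query[1:] if anchor_start else query
    core = core if anchor_end else core[:-1]
    q_labels = core.split(".")
    h_labels = host.split(".")
    n = len(q_labels)
    # Try every position where the query labels could appear contiguously.
    for i in range(len(h_labels) - n + 1):
        if h_labels[i:i + n] != q_labels:
            continue
        if anchor_start and i != 0:
            continue
        if anchor_end and i + n != len(h_labels):
            continue
        return True
    return False
```

This reproduces the examples above: mydomain.com matches www.mydomain.com but not mydomain.com.au, and ^mydomain.com* matches mydomain.com.au but not www.mydomain.com.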
Field Path Syntax
Streaming search supports the field path syntax of the document selection language when searching structs and maps. Specific to the map type is the ability to select a subset of map entries to search, using the mymap{"foo"} syntax.
See the field path documentation for use-cases of the map data type.
In the result output, a map is represented in the same way as in the Document XML:
<field name="mymap">
  <item><key>foo</key><value>bar</value></item>
  <item><key>fuz</key><value>baz</value></item>
</field>
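A client reading this representation may want the map back as a dictionary. The sketch below uses the standard library's ElementTree and assumes the exact element names shown above (field, item, key, value); the map_field_to_dict helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# The map field exactly as it appears in the result output above.
MAP_XML = """<field name="mymap">
  <item><key>foo</key><value>bar</value></item>
  <item><key>fuz</key><value>baz</value></item>
</field>"""


def map_field_to_dict(xml_text):
    """Turn <item><key>..</key><value>..</value></item> children into a dict."""
    root = ET.fromstring(xml_text)
    return {item.findtext("key"): item.findtext("value") for item in root.findall("item")}
```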
Removing syntax characters from queries
It will sometimes be more robust to remove characters which are used in the query syntax from a user's search terms. An example could be URLs containing parentheses. Comma ("," ASCII 0x2C) may be used as a safe replacement character in these cases.
(x url:http://site.com/a)b) y
The URL http://site.com/a)b in this example could be quoted as follows:
(x url:http://site.com/a,b) y
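Replacing syntax characters before a value is placed into a query is a one-line string transformation. In this hypothetical sketch only parentheses are replaced by default, which is an assumption matching the example above; which other characters to replace depends on what your users' terms contain.

```python
SYNTAX_CHARS = "()"  # assumption: only parentheses are replaced by default


def sanitize(term, chars=SYNTAX_CHARS, replacement=","):
    """Replace each query-syntax character in term with a safe character."""
    return term.translate({ord(c): replacement for c in chars})


url = "http://site.com/a)b"
query = f"(x url:{sanitize(url)}) y"
```

The resulting query string is (x url:http://site.com/a,b) y, the same as the quoted form above.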
Examples
The simple query language syntax accepts any input string and makes the most of it. A basic query consists of words separated by spaces (encoded as %20). In addition:
- A phrase can be searched by enclosing it in quotes, like "match exactly this".
- Phrases and words may be preceded by -, meaning documents must not contain them.
- Phrases and words may be preceded by +, meaning documents must contain them; this is currently only in use for type any.
- Words and phrases may be grouped using parentheses, like -(do not match if all of these words match).
- Each word or phrase may be preceded by an index or attribute name and a colon, like indexname:word, to match in that index. If the index name is omitted, the index named default is searched.
A query like -a, which one would expect to return all documents not containing a, will however make Vespa return the error message Null query.
All the following examples are of type all.
Get all documents with the word word, having microsoft but not bug in the title:
word title:microsoft -title:bug
Search for all documents having the phrase "to be or not to be", but excluding those having shakespeare in the title:
"to be or not to be" -title:shakespeare
Get all documents with the word Christmas in the title that were last modified on Christmas Day 2009:
title:Christmas date:20091225
Get documents on US foreign politics, excluding those matching both rival presidential candidates:
"us foreign politics" -(clinton trump)
Get documents on US foreign politics, including only those matching at least one of the rival presidential candidates:
"us foreign politics" (clinton trump)
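When one of these queries is sent over HTTP, spaces are encoded as %20 and syntax characters such as :, " and parentheses are percent-encoded as well. This sketch uses only the Python standard library; the parameter names query and type are assumptions here, so check them against the query API reference for your deployment.

```python
from urllib.parse import quote, urlencode


def encode_simple_query(user_query, query_type="all"):
    """Build a URL query string; quote_via=quote encodes spaces as %20, not '+'."""
    return urlencode({"query": user_query, "type": query_type}, quote_via=quote)


params = encode_simple_query('"us foreign politics" -(clinton trump)')
```

Here params becomes query=%22us%20foreign%20politics%22%20-%28clinton%20trump%29&type=all. The - is an unreserved URL character, so it is left as-is.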