The simple query language allows application end users to issue more complex queries than a list of tokens. It is a heuristic, non-structured language, which attempts to do something probably-useful with any input given. It is combined with the structured YQL by using the userQuery operator. Types:
Type | Description |
---|---|
all | Creates an AND: All the words of the query must match. |
any | Creates an OR: At least one of the words of the query must match. |
weakAnd | Default. Creates a WeakAnd: Like "any", but with performance more like "all". |
tokenize | Splits the text into word tokens and assembles them into a WeakAnd item. No characters, such as minus or quote signs, carry any semantics in this mode; any non-searchable character is simply ignored. This is useful when querying with text not explicitly written as a search query. |
web | Like the all type, with the following differences: |
phrase | The words of the query are considered a phrase: the words must match in the given order. Colon, plus and so on are ignored. |
    Query         ::= Expr ( SPACE Expr )*
    Expr          ::= Term | Prefix? '(' SimpleTerm+ ')'
    Term          ::= Prefix? Field? CoreTerm Weight?
    SimpleTerm    ::= Field? CoreTerm Weight?
    Prefix        ::= '+' | '-'
    Field         ::= ID ':'            /* A valid field name or alias */
    Weight        ::= '!'+ | '!' NUM    /* NUM is a percentage. */
    CoreTerm      ::= WORD | Phrase | NumTerm | PrefixTerm | SubstringTerm | SuffixTerm | SameElement
    Phrase        ::= '"' WORD+ '"'
    NumTerm       ::= NUM | '<' NUM | '>' NUM | '[' NUM? ';' NUM? ';' HITLIMIT? ']'
                      /* NUM is any numeric type including floating point */
                      /* HITLIMIT is an optional count of how many hits you want as a minimum from this range */
    PrefixTerm    ::= WORD '*'
    SubstringTerm ::= '*' WORD '*'
    SuffixTerm    ::= '*' WORD
    SameElement   ::= '{' Field ( SPACE Field )* '}'
Prefix matching is only available for attributes. A prefix search term (e.g. 'car*') behaves like a pattern match on the given field: documents that have a field value beginning with the given prefix are matched and returned (or excluded if the '-' prefix is used). A prefix search term does not contribute to or change the ranking of the documents in the result set.
The weight is either one or more ! characters, or a ! followed by an integer. The integer is a fixed-point scaling number with decimal factor 100, i.e. it can be regarded as a percentage. When using repeated ! characters, the weight is increased by 50 (from a default value of 100) for each !, so !! gives a weight of 200. A weight expression may also be applied to a phrase.
A term weight is used to modify the relative importance of the terms in your query. The term score is only one part of the overall rank calculation, but by adding weight to the most important terms, you can ensure that they contribute more. For more details on rank calculation, see the Ranking guide.
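The weight rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic described here; the function name `term_weight` and the constants are hypothetical and not part of any Vespa API.

```python
import re

DEFAULT_WEIGHT = 100  # weight of a term that carries no weight expression
STEP = 50             # increase per '!' when '!' characters are repeated


def term_weight(weight_expr: str) -> int:
    """Return the weight, as a percentage, for an expression like '!!' or '!150'."""
    if weight_expr == "":
        return DEFAULT_WEIGHT
    if re.fullmatch(r"!+", weight_expr):
        # One or more '!' characters: add 50 per character to the default.
        return DEFAULT_WEIGHT + STEP * len(weight_expr)
    explicit = re.fullmatch(r"!(\d+)", weight_expr)
    if explicit:
        # '!' followed by an integer: the integer is the weight itself.
        return int(explicit.group(1))
    raise ValueError(f"not a weight expression: {weight_expr!r}")
```

Under these assumptions, `term_weight("!!!")` and `term_weight("!250")` describe the same weight.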
[x;y] matches any number between x and y, including the endpoints x and y. Note that >number is the same as [number+1;] and <number is the same as [;number-1].
A few examples using numerical terms:
perl size:<100
This query will get all documents with the word “perl” and with size less than 100Kb.
chess kasparov -karpov date:[19990101;19991231]
This query will get all documents last modified in 1999 containing chess and kasparov, but not karpov.
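As a rough illustration of the numeric term forms, the sketch below translates a term into inclusive bounds, assuming integer values so that the `>number` and `<number` equivalences stated above hold. The helper names `integer_bounds` and `in_range` are hypothetical, and the optional hit limit is deliberately left out.

```python
import math
import re


def integer_bounds(term: str) -> tuple[float, float]:
    """Translate a numeric term into inclusive (low, high) bounds for integers."""
    term = term.strip()
    if term.startswith("<"):
        return (-math.inf, int(term[1:]) - 1)   # <n is the same as [;n-1]
    if term.startswith(">"):
        return (int(term[1:]) + 1, math.inf)    # >n is the same as [n+1;]
    m = re.fullmatch(r"\[(-?\d*);(-?\d*)\]", term)
    if m:
        low = int(m.group(1)) if m.group(1) else -math.inf
        high = int(m.group(2)) if m.group(2) else math.inf
        return (low, high)
    value = int(term)                           # a plain number matches itself
    return (value, value)


def in_range(term: str, value: int) -> bool:
    """True if value satisfies the numeric term, endpoints included."""
    low, high = integer_bounds(term)
    return low <= value <= high
```

For example, `in_range("[19990101;19991231]", 19991231)` is true because both endpoints are part of the range.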
Capped range search efficiently fetches the best documents within a simple range. To be efficient, it requires fast-search on the attribute used for the range search.
It is fast because it only scans enough terms in the dictionary to satisfy the number of documents requested. A positive number starts from the left of the range and works its way right. A negative number starts from the right and goes left.
date:[0;21000101;10]
Will give you at least the first 10 documents since the birth of Jesus.
date:[0;21000101;-10]
Will give you at least the last 10 documents since the birth of Jesus.
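The scan direction and the "at least" wording can be illustrated with a toy model. The sketch below assumes a hypothetical in-memory `postings` mapping from attribute value to document ids, standing in for a sorted dictionary; the function name `capped_range` and the choice to collect whole posting lists are assumptions for illustration, not a description of how Vespa implements this.

```python
from bisect import bisect_left, bisect_right


def capped_range(postings: dict[int, list[str]], low: int, high: int,
                 hit_limit: int) -> list[str]:
    """Collect documents in [low, high], stopping once abs(hit_limit) is reached."""
    values = sorted(postings)  # stands in for the sorted attribute dictionary
    in_range = values[bisect_left(values, low):bisect_right(values, high)]
    if hit_limit < 0:
        in_range.reverse()     # a negative limit scans from the right end
    wanted = abs(hit_limit)
    hits: list[str] = []
    for value in in_range:
        if len(hits) >= wanted:
            break              # enough documents, the rest is never scanned
        # Assumption: all documents sharing a value are taken together,
        # which is why the result may hold more than the requested count.
        hits.extend(postings[value])
    return hits
```

With three documents spread over two values, a limit of 2 returns all three in this model, while the remaining dictionary entries are never visited.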
Only one level of parentheses is supported; any additional parentheses within the expression are ignored. In addition, note that the terms within the parentheses should not be prefixed with + or -.
When the parentheses are prefixed by a + (which can be omitted for the all type, because expressions are + by default), the search requires that at least one of the included terms is present in the document. This effectively gives you a way of having alternative terms expressing the same intent, while requiring that the concept is covered in the document.
When the parentheses are prefixed by a -, the search excludes all documents that include all the terms, but allows documents that only use some of the terms in the expression. It is a bit more difficult to find good use for this syntax; it could for instance be used to remove documents that compare two different products, while still allowing documents only discussing one of them.
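The two group semantics can be expressed as a small predicate. This is a minimal sketch over whitespace-separated, lowercased words; `keep_document` and its parameters are hypothetical names, and real matching involves tokenization and indexing that are ignored here.

```python
def keep_document(doc_text: str, plus_groups=(), minus_groups=()) -> bool:
    """Apply +(...) and -(...) group semantics to a single document."""
    doc_terms = set(doc_text.lower().split())
    # +(a b): at least one term of each required group must be present.
    for group in plus_groups:
        if not any(term in doc_terms for term in group):
            return False
    # -(a b): drop the document only when it contains all terms of the group.
    for group in minus_groups:
        if all(term in doc_terms for term in group):
            return False
    return True
```

A document mentioning only one of two products passes a `-(productA productB)` group in this model, while a comparison of both is removed.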
Create a URL field in the index by creating a field of type uri; refer to the uri type documentation for how to build queries. The indexer will report an ERROR in the log for invalid URLs. Notes:
myurlfield:http://www.mydomain.com/ will match documents where myurlfield is set to any of http://www.mydomain.com/, http://www.mydomain.com/test, and http://redirect.com/?goto=http://www.mydomain.com/.
Hostname matching is anchored at the end by default: mydomain.com will match www.mydomain.com, but not mydomain.com.au.
Adding a ^ (caret) to the start turns on anchoring at the start; with both anchors in place, the query only returns exact matches.
Adding a * at the end turns off anchoring at the end.
The query ^mydomain.com* will match mydomain.com.au, but not www.mydomain.com.
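The hostname anchoring rules can be modeled as a small matcher. The sketch below is an approximation built only from the examples in these notes: it compares dot-separated hostname components and treats the end anchor as on unless the query ends with `*`. The function name `host_matches` is hypothetical.

```python
def host_matches(query: str, host: str) -> bool:
    """Model hostname matching with optional '^' start and '*' end markers."""
    anchor_start = query.startswith("^")   # '^' turns start anchoring on
    anchor_end = not query.endswith("*")   # '*' turns end anchoring off
    wanted = query.lstrip("^").rstrip("*").split(".")
    parts = host.split(".")
    for i in range(len(parts) - len(wanted) + 1):
        if parts[i:i + len(wanted)] != wanted:
            continue
        if anchor_start and i != 0:
            continue                       # must begin at the first component
        if anchor_end and i + len(wanted) != len(parts):
            continue                       # must end at the last component
        return True
    return False
```

In this model, `^mydomain.com` with both anchors active matches only the exact hostname `mydomain.com`.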
Streaming search supports the field path syntax of the document selection language when searching structs and maps. Special for the map type is the ability to select a subset of map entries to search using the mymap{"foo"} syntax. See the field path documentation for use-cases of the map data type.
In the result output, a map is represented in the same way as in the Document XML:
    <field name="mymap">
      <item><key>foo</key><value>bar</value></item>
      <item><key>fuz</key><value>baz</value></item>
    </field>
It will sometimes be more robust to remove characters which are used in the query syntax from a user's search terms. An example could be URLs containing parentheses. Comma ("," ASCII 0x2C) may be used as a safe replacement character in these cases.
(x url:http://site.com/a)b) y
The URL http://site.com/a)b in this example could be quoted as follows:
(x url:http://site.com/a,b) y
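A minimal sketch of this replacement, assuming it runs in application code before the query string is assembled. The function name `sanitize_url_term` is hypothetical, and the set of characters to replace is an assumption limited to the parentheses used in the example; an application may need a larger set.

```python
SYNTAX_CHARS = "()"  # assumption: only parentheses, as in the example above


def sanitize_url_term(url: str, replacement: str = ",") -> str:
    """Replace query-syntax characters in a URL term with a safe character."""
    table = {ord(ch): replacement for ch in SYNTAX_CHARS}
    return url.translate(table)


def grouped_query(url: str) -> str:
    """Build the grouped example query with a sanitized URL term."""
    return f"(x url:{sanitize_url_term(url)}) y"
```

Calling `grouped_query("http://site.com/a)b")` yields the quoted form of the example query.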
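The simple query language syntax accepts any input string and makes the most of it. A basic query consists of words separated by spaces (encoded as %20). In addition:
A phrase is searched by enclosing it in double quotes: "match exactly this"
A term can be prefixed with + to require it, or - to exclude it; the + matters when the query type is any
Terms can be grouped in parentheses: -(do not match if all of these words match)
A term can be prefixed with a field name and a colon, indexname:word, to match in that index. If the index name is omitted, the index named default is searched.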
Any noise (characters not in indexes or attributes, and with no query language meaning) is ignored, so all query strings are valid. The exception is queries which have no meaningful interpretation. An example is -a, which one would expect to return all documents not containing a. Vespa, however, will return the error message Null query.
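Since spaces and syntax characters must be encoded when a query is sent over HTTP, the sketch below shows one way to build a request URL with Python's standard library. The endpoint `http://localhost:8080/search/` and the helper name `search_url` are assumptions for illustration; `query` and `type` are used here as the request parameters carrying the query string and the query type.

```python
from urllib.parse import quote, urlencode


def search_url(query: str, query_type: str = "all",
               endpoint: str = "http://localhost:8080/search/") -> str:
    """Build a search URL with the simple query string percent-encoded."""
    params = {"query": query, "type": query_type}
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return endpoint + "?" + urlencode(params, quote_via=quote)
```

Passing `word title:microsoft -title:bug` produces a URL in which every space appears as `%20` and the colons are percent-encoded as well.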
All the following examples are of type all.
Get all documents with the word word, having microsoft but not bug in the title:
word title:microsoft -title:bug
Search for all documents having the phrase "to be or not to be", but excluding those having shakespeare in the title:
"to be or not to be" -title:shakespeare
Get all documents with the word Christmas in the title that were last modified Christmas Day 2009:
title:Christmas date:20091225
Get documents on US Foreign politics, excluding those matching both rival presidential candidates:
"us foreign politics" -(clinton trump)
Get documents on US Foreign politics, including only those matching at least one of the rival presidential candidates:
"us foreign politics" (clinton trump)