This is the reference for the semantic rule language in Vespa. For a guide on using this language, see query rewriting. Refer to the Query API for how to use in queries.
Semantic rules are collected in files called rule bases. The name of these files are [rule-base-name].sr. They must be placed in [application-package]/rules/ to be deployed.
A rule base may contain any number of the following four constructs, explained in the rest of this document:
Production rules and named conditions are statements.
Statements may span multiple lines and are terminated by ;
.
A directive is a "meta-level" statement which is not used during rule evaluation,
but tells the rule engine how to use the rule base.
A statement starts by @
and ends by newline.
They may take parameters. These directives exist:
Statement | Usage | Location |
---|---|---|
@default | Make this rule base the default, to be used with all queries | Anywhere outside other statements |
@automata(<automata-filename>) | Use an automata file with this base | Anywhere outside other statements |
@include(<rulebase-name>) | Include all the statements of another rule base in this | Anywhere outside other statements |
@super | Include the conditions of the same-named conditions from the included rule base | In a condition |
@stemming(<true|false>) | Whether terms should match after stemming or exactly (true by default) | Before any rule |
@language(<language-code>) | The language of the rule base, which should also be the query language. Influences stemming. | Before any rule |
A production rule is of the form:
<condition> <operator> <production-list>;
This performs the production as defined by the operator if the condition matches. There are two kinds of production rules (and two operators), replacing and adding:
Rule kind | Operator | Meaning |
---|---|---|
Replacing | -> | Replace the matched terms by the production |
Adding | +> | Add the production to the matched terms |
A namespace is a collection of facts which can be read from conditions and changed by productions. Namespaces may be positional (sequences), or not. A positional namespace will track the current fact and match and insert at the current position, while non-positional namespaces will match any fact against any condition.
There is a default namespace which does not need an explicit reference. For query rules, the default namespace is the query terms.
To determine the namespace used to read from conditions or change in productions, use:
<namespace>.<condition> <namespace>.<production>
There are two namespaces defined during query processing:
Namespace | Syntax | Positional | Description |
---|---|---|---|
Query | Yes | The default namespace. References the terms of the query. The condition value returned will be the term itself. | |
Parameter | parameter. |
No | References the parameter of the query. Conditions will be true if the parameter is set in the query. The value returned from conditions is the value of the parameter. Productions will need both a key and value specified to set a parameter value. |
A named condition is on the form:
[condition-name] :- <condition>;
This simply assigns a name to the condition on the right, so it can be referred to the conditions in rules and other named conditions.
A condition is an expression which evaluates to true or false over the facts of a namespace. If the namespace is positional (a sequence), evaluation starts at the current position in the namespace. When evaluated true, conditions will also return a value which can be referenced by comparison conditions.
Conditions may be preceded by a reference name and a label:
(<reference-name>/)?(<label>:)?<condition>
The reference name allows an explicit name to be set, from which the terms matched by the condition can be referred in a condition. This is useful when multiple conditions of the same type are used in the condition of the same rule.
If no reference name is given, the text standing between the square brackets of the condition is used as reference name.
If a label is specified,
the condition will only match terms having that label (the label is the index in query terms).
If a label is not set, the term will match if a label is not set, or if it is default
.
These are the supported kinds of conditions:
Condition | Syntax | Meaning | Returned value |
---|---|---|---|
Term | <term> | True if this is the term at the current position | Determined by the namespace |
Reference (produce the matched term(s)) | [<condition-name>] | Evaluate a named condition | The matched term(s) of the condition |
Reference (produce all terms in the condition) | [<condition-name>*] | Evaluate a named condition | All the terms in the condition |
Sequence | <condition> <condition> | Match both conditions by consecutive terms in the right order in the sequence | The last nested condition value |
Choice | <condition>, <condition> | Match any one of the conditions, each one tried at the current position | The last nested condition value |
Group | (<condition>) | Evaluate the condition inside the grouping as a unit | The last nested condition value |
Ellipsis | … | Matches any sequence to make the overall condition match | The matched sequence |
Referable ellipsis | […] | An ellipsis where the matched sequence can be referenced from the production | The matched sequence |
Not | !<condition> | Matches if the condition does not match | Nothing |
And | <condition> & <condition> | Matches if all the conditions matches at the (same) current position | The last nested condition value |
Comparison | <condition> <operator> <condition> | True if the comparison is true for the values returned from the conditions | The last nested condition value |
Literal | '<literal>' | Returns a value for comparison. This always evaluates to true. | The literal value |
Start anchor | . <condition> | Matches condition only if it matches the query from the start | The matched sequence |
End anchor | <condition> . | Matches condition only if it matches the query to the end | The matched sequence |
The possible operators of a comparison condition are:
Operator | Meaning |
---|---|
= | Left and right values are equal |
<= | Left value is smaller or equal |
>= | Left value is larger or equal |
< | Left value is smaller |
> | Left value is larger |
=~ | Left value contains right value as a substring |
A production list consists of a space-separated list of productions which are carried out when the production of a rule is matched. A production can be preceded by the type of term to produce, a label (index in queries), and followed by the weight (importance) of the produced value:
(<term-type>)?(<label>:)?<production>(!<weight>)?
The default term type is the term type of the context which the term is added to. The possible explicit term types are:
Syntax | Meaning |
---|---|
? | Insert as an OR term |
= | Insert as an EQUIV term |
+ | Insert as an AND term |
$ | Insert as a RANK term |
- | Insert as a NOT term |
If included, the label decides the label the produced term(s) will have in the namespace. This is the index in the query namespace.
There are three types of productions:
Production | Syntax | Meaning |
---|---|---|
Literal term | <term> | Produce this term literally |
Literal term with value | <term>='<value>' | Produce this term and value literally. |
Reference | [<condition-reference>] | Produce the terms matched by the referenced condition. The
reference name is either the name of a named condition used in
the condition, an
ellipsis - … - or an explicit
condition reference name.
|
The weight is a percentage integer denoting the importance of the produced term. The default is 100. In the query namespace the weight becomes the term weight, determining the relevance contribution of the term.