Vespa Query Language - YQL

Vespa accepts unstructured human input and structured queries for application logic separately, then combines them into a single data structure for execution. Human input is parsed heuristically, while application queries are formulated in YQL.

A simple query URL looks like:

http://myhost.mydomain.com:8080/search/?yql=select%20%2A%20from%20sources%20%2A%20where%20text%20contains%20%22blues%22%3B
In other words, once URL-decoded, the yql parameter contains:
select * from sources * where text contains "blues";
This matches all documents where the field named text contains the word blues.
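The encoded yql value in the URL above can be reproduced by percent-encoding every byte outside the RFC 3986 unreserved set (letters, digits, -, ., _, ~). The sketch below is one way to do that in plain Java; the class name YqlUrl is illustrative and not part of Vespa. Other encoders, such as java.net.URLEncoder, produce a different string (for example, spaces become +).

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Sketch: percent-encode a YQL statement for use in a search URL.
// YqlUrl is an illustrative name, not a Vespa class.
class YqlUrl {

    // Encode every UTF-8 byte outside the RFC 3986 unreserved set (A-Z a-z 0-9 - . _ ~).
    static String encode(String yql) {
        StringBuilder out = new StringBuilder();
        for (byte b : yql.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9')
                    || c == '-' || c == '.' || c == '_' || c == '~';
            if (unreserved) {
                out.append((char) c);
            } else {
                out.append(String.format("%%%02X", c)); // e.g. '*' -> %2A
            }
        }
        return out.toString();
    }

    // Reverse the encoding to inspect what a yql parameter contains.
    static String decode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String yql = "select * from sources * where text contains \"blues\";";
        System.out.println("http://myhost.mydomain.com:8080/search/?yql=" + encode(yql));
    }
}
```

Running main prints the same URL as the example above.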

This document gives examples of, and guidance on, using the search operators described in the query language reference.

Example queries:

Ordering
$ curl -H "Content-Type: application/json" \
    --data '{"yql" : "select * from sources * where default contains \"bad\" order by year desc;"}' \
    http://localhost:8080/search/
Grouping
$ curl -H "Content-Type: application/json" \
    --data '{"yql" : "select * from sources * where default contains \"bad\" | all(group(year) each(output(sum(duration))));"}' \
    http://localhost:8080/search/
Pagination
$ curl -H "Content-Type: application/json" \
    --data '{"yql" : "select * from sources * where default contains \"bad\" limit 2 offset 1;"}' \
    http://localhost:8080/search/
Numeric
$ curl -H "Content-Type: application/json" \
    --data '{"yql" : "select * from sources * where year > 2000;"}' \
    http://localhost:8080/search/
Boolean
$ curl -H "Content-Type: application/json" \
    --data '{"yql" : "select * from sources * where alive = true;"}' \
    http://localhost:8080/search/
Phrase
$ curl -H "Content-Type: application/json" \
    --data '{"yql" : "select * from sources * where artist contains phrase(\"michael\", \"jackson\");"}' \
    http://localhost:8080/search/
Timeout
$ curl -H "Content-Type: application/json" \
    --data '{"yql" : "select * from sources * where default contains \"bad\" timeout 100;"}' \
    http://localhost:8080/search/
Regexp
$ curl -H "Content-Type: application/json" \
    --data '{"yql" : "select * from sources * where title matches \"mado[n]+a\";"}' \
    http://localhost:8080/search/
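The curl examples above all POST a JSON body of the form {"yql" : "..."}. A minimal sketch of the same request from Java, using the JDK's java.net.http client (Java 11 or later), assuming a Vespa container listening on localhost:8080 as in the examples. YqlPost is an illustrative name, and the hand-rolled escaping only handles quotes and backslashes; a JSON library is the safer choice in real code.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: send a YQL query as a JSON POST body, as the curl examples do.
// Assumes a Vespa container on localhost:8080; YqlPost is illustrative.
class YqlPost {

    // Wrap a YQL statement in {"yql" : "..."}, escaping backslashes and quotes.
    // Control characters such as newlines are NOT escaped here.
    static String jsonBody(String yql) {
        String escaped = yql.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"yql\" : \"" + escaped + "\"}";
    }

    // Build the POST request; nothing is sent until a client executes it.
    static HttpRequest request(String endpoint, String yql) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody(yql)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        String yql = "select * from sources * where default contains \"bad\" limit 2 offset 1;";
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request("http://localhost:8080/search/", yql),
                      HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body()); // JSON result from the container
    }
}
```

Swapping the yql string in main reproduces any of the other examples, such as ordering, grouping, or the regexp query.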

equiv

EQUIV is a query operator for adding synonyms of a word when all the synonyms should be treated as equivalent. A typical use case:

  • The user's query is something like (used AND automobile)
  • We know, probably from a dictionary, that automobile is a synonym for car
  • We rewrite the query to (used AND (automobile EQUIV car))
  • We do not care whether a document contains automobile or car, since they are equivalent. We want the query to behave as if all occurrences of car in the document corpus had been replaced by automobile and we were running the original query (used AND automobile)
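The "as if all occurrences of car had been replaced by automobile" description can be made concrete with a toy model: map every synonym to one canonical word, then evaluate the original query. This is only an illustration of which documents match, not Vespa's implementation, and it ignores ranking entirely; EquivModel and its synonym table are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy model of the EQUIV matching semantics, not Vespa's implementation:
// rewriting every synonym to a canonical word and then running the original
// query (used AND automobile) matches the same documents as
// (used AND (automobile EQUIV car)). Ranking is not modelled.
class EquivModel {

    // Hypothetical synonym table: each synonym maps to its canonical word.
    static final Map<String, String> CANONICAL = Map.of("car", "automobile");

    // Tokenize a document and replace each synonym with its canonical word.
    static List<String> normalize(String text) {
        return Arrays.stream(text.toLowerCase().split("\\s+"))
                .map(t -> CANONICAL.getOrDefault(t, t))
                .collect(Collectors.toList());
    }

    // The original query (used AND automobile), evaluated on normalized tokens.
    static boolean matches(String document) {
        List<String> tokens = normalize(document);
        return tokens.contains("used") && tokens.contains("automobile");
    }

    public static void main(String[] args) {
        System.out.println(matches("used car for sale"));        // true
        System.out.println(matches("used automobile for sale")); // true
        System.out.println(matches("new car for sale"));         // false
    }
}
```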

Difference from OR operator

See the reference for the differences between OR and EQUIV. In many cases OR is the more correct choice. For example, when looking for an entity like Sean Diddy Combs, this might look appropriate:

"Diddy" EQUIV "Sean Combs" EQUIV "Sean John" EQUIV "Puff Daddy" EQUIV "P. Diddy"
But other people, even other pop artists, also go by Diddy, so matching that name alone is not a sure hit for the entity we are looking for; finding more than one of the alternatives in the same text gives better confidence. This is exactly what OR rewards, so something like:
"Diddy"!20 OR "Sean Combs"!75 OR "Sean John"!75 OR "Puff Daddy"!80 OR "P. Diddy"!60 OR "Sean John Combs"!100
might be better, with lower weights on the alternatives that give less confidence. If the many words and phrases inside the OR overwhelm the other words in the query, even lower weights may help, for example scaling them so that the weights sum to 100, the default weight for a single alternative.
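The arithmetic behind scaling to a sum of 100: the six weights in the OR example (20, 75, 75, 80, 60, 100) sum to 410, so multiplying each by 100/410 and rounding brings the total to 100. The sketch below does that proportional scaling and renders the result in the same "term"!weight notation; WeightScaler is an illustrative name, and proportional scaling is just one reasonable way to lower the weights.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: scale OR alternative weights proportionally so they sum to about
// `target`, then render them in the "term"!weight notation. Illustrative only.
class WeightScaler {

    static Map<String, Integer> scaleTo(Map<String, Integer> weights, int target) {
        int total = weights.values().stream().mapToInt(Integer::intValue).sum();
        Map<String, Integer> scaled = new LinkedHashMap<>(); // keeps input order
        // Rounding each weight independently can leave the sum off by a little.
        weights.forEach((term, w) ->
                scaled.put(term, (int) Math.round(w * (double) target / total)));
        return scaled;
    }

    static String render(Map<String, Integer> weights) {
        StringJoiner joined = new StringJoiner(" OR ");
        weights.forEach((term, w) -> joined.add("\"" + term + "\"!" + w));
        return joined.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("Diddy", 20);
        weights.put("Sean Combs", 75);
        weights.put("Sean John", 75);
        weights.put("Puff Daddy", 80);
        weights.put("P. Diddy", 60);
        weights.put("Sean John Combs", 100);
        System.out.println(render(scaleTo(weights, 100)));
    }
}
```

For these weights, main prints "Diddy"!5 OR "Sean Combs"!18 OR "Sean John"!18 OR "Puff Daddy"!20 OR "P. Diddy"!15 OR "Sean John Combs"!24, which sums to exactly 100; other inputs may round to a total that is off by one or two.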

How to use

The decision to use EQUIV must come from application-specific dictionary or linguistics processing. EQUIV can be written directly in YQL, or inserted by a container plugin that manipulates the query object as follows:

  • Find a word item in the query
  • Check that an EQUIV can be used in that place (see limitations)
  • Find the synonyms in the dictionary
  • Make word items for the synonyms
  • Make an EquivItem with the synonyms (and the original word) as children
  • Replace the original WordItem with the new EquivItem
The javadoc gives ideas on how the code might look. A typical insertion of an EquivItem looks like this:

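import com.yahoo.prelude.query.CompositeItem;
import com.yahoo.prelude.query.EquivItem;
import com.yahoo.prelude.query.Item;
import com.yahoo.prelude.query.PhraseItem;
import com.yahoo.prelude.query.PhraseSegmentItem;
import com.yahoo.prelude.query.TermItem;

// dict and DictEntry stand for an application-specific synonym dictionary;
// they are not part of Vespa. entry.synonyms is assumed to be a collection of strings.
private Item equivize(Item item) {
    if (item instanceof TermItem) {
        String word = ((TermItem) item).stringValue();

        // look up the word in the dictionary:
        DictEntry entry = dict.get(word);

        // if synonyms are found, replace the word with an EQUIV of the word and its synonyms:
        if (entry != null) {
            return new EquivItem(item, entry.synonyms);
        }
    } else if (item instanceof PhraseItem ||
               item instanceof PhraseSegmentItem) {
        // EQUIV cannot be placed inside a PHRASE, so leave phrases untouched
        return item;
    } else if (item instanceof CompositeItem) {
        // recurse into AND, OR, etc., replacing each child in place
        CompositeItem cmp = (CompositeItem) item;
        for (int i = 0; i < cmp.getItemCount(); ++i) {
            cmp.setItem(i, equivize(cmp.getItem(i)));
        }
        return cmp;
    }
    // terms without synonyms and other item types are returned unchanged
    return item;
}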
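To try the traversal in equivize() without a Vespa container, the sketch below applies the same logic to a hypothetical miniature query tree. Word, Composite, and the DICT map only stand in for Vespa's TermItem, CompositeItem, and an application dictionary; they are assumptions made so the example runs on a plain JDK, not Vespa classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of the equivize() traversal on a hypothetical miniature query tree.
// Node, Word, and Composite are stand-ins, not Vespa's Item classes.
class MiniEquivize {

    interface Node {}

    static class Word implements Node {
        final String text;
        Word(String text) { this.text = text; }
        @Override public String toString() { return text; }
    }

    static class Composite implements Node {
        final String op;              // e.g. "AND", "OR", "PHRASE", "EQUIV"
        final List<Node> children;
        Composite(String op, List<Node> children) {
            this.op = op;
            this.children = new ArrayList<>(children); // mutable copy
        }
        @Override public String toString() { return op + children; }
    }

    // Hypothetical dictionary: word -> synonyms.
    static final Map<String, List<String>> DICT = Map.of("automobile", List.of("car"));

    static Node equivize(Node node) {
        if (node instanceof Word) {
            List<String> synonyms = DICT.get(((Word) node).text);
            if (synonyms != null) {
                // the original word plus one Word per synonym, under an EQUIV
                List<Node> kids = new ArrayList<>();
                kids.add(node);
                for (String s : synonyms) {
                    kids.add(new Word(s));
                }
                return new Composite("EQUIV", kids);
            }
        } else if (node instanceof Composite) {
            Composite c = (Composite) node;
            if (c.op.equals("PHRASE")) {
                return c; // EQUIV cannot go inside a phrase
            }
            for (int i = 0; i < c.children.size(); i++) {
                c.children.set(i, equivize(c.children.get(i)));
            }
            return c;
        }
        return node;
    }

    public static void main(String[] args) {
        Node query = new Composite("AND", List.of(new Word("used"), new Word("automobile")));
        System.out.println(equivize(query)); // AND[used, EQUIV[automobile, car]]
    }
}
```

The same input wrapped in a PHRASE comes back unchanged, mirroring the phrase check in the Vespa version.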