The EQUIV Operator

EQUIV is a query operator for adding synonyms to a word when all the alternatives should be treated as equivalent.

Use case

The typical use case is something like this:

  • The user's query is something like (used AND automobile)
  • We know, probably from a dictionary, that automobile is a synonym for car
  • We rewrite the query to (used AND (automobile EQUIV car))
  • We do not care whether a document contains automobile or car; they are equivalent.

The last point deserves some elaboration: in this case, we want the query to behave as if all occurrences of car in the document corpus had been replaced by automobile and we were running the original query (used AND automobile).
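
The substitution view can be illustrated with a small self-contained sketch. It is a toy model over lists of words, not Vespa's matching code, and the class and method names are made up for illustration. Both methods should agree for any document.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class EquivSemanticsSketch {

    // Toy matcher for (used AND (automobile EQUIV car)): the EQUIV part
    // matches when the document contains any of the equivalent words.
    static boolean matchesRewritten(List<String> docWords) {
        Set<String> equivalents = Set.of("automobile", "car");
        return docWords.contains("used")
                && docWords.stream().anyMatch(equivalents::contains);
    }

    // Toy matcher for the original query (used AND automobile), run against
    // a copy of the document where every "car" is replaced by "automobile".
    static boolean matchesOriginalAfterSubstitution(List<String> docWords) {
        List<String> substituted = docWords.stream()
                .map(word -> word.equals("car") ? "automobile" : word)
                .collect(Collectors.toList());
        return substituted.contains("used") && substituted.contains("automobile");
    }

    public static void main(String[] args) {
        List<String> doc = List.of("used", "car", "for", "sale");
        System.out.println(matchesRewritten(doc));                  // prints true
        System.out.println(matchesOriginalAfterSubstitution(doc));  // prints true
    }
}
```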

Difference from OR operator

In many circumstances, the OR operator will give the same results as an EQUIV. The matching logic is exactly the same, and an OR does not have the limitations that EQUIV does. The difference is in how matches are visible to ranking functions. All words that are children of an OR count for ranking. An EQUIV, however, looks like a single word:

  • Counts as only +1 for queryTermCount
  • Counts as 1 word for completeness measures
  • Proximity will not discriminate between different words inside the EQUIV
  • Connectivity can be set between the entire EQUIV and the word before and after
  • Items inside the EQUIV are not directly visible to ranking features, so weight and connectivity on those will have no effect.
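
The first point in the list can be made concrete with a toy model. The sketch below is not how Vespa computes queryTermCount; it only assumes, as stated above, that every child of an OR counts and that an EQUIV counts once. The Node class is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class QueryTermCountSketch {

    // Toy query tree, not Vespa's query item classes.
    static class Node {
        final String operator;      // "WORD", "AND", "OR" or "EQUIV"
        final List<Node> children;  // empty for a word

        Node(String operator, Node... children) {
            this.operator = operator;
            this.children = Arrays.asList(children);
        }
    }

    static Node word() {
        return new Node("WORD");
    }

    // Counts the terms a ranking function would see in this toy model.
    static int queryTermCount(Node node) {
        if (node.operator.equals("WORD") || node.operator.equals("EQUIV")) {
            return 1;  // an EQUIV looks like a single word, whatever it contains
        }
        int count = 0;
        for (Node child : node.children) {
            count += queryTermCount(child);  // every child of AND/OR counts
        }
        return count;
    }

    public static void main(String[] args) {
        Node withOr = new Node("AND", word(), new Node("OR", word(), word()));
        Node withEquiv = new Node("AND", word(), new Node("EQUIV", word(), word()));
        System.out.println(queryTermCount(withOr));     // prints 3
        System.out.println(queryTermCount(withEquiv));  // prints 2
    }
}
```

Here (used AND (automobile OR car)) counts three terms, while (used AND (automobile EQUIV car)) counts two, the same as the original query.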

In many cases it may still be more appropriate to use OR instead of EQUIV. When looking for an entity like Sean "Diddy" Combs, this might look appropriate:

"Diddy" EQUIV "Sean Combs" EQUIV "Sean John" EQUIV "Puff Daddy" EQUIV "P. Diddy"

But Diddy is used by other people, even other pop artists, so matching that alone is not a sure hit for the entity we are looking for, and finding more than one of the synonyms in the same text would give better confidence. This is exactly what OR does, so something like:

"Diddy"!20 OR "Sean Combs"!75 OR "Sean John"!75 OR "Puff Daddy"!80 OR "P. Diddy"!60 OR "Sean John Combs"!100

might be better, with lower weights on the alternatives that give less confidence. If the many words and phrases inside the OR seem to overwhelm the other words in the query, giving them even lower weights may be useful, for example so that the weights sum to 100, the default weight of a single alternative.
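
As a worked example of that last suggestion, the sketch below scales the weights from the OR query above proportionally so that they sum to roughly 100. Proportional scaling with rounding is only one possible choice and is an assumption here, as are the class and method names. Rounding can leave the total slightly off 100 for other inputs, and the sketch assumes a non-empty set of positive weights.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class OrWeightScalingSketch {

    // Scales each weight by targetTotal / (sum of all weights), rounded to an int.
    static Map<String, Integer> scaleToTotal(Map<String, Integer> weights, int targetTotal) {
        int sum = 0;
        for (int weight : weights.values()) {
            sum += weight;
        }
        Map<String, Integer> scaled = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            long rounded = Math.round(entry.getValue() * (double) targetTotal / sum);
            scaled.put(entry.getKey(), (int) rounded);
        }
        return scaled;
    }

    // Renders the alternatives in the "term"!weight OR ... notation used above.
    static String toOrQuery(Map<String, Integer> weights) {
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            if (query.length() > 0) {
                query.append(" OR ");
            }
            query.append('"').append(entry.getKey()).append("\"!").append(entry.getValue());
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("Diddy", 20);
        weights.put("Sean Combs", 75);
        weights.put("Sean John", 75);
        weights.put("Puff Daddy", 80);
        weights.put("P. Diddy", 60);
        weights.put("Sean John Combs", 100);
        // The original weights sum to 410, so each is multiplied by 100/410.
        System.out.println(toOrQuery(scaleToTotal(weights, 100)));
    }
}
```

For these six alternatives the scaled weights come out as 5, 18, 18, 20, 15 and 24, which happen to sum to exactly 100.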

How to use EQUIV

Whether to use EQUIV is an application-specific decision, typically driven by a dictionary or by linguistic processing. The rewrite can be done from YQL or from a container plugin where the query object can be manipulated as follows:

  • Find a word item in the query
  • Check that an EQUIV can be used in that place (see limitations below)
  • Look up in your dictionary the synonyms you want to insert
  • Make word items for your synonyms
  • Make an EquivItem with the synonyms (and the original word, of course) as children
  • Replace the original WordItem with the new EquivItem

The javadoc gives ideas on how the code might look. A typical insertion of an EquivItem looks like this:

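import com.yahoo.prelude.query.CompositeItem;
import com.yahoo.prelude.query.EquivItem;
import com.yahoo.prelude.query.Item;
import com.yahoo.prelude.query.PhraseItem;
import com.yahoo.prelude.query.PhraseSegmentItem;
import com.yahoo.prelude.query.TermItem;

// dict and DictEntry stand for an application-specific synonym dictionary;
// entry.synonyms is assumed to be a Collection<String>.
private Item equivize(Item item) {
    if (item instanceof TermItem) {
        TermItem term = (TermItem) item;
        String word = term.stringValue();

        // look up the word in the dictionary:
        DictEntry entry = dict.get(word);

        // if synonyms were found, build an EQUIV from the word and its
        // synonyms and return it in place of the word:
        if (entry != null) {
            return new EquivItem(term, entry.synonyms);
        }
    } else if (item instanceof PhraseItem ||
               item instanceof PhraseSegmentItem) {
        // cannot put EQUIV inside PHRASE, so leave phrases untouched
        return item;
    } else if (item instanceof CompositeItem) {
        // recurse into the children of AND, OR and other composites
        CompositeItem cmp = (CompositeItem) item;
        for (int i = 0; i < cmp.getItemCount(); ++i) {
            cmp.setItem(i, equivize(cmp.getItem(i)));
        }
        return cmp;
    }
    // no synonyms found, or an item type that is not rewritten
    return item;
}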
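
For the YQL route mentioned earlier, the rewrite can also happen before the query reaches the container, by generating the YQL string. The helper below is a minimal sketch that assumes YQL's equiv() function is used inside a contains clause and that the field is called default. The class and method names are hypothetical, and the exact syntax should be checked against the YQL reference.

```java
import java.util.List;
import java.util.stream.Collectors;

final class EquivYqlSketch {

    // Wraps a word in double quotes, escaping backslashes and quotes.
    private static String quote(String word) {
        return "\"" + word.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds: <field> contains equiv("a", "b", ...)
    static String equivClause(String field, List<String> alternatives) {
        if (alternatives.size() < 2) {
            // an EQUIV with fewer than two alternatives serves no purpose
            throw new IllegalArgumentException("need at least two alternatives");
        }
        String args = alternatives.stream()
                .map(EquivYqlSketch::quote)
                .collect(Collectors.joining(", "));
        return field + " contains equiv(" + args + ")";
    }

    public static void main(String[] args) {
        String yql = "select * from sources * where default contains \"used\" and "
                + equivClause("default", List.of("automobile", "car"));
        System.out.println(yql);
    }
}
```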

Limitations

There are several limitations on how EQUIV can be used in a query:

  • EQUIV may not appear inside a phrase.
  • It may only contain TermItem and PhraseItem instances. You cannot place operators like AND inside EQUIV.
  • PhraseItems inside EQUIV will rank as if they had size 1.
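
The first two limitations are structural, so a rewriter can check them before sending a query. The sketch below does that on a toy tree. The Kind enum and Node class are hypothetical stand-ins, not Vespa's item classes.

```java
import java.util.Arrays;
import java.util.List;

final class EquivLimitationsSketch {

    enum Kind { TERM, PHRASE, AND, OR, EQUIV }

    // Toy query tree node; terms have no children.
    static class Node {
        final Kind kind;
        final List<Node> children;

        Node(Kind kind, Node... children) {
            this.kind = kind;
            this.children = Arrays.asList(children);
        }
    }

    // Returns true if no EQUIV in the tree breaks the structural limitations.
    static boolean respectsLimitations(Node root) {
        return check(root, false);
    }

    private static boolean check(Node node, boolean insidePhrase) {
        if (node.kind == Kind.EQUIV) {
            if (insidePhrase) {
                return false;  // EQUIV may not appear inside a phrase
            }
            for (Node child : node.children) {
                if (child.kind != Kind.TERM && child.kind != Kind.PHRASE) {
                    return false;  // only terms and phrases are allowed as children
                }
            }
        }
        boolean childInsidePhrase = insidePhrase || node.kind == Kind.PHRASE;
        for (Node child : node.children) {
            if (!check(child, childInsidePhrase)) {
                return false;
            }
        }
        return true;
    }
}
```

With this model, AND(term, EQUIV(term, PHRASE(term, term))) passes, while EQUIV(term, AND(term, term)) and PHRASE(term, EQUIV(term, term)) are rejected.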