Managing Query Phrasing with PhrasingSearcher

This document describes what query phrasing is, and shows how to enable and use it in Vespa.

Users often search for phrases such as New York, Rolling Stones, The Who, or daily horoscopes. For the last of these, the query string will usually look like this:

/search/?query=daily horoscopes&…

This searches for documents that match both "daily" and "horoscopes", but not necessarily documents that contain the exact phrase "daily horoscopes". This is where PhrasingSearcher comes in. PhrasingSearcher is a Searcher that compares queries with a list of common phrases, and replaces two single search terms with a phrase. So the above query becomes:

/search/?query="daily horoscopes"&…

if "daily horoscopes" is known to be a common phrase.

The PhrasingSearcher must be configured with a list of common phrases. This list has to be compiled into an FSA (finite state automaton) file, which is the file format that PhrasingSearcher uses. The compiled list must be available on all container nodes, and its location must be referenced in the container configuration.

Compiling the List of Phrases

The list of phrases must be:

  • all lowercase
  • sorted alphabetically

To accomplish this, use:

$ perl -ne 'print lc' listofphrasestextfile.unsorted.mixedcase > listofphrasestextfile.unsorted
$ sort listofphrasestextfile.unsorted > listofphrasestextfile

Note that the Perl command above does not reliably lowercase non-ASCII characters. If the list of phrases is UTF-8 encoded or contains non-ASCII characters, double-check that the resulting file is correct.

Use the vespa-makefsa program to compile the list of phrases into an FSA file:

$ vespa-makefsa listofphrasestextfile phrasefsa
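
Because the list must be lowercase and sorted, it can help to verify both properties before compiling. The following is a minimal sketch, not part of Vespa: check_phrase_list is a hypothetical helper, it only detects ASCII uppercase letters, and it assumes the file was sorted by byte value (LC_ALL=C). If the list was sorted under a different locale, run the check with that locale instead.

```shell
# check_phrase_list is a hypothetical helper name, not part of Vespa.
check_phrase_list() {
  # $1: phrase list file
  # Reject the file if any line contains an ASCII uppercase letter.
  if LC_ALL=C grep -q '[A-Z]' "$1"; then
    echo "error: $1 contains uppercase characters" >&2
    return 1
  fi
  # sort -c exits non-zero if the file is not already sorted.
  # Byte order (LC_ALL=C) is an assumption; use the same locale
  # that produced the sorted file.
  if ! LC_ALL=C sort -c "$1" 2>/dev/null; then
    echo "error: $1 is not sorted" >&2
    return 1
  fi
}

# Possible usage:
#   check_phrase_list listofphrasestextfile && vespa-makefsa listofphrasestextfile phrasefsa
```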

Configuration

The compiled list of phrases must be available on all container nodes. Once it is present at the same location in the file system on all nodes, add a qr-searchers.cfg file to the configs/ directory of the application package with the following line:

com.yahoo.prelude.querytransform.PhrasingSearcher.automatonfile "/path/to/phrase/file"

Replace /path/to/phrase/file with the path to your phrase file. If you already have a qr-searchers.cfg file in configs/, just add the line. Finally, deploy the application.
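
The "create the file or just add the line" step can be scripted so that it is safe to run more than once. This is only a sketch: add_phrasing_config is a hypothetical helper, and the application package directory and phrase file path passed to it are placeholders for your own values.

```shell
# add_phrasing_config is a hypothetical helper name, not part of Vespa.
add_phrasing_config() {
  # $1: application package directory
  # $2: path to the FSA file as seen on the container nodes
  cfg="$1/configs/qr-searchers.cfg"
  key='com.yahoo.prelude.querytransform.PhrasingSearcher.automatonfile'
  mkdir -p "$1/configs"
  touch "$cfg"
  # Append the setting only if the key is not configured yet.
  if ! grep -qF "$key" "$cfg"; then
    printf '%s "%s"\n' "$key" "$2" >> "$cfg"
  fi
}

# Possible usage (both arguments are placeholders):
#   add_phrasing_config /path/to/application/package /path/to/phrase/file
```

The sketch leaves an existing automatonfile line untouched, so changing the path of an already configured phrase file still requires editing qr-searchers.cfg by hand.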