Inspecting structured data in a Searcher

The Data Access API is used to access structured data (rank features, summary features, arrays and weighted sets).

Use Case: summary features

A simple searcher example accessing the summary features looks like this:

import com.yahoo.search.*;
import com.yahoo.search.result.*;
import com.yahoo.search.searchchain.*;
import com.yahoo.data.access.*;

@After(PhaseNames.TRANSFORMED_QUERY)
@Before(PhaseNames.BLENDED_RESULT)
public class SimpleTestSearcher extends Searcher {

    public Result search(Query query, Execution execution) {
        Result r = execution.search(query);
        execution.fill(r);
        for (Hit hit : r.hits().asList()) {
            if (hit.isMeta()) continue;
            Object o = hit.getField("summaryfeatures");
            if (o instanceof Inspectable) {
                Inspectable summaryfeatures = (Inspectable) o;
                Inspector obj = summaryfeatures.inspect();
                if (obj.field("fieldMatch(title)").asDouble(0.0) > 0.85) {
                    hit.setField("goodmatch", "good title");
                }
                if (obj.field("fieldMatch(title)").asDouble(0.0) > 0.95) {
                    hit.setField("goodmatch", "super good title");
                }
                if (obj.field("attribute(quality)").asDouble(0.0) > 0.4) {
                    hit.setField("qualitysource", "good quality");
                }
                if (obj.field("attribute(quality)").asDouble(0.0) > 0.9) {
                    hit.setField("qualitysource", "super good quality");
                }
                hit.removeField("summaryfeatures");
            }
        }
        return r;
    }

}
Here, the searcher processes all hits and inspects the summary features to label each hit by title match and by quality. (A real searcher would probably use other ranking features and much more complicated logic.)

The Object returned when getting the "summaryfeatures" field implements the com.yahoo.data.access.Inspectable interface. The actual data access API that you will use is the com.yahoo.data.access.Inspector, which can be used to access all sorts of structured data.

In the case of "summaryfeatures" the top-level value is always a simple struct (with type() == Type.OBJECT) with each ranking feature as a double-valued field.

Note the use of the asDouble function with a supplied default value. This ensures that if the ranking feature is unavailable for some reason, we get sensible default behavior instead of an exception. If getting the correct data is crucial, you may prefer to get the exception and handle it instead.
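The threshold checks in the example can also be kept in a small helper, which makes them testable without a running container. The following is a minimal sketch; the class MatchLabels and its methods are hypothetical names introduced here, not part of the Vespa API. It returns the same labels as the searcher above and returns null where the searcher sets no field, which is also the outcome when a missing feature falls back to the 0.0 default:

```java
/** Hypothetical helper (not a Vespa class): maps feature values to the labels used above. */
final class MatchLabels {

    private MatchLabels() {}

    /** Label for a fieldMatch(title) score, or null when the score is 0.85 or lower. */
    static String titleLabel(double fieldMatchTitle) {
        // Check the stricter threshold first so only one label is returned.
        if (fieldMatchTitle > 0.95) return "super good title";
        if (fieldMatchTitle > 0.85) return "good title";
        return null;
    }

    /** Label for an attribute(quality) value, or null when the value is 0.4 or lower. */
    static String qualityLabel(double quality) {
        if (quality > 0.9) return "super good quality";
        if (quality > 0.4) return "good quality";
        return null;
    }
}
```

In the searcher, the value read with obj.field("fieldMatch(title)").asDouble(0.0) would be passed to MatchLabels.titleLabel, and hit.setField("goodmatch", label) would be called only when the returned label is not null.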

Use Case: accessing array attributes

The following example accesses a field of array type:

import com.yahoo.search.*;
import com.yahoo.search.result.*;
import com.yahoo.search.searchchain.*;
import com.yahoo.data.access.*;

@After(PhaseNames.TRANSFORMED_QUERY)
@Before(PhaseNames.BLENDED_RESULT)
public class SimpleTestSearcher extends Searcher {

    public Result search(Query query, Execution execution) {
        Result r = execution.search(query);
        execution.fill(r);
        for (Hit hit : r.hits().asList()) {
            if (hit.isMeta()) continue;
            Object o = hit.getField("titles");
            if (o instanceof Inspectable) {
                StringBuilder pasteBuf = new StringBuilder();
                Inspectable field = (Inspectable) o;
                Inspector arr = field.inspect();
                for (int i = 0; i < arr.entryCount(); i++) {
                    pasteBuf.append(arr.entry(i).asString(""));
                    if (i+1 < arr.entryCount()) {
                        pasteBuf.append(", ");
                    }
                }
                hit.setField("titles", pasteBuf.toString());
            }
        }
        return r;
    }
}
Here we assume there is a field in our search definition like this:
    field titles type array<string> {
        indexing: attribute | summary
    }
Again we process each hit, this time traversing the array and building a string which contains all the titles, transforming a field looking like this:
<titles>
  <item>Bond</item>
  <item>James Bond</item>
</titles>
into this output:
<field name="titles">Bond, James Bond</field>
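The separator handling in the loop can be delegated to the standard library once the entries have been collected into a list. The sketch below assumes a hypothetical helper class TitleJoiner (not part of Vespa) that receives the strings read with arr.entry(i).asString(""). As an extra design choice, it drops empty strings, since the "" default would otherwise presumably leave an empty element in the output for any entry that is not a string:

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper (not a Vespa class): joins array entries into one display string. */
final class TitleJoiner {

    private TitleJoiner() {}

    /** Joins the non-empty titles with ", ", keeping their original order. */
    static String join(List<String> titles) {
        return titles.stream()
                .filter(title -> !title.isEmpty()) // skip entries that fell back to the "" default
                .collect(Collectors.joining(", "));
    }
}
```

With this helper, the loop in the searcher only needs to add each entry's string to a list, and the result of TitleJoiner.join is what gets stored with hit.setField("titles", ...).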

Use Case: accessing weighted set attributes

The following example illustrates accessing data held in a weighted set. Note that the Data Access API doesn't have a "set" or "weighted set" concept; the weighted set is represented as an unordered array of objects where each object has an "item" and a "weight" field. The weight is a long integer value, while the item type will vary according to the field type as declared in the search definition.

import com.yahoo.search.*;
import com.yahoo.search.result.*;
import com.yahoo.search.searchchain.*;
import com.yahoo.data.access.*;

@After(PhaseNames.TRANSFORMED_QUERY)
@Before(PhaseNames.BLENDED_RESULT)
public class SimpleTestSearcher extends Searcher {

    public Result search(Query query, Execution execution) {
        Result r = execution.search(query);
        execution.fill(r);
        for (Hit hit : r.hits().asList()) {
            processHit(hit);
        }
        return r;
    }

    void processHit(Hit hit) {
        if (hit.isMeta()) return;
        Object o = hit.getField("titles");
        if (o instanceof Inspectable) {
            StringBuilder pasteBuf = new StringBuilder();
            Inspectable field = (Inspectable) o;
            Inspector arr = field.inspect();
            for (int i = 0; i < arr.entryCount(); i++) {
                String sval = arr.entry(i).field("item").asString("");
                long weight = arr.entry(i).field("weight").asLong(0);
                pasteBuf.append("title: ");
                pasteBuf.append(sval);
                pasteBuf.append("[");
                pasteBuf.append(weight);
                pasteBuf.append("]");
                if (i+1 < arr.entryCount()) {
                    pasteBuf.append(", ");
                }
            }
            hit.setField("alternates", pasteBuf.toString());
        }
    }

}
Here we assume there is a field in our search definition like this:
    field titles type weightedset<string> {
        indexing: attribute | summary
    }
Again we process each hit, and format each element of the weighted set, transforming this input:
<titles>
  <item weight='15'>Bond</item>
  <item weight='89'>James Bond</item>
</titles>
into this output:
<field name="alternates">title: James Bond[89], title: Bond[15]</field>
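Because the weighted set arrives as an unordered array, the order of the formatted elements can differ from the order in the input. If a stable order matters, the elements can be sorted before formatting. The following is a minimal sketch, assuming a hypothetical value class Entry (not part of Vespa) that holds the "item" and "weight" values read with arr.entry(i).field("item").asString("") and arr.entry(i).field("weight").asLong(0). It sorts by descending weight and breaks ties by item:

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper (not a Vespa class): formats weighted set elements in a stable order. */
final class WeightedTitles {

    /** Holds the "item" and "weight" fields of one weighted set element. */
    static final class Entry {
        private final String item;
        private final long weight;

        Entry(String item, long weight) {
            this.item = item;
            this.weight = weight;
        }

        String item() { return item; }
        long weight() { return weight; }
    }

    private WeightedTitles() {}

    /** Formats entries as "title: item[weight]", highest weight first, ties ordered by item. */
    static String format(List<Entry> entries) {
        return entries.stream()
                .sorted(Comparator.comparingLong(Entry::weight).reversed()
                        .thenComparing(Entry::item))
                .map(e -> "title: " + e.item() + "[" + e.weight() + "]")
                .collect(Collectors.joining(", "));
    }
}
```

For the input shown above, this ordering puts "James Bond" (weight 89) before "Bond" (weight 15).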

Unit testing with structured data

For unit testing it is useful to be able to create structured data fields programmatically. This can be done using Slime:
import com.yahoo.slime.*;
import com.yahoo.data.access.slime.SlimeAdapter;
import java.nio.charset.StandardCharsets;

...

// Struct example:
Slime structSlime = new Slime();
Cursor struct = structSlime.setObject();
struct.setString("foo", "bar");
struct.setDouble("number", 1.0);
myHit.setField("mystruct", new SlimeAdapter(struct));

// Array example (uses its own Slime instance and variable name):
Slime arraySlime = new Slime();
Cursor array = arraySlime.setArray();
array.addString("foo");
array.addString("bar");
myHit.setField("myarray", new SlimeAdapter(array));

// Arrays and objects can be arbitrarily nested

// You can also create the slime structure from a JSON string if you like:
Slime jsonSlime = SlimeUtils.jsonToSlime(myJsonString.getBytes(StandardCharsets.UTF_8));
myHit.setField("myfield", new SlimeAdapter(jsonSlime.get()));