This guide demonstrates how to deploy a stateless searcher implementing a last stage of phased ranking. The searcher re-ranks the global top 200 documents which have been ranked by the content nodes using the configurable ranking specification in the document schema(s).
The reranking searcher uses multiphase searching:
matching query protocol phase
fill query protocol phase
If the searcher calls execution.fill before the re-ranking logic to access summary fields, this costs more resources than just using match-features, which are delivered in the first (matching) protocol phase. If one needs access to a subset of fields during stateless re-ranking, consider configuring a dedicated document summary. See also Life of a query in Vespa.
To define the Vespa application package using our custom reranking searcher, we need four files: the schema, the searcher, services.xml, and pom.xml.
We start by defining a simple schema with two fields. We also define a rank profile with the two rank features we want to use in the searcher for re-ranking:
schema doc {
    document doc {
        field name type string {
            indexing: summary | index
            match: text
            index: enable-bm25
        }
        field downloads type int {
            indexing: summary | attribute
        }
    }
    fieldset default {
        fields: name
    }
    rank-profile rank-profile-with-match {
        first-phase {
            expression: bm25(name)
        }
        match-features: bm25(name) attribute(downloads)
    }
}
The searcher implementing the re-ranking logic:
package ai.vespa.example.searcher;

import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.result.FeatureData;
import com.yahoo.search.result.Hit;
import com.yahoo.search.searchchain.Execution;

public class ReRankingSearcher extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        int hits = query.getHits();
        query.setHits(200); // Re-ranking window
        query.getRanking().setProfile("rank-profile-with-match");
        Result result = execution.search(query);
        if (result.getTotalHitCount() == 0
                || result.hits().getErrorHit() != null)
            return result;

        double max = 0;
        // Find the max downloads value in the window
        for (Hit hit : result.hits()) {
            FeatureData featureData = (FeatureData) hit.getField("matchfeatures");
            if (featureData == null)
                throw new RuntimeException("No 'matchfeatures' found - wrong rank profile used?");
            double downloads = featureData.getDouble("attribute(downloads)");
            if (downloads > max)
                max = downloads;
        }

        // Re-rank using the normalized value
        for (Hit hit : result.hits()) {
            FeatureData featureData = (FeatureData) hit.getField("matchfeatures");
            if (featureData == null)
                throw new RuntimeException("No 'matchfeatures' found - wrong rank profile used?");
            double downloads = featureData.getDouble("attribute(downloads)");
            // Guard against division by zero when no hit in the window has downloads
            double normalizedByMax = max > 0 ? downloads / max : 0; // Change me
            double bm25Name = featureData.getDouble("bm25(name)");
            double newScore = bm25Name + normalizedByMax;
            hit.setField("rerank-score", newScore);
            hit.setRelevance(newScore);
        }
        result.hits().sort();
        // Trim the result down to the requested number of hits
        result.hits().trim(0, hits);
        return result;
    }
}
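To see the scoring arithmetic in isolation, here is a minimal, Vespa-free sketch of what the searcher does to its re-ranking window: find the maximum downloads value, add the max-normalized value to the bm25 score, sort by the new score, and trim to the requested number of hits. The Candidate class, the rerank method, and the sample values are hypothetical stand-ins for Vespa's Hit and FeatureData and are not part of the Vespa API. Treating the normalized value as 0 when the maximum is 0 is an assumption made for this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RerankSketch {

    // Hypothetical stand-in for a hit carrying its two match-features.
    static final class Candidate {
        final String id;
        final double bm25;
        final double downloads;
        double score;

        Candidate(String id, double bm25, double downloads) {
            this.id = id;
            this.bm25 = bm25;
            this.downloads = downloads;
        }
    }

    // Re-scores the window as bm25 + downloads / max(downloads), sorts by
    // descending score and keeps at most 'hits' candidates.
    static List<Candidate> rerank(List<Candidate> window, int hits) {
        double max = 0;
        for (Candidate c : window) {
            max = Math.max(max, c.downloads);
        }
        List<Candidate> ranked = new ArrayList<>(window);
        for (Candidate c : ranked) {
            // Assumption: use 0 as the normalized value when no candidate has downloads.
            double normalizedByMax = max > 0 ? c.downloads / max : 0;
            c.score = c.bm25 + normalizedByMax;
        }
        ranked.sort(Comparator.comparingDouble((Candidate c) -> c.score).reversed());
        return new ArrayList<>(ranked.subList(0, Math.min(hits, ranked.size())));
    }

    public static void main(String[] args) {
        List<Candidate> window = new ArrayList<>();
        window.add(new Candidate("a", 0.5, 10));
        window.add(new Candidate("b", 0.25, 100));
        window.add(new Candidate("c", 0.5, 50));
        // Prints "b 1.25" then "c 1.0"; candidate "a" is trimmed away.
        for (Candidate c : rerank(window, 2)) {
            System.out.println(c.id + " " + c.score);
        }
    }
}
```

Candidate "b" has the lowest bm25 score but ends up first because its downloads value defines the maximum of the window, which illustrates how strongly the normalized term can influence the final order.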
We also need a services.xml file to make up a Vespa application package. Here we include our custom searcher in the default Vespa search chain:
<?xml version="1.0" encoding="utf-8" ?>
<services version="1.0" xmlns:deploy="vespa" xmlns:preprocess="properties">
    <container id="default" version="1.0">
        <document-api/>
        <search>
            <chain id="default" inherits="vespa">
                <searcher id="ai.vespa.example.searcher.ReRankingSearcher" bundle="ranking"/>
            </chain>
        </search>
        <nodes>
            <node hostalias="node1" />
        </nodes>
    </container>
    <content id="docs" version="1.0">
        <redundancy>2</redundancy>
        <documents>
            <document type="doc" mode="index" />
        </documents>
        <nodes>
            <node hostalias="node1" distribution-key="0" />
        </nodes>
    </content>
</services>
Notice the bundle name of the searcher. It must be in sync with the artifactId defined in pom.xml.
The pom.xml file is defined as:
<?xml version="1.0"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>ai.vespa.example</groupId>
    <artifactId>ranking</artifactId> <!-- Note: When changing this, also change bundle names in services.xml -->
    <version>1.0.0</version>
    <packaging>container-plugin</packaging>
    <parent>
        <groupId>com.yahoo.vespa</groupId>
        <artifactId>cloud-tenant-base</artifactId>
        <version>[7,999)</version> <!-- Use the latest Vespa release on each build -->
        <relativePath/>
    </parent>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <test.hide>true</test.hide>
    </properties>
</project>
Now, we have the files and can start Vespa:
$ docker pull vespaengine/vespa
$ docker run --detach --name vespa --hostname vespa-container \
  --publish 8080:8080 --publish 19071:19071 \
  vespaengine/vespa
Install the Vespa CLI using Homebrew:
$ brew install vespa-cli
Build the Maven project. This step creates the application package, including the custom searcher:
$ (cd my-app && mvn package)
Now we can deploy the application to Vespa using vespa-cli:
$ vespa deploy --wait 300 my-app
Create a few sample docs:
{ "put": "id:docs:doc::0", "fields": { "name": "A sample document", "downloads": 100 } }
{ "put": "id:docs:doc::1", "fields": { "name": "Another sample document", "downloads": 10 } }
Feed them to Vespa using the CLI:
$ vespa document doc-1.json && vespa document doc-2.json
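As an alternative to the CLI, documents can also be written over HTTP through Vespa's /document/v1 API. The following is a minimal sketch that only builds the request for one of the sample documents, so it runs without a Vespa instance. The FeedSketch class, the putRequest helper, and the ENDPOINT value are hypothetical names chosen for illustration, and the endpoint assumes the container port is published on localhost:8080 as in the docker run command in this guide.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class FeedSketch {

    // Assumption: the container's HTTP port is published on localhost:8080.
    static final String ENDPOINT = "http://localhost:8080";

    // Builds a /document/v1 put request for a document of type "doc" in
    // namespace "docs"; the document id "id:docs:doc::0" maps to ".../docid/0".
    static HttpRequest putRequest(String userId, String fieldsJson) {
        URI uri = URI.create(ENDPOINT + "/document/v1/docs/doc/docid/" + userId);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"fields\": " + fieldsJson + "}"))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = putRequest("0", "{\"name\": \"A sample document\", \"downloads\": 100}");
        // Prints "POST http://localhost:8080/document/v1/docs/doc/docid/0"
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually feed the document, the request could be passed to java.net.http.HttpClient, for example with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). That step is left out here so the sketch does not depend on a running container.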
Run a query. This invokes the reranking searcher, since it was included in the default search chain:
$ vespa query 'yql=select * from doc where userQuery()' \
  'query=sample'
{
    "root": {
        "id": "toplevel",
        "relevance": 1.0,
        "fields": {
            "totalCount": 2
        },
        "coverage": {
            "coverage": 100,
            "documents": 2,
            "full": true,
            "nodes": 1,
            "results": 1,
            "resultsFull": 1
        },
        "children": [
            {
                "id": "id:docs:doc::0",
                "relevance": 1.1823215567939547,
                "source": "docs",
                "fields": {
                    "matchfeatures": {
                        "attribute(downloads)": 100.0,
                        "bm25(name)": 0.1823215567939546
                    },
                    "rerank-score": 1.1823215567939547,
                    "sddocname": "doc",
                    "documentid": "id:docs:doc::0",
                    "name": "A sample document",
                    "downloads": 100
                }
            },
            {
                "id": "id:docs:doc::1",
                "relevance": 0.2823215567939546,
                "source": "docs",
                "fields": {
                    "matchfeatures": {
                        "attribute(downloads)": 10.0,
                        "bm25(name)": 0.1823215567939546
                    },
                    "rerank-score": 0.2823215567939546,
                    "sddocname": "doc",
                    "documentid": "id:docs:doc::1",
                    "name": "Another sample document",
                    "downloads": 10
                }
            }
        ]
    }
}
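The relevance values in the response can be checked by hand against the searcher's formula. The maximum downloads value in the window is 100, and bm25(name) is about 0.1823 for both hits, so the computation works out as follows (values rounded to four decimals):

```latex
\begin{aligned}
\text{score}(d) &= \text{bm25}_{\text{name}}(d) + \frac{\text{downloads}(d)}{\max_{d'} \text{downloads}(d')} \\
\text{score}(\texttt{id:docs:doc::0}) &\approx 0.1823 + \frac{100}{100} = 1.1823 \\
\text{score}(\texttt{id:docs:doc::1}) &\approx 0.1823 + \frac{10}{100} = 0.2823
\end{aligned}
```

Both hits have the same bm25(name) score, so the final order is decided entirely by the normalized downloads term.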
Remove app and data:
$ docker rm -f vespa