Processors, searcher plug-ins and document processors are chained components. They are executed serially, with each providing some service or transform, and others optionally depending on these. In other words, a chain is a set of components with dependencies. Javadoc: com.yahoo.component.chain.Chain
It is useful to read the federation guide before this document.
A chained component differs from a component in general in three ways: what it provides, what it should be placed before, and what it should be placed after. Each of these may be defined using Java annotations directly on the component class, or added to the component declaration in services.xml. In general, the implementation should carry as many of the necessary annotations as practical, leaving the application-specific configuration clean and simple to work with.
The execution order of the components in a chain is not defined by the order of the components in the configuration. Instead, the order is defined by adding the ordering constraints to the components:
@Provides some named functionality (the names are just labels that have no meaning to the container).
@Before some named functionality.
@After some named functionality.
The container will pick any ordering of a chain consistent with the constraints of the components in the chain.
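To make the ordering rule concrete, here is a minimal sketch in plain Java of how one order consistent with such constraints could be derived. It is not the container's actual implementation: the `Component` and `ChainOrdering` classes are hypothetical names invented for illustration, only the JDK is assumed, and the sketch uses a standard topological sort (Kahn's algorithm).

```java
import java.util.*;

/** Hypothetical model of a chained component: a name plus its ordering constraints. */
final class Component {
    final String name;
    final Set<String> provides, before, after;

    Component(String name, Set<String> provides, Set<String> before, Set<String> after) {
        this.name = name;
        this.provides = provides;
        this.before = before;
        this.after = after;
    }
}

/** Hypothetical helper, not part of Vespa: orders components by their constraints. */
final class ChainOrdering {

    /** Returns component names in one order that satisfies all before/after constraints. */
    static List<String> order(List<Component> components) {
        // Index components by each label they provide; each also provides its own name.
        Map<String, List<Component>> providers = new HashMap<>();
        for (Component c : components) {
            providers.computeIfAbsent(c.name, k -> new ArrayList<>()).add(c);
            for (String label : c.provides)
                providers.computeIfAbsent(label, k -> new ArrayList<>()).add(c);
        }

        // successors.get(x) holds the components that must run after x;
        // pending.get(x) counts the predecessors of x that have not run yet.
        Map<Component, Set<Component>> successors = new LinkedHashMap<>();
        Map<Component, Integer> pending = new LinkedHashMap<>();
        for (Component c : components) {
            successors.put(c, new LinkedHashSet<>());
            pending.put(c, 0);
        }
        for (Component c : components) {
            for (String label : c.before)
                for (Component p : providers.getOrDefault(label, List.of()))
                    addEdge(c, p, successors, pending);
            for (String label : c.after)
                for (Component p : providers.getOrDefault(label, List.of()))
                    addEdge(p, c, successors, pending);
        }

        // Kahn's algorithm: repeatedly emit a component with no unmet predecessors.
        Deque<Component> ready = new ArrayDeque<>();
        for (Component c : components)
            if (pending.get(c) == 0) ready.add(c);
        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            Component c = ready.remove();
            result.add(c.name);
            for (Component next : successors.get(c))
                if (pending.merge(next, -1, Integer::sum) == 0) ready.add(next);
        }
        if (result.size() != components.size())
            throw new IllegalStateException("Cyclic ordering constraints");
        return result;
    }

    private static void addEdge(Component from, Component to,
                                Map<Component, Set<Component>> successors,
                                Map<Component, Integer> pending) {
        // Ignore self-references and count each distinct edge only once.
        if (from != to && successors.get(from).add(to))
            pending.merge(to, 1, Integer::sum);
    }
}
```

Given a component that provides `IntentModel`, one that provides `Federation`, and a third declared before `Federation` and after `IntentModel`, `ChainOrdering.order` places the third between the other two, whatever the input order. Constraints that name a label no component provides are simply ignored in this sketch.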
Dependencies can be added in two ways. Dependencies which are due to the code should be added as annotations in the code:
import com.yahoo.processing.*;
import com.yahoo.processing.execution.Execution;
import com.yahoo.component.chain.dependencies.*;

@Provides("SourceSelection")
@Before("Federation")
@After("IntentModel")
public class SimpleProcessor extends Processor {

    @Override
    public Response process(Request request, Execution execution) {
        // TODO: Implement this. For now, pass the request on to the rest of the chain.
        return execution.process(request);
    }

}
Multiple functionality names may be specified by using the syntax @Provides/Before/After({"A", "B"}).
Annotations which do not belong in the code may be added in the configuration:
<container version="1.0">
    <processing>
        <processor id="processor1" class="ai.vespa.examples.Processor1" />
        <chain id="default">
            <processor idref="processor1"/>
            <processor id="processor2" class="ai.vespa.examples.Processor2">
                <after>ai.vespa.examples.Processor1</after>
            </processor>
        </chain>
    </processing>
    <nodes>
        <node hostalias="node1" />
    </nodes>
</container>
For convenience, components always provide their own fully qualified class name (the package and simple class name concatenated, e.g. ai.vespa.examples.SimpleProcessor) and their simple name (that is, only the class name, like SimpleProcessor in the example above), so it is always possible to declare that one must execute before or after some particular component. This applies to general processors, searchers and document processors alike.
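As a small illustration of those two implicit names, the following plain-Java sketch derives them from a class object. `ImplicitProvides` and `implicitNames` are hypothetical names, and the sketch only assumes that the names correspond to what `Class.getName()` and `Class.getSimpleName()` return for a top-level class; it does not show how the container computes them.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper, not part of Vespa. */
final class ImplicitProvides {

    /** Returns the two names a component class is assumed to provide implicitly. */
    static Set<String> implicitNames(Class<?> componentClass) {
        Set<String> names = new LinkedHashSet<>();
        names.add(componentClass.getName());       // package and class name
        names.add(componentClass.getSimpleName()); // class name only
        return names;
    }
}
```

Either name can then be used as the target of a before or after constraint, as the `<after>ai.vespa.examples.Processor1</after>` element in the configuration above does.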
Finally, note that ordering constraints are just that; in particular they are not used to determine if a given search chain, or set of search chains, is “complete”.
Chains may inherit other chains in services.xml.
A chain will include all components from the chains named in the optional inherits attribute, exclude from that set all components named in the also optional excludes attribute, and add all the components listed inside the defining tag. Both inherits and excludes are space delimited lists of reference names.
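The resolution rule just described can be sketched as a few set operations. This is an illustrative model only, using plain JDK collections: `ChainInheritance`, `parseReferences` and `resolve` are hypothetical names, and component ids stand in for the components themselves.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper, not part of Vespa. */
final class ChainInheritance {

    /** Splits a space delimited attribute value, such as inherits or excludes, into names. */
    static List<String> parseReferences(String attribute) {
        String trimmed = attribute.trim();
        return trimmed.isEmpty() ? List.of() : List.of(trimmed.split("\\s+"));
    }

    /**
     * Resolves the component ids of a chain: everything from the inherited chains,
     * minus the excluded ids, plus the components listed in the chain itself.
     */
    static Set<String> resolve(List<Set<String>> inheritedChains,
                               Set<String> excludes,
                               List<String> ownComponents) {
        Set<String> result = new LinkedHashSet<>();
        for (Set<String> inherited : inheritedChains)
            result.addAll(inherited);
        result.removeAll(excludes);
        result.addAll(ownComponents);
        return result;
    }
}
```

The result is a set rather than a list because execution order is decided by the ordering constraints, not by the order in which components are collected.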
For search chains, there are two built-in search chains which are especially useful to inherit from, native and vespa. native is a basic search chain, containing the basic functionality most systems will need anyway. vespa inherits from native and adds a few extra searchers which most installations containing Vespa backends will need.
A component should be unit tested in a chain containing the components it depends on. It is not necessary to run the dependency handling framework to achieve that, as the com.yahoo.component.chain.Chain class has several constructors which are easy to use while testing.
Chain<Searcher> c = new Chain<>(new UselessSearcher("first"),
                                new UselessSearcher("second"),
                                new UselessSearcher("third"));
Execution e = new Execution(c, Execution.Context.createContextStub(null));
Result r = e.search(new Query());
The above is a rather useless test, but it illustrates how the basic workflow can be simulated. The constructor will create a chain with supplied searchers in the given order (it will not analyze any annotations).
When different searchers or document processors depend on shared classes or field names, it is good practice to define each name in a single place. An example in the searcher development introduction illustrates an easy way to do that.
The search chain to use can be selected in the request, by adding the request parameter:
searchChain=myChain
If no chain is selected in the query, the chain called default will be used. If no chain called default has been configured, the chain called native will be used. The native chain is always present and contains a basic set of searchers needed in most applications. Custom chains will usually inherit the native chain to include those searchers.
The search chain can also be set in a query profile.
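The fallback rule above can be summarized in a few lines of plain Java. This is a sketch of the rule as stated, not of the container's code: `ChainSelection` and `select` are hypothetical names, request parameters are modelled as a simple map, and query profiles are left out.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not part of Vespa. */
final class ChainSelection {

    /** Returns the name of the search chain to use for a request. */
    static String select(Map<String, String> requestParameters, Set<String> configuredChains) {
        // An explicit searchChain request parameter takes precedence.
        String requested = requestParameters.get("searchChain");
        if (requested != null)
            return requested;
        // Otherwise use "default" if configured, else the always-present "native" chain.
        return configuredChains.contains("default") ? "default" : "native";
    }
}
```

The sketch returns the requested name unchanged; what happens when the request names a chain that does not exist is not covered here.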
Annotations which do not belong in the code may be added in the configuration. Here is a simple example with search chains:
<container version="1.0">
    <search>
        <chain id="default" inherits="vespa">
            <searcher id="simpleSearcher" bundle="the name in artifactId in pom.xml" />
        </chain>
        <searcher id="simpleSearcher" class="ai.vespa.examples.SimpleSearcher" bundle="the name in artifactId in pom.xml" >
            <before>Cache</before>
            <after>Statistics</after>
            <after>Logging</after>
            <provides>SimpleTest</provides>
        </searcher>
    </search>
    <nodes>
        <node hostalias="node1" />
    </nodes>
</container>
And for document processor chains, it becomes:
<container version="1.0">
    <document-processing>
        <chain id="default">
            <documentprocessor id="ReplaceInFieldDocumentProcessor">
                <after>TextMetrics</after>
            </documentprocessor>
        </chain>
    </document-processing>
    <nodes>
        <node hostalias="node1"/>
    </nodes>
</container>
For searcher plugins, the class com.yahoo.search.searchchain.PhaseNames defines a set of checkpoints that third-party searchers may use to order themselves when extending the Vespa search chains.
Use case: In a search chain, return early and continue the search asynchronously using an ExecutorService.
Pseudocode: On a cache hit (e.g. using Redis), just return the cached data. On a cache miss, return null data immediately and let the following searcher complete the query and write the result back to the cache:
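A minimal plain-Java sketch of this pattern follows. It deliberately does not use the Vespa Searcher API: `EarlyReturnCache` is a hypothetical class, a `ConcurrentHashMap` stands in for an external cache such as Redis, and a `Function` stands in for the remaining searchers in the chain. Adapting it to a real searcher is left open.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.function.Function;

/** Hypothetical model of a searcher that returns early and fills a cache in the background. */
final class EarlyReturnCache {

    // Stand-in for an external cache such as Redis.
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final ExecutorService executor;
    // Stand-in for the rest of the chain; assumed to return a non-null result.
    private final Function<String, String> downstream;

    EarlyReturnCache(ExecutorService executor, Function<String, String> downstream) {
        this.executor = executor;
        this.downstream = downstream;
    }

    /** Returns cached data on a hit; on a miss, returns null at once and fills the cache later. */
    String search(String query) {
        String cached = cache.get(query);
        if (cached != null)
            return cached;
        // Cache miss: run the remaining work on another thread and write the result back.
        executor.execute(() -> cache.put(query, downstream.apply(query)));
        return null;
    }
}
```

The first call for a query returns null immediately, and a later call returns the cached value once the background task has finished. The sketch does not deduplicate concurrent misses for the same query, so two early requests may both trigger the downstream work.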