Custom Linguistics

A linguistics component is an implementation of com.yahoo.language.Linguistics. Refer to the com.yahoo.language.simple.SimpleLinguistics implementation (which can be subclassed for convenience).

SimpleLinguistics provides support for English stemming only. Try loading the com.yahoo.language.simple.SimpleLinguistics module, or providing another linguistics module.

The linguistics implementation must be configured as a component in container clusters doing linguistics processing, see injecting components.

By default, document processing for indexing is done by an autogenerated container cluster that cannot be configured. To use a custom linguistics component for indexing, specify a container cluster for indexing explicitly.

This example shows how to configure SimpleLinguistics for linguistics using the same cluster for both query and indexing processing (if using different clusters, add the same linguistics component to all of them):

<services>

    <container version="1.0" id="mycontainer">
        <component id="com.yahoo.language.simple.SimpleLinguistics"/>
        <document-processing/>
        <search/>
        <nodes ...>
    </container>

    <content version="1.0">
        <redundancy>1</redundancy>
        <documents>
            <document type="mydocument" mode="index"/>
            <document-processing cluster="mycontainer"/>
        </documents>
        <nodes ...>
    </content>

</services>
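
If query and indexing processing run in separate container clusters, the configuration could look roughly like the sketch below. The cluster ids myquerycontainer and myindexingcontainer are made-up names for illustration, and the node lists are elided as in the example above. The same linguistics component is added to both clusters, and the content cluster points to the indexing cluster:

<services>

    <container version="1.0" id="myquerycontainer">
        <component id="com.yahoo.language.simple.SimpleLinguistics"/>
        <search/>
        <nodes ...>
    </container>

    <container version="1.0" id="myindexingcontainer">
        <component id="com.yahoo.language.simple.SimpleLinguistics"/>
        <document-processing/>
        <nodes ...>
    </container>

    <content version="1.0">
        <redundancy>1</redundancy>
        <documents>
            <document type="mydocument" mode="index"/>
            <document-processing cluster="myindexingcontainer"/>
        </documents>
        <nodes ...>
    </content>

</services>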

If the linguistics component of a live system is changed, recall can be reduced until all documents are re-written. This is because documents are still stored with tokens generated by the previous linguistics module, while queries are processed with the new one.