Chunking Reference

Reference configuration for chunkers: components that split text into pieces in chunk indexing expressions, as in:

indexing: input myTextField | chunk fixed-length 500 | index

See also the guide to working with chunks.

Built-in chunkers

Vespa provides these built-in chunkers:

Chunker id | Arguments | Description
sentence | - | Splits the text into chunks at sentence boundaries.
fixed-length | target chunk length in characters | Splits the text into chunks of roughly equal length. It prefers chunks of similar length that are split at reasonable locations over matching the target length exactly.
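
To make the fixed-length behavior concrete, here is a minimal sketch of one way to favor similar chunk lengths and sensible split points over hitting the target exactly. It is an illustration only, not Vespa's implementation: the class name FixedLengthSketch, the helper nearestWhitespace, and the choice of a search window of a quarter of the target length are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the algorithm used by Vespa's fixed-length chunker.
public class FixedLengthSketch {

    /** Splits text into chunks of roughly targetLength characters, preferring whitespace boundaries. */
    public static List<String> chunk(String text, int targetLength) {
        if (targetLength <= 0) throw new IllegalArgumentException("targetLength must be positive");
        List<String> chunks = new ArrayList<>();
        if (text.isEmpty()) return chunks;

        // Decide the number of chunks first, then aim for equal lengths,
        // instead of cutting exactly at targetLength and leaving a short tail.
        int count = (text.length() + targetLength - 1) / targetLength;
        int start = 0;
        for (int i = 1; i < count; i++) {
            int ideal = (int) ((long) text.length() * i / count);
            // Assumed window: look up to a quarter of the target length in either direction.
            int end = nearestWhitespace(text, ideal, targetLength / 4);
            if (end <= start) continue; // skip a boundary that would create an empty chunk
            chunks.add(text.substring(start, end));
            start = end;
        }
        if (start < text.length()) chunks.add(text.substring(start));
        return chunks;
    }

    /** Returns the position just after the whitespace closest to ideal within window, or ideal if none is found. */
    private static int nearestWhitespace(String text, int ideal, int window) {
        for (int d = 0; d <= window; d++) {
            int right = ideal + d;
            if (right < text.length() && Character.isWhitespace(text.charAt(right))) return right + 1;
            int left = ideal - d;
            if (left >= 0 && Character.isWhitespace(text.charAt(left))) return left + 1;
        }
        return ideal;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog. It then rests by the river.";
        for (String c : chunk(text, 30)) System.out.println("[" + c + "]");
    }
}
```

In this sketch the chunks always concatenate back to the original text, because each split keeps the trailing whitespace with the preceding chunk. A real chunker may treat whitespace, sentence ends, and punctuation differently.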

Chunker components

Chunkers are components, so you can also add your own:

<container version="1.0">
    <component id="myChunker"
      class="com.example.MyChunker"
      bundle="the name in artifactId in pom.xml">
        <config name='com.example.my-chunker'>
            <myValue>foo</myValue>
        </config>
    </component>
</container>

You create a chunker component by implementing the com.yahoo.language.process.Chunker interface; see these examples.
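
As a rough idea of the splitting logic such a component might contain, the sketch below splits text into one chunk per paragraph. It deliberately avoids the Vespa API: SimpleChunker is a hypothetical stand-in interface defined only for this example, and ParagraphChunker is an invented name. A real component would implement com.yahoo.language.process.Chunker instead, using the method signature and types given in the Vespa Javadoc.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; it does not depend on or reproduce the Vespa Chunker API.
public class ParagraphChunkerSketch {

    /** Hypothetical stand-in for a chunker contract; NOT the Vespa interface. */
    public interface SimpleChunker {
        List<String> chunk(String text);
    }

    /** Splits text at blank lines, so that each paragraph becomes one chunk. */
    public static class ParagraphChunker implements SimpleChunker {
        @Override
        public List<String> chunk(String text) {
            List<String> chunks = new ArrayList<>();
            // A paragraph break is a line break, optional whitespace, and another line break.
            for (String paragraph : text.split("\\R\\s*\\R")) {
                String trimmed = paragraph.strip();
                if (!trimmed.isEmpty()) chunks.add(trimmed);
            }
            return chunks;
        }
    }

    public static void main(String[] args) {
        String text = "First paragraph.\n\nSecond one,\nstill second.\n\n\nThird.";
        for (String c : new ParagraphChunker().chunk(text)) System.out.println("[" + c + "]");
    }
}
```

Keeping the splitting logic in a plain method like this makes it easy to unit test before wiring it into a component declared in services.xml.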