services.xml - document-processing
This is the document-processing reference in services.xml:
container
    document-processing [numnodesperclient, preferlocalnode, maxmessagesinqueue, maxqueuebytesize, maxqueuewait, maxconcurrentfactor, documentexpansionfactor, containercorememory]
        include
        documentprocessor [class, bundle, id, idref, provides, before, after]
            provides
            before
            after
            map
                field [doctype, in-document, in-processor]
        chain [name, id, idref, inherits, excludes, documentprocessors]
            map
                field [doctype, in-document, in-processor]
            inherits
                chain
                exclude
            documentprocessor [class, bundle, id, idref, provides, before, after]
                provides
                before
                after
                map
                    field [doctype, in-document, in-processor]
            phase [id, idref, before, after]
                before
                after
The root element of the document-processing configuration model. Attributes:
|numnodesperclient||optional||Set to a number below the number of nodes in the cluster to limit how many nodes a single client can connect to. If you have many clients, this can reduce the memory usage on both document-processing and client nodes.|
|preferlocalnode||optional||false||Set to true to always prefer sending to a document-processing node running on the same host as the client. Use this if you run a client on each document-processing node.|
|maxqueuewait||optional||The maximum number of seconds a message should wait in the queue before being processed. Docproc will adapt its queue size to adhere to this. If the queue is full, new messages will be replied to with SESSION_BUSY.|
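As a sketch of how the attributes above are set, the fragment below places them on the document-processing element. The values are illustrative assumptions, not recommended settings, and the container id is hypothetical.

```xml
<container id="default" version="1.0">
    <!-- Illustrative values only: limit each client to 2 nodes,
         prefer a docproc node on the client's own host,
         and let messages wait at most 5 seconds in the queue. -->
    <document-processing numnodesperclient="2"
                         preferlocalnode="true"
                         maxqueuewait="5" />
</container>
```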
Document Processor elements
documentprocessor elements are contained in docproc chain elements or in the document-processing root.
A documentprocessor element is either a document processor definition or document processor reference. The rest of this section deals with document processor definitions; document processor references are described in docproc chain elements.
A documentprocessor definition causes the creation of exactly one document processor instance. This instance is set up according to the content of the documentprocessor element.
A documentprocessor definition contained in a docproc chain element defines an inner document processor. Otherwise, it defines an outer document processor.
For inner documentprocessors, the name must be unique inside the docproc chain. For outer documentprocessors, the component id must be unique. An inner documentprocessor is not permitted to have the same name as an outer documentprocessor.
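The distinction between outer and inner document processors can be sketched as follows. The class names and the chain id are hypothetical and serve only to show where each definition is placed.

```xml
<document-processing>
    <!-- Outer document processor: defined directly under the root,
         so its component id must be unique. -->
    <documentprocessor id="com.example.OuterProcessor" />

    <chain id="exampleChain">
        <!-- Inner document processor: defined inside a chain,
             so its name must be unique within this chain and must
             not collide with the name of an outer document processor. -->
        <documentprocessor id="com.example.InnerProcessor" />
    </chain>
</document-processing>
```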
Sub-elements:
- provides, a single name that should be added to the provides list
- before, a single name that should be added to the before list
- after, a single name that should be added to the after list
- config (one or more)
Attributes:
|id||Mandatory. The component id of the documentprocessor instance.|
|provides||Optional. A space-separated list of names that represent what this documentprocessor produces.|
|before||Optional. A space-separated list of phase or provided names. Phases or documentprocessors providing these names will be placed later in the docproc chain than this document processor.|
|after||Optional. A space-separated list of phase or provided names. Phases or documentprocessors providing these names will be placed earlier in the docproc chain than this document processor.|
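A minimal sketch of the ordering dependencies described above. The class names and the provided name "language" are hypothetical, and writing a dependency as a sub-element containing the name as text is an assumption based on the sub-element descriptions.

```xml
<chain id="exampleChain">
    <!-- Produces "language" and has no ordering constraints of its own. -->
    <documentprocessor id="com.example.LanguageDetector" provides="language" />

    <!-- Placed after whatever provides "language". -->
    <documentprocessor id="com.example.Translator" after="language" />

    <!-- A constraint written as a sub-element instead of an attribute:
         placed before whatever provides "language". -->
    <documentprocessor id="com.example.Normalizer">
        <before>language</before>
    </documentprocessor>
</chain>
```

Under these constraints, Normalizer would be placed before LanguageDetector, and Translator after it, regardless of the order in which the elements are written.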
Defines a documentprocessor instance of a user specified class.
<documentprocessor id="componentId"
                   class="className:versionSpecification"
                   bundle="bundleSymbolicName:versionSpecification">
    <config />
</documentprocessor>
Optional attributes:
- class, a component specification containing the name of the class to instantiate to create the document processor instance. If missing, copied from id.
- bundle, a component specification containing the bundle symbolic name and version used to select the bundle. The class is retrieved from this bundle. If missing, copied from class.
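The defaulting rules for class and bundle can be illustrated with two definitions. All ids, class names and the bundle name are hypothetical.

```xml
<document-processing>
    <!-- class is omitted, so it is copied from id;
         bundle is omitted, so it is copied from class. -->
    <documentprocessor id="com.example.MyProcessor" />

    <!-- All three attributes given explicitly. -->
    <documentprocessor id="myProcessor"
                       class="com.example.MyProcessor"
                       bundle="my-bundle" />
</document-processing>
```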
Docproc chain elements
Specifies how a docproc chain should be instantiated, and how the contained document processors should be ordered.
Sub-elements:
- documentprocessor element (one or more), either a documentprocessor reference or a documentprocessor definition. If the name given for a documentprocessor matches an outer documentprocessor, it is a documentprocessor reference; otherwise, it is a documentprocessor definition. A documentprocessor definition is also an implicit documentprocessor reference saying: use exactly this documentprocessor. All these documentprocessor elements must have different names.
- phase (one or more).
- config (one or more - will apply to all inner documentprocessors in this docproc chain, unless overridden by individual inner documentprocessors).
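A sketch of a docproc chain that combines a reference, a definition and a phase, following the name-matching rule described above. The class names, the chain id and the phase id are hypothetical.

```xml
<document-processing>
    <!-- Outer document processor that chains can refer to. -->
    <documentprocessor id="com.example.SharedProcessor" />

    <chain id="exampleChain">
        <!-- A phase that document processors can order themselves against. -->
        <phase id="enrichment" />

        <!-- The name matches the outer processor above: a reference. -->
        <documentprocessor id="com.example.SharedProcessor" />

        <!-- The name matches no outer processor: a definition,
             placed after the "enrichment" phase. -->
        <documentprocessor id="com.example.ChainLocalProcessor" after="enrichment" />
    </chain>
</document-processing>
```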
The map element sets up a field name mapping from the names of fields in the input documents to the names used in a deployed docproc. The purpose is to reuse functionality without changing the field names. The example below shows the configuration:
<chain name="myChain">
    <map>
        <field in-document="key" in-processor="id"/>
    </map>
    <documentprocessor id="CityDocProc">
        <map>
            <field in-document="town" in-processor="city" doctype="restaurant"/>
        </map>
    </documentprocessor>
    <documentprocessor id="CarDocProc">
        <map>
            <field in-document="engine.cylinders" in-processor="cyl"/>
        </map>
    </documentprocessor>
</chain>
In the example, a chain is deployed with two docprocs.
For the chain, a mapping from key to id is set up. Imagine that some or all of the docprocs in the chain read and write to a field called id, but we want this functionality applied to the document field key.
Furthermore, a similar thing is done for the CityDocProc: The docproc accesses the field city, whereas it's called town in the feed. The mapping only applies to the document type restaurant.
CarDocProc accesses a field called cyl. In this example this is mapped to the field cylinders of a struct engine using a dotted notation.
If you specify mappings on different levels of the config (say both for a cluster and a docproc), the mapping closest to the actual docproc will take precedence.
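The precedence rule can be sketched with two mappings that target the same in-processor name. The docproc id and the document field name identifier are hypothetical, and treating the in-processor name as the point of conflict is an assumption.

```xml
<chain name="myChain">
    <!-- Chain-level mapping: applies to the docprocs in this chain. -->
    <map>
        <field in-document="key" in-processor="id"/>
    </map>
    <documentprocessor id="ExampleDocProc">
        <!-- Docproc-level mapping for the same in-processor name:
             being closest to the docproc, this one takes precedence. -->
        <map>
            <field in-document="identifier" in-processor="id"/>
        </map>
    </documentprocessor>
</chain>
```

Here ExampleDocProc would read and write the document field identifier when it accesses id, while other docprocs in the chain would still use key.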