services.xml - 'content'

content [version, id, distributor-base-port]
    documents [selection, garbage-collection, garbage-collection-interval]
        document [type, selection, mode]
        document-processing [cluster, chain]
    redundancy
    nodes
        node [baseport, hostalias, jvmargs, preload, distribution-key, capacity]
    group [distribution-key, name]
        distribution
        node [baseport, hostalias, jvmargs, preload, distribution-key, capacity]
        group [distribution-key, name]
    engine
        proton
            searchable-copies
            tuning
                searchnode
                    requestthreads
                        search
                        persearch
                        summary
                    flushstrategy
                        native
                            total
                                maxmemorygain
                                diskbloatfactor
                            component
                                maxmemorygain
                                diskbloatfactor
                                maxage
                            transactionlog
                                maxentries DEPRECATED
                                maxsize
                            conservative
                                memory-limit-factor
                                disk-limit-factor
                    resizing
                        initialdocumentcount
                        amortize-count
                    initialize
                        threads
                    feeding
                        concurrency
                    index
                        io
                            write
                            read
                            search
                        warmup
                            time
                            unpack
                    attribute
                        io
                            write
                    removed-db
                        prune
                            age
                            interval
                    summary
                        io
                            write
                            read
                        store
                            cache
                                maxsize
                                maxsize-percent
                                compression
                                    type
                                    level
                            logstore
                                maxfilesize
                                chunk
                                    maxsize
                                    maxentries
                                    compression
                                        type
                                        level
            flush-on-shutdown
            resource-limits
                disk
                memory
    search
        query-timeout
        visibility-delay
        coverage
            minimum
            min-wait-after-coverage-factor
            max-wait-after-coverage-factor
    dispatch DEPRECATED
        num-dispatch-groups DEPRECATED
        group DEPRECATED
            node [distribution-key] DEPRECATED
    tuning
        bucket-splitting [max-documents, max-size, minimum-bits]
        min-node-ratio-per-group
        distribution [type]
        maintenance [start, stop, high]
        merges [max-per-node, max-queue-size]
        persistence-threads [count]
        visitors [thread-count, max-queue-size]
            max-concurrent [fixed, variable]
        dispatch
            max-hits-per-partition
            top-k-probability
            dispatch-policy
            min-group-coverage
            min-active-docs-coverage
        cluster-controller
            init-progress-time
            transition-time
            max-premature-crashes
            stable-state-period
            min-distributor-up-ratio
            min-storage-up-ratio

content

The root element of a content cluster definition. Creates a content cluster. A content cluster stores and/or indexes documents. services.xml may have zero or more such elements.

Contained in services. Attributes:

  • version (required): Must be set to '1.0' in this version of Vespa.
  • id (required for multiple clusters): Name of the content cluster. If none is supplied, the cluster name is 'content'. Cluster names must be unique within the application, so if several clusters are set up, the name must be set for all but one at minimum. It is suggested to always set an id so the cluster has a meaningful name; this also allows adding clusters later without renaming the existing one for the names to make sense.
  • distributor-base-port (optional): If a specific port is required for access to the distributor, override it with this attribute.
Subelements: documents, redundancy, nodes (or group), engine, search, tuning.
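
Example - a minimal content cluster definition (the cluster id and document type name are placeholders):
<content id="music" version="1.0">
  <documents>
    <document type="music" mode="index" />
  </documents>
  <redundancy>2</redundancy>
  <nodes>
    <node hostalias="node1" distribution-key="0" />
  </nodes>
</content>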

documents

Contained in content. Defines which document types should be routed to this content cluster using the default route, and which documents should be kept in the cluster when the garbage collector runs. Read more on expiring documents. There is also some backend-specific configuration for whether documents should be searchable or not. Attributes:

  • selection (optional, string): A document selection, restricting documents that are routed to this cluster. Defaults to a selection expression matching everything.

    This selection can be specified to match document identifier specifics that are independent of document types. Restrictions that apply only to a specific document type must be made within that particular document type's document element. Trying to use document type references in this selection causes an error during deployment. The selection given here is merged with per-document type selections specified within document tags, if any, meaning that any document in the cluster must match both selections to be accepted and kept.

    This feature is primarily used to expire documents.

  • garbage-collection (optional, true / false, default false): If true, regularly verify the documents stored in the cluster to see if they belong in the cluster, and delete them if not. If false, garbage collection is not run.
  • garbage-collection-interval (optional, integer, default 3600): Time (in seconds) between garbage collection cycles.
Subelements: document, document-processing.
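
Example - a documents element with garbage collection enabled and a custom interval. The selection expression is illustrative, assuming the document type has a timestamp field:
<documents garbage-collection="true" garbage-collection-interval="1800">
  <document type="music" mode="index" selection="music.timestamp > now() - 86400" />
</documents>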

document

Contained in documents. The document type to be routed to this content cluster. Attributes:

  • type (required, string): Document type name.
  • mode (required, index / store-only / streaming): The mode of storing and indexing. In this documentation, index is assumed unless streaming or store-only is explicitly mentioned. Refer to streaming search for store-only, as documents are stored the same way for both cases.

    Changing mode requires an indexing-mode-change validation override, and documents must be re-fed.

  • selection (optional, string): A document selection, restricting documents that are routed to this cluster. Defaults to a selection expression matching everything.

    This selection must apply to fields in this document type only. The selection is merged with the selections for other types and the global selection from documents to form a full expression for which documents belong to this cluster.

  • global (optional, true / false, default false): Set to true to distribute all documents of this type to all nodes in the content cluster in which it is defined.

    Fields in global documents can be imported into documents to implement joins - read more in parent/child. Vespa will detect when a new (or outdated) node is added to the cluster and prevent it from taking part in searches until it has received all global documents.

    Changing from false to true or vice versa requires a global-document-change validation override. First, stop services on all content nodes. Then, deploy with the validation override. Finally, start services on all content nodes.

    Note: global is only supported for mode="index".
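
Example - a global document type alongside a regular one (the type names are placeholders):
<documents>
  <document type="music" mode="index" />
  <document type="artist" mode="index" global="true" />
</documents>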

document-processing

Contained in documents. Vespa Search-specific configuration of which document-processing cluster and chain to use for index pre-processing. Attributes:

  • cluster (optional, string, default the container cluster on the content nodes): Name of a document-processing container cluster that does index pre-processing. Use cluster to specify an alternative cluster, other than the default cluster on content nodes.
  • chain (optional, string, default the indexing chain): A document-processing chain in the container cluster specified by cluster to use for index pre-processing. The chain must inherit the indexing chain.
Example - the container cluster enables document-processing, referred to by the content cluster:
<container id="my-indexing-cluster" version="1.0">
  <document-processing/>
</container>
<content id="music" version="1.0">
  <documents>
    <document-processing cluster="my-indexing-cluster"/>
  </documents>
</content>
To add document processors either before or after the indexer, declare a chain (inherit indexing) in a document-processing container cluster and add document processors. Annotate document processors with before=indexingStart or after=indexingEnd. Configure this cluster and chain as the indexing chain in the content cluster - example:
<container id="my-indexing-cluster" version="1.0">
  <document-processing>
    <chain id="my-document-processors" inherits="indexing">
      <documentprocessor id="MyDocproc">
        <before>indexingStart</before>
      </documentprocessor>
      <documentprocessor id="MyOtherDocproc">
        <after>indexingEnd</after>
      </documentprocessor>
    </chain>
  </document-processing>
</container>
<content id="music" version="1.0">
  <documents>
    <document-processing cluster="my-indexing-cluster" chain="my-document-processors" />
  </documents>
</content>
Also note the document-api configuration; applications can set up this API on the same nodes as document-processing - find details in indexing.

redundancy

Contained in content. Defines the total number of copies of each piece of data the cluster will maintain to avoid data loss. Example: with a redundancy of 2, the system tolerates 1 node failure before data becomes unavailable (until the system has managed to create new replicas on other online nodes).

Redundancy can be changed without node restart - replicas will be generated or dropped automatically.
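Example - keep two copies of every piece of data:
<content id="music" version="1.0">
  <redundancy>2</redundancy>
  ...
</content>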

nodes

Contained in content. Defines the set of content nodes in the cluster - parent for node-elements.

node

Contained in nodes or group. Configures a content node to the cluster. Attributes:

  • distribution-key (required, integer): Sets the distribution key of a node. It is not recommended to change this for a given node. It is recommended (but not required) that the set of distribution keys in the cluster is contiguous and starts at 0. Example: if the biggest distribution key is 499, the distribution algorithm needs to calculate 500 random numbers to find the correct target. It is hence recommended not to leave too many gaps in the distribution key range.

    Distribution keys are used to identify nodes and groups for the distribution algorithm. If a node changes distribution key, the distribution algorithm regards it as a new node, hence buckets are redistributed. When merging clusters, one might need to change distribution keys - details in merging clusters.

    Content nodes need unique distribution keys across the whole cluster, as the key is also used as a node identifier where group information is not specified.

  • capacity (optional, double, default 1): Capacity of this node, relative to other nodes. A node with capacity 2 will get double the data and feed requests of a node with capacity 1. This feature is expert mode only - do not use it unless you know what you are doing.
  • baseport (optional, integer): See baseport.
  • hostalias (optional, string): See hostalias.
  • jvmargs (optional, string): See jvmargs.
  • preload (optional, string): See preload.
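
Example - a nodes element with two content nodes (hostaliases are placeholders):
<nodes>
  <node hostalias="node1" distribution-key="0" />
  <node hostalias="node2" distribution-key="1" />
</nodes>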

group

Contained in content or group - groups can be nested. Defines the hierarchical structure of the cluster. Can not be used in conjunction with the nodes element. Groups can contain other groups or nodes, but not both.

When using groups in Open Source Vespa, searchable-copies and redundancy are the total replica counts across all leaf groups in the cluster. For groups in Vespa Cloud, see the documentation. Attributes:

  • distribution-key (required, integer): Sets the distribution key of a group. It is not allowed to change this for a given group. Group distribution keys only need to be unique among groups that share the same parent group.
  • name (required, string): The name of the group, used for access from status pages and the like.
Note: There is currently no deployment-time verification that the distribution key remains unchanged for any given node or group. Consequently, take great care when modifying the set of nodes in a content cluster. Assigning a new distribution key to an existing node is undefined behavior; best case, the existing data is temporarily unavailable until the error has been corrected; worst case, there is a risk of crashes or data loss.

See Vespa Serving Scaling Guide for when to consider using grouped distribution and Examples for example deployments using flat and grouped distribution.

distribution (in group)

Contained in group. Defines the data distribution to subgroups of this group. distribution should not be specified in the lowest-level group containing storage nodes, as there the ideal state algorithm is used directly. In higher-level groups, distribution is mandatory. Attributes:

  • partitions (required, if there are subgroups in the group): String conforming to the partition specification
Partition specifications:
  • * - distribute all copies over 1 of N groups
  • *|* - distribute all copies over 2 of N groups
  • *|*|* - distribute all copies over 3 of N groups
The partition specification is used to evenly distribute content copies across groups. Write one * per group, separated by pipes (e.g. *|* for two groups). See Sample deployment configurations.
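
Example - a sketch of a two-group topology where copies are distributed over both groups, with two nodes in each (group names and hostaliases are placeholders):
<group name="topGroup" distribution-key="0">
  <distribution partitions="*|*" />
  <group name="group0" distribution-key="0">
    <node hostalias="node1" distribution-key="0" />
    <node hostalias="node2" distribution-key="1" />
  </group>
  <group name="group1" distribution-key="1">
    <node hostalias="node3" distribution-key="2" />
    <node hostalias="node4" distribution-key="3" />
  </group>
</group>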

engine

Contained in content. Specify the content engine to use, and/or adjust tuning parameters for the engine. Allowed engines are proton and dummy, the latter being used for debugging purposes. If no engine is given, proton is used. Sub-element: proton.

proton

Contained in engine. If specified, the content cluster will use the Proton content engine. This engine supports storage, indexed search and secondary indices. Optional sub-elements are searchable-copies, tuning, flush-on-shutdown, and resource-limits.

searchable-copies

Contained in proton. Default value is 2, or redundancy if lower. If set to less than redundancy, only some of the stored copies are ready for searching at any time. This means that node failures cause temporary data unavailability while the alternate copies are being indexed for search. The benefit is using less memory, trading off availability during transitions. Refer to bucket move.

If updating documents or using document selection for garbage collection, consider setting fast-access on the subset of attribute fields used for this to make sure that these attributes are always kept in memory for fast access. Note that this is only useful if searchable-copies is less than redundancy.

searchable-copies can be changed without node restart.
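
Example - with redundancy 2, keep only one copy ready for searching:
<content id="music" version="1.0">
  <redundancy>2</redundancy>
  <engine>
    <proton>
      <searchable-copies>1</searchable-copies>
    </proton>
  </engine>
  ...
</content>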

tuning

Contained in proton, optional. Tune settings for the search nodes in a content cluster - sub-element:

Sub-element: searchnode (optional, zero or one).

searchnode

Contained in tuning, optional. Tune settings for search nodes in a content cluster - sub-elements:

Each of the following sub-elements is optional, with zero or one occurrence: requestthreads, flushstrategy, resizing, initialize, feeding, index, attribute, removed-db, summary.
<tuning>
    <searchnode>
        <requestthreads></requestthreads>
        <flushstrategy></flushstrategy>
        <resizing></resizing>
        <initialize></initialize>
        <feeding></feeding>
        <index></index>
        <attribute></attribute>
        <summary></summary>
    </searchnode>
</tuning>

requestthreads

Contained in searchnode, optional. Tune the number of request threads used on a search node - optional sub-elements:

  • search: Number of threads in the search pool, default 64.
  • persearch: Number of search threads used per search; see the Sizing Vespa Search documentation for an introduction to using multiple threads per search per node to scale down latency. Default value is 1.
  • summary: Number of summary threads, default 16.
<requestthreads>
    <search>64</search>
    <persearch>1</persearch>
    <summary>16</summary>
</requestthreads>
The number of threads per search can be adjusted down per rank profile using num-threads-per-search.

flushstrategy

Contained in searchnode, optional. Tune the native-strategy for flushing components to disk - a smaller number means more frequent flush:

  • Memory gain is how much memory can be freed by flushing a component
  • Disk gain is how much disk space can be freed by flushing a component (typically by using compaction)
Refer to Proton maintenance jobs. Optional sub-elements:
  • native:
    • total
      • maxmemorygain: The total maximum memory gain (in bytes) for all components before running flush, default 4294967296 (4 GB)
      • diskbloatfactor: Trigger flush if the total disk gain (in bytes) for all components is larger than the factor times current total disk usage, default 0.2
    • component
      • maxmemorygain: The maximum memory gain (in bytes) by a single component before running flush, default 1073741824 (1 GB)
      • diskbloatfactor: Trigger flush if the disk gain (in bytes) by a single component is larger than the given factor times the current disk usage by that component, default 0.2
      • maxage: The maximum age (in seconds) of unflushed content for a single component before running flush, default 86400 (24h)
    • transactionlog
      • maxentries: DEPRECATED (use maxsize instead): The maximum number of entries in the transaction log for a document type before running flush, default 1000000 (1 M)
      • maxsize: The total maximum size (in bytes) of transaction logs for all document types before running flush, default 21474836480 (20 GB)
    • conservative
      • memory-limit-factor: When resource-limits for memory is reached, flush more often by downscaling total.maxmemorygain and component.maxmemorygain, default 0.5
      • disk-limit-factor: When resource-limits for disk is reached, flush more often by downscaling transactionlog.maxsize, default 0.5
<flushstrategy>
    <native>
        <total>
            <maxmemorygain>4294967296</maxmemorygain>
            <diskbloatfactor>0.2</diskbloatfactor>
        </total>
        <component>
            <maxmemorygain>1073741824</maxmemorygain>
            <diskbloatfactor>0.2</diskbloatfactor>
            <maxage>86400</maxage>
        </component>
        <transactionlog>
            <maxsize>21474836480</maxsize>
        </transactionlog>
        <conservative>
            <memory-limit-factor>0.5</memory-limit-factor>
            <disk-limit-factor>0.5</disk-limit-factor>
        </conservative>
    </native>
</flushstrategy>

resizing

Contained in searchnode, optional. Tune settings for data structure resizing to handle more or less documents. Optional sub-elements:

  • initialdocumentcount: The data structures used by the search node will be initialized to this number of documents before resizing - default 1024. Setting this value can help speed up the initial feed of documents. As an attribute resize keeps both the current and the new version in memory at the same time, peak memory usage more than doubles when growing an attribute. Setting initialdocumentcount higher than the expected maximum number of documents per node prevents a resize, which is useful if memory is the limiting sizing factor.
  • amortize-count: When growing the number of documents on a node, the expansion of attributes is spread out over N documents - default 1024. This spreads memory spikes out in time. You should not have to tune this; the default is good for all use cases. Documented for reference.
<resizing>
    <initialdocumentcount>1024</initialdocumentcount>
    <amortize-count>1024</amortize-count>
</resizing>

initialize

Contained in searchnode, optional. Tune settings related to how the search node (proton) is initialized. Optional sub-elements:

  • threads: The number of initializer threads used for loading structures from disk at proton startup. The threads are shared between document databases when the value is larger than 0. Default value is the number of document databases + 1.
    • When set to larger than 1, document databases are initialized in parallel
    • When set to 1, document databases are initialized in sequence
    • When set to 0, 1 separate thread is used per document database and they are initialized in parallel.
<initialize>
   <threads>2</threads>
</initialize>

feeding

Contained in searchnode, optional. Tune proton settings for feed operations. Optional sub-elements:

  • concurrency: A number between 0.0 and 1.0 that specifies the concurrency when handling feed operations, default 0.35. When set to 1.0, all CPU cores can be used for feeding. See feeding.concurrency for details on how this setting affects the thread pools used for feed operations.
<feeding>
    <concurrency>0.8</concurrency>
</feeding>

index

Contained in searchnode, optional. Tune various aspects of the handling of disk and memory indexes. Optional sub-elements:

  • io
    • write: Controls io write options used during index dump and fusion, values={normal,directio}, default directio
    • read: Controls io read options used during index dump and fusion, values={normal,directio}, default directio
    • search: Controls io options used for reading the index when searching, default mmap (as in the example below)
  • warmup
    • time: Specifies in seconds how long the index shall be warmed up before being switched in for serving. During warmup it receives queries and posting lists are iterated, but results are ignored as they are duplicates of the live index. This pulls the most important data into the cache. However, as warming up an index occupies more memory, do not turn it on unless you suspect you need it - and always benchmark to see if it is worth it.
    • unpack: Controls whether all posting features are pulled in to the cache, or only the most important. values={true, false}, default false.
<index>
    <io>
        <write>directio</write>
        <read>directio</read>
        <search>mmap</search>
    </io>
    <warmup>
        <time>60</time>
        <unpack>true</unpack>
    </warmup>
</index>

attribute

Contained in searchnode, optional. Tune various aspects of the handling of attribute vectors. Optional sub-elements:

  • io
    • write: Controls io write options used during flushing of attribute vectors, values={normal,directio}, default directio
<attribute>
    <io>
        <write>directio</write>
    </io>
</attribute>

removed-db

Contained in searchnode, optional. Tune various aspects of the database of removed documents. Optional sub-elements:

  • prune
    • age: Specifies how long (in seconds) removed documents must be remembered before they can be pruned away. Default is 2 weeks. This sets the upper limit on how long a node can be down and still be accepted back into the system without having its index wiped. There is no point in having this any higher than the age of the documents; if the corpus is re-fed every day, there is no point in having this longer than 24 hours.
    • interval: Specifies how often (in seconds) to prune old documents. Default is 600 (10 minutes). There is no need to change the default; it is exposed here for reference and for testing.
<removed-db>
    <prune>
        <age>86400</age>
    </prune>
</removed-db>

summary

Contained in searchnode, optional. Tune various aspects of the handling of document summaries. Refer to proton.def for parameter values and defaults. Optional sub-elements:

  • io
    • write: Controls io write options used during flushing of stored documents. See summary.write.io
    • read: Controls io read options used during reading of stored documents. Values are directio, mmap and populate. Default is mmap; populate does an eager mmap and touches all pages.
  • store
    • cache: Used to tune the cache used by the document store. Enabled by default, using up to 5% of available memory.
      • maxsize: The maximum size of the cache in bytes. If given, it takes precedence over maxsize-percent. See summary.cache.maxbytes
      • maxsize-percent: The maximum size of the cache in percent of available memory. Default is 5%.
      • compression
        • type: The compression type of the documents while in the cache. See summary.cache.compression.type
        • level: The compression level of the documents while in cache. See summary.cache.compression.level
    • logstore: Used to tune the actual document store implementation (log-based).
      • maxfilesize: The maximum size (in bytes) per summary file on disk. See summary.log.maxfilesize and document-store-compaction
      • chunk
        • maxsize: Maximum size (in bytes) of a chunk. See summary.log.chunk.maxbytes
        • maxentries: DEPRECATED (use maxsize instead): Maximum number of documents in a chunk. See summary.log.chunk.maxentries
        • compression
          • type: Compression type of the documents. See summary.log.chunk.compression.type
          • level: Compression level of the documents. See summary.log.chunk.compression.level
<summary>
    <io>
        <write>directio</write>
        <read>mmap</read>
    </io>
    <store>
        <cache>
            <maxsize>0</maxsize>
            <compression>
                <type>none</type>
            </compression>
        </cache>
        <logstore>
            <maxfilesize>1000000000</maxfilesize>
            <chunk>
                <maxsize>65536</maxsize>
                <compression>
                    <type>zstd</type>
                    <level>9</level>
                </compression>
            </chunk>
        </logstore>
    </store>
</summary>

flush-on-shutdown

Contained in proton. Default value is true. If set to true, search nodes will flush a set of components (e.g. memory index, attributes) to disk before shutting down, such that the time it takes to flush these components plus the time it takes to replay the transaction log after restart is as low as possible. The time it takes to replay the transaction log depends on the amount of data to replay, so by flushing some components before restart, the transaction log is pruned and the replay time is reduced significantly. Refer to Proton maintenance jobs.

resource-limits

Contained in proton. Specifies resource limits used by proton to reject write operations when a limit is reached. Use this to implement a feed block to avoid saturating content nodes. Elements:

  • disk (optional, float [0, 1], default from writefilter.disklimit): Fraction of total space on the disk partition used before put and update operations are rejected.
  • memory (optional, float [0, 1], default from writefilter.memorylimit): Fraction of physical memory that can be resident memory in anonymous mappings by proton before put and update operations are rejected.
Example:
<proton>
  <resource-limits>
    <disk>0.90</disk>
    <memory>0.95</memory>
  </resource-limits>
</proton>

search

Contained in content, optional. Declares search configuration for this content cluster. Optional sub-elements are query-timeout, visibility-delay and coverage.

query-timeout

Contained in search. Specifies the query timeout in seconds for queries against the search interface on the content nodes. The default is 0.5 (500ms), the max is 600.0. For query timeout also see the request parameter timeout.

Note: You will not be able to override the configured value using the request parameter timeout.
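
Example - raise the content node query timeout to 2 seconds:
<search>
  <query-timeout>2.0</query-timeout>
</search>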

visibility-delay

Contained in search. Default 0, max 1 second. This setting is normally not used anymore; in Vespa versions pre 7.300 (approx) it also controlled batch writes when feeding.

This setting controls the TTL caching for parent-child imported fields as well as a query cache.
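
Example (the value is illustrative, within the documented maximum):
<search>
  <visibility-delay>1.0</visibility-delay>
</search>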

coverage

Contained in search. Declares search coverage configuration for this content cluster. Optional sub-elements are minimum, min-wait-after-coverage-factor and max-wait-after-coverage-factor.

minimum

Contained in coverage. Declares the minimum search coverage required before returning the results of a query. This number is in the range [0, 1], with 0 being no coverage and 1 being full coverage.

The default is 1; unless configured otherwise a query will not return until all search nodes have responded within the specified timeout.

min-wait-after-coverage-factor

Contained in coverage. Declares the minimum time for a query to wait for full coverage once the declared minimum has been reached. This number is a factor that is multiplied with the time remaining at the time of reaching minimum coverage.

The default is 0; unless configured otherwise, a query will return as soon as the minimum coverage has been reached if the remaining search nodes appear to be lagging.

max-wait-after-coverage-factor

Contained in coverage. Declares the maximum time for a query to wait for full coverage once the declared minimum has been reached. This number is a factor that is multiplied with the time remaining at the time of reaching minimum coverage.

The default is 1; unless configured otherwise a query is allowed to wait its full timeout for full coverage even after reaching the minimum.
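
Example - accept results once 95% of the data is covered, waiting at most half of the remaining time for full coverage (the values are illustrative):
<search>
  <coverage>
    <minimum>0.95</minimum>
    <min-wait-after-coverage-factor>0.2</min-wait-after-coverage-factor>
    <max-wait-after-coverage-factor>0.5</max-wait-after-coverage-factor>
  </coverage>
</search>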

dispatch DEPRECATED

Since Vespa-7.109.10, this element has no effect - details.

num-dispatch-groups (in dispatch) DEPRECATED

group (in dispatch) DEPRECATED

node (in dispatch) DEPRECATED

tuning

Contained in content, optional. Optional tuning parameters are: bucket-splitting, min-node-ratio-per-group, cluster-controller, dispatch, distribution, maintenance, merges, persistence-threads and visitors.

bucket-splitting

Contained in tuning. The bucket is the fundamental unit of distribution and management in a content cluster. Buckets are auto-split, no need to configure for most applications. Streaming search latency is linear with bucket size. Attributes:

  • max-documents (optional, integer, default 1024): Maximum number of documents per content bucket. Buckets are split in two if they have more documents than this. Keep this value below 16K.
  • max-size (optional, integer, default 32MiB): Maximum size (in bytes) of a bucket - the sum of the serialized size of all documents kept in the bucket. Buckets are split in two if they are larger than this. Keep this value below 100MiB.
  • minimum-bits (optional, integer): Override the ideal distribution bit count configured for this cluster. Prefer the distribution type setting instead if the default distribution bit count does not fit the cluster. This variable is intended for testing and to work around possible distribution bit issues; most users should not need it.
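
Example (the values are illustrative, within the recommended bounds above):
<tuning>
  <bucket-splitting max-documents="2048" max-size="33554432" />
</tuning>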

min-node-ratio-per-group

Contained in tuning. States a lower bound requirement on the ratio of nodes within individual groups that must be online and able to accept traffic before the entire group is automatically taken out of service. Groups are automatically brought back into service when the availability of its nodes has been restored to a level equal to or above this limit.

Elastic content clusters are often configured to use multiple groups for the sake of horizontal traffic scaling and/or data availability. The content distribution system will try to ensure a configured number of replicas is always present within a group in order to maintain data redundancy. If the number of available nodes in a group drops too far, it is possible for the remaining nodes in the group to not have sufficient capacity to take over storage and serving for the replicas they now must assume responsibility for. Such situations are likely to result in increased latencies and/or feed rejections caused by resource exhaustion. Setting this tuning parameter allows the system to instead automatically take down the remaining nodes in the group, allowing feed and query traffic to fail completely over to the remaining groups.

The value is a decimal number in the range [0, 1]. Default is 0, which means the automatic group out-of-service functionality will not take effect.

Example: assume a cluster has been configured with n groups of 4 nodes each and the following tuning config:

<tuning>
  <min-node-ratio-per-group>0.75</min-node-ratio-per-group>
</tuning>
This tuning allows for 1 node in a group to be down. If 2 or more nodes go down, all nodes in the group will be marked as down, letting the n-1 remaining groups handle all the traffic.

This configuration can be changed live as the system is running and altered limits will take effect immediately.

distribution (in tuning)

Contained in tuning. Lets you tune the distribution algorithm used in the cluster. Attributes:

  • type (optional): loose | strict | legacy. Defaults to loose.

    When the number of nodes configured in a system changes beyond certain limits, the system will automatically trigger major redistributions of documents. This is to ensure that the number of buckets is appropriate for the number of nodes in the cluster. This enum value specifies how aggressively the system should trigger such distribution changes.

    The default of loose strikes a balance between rarely altering the distribution of the cluster and keeping the skew in document distribution low. It is recommended that you use the default mode unless you have empirically observed that it causes too much skew in load or document distribution.

    Note that specifying minimum-bits under bucket-splitting overrides this setting and effectively "locks" the distribution in place.
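The type is set as an attribute on the distribution element inside tuning; a sketch, here choosing strict to minimize distribution skew at the cost of more frequent redistributions:

```xml
<tuning>
  <distribution type="strict"/>
</tuning>
```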

maintenance

Contained in tuning. Controls the running time of the bucket maintenance process. Bucket maintenance verifies bucket content for corruption. Most users should not need to tweak this. Attributes:

  • start (required): Time string in HH:MM form, e.g. 02:00. Start of the daily maintenance window.
  • stop (required): Time string in HH:MM form, e.g. 05:00. End of the daily maintenance window.
  • high (required): Weekday name, e.g. monday. Day of the week on which to start the full file verification cycle (more costly than partial file verification).
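The attributes above are set on the maintenance element inside tuning; a sketch using the example values from the attribute descriptions:

```xml
<tuning>
  <maintenance start="02:00" stop="05:00" high="monday"/>
</tuning>
```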

merges

Contained in tuning. Defines throttling parameters for bucket merge operations. Attributes:

  • max-per-node (optional): Maximum number of parallel active bucket merge operations.
  • max-queue-size (optional): Maximum size of the merge bucket queue, before reporting BUSY back to the distributors.
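A sketch of the merges element inside tuning; the attribute values here are illustrative, not defaults:

```xml
<tuning>
  <merges max-per-node="16" max-queue-size="1024"/>
</tuning>
```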

persistence-threads

Contained in tuning. Defines the number of persistence threads per partition on each content node. A content node executes bucket operations against the persistence engine synchronously in each of these threads. By default 8 threads are used. Override with the count attribute.
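A sketch of overriding the thread count, here keeping the default of 8 explicit:

```xml
<tuning>
  <persistence-threads count="8"/>
</tuning>
```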

visitors

Contained in tuning. Tuning parameters for visitor operations. Might contain max-concurrent. Attributes:

  • thread-count (optional): The maximum number of threads in which to execute visitor operations. A higher number of threads may increase performance, but may use more memory.
  • max-queue-size (optional): Maximum size of the pending visitor queue, before reporting BUSY back to the distributors.

max-concurrent

Contained in visitors. Defines how many visitors can be active concurrently on each storage node. The number allowed depends on priority - lower priority visitors should not block higher priority visitors completely. To implement this, specify a fixed and a variable number. The maximum active is calculated by adjusting the variable component using the priority, and adding the fixed component. Attributes:

  • fixed (optional, number, default 16): The fixed component of the maximum active count.
  • variable (optional, number, default 64): The variable component of the maximum active count.
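A sketch of the visitors element inside tuning, with max-concurrent nested as described above; the thread-count and max-queue-size values are illustrative:

```xml
<tuning>
  <visitors thread-count="16" max-queue-size="1024">
    <max-concurrent fixed="16" variable="64"/>
  </visitors>
</tuning>
```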

dispatch

Contained in tuning. Tune the query dispatch behavior - child elements:

  • max-hits-per-partition (optional, integer, default: no capping - return all):

    Maximum number of hits to return from a content node. By default, a query returns the requested number of hits + offset from every content node to the container. The container orders the hits globally according to the query, then discards all hits beyond the number requested.

    In a system with a large fan-out, this consumes network bandwidth, container nodes easily become network saturated, and containers sort and discard more hits than necessary.

    When there are sufficiently many search nodes, and assuming an even distribution of the hits, it suffices to return only a fraction of the requested number of hits from each node. Note that changing this number affects global ordering. See top-k-probability below for improving performance while fetching fewer hits.

  • top-k-probability (optional, double, default 0.9999):

    Probability that the top K hits returned are the globally best. Based on this probability, the dispatcher fetches just enough hits from each node to achieve it. The only way to guarantee a probability of 1.0 is to fetch K hits from each partition. However, by reducing the probability from 1.0 to 0.99999, the number of hits fetched can be reduced significantly, saving both bandwidth and latency. The number of hits to fetch from each partition is computed as:

    $$q = \frac{k}{n} + q_T(p, 30) \times \sqrt{k \times \frac{1}{n} \times \left(1 - \frac{1}{n}\right)}$$

    where qT is the quantile function of a Student's t-distribution with 30 degrees of freedom. With n=10 partitions, k=200 hits and p=0.99999, only 45 hits per partition are needed, as opposed to 200 when p=1.0.

    Use this option to reduce network and container CPU/memory in clusters with many nodes per group - see the Vespa Serving Scaling Guide.

  • dispatch-policy (optional, round-robin | adaptive, default adaptive):

    Policy for choosing which group receives the next query request. Multi-phase requests that require or benefit from hitting the same group in all phases are always hashed to a group. Relevant only for grouped distribution:
    round-robin round-robins between the groups, putting uniform load on them.
    adaptive measures latency and prefers lower-latency groups; useful for heterogeneous groups, and handles soft-failing or slow nodes in a group better than round-robin.

  • min-group-coverage (optional, float percentage, default 100): With grouped distribution: the percentage of nodes in a group that must be up for the group to be used in queries.

  • min-active-docs-coverage (optional, float percentage, default 97): With grouped distribution: the percentage of active documents a group needs to have, compared to the average of the other groups, for the group to be active for serving queries. Because of measurement timing differences, it is not advisable to tune this above 99 percent.
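The top-k-probability formula above can be sketched in Python. This is an illustrative approximation, not Vespa's implementation: the standard library has no Student's t quantile, so a normal quantile stands in for qT(p, 30), which slightly underestimates q at extreme tail probabilities (scipy.stats.t.ppf(p, 30) would match the formula exactly).

```python
import math
from statistics import NormalDist

def hits_per_partition(k: int, n: int, p: float) -> int:
    """Estimate hits to fetch per partition so that the merged top-k
    is globally correct with probability p.

    q = k/n + qT(p, 30) * sqrt(k * (1/n) * (1 - 1/n)),
    with a normal quantile approximating the t-quantile qT(p, 30).
    """
    q_t = NormalDist().inv_cdf(p)  # stand-in for qT(p, 30)
    q = k / n + q_t * math.sqrt(k * (1 / n) * (1 - 1 / n))
    return math.ceil(q)

# With 10 partitions and k=200, far fewer than 200 hits per
# partition suffice even at a very high probability.
print(hits_per_partition(200, 10, 0.99999))
```

Note how the per-partition count approaches k/n as n grows: the savings come from fan-out, so this matters most in clusters with many nodes per group.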

cluster-controller

Contained in tuning. Tuning parameters for the cluster controller managing this cluster - child elements:

  • init-progress-time (optional): If a node's initialization progress count has not changed for this many seconds, the node is assumed to have deadlocked and is set down. Note that initialization may be prioritized lower than regular operations, so a low value here can cause false positives. If a node is set down for the wrong reason, it will be set up again once it finishes initializing.
  • transition-time (optional, defaults storage_transition_time / distributor_transition_time): How long (in milliseconds) a node stays in maintenance mode during what looks like a controlled restart. Keeping a node in maintenance mode during a restart allows restarting without the cluster immediately trying to create new copies of all the data. If the node has not started initializing or come back up within the transition time, the node is set down, in which case new full bucket copies will be created. Note the separate defaults for distributor and storage (i.e. search) nodes.
  • max-premature-crashes (optional, default max_premature_crashes): The maximum number of crashes allowed before a content node is permanently set down by the cluster controller. If the node has had a stable up or down state for more than the stable-state-period, the crash count is reset. However, resetting the count will not re-enable a node that has been disabled - restart the cluster controller to reset it.
  • stable-state-period (optional, default stable_state_time_period): If a content node's state does not change for this many seconds, its state is considered stable, clearing the premature crash count.
  • min-distributor-up-ratio (optional, default min_distributor_up_ratio): The minimum ratio of distributors that must be up for the cluster state to be up.
  • min-storage-up-ratio (optional, default min_storage_up_ratio): The minimum ratio of content nodes that must be up for the cluster state to be up.
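A sketch of the cluster-controller element inside tuning, using the child elements described above; the values are illustrative, not defaults:

```xml
<tuning>
  <cluster-controller>
    <transition-time>60000</transition-time>
    <max-premature-crashes>4</max-premature-crashes>
    <stable-state-period>3600</stable-state-period>
  </cluster-controller>
</tuning>
```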