Attribute Memory Usage

Attributes are field-level, in-memory data structures that enable functionality like sorting, grouping, and ranking. Because attributes are stored in memory, it is important to have enough memory to avoid swapping and general unresponsiveness. Attribute structures are regularly optimized, which causes temporary extra resource usage; read more in Proton maintenance jobs.

Data types

The memory footprint of an attribute depends on a few factors, data type being the most important:

  • Numeric (int, long, byte, and double) and Boolean (bit) types - fixed length and fixed cost per document
  • String type - the footprint depends on the length of the strings and how many unique strings need to be stored

Collection types like arrays and weighted sets increase memory usage somewhat, but the main factor is the average number of values per document. String attributes are typically the largest attributes and require the most memory during initialization, so use boolean or numeric types where possible.

Example

search foo {
    document bar {
        field titles type array<string> {
            indexing: summary | attribute
        }
    }
}

Refer to the formulas below. Assume an average of 5 values per document and an average string length of 10. Then usage during initialization is 5 * (10 + 32) = 210 bytes per document; with 10 million documents, that becomes 2,100,000,000 bytes, or roughly 2GB of attribute data. Increasing the average number of values per document to 10 (doubling it) also doubles the memory footprint during initialization (roughly 4GB). The steady-state attribute footprint is lower, but when doing capacity planning, keep in mind the maximum footprint, which occurs during initialization. For the steady-state footprint, the number of unique values is very important for string attributes.
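The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the worked example: the 32-byte per-value initialization overhead is taken as-is from the text and is not derived here, and the constant and function names are illustrative.

```python
# Reproduces the initialization estimate for the array<string> "titles" example.
AVG_VALUES_PER_DOC = 5
AVG_STRING_LENGTH = 10
PER_VALUE_OVERHEAD = 32  # bytes per value during initialization, as stated in the text
NUM_DOCS = 10_000_000


def init_bytes(num_docs: int, avg_values: int, avg_len: int) -> int:
    """Peak (initialization) bytes for an array<string> attribute."""
    per_doc = avg_values * (avg_len + PER_VALUE_OVERHEAD)  # 5 * (10 + 32) = 210
    return num_docs * per_doc


base = init_bytes(NUM_DOCS, AVG_VALUES_PER_DOC, AVG_STRING_LENGTH)
doubled = init_bytes(NUM_DOCS, 2 * AVG_VALUES_PER_DOC, AVG_STRING_LENGTH)
print(f"{base:,} bytes ({base / 1e9:.1f} GB)")        # 2,100,000,000 bytes (2.1 GB)
print(f"{doubled:,} bytes ({doubled / 1e9:.1f} GB)")  # 4,200,000,000 bytes (4.2 GB)
```

Because the estimate is linear in the number of values per document, doubling `avg_values` doubles the result.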

Check the Example attribute sizing spreadsheet, which covers various data types and collection types for a simple search application. It also contains estimates of how many documents a 16GB RAM node can hold.

Attributes can be configured with fast-search, which impacts the memory footprint:

  • Setting fast-search is not recommended unless the attribute is queried without other, more restrictive terms that are indexed
  • fast-search increases steady-state memory usage for all attribute types and also adds initialization overhead for numeric types

search foo {
    document bar {
        field titles type array<string> {
            indexing: summary | attribute
            attribute: fast-search
        }
    }
}

Sizing

Attribute sizing is an approximation rather than an exact science, because attribute sizes vary: the number of documents, the number of values per document, and the uniqueness of the values all differ between applications. The components of attribute memory usage are listed below, using these concepts:

Abbreviation | Concept | Comment
D | Number of documents | Number of documents on the node, or rather the maximum number of document IDs allocated
V | Average number of values per document | Only applicable for arrays and weighted sets
U | Number of unique values | Only applies to strings, or if fast-search is set
FW | Fixed data width | sizeof(T) for numerics, 1 byte for strings, 1 bit for booleans
WW | Weight width | Width of the weight in a weighted set, 4 bytes
EW | Enum index width | Width of the enum index, 4 bytes. Used by all strings, and by other attributes if fast-search is set
VW | Variable data width | strlen(s) for strings, 0 bytes for the rest
PW | Posting width | Width of a posting list entry: 4 bytes with fast-search, 4+4 bytes for arrays and weighted sets
IW | Index width | Width of the index, 4 bytes
ROF | Resize overhead factor | Default is 6/5: the average overhead in any dynamic vector due to its resize strategy. The resize strategy grows a structure by 50%, so it is 5/6 full on average
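Where the 6/5 factor comes from can be checked with a small simulation. This is a sketch under a simplified, hypothetical model: a vector whose capacity grows by 50% whenever it is full, with the fill ratio sampled after every insert over complete growth cycles. It is not Proton's actual allocator.

```python
# Simplified model: capacity grows by 50% each time the vector is full.
def average_fill_ratio(initial_capacity: int = 1024, growth_steps: int = 12) -> float:
    capacity = initial_capacity
    size = capacity  # start full, right before the first resize
    total_ratio = 0.0
    samples = 0
    for _ in range(growth_steps):
        capacity += capacity // 2  # grow by 50%
        # Fill the new capacity one element at a time, sampling after each insert.
        while size < capacity:
            size += 1
            total_ratio += size / capacity
            samples += 1
    return total_ratio / samples


fill = average_fill_ratio()
# Allocated / used memory is 1 / fill, which is the resize overhead factor.
print(f"average fill: {fill:.3f}, resize overhead factor: {1 / fill:.3f}")
```

Within each cycle the fill ratio rises roughly linearly from 2/3 to 1, so it averages about 5/6, giving an overhead factor close to 6/5.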

Components

Component | Formula | Approx. factor | Applies to
Document vector | D * ((FW or EW) or IW) | ROF | FW for single-value numeric attributes, IW for multivalue attributes, EW for single-value strings and single-value fast-search attributes
Multivalue mapping | D * V * (FW or EW) | ROF | Applicable only for arrays and weighted sets. EW for strings or with fast-search
Enum store | U * (FW + VW + 4) | ROF | Applicable for strings, or if fast-search is set
Posting list | D * V * PW | ROF | Applicable only if fast-search is set

Variants

Type | Components | Formula
Numeric single-value, plain | Document vector | D * FW * ROF
Numeric multivalue, plain | Document vector, Multivalue mapping | D * IW * ROF + D * V * FW * ROF
Numeric single-value, fast-search | Document vector, Enum store, Posting list | D * EW * ROF + D * PW * ROF + U * (FW + 4) * ROF
Numeric multivalue, fast-search | Document vector, Multivalue mapping, Enum store, Posting list | D * IW * ROF + D * V * EW * ROF + U * (FW + 4) * ROF + D * V * PW * ROF
String single-value, fast-search | Document vector, Enum store, Posting list | D * EW * ROF + U * (FW + VW + 4) * ROF + D * PW * ROF
String single-value, plain | Document vector, Enum store | D * EW * ROF + U * (FW + VW + 4) * ROF
String multivalue, plain | Document vector, Multivalue mapping, Enum store | D * IW * ROF + D * V * EW * ROF + U * (FW + VW + 4) * ROF
String multivalue, fast-search | Document vector, Multivalue mapping, Enum store, Posting list | D * IW * ROF + D * V * EW * ROF + U * (FW + VW + 4) * ROF + D * V * PW * ROF
Boolean single-value | Document vector | D * FW * ROF
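The variants above can be folded into a single estimator. This is a minimal sketch that assumes the component rules exactly as tabulated (EW = IW = 4 bytes, PW = 4 or 4+4 bytes, ROF = 6/5); the function and parameter names are hypothetical, and WW is left out because none of the formulas above use it. The unique-value count in the usage line is also a made-up figure, since the example does not state one.

```python
EW = 4       # enum index width, bytes
IW = 4       # index width, bytes
ROF = 6 / 5  # default resize overhead factor


def estimate_attribute_bytes(
    num_docs: int,            # D
    fixed_width: float,       # FW: sizeof(T) for numerics, 1 for strings, 1/8 for booleans
    *,
    is_string: bool = False,
    multivalue: bool = False,  # array or weighted set
    fast_search: bool = False,
    avg_values: float = 1,     # V, only used for multivalue attributes
    unique_values: int = 0,    # U, only used for strings or fast-search
    avg_string_len: float = 0, # VW, only used for strings
) -> float:
    """Approximate attribute memory in bytes, following the Variants table."""
    uses_enum = is_string or fast_search
    values = avg_values if multivalue else 1

    # Document vector: IW for multivalue, EW for enum-backed single-value, else FW.
    if multivalue:
        total = num_docs * IW
    elif uses_enum:
        total = num_docs * EW
    else:
        total = num_docs * fixed_width

    # Multivalue mapping: one entry (EW or FW) per value.
    if multivalue:
        total += num_docs * values * (EW if uses_enum else fixed_width)

    # Enum store: each unique value plus 4 bytes.
    if uses_enum:
        total += unique_values * (fixed_width + avg_string_len + 4)

    # Posting list: only with fast-search; PW is 4, or 4+4 for multivalue.
    if fast_search:
        posting_width = 8 if multivalue else 4
        total += num_docs * values * posting_width

    return total * ROF


# The array<string> "titles" field with a hypothetical 1M unique titles.
titles = estimate_attribute_bytes(
    10_000_000, 1, is_string=True, multivalue=True,
    avg_values=5, unique_values=1_000_000, avg_string_len=10,
)
print(f"titles estimate: {titles / 1e9:.2f} GB")
```

Keeping each component as a separate `if` block mirrors the Components table, so a row in the Variants table corresponds to one combination of the three flags.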

Enum Store

The attribute enum store has an upper size limit, which limits the number of unique values in an attribute. Multivalue attributes have a separate limit.

Resource | Default | Metric | Description
Attribute enum store | writefilter.attribute.enumstorelimit | content.proton.documentdb.attribute.resource_usage.enum_store | For string attribute fields or attribute fields with fast-search, there is a 32GB max limit on the size of the unique values stored for that attribute. The component storing these values is called the enum store.
Attribute multivalue | writefilter.attribute.multivaluelimit | content.proton.documentdb.attribute.resource_usage.multi_value | For array or weighted set attribute fields, there is a max limit on the number of documents that can have the same number of values. The limit is 2 billion documents per node.

When a limit is exceeded, feed operations are rejected with an error. Sample:
Detail resultType=FATAL_ERROR exception=
'ReturnCode(NO_SPACE, Put operation rejected for document 'id:mynamespace:mydoc::123456' of type 'mydoc':
'enumStoreLimitReached: {
  action: "add more content nodes",
  reason: "enum store address space used (0.92813) > limit (0.9)",
  enumStore: { used: 31890298144, dead: 0, limit: 34359738368},
  attributeName: "text", subdb: "ready"}')'
endpoint=vespa1:8080 ssl=false resultTimeLocally=1532685239428

A similar message is emitted when a multivalue attribute has too many values.

To fix a problem with too many values, add content nodes to distribute documents with attributes over more nodes.

Use the metrics content.proton.documentdb.attribute.resource_usage.enum_store.average and content.proton.documentdb.attribute.resource_usage.multi_value.average to track usage.
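A monitoring check on these metrics could look like the sketch below. It assumes the metric values are usage fractions comparable to the "address space used (0.92813) > limit (0.9)" figures in the sample error, and that the limit is 0.9 as in that sample. The `check_usage` helper and its warning margin are hypothetical choices, not part of Vespa.

```python
# Hypothetical helper: classify an attribute resource-usage fraction
# (e.g. the enum_store or multi_value metric average) against a limit.
def check_usage(name: str, usage: float, limit: float = 0.9, warn_margin: float = 0.1) -> str:
    if usage > limit:
        return f"{name}: BLOCKED ({usage:.5f} > limit {limit}), add more content nodes"
    if usage > limit - warn_margin:
        return f"{name}: WARNING ({usage:.5f}, limit {limit})"
    return f"{name}: OK ({usage:.5f})"


# Byte counts from the sample error message: enumStore used / limit.
used_bytes, limit_bytes = 31890298144, 34359738368
print(check_usage("enum_store", used_bytes / limit_bytes))
```

With the sample numbers the ratio is about 0.92813, above the 0.9 limit, matching the rejected Put operation in the error.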