Attribute Memory Usage

Attributes are field-level, in-memory data structures that enable functionality like sorting, grouping, and ranking. As attributes are stored in memory, it is important to have enough memory to avoid swapping and general unresponsiveness. Attribute structures are regularly optimized, which temporarily increases resource usage - read more in Proton maintenance jobs.

Data types

The memory footprint of an attribute depends on a few factors, data type being the most important:

  • Numeric (int, long, byte, and double) and Boolean (bit) types - fixed length and fixed cost per document
  • String type - the footprint depends on the length of the strings and how many unique strings need to be stored.

Collection types like arrays and weighted sets increase memory usage somewhat, but the main factor is the average number of values per document. String attributes are typically the largest attributes and require the most memory during initialization - use boolean/numeric types where possible.

Example

search foo {
    document bar {
        field titles type array<string> {
            indexing: summary | attribute
        }
    }
}

Refer to the formulas below. Assume an average of 5 values per document and an average string length of 10. Usage is then 5*(10 + 32) = 210 bytes per document during initialization; with 10 million documents that becomes 2,100,000,000 bytes, or roughly 2GB of attribute data. Increasing the average number of values per document to 10 (double) also doubles the memory footprint during initialization (roughly 4GB). The steady state attribute footprint will be lower, but when doing capacity planning, keep in mind the maximum footprint, which occurs during initialization. For the steady state footprint, the number of unique values is very important for string attributes.
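The arithmetic in this example can be reproduced with a few lines of Python. This is only a sketch of the estimate described above: the function name and parameters are hypothetical, and the 32 bytes added per value is taken directly from the example, not derived from the formulas below.

```python
def init_footprint_bytes(num_docs: int, avg_values: int, avg_str_len: int,
                         per_value_overhead: int = 32) -> int:
    """Rough initialization-time footprint of an array<string> attribute.

    per_value_overhead=32 is the constant used in the example text
    (an assumption carried over as-is, not a measured value).
    """
    return num_docs * avg_values * (avg_str_len + per_value_overhead)


base = init_footprint_bytes(10_000_000, avg_values=5, avg_str_len=10)
doubled = init_footprint_bytes(10_000_000, avg_values=10, avg_str_len=10)

print(f"{base:,} bytes")     # 2,100,000,000 bytes
print(f"{doubled:,} bytes")  # 4,200,000,000 bytes
```

The estimate is linear in the number of values per document, so doubling that number doubles the result.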

Check the Example attribute sizing spreadsheet, with various data types and collection types for a simple search application. It also contains estimates for how many documents a 16GB RAM node can hold.

Attributes can be configured with fast-search - this impacts memory footprint:

  • Setting fast-search is not recommended unless the attribute is queried without any other, more restrictive, indexed terms
  • fast-search increases steady state memory usage for all attribute types and also adds initialization overhead for numeric types
search foo {
    document bar {
        field titles type array<string> {
            indexing:  summary | attribute
            attribute: fast-search
        }
    }
}

Sizing

Attribute sizing is an approximation rather than an exact science, because attribute size varies with the number of documents, the number of values per document, and the uniqueness of the values. The concepts below are used to describe the attribute components that occupy memory:

| Abbreviation | Concept | Comment |
|---|---|---|
| D | Number of documents | Number of documents on the node, or rather the maximum number of local document ids allocated |
| V | Average number of values per document | Only applicable for arrays and weighted sets |
| U | Number of unique values | Only applies for strings or if fast-search is set |
| FW | Fixed data width | sizeof(T) for numerics, 1 byte for strings, 1 bit for boolean |
| WW | Weight width | Width of the weight in a weighted set, 4 bytes |
| EIW | Enum index width | Width of the index into the enum store, 4 bytes. Used by all strings and other attributes if fast-search is set |
| VW | Variable data width | strlen(s) for strings, 0 bytes for the rest |
| PW | Posting entry width | Width of a posting list entry, 4 bytes for singlevalue, 8 bytes for array and weighted sets. Only applies if fast-search is set |
| PIW | Posting index width | Width of the index into the store of posting lists; 4 bytes |
| MIW | Multivalue index width | Width of the index into the multivalue mapping; 4 bytes |
| ROF | Resize overhead factor | Default is 6/5. This is the average overhead in any dynamic vector due to resizing strategy. Resize strategy is 50% indicating that structure is 5/6 full on average. |

Components

| Component | Formula | Approx Factor | Applies to |
|---|---|---|---|
| Document vector | D * ((FW or EIW) or MIW) | ROF | FW for singlevalue numeric attributes and MIW for multivalue attributes. EIW for singlevalue string or if the attribute is singlevalue fast-search |
| Multivalue mapping | D * V * (FW or EIW) | ROF | Applicable only for array or weighted sets. EIW if string or fast-search |
| Enum store | U * ((FW + VW) + 4 + ((EIW + PIW) or EIW)) | ROF | Applicable for strings or if fast-search is set. (EIW + PIW) if fast-search is set, EIW otherwise. |
| Posting list | D * V * PW | ROF | Applicable if fast-search is set |

Variants

| Type | Components | Formula |
|---|---|---|
| Numeric singlevalue plain | Document vector | D * FW * ROF |
| Numeric multivalue plain | Document vector, Multivalue mapping | D * MIW * ROF + D * V * FW * ROF |
| Numeric singlevalue fast-search | Document vector, Enum store, Posting list | D * EIW * ROF + U * (FW+4+EIW+PIW) * ROF + D * PW * ROF |
| Numeric multivalue fast-search | Document vector, Multivalue mapping, Enum store, Posting list | D * MIW * ROF + D * V * EIW * ROF + U * (FW+4+EIW+PIW) * ROF + D * V * PW * ROF |
| Singlevalue string plain | Document vector, Enum store | D * EIW * ROF + U * (FW+VW+4+EIW) * ROF |
| Singlevalue string fast-search | Document vector, Enum store, Posting list | D * EIW * ROF + U * (FW+VW+4+EIW+PIW) * ROF + D * PW * ROF |
| Multivalue string plain | Document vector, Multivalue mapping, Enum store | D * MIW * ROF + D * V * EIW * ROF + U * (FW+VW+4+EIW) * ROF |
| Multivalue string fast-search | Document vector, Multivalue mapping, Enum store, Posting list | D * MIW * ROF + D * V * EIW * ROF + U * (FW+VW+4+EIW+PIW) * ROF + D * V * PW * ROF |
| Boolean singlevalue | Document vector | D * FW * ROF |
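
The variant formulas above can be evaluated with a small calculator. The following Python sketch is illustrative only: the function name `attribute_bytes` and its keyword arguments are hypothetical, the widths are the byte values listed in the concepts table, and the weight width (WW) is ignored because the formulas as listed do not include it. Boolean attributes (FW of 1 bit) are not covered.

```python
from fractions import Fraction

ROF = Fraction(6, 5)  # resize overhead factor (default 6/5)
EIW = 4               # enum index width, bytes
PIW = 4               # posting index width, bytes
MIW = 4               # multivalue index width, bytes


def attribute_bytes(D, *, string=False, multivalue=False, fast_search=False,
                    FW=1, VW=0, V=1, U=0):
    """Estimate attribute size in bytes from the variant formulas.

    D: documents, V: avg values per document, U: unique values,
    FW: fixed data width (1 for strings), VW: avg string length.
    """
    uses_enum = string or fast_search       # enum store applies
    value_width = EIW if uses_enum else FW  # width stored per value
    if multivalue:
        # document vector + multivalue mapping
        total = D * MIW + D * V * value_width
    else:
        V = 1
        total = D * value_width             # document vector only
    if uses_enum:
        # enum store; PIW is added only with fast-search
        total += U * (FW + VW + 4 + EIW + (PIW if fast_search else 0))
    if fast_search:
        PW = 8 if multivalue else 4         # posting entry width
        total += D * V * PW                 # posting list
    return round(total * ROF)


# 10 million documents, singlevalue int (FW = 4)
print(attribute_bytes(10_000_000, FW=4))                       # 48000000
# array<string>, 5 values per document, 1 million unique strings of length 10
print(attribute_bytes(10_000_000, string=True, multivalue=True,
                      VW=10, V=5, U=1_000_000))                # 310800000
# same attribute with fast-search
print(attribute_bytes(10_000_000, string=True, multivalue=True,
                      fast_search=True, VW=10, V=5, U=1_000_000))  # 795600000
```

With these assumed inputs, enabling fast-search more than doubles the estimate, mostly because of the posting list term D * V * PW.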

Multivalue Mapping and Enum Store

The attribute Multivalue Mapping and Enum Store have upper limits, which limit the number of values in an attribute.

| resource | default | metric | description |
|---|---|---|---|
| attribute enum store | writefilter.attribute.enumstorelimit | content.proton.documentdb.attribute.resource_usage.enum_store | For string attribute fields or attribute fields with fast-search, there is a max limit on the number of unique values that can be stored for that attribute. The limit is approx 2 billion. The component storing these values is called enum store |
| attribute multivalue | writefilter.attribute.multivaluelimit | content.proton.documentdb.attribute.resource_usage.multi_value | For array or weighted set attribute fields, there is a max limit on the number of documents that can have the same number of values. The limit is 2 billion documents per node |

An error is emitted when exceeding the limit - sample:
Detail resultType=FATAL_ERRORexception=
'ReturnCode(NO_SPACE, Put operation rejected for document 'id:mynamespace:mydoc::123456' of type 'mydoc':
'enumStoreLimitReached: {
  action: "add more content nodes",
  reason: "enum store address space used (0.92813) > limit (0.9)",
  enumStore: { used: 31890298144, dead: 0, limit: 34359738368},
  attributeName: "text", subdb: "ready"}')'
endpoint=vespa1:8080 ssl=false resultTimeLocally=1532685239428
A similar message is emitted for too many values for a multivalue attribute.
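
The numbers in the sample message are consistent with each other: dividing `used` by `limit` from the `enumStore` object gives the address space ratio reported in `reason`. The Python sketch below checks this. The function name is hypothetical, and reading the ratio as used / limit is an inference from the sample values, not a documented definition.

```python
def address_space_ratio(used: int, limit: int) -> float:
    """Fraction of the address space in use (assumed to be used / limit)."""
    return used / limit


# Values copied from the sample error message above
ratio = address_space_ratio(used=31_890_298_144, limit=34_359_738_368)
feed_limit = 0.9  # the limit quoted in the "reason" field

print(f"{ratio:.5f}")      # 0.92813
print(ratio > feed_limit)  # True, so the Put operation is rejected
```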

To fix a problem with too many values, add content nodes to distribute documents with attributes over more nodes.

Use metrics content.proton.documentdb.attribute.resource_usage.enum_store.average and content.proton.documentdb.attribute.resource_usage.multi_value.average to track usage.