Vespa documents are created according to the Document JSON Format or constructed programmatically. Options for writing Documents to Vespa:
The CRUD operations are the four basic functions of persistent storage:
| Operation | Description |
|-----------|-------------|
| Put | Writes a document to Vespa. A document is a set of name-value pairs referred to as fields. The fields available for a given document are defined by the document type, provided by the application's search definition - see field types. A document is overwritten if a document with the same document ID exists and no test-and-set condition is given. By specifying a test-and-set condition, one can perform a conditional put that only executes if the condition matches the already existing document. |
| Remove | Removes a document from Vespa. Later requests to access the document will not find it - read more about remove-entries. If the document to be removed is not found, this is returned in the reply; it is not considered a failure. Like put and update, remove supports a test-and-set condition, removing the document only when the condition (a document selection) matches the document. |
| Update | Also referred to as a partial update, as it updates parts of a document. If the document to update does not exist, the reply states that no document was found. A test-and-set condition can be specified for updates - example usage is updating only documents with given timestamps. |
| Get | Returns the newest document instance. The get reply includes the last-updated timestamp of the document, stating when it was last written. |
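The operations above map directly to the Document JSON Format mentioned in the introduction. The sketch below builds the JSON bodies for a put, a conditional put, a partial update and a remove; the `music` document type, its fields and the namespace are hypothetical examples, not part of any real application.

```python
import json

# Hypothetical document id: namespace "mynamespace", document type "music".
doc_id = "id:mynamespace:music::a-head-full-of-dreams"

# Put: writes all fields, overwriting any existing document with this id.
put_op = {"put": doc_id, "fields": {"artist": "Coldplay", "year": 2015}}

# Conditional put: only executed if the stored document matches the
# test-and-set condition (a document selection expression).
conditional_put = {
    "put": doc_id,
    "condition": 'music.artist == "Coldplay"',
    "fields": {"artist": "Coldplay", "year": 2016},
}

# Update (partial update): only the listed fields are changed.
update_op = {"update": doc_id, "fields": {"year": {"assign": 2016}}}

# Remove: deletes the document; removing a non-existent id is not a failure.
remove_op = {"remove": doc_id}

print(json.dumps([put_op, conditional_put, update_op, remove_op], indent=2))
```

Note how the update names each field with an operation (`assign`) instead of a plain value - that is what makes it partial.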
The Document API uses the document identifier to implement ordering. Documents with the same identifier have the same serialize id, and a Document API client ensures that only one operation with a given serialize id is pending at a time. This ensures that if a client sends multiple operations for the same document, they are processed in a defined order.
Note: If two put operations are sent to the same document and the first one fails, the second, already enqueued, operation is still sent. If the client then simply resends the failed request, the order of the two operations has effectively been switched.
If different clients have pending operations towards the same document, the order of the operations is undefined.
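The per-document ordering rule can be illustrated with a small sketch (this is not Vespa client code, just a model of the behavior): at most one operation per document id is in flight, and later operations for the same id wait in a FIFO queue, so they are dispatched in send order.

```python
from collections import defaultdict, deque

class SerializingClient:
    """Illustrative model: one pending operation per document id."""

    def __init__(self, send):
        self.send = send                   # callable that dispatches an operation
        self.pending = set()               # doc ids with an operation in flight
        self.queues = defaultdict(deque)   # queued operations per doc id

    def submit(self, doc_id, op):
        if doc_id in self.pending:
            self.queues[doc_id].append(op)  # wait for the in-flight operation
        else:
            self.pending.add(doc_id)
            self.send(doc_id, op)

    def on_reply(self, doc_id):
        # On reply (success or failure), dispatch the next queued operation
        # for that id, preserving the original submit order.
        if self.queues[doc_id]:
            self.send(doc_id, self.queues[doc_id].popleft())
        else:
            self.pending.discard(doc_id)
```

Note that `on_reply` dispatches the next operation regardless of whether the previous one succeeded - which is exactly why naively resending a failed request, as in the note above, reorders the operations.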
Write operations like put, update and remove are assigned a timestamp when passing through the distributor. This timestamp is guaranteed to be unique within the bucket where the document is stored, and is used by the content layer to decide which operation is newest. The timestamps may also be used when visiting, to only process/retrieve documents within a given time frame. To guarantee uniqueness, timestamps have microsecond resolution, and the microsecond part may be generated or altered to avoid conflicts with other documents.
The internal timestamp is often referred to as the last modified time - the time of the last write operation going through the distributor. Note that if documents are migrated from cluster to cluster, the target cluster assigns new timestamps to its entries, and when reprocessing documents within a cluster, documents get new timestamps even if not modified.
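A minimal sketch of how unique microsecond timestamps per bucket could be assigned (illustrative only, not the distributor's actual code): if the clock does not advance between two writes to the same bucket, the microsecond part is bumped to keep the timestamps unique and ordered.

```python
import time

class TimestampAllocator:
    """Per-bucket, strictly increasing microsecond timestamps."""

    def __init__(self):
        self.last = {}  # bucket -> last assigned timestamp (microseconds)

    def assign(self, bucket, now_us=None):
        ts = now_us if now_us is not None else int(time.time() * 1_000_000)
        # If the clock did not move past the previous timestamp for this
        # bucket, alter the microsecond part to avoid a conflict.
        if ts <= self.last.get(bucket, -1):
            ts = self.last[bucket] + 1
        self.last[bucket] = ts
        return ts
```

Uniqueness is only enforced within a bucket, matching the guarantee described above; two different buckets may hold identical timestamps.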
Feed operations fail when a cluster is at full capacity. The following limits will block feeding:
| Resource | Configuration | Metric | Description |
|----------|---------------|--------|-------------|
| disk | writefilter.disklimit | content.proton.resource_usage.disk | Configure disk limit |
| memory | writefilter.memorylimit | content.proton.resource_usage.memory | Configure memory limit |
| attribute enum store | writefilter.attribute.enumstorelimit | content.proton.documentdb.attribute.resource_usage.enum_store | For string attribute fields, or attribute fields with fast-search, there is a maximum limit on the size of the unique values stored for that attribute. The component storing these values is called the enum store. The limit is 32GB. |
| attribute multi-value | writefilter.attribute.multivaluelimit | content.proton.documentdb.attribute.resource_usage.multi_value | For array or weighted set attribute fields, there is a maximum limit on the number of documents that can have the same number of values. The limit is 128M (2^27) documents. To remedy, either change the attribute field to use huge, or add/change nodes. |
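The write filter's decision can be sketched as a comparison of each resource metric against its configured limit. The metric names below come from the table above; the limit values (0.8) are hypothetical examples, not defaults.

```python
# Hypothetical limits, keyed by the metric names from the table above.
# Each value is a relative usage threshold in [0.0, 1.0].
LIMITS = {
    "content.proton.resource_usage.disk": 0.8,    # writefilter.disklimit
    "content.proton.resource_usage.memory": 0.8,  # writefilter.memorylimit
}

def feed_blocked(metrics, limits=LIMITS):
    """Return the metrics whose current usage is at or above its limit."""
    return [name for name, limit in limits.items()
            if metrics.get(name, 0.0) >= limit]
```

When this list is non-empty, feed operations fail until usage drops below the limits again, e.g. by deleting documents or adding capacity.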