/document/v1 API reference

This is the /document/v1 API reference documentation. Use this API for synchronous document operations against a Vespa endpoint; refer to reads and writes for other options.

The document/v1 API guide has examples and use cases.

Note: The mapping from document IDs to /document/v1/ URLs is described in document IDs; also see troubleshooting.

Some examples use number and group document id modifiers. These are special cases that only work as expected for document types with mode=streaming or mode=store-only. Do not use group or number modifiers with regular indexed mode document types.

Configuration

To enable the API, add document-api to the serving container cluster in services.xml:

<services>
    <container>
        <document-api/>
    </container>
</services>

HTTP requests

HTTP request	document/v1 operation	Description
GET	Get a document by ID or Visit a set of documents by selection.
	Get	Get a document:
		/document/v1/<namespace>/<document-type>/docid/<document-id>
		/document/v1/<namespace>/<document-type>/number/<numeric-group-id>/<document-id>
		/document/v1/<namespace>/<document-type>/group/<text-group-id>/<document-id>
		Optional parameters: cluster, fieldSet, timeout, tracelevel
	Visit	Iterate over and get all documents, or a selection of documents, in chunks, using continuation tokens to track progress. Visits are a linear scan over the documents in the cluster.
		/document/v1/
		It is possible to specify namespace and document type with the visit path:
		/document/v1/<namespace>/<document-type>/docid
		Documents can be grouped to limit accesses to a subset. A group is defined by a numeric ID or string; see id scheme.
		/document/v1/<namespace>/<document-type>/group/<group>
		/document/v1/<namespace>/<document-type>/number/<number>
		Mandatory parameters:
		cluster - Visits can only retrieve data from one content cluster, so `cluster` must be specified for requests at the root `/document/v1/` level, or when there is ambiguity. This is required even if the application has only one content cluster.
		Optional parameters:
		bucketSpace - Parent documents are global and in the `global` bucket space. By default, visit will visit non-global documents in the `default` bucket space, unless a document type is specified and it is a global document type.
		concurrency - Use to configure backend parallelism for each visit HTTP request.
		continuation, fieldSet, selection, sliceId
		slices - To split visiting of the document corpus across more than one HTTP request, allowing the concurrent use of more HTTP containers, use the `slices` and `sliceId` parameters.
		stream - Enabling streamed HTTP responses with the stream parameter is recommended, as this reduces memory consumption and HTTP overhead.
		timeout, tracelevel, wantedDocumentCount, fromTimestamp, toTimestamp, includeRemoves
POST	Put a given document, by ID, or Copy a set of documents by selection from one content cluster to another.
	Put	Write the document contained in the request body in JSON format.
		/document/v1/<namespace>/<document-type>/docid/<document-id>
		/document/v1/<namespace>/<document-type>/group/<group>
		/document/v1/<namespace>/<document-type>/number/<number>
		Optional parameters:
		condition - Use for conditional writes.
		route, timeout, tracelevel
	Copy	Write documents visited in source cluster to the destinationCluster in the same application. A selection is mandatory, typically the document type. Supported paths (see visit above for semantics):
		/document/v1/
		/document/v1/<namespace>/<document-type>/docid/
		/document/v1/<namespace>/<document-type>/group/<group>
		/document/v1/<namespace>/<document-type>/number/<number>
		Mandatory parameters: cluster, destinationCluster, selection
		Optional parameters: bucketSpace, continuation, timeChunk, timeout, tracelevel
PUT	Update a document with the given partial update, by ID, or Update where the given selection is true.
	Update	Update a document with the partial update contained in the request body in the document update JSON format.
		/document/v1/<namespace>/<document-type>/docid/<document-id>
		Optional parameters:
		condition - Use for conditional writes.
		create - Use to create empty documents when updating non-existent ones.
		route, timeout, tracelevel
	Update where	Update visited documents in cluster with the partial update contained in the request body in the document update JSON format. Supported paths (see visit above for semantics):
		/document/v1/<namespace>/<document-type>/docid/
		/document/v1/<namespace>/<document-type>/group/<group>
		/document/v1/<namespace>/<document-type>/number/<number>
		Mandatory parameters: cluster, selection
		Optional parameters:
		bucketSpace - See visit, `default` or `global` bucket space.
		continuation, stream, timeChunk, timeout, tracelevel
DELETE	Remove a document, by ID, or Remove where the given selection is true.
	Remove	Remove a document.
		/document/v1/<namespace>/<document-type>/docid/<document-id>
		Optional parameters: condition, route, timeout, tracelevel
	Delete where	Delete visited documents from cluster. Supported paths (see visit above for semantics):
		/document/v1/
		/document/v1/<namespace>/<document-type>/docid/
		/document/v1/<namespace>/<document-type>/group/<group>
		/document/v1/<namespace>/<document-type>/number/<number>
		Mandatory parameters: cluster, selection
		Optional parameters:
		bucketSpace - See visit, `default` or `global` bucket space.
		continuation, stream, timeChunk, timeout, tracelevel
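
As an illustration of the single-document paths in the table above, the following Python sketch assembles a path from its parts. The helper name `document_path` is hypothetical and not part of any Vespa client. It follows the three Get path forms, and percent-encoding each segment is an assumption about how IDs with special characters should be passed; the authoritative mapping is the one described in document IDs.

```python
from urllib.parse import quote


def document_path(namespace, doctype, doc_id, group=None, number=None):
    """Build a single-document /document/v1 path (hypothetical helper)."""
    if group is not None and number is not None:
        raise ValueError("use at most one of group and number")
    # Assumption: every segment is percent-encoded, so a '/' inside an ID
    # cannot be mistaken for a path separator.
    parts = ["document", "v1", quote(namespace, safe=""), quote(doctype, safe="")]
    if group is not None:
        parts += ["group", quote(str(group), safe="")]
    elif number is not None:
        parts += ["number", str(int(number))]
    else:
        parts.append("docid")
    parts.append(quote(doc_id, safe=""))
    return "/" + "/".join(parts)


if __name__ == "__main__":
    print(document_path("mynamespace", "music", "123"))
    print(document_path("mynamespace", "music", "123", group="user one"))
```

As noted earlier, the group and number forms are only meaningful for document types with mode=streaming or mode=store-only.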

Request parameters

Parameter Type Description

bucketSpace

String

Specify the bucket space to visit. Document types marked as global exist in a separate bucket space from non-global document types. When visiting a particular document type, the bucket space is automatically deduced based on the provided type name. When visiting at a root /document/v1/ level this information is not available, and the non-global ("default") bucket space is visited by default. Specify global to visit global documents instead. Supported values: default (for non-global documents) and global.

cluster

String

Name of content cluster to GET from, or visit.

concurrency

Integer

Sends the given number of visitors in parallel to the backend, improving throughput at the cost of resource usage. Default is 1. When stream=true, concurrency limits the maximum concurrency, which is otherwise unbounded, but controlled by a dynamic throttle policy.

Important: Given a concurrency parameter of N, the worst case for memory used while processing the request grows linearly with N, unless stream mode is turned on. This is because the container currently buffers all response data in memory before sending it to the client, and all sent visitors must complete before the response can be sent.

condition

String

For test-and-set. Run a document operation conditionally — if the condition fails, a 412 Precondition Failed is returned. See example.
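
A condition is passed as a query parameter, so it must be URL-encoded. The sketch below shows one way to do that in Python. The helper names and the condition string `music.year==2000` are hypothetical; the condition is only meant to resemble a document selection expression.

```python
from urllib.parse import urlencode


def conditional_url(path, condition, **extra):
    """Append a URL-encoded condition (and any extra parameters) to a path."""
    params = {"condition": condition, **extra}
    return path + "?" + urlencode(params)


def condition_failed(status_code):
    """True for 412 Precondition Failed, returned when the condition fails."""
    return status_code == 412


if __name__ == "__main__":
    # Hypothetical path and condition, for illustration only.
    print(conditional_url("/document/v1/mynamespace/music/docid/123", "music.year==2000"))
```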

continuation

String

When visiting, a continuation token is returned as the "continuation" field in the JSON response, as long as more documents remain. Use this token as the continuation parameter to visit the next chunk of documents. See example.
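
The continuation protocol can be sketched as a loop that keeps requesting chunks until no token is returned. To stay independent of any HTTP library, this Python sketch takes a caller-supplied `fetch_page` function (a hypothetical name) that performs the GET with the given continuation parameter and returns the parsed JSON response.

```python
def visit_all(fetch_page):
    """Yield every document of a visit, following continuation tokens.

    fetch_page(continuation) is supplied by the caller: it issues the visit
    request (with no continuation parameter when given None) and returns the
    parsed JSON response as a dict.
    """
    continuation = None
    while True:
        page = fetch_page(continuation)
        yield from page.get("documents", [])
        continuation = page.get("continuation")
        if continuation is None:
            # No token: either the visit is complete or an error occurred.
            break


if __name__ == "__main__":
    # Canned responses standing in for real HTTP calls.
    pages = {
        None: {"documents": [{"id": "id:ns:music::1"}], "continuation": "token-1"},
        "token-1": {"documents": [{"id": "id:ns:music::2"}]},
    }
    for doc in visit_all(pages.get):
        print(doc["id"])
```

Because a missing continuation can also mean that an error occurred, a real `fetch_page` should check the HTTP status code and the `message` field before returning.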

create

Boolean

If true, updates to non-existent documents will create an empty document to update. See create if nonexistent.

destinationCluster

String

Name of content cluster to copy to, during a copy visit.

dryRun

Boolean

Set to true by the vespa-feed-client when run with --speed-test, for bandwidth testing.

fieldSet

String

A field set string with the set of document fields to fetch from the backend. Default is the special [document] fieldset, returning all document fields. To fetch specific fields, use the name of the document type, followed by a comma-separated list of fields (for example music:artist,song to fetch two fields declared in music.sd).
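
As a small illustration of the syntax just described, this Python sketch composes a field set string and URL-encodes it as a query parameter. The helper name `field_set` is hypothetical.

```python
from urllib.parse import urlencode


def field_set(doctype, fields):
    """Compose '<doctype>:<field>,<field>,...' for the fieldSet parameter."""
    if not fields:
        raise ValueError("at least one field is required")
    return f"{doctype}:{','.join(fields)}"


if __name__ == "__main__":
    value = field_set("music", ["artist", "song"])
    print(value)
    # The colon and comma are percent-encoded when used in a query string.
    print(urlencode({"fieldSet": value}))
```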

route

String

The route for single document operations, and for operations generated by copy, update or deletion visits. Default value is default. See routes.

selection

String

Select only a subset of documents when visiting — details in document selector language.

sliceId

Integer

The slice number of the visit represented by this HTTP request. This number must be non-negative and less than the number of slices specified for the visit. For example, if the number of slices is 10, sliceId is in the range [0, 9].

Note: If the number of distribution bits changes during a sliced visit, the results are undefined. Thankfully, this is a very rare occurrence and is only triggered when adding content nodes.

slices

Integer

Split the document corpus into this number of independent slices. This lets multiple, concurrent series of HTTP requests advance the same logical visit independently, by specifying a different sliceId for each.
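
A sliced visit can be set up by generating one set of query parameters per slice; each slice is then advanced by its own series of requests, with its own continuation token. The Python sketch below only builds the parameters. The helper name `slice_queries` and the cluster name in the usage example are hypothetical.

```python
def slice_queries(slices, **common):
    """Return one query-parameter dict per slice, with sliceId 0..slices-1."""
    if slices < 1:
        raise ValueError("slices must be at least 1")
    return [{**common, "slices": slices, "sliceId": i} for i in range(slices)]


if __name__ == "__main__":
    # Each dict is the starting point for an independent series of requests.
    for params in slice_queries(4, cluster="music", stream="true"):
        print(params)
```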

stream

Boolean

Whether to stream the HTTP response, allowing data to flow as soon as documents arrive from the backend. This obsoletes the wantedDocumentCount parameter. The HTTP status code will always be 200 if the visit is successfully initiated. Default value is false.

format.tensors

String

Controls how tensors are rendered in the result.

Value	Description
`short`	Default. Render the tensor value in an object having two keys, "type" containing the value, and "cells"/"blocks"/"values" (depending on the type) containing the tensor content. Render the tensor content in the type-appropriate short form.
`long`	Render the tensor value in an object having two keys, "type" containing the value, and "cells" containing the tensor content. Render the tensor content in the general verbose form.
`short-value`	Render the tensor content directly. Render the tensor content in the type-appropriate short form.
`long-value`	Render the tensor content directly. Render the tensor content in the general verbose form.

timeChunk

String

Target time to spend on one chunk of a copy, update or remove visit, in seconds, or with an optional ks, s, ms or µs unit. Default value is 60.

timeout

String

Request timeout in seconds, or with optional ks, s, ms or µs unit. Default value is 180s.
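
To make the unit suffixes concrete, here is a client-side Python sketch that converts such a value to seconds. It is not the parser Vespa uses. It assumes that a bare number means seconds, that decimals are allowed, and that there is no whitespace between number and unit.

```python
import re

# Seconds per unit; a missing unit is treated as seconds (assumption).
_UNIT_SECONDS = {"ks": 1e3, "s": 1.0, "ms": 1e-3, "µs": 1e-6}
_DURATION = re.compile(r"(\d+(?:\.\d+)?)(ks|ms|µs|s)?")


def to_seconds(text):
    """Convert a value such as '180s', '500ms' or '60' to seconds."""
    match = _DURATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a duration: {text!r}")
    number, unit = match.groups()
    return float(number) * _UNIT_SECONDS[unit or "s"]


if __name__ == "__main__":
    for value in ("180s", "2ks", "500ms", "60"):
        print(value, "->", to_seconds(value), "seconds")
```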

tracelevel

Integer

Number in the range [0,9], where higher gives more details. The trace dumps which nodes and chains the document operation has touched. See routes.

wantedDocumentCount

Integer

Best effort attempt to not respond to the client before wantedDocumentCount number of documents have been visited. Response may still contain fewer documents if there are not enough matching documents left to visit in the cluster, or if the visiting times out. This parameter is intended for the case when you have relatively few documents in your cluster and where each visit request would otherwise process only a handful of documents.

The maximum value of wantedDocumentCount is bounded by an implementation-specific limit to prevent excessive resource usage. If the cluster has many documents (on the order of tens of millions), there is no need to set this value.

fromTimestamp

Integer

Filters the returned document set to only include documents that were last modified at a time point equal to or higher than the specified value, in microseconds from UTC epoch. Default value is 0 (include all documents).

toTimestamp

Integer

Filters the returned document set to only include documents that were last modified at a time point lower than the specified value, in microseconds from UTC epoch. Default value is 0 (sentinel value; include all documents). If non-zero, must be greater than, or equal to, fromTimestamp.
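
Both timestamp parameters are expressed in microseconds from UTC epoch. The Python sketch below converts timezone-aware datetimes to that unit and builds the two parameters for a time window. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(moment):
    """Microseconds from UTC epoch for a timezone-aware datetime."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(microseconds=1)


def time_window(start, end):
    """Parameters selecting documents last modified in [start, end)."""
    lower, upper = to_micros(start), to_micros(end)
    if upper < lower:
        raise ValueError("toTimestamp must not be lower than fromTimestamp")
    return {"fromTimestamp": lower, "toTimestamp": upper}


if __name__ == "__main__":
    day = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(time_window(day, day + timedelta(days=1)))
```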

includeRemoves

Boolean

Include recently removed document IDs, along with the set of returned documents. By default, only documents currently present in the corpus are returned in the "documents" array of the response; when this parameter is set to "true", documents that were recently removed, and whose tombstones still exist, are also included in that array, as entries of the form { "remove": "id:ns:type::foobar" }. See here for specifics on tombstones, including their lifetime.
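
When includeRemoves is set, a client has to tell the two kinds of entries in the "documents" array apart. A minimal Python sketch, assuming remove entries carry a "remove" key as in the form shown above (the helper name `split_removes` is hypothetical):

```python
def split_removes(documents):
    """Separate present documents from remove (tombstone) entries."""
    present, removed_ids = [], []
    for entry in documents:
        if "remove" in entry:
            removed_ids.append(entry["remove"])
        else:
            present.append(entry)
    return present, removed_ids


if __name__ == "__main__":
    docs = [
        {"id": "id:ns:type::kept", "fields": {}},
        {"remove": "id:ns:type::foobar"},
    ]
    present, removed_ids = split_removes(docs)
    print(len(present), "present;", "removed:", removed_ids)
```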

Request body

POST and PUT requests must include a body for single document operations; PUT must also include a body for update where visits. In the body, each field holds a value for POST, and an update operation object for PUT. Documents and operations use the document JSON format, and the document fields must match the schema. The first example below is a POST body, the second a PUT body:

{
    "fields": {
        "<fieldname>": "<value>"
    }
}

{
    "fields": {
        "<fieldname>": {
            "<update-operation>" : "<value>"
        }
    }
}

The update-operation is most often assign; see update operations for the full list. Values for id, put and update in the request body are silently dropped: the ID is generated from the request path, regardless of request body data. Example:

{
    "put"   : "id:mynamespace:music::123",
    "fields": {
        "title": "Best of"
    }
}

This makes it easier to generate a feed file that can be used for both the vespa-feed-client and this API.
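
The two body shapes above can be generated from a plain mapping of field names to values. This Python sketch covers a put body and an update body that uses only the assign operation. The helper names are hypothetical, and other update operations are not covered.

```python
import json


def put_body(fields):
    """JSON body for a POST (put): each field maps directly to its value."""
    return json.dumps({"fields": fields})


def assign_update_body(fields):
    """JSON body for a PUT (update): each field maps to an assign operation."""
    return json.dumps(
        {"fields": {name: {"assign": value} for name, value in fields.items()}}
    )


if __name__ == "__main__":
    print(put_body({"title": "Best of"}))
    print(assign_update_body({"title": "Best of"}))
```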

HTTP status codes

Non-exhaustive list of status codes:

Code	Description
200	OK. Attempts to remove or update a non-existent document also yield this status code (see 412 below).
400	Bad request. Returned for undefined document types and other request errors. See 13465 for defined document types not assigned to a content cluster when using PUT. Inspect `message` for details.
404	Not found; the document was not found. This is only used when getting documents.
412	Condition is not met. Inspect `message` for details. This is also the result when a condition is specified, but the document does not exist.
429	Too many requests; the document API has too many inflight feed operations, retry later.
500	Server error; an unspecified error occurred when processing the request/response.
503	Service unavailable; the document API was unable to produce a response at this time.
504	Gateway timeout; the document API failed to respond within the given (or default 180s) timeout.
507	Insufficient storage; the content cluster is out of memory or disk space.
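
A client might group these codes into a few outcomes, as in the Python sketch below. The grouping is an assumption made for illustration. Only 429 is described above as "retry later", and treating 503 and 504 as retryable is a client-side policy choice, not something this reference prescribes.

```python
# Assumption: 503 and 504 are treated as transient, alongside 429.
RETRY_LATER = {429, 503, 504}


def classify(status_code):
    """Map a /document/v1 status code to a coarse client-side outcome."""
    if status_code == 200:
        return "ok"
    if status_code == 404:
        return "not-found"          # only used when getting documents
    if status_code == 412:
        return "condition-not-met"
    if status_code in RETRY_LATER:
        return "retry"
    return "error"                  # 400, 500, 507 and anything unlisted


if __name__ == "__main__":
    for code in (200, 404, 412, 429, 507):
        print(code, classify(code))
```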

HTTP response headers

Header	Values	Description
X-Vespa-Ignored-Fields	true	Will be present and set to 'true' only when a put or update contains one or more fields which were ignored since they are not present in the document type. Such operations will be applied exactly as if they did not contain the field operations referencing non-existing fields. References to non-existing fields in field paths are not detected.
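
A feeding client can surface ignored fields by checking for this header after each put or update. A minimal Python sketch, assuming the response headers are available as a mapping of names to string values (the helper name `fields_were_ignored` is hypothetical):

```python
def fields_were_ignored(headers):
    """True if the response carries X-Vespa-Ignored-Fields set to true."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-vespa-ignored-fields":
            return value.strip().lower() == "true"
    return False


if __name__ == "__main__":
    print(fields_were_ignored({"X-Vespa-Ignored-Fields": "true"}))
    print(fields_were_ignored({"Content-Type": "application/json"}))
```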

Response format

Responses are in JSON format, with the following fields:

Field	Description
pathId	Request URL path — always included.
message	An error message — included for all failed requests.
id	Document ID — always included for single document operations, including Get.
fields	The requested document fields — included for successful Get operations.
documents[]	Array of documents in a visit result — each document has the id and fields.
documentCount	Number of visited and selected documents. If includeRemoves is `true`, this also includes the number of returned removes (tombstones).
continuation	Token to be used to get the next chunk of the corpus - see continuation.

A GET response includes a fields object if the document was found:

{
    "pathId": "<pathid>",
    "id":     "<id>",
    "fields": {
    }
}

A GET visit result can include an array of documents plus a continuation:

{
    "pathId":    "<pathid>",
    "documents": [
        {
            "id":     "<id>",
            "fields": {
            }
        }
    ],
    "continuation": "<continuation string>",
    "documentCount": 123
}

A continuation indicates that the client should make further requests to get more data. The lack of a continuation indicates either that an error occurred and visiting should cease, or that there are no more documents.

A message can be returned for failed operations:

{
    "pathId":  "<pathid>",
    "message": "<message text>"
}