/document/v1 API reference

This is the /document/v1 API reference documentation. Use this API for synchronous document operations against a Vespa endpoint; refer to reads and writes for other options.

The document/v1 API guide has examples and use cases.

Configuration

To enable the API, add document-api to the serving container cluster in services.xml:

<services>
  <container>
    <document-api/>
  </container>
</services>

HTTP requests

GET

Get a document by ID or Visit a set of documents by selection.

Get

Get a single document:
/document/v1/<namespace>/<document-type>/docid/<document-id>
Get a document, using grouped numeric ID (see below):
/document/v1/<namespace>/<document-type>/number/<numeric-group-id>/<document-id>
Get a document, using grouped string ID (see below):
/document/v1/<namespace>/<document-type>/group/<text-group-id>/<document-id>
Supported parameters: cluster, fieldSet, timeout, tracelevel
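
As a rough illustration of the three path forms above, the following Python sketch assembles a Get URL from its parts. The helper names document_path and get_url, the endpoint http://localhost:8080 and the sample values are assumptions made for this example; only the path layout and the parameter names come from this reference.

```python
from urllib.parse import quote, urlencode


def document_path(namespace, doc_type, doc_id, *, number=None, group=None):
    """Build the /document/v1 path for one document (hypothetical helper).

    Pass number= for a grouped numeric ID, group= for a grouped string ID,
    or neither for a plain docid path.
    """
    if number is not None and group is not None:
        raise ValueError("use either number or group, not both")
    if number is not None:
        scheme = f"number/{int(number)}"
    elif group is not None:
        scheme = f"group/{quote(str(group), safe='')}"
    else:
        scheme = "docid"
    # Percent-encode each path segment so IDs with spaces or slashes survive.
    return "/document/v1/{}/{}/{}/{}".format(
        quote(namespace, safe=""),
        quote(doc_type, safe=""),
        scheme,
        quote(str(doc_id), safe=""),
    )


def get_url(endpoint, path, **params):
    """Append query parameters such as cluster, fieldSet or timeout."""
    query = urlencode(params)
    return f"{endpoint}{path}?{query}" if query else f"{endpoint}{path}"
```

For example, get_url("http://localhost:8080", document_path("ns", "music", "1"), timeout="5s") produces a URL that can be passed to any HTTP client.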
Visit

Iterate over and get all documents, or a selection of documents, in chunks, using continuation tokens to track progress.

Supported parameters: bucketSpace, cluster, concurrency, continuation, fieldSet, selection, timeout, tracelevel, wantedDocumentCount

Visits should only retrieve data from one content cluster, so cluster must be specified for requests at the root /document/v1/ level, or when there is ambiguity. This is required even if the application has only one content cluster.

  • Visit all documents across all document types and namespaces stored in the given content cluster:
    /document/v1/?cluster=<clustername>
    
  • Visit all documents with given namespace and document type:
    /document/v1/<namespace>/<document-type>/docid
    
  • Visit documents restricted by selection in namespace and document type:
    /document/v1/<namespace>/<document-type>/docid?selection=<selection>
    

Documents can be grouped to limit accesses to a subset — often used in streaming-search. Streaming-search itself does a visit to search documents. A group is defined by a numeric ID or string — see id scheme.

  • Visit all documents in a group:
    /document/v1/<namespace>/<document-type>/group/<text-group-id>
    
    /document/v1/<namespace>/<document-type>/number/<numeric-group-id>
    

Parent documents are global and in the global bucket space. By default, visit will visit non-global documents in the default bucket space, unless document type is indicated, and is a global document type.

  • Visit documents across all global document types and namespaces:
    /document/v1/?cluster=<clustername>&bucketSpace=global
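
The chunked visiting described above can be sketched as a loop that follows continuation tokens until none is returned. This is a minimal sketch, not a reference client: visit is a hypothetical helper, and fetch_json is an assumed caller-supplied function that performs the HTTP GET and returns the parsed JSON response as a dict.

```python
from urllib.parse import urlencode


def visit(fetch_json, path, **params):
    """Yield every document of a visit, one chunk at a time.

    fetch_json(url) is supplied by the caller (an assumption of this sketch)
    and must return the decoded JSON response for the given path and query.
    """
    continuation = None
    while True:
        query = dict(params)
        if continuation is not None:
            # Resume where the previous chunk stopped.
            query["continuation"] = continuation
        url = f"{path}?{urlencode(query)}" if query else path
        response = fetch_json(url)
        # A chunk may be empty and still carry a continuation token.
        yield from response.get("documents", [])
        continuation = response.get("continuation")
        if continuation is None:
            break
```

Keeping the HTTP call outside the loop makes the sketch independent of any particular client library and easy to exercise with a stub.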
    
POST

Put a given document, by ID, or Copy a set of documents by selection from one content cluster to another.

Put

Write the document contained in the request body in JSON format. Use a condition for conditional writes.
/document/v1/<namespace>/<document-type>/docid/<document-id>
Supported parameters: condition, route, timeout, tracelevel
Copy

Write documents visited in source cluster to the destinationCluster in the same application. A selection is mandatory — typically the document type. Supported paths (see visit above for semantics):

/document/v1/
/document/v1/<namespace>/<document-type>/docid/
/document/v1/<namespace>/<document-type>/group/<group>
/document/v1/<namespace>/<document-type>/number/<number>
Supported parameters: bucketSpace, cluster, continuation, destinationCluster, selection, timeChunk, timeout, tracelevel

PUT

Update a document with the given partial update, by ID, or Update where the given selection is true.

Update

Update a document with the partial update contained in the request body in the document JSON format. Use a condition for conditional writes, and create for creating empty documents when updating non-existent ones.
/document/v1/<namespace>/<document-type>/docid/<document-id>
Supported parameters: condition, create, route, timeout, tracelevel
Update where

Update visited documents in cluster with the partial update contained in the request body in the document JSON format. A selection is mandatory, and a document type must be specified. Supported paths (see visit above for semantics):

/document/v1/<namespace>/<document-type>/docid/
/document/v1/<namespace>/<document-type>/group/<group>
/document/v1/<namespace>/<document-type>/number/<number>
Supported parameters: bucketSpace, cluster, continuation, selection, timeChunk, timeout, tracelevel

DELETE

Remove a document, by ID, or Remove where the given selection is true.

Remove

Remove a document.
/document/v1/<namespace>/<document-type>/docid/<document-id>
Supported parameters: condition, route, timeout, tracelevel
Delete where

Delete visited documents from cluster. A selection is mandatory. Supported paths (see visit above for semantics):

/document/v1/
/document/v1/<namespace>/<document-type>/docid/
/document/v1/<namespace>/<document-type>/group/<group>
/document/v1/<namespace>/<document-type>/number/<number>
Supported parameters: bucketSpace, cluster, continuation, selection, timeChunk, timeout, tracelevel
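
To make the selection-based form concrete, here is a hedged Python sketch that prepares (but does not send) a delete-where request. The helper name delete_where_request, the endpoint and the sample selection are assumptions for illustration. Requiring cluster as an argument is a choice of this sketch rather than a rule stated for this path.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def delete_where_request(endpoint, namespace, doc_type, selection, cluster, **params):
    """Prepare a DELETE for all documents of a type matching a selection."""
    if not selection:
        # The reference states that a selection is mandatory.
        raise ValueError("a selection is mandatory")
    query = urlencode(
        {"selection": selection, "cluster": cluster, **params},
        quote_via=quote,  # encode spaces as %20 rather than '+'
    )
    url = f"{endpoint}/document/v1/{namespace}/{doc_type}/docid/?{query}"
    return Request(url, method="DELETE")
```

The returned object can be passed to urllib.request.urlopen, or its full_url can be handed to another HTTP client.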

Request parameters

bucketSpace String Specify the bucket space to visit. Document types marked as global exist in a separate bucket space from non-global document types. When visiting a particular document type, the bucket space is automatically deduced based on the provided type name. When visiting at a root /document/v1/ level this information is not available, and the non-global ("default") bucket space is visited by default. Specify global to visit global documents instead. Supported values: default (for non-global documents) and global.
cluster String Name of content cluster to GET from, or visit.
concurrency Integer Sends the given number of visitors in parallel to the backend, improving throughput at the cost of resource usage. Caution: given a concurrency parameter of N, the worst case for memory used while processing the request grows linearly with N. This is because the container currently buffers all response data in memory before sending them to the client, and all sent visitors must complete before the response can be sent. Default is 1.
condition String Allows performing a document operation conditionally. When the condition is not met, a 412 is returned (see HTTP status codes). See conditional updates for details.
continuation String When visiting, a continuation token is returned as the "continuation" field in the JSON response, as long as more documents remain. Use this token as the continuation parameter to visit the next chunk of documents.
create Boolean If true, updates to non-existent documents will create an empty document to update. See create if nonexistent.
destinationCluster String Name of content cluster to copy to, during a copy visit.
fieldSet String A field set string with the set of document fields to fetch from the backend. Default is <document type>:[document], returning all document fields, when document type is part of the path, or [all] when visiting at the root level.
route String The route for single document operations, and for operations generated by copy, update or deletion visits. Default value is default. See routes.
selection String Select only a subset of documents when visiting — details in document selector language.
timeChunk String Target time to spend on one chunk of a copy, update or remove visit; with optional ks, s, ms or µs unit. This value is bounded by the timeout, minus 5s. Default value is 60.
timeout String Request timeout in seconds, or with optional ks, s, ms or µs unit. Default value is 175s. Visitor timeouts will be set to 5s less than this for get visits.
tracelevel Integer Number in the range [0,9], where higher gives more details. The trace dumps which nodes and chains the document operation has touched. See routes.
wantedDocumentCount Integer

Best effort attempt to not respond to the client before wantedDocumentCount number of documents have been visited. Response may still contain fewer documents if there are not enough matching documents left to visit in the cluster, or if the visiting times out. This parameter is intended for the case when you have relatively few documents in your cluster and where each visit request would otherwise process only a handful of documents.

Note that the maximum value of wantedDocumentCount is bounded by an implementation-specific limit to prevent excessive resource usage. If the cluster has many documents (on the order of tens of millions), there is no need to set this value.

Request body

POST and PUT requests for single document operations must include a body, as must PUT requests for update visits. In a POST body each field has a value; in a PUT body each field has an update operation object. Documents and operations use the document JSON format. The document fields must match the schema:

{
    "fields": {
        "<fieldname>": "<value>",
        ...
    }
}

{
    "fields": {
        "<fieldname>": {
            "<update-operation>": "<value>"
        },
        ...
    }
}
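
The two body shapes above can be illustrated with a short Python sketch that prepares a POST (put) and a PUT (update) request without sending them. The helper names, the endpoint and the field name title are assumptions; assign is used here as an example update operation from the document JSON format.

```python
import json
from urllib.request import Request

JSON_HEADERS = {"Content-Type": "application/json"}


def put_request(url, fields):
    """POST: every field maps directly to its value."""
    body = json.dumps({"fields": fields}).encode("utf-8")
    return Request(url, data=body, method="POST", headers=JSON_HEADERS)


def update_request(url, operations, create=False):
    """PUT: every field maps to an update operation object."""
    if create:
        # create=true asks for an empty document to be created if none exists.
        url = f"{url}?create=true"
    body = json.dumps({"fields": operations}).encode("utf-8")
    return Request(url, data=body, method="PUT", headers=JSON_HEADERS)


# Sample usage with assumed values:
doc_url = "http://localhost:8080/document/v1/ns/music/docid/1"
put = put_request(doc_url, {"title": "Hello"})
upd = update_request(doc_url, {"title": {"assign": "Hi"}}, create=True)
```

Either request can then be sent with urllib.request.urlopen, or the URL and body can be reused with another client.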

HTTP status codes

Non-exhaustive list of status codes:

200 OK. Note that updating a non-existing document is defined as success.
400 Bad request. Returned for undefined document types and other request errors. Inspect message for details. See 13465 for defined document types not assigned to a content cluster when using PUT.
404 Not found.
412 condition is not met. Inspect message for details.
504 Gateway timeout.
507 Insufficient storage.
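
A client will typically branch on these codes. The sketch below shows one possible mapping to coarse outcomes; the function name classify_status and the outcome labels are hypothetical, and because the list above is non-exhaustive, unlisted codes fall back to a range check.

```python
def classify_status(status: int) -> str:
    """Map an HTTP status code to a coarse outcome label (labels are made up)."""
    if status == 200:
        return "ok"
    if status == 404:
        return "not_found"
    if status == 412:
        return "condition_not_met"
    if status in (504, 507):
        # Timeout or insufficient storage: the request itself may be fine.
        return "server_unavailable"
    if 400 <= status < 500:
        # 400 and any other client error: inspect the message field.
        return "bad_request"
    return "unexpected"
```

How to react to each outcome (retry, back off, report) is left to the application.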

Response format

Responses are in JSON format, with the following fields:

pathId Request URL path, always included.
message An error message, included for all failed requests.
id Document ID, always included for single document operations, including Get.
fields The requested document fields, included for successful Get operations.
documents[] Array of documents in a visit result. Each document has the id and fields.
documentCount Number of visited and selected documents.

A GET response includes a fields object if the document was found:

{
    "pathId": "<pathid>",
    "id":     "<id>",
    "fields": {
        ...
    }
}

A GET visit result can include an array of documents plus a continuation:

{
    "pathId":    "<pathid>",
    "documents": [
        {
            "id":     "<id>",
            "fields": {
                ...
            }
        },
        ...
    ],
    "continuation": "<continuation string>",
    "documentCount": 123
}

A message can be returned for failed operations:

{
    "pathId":  "<pathid>",
    "message": "<message text>"
}
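
The three response shapes above share most of their fields, so a client can decode them into one structure. The following Python sketch assumes nothing beyond the field names shown in this section; the class name DocumentResponse, the function parse_response and the sample values are illustrative only.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DocumentResponse:
    """One decoded /document/v1 response; absent fields stay None or empty."""
    path_id: Optional[str] = None
    doc_id: Optional[str] = None
    fields: Optional[dict] = None
    documents: list = field(default_factory=list)
    continuation: Optional[str] = None
    document_count: Optional[int] = None
    message: Optional[str] = None


def parse_response(text: str) -> DocumentResponse:
    """Decode a JSON response body into a DocumentResponse."""
    data = json.loads(text)
    return DocumentResponse(
        path_id=data.get("pathId"),
        doc_id=data.get("id"),
        fields=data.get("fields"),
        documents=data.get("documents", []),
        continuation=data.get("continuation"),
        document_count=data.get("documentCount"),
        message=data.get("message"),
    )
```

A caller can then check message for failures, fields for a single Get, and documents plus continuation for a visit.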