/document/v1 API reference

This is the /document/v1 API reference documentation. Use this API for synchronous document operations against a Vespa endpoint - refer to reads and writes for other options.

The document/v1 API guide has examples and use cases.

Configuration

To enable the API, add document-api to the container cluster in services.xml:

<services version="1.0">
  <container id="default" version="1.0">
    <document-api/>
  </container>
</services>

HTTP requests

GET

Get a document by ID, or visit a set of documents by selection.

Get

Get a document:
/document/v1/<namespace>/<document-type>/docid/<document-id>
Get a document, using grouped numeric ID (see below):
/document/v1/<namespace>/<document-type>/number/<numeric-group-id>/<document-id>
Get a document, using grouped string ID (see below):
/document/v1/<namespace>/<document-type>/group/<text-group-id>/<document-id>
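The three Get forms differ only in the path segment placed before the document ID. As an illustration, here is a minimal Python sketch that builds these paths. get_path is a hypothetical helper, and it assumes each path segment should be percent-encoded so that an ID containing / or spaces stays a single segment.

```python
from typing import Optional
from urllib.parse import quote


def get_path(namespace: str, doctype: str, doc_id: str,
             number: Optional[int] = None, group: Optional[str] = None) -> str:
    """Build the /document/v1 path for getting one document (hypothetical helper)."""
    if number is not None and group is not None:
        raise ValueError("pass either number (numeric group) or group (text group), not both")

    def seg(s: str) -> str:
        # Percent-encode one path segment; '/' is encoded too so an ID stays one segment.
        return quote(s, safe="")

    prefix = f"/document/v1/{seg(namespace)}/{seg(doctype)}"
    if number is not None:
        return f"{prefix}/number/{int(number)}/{seg(doc_id)}"
    if group is not None:
        return f"{prefix}/group/{seg(group)}/{seg(doc_id)}"
    return f"{prefix}/docid/{seg(doc_id)}"
```

For example, get_path("mynamespace", "music", "doc1", number=1234) returns /document/v1/mynamespace/music/number/1234/doc1, where mynamespace and music are placeholder names.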
Visit

Visit iterates over all documents, or a set of documents, returning results in chunks and using the continuation parameter as a progress token.

A Document API request can only retrieve data from one content cluster, so the cluster parameter must be specified for requests at the root /document/v1/ level. This is required even if the application has only one content cluster.

  • Visit all documents across all document types and namespaces stored in a content cluster:
    /document/v1/?cluster=<clustername>
    
  • Visit all documents in namespace and document type:
    /document/v1/<namespace>/<document-type>/docid
    
  • Visit documents restricted by selection in namespace and document type:
    /document/v1/<namespace>/<document-type>/docid?selection=<selection>
    

Documents can be grouped in order to access a subset; this is often used with streaming search, which itself runs a visit to search documents. A group is defined by a numeric ID or a string - see id scheme.

  • Visit all documents in a group:
    /document/v1/<namespace>/<document-type>/group/<text-group-id>
    
    /document/v1/<namespace>/<document-type>/number/<numeric-group-id>
    

Parent documents are global and stored in the global bucket space. When visiting at the root /document/v1/ level, only non-global documents in the default bucket space are visited by default.

  • Visit documents across all global document types and namespaces:
    /document/v1/?cluster=<clustername>&bucketSpace=global
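The visit forms above combine a path with query parameters such as cluster, bucketSpace and selection. Below is a minimal Python sketch that assembles them, using a hypothetical visit_url helper. It follows the Get convention of number/<numeric-group-id> and group/<text-group-id>, and it percent-encodes the selection because document selector expressions typically contain spaces and operators.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def visit_url(namespace: Optional[str] = None, doctype: Optional[str] = None, *,
              cluster: Optional[str] = None, bucket_space: Optional[str] = None,
              selection: Optional[str] = None, number: Optional[int] = None,
              group: Optional[str] = None, continuation: Optional[str] = None) -> str:
    """Build a /document/v1 visit path plus query string (hypothetical helper)."""
    if (namespace is None) != (doctype is None):
        raise ValueError("give both namespace and doctype, or neither")
    if namespace is None:
        # Root-level visit: the content cluster must be named explicitly.
        if cluster is None:
            raise ValueError("cluster is required when visiting at /document/v1/")
        path = "/document/v1/"
    else:
        if number is not None and group is not None:
            raise ValueError("pass either number or group, not both")
        prefix = f"/document/v1/{quote(namespace, safe='')}/{quote(doctype, safe='')}"
        if number is not None:
            path = f"{prefix}/number/{int(number)}"
        elif group is not None:
            path = f"{prefix}/group/{quote(group, safe='')}"
        else:
            path = f"{prefix}/docid"

    params = {}
    for key, value in (("cluster", cluster), ("bucketSpace", bucket_space),
                       ("selection", selection), ("continuation", continuation)):
        if value is not None:
            params[key] = value
    # quote_via=quote encodes spaces as %20 instead of '+'.
    query = urlencode(params, quote_via=quote)
    return f"{path}?{query}" if query else path
```

With these assumptions, visit_url(cluster="mycluster", bucket_space="global") returns /document/v1/?cluster=mycluster&bucketSpace=global. Using quote_via=quote writes spaces as %20 rather than +, which avoids depending on how the server decodes + in query strings.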
    
POST

Create a document. Use a condition for conditional writes.
/document/v1/<namespace>/<document-type>/docid/<document-id>

PUT

Update a document. Use a condition for conditional writes.
/document/v1/<namespace>/<document-type>/docid/<document-id>

DELETE

Delete a document.
/document/v1/<namespace>/<document-type>/docid/<document-id>
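Writes use the same /docid/ path with different HTTP methods. The sketch below uses Python's standard urllib.request to prepare, but not send, such requests. write_request and the ENDPOINT value http://localhost:8080 are assumptions for illustration; the condition and create query parameters are the ones described under Request parameters.

```python
import json
from typing import Any, Dict, Optional
from urllib.parse import quote, urlencode
from urllib.request import Request

ENDPOINT = "http://localhost:8080"  # assumption: adjust to your container endpoint


def write_request(method: str, namespace: str, doctype: str, doc_id: str,
                  body: Optional[Dict[str, Any]] = None,
                  condition: Optional[str] = None,
                  create: Optional[bool] = None) -> Request:
    """Prepare (but do not send) a POST, PUT or DELETE request (hypothetical helper)."""
    method = method.upper()
    if method not in ("POST", "PUT", "DELETE"):
        raise ValueError(f"unsupported method: {method}")
    if (body is None) != (method == "DELETE"):
        raise ValueError("POST and PUT take a JSON body; DELETE takes none")
    if create is not None and method != "PUT":
        raise ValueError("create only applies to updates (PUT)")

    path = "/document/v1/{}/{}/docid/{}".format(
        *(quote(s, safe="") for s in (namespace, doctype, doc_id)))
    params: Dict[str, str] = {}
    if condition is not None:
        params["condition"] = condition  # the write only happens if this condition holds
    if create is not None:
        params["create"] = "true" if create else "false"
    url = ENDPOINT + path + ("?" + urlencode(params, quote_via=quote) if params else "")

    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(url, data=data, headers=headers, method=method)
```

To send a prepared request, pass it to urllib.request.urlopen; note that urlopen raises urllib.error.HTTPError for non-2xx statuses such as 412.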

Request parameters

bucketSpace String Specify the bucket space to visit. Document types marked as global exist in a separate bucket space from non-global document types. When visiting a particular document type, the bucket space is automatically deduced from the provided type name. When visiting at the root /document/v1/ level this information is not available, so only the non-global ("default") bucket space is visited by default. Set bucketSpace=global to visit global documents instead. Supported values: default (for non-global documents) and global.
cluster String Name of the content cluster to GET from. Required when visiting at the root /document/v1/ level.
concurrency Integer Sends the given number of visitors in parallel to the backend, improving throughput at the cost of resource usage. Caution: given a concurrency parameter of N, worst-case memory use while processing the request grows linearly with N. This is because the container currently buffers all response data in memory before sending it to the client, and all sent visitors must complete before the response can be sent. Default is 1.
condition String Requires that this condition is true; otherwise 412 is returned. See conditional updates for details.
continuation String When visiting, a continuation token is returned if the result set is chunked. Use the token in the continuation parameter to get the next chunk of documents.
create Boolean If set to true, an update creates a new document if none exists - see create if nonexistent.
fieldSet String A field set string with the set of document fields to return. Default is <visited document type>:[document], returning all fields.
route String The route for document operations. Default value is default. See routes.
selection String Select only a subset of documents when visiting - details in document selector language.
timeout String Request timeout in seconds, or with an optional ks, s, ms or µs unit. Default value is 175s. Visitor timeouts are set to 5s less than this.
tracelevel Integer Number in the range [0,9], where higher gives more details. The trace dumps which nodes and chains the document operation has touched. See routes.
wantedDocumentCount Integer

Best-effort attempt to not respond to the client before wantedDocumentCount documents can be returned. The response may still contain fewer documents if there are not enough matching documents left to visit in the cluster, or if the visit times out. This parameter is intended for clusters with relatively few documents, where each GET operation would otherwise return only a handful of documents.

Note that the maximum value of wantedDocumentCount is bounded by an implementation-specific limit to prevent excessive resource usage. If the cluster has many documents (on the order of tens of millions), there is no need to set this value.
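The timeout value accepts an optional unit suffix. This Python sketch shows one way a client could interpret that format when computing its own deadlines. parse_timeout and visitor_timeout are hypothetical helpers; treating a bare number as seconds and subtracting 5s for visitors follow the timeout description above.

```python
import re

# Hypothetical helper for the timeout format: a number, optionally followed by
# ks, s, ms or µs. A bare number is read as seconds.
_SECONDS_PER_UNIT = {"ks": 1000.0, "s": 1.0, "ms": 1e-3, "µs": 1e-6}
_TIMEOUT = re.compile(r"(\d+(?:\.\d+)?)\s*(ks|s|ms|µs)?")


def parse_timeout(value: str) -> float:
    """Return the timeout in seconds."""
    match = _TIMEOUT.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unrecognized timeout: {value!r}")
    number, unit = match.groups()
    return float(number) * _SECONDS_PER_UNIT[unit or "s"]


def visitor_timeout(request_timeout: str = "175s") -> float:
    """Visitor timeout in seconds: 5 s less than the request timeout.

    The reference does not say what happens when the request timeout is below 5 s,
    so no clamping is done here.
    """
    return parse_timeout(request_timeout) - 5.0
```

With the default request timeout of 175s, visitor_timeout() gives 170.0 seconds.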

Request body

POST and PUT requests have a body. In a POST, each field maps to a value; in a PUT, each field maps to an update operation object. Documents and operations use the document format. The document fields must match the schema:

POST:

{
    "fields": {
        "<fieldname>": "<value>",
        ...
    }
}

PUT:

{
    "fields": {
        "<fieldname>": {
            "<update-operation>": "<value>"
        },
        ...
    }
}
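The two body shapes differ only in what each field maps to. Here is a minimal Python sketch with hypothetical helpers post_body and put_body. The example uses assign, one of the update operations in the document format, and assumes the field values are already JSON-serializable.

```python
import json
from typing import Any, Dict, Tuple


def post_body(fields: Dict[str, Any]) -> str:
    """POST body: each field name maps directly to its value."""
    return json.dumps({"fields": fields})


def put_body(updates: Dict[str, Tuple[str, Any]]) -> str:
    """PUT body: each field name maps to an update-operation object {operation: value}."""
    return json.dumps({"fields": {name: {operation: value}
                                  for name, (operation, value) in updates.items()}})
```

For example, put_body({"year": ("assign", 2002)}) returns the string {"fields": {"year": {"assign": 2002}}}.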

HTTP status codes

Non-exhaustive list of status codes:

200 OK. Note that updating a non-existing document is defined as success.
400 Bad request. Returned for undefined document types and other request errors. See issue 13465 regarding PUT to document types that are defined but not assigned to a content cluster. Inspect message for details.
404 Not found.
412 Condition is not met. Inspect message for details.
504 Gateway timeout.

Response format

Successful responses always include the path used in the request (pathId) and one of:

  • id: Document ID
  • documents[]: Array of documents in a visit result

A GET request can include a fields object if the document was found:
{
    "pathId": "<pathid>",
    "id":     "<id>",
    "fields": {
        ...
    }
}
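Because fields is only present when a document was found, a client should check for it rather than assume it. A minimal Python sketch, assuming the response body has already been read into a string; parse_get is a hypothetical helper.

```python
import json
from typing import Any, Dict, Optional, Tuple


def parse_get(body: str) -> Tuple[str, Optional[Dict[str, Any]]]:
    """Return (id, fields) from a Get response; fields is None when absent (hypothetical helper)."""
    doc = json.loads(body)
    # "id" is included on success; "fields" only when the document was found.
    return doc["id"], doc.get("fields")
```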

A GET visit result can include an array of documents plus a continuation:

{
    "pathId":    "<pathid>",
    "documents": [
        {
            "id":     "<id>",
            "fields": {
                ...
            }
        },
        ...
    ],
    "continuation": "<continuation string>"
}
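Since a visit returns its results in chunks, a client repeats the request with each returned continuation token until none is returned. A minimal Python sketch of that loop follows. visit_documents is a hypothetical helper, and fetch is an assumed callable that performs the GET with continuation=<token> (omitting it on the first call) and returns the response body.

```python
import json
from typing import Callable, Iterator, Optional


def visit_documents(fetch: Callable[[Optional[str]], str]) -> Iterator[dict]:
    """Yield every document of a chunked visit (hypothetical helper).

    fetch(token) is assumed to GET the visit URL, adding continuation=<token>
    when token is not None, and to return the response body as text.
    """
    token: Optional[str] = None
    while True:
        chunk = json.loads(fetch(token))
        # A chunk may hold few or no documents; keep going while a token is returned.
        yield from chunk.get("documents", [])
        token = chunk.get("continuation")
        if not token:
            return  # no continuation: the visit is complete
```

Taking fetch as a parameter keeps the loop independent of any particular HTTP client and lets it be exercised with canned responses.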

A message can be returned for failed operations:

{
    "pathId":  "<pathid>",
    "message": "<message text>"
}
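Failed operations carry a message next to the pathId. A minimal Python sketch that turns a non-200 status into an exception carrying that message; DocumentApiError and check_response are hypothetical names, and the fallback for empty or non-JSON bodies is an assumption, since the reference only says that a message can be returned.

```python
import json
from typing import Any, Dict, Optional


class DocumentApiError(Exception):
    """Raised for non-200 responses (hypothetical error type)."""

    def __init__(self, status: int, message: str, path_id: Optional[str]) -> None:
        super().__init__(f"HTTP {status} for {path_id}: {message}")
        self.status = status
        self.message = message
        self.path_id = path_id


def check_response(status: int, body: str) -> Dict[str, Any]:
    """Return the parsed body of a 200 response, or raise DocumentApiError."""
    try:
        doc = json.loads(body) if body else {}
    except ValueError:  # tolerate bodies that are not JSON
        doc = {}
    if status == 200:
        return doc
    raise DocumentApiError(status, doc.get("message", "<no message>"), doc.get("pathId"))
```

Remember that 200 is also returned when updating a non-existing document, so a 200 alone does not prove that a document was changed.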