This is the /document/v1 API reference documentation.
Use this API for synchronous Document operations to a Vespa endpoint -
refer to reads and writes for other options.
Iterate over and get all documents, or a selection of documents,
in chunks, using continuation tokens to track progress.
Visiting is implemented as a linear scan over the documents in the cluster.
/document/v1/<namespace>/<document-type>/docid
Documents can be grouped to limit accesses to a subset.
A group is defined by a numeric ID or string — see id scheme.
cluster -
Visits can only retrieve data from one content cluster,
so cluster must be specified
for requests at the root /document/v1/ level, or when there is ambiguity.
This is required even if the application has only one content cluster.
Optional parameters:
bucketSpace -
Parent documents are global
and in the global bucket space.
By default, visit will visit non-global documents
in the default bucket space, unless document type is indicated,
and is a global document type.
concurrency -
Use to configure backend parallelism for each visit HTTP request.
slices -
To split visiting of the document corpus across more than one HTTP
request, and thus allow the concurrent use of more HTTP containers, use the
slices and sliceId parameters.
stream -
Enabling streamed HTTP responses
with the stream parameter is recommended,
as this reduces memory consumption and HTTP overhead.
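As a rough sketch of how the optional parameters above combine, the following Python helper builds one visit URL per slice. The endpoint, namespace, document type and cluster name are hypothetical placeholders, and the helper itself is illustrative; only the parameter names come from this reference.

```python
from urllib.parse import urlencode


def sliced_visit_urls(endpoint, namespace, doctype, cluster, slices,
                      concurrency=1, stream=True):
    """Return one visit URL per slice; each can be driven by its own client."""
    if slices < 1:
        raise ValueError("slices must be at least 1")
    base = f"{endpoint}/document/v1/{namespace}/{doctype}/docid"
    urls = []
    for slice_id in range(slices):  # sliceId must be non-negative and < slices
        params = {
            "cluster": cluster,
            "slices": slices,
            "sliceId": slice_id,
            "concurrency": concurrency,
            "stream": "true" if stream else "false",
        }
        urls.append(f"{base}?{urlencode(params)}")
    return urls


# Hypothetical endpoint and names, for illustration only.
for url in sliced_visit_urls("http://localhost:8080", "mynamespace", "music",
                             "mycluster", slices=2):
    print(url)
```

Each URL starts an independent series of requests; every series then follows its own continuation tokens.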
Write documents visited in source cluster to the
destinationCluster in the same application.
A selection is mandatory — typically the document type.
Supported paths (see visit above for semantics):
Update visited documents in cluster with the partial update
contained in the request body in the
document JSON format.
Supported paths (see visit above for semantics):
bucketSpace
String
Specify the bucket space to visit.
Document types marked as global exist in a separate bucket space from non-global document types.
When visiting a particular document type,
the bucket space is automatically deduced based on the provided type name.
When visiting at the root /document/v1/ level this information is not available,
and the non-global ("default") bucket space is visited by default.
Specify global to visit global documents instead.
Supported values: default (for non-global documents) and global.
concurrency
Integer
Sends the given number of visitors in parallel to the backend,
improving throughput at the cost of resource usage.
Default is 1.
When stream=true, concurrency limits the maximum concurrency,
which is otherwise unbounded, but controlled by a dynamic throttle policy.
Important:
Given a concurrency parameter of N,
the worst case for memory used while processing the request grows linearly with N,
unless stream mode is turned on.
This is because the container currently buffers all response data in memory before sending it to the client,
and all sent visitors must complete before the response can be sent.
condition
String
For test-and-set.
Run a document operation conditionally — if the condition fails,
a 412 Precondition Failed is returned.
See example.
continuation
String
When visiting, a continuation token is returned as the "continuation" field
in the JSON response, as long as more documents remain.
Use this token as the continuation parameter to visit the next chunk of documents.
create
Boolean
If true, updates to non-existent documents will create an empty document to update.
See create if nonexistent.
fieldSet
String
A field set string
with the set of document fields to fetch from the backend.
Default is <document type>:[document], returning all document fields,
when document type is part of the path, or [all] when visiting at the root level.
route
String
The route for single document operations, and for operations generated
by copy, update or
deletion visits. Default value is default.
See routes.
sliceId
Integer
The slice number of the visit represented by this HTTP request. This number must be non-negative
and less than the number of slices specified for the visit.
Note:
If the number of distribution bits changes during a sliced visit,
the results are undefined.
Thankfully, this is a very rare occurrence, and is only triggered when adding content nodes.
slices
Integer
Split the document corpus into this number of independent slices. This lets multiple, concurrent series of HTTP
requests advance the same logical visit independently, by specifying a different sliceId for each.
stream
Boolean
Whether to stream the HTTP response, allowing data to flow as soon as documents arrive from the backend.
This obsoletes the wantedDocumentCount parameter.
The HTTP status code will always be 200 if the visit is successfully initiated. Default value is false.
timeChunk
String
Target time to spend on one chunk of a copy, update or remove visit; with optional ks, s, ms or µs unit.
Default value is 60.
timeout
String
Request timeout in seconds, or with optional ks, s, ms or µs unit. Default value is 180s.
tracelevel
Integer
Number in the range [0,9], where higher gives more details.
The trace dumps which nodes and chains the document operation has touched.
See routes.
wantedDocumentCount
Integer
Best effort attempt to not respond to the client before wantedDocumentCount
number of documents have been visited.
Response may still contain fewer documents if there are not enough matching documents left
to visit in the cluster, or if the visiting times out.
This parameter is intended for the case when you have relatively few documents in your cluster
and where each visit request would otherwise process only a handful of documents.
The maximum value of wantedDocumentCount is bounded
by an implementation-specific limit to prevent excessive resource usage.
If the cluster has many documents (on the order of tens of millions),
there is no need to set this value.
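The request parameters above are passed in the query string, so values such as a condition expression need percent-encoding. The sketch below shows one way to do that in Python; the helper name, endpoint, document ID and the music.artist field are hypothetical, and the trailing ID segment after docid is assumed to be the user-specified part of the document ID.

```python
from urllib.parse import quote, urlencode


def document_url(endpoint, namespace, doctype, docid, **params):
    """Build a single-document URL, percent-encoding the ID and parameters."""
    path = "/document/v1/{}/{}/docid/{}".format(
        quote(namespace, safe=""), quote(doctype, safe=""), quote(docid, safe=""))
    # Booleans are sent as lowercase true/false; None values are skipped.
    query = {k: (str(v).lower() if isinstance(v, bool) else v)
             for k, v in params.items() if v is not None}
    return endpoint + path + ("?" + urlencode(query) if query else "")


url = document_url(
    "http://localhost:8080", "mynamespace", "music", "a-head-full-of-dreams",
    condition='music.artist=="Coldplay"',  # hypothetical field and value
    timeout="10s",
)
print(url)
```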
Request body
POST and PUT requests for single document operations must include a body, as must PUT requests for update visits.
In the body, a field has a value for POST, and an update operation object for PUT.
Documents and operations use the document JSON format.
The document fields must match the schema:
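To make the difference between the two body shapes concrete, here is a minimal Python sketch. The helper names and the title and year fields are hypothetical, and only the assign update operation is shown.

```python
import json


def put_body(fields):
    """Body for POST: each field maps directly to its value."""
    return json.dumps({"fields": fields})


def update_body(assignments):
    """Body for PUT: each field maps to an update operation object.

    Only the "assign" operation is shown here.
    """
    return json.dumps({"fields": {name: {"assign": value}
                                  for name, value in assignments.items()}})


# "title" and "year" are hypothetical schema fields.
print(put_body({"title": "Blue Train", "year": 1957}))
print(update_body({"year": 1958}))
```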
Values for id / put / update in the request body are silently dropped.
The ID is generated from the request path, regardless of request body data - example:
This makes it easier to generate a feed file that can be used for both the
vespa-feed-client and this API.
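The mapping from request path to document ID can be sketched as follows. This assumes the id:<namespace>:<document-type>::<docid> form from the id scheme and handles only docid paths, not grouped documents; the function name is hypothetical.

```python
from urllib.parse import unquote


def id_from_path(path):
    """Derive the document ID from a /document/v1 docid path."""
    parts = path.strip("/").split("/")
    if len(parts) != 6 or parts[:2] != ["document", "v1"] or parts[4] != "docid":
        raise ValueError(f"not a docid path: {path}")
    namespace, doctype, docid = (unquote(p) for p in (parts[2], parts[3], parts[5]))
    return f"id:{namespace}:{doctype}::{docid}"


# The "id" in the body differs from the path; the path is what counts.
body = {"id": "id:other:music::999", "fields": {"title": "Blue Train"}}
effective_id = id_from_path("/document/v1/mynamespace/music/docid/1")
print(effective_id)
```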
HTTP status codes
Non-exhaustive list of status codes:
Code
Description
200
OK. Note that updating a non-existing document is defined as success.
400
Bad request. Returned for undefined document types and other request errors.
See 13465
for defined document types not assigned to a content cluster when using PUT.
Inspect message for details.
404
Not found; the document was not found.
412
Precondition failed; the condition is not met.
Inspect message for details. This is also the result when
a condition is specified, and the document is not found.
500
Server error; an unspecified error occurred when processing the request/response.
502
Bad gateway; the document API gave an error response.
504
Gateway timeout; the document API failed to respond within the given (or default 180s) timeout.
507
Insufficient storage; the content cluster is out of memory or disk space.
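A client will typically branch on these codes and surface the message field on failure. The following Python sketch illustrates one possible shape; the labels and the summarize function are hypothetical, and since the list above is non-exhaustive, unknown codes fall through to "other".

```python
import json

# Labels for the status codes listed above.
KNOWN = {
    200: "ok",
    400: "bad request",
    404: "not found",
    412: "condition not met",
    500: "server error",
    502: "bad gateway",
    504: "gateway timeout",
    507: "insufficient storage",
}


def summarize(status, body_text):
    """Return (label, message); message is None if the body carries none."""
    label = KNOWN.get(status, "other")
    try:
        message = json.loads(body_text).get("message")
    except (ValueError, AttributeError, TypeError):
        # Body was not JSON, or not a JSON object.
        message = None
    return label, message


print(summarize(412, '{"message": "Condition did not match document"}'))
```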
Response format
Responses are in JSON format, with the following fields:
Field
Description
pathId
Request URL path — always included.
message
An error message — included for all failed requests.
id
Document ID — always included for single document operations, including Get.
fields
The requested document fields — included for successful Get operations.
documents[]
Array of documents in a visit result —
each document has the id and fields.
documentCount
Number of visited and selected documents.
A GET response includes a fields object if the document was found:
{"pathId":"<pathid>","id":"<id>","fields":{}}
A GET visit result can include an array of documents
plus a continuation:
A continuation indicates the client should make further requests to get more data, while lack of a
continuation indicates an error occurred, and that visiting should cease, or that there are no more documents.
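The continuation handling described above can be sketched as a small Python generator. Here fetch_chunk is a hypothetical caller-supplied function that performs one GET visit request with the given continuation parameter and returns the parsed JSON response; the demo uses canned responses instead of a real endpoint.

```python
def visit_all(fetch_chunk):
    """Yield every document, following continuation tokens until none is returned."""
    continuation = None
    while True:
        response = fetch_chunk(continuation)
        yield from response.get("documents", [])
        continuation = response.get("continuation")
        if continuation is None:
            # Either the visit is complete or an error occurred;
            # inspect "message" in the last response to tell which.
            return


# Canned responses keyed by continuation token, for illustration only.
pages = {
    None: {"documents": [{"id": "id:ns:music::1", "fields": {}}],
           "continuation": "token-1"},
    "token-1": {"documents": [{"id": "id:ns:music::2", "fields": {}}]},
}
ids = [doc["id"] for doc in visit_all(pages.get)]
print(ids)
```

Passing the fetch function in keeps the loop independent of any particular HTTP client.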