Use the /document/v1/ API to read, write, update and delete documents.
Refer to the document/v1 API reference for API details. Reads and writes has an overview of alternative tools and APIs as well as the flow through the Vespa components when accessing documents. See getting started for how to work with the /document/v1/ API.
Examples:
GET
Get a document by ID:
$ curl http://localhost:8080/document/v1/mynamespace/music/docid/a-head-full-of-dreams

POST
Post data in the document JSON format:
$ curl -X POST -H "Content-Type:application/json" --data '
  {
      "fields": {
          "artist": "Coldplay",
          "album": "A Head Full of Dreams",
          "year": 2015
      }
  }' \
  http://localhost:8080/document/v1/mynamespace/music/docid/a-head-full-of-dreams

PUT
Do a partial update of a document:
$ curl -X PUT -H "Content-Type:application/json" --data '
  {
      "fields": {
          "artist": {
              "assign": "Warmplay"
          }
      }
  }' \
  http://localhost:8080/document/v1/mynamespace/music/docid/a-head-full-of-dreams

DELETE
Delete a document by ID:
$ curl -X DELETE http://localhost:8080/document/v1/mynamespace/music/docid/a-head-full-of-dreams
Delete all documents in the music schema:
$ curl -X DELETE \
  "http://localhost:8080/document/v1/mynamespace/music/docid?selection=true&cluster=my_cluster"

A test-and-set condition can be added to Put, Remove and Update operations. Example:
$ curl -X PUT -H "Content-Type:application/json" --data '
  {
      "condition": "music.artist==\"Warmplay\"",
      "fields": {
          "artist": {
              "assign": "Coldplay"
          }
      }
  }' \
  http://localhost:8080/document/v1/mynamespace/music/docid/a-head-full-of-dreams
If the condition is not met, a 412 Precondition Failed is returned.
Also see the condition reference.
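A client usually needs to tell a failed condition apart from other outcomes. The following is a minimal sketch of one way to do that in shell. The function names explain_status and http_status are hypothetical, and only the status codes that appear in this guide (200 as the generic success code, 404 and 412) are mapped.

```shell
# Hypothetical helper: map an HTTP status code from /document/v1 to a short message.
explain_status() {
  case "$1" in
    200) echo "ok" ;;
    404) echo "document not found" ;;
    412) echo "condition not met" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Hypothetical helper: print only the HTTP status code of a request.
# All arguments are passed to curl unchanged.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$@"
}
```

A conditional update could then be checked with something like explain_status "$(http_status -X PUT -H 'Content-Type:application/json' --data @update.json "$url")", where update.json and $url are placeholders for the request body and document URL.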
Updates to nonexistent documents are supported using create. This is often called an upsert — insert a document if it does not already exist, or update it if it exists. An empty document is created on the content nodes, before the update is applied. This simplifies client code in the case of multiple writers. Example:
$ curl -X PUT -H "Content-Type:application/json" --data '
  {
      "fields": {
          "artist": {
              "assign": "Coldplay"
          }
      }
  }' \
  "http://localhost:8080/document/v1/mynamespace/music/docid/a-head-full-of-thoughts?create=true"
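Conditional updates and puts can be combined with create. This has the following semantics: if the document already exists, the condition is evaluated against the most recent document version available, and the operation is applied only if the condition matches. Otherwise (the document does not exist, or its newest version is a tombstone), the condition is ignored and the operation is applied as if no condition was provided.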
Support for conditional puts with create was added in Vespa 8.178.
$ curl -X POST -H "Content-Type:application/json" --data '
  {
      "fields": {
          "artist": "Coldplay"
      }
  }' \
  "http://localhost:8080/document/v1/mynamespace/music/docid/a-head-full-of-thoughts?create=true&condition=music.title%3D%3D%27best+of%27"
Note that if all existing replicas of a document are unavailable when an operation with "create": true is executed, a new document will always be created. This happens even if a condition has been given. If the existing replicas become available later, their version of the document will be overwritten by the newest update since it has a higher timestamp.
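A condition passed as a request parameter must be URL-encoded, and the full URL must be quoted so the shell does not interpret the & character. The sketch below shows one way to build such a URL. The helper names urlencode and upsert_url are hypothetical, and the encoder assumes ASCII input.

```shell
# Percent-encode a string for use as a query parameter value (ASCII input assumed).
urlencode() {
  _s=$1
  _out=
  while [ -n "$_s" ]; do
    _rest=${_s#?}            # everything after the first character
    _c=${_s%"$_rest"}        # the first character
    case $_c in
      [A-Za-z0-9._~-]) _out=$_out$_c ;;
      *) _out=$_out$(printf '%%%02X' "'$_c") ;;
    esac
    _s=$_rest
  done
  printf '%s' "$_out"
}

# Hypothetical helper: append create=true and an encoded condition to a document URL.
# Usage: upsert_url <document-url> <condition>
upsert_url() {
  printf '%s?create=true&condition=%s\n' "$1" "$(urlencode "$2")"
}

upsert_url "http://localhost:8080/document/v1/mynamespace/music/docid/a-head-full-of-thoughts" \
  "music.title=='best of'"
```

This encoder writes a space as %20 rather than +; both forms are commonly accepted in a query string.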
To iterate over documents, use visiting. Each response returns a batch of documents, together with a continuation token while more data remains.
Note the continuation token: use this in the next request for more data. Below is a sample script dumping all data using jq for JSON parsing. It splits the corpus into 8 slices by default; using a number of slices at least four times the number of container nodes is recommended for high throughput. Timeout can be set lower for benchmarking. (Each request has a maximum timeout of 60s to ensure progress is saved at regular intervals.)
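The core of following continuation tokens can be sketched as below. This is a simplified, single-slice illustration and not the sample script described above: it does no slicing, and the endpoint, the cluster name and the function names visit_url and visit_all are assumptions. It relies on curl and jq being installed.

```shell
# Assumed values; replace with the real endpoint and content cluster name.
endpoint="http://localhost:8080"
cluster="music"

# Build the URL for one visit request.
# $1 is the continuation token from the previous response (empty for the first request).
visit_url() {
  _url="$endpoint/document/v1/?cluster=$cluster&selection=true&timeout=60s"
  if [ -n "$1" ]; then
    _url="$_url&continuation=$1"
  fi
  printf '%s\n' "$_url"
}

# Follow continuation tokens until a response has none,
# appending each visited document as one JSON line to the file given as $1.
visit_all() {
  _token=""
  while :; do
    _page=$(curl -s "$(visit_url "$_token")") || return 1
    printf '%s\n' "$_page" | jq -c '.documents[]?' >> "$1"
    _token=$(printf '%s\n' "$_page" | jq -r '.continuation // empty')
    [ -n "$_token" ] || break
  done
}
```

Running visit_all dump.jsonl would write one document per line to dump.jsonl. The loop stops at the first response without a continuation token, which also covers error responses.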
Note that visit with selection is a linear scan over all the music documents in the request examples at the start of this guide. Each complete visit thus requires the selection expression to be evaluated for all documents. Running concurrent visits with selections that match disjoint subsets of the document corpus is therefore a poor way of increasing throughput, as work is duplicated across each such visit. Fortunately, the API offers other options for increasing throughput, such as splitting the corpus into slices (the slices and sliceId parameters), increasing the concurrency of each request, and streaming the response (stream=true).
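Slicing, mentioned earlier in this guide, divides the corpus so that each slice can be visited by its own independent series of requests without duplicating work. A minimal sketch of generating the per-slice URLs follows; the function name slice_urls is hypothetical, and slice IDs are assumed to run from 0 to slices minus 1.

```shell
# Print one visit URL per slice.
# Usage: slice_urls <visit-url-with-query> <number-of-slices>
slice_urls() {
  _i=0
  while [ "$_i" -lt "$2" ]; do
    printf '%s&slices=%s&sliceId=%s\n' "$1" "$2" "$_i"
    _i=$((_i + 1))
  done
}

slice_urls "http://localhost:8080/document/v1/?cluster=music&selection=true" 8
```

Each printed URL is the starting point for one slice; continuation tokens are then followed separately per slice.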
Pro tip: It is easy to generate a /document/v1 request by using the Vespa CLI, with the -v option to output a generated /document/v1 request. Example:
$ vespa document -v ext/A-Head-Full-of-Dreams.json
curl -X POST -H 'Content-Type: application/json' --data-binary @ext/A-Head-Full-of-Dreams.json http://127.0.0.1:8080/document/v1/mynamespace/music/docid/a-head-full-of-dreams
Success: put id:mynamespace:music::a-head-full-of-dreams
See the document JSON format for creating JSON payloads.
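This is a quick guide to dumping random documents from a cluster to get started:
To get documents from a cluster, look up the content cluster ID in the application's services.xml, for example <content id="music" version="1.0">.
Use the cluster ID to start dumping document IDs (skip jq for full JSON):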
$ curl -s 'http://localhost:8080/document/v1/?cluster=music&wantedDocumentCount=10&timeout=60s' | \
jq -r .documents[].id
id:mynamespace:music::love-is-here-to-stay
id:mynamespace:music::a-head-full-of-dreams
id:mynamespace:music::hardwired-to-self-destruct
wantedDocumentCount is useful to let the operation run longer to find documents, to avoid an empty result. This operation is a scan through the corpus, and it is normal to get an empty result together with a continuation token.
Look up the document with ID id:mynamespace:music::love-is-here-to-stay:
$ curl -s 'http://localhost:8080/document/v1/mynamespace/music/docid/love-is-here-to-stay' | jq .
When troubleshooting documents not found using the query API, use vespa visit to export the documents. Then compare the id field with other user-defined id fields in the query.
$ vespa visit
Find more details on the components of the document ID.
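The examples in this guide suggest a direct mapping between a document ID and its /document/v1 path. The sketch below illustrates that mapping, including the n= and g= modifiers. The function name docid_to_path is hypothetical, and the user-specified part of the ID is assumed to need no URL encoding.

```shell
# Hypothetical helper: convert a document ID to a /document/v1 path.
# Handles id:<namespace>:<type>::<key>, id:<namespace>:<type>:n=<number>:<key>
# and id:<namespace>:<type>:g=<group>:<key>.
docid_to_path() {
  case $1 in
    id:*:*:*:*) ;;
    *) return 1 ;;
  esac
  _rest=${1#id:}
  _ns=${_rest%%:*};   _rest=${_rest#*:}
  _type=${_rest%%:*}; _rest=${_rest#*:}
  _mod=${_rest%%:*};  _key=${_rest#*:}
  case $_mod in
    '')  printf '/document/v1/%s/%s/docid/%s\n' "$_ns" "$_type" "$_key" ;;
    n=*) printf '/document/v1/%s/%s/number/%s/%s\n' "$_ns" "$_type" "${_mod#n=}" "$_key" ;;
    g=*) printf '/document/v1/%s/%s/group/%s/%s\n' "$_ns" "$_type" "${_mod#g=}" "$_key" ;;
    *)   return 1 ;;
  esac
}

docid_to_path "id:mynamespace:music::a-head-full-of-dreams"
```

The Vespa CLI with -v remains the simplest way to confirm the path for a given ID.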
Document not found responses look like:
$ curl http://127.0.0.1:8080/document/v1/mynamespace/music/docid/non-existing-doc
The response might look like an empty document. Use -v for more output:
$ curl -v http://127.0.0.1:8080/document/v1/mynamespace/music/docid/non-existing-doc
> GET /document/v1/mynamespace/music/docid/non-existing-doc HTTP/1.1
> Host: 127.0.0.1:8080
> User-Agent: curl/7.88.1
> Accept: */*
>
< HTTP/1.1 404 Not Found
< Date: Fri, 26 May 2023 08:53:20 GMT
< Content-Type: application/json;charset=utf-8
< Content-Length: 108
Observe the 404 Not Found.
Using the Vespa CLI is great for troubleshooting. Use -v for verbose output; this prints an equivalent curl command:
$ vespa document get -v id:mynamespace:music::non-existing-doc
curl -X GET http://127.0.0.1:8080/document/v1/mynamespace/music/docid/non-existing-doc
Error: Invalid document operation: 404 Not Found
Query results contain an id for each hit. Query result IDs are not the same as document IDs. Use a separate field for the document ID, if needed.
Delete all documents in music schema, with security credentials:
$ curl -X DELETE \
  --cert data-plane-public-cert.pem --key data-plane-private-key.pem \
  "http://localhost:8080/document/v1/mynamespace/music/docid?selection=true&cluster=my_cluster"
Do not use group or number modifiers with regular indexed mode document types. These are special cases that only work as expected for document types with mode=streaming or mode=store-only. Examples:
Get
Get a document in a group:
$ curl http://localhost:8080/document/v1/mynamespace/music/number/23/some_key
$ curl http://localhost:8080/document/v1/mynamespace/music/group/mygroupname/some_key

Visit
Visit all documents for a group:
$ curl http://localhost:8080/document/v1/namespace/music/number/23/
$ curl http://localhost:8080/document/v1/namespace/music/group/mygroupname/