wantedDocumentCount is useful to let the operation run longer to find documents,
to avoid an empty result.
This operation is a scan through the corpus,
and it is normal to get an empty result together with a continuation token.
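A minimal sketch of how a client might handle this: an empty result that still carries a continuation token means the scan is not finished, so the client issues another request. The helper name `next_visit_url` is hypothetical, and the response is assumed to be a JSON object with an optional `continuation` field; other parameters of the original request (selection, cluster, and so on) would presumably be repeated as well.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def next_visit_url(base_url: str, page: Dict,
                   wanted_document_count: Optional[int] = None) -> Optional[str]:
    """Return the URL of the follow-up visit request, or None when the visit is done.

    Assumption: `page` is the parsed JSON response, with the token under "continuation".
    """
    token = page.get("continuation")
    if token is None:
        # No token: the scan has covered the whole corpus.
        return None
    params = {"continuation": token}
    if wanted_document_count is not None:
        # Lets each request run longer to collect documents before returning.
        params["wantedDocumentCount"] = wanted_document_count
    return f"{base_url}?{urlencode(params)}"
```

Note that the decision to continue depends only on the token, never on whether the page contained documents.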
Look up the document with id id:mynamespace:music::love-id-here-to-stay:
$ curl -X POST -H "Content-Type:application/json" --data '
"album": "A Head Full of Dreams",
Updates to nonexistent documents are supported using create.
An empty document is created on the content nodes, before the update is applied.
This simplifies client code in the case of multiple writers. Example:
create can be used in combination with a condition.
If the document does not exist, the condition will be ignored
and a new document with the update applied is automatically created.
Otherwise, the condition must match for the update to take place.
If all existing replicas of a document are missing
when an update with "create": true is executed, a new document will always be created.
This happens even if a condition has been given.
If the existing replicas become available later,
their version of the document will be overwritten by the newest update since it has a higher timestamp.
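As an illustration of combining create with a condition, the sketch below builds an update request without sending it. It assumes the update is sent as a PUT with `create` and `condition` as request parameters and an `assign` operation in the body; the helper name `build_upsert`, the endpoint `http://localhost:8080`, and the `artist` field in the condition are hypothetical.

```python
import json
from typing import Dict, Optional, Tuple
from urllib.parse import quote, urlencode


def build_upsert(endpoint: str, namespace: str, doctype: str, docid: str,
                 assignments: Dict[str, object],
                 condition: Optional[str] = None) -> Tuple[str, str, str]:
    """Return (method, url, body) for an update that creates the document if missing."""
    params = {"create": "true"}
    if condition is not None:
        # Ignored when the document does not exist; must match otherwise.
        params["condition"] = condition
    path = "/".join(quote(part, safe="")
                    for part in (namespace, doctype, "docid", docid))
    url = f"{endpoint}/document/v1/{path}?{urlencode(params)}"
    # Each field is updated with an "assign" operation.
    body = json.dumps({"fields": {name: {"assign": value}
                                  for name, value in assignments.items()}})
    return "PUT", url, body
```

For example, `build_upsert("http://localhost:8080", "mynamespace", "music", "love-id-here-to-stay", {"album": "A Head Full of Dreams"}, condition='music.artist=="Coldplay"')` yields a request that can be handed to any HTTP client.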
See document expiry
for auto-created documents: it is possible to create documents that do not match the selection criterion.
Note that a visit with a selection is a linear scan over all the music documents
in the request examples in the table above.
Each complete visit thus requires the selection expression to be evaluated for all documents.
Running concurrent visits with selections that match disjoint subsets of the document corpus
is therefore a poor way of increasing throughput,
as work is duplicated across each such visit.
Fortunately, the API offers other options for increasing throughput:
Split the corpus into any number of smaller slices,
each to be visited by a separate, independent series of HTTP requests.
This is by far the most effective setting to change,
as it allows visiting through all HTTP containers simultaneously,
and from any number of clients—either of which is
typically the bottleneck for visits through /document/v1.
A good value for this setting is at least a handful per container.
Increase backend concurrency
so each visit HTTP response is promptly filled with documents.
When using this together with slicing (above),
take care to also stream the HTTP responses (below),
to avoid buffering too much data in the container layer.
When a high number of slices is specified, this setting may have no effect.
Stream the HTTP responses.
This lets you receive data earlier, and more of it per request, reducing HTTP overhead.
It also minimizes memory usage due to buffering in the container,
allowing higher concurrency per container.
Always using this is recommended; it is off by default only for backwards compatibility.
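The three options above can be combined in one set of requests. The sketch below generates one visit URL per slice, assuming the request parameters are named `slices`, `sliceId` (numbered from 0), `concurrency`, and `stream`; the helper name `slice_visit_urls` is hypothetical.

```python
from typing import List, Optional
from urllib.parse import urlencode


def slice_visit_urls(base_url: str, slices: int,
                     concurrency: Optional[int] = None,
                     stream: bool = True) -> List[str]:
    """Build one visit URL per slice; each URL starts an independent series of requests."""
    if slices < 1:
        raise ValueError("slices must be at least 1")
    urls = []
    for slice_id in range(slices):
        params = {"slices": slices, "sliceId": slice_id}
        if concurrency is not None:
            # Backend concurrency; may have no effect with many slices.
            params["concurrency"] = concurrency
        if stream:
            # Streaming limits buffering in the container layer.
            params["stream"] = "true"
        urls.append(f"{base_url}?{urlencode(params)}")
    return urls
```

Each returned URL is the first request of its slice; follow-up requests for that slice add the continuation token from the previous response.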
To iterate over documents, use visiting — sample output:
Note the continuation token — use this in the next request for more data.
Below is a sample script dumping all data using jq for JSON parsing.
It splits the corpus into 8 slices by default;
using a number of slices at least four times the number of container nodes is recommended for high throughput.
Timeout can be set lower for benchmarking.
(Each request has a maximum timeout of 60s to ensure progress is saved at regular intervals.)
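The same idea can be sketched in Python. This is not the script referred to above, only an illustration of its structure: one continuation loop per slice, with all slices running in parallel. The `fetch_page(slice_id, continuation)` callable is a hypothetical stand-in for the HTTP request, and the response is assumed to carry `documents` and an optional `continuation` field.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

# Hypothetical signature: performs one visit request for a slice and
# returns the parsed JSON response.
FetchPage = Callable[[int, Optional[str]], Dict]


def dump_slice(fetch_page: FetchPage, slice_id: int) -> List[Dict]:
    """Follow continuation tokens for one slice until none is returned."""
    docs: List[Dict] = []
    continuation: Optional[str] = None
    while True:
        page = fetch_page(slice_id, continuation)
        docs.extend(page.get("documents", []))  # a page may be empty
        continuation = page.get("continuation")
        if continuation is None:
            return docs


def dump_all(fetch_page: FetchPage, slices: int = 8) -> List[Dict]:
    """Visit all slices concurrently and return documents in slice order."""
    with ThreadPoolExecutor(max_workers=slices) as pool:
        per_slice = pool.map(lambda s: dump_slice(fetch_page, s), range(slices))
        return [doc for docs in per_slice for doc in docs]
```

A real `fetch_page` would pass the slice count, slice ID, continuation token, and timeout as request parameters on every call.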
Query results can contain hits like:
Query result IDs are not the same as Document IDs.
Use a separate field for the document ID, if needed.
Delete all documents in music schema, with security credentials: