/document/v1 API guide

This guide shows how to use the /document/v1 API through request examples. Refer to the /document/v1 API reference for details.

Request examples

GET
Get a document:
$ curl http://hostname:8080/document/v1/my_namespace/my_document-type/docid/1
Get a document in a group:
$ curl http://hostname:8080/document/v1/namespace/music/number/23/some_key
$ curl http://hostname:8080/document/v1/namespace/music/group/groupname/some_key
Visit all documents:
$ curl http://hostname:8080/document/v1/namespace/music/docid
Visit all documents using continuation:
$ curl "http://hostname:8080/document/v1/namespace/music/docid?continuation=AAAAEAAAAAAAAAM3AAAAAAAAAzYAAAAAAAEAAAAAAAFAAAAAAABswAAAAAAAAAAA"
Visit using a selection:
$ curl "http://hostname:8080/document/v1/namespace/music/docid?selection=music.genre=='blues'"
Visit all documents for a group:
$ curl http://hostname:8080/document/v1/namespace/music/number/23/
$ curl http://hostname:8080/document/v1/namespace/music/group/groupname/
Visit documents across all non-global document types and namespaces stored in content cluster mycluster:
$ curl "http://hostname:8080/document/v1/?cluster=mycluster"
Visit documents across all global document types and namespaces stored in content cluster mycluster:
$ curl "http://hostname:8080/document/v1/?cluster=mycluster&bucketSpace=global"
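The selection example above puts the expression into the URL as-is. Selection expressions often contain characters such as =, ' and spaces, so it can be safer to percent-encode them first. The following is a minimal sketch, not part of the API: urlencode and visit_url are hypothetical helper names, and the encoder assumes ASCII-only input.

```shell
# Hypothetical helper: percent-encode an ASCII string for use in a query
# parameter. Letters, digits and . ~ _ - are kept; all else becomes %XX.
urlencode() (
  LC_ALL=C
  s=$1 out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [a-zA-Z0-9.~_-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;   # "'c" yields the character code
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

# Hypothetical helper: print a visit URL with an encoded selection.
# $1: base URL, $2: namespace, $3: document type, $4: selection expression
visit_url() {
  printf '%s/document/v1/%s/%s/docid?selection=%s\n' \
    "$1" "$2" "$3" "$(urlencode "$4")"
}
```

Example use:
$ curl "$(visit_url http://hostname:8080 namespace music "music.genre=='blues'")"
curl can also do the encoding itself when given -G together with --data-urlencode "selection=music.genre=='blues'".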
POST
Post data in the document JSON format:
$ curl -X POST -H "Content-Type:application/json" --data-binary @document-1.json http://hostname:8080/document/v1/namespace/music/docid/1
{
    "fields": {
        "songs": "Knockin on Heaven's Door; Mr. Tambourine Man",
        "title": "Best of Bob Dylan",
        "url": "http://music.yahoo.com/bobdylan/BestOf"
    }
}
PUT
Do a partial update of a document:
$ curl -X PUT -H "Content-Type:application/json" --data-binary @update.json http://hostname:8080/document/v1/namespace/music/docid/1
{
    "fields": {
        "title": {
            "assign": "New title"
        }
    }
}
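For simple string fields, the update body can also be generated instead of kept in a file. This is a rough sketch under stated assumptions: assign_update_json is a hypothetical helper, it handles string values only, and it escapes just backslashes and double quotes (no newlines or control characters).

```shell
# Hypothetical helper: print a partial-update body that assigns the string
# value $2 to the field named $1.
assign_update_json() (
  # Escape backslashes first, then double quotes, so the value is valid
  # inside a JSON string
  value=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"fields":{"%s":{"assign":"%s"}}}\n' "$1" "$value"
)
```

The output can be piped straight to curl, which reads the request body from stdin when given @-:
$ assign_update_json title "New title" | curl -X PUT -H "Content-Type:application/json" --data-binary @- http://hostname:8080/document/v1/namespace/music/docid/1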
DELETE
Delete a document:
$ curl -X DELETE http://hostname:8080/document/v1/namespace/music/docid/1

ID examples

  • Uniform distribution: id:mynamespace:music::mydocid-123
  • Data access is grouped, e.g. personal data (each user has a numeric user id): id:mynamespace:music:n=12345:mydocid-123
  • Using a string identifier to group data: id:mynamespace:music:g=mymusicsite.com:mydocid-123
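The three ID forms above correspond to the docid, number and group paths used in the request examples. As an illustration only, the mapping could be sketched like this; docid_to_path is a hypothetical helper, and it assumes that no part of the ID needs URL encoding.

```shell
# Hypothetical helper: print the /document/v1 path for a document ID of the
# form id:<namespace>:<document type>:<n=NUMBER, g=GROUP or empty>:<user part>
docid_to_path() (
  id=$1
  case $id in
    id:*:*:*:*) ;;
    *) echo "invalid document ID: $id" >&2; exit 1 ;;
  esac
  rest=${id#id:}
  namespace=${rest%%:*}; rest=${rest#*:}
  doctype=${rest%%:*};   rest=${rest#*:}
  modifier=${rest%%:*};  rest=${rest#*:}   # rest is now the user-specified part
  case $modifier in
    "")  printf '/document/v1/%s/%s/docid/%s\n' "$namespace" "$doctype" "$rest" ;;
    n=*) printf '/document/v1/%s/%s/number/%s/%s\n' \
           "$namespace" "$doctype" "${modifier#n=}" "$rest" ;;
    g=*) printf '/document/v1/%s/%s/group/%s/%s\n' \
           "$namespace" "$doctype" "${modifier#g=}" "$rest" ;;
    *)   echo "unsupported ID modifier: $modifier" >&2; exit 1 ;;
  esac
)
```

Example use:
$ curl "http://hostname:8080$(docid_to_path id:mynamespace:music:n=12345:mydocid-123)"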

Data dump

To iterate over all documents, use visiting. Sample output:

{
    "pathId": "/document/v1/namespace/doc/docid",
    "documents": [
        {
            "id": "id:namespace:doc::id-1",
            "fields": {
                "title": "Document title 1",
                ...
            }
        },
        ...
    ],
    "continuation": "AAAAEAAAAAAAAAM3AAAAAAAAAzYAAAAAAAEAAAAAAAFAAAAAAABswAAAAAAAAAAA"
}
Note the continuation token: pass it in the next request to get more data. The following sample script dumps all data, using jq for JSON parsing:

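#!/bin/bash

# Print each command as it runs
set -x

ENDPOINT="https://endpoint.vespa.oath.cloud"
NAMESPACE=open
DOCTYPE=doc
CLUSTER=documentation
CERT=data-plane-public-cert.pem
KEY=data-plane-private-key.pem

continuation=""
idx=0

# Everything between "while" and "do" is the loop condition. Its exit status
# is that of the last command: jq -e fails when the response has no
# continuation token, which ends the loop after the final chunk.
while
  ((idx+=1))
  echo "$continuation"
  printf -v out "%05d" "$idx"
  filename="${NAMESPACE}-${DOCTYPE}-${out}.data.gz"
  echo "Fetching data..."
  # Save a gzipped copy of the response and extract the next token from it
  token=$(curl -s --cert "${CERT}" --key "${KEY}" \
          "${ENDPOINT}/document/v1/${NAMESPACE}/${DOCTYPE}/docid?wantedDocumentCount=20&concurrency=4&cluster=${CLUSTER}&${continuation}" \
          | tee >(gzip > "${filename}") | jq -re .continuation)
do
  continuation="continuation=${token}"
done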
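The same continuation loop can be written with the HTTP call split out into its own function, which makes the paging logic easier to follow and to try out without a running cluster. This is a sketch under assumptions, not a replacement for the script above: fetch_page and visit_all are hypothetical names, the endpoint is assumed to need no certificate, and curl and jq are assumed to be installed.

```shell
# Download one page of visit results to file $2 and print its continuation
# token. Prints nothing when the response has no token (the last page).
fetch_page() {
  curl -s "$1" | tee "$2" | jq -r '.continuation // empty'
}

# Visit everything matched by the visit URL $1, saving pages as
# $2-1.json, $2-2.json, ... and printing the number of pages fetched.
visit_all() (
  base=$1 prefix=$2 token= pages=0
  # Use & if the URL already has a query string, ? otherwise
  case $base in *\?*) sep='&' ;; *) sep='?' ;; esac
  while :; do
    pages=$((pages + 1))
    url=$base
    if [ -n "$token" ]; then url="$base${sep}continuation=$token"; fi
    token=$(fetch_page "$url" "$prefix-$pages.json")
    # No token in the response means this was the last page
    [ -n "$token" ] || break
  done
  echo "$pages"
)
```

Example use:
$ visit_all "http://hostname:8080/document/v1/namespace/music/docid?cluster=mycluster" music-dump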

Using fieldsets

When visiting across all document types, some internal document fields (e.g. Geo fields) set by Vespa may be returned as part of the results. To avoid this, limit visiting to just one document type using selection and explicitly filter these internal fields away using fieldSet:

$ curl "http://hostname:8080/document/v1/?cluster=mycluster&selection=mydoctype&fieldSet=mydoctype:%5Bdocument%5D"
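In the request above, %5B and %5D are the percent-encoded forms of [ and ], so the fieldSet value is mydoctype:[document]. When the same request is needed for several document types, the URL can be built by a small function. This is a sketch only, and single_type_visit_url is a hypothetical helper name.

```shell
# Hypothetical helper: print the URL for visiting a single document type in
# a content cluster, returning only the fields of its [document] fieldset.
# $1: base URL, $2: content cluster, $3: document type
single_type_visit_url() {
  # In a printf format a literal % is written %%, so %%5B prints %5B
  printf '%s/document/v1/?cluster=%s&selection=%s&fieldSet=%s:%%5Bdocument%%5D\n' \
    "$1" "$2" "$3" "$3"
}
```

Example use:
$ curl "$(single_type_visit_url http://hostname:8080 mycluster mydoctype)"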