Find documents using a query, delete, repeat. Pseudocode:
while true; do
    query and read document ids, if empty exit
    delete document ids using /document/v1
    wait a sec    # optional, add wait to reduce load while deleting
done
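The loop above could be sketched in Python roughly as follows. This is a minimal sketch under several assumptions, not a reference implementation: the endpoint `http://localhost:8080`, the helper names, and the batch size are all hypothetical. It also assumes that each hit's `id` field holds a full document id of the plain form `id:<namespace>:<doctype>::<local id>`; ids with `n=` or `g=` modifiers are rejected rather than handled.

```python
import json
import time
import urllib.parse
import urllib.request

ENDPOINT = "http://localhost:8080"  # assumption: container endpoint of the application


def doc_id_to_path(doc_id):
    """Map id:<namespace>:<doctype>::<local id> to its /document/v1 path."""
    scheme, namespace, doctype, key_values, local = doc_id.split(":", 4)
    if scheme != "id" or key_values:
        # n= / g= modifiers use a different path layout; not covered by this sketch
        raise ValueError(f"unsupported document id: {doc_id}")
    return "/document/v1/{}/{}/docid/{}".format(
        namespace, doctype, urllib.parse.quote(local, safe=""))


def query_ids(yql, hits=100):
    """Return the ids of up to `hits` documents matching the YQL query."""
    params = urllib.parse.urlencode({"yql": yql, "hits": hits})
    with urllib.request.urlopen(f"{ENDPOINT}/search/?{params}") as response:
        result = json.load(response)
    # assumption: the hit id is the document id
    return [hit["id"] for hit in result["root"].get("children", [])]


def delete_id(doc_id):
    """Delete one document through /document/v1."""
    request = urllib.request.Request(
        ENDPOINT + doc_id_to_path(doc_id), method="DELETE")
    urllib.request.urlopen(request).close()


def delete_until_empty(find_ids, delete, pause_s=1.0):
    """Query, delete, repeat until the query is empty; return the delete calls made."""
    calls = 0
    while True:
        ids = find_ids()
        if not ids:
            return calls
        for doc_id in ids:
            delete(doc_id)
            calls += 1
        time.sleep(pause_s)  # optional: reduces load while deleting
```

With a hypothetical `music` schema, a call might look like `delete_until_empty(lambda: query_ids("select * from music where year < 2000"), delete_id)`. Passing the query and delete steps in as functions keeps the loop itself independent of HTTP, which makes it easy to dry-run.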
Like the query-and-delete loop above, but use the Vespa feed client.
Instead of deleting documents one by one, stream remove operations to the API (write a Java program for this),
or append them to a file and use the binary:
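For the file-based variant, the remove operations have to be written in the Vespa JSON feed format, where a remove is an object of the form `{"remove": "<document id>"}`. The following is a small sketch of generating such a file; the function name, the file name `deletes.json`, and the example ids are hypothetical, and it writes the whole JSON array in one go rather than appending.

```python
import json


def write_remove_operations(doc_ids, path):
    """Write one remove operation per document id as a JSON array; return the count."""
    operations = [{"remove": doc_id} for doc_id in doc_ids]
    with open(path, "w", encoding="utf-8") as feed_file:
        json.dump(operations, feed_file, indent=2)
    return len(operations)


if __name__ == "__main__":
    # Hypothetical ids; in practice these would come from a query or a visit.
    write_remove_operations(
        ["id:mynamespace:music::a", "id:mynamespace:music::b"], "deletes.json")
```

The resulting file can then be handed to the feed client in place of a file of put operations.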
Use a document selection to expire documents.
The selection names the documents to keep: all documents not matching the expression are deleted.
Parent documents and imported fields can be used to expire a whole set of documents.
The content node iterates over the corpus and deletes the non-matching documents (which are later compacted out):
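Because the selection describes what to keep rather than what to delete, the expression is easy to write backwards. The following toy model only illustrates that semantics; it is not how the content node is implemented, and the corpus, the `timestamp` field, and the cutoff are made up for the example. A plain Python predicate stands in for the selection expression.

```python
def garbage_collect(documents, keep):
    """Split documents into (kept, removed): those matching `keep` stay, the rest go."""
    kept = [doc for doc in documents if keep(doc)]
    removed = [doc for doc in documents if not keep(doc)]
    return kept, removed


# Hypothetical corpus and rule: keep documents newer than the cutoff.
corpus = [
    {"id": "id:ns:music::a", "timestamp": 1_700_000_000},
    {"id": "id:ns:music::b", "timestamp": 1_600_000_000},
]
cutoff = 1_650_000_000
kept, removed = garbage_collect(corpus, lambda doc: doc["timestamp"] > cutoff)
print([doc["id"] for doc in removed])  # the older document, "b", does not match and is removed
```

Writing the rule as "newer than the cutoff" keeps recent documents; writing it as "older than the cutoff" would instead delete everything recent.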
It is possible to drop a schema, with all its content, by removing the mapping to the content cluster.
To understand what is happening, here is the status before the procedure:
# ls -l $VESPA_HOME/var/db/vespa/search/cluster.music/n0/documents
drwxr-xr-x 6 vespa vespa 4096 Oct 25 16:59 books
drwxr-xr-x 6 vespa vespa 4096 Oct 25 12:47 music
The procedure, deploying first without and then again with the schema, is an efficient way to drop all of its documents.
After the procedure, it is good practice to remove validation-overrides.xml
or the schema-removal element inside, to avoid accidental data loss later.
The directory listing above is just for illustration.
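The clean-up of validation-overrides.xml mentioned above can be checked automatically, for example before each deployment. The sketch below assumes the usual layout of that file, with `allow` elements carrying an `until` attribute and the override name as text; the function name and the sample content are hypothetical.

```python
import xml.etree.ElementTree as ET


def schema_removal_overrides(validation_overrides_xml):
    """Return the 'until' dates of all schema-removal overrides in the XML text."""
    root = ET.fromstring(validation_overrides_xml)
    return [allow.get("until") for allow in root.iter("allow")
            if (allow.text or "").strip() == "schema-removal"]


# Hypothetical file content left over after dropping a schema.
sample = """
<validation-overrides>
    <allow until="2023-10-31">schema-removal</allow>
</validation-overrides>
"""
leftover = schema_removal_overrides(sample)
if leftover:
    print("schema-removal is still allowed until:", ", ".join(leftover))
```

An empty list means no schema-removal override is left in the file. To check a file on disk, read it and pass its content to the function.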