Find documents using a query, delete, repeat. Pseudocode:
while True:
    query and read document ids; if empty, exit
    delete document ids using /document/v1
    wait a sec  # optional, add wait to reduce load while deleting
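The loop above could be sketched in Python roughly as follows. This is a minimal sketch, not a reference implementation: the endpoint `http://localhost:8080`, the helper names, and the assumption that each hit exposes its id in the `documentid` summary field are all illustrative. It also handles only plain document ids of the form `id:namespace:doctype::local-id`, without `n=` or `g=` modifiers.

```python
import json
import time
import urllib.parse
import urllib.request

ENDPOINT = "http://localhost:8080"  # assumption: a local Vespa container


def document_v1_path(doc_id):
    """Map a plain id (id:namespace:doctype::local-id) to a /document/v1 path."""
    parts = doc_id.split(":", 4)
    if len(parts) != 5 or parts[0] != "id" or parts[3]:
        raise ValueError(f"unsupported document id: {doc_id}")
    namespace, doctype, local_id = (
        urllib.parse.quote(p, safe="") for p in (parts[1], parts[2], parts[4])
    )
    return f"/document/v1/{namespace}/{doctype}/docid/{local_id}"


def query_ids(yql, hits=100):
    """Run a query and return the document ids found in the result (hypothetical helper)."""
    params = urllib.parse.urlencode({"yql": yql, "hits": hits})
    with urllib.request.urlopen(f"{ENDPOINT}/search/?{params}") as resp:
        result = json.load(resp)
    children = result.get("root", {}).get("children", [])
    # Assumption: the summary returned by the query includes `documentid`.
    return [
        hit["fields"]["documentid"]
        for hit in children
        if "documentid" in hit.get("fields", {})
    ]


def delete_id(doc_id):
    """Delete one document through /document/v1; raises on a non-2xx response."""
    request = urllib.request.Request(
        ENDPOINT + document_v1_path(doc_id), method="DELETE"
    )
    with urllib.request.urlopen(request) as resp:
        resp.read()


def delete_matching(find_ids, remove, pause_seconds=1.0):
    """Query, delete, repeat until the query returns nothing. Returns the delete count."""
    deleted = 0
    while True:
        ids = find_ids()
        if not ids:
            return deleted
        for doc_id in ids:
            remove(doc_id)
            deleted += 1
        time.sleep(pause_seconds)  # optional pause to reduce load while deleting
```

A call could look like `delete_matching(lambda: query_ids("select * from music where year < 2000"), delete_id)`, where the schema and field names are placeholders. Passing the query and delete steps as callables keeps the loop easy to try against stubs before pointing it at a real cluster.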
Use a document selection to expire documents.
This deletes all documents not matching the expression.
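Because the selection names the documents to keep, it is easy to get the condition backwards. The following Python sketch only simulates that semantics on in-memory dictionaries; it does not call Vespa. It assumes a selection along the lines of `music.timestamp > now() - 86400`, and the `timestamp` field and helper names are illustrative.

```python
import time

ONE_DAY_SECONDS = 86400


def matches_selection(doc, now):
    """Mimic a selection such as: music.timestamp > now() - 86400."""
    return doc["timestamp"] > now - ONE_DAY_SECONDS


def split_by_selection(docs, now=None):
    """Return (kept, expired): documents matching the selection are kept."""
    now = time.time() if now is None else now
    kept = [d for d in docs if matches_selection(d, now)]
    expired = [d for d in docs if not matches_selection(d, now)]
    return kept, expired
```

With this selection, a document written ten seconds ago lands in `kept`, while one older than a day lands in `expired`, that is, it is the non-matching document that would be removed.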
It is possible to use parent documents and imported fields for expiry of a document set.
The content node iterates over the corpus and deletes the documents that do not match; the space they occupied is reclaimed later by compaction.
It is possible to drop a schema, with all its content, by removing the mapping to the content cluster.
To understand what is happening, here is the status before the procedure:
# ls $VESPA_HOME/var/db/vespa/search/cluster.music/n0/documents
drwxr-xr-x 6 vespa vespa 4096 Oct 25 16:59 books
drwxr-xr-x 6 vespa vespa 4096 Oct 25 12:47 music
The procedure, deploying with and without the schema, is an efficient way to drop all documents.
After the procedure, it is good practice to remove validation-overrides.xml
or the schema-removal element inside, to avoid accidental data loss later.
The directory listing above is just for illustration.
Example
This is an end-to-end example of how to track the number of documents and delete a subset using a
selection string.
Feed sample documents
Feed a batch of documents, e.g. using the vector-search
sample application: