Refer to Vespa Support for more support options.
Ranking is perhaps the primary Vespa feature - we like to think of it as scalable, online computation. A rank profile is where the application's ranking logic is implemented, supporting simple types like double and complex types like tensor.
Supply ranking data in queries as query features (e.g. different weights per customer), or look it up in a Searcher.
Typically, a document (e.g. product) "feature vector"/"weights" will be compared to a user-specific vector (tensor).
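A minimal sketch of this pattern, assuming a hypothetical document field embedding and a query input query(user_profile) (names are illustrative, not from the source):

field embedding type tensor<float>(x[3]) {
    indexing: attribute
}

rank-profile personalized {
    inputs {
        query(user_profile) tensor<float>(x[3])
    }
    first-phase {
        # dot product between the user vector and the document vector
        expression: sum(query(user_profile) * attribute(embedding))
    }
}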
Vespa doesn't have specific support for storing customer data as such. You can store this data as a separate document type in Vespa and look it up before passing the query, or store the customer metadata together with the customer's other metadata (i.e. login information) and pass it along with the query when you send it to the backend. Find an example of how to look up data in album-recommendation-docproc.
Create a tensor in the ranking function from arrays or weighted sets using the tensorFrom... functions - see document features.
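A sketch using tensorFromWeightedSet, assuming a weighted set attribute named weights and a query tensor input query(user_weights) (both hypothetical names):

rank-profile from-weighted-set {
    inputs {
        query(user_weights) tensor<float>(x{})
    }
    first-phase {
        # convert the weighted set attribute into a tensor with mapped dimension 'x'
        expression: sum(tensorFromWeightedSet(attribute(weights), x) * query(user_weights))
    }
}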
Pass a ranking feature like query(threshold) and use an if statement in the ranking expression - see retrieval and ranking. Example:
rank-profile drop-low-score {
    function my_score() {
        expression: ..... # custom first phase score
    }
    rank-score-drop-limit: 0.0
    first-phase {
        expression: if(my_score() < query(threshold), -1, my_score())
    }
}
Rank expressions are not evaluated lazily - this would require lambda arguments. Only doubles and tensors are passed between functions. Example:
function inline foo(tensor, defaultVal) {
    expression: if (count(tensor) == 0, defaultVal, sum(tensor))
}

function bar() {
    expression: foo(tensor, sum(tensor1 * tensor2))
}
Yes, this can be accomplished by configuring match-phase in the rank profile, or by adding a range query item using hitLimit to the query tree - see capped numeric range search. Both methods require an attribute field with fast-search. The capped range query is faster, but beware that if there are other restrictive filters in the query, one might end up with 0 hits, as the additional filters are applied as a post-filtering step over the hits from the capped range query. match-phase, on the other hand, is safe to use with filters or other query terms, and also supports diversification, which the capped range query term does not.
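A match-phase sketch, assuming a hypothetical single-value numeric attribute popularity defined with fast-search:

rank-profile capped {
    match-phase {
        attribute: popularity    # must be a single-value numeric attribute with fast-search
        max-hits: 10000
    }
}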
The returned relevance for a hit can become "-Infinity" instead of a double. This can happen in two cases: either the rank score is NaN (Not a Number) - for example, log(0) would produce -Infinity, and one can use isNan to guard against this - or grouping accesses hits with each(output(summary())) that are outside of what Vespa computed and cached on the heap, which is controlled by the keep-rank-count.
To hard-code documents to positions in the result set, see the pin results example.
There is no hard limit, see field size.
No enforced limit, except resource usage (memory). See field size.
E.g. a product is offered in a list of stores with a quantity per store. Use multivalue fields (array of struct) or parent/child. Which one to choose depends on the use case - see the discussion in the latter link.
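A schema sketch of the array-of-struct approach, with hypothetical struct and field names:

struct store_info {
    field store_id type string {}
    field quantity type int {}
}

field stores type array<store_info> {
    indexing: summary
    struct-field store_id { indexing: attribute }
    struct-field quantity { indexing: attribute }
}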
E.g. price and quantity available per store may change often, compared to the actual product attributes. Vespa supports partial updates of documents. Also, the parent/child feature is implemented to support use cases where child elements are updated frequently, while a more limited set of parent elements is updated less frequently.
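For example, a partial update changing only price and quantity (document type and field names hypothetical) could look like:

{
    "update": "id:mynamespace:product::123",
    "fields": {
        "price": { "assign": 129 },
        "quantity": { "assign": 42 }
    }
}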
See the Vespa Consistency Model. Vespa is not transactional in the traditional sense, it doesn't have strict ACID guarantees. Vespa is designed for high performance use-cases with eventual consistency as an acceptable (and to some extent configurable) trade-off.
Wildcard fields are not supported in Vespa. A workaround is to use a map field to store the wildcard fields. The map needs to be defined with indexing: attribute and is hence stored in memory. Refer to map.
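A map field sketch (field name hypothetical), with key and value defined as attributes so the map can be queried:

field properties type map<string, string> {
    indexing: summary
    struct-field key { indexing: attribute }
    struct-field value { indexing: attribute }
}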
Implement a document processor for this.
Set a selection criterion on the document element in services.xml. The criterion selects documents to keep. I.e. to purge documents "older than two weeks", the expression should be "newer than two weeks". Read more about document expiry.
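A sketch in services.xml, assuming a hypothetical timestamp field holding seconds since epoch - the selection keeps documents newer than two weeks (1209600 seconds):

<documents garbage-collection="true">
    <document type="music" mode="index"
              selection="music.timestamp &gt; now() - 1209600" />
</documents>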
Changing redundancy is a live and safe change (assuming there is headroom on disk / memory - e.g. going from 2 to 3 is 50% more). The time to migrate will be quite similar to what it took to feed initially - it is hard to say generally, as it depends on IO and index settings, like whether an HNSW index is built. To monitor progress, take a look at the multinode sample application for the clustercontroller status page - this shows buckets pending, live. Finally, use the .idealstate.merge_bucket.pending metric to track progress - when it is 0, there are no more data syncing operations - see monitor distance to ideal state.
Nodes will work as normal during data sync, and query coverage will be the same.
It does not, namespace is a mechanism to split the document space into parts that can be used for document selection - see documentation. The namespace is not indexed and cannot be searched using the query api, but can be used by visiting.
There are multiple things that can cause this, see visiting troubleshooting.
Run a query like vespa query "select * from sources * where true" and see the totalCount field. Alternatively, use metrics or vespa visit - see examples.
Faceting is called grouping in Vespa. Groups can be multi-level.
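For example, to facet on a hypothetical category attribute, group on it and count hits per value:

$ vespa query 'select * from sources * where true | all(group(category) each(output(count())))'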
Add filters to the query in YQL using boolean, numeric and text matching. Query terms can be annotated as filters, which means they are not highlighted when bolding results.
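A sketch of a filter-annotated term in YQL (field names hypothetical):

select * from sources * where title contains "madonna" and ({filter: true}genre contains "pop")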
One way is to describe items using tensors and query for the nearest neighbors - using full precision or approximate (ANN) - the latter is used when the set is too large for an exact calculation. Apply filters to the query to limit the neighbor candidate set. Using dot products or weakAnd are alternatives.
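A nearestNeighbor sketch, assuming a hypothetical document tensor field embedding and a query tensor input query(query_embedding) passed with the request:

select * from sources * where {targetHits: 100}nearestNeighbor(embedding, query_embedding)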
Vespa does not have a stop-word concept inherently. See the sample app for how to use filter terms.
Trying to request more than 400 hits in a query gives this error:
{'code': 3, 'summary': 'Illegal query', 'message': '401 hits requested, configured limit: 400.'}
To fix, increase maxHits in a query profile, e.g. <field name="maxHits">500</field> in search/query-profiles/default.xml (create the file as needed).
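A complete search/query-profiles/default.xml could look like this (500 is an arbitrary example value):

<query-profile id="default">
    <field name="maxHits">500</field>
</query-profile>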
The query timeout can be increased, but it will still be costly and likely impact other queries - a large limit more so than a large offset. It can be made cheaper by using a smaller document summary, and avoiding fields on disk if possible. See the UserProfileSearcher for how to create a new query to fetch data - this creates a new Query, sets a new root and parameters - then fills the Hits.
See the sub-query question above; in addition, add something like:
Yes, using the in query operator. Example:
select * from data where user_id in (10, 20, 30)
The best article on the subject is multi-lookup set filtering. Refer to the in operator example on how to use it programmatically in a Java Searcher.
Use the in query operator. Example:
select * from data where category in ('cat1', 'cat2', 'cat3')
See multi-lookup set filtering above for more details.
Count all documents using a query like select * from doc where true - this counts all documents from the "doc" source. Using select * from doc where true limit 0 will return the count and no hits; alternatively, add hits=0. Pass ranking.profile=unranked to make the query less expensive to run. If an estimate is good enough, use hitcountestimate=true.
Yes - a deployment warning with This may lead to recall and ranking issues is emitted when fields with conflicting tokenization are put in the same fieldset. This is because a given query item searching one fieldset is tokenized just once, so there's no right choice of tokenization in this case. If you have user input that you want to apply to multiple fields with different tokenization, include the userInput multiple times in the query:
select * from sources * where ({defaultIndex: 'fieldsetOrField1'}userInput(@query)) or ({defaultIndex: 'fieldsetOrField2'}userInput(@query))
More details on stack overflow.
Find query timeout details in the Query API Guide and the Query API Reference.
Backslash is used to escape special characters in YQL. For example, to query with a literal backslash, which is useful in regexps, you need to escape it with another backslash: \\. Unescaped backslashes in YQL will lead to errors like "token recognition error at: '\'".
In addition, Vespa CLI unescapes double backslashes to single ones (while single backslashes are left alone), so if you query with Vespa CLI you need one more level of escaping: \\\\. The same applies to strings in Java.
Also note that both log messages and JSON results escape backslashes, so a single \ is displayed as \\.
E.g. two select queries with slightly different filtering conditions and a limit operator for each subquery. This cannot be expressed via OR conditions selecting both collections of documents - something equivalent to:
SELECT 1 AS x
UNION ALL
SELECT 2 AS y;
This is not possible - run two queries instead. Alternatively, split a single incoming query into two running in parallel in a Searcher - example:
// In a Searcher: run both queries in parallel, then call get() on the FutureResults to combine the Results
FutureResult futureResult = new AsyncExecution(settings).search(query);
FutureResult otherFutureResult = new AsyncExecution(settings).search(otherQuery);
No, there is no index or attribute data structure that allows efficient searching for documents where an array field has a certain number of elements or items.
The visiting API using document selections supports it, with a linear scan over all documents. If the field is an attribute, one can query using grouping to identify NaN values, see count and list fields with NaN.
See the random.match rank feature - example:
rank-profile random {
first-phase {
expression: random.match
}
}
Run queries, seeding the random generator:
$ vespa query 'select * from music where true' \
ranking=random \
rankproperty.random.match.seed=2
See result diversity for strategies on how to create result sets from different sources.
If you want to search for the most dissimilar items, you can, with angular distance, multiply your clip_query_embedding by the scalar -1. Then you are searching for the points that are closest to the point which is farthest away from your clip_query_embedding.
Also see a pyvespa example.
The best option is to use the --verbose flag, like vespa feed --verbose myfile.jsonl - see documentation.
A common problem is a mismatch in schema names and document IDs - a schema like:
schema article {
document article {
...
}
}
will have a document feed like:
{"put": "id:mynamespace:article::1234", "fields": { ... }}
Note that the namespace is not mentioned in the schema, and the schema name is the same as the document name.
This configuration is a combination of content and container cluster configuration, see indexing and feed troubleshooting.
This is often a problem if using document expiry: documents that are already expired are not persisted - they are silently dropped and ignored. Feeding stale test data with old timestamps in combination with document expiry can cause this behavior.
Using too many HTTP clients can generate a 429 response code. The Vespa sample apps use vespa feed which uses HTTP/2 for high throughput - it is better to stream the feed files through this client.
Vespa does not have a Kafka connector. Refer to third-party connectors like kafka-connect-vespa.
E.g. integrating NER, word sense disambiguation, specific intent detection. Vespa supports these things well:
E.g. instead of using terms or n-grams as the unit, we might use terms with specific word senses - e.g. bark (dog bark) vs. bark (tree bark), or BCG (company) vs. BCG (vaccine name). Creating a new index format means changing the core. However, for the examples above, one just needs control over the tokens which are indexed (and queried) - that is easily done in some Java code. The simplest way to do this is to plug in a custom tokenizer. That gets called from the query parser and bundled linguistics processing Searchers, as well as from the Document Processor creating the annotations that are consumed by the indexing operation. Since all of that is Searchers and Docprocs, which you can replace and/or surround with custom components, you can take full control over these things without modifying the platform itself.
It provides the building blocks but not an out-of-the-box solution. We can write a Searcher to detect query-side entities and rewrite the query, and a DocProc if we want to handle them in some special way on the indexing side.
You can write a document processor for text extraction, Vespa does not provide it out of the box.
Imported fields from parent documents are defined as attributes, and have limited text match modes (i.e. indexing: index cannot be used). Details.
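A parent/child sketch with an imported field (document and field names hypothetical) - the reference field lives inside the document, the import statement outside it:

field artist_ref type reference<artist> {
    indexing: attribute
}
import field artist_ref.name as artist_name {}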
If you have added vectors to your documents and queries, and see that the rank feature closeness(field, yourEmbeddingField) produces 1.0 for all documents, you are likely using distance-metric: innerproduct/prenormalized-angular while your vectors are not normalized. The solution is normally to switch to distance-metric: angular, or use distance-metric: dotproduct (available from Vespa 8.170.18).
With non-normalized vectors, you often get negative distances, and those are capped to 0, leading to closeness 1.0. Some embedding models, such as models from sbert.net, claim to output normalized vectors but might not.
Plugins have to run in the JVM - Jython might be an alternative, though the Vespa Team has no experience with it. Vespa does not have a language like painless - it is more flexible to write application logic in a JVM-supported language, using Searchers and Document Processors.
A Searcher intercepts a query and/or result. To get a number of documents by id in a Searcher or other component like a Document processor, you can have an instance of com.yahoo.documentapi.DocumentAccess injected and use that to get documents by id instead of the HTTP API.
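A minimal sketch of such a component (class and method names hypothetical) - DocumentAccess is injected by the container:

import com.yahoo.component.AbstractComponent;
import com.yahoo.document.Document;
import com.yahoo.document.DocumentId;
import com.yahoo.documentapi.DocumentAccess;
import com.yahoo.documentapi.SyncParameters;
import com.yahoo.documentapi.SyncSession;

public class DocumentFetcher extends AbstractComponent {

    private final SyncSession session;

    public DocumentFetcher(DocumentAccess access) {  // injected by the container
        this.session = access.createSyncSession(new SyncParameters.Builder().build());
    }

    public Document fetch(String id) {
        return session.get(new DocumentId(id));  // null if the document does not exist
    }

    @Override
    public void deconstruct() {
        session.destroy();
    }
}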
Vespa uses Java 17 - it will support 20 some time in the future.
Use System.out.println to write text to the vespa.log.
Vespa has a near real-time indexing core with typically sub-second latencies from document ingestion to being indexed. This depends on the use-case, available resources and how the system is tuned. Some more examples and thoughts can be found in the scaling guide.
Vespa does not have a concept of "batch ingestion" as it contradicts many of the core features that are the strengths of Vespa, including serving elasticity and sub-second indexing latency. That said, we have numerous use-cases in production that do high throughput updates to large parts of the (sometimes entire) document set. In cases where feed throughput is more important than indexing latency, you can tune this to meet your requirements. Some of this is detailed in the feed sizing guide.
Yes. The content node is implemented in C++ and not memory constrained other than what the operating system does.
If the replicas are in sync the request is only sent to the primary content node. Otherwise, it's sent to several nodes, depending on replica metadata. Example: if a bucket has 3 replicas A, B, C and A & B both have metadata state X and C has metadata state Y, a request will be sent to A and C (but not B since it has the same state as A and would therefore not return a potentially different document).
Attribute (with or without fast-search) is always in memory, but does not support tokenized matching - it is for structured data. Index (where there is no such thing as fast-search, since it is always fast) is in memory to the extent there is available memory, and supports tokenized matching - it is for unstructured text. It is possible to guarantee that fields that are defined with index have both the dictionary and the postings in memory by changing from mmap to populate, see index > io > search.
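A sketch of that tuning element in services.xml (cluster id hypothetical, other required elements omitted):

<content id="search" version="1.0">
    <!-- redundancy, documents, nodes ... -->
    <tuning>
        <searchnode>
            <index>
                <io>
                    <search>populate</search>
                </io>
            </index>
        </searchnode>
    </tuning>
</content>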
Make sure that the content nodes run on nodes with plenty of memory available - during index switch, the memory footprint will 2x. Familiarity with Linux tools like pmap can help diagnose what is mapped and whether it is resident or not.
Fields that are defined with attribute are in-memory. Fields that have both index and attribute have separate data structures: queries will use the default mapped-on-disk data structures that support text matching, while grouping, summary and ranking can access the field from the attribute store.
A Vespa query is executed in two phases as described in sizing search, and summary requests can touch disk (and also use mmap by default). Due to their potential size, there is no populate option here, but one can define a dedicated document summary containing only fields that are defined with attribute.
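A document-summary sketch with attribute-only fields (summary and field names hypothetical):

document-summary attributes-only {
    summary price { source: price }
    summary popularity { source: popularity }
}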
The practical performance guide can be a good starting point as well to understand Vespa query execution, the difference between index and attribute, and summary fetching performance.
Deleting documents, by using the document API or garbage collection, will increase the capacity on the content nodes. However, this is not necessarily observable in system metrics - it depends on many factors, like what kind of memory is released, when flush jobs run, and the document schema.
In short, Vespa is not designed to release memory once used. It is designed for sustained high throughput, low latency, keeping maximum memory used under control using features like feed block.
When deleting documents, one can observe a slight increase in memory usage. A deleted document is represented by a tombstone, which is later removed - see removed-db-prune-age. When running garbage collection, the summary store is scanned using mmap, and both VIRT and page-cache memory usage increase.
Read up on attributes to understand more of how such fields are stored and managed. Paged attributes trade off query latency for a lower maximum memory usage.
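A sketch of a paged attribute (field name and tensor type hypothetical):

field embedding type tensor<float>(x[768]) {
    indexing: attribute
    attribute: paged
}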
Yes, deployment is using this web service API, which allows you to create an edit session from the currently deployed package, make modifications, and deploy (prepare+activate) it: deploy-rest-api-v2.html. However, this is only useful in cases where you want to avoid transferring data to the config server unnecessarily. When you resend everything, the config server will notice that you did not actually change e.g. the node configs and avoid unnecessary noop changes.
Elasticity is a core Vespa strength - easily add and remove nodes with minimal (if any) serving impact. The exact time needed depends on how much data will need to be migrated in the background for the system to converge to ideal data distribution.
You will need to load balance incoming requests between the nodes running the stateless Java container cluster(s). This can typically be done using a simple network load balancer available in most cloud services. This is included when using Vespa Cloud, with an HTTPS endpoint that is already load balanced - both locally within the region and globally across regions.
Search sizing is the intro to this. Topology matters, and this is much used in the high-volume Vespa applications to optimise latency vs. cost.
With Vespa Cloud, we do automated background upgrades daily without noticeable serving impact. If you host Vespa yourself, you can do this, but need to implement the orchestration logic necessary to handle this. The high level procedure is found in live-upgrade.
Vespa Cloud has integrated support - query a global endpoint. Writes will have to go to each zone. There is no auto-sync between zones.
Building indexes offline requires the partition layout to be known in the offline system, which is in conflict with elasticity and auto-recovery (where nodes can come and go without service impact). It is also at odds with realtime writes. For these reasons, it is not recommended, and not supported.
Use visiting to dump all or a subset of the documents. See data-management-and-backup for more information.
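For example, using the Vespa CLI to dump all documents to a file:

$ vespa visit > backup.jsonl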
A failure response is given if the document is not written on some replica nodes.
Yes, it will be available, eventually. Also try Multinode testing and observability.
Yes - add a deleted attribute with fast-search set on it, and create a Searcher which adds an andnot deleted item to queries.
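A Searcher sketch (class name hypothetical), assuming deleted is an attribute whose "true" value marks soft-deleted documents:

import com.yahoo.prelude.query.NotItem;
import com.yahoo.prelude.query.WordItem;
import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.searchchain.Execution;

public class HideDeletedSearcher extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        // Wrap the original query root: root ANDNOT deleted:true
        NotItem notItem = new NotItem();
        notItem.addPositiveItem(query.getModel().getQueryTree().getRoot());
        notItem.addNegativeItem(new WordItem("true", "deleted"));
        query.getModel().getQueryTree().setRoot(notItem);
        return execution.search(query);
    }
}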
You can set a transition-time in services.xml to configure how long the cluster controller keeps a node in maintenance mode before automatically marking it down.
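A sketch of the setting under the content cluster's tuning element (the value is an arbitrary example, in milliseconds):

<tuning>
    <cluster-controller>
        <transition-time>60000</transition-time>
    </cluster-controller>
</tuning>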
Grouped distribution is used to reduce search latency. Content is distributed to a configured set of groups, such that the entire document collection is contained in each group. Setting the redundancy and searchable-copies equal to the number of groups ensures that data can be queried from all groups.
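A grouped distribution sketch with two groups in services.xml (cluster id, document type and host aliases hypothetical):

<content id="search" version="1.0">
    <redundancy>2</redundancy>
    <documents>
        <document type="music" mode="index"/>
    </documents>
    <group>
        <distribution partitions="1|*"/>
        <group name="group0" distribution-key="0">
            <node hostalias="node0" distribution-key="0"/>
        </group>
        <group name="group1" distribution-key="1">
            <node hostalias="node1" distribution-key="1"/>
        </group>
    </group>
</content>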
Refer to #17898 for a discussion of options.
Use /state/v1/version to find Vespa version.
See rollback for options.
If deployment fails with error message "Deployment failed, code: 413 ("Payload Too Large.")" you might need to increase the config server's JVM heap size. The config server has a default JVM heap size of 2 Gb. When deploying an app with e.g. large models this might not be enough, try increasing the heap to e.g. 4 Gb when executing 'docker run …' by adding an environment variable to the command line:
docker run --env VESPA_CONFIGSERVER_JVMARGS=-Xmx4g <other options> <image>
When deploying an application package, with some kind of error, the endpoints might fail, like:
$ vespa deploy --wait 300
Uploading application package ... done
Success: Deployed target/application.zip
Waiting up to 5m0s for query service to become available ...
Error: service 'query' is unavailable: services have not converged
Another example:
[INFO] [03:33:48] Failed to get 100 consecutive OKs from endpoint ...
There are many ways this can fail - the first step is to check the Vespa container:
$ docker exec vespa vespa-logfmt -l error
[2022-10-21 10:55:09.744] ERROR container
Container.com.yahoo.container.jdisc.ConfiguredApplication
Reconfiguration failed, your application package must be fixed, unless this is a JNI reload issue:
Could not create a component with id 'ai.vespa.example.album.MetalSearcher'.
Tried to load class directly, since no bundle was found for spec: album-recommendation-java.
If a bundle with the same name is installed,
there is a either a version mismatch or the installed bundle's version contains a qualifier string.
...
Bundle plugin troubleshooting is a good resource to analyze Vespa container startup / bundle load problems.
Using an M1 MacBook Pro / AArch64 makes the Docker run fail:
WARNING: The requested image’s platform (linux/amd64) does not match the detected host platform (linux/arm64/v8)
and no specific platform was requested
Make sure you are running a recent version of the Docker image: docker pull vespaengine/vespa.
Make sure all Config servers are started, and are able to establish ZooKeeper quorum (if more than one) - see the multinode sample application. Validate that the container has enough memory.
The Config Server cluster with 3 nodes fails to start. The ZooKeeper cluster the Config Servers use waits for hosts on the network, while the hosts wait for ZooKeeper, in a catch-22 - see sampleapp troubleshooting.
Use vespa-logfmt to dump logs. If Vespa is running in a local container (named "vespa"), run docker exec vespa vespa-logfmt.
See encoding troubleshooting for how to handle and remove control characters from the document feed.