The text-image use case is an example of a text-to-image search application. Taking a textual query, such as "two people bicycling", it will return images containing two people on bikes. This application is built using CLIP (Contrastive Language-Image Pre-Training) which enables "zero-shot prediction". This means that the system can return sensible results for images it hasn't seen during training, allowing it to process and index any image. In this use case, we use the Flickr8k dataset, which was not used during training of the CLIP model.
To start the application, please follow the instructions in the README.
This sample application can be used in two different ways. The first is by using a Python-based search app, which is suitable for exploration and analysis. The other is a stand-alone Vespa application, which is more suitable for production.
After deploying the application, you can ask questions like this:
http://localhost:8080/search/?input=two+people+bicylcing
PyVespa is the official Python API for Vespa. This can be used to easily create, modify, deploy and interact with Vespa instances. The main goal of the library is to allow for faster prototyping and to facilitate Machine Learning experiments for Vespa applications.
Approximate nearest neighbors using an HNSW index
Vespa supports approximate nearest neighbors (ANN) by using Hierarchical Navigable Small World (HNSW) indexes. This allows for efficient similarity search in large collections. Vespa implements a modified HNSW index that allows for index building during feeding, so one does not have to build the index offline. It also supports additional query filters directly, thus avoiding the sub-optimal filtering after the ANN search.
The Vespa application uses a Transformer model to create an embedding representation of the input. This is done in a custom searcher to transform the text to the representation before sending it to the backend for the ANN search.
In Vespa, you can set up custom document or search processors to perform extra processing during document feeding or a query. This application uses this feature to create embedding representations by first tokenizing the input string using a Byte-Pair Encoding (BPE) tokenizer.
When creating custom components in Vespa, for instance, document processors,
searchers, or handlers, one can use custom configuration to inject config
parameters into the components. This involves defining a config definition
(a .def
file), which creates a config class. You can instantiate this
class with data in services.xml
, and the resulting object is dependency
injected to the component during construction. This application uses custom
config to set up the token vocabulary used in tokenization.