Embedding

A common technique in modern big data serving applications is to map the subject data (say, text or images) to points in an abstract vector space and then do computation in that vector space: for example, retrieving similar data by finding nearby points, or using the vectors as input to a neural net. This mapping is usually referred to as embedding.
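
To make "finding nearby points" concrete, here is a minimal sketch of a brute-force nearest neighbor lookup using cosine similarity. The class and method names are hypothetical and the code is independent of Vespa; it only illustrates the idea, not how Vespa implements retrieval.

```java
final class NearestNeighbor {

    /** Cosine similarity between two non-zero vectors of equal length. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the index of the stored point most similar to the query, or -1 if there are none. */
    static int nearest(double[] query, double[][] points) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < points.length; i++) {
            double score = cosine(query, points[i]);
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }
}
```

A linear scan like this is only practical for small collections; it is shown here purely to illustrate what "nearby" means in the vector space.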

Vespa provides built-in support for embedding, which is documented here.

Embedders

Vespa provides a Java interface for defining components which can provide embeddings of text: com.yahoo.language.process.Embedder.

To define a custom embedder in an application and make it usable by Vespa (see below), implement this interface and add it as a component to services.xml:

<container version="1.0">
    <component id="myEmbedder"
               class="com.example.MyEmbedder"
               bundle="the name in <artifactId> in pom.xml"/>
</container>

Provided embedders

Vespa provides some embedders as part of the platform.

BertBase embedder

An embedder that uses WordPiece to produce tokens, which are then passed to a supplied ONNX model in the form expected by a BERT base model. See export_model_from_hf.py for how to export a sentence-transformer model to ONNX format. See also troubleshooting model signature.

This provides embeddings directly suitable for retrieval and ranking in Vespa. When used with the syntax for invoking the embedder in queries and during indexing described below, it makes it possible to implement semantic search with no need for custom components or client-side embedding.

To set up the BertBase embedder, add it to services.xml:

<component id="myBert"
           class="ai.vespa.embedding.BertBaseEmbedder"
           bundle="model-integration">
    <config name="embedding.bert-base-embedder">
        <transformerModel path="models/myBertModel.onnx"/>
        <tokenizerVocab path="models/myTokenizerVocabulary.txt"/>
    </config>
</component>

See the options available for configuring the BertBase embedder in the full configuration definition. Note that the BertBase embedder uses a mean pooling strategy by default.
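
As a rough illustration of what mean pooling does, the sketch below averages per-token vectors into one vector, skipping positions where the attention mask is 0. This is the generic technique, not the BertBase embedder's actual code; the class name, method name and the handling of padding are assumptions made for the example.

```java
final class MeanPooling {

    /**
     * Averages token vectors over the positions where attentionMask is non-zero.
     * tokenVectors has shape [sequence][dimensions] and must contain at least one row.
     */
    static float[] meanPool(float[][] tokenVectors, long[] attentionMask) {
        int dims = tokenVectors[0].length;
        float[] pooled = new float[dims];
        int counted = 0;
        for (int t = 0; t < tokenVectors.length; t++) {
            if (attentionMask[t] == 0) continue; // skip padding positions
            for (int d = 0; d < dims; d++) {
                pooled[d] += tokenVectors[t][d];
            }
            counted++;
        }
        if (counted == 0) return pooled; // nothing to average: return the zero vector
        for (int d = 0; d < dims; d++) {
            pooled[d] /= counted;
        }
        return pooled;
    }
}
```

In terms of the model signature shown under troubleshooting, this corresponds to reducing an output of shape [sequence][384] to a single vector of 384 values.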

The model files used must be supplied by the application (or specified by id when on Vespa Cloud), for example from HuggingFace. Refer to adding files to the configuration for the full syntax for specifying model files by url, path or model-id.

SentencePiece embedder

A native Java implementation of SentencePiece. SentencePiece breaks text into chunks independent of spaces, which is robust to misspellings and works with CJK languages. It is also very fast.

This is suitable for use in conjunction with custom components that process the resulting encoding further to produce semantically meaningful vectors.

To use the SentencePiece embedder, add it to services.xml:

<component id="mySentencePiece"
           class="com.yahoo.language.sentencepiece.SentencePieceEmbedder"
           bundle="linguistics-components">
    <config name="language.sentencepiece.sentence-piece">
        <model>
            <item>
                <language>unknown</language>
                <path>model/en.wiki.bpe.vs10000.model</path>
            </item>
        </model>
    </config>
</component>

See the options available for configuring SentencePiece in the full configuration definition.

WordPiece embedder

A native Java implementation of WordPiece, which is commonly used with BERT models.

This is suitable for use in conjunction with custom components that process the resulting encoding further to produce semantically meaningful vectors.
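
For readers unfamiliar with WordPiece, the following is a minimal sketch of the greedy longest-match-first splitting commonly used with BERT vocabularies. It is not the Vespa implementation; the class name, the "##" continuation prefix and the "[UNK]" fallback follow the usual BERT convention and are assumptions here, as is the toy vocabulary in the usage note.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class WordPieceSketch {

    /** Splits a single word into the longest matching vocabulary pieces, left to right. */
    static List<String> tokenize(String word, Set<String> vocab) {
        List<String> pieces = new ArrayList<>();
        int start = 0;
        while (start < word.length()) {
            int end = word.length();
            String match = null;
            // Try the longest remaining substring first, then shrink it from the right
            while (end > start) {
                String candidate = word.substring(start, end);
                if (start > 0) candidate = "##" + candidate; // marks a continuation piece
                if (vocab.contains(candidate)) {
                    match = candidate;
                    break;
                }
                end--;
            }
            if (match == null) return List.of("[UNK]"); // no piece fits: whole word is unknown
            pieces.add(match);
            start = end;
        }
        return pieces;
    }
}
```

With a toy vocabulary containing "un", "##aff" and "##able", the word "unaffable" is split into those three pieces. A real embedder would then map each piece to its integer id in the vocabulary file.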

To use the WordPiece embedder, add it to services.xml:

<component id="myWordPiece"
           class="com.yahoo.language.wordpiece.WordPieceEmbedder"
           bundle="linguistics-components">
    <config name="language.wordpiece.word-piece">
        <model>
            <item>
                <language>unknown</language>
                <path>models/bert-base-uncased-vocab.txt</path>
            </item>
        </model>
    </config>
</component>

See the options available for configuring WordPiece in the full configuration definition.

Embedding a query text

Where you would otherwise supply a tensor representing the vector point in a query, you can, with an embedder configured, instead supply any text enclosed in embed(), e.g.:

ranking.features.query(myEmbedding)=embed(myEmbedderId, "Hello%20world")

If you have only configured a single embedder, you can skip the embedder id argument and optionally also the quotes. Both single and double quotes are permitted.
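
When building such a request from client code, the text must be URL-encoded. The sketch below shows one way to do this with the Java standard library; the class and method names are hypothetical, and it assumes the text itself contains no quote characters, since no escaping of quotes is attempted.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class EmbedQueryParam {

    /** Builds an embed(...) expression for the given embedder id and text. */
    static String embedExpression(String embedderId, String text) {
        return "embed(" + embedderId + ", \"" + text + "\")";
    }

    /** Percent-encodes a value for use in a URL query string. */
    static String encode(String value) {
        // URLEncoder writes spaces as '+'; rewrite them as %20 to match the example above.
        // A literal '+' in the input has already become %2B, so this replacement is safe.
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** Returns the encoded "name=value" pair for a query tensor produced by an embedder. */
    static String queryParameter(String tensorName, String embedderId, String text) {
        return encode("ranking.features.query(" + tensorName + ")")
                + "=" + encode(embedExpression(embedderId, text));
    }
}
```

Calling queryParameter("myEmbedding", "myEmbedderId", "Hello world") yields a pair that decodes to the request parameter shown above.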

Embedding a document field

Use the indexing language to convert a text field into an embedding by using the embed function, for example:

schema doc {

    document doc {

        field myText type string {
            indexing: index | summary
        }

    }

    field embeddingOfMyText type tensor(x[5]) {
        indexing: input myText | embed myEmbedderId | attribute | index | summary
        index: hnsw
    }

}

If you have configured only a single embedder, you can skip the embedder id argument.

Using an embedder from Java

When writing custom Java components (such as Searchers or Document processors), use the embedders you have configured by having them injected in the constructor, just like any other component:

public class MyComponent {
    public MyComponent(ComponentRegistry<Embedder> embedders) {
        // embedders contains all the embedders configured in your services.xml
    }
}

Examples

Try the simple-semantic-search sample application. A complete example application using multiple embedders can be found in this system test.

Troubleshooting model signature

When loading models for the embedder, the model must have the correct input and output signatures. In this example, minilm-l6-v2.onnx is in the current working directory:

$ docker run -v `pwd`:/w \
  --entrypoint /opt/vespa/bin/vespa-analyze-onnx-model \
  vespaengine/vespa \
  /w/minilm-l6-v2.onnx

...
model meta-data:
  input[0]: 'input_ids' long[batch][sequence]
  input[1]: 'attention_mask' long[batch][sequence]
  input[2]: 'token_type_ids' long[batch][sequence]
  output[0]: 'output_0' float[batch][sequence][384]
  output[1]: 'output_1' float[batch][384]
...
test setup:
  input[0]: tensor(d0[1],d1[1]) -> long[1][1]
  input[1]: tensor(d0[1],d1[1]) -> long[1][1]
  input[2]: tensor(d0[1],d1[1]) -> long[1][1]
  output[0]: float[1][1][384] -> tensor<float>(d0[1],d1[1],d2[384])
  output[1]: float[1][384] -> tensor<float>(d0[1],d1[384])

If a model with a different signature is loaded, the Vespa Container node will not start (check vespa.log in the container running Vespa):

[2022-10-18 18:18:31.761] WARNING container        Container.com.yahoo.container.di.Container
  Failed to set up first component graph due to error when constructing one of the components
  exception=com.yahoo.container.di.componentgraph.core.ComponentNode$ComponentConstructorException:
  Error constructing 'bert' of type 'ai.vespa.embedding.BertBaseEmbedder': null
  Caused by: java.lang.IllegalArgumentException: Model does not contain required input: 'input_ids'. Model contains: input
  at ai.vespa.embedding.BertBaseEmbedder.validateName(BertBaseEmbedder.java:79)
  at ai.vespa.embedding.BertBaseEmbedder.validateModel(BertBaseEmbedder.java:68)
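
The kind of check that produces this error can be pictured with the sketch below. It is not the code in BertBaseEmbedder; the class and method names are hypothetical, and the required input names are simply those listed in the analyzer output above.

```java
import java.util.List;
import java.util.Set;

final class SignatureCheck {

    // Input names taken from the vespa-analyze-onnx-model output shown earlier
    static final List<String> REQUIRED_INPUTS =
            List.of("input_ids", "attention_mask", "token_type_ids");

    /** Throws IllegalArgumentException if the model lacks any required input name. */
    static void validate(Set<String> modelInputs) {
        for (String required : REQUIRED_INPUTS) {
            if (!modelInputs.contains(required)) {
                throw new IllegalArgumentException(
                        "Model does not contain required input: '" + required
                        + "'. Model contains: " + String.join(",", modelInputs));
            }
        }
    }
}
```

A model exported with a single input named "input" would fail on the first required name, which matches the message in the log.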

When this happens, a deploy looks like:

$ vespa deploy --wait 300
Uploading application package ... done

Success: Deployed .

Waiting up to 5m0s for query service to become available ...
Error: service 'query' is unavailable: services have not converged

Use vespa-analyze-onnx-model as in the example above to analyze the model's signature.