Note: This content applies to Vespa Cloud deployments.

Using machine-learned models from Vespa Cloud

Vespa Cloud provides a set of machine-learned models that you can use in your applications. These models are frozen and will always be available on Vespa Cloud. You can also bring your own embedding model by deploying it in the Vespa application package.

To use a model provided by Vespa Cloud, set the model-id attribute where you specify a model config. For example, when configuring the Huggingface embedder provided by Vespa, you can write:

<container id="default" version="1.0">
    <component id="e5" type="hugging-face-embedder">
        <transformer-model model-id="e5-small-v2"/>
    </component>
    ...
</container>

With this, your application supports text embedding inference for both queries and documents. Nodes provisioned with GPU acceleration will automatically use the GPU for embedding inference.
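For illustration, a schema field invoking the embedder above at feed time could look like the following sketch (the field name, input field, and the 384-dimension size matching e5-small-v2 are assumptions):

    field embedding type tensor<float>(x[384]) {
        indexing: input text | embed e5 | attribute | index
        attribute {
            distance-metric: angular
        }
    }

A query can then embed the query text with the same model, for example by passing input.query(q)=embed(e5, "free text query") together with a nearestNeighbor operator over the embedding field.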

Vespa Cloud Embedding Models

Models on the Vespa model hub are selected open-source embedding models with strong performance. See the Vespa blog on embedding tradeoffs for details on performance and quality. These embedding models are useful for retrieval (semantic search), re-ranking, clustering, classification, and more.

Huggingface Embedder

These models are available for the Huggingface Embedder (type="hugging-face-embedder"). All of these models support mapping from string or array<string> to tensor representations.

The output tensor cell precision can be float, bfloat16, or int8.

Most models also support binarization, which requires using distance-metric hamming instead of angular. The E5 and multilingual-e5 models do not support binarization. See the nanobeir hybrid evaluation leaderboard for details on the quality impact.
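As a sketch, binarization can be enabled by declaring the target field with int8 cells and one-eighth of the model's dimensionality, so a 768-dimensional model packs into 96 int8 cells (the field name and embedder id here are assumptions):

    field embedding type tensor<int8>(x[96]) {
        indexing: input text | embed gte | attribute | index
        attribute {
            distance-metric: hamming
        }
    }

With this declaration, the embedder binarizes and packs the output to match the int8 field type.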

alibaba-gte-modernbert

GTE (General Text Embedding) model trained from ModernBERT-base.

Model id: alibaba-gte-modernbert
Tensor definition: tensor<float>(x[768])
Matryoshka dimensions: x[768], x[256]
distance-metric: angular
License: apache-2.0
Source: https://huggingface.co/Alibaba-NLP/gte-modernbert-base @ 3ab3f8c
Language: English
Component declaration:
    <component id="my-embedder-id" type="hugging-face-embedder">
        <transformer-model model-id="alibaba-gte-modernbert"/>
        <max-tokens>8192</max-tokens>
        <pooling-strategy>cls</pooling-strategy>
    </component>
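Since this model lists x[256] as a Matryoshka dimension, the embedding can be shortened by declaring the target field with the smaller dimension, letting the embedder truncate the output accordingly (the field name is an assumption in this sketch):

    field embedding type tensor<float>(x[256]) {
        indexing: input text | embed my-embedder-id | attribute | index
        attribute {
            distance-metric: angular
        }
    }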

alibaba-gte-modernbert-int8

INT8 quantized variant of alibaba-gte-modernbert. Offers faster inference with minimal accuracy loss.

Model id: alibaba-gte-modernbert-int8
Tensor definition: tensor<float>(x[768])
Matryoshka dimensions: x[768], x[256]
distance-metric: angular
License: apache-2.0
Source: https://huggingface.co/Alibaba-NLP/gte-modernbert-base @ e7f32e3
Language: English
Component declaration:
    <component id="my-embedder-id" type="hugging-face-embedder">
        <transformer-model model-id="alibaba-gte-modernbert-int8"/>
        <max-tokens>8192</max-tokens>
        <pooling-strategy>cls</pooling-strategy>
    </component>

e5-base-v2

The base model of the E5 family.

Model id: e5-base-v2
Tensor definition: tensor<float>(x[768]) or tensor<float>(p{},x[768])
distance-metric: angular
License: MIT
Source: https://huggingface.co/intfloat/e5-base-v2
Language: English
Component declaration:
    <component id="my-embedder-id" type="hugging-face-embedder">
        <transformer-model model-id="e5-base-v2"/>
        <max-tokens>512</max-tokens>
        <prepend>
            <query>query: </query>
            <document>passage: </document>
        </prepend>
    </component>

e5-large-v2

The largest model of the E5 family. At the time of writing, this was the best-performing embedding model on the MTEB benchmark.

Model id: e5-large-v2
Tensor definition: tensor<float>(x[1024]) or tensor<float>(p{},x[1024])
distance-metric: angular
License: MIT
Source: https://huggingface.co/intfloat/e5-large-v2
Language: English
Component declaration:
    <component id="my-embedder-id" type="hugging-face-embedder">
        <transformer-model model-id="e5-large-v2"/>
        <max-tokens>512</max-tokens>
        <prepend>
            <query>query: </query>
            <document>passage: </document>
        </prepend>
    </component>

e5-small-v2

The smallest and most cost-efficient model from the E5 family.

Model id: e5-small-v2
Tensor definition: tensor<float>(x[384]) or tensor<float>(p{},x[384])
distance-metric: angular
License: MIT
Source: https://huggingface.co/intfloat/e5-small-v2
Language: English
Component declaration:
    <component id="my-embedder-id" type="hugging-face-embedder">
        <transformer-model model-id="e5-small-v2"/>
        <max-tokens>512</max-tokens>
        <prepend>
            <query>query: </query>
            <document>passage: </document>
        </prepend>
    </component>

lightonai-modernbert-large

Trained from ModernBERT-large on the Nomic Embed datasets, bringing the new advances of ModernBERT to embeddings.

Model id: lightonai-modernbert-large
Tensor definition: tensor<float>(x[1024])
distance-metric: angular
License: apache-2.0
Source: https://huggingface.co/lightonai/modernbert-embed-large @ b3a781f
Language: English
Component declaration:
    <component id="my-embedder-id" type="hugging-face-embedder">
        <transformer-model model-id="lightonai-modernbert-large"/>
        <max-tokens>8192</max-tokens>
        <prepend>
            <query>search_query: </query>
            <document>search_document: </document>
        </prepend>
    </component>

lightonai-modernbert-large-int8

INT8 quantized variant of lightonai-modernbert-large. Offers faster inference with minimal accuracy loss.

Model id: lightonai-modernbert-large-int8
Tensor definition: tensor<float>(x[1024])
distance-metric: angular
License: apache-2.0
Source: https://huggingface.co/lightonai/modernbert-embed-large @ 95a19bf
Language: English
Component declaration:
    <component id="my-embedder-id" type="hugging-face-embedder">
        <transformer-model model-id="lightonai-modernbert-large-int8"/>
        <max-tokens>8192</max-tokens>
        <prepend>
            <query>search_query: </query>
            <document>search_document: </document>
        </prepend>
    </component>

multilingual-e5-base

The multilingual model of the E5 family. Use this model for multilingual queries and documents.

Model id: multilingual-e5-base
Tensor definition: tensor<float>(x[768]) or tensor<float>(p{},x[768])
distance-metric: angular
License: MIT
Source: https://huggingface.co/intfloat/multilingual-e5-base
Language: Multilingual
Component declaration:
    <component id="my-embedder-id" type="hugging-face-embedder">
        <transformer-model model-id="multilingual-e5-base"/>
        <max-tokens>512</max-tokens>
        <prepend>
            <query>query: </query>
            <document>passage: </document>
        </prepend>
    </component>

nomic-ai-modernbert

Trained from ModernBERT-base on the Nomic Embed datasets, bringing the new advances of ModernBERT to embeddings.

Model id: nomic-ai-modernbert
Tensor definition: tensor<float>(x[768])
Matryoshka dimensions: x[768], x[256]
distance-metric: angular
License: apache-2.0
Source: https://huggingface.co/nomic-ai/modernbert-embed-base @ 92168cb
Language: English
Component declaration:
    <component id="my-embedder-id" type="hugging-face-embedder">
        <transformer-model model-id="nomic-ai-modernbert"/>
        <transformer-output>token_embeddings</transformer-output>
        <max-tokens>8192</max-tokens>
        <prepend>
            <query>search_query: </query>
            <document>search_document: </document>
        </prepend>
    </component>

nomic-ai-modernbert-int8

INT8 quantized variant of nomic-ai-modernbert. Offers faster inference with minimal accuracy loss.

Model id: nomic-ai-modernbert-int8
Tensor definition: tensor<float>(x[768])
distance-metric: angular
License: apache-2.0
Source: https://huggingface.co/nomic-ai/modernbert-embed-base @ d556a88
Language: English
Component declaration:
    <component id="my-embedder-id" type="hugging-face-embedder">
        <transformer-model model-id="nomic-ai-modernbert-int8"/>
        <max-tokens>8192</max-tokens>
        <transformer-output>token_embeddings</transformer-output>
        <prepend>
            <query>search_query: </query>
            <document>search_document: </document>
        </prepend>
    </component>

snowflake-arctic-embed-m-v2.0

Embedding model based on snowflake-arctic-embed-m-v2.0.

Model id: snowflake-arctic-embed-m-v2.0
Tensor definition: tensor<float>(x[768])
distance-metric: angular
License: apache-2.0
Source: https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0 @ 95c2741
Language: Multilingual
Component declaration:
    <component id="my-embedder-id" type="hugging-face-embedder">
        <transformer-model model-id="snowflake-arctic-embed-m-v2.0"/>
        <max-tokens>8192</max-tokens>
        <transformer-output>token_embeddings</transformer-output>
        <pooling-strategy>cls</pooling-strategy>
        <normalize>true</normalize>
        <prepend>
            <query>query: </query>
        </prepend>
    </component>

snowflake-arctic-embed-m-v2.0-int8

INT8 quantized variant of snowflake-arctic-embed-m-v2.0. Offers faster inference with minimal accuracy loss.

Model id: snowflake-arctic-embed-m-v2.0-int8
Tensor definition: tensor<float>(x[768])
distance-metric: angular
License: apache-2.0
Source: https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0 @ 95c2741
Language: Multilingual
Component declaration:
    <component id="my-embedder-id" type="hugging-face-embedder">
        <transformer-model model-id="snowflake-arctic-embed-m-v2.0-int8"/>
        <max-tokens>8192</max-tokens>
        <transformer-output>token_embeddings</transformer-output>
        <pooling-strategy>cls</pooling-strategy>
        <normalize>true</normalize>
        <prepend>
            <query>query: </query>
        </prepend>
    </component>

voyage-4-nano

Embedding model based on voyage-4-nano-ONNX.

Model id: voyage-4-nano
Tensor definition: tensor<float>(x[2048])
Matryoshka dimensions: x[2048], x[1024], x[512], x[256]
distance-metric: angular
License: apache-2.0
Source: https://huggingface.co/thomasht86/voyage-4-nano-ONNX @ 736c0d0
Language: English
Component declaration:
    <component id="my-embedder-id" type="hugging-face-embedder">
        <transformer-model model-id="voyage-4-nano"/>
        <max-tokens>32768</max-tokens>
        <pooling-strategy>mean</pooling-strategy>
        <normalize>true</normalize>
        <prepend>
            <query>Represent the query for retrieving supporting documents: </query>
        </prepend>
    </component>

voyage-4-nano-int8

INT8 quantized variant of voyage-4-nano. Offers faster inference with minimal accuracy loss.

Model id: voyage-4-nano-int8
Tensor definition: tensor<float>(x[2048])
Matryoshka dimensions: x[2048], x[1024], x[512], x[256]
distance-metric: angular
License: apache-2.0
Source: https://huggingface.co/thomasht86/voyage-4-nano-ONNX @ 736c0d0
Language: English
Component declaration:
    <component id="my-embedder-id" type="hugging-face-embedder">
        <transformer-model model-id="voyage-4-nano-int8"/>
        <max-tokens>32768</max-tokens>
        <pooling-strategy>mean</pooling-strategy>
        <normalize>true</normalize>
        <prepend>
            <query>Represent the query for retrieving supporting documents: </query>
        </prepend>
    </component>

Bert Embedder

These models are available for the Bert Embedder type="bert-embedder":

<container id="default" version="1.0">
    <component id="mini" type="bert-embedder">
        <transformer-model model-id="minilm-l6-v2"/>
        <tokenizer-vocab model-id="bert-base-uncased"/>
    </component>
    ...
</container>

Note: the bert-embedder requires both transformer-model and tokenizer-vocab.

minilm-l6-v2

A small, fast sentence-transformer model.

Model id: minilm-l6-v2
Tensor definition: tensor<float>(x[384]) or tensor<float>(p{},x[384])
distance-metric: angular
License: apache-2.0
Source: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
Language: English

mpnet-base-v2

A larger sentence-transformer model with better quality than minilm-l6-v2.

Model id: mpnet-base-v2
Tensor definition: tensor<float>(x[768]) or tensor<float>(p{},x[768])
distance-metric: angular
License: apache-2.0
Source: https://huggingface.co/sentence-transformers/all-mpnet-base-v2
Language: English

Tokenization Embedders

These embedder implementations tokenize text and map strings to vocabulary identifiers. They are most useful for creating the tensor inputs to re-ranking models that take both the query and document token identifiers as input. Find examples in the sample applications.
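As a sketch, such a tokenizer can be declared as a hugging-face-tokenizer component referencing one of the model ids below (the component id is an assumption):

    <component id="tokenizer" type="hugging-face-tokenizer">
        <model model-id="e5-base-v2-vocab"/>
    </component>

The tokenizer can then be invoked with the embed function in indexing expressions or query inputs to produce token-id tensors.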

bert-base-uncased

A vocabulary text file (vocab.txt) in the format expected by WordPiece: one text token per line.
Model id: bert-base-uncased
License: apache-2.0
Source: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2

e5-base-v2-vocab

A tokenizer.json configuration file in the format expected by the Huggingface tokenizer. This tokenizer configuration can be used with e5-base-v2, e5-small-v2, and e5-large-v2.
Model id: e5-base-v2-vocab
License: MIT
Source: https://huggingface.co/intfloat/e5-base-v2
Language: English

multilingual-e5-base-vocab

A tokenizer.json configuration file in the format expected by the Huggingface tokenizer. This tokenizer configuration can be used with multilingual-e5-base.
Model id: multilingual-e5-base-vocab
License: MIT
Source: https://huggingface.co/intfloat/multilingual-e5-base
Language: Multilingual

Significance models

These are global significance models that can be added to the significance element in services.xml.

significance-en-wikipedia-v1

This significance model was generated from English Wikipedia dump data from 2024-08-01. Available in Vespa as of version 8.426.8.
Model id: significance-en-wikipedia-v1
License: Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
Source: https://data.vespa-cloud.com/significance_models/significance-en-wikipedia-v1.json.zst
Language: English
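As a sketch, this model can be referenced from the significance element in services.xml like this:

    <container id="default" version="1.0">
        <search>
            <significance>
                <model model-id="significance-en-wikipedia-v1"/>
            </significance>
        </search>
        ...
    </container>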

Creating applications working both self-hosted and on Vespa Cloud

You can also specify both a model-id, which will be used on Vespa Cloud, and a url/path, which will be used on self-hosted deployments:

<transformer-model model-id="minilm-l6-v2" path="myAppPackageModels/myModel.onnx"/>

This is useful, for example, to create an application package that uses models from Vespa Cloud in production and a scaled-down or dummy model for self-hosted development.

Using Vespa Cloud models with any config

Specifying a model-id can be done for any config field of type model, whether the config is from Vespa or defined by you.
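As a sketch, if your own component defines a config with a field of type model (the names below are hypothetical), the model-id can be set the same way in services.xml:

    # my-component.def
    namespace=com.example
    myModel model

    <component id="my-component" class="com.example.MyComponent" bundle="my-bundle">
        <config name="com.example.my-component">
            <myModel model-id="minilm-l6-v2"/>
        </config>
    </component>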