Tensors can be used to express machine-learned models such as neural nets, but they can be used for much more than that. The tensor model in Vespa is powerful, since it supports sparse dimensions, dimension names and lambda computations. Whatever you want to compute, it is probably possible to express it succinctly as a tensor expression - the problem is learning how. This page collects some real-world examples of tensor usage to provide some inspiration.

The tensor playground is a tool to get familiar with and explore tensor algebra. It can be found at docs.vespa.ai/playground. Below are some examples of common tensor compute operations using tensor functions. Feel free to play around with them to explore further:

- Dense tensor dot product
- Sparse tensor dot product
- Vector-matrix product
- Matrix multiplication
- Tensor generation, dimension renaming and concatenation
- Jaccard similarity between mapped (sparse) tensors
- Neural network

In an ecommerce application you may have promotions that set a different product price in given time intervals. Since the price is used for ranking, the correct price must be computed in ranking. Can tensors be used to specify prices in arbitrary time intervals in documents and pick the right price during ranking?

To do this, add three tensors to the document type as follows:

    field startTime type tensor(id{}) {
        indexing: attribute
    }
    field endTime type tensor(id{}) {
        indexing: attribute
    }
    field price type tensor(id{}) {
        indexing: attribute
    }

Here the id is an arbitrary label for the promotion which must be unique within the document, and startTime and endTime are epoch timestamps.

Now documents can include promotions as follows (document JSON syntax):

    "startTime": { "cells": { "promo1": 40, "promo2": 60, "promo3": 80 } },
    "endTime":   { "cells": { "promo1": 50, "promo2": 70, "promo3": 90 } },
    "price":     { "cells": { "promo1": 16, "promo2": 18, "promo3": 10 } }

And we can retrieve the currently valid price by the expression

reduce((attribute(startTime) < now) * (attribute(endTime) > now) * attribute(price), max)

This will return 0 if there is no matching interval, so a full expression will probably wrap this in a function, check whether it returns 0 (using an if expression), and return the default price of the product in that case.

To see why this retrieves the right price, notice that `(attribute(startTime) < now)` is shorthand for

join(attribute(startTime), now, f(x,y)(x < y))

That is, joining all the cells of the `startTime` tensor with the zero-dimensional `now` tensor (i.e. a number), and setting the cell value in the joined tensor to 1 if now is larger than the cell timestamp and 0 otherwise. When this tensor is joined by multiplication with one that has 1's only where now is smaller than the end timestamp, the result is a tensor with 1's for the promotion ids whose interval is currently valid and 0 otherwise. Then we can just join by multiplication with the price tensor to get the final tensor, on which we pick the max value to retrieve the non-zero value.
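The steps above can be traced outside Vespa with a minimal Python sketch. It models each mapped tensor as a plain dict keyed by promotion id and reuses the document values shown earlier; the function name `current_price` and the sample values of `now` are assumptions made for illustration only.

```python
# Sparse tensors with one mapped dimension `id`, modelled as plain dicts.
start_time = {"promo1": 40, "promo2": 60, "promo3": 80}
end_time = {"promo1": 50, "promo2": 70, "promo3": 90}
price = {"promo1": 16, "promo2": 18, "promo3": 10}


def current_price(now):
    """Mimic reduce((startTime < now) * (endTime > now) * price, max)."""
    cells = []
    # A join over a mapped dimension only yields cells for shared labels.
    for promo_id in start_time.keys() & end_time.keys() & price.keys():
        started = 1.0 if start_time[promo_id] < now else 0.0
        not_ended = 1.0 if end_time[promo_id] > now else 0.0
        cells.append(started * not_ended * price[promo_id])
    return max(cells)


print(current_price(65))  # inside the promo2 interval -> 18.0
print(current_price(55))  # no interval matches -> 0.0
```

The second call shows the zero result that a full expression would need to replace with the default price.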

Play around with this example in the playground

A common situation is that you have dense embedding vectors to which you want to add some scalar attributes (or function return values) as input to a machine-learned model. This can be done by the following expression (assuming the dense vector dimension is named "x"):

concat(concat(query(embedding),attribute(embedding),x), tensor(x[2]):[bm25(title),attribute(popularity)], x)

This creates a tensor from a set of scalar expressions, and concatenates it to the query and document embedding vectors.
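To make the resulting cell order concrete, here is a small Python sketch that models the dense tensors as lists. The function name `concat_features` and all input values are hypothetical; the sketch only illustrates the order in which the cells end up along `x`.

```python
def concat_features(query_embedding, doc_embedding, bm25_title, popularity):
    """Mimic concat(concat(q, d, x), tensor(x[2]):[bm25, popularity], x)."""
    # The two scalar expressions become a dense tensor of size 2 along x.
    scalars = [bm25_title, popularity]
    # Concatenating along x appends the cells in argument order.
    return list(query_embedding) + list(doc_embedding) + scalars


# Hypothetical 2-dimensional embeddings and scalar feature values.
model_input = concat_features([0.1, 0.2], [0.3, 0.4], 1.5, 7.0)
print(len(model_input))  # query dims + document dims + 2 scalars
```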

Play around with this example in the playground

Assume we have a set of documents where each document contains a vector of size 4. We want to calculate the dot product between the document vectors and a vector passed down with the query and rank the results according to the dot product score.

The following schema file defines an attribute tensor field with a tensor type that has one indexed dimension `x` of size 4. In addition, we define a rank profile with the input and the dot product calculation:

    schema example {
        document example {
            field document_vector type tensor<float>(x[4]) {
                indexing: attribute | summary
            }
        }
        rank-profile dot_product {
            inputs {
                query(query_vector) tensor<float>(x[4])
            }
            first-phase {
                expression: sum(query(query_vector)*attribute(document_vector))
            }
        }
    }

Example JSON document with the vector [1.0, 2.0, 3.0, 5.0], using indexed tensors short form:

    [
        {
            "put": "id:example:example::0",
            "fields": {
                "document_vector": [1.0, 2.0, 3.0, 5.0]
            }
        }
    ]

Example query set in a searcher with the vector [1.0, 2.0, 3.0, 5.0]:

    public Result search(Query query, Execution execution) {
        // Pass the vector to ranking as the rank feature query(query_vector)
        query.getRanking().getFeatures().put("query(query_vector)",
            Tensor.Builder.of(TensorType.fromSpec("tensor<float>(x[4])"))
                .cell().label("x", 0).value(1.0)
                .cell().label("x", 1).value(2.0)
                .cell().label("x", 2).value(3.0)
                .cell().label("x", 3).value(5.0)
                .build());
        return execution.search(query);
    }

Play around with this example in the playground

Note that this example calculates the dot product for every document retrieved by the query. Consider using approximate nearest neighbor search with `distance-metric` dotproduct.
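As a sanity check on the score the `dot_product` rank profile should assign to the example document, the calculation can be reproduced with a short Python sketch. The helper name `dot_product` is chosen for illustration; the vectors are the ones used in the example document and query.

```python
def dot_product(query_vector, document_vector):
    """Mimic sum(query(query_vector) * attribute(document_vector))."""
    if len(query_vector) != len(document_vector):
        raise ValueError("both vectors must have the same size along x")
    # Cell-wise multiplication followed by a sum over all cells.
    return sum(q * d for q, d in zip(query_vector, document_vector))


query_vector = [1.0, 2.0, 3.0, 5.0]
document_vector = [1.0, 2.0, 3.0, 5.0]
print(dot_product(query_vector, document_vector))  # 1 + 4 + 9 + 25 = 39.0
```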

One simple way to use machine learning is to generate cross features from a set of base features and then do a logistic regression on these. How can this be expressed as Vespa tensors?

Assume we have three base features:

- `query(interests): tensor(interest{})` - A sparse, weighted set of the interests of a user.
- `query(location): tensor(location{})` - A sparse set of the location(s) of the user.
- `attribute(topics): tensor(topic{})` - A sparse, weighted set of the topics of a given document.

From these we have generated all 3d combinations of these features and trained a logistic regression model, leading to a weight for each possible combination:

tensor(interest{}, location{}, topic{})

This weight tensor can be added as a constant tensor to the application package, say `constant(model)`. With that we can compute the model in a rank profile by the expression

sum(query(interests) * query(location) * attribute(topics) * constant(model))

where the first three factors generate the 3d cross-feature tensor and the last combines it with the learned weights.
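The following Python sketch works through the same weighted sum on a tiny, made-up data set. All labels, feature values and weights are hypothetical, as is the function name `model_score`. Like the ranking expression, it computes only the weighted sum over the cross features and does not apply a logistic function to the result.

```python
# Hypothetical sparse inputs, modelled as dicts keyed by label.
interests = {"sports": 0.8, "music": 0.2}
location = {"oslo": 1.0}
topics = {"football": 0.9, "jazz": 0.5}

# Hypothetical learned weights, keyed by (interest, location, topic).
model = {
    ("sports", "oslo", "football"): 2.0,
    ("music", "oslo", "jazz"): 1.0,
    ("sports", "oslo", "jazz"): -0.5,
}


def model_score(interests, location, topics, model):
    """Mimic sum(interests * location * topics * model)."""
    total = 0.0
    for (interest, place, topic), weight in model.items():
        # A cell exists only when all three labels are present in the inputs.
        if interest in interests and place in location and topic in topics:
            cross_feature = interests[interest] * location[place] * topics[topic]
            total += cross_feature * weight
    return total


print(model_score(interests, location, topics, model))
```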

Play around with this example in the playground

Assume we have a 3x2 matrix represented in an attribute tensor field `document_matrix` with a tensor type `tensor<float>(x[3],y[2])` with content:

{ {x:0,y:0}:1.0, {x:1,y:0}:3.0, {x:2,y:0}:5.0, {x:0,y:1}:7.0, {x:1,y:1}:11.0, {x:2,y:1}:13.0 }

Also assume we have a 1x3 vector passed down with the query as a tensor with type `tensor<float>(x[3])` with content:

    { {x:0}:1.0, {x:1}:3.0, {x:2}:5.0 }

that is set as `query(query_vector)` in a searcher as specified in query feature.

To calculate the matrix product between the 1x3 vector and 3x2 matrix (to get a 1x2 vector) use the following ranking expression:

sum(query(query_vector) * attribute(document_matrix),x)

This is a sparse tensor product over the shared dimension `x`, followed by a sum over the same dimension.
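To see the numbers this produces for the example content, here is a minimal Python sketch that models the tensors as dicts keyed by cell address. The function name `vector_matrix_product` is an assumption for illustration; the values are the ones listed above.

```python
# Cells of tensor<float>(x[3],y[2]), keyed by (x, y).
document_matrix = {
    (0, 0): 1.0, (1, 0): 3.0, (2, 0): 5.0,
    (0, 1): 7.0, (1, 1): 11.0, (2, 1): 13.0,
}
# Cells of tensor<float>(x[3]), keyed by x.
query_vector = {0: 1.0, 1: 3.0, 2: 5.0}


def vector_matrix_product(vector, matrix):
    """Mimic sum(query(query_vector) * attribute(document_matrix), x)."""
    result = {}
    for (x, y), value in matrix.items():
        # Multiply cells sharing the same x, then sum x away per y.
        result[y] = result.get(y, 0.0) + vector[x] * value
    return result


print(vector_matrix_product(query_vector, document_matrix))
# y=0: 1*1 + 3*3 + 5*5 = 35.0, y=1: 1*7 + 3*11 + 5*13 = 105.0
```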

Play around with this example in the playground

Tensors with mapped dimensions look similar to maps, but are more general. What if all that is needed is a simple map lookup? See tensor performance for more details.

Assume a tensor attribute `my_map` and this is the value for a specific document:

tensor<float>(x{},y[3]):{a:[1,2,3],b:[4,5,6],c:[7,8,9]}

To select in the query which of the 3 named vectors (a, b, c) to use for some other calculation, wrap the label to look up inside a tensor. Assume a query tensor `my_key` with type/value:

tensor<float>(x{}):{b:1.0}

Do the lookup, returning a tensor of type `tensor<float>(y[3])`:

sum(query(my_key)*attribute(my_map),x)

If the key does not match anything, the result will be empty: `tensor<float>(y[3]):[0,0,0]`. To return something else, add a check up front to see whether the lookup will succeed, and run a fallback expression if it will not, like:

if(reduce(query(my_key)*attribute(my_map),count) == 3, reduce(query(my_key)*attribute(my_map),sum,x), tensor<float>(y[3]):[0.5,0.5,0.5])
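The lookup with fallback can be sketched in Python as follows, modelling the mixed tensor as a dict from label to a dense list. The function name `lookup` and the non-matching key `z` are hypothetical; the cell count check mirrors the `count == 3` test in the expression.

```python
# tensor<float>(x{},y[3]) modelled as a dict from x label to the dense y values.
my_map = {"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0], "c": [7.0, 8.0, 9.0]}


def lookup(my_key, my_map, fallback):
    """Mimic the if(...) expression: a map lookup with a fallback vector."""
    # The join only produces cells for x labels present in both tensors.
    product = {
        label: [weight * value for value in my_map[label]]
        for label, weight in my_key.items()
        if label in my_map
    }
    cell_count = sum(len(values) for values in product.values())
    if cell_count == 3:
        # Sum over x, leaving one value per y index.
        return [sum(column) for column in zip(*product.values())]
    return list(fallback)


print(lookup({"b": 1.0}, my_map, [0.5, 0.5, 0.5]))  # key found
print(lookup({"z": 1.0}, my_map, [0.5, 0.5, 0.5]))  # key missing, fallback used
```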

`(y*x){x:b}`. The above syntax allows an optimized execution; find an example in the Tensor Playground.
A common use case is to use a tensor lambda function to slice out the first `k` dimensions of a vector representation of `m` dimensions where `m` is larger than `k`. Slicing with lambda functions is great for representing vectors from Matryoshka Representation Learning.

Matryoshka Representation Learning (MRL) encodes information at different granularities and allows a single embedding to adapt to the computational constraints of downstream tasks. The following slices the first 256 dimensions of a tensor `t`:

    tensor<float>(x[256])(t{x:(x)})

Importantly, this only references into the original tensor, avoiding copying the tensor to a smaller tensor. The following is a complete example where we have stored an original vector representation with 3072 dimensions. We slice the first 256 dimensions of the original representation to perform a dot product in the first-phase expression, followed by a full computation over all dimensions in the second-phase expression. See phased ranking for context on using Vespa phased computations and customizing reusable frozen embeddings with Vespa.

    schema example {
        document example {
            field document_vector type tensor<float>(x[3072]) {
                indexing: attribute | summary
            }
        }
        rank-profile small-256-first-phase {
            inputs {
                query(query_vector) tensor<float>(x[3072])
            }
            function slice_first_dims(t) {
                expression: l2_normalize(tensor<float>(x[256])(t{x:(x)}), x)
            }
            first-phase {
                expression: sum( slice_first_dims(query(query_vector)) * slice_first_dims(attribute(document_vector)) )
            }
            second-phase {
                expression: sum( query(query_vector) * attribute(document_vector) )
            }
        }
    }

See also a runnable example in the tensor playground.
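The two phases can be mimicked on small vectors with the Python sketch below. The function names, the 4-dimensional sample vectors and `k = 2` are assumptions chosen to keep the example short. Unlike the Vespa lambda, Python list slicing copies the values, and returning the slice unchanged when its norm is zero is a choice made by this sketch only.

```python
import math


def slice_first_dims(vector, k):
    """Keep the first k dimensions and L2-normalize them."""
    head = vector[:k]
    norm = math.sqrt(sum(value * value for value in head))
    # Sketch-only guard: leave an all-zero slice unchanged.
    return [value / norm for value in head] if norm > 0 else list(head)


def first_phase(query_vector, document_vector, k):
    """Dot product over the normalized first k dimensions."""
    q = slice_first_dims(query_vector, k)
    d = slice_first_dims(document_vector, k)
    return sum(a * b for a, b in zip(q, d))


def second_phase(query_vector, document_vector):
    """Dot product over all dimensions."""
    return sum(a * b for a, b in zip(query_vector, document_vector))


# Hypothetical 4-dimensional vectors, sliced to the first 2 dimensions.
query_vector = [3.0, 4.0, 1.0, 2.0]
document_vector = [3.0, 4.0, 5.0, 6.0]
print(first_phase(query_vector, document_vector, 2))
print(second_phase(query_vector, document_vector))
```

The first phase is cheap because it touches only `k` cells per document, while the second phase uses the full representation on the documents that survive the first phase.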