Tensor Guide

Tensors allow Vespa to support advanced ranking models such as large logistic regression models and neural networks. This guide introduces tensors with some examples of use. For details, refer to the tensor reference. Also try the blog recommendation tutorial. This guide goes through:

  • Setting up tensor fields in schemas
  • Feeding tensors to Vespa
  • Querying Vespa with tensors
  • Ranking with tensors
  • Constant tensors
  • Tensor Java API
  • Common use cases
  • Tensor concepts
For a quick introduction to tensors, refer to tensor concepts and the tensor reference guide. See also using TensorFlow models and using ONNX models, which explain how to use TensorFlow or ONNX models directly in Vespa.

Tensor document fields

In typical use, a document contains one or more tensor fields to be used for ranking - this example sets up a tensor field called tensor_attribute:

field tensor_attribute type tensor<float>(x[4]) {
    indexing: attribute | summary
}
A tensor requires a type: x[4] denotes an indexed dimension of size 4, whereas x{} denotes a mapped dimension. See tensor field in schemas, and refer to the tensor type reference for details on tensor types.

Feeding tensors

There are two options when feeding tensors.

Tensors can be updated - one can add, remove and modify tensor cells, or assign a completely new tensor value.

Querying with tensors

Tensors can only be used for ranking, not searching. The tensor can either be supplied in the query request, or constructed from some other data or data source. In the latter case, refer to the Tensor Java API for details on how to construct tensors programmatically. Query/context tensors must be defined in the application package (name, type and dimension); see defining query feature types.

Ranking with tensors

Tensors are used during ranking to modify a document's rank score given the query. Typical operations are dot products between tensors of order 1 (vectors), or matrix products between tensors of order 2 (matrices). Tensors are used in rank expressions as rank features. Two rank features are defined above:

  • attribute(tensor_attribute): the tensor associated with the document
  • query(tensor): the tensor sent with the query request
These can be used in rank expressions. Note that the final rank score of a document must be a single double value - example:
rank-profile dot_product {
    first-phase {
        expression: sum(query(tensor)*attribute(tensor_attribute))
    }
}
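This takes the product of the query tensor and the document tensor, and sums all cells, thus resolving into a single value which is used as the rank score. With both the query tensor and the document tensor set to [1.0, 2.0, 3.0, 5.0] (the values used in the dot product use case later in this guide), the value is 39.0 (1*1 + 2*2 + 3*3 + 5*5).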
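The arithmetic behind this rank expression can be checked outside Vespa. The following is a minimal plain-Python sketch, not Vespa code: the function name dot_product is hypothetical, and both tensors are assumed to hold the example values [1.0, 2.0, 3.0, 5.0] used in this guide.

```python
# Plain-Python illustration of sum(query(tensor) * attribute(tensor_attribute)).
# Assumption: both tensors are dense vectors over one indexed dimension x.

def dot_product(query_tensor, document_tensor):
    """Multiply matching cells and sum the products into one score."""
    if len(query_tensor) != len(document_tensor):
        raise ValueError("both tensors must have the same size along x")
    return sum(q * d for q, d in zip(query_tensor, document_tensor))


query_tensor = [1.0, 2.0, 3.0, 5.0]
document_tensor = [1.0, 2.0, 3.0, 5.0]

score = dot_product(query_tensor, document_tensor)
print(score)  # 39.0
```

The result is a single number, matching the requirement that the final rank score is one double value.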

There are some ranking functions that are specific for tensors:

map(tensor, f(x)(...)) Returns a new tensor with the lambda function defined in f(x)(...) applied to each cell.
reduce(tensor, aggregator, dim1, dim2, ...) Returns a new tensor with the aggregator applied across dimensions dim1, dim2, etc. If no dimensions are specified, reduce over all dimensions.
join(tensor1, tensor2, f(x,y)(...)) Returns a new tensor constructed from the natural join between tensor1 and tensor2, with the resulting cells having the value as calculated from f(x,y)(...), where x is the cell value from tensor1 and y from tensor2.
These primitives allow for a great deal of flexibility when combined. The above rank expression is equivalently:
rank-profile dot_product {
    first-phase {
        expression {
            reduce(join(query(tensor), attribute(tensor_attribute),
                    f(x,y)(x * y)), sum)
        }
    }
}
...and represents the general dot product for tensors of any order. Details about tensor ranking functions, including lambda expressions and available aggregators, can be found in the tensor reference documentation.
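To make the three primitives concrete, here is a small plain-Python sketch of their semantics as described above. It is an illustration only and not how Vespa implements them: tensors are modelled as dictionaries from a cell address to a value, and the helper names addr, tensor_map, tensor_join and tensor_reduce are hypothetical.

```python
# Illustrative model: a tensor is a dict {address: value}, where an address is
# a sorted tuple of (dimension, label) pairs. All helper names are hypothetical.

def addr(**labels):
    """Canonical cell address, e.g. addr(x=0) -> (('x', 0),)."""
    return tuple(sorted(labels.items()))


def tensor_map(t, f):
    """Apply f to every cell value."""
    return {a: f(v) for a, v in t.items()}


def tensor_join(t1, t2, f):
    """Natural join: combine cells whose labels agree on all shared dimensions."""
    result = {}
    for a1, v1 in t1.items():
        for a2, v2 in t2.items():
            d1, d2 = dict(a1), dict(a2)
            if all(d1[k] == d2[k] for k in d1.keys() & d2.keys()):
                result[addr(**{**d1, **d2})] = f(v1, v2)
    return result


def tensor_reduce(t, aggregator, *dims):
    """Aggregate over dims (all dimensions if none are given).

    Returns a plain number when no dimensions remain, else a smaller tensor.
    """
    groups = {}
    for a, v in t.items():
        keep = tuple((d, l) for d, l in a if dims and d not in dims)
        groups.setdefault(keep, []).append(v)
    reduced = {a: aggregator(vs) for a, vs in groups.items()}
    return reduced[()] if list(reduced) == [()] else reduced


query = {addr(x=i): v for i, v in enumerate([1.0, 2.0, 3.0, 5.0])}
document = {addr(x=i): v for i, v in enumerate([1.0, 2.0, 3.0, 5.0])}

# reduce(join(query, document, f(x,y)(x * y)), sum)
score = tensor_reduce(tensor_join(query, document, lambda x, y: x * y), sum)
print(score)  # 39.0
```

The nested loop in tensor_join is quadratic and only meant to show the matching rule; it says nothing about the cost of the real operation.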

Create a tensor on the fly from another field type:

  • From a single-valued source field:
  • Create a tensor in the ranking function from arrays or weighted sets using tensorFrom... functions - see document features.

Performance considerations

Tensor expressions are fairly concise, but because the expressions themselves are independent of the data size, the actual workload during ranking can be significant for large tensors.

When using tensors in ranking it is important to have an understanding of the potential computational cost for each query. As an example, assume the dot product of two tensors with 1000 values each, e.g. tensor<double>(x[1000]). Assuming one query tensor and one document tensor, the operation is:

sum(query(tensor1) * attribute(tensor2))
If 8 bytes are used to store each value (e.g. using a double), each tensor is approximately 8 KB. On a Haswell architecture, for instance, the theoretical upper memory bandwidth is 68 GB/s; reading one 8 KB document tensor per evaluation, this allows roughly 8.5 million (around 9 million) document ranking evaluations per second. With 1 million documents, the maximum throughput, considering memory bandwidth alone, is therefore about 9 queries per second (per node).
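The estimate can be reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch under stated assumptions: the function name is hypothetical, GB is taken as 10^9 bytes, and only the document tensor is assumed to be read from memory for each evaluation.

```python
# Upper bound on queries per second from memory bandwidth alone.
# Assumptions: 1 GB = 1e9 bytes; each evaluation reads one document tensor.

def max_queries_per_second(cells, bytes_per_cell, bandwidth_bytes_per_s, documents):
    bytes_per_document = cells * bytes_per_cell
    evaluations_per_second = bandwidth_bytes_per_s / bytes_per_document
    return evaluations_per_second / documents


# tensor<double>(x[1000]): 8 bytes per cell, 68 GB/s, 1 million documents
qps_double = max_queries_per_second(1000, 8, 68e9, 1_000_000)
print(qps_double)  # 8.5

# Same setup with 4 bytes per cell, as for tensor<float>(x[1000])
qps_float = max_queries_per_second(1000, 4, 68e9, 1_000_000)
print(qps_float)  # 17.0
```

Halving the cell size doubles this bound, which is the reasoning behind preferring a smaller value type when the precision loss is acceptable.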

Even though you would typically not do the above without reducing the search space first (using matching and first phase), it is important to consider the memory bandwidth and other hardware limitations when developing ranking expressions with tensors.

Using a smaller value type increases performance, trading off precision: tensor<float>(x[1000]) uses 4 bytes per cell value.

Constant tensors

In addition to document tensors and query tensors, constant tensors can be put in the application package. This is useful when constant tensors are used in ranking expressions, for instance machine learned models. Example:

constant tensor_constant {
    file: constants/constant_tensor_file.json
    type: tensor<float>(x[4])
}
This defines a new tensor rank feature with the type as defined and the contents distributed with the application package in the file constants/constant_tensor_file.json. The file uses the tensor JSON format and can be compressed; see the reference for examples.
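As an illustration of producing such a file, the sketch below serializes a list of values into JSON. It assumes the constant file can use the same cells/address/value layout as the document feed example in this guide; the function name tensor_cells_json and the values are hypothetical, so check the tensor JSON format reference before relying on it.

```python
import json

# Assumption: the constant file accepts the "cells" layout shown in the
# document feed example of this guide. Function name and values are made up.

def tensor_cells_json(values, dimension="x"):
    """Serialize a dense vector as {"cells": [{"address": ..., "value": ...}]}."""
    cells = [
        {"address": {dimension: str(index)}, "value": float(value)}
        for index, value in enumerate(values)
    ]
    return json.dumps({"cells": cells}, indent=2)


# Four hypothetical weights for a tensor<float>(x[4]) constant
content = tensor_cells_json([0.5, 1.0, 1.5, 2.0])
print(content)
```

The resulting string would then be saved under the path named in the constant definition.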

To use this tensor in a rank expression, encapsulate the constant name with constant(...):

rank-profile use_constant_tensor {
    first-phase {
        expression: sum(query(tensor) * attribute(tensor_attribute) * constant(tensor_constant))
    }
}
The above expression combines three tensors: the query tensor, the document tensor and a constant tensor.

Use cases

In the following section, find common use cases that can be solved using tensor operations.

Dot Product between query and document vectors

Assume we have a set of documents where each document contains a vector of size 4. We want to calculate the dot product between the document vectors and a vector passed down with the query and rank the results according to the dot product score.

The following sd-file defines an attribute tensor field with a tensor type that has one indexed dimension x of size 4. In addition we define a rank profile that calculates the dot product.

schema example {
  document example {
    field document_vector type tensor<float>(x[4]) {
      indexing: attribute | summary
    }
  }
  rank-profile dot_product {
    first-phase {
      expression: sum(query(query_vector)*attribute(document_vector))
    }
  }
}
The tensor to pass down with the query is defined in a query profile type with the same tensor type as the field in the document:
<query-profile-type id="myProfileType">
  <field name="ranking.features.query(query_vector)" type="tensor&lt;float&gt;(x[4])" />
</query-profile-type>
Example document with the vector [1.0, 2.0, 3.0, 5.0]:
  { "put": "id:example:example::0", "fields": {
      "document_vector" : {
        "cells": [
          { "address" : { "x" : "0" }, "value": 1.0 },
          { "address" : { "x" : "1" }, "value": 2.0 },
          { "address" : { "x" : "2" }, "value": 3.0 },
          { "address" : { "x" : "3" }, "value": 5.0 }
        ]
      }
    }
  }
Example query set in a searcher with the vector [1.0, 2.0, 3.0, 5.0]:
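public Result search(Query query, Execution execution) {
    // Build the query tensor and set it as the rank feature query(query_vector)
    query.getRanking().getFeatures().put("query(query_vector)",
        Tensor.Builder.of(TensorType.fromSpec("tensor<float>(x[4])")).
        cell().label("x", 0).value(1.0).
        cell().label("x", 1).value(2.0).
        cell().label("x", 2).value(3.0).
        cell().label("x", 3).value(5.0).build());
    return execution.search(query);
}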

Matrix Product between 1d vector and 2d matrix

Assume we have a 3x2 matrix represented in an attribute tensor field document_matrix with a tensor type tensor<float>(x[3],y[2]) with content:

{ {x:0,y:0}:1.0, {x:1,y:0}:3.0, {x:2,y:0}:5.0, {x:0,y:1}:7.0, {x:1,y:1}:11.0, {x:2,y:1}:13.0 }
Also assume we have a 1x3 vector passed down with the query as a tensor with type tensor<float>(x[3]) with content:
{ {x:0}:1.0, {x:1}:3.0, {x:2}:5.0 }
that is set as query(query_vector) in a searcher as specified in query feature.

To calculate the matrix product between the 1x3 vector and 3x2 matrix (to get a 1x2 vector) use the following ranking expression:

sum(query(query_vector) * attribute(document_matrix),x)
This is a sparse tensor product over the shared dimension x, followed by a sum over the same dimension.
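The expected result can be worked out by hand or with a short plain-Python sketch. The code below is an illustration of the arithmetic only, not Vespa code; it uses the vector and matrix values given above, and the function name vector_matrix_product is hypothetical.

```python
# Plain-Python illustration of sum(query(query_vector) * attribute(document_matrix), x).

query_vector = {0: 1.0, 1: 3.0, 2: 5.0}  # keyed by x

document_matrix = {  # keyed by (x, y)
    (0, 0): 1.0, (1, 0): 3.0, (2, 0): 5.0,
    (0, 1): 7.0, (1, 1): 11.0, (2, 1): 13.0,
}


def vector_matrix_product(vector, matrix):
    """Multiply cells that share the same x, then sum over x for each y."""
    result = {}
    for (x, y), value in matrix.items():
        if x in vector:
            result[y] = result.get(y, 0.0) + vector[x] * value
    return result


product = vector_matrix_product(query_vector, document_matrix)
print(product)  # {0: 35.0, 1: 105.0}
```

The output is a tensor over the remaining dimension y: 1*1 + 3*3 + 5*5 = 35 for y=0 and 1*7 + 3*11 + 5*13 = 105 for y=1. Since a rank score must be a single value, a further reduction would be needed before using it as the final score.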

Tensor concepts

In Vespa, a tensor is a data structure which is a generalization of scalars, vectors and matrices. Tensors can have any order:

  • A scalar is a tensor of order 0
  • A vector is a tensor of order 1
  • A matrix is a tensor of order 2
Tensors consist of a set of double valued cells, with each cell having a unique address. A cell's address is specified by its index or label along all dimensions. The number of dimensions in a tensor is the rank (or order) of the tensor. Each dimension can be either mapped or indexed. Mapped dimensions are sparse and allow any label (string identifier) designating their address, while indexed dimensions use dense numeric indices starting at 0.

Example: Using literal form, the tensor:

{
    {x:2, y:1}:1.0,
    {x:0, y:2}:1.0
}
has two dimensions named x and y, and has two cells with defined values:

[Figure: tensor graphical representation]
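The same two-cell tensor can be modelled in plain Python to show how cells are addressed. This is a sketch for illustration only: the dictionary representation and the helper name cell are assumptions, not a Vespa API.

```python
# Illustrative model of the tensor above: each cell is addressed by its
# label along every dimension. The dict layout and helper are hypothetical.

tensor = {
    (("x", 2), ("y", 1)): 1.0,
    (("x", 0), ("y", 2)): 1.0,
}

# The dimension names are the distinct names used in the cell addresses.
dimensions = sorted({dim for address in tensor for dim, _ in address})
print(dimensions)   # ['x', 'y']
print(len(tensor))  # 2


def cell(t, **labels):
    """Look up a cell by its labels; returns None if the cell is not defined."""
    return t.get(tuple(sorted(labels.items())))


print(cell(tensor, x=2, y=1))  # 1.0
print(cell(tensor, x=0, y=0))  # None
```

Only the two listed cells are stored, which is what makes this kind of representation suitable for sparse data.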

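A type declaration is needed for tensors. This defines a 2-dimensional mapped tensor (matrix) of float, here with dimensions named x and y for illustration:

tensor<float>(x{},y{})
This is a 2-dimensional indexed tensor (a 2x3 matrix) of double:
tensor<double>(x[2],y[3])
A combination of mapped and indexed dimensions is a mixed tensor, for example:
tensor<float>(key{},x[1000])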
Vespa uses the type information to optimize execution plans at configuration time. For dense data the best performance is achieved with indexed dimensions.

Tensor examples

The following examples use the tensor playground to visualize tensor operations. Follow the readme and use http://localhost:8080/playground/index.html. Clicking a link below copies a setup string to the clipboard; paste the string into the setup input box in the playground and press enter.

The neural network example is quite a bit more involved. Here, the network has 3 input neurons, 5 hidden neurons and a single output neuron. An example of neural networks in action can be found in the blog recommendation tutorial part 3.
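To give a feel for the computation such a network performs, here is a minimal plain-Python sketch of a forward pass with 3 inputs, 5 hidden neurons and 1 output. Everything specific in it is an assumption made for illustration: the weights, the biases, the ReLU and sigmoid activations, and the function names are not taken from the playground example or the tutorial.

```python
import math

# Hypothetical 3-5-1 network. Each layer is a product over the shared
# dimension followed by a sum (join + reduce), a bias, and an activation.

def dense_layer(inputs, weights, biases, activation):
    """outputs[j] = activation(sum_i inputs[i] * weights[i][j] + biases[j])"""
    return [
        activation(sum(inputs[i] * weights[i][j] for i in range(len(inputs))) + biases[j])
        for j in range(len(biases))
    ]


def relu(v):
    return max(0.0, v)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


# Made-up parameters, chosen only so the example is deterministic
w_hidden = [[0.1 * (i + 1) * (j + 1) for j in range(5)] for i in range(3)]  # 3x5
b_hidden = [0.0] * 5
w_output = [[0.2] for _ in range(5)]  # 5x1
b_output = [-1.0]


def forward(inputs):
    hidden = dense_layer(inputs, w_hidden, b_hidden, relu)
    return dense_layer(hidden, w_output, b_output, sigmoid)[0]


score = forward([1.0, 2.0, 3.0])
print(score)  # a single value between 0 and 1
```

In a rank expression, the weight matrices would typically be constant tensors and the input a query or document tensor, with each layer written as a product followed by a sum over the shared dimension.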