
Tensor Guide

Vespa provides a tensor data model and computation engine to support advanced computations over data, such as neural nets. This guide explains the tensor support in Vespa. See also the tensor reference, and our published paper (pdf).



Tensor concepts

A tensor in Vespa is a data structure which generalizes scalars, vectors and matrices to any number of dimensions:

  • A scalar is a tensor of rank 0
  • A vector is a tensor of rank 1
  • A matrix is a tensor of rank 2
  • ...

Tensors consist of a set of scalar valued cells, with each cell having a unique address. A cell's address is specified by its index or label in all the dimensions of that tensor. The number of dimensions in a tensor is the rank of the tensor. Each dimension can be either mapped or indexed. Mapped dimensions are sparse and allow any label (string identifier) designating their address, while indexed dimensions use dense numeric indices starting at 0.

Example: Using literal form, the tensor:

    {x:2, y:1}:1.0,
    {x:0, y:2}:1.0

has two dimensions named x and y, and has two cells with defined values:

[Figure: tensor graphical representation]
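To make the cell and address terminology concrete, here is a minimal sketch that models the example tensor as a plain Python dictionary. The helper names `address` and `dimensions` are hypothetical and only illustrate the concepts; this is not how Vespa stores tensors.

```python
def address(**labels):
    """Build a hashable cell address from dimension=label pairs."""
    return tuple(sorted(labels.items()))


# The example tensor: two cells have defined values, all others are undefined.
tensor = {
    address(x=2, y=1): 1.0,
    address(x=0, y=2): 1.0,
}


def dimensions(t):
    """Return the sorted dimension names used by the cells of t."""
    return sorted({dim for addr in t for dim, _ in addr})


# The rank is the number of dimensions.
rank = len(dimensions(tensor))

print(dimensions(tensor), rank)
print(tensor[address(x=2, y=1)])
```

Each cell is identified by one label or index per dimension, so the order in which the dimensions are written in an address does not matter.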

A tensor has a type, which consists of a set of dimension names and types, and a value type. The dimension name can be anything. This defines a 2-dimensional mapped tensor (matrix) of floats:

    tensor<float>(x{},y{})

This is a 2-dimensional indexed tensor (a 2x3 matrix) of double (double is the default value type):

    tensor(x[2],y[3])

A combination of mapped and indexed dimensions is a mixed tensor, for example:

    tensor<float>(key{},x[1000])

Vespa uses the type information to optimize execution plans at configuration time. For dense data the best performance is achieved with indexed dimensions.

Tensor document fields

Document fields in schemas can be of any tensor type:

field tensor_attribute type tensor<float>(x[4]) {
    indexing: attribute | summary
}

Feeding tensors

Tensor field values can be added to documents in JSON format. You can add, remove and modify tensor cells, or assign a completely new tensor value.

From inside containers (e.g. in a Document Processor), you can also use the tensor Java API to create Tensor values and set them in documents.

Querying with tensors

Tensors with one dense dimension can be indexed with HNSW and used for approximate nearest neighbor (ANN) search; see nearest neighbor search.

Ranking with tensors

Tensors can be used for making inferences and for ranking documents for a query, that is, in ranking expressions in rank profiles.

Accessing a document tensor in a ranking expression is just like accessing any other attribute: attribute(tensor_attribute).

To pass tensors in queries, you need to define their type in the rank profile. Then you can either

  • add it to the query in a Searcher, using the Tensor class and setting it with query.getRanking().getFeatures().put("query(myTensor)", myTensorInstance), or
  • pass it in the request, using a parameter like input.query(myTensor) with a tensor value in the tensor literal form.

You can then access the query tensor in ranking expressions by query(myTensor).
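As an illustration of the request-parameter option, the sketch below builds a URL-encoded query string carrying a small indexed tensor. The helper `to_literal` is hypothetical, and wrapping the cells in an outer pair of braces is an assumption about the literal form; check the tensor reference for the exact syntax your Vespa version accepts.

```python
from urllib.parse import urlencode


def to_literal(cells, dim="x"):
    """Format {label: value} as a literal such as {{x:0}:1.0,{x:1}:2.0}.

    Assumption: the literal form wraps the comma-separated cells in braces.
    """
    body = ",".join(f"{{{dim}:{label}}}:{value}" for label, value in cells.items())
    return "{" + body + "}"


# The parameter name follows the input.query(myTensor) pattern described above.
params = {"input.query(myTensor)": to_literal({0: 1.0, 1: 2.0})}
query_string = urlencode(params)

print(to_literal({0: 1.0, 1: 2.0}))
print(query_string)
```

The encoded string can be appended to a query request alongside the other request parameters.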

With these tensors, you can write functions in rank profiles that compute over them. For example:

rank-profile dot-product {

    first-phase {
        expression: dotProduct(query(tensor), attribute(tensor_attribute))
    }

    function dotProduct(a, b) {
        expression: sum(a*b)
    }
}

The tensor expression above is a short form for reduce(join(a, b, f(x,y)(x * y)), sum). The full list of tensor functions is given in the ranking expression reference.
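To show what the join and reduce steps do, here is a minimal Python sketch over dictionary-based tensors. The functions `join`, `reduce_sum` and `dense` are hypothetical models written for this illustration, not Vespa's implementation.

```python
def join(a, b, f):
    """Combine cells of a and b whose addresses agree on all shared dimensions."""
    result = {}
    for addr_a, val_a in a.items():
        for addr_b, val_b in b.items():
            da, db = dict(addr_a), dict(addr_b)
            if all(da[d] == db[d] for d in da.keys() & db.keys()):
                merged = tuple(sorted({**da, **db}.items()))
                result[merged] = f(val_a, val_b)
    return result


def reduce_sum(t):
    """Sum over all cells, removing every dimension (the result is a scalar)."""
    return sum(t.values())


def dense(values, dim="x"):
    """Build an indexed one-dimensional tensor from a list of values."""
    return {((dim, i),): v for i, v in enumerate(values)}


q = dense([1.0, 2.0, 3.0, 4.0])  # stands in for query(tensor)
d = dense([0.5, 0.0, 1.0, 2.0])  # stands in for attribute(tensor_attribute)

# reduce(join(a, b, f(x,y)(x * y)), sum)
dot = reduce_sum(join(q, d, lambda x, y: x * y))
print(dot)  # 11.5
```

Because both tensors share the dimension x, the join only pairs cells with the same index, which is why the result equals the ordinary dot product.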

Creating tensors from document fields

If you need to make tensor computations using single-valued attributes, arrays or weighted sets, you can convert them in a ranking expression:

  • Creating an indexed tensor where the values are lifted from single-value attributes can be done using the tensor concat function:
    concat(attribute(foo), attribute(bar))
    This can also be used to concat single-value attributes to a tensor.
  • Creating a mapped tensor where the values are lifted from single-value attributes can be done using the tensor generate function:
    tensor(x{}):{x1:attribute(foo), x2:attribute(bar)}
  • Creating a mapped tensor where the label(s) are lifted from a string array or single-value attribute can be done with the document feature tensorFromLabels.
  • Creating a mapped tensor where the labels and values are lifted from a weighted set can be done with the document feature tensorFromWeightedSet.
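As a rough mental model of the last two bullets, the sketch below shows the kind of mapped tensor each conversion produces. The Python function names and the dictionary representation are hypothetical. The sketch also assumes that labels lifted from an array each get the value 1.0 and that weighted set weights become the cell values.

```python
def tensor_from_labels(labels, dim):
    """Model of a mapped tensor with one cell per label (assumed value 1.0)."""
    return {((dim, label),): 1.0 for label in labels}


def tensor_from_weighted_set(weighted_set, dim):
    """Model of a mapped tensor whose labels are keys and values are weights."""
    return {((dim, key),): float(weight) for key, weight in weighted_set.items()}


colors = tensor_from_labels(["red", "blue"], "color")
tags = tensor_from_weighted_set({"sports": 10, "news": 3}, "tag")

print(colors)
print(tags)
```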

Constant tensors

In addition to document tensors and query tensors, constant tensors can be put in the application package. This is useful when constant tensors are used in ranking expressions, for instance machine learned models. Example:

constants {
    my_tensor_constant tensor<float>(x[10000]) file: constants/constant_tensor_file.json
}

This defines a new tensor rank feature with the type as defined and the contents distributed with the application package in the file constants/constant_tensor_file.json. The file uses the tensor JSON format and can be compressed; see the reference for examples.

To use this tensor in a ranking expression, wrap the constant name in constant(...):

rank-profile use_constant_tensor {
    first-phase {
        expression: sum(query(tensor) * attribute(tensor_attribute) * constant(my_tensor_constant))
    }
}

Tensors with strings

Tensors in Vespa cannot have strings as values, since the mathematical tensor functions would be undefined for such "tensors". However, you can still represent sets of strings in tensors by using the strings as labels in a mapped tensor dimension, with e.g. 1.0 as values. This allows you to perform set operations on strings and similar without making those tensors incompatible with other tensors and with normal tensor operations.
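The set-of-strings idea can be sketched as follows, again using a dictionary as a stand-in for a mapped tensor. The names `string_set` and `multiply` and the dimension name `tag` are hypothetical. The sketch assumes that multiplying two mapped tensors over the same dimension keeps only the labels present in both.

```python
def string_set(strings, dim="tag"):
    """Represent a set of strings as a mapped tensor with 1.0 in every cell."""
    return {((dim, s),): 1.0 for s in strings}


def multiply(a, b):
    """Cell-wise product of two mapped tensors over the same dimension.

    Only labels present in both tensors survive, so this acts as intersection.
    """
    return {addr: a[addr] * b[addr] for addr in a.keys() & b.keys()}


query_tags = string_set(["news", "sports"])
doc_tags = string_set(["sports", "weather"])

overlap = multiply(query_tags, doc_tags)
shared_count = sum(overlap.values())

print(overlap)
print(shared_count)  # 1.0
```

Summing the product gives the size of the intersection, which can be used directly as a ranking signal.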

Performance considerations

Tensor expressions are fairly concise, and because the expressions themselves are independent of the data size, a short expression can hide a significant workload during ranking when the tensors are large.

When using tensors in ranking it is important to have an understanding of the potential computational cost for each query. As an example, assume the dot product of two tensors with 1000 values each, e.g. tensor<double>(x[1000]). Assuming one query tensor and one document tensor, the operation is:

sum(query(tensor1) * attribute(tensor2))

If 8 bytes are used to store each value (e.g. using a double), each tensor is approximately 8 KB. On a Haswell architecture, for instance, the theoretical peak memory bandwidth is 68 GB/s, which allows roughly 8.5 million (around 9 million) document ranking evaluations per second, since each evaluation reads one 8 KB document tensor. With 1 million documents, the maximum throughput, with regard to pure memory bandwidth, is therefore around 9 queries per second (per node).
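The arithmetic behind this estimate can be reproduced in a few lines. The sketch counts only the bytes read for the document tensor in each evaluation and ignores caches and every other cost, so it is an upper bound rather than a prediction.

```python
values_per_tensor = 1000
bytes_per_value = 8                  # double
tensor_bytes = values_per_tensor * bytes_per_value

memory_bandwidth = 68e9              # bytes per second, theoretical peak
evaluations_per_second = memory_bandwidth / tensor_bytes

documents = 1_000_000                # documents ranked per query
queries_per_second = evaluations_per_second / documents

print(tensor_bytes)                  # 8000
print(evaluations_per_second)        # 8500000.0
print(queries_per_second)            # 8.5
```

Halving `bytes_per_value` (for example by using float cells) doubles the bound, which is the motivation for the cell value types discussed below.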

Even though you would typically not do the above without reducing the search space first (using matching and first phase), it is important to consider the memory bandwidth and other hardware limitations when developing ranking expressions with tensors.

When using tensor types with at least one mapped dimension (sparse or mixed tensor), attribute: fast-rank can be used to optimize the tensor attribute for ranking expression evaluation at the cost of using more memory. This is a trade-off that can be worth taking if benchmarking indicates significant latency improvements with fast-rank.

Performance considerations for cell value types


The 64-bit floating-point "double" format is the default cell type. It gives best precision at the cost of high memory usage and somewhat slower calculations. Using a smaller value type increases performance, trading off precision, so consider changing to one of the cell types below before scaling your application.


The 32-bit floating-point format "float" should usually be used for all tensors when scaling for production. (Note that other frameworks, like TensorFlow, also prefer 32-bit floats.) A vector with 1000 dimensions, tensor<float>(x[1000]), would then use approximately 4 KB of memory per tensor value.


If memory (or memory bandwidth) is still a concern, it's possible to change the most space-consuming tensors to use the bfloat16 cell type. This type has the same range as a normal 32-bit float but only 8 bits of precision, and can be thought of as "float with lossy compression". See also bfloat16 floating-point format on Wikipedia. Some careful analysis of your data is required before using this type.

Note that when doing calculations bfloat16 will act as if it were a 32-bit float, but the smaller size comes with a potential computation overhead. In most cases, the bfloat16 needs to be converted to a 32-bit float before the actual calculation can take place, adding an extra conversion step.

In some cases, having tensors with bfloat16 cells might bypass some built-in optimizations in the back-end (like matrix multiplication) that will be hardware accelerated only if the cells are of the same type. To avoid this last case, you can use the cell_cast tensor operation to make sure the cells are of the appropriate type before doing the more expensive operations.
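To get a feel for the precision loss, the sketch below converts a value to bfloat16 by keeping only the upper 16 bits of its 32-bit float encoding, and compares the memory footprint of a 1000-cell vector across cell types. Truncation is an assumption made for simplicity; an actual implementation may round instead.

```python
import struct


def to_bfloat16_bits(value):
    """Keep the upper 16 bits of the IEEE 754 float32 encoding (truncation)."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", value))
    return bits32 >> 16


def from_bfloat16_bits(bits16):
    """Expand 16 bfloat16 bits back into a float by zero-filling the low bits."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits16 << 16))
    return value


roundtrip = from_bfloat16_bits(to_bfloat16_bits(3.14159))
print(roundtrip)  # 3.140625

# Bytes per cell for each cell value type, and the size of a 1000-cell vector.
cell_bytes = {"double": 8, "float": 4, "bfloat16": 2, "int8": 1}
vector_bytes = {name: size * 1000 for name, size in cell_bytes.items()}
print(vector_bytes)
```

The round trip keeps only about two to three significant decimal digits, which is why the data should be analyzed before switching to this type.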


If you use machine learning to generate a model with data quantization, you can target the int8 cell value type, which is a signed integer with a range from -128 to +127 only. This is also treated like a "float with limited range and lossy compression" by the Vespa tensor framework, and gives results as if it were a 32-bit float when any calculation is done. This type is also suitable for representing boolean values (0 or 1). Note that if the input for an int8 cell is not directly representable, the resulting cell value is undefined, so you should take care to only input numbers in the [-128,127] range. It's also possible to use int8 to represent binary data for hamming distance nearest neighbor search.
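The following sketch illustrates how binary data can be packed into int8 cells and compared by hamming distance. The helpers `pack_bits` and `hamming` are hypothetical. The sketch assumes that the distance is the number of differing bits across all cells and that bits are packed most significant bit first.

```python
def pack_bits(bits):
    """Pack a list of 0/1 values (length a multiple of 8) into signed int8 cells."""
    cells = []
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        # Map 128..255 to -128..-1 so that every cell fits the int8 range.
        cells.append(byte - 256 if byte > 127 else byte)
    return cells


def hamming(a, b):
    """Count the differing bits between two equally long int8 vectors."""
    return sum(bin((x ^ y) & 0xFF).count("1") for x, y in zip(a, b))


a = pack_bits([1, 0, 0, 0, 0, 0, 0, 1])
b = pack_bits([0, 0, 0, 0, 0, 0, 1, 1])

print(a, b)           # [-127] [3]
print(hamming(a, b))  # 2
```

Packing 8 bits per cell means a 1024-bit binary vector fits in a tensor with 128 int8 cells.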