Vespa provides a tensor data model and computation engine to support advanced computations over data, such as neural nets. This guide explains the tensor support in Vespa. See also the tensor reference, and our published paper (pdf).
A tensor in Vespa is a data structure which generalizes scalars, vectors and matrices to any number of dimensions:
Tensors consist of a set of scalar valued cells, with each cell having a unique address. A cell's address is specified by its index or label in all the dimensions of that tensor. The number of dimensions in a tensor is the rank of the tensor. Each dimension can be either mapped or indexed. Mapped dimensions are sparse and allow any label (string identifier) designating their address, while indexed dimensions use dense numeric indices starting at 0.
Example: Using literal form, the tensor:
{ {x:2, y:1}:1.0, {x:0, y:2}:1.0 }
has two dimensions named x and y, and two cells with defined values: the cell at {x:2, y:1} and the cell at {x:0, y:2}, both with value 1.0.
A tensor has a type, which consists of a set of dimension names and types, and a value type. The dimension name can be anything. This defines a 2-dimensional mapped tensor (matrix) of floats:
tensor<float>(topic{},segment{})

This is a 2-dimensional indexed tensor (a 2x3 matrix) of double (double is the default value type):

tensor(x[2],y[3])

A combination of mapped and indexed dimensions is a mixed tensor:
tensor<float>(key{},x[1000])
Vespa uses the type information to optimize execution plans at configuration time. For dense data the best performance is achieved with indexed dimensions.
Document fields in schemas can be of any tensor type:
field tensor_attribute type tensor<float>(x[4]) {
    indexing: attribute | summary
}
Tensor field values can be added to documents in JSON format. You can add, remove and modify tensor cells, or assign a completely new tensor value.
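For illustration, a document put in JSON format setting the tensor_attribute field defined above could look like this (the document id and namespace are placeholder assumptions):

{
    "put": "id:mynamespace:mydoctype::1",
    "fields": {
        "tensor_attribute": {
            "cells": [
                { "address": { "x": "0" }, "value": 1.0 },
                { "address": { "x": "1" }, "value": 2.0 },
                { "address": { "x": "2" }, "value": 3.0 },
                { "address": { "x": "3" }, "value": 4.0 }
            ]
        }
    }
}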
From inside containers (e.g. in a document processor), you can also create Tensor values using the tensor Java API to set tensor values in documents.
Tensors with one dense dimension can be indexed (HNSW) and used for approximate nearest neighbor search (ANN); see nearest neighbor search.
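As a sketch, an HNSW-indexed tensor field for approximate nearest neighbor search could be defined like this (the field name, dimension size and index parameters are illustrative, not prescriptive):

field embedding type tensor<float>(x[384]) {
    indexing: attribute | index
    attribute {
        distance-metric: angular
    }
    index {
        hnsw {
            max-links-per-node: 16
            neighbors-to-explore-at-insert: 200
        }
    }
}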
Tensors can be used for making inferences and ranking over documents for a query, that is, in ranking expressions in rank profiles.
Accessing a document tensor in a ranking expression is just like accessing any other attribute: attribute(tensor_attribute).
To pass tensors in queries, you need to define their type in the rank profile. You can then either set the tensor programmatically with Query.getRanking().getFeatures().put("query(myTensor)", myTensorInstance), or pass a request parameter named input.query(myTensor) with a tensor value on the tensor literal form. You can then access the query tensor in ranking expressions as query(myTensor).
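A minimal sketch of a rank profile declaring a query tensor and using it together with the document tensor (the profile name, tensor name and type here are illustrative assumptions):

rank-profile my_profile {
    inputs {
        query(myTensor) tensor<float>(x[4])
    }
    first-phase {
        expression: sum(query(myTensor) * attribute(tensor_attribute))
    }
}

A request can then pass the tensor value on the literal form, for example input.query(myTensor)=[1.0, 2.0, 3.0, 4.0].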
With these tensors, you can write functions in rank profiles which compute over them. For example:
rank-profile dot-product {
    first-phase {
        expression: dotProduct(query(tensor), attribute(tensor_attribute))
    }
    function dotProduct(a, b) {
        expression: sum(a*b)
    }
}
The tensor expression above is a short form for reduce(join(a, b, f(x,y)(x * y)), sum).
The full list of tensor functions is found in the ranking expression reference.
If you need to make tensor computations using single-valued attributes, arrays or weighted sets, you can convert them in a ranking expression:

Single-value attributes can be combined into an indexed tensor with the concat function, which concatenates along a named dimension: concat(attribute(foo), attribute(bar), x). This can also be used to concat a single-value attribute to a tensor.

A mapped tensor can be generated from single-value attributes using the tensor literal form: tensor(x{}):{x1:attribute(foo), x2:attribute(bar)}

Arrays can be converted using tensorFromLabels, and weighted sets using tensorFromWeightedSet.
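As a sketch, the array and weighted set conversions might be used like this inside a rank profile (the attribute names tags and categories, the dimension names, and the profile name are assumptions for illustration):

rank-profile conversions {
    function tags_tensor() {
        # Weighted set attribute "tags": keys become labels in the mapped
        # dimension "tag", weights become the cell values
        expression: tensorFromWeightedSet(attribute(tags), tag)
    }
    function categories_tensor() {
        # Array attribute "categories": elements become labels in the
        # "category" dimension, all cell values are 1.0
        expression: tensorFromLabels(attribute(categories), category)
    }
    first-phase {
        expression: sum(tags_tensor) + sum(categories_tensor)
    }
}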
In addition to document tensors and query tensors, constant tensors can be put in the application package. This is useful when constant tensors are used in ranking expressions, for instance in machine-learned models. Example:
constants {
    my_tensor_constant tensor<float>(x[10000]) file: constants/constant_tensor_file.json
}
This defines a new tensor rank feature with the type as defined, with the contents distributed with the application package in the file constants/constant_tensor_file.json. The format of this file is the tensor JSON format; it can be compressed, see the reference for examples.
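For illustration only, a small constant file on the tensor JSON format could look like this (shown for a tiny tensor<float>(x[3]) rather than the x[10000] type above):

{
    "cells": [
        { "address": { "x": "0" }, "value": 0.5 },
        { "address": { "x": "1" }, "value": 1.5 },
        { "address": { "x": "2" }, "value": 2.5 }
    ]
}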
To use this tensor in a ranking expression, encapsulate the constant name with constant(...):
rank-profile use_constant_tensor {
    first-phase {
        expression: sum(query(tensor) * attribute(tensor_attribute) * constant(my_tensor_constant))
    }
}
Tensors in Vespa cannot have strings as values, since the mathematical tensor functions would be undefined for such "tensors". However, you can still represent sets of strings in tensors by using the strings as keys in a mapped tensor dimension, using e.g. 1.0 as values. This allows you to perform set operations on strings and similar without making those tensors incompatible with other tensors and with normal tensor operations.
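As a sketch of this (tensor names and labels made up for illustration): if query(user_tags) is tensor(tag{}):{music:1.0, sports:1.0} and attribute(doc_tags) is tensor(tag{}):{music:1.0, news:1.0}, then sum(query(user_tags) * attribute(doc_tags)) evaluates to 1.0, the size of the set intersection, because the sparse multiplication only produces cells for labels present in both tensors.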
Tensor expressions are fairly concise, and since the expressions themselves are independent of the data size, a short expression can imply a significant workload during ranking when the tensors are large.
When using tensors in ranking it is important to have an understanding of the potential computational cost for each query. As an example, assume the dot product of two tensors with 1000 values each, e.g. tensor<double>(x[1000]).
Assuming one query tensor and one document tensor, the operation is:
sum(query(tensor1) * attribute(tensor2))
If 8 bytes are used to store each value (e.g. using a double), each tensor is approximately 8 KB. With, for instance, a Haswell architecture, the theoretical upper memory bandwidth is 68 GB/s, which corresponds to around 9 million document ranking evaluations per second. With 1 million documents, this means the maximum throughput, with regard to pure memory bandwidth, is about 9 queries per second (per node).
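Spelled out, the back-of-the-envelope arithmetic behind these numbers is roughly:

8 bytes per value * 1000 values  =  8 KB per document tensor
68 GB/s / 8 KB per evaluation    ~  8.5 million evaluations per second
8.5 million evaluations per second / 1 million documents per query  ~  8-9 queries per second per node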
Even though you would typically not do the above without reducing the search space first (using matching and first phase), it is important to consider the memory bandwidth and other hardware limitations when developing ranking expressions with tensors.
When using tensor types with at least one mapped dimension (sparse or mixed tensor), attribute: fast-rank can be used to optimize the tensor attribute for ranking expression evaluation at the cost of using more memory. This is a trade-off that can be worth taking if benchmarking indicates significant latency improvements with fast-rank.
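A minimal sketch of enabling fast-rank on a mixed tensor attribute (the field name and tensor type are illustrative assumptions):

field sparse_tensor type tensor<float>(category{}, x[16]) {
    indexing: attribute | summary
    attribute: fast-rank
}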
A tensor's cell value type determines precision, memory use and performance:

Cell value type | Description
---|---
double | The 64-bit floating-point "double" format is the default cell type. It gives the best precision at the cost of high memory usage and somewhat slower calculations. Using a smaller value type increases performance, trading off precision, so consider changing to one of the cell types below before scaling your application.
float | The usual 32-bit floating-point format "float" should usually be used for all tensors when scaling for production. (Note that other frameworks, like TensorFlow, will also prefer 32-bit floats.) A vector with 1000 dimensions, tensor<float>(x[1000]), uses approximately 4 KB of memory per tensor value.
bfloat16 | If memory (or memory bandwidth) is still a concern, it's possible to change the most space-consuming tensors to use the bfloat16 cell type, a 16-bit format which halves the footprint compared to float. Note that when doing calculations, bfloat16 cells are converted to 32-bit floats, which can add some overhead. In some cases, having tensors with bfloat16 cells may also bypass optimized calculation paths; the cell_cast function can be used to convert cells to a common type before expensive operations.
int8 | If one uses machine learning to generate a model with data quantization you can target the int8 cell value type, an 8-bit signed integer which gives the smallest footprint and is treated as a float with limited range and precision in calculations.