Vespa's speciality is evaluating machine-learned models quickly over large numbers of data points.
However, it can also be used to evaluate models once on request in stateless containers.
By adding the model-evaluation element to services.xml, all machine-learned models - TensorFlow, ONNX, XGBoost, LightGBM and Vespa stateless models - added to the models/ directory of the application package are made available through both a REST API and a Java API, from which you can compute inferences in your own code.
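A minimal sketch of enabling this in services.xml (a real container cluster will typically contain more elements):

<container version="1.0">
    <model-evaluation/>
</container>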
The model-evaluation section can optionally contain inference session options for
ONNX models. See ONNX inference options.
Model inference using the REST API
The simplest way to evaluate a model is to use the REST API. After enabling it as above, a new API path is made available: /model-evaluation/v1/. To discover the models in your application package and find information about them (including their expected input parameters), simply follow the links from this root.
To evaluate a model, add /eval to the query path:
/model-evaluation/v1/<model-name>/<function>/eval
Here <model-name> signifies which model to evaluate, as you can deploy multiple models in your application package. <function> specifies which signature and output to evaluate, as a model might have multiple signatures and outputs. If a model only has one function, this part can be omitted.
Inputs to the model are specified as query parameters for GET requests; they can also be placed in the body of the request for POST requests. The expected format for input parameters is tensors in the literal form.
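As an illustration, assuming a hypothetical model named mymodel with a single function and one input named input of type tensor(d0[3]), a request could look like this (the tensor literal must be URL-encoded in an actual request):

/model-evaluation/v1/mymodel/eval?input={{d0:0}:0.1,{d0:1}:0.2,{d0:2}:0.3}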
Model evaluation requests accept these request parameters:

format.tensors (String)
Controls how tensors are rendered in the result. Supported values:

short
Default. Render the tensor as a JSON object having two keys: "type" containing the tensor type, and "cells"/"blocks"/"values" (depending on the type) containing the tensor content in the type-appropriate short form.

long
Render the tensor as a JSON object having two keys: "type" containing the tensor type, and "cells" containing the tensor content in the general verbose form.

short-value
Render the tensor content directly as a JSON value, in the type-appropriate short form.

long-value
Render the tensor content directly as a JSON value, in the general verbose form.
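For illustration, a result tensor of type tensor(d0[3]) would render roughly as follows in the two short formats (a sketch of the rendering, not verbatim output):

short:       {"type": "tensor(d0[3])", "values": [0.1, 0.2, 0.3]}
short-value: [0.1, 0.2, 0.3]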
Model inference using Java
While the REST API gives a basic interface to run model inference, the Java interface offers far more control, allowing you to for instance implement custom input and output formats. To use it, add the container artifact as a dependency in your pom.xml (or, if you want the minimal dependency, depend on model-evaluation instead of container).
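A sketch of the Maven dependency, assuming the standard com.yahoo.vespa group id (substitute your Vespa version):

<dependency>
    <groupId>com.yahoo.vespa</groupId>
    <artifactId>container</artifactId>
    <version><!-- your Vespa version --></version>
    <scope>provided</scope>
</dependency>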
With the above dependency and the model-evaluation element added to services.xml, you can have any Java component that should evaluate models take an ai.vespa.models.evaluation.ModelsEvaluator instance (see ModelsEvaluator.java) as a constructor argument; Vespa will automatically inject it.
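For example, a minimal component sketch (the class name is hypothetical):

import ai.vespa.models.evaluation.ModelsEvaluator;

public class MyModelComponent {

    private final ModelsEvaluator modelsEvaluator;

    public MyModelComponent(ModelsEvaluator modelsEvaluator) {
        this.modelsEvaluator = modelsEvaluator; // injected by Vespa
    }
}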
Use the ModelsEvaluator API (from any thread) to make inferences. Sample code:
import ai.vespa.models.evaluation.ModelsEvaluator;
import ai.vespa.models.evaluation.FunctionEvaluator;
import com.yahoo.tensor.Tensor;
import com.yahoo.tensor.TensorType;

// ...

// Create an evaluator - unambiguous arguments may be skipped
FunctionEvaluator evaluator = modelsEvaluator.evaluatorOf("myModel", "mySignature", "myOutput");

// Get model inputs, for instance from the query (here we just construct a sample tensor)
Tensor.Builder b = Tensor.Builder.of(new TensorType.Builder().indexed("d0", 3).build());
b.cell(0.1, 0);
b.cell(0.2, 1);
b.cell(0.3, 2);
Tensor input = b.build();

// Bind inputs to the evaluator
evaluator.bind("myInput", input);

// Evaluate the model. Note: an evaluator must be discarded after a single use
Tensor result = evaluator.evaluate();

// Do something with the result
When developing your application, it can be helpful to unit test your models and your searchers and document processors. Vespa provides a ModelsEvaluatorTester which can be constructed from the contents of your "models" directory. This allows testing that a model works as expected in the context of Vespa, and that your searcher or document processor gets the correct results from your models. With this you can construct a testable ModelsEvaluator:
import ai.vespa.models.evaluation.ModelsEvaluator;
import com.yahoo.vespa.model.container.ml.ModelsEvaluatorTester;
import org.junit.Test;

public class ModelsTest {

    @Test
    public void testModels() {
        ModelsEvaluator modelsEvaluator = ModelsEvaluatorTester.create("src/main/application/models");
        // Test the modelsEvaluator directly, or construct a searcher and pass it in
    }
}
The ModelsEvaluator object that is returned contains all models found under the directory passed in. Note that ModelsEvaluatorTester should only be used in unit tests.
ONNX inference options
The model-evaluation section in services.xml accepts the following inference session options for ONNX models:

intraop-threads (optional, number)
The number of threads available for running operations with multithreaded implementations.

interop-threads (optional, number; default: max(1, CPU count / 4) if execution mode is parallel)
The number of threads available for running multiple operations in parallel. This is only applicable for the parallel execution mode.

execution-mode (optional, string; default: sequential)
Controls how the operators of a graph are executed, either sequential or parallel.

gpu-device (optional, number)
The GPU device number to use for computation, starting at 0, i.e. if your GPU is /dev/nvidia0, set this to 0. This must be an Nvidia CUDA-enabled GPU.
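These options are set per model in services.xml; a sketch, assuming a hypothetical ONNX model file my_model.onnx under models/:

<model-evaluation>
    <onnx>
        <models>
            <model name="my_model">
                <intraop-threads>4</intraop-threads>
                <execution-mode>parallel</execution-mode>
                <gpu-device>0</gpu-device>
            </model>
        </models>
    </onnx>
</model-evaluation>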
Since stateless model evaluation is based on auto-discovery of models under the models directory in the application package, configuration like the above is only needed for models that should not use the default settings, or that should run on a GPU.