Stateless Model Evaluation
Vespa's specialty is evaluating machine-learned models quickly over large
numbers of data points. However, it can also be used to evaluate a model a single
time per request in stateless containers. By enabling a feature in services.xml,
all machine-learned models (TensorFlow, ONNX, XGBoost and Vespa stateless models)
added to the models/ directory of the application package are made available
through both a REST API and a Java API, the latter letting you compute inferences from your own code.
An example application package can be found in the model-evaluation system test.
The model evaluation tag
Enable the feature by adding the model-evaluation tag to the container cluster in services.xml:
<container>
    ...
    <model-evaluation/>
    ...
</container>
Model inference using the REST API
The simplest way to evaluate a model is to use the REST API. After
enabling it as above, a new API path is made available:
/model-evaluation/v1/. To discover the models in your application package and
find information about them (including their expected input parameters),
simply follow the links from this root. To evaluate a model, append the
model name (<model-name>) and, where needed, the function (<function>) to this path.
Here <model-name> selects which model to evaluate, since you can deploy multiple models in your application package. <function> specifies which signature and output to evaluate, since a model might have multiple signatures and outputs; if a model has only one function, it can be omitted. Inputs to the model are given as query parameters in GET requests, and can also be given in the request body of POST requests. Input parameters are expected to be tensors written in the tensor literal form.
See the model-evaluation sample app for an example of this.
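The input format described above can be illustrated with a small, self-contained Java sketch that uses only the JDK. It formats a one-dimensional tensor in the cell form of the tensor literal, URL-encodes it as a GET query parameter, and builds (without sending) a request for the API root. The host and port (http://localhost:8080), the input name myInput and the dimension d0 are assumptions chosen for illustration. The sketch deliberately does not construct the evaluation path itself; follow the links from /model-evaluation/v1/ to find the exact path and the input types your model expects.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

class ModelEvalRestSketch {

    // Formats a 1-D tensor in the cell form of the tensor literal,
    // e.g. {{d0:0}:0.1,{d0:1}:0.2}. Each cell gets its own index label.
    static String denseLiteral(String dimension, double[] values) {
        StringJoiner cells = new StringJoiner(",", "{", "}");
        for (int i = 0; i < values.length; i++) {
            cells.add("{" + dimension + ":" + i + "}:" + values[i]);
        }
        return cells.toString();
    }

    // Builds one GET query parameter for a model input. The literal is URL-encoded
    // because it contains characters such as '{', '}', ':' and ','.
    static String inputParameter(String inputName, String tensorLiteral) {
        return URLEncoder.encode(inputName, StandardCharsets.UTF_8) + "="
                + URLEncoder.encode(tensorLiteral, StandardCharsets.UTF_8);
    }

    // Builds (but does not send) a GET request for the API root, whose links
    // describe the deployed models and their expected inputs.
    static HttpRequest discoveryRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/model-evaluation/v1/")).GET().build();
    }

    public static void main(String[] args) {
        // Hypothetical values: dimension "d0", input "myInput" and localhost:8080 are assumptions.
        String literal = denseLiteral("d0", new double[] {0.1, 0.2, 0.3});
        System.out.println(literal);
        System.out.println(inputParameter("myInput", literal));
        System.out.println(discoveryRequest("http://localhost:8080").uri());
    }
}
```

The encoded parameter can then be appended after a ? to the evaluation path you find through the API root; sending the request is left out so the sketch runs without a Vespa instance.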
Model inference using Java
While the REST API gives a basic interface for running model inference, the Java interface offers far more control, allowing you, for instance, to implement custom input and output formats.
First, add the following dependency to your pom.xml:
<dependency>
    <groupId>com.yahoo.vespa</groupId>
    <artifactId>container</artifactId>
    <scope>provided</scope>
</dependency>
(Or, if you want the minimal dependency, depend on
model-evaluation instead of container.)
With the above dependency and the
model-evaluation tag added to
services.xml, the Java component that should evaluate models
can take a ModelsEvaluator
instance as a constructor argument (Vespa will automatically inject it).
Use the ModelsEvaluator API (from any thread) to make inferences. Sample code:
import ai.vespa.models.evaluation.ModelsEvaluator;
import ai.vespa.models.evaluation.FunctionEvaluator;
import com.yahoo.tensor.Tensor;
import com.yahoo.tensor.TensorType;
...
// Create evaluator
FunctionEvaluator evaluator = modelsEvaluator.evaluatorOf("myModel", "mySignature", "myOutput"); // Unambiguous args may be skipped
// Get model inputs, for instance from the query (here we just construct a sample tensor)
Tensor.Builder b = Tensor.Builder.of(new TensorType.Builder().indexed("d0", 3).build());
b.cell(0.1, 0); // each cell is written at its own index in d0
b.cell(0.2, 1);
b.cell(0.3, 2);
Tensor input = b.build();
// Bind inputs to the evaluator
evaluator.bind("myInput", input);
// Evaluate model. Note: Evaluator must be discarded after a single use
Tensor result = evaluator.evaluate();
// Do something with the result
The model-evaluation sample app also has an example of this.