Getting Started

Welcome to Vespa, the open big data serving engine! Here you'll find resources for getting started.

Quick Start

Quick start: Create and run a minimal Vespa application

Other ways to get started:
Quick start, application with Java components
Quick start, using the Pyvespa Python API
Docker Desktop: Install and run Vespa locally
Docker Desktop: Install and run Vespa locally, with Java components

The developer guide is an intro to developing, testing, and deploying applications. Until you add multiple nodes, an application can be deployed both on cloud and locally with no modifications.

Tutorials and Use Cases

Moving from the minimal quick start to more advanced use cases.

Search
Tutorial: Text Search. A text search tutorial and introduction to text ranking with Vespa using traditional information retrieval techniques like BM25.
Tutorial: Hybrid Text Search. A search tutorial and introduction to hybrid text ranking with Vespa, combining BM25 with text embedding models.
Tutorial: Improving Text Search with Machine Learning. This tutorial builds on the text search tutorial but introduces Learning to Rank to improve relevance.

Vector Search
Learn how to use Vespa Vector Search in the practical nearest neighbor search guide. It uses Vespa's support for nearest neighbor search; Vespa also supports fast approximate nearest neighbor search. The guide covers combining vector search with filters and how to perform hybrid search, combining retrieval over inverted index structures with vector search.

RAG (Retrieval-Augmented Generation)
Tutorial: RAG Blueprint. A tutorial that provides a blueprint for building high-quality RAG applications with Vespa. Includes evaluation and learning-to-rank (LTR).
Retrieval-augmented generation (RAG) in Vespa.

Recommendation
Learn how to use Vespa for content recommendation/personalization in the News Search and Recommendation tutorial set.

ML Model Serving
Learn how to use Vespa for ML model serving in Stateless Model Evaluation. Vespa supports running inference with models from many popular ML frameworks, which can be used for ranking, query classification, question answering, multi-modal retrieval, and more.
Ranking with ONNX models. Export models from popular deep learning frameworks such as PyTorch to ONNX format for serving in Vespa. Vespa integrates with ONNX-Runtime for accelerated inference. Many ML frameworks support exporting models to ONNX, including sklearn.
Ranking with LightGBM models
Ranking with XGBoost models
Ranking with TensorFlow models

Embedding Model Inference
Vespa supports integrating embedding models, which avoids transferring large amounts of embedding vector data over the network and allows for efficient serving of embedding models.
Huggingface Embedder: Use single-vector embedding models from Hugging Face
ColBERT Embedder: Use multi-vector embedding models
Splade Embedder: Use sparse learned single-vector embedding models

ML Model Lifecycle
The Models hot swap tutorial shows a solution for changing the vector embedding model atomically while serving. It also extends the application to support multiple recommendation models while minimizing data duplication. Lastly, it demonstrates how to efficiently garbage collect obsolete content in an application.

E-Commerce Search
The e-commerce shopping sample application demonstrates Vespa grouping, true in-place partial updates, custom ranking, and more.

Examples and starting sample applications
There are many examples and starting applications on GitHub and Pyvespa examples.

Production deployment environments

Vespa can be deployed in multiple ways. These guides show how to deploy multi-node applications in various environments.

Production deployments on Vespa Cloud
Vespa high-availability multi-node template
Vespa multinode testing and observability
Using Kubernetes with Vespa
AWS EC2 multinode
AWS ECS multinode

See also monitoring Vespa.

Custom component development

Vespa applications can contain custom components that are run by Vespa, for example, when receiving queries or documents. These components must be able to run on a JVM. While all the built-in behavior of Vespa can be invoked by a YQL query, advanced applications often use plugin components to build queries from frontend requests, because doing this closer to the data is faster and simpler. See the quick starts with Java above to get started. The Developer Guide has more details.
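
As a point of comparison for the YQL route mentioned above, here is a minimal Python sketch of a client building a query request itself. It assumes a locally running Vespa container on the default port 8080 and uses the query API's `yql`, `query`, and `hits` parameters; the helper name `build_query_request` and the query text are made up for illustration. The request is only constructed, not sent, so the sketch runs without a deployed application.

```python
import json
import urllib.request

# Assumption: a local Vespa container listening on the default port 8080.
VESPA_ENDPOINT = "http://localhost:8080/search/"


def build_query_request(user_text, hits=10, endpoint=VESPA_ENDPOINT):
    """Build (but do not send) a POST request for Vespa's query API.

    Hypothetical helper for illustration; it is not part of Vespa or Pyvespa.
    """
    body = {
        # userQuery() tells Vespa to use the text in the "query" parameter.
        "yql": "select * from sources * where userQuery()",
        "query": user_text,
        "hits": hits,
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request("big data serving", hits=5)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Once an application is deployed, passing the request to `urllib.request.urlopen` would send it. A custom Searcher component moves this kind of query construction from the client into the Vespa container.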

Next Steps

Performance and scaling on Vespa.
Vespa query performance - practical guide.
Overview of Vespa APIs.
Frequently asked questions.
Sample applications GitHub repo.
Securing a Vespa installation.
Follow the Vespa Blog for product updates and use cases.