The RAG Blueprint

Vespa is the platform of choice for large-scale RAG applications like Perplexity. It gives you all the features you need, but putting them together can be a challenge.

This open source sample application contains all the elements you need to create a RAG application that

delivers state-of-the-art quality, and
scales to any amount of data, query load, and complexity.

This README provides the steps to create and run your own application based on the blueprint. Refer to the RAG Blueprint tutorial for more in-depth explanations, or try out the Python notebook.

Setup:

Create a tenant on Vespa Cloud:

Go to console.vespa-cloud.com and create your tenant (unless you already have one).
Install the Vespa CLI using Homebrew:
```
$ brew install vespa-cli
```
Windows/No Homebrew? See the Vespa CLI page to download directly.
Configure the Vespa client:
```
$ export VESPA_CLI_HOME=$PWD/.vespa
```
```
$ vespa config set target cloud
$ vespa config set application vespa-team.autotest
```
Replace "vespa-team" with the tenant name you created in the first step, both here and in the remaining steps of this guide.
Get Vespa Cloud control plane access:
```
$ vespa auth login
```
Follow the instructions from the command to authenticate.
Clone a sample application:
```
$ vespa clone rag-blueprint myapp && cd myapp
```
See sample-apps for other sample apps you can clone.
Add a certificate for data plane access to the application:
```
$ vespa auth cert app
```
It is a good idea to take note of the path to the .pem files written here.
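
If you lose track of those paths, a small helper can list them again. This is a minimal sketch, assuming the files are written somewhere under the Vespa CLI home directory (`$VESPA_CLI_HOME` as exported earlier, or `~/.vespa` if unset). The function name `list_pem_files` is hypothetical and not part of the Vespa CLI.

```shell
# List .pem files below a directory (default: the Vespa CLI home directory).
# Assumption: "vespa auth cert" wrote its certificate and key under this directory.
list_pem_files() {
    local dir="${1:-${VESPA_CLI_HOME:-$HOME/.vespa}}"
    # Nothing to list if the directory does not exist yet.
    [ -d "$dir" ] || return 0
    find "$dir" -type f -name '*.pem' | sort
}

# Usage: print every certificate and key file found.
list_pem_files
```

Pass a directory as the first argument to search somewhere else, for example the application package directory.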

Test the application

$ vespa deploy --wait 900 ./app

Feed some documents. Feeding also chunks and embeds them, so it takes about 3 minutes:

$ vespa feed dataset/docs.jsonl
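
To get a feel for how much data the feed step handles, you can count the entries in the feed file first. This is a minimal sketch, assuming the file is JSONL with one document operation per non-empty line. The function name `count_feed_docs` is hypothetical.

```shell
# Count the non-empty lines in a JSONL feed file.
# Assumption: each non-empty line holds exactly one document operation.
count_feed_docs() {
    # grep -c exits non-zero when the count is 0, so ignore its exit status.
    grep -c . "$1" || true
}

# Usage (run from the application directory):
#   count_feed_docs dataset/docs.jsonl
```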

Now you can issue queries:

$ vespa query 'query=yc b2b sales'
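
The query above is sent to the application's query API (`/search/`) over HTTP. As a rough sketch of an equivalent request with plain curl, the helper below only prints the command line so you can inspect it before running it. The endpoint and file names are placeholders, not values from this guide. The function name `query_curl_command` is hypothetical, and the exact request the Vespa CLI sends may differ.

```shell
# Print a curl command that queries the Vespa query API with mTLS.
# Arguments: endpoint URL, certificate file, private key file, query string.
# All values passed in the usage line below are placeholders.
query_curl_command() {
    local endpoint="$1" cert="$2" key="$3" query="$4"
    # --get with --data-urlencode lets curl URL-encode the query parameter.
    printf 'curl --cert %s --key %s --get --data-urlencode %s %s\n' \
        "$cert" "$key" "'query=${query}'" "${endpoint%/}/search/"
}

# Usage: substitute your own endpoint and the .pem paths noted during setup.
query_curl_command "https://ENDPOINT" "data-plane-public-cert.pem" \
    "data-plane-private-key.pem" "yc b2b sales"
```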

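[!TIP] Add "-v" to the vespa query command to see the HTTP request it becomes.

When you are done experimenting, you can remove the deployment and its data (--force skips the confirmation prompt):

$ vespa destroy --force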

Congratulations! You have now created a RAG application that can scale to billions of documents and thousands of queries per second, while delivering state-of-the-art quality.

Explore more

What do you want to do next?

To learn what this application can do, look at the files in your app/ directory.
Run your application locally using Docker
Use query profiles to define behavior for different use cases
Evaluate and improve relevance of the data returned
Do LLM generation inside the application