Vespa is the platform of choice for large-scale RAG applications like Perplexity. It gives you all the features you need, but putting them all together can be a challenge.
This open-source sample application contains all the elements you need to create a RAG application.
This README provides the steps to create and run your own application based on the blueprint. Refer to the RAG Blueprint tutorial for more in-depth explanations, or try out the Python notebook.
Setup:
Create a tenant on Vespa Cloud:
Go to console.vespa-cloud.com and create your tenant (unless you already have one).
Install the Vespa CLI using Homebrew:
$ brew install vespa-cli
Windows/No Homebrew? See the Vespa CLI page to download directly.
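Whichever way you install it, a quick check can confirm that the CLI is reachable from your shell before you continue. This is a minimal sketch: `have_cmd` is a hypothetical helper name used only for illustration, not part of the Vespa CLI.

```shell
# Return success if the named command is available on PATH.
# (have_cmd is a hypothetical helper, not a Vespa CLI feature.)
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd vespa; then
  # Print the installed CLI version.
  vespa version
else
  echo "vespa CLI not found on PATH; install it before continuing" >&2
fi
```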
Configure the Vespa client:
$ export VESPA_CLI_HOME=$PWD/.vespa
$ vespa config set target cloud
$ vespa config set application vespa-team.autotest
Use your own tenant name from step 1 in place of "vespa-team", here and in the remaining steps of this guide.
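One way to avoid retyping the tenant name is to keep it in a shell variable and build the application ID from it. The sketch below assumes a placeholder tenant called `my-tenant`; the variable names are illustrative, and the CLI is only invoked if it is installed.

```shell
# Hypothetical tenant name; replace with the tenant you created in step 1.
TENANT="my-tenant"
# Application name used in this guide.
APPLICATION="autotest"
# The application ID has the form <tenant>.<application>.
APP_ID="${TENANT}.${APPLICATION}"

echo "Application ID: ${APP_ID}"

# Only call the CLI if it is installed.
if command -v vespa >/dev/null 2>&1; then
  vespa config set application "${APP_ID}"
fi
```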
Get Vespa Cloud control plane access:
$ vespa auth login
Follow the instructions from the command to authenticate.
Clone a sample application:
$ vespa clone rag-blueprint myapp && cd myapp
See sample-apps for other sample apps you can clone.
Add a certificate for data plane access to the application:
$ vespa auth cert app
It is a good idea to take note of the path to the .pem files written here.
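If you lose track of where the files were written, searching the CLI home directory is one way to find them again. This sketch assumes the files live somewhere under `$VESPA_CLI_HOME` (or `~/.vespa` when that variable is unset); `list_pem_files` is a hypothetical helper name.

```shell
# List .pem files under a directory.
# Defaults to the CLI home directory set earlier, or ~/.vespa if unset.
# (list_pem_files is a hypothetical helper, not a Vespa CLI feature.)
list_pem_files() {
  dir="${1:-${VESPA_CLI_HOME:-$HOME/.vespa}}"
  find "$dir" -type f -name '*.pem' 2>/dev/null | sort
}

list_pem_files
```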
Deploy the application, waiting up to 900 seconds for it to become ready:
$ vespa deploy --wait 900 ./app
Feed some documents. Feeding also chunks and embeds them, so it takes about 3 minutes:
$ vespa feed dataset/docs.jsonl
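Since the feed takes a few minutes, it can help to know roughly how many documents are in the file. The sketch below counts non-empty lines, on the assumption that the JSONL file holds one document operation per line; `count_jsonl_docs` is a hypothetical helper name.

```shell
# Count non-empty lines in a JSONL file.
# Assumes one document operation per line.
count_jsonl_docs() {
  grep -c -v '^[[:space:]]*$' "$1"
}

FEED_FILE="dataset/docs.jsonl"
if [ -f "$FEED_FILE" ]; then
  echo "Documents to feed: $(count_jsonl_docs "$FEED_FILE")"
fi
```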
Now you can issue queries:
$ vespa query 'query=yc b2b sales'
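The CLI sends the query above as an HTTP request to the application's search endpoint. As a rough sketch of what such a request could look like, the helper below builds a `/search/` URL from a query string. The endpoint shown is a placeholder, `build_query_url` is a hypothetical helper name, and only spaces are encoded, so treat this as an illustration rather than a general URL encoder.

```shell
# Build a search URL for a text query.
# Spaces are encoded as '+'; other special characters are not handled here.
# (build_query_url is a hypothetical helper, not a Vespa CLI feature.)
build_query_url() {
  endpoint="$1"
  query="$2"
  encoded=$(printf '%s' "$query" | tr ' ' '+')
  # Strip one trailing slash from the endpoint before appending the path.
  printf '%s/search/?query=%s\n' "${endpoint%/}" "$encoded"
}

# Placeholder endpoint; use the one reported for your own deployment.
ENDPOINT="https://example-endpoint.invalid"
build_query_url "$ENDPOINT" "yc b2b sales"

# With curl, the request would presumably also need the data plane
# certificate and key created earlier (paths below are placeholders):
#   curl --cert <public-cert.pem> --key <private-key.pem> "<url from above>"
```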
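[!TIP] Add "-v" to the vespa query command to see the HTTP request it becomes.
When you are finished with the application, you can remove the deployment:
$ vespa destroy --force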
Congratulations! You have now created a RAG application that can scale to billions of documents and thousands of queries per second, while delivering state-of-the-art quality.
What do you want to do next?