Use Case - shopping

The e-commerce, or shopping, use case is an example of an e-commerce site complete with sample data and a web front end to browse product data and reviews. To quick start the application, follow the instructions in the README in the sample app.

Shopping sample app screenshot

To browse the application, navigate to localhost:8080/site. This site is implemented through a custom request handler and is meant to be a simple example of creating a front end / middleware that sits in front of the Vespa back end. As such it is fairly independent of Vespa features, and the code is designed to be fairly easy to follow and as non-magical as possible. All the queries against Vespa are sent as HTTP requests, and the JSON results from Vespa are parsed and rendered.

This sample application is built around the Amazon product data set found at https://cseweb.ucsd.edu/~jmcauley/datasets.html. A small sample of this data is included in the sample application, and full data sets are available from the above site. This sample application contains scripts to convert from the data set format to Vespa format: convert_meta.py and convert_reviews.py. See README for example use.

When feeding reviews, there is a custom document processor that intercepts document writes and updates the parent item with the review rating, so the aggregated review rating is kept stored with the item - see ReviewProcessor. This is more an example of a custom document processor than a recommended way to do this, as feeding the reviews more than once will result in inflated values. To do this correctly, one should probably calculate this offline so a re-feed does not cause unexpected results.

Highlighted features

Multiple document types

Vespa models data as documents, which are configured in schemas that defines how documents should be stored, indexed, ranked, and searched. In Vespa, you can have multiple documents types, which can be defined in services.xml how these should be distributed around the content clusters. This application uses three document types that are stored in the same content cluster: item, review and query. Search is done on items, but reviews refer to a single parent item and are rendered on the item page. The query document type is used to power auto-suggest functionality.
Custom document processor

In Vespa, you can set up custom document processors to perform any type of extra processing during document feeding. One example is to enrich the document with extra information, and another is to precalculate values of fields to avoid unnecessary computation during ranking. This application uses a document processor to intercept reviews and update the parent item's review rating.
Custom searcher processor

In Vespa, you can set up custom searchers to perform any type of extra processing during querying. In the sample app there is a single custom searcher which builds the query for auto-suggestions, using a combination of fuzzy matching and prefix search.
Custom handlers

With Vespa, you can set up general request handlers to handle any type of request. This example site is implemented with a single such request handler, SiteHandler which is set up in services.xml to be bound to /site. Note that this handler is for example purposes and is designed to be independent of Vespa. Most applications would serve this through a dedicated setup.
Custom configuration

When creating custom components in Vespa, for instance document processors, searchers or handlers, one can use custom configuration to inject config parameters into the components. This involves defining a config definition (a .def file), which creates a config class. You can instantiate this class with data in services.xml and the resulting object is dependency injected to the component during construction. This application uses custom config to set up the Vespa host details for the handler.
Partial update

With Vespa, you can make changes to an existing document without submitting the full document. Examples are setting the value of a single field, adding elements to an array, or incrementing the value of a field without knowing the field value beforehand. This application contains an example of a partial update, in the voting of whether a review is helpful or not. The SiteHandler receives the request and the ReviewVote class sends a partial update to increment the up- or downvotes field.
Search using YQL

In Vespa, you search for documents using YQL. In this application, the classes responsible for retrieving data from Vespa (in the data package beneath the SiteHandler) set up the YQL queries which are used to query Vespa over HTTP.
Grouping

Grouping is used to group various fields of query results together. For this application, many of the queries to Vespa include grouping requests. The home page uses grouping to dynamically extract the first 3 levels of categories from the stored items. The search page groups results matching the query into categories, brands, item rating and price ranges. The order which the groups are rendered are determined by both counting and the relevance of the hits. This enables query-contextualized navigation.
Rank profiles

Rank profiles are profiles containing instructions on how to score documents for a given query. The most important part of rank profiles are the ranking expressions. The schemas for the item and review document types contain different rank profiles to sort or score the data. The item ranking is using a hybrid combination of keyword and vector matching.
Native embedders

Native embedders are used to map the textual query and document representations into dense high dimensional vectors which are used for semantic search. The application uses an open-source embedding model and inference is performed using stateless model evaluation, both during document and query processing.
Vector search

The default retrieval uses approximate nearest neighbor search in combination with traditional lexical matching. Both the keyword and vector matching is constrained by the filters such as brand, price or category.
Ranking functions

Ranking functions are contained in rank profiles and can be referenced as part of any ranking expression from either first-phase, second-phase, global-phase or other functions.