News search and recommendation tutorial - recommendations

Below is a condensed list of the commands used in this part; each step is explained in the sections that follow:

$ git clone https://github.com/vespa-engine/sample-apps.git
$ cd sample-apps/news
$ ./bin/download-mind.sh demo
$ python3 src/python/convert_to_vespa_format.py mind
$ docker run -m 10G --detach --name vespa --hostname vespa-tutorial \
    --volume `pwd`:/app --publish 8080:8080 vespaengine/vespa
$ docker exec vespa bash -c 'curl -s --head http://localhost:19071/ApplicationStatus'
$ docker exec vespa bash -c '/opt/vespa/bin/vespa-deploy prepare /app/app-5-recommendation && /opt/vespa/bin/vespa-deploy activate'
$ curl -s --head http://localhost:8080/ApplicationStatus
$ docker exec vespa bash -c 'java -jar /opt/vespa/lib/jars/vespa-http-client-jar-with-dependencies.jar \
    --file /app/mind/vespa.json --host localhost --port 8080'
$ docker exec vespa bash -c 'java -jar /opt/vespa/lib/jars/vespa-http-client-jar-with-dependencies.jar \
    --file /app/mind/vespa_user_embeddings.json --host localhost --port 8080'
$ docker exec vespa bash -c 'java -jar /opt/vespa/lib/jars/vespa-http-client-jar-with-dependencies.jar \
    --file /app/mind/vespa_news_embeddings.json --host localhost --port 8080'
$ docker exec vespa bash -c 'curl -s http://localhost:19092/metrics/v1/values' | tr "," "\n" | grep content.proton.documentdb.documents.active

Introduction

This is the fifth part of the tutorial series for setting up a Vespa application for personalized news recommendations. The parts are:

  1. Getting started
  2. A basic news search application - application packages, feeding, query
  3. News search - sorting, grouping, and ranking
  4. Generating embeddings for users and news articles
  5. News recommendation - partial updates (news embeddings), ANNs, filtering
  6. News recommendation with searchers - custom searchers, doc processors
  7. News recommendation with parent-child - parent-child, tensor ranking
  8. Advanced news recommendation - intermission - training a ranking model
  9. Advanced news recommendation - ML models

In this part, we’ll start transforming our application from news search to news recommendation, using the embeddings we created in the previous part. We’ll begin by modifying the application so we can feed the embeddings and use them when searching.

For reference, the final state of this tutorial can be found in the app-5-recommendation sub-directory of the news sample application.

Indexing embeddings

First, we need to modify the news.sd schema to include a field that holds the embedding. Add the following field and rank profile:

schema news {
  document news {
    ...
    field embedding type tensor<float>(d0[51]) {
        indexing: attribute 
        attribute {
            distance-metric: euclidean
        }
    }
  }
  ...
  rank-profile recommendation inherits default {
    first-phase {
      expression: closeness(field, embedding)
    }
  }
}

The embedding field is a tensor field. Tensors in Vespa are flexible multi-dimensional data structures and, as first-class citizens, can be used in queries, in document fields, and as constants in ranking expressions. Tensors can be dense, sparse, or a mix of both, and can have any number of dimensions. Please see the tensor user guide for more information.

Here we have defined a dense tensor with a single dimension (d0 - dimension 0), which represents a vector. The vector has 51 elements rather than 50 because of the extra dimension added when generating the embeddings in the previous part. The distance metric is euclidean, as we would like to use this field for nearest-neighbor search.

This is used in the recommendation rank profile. Here, we’ve added a ranking expression using the closeness rank feature, which is computed from the euclidean distance and used to rank the news articles. This works together with the nearestNeighbor search operator, which we’ll get back to below when searching; for now, note that it expects a tensor in the query to use as the initial search point.

If you take a look at the file generated for the news embeddings, mind/vespa_news_embeddings.json, you’ll see lines like this:

{
    "update": "id:news:news::N13390", 
    "fields": {
        "embedding": {
            "assign": { 
                "values": [9.871717,-0.403103,...]
            } 
        }
    }
}

This is a partial update: assuming you already have the system from the previous search tutorial up and running, you don’t need to feed the entire corpus again, only the fields that have changed. After training another set of embeddings, you can feed just those. Please refer to Vespa reads and writes for more information on feeding formats.
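Feeding files like this is what the Vespa HTTP client is for, but a single partial update can also be sent directly to the /document/v1 API. A minimal sketch in Python, assuming the Docker setup above (the embedding values are stand-ins):

import json
import requests

# One partial update against the /document/v1 API: namespace "news",
# document type "news", document id N13390 (one of the MIND article ids).
update = {
    "fields": {
        "embedding": {
            "assign": {
                "values": [0.0] * 51  # stand-in for a real 51-element embedding
            }
        }
    }
}
response = requests.put(
    "http://localhost:8080/document/v1/news/news/docid/N13390",
    headers={"Content-Type": "application/json"},
    data=json.dumps(update),
)
print(response.json())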

We need to add another document type to represent a user. Add the following schema in schemas/user.sd:

schema user {
    document user {
        field user_id type string {
            indexing: summary | attribute
            attribute: fast-search
        }
        field embedding type tensor<float>(d0[51]) {
            indexing: summary | attribute
        }
    }
}

This schema is set up so that we can search for a user_id and retrieve the user’s embedding vector.
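Since user is a new document type, the user embeddings in mind/vespa_user_embeddings.json are fed as full documents (puts) rather than partial updates, presumably along these lines:

{
    "put": "id:user:user::U33527",
    "fields": {
        "user_id": "U33527",
        "embedding": {
            "values": [0.0, 1.025085, ...]
        }
    }
}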

We also need to let Vespa know we want to use this document type, so we modify services.xml and add it under documents in the content section:

<services version="1.0">
  ...
  <content id='mind' version='1.0'>
    <documents>
      <document type='news' mode="index"/>
      <document type='user' mode="index"/>
    </documents>
    ...
  </content>
  ...
</services>

After redeploying with the updated schemas and services.xml, you can feed mind/vespa_user_embeddings.json and mind/vespa_news_embeddings.json using the Vespa HTTP client.
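Concretely, with the Docker setup from the top of this page:

$ docker exec vespa bash -c '/opt/vespa/bin/vespa-deploy prepare /app/app-5-recommendation && /opt/vespa/bin/vespa-deploy activate'
$ docker exec vespa bash -c 'java -jar /opt/vespa/lib/jars/vespa-http-client-jar-with-dependencies.jar \
    --file /app/mind/vespa_user_embeddings.json --host localhost --port 8080'
$ docker exec vespa bash -c 'java -jar /opt/vespa/lib/jars/vespa-http-client-jar-with-dependencies.jar \
    --file /app/mind/vespa_news_embeddings.json --host localhost --port 8080'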

Query profiles and query profile types

Before we can test the application, we need to add a query profile type. The recommendation rank profile above requires a tensor to be sent along with the query. For Vespa to bind the correct types, it needs to know the expected type of this query parameter. That is called a query profile type.

Query profiles are named sets of search request parameters that can be set as defaults, so you don’t have to pass them along with each query. We don’t use that feature in this sample application, but we still need a default query profile in order to define the types of the query parameters we expect to pass.

So, write the following to search/query-profiles/default.xml:

<query-profile id="default" type="root" />
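As an aside, if we did want default parameters, this is where they would go; for instance, a default number of hits (a hypothetical illustration, not used in this application):

<query-profile id="default" type="root">
    <field name="hits">5</field>
</query-profile>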

To set up the query profile types, we write them to the file search/query-profiles/types/root.xml:

<query-profile-type id="root" inherits="native">
    <field name="ranking.features.query(user_embedding)" type="tensor&lt;float&gt;(d0[51])" />
</query-profile-type>

This instructs Vespa to expect a float tensor with dimension d0[51] when the query parameter ranking.features.query(user_embedding) is passed. We’ll see how this works together with the nearestNeighbor search operator below.

Note that setting up this query profile type is required when sending a tensor as a query parameter. A common pitfall is to forget the default query profile, which is required for the query profile type to take effect.

Testing the application

After redeploying with the updates to the query profiles, we can now start searching Vespa using embeddings. First, let’s find the user U33527. We issue a query with the following YQL:

$ curl -s -H "Content-Type: application/json" --data \
    '{"yql" : "select * from sources user where user_id contains \"U33527\";", "hits": 1}' \
    http://localhost:8080/search/ | python3 -m json.tool

This returns the document containing the user’s embedding:

{
  "root": {
    ...
    "children": [
      {
        "id": "id:user:user::U33527",
        "fields": {
          "sddocname": "user",
          "user_id": "U33527",
          "embedding": {
            "cells": [
              { "address": { "d0": "0" }, "value": 0.0 },
              { "address": { "d0": "1" }, "value": 1.0250849723815918 },
              ...
            ]
          }
        }
      }
    ]
  }
}

Now we can use this vector to query the news articles. You could write this query by hand, but we’ve added a convenience script that queries Vespa for you:

$ ./src/python/user_search.py U33527 10

This script first retrieves the user embedding using an HTTP GET query to Vespa, and parses the tensor containing the embedding vector. It then issues a nearestNeighbor search using a POST (a GET would work just as well). Please see the documentation for the nearest-neighbor operator for more on the syntax of nearest-neighbor searches. The nearestNeighbor search looks like this:

{
    "hits": 10,
    "yql": "select * from sources news where (nearestNeighbor(embedding, user_embedding));',
    "ranking.features.query(user_embedding)": "{ ... }" ,
    "ranking.profile": "recommendation"
}

Here, you can see the nearestNeighbor search operator being set up so that the query parameter user_embedding will be searched against the embedding document field. The tensor for the user_embedding is in the ranking.features.query(user_embedding) parameter. Recall from above that we set a query profile type for this exact query parameter so Vespa knows what to expect here.
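For illustration, here is a minimal sketch of this two-step flow in Python, assuming the endpoints used above; the real logic lives in src/python/user_search.py:

import requests

def get_user_embedding(user_id):
    # Step 1: look up the user document (HTTP GET) and extract the embedding cells.
    params = {
        "yql": 'select * from sources user where user_id contains "%s";' % user_id,
        "hits": 1,
    }
    result = requests.get("http://localhost:8080/search/", params=params).json()
    cells = result["root"]["children"][0]["fields"]["embedding"]["cells"]
    # Render the tensor in the literal form Vespa accepts as a query
    # parameter, e.g. { {d0:0}:0.0, {d0:1}:1.025, ... }
    return "{" + ",".join(
        "{d0:%s}:%s" % (cell["address"]["d0"], cell["value"]) for cell in cells
    ) + "}"

def search_news(user_id, hits):
    # Step 2: nearest-neighbor search (HTTP POST) over the news embeddings.
    query = {
        "hits": hits,
        "yql": "select * from sources news where (nearestNeighbor(embedding, user_embedding));",
        "ranking.features.query(user_embedding)": get_user_embedding(user_id),
        "ranking.profile": "recommendation",
    }
    return requests.post("http://localhost:8080/search/", json=query).json()

print(search_news("U33527", 10))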

When Vespa receives this query, it scans linearly through all matching documents (28603 with the MIND demo dataset) and scores them using the recommendation rank profile we set up above. Recall that we converted the problem from maximum inner product to euclidean distance in the previous part. Vespa sorts the final results by decreasing rank score, but with a euclidean distance search we want the smallest distances. To invert the rank order, Vespa provides the closeness feature, which is calculated as 1 / (1 + distance).
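That conversion is easy to sanity-check outside Vespa. Here is a small sketch of the underlying idea, our reconstruction of the standard reduction mirroring the extra 51st dimension from the previous part:

import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(1000, 50))   # document embeddings (50 dimensions)
query = rng.normal(size=50)          # a user embedding

# Map inner-product space into euclidean space by appending one extra
# dimension: sqrt(M^2 - |d|^2) per document (M = largest document norm),
# and 0 for the query vector.
norms = np.linalg.norm(docs, axis=1)
extra = np.sqrt(norms.max() ** 2 - norms ** 2)
docs_51 = np.hstack([docs, extra[:, None]])
query_51 = np.append(query, 0.0)

# Ranking by largest inner product now equals ranking by smallest
# euclidean distance, and hence by largest closeness = 1 / (1 + distance).
by_inner_product = np.argsort(-(docs @ query))
distances = np.linalg.norm(docs_51 - query_51, axis=1)
by_closeness = np.argsort(-1.0 / (1.0 + distances))
assert (by_inner_product == by_closeness).all()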

Let’s test that this works as intended. The sample app provides the following script:

$ ./src/python/evaluate.py mind 1000

This reads both the training and validation set impressions, queries Vespa for 1000 randomly drawn impressions, and calculates the same metrics we saw during training. The result is something like:

Train: {'auc': 0.8774, 'mrr': 0.5115, 'ndcg@5': 0.5842, 'ndcg@10': 0.6345}
Valid: {'auc': 0.6308, 'mrr': 0.2935, 'ndcg@5': 0.3203, 'ndcg@10': 0.3789}

This is in line with the results from the training. So, the conversion from inner product space to euclidean space works as intended. The resulting rank scores are different, but the transformation evidently retains the same ordering.

ANNs

So far, we’ve been using exact nearest-neighbor search, which is a linear scan through all matching documents. For the MIND demo dataset we’ve been using, this isn’t a problem: it only contains roughly 28000 documents, and Vespa needs just a few milliseconds to scan through them. However, as the index grows, the time spent becomes significant.

Unfortunately, there are no known methods for finding exact nearest neighbors efficiently in high-dimensional spaces. So we trade accuracy for efficiency in what is called approximate nearest neighbors (ANN). Vespa provides a unique implementation of ANN that uses the HNSW (hierarchical navigable small world) algorithm, while still being compatible with other facets of search, such as filtering. We’ll get back to this in the next section.

If you recall, Vespa returned something like the following when searching for a single user above (with hits equal to 10):

  "root": {
    "id": "toplevel",
    "relevance": 1.0,
    "fields": {
      "totalCount": 95
    },
    "coverage": {
      "coverage": 100,
      "documents": 28603,
      "full": true,
      "nodes": 1,
      "results": 1,
      "resultsFull": 1
    },
    ...

Here, coverage shows that Vespa did scan through all 28603 documents. The interesting piece is totalCount: the number of documents that, at some point during the linear scan, made it into the top 10 results.

Let’s switch to using approximate nearest-neighbors. For this, we must instruct Vespa to create an index on the field we would like to use. This is simply a modification to the embedding field in news.sd:

    field embedding type tensor<float>(d0[51]) {
        indexing: attribute | index
        attribute {
            distance-metric: euclidean
        }
    }

If you make this change and deploy it, Vespa will notify you that a restart is required so the index can be built.
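In the Docker setup used here, one way to restart the services is the following (a sketch; adapt to your setup):

$ docker exec vespa bash -c '/opt/vespa/bin/vespa-stop-services'
$ docker exec vespa bash -c '/opt/vespa/bin/vespa-start-services'

After waiting a bit for Vespa to come back up, we can query again: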

$ ./src/python/user_search.py U33527 10
{
  "root": {
    "id": "toplevel",
    "relevance": 1.0,
    "fields": {
      "totalCount": 10
    },
    "coverage": {
      "coverage": 100,
      "documents": 28603,
      "full": true,
      "nodes": 1,
      "results": 1,
      "resultsFull": 1
    },

Here, coverage is still 100%, but totalCount has been reduced to 10 - the same number of hits we requested. By adding an index to this field, Vespa built an HNSW graph structure over the field’s values. When used in an approximate nearest-neighbor search, this graph is queried, and only the closest points as determined by the graph are added to the result list. Thus, Vespa can stop searching early.

The particularly observant might have noticed that the result set has changed. Indeed, the third result when using exact nearest neighbor search was news article N438. This was omitted from the approximate search. As mentioned, we trade accuracy for efficiency when using approximate nearest-neighbor search.
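To compare the two directly, the nearestNeighbor operator can be annotated to request the exact computation instead; depending on your Vespa version, that looks something like this (an illustration, not used by the script):

{
    "hits": 10,
    "yql": "select * from sources news where ([{\"approximate\": false}]nearestNeighbor(embedding, user_embedding));",
    "ranking.features.query(user_embedding)": "{ ... }",
    "ranking.profile": "recommendation"
}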

It should also be mentioned that searching through this graph comes with a cost. In our case, since we only have a relatively small number of documents, there isn’t much gain in efficiency. However, as the number of documents grows, this starts to pay off. See Approximate nearest neighbor search in Vespa for more of a discussion around this.

The implementation of ANN using HNSW in Vespa has some nice features. Notice that we did not have to re-feed the corpus to enable ANN. Many other approaches for ANNs require building an index offline in a batch job. HNSW allows for incrementally building this index, which is fully exploited in Vespa.
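Graph construction can also be tuned in the schema if needed; for example (optional settings, shown here with what we believe are the defaults - see the HNSW documentation):

    field embedding type tensor<float>(d0[51]) {
        indexing: attribute | index
        attribute {
            distance-metric: euclidean
        }
        index {
            hnsw {
                max-links-per-node: 16
                neighbors-to-explore-at-insert: 200
            }
        }
    }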

A unique feature of Vespa is that the implementation allows for filtering during graph traversal, which we’ll look at next.

Filtering

A common case when using approximate nearest-neighbors is to combine with some additional query filters. For instance, for retail search, one can imagine finding relevant products for a user. In this case, we should not recommend products that are out of stock. So an additional query filter would be to ensure that in_stock is true.

Now, most implementations of ANN come in the form of a library, so they are not integrated with the search at large. The natural approach is then to first perform the ANN search and post-filter the results. Unfortunately, this often leads to sub-optimal results, as relevant documents might not have been recalled. See Using approximate nearest-neighbor search in real world applications for more of a discussion around this.

In our case, let’s assume we want to retrieve 10 sports articles for a user. It turns out we need to retrieve at least 278 news articles from the search to get to 10 sports articles for this user:

$ ./src/python/user_search.py U63195 10 | grep "category\": \"sports\"" | wc -l
0
$ ./src/python/user_search.py U63195 278 | grep "category\": \"sports\"" | wc -l
10

On the other hand, if we add a filter specifically:

$ ./src/python/user_search.py U63195 10 "AND category contains 'sports'" | \
    grep "category\": \"sports" | wc -l
10

Here, we ask for only 10 hits, and exactly 10 hits in the sports category are returned. Vespa still searches the graph starting from the query point, but the search does not stop once it has 10 hits; in effect, the graph search widens until 10 results fulfilling the filter are found.
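Under the hood, the script simply appends the filter expression to the YQL, so the query becomes along these lines:

select * from sources news where (nearestNeighbor(embedding, user_embedding)) AND category contains 'sports';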

As a note, a very strict filter that excludes a large part of the corpus means that many candidates in the graph are skipped while searching for results that fulfill it, which can make the graph search prohibitively expensive. For this case, Vespa falls back to a linear, brute-force scan for efficiency.

Conclusion

We now have a basic recommendation system up and running. We can query for a user, retrieve the embedding vector, and use it to query the news articles. Right now, this requires two calls to Vespa. In the next part of the tutorial, we’ll introduce searchers, which allow custom logic during query processing, so a single call suffices.

When you are done, tear down the running container:

$ docker rm -f vespa