Search Coverage Degradation - Graceful degradation in Vespa

Ideally one want to search all data indexed in a Vespa cluster within the specified timeout, but that might not be possible due to several different reasons:

  • The system might be overloaded due to capacity constraints, and queries does does not complete within the timeout, as they are sitting in the queue (Little's Law)
  • A complex query might take longer time to execute than the specified timeout, or the timeout is too low given the complexity of the query and available capacity
This document describes how Vespa could gracefully degrade the result set if the query cannot be completed within the timeout specified. Definitions:
  • Coverage: The percentage of documents indexed which were searched by the query. Ideally coverage is 100%
  • Timeout: The total time a query is allowed to run for. Vespa is a distributed system where multiple components are involved in the query execution
  • Soft Timeout: Soft timeout is a Vespa configuration switch which allows coverage to be less than 100%, but bigger than 0% if the query is going to time out. E.g return current hit set when time budget is used

Timeouts

Vespa's default query timeout is 500ms in Vespa 7 (5000ms in Vespa 6), and soft timeout is enabled per default in Vespa 7.

Search API reference documentation:

The timeout specifies the overall timeout of the query execution and can be configured and also overridden by the timeout run time request parameter, or defined in a query profile so that different classes of queries can have a different latency budget/timeout.

The softtimeout setting controls what the content nodes should do in the case where the latency budget has almost been used (timeout times a factor). Return the documents recalled and ranked with the first phase function within the time used, or simply don't produce a result. Without softtimeout enabled the Vespa container will return a 504 timeout without any results, while with enabled it will return the documents matched and ranked up until the timeout was reached with a 200 OK response along with the reason the result set was degraded. The container might respond with a timeout error with http response code 504, even with softtimeout enabled if the timeout is set so low that that the query does not make it to the content nodes, or the container does not have any time left after input and query processing to dispatch the query to the content clusters.

Detection

The default JSON renderer template will always render a coverage element below the root element, which has a degraded element if the query execution was degraded in some way and the coverage field will be less than 100. Example request with a query timeout of 200 ms and ranking.softtimeout.enable=true:

/search/?searchChain=vespa&yql=select * from sources * where foo contains bar;&presentation.format=json&timeout=200ms&ranking.softtimeout.enable=true

{
    "root": {
        "coverage": {
            "coverage": 99,
            "degraded": {
                "adaptive-timeout": false,
                "match-phase": false,
                "non-ideal-state": false,
                "timeout": true
            },
            "documents": 167006201,
            "full": false,
            "nodes": 11,
            "results": 1,
            "resultsFull": 0
        },
        "fields": {
            "totalCount": 16469732
        }
    }
}
The result was delivered in 200 ms but the query was degraded as coverage is less than 100 (not all available documents searched and ranked within the specified timeout of 200ms). In this case 167006201 out of x documents where searched, and 16469732 documents where matched and ranked, using the first-phase ranking expression in the default ranking profile.

The degraded field contains the following fields which explains why the result had coverage less than 100:

  • adaptive-timeout is true if Adaptive coverage has been enabled and one or more nodes fail to produce a result at all within the timeout. This could be caused by nodes with degraded hardware making them slower than others in the cluster
  • match-phase is true if the ranking profile has defined match phase ranking degradation. Match-phase can be used to control which documents gets ranked within the timeout
  • non-ideal-state is true in cases where the system is not in ideal state
  • timeout is true if softtimeout was enabled and not all documents could be matched and& ranked with the first phase ranking function within the timeout
Note that the degraded reasons are not mutually exclusive. In the example, the softtimeout was triggered and only 99% of of the documents where searched before the time budget was used. One could imagine scenarios where 10 out of 11 nodes involved in the query execution were healthy and triggered soft timeout and delivered a result, while the last node was in a bad state (e.g. hw issues) and could not produce a result at all, and that would cause both timeout and adaptive-timeout to be true.

When working on Results in a Searcher, get the coverage information programmatically:

    @Override
    public Result search(Query query, Execution execution) {
        Result result = execution.search(query);
        Coverage  coverage = result.getCoverage(false);
        if (coverage != null && coverage.isDegraded()) {
            logger.warning("Got a degraded result for query " + query + " : " + coverage.getResultPercentage() + "% was searched");
        }
        return result;
    }

Match phase degradation

Refer to the match-phase reference. Match-phase works by specifying an attribute that measures document quality in some way (popularity, click-through rate, pagerank, bid value, price, text quality). In addition a max-hits value is specified that specifies how many hits are "more than enough" for your application. Then an estimate is made after collecting a reasonable amount of hits for the query, and if the estimate is higher than the configured max-hits value, an extra limitation is added to the query, ensuring that only the highest quality documents can become hits.

In effect this limits the documents actually searched to the highest quality documents, a subset of the full corpus, where the size of subset is calculated in such a way that the query is estimated to give max-hits hits. Since some (low-quality) hits will already have been collected to do the estimation, the actual number of hits returned will usually be higher than max-hits. But since the distribution of documents isn't perfectly smooth, you risk sometimes getting less than the configured max-hits hits back.

Note that limiting hits in the match-phase also affects aggregation, grouping, and total-hit-count since it actually limits so the query gets less hits. Also note that it doesn't really make sense to use this feature together with a WAND operator that also limit your hits, since they both operate in the same manner and you would get interference between them that could cause very unpredictable results. The graph shows possible hits versus actual hits in a corpus with 100 000 documents, where max-hits is configured to 10 000. The corpus is a synthetic (slightly randomized) data set, in practice the graph will be less smooth:

There is a searchnode metric per rank-profile named content.proton.documentdb.matching.rank_profile.limited_queries which can be used to see how many of your queries are actually affected by these settings; compare with the corresponding content.proton.documentdb.matching.rank_profile.queries metric to measure the percentage.

Tradeoffs

There are some important things to consider before using match-phase. In a normal search scenario latency is directly proportional to the number of hits the query matches: a query that matches few documents will have low latency and a query that matches many documents will have high latency. Match-phase has the opposite effect. This means that if you have queries that matches few documents, match-phase might make these queries significantly slower, hence it might actually be faster to run the query without the filter.

Example: Lets say you have a corpus with a document attribute named created_time. For all queries you want the newest content surfaced so you enable match-phase on created_time. So far, so good - you get a great latency and always get your top-k hits. The problem might come if you introduce a filter. If you have a filter saying you only want documents from the last day, then match-phase can become sub-optimal and in some cases much worse than not running with match-phase. By design Vespa will evaluate potential matches for a query by the order of their internal documentid. This means it will start evaluating documents in the order they were indexed on the node, and for most use-cases that means the oldest documents first. Without a filter, every document is a potential match, and match-phase will quickly figure out how it can optimize. With the filter, on the other hand, the algorithm need to evaluate almost all the corpus before it reaches potential matches (1 day old corpus), and because of the way the algorithm is implemented, end up with doing a lot of unnecessary work and can have orders of magnitude higher latencies than running the query without the filter.

Another important thing to mention is that the reported total-hits will be different when doing queries with match-phase enabled. This is because match-phase works on an estimated "virtual" corpus, which might have much fewer hits that is actually in the full corpus.

If used correctly match-phase can be a life saver, but as you understand, it is not a straight forward fix-it-all silver bullet. Please test and measure your use of match-phase, and contact the Vespa team if your results are not what you expect.