
Ranking With nativeRank

The nativeRank text match score is a reasonably good text rank score that Vespa computes at an acceptable performance cost. It is a normalized rank score that tries to capture how well the query terms matched the set of searched index fields.

The nativeRank feature is computed as a linear combination of three other matching features: nativeFieldMatch, nativeProximity and nativeAttributeMatch. See the nativeRank reference for details. Ranking signals that might be useful, like freshness (the age of the document compared to the time of the query) or any other document or query features, are not part of the nativeRank calculation. These need to be added to the final ranking function depending on application specifics.
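As a rough illustration of what a linear combination of three match features means, here is a minimal Python sketch. It assumes that each component score lies in the range 0 to 1 and that the combination is normalized by the sum of the weights. The function name, the component values and the weights are hypothetical and chosen for illustration only; the nativeRank reference has the actual definition.

```python
def native_rank_sketch(field_match, proximity, attribute_match,
                       w_field, w_proximity, w_attribute):
    """Weighted average of three component scores (illustrative only)."""
    total_weight = w_field + w_proximity + w_attribute
    if total_weight <= 0:
        raise ValueError("the weights must sum to a positive number")
    weighted_sum = (w_field * field_match
                    + w_proximity * proximity
                    + w_attribute * attribute_match)
    # Dividing by the weight sum keeps the result in [0, 1]
    # as long as every component is in [0, 1].
    return weighted_sum / total_weight


# Hypothetical component scores and weights.
score = native_rank_sketch(0.4, 0.2, 0.0,
                           w_field=100, w_proximity=25, w_attribute=100)
print(score)
```

Because the result is a weighted average, raising one weight shifts the score toward that component without pushing it outside the normalized range.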

The nativeRank is a pure text match ranking function and should be used in combination with application specific features to produce the final rank score.

Ranking expressions and nativeRank

Vespa allows the final rank score to be calculated by a configured mathematical expression called a ranking expression. Ranking expressions look like straightforward mathematical expressions. They support the usual mathematical operators and functions, allowing the user to configure any rank function, including usage of the nativeRank text matching feature.

The primitive values which are combined by ranking expressions are called rank features. The rank features are numbers which say something about the query, the document or how well the query matched this particular document. The query features can be sent as query parameters or set in searchers. The document features are the attributes (indexing:attribute) specified in the schema, and the match features are calculated by Vespa from the index during matching. The nativeRank feature is one example of a built-in match feature, which only concerns the query/document matching.

The final rank score should also utilize other signals that are application specific, for instance document age (freshness), quality of the document (number of in-links in a web context, maybe ratings and reviews in social applications) and user personalization (user age, gender etc.). See the rank feature list for details about built-in rank-features in Vespa.

Using nativeRank

In this section we describe a blog search application that uses nativeRank as the core text matching rank feature, in combination with other signals that could be important for a blog search type of application:

schema blog {
  document blog {
    field title type string {
      indexing: summary | index
    }
    field body type string {
      indexing: summary | index
    }
    #The quality of the source in the range 0 - 1.0
    field sourcequality type float {
      indexing: summary | attribute
    }
    #seconds since epoch
    field timestamp type long {
      indexing: summary | attribute
    }
    field url type uri {
      indexing: summary
    }
  }
  fieldset default {
    fields: title, body
  }
}

In addition to the core text match feature (nativeRank), we have a pre-calculated document feature that indicates the quality of the document, represented by the field sourcequality of type float. The sourcequality field has the attribute property, which is required to refer to that field in a ranking expression: attribute(name). The sourcequality score could be calculated from a web map or any other source, and is outside the scope of this document.

We also know when the document was published (timestamp), and this document attribute can be used to calculate the age of the document. To summarize, we have three main rank signals that we would like our blog ranking function to consist of:

  • How well the query matches the document text, where we use the nativeRank feature score.
  • How fresh the document is, where we use the built-in age(name) feature to build our own feature score.
  • The quality of the document, calculated outside of Vespa and referenced in a ranking expression by attribute(name).

Tuning nativeRank

We tune the index field weights in the rank-profile, using the field weight configuration parameter. We claim that a hit in the title is more relevant than a hit in the body, so we have configured a weight of 200 for the title and 100 (the default) for the body. There are several other tuning parameters of the nativeRank feature, like:

  • The weight of each of the three components that nativeRank consists of: nativeFieldMatch, nativeProximity and nativeAttributeMatch.
  • The shape of the boosting tables for core statistics like term frequency and proximity between terms in the field.
  • Per-field configuration, so that term frequency boosting (or any other of the core statistics) can differ for text with different characteristics, e.g. title compared to body.
  • Per-field rank-types like identity and about, which are pre-configured boosting tables for different types of text fields.

See the comprehensive list of all the configuration properties of nativeRank.

Designing our own blog freshness ranking function

Vespa has several built-in rank features that we can use directly, or we can design our own if the built-in features don't meet our requirements. The built-in freshness(name) rank feature decreases linearly from age 0 (now) to the configured max age. Ideally we would like a different shape for our blog application, so we define the following feature, which has the characteristic we want:

function freshness() {
    expression: exp(-1 * age(timestamp)/(3600*12))
}

Timestamp resolution is seconds, so we divide by 3600 to get an hour resolution, and we further divide by 12 to control the slope of the freshness function. Below is a plot of two freshness functions with different slope numbers for comparison:

Blog freshness ranking plot: freshness score

The beauty is that we can control and experiment with the freshness rank score given the document age. We can define any shape over any resolution that we think will fit the exact application requirements. In our case we would like a non-linear relationship between the age of the document and the freshness score. We achieve this with an exponentially decreasing function (exp(-x)), where the score is more sensitive to changes in age when the document is really fresh than when it is an old blog post (24 hours).
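To get a feel for the shape, the freshness expression can be evaluated outside Vespa. The Python sketch below mirrors exp(-1 * age(timestamp)/(3600*12)) and, for comparison, a linearly decreasing score clamped at zero. The function names, the slope_hours parameter and the 48 hour max age are assumptions made for this illustration; the linear function is only a stand-in for the general shape of a linear decay, not the exact built-in feature.

```python
import math


def exp_freshness(age_seconds, slope_hours=12.0):
    """Exponential decay: 1.0 at age 0, exp(-1) after slope_hours hours."""
    return math.exp(-age_seconds / (3600.0 * slope_hours))


def linear_freshness(age_seconds, max_age_seconds):
    """Linear decay from 1.0 at age 0 down to 0.0 at max_age_seconds."""
    return max(0.0, 1.0 - age_seconds / max_age_seconds)


# Compare the two shapes for a few document ages (in hours).
for hours in (0, 1, 6, 12, 24, 48):
    age = hours * 3600
    print(hours,
          round(exp_freshness(age), 3),
          round(linear_freshness(age, max_age_seconds=48 * 3600), 3))
```

The exponential variant loses most of its score in the first hours and then flattens out, while the linear variant loses the same amount per hour until it reaches zero.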

Putting our features together into a ranking expression

We now need to put our three main ranking signals together into one ranking expression. We would like to control the weight of each component at query time, so that we can analyze whether a certain signal should be weighted more than the others. We chose to combine our three signals into a normalized weighted sum. The shape of each signal can be tuned individually, as we have seen with the design of our own freshness feature and with nativeRank tuning. Below is the final blog rank-profile with all relevant settings (properties) and ranking expressions:

rank-profile blog inherits default {
    weight title: 200
    weight body: 100
    rank-type body: about
    rank-properties {
      nativeFieldMatch.occurrenceCountTable.title: "linear(0,8000)" 
    }

    # our freshness rank feature
    function freshness() {
      expression: exp(-1 * age(timestamp)/(3600*12))
    }

    # our quality rank feature
    function quality() {
      expression: attribute(sourcequality)
    }

    # normalization factor for the weighted sum
    function normalization() {
      expression: query(textMatchWeight) + query(qualityWeight) + query(deservesFreshness)
    }

    # ranking function that runs over all matched documents, determined by the boolean query logic
    first-phase {
      expression: (query(textMatchWeight) * nativeRank(title,body) + query(qualityWeight) * quality + query(deservesFreshness) * freshness) / normalization
    }

    summary-features: nativeRank(title,body) age(timestamp) freshness quality
  }
}
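The normalized weighted sum described above can be checked by hand with a few lines of Python. This is a sketch of the arithmetic only, not of how Vespa evaluates ranking expressions. The function name, the signal values and the zero-weight guard are assumptions made for this illustration.

```python
def blog_score(native_rank, quality, freshness,
               text_match_weight, quality_weight, deserves_freshness):
    """Normalized weighted sum of the three blog ranking signals."""
    normalization = text_match_weight + quality_weight + deserves_freshness
    if normalization == 0:
        # Guard added for this sketch only: avoid dividing by zero
        # when no weights have been supplied.
        return 0.0
    weighted_sum = (text_match_weight * native_rank
                    + quality_weight * quality
                    + deserves_freshness * freshness)
    return weighted_sum / normalization


# Hypothetical signal values for one hit, with freshness weighted heavily.
score = blog_score(native_rank=0.3, quality=0.8, freshness=0.5,
                   text_match_weight=0.1, quality_weight=0.0,
                   deserves_freshness=0.85)
print(round(score, 4))
```

Since the weights are divided by their sum, the final score stays in the same range as the individual signals, which makes scores comparable across queries that use different weights.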

We can override the weight of each signal at query time with the query api, passing down the weights:

/search/?query=vespa+ranking&datetime=now&ranking.profile=blog&input.query(textMatchWeight)=0.1&input.query(deservesFreshness)=0.85

It is also possible to override the user-defined rank features in a custom searcher plugin. Note that we also pass the datetime parameter, so that the age of the document can be calculated.
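When such requests are generated from client code, the query string can be assembled programmatically. The following Python sketch rebuilds the request path shown above using only the standard library. The variable names are hypothetical, and the parentheses are kept unescaped purely to match the form of the URL in the example.

```python
from urllib.parse import urlencode

# Query parameters taken from the example request above.
params = {
    "query": "vespa ranking",
    "datetime": "now",
    "ranking.profile": "blog",
    "input.query(textMatchWeight)": "0.1",
    "input.query(deservesFreshness)": "0.85",
}

# safe="()" leaves the parentheses unescaped; spaces are encoded as "+".
url = "/search/?" + urlencode(params, safe="()")
print(url)
```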

The summary-features setting gives us access to the individual ranking signals along with the hit's summary fields.