Ranking with nativeRank

The nativeRank text match score is a reasonably good text rank score that Vespa computes at an acceptable performance cost. It is a normalized rank score that tries to capture how well the query terms matched the set of searched index fields.

The nativeRank feature is computed as a linear combination of three other matching features: nativeFieldMatch, nativeProximity and nativeAttributeMatch; see the nativeRank reference for details. Other potentially useful ranking signals, such as freshness (the age of the document compared to the time of the query) or any other document or query feature, are not part of the nativeRank calculation and need to be added to the final ranking function depending on the specifics of the vertical. nativeRank is a pure text match ranking function and should be combined with other vertical-specific features to produce the final rank score.
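As an illustration of what a linear combination of the three components means, here is a minimal Python sketch. It assumes each component score lies in [0, 1] and that the combination is a normalized weighted sum; the function name and the weight values are assumptions made for this example, and the authoritative formula and defaults are in the nativeRank reference.

```python
def native_rank_sketch(field_match, proximity, attribute_match,
                       w_field=100.0, w_proximity=25.0, w_attribute=100.0):
    """Normalized weighted sum of three component scores.

    The weights are illustrative assumptions, not values taken from this page.
    """
    total = w_field + w_proximity + w_attribute
    weighted = (w_field * field_match
                + w_proximity * proximity
                + w_attribute * attribute_match)
    # Dividing by the sum of weights keeps the result in [0, 1]
    # as long as every component is in [0, 1].
    return weighted / total


if __name__ == "__main__":
    # A document with a strong field match but weak proximity and no attribute match.
    print(native_rank_sketch(0.8, 0.2, 0.0))
```

Because the sum is normalized, changing one weight changes the relative influence of a component without changing the range of the final score.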

Ranking expressions and nativeRank

Vespa allows the final rank score to be calculated by a configured mathematical expression called a ranking expression. Ranking expressions look like straightforward mathematical expressions and support the usual mathematical operators and functions, allowing the user to configure any rank function, including one that uses the nativeRank text matching feature.

The primitive values that ranking expressions combine are called rank features. Rank features are numbers that say something about the query, the document, or how well the query matched this particular document. Query features can be sent as HTTP query parameters or set in searchers, document features are the attributes (indexing: attribute) specified in the search definition, and match features are calculated by Vespa from the index during matching. The nativeRank feature is one example of a built-in match feature, which only concerns the query/document matching. The final rank score should also use other signals that are specific to the vertical, for instance document age (freshness), quality of the document (number of in-links in a web context, or ratings and reviews in social applications) and user personalization (user age, gender, etc.). See the rank feature list for details about the built-in rank features in Vespa.

Using nativeRank

In this section we describe a blog search application that uses nativeRank as the core text matching rank feature, in combination with other signals that could be important for a blog search type of application:

search blog {
  document blog {
    field title type string {
      indexing: summary | index
    }
    field body type string {
      indexing: summary | index
    }
    #The quality of the source in the range 0 - 1.0
    field sourcequality type float {
      indexing: summary | attribute
    }
    #seconds since epoch
    field timestamp type long {
      indexing: summary | attribute
    }
    field url type uri {
      indexing: summary
    }
  }
  fieldset default {
    fields: title, body
  }
}
In addition to the core text match feature (nativeRank), we have a pre-calculated document feature that indicates the quality of the document, represented by the field sourcequality of type float. The sourcequality field has the attribute property, which is required in order to refer to that field in a ranking expression: attribute(name). The sourcequality score could be calculated from a webmap or any other source; how it is produced is outside the scope of this document. We also know when the document was published (timestamp), and this document attribute can be used to calculate the age of the document. To summarize, we have three main rank signals that we would like our blog ranking function to consist of:
  • How well the query matches the document text, where we use the nativeRank feature score
  • How fresh the document is, where we use the built-in age(name) feature to build our own feature score
  • The quality of the document, calculated outside of Vespa and referenced in a ranking expression by attribute(name)

Tuning nativeRank

We tune the index field weights in the rank-profile, using the field weight configuration parameter. We claim that a hit in the title is more relevant than a hit in the body, so we have configured a weight of 200 for the title and 100 (the default) for the body. The nativeRank feature has several other tuning parameters, such as:

  • The weight of each of the three components that nativeRank consists of: nativeFieldMatch, nativeProximity and nativeAttributeMatch
  • The shape of the boosting tables for core statistics like term frequency and proximity between terms in the field
  • Per-field configuration, so that text with different characteristics (e.g. title compared to body) can have different term frequency boosting, or different boosting for any other of the core statistics
See the comprehensive list of all the configuration properties of nativeRank.

Designing our own blog freshness ranking function

Vespa has several built-in rank features that we can use directly, and we can design our own if the built-in features don't meet our requirements. The built-in freshness(name) rank feature decreases linearly from age 0 (now) to the configured max age. For our blog application we would like a different shape, so we define the following feature, which has the characteristic we want:

macro freshness() {
    expression: exp(-1 * age(timestamp)/(3600*12))
}
Timestamp resolution is seconds, so we divide by 3600 to get to an hour resolution, and we further divide by 12 to control the slope of the freshness function.

[Figure: plot of two freshness functions with different slope values, for comparison]

The beauty is that we can control and experiment with the freshness rank score given the document age. We can define any shape over any resolution that we think will fit the exact requirements of the vertical. In our case we would like a non-linear relationship between the age of the document and the freshness score. We achieve this with an exponentially decreasing function (exp(-x)), where the score is more sensitive to changes in x when the document is really fresh than when it is an old blog post (24 hours).
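To get a feel for the shape, the freshness expression can be evaluated outside Vespa. The following Python sketch tabulates exp(-age/(3600*slope)) for two slope values next to a linearly decreasing score. The function names, the second slope value of 24 and the 48 hour max age used for the linear curve are assumptions chosen for illustration only.

```python
import math


def exp_freshness(age_seconds, slope_hours=12.0):
    """Same shape as the macro: exp(-1 * age / (3600 * slope))."""
    return math.exp(-age_seconds / (3600.0 * slope_hours))


def linear_freshness(age_seconds, max_age_seconds=48 * 3600):
    """Linear decay from 1 at age 0 to 0 at max age (max age is an assumed value)."""
    return max(0.0, 1.0 - age_seconds / max_age_seconds)


if __name__ == "__main__":
    print("hours  exp(slope=12)  exp(slope=24)  linear(48h)")
    for hours in (0, 1, 3, 6, 12, 24, 48):
        age = hours * 3600
        print(f"{hours:5d}  {exp_freshness(age, 12):13.3f}  "
              f"{exp_freshness(age, 24):13.3f}  {linear_freshness(age):11.3f}")
```

The exponential curves drop fastest during the first hours and then flatten out, while the linear score loses the same amount for every hour of age.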

Putting our features together into a ranking expression

We now need to put our three main ranking signals together into one ranking expression. We would like to control the weight of each component at query time, so that we can analyze whether a certain signal should be weighted more than the others. We chose to combine our three signals into a normalized weighted sum. The shape of each signal can be tuned individually, as we have seen with the design of our own freshness feature and with nativeRank tuning. Below is the final blog rank-profile with all relevant settings (properties) and ranking expressions.

rank-profile blog inherits default {
    weight title: 200
    weight body: 100
    rank-properties {
      $textMatchWeight: 0.4 #pre-configured weights, can be overridden at query time
      $qualityWeight: 0.3
      $deservesFreshness: 0.3
      nativeFieldMatch.occurrenceCountTable.title: "linear(0,1)" #Example of nativeRank tuning, override the occurrence boost shape to be flat
    }

    #our freshness rank feature
    macro freshness() {
      expression: exp(-1 * age(timestamp)/(3600*12))
    }

    #our quality rank feature
    macro quality() {
      expression: attribute(sourcequality)
    }

    #normalization factor for the weighted sum
    macro normalization() {
      expression: $textMatchWeight + $qualityWeight + $deservesFreshness
    }

    #ranking function that runs over all matched documents, determined by the boolean query logic
    first-phase {
      expression: (query(textMatchWeight) * nativeRank(title,body) + query(qualityWeight) * quality + query(deservesFreshness) * freshness) / normalization
    }

    summary-features: nativeRank(title,body) age(timestamp) rankingExpression(freshness@) rankingExpression(quality@)
}
We can override the weight of each signal at query time through the search API, passing the weights down to the search core like this:
/search/?query=vespa+ranking&datetime=now&ranking.profile=blog&ranking.features.query(textMatchWeight)=0.1&ranking.features.query(deservesFreshness)=0.85
It is also possible to override these user-defined rank features in a custom searcher plugin. Note that we also pass the datetime parameter so that the age of the document can be calculated.
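A request like the one above can be assembled programmatically. This is a minimal Python sketch using only the standard library; the helper name blog_query_url and its keyword arguments are hypothetical, while the parameter names are the ones shown in the example request.

```python
from urllib.parse import urlencode


def blog_query_url(query, text_match_weight=None, quality_weight=None,
                   freshness_weight=None):
    """Build the search path, adding only the weights that are overridden."""
    params = {"query": query, "datetime": "now", "ranking.profile": "blog"}
    overrides = {
        "textMatchWeight": text_match_weight,
        "qualityWeight": quality_weight,
        "deservesFreshness": freshness_weight,
    }
    for name, value in overrides.items():
        if value is not None:
            params[f"ranking.features.query({name})"] = value
    # Keep parentheses unescaped so the feature names stay readable.
    return "/search/?" + urlencode(params, safe="()")


if __name__ == "__main__":
    print(blog_query_url("vespa ranking", text_match_weight=0.1,
                         freshness_weight=0.85))
```

Weights that are not passed are left out of the request, so the values configured in the rank-profile apply.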

The summary-features setting gives us access to the individual ranking signals along with the hit's summary fields.
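The effect of overriding weights on the normalized weighted sum can be explored offline. The Python sketch below assumes that all three signals are in the range 0 to 1 and that the normalization factor follows the weights in effect for the query; the function name and the input values are made up for illustration.

```python
DEFAULT_WEIGHTS = {
    "textMatchWeight": 0.4,
    "qualityWeight": 0.3,
    "deservesFreshness": 0.3,
}


def blog_score(native_rank, quality, freshness, overrides=None):
    """Normalized weighted sum of text match, quality and freshness."""
    weights = {**DEFAULT_WEIGHTS, **(overrides or {})}
    normalization = sum(weights.values())
    weighted = (weights["textMatchWeight"] * native_rank
                + weights["qualityWeight"] * quality
                + weights["deservesFreshness"] * freshness)
    return weighted / normalization


if __name__ == "__main__":
    # A fresh document with a weak text match and a medium quality score.
    signals = dict(native_rank=0.2, quality=0.5, freshness=0.9)
    print(blog_score(**signals))
    print(blog_score(**signals, overrides={"textMatchWeight": 0.1,
                                           "deservesFreshness": 0.85}))
```

With the overridden weights from the example request, the fresh document scores higher because freshness dominates the sum.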

An alternative approach with tiering

After some research we figured that we would like to have a tiered ranking function, where the freshness signal is only applied to documents that are from trustworthy blog sites. A suggested approach is given below:

rank-profile blog inherits default {
    weight title: 200
    weight body: 100
    rank-properties {
      $textMatchWeight: 0.4 #pre-configured weights, can be overridden at query time
      $qualityWeight: 0.3
      $deservesFreshness: 0.3
      $qualityLimit: 0.4
      nativeFieldMatch.occurrenceCountTable.title: "linear(0,1)" #Example of nativeRank tuning, override the occurrence boost shape to be flat
    }

    #our freshness rank feature
    macro freshness() {
      expression: exp(-1 * age(timestamp)/(3600*12))
    }

    #our quality rank feature
    macro quality() {
      expression: attribute(sourcequality)
    }

    #normalization factor for the weighted sum
    macro normalization() {
      expression: $textMatchWeight + $qualityWeight + $deservesFreshness
    }

    #text match and quality, without the freshness signal
    macro normalrank() {
      expression: nativeRank(title,body) + query(qualityWeight) * quality
    }

    #ranking function that runs over all matched documents, determined by the boolean query logic
    first-phase {
      expression: if(quality < query(qualityLimit), normalrank/normalization, (normalrank + query(deservesFreshness) * freshness)/normalization)
    }

    summary-features: nativeRank(title,body) age(timestamp) rankingExpression(freshness@) rankingExpression(quality@)
}
Here we use the conditional if operator to implement a tiered ranking function, where only documents from quality sources are given a freshness boost. The quality limit is passed down with the user query, so it can be tuned and changed without re-indexing or re-configuration.
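The tiering logic can be sketched in Python to check how the quality limit gates the freshness boost. The function mirrors the structure of the tiered profile (normalrank, normalization and the if branch); the function name and the input values are assumptions made for this example.

```python
def tiered_score(native_rank, quality, freshness,
                 text_match_weight=0.4, quality_weight=0.3,
                 freshness_weight=0.3, quality_limit=0.4):
    """Apply the freshness boost only when quality reaches the limit."""
    normalization = text_match_weight + quality_weight + freshness_weight
    normal_rank = native_rank + quality_weight * quality
    if quality < quality_limit:
        # Low quality source: freshness is ignored.
        return normal_rank / normalization
    return (normal_rank + freshness_weight * freshness) / normalization


if __name__ == "__main__":
    # Same text match and freshness, different source quality.
    print(tiered_score(native_rank=0.5, quality=0.2, freshness=1.0))
    print(tiered_score(native_rank=0.5, quality=0.8, freshness=1.0))
```

Raising quality_limit moves more documents into the tier that gets no freshness boost, which is the knob the query-time parameter exposes.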