Rank Feature Reference

This is the list of the rank features in Vespa. These features are available during document ranking for combination into a complete rank score by a ranking expression. The features are a combination of coarse grained features suitable for handwritten expressions, and finer grained features suitable for machine learning.

See also the overview of the ranking framework, and rank feature configuration parameters. Notes:

  • Types: All rank feature values are floats. Ints are converted to exact whole value floats. String values are converted to exact whole value floats using a hash function. String literals in ranking expressions are converted using the same hash function, to enable equality tests on string values.
  • Features which are normalized are between 0 and 1, where 0 is always the minimum and 1 the maximum. Normalized features should normally be preferred because they are more easily combined by ranking expressions into a complete normalized score.
  • A query may override any rank feature value by submitting that value as a feature with the query.
  • Some features have parameters. It is always allowed to quote parameters with ". Nested quotes are not allowed and must be escaped using \. Parameters that can be parsed as feature names may be left unquoted. Examples: foo(bar(baz(5.5))), foo("bar(\"baz(\\\"5.5\\\")\")"), foo("need quote")
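The quoting rule can be sketched in a few lines of Python. This is only an illustration of the escaping shown in the examples above; the helper name quote_parameter is hypothetical and not part of Vespa.

```python
def quote_parameter(param: str) -> str:
    # Escape backslashes first, then double quotes, then wrap in quotes.
    escaped = param.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


# Build the nested example from the inside out.
inner = "baz(" + quote_parameter("5.5") + ")"
middle = "bar(" + quote_parameter(inner) + ")"
outer = "foo(" + quote_parameter(middle) + ")"
print(outer)
```

Each nesting level escapes the quotes and backslashes produced by the level below it, which is why the number of backslashes grows with depth.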

Feature list

Query features

Feature name Default Description
query(value) 0

An application specific feature submitted with the query, see using the query feature.

term(n).significance 0

A normalized number (between 0.0 and 1.0) describing the significance of the term; used as a multiplier or weighting factor by many other text matching rank features.

This should ideally be set by a searcher in the container for global correctness as each node will estimate the significance values from the local corpus. Use the Java API for significance or YQL annotation for significance.

As a fallback, a significance based on Robertson-Sparck-Jones term weighting is used; it is logarithmic from 1.0 for rare terms down to 0.5 for common terms (those occurring in every document seen).

Note that "rare" is defined as a frequency of 0.000001 or less. This is the term document frequency (how many documents contain the term out of all documents that can be observed), so you cannot get 1.0 as the fallback until you actually have a large number of documents (minimum 1 million) in the same search process.

See numTerms config.

term(n).weight 100

The importance of matching this query term given in the query

term(n).connectedness 0.1

The normalized strength with which this term is connected to the previous term in the query. Must be assigned to query terms in a searcher using the Java API for connectivity or YQL annotation for connectivity.

queryTermCount 0

The total number of terms in this query, including both user and synthetic terms in all fields.

Document features

Feature name Default Description
fieldLength(name) 1000000

The number of terms in this field if one or more query terms matched the field, 1000000 if no query term matched the field.

attribute(name) null

The value of a tensor or single value numeric attribute or null/NaN if not set. Use isNan() to check if value is not defined. Using undefined values in ranking expressions leads to undefined behavior.

attribute(name,n) 0

The value at index n (base 0) of a numeric array attribute with the given name. Note that the index number must be explicit, it cannot be the output of an expression function. The order of the items in an array attribute is the same as the order they have in the input feed. If items are added using partial updates they are added to the end of the existing items list.

attribute(name,key).weight 0

The weight found at a given key in a weighted set attribute

attribute(name,key).contains 0

1 if the given key is present in a weighted set attribute, 0 otherwise

attribute(name).count 0

The number of elements in the attribute with the given name.

tensorFromWeightedSet(source,dimension) n/a

Creates a tensor<double> with one mapped dimension from the given integer or string weighted set attribute. The attribute is specified as the full feature name, attribute(name). The dimension parameter is optional. If omitted the dimension name will be the attribute name.

Example: Given the weighted set:

{key1:0, key2:1, key3:2.5}

tensorFromWeightedSet(attribute(myField), dim) produces:

tensor<double>(dim{}):{ {dim:key1}:0.0, {dim:key2}:1.0, {dim:key3}:2.5 }

tensorFromLabels(attribute,dimension) n/a

Creates a tensor<double> with one mapped dimension from the given single value or array attribute. The value(s) must be integers or strings. The attribute is specified as the full feature name, attribute(name). The dimension parameter is optional. If omitted the dimension name will be the attribute name.

Example: Given an attribute field myField containing the array value:

[v1, v2, v3]

tensorFromLabels(attribute(myField), dim) produces:

tensor<double>(dim{}):{ {dim:v1}:1.0, {dim:v2}:1.0, {dim:v3}:1.0 }

See tensorFromWeightedSet for performance notes.
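As a rough mental model of the two tensorFrom features, the sketch below builds the cells of a one-dimensional mapped tensor as a plain Python dict. The function names and the dict layout are illustrative assumptions and do not reflect how Vespa represents tensors internally.

```python
def tensor_from_weighted_set(weighted_set: dict, dimension: str) -> dict:
    # One cell per key; the cell value is the key's weight as a double.
    return {
        "type": f"tensor<double>({dimension}{{}})",
        "cells": {str(key): float(weight) for key, weight in weighted_set.items()},
    }


def tensor_from_labels(values: list, dimension: str) -> dict:
    # One cell per value; every cell has the value 1.0.
    return {
        "type": f"tensor<double>({dimension}{{}})",
        "cells": {str(value): 1.0 for value in values},
    }


print(tensor_from_weighted_set({"key1": 0, "key2": 1, "key3": 2.5}, "dim"))
print(tensor_from_labels(["v1", "v2", "v3"], "dim"))
```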

Field match features - normalized

fieldMatch features provide a good measure of the degree to which a query matches the text of a field, but they are expensive to calculate and therefore often only suitable for second-phase ranking expressions. See the string segment match document for details on the algorithm computing this rank feature set. Note that even a single fine-grained sub-feature such as fieldMatch(name).absoluteOccurrence has the same complexity and cost as the general top-level fieldMatch(name) feature.

Feature name Default Description
fieldMatch(name) 0

A normalized measure of the degree to which this query and field matched (default, the long name of this is match). Use this if you do not want to create your own combination function of more fine-grained fieldmatch features.

fieldMatch(name).proximity 0

Normalized proximity - a value which is close to 1 when matched terms are close inside each segment, and close to zero when they are far apart inside segments. Relatively more connected terms influence this value more. This is absoluteProximity/average connectedness for the query terms for this field.

Note that if all the terms are far apart, the proximity will be 1, but the number of segments will be high. Proximity is only concerned with closeness within segments, a total score must also take the number of segments into account.

fieldMatch(name).completeness 0

The normalized total completeness, where field completeness is more important:

queryCompleteness * ( 1 - fieldCompletenessImportance ) + fieldCompletenessImportance * fieldCompleteness
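The weighting in this formula can be checked with a small Python sketch. The input values, including the importance of 0.05, are hypothetical and chosen only to show the arithmetic.

```python
def completeness(query_completeness: float,
                 field_completeness: float,
                 field_completeness_importance: float) -> float:
    # Linear blend of the two completeness measures.
    return (query_completeness * (1 - field_completeness_importance)
            + field_completeness_importance * field_completeness)


# All query terms matched, and the matches cover half of the field.
print(completeness(1.0, 0.5, 0.05))
```

With an importance of 0 the result equals queryCompleteness, and with an importance of 1 it equals fieldCompleteness.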

fieldMatch(name).queryCompleteness 0

The normalized ratio of query tokens matched in the field:

matches/query terms searching this field

fieldMatch(name).fieldCompleteness 0

The normalized ratio of matched query tokens to the length of the field:

matches/fieldLength

fieldMatch(name).orderness 0

A normalized metric of how well the order of the terms agrees in the chosen segments:

1-outOfOrder/pairs

fieldMatch(name).relatedness 0

A normalized measure of the degree to which different terms are related (occurring in the same segment):

1-(segments-1)/(matches-1)

fieldMatch(name).earliness 0

A normalized measure of how early the first segment occurs in this field.

fieldMatch(name).longestSequenceRatio 0

A normalized metric of the relative size of the longest sequence:

longestSequence/matches

fieldMatch(name).segmentProximity 0

A normalized metric of the closeness (inverse of spread) of segments in the field:

1-segmentDistance/fieldLength
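The ratio-style metrics above (orderness, relatedness, longestSequenceRatio and segmentProximity) are simple enough to evaluate by hand. The Python sketch below does so for one hypothetical match. The input counts are made up, and the sketch assumes all denominators are non-zero; how Vespa handles the degenerate cases is not covered here.

```python
def orderness(out_of_order: int, pairs: int) -> float:
    return 1 - out_of_order / pairs


def relatedness(segments: int, matches: int) -> float:
    return 1 - (segments - 1) / (matches - 1)


def longest_sequence_ratio(longest_sequence: int, matches: int) -> float:
    return longest_sequence / matches


def segment_proximity(segment_distance: int, field_length: int) -> float:
    return 1 - segment_distance / field_length


# Hypothetical match: 4 matched terms in 2 segments of a 50-token field.
print(orderness(out_of_order=1, pairs=3))
print(relatedness(segments=2, matches=4))
print(longest_sequence_ratio(longest_sequence=3, matches=4))
print(segment_proximity(segment_distance=10, field_length=50))
```

A single segment containing all matches gives a relatedness of 1, and one segment per match gives 0.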

fieldMatch(name).unweightedProximity 0

The normalized proximity of the matched terms, not taking term connectedness into account. This number is close to 1 if all the matched terms are following each other in sequence, and close to 0 if they are far from each other or out of order.

fieldMatch(name).absoluteProximity 0

Returns the normalized proximity of the matched terms, weighted by the connectedness of the query terms. This number is 0.1 if all the matched terms are following each other in sequence and have default or lower connectedness, close to 1 if they are following in sequence and have a high connectedness, and close to 0 if they are far from each other in the segments or out of order.

fieldMatch(name).occurrence 0

Returns a normalized measure of the number of occurrences of the terms of the query. This is 1 if there are many occurrences of the query terms in absolute terms, or relative to the total content of the field, and 0 if there are none.

This is suitable for occurrence in fields containing regular text.

fieldMatch(name).absoluteOccurrence 0

Returns a normalized measure of the number of occurrences of the terms of the query:

$$\frac{\sum_{\text{all query terms}} \min(\text{number of occurrences of the term}, maxOccurrences)}{\text{query term count} \times 100}$$

This is 1 if there are many occurrences of the query terms, and 0 if there are none.

This number is not relative to the field length, so it is suitable for uses of occurrence to denote relative importance between matched terms (i.e. fields containing keywords, not normal text).
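The absoluteOccurrence formula can be followed step by step in Python. This is a sketch of the arithmetic only; the occurrence counts and the maxOccurrences value of 100 used in the example are assumptions for illustration.

```python
def absolute_occurrence(occurrences_per_term: list, max_occurrences: int) -> float:
    # Cap each term's occurrence count, sum the capped counts, then normalize.
    capped_sum = sum(min(count, max_occurrences) for count in occurrences_per_term)
    return capped_sum / (len(occurrences_per_term) * 100)


# Two query terms: one occurs 3 times, the other 250 times (capped at 100).
print(absolute_occurrence([3, 250], max_occurrences=100))
```

The cap keeps a single very frequent term from dominating the score.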

fieldMatch(name).weightedOccurrence 0

Returns a normalized measure of the number of occurrences of the terms of the query, weighted by term weight. This number is close to 1 if there are many occurrences of highly weighted query terms, in absolute terms, or relative to the total content of the field, and 0 if there are none.

fieldMatch(name).weightedAbsoluteOccurrence 0

Returns a normalized measure of the number of occurrences of the terms of the query, taking weights into account so that occurrences of higher weighted query terms have more impact than occurrences of lower weighted terms.

This is 1 if there are many occurrences of the highly weighted terms, and 0 if there are none.

This number is not relative to the field length, so it is suitable for uses of occurrence to denote relative importance between matched terms (i.e. fields containing keywords, not normal text).

fieldMatch(name).significantOccurrence 0

Returns a normalized measure of the number of occurrences of the terms of the query in absolute terms, or relative to the total content of the field, weighted by term significance.

This number is 1 if there are many occurrences of the highly significant terms, and 0 if there are none.

Field match features - normalized and relative to the whole query

Feature name Default Description
fieldMatch(name).weight 0

The normalized weight of this match relative to the whole query: the sum of the weights of all matched terms/the sum of the weights of all query terms. If all the query terms were matched, this is 1. If no terms were matched, or if the matched terms all have weight zero, this is 0.

As the sum of this number over all the terms of the query is always 1, sums over all fields of normalized rank features for each field multiplied by this number for the same field will produce a normalized number.

Note that this scales with the number of matched query terms in the field. If you want a component which does not, divide by matches.
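A short Python sketch of the weight ratio described above, including the division by matches mentioned in the note. The term weights and match flags are hypothetical example data.

```python
def field_match_weight(term_weights: list, matched: list) -> float:
    # Sum of the weights of matched terms over the sum of all term weights.
    total = sum(term_weights)
    matched_sum = sum(w for w, hit in zip(term_weights, matched) if hit)
    return matched_sum / total if total else 0.0


weights = [100, 200, 100]          # weights of the three query terms
matched = [True, False, True]      # which terms matched in this field
weight = field_match_weight(weights, matched)
per_match = weight / sum(matched)  # component that does not scale with matches
print(weight, per_match)
```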

fieldMatch(name).significance 0

Returns the normalized term significance of the terms of this match relative to the whole query: The sum of the significance of all matched terms/the sum of the significance of all query terms. If all the query terms were matched, this is 1. If no terms were matched, or if the significance of all the matched terms is zero, this number is zero.

This metric has the same properties as weight.

See the term(n).significance feature for how the significance for a single term is calculated.

fieldMatch(name).importance 0

Returns the average of significance and weight. This has the same properties as those metrics.

Field match features - not normalized

Feature name Default Description
fieldMatch(name).segments 0

The number of field text segments which are needed to match the query as completely as possible

fieldMatch(name).matches 0

The total number of query terms which were matched in this field

fieldMatch(name).degradedMatches 0

The number of degraded query terms which were matched in this field. A degraded term is a term where no occurrence information is available during calculation. The number of degraded matches is less than or equal to the total number of matches.

fieldMatch(name).outOfOrder 0

The total number of out of order token sequences within matched field segments

fieldMatch(name).gaps 0

The total number of position jumps (backward or forward) within field segments

fieldMatch(name).gapLength 0

The summed length of all gaps within segments

fieldMatch(name).longestSequence 0

The size of the longest matched continuous, in-order sequence in the field

fieldMatch(name).head 0

The number of tokens in the field preceding the start of the first matched segment

fieldMatch(name).tail 0

The number of tokens in the field following the end of the last matched segment

fieldMatch(name).segmentDistance 0

The sum of the distance between all segments making up a match to the query, measured as the sum of the number of token positions separating the start of each field adjacent segment.

Query and field similarity

Normalized feature set measuring the approximate similarity between a field and the query. These features are suitable in cases where the query is as large as the field (i.e. is a document) such that we are interested in the similarity between the query and the entire field. They are cheap to compute even if the query is large.

Feature name Default Description
textSimilarity(name) 0

A weighted sum of the individual similarity measures.

textSimilarity(name).proximity 0

A measure of how close together the query terms appear in the field.

textSimilarity(name).order 0

A measure of the order in which the query terms appear in the field compared to the query.

textSimilarity(name).queryCoverage 0

A measure of how much of the query the field covers when a single term from the field can only cover a single term in the query. Query term weights are used during normalization.

textSimilarity(name).fieldCoverage 0

A measure of how much of the field the query covers when a single term from the query can only cover a single term in the field.

Query term and field match features

Feature name Default Description
fieldTermMatch(name,n).firstPosition 1000000

The position of the first occurrence of this query term in this index field. See the numTerms configuration.

fieldTermMatch(name,n).occurrences 0

The number of occurrences of this query term in this index field

matchCount(name) 0

Returns the number of times any term in the query matches this index/attribute field.

matches(name) 0

Returns 1 if the index/attribute field with the given name is matched by the query.

matches(name,n) 0

Returns 1 if the index/attribute field with the given name is matched by the query term with position n.

termDistance(name,x,y).forward 1000000

The minimum distance between the occurrences of term x and term y in this index field. Term x occurs before term y.

termDistance(name,x,y).forwardTermPosition 1000000

The position of the occurrence of term x in this index field used for the forward distance.

termDistance(name,x,y).reverse 1000000

The minimum distance between the occurrences of term y and term x in this index field. Term y occurs before term x.

termDistance(name,x,y).reverseTermPosition 1000000

The position of the occurrence of term y in this index field used for the reverse distance.

Features for indexed multivalue string fields

Feature name Default Description
elementCompleteness(name).completeness 0

A weighted combination of fieldCompleteness and queryCompleteness for the element in the field that produces the highest value for this output after the element's weight is factored in. The weighting can be adjusted using elementCompleteness(name).fieldCompletenessImportance.

elementCompleteness(name).fieldCompleteness 0

The field completeness of the best matching element. This is calculated as:

min( (number of query terms matched in the element) / (element size), 1.0).

elementCompleteness(name).queryCompleteness 0

The query completeness of the best matching element. This is calculated as:

(sum of weight for query terms matched in the element) / (sum of weight for query terms searching the field).
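The two per-element ratios can be sketched in Python as follows. The sketch assumes the field completeness ratio is capped at 1.0 so that the value stays normalized; the element size and term weights are hypothetical.

```python
def element_field_completeness(matched_terms: int, element_size: int) -> float:
    # Fraction of the element covered by matched query terms, capped at 1.0.
    return min(matched_terms / element_size, 1.0)


def element_query_completeness(matched_weights: list, all_weights: list) -> float:
    # Weight of the matched query terms relative to all terms searching the field.
    return sum(matched_weights) / sum(all_weights)


# An element with 4 terms, 2 of which are matched by a 3-term query.
print(element_field_completeness(matched_terms=2, element_size=4))
print(element_query_completeness([100, 100], [100, 100, 200]))
```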

elementCompleteness(name).elementWeight 0

The weight of the best matching element, starting from the default - i.e., negative weights will return 0.

elementSimilarity(name) 0

Aggregated similarity between the query and individual field elements. The same sub-scores used by the textSimilarity feature are calculated for each individual element in the field. The final output is calculated as the maximum of the combined element similarity measures (similarity measures are combined the same way as the default output of the textSimilarity feature) multiplied with the element weight which is 1 for arrays, and the supplied weights for indexed weighted sets.

This is a flexible feature; how sub-scores are combined for each element and how element scores are aggregated may be configured. You may also add additional outputs if you want to capture multiple signals from a single field. Use elementSimilarity to customize this feature.

Attribute match features - normalized

Feature name Default Description
attributeMatch(name) 0

A normalized measure of the degree to which this query and field matched. This is currently the same as completeness. Note that depending on what the attribute is used for, this may or may not be a suitable metric. If the attribute is a weighted set representing counts of items (like tags), normalizedWeight is probably a better metric.

attributeMatch(name).completeness 0

The normalized total completeness, where field completeness is more important:

queryCompleteness * ( 1 - fieldCompletenessImportance + fieldCompletenessImportance * fieldCompleteness )

attributeMatch(name).queryCompleteness 0

The query completeness for this attribute:

matches/the number of query terms searching this attribute

attributeMatch(name).fieldCompleteness 0

The normalized ratio of query tokens which were matched in the field. For arrays: matches/array length. For weighted sets: sum of weights of matched terms/sum of weights of the entire set. This is relatively expensive to calculate for large weighted sets.

attributeMatch(name).normalizedWeight 0

A number which is close to 1 if the attribute weights of most matches in a weighted set are high (relative to maxWeight), 0 otherwise

attributeMatch(name).normalizedWeightedWeight 0

A number which is close to 1 if the attribute weights of most matches in a weighted set are high (relative to maxWeight), and where highly weighted query terms have more impact, 0 otherwise

closeness(dimension,name) 0

Used with the nearestNeighbor query operator. A number which is close to 1 when a point vector in the document is close to a matching point vector in the query. The document vectors and the query vector must be the same tensor type, with one indexed dimension of size N, representing a point in an N-dimensional space.

  • dimension: Specifies the dimension of name. This must be either the string field or the string label.

    When using field, the name given must be a field with a tensor attribute of appropriate type. Often used when the document type has only one vector field, see example.

    When using label, queries are assumed to contain a nearestNeighbor query item with a label that matches the given name. This is useful when having multiple vector fields, where closeness() then maps to the nearestNeighbor operator with the field configured. Example.

  • name: The value of the field name or label.

The output value is $$ closeness(dimension,name) = \frac{1.0}{1.0 + distance(dimension,name)}$$

When the tensor field stores multiple vectors per document, the minimum distance between the vectors of a document and the query vector is used in the calculation above.
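The mapping from distance to closeness can be sketched in Python, including the multi-vector case where the minimum distance is used. The distances are hypothetical inputs; computing them is a separate step.

```python
def closeness(distances: list) -> float:
    # Use the smallest distance among the document's vectors.
    return 1.0 / (1.0 + min(distances))


print(closeness([0.0]))        # identical vectors give the maximum closeness
print(closeness([3.0, 1.0]))   # the vector at distance 1.0 decides the score
```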

freshness(name) 0

A number which is close to 1 if the timestamp in attribute name is recent, that is, if its age relative to the current time is small compared to maxAge:

max( 1-age(name)/maxAge , 0 )

Scales linearly with age, see freshness plot.

freshness(name).logscale 0

A logarithmic-shaped freshness; also goes from 1 to 0, but looks like freshness plot. The function is based on -log(age(name) + scale) and is calculated as:

$$\frac{log(maxAge + scale) - log(age(name) + scale)}{log(maxAge + scale) - log(scale)}$$

where scale is defined using halfResponse and maxAge:

$$scale = \frac{-halfResponse^2}{2 \times halfResponse - maxAge}$$

When age(name) == halfResponse the function output is 0.5.
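Both freshness variants can be reproduced with a few lines of Python. The maxAge and halfResponse values below are hypothetical and were picked so that the numbers are easy to verify by hand; they are not claimed to be Vespa defaults.

```python
import math


def freshness(age: float, max_age: float) -> float:
    # Linear fall-off from 1 (age 0) to 0 (age >= max_age).
    return max(1 - age / max_age, 0)


def freshness_logscale(age: float, max_age: float, half_response: float) -> float:
    # scale is chosen so that the output is 0.5 when age == half_response.
    scale = -half_response ** 2 / (2 * half_response - max_age)
    return ((math.log(max_age + scale) - math.log(age + scale))
            / (math.log(max_age + scale) - math.log(scale)))


max_age, half_response = 1000.0, 100.0  # hypothetical settings, scale becomes 12.5
print(freshness(250.0, max_age))
print(freshness_logscale(half_response, max_age, half_response))
```

With these settings the logscale variant has already dropped to 0.5 at an age of 100, while the linear variant is still at 0.9.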

Attribute match features - normalized and relative to the whole query

Feature name Default Description
attributeMatch(name).weight 0

This has the same semantics as fieldMatch(name).weight.

attributeMatch(name).significance 0

This has the same semantics as fieldMatch(name).significance.

attributeMatch(name).importance 0

Returns the average of significance and weight. This has the same properties as those metrics.

Attribute match features - not normalized

Feature name Default Description
attributeMatch(name).matches 0

The number of query terms which were matched in this attribute

attributeMatch(name).totalWeight 0

The sum of the weights of the attribute keys matched in a weighted set attribute

attributeMatch(name).averageWeight 0

totalWeight/matches

attributeMatch(name).maxWeight 0

The maximum weight of the attribute keys matched in a weighted set attribute

closest(name) {}

Used with the nearestNeighbor query operator and a tensor field attribute name storing multiple vectors per document. This feature returns a tensor with one or more mapped dimensions and one point with a value of 1.0, where the label of that point indicates which document vector was closest to the query vector in the nearest neighbor search.

Given a tensor field with type tensor<float>(m{},x[3]) used with the nearestNeighbor operator, an example output of this feature is:

    tensor<float>(m{}):{ 3: 1.0 }

In this example, the document vector with label 3 in the mapped m dimension was closest to the query vector.

closest(name,label) {}

Used with the nearestNeighbor query operator tagged with a label label and a tensor field attribute name storing multiple vectors per document.

See closest(name) for details.

distance(dimension,name) max double value

Used with the nearestNeighbor query operator. A number which is close to 0 when a point vector in the document is close to a matching point vector in the query. The document vectors and the query vector must be the same tensor type, with one indexed dimension of size N, representing a point in an N-dimensional space.

  • dimension: Specifies the dimension of name. This must be either the string field or the string label.

    When using field, the name given must be a field with a tensor attribute of appropriate type. Often used when the document type has only one vector field, see example.

    When using label, queries are assumed to contain a nearestNeighbor query item with a label that matches the given name. This is useful when having multiple vector fields, where distance() then maps to the nearestNeighbor operator with the field configured. Example.

  • name: The value of the field name or label.

The output value depends on the distance metric used. The default is the Euclidean distance between the "n"-dimensional query point "q" and the point "d" in the document tensor field: $$ distance = \sqrt{\sum_{i=1}^n (q_i - d_i)^2} $$

When the tensor field stores multiple vectors per document, the minimum distance between the vectors of a document and the query vector is used in the calculation above.
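For the default Euclidean metric, the calculation can be sketched in Python as follows. The vectors are hypothetical, and the sketch ignores other distance metrics.

```python
import math


def euclidean(q: tuple, d: tuple) -> float:
    # Square root of the summed squared coordinate differences.
    return math.sqrt(sum((qi - di) ** 2 for qi, di in zip(q, d)))


def distance(q: tuple, document_vectors: list) -> float:
    # With several vectors per document, the closest one is used.
    return min(euclidean(q, d) for d in document_vectors)


query = (0.0, 0.0, 0.0)
document = [(3.0, 4.0, 0.0), (1.0, 2.0, 2.0)]
print(distance(query, document))
```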

age(name) 10B

The document age in seconds relative to the Unix time value stored in the attribute having this name

Features combining multiple fields and attributes

Feature name Default Description
match 0

A normalized average of the fieldMatch and attributeMatch scores of all the searched fields and attributes, where the contribution of each field and attribute is weighted by its weight setting.

match.totalWeight 0

The sum of the weight settings of all the fields and attributes searched by the query

match.weight.name 100

The (schema) weight setting of a field or attribute

Rank scores

Feature name Default Description
bm25(field) 0

Calculates the Okapi BM25 ranking function over the given indexed string field. This feature is cheap to compute, about 3-4 times faster than nativeRank, while still providing a good rank score quality wise. This feature is a good candidate for usage in a first phase ranking function when ranking text documents. Note that the field must be enabled to be used with the bm25 feature; set the enable-bm25 flag in the index section of the field definition. See the BM25 Reference for more detailed information.

nativeRank 0

A reasonably good rank score which is computed cheaply by Vespa. This value alone is a good candidate for a first phase ranking function, and it is the default used in the default rank profile. The value computed by this function may change between Vespa versions. See the native rank reference for more information.

nativeRank(field,...) 0

Same as nativeRank, but only the given set of fields are used in the calculation.

nativeFieldMatch 0

Captures how well query terms match in index fields. Used by nativeRank. See the native rank reference for more information.

nativeFieldMatch(field,...) 0

Same as nativeFieldMatch, but only the given set of index fields are used in the calculation.

nativeProximity 0

Captures how near matched query terms occur in index fields. Used by nativeRank. See the native rank reference for more information.

nativeProximity(field,...) 0

Same as nativeProximity, but only the given set of index fields are used in the calculation.

nativeAttributeMatch 0

Captures how well query terms match in attribute fields. Used by nativeRank. See the native rank reference for more information.

nativeAttributeMatch(field,...) 0

Same as nativeAttributeMatch, but only the given set of attribute fields are used in the calculation.

nativeDotProduct(field) 0

Calculates the sparse dot product between query term weights and match weights for the given field. Example: A weighted set string field X:

"X": {
    "x": 10,
    "y": 20,
    "z": 30
}

For the query (x!2 OR y!4), the nativeDotProduct(X) feature will have the value 100 (10*2+20*4) for that document.
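The same arithmetic as a Python sketch of a sparse dot product, where only keys present on both sides contribute. The dict representation is an illustration, not Vespa's internal data structure.

```python
def sparse_dot_product(query_weights: dict, field_weights: dict) -> int:
    # Multiply weights for keys present in both the query and the field.
    return sum(weight * field_weights[key]
               for key, weight in query_weights.items()
               if key in field_weights)


field_x = {"x": 10, "y": 20, "z": 30}
query = {"x": 2, "y": 4}           # corresponds to (x!2 OR y!4)
print(sparse_dot_product(query, field_x))
```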

nativeDotProduct 0

Calculates the sparse dot product between query term weights and match weights as above, but for all term/field combinations.

firstPhase 0

The value of the rank score calculated in the first phase (unavailable in first phase ranking expressions)

secondPhase 0

The value of the rank score calculated in the second phase (unavailable in first phase and second phase ranking expressions)

firstPhaseRank max double value

The rank of the document after first phase within the content node when selecting which documents to rerank in second phase. The best document after first phase has rank 1, the second best 2, etc. The feature returns the default value for documents not selected for second phase ranking and for unsupported cases (streaming search, summary features, first phase expressions). Multiple documents can have the same firstPhaseRank value in multi-node configurations.

Global features

Feature name Default Description
globalSequence n/a

A global sequence number computed as (1 << 48) - (LocalDocumentId << 16 | distribution-key). This gives a global sequence to documents and is a cheap way of getting a stable ordering of documents. Note the large range of this value. Also note that if the system is not stable, e.g. if documents move around due to new nodes coming in or nodes being removed, the ordering will no longer be stable, as documents might be found in a different replica. If you need true global ordering, we suggest assigning a unique numeric id to your documents as an attribute field and using the attribute(name) feature.
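The bit arithmetic can be tried out in Python. The sketch reads the OR in the formula as a bitwise OR, which is an assumption, and the local document ids and distribution key are made-up values.

```python
def global_sequence(local_document_id: int, distribution_key: int) -> int:
    # The document id fills the upper bits, the distribution key the lower 16.
    return (1 << 48) - ((local_document_id << 16) | distribution_key)


first = global_sequence(local_document_id=1, distribution_key=3)
second = global_sequence(local_document_id=2, distribution_key=3)
print(first, second)
```

Under this reading, documents with lower local document ids get higher sequence numbers.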

now n/a

Time at which the query is executed in unix-time (seconds since epoch)

random n/a

A pseudorandom number in the range [0,1> which is drawn once per document during rank evaluation. By default, the current time in microseconds is used as a seed value. Users can specify a seed value by setting random.seed in the rank profile. If you need several independent random numbers the feature can be named like this: random(foo), random(bar).

random.match n/a

A pseudorandom number in the range [0,1> that is stable for a given hit. This means that a hit will always receive the same random score (on a single node). If it is required that the scores be different between different queries, specify a seed value dependent upon the query. By default, the seed value is 1024. Users can specify a seed value by adding the query parameter rankproperty.random.match.seed=<value>. If you need several independent random numbers the feature can be named like this: random(foo).match, random(bar).match.

randomNormal(mean,stddev) 0.0,1.0

Same as random, except the random number is drawn from the Gaussian distribution using the supplied mean and stddev parameters. Can be called without parameters; default values are assumed. Seed is set similarly as random. If you need several independent random numbers with the same parameters, the feature can be named like this: randomNormal(0.0,1.0,foo), randomNormal(0.0,1.0,bar). If the parameters to randomNormal are not the same, you do not need to supply an additional name, e.g. randomNormal(0.0, 0.1) and randomNormal(0.0, 0.5) results in two independent values.

randomNormalStable(mean,stddev) 0.0,1.0

Same as randomNormal, except that the generated number is stable for a given hit, similar to random.match.

constant(name) n/a

Returns the constant tensor value.

Match operator scores

See Raw scores and query item labeling.

Feature name Default Description
rawScore(field) 0

The sum of all raw scores produced by match operators for this field.

itemRawScore(label) 0

The raw score produced by the query item with the given label.

Geo search

These features are for ranking on the distances between geographical coordinates, i.e. points on the surface of the earth defined by latitude/longitude pairs. See the main documentation for Geo Search.

Feature name Default Description
closeness(name) 0

A number which is close to 1 if the position in attribute name is close to the query position compared to maxDistance:

max(1-distance(name)/maxDistance , 0)

Scales linearly with distance, see closeness plot.

closeness(name).logscale 0

A logarithmic-shaped closeness; like normal closeness it goes from 1 to 0, but looks like closeness plot. The function is a logarithmic fall-off based on log(distance + scale) and is calculated as:

$$closeness(name).logscale = \frac{log(maxDistance + scale) - log(distance(name) + scale)}{log(maxDistance + scale) - log(scale)}$$

where scale is defined using halfResponse and maxDistance:

$$scale = \frac{halfResponse^2}{maxDistance - 2 \times halfResponse}$$

When distance(name) == halfResponse the function output is 0.5; halfResponse should be less than maxDistance/2 since that means adding a certain distance when you are close matters more than adding the same distance when you're already far away.
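The effect of halfResponse on the logarithmic closeness can be seen by comparing the score drop near the query position with the drop far away from it. The Python sketch below uses hypothetical maxDistance and halfResponse values in arbitrary distance units.

```python
import math


def closeness_logscale(dist: float, max_distance: float, half_response: float) -> float:
    # scale is chosen so that the output is 0.5 when dist == half_response.
    scale = half_response ** 2 / (max_distance - 2 * half_response)
    return ((math.log(max_distance + scale) - math.log(dist + scale))
            / (math.log(max_distance + scale) - math.log(scale)))


max_distance, half_response = 1000.0, 100.0  # hypothetical settings
near_drop = (closeness_logscale(0.0, max_distance, half_response)
             - closeness_logscale(100.0, max_distance, half_response))
far_drop = (closeness_logscale(900.0, max_distance, half_response)
            - closeness_logscale(1000.0, max_distance, half_response))
print(near_drop, far_drop)
```

The first 100 units cost half of the score, while the last 100 units before maxDistance cost only a small fraction of it.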

distance(name) 6400M

The Euclidean distance from the query position to the given position attribute in millionths of degrees (about 10 cm). If there are multiple positions in the query, items that actually search in name are preferred. Also: if multiple query items search in name, or name is an array of positions, or both, the closest distance found is returned.

distance(name).km 711648.5

As above, but scaled, so it uses the kilometer as unit of distance, instead of "micro-degrees".

distance(name).index -1

The array index of the closest position found. Useful when name is of array<position> type.

distance(name).latitude 90

The latitude (geographical north-south coordinate) of the closest position found. In range from -90.0 (South Pole) to +90.0 (North Pole). Useful when name is of array<position> type.

distance(name).longitude -180

The longitude (geographical east-west coordinate) of the closest position found. In range from -180.0 (extreme west) to +180.0 (extreme east). Useful when name is of array<position> type.

distanceToPath(name).distance 6400M

The Euclidean distance from a path through 2d space given in the query to the given position attribute in millionths of degrees. This is useful e.g. for finding the closest locations to a given road. The query path is set in the rankproperty.distanceToPath(name).path query parameter, using the syntax "(x1,y1,x2,y2,..)", also in millionths of degrees; see the distance to path example. The closest point along the path is referred to as the intersection.

distanceToPath(name).traveled 1

The normalized distance along the query path traveled before intersection (0.0 indicates start of path, 0.5 is middle, and 1.0 is end of path).

distanceToPath(name).product 0

The cross-product of the intersected path segment and the intersection-to-document vector. Given that the document was found to lie closest to the path element A->B, the intersected path segment vector is [ B.x - A.x, B.y - A.y ]. Furthermore, given that the intersection of that path element occurred at point I for document location D, the intersection-to-document vector is [ I.x - D.x, I.y - D.y ]. This is useful e.g. for finding which side of a path a document lies on, by looking at the sign of this value.
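The three distanceToPath outputs can be illustrated with a small Python sketch. It follows the definitions above (closest point on any path segment, normalized distance traveled, 2d cross-product) but is not Vespa's implementation; the function name distance_to_path and the returned dictionary are hypothetical, and returning None for a path with fewer than two points is an assumption.

```python
import math

def distance_to_path(path, doc):
    """path: flat list [x1, y1, x2, y2, ...]; doc: (x, y) document position."""
    points = list(zip(path[0::2], path[1::2]))
    segments = list(zip(points, points[1:]))
    lengths = [math.dist(a, b) for a, b in segments]
    total = sum(lengths)
    best = None      # assumption: None when the path has no segments
    walked = 0.0     # path length covered before the current segment
    for (a, b), seg_len in zip(segments, lengths):
        ab = (b[0] - a[0], b[1] - a[1])
        t = 0.0
        if seg_len > 0:
            # Project the document onto the segment and clamp to its end points.
            t = ((doc[0] - a[0]) * ab[0] + (doc[1] - a[1]) * ab[1]) / seg_len ** 2
            t = min(1.0, max(0.0, t))
        i = (a[0] + t * ab[0], a[1] + t * ab[1])   # intersection point I
        d = math.dist(i, doc)
        if best is None or d < best["distance"]:
            to_doc = (i[0] - doc[0], i[1] - doc[1])  # [I.x - D.x, I.y - D.y]
            best = {
                "distance": d,
                "traveled": (walked + t * seg_len) / total if total > 0 else 1.0,
                "product": ab[0] * to_doc[1] - ab[1] * to_doc[0],
            }
        walked += seg_len
    return best

left = distance_to_path([0, 0, 10, 0], (5, 3))
right = distance_to_path([0, 0, 10, 0], (5, -3))
print(left, right)
```

The two documents are mirrored across the path, so they get the same distance and traveled values while the sign of product differs.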

Utility features

Feature nameDefaultDescription
foreach(dimension, variable, feature, condition, operation) n/a

foreach iterates over a set of feature output values and performs an operation on them. Only the values where the condition evaluates to true are considered for the operation. The result of this operation is returned.

  • dimension: Specifies what to iterate over. This can be:
    • terms: All query term indices, from 0 and up to maxTerms.
    • fields: All index field names.
    • attributes: All attribute field names.
  • variable: The name of the variable 'storing' each of the items you are iterating over.
  • feature: The name of the feature you want to use the output value from. Use the variable as part of the feature name, and for each item you iterate over this variable is replaced with the actual item. Note that the variable replacement is a simple string replace, so you should use a variable name that is not in conflict with the feature name.
  • condition: The condition used on each feature output value to find out if the value should be considered by the operation. The condition can be:
    • >a: Use feature output if greater than number a.
    • <a: Use feature output if less than number a.
    • true: Use all feature output values.
  • operation: The operation you want to perform on the feature output values. This can be:
    • sum: Calculate the sum of the values.
    • product: Calculate the product of the values.
    • average: Calculate the average of the values.
    • max: Find the max of the values.
    • min: Find the min of the values.
    • count: Count the number of values.

Let's say you want to calculate the average score of the fieldMatch feature for all index fields, but only consider the scores larger than 0. Then you can use the following setup of the foreach feature:

foreach(fields,N,fieldMatch(N), ">0", average)

Note that when using the conditions >a and <a the arguments must be quoted.

You can also specify a ranking expression in the foreach feature by using the rankingExpression feature. The rankingExpression feature takes the expression as the first and only parameter and outputs the result of evaluating this expression. Let's say you want to calculate the average score of the squared fieldMatch feature score for all index fields. Then you can use the following setup of the foreach feature:

foreach(fields, N, rankingExpression("fieldMatch(N)*fieldMatch(N)"), true, average)

Note that you must quote the expression passed in to the rankingExpression feature.
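The semantics of foreach (iterate, filter by condition, then apply the operation) can be sketched in Python as below. This only mirrors the description above, not Vespa's implementation: foreach_sketch, parse_condition and the example scores are hypothetical, and the values returned for average, max and min when no value passes the condition are assumptions.

```python
import math

def parse_condition(condition):
    # Supports the three documented forms: ">a", "<a" and "true".
    if condition == "true":
        return lambda value: True
    threshold = float(condition[1:])
    if condition[0] == ">":
        return lambda value: value > threshold
    if condition[0] == "<":
        return lambda value: value < threshold
    raise ValueError("unsupported condition: " + condition)

OPERATIONS = {
    "sum": sum,
    "product": math.prod,
    "average": lambda v: sum(v) / len(v) if v else 0.0,  # assumption for empty input
    "max": lambda v: max(v, default=0.0),                # assumption for empty input
    "min": lambda v: min(v, default=0.0),                # assumption for empty input
    "count": len,
}

def foreach_sketch(items, feature, condition, operation):
    keep = parse_condition(condition)
    values = [v for v in (feature(item) for item in items) if keep(v)]
    return OPERATIONS[operation](values)

# Hypothetical fieldMatch scores per index field.
scores = {"title": 0.5, "body": 0.25, "tags": 0.0}

# Like foreach(fields,N,fieldMatch(N),">0",average)
print(foreach_sketch(scores, scores.get, ">0", "average"))
# Like foreach(fields,N,rankingExpression("fieldMatch(N)*fieldMatch(N)"),true,average)
print(foreach_sketch(scores, lambda n: scores[n] * scores[n], "true", "average"))
```

In the first call the tags field is excluded by the ">0" condition, so the average is taken over two values rather than three.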

dotProduct(name,vector) 0

The sparse dot product of the vector represented by the given weighted set attribute and the vector sent down with the query.

You can also do an ordinary full dot product by using arrays instead of weighted sets. This will be a lot faster when you have full vectors in the document with more than 5-10% non-zero values. You are also then not limited to integer weights. All the numeric datatypes can be used with arrays, so you have full floating point support. The 32 bit floating point type yields the fastest execution.

  • name: The name of the weighted set string/integer or array of numeric attribute.
  • vector: The name of the vector sent down with the query.

Each unique string/integer in the weighted set corresponds to a dimension, and its associated weight is the vector component for that dimension. The query vector is set in the rankproperty.dotProduct.vector query parameter, using syntax {d1:c1,d2:c2,…} where d1 and d2 are dimensions matching the strings/integers in the weighted set and c1 and c2 are the vector components (floating point numbers). The number of dimensions in the weighted set and the query vector do not need to be the same. When calculating the dot product we only use the dimensions present in both the weighted set and the query vector.

When using an array, each dimension is a non-negative integer index starting at 0. If the query vector is sparse, all dimensions not given are zero. The same goes for dimensions that fall outside the array size in each document.

Assume a weighted set string attribute X with:

"X": {
    "x": 10,
    "y": 20
}

for a particular document. The result of using the feature dotProduct(X,Y) with the query vector rankproperty.dotProduct.Y={x:2,y:4} will then be 100 (10*2+20*4) for this document.

Arrays can be passed down as [w1 w2 w3 …] or on sparse form {d1:c1,d2:c2,…} as is already supported for weighted sets.
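The arithmetic of both variants can be sketched in Python. This is an illustration of the rules described above, not Vespa's implementation; the function names sparse_dot and array_dot are hypothetical.

```python
def sparse_dot(weighted_set, query_vector):
    # Only dimensions present in both the weighted set and the query contribute.
    return sum(weight * query_vector[dim]
               for dim, weight in weighted_set.items()
               if dim in query_vector)

def array_dot(array, query_vector):
    if isinstance(query_vector, dict):
        # Sparse query form {d1:c1,d2:c2,...}: dimensions outside the array count as zero.
        return sum(component * array[dim]
                   for dim, component in query_vector.items()
                   if 0 <= dim < len(array))
    # Dense query form [w1 w2 w3 ...]: zip stops at the shorter vector,
    # which treats the missing dimensions as zero.
    return sum(a * q for a, q in zip(array, query_vector))

# The weighted set example above: 10*2 + 20*4
print(sparse_dot({"x": 10, "y": 20}, {"x": 2, "y": 4}))   # 100
# Array attribute with a sparse query; dimension 5 is outside the array.
print(array_dot([1.0, 2.0, 3.0], {0: 2.0, 2: 1.0, 5: 7.0}))  # 5.0
```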

tokenInputIds(length, input_1, input_2, ...) n/a

Convenience function for generating token sequence input to Transformer models. Creates a tensor with dimensions d0[1], d1[length], where d0 is the batch dimension and d1 is the maximum length of the token sequence. Assumes the inputs are zero-padded tensors representing token sequences. The result is the token sequence:

CLS + input_1 + SEP + input_2 + SEP + ... + 0's

  • length: The maximum length of the token sequence
  • input_N: Where to retrieve input from. At least one is required.

The inputs are typically retrieved from the query, document attributes or constants. For instance, tokenInputIds(128, query(my_input), attribute(my_field)) where input types are:

  • query(my_input): tensor(d0[32])
  • attribute(my_field): tensor(d0[128])

will create a tensor of type d0[1],d1[128] consisting of the CLS token 101, the tokens from the query, the SEP token 102, the tokens from the document field, the SEP token 102, and 0's for the rest of the tensor.
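The layout of the resulting token sequence can be sketched in Python with plain lists standing in for tensors. This is not Vespa's implementation: token_input_ids and strip_padding are hypothetical names, and truncating at length when the combined inputs are too long is an assumption, since the text above does not say how overflow is handled.

```python
def strip_padding(tokens):
    # Inputs are assumed to be zero-padded; drop the trailing zeros.
    end = len(tokens)
    while end > 0 and tokens[end - 1] == 0:
        end -= 1
    return list(tokens[:end])

def token_input_ids(length, *inputs, cls=101, sep=102):
    sequence = [cls]
    for tokens in inputs:
        sequence.extend(strip_padding(tokens))
        sequence.append(sep)
    sequence = sequence[:length]                 # assumption: truncate on overflow
    sequence += [0] * (length - len(sequence))   # pad with 0's up to length
    return [sequence]                            # shape d0[1], d1[length]

# Two short zero-padded inputs and a maximum length of 8.
print(token_input_ids(8, [7, 8, 0, 0], [9, 0, 0]))
# [[101, 7, 8, 102, 9, 102, 0, 0]]
```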

customTokenInputIds(start_sequence_id, sep_sequence_id, length, input_1, input_2, ...) n/a

Convenience function for generating token sequence input to Transformer models. Creates a tensor with dimensions d0[1], d1[length], where d0 is the batch dimension and d1 is the maximum length of the token sequence. Assumes the inputs are zero-padded tensors representing token sequences. The result is the token sequence:

start_sequence_id + input_1 + sep_sequence_id + input_2 + sep_sequence_id + ... + 0's

  • start_sequence_id: The start sequence id, typically 1
  • sep_sequence_id: The separator sequence id, typically 2
  • length: The maximum length of the token sequence
  • input_N: Where to retrieve input from. At least one is required.

The inputs are typically retrieved from the query, document attributes or constants. For instance, customTokenInputIds(1,2,128, query(my_input), attribute(my_field)) where input types are:

  • query(my_input): tensor(d0[32])
  • attribute(my_field): tensor(d0[128])

tokenTypeIds(length, input_1, input_2, ...) n/a

Convenience function for generating token sequence input to Transformer models. Similar to tokenInputIds, creates a tensor of type d0[1],d1[length] which represents a mask with zeros for the first input including the CLS and SEP tokens, ones for the rest of the inputs (up to and including the final SEP token), and 0's for the rest of the tensor.

tokenAttentionMask(length, input_1, input_2, ...) n/a

Convenience function for generating token sequence input to Transformer models. Similar to tokenInputIds, creates a tensor of type d0[1],d1[length] which represents a mask with ones for all tokens that are set (CLS and SEP and all inputs), and zeros for the rest.
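The two masks can be sketched in Python, again with plain lists standing in for tensors. The function names are hypothetical, this is not Vespa's implementation, and truncation at length on overflow is an assumption.

```python
def count_tokens(tokens):
    # Number of tokens before the trailing zero padding.
    end = len(tokens)
    while end > 0 and tokens[end - 1] == 0:
        end -= 1
    return end

def token_type_ids(length, *inputs):
    counts = [count_tokens(tokens) for tokens in inputs]
    first = 1 + counts[0] + 1                   # CLS + first input + SEP
    rest = sum(n + 1 for n in counts[1:])       # each later input + its SEP
    row = ([0] * first + [1] * rest)[:length]   # assumption: truncate on overflow
    return [row + [0] * (length - len(row))]

def token_attention_mask(length, *inputs):
    counts = [count_tokens(tokens) for tokens in inputs]
    used = min(length, 1 + sum(n + 1 for n in counts))  # CLS, inputs and SEPs
    return [[1] * used + [0] * (length - used)]

print(token_type_ids(8, [7, 8, 0, 0], [9, 0, 0]))        # [[0, 0, 0, 0, 1, 1, 0, 0]]
print(token_attention_mask(8, [7, 8, 0, 0], [9, 0, 0]))  # [[1, 1, 1, 1, 1, 1, 0, 0]]
```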

Graphs for selected ranking functions

closeness

Closeness logscale plot

The plot above shows the possible outputs from the closeness distance rank feature using the default maxDistance of 1000 km. The linear(x) graph shows the default closeness output while the other graphs are logscale output for various values of the scaleDistance parameter: 9013.305 (1 km), 45066.525 (5 km - the default value), and 901330.5 (100 km). These values correspond to the following values of the halfResponse parameter: 276154.903 (30.64 km), 593861.739 (65.89 km), and 2088044.581 (231.66 km).
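The correspondence between the scaleDistance and halfResponse values listed above can be checked by solving scale = halfResponse^2 / (maxDistance - 2 × halfResponse) for halfResponse, which gives halfResponse = sqrt(scale^2 + scale × maxDistance) - scale. A minimal Python sketch, with hypothetical function names and all values in millionths of degrees:

```python
import math

def scale_from_half_response(half_response, max_distance):
    return half_response ** 2 / (max_distance - 2 * half_response)

def half_response_from_scale(scale, max_distance):
    # Positive root of halfResponse^2 + 2*scale*halfResponse - scale*maxDistance = 0
    return math.sqrt(scale ** 2 + scale * max_distance) - scale

max_distance = 9013305.0  # 1000 km, using 9013.305 per km as in the plot text
for scale in (9013.305, 45066.525, 901330.5):
    print(scale, round(half_response_from_scale(scale, max_distance), 3))
```

The printed halfResponse values should agree with the ones quoted for the plot, up to rounding.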

freshness

Freshness logscale plot

The plot above shows the possible outputs from the freshness rank feature using the default maxAge of 7776000s (90 days). The linear(x) graph shows the default freshness output while the other graphs are logscale output for various values of the halfResponse parameter: 172800s (2 days), 604800s (7 days - the default value), 1209600s (14 days).