Geo Search

To model position(s) in documents, use field(s) of type position. Use this to limit hits to within an area, or use the distance from a position in rank functions. Sample schema:

schema local {
    document local {
        field title type string {
            indexing: index
        }
        field location type position {
            indexing: attribute
        }
    }
    fieldset default {
        fields: title
    }
}

Sample document:

{
    "put": "id:localnamespace:local::business-1",
    "fields": {
        "title" :    "Random pizza place",
        "location" : "N37.401;W121.996"
    }
}

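As a small illustration of the position string used in the sample document, the following Python sketch formats decimal degrees into the same hemisphere-prefixed form. The helper name format_position is hypothetical, and the layout (N or S latitude, a semicolon, then E or W longitude) is inferred from the single example value above rather than taken from a format specification.

```python
def format_position(latitude, longitude):
    """Format decimal degrees like the sample value "N37.401;W121.996".

    Assumption: the layout is inferred from the sample document only.
    """
    if not -90.0 <= latitude <= 90.0:
        raise ValueError("latitude must be within [-90, 90]")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude must be within [-180, 180]")

    def degrees(value):
        # Six decimals matches the millionth-of-a-degree resolution;
        # trailing zeros are dropped to keep the string short.
        return f"{abs(value):.6f}".rstrip("0").rstrip(".")

    lat_prefix = "N" if latitude >= 0 else "S"
    lon_prefix = "E" if longitude >= 0 else "W"
    return f"{lat_prefix}{degrees(latitude)};{lon_prefix}{degrees(longitude)}"


# Build the "fields" part of a document like the sample above.
fields = {
    "title": "Random pizza place",
    "location": format_position(37.401, -121.996),
}
```

Six decimals are kept because that is the finest resolution the position type is described as storing.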

The preferred API for adding a geographical restriction is to use a geoLocation clause in the YQL statement, for example:

$ curl -H "Content-Type: application/json" \
  --data '{"yql" : "select * from sources * where title contains \"pizza\" and geoLocation(location, 37.416383, -122.024683, \"20 miles\")"}' \

You can also build or modify the query programmatically by adding a GeoLocationItem anywhere in the query tree.

To allow the query location to match any position, specify a negative radius (e.g. "-1 m"). To use a position for ranking only, without any requirement for a matching position, also specify it as a ranking-only term: use the rank() operation in YQL, or a RankItem when building the query programmatically.
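The request body for a geoLocation query can also be assembled in code, which avoids hand-escaping the quotes inside the YQL string. The sketch below is illustrative only: geo_location and build_query_body are hypothetical helper names, and the rank(...) form assumes that only the first argument of rank() restricts matching while the remaining arguments contribute to ranking.

```python
import json


def geo_location(field, latitude, longitude, radius):
    """Return a geoLocation clause; radius is a string with a unit, e.g. "20 miles"."""
    return f'geoLocation({field}, {latitude}, {longitude}, "{radius}")'


def build_query_body(text_condition, geo_condition, rank_only=False):
    """Return the JSON request body as a string.

    rank_only=False: the position restricts the hits (combined with "and").
    rank_only=True:  the position is wrapped in rank() as a ranking-only term
                     (assumption about rank() semantics, see the lead-in).
    """
    if rank_only:
        where = f"rank({text_condition}, {geo_condition})"
    else:
        where = f"{text_condition} and {geo_condition}"
    return json.dumps({"yql": f"select * from sources * where {where}"})


# Restrict hits to within 20 miles of the query position.
restricting = build_query_body(
    'title contains "pizza"',
    geo_location("location", 37.416383, -122.024683, "20 miles"),
)

# Use the position for ranking only; the negative radius matches any position.
ranking_only = build_query_body(
    'title contains "pizza"',
    geo_location("location", 37.416383, -122.024683, "-1 m"),
    rank_only=True,
)
```

json.dumps takes care of escaping the inner double quotes, so the resulting string can be passed to curl --data or to any HTTP client as it is.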

There is also a set of legacy parameters in the query API that may be used to restrict query results, defining the allowed area using a position + radius or a bounding box.

  • Restrict search results using pos.ll and default radius (50km):
    $ curl -H "Content-Type: application/json" \
      --data '{"yql" : "select * from sources * where title contains \"pizza\""}' \
  • Set a different pos.radius:
    $ curl -H "Content-Type: application/json" \
      --data '{"yql" : "select * from sources * where title contains \"pizza\""}' \
  • Restricting search results using a pos.bb bounding box is also currently possible, but this feature is scheduled for removal in Vespa 8. A bounding box gives no meaningful data for distance ranking or summary-features; computing a distance or closeness to a bounding box gives undefined results.
    $ curl -H "Content-Type: application/json" \
      --data '{"yql" : "select * from sources * where title contains \"pizza\""}' \


The corresponding rank features for this example are distance(latlong) or closeness(latlong) or closeness(latlong).logscale - the last is probably the most useful when combining distance ranking with textual relevance ranking.

Summary fields

See the reference for rendering options.

If the request specifies a position, the distance to this position is calculated and rendered in fieldname.distance. For documents with multiple positions in the attribute, the distance to the nearest position is returned.


"x" and "y" are integers - these are in millionths of a degree - about 10 cm. Also note which is which of "x" and "y":

world map

It's just putting a normal coordinate system on top of the world map, so "x" is the longitude (east-west) and "y" the latitude (north-south).

The third summary field is the distance, also as an integer, and also in millionths of degrees. When converting to internal units (millionths of degrees), the Earth polar radius is used, so degrees = 180.0 * meters / (Math.PI * 6356752.0); is the basic conversion formula.
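The conversion formula above can be sketched in Python as follows. The function names are hypothetical; the constants come from the text (polar radius 6356752.0 m, one internal unit per millionth of a degree). Rounding to the nearest integer is an assumption made here because the summary values are integers.

```python
import math

EARTH_POLAR_RADIUS_M = 6356752.0   # polar radius used in the formula above
UNITS_PER_DEGREE = 1_000_000       # internal units are millionths of a degree


def meters_to_degrees(meters):
    """degrees = 180.0 * meters / (pi * polar radius)"""
    return 180.0 * meters / (math.pi * EARTH_POLAR_RADIUS_M)


def meters_to_internal(meters):
    """Convert meters to integer internal units (rounding is an assumption)."""
    return round(meters_to_degrees(meters) * UNITS_PER_DEGREE)


def internal_to_meters(units):
    """Inverse conversion, from internal units back to meters."""
    degrees = units / UNITS_PER_DEGREE
    return degrees * math.pi * EARTH_POLAR_RADIUS_M / 180.0


# The default 50 km radius expressed in internal units, and back again.
radius_units = meters_to_internal(50_000)
radius_meters = internal_to_meters(radius_units)
```

With these constants one meter corresponds to roughly 9 internal units, which is why a single unit is described as about 10 cm.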


Instead of sending the position and the computed distance in a summary field, it may be more practical to add some of the associated rank features as summary-features. In particular, distance(fieldname).latitude and distance(fieldname).longitude give the geographical coordinates directly, in degrees. This is especially useful when accessing feature values programmatically from a searcher for further processing.

Using multiple position fields

For some applications, it can be useful to have several position attributes that may be searched. For example, an address book application could use positions for home address and work address. These can be declared in the schema file without any special considerations, but need some extra handling on the query side. A single query item can only search in one of the position attributes, and with the legacy API you must specify which attribute with a pos.attribute query parameter. To search across several fields, use YQL to combine several geoLocation items inside an or clause, or combine several fields into a combined array field:

schema address {
    document address {
        field homeaddress type string {
            indexing: summary | index
        }
        field homelatlong type position {
            indexing: attribute
        }
        field workaddress type string {
            indexing: summary | index
        }
        field worklatlong type position {
            indexing: attribute
        }
        field bothlatlong type array<position> {
            indexing: attribute
        }
    }
    field bothaddress type string {
        indexing: input homeaddress . " " . input workaddress | index
    }
}

Here we assume that the home fields will contain the address and position of your house, the work fields the address and position of your workplace, while the bothlatlong field is assumed to be filled with the positions of both house and workplace (before feeding data into Vespa, or in a document processor during feeding). In a query it's then possible to say
which is unlikely to give many hits, since it's mostly a business district around Yahoo! headquarters, while

would show lots of people working in Sunnyvale; use pos.attribute=bothlatlong for cases where it's uncertain if home address or work address position was wanted.
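For the YQL alternative mentioned in this section, combining several geoLocation items inside an or clause, the where-clause can be generated from a list of position fields. This is a minimal sketch; any_field_near is a hypothetical helper name, and the field names are those of the address schema in this section.

```python
def any_field_near(fields, latitude, longitude, radius):
    """Return a parenthesized "or" of geoLocation clauses, one per position field."""
    if not fields:
        raise ValueError("at least one position field is required")
    clauses = [
        f'geoLocation({field}, {latitude}, {longitude}, "{radius}")'
        for field in fields
    ]
    return "(" + " or ".join(clauses) + ")"


# Match people who either live or work near the query position.
condition = any_field_near(
    ["homelatlong", "worklatlong"], 37.416383, -122.024683, "20 miles"
)
yql = f"select * from sources * where {condition}"
```

The parentheses keep the or group intact when the condition is later combined with other terms using and.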

Distance to path

This example provides an overview of the DistanceToPath rank feature. This feature matches document locations to a path given in the query. Not only does this feature return the closest distance for each document to the path, it also includes the length traveled along the path before reaching the closest point, or intersection. This feature has been nicknamed the gas feature because of its obvious use case of finding gas stations along a planned trip.

In this example we have been traveling from the US to Bangalore, and we are now planning our trip back. We have decided to rent a car in Bangalore that we are to return upon arrival at the airport in Chennai. We are already quite hungry and wish to stop for a meal once we are outside of town. To avoid having to pay an additional fueling premium, we also wish to refuel just before reaching the airport. We need to figure out what roads to take, what restaurants are available outside of Bangalore, and what fuel stations are available once we get close to Chennai. In figure 1 we have plotted our trip from Bangalore to the airport:

Figure 1: Trip from Bangalore to the airport

If we search for restaurants along the path, we only see a small subset of all restaurants present in the window of our quite large map. In figure 2 you see how the most relevant results are actually all in Bangalore or Chennai:

Figure 2: Most relevant results

To find the best results, move the map window to just about where we expect to be eating, and redo the search:

redo search with adjusted map

This has to be done similarly for finding a gas station near the airport. This illustrates searching for restaurants in a smaller window along the planned trip without DistanceToPath. Next, we outline how DistanceToPath can be used to quickly and easily improve this type of planning to be more convenient for the user.

The nature of this feature requires that the search corpus contains documents with position data. A searcher component needs to be written that is able to pass paths with the queries, and these paths must lie in the same coordinate space as the searchable documents. Finally, a rank-profile needs to be defined that scores documents according to how well they match some target distance traveled while at the same time lying close "enough" to the path.

Query Syntax

This document does not describe how to write a searcher plugin for the JDisc Container; refer to the container documentation for that. However, let us review the syntax expected by DistanceToPath. As noted in the rank features reference, the path is supplied as a query parameter by name of the feature and the path keyword:


Here name has to match the name of the position attribute that holds the positions data.

The path itself is parsed as a list of N coordinate pairs that together form N-1 line segments:

$$(x_1,y_1) \rightarrow (x_2,y_2), (x_2,y_2) \rightarrow (x_3,y_3), (…), (x_{N-1},y_{N-1}) \rightarrow (x_N,y_N)$$
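To make the two outputs of the feature concrete, the following Python sketch splits a path of N points into its N-1 segments and computes, for one document position, the closest distance to the path and the normalized length traveled along the path up to the closest point. It uses plain planar geometry and is an illustration of the idea only, not Vespa's implementation; the function names are hypothetical.

```python
import math


def segments(path):
    """Turn N points [(x1, y1), ..., (xN, yN)] into N-1 line segments."""
    return list(zip(path, path[1:]))


def distance_to_path(point, path):
    """Return (distance, traveled) for a point relative to a path.

    distance: shortest planar distance from the point to any segment.
    traveled: fraction (0..1) of the total path length walked before
              reaching the closest point on the path.
    """
    segs = segments(path)
    lengths = [math.dist(a, b) for a, b in segs]
    total = sum(lengths)
    best_distance, best_traveled = math.inf, 0.0
    walked = 0.0
    px, py = point
    for ((ax, ay), (bx, by)), length in zip(segs, lengths):
        if length == 0.0:
            t = 0.0  # degenerate segment: both end points coincide
        else:
            # Project the point onto the segment and clamp to its end points.
            t = ((px - ax) * (bx - ax) + (py - ay) * (by - ay)) / (length * length)
            t = max(0.0, min(1.0, t))
        closest = (ax + t * (bx - ax), ay + t * (by - ay))
        d = math.dist(point, closest)
        if d < best_distance:
            best_distance = d
            best_traveled = (walked + t * length) / total if total > 0 else 0.0
        walked += length
    return best_distance, best_traveled


# An L-shaped path with two segments and a point near the first one.
path = [(0, 0), (10, 0), (10, 10)]
distance, traveled = distance_to_path((5, 3), path)
```

For real coordinates the inputs would need to be in the same coordinate space as the document positions, as required above.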

Rank profile

If we were to disregard our scenario for a few moments, we could suggest the following rank profile:

rank-profile default {
    first-phase {
        expression: nativeRank
    }
    second-phase {
        expression: firstPhase * if (distanceToPath(ll).distance < 10000, 1, 0)
    }
}

This profile first ranks all documents according to Vespa's nativeRank feature, and then does a second pass over the top 100 results and reorders these based on their distance to our path. If a document lies within 10000 internal units of our path (roughly 1 km, at about 10 units per meter) it retains its relevancy, otherwise its relevancy is set to 0. Such a rank profile would indeed solve the current problem, but Vespa's ranking model allows us to take this a lot further.

The following is a rank profile that ranks documents according to a query-specified target distance to path and distance traveled:

rank-profile default {
    first-phase {
        expression {
            max(0,    query(distance) - distanceToPath(ll).distance) *
            (1 - fabs(query(traveled) - distanceToPath(ll).traveled))
        }
    }
}

The expression has two components. The first determines a rank based on the document's distance to the given path as compared to the query parameter distance. If the allowed distance is exceeded, this component's contribution is 0. The distance contribution is then multiplied by one minus the absolute difference between the actual distance traveled and the query parameter traveled, so documents closer to the target point along the path score higher. In short, this profile will include all documents that lie close enough to the path, ranked according to their actual distance and traveled measure.
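The behaviour of this expression is easy to check outside Vespa. The Python sketch below mirrors the rank expression term by term; path_score is a hypothetical name, and the sample values are made up for illustration.

```python
def path_score(doc_distance, doc_traveled, query_distance, query_traveled):
    """Mirror of the rank expression:

    max(0, query(distance) - distanceToPath(ll).distance) *
    (1 - fabs(query(traveled) - distanceToPath(ll).traveled))
    """
    distance_part = max(0.0, query_distance - doc_distance)
    traveled_part = 1.0 - abs(query_traveled - doc_traveled)
    return distance_part * traveled_part


# Close to the path and exactly at the target point along it: full contribution.
on_target = path_score(200, 0.1, query_distance=1000, query_traveled=0.1)

# Same distance to the path, but near the other end of it: heavily reduced.
far_along = path_score(200, 0.9, query_distance=1000, query_traveled=0.1)

# Further from the path than the allowed distance: score is 0.
too_far = path_score(1500, 0.1, query_distance=1000, query_traveled=0.1)
```

Because traveled is normalized to the range 0 to 1, the second factor never drops below 0, so the product is 0 only when the distance limit is exceeded or the document sits at the opposite end of the path from the target.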


For the sake of this example, assume that we have implemented a custom path searcher that is able to pass the path found by the user's initial directions query to Vespa using the query syntax above. There are then two more parameters that must be supplied by the user: distance and traveled. Vespa expects these parameters to be supplied in a scale compatible with the feature's output, so user-supplied values should probably also be mapped to that scale by the container plugin. The feature's distance output is given in Vespa's internal resolution, which is approximately 10 units per meter. The traveled output is a normalized number between 0 and 1, where 0 represents the beginning of the path, and 1 is the end of the path.

This illustrates how these parameters can be used to return the most appropriate hits for our scenario. Note that the figures only show the top hit for each query:

Top tip 1 Top tip 2
  1. Searching for restaurants with the DistanceToPath feature. distance = 1000, traveled = 0.1
  2. Searching for gas stations with the DistanceToPath feature. distance = 1000, traveled = 0.9