# Geo Search

To model position(s) in documents, use field(s) of type position. Use this to limit hits to within an area, or use the distance from a position in rank functions. Sample schema:

```
schema local {
    document local {
        field title type string {
            indexing: index
        }
        field location type position {
            indexing: attribute
        }
    }
    fieldset default {
        fields: title
    }
}
```

Sample document:

```json
[
    {
        "fields": {
            "title":    "Random pizza place",
            "location": "N37.401;W121.996"
        }
    }
]
```


## Restrict

The preferred API for adding a geographical restriction is a geoLocation clause in the YQL statement, for example:

```shell
$ curl -H "Content-Type: application/json" \
    --data '{"yql" : "select * from sources * where title contains \"pizza\" and geoLocation(location, 37.416383, -122.024683, \"20 miles\");"}' \
    http://localhost:8080/search/
```

You can also build or modify the query programmatically by adding a GeoLocationItem anywhere in the query tree.

To allow the query location to match any position, specify a negative radius (e.g. "-1 m"). To use a position for ranking only (without any requirement for a matching position), you should also specify it as a ranking-only term: use the rank() operation in YQL, or a RankItem when building the query programmatically.

There is also a set of legacy parameters in the query API that may be used to restrict query results, defining the allowed area using a position + radius or a bounding box.

- Restrict search results using pos.ll and the default radius (50 km):

```shell
$ curl -H "Content-Type: application/json" \
    --data '{"yql" : "select * from sources * where title contains \"pizza\";"}' \
    "http://localhost:8080/search/?pos.ll=N37.416383%3BW122.024683"
```

- Restrict search results using pos.ll and an explicit pos.radius (quote the URL so the shell does not interpret the `&`):

```shell
$ curl -H "Content-Type: application/json" \
    --data '{"yql" : "select * from sources * where title contains \"pizza\";"}' \
    "http://localhost:8080/search/?pos.ll=N37.416383%3BW122.024683&pos.radius=5km"
```

- Restricting search results with a pos.bb bounding box is also currently possible, but this feature is scheduled for removal in Vespa 8. A bounding box gives no meaningful data for distance ranking or summary-features; trying to compute a distance or closeness to a bounding box gives undefined results.

```shell
$ curl -H "Content-Type: application/json" \
    --data '{"yql" : "select * from sources * where title contains \"pizza\";"}' \
    "http://localhost:8080/search/?pos.bb=n=37.44899,s=37.3323,e=-121.98241,w=-122.06566"
```


## Rank

The corresponding rank features for this example are distance(location), closeness(location) and closeness(location).logscale. The last one is probably the most useful when combining distance ranking with textual relevance ranking.

## Summary fields

See the reference for rendering options.

If the request specifies a position, the distance to this position is calculated and rendered in fieldname.distance. For documents with multiple positions in the attribute, the distance to the nearest position is returned.

### X/Y

"x" and "y" are integers in millionths of a degree, which is about 10 cm. Also note which is which of "x" and "y":

It simply puts a normal coordinate system on top of the world map, so "x" is the longitude (east-west) and "y" is the latitude (north-south).
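To make the x/y convention concrete, here is a minimal Java sketch (Java 16+ for the record) that turns the "N37.401;W121.996" string format from the sample document into the integer x/y pair described above. The class PositionParser and its methods are hypothetical helpers, not Vespa API, and the sketch assumes the input is always a hemisphere letter followed by decimal degrees, latitude first.

```java
// Hypothetical helper (not Vespa API): parse "<N|S><lat>;<E|W><lon>" in decimal degrees
// into integer x/y in millionths of a degree, where x = longitude and y = latitude.
final class PositionParser {

    /** x = longitude, y = latitude, both in millionths of a degree. */
    record XY(long x, long y) {}

    static XY parse(String position) {
        String[] parts = position.trim().split(";");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected 'lat;lon', got: " + position);
        }
        double lat = signedDegrees(parts[0], 'N', 'S');
        double lon = signedDegrees(parts[1], 'E', 'W');
        // Note the order: x is longitude (east-west), y is latitude (north-south).
        return new XY(Math.round(lon * 1_000_000), Math.round(lat * 1_000_000));
    }

    /** Turns e.g. "W121.996" into -121.996, given the positive and negative hemisphere letters. */
    private static double signedDegrees(String part, char positive, char negative) {
        String s = part.trim();
        char hemisphere = Character.toUpperCase(s.charAt(0));
        double value = Double.parseDouble(s.substring(1));
        if (hemisphere == positive) return value;
        if (hemisphere == negative) return -value;
        throw new IllegalArgumentException("Unexpected hemisphere '" + hemisphere + "' in: " + part);
    }
}
```

For the sample document, `PositionParser.parse("N37.401;W121.996")` gives x = -121996000 and y = 37401000: the western longitude becomes a negative x.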

The third summary field is the distance, also an integer in millionths of a degree. When converting from meters to internal units, the Earth's polar radius (6356752 m) is used, so the basic conversion formula is `degrees = 180.0 * meters / (Math.PI * 6356752.0)`, and the internal value is the number of degrees multiplied by one million.
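A minimal Java sketch of this conversion, in both directions, assuming the formula above is applied as-is. GeoUnits and its methods are hypothetical helpers for illustration (for example, to turn a rendered distance value back into meters), not part of Vespa's API.

```java
// Hypothetical helper (not Vespa API): convert between meters and internal units
// (millionths of a degree) using the Earth's polar radius, as in the formula above.
final class GeoUnits {

    /** Earth polar radius in meters. */
    static final double POLAR_RADIUS_METERS = 6356752.0;

    static double metersToDegrees(double meters) {
        return 180.0 * meters / (Math.PI * POLAR_RADIUS_METERS);
    }

    static double degreesToMeters(double degrees) {
        return degrees * Math.PI * POLAR_RADIUS_METERS / 180.0;
    }

    /** Meters to internal units, rounded to the nearest integer. */
    static long metersToInternal(double meters) {
        return Math.round(metersToDegrees(meters) * 1_000_000);
    }

    /** Internal units (e.g. a rendered distance summary value) back to meters. */
    static double internalToMeters(long internal) {
        return degreesToMeters(internal / 1_000_000.0);
    }
}
```

With this formula, 1000 m comes out at about 9013 internal units, and 10000 internal units correspond to roughly 1.1 km.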

### summary-features

Instead of sending the position and the computed distance in a summary field, it may be more practical to add some of the associated rank features as summary-features. In particular, distance(fieldname).latitude and distance(fieldname).longitude give the geographical coordinates directly, in degrees. This is especially useful when a searcher accesses these feature values in the results programmatically for further processing.

## Using multiple position fields

For some applications, it can be useful to have several position attributes that may be searched. For example an address book application could use positions for home address and work address. This is possible to declare without any special considerations in the schema file, but needs some extra handling on the query side. A single query item can only search in one of the position attributes, and with the legacy API must specify which attribute with a pos.attribute query parameter. If you want to have a search that spans several fields, use YQL to combine several geoLocation items inside an or clause, or combine several fields into a combined array field:

```
schema address {
    document address {
        field homeaddress type string {
            indexing: summary | index
        }
        field homelatlong type position {
            indexing: attribute
        }
        field workaddress type string {
            indexing: summary | index
        }
        field worklatlong type position {
            indexing: attribute
        }
        field bothlatlong type array<position> {
            indexing: attribute
        }
    }
}
```

Here we assume that the home fields contain the address and position of your house, the work fields the address and position of your workplace, while the bothlatlong field is filled with the positions of both house and workplace (either before feeding data into Vespa, or in a document processor during feeding). In a query it is then possible to say:

```
search/?query=homeaddress:sunnyvale&pos.attribute=homelatlong&pos.ll=N37.416383%3BW122.024683&pos.radius=5km
```

This is unlikely to give very many hits, since the area is mostly a business district around Yahoo! headquarters, while

```
search/?query=workaddress:sunnyvale&pos.attribute=worklatlong&pos.ll=N37.416383%3BW122.024683&pos.radius=5km
```

would show lots of people working in Sunnyvale. Use pos.attribute=bothlatlong for cases where it is uncertain whether the home or the work position was wanted.
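The YQL alternative mentioned at the start of this section, an or over several geoLocation items, can be generated by a client or searcher. Below is a minimal Java sketch, assuming the geoLocation syntax shown earlier (field, latitude, longitude, quoted radius). MultiFieldGeoYql is a hypothetical helper, and the field names and radius are only example values.

```java
// Hypothetical helper (not Vespa API): build a YQL statement matching documents near a
// point in ANY of several position fields, by OR-ing one geoLocation clause per field.
import java.util.List;
import java.util.stream.Collectors;

final class MultiFieldGeoYql {

    static String nearAnyOf(List<String> positionFields, double lat, double lon, String radius) {
        if (positionFields.isEmpty()) {
            throw new IllegalArgumentException("At least one position field is required");
        }
        // One geoLocation(field, lat, lon, "radius") clause per position field.
        String clauses = positionFields.stream()
                .map(field -> String.format("geoLocation(%s, %s, %s, \"%s\")", field, lat, lon, radius))
                .collect(Collectors.joining(" or "));
        return "select * from sources * where (" + clauses + ");";
    }
}
```

For example, `nearAnyOf(List.of("homelatlong", "worklatlong"), 37.416383, -122.024683, "5 km")` produces a single query that matches a person living or working within 5 km of the point, without having to decide between the two fields.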

## Distance to path

This example provides an overview of the DistanceToPath rank feature. This feature matches document locations to a path given in the query. Not only does it return the closest distance from each document to the path, it also includes the length traveled along the path before reaching the closest point, or intersection. This feature has been nicknamed the gas feature because of its obvious use case of finding gas stations along a planned trip.

In this example we have been traveling from the US to Bangalore, and we are now planning our trip back. We have decided to rent a car in Bangalore that we are to return upon arrival at the airport in Chennai. We are already quite hungry and wish to stop for a meal once we are outside of town. To avoid having to pay an additional fueling premium, we also wish to refuel just before reaching the airport. We need to figure out what roads to take, what restaurants are available outside of Bangalore, and what fuel stations are available once we get close to Chennai. In figure 1 we have plotted our trip from Bangalore to the airport:

If we search for restaurants along the path, we only see a small subset of all restaurants present in the window of our quite large map. In figure 2 you see how the most relevant results are actually all in Bangalore or Chennai:

To find the best results, move the map window to just about where we expect to be eating, and redo the search:

This has to be done similarly for finding a gas station near the airport. This illustrates searching for restaurants in a smaller window along the planned trip without DistanceToPath. Next, we outline how DistanceToPath can be used to quickly and easily improve this type of planning to be more convenient for the user.

The nature of this feature requires that the search corpus contains documents with position data. A searcher component must be written that can pass paths with the queries, in the same coordinate space as the searchable documents. Finally, a rank profile needs to be defined that scores documents according to how well they match some target distance traveled while at the same time lying close "enough" to the path.

### Query Syntax

This document does not describe how to write a searcher plugin for the JDisc Container; refer to the container documentation for that. However, let us review the syntax expected by DistanceToPath. As noted in the rank features reference, the path is supplied as a query parameter named after the feature and the path keyword:

```
yql=(…)&rankproperty.distanceToPath(name).path=(x1,y1,x2,y2,…,xN,yN)
```

Here, name has to match the name of the position attribute that holds the position data.

The path itself is parsed as a list of N coordinate pairs that together form N-1 line segments:

$$(x_1,y_1) \rightarrow (x_2,y_2), (x_2,y_2) \rightarrow (x_3,y_3), (…), (x_{N-1},y_{N-1}) \rightarrow (x_N,y_N)$$

The path is not given in readable degrees, but as pairs of integers in the internal format (degrees multiplied by one million). If a transform from geographic coordinates is required, the search plugin must do it. Note that the first number in each pair (the "x") is longitude (degrees East or West), while the second (the "y") is latitude (degrees North or South). This corresponds to the usual orientation of maps, and is the opposite of the usual latitude/longitude order.
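The transform a search plugin has to do can be sketched in a few lines of Java. This sketch assumes the route arrives as {latitude, longitude} pairs in degrees; PathFormatter and its methods are hypothetical names, while the parameter layout follows the syntax given above. Depending on the HTTP client, the parentheses and commas in the value may also need URL encoding.

```java
// Hypothetical helper (not Vespa API): format a route as the distanceToPath path value
// (x1,y1,x2,y2,...,xN,yN), with x = longitude and y = latitude in millionths of a degree.
import java.util.StringJoiner;

final class PathFormatter {

    /** @param latLon route points, each as {latitude, longitude} in degrees */
    static String toPathValue(double[][] latLon) {
        if (latLon.length < 2) {
            throw new IllegalArgumentException("A path needs at least 2 points (1 segment)");
        }
        StringJoiner joiner = new StringJoiner(",", "(", ")");
        for (double[] point : latLon) {
            long x = Math.round(point[1] * 1_000_000); // longitude comes first ("x")
            long y = Math.round(point[0] * 1_000_000); // then latitude ("y")
            joiner.add(Long.toString(x)).add(Long.toString(y));
        }
        return joiner.toString();
    }

    /** Full query parameter for a position attribute named {@code attribute}, e.g. "ll". */
    static String toQueryParameter(String attribute, double[][] latLon) {
        return "rankproperty.distanceToPath(" + attribute + ").path=" + toPathValue(latLon);
    }
}
```

Swapping the order inside the loop is the whole point of the helper: the input follows the familiar latitude/longitude order, while the output follows the x/y order the feature expects.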

### Rank profile

If we were to disregard our scenario for a few moments, we could suggest the following rank profile:

```
rank-profile default {
    first-phase {
        expression: nativeRank
    }
    second-phase {
        expression: firstPhase * if (distanceToPath(ll).distance < 10000, 1, 0)
    }
}
```

This profile first ranks all documents according to Vespa's nativeRank feature, and then does a second pass over the top 100 results, re-scoring them based on their distance to our path. It is very simple: if a document lies within 10000 internal units (roughly 1 km) of our path, it retains its relevancy; otherwise its relevancy is set to 0. Such a rank profile would indeed solve the current problem, but Vespa's ranking model allows us to take this a lot further.

The following is a very simple rank profile that ranks documents according to a query-specified target distance to path and distance traveled:

```
rank-profile default {
    first-phase {
        expression {
            max(0, $distance - distanceToPath(ll).distance) * (1 - fabs($traveled - distanceToPath(ll).traveled))
        }
    }
}
```

The expression has two parts. The first component scores the document's distance to the given path against the query parameter $distance: if the allowed distance is exceeded, this component's contribution is 0. The distance contribution is then multiplied by one minus the absolute difference between the document's traveled value and the query parameter $traveled. In short, this profile includes all documents that lie close enough to the path, ranked according to their actual distance and traveled measure.
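To get a feel for how the two query parameters interact, the first-phase expression can be evaluated by hand. The Java sketch below simply mirrors the expression term by term; PathScore is a hypothetical helper for experimentation, not something Vespa runs.

```java
// Hypothetical helper (not Vespa API): evaluate
//   max(0, $distance - distanceToPath(ll).distance)
//     * (1 - fabs($traveled - distanceToPath(ll).traveled))
// for given feature values and query parameters.
final class PathScore {

    static double score(double distance, double traveled,
                        double targetDistance, double targetTraveled) {
        // 0 once the document is farther from the path than $distance allows.
        double closeness = Math.max(0.0, targetDistance - distance);
        // 1 when the closest point is exactly at $traveled, less the further away it is.
        double alongPath = 1.0 - Math.abs(targetTraveled - traveled);
        return closeness * alongPath;
    }
}
```

For example, with $distance = 1000 and $traveled = 0.1, a document 500 units from the path at traveled = 0.1 scores 500, while one at the same distance near the end of the path (traveled = 0.9) scores only about 100.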

DistanceToPath is only compatible with 2D coordinates because pathing in 1 dimension makes no sense.

### Results

For the sake of this example, assume that we have implemented a custom path searcher that passes the path found by the user's initial directions query on to Vespa's query syntax. There are then two more parameters that the user must supply: $distance and $traveled. Vespa expects these parameters in a scale compatible with the feature's output, so they should probably also be mapped by the container plugin. The feature's distance output is given in Vespa's internal resolution, which is approximately 10 units per meter. The traveled output is a normalized number between 0 and 1, where 0 represents the beginning of the path and 1 the end of the path.
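A minimal Java sketch of the mapping such a container plugin might do, under two assumptions: the user states the maximum detour in meters, and the stop point as a distance along the route plus the total route length (in the same unit). The meter conversion reuses the polar-radius formula from the X/Y section. PathParameterMapping and its methods are hypothetical; how the resulting values are attached to the query is left to the plugin.

```java
// Hypothetical helper (not Vespa API): map user-friendly inputs to the scales the
// DistanceToPath outputs use: internal units for $distance, a 0..1 fraction for $traveled.
final class PathParameterMapping {

    private static final double POLAR_RADIUS_METERS = 6356752.0;

    /** Maximum distance from the path, in meters, to internal units (millionths of a degree). */
    static long distanceParam(double maxMeters) {
        double degrees = 180.0 * maxMeters / (Math.PI * POLAR_RADIUS_METERS);
        return Math.round(degrees * 1_000_000);
    }

    /** Position along the route as a fraction of its length, clamped to [0, 1]. */
    static double traveledParam(double alongRoute, double routeLength) {
        if (routeLength <= 0) {
            throw new IllegalArgumentException("routeLength must be positive");
        }
        return Math.min(1.0, Math.max(0.0, alongRoute / routeLength));
    }
}
```

For instance, `traveledParam(35, 350)` yields 0.1, matching the restaurant search below, and `distanceParam(1000.0)` yields 9013 internal units for a 1 km detour.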

The following figures illustrate how these parameters can be used to return the most appropriate hits for our scenario. Note that the figures only show the top hit for each query:

1. Searching for restaurants with the DistanceToPath feature: $distance = 1000, $traveled = 0.1
2. Searching for gas stations with the DistanceToPath feature: $distance = 1000, $traveled = 0.9