Access Logging in JSON Format

In some applications, it may be desirable to generate jdisc access logs - which logs a line per request - in a format that is easier to parse and also supports logging custom key/value pairs from plug-in code. The Vespa JSON access log format allows the logs to be processed by a number of available tools handling JSON based (log) files. With the ability to add custom key/value pairs to the log from any Searcher, you can easily track the decisions done by searchers for given queries.

Vespa JSON Access Log Format

The Vespa JSON access log format is a format where each log event is logged as a single JSON object compressed on a single line. This makes it possible to use basic line-based tools such as grep to process the log file, but a JSON aware tool such as jq is obviously preferred.

Access log fields

The log format defines a list of fields that can be logged with every request. In addition to these fields, custom key/value pairs can be logged via Searcher code. The following fields are pre-defined by Vespa:

NameTypeDescriptionAlways present
ipstringThe IP address request came fromyes
timenumberUNIX timestamp with millisecond decimal precision (e.g. 1477828938.123)yes
durationnumberThe duration of the request in seconds with millisecond decimal precision (e.g. 0.123)yes
responsesizenumberThe size of the response in bytesyes
codenumberThe HTTP status code returnedyes
methodstringThe HTTP method used (e.g. 'GET')yes
uristringThe request URI from path and beyond (e.g. '/search?query=test')yes
versionstringThe HTTP version (e.g. 'HTTP/1.1')yes
agentstringThe user agent specified in the requestyes
hoststringThe host header provided in the requestyes
schemestringThe scheme of the requestyes
portnumberThe IP port number of the interface on which the request was receivedyes
remoteaddrstringThe IP address of the remote client if specified in HTTP headerno
remoteportstringThe port used from the remote client if specified in HTTP header no
peeraddrstringAddress of immediate client making request if different than remoteaddrno
peerportstringPort used by immediate client making request if different than remoteportno
user-principalstringThe name of the authenticated user (java.security.Principal.getName()) if principal is setno
ssl-principalstringThe name of the x500 principal if client is authenticated through SSL/TLSno
searchobjectObject holding search specific fieldsno
search.totalhitsnumberThe total number of hits for the queryno
search.hitsnumberThe hits returned in this specific responseno
attributesobjectObject holding custom key/value pairs logged in searcher.no

NOTE:IP addresses can be both IPv4 addresses in standard dotted format (e.g. 127.0.0.1) or IPv6 addresses in standard form with leading zeros omitted (e.g. 2222:1111:123:1234:0:0:0:4321).

An example log line will look like this:

{"ip":"152.200.54.243","time":920880005.023,"duration":0.122,"responsesize":9875,"code":200,"method":"GET","uri":"/search?query=test&param=value","version":"HTTP/1.1","agent":"Mozilla/4.05 [en] (Win95; I)","host":"localhost","search":{"totalhits":1234,"hits":0}}
A pretty print version of the same example:
{
  "ip": "152.200.54.243",
  "time": 920880005.023,
  "duration": 0.122,
  "responsesize": 9875,
  "code": 200,
  "method": "GET",
  "uri": "/search?query=test&param=value",
  "version": "HTTP/1.1",
  "agent": "Mozilla/4.05 [en] (Win95; I)",
  "host": "localhost",
  "search": {
    "totalhits": 1234,
    "hits": 0
  }
}
NOTE: The log format is extendable by design such that the order of the fields can be changed and new fields can be added between minor versions. Make sure any programmatic log handling is using a proper JSON processor.

Logging Remote Address/Port

In some cases when a request passes through an intermediate service, this service may add HTTP headers indicating the IP address and port of the real origin client. These values are logged as remoteaddr and remoteport respectively. Vespa will log the contents in any of the following HTTP request headers as remoteaddr: Y-RA, YahooRemoteIP, X-Forwarded-For or Client-IP. The contents of the Y-RP HTTP request header will be logged as remoteport.

If the remote address or -port differs from those initiating the HTTP request, the address and port for the immediate client making the request are logged as peeraddress and peerport respectively.

Configuring JSON Logging

In order for Vespa to use JSON access logging, add the following under the container element in services.xml:

<accesslog type='json'
           fileNamePattern='logs/vespa/qrs/access.%Y%m%d%H%M%S' />
and redeploy the application package.

File name pattern

The file name pattern is expanded using the time when the file is created. The following parts in the file name are expanded:

FieldFormatMeaningExample
%YYYYYYear2003
%mMMMonth, numeric08
%xMMMMonth, textualAug
%dddDate25
%HHHHour14
%MmmMinute30
%SssSeconds35
%sSSSMilliseconds123
%ZZTime zone-0400
%TLongSystem.currentTimeMillis1349333576093
%%%Escape percentage%

Log rotation

Apache httpd style log rotation can be configured by setting the rotationScheme. There's two alternatives for the rotationScheme, sequence and date. The rotation time is controlled by setting rotationInterval.

Sequence rotation scheme

The fileNamePattern is used for the active log file name (which in this case will often be a constant string). At rotation, this file is given the name fileNamePattern.N where N is 1 + the largest integer found by extracting the integers from all files ending by .<Integer> in the same directory

<accesslog type='json'
           fileNamePattern='logs/vespa/qrs/access.%Y%m%d%H%M%'
           rotationScheme='sequence' />

Date rotation scheme

The fileNamePattern is used for the active log file name here too, but the log files are not renamed at rotation. Instead, you must specify a time-dependent fileNamePattern so that each time a new log file is created, the name is unique. In addition, a symlink is created pointing to the active log file. The name of the symlink is specified using symlinkName.

<accesslog type='json'
           fileNamePattern='logs/vespa/qrs/access.%Y%m%d%H%M%'
           rotationScheme='date'
           symlinkName='json_access' />

Rotation interval

The time of rotation is controlled by setting rotationInterval:

<accesslog type='json'
           fileNamePattern='logs/vespa/qrs/access.%Y%m%d%H%M%'
           rotationInterval='0 60 ...'
           rotationScheme='date'
           symlinkName='json_access' />
The rotationInterval is a list of numbers specifying when to do rotation. Each element represents the number of minutes since midnight. Ending the list with '...' means continuing the arithmetic progression defined by the two last numbers for the rest of the day. E.g. "0 100 240 480 ..." is expanded to "0 100 240 480 720 960 1200"

Logging Key/Value pairs to the JSON Access Log from Searchers

To add a key/value pair to the access log from a searcher, use

query/result.getContext(true).logValue(key,value)
Such key/value pairs may be added from any thread participating in handling the query without incurring synchronization overhead.

If the same key is logged multiple times, the values written will be included in the log as an array of strings rather than a single string value.

The key/value pairs are added to the attributes object in the log.

An example log line will then look something like this:

{"ip":"152.200.54.243","time":920880005.023,"duration":0.122,"responsesize":9875,"code":200,"method":"GET","uri":"/search?query=test&param=value","version":"HTTP/1.1","agent":"Mozilla/4.05 [en] (Win95; I)","host":"localhost","search":{"totalhits":1234,"hits":0},"attributes":{"singlevalue":"value1","multivalue":["value2","value3"]}}

A pretty print version of the same example:

{
  "ip": "152.200.54.243",
  "time": 920880005.023,
  "duration": 0.122,
  "responsesize": 9875,
  "code": 200,
  "method": "GET",
  "uri": "/search?query=test&param=value",
  "version": "HTTP/1.1",
  "agent": "Mozilla/4.05 [en] (Win95; I)",
  "host": "localhost",
  "search": {
    "totalhits": 1234,
    "hits": 0
  },
  "attributes": {
    "singlevalue": "value1",
    "multivalue": [
      "value2",
      "value3"
    ]
  }
}