/cluster/v2 API reference

The cluster controller has a /cluster/v2 API for viewing and modifying a content cluster state. To find the URL to access this API, identify the cluster controller services in the application. Only the master cluster controller will be able to respond. The master cluster controller is the cluster controller alive that has the lowest index. Thus, one will typically use cluster controller 0, but if contacting it fails, try number 1 and so on. Using vespa-model-inspect:

$ vespa-model-inspect service -u container-clustercontroller

container-clustercontroller @ hostname.domain.com : admin
admin/cluster-controllers/0
    http://hostname.domain.com:19050/ (STATE EXTERNAL QUERY HTTP)
    http://hostname.domain.com:19117/ (EXTERNAL HTTP)
    tcp/hostname.domain.com:19118 (MESSAGING RPC)
    tcp/hostname.domain.com:19119 (ADMIN RPC)

In this example, there is only one clustercontroller, and the State Rest API is available on the port marked STATE and HTTP, 19050 in this example. This information can also be retrieved through the model config in the config server.

Find examples of API usage in content nodes.

HTTP requests

HTTP request	cluster/v2 operation	Description
GET	List cluster and nodes. Get cluster, node or disk states.
	List content clusters	/cluster/v2/
	Get cluster state and list service types within cluster	/cluster/v2/<cluster>
	List nodes per service type for cluster	/cluster/v2/<cluster>/<service-type>
	Get node state	/cluster/v2/<cluster>/<service-type>/<node>
PUT	Set node state
	Set node user state	/cluster/v2/<cluster>/<service-type>/<node>

Node state

Content and distributor nodes have state:

State	Description
`Up`	The node is up and available to keep buckets and serve requests.
`Down`	The node is not available, and can not be used.
`Stopping`	This node is stopping and is expected to be down soon. This state is typically only exposed to the cluster controller to tell why the node stopped. The cluster controller will expose the node as down or in maintenance mode for the rest of the cluster. This state is thus not seen by the distribution algorithm.
`Maintenance`	This node is temporarily unavailable. The node is available for bucket placement, so redundancy is lower. Using this mode, new replicas of the documents stored on this node will not be created, allowing the node to be down with less of a performance impact on the rest of the cluster. This mode is typically used to mask a down state during controlled node restarts, or by an administrator that need to do some short maintenance work, like upgrading software or restart the node.
`Retired`	A retired node is available and serves requests. This state is used to remove nodes while keeping redundancy. Buckets are moved to other nodes (with low priority), until empty. Special considerations apply when using grouped distribution as buckets are not necessarily removed.

Distributor nodes start / transfer buckets quickly and are hence not in maintenance or retired.

Refer to examples of manipulating states.

Types

Type	Spec	Description
cluster	<identifier>	The name given to a content cluster in a Vespa application.
description	.*	Description can contain anything that is valid JSON. However, as the information is presented in various interfaces, some which may present reasons for all the states in a cluster or similar, keeping it short and to the point makes it easier to fit the information neatly into a table and get a better cluster overview.
group-spec	<identifier>(\.<identifier>)*	The hierarchical group assignment of a given content node. This is a dot separated list of identifiers given in the application services.xml configuration.
node	[0-9]+	The index or distribution key identifying a given node within the context of a content cluster and a service type.
service-type	(distributor\|storage)	The type of the service to look at state for, within the context of a given content cluster.
state-disk	(up\|down)	One of the valid disk states.
state-unit	up \| stopping \| down	The cluster controller fetches states from all nodes, called unit states. States reported from the nodes are either `up` or `stopping`. If the node can not be reached, a `down` state is assumed. This means, the cluster controller detects failed nodes. The subsequent generated states will have nodes in `down`, and the ideal state algorithm will redistribute buckets of documents.
state-user	up \| down \| maintenance \| retired	Use tools for user state management. Retire a node from a cluster - use `retired` to move buckets to other nodes Short-lived maintenance work - use `maintenance` to avoid merging buckets to other nodes Fail a bad node. The cluster controller or an operator can set a node `down`
state-generated	up \| down \| maintenance \| retired	The cluster controller generates the cluster state from the `unit` and `user` states, over time. The generated state is called the cluster state.

Request parameters

Parameter Type Description

recursive

number

Parameter	Type	Description
recursive	number	Number of levels, or `true` for all levels. Examples: Use `recursive=1` for a node request to also see all data use `recursive=2` to see all the node data within each service type In recursive mode, you will see the same output as found in the spec below. However, where there is a `{ "link" : "<url-path>" }` element, this element will be replaced by the content of that request, given a recursive value of one less than the request above.

Number of levels, or true for all levels. Examples:

Use recursive=1 for a node request to also see all data
use recursive=2 to see all the node data within each service type

In recursive mode, you will see the same output as found in the spec below. However, where there is a { "link" : "<url-path>" } element, this element will be replaced by the content of that request, given a recursive value of one less than the request above.

HTTP status codes

Non-exhaustive list of status codes:

Code	Description
200	OK.
303	Cluster controller not master - master known. This error means communicating with the wrong cluster controller. This returns a standard HTTP redirect, so the HTTP client can automatically redo the request on the correct cluster controller. As the cluster controller available with the lowest index will be the master, the cluster controllers are normally queried in index order. Hence, it is unlikely to ever get this error, but rather fail to connect to the cluster controller if it is not the current master. HTTP/1.1 303 See Other Location: http://<master>/<current-request> Content-Type: application/json { "message" : "Cluster controller index not master. Use master at index index. }
503	Cluster controller not master - unknown or no master. This error is used if the cluster controller asked is not master, and it doesn't know who the master is. This can happen, e.g. in a network split, where cluster controller 0 no longer can reach cluster controller 1 and 2, in which case cluster controller 0 knows it is not master, as it can't see the majority, and cluster controller 1 and 2 will vote 1 to master. HTTP/1.1 503 Service Unavailable Content-Type: application/json { "message" : "No known master cluster controller currently exist." }

Code

Description

200

OK.

303

Cluster controller not master - master known.

This error means communicating with the wrong cluster controller. This returns a standard HTTP redirect, so the HTTP client can automatically redo the request on the correct cluster controller.

As the cluster controller available with the lowest index will be the master, the cluster controllers are normally queried in index order. Hence, it is unlikely to ever get this error, but rather fail to connect to the cluster controller if it is not the current master.

HTTP/1.1 303 See Other
Location: http://<master>/<current-request>
Content-Type: application/json

{
    "message" : "Cluster controller index not master. Use master at index index.
}

503

Cluster controller not master - unknown or no master.

This error is used if the cluster controller asked is not master, and it doesn't know who the master is. This can happen, e.g. in a network split, where cluster controller 0 no longer can reach cluster controller 1 and 2, in which case cluster controller 0 knows it is not master, as it can't see the majority, and cluster controller 1 and 2 will vote 1 to master.

HTTP/1.1 503 Service Unavailable
Content-Type: application/json

{
    "message" : "No known master cluster controller currently exist."
}

Response format

Responses are in JSON format, with the following fields:

Field	Description
message	An error message — included for failed requests.
ToDo	Add more fields here.