The cluster controller has a /cluster/v2 API for viewing and modifying a content cluster state. To find the URL to access this API, identify the cluster controller services in the application. Only the master cluster controller will be able to respond. The master cluster controller is the cluster controller alive that has the lowest index. Thus, one will typically use cluster controller 0, but if contacting it fails, try number 1 and so on. Using vespa-model-inspect:
$ vespa-model-inspect service -u container-clustercontroller container-clustercontroller @ hostname.domain.com : admin admin/cluster-controllers/0 http://hostname.domain.com:19050/ (STATE EXTERNAL QUERY HTTP) http://hostname.domain.com:19117/ (EXTERNAL HTTP) tcp/hostname.domain.com:19118 (MESSAGING RPC) tcp/hostname.domain.com:19119 (ADMIN RPC)
In this example, there is only one clustercontroller, and the State Rest API is available on the port marked STATE and HTTP, 19050 in this example. This information can also be retrieved through the model config in the config server.
Find examples of API usage in content nodes.
HTTP request | cluster/v2 operation | Description |
---|---|---|
GET |
List cluster and nodes. Get cluster, node or disk states. |
|
List content clusters |
/cluster/v2/ |
|
Get cluster state and list service types within cluster |
/cluster/v2/<cluster> |
|
List nodes per service type for cluster |
/cluster/v2/<cluster>/<service-type> |
|
Get node state |
/cluster/v2/<cluster>/<service-type>/<node> |
|
PUT |
Set node state |
|
Set node user state |
/cluster/v2/<cluster>/<service-type>/<node> |
Content and distributor nodes have state:
State | Description |
---|---|
|
The node is up and available to keep buckets and serve requests. |
|
The node is not available, and can not be used. |
|
This node is stopping and is expected to be down soon. This state is typically only exposed to the cluster controller to tell why the node stopped. The cluster controller will expose the node as down or in maintenance mode for the rest of the cluster. This state is thus not seen by the distribution algorithm. |
|
This node is temporarily unavailable. The node is available for bucket placement, so redundancy is lower. Using this mode, new replicas of the documents stored on this node will not be created, allowing the node to be down with less of a performance impact on the rest of the cluster. This mode is typically used to mask a down state during controlled node restarts, or by an administrator that need to do some short maintenance work, like upgrading software or restart the node. |
|
A retired node is available and serves requests. This state is used to remove nodes while keeping redundancy. Buckets are moved to other nodes (with low priority), until empty. Special considerations apply when using grouped distribution as buckets are not necessarily removed. |
Distributor nodes start / transfer buckets quickly
and are hence not in maintenance
or retired
.
Refer to examples of manipulating states.
Type | Spec | Description |
---|---|---|
cluster |
<identifier> | The name given to a content cluster in a Vespa application. |
description |
.* | Description can contain anything that is valid JSON. However, as the information is presented in various interfaces, some which may present reasons for all the states in a cluster or similar, keeping it short and to the point makes it easier to fit the information neatly into a table and get a better cluster overview. |
group-spec |
<identifier>(\.<identifier>)* | The hierarchical group assignment of a given content node. This is a dot separated list of identifiers given in the application services.xml configuration. |
node |
[0-9]+ | The index or distribution key identifying a given node within the context of a content cluster and a service type. |
service-type |
(distributor|storage) | The type of the service to look at state for, within the context of a given content cluster. |
state-disk |
(up|down) | One of the valid disk states. |
state-unit |
up | stopping | down |
The cluster controller fetches states from all nodes, called unit states.
States reported from the nodes are either
This means, the cluster controller detects failed nodes.
The subsequent generated states will have nodes in |
state-user |
up | down | maintenance | retired |
Use tools for user state management.
|
state-generated |
up | down | maintenance | retired |
The cluster controller generates the cluster state
from the |
Parameter | Type | Description |
---|---|---|
recursive | number |
Number of levels, or
In recursive mode, you will see the same output as found in the spec below.
However, where there is a |
Non-exhaustive list of status codes:
Code | Description |
---|---|
200 | OK. |
303 |
Cluster controller not master - master known. This error means communicating with the wrong cluster controller. This returns a standard HTTP redirect, so the HTTP client can automatically redo the request on the correct cluster controller. As the cluster controller available with the lowest index will be the master, the cluster controllers are normally queried in index order. Hence, it is unlikely to ever get this error, but rather fail to connect to the cluster controller if it is not the current master. HTTP/1.1 303 See Other Location: http://<master>/<current-request> Content-Type: application/json { "message" : "Cluster controller index not master. Use master at index index. } |
503 |
Cluster controller not master - unknown or no master. This error is used if the cluster controller asked is not master, and it doesn't know who the master is. This can happen, e.g. in a network split, where cluster controller 0 no longer can reach cluster controller 1 and 2, in which case cluster controller 0 knows it is not master, as it can't see the majority, and cluster controller 1 and 2 will vote 1 to master. HTTP/1.1 503 Service Unavailable Content-Type: application/json { "message" : "No known master cluster controller currently exist." } |
Responses are in JSON format, with the following fields:
Field | Description |
---|---|
message | An error message — included for failed requests. |
ToDo | Add more fields here. |