The Cloud Config System can be set up with one or more configuration servers (config servers). A config server uses ZooKeeper as the distributed data store for the configuration system. In addition, each node runs a config proxy to cache configuration data. Find an overview at services start.
Tools to access config:
To access config from a node not running the config system (e.g. doing feeding via the Document API), use the environment variable VESPA_CONFIG_SOURCES:
$ export VESPA_CONFIG_SOURCES="myadmin0.mydomain.com:12345,myadmin1.mydomain.com:12345"
Alternatively, for Java programs, use the system property configsources and set it programmatically or on the command line with the -D option to Java. The syntax for the value is the same as for VESPA_CONFIG_SOURCES.
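As a small illustration of the value syntax, the sketch below joins a list of hostnames into the host:port,host:port form. The helper name config_sources and the my-feeder.jar placeholder are made up for this example; the hosts and port are the ones used above.

```shell
# Hypothetical helper: join hostnames into the host:port,host:port syntax
# shared by VESPA_CONFIG_SOURCES and the configsources system property.
config_sources() {
    port="$1"; shift
    out=""
    for host in "$@"; do
        # Prepend a comma only when something has already been collected.
        out="${out:+$out,}$host:$port"
    done
    printf '%s\n' "$out"
}

VESPA_CONFIG_SOURCES="$(config_sources 12345 myadmin0.mydomain.com myadmin1.mydomain.com)"
export VESPA_CONFIG_SOURCES

# For a Java program, the same value can be passed as a system property
# (my-feeder.jar is a placeholder, not a real artifact):
#   java -Dconfigsources="$VESPA_CONFIG_SOURCES" -jar my-feeder.jar
```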
The config server runs in a JVM with a default heap size of 1.5 GB (which can be changed with a setting). It writes a transaction log that is regularly purged of old items, so little disk space is required. Note that running on a server with a lot of disk I/O will adversely affect performance and is not recommended.
The config server RPC port can be changed by setting VESPA_CONFIGSERVER_RPC_PORT on all nodes in the system. Changing HTTP port requires changing the port in $VESPA_HOME/conf/configserver-app/services.xml:
<http>
    <server port="12345" id="configserver" />
</http>
When deploying, use the -p option if the port has been changed from the default.
# services.xml
<admin version="2.0">
    <configservers>
        <configserver hostalias="admin0" />
        <configserver hostalias="admin1" />
        <configserver hostalias="admin2" />
    </configservers>
</admin>

# hosts.xml
<hosts>
    <host name="myserver0.mydomain.com">
        <alias>admin0</alias>
    </host>
    <host name="myserver1.mydomain.com">
        <alias>admin1</alias>
    </host>
    <host name="myserver2.mydomain.com">
        <alias>admin2</alias>
    </host>
</hosts>

# VESPA_CONFIGSERVERS - set on all nodes in the application
VESPA_CONFIGSERVERS=myserver0.mydomain.com,myserver1.mydomain.com,myserver2.mydomain.com
Refer to the admin model reference. In addition, VESPA_CONFIGSERVERS must be set on all nodes. This is a comma- or whitespace-separated list with the hostnames of all config servers, like myhost1.mydomain.com,myhost2.mydomain.com,myhost3.mydomain.com.
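To make the list syntax concrete, here is a minimal sketch that splits such a value into one hostname per line, accepting commas, whitespace, or a mix of both. The function name split_configservers is hypothetical and only serves this example; it is not a Vespa tool.

```shell
# Hypothetical helper: print one config server hostname per line.
split_configservers() {
    # Turn commas into spaces, then rely on the shell's whitespace splitting.
    for host in $(printf '%s' "$1" | tr ',' ' '); do
        printf '%s\n' "$host"
    done
}

split_configservers "myhost1.mydomain.com,myhost2.mydomain.com myhost3.mydomain.com"
```

A script can use a helper like this to check, for instance, that the list has the same number of entries on every node.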
When there are multiple config servers, the config clients (usually the config proxy, unless the client is configured to use another config source) pick a config server based on a hash of their hostname and the number of config servers. This spreads the load evenly across the config servers. The config clients are fault-tolerant and switch to another config server if the current one is unavailable or there is an error in the configuration it returns. With only one config server configured, the client continues to use it in case of errors.
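The selection idea can be sketched as follows. This is not the config client's actual code: the hash function it uses is not stated here, so the POSIX cksum CRC stands in for it, and the function name pick_configserver is hypothetical.

```shell
# Hypothetical sketch: choose a config server from a hash of the client hostname.
# Usage: pick_configserver <client hostname> <config server>...
pick_configserver() {
    client="$1"; shift
    [ $# -gt 0 ] || return 1
    # cksum prints "<crc> <byte count>"; keep only the CRC as the hash value.
    hash=$(printf '%s' "$client" | cksum | cut -d ' ' -f 1)
    # Drop (hash mod number-of-servers) entries; the next one is the choice.
    shift $(( hash % $# ))
    printf '%s\n' "$1"
}

pick_configserver node17.mydomain.com \
    myserver0.mydomain.com myserver1.mydomain.com myserver2.mydomain.com
```

The same client always maps to the same server as long as the list is unchanged, while different clients are spread across the list.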
For the system to tolerate n failures, ZooKeeper by design requires (2*n)+1 nodes. Consequently, use an odd number of nodes, since an even number tolerates no more failures than the odd number just below it. A fault-tolerant config system therefore needs a minimum of 3 nodes.
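The arithmetic can be checked with a short sketch. The two function names are made up for this example; the formulas are the (2*n)+1 rule above and its inverse.

```shell
# Nodes required to tolerate a given number of failures: (2*n)+1.
nodes_for_failures() {
    echo $(( 2 * $1 + 1 ))
}

# Failures tolerated by a given number of nodes: a majority must survive.
tolerated_failures() {
    echo $(( ($1 - 1) / 2 ))
}

nodes_for_failures 1    # prints 3
tolerated_failures 3    # prints 1
tolerated_failures 4    # prints 1, no gain over 3 nodes
tolerated_failures 5    # prints 2
```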
It is important to remember that even when using one config server, the config system will not fail at once if that server fails. This is because all nodes run a config proxy that caches every known config and serves it to the components on that node. However, restarting a node when half or more of the config servers are unavailable will lead to a failure of that node, since restarting a node means restarting the config proxy.
Add config server nodes for increased fault tolerance. There is no need to restart Vespa on other nodes during the procedure - this ensures uninterrupted functionality of the application. Procedure:
- Install Vespa on the new config server nodes
- Add config server hosts to VESPA_CONFIGSERVERS on all the nodes
- Restart the config server on the old config server hosts and start it on the new ones
- Update services.xml and hosts.xml with the new set of config servers, then vespa-deploy prepare and vespa-deploy activate
Scaling up by Majority
When increasing from 1 to 3 nodes or from 3 to 7, the blank nodes constitute a majority in the cluster. After restarting the config servers, they will not always contain the old application data, because the blank nodes might win the election, depending on restart timing. In that case, restore a correct data set by repeating vespa-deploy prepare and vespa-deploy activate. To avoid losing the old application data this way, for instance to keep the history, scale up in smaller steps so that the blank nodes never form a majority. Example:
- Scale from 1 to 2
- Scale from 2 to 3
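Whether a given step is risky can be checked with a little arithmetic. The sketch below, with the made-up function name blank_nodes_have_majority, tests whether the newly added (blank) nodes outnumber half of the resulting cluster.

```shell
# Succeeds if the blank nodes form a strict majority after scaling
# from $1 to $2 config servers.
blank_nodes_have_majority() {
    old="$1"; new="$2"
    blank=$(( new - old ))
    [ $(( 2 * blank )) -gt "$new" ]
}

for step in "1 3" "3 7" "1 2" "2 3"; do
    set -- $step
    if blank_nodes_have_majority "$1" "$2"; then
        echo "$1 -> $2: blank nodes form a majority, old data may be lost"
    else
        echo "$1 -> $2: blank nodes do not form a majority"
    fi
done
```

The first two steps are the risky ones named above; the last two are the smaller steps of the example.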
Remove config servers from a cluster:
- Remove config server hosts from VESPA_CONFIGSERVERS on all Vespa nodes
- Restart config servers on the new subset
- Verify that these nodes have data, by using vespa-get-config or zkCli.sh ls (see below). If they are blank, redo vespa-deploy prepare and vespa-deploy activate. Also see health checks.
- Pull removed hosts from production
ZooKeeper handles data consistency across multiple config servers. The config server Java application runs a ZooKeeper server, embedded with an RPC frontend that the other nodes use. ZooKeeper stores data internally in nodes that can have sub-nodes, similar to a file system.
When starting or restarting the config server, the configuration file for ZooKeeper, $VESPA_HOME/conf/zookeeper/zookeeper.cfg, is generated based on the contents of VESPA_CONFIGSERVERS. Hence, all config servers must be restarted if VESPA_CONFIGSERVERS changes on a config server node.
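To give a feel for what is derived from VESPA_CONFIGSERVERS, the sketch below prints one ZooKeeper ensemble line per host, using ZooKeeper's server.N=host:quorumPort:electionPort syntax. It is not the config server's actual generator: the function name, the ids starting at 0, and the default ports 2182 and 2183 are assumptions made for this example, and the real file contains more settings.

```shell
# Hypothetical sketch: emit ZooKeeper ensemble lines for a host list.
# $1 = comma- or whitespace-separated hosts
# $2 = quorum port (assumed default 2182), $3 = election port (assumed default 2183)
zk_server_lines() {
    id=0
    for host in $(printf '%s' "$1" | tr ',' ' '); do
        printf 'server.%d=%s:%s:%s\n' "$id" "$host" "${2:-2182}" "${3:-2183}"
        id=$(( id + 1 ))
    done
}

zk_server_lines "myserver0.mydomain.com,myserver1.mydomain.com,myserver2.mydomain.com"
```

Because every server needs the same ensemble list, a change to the host list only takes effect consistently once all config servers have regenerated the file.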
At vespa-deploy prepare, the application's files, along with global configurations, are stored in ZooKeeper. The application data is stored under /config/v2/tenants/default/sessions/[sessionid]/userapp. At vespa-deploy activate, the newest application is set live by updating the pointer in /config/v2/tenants/default/FIXME to refer to the active app's timestamp. It is at that point the other nodes get configured.
Use zkCli.sh to inspect state; replace [sessionid] with the actual session id:
$ ./zkCli.sh ls /config/v2/tenants/default/sessions/[sessionid]/userapp
$ ./zkCli.sh get /config/v2/tenants/default/sessions/[sessionid]/userapp/services.xml
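When the session id is held in a variable, the paths can be built with a small helper. The function name userapp_path and the session id 3 are placeholders for this sketch, which only prints the commands rather than running them.

```shell
# Hypothetical helper: ZooKeeper path of the application data for a session id.
userapp_path() {
    printf '/config/v2/tenants/default/sessions/%s/userapp\n' "$1"
}

session_id=3   # placeholder, use the id of the session to inspect
echo "./zkCli.sh ls $(userapp_path "$session_id")"
echo "./zkCli.sh get $(userapp_path "$session_id")/services.xml"
```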
The ZooKeeper server logs to $VESPA_HOME/logs/vespa/zookeeper.configserver.0.log (files are rotated with sequence number)
If the config servers experience data corruption, for instance after a hardware failure, use the following recovery procedure. One example of such a scenario is when $VESPA_HOME/logs/vespa/zookeeper.configserver.0.log contains java.io.IOException: Negative seek offset at java.io.RandomAccessFile.seek(Native Method), which indicates that ZooKeeper has not been able to recover after a full disk. There is no need to restart Vespa on other nodes during the procedure. Refer to tools for details:
- stop cloudconfig_server
- start cloudconfig_server
- vespa-deploy prepare <application path>
- vespa-deploy activate
Note that by default, the cluster controller that maintains the state of the content cluster uses the same shared ZooKeeper instance, so the content cluster state is also reset when removing state. Manually set state is lost (e.g. a node with user state down). It is possible to run cluster controllers in standalone ZooKeeper mode - see standalone-zookeeper.
ZooKeeper barrier timeout
If the config servers are heavily loaded, or the applications being deployed are big, the internals of the server may time out when synchronizing with the other servers during deploy. To work around this, increase the timeout by setting VESPA_CONFIGSERVER_ZOOKEEPER_BARRIER_TIMEOUT to 600 (seconds) or higher, then restart the config servers.
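As a minimal sketch, assuming the config server inherits its environment from the shell that starts it, the variable can be set like this on each config server host before the restart:

```shell
# Raise the ZooKeeper barrier timeout to 600 seconds for processes
# started from this environment.
export VESPA_CONFIGSERVER_ZOOKEEPER_BARRIER_TIMEOUT=600
```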
Set the ZooKeeper ports, prior to starting the config server. This is useful if running multiple instances on the same host:
|Health checks|Verify that a config server is up and running using the Health and Metric APIs, like http://myserver0.mydomain.com:12345/state/v1/. Metrics are found at http://myserver0.mydomain.com:12345/state/v1/metrics. Use vespa-model-inspect to find the host and port number.|
|Bad Node|If running with more than one config server and one of them goes down or has a hardware failure, the cluster will still work and serve config as usual (clients will switch to one of the good nodes). It is not necessary to remove a bad node from the configuration. Deploying applications will take a long time, since vespa-deploy will not be able to complete a deployment on all servers when one of them is down. If this is troublesome, lower the barrier timeout (the default value is 120 seconds). Note also that if cluster controllers are not configured explicitly, they run on the config server nodes and their operation might be affected. This is another reason not to manually remove a bad node from the config server setup.|
|Stuck filedistribution|The config system distributes binary files (such as jar bundle files) using file-distribution. Use vespa-status-filedistribution if it gets stuck.|
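The health check in the table can be scripted. The sketch below is only an illustration: the function names are hypothetical, the host and port are the example values used above, and it assumes curl is installed and that the health resource is served at /state/v1/health under the same root as the metrics.

```shell
# Build a URL under /state/v1/ for a given host and HTTP port.
# $1 = host, $2 = HTTP port, $3 = optional resource such as "health" or "metrics"
state_url() {
    printf 'http://%s:%s/state/v1/%s\n' "$1" "$2" "${3:-}"
}

# Succeeds only if the server answers with an HTTP success status within 5 seconds.
check_configserver() {
    curl --silent --fail --max-time 5 --output /dev/null \
        "$(state_url "$1" "$2" health)"
}

# Example (not run here):
#   check_configserver myserver0.mydomain.com 12345 && echo "config server is up"
```

Running the check against each host in VESPA_CONFIGSERVERS gives a quick view of which config servers respond.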