Config sentinel

The config sentinel starts and stops services, and restarts failed services unless they have been manually stopped.

Services start

All nodes in a Vespa system have at least these services running:

config-proxy: Proxies config requests between Vespa applications and the configserver node. All configuration is cached locally, so the node can keep its current configuration even if the configserver shuts down.
config-sentinel: Registers itself with the config-proxy, then subscribes to and enforces the node configuration, that is, which services should run locally and with what parameters.
vespa-logd: Monitors $VESPA_HOME/logs/vespa/vespa.log, which all other services log to, and relays everything to the log server.
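The caching behaviour of config-proxy can be illustrated with a minimal Python sketch. It is not Vespa's implementation: the class ConfigProxySketch, and modelling each config server as a plain callable that raises ConnectionError when the server is down, are hypothetical stand-ins. The sketch tries each config server in turn (the startup sequence notes that the proxy retries all config servers) and falls back to its local cache when none respond.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a config server: returns the config for a
# config id, or raises ConnectionError if that server is down.
Fetcher = Callable[[str], str]


class ConfigProxySketch:
    """Toy model of config-proxy caching and failover (not Vespa's code)."""

    def __init__(self, servers: List[Fetcher]) -> None:
        self.servers = servers
        self.cache: Dict[str, str] = {}

    def get(self, config_id: str) -> Optional[str]:
        # Try every config server in turn, skipping those that are down.
        for fetch in self.servers:
            try:
                payload = fetch(config_id)
            except ConnectionError:
                continue
            self.cache[config_id] = payload  # refresh the local cache
            return payload
        # All config servers are down: keep serving the last known config.
        return self.cache.get(config_id)
```

The cache is what lets a node keep running with its current configuration during a config server outage; a config id that was never fetched has nothing to fall back on.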

Sequence:

  1. config-proxy is started. It uses the environment variables VESPA_CONFIGSERVERS and VESPA_CONFIGSERVER_RPC_PORT to connect to the config-server(s), and retries all config servers in case some are down.
  2. config-sentinel is started and subscribes to the node configuration (that is, a service list) from config-proxy, using its hostname as the config id. This service list names the processes to start, along with the config id to assign to each, typically the logical name of that service instance.
  3. config-proxy subscribes to the node configuration from the config-server, caches it, and returns the result to config-sentinel.
  4. config-sentinel starts the services given in the node configuration, passing each its config id as an argument, for example id="search/qrservers/qrserver.0" in the example output below. Each service:
    1. Subscribes to configuration from config-proxy
    2. config-proxy subscribes to the configuration from the config-server, caches it, and returns the result to the service.
    3. The service runs according to its configuration, logging to $VESPA_HOME/logs/vespa/vespa.log. The process instantiates internal components, each assigned the same or another config id, which may in turn instantiate further components.
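The startup sequence can be traced with a small in-process Python sketch. It is an illustration only: boot_node is a hypothetical helper, config-proxy is modelled as a function from config id to config, the service list is assumed to be a list of (service name, config id) pairs, and the config payloads are made up. Starting a service is reduced to that service fetching its own config through the proxy; no real processes are launched.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stand-in for config-proxy: maps a config id to its config.
Proxy = Callable[[str], Any]


def boot_node(hostname: str, proxy: Proxy) -> Dict[str, Any]:
    """Toy model of the startup sequence; returns each service's config."""
    # Steps 2-3: the sentinel asks the proxy for its service list, keyed by hostname.
    service_list: List[Tuple[str, str]] = proxy(hostname)
    running: Dict[str, Any] = {}
    # Step 4: "start" each service with its config id. A real sentinel launches
    # a process here; this sketch only records what that service would fetch.
    for name, config_id in service_list:
        # Steps 4.1-4.2: the service subscribes to its own config via the proxy.
        running[name] = proxy(config_id)
    return running


if __name__ == "__main__":
    configs = {
        "myhost": [("qrserver", "search/qrservers/qrserver.0")],
        "search/qrservers/qrserver.0": {"example_setting": 1},  # made-up payload
    }
    print(boot_node("myhost", configs.__getitem__))
```

Note the two levels of lookup: the hostname selects the service list, and each entry's config id selects that service's own configuration.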
When the config-server receives updated configuration for a running system, it propagates the changed configuration to nodes subscribing to it. In turn, these nodes reconfigure themselves accordingly.
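The propagation step can be pictured as a publish/subscribe registry. The Python sketch below is a simplification, not Vespa's mechanism: ConfigServerSketch is hypothetical, callbacks stand in for the network path through config-proxy, and configs are plain strings. It shows that a new deployment reaches only the subscribers whose config actually changed, and that each subscriber reconfigures itself in its callback.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Callback = Callable[[str], None]


class ConfigServerSketch:
    """Toy publish/subscribe model of config propagation (not Vespa's code)."""

    def __init__(self) -> None:
        self.configs: Dict[str, str] = {}
        self.subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, config_id: str, on_change: Callback) -> None:
        self.subscribers[config_id].append(on_change)
        if config_id in self.configs:
            on_change(self.configs[config_id])  # hand out the current config at once

    def deploy(self, updates: Dict[str, str]) -> None:
        """Apply new configuration and notify subscribers of changed config ids only."""
        for config_id, payload in updates.items():
            if self.configs.get(config_id) == payload:
                continue  # unchanged: nothing to propagate
            self.configs[config_id] = payload
            for on_change in self.subscribers[config_id]:
                on_change(payload)  # the subscriber reconfigures itself here
```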

User interface

The config sentinel runs a telnet service that can be used to list, start, and stop the services that are supposed to run on that node. The port number is 19098 by default; use vespa-model-inspect to verify it. The service name is config-sentinel, and the port is tagged telnet interactive. Open the interface:

$ telnet localhost 19098
Commands:
ls: Lists the services running on this node and their status.
restart [name]: Restarts the service with the given name. The name is the first string of each entry in the service list printed by ls.
manual [name]: Switches the service with the given name into manual mode. The service is then managed manually using stop and start, and is not automatically restarted if it stops.
auto [name]: Switches the service with the given name back to automatic mode. Also starts the service if necessary.
stop [name]: Stops the service with the given name. This is only possible when the service will not auto-restart, so use manual first.
start [name]: Starts the service with the given name. Note that the service remains in manual mode and will not be restarted if it crashes, so remember to use auto to return to normal operation.
quit: Closes the connection.
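The interplay between the modes and the commands can be summarized as a small state model. This Python sketch is built only from the command descriptions above and the autorestart field of the ls output; the class ServiceSketch and its attributes are hypothetical, and it tracks only whether the service is running rather than the sentinel's actual state values.

```python
from dataclasses import dataclass


@dataclass
class ServiceSketch:
    """Hypothetical model of one service's mode handling in the sentinel."""

    name: str
    autorestart: bool = True  # configuration setting, as in the ls output
    mode: str = "AUTO"        # "AUTO" or "MANUAL"
    running: bool = True

    def will_auto_restart(self) -> bool:
        # autorestart has no effect when the service is in manual mode.
        return self.mode == "AUTO" and self.autorestart

    def manual(self) -> None:
        self.mode = "MANUAL"

    def auto(self) -> None:
        self.mode = "AUTO"
        self.running = True  # auto also starts the service if necessary

    def stop(self) -> None:
        if self.will_auto_restart():
            raise RuntimeError(f"{self.name} would auto-restart; use manual first")
        self.running = False

    def start(self) -> None:
        self.running = True  # mode is unchanged, so a manual service stays manual

    def crashed(self) -> None:
        # The sentinel brings a crashed service back only if it will auto-restart.
        self.running = self.will_auto_restart()
```

The temporary-stop procedure is then manual, stop, and later auto, which returns the service to AUTO mode and starts it again.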
To temporarily stop a service, use manual and then stop; later, switch it back to normal operation using auto. Pro tip: To restart a process from the command line, run:
$ echo -e "restart distributor\nquit" | nc localhost 19098
To learn more about the processes and services, see files and processes.
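The nc one-liner can also be scripted from Python with the standard socket module. This is a sketch: sentinel_command is a hypothetical helper that assumes the default port 19098 and relies on the sentinel closing the connection after quit. It sends the given commands followed by quit and returns whatever text the sentinel wrote back.

```python
import socket
from typing import Iterable


def sentinel_command(commands: Iterable[str], host: str = "localhost",
                     port: int = 19098, timeout: float = 5.0) -> str:
    """Send commands to the sentinel's telnet interface and return its reply.

    Equivalent in spirit to: echo -e "restart distributor\\nquit" | nc localhost 19098
    """
    payload = "".join(cmd + "\n" for cmd in commands) + "quit\n"
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload.encode("ascii"))
        while True:
            data = sock.recv(4096)
            if not data:  # connection closed after quit
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

For example, `print(sentinel_command(["restart distributor"]))` on a Vespa node performs the same restart as the one-liner, and `sentinel_command(["ls"])` returns the service list.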

Example output from telnet localhost 19098 followed by an ls:

vespa-logd state=RUNNING mode=AUTO pid=24512 exitstatus=0 autostart=TRUE
    autorestart=TRUE id="hosts/myhost.mydomain.com/logd"
topleveldispatch state=RUNNING mode=AUTO pid=24518 exitstatus=0 autostart=TRUE
    autorestart=TRUE id="search/cluster.search/tld.0"
clustercontroller state=RUNNING mode=AUTO pid=24517 exitstatus=0 autostart=TRUE
    autorestart=TRUE id="search/cluster.search/rtx/0"
qrserver state=RUNNING mode=AUTO pid=24516 exitstatus=0 autostart=TRUE
    autorestart=TRUE id="search/qrservers/qrserver.0"
vespa-slobrok state=RUNNING mode=AUTO pid=24514 exitstatus=0 autostart=TRUE
    autorestart=TRUE id="admin/slobrok.0"
Each service entry has the following fields:
service name: Name of the service, used as [name] in the commands above
state:
  • RUNNING: Service is running
  • FINISHED: Service has been stopped
  • FAILED: Service has crashed and failed to restart
  • TERMINATING: Service is stopping
mode:
  • MANUAL: Service has to be started and stopped manually
  • AUTO: Service is restarted automatically, according to autorestart
pid: PID of the process (main thread)
exitstatus: Exit code from the last time the service stopped without exiting normally (a normal exit being, for example, one caused by the stop command)
autostart: Whether the service is started automatically when Vespa is started
autorestart: Whether the service is restarted automatically if it crashes. This is a configuration setting, and has no effect when mode=MANUAL
id: Config ID of the service
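For scripting health checks, the ls output can be parsed into one record per service. This Python sketch assumes the format shown in the example: a bare service name followed by key=value fields, possibly wrapped over several lines, with id quoted. Boolean fields are assumed to print as TRUE or FALSE (the example only shows TRUE). parse_ls and not_auto_restarted are hypothetical helpers, not Vespa tools.

```python
import shlex
from typing import Dict, List


def parse_ls(output: str) -> List[Dict[str, str]]:
    """Parse sentinel ls output into one dict per service (illustrative sketch)."""
    services: List[Dict[str, str]] = []
    # shlex strips the quotes around id="..."; entries may span several lines.
    for token in shlex.split(output):
        key, sep, value = token.partition("=")
        if not sep:
            services.append({"name": token})  # a bare word starts a new service
        elif services:
            services[-1][key] = value
    return services


def not_auto_restarted(services: List[Dict[str, str]]) -> List[str]:
    """Names of services the sentinel will not restart if they crash."""
    # autorestart has no effect when mode=MANUAL, so either condition suffices.
    return [
        s["name"]
        for s in services
        if s.get("mode") == "MANUAL" or s.get("autorestart") == "FALSE"
    ]
```

A check built on this could flag any service that is not RUNNING, or that was left in manual mode after maintenance.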