The config sentinel starts and stops services - and restart failed services unless they are manually stopped. All nodes in a Vespa system have at least these running processes:
Process | Description |
---|---|
config-proxy | Proxies config requests between Vespa applications and the configserver node. All configuration is cached locally so that this node can maintain its current configuration, even if the configserver shuts down. |
config-sentinel | Registers itself with the config-proxy and subscribes to and enforces node configuration, meaning the configuration of what services should be run locally, and with what parameters. |
vespa-logd | Monitors $VESPA_HOME/logs/vespa/vespa.log, which is used by all other services, and relays everything to the log-server. |
metrics-proxy | Provides APIs for metrics access to all nodes and services. |
Start sequence:
When new config is deployed to config-servers they propagate the changed configuration to nodes subscribing to it. In turn, these nodes reconfigure themselves accordingly.
The config sentinel runs an RPC service which can be used to list,
start and stop the services supposed to run on that node.
This can be useful for testing and debugging.
Use vespa-sentinel-cmd
to trigger these actions.
Example output from vespa-sentinel-cmd list
:
vespa-sentinel-cmd 'sentinel.ls' OK. container state=RUNNING mode=AUTO pid=27993 exitstatus=0 id="default/container.0" container-clustercontroller state=RUNNING mode=AUTO pid=27997 exitstatus=0 id="admin/cluster-controllers/0" distributor state=RUNNING mode=AUTO pid=27996 exitstatus=0 id="search/distributor/0" logd state=RUNNING mode=AUTO pid=5751 exitstatus=0 id="hosts/r6-3/logd" logserver state=RUNNING mode=AUTO pid=27994 exitstatus=0 id="admin/logserver" searchnode state=RUNNING mode=AUTO pid=27995 exitstatus=0 id="search/search/cluster.search/0" slobrok state=RUNNING mode=AUTO pid=28000 exitstatus=0 id="admin/slobrok.0"
To learn more about the processes and services, see files and processes. Use vespa-model-inspect host hostname to list services running on a node.
The config sentinel will not start services on a node unless it has connectivity to a minimum of other nodes, default 50%. Find an example of this feature in the multinode-HA example application. Example configuration:
Example: minOkPercent 10
means that services will be started only if
more than or equal to 10% of nodes are up.
If there are 11 nodes in the application,
the first node started will not start its services -
when the second node is started, services will be started on both.
maxBadCount
is for connectivity checks where the other node is up,
but we still do not have proper two-way connectivity.
Normally, one-way connectivity means network configuration is broken and needs looking into,
so this may be set low (1 or even 0 are the recommended values). If there are some
temporary problems (in the example below non-responding DNS which leads to various
issues at startup) the config sentinel will loop and retry, so the service startup will just
be slightly delayed.
Example log:
[2021-06-15 14:33:25] EVENT : starting/1 name="sbin/vespa-config-sentinel -c hosts/le40808.ostk (pid 867)" [2021-06-15 14:33:25] EVENT : started/1 name="config-sentinel" [2021-06-15 14:33:25] CONFIG : Sentinel got 4 service elements [tenant(footest), application(bartest), instance(default)] for config generation 1001 [2021-06-15 14:33:25] CONFIG : Booting sentinel 'hosts/le40808.ostk' with [stateserver port 19098] and [rpc port 19097] [2021-06-15 14:33:25] CONFIG : listening on port 19097 [2021-06-15 14:33:25] CONFIG : Sentinel got model info [version 7.420.21] for 35 hosts [config generation 1001] [2021-06-15 14:33:25] CONFIG : connectivity.maxBadCount = 3 [2021-06-15 14:33:25] CONFIG : connectivity.minOkPercent = 40 [2021-06-15 14:33:28] INFO : Connectivity check details: 2086533.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le01287.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le23256.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le23267.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le23297.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le23312.ostk -> connect OK, but reverse check FAILED [2021-06-15 14:33:28] INFO : Connectivity check details: le23317.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le23319.ostk -> connect OK, but reverse check FAILED [2021-06-15 14:33:28] INFO : Connectivity check details: le30550.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le30553.ostk -> connect OK, but reverse check FAILED [2021-06-15 14:33:28] INFO : Connectivity check details: le30556.ostk -> unreachable from me, but up [2021-06-15 14:33:28] INFO : Connectivity check details: le30560.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le30567.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40387.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40389.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40808.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40817.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40833.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40834.ostk -> unreachable from me, but up [2021-06-15 14:33:28] INFO : Connectivity check details: le40841.ostk -> connect OK, but reverse check FAILED [2021-06-15 14:33:28] INFO : Connectivity check details: le40858.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40860.ostk -> unreachable from me, but up [2021-06-15 14:33:28] INFO : Connectivity check details: le40863.ostk -> connect OK, but reverse check FAILED [2021-06-15 14:33:28] INFO : Connectivity check details: le40873.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40892.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40900.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40905.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: le40914.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: sm02318.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: sm02324.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: sm02340.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: zt40672.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: zt40712.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: zt40728.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] INFO : Connectivity check details: zt41329.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:28] WARNING : 8 of 35 nodes up but with network connectivity problems (max is 3) [2021-06-15 14:33:28] WARNING : Bad network connectivity (try 1) [2021-06-15 14:33:30] WARNING : slow resolve time: 'le30556.ostk' -> '1234:5678:90:123::abcd' (5.00528 s) [2021-06-15 14:33:30] WARNING : slow resolve time: 'le40834.ostk' -> '1234:5678:90:456::efab' (5.00527 s) [2021-06-15 14:33:30] WARNING : slow resolve time: 'le40860.ostk' -> '1234:5678:90:789::cdef' (5.00459 s) [2021-06-15 14:33:31] INFO : Connectivity check details: le23312.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:31] INFO : Connectivity check details: le23319.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:31] INFO : Connectivity check details: le30553.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:31] INFO : Connectivity check details: le30556.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:31] INFO : Connectivity check details: le40834.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:31] INFO : Connectivity check details: le40841.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:31] INFO : Connectivity check details: le40860.ostk -> connect OK, but reverse check FAILED [2021-06-15 14:33:31] INFO : Connectivity check details: le40863.ostk -> OK: both ways connectivity verified [2021-06-15 14:33:31] INFO : Enough connectivity checks OK, proceeding with service startup [2021-06-15 14:33:31] EVENT : starting/1 name="searchnode" ...