A Vespa system consists of one or more stateless and stateful clusters configured by an application package. A Vespa system is configured and managed through an admin cluster as shown below.
All nodes of a Vespa system have the same software installed. Which processes are started on each node and how they are configured is determined by the admin cluster from the specification given in services.xml in the application package.
To create a fully functional production-ready multinode system from a single-node sample application, follow these steps (also see next steps):
$ echo "override VESPA_CONFIGSERVERS [configserver-hostnames]" >> $VESPA_HOME/conf/vespa/default-env.txt

where [configserver-hostnames] is replaced by the full hostname of the config server
(or a comma-separated list if multiple).
Then add the hosts to the node tags in services.xml.
See below for AWS examples. Refer to configuration server operations for troubleshooting.
The following is a procedure to set up a multinode application on AWS EC2 instances. Please run the procedure in multinode-HA first, to get familiar with the different Vespa concepts before running the AWS procedure below. This procedure will use the same number of hosts, 10, and set up the same application.
Several commands below require root access with sudo.
The Vespa start scripts will modify the environment (directories, system limits), requiring root access -
refer to vespa-start-configserver and vespa-start-services.
After the environment setup, Vespa is run as the vespa user.
Can AWS Auto Scaling be used? Read the autoscaling Q/A.
type | Private IP DNS name (IPv4 only) | Public IPv4 DNS |
---|---|---|
configserver | ip-10-0-1-234.ec2.internal | ec2-3-231-33-190.compute-1.amazonaws.com |
configserver | ip-10-0-1-154.ec2.internal | ec2-3-216-28-201.compute-1.amazonaws.com |
configserver | ip-10-0-0-88.ec2.internal | ec2-34-230-33-42.compute-1.amazonaws.com |
services | ip-10-0-1-95.ec2.internal | ec2-44-192-98-165.compute-1.amazonaws.com |
services | ip-10-0-0-219.ec2.internal | ec2-3-88-143-47.compute-1.amazonaws.com |
services | ip-10-0-0-28.ec2.internal | ec2-107-23-52-245.compute-1.amazonaws.com |
services | ip-10-0-0-67.ec2.internal | ec2-54-198-251-100.compute-1.amazonaws.com |
services | ip-10-0-1-84.ec2.internal | ec2-44-193-84-85.compute-1.amazonaws.com |
services | ip-10-0-0-167.ec2.internal | ec2-54-224-15-163.compute-1.amazonaws.com |
services | ip-10-0-1-41.ec2.internal | ec2-44-200-227-127.compute-1.amazonaws.com |
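The DNS names in a table like the above can also be collected with the AWS CLI instead of copying them from the console - a sketch, assuming the CLI is installed and configured for the region the instances run in:

```shell
# List private and public DNS names of all running instances.
# Assumes the AWS CLI is configured (credentials, region).
list_instance_dns() {
    aws ec2 describe-instances \
        --filters Name=instance-state-name,Values=running \
        --query 'Reservations[].Instances[].[PrivateDnsName,PublicDnsName]' \
        --output text
}
```

Run `$ list_instance_dns` to print one instance per line.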
$ SSH_AUTH_SOCK=/dev/null ssh -i mykeypair.pem centos@ec2-3-231-33-190.compute-1.amazonaws.com
$ sudo dnf config-manager \
    --add-repo https://raw.githubusercontent.com/vespa-engine/vespa/master/dist/vespa-engine.repo
$ sudo dnf config-manager --enable powertools
$ sudo dnf install -y epel-release
$ sudo dnf install -y vespa
$ export VESPA_HOME=/opt/vespa
$ echo "override VESPA_CONFIGSERVERS" \
    "ip-10-0-1-234.ec2.internal,ip-10-0-1-154.ec2.internal,ip-10-0-0-88.ec2.internal" \
    | sudo tee -a $VESPA_HOME/conf/vespa/default-env.txt

It is required that all nodes, both config server and Vespa nodes, have the same setting for VESPA_CONFIGSERVERS.
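Since a mismatch here is a common source of startup problems, it can be worth verifying that the override line is identical on every node. A sketch, assuming ssh access to the nodes and the default /opt/vespa install path:

```shell
# Print the number of distinct VESPA_CONFIGSERVERS override lines across nodes.
# A result of 1 means all nodes have the same setting.
check_configservers_setting() {
    for host in "$@"; do
        ssh "$host" "grep '^override VESPA_CONFIGSERVERS' /opt/vespa/conf/vespa/default-env.txt"
    done | sort -u | wc -l | tr -d ' '
}
```

Example: `$ check_configservers_setting ip-10-0-1-234.ec2.internal ip-10-0-1-154.ec2.internal ip-10-0-0-88.ec2.internal`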
$ sudo systemctl start vespa-configserver
$ for configserver in \
    ip-10-0-1-234.ec2.internal \
    ip-10-0-1-154.ec2.internal \
    ip-10-0-0-88.ec2.internal; \
  do curl -s http://$configserver:19071/state/v1/health | head -5; done

{ "time" : 1660034756595, "status" : { "code" : "up" },
{ "time" : 1660034756607, "status" : { "code" : "up" },
{ "time" : 1660034756786, "status" : { "code" : "up" },

A successful config server start will log an entry like:
$ $VESPA_HOME/bin/vespa-logfmt | grep "Application config generation"

[2022-08-09 08:29:38.684] INFO : configserver Container.com.yahoo.container.jdisc.ConfiguredApplication Switching to the latest deployed set of configurations and components. Application config generation: 0

Do not continue setup before the config server cluster is successfully started.
See the video: Troubleshooting startup - multinode and read config server start sequence.
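To avoid continuing too early, the health checks above can be wrapped in a small poll loop - a sketch, with the hostnames from the table:

```shell
# Poll each config server's health endpoint until it reports "up".
wait_for_configservers() {
    for host in "$@"; do
        until curl -s "http://$host:19071/state/v1/health" | grep -q '"up"'; do
            echo "waiting for $host ..."
            sleep 5
        done
        echo "$host is up"
    done
}
```

Example: `$ wait_for_configservers ip-10-0-1-234.ec2.internal ip-10-0-1-154.ec2.internal ip-10-0-0-88.ec2.internal`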
$ sudo systemctl start vespa

$VESPA_HOME/logs/vespa/vespa.log will now contain messages for APPLICATION_NOT_LOADED;
this is normal until an application is deployed (next section).
$ sudo dnf install -y git zip
$ git clone https://github.com/vespa-engine/sample-apps.git && \
  cd sample-apps/examples/operations/multinode-HA
$ zip -r - . -x "img/*" "scripts/*" "pki/*" "tls/*" README.md .gitignore | \
  curl --header Content-Type:application/zip --data-binary @- \
  http://localhost:19071/application/v2/tenant/default/prepareandactivate

Expected output:
$ sudo systemctl start vespa
$ curl http://localhost:8080/state/v1/health

{ "time" : 1660038306465, "status" : { "code" : "up" },

Refer to the sample application ports:
Remember to terminate the instances in the AWS console after use.
This is a variant of the multinode install, using only one host, running both a config server and the other Vespa services on the same node.
Find the hostname to use for VESPA_CONFIGSERVERS:

$ hostname
$ sudo dnf config-manager \
    --add-repo https://raw.githubusercontent.com/vespa-engine/vespa/master/dist/vespa-engine.repo
$ sudo dnf config-manager --enable powertools
$ sudo dnf install -y epel-release
$ sudo dnf install -y vespa
$ export VESPA_HOME=/opt/vespa
$ echo "override VESPA_CONFIGSERVERS ip-172-31-95-248.ec2.internal" | \
  sudo tee -a $VESPA_HOME/conf/vespa/default-env.txt
$ sudo dnf install -y git zip
$ git clone https://github.com/vespa-engine/sample-apps.git && cd sample-apps/album-recommendation
$ sudo systemctl start vespa-configserver
$ curl http://localhost:19071/state/v1/health | head -5
$ zip -r - . -x "img/*" "scripts/*" "pki/*" "tls/*" README.md .gitignore | \
  curl --header Content-Type:application/zip --data-binary @- \
  http://localhost:19071/application/v2/tenant/default/prepareandactivate
$ sudo systemctl start vespa
$ curl http://localhost:8080/state/v1/health | head -5
The following is a procedure to set up a multinode application on AWS ECS. Please run the procedure in multinode-HA first, to get familiar with the different Vespa concepts before running the AWS procedure below. This procedure will use the same number of hosts, 10, and set up the same application. Running the EC2 procedure above first can also be helpful, as this procedure has a similar structure.
Cluster name | vespa |
---|---|
EC2 instance type | t2.medium |
Number of instances | 10 |
Key pair | Select or create your keypair |
Security group inbound rules - port range | 0 - 65535 |
For the 3 config server instances, add an attribute with Name=type and Value=configserver,
click the green checkbox on the right, then Close.
For the 7 services instances, add an attribute with Name=type and Value=services,
click the green checkbox on the right, then Close.
{
"networkMode": "host",
"containerDefinitions": [
{
"name": "configserver",
"environment": [
{
"name": "VESPA_CONFIGSERVERS",
"value": "ip-10-0-1-234.ec2.internal,ip-10-0-1-154.ec2.internal,ip-10-0-0-88.ec2.internal"
}
],
"image": "vespaengine/vespa",
"privileged": true,
"memoryReservation": 1024
}
],
"placementConstraints": [
{
"expression": "attribute:type == configserver",
"type": "memberOf"
}
],
"family": "configserver"
}
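As an alternative to the console, a task definition like the above can be registered with the AWS CLI - a sketch, assuming the JSON is saved to a file named configserver-task.json (the filename is illustrative):

```shell
# Register an ECS task definition from a JSON file.
register_task_definition() {
    aws ecs register-task-definition --cli-input-json "file://$1"
}
```

Example: `$ register_task_definition configserver-task.json`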
Launch type | EC2 |
---|---|
Cluster | vespa |
Number of tasks | 3 |
Placement templates | One Task Per Host |
$ ssh -i mykeypair.pem ec2-user@ec2-3-231-33-190.compute-1.amazonaws.com \
  curl -s http://localhost:19071/state/v1/health | head -5

{ "time" : 1660635645783, "status" : { "code" : "up" },
$ ssh -i mykeypair.pem ec2-user@ec2-3-231-33-190.compute-1.amazonaws.com
$ sudo yum -y install git zip
$ git clone https://github.com/vespa-engine/sample-apps.git && \
  cd sample-apps/examples/operations/multinode-HA
$ zip -r - . -x "img/*" "scripts/*" "pki/*" "tls/*" README.md .gitignore | \
  curl --header Content-Type:application/zip --data-binary @- \
  http://localhost:19071/application/v2/tenant/default/prepareandactivate
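After deploying, the config server can report whether all services have converged on the new config generation. A sketch using the serviceconverge endpoint - the path below follows the default tenant/application naming used in this procedure:

```shell
# Query the config server for service convergence on the deployed application.
check_convergence() {
    curl -s "http://localhost:19071/application/v2/tenant/default/application/default/environment/prod/region/default/instance/default/serviceconverge"
}
```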
{
"networkMode": "host",
"containerDefinitions": [
{
"name": "services",
"environment": [
{
"name": "VESPA_CONFIGSERVERS",
"value": "ip-10-0-1-234.ec2.internal,ip-10-0-1-154.ec2.internal,ip-10-0-0-88.ec2.internal"
}
],
"image": "vespaengine/vespa",
"command": [
"services"
],
"privileged": true,
"memoryReservation": 1024
}
],
"placementConstraints": [
{
"expression": "attribute:type == services",
"type": "memberOf"
}
],
"family": "services"
}
Note the "command": [ "services" ] element.
See controlling which services to start for details - this starts services only.
Given no arguments, the start script starts both the configserver and services;
this is used for the config server tasks above.
For these 7 nodes, services is given as an argument to the start script to only start Vespa services.
Launch type | EC2 |
---|---|
Cluster | vespa |
Number of tasks | 7 |
Placement templates | One Task Per Host |
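An approximate CLI equivalent of the console settings above - a sketch, assuming the "One Task Per Host" template maps to a distinctInstance placement constraint:

```shell
# Run 7 services tasks, at most one per container instance.
run_services_tasks() {
    aws ecs run-task \
        --cluster vespa \
        --task-definition services \
        --count 7 \
        --launch-type EC2 \
        --placement-constraints type=distinctInstance
}
```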
$ ssh -i mykeypair.pem ec2-user@ec2-3-88-143-47.compute-1.amazonaws.com \
  curl -s http://localhost:8080/state/v1/health | head -5

{ "time" : 1660652648442, "status" : { "code" : "up" },
Logs are automatically collected from all nodes in real time to the admin node listed as adminserver.
To view log messages from the system, run vespa-logfmt on this node.
To change the system, deploy the changed application to the admin cluster. The admin cluster will automatically change the participating nodes as necessary. It is safe to do this while serving live query and write traffic. In some cases the admin cluster will report that some processes must be restarted to make the change effective. To avoid query or write traffic disruption, such restarts must be done on one node at a time, waiting until the node is fully up before restarting the next one.
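Such a rolling restart can be sketched as a loop - hostnames are illustrative, and the health check assumes a container serving /state/v1/health on port 8080:

```shell
# Restart Vespa on one node at a time, waiting for health "up" before moving on.
rolling_restart() {
    for host in "$@"; do
        ssh "$host" "sudo systemctl restart vespa"
        until ssh "$host" "curl -s http://localhost:8080/state/v1/health" | grep -q '"up"'; do
            sleep 10
        done
        echo "$host restarted and healthy"
    done
}
```

Example: `$ rolling_restart node1.example.com node2.example.com`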
See multiple schemas for an overview of how to map schemas to content clusters. There is another way to distribute load over hosts, by mapping multiple content clusters to the same hosts:
Observe that both clusters use node1.
This is not a recommended configuration, as it runs multiple proton processes per node.
To reduce interference between the processes in this case, virtualize the host into more nodes.
One can use containers or VMs to do this:
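As an illustration, containers can give each virtual node its own hostname - a sketch using Docker and the vespaengine/vespa image (the network and node names are illustrative):

```shell
# One config server container plus two "nodes" running Vespa services,
# all on the same physical host, each with its own hostname.
start_virtual_nodes() {
    docker network create vespanet
    docker run --detach --name node0 --hostname node0.vespanet \
        --env VESPA_CONFIGSERVERS=node0.vespanet \
        --network vespanet vespaengine/vespa configserver
    for n in 1 2; do
        docker run --detach --name "node$n" --hostname "node$n.vespanet" \
            --env VESPA_CONFIGSERVERS=node0.vespanet \
            --network vespanet vespaengine/vespa services
    done
}
```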
A common question is, "Can AWS Auto Scaling be used?" That is a difficult question to answer, here is a transcript from the Vespa Slack:
I have a question about deployment. I set up cluster on two AWS auto-scaling groups (config & services) based on multinode-systems.html#aws-ec2. But if one of instances was replaced by auto-scaling group, I need manually update hosts.xml file, zip it and deploy new version of the app. I'm thinking about automation of this process by Cloudwatch & Lambda... I wonder if there is some node-discovery mechanism which can e.g. check instances tags and update hosts config based on it?
First, you see in aws-ec2 that there are two types of hosts, configserver and services.
configserver setup / operations is documented at
configuration server operations.
This must be set up first.
This is backed by an Apache ZooKeeper cluster,
so it should have 1 or 3 nodes.
In our own clusters in Yahoo, we do not autoscale configserver clusters - there is no need; we use 3.
If that is too many, use 1. So this question is easy - do not autoscale configservers.
For the services nodes, observe that there are two kinds of nodes - stateless containers and stateful content nodes - see the overview. Either way, you will want to manage these differently - the stateless nodes are more easily replaced / increased / shrunk, by changing services.xml and hosts.xml. It is doable to build an autoscaling service for the stateless nodes, but you need to make sure to use the right metrics for your autoscaling code, and integrate the deploy automation with the other deployments (say, schema modifications).
A much harder problem is autoscaling the stateful nodes - these are the nodes with the indexes and data. See elasticity - adding a node + data redistribution can take hours, and the node's load will increase during redistribution. Building autoscaling here is very difficult to do safely and efficiently.
None of this is impossible, and it is actually implemented at cloud.vespa.ai/autoscaling - but it is a difficult feature to get right.
So, my recommendation is starting with a static set of hosts, like in multinode-HA - and in parallel try out cloud.vespa.ai/en/free-trial with autoscaling experiments using your data and use cases.
Autoscaling can save money, but before going there, it is wise to read docs.vespa.ai/en/performance/ and optimize resources using a static node set (or use the sizing suggestions from the Vespa Cloud Console). I.e., get the node resources right first, then consider if autoscaling node count for your load patterns makes sense.