Multi-Node Quick Start: Install and run Vespa on AWS EC2
- An AWS account
- Familiarity with setting up a single-node system on CentOS
Navigate to AWS EC2
Navigate to the AWS EC2 console and to the region of your choice.
Configure an AWS security group
- Click Create Security Group under the Security Groups section. Give your security group a descriptive name; you'll need to refer back to it soon.
- Under Inbound, Add rule: Custom TCP for port range 8080 (feed & search port) from your source IP range. For testing, you can choose "My IP" from the Source drop-down menu.
- Under Inbound, Add rule: SSH for your source IP range. You need SSH access on all instances to install and configure Vespa.
- Click Create.
- We need to modify the rules so that instances that are part of the group can talk to all other instances that are part of the same group. Select the newly created group in the Security Group overview table. Under the Inbound tab, click Edit.
- Add rule: Custom TCP for port range 0-65535, specifying the name of the current Security Group as the Source. Being able to reference the security group from itself is why we had to create an incomplete group earlier.
- Click Save
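The same three inbound rules can also be added from a terminal. The following is a minimal sketch using the AWS CLI, assuming the CLI is installed and configured and that the security group has already been created. The function name authorize_vespa_rules and its two arguments (the security group ID and your source CIDR range) are illustrative and not part of this guide.

```shell
# authorize_vespa_rules SG_ID MY_CIDR
# Hypothetical helper: adds the inbound rules described in the steps above.
authorize_vespa_rules() {
  sg_id=$1     # e.g. the ID shown in the Security Groups table
  my_cidr=$2   # your source IP range in CIDR notation

  # Feed and search port, reachable from your own address range.
  aws ec2 authorize-security-group-ingress --group-id "$sg_id" \
    --protocol tcp --port 8080 --cidr "$my_cidr" || return 1

  # SSH, needed on all instances to install and configure Vespa.
  aws ec2 authorize-security-group-ingress --group-id "$sg_id" \
    --protocol tcp --port 22 --cidr "$my_cidr" || return 1

  # All TCP ports between instances that are members of the same group.
  aws ec2 authorize-security-group-ingress --group-id "$sg_id" \
    --protocol tcp --port 0-65535 --source-group "$sg_id" || return 1
}
```

Call it with the ID of the group created above and your own address range, for example authorize_vespa_rules <security-group-id> <your-ip>/32.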
Launch 5 instances
Under Instances, click the Launch instance button and follow the wizard steps:
- Choose AMI: Under Linux/Unix, choose a 64-bit CentOS 7 Amazon Machine Image (AMI). Both commercial and community AMIs are available. The official CentOS 7 AMI on the AWS Marketplace has been verified to work.
- Choose Instance Type: t2.medium is sufficient for this guide. For production setups, see the sizing guide.
- Enter 5 in the Number of instances field.
- The default of 8 GiB SSD is sufficient for running this quick start. Optionally select Delete on termination if you know you won't need the data stored in Vespa after you terminate the instances.
- You do not have to add any tags for this quick start.
- Configure Security Group: Choose Select an existing security group and select the security group you created earlier. You should see the expected inbound rules towards the bottom of the page.
Install and start
This example uses ip-172-1-1-1.us-east-2.compute.internal as the administration node, and all commands below are run from that host. Locate the Public DNS record for that instance in EC2 Dashboard -> Instances, and log on to that host (e.g. ec2-18-1-1-1.us-east-2.compute.amazonaws.com). Using ssh agent forwarding simplifies access to the other instances from the administration node, given your AWS identity key file my-aws-key.pem and assuming the AMI uses centos as login user:
$ ssh-add my-aws-key.pem
$ ssh -A centos@<administration-node-public-dns>
Inside the AWS EC2 cloud, bootstrap the system:
Create hosts.txt with the Private DNS records for the EC2 instances allocated above.
These are located in EC2 Dashboard -> Instances.
Let the administration node be the first entry. Example:
$ cat hosts.txt
ip-172-1-1-1.us-east-2.compute.internal
ip-172-1-1-2.us-east-2.compute.internal
ip-172-1-1-3.us-east-2.compute.internal
ip-172-1-1-4.us-east-2.compute.internal
ip-172-1-1-5.us-east-2.compute.internal
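Because every later step reads hosts.txt, it can help to check the file before continuing. The following is a small sketch of such a check, assuming one hostname per line. The function name check_hosts is hypothetical and not part of the Vespa tooling.

```shell
# check_hosts FILE EXPECTED_COUNT
# Hypothetical helper: verifies that FILE lists EXPECTED_COUNT distinct hosts
# and prints the first entry, which this guide uses as the administration node.
check_hosts() {
  file=$1
  expected=$2
  # Count non-empty lines, then distinct non-empty lines.
  total=$(grep -c . "$file")
  unique=$(grep . "$file" | sort -u | wc -l | tr -d ' ')
  if [ "$total" -ne "$expected" ]; then
    echo "expected $expected hosts, found $total" >&2
    return 1
  fi
  if [ "$unique" -ne "$total" ]; then
    echo "duplicate host entries in $file" >&2
    return 1
  fi
  echo "admin node: $(head -n 1 "$file")"
}
```

For this guide the call would be check_hosts hosts.txt 5.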
Download and distribute the aws bootstrap script to all instances:
$ curl -s https://raw.githubusercontent.com/vespa-engine/sample-apps/master/aws_bootstrap.sh \
    > aws_bootstrap.sh
$ for host in $(cat hosts.txt); do \
    scp aws_bootstrap.sh $host:.; done
The bootstrap script installs Vespa and git, does basic system configuration,
and starts Vespa and the configuration server on the administration node.
The script expects a single hostname as its argument:
the Private DNS record of the administration node (the first entry in hosts.txt).
The last step might take some time to complete:
$ for host in $(cat hosts.txt); do \
    (ssh $host "sudo bash aws_bootstrap.sh $(head -1 hosts.txt)" 2>&1 | tee /tmp/aws_bootstrap_$host.log) & done; \
    wait; \
    echo "Bootstrap done"
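The loop above prints "Bootstrap done" even if the script failed on one of the hosts, because a bare wait does not report individual exit codes. The following is a sketch of a variant that waits for each background job separately and reports per-host status. The function names bootstrap_host and run_bootstrap are hypothetical, and unlike the loop above this version writes output only to the log files.

```shell
# bootstrap_host HOST ADMIN_HOST
# Runs the same remote command as the loop above on a single host.
bootstrap_host() {
  ssh "$1" "sudo bash aws_bootstrap.sh $2"
}

# run_bootstrap HOSTS_FILE
# Hypothetical helper: bootstraps all hosts in parallel, logs each host to
# /tmp/aws_bootstrap_<host>.log, and returns non-zero if any host failed.
run_bootstrap() {
  hosts_file=$1
  admin=$(head -n 1 "$hosts_file")
  job_list=""
  for host in $(cat "$hosts_file"); do
    bootstrap_host "$host" "$admin" > "/tmp/aws_bootstrap_$host.log" 2>&1 &
    job_list="$job_list $host:$!"   # remember which PID belongs to which host
  done
  failed=0
  for job in $job_list; do
    host=${job%:*}
    pid=${job##*:}
    if wait "$pid"; then
      echo "$host: ok"
    else
      echo "$host: FAILED (see /tmp/aws_bootstrap_$host.log)"
      failed=1
    fi
  done
  return "$failed"
}
```

Running run_bootstrap hosts.txt then gives one status line per host, and the return code can be used to stop before the deploy step.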
Configure and deploy
The instances are now up and running. The next step is to configure Vespa using an example application package - on the administration node, do:
Configure the multi-node application:
An example application is found at github.com/vespa-engine/sample-apps/album-recommendation-selfhosted.
$ export VESPA_HOME=/opt/vespa; export PATH=$PATH:$VESPA_HOME/bin
$ git clone https://github.com/vespa-engine/sample-apps.git
$ cd sample-apps/album-recommendation-selfhosted
Add host aliases in src/main/application/hosts.xml (replace hostnames with Private DNS records):
<?xml version="1.0" encoding="utf-8" ?>
<!-- Copyright 2017 Yahoo Holdings. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root. -->
<hosts>
    <host name="ip-172-1-1-1.us-east-2.compute.internal">
        <alias>admin0</alias>
    </host>
    <host name="ip-172-1-1-2.us-east-2.compute.internal">
        <alias>stateless0</alias>
    </host>
    <host name="ip-172-1-1-3.us-east-2.compute.internal">
        <alias>stateless1</alias>
    </host>
    <host name="ip-172-1-1-4.us-east-2.compute.internal">
        <alias>content0</alias>
    </host>
    <host name="ip-172-1-1-5.us-east-2.compute.internal">
        <alias>content1</alias>
    </host>
</hosts>
Configure service-to-host mappings in src/main/application/services.xml - in this example one configserver, two stateless containers for search and feed processing and two content nodes storing data with a redundancy of 2:
<?xml version="1.0" encoding="utf-8" ?>
<!-- Copyright 2017 Yahoo Holdings. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root. -->
<services version="1.0">
    <admin version="2.0">
        <adminserver hostalias="admin0"/>
        <configservers>
            <configserver hostalias="admin0"/>
        </configservers>
    </admin>
    <container id="container" version="1.0">
        <document-api/>
        <search/>
        <nodes>
            <node hostalias="stateless0"/>
            <node hostalias="stateless1"/>
        </nodes>
    </container>
    <content id="music" version="1.0">
        <redundancy>2</redundancy>
        <documents>
            <document type="music" mode="index"/>
        </documents>
        <nodes>
            <node hostalias="content0" distribution-key="0"/>
            <node hostalias="content1" distribution-key="1"/>
        </nodes>
    </content>
</services>
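Since the Private DNS records are already listed in hosts.txt, hosts.xml does not have to be edited by hand. The following is a sketch that generates it, assuming hosts.txt holds one hostname per line in the order admin, stateless, stateless, content, content. The function name gen_hosts_xml is hypothetical, and the output omits the license comment of the sample file.

```shell
# gen_hosts_xml HOSTS_FILE
# Hypothetical helper: prints a hosts.xml that assigns the aliases used in
# this guide to the hosts in HOSTS_FILE, in file order.
gen_hosts_xml() {
  hosts_file=$1
  # Remaining aliases are kept in the positional parameters.
  set -- admin0 stateless0 stateless1 content0 content1
  echo '<?xml version="1.0" encoding="utf-8" ?>'
  echo '<hosts>'
  while read -r host; do
    [ -n "$host" ] || continue          # skip blank lines
    if [ $# -eq 0 ]; then
      echo "more hosts than aliases in $hosts_file" >&2
      return 1
    fi
    printf '    <host name="%s">\n        <alias>%s</alias>\n    </host>\n' "$host" "$1"
    shift                               # move on to the next alias
  done < "$hosts_file"
  echo '</hosts>'
}
```

From the application directory, this could be used as gen_hosts_xml ~/hosts.txt > src/main/application/hosts.xml, adjusting the path to wherever hosts.txt was created.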
Deploy the application:
$ vespa-deploy prepare src/main/application && vespa-deploy activate
The Vespa instances subscribe to configuration, and once the application is deployed, the configured services start. There is no need to restart Vespa or the configserver. When changing the application, just deploy again and the system will apply the change.
Ensure the application is active - wait for a 200 OK response:
$ curl -s --head http://localhost:8080/ApplicationStatus
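Waiting for the 200 OK response can be automated with a polling loop. The following is a sketch under the assumption that curl is available on the host. The function name wait_for_200 and the default of 30 attempts with 5 seconds between them are arbitrary choices, not values from the Vespa documentation.

```shell
# wait_for_200 URL [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: polls URL with a HEAD request until it answers 200.
wait_for_200() {
  url=$1
  attempts=${2:-30}
  delay=${3:-5}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -w '%{http_code}' prints only the status code; the headers are discarded.
    code=$(curl -s --head -o /dev/null -w '%{http_code}' "$url")
    if [ "$code" = 200 ]; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "no 200 OK from $url after $attempts attempts (last status: $code)" >&2
  return 1
}
```

For the check above, the call would be wait_for_200 http://localhost:8080/ApplicationStatus.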
Inspect & Verify:
Inspect the system state using vespa-get-cluster-state, making sure services are 'up':
$ vespa-get-cluster-state
Cluster music:
music/distributor/0: up
music/distributor/1: up
music/storage/0: up
music/storage/1: up
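The 'up' check can be scripted instead of read by eye. The following sketch assumes the output has exactly the format shown above, with one "cluster/type/index: state" line per node and no extra formatting. The function name all_up is hypothetical.

```shell
# all_up
# Hypothetical helper: reads vespa-get-cluster-state output on stdin and
# succeeds only if at least one node line is present and all of them say "up".
all_up() {
  awk '
    # A node line has two fields, the first being cluster/type/index:
    NF == 2 && split($1, parts, "/") == 3 {
      nodes++
      if ($2 != "up") {
        bad++
        print "not up: " $0
      }
    }
    END {
      if (nodes == 0) {
        print "no node lines found"
        exit 1
      }
      exit (bad > 0)
    }
  '
}
```

Usage would be vespa-get-cluster-state | all_up && echo "all services up".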
Feed documents and search
Feed sample documents:
$ curl -s -H "Content-Type:application/json" --data-binary \
    @src/test/resources/A-Head-Full-of-Dreams.json \
    http://localhost:8080/document/v1/mynamespace/music/docid/1
$ curl -s -H "Content-Type:application/json" --data-binary \
    @src/test/resources/Love-Is-Here-To-Stay.json \
    http://localhost:8080/document/v1/mynamespace/music/docid/2
$ curl -s -H "Content-Type:application/json" --data-binary \
    @src/test/resources/Hardwired...To-Self-Destruct.json \
    http://localhost:8080/document/v1/mynamespace/music/docid/3
Use the Public DNS records for the stateless containers and run queries from outside the EC2 instances:
$ curl -s "http://<stateless0-public-dns>:8080/search/?ranking=rank_albums&yql=select%20%2A%20from%20sources%20%2A%20where%20sddocname%20contains%20%22music%22%3B&ranking.features.query(user_profile)=%7B%7Bcat%3Apop%7D%3A0.8%2C%7Bcat%3Arock%7D%3A0.2%2C%7Bcat%3Ajazz%7D%3A0.1%7D"
$ curl -s "http://<stateless1-public-dns>:8080/search/?ranking=rank_albums&yql=select%20%2A%20from%20sources%20%2A%20where%20sddocname%20contains%20%22music%22%3B&ranking.features.query(user_profile)=%7B%7Bcat%3Apop%7D%3A0.8%2C%7Bcat%3Arock%7D%3A0.2%2C%7Bcat%3Ajazz%7D%3A0.1%7D"
Read more in the Search API.
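The percent-encoded query strings above are hard to read and edit. The following sketch builds the same query string from readable values, assuming a POSIX shell with awk. The function name urlencode is a hypothetical helper, and it encodes every byte except letters, digits, and the characters - . _ ~.

```shell
# urlencode STRING
# Hypothetical helper: percent-encodes STRING for use in a URL query parameter.
urlencode() {
  # The value is passed through the environment so awk does not interpret
  # backslash escapes in it; LC_ALL=C makes awk work byte by byte.
  LC_ALL=C URLENCODE_INPUT="$1" awk 'BEGIN {
    s = ENVIRON["URLENCODE_INPUT"]
    for (i = 1; i < 256; i++) ord[sprintf("%c", i)] = i   # byte -> code table
    out = ""
    for (i = 1; i <= length(s); i++) {
      c = substr(s, i, 1)
      if (c ~ /[A-Za-z0-9._~-]/) out = out c
      else out = out sprintf("%%%02X", ord[c])
    }
    print out
  }'
}

# The readable form of the query used in this guide.
yql='select * from sources * where sddocname contains "music";'
profile='{{cat:pop}:0.8,{cat:rock}:0.2,{cat:jazz}:0.1}'
query="ranking=rank_albums&yql=$(urlencode "$yql")&ranking.features.query(user_profile)=$(urlencode "$profile")"
echo "$query"
```

The resulting string can be appended to the search URL, for example curl -s "http://<stateless0-public-dns>:8080/search/?$query".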