This guide explains how to set up a multinode Vespa system on AWS EC2 instances. The tasks involved are similar to setting up a single-node system on CentOS. See Getting Started for troubleshooting, next steps and other guides. Also see the guide on using ECS to set up a multinode Docker-based system.
Prerequisites:
Navigate to the AWS EC2 console and select the correct region.
Configure an AWS security group that limits access to the installation and allows the launched EC2 instances to talk to each other without restrictions - refer to securing a Vespa installation. Create the security group first, then add three inbound rules:
Custom TCP for port range 8080 (the feed & search port) from your source IP range. For testing, you can choose "My IP" from the Source drop-down menu.
SSH from your source IP range. You need SSH access on all instances to install and configure Vespa.
Custom TCP for port range 0-65535, specifying the name of the current Security Group as the Source. The group must already exist before it can reference itself - this is why the incomplete group was created first. (An AWS CLI alternative is sketched below.)
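For scripted setups, a minimal sketch of the same group and rules using the AWS CLI, assuming the CLI is configured for your account and region; the group name vespa-quickstart and <your-ip> are illustrative placeholders:
$ # Create the group first - it must exist before it can reference itself
$ aws ec2 create-security-group --group-name vespa-quickstart \
    --description "Vespa multinode quick-start"
$ # Feed & search port from your source IP range
$ aws ec2 authorize-security-group-ingress --group-name vespa-quickstart \
    --protocol tcp --port 8080 --cidr <your-ip>/32
$ # SSH from your source IP range
$ aws ec2 authorize-security-group-ingress --group-name vespa-quickstart \
    --protocol tcp --port 22 --cidr <your-ip>/32
$ # All TCP traffic between instances in the same group
$ aws ec2 authorize-security-group-ingress --group-name vespa-quickstart \
    --protocol tcp --port 0-65535 --source-group vespa-quickstart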
Under Instances, click the Launch instance button and follow the wizard steps:
Choose Linux/Unix, CentOS 7 | 64-bit as the Amazon Machine Image (AMI) - there are both commercial and community-based AMIs available.
The official CentOS 7 AMI on the AWS marketplace has been verified to work.
t2.medium is sufficient for this guide; for production setups, see the sizing guide.
Enter 5 in the Number of instances field.
The default of 8 GiB SSD is sufficient for running this quick-start. Optionally select Delete on termination if you know you won't need the data stored in Vespa after you terminate the instances.
You do not have to add any tags for this quick-start.
Choose Select an existing security group and select the security group you created earlier. You should see the expected inbound rules towards the bottom of the page.
Click Launch.
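The same launch can be scripted - a sketch using the AWS CLI, where the AMI ID, key name and security group ID are placeholders to substitute with your own values:
$ aws ec2 run-instances --image-id <centos7-ami-id> \
    --instance-type t2.medium --count 5 \
    --key-name my-aws-key --security-group-ids <security-group-id>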
This example uses ip-172-1-1-1.us-east-2.compute.internal as the administration node; all commands below are run from that host.
Locate the Public DNS record for that instance in EC2 Dashboard -> Instances, and log on to that host (e.g. ec2-18-1-1-1.us-east-2.compute.amazonaws.com). Using SSH agent forwarding simplifies logging on to the other instances later. Given your AWS identity key file my-aws-key.pem, and assuming the AMI uses centos as the login user:
$ ssh-add my-aws-key.pem
$ ssh -A centos@<administration-node-public-dns>

Inside the AWS EC2 cloud, bootstrap the system. Create hosts.txt with the Private DNS records for the EC2 instances allocated above; these are located in EC2 Dashboard -> Instances.
Let the administration node be the first entry. Example:
$ cat hosts.txt
ip-172-1-1-1.us-east-2.compute.internal
ip-172-1-1-2.us-east-2.compute.internal
ip-172-1-1-3.us-east-2.compute.internal
ip-172-1-1-4.us-east-2.compute.internal
ip-172-1-1-5.us-east-2.compute.internal
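Optionally verify that agent forwarding gives password-less access to all hosts before proceeding - BatchMode makes ssh fail fast instead of prompting:
$ for host in $(cat hosts.txt); do \
    ssh -o BatchMode=yes $host hostname; done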
Download aws_bootstrap.sh and copy it to all instances:

$ curl -s https://raw.githubusercontent.com/vespa-engine/sample-apps/master/aws_bootstrap.sh \
    > aws_bootstrap.sh
$ for host in $(cat hosts.txt); do \
    scp aws_bootstrap.sh $host:.; done
Run the script on all instances, with the administration node (the first entry in hosts.txt) as the argument:

$ for host in $(cat hosts.txt); do \
    (ssh $host "sudo bash aws_bootstrap.sh $(head -1 hosts.txt)" 2>&1 \
    | tee /tmp/aws_bootstrap_$host.log) & done; \
  wait; \
  echo "Bootstrap done"
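Each host's bootstrap output is logged by the loop above; to scan the logs for problems:
$ grep -iE "error|fail" /tmp/aws_bootstrap_*.log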
The instances are now up and running. The next step is to configure Vespa using an example application package - on the administration node, do:
Configure the multi-node application:
An example application is found at github.com/vespa-engine/sample-apps/album-recommendation.
$ export VESPA_HOME=/opt/vespa; export PATH=$PATH:$VESPA_HOME/bin
$ git clone https://github.com/vespa-engine/sample-apps.git
$ cd sample-apps/album-recommendation
Add host aliases in src/main/application/hosts.xml (replace hostnames with the Private DNS records):
<?xml version="1.0" encoding="utf-8" ?>
<!-- Copyright Yahoo. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root. -->
<hosts>
    <host name="ip-172-1-1-1.us-east-2.compute.internal">
        <alias>admin0</alias>
    </host>
    <host name="ip-172-1-1-2.us-east-2.compute.internal">
        <alias>stateless0</alias>
    </host>
    <host name="ip-172-1-1-3.us-east-2.compute.internal">
        <alias>stateless1</alias>
    </host>
    <host name="ip-172-1-1-4.us-east-2.compute.internal">
        <alias>content0</alias>
    </host>
    <host name="ip-172-1-1-5.us-east-2.compute.internal">
        <alias>content1</alias>
    </host>
</hosts>
Configure service-to-host mappings in src/main/application/services.xml. In this example: one config server, two stateless containers for search and feed processing, and two content nodes storing data with a redundancy of 2:
<?xml version="1.0" encoding="utf-8" ?>
<!-- Copyright Yahoo. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root. -->
<services version="1.0">
    <admin version="2.0">
        <adminserver hostalias="admin0"/>
        <configservers>
            <configserver hostalias="admin0"/>
        </configservers>
        <cluster-controllers standalone-zookeeper="true">
            <cluster-controller hostalias="stateless0"/>
        </cluster-controllers>
        <slobroks>
            <slobrok hostalias="admin0"/>
            <slobrok hostalias="stateless0"/>
            <slobrok hostalias="stateless1"/>
        </slobroks>
    </admin>
    <container id="container" version="1.0">
        <document-api/>
        <search/>
        <nodes>
            <node hostalias="stateless0"/>
            <node hostalias="stateless1"/>
        </nodes>
    </container>
    <content id="music" version="1.0">
        <redundancy>2</redundancy>
        <documents>
            <document type="music" mode="index"/>
        </documents>
        <nodes>
            <node hostalias="content0" distribution-key="0"/>
            <node hostalias="content1" distribution-key="1"/>
        </nodes>
    </content>
</services>
Deploy the application:
$ vespa-deploy prepare src/main/application && vespa-deploy activate
The Vespa instances subscribe to configuration; once the application is deployed, the configured set of services starts - there is no need to restart Vespa or the config server. When changing the application, just deploy again and the system will apply the change.
Verify that a container node is up - ApplicationStatus returns HTTP 200 OK once the services have started:

$ curl -s --head http://ip-172-1-1-3.us-east-2.compute.internal:8080/ApplicationStatus
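Services can take a minute or so to come up; a small polling sketch that waits for the 200 OK:
$ while ! curl -s --head http://ip-172-1-1-3.us-east-2.compute.internal:8080/ApplicationStatus \
    | grep -q "200 OK"; do sleep 5; done; echo "Container is up"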
Inspect & Verify:
Inspect the state using vespa-get-cluster-state, making sure services are 'up':
$ vespa-get-cluster-state
Cluster music:
music/distributor/0: up
music/distributor/1: up
music/storage/0: up
music/storage/1: up
Feed documents:
Feed sample documents:
$ curl -s -H "Content-Type:application/json" --data-binary \
    @src/test/resources/A-Head-Full-of-Dreams.json \
    http://ip-172-1-1-3.us-east-2.compute.internal:8080/document/v1/mynamespace/music/docid/a-head-full-of-dreams
$ curl -s -H "Content-Type:application/json" --data-binary \
    @src/test/resources/Love-Is-Here-To-Stay.json \
    http://ip-172-1-1-3.us-east-2.compute.internal:8080/document/v1/mynamespace/music/docid/love-is-here-to-stay
$ curl -s -H "Content-Type:application/json" --data-binary \
    @src/test/resources/Hardwired...To-Self-Destruct.json \
    http://ip-172-1-1-3.us-east-2.compute.internal:8080/document/v1/mynamespace/music/docid/hardwired-to-self-destruct
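To verify the feed, list the documents with a /document/v1 visit request against the same container node:
$ curl -s http://ip-172-1-1-3.us-east-2.compute.internal:8080/document/v1/mynamespace/music/docid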
Run queries:
Use the Public DNS records for the stateless containers and run queries from outside the EC2 instances:
$ curl -s "http://<stateless0-public-dns>:8080/search/?ranking=rank_albums&yql=select%20%2A%20from%20sources%20%2A%20where%20sddocname%20contains%20%22music%22&input.query(user_profile)=%7B%7Bcat%3Apop%7D%3A0.8%2C%7Bcat%3Arock%7D%3A0.2%2C%7Bcat%3Ajazz%7D%3A0.1%7D" $ curl -s "http://<stateless1-public-dns>:8080/search/?ranking=rank_albums&yql=select%20%2A%20from%20sources%20%2A%20where%20sddocname%20contains%20%22music%22&input.query(user_profile)=%7B%7Bcat%3Apop%7D%3A0.8%2C%7Bcat%3Arock%7D%3A0.2%2C%7Bcat%3Ajazz%7D%3A0.1%7D"
Read more in the Query API.