Setting up a personal VPN to access AWS instances

The goal of this post is to lay the groundwork for running a Spark driver against a Spark cluster in an AWS VPC, specifically by setting up a VPN to access the VPC's instances.

There are a few different ways of connecting to instances running within a VPC; however, most only allow connections to be initiated from one side. These include public IPs and SSH tunneling. Some, like public IPs, can also carry security risks if you're not careful with things like security group restrictions. The Spark driver, however, requires connections to be initiated in both directions, with the master connecting back to the driver's host.

This post will cover:

  • Setting up a VPC
  • Setting up a DNS forwarder
  • Setting up a VPN
  • Configuring the VPN and VPC to work correctly

Setting up a VPC

In your AWS account, go to VPC and then Start VPC Wizard. In the first step, simply select Create.

In the second step, leave everything at its default unless your local network uses the 10.0.0.0/16 IP range, in which case you need to change it (for example, to 10.1.0.0/16).

image1.png
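If you're unsure whether your local network collides with the wizard's default range, a quick check along these lines may help. This is a minimal sketch using Python's standard ipaddress module; the function name and the sample local networks are illustrative assumptions, not part of the AWS setup.

```python
import ipaddress

# The default VPC range offered by the wizard, as mentioned above.
DEFAULT_VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")

def overlaps_default_vpc(local_cidr: str) -> bool:
    """Return True if the given local network overlaps 10.0.0.0/16."""
    return ipaddress.ip_network(local_cidr).overlaps(DEFAULT_VPC_CIDR)

# Hypothetical local networks, for illustration only.
print(overlaps_default_vpc("192.168.1.0/24"))  # False: the default is fine
print(overlaps_default_vpc("10.0.5.0/24"))     # True: pick another VPC range
```

If the check returns True, choose a VPC range that does not overlap and use that range wherever 10.0.0.0/16 appears in the rest of this post.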

This will create a VPC that allocates public hostnames and IPs to its instances. This option allows for faster S3 access and removes the need for a NAT instance for internet access. At the same time, you need to ensure that the security group restrictions are well formulated and do not allow arbitrary access to your instances.

Then select Create VPC.

Modify the subnet (VPC->Subnets) to enable auto-assigned public IPs.

Setting up a DNS forwarder

The AWS DNS server does two things that external DNS servers do not: it resolves the private IP hostnames (e.g. ip-172-30-2-16.ec2.internal) and it resolves the public hostnames to their private IPs. Both are vital for connecting to a Spark cluster over a VPN connection (the former if the cluster is using private hostnames and the latter if it's using public hostnames). More specifically, Spark requires the hostname you use to connect to the master to be the same as the one the master was configured with; as a result, you cannot simply use the master's private IP to connect to it.
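The private hostnames simply encode the instance's private IP, which makes them easy to construct when you know the address. The sketch below illustrates the naming pattern; the helper name is hypothetical, and the ec2.internal suffix is the one shown in the example above (other regions use a different suffix, so treat the default as an assumption).

```python
import ipaddress

def private_hostname(private_ip: str, suffix: str = "ec2.internal") -> str:
    """Build the ip-A-B-C-D.<suffix> name for a private IPv4 address."""
    ip = ipaddress.IPv4Address(private_ip)  # raises ValueError on bad input
    return "ip-" + str(ip).replace(".", "-") + "." + suffix

print(private_hostname("172.30.2.16"))  # ip-172-30-2-16.ec2.internal
```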

So, when you try to connect to “spark://ec2-52-5-111-98.compute-1.amazonaws.com:7077”, the DNS server will resolve ec2-52-5-111-98.compute-1.amazonaws.com to 10.0.0.40. The VPN will then pass the connection through, since the address is in the IP range allocated to the VPC.

The issue is that the internal AWS DNS server will not accept connections from outside EC2, even if they come over a VPN. The best way to get around this is to set up a small DNS forwarding server that forwards requests to the AWS DNS server. If you want to know more about setting up DNS forwarding and these options then there is a great guide here. As a note, despite being somewhat of a hack, a forwarding DNS server was the officially recommended solution when we spoke to an Amazon representative.
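To see why the resolved address matters, here is a minimal sketch of the decision described above: only destinations inside the VPC range go over the VPN. The function name is hypothetical, and the check only models the behaviour (it does not query DNS or the VPN client).

```python
import ipaddress

# The VPC range used throughout this post.
VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")

def routed_over_vpn(resolved_ip: str) -> bool:
    """Return True if the address falls inside the VPC range."""
    return ipaddress.ip_address(resolved_ip) in VPC_CIDR

# Private address returned by the AWS DNS server for the master's hostname.
print(routed_over_vpn("10.0.0.40"))    # True: goes through the VPN
# Public address an external DNS server would return for the same hostname.
print(routed_over_vpn("52.5.111.98"))  # False: bypasses the VPN
```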

Create a new Ubuntu micro instance in AWS.

Update packages and install bind:

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install bind9 bind9utils bind9-doc

Modify the bind configuration:

sudo nano /etc/bind/named.conf.options

This should be the configuration:

acl goodclients {
    0.0.0.0/0;
    localhost;
    localnets;
};

options {
    directory "/var/cache/bind";

    recursion yes;
    allow-query { goodclients; };

    forwarders {
        10.0.0.2;
    };
    forward only;

    dnssec-enable yes;
    dnssec-validation yes;

    auth-nxdomain no; # conform to RFC1035
    listen-on-v6 { any; };
};

Check that the configuration has no errors:

sudo named-checkconf

Restart bind:

sudo service bind9 restart
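The forwarder address 10.0.0.2 in the configuration above is the AWS DNS server, which sits at the base of the VPC range plus two. If you changed the VPC range in the wizard, the forwarder needs to change with it. A small sketch of that calculation, with a hypothetical helper name:

```python
import ipaddress

def vpc_dns_address(vpc_cidr: str) -> str:
    """Return the AWS DNS server address for a VPC: network base + 2."""
    network = ipaddress.ip_network(vpc_cidr)
    return str(network.network_address + 2)

print(vpc_dns_address("10.0.0.0/16"))  # 10.0.0.2
print(vpc_dns_address("10.1.0.0/16"))  # 10.1.0.2
```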

Setting up a VPN

We'll be using OpenVPN, launched from its AWS Marketplace listing:
https://aws.amazon.com/marketplace/pp/B00MI40CAE

The full guide to setting it up is here:
https://docs.openvpn.net/how-to-tutorialsguides/virtual-platforms/amazon-ec2-appliance-ami-quick-start-guide/

Select the VPC created previously, the instance size you want (we're going with a t2.micro instance), and the default security group settings.

Select “Launch with 1 Click”

Configuring the VPN

Once the instance is created, SSH into it with the username “openvpnas” and the SSH key you selected. Answer the questions that come up:

Please enter 'yes' to indicate your agreement [no]: yes

Will this be the primary Access Server node?
(enter 'no' to configure as a backup or standby node)
> Press ENTER for default [yes]: yes

Please specify the network interface and IP address to be
used by the Admin Web UI:
(1) all interfaces: 0.0.0.0
(2) eth0: 10.0.0.250
Please enter the option number from the list above (1-2).
> Press Enter for default [2]: 1

Please specify the port number for the Admin Web UI.
> Press ENTER for default [943]: 943

Please specify the TCP port number for the OpenVPN Daemon
> Press ENTER for default [443]: 443

Should client traffic be routed by default through the VPN?
> Press ENTER for default [no]: no

Should client DNS traffic be routed by default through the VPN?
> Press ENTER for default [no]: yes

Use local authentication via internal DB?
> Press ENTER for default [yes]: yes

Private subnets detected: ['10.0.0.0/16']

Should private subnets be accessible to clients by default?
> Press ENTER for EC2 default [yes]: yes

Do you wish to login to the Admin UI as "openvpn"?
> Press ENTER for default [yes]: no

Create a username that you want to use for administration of the VPN server and a password for that user.

Go to the admin webpage and log in using the username/password you just created to continue the configuration:
https://<ip>:943/admin

Configure the VPN server to use routing instead of NAT:

In addition, the VPN server's DNS settings need to be updated to point to the forwarding DNS server that was previously created:

Configuring the VPC for the VPN

Select the VPN server instance and disable the Source/Dest Check. This allows traffic to be routed to the instance for IPs that aren't assigned to it, specifically the IPs of the VPN clients that are connected to it:

Create a new route for the subnet and VPC we're using to send traffic for the VPN's client IP range to the VPN instance:
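The effect of this route can be modelled as a most-specific-prefix lookup. The sketch below is only an illustration: the target labels are made up, and the client range 172.27.224.0/20 is an assumption, so check the range your VPN server actually hands out before creating the route.

```python
import ipaddress

def next_hop(routes, destination):
    """Return the target of the most specific route matching destination."""
    dest = ipaddress.ip_address(destination)
    best = None
    for cidr, target in routes.items():
        network = ipaddress.ip_network(cidr)
        if dest in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, target)
    return best[1] if best else None

# Simplified route table for the public subnet (targets are labels only).
routes = {
    "10.0.0.0/16": "local",
    "0.0.0.0/0": "internet-gateway",
}
VPN_CLIENT_RANGE = "172.27.224.0/20"  # assumption: use your server's client range

# Without the new route, replies to a VPN client leave via the default route.
print(next_hop(routes, "172.27.224.5"))  # internet-gateway

# With the route in place, the same reply is handed to the VPN instance.
routes[VPN_CLIENT_RANGE] = "vpn-instance"
print(next_hop(routes, "172.27.224.5"))  # vpn-instance
```

This is why the master can connect back to the driver: traffic addressed to a VPN client IP is delivered to the VPN instance, which forwards it to the client.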

Add a rule to the VPN server security group to allow all local traffic of the VPC to connect to it.

Create a new security group that allows VPN access, and add it to any instances that should be reachable over the VPN (for Spark clusters, for example, through the additional security groups setting).

Connect to the VPN

Go to https://<ip>/ and log in with the administrator account/password, then follow the instructions to install the client VPN software.