Machines should work: Using Ansible for IT VM provisioning

What: System provisioning
Why: Faster development startup
How: Using Ansible modules for provisioning via ssh

During the What the Data hackathon, a friend of mine and I used Vagrant to set up the same system for both of us. It was a perfect starting point. However, while hacking we installed this package, tried out that package, and so on. After a short while, the two systems were no longer identical. This post is a first attempt at using Ansible to keep two machines in sync during development.

Ansible works by using a central machine (the control machine) to provision other machines (remote machines) via SSH. In this tutorial, the control machine is assumed to be a Linux machine with Python < 3 installed, and the remote machine is a Vagrant VM running Debian (see here for the Vagrantfile).

Additional resources

See also: this blog

Ansible Installation

In principle, it should be possible to install Ansible from source (GitHub). However, I already had Anaconda with Python 3 installed on the control machine, which complicated things a little, because Ansible (at the time of writing) only works with Python < 3. After some trial and error, I installed Ansible using pip. You have to install the dependencies first:

If you get an error message about a missing libffi, install python-cffi first.

Install Ansible via pip:

Setup Ansible

Ansible needs a so-called inventory file that specifies which machines should be provisioned.
Create a file called hosts and add a group name in brackets, followed by the ip, port, … of every machine you want to provision. If you use the Vagrantfile from here, the ip should be 10.0.2.2 (the default gateway in Vagrant for public networks). In other cases, or if no connection can be established, change the ip accordingly. You can find it by running the following command on the remote machine and looking for the address listed as the default gateway.
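One way to do this on the Debian VM, assuming the iproute2 `ip` tool is installed (older setups can use `route -n` and read the Gateway column instead): the route line starting with `default` carries the gateway address. The `default_gateway` helper is a hypothetical name, introduced only to make the filtering explicit.

```shell
# Print the default gateway from `ip route` output read on stdin.
# A default route line looks like: "default via 10.0.2.2 dev eth0"
default_gateway() {
  awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# On the remote machine (e.g. inside `vagrant ssh`):
ip route | default_gateway
```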

With the correct ip, create the inventory file:
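A minimal inventory for the single Vagrant box might look like this, written from the shell with a heredoc. The group name `vagrantbox` is a hypothetical placeholder, and the port (22) and user (`vagrant`) are assumptions to adjust to your setup. The `ansible_ssh_*` variables are the Ansible 1.x inventory parameter names; Ansible 2.0 renamed them to `ansible_host`, `ansible_port` and `ansible_user`.

```shell
# Write the inventory file "hosts" into the current directory.
# [vagrantbox] is the group name used to address the VM in Ansible commands.
cat > hosts <<'EOF'
[vagrantbox]
10.0.2.2 ansible_ssh_port=22 ansible_ssh_user=vagrant
EOF
```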

Ping

Ansible uses key-based authentication by default. For simplicity, I switched to password-based authentication in my use case, which is enabled with --ask-pass. Otherwise, SSH keys have to be set up. Password-based authentication requires sshpass to be installed on the control machine.

Ansible comes with a predefined module for checking the availability of remote machines: ping. Use it with the -m option:

You will be asked for the SSH password of the remote machine (for the Vagrant box it is vagrant), and after that the output should be equivalent to:

Installing Python 3, pip and pandas on the remote machine

Ansible works by executing so-called playbooks for different scenarios.

Let's create a playbook that uses apt to install Python 3 and pip on the remote machine and then uses pip to install Python packages (pandas in this case). Create a file called python-datascience.yml with the following content (or download it here):
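A sketch of such a playbook, written from the shell with a heredoc. It is not necessarily identical to the downloadable file. It assumes the Debian package names `python3` and `python3-pip`, targets `all` inventory hosts, and uses `become` for root privileges (available since Ansible 1.9; older versions use `sudo: yes`). The `executable` parameter of the pip module points at `pip3`, so pandas is installed for Python 3 rather than Python 2.

```shell
cat > python-datascience.yml <<'EOF'
---
- hosts: all
  become: yes            # apt and system-wide pip need root on the VM
  tasks:
    - name: Install Python 3 and pip via apt
      apt:
        name: "{{ item }}"
        state: present
        update_cache: yes
      with_items:
        - python3
        - python3-pip

    - name: Install pandas for Python 3 via pip
      pip:
        name: pandas
        executable: pip3
EOF
```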

Run this playbook in a way similar to the ping above:

The output should be:

Log in to the remote machine to check the result. Python 3 and pandas should now be installed.
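A quick check to run on the VM, e.g. after `vagrant ssh`. It only reports whether `python3` can import pandas; it does not change anything.

```shell
# Show the Python 3 version installed by apt
python3 --version

# Try importing pandas with Python 3 and report the result
if python3 -c 'import pandas; print("pandas " + pandas.__version__)'; then
  echo "pandas is available for Python 3"
else
  echo "pandas is NOT importable with python3" >&2
fi
```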