Machines should work: Using Ansible for IT VM provisioning

What: System provisioning
Why: Faster development startup
How: Using Ansible modules for provisioning via ssh

During the "what the data" hackathon, a friend and I used Vagrant to set up the same system for both of us. It was a perfect starting point. However, while hacking we installed one package, tried out another, and so on. After a short while, the two systems were no longer identical. This post is a first attempt at using Ansible to keep two machines in sync during development.

Ansible works by using a central machine (the control machine) to provision other machines (the remote machines) via ssh. In this tutorial the control machine is assumed to be a Linux machine with Python < 3 installed, and the remote machine is a Vagrant VM running Debian (see here for the Vagrantfile).

Additional resources

See also: this blog

Ansible Installation

In principle, it should be possible to install Ansible from source (github). However, I already had Anaconda with Python 3 installed on the control machine, which complicated things a little because Ansible works only with Python < 3. After some trial and error, I installed Ansible using pip. Install the dependencies first:

sudo easy_install pip
sudo pip install paramiko PyYAML Jinja2 httplib2 six

If you get an error message about a missing libffi, install python-cffi first.

Install Ansible via pip:

sudo pip install ansible
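
When several Python installations live side by side (Anaconda with Python 3 next to a system Python 2, for example), it can help to check which interpreter each tool actually belongs to. The following is only a small sketch; show_version is a helper name made up for this example.

```shell
# Print the first version line of a tool, or a note when it is not on the PATH.
show_version() {
    if command -v "$1" >/dev/null 2>&1; then
        # Python 2 writes its version to stderr, hence the 2>&1.
        "$1" --version 2>&1 | head -n 1
    else
        echo "$1: not found on PATH"
    fi
}

show_version python
show_version pip       # this line also names the Python version pip is tied to
show_version ansible
```

If pip reports a Python 3 interpreter, the pip on your PATH is probably the Anaconda one and not the one Ansible needs here.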

Setup Ansible

You have to set up Ansible by providing a so-called inventory file, which specifies the machines that should be provisioned.
Create a file called hosts and add a group name in brackets, followed by the IP, port, … for each machine you want to provision. If you use the Vagrantfile from here, the IP should be 10.0.2.2 (default gateway in Vagrant for public networks). Otherwise, or if no connection can be established, change the IP accordingly. You can find the IP by running the following command on the remote machine and looking for the address listed as the default gateway IP.

/sbin/ifconfig
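
ifconfig lists the interface addresses but does not always show the gateway itself. One other way to read the default gateway, assuming the iproute2 ip tool is installed on that machine, is sketched below. default_gateway is a helper name made up for this example.

```shell
# Read routing table lines on stdin and print the address of the first
# default route, i.e. the third field of "default via <address> ...".
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Only query the routing table when the ip tool is actually available.
if command -v ip >/dev/null 2>&1; then
    ip route show default | default_gateway
fi
```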

With the correct IP, create the inventory file:

cat <<EOF > /home/vagrant/ansible/hosts
[ansibletest]
10.0.2.2 ansible_port=2200 ansible_user=vagrant
EOF
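
If the address or port is likely to change, the inventory can also be generated from a few shell variables. This is a minimal sketch: the variable names and the ./ansible directory are placeholders chosen for illustration, and the values are the ones used in this post.

```shell
# Placeholder settings; adjust them to your own setup.
INVENTORY_DIR="./ansible"
VM_IP="10.0.2.2"
VM_PORT="2200"
VM_USER="vagrant"

mkdir -p "$INVENTORY_DIR"

# The unquoted EOF delimiter lets the shell expand the variables
# before the text is written to the file.
cat > "$INVENTORY_DIR/hosts" <<EOF
[ansibletest]
$VM_IP ansible_port=$VM_PORT ansible_user=$VM_USER
EOF

# Show the result for a quick visual check.
cat "$INVENTORY_DIR/hosts"
```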

Ping

Ansible uses key-based authentication by default. For simplicity, I switched it off in my use case and used password-based authentication instead, which is enabled by --ask-pass. Otherwise, the keys have to be set up. Password-based authentication requires sshpass to be installed.
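
For readers who prefer to keep the default key-based authentication, the keys could be set up roughly as follows. This is a sketch and was not part of the setup described here. It assumes ssh-keygen and ssh-copy-id from OpenSSH are installed, the key file name is hypothetical, and the key has no passphrase, which is only reasonable for a throwaway VM. As written, the script only prints the commands; set DRY_RUN to 0 to execute them.

```shell
KEY_FILE="$HOME/.ssh/ansible_vagrant"   # hypothetical key file name
DRY_RUN=1                               # change to 0 to really run the commands

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_key_auth() {
    # Create a new RSA key pair with an empty passphrase.
    run ssh-keygen -t rsa -b 4096 -N "" -f "$KEY_FILE"
    # Install the public key on the remote machine; this asks for the
    # vagrant password one last time.
    run ssh-copy-id -i "$KEY_FILE.pub" -p 2200 vagrant@10.0.2.2
}

setup_key_auth
```

Once the key is installed, Ansible can be pointed at it with --private-key instead of --ask-pass.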

Ansible comes with a predefined module for checking the availability of remote machines: ping. Use it with the -m option:

ansible ansibletest -i /home/vagrant/ansible/hosts -m ping --ask-pass

You will be asked for the password of the remote machine (it is vagrant). After that, the output should look like this:

10.0.2.2 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}

Installing Python3, pip and Pandas on the remote machine

Ansible works by executing so-called playbooks for different scenarios.

Let's create a playbook that uses apt to install Python 3 and pip on the remote machine and then uses pip to install Python packages (Pandas in this case). Create a file called python-datascience.yml with the following content (or download it here):

---
- name: install python for data science
  hosts: ansibletest
  become: true

  tasks:
    - name: Get aptitude for upgrade
      apt: pkg=aptitude state=present

    - name: install base packages
      apt: pkg={{item}} state=present
      with_items:
        - python3
        - python3-pip

    - name: Install global python requirements
      pip: name={{item}} state=present executable=pip3
      with_items:
        - pandas

Run this playbook in a way similar to the ping above (pass the inventory file with -i as well, unless it is in Ansible's default location):

ansible-playbook python-datascience.yml --ask-pass

The output should look like this:

PLAY [install python for data science] *****************************************
 
TASK [setup] *******************************************************************
ok: [10.0.2.2]
 
TASK [Get aptitude for upgrade] ************************************************
ok: [10.0.2.2]
 
TASK [install base packages] ***************************************************
ok: [10.0.2.2] => (item=[u'python3', u'python3-pip'])
 
TASK [Install global python requirements] **************************************
ok: [10.0.2.2] => (item=pandas)
 
PLAY RECAP *********************************************************************
10.0.2.2                   : ok=4    changed=0    unreachable=0    failed=0
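
The recap line is also a handy indicator of whether a machine already matches the playbook: changed=0 means nothing had to be installed. Below is a small sketch of how that check could be scripted. recap_in_sync is a helper name made up for this example, and it assumes the recap format shown above.

```shell
# Read ansible-playbook output on stdin. Succeed only if the PLAY RECAP
# lists at least one host and every host reports changed=0,
# unreachable=0 and failed=0.
recap_in_sync() {
    awk '
        /^PLAY RECAP/ { in_recap = 1; next }
        in_recap && NF {
            hosts++
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                if ((kv[1] == "changed" || kv[1] == "unreachable" || kv[1] == "failed") && kv[2] != 0)
                    drift++
            }
        }
        END { exit (hosts > 0 && drift == 0) ? 0 : 1 }
    '
}
```

A possible use, assuming the password prompt does not interfere with the pipe (for example with key-based authentication), is to pipe the output of ansible-playbook into recap_in_sync and print a message when it succeeds.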

Log in to the remote machine to check the result. Python 3 and pandas should now be installed.