Category archive: General programming

Machines should work: Using Ansible for IT VM provisioning

What: System provisioning
Why: Faster development startup
How: Using Ansible modules for provisioning via ssh

During the What-the-data hackathon, a friend of mine and I used Vagrant to set up the same system for both of us. It was a perfect starting point. Nevertheless, during hacking we installed this package, tried out that package, and so on. After a short while, the two systems were no longer identical. This is a first approach to using Ansible to keep two machines in sync during development.

Ansible works by using a central machine (the control machine) to provision other machines (the remote machines) via SSH. In this tutorial the control machine is assumed to be a Linux machine with Python 2 installed, and the remote machine is a Vagrant VM running Debian (see here for the Vagrantfile).

Additional resources

See also: this blog

Ansible Installation

In principle, it should be possible to install Ansible from source (github). However, I already had Anaconda with Python 3 installed on the control machine, which complicated things a little because Ansible only works with Python 2. After some trial and error, I installed Ansible using pip. The dependencies have to be installed first:

sudo easy_install pip
sudo pip install paramiko PyYAML Jinja2 httplib2 six

If you get an error message about a missing libffi, install python-cffi first.
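On a Debian-based control machine this could look roughly as follows (the package names are an assumption and may differ on other distributions):

# assumption: Debian/Ubuntu package names for cffi and the libffi headers
sudo apt-get install python-cffi libffi-dev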

Install Ansible via pip:

sudo pip install ansible

Setup Ansible

You set up Ansible by providing a so-called inventory file that specifies which machines should be provisioned.
Create a file called hosts and add a name in brackets as well as IP, port, etc. for all the machines you want to provision. If you use the Vagrantfile from here, the IP should be 10.0.2.2 (the default gateway in Vagrant for public networks). In other cases, or if no connection can be established, change the IP accordingly. You can find the IP by running the following command on the remote machine and looking for the address listed as the default gateway.

/sbin/ifconfig
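If the gateway address is not obvious from the interface list, the routing table shows it directly; a small sketch, assuming the iproute2 tools are installed on the VM:

# the default route prints the gateway address after the word "via"
ip route show default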

With the correct IP, create the inventory file:

cat &lt;&lt;EOF &gt; /home/vagrant/ansible/hosts
[ansibletest]
10.0.2.2 ansible_port=2200 ansible_user=vagrant
EOF

Ping

Ansible uses key-based authentication by default. I switched it off in my use case for simplicity and use password-based authentication, which is enabled by --ask-pass. Otherwise, the keys have to be set up. For password-based authentication you have to install sshpass.
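If you want to keep key-based authentication instead, copying your public key to the VM could look roughly like this (a sketch, assuming an existing key pair under ~/.ssh and the port/user from the inventory above):

# assumption: a key pair already exists; port and user match the inventory entry
ssh-copy-id -p 2200 vagrant@10.0.2.2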

Ansible comes with a predefined module for checking the availability of remote machines: ping. Use it with the -m option:

ansible ansibletest -i hosts -m ping --ask-pass

You will be asked for the password of the vagrant user on the remote machine (it is vagrant), and after that the output should look like this:

10.0.2.2 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}

Installing Python 3, pip and pandas on the remote machine

Ansible works by executing so-called playbooks for different scenarios.

Let's create a playbook which uses apt to install Python 3 and pip on the remote machine and afterwards uses pip to install Python packages (pandas in this case). Create a file called python-datascience.yml with the following content (or download it here):

---
- name: install python for data science
  hosts: ansibletest
  become: true

  tasks:
    - name: Get aptitude for upgrade
      apt: pkg=aptitude state=present

    - name: install base packages
      apt: pkg={{item}} state=present
      with_items:
        - python3
        - python3-pip

    - name: Install global python requirements
      pip: name={{item}} state=present executable=pip3
      with_items:
        - pandas

Run this playbook in a way similar to the ping above:

ansible-playbook -i hosts python-datascience.yml --ask-pass

The output should be:

PLAY [install python for data science] *****************************************
 
TASK [setup] *******************************************************************
ok: [10.0.2.2]
 
TASK [Get aptitude for upgrade] ************************************************
ok: [10.0.2.2]
 
TASK [install base packages] ***************************************************
ok: [10.0.2.2] => (item=[u'python3', u'python3-pip'])
 
TASK [Install global python requirements] **************************************
ok: [10.0.2.2] => (item=pandas)
 
PLAY RECAP *********************************************************************
10.0.2.2                   : ok=4    changed=0    unreachable=0    failed=0

Log in to the remote machine to check the result. Python 3 and pandas should now be installed.
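A quick way to verify this from inside the VM (a sketch; the vagrant ssh step assumes it is run from the directory containing the Vagrantfile):

vagrant ssh
# inside the VM: import pandas and print its version
python3 -c "import pandas; print(pandas.__version__)"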

Let the sun shine

What: Getting power production data in a real-life photovoltaic system
Why: Use real data in modelling; use MQTT to get sensor data
How: Use MQTT, csvkit and R to get and clean the data; use a command-line pipeline for data processing

This is the second part of my What-the-data hackathon series. Please also visit part one.

During the hackathon, the data were provided in JSON format with a timestamp. We collected data for nearly two days without serious interruption. My team was looking at the power production of the photovoltaic system. We decided to gather all the data as-is and write it to disk (we were expecting < 100 MB of data). We used a small Python script to get the data via MQTT (a sketch is shown after the sample entries below). After the two days we had collected 2.7 MB of data for the power production (and a lot more for other sensor data), which is approx. 58,000 entries like:

b'{"value":-3724.951,"timestamp":1474813391427}'
b'{"value":-3746.143,"timestamp":1474813392417}'

You can download the file here.

We needed the data for a simulation, in which we simulated the energy transport in the system for each second. Most of the data were already timed at seconds, but some data points were missing due to sensor failures. Nevertheless, for our simulation we also needed values for these time points. Luckily, sunshine doesn't change too fast and a simple smoothing was enough to interpolate the missing values.

We wanted to process the data with R and maybe other tools, so we decided to convert the data to CSV and process that further. The main reasons for this decision were:

  • A CSV file can easily be put under version control (it is a text file)
  • A CSV file can be used in a ton of different tools

We used the following bash script to strip off characters and convert the single entries to a proper JSON array (assuming the raw data is in the file pv_wirk.raw and has the format described above):

sed "s/^b//g" pv_wirk.raw | sed "s/'//g" | \
sed "s/$/,/g" | sed '$s/},/}/g' | \
sed 's/value":-/value":/g' | sed "1 i \[" | sed -e "\$a]"

It does the following:

  1. Strips off the first character (b)
  2. Strips off the quote characters
  3. Adds a comma at the end of each line
  4. Removes the comma from the last line (a trailing comma is not valid JSON, and csvkit is picky about it)
  5. Removes the negative sign from the values, inverting them. This was needed because the produced power was sent with a negative sign.
  6. Adds a square bracket on the first line to begin the JSON array
  7. Adds a square bracket on the last line to end the JSON array (the result on the two sample entries is shown below)
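Applied to the two sample entries from above, the stream after these steps looks like this:

[
{"value":3724.951,"timestamp":1474813391427},
{"value":3746.143,"timestamp":1474813392417}
]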

The output of the above command is piped into csvkit's in2csv and converted from JSON to CSV:

... | in2csv -f json

The data is now in the form:

value,timestamp
4379.872,1474707052767
4388.521,1474707053773
4393.466,1474707054770

With the data in a nice CSV format, we used R to plot the data and interpolate it simply by using loess smoothing. Where data were missing, we imputed them with predicted values from the loess fit. You can find the R script here.
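The linked script is the authoritative version; the following is a minimal sketch of the idea, assuming the CSV produced above is stored as pv_wirk.csv (the file name and the span value are assumptions):

# minimal sketch; file name, second-rounding and span are assumptions
data <- read.csv("pv_wirk.csv")
data$seconds <- round(data$timestamp / 1000)            # timestamps are in milliseconds
fit <- loess(value ~ seconds, data = data, span = 0.1)  # smooth the power values
grid <- data.frame(seconds = seq(min(data$seconds), max(data$seconds)))
grid$value <- predict(fit, newdata = grid)              # one (smoothed) value per second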

The resulting image is shown below. The power curve still shows the natural fluctuations but now contains a value for each second.
[Image: analysed power production]

What the data hackathon 2016: Agents on the run

The What the data hackathon is over! It was an incredible weekend in Hamburg with a lot of fun, nerds and crazy ideas (and some barbecue as well). A friend of mine and I participated as team coding agents, and we were one of the prize-winning teams!


The hackathon was held in a smart house at the Energie-Campus Hamburg, full of sensors. The sensors included heating, photovoltaics, hydrolysis and methane production for power-to-gas conversion, a lot of heat sensors, power consumption/production sensors on many systems, RFID tags and readers, cameras, lights, and more. All the systems were nicely available through the MQTT protocol, and although the WiFi was off for some time and not all privileges to read/write data were granted from the beginning, the organizers did a really great job. Thanks so much! During the hackathon we learned a lot about MQTT, renewable energy, smart houses, IoT, sensors and man-machine interaction. It is definitely worth visiting the hackathon again next year. And to show some numbers: according to the organizers, 40 million data sets were transferred between the hackathon participants and the sensor system during the weekend.

This is the first of several articles about the hackathon. For further reading, please see part two.

Following are some of my personal highlights of the event:

  • A team using Alexa, a mobile phone or a smartwatch as an input device for querying machine state. If an RFID-tagged user was close to a machine, Alexa answered a simple command with a detailed explanation of the machine's state: CO2, O2 or methane concentration, temperature, and so on.

    [Image: Alexa demo]

  • A team using a laser scanner on the fridge to trigger a camera, which took a photo of the T-shirt of the person opening the fridge. The image was analysed with respect to T-shirt color and bound to the person's RFID ID. The next time this person was close to any of the RFID readers, the reader's color changed according to the T-shirt color. All the data were visualized with a small app. A nice and simple way to play capture the flag.

    [Image: capture the flag]

  • A team proposing blockchain technology to organize energy distribution based on the demand of individual agents. Maybe things like that can be done with smart contracts? The different parts of the system are independent agents; the hydrolysis machine was used as an example.

    [Image: blockchain]

  • A team using a smartwatch as an input device to control the movement of a robot arm. The movements included up/down, left/right and gripping/releasing items.

    [Image: robot arm]

It was very interesting to see location-based activation of services in different applications (capture the flag and Alexa). For sure, this is a quite powerful concept for serious applications as well. The use of Raspberry Pis in a lot of projects was also very interesting. As an additional plus: discussions with a lot of people about sensors, chips and how to use them for hobby projects.

Our own project was about simulating the energy distribution from the photovoltaic system on the roof down to the different stages of hydrolysis and methane production, to charging electric bicycles and electric cars. There was some time left at the end, and we invested it well by teaching the RFID sensors to blink in Morse code during our final presentation.

[Image: Morse code demo]

Some parts of our project will be highlighted here in future articles. Stay tuned!

Pythagorean triples: Do it right

What: Minimal lines of code for calculating the side lengths of integer-sided right triangles with side lengths below a given threshold
Why: Functional programming paradigm and vector handling in different languages
How: Write minimal examples in Frege, Java, SQL, R, Python, JavaScript. Please contribute!

Last week I went to a talk where Frege was introduced. Frege is a purely functional language based on Haskell. I once looked at Haskell, and the introductory example was the famous Pythagorean triples. That is also mentioned on the Frege page. I was asking myself: how can this be done in Java, or R, or SQL?

Here is my list of implementations. Please contribute if you know more or have a better (shorter) version. Line breaks are inserted for better layout. All implementations return something like:

(3, 4, 5)
(4, 3, 5)
(6, 8, 10)
(8, 6, 10)

Frege

This is not tested. I am not sure what Frege says about the inserted line breaks.

[ (a,b,c) |
  a <- [1..10],
  b <- [1..10],
  c <- [1..10],
  a*a + b*b == c*c ]

Java

Tested.

For the Java version, let's create a model class first. This makes the stream handling easier. It is just some sugar.

static class Triple {
  final int a, b, c;
 
  public Triple(int a, int b, int c) {
    this.a=a;this.b=b;this.c=c;
  }
 
  @Override
  public String toString() {
    return "Triple [a=" + a + ", b=" + b + ", c=" + c + "]";
  }
}

Now, let's write the logic:

  // encode each candidate as a number from 0 to 999: each decimal digit (plus 1) becomes one side of length 1..10
  IntStream intStream = IntStream.range(0, 1000);
  intStream.boxed().map(number -> new Triple(
    (number/100)%10+1,
    (number/10)%10+1,
    (number/1)%10+1)).
  // keep only triples satisfying a^2 + b^2 = c^2
  filter(triple -> Math.pow(triple.a, 2)+Math.pow(triple.b, 2)==Math.pow(triple.c, 2)).
  forEach(triple -> System.out.println(triple));

SQL (Oracle)

Tested.

SELECT a, b, c FROM
  (SELECT Level AS a FROM Dual CONNECT BY Level <=10),
  (SELECT Level AS b FROM Dual CONNECT BY Level <=10),
  (SELECT Level AS c FROM Dual CONNECT BY Level <=10)
WHERE POWER(a, 2)+POWER(b, 2)=POWER(c, 2)

R

Tested.

df=data.frame(a=1:10, b=1:10, c=1:10)
expanded=expand.grid(df)
subset(expanded, a**2+b**2==c**2)

Python (3)

Tested.

import itertools
 
triples = [range(1, 11), range(1, 11), range(1, 11)]
valid=filter(
  lambda t: t[0]**2+t[1]**2==t[2]**2, list(itertools.product(*triples)))
print(*valid, sep="\n")

JavaScript

Tested.

Creation of filled arrays: See here.

Integer division: See here.

var numbers=Array.apply(null, {length: 1000}).map(Number.call, Number);
var triples=numbers.map(function(n){
  return {a: ~~(n/100)%10+1, b: ~~(n/10)%10+1, c: ~~(n/1)%10+1}
});
var valid=triples.filter(function(t){
  return Math.pow(t.a,2)+Math.pow(t.b,2)==Math.pow(t.c,2)
});
console.log(valid);