Feature engineering in python

What: Generating features for regression models in python based on formulas (like in R)
Why: Model notation in formulas and not in code => Better maintainability
How: Use patsy

Test data

Lets take as example a simple: fitting an exponential to some test data. First, some test data needs to be generated:

The test data and its components look like this:

Feature engineering

Lets assume, we want to fit the data with a linear and quadratic term (surprise, surprise). When using a linear model, you would provide a matrix where each column is one of the features. In our case this would mean writing a small script to fill a length x 3 matrix. Although it is no big deal it involves some lines of code and is not very readable.

With patsy we can write instead something like the following and let the matrices be generated by patsy:

This will result for x in something like:


With sklearn it is easy to do the fit:

In the end, it looks like: