Archiv der Kategorie: Java

Tomcat & Meecrowave on a server: Slow startup

What: Using Meecrowave on a server without large delays on startup
Why: Faster startup during development
How: Use a faster entropy source

Background

Meecrowave uses Tomcat, which uses a SecureRandom instance for session IDs. On a server, the underlying entropy source can run short, resulting in large delays until the application endpoints are scanned and enabled.
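The delay can be made visible with a small timing sketch (the class and method names are mine; SHA1PRNG is the algorithm named in Tomcat's log output):

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

// Hypothetical helper: measures how long seeding a SHA1PRNG instance takes,
// which is where the startup delay described above comes from
public class EntropyCheck {
    static long seedMillis() throws NoSuchAlgorithmException {
        long start = System.nanoTime();
        SecureRandom random = SecureRandom.getInstance("SHA1PRNG");
        random.nextBytes(new byte[16]); // forces seeding from the entropy source
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        System.out.println("SecureRandom seeding took " + seedMillis() + " ms");
    }
}
```

On a machine with a depleted entropy pool this can take seconds; with the urandom setting below it should stay in the millisecond range.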

Details

See the Tomcat how-to.

Solution

Add the following line to the JAVA_OPTS in meecrowave.sh:

-Djava.security.egd=file:/dev/./urandom

Cooking with meecrowave

What: Building microservices in Java(9) painlessly
Why: Small and maintainable services
How: Using meecrowave

Introduction

Creating microservices in Java can be quite complicated. You can either do it yourself using the JDK's internal HTTP server, use one of the many application servers, or use one of the integrated frameworks like WildFly Swarm. The first option doesn't work well if you want something like dependency injection, and you have to include all the useful libraries like JAX-RS yourself; the second option already provides most of these parts. You can use, for example, Glassfish, WildFly, WebSphere or Tomcat (and TomEE). Nevertheless, you rely on heavy application servers, and you have to start an instance and deploy to it for each test (although some IDE integrations exist).

The integrated frameworks are sometimes huge, need extensive configuration or don't play well with Java 9. In part, testing is not as easy as it should be (dependency injection is one of the issues).

Meecrowave, on the other hand, is a small framework that works with CDI, JAX-RS and Jackson out of the box, is super easy to set up, and makes running integration tests as easy as starting a JUnit test. The following tutorial shows a simple example.

The source code for this tutorial is available here (folder micro).

Note: Although the example runs with Java 9, it is not modularized. Some of the dependencies are not yet available as Java 9 modules, and thus creating this example as a module is out of scope for this tutorial.

Setup

In the following, Maven and JDK 9 (both for compiling and running) are used.

Add the following dependencies to your pom.xml to include the needed libraries for this example.

<dependency>
	<groupId>org.apache.meecrowave</groupId>
	<artifactId>meecrowave-core</artifactId>
	<version>1.2.0</version>
</dependency>

Server

Starting meecrowave is simple: just start the meecrowave server. All the rest, like scanning classes for endpoints, is done automatically. Create a class with a main method and add the following code:

public static void main(String[] args) {
	try (final Meecrowave meecrowave = new Meecrowave(); final Scanner scanner = new Scanner(System.in)) {
		meecrowave.bake();
		scanner.nextLine();
	}
}

If you start the class, you should see some printout and meecrowave is up and running. To start it on Java 9 you have to add --add-modules java.xml.bind as an argument to the virtual machine.

Note: The main class is not needed at all for running it outside of IDEs (at least not on Linux machines), since meecrowave can create a whole distribution package (see below).

You should see output like:

[09:56:57.591][INFO ][           main][.webbeans.config.BeansDeployer] All injection points were validated successfully.
[09:56:57.904][INFO ][           main][apache.cxf.endpoint.ServerImpl] Setting the server's publish address to be /
[09:56:57.959][INFO ][           main][ifecycle.WebContainerLifecycle] OpenWebBeans Container has started, it took [694] ms.
[09:56:58.119][WARN ][           main][na.util.SessionIdGeneratorBase] Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [120] milliseconds.
[09:56:58.164][INFO ][           main][meecrowave.cxf.CxfCdiAutoSetup] REST Application: / -> org.apache.cxf.cdi.DefaultApplication
[09:56:58.164][INFO ][           main][meecrowave.cxf.CxfCdiAutoSetup]      Service URI: /test  -> de.moduliertersingvogel.micro.SimpleEndpoint
[09:56:58.169][INFO ][           main][meecrowave.cxf.CxfCdiAutoSetup]               GET /test/ ->      Response test()

JAX-RS endpoints

You can create arbitrary endpoints based on the JAX-RS annotations. Each endpoint needs to be annotated with a Path and a scope annotation. The following example defines a simple endpoint returning the string "Hello World":

package de.moduliertersingvogel.micro;
 
import javax.enterprise.context.RequestScoped;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.core.Response;
 
@RequestScoped
@Path("test")
public class SimpleEndpoint {
	@GET
	public Response test() {
		return Response.ok().entity("Hello World").build();
	}
}

You can point your browser to http://localhost:8080/test and see the result.

Dependency injection

Dependency injection works as expected. You need a class which is annotated with a scope, and you inject it somewhere. Let's test it with a simple object:

package de.moduliertersingvogel.micro;
 
import javax.enterprise.context.ApplicationScoped;
 
@ApplicationScoped
public class SimpleObject {
	public boolean callMe() {
		return true;
	}
}
import javax.enterprise.context.RequestScoped;
import javax.inject.Inject;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.core.Response;
import javax.ws.rs.core.Response.Status;
 
@RequestScoped
@Path("test")
public class SimpleEndpoint {
	@Inject
	SimpleObject obj;
 
	@GET
	public Response test() {
		if(obj.callMe()) {
			return Response.ok().entity("Hello World").build();
		}
		return Response.status(Status.BAD_REQUEST).entity("Something went wrong").build();
	}
}

Run your main class and check in your browser (see above) that everything works fine.

Testing

Testing with meecrowave is as simple as writing (annotated) unit tests. To get it working, add the following dependencies to your pom.xml (okhttp is used for getting the result from the running microservice):

<dependency>
	<groupId>org.apache.meecrowave</groupId>
	<artifactId>meecrowave-junit</artifactId>
	<version>1.2.0</version>
	<scope>test</scope>
</dependency>
<dependency>
	<groupId>org.junit.jupiter</groupId>
	<artifactId>junit-jupiter-api</artifactId>
	<version>5.0.2</version>
	<scope>test</scope>
</dependency>
<dependency>
	<groupId>org.junit.jupiter</groupId>
	<artifactId>junit-jupiter-engine</artifactId>
	<version>5.0.2</version>
	<scope>test</scope>
</dependency>
<dependency>
	<groupId>com.squareup.okhttp3</groupId>
	<artifactId>okhttp</artifactId>
	<version>3.9.1</version>
</dependency>

Additionally, you need some tweaking to get the tests working with Java9. Add the following line to the properties section in the pom.xml:

--add-modules java.xml.bind

Add the following test class:

package de.moduliertersingvogel.micro;
 
import static org.junit.jupiter.api.Assertions.assertEquals;
 
import org.apache.meecrowave.Meecrowave;
import org.apache.meecrowave.junit5.MeecrowaveConfig;
import org.apache.meecrowave.testing.ConfigurationInject;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;
 
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
 
@MeecrowaveConfig /*(some config)*/
public class SimpleEndpointTest {
    @ConfigurationInject
    private Meecrowave.Builder config;
	private static OkHttpClient client;
 
	@BeforeAll
	public static void setup() {
		client = new OkHttpClient();
	}
 
	@Test
	public void test() throws Exception {
		final String base = "http://localhost:" + config.getHttpPort();
 
		Request request = new Request.Builder()
	      .url(base+"/test")
	      .build();
		Response response = client.newCall(request).execute();
		assertEquals("Hello World",  response.body().string());
	}
}

And run it either from your IDE or from maven:

mvn clean test

During test execution the server is started and the tests are executed against your running meecrowave application:

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.8 sec - in de.moduliertersingvogel.micro.SimpleEndpointTest

Distribution

Creating a distribution package (for Linux) is as simple as adding a new goal to the maven call:

mvn clean package meecrowave:bundle

Note: The meecrowave:bundle goal creates the distribution and includes what has already been compiled as a jar in the target directory. A call to meecrowave:bundle without package would result in an empty meecrowave server without your application.

After that, your target directory should contain a file called micro-meecrowave-distribution.zip. The zip archive contains a bin folder in which the executable (….sh) is located. For running this on Java 9, the java.xml.bind module needs to be added (remember: we are not using modules here, therefore there is no module-info and no automatic way for Java to figure this out). Search for the line starting with JAVA_OPTS in the start script and add:

--add-modules java.xml.bind

Now you can start the service by:

cd bin
./meecrowave.sh start

The default port used is 8080 and thus you can test your application easily with curl or the browser. Enjoy!

PS: CORS

import java.io.IOException;

import javax.enterprise.context.ApplicationScoped;
import javax.ws.rs.container.ContainerRequestContext;
import javax.ws.rs.container.ContainerResponseContext;
import javax.ws.rs.container.ContainerResponseFilter;
import javax.ws.rs.ext.Provider;

/**
 * See: http://stackoverflow.com/a/28067653
 */
@ApplicationScoped
@Provider
public class CorsFilter implements ContainerResponseFilter {
    @Override
    public void filter(ContainerRequestContext request, ContainerResponseContext response) throws IOException {
        response.getHeaders().add("Access-Control-Allow-Origin", "*");
        response.getHeaders().add("Access-Control-Allow-Headers", "origin, content-type, accept, authorization");
        response.getHeaders().add("Access-Control-Allow-Credentials", "true");
        response.getHeaders().add("Access-Control-Allow-Methods", "GET, POST, PUT, DELETE, OPTIONS, HEAD");
    }
}

PPS: GSON

import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.lang.reflect.Type;
 
import javax.enterprise.context.ApplicationScoped;
import javax.ws.rs.Consumes;
import javax.ws.rs.Produces;
import javax.ws.rs.WebApplicationException;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.MultivaluedMap;
import javax.ws.rs.ext.MessageBodyReader;
import javax.ws.rs.ext.MessageBodyWriter;
import javax.ws.rs.ext.Provider;
 
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
 
@ApplicationScoped
@Provider
@Produces(MediaType.APPLICATION_JSON)
@Consumes(MediaType.APPLICATION_JSON)
@ApplicationScoped
@Provider
@Produces(MediaType.APPLICATION_JSON)
@Consumes(MediaType.APPLICATION_JSON)
public class GsonMessageBodyHandler implements MessageBodyWriter<Object>, MessageBodyReader<Object> {
	private static final String UTF_8 = "UTF-8";
	private Gson gson = new GsonBuilder().create();

	@Override
	public boolean isReadable(Class<?> type, Type genericType, java.lang.annotation.Annotation[] annotations,
			MediaType mediaType) {
		return true;
	}

	@Override
	public Object readFrom(Class<Object> clazz, Type type, java.lang.annotation.Annotation[] annotations,
			MediaType mediatype, MultivaluedMap<String, String> headers, InputStream instream)
			throws IOException, WebApplicationException {
		try (InputStreamReader streamReader = new InputStreamReader(instream, UTF_8)) {
			return gson.fromJson(streamReader, type);
		}
	}

	@Override
	public boolean isWriteable(Class<?> arg0, Type arg1, java.lang.annotation.Annotation[] arg2, MediaType arg3) {
		return true;
	}

	@Override
	public void writeTo(Object obj, Class<?> clazz, Type type, java.lang.annotation.Annotation[] annotations,
			MediaType mediatype, MultivaluedMap<String, Object> headers, OutputStream outstream)
			throws IOException, WebApplicationException {
		try (OutputStreamWriter writer = new OutputStreamWriter(outstream, UTF_8)) {
			final String content = gson.toJson(obj);
			writer.write(content);
		}
	}
}

Update 20200331: Logging

There seems to be a problem in the generated meecrowave.bat file for starting on Windows: it does not include the log4j2 configuration file. The meecrowave.sh for Linux works. As a workaround, the Windows bat file can be edited manually.

Building Java 9 projects with Eclipse and Gradle

What: Using gradle to build and setup a Java 9 project in Eclipse Oxygen (4.7.1)
Why: Automate settings in Eclipse like module path, … for Java 9 projects
How: Using the Gradle 'eclipse'/'java' plugins

Background

The following is taken from this buildship issue.

If you create a Java 9 project, a module-info.java file is needed describing your module. The dependencies defined in the build.gradle file are not automatically added to the project settings, so your module-info.java file will have compile errors in Eclipse. If they are added manually, the settings are gone once the project is refreshed via Gradle, or each dependency has to be added manually if no refresh is used. This can be avoided with the Gradle 'eclipse' plugin.

At compile time, the dependencies of your project need to be available as (automatic) modules, and at runtime the dependencies should be available somewhere on the module path.
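As a hypothetical illustration, a module-info.java for such a project could look as follows; the module name and the required module are placeholders (automatic module names are derived from the jar's manifest or file name):

```java
// Hypothetical module descriptor; the names are placeholders
module de.example.app {
    requires com.google.gson; // automatic module contributed by a dependency jar
}
```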

Requirements

Buildship 2.2

You can install the buildship plugin from Eclipse Marketplace or from the update site. You need version 2.2 at least. Currently, only the update site has this version.

Eclipse project nature

Your project needs to be a gradle project. If this is not the case, you can convert it to a gradle project by right-click on the project -> Configure -> Add Gradle Nature.

Modifications of build.gradle

Add the following to your build.gradle file:

apply plugin: 'eclipse'
eclipse {
    classpath {
        file {
            whenMerged {
                entries.findAll { isModule(it) }.each { it.entryAttributes['module'] = 'true' }
            }
        }
    }
}
 
boolean isModule(entry) {
    // filter java 9 modules
	entry.kind == 'lib'  // Only libraries can be modules
}

Update Eclipse project

Right-click on your project -> Gradle -> Refresh Gradle Project.

Now the project is set up in such a way that all your dependencies are available for use in module-info.java.

Add dependencies as modules for compilation

The dependencies defined by the build.gradle are by default not added to the module path if you are using Java 9 for compilation. They can be added by:

compileJava {
    inputs.property("moduleName", moduleName)
    doFirst {
        options.compilerArgs = [
            '--module-path', classpath.asPath,
        ]
        classpath = files()  
    }
}

Add dependencies in a folder which can be used as module path

A simple derived copy task can be used to put all the dependencies in a subfolder of the build directory:

task makePackage(type: Copy) {
    into "$buildDir/lib"
    from configurations.runtime
}

This folder can be used as module path while running your project:

java --module-path <other entries>:<buildDir/lib> <modulename>/<Main class>

Update Eclipse (from Neon to Oxygen)

What: Updating Eclipse without new installation
Why: Being up to date
How: Using update site and Oomph-Settings

Setting up Oomph

The first step is to tell Oomph which version of Eclipse should be used. Select from the menu: Navigate → Open Setup → Installation.

A new tab should open with the installation object. Select it and open the properties view. Change the product version of Eclipse in the drop-down menu to Oxygen.

Adding Update site for oxygen

The second step involves adding the Oxygen update site. Select from the menu: Window → Preferences and open Install/Update → Available Software Sites. Add a new site with the Oxygen repository (http://download.eclipse.org/releases/oxygen/).

Click Apply and Close.

Update

Update via the standard Eclipse update mechanism. Select from the menu: Help → Check for Updates.

Perform the update as normal and restart. The Eclipse version starting should now be Oxygen.

Drake, drip & Data science

What: Analysing data with a data workflow, fast Java startup & CSV magic
Why: Building data analysis pipelines for (small) problems, where intermediate steps are automatically documented
How: Use Drake for (data) workflow management, Drip for fast JVM startup and csvkit for csv magic.

In this post, I will show you how to build a small data analysis pipeline for analysing the references of a wikipedia article (about data science). The result is the following simple image, all steps are automated and intermediate results are documented automatically.

result.plot

In the end, you should have four artifacts documenting your work:

  • Drakefile: The workflow file, with which you can regenerate all other artifacts. Plain text, and thus easily usable with version control systems.
  • data.collect: The html content of the wikipedia article as the source of the analysis
  • data.extract: The publishing years of the references with number of occurrence
  • result.plot.png: A png of the publishing year histogram

Agenda

  1. Install the requirements
  2. Build the pipeline
  3. Run the pipeline

Install the requirements

You need the following tools:

  • Linux (Debian or Ubuntu, command line tools)
  • Python (for html processing and csv magic)
  • R (for plotting)
  • Java (for Drake)

You can install the dependencies easily with the script below. The following steps are tested within a Debian (Jessie) VM, 64-bit. It should also work on Ubuntu; other distros may need adaptations.

# Update
sudo apt-get update
 
# Install curl
sudo apt-get install -y curl
 
# Install R
sudo apt-get install -y r-base
sudo apt-get install -y r-cran-rcpp
sudo Rscript -e 'install.packages("ggplot2", repos="https://cran.uni-muenster.de/")'
 
# Install Java8
sudo sh -c 'echo deb http://ftp.de.debian.org/debian jessie-backports main >> /etc/apt/sources.list'
sudo apt-get update
sudo apt-get install -y openjdk-8-jdk
 
# Install Drip
mkdir ~/.lib
git clone https://github.com/flatland/drip.git ~/.lib/drip
cd ~/.lib/drip
make prefix=~/.bin install
 
# Download & Setup Drake
wget https://github.com/Factual/drake/releases/download/1.0.3/drake.jar -O ~/.lib/drake.jar
cat << 'EOF' > ~/.bin/drake
#!/bin/bash
drip -cp ~/.lib/drake.jar drake.core "$@"
EOF
chmod u+x ~/.bin/drake
echo export PATH=~/.bin:$PATH >> ~/.bashrc
 
# Install csvkit
pip install csvkit

Build the pipeline

Drake is controlled by a so-called Drakefile. Let us define three steps for the data processing:

  1. Data collection (html from Wikipedia)
  2. Data extraction (Extracting the reference texts and years of the references)
  3. Plotting results (Plotting the results to png)

1. Data collection

The first step can be done with Linux internal tools. Thus, we can create the Drakefile with the first step already:

data.collect <- [-timecheck]
  curl -k https://de.wikipedia.org/wiki/Data_Science > $OUTPUT

Drake takes input (mostly) from files and sends the output of each step again to files. Thus, the result of each step in a workflow is automatically documented.

This first step will download the html of the data science article from the German Wikipedia and store it in a file called data.collect (the [-timecheck] prevents the step from running every time drake is started, since this step has no input from previous steps). You can already run this workflow with the drake command:

drake

This will generate a file called data.collect containing the html of the wikipedia page.

2. Data extraction

For data extraction from the html, Python & BeautifulSoup are used. The extraction of the year from each reference can be done with Linux internal tools (for example grep). Thus, the Python program should read from stdin, get the reference texts and output plain text to stdout. Create a file called extract.py with the following content:

#!/usr/bin/env python3
 
from bs4 import BeautifulSoup
import sys
 
input=sys.stdin.read()
soup=BeautifulSoup(input, 'html.parser')
 
entries=soup.find('ol', {'class': 'references'}).findAll('li')
 
for entry in entries:
    print(entry.getText())

Make the script file executable with:

chmod u+x ./extract.py

You can test the script with the following command (if you ran the workflow from step 1 before):

cat data.collect | ./extract.py

Now, let us extend the Drakefile to use the python script, search for years with a regex, create a histogram by counting the occurrences of the years, reorder the columns and add a header:

data.collect <- [-timecheck]
  curl -k https://de.wikipedia.org/wiki/Data_Science > $OUTPUT

data.extract <- data.collect
  cat $INPUT | ./extract.py | grep -o -P '\b\d{4}\b' | sort | uniq -c | sort -nr | \
    sed 's/^[ ]*//g' | csvcut -c2,1 -d " " | printf 'year, occ'"\n$(cat)" > $OUTPUT

If you run this workflow with:

drake

a new file called data.extract will be created which looks like the following:

year, occ
2015,15
2016,5
2013,5
2014,4
1997,3
2003,2
2002,2
2012,1
1145,1

Please note the wrongly detected date 1145.

You can filter stupid dates out with csvsql (I really recommend this tool) from the csvkit tool suite by extending the Drakefile with some simple sql:

data.collect <- [-timecheck]
  curl -k https://de.wikipedia.org/wiki/Data_Science > $OUTPUT

data.extract <- data.collect
  cat $INPUT | ./extract.py | grep -o -P '\b\d{4}\b' | sort | uniq -c | sort -nr | \
    sed 's/^[ ]*//g' | csvcut -c2,1 -d " " | printf 'year, occ'"\n$(cat)" | \
    csvsql --query 'select * from stdin where year>1900 and year<2100' > $OUTPUT
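The same year extraction and plausibility filter, sketched in Java for comparison (the class and method names are mine; the regex and the bounds match the grep and csvsql steps above):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: match four-digit numbers like the grep step,
// then keep only plausible years like the csvsql query above
public class YearFilter {
    static List<Integer> plausibleYears(String text) {
        Matcher m = Pattern.compile("\\b\\d{4}\\b").matcher(text);
        List<Integer> years = new ArrayList<>();
        while (m.find()) {
            int year = Integer.parseInt(m.group());
            if (year > 1900 && year < 2100) { // same bounds as the SQL query
                years.add(year);
            }
        }
        return years;
    }
}
```

Applied to a reference like "published 2015, page 1145", only 2015 survives.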

Plotting results

The final step is the plotting of the results. Let us create a R file, which reads from stdin and plots data to png. Create the file plot.R with the following content:

#!/usr/bin/env Rscript
 
library(ggplot2)
 
data <- read.table(pipe('cat /dev/stdin'), header=T, sep=",")
data[,1]<-factor(data[,1])
names<-names(data)
 
p<-ggplot(data, aes(x=data[, 1], y=data[, 2]))+geom_bar(stat='identity')+
    theme(axis.text.x = element_text(angle = 90, hjust = 1))+
    xlab(names[1])+ylab(names[2])+theme(text = element_text(size=8))
args = commandArgs(trailingOnly=TRUE)
if(length(args)>1){
    t<-paste(strwrap(args[2], width=40), collapse = "\n")
    p+ggtitle(t)
}
ggsave(args[1], width = 2*1.618, height = 2)

Make the script file executable with:

chmod u+x ./plot.R

Now extend the Drakefile with the last step: image creation. When Drake is run again, it checks whether the output of a step is already up to date by looking for files with the step's name. Thus, the image file name and the step name should match:

data.collect <- [-timecheck]
  curl -k https://de.wikipedia.org/wiki/Data_Science > $OUTPUT

data.extract <- data.collect
  cat $INPUT | ./extract.py | grep -o -P '\b\d{4}\b' | sort | uniq -c | sort -nr | \
    sed 's/^[ ]*//g' | csvcut -c2,1 -d " " | printf 'year, occ'"\n$(cat)" | \
    csvsql --query 'select * from stdin where year>1900 and year<2100' > $OUTPUT

result.plot <- data.extract
  cat $INPUT | ./plot.R $OUTPUT \
    "Histogram of reference publishing dates for the Wikipedia data science article"

Again, run this step by executing the drake command.

Run the pipeline

Finally, to see all steps working together delete the generated artifacts (data.collect, data.extract, result.plot.png) and run drake again:

vagrant@debian-jessie:/tmp$ drake
The following steps will be run, in order:
  1: /tmp/././data.collect <-  [missing output]
  2: /tmp/././data.extract <- /tmp/././data.collect [projected timestamped]
  3: /tmp/././result.plot.png <- /tmp/././data.extract [projected timestamped]
Confirm? [y/n] y
Running 3 steps with concurrence of 1...
 
--- 0. Running (missing output): /tmp/././data.collect <-
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 58980    0 58980    0     0   172k      0 --:--:-- --:--:-- --:--:--  196k
--- 0: /tmp/././data.collect <-  -> done in 0.54s
 
--- 1. Running (missing output): /tmp/././data.extract <- /tmp/././data.collect
--- 1: /tmp/././data.extract <- /tmp/././data.collect -> done in 0.28s
 
--- 2. Running (missing output): /tmp/././result.plot.png <- /tmp/././data.extract
--- 2: /tmp/././result.plot.png <- /tmp/././data.extract -> done in 0.75s
Done (3 steps run).

You should now have three files for each of the workflow steps, where the last step file contains the image shown above.

Wildfly Maven Plugin: wildfly:start is not executed successfully

What: Strange problem when starting up Wildfly from within Maven with Wildfly Maven plugin and goal wildfly:start/wildfly:run
Why: Good to know
How: Remove corrupt file

The Wildfly Maven plugin is great: download and run a Wildfly from scratch (with goodies like installing database drivers during build, adding test users, setting ports, …) and deploy your application to it. I like it very much (until yesterday ;-)).

From one day to the next, the plugin stopped working. The error message was something about API incompatibility and a NullPointerException in class org.wildfly.plugin.server.RuntimeVersion (line 46) when executing the goals wildfly:start/wildfly:run. The deploy goal worked well.

After digging in the source code and going from org.wildfly.plugin.server.RuntimeVersion:init to org.jboss.jdf.stacks.client.StacksClient:getStacks to org.jboss.jdf.stacks.client.StacksClient:initializeStacks, the reason for the problem was obvious: it reads a file downloaded from the web and stored in the temp folder: …/AppData/Local/Temp/httpsrawgithubcomjbossjdfjdfstack100Finalstacksyamlstacks.yaml. If the file is there, it is not downloaded again and the existing version is used. In my case, the file was corrupt (0 B). After deleting the file and executing the wildfly:run goal, everything was working again.
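The corrupt-cache symptom above (a zero-byte file) can be checked for programmatically; this is an illustrative sketch, not part of the plugin, and the class and method names are mine:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch: delete a cached file if it is empty (0 B),
// which was the corrupt state described above
public class CacheCheck {
    static boolean removeIfEmpty(Path file) throws IOException {
        if (Files.exists(file) && Files.size(file) == 0) {
            Files.delete(file); // force a fresh download on the next run
            return true;
        }
        return false;
    }
}
```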

And they all lived happily ever after.

Pythagorean triples: Do it right

What: Minimal lines of code for calculating the length of integer-sided right triangles with a side length below a given threshold
Why: Functional programming paradigm and vector handling in different languages
How: Write minimal examples for: Frege, Java, SQL, R, Python, Javascript. Please contribute!

Last week I went to a talk where Frege was introduced. Frege is a purely functional language based on Haskell. I once looked at Haskell, and the introductory example was the famous Pythagorean triples. That is also mentioned on the Frege page. I asked myself: how can this be done in Java, or R, or SQL?

Here is my list of implementations. Please contribute if you know more or have a better (shorter) version. Line breaks are inserted for better layout. All implementations return something like:

(3, 4, 5)
(4, 3, 5)
(6, 8, 10)
(8, 6, 10)

Frege

This is not tested. I am not sure what Frege says about the inserted line breaks.

[ (a,b,c) |
  a <- [1..10],
  b <- [1..10],
  c <- [1..10],
  a*a + b*b == c*c ]

Java

Tested.

For the Java version, let's create a model class first. This makes the stream handling easier. It is just some sugar.

static class Triple {
  final int a, b, c;
 
  public Triple(int a, int b, int c) {
    this.a=a;this.b=b;this.c=c;
  }
 
  @Override
  public String toString() {
    return "Triple [a=" + a + ", b=" + b + ", c=" + c + "]";
  }
}

Now, let's write the logic:

  IntStream intStream = IntStream.range(0, 1000);
  intStream.boxed().map(number -> new Triple(
    (number/100)%10+1,
    (number/10)%10+1,
    (number/1)%10+1)).
  filter(triple -> Math.pow(triple.a, 2)+Math.pow(triple.b, 2)==Math.pow(triple.c, 2)).
  forEach(triple -> System.out.println(triple));
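Since contributions are invited: an alternative Java sketch using nested streams, closer to the list comprehension than the digit-encoding trick (class and method names are mine):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

// Nested-stream variant of the Pythagorean triple search
public class Triples {
    // Returns all (a, b, c) with a, b, c in [1, max] and a*a + b*b == c*c
    static List<int[]> triples(int max) {
        List<int[]> result = new ArrayList<>();
        IntStream.rangeClosed(1, max).forEach(a ->
            IntStream.rangeClosed(1, max).forEach(b ->
                IntStream.rangeClosed(1, max)
                    .filter(c -> a * a + b * b == c * c)
                    .forEach(c -> result.add(new int[] { a, b, c }))));
        return result;
    }

    public static void main(String[] args) {
        triples(10).forEach(t ->
            System.out.println("(" + t[0] + ", " + t[1] + ", " + t[2] + ")"));
    }
}
```

This prints the same four triples as the other implementations, in order (3, 4, 5), (4, 3, 5), (6, 8, 10), (8, 6, 10).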

SQL (Oracle)

Tested.

SELECT a, b, c FROM
  (SELECT Level AS a FROM Dual CONNECT BY Level <=10),
  (SELECT Level AS b FROM Dual CONNECT BY Level <=10),
  (SELECT Level AS c FROM Dual CONNECT BY Level <=10)
WHERE POWER(a, 2)+POWER(b, 2)=POWER(c, 2)

R

Tested.

df=data.frame(a=1:10, b=1:10, c=1:10)
expanded=expand.grid(df)
subset(expanded, a**2+b**2==c**2)

Python (3)

Tested.

import itertools
 
triples = [range(1, 11), range(1, 11), range(1, 11)]
valid=filter(
  lambda t: t[0]**2+t[1]**2==t[2]**2, list(itertools.product(*triples)))
print(*valid, sep="\n")

Javascript

Tested.

Creation of filled arrays: See here.

Integer division: See here.

var numbers=Array.apply(null, {length: 1000}).map(Number.call, Number);
var triples=numbers.map(function(n){
  return {a: ~~(n/100)%10+1, b: ~~(n/10)%10+1, c: ~~(n/1)%10+1}
});
var valid=triples.filter(function(t){
  return Math.pow(t.a,2)+Math.pow(t.b,2)==Math.pow(t.c,2)
});
console.log(valid);

Win a card game against your kids with OCR and statistics

What: OCR & R; Analyze standardized hardcopy forms electronically
Why: Win a card game with a lot of cards to remember (car quartett)
How

You need:

  • The card game
  • A scanner
  • Gimp
  • A linux machine
  • An hour of free time

1. Setup a virtual machine or an existing linux

I used Ubuntu Xenial 64-bit. Maybe you have to adapt the steps a little.

  1. Install tesseract
    sudo apt-get update
    sudo apt-get install -y tesseract-ocr tesseract-ocr-deu
  2. Install Java
    sudo apt-get install -y openjdk-8-jdk

2. Scan the cards

Scan all the cards, one image per card. Make sure you place the cards at the same position in the scanner every time (for example the upper right corner). All images should have the same resolution and size. Thus, the same regions in the image correspond to the same regions on all the cards (for example a name field). The cards can look like this (you can guess what game I played ;-)):

You have to tweak the images a little bit to get good OCR results. Here is what I did:

  1. Use Gimp to blur the images (Filter ⇒ Blur ⇒ Blur)

3. Basic image processing with Java & tesseract

The cards I scanned had some defined regions with numerical or text values (see figure above). You can enhance the OCR results dramatically if you know where to look for the text. Create a small class containing the information about each region. This class should also contain a flag indicating whether the region holds text or numerical values.

public class ImageInfo {
  public final String name;
  public final boolean number;
  public final int leftUpX;
  public final int leftUpY;
  public final int rightDownX;
  public final int rightDownY;
 
  public ImageInfo(String name, boolean number, int leftUpX, int leftUpY, int rightDownX, int rightDownY) {
    this.name = name;
    this.number = number;
    this.leftUpX = leftUpX;
    this.leftUpY = leftUpY;
    this.rightDownX = rightDownX;
    this.rightDownY = rightDownY;
  }
}

Set up the regions for your image as you like. This is what I used for the cards (coordinates are pixels in the scanned image).

final static List<ImageInfo> infos = Arrays.asList(new ImageInfo[] {
    new ImageInfo("Name", false, 160, 158, 460, 40),
    new ImageInfo("Geschwindigkeit", true, 200, 685, 70, 35),
    new ImageInfo("Hubraum", true, 460, 685, 110, 40),
    new ImageInfo("Gewicht", true, 180, 790, 110, 40),
    new ImageInfo("Zylinder", true, 475, 790, 60, 40),
    new ImageInfo("Leistung", true, 200, 895, 70, 40),
    new ImageInfo("Umdrehung", true, 470, 895, 90, 40)
  });

Now, process each image. For each image I did the following steps:

  1. Read in the image
    BufferedImage image = ImageIO.read(imageFile);
  2. Transform the image to black/white ⇒ better OCR results
    int width = image.getWidth();
    int height = image.getHeight();
    for (int y = 0; y < height; y++) {
      for (int x = 0; x < width; x++) {
        int rgb = image.getRGB(x, y);
        int blue = rgb & 0xff;
        int green = (rgb & 0xff00) >> 8;
        int red = (rgb & 0xff0000) >> 16;

        if (red > 210 && blue > 210 && green > 210) {
          image.setRGB(x, y, new Color(0, 0, 0).getRGB());
        } else {
          image.setRGB(x, y, new Color(255, 255, 255).getRGB());
        }
      }
    }
  3. Do some basic clipping to extract one image for each region on the card you are interested in. Start tesseract for each image.
    
    for (ImageInfo imageInfo : infos) {
      // The output PNG file; keep all regions of the same card in the same directory.
      File outputFile = ...;
      ImageIO.write(image.getSubimage(imageInfo.leftUpX, imageInfo.leftUpY,
          imageInfo.rightDownX, imageInfo.rightDownY), "png", outputFile);
      ProcessBuilder processBuilder;
      // -psm 6 works best for my case of one line of text in each image.
      // Remember: each image contains ONE numerical or text value from the card.
      if (imageInfo.number) {
        // Restrict tesseract to digits.
        processBuilder = new ProcessBuilder("tesseract", "-psm", "6", outputFile.toString(),
            path.resolve(outputFileNameBase + "_" + imageInfo.name).toString(), "digits");
      } else {
        processBuilder = new ProcessBuilder("tesseract", "-psm", "6", outputFile.toString(),
            path.resolve(outputFileNameBase + "_" + imageInfo.name).toString());
      }
      processBuilder.start();
    }
  4. Collect the results from the text file output of tesseract (or read them directly from the output stream of the process); remember to wait for each process to finish, e.g. with Process.waitFor(), before reading the files. There may also be a batch mode of tesseract.
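
The collecting step can be sketched as follows. This is only a sketch with made-up helper names (`OcrResults`, `cleanValue`, `toCsvRow`); it assumes tesseract has finished and wrote one value per output file, and that numeric fields may contain stray characters that should be dropped.

```java
import java.util.Arrays;
import java.util.List;

public class OcrResults {

    // Trim a raw tesseract line; for numeric fields, keep only digits
    // and the separators used on the cards (e.g. "2.200").
    public static String cleanValue(String raw, boolean number) {
        String v = raw.trim();
        return number ? v.replaceAll("[^0-9.,]", "") : v;
    }

    // Join the per-region values of one card into a CSV row
    // matching the header Geschwindigkeit;Gewicht;Hubraum;...
    public static String toCsvRow(List<String> values) {
        return String.join(";", values);
    }
}
```

For example, `cleanValue(" 2.200\n", true)` yields `"2.200"`, and joining the cleaned values of one card gives exactly the semicolon-separated rows used in the next section.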

4. Analyze the results with R

Let's assume you now have the results in a list like this:

Geschwindigkeit;Gewicht;Hubraum;Leistung;Name;Umdrehung;Zylinder
120;19.000;12.800;440;Volvo FH12-440;2.200;6
...

You can read in the values easily in R with:

# The 5th column contains the names of the cars in my example; it is used as row names.
cards=read.csv("share/result.csv", sep=";", header=T, row.names=5)

This is read in as a data frame. You can get a first impression of the best values with the summary function:

summary(cards)

Write a simple function that returns the cards with the maximum value in a given category:

bestCandidates=function(attribute){
  subset(cards, cards[attribute]==max(cards[attribute], na.rm=T))
}

And apply it:

dimensions=names(cards)
lapply(dimensions, bestCandidates)

And voilà: you only have to remember a few cards with the highest values. Next time I will ask for „Zylinder“ in case I have the „Scania R620“.

[[6]]
            Geschwindigkeit Gewicht Hubraum Leistung Umdrehung Zylinder
Scania R620             125      26    15.6      620       1.9        8

HTTP server for a Jersey app with Java standard tools

What: Create a simple HTTP server for a Jersey app without a traditional application server.
Why: KISS & useful for microservices
How: Use built-in Java classes

Download the following dependencies (or include them in your pom.xml):

<dependency>
	<groupId>org.glassfish.jersey.core</groupId>
	<artifactId>jersey-server</artifactId>
	<version>2.25.1</version>
</dependency>
<dependency>
	<groupId>org.glassfish.jersey.containers</groupId>
	<artifactId>jersey-container-jdk-http</artifactId>
	<version>2.25.1</version>
</dependency>
<dependency>
	<groupId>org.glassfish.jersey.media</groupId>
	<artifactId>jersey-media-json-jackson</artifactId>
	<version>2.25.1</version>
</dependency>

7-line Java way, minimal

ResourceConfig resourceConfig = new ResourceConfig();
resourceConfig.packages("mandelbrot");
String hostName = "localhost";
try {hostName = InetAddress.getLocalHost().getCanonicalHostName();}
catch (UnknownHostException e) {e.printStackTrace();}
URI uri = UriBuilder.fromUri("http://" + hostName + "/").port(PORT).build();
JdkHttpServerFactory.createHttpServer(uri, resourceConfig);
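
The resourceConfig.packages("mandelbrot") call scans that package for JAX-RS resources. A minimal resource the server above would pick up could look like this; PingResource and its path are made up for this sketch:

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// Must live in the scanned package (package mandelbrot;).
@Path("ping")
public class PingResource {

    // GET http://<host>:<PORT>/ping answers with plain text.
    @GET
    @Produces(MediaType.TEXT_PLAIN)
    public String ping() {
        return "pong";
    }
}
```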

Usually, you need a little bit more, such as:

  • CORS support
  • Jackson
  • Logging

Create the following class:

import java.io.IOException;
 
import javax.ws.rs.container.ContainerRequestContext;
import javax.ws.rs.container.ContainerResponseContext;
import javax.ws.rs.container.ContainerResponseFilter;
 
/**
 * See: http://stackoverflow.com/a/28067653
 */
public class CORSFilter implements ContainerResponseFilter {
    @Override
    public void filter(ContainerRequestContext request, ContainerResponseContext response) throws IOException {
        response.getHeaders().add("Access-Control-Allow-Origin", "*");
        response.getHeaders().add("Access-Control-Allow-Headers", "origin, content-type, accept, authorization");
        response.getHeaders().add("Access-Control-Allow-Credentials", "true");
        response.getHeaders().add("Access-Control-Allow-Methods", "GET, POST, PUT, DELETE, OPTIONS, HEAD");
    }
}

Add the following lines:

resourceConfig.register(JacksonFeature.class);
resourceConfig.register(CORSFilter.class);

Full example:

ResourceConfig resourceConfig = new ResourceConfig();
resourceConfig.packages("mandelbrot");
resourceConfig.register(JacksonFeature.class);
resourceConfig.register(CORSFilter.class);
 
String hostName = "localhost";
try {
    hostName = InetAddress.getLocalHost().getCanonicalHostName();
} catch (UnknownHostException e) {
    e.printStackTrace();
}
 
URI uri = UriBuilder.fromUri("http://" + hostName + "/").port(PORT).build();
 
JdkHttpServerFactory.createHttpServer(uri, resourceConfig);