Monday, October 15, 2018

Debugging error "Could not import XML from stream: invalid LOC header (bad signature)" when building a Thorntail application

TL;DR: Check whether there are corrupted JARs in your local Maven repository.


When building a Thorntail 2.2.0.Final application, I hit the error Could not import XML from stream: invalid LOC header (bad signature) while building the JAR with mvn clean install. I could not figure out what was wrong: I had just downloaded the application from the Thorntail generator and could not even run it, which was so frustrating. Running Maven with the -X flag showed the error stack trace. The part that was causing the error was:


Caused by: java.util.zip.ZipException: invalid LOC header (bad signature)
    at java.util.zip.ZipFile.read (Native Method)
    at java.util.zip.ZipFile.access$1400 (ZipFile.java:60)
(...)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse (DocumentBuilderImpl.java:339)
    at javax.xml.parsers.DocumentBuilder.parse (DocumentBuilder.java:121)
    at org.jboss.shrinkwrap.descriptor.spi.node.dom.XmlDomNodeImporterImpl.importAsNode (XmlDomNodeImporterImpl.java:65)
    at org.wildfly.swarm.tools.ModuleAnalyzer.<init> (ModuleAnalyzer.java:53)
    at org.wildfly.swarm.tools.ResolvedDependencies.findModuleXmls (ResolvedDependencies.java:80)


If you check the full stack trace you can see that the problem comes from a corrupted zip file in my local Maven repository. The simple solution would be to clean my local Maven repository, but I didn't want to do that, so I had to debug the ResolvedDependencies class. The code is open, so I cloned the Thorntail project from GitHub, imported the tools project into Eclipse, checked out the 2.2.0.Final tag, and configured a Remote Java Application debug configuration, adding the tools project in the Source tab.


Then I added a breakpoint at the point where the exception was thrown.



Finally, I ran Maven using mvnDebug. When you do this, Maven hangs waiting for a remote debugger to connect to it, so I went back to Eclipse and launched my Remote Java Application configuration.


When it stopped at my breakpoint, I stepped through the execution and, back in ResolvedDependencies, I could see which JAR was corrupted. After the debug session I simply had to delete the JAR and run mvn clean install again. Everything worked because Maven re-downloaded the JAR, this time an uncorrupted one :-)
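If you would rather not attach a debugger, a small scan of the local repository can point at the broken archive directly. The sketch below is not part of Thorntail or Maven: the class name CorruptedJarFinder is made up for illustration, and it assumes the repository lives in the default ~/.m2/repository unless a path is passed as the first argument. It reads every entry of every JAR to the end, because a bad LOC header only shows up when an entry is actually read, not when the archive is opened.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of Thorntail or Maven.
class CorruptedJarFinder {

    /** Returns true when every entry of the archive can be read to the end. */
    static boolean isReadable(Path jar) {
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            byte[] buffer = new byte[8192];
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                try (InputStream in = zip.getInputStream(entries.nextElement())) {
                    // Drain the entry; corrupted data fails here with a ZipException.
                    while (in.read(buffer) != -1) {
                        // nothing to do, we only care whether reading succeeds
                    }
                }
            }
            return true;
        } catch (IOException | RuntimeException e) {
            return false;
        }
    }

    /** Walks the directory tree and collects every .jar file that cannot be read. */
    static List<Path> findCorrupted(Path root) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            return files.filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(".jar"))
                    .filter(p -> !isReadable(p))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the default local repository location is used.
        Path repo = args.length > 0
                ? Paths.get(args[0])
                : Paths.get(System.getProperty("user.home"), ".m2", "repository");
        if (!Files.isDirectory(repo)) {
            System.out.println("Not a directory: " + repo);
            return;
        }
        findCorrupted(repo).forEach(p -> System.out.println("Corrupted: " + p));
    }
}
```

Deleting the files it reports and running mvn clean install again should make Maven download them again, as described above.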


Friday, June 1, 2018

Very simple chat using JGroups and JavaFX

I just wanted to share this small application using JGroups and JavaFX. JGroups is "a toolkit for reliable messaging" and one of the oldest and most stable Java libraries.
The chat code I used is the same one from the JGroups sample code page. We need fewer than 5 lines of code to implement all the chat functionality!



Friday, May 25, 2018

The Image Classifier Microservice is public on Docker repository

Using the fabric8 Docker plugin and the project from the post "A Java microservice for image classification using Thorntail and DeepLearning4J", I was able to build a Docker image and push it to my Docker repository!

In this post I am sharing the steps I followed to build the Docker image and publish it. In the next post I hope to show you how to run it on OpenShift with a real-world application.

Step 1: Build an image from the Thorntail project


This is simple: you just need some configuration in the Maven descriptor. See the changes in the project pom.xml.

Notice that I had to manually copy a JAR file, as described in Thorntail's issue #951. However, I am probably missing something and that may not be a bug. Let's wait and see what feedback the Thorntail team provides on that issue. Update: it wasn't a bug. Bob just commented on the issue, and the solution is to not use the thin mode.

To get help I pinged Thorntail users on the #thorntail IRC channel on freenode and Ken Finnigan helped me. Bob also helped me get started with Thorntail when I wrote an earlier post. Both are Thorntail core contributors and they were so kind and patient with me (and with other Thorntail users) that I had to mention them here!

Step 2: Test the image


Testing the image is simple. Run it using docker:

docker run -di -p 8080:8080 fxapps-image-classifier/app

It will start the app and bind it to your local port 8080. Once the container is running you can test it using curl:

curl  http://localhost:8080

It is working if you see a big JSON response!

Step 3: Push the image


I followed the instructions from the Docker documentation and the image is public now, meaning that you can pull this image and run it locally.

If you have Docker and want to give it a try, first you must pull the image:

docker pull jesuino/image-classifier

Then run it:

docker run -di -p 8080:8080 docker.io/jesuino/image-classifier

Follow the logs until you see that Thorntail has started:

INFO  : Sat May 26 03:04:32 UTC 2018 [io.thorntail.kernel] THORN-000999: Thorntail started in 71.896s

Notice that it took more than 1 minute to start because I didn't use a custom model; instead, I let it download the default model from the DeepLearning4J Zoo.

Once Thorntail is started you can test the service: curl http://localhost:8080


Conclusion


Yes, Thorntail doesn't even have a final release (we have been using snapshot releases) and it is already cloud ready. The next steps are to improve the microservice by adding a health check and then deploy it to OpenShift with a real-world application!

Monday, May 21, 2018

A Java microservice for image classification using Thorntail and DeepLearning4J

If you want to consume a trained neural network from an application you will probably not want to package the model within your app. Models usually occupy around 50 MB of disk space; see these Keras trained models for example.
A good approach for allowing other applications to use your model is a separate microservice and, if you are a Java developer, Thorntail is the best choice.




For image classification nowadays we use convolutional neural networks (CNNs), and for Java developers DeepLearning4J is the API you are looking for. You can consume one of the many pre-trained models, build your own model or even use a Keras model!

So yes, we need a microservice for image classification. The microservice should be responsible for running a model that we configure and, if we don't provide a model, it should use a default one. It should be accessible using REST and it should cache the model.

Enough cheap talk, let's build our microservice.

The REST interface


We will have only two methods for the API:

  • GET /: Returns the model information in JSON format. Sample response:


  • POST /: Accepts the binary content of an image and returns the classification in JSON format. Sample response:

I used jq to highlight curl's output
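For readers who prefer calling the two endpoints from Java rather than curl, here is a minimal client sketch using the java.net.http API from Java 11. The class name ClassifierClient is made up, the service is assumed to listen on http://localhost:8080/ as in the Docker post above, and the application/octet-stream content type is my assumption, since the resource's accepted media type is not shown here.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical client, not part of the microservice project.
class ClassifierClient {

    /** GET /: asks the service for the model information. */
    static HttpRequest modelInfoRequest(URI base) {
        return HttpRequest.newBuilder(base).GET().build();
    }

    /** POST /: sends the raw image bytes to be classified. */
    static HttpRequest classifyRequest(URI base, byte[] imageBytes) {
        return HttpRequest.newBuilder(base)
                // Assumption: the resource accepts a raw binary body.
                .header("Content-Type", "application/octet-stream")
                .POST(HttpRequest.BodyPublishers.ofByteArray(imageBytes))
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        URI base = URI.create("http://localhost:8080/"); // assumed address
        HttpClient client = HttpClient.newHttpClient();
        try {
            HttpResponse<String> info =
                    client.send(modelInfoRequest(base), HttpResponse.BodyHandlers.ofString());
            System.out.println(info.body());
            if (args.length > 0) {
                // The first argument is the path of the image to classify.
                byte[] image = Files.readAllBytes(Paths.get(args[0]));
                HttpResponse<String> result =
                        client.send(classifyRequest(base, image), HttpResponse.BodyHandlers.ofString());
                System.out.println(result.body());
            }
        } catch (IOException e) {
            System.out.println("Could not reach the service or read the image: " + e);
        }
    }
}
```

Both responses are printed as raw JSON text; parsing them is left out to keep the sketch free of extra dependencies.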

The Java project


We will use Maven and pom.xml will have the following dependencies:

The dependencies are simple, mainly thanks to Thorntail natively bringing MicroProfile and CDI. For dl4j we have nd4j, the base library for the dl4j APIs, plus core and zoo, from which we will take the default model. We also use the Thorntail plugin, which builds a structure that can be used to run our application in production or in a Docker image. This structure will be useful for us later as well.



Our project will have only 3 parts:

  • Model: The model objects that contain the JSON representation
  • Service: The heart of the project, which runs the model
  • REST: The part where we expose the service

Users can use their own model, so we use MicroProfile Config to inject properties. Service is a CDI bean that loads the model and runs the classification. REST is a single JAX-RS resource, and the model classes are explained by the JSON outputs you saw earlier. See the full code on my GitHub.

Testing and Running


There are two tests. By default the microservice uses the ResNet50 DeepLearning4J built-in model from the Zoo, which is trained on the ImageNet dataset, so users get recognition of 1000 classes without using a custom model. The first test simply makes sure that the default model is working. The second test runs against a custom model; we use a small MNIST model.

Running the application is simple. First you build it, and you can even skip the tests:

$ mvn clean install -DskipTests

The first time you run the tests it will take a long time because the default model has to be downloaded. Once the project is built, run the script in $PROJECT_HOME/target/image-classifier-microservice-1.0-bin/bin. Use bat or sh according to your operating system.
Running with a custom model is where it gets interesting. Adding your custom model requires you to paste it into $PROJECT_HOME/target/image-classifier-microservice-1.0-bin/lib and then set the following Java properties to configure it:

model.path: The file name of the model, which should be in the application classpath. If you don't provide a model, the built-in one is used;
model.labels: The labels for the model, separated by commas (,). It is highly recommended to provide a value for it. The application will work if you don't, but the labels will be like "label 1", etc;
model.type: A name that you can give to this model;
model.input.width: The input width for the model. Default is 224;
model.input.height: The input height for the model. Default is 224;
model.input.channels: The number of channels. Default is 3.
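To make the defaults concrete, here is a small sketch of how these properties could be read. It is not the project's code: the project injects the values with MicroProfile Config, while this sketch reads plain system properties so that it runs without any dependency, and the ModelSettings class is a made-up name. The defaults 224, 224 and 3 are the ones listed above; a missing model.path or model.type is represented as null.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical holder for the model.* properties, not the project's actual class.
class ModelSettings {
    final String path;         // null means "use the built-in model"
    final List<String> labels; // empty when model.labels is not set
    final String type;         // null when model.type is not set
    final int width;
    final int height;
    final int channels;

    private ModelSettings(String path, List<String> labels, String type,
                          int width, int height, int channels) {
        this.path = path;
        this.labels = labels;
        this.type = type;
        this.width = width;
        this.height = height;
        this.channels = channels;
    }

    /** Reads the model.* system properties, applying the documented defaults. */
    static ModelSettings fromSystemProperties() {
        String rawLabels = System.getProperty("model.labels", "");
        List<String> labels = rawLabels.trim().isEmpty()
                ? Collections.emptyList()
                : Arrays.stream(rawLabels.split(","))
                        .map(String::trim)
                        .collect(Collectors.toList());
        return new ModelSettings(
                System.getProperty("model.path"),
                labels,
                System.getProperty("model.type"),
                Integer.getInteger("model.input.width", 224),
                Integer.getInteger("model.input.height", 224),
                Integer.getInteger("model.input.channels", 3));
    }

    public static void main(String[] args) {
        ModelSettings s = fromSystemProperties();
        System.out.println("path=" + s.path + " type=" + s.type + " labels=" + s.labels
                + " input=" + s.width + "x" + s.height + "x" + s.channels);
    }
}
```

Running it with -Dmodel.input.width=28 on the command line, or after a System.setProperty call, overrides the corresponding default.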

If you are calling the application from Java you can set these properties using System.setProperty. You can fork the code, modify it as you want and have a main class that sets the system properties according to your application and then calls Thorntail.run. If you don't want to touch Java code you need to follow the steps below:
  • Have a built version of the project;
  • Paste your model into $PROJECT_HOME/target/image-classifier-microservice-1.0-bin/lib;
  • Set the JAVA_OPTS environment variable before running the script that starts the server, for example:
export JAVA_OPTS="-Dmodel.path=/mnist-model.zip -Dmodel.labels=0,1,2,3,4,5,6,7,8,9 -Dmodel.type=MNIST -Dmodel.input.width=28 -Dmodel.input.height=28 -Dmodel.input.channels=1"

If everything is fine you should see the following in logs when starting the server:



Conclusion 


We described a simple, but useful, microservice for image classification. There are improvements that could be made:

  • Performance: The performance could be improved in many different ways;
  • Configuration: Make it easier to configure the model I/O. Currently we assume that the image will be converted to an INDArray without any additional transformation and that the output has shape [1, X], where X is the number of labels, which may not always be the case;
  • Enrich the interface: Right now we just provide the minimal information needed to use the model. This could be improved.
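To illustrate the output assumption from the Configuration item, here is a sketch of how a single row of X scores can be turned into a label. It uses plain Java arrays instead of an INDArray so that it runs without DeepLearning4J, and it is not the project's code: OutputDecoder is a made-up name, and numbering the fallback labels from 1 is my assumption based on the "label 1" example mentioned earlier.

```java
import java.util.List;

// Hypothetical decoder for a [1, X] output; plain arrays stand in for an INDArray.
class OutputDecoder {

    /** Returns the index of the highest score. */
    static int argMax(double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Maps a [1, X] output to a label, or to a generated name when no labels are given. */
    static String labelFor(double[][] output, List<String> labels) {
        if (output.length != 1 || output[0].length == 0) {
            throw new IllegalArgumentException("Expected an output of shape [1, X]");
        }
        double[] scores = output[0];
        if (!labels.isEmpty() && labels.size() != scores.length) {
            throw new IllegalArgumentException("Expected one label per output value");
        }
        int best = argMax(scores);
        // Assumption: generated labels are numbered from 1.
        return labels.isEmpty() ? "label " + (best + 1) : labels.get(best);
    }

    public static void main(String[] args) {
        double[][] output = {{0.1, 0.7, 0.2}};
        System.out.println(labelFor(output, List.of("cat", "dog", "bird")));
    }
}
```

A model whose output has a different shape, for example one row per detected object, would fail the shape check here, which is exactly the limitation described in the list.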

Currently the project is not ready for every type of model. In the future we may extend it to support all the famous pre-trained models out there, who knows. We are also open to collaboration: send me your PR if you think anything could be improved.

The source is on my GitHub.