Saturday, October 9, 2021

Computer Vision with JavaFX and DJL

Computer Vision is not a new topic in Computer Science, but in recent years it has received a boost due to the use of neural networks for classification, object detection, instance segmentation, and pose prediction.

We can make use of these algorithms in Java with a deep learning library such as DL4J, as we have already done on this blog for handwritten digit recognition and for detecting objects in a JavaFX application.

However, deep learning is popular mostly among Python developers (one reason is that Python easily wraps native libraries; this will become easier in Java after Project Panama), so there are many more pre-trained models for the TensorFlow and PyTorch libraries. It is possible to import them, but that requires a bit more work than simply reusing a pre-trained model.

Fortunately there is a new library called Deep Java Library (DJL), which offers a good set of pre-trained models. It makes use of Jupyter notebooks, which make it easier to try out the library's APIs. Another DJL feature is that it is designed to wrap an existing library, so it works on top of Keras, TensorFlow, MXNet, and others.

In this post we will test some of the DJL Computer Vision models from a JavaFX application. Let's start by capturing the webcam from JavaFX, then feed this input to a pre-trained model.

Capturing the Webcam

As input data for the neural network we capture a webcam image. The project that worked without any issues with JavaFX on my Fedora 34 machine is webcam-capture. I used its JavaFX sample code and it just worked. See my workspace captured from the webcam:
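To illustrate, here is a minimal sketch of grabbing one frame with the webcam-capture (Sarxos) API and converting it to a JavaFX image. The class name WebcamGrabber is illustrative; in the real application the webcam stays open and frames are polled in a background thread, as in the library's JavaFX sample.

```java
import com.github.sarxos.webcam.Webcam;
import javafx.embed.swing.SwingFXUtils;
import javafx.scene.image.Image;
import java.awt.image.BufferedImage;

// Grab a single frame from the default webcam and convert it
// to a JavaFX Image so it can be shown in an ImageView.
public class WebcamGrabber {

    public static Image grabFrame() {
        Webcam webcam = Webcam.getDefault();
        webcam.open();                           // starts the capture device
        BufferedImage frame = webcam.getImage(); // one AWT frame
        webcam.close();
        // SwingFXUtils bridges AWT images to JavaFX images
        return SwingFXUtils.toFXImage(frame, null);
    }
}
```

This requires webcam-capture and JavaFX on the classpath, plus an actual camera, so it is a sketch rather than something you can run standalone.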

Using Pre-trained Neural Networks

I started with a Maven project that only used webcam-capture. Then I added the DJL Maven dependencies, and Eclipse allowed me to import the DJL classes. Later I also added the ML engine libraries needed to run my models; this is how my final pom.xml looks:
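The dependencies section looked roughly like the fragment below. The group and artifact ids are the real DJL and webcam-capture coordinates, but the version numbers are from around the time of writing and may differ in your build:

```xml
<dependencies>
    <dependency>
        <groupId>com.github.sarxos</groupId>
        <artifactId>webcam-capture</artifactId>
        <version>0.3.12</version>
    </dependency>
    <dependency>
        <groupId>ai.djl</groupId>
        <artifactId>api</artifactId>
        <version>0.12.0</version>
    </dependency>
    <!-- Engine used to actually run the pre-trained models -->
    <dependency>
        <groupId>ai.djl.tensorflow</groupId>
        <artifactId>tensorflow-engine</artifactId>
        <version>0.12.0</version>
    </dependency>
    <dependency>
        <groupId>ai.djl.tensorflow</groupId>
        <artifactId>tensorflow-model-zoo</artifactId>
        <version>0.12.0</version>
    </dependency>
</dependencies>
```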

Back to the code: what I wanted was to grab the image from the webcam, feed it into a pre-trained model, get the result, and render it in JavaFX instead of the raw webcam image. To let users test the pre-trained models, we added a combo box to the user interface.

To abstract the model we created an abstract class called MLModel. It wraps the model call so we can focus on rendering the image; the UI does not know about any specific model, which makes it easy to add and remove models. MLModel grabs the predictions from the ML algorithm and draws on the image according to the result type:
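A hypothetical sketch of that abstraction is below. The method names and the generic parameter are illustrative, not the exact ones from the project, but they capture the idea: each implementation predicts on a frame and draws its own result type, and the UI only calls transform().

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Each model implementation predicts on a webcam frame and draws its
// own result type on top of it; the UI only ever calls transform().
public abstract class MLModel<T> {

    /** Runs the underlying pre-trained model on the frame. */
    protected abstract T predict(BufferedImage frame) throws Exception;

    /** Draws the prediction on the frame according to the result type. */
    protected abstract void drawResult(Graphics2D g, T result);

    /** Called by the UI: returns the frame with predictions drawn on it. */
    public BufferedImage transform(BufferedImage frame) {
        try {
            T result = predict(frame);
            Graphics2D g = frame.createGraphics();
            drawResult(g, result);
            g.dispose();
        } catch (Exception e) {
            e.printStackTrace(); // keep the UI alive even if a model fails
        }
        return frame;
    }
}
```

Because transform() always returns the frame, a model that throws simply shows the unmodified webcam image instead of crashing the UI.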

In the UI we have an array of implementations, one of which is selected when the combo box changes. We could have used other mechanisms to select the model, such as the Java Service Provider Interface, but for our code we decided to keep it simple.

Finally it was time to implement the algorithms themselves. First we created an MLModel implementation for object detection. It returns results of type DetectedObjects, and we are able to build the pre-trained model as we want: we can select the engine, the data used to train the model, and other parameters using the class ai.djl.repository.zoo.Criteria. In our case we selected, from the TensorFlow engine, a model with the mobilenet_v2 backbone. A video is on my Twitter.
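Loading such a model with Criteria looks roughly like the sketch below. The builder methods are real DJL API, but the exact filter keys available depend on the model zoo version, so treat the "backbone" filter as an assumption:

```java
import ai.djl.Application;
import ai.djl.inference.Predictor;
import ai.djl.modality.cv.Image;
import ai.djl.modality.cv.output.DetectedObjects;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;

public class ObjectDetectionExample {

    public static DetectedObjects detect(Image img) throws Exception {
        // Describe the model we want: an object-detection model,
        // running on the TensorFlow engine, with a mobilenet_v2 backbone.
        Criteria<Image, DetectedObjects> criteria = Criteria.builder()
                .optApplication(Application.CV.OBJECT_DETECTION)
                .setTypes(Image.class, DetectedObjects.class)
                .optEngine("TensorFlow")
                .optFilter("backbone", "mobilenet_v2")
                .build();

        // The model zoo downloads the matching pre-trained model on demand.
        try (ZooModel<Image, DetectedObjects> model = criteria.loadModel();
             Predictor<Image, DetectedObjects> predictor = model.newPredictor()) {
            return predictor.predict(img);
        }
    }
}
```

A webcam BufferedImage can be converted to DJL's Image type with ImageFactory.getInstance().fromImage(bufferedImage) before calling predict. This sketch needs the DJL dependencies and a network connection for the first model download, so it is not standalone-runnable here.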

For pose prediction we had to run two models internally: the first to extract the person from the input image, and the second to do the pose estimation itself. We also had to calculate the points of the pose relative to the input image. See a video for this model.

Finally we also have Instance Segmentation, but it was so, so slow that I will not discuss it here. You are free to run the application and test it.


Java is a good alternative for building Machine Learning applications! DJL and its great model zoo make it easier to reuse models trained with other libraries. Next steps would be to use other families of trained models, such as NLP, and to create more useful applications, like augmented reality!

Full code is on GitHub.
