quarta-feira, 24 de janeiro de 2018

Detecting objects using JavaFX and Deep Learning

Computer vision meets Deep Learning mainly due the use of Convolutional Neural Networks. We have used it in our blog already for labeling images. Another challenging area of Deep Learning and computer vision is to identify the position of objects and for this we have a great neural network technique (or architecture - you choose the best) called YOLO. The following video will show you the power of YOLO neural networks:

The output of a neural network such as Resnet50 is a vector with probabilities of labels. So you feed it with an image that contains a cat, it will output various float numbers in an array, each position of that array is a label and the value on that position is the possibility of that label. As an example let's think about a neural network that should recognize dogs (0) and cats (1), and you feed it with an image of a cat, it should output an array of two positions and 0% in position 0 (which represents dog) and 100% in position 1 (which represents cat). The different with YOLO is that it will also output bounding box for the objects detected on an image. So if you show it an image with dogs and cats it will show the label for a given detected object and also the position of that object in a bounding box.

From YOLO web site: https://pjreddie.com/darknet/yolo/

The following presentation from Siraj explains more about YOLO and there's also the AndrewNG coursera classes about object detection.

Ok, how do I use it in Java? As you know Eclipse DeepLearning4J is one of the best framework for deep learning with Java and I found in their git that they are working on a YOLO Zoo model! So we have it already trained out of the box for our use. This is what we are going to use today.

Note: We can also import Keras model - but I didn't find a good keras YOLO model that actually worked with the DL4J import API.


This amazing tool called DL4J has a TinyYOLO (less precise than YOLO, but faster),  model ready for use. Thanks for Samuel Audet on DL4J gitter channel I found great utility tools that quickly helped me to get the YOLO output information without having to maintain my (at that moment) cumbersome code.

The TinyYOLO model will likely be in DL4J 0.9.2. At the time of this writing it was not yet released, but we can use the SNAPSHOT version!

The model is just like any other. The output is also a INDArray, and you can read it manually or use the great utility method from org.deeplearning4j.nn.layers.objdetect.Yolo2OutputLayer:

INDArray output = yoloModel.outputSingle(img);
Yolo2OutputLayer outputLayer = (Yolo2OutputLayer) yoloModel.getOutputLayer(0);
outputLayer.getPredictedObjects(output, threshold);

The object returned by getPredictedObjects is a DetectedObject which contains all the information about the position of the detected object. The position is relative to the box that it is located (YOLO divides the image and X, Y squares). We must calculate the position in the image. I took this code againt from saudet: (and modified it):

for (DetectedObject obj : predictedObjects) {
String cl = yoloModel.getModelClasses()[obj.getPredictedClass()];
double[] xy1 = obj.getTopLeftXY();
double[] xy2 = obj.getBottomRightXY();
int x1 = (int) Math.round(w * xy1[0] / gridW);
int y1 = (int) Math.round(h * xy1[1] / gridH);
int x2 = (int) Math.round(w * xy2[0] / gridW);
int y2 = (int) Math.round(h * xy2[1] / gridH);
int rectW = x2 - x1;
int rectH = y2 - y1;
ctx.strokeRect(x1, y1, rectW, rectH);
ctx.strokeText(cl, x1 + (rectW / 2), y1 - 2);
ctx.fillText(cl, x1 + (rectW / 2), y1 - 2);

This is our simply application that allow you to play with TinyYOLO model or any other YOLO model, which means you can train your own YOLO model and use the app to analyse any image you want. Make sure to set the following system properties, otherwise the default TinyYOLO will be used

model.path: The full path to the model file in your disk
model.classes: the comma separated classes, for example: person,bike,car
model.input.info: The input info in a format width,height,channels, for example: 416,416,3
model.grid: The grids in format w,h, for example: 13,13.

Our application also allow us to set the threshold to remove objects which confidence is too low. With a low threshold we have many boxes, with a high threshold a many predictions are discarded:

Adjusted the threshold and it detected a few objects with precision

Smallest threshold and what we have is a mess!

High threshold and we lose some detections

The application allow zoom and scrolling using a code I found in a JavaFX forum. I just made a few changes and it works really well!

Zoom in the image

That's just a small PoC with the YOLO model. There are improvements. I was initially planning to bring a video with object detection in real time, but the model is too slow on my machine (1 frame per second). If I have a machine with GPU I will make a test and post to this blog again.

In a new version we could remove redundant boxes!

Finally this is it, guys. DL4J developers are doing such amazing job bringing deep learning for Java. I think DL4J will be one of the most important library for Java once the market start to understand how deep learning will change their business!

Find the application code in my github.

Nenhum comentário:

Postar um comentário