The problem comes when you have to collect the dataset yourself. There are many ways to do that. You can try to automate the collection, but then you may end up with a dataset that does not reflect the real world.
Right now I have to collect images from the real world. I can't take pictures from the internet, because there are only a few of them, and I can't take them all in my house, or important features won't be learned. So I created a small web application that I think may be useful for others trying to create their own dataset. The idea is simple:
- You set the output directory and the possible labels in pom.xml;
- Then you start the application and start taking pictures with your mobile;
- The application then saves the images using the label as the parent directory.
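The last step can be sketched in plain Java as follows. This is only an illustration of the label-as-parent-directory layout, not the application's actual code: the class `LabeledImageStore` and its `save` method are hypothetical names, and the sketch assumes the label is one of the configured labels (it does no validation).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not taken from the application's source.
class LabeledImageStore {

    private final Path outputDir;

    LabeledImageStore(Path outputDir) {
        this.outputDir = outputDir;
    }

    /** Writes the image bytes to outputDir/label/fileName and returns the written path. */
    Path save(String label, String fileName, byte[] imageBytes) throws IOException {
        Path labelDir = outputDir.resolve(label);
        // Creates the label directory if needed; does nothing if it already exists.
        Files.createDirectories(labelDir);
        return Files.write(labelDir.resolve(fileName), imageBytes);
    }
}
```

With an output directory of dataset, saving a picture labeled cat would produce a file such as dataset/cat/img1.jpg.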
When training your neural network with Java and DeepLearning4J, you can then simply use the parent directory as the image label.
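As a library-independent sketch of that convention, the label of an image is just the name of the directory that contains it. DataVec, the data-loading library used with DL4J, ships a `ParentPathLabelGenerator` class that applies the same idea. The class and method names below are hypothetical and only illustrate the rule.

```java
import java.nio.file.Path;

// Hypothetical illustration of the "parent directory is the label" rule.
class ParentDirLabels {

    /** Returns the name of the directory that directly contains the image file. */
    static String labelOf(Path imageFile) {
        // Resolve to an absolute path so a bare file name still has a parent.
        Path parent = imageFile.toAbsolutePath().getParent();
        if (parent == null || parent.getFileName() == null) {
            throw new IllegalArgumentException("No parent directory for " + imageFile);
        }
        return parent.getFileName().toString();
    }
}
```

For a file stored at dataset/cat/img1.jpg, `labelOf` returns cat.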
The application is based on WildFly Swarm. Assuming you have Maven and JDK 1.8, you just have to clone the application from my GitHub, modify pom.xml to set the output directory and the possible labels for your dataset, run mvn wildfly-swarm:run, and access the page with your mobile to start capturing images. You can also set these properties when running the WildFly Swarm uber JAR, and configure WildFly Swarm itself using system properties as described in its documentation.
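To give an idea of how such settings can be picked up from system properties, here is a minimal sketch. The property names `dataset.outputDir` and `dataset.labels` are assumptions made for this example, as is the comma-separated label format; check pom.xml in the repository for the names the application really uses.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical configuration reader; the property names are assumptions.
class DatasetConfig {

    static final String OUTPUT_DIR_PROPERTY = "dataset.outputDir"; // assumed name
    static final String LABELS_PROPERTY = "dataset.labels";        // assumed name

    /** Output directory from the system property, falling back to the temp directory. */
    static String outputDir() {
        return System.getProperty(OUTPUT_DIR_PROPERTY, System.getProperty("java.io.tmpdir"));
    }

    /** Labels from the system property, or an empty list if it is not set. */
    static List<String> labels() {
        return parseLabels(System.getProperty(LABELS_PROPERTY, ""));
    }

    /** Splits a comma-separated list, trimming whitespace and dropping empty entries. */
    static List<String> parseLabels(String raw) {
        return Arrays.stream(raw.split(","))
                .map(String::trim)
                .filter(label -> !label.isEmpty())
                .collect(Collectors.toList());
    }
}
```

Reading the values through `System.getProperty` means the same code works whether the properties come from the Maven build or from -D options on the command line.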
This is what the very simple home page looks like:
Of course there are many more ideas we could add here, but I need to start collecting my dataset and don't have time to keep improving the application. However, if you have any suggestions for improvement, feel free to send me pull requests.