sexta-feira, 16 de junho de 2017

K-means and decision tree using Weka and JavaFX

Weka is one of the most known tools for Machine Learning in Java, which also has a great Java API including API for k-means clustering. Using JavaFX it is possible to visualize unclassified data, classify the data using Weka APIs and then visualize the result in a JavaFX chart, like the Scatter chart.

In this post we will show a simple application that allows you to load data, show it without series distinction using a JavaFX scatter chart,, then we use Weka to classify the data in a defined number of clusters and finally separated the clustered data by chart series. We will be using the Iris.2D.arff file that comes with Weka download.

K-means clustering using Weka is really simple and requires only a few lines of code as you can see in this post. In our application we will build 3 charts for the Iris dataset:

  1. Data without class distinction (no classes)
  2. The data with the ground truth classification
  3. Data clustered using weka

As you can see the clustered data is really close to the real one (the data with correct labels). The code to build the clustered data:

After creating these 3 charts I also modified the whole code to add a decision tree classifier using weka J48 algorithm implementation. Right after the chart you can see the tree that I built our of the Iris 2d data:

When you click in any chart you will see a new item is added and it will be classified on center chart using the decision tree and on clustered chart using the k-means classification.

We use our generated decision tree to classify data and also the cluster. In the image above as you can see the cluster classify some data differently from what is classified with the decision tree.

I think it is particularly interesting how it is easy to visualize data with JavaFX. The full code for this project can be found on my github, but here is the main class code:

Nenhum comentário:

Postar um comentário