Monday, January 2, 2017

Sentiment analysis using AFINN in a Java application

Happy and Sad by Michael Coghlan, courtesy of Flickr Creative Commons, licensed under CC BY-SA 2.0

We have already talked about sentiment analysis on this blog using a REST API, but today I came across another great Daniel Shiffman video, this time about sentiment analysis and AFINN.


Loading AFINN in a Java application


Using the TSV file provided by AFINN, we can create applications that evaluate the sentiment of sentences.
A simple use of this file is to load it into a java.util.Map and use it to check whether a given sentence or text contains the key words that help us identify sentiment, and then calculate a score. Let me explain better.
Each word has a score associated with it: a positive value means positive sentiment and a negative value means negative sentiment. Take, for example, the words sad and happy:

sad     -2
happy    3
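
Loading the file into a map could look roughly like the sketch below. It assumes each line holds a word and its integer score separated by a tab. The class name AfinnLoaderSketch and its parse method are hypothetical names made up for illustration; this is not the AFINNWords class from this post.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; this is not the AFINNWords class from the post.
class AfinnLoaderSketch {

    // Turns "word<TAB>score" lines into a map.
    // Lines without exactly two tab-separated fields are skipped.
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> scores = new HashMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\t");
            if (fields.length != 2) {
                continue;
            }
            // Keys are lowercased so that later lookups can be case-insensitive.
            scores.put(fields[0].trim().toLowerCase(), Integer.parseInt(fields[1].trim()));
        }
        return scores;
    }

    public static void main(String[] args) {
        // The two entries shown above, inlined so the sketch runs without the real file.
        Map<String, Integer> scores = parse(Arrays.asList("sad\t-2", "happy\t3"));
        System.out.println(scores.get("sad"));   // -2
        System.out.println(scores.get("happy")); // 3
    }
}
```

With the real file, the lines could come from java.nio.file.Files.readAllLines, pointed at wherever you saved the AFINN file.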


With this in mind, we can take a text, split it into words, look up the score of each word, and sum them all. If the final score is negative, the sentence can be considered a negative sentence.
I implemented this using the following classes:

- AFINNWords: loads the AFINN file into a map;
- TextSentimentAnalyzer: a class that takes the map and analyzes a given text;
- TextAnalisysResult: a model object that holds the result of a text analysis;
- TextSentimentAnalyzerTest: just a few tests that show how to use it.
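
To give an idea of the scoring step, here is a minimal sketch of the split, look up and sum approach. It is not the actual code of the classes listed above: SentimentScoreSketch, Result and analyze are hypothetical names, and the tokenization rule (lowercase, then split on anything that is not a letter, digit or apostrophe) is an assumption.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the scoring idea; names and tokenization are assumptions.
class SentimentScoreSketch {

    // Holds the total score and the words that contributed to it.
    static final class Result {
        final int score;
        final Map<String, Integer> detectedWords;

        Result(int score, Map<String, Integer> detectedWords) {
            this.score = score;
            this.detectedWords = detectedWords;
        }
    }

    static Result analyze(String text, Map<String, Integer> scores) {
        Map<String, Integer> detected = new LinkedHashMap<>();
        int total = 0;
        // Lowercase, then split on anything that is not a letter, digit or apostrophe.
        for (String word : text.toLowerCase().split("[^\\p{L}\\p{N}']+")) {
            Integer value = scores.get(word);
            if (value != null) {
                total += value;            // every occurrence counts towards the score
                detected.put(word, value); // remember which words were found
            }
        }
        return new Result(total, detected);
    }

    public static void main(String[] args) {
        Map<String, Integer> scores = new HashMap<>();
        scores.put("sad", -2);
        scores.put("happy", 3);

        Result result = analyze("I was sad yesterday, but I am happy today.", scores);
        System.out.println(result.score);         // -2 + 3 = 1
        System.out.println(result.detectedWords); // {sad=-2, happy=3}
    }
}
```

A negative total would mark the text as negative. Keeping the detected words next to the score makes it easier to see why a text got the score it did.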


Disadvantages and conclusions

This is a simple implementation, but it has a lot of disadvantages:

- It does not detect irony or sarcasm;
- The target of the adjective is not taken into consideration. For example: After his fall, the dictator was sad, abandoned, alone, but the people were happy.
- It is simple word checking; many specific cases will not be taken into consideration by simple word counting.

It is possible to improve this simple code by adding Natural Language Processing, for example with CoreNLP, or even by using genetic algorithms, with real people adjusting the algorithm's fitness according to the real feeling of a sentence.

However, there are cases where it is useful. Twitter is the best case. A tweet about a brand will usually not contain sarcasm, but a direct complaint or praise.

In any case, I concluded that there is no perfect sentiment analysis algorithm. Take, for instance, this example from MIT using CoreNLP. Let's take a sentence from the "Messages to express our sorrow and sadness" page:

I dedicated you a thousand love songs, thinking that you would enjoy them. But if there is something I will never devote you, is my pain.

That's so sad! But with the application I showed, we get a very positive score:

TextAnalisysResult [text=I dedicated you a thousand love songs, thinking that you would enjoy them. But if there is something I will never devote you, is my pain., score=5, comparative=0.1724137931034483, detectedWords={love=3, dedicated=2, pain=-2, enjoy=2}]
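
The score in this output can be checked by hand from the detected words. The sketch below only redoes that arithmetic; ScoreCheckSketch is a hypothetical name. The reported comparative value matches 5 / 29, which suggests the score is divided by a token count of 29. That reading is inferred from the numbers and is an assumption, not something the output states.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical check of the numbers printed above; not part of the original classes.
class ScoreCheckSketch {

    // Adds up the scores of the detected words.
    static int sum(Map<String, Integer> detectedWords) {
        int total = 0;
        for (int value : detectedWords.values()) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        // Detected words and scores copied from the output above.
        Map<String, Integer> detected = new LinkedHashMap<>();
        detected.put("love", 3);
        detected.put("dedicated", 2);
        detected.put("pain", -2);
        detected.put("enjoy", 2);

        int score = sum(detected);
        System.out.println(score); // 3 + 2 - 2 + 2 = 5

        // Assumption: comparative = score / token count, with 29 tokens inferred from the output.
        System.out.println(score / 29.0);
    }
}
```

The only negative word, pain, is outweighed by love, dedicated and enjoy, which is why the sentence ends up positive.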
OK, that's bad. However, using the MIT system we see that it fails to evaluate this sentence as well.



I would say that the best approach today is to let a human being help a genetic algorithm select the fittest algorithm for evaluating the sentiment of sentences!

