
Sentiment analysis using AFINN in a Java application

Happy and Sad by Michael Coghlan courtesy of Flickr Creative Commons licensed by CC BY-SA 2.0

We have already covered sentiment analysis on this blog using a REST API, but today I came across another great Daniel Shiffman video, this time about sentiment analysis with AFINN:


Loading AFINN in a Java application


Using the TSV file provided by AFINN, we can create applications that evaluate the sentiment of a sentence.
A simple way to use this file is to load it into a java.util.Map and then check whether a given sentence, or any text, contains the keywords that help us identify sentiment, and calculate a score from them. Let me explain.
Each word has a score associated with it: a positive value means a positive sentiment and a negative value means a negative sentiment. Take for example the words sad and happy:

sad      -2
happy     3


With this in mind, we can take a text, split it into words, look up the score of every word and sum them all. If the final score is negative, the sentence can be considered negative.
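Before looking at the real classes, here is a minimal sketch of that idea. It is only an illustration: the class name TinyAfinnScorer is hypothetical, and the map holds just the two words quoted above instead of the full AFINN file.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only, not one of the classes from this post.
final class TinyAfinnScorer {

    // Only the two entries quoted above; the real list is loaded from the AFINN file.
    static final Map<String, Integer> SCORES = new HashMap<>();
    static {
        SCORES.put("sad", -2);
        SCORES.put("happy", 3);
    }

    // Splits the text on non-word characters and sums the scores of the known words.
    static int score(String text) {
        int total = 0;
        for (String word : text.split("\\W+")) {
            total += SCORES.getOrDefault(word, 0); // unknown words count as 0
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(score("I am sad."));                // prints -2
        System.out.println(score("I am sad and I am happy.")); // prints 1
    }
}
```

The second sentence scores 1 because the -2 for sad and the 3 for happy are simply added together.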
I implemented this using the following classes:

AFINNWords: loads the AFINN file into a map;
TextSentimentAnalyzer: a class that takes the map and analyzes a given text;
TextAnalisysResult: a model object that holds the result of a text analysis;
TextSentimentAnalyzerTest: just a few tests to show how to use it.

import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class AFINNWords {

    private static final String AFINN_FILE = "/AFINN-111.txt";

    // Loaded once, when the class is initialized.
    private static final Map<String, Integer> wordsAndSentiments = getAFinnWords();

    private static Map<String, Integer> getAFinnWords() {
        try {
            URI afinnUri = AFINNWords.class.getResource(AFINN_FILE).toURI();
            Path path = Paths.get(afinnUri);
            // try-with-resources closes the file opened by Files.lines
            try (Stream<String> lines = Files.lines(path)) {
                // Each line has the form "<word>\t<score>"
                return lines.map(l -> l.split("\\t"))
                        .collect(Collectors.toMap(l -> l[0], l -> Integer.valueOf(l[1])));
            }
        } catch (Exception e) {
            throw new IllegalStateException("Error loading AFINN words list.", e);
        }
    }

    public static Map<String, Integer> get() {
        return wordsAndSentiments;
    }
}

import java.util.Map;

public class TextAnalisysResult {

    private String text;
    private int score;
    private double comparative;
    private Map<String, Integer> detectedWords;

    // constructor, getters and setters...
}

import static java.util.stream.Collectors.toMap;

import java.util.Map;
import java.util.stream.Stream;

public class TextSentimentAnalyzer {

    // Splits on every single non-word character, so ", " yields an empty token
    // that still counts towards words.length below.
    private static final String WORD_SPLIT_REGEX = "\\W";

    public static TextAnalisysResult analyze(String text) {
        Map<String, Integer> afinnWords = AFINNWords.get();
        String[] words = text.split(WORD_SPLIT_REGEX);
        // Sum the score of every occurrence of a word found in the AFINN list.
        int score = Stream.of(words)
                .filter(afinnWords::containsKey)
                .mapToInt(afinnWords::get)
                .sum();
        // The merge function keeps toMap from throwing when a word is repeated.
        Map<String, Integer> detectedWords = Stream.of(words)
                .filter(afinnWords::containsKey)
                .collect(toMap(w -> w, afinnWords::get, (first, repeated) -> first));
        double comparative = 0;
        if (words.length > 0) {
            comparative = (double) score / words.length;
        }
        return new TextAnalisysResult(text, score, comparative, detectedWords);
    }
}

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class TextSentimentAnalyzerTest {

    @Test
    public void afinnWordTest() {
        int sadScore = AFINNWords.get().get("sad");
        int happyScore = AFINNWords.get().get("happy");
        int abandonScore = AFINNWords.get().get("abandon");
        int zealousScore = AFINNWords.get().get("zealous");
        assertEquals(-2, sadScore);
        assertEquals(3, happyScore);
        assertEquals(-2, abandonScore);
        assertEquals(2, zealousScore);
    }

    @Test
    public void positiveInputTest() {
        String text = "I am happy.";
        TextAnalisysResult result = TextSentimentAnalyzer.analyze(text);
        assertEquals(3, result.getScore());
        assertEquals(1, result.getDetectedWords().size());
    }

    @Test
    public void negativeInputTest() {
        String text = "I am sad.";
        TextAnalisysResult result = TextSentimentAnalyzer.analyze(text);
        assertEquals(-2, result.getScore());
        assertEquals(1, result.getDetectedWords().size());
    }

    @Test
    public void mixedInputTest() {
        String text = "I am sad and I am happy.";
        TextAnalisysResult result = TextSentimentAnalyzer.analyze(text);
        assertEquals(1, result.getScore());
        assertEquals(2, result.getDetectedWords().size());
    }
}
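
A note on the comparative field: in TextSentimentAnalyzer it is the score divided by the number of tokens returned by text.split("\\W"). Since that regex splits on every single non-word character, a comma followed by a space produces an empty token, and that token is counted too. The sketch below isolates this calculation; ComparativeSketch is a hypothetical name, and the score of 1 is passed in by hand (it is the sad plus happy total from mixedInputTest).

```java
// Hypothetical helper for illustration only; it mirrors the comparative
// calculation of TextSentimentAnalyzer without needing the AFINN file.
final class ComparativeSketch {

    // Same regex as TextSentimentAnalyzer: splits on each single non-word character.
    static String[] tokens(String text) {
        return text.split("\\W");
    }

    // Score divided by the number of tokens, or 0 when there are no tokens.
    static double comparative(int score, String text) {
        String[] words = tokens(text);
        return words.length > 0 ? (double) score / words.length : 0;
    }

    public static void main(String[] args) {
        String plain = "I am sad and I am happy.";
        String withComma = "I am sad, and I am happy.";
        System.out.println(tokens(plain).length);      // prints 7
        // ", " is two separators in a row, so one empty token appears between them.
        System.out.println(tokens(withComma).length);  // prints 8
        System.out.println(comparative(1, withComma)); // prints 0.125
    }
}
```

So punctuation slightly lowers the comparative value. Splitting on "\\W+" instead would avoid the empty tokens in the middle of the text, at the cost of changing the comparative values the analyzer reports.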

Disadvantages and conclusions

This is a simple implementation, but it has a lot of disadvantages:

- It does not detect irony/sarcasm;
- The target of the adjective is not taken into consideration. For example: After his fall, the dictator was sad, abandoned, alone, but the people were happy.
- It is simple word checking; many specific cases will not be taken into consideration by simple word counting.
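
To make the second point concrete, here is a rough sketch that scores the dictator sentence with a tiny hand-made map. The class name is hypothetical, and so is part of the data: sad and happy use the scores quoted earlier, while abandoned and alone are assumed to be -2 just for this illustration, so check the AFINN file for the real values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example for illustration only.
final class TargetBlindExample {

    // "sad" and "happy" use the scores quoted in this post; "abandoned" and
    // "alone" are ASSUMED to be -2 here and may differ in the AFINN file.
    static final Map<String, Integer> SCORES = new HashMap<>();
    static {
        SCORES.put("sad", -2);
        SCORES.put("abandoned", -2);
        SCORES.put("alone", -2);
        SCORES.put("happy", 3);
    }

    // Plain word counting: it has no idea who each adjective refers to.
    static int score(String text) {
        int total = 0;
        for (String word : text.split("\\W+")) {
            total += SCORES.getOrDefault(word, 0);
        }
        return total;
    }

    public static void main(String[] args) {
        String text = "After his fall, the dictator was sad, abandoned, alone, "
                + "but the people were happy.";
        System.out.println(score(text)); // prints -3 with the assumed scores
    }
}
```

The whole sentence comes out negative, although from the point of view of the people it describes good news. The sum cannot tell the dictator's feelings from the people's.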

It is possible to improve this simple code by adding Natural Language Processing, for example with CoreNLP, or even by using genetic algorithms, with real people adjusting the algorithm's fitness according to the real feeling of a sentence.

However, there are cases where it is useful. Twitter is the best case: a tweet about a brand will usually contain not sarcasm but a direct complaint or praise.

In any case, I concluded that there is no perfect sentiment analysis algorithm. Consider, for instance, this example from MIT using CoreNLP. Let's take a sentence from the Messages to express our sorrow and sadness page:

I dedicated you a thousand love songs, thinking that you would enjoy them. But if there is something I will never devote you, is my pain.

That's so sad! But with the application I showed, we get a very positive score:

TextAnalisysResult [text=I dedicated you a thousand love songs, thinking that you would enjoy them. But if there is something I will never devote you, is my pain., score=5, comparative=0.1724137931034483, detectedWords={love=3, dedicated=2, pain=-2, enjoy=2}]
Ok, that's bad. However, using the MIT system we see that it fails to evaluate this sentence as well:



I would say that the best approach for now is to let a human being help a genetic algorithm select the fittest algorithm for evaluating the sentiment of a sentence!

