SINAI at SemEval-2018 Task 1: Emotion Recognition in Tweets

Emotion classification is a new task that combines several disciplines including Artificial Intelligence and Psychology, although Natural Language Processing is perhaps the most challenging area. In this paper, we describe our participation in SemEval-2018 Task 1: Affect in Tweets. In particular, we participated in the EI-oc, EI-reg and E-c subtasks for English and Spanish.


Introduction
Emotions play a significant role in effective communication between people. In fact, emotional intelligence is sometimes more important than cognitive intelligence for successful interaction (Pantic et al., 2005). Therefore, affective computing is a key element in the advancement of Artificial Intelligence. The basic task of affective computing is emotion recognition, which consists of identifying a set of emotions in a document.
The identification of emotions in text benefits several areas: psychology, to detect psychological disorders such as depression (Cherry et al., 2012); e-learning, to improve student motivation (Suero Montero and Suhonen, 2014); and business intelligence, to learn the preferences of consumers (Cambria, 2016).
Currently, more and more people express their emotions on social media such as Twitter or Facebook. Therefore, the role of emotion in social media is becoming more important for researchers in affective computing.
In this paper, we present the systems we developed for our participation in SemEval-2018 Task 1: Affect in Tweets (Mohammad et al., 2018). We participated in the EI-oc, EI-reg and E-c subtasks for English and Spanish. Below, we briefly describe these subtasks.
EI-oc is an emotion intensity ordinal classification task. Given a tweet and an emotion E, it consists of classifying the tweet into one of four ordinal classes of intensity of E that best represents the mental state of the tweeter. Separate datasets are provided for anger, fear, joy, and sadness emotions.
EI-reg is an emotion intensity regression task. Given a tweet and an emotion E, it consists of determining the intensity of E that best represents the mental state of the tweeter. The intensity of E is a real-valued score between 0 (least emotion) and 1 (most emotion). Separate datasets are provided for anger, fear, joy, and sadness emotions.
E-c is an emotion multi-label classification task. Given a tweet, it consists of classifying it as 'neutral or no emotion' or as one or more of eleven given emotions (anger, anticipation, disgust, fear, joy, love, optimism, pessimism, sadness, surprise, trust) that best represent the mental state of the tweeter.
The rest of the paper is organized as follows. In Section 2 we explain the data used in our methods. Section 3 describes the resources used by our systems. Section 4 presents the details of the proposed systems. Section 5 displays the results and analyses them. We conclude in Section 6 with remarks on future work.

Data
To run our experiments, we used the datasets provided by the task organizers (Mohammad et al., 2018) as follows. During the pre-evaluation period, we trained our models on the train set and evaluated our different approaches on the dev set. During the evaluation period, we trained our models on the train and dev sets and tested them on the test set. Table 1 shows the number of tweets for each language and subtask dataset.

Resources
For the development of the task, we used different lexicons that we explain in detail below.
WordNet-Affect (WNA) (Strapparava et al., 2004). This resource is an extension of WordNet Domains. WNA provides a set of English emotional words organized in a tree. The leaf nodes represent specific emotions that are grouped into general categories (parent nodes). For example, anger, hate and dislike belong to the overall emotion general-dislike. However, the emotions of WNA are not the same as the emotions of the SemEval subtasks. For this reason, each overall emotion of WNA has been mapped to the emotions of the SemEval subtasks (see Appendix A, Table 8 and Table 9).
In order to use this resource in Spanish, we employed the lexical disambiguator Babelfy (Moro et al., 2014) to obtain the BabelNet synset id of a term. Next, we used the BabelNet API (Navigli and Ponzetto, 2012) to obtain the correspondence between the BabelNet synset id and the WordNet synset id. WNA includes a subset of WordNet 1.6 synsets that are appropriate for representing affective concepts. However, the WordNet synset ids obtained with the BabelNet API correspond to version 3.0 of WordNet. Therefore, for each 3.0 synset we obtained the equivalent synset in version 1.6. With the 1.6 synset, we can directly look up the associated emotion and confidence value in WNA.
Spanish Emotion Lexicon (SEL) (Sidorov et al., 2012). It includes 2,036 Spanish words, each associated with a Probability Factor of Affective use (PFA) with respect to at least one basic emotion: joy, anger, fear, sadness, surprise, and disgust. The higher the PFA, the more probable the association of the word with the emotion.
NRC Affect Intensity Lexicon (Mohammad, 2017). It has almost 6,000 entries in English. Each entry has an intensity score associated with one of the following basic emotions: anger, fear, sadness and joy. The scores range from 0 to 1, where 1 indicates that the word has a high association with the emotion and 0 that it has a low association. However, this resource is not available in Spanish. For this reason, we adapted it as follows: we translated the English terms into Spanish, and when several English terms share the same translation, we kept the maximum intensity value.
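The adaptation step can be illustrated with a minimal sketch. The lexicon entries, their scores and the translation table below are invented for illustration and are not taken from the actual resource. The sketch also assumes that the maximum is taken separately for each emotion, which the description above does not state explicitly.

```python
from typing import Dict, Tuple

# Hypothetical miniature lexicon: (English word, emotion) -> intensity in [0, 1].
# The scores are made up for this example.
english_lexicon: Dict[Tuple[str, str], float] = {
    ("furious", "anger"): 0.93,
    ("enraged", "anger"): 0.88,
    ("happy", "joy"): 0.79,
}

# Hypothetical translation table; in practice it would come from a translator.
translations: Dict[str, str] = {
    "furious": "furioso",
    "enraged": "furioso",
    "happy": "feliz",
}


def adapt_lexicon(lexicon, translations):
    """Translate lexicon keys and keep the maximum score on collisions."""
    adapted = {}
    for (word, emotion), score in lexicon.items():
        target = translations.get(word)
        if target is None:
            continue  # words without a translation are skipped (assumption)
        key = (target, emotion)
        adapted[key] = max(score, adapted.get(key, 0.0))
    return adapted


spanish_lexicon = adapt_lexicon(english_lexicon, translations)
```

Here "furious" and "enraged" collapse to the same Spanish term, so only the higher of the two scores is retained.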
NRC Word-Emotion Association Lexicon (EmoLex) (Mohammad and Turney, 2010). This lexicon has a list of English words associated with one or more of the following emotions: anger, fear, anticipation, trust, surprise, sadness, joy. Moreover, the lexicon is available in more than one hundred languages (including Spanish). All these versions have been generated by translating the English terms using Google Translate.

System description
In this section we describe the systems developed for the subtasks EI-oc, EI-reg and E-c.
First, we preprocessed the corpus of tweets provided for each subtask and language (English and Spanish). We applied the following preprocessing steps: the documents were tokenized using the NLTK TweetTokenizer, stemming was performed using the NLTK Snowball stemmer, stopwords were removed (only for English), and all letters were converted to lower-case.
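A minimal sketch of such a preprocessing pipeline for English is shown below. It assumes NLTK is installed; the stopword list is a tiny illustrative stand-in, since the list actually used is not specified here, and the function name `preprocess` is hypothetical. Stopwords are filtered before stemming so that they are matched in their surface form, which is a choice made for this sketch.

```python
from nltk.stem.snowball import SnowballStemmer
from nltk.tokenize import TweetTokenizer

# Assumption: a tiny illustrative stopword list, not the one used in the paper.
STOPWORDS = {"i", "am", "so", "of", "the", "a"}

# preserve_case=False lower-cases tokens during tokenization.
tokenizer = TweetTokenizer(preserve_case=False)
stemmer = SnowballStemmer("english")


def preprocess(tweet, remove_stopwords=True):
    """Tokenize, lower-case, optionally drop stopwords, then stem."""
    tokens = tokenizer.tokenize(tweet)
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return [stemmer.stem(t) for t in tokens]


tokens = preprocess("I am SO tired of the waiting")
```

For Spanish, the same sketch would use `SnowballStemmer("spanish")` and call `preprocess` with `remove_stopwords=False`, matching the English-only stopword removal described above.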
Regarding the resources, we tested several combinations, and for the final SemEval systems we used the best configurations found during the development phase. For the EI-oc and EI-reg subtasks in Spanish, we used the SEL, NRC Affect Intensity and WNA lexicons adapted to the emotions of these subtasks. For English, we used the NRC Affect Intensity and WNA lexicons adapted to the emotions of the EI-oc and EI-reg subtasks. Regarding subtask E-c, for Spanish we used SEL, the Spanish version of EmoLex and the WNA lexicon adapted to the emotions of this subtask. For English, we used EmoLex and the WNA lexicon adapted to the emotions of the E-c subtask.
Next, we describe the methodology used for each subtask.
Subtask EI-oc. To perform the classification, we checked for the presence of lexicon terms in the tweet and then summed the intensity values of these words, grouping them by emotional category (anger, fear, sadness and joy). The result is a vector of four values for each lexicon. Moreover, each tweet is represented as a vector of unigrams using the TF-IDF weighting scheme. The union of the lexicon vectors and the TF-IDF representation of the tweet is used as the feature set for classification with the SVM algorithm. We selected the SVM formulation known as C-SVC, with the C parameter set to 1.0 and a linear kernel.
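The four-value lexicon vector can be sketched as follows. The lexicon contents and the function name `intensity_sums` are hypothetical and serve only to illustrate the summation; the TF-IDF features and the SVM itself are not shown.

```python
EMOTIONS = ("anger", "fear", "sadness", "joy")

# Hypothetical lexicon: token -> {emotion: intensity}. Scores are invented.
toy_lexicon = {
    "furious": {"anger": 0.9},
    "scared": {"fear": 0.8},
    "cry": {"sadness": 0.6, "fear": 0.2},
}


def intensity_sums(tokens, lexicon):
    """Sum the intensity of matched lexicon terms per emotion."""
    totals = dict.fromkeys(EMOTIONS, 0.0)
    for token in tokens:
        for emotion, score in lexicon.get(token, {}).items():
            if emotion in totals:
                totals[emotion] += score
    # Fixed emotion order so every tweet yields a comparable vector.
    return [totals[emotion] for emotion in EMOTIONS]


vector = intensity_sums(["furious", "and", "scared", "cry"], toy_lexicon)
```

One such vector would be computed per lexicon and concatenated with the TF-IDF representation of the tweet before training.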
Subtask EI-reg. In this case, we checked for the presence of lexicon terms in the tweet and then computed the sum, the average and the maximum of the intensity values of the words of the tweet, grouping them by emotional category (anger, fear, sadness and joy). The result is a vector of twelve values for each lexicon. The union of the lexicon vectors and the TF-IDF representation of the tweet is used as the feature set for the SVM algorithm, with the same configuration as in subtask EI-oc.
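A minimal sketch of the twelve-value vector follows. The lexicon entries and the name `intensity_stats` are hypothetical. The sketch assumes the average is taken over the matched lexicon words only (not over all tokens of the tweet), which the description does not specify.

```python
EMOTIONS = ("anger", "fear", "sadness", "joy")

# Hypothetical lexicon: token -> {emotion: intensity}. Scores are invented.
toy_lexicon = {
    "furious": {"anger": 0.9},
    "angry": {"anger": 0.5},
    "cry": {"sadness": 0.6},
}


def intensity_stats(tokens, lexicon):
    """Return [sum, mean, max] per emotion, i.e. 3 x 4 = 12 values."""
    features = []
    for emotion in EMOTIONS:
        scores = [
            lexicon[token][emotion]
            for token in tokens
            if emotion in lexicon.get(token, {})
        ]
        total = sum(scores)
        # Assumption: the mean is over matched words; 0.0 when nothing matches.
        mean = total / len(scores) if scores else 0.0
        peak = max(scores, default=0.0)
        features.extend([total, mean, peak])
    return features


features = intensity_stats(["furious", "angry", "cry"], toy_lexicon)
```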
Subtask E-c. In this subtask, we identified the presence of lexicon terms in the tweet and assigned each of them a confidence value (CV) of 1. Then, we summed the CVs of the words that share the same emotion, obtaining a vector of emotions for each lexicon. The union of these vectors and the TF-IDF representation of the tweet is used as the feature set for classification with the Random Forest algorithm, using 25 trees.
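Because every matched term contributes a CV of 1, the E-c lexicon vector amounts to a per-emotion count. The sketch below illustrates this with an invented lexicon; the name `emotion_counts` is hypothetical, and the Random Forest classifier is not shown.

```python
from collections import Counter

E_C_EMOTIONS = (
    "anger", "anticipation", "disgust", "fear", "joy", "love",
    "optimism", "pessimism", "sadness", "surprise", "trust",
)

# Hypothetical lexicon: token -> set of associated emotions.
toy_lexicon = {
    "hate": {"anger", "disgust"},
    "hope": {"optimism", "anticipation"},
    "party": {"joy"},
}


def emotion_counts(tokens, lexicon):
    """Count matched terms per emotion (each match has a CV of 1)."""
    counts = Counter()
    for token in tokens:
        counts.update(lexicon.get(token, ()))
    return [counts[emotion] for emotion in E_C_EMOTIONS]


counts = emotion_counts(["hate", "hope", "hate"], toy_lexicon)
```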

Analysis of results
The official competition metric for the EI-reg and EI-oc subtasks is the Pearson Correlation Coefficient (PCC) between the machine-assigned scores and the human judgments. In the case of the E-c subtask, systems are evaluated by calculating multi-label accuracy.
Since this is a multi-label classification task, each tweet can have one or more gold emotion labels and one or more predicted emotion labels. Multi-label accuracy is defined as the size of the intersection of the predicted and gold label sets divided by the size of their union. This measure is calculated for each tweet and then averaged over all the tweets in the dataset.
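Both metrics can be sketched in a few lines. This is an illustration of the definitions above rather than the official scorer. The function names are hypothetical, and scoring a tweet as 1.0 when both label sets are empty is an assumption of this sketch.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists.

    Raises ZeroDivisionError if either list has zero variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def multilabel_accuracy(gold, predicted):
    """Average over tweets of |gold & predicted| / |gold | predicted|."""
    scores = []
    for gold_labels, predicted_labels in zip(gold, predicted):
        union = gold_labels | predicted_labels
        if union:
            scores.append(len(gold_labels & predicted_labels) / len(union))
        else:
            scores.append(1.0)  # assumption: both empty counts as a match
    return sum(scores) / len(scores)


pcc = pearson([0.1, 0.4, 0.8], [0.2, 0.5, 0.9])
accuracy = multilabel_accuracy(
    [{"joy", "love"}, {"anger"}],
    [{"joy"}, {"anger", "disgust"}],
)
```

In the usage above, the second score list is the first shifted by a constant, so the correlation is 1 up to rounding, and each of the two tweets shares one label out of a union of two.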
The results of our participation in the three subtasks, together with those of the teams in the first and the last positions, can be seen in Tables 2, 3, 4, 5, 6 and 7. It should be noted that the results for the Spanish subtasks are lower than those obtained for English. Another important point is that participation in the Spanish subtasks is lower than in the English subtasks. Both facts arise because most of the work and resources for textual emotion mining are in English (Yadollahi et al., 2017).
Regarding our results, in most subtasks we obtained the lowest correlation on anger and the best correlation on joy. In contrast, in the WASSA-2017 Shared Task on Emotion Intensity (Mohammad and Bravo-Marquez, 2017), most of the systems performed better on anger and worse on fear and sadness. In that competition, it was found that despite the use of deep learning techniques, training data, and large amounts of unlabeled data, the best systems included features from affect lexicons. Given that, we plan to analyze the recall of the lexicons used in our experiments and to explore new lexicons in order to improve the classification.
On the other hand, it should be noted that we achieved higher ranking positions in the Spanish subtasks. In particular, our best participation was in the E-c subtask. An important difference between the two languages was that taking stopwords into consideration contributes to emotion classification for Spanish, while the opposite occurs for English. Therefore, we will study this issue further in order to incorporate a specific treatment for those stopwords that can modify the meaning of a sentence, such as negators, intensifiers and diminishers.

Conclusions
In this paper, we have presented the systems developed for our participation in three subtasks (EI-oc, EI-reg, E-c) of SemEval-2018 Task 1: Affect in Tweets. We addressed these subtasks in two of the three available languages, English and Spanish. Overall, we achieved better ranking positions in the Spanish subtasks than in the English subtasks. In future work, we plan to continue working on emotion recognition in Spanish because we have observed that participation in this language is very low, although it is the second most spoken language. Our next study will focus on exploring more affect lexicons because the WASSA-2017 Shared Task on Emotion Intensity (Mohammad and Bravo-Marquez, 2017) showed that features from affect lexicons are beneficial for this task. Moreover, we will study the use of stopwords in Spanish because in the development phase we observed that stopwords contribute to emotion classification.