ValenTO at SemEval-2018 Task 3: Exploring the Role of Affective Content for Detecting Irony in English Tweets

In this paper we describe the system used by the ValenTO team in the shared task on Irony Detection in English Tweets at SemEval 2018. The system takes as its starting point emotIDM, an irony detection model that explores the use of affective features based on a wide range of lexical resources available for English, reflecting different facets of affect. We experimented with different settings, exploiting different classifiers and features, and participated both in the binary irony detection task and in the task devoted to distinguishing among different types of irony. We report on the results obtained by our system in both a constrained and an unconstrained setting; in the latter we explored the impact of using additional training data, namely state-of-the-art corpora annotated for the presence of irony or sarcasm. Overall, the performance of our system seems to validate the important role that affective information has for identifying ironic content in Twitter.


Introduction
People use social media platforms as a forum to share and express themselves, using language in creative ways and employing figurative devices such as irony to achieve different communication purposes. Irony is closely associated with the indirect expression of feelings, emotions and evaluations. Detecting irony in social media texts is considered a challenge for research in computational linguistics, not least because of its impact on sentiment analysis, where irony detection is important to avoid misinterpreting the polarity of ironic statements.
Broadly speaking, the umbrella term irony covers two main concepts: verbal irony and situational irony. Verbal irony is commonly defined as a figure of speech where the speaker intends to communicate the opposite of what is literally said (Sperber and Wilson, 1986). Situational irony, instead, refers to a contradictory or unexpected outcome of events (Lucariello, 2014). In Twitter we can find many examples both of verbal irony and of posts where users describe aspects of an ironic situation. Most of the proposed approaches to the automatic detection of irony in social media (Riloff et al., 2013; Buschmeier et al., 2014; Ptáček et al., 2014) take advantage of lexical factors such as n-grams and punctuation marks, among others. Information related to affect has also been exploited (Reyes et al., 2013; Barbieri et al., 2014; Hernández Farías et al., 2015). Other scholars proposed methods exploiting the context surrounding an ironic utterance (Wallace et al., 2015; Karoui et al., 2015). Recently, deep learning techniques have also been applied (Nozza et al., 2016; Poria et al., 2016).
This paper describes our participation in SemEval-2018 Task 3. The aim of this task is to identify ironic tweets. ValenTO exploited an extended version of emotIDM, an irony detection model based mainly on affective information. In particular, we experimented with a wide range of affect-related features for characterizing the presence of ironic content, covering different facets of affect, from sentiment to finer-grained emotions. Indeed, most theorists (Grice, 1975; Wilson and Sperber, 1992; Alba-Juez and Attardo, 2014) recognize the important role of affective information in the communication and comprehension of irony.

The emotIDM model
Irony is a very subjective language device that involves the expression of affective content such as emotions, attitudes, or evaluations towards a particular target. To take advantage of the emotionally laden characteristics of ironic expressions, we relied on emotIDM, an irony detection model that draws on several affective resources available for English (Nissim and Patti, 2016) and exploits various facets of affective information, from sentiment to finer-grained emotions, for characterizing the presence of irony in Twitter.
The robustness of emotIDM was previously assessed over different state-of-the-art Twitter corpora for irony detection (Reyes et al., 2013; Barbieri et al., 2014; Mohammad et al., 2015; Ptáček et al., 2014; Riloff et al., 2013). The results obtained outperform those of the previous works, confirming the significance of affective features for irony detection. An additional aspect of emotIDM worth mentioning is that it was designed to identify ironic content in a general sense, i.e. considering irony as a broad term covering different types of irony in tweets.
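To make the idea of lexicon-based affective features concrete, the following minimal sketch counts sentiment and emotion cues in a tweet. It is only an illustration: the four-word `TOY_LEXICON`, the function name `affective_features` and the feature names are hypothetical and do not reproduce the resources or the feature set actually used by emotIDM.

```python
import re

# Hypothetical toy lexicon, invented for illustration only; real affective
# resources are far larger and richer than this.
TOY_LEXICON = {
    "love": {"polarity": 1.0, "emotion": "joy"},
    "fun": {"polarity": 1.0, "emotion": "joy"},
    "hate": {"polarity": -1.0, "emotion": "anger"},
    "awful": {"polarity": -1.0, "emotion": "disgust"},
}


def affective_features(tweet, lexicon=TOY_LEXICON):
    """Map a tweet to simple sentiment and emotion counts from a lexicon."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    hits = [lexicon[token] for token in tokens if token in lexicon]
    features = {
        "positive_count": sum(1 for hit in hits if hit["polarity"] > 0),
        "negative_count": sum(1 for hit in hits if hit["polarity"] < 0),
        "polarity_sum": sum(hit["polarity"] for hit in hits),
    }
    # One counter per emotion category present in the lexicon.
    for emotion in sorted({entry["emotion"] for entry in lexicon.values()}):
        features["emotion_" + emotion] = sum(
            1 for hit in hits if hit["emotion"] == emotion
        )
    return features
```

Each facet of affect (polarity, emotion category) becomes one or more numeric columns that a classifier can consume alongside structural features.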

Task Description and Datasets
Task 3 on irony detection in English tweets (Van Hee et al., 2018) was organized in the framework of SemEval-2018. The main objective of this task is to identify the presence of irony in Twitter. It was divided into two subtasks: 1. Task A: Ironic vs. non-ironic: to determine whether a tweet is ironic or not. 2. Task B: Different types of irony: to determine which type of irony, if any, a tweet contains, distinguishing among verbal irony expressed by means of a polarity contrast (vI), situational irony (sI), other irony (oI) and non-ironic tweets (nI).
The organizers provided training and test datasets labeled according to the objectives of each subtask. The whole dataset was collected by exploiting a set of hashtags (#irony, #sarcasm and #not). A manual annotation process was then applied in order to minimize the noise in the data. For Task A, 1,911 ironic and 1,923 non-ironic tweets were provided. For Task B, the distribution was: 1,923 for nI, 1,393 for vI, 213 for oI and 328 for sI. Participants were allowed to submit systems trained under two settings: constrained (C), where only the training data provided for the task could be used, and unconstrained (U), where the use of additional data was permitted.

Our Proposal
We decided to participate in the shared task by using emotIDM. By analyzing the training data, we found an interesting characteristic: 857 out of 3,834 tweets contain a URL. Of these tweets, 265 belong to the ironic class, while 592 are labeled as non-ironic. Notice that Hernández-Farias et al. (2014) found a similar behavior regarding URL information in the dataset provided by the organizers of SentiPOLC-2014 (Basile et al., 2014). Furthermore, Barbieri et al. (2014) exploited a feature signaling the presence of a URL in a tweet; this feature was ranked among the most discriminative ones according to an information gain analysis. Since information about the presence of a URL in a tweet has proven useful for detecting irony in Twitter, we decided to enrich emotIDM by adding a binary feature reflecting the presence of a URL in a tweet. Below, we describe our participation in the task.
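A binary URL feature of this kind can be sketched in a few lines. The regular expression and the name `has_url` are assumptions made for illustration; the paper does not specify how URLs were detected. The sketch also works out the per-class share of URL-bearing tweets from the counts reported above (265 of 1,911 ironic and 592 of 1,923 non-ironic tweets).

```python
import re

# Assumed pattern: anything starting with http://, https:// or www.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)


def has_url(tweet):
    """Return 1 if the tweet contains at least one URL, otherwise 0."""
    return 1 if URL_PATTERN.search(tweet) else 0


# Share of tweets containing a URL in each class of the Task A training data.
ironic_share = 265 / 1911       # roughly 14% of ironic tweets
non_ironic_share = 592 / 1923   # roughly 31% of non-ironic tweets
```

URLs are thus about twice as frequent in the non-ironic class, which is consistent with the feature being informative for the binary task.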

Task A: Ironic vs. non-ironic
We addressed this task as a binary classification problem by taking advantage of two of the most widely applied classifiers in irony detection: Decision Tree (DT) and Support Vector Machine (SVM). We also included Random Forest (RF) as a classifier in our experiments. We carried out a set of experiments to assess the performance of the original version of emotIDM and of the one including information concerning URLs (emotIDM+URL). In addition, further experiments were performed to investigate the contribution of the different sets of features in emotIDM. Several classifiers were used in order to identify the most promising setting.
As mentioned before, exploiting external data was allowed in the unconstrained setting. We took advantage of a set of corpora previously used in the state of the art in irony detection, collected through different approaches: self-tagging, manual annotation or crowd-sourcing. We exploited the corpora developed by Reyes et al. (2013), Barbieri et al. (2014), Mohammad et al. (2015), Ptáček et al. (2014), Riloff et al. (2013), Ghosh et al. (2015), Karoui et al. (2017), and Sulis et al. (2016). We also took advantage of an in-house collection of tweets containing the hashtags #irony and #sarcasm.
Table 1 shows the results obtained during the development phase for Task A. We experimented with different sets of features and classifiers in a five-fold cross-validation setting. SVM emerges as the classifier with the best performance in both the C and U scenarios. We noticed that, when using SVM, adding the URL feature to emotIDM helps to improve the overall performance of our system. When we removed a set of features from emotIDM, a drop in performance was observed in most cases. The results of the experiments with external data are higher than those using only the training data.
The last row in Table 1 shows the results obtained when only affect-related features were used; even though there is a drop in performance with respect to the experiments that also use structural features, it seems that affective features on their own provide useful information for irony detection.
We participated in subtask A by submitting two runs (constrained and unconstrained) exploiting the experimental setting with the best performance: emotIDM+URL with an SVM as classifier.
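The five-fold cross-validation protocol can be sketched as follows. This is a minimal illustration in plain Python: the stratification by class, the fixed seed and the function names `stratified_folds` and `cross_validation_splits` are assumptions, since the paper does not state how the folds were built or which toolkit was used.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split example indices into k folds with similar class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin over the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def cross_validation_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one per held-out fold."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(
            index
            for fold_id, fold in enumerate(folds)
            if fold_id != held_out
            for index in fold
        )
        yield train, test
```

Each candidate configuration (a feature set paired with DT, SVM or RF) would be trained on the `train` indices and scored on the `test` indices of every split, and the scores averaged over the five folds.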

Task B: Different types of irony
Distinguishing between different kinds of ironic devices is still a controversial issue. In computational linguistics, only a few research works have attempted to address such a difficult task (Wang, 2013; Barbieri et al., 2014; Sulis et al., 2016; Van Hee et al., 2016). We are interested in assessing the performance of emotIDM when it deals with different types of irony, in order to test whether a wide variety of affective features can also help in the finer-grained classification task proposed here. This could give some insights into the role of affective content among ironic devices having different communication purposes.
emotIDM+URL was trained with the dataset provided for Task B (constrained setting) to test the effectiveness of affective features in such a finer-grained task. We exploited the same classifiers as in Task A, attempting to evaluate their performance when different classes of irony have to be distinguished. Overall, the best performance was achieved by SVM (see Table 2). However, when the performance on each single class was considered, the best results were those obtained with DT. For this reason, we decided to combine two classifiers with the following criterion: the sI and oI classes are assigned by the DT, while irony and non-irony are assigned by SVM or RF. Table 2 shows the results of the experiments carried out over the dataset for Task B, using five-fold cross-validation. From the results in Table 2, it can be noticed that the performance of our model improves when two classifiers are combined. The DT + SVM combination was selected as the system for participating in Task B.
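One possible reading of this combination criterion is sketched below: whenever the DT predicts sI or oI its label is kept, and otherwise the label of the second classifier (SVM or RF) is used. The function name `combine_predictions` and this tie-breaking behaviour are assumptions; the paper does not say what happens when the second classifier itself predicts sI or oI, and in this sketch such a label is simply passed through.

```python
# Classes whose label is taken from the Decision Tree.
DT_CLASSES = {"sI", "oI"}


def combine_predictions(dt_labels, other_labels):
    """Merge per-tweet labels from a DT and a second classifier (SVM or RF)."""
    if len(dt_labels) != len(other_labels):
        raise ValueError("both classifiers must label the same tweets")
    combined = []
    for dt_label, other_label in zip(dt_labels, other_labels):
        if dt_label in DT_CLASSES:
            # Trust the DT on situational and other irony.
            combined.append(dt_label)
        else:
            # Fall back to the second classifier for the remaining cases.
            combined.append(other_label)
    return combined
```

For example, with DT labels `["sI", "nI", "vI", "oI"]` and SVM labels `["vI", "nI", "nI", "vI"]`, the combined output is `["sI", "nI", "nI", "oI"]`.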

Results
The results of ValenTO's participation in the shared task are summarized in Table 3. In Task A, on the official CodaLab ranking, we ranked in the 16th position with the unconstrained version of our submission. When comparing our official result with the one obtained by the best-ranked system (0.7054), it can be noticed that the difference is lower than 0.1 in terms of F-score. This is an interesting result considering that our system relies mainly on features covering different facets of affect in ironic tweets, and it confirms the key role that this kind of affective information plays in detecting irony in Twitter. In addition, the organizers also provided separate rankings for constrained and unconstrained submissions. Our system ranked in the 17th position in the constrained setting, while in the unconstrained one we ranked 4th. Moreover, the performance of our system seems to be stable across the C and U settings. Concerning Task B, our system performed relatively well, considering that we did not apply further tuning to capture different ironic devices. We ranked in the 17th position out of 31 submissions in the official ranking at CodaLab.

Discussion and Error Analysis
The data provided for the task were retrieved by exploiting the hashtags #irony, #sarcasm and #not, which according to Sulis et al. (2016) seem to label different kinds of ironic phenomena. We analyzed the gold standard labels provided by the organizers (where the ironic hashtags were also included in the tweets) in order to see how well our model recognizes tweets labeled with distinct hashtags. Considering the results in Task A, we noticed that our system was able to identify all three kinds of tweets without any skew towards a particular hashtag.
To some extent, this confirms the robustness of emotIDM for recognizing irony in a broad sense.
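A per-hashtag breakdown of this kind can be computed as in the following sketch, which groups tweets by the ironic hashtag they contain and reports the accuracy within each group. The function name `accuracy_by_hashtag` and the use of accuracy as the measure are assumptions made for illustration; the paper does not state how the breakdown was obtained.

```python
import re
from collections import defaultdict

HASHTAGS = ("#irony", "#sarcasm", "#not")


def accuracy_by_hashtag(tweets, gold, predicted):
    """Return the accuracy of the predictions for each ironic hashtag."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for tweet, gold_label, predicted_label in zip(tweets, gold, predicted):
        # Extract hashtags so that trailing punctuation does not matter.
        tags = set(re.findall(r"#\w+", tweet.lower()))
        for tag in HASHTAGS:
            if tag in tags:
                total[tag] += 1
                correct[tag] += int(gold_label == predicted_label)
    # Hashtags that never occur are left out rather than reported as zero.
    return {tag: correct[tag] / total[tag] for tag in HASHTAGS if total[tag]}
```

Similar accuracy values across the three groups would indicate the absence of a skew towards a particular hashtag.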
Our system was able to correctly identify instances expressing an apparent positive emotional distress with an ironic intention, such as: "Sunday is such a fun day to study #ew #saywhat" and "Yay I just love this time of the month...!". Tweets labeled with #not deserve a special mention, since this hashtag is not always used to highlight a figurative meaning. Our system was able to correctly identify instances containing #not both when it was used with a figurative meaning, as in "Yay for Fire Alarms at 3AM #not", and when it was used as part of the text of a tweet: "#Myanmar #men #plead #not #guilty to #murder of #British #tourists. http://t.co/flrKr3H6Kl via @reuters".
As for the performance of emotIDM in Task B, Table 4 shows that our model performed better in identifying tweets where verbal irony was expressed by means of a polarity contrast. Moreover, it recognized "situational irony" better than "other irony". Since our model relies mainly on affective information, ironic instances lacking subjective content are hard to recognize, as in: "Being a hipster now is so mainstream. Oh, the irony. #hipster #irony". Moreover, we found some tweets where context information is crucial for capturing the ironic sense, as in "So there used to be a crossfit place here.... #irony #pizzawins http://t.co/9BDkxT9GFJ", or where the hashtag is the only signal of ironic intention.
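Class-wise comparisons such as the one above rest on per-class F-scores. The sketch below computes precision, recall and F1 for each class and their unweighted (macro) average; the function names are hypothetical and macro-averaging is an assumption, since the text only refers to the F-score.

```python
def per_class_f1(gold, predicted, classes):
    """Return the F1 score of each class from gold and predicted labels."""
    scores = {}
    for cls in classes:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == cls and p == cls)
        fp = sum(1 for g, p in pairs if g != cls and p == cls)
        fn = sum(1 for g, p in pairs if g == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores[cls] = 2 * precision * recall / (precision + recall)
        else:
            scores[cls] = 0.0
    return scores


def macro_f1(scores):
    """Average the per-class F1 scores, giving every class equal weight."""
    return sum(scores.values()) / len(scores)
```

Because every class weighs the same in the macro average, poor recognition of a small class such as oI lowers the overall score as much as poor recognition of a large one.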

Conclusions
In this paper, we described our participation in SemEval-2018 Task 3. We exploited an enhanced version of emotIDM. In our experiments, SVM emerged as the classifier with the best performance. The results obtained serve to validate the usefulness of affect-related features for distinguishing ironic tweets. As future work, it would be interesting to enrich emotIDM with features capturing other kinds of information, such as common knowledge and semantic incongruity.