A deep-learning framework to detect sarcasm targets

In this paper we propose a deep learning framework for sarcasm target detection in predefined sarcastic texts. Identification of sarcasm targets can help in many core natural language processing tasks such as aspect-based sentiment analysis and opinion mining. To begin with, we perform an empirical study of socio-linguistic features and identify those that are statistically significant in indicating sarcasm targets (p-values between 0.05 and 0.001). We then present a deep learning framework augmented with these socio-linguistic features to detect sarcasm targets in sarcastic book snippets and tweets. We achieve a large improvement in exact match and Dice score over the current state-of-the-art baseline.


Introduction
Computational sarcasm is a well-studied research area in computational linguistics (Joshi et al., 2017). Sentiment analysis and opinion mining of sarcastic texts are known to be difficult problems (Pang et al., 2008). For instance, in aspect-based sentiment analysis, which deals with the identification of sentiment expressed toward different aspects or dimensions of the entities present in the text, it is very important to identify the sarcasm targets and the sentiments toward them. Thus, if a user expresses a sarcastic utterance such as "My laptop has an awesome battery life that lasts for 15 minutes", the tool should recognize that the speaker is expressing a negative sentiment toward the battery life of the laptop, even though the utterance contains the positive sentiment word 'awesome'. Similarly, an opinion mining tool should identify the negative opinion of the user expressed toward the entity "battery". Sarcasm target identification can also benefit natural language generation; for example, once the entity toward which a negative sentiment is expressed in a sarcastic text has been detected, a natural language generation system has more context to generate a response. Similarly, a sentiment analysis tool can attach the sentiment in a sarcastic text to the correct aspect of a product or entity, which can help to build a more accurate product review. In this paper we present a novel method for sarcasm target identification that combines deep learning techniques with a set of socio-linguistic features.
There is a large body of literature that deals with sarcasm detection in text (Joshi et al., 2017), but only Joshi et al. (2018) have addressed the problem of sarcasm target identification. The sarcasm target is defined as the entity or situation that is being mocked or ridiculed in the sarcastic text. Formally, sarcasm target identification is defined as the task of building a system that takes a sarcastic text (book snippets, tweets etc.) as input, and either identifies a subset of words as sarcasm targets or outputs a fall-back label 'outside' if the target is not present in the text. For example, in the sarcastic text "I love to be ignored", the target is "I". Following the baseline, we make two assumptions in this work: (a) every sarcastic text has at least one sarcasm target, which holds by the definition of sarcasm, and (b) the notion of sarcasm target is applicable to sarcastic texts only.
Sarcasm target identification is a difficult task, primarily for the following reasons:
• Multiple candidate phrases: There can be multiple target candidate phrases present in the sarcastic text. For example, in the sarcastic text "The laptop heats up so much that I strongly recommend chefs to use it as a cooktop", the target candidates could be 'chefs', 'cooktop' and 'laptop'; however, only 'laptop' is ridiculed in this sentence.
• Multiple sarcasm targets: There can be multiple sarcasm target phrases present in the sentence. For example, the sarcastic text "I used to be a middle-of-the-road kid, but now with my freaky looks I'm definitely an outsider. Hooray." has two sarcasm targets, i.e., 'my freaky looks' and 'I'.
• Absence of any target: It is also possible that no sarcasm target is present at all in the sarcastic text. For example, in the sarcastic text "Oh, and I suppose the apples ate the cheese." the sarcasm target has to be labelled as 'outside'.
The main contributions and results of this paper can be summarized as follows:
• An empirical study of the socio-linguistic features that are highly significant in identifying sarcasm targets.
• A novel deep learning framework augmented with socio-linguistic features to detect sarcasm targets in sarcastic texts.
We achieve a large improvement over Joshi et al. (2018) in sarcasm target detection in terms of the evaluation metrics, exact match and Dice score. Our main motive in this paper is to establish that deep neural machinery can be effectively combined with socio-linguistic features to detect sarcasm targets; the exercise is a proof of concept showing that this combination is indeed useful. The code we developed for this work is made freely available (footnote 1).

Related works
Most of the papers in the area of computational sarcasm address the problem of sarcasm detection, i.e., classification of a text as sarcastic or non-sarcastic. Joshi et al. (2017) present a compilation of past works including the datasets, approaches, issues and trends in automatic sarcasm detection. They observe mainly three approaches to the sarcasm detection problem: semi-supervised extraction of sarcastic patterns (Ptáček et al., 2014; Bouazizi and Ohtsuki, 2015; Riloff et al., 2013; Joshi et al., 2015), use of hashtag-based supervision (Abercrombie and Hovy, 2016), and use of contextual information for sarcasm detection (Hazarika et al., 2018; Wallace et al., 2014; Rajadesingan et al., 2015). Recently, Tay et al. (2018) presented an attention-based neural model to explicitly model contrast and incongruity. Kolchinski and Potts (2018) presented two methods for representing authors in the context of textual sarcasm detection; they show that augmenting a bidirectional RNN with these representations improves performance in sarcasm detection. Ghosh and Muresan (2018) carried out a thorough analysis of sarcasm markers in social media platforms like Twitter and Reddit; they found that on Twitter emoticons or emojis are the most discriminative markers for recognizing sarcastic/ironic utterances, whereas on Reddit the morphological markers (e.g., interjections, tag questions) are the most discriminative. In the socio-linguistic literature, although many studies examine the propagation of hate speech (Ribeiro et al., 2018; Salminen et al., 2018) and abusive behaviour (Founta et al., 2018; Maity et al., 2018; Mathew et al., 2019a,b) in social media, an in-depth analysis of how sarcastic messages travel in social networks and how tweets around the targets behave remains an area that social scientists need to investigate. To the best of our knowledge, only Joshi et al. (2018) address the problem of sarcasm target identification. This task aims to identify the entity toward which sentiment is expressed in a sentence, which in turn has many applications. Our objective here is to leverage recent deep learning methods to improve the overall performance on this task.

Dataset
We consider the dataset released by Joshi et al. (2018) for our experiments. The dataset has two types of sarcastic text: book snippets and tweets. There are 224 book snippets and 506 tweets in the data. The sarcasm targets in these book snippets and tweets were manually annotated by three experienced linguists who each have at least five years of linguistic annotation experience on tasks such as sentiment analysis, word sense disambiguation and related work. The average length of the sarcasm target is 1.6 words for the book snippets and 2.08 words for the tweets. The average length of a whole snippet is 27.74 words, whereas for a tweet it is 12.97 words. For annotation, the annotators were given a set of sarcastic texts and asked to identify which words represent the target that the author is mocking. In case the annotators did not find specific words in the text that correspond to a target, they labelled it as 'outside'.

Socio-linguistic features
In this section, we present various socio-linguistic features that show statistically significant differences between the words corresponding to the sarcasm targets and the rest of the words in the sarcastic text. The results are shown in Table 1. Some of the observations are:
• The distributions of location (LOC) and organisation (ORG) named entities are significantly different for the sarcasm target words compared to the other words (p < 0.001).
• The distributions of some of the POS tags (nouns, verbs, adjectives and modifiers) are significantly different for the target words compared to the other words.
• We calculate the LIWC (footnote 2) and Empath (Fast et al., 2016) category fractional distributions across the target and the other words in the snippets and tweets. Certain categories, as noted in the table, are significantly different.
The LIWC and Empath dictionaries have many pre-defined categories (e.g., 'social', 'family' etc.). Analysis using these dictionaries has been done on different collections of tweets in many past studies (Fink et al., 2012; Schwartz et al., 2013; Maity et al., 2016), which forms our primary motivation for this study.
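The category fractional distributions mentioned above can be illustrated with a minimal sketch. It assumes a toy dictionary that maps a word to a set of categories; the name `toy_lexicon` and its entries are hypothetical stand-ins and are not taken from LIWC or Empath. The significance test applied to these fractions is not specified in the text, so it is not sketched here.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def category_fractions(words: Iterable[str],
                       lexicon: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of the given words that fall into each lexicon category.

    `lexicon` is a hypothetical word -> categories mapping used only for
    illustration; real LIWC/Empath resources have their own formats.
    """
    tokens = [w.lower() for w in words]
    if not tokens:
        return {}
    counts: Counter = Counter()
    for token in tokens:
        for category in lexicon.get(token, ()):
            counts[category] += 1
    return {category: n / len(tokens) for category, n in counts.items()}


# Hypothetical toy lexicon (assumption, not real LIWC/Empath content).
toy_lexicon = {"mother": {"family", "social"}, "friend": {"social"}}

target_words = ["mother", "laptop"]
other_words = ["friend", "love", "my", "friend"]

target_fractions = category_fractions(target_words, toy_lexicon)
other_fractions = category_fractions(other_words, toy_lexicon)
```

Comparing `target_fractions` with `other_fractions` category by category gives the kind of contrast summarized in Table 1.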

Methodology
The architecture of our proposed system is shown in Figure 1. The input to our system is a sarcastic text concatenated with a dummy word at the end of the sentence. We proceed with the hypothesis that each word is a potential candidate to be a sarcasm target. Thus, for each word in the sentence we create three components: (i) the left context, (ii) the right context, and (iii) a word representation for the word itself. Suppose the input sarcastic sentence is represented by a sequence of words $w_1, w_2, \ldots, w_N, w_{N+1}$, where $w_{N+1}$ is a dummy word. We add a start token at the beginning and an end token at the end of this sentence. These two tokens are never considered as center words, but act as the left context for the first word ($w_1$) and the right context for the last dummy word ($w_{N+1}$) respectively. Thus, for a word $w_K$, where $1 \le K \le N+1$, the left context is defined as [<start> $w_1 : w_{K-1}$] while the right context is defined as [$w_{K+1} : w_{N+1}$ <end>]. Each word in the left context, the right context and the center word is passed through an embedding layer initialized with pre-trained embeddings. We experiment with various pre-trained word embeddings such as GloVe, fastText, ELMo and BERT. The word representations are then passed to an LSTM layer, a bidirectional LSTM (Bi-LSTM) layer or a target-dependent LSTM (TD-LSTM) layer. In the case of the unidirectional (simple) LSTM layer, the hidden vectors in both the left context and the right context flow toward the center. Next, we concatenate the hidden vector of the rightmost LSTM cell in the left context, the hidden vector of the center word LSTM cell and the hidden vector of the leftmost LSTM cell in the right context, and pass them to a dense layer. In the case of the Bi-LSTM, we concatenate the forward and backward hidden vectors within each component before concatenating them across the components. The dense representation is then concatenated with the socio-linguistic features obtained for the word $w_K$ and passed to a linear layer with a sigmoid activation function, which classifies the center word as a sarcasm target or not.
Footnote 2: http://www.liwc.net/comparison.php
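The context construction and the final classification step described in this section can be sketched as follows. This is a minimal illustration in plain Python, not the released implementation: the token strings `<start>`, `<end>` and `<dummy>`, the names `build_candidates` and `score_center_word`, and the explicit weight list are all assumptions made for the example, and the embedding and LSTM layers are omitted.

```python
import math
from typing import List, NamedTuple, Sequence

# Placeholder token strings (assumptions; the text does not name them).
START, END, DUMMY = "<start>", "<end>", "<dummy>"


class Candidate(NamedTuple):
    left: List[str]    # [<start> w_1 : w_{K-1}]
    center: str        # w_K
    right: List[str]   # [w_{K+1} : w_{N+1} <end>]


def build_candidates(tokens: List[str]) -> List[Candidate]:
    """One (left context, center word, right context) triple per word,
    including the dummy word appended at the end of the sentence."""
    padded = [START] + tokens + [DUMMY] + [END]
    # Indices 1 .. N+1 are center words; <start> and <end> never are.
    return [
        Candidate(left=padded[:k], center=padded[k], right=padded[k + 1:])
        for k in range(1, len(padded) - 1)
    ]


def score_center_word(dense: Sequence[float], slf: Sequence[float],
                      weights: Sequence[float], bias: float) -> float:
    """Concatenate the dense representation with the socio-linguistic
    features, then apply a linear layer followed by a sigmoid."""
    features = list(dense) + list(slf)
    if len(features) != len(weights):
        raise ValueError("weights must match the concatenated feature length")
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


candidates = build_candidates("I love to be ignored".split())
```

For the five-word example sentence this yields six candidates: one per word plus one for the dummy word, whose right context is just the end token.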

Experiments and results

Evaluation metrics
We consider two evaluation metrics, (i) exact match accuracy and (ii) Dice score, both of which are also used by the baseline method (see Joshi et al. (2018) for definitions).
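As a rough illustration of the two metrics, the sketch below assumes that each prediction and each gold annotation is a set of target words, that exact match counts a text as correct only when the two sets are identical, and that the Dice score is the usual set overlap $2|P \cap G| / (|P| + |G|)$ averaged over texts. These are assumptions made for the example; the authoritative definitions are those in Joshi et al. (2018).

```python
from typing import List, Set


def exact_match_accuracy(pred: List[Set[str]], gold: List[Set[str]]) -> float:
    """Share of texts whose predicted target set equals the gold set."""
    if len(pred) != len(gold) or not gold:
        raise ValueError("need equally sized, non-empty prediction/gold lists")
    return sum(1 for p, g in zip(pred, gold) if p == g) / len(gold)


def dice(p: Set[str], g: Set[str]) -> float:
    """Dice overlap between one predicted and one gold target set."""
    if not p and not g:
        return 1.0  # two empty sets are treated as a perfect match (assumption)
    return 2 * len(p & g) / (len(p) + len(g))


def mean_dice(pred: List[Set[str]], gold: List[Set[str]]) -> float:
    """Dice score averaged over all texts."""
    if len(pred) != len(gold) or not gold:
        raise ValueError("need equally sized, non-empty prediction/gold lists")
    return sum(dice(p, g) for p, g in zip(pred, gold)) / len(gold)


# Toy predictions and gold labels, invented for illustration only.
pred = [{"I"}, {"laptop", "chefs"}, {"outside"}]
gold = [{"I"}, {"laptop"}, {"my", "freaky", "looks", "I"}]

em = exact_match_accuracy(pred, gold)
avg_dice = mean_dice(pred, gold)
```

The second toy text shows why both metrics are reported: a partially correct prediction earns no exact match credit but still contributes a Dice score of 2/3.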

Baselines
The baseline described in Joshi et al. (2018) consists of two extractors joined by an integrator. The two extractors are (i) rule-based and (ii) statistical. While the rule-based extractor extracts candidate words for the sarcasm target using nine syntactic rules, the statistical extractor takes lexical, POS-tag, polarity and pragmatic features, among others, and passes them to a classifier for candidate word selection. The selected candidate words are then given as input to the integrator module, a hybrid 'AND' or 'OR' module, which selects the final set of words as sarcasm targets.
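The integrator step can be pictured with a small sketch, under the assumption that 'AND' keeps only the words proposed by both extractors (set intersection) and 'OR' keeps the words proposed by either (set union). The function name `integrate` and the example candidate sets are hypothetical; the extractors themselves are not reproduced.

```python
from typing import Set


def integrate(rule_based: Set[str], statistical: Set[str],
              mode: str = "OR") -> Set[str]:
    """Combine the candidate words of the two extractors.

    Assumed semantics: 'AND' = intersection, 'OR' = union.
    """
    if mode == "AND":
        return rule_based & statistical
    if mode == "OR":
        return rule_based | statistical
    raise ValueError("mode must be 'AND' or 'OR'")


# Hypothetical extractor outputs for the laptop example.
rule_candidates = {"laptop", "chefs"}
stat_candidates = {"laptop"}

and_targets = integrate(rule_candidates, stat_candidates, mode="AND")
or_targets = integrate(rule_candidates, stat_candidates, mode="OR")
```

Under these assumed semantics, 'AND' favours precision (here it keeps only 'laptop') while 'OR' favours recall.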

Model setup and results
The setup: All results are reported on 3-fold cross-validation. Models were trained with the Adam optimizer with a learning rate of 1e−5 and a batch size of 64. The best results were obtained with ELMo embeddings as initialization.
Figure 1: Architecture of the proposed system. $w_K$ is the center word to be classified as target or not, [<start> $w_1 : w_{K-1}$] is the left context and [$w_{K+1} : w_{N+1}$ <end>] is the right context.
Results: We report the exact match and the Dice score obtained from different variants of our model and compare them with the baseline in Table 2. We note that all the variants of our model outperform the baseline approach by a large margin. Among the non-augmented models, the variant with the Bi-LSTM layer performs best on most of the metrics. The Dice score for the book snippets data is best when TD-LSTM is used. Augmenting the model with socio-linguistic features (Bi-LSTM layer + slf) improves the performance further, establishing a new state of the art in sarcasm target detection. In addition, for our model variants we also report the macro and micro F1-scores in Table 3. Once again, the Bi-LSTM variant performs best in the majority of cases. The macro-F1 and micro-F1 scores increase further for book snippets when the model is augmented with socio-linguistic features.

Conclusion
In this work, we have presented a deep learning model for sarcasm target identification. We outperform the only available baseline by a large margin. We identify various socio-linguistic features that differentiate the target text from the rest of the snippet/tweet. When these additional socio-linguistic features are fused into our deep learning framework, they seem to improve performance for both snippets and tweets, establishing a new state of the art for this problem.