SSN MLRG1 at SemEval-2018 Task 3: Irony Detection in English Tweets Using MultiLayer Perceptron

Sentiment analysis plays an important role in e-commerce. Identifying ironic and sarcastic content in text is vital for inferring the actual intention of the user and is necessary to increase the accuracy of sentiment analysis. This paper describes our work on identifying the irony level in Twitter texts. The system developed by the SSN MLRG1 team for SemEval-2018 Task 3 (irony detection) uses a rule-based approach for feature selection and a MultiLayer Perceptron (MLP) to build the model for the multiclass irony classification subtask, which classifies a given text into one of four class labels.


Introduction
Humans have a natural ability to identify the sentiment or the irony intended in a review or comment. However, identifying the intention of the user is a difficult task for a machine. Detecting irony in a text is critical to sentiment analysis because irony inverts the polarity of the inferred sentiment (Hernandez-Farias et al., 2015).
The choice of shops, books, movies, hotels and various other services and products is influenced to a large extent by comments and reviews in social media. A huge amount of data is available on the Internet about the choices people make and their reviews of them.
Irony in texts affects the polarity of the sentiment inferred from them. Since it gives the text a meaning that is the opposite of what is literally said, it is called a polarity reverser (Farias et al., 2016). Irony is studied in various disciplines such as linguistics, philosophy and psychology. Due to the frequent use of irony in social media, its detection has gained importance in natural language processing, where achieving high performance remains difficult (Liu, 2012; Wallace, 2015). The potential applications of irony detection include text mining, author profiling, detecting online harassment and sentiment analysis (Van Hee et al., June 2018). The SSN MLRG1 team has previously worked on the sentiment analysis tasks conducted in SemEval 2017 (Angel Deborah et al., 2017a,b).
We can identify three types of irony, namely verbal irony, situational irony and dramatic irony. Subtask B in Task 3 is a multiclass classification task that assigns a given tweet to one of these four classes: 1. verbal irony realized through a polarity contrast, 2. verbal irony without such a polarity contrast, 3. situational irony, and 4. non-irony.

Related Work
Unlike factual information, sentiment analysis and opinion mining have to deal with subjective information. Consequently, for any problem, it is important to analyze opinions collected from many people and summarize them. Social and political discussions are much harder to analyze due to complex topics and sentiment expressions, and instances of sarcasm and irony (Liu, 2012). Maynard and Greenwood (2014) discuss the need for analyzing sarcasm in social media. They developed a hashtag tokenizer for the GATE (General Architecture for Text Engineering) tool and detected the sentiments and sarcasm in hashtags. Ghosh and Veale (2016) found deep neural networks to perform better than Support Vector Machines (SVM) for sarcasm detection. Hernandez-Farias et al. (2015) used MLP for automatic irony detection with basic features from sentiment analysis and observed that MLP yields better results than Naive Bayes, decision trees, maximum entropy and SVM. Barbieri and Saggion (2014) used random forest and decision tree classifiers to analyze the irony and humour content in a Twitter dataset using the Weka tool. They used seven features for detecting imbalance, unexpectedness and common patterns.

System Overview
The system consists of the following modules: data extraction, preprocessing, rule-based feature selection, feature vector generation and a multilayer perceptron for classification.

Feature Engineering and Implementation
The dataset is cleaned and processed using functions from the NLTK toolkit. We identified the keywords for irony detection using rule-based feature selection. The selected features are collected into a Bag of Words (BoW) dictionary. For each sentence, a feature vector is generated by one-hot encoding, using the sentence keywords and the BoW dictionary. The feature vectors are given to the MLP, which predicts the output class label. The error is calculated and backpropagated to update the weight vectors. The Nadam (Nesterov-accelerated Adaptive Moment Estimation) algorithm is used for optimization.
The procedure for data preprocessing is outlined in Algorithm 2.
Algorithm 2: Data preprocessing
Input: Input dataset.
Output: Tokenized words and their parts of speech.
begin
1. Separate labels and sentences.
2. Perform tokenization using the word_tokenize function of the NLTK toolkit.
3. Perform parts-of-speech tagging using the pos_tag function of the NLTK toolkit.
4. Return the tokenized words and their parts of speech, which are given as inputs to rule-based feature selection.
end
The procedure for rule-based feature selection and feature vector generation is outlined in Algorithm 3.
Algorithm 3: Rule-based feature selection and feature vector generation
Input: Tokenized words and their parts of speech.
Output: BoW feature representation with labels.
For the test dataset, preprocessing is done and the feature vectors are generated from the training data BoW representation. The feature vectors are given as input to the learned model and the predicted output labels are stored.
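The generation of feature vectors from a BoW dictionary can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used by the system: the function names `build_vocabulary` and `vectorize` are hypothetical, the keyword lists are assumed to be already produced by the rule-based selection step, and each vector simply marks the presence (1) or absence (0) of every dictionary keyword. Keywords of a test tweet that do not occur in the training dictionary are ignored, since test vectors are generated from the training BoW representation.

```python
from typing import Dict, List


def build_vocabulary(train_keywords: List[List[str]]) -> Dict[str, int]:
    """Map each distinct training keyword to a column index (the BoW dictionary)."""
    vocab: Dict[str, int] = {}
    for keywords in train_keywords:
        for word in keywords:
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def vectorize(keywords: List[str], vocab: Dict[str, int]) -> List[int]:
    """Return a binary vector with 1 where a dictionary keyword occurs in the sentence."""
    vector = [0] * len(vocab)
    for word in keywords:
        index = vocab.get(word)
        if index is not None:  # keywords unseen in training are skipped
            vector[index] = 1
    return vector


# Toy keyword lists standing in for the output of rule-based feature selection.
train_keywords = [["love", "monday", "mornings"], ["great", "traffic", "love"]]
test_keywords = [["love", "traffic", "jam"]]

vocab = build_vocabulary(train_keywords)
train_vectors = [vectorize(k, vocab) for k in train_keywords]
test_vectors = [vectorize(k, vocab) for k in test_keywords]
```

Building the dictionary from the training data only keeps the vector length fixed, so the learned model receives test inputs of the same dimension as its training inputs.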

MultiLayer Perceptron
MLP is a feedforward artificial neural network for supervised learning. It can be used for both classification and regression tasks. It consists of an input layer, one or more hidden layers and an output layer. Each neuron in one layer is fully connected to the neurons in the next layer. The number of neurons in the output layer depends on the number of class labels in the given problem.
Each connection has a weight assigned to it. The output of each neuron is calculated by applying an activation function to the weighted sum of its inputs. Some of the common activation functions are linear, sigmoid, tanh, elu, softplus, softmax, relu, relu6, crelu, selu and relu x.
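The computation performed by a single neuron can be illustrated with a short sketch. The input values, weights and bias below are made up for illustration, and only three of the activation functions listed above are shown.

```python
import math
from typing import Callable, List


def relu(z: float) -> float:
    """Rectified linear unit: pass positive values, clamp negatives to zero."""
    return max(0.0, z)


def sigmoid(z: float) -> float:
    """Squash a real value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs: List[float]) -> List[float]:
    """Normalize a list of scores into probabilities that sum to 1."""
    shift = max(zs)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - shift) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def neuron_output(inputs: List[float], weights: List[float], bias: float,
                  activation: Callable[[float], float]) -> float:
    """Apply the activation to the weighted sum of the inputs plus the bias."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Illustrative values only; the weighted sum here is negative.
inputs = [1.0, 0.0, 1.0, 1.0]
weights = [0.5, -0.3, 0.25, -1.0]
bias = 0.1

out_relu = neuron_output(inputs, weights, bias, relu)
out_sigmoid = neuron_output(inputs, weights, bias, sigmoid)
probs = softmax([2.0, 1.0, 0.0, -1.0])
```

Softmax differs from the other two in that it acts on the scores of a whole layer at once, which is why it is typically applied at the output layer of a classifier.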
The error value is calculated from the value predicted by the output layer and the actual class label. This error value is backpropagated and the weights and biases are updated. This procedure is repeated for the feature vector of each input sentence. The whole procedure is repeated for n iterations or until the error value falls below a threshold. Figure 1 depicts a simple MLP model consisting of a single hidden layer. It takes four inputs and produces one output.
The dataset consists of 4792 English tweets collected between 01/12/2014 and 04/01/2015 from 2676 unique users. The entire corpus is split into training (80%) and test (20%) sets. The tweets are manually labeled using a fine-grained annotation scheme for irony (Van Hee et al., 2016). The training dataset is further divided into a training set and a development test set for system building.
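A two-stage 80/20 split of this kind can be sketched as below. The helper name `split_dataset`, the random seed, the shuffling before the cut and the rounding of the cut point are all assumptions made for illustration; how the actual split was drawn is not specified here. Integer indices stand in for the labeled tweets.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(items: Sequence, train_fraction: float = 0.8,
                  seed: int = 0) -> Tuple[List, List]:
    """Shuffle a copy of the items and cut it at the given fraction."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)  # fractional sizes are rounded down
    return shuffled[:cut], shuffled[cut:]


corpus = list(range(4792))  # tweet indices standing in for the labeled tweets
train, test = split_dataset(corpus)   # corpus -> training and test sets
train_fit, dev = split_dataset(train)  # training -> training and development test sets
```

Keeping the development test set separate from the data used to fit the weights allows model choices to be compared without touching the test set.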

Performance Evaluation
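The performance of the system is measured using accuracy, precision, recall and F1-score, using the formulas shown in Equations 1 to 4.
$$\text{Accuracy} = \frac{TP + TN}{N} \quad (1)$$
$$\text{Precision} = \frac{TP}{TP + FP} \quad (2)$$
$$\text{Recall} = \frac{TP}{TP + FN} \quad (3)$$
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (4)$$
where TP denotes True Positive, TN denotes True Negative, FP denotes False Positive, FN denotes False Negative and N denotes the total number of tweets.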
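For a multiclass problem, these measures can be computed per class by treating each class in turn as the positive class. The sketch below assumes the standard definitions of accuracy, precision, recall and F1-score in terms of TP, TN, FP and FN, and combines the per-class F1-scores by macro-averaging; the averaging scheme, the function names and the toy labels are assumptions made for illustration.

```python
from typing import Dict, List


def per_class_counts(gold: List[int], pred: List[int], label: int) -> Dict[str, int]:
    """Count TP, TN, FP and FN for one class, treating it as the positive class."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for g, p in zip(gold, pred):
        if p == label and g == label:
            counts["TP"] += 1
        elif p == label:
            counts["FP"] += 1
        elif g == label:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def safe_div(num: float, den: float) -> float:
    """Return 0.0 instead of raising when a class is never predicted or never present."""
    return num / den if den else 0.0


def class_metrics(gold: List[int], pred: List[int], label: int) -> Dict[str, float]:
    c = per_class_counts(gold, pred, label)
    precision = safe_div(c["TP"], c["TP"] + c["FP"])
    recall = safe_div(c["TP"], c["TP"] + c["FN"])
    return {
        "accuracy": safe_div(c["TP"] + c["TN"], len(gold)),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Toy gold and predicted labels over the four classes 0..3.
gold = [0, 1, 2, 3, 1, 0, 0, 2]
pred = [0, 1, 1, 3, 1, 0, 2, 2]

metrics = {label: class_metrics(gold, pred, label) for label in range(4)}
macro_f1 = sum(m["f1"] for m in metrics.values()) / len(metrics)
```

Macro-averaging weights every class equally, so a class with few tweets influences the final score as much as a frequent one.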
The model was optimized using different gradient descent algorithms, namely SGD, Adam, AdaGrad, RMSProp and Nadam. Adam and Nadam are widely used optimizers. Adam (Adaptive Moment Estimation) computes adaptive learning rates by combining momentum and RMSProp. Momentum points the model in the best direction, while RMSProp adapts how far the model moves in that direction on a per-parameter basis. Nadam combines Adam with Nesterov momentum, which is superior to classical momentum (Dozat, 2016).
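The difference between the two update rules can be sketched for a single scalar parameter. The Adam step follows the usual bias-corrected formulation. The Nadam step shown is a commonly used simplified form that omits the momentum schedule of Dozat (2016), so it should be read as an approximation of that algorithm. The hyperparameter values and the toy objective f(x) = x^2 are assumptions chosen for illustration.

```python
import math
from typing import Callable, Tuple

Step = Callable[[float, float, float, float, int], Tuple[float, float, float]]


def adam_step(theta: float, grad: float, m: float, v: float, t: int,
              lr: float = 0.1, b1: float = 0.9, b2: float = 0.999,
              eps: float = 1e-8) -> Tuple[float, float, float]:
    """One Adam update: momentum (m) scaled by an RMSProp-style term (v)."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)  # bias correction for the zero-initialized moments
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


def nadam_step(theta: float, grad: float, m: float, v: float, t: int,
               lr: float = 0.1, b1: float = 0.9, b2: float = 0.999,
               eps: float = 1e-8) -> Tuple[float, float, float]:
    """One simplified Nadam update: Adam with a Nesterov look-ahead on the momentum."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    # Blend the corrected momentum with the current gradient (Nesterov term).
    nesterov = b1 * m_hat + (1 - b1) * grad / (1 - b1 ** t)
    theta = theta - lr * nesterov / (math.sqrt(v_hat) + eps)
    return theta, m, v


def minimize(step: Step, x0: float, steps: int) -> float:
    """Minimize f(x) = x^2, whose gradient is 2x, with the given update rule."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        x, m, v = step(x, 2.0 * x, m, v, t)
    return x


x_adam = minimize(adam_step, 5.0, 200)
x_nadam = minimize(nadam_step, 5.0, 200)
```

In this form the only change from Adam is the `nesterov` term, which gives the current gradient more weight in each step than plain momentum does.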
We split the training set into a training set (80%) and a development test set (20%). The model was trained with each of the optimization algorithms, and the Nadam optimizer produced better results than the other algorithms on the development test set.
There were 32 submissions for this task. The values achieved by the model for the various measures are listed in Table 2. The results suggest that the basic text features selected by the rule-based approach are not sufficient to detect the irony level in a given text. Additional features such as emoticons and hashtags can be added to the feature set to improve performance.

Conclusion
We built a basic MLP to detect the irony level in Twitter text. It has an input layer, two hidden layers with 128 and 64 neurons, and an output layer with 4 neurons for the four class labels. The ReLU activation function was used in both hidden layers and the softmax activation function in the output layer. Several optimizers, namely SGD, RMSProp, Adam, AdaGrad and Nadam, were tried. The Nadam optimizer performed better than the others.
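The shape of this network can be illustrated with a plain-Python forward pass. The sketch below is not the trained model: the input dimension `VOCAB_SIZE` is a hypothetical value (the size of the BoW dictionary is not stated), the weights are random and untrained, and the helper names are made up for illustration. It shows only how the 128, 64 and 4 neuron layers with ReLU and softmax fit together and how many parameters such a network has.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight rows, biases)


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Create one fully connected layer with small random weights and zero biases."""
    weights = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(inputs: List[float], layer: Layer) -> List[float]:
    """Weighted sum of the inputs plus bias, for every neuron in the layer."""
    weights, biases = layer
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values: List[float]) -> List[float]:
    return [max(0.0, v) for v in values]


def softmax(values: List[float]) -> List[float]:
    shift = max(values)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x: List[float], layers: List[Layer]) -> List[float]:
    """ReLU on every hidden layer, softmax on the output layer."""
    hidden = x
    for layer in layers[:-1]:
        hidden = relu(dense(hidden, layer))
    return softmax(dense(hidden, layers[-1]))


VOCAB_SIZE = 1000  # hypothetical size of the BoW dictionary
layer_sizes = [VOCAB_SIZE, 128, 64, 4]
rng = random.Random(0)
layers = [init_layer(a, b, rng) for a, b in zip(layer_sizes, layer_sizes[1:])]
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)

x = [0.0] * VOCAB_SIZE
for index in (3, 42, 77):  # positions of the keywords present in one tweet
    x[index] = 1.0
probs = forward(x, layers)
predicted_class = probs.index(max(probs))
```

With an input of this assumed size, almost all parameters sit in the first layer, so the size of the BoW dictionary largely determines the size of the model.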
Only text features were taken into consideration for the BoW representation. Since irony gives the text a meaning opposite to its literal one, it is difficult to detect irony from text features alone. The system performance can be enhanced with emoticon and hashtag information. The performance can also be improved by normalizing the tweets before feature selection. The accuracy of the system may be increased by using deep neural networks such as a Convolutional Neural Network (CNN) or a Recurrent Neural Network (RNN). Feature selection can be enhanced with semantic and lexicon information.