SSN MLRG1 at SemEval-2018 Task 1: Emotion and Sentiment Intensity Detection Using Rule Based Feature Selection

The system developed by the SSN MLRG1 team for SemEval-2018 Task 1 on affect in tweets uses rule-based feature selection and one-hot encoding to generate the input feature vectors. A Multilayer Perceptron was used to build the models for the emotion intensity ordinal classification, sentiment analysis ordinal classification and emotion classification subtasks. A Support Vector Machine was used to build the models for the emotion intensity regression and sentiment intensity regression subtasks.


Introduction
Twitter is a huge microblogging service with more than 500 million tweets per day from different locations of the world and in different languages (Saif and Felipe, 2017). Tweets are often used to convey one's emotions, opinions towards products, and stance over issues (Nabil et al., 2016). Automatically detecting emotion intensities in tweets has several applications, including commerce (Jansen et al., 2009), crisis management (Verma et al., 2011), tracking brand and product perception, tracking support for issues and policies, and tracking public health and well-being (Chew and Eysenbach, 2010). The task is challenging because of the informal writing style, the semantic diversity of the content and the "unconventional" grammar. These challenges in building classification and regression models can be handled by using proper approaches to feature generation and machine learning.

Multi-Layer Perceptron
A Multilayer Perceptron (MLP) as shown in Figure 1 is a feed-forward neural network model that maps input data sets onto appropriate output sets. An MLP has many layers of nodes in a directed graph, with each layer connected to the next layer. A neuron is a processing element with an activation function (in the input layer the activation function is not applied). The output layer has as many neurons as the number of class labels in the problem. Each connection has a weight assigned to it. The output of each neuron is calculated by applying the activation function to the weighted sum of its inputs. Linear, sigmoid, tanh, ELU, softplus, softmax and ReLU are some of the commonly used activation functions. The supervised learning problem of the MLP can be solved with the backpropagation algorithm (Haykin, 1998). The algorithm consists of two steps. In the forward pass, the predicted outputs are calculated for the given inputs. In the backward pass, partial derivatives of the cost function with respect to the weight parameters are propagated back through the network. The chain rule of differentiation gives very similar computational rules for the backward pass as for the forward pass. The network weights can then be adapted using any gradient-based optimisation algorithm.
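As an illustration of the forward pass described above, the following minimal NumPy sketch (with toy layer sizes, not the system's actual configuration) computes each neuron's output by applying an activation function to the weighted sum of its inputs, with softmax at the output layer:

```python
import numpy as np

def relu(x):
    # ReLU activation: max(0, x), applied element-wise
    return np.maximum(0.0, x)

def softmax(x):
    # Softmax turns the output layer's scores into class probabilities
    e = np.exp(x - np.max(x))
    return e / e.sum()

def mlp_forward(x, weights, biases):
    # Each hidden neuron applies the activation function to the
    # weighted sum of its inputs; the input layer passes values
    # through unchanged, and the output layer applies softmax.
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(W @ a + b)                        # hidden layers
    return softmax(weights[-1] @ a + biases[-1])   # output layer

# Toy network: 4 inputs -> 3 hidden neurons -> 2 output classes
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((3, 4)), rng.standard_normal((2, 3))]
bs = [np.zeros(3), np.zeros(2)]
probs = mlp_forward(rng.standard_normal(4), Ws, bs)
```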
An MLP was used to implement the following subtasks:
1. EI-oc (an emotion intensity ordinal classification task): given a tweet and an emotion E, classify the tweet into one of four ordinal intensity classes of E that best represents the mental state of the tweeter.
2. V-oc (a sentiment analysis, ordinal classification task): given a tweet, classify it into one of seven ordinal classes, corresponding to various levels of positive and negative sentiment intensity, that best represents the mental state of the tweeter.
3. E-c (an emotion classification task): given a tweet, classify it as neutral or no emotion, or as one or more of eleven given emotions that best represent the mental state of the tweeter.

Support Vector Regression
Support Vector Machines (SVMs) are characterized by the use of kernels, the absence of local minima, the sparseness of the solution, and capacity control obtained by acting on the margin or on the number of support vectors. As in classification, support vector regression (SVR) is characterized by the use of kernels, a sparse solution, and Vapnik-Chervonenkis control of the margin and the number of support vectors. Although less popular than SVM classification, SVR has proven to be an effective tool in real-valued function estimation (Awad and Khanna, 2015). The idea of SVR is to compute a linear regression function in a high-dimensional feature space into which the input data are mapped via a non-linear function. It retains all the main features that characterize the maximum margin algorithm: a non-linear function is learned by a linear learning machine mapping into a high-dimensional kernel-induced feature space, and the capacity of the system is controlled by parameters that do not depend on the dimensionality of that space. Instead of minimizing the observed training error, SVR attempts to minimize a bound on the generalization error so as to achieve good generalization performance.
SVR was used to implement the following subtasks:
1. EI-reg (an emotion intensity regression task): given a tweet and an emotion E, determine the intensity of E that best represents the mental state of the tweeter, as a real-valued score between 0 (least E) and 1 (most E).
2. V-reg (a sentiment intensity regression task): given a tweet, determine the intensity of sentiment or valence (V) that best represents the mental state of the tweeter, as a real-valued score between 0 (most negative) and 1 (most positive).
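As a hedged sketch of how such a regressor can be set up (the features, kernel choice and hyper-parameters here are illustrative stand-ins, not the system's actual settings), scikit-learn's SVR maps the inputs via a kernel-induced feature space and fits a sparse regression function:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.random((40, 5))            # stand-in feature vectors
y = 0.7 * X[:, 0] + 0.2            # stand-in intensity scores

# RBF kernel: a non-linear map into a high-dimensional feature space,
# where a linear regression function is computed.
reg = SVR(kernel="rbf", C=1.0, epsilon=0.1)
reg.fit(X, y)
pred = reg.predict(X[:2])
```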

System Overview
The system consists of the following modules: data extraction, preprocessing, rule-based feature selection, feature vector generation and model building (a Multilayer Perceptron model for the classification subtasks and Support Vector Regression for the regression subtasks). The algorithm for preprocessing the data is outlined below:

Algorithm: Data extraction and preprocessing.
Input: Input dataset.
Output: Tokenized words and their parts of speech.
begin
1. Separate the labels and sentences.
2. Perform tokenization using word_tokenize, the tokenizing function in the NLTK toolkit.
3. Perform part-of-speech tagging using the pos_tag function from the NLTK toolkit.
4. Return the tokenized words and their parts of speech as inputs to rule-based feature selection.
end

The algorithm for rule-based feature selection and feature vector generation is outlined below:
Algorithm: Rule based feature selection and feature vector generation.
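The preprocessing steps above can be sketched in Python with the NLTK functions the algorithm names; the tab-separated line format assumed here is illustrative (the actual dataset layout may differ), and the punkt and POS-tagger models must be downloaded once via nltk.download before use:

```python
from nltk import word_tokenize, pos_tag

def preprocess(line):
    """Steps 1-4 of the preprocessing algorithm.

    Assumes an illustrative "<sentence>\t<label>" line format;
    the actual dataset layout may differ.
    """
    sentence, _, label = line.rpartition("\t")   # 1. separate label and sentence
    tokens = word_tokenize(sentence)             # 2. tokenization (NLTK)
    tagged = pos_tag(tokens)                     # 3. POS tagging (NLTK)
    return tagged, label                         # 4. inputs to feature selection
```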

Performance Evaluation
We evaluated the system only for the English language. The results obtained using MLP and SVR for the subtasks are tabulated in Table 2 to Table 6. From Table 2, which shows the Pearson scores obtained for SVR, we can infer that SVR predicts joy better than anger, fear and sadness. Similarly, from Table 3, which shows the Pearson scores obtained for MLP, we observe that the MLP model also predicts joy better than anger, fear and sadness. The Pearson scores for valence intensity regression and sentiment intensity ordinal classification are given in Table 4 and Table 5:

Table 4: Pearson (all instances): 0.582, Pearson (gold in 0.5-1): 0.424
Table 5: Pearson (all classes): 0.427, Pearson (some-emotion): 0.479

The Pearson score r is calculated using Equation 1:

r = Σᵢ(Yᵢ − Ȳ)(yᵢ − ȳ) / √(Σᵢ(Yᵢ − Ȳ)² × Σᵢ(yᵢ − ȳ)²)   (1)
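The Pearson correlation of Equation 1 can be computed directly; this is a minimal sketch with Y as the actual output and y as the predicted output:

```python
import numpy as np

def pearson(Y, y):
    # Pearson r: dot product of the centred vectors divided by the
    # product of their norms (Equation 1).
    Y, y = np.asarray(Y, float), np.asarray(y, float)
    Yc, yc = Y - Y.mean(), y - y.mean()
    return float((Yc @ yc) / np.sqrt((Yc @ Yc) * (yc @ yc)))
```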
where Y is the actual output and y is the predicted output. The accuracy, micro-averaged F score and macro-averaged F score for emotion classification are given in Table 6:

Table 6: Accuracy: 0.468, Micro-avg F1: 0.595, Macro-avg F1: 0.476

The metrics are defined in Equations 2 to 5.
Accuracy = (1/|T|) Σ_{t∈T} |G_t ∩ P_t| / |G_t ∪ P_t|   (2)

where G_t is the set of the gold labels for tweet t, P_t is the set of the predicted labels for tweet t, and T is the set of tweets.
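A minimal sketch of this multi-label accuracy: the Jaccard similarity between the gold and predicted label sets, averaged over all tweets:

```python
def jaccard_accuracy(gold, pred):
    # Accuracy = (1/|T|) * sum over tweets of |G_t ∩ P_t| / |G_t ∪ P_t|
    total = 0.0
    for G, P in zip(gold, pred):
        union = G | P
        total += len(G & P) / len(union) if union else 1.0
    return total / len(gold)

acc = jaccard_accuracy([{"joy"}, {"anger", "fear"}],
                       [{"joy"}, {"anger"}])  # (1.0 + 0.5) / 2 = 0.75
```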
Micro-average F = (2 × micro-P × micro-R) / (micro-P + micro-R)   (3)

where micro-P is the micro-averaged precision and micro-R is the micro-averaged recall, both computed from counts pooled over all emotions.

F_e = (2 × P_e × R_e) / (P_e + R_e)   (4)

Macro-average F = (1/|E|) Σ_{e∈E} F_e   (5)

where P_e is the precision and R_e is the recall for emotion e, and E is the given set of eleven emotions.
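A sketch of the micro- and macro-averaged F scores: micro-averaging pools true/false positive counts over all emotions before computing precision and recall once, while macro-averaging computes F per emotion and then averages over E:

```python
def f1(p, r):
    # Harmonic mean of precision and recall; 0 when both are 0.
    return 2 * p * r / (p + r) if (p + r) else 0.0

def micro_macro_f1(gold, pred, emotions):
    tp = {e: 0 for e in emotions}
    fp = {e: 0 for e in emotions}
    fn = {e: 0 for e in emotions}
    for G, P in zip(gold, pred):
        for e in emotions:
            tp[e] += (e in G) and (e in P)
            fp[e] += (e not in G) and (e in P)
            fn[e] += (e in G) and (e not in P)
    # Micro: pool counts over all emotions, then compute P/R once.
    TP, FP, FN = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_p = TP / (TP + FP) if TP + FP else 0.0
    micro_r = TP / (TP + FN) if TP + FN else 0.0
    # Macro: per-emotion F from P_e and R_e, averaged over E.
    macro = sum(
        f1(tp[e] / (tp[e] + fp[e]) if tp[e] + fp[e] else 0.0,
           tp[e] / (tp[e] + fn[e]) if tp[e] + fn[e] else 0.0)
        for e in emotions) / len(emotions)
    return f1(micro_p, micro_r), macro
```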

Conclusion
We have presented the results of using a Multilayer Perceptron for emotion intensity ordinal classification, sentiment analysis ordinal classification and emotion classification. We built a basic MLP, which has an input layer, two hidden layers with 128 and 64 neurons, and an output layer with as many neurons as the number of class labels. We used the Nadam optimizer with a learning rate of 0.01. We have also presented the results of using Support Vector Regression for emotion intensity and sentiment intensity regression. It is observed that both MLP and SVR predict joy more accurately than anger, fear and sadness. We analyzed the feature vectors generated for various emotions; the feature vectors generated for joy help to achieve better results than those for the other emotions. We used rule-based feature selection and one-hot encoding to generate the input feature vectors for building the models. The results obtained can be enhanced by using different feature selection approaches and incorporating sentiment lexicons.
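The MLP configuration stated above (two hidden layers of 128 and 64 neurons, learning rate 0.01) can be sketched with scikit-learn; note that scikit-learn provides Adam rather than the Nadam optimizer used by the system, so adam is substituted here, and the one-hot features and labels are randomly generated stand-ins:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 50)).astype(float)  # stand-in one-hot vectors
y = (X[:, 0] + X[:, 1] > 1).astype(int)               # stand-in class labels

# Two hidden layers (128 and 64 neurons) as in the paper; adam is
# substituted for Nadam, which scikit-learn does not provide.
clf = MLPClassifier(hidden_layer_sizes=(128, 64),
                    solver="adam", learning_rate_init=0.01,
                    max_iter=300, random_state=0)
clf.fit(X, y)
```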