KDE-AFFECT at SemEval-2018 Task 1: Estimation of Affects in Tweet by Using Convolutional Neural Network for n-gram

This paper describes our approach to SemEval-2018 Task 1: Affect in Tweets, subtasks 1a and 2a. Our team KDE-AFFECT employs a one-dimensional Convolutional Neural Network over word n-grams, together with word embeddings and preprocessing steps such as vocabulary unification and the conversion of Emojis into four emotional words.


Introduction
With the rapid spread of SNS services (e.g., Twitter, Facebook, Instagram), massive amounts of user opinions have accumulated on the Internet. Many of these posts naturally express affects such as joy, anger, sadness, and fear. Hence, the need to detect such affects accurately is increasing year by year.
In SemEval-2018 Task 1: Affect in Tweets, we extend our scope from the positive, neutral, and negative polarity estimation of the former SemEval sentiment-analysis-in-Twitter tasks, held until 2017, to multiple emotions (joy, anger, sadness, and fear), formulated as regression (subtask 1a) and ordinal classification (subtask 2a). In doing so, we adopt a standard one-dimensional Convolutional Neural Network (CNN), which is believed to be effective for text polarity estimation; the kernel window size of the 1D convolution is analogous to the concept of a word n-gram. In addition, tweets frequently contain Emojis that express emotions, which we exploit in preprocessing. In the following, we first briefly survey related work on tweet sentiment analysis, including emotion estimation. We then describe our system, report the official results returned by the organizers, and conclude the paper.

Related Work
Sentiment analysis of tweets has been studied by many researchers, both for classifying a tweet into positive or negative polarity and for classifying it into multiple emotions (Giachanou and Crestani, 2016; Silva et al., 2016). A supervised approach to polarity classification of tweets was proposed by Go et al. (2009), who employed Naive Bayes, Maximum Entropy, Support Vector Machines, and several other machine learning methods. Bravo-Marquez et al. (2013) presented an approach using multiple emotion dictionaries, while Saif et al. (2016) employed co-occurrence information of words. Severyn et al. (2015) introduced a deep learning approach, and Lu et al. (2013) proposed a deep learning method suited for short texts. Since 2014, sentiment analysis tasks using Twitter have been conducted officially at SemEval, where a variety of methods have been tested (Hagen et al., 2015; Giorgis et al., 2016; Deriu et al., 2016; Rouvier and Favre, 2016; Xu et al., 2016). In SemEval-2017 (Rosenthal et al., 2017), Cliche et al. (2017) and Hamdan et al. (2017) presented methods combining multiple Convolutional Neural Networks (CNNs) and multiple Long Short-Term Memories (LSTMs). An open dictionary of emotion scores for individual words has also been published, and Mohammad and Bravo-Marquez (2017) released a dataset for estimating emotion intensities.

Methodology
In this section, we describe the methods and ideas employed for this task. The fundamental idea of our method is based on the observation that n-grams seem to play a vital role in representing the emotion of a tweet, where an "n-gram" denotes n consecutive words (instead of n consecutive characters). We adopt the n-gram convolution proposed by Kim (2014): each sentence is represented as a matrix, and the convolution filter is applied in units of an n-gram.
The overview of our system is as follows. First, we apply preprocessing with "vocabulary unification," including lower-case conversion, URL unification, squeezing of three or more consecutive identical characters, and hash-sign elimination. Second, we apply Emoji conversion into four emotional words, which will be elaborated later. After the Emoji conversion, we train a model independently for each emotion. Finally, we predict the emotion score of an unknown tweet using the trained model. The overall system flow is shown in Figure 1.

Preprocessing with Vocabulary Unification
This step is applied to all emotions. It consists of the following processing:
• lower case conversion
• conversion of every instance of a URL string in a tweet to "<URL>"
• collapse of three or more consecutive identical letters into two
• elimination of hash signs (#)
It should be noted that by a URL string we mean any string matching a regular expression that starts with "http", "https", "ftp", or "www"; every such string is converted to <URL>. For example, "I want to be happy on http://t.co/S6moxr1U" is converted to "I want to be happy on <URL>".
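A minimal sketch of the vocabulary-unification steps above, assuming simple regular expressions (the exact patterns used by our system are not reproduced here):

```python
import re

def unify_vocabulary(tweet: str) -> str:
    """Sketch of the vocabulary-unification preprocessing."""
    t = tweet.lower()                                          # lower case conversion
    t = re.sub(r"(?:https?|ftp)://\S+|www\.\S+", "<URL>", t)   # URL unification
    t = re.sub(r"(.)\1{2,}", r"\1\1", t)                       # squeeze 3+ repeats to two
    t = t.replace("#", "")                                     # drop hash signs
    return t

print(unify_vocabulary("I want to be #HAPPYYYY on http://t.co/S6moxr1U"))
# -> "i want to be happyy on <URL>"
```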

Preprocessing for Emoji
From our observation of real tweets, more than 20% of them contain some kind of Emoji, and emotions are naturally represented by many different Emojis. Hence, we convert each Emoji that may represent an emotion into the corresponding emotional word. Please note that the Emoji preprocessing is applied to all of the data, regardless of emotion: an anger Emoji, for instance, may appear not only in the Anger dataset but also in the Fear, Joy, and Sadness datasets. This is why we apply the Emoji conversion uniformly across emotions. In the following, we present the Emojis for each emotion, taken from a Full Emoji Web site 1 . The selection of Emojis was made using the labels (such as "face-positive") annotated on that site.

Anger Emoji
The Anger Emojis we selected are shown in Figure 2. All of them are replaced by "anger".

Fear Emoji
The Fear Emojis we selected are shown in Fig. 3. All of them are replaced by "fear".

Joy Emoji
The Joy Emojis we selected are shown in Fig. 4. All of them are replaced by "joy".

Sadness Emoji
The Sadness Emojis we selected are shown in Fig. 5. All of them are replaced by "sadness".
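The conversion described in the four subsections above amounts to a simple table lookup. In the sketch below, the Emoji-to-word table is a hypothetical four-entry stand-in; the actual Emoji sets are those shown in Figures 2-5:

```python
# Hypothetical stand-in for the full Emoji sets of Figures 2-5:
# one representative Emoji per emotion.
EMOJI_TO_WORD = {
    "\U0001F620": "anger",    # angry face
    "\U0001F631": "fear",     # face screaming in fear
    "\U0001F602": "joy",      # face with tears of joy
    "\U0001F622": "sadness",  # crying face
}

def convert_emoji(tweet: str) -> str:
    """Replace every known Emoji with its emotional word."""
    for emoji, word in EMOJI_TO_WORD.items():
        tweet = tweet.replace(emoji, word)
    return tweet

print(convert_emoji("late again \U0001F620"))  # -> "late again anger"
```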

Convolutional Neural Network for n-gram
Once preprocessing is done, we obtain a rectified tweet represented as a matrix. Figure 6 illustrates the word-by-word matrix representation of a rectified tweet. We use an 80 × 300 matrix, where 80 is the maximum number of words per tweet and 300 is our embedding vector size. If a tweet has fewer than 80 words, zero padding fills the rest of the input matrix.

Embedding
In Embedding, each tweet is converted to a matrix. Specifically, we first split a tweet into words on whitespace, treating the special characters ".", ",", "!", and "?" as separate words. Second, we transform each word into a 300-dimensional distributed representation using Word2Vec (Mikolov et al., 2013a,b). Word2Vec itself is trained on approximately 470 million tweets after the preprocessing described above.
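A sketch of this embedding step, with a toy lookup table standing in for the trained Word2Vec model:

```python
import numpy as np

MAX_WORDS, DIM = 80, 300  # maximum words per tweet, embedding size

def tweet_to_matrix(tweet, word_vectors):
    """Tokenize on whitespace (with '.', ',', '!', '?' as separate words),
    look up each word's 300-d vector, and zero-pad to an 80 x 300 matrix."""
    for ch in ".,!?":
        tweet = tweet.replace(ch, f" {ch} ")
    words = tweet.split()
    mat = np.zeros((MAX_WORDS, DIM))
    for i, w in enumerate(words[:MAX_WORDS]):
        if w in word_vectors:  # unknown words keep their zero row
            mat[i] = word_vectors[w]
    return mat

# toy vectors standing in for the model trained on ~470M tweets
vecs = {w: np.random.randn(DIM) for w in ["so", "happy", "!"]}
m = tweet_to_matrix("so happy!", vecs)
print(m.shape)  # (80, 300)
```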

n-gram Convolution Layer
In an n-gram convolutional layer, we perform convolution and generate a vector of length m − n + 1, where m denotes the maximum number of words (here 80). This is straightforward, since both ends are trimmed during the n-gram convolution shown in Figure 6. For instance, for 3-grams, the length in our implementation is 80 − 3 + 1 = 78. Note that we have multiple n-gram convolutional layers for each emotion; the "Joy" network architecture, for example, has 2-gram, 3-gram, 4-gram, and 5-gram convolutional layers, which will be discussed later with Table 3.
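The length m − n + 1 follows from a valid (no-padding) 1D convolution, as this sketch with a single random filter shows:

```python
import numpy as np

m, n, dim = 80, 3, 300                 # max words per tweet, n-gram size, embedding dim
x = np.random.randn(m, dim)            # one embedded tweet
w = np.random.randn(n, dim)            # one convolution filter spanning an n-gram

# valid (no-padding) 1D convolution: one output value per n-gram window
out = np.array([np.sum(x[i:i + n] * w) for i in range(m - n + 1)])
print(len(out))  # 80 - 3 + 1 = 78
```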

Max Pooling Layer
In a Max Pooling layer, the maximum value over positions is computed for each filter of each n-gram convolutional layer, generating a vector whose length equals the number of filters. In our system, the outputs of the four n-gram convolutional layers are then flattened and concatenated in the subsequent layer, so the output dimension is the number of filters multiplied by the number of n-gram convolutional layers.
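The pooling and concatenation can be sketched as follows, assuming 16 filters per branch (the actual filter count is a hyper-parameter not restated here) and the 2- to 5-gram branches of the "Joy" architecture:

```python
import numpy as np

num_filters = 16                             # assumed filter count per branch
branches = []
for n in (2, 3, 4, 5):                       # one branch per n-gram size
    # conv output of one branch: (80 - n + 1) positions x num_filters
    conv_out = np.random.randn(80 - n + 1, num_filters)
    branches.append(conv_out.max(axis=0))    # global max over positions, per filter

pooled = np.concatenate(branches)            # flatten and concatenate
print(pooled.shape)  # (64,) = num_filters * number of n-gram branches
```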

Fully-connected Layers
In our system, we have two fully-connected layers. The first hidden fully-connected layer accepts the input from the concatenation layer connected to the multiple max pooling layers; empirically, we set 30 outputs for this layer. The second layer outputs either the estimated intensity value of an emotion (Task 1a) or the estimated intensity level (Task 2a). The way to estimate the intensity level (Task 2a) is elaborated in the next section.
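A numpy forward-pass sketch of these two layers, with random weights in place of trained ones; the ReLU activation on the hidden layer and the input width of 64 are assumptions, while the 30 hidden outputs come from the text:

```python
import numpy as np

def dense(x, w, b, relu=True):
    """One fully-connected layer with an optional ReLU."""
    y = x @ w + b
    return np.maximum(y, 0.0) if relu else y

rng = np.random.default_rng(0)
pooled = rng.standard_normal(64)   # concatenated max-pooling output (assumed width)

w1, b1 = rng.standard_normal((64, 30)), np.zeros(30)  # first layer: 30 outputs
w2, b2 = rng.standard_normal((30, 1)), np.zeros(1)    # second layer: 1 output

hidden = dense(pooled, w1, b1)                # ReLU here is an assumption
score = dense(hidden, w2, b2, relu=False)     # estimated intensity value (Task 1a)
print(score.shape)  # (1,)
```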

Estimating Intensity Level (Task 2a)
For Task 2a, we need to estimate the intensity level. Specifically, participants are required to classify the emotional intensities into four levels: high amount, moderate amount, low amount, and no emotion. Our strategy for estimating the intensity level is simple: it is based on the predicted intensity value and the ranges shown in Table 1. In Table 1, the left column denotes the range of emotional amount that we defined for this task. For example, [0.0, 0.35) means that the left boundary 0.0 is inclusive while the right boundary 0.35 is exclusive.
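This thresholding can be sketched as a simple binning function. Only the first boundary (0.35) appears in the text; the remaining boundaries below are hypothetical placeholders for the ranges of Table 1:

```python
def intensity_level(score: float) -> int:
    """Map a predicted intensity score to one of four levels (Task 2a).
    Only the 0.35 boundary is given in the text; 0.6 and 0.8 are
    hypothetical placeholders for the other ranges in Table 1."""
    if score < 0.35:
        return 0  # no emotion can be inferred
    elif score < 0.6:
        return 1  # low amount
    elif score < 0.8:
        return 2  # moderate amount
    return 3      # high amount

print(intensity_level(0.2), intensity_level(0.7))  # 0 2
```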

Experiments
Here we describe the experimental environment and our evaluation results.

Dataset
All participants are given the SemEval-2018 Task 1: Affect in Tweets (AIT) dataset (Mohammad et al., 2018). The details are shown in Table 2.

Evaluation Measure
Here, the evaluation measure for a model is the Pearson correlation coefficient r. Let x be a predicted emotion value and y the corresponding true emotion value, with sample standard deviations S_x and S_y and sample covariance S_xy. The correlation coefficient is then r = S_xy / (S_x S_y).
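For concreteness, a small pure-Python implementation of this standard formula (not the organizers' official evaluation script):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between predicted (x) and true (y) emotion values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n   # covariance
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)          # std. deviation of x
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)          # std. deviation of y
    return sxy / (sx * sy)

print(round(pearson_r([0.1, 0.4, 0.8], [0.2, 0.5, 0.7]), 3))  # 0.981
```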

Experiment Environment
Our deep learning program for the task is implemented in Keras with a TensorFlow backend. On our Ubuntu server, training took approximately 1 second per epoch.

Preliminary Experiments for n-gram Convolutions
For each emotion, our system attempts to find an empirically optimal combination of n-gram convolutions; the combinations selected are summarized in Table 3.

Experimental Result (Task 1a)
If we use similar notations as in Table 3 and pick up the top-3 ranked teams, as well as the randomly chosen teams CrystalFeel (14th place), ELiRF-UPV (15th place), iit delhi (29th), and DeepMiner (31st), together with the baseline (37th), the result looks like Table 4.

Experimental Result (Task 2a)
According to the Official Leaderboard for Task 2a, our team KDE-AFFECT placed 17th. If we use similar notations as in Table 3 and pick up the top-3 ranked teams, as well as the randomly chosen teams UNCC (9th place), ECNU (16th place), and CrystalFeel (14th place), together with the baseline (26th), the result looks like Table 5.

Conclusion
This paper has described the approach we took for SemEval-2018 Task 1: Affect in Tweets (subtasks 1a and 2a): a combination of different n-gram convolutions with preprocessing that includes vocabulary unification and Emoji conversion into emotional words.