deepCybErNet at EmoInt-2017: Deep Emotion Intensities in Tweets

This working note presents the methodology used in the deepCybErNet submission to the WASSA-2017 shared task on Emotion Intensities in Tweets (EmoInt). The goal of the task is to predict a real-valued score in the range [0-1] for a tweet with a given emotion type. To do this, we used Bag-of-Words and embedding-based representations built on recurrent network architectures. We developed two systems and conducted experiments on the WASSA-2017 Emotion Intensity shared task dataset. The system that uses word embeddings with a recurrent network architecture achieved the highest 5-fold cross-validation accuracy. It uses the embedding with a recurrent network to extract optimal features at the tweet level and logistic regression for prediction. These methods are largely language independent, and the experimental results show that they are apt for predicting a real-valued score in the range [0-1] for a given tweet with its emotion type.


Introduction
The internet has become an essential platform for carrying out daily activities. People use social media resources like Twitter, Facebook, WhatsApp, Hike, WeChat, etc. to share language such as views or emotions, stance over issues, and reviews related to products, services, blogs, etc. In recent days, the amount of language shared through the internet has become ubiquitous. This necessitates analyzing reviews to identify emotions, including estimating the degree to which an emotion is expressed in text. Unlike formal natural language, user reviews are short, and rich information is conveyed through nonstandard language such as emoticons, emojis, creatively spelled words (happee), and hash-tagged words (#happy). These factors can strongly influence social and economic behavior worldwide through real-world applications such as marketing, e-Governance, business intelligence, social analysis, and applications in Natural Language Processing (NLP) such as information extraction, question answering, and textual entailment. Many methods have been introduced by researchers for emotion annotation. These give binary labels for a given text (Alm et al., 2005; Aman and Szpakowicz, 2007; Brooks et al., 2013; Neviarouskaya et al., 2009; Bollen et al., 2011; Summa et al., 2016). Only one annotation work exists that provides a real-valued score as annotation for a given text (Strapparava and Mihalcea, 2007); this was included as a task in the SemEval-2007 shared task. Many methods have been devised for automatic emotion classification (Werbos, 1990; Summa et al., 2016; Mohammad, 2012; Bollen et al., 2011; Aman and Szpakowicz, 2007; Brooks et al., 2013). However, little work exists on emotion regression other than the SemEval-2007 shared task (Strapparava and Mihalcea, 2007).
In this paper, we use Bag-of-Words (BoW) and a BoW-based recurrent embedding system for predicting a real-valued score in the range [0-1]. In the first case, BoW is used to obtain the feature representation for the tweets, and classification is done using logistic regression. We also employed RNN- and LSTM-based methods for mining features at the tweet level. These methods are language independent, so irrespective of the language, we can use these approaches for finding the stance of microblogging posts.
The rest of the paper is organized as follows. Section 2 discusses the shared task. Section 3 discusses the proposed methodology.


Shared Task Description
The shared task on Emotion Intensities in Tweets (EmoInt) was organized at WASSA-2017 (Mohammad and Bravo-Marquez, 2017). The aim of the task is to obtain a real-valued score in the range [0-1] for a given tweet with an emotion type. The tweets in training, validation, and testing belong to four categories: anger, fear, joy, and sadness. Each tweet has an emotion type with a score in the range [0-1], where 0 denotes that the tweet is maximally distant from its emotion and 1 denotes that the tweet is maximally close to its emotion. The detailed statistics of the dataset are described in Table 1.

Methodology
This section provides information on the proposed approach for predicting a real-valued score in the range [0-1] for a given tweet with an emotion type. We used two approaches: (1) Bag-of-Words (BoW) based word embedding and (2) Recurrent Neural Network (RNN) based word embedding.

Bag-of-words based system for Emotion Intensities in Tweets
The embedding size was set to 256, so that each word is represented using a 256-dimensional vector, and the tweet length was set to 70 words. Anger, fear, joy, and sadness have 857, 1147, 823, and 786 training instances, respectively.
We constructed matrices of shape 857×70, 1147×70, 823×70, and 786×70 for training instances and 84×70, 110×70, 79×70, and 74×70 for development instances. Next, we replace each word with its corresponding word embedding, which forms input tensors of shape 857×70×256, 1147×70×256, 823×70×256, and 786×70×256 for training instances and 84×70×256, 110×70×256, 79×70×256, and 74×70×256 for development instances.
Finally, each input tensor is transformed to a matrix of shape 857×256, 1147×256, 823×256, or 786×256 for training instances and 84×256, 110×256, 79×256, or 74×256 for development instances using a max-pooling approach. These matrices are passed to logistic regression, and a real-valued score is chosen for a given tweet with an emotion type using the argmax function.
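The pipeline above can be sketched as follows. This is a minimal illustration, not the actual trained system: the vocabulary size, the random embedding table, and the helper name `tweet_features` are assumptions made for the sketch; only the tweet length (70), embedding size (256), and instance counts come from the text.

```python
import numpy as np

# Illustrative vocabulary size; tweet length 70 and embedding dim 256 follow the text.
V, MAX_LEN, EMB = 5000, 70, 256
rng = np.random.default_rng(0)
embedding = rng.normal(size=(V, EMB))  # stand-in word-embedding lookup table

def tweet_features(token_ids):
    """Pad/truncate a tweet to MAX_LEN tokens, embed, then max-pool over words."""
    ids = (list(token_ids) + [0] * MAX_LEN)[:MAX_LEN]  # id 0 used as padding
    emb = embedding[ids]          # shape (70, 256): one row per word
    return emb.max(axis=0)        # shape (256,): max-pooled tweet vector

# e.g. the 857 anger training tweets become an 857 x 256 feature matrix,
# which would then be passed to logistic regression
X_anger = np.stack([tweet_features([1, 2, 3]) for _ in range(857)])
```

Max-pooling keeps, per embedding dimension, the strongest activation across all words in the tweet, which collapses the variable-length tweet into a fixed-length feature vector.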

Recurrent neural network (RNN) based system for Emotion Intensities in Tweets
Recurrent neural networks (RNNs) are a widely used deep learning architecture for sequence data modeling. They have achieved significant results in various tasks in the field of natural language processing (LeCun et al., 2015). An RNN generally looks the same as a feed-forward network (FFN) but additionally contains self-recurrent connections in its units (Elman, 1990). This cyclic loop carries information from one time step to the next. Consequently, RNNs are able to learn temporal patterns by considering past information when estimating the present state. Generally, an RNN takes inputs x_t ∈ R^n and hi_{t-1} ∈ R^m of arbitrary sequence length and computes the succeeding hidden state vector hi_t by applying the following formula recursively:

hi_t = f(w_xh x_t + w_hh hi_{t-1} + b)

where f is a nonlinear activation function, particularly the logistic sigmoid function (σ) applied element-wise, hi_0 is usually initialized to 0 at time step t_0, and w_xh ∈ R^{m×n}, w_hh ∈ R^{m×m}, and b ∈ R^m are the parameters of the affine transformation.
The output o_t at time step t is computed from the hidden state hi_t. Using this RNN approach, a system was implemented for predicting a real-valued score in the range [0-1] for emotion intensities in tweets. Following the aforementioned mechanism, we constructed input tensors of shape 857×70×256, 1147×70×256, 823×70×256, and 786×70×256 for training instances and 84×70×256, 110×70×256, 79×70×256, and 74×70×256 for development instances. The RNN layer reduces the 70×256 embedding matrix of each tweet, in both training and development, to a 256-dimensional vector. Thus, embedding matrices of size 857×256, 1147×256, 823×256, and 786×256 were obtained for the training samples and 84×256, 110×256, 79×256, and 74×256 for the development instances, which were then fed to logistic regression for prediction.
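The recurrence above can be sketched in a few lines of NumPy. The weight matrices here are random stand-ins, and the dimensions n and m are illustrative; the point is only to show how one hidden state is carried from step to step.

```python
import numpy as np

# Recurrence hi_t = sigmoid(w_xh @ x_t + w_hh @ hi_{t-1} + b); sizes are illustrative.
n, m = 256, 128
rng = np.random.default_rng(1)
w_xh = rng.normal(scale=0.1, size=(m, n))  # input-to-hidden weights
w_hh = rng.normal(scale=0.1, size=(m, m))  # hidden-to-hidden (recurrent) weights
b = np.zeros(m)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_step(x_t, h_prev):
    # one time step of the simple RNN described in the text
    return sigmoid(w_xh @ x_t + w_hh @ h_prev + b)

# Unroll over a 70-step tweet; hi_0 is initialised to 0 as in the text.
h = np.zeros(m)
for x_t in rng.normal(size=(70, n)):
    h = rnn_step(x_t, h)  # final h is a fixed-length tweet representation
```

After the loop, the final hidden state summarizes the whole tweet, which is how the 70×256 embedding matrix of a tweet gets reduced to a single vector before logistic regression.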

Long short-term memory based system for Emotion Intensities in Tweets
RNNs suffer from the vanishing and exploding gradient problem when memorizing long-term dependencies (Bengio et al., 1994). To alleviate this, Hochreiter and Schmidhuber (1997) introduced long short-term memory (LSTM). Unlike the simple units in an RNN's recurrent hidden layer, the LSTM introduces a memory block. A memory block is a complex processing unit that contains one or more memory cells, adaptive gates such as an input gate and an output gate, and a Constant Error Carousel (CEC). A memory block stores information and updates it across time steps based on the input and output gates. The input and output gates control the flow of information into and out of a memory cell. Additionally, the CEC has a built-in self-connection of weight 1, which is active in the absence of a signal from outside. Moreover, Gers et al. (1999) introduced the forget gate and Gers et al. (2002) introduced peephole connections to the memory block of the LSTM. A forget gate facilitates forgetting or resetting values across time steps, and peephole connections help to learn the precise timing of outputs. This architecture has performed well in learning long-range temporal dependencies in various artificial intelligence (AI) tasks (LeCun et al., 2015). Generally, at each time step an LSTM network considers three inputs, x_t, h_{t-1}, and c_{t-1}, and outputs h_t and c_t through the following equations:

i_t = σ(w_xi x_t + w_hi h_{t-1} + b_i)
f_t = σ(w_xf x_t + w_hf h_{t-1} + b_f)
o_t = σ(w_xo x_t + w_ho h_{t-1} + b_o)
g_t = tanh(w_xg x_t + w_hg h_{t-1} + b_g)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t
h_t = o_t ⊙ tanh(c_t)

where x_t is the input at time step t, σ is the sigmoid nonlinear activation function, tanh is the hyperbolic tangent nonlinear activation function, and ⊙ denotes element-wise multiplication. Concretely, at t = 0 the hidden and memory cell state vectors h_0 and c_0 are initialized to 0. We followed subsections 3.1 and 3.2 to develop an LSTM-based system for predicting a real-valued score in the range [0-1] for a given tweet with its emotion type. This system is constructed by simply replacing the RNN layer with an LSTM.
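A single LSTM time step with input, forget, and output gates can be sketched as below. The weight shapes, random initial values, and the gate dictionary layout are illustrative assumptions for the sketch, not the actual trained parameters.

```python
import numpy as np

n, m = 256, 128                     # input and hidden sizes (illustrative)
rng = np.random.default_rng(2)
# one weight matrix / bias per gate: input (i), forget (f), output (o), candidate (g)
W = {k: rng.normal(scale=0.1, size=(m, n)) for k in "ifog"}
U = {k: rng.normal(scale=0.1, size=(m, m)) for k in "ifog"}
b = {k: np.zeros(m) for k in "ifog"}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev):
    """One LSTM time step following the standard gate equations."""
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])   # input gate
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])   # forget gate
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])   # output gate
    g = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])   # candidate cell state
    c = f * c_prev + i * g          # element-wise update of the memory cell
    h = o * np.tanh(c)              # hidden state exposed to the next layer
    return h, c

# h_0 and c_0 initialised to 0, as in the text
h, c = np.zeros(m), np.zeros(m)
for x_t in rng.normal(size=(70, n)):
    h, c = lstm_step(x_t, h, c)
```

The memory cell c is updated additively (f gates the old cell, i gates the candidate), which is what lets gradients flow across many time steps without vanishing the way they do in the simple RNN.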

Parameter Selection
To choose the optimal embedding size, the LSTM model was trained with embedding sizes 128 and 256, and their performance was evaluated on the development data set. The detailed evaluation results are displayed in Tables 2 and 3. We did not use any hyperparameter tuning mechanism for the tweet length; instead we used a static length of 70 in all our experiments.

Evaluation results
We submitted one run, based on the LSTM-based recurrent embedding system, to WASSA-2017; the detailed results are displayed in Tables 4 and 5. Analysis of the training and testing results showed a significant difference in the performance measure. This is due to overfitting of the model to the training data: a deep learning framework requires a huge amount of data to learn the features, and the unavailability of such a large amount of training data limited the performance.

Conclusion
This working note has presented a language-independent approach based on BoW and recurrent embeddings for predicting a real-valued score in the range [0-1] for a given tweet with an emotion type. The LSTM network outperformed both the bag-of-words embedding and the simple recurrent embedding mechanism. This is primarily because the LSTM has the capability to learn long-range temporal dependencies across time steps. Due to the small number of instances in the training data, the accuracy of the proposed mechanism is low. Even so, the efficacy of the RNN and LSTM embeddings is considerable and paves the way for future use in predicting real-valued scores in the range [0-1], with more training instances, for a tweet and its emotion type. Demonstrating that the proposed deep learning mechanism can perform better with a large number of instances remains a direction for future work.