E-LSTM at SemEval-2019 Task 3: Semantic and Sentimental Features Retention for Emotion Detection in Text

This paper presents a solution to the SemEval-2019 Task 3 (EmoContext) competition, "Contextual Emotion Detection in Text". It describes an architecture I created by exploiting the embedding layers of Word2Vec and GloVe with LSTM memory unit cells, which detects the approximate emotion of textual two-person chats in English. The model was trained on the emotion classes Happy, Sad, Angry, and Others. The paper also compares several conventional machine learning algorithms against E-LSTM.


Introduction
Emotions are a basic human quality that almost every human possesses. According to a recent study by Glasgow University 1 , human emotions can be divided into six basic classes: happiness, sadness, anger, fear, surprise, and disgust. Surprise is the most difficult one, as both positive and negative statements can lead to a sense of surprise. For example, the statement "Your application for the CSE branch at Stanford University is accepted" is positive and leads to surprise, whereas "Your brother met with an accident" is a negative statement that also leads to surprise.
Problem Statement: Given the text of a three-turn conversation, classify its emotion into one of four categories: Happy, Sad, Angry, or Others.
Detecting human emotions from text alone is very difficult, as emotions are a combination of a person's situation and their facial expressions (Cowie et al., 2001). So merely classifying them from the conversation is not a very accurate approach.
1 https://www.gla.ac.uk/news/archiveofnews/2014/february/headline_306019_en.html
2 WhatsApp is used as a messaging platform to illustrate the three-turn conversation approach.
In this paper, I propose an extension of the original model (Chatterjee et al., 2019a), called "Emotion LSTM" or E-LSTM, which combines deep learning with techniques used in Natural Language Processing (NLP) through a semantic and embedding approach (Franco-Salvador et al., 2018; Shivhare and Khethawat, 2012) to detect emotions in the provided training set. E-LSTM combines count-based and predictive techniques, both of which are widely used in NLP.

Approach
My approach to solving the given problem statement was to maintain the semantic and sentimental relationships among words (Gupta et al., 2017). As shown in Figure 2, I modeled the architecture such that the lower part contains the embeddings for sentiment analysis, whereas the upper part contains the embeddings for maintaining semantic relationships. The embeddings are then passed to a network of LSTM layers that memorize the relationships among words. The output of the final LSTM cell in each half is flattened and combined with the output of the final LSTM cell in the other half. The combined matrix is then passed as input to a dense network with two sub-levels, whose output is treated, via a Softmax function, as a probability distribution over the four possible emotions.
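The dual-branch combination described above can be sketched as a minimal NumPy forward pass. The dimensions (100-d GloVe branch, 50-d Word2Vec branch, 64 hidden units) and the random weights are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTM:
    """Minimal LSTM cell: returns the final hidden state for a sequence."""
    def __init__(self, input_dim, hidden_dim):
        k = hidden_dim
        # one stacked weight matrix for the input, forget, cell, and output gates
        self.W = rng.normal(0, 0.1, (4 * k, input_dim + k))
        self.b = np.zeros(4 * k)
        self.k = k

    def run(self, xs):
        h = np.zeros(self.k)
        c = np.zeros(self.k)
        for x in xs:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
        return h

# Hypothetical sizes: 100-d GloVe (semantic) branch, 50-d Word2Vec (sentiment) branch.
sem_lstm = TinyLSTM(input_dim=100, hidden_dim=64)
sent_lstm = TinyLSTM(input_dim=50, hidden_dim=64)

def e_lstm_forward(glove_seq, w2v_seq, W_dense, b_dense):
    """Concatenate the two branch outputs and map to 4 emotion probabilities."""
    merged = np.concatenate([sem_lstm.run(glove_seq), sent_lstm.run(w2v_seq)])
    return softmax(W_dense @ merged + b_dense)

# Toy 5-token conversation with random vectors standing in for real embedding lookups.
glove_seq = rng.normal(size=(5, 100))
w2v_seq = rng.normal(size=(5, 50))
W_dense = rng.normal(0, 0.1, (4, 128))
b_dense = np.zeros(4)

probs = e_lstm_forward(glove_seq, w2v_seq, W_dense, b_dense)
print(probs)  # four probabilities, one per class: Happy, Sad, Angry, Others
```

In the real model the dense network has two sub-levels rather than the single layer shown here; the sketch only illustrates how the two branch outputs are merged before the Softmax.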

Training Dataset
For SemEval-2019 Task 3 (EmoContext), I was initially provided with a training dataset of about 30,000 entries, each containing a three-turn conversation and its corresponding label. After successfully completing the first round, I was provided with a final training dataset of about 2,700 entries. Statistics of both datasets are shown in Table 1. For the first phase, I used the provided dataset as a whole for training, whereas in the second phase, I merged the newly provided dataset with the Phase I dataset and used the result for model training.
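The Phase II merging step can be sketched as follows; the column names and toy rows are hypothetical stand-ins for the real tab-separated training files:

```python
import pandas as pd

# Hypothetical stand-ins for the Phase I and Phase II training data;
# the real files ship as tab-separated text with three turns and a label.
phase1 = pd.DataFrame({
    "turn1": ["hi", "why"], "turn2": ["hello", "because"],
    "turn3": ["how are you", "that is sad"], "label": ["others", "sad"],
})
phase2 = pd.DataFrame({
    "turn1": ["great news"], "turn2": ["really?"],
    "turn3": ["yes, so happy"], "label": ["happy"],
})

# Phase II training set = Phase I data plus the newly released entries.
merged = pd.concat([phase1, phase2], ignore_index=True)
print(len(merged))  # 3
```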

Handling Repetition and Emoticons
A thorough analysis of the provided dataset showed that emoticons were frequently used within statements to describe a feeling or to end a statement, and special characters and character repetitions appeared similarly often. Beyond normal preprocessing, the emoticons were also stored in a dictionary keyed by sentence index and were used in a final step to verify, with a weighted approach, whether the predicted emotion partly or fully matched the emotion depicted by the emojis used.
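A rough sketch of this emoticon bookkeeping and weighted verification follows. The emoticon-to-emotion lexicon, the weight, and the unanimous-override rule are all assumptions for illustration; the paper does not spell out the actual mapping or weighting:

```python
import re

# Hypothetical emotion lexicon for a few common emoticons.
EMOTICON_EMOTION = {
    ":)": "happy", ":D": "happy", ":(": "sad", ":'(": "sad", ">:(": "angry",
}
# Longest-first alternation so ":'(" is not swallowed by ":(".
EMOTICON_RE = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOTICON_EMOTION, key=len, reverse=True))
)

def collect_emoticons(conversations):
    """Store emoticons per sentence index, as described in the text."""
    return {i: EMOTICON_RE.findall(text) for i, text in enumerate(conversations)}

def verify(predicted, emoticons, weight=0.5):
    """Toy weighted check: keep the model's prediction unless every
    emoticon in the sentence unanimously points to one other emotion."""
    votes = [EMOTICON_EMOTION[e] for e in emoticons]
    if votes and all(v == votes[0] for v in votes) and votes[0] != predicted:
        return votes[0] if weight >= 0.5 else predicted
    return predicted

convs = ["I lost my keys :( :'(", "great day :D"]
index = collect_emoticons(convs)
print(verify("others", index[0]))  # 'sad'
```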

Embedding Layers
The main challenge in the model architecture was identifying a proper embedding layer to increase the model's accuracy. The initial evaluations were passed using only a baseline structure of GloVe embeddings with LSTM layers, which proved costly: the micro F1 score was comparatively low (about 0.57 for Phase I and 0.61 for Phase II), and the training time was significantly high. The model's accuracy was therefore improved by keeping the semantic and syntactic features of statements intact using two notable lines of research in Natural Language Processing: Word2Vec (Word to Vector) (Mikolov et al., 2013; Rahmawati and Khodra, 2016) and GloVe (Global Vectors) (Baroni et al., 2014; Pennington et al., 2014) embedding layers. The Word2Vec embedding layer maintained the sentiment of the provided text, whereas the GloVe embedding maintained its semantic features. As shown in Table 2, Word2Vec was better at capturing the relationship between sad and :( because it is a predictive model and was trained accordingly, whereas GloVe was better at capturing the relation between the words better and great because its approach is entirely count-based, i.e., built on counting through matrix operations.
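Either embedding layer is typically initialized from a pretrained vector file. A minimal sketch of building the embedding matrix, using a toy in-memory stand-in for a real GloVe file such as glove.6B.100d.txt:

```python
import numpy as np

# Toy stand-in for a GloVe-format text file: one word per line, then its vector.
toy_glove = """happy 0.1 0.9
sad 0.2 -0.8
great 0.3 0.7"""

def load_embeddings(lines):
    """Parse word-vector lines into a vocab index and a vector array."""
    vocab, vectors = {}, []
    for line in lines.splitlines():
        word, *vals = line.split()
        vocab[word] = len(vectors)
        vectors.append(np.array(vals, dtype=float))
    return vocab, np.stack(vectors)

def embedding_matrix(word_index, vocab, vectors):
    """Rows aligned with the model's word indices; unknown words stay zero."""
    mat = np.zeros((len(word_index) + 1, vectors.shape[1]))
    for word, idx in word_index.items():
        if word in vocab:
            mat[idx] = vectors[vocab[word]]
    return mat

vocab, vectors = load_embeddings(toy_glove)
word_index = {"happy": 1, "sad": 2, "unknownword": 3}  # hypothetical tokenizer output
mat = embedding_matrix(word_index, vocab, vectors)
print(mat[1])  # [0.1 0.9]
```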

Model Training
For

Experimental Setup
In this section, I describe the statistics of my test data and compare the obtained results with those of other models. In the latter half, I also discuss some of the glitches that are evident in my model.

Test Dataset
Similar to the training dataset, the test dataset was also provided in both phases, i.e., the initial phase and the final phase. In addition, before the System-Design submission, a gold test dataset was provided to test the model in case it changed before paper submission. All three test dataset files contained an index number and a three-turn conversation in each entry.

Baseline Approaches
To compare my model and demonstrate its advantage, I evaluated it against two categories of baselines: (1) machine learning based and (2) deep learning based. For the machine learning baselines, I used Naive Bayes (NB) and Support Vector Machines (SVM). As these models are inefficient on large datasets, I trained them on a subset of the provided dataset.
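A tiny hand-rolled multinomial Naive Bayes illustrates the first baseline category; the training sentences below are invented, and this is only a stand-in for an off-the-shelf NB implementation trained on the actual dataset subset:

```python
from collections import Counter, defaultdict
import math

def train_nb(texts, labels, alpha=1.0):
    """Bag-of-words multinomial Naive Bayes with add-alpha smoothing."""
    word_counts = defaultdict(Counter)
    label_counts = Counter(labels)
    for text, label in zip(texts, labels):
        word_counts[label].update(text.lower().split())
    vocab = {w for c in word_counts.values() for w in c}
    return word_counts, label_counts, vocab, alpha

def predict_nb(model, text):
    """Pick the label maximizing log prior + summed log word likelihoods."""
    word_counts, label_counts, vocab, alpha = model
    total = sum(label_counts.values())
    best, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for w in text.lower().split():
            score += math.log((word_counts[label][w] + alpha) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

# Invented toy data, only to show the mechanics.
texts = ["i am so happy today", "this is great news",
         "i feel very sad", "i lost my dog and cried"]
labels = ["happy", "happy", "sad", "sad"]
model = train_nb(texts, labels)
print(predict_nb(model, "happy great day"))  # 'happy'
```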
For the deep learning baselines, I used plain Convolutional Neural Networks (CNNs); CNNs combined with Long Short-Term Memory (LSTM) cells for remembering data; CNNs combined with a Global Vectors (GloVe) embedding layer, which maintains the sentiment in the text; LSTMs combined with a GloVe embedding layer to find the sentiment among words; and a Word2Vec embedding layer combined with LSTMs, which maintains the semantic features of a statement. For all the deep learning baseline architectures, text was given as input in batches.

Results
As seen in Table 4, the E-LSTM model outperformed all other models in both F1 score and average F1 score across all emotion classes. Hence, it can be concluded that combining the semantic and sentiment features of a statement leads to better emotion-detection accuracy. It is also evident that deep learning models such as CNNs, LSTMs, and RNNs outperform classical machine learning models such as SVMs.

Qualitative Analysis
It is evident from Table 3 that the E-LSTM model performed best, as it handled the cases where count-based models fail because the actual emotion differs from the words used in the conversation. Sentimental features also sometimes gave wrong results when the predicted and true emotions were very close. The third entry in the table involves a conversation that strongly contradicts the true emotion, so almost all the models failed on this type of case; however, verifying the predicted emotion against the emoticons, as described earlier, saved the E-LSTM model from failing. Thus, the handcrafted features at the end of the model are very useful in such scenarios.

Conclusion
Evaluation on the given test dataset shows that my model outperforms classical machine learning algorithms as well as models based on simple CNN and LSTM layers. Thus, it can be concluded that maintaining the semantic and syntactic relationships among words helps identify emotions from text accurately.