ConSSED at SemEval-2019 Task 3: Configurable Semantic and Sentiment Emotion Detector

This paper describes our system participating in SemEval-2019 Task 3: EmoContext: Contextual Emotion Detection in Text. The goal was, for a given textual dialogue, i.e. a user utterance along with two turns of context, to identify the emotion of the user utterance as one of the emotion classes: Happy, Sad, Angry or Others. Our system, ConSSED, is a configurable combination of semantic and sentiment neural models. The official task submission achieved a micro-average F1 score of 75.31, which placed us 16th out of 165 participating systems.


Introduction
Emotion detection is crucial in developing a "smart" social (chit-chat) dialogue system (Chen et al., 2018). Like many sentence classification tasks, classifying emotions requires not only understanding a single sentence, but also capturing contextual information from the entire conversation. For the competition, we were invited to create a system for detecting the emotion of a user utterance in a short textual dialogue, i.e. a user utterance along with two turns of context (Chatterjee et al., 2019b). The number of emotion classes was limited to four (Happy, Sad, Angry and Others).
The rest of the paper is organized as follows. Section 2 briefly reviews related work. Section 3 elaborates on our approach, presenting the preprocessing step and the architecture of our system. Section 4 describes the data set, the word embeddings and hyper-parameters used, the adopted research methodology, and the experiments with their results. Finally, Section 5 concludes our work.

Related Work
Detection of emotions in dialogues can be divided into two types: based only on the text of the dialogue (Chen et al., 2018) and based on many channels (video, speech, motion capture of the face, text transcriptions) (Busso et al., 2008). Regardless of the type, the most common solution is the use of neural networks, in particular variants of Recurrent Neural Networks, such as LSTMs (Hochreiter and Schmidhuber, 1997), BiLSTMs (Schuster and Paliwal, 1997) and GRUs (Cho et al., 2014), or Convolutional Neural Networks (Krizhevsky et al., 2012). Our solution uses LSTMs and BiLSTMs and is based on ideas from the SS-BED system (Chatterjee et al., 2019a).

Our Approach
Figure 1 provides an overview of our approach. We wanted to create a system that would benefit from the advantages of both semantic and sentiment embeddings (like SS-BED), while at the same time being easily configurable, both in terms of the selection of parameters and network architecture and in terms of the applied embeddings, whether static or dynamic. In the next subsections, we describe our approach in detail.

Preprocessing
For preprocessing, we adapted the ekphrasis tool (Baziotis et al., 2017). We use this tool for tokenization and to do the following:
• Normalize URLs, emails, percent/money/time/date expressions and phone numbers.
• Annotate emphasized and censored words, as well as phrases written in all capital letters.
We also prepare and apply dictionaries of common abbreviations and mistakes to reduce the vocabulary size and deal with the Out-of-Vocabulary (OOV) issue.
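The normalization and dictionary steps above can be sketched as follows; this is a simplified, dependency-free stand-in for the ekphrasis-based pipeline, with illustrative regexes and example dictionary entries of our own (the real tool covers many more cases):

```python
import re

# Example abbreviation dictionary; the real dictionaries are much larger.
ABBREVIATIONS = {"u": "you", "r": "are", "pls": "please"}

def preprocess(text):
    """Normalize special expressions, annotate all-caps emphasis,
    and expand common abbreviations (illustrative approximation)."""
    text = re.sub(r"https?://\S+", "<url>", text)     # normalize URLs
    text = re.sub(r"\S+@\S+\.\S+", "<email>", text)   # normalize emails
    text = re.sub(r"\d+%", "<percent>", text)         # normalize percents
    tokens = []
    for tok in text.split():
        if tok.isupper() and len(tok) > 1:            # all-caps emphasis
            tokens.extend([tok.lower(), "<allcaps>"])
        else:
            tokens.append(ABBREVIATIONS.get(tok.lower(), tok.lower()))
    return tokens

tokens = preprocess("PLS visit https://example.com u get 50% off")
```

The annotation convention (lowercased token followed by an `<allcaps>` marker) mirrors the style of ekphrasis annotations; the exact tag names here are our own.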

Model
Our model contains four parts: a Semantic Recurrent Network (SemRN), a Sentiment Recurrent Network (SenRN), a Fully Connected Network and an Others Class Regularizer. SemRN and SenRN are independent of each other and have a similar architecture: text preprocessing, a suitable word embedding, and a 2-layer LSTM or bidirectional LSTM (BiLSTM), which is configurable. The outputs of these two modules are concatenated and become the input to the Fully Connected Network. This network has one hidden layer and a softmax layer representing the class probabilities. The last element of our model is the Others Class Regularizer (used only during prediction on the validation/test set).
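A minimal numerical sketch of the fusion step, with fixed random vectors standing in for the final states of the two recurrent branches and illustrative layer sizes (not the tuned dimensions from Table 2):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Stand-ins for the final hidden states of the two recurrent branches
# (illustrative dimensions of our own choosing).
sem_state = rng.standard_normal(128)   # Semantic Recurrent Network output
sen_state = rng.standard_normal(64)    # Sentiment Recurrent Network output

# Concatenate the branch outputs, then one hidden layer + softmax
# over the four classes (Happy, Sad, Angry, Others).
fused = np.concatenate([sem_state, sen_state])            # shape (192,)
W_h, b_h = rng.standard_normal((96, 192)) * 0.1, np.zeros(96)
hidden = np.tanh(W_h @ fused + b_h)
W_o, b_o = rng.standard_normal((4, 96)) * 0.1, np.zeros(4)
class_probs = softmax(W_o @ hidden + b_o)
```

In the actual system these weights are of course learned end-to-end; the sketch only shows how the two branch outputs are fused before classification.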

Others Class Regularizer
This component was created because in a real-life distribution each of the Happy, Sad and Angry classes covers only about 4% of utterances, with the rest belonging to the Others class. The component works by grouping records into three sets, depending on whether they are predicted as Happy, Sad or Angry. For each set, it checks whether there are more representatives than an assumed percentage of all records. If so, it increases the probability of the Others class by 0.01 (independently in each set) until the number of representatives falls below the assumed percentage. The assumed percentage was set to 5.5% based on the validation set.
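The procedure can be sketched as follows; this is our own minimal re-implementation, with an illustrative probability format (one dict per record):

```python
import copy

EMOTIONS = ["happy", "sad", "angry"]

def regularize_others(probs, max_fraction=0.055):
    """Shift borderline emotion predictions toward Others until each of the
    Happy/Sad/Angry groups covers at most max_fraction of all records.
    probs: one dict per record mapping class name -> probability."""
    probs = copy.deepcopy(probs)
    n = len(probs)
    predict = lambda p: max(p, key=p.get)
    for emo in EMOTIONS:
        # Records initially predicted as this emotion form one group.
        group = [p for p in probs if predict(p) == emo]
        # Raise the Others probability in 0.01 steps (for the whole group)
        # until the group's remaining emotion predictions fit under the cap.
        while sum(1 for p in group if predict(p) == emo) > max_fraction * n:
            for p in group:
                p["others"] += 0.01
    return [predict(p) for p in probs]
```

Because the increment is applied to the whole group at once, the records with the smallest margin between the emotion class and Others flip first, which is what makes the step selective rather than a blanket re-labeling.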

Data
In our work on the system, we used only the official data sets made available by the organizers. However, we noticed cases where a conversation occurs twice with different labels. We removed these records and obtained the sets shown in Table 1.
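The de-duplication step can be sketched as follows (the record format here is illustrative):

```python
from collections import defaultdict

def drop_conflicting_duplicates(records):
    """Remove every copy of a conversation that appears more than once
    with different labels. records: list of (conversation, label) pairs."""
    labels = defaultdict(set)
    for conversation, label in records:
        labels[conversation].add(label)
    # Keep only conversations whose label is consistent across all copies.
    return [(c, l) for c, l in records if len(labels[c]) == 1]
```

Note that conversations repeated with the same label are kept; only the contradictory cases described above are discarded.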

Word Embeddings
For our experiments, we chose five word embeddings: three semantic and two sentiment. The semantic embeddings are GloVe (Pennington et al., 2014) trained on Twitter data, Word2Vec (Mi-

Hyper-parameters Search
In order to tune the hyper-parameters of our model, we adopt Bayesian optimization using the Hyperopt library. The names of the hyper-parameters with their possible values (list or range) are shown in Table 2. Parameters with the SEM prefix apply to the Semantic Recurrent Network, and those with the SEN prefix to the Sentiment Recurrent Network. The LSTM DIM parameter is for the BiLSTM baseline systems. In order to cope with the differences in class distribution between the training set and the validation and test sets, as well as the previously mentioned real-life distribution of the emotion classes relative to the Others class, apart from using the Others Class Regularizer we also used a class weight for the Others class (the OTHERS CLASS WEIGHT parameter).
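The search loop can be illustrated as follows. The paper uses Hyperopt's TPE-based Bayesian optimization; to keep the sketch dependency-free we substitute plain random sampling, and both the parameter names and the candidate values below are our own illustrative guesses, not the actual Table 2 space:

```python
import random

random.seed(0)

# Illustrative search space in the spirit of Table 2; names follow the
# paper's SEM/SEN prefix convention, values are invented for the example.
SPACE = {
    "SEM_LSTM_DIM": [64, 128, 256],
    "SEN_LSTM_DIM": [64, 128, 256],
    "DROPOUT": [0.2, 0.3, 0.5],
    "OTHERS_CLASS_WEIGHT": [0.5, 0.75, 1.0],
}

def sample(space):
    """Draw one random configuration from the space."""
    return {name: random.choice(values) for name, values in space.items()}

def objective(params):
    """Stand-in for 'train ConSSED and return validation micro-F1';
    a real objective would train and evaluate the full model."""
    return params["SEM_LSTM_DIM"] / 256 - params["DROPOUT"]

# A small budget of 10 iterations, mirroring the one used in the paper.
trials = [sample(SPACE) for _ in range(10)]
best = max(trials, key=objective)
```

With Hyperopt, `sample` and the loop would be replaced by `fmin` with the `tpe.suggest` algorithm, so that later trials concentrate on promising regions of the space instead of sampling uniformly.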

Methodology
We train all models using the training set and tune the hyper-parameters using the validation set. Due to the time frame of the competition, we limited the search of hyper-parameters to 10 iterations for

Experiments
The results of our experiments are shown in Table 3. We divided them into two stages: validation of the baseline systems and validation of our solution.
For the first stage, we used the 2-layer bidirectional LSTM model (BiLSTM) with all the word embeddings presented in Section 4.2 and compared this approach to the baseline model prepared by the organizers (Baseline). The model using the NTUA 310 embedding performed best (73.34); compared to the Baseline, this is an improvement of about fifteen percent. The second-best model used the ELMo embedding (72.42). Among the sentiment embeddings, the best was Emo2Vec (71.18).
The second stage focused on the validation of the ConSSED model. In this experiment, we trained six models to verify all possible semantic-sentiment embedding pairs. The results show that the ConSSED model achieves better results than the corresponding baseline systems. As we could have guessed from the first stage, the best was the combination of NTUA 310 and Emo2Vec (75.31), which was our official solution during the competition. In parentheses, we present the results obtained without the Others Class Regularizer; as we can see, this component improves the results, but only slightly. In addition, after the competition, we reran the hyper-parameter search (this time increasing the number of iterations) for the ConSSED-NTUA 310-Emo2Vec model, which gave us a better result than our official competition score (76.64). The hyper-parameters found for the ConSSED-NTUA 310-Emo2Vec models and the differences between them are shown in Table 4.

Competition Results
The best result we obtained on the official leaderboard is 75.31 in terms of micro-averaged F1 score. Our solution ranked 16th out of 165 participating systems.

Conclusion
In this paper, we presented the Configurable Semantic and Sentiment Emotion Detector (ConSSED), our system participating in SemEval-2019 Task 3. ConSSED achieved good results, and subsequent studies show that it can achieve even better ones through a further search for hyper-parameters. We think that the use of a fine-tuned ELMo model (e.g. on Twitter data) would improve the results even more. In addition, we would like to integrate our system with BERT embeddings (Devlin et al., 2018). For developing our system we used Keras with TensorFlow as backend. We make our source code available at https://github.com/rafalposwiata/conssed.