Amobee at SemEval-2018 Task 1: GRU Neural Network with a CNN Attention Mechanism for Sentiment Classification

This paper describes the participation of Amobee in the shared sentiment analysis task at SemEval 2018. We participated in all the English sub-tasks and the Spanish valence tasks. Our system consists of three parts: training task-specific word embeddings, training a model consisting of gated recurrent units (GRU) with a convolutional neural network (CNN) attention mechanism, and training stacking-based ensembles for each of the sub-tasks. Our system ranked 3rd and 1st in the valence ordinal classification sub-tasks in English and Spanish, respectively.


Introduction
Sentiment analysis is a collection of methods and algorithms used to infer and measure the affect expressed by a writer. The main motivation is to enable computers to better understand human language, particularly the sentiment carried by the speaker. Among the popular sources of textual data for NLP is Twitter, a social network service where users communicate by posting short messages of at most 280 characters, called tweets. Tweets can carry sentiment when talking about events, public figures, brands or products. Unique linguistic features, such as the use of slang, emojis, misspellings and sarcasm, make Twitter a challenging source for NLP research, attracting the interest of both academia and industry.
SemEval is a yearly event in which international teams of researchers work on tasks in a competition format, tackling open research questions in the field of semantic analysis. We participated in SemEval-2018 Task 1, which focuses on the evaluation of sentiment and emotions in tweets. There were three main problems: estimating the intensity of a given emotion in a tweet (sub-tasks EI-reg, EI-oc), identifying the general sentiment (valence) of a tweet (sub-tasks V-reg, V-oc) and identifying which emotions are expressed in a tweet (sub-task E-c). For a complete description of SemEval-2018 Task 1, see the official task description (Mohammad et al., 2018).
* These authors contributed equally to this work.
We developed an architecture based on gated recurrent units (GRU; Cho et al., 2014). We used a bidirectional GRU layer together with a convolutional neural network (CNN) attention mechanism whose input is the sequence of hidden states of the GRU layer, followed by two fully connected layers. We will refer to this architecture as the Amobee sentiment classifier (ASC). We used ASC both to train word embeddings that incorporate sentiment information and to classify sentiment using annotated tweets. We participated in all the English sub-tasks and in the Spanish valence sub-tasks, achieving competitive results.
The paper is organized as follows: section 2 describes our data sources and section 3 describes the data pre-processing pipeline. A description of the main architecture is in section 4. Section 5 describes the generation of word embeddings; section 6 describes the extraction of features. In section 7 we describe the performance of our models; finally, in section 8 we review and summarize the results.

Embeddings Training
Word embedding is a family of techniques in which words are encoded as real-valued vectors of lower dimensionality. These word representations have been used successfully in sentiment analysis tasks in recent years. Among the popular algorithms are Word2Vec (Mikolov et al., 2013) and FastText (Bojanowski et al., 2016). Word embeddings are useful representations of words and can capture latent relationships between them. However, they typically lack sentiment information. For example, the vector for "good" can be very close to the vector for "bad" in some off-the-shelf pre-trained embeddings, since the two words tend to appear in similar contexts. Our goal was to train word embeddings on Twitter data and then fine-tune them so that they encode emotion-specific sentiment.
We started with our dataset of 200 million tweets, cleaned it using the pre-processing pipeline (described in section 3) and then trained generic embeddings using the Gensim package (Řehůřek and Sojka, 2010). We created four embeddings for the words and two embeddings for the part-of-speech (POS) tags. For the latter, we converted each sentence into the list of its corresponding POS tags (the tagger we used offers 25 tags); treating the tags as words, we trained 8-dimensional (d = 8) embeddings with the word2vec algorithm on the simple and complex cleaned datasets. The embedding parameters are specified in table 2.
We followed a similar approach to Tang et al. (2014) and Cliche (2017), who explored training word embeddings for sentiment classification. To create distant-supervision datasets, we first manually compiled four lists of representative words, one for each emotion: anger, fear, joy and sadness. Then, for each emotion, we built two datasets: one containing tweets with the representative words and one containing tweets without them. Each list contained about 40 words and each dataset contained roughly 2 million tweets. We trained with the ASC sub-model architecture (section 4) as follows: one epoch with the embeddings frozen (untrainable), followed by 6 epochs in which the embeddings could change.
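The distant-supervision split and the two-phase training schedule can be sketched as follows. This is a minimal illustration, not the authors' code: the regex tokenizer, the tiny keyword set and the helper names `split_by_keywords` and `embedding_schedule` are hypothetical; only the epoch counts (1 frozen, 6 trainable) come from the text.

```python
import re
from typing import Iterable, List, Set, Tuple


def split_by_keywords(tweets: Iterable[str],
                      keywords: Set[str]) -> Tuple[List[str], List[str]]:
    """Split tweets into (contains a keyword, contains none) for one emotion."""
    with_keyword: List[str] = []
    without_keyword: List[str] = []
    for tweet in tweets:
        # Naive lowercase word tokenizer (an assumption, not the paper's pipeline).
        tokens = set(re.findall(r"\w+", tweet.lower()))
        if tokens & keywords:
            with_keyword.append(tweet)
        else:
            without_keyword.append(tweet)
    return with_keyword, without_keyword


def embedding_schedule(frozen_epochs: int = 1,
                       trainable_epochs: int = 6) -> List[bool]:
    """Per-epoch flag: is the embedding layer trainable in that epoch?"""
    return [False] * frozen_epochs + [True] * trainable_epochs


# Usage with a hypothetical, tiny stand-in for a ~40-word "joy" list.
joy_words = {"happy", "delighted", "yay"}
pos, neg = split_by_keywords(["So happy today!", "Stuck in traffic", "yay weekend"], joy_words)
print(pos)  # ['So happy today!', 'yay weekend']
print(neg)  # ['Stuck in traffic']
```

In a deep-learning framework, the per-epoch flag would typically be applied by toggling the embedding layer's trainable attribute before each epoch.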
Overall, we trained 16 word embeddings: four embedding configurations for each of the four emotions. In addition, we used the final hidden layer (d = 15) of each trained model as a feature vector in the task-specific architectures; our motivation was to use these models as emotion and intensity classifiers via transfer learning.

Features Description
In addition to our ASC models, we extracted semantic and syntactic features based on domain knowledge:
• Number of magnifier and diminisher words (e.g. "incredibly", "hardly") in each tweet.
• Logarithm of the sentence length.
• Occurrences of the symbols # and @ in the sentence.
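A minimal sketch of the surface features listed above. The magnifier and diminisher word lists, the regex tokenizer, measuring length in tokens, and using log(1 + n) so that an empty sentence does not fail are all assumptions for illustration; `surface_features` is a hypothetical helper.

```python
import math
import re
from typing import Dict, Set

# Hypothetical example lists; the paper's actual word lists are not given.
MAGNIFIERS: Set[str] = {"incredibly", "extremely", "very", "so"}
DIMINISHERS: Set[str] = {"hardly", "barely", "slightly", "somewhat"}


def surface_features(sentence: str) -> Dict[str, float]:
    """Count magnifiers/diminishers, log-length and the # and @ symbols."""
    tokens = re.findall(r"\w+", sentence.lower())
    return {
        "n_magnifiers": float(sum(t in MAGNIFIERS for t in tokens)),
        "n_diminishers": float(sum(t in DIMINISHERS for t in tokens)),
        # log(1 + n) rather than log(n) is an assumption to handle empty input.
        "log_length": math.log1p(len(tokens)),
        "n_hash": float(sentence.count("#")),
        "n_at": float(sentence.count("@")),
    }
```

For example, `surface_features("@bob this is incredibly good, hardly a surprise #win")` counts one magnifier, one diminisher, one `#` and one `@` over nine tokens.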
Additionally, we compiled a list of 338 emojis and words in 16 emotion categories, annotated with scores from the set {0.5, 1, 1.5, 2}. For each sentence, we summed the scores in each category, capping each sum at 5, which generated 16 features. The categories include anger, disappointed, fear, hopeful, joy, lonely, love, negative, neutral, positive, sadness and surprise. Finally, we used the NRC Affect Intensity lexicon (Mohammad, 2017), which contains 5814 entries; each entry is a word with a score between 0 and 1 for one of the following emotions: anger, fear, joy and sadness. We used the lexicon to produce 4 emotion features from the hashtags in each tweet; each feature is the largest score among all the hashtags in the tweet. For a summary of all features used, see table 6 in the appendix.
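The capped category sums and the hashtag-based NRC features can be sketched as below, assuming each lexicon is a plain dictionary from a token to a `{category: score}` mapping. The data layout, the lowercasing and `#`-stripping of hashtags, the default of 0 when no hashtag matches, and the function names are all assumptions.

```python
from typing import Dict, Iterable, Sequence

Lexicon = Dict[str, Dict[str, float]]  # token -> {category: score} (assumed layout)


def category_features(tokens: Iterable[str], lexicon: Lexicon,
                      categories: Sequence[str], cap: float = 5.0) -> Dict[str, float]:
    """Sum the scores of matched emojis/words per category, capped at `cap`."""
    totals = {c: 0.0 for c in categories}
    for tok in tokens:
        for cat, score in lexicon.get(tok, {}).items():
            if cat in totals:
                totals[cat] += score
    return {c: min(total, cap) for c, total in totals.items()}


def hashtag_intensity_features(hashtags: Iterable[str], nrc: Lexicon,
                               emotions: Sequence[str] = ("anger", "fear", "joy", "sadness"),
                               ) -> Dict[str, float]:
    """Per emotion, the largest NRC intensity among the tweet's hashtags (0.0 if none)."""
    feats = {e: 0.0 for e in emotions}
    for tag in hashtags:
        word = tag.lstrip("#").lower()
        for emo, score in nrc.get(word, {}).items():
            if emo in feats:
                feats[emo] = max(feats[emo], score)
    return feats
```

Taking the maximum rather than the sum for hashtags follows the text; the cap of 5 on the category sums keeps a single emoji-heavy tweet from dominating the feature scale.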

Experiments
Our general workflow was as follows. For each sub-task, we started by cleaning the datasets, obtaining two cleaned versions. We ran a pipeline that produced all the features we designed: the ASC predictions and the features described in section 6. We removed sparse features (those present in fewer than 8 samples). Next, we defined a shallow neural network with a soft-voting ensemble. We chose the best features and hyper-parameters, such as learning rate, batch size and number of epochs, based on the dev dataset. Finally, we generated predictions for the regression tasks. For the classification tasks, we ran a grid search over partitions of the regression predictions to optimize the evaluation metric. Most models were trained on a local machine equipped with an Nvidia GTX 1080 Ti GPU. Our official results are summarized in table 3.
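The sparse-feature filter could look like the sketch below, assuming "fewer than 8 samples" means that a feature is non-zero in fewer than 8 training rows. The function name and the list-of-rows layout are hypothetical.

```python
from typing import List, Sequence, Tuple


def drop_sparse_features(rows: Sequence[Sequence[float]], names: Sequence[str],
                         min_nonzero: int = 8) -> Tuple[List[List[float]], List[str]]:
    """Drop feature columns that are non-zero in fewer than `min_nonzero` rows."""
    nonzero_counts = [0] * len(names)
    for row in rows:
        for j, value in enumerate(row):
            if value != 0:
                nonzero_counts[j] += 1
    keep = [j for j, count in enumerate(nonzero_counts) if count >= min_nonzero]
    kept_rows = [[row[j] for j in keep] for row in rows]
    return kept_rows, [names[j] for j in keep]
```

The same `keep` indices would then have to be applied to the dev and test features so that all splits share one column layout.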

Valence Prediction
In the valence sub-tasks, we predicted the intensity of the general sentiment (valence) of each tweet. The score is either a continuous value between 0 and 1 (V-reg) or one of 7 ordinal classes {−3, −2, −1, 0, 1, 2, 3} (V-oc); both are evaluated using the Pearson correlation coefficient.
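For reference, the Pearson correlation coefficient used to score both valence sub-tasks can be computed as in this dependency-free sketch; for V-oc, the ordinal classes are simply treated as numbers.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

For instance, `pearson([1, 2, 3, 4], [1, 3, 2, 4])` evaluates to 0.8 (up to floating-point rounding).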
We started with the regression task and defined the following model. First, we normalized the features to zero mean and unit standard deviation. Then, we added 300 parallel fully connected layers of size 3, each with a softmax activation and no bias term. To each copy we applied the function $f(x) = (x_0 - x_2)/2 + 0.5$, where $x_0$ and $x_2$ are the first and third components of that layer's output. Our aim was to transform the label predictions of the ASCs (trained on 3-label sentiment annotations) into a regression score, such that high certainty in any one label (negative, neutral or positive) would produce a score close to 0, 0.5 or 1, respectively. Finally, we averaged all 300 predictions to obtain the final output node; this is known as a soft-voting ensemble. We used the Adam optimizer (Kingma and Ba, 2014) with default values, a mean squared error loss, a batch size of 400 and 65 epochs of training. For an illustration of the network, see figure 2. We experimented with the dev dataset, testing different subsets of the features. Finally, we produced predictions for the regression sub-task V-reg.
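The forward pass of this soft-voting head can be sketched in plain Python. The sketch assumes the input vector is already standardized (212 dimensions for V-reg, per figure 2) and uses random Gaussian weights as a stand-in for trained ones; training with Adam and the MSE loss is omitted, and the helper names are hypothetical. Under $f$, probability mass on the first component maps to 1, on the second to 0.5 and on the third to 0. Since $p_0 - p_2 \in [-1, 1]$, every head, and hence the average, stays in $[0, 1]$.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]  # 3 x d weight matrix of one head (no bias term)


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def head_output(x: Sequence[float], weights: Matrix) -> float:
    """One head: bias-free dense layer of size 3, softmax, then f(p) = (p0 - p2)/2 + 0.5."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    p = softmax(logits)
    return (p[0] - p[2]) / 2 + 0.5


def soft_vote(x: Sequence[float], heads: Sequence[Matrix]) -> float:
    """Final output node: the mean of all head outputs."""
    return sum(head_output(x, w) for w in heads) / len(heads)


def init_heads(n_heads: int, dim: int, seed: int = 0) -> List[Matrix]:
    """Random Gaussian initialisation (an assumption; training is not shown)."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(3)]
            for _ in range(n_heads)]
```

With `heads = init_heads(300, 212)`, `soft_vote(x, heads)` returns one score per standardized feature vector `x`.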
We analyzed the relative contribution of each feature by measuring variable importance with the approach of Pratt (1987). For each feature we calculated the score $d_i = \hat{\beta}_i \rho_i / R^2$, where $\hat{\beta}_i$ denotes the estimated standardized regression coefficient of the $i$-th feature, $\rho_i$ is the simple correlation between the labels and the $i$-th feature, and $R^2$ is the coefficient of determination (see Thomas et al. 1998). Because $R^2 = \sum_i \hat{\beta}_i \rho_i$ for standardized variables, the scores sum to 1 and can be read as relative contributions. We present the relative contribution of each feature in figure 3 and the top 10 features in table 4. The ASC models, both general and emotion-specific, account for about 72% of the total contribution of all features in this sub-task.
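Pratt's measure can be computed from correlations alone: with standardized variables, the regression coefficients solve $R_{xx}\hat{\beta} = \rho$, where $R_{xx}$ is the feature correlation matrix. The sketch below does this with a small Gaussian-elimination solver. It assumes a non-singular $R_{xx}$, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def _corr(a: Sequence[float], b: Sequence[float]) -> float:
    """Simple (Pearson) correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)


def _solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for c in range(col, n + 1):
                M[row][c] -= factor * M[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        acc = sum(M[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (M[row][n] - acc) / M[row][row]
    return x


def pratt_importance(columns: Sequence[Sequence[float]],
                     y: Sequence[float]) -> List[float]:
    """Pratt scores d_i = beta_i * rho_i / R^2 for each feature column."""
    p = len(columns)
    r_xx = [[_corr(columns[i], columns[j]) for j in range(p)] for i in range(p)]
    rho = [_corr(col, y) for col in columns]
    beta = _solve(r_xx, rho)                          # standardized coefficients
    r2 = sum(bi * ri for bi, ri in zip(beta, rho))    # R^2 of the standardized fit
    return [bi * ri / r2 for bi, ri in zip(beta, rho)]
```

If the label equals one of the features exactly, that feature receives a score of 1 and the others receive 0, which is a quick sanity check.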
For the ordinal classification task, we reused the regression predictions, since the tweets were the same in both tasks. Using a grid search, we partitioned the regression scores into 7 categories such that the Pearson correlation coefficient was maximized. We submitted the class predictions as sub-task V-oc. Our final scores were 0.843 and 0.813 in the regression and classification sub-tasks, respectively.
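The threshold search could be sketched as a coordinate-wise grid search. Starting from evenly spaced cut points, each of the 6 thresholds is moved over a fixed grid while its neighbours are held fixed, and a change is kept only if the Pearson correlation improves. The coordinate-wise strategy, grid size and number of sweeps are assumptions, since the paper only states that a grid search was used. This procedure finds a local optimum, not necessarily the global one.

```python
import math
from bisect import bisect_right
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def discretize(scores: Sequence[float], thresholds: Sequence[float],
               labels: Sequence[int]) -> List[int]:
    """Map each score to labels[k], where k = number of thresholds <= score."""
    return [labels[bisect_right(thresholds, s)] for s in scores]


def grid_search_thresholds(scores: Sequence[float], gold: Sequence[int],
                           labels: Sequence[int], n_grid: int = 101,
                           n_sweeps: int = 3) -> Tuple[List[float], float]:
    lo, hi = min(scores), max(scores)
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    k = len(labels) - 1  # number of cut points
    thresholds = [lo + (hi - lo) * (i + 1) / (k + 1) for i in range(k)]
    best = pearson(discretize(scores, thresholds, labels), gold)
    for _ in range(n_sweeps):
        for i in range(k):
            # Keep thresholds sorted: candidate must lie between its neighbours.
            left = thresholds[i - 1] if i > 0 else -math.inf
            right = thresholds[i + 1] if i < k - 1 else math.inf
            for cand in grid:
                if not left <= cand <= right:
                    continue
                trial = thresholds[:i] + [cand] + thresholds[i + 1:]
                r = pearson(discretize(scores, trial, labels), gold)
                if r > best:
                    best, thresholds = r, trial
    return thresholds, best
```

For V-oc one would call it with `labels = [-3, -2, -1, 0, 1, 2, 3]`, tuning on the dev set and then applying `discretize` to the test-set regression scores.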

Emotion Intensity
In the emotion intensity sub-tasks, we predicted the intensity of a given emotion in each tweet. The four emotions were anger, fear, joy and sadness; the score is either a continuous value between 0 and 1 or one of 4 ordinal classes {0, 1, 2, 3}. Performance was evaluated using the Pearson correlation coefficient. Our approach was similar to the one used for the valence tasks: first we generated features, then we used the same architecture as in the valence sub-tasks, depicted in figure 2. However, in these sub-tasks we used the emotion-specific embeddings for each emotion sub-task. We generated regression predictions and submitted them as the EI-reg sub-tasks; finally, we carried out a grid search for the partition maximizing the Pearson correlation and submitted the class predictions as the EI-oc sub-tasks. For a summary of the training parameters used in the regression sub-tasks, see table 5.
Our system achieved the following scores. In the regression sub-tasks, the scores were 0.748, 0.670, 0.748 and 0.721 for anger, fear, joy and sadness, respectively, with a macro-average of 0.721. In the classification sub-tasks, the scores were 0.667, 0.536, 0.705 and 0.673 for anger, fear, joy and sadness, respectively, with a macro-average of 0.646.

Multi-label Classification
In the multi-label classification sub-task, we had to label tweets with respect to 11 emotions: anger, anticipation, disgust, fear, joy, love, optimism, pessimism, sadness, surprise and trust. Performance was evaluated using the Jaccard similarity coefficient. We started with the same cleaning and feature-generation pipelines as before, creating an input layer of size 217. We added a fully connected layer of size 100 with tanh activation, followed by 300 parallel fully connected layers of size 11 with sigmoid activation. We averaged the 300 output vectors (each with d = 11) to produce the final 11-dimensional output; see figure 4 for an illustration. We used a loss function in which ‖·‖_1 denotes the L1 norm and ε = 10^{−7} is added for numerical stability. We trained with a batch size of 10 for 40 epochs, using Adam optimization with default parameters. Our final score was 0.566.
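The averaging step and the evaluation metric can be sketched as follows. The element-wise mean mirrors the soft-voting description. The 0.5 decision threshold, and scoring a tweet with no gold and no predicted emotions as 1.0, are assumptions rather than details taken from the paper. The Jaccard score is computed per tweet as |G ∩ P| / |G ∪ P| and averaged over tweets; the helper names are hypothetical.

```python
from typing import List, Sequence


def average_heads(head_outputs: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of the per-head sigmoid vectors (soft voting)."""
    n = len(head_outputs)
    return [sum(col) / n for col in zip(*head_outputs)]


def binarize(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn averaged probabilities into 0/1 labels (threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in probs]


def jaccard_score(gold: Sequence[Sequence[int]], pred: Sequence[Sequence[int]]) -> float:
    """Mean over tweets of |G intersect P| / |G union P| for 0/1 label vectors."""
    total = 0.0
    for g, p in zip(gold, pred):
        inter = sum(1 for a, b in zip(g, p) if a and b)
        union = sum(1 for a, b in zip(g, p) if a or b)
        total += 1.0 if union == 0 else inter / union  # empty/empty convention assumed
    return total / len(gold)
```

For instance, gold `[[1, 0, 1], [0, 0, 0]]` against predictions `[[1, 1, 0], [0, 0, 0]]` gives (1/3 + 1) / 2 = 2/3.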

Spanish Valence Tasks
We participated in the Spanish valence tasks to examine the current state of neural machine translation (NMT) algorithms. We used the Google Cloud Translation API to translate the Spanish training, development and test datasets for the two valence tasks into English. We then treated the tasks the same way as the English valence tasks, using the same cleaning and feature extraction pipelines and the same architecture described in section 7.1 to generate regression and classification predictions. We reached 1st and 2nd places in the classification and regression sub-tasks, with scores of 0.765 and 0.770, respectively.

Review and Conclusions
In this paper we described the system we developed to participate in SemEval-2018 Task 1. We reached 3rd place in the valence ordinal classification sub-task and 5th place in the valence regression sub-task. In the Spanish valence tasks, we reached 1st and 2nd places in the classification and regression sub-tasks, respectively. In the emotion intensity sub-tasks, we reached 4th and 13th places in the classification and regression sub-tasks, respectively.
To summarize, our methods were: training word embeddings on a Twitter corpus (200M tweets); developing the Amobee sentiment classifier (ASC) architecture, a bidirectional GRU layer with a CNN-based attention mechanism and an additional hidden layer, which we used to adjust the embeddings to include emotional context; and finally, training a shallow feed-forward NN with a stacking-based ensemble of the final hidden layers of all the classifiers we trained previously. This form of transfer learning proved important: in the valence regression sub-task, features derived from the ASC models accounted for about 72% of the total feature contribution.
Overall, we had better performance in the valence tasks, both in English and Spanish. We attribute this to the fact that our annotated supervised training dataset (not task-specific) was based on SemEval 2017 Task 4, which focused on valence classification. In addition, the annotations in SemEval 2017 were label-based, lending themselves more easily to the ordinal classification tasks. In the Spanish tasks, we used external translation (the Google Cloud Translation API) and achieved good results without using any Spanish-specific features.

Figure 1: Architecture of the ASC network. Each of the four sub-models on the right has the same structure as depicted in the central region.

Figure 2: Architecture of the final classifier in the valence sub-tasks, where $f = (x_0 - x_2)/2 + 0.5$ and the input dimension is 212 for the V-reg sub-task.

Figure 3: Relative contribution of features in the valence regression sub-task.

Figure 4: Architecture of the multi-label sub-task E-c.

Table 1: An example of tweet processing, producing two cleaned versions.

Table 2: Parameters for the word and POS-tag embeddings.

Table 3: Summary of results.

Table 4: Relative contribution of features in the valence regression sub-task.

Table 5: Summary of training parameters for the emotion intensity regression tasks.