ARB-SEN at SemEval-2018 Task1: A New Set of Features for Enhancing the Sentiment Intensity Prediction in Arabic Tweets

This article describes our proposed Arabic Sentiment Analysis system, named ARB-SEN. The system was designed for the International Workshop on Semantic Evaluation 2018 (SemEval-2018), Task 1: Affect in Tweets. ARB-SEN proposes two supervised models to estimate the sentiment intensity in Arabic tweets. Both models use a set of features including sentiment lexicon, negation, word embedding and emotion symbols features, which the system combines to assist the sentiment analysis task. The ARB-SEN system achieves a correlation score of 0.720, ranking 6th among all participants in the valence intensity regression (V-reg) Arabic sub-task organized within the SemEval 2018 evaluation campaign.


Introduction and Related Work
According to Mohammad (2016), Sentiment Analysis (SA) refers to the "task of automatically determining the valence or polarity of a piece of text, whether it is positive, negative, or neutral".
Nowadays, social media platforms like Twitter, Facebook, LinkedIn, and Quora are widely used (Lenze, 2017). For instance, Ranginwala and Towbin (2017) estimate that Twitter has 320 million monthly active users. These platforms allow people to communicate not only the sentiment they are feeling (positive or negative) but also the intensity of this sentiment. For example, from a friend's tweet you can estimate whether he is very happy (most positive), slightly angry (slightly negative), absolutely sad (most negative) or neutral.
Automatically determining the sentiment intensity is an important task in several application fields, such as public health, intelligence gathering, commerce and social welfare (Mohammad and Bravo-Marquez, 2017).
In this article, we present our ARB-SEN system, devoted to enhancing the detection of sentiment intensity in Arabic tweets. The ARB-SEN system proposes two methods to measure this valence. Our best submitted method achieves a correlation of 0.720, ranking 6th in the Arabic Detecting Sentiment Intensity shared task (Mohammad et al., 2018a), SemEval-2018.

System Description
Sentiment intensity detection in the ARB-SEN system relies on a set of features, which we describe in what follows:

Sentiment Lexicon Features (SLF)
We employed the following four sentiment lexicons to extract the SLF features:

Arabic Sentiment (Valence) Lexicons
Created as part of SemEval-2016 by , this Arabic sentiment lexicon is a list of 1,168 single words and 198 simple phrases together with their associations with positive and negative sentiment. The lexicon includes both standard and dialectal Arabic terms.

Arabic Sentiment (Valence) Lexicons
This is an annotated Arabic sentiment lexicon that was created by . The lexicon was built by measuring the extent to which the words in a corpus of tweets co-occurred with a set of seed positive and seed negative terms. It includes about 43k entries (23k positive and 20k negative).

ArabSenti sentiment lexicon
ArabSenti is a manually annotated Arabic sentiment lexicon of 14k words that was created by Abdul-Mageed et al. (2011). Each word in ArabSenti is associated with a positive/negative sentiment label.

Dialectal sentiment lexicon
This is a freely available Arabic sentiment lexicon with more than 480 dialectal Arabic words. The lexicon was proposed by Refaee and Rieser (2014) and was manually annotated by native Arabic speakers.

Using these sentiment lexicons, we extract four features for each tweet:
1) Sum Score: the sum of the sentiment scores of all the words in the tweet.
2) Average Score: the average of the sentiment scores of all the words in the tweet.
3,4) Min and Max Score: the minimum and maximum sentiment scores of the words in the tweet.
For each of these features, if a word in the tweet does not exist in a sentiment lexicon, its sentiment score is not considered.
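
The four lexicon features can be illustrated with a minimal sketch. The function name `lexicon_features`, the toy lexicon and its scores are hypothetical and are not taken from the lexicons above. The sketch also assumes that the average is taken over the words found in the lexicon and that a tweet with no lexicon hit receives all-zero features; the text does not specify either point.

```python
def lexicon_features(tokens, lexicon):
    """Return the sum, average, min and max lexicon scores of a tokenized tweet."""
    # Words that are missing from the lexicon are skipped.
    scores = [lexicon[token] for token in tokens if token in lexicon]
    if not scores:
        # Assumption: a tweet without any lexicon hit gets all-zero features.
        return {"sum": 0.0, "avg": 0.0, "min": 0.0, "max": 0.0}
    return {
        "sum": sum(scores),
        "avg": sum(scores) / len(scores),  # assumption: average over matched words
        "min": min(scores),
        "max": max(scores),
    }


# Toy lexicon with invented scores (hypothetical, for illustration only).
toy_lexicon = {"سعيد": 1.0, "جميل": 0.5, "حزين": -1.0}
features = lexicon_features(["انا", "سعيد", "جميل"], toy_lexicon)
```

With several lexicons, one plausible design is to call the function once per lexicon and concatenate the resulting values into the feature vector.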

Negation Feature (NF)
Negation refers to words that reverse the sentiment of the word or phrase that follows them. For example, in (I'm not happy), the word (happy) has a positive sentiment; however, due to the negation (I'm not), the sentiment of the expression becomes negative. This feature was used by the ECNU system (Wang et al., 2016), the best system in SemEval-2016 (Sentiment Intensity Task). Wang et al. (2016) showed that the sentiment of a phrase can be reversed by adding a negation. Thus, for this binary feature, we use a list of the five main negation words in Modern Standard Arabic (MSA) { , , , , } proposed by Abdulla et al. (2013). If the tweet contains at least one negation word, this feature is set to 1; otherwise it is set to 0.
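
A minimal sketch of this binary feature is given below. The negation list is a hypothetical stand-in made of common MSA negation particles; it is not necessarily the five-word list of Abdulla et al. (2013), and the function name `negation_feature` is likewise an assumption.

```python
# Hypothetical negation list: common MSA negation particles chosen for
# illustration; NOT necessarily the five words of Abdulla et al. (2013).
NEGATION_WORDS = {"لا", "لم", "لن", "ليس", "ما"}


def negation_feature(tokens, negation_words=NEGATION_WORDS):
    """Return 1 if the tokenized tweet contains at least one negation word, else 0."""
    return 1 if any(token in negation_words for token in tokens) else 0


with_negation = negation_feature(["ليس", "سعيدا"])
without_negation = negation_feature(["انا", "سعيد"])
```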

Word Embedding Feature (WEF)
One of the main advantages of a word embedding model is that it allows the retrieval of a list of words that are used in the same contexts as a given word (Mikolov et al., 2013). We use the Arabic CBOW model (Zahran et al., 2015) to construct a list of the 5 closest words for each word in the tweet, as described in (Nagoudi et al., 2017). Then, we extract for each tweet the same features described in Section 2.1, with the difference that the sentiment score of each word is computed from its 5 closest words in the embedding space:
1) Sum Score: the sum, over the words in the tweet, of the average sentiment score of each word's 5 closest words.
2) Average Score: the average of the sentiment scores of the 5 closest words in the tweet.
3,4) Min and Max Score: the minimum and maximum average sentiment scores of the 5 closest words in the tweet.
To compute these features, we use the same sentiment lexicons presented in Section 2.1.
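
The following sketch shows one way to read these embedding-based features. It uses a tiny hand-made embedding table and cosine similarity in place of the Arabic CBOW model; the vectors, placeholder words, scores and function names are all hypothetical, and neighbours missing from the lexicon are assumed to be skipped.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def closest_words(word, embeddings, k=5):
    """Return the k words most similar to `word`, excluding the word itself."""
    if word not in embeddings:
        return []
    target = embeddings[word]
    others = [w for w in embeddings if w != word]
    others.sort(key=lambda w: cosine(target, embeddings[w]), reverse=True)
    return others[:k]


def embedding_features(tokens, embeddings, lexicon, k=5):
    """Score each word by the average lexicon score of its k closest words."""
    word_scores = []
    for token in tokens:
        # Assumption: neighbours absent from the lexicon are ignored.
        hits = [lexicon[n] for n in closest_words(token, embeddings, k) if n in lexicon]
        if hits:
            word_scores.append(sum(hits) / len(hits))
    if not word_scores:
        return {"sum": 0.0, "avg": 0.0, "min": 0.0, "max": 0.0}
    return {
        "sum": sum(word_scores),
        "avg": sum(word_scores) / len(word_scores),
        "min": min(word_scores),
        "max": max(word_scores),
    }


# Toy 2-dimensional embeddings and scores (hypothetical placeholder words).
toy_embeddings = {
    "w_good": (1.0, 0.0), "w_great": (0.9, 0.1), "w_nice": (0.8, 0.2),
    "w_bad": (-1.0, 0.0), "w_awful": (-0.9, 0.1),
}
toy_scores = {"w_great": 1.0, "w_nice": 0.5, "w_awful": -1.0}
wef = embedding_features(["w_good", "w_bad"], toy_embeddings, toy_scores, k=2)
```

In practice the neighbour lookup would be delegated to the embedding model itself; the brute-force search above is only meant to make the computation explicit.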

Emoticons and Emojis Features (EEF)
Emoticons and emojis have already been used for sentiment analysis on Twitter (Read, 2005) and (Wolny, 2016). We therefore use them as indicators of the sentiment intensity of a tweet. We use three sets of emoticons and emojis: positive, negative and neutral. Table 2 shows a sample of the positive, negative and neutral emoticons and emojis.
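
As an illustration, the three symbol sets could be turned into count features as sketched below. The symbol sets are small hypothetical samples, and the use of occurrence counts is an assumption: the text only states that the symbols serve as indicators.

```python
# Small hypothetical samples of each symbol set (not the full lists used by the system).
POSITIVE = (":)", ":D", "😊")
NEGATIVE = (":(", "😢")
NEUTRAL = (":|", "😐")


def emoticon_features(tweet):
    """Count the positive, negative and neutral symbols in the raw tweet text."""
    def count(symbols):
        return sum(tweet.count(symbol) for symbol in symbols)

    return {
        "positive": count(POSITIVE),
        "negative": count(NEGATIVE),
        "neutral": count(NEUTRAL),
    }


eef = emoticon_features("يوم جميل :) :D 😢")
```

These counts have to be extracted from the raw tweet, before any step that strips non-alphanumeric characters.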

Models Construction
The previously described features are fed into two different regression models: Linear Regression (LR) and Support Vector Regression (SVR). We used the Python-based machine learning library scikit-learn 1 to train these models on the training and development data sets of SemEval 2018 (Mohammad et al., 2018b), using the previously discussed features to predict the sentiment intensity score of each tweet. Figure 1 illustrates an overview of the ARB-SEN system.
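
A minimal sketch of this training setup with scikit-learn is shown below. The feature matrix is random placeholder data standing in for the extracted features, the default hyperparameters are an assumption (the settings actually used are not stated), and clipping predictions to [0, 1] assumes intensity scores lie in that range.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

# Placeholder data: 40 "tweets" with 10 features each (hypothetical sizes).
rng = np.random.default_rng(0)
X_train = rng.random((40, 10))
weights = rng.random(10)
y_train = X_train @ weights / weights.sum()  # synthetic intensity scores in [0, 1]
X_test = rng.random((5, 10))

# Default hyperparameters are an assumption, not the system's reported settings.
models = {"LR": LinearRegression(), "SVR": SVR()}
predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Assumption: intensity scores are bounded, so predictions are clipped to [0, 1].
    predictions[name] = np.clip(model.predict(X_test), 0.0, 1.0)
```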

Training Data
The organisers of SemEval 2018 provided training and development data sets, which contained 933 and 139 Arabic tweets respectively. Both sets are used as training data for our supervised models.

Data Pre-processing
In order to normalize tweets, many pre-processing techniques have been proposed in the literature, such as (Agarwal et al., 2011), (Ahmed et al., 2013), and (Rosenthal et al., 2014). Accordingly, we normalize our tweets using the following pre-processing steps:
1. Removing @user names, RTs, and URLs;
2. Removing diacritics and non-alphanumeric characters;
3. Tokenizing the #hashtags of each tweet by breaking them into words, e.g: #very nice day becomes very, nice and day;
4. Normalizing the exchangeable Arabic letters as described in (Darwish et al., 2012), e.g: normalizing to and replacing final followed by with .

Figure 1: Architecture of the ARB-SEN system.

1 http://scikit-learn.org
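
The four pre-processing steps can be sketched with regular expressions as follows. The letter mapping in step 4 is a commonly used Arabic normalization and is an assumption, since the exact letters are not recoverable from the text; the sketch also assumes that hashtag words are separated by underscores. The function name `preprocess` is hypothetical.

```python
import re

# Arabic diacritics (tashkeel) plus tatweel; the exact character set is an assumption.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")

# Hypothetical letter normalization (a common convention, not confirmed by the text).
LETTER_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0649": "\u064A",  # alef maqsura -> ya
    "\u0629": "\u0647",  # ta marbuta -> ha
})


def preprocess(tweet):
    """Normalize a raw tweet and return its list of tokens."""
    # Step 1: remove URLs, @user names and RT markers.
    text = re.sub(r"https?://\S+|www\.\S+", " ", tweet)
    text = re.sub(r"@\w+", " ", text)
    text = re.sub(r"\bRT\b", " ", text)
    # Step 3: split hashtags into words (assumes underscore-separated hashtags).
    text = re.sub(r"#(\w+)", lambda m: m.group(1).replace("_", " "), text)
    # Step 2: remove diacritics and non-alphanumeric characters.
    text = DIACRITICS.sub("", text)
    text = re.sub(r"[^\w\s]", " ", text).replace("_", " ")
    # Step 4: normalize exchangeable letters.
    text = text.translate(LETTER_MAP)
    return text.split()


tokens = preprocess("RT @user: #very_nice_day http://t.co/x !!")
```

Hashtags are split before punctuation is stripped; otherwise the `#` and `_` characters that mark hashtag boundaries would already be gone.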

Tests and Results
To evaluate the performance of our system, our two supervised models were assessed on the 731 tweets in the Arabic Sentiment Intensity Evaluation Set 2. In addition, we studied the impact of the sentiment lexicon, negation, word embedding and emotion symbols features on prediction quality. We calculate the Pearson correlation between our assigned sentiment intensity scores and the gold labels. The results are presented in Table 2.
These results demonstrate that the SVR classifier with all features succeeds in predicting the sentiment intensity in Arabic tweets with a Pearson correlation score of 0.720, whereas the LR classifier with all features achieves a correlation score of 0.617. Thus, the SVR classifier with all features outperforms the LR classifier by roughly 10 points of correlation (0.720 versus 0.617). Regarding the impact of the extracted features, all of them improve the results of the sentiment intensity prediction. Interestingly, we notice that the word embedding and emotion symbols features play a key role in improving the prediction performance of both classifiers, with a mean gain of +3.5% and +4.7% respectively.
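
For reference, the Pearson correlation used as the evaluation measure can be computed as in the sketch below. The function name `pearson` and the sample scores are hypothetical and are not results from the evaluation set.

```python
import math


def pearson(predicted, gold):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(predicted)
    if n != len(gold) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_p = sum(predicted) / n
    mean_g = sum(gold) / n
    covariance = sum((p - mean_p) * (g - mean_g) for p, g in zip(predicted, gold))
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    spread_g = math.sqrt(sum((g - mean_g) ** 2 for g in gold))
    if spread_p == 0 or spread_g == 0:
        raise ValueError("correlation is undefined for constant scores")
    return covariance / (spread_p * spread_g)


# Invented scores for illustration: the predictions are the gold scores shifted by 0.1.
r = pearson([0.2, 0.5, 0.9], [0.1, 0.4, 0.8])
```

Because the measure is invariant to shifts and positive rescaling, a system can score well even if its outputs are systematically offset from the gold labels.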

Conclusion and Future Work
In this article, we have presented two supervised models to predict the sentiment intensity in Arabic tweets. Both classifiers are trained on a set of Arabic tweets characterised by a set of features including sentiment lexicon, negation, word embedding and emotion symbols features. The performance of our proposed system was measured through the Pearson correlation between our assigned sentiment scores and the gold labels. As future work, we are going to extend our features by using an Arabic Combined-Sentiment Word Embedding model. We would also like to further investigate the Arabic sentiment analysis task with more recent models, namely deep neural networks.