SentiSys at SemEval-2016 Task 4: Feature-Based System for Sentiment Analysis in Twitter

This paper describes our sentiment analysis system, which was built for the Sentiment Analysis in Twitter task of SemEval-2016. We used a logistic regression classifier with different groups of features. This system improves on our previous system, Lsislif, from SemEval-2015 by removing some features and adding new features extracted from a new automatically constructed sentiment lexicon.


Introduction
Sentiment analysis in Twitter is different from document-level sentiment analysis. At the document level, each document is normally classified as positive or negative, and the document is long enough to obtain a good representation using only the words it contains (bag-of-words). For example, in movie reviews we can get an F-score of 85% using a bag-of-words representation with an SVM classifier, while in Twitter it is about 60% according to our experiments in previous SemEval workshops. This lower performance in the Twitter domain is not surprising given the limitations of the task when applied to Twitter:
• The size of a tweet is limited to 140 characters, which leads to sparseness: the tweets do not provide enough word co-occurrence.
• The informal language and non-standard expressions.
• The numerous spelling errors.
To deal with these limitations, we extended the bag-of-words representation by extracting several groups of features. Unigram, bigram and trigram word features capture the text of the tweet and its context. Negation features handle negated contexts. Sentiment lexicon features help the classification because the lexicons contain positive and negative words, which add useful information about the polarity of a tweet; they also contain many terms that may not appear in the training data. Semantic features such as Brown clusters give a richer representation that helps reduce sparsity. To evaluate our system, we participated in the SemEval-2016 competition for sentiment analysis in Twitter (message polarity, subtask A) (Nakov et al., 2016). Our system was ranked sixth out of 34 submissions; it is derived from our previous system, Lsislif, which was ranked third in SemEval-2015. The rest of this paper is organized as follows. Section 2 presents the problem formulation. Section 3 gives an overview of our proposed approach. The features we extracted for training the classifier are presented in Section 4. Our experiments are described in Section 5. The related work is presented in Section 6. The conclusion and future work are presented in Section 7.
possible features $F = \{f_1, f_2, \ldots, f_m\}$ that can appear in $t_i$. The features can be single words, bigrams, n-grams, stemmed words, or other syntactic or semantic features. If a feature $f_i$ exists in a tweet $t_j$, the tweet can be represented as a vector of weighted features $t_j = (w_1, w_2, \ldots, w_m)$, where $w_i$ is the weight of the feature $f_i$ in the tweet $t_j$. The weight $w_i$ can represent the presence or absence of the feature, its frequency, or any other function of the feature frequency in the tweet.
Let us have three classes $C = \{c_1, c_2, c_3\}$, where $c_1$ represents the negative class, $c_2$ the neutral class and $c_3$ the positive class. Our task is to assign each tweet $t_j$ to a class $c_i$.
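The weighted feature vector described above can be illustrated with a minimal sketch. The function name `vectorize` and the two weighting options are assumptions made for illustration; the text only states that a weight may encode presence, frequency, or another function of frequency.

```python
from collections import Counter


def vectorize(tweet_features, feature_space, weighting="binary"):
    """Map the features found in one tweet onto a fixed feature space F.

    tweet_features: list of features extracted from the tweet.
    feature_space:  ordered list of all possible features (f_1 .. f_m).
    weighting:      "binary" (presence/absence) or "frequency" (raw count).
    """
    counts = Counter(tweet_features)
    if weighting == "binary":
        return [1 if counts[f] > 0 else 0 for f in feature_space]
    if weighting == "frequency":
        return [counts[f] for f in feature_space]
    raise ValueError("unknown weighting: " + weighting)
```

Each position of the returned list corresponds to one feature of the feature space, so every tweet is mapped to a vector of the same length m.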

Overview of the Proposed Approach
Our proposed approach for sentiment polarity classification consists of three steps:
1. We tokenize each tweet to get the feature space, which contains the words, punctuation marks and emoticons that appear in the tweets.
2. We extend the feature space by extracting some features using different resources (Sentiment lexicons, Twitter dictionary) and some semantic features.
3. We train a supervised classifier to get a trained model in order to predict the sentiment of the new tweets.
The next section describes the features we have extracted.

Feature Extraction
Before extracting the features, we tokenize the tweet. Tokenization is a challenging problem for Twitter text. We used Happytokenizer (http://sentiment.christopherpotts.net/tokenizing.html), which captures words, emoticons and punctuation. For example, for this tweet:
"RT @ #happyfuncoding: this is a typical Twitter tweet :-)"
it returns the following terms:
{rt, @, #happyfuncoding, :, this, is, a, typical, twitter, tweet, :-)}
We also replaced each web link by the word url and each user name by uuser. Then, several groups of features were extracted to improve the bag-of-words representation.
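The preprocessing step can be sketched as follows. This is not Happytokenizer itself: the regular expressions below are a much simplified stand-in written for illustration, and the names `URL_RE`, `USER_RE`, `TOKEN_RE` and `normalize_and_tokenize` are hypothetical. Only the replacement words url and uuser come from the text.

```python
import re

# Simplified patterns (assumptions, far less complete than a real Twitter tokenizer).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")

EMOTICON = r"[<>]?[:;=][\-o*']?[)(\]\[DPp]"   # e.g. :-) ;) =D
WORD = r"[#@]?\w+(?:'\w+)?"                    # words, hashtags, contractions
SYMBOL = r"[^\w\s]"                            # any other single symbol
TOKEN_RE = re.compile("|".join([EMOTICON, WORD, SYMBOL]))


def normalize_and_tokenize(tweet):
    """Replace links and user names, then split into lowercased tokens."""
    text = URL_RE.sub("url", tweet)
    text = USER_RE.sub("uuser", text)
    return [token.lower() for token in TOKEN_RE.findall(text)]
```

On the example tweet quoted in this section, this sketch yields the same term list as the one reported for the real tokenizer, but it should not be expected to agree on arbitrary input.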

Negation Features
We implemented the rule-based algorithm presented in Christopher Potts' Sentiment Symposium Tutorial. This algorithm appends a negation suffix to all words that appear within a negation scope, which starts at a negation key and ends at a punctuation mark or a connector belonging to [",", ";", ".", "!", "?", "but", "-", "so"]. All the negated words are added to the feature space. For example, for this tweet:
"I'am not happy"
the feature vector generated by the word n-gram features with negation features is:
{"i'am", 'not', 'happy_NEG', 'happy', "i'am not", 'not happy', "i'am not happy"}
happy_NEG is added by the negation features while the others are the n-gram features. Note that we add the negated feature to the vector without removing the original feature happy.
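A minimal sketch of this negation marking, combined with word n-grams, is given below. The list of scope terminators is the one quoted in this section; the set of negation keys and the `_NEG` suffix spelling are assumptions, since the text does not list them.

```python
# Assumed negation keys; the text does not enumerate them.
NEGATION_KEYS = {"not", "no", "never", "cannot"}
# Scope terminators as listed in the text.
SCOPE_END = {",", ";", ".", "!", "?", "but", "-", "so"}


def negated_tokens(tokens):
    """Return suffixed copies of the tokens that fall inside a negation scope."""
    negated, in_scope = [], False
    for token in tokens:
        if token in SCOPE_END:
            in_scope = False                  # punctuation or connector closes the scope
        elif token in NEGATION_KEYS or token.endswith("n't"):
            in_scope = True                   # a negation key opens a scope
        elif in_scope:
            negated.append(token + "_NEG")
    return negated


def word_ngrams(tokens, n_max=3):
    """All word n-grams for n = 1 .. n_max."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def ngram_and_negation_features(tokens):
    """N-gram features plus negated features; original tokens are kept."""
    return word_ngrams(tokens) + negated_tokens(tokens)
```

Keeping the original n-grams and only adding the suffixed tokens mirrors the choice described above, where happy stays in the vector next to its negated form.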

Twitter Dictionary
We constructed a dictionary of the abbreviations and slang words used in Twitter in order to overcome the ambiguity of these terms; mapping them to standard terms may increase the similarity between two similar tweets written in two different ways. This dictionary maps certain Twitter expressions and emoticons to their meaning or their corresponding sentiment. It contains about 125 terms collected from different pages on the Web. Every term that appears both in a tweet and in the Twitter dictionary is mapped to its corresponding term in the dictionary, which is added to the feature space. For this tweet: "i'am going to chapel hill on sat. :)", the term veryhappy will be added to the tweet vector because the emoticon :) is mapped to veryhappy in the dictionary.
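The lookup can be sketched in a few lines. Only the entry for :) comes from the text; the other entries and the name `dictionary_features` are hypothetical placeholders, since the actual 125-term dictionary is not reproduced here.

```python
# Toy dictionary: only ":)" -> "veryhappy" is taken from the text,
# the remaining entries are invented for illustration.
TWITTER_DICT = {
    ":)": "veryhappy",
    "gr8": "great",
    "thx": "thanks",
}


def dictionary_features(tokens, dictionary=TWITTER_DICT):
    """Return the dictionary meanings of the tokens found in the dictionary."""
    return [dictionary[token] for token in tokens if token in dictionary]
```

The returned terms would be appended to the tweet's other features before the vector is built.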

Semantic Features
The semantic representation of a text may bring some important hidden information, which may result in a better document representation and a better classification system. Semantic features can usually help to overcome the problem of sparseness in short text. External resources may be important to obtain such a representation.

Brown Dictionary Features
From over 56 million English tweets (837 million tokens), 1000 hierarchical clusters have been constructed over 217 thousand words (Owoputi et al., 2013). Table 2 shows an example of five clusters.  Note that in cluster A1, the term lololol (an extension of lol for "laughing out loud") is grouped with a large number of laughter acronyms.
Each word in the text is mapped to its cluster in the Brown dictionary. 1000 features are added to the feature space, where each feature represents the number of words in the text belonging to the corresponding cluster.
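The cluster-count features can be sketched as below. The sketch assumes that cluster identifiers have already been converted to integers in the range 0 to n_clusters - 1 and that the word-to-cluster mapping has been loaded into a dictionary; both the function name and the toy mapping in the usage note are illustrative.

```python
from collections import Counter


def brown_cluster_features(tokens, word_to_cluster, n_clusters=1000):
    """Count, for every cluster, how many tokens of the text belong to it."""
    counts = Counter(word_to_cluster[token]
                     for token in tokens
                     if token in word_to_cluster)
    return [counts.get(cluster, 0) for cluster in range(n_clusters)]
```

For instance, with a toy mapping in which lol, lololol and haha share cluster 0, a tweet containing two of these words gets the value 2 at position 0, regardless of which spelling was used. This is how the clusters reduce sparsity.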

Sentiment Lexicons
The system extracts four features from each manually constructed lexicon and six features from each automatically constructed one. For each sentence, the number of positive words, the number of negative words, the number of positive words divided by the number of negative ones, and the polarity of the last word are extracted from each lexicon. For the automatically constructed lexicons, the sum of the positive scores and the sum of the negative scores are extracted in addition.
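The lexicon features listed above can be sketched as follows. Several details are assumptions: the handling of a zero count of negative words in the ratio, the reading of "last word" as the final token of the sentence, and the encoding of polarity as 1, -1 or 0. The function names are illustrative.

```python
def manual_lexicon_features(tokens, positive, negative):
    """Four features from a lexicon given as two sets of words."""
    n_pos = sum(token in positive for token in tokens)
    n_neg = sum(token in negative for token in tokens)
    # Assumption: when there are no negative words, fall back to the positive count.
    ratio = n_pos / n_neg if n_neg else float(n_pos)
    last = 0
    if tokens:
        if tokens[-1] in positive:
            last = 1
        elif tokens[-1] in negative:
            last = -1
    return [n_pos, n_neg, ratio, last]


def automatic_lexicon_features(tokens, scores):
    """Six features from a lexicon given as a word -> real-valued score mapping."""
    positive = {word for word, score in scores.items() if score > 0}
    negative = {word for word, score in scores.items() if score < 0}
    features = manual_lexicon_features(tokens, positive, negative)
    pos_sum = sum(scores[t] for t in tokens if scores.get(t, 0) > 0)
    neg_sum = sum(scores[t] for t in tokens if scores.get(t, 0) < 0)
    return features + [pos_sum, neg_sum]
```

With two lexicons of each kind, this gives 2 x 4 + 2 x 6 = 20 values per tweet.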
The manual lexicons are the MPQA Subjectivity Lexicon and the Bing Liu Lexicon. The automatic ones are the NRC Hashtag Sentiment Lexicon and our lexicon based on the natural entropy measure (Hamdan et al., 2015c).
Thus, this feature group adds 20 features to the tweet vector; some of these features are integers, others are floats. The lexicons we used are the following:

Manually Constructed Sentiment Lexicons
Two manually constructed lexicons have been exploited:

MPQA Subjectivity Lexicon
The Multi-Perspective Question Answering Subjectivity Lexicon is maintained by Wilson et al. (2005). It is a lexicon of over 8,000 single-word subjectivity clues, each classified as positive or negative. This is a fragment illustrating this lexicon structure:

This list was compiled over many years starting from this paper (Hu and Liu, 2004a).

Score is a real number that indicates the sentiment score. #positive is the number of times the term co-occurred with a positive marker such as a positive emoticon or a positive hashtag. #negative is the number of times the term co-occurred with a negative marker such as a negative emoticon or a negative hashtag.

Our Sentiment Lexicon
The PMI metric has been widely used to compute the semantic orientation of words in order to construct automatic lexicons. The Sentiment140 lexicon is constructed using semantic orientation on the Sentiment140 corpus (Go et al., 2009), a collection of 1.6 million tweets that contain positive and negative emoticons. But this corpus is balanced: it contains the same number of positive and negative tweets. Therefore, semantic orientation can be rewritten as follows:

$$\mathrm{SO}(w) = \mathrm{PMI}(w, +) - \mathrm{PMI}(w, -) = \log\frac{p(w, +)}{p(w)\,p(+)} - \log\frac{p(w, -)}{p(w)\,p(-)} \qquad (1)$$

As $p(+) = p(-) = 0.5$ in the balanced corpus (with logarithms taken in base 2):

$$\mathrm{SO}(w) = 1 + \log p(+|w) - 1 - \log p(-|w) = \log\frac{a}{c} \qquad (2)$$

where $+$ stands for the positive class, $-$ stands for the negative class, $a$ is the number of documents containing the word $w$ in the positive class, and $c$ is the number of documents containing the word $w$ in the negative class. Thus, the semantic orientation is positive if $a > c$; otherwise it is negative. We should note that the probability of the classes does not affect the final semantic orientation score. Therefore, we propose another metric which depends on the distribution of the word over the classes, which seems more relevant in the balanced corpus.
p(−|w): The probability of the negative class given the word w.
The more uneven the distribution of documents where a term occurs, the larger the Natural Entropy of this term is. Thus, the entropy of the term can express the uncertainty of the classes given the term. One minus this degree of uncertainty boosts the terms that are unevenly distributed between the two classes (Wu and Gu, 2014). The ne score is always between 0 and 1, and it assigns a high score to words unevenly distributed over the classes, but it cannot discriminate the positive words from the negative ones. Therefore, we used a and c to discriminate the positive words from the negative ones: if a>c then the word is considered positive; otherwise it is considered negative.
Using this lexicon instead of Sentiment140 can improve the performance of a state-of-the-art sentiment classifier, as shown in (Hamdan et al., 2015c).
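The two scores discussed in this section can be compared with a small sketch. It rests on assumptions: the natural entropy score is taken to be one minus the base-2 entropy of the class distribution given the word, which matches the verbal description above but is not a formula quoted from this text, and the class probabilities are estimated as p(+|w) = a/(a+c) and p(-|w) = c/(a+c). Function names are illustrative.

```python
import math


def semantic_orientation(a, c, n_pos_docs, n_neg_docs):
    """SO(w) = PMI(w,+) - PMI(w,-), computed from document counts (a, c > 0)."""
    n = n_pos_docs + n_neg_docs
    p_w = (a + c) / n
    pmi_pos = math.log2((a / n) / (p_w * (n_pos_docs / n)))
    pmi_neg = math.log2((c / n) / (p_w * (n_neg_docs / n)))
    return pmi_pos - pmi_neg


def natural_entropy_score(a, c):
    """Return (ne, polarity) with ne = 1 - H(class | w), assumed base-2 entropy."""
    total = a + c
    entropy = 0.0
    for count in (a, c):
        if count:                      # 0 * log(0) is treated as 0
            p = count / total
            entropy -= p * math.log2(p)
    polarity = "positive" if a > c else "negative"
    return 1.0 - entropy, polarity
```

On a balanced corpus the first function reduces to log(a/c), as in equation (2). The second function returns 0 for a word that is evenly split between the classes and 1 for a word that occurs in only one class, with the polarity label decided by comparing a and c.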

Twitter Dataset
Twitter datasets have been provided by the SemEval organizers since 2013 for the message polarity classification subtask of sentiment analysis in Twitter (Nakov et al., 2013). The participants were provided with training tweets annotated as positive, negative or neutral, together with a script for downloading the tweets. After executing the given script, we obtained the whole training dataset, which consists of 9684 tweets. The organizers also provided a development set containing 1654 tweets for tuning a machine learner.

Experiment Setup
We trained the L1-regularized logistic regression classifier implemented in LIBLINEAR (Fan et al., 2008). We also tested L2 regularization, but it performed worse than L1. The classifier is trained on the training dataset using the features in the previous section, with the three polarities (positive, negative, and neutral) as labels. A weighting scheme is applied to each class: we use the weighting option -wi, which enables the use of a different cost parameter C for different classes.
Since the training data is unbalanced, this weighting scheme adjusts the probability of each label. Thus, we tuned the classifier by adjusting the cost parameter C of logistic regression, the weight w_pos of the positive class and the weight w_neg of the negative class. We used the development set for tuning the three parameters: all combinations of C in the range [0.1 .. 4] by steps of 0.1, w_pos in the range [1 .. 8] by steps of 0.1, and w_neg in the range [1 .. 8] by steps of 0.1 were tested. The combination C=0.3, w_pos=7.6, w_neg=5.2 gave the best F1-score on the development set and was therefore selected for our experiments on the 2016 test set.
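The exhaustive search over the three parameters can be sketched as follows. The `evaluate` callable is a hypothetical placeholder: in the real setting it would train the classifier with the given C and class weights and return the F1-score on the development set. Here it is left abstract so that the sketch stays independent of any particular library.

```python
from itertools import product


def value_range(start, stop, step):
    """Inclusive range of floats rounded to one decimal, e.g. 0.1, 0.2, ..., 4.0."""
    n_steps = int(round((stop - start) / step))
    return [round(start + i * step, 1) for i in range(n_steps + 1)]


def grid_search(evaluate, c_values, w_pos_values, w_neg_values):
    """Try every (C, w_pos, w_neg) combination and keep the best-scoring one.

    evaluate: hypothetical callable (C, w_pos, w_neg) -> development-set score.
    """
    best_score, best_params = float("-inf"), None
    for params in product(c_values, w_pos_values, w_neg_values):
        score = evaluate(*params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score


C_VALUES = value_range(0.1, 4.0, 0.1)      # 40 values
W_POS_VALUES = value_range(1.0, 8.0, 0.1)  # 71 values
W_NEG_VALUES = value_range(1.0, 8.0, 0.1)  # 71 values
```

With these ranges the grid contains 40 x 71 x 71 = 201,640 combinations, each requiring one training run, which is feasible for a linear classifier on a dataset of this size.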

Results
The evaluation score used by the task organizers was the averaged F1-score of the positive and negative classes. In the SemEval-2016 competition, our submission was ranked sixth (59.8%) out of 34 submissions, while it was ranked third in SemEval-2015. Table 4 shows the results of our experiments after removing one feature group at each run on the 2016 test set.

Run             Test-2016
All features    59.8
all-lexicons    56.9
all-ngram       58.1
all-brown       58.4

The results show that the sentiment lexicon features are the most important ones, which agrees with the conclusions of other studies (Hamdan et al., 2015c; Mohammad et al., 2013).
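The evaluation measure used in this section, the average of the F1-scores of the positive and negative classes, can be computed as in the sketch below. The label strings and function names are assumptions; the neutral class affects the score only through the errors it causes on the other two classes.

```python
def f1_for_class(gold, predicted, label):
    """F1-score of one class from parallel lists of gold and predicted labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def averaged_f1(gold, predicted, pos="positive", neg="negative"):
    """Mean of the F1-scores of the positive and the negative class."""
    return (f1_for_class(gold, predicted, pos) + f1_for_class(gold, predicted, neg)) / 2
```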

Related Work
There are two principally different approaches to opinion mining: lexicon-based and supervised. The lexicon-based approach starts from the word level in order to derive the polarity of the text; it depends on a sentiment lexicon to get the word polarity scores. The supervised approach starts from the text level and learns a model which assigns a polarity score to the whole text; it needs a labeled corpus to learn the model.

Lexicon-Based Approach
Lexicon-based approaches decide the polarity of a document based on sentiment lexicons. The sentiment of a text is a function of the common words between the text and the sentiment lexicons.
Much of the first lexicon-based research focused on using adjectives as indicators of the semantic orientation of text (Hatzivassiloglou and McKeown, 1997; Hu and Liu, 2004b). Taboada et al. (2011) proposed another function called SO-CAL (Semantic Orientation CALculator), which uses dictionaries of words annotated with their semantic orientation (polarity and strength) and incorporates intensification and negation.
Thus, the sentiment lexicon is the most important part of this approach. Three different ways can be used to construct such lexicons: Manual Approach, Dictionary-Based Approach and Corpus-Based Approach.

Supervised Approach
The supervised approach is a machine learning approach. Sentiment classification can be seen as a text classification problem (Pang et al., 2002;Liu, 2012).
The research papers in sentiment classification have mainly focused on the two steps: document representation and classification methods.
While some papers have extended the bag-of-words representation by adding different types of features (Pang et al., 2002; Mohammad et al., 2013; Hamdan et al., 2013; Hamdan et al., 2015c), others have proposed different weighting schemes for the features, such as PMI, Information Gain and chi-square $\chi^2$ (Martineau and Finin, 2009; Paltoglou and Thelwall, 2010; Deng et al., 2014). Recently, after the success of deep learning techniques in many classification systems, several studies have learned the features instead of extracting them (Socher et al., 2013; Severyn and Moschitti, 2015).
The work of Pang et al. (2002) was the first to apply this approach to classify movie reviews into two classes, positive or negative. They tested several classifiers (Naive Bayes, SVM, Maximum Entropy) with several features.
Later on, many studies proposed different features and some feature selection methods to choose the best feature set. Many features have been exploited:
• Terms and their weights: The features are the unigrams or n-grams with the associated frequency or weight given by a weighting scheme like TF-IDF or PMI.
• Part of Speech (POS): The words can indicate different sentiment according to their parts of speech (POS). Some papers treated the adjectives as special features.
• Sentiment Lexicons: The words and expressions which express an opinion have been used to add additional features as the number of positive and negative terms.
• Sentiment Shifters: The terms that are used to change the sentiment orientation from positive to negative or vice versa, such as not and never. Taking these features into account can improve the sentiment classification.
• Semantic Features: The named entities, concepts and topics have been extracted to get the semantic of the text.
Many systems that rely on feature extraction have achieved state-of-the-art performance in competitions like SemEval. For example, Mohammad et al. (2013) used an SVM model with several types of features, including terms, POS and sentiment lexicons, on a Twitter data set. Hamdan et al. (2015a; 2015b; 2015c) have also shown the importance of feature extraction with a logistic regression classifier on Twitter and on reviews of restaurants and laptops. They extracted terms, sentiment lexicon features and some semantic features like topics. Hamdan et al. (2013) proposed to extract concepts from DBPedia. Recently, some research papers have applied deep learning techniques to sentiment classification. Socher et al. (2013) proposed to use a recursive neural network to capture the compositionality in phrases. Tang et al. (2014) used a neural network to learn sentiment-specific word embeddings, then combined hand-crafted features with these embeddings to produce a state-of-the-art system for sentiment analysis in Twitter. Kim (2014) proposed a simple convolutional neural network with one layer of convolution which performs remarkably well. These results add to the well-established evidence that unsupervised pre-training of word vectors is an important ingredient in deep learning for natural language processing.

Conclusion and Future Work
In this paper, we tested the impact of combining several groups of features on the sentiment classification of tweets. A logistic regression classifier with a class weighting scheme was used, and the sentiment lexicon-based features appear to have the most influence within the combination. As the sentiment lexicon features seem so important in sentiment classification, we plan to orient our future work in this direction. Improving the automatic construction of sentiment lexicons may lead to an important improvement in sentiment classification; for example, taking the context into consideration may help this process. Another important direction is using deep learning techniques, which have recently proved their performance in several studies, so that the features can be learned instead of extracted.