SeNTU: Sentiment Analysis of Tweets by Combining a Rule-based Classifier with Supervised Learning

We describe a Twitter sentiment analysis sys-tem developed by combining a rule-based classiﬁer with supervised learning. We submitted our results for the message-level sub-task in SemEval 2015 Task 10, and achieved a F 1 -score of 57.06%. The rule-based classi-ﬁer is based on rules that are dependent on the occurrences of emoticons and opinion words in tweets. Whereas, the Support Vector Machine (SVM) is trained on semantic, dependency, and sentiment lexicon based features. The tweets are classiﬁed as positive , negative or unknown by the rule-based classiﬁer, and as positive , negative or neutral by the SVM. The results we obtained show that rules can help reﬁne the SVM’s predictions.


Introduction
Our opinions and the opinions of others play a very important role in our decision-making process and even influence our behaviour. In recent times, an increasing number of people have taken to expressing their opinions on a wide variety of topics on microblogging websites such as Twitter. Being able to analyse this data and extract opinions about a number of topics, can help us make informed choices and predictions regarding those topics. Due to this, sentiment analysis of tweets is gaining importance across a number of domains such as ecommerce (Wang and Cardie, 2014), politics (Tumasjan et al., 2010;Johnson et al., 2012;Wang et 1 We average the positive and negative F-measures to get the F-score, which is the evaluation metric for this task. al., 2012), health and psychology (Cambria et al., 2010;Harman, ;Harman, ), multimodality (Poria et al., 2015), crowd validation (Cambria et al., 2010), and even intelligence and surveillance (Jansen et al., 2009).
SemEval 2015 Task 10 (Rosenthal et al., 2015) is an international shared-task competition that aims to promote research in sentiment analysis of tweets by providing annotated tweets for training, development and testing. We created a sentiment analysis system to participate in the message-level task of this competition. The objective of the system is to label the sentiment of each tweet as "positive", "negative" or "neutral".
In this paper, we describe our sentiment analysis system, which is a combined classifier created by integrating a rule-based classification layer with a support vector machine.

System Description
Our Sentiment Analysis System consists of two classifiers -(i) Rule-based and (ii) Supervised, integrated together. This section describes both these classifiers and how we combine them.
During pre-processing, all the @<username> references are changes to @USER and all the URLs are changed to http://URL.com. Then, we use the CMU Twitter Tokeniser and POS Tagger (Gimpel et al., 2011) to tokenise the tweets and give a parts-of-speech tag to each token. We use the POS tags to remove all emoticons from the preprocessed tweets. Pre-processed tweets with emoticons are given as input to the rule-based classifier, whereas the support vector machine takes pre-processed tweets without emoticons as an input.

Supervised Learning
For the supervised classifier, we cast the sentiment analysis problem as a multi-class classification problem, where each tweet has to be labeled as "positive", "negative" or "neutral". We train a Support Vector Machine (SVM) (Cortes and Vapnik, 1995) on the tweets provided for training. For all our experiments, we use a linear kernel and L1regularisation. The C parameter is chosen by crossvalidation. As mentioned above, emoticons have already been removed from tweets given as input to the SVM.
Each tweet is represented as a feature vector, containing the following features: • Word N-grams: Frequencies of contiguous sequences of 1, 2 or 3 tokens. The TF-IDF weighting scheme is applied.
• Character N-grams: Frequencies of contiguous sequences of 1, 2 or 3 characters inside each word's boundary. The TF-IDF weighting scheme is applied. • @USER: A boolean feature that is set to 1 if the tweet contains a @<username> reference.
• Hashtag: A boolean feature that is set to 1 if the tweet contains a hashtag.
• URL: A boolean feature that is set to 1 if the tweet contains a URL.
• Discourse: A boolean feature that is set to 1 if the tweet contains a "discourse marker". Examples of discourse markers would be a "RT" followed by a username to indicate that the tweet is a re-tweet, news article headline followed by "..." followed by a URL to the news article, etc. Basically, this feature indicates whether or not the tweet is a part of a discourse.
• Sentiment140 Lexicon: The Sentiment140 Lexicon  contains unigrams and bigrams along with their polarity scores in the range of −5.00 to +5.00. Considering all uni/bi-grams with polarity less than −1.0 to be negative and with polarity greater than +1.0 to be positive, we count the number of negative (negativesCount) and the number of positive (positivesCount) uni/bi-gram occurrences in every tweet. For each tweet, -the polarityMeasure is based on the pos-itivesCount and negativesCount, and calculated using Algorithm 1. -the maximum polarity value (maxPolari-tyValue) is the most positive or most negative polarity value of all polar uni/bi-gram occurrences in the tweet.
Both these features are normalised to values between −1 and +1. • Bing Liu Lexicon: The Bing Liu lexicon (Liu et al., 2005) is a list of positive and negative words. We count the number of positive (positivesCount) and negative words (neg-ativesCount) in each tweet, and calculate po-larityMeasure using Algorithm 1. The polari-tyMeasure is appended to the feature vector.
• NRC Emotion Lexicon: The NRC Emotion Lexicon (Mohammad and Turney, 2013) contains a list of positive and negative words. The polarityMeasure is calculated using the method used for the Bing Liu Lexicon.
• NRC Hashtag Lexicon: The NRC Hashtag Lexicon  contains unigrams and bigrams along with their polarity scores in the range of −5.00 to +5.00. Using the method used for the Sentiment140 Lexicon, we calculate polarityMeasure and maxPolarity-Value, and append them to the feature vector.
• SentiWordNet: SentiWordNet (Esuli and Sebastiani, 2006) assigns to each synset of Word-Net (Fellbaum, 2010) 3 scores: positivity, negativity, objectivity. A word whose positivity score is greater than negativity and objectivity is positive, while a word whose negativity score is greater than positivity and objectivity is negative. For each tweet, we calculate po-larityMeasure and maxPolarityValue using the method used for the Bing Liu Lexicon.
• SenticNet: SenticNet (Cambria et al., 2014) contains polarity scores of single and multiword phrases. We count the number of positive and negative words/phrases in each tweet, and calculate polarityMeasure using the method used for the Sentiment140 Lexicon.
• Negation: The Stanford Dependency Parser (De Marneffe et al., 2006) is used to find negation in tweets. Negation is not a feature on its own. Rather, it affects the word n-grams and the lexicons related features. The negated word is appended with a " NEG" in all n-grams, while the polarity of all negated words is inverted in the lexicon features.

Rule-based Classifier
For the rule-based classifier, we cast the problem as a multi-class classification problem, where each tweet is to be labeled as "positive", "negative" or "unknown". This is an unsupervised classifier, which applies the following rules for predictions: • Sentiment Lexicon-related Rules: The Bing Liu lexicon, the NRC Emotion lexicon, and SentiWordNet are used as resources for positive and negative opinion words. If a tweet contains more than two positive words, and no negation or negative words from either of the lexicons, it is classified as positive. If a tweet contains more than two negative words, and no negation or positive words from either of the lexicons, it is classified as negative. If none of the above rules apply, the tweet is classified as unknown.

Combining the Classifiers
After developing the rule-based classifier and training the SVM, we combine the them to refine the SVM's predictions. Since, our goal is to maximise positive and negative precision and recall, we use the rule-based classifier to correct or verify the "neutral" SVM predictions. So, for every tweet labeled as neutral by the SVM, we consider the predictions of the rule-based layer as the final labels.

Experiments and Results
We trained a Support Vector Machine (SVM) on 9418 tweets allowed to be used for training purposes. The results we submitted to SemEval 2015 were yielded by using all SVM features and emoticon-related rules. The sentiment lexiconrelated rules were implemented later, and thus could not be used for the official submission. Table 2 shows the official test results for SemEval 2015.  Table 1: Feature ablation study for the SVM classifier. Each row shows the precision, recall, and F-score for the positive, negative, and neutral classes respectively, followed by the average positive and negative F-score, which is the chosen evaluation metric. All values in the table are between 0 and 1, and are rounded off to 3 decimal places.   Table 1 reports the results of a feature ablation study carried out by testing the SVM classifier on 3204 development tweets (from SemEval 2013) not included in the training data. These are cross-validation results obtained using the hold-out method.This study helps us understand the importance of different features. From the table, we can see that the word and character n-grams features are the most useful, followed by negation and then the rest. All sentiment lexicon related features appear to have similar importance, but we get the best F-score when we append them all to the feature vector.

Features
Fpn  Table 3: Comparison between the results obtained using SVM alone, and using SVM with a rule-based layer.
Since, using all the previously described features gives the best SVM predictions, we add the rule-based classification layer to a SVM trained on all features. Table 3 compares the results obtained using the SVM alone with the results obtained using SVM along with all the rules (emoticon and lexiconbased) specified in section 2.2. We observe that the F-score further increases by around half a unit and the classification rate 2 increases by around 0.8.

Conclusion
In this paper, we described a sentiment analysis system developed by combining a SVM with a rulebased classification layer. Even though we do not get the best scores, we find that a rule-based classification layer can indeed refine the SVM's predictions. We also devise creative twitter-specific, negation and lexicon-related features for the SVM, and demonstrate how they improve the sentiment analysis system. In future, we aim to use enriched sentiment and emotion lists like the ones used by (Poria et al., 2012). We would also like to experiment with refining the SVM's predictions using more rules based on complex semantics.