JU_NLP at SemEval-2016 Task 6: Detecting Stance in Tweets using Support Vector Machines

We describe the system submitted to SemEval-2016 for detecting stance in tweets (Task 6, Subtask A). The goal of stance detection is to automatically determine the stance of a tweet towards a specific target as 'FAVOR', 'AGAINST', or 'NONE'. We developed a supervised system using Support Vector Machines to identify the stance by analyzing various lexical and semantic features. The average F1 score achieved by our system is 60.60%.


Introduction
Research on opinion mining and sentiment analysis of natural language text is gaining importance for both academic and commercial reasons. One of the main reasons is the large amount of information that can be obtained from text available on the internet in the form of news, reviews, blogs, chats, and tweets.
Many studies have addressed sentiment analysis or opinion mining on user-generated content and social media data (Patra et al., 2015). One of the main goals in such tasks is to assign a polarity (positive or negative) to a piece of text. Identifying the writer's attitude towards a specific target, known as stance detection, is a relevant and challenging topic that has received comparatively little attention from researchers (Mohammad et al., 2016). People now routinely express stance, explicitly or implicitly, on microblogging sites. Stance detection is the task of automatically determining from text whether the author is in favor of the given target, against the given target, or whether neither inference is likely. Stance detection can often bring complementary information to sentiment analysis, because we often care about the author's evaluative outlook towards specific targets and propositions rather than simply whether the speaker is angry or happy (Mohammad et al., 2016). Stance detection becomes even more difficult on short texts such as tweets, the analysis of which is itself an important sub-field of sentiment analysis and opinion mining. Automatic stance detection can be used in several applications such as information retrieval, text summarization, and textual entailment.
Recent approaches to stance detection have applied linguistic rules to online debate datasets (Somasundaran and Wiebe, 2009; Hasan and Ng, 2013; Sridhar et al., 2014) and to the NPOV corpus from Wikipedia (Recasens et al., 2013). To the best of our knowledge, few computational approaches to stance detection have been attempted on tweets. Both supervised and unsupervised machine learning algorithms have been used for identifying stance, with features such as n-grams, frame-semantic features, and dependency relations (Somasundaran and Wiebe, 2009; Anand et al., 2011; Sridhar et al., 2014).
We participated in SemEval-2016 Task 6: Detecting Stance in Tweets. The main goal of this task is to identify the stance expressed in a tweet towards a specific target. For example, Target = 'Hillary Clinton', Tweet = 'A Black President,Healthcare 4 all, Marriage Equality...what's next?..a Woman President?!Damn RIGHT! #SCOTUS #MarriageEquality', Stance = 'FAVOR'. In this example, the tweet is in favor of the target "Hillary Clinton". In the test dataset, the target of each tweet is given and we have to determine whether the stance towards that target is 'FAVOR', 'AGAINST', or 'NONE'.
The task has two subtasks. In subtask A, we have to identify the stance towards one of several specific targets, whereas in subtask B, we have to identify the stance towards a single target, "Donald Trump", using a large set of unlabeled tweets associated with that target. We participated only in subtask A. We use several lexical and semantic features to identify the stance towards a target, and Support Vector Machines (SVM) for classification.

Dataset and Evaluation
The organizers provided trial, training, and test datasets of 100, 2814, and 1249 tweets, respectively, for detecting stance towards five targets, namely "Atheism", "Climate Change is a Real Concern", "Feminist Movement", "Hillary Clinton", and "Legalization of Abortion". All tweets in the datasets are annotated with a stance ('FAVOR', 'AGAINST', or 'NONE') towards one of these targets. The stance 'NONE' means that the tweet is neither in favor of nor against the target.
Systems were evaluated using the official metric, the macro-average of the F-scores of two labels only, i.e. F1 score = (F-Score FAVOR + F-Score AGAINST) / 2.
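The official metric can be sketched in a few lines of Python. This is only an illustration of the formula above, not the organizers' evaluation script; the function names and the toy label lists are assumptions made for the example.

```python
def f1_for_label(gold, pred, label):
    """F1 for one label, computed from parallel lists of gold and predicted labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def official_score(gold, pred):
    """Mean of the FAVOR and AGAINST F1 scores; the NONE F1 is not averaged in."""
    return (f1_for_label(gold, pred, "FAVOR") + f1_for_label(gold, pred, "AGAINST")) / 2


# Toy labels, invented for illustration only.
gold = ["FAVOR", "AGAINST", "NONE", "AGAINST", "FAVOR"]
pred = ["FAVOR", "AGAINST", "AGAINST", "NONE", "NONE"]
score = official_score(gold, pred)
```

Note that 'NONE' tweets still influence the score indirectly: a 'NONE' tweet predicted as 'AGAINST' counts as a false positive for 'AGAINST'.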

Features
Machine learning algorithms need a good set of features to achieve good accuracy. Given the characteristics of tweets, we identified the features listed below.

Target Specific Words
Initially, an n-gram model was created to identify the target's presence in the tweet. However, we observed that this greatly increased both the dimensionality and the sparsity of the feature vectors. Thus, we created an individual topic bag for each of the targets. Seed words related to the targets were populated using RitaWordNet. Some handcrafted rules were implemented to find names in words and hashtags. For example, if "Hillary" is present in "#HillaryClinton", then we added "#HillaryClinton" to the topic bag "Hillary". Further, we checked the topic bags manually and removed some unrelated words collected using WordNet. The detailed word-level statistics of the topic bags are given in Table 1. As a first step, we checked whether a tweet contains any word from the topic bag of its target. If no such word is present, we tagged the tweet as 'NONE' and performed no further processing on it. These topic bags are also used along with the dependency information.
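The topic-bag check described above could look roughly like the following sketch. The bag contents, the tokenizer, and the hashtag substring rule are assumptions chosen for illustration; the actual bags were built with RitaWordNet and manual curation.

```python
import re

# Hypothetical topic bag; the real bags are summarized in Table 1 of the paper.
TOPIC_BAGS = {
    "Hillary Clinton": {"hillary", "clinton"},
}


def tokens(tweet):
    """Lowercased word and hashtag tokens (a deliberately simple tokenizer)."""
    return re.findall(r"#?\w+", tweet.lower())


def mentions_target(tweet, target):
    """True if a token equals a seed word or a hashtag contains one (e.g. #HillaryClinton)."""
    bag = TOPIC_BAGS[target]
    for tok in tokens(tweet):
        if tok in bag:
            return True
        if tok.startswith("#") and any(seed in tok for seed in bag):
            return True
    return False


def prefilter(tweet, target):
    """Return 'NONE' when no topic-bag word is found, else None to signal further processing."""
    return None if mentions_target(tweet, target) else "NONE"
```

For instance, `prefilter("Go #HillaryClinton!", "Hillary Clinton")` passes the tweet on to the classifier, while a tweet with no bag word is labeled 'NONE' immediately.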

Sentiment Words
We used three lexicons, namely SentiWordNet (Baccianella et al., 2010), the NRC Emotion Lexicon (Mohammad and Turney, 2013), and the NRC Hashtag Emotion Lexicon (Mohammad, 2012), along with manually created lexicons for each of the targets. Moreover, two bags, 'favor' and 'against', were created for each of the targets. Hashtags are also included in these bags. The word-level statistics of the 'favor' and 'against' bags are given in Table 1. We used the frequencies of the words in a tweet that match the above sentiment or emotion lexicons as features. These lexicons are also used in the later experiments.
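A minimal sketch of the lexicon-frequency features follows. The miniature word sets are hypothetical stand-ins for SentiWordNet, the NRC lexicons, and the target-specific bags, and the feature names are invented for the example.

```python
# Hypothetical miniature lexicons standing in for the real resources.
LEXICONS = {
    "positive": {"support", "love", "great"},
    "negative": {"hate", "liar", "corrupt"},
    "favor_bag": {"#imwithher", "campaign"},
    "against_bag": {"#stophillary", "emails"},
}


def lexicon_counts(tweet_tokens):
    """Count how many tokens match each lexicon; one frequency feature per lexicon."""
    return {
        name: sum(1 for tok in tweet_tokens if tok in words)
        for name, words in LEXICONS.items()
    }


features = lexicon_counts(["i", "support", "her", "campaign", "#imwithher"])
```

Each tweet thus yields one count per lexicon, which can be appended to the feature vector passed to the classifier.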

Dependency Information
The literature shows that dependency relations are a useful feature in sentiment analysis (Patra et al., 2014a; Patra et al., 2014b). Thus, we used the Stanford Parser to obtain the dependency relations. We searched the dependency relations for word pairs in which one word is present in either the 'favor' or the 'against' bag and the other is present in SentiWordNet.
In Figure 1, we found the relation "dobj(support-7, campaign-9)". The word 'campaign' is present in the 'favor' bag for the target "Hillary Clinton" and the word 'support' is present in SentiWordNet as positive. This relation is therefore considered to be of the favor-positive type. Similarly, three other types, namely favor-negative, against-positive, and against-negative, were considered. The counts of each type are used as features.
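The four relation types can be illustrated with the sketch below. It assumes the dependency relations have already been extracted as (relation, governor, dependent) triples with the token indices stripped; the bags and the polarity lookup are hypothetical stand-ins for the target bags and SentiWordNet.

```python
from collections import Counter

# Hypothetical resources for one target.
FAVOR_BAG = {"campaign"}
AGAINST_BAG = {"emails"}
POLARITY = {"support": "positive", "hate": "negative"}


def pair_type(word_a, word_b):
    """Return e.g. 'favor_positive' if one word is in a stance bag and the other has a polarity."""
    for stance_word, senti_word in ((word_a, word_b), (word_b, word_a)):
        polarity = POLARITY.get(senti_word)
        if polarity is None:
            continue
        if stance_word in FAVOR_BAG:
            return f"favor_{polarity}"
        if stance_word in AGAINST_BAG:
            return f"against_{polarity}"
    return None


def dependency_features(relations):
    """Count the four relation types over (relation, governor, dependent) triples."""
    counts = Counter({"favor_positive": 0, "favor_negative": 0,
                      "against_positive": 0, "against_negative": 0})
    for _, governor, dependent in relations:
        kind = pair_type(governor, dependent)
        if kind:
            counts[kind] += 1
    return dict(counts)


feats = dependency_features([("dobj", "support", "campaign"), ("nsubj", "support", "i")])
```

With the example relation from the text, dobj(support, campaign), the sketch increments the favor-positive count and leaves the other three at zero.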
We also identified the simple sentences based on the symbols "(S" or "(ROOT" in Figure 1. For each of the simple sentences, we removed the stopwords (except the negation words) and identified the sentiment words using SentiWordNet. We then applied the following handcrafted rules: (1) Not + NEG = POS, (2) Not + POS = NEG, (3) POS + POS = POS, (4) NEG + NEG = NEG. If there are multiple positive words in a simple sentence, we tagged it as positive. If there are only two sentiment words, one positive and the other negative, we tagged the sentence as negative, because the negative instances are much more frequent than the positive instances in the training data. Finally, we counted the number of positive and negative instances in a tweet and used these counts as features.
Stanford Parser: http://stanfordnlp.github.io/CoreNLP/
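One possible reading of these rules is sketched below. The negation list, the tiny polarity lookup, the choice to let a negation flip the next sentiment word, and the extension of the mixed-polarity rule beyond two words are all assumptions; the paper does not specify these details.

```python
NEGATIONS = {"not", "no", "never"}  # assumed negation words
POLARITY = {"good": "POS", "great": "POS", "bad": "NEG", "awful": "NEG"}  # stand-in for SentiWordNet


def flip(polarity):
    """Invert a polarity tag."""
    return "NEG" if polarity == "POS" else "POS"


def sentence_polarity(words):
    """Apply the negation rules, then combine the remaining polarities."""
    polarities = []
    negate_next = False
    for word in words:
        if word in NEGATIONS:
            negate_next = True
            continue
        polarity = POLARITY.get(word)
        if polarity is None:
            continue
        # Rules 1 and 2: a preceding negation flips the sentiment word.
        polarities.append(flip(polarity) if negate_next else polarity)
        negate_next = False
    if not polarities:
        return None
    # Rules 3 and 4, plus the mixed case: all-positive is POS, anything else is NEG.
    # Treating every mixed sentence as NEG generalizes the paper's two-word rule (an assumption).
    return "POS" if all(p == "POS" for p in polarities) else "NEG"


def tweet_counts(sentences):
    """Count positive and negative simple sentences in a tweet."""
    tags = [sentence_polarity(s) for s in sentences]
    return {"positive": tags.count("POS"), "negative": tags.count("NEG")}
```

Here `tweet_counts` takes a list of tokenized simple sentences and returns the two counts that would serve as features.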

Classification Framework
We used LibSVM, an SVM implementation available in Weka. We first performed 10-fold cross-validation on the combined training and trial datasets for all the targets together, using the above features. However, the performance was poor, with an average F1 score of 0.43. We then performed 10-fold cross-validation on the training and trial datasets for each of the targets separately. The average F1 scores for the five targets are 0.46, 0.40, 0.48, 0.52, and 0.49 for "Atheism", "Climate Change is a Real Concern", "Feminist Movement", "Hillary Clinton", and "Legalization of Abortion", respectively. This yields an average F1 score of 0.47 over all the targets, which is higher than the 0.43 achieved by the single combined model. This motivated us to develop our final systems separately for each of the targets.
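The reported average over targets can be checked with a short calculation. The sketch below only reproduces the arithmetic from the scores quoted above; the variable names are illustrative.

```python
# Per-target cross-validation F1 scores as reported in the text.
per_target_f1 = {
    "Atheism": 0.46,
    "Climate Change is a Real Concern": 0.40,
    "Feminist Movement": 0.48,
    "Hillary Clinton": 0.52,
    "Legalization of Abortion": 0.49,
}

combined_model_f1 = 0.43  # single model trained on all targets together

mean_f1 = sum(per_target_f1.values()) / len(per_target_f1)
improvement = mean_f1 - combined_model_f1
```

The unweighted mean is 2.35 / 5 = 0.47, about 0.04 above the combined model.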

Results
The LibSVM-based system achieved F-scores of 0.4668 and 0.7452 for the 'FAVOR' and 'AGAINST' classes, respectively, as shown in Table 2. The resulting average F1 score of the system is 0.6060. By comparison, the team MITRE achieved the first position with an average F1 score of 0.6782.
The confusion matrix for our system is shown in Table 3. From the matrix, we observed that the system is biased towards the 'AGAINST' stance (except for the "Climate Change is a Real Concern" target). The main reason may be the class imbalance in the training dataset, as 'AGAINST' is the most frequent stance for all the targets (except "Climate Change is a Real Concern"). Another reason may be the small number of training instances available per target, as we developed our system for each of the targets separately.

Conclusion and Future Work
We presented a system for identifying the stance in tweets using dependency and semantic features. The system achieved an average F1 score of 0.6060 using an SVM classifier. Stance detection on social media data is helpful for real-life applications such as political campaigns and opinion polls.
In the near future, we plan to use the TweeboParser, as it works better on tweets than the Stanford Parser. Another immediate goal is to increase the size of the topic bags and sentiment lexicons related to each of the targets. We are also planning to use unsupervised approach and several machine learning