UIR-PKU: Twitter-OpinMiner System for Sentiment Analysis in Twitter at SemEval 2015

Microblogs are regarded as We-Media information carrying many real-time opinions. This paper presents Twitter-OpinMiner, a system for the Twitter sentiment analysis evaluation at SemEval 2015. Our approach works from two angles: topic detection, for discovering the sentiment distribution on different topics, and sentiment analysis based on a variety of features. Moreover, we implement intra-sentence discourse relations for polarity identification. We divide the discourse relations into four predefined categories: continuation, contrast, condition, and cause. These relations help us eliminate polarity ambiguities in compound sentences in which both positive and negative sentiments appear. Experimental results on the SemEval 2014 and SemEval 2015 Twitter sentiment analysis task datasets show that Twitter-OpinMiner can effectively recognize opinionated messages and identify their polarities.


Introduction
This year marks the third edition of the SemEval Twitter sentiment analysis task, which introduces new subtasks, including topic-based polarity classification, trend detection towards a topic, and the sentiment strength of association of terms (Nakov et al., 2013).

We participated only in the subtask of message sentiment analysis and built a system, named Twitter-OpinMiner, for the task. Twitter-OpinMiner works from two angles: LDA-based topic detection, for discovering the opinionated features of trending tweets' topics, and sentiment analysis based on a variety of features.

• Topic detection: Recent studies show that people often search Twitter to find temporally relevant information (Teevan et al., 2011), such as emergent events and trending topics. In fact, similar opinions are likely to be expressed on the same topic or event in Twitter. For example, there are 20 tweets expressing similar opinions on "Blood moon" in the SemEval 2015 dataset. Topic detection can therefore help us discover the sentiment distribution on different topics.
• Sentiment analysis: Unlike traditional news content, tweets are short texts that often contain long compound sentences and a number of irregular expressions, including emoticons, hashtags, and special punctuation. To better support tweet analysis, we extract features from the following aspects: textual content, irregular expressions, discourse relations, and word embedding. We then feed these features into an SVM classifier for sentiment analysis.
This paper is organized as follows. Section 2 describes the framework of our system. Section 3 introduces the details of our feature extraction. We present the evaluation results in Section 4. Finally, Section 5 concludes the paper.

Architecture
The architecture of Twitter-OpinMiner is shown in Figure 1. The system comprises three modules:
(1) Pre-processing module: reads all of the training and test data and performs POS tagging, named entity recognition, and semantic role labeling.
(2) Feature extraction module: extracts the features, including formal text features, tweet-specific features, discourse features, sentiment distribution among topics, and word embedding.
(3) Sentiment analysis module: creates an SVM classifier that incorporates the above features to classify the polarity of each tweet.
Finally, Twitter-OpinMiner outputs the polarity of each tweet.

Development Data and Lexicon
The development data are necessary in our system. We fully utilize the training tweets provided by SemEval 2013. The dataset consists of 9,912 annotated tweets.

Table 1: Word-level and entity-level features
• The presence of sentiment words
• The ratio of sentiment words in a sentence
• The total number of positive words
• The total number of negative words
• The presence of negation words
• The total number of words in all caps
• Bi-gram features
• Named entities + opinion operators
• Pronouns + opinion operators
• Nouns or named entities + opinion words
• Pronouns + opinion words
• Opinion words (adjective) + (noun)

Feature Extraction
The objective of this task is to determine whether a given message is positive, negative, or neutral. We train sentiment classifiers with LibLinear (Fan et al., 2008) on the training set and dev set, and tune the SVM parameters -c and -wi on the test set of SemEval 2013. SVM is a popular machine learning algorithm whose effectiveness for sentiment analysis on formal texts has been shown in related work (Pang and Lee, 2002; Liu, 2012). Since the performance of an SVM classifier is greatly influenced by feature selection, we explore a variety of features in the evaluation.

Features of topical sentiment distribution
The advantage of Twitter is its fast response to the real world, so people often search Twitter to find temporally relevant information, such as emergent events and trending topics. In fact, tweets are likely to converge on some opinions for a specific topic, which leads to different sentiment distributions among topics.
In our system, we adopt an LDA-based approach to represent the topical sentiment distribution features. We use the Mallet toolkit, set the number of topics to 50, and map each tweet into 50 dimensions to extract those features.
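The mapping from a tweet to a fixed-length topic vector can be illustrated with a small sketch. The system itself uses Mallet; the code below is not Mallet but a minimal collapsed Gibbs sampler for LDA written for illustration only. The function name `lda_topic_features`, the hyperparameters `alpha` and `beta`, and the iteration count are all assumptions that do not come from the paper.

```python
import random

def lda_topic_features(docs, num_topics=50, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Illustrative collapsed Gibbs sampler for LDA (not Mallet).

    docs is a list of token lists; returns one topic-proportion vector
    of length num_topics per document.
    """
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]               # counts per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)] # counts per (topic, word)
    topic_total = [0] * num_topics                             # tokens per topic
    assign = []                                                # topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][vocab[w]] += 1
            topic_total[k] += 1
        assign.append(z_d)

    # Resample each token's topic from its conditional distribution.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = vocab[w], assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][wi] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wi] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wi] += 1
                topic_total[k] += 1

    # Smoothed topic proportions: one num_topics-dimensional vector per tweet.
    features = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        features.append([(doc_topic[d][t] + alpha) / denom for t in range(num_topics)])
    return features
```

With `num_topics=50`, each tweet is represented by 50 values that sum to 1, which can then be appended to the other feature groups.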

Features of formal text
Although the task is to analyze sentiment in Twitter, much research has shown that the classic features of formal texts are also effective on tweets. The features we adopt in this task partly follow Zhou et al. (2010) and are listed in Table 1; two types of features are incorporated in the classifier.

These features are also integrated into our SVM classifier for training and treated as the baseline in our experiment.
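A few of the word-level features of formal text can be sketched as follows. This is a minimal illustration, not the system's implementation: the tiny word lists `POSITIVE`, `NEGATIVE`, and `NEGATIONS` are hypothetical stand-ins for a real sentiment lexicon, and the function name `word_level_features` is an assumption. Bi-gram and entity-level features are omitted.

```python
# Hypothetical toy lexicons; a real system would load a full sentiment lexicon.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "awful", "hate", "sad"}
NEGATIONS = {"not", "no", "never"}

def word_level_features(tokens):
    """Compute simple lexicon-based features for one tokenized sentence."""
    lowered = [t.lower() for t in tokens]
    pos = sum(t in POSITIVE for t in lowered)
    neg = sum(t in NEGATIVE for t in lowered)
    n = len(tokens)
    return {
        "has_sentiment_word": int(pos + neg > 0),
        "sentiment_ratio": (pos + neg) / n if n else 0.0,
        "num_positive": pos,
        "num_negative": neg,
        "has_negation": int(any(t in NEGATIONS for t in lowered)),
        # Single-letter tokens such as "I" are not counted as all-caps words.
        "num_all_caps": sum(t.isupper() and len(t) > 1 for t in tokens),
    }
```

For example, `word_level_features("I do not LOVE this awful movie".split())` counts one positive word, one negative word, one negation, and one all-caps word.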

Twitter specific feature
Unlike formal texts, tweets have their own characteristics, including irregular expressions, emoticons, hashtags, ill-formatted words, and special punctuation. In our system, we combine the features proposed by Mohammad et al. (2013) with some new features as Twitter-specific features that supplement the formal text features.
• Hashtags: the number of hashtags in one tweet;
• Ill format: the presence of ill-formatted words with some characters replaced by *, for example, f**k;
• Punctuation: the number of contiguous sequences of exclamation marks, question marks, and both exclamation and question marks; whether the last token contains an exclamation or question mark;
• Emoticons: the presence of positive and negative emoticons at any position in the tweet; whether the last token is an emoticon;
• OOV: the ratio of out-of-vocabulary words;
• Elongated words: the presence of sentiment words with one character repeated more than two times, for example, 'cooool';
• URL: whether the tweet contains a URL;
• Reply or Retweet: whether the current tweet is a reply or a retweet.
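Most of the Twitter-specific features listed above can be approximated with regular expressions. The sketch below is illustrative only and makes several assumptions that are not in the paper: the emoticon sets are small hypothetical samples, a punctuation sequence is taken to be two or more marks, elongation is checked on any word rather than only on sentiment words, and replies and retweets are detected by a leading "@" or "RT @". The OOV ratio is omitted because it needs a reference vocabulary.

```python
import re

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MASKED_RE = re.compile(r"\w+\*+\w*")            # e.g. f**k
PUNCT_RUN_RE = re.compile(r"[!?]{2,}")          # assumed: a run is 2+ marks
ELONGATED_RE = re.compile(r"([a-zA-Z])\1{2,}")  # a letter repeated 3+ times

# Hypothetical sample emoticon sets.
POSITIVE_EMOTICONS = {":)", ":-)", ":D", ";)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}

def twitter_features(text):
    """Extract simple tweet-specific features from raw tweet text."""
    has_url = bool(URL_RE.search(text))
    body = URL_RE.sub(" ", text)  # drop URLs so they do not trigger other patterns
    tokens = body.split()
    last = tokens[-1] if tokens else ""
    runs = PUNCT_RUN_RE.findall(body)
    return {
        "num_hashtags": len(HASHTAG_RE.findall(body)),
        "has_masked_word": bool(MASKED_RE.search(body)),
        "num_exclamation_runs": sum(set(r) == {"!"} for r in runs),
        "num_question_runs": sum(set(r) == {"?"} for r in runs),
        "num_mixed_runs": sum(set(r) == {"!", "?"} for r in runs),
        "last_has_excl_or_question": ("!" in last) or ("?" in last),
        "has_positive_emoticon": any(t in POSITIVE_EMOTICONS for t in tokens),
        "has_negative_emoticon": any(t in NEGATIVE_EMOTICONS for t in tokens),
        "last_is_emoticon": last in (POSITIVE_EMOTICONS | NEGATIVE_EMOTICONS),
        "has_elongated_word": bool(ELONGATED_RE.search(body)),
        "has_url": has_url,
        "is_reply": text.startswith("@"),
        "is_retweet": text.startswith("RT @"),
    }
```

Note that the last token is computed after URLs are removed, which is a design choice of this sketch rather than something the paper specifies.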

Word embedding
We also use word embedding for feature extraction. We adopt the sentiment-specific word embedding method (Tang et al., 2014), which encodes sentiment information in the continuous representation of words. In our approach, each term is represented as a 150-dimensional vector.

Discourse specific feature
Since tweets are usually written informally, they contain many compound sentences, which often carry both positive and negative sentiment and are therefore ambiguous. For example,

It may not be the biggest squad in the last 10yrs, but Ancelotti is working for quality over quantity. Everyone... http://t.co/oCdPXQWggT.

In this case, the tweet contains two segments that hold a Contrast discourse relation, and the polarity is determined by the "but" segment. Our system therefore also takes intra-sentence discourse relation features into consideration when processing compound sentences.
Mann and Thompson (1988) defined a complete discourse scheme, Rhetorical Structure Theory (RST). Since not all of the discourse relations in RST help eliminate polarity ambiguities, our system implements only a subset of them (Zhou et al., 2011).
In our system, we use a cue-phrase-based method for discourse relation identification. We maintain a cue-phrase lexicon; examples of the cue phrases are shown in Table 2.

Table 2: Examples of cue phrases
Cause: because, so that, due to, in order that
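Cue-phrase matching of this kind can be sketched in a few lines. In the lexicon below, only the cause phrases come from the paper; the phrases for contrast, condition, and continuation are hypothetical examples added for illustration, as are the function names `find_relations` and `dominant_segment`.

```python
import re

CUE_PHRASES = {
    "cause": ["because", "so that", "due to", "in order that"],  # from the paper
    "contrast": ["but", "however", "although"],                  # hypothetical
    "condition": ["if", "unless"],                               # hypothetical
    "continuation": ["and", "moreover", "also"],                 # hypothetical
}

def find_relations(text):
    """Return sorted (position, relation, cue) triples for every cue phrase found."""
    lowered = text.lower()
    hits = []
    for relation, cues in CUE_PHRASES.items():
        for cue in cues:
            pattern = r"\b" + re.escape(cue) + r"\b"
            for match in re.finditer(pattern, lowered):
                hits.append((match.start(), relation, cue))
    return sorted(hits)

def dominant_segment(text):
    """For a 'but' contrast, return the segment after the cue; otherwise the full text."""
    for start, relation, cue in find_relations(text):
        if relation == "contrast" and cue == "but":
            return text[start + len(cue):].strip()
    return text
```

Applied to the example tweet, `dominant_segment` keeps only the segment after "but", which is the one that determines the polarity.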

Experiment
We trained an SVM classifier on 9,912 annotated tweets (8,258 in the training set and 1,654 in the development set). We used the same evaluation metrics as SemEval 2013, including the macro-averaged F-score of the positive and negative classes. The experimental results obtained by our system on the training set (ten-fold cross validation), development set, and test sets of Twitter 2013 are shown in Table 3, where the baseline uses the formal text features together with the Twitter-specific features. Since the effectiveness of these two types of features was analyzed in (Mohammad et al., 2013), we mainly evaluated the effectiveness of the other features. Table 3 shows that the most effective features on the Twitter 2013 dataset are the word embedding features, which provide gains of about 7%. For LDA, we varied the number of topics from 10 to 100 and found that the best performance was achieved with 50 topics. We then constructed the sentiment distribution among 50 topics for further evaluation.
We also investigated the effectiveness of discourse features on compound sentences; the statistics are shown in Table 6.
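The macro-averaged F-score of the positive and negative classes mentioned above can be computed as in the following sketch. The label strings and function names are assumptions chosen for illustration; neutral tweets still affect the score through the precision and recall of the other two classes.

```python
def f1_for_class(gold, pred, label):
    """Precision, recall, and F1 for a single class label."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

def macro_f_pos_neg(gold, pred):
    """Average of the F1 scores of the positive and negative classes."""
    pos_f = f1_for_class(gold, pred, "positive")[2]
    neg_f = f1_for_class(gold, pred, "negative")[2]
    return (pos_f + neg_f) / 2
```

These quantities correspond to the pos-P, pos-R, pos-F, neg-P, neg-R, neg-F, and ave-F columns reported in the result tables.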

[Table: Approaches versus Metrics (pos-P, pos-R, pos-F, neg-P, neg-R, neg-F, ave-F); first row: Baseline.]

By adopting discourse features, around 59% of the sentences with discourse relations were identified. Among the four types of relations, better performance was achieved on cause and condition relations. In particular, the sentences with a condition relation were all classified correctly. This is because cause and condition relations are more often marked explicitly by cue phrases in tweets, whereas contrast and continuation relations are more likely to be implied by context.
Table 4 and Table 5 show the evaluation results in SemEval 2015 Task 10. Compared with the best run in Table 5, our system achieved comparable results on Twitter sentiment analysis and better performance on the evaluation of sarcasm. In fact, sarcasm is often expressed through irony, so most feature types are ineffective in this case. Our system also used the features of topical sentiment distribution, which assume that a sarcastic tweet has the same polarity as non-sarcastic tweets.

Conclusion
We describe our Twitter-OpinMiner system for participating in the SemEval 2015 sentiment analysis in Twitter task. Our approach draws features from two different aspects: topical sentiment distribution and a variety of short-text-based features. We also implemented intra-sentence discourse relations for polarity identification in compound sentences in which both positive and negative sentiments appear, which helps eliminate polarity ambiguities. Based on the SemEval 2015 and SemEval 2014 datasets for the Twitter sentiment analysis task, we examined the performance of Twitter-OpinMiner, which achieved comparable results on recognizing opinionated messages and identifying their polarities.