Rule-Based Weibo Messages Sentiment Polarity Classification towards Given Topics

Sentiment polarity classification of weibo messages towards given topics is the task of automatically deciding whether a weibo message expresses positive, negative, or neutral sentiment towards a given topic. The algorithm that the sentiment analysis system CUCsas adopts for this task has three steps: (1) determining whether there is an “exp” (short for “expression having evaluation meaning”) in the weibo message; (2) determining whether there is a semantic orientation relationship between the exp and the topic; (3) classifying the sentiment polarity of the exp. CUCsas completes step (1) based on the sentiment lexicon and sentiment value assignment rules, step (2) based on the topic extraction and sentiment polarity classification rule base, and step (3) based on the sentiment computing rules. Taking 20 given topics and a total of 19,469 weibo messages released by the SIGHAN-2015 Bake-off as the test data, the rule-based system CUCsas achieves an overall F value of 0.69 in the unrestricted test.


Algorithm Description
Locutionary subjectivity denotes the locutionary agent's self-expression of cognition, feeling or perception in the use of language (John Lyons, 1995), and evaluation is one type of locutionary subjectivity. An evaluation discourse consists of four basic elements: E(s) = {sub, obj, exp, com}. Here, "E(s)" represents an evaluation discourse, and "sub", "obj", "exp" and "com" refer to the subject of evaluation, the object of evaluation, an expression having evaluation meaning, and other discourse components, respectively. This paper assumes that obj (the given topic) in the weibo message is known, and has the system automatically recognize whether there is an exp in the same weibo message. If there is not, the system outputs the result [topic 0]; if there is, the system goes on to identify whether there is a semantic orientation relationship between the exp and the given topic. If there is not, the system outputs the result [topic 0]; if there is, the system further classifies the sentiment polarity of the exp. If it is positive, the system outputs the result [topic 1]; if it is negative, the system outputs the result [topic -1]; if it is neutral, the system outputs the result [topic 0]. Clearly, this algorithm differs from some widely used machine learning algorithms for sentiment polarity classification, such as Naïve Bayes, Maximum Entropy, Boosted Trees and Random Forest (Amit Gupte et al., 2014). Figure 1 shows the algorithm of the system for rule-based sentiment polarity classification of weibo messages towards given topics.
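The three-way decision flow described above can be illustrated with a minimal Python sketch. It is not the CUCsas implementation: the toy lexicon, the whitespace tokenization, and the helper names `find_exps`, `oriented_value` and `classify` are all assumptions made for illustration, and the orientation test is a deliberately crude stand-in for the rule base.

```python
from typing import List, Optional

# Hypothetical toy lexicon; the real SentiDic entries are not reproduced here.
TOY_LEXICON = {"good": 1.0, "bad": -1.0}


def find_exps(message: str) -> List[str]:
    """Step 1 stand-in: return the lexicon words found in the message."""
    return [word for word in message.split() if word in TOY_LEXICON]


def oriented_value(message: str, topic: str, exps: List[str]) -> Optional[float]:
    """Step 2 stand-in: treat the last exp as oriented to the topic only if
    the topic occurs in the message; return None when no relation is found."""
    if topic not in message.split():
        return None
    return TOY_LEXICON[exps[-1]]


def classify(message: str, topic: str) -> int:
    """Return 1, -1 or 0 for the topic, following the three-step flow."""
    exps = find_exps(message)
    if not exps:                       # no exp at all -> [topic 0]
        return 0
    value = oriented_value(message, topic, exps)
    if value is None:                  # exp is not oriented to the topic -> [topic 0]
        return 0
    if value > 0:                      # step 3: polarity of the oriented exp
        return 1
    if value < 0:
        return -1
    return 0


print(classify("the phone is good", "phone"))
print(classify("the weather is bad", "phone"))
```

The point of the sketch is only the ordering of the checks: a message reaches polarity classification only after both the exp test and the orientation test have succeeded.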
Based on the sentiment lexicon SentiDic and sentiment value assignment rules in PhraseRule, the system CUCsas realizes the automatic recognition of whether there is an exp in the weibo discourse. Figure 2 shows the recognition procedure:

The Identification of Whether There Is a Semantic Orientation Relationship between Exp-Topic
The existence of an exp in the weibo message does not imply a semantic orientation relationship between the exp and the topic, because the evaluation object of the exp may be either the topic or a non-topic. CUCsas combines syntactic structure and semantic features to build a topic extraction and polarity classification rule base. In essence, the rule base uses a formal language to describe the definite semantic orientation relationships between exp and topic that we induced by analyzing the training corpus. The topic extraction and polarity classification rule base consists of 10 rule modules with a total of 36 rules (see Table 1).

Explanation
When the evaluation object of the exp is non-topic, the system assigns a sentiment value of 0 to the topic, so that the weibo message does not go on to match later rule modules and cause errors. (1) QSB: a macro definition symbol (covering punctuation, conjunctions, evaluation-triggering words, time words and discourse markers) used as the initial item of the rule; (2) NP: a macro definition symbol (covering common nouns and proper nouns such as the name of a person, organization or product) representing a nominal element; (3) */topic: the given weibo topic; (4) topic-exp co-occurrence in the same clause

Explanation
When the topic and the exp appear in the same clause, the rule selects the exp nearest to the topic as the one semantically oriented to it. (The exception is when the topic is the subject of a sentence that expresses a causing or obtaining meaning or that has a "preposition + object" adverbial.) In addition, following the Chinese pragmatic habit that the semantic focus is usually located at the end of the discourse, when exps appear both before and after the topic, i.e. exp1-topic-exp2, the rule selects only exp2 as the output result.
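The selection preference described above can be sketched in Python as follows. This is an illustrative sketch, not the rule as written in the rule base: the function name `select_exp`, the token list input and the `exp_values` dictionary are assumptions, and the stated exception for causing/obtaining sentences and "preposition + object" adverbials is not modelled.

```python
from typing import Dict, List, Optional, Tuple


def select_exp(tokens: List[str], topic: str,
               exp_values: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Pick the exp oriented to the topic inside one clause.

    Exps after the topic are preferred (the exp1-topic-exp2 case);
    otherwise the nearest exp before the topic is used.
    """
    if topic not in tokens:
        return None
    t = tokens.index(topic)
    # (distance to topic, word) pairs on each side of the topic
    before = [(t - i, tok) for i, tok in enumerate(tokens[:t]) if tok in exp_values]
    after = [(i - t, tok) for i, tok in enumerate(tokens) if i > t and tok in exp_values]
    candidates = after or before       # semantic focus at the end wins
    if not candidates:
        return None
    _, word = min(candidates)          # nearest exp on the chosen side
    return word, exp_values[word]


values = {"terrible": -1.0, "great": 1.0, "awful": -1.0, "nice": 1.0}
print(select_exp(["terrible", "service", "phone", "screen", "great"], "phone", values))
print(select_exp(["awful", "but", "nice", "phone"], "phone", values))
```

In the first call the exp after the topic is chosen even though another exp precedes the topic; in the second call only preceding exps exist, so the nearest one is chosen.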

Explanation
When the topic is the subject of a sentence with a "preposition + object" adverbial, the rule will select the exp in the central components modified by the adverbial as the output result.

Module 8
The topic and the exp are distributed in different clauses or sentences. Type one: topic + exp

Explanation
The topic appears first, and the exp then appears in a clause or sentence that is adjacent or nonadjacent to the clause or sentence containing the topic. In this case, the rule judges that the evaluation object of the exp is the topic only when the weibo message satisfies certain syntactic and semantic constraints.

Rule sample explanation
Constraints of the rule sample: (1) There is no exp appearing together with the topic in the clause; (2) There is no NP appearing before the exp in the clause; (3) The word class after the exp is only auxiliary, modal or interjection, and the three interrogative words 吗, 呢 and 么 are forbidden.
Matching example
Topic: 油价
<weibo>: 涨油价的时候也不提消费税了，流氓啊 </weibo>
[output: 油价 -1]
Rule number
26-32
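The three constraints of this rule sample can be checked mechanically once clauses are segmented and tagged. The Python sketch below is only an illustration under stated assumptions: the tag labels (`exp`, `np`, `aux`, `modal`, `interj`, and so on), the function name `module8_sample_applies`, and the hand segmentation and tagging of the example message are all hypothetical and are not taken from the CUCsas rule language.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, tag); the tag names are hypothetical labels

ALLOWED_AFTER_EXP = {"aux", "modal", "interj"}
FORBIDDEN_WORDS = {"吗", "呢", "么"}


def module8_sample_applies(topic_clause: List[Token], exp_clause: List[Token]) -> bool:
    """Check the three constraints for a topic clause followed by an exp clause."""
    # (1) no exp may share the clause with the topic
    if any(tag == "exp" for _, tag in topic_clause):
        return False
    exp_positions = [i for i, (_, tag) in enumerate(exp_clause) if tag == "exp"]
    if not exp_positions:
        return False
    first = exp_positions[0]
    # (2) no NP may appear before the exp in its clause
    if any(tag == "np" for _, tag in exp_clause[:first]):
        return False
    # (3) only auxiliary, modal or interjection words may follow the exp,
    #     and the three interrogative words are forbidden
    for word, tag in exp_clause[first + 1:]:
        if tag not in ALLOWED_AFTER_EXP or word in FORBIDDEN_WORDS:
            return False
    return True


# Hand-segmented and hand-tagged version of the matching example (an assumption).
topic_clause = [("涨", "v"), ("油价", "topic"), ("的", "aux"), ("时候", "n"),
                ("也", "adv"), ("不", "adv"), ("提", "v"), ("消费税", "np"), ("了", "aux")]
exp_clause = [("流氓", "exp"), ("啊", "interj")]
print(module8_sample_applies(topic_clause, exp_clause))
```

When the function returns True, the value of the exp (negative for 流氓 in the example) would be assigned to the topic, which is consistent with the output 油价 -1 shown in the matching example.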

Module 9
The topic and the exp are distributed in different clauses or sentences. Type two: exp + topic

Explanation
The exp appears first, and the topic then appears in a clause or sentence that is adjacent or nonadjacent to the clause or sentence containing the exp. In this situation, the rule judges that the evaluation object of the exp is the topic only when the weibo message satisfies certain syntactic and semantic constraints.
Rule sample
*/^ + #[*/!nq] + */na + #[*/!w] + */vl + #[*/!nq] + */topic&nq = N7:N5

Rule sample explanation
Constraints of the rule sample: (1) */^: The initial item of the rule is the weibo start marker; (2) #[*/!nq]: A word with the semantic marker of product name is forbidden; (3) */na: A word with the semantic marker of product attribute must appear; (4) */topic&nq: The topic word must also be a product name.

Explanation
When the referent of a pronoun is the topic, the rule will assign the sentiment value of the exp semantically orientated to the pronoun to the topic. (1) The 36 rules of the 10 rule modules are sequentially arranged, forming the topic extraction and sentiment polarity classification rule base.
(2) Matching procedure: Each weibo message is matched against the rule base starting from the first rule. If the match succeeds, the system outputs the corresponding result; if it fails, the message moves on to the second rule, and so on through the rule base. If matching still fails at the end of the rule base (i.e. rule 36), the system judges that there is no semantic orientation relationship between the exp and the topic in this weibo message and outputs the corresponding result: topic 0. The next weibo message is matched against the rule base in the same way, until the last weibo message in the experimental data has been processed.
Table 1. Topic Extraction and Sentiment Polarity Classification Rule Base
Based on the topic extraction and polarity classification rule base, the system CUCsas realizes the automatic identification of whether there is a semantic orientation relationship between the exp and the topic in the weibo message. If the weibo message fails to match the rule base, the system outputs topic 0; if it matches, the system assigns the value of the corresponding exp to the topic. If the value > 0, the system outputs topic 1; if the value < 0, the system outputs topic -1; if the value = 0, the system outputs topic 0. Figure 3 shows the general procedure.
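The first-match-wins procedure and the mapping from exp value to output label can be sketched in Python as follows. This is a minimal sketch, not the CUCsas matcher: rules are modelled as hypothetical functions that return an exp value on a match and None otherwise, and the three toy rules stand in for the 36 real ones.

```python
from typing import Callable, List, Optional

# A rule returns the value assigned to the topic on a match, or None on failure.
Rule = Callable[[str, str], Optional[float]]


def match_rule_base(message: str, topic: str, rules: List[Rule]) -> int:
    """Try the rules in order; the first rule that matches decides the output."""
    for rule in rules:
        value = rule(message, topic)
        if value is not None:
            return (value > 0) - (value < 0)   # sign of the value: 1, -1 or 0
    return 0                                   # no rule matched -> topic 0


# Hypothetical toy rules, ordered as they would be in a rule base.
def rule_non_topic(message: str, topic: str) -> Optional[float]:
    # An early rule that assigns 0 and thereby blocks all later rules.
    return 0.0 if "rival" in message.split() else None


def rule_praise(message: str, topic: str) -> Optional[float]:
    words = message.split()
    return 2.0 if topic in words and "great" in words else None


def rule_blame(message: str, topic: str) -> Optional[float]:
    words = message.split()
    return -1.5 if topic in words and "awful" in words else None


RULES = [rule_non_topic, rule_praise, rule_blame]
print(match_rule_base("phone great", "phone", RULES))
print(match_rule_base("rival great phone", "phone", RULES))
```

Because matching stops at the first success, the order of the rules matters: the early rule that returns 0 prevents a later rule from attributing a non-topic evaluation to the topic.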

The Sentiment Polarity Classification of the Exp
The term "corresponding result" in Figure 3 has two meanings: (i) "corresponding" means that there is a semantic orientation relationship between the exp and the topic; (ii) "result" refers to the sentiment value and polarity of the exp in the context of the weibo message, which do not necessarily equal its value and polarity in the sentiment lexicon. (i) is guaranteed by the 36 rules of the 10 modules. (ii) is obtained by the sentiment computing rules (see Table 2) in PhraseRule.txt.

Description
The sentiment polarity of the exp in the weibo message context is contrary to its sentiment polarity in the sentiment lexicon.

Description
The evaluation meaning of the exp is dissolved in the weibo message context.
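The two context effects described above, polarity reversal and dissolution of the evaluation meaning, can be illustrated with a short Python sketch. The trigger words are purely hypothetical (English stand-ins chosen for readability), as is the function name `contextual_value`; the actual sentiment computing rules in PhraseRule.txt are not reproduced here.

```python
from typing import Dict, List

NEGATORS = {"not"}       # hypothetical cue that reverses the lexicon polarity
DISSOLVERS = {"if"}      # hypothetical cue that cancels the evaluation meaning


def contextual_value(tokens: List[str], exp_index: int,
                     lexicon: Dict[str, float]) -> float:
    """Return the value of the exp in context rather than its lexicon value."""
    value = lexicon[tokens[exp_index]]
    context = tokens[:exp_index]
    if any(tok in DISSOLVERS for tok in context):
        return 0.0                   # evaluation meaning dissolved
    if context and context[-1] in NEGATORS:
        return -value                # polarity contrary to the lexicon
    return value


lexicon = {"good": 1.0}
print(contextual_value(["is", "good"], 1, lexicon))
print(contextual_value(["is", "not", "good"], 2, lexicon))
print(contextual_value(["if", "it", "is", "good"], 3, lexicon))
```

The sketch shows why the contextual value, not the lexicon value, is what gets assigned to the topic: the same lexicon entry can yield a positive, negative or zero result depending on its context.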

Experimental Results and Analysis
Taking 20 given topics and a total of 19,469 weibo messages released by the SIGHAN-2015 Bake-off as the test data, the experimental results of the sentiment analysis system CUCsas are as follows. Comparing Table 4 with Table 5, we can see that the introduction of the phrase rule base improved the overall system performance, but only to a small extent. Comparing Table 5 with Table 3, we can see that the introduction of the topic extraction and polarity classification rule base further improved the overall system performance to a large extent.
At present, the overall F value of the system is about 0.69. Evaluation results in Table 3 suggest that the performance of the system is good in dealing with neutral sentiment weibo messages, but poor in dealing with positive sentiment weibo messages (F+≈0.24) and negative sentiment weibo messages (F-≈0.44).
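For readers unfamiliar with the per-class measures, the following Python sketch computes precision, recall and F for one polarity label using the standard definition F = 2PR / (P + R). The function name `prf` and the toy label lists are assumptions for illustration; the sketch does not reproduce the bake-off's scoring script or the way its overall F value is aggregated.

```python
from typing import List, Tuple


def prf(gold: List[int], pred: List[int], label: int) -> Tuple[float, float, float]:
    """Precision, recall and F for one polarity label (1, -1 or 0)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    predicted = sum(1 for p in pred if p == label)
    actual = sum(1 for g in gold if g == label)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


# Toy labels, not data from the experiments.
gold = [1, 1, 0, -1, 0, 0]
pred = [1, 0, 0, -1, -1, 0]
for label in (1, -1, 0):
    print(label, prf(gold, pred, label))
```

Low recall for a label means that many messages with that gold label were assigned another label, while low precision means that many of the messages assigned that label were wrong.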
Reasons and solutions for poor Recall+ and Recall-: (1) The topic extraction and polarity classification rule base built from the training data is small (only 36 rules). Thus, language phenomena that did not appear in the training data cannot be covered. For instance, module 10 (anaphora resolution) neglects the case in which the pronoun appears ahead of the topic. In the next stage, new rules will be added to the rule base to expand its coverage. (2) The sentiment lexicon and the sentiment phrase rule base are incomplete, so many exps in the test data cannot be recognized. In the next stage, the system will improve the automatic recognition of unlisted exps.
Reasons and solutions for poor Precision+ and Precision-: (1) Some rules in the topic extraction and polarity classification rule base do not appropriately describe the semantic orientation relationship between topic and exp, which leads to the wrong extraction of exps. In the next stage, some rules will be revised based on error analysis. (2) Some "exps" in the sentiment lexicon do not actually have evaluation meaning. For example, the word 激烈 is not a sentiment word, yet it is listed in the sentiment lexicon as a negative word. Therefore, the sentiment polarity output for Topic: 水货客 in <weibo>: 反水货客行动越趋激烈。 </weibo> is wrongly -1. In the next stage, the sentiment lexicon will be checked and non-sentiment words will be removed.

Conclusion
In this paper, we first proposed an algorithm for rule-based sentiment polarity classification of weibo messages towards given topics. We then adopted rule-based methods to implement each step of the algorithm. Based on the sentiment lexicon SentiDic and the sentiment value assignment rules in PhraseRule, the sentiment analysis system CUCsas realized the automatic recognition of the exp in weibo messages. Based on the topic extraction and polarity classification rule base, the system realized the automatic identification of whether there is a semantic orientation relationship between the exp and the topic. Based on the sentiment computing rules in PhraseRule, the system realized the sentiment value calculation and polarity classification of the exp in the specific weibo message context. At present, the overall F value of the rule-based sentiment analysis system CUCsas is about 0.69. In the future, the lexicon and rule base will be revised based on error analysis to improve the performance of the system.