Event-Driven Emotion Cause Extraction with Corpus Construction

In this paper, we present our work on emotion cause extraction. Research in this area has been limited by the lack of annotated resources, as no open dataset is available. Thus, we first present a dataset we built using SINA city news, annotated according to the scheme of the W3C Emotion Markup Language. Second, we propose a 7-tuple definition to describe emotion cause events. Based on this general definition, we propose a new event-driven emotion cause extraction method using multi-kernel SVMs, in which a syntactic tree based approach is used to represent events in text. A convolution kernel based multi-kernel SVM is used to extract emotion causes. Because traditional convolution kernels do not use lexical information at the terminal nodes of syntactic trees, we modify the kernel function with a synonym based improvement. Even with very limited training data, we can still extract sufficient features for the task. Evaluations show that our approach achieves an 11.6% higher F-measure than referenced methods. The contributions of our work include resource construction, concept definition and algorithm development.


Introduction
With the rapid growth of the Internet, people can easily share experiences and emotions through this powerful medium anywhere and anytime. How to analyze the emotions of individuals through their writings has become a new challenge for NLP. In recent years, studies in emotion analysis have focused on emotion classification, including detection of emotions expressed by writers of text (Gao et al., 2013) as well as prediction of reader emotions (Chang et al., 2015). There are also some information extraction tasks in emotion analysis, such as extracting the feeler of an emotion (Das and Bandyopadhyay, 2010). These methods rely on observing emotion-linked expressions. Sometimes, however, we care more about the stimulus, or cause, of an emotion. For instance, manufacturers want to know why people love, or hate, a certain product. The White House may also prefer to know the cause of the emotional text "Let us hit the streets" rather than the distribution of different emotions.
There are three main challenges in the study of emotion cause extraction. The first is that, up to now, there is no open dataset available for emotion cause extraction, which may explain why there are only a few studies on emotion causes. The second is that there is no formal definition of an event in emotion cause extraction, even though some studies claim to extract the events behind emotion causes. The third is that, due to the complexity of annotation, the corpus for emotion cause extraction is usually very small. Because of this limitation, many machine learning methods are not suited for emotion cause detection. How to mine deep knowledge of a language for emotion causes is thus another thorny issue.
In this paper, we first present an annotated dataset for emotion cause extraction, to be released to the public. We then propose a 7-tuple definition of emotion cause events. Based on this general definition, we present a new event-driven emotion cause extraction method. The basic idea is to extract events in the context of emotional text through dependency parsing. A syntactic structure is then used to represent nearby events. Based on this structured representation of events, a modified convolution kernel, which also takes lexical features as terminal nodes, is used to determine whether an event is emotion cause relevant. This method can examine all possible combinations of syntactic structures to obtain sufficient features for emotion analysis from a limited training set. Compared to existing methods, which use either manual rules or commonsense knowledge to extend information, our approach is completely machine learning based and still achieves state-of-the-art performance. The contributions of this work include both resource development and algorithm development.
The rest of the paper is organized as follows. Section 2 reviews related work on emotion analysis. Section 3 presents emotion cause related definitions and the construction of the emotion cause extraction corpus. Section 4 describes the event-driven emotion cause extraction method, and Section 5 presents the evaluations and discussions. Section 6 concludes this work and gives future directions.

Related Works
Identifying emotion categories in text is an essential subject in NLP and its applications (Liu, 2015). Moreover, emotion causes can provide important information on why an emotion arises or changes. In this section, we review related work on emotion analysis and emotion cause extraction.
The first issue in emotion analysis is to determine the taxonomy of emotions. Researchers have proposed lists of primary emotions (Plutchik, 1980; Ekman, 1984; Turner, 2000). In this study, we adopt Ekman's emotion classification (Ekman, 1984), which identifies six primary emotions, namely happiness, sadness, fear, anger, disgust and surprise, known as the "Big6" scheme in the W3C Emotion Markup Language (http://www.w3.org/TR/emotion-voc/xml#big6). This list is agreed upon by most previous works in Chinese emotion analysis.
The second issue is how to perform emotion classification and emotion information extraction. Beck (Beck et al., 2014) proposed a multi-task Gaussian-process based method for emotion classification. Xu (Xu et al., 2012) used a coarse-to-fine method to classify emotions in Chinese blogs. Gao (Gao et al., 2013) proposed a joint model to co-train a polarity classifier and an emotion classifier. Chang (Chang et al., 2015) used linguistic templates to predict readers' emotions. Das (Das and Bandyopadhyay, 2010) used an unsupervised method to extract emotion feelers from Bengali blogs. Other studies have focused on joint learning with sentiment (Luo et al., 2015; Mohtarami et al., 2013), emotion in tweets or blogs (Hasegawa et al., 2013; Qadir and Riloff, 2014; Ou et al., 2014; Quan and Ren, 2009), and emotion lexicon construction (Staiano and Guerini, 2014; Mohammad and Turney, 2013). However, these works all focused on the analysis of emotion expressions rather than emotion causes. Sophia M. Y. Lee first proposed a task on emotion cause extraction, manually constructing a corpus from the Academia Sinica Balanced Chinese Corpus. Based on this corpus, Chen and Lee proposed a rule based method to detect emotion causes, whose basic idea is to craft linguistic rules for cause extraction. Some studies (Gui et al., 2014; Li and Xu, 2014; Gao et al., 2015) extended the rule based method to informal text in Weibo (Chinese tweets).
Other than rule based methods, Ghazi (Ghazi et al., 2015) used CRFs to extract emotion causes; however, their method requires the emotion cause and the emotion keyword to be in the same sentence. Russo (Russo et al., 2011) proposed a crowd-sourcing method to obtain emotion cause related commonsense knowledge, but extending such a commonsense knowledge base automatically is challenging.
Resources used in the above works are not publicly accessible, and most of the methods are rule based. Learning based methods are quite limited because the annotated data is small in size due to the high cost of annotation. Thus, rule based methods seem to be the easiest way to achieve acceptable performance, while machine learning methods require more knowledge, which is difficult to generalize. As a result, automatic methods have only focused on simple text genres.

Construction of Emotion Cause Corpus
In this section, we first describe the linguistic phenomena of emotion expressions, which inspire the design of the annotated dataset. We then introduce details of the annotation scheme, followed by the construction of the dataset.

Linguistic Phenomenon of Emotion Causes
Emotion causes play an important role in emotion expressions. An emotion cause reveals the stimulus of an emotion. Considering the linguistic phenomena of emotion causes, we follow three basic principles in corpus construction: (1) keep the whole context of the emotion expression; (2) take the clause as the basic processing unit; and (3) use formal text.
In written text, an emotion keyword expressing an emotion usually appears in the context of its emotion cause. Thus, finding the appropriate context of an emotion keyword during annotation is the prerequisite to identifying its cause, which is why we keep the whole context of emotion keywords.
Another important kind of cue is the presence of conjunctions and prepositions. These words signal the discourse relations between clauses. To make use of this discourse information, the basic analysis unit should be the clause rather than the sentence.
For the third principle, we choose formal text for corpus construction. According to related work, emotion expressions in informal text can have overlapping emotion cause and emotion target (Gui et al., 2014). This is why some studies even combine cause extraction with target identification to improve performance. However, our focus is on emotion cause identification, so we use formal news text to avoid this potential mix-up.

Collection and Annotation
We first take three years (2013-2015) of Chinese city news from NEWS SINA, containing 20,000 articles, as the raw corpus. Based on a list of 10,259 Chinese primary emotion keywords (keywords for short) (Xu et al., 2008), we extract 15,687 instances by keyword matching from the raw data; we call each occurrence of an emotion keyword an instance in the corpus. For each matched keyword, we extract three preceding clauses and three following clauses as the context of the instance. If a sentence extends beyond three clauses in either direction, the context includes the rest of that sentence to keep it complete. For simplicity, we omit cross-paragraph context.
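The instance extraction step above can be sketched as follows. Splitting on Chinese clause punctuation is our simplification, since the paper does not specify its clause segmenter, and the demo sentence and keyword list are illustrative only:

```python
import re

def extract_instances(article, keywords, window=3):
    """For each emotion keyword found in a clause, keep up to `window`
    clauses before and after it as the context of an instance."""
    # Split into clauses on common Chinese clause delimiters (simplified).
    clauses = [c for c in re.split(r"[，。！？；]", article) if c.strip()]
    instances = []
    for i, clause in enumerate(clauses):
        for kw in keywords:
            if kw in clause:
                left = max(0, i - window)
                right = min(len(clauses), i + window + 1)
                instances.append({
                    "keyword": kw,
                    "keyword_clause": i - left,  # index inside the context
                    "context": clauses[left:right],
                })
    return instances

demo = "他得知自己获奖了，心里非常高兴，于是请朋友吃饭。"
insts = extract_instances(demo, ["高兴"])
```

Cross-paragraph and rest-of-sentence completion logic are omitted here for brevity.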
Note that the presence of a keyword does not necessarily convey emotional information, for reasons such as negative polarity and sense ambiguity. For example, "祝愿/wishes" is an emotion word of "happiness", but it can also be the name of a song. Moreover, the presence of an emotion keyword does not guarantee the existence of an emotion cause either. After removing such irrelevant instances, 2,105 instances remain. For each emotional instance, two annotators manually annotate the emotion categories and the cause(s) in the W3C Emotion Markup Language (EmotionML) format. Ex.1 shows an annotated emotional sentence from the corpus, presented in the original simplified Chinese followed by its English translation. To save space, we remove the XML tags in the annotation; the original annotated data is in a subsidiary file. The basic analysis unit is the clause. The emotion cause is marked by <cause>, and the emotion keyword is marked by <keywords>. The emotion type, POS, position and length of each annotation are also annotated in EmotionML format.
Ex.1: Mr. Zhu is 55 years old. He started working in 1979 as a barber when he was 19, and has 36 years of experience. "I was assigned to work at the barbershop in Danyang, Nanjing. It is the largest barbershop in Danyang. I won many awards and honors there." <cause POS="v" Dis="-1">Talking about his honors</cause>, Mr. Zhu is so <keywords type="happiness">proud</keywords>.
Ex.1 contains only one cause. However, one keyword may have more than one corresponding emotion cause. In Ex.2, there are two relevant causes:

Ex.2: During persuasion, firemen realized that the woman attempted suicide because of <cause POS="v" Dis="-2">the holding back of wages by her employer</cause>, and <cause POS="v" Dis="-1">her family asked for money urgently</cause>, so she feels <keywords type="sadness">helpless</keywords>.
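The simplified inline format shown in the examples can be read back with a small parser. This is a sketch with regexes over the trimmed annotation style above, not the full EmotionML files; attribute handling is deliberately minimal:

```python
import re

def parse_instance(annotated):
    """Extract the <cause> spans (with their Dis offsets) and the
    <keywords> span from an annotated instance string."""
    causes = [
        {"dis": int(m.group(1)), "text": m.group(2)}
        for m in re.finditer(
            r'<cause[^>]*Dis="(-?\d+)"[^>]*>(.*?)</cause>', annotated)
    ]
    kw = re.search(r'<keywords type="?(\w+)"?>\s*(.*?)\s*</keywords>',
                   annotated)
    keyword = {"type": kw.group(1), "word": kw.group(2)} if kw else None
    return causes, keyword

text = ('<cause POS="v" Dis="-1">Talking about his honors</cause>, '
        'Mr. Zhu is so <keywords type="happiness"> proud </keywords>.')
causes, keyword = parse_instance(text)
```

For the released corpus itself, a proper XML parser over the EmotionML files would be the robust choice.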

Details of Dataset and Its Annotations
Each instance in our dataset contains exactly one emotion keyword and at least one emotion cause, and it is ensured that the keyword and its causes are relevant. The numbers of extracted instances, clauses, and emotion causes are listed in Table 1. Note that 97.2% of the instances have only one emotion cause, while instances with two and three emotion causes account for 2.6% and 0.2%, respectively. Table 2 shows the distribution of emotion types and Table 3 shows the distribution of cause positions. From the latter, we can see that 78% of emotion causes adjoin the emotion keyword at the clause level. Clearly, position plays a very important role in emotion cause extraction, so using distance based features is rational and necessary. Table 4 lists the phrase types of emotion causes. Verbs and verb phrases cover 93% of all cause events; thus, our learning algorithm mainly focuses on them.
Two annotators work independently during the annotation process. The key point is to distinguish the clause level and the phrase level in cause annotation. The clause level labels the clause which contains the emotion cause; the phrase level determines the boundary of the emotion cause. When the two annotations agree at the clause level but differ at the phrase level, we use the larger boundary of the two annotations. We reach a kappa value of 0.9287 on clause level annotation, which confirms the reliability of our annotation.
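The clause-level agreement figure can be reproduced with Cohen's kappa over the two annotators' binary clause labels. The paper does not state which kappa variant it uses; Cohen's kappa is the usual choice for two annotators, and the toy labels below are illustrative:

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' clause-level cause labels
    (e.g. 1 = clause contains the cause, 0 = it does not)."""
    assert len(a) == len(b) and a
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    labels = set(a) | set(b)
    # Chance agreement from each annotator's marginal label distribution.
    p_e = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    if p_e == 1:
        return 1.0
    return (p_o - p_e) / (1 - p_e)

k = cohens_kappa([1, 1, 0, 0], [1, 1, 0, 1])
```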

Event-Driven Emotion Cause Extraction
Due to the complexity of annotation in emotion cause identification, the size of the annotated corpus is usually small. Since we aim to use machine learning methods to automatically learn and identify causes, we use a convolution kernel to examine all possible combinations in the syntactic structure, which allows learning from syntactic representations for emotion cause extraction. The basic idea of our proposed method is to use a tree-structure representation to capture features for emotion cause identification. For the training data, we extract all valid tree structures for each event, referred to as ETs (Event Trees). If an event is a cause, the corresponding ET is positive; otherwise, it is negative. Then, we train a convolution kernel based multi-kernel SVM on the training set to classify candidate ETs in the testing set. Since more than 97% of emotion keywords have only one cause, and more than 95% of causes are near the emotion keyword, candidate ETs are extracted from the context of the emotion keyword. We choose only the ET with the highest probability in the classification result as the emotion cause.

Table 4: Distribution of the POS tags of emotion causes
POS/phrase type     Number    Percentage
Noun/Noun phrase    147       6.78%
Verb/Verb phrase    2,020     93.21%

Event Tree Construction
Even though there are related works on event identification in emotion cause detection, there is no formal definition of events. In artificial intelligence (AI), researchers such as Radinsky (Radinsky et al., 2012) have given a formal definition of an event as "action, actor, object, instrument, location and time". In our work, we first need to give a clear definition of an event.
In emotion cause extraction, the components of an event can be simpler. We are only interested in the action, the actor and the object, denoted as P, O1 and O2, respectively, following AI conventions. Since Chinese is an SVO language, the actor is the subject and the action is the verb. The subject and the object of a sentence may have attributes, and a predicate may have an adverbial and a complement. Since these components may also be helpful in emotion cause extraction, we formally define an emotion cause event as a 7-tuple:

e = (P, O1, O2, Att_O1, Att_O2, Adv, Cpl)

Here, Att_O1 is the attribute of O1; Att_O2 is the attribute of O2; Adv is the adverbial of the predicate P; and Cpl is P's complement. When syntactic components are not present, NIL values are used. Note that the main cue in an event is P, the action. So, in our algorithm, we extract all verbs from the text and use dependency parsing to extract all relevant syntactic components specified in e. Then, we can construct an ET.
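Filling the 7-tuple from a dependency parse can be sketched as below. The relation labels SBV/VOB/ATT/ADV/CMP follow a common Chinese dependency scheme (e.g. LTP); the paper does not name its parser's label set, so treat them as assumptions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Event:
    """The 7-tuple e = (P, O1, O2, Att_O1, Att_O2, Adv, Cpl);
    missing components stay None (NIL)."""
    P: str
    O1: Optional[str] = None
    O2: Optional[str] = None
    Att_O1: Optional[str] = None
    Att_O2: Optional[str] = None
    Adv: Optional[str] = None
    Cpl: Optional[str] = None

def event_from_deps(verb, deps):
    """Build an Event for `verb` from (head, relation, dependent) triples."""
    e = Event(P=verb)
    for head, rel, dep in deps:  # first pass: direct arguments of the verb
        if head == verb:
            if rel == "SBV":
                e.O1 = dep
            elif rel == "VOB":
                e.O2 = dep
            elif rel == "ADV":
                e.Adv = dep
            elif rel == "CMP":
                e.Cpl = dep
    for head, rel, dep in deps:  # second pass: attributes of O1 / O2
        if rel == "ATT":
            if head == e.O1:
                e.Att_O1 = dep
            elif head == e.O2:
                e.Att_O2 = dep
    return e

# "说起自己的荣誉/Talking about his honors" from Ex.1
e = event_from_deps("说起", [("说起", "VOB", "荣誉"), ("荣誉", "ATT", "自己的")])
```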
An ET has a fixed height of four levels. The top level is the root node. Since Chinese is an SVO language, the children of the root are S (subject), V (verb), and O (object). The seven event components are then categorized and filled into the relevant slots: O1 and Att_O1 under S; P, Adv and Cpl under V; and O2 and Att_O2 under O. We thereby obtain the ET based on the definition of an event.
Let us review Ex.1 and Ex.2 again. They contain three emotion cause events, listed below with their corresponding ETs shown in Figure 1.
1. "说起自己的荣誉/Talking about his honors"
2. "对方拖欠工程款/the holding back of wages by the employer"
3. "家中又急需用钱/her family asked for money urgently"
After constructing the ETs, emotion cause extraction becomes a classification problem: if an ET is an emotion cause, its label is positive; otherwise, negative. A binary classifier is therefore used.
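The four-level ET construction can be sketched as follows. The exact slot layout under S/V/O is our reading of the description above, and trees are plain `(label, children)` tuples for illustration:

```python
def build_et(event):
    """Arrange a 7-tuple event (a dict with keys P, O1, O2, Att_O1,
    Att_O2, Adv, Cpl) into the fixed four-level ET:
    root -> S/V/O -> component slots -> words. NIL slots are pruned."""
    tree = ("ROOT", [
        ("S", [("O1", event.get("O1")), ("Att_O1", event.get("Att_O1"))]),
        ("V", [("P", event.get("P")), ("Adv", event.get("Adv")),
               ("Cpl", event.get("Cpl"))]),
        ("O", [("O2", event.get("O2")), ("Att_O2", event.get("Att_O2"))]),
    ])

    def prune(node):
        label, body = node
        if isinstance(body, list):  # internal node: keep non-empty children
            kept = [c for c in (prune(ch) for ch in body) if c]
            return (label, kept) if kept else None
        return (label, body) if body is not None else None

    return prune(tree)

# ET for event 1 above ("Talking about his honors": no overt subject)
et = build_et({"P": "说起", "O2": "荣誉", "Att_O2": "自己的"})
```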

Emotion Cause Extraction
After the construction of ETs, we obtain positive and negative ET samples. Due to the small number of training samples, it is necessary to capture as many features from the ETs as possible. We choose convolution kernel based SVMs because they can search all possible syntactic sub-structures of a tree. Following (Collins and Duffy, 2002), the tree kernel between two trees T1 and T2 is

K(T1, T2) = Σ_{n1∈T1} Σ_{n2∈T2} δ(n1, n2)

where δ is computed recursively:
1. If the productions at n1 and n2 are different, δ(n1, n2) = 0.
2. If the productions at n1 and n2 are the same and n1, n2 are pre-terminals, δ(n1, n2) = 1.
3. Otherwise, δ(n1, n2) = Π_i (1 + δ(c(n1, i), c(n2, i))).
Here, c(n, i) is the i-th child node of n.
However, the above tree kernel definition does not consider terminals, which means that the actual words in a sentence are ignored. As emotions causes are semantically meaningful, we need to incorporate lexical information into the convolution kernel.
Modified kernel function In order to distinguish different ETs, we modify the definition of the tree kernel to include the lexical words in a clause by adding one more rule for the terminals:
4. If n1 and n2 are terminal nodes, δ(n1, n2) = 1 if and only if n1 and n2 are synonyms; otherwise δ(n1, n2) = 0.
Here a synonym is defined by Tongyici Cilin (Extended) (http://ir.hit.edu.cn/demo/ltp/Sharing Plan.htm), which has 17,817 synonym classes and 77,343 words. We use synonym matching rather than exact word matching because the size of the corpus is limited, and exact word matching would be too sparse.
Let K_ET-O denote the original kernel and K_ET-M the modified kernel. It can easily be proven that K_ET-M is a valid kernel function. Following the notation in (Collins and Duffy, 2002), the function I_i(n) is 1 if the sub-tree i is rooted at node n and 0 otherwise, so the original tree kernel is an inner product and its kernel matrix is positive semi-definite. In our modified kernel, the indicator is extended with one additional case: it is also 1 if i is a terminal node and a synonym of n. We mark the new indicator as Ī_i(n). Then we have:

K_ET-M(T1, T2) = Σ_{n1∈T1} Σ_{n2∈T2} Σ_i Ī_i(n1) Ī_i(n2)

This means the modified kernel is symmetric and its kernel matrix is positive semi-definite. In our work, K_ET-M is trained with SVM optimization, and the code is based on SVM-light-TK.
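A toy sketch of the modified kernel's recursion is given below. As simplifications on our part, the pre-terminal case is folded into the product rule, trees are `(label, children)` tuples with plain word strings as terminals, and a small dict of synonym-class ids stands in for Tongyici Cilin:

```python
def delta(n1, n2, synset):
    """Recursive delta with rule 4: terminals match iff they are the
    same word or share a synonym class in `synset`."""
    if isinstance(n1, str) or isinstance(n2, str):
        if isinstance(n1, str) and isinstance(n2, str):  # two terminals
            same = n1 == n2 or (n1 in synset and n2 in synset
                                and synset[n1] == synset[n2])
            return 1 if same else 0
        return 0  # terminal vs. internal node never matches
    (lab1, ch1), (lab2, ch2) = n1, n2
    if lab1 != lab2 or len(ch1) != len(ch2):  # different productions
        return 0
    prod = 1
    for c1, c2 in zip(ch1, ch2):  # product over aligned children
        prod *= 1 + delta(c1, c2, synset)
    return prod

def tree_kernel(t1, t2, synset):
    """K(T1, T2) = sum of delta over all node pairs."""
    def nodes(t):
        yield t
        if not isinstance(t, str):
            for c in t[1]:
                yield from nodes(c)
    return sum(delta(a, b, synset) for a in nodes(t1) for b in nodes(t2))

syn = {"说起": "Ha01", "谈到": "Ha01"}  # toy synonym classes
k_syn = tree_kernel(("V", ["说起"]), ("V", ["谈到"]), syn)
k_plain = tree_kernel(("V", ["说起"]), ("V", ["谈到"]), {})
```

With the synonym lexicon the two single-verb trees match (k_syn > k_plain); without it only the bare V production matches.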

Multi-kernel function
Since the convolution kernel based method uses only syntactic information and synonyms, we also add lexical features. Given a 7-tuple event e, we obtain a bag-of-words based or word embedding based representation for each component in e, and use the distance between each component and the emotion keyword as an additional feature. Let the features of each component in e be R_i, for every i ∈ e. Then we capture the feature set F of an ET by a joint (concatenation) operation, called the ET features:

F = ⊕_{i∈e} R_i    (2)

We can join the ET features with the syntactic information through a multi-kernel function. For any two ETs T1 and T2, with respective features F1 and F2, the two new multi-kernels are defined as the sum and the product of the tree kernel and a vector kernel:

K_new+M(T1, T2) = K_ET-M(T1, T2) + K_vec(F1, F2)
K_new*M(T1, T2) = K_ET-M(T1, T2) · K_vec(F1, F2)

Here, K_vec denotes a kernel function that can be a linear, polynomial or Gaussian kernel. The next step is to train the classifier based on the multi-kernel function.
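The sum and product combinations can be sketched over precomputed Gram matrices; both are valid kernels, since sums of PSD matrices are PSD and the element-wise product is PSD by the Schur product theorem. The function and matrices below are illustrative:

```python
import numpy as np

def multi_kernel(K_tree, K_vec, mode="product"):
    """Combine a tree-kernel Gram matrix with a feature-vector kernel
    Gram matrix (both n x n over the same ETs), as in the sum (K_new+M)
    and product (K_new*M) multi-kernels."""
    if mode == "sum":
        return K_tree + K_vec
    return K_tree * K_vec  # element-wise product

K_tree = np.array([[2.0, 1.0], [1.0, 2.0]])  # toy tree-kernel Gram matrix
K_vec = np.eye(2)                            # toy vector-kernel Gram matrix
K_sum = multi_kernel(K_tree, K_vec, "sum")
K_prod = multi_kernel(K_tree, K_vec)
```

The resulting matrix can then be passed to any SVM implementation that accepts a precomputed kernel.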
The training data is already in labeled ET format. To prepare the testing data, we extract all ETs from a given instance as candidate ETs. The classifier outputs the probability that each candidate ET is an emotion cause, producing a ranked list of candidates; the top ranked ET serves as the cause event for the instance.
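The ranking step reduces to an argmax over classifier probabilities, which can be written as a one-liner (names here are illustrative):

```python
def predict_cause(candidate_ets, probs):
    """Return the candidate ET with the highest classifier probability
    as the predicted emotion cause for the instance."""
    ranked = sorted(zip(candidate_ets, probs), key=lambda x: -x[1])
    return ranked[0][0]

best = predict_cause(["et_a", "et_b", "et_c"], [0.2, 0.7, 0.1])
```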

Experimental Setup
In the experiments, we randomly select 90% of the dataset as training data and 10% as testing data. In order to obtain statistically reliable results, we evaluate our method and the reference methods over 25 such runs. We conduct two sets of experiments. The first evaluates performance at the clause level, identifying the clauses that contain emotion causes. The second evaluates emotion causes via verb classification, because 93.21% of emotion causes are verbs or verb phrases and verbs serve as the action component in our event definition.
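The repeated stochastic 90/10 protocol can be sketched as below; `evaluate` is a stand-in for any train-and-score routine returning an F-measure, and the fixed seed is our addition for reproducibility:

```python
import random

def repeated_holdout(instances, evaluate, runs=25, train_frac=0.9, seed=0):
    """Average a metric over repeated random train/test splits,
    mirroring the 25-run evaluation protocol."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        data = instances[:]
        rng.shuffle(data)
        cut = int(len(data) * train_frac)
        scores.append(evaluate(data[:cut], data[cut:]))
    return sum(scores) / len(scores)

# Dummy evaluate: just reports the test fraction, so the average is 0.1.
avg = repeated_holdout(list(range(100)), lambda tr, te: len(te) / 100)
```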

Emotion Cause Extraction
We use the commonly accepted measure proposed by Lee for emotion cause extraction (Gao et al., 2015; Li and Xu, 2014). In this measure, if a proposed emotion cause covers the annotated answer, the sequence is considered correct. The precision, recall, and F-measure are defined as:

Precision = # correctly proposed causes / # proposed causes
Recall = # correctly proposed causes / # annotated causes
F-measure = 2 · Precision · Recall / (Precision + Recall)

In the experiment, evaluation is conducted for the following methods:
1. RB (rule based method): among several rule based methods (Gui et al., 2014; Li and Xu, 2014), we use Lee's rules (listed in the Appendix of this paper).
2. CB (commonsense based method): to reproduce this method (Russo et al., 2011), we use the Chinese Emotion Cognition Lexicon (Xu et al., 2013) as the commonsense knowledge. The lexicon contains more than 5,000 emotion stimulations and their corresponding reflection words.
3. ML (rule based features for machine learning): rules are used as features, together with other manual features, for emotion cause classification.
4. K_vec: features defined in Formula (2) are used to train the classifier.
5. Our kernel based methods defined in Formulas (3) to (6).

The performance results are given in Table 5. Among all methods, K_new*M achieves the top F-measure. Compared to the other methods, the improvement is significant, with a p-value less than 0.01 in a t-test.
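The coverage-based measure can be sketched as follows, with "covers" implemented as substring containment, a simplification of span overlap; the example strings come from Ex.3 in the error analysis:

```python
def prf(proposed, annotated):
    """Precision/recall/F under the coverage criterion: a proposed
    cause counts as correct if it contains an annotated cause."""
    correct = sum(1 for p in proposed if any(a in p for a in annotated))
    precision = correct / len(proposed) if proposed else 0.0
    recall = correct / len(annotated) if annotated else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

hit = prf(["the chilly water made him cold"], ["the chilly water"])
miss = prf(["fell into icy water"], ["the chilly water"])
```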
Even though RB achieves the top precision, its F-measure is limited by low recall. Since CB behaves in the opposite way, the combination RB+CB improves performance; however, the improvement is quite limited, at 0.0127 in F-measure. The F-measure of our reproduced RB is similar to results reported in other references (Gui et al., 2014; Li and Xu, 2014), which repeat Lee's method and achieve an F-measure of roughly 0.55. It has been reported that using handcrafted rules as features to train a classifier, with some additional features such as conjunction, action and epistemic verbs, improves performance significantly. In our experiment, the result is the opposite. The main reason is that the samples in that work are less complex: about 85% of the emotion causes are in the same clause as the emotion keyword. Our corpus is quite different, with only about 23.6% of causes in the same clause as the keyword. That method does not handle long distance relations well, which explains why it does not work well on our dataset. Although RB+CB+ML does not perform well, it still yields a 0.0334 improvement in F-measure compared to RB. Among our proposed methods, K_vec on the ET features achieves 0.4285 in F-measure. Compared to CB and ML, this is not satisfactory; however, as a simple representation of lexical information, the performance is acceptable. K_word2vec yields a similar result; perhaps the joint operation is too simple to handle composition.
For the modified tree kernel K_ET-M, the performance is 0.0605 higher than the original tree kernel K_ET-O in F-measure, which means that considering terminal nodes improves the performance of the tree kernel significantly. K_ET-M is also 0.0377 higher than K_vec and 0.0526 higher than K_word2vec in F-measure, which shows that the kernel based syntactic representation has better generalization ability. The original kernel K_ET-O has syntactic information but no lexical information, and it underperforms not only K_ET-M but also K_vec and K_word2vec. This demonstrates that our modified kernel function can effectively turn an inferior method into a superior one. Compared to the rule based method, however, the performance still needs to be enhanced, so a multi-kernel is necessary. After combination with the ET features using a multi-kernel, K_new*M reaches a higher level with 0.6756 in F-measure. Compared to RB, the improvement in F-measure is 0.1513; compared to the combination of existing methods, the improvement is 0.1159. The reason is that our method represents events at the syntactic level, and synonym information gives the model more generalization ability.

Verb Classification for Emotion Cause
In this section, we examine the performance of ET classification with respect to the verbs identified in the emotion clauses.

ETs Classification
Our method chooses the candidate ET with the highest probability from the ET classification. Performance is measured on the verbs in the identified ETs. Results are shown in Table 6.
Note that K_word2vec performs much better than K_vec in verb identification, contrary to their similar performance in clause identification. The reason is that the extraction result is based on ranking, and only the top ranked event affects performance; in other words, precision matters more than recall here. For the same reason, K_new+M is better than K_new*M in the classification of ETs, although only marginally. Nonetheless, the revised convolution kernel with multi-kernel training is still significantly better than the original kernel, and K_new*M achieves the best performance in Table 5. When the precisions of two methods are similar, as for K_ET-O and K_ET-M, the effect of recall becomes important. The multi-kernel not only achieves the best performance on both precision and recall, its increase in performance is also significant, at least 0.2173 (between K_ET-M and K_new*M). Clearly, the multi-kernel is not just a simple voting or joining of the components; it benefits from both kernels to achieve better performance.

Error Analysis
There are mainly three types of errors in our model. We use example cases to illustrate them.

a) Cascading Events
In some cases, events happen like a chain reaction: an event that leads to an emotion may itself be the consequence of another event. Identifying the right event in such a chain is more challenging. Consider the following example:

Ex.3: 约兰·沃森坠入冰冷的水中。<cause>刺骨的冰水</cause>让他感到极其寒冷与<keywords>害怕</keywords>，约兰·沃森慌忙用不太流利的中文大呼"救命"。
John Watson fell into icy water. <cause>The chilly water</cause> made him feel so cold and <keywords>scared</keywords> that he had to use his broken Chinese to call for help.

Here, the emotion cause should be "刺骨的冰水/the chilly water". Our method outputs "坠入冰冷的水中/fell into icy water" as the emotion cause with probability 60.83%, while the probability of the correct cause is 58.89%. As a probability based method, ours cannot analyze the sequence of events or the relations between them.
b) Sensory verbs

Sensory verbs usually indicate the emotion cause, but there are exceptional cases, as shown below:

After an investigation into bullying, the head said that the students realized their mistake and were also <keywords>scared</keywords>: they <cause>may need to do community service</cause>.

In this case, the cause of "scared" is the punishment of community service. However, the template "知道…感到/realized ... and felt" usually indicates that an emotion cause lies between the two sensory verbs. Our algorithm incorrectly gives "知道错了/realized their mistake" a probability of 61.65% as a cause. Still, this shows that our method can learn latent patterns in text.
c) Coverage of cause candidates

In the construction of ETs, we use actions as the cue to construct candidate events. However, 6.78% of our clauses do not contain action words, so these clauses are never selected as candidates.

Conclusion
In this paper, we present our work on emotion cause extraction. Due to the lack of open resources in this area, we first construct an annotated dataset from news text, which will be released for public use. We also propose an event-driven emotion cause extraction method to capture the events triggering emotion changes. In this method, we propose a 7-tuple representation of events and use syntactic structures to identify them. Based on this structured representation of events and the inclusion of lexical features, a convolution kernel based learning method is designed to train a multi-kernel classifier to identify emotion cause events. Compared to manually constructed rules and commonsense knowledge based methods, our proposed model automatically obtains structural and lexical features and achieves state-of-the-art performance on this dataset.

Appendix: Extraction rules used in RB

i) E(B/F) + yue4 C yue4 K "the more C the more K" (F)
ii) E = the nearest Na/Nb/Nc/Nh before the first yue4 in B/F
iii) C = the V in between the two yue4's in F

8:
i) E(F) + K(F) + C(F)
ii) E = the nearest Na/Nb/Nc/Nh before K in F
iii) C = the nearest (N)+(V)+(N) after K in F

9:
i) E(F) + IV(F) + K(F)
ii) E = the nearest Na/Nb/Nc/Nh before IV in F
iii) C = IV+(an aspectual marker) in F

10:
i) K(F) + E(F) + de "possession"(F) + C(F)
ii) E = the nearest Na/Nb/Nc/Nh after K in F
iii) C = the nearest (N)+V+(N)+"的"+N after de in F

11:
i) C(F) + K(F) + E(F)
ii) E = the nearest Na/Nb/Nc/Nh after K in F
iii) C = the nearest (N)+(V)+(N) before K in F

Here, C = cause event; E = experiencer; K = keyword/emotion verb; B = clause before the focus clause; F = focus clause (the clause containing the emotion verb); A = clause after the focus clause; I to VII are cue words; Na/Nb/Nc/Nh = common noun/proper noun/place noun/pronoun.