Capturing Argument Relationship for Chinese Semantic Role Labeling



Introduction
Semantic Role Labeling (SRL) is the task of recognizing the arguments of a given predicate and assigning semantic role labels to them. Because of its ability to encode semantic information, there has been increasing interest in SRL for many languages (Gildea and Jurafsky, 2002; Sun and Jurafsky, 2004). Figure 1 shows an example from the Chinese Proposition Bank (CPB) (Xue and Palmer, 2003), a Chinese corpus annotated with semantic role labels.
Previous work on Chinese SRL includes feature-based approaches and neural network based approaches. Feature-based approaches extract a large number of handcrafted features from the sentence and feed them to statistical classifiers such as CRF, MaxEnt and SVM (Sun and Jurafsky, 2004; Xue, 2008; Ding and Chang, 2008; Ding and Chang, 2009; Sun, 2010). Neural network based approaches usually treat Chinese SRL as a sequence labeling task and solve it with a bidirectional recurrent neural network (RNN) with long short-term memory (LSTM) (Wang et al., 2015).
However, both kinds of approaches identify each candidate argument separately, without considering the relationships between arguments. We define two categories of argument relationships here: (1) Compatible arguments: if one candidate argument belongs to a given predicate, then the other is more likely to belong to the same predicate; (2) Incompatible arguments: if one candidate argument belongs to a given predicate, then the other is less likely to belong to the same predicate. For example, in Figure 1, the words "外商" (foreign businessman) and "企业家" (entrepreneur) tend to be compatible arguments when the predicate is "投资" (invest). On the other hand, "企业家" (entrepreneur) and "规定" (rule) are not likely to belong to the same predicate "投资" (invest).
In this paper, we propose a quadratic optimization method that explicitly models the relationships between candidate arguments to improve the performance of Chinese SRL. We train a maximum entropy classifier and use it to predict the argument relationship between any two candidate arguments in a sentence. Experiments show that argument relationships can greatly improve the performance of Chinese SRL.

Figure 1: A sentence with semantic roles labeled from CPB. "rel" represents the predicate. English translation: "Six rules to protect foreign businessman's legal profits when investing entrepreneurs"

Related Work
The Semantic Role Labeling (SRL) task was first proposed by Gildea and Jurafsky (2002). Previous approaches to Chinese SRL can be classified into two categories: (1) feature-based approaches and (2) neural network based approaches. Among feature-based approaches, Sun and Jurafsky (2004) did preliminary work on Chinese SRL without any large semantically annotated corpus and produced promising results. Xue and Palmer (2003) proposed the Chinese Proposition Bank (CPB), which led to more complete and systematic research on Chinese SRL (Xue and Palmer, 2005; Xue, 2008; Ding and Chang, 2009). Sun et al. (2009) extended the work of Chen et al. (2006) and performed Chinese SRL with shallow parsing, taking partial parses as input. Yang and Zong (2014) proposed multi-predicate SRL, which showed improvements on both the English and Chinese Proposition Banks.
Neural network based approaches are free of handcrafted features. Collobert and Weston (2008) proposed a convolutional neural network for SRL that achieved competitive performance on English SRL without requiring task-specific features. Wang et al. (2015) proposed a bidirectional LSTM-RNN for Chinese SRL.
However, most of the aforementioned approaches do not take compatible and incompatible arguments into account. Inspired by Sha et al. (2016), our approach models these two argument relationships explicitly to achieve better performance on Chinese SRL.

Capturing the Relationship Between Arguments
We found that there are two typical relationships between candidate arguments: (1) Compatible arguments: if one candidate argument belongs to an event, then the other is more likely to belong to the same event; (2) Incompatible arguments: if one candidate argument belongs to an event, then the other is less likely to belong to the same event.
We trained a maximum entropy classifier to predict the relationship between two candidate arguments, choosing features of the two candidate arguments (including dependency features) as classifier inputs. The Chinese sentences are first segmented into Chinese words. For a sentence with n + 1 words, we denote C ∈ R^{n×n} as the argument relationship matrix. In the testing procedure, the maximum entropy classifier predicts the relationship between argument i and argument j as C_ij.
When the output of the maximum entropy classifier is around 0.5, it is hard to tell which of the two relationships holds; we call this kind of information "uncertain information" (unclear relationship). For better performance, we strengthen the certain information and weaken the uncertain information by transforming the output of the maximum entropy classifier as follows. We set two thresholds: if the output of the classifier is larger than 0.8, we set C_ij = 1 (compatible arguments); if the output is lower than 0.2, we set C_ij = −1 (incompatible arguments); otherwise, we set C_ij = 0 (unclear relationship). The thresholds 0.8 and 0.2 are tuned on the development set.
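As a concrete sketch of this strengthening step (the function name and array-based representation are ours; we assume the classifier outputs pairwise probabilities in [0, 1]):

```python
import numpy as np

def strengthen(raw, hi=0.8, lo=0.2):
    """Map raw maximum-entropy outputs in [0, 1] to {-1, 0, 1}.

    hi and lo are the two thresholds tuned on the development set.
    """
    C = np.zeros_like(raw)
    C[raw > hi] = 1.0    # compatible arguments
    C[raw < lo] = -1.0   # incompatible arguments
    return C             # remaining entries stay 0: unclear relationship
```

Entries near 0.5 are thus zeroed out, so the uncertain information cannot influence the later optimization.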

Post-processing Module of Bidirectional LSTM-RNN
Our quadratic optimization method is a post-processing module for the bidirectional LSTM-RNN (Wang et al., 2015). The simplified architecture of the bidirectional LSTM-RNN is shown in Figure 2.
Each dimension of the output vector L_i ∈ R^{n_L}, i = 1, ..., n, corresponds to the score of a certain semantic role label, where n_L is the number of semantic role labels. We then normalize L_i over semantic roles as Eq 2 shows.
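Eq 2 is not reproduced in this copy; a standard softmax normalization over the n_L role scores, consistent with the description above, would read:

```latex
L_i(t) \leftarrow \frac{\exp\big(L_i(t)\big)}{\sum_{t'=1}^{n_L} \exp\big(L_i(t')\big)},
\qquad i = 1, \dots, n
```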
After normalization, each dimension of L_i represents the probability of a certain semantic role label. Let P_Arg ∈ R^n be a probability vector whose j-th dimension is the probability that the j-th word fills a semantic role, as shown in Eq 3. P_Role ∈ R^n is another probability vector whose j-th dimension is the probability of the most likely semantic role the j-th word may be labeled with, as shown in Eq 4.
where [·] equals 1 if the inner statement is true and 0 otherwise, and label(j) = '0' means the j-th word is not labeled with a semantic role.
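Eqs 3 and 4 are missing from this copy; a reconstruction consistent with the surrounding definitions (using the Iverson bracket [·] defined above, with L_j the normalized score vector of the j-th word) would be:

```latex
P_{Arg}(j) = \sum_{t=1}^{n_L} \big[\, t \neq \text{`0'} \,\big]\, L_j(t)
           = 1 - L_j(\text{`0'}),
\qquad
P_{Role}(j) = \max_{t \neq \text{`0'}} L_j(t)
```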

Quadratic Optimization
We use an n-dimensional vector X to represent the identification result of the candidate arguments. Each entry of X is 0 or 1: 0 represents "noArg" and 1 represents "arg". X is assigned by maximizing E(X), as defined in Eq 5.
Here, X^T C X sums the relationship values over all pairs of identified arguments; the more the identified arguments are related to each other, the larger X^T C X is. X^T P_arg is the sum of the probabilities of all chosen arguments, and X^T P_role is the sum of the probabilities of all the classified roles. Eq 5 thus means that, while we should select the semantic roles with large probabilities, the argument relationship evaluation should also be as large as possible.
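Eq 5 itself is absent from this copy; given the three terms described above and the weights λ1 and λ2 introduced below, a plausible reconstruction is:

```latex
X^{\ast} = \arg\max_{X \in \{0,1\}^{n}} E(X),
\qquad
E(X) = X^{T} C X + \lambda_1\, X^{T} P_{arg} + \lambda_2\, X^{T} P_{role}
```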
We use beam search (Algorithm 1) to find the optimal assignment X. The hyperparameters λ1 and λ2 are chosen on the development set.
Algorithm 1: Beam search decoding algorithm for SRL (• denotes concatenating an element to the end of a vector).
Input: argument relationship matrix C; the argument probabilities required by X^T P_arg; the role probabilities required by X^T P_role
Data: K: beam size; n: number of candidate arguments
Output: the best assignment X
Set beam B ← [ ]
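A minimal runnable sketch of the beam search, assuming the objective takes the form X^T C X + λ1·X^T P_arg + λ2·X^T P_role (the variable names and the padding of partial assignments with zeros are our choices, not the paper's):

```python
import numpy as np

def beam_search(C, p_arg, p_role, lam1=0.10, lam2=0.45, beam_size=4):
    """Search for the 0/1 assignment X maximizing
    E(X) = X^T C X + lam1 * X^T p_arg + lam2 * X^T p_role."""
    n = len(p_arg)

    def score(bits):
        # Pad a partial assignment with zeros so it can be scored.
        x = np.array(bits + [0] * (n - len(bits)), dtype=float)
        return x @ C @ x + lam1 * (x @ p_arg) + lam2 * (x @ p_role)

    beam = [[]]
    for _ in range(n):
        # Extend every partial assignment by 0 ("noArg") or 1 ("arg").
        candidates = [bits + [b] for bits in beam for b in (0, 1)]
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_size]  # keep the K best partial assignments
    return beam[0]
```

With beam_size ≥ 2^n the search is exhaustive; smaller beams trade optimality for speed.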

Experiment
We conduct experiments on the benchmark dataset CPB to compare our model with previous landmark methods for Chinese SRL, using Wang et al. (2015)'s model as the baseline. The results reveal that our quadratic optimization method further improves the results of the bidirectional LSTM-RNN.

Experiment Settings
We conduct experiments on the standard benchmark dataset CPB 1.0. We follow the same data setting as previous work (Xue, 2008; Sun et al., 2009), which divides the dataset into three parts: 648 files (chtb_081.fid to chtb_899.fid) are used as the training set; the development set includes 40 files (chtb_041.fid to chtb_080.fid); and the test set includes 72 files.
The training dataset for the argument relationship matrix contains 1.6M instances (736K positive and 864K negative), randomly generated according to the ground truth in the training documents. We use the Stanford Parser for dependency parsing.
We tuned the coefficients λ1 and λ2 of Eq 5 on the development set, finally setting λ1 = 0.10 and λ2 = 0.45.

Chinese SRL Performance
Table 1 shows our SRL performance compared with previous landmark results. With the quadratic optimization method as a post-processing module, our approach (QOM) outperforms Wang et al. (2015) by a large margin (Wilcoxon Signed Rank Test, p < 0.05). We also performed ablation tests. In Table 1, "QOM - strengthen" is the result when we do not strengthen the argument relationship matrix; the uncertain information is very harmful to the performance, worsening the accuracy by about 1%. "QOM - feature 4,5,6" is the performance when we do not use the dependency features for capturing argument relationships, since Wang et al. (2015) did not use any dependency features. Even without dependency features, our method still outperforms Wang et al. (2015)'s result.
Figure 3 visualizes the candidate argument relationship matrix. From it, we captured compatible arguments ("外商" foreign businessman and "企业家" entrepreneur) and incompatible arguments ("项" item and "外商" foreign businessman; "规定" rule and "外商" foreign businessman).

Figure 3: Visualization of the argument relationship matrix. Left: the original matrix; right: the strengthened matrix. In the original matrix, the captured argument relationships can be seen directly (darker green means a stronger relationship, lighter green a weaker one). After strengthening, on the right, word pairs with strong relationships are classified as compatible arguments (black squares), pairs with weak relationships as incompatible arguments (white squares), and the rest (grey squares) as unclear relationships.

Conclusion
In this paper, we propose a quadratic optimization method based on two kinds of argument relationships to improve the performance of Chinese SRL. We first train a maximum entropy classifier to capture compatible and incompatible arguments, then use quadratic optimization to improve the results of the bidirectional LSTM-RNN (Wang et al., 2015). The experiments demonstrate the effectiveness of our approach, which can also be applied on top of other probabilistic methods.