Chinese Semantic Role Labeling using High-quality Syntactic Knowledge

This paper presents an application of Chinese syntactic knowledge for semantic role labeling (SRL). Besides basic morphological information, syntactic structures are crucial in SRL. However, it is difﬁcult to learn such information from limited, small-scale, manually annotated training data. Instead of manually increasing the size of annotated data, we use a large amount of automatically extracted syntactic knowledge to improve the performance of SRL.


Introduction
Semantic role labeling (SRL) is regarded as a task that is intermediate between syntactic parsing and semantic analysis in natural language processing (NLP). The main goal of SRL is to extract a proposition from a sentence about who does what to whom, when, where and why. By using semantic roles, the complex expression of a sentence is then interpreted as an event and its participants (i.e., predicates and arguments such as agent, patient, locative, temporal and manner). Unlike syntactic level surface cases (i.e., dependency labels such as subject and object), semantic roles can be regarded as a deep case representation for predicates. Because of its ability to abstract the meaning of a sentence, SRL has been applied to many NLP applications, including information extraction (Christensen et al., 2010), question answering (Pizzato and Mollá, 2008) and machine translation (Liu and Gildea, 2010).
Semantically annotated corpora, such as FrameNet (Fillmore et al., 2001) and PropBank (Kingsbury and Palmer, 2002), make this type of automatic semantic structure analysis feasible by using supervised machine learning methods. Automatic SRL processing has two major drawbacks. Firstly, the scale of the training data is quite limited: although manually annotated data such as PropBank is available for learning semantic role prediction models, it is still hard to learn lexical preferences due to its limited size, and increasing the size and coverage of this resource to improve the quality of the learned models is a time consuming task. Secondly, similar to syntactic analyses such as syntactic dependency parsing, whose performance is highly dependent on preceding analyses such as POS tagging, automatic SRL systems are based on syntactic structures along with lower level information including POS tags and lexical information. As a result, SRL suffers from error propagation from the lower levels of the whole framework. Although some studies use automatic analysis of unlabeled data to enrich the training data and address the first problem (Fürstenau and Lapata, 2009), accumulated errors in such automatic analysis inevitably cause negative effects. For hard-to-analyze languages such as Chinese, whose morphological analysis is particularly difficult, the performance of SRL is limited by both of these problems.
In this paper, we focus on Chinese SRL and address the problems mentioned above by using high-quality knowledge automatically extracted from a large-scale corpus. Instead of using high level automatic analyses such as semantic roles, we use lower level syntactic knowledge because lower level analyses are less erroneous compared to higher level analyses. The additional knowledge can provide not only a rich lexicon but also syntactic information, both of which play crucial roles in SRL. In order to show that automatically extracted syntactic knowledge is beneficial, we use predicate-argument structures and case frames (which will be introduced in later sections) in our experiments to validate our claim.
The rest of this paper is organized as follows.
We first review related work, then describe the SRL task and the acquisition of high-quality syntactic knowledge, including the high-quality dependency selection process. After that, we explain how this knowledge is used for SRL and present our experimental settings and results, followed by a discussion. Finally, we give our conclusions and future work.

Related work
The CoNLL-2009 shared task (Hajič et al., 2009) features a substantial number of studies on SRL that used PropBank as one of the resources. These works can be categorized into two types. The first is joint learning of syntactic parsing and SRL (Tang et al., 2009; Morante et al., 2009), which learns a single model for syntactic parsing and SRL jointly. This type of framework is able to use SRL information to improve syntactic parsing, but has a much larger search space during joint model learning. The other type is called the SRL-only task (Zhao et al., 2009; Björkelund et al., 2009), which uses automatic morphological and syntactic information as the input in order to judge which token plays what kind of semantic role. Our work focuses on the second category of SRL. Our framework is based on those used by Björkelund et al. (2009) and Yang and Zong (2014).
There have also been several studies using semi-supervised methods for SRL. One basic idea of semi-supervised SRL is to automatically annotate unlabeled data using a simple classifier trained on the original training data (Fürstenau and Lapata, 2009). Since there is a substantial amount of error propagation in SRL frameworks, the additional automatic semantic roles are not guaranteed to be of good quality. In contrast to this approach, we rely only on syntactic level knowledge, which does not suffer as much from error propagation. Also, some studies assume that sentences that are syntactically and lexically similar are likely to share the same frame-semantic structure (Fürstenau and Lapata, 2009). This allows them to project semantic role information to unlabeled sentences using alignments. However, computation of these alignments requires additional information such as word similarity, whose quality is language dependent. Less sparse features capturing lexical information of words can also be used for semi-supervised learning of SRL. Such lexical representations can be learned from unlabeled data (Bengio et al., 2003).
Deschacht and Moens (2009) used word similarity learned from unlabeled data as additional features for SRL. Word embeddings have also been used in several NLP tasks including SRL (Collobert et al., 2011). Instead of using word-level lexical information, our work uses syntactic knowledge as syntactic level lexical information. Zapirain et al. (2009) used selectional preferences to improve SRL. This study is similar to our approach, but the quality of the selectional preferences was not taken into account.
At the syntactic level of NLP, rich knowledge such as predicate-argument structures and case frames provides strong support for various kinds of tasks. A case frame, which clarifies the relations between a predicate and its arguments, can support tasks ranging from fundamental analysis, such as syntactic dependency parsing and word similarity calculation, to multilingual applications, such as machine translation. Japanese case frames have been successfully compiled (Kawahara and Kurohashi, 2006), where each argument is represented by its Japanese case marker, such as 'ga', 'wo', and 'ni'. For the case frames of other languages such as English and Chinese, because there are no such case markers to help clarify syntactic structures, syntactic surface cases (i.e., subject, object, prepositional phrase, etc.) are used for argument representation instead (Jin et al., 2014). Case frames can be automatically acquired for different languages using methods such as the Chinese Restaurant Process (CRP). In our work, we employ such syntactic level knowledge, which uses surface cases as argument representation, to help the SRL task. We refer to this kind of knowledge as syntactic knowledge in this paper.

SRL task description
In previous studies, the SRL pipeline can be divided into three main steps: predicate disambiguation (PD), argument identification (AI), and argument classification (AC). In the PD step, the main goal is to identify the "sense id" of each given predicate. Because the sense id for a certain predicate is meaningless for other predicates, classifiers for PD are trained separately for each predicate. We used part of the feature set proposed by Björkelund et al. (2009) and some additional features. Table 1 lists the feature sets used in the PD step. During prediction, there will be some predicates that have not been seen in the training data. We label the sense of those unseen predicates with the default sense, which is '01' in our work.
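The per-predicate classification with a default-sense fallback can be sketched as follows. The classifier interface and the toy stand-in model are illustrative assumptions, not our actual implementation:

```python
# Sketch of the PD step: one classifier per predicate lemma, falling
# back to the default sense '01' for predicates unseen in training.

DEFAULT_SENSE = "01"

def disambiguate_predicate(lemma, features, classifiers):
    """Return a sense id such as '01' for the given predicate lemma."""
    clf = classifiers.get(lemma)
    if clf is None:
        # Unseen predicate: no lemma-specific model exists,
        # so label it with the default sense.
        return DEFAULT_SENSE
    return clf.predict(features)

class MajoritySense:
    """Toy stand-in for a trained per-predicate classifier."""
    def __init__(self, sense):
        self.sense = sense
    def predict(self, features):
        return self.sense

classifiers = {"buy": MajoritySense("01"), "make": MajoritySense("02")}
print(disambiguate_predicate("make", {}, classifiers))  # '02'
print(disambiguate_predicate("zap", {}, classifiers))   # '01' (unseen)
```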
Unlike in syntactic dependency parsing, given a predicate in a sentence, every token may hold a semantic relation with the given predicate, so each token is regarded as an argument candidate. The AI step recognizes the semantic arguments among these candidates. In the AC step, which is the last step in the SRL pipeline, each semantic argument is labeled with a semantic role. However, in some work the AI and AC steps are executed jointly by introducing a new label 'null', which indicates that the token is not a semantic argument of the predicate. As far as we know, there has been some debate on whether merging the AI and AC steps is beneficial. The joint method can reduce the error propagation from the AI step to the AC step; however, at the same time, the training samples with label 'null' vastly outnumber the other labels, which is a drawback during learning. In our work, we apply a separate framework that carries out the AI and AC steps in a pipeline, since it is much more intuitive. We use features from Björkelund et al. (2009) and Yang and Zong (2014) along with some new features in the AI and AC steps. Table 2 lists the features used in each step, in which we use the mark † to indicate the proposed features.
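A minimal sketch of the pipelined AI and AC steps described above; the two classifier callables are toy stand-ins for the trained models:

```python
def srl_pipeline(tokens, predicate_idx, is_argument, classify_role):
    """Run AI then AC for one predicate; returns {token_idx: role}."""
    roles = {}
    for i, tok in enumerate(tokens):
        if i == predicate_idx:
            continue  # the predicate itself is not a candidate
        if is_argument(tok, predicate_idx):          # AI: binary decision
            roles[i] = classify_role(tok, predicate_idx)  # AC: role label
    return roles

# Toy stand-ins for the trained AI/AC classifiers:
tokens = ["he", "bought", "books"]
roles = srl_pipeline(
    tokens, 1,
    is_argument=lambda tok, p: tok in ("he", "books"),
    classify_role=lambda tok, p: "A0" if tok == "he" else "A1",
)
print(roles)  # {0: 'A0', 2: 'A1'}
```

Running AI first keeps the 'null' tokens out of the AC training data, at the cost of propagating AI errors into AC, which is the trade-off discussed above.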

Syntactic knowledge acquisition
We constructed two types of syntactic knowledge, namely predicate-argument structures and case frames.

High-quality predicate-argument structure extraction
Predicate-argument structures (PAS) have typically been acquired from syntactic analyses, which vary from phrase chunking to syntactic dependency parsing. For example, English PAS in surface cases were acquired on a large scale using a chunking-based system (Kawahara and Kurohashi, 2010). Some phenomena in Chinese, such as omission and complex grammar, make it intractable to automatically extract PAS using only shallow syntactic analysis such as chunking, so syntactic dependency parsing is applied for Chinese PAS extraction. Arguments are represented by their syntactic dependency labels (i.e., subject, object, etc.). Due to various factors, Chinese syntactic dependency parsing performs relatively poorly compared to that of English, Japanese, etc. However, using an existing treebank, it is possible to train a classifier to acquire high-quality PAS by using only highly reliable syntactic dependencies. We therefore applied syntactic dependency parsing to large-scale raw corpora and adopted the high-quality syntactic dependency selection approach (Jin et al., 2014). Their approach first trains a base parser using one part of the Chinese treebank and then applies syntactic dependency parsing to the raw text of another part of the same treebank. According to the gold-standard annotations, both positive and negative samples are then collected to train a binary classifier, which selects those dependencies more likely to be correct. We follow their method for the compilation of high-quality PAS, which can provide a massive amount of syntactic knowledge.
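The training-data collection for the dependency selector can be sketched as follows: each automatic arc is labeled positive if it matches the gold head and negative otherwise. The feature dictionary is a placeholder assumption; the real arc features and the SVM training are omitted:

```python
def collect_selection_samples(pred_heads, gold_heads):
    """Label each automatic arc: 1 if it matches the gold head, else 0.

    pred_heads / gold_heads: parallel lists giving the head index
    assigned to each token by the parser and by the treebank.
    """
    samples = []
    for dep, (p_head, g_head) in enumerate(zip(pred_heads, gold_heads)):
        label = 1 if p_head == g_head else 0
        features = {"dep": dep, "head": p_head}  # placeholder arc features
        samples.append((features, label))
    return samples

# One short toy sentence: the parser got two of three heads right.
samples = collect_selection_samples([2, 0, 2], [2, 0, 1])
print([label for _, label in samples])  # [1, 1, 0]
```

The resulting (features, label) pairs are exactly the positive and negative samples on which a binary classifier such as an SVM can then be trained.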

High-quality case frame compilation
In NLP, at the level of syntax, case frames compiled from PAS have been proposed as strong support for various kinds of tasks (Kawahara and Kurohashi, 2006). For each predicate, all the PAS are clustered into different case frames to reflect different semantic usages. We show an example of case frames for the verb '谢', which has multiple meanings, in Table 3. '谢(1)' is the case frame used to represent the sense of 'withering of a flower'. Similarly, for the sense of '谢' that means 'to thank', the applicable case frame is '谢(2)'. '谢(3)' is the case frame for the sense of 'curtain call'. In other words, case frames are knowledge that resolves word sense ambiguity by clustering the PAS. We applied the CRP method for clustering the high-quality PAS to compile high-quality case frames.
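A greatly simplified, greedy sketch of the clustering idea follows. In the CRP spirit, each PAS instance joins an existing cluster with affinity proportional to cluster size times argument overlap, or opens a new cluster with weight `alpha`; the scoring rule here is an illustrative assumption, not the exact sampling procedure used for the actual case frames:

```python
def argument_overlap(pas, cluster):
    """Count argument words shared between an instance and a cluster."""
    return sum(len(pas[case] & args)
               for case, args in cluster.items() if case in pas)

def cluster_pas(instances, alpha=1.0):
    """Greedy CRP-style clustering of PAS instances into case frames.

    instances: list of dicts mapping a surface case (e.g. 'SUBJ')
    to a set of argument words. Ties prefer the existing cluster.
    """
    clusters, sizes = [], []
    for pas in instances:
        scores = [sizes[k] * argument_overlap(pas, c)
                  for k, c in enumerate(clusters)]
        scores.append(alpha)  # weight of opening a new "table"
        best = max(range(len(scores)), key=scores.__getitem__)
        if best == len(clusters):
            clusters.append({case: set(args) for case, args in pas.items()})
            sizes.append(1)
        else:
            for case, args in pas.items():
                clusters[best].setdefault(case, set()).update(args)
            sizes[best] += 1
    return clusters

instances = [{"SUBJ": {"flower"}},
             {"SUBJ": {"flower", "leaf"}},
             {"SUBJ": {"he"}, "OBJ": {"friend"}}]
print(len(cluster_pas(instances)))  # 2: 'wither' vs. 'thank'-like usages
```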

Using syntactic knowledge for SRL
The motivation for using large-scale syntactic knowledge is to complement the syntactic information in the limited-size training data. In SRL, an argument may have no direct syntactic relation with a given predicate but still play a semantic role of that predicate. However, this kind of argument is difficult to identify. Consider an example sentence that can be translated as "promulgated and implemented files involving multiple fields." "文件(file)" is a child of "颁布(promulgate)" in the dependency tree and is labeled as the semantic role "A1" of "颁布(promulgate)". Even though "文件(file)" does not have a direct dependency relation with "实行(implement)", it is still regarded as a semantic role "A1" of "实行(implement)". Similarly, "文件(file)" also plays the semantic role "A0" of the verb "涉及(involve)" with no direct dependency relation. However, both direct syntactic dependencies "实行(implement) 文件(file)" and "文件(file) 涉及(involve)" appear frequently in real world text. Such patterns in surface cases captured from large-scale corpora would be important clues for SRL.
In addition, some special surface cases such as "BA" and "LB/SB" explicitly indicate accusative case and nominative case, which are most of the time labeled "A1" and "A0" respectively in the PropBank-style SRL specification. "用/以(use)" is a preposition that strongly indicates the semantic role "MNR", and "在(at)" is a preposition that usually stands for the semantic role "LOC" or "TMP". Therefore, it is promising to use large-scale syntactic knowledge as an additional resource.
We created three kinds of additional feature sets extracted from the above-mentioned syntactic knowledge for SRL. Firstly, we used the large-scale automatically acquired surface case predicate-argument structures: for each predicate-argument pair, we measured their point-wise mutual information (PMI). Secondly, we used the frequency of an argument candidate being a certain syntactic role. Finally, considering the effect of word sense ambiguity, for each predicate sense we calculated the frequency of an argument being a certain syntactic role of the predicate from the corresponding case frames. For all of the additional features, we used binned values (i.e., high, middle and low).
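The PMI feature and its binning can be sketched as follows. The toy counts and the bin thresholds are illustrative assumptions; our actual thresholds differ:

```python
import math
from collections import Counter

def pmi(pair_counts, pred_counts, arg_counts, total, pred, arg):
    """Point-wise mutual information of a predicate-argument pair:
    log( p(pred, arg) / (p(pred) * p(arg)) )."""
    p_xy = pair_counts[(pred, arg)] / total
    p_x = pred_counts[pred] / total
    p_y = arg_counts[arg] / total
    return math.log(p_xy / (p_x * p_y))

def bin_value(value, hi=1.0, lo=0.0):
    """Bin a real-valued score into the coarse feature values used here."""
    return "high" if value >= hi else ("middle" if value >= lo else "low")

# Toy counts standing in for the large-scale PAS statistics:
pair_counts = Counter({("eat", "apple"): 4, ("eat", "car"): 1,
                       ("drive", "car"): 4, ("drive", "apple"): 1})
pred_counts = Counter({"eat": 5, "drive": 5})
arg_counts = Counter({"apple": 5, "car": 5})
score = pmi(pair_counts, pred_counts, arg_counts, 10, "eat", "apple")
print(bin_value(score))  # 'middle' (PMI = ln 1.6, between 0 and 1)
```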
          w/o selection   select 50%   select 20%
UAS       0.677           0.824        0.920
Table 4: Quality of selected dependencies under different selection criteria (UAS).

Table 5: Evaluation results of Chinese SRL. The ** mark and * mark mean that the result is regarded as significant (with a p value < 0.01 and a p value < 0.05 respectively) using McNemar's test.

Note that a case frame id and a PropBank sense id do not correspond to each other. As a result, a mapping process which aligns case frame id(s) to PropBank verb senses is needed. For example, for the sense '谢.01' of the verb '谢', we extracted and grouped all the related predicate-argument structures. Then we calculated the similarity between the verb sense '谢.01' and each case frame (i.e., '谢(1)', '谢(2)', etc.) by matching the corresponding predicate-argument structures that they are composed of. To determine the similarity between the two groups of predicate-argument structures, we used the method proposed by Kawahara and Kurohashi (2001). This ensures that each case frame id is aligned to its most similar verb sense in PropBank.
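The alignment step can be sketched as follows; for illustration, a plain Jaccard similarity over argument words stands in for the similarity measure of Kawahara and Kurohashi (2001), and the toy frames and senses are invented:

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of argument words."""
    return len(a & b) / len(a | b) if a | b else 0.0

def align_case_frames(case_frames, senses):
    """Map each case frame id to its most similar verb sense.

    case_frames / senses: dict of id -> set of argument words drawn
    from the grouped predicate-argument structures.
    """
    return {cf_id: max(senses, key=lambda s: jaccard(cf_args, senses[s]))
            for cf_id, cf_args in case_frames.items()}

case_frames = {"谢(1)": {"flower", "leaf"}, "谢(2)": {"guest", "help"}}
senses = {"谢.01": {"guest", "gift", "help"}, "谢.02": {"flower", "grass"}}
print(align_case_frames(case_frames, senses))
# {'谢(1)': '谢.02', '谢(2)': '谢.01'}
```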

Experimental settings
For large-scale syntactic knowledge acquisition, 30 million sentences from Chinese Gigaword 5.0 (LDC2011T13) 2 were used. For the high-quality dependency selection approach in the knowledge construction pipeline, the Stanford parser was used to apply syntactic dependency parsing to the raw texts of Chinese Gigaword. The training section of Chinese Treebank 7.0 was used to train the dependency parser, and the official development section was used to train a classifier for high-quality dependency selection. Judging whether the automatic dependencies are reliable can be regarded as a binary classification problem, for which we utilized support vector machines (SVMs). Specifically, we employed SVM-Light 3 with a linear kernel to select high-quality dependencies from the large-scale automatic dependency parses of Chinese Gigaword for syntactic knowledge construction. Using the official evaluation section of CTB 7.0, we evaluated the quality of those selected dependencies with the unlabeled attachment score (UAS), which is the percentage of correctly identified dependency heads.
For SRL, we used the Chinese section of the CoNLL-2009 shared task data for our experiments. Automatically obtained morphological and syntactic information (the columns beginning with "P") was used. The PD, AI, and AC steps are regarded as multi-class classification problems, which we solved with OPAL 4. We set the options as follows: polynomial kernel with degree 2; passive-aggressive I learner; 20 iterations. The SRL system without additional syntactic knowledge was used as a baseline. To examine the effect of different qualities of syntactic knowledge, we used different sets of PAS extracted under different dependency selection thresholds (20%, 50%, w/o selection). The official script provided on the CoNLL-2009 shared task website was used for evaluation.

Experimental results
Table 4 shows the quality of selected dependencies using different selection criteria. The precision of automatic syntactic dependencies increases when we lower the recall. Table 5 shows our experimental results using the syntactic knowledge-based features. Syntactic knowledge (x%) indicates that the top x% (according to the classifier) of the automatically extracted syntactic knowledge was used. '100%' means that the dependency selection step was not performed.
Our baseline system performs as well as the best system in the CoNLL-2009 shared task. As we can see from the results, using large-scale syntactic knowledge helps improve the performance of SRL. Syntactic knowledge extracted from automatic parses without any selection (100%) contains a lot of noise and hence is not beneficial at all. However, filtering out noisy syntactic knowledge leads to a significant improvement in the Chinese SRL task. This shows that selecting high-quality dependencies is an important aspect of high-quality SRL.

Conclusion
In this paper, we have used high-quality syntactic knowledge to improve Chinese SRL. The result showed that this kind of knowledge has a positive effect on the SRL performance. The quality of syntactic knowledge turns out to be an important factor in such a semi-supervised learning approach.
In the future, we plan to make use of other low level knowledge such as word embeddings (Collobert et al., 2011) or word clusters (Koo et al., 2008), which can be complementary to our syntactic level knowledge. Since recent SRL approaches are mostly point-wise (i.e., features are extracted from pairs of the predicate and an argument candidate), we plan to design a higher-order system to capture more global features. Also, reranking is widely utilized in many SRL systems, and we plan to combine our surface case knowledge with a reranker in order to further improve Chinese SRL. Finally, we plan to experiment on different languages and compare the effectiveness of syntactic knowledge across languages.