Zero-shot Chinese Discourse Dependency Parsing via Cross-lingual Mapping

Due to the absence of labeled data, discourse parsing remains challenging in some languages. In this paper, we present a simple and efficient method for zero-shot Chinese text-level dependency parsing that leverages English discourse-labeled data and parsing techniques. We first construct a Chinese-English mapping at the sentence and elementary discourse unit (EDU) levels, and then exploit the parsing results of the corresponding English translations to obtain discourse trees for the Chinese text. This method conducts Chinese discourse parsing automatically, without the need for large-scale Chinese labeled data.


Introduction
Discourse parsing aims to analyze the inner structure of texts, which is fundamental to many natural language processing applications, such as question answering and summarization. The construction of discourse corpora has promoted the development of discourse parsing techniques. In English, the widely-used discourse corpora include the Rhetorical Structure Theory Treebank (RST-DT) and the Penn Discourse TreeBank (PDTB) (Prasad et al., 2008).
Recently, Li et al. (2014a) and Yoshida et al. (2014) proposed the discourse dependency structure (DDS). DDS directly links EDUs, so it has fewer nodes and simpler structures than RST and PDTB trees. In addition, it can easily represent non-projective structures, whereas hierarchical structures need other, more complex mechanisms to do so. DDS is especially important for Chinese. Kang et al. (2019) analyze almost all existing Chinese discourse treebanks and conclude that DDS is the future direction because it strikes the right balance between expressiveness and practicality. However, little research has been done on Chinese DDS. On the one hand, there are no such DDS treebanks in Chinese yet. Most existing Chinese discourse corpora follow PDTB-style or RST-style annotation (Xue, 2012, 2015; Ming, 2008). Building a high-quality DDS corpus from scratch is labor-intensive, and transforming an existing corpus into DDS raises conversion problems. On the other hand, a Chinese discourse parser needs to explore efficient features through trial and error based on the characteristics of Chinese. For these reasons, Chinese text-level dependency parsing remains challenging.
To overcome these problems, we propose a simple and efficient method that conducts zero-shot Chinese discourse dependency parsing by exploiting existing English discourse resources, with no need for Chinese training data. This is motivated by the observation of Chinese-English parallel sentences such as the examples in Fig. 1, whose dependency parsing trees are identical. It can be seen from the figure that the logical organization of a text is similar at the macro discourse level regardless of language, in spite of lexical or grammatical differences.
Based on this observation, we employ machine translation (MT) and English discourse parsing techniques to parse a Chinese text. Our proposed method is simple but feasible, because English discourse dependency parsing has made progress, especially in parsing discourse tree structures (Liu and Lapata, 2017; Kim et al., 2017), and Chinese-to-English MT techniques are relatively mature (Nikolov et al., 2018; Hadiwinoto and Ng, 2018). Specifically, we first use MT to translate a Chinese text into English and then adopt a transition-based English parser to analyze the translated text. Finally, we map this English parsing result back to the Chinese text. During this process, some modifications are made to the MT output and the parsing result to improve performance. To evaluate our proposed method, we manually construct a small dataset, on which our method exhibits promising performance. This corpus will be released soon. The experiment results demonstrate that our method is potentially helpful for building large-scale data for Chinese neural NLG systems that make use of discourse structure. To the best of our knowledge, we are the first to conduct discourse dependency parsing in Chinese.

Chinese Discourse Dependency Corpus Construction
In this work, a small-scale Chinese discourse dependency treebank is constructed for evaluation. We primarily follow the guidelines for building the English discourse dependency treebank SciDTB (Yang and Li, 2018a) to explore the specifics of labeling DDS in Chinese. First, scientific abstracts are chosen as the corpus source, because they are short texts with clear logic and lie in the same domain as SciDTB. Specifically, 108 abstracts are selected from a Chinese NLP journal, JCIP.
Second, we manually separate these abstracts into elementary discourse units (EDUs), the basic units of a parsing tree. Each segmented abstract is checked at least twice to ensure segmentation quality. Our EDU segmentation mainly follows the criteria of RST-DT, with some modifications to the guideline based on the linguistic characteristics of Chinese (Cao et al., 2017; Yang and Li, 2018b). Due to space limitation, we do not list these modifications, as EDU segmentation is not the main focus of this paper. Third, for each abstract, we identify the head of every EDU and the relation type between them, which is the most labor-intensive of all steps. We adopt the head and relation identification guidelines defined in Yang and Li (2018a). The relation categories include 17 coarse-grained and 26 fine-grained relation types. During the annotation process, some relation types are hard to distinguish (e.g., the distinction between the relations "manner-means" and "enablement" is vague). In addition, relative pronouns (e.g., that) and conjunctions (e.g., but) are used less frequently in Chinese (Li et al., 2014b), adding to the difficulty of relation labeling. The primary target of this study is to automate this third step, i.e., to build the discourse tree, with relation types between EDUs identified, for a Chinese text.
Two annotators first learned the annotation principles before starting the annotation work. It took the annotators 3 months to label the 108 abstracts, each labeled at least twice independently in order to check annotation consistency and provide human performance as an upper bound. 30 abstracts are used for validation and the rest for testing. The inter-annotator agreement is 0.780 and 0.673 in terms of UAS and LAS, respectively. In total, there are 1,500 EDUs (including 108 artificial root EDUs), with an average of 12.9 EDUs per abstract, and 1,392 labeled discourse relations. On average, there are 2.91 EDUs per sentence and 22.17 characters per EDU. Table 1 shows the five most frequent relation types, along with their frequencies.
Zero-shot Chinese Dependency Parsing
As stated above, our method aims to generate a dependency parsing tree, with relation types between EDUs identified, for a Chinese text. We assume that gold EDU segmentation has already been performed on the text. Formally, given a Chinese text t_C = (u_1, u_2, ..., u_k) composed of k EDUs, we translate each Chinese EDU u_i directly into an English EDU ū_i (i = 1, 2, ..., k). Translation quality is limited to a certain extent because some EDUs cannot express their precise meaning when taken out of context. Thus, we make some modifications to the translation results before adopting a transition-based parser to generate a discourse dependency tree over the translated English EDUs. Finally, this dependency tree is mapped back onto the EDU-segmented Chinese text. Fig. 1 illustrates the whole process. The main idea is simple; only some technical issues in translation and parsing need addressing, which we introduce in the following subsections.
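The overall pipeline can be sketched as follows. This is a minimal illustration, not the actual implementation: `translate_edu` and `parse_english` are hypothetical stand-ins for the MT system and the transition-based English parser described in the paper.

```python
def zero_shot_parse(chinese_edus, translate_edu, parse_english):
    """Parse a Chinese text given as a list of gold-segmented EDUs.

    1. Translate each Chinese EDU u_i into an English EDU.
    2. Parse the English EDU sequence into a dependency tree.
    3. Map the tree back onto the Chinese EDUs.
    """
    english_edus = [translate_edu(u) for u in chinese_edus]
    tree = parse_english(english_edus)  # list of (head, dependent, relation)
    # Because translation is done EDU-by-EDU, English EDU i corresponds
    # one-to-one with Chinese EDU i, so the tree maps back by index unchanged.
    return tree
```

The key design choice is that the cross-lingual mapping is trivial by construction: translating per EDU preserves the EDU indexing, so no word or span alignment is needed.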

Translation
We translate each Chinese EDU separately, instead of processing the whole text at once, in order to obtain a one-to-one correspondence between translated English EDUs and their Chinese counterparts, and to bypass EDU segmentation in English. However, due to the absence of context information, translation accuracy is sacrificed, which degrades parsing performance. Since our work does not involve improving translation techniques, we only fix some obvious translation problems.
First, Chinese EDUs with incomplete meaning may be mistranslated into sentences ending with a period. As Zhou and Xue (2015) point out, punctuation marks in Chinese can serve as clues for discourse relations, and most competitive Chinese discourse parsing models (Kang et al., 2016) use punctuation as one of their features. Therefore, we stipulate that a translated English EDU can end with a period only if its corresponding Chinese EDU ends with one. All other periods in the translation are replaced with commas.
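This punctuation rule can be sketched as a small post-processing step; the function name and the handling of only the trailing period are our simplification, not the paper's exact implementation.

```python
def fix_period(zh_edu, en_edu):
    """Keep a sentence-final period in the translated English EDU only if
    the Chinese EDU itself ends with the Chinese full stop '。'; otherwise
    replace the trailing period with a comma (the EDU is mid-sentence)."""
    en_edu = en_edu.rstrip()
    if en_edu.endswith('.') and not zh_edu.rstrip().endswith('。'):
        en_edu = en_edu[:-1] + ','
    return en_edu
```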
Second, we adjust which EDU some relative pronouns belong to, because their position is helpful for judging specific relation types (e.g., "attribution"). Since we use EDUs as translation units, the placement of some relative pronouns violates English EDU segmentation criteria. Take u_6 and u_7 in Fig. 1 as an example. Their translations are, respectively: [Experiments show that] ū_6, [the data augmentation effectively mitigates the problem of insufficient resources.] ū_7. Our modification moves "that" from ū_6 to ū_7, because a relative pronoun should stay with the clause it introduces, according to the EDU segmentation criteria of RST-DT.
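A minimal sketch of this adjustment, assuming (as in the example above) that the pronoun to move is a trailing "that"; the real modification may cover more pronouns.

```python
def move_trailing_that(edus):
    """Move a trailing complementizer 'that' from one English EDU to the
    start of the following EDU, so the pronoun stays with the clause it
    introduces, as required by RST-DT segmentation criteria."""
    fixed = list(edus)
    for i in range(len(fixed) - 1):
        words = fixed[i].split()
        if words and words[-1].lower() == 'that':
            fixed[i] = ' '.join(words[:-1])
            fixed[i + 1] = 'that ' + fixed[i + 1]
    return fixed
```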

English Discourse Parsing
We follow the work of Yang and Li (2018a) and implement a two-stage transition-based dependency parser to conduct English parsing. In the first stage, the transition-based method for dependency parsing (Nivre, 2003) is adopted to identify the head of each EDU. We employ the action set of the arc-standard system (Nivre et al., 2004), with an SVM classifier designed to predict the most likely transition action. In the second stage, another SVM classifier is trained to predict relation types.
Since this parser is trained on SciDTB, its performance relies heavily on the features of that corpus. By analyzing the parsing results on the validation data, we find one obvious problem: the parser identifies the topic EDU (the EDU whose head is the root, such as u_4 in Fig. 1) with an accuracy of only 44.95%, while it reaches 85.06% on SciDTB.
To alleviate this problem, we first identify the topic sentence (which includes the topic EDU) in a rule-based way, because it usually begins with certain words, such as "该文"(this paper). Next, we split the passage into two parts with the topic sentence being the beginning of the latter part. The two parts are then parsed separately and joined together. In this way, the topic EDU identification accuracy increases to 68.52%.
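This heuristic can be sketched as follows; the cue-word list is hypothetical (only "该文" is given in the paper), and the real rules may be richer.

```python
TOPIC_CUES = ('该文', '本文')  # hypothetical cue words; '该文' is from the paper

def split_at_topic(sentences):
    """Find the topic sentence by cue words and split the abstract so the
    topic sentence begins the second part. The two parts are then parsed
    separately and their trees joined."""
    for i, sent in enumerate(sentences):
        if sent.startswith(TOPIC_CUES):
            return sentences[:i], sentences[i:]
    return sentences, []  # no cue found: parse the whole text at once
```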

Setup
In our work, we compared several off-the-shelf translation tools and chose Youdao Translator. Following Yang and Li (2018a), we implemented a two-stage transition-based discourse dependency parser to parse the translated English EDUs, with SciDTB as the training corpus. For evaluation, we adopt the unlabeled and labeled attachment scores (UAS and LAS). UAS measures the accuracy of head identification, while LAS measures the accuracy of both head and relation labeling.
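For concreteness, the two metrics can be computed as below; the dictionary-based representation is our own convention for this sketch.

```python
def uas_las(gold, pred):
    """Compute unlabeled/labeled attachment scores over EDUs.
    `gold` and `pred` map each EDU index to a (head, relation) pair."""
    assert set(gold) == set(pred), "both trees must cover the same EDUs"
    correct_heads = sum(pred[i][0] == gold[i][0] for i in gold)  # UAS numerator
    correct_both = sum(pred[i] == gold[i] for i in gold)         # LAS numerator
    n = len(gold)
    return correct_heads / n, correct_both / n
```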

Results
Since there is no previous research on Chinese text-level dependency parsing, and our parsing approach is mainly designed to help construct a large-scale discourse dependency corpus in Chinese, our major concern is what performance this method (named Zero-shot in Table 2) can achieve and how it compares to human performance. We list several parsing results for comparison:
• Random is a transition-based dependency parser that randomly chooses "shift" or "reduce" as its next action and always uses the most frequent relation type, "elab-addition", as the relation label. We test it on our Chinese corpus.
• Supervised(Chinese) is a two-stage transition-based dependency parser trained with 80 abstracts of our Chinese corpus and tested on the remaining 28 abstracts.
• Supervised(English) is a two-stage transition-based dependency parser trained on the training set of SciDTB and evaluated on its test set.
• Human(Chinese) and Human(English) are human performance on our Chinese discourse corpus and SciDTB, respectively.
Table 2 shows the UAS and LAS of the different parsers. The top four rows are tested on our Chinese corpus and the bottom two on SciDTB. From Human(English) and Human(Chinese), we can see that discourse labeling is a difficult task in both languages. Our Zero-shot method significantly outperforms the Random parser, suggesting that parallel English and Chinese texts have similar discourse structures and that our method effectively leverages this similarity. Zero-shot also performs about 12% and 11% higher than Supervised(Chinese) in UAS and LAS respectively, because our corpus is too small to support supervised training well. Compared with Supervised(English), the performance of Zero-shot is acceptable in identifying head EDUs but barely satisfactory in labeling relations, which might be explained by the different statistical distributions of relation types in Chinese and English.
To evaluate the contribution of each modification mentioned in Section 3, we conduct ablation experiments, shown in Table 3. The first line displays the performance of direct parsing without any modifications, and the next three lines show the performance with the modification strategies added in turn. As the table demonstrates, these subtle modifications all help improve performance.
Through error analysis, we find that many errors could be corrected if the parser were given a precise translation. Fig. 2 provides an example where the heads of some EDUs are wrongly labeled but would be correct given the right translation. Translation precision can be improved by considering a larger context than a single EDU, which we leave for future work.

Conclusions
In this paper, we present a simple and efficient method for zero-shot Chinese discourse parsing whose performance is close to that of state-of-the-art English parsers. It opens the possibility of conducting dependency parsing for low-resource languages via cross-lingual mapping, reducing the human labor of corpus construction. In the future, we will further improve our method and test it on more languages and more domains.