Shallow Discourse Parsing Using Constituent Parsing Tree

This paper describes our system in the closed track of the CoNLL-2015 shared task. We formulate discourse parsing as a series of classification sub-tasks. The official evaluation shows that the proposed framework gives competitive results, and we also discuss potential improvements.


System Overview
We design our shallow discourse parser as a sequential pipeline that mimics the annotation procedure of the Penn Discourse Treebank (PDTB in the rest of this paper) annotators (Lin et al., 2014). Figure 1 gives the pipeline of the system. The system can be roughly split into two parts: the explicit part and the non-explicit part. The explicit part consists of three sequential steps: the Explicit Classifier, the Explicit Argument Labeler, and the Explicit Sense Classifier. The non-explicit part consists of the Filter, the Non-explicit Classifier, and the Non-explicit Sense Classifier. Non-explicit relations include 'Implicit', 'AltLex', and 'EntRel', but not 'NoRel'.
We adopt an adapted maximum entropy model as the classification algorithm for every step. Our system only exploits resources provided by the organizers.
* This work of C. Chen, P. Wang, and H. Zhao
We first give a brief introduction to each step of the system. After the Explicit Classifier detects explicit connectives, the Explicit Argument Labeler prunes and classifies the 'Arg1' and 'Arg2' candidates of each detected connective. The Explicit Sense Classifier then integrates the results of the previous two steps to distinguish between senses. The second part of the system starts by filtering out obvious false cases. The Non-explicit Classifier then classifies the non-explicit relations into three classes, i.e., 'Implicit', 'AltLex', and 'EntRel'. Finally, the Non-explicit Sense Classifier determines the sense of each non-explicit relation. In the last two steps, we take 'EntRel' as a sense of the implicit relation, which we will explain later.
In the explicit part, our parser extracts the explicit relations. An example of an explicit relation is given below.
He added that "having just one firm do this isn't going to mean a hill of beans. But if this prompts others to consider the same thing, then it may become much more important."
'Arg1' is shown in italic, and 'Arg2' is shown in bold. The discourse connective is underlined and the sense of this explicit relation is 'Comparison.Concession'.

Explicit Classifier
There are 100 explicit connectives in the PDTB annotation (Prasad et al., 2008). However, some connectives, e.g., 'and', do not always express a discourse relation. We use a level-order traversal to scan every node in the constituent parse tree and select the connective candidates. This method gives us a high recall on the training set, as shown in Table 1.
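The candidate selection can be sketched as a breadth-first scan over the parse tree. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Node` class, the tiny `CONNECTIVES` set, and the rule of matching a node's full yield against the connective list are all hypothetical choices made for the example.

```python
from collections import deque


class Node:
    """Minimal constituent-tree node (hypothetical helper for this sketch)."""

    def __init__(self, label, children=None, word=None):
        self.label = label
        self.children = children or []
        self.word = word  # set only on POS-tag (leaf) nodes
        self.parent = None
        for child in self.children:
            child.parent = self

    def leaves(self):
        if self.word is not None:
            return [self.word]
        return [w for child in self.children for w in child.leaves()]


# Illustrative subset only; the PDTB list has 100 connectives.
CONNECTIVES = {"but", "and", "if", "in addition"}


def connective_candidates(root):
    """Return nodes whose yield is a known connective, in level order."""
    found = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        text = " ".join(node.leaves()).lower()
        # An only child has the same yield as its parent, so skip it
        # to avoid reporting the same connective twice.
        unary_child = node.parent is not None and len(node.parent.children) == 1
        if text in CONNECTIVES and not unary_child:
            found.append(node)
        queue.extend(node.children)
    return found


tree = Node("S", [
    Node("CC", word="But"),
    Node("SBAR", [
        Node("IN", word="if"),
        Node("S", [Node("NP", [Node("DT", word="this")]),
                   Node("VP", [Node("VBZ", word="works")])]),
    ]),
    Node("NP", [Node("PRP", word="it")]),
    Node("VP", [Node("VBZ", word="matters")]),
])
candidates = connective_candidates(tree)
```

Every match is only a candidate; deciding whether it actually functions as a discourse connective is left to the classifier.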
Seven features are considered:
a) Self Category: The highest node in the parse tree that covers the connective.
b) Parent Category: The category of the parent of the self category.
c) Left Sibling Category: The syntactic category of the immediate left sibling of the self category. It is 'NONE' if the self category is the leftmost node.
d) Right Sibling Category: The syntactic category of the immediate right sibling of the self category. It is 'NONE' if the self category is the rightmost node.
e) VP Existence: A binary feature that indicates whether the right sibling contains a VP.
f) Connective: In addition to the features proposed by Pitler and Nenkova, we introduce a connective feature. The potential connective itself is a strong sign of its function, and a few discourse connectives are deterministic. For example, 'in addition' will always be 'Expansion.Conjunction'.
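The features listed above could be read off the tree roughly as follows. This is a sketch only: the `Node` class, the feature names in the returned dictionary, and the choice to lowercase the connective are assumptions made for illustration.

```python
class Node:
    """Minimal constituent-tree node (hypothetical helper for this sketch)."""

    def __init__(self, label, children=None, word=None):
        self.label = label
        self.children = children or []
        self.word = word
        self.parent = None
        for child in self.children:
            child.parent = self


def contains_label(node, label):
    """True if the node or any of its descendants carries the given label."""
    return node.label == label or any(
        contains_label(child, label) for child in node.children
    )


def explicit_features(self_node, connective):
    """Build the feature dictionary for one connective candidate."""
    parent = self_node.parent
    siblings = parent.children if parent is not None else [self_node]
    idx = next(i for i, s in enumerate(siblings) if s is self_node)
    left = siblings[idx - 1] if idx > 0 else None
    right = siblings[idx + 1] if idx + 1 < len(siblings) else None
    return {
        "self_cat": self_node.label,
        "parent_cat": parent.label if parent is not None else "NONE",
        "left_sib_cat": left.label if left is not None else "NONE",
        "right_sib_cat": right.label if right is not None else "NONE",
        "right_has_vp": right is not None and contains_label(right, "VP"),
        "connective": connective.lower(),
    }


tree = Node("S", [
    Node("CC", word="But"),
    Node("S", [Node("NP", [Node("PRP", word="it")]),
               Node("VP", [Node("VBZ", word="matters")])]),
])
features = explicit_features(tree.children[0], "But")
```

A dictionary of categorical features like this can be passed to any maximum entropy trainer after one-hot encoding.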
Maximum entropy classifiers have shown good performance in various previous works (Wang et al., 2014; Jia et al., 2013; Zhao and Kit, 2008). Based on these features, we train a maximum entropy classifier. To measure the performance of the classifier alone, we evaluate it on the connective candidates selected by the level-order traversal. This gives 93.87% accuracy and a 90.1% F1 score on the dev set.

Explicit Argument Labeler
With all explicit connectives detected, we exploit a constituent-based approach to perform argument labeling (Kong et al., 2014). Along the path from the connective node to the root node in the constituent parse tree, all the siblings of every node on the path are selected as candidates for 'Arg1' and 'Arg2'. We compare these candidates with the PDTB annotation to label them as 'Arg1', 'Arg2', or 'NULL'. On its own, this pruning strategy only covers the intra-sentence case. Kong et al. unified the intra- and inter-sentence cases by treating the immediately preceding sentence as a special constituent. Based on our empirical results, the inter-sentence case only contributes candidates for 'Arg1'. Kong et al. also reported very high recall (80-90%) on 'Arg1' and 'Arg2' extraction, whereas our re-implementation only reaches recalls of 37.5% and 51.3% for 'Arg1' and 'Arg2', respectively, and about 87.75% of all the constituents produced by pruning are labeled as 'NULL'. Similar to treating the immediately preceding sentence as an 'Arg1' candidate, we take the remaining part of the sentence that is adjacent to the connective as an 'Arg2' candidate. This approach boosts 'Arg2' recall to as high as 93.1%.
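The intra-sentence pruning step described above amounts to collecting siblings while climbing from the connective to the root. The sketch below assumes a hypothetical `Node` class with parent pointers and leaves out the two extra candidates (the preceding sentence and the remainder of the sentence after the connective).

```python
class Node:
    """Minimal constituent-tree node (hypothetical helper for this sketch)."""

    def __init__(self, label, children=None, word=None):
        self.label = label
        self.children = children or []
        self.word = word
        self.parent = None
        for child in self.children:
            child.parent = self


def argument_candidates(conn_node):
    """Collect the siblings of every node on the path from connective to root."""
    candidates = []
    node = conn_node
    while node.parent is not None:
        candidates.extend(s for s in node.parent.children if s is not node)
        node = node.parent
    return candidates


# (S (NP He) (VP (VBD left) (SBAR (IN because) (S (NP it) (VP rained)))))
connective = Node("IN", word="because")
tree = Node("S", [
    Node("NP", [Node("PRP", word="He")]),
    Node("VP", [
        Node("VBD", word="left"),
        Node("SBAR", [
            connective,
            Node("S", [Node("NP", [Node("PRP", word="it")]),
                       Node("VP", [Node("VBD", word="rained")])]),
        ]),
    ]),
])
candidate_labels = [c.label for c in argument_candidates(connective)]
```

Because only siblings of path nodes are collected, no candidate can contain the connective itself.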
We extract features from the constituent parse tree (Zhao and Kit, 2008; Zhao et al., 2009). The extracted features can be divided into two parts. The first part captures information about the connective itself:
a) Con-str: Case-sensitive string of the given connective.
b) Con-Lstr: The lowercase string of the connective.
c) Con-iLSib: Number of left siblings of the connective.
d) Con-iRSib: Number of right siblings of the connective.
The second part consists of features from the syntactic constituent:
e) NT-Ctx: Context of the constituent. We use the POS combination of the constituent, its parent, its left sibling, and its right sibling to represent the context.
f) Con-NT-Path: The path from the parent of the connective to the node of the constituent.
g) Con-NT-Position: The position of the constituent relative to the connective: left, right, or previous.
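One way to compute a path feature such as Con-NT-Path is to climb from the connective's parent to the lowest common ancestor and then descend to the constituent. The sketch below is illustrative only: the `Node` class and the string encoding (`^` for an upward step, `>` for a downward step) are assumptions, since the encoding is not specified here.

```python
class Node:
    """Minimal constituent-tree node (hypothetical helper for this sketch)."""

    def __init__(self, label, children=None, word=None):
        self.label = label
        self.children = children or []
        self.word = word
        self.parent = None
        for child in self.children:
            child.parent = self


def ancestors(node):
    """Return the chain from the node itself up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def con_nt_path(conn_node, constituent):
    """Label path from the connective's parent to the constituent."""
    up = ancestors(conn_node.parent)
    down = ancestors(constituent)
    position = {id(n): i for i, n in enumerate(down)}
    for i, node in enumerate(up):
        if id(node) in position:  # lowest common ancestor reached
            climb = [n.label for n in up[:i + 1]]
            descend = [n.label for n in reversed(down[:position[id(node)]])]
            return "^".join(climb) + "".join(">" + label for label in descend)
    raise ValueError("nodes are not in the same tree")


connective = Node("IN", word="because")
inner_s = Node("S", [Node("NP", [Node("PRP", word="it")]),
                     Node("VP", [Node("VBD", word="rained")])])
subject = Node("NP", [Node("PRP", word="He")])
tree = Node("S", [
    subject,
    Node("VP", [Node("VBD", word="left"), Node("SBAR", [connective, inner_s])]),
])
path_to_subject = con_nt_path(connective, subject)
path_to_clause = con_nt_path(connective, inner_s)
```

A candidate taken from the previous sentence lives in a different tree, so it would need a separate placeholder value rather than a computed path.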
After the parser categorizes all the candidate constituents into 'Arg1', 'Arg2', and 'NULL', Kong et al. adopted integer linear programming to impose the constraints that the number of 'Arg1' and 'Arg2' constituents should each be no less than one and that the extracted arguments should not overlap with the connective. Our experiments show that some of these constraints are unnecessary. One example is the constraint that the candidates produced by pruning should not overlap with the connective: because the pruning algorithm only considers the siblings of the nodes along the path, no candidate can overlap with the connective node.
Without considering the errors propagated by the pruning process, the argument labeler gives the results shown in Table 2.

Explicit Sense Classifier
In this part we take a naive approach that assigns each detected explicit connective its most frequent sense. A better approach would be to build a sense classifier with syntactic features of the connective, such as its POS, and the position and length of the arguments.
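The most-frequent-sense baseline can be sketched as a lookup table built from training pairs. The function names, the lowercasing of connectives, the fallback sense for unseen connectives, and the toy training pairs are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def train_most_frequent_sense(pairs):
    """Map each connective to the sense it carries most often in training."""
    counts = defaultdict(Counter)
    for connective, sense in pairs:
        counts[connective.lower()][sense] += 1
    return {conn: senses.most_common(1)[0][0] for conn, senses in counts.items()}


def predict_sense(model, connective, default="Expansion.Conjunction"):
    """Look up the stored sense; the default for unseen connectives is an assumption."""
    return model.get(connective.lower(), default)


# Toy training pairs for illustration only, not real PDTB counts.
training_pairs = [
    ("but", "Comparison.Contrast"),
    ("But", "Comparison.Contrast"),
    ("but", "Comparison.Concession"),
    ("in addition", "Expansion.Conjunction"),
]
model = train_most_frequent_sense(training_pairs)
predicted = predict_sense(model, "But")
```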

Non-Explicit Part
This part builds on the results of the explicit part. We assume that explicit and non-explicit relations cannot exist in the same sentence simultaneously, so we take out the sentences that have been labeled as explicit in the first part. We then take all the adjacent sentences left in the article as candidate implicit relations. There are 13,155 implicit relations in the training set.

Filter
Apart from filtering out the sentences with explicit connectives, we also discard sentence pairs that span two paragraphs. After these two filtering steps we get 8,728 non-explicit relations.
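The two filtering steps can be sketched as a single pass over adjacent sentence pairs. The `Sentence` record and its fields are hypothetical, and the sketch assumes one reading of the description: a pair is dropped if either sentence carries an explicit relation or if the two sentences lie in different paragraphs.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    """Hypothetical sentence record used only for this sketch."""
    index: int
    paragraph: int
    has_explicit: bool


def non_explicit_candidates(sentences):
    """Return index pairs of adjacent sentences that survive both filters."""
    pairs = []
    for prev, curr in zip(sentences, sentences[1:]):
        if prev.has_explicit or curr.has_explicit:
            continue  # explicit and non-explicit relations assumed exclusive
        if prev.paragraph != curr.paragraph:
            continue  # pair spans a paragraph boundary
        pairs.append((prev.index, curr.index))
    return pairs


article = [
    Sentence(0, 0, False),
    Sentence(1, 0, False),
    Sentence(2, 0, True),
    Sentence(3, 1, False),
    Sentence(4, 1, False),
    Sentence(5, 2, False),
]
candidate_pairs = non_explicit_candidates(article)
```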

Non-explicit Classifier
At first glance, we should build a classifier that can distinguish the relations 'Implicit', 'AltLex', and 'EntRel'. The distribution of these relations in the training set is given in Table 3. We can see that 'AltLex' only covers about 2.94%, which is negligible compared with 'Implicit' (73.85%) and 'EntRel' (23.2%). We therefore focus only on the latter two relations. Instead of building a dedicated classifier for this step, we label all the non-explicit relations as 'Implicit' here, and view 'EntRel' as a sense of the implicit relation.

Non-explicit Sense Classifier
Looking at the distribution of all senses in the training set, we can see that the most frequent sense is 'EntRel'. This leads to another strategy: we first label all the candidate non-explicit relations as 'Implicit' and view 'EntRel' as a sense. Then, when the Non-explicit Sense Classifier labels the sense as 'EntRel', it re-labels the type of the corresponding relation as 'EntRel'.
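The re-labeling strategy reduces to a small post-processing step on the predicted sense. In the sketch below, the function name and the `Type` and `Sense` keys of the output record are assumptions chosen for illustration.

```python
def finalize_relation(predicted_sense):
    """Turn a predicted sense into a relation record, promoting 'EntRel' to a type."""
    if predicted_sense == "EntRel":
        return {"Type": "EntRel", "Sense": ["EntRel"]}
    return {"Type": "Implicit", "Sense": [predicted_sense]}


entity_relation = finalize_relation("EntRel")
implicit_relation = finalize_relation("Expansion.Conjunction")
```

With this step a single multi-class sense classifier covers both decisions, so no separate type classifier is needed.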
Previous studies attempt to predict the missing connective of implicit relations (Zhou et al., 2010). It has been shown that the connective is very predictive of the sense of the relation (Kong et al., 2014). Consequently, features that help predict the missing connective should also help predict the implicit sense. We therefore use word-pair features to train our Non-explicit Sense Classifier:
a) Arg1First: The first word of 'Arg1'.
b) Arg1Last: The last word of 'Arg1'.
c) Arg2First: The first word of 'Arg2'.
d) Arg2Last: The last word of 'Arg2'.
e) FirstS: Arg1First + Arg2First.
f) LastS: Arg1Last + Arg2Last.
g) Arg1First3: The first three words of 'Arg1'.
h) Arg1Last3: The last three words of 'Arg1'.
i) Arg2First3: The first three words of 'Arg2'.
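The word-pair features above can be extracted from tokenized arguments as in the following sketch. It assumes both arguments are non-empty token lists, joins combined features with an underscore (an arbitrary choice), and takes Arg1Last3 from 'Arg1', as the feature name suggests.

```python
def non_explicit_features(arg1_tokens, arg2_tokens):
    """Build the first/last word features for one non-explicit candidate."""
    arg1_first, arg1_last = arg1_tokens[0], arg1_tokens[-1]
    arg2_first, arg2_last = arg2_tokens[0], arg2_tokens[-1]
    return {
        "Arg1First": arg1_first,
        "Arg1Last": arg1_last,
        "Arg2First": arg2_first,
        "Arg2Last": arg2_last,
        "FirstS": f"{arg1_first}_{arg2_first}",
        "LastS": f"{arg1_last}_{arg2_last}",
        "Arg1First3": " ".join(arg1_tokens[:3]),
        "Arg1Last3": " ".join(arg1_tokens[-3:]),  # assumed to come from Arg1
        "Arg2First3": " ".join(arg2_tokens[:3]),
    }


# Toy arguments for illustration only.
features = non_explicit_features(
    ["The", "market", "fell", "sharply"],
    ["Investors", "were", "nervous"],
)
```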

Evaluation
A comprehensive evaluation of our parser is given in Table 5. We can see that the first step of our parser, i.e., the Explicit Classifier, does a moderate job. However, our extraction of 'Arg1' and 'Arg2' cannot be regarded as a success. Since our parser works sequentially, all subsequent steps are negatively affected.

Conclusion and Future Work
In this paper, we propose a sequential system for shallow discourse parsing. We demonstrate that the whole task can be handled by a pipeline consisting of several sub-tasks.
In the future, we will tune our Argument Labeler to obtain better results in the explicit part.