Hybrid Approach to PDTB-styled Discourse Parsing for CoNLL-2015

This paper describes our end-to-end PDTB-styled discourse parser for the CoNLL-2015 shared task. We used a machine learning-based approach to identify discourse relations between text spans, for both explicit and implicit relations, and a rule-based approach to extract the arguments of those relations. In particular, we focus on improving implicit discourse relation identification. First, a two-class classifier extracts, from the entire document, the pairs of adjacent sentences that hold some discourse relation. Second, a multi-class classifier assigns sense labels to these pairs. Our system achieved an overall F-score of 0.316 on the development set, 0.249 on the test set, and 0.157 on the blind test set.


Introduction
In this paper, we describe our end-to-end PDTB-styled discourse parsing system for CoNLL-2015. Our system extends the discourse parser of Ziheng et al. (2014). Our explicit connective-argument structure parser consists of three modules: (1) a connective classifier that decides whether a connective candidate is a discourse connective, (2) an argument position classifier that decides whether Arg1 and the discourse connective occur in the same sentence, and (3) a rule-based argument extractor that extracts both Arg1 and Arg2 using rules derived from the syntactic parse tree. The implicit parser consists of two modules: (1) an argument pair identifier that finds pairs of adjacent sentences holding some discourse relation, and (2) a sense labeler that assigns a sense to the discourse relation between the two sentences.
In addition, we introduce a new evaluation measure for argument extraction. The exact matching of arguments used in "scorer.py", provided by the organizers of CoNLL-2015, is too strict, so we introduce relaxed matching for the task. The metric measures how close the arguments produced by the system are to the gold arguments.
According to the CoNLL-2015 official scorer, our system ranked 5th in Arg1 extraction, 6th in Arg2 extraction, 4th in Arg1&Arg2 extraction, and 8th in overall performance.

Explicit Connective-Argument Identification
The explicit connective-argument parser consists of three steps. First, we identify discourse connectives for an entire document. Second, we determine whether Arg1 is contained in the same sentence that includes the discourse connective. Third, we assign a sense label for each discourse connective.

Connective Classification
The connective classifier decides whether an ambiguous connective candidate such as "and" is a discourse connective. We use lexical features and features obtained from parse trees, extending the feature set of Ziheng et al. (2014). The connective candidates were extracted with the Python script "conn_head_mapper.py" provided by the organizers of CoNLL-2015. The features we used are shown in Table 1. We trained the classifier with an SVM using a second-order polynomial kernel.
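As a rough illustration of the classifier input, the sketch below builds sparse binary features for a connective candidate and evaluates a second-order polynomial kernel on them. Table 1 is not reproduced here, so the feature templates (the connective string, its POS tags, and the previous/next word and POS combined with the connective, in the style of Ziheng et al. (2014)) and the kernel constant `c = 1` are assumptions; parse-tree features are omitted. The helper names `connective_features` and `poly2_kernel` are hypothetical.

```python
from typing import Dict, List


def connective_features(tokens: List[str], pos: List[str], i: int, j: int) -> Dict[str, int]:
    """Binary lexical features for the candidate tokens[i:j] (assumed templates, not Table 1)."""
    conn = " ".join(t.lower() for t in tokens[i:j])
    conn_pos = "_".join(pos[i:j])
    prev_w = tokens[i - 1].lower() if i > 0 else "<BOS>"
    next_w = tokens[j].lower() if j < len(tokens) else "<EOS>"
    prev_pos = pos[i - 1] if i > 0 else "<BOS>"
    next_pos = pos[j] if j < len(tokens) else "<EOS>"
    templates = [
        f"C={conn}",
        f"C_POS={conn_pos}",
        f"prev+C={prev_w}|{conn}",
        f"C+next={conn}|{next_w}",
        f"prevPOS+CPOS={prev_pos}|{conn_pos}",
        f"CPOS+nextPOS={conn_pos}|{next_pos}",
    ]
    return {name: 1 for name in templates}


def poly2_kernel(x: Dict[str, float], y: Dict[str, float], c: float = 1.0) -> float:
    """Second-order polynomial kernel K(x, y) = (x . y + c)^2 over sparse feature dicts."""
    dot = sum(value * y.get(name, 0.0) for name, value in x.items())
    return (dot + c) ** 2
```

In practice such vectors would be passed to an SVM package rather than kernelized by hand; for example, scikit-learn's `SVC(kernel="poly", degree=2)` uses a kernel of the form (gamma * <x, y> + coef0)^2.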

Argument Position Classification
Following Ziheng et al. (2014), we implemented an argument position classifier that classifies the location of the arguments of each discourse connective as "same sentence" (SS) or "previous sentence" (PS). SS indicates that both Arg1 and Arg2 are located in the sentence that contains the discourse connective. PS indicates that Arg1 is located in the sentence preceding the one that contains both the discourse connective and Arg2. We used the context features in Table 1 and the position of the connective $C_s$ in its sentence: start, middle, or end.
This classifier was also trained with an SVM using a second-order polynomial kernel.
Table 1: Features used in the connective classifier
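The connective position feature can be computed as in the minimal sketch below. The text does not define "start", "middle", and "end" precisely; here we assume a connective is at the start if it is the first token and at the end if only punctuation follows it. The punctuation set and the function names are hypothetical, and the remaining Table 1 context features would be added to the same sparse dictionary.

```python
from typing import Dict, List

# Tokens ignored when deciding whether the connective ends the sentence (assumed set).
TRAILING_PUNCT = {".", "!", "?", ",", ";", ":", "''", '"'}


def connective_position(tokens: List[str], start: int, end: int) -> str:
    """Return 'start', 'middle', or 'end' for the connective tokens[start:end]."""
    last = len(tokens)
    # Skip sentence-final punctuation so that "He left, too ." counts as sentence-final.
    while last > 0 and tokens[last - 1] in TRAILING_PUNCT:
        last -= 1
    if start == 0:
        return "start"
    if end >= last:
        return "end"
    return "middle"


def position_features(tokens: List[str], start: int, end: int) -> Dict[str, int]:
    """Sparse SS/PS features: connective string, its position, and the two combined."""
    conn = " ".join(t.lower() for t in tokens[start:end])
    where = connective_position(tokens, start, end)
    return {f"C={conn}": 1, f"pos={where}": 1, f"C+pos={conn}|{where}": 1}
```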

Sense Classification
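We assign the majority sense $\ell^*$ to each discourse connective $C_s$ as follows:
$$\ell^* = \arg\max_{\ell \in L} \mathrm{freq}(C_s, \ell) \tag{1}$$
where $L$ is the set of sense labels used in the training data and $\mathrm{freq}(C_s, \ell)$ returns the number of co-occurrences of the discourse connective $C_s$ and the sense label $\ell$.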
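The majority-sense rule amounts to a lookup table built from training counts. The sketch below assumes the training data is available as (connective, sense) pairs; lowercasing the connectives, the fallback sense for unseen connectives, and the function names are our assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def train_majority_sense(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each connective C_s to argmax_l freq(C_s, l) over (connective, sense) pairs."""
    freq: Dict[str, Counter] = defaultdict(Counter)
    for connective, sense in pairs:
        freq[connective.lower()][sense] += 1
    # most_common(1) picks the highest count; ties go to the sense seen first.
    return {conn: counts.most_common(1)[0][0] for conn, counts in freq.items()}


def predict_sense(model: Dict[str, str], connective: str, default: str) -> str:
    """Look up the majority sense, falling back to `default` for unseen connectives."""
    return model.get(connective.lower(), default)
```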

Implicit Connective-Argument Relationship Identification
The implicit parser consists of two steps. First is the argument identification step. In this step, we examine whether an adjacent sentence pair in the same paragraph has a discourse relation or not. Second is the sense classification step. Given a pair of sentences, we classify it into a predefined sense label.

Argument Position Identification
In the argument identification step, following Ghosh et al. (2011), the identifier examines all adjacent sentence pairs within each paragraph. For each pair of sentences $(S_i, S_{i+1})$, we determine whether a discourse relation holds between them. For this binary classification, we used an SVM with the following features.
• First unigram, last unigram, and first trigram of $S_i$ and $S_{i+1}$.
• Whether $S_i$ (or $S_{i+1}$) contains modality words.
• Brown cluster pair features as defined in Rutherford and Xue (2014).
• Sentence-to-sentence discourse dependency tree features, including the existence of dependency edges and rhetorical relation labels. Discourse dependency trees are defined in Li et al. (2014).
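The lexical part of this feature set can be sketched as follows. The modality word list is not given in the text, so `MODALITY_WORDS` below is a hypothetical stand-in; the Brown cluster pair and discourse dependency features need external resources (cluster files, a discourse dependency parser) and are left out. Sentences are assumed to be non-empty token lists, and the function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical modality lexicon; the paper does not list the words it used.
MODALITY_WORDS = {"can", "could", "may", "might", "must", "shall", "should", "will", "would"}


def sentence_features(tag: str, tokens: List[str]) -> Dict[str, int]:
    """Lexical features of one non-empty sentence, prefixed with tag ('S1' or 'S2')."""
    toks = [t.lower() for t in tokens]
    has_modal = any(t in MODALITY_WORDS for t in toks)
    return {
        f"{tag}_first_uni={toks[0]}": 1,
        f"{tag}_last_uni={toks[-1]}": 1,
        f"{tag}_first_tri={'_'.join(toks[:3])}": 1,
        f"{tag}_modality={has_modal}": 1,
    }


def pair_features(s_i: List[str], s_next: List[str]) -> Dict[str, int]:
    """Concatenate the features of S_i and S_{i+1} into one sparse vector."""
    feats = sentence_features("S1", s_i)
    feats.update(sentence_features("S2", s_next))
    return feats
```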
If the identifier determines that a pair of sentences $(S_i, S_{i+1})$ holds a discourse relation, we heuristically regard $S_i$ as Arg1 and $S_{i+1}$ as Arg2.
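Candidate generation and the Arg1/Arg2 heuristic can be sketched as below. `has_relation` is a hypothetical stand-in for the trained binary SVM (any callable returning a boolean), and documents are assumed to arrive as paragraphs of tokenized sentences.

```python
from typing import Callable, Dict, List

Sentence = List[str]


def extract_implicit_args(
    paragraphs: List[List[Sentence]],
    has_relation: Callable[[Sentence, Sentence], bool],
) -> List[Dict[str, Sentence]]:
    """Check every adjacent sentence pair within each paragraph; keep positives as Arg1/Arg2."""
    relations = []
    for sentences in paragraphs:
        # Pairs never cross a paragraph boundary.
        for s_i, s_next in zip(sentences, sentences[1:]):
            if has_relation(s_i, s_next):
                # Heuristic from the text: S_i is Arg1 and S_{i+1} is Arg2.
                relations.append({"Arg1": s_i, "Arg2": s_next})
    return relations
```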

Sense Classification
In the sense classification step, we classify the discourse relation between a pair of sentences $(S_i, S_{i+1})$ into one of five senses: "Expansion", "Contingency", "Temporal", "Comparison", and "EntRel". For this we used a multi-class SVM with the same features as in the argument position identification step. To increase the amount of training data, we used the (inter-sentential) explicit training data as additional training data (Rutherford and Xue, 2015): we removed the connective from each explicit instance and treated the result as an implicit training instance. The accuracy of classification into the five senses is still low because the sense distribution is imbalanced. To address this, following Rutherford and Xue (2014), we resampled the training instances for sense classification to balance the sense distribution.
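The data augmentation and rebalancing can be sketched as below. The text does not say which resampling scheme was used, so random oversampling of the minority senses up to the size of the largest sense is an assumption, as are the instance layout and the function names. The connective is assumed to be given as token offsets into the sentence that contains it.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# (tokens of S_i, tokens of S_{i+1}, sense label)
Instance = Tuple[List[str], List[str], str]


def remove_connective(tokens: List[str], start: int, end: int) -> List[str]:
    """Drop the explicit connective tokens[start:end] so the pair looks implicit."""
    return tokens[:start] + tokens[end:]


def oversample(instances: List[Instance], seed: int = 0) -> List[Instance]:
    """Randomly duplicate minority-sense instances until every sense matches the largest."""
    if not instances:
        return []
    rng = random.Random(seed)
    by_sense: Dict[str, List[Instance]] = defaultdict(list)
    for inst in instances:
        by_sense[inst[2]].append(inst)
    target = max(len(group) for group in by_sense.values())
    balanced: List[Instance] = []
    for group in by_sense.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced
```

Oversampling keeps every original instance; undersampling the majority senses would be the obvious alternative if training time were a concern.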

Argument Extractor
We used two rule-based argument extractors. One extracts both Arg1 and Arg2 from the same sentence (SS). The other extracts Arg1 and Arg2 from two adjacent sentences, one argument from each (PS).

Subordinating Conjunctions
We adopted the tree subtraction method of Dinesh et al. (2005) for subordinating conjunctions. This method takes a constituent parse tree as input and detects argument spans as follows: (1) set a node variable $x$ to the last word of the target connective; (2) repeatedly set $x$ to its parent until $x$ has label SBAR or S, and set a node variable Arg2 to $x$; (3) again repeatedly set $x$ to its parent until $x$ has label SBAR or S, and set a node variable Arg1 to $x$; (4) take $\mathrm{span}(\mathrm{Arg2})$ as the span of argument 2 and $\mathrm{span}(\mathrm{Arg1}) \setminus \mathrm{span}(\mathrm{Arg2})$ as that of argument 1, where $\mathrm{span}(\cdot)$ maps a node to the set of words it dominates.
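The four steps translate directly into a short tree walk. The sketch below includes a minimal Penn Treebank bracket reader so that it runs on its own; the `Node` class and the function names are hypothetical, leaves are indexed by token position, and, following the steps literally, the connective token stays inside the Arg2 span.

```python
import re
from typing import List, Optional, Set, Tuple

CLAUSE_LABELS = {"S", "SBAR"}


class Node:
    def __init__(self, label: str, parent: Optional["Node"] = None) -> None:
        self.label = label
        self.parent = parent
        self.children: List["Node"] = []
        self.index = -1  # token position, set only for leaves (words)


def parse_ptb(text: str) -> List[Node]:
    """Read a bracketed parse such as '(S (NP (PRP I)) ...)'; return its leaves in order."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    stack: List[Node] = []
    leaves: List[Node] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            # "(" followed by "(" is an unlabeled root, as in "( (S ...))".
            labeled = i + 1 < len(tokens) and tokens[i + 1] not in ("(", ")")
            node = Node(tokens[i + 1] if labeled else "", stack[-1] if stack else None)
            if stack:
                stack[-1].children.append(node)
            stack.append(node)
            i += 2 if labeled else 1
        elif tok == ")":
            stack.pop()
            i += 1
        else:  # a word
            leaf = Node(tok, stack[-1])
            leaf.index = len(leaves)
            stack[-1].children.append(leaf)
            leaves.append(leaf)
            i += 1
    return leaves


def span(node: Node) -> Set[int]:
    """Token positions dominated by node."""
    if not node.children:
        return {node.index}
    return set().union(*(span(child) for child in node.children))


def climb_to_clause(x: Node) -> Optional[Node]:
    """Move to the parent at least once, then keep climbing until an S or SBAR node."""
    x = x.parent
    while x is not None and x.label.split("-")[0] not in CLAUSE_LABELS:
        x = x.parent
    return x


def tree_subtraction(parse: str, conn_last: int) -> Optional[Tuple[List[int], List[int]]]:
    """Return (Arg1 positions, Arg2 positions) for a subordinating connective."""
    leaves = parse_ptb(parse)
    arg2_node = climb_to_clause(leaves[conn_last])  # step (2)
    if arg2_node is None:
        return None
    arg1_node = climb_to_clause(arg2_node)  # step (3)
    if arg1_node is None:
        return None
    arg2 = span(arg2_node)  # step (4)
    arg1 = span(arg1_node) - arg2
    return sorted(arg1), sorted(arg2)
```

For "I left because it rained" parsed as `(S (NP (PRP I)) (VP (VBD left) (SBAR (IN because) (S ...))))` with the connective at position 2, the sketch returns Arg1 = [0, 1] ("I left") and Arg2 = [2, 3, 4] ("because it rained").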

Coordinating Conjunctions
For coordinating conjunctions, we also define a rule-based method that works on a constituent tree: (1) set a node variable $x$ to the last word of the target connective; (2) set a node variable $y$ to $x$ and $x$ to the parent of $x$, repeating while the leftmost word in $\mathrm{span}(x)$ is the same as that in $\mathrm{span}(y)$; when the loop ends, add $y$ and the children of $x$ to the right of $y$ to a node set Arg2_set; (3-1) if a node labeled S or SBAR is among the children of $x$ to the right of $y$, set a node variable Arg1 to that node; (3-2) otherwise, repeatedly set $x$ to its parent until $x$ has label SBAR or S, and set a node variable Arg1 to $x$; (4) take $\mathrm{union\_span}(\mathrm{Arg2\_set})$ as the span of argument 2 and $\mathrm{span}(\mathrm{Arg1}) \setminus \mathrm{union\_span}(\mathrm{Arg2\_set})$ as that of argument 1, where $\mathrm{union\_span}(N) = \bigcup_{n \in N} \mathrm{span}(n)$ maps a node set $N$ to the union of the word sets of its nodes.
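The coordinating-conjunction rule can be sketched in the same way; the bracket reader is repeated so that the block runs on its own, and all names are hypothetical. One interpretive assumption: read literally, step (3-1) looks for an S or SBAR node among the children to the right of $y$, but those children already belong to Arg2_set, which would leave argument 1 empty after the subtraction in step (4). The sketch therefore searches the siblings to the left of $y$ (nearest first); this is our reading, not something the text states.

```python
import re
from typing import List, Optional, Set, Tuple

CLAUSE_LABELS = {"S", "SBAR"}


class Node:
    def __init__(self, label: str, parent: Optional["Node"] = None) -> None:
        self.label, self.parent = label, parent
        self.children: List["Node"] = []
        self.index = -1  # token position, set only for leaves (words)


def parse_ptb(text: str) -> List[Node]:
    """Read a bracketed parse; return its leaves in order (parents are linked)."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    stack: List[Node] = []
    leaves: List[Node] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            labeled = i + 1 < len(tokens) and tokens[i + 1] not in ("(", ")")
            node = Node(tokens[i + 1] if labeled else "", stack[-1] if stack else None)
            if stack:
                stack[-1].children.append(node)
            stack.append(node)
            i += 2 if labeled else 1
        elif tok == ")":
            stack.pop()
            i += 1
        else:  # a word
            leaf = Node(tok, stack[-1])
            leaf.index = len(leaves)
            stack[-1].children.append(leaf)
            leaves.append(leaf)
            i += 1
    return leaves


def span(node: Node) -> Set[int]:
    if not node.children:
        return {node.index}
    return set().union(*(span(child) for child in node.children))


def is_clause(node: Node) -> bool:
    return node.label.split("-")[0] in CLAUSE_LABELS


def climb_to_clause(x: Node) -> Optional[Node]:
    """Move to the parent at least once, then keep climbing until an S or SBAR node."""
    x = x.parent
    while x is not None and not is_clause(x):
        x = x.parent
    return x


def coordinating_args(parse: str, conn_last: int) -> Optional[Tuple[List[int], List[int]]]:
    """Return (Arg1 positions, Arg2 positions) for a coordinating connective."""
    leaves = parse_ptb(parse)
    # Steps (1)-(2): climb while x and its child y start with the same word.
    y, x = leaves[conn_last], leaves[conn_last].parent
    while x is not None and min(span(x)) == min(span(y)):
        y, x = x, x.parent
    if x is None:
        return None
    k = next(n for n, child in enumerate(x.children) if child is y)
    arg2_set = x.children[k:]  # y plus the children of x to its right
    # Step (3-1), ASSUMPTION: search the siblings to the LEFT of y, nearest first.
    arg1_node = next((c for c in reversed(x.children[:k]) if is_clause(c)), None)
    if arg1_node is None:
        arg1_node = climb_to_clause(x)  # step (3-2)
    if arg1_node is None:
        return None
    arg2 = set().union(*(span(n) for n in arg2_set))  # step (4): union_span(Arg2_set)
    arg1 = span(arg1_node) - arg2
    return sorted(arg1), sorted(arg2)
```

For "I went home and she stayed" parsed as two coordinated S nodes, the sketch returns Arg1 = "I went home" and Arg2 = "and she stayed"; for the VP coordination "I went home and stayed", step (3-2) climbs to the top S node and yields Arg1 = "I went home" and Arg2 = "and stayed".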

Discourse Adverbials & Implicit Argument Structures
We did not handle discourse adverbial connective-argument structures or inter-sentential implicit argument structures because they are infrequent in the training data.

PS Cases
In the PS cases, our rule-based extraction method is very simple and consists of only two operations: (1) remove sentence-final symbols such as ".", "!", and "?"; and (2) remove the brackets that enclose the sentence at its start and end, such as quotation marks (""). The method repeats (1) and (2) until the span no longer changes.

Evaluation Results
Table 2 shows the official evaluation results. Explicit connective identification and the Arg2 extractor performed well, but the Arg1 extractor and sense classification did not, which significantly degraded the overall performance. Table 3 shows the official evaluation results for explicit relations. Compared with the test set, the accuracies on the blind test set dropped drastically; a likely cause is that our system failed to identify some connectives. Table 4 shows the official evaluation results for implicit relations. Among the participants, our implicit parser performed well (1st in Arg1&Arg2 extraction and 2nd in overall performance). Previous work such as Ghosh et al. (2011) jointly extracted the arguments and classified the sense with a single classifier, whereas our system splits these into an argument extractor and a sense classifier; we attribute the good performance to this split.
"scorer.py" employs exact matching for argument extraction: a system argument is counted as correct only when its span exactly matches the span of the human-annotated argument. However, the boundaries of human-annotated arguments are blurry, and the span chosen by one annotator may differ from that chosen by another. Thus, we also evaluate our argument extractor with relaxed matching. We compute a token-based arg-Fscore between the system argument and the gold argument, defined as follows:
$$\text{arg-Fscore} = \frac{2 \cdot \mathrm{Prec} \cdot \mathrm{Rec}}{\mathrm{Prec} + \mathrm{Rec}}, \qquad \mathrm{Prec} = \frac{|A_s \cap A_g|}{|A_s|}, \qquad \mathrm{Rec} = \frac{|A_s \cap A_g|}{|A_g|},$$
where $A_s$ is the set of token IDs in the system argument and $A_g$ is the set of token IDs in the gold argument. We regard a system argument as correct if its arg-Fscore reaches a given threshold. Figure 1 shows the evaluation results for thresholds from 1.0 to 0.5. With the threshold set to 0.5, the Arg1&Arg2 F-score reached 0.7. This suggests that our system finds the correct positions of most explicit and implicit connectives but often fails to extract the exact argument spans. Moreover, the overall performance is still low because of errors in the sense classification modules.
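A minimal sketch of the relaxed matching, assuming arguments are given as collections of token IDs; reading "reaches the threshold" as `>=` and the function names are our assumptions. With a threshold of 1.0 it reduces to exact matching of the token sets.

```python
from typing import Iterable


def arg_fscore(system: Iterable[int], gold: Iterable[int]) -> float:
    """Token-based F-score between system token IDs A_s and gold token IDs A_g."""
    a_s, a_g = set(system), set(gold)
    overlap = len(a_s & a_g)
    if overlap == 0:  # also covers empty arguments, avoiding division by zero
        return 0.0
    prec = overlap / len(a_s)
    rec = overlap / len(a_g)
    return 2 * prec * rec / (prec + rec)


def relaxed_match(system: Iterable[int], gold: Iterable[int], threshold: float) -> bool:
    """Count the system argument as correct if its arg-Fscore reaches the threshold."""
    return arg_fscore(system, gold) >= threshold
```

For example, a system argument covering tokens 1 to 4 against a gold argument covering tokens 3 to 6 has precision and recall of 0.5, so its arg-Fscore is 0.5: it counts as correct at threshold 0.5 but not at 0.6.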

Conclusion
In this paper, we presented our PDTB-styled full discourse parser for CoNLL-2015, which extends the parser of Ziheng et al. (2014). The experimental results show that our system performed well on explicit connective identification and Arg2 extraction, but not on Arg1 extraction and sense classification.