Two Practical Rhetorical Structure Theory Parsers

We describe the design, development, and API for two discourse parsers for Rhetorical Structure Theory. The two parsers use the same underlying framework, but one relies on features from dependency syntax, produced by a fast shift-reduce parser, whereas the other uses a richer feature space, including both constituent- and dependency-based syntax and coreference information, produced by the Stanford CoreNLP toolkit. Both parsers obtain state-of-the-art performance and expose a very simple API consisting of, minimally, two lines of Scala code. We accompany this code with a visualization library that runs the two parsers in parallel and displays the two generated discourse trees side by side, which provides an intuitive way of comparing the two parsers.


Introduction
This paper describes the design and development of two practical parsers for Rhetorical Structure Theory (RST) discourse (Mann and Thompson, 1988). This work contributes to the already vast body of research on RST parsing (see, inter alia, Soricut and Marcu, 2003; Feng and Hirst, 2012; Joty et al., 2013; Joty et al., 2014) with the following:
1. We propose two parsers that use constituent-based and dependency-based syntax, respectively. The underlying framework, other than the syntax-based features, is identical between the parsers, which permits a rigorous analysis of the impact of constituent and dependency syntax on RST parsing. We describe the parsers in Section 2 and empirically compare the impact of the two syntactic representations in Section 3. Our analysis indicates that both parsers achieve state-of-the-art performance. The parser based on dependency syntax performs marginally worse (by 0.1 F1 points) but runs approximately 2.5 times faster than the parser based on constituent syntax. On average, the faster parser processes one document from the RST corpus in 2.3 seconds.

2. Both parsers have been released as open-source Scala code with a very simple API, consisting of, minimally, two lines of code. We discuss this API in Section 4.
3. We also introduce a visualization tool that runs the two parsers in parallel, and displays the two generated discourse structures side by side. This allows users to directly compare the runtimes and outputs of the two parsers. This visualization tool will be the centerpiece of the proposed demo session. We summarize this tool in Section 5.

The Two Parsers
The proposed parsing approach follows the architecture introduced by Hernault et al. (2010) and Feng and Hirst (2012). The parser first segments the text into elementary discourse units (EDUs) using an i.i.d. classifier that identifies which tokens end an EDU. Then the parser iteratively constructs the discourse tree (consisting of binary relations between discourse units) using a greedy bottom-up approach that interleaves two classifiers: the first detects which two adjacent discourse units are most likely to be connected given the current sequence of units, and the second labels the corresponding relation. The resulting discourse unit produced by the new relation replaces its two children. The process repeats until there is a single discourse unit spanning the text. We chose this algorithm rather than other recently proposed approaches (Joty et al., 2013; Joty and Moschitti, 2014) because: (a) it promotes a simple, modular architecture; (b) it is fast; and (c) as we show later, it performs well. For classification, we experimented with Support Vector Machines (SVM), Perceptron, and Logistic Regression (LR). The results reported here use Perceptron for EDU segmentation and relation detection, and LR for relation labeling, thus offering a good balance between performance and quick training.
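The greedy bottom-up loop above can be sketched in a few lines of Scala. Note that DiscUnit and the two classifier arguments below are illustrative stand-ins for the trained Perceptron/LR models, not the released API:

```scala
// Sketch of the greedy bottom-up tree construction described above.
// DiscUnit and the two classifier arguments are stand-ins for the
// trained Perceptron/LR models, not the released API.
case class DiscUnit(text: String,
                    children: Seq[DiscUnit] = Nil,
                    label: String = "")

def buildTree(edus: Seq[DiscUnit],
              connectScore: (DiscUnit, DiscUnit) => Double, // how likely are these adjacent units connected?
              labelRelation: (DiscUnit, DiscUnit) => String  // which RST relation holds between them?
             ): DiscUnit = {
  var units = edus.toVector
  while (units.length > 1) {
    // pick the adjacent pair most likely to be directly connected
    val i = (0 until units.length - 1).maxBy(j => connectScore(units(j), units(j + 1)))
    val (left, right) = (units(i), units(i + 1))
    // the new discourse unit produced by the relation replaces its two children
    val merged = DiscUnit(left.text + " " + right.text, Seq(left, right),
                          labelRelation(left, right))
    units = (units.take(i) :+ merged) ++ units.drop(i + 2)
  }
  units.head // a single discourse unit spanning the whole text
}
```

With real classifiers plugged in, the loop terminates after exactly n-1 merges for n EDUs, which is one reason this architecture is fast.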
With respect to features, our approach builds on previous work (Hernault et al., 2010; Feng and Hirst, 2012; Joty et al., 2013) and extends it in two ways. First, we implement all syntactic features using both constituent and dependency syntax. For example, a crucial feature used by the relation detection/labeling classifiers is the dominance relations of Soricut and Marcu (2003), which capture syntactic dominance between discourse units located in the same sentence. While these dominance relations were originally implemented using constituent syntax, we provide an equivalent implementation that relies on dependency syntax. There are two advantages to this approach: (a) we can now implement a full RST discourse parser using a (much faster) dependency parser; and (b) when using a parser that produces both constituent and dependency syntax, such as Stanford's CoreNLP, our experiments show that using both feature sets increases the performance of the model.
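To illustrate the dependency-based reimplementation, a dominance check between two same-sentence spans can be approximated as follows. The head-array representation and function names are illustrative only, not the parsers' exact feature code:

```scala
// Illustrative dominance check in the spirit of Soricut and Marcu (2003),
// computed over dependencies: EDU A "dominates" EDU B if the syntactic
// head of B's span attaches to a token inside A's span.
// heads(i) = index of token i's governor (-1 for the root);
// spans are [start, end) token intervals.
def spanHead(heads: Array[Int], span: (Int, Int)): Int = {
  // the token whose governor lies outside the span (or is the root)
  (span._1 until span._2)
    .find(i => heads(i) < span._1 || heads(i) >= span._2)
    .getOrElse(span._1)
}

def dominates(heads: Array[Int], a: (Int, Int), b: (Int, Int)): Boolean = {
  val hb = spanHead(heads, b)
  heads(hb) >= a._1 && heads(hb) < a._2
}
```

For "John left because he was tired", the clause "because he was tired" attaches to "left", so the first EDU dominates the second but not vice versa.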
Our second contribution is adding features based on coreference links. We currently use coreference information in two of the latter classifiers (relation detection and labeling) by counting the number of coreference links crossing between the two discourse units under consideration. The intuition behind this feature is that the more coreferential relations exist between two discourse units, the more likely they are to be directly connected.
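The counting described above can be sketched as follows; Mention and the (start, end) token spans are simplified stand-ins for the toolkit's actual classes:

```scala
// Sketch of the coreference feature: the number of coreference chains
// with at least one mention inside each of two adjacent discourse units.
case class Mention(start: Int, end: Int) // token offsets, end exclusive

def crossingLinks(chains: Seq[Seq[Mention]],
                  left: (Int, Int), right: (Int, Int)): Int = {
  def inSpan(m: Mention, s: (Int, Int)) = m.start >= s._1 && m.end <= s._2
  // a chain "crosses" if it has mentions inside both discourse units
  chains.count(ch => ch.exists(inSpan(_, left)) && ch.exists(inSpan(_, right)))
}
```

The resulting count is used as a feature value by the relation detection and labeling classifiers.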
Using the above framework, we implemented two discourse parsers. The first uses CoreNLP for syntactic parsing and coreference resolution. This parser uses both constituent- and dependency-based features generated using the parser of Klein and Manning (2003). The second discourse parser uses either Malt or the recent neural-network-based parser of Chen and Manning (2014) for dependency parsing; it does not use constituent- or coreference-based features. For all syntactic parsers, we used the "basic" Stanford dependency representation (de Marneffe et al., 2006). Empirically, we found that this representation yields better discourse parsing performance than any of the "collapsed" representations.

Analysis
We analyze the performance of the two discourse parsers in Table 1. For conciseness, we identify the parser that uses both constituent- and dependency-based syntax and coreference resolution (all produced using CoreNLP) as C, and the parser that uses only dependency-based features as D. The latter is subclassed as D_malt if the syntactic analysis is performed with the Malt parser, or D_stanford if syntactic parsing is performed with the parser of Chen and Manning (2014). Because we are interested in end-to-end performance, we report solely end-to-end performance on the RST test corpus (Carlson et al., 2003). This analysis yields several observations:
• The overall performance of the proposed parsers compares favorably with the state of the art. Both the C and D parsers outperform the parser of Hernault et al. (2010), and perform comparably to the parser of Joty et al. (2013). The recent work of Joty et al. (2014), which uses a considerably more complex architecture based on reranking, outperforms our parsers by 1.8 F1 points.
Table 1: Performance of the two discourse parsers: one relying on constituent-based syntactic parsing (C), and another using a dependency parser (D). We report end-to-end results on the 18 relations with nuclearity information used by Hernault et al. (2010) and Feng and Hirst (2012), using both manual segmentation of text into EDUs (left table block) and EDUs predicted by the parser (right block). We use the Precision/Recall/F1 metrics introduced by Marcu (2000). The ablation test removes various feature groups: features extracted from the dependency representation (dep), features from constituent syntax (const), and coreference features (coref). We compare against previous work that reported end-to-end performance of their corresponding approaches (Hernault et al., 2010; Joty et al., 2013; Joty and Moschitti, 2014).

• In general, the C parser performs better than D on all metrics. This is to be expected
considering that C uses both constituent- and dependency-based features, as well as coreference information. However, the improvement is small (e.g., 0.2 F1 points when gold EDUs are used) and the D parser is faster: it processes the entire test dataset in 88 seconds (at an average of 2.3 seconds/document) vs. 224 seconds for C. For comparison, the discourse parser of Feng and Hirst (2012) processes the same dataset in 605 seconds.
(The times above were measured on a laptop with an Intel i7 CPU and 16GB of RAM; they include end-to-end execution, including model loading and complete preprocessing of text, from tokenization to syntactic parsing and coreference resolution.)
• The comparison of the two configurations of the dependency-based parser (D_malt vs. D_stanford) indicates that the parser of Chen and Manning (2014) yields better RST parsing performance than the Malt parser, e.g., by 0.9 F1 points when predicted EDUs are used.
• The ablation test in rows 4-5 of the table indicates that the two syntactic representations complement each other well: removing dependency-based features (the "C -dep" row) drops the F1 score for predicted EDUs by 1.2 points (because of the worse EDU segmentation); removing constituent-based features ("C -const") drops performance by 1.6 F1 points.
• Feature-wise, the "C -const -coref" system is equivalent to D, but with dependency parsing performed by converting the constituent trees produced by the Stanford parser to dependencies, rather than by direct dependency parsing. It is interesting to note that the performance of this system is lower than both configurations of the D parser, suggesting that direct dependency parsing with a dedicated model is beneficial.
• The "C -coref" ablation experiment indicates that coreference information has a small contribution to the overall performance (0.3 F 1 points when gold EDUs are used). Nevertheless, we find this result exciting, considering that this is a first attempt at using coreference information for discourse parsing.

Usage
With respect to usage, we adhere to the simplicity principles promoted by Stanford's CoreNLP, which introduced a simple, concrete Java API rather than relying on heavier frameworks such as UIMA (Ferrucci and Lally, 2004). This guarantees that a user is "up and running in ten minutes or less", by "doing one thing well" and "avoiding over-design". Following this idea, our API contains two Processor objects, one for each discourse parser, and a single method call, annotate(), which implements the complete analysis of a document (represented as a String), from tokenization to discourse parsing. Figure 1 shows sample API usage. The annotate() method produces a Document object, which stores all NLP annotations: tokens, part-of-speech tags, constituent trees, dependency graphs, coreference relations, and discourse trees.

    import edu.arizona.sista.processors.corenlp._
    import edu.arizona.sista.processors.fastnlp._

    //
    // CoreNLPProcessor:
    //   - syntax/coref with CoreNLP;
    //   - constituent-based RST parser.
    // FastNLPProcessor:
    //   - syntax with Malt or CoreNLP;
    //   - dependency-based RST parser.
    //
    val processor = new CoreNLPProcessor()
    val document = processor.annotate(
      "Tandy Corp. said it won't join U.S. Memories, the group that seeks to battle the Japanese in the market for computer memory chips.")
    println(document.discourseTree.get)

Figure 1: Minimal (but complete) code for using the discourse parser.
Use CoreNLPProcessor for the constituent-based RST parser, and FastNLPProcessor for the dependency-based discourse parser. Other than the different constructors, the APIs are identical.
The DiscourseTree class is summarized in Figure 2.
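As a usage sketch, a discourse tree can be traversed recursively to produce an indented view. DTree below is a simplified stand-in for the DiscourseTree fields in Figure 2 (relationDir as a plain String, token offsets omitted), not the library's actual class:

```scala
// Sketch of walking the discourse tree to print an indented view.
// DTree is a simplified stand-in for DiscourseTree (Figure 2).
case class DTree(relationLabel: String, relationDir: String,
                 children: Array[DTree], rawText: String)

def render(t: DTree, indent: Int = 0): String = {
  val pad = "  " * indent
  if (t.children == null || t.children.isEmpty)
    pad + t.rawText + "\n" // terminal node: an EDU
  else
    pad + t.relationLabel + " (" + t.relationDir + ")\n" +
      t.children.map(render(_, indent + 1)).mkString
}
```

The same recursion applies to the real DiscourseTree returned by annotate(), since it exposes the same relationLabel/relationDir/children/rawText fields.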
The code for the two parsers is available on GitHub, and is also packaged as two JAR files in the Maven Central Repository (one JAR file for code, and another for the pre-trained models), which guarantees that others can install and use it with minimal effort. For code and more information, please see the project's GitHub page: https://github.com/sistanlp/processors.
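For sbt users, pulling in the code and model JARs would look roughly like the following. The group ID is inferred from the package names in Figure 1, but the artifact name, version placeholder, and models classifier are assumptions to be checked against the GitHub page:

```scala
// build.sbt sketch; coordinates and version are assumptions --
// consult https://github.com/sistanlp/processors for actual values.
libraryDependencies ++= Seq(
  "edu.arizona.sista" %% "processors" % "x.y.z",                      // code JAR
  "edu.arizona.sista" %% "processors" % "x.y.z" classifier "models"   // pre-trained models JAR
)
```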

Visualization of Discourse Trees
We accompany the above Scala library with a web-based visualization tool that runs the two parsers in parallel and visualizes the two outputs for the same text side by side. This allows users to: (a) directly compare the runtimes of the two systems in real time for arbitrary texts; (b) analyze the qualitative differences in the outputs of the two parsers; and (c) debug incorrect outputs (e.g., is the constituent tree correct?). Figure 3 shows a screenshot of this visualization tool.
The visualization tool is implemented as a client-server Grails (https://grails.org) web application, which runs the parsers on the server and collects and displays the results on the client side. The application's client-side code displays both the discourse trees and syntactic information using Dagre-d3, a D3-based renderer for the Dagre graph layout engine.

    class DiscourseTree(
      /** Label of this tree, if non-terminal */
      var relationLabel: String,
      /** Direction of the relation, if non-terminal.
        * It can be: LeftToRight, RightToLeft, or None. */
      var relationDir: RelationDirection.Value,
      /** Children of this non-terminal node */
      var children: Array[DiscourseTree],
      /** Raw text attached to this node */
      val rawText: String,
      /** Position of the first token in the
        * text covered by this discourse tree */
      var firstToken: TokenOffset,
      /** Position of the last token in the
        * text covered by this discourse tree;
        * this is inclusive! */
      var lastToken: TokenOffset
    )

Figure 2: Relevant fields in the DiscourseTree class, which stores the RST tree produced by the parsers for a given document. The token offsets point to tokens stored in the Document class returned by the annotate() method above.

Conclusions
This work described the design and development of an open-source parsing framework for Rhetorical Structure Theory. Within this framework, we offer two parsers: one built on top of constituent-based syntax, and another that uses dependency-based syntax. Both parsers obtain state-of-the-art performance, are fast, and are easy to use through a simple API.
In future work, we aim to improve the performance of the parsers using joint parsing models. Nevertheless, it is important to note that RST parsers have already demonstrated their potential to improve natural language processing applications. For example, in our previous work we used features extracted from RST discourse relations to enhance a non-factoid question answering system (Jansen et al., 2014). In recent work, we showed how to use discourse relations to generate artificial training data for mono-lingual alignment models for question answering (Sharp et al., 2015).

Figure 3: Screenshot of the discourse parser visualization tool for the input: "Tandy Corp. said it won't join U.S. Memories, the group that seeks to battle the Japanese in the market for computer memory chips." The left pane shows the output of the C parser; the right pane shows the output of the D parser. Hovering with the cursor over a tree node shows its full content. Not shown here but included in the visualization: the syntactic analyses used by the two parsers and the runtimes for each component (from tokenization to syntactic analysis).