Discourse Representation Parsing for Sentences and Documents

We introduce a novel semantic parsing task based on Discourse Representation Theory (DRT; Kamp and Reyle 1993). Our model operates over Discourse Representation Tree Structures which we formally define for sentences and documents. We present a general framework for parsing discourse structures of arbitrary length and granularity. We achieve this with a neural model equipped with a supervised hierarchical attention mechanism and a linguistically-motivated copy strategy. Experimental results on sentence- and document-level benchmarks show that our model outperforms competitive baselines by a wide margin.


Introduction
Semantic parsing is the task of mapping natural language to machine interpretable meaning representations. Various models have been proposed over the years to learn semantic parsers from linguistic expressions paired with logical forms, SQL queries, or source code (Kate et al., 2005; Liang et al., 2011; Zettlemoyer and Collins, 2005; Banarescu et al., 2013; Wong and Mooney, 2007; Kwiatkowski et al., 2011; Zhao and Huang, 2015).
In this work we focus on parsing formal meaning representations in the style of Discourse Representation Theory (DRT; Kamp and Reyle 1993). DRT is a popular theory of meaning representation (Kamp, 1981; Kamp and Reyle, 1993; Asher, 1993; Asher and Lascarides, 2003) designed to account for a variety of linguistic phenomena, including the interpretation of pronouns and temporal expressions within and across sentences. The basic meaning-carrying units in DRT are Discourse Representation Structures (DRSs), which consist of discourse referents (e.g., x1, x2) representing entities in the discourse and discourse conditions (e.g., max(x1), male(x1)) representing information about discourse referents. An example of a two-sentence discourse in box-like format is shown in Figure 1a. DRT parsing resembles the task of mapping sentences to Abstract Meaning Representations (AMRs; Banarescu et al. 2013) in that logical forms are broad-coverage: they represent compositional utterances with varied vocabulary and syntax, and they are ungrounded, i.e., they are not tied to a specific database from which answers to queries might be retrieved (Zelle and Mooney, 1996; Cheng et al., 2017; Dahl et al., 1994).
Our work departs from previous general-purpose semantic parsers (Flanigan et al., 2016; Foland and Martin, 2017; Lyu and Titov, 2018; Liu et al., 2018; van Noord et al., 2018b) in that we focus on building representations for entire documents rather than isolated utterances, and introduce a novel semantic parsing task based on DRT. Specifically, our model operates over Discourse Representation Tree Structures (DRTSs), which are DRSs rendered in a tree-style format (Liu et al. 2018; see Figure 1b). Discourse representation parsing has been gaining attention lately. 1 The semantic analysis of text beyond isolated sentences can enhance various NLP applications such as information retrieval (Zou et al., 2014), summarization (Goyal and Eisenstein, 2016), conversational agents (Vinyals and Le, 2015), machine translation (Sim Smith, 2017; Bawden et al., 2018), and question answering (Rajpurkar et al., 2018).
Our contributions in this work can be summarized as follows: 1) we formally define Discourse Representation Tree Structures for sentences and documents; 2) we present a general framework for parsing discourse structures of arbitrary length and granularity; our framework is based on a neural model which decomposes the generation of meaning representations into three stages following a coarse-to-fine approach (Liu et al., 2018; Dong and Lapata, 2018); 3) we further demonstrate that three modeling innovations are key to tree structure prediction: a supervised hierarchical attention mechanism, a linguistically-motivated copy strategy, and constraint-based inference to ensure well-formed DRTS output; 4) experimental results on sentence- and document-level benchmarks show that our model outperforms competitive baselines by a wide margin. We release our code and DRTS benchmarks in the hope of driving research in semantic parsing further. 2

1 The shared task on Discourse Representation Structure parsing at IWCS 2019: https://sites.google.com/view/iwcs2019/home
2 https://github.com/LeonCrashCode/TreeDRSparsing

Discourse Representation Trees
In this section, we define Discourse Representation Tree Structures (DRTSs). We adopt the box-to-tree conversion algorithm of Liu et al. (2018) to obtain trees, which we generalize to multi-sentence discourse. As shown in Figure 1, the conversion preserves most of the content of DRS boxes, such as referents, conditions, and their dependencies. Furthermore, we add alignments between sentences and DRTS nodes.
A DRTS is represented by a labeled tree over a domain D = [R, V, C, N], where R denotes relation symbols, V variable symbols, C constants, and N scoping symbols. Variables V are indexed and can refer to entities x, events e, states s, times t, propositions p, and segments k. 3 R is the disjoint union of a set of elementary relations Re and a set of segment relations Rs. The set N is the union of binary scoping symbols Nb and unary scoping symbols Nu, where Nb = {IMP, OR, DUP} denotes conditions involving implication, disjunction, and duplex, 4 and Nu = {POS, NEC, NOT} denotes modality operators expressing possibility, necessity, and negation.
There are six types of nodes in a DRTS: simple scoped nodes, proposition scoped nodes, segment scoped nodes, elementary DRS nodes, segmented DRS nodes, and atomic nodes. Atomic nodes are leaf nodes whose label is an instantiated relation r ∈ R with argument variables from V or constants from C. 5 Relations can be either unary or binary. For example, in Figure 1, male(x1) denotes an atomic node with a unary relation, while Patient(e2, x1) denotes a binary relation node.
A simple scoped node can take one of the labels in N. A node that takes a label from Nu has exactly one child, which is either an elementary or a segmented DRS node. A node with a binary scope label takes two children, each an elementary or segmented DRS node. A proposition scoped node takes as label one of the proposition variables p; its children are elementary or segmented DRS nodes. A segment scoped node takes as label one of the segment variables k, and its children are elementary or segmented DRS nodes.

Figure 2: The DRTS parsing framework; words and sentences are encoded with bi-LSTMs; documents are decoded in three stages, starting with tree non-terminals, then relations, and finally variables. Decoding makes use of multi-attention and copying.
An elementary DRS node is labeled with "DRS" and has one or more children which are atomic nodes (taking relations from Re), simple scoped nodes, or proposition scoped nodes. Atomic nodes may use any of the variables except the segment variables k. Finally, a segmented DRS node (labeled with "SDRS") takes at least two children which are segment scoped nodes, and at least one atomic node (where the allowed variables are the segment variables chosen for the other children and the relations are taken from Rs). For example, the root node in Figure 1 is an SDRS node with two segment variables k1 and k2, and the instantiated relation is because(k1, k2). The children of the nodes labeled with segment variables are elementary or segmented DRS nodes. A full DRTS is a tree with an elementary or segmented DRS node as its root.
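To make the node inventory concrete, the well-formedness conditions on scoped nodes can be sketched as a small tree data structure (the class and function names below are ours, and this is a simplification of the full definition):

```python
from dataclasses import dataclass, field

# Node labels follow the definitions above: unary scopes Nu, binary scopes Nb.
UNARY_SCOPES = {"POS", "NEC", "NOT"}
BINARY_SCOPES = {"IMP", "OR", "DUP"}

@dataclass
class Node:
    label: str                      # e.g. "DRS", "SDRS", "NOT", "k1", "male(x1)"
    children: list = field(default_factory=list)

def is_valid_simple_scope(node):
    """A unary scope node has exactly one child; a binary scope node has
    exactly two. Children must be elementary or segmented DRS nodes."""
    kids_ok = all(c.label in ("DRS", "SDRS") for c in node.children)
    if node.label in UNARY_SCOPES:
        return len(node.children) == 1 and kids_ok
    if node.label in BINARY_SCOPES:
        return len(node.children) == 2 and kids_ok
    return False

# The negated DRS pattern from Figure 1: NOT( DRS( ... ) )
neg = Node("NOT", [Node("DRS", [Node("like(e3)")])])
```

Analogous checks can be written for the remaining node types (e.g., an SDRS node needs at least two segment scoped children).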

Modeling Framework
We propose a unified framework for sentence- and document-level semantic parsing based on the encoder-decoder architecture shown in Figure 2. The encoder obtains word and sentence representations, while the decoder generates trees in three stages. Initially, elementary DRS nodes, segmented DRS nodes, and scoped nodes are generated. Next, the relations of atomic nodes are predicted, followed by their variables. To make the framework applicable to discourse structures of arbitrary length and granularity, and capable of exploiting document-level information, we equip the decoder with multi-attention, a supervised attention mechanism for aligning DRTS nodes to sentences, and a linguistically-motivated copy strategy.

Encoder
Documents (or sentences) are represented as a sequence of words ⟨d⟩, w00, ..., ⟨sep_i⟩, ..., w_ij, ..., ⟨/d⟩, where ⟨d⟩ and ⟨/d⟩ denote the start and end of the document, respectively, and ⟨sep_i⟩ denotes the right boundary of the ith sentence. 6 The jth token in the ith sentence of a document is represented by the vector x_ij = f(e^w_ij ; ē^w_ij ; e^l_ij), the concatenation (;) of randomly initialized word embeddings e^w_ij, pre-trained word embeddings ē^w_ij, and lemma embeddings e^l_ij, where f(·) is a non-linear function. Embeddings e^w_ij and e^l_ij are randomly initialized and tuned during training, while ē^w_ij are fixed.
The encoder represents words and sentences in a unified framework compatible with sentence- and document-level DRTS parsing. Our experiments employed recurrent neural networks with long short-term memory units (LSTMs; Hochreiter and Schmidhuber 1997); however, nothing in our framework is LSTM-specific. For instance, representations based on convolutional (Kim, 2014) or recursive neural networks (Socher et al., 2012) are also possible.

Word Representation
We encode the input text with a bidirectional LSTM (biLSTM): h^x_ij = biLSTM(x_ij), where h^x_ij denotes the hidden representation of the encoder for x_ij, the input representation of token j in sentence i.

Shallow Sentence Representation Each sentence can be represented by concatenating the forward hidden state of its right boundary and the backward hidden state of its left boundary, i.e., h^x_i = [→h^x_{i,sep} ; ←h^x_{i,0}].

Deep Sentence Representation An alternative to the shallow sentence representation just described is a second biLSTM encoder, which takes the shallow sentence representations h^x_i as input.
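As an illustration, the shallow sentence representation can be sketched as follows (plain Python lists stand in for biLSTM hidden state vectors; the function and variable names are ours):

```python
def shallow_sentence_reps(fwd, bwd, boundaries):
    """fwd, bwd: per-token forward/backward biLSTM states over the whole
    document (lists of vectors); boundaries: (left, right) token indices of
    each sentence. A sentence is represented by the concatenation of the
    forward state at its right boundary and the backward state at its left
    boundary."""
    return [fwd[r] + bwd[l] for l, r in boundaries]

# Toy document of 6 tokens (state vectors of dimension 2), two sentences:
# tokens 0-2 form sentence 1, tokens 3-5 form sentence 2.
fwd = [[float(i), float(i)] for i in range(6)]
bwd = [[-float(i), -float(i)] for i in range(6)]
reps = shallow_sentence_reps(fwd, bwd, [(0, 2), (3, 5)])
```

The deep variant would simply run another biLSTM over `reps`.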

Decoder
We generate DRTSs following a three-stage decoding process (Liu et al., 2018), where each stage can be regarded as a sequential prediction task in its own right. On top of this, we propose a multi-attention mechanism that makes it possible to deal with multiple sentences. The backbone of our tree-generation procedure is an LSTM decoder which takes the encoder representations H^x as input and constructs bracketed trees (i.e., strings) in a top-down manner, equipped with multi-attention. We first describe this attention mechanism, as it underlies all generation stages, and then present each stage in detail.

Multi-Attention
Multi-attention aims to extract features from different encoder representations and is illustrated in Figure 3. The hidden representation h_{y_k} of the decoder is fed to various linear functions to obtain view-specific vector representations h^v_{y_k} = g_v(h_{y_k}), where g_v(·) is a linear function with name v. 7 Given encoder representations H^x = h^x_0, h^x_1, ..., h^x_m, we extract features by applying a standard attention mechanism (Bahdanau et al., 2015) over the encoder representations with query h^v_{y_k}.
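A minimal sketch of multi-attention follows. For brevity we use dot-product attention, whereas the paper follows Bahdanau-style (additive) attention; all names are illustrative:

```python
import math

def softmax(zs):
    m = max(zs)
    es = [math.exp(z - m) for z in zs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def multi_attention(h_dec, H_enc, views):
    """h_dec: decoder state h_{y_k}; H_enc: list of encoder states;
    views: dict mapping a view name v to its linear map g_v (here a plain
    function). Returns one attention distribution and one context vector
    per view."""
    out = {}
    for name, g_v in views.items():
        q = g_v(h_dec)                           # view-specific query h^v_{y_k}
        weights = softmax([dot(q, h) for h in H_enc])
        context = [sum(w * h[i] for w, h in zip(weights, H_enc))
                   for i in range(len(H_enc[0]))]
        out[name] = (weights, context)
    return out

H = [[1.0, 0.0], [0.0, 1.0]]
att = multi_attention([2.0, 0.0], H, {"copy": lambda h: h})
```

Each decoding stage can register its own views (e.g., one for generation scores, one for alignment) over the same encoder states.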
Multi-attention scores can also be obtained from the attention weights.

Tree Generation
Stage 1 Our decoder first generates the tree non-terminals y^st_0, ..., y^st_k (see Figure 2). 8 The probability distribution of the kth prediction is P(y^st_k | y^st_<k, H^x) = softmax(s^st_k), where H^x refers to the encoder representations and the score s^st_k is computed from h^st_{y_k}, the hidden representation of the decoder in Stage 1, i.e., h^st_{y_k} = LSTM(e_{y^st_{k-1}}). 9

Stage 2 Given the elementary or segmented DRS nodes generated in Stage 1, the atomic nodes y^nd_0, ..., y^nd_k are predicted (see Figure 2), with the aid of the copy strategies we discuss shortly. The probability distribution of the kth prediction combines s^nd_k and s^copy_k, the generation and copy scores, respectively, for the kth prediction.
Here [h^copy_0 : h^copy_z] are copy representations used for copy scoring, and h^nd_{y_k} is the hidden representation of the decoder in Stage 2, which is obtained based on how the previous token was constructed. The generation of atomic nodes in the second stage is conditioned on h^drs, the decoder hidden representation of the elementary or segmented DRS node from Stage 1, via the linear function g_st2nd. For the generation of atomic nodes, we copy lemmas from the input text. However, copying is limited to unary nodes, which mostly represent entities and predicates (e.g., john(x1), eat(e1)) and correspond almost verbatim to input tokens.
Binary atomic nodes denote semantic relations between two variables and do not directly correspond to the surface text. For example, given the DRTS for the utterance "the oil company is deprived of ...", the nodes oil(x1) and company(x2) will be copied from oil and company, while the node of(x2, x1) will not be copied from deprived of.
Copy representations [h^copy_0 : h^copy_z] are constructed for each document d from its encoder hidden representations [h^x_00 : h^x_mn] by averaging the encoder representations of words that share the same lemma l_z, where l_z ∈ L, L is the set of distinct lemmas in document d, and N_{l_z} is the number of tokens with lemma l_z.
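The lemma-based averaging that builds copy representations can be sketched as follows (illustrative names; vectors are plain lists):

```python
from collections import defaultdict

def copy_representations(hidden, lemmas):
    """hidden: list of encoder state vectors, one per token; lemmas: the
    token lemmas. Tokens sharing a lemma are averaged into a single copy
    representation h^copy for that lemma."""
    groups = defaultdict(list)
    for h, lem in zip(hidden, lemmas):
        groups[lem].append(h)
    return {lem: [sum(vals) / len(vals) for vals in zip(*hs)]
            for lem, hs in groups.items()}

# Three tokens, two of which share the lemma "oil".
reps = copy_representations(
    [[1.0, 2.0], [7.0, 8.0], [5.0, 6.0]],
    ["oil", "company", "oil"])
```

At copy time, the score for emitting a lemma is computed against its averaged representation.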
Stage 3 Finally, we generate the terminals, i.e., the atomic node variables y^rd_0, ..., y^rd_k (see Figure 2). The probability distribution of the kth prediction is computed from h^rd_{y_k}, the decoder hidden representation in the third stage. Here, the generation of variables is conditioned on h^atm, the decoder hidden representation of the atomic node from the second stage, via the linear function g_nd2rd.

Training
The model is trained to minimize an average cross-entropy loss objective, where p_j is the distribution of output tokens and θ are the parameters of the model. We use stochastic gradient descent and adjust the learning rate with Adam (Kingma and Ba, 2014).

Extensions
In this section we present two important extensions to the basic modeling framework outlined above. The first is a supervised attention mechanism dedicated to aligning sentences to tree nodes; this type of alignment is important when parsing documents (rather than individual sentences) and may also enhance the quality of the copy mechanism. The second concerns the generation of well-formed and meaningful logical forms, which is generally challenging for semantic parsers based on sequence-to-sequence architectures, even more so when dealing with the long and complex sequences found in documents.

Supervised Attention
The attention mechanism from Section 3.2.1 can automatically learn alignments between encoder and decoder hidden representations. However, as shown in Figure 1, DRTSs are constructed recursively, and alignment information between DRTS nodes and sentences is available. For this reason, we propose a method to explicitly learn this alignment by exploiting the feature representations afforded by multi-attention. Specifically, we obtain alignment weights via multi-attention, where β^align_km = P(a_k = m | h_{y_k}, [h^x_0 : h^x_m]) is the probability distribution over alignments from sentences to the kth prediction in the decoder, and a_k = m denotes the kth prediction being aligned to the mth sentence. We add an alignment loss over β^align to the objective in Equation (5). We then use these alignments in two ways.
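The combined objective, cross-entropy over output tokens plus the alignment loss, can be sketched as follows (a simplification that takes the probabilities assigned to the gold tokens and gold alignments directly; names are ours):

```python
import math

def joint_loss(token_probs, align_probs):
    """token_probs: probabilities the model assigns to the gold output
    tokens; align_probs: probabilities it assigns to the gold sentence
    alignments. The total objective is the sum of the two average
    negative log-likelihoods."""
    ce = -sum(math.log(p) for p in token_probs) / len(token_probs)
    al = -sum(math.log(p) for p in align_probs) / len(align_probs)
    return ce + al
```

A perfectly confident model incurs zero loss; halving the probability of a gold token adds log 2 to the token term.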
Alignments as Features Alignments are incorporated as additional features in the decoder by concatenating the aligned sentence representations with the scoring layers. Equations (1), (2), and (4) are rewritten accordingly, where stg ∈ {st, nd, rd} and h^x_{a_k} is the a_k-th sentence representation.
At test time, the scoring layer requires the alignment information, so we first select the sentence with the highest probability, i.e., a*_k = arg max_{a_k} P(a_k | h_{y_k}, [h^x_0 : h^x_m]), and then add its representation h^x_{a*_k} to the scoring layer.
Copying from Alignments We use alignments as a means to modulate which information is copied. Specifically, we allow copying to take place only over sentences aligned to elementary DRS nodes. We construct copy representations M_0, ..., M_m for each sentence in a document, where l_iz ∈ L_i and L_i is the set of distinct lemmas in the ith sentence. Given the alignment between elementary DRS nodes and sentences, we calculate the copy score by rewriting Equation (3), where a is the index of the sentence aligned to the elementary DRS node. At test time, when an elementary DRS node is generated during the first stage, we further predict which sentence the node should be aligned to. This information is then passed on to the second stage, and elements from the aligned sentence can be copied.

Constraint-based Inference
Recall that our decoder consists of three stages, each of which is a sequence-to-sequence model; as a result, there is no guarantee that the output tree will be well-formed. To ensure the generation of syntactically valid trees, at each step we generate the set of valid candidates Y^valid_k which do not violate the DRTS definitions in Section 2, and then select the highest scoring candidate as our prediction, where θ are the parameters of the model and Y^valid_k is the set of valid candidates at step k. In Stage 1, partial DRTSs are stored in a stack, and for each prediction the model checks the stack to obtain the set of valid candidates. In the example in Figure 4, the segment scoped node k1 already has a child at step 5, so predicting a right bracket would not violate the DRTS definition. 10 In Stage 2, when generating relations for elementary DRS nodes, the candidates come from Re and the lemmas used for copying; when generating relations for segmented DRS nodes, the candidates come only from Rs. Finally, in Stage 3 we generate exactly two variables for binary relations and one variable for unary relations. A formal description is given in the Appendix.
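A minimal sketch of constraint-based inference: invalid candidates are removed before normalization, so the decoder can only commit to members of Y^valid_k (the function and variable names are illustrative):

```python
import math

def constrained_argmax(scores, valid):
    """scores: dict mapping each candidate token to its model score;
    valid: the set Y^valid_k derived from the DRTS definitions. Invalid
    candidates are masked out before the softmax, so the prediction is
    the best *valid* candidate."""
    masked = {y: s for y, s in scores.items() if y in valid}
    # Softmax over the surviving candidates to get a proper distribution.
    m = max(masked.values())
    z = sum(math.exp(s - m) for s in masked.values())
    probs = {y: math.exp(s - m) / z for y, s in masked.items()}
    return max(probs, key=probs.get), probs

# Even if the raw score favors closing the node, masking forces a valid choice.
best, probs = constrained_argmax({"DRS": 2.0, ")": 3.0, "SDRS": 1.0},
                                 valid={"DRS", "SDRS"})
```

The same masking applies in all three stages; only the construction of the valid set differs.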

Experimental Setup
Benchmarks Our experiments were carried out on the Groningen Meaning Bank (GMB; Bos et al. 2017), which provides a large collection of English texts annotated with Discourse Representation Structures. We preprocessed the GMB into the tree-based format defined in Section 2 and created two benchmarks: one which preserves document-level boundaries, and a second one which treats sentences as isolated instances. Various statistics are shown in Table 1 for the respective training, development, and test partitions. We followed the same data splits as Liu et al. (2018).
Settings We carried out experiments on the sentence- and document-level GMB benchmarks in order to evaluate our framework. We used the same empirical hyper-parameters for sentence- and document-level parsing. The dimensions of word and lemma embeddings were 300 and 100, respectively. The encoder and decoder had two layers with 300 and 600 hidden dimensions, respectively. The dropout rate was 0.1. Pre-trained word embeddings (100 dimensions) were generated with Word2Vec trained on the AFP portion of the English Gigaword corpus. 11

11 Models were trained on a single GPU without batches.

Model Comparison For the sentence-level experiments, we compared our DRTS parser against Liu et al. (2018), who also perform tree parsing and have a decoder which first predicts the structure of the DRS, then its conditions, and finally its referents. Our parser without the document-level component is similar to Liu et al. (2018); a key difference is that our model is equipped with linguistically-motivated copy strategies. In addition, we employed a baseline sequence-to-sequence model (Dong and Lapata, 2016) which treats DRTSs as linearized trees.

For the document-level experiments, we built two baseline models. The first one treats documents as one long string (by concatenating all document sentences) and performs sentence-level parsing (DocSent). The second one parses each sentence in a document with a parser trained on the sentence-level version of the GMB and constructs a (flat) document tree by gathering all sentential DRTSs as children of a segmented DRS node (DocTree). We used the sentence-level DRTS parser for both baselines. We also compared four variants of our document-level model, combining shallow or deep sentence representations with and without copying from alignments.

Evaluation We evaluated the output of our semantic parser using COUNTER (van Noord et al., 2018a), a recently proposed metric suited to matching scoped meaning representations. COUNTER converts DRSs to sets of clauses and computes precision and recall on matching clauses. We transformed DRTSs to clauses as shown in Figure 5; b variables refer to DRS nodes, and children of DRS nodes correspond to clauses. We used a hill-climbing algorithm to match variables between predicted clauses and gold standard clauses. We report F1 using exact match and partial match. For example, given the predicted clauses "b0 fall e1, b0 Agent e2 x1, b0 push e2" and the gold standard clauses "b0 fall e1, b0 Agent e1 x1", exact F1 is 0.4 (1/3 precision and 1/2 recall), while partial F1 is 0.67 (4/7 precision and 4/5 recall).
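The exact and partial match computations can be sketched as follows (a simplification: COUNTER also searches over variable mappings with hill-climbing, which we omit, and the sketch assumes at least as many predicted as gold clauses). The example reproduces the numbers above:

```python
from itertools import permutations

def f1(p, r):
    return 2 * p * r / (p + r) if p + r else 0.0

def exact_f1(pred, gold):
    """A predicted clause counts as a hit if it matches a gold clause exactly."""
    gold_left = list(gold)
    hits = 0
    for c in pred:
        if c in gold_left:
            gold_left.remove(c)
            hits += 1
    return f1(hits / len(pred), hits / len(gold))

def partial_f1(pred, gold):
    """Credit individual clause elements (relation and arguments, ignoring
    the box variable) under the best one-to-one clause alignment."""
    pe = [c.split()[1:] for c in pred]
    ge = [c.split()[1:] for c in gold]
    def overlap(a, b):
        b, n = list(b), 0
        for t in a:
            if t in b:
                b.remove(t)
                n += 1
        return n
    best = 0
    for perm in permutations(range(len(pe)), min(len(pe), len(ge))):
        best = max(best, sum(overlap(pe[i], ge[j]) for j, i in enumerate(perm)))
    return f1(best / sum(map(len, pe)), best / sum(map(len, ge)))

pred = ["b0 fall e1", "b0 Agent e2 x1", "b0 push e2"]
gold = ["b0 fall e1", "b0 Agent e1 x1"]
```

Here the partially matching clause "b0 Agent e2 x1" earns 2 of 3 possible element credits (Agent and x1, but not e2).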

Results
Parsing Sentences Table 2 summarizes results on the sentence-level semantic parsing task for our model (DRTS parser), Liu et al.'s (2018) model, and the sequence-to-sequence baseline (Seq2Seq). As can be seen, our system outperforms comparison models by a wide margin. The better performance over Liu et al. (2018) is due to the richer feature space we exploit and the application of linguistically-motivated copy strategies.

On the document-level development set, deep sentence representations bring improvements over shallow representations (+3.68 exact-F1). Using alignments as features and as a way of highlighting where to copy from yields further performance gains in terms of both exact and partial F1. The best performing variant is DeepCopy, which combines supervised attention with copying. Table 4 shows our results on the test set (see the Appendix for an example of model output); we compare the best performing DRTS parser (DeepCopy) against two baselines which rely on our sentence-level parser (DocSent and DocTree). The DRTS parser, which has a global view of the document, outperforms variants which construct document representations by aggregating individually parsed sentences.

Influence of Constraints In Table 5 we examine whether constraint-based inference is helpful. In particular, we show the percentage of ill-formed DRTSs when constraints are not enforced. We present results for the sentence- and document-level parsers, overall and broken down by the type of DRTS node being violated. 30.75% of document-level DRTSs are ill-formed when constraints are not imposed during inference. This is in stark contrast with sentence-level outputs, which are mostly well-formed (only 2.09% display violations of any kind). We observe that most violations concern elementary and segmented DRS nodes.

Influence of Document size Figure 6 shows how our parser (DeepCopy variant) and comparison systems perform on documents of varying length. Unsurprisingly, we observe that F1 decreases with document length and that all systems have trouble modeling documents with 10 sentences or more. In general, DeepCopy has an advantage over comparison systems due to its more sophisticated alignment information and the fact that it aims to generate global document-level structures. Our results also indicate that modeling longer documents, which are relatively few in the training set, is challenging, mainly because the parser cannot learn reliable representations for them. Moreover, as document size increases, the ambiguity in resolving coreferring expressions increases, suggesting that explicit modeling of anaphoric links might be necessary.

Related Work
Le and Zuidema (2012) were the first to train a data-driven DRT parser, using a graph-based representation. Recently, Liu et al. (2018) conceptualized DRT parsing as a tree structure prediction problem, which they modeled with a series of encoder-decoder architectures. van Noord et al. (2018b) adapt models from neural machine translation (Klein et al., 2017) to DRT parsing, also following a graph-based representation. Previous work has focused exclusively on sentences, whereas we design a general framework for parsing sentences and documents and provide a model which can be used interchangeably for both. Various mechanisms have been proposed to improve sequence-to-sequence models, including copying (Gu et al., 2016) and attention (Bahdanau et al., 2015). Our copying mechanism is more specialized and linguistically motivated: it considers the semantics of the input text when deciding which tokens to copy. Our multi-attention mechanism is fairly general: it extracts features from different encoder representations (word- or sentence-level) and flexibly integrates supervised and unsupervised attention in a unified framework.
A few recent approaches focus on the alignment between semantic representations and input text, either as a preprocessing step (Foland and Martin, 2017;Damonte et al., 2017) or as a latent variable (Lyu and Titov, 2018). Instead, our parser implicitly models word-level alignments with multi-attention and explicitly obtains sentence-level alignments with supervised attention, aiming to jointly train a semantic parser.

Conclusions
In this work we proposed a novel semantic parsing task to obtain Discourse Representation Tree Structures and introduced a general framework for parsing texts of arbitrary length and granularity. Experimental results on two benchmarks show that our parser is able to obtain reasonably accurate sentence-and document-level discourse representation structures (77.85 and 66.56 exact-F 1 , respectively). In the future, we would like to more faithfully capture the semantics of documents by explicitly modeling entities and their linking.

A Constraint-based Inference
In this section we provide more formal detail on how our model applies constraint-based inference.
In order to guide sequential predictions, we define a State Tracker (ST) equipped with four functions: INITIALIZATION initializes the ST; UPDATE updates the ST according to token y; ISTERMINATED determines whether the ST should terminate; and VALID returns the set of valid candidates in the current state. The state tracker provides an efficient interface for applying constraints during decoding. Sequential inference with the ST is shown in Algorithm 1; θ are the model parameters and Y^valid_k is the set of all valid predictions at step k.

A.1 Stage 1
Algorithm 2 implements the ST functions for Stage 1; DRS denotes an elementary DRS node, SDRS a segmented DRS node, propSN is short for proposition scoped node, segmSN for segment scoped node, and simpSN for simple scoped node. Function INITIALIZATION (lines 1-4) initializes the ST as an empty stack with a counter. Lines 5-15 implement the function UPDATE, where y is placed on top of the stack if it is not a CompletedSymbol (lines 6-7) and the counter is incremented if y is an elementary DRS node (lines 8-9). The top of the stack is popped if y is a CompletedSymbol (line 12), i.e., the children of the node on top of the stack have been generated, and the stack is updated (line 13).
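The stack-based tracking for Stage 1 can be sketched as follows (a simplification of Algorithm 2, with ")" standing in for CompletedSymbol; names are illustrative):

```python
class Stage1Tracker:
    """Sketch of the Stage-1 state tracker: a stack of open nodes plus a
    counter of generated elementary DRS nodes."""
    def __init__(self):
        self.stack, self.drs_count = [], 0

    def update(self, y):
        if y == ")":                 # CompletedSymbol: close the top node
            self.stack.pop()
        else:
            self.stack.append(y)
            if y == "DRS":
                self.drs_count += 1

    def is_terminated(self):
        # Decoding in Stage 1 is complete once the stack is empty.
        return not self.stack

# Generate SDRS( k1( DRS( ... ) ) ) and close each node in turn.
t = Stage1Tracker()
for tok in ["SDRS", "k1", "DRS", ")", ")"]:
    t.update(tok)
```

The VALID function would inspect the top of this stack to enumerate candidates, as described above.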
Lines 16-22 implement the function ISTERMINATED. If the stack is empty, decoding in Stage 1 is completed. Function ISTERMINATED is called after function UPDATE has been called at least once (see lines 7-8 in Algorithm 1).
Lines 23-63 implement the function VALID, which returns the set of valid candidates Y^valid_k.

A.2 Stage 2

Algorithm 3 implements the ST functions for Stage 2. Lines 1-5 (INITIALIZATION) initialize the ST with a relation counter, a type flag, and a completed flag. The type flag shows the type of node, i.e., e for elementary DRS nodes or s for segmented DRS nodes, based on which the relations are constructed. The completed flag checks whether the construction is completed. Lines 6-11 implement the function UPDATE: if CompletedSymbol is predicted, the completed flag is set to true; the flag is then checked by ISTERMINATED (lines 12-14).
Lines 15-24 implement the function VALID. If the number of constructed relations is zero, Y^valid only includes R (lines 16-17). If the number of constructed relations is below the threshold MAX_REL_{ST.type}, it is possible to construct more relations (lines 18-19). If the number of constructed relations reaches the threshold, Y^valid only includes CompletedSymbol, which completes the construction of relations (lines 20-21).

A.3 Stage 3
Algorithm 4 implements the ST functions for Stage 3, where Ve includes entity, event, state, time, and proposition variables as well as constants, and Vs includes segment variables. Lines 1-5 (INITIALIZATION) initialize the ST with a variable counter, a type flag, and a completed flag. The variable counter records the number of variables that have already been constructed. The type flag shows the type of node (e for elementary DRS nodes or s for segmented DRS nodes), based on which the variables are constructed. The completed flag checks whether the construction is completed. Lines 6-11 implement the function UPDATE: if CompletedSymbol is predicted, the completed flag is set to true; the flag is then checked by ISTERMINATED (lines 12-14).
Lines 15-28 implement the function VALID. If no variables have been constructed, Y^valid only includes V_{ST.type} (lines 16-17). If exactly one variable has been constructed and ST.type is s (a segmented DRS), Y^valid only includes Vs, forcing one more variable, because relations in segmented DRS nodes are binary (lines 21-22). If two variables have been constructed, Y^valid only includes CompletedSymbol (line 25). Note that variable indices are generated in increasing order.

B Example Output
We provide example output of our model (DRTS parser, DeepCopy variant) for the GMB document below in Figure 7.
European Union energy officials will hold an emergency meeting next week amid concerns that the Russian-Ukrainian dispute over natural gas prices could affect EU gas supplies. An EU statement released Friday says the meeting is aimed at finding a common approach. It also expresses the European Commission's concern about the situation, but says the EU top executive body remains confident an agreement will be reached. A Russian cut-off of supplies to Ukraine will reduce the amount of natural gas flowing through the main pipeline toward Europe. But the commission says there is no risk of a gas shortage in the short term. German officials say they are hoping for a quick resolution to the dispute. Government spokesman, Ulrich Wilhelm says officials have been in contact with both sides at a working level, but will not mediate.

Figure 7: Output of DRTS parser (DeepCopy variant) for the document in Section 2.