DRS at MRP 2020: Dressing up Discourse Representation Structures as Graphs

Discourse Representation Theory (DRT) is a formal account for representing the meaning of natural language discourse. Meaning in DRT is modeled via a Discourse Representation Structure (DRS), a meaning representation with a model-theoretic interpretation, which is usually depicted as nested boxes. In contrast, a directed labeled graph is a common data structure used to encode semantics of natural language texts. The paper describes the procedure of dressing up DRSs as directed labeled graphs to include DRT as a new framework in the 2020 shared task on Cross-Framework and Cross-Lingual Meaning Representation Parsing. Since one of the goals of the shared task is to encourage unified models for several semantic graph frameworks, the conversion procedure was biased towards making the DRT graph framework somewhat similar to other graph-based meaning representation frameworks.


Introduction
Graphs are a common data structure for representing the meaning of natural language sentences or texts. Several shared tasks on semantic parsing have been organized, and the target meaning representations of these shared tasks were predominantly encoded as directed labeled graphs: Semantic Dependency Graphs (Oepen et al., 2014, 2015), Abstract Meaning Representation (May, 2016; May and Priyadarshi, 2017), and Universal Conceptual Cognitive Annotation. Some of these graphs are presented in Figure 1. Recently, Oepen et al. (2019) packaged several meaning representation graphs in a uniform graph format.

Parallel to these developments, our point of departure is Discourse Representation Theory (DRT, Kamp and Reyle, 1993), a well-studied framework for formal semantics beyond the sentence level. Its meaning representation structure, the Discourse Representation Structure (DRS), is directly translatable into formal logic. A sample DRS, in its traditional box format, is illustrated in Figure 2. We will discuss DRSs in more detail in Section 2.
Obviously, DRSs are meaning representation structures, but they differ from the already mentioned graph-based meaning representations in two respects. First, DRSs are not inherently graphs. A DRS is more like a formula of predicate logic, further organized into sub-formulas and governed by additional operations that account for co-reference and presupposition. That is why DRSs are usually not considered graph-based meaning representations. For example, DRT was not among the frameworks of the shared task on cross-framework meaning representation parsing (MRP 2019; Oepen et al., 2019), since the meaning representations in that shared task were all uniformly formatted as graphs. Žabokrtský et al. (2020) excluded DRSs when surveying sentence meaning representations as they "limit [themselves] to meaning representations whose backbone structure can be described as a graph over words (possibly with added non-lexical nodes) [. . . ]". The second main contrast between DRSs and several of the graph-based meaning representations is that DRSs are very different from syntactic structures. DRSs have roots in formal semantics, and they are geared to account for negation, quantification, and semantic scope rather than for syntactic structures. 2

[Figure 1: The meaning representation graphs (a-d) of the MRP 2020 frameworks (Sgall et al., 1986; Hajič et al., 2012) for the sentence The House has voted but the Senate doesn't act. (e) is the DRS of Figure 2 in clausal form (Kamp and Reyle, 1993; Abzianidze et al., 2017), a format suitable for semantic parsing. The goal is to convert (e) into a graph somewhat similar to (a-d).]

Given that graphs are mainstream when it comes to representing meaning and semantically parsing wide-coverage natural language texts, it is important that DRSs are also convertible into graphs; we refer to the resulting structures as Discourse Representation Graphs (DRGs). This will make DRSs accessible to researchers who primarily focus on graph-based meaning representations and parsing: (a) already existing graph-based semantic parsing models can be re-used or tested on DRGs; and (b) the specific structure of DRGs, reflecting the formal semantics of the meaning, will pose new challenges for graph representation learning.
In a nutshell, to embrace DRSs in the second edition of the shared task on cross-framework (and cross-lingual) meaning representation parsing (MRP 2020; Oepen et al., 2020), we investigate the conversion of DRSs from the clausal form (the form adapted to semantic parsing, see Figure 1e) into graphs. While doing so, our goals are to (i) make DRGs structurally as close as possible to the graphs of the other frameworks in MRP 2020 (see Figure 1), and (ii) keep redundant information in DRGs to a minimum, both to prevent graphs of excessive size and to avoid inflation of the evaluation score. Our efforts contribute to unified parsing models and evaluation tools across the frameworks. Hopefully, they will also save participants time by sparing them the development of a completely new parsing model for DRGs.

2 For instance, this fact is another reason for excluding DRSs from the survey by Žabokrtský et al. (2020): "we do not include primarily logical representations which are too distant from sentence structures; this leaves out some prominent frameworks such as the Groningen Meaning Bank [. . . ]".
The rest of the paper is organized as follows. Section 2 briefly describes the building blocks of DRSs, and Section 3 outlines already existing approaches to converting DRSs into graphs. In addition to the existing ones, Section 4 introduces several candidate graph-based encodings of DRSs. In Section 5, we compare several DRG formats with respect to the computational feasibility of finding the maximum common edge subgraph (MCES), since this feasibility is crucial for evaluating meaning representation graphs against the gold standard. In the end, based on the findings of the MCES experiment and our desire for similarity with the other graph-based frameworks, we select the specific DRG format that is included in MRP 2020.

[Figure 2: The DRS of the example sentence, with labeled boxes containing discourse referents and conditions: b1 introduces x1 with house.n.05(x1) and Name(x1, house); b3 introduces x2 with senate.n.01(x2) and Name(x2, senate); b4 introduces t2 with time.n.08(t2) and t2 = now; b2 introduces e1 and t1 with vote.v.01(e1), Agent(e1, x1), Time(e1, t1), time.n.08(t1), and t1 ≺ now; b5 introduces e2 with act.v.01(e2), Agent(e2, x2), and Time(e2, t2).]

Discourse Representation Structures
DRT is a framework that dates back to the early 1980s (Kamp, 1981;Heim, 1982). Since then, the framework has gone through several extensions and modifications to account for certain semantic or pragmatic phenomena. Throughout the paper we use DRSs that are derived from the Parallel Meaning Bank (PMB, Abzianidze et al., 2017). One such DRS is presented in Figure 2. The DRS signature is given in Table 1.
The PMB incorporates several extensions to DRSs. On a micro level, the extensions aim to make DRSs language-neutral by disambiguating non-logical symbols with WordNet (Miller, 1995) synsets and VerbNet (Bonial et al., 2011) roles, where the VerbNet roles are used in combination with neo-Davidsonian event semantics (Parsons, 1990). On a macro level, presuppositions are modeled and explicitly represented following Van der Sandt (1992) and Projective DRT (Venhuizen et al., 2013) while discourse is analyzed following Segmented DRT (Asher and Lascarides, 2003) and flattened by treating discourse relations and DRS operators in a unified way. Due to these extensions, all boxes are labeled with identifiers.
Let us decipher what the DRS in Figure 2 expresses. It consists of two parts: a set of boxes and a set of discourse connectives applied to box labels (i.e., identifiers). Boxes can be seen as sub-formulas whose separation is relevant for fine-grained semantics. Each box includes a (possibly empty) set of discourse referents stacked on a (possibly empty) set of conditions. The example sentence contains two clauses, corresponding to boxes b2 and b4, which are related to each other via the CONTRAST discourse relation. Both b2 and b4 presuppose the existence of the entities x1 (for the House) and x2 (for the Senate), which are further characterized with concepts (using WordNet synsets) and the naming semantic role. The presuppositions are put in separate boxes labeled b1 and b3, and the presupposition relations are explicitly stated with the binary PRESUPPOSITION DRS operator. Since we use a flat visualization of DRSs, b5, which is negated and nested in b4 (expressed by NEGATION(b4, b5)), is depicted outside b4. In addition to modeling verb argument structure via neo-Davidsonian event semantics and semantic roles, the DRS also contains information about tense. 3
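To make this concrete, a DRS in clausal form can be held in memory as a flat list of tuples and regrouped into boxes. A minimal sketch over a hand-transcribed fragment of the Figure 2 DRS (the tuple layout and the helper function are our illustration, not the PMB's internal format):

```python
from collections import defaultdict

# A hand-transcribed fragment of the Figure 2 DRS in clausal form.
# Triples hold DRS operators over box labels; quadruples place a
# binary condition inside a box.
clauses = [
    ("b4", "NEGATION", "b5"),         # DRS operator over box labels
    ("b1", "REF", "x1"),              # discourse referent x1 introduced in b1
    ("b1", "house.n.05", "x1"),       # WordNet concept condition
    ("b1", "Name", "x1", '"house"'),  # binary condition (quadruple)
    ("b2", "REF", "e1"),
    ("b2", "vote.v.01", "e1"),
    ("b2", "Agent", "e1", "x1"),      # VerbNet role condition
]

def group_by_box(clauses):
    """Collect the referents and conditions that each box contains."""
    boxes = defaultdict(lambda: {"referents": [], "conditions": []})
    for clause in clauses:
        box, pred, *args = clause
        if pred == "REF":
            boxes[box]["referents"].append(args[0])
        else:
            boxes[box]["conditions"].append((pred, *args))
    return dict(boxes)

boxes = group_by_box(clauses)
print(boxes["b1"]["referents"])         # ['x1']
print(len(boxes["b2"]["conditions"]))   # 2
```

Grouping by box is exactly the structure that the traditional box notation visualizes, and it is the starting point for all the graph conversions discussed below.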

Related Work
There have been several approaches to representing DRSs as graphs; these representations are put side by side in Figure 3. The work by Power (1999) does not aim to convert DRSs into graphs as such, but rather proposes to augment object-oriented knowledge representation (OOKR) graphs with additional scope information to establish a correspondence with DRSs. The correspondence is incomplete, though: some OOKR graphs have no corresponding DRS, and the augmentation of Power (1999) does not cover DRSs with discourse relations, presuppositions (e.g., b1 to b2 in Figure 2), or an embedded box that contains both basic and complex conditions (like b4 in Figure 2). Nevertheless, for demonstration purposes, we still present Power (1999)'s augmented graph for a felicitous, simplified DRS of Figure 2.

Basile and Bos (2013) proposed converting DRSs into graphs, calling them Discourse Representation Graphs (DRGs). Their goal was to facilitate word-level alignment between surface forms and the corresponding DRSs in order to generate texts from DRSs. Their graph encoding, with several simplifications, is exemplified in Figure 3b. 4 The simplifications decrease the number of nodes and out-of-signature labels in the graph. The encoding can be seen as node-centric, since the most frequent signature symbols, namely the symbols of type B and C, are modeled as labeled nodes. Argument positions (A) of binary predicates are distinguished via edge labels. We call this DRG format BB*.
To evaluate the output of their DRS parser, Liu et al. (2018) converted DRSs into graphs, as demonstrated in Figure 3c. This graph encoding, in contrast to BB*, is edge-centric, as the symbols of type B and C are used as edge labels. Moreover, compared to BB*, the encoding contains more unlabeled nodes, since B and C conditions are also modeled with reified nodes. We call Liu et al. (2018)'s encoding L18.
Interestingly, in contrast to the proposed graph encodings of DRSs, van Noord et al. (2018a) opted not to convert DRSs into graphs and instead used the so-called clausal form of DRSs (see Figure 1e). The clauses in the clausal form are triples, e.g., ⟨b4, NEGATION, b5⟩, or quadruples, e.g., ⟨b2, Agent, e1, x1⟩, where the quadruples are hyper-edges and fall outside the scope of standard graph encodings. The official evaluation of the shared task on DRS parsing (Abzianidze et al., 2019) was likewise based on the clausal form.
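A quadruple such as ⟨b2, Agent, e1, x1⟩ cannot be a single edge in an ordinary graph, so graph encodings reify it: a fresh node stands in for the condition and plain labeled edges connect it to the box and the arguments. A minimal sketch of one such reification, roughly in the spirit of L18 (the exact node names and the a1/a2 labels are illustrative):

```python
import itertools

_fresh = itertools.count(1)  # supply of fresh reified-node identifiers

def reify_quadruple(clause):
    """Turn a clausal quadruple (box, pred, arg1, arg2) into ordinary
    labeled edges by introducing a fresh node for the condition.
    The a1/a2 labels are out-of-signature helpers for argument order."""
    box, pred, a1, a2 = clause
    c = f"c{next(_fresh)}"
    return [
        (box, pred, c),   # box membership combined with the B label
        (c, "a1", a1),    # first argument position
        (c, "a2", a2),    # second argument position
    ]

print(reify_quadruple(("b2", "Agent", "e1", "x1")))
```

Every quadruple thus becomes three ordinary triples, at the price of an extra unlabeled node per condition; the encodings of Section 4 differ mainly in how much of this overhead they keep.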

More Graph-based Encodings of DRS
As illustrated in the previous section, there is no agreement on how DRSs should be converted into graphs (or on whether they should be converted at all). The range of graph encodings in Figure 3 is anything but an exhaustive list. Some encodings can even be further refined and compressed without affecting readability or expressiveness. For instance, as explained in footnote 4, BB* represents a refined version of the DRGs proposed by Basile and Bos (2013). L18 can also be further compressed by discarding reified concept nodes and their outgoing a1 edges, so that each concept becomes a labeled edge. We will use L18* to refer to the DRGs refined in such a way.
In general, there are several choices in which DRG formats may differ. Here we discuss some of them (see also Table 2):
(A) Expressing Argument positions of B either via forking with labeled edges (like BB*) or solely via the graph configuration (without labeled edges), e.g., encoding Agent(e1, x1) as e1 → Agent → x1;
(B) Representing Binary predicates as labeled nodes (like BB*) or as unlabeled nodes with B-labeled edges (like L18);
(C) Encoding Concepts as labeled nodes (like BB*), as unlabeled nodes with incoming C-labeled edges (like L18), as labeled edges (like L18*), or as a label on a referent node (discussed further below);
(I) Expressing box membership explicitly (Exp) or implicitly (Imp): whether a node (corresponding to B, C, or a referent) is In a particular Box can be depicted via an explicit connecting edge or implicitly via the graph configuration.
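As an illustration of choices (A) and (I), one and the same condition Agent(e1, x1) in box b2 can come out as rather different edge sets. A sketch (node names, the empty string for unlabeled edges, and the "in" membership label are our conventions, not any official format):

```python
def encode_binary(box, pred, a1, a2, labeled_args=True, explicit_box=True):
    """Encode a binary condition under two DRG design choices:
    labeled_args  -- argument positions via a1/a2 edge labels, vs.
                     a directed chain a1 -> pred -> a2 (configurational);
    explicit_box  -- box membership as an explicit edge, vs. left
                     implicit (recoverable from the first argument)."""
    node = pred  # the condition node, labeled with the predicate symbol
    if labeled_args:
        edges = [(node, "a1", a1), (node, "a2", a2)]
    else:
        edges = [(a1, "", node), (node, "", a2)]  # direction encodes order
    if explicit_box:
        edges.insert(0, (box, "in", node))
    return edges

# Four variants, from most verbose (3 edges) to most compact (2 edges):
for la in (True, False):
    for eb in (True, False):
        print(la, eb, encode_binary("b2", "Agent", "e1", "x1",
                                    labeled_args=la, explicit_box=eb))
```

The verbose variant spells everything out with out-of-signature labels; the compact variant leans entirely on graph configuration, which is the direction the encoding chosen for MRP 2020 takes.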
Here we would like to elaborate more on (I). Box membership in DRT directly accounts for semantic scope. Like discourse referents, conditions are also members of boxes, so we also need to express the box membership of condition predicates in the graphs. All the encodings in Figure 3 express box membership explicitly. For instance, Agent(e1, x1) belonging to b2 is expressed by connecting b2 to the Agent node (see Figure 3b) or via the outgoing Agent edge from b2 to c3. Explicating all box memberships via labeled edges increases the size of the graphs. To prevent this, one can make the box membership of certain predicates or their arguments implicit, yet easily and unambiguously recoverable from the graphs. For example, if we assume that the directionality of arrows carries in-box inheritance, and consider the case when argument positions are configurationally encoded, then there is no need to explicate the in-box relation for Name in x1 → Name → house whenever the Name condition and x1 are in the same box. 5 We dub such an implication of the box membership of B from its first argument 'Im-a1'.

[Table 2: Several combinations of the choices in DRG design. The choices concern the representation of argument positions, B symbols, C symbols, and in-box relations. The names of the encodings visually follow the combinations of the choices.]

Table 2 lists several DRG formats based on combinations of how argument positions, binary predicates, concepts, and in-box relations are represented in a graph. When modeling argument positions, the configurational encoding is preferred over labeled edges from a theoretical point of view, because the a1 and a2 labels are not part of the DRS signature; they are ad-hoc ingredients that only help distinguish argument positions. When it comes to modeling concepts, as already discussed, the labeled-edge encoding (as in L18*) leads to more economical graphs than the reified-node encoding (as in L18).
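The 'Im-a1' convention can be sketched as a recovery step: when a binary condition carries no explicit membership edge, its box defaults to the box of its first argument (the data layout below is our assumption):

```python
def recover_boxes(referent_box, conditions):
    """Recover implicit box membership for binary conditions.
    referent_box -- map from discourse referent to its introducing box
    conditions   -- list of (pred, a1, a2, box_or_None); None means the
                    membership edge was dropped under the Im-a1 rule."""
    recovered = []
    for pred, a1, a2, box in conditions:
        if box is None:
            box = referent_box[a1]  # inherit from the first argument
        recovered.append((box, pred, a1, a2))
    return recovered

referent_box = {"x1": "b1", "e1": "b2"}
conds = [("Name", "x1", '"house"', None),  # implicit: same box as x1
         ("Agent", "e1", "x1", None)]      # implicit: same box as e1
print(recover_boxes(referent_box, conds))
```

Only the conditions whose box actually differs from that of their first argument (such as the SZP case discussed below) would keep an explicit membership edge.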
In the PMB annotation, for almost any discourse referent there exists a most specific concept among the concepts applied to it. For example, a discourse referent might have only two concepts, male.n.02 and person.n.01, applied to it; among these, the most specific concept is male.n.02, since male.n.02 is a hyponym of person.n.01 according to WordNet. The C• choice exploits this annotation property of concepts in the PMB and labels the node of a discourse referent with the corresponding most specific concept. This type of encoding of C is shown in Figure 4. Figure 4 also depicts the A -• B • C • I DRG encoding with implicit box membership of B. Though all the box membership edges of B are made implicit in the encoding example, this is not the case in general. For example, attributive and predicative adjectives usually introduce a ⟨b1, Attribute, x1, s1⟩ clause, where x1 is the attributed entity, which is not necessarily introduced in the same box b1 as the attributing state s1. Another example is a construction with a locative preposition and a definite noun phrase, e.g., hid a parcel under the bed, whose DRS contains the following fragment:

b2 REF e1
b2 Location e1 x3
b2 hide "v.01" e1
b2 SZP x2 x3
b2 REF x1
b3 REF x2
b2 parcel "n.01" x1
b3 bed "n.01" x2
b2 Patient e1 x1

where the binary relation SZP (spatial above) is in a different box than its first argument.

As we have shown, there are at least a dozen ways to dress up DRSs as graphs. Some of the DRG formats are verbose, some can employ default rules to ignore certain redundancies, some require out-of-signature symbols, and some prefer labeled edges over labeled nodes. There is not enough space to illustrate all the graphs listed in Table 2, but each of the mentioned encoding choices is demonstrated by at least one of the graphs in Figures 3 and 4.
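The C• choice relies on one concept being a hyponym of all the others applied to the same referent; when no such concept exists, the conversion fails for that DRS. A sketch with a toy hypernymy relation standing in for WordNet (the hand-written HYPERNYM map is our assumption):

```python
# Toy hypernymy edges, child -> parent (WordNet would supply these).
HYPERNYM = {"male.n.02": "person.n.01", "person.n.01": "entity.n.01"}

def ancestors(synset):
    """All hypernyms of a synset, including the synset itself."""
    seen = {synset}
    while synset in HYPERNYM:
        synset = HYPERNYM[synset]
        seen.add(synset)
    return seen

def most_specific(concepts):
    """Return the concept that is a hyponym of all the others,
    or None if the concepts are inconsistent for the C* choice."""
    for c in concepts:
        if all(other in ancestors(c) for other in concepts):
            return c
    return None

print(most_specific(["person.n.01", "male.n.02"]))   # male.n.02
print(most_specific(["measure.n.02", "book.n.01"]))  # None: lossy case
```

The `None` case is exactly what makes the C• encodings lossy: a parser that asserts two unrelated senses for one referent produces a DRS that cannot be dressed up this way.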

Matching & Evaluating DRGs
In graph-based semantic parsing, system outputs are conventionally evaluated against the gold standard graphs by finding the maximum common edge subgraph (MCES) for each pair of produced and gold graphs, and then calculating the macro-averaged F-score (Oepen et al., 2019). In general, the MCES problem is NP-complete, and finding the maximum subgraph shared between two relatively large graphs is sometimes computationally infeasible. In this section, we examine how computationally expensive the MCES problem is for each DRG design.
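For intuition, the MCES objective can be stated as a brute-force search: enumerate injective node mappings between the two graphs and keep the one that matches the most edges. This enumeration is exactly what blows up on larger graphs; mtool instead relies on careful initialization and search-space limits. A toy-scale sketch:

```python
from itertools import permutations

def mces_size(nodes1, edges1, nodes2, edges2):
    """Edge count of the maximum common edge subgraph of two labeled
    directed graphs, by exhausting injective node mappings.
    Edges are (source, label, target) triples. Toy sizes only:
    the search is factorial in the number of nodes."""
    if len(nodes1) > len(nodes2):  # always map the smaller graph
        nodes1, edges1, nodes2, edges2 = nodes2, edges2, nodes1, edges1
    edge_set2, best = set(edges2), 0
    for image in permutations(nodes2, len(nodes1)):
        m = dict(zip(nodes1, image))  # one injective node mapping
        matched = sum((m[u], lbl, m[v]) in edge_set2 for u, lbl, v in edges1)
        best = max(best, matched)
    return best

g1 = (["a", "b", "c"], [("a", "x", "b"), ("b", "y", "c")])
g2 = (["1", "2", "3"], [("1", "x", "2"), ("2", "z", "3")])
print(mces_size(*g1, *g2))  # 1: only the x-labeled edge can be shared
```

Encodings that put more labels on nodes shrink the space of plausible mappings, which is why, as shown below, label-rich DRG formats tend to be more computationally friendly for matching.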

Data & Tools
We run the experiments on the output of existing DRS parsers. Four distinct parsing models are selected to achieve diversity in the system output graphs. Two of the parsers are end-to-end character-based LSTM models from van Noord et al. (2018b): one is their best model (chLSTM↑), while the other is deliberately trained on less data so as to have mediocre performance (chLSTM↓). The other two parsers are based on the semantic parser Boxer (Bos, 2008), which is used in the PMB to pack all annotation layers into DRS boxes. Boxer↓ is Boxer based on the NLP tools of the PMB pipeline, 6 while Boxer↑ is Boxer employing the annotation layers output by MaChAmp (van der Goot et al., 2020). As the names suggest, Boxer↑ is a better model than Boxer↓. The output DRSs are obtained by parsing the development set (885 documents) of the PMB v3.0.0. 7 The evaluation of the models on the DRSs of the dev set is given in Table 3. DRSs are scored with Counter (van Noord et al., 2018a), the clause matching tool for DRSs in clausal form. 8

For MCES-based matching of DRGs, we use mtool, 9 the Swiss Army Knife for Graph-Based Meaning Representation. Based on the graph configurations, mtool schedules potential node-to-node mappings between two graphs. This information is used to initialize promising node-to-node mappings that might lead to finding the MCES early. mtool is the official scorer in both the MRP 2019 and MRP 2020 shared tasks. All types of graph encodings employed in the experiments are obtained with the DRS2Graph tool. 10 This new converter from clause-based DRSs to labeled directed graphs is one of the contributions of this paper.

Results & Analysis
The results of finding the MCES between the system-generated, converted DRGs and the reference DRGs are provided in Table 4. The reference DRGs were obtained by converting the gold standard DRSs of the PMB 3.0.0 development set. We ran experiments with 13 DRG formats. All 885 DRSs were converted into each DRG format without problems. In principle, the encodings with the C• choice are lossy; nevertheless, they were successfully applied to the gold DRSs. Several parser-produced DRSs could not be converted according to the C• choice, since the parsers assert inconsistent concepts for discourse referents. For example, Boxer↑ produced a DRS with measure.n.02 and book.n.01 applied to the same discourse referent. Since these senses are not in a hyponymy/hypernymy relation, the DRS did not meet the requirement of C• and was one of the three DRSs of Boxer↑ that could not be dressed up as C•-based graphs. 11

Table 4 shows the computational (in)feasibility of the MCES problem across the combinations of parsing models and graph encodings (using the mtool implementation with default limits on its search space). Given that the models are sorted according to their performance in ascending order from top to bottom, the table shows that for relatively distinct graphs it can be difficult to guarantee the MCES solution. 12 But things are not so straightforward: chLSTM↑ outperforms Boxer↓, yet finding the MCES for Boxer↓ is easier for 10 encodings out of 13. This can be explained by the fact that the gold DRSs are obtained from Boxer↓ while taking into account added human annotations. Given this, it is expected that the gold and Boxer↓'s DRSs share substantial chunks of boxes, and this sharing carries over to the DRGs too.

6 https://pmb.let.rug.nl/software.php
7 https://pmb.let.rug.nl/data.php
8 https://github.com/RikVN/DRS_parsing
9 https://github.com/cfmrp/mtool
10 https://github.com/kovvalsky/DRS2Graph
Interestingly, the encodings BB* (Basile and Bos, 2013) and A <a a B C (Liu et al., 2018) are among the most inefficient encodings across all the models. For instance, a non-exact (i.e., approximate) MCES was found for 237 DRG pairs out of 872 for chLSTM↓ with the BB* encoding. For the other encodings, the ratio of approximate matches halves.
Among the encodings with the C choice, A <a a B C appears to provide the most computationally friendly graphs. Every encoding with C becomes even better when C is replaced with C•. This is because C• brings at least a 16% reduction in the number of edges and increases the number of labeled nodes; the latter apparently helps mtool find better initializations for the node mappings.
A <a a B C• is the best among the C•-featured encodings with explicit box membership, and it does not improve further when changing its other encoding choices. However, switching to implicit box membership, the combination A -• B • C • I yields a substantial decrease in the number of approximate matches. This is explained by the fact that the number of edges decreases by at least 23%. Adding the out-of-signature edge labels marking argument positions further improves the encoding.
The differences between the F-scores calculated over DRSs (with Counter) and over DRGs (with mtool) are significant (see Table 3): the gaps for the low- and high-performing models are greater than 10% and 5%, respectively. The DRS-based score is stricter than the DRG-based one because DRSs are evaluated in the clausal form, where some DRS conditions (e.g., those built with B) are modeled via quadruples, i.e., hyper-edges. In DRGs, the hyper-edges are represented by multiple triples (⟨nodeID, edgeLabel, nodeID⟩ or ⟨nodeID, label, labelValue⟩), and this additionally rewards the models when they get parts of hyper-edges correct.
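The scoring difference can be made concrete: matching a quadruple is all-or-nothing, while the triple view gives partial credit for its pieces. A simplified sketch of the two F-score computations (the real tools additionally search over variable renamings; the reification here reuses the role symbol as the node id, which is only safe in this tiny example):

```python
def f1(gold, system):
    """F-score between two sets of items."""
    tp = len(gold & system)
    if tp == 0:
        return 0.0
    p, r = tp / len(system), tp / len(gold)
    return 2 * p * r / (p + r)

# One gold quadruple; the system got the role right but one argument wrong.
gold_clauses = {("b2", "Agent", "e1", "x1")}
sys_clauses = {("b2", "Agent", "e1", "x9")}

def to_triples(clauses):
    """Split each quadruple into the triples a DRG would contain."""
    triples = set()
    for box, pred, a1, a2 in clauses:
        triples |= {(box, "in", pred), (pred, "a1", a1), (pred, "a2", a2)}
    return triples

print(f1(gold_clauses, sys_clauses))                          # 0.0
print(round(f1(to_triples(gold_clauses),
               to_triples(sys_clauses)), 2))                  # 0.67
```

Two of the three triples survive the wrong argument, so the triple-based score credits the partially correct condition that the clause-based score rejects outright.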

Conclusion
There have been several approaches that encode DRSs as graphs (surveyed in Section 3), but their objectives were to transform DRSs into a format suitable for particular applications rather than to explore and compare different types of DRG encodings. This paper fills that gap. We have systematically characterized a dozen DRG encodings, contrasted them with each other, and compared them to the DRS clausal form from an evaluation perspective.
We opt for the A -• B • C • I DRG encoding (see Figure 4) to represent DRSs at the MRP 2020 shared task. Despite being lossy, the encoding represents an excellent trade-off due to the advantages it brings: (a) it has at least 23% fewer edges than the other encodings, which makes the DRGs more compact and easier to read; (b) given that scope information inflates DRSs, learning relatively compact DRGs seems a good starting point for the shared task; (c) less than 0.25% of DRSs are lost when applying the encoding; (d) it does not employ the out-of-signature labels a1 and a2; and (e) for the DRGs obtained from the average-performing DRS parsers, the evaluation tool can find exact maximal matches for at least 98.4% of DRG pairs. When abstracting from the reification of roles as nodes, the chosen DRG encoding and the graphs of the other frameworks in MRP 2020 have abstractly parallel graph topologies for linguistically parallel predicate-argument structures.