The Role of Reentrancies in Abstract Meaning Representation Parsing

Abstract Meaning Representation (AMR) parsing aims at converting sentences into AMR representations. These are graphs and not trees because AMR supports reentrancies (nodes with more than one parent). Following previous findings on the importance of reentrancies for AMR, we empirically find and discuss several linguistic phenomena responsible for reentrancies in AMR, some of which have not received attention before. We categorize the types of errors AMR parsers make with respect to reentrancies. Furthermore, we find that correcting these errors provides an increase of up to 5% Smatch in parsing performance and 20% in reentrancy prediction.


Introduction
Abstract Meaning Representation (AMR) is a semantic formalism used to annotate natural language sentences as graphs. The task of AMR parsing is to convert sentences into AMR graphs (Banarescu et al., 2013): rooted and directed acyclic graphs where nodes represent concepts and edges represent semantic relations between them. The AMR for the sentence I want you to believe me is shown in Figure 1.
One of the main properties of AMR, and the reason why sentences are represented as graphs rather than trees, is the presence of nodes with multiple parents, called reentrancies, as demonstrated in Figure 1, where the node I has two parents. Reentrancies complicate AMR parsing and require the addition of specific transitions in transition-based parsing (Wang et al., 2015; Damonte et al., 2017) or of pre- and post-processing steps in sequence-to-sequence parsing (van Noord and Bos, 2017). Enabling AMR parsers to predict reentrancy structures correctly is of particular importance because it separates AMR parsing from semantic parsing based on tree structures (Steedman, 2000; Liang, 2013; Cheng et al., 2017). Reentrancy is, however, not an AMR-specific problem (Kuhlmann and Jonsson, 2015), and other formalisms can benefit from a better understanding of how to parse such structures. Nevertheless, to our knowledge, the AMR literature lacks any detailed discussion of the types and linguistic causes of reentrant structures. We aim to fill this gap by describing the phenomena causing reentrancies and quantifying their prevalence in the AMR corpus. We identify sources of reentrancy which have not been acknowledged in the AMR literature, such as adjunct control, verbalization, and pragmatics. AMR parsers are evaluated using Smatch (Cai and Knight, 2013), which however does not explicitly assess the parsers' ability to recover reentrancies. Damonte et al. (2017) introduced a measure of reentrancy prediction, which computes the Smatch score of the AMR subgraphs containing reentrancies. It was observed that the performance of parsers at recovering reentrancy structures is generally poor. We analyze errors made by the parsers and use an oracle to demonstrate that correcting reentrancy-related errors leads to parsing score improvements.
Our contributions are as follows:
• We identify and discuss the linguistic phenomena causing reentrancies, some of which have been neglected so far;
• We quantify their prevalence in the AMR corpus automatically and, for a small sample of sentences, manually;
• We categorize types of reentrancy errors made by the parsers and perform oracle experiments showing that correcting these errors can lead to improvements of 20% in reentrancy prediction and 5% in overall parsing (Smatch);
• We establish baselines to correct the errors automatically as a post-processing step.

Phenomena Causing Reentrancies
AMR reentrancies reflect the fact that an entity can have more than one semantic role in the events described by a sentence. Some of the causes of reentrancies, such as control or coordination, are mentioned in the AMR guidelines and are widely recognized in the AMR literature. Here we present a more in-depth and exhaustive catalogue of reentrancy sources (Table 1) in order to shed some light on what difficult aspects of language and AMR formalism conventions we have to contend with during the task of AMR parsing.
In the analysis that follows we define an AMR node as reentrant if it is a child of more than one other node in the Penman linearization of the graph provided in the corpus. Because of the frequent use of inverse roles in AMR graphs, the directionality of the edges is not obvious. Normalizing inverse roles reverses the edge direction, which changes the parent-child relations between nodes and thus influences which nodes are reentrant, i.e., have more than one parent. As Kuhlmann and Oepen (2016) report, the percentage of reentrant nodes in the AMR corpus increases from 5% to 19% when inverse roles are normalized. For instance, in the relative clause example in Table 1 the woman node would be reentrant in a graph with normalized edges, but is not in the graph which follows the corpus linearization. We decided not to normalize the inverse roles for the purposes of our analysis because of the following considerations. First, we assume that there is merit in accepting the edge directionality chosen by the annotator and encoded in the linearization. While different linearizations of the same graph are possible, as the AMR guidelines note, there is usually one that is sensible and reflects the intuitive understanding of which nodes should be considered reentrant. Second, most of the phenomena we discuss yield reentrancies regardless of whether the edge direction is normalized or not. Those phenomena tend to be the more linguistically interesting ones, and the reentrancies which only appear after normalization are largely formalism artefacts (such as ones resulting from using inverse roles to represent adjectives or "-er" nouns), with relative clauses admittedly being an exception.
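This definition, and the effect of normalizing inverse roles, can be made concrete with a short sketch (ours, not code from this work; the triple encodings below are hand-built for illustration):

```python
# Count reentrant nodes in an AMR given as (parent, role, child) triples
# following the Penman linearization. Optionally normalize inverse roles
# (e.g. :ARG0-of) by flipping the edge before counting.
from collections import Counter

def reentrant_nodes(triples, normalize_inverse=False):
    parents = Counter()
    for parent, role, child in triples:
        if normalize_inverse and role.endswith("-of"):
            parent, child = child, parent  # reverse the inverse-role edge
        parents[child] += 1
    return {node for node, count in parents.items() if count > 1}

# Figure 1, "I want you to believe me": node i is an argument of both
# want-01 (w) and believe-01 (b), so it is reentrant either way.
want = [("w", ":ARG0", "i"), ("w", ":ARG1", "b"),
        ("b", ":ARG0", "y"), ("b", ":ARG1", "i")]

# "I saw the woman who won": the relative clause attaches with an inverse
# role, so woman (v) becomes reentrant only after normalization.
saw = [("s", ":ARG0", "i2"), ("s", ":ARG1", "v"),
       ("v", ":ARG0-of", "n")]

print(reentrant_nodes(want))                          # {'i'}
print(reentrant_nodes(saw))                           # set()
print(reentrant_nodes(saw, normalize_inverse=True))   # {'v'}
```

The last two calls reproduce the relative-clause behaviour discussed above: the same graph yields a reentrancy only under normalization.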
With that in mind, we classify reentrancy triggers into three broad types: syntactic, pragmatic, and AMR-specific.

Syntactic triggers
We consider a reentrancy as syntactically triggered if the syntactic structure of a sentence forces an interpretation in which one entity performs more than one semantic role. Below we illustrate the syntactic triggers which are commonly discussed in the AMR literature: some types of pronominal anaphora resolution (1), prototypical subject and object control (3 and 4), and coordination (2) (Groschwitz et al., 2017; van Noord and Bos, 2017).
(1) The man_i saw himself_i in the mirror.
(2) She_i ate and ∅_i drank.
(3) They_i want ∅_i to believe.
(4) I asked you_i ∅_i to sing.
In addition to those, our inspection of the AMR data revealed that other kinds of control structures, primarily adjunct control, are frequent reentrancy triggers. In adjunct control, the clause which lacks a subject is an adjunct of the main clause, as in the following examples:
(5) I_i went home before ∅_i eating.
(6) She_i left the room ∅_i crying.
Such adjuncts express various additional information regarding the main clause, for example the goal, reason, or timing of an event. Unlike the prototypical cases of control, there is by definition no finite list of verbs associated with adjunct control.
Ellipsis is another cause of reentrancies, as in the sentence:
(7) Who can afford it and who can't.
in which the node it has two incoming edges, creating a reentrancy.

[Table 1: Phenomena causing reentrancies, each paired with an example sentence and its AMR (first row: Coreference, The man saw himself in the mirror).]
As mentioned before, one would expect relative clauses to be one of the syntactic reentrancy triggers, because the noun involved has a semantic role in both the main and the relative clause:
(8) I saw the woman_i who_i won.
In the example above, the woman is the object of seeing and the subject of winning. However, according to the AMR guidelines (Banarescu et al., 2013) relative clauses should be annotated as attaching to the noun with an inverse role, thereby avoiding a reentrancy (see Table 1).

Pragmatic triggers
Human annotators resolve coreferences even in the absence of definite syntactic clues, giving rise to pragmatically triggered reentrancies. To this class belong for instance the cases of pronominal anaphora resolution where the anaphora is not syntactically bound (unlike in 1). While coreference is, in general, a discourse phenomenon (Hobbs, 1979), it is also applicable to individual sentences such as those in the AMR corpora: (9) The coach of FC Barcelona said the team had a good season.
It is pragmatically understood that FC Barcelona and the team refer to the same entity, even though the coach could have been talking about another team. Another example is provided by control-like structures within nominal and adjectival phrases:
(10) They_i have a right ∅_i to speak freely.
(11) He_i was crazy ∅_i to trust them.
An AMR annotation will state that in example 10 the possessor of the right and the subject of speaking are the same, and in example 11 the same person is crazy and is trusting them. The recovery of the subject of the infinitival clause in such constructions is driven by semantics or pragmatics rather than syntax (Huddleston and Pullum, 2002).

AMR conventions
Finally, the last source of reentrancies is AMR conventions. The AMR guidelines instruct annotators to use OntoNotes predicates whenever possible, regardless of the part of speech of the word. This encourages verbalization of elements of the sentence which would not usually be considered predicative.
(12) I received instructions to act.
In example 12 the plural noun instructions appears in the AMR graph as the predicate node instruct-01. This encourages explicitly annotating inferred semantic roles, and so I becomes an object of instructing as well as of receiving, causing a reentrancy. Additionally, because of the control-like structure, I is also annotated as an object of acting. In example 13 the adjective corrupt becomes in the AMR graph a predicate whose subject is the officials.
We consider this class as separate from pragmatic triggers, because the inference made by annotators goes beyond pragmatics and is motivated by the constraints of the formalism rather than by what is actually expressed by the sentence. There are other conventions besides verbalization which introduce reentrancies, in particular if inverse roles were normalized. Our choice not to normalize edge directionality was partially motivated by a desire to avoid including those phenomena in our analysis.

Quantifying Reentrancy Causes
In order to assess the prevalence of the various reentrancy triggers, we designed heuristics to assign each reentrancy in the AMR corpus to one of the above phenomena. We automatically align AMR graphs to their source sentences using JAMR (Flanigan et al., 2014) and identify the spans of words associated with re-entrant nodes. Heuristics based on Universal Dependency (UD) parses (Manning et al., 2014) and automatic coreference resolution are applied to the spans and the AMR subgraphs containing the reentrancy to classify the cause. We use the NeuralCoref project for coreference resolution. We recognize syntactic reentrancy triggers primarily with UD-based heuristics. For prototypical cases of control we look for common control verbs such as want, try, and persuade, with an outgoing xcomp dependency. To identify other types of control, such as adjunct control, we look for an xcomp, ccomp or advcl dependency between words aligned to parents of a re-entrant node. For coordination we only check the AMR itself, looking for coordination nodes (i.e., nodes labeled with and, contrast-01, or or). For coreference, we look for re-entrant nodes associated with more than one span and check if those spans corefer.

[Table 2: The heuristics column reports automatically detected frequencies for the whole training set. The total column reports frequencies estimated by combining automatic and manual annotation. "Unclassified" are all reentrancies for which our heuristics fail to detect the cause.]
Finally, for verbalization, we look for nouns or adjectives aligned with OntoNotes predicates in the AMR graph. We tried to identify nominal control-like structures by looking for nominals with an acl dependent infinitive or gerund subjectless verb. However, as the precision of the rule is low, and most examples uncovered by this heuristic also fall into the verbalization category, we do not include it in our statistics.
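As an illustration only (not the heuristics' actual implementation, whose inputs come from the JAMR aligner, the UD parser, and NeuralCoref), the decision logic for the coordination and control checks might be sketched as follows; the input encodings and the partial verb list are simplifying assumptions:

```python
# Illustrative decision logic for two of the heuristics described above.
COORD_NODES = {"and", "or", "contrast-01"}
CONTROL_VERBS = {"want", "try", "persuade"}  # partial list, for illustration

def classify_reentrancy(parent_concepts, deps):
    """parent_concepts: AMR concepts of the reentrant node's parents.
    deps: (head_lemma, relation) pairs between words aligned to the parents.
    """
    # Coordination is checked on the AMR itself.
    if any(p in COORD_NODES for p in parent_concepts):
        return "coordination"
    # Control is checked on UD relations between the aligned words.
    for head, rel in deps:
        if rel == "xcomp" and head in CONTROL_VERBS:
            return "prototypical control"
        if rel in {"xcomp", "ccomp", "advcl"}:
            return "other control (e.g. adjunct)"
    return "unclassified"

print(classify_reentrancy(["and", "eat-01"], []))      # coordination
print(classify_reentrancy(["want-01", "believe-01"],
                          [("want", "xcomp")]))        # prototypical control
print(classify_reentrancy(["go-01", "eat-01"],
                          [("go", "advcl")]))          # other control (e.g. adjunct)
```

The ordering of the checks is a design choice: the coordination test is the most reliable, since it needs no noisy alignment or parse.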
The results of this analysis are in Table 2 in the heuristics column. The most common cause of reentrancy appears to be coreference. Control is almost as frequent, with adjunct control being much more common than prototypical control verbs.
We note that our heuristics cannot find the cause for 46% of all reentrancies. This can happen for several reasons. There are sources of reentrancy (ellipsis, nominal control-like structures) for which we do not have heuristics due to the difficulty of defining them in terms of UD parses. The heuristics we do define have high precision if provided with correct input, but all of the systems we use to provide that input (AMR aligner, POS tagger, UD parser, and coreference resolution system) are in fact noisy. Moreover, what is considered to co-refer in AMR does not necessarily agree with the notion implicit in the coreference resolution system. Consider the following sentence: (14) The countries signed an agreement that binds the signatories.
The coreference resolution system does not follow the looser definition of coreference used in the AMR annotation guidelines, where The countries and the signatories are labeled as coreferential. Finally, some of the reentrancies unaccounted for by the heuristics are due to annotation mistakes. For example, in the sentence A nuclear team will make a visit to inspect the nuclear site, the AMR contains a reentrancy for the nucleus node, which is used to modify both the team and the site, while there should be two separate nucleus nodes.
To estimate the overall prevalence of reentrancy triggers, including cases for which the heuristics do not work, we manually annotated the causes of unaccounted-for reentrancies (79 cases) in a sample of 50 sentences. We combine the results of that manual analysis with the frequencies obtained through the use of heuristics to obtain the overall trigger frequency estimate. The results are shown in Table 2 in the total column. We find that triggers not covered by heuristics account for an estimated 4% of total cases, and 34% of unclassified triggers belong to categories for which we do have heuristics, which illustrates the noisiness of the systems used for the heuristic analysis. The final 3% consist of examples of what we consider to be AMR annotators overreaching in their pragmatic interpretation of the sentence. Consider the sentence: (15) The group said the foreign broadcasters are battering their culture and that it is insulting behavior.
In its AMR, the node insult-01 takes group as its :ARG1, making an arguably unwarranted assumption that the behavior is insulting to the group. We note that the inclusion of this type of reentrancy in AMR is controversial, as it annotates beyond what semantics should represent. Finally, 5% of the unaccounted reentrancies were due to mistakes in the AMR annotations. In the following sentence, the annotator redundantly created both an edge expressing that make-19 is the purpose of remove-01, and an edge showing that remove-01 is the :ARG0 of make-19, leading to an unnecessary reentrancy for the remove-01 node.
(16) People were removed from their homeland to make way for the base.

Reentrancy-related Parsing Errors
We propose a method, independent of the AMR parser used, to classify the errors that AMR parsers typically make when predicting reentrant structures. In order to identify the errors, we compare the predicted AMR graphs with the gold standard.
We use Smatch to find the best alignment between the variables of the predicted and gold graphs. We can then find cases where the predicted graph is either missing a reentrancy or contains an unnecessary one. Due to the aforementioned noise in the heuristics of Section 3, we did not follow the fine-grained classification of linguistic causes. We instead follow a coarser structural classification of the errors. A typical reentrancy error involves the parser generating two nodes in place of one in the gold standard. This is often the case for reentrancies caused by coreference, as shown in Figure 2. The parser may not realize that two entities corefer, and hence erroneously generates two different nodes. The opposite is also possible, where two nodes are erroneously collapsed.
Re-entrant edges can also occur between siblings. This is often the case for reentrancies caused by control verbs, as shown in Figure 3.
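A minimal sketch of the underlying comparison (ours, not the released implementation): given predicted and gold edge sets over graph variables and a node alignment standing in for the one Smatch computes, it returns the gold reentrancy edges missing from the prediction. The toy graphs below are hand-built.

```python
# Find gold reentrancy edges that have no counterpart in the prediction.
from collections import Counter

def missing_reentrancies(pred_edges, gold_edges, align):
    """pred_edges / gold_edges: sets of (parent, child) variable pairs;
    align: dict from predicted to gold variables (a stand-in for the
    alignment Smatch would produce)."""
    gold_in = Counter(child for _, child in gold_edges)
    # Map each predicted edge into gold-variable space via the alignment.
    mapped = {(align.get(p), align.get(c)) for p, c in pred_edges}
    # Keep only unmatched gold edges whose child has multiple parents,
    # i.e. edges that participate in a reentrancy.
    return {(p, c) for p, c in set(gold_edges) - mapped if gold_in[c] > 1}

# Gold: i is an argument of both want-01 (w) and believe-01 (b); the
# predicted graph (upper-case variables) misses the believe-01 edge.
gold = {("w", "i"), ("w", "b"), ("b", "y"), ("b", "i")}
pred = {("W", "I"), ("W", "B"), ("B", "Y")}
align = {"W": "w", "I": "i", "B": "b", "Y": "y"}
print(missing_reentrancies(pred, gold, align))  # {('b', 'i')}
```

Detecting an unnecessary reentrancy is symmetric: map gold edges into predicted-variable space and look for unmatched predicted edges whose child is reentrant.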

Oracle
We introduce corrections for reentrancy errors, implemented as actions that modify the edges and nodes of the predicted AMR. We then define an oracle, a deterministic method that, given a predicted AMR and the corresponding gold AMR, returns the set of actions that correct the errors in the predicted AMR.
Let the predicted graph, containing n nodes, be defined as G_s = (V_s, E_s), with V_s = {s_1, ..., s_n}, and the target graph, containing m nodes, be defined as G_t = (V_t, E_t), with V_t = {t_1, ..., t_m}. Let A(·) be an alignment (computed using Smatch) that maps a node in V_s to a node in V_t, or nil if the node is not present in V_t, and A^{-1}(·) be an alignment that maps a node in V_t to a node in V_s, or nil if the node is not present in V_s. Given a source node s_i, we define t_i = A(s_i). We can then define the following actions:
• ADD: A reentrancy edge is added (Figure 4a).
• ADD-ADDN: A reentrancy edge and a node are added (Figure 4b).
• REMOVE: A reentrancy edge is removed (Figure 4c).
• REMOVE-RMN: A reentrancy edge and a node are removed (Figure 4d).
• MERGE: Two nodes are merged (Figure 5a).
• MERGE-RMN: Two nodes are merged and a node is removed (Figure 5b).
• SPLIT: A node is split into two already existing nodes (Figure 5c).
• SPLIT-ADDN: A node is split into one existing node and a new node (Figure 5d).
• ADD-SIB: An edge between siblings is added (Figure 6a).
• ADD-SIB-ADDN: A node is added and an edge with one of its siblings is added (Figure 6b).
• REMOVE-SIB: An edge between siblings is removed (Figure 6c).
• REMOVE-SIB-RMN: An edge between siblings and one of the siblings are removed (Figure 6d).
In order to identify the errors and generate the respective oracle actions, we use Smatch to align the variables of the predicted and gold graphs. For instance, for the action ADD (Figure 4a), we identify three variables s_a, s_b, s_c and the aligned variables in the target graph t_a, t_b, t_c such that the target graph contains a reentrancy edge among t_a, t_b, t_c that has no counterpart among s_a, s_b, s_c. When such a pattern is found, the oracle algorithm determines that the missing edge has to be created. The definition of all actions is reported in Appendix A. We also consider the combination of all actions (ALL). We do so by correcting one error type at a time in a pre-determined order: for each error type, we re-run the oracle to find all remaining errors after the previous corrections have been applied.

[Table 3: Effect of all actions for the parser of Lyu and Titov (2018) on the test split of LDC2015E86 and LDC2017T10. Freq. is the number of times the action could be applied, Smatch is the parsing score, and Reent. is the reentrancy prediction score. ALL is the combination of all actions. VANILLA are the scores obtained by the original parsers. In parentheses, we report the standard deviation of the actions' frequency. The standard deviation for the Smatch and reentrancy prediction scores is less than or equal to 0.12.]

[Figure 6: Actions to solve errors due to reentrancies between siblings.]

We experiment with the parser of Lyu and Titov (2018) on the test sets of LDC2015E86 and LDC2017T10. We rely on Smatch to identify the errors. Because Smatch is randomized, different runs can identify different errors to correct. To account for this, we compute the mean and standard deviation over three runs.
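The ALL combination can be sketched as a loop over error types that re-detects errors on the updated graph before each pass. This is an illustrative skeleton under our own simplified graph encoding (edges as a set of variable pairs), not the oracle's actual code:

```python
def apply_all(pred, gold, detectors):
    """detectors: corrective-action generators for each error type, in the
    pre-determined order; each one is re-run on the graph updated so far."""
    for detect in detectors:
        for action in detect(pred, gold):
            pred = action(pred)  # each action returns a new graph
    return pred

# Toy ADD detector: one action per gold edge missing from the prediction.
def add_detector(pred, gold):
    return [lambda g, e=e: g | {e} for e in gold - pred]

pred = {("w", "i"), ("w", "b")}
gold = {("w", "i"), ("w", "b"), ("b", "i")}
print(apply_all(pred, gold, [add_detector]))  # recovers the gold edge set
```

Re-running detection after each pass matters because earlier corrections (e.g. a MERGE) can create or remove the patterns that later detectors look for.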
Results are shown in Table 3. While the largest improvements are observed when correcting all error types, the most impactful single oracle action is ADD. For this action, we obtain considerable improvements on both corpora, especially for reentrancy prediction (increases of 10.4 and 10.3 points), but also for Smatch (an increase of 1.7 points for both corpora). The ADD corrections provide more than half of the reentrancy score improvement provided by the ALL corrections, and slightly less than half of the Smatch improvement.
Because of the use of noisy alignments in oracle action prediction, the oracle provides a lower bound estimate of the possible gains. Overall, we argue that the room for improvement is large enough to warrant more careful treatment of reentrancies, either during training or as a post-processing step.

Automatic Error Correction
We further provide baseline systems that learn when to apply ADD, the most impactful action. First, we experiment with a system that randomly selects two nodes in the predicted graph that are not connected by any edge and adds an edge labeled ARG0, the most frequent label. We also train an OpenNMT-py (Klein et al., 2017) sequence-to-sequence model (Bahdanau et al., 2015) with a copy mechanism (Gulcehre et al., 2016). The input sequence is the predicted graph and the output sequence is the sequence of edges to add. For each edge, the output contains three tokens: the parent node, the child node, and the edge label. Table 4 shows that the baselines do not improve the predictions of the original parsers (VANILLA). While sequence modeling of the output is convenient, other options can be attempted. We also only exploit the input AMR parse but not the input sentence. We leave it to future work to address these issues and achieve better results.
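The random baseline can be sketched as follows (our illustration, with simplified data structures; the real system operates on full AMR graphs):

```python
# Random ADD baseline: connect two randomly chosen unconnected nodes
# with :ARG0, the most frequent edge label.
import random

def random_add(nodes, edges, rng=random):
    """nodes: list of variables; edges: set of (parent, child, label)."""
    connected = ({(p, c) for p, c, _ in edges} |
                 {(c, p) for p, c, _ in edges})
    candidates = [(p, c) for p in nodes for c in nodes
                  if p != c and (p, c) not in connected]
    if not candidates:
        return edges
    p, c = rng.choice(candidates)
    return edges | {(p, c, ":ARG0")}

nodes = ["w", "b", "i"]
edges = {("w", "b", ":ARG1"), ("w", "i", ":ARG0")}
# Only b and i are unconnected, so the new :ARG0 edge links them
# (in a random direction).
print(random_add(nodes, edges, random.Random(0)) - edges)
```

As the results above suggest, such an uninformed baseline mostly adds wrong edges; its value is as a floor against which learned correctors can be compared.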

Related Work
Our classification of phenomena causing reentrancies extends previous work in this direction (Groschwitz et al., 2017). van Noord and Bos (2017) previously attempted to improve the prediction of reentrancies in a neural parser. They experimented with several pre- and post-processing techniques and showed that co-indexing reentrant nodes in the AMR annotations yields the best results. Transformation-based learning (Brill, 1993) inspired the idea of correcting existing parses. This approach has mostly been used for tagging (Ramshaw and Marcus, 1999; Brill, 1995; Nguyen et al., 2016) but it has also shown promise for semantic parsing (Jurčíček et al., 2009). A similar approach has also been used to add empty nodes to constituent parses (Johnson, 2002), with considerable success. The SEQ2SEQ baseline is an adaptation of the popular sequence-to-sequence modeling approach (Bahdanau et al., 2015).
An alternative approach to reduce reentrancy errors is to better inform training so that the errors are avoided in the first place. A recent AMR parser (Zhang et al., 2019) outperforms the previous state of the art (Lyu and Titov, 2018) by implementing a copy mechanism aimed at recovering reentrancies, confirming that reentrancies are critical for achieving good AMR parsing performance.

Conclusions
Building upon previous observations that AMR parsers do not perform well at recovering reentrancies, we analyzed the linguistic phenomena responsible for reentrancies in AMR. We found sources of reentrancies which have not been acknowledged in the AMR literature such as adjunct control, verbalization, and pragmatics. The inclusion of reentrancies due to pragmatics is controversial; we hope that this work can spur new discussions on the role of reentrancies. Our heuristics fail to detect the causes of many reentrancies.
For a more precise estimate of the most common causes of reentrancies, it is necessary to manually annotate the reentrancies in the AMR corpora.
Our oracle experiments show that there is room for improvement in predicting reentrancies, which in turn can translate into better parsing results. Stronger baselines that can learn how to correct the errors automatically are left to future work. While the parser we experimented with no longer gives state-of-the-art results (though it is not far from them), newer parsers (Zhang et al., 2019; Cai and Lam, 2020) also report relatively low accuracy on reentrancies (using the metric from Damonte et al. 2017), and as such we believe our work is relevant to these parsers.
Acknowledgments

This work was supported by the European Union's Horizon 2020 research and innovation programme, under grant agreement No. 742137.