Parsing Graphs with Regular Graph Grammars

Recently, several datasets have become available which represent natural language phenomena as graphs. Hyperedge Replacement Languages (HRL) have been the focus of much attention as a formalism to represent the graphs in these datasets. Chiang et al. (2013) prove that HRL graphs can be parsed in polynomial time with respect to the size of the input graph. We believe that HRL are more expressive than is necessary to represent semantic graphs and we propose the use of Regular Graph Languages (RGL; Courcelle 1991), which is a subfamily of HRL, as a possible alternative. We provide a top-down parsing algorithm for RGL that runs in time linear in the size of the input graph.


Introduction
NLP systems for machine translation, summarization, paraphrasing, and other tasks often fail to preserve the compositional semantics of sentences and documents because they model language as bags of words, or at best syntactic trees. To preserve semantics, they must model semantics. In pursuit of this goal, several datasets have been produced which pair natural language with compositional semantic representations in the form of directed acyclic graphs (DAGs), including the Abstract Meaning Representation Bank (AMR; Banarescu et al. 2013), the Prague Czech-English Dependency Treebank (Hajič et al., 2012), Deepbank (Flickinger et al., 2012), and the Universal Conceptual Cognitive Annotation (Abend and Rappoport, 2013). To make use of this data, we require models of graphs.
Figure 1: Semantic machine translation using AMR (Jones et al., 2012). The edge labels identify 'cat' as the object of the verb 'miss', 'Anna' as the subject of 'miss', and 'Anna' as the possessor of 'cat'. Edges whose head nodes are not attached to any other edge are interpreted as node labels.

Consider how we might use compositional semantic representations in machine translation (Figure 1), a two-step process in which semantic analysis is followed by generation. Jones et al. (2012) observe that this decomposition can be modeled with a pair of synchronous grammars, each defining a relation between strings and graphs. Necessarily, one projection of this synchronous grammar produces strings, while the other produces graphs, i.e., is a graph grammar. A consequence of this representation is that the complete translation process can be realized by parsing: to analyze a sentence, we parse the input string with the string-generating projection of the synchronous grammar, and read off the synchronous graph from the resulting parse. To generate a sentence, we parse the graph, and read off the synchronous string from the resulting parse. In this paper, we focus on the latter problem: using graph grammars to parse input graphs. We call this graph recognition to avoid confusion with other parsing problems.
Recent work in NLP has focused primarily on hyperedge replacement grammar (HRG; Drewes et al. 1997), a context-free graph grammar formalism that has been studied in an NLP context by several researchers (Chiang et al., 2013; Peng et al., 2015; Bauer and Rambow, 2016). In particular, Chiang et al. (2013) propose that HRG could be used to represent semantic graphs, and precisely characterize the complexity of a CKY-style algorithm for graph recognition from Lautemann (1990) to be polynomial in the size of the input graph. HRGs are very expressive: they can generate graphs that simulate non-context-free string languages (Engelfriet and Heyker, 1991; Bauer and Rambow, 2016). This means they are likely more expressive than we need to represent the linguistic phenomena that appear in existing semantic datasets. In this paper, we propose the use of Regular Graph Grammars (RGG; Courcelle 1991), a subfamily of HRG that, like its regular counterparts among string and tree languages, is less expressive than context-free grammars but may admit more practical algorithms. By analogy to Chiang's CKY-style algorithm for HRG, we develop an Earley-style recognition algorithm for RGLs that is linear in the size of the input graph.

Regular Graph Languages
We use the following notation. If n is an integer, [n] denotes the set {1, . . . , n}. Let Γ be an alphabet, i.e., a finite set. Then s ∈ Γ * denotes that s is a sequence of arbitrary length, each element of which is in Γ. We denote by |s| the length of s. A ranked alphabet is an alphabet Γ paired with an arity mapping (i.e., a total function) rank: Γ → N.
Definition 1. A hypergraph (or simply graph) over a ranked alphabet Γ is a tuple G = (V G , E G , att G , lab G , ext G ) where V G is a finite set of nodes; E G is a finite set of edges (distinct from V G ); att G : E G → V * G maps each edge to a sequence of nodes; lab G : E G → Γ maps each edge to a label such that |att G (e)| = rank(lab G (e)); and ext G is an ordered subset of V G called the external nodes of G.
We assume that the elements of ext G are pairwise distinct, and the elements of att G (e) for each edge e are also pairwise distinct. An edge e is attached to its nodes by tentacles, each labeled by an integer indicating the node's position in att G (e) = (v 1 , . . . , v k ). The tentacle from e to v i has label i, so the tentacle labels lie in the set [k] where k = rank(e). To express that a node v is attached to the ith tentacle of an edge e, we write vert(e, i) = v. Likewise, the nodes in ext G are labeled by their position in ext G . We refer to the ith external node of G by ext G (i), and in figures it is labeled (i). The rank of an edge e is k if att(e) = (v 1 , . . . , v k ) (or, equivalently, rank(lab(e)) = k). The rank of a hypergraph G, denoted rank(G), is the size of ext G .

Example 1. Hypergraph G in Figure 2 has four nodes (shown as black dots) and three hyperedges labeled a, b, and X (shown boxed). The bracketed numbers (1) and (2) denote its external nodes and the numbers between edges and nodes are tentacle labels. Call the top node v 1 and, proceeding clockwise, call the other nodes v 2 , v 3 , and v 4 . Call its edges e 1 , e 2 , and e 3 . Its definition would state, for instance, att G (e 1 ) = (…).

Definition 2. Let G be a hypergraph containing an edge e with att G (e) = (v 1 , . . . , v k ) and let H be a hypergraph of rank k with node and edge sets disjoint from those of G. The replacement of e by H is the graph G[e/H] obtained from G by removing e, adding H disjointly, and fusing the ith external node of H with v i for each i ∈ [k].

Example 2. A replacement is shown in Figure 2.
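The two definitions above can be sketched concretely. Below is a minimal Python sketch of Definitions 1 and 2, assuming hashable node and edge identifiers and the disjointness required by Definition 2; the names Hypergraph and replace_edge are our own, not the paper's.

```python
from dataclasses import dataclass

@dataclass
class Hypergraph:
    nodes: set        # V_G
    edges: set        # E_G (disjoint from V_G)
    att: dict         # att_G: edge -> tuple of attached nodes
    lab: dict         # lab_G: edge -> label
    ext: tuple = ()   # ext_G: ordered external nodes

    def rank(self):
        return len(self.ext)

def replace_edge(G, e, H):
    """Return G[e/H]: remove e from G, add H disjointly, and fuse the
    ith external node of H with the ith attachment node of e."""
    assert len(G.att[e]) == H.rank(), "H must have the same rank as e"
    fuse = dict(zip(H.ext, G.att[e]))       # ext_H(i) -> vert(e, i)
    rename = lambda v: fuse.get(v, v)
    nodes = G.nodes | {rename(v) for v in H.nodes}
    edges = (G.edges - {e}) | H.edges
    att = {f: t for f, t in G.att.items() if f != e}
    att.update({f: tuple(rename(v) for v in H.att[f]) for f in H.edges})
    lab = {f: l for f, l in G.lab.items() if f != e}
    lab.update(H.lab)
    return Hypergraph(nodes, edges, att, lab, G.ext)
```

Replacing a rank-2 edge by a rank-2 graph H fuses H's two external nodes onto the edge's former endpoints, leaving G's own external nodes untouched.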

Hyperedge Replacement Grammars
Definition 3. A hyperedge replacement grammar G = (N G , T G , P G , S G ) consists of ranked (disjoint) alphabets N G and T G of nonterminal and terminal symbols, respectively, a finite set P G of productions, and a start symbol S G ∈ N G . Every production in P G is of the form X → G where G is a hypergraph over N G ∪ T G and rank(G) = rank(X).
For each production p : X → G, we use L(p) to refer to X (the left-hand side of p) and R(p) to refer to G (the right-hand side of p). An edge is a terminal edge if its label is terminal and a nonterminal edge if its label is nonterminal. A graph is a terminal graph if all of its edges are terminal. The terminal subgraph of a graph is the subgraph consisting of all terminal edges and their incident nodes.
Given a HRG G, we say that graph G immediately derives graph G′, denoted G → G′, iff there is an edge e ∈ E G and a nonterminal X ∈ N G such that lab G (e) = X and G′ = G[e/H], where X → H is in P G . We extend the idea of immediate derivation to its transitive closure G →* G′, and say in this case that G derives G′. For every X ∈ N G we also use X to denote the graph consisting of a single edge e with lab(e) = X and nodes (v 1 , . . . , v rank(X) ) such that att(e) = (v 1 , . . . , v rank(X) ), and we define the language L X (G) = {H : X →* H and H is a terminal graph}. We call the family of languages that can be produced by any HRG the hyperedge replacement languages (HRL).

Table 1: Productions of a HRG. The labels p, q, r, s, t, and u label the productions so that we can refer to them in the text. Note that Y can rewrite in two ways, either via production r or s.
We assume that terminal edges are always of rank 2, and depict them as directed edges where the direction is determined by the tentacle labels: the tentacle labeled 1 attaches to the source of the edge and the tentacle labeled 2 attaches to the target of the edge.
Example 3. Table 1 shows a HRG deriving AMR graphs for sentences of the form 'I need to want to need to want to ... to want to go'. Figure 3 is a graph derived by the grammar. The grammar is somewhat unnatural, a point we will return to ( §4).
We can use HRGs to generate chain graphs (strings) by restricting the form of the productions in the grammars. Figure 4 shows a HRG that produces the context-free string language a^n b^n. HRGs can simulate the class of mildly context-sensitive languages characterized, e.g., by linear context-free rewriting systems (LCFRS; Vijay-Shanker et al. 1987), where the fan-out of the LCFRS influences the maximum rank of nonterminal required in the HRG (see Engelfriet and Heyker, 1991).

Figure 4: A HRG producing the string language a^n b^n.
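The chain-graph encoding of strings can be made concrete with a small sketch (our own, not the paper's): a string w becomes |w| rank-2 terminal edges strung along |w| + 1 nodes, with the first and last node playing the role of the external nodes.

```python
# Represent a chain graph as a list of rank-2 terminal edges
# (label, source, target); node i precedes node i + 1.
def string_to_chain_graph(w):
    return [(c, i, i + 1) for i, c in enumerate(w)]

def chain_graph_to_string(edges):
    """Invert the encoding by walking the chain from its leftmost node."""
    by_source = {src: (lab, tgt) for lab, src, tgt in edges}
    out, v = [], min(by_source)
    while v in by_source:
        lab, v = by_source[v]
        out.append(lab)
    return "".join(out)
```

Under this encoding, a grammar generates the string language L exactly when it generates the chain graphs of the strings in L.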

Regular Graph Grammars
A regular graph grammar (RGG; Courcelle 1991) is a restricted form of HRG. To explain the restrictions, we first require some definitions.
Definition 4. Given a graph G, a path in G from a node v to a node v′ is a sequence (v 0 , e 1 , v 1 , e 2 , . . . , e k , v k ) such that v 0 = v, v k = v′, and for each i ∈ [k] the edge e i is attached to both v i−1 and v i . A path is terminal if every edge on it is terminal, and internal if none of the intermediate nodes v 1 , . . . , v k−1 is external. Note that the endpoints v 0 and v k of an internal path can be external.
Definition 5. A HRG G is a Regular Graph Grammar (or simply RGG) if each nonterminal in N G has rank at least one and for each p ∈ P G the following hold: (C1) R(p) has at least one edge. Either it is a single terminal edge, all nodes of which are external, or each of its edges has at least one internal node.
(C2) Every pair of nodes in R(p) is connected by a terminal and internal path.
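Conditions C1 and C2 are mechanical enough to check directly. The following is an illustrative Python sketch (our own encoding, assuming a right-hand side given as (label, attached-nodes, is-terminal) triples plus the set of external nodes); paths in satisfies_c2 use terminal edges only and pass through internal nodes only, per Definition 4.

```python
from itertools import combinations
from collections import deque

def satisfies_c1(edges, ext):
    """C1: at least one edge; either a single terminal edge with all
    nodes external, or every edge has at least one internal node."""
    if not edges:
        return False
    if len(edges) == 1:
        lab, att, terminal = edges[0]
        if terminal and all(v in ext for v in att):
            return True
    return all(any(v not in ext for v in att) for _, att, _ in edges)

def satisfies_c2(edges, ext):
    """C2: every pair of nodes is connected by a terminal, internal path."""
    nodes = {v for _, att, _ in edges for v in att}
    adj = {v: set() for v in nodes}
    for _, att, terminal in edges:
        if terminal:                       # paths may use terminal edges only
            for u in att:
                for v in att:
                    if u != v:
                        adj[u].add(v)
    def reachable(u, v):
        seen, queue = {u}, deque([u])
        while queue:
            x = queue.popleft()
            if x == v:
                return True
            if x != u and x in ext:        # external nodes may only end a path
                continue
            for y in adj[x] - seen:
                seen.add(y)
                queue.append(y)
        return False
    return all(reachable(u, v) for u, v in combinations(nodes, 2))
```

For instance, a right-hand side consisting of a single terminal edge whose nodes are all external satisfies both conditions, while one whose only edge is nonterminal fails C1.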
Example 4. The grammar in Table 1 is an RGG. Although HRGs can produce context-free string languages (and beyond), as shown in Figure 4, the only string languages RGGs can produce are the regular string languages. See Figure 5 for an example of a string-generating RGG. Similarly, RGGs can produce regular tree languages, but not context-free tree languages. Figure 6 shows a tree-generating RGG that generates binary trees, the internal nodes of which are represented by a-labeled edges, and the leaves of which are represented by b-labeled edges. Note that these two results on the regularity of the string and tree languages generated by RGGs follow from the fact that graph languages produced by RGGs are MSO-definable (Courcelle, 1991), together with the well-known facts that the MSO-definable string and tree languages are exactly the regular ones. We call the family of languages generated by RGGs the regular graph languages (RGLs).

RGL Recognition
To recognize RGLs, we exploit the property that every nonterminal, including the start symbol, has rank at least one (Definition 5), and we assume that the corresponding external node is identified in the input graph. This mild assumption may be reasonable for applications like AMR parsing, where grammars could be designed so that the external node is always the unique root. Later, we relax this assumption.
The availability of an identifiable external node suggests a top-down algorithm, and we take inspiration from a top-down recognition algorithm for the predictive top-down parsable grammars, another subclass of HRG (Drewes et al., 2015). These grammars, the graph equivalent of LL(1) string grammars, are incomparable to RGG, but the algorithms are related in their use of top-down prediction and in that they both fix an order of the edges in the right-hand side of each production.

Top-Down Recognition for RGLs
Just as the algorithm of Chiang et al. (2013) generalizes CKY to HRG, our algorithm generalizes Earley's algorithm (Earley, 1970). Both algorithms operate by recognizing incrementally larger subgraphs of the input graph, using a succinct representation for subgraphs that depends on an arbitrarily chosen marker node m of the input graph. For each production p of the grammar, we impose a fixed order on the edges of R(p), as in Drewes et al. (2015). We discuss this order in detail in §3.2. As in Earley's algorithm, we use dotted rules to represent partial recognition of productions: X → ē 1 . . . ē i−1 · ē i . . . ē n means that we have identified the edges ē 1 to ē i−1 and that we must next recognize edge ē i . We write ē and v̄ for edges and nodes in productions and e and v for edges and nodes in a derived graph. When the identity of the sequence is immaterial we abbreviate it as α, for example writing X → · α.
We present our recognizer as a deductive proof system (Shieber et al., 1995). The items of the recognizer are of the form [I, p : X → ē 1 . . . ē i−1 · ē i . . . ē n , φ p ], where I is a subgraph that has been recognized as matching ē 1 , . . . , ē i−1 ; p : X → ē 1 , . . . , ē n is a production in the grammar with the edges in order; and φ p : E R(p) → V G * maps the endpoints of edges in R(p) to nodes in G.
For each production p, we number the nodes in some arbitrary but fixed order. Using this, we construct the initial function φ 0 p , a mapping with no grounded nodes. As we match edges in the graph with edges in p, we assign the nodes v̄ to nodes in the graph. For example, if we have an edge ē in a production p such that att(ē) = (v̄ 1 , v̄ 2 ) and we find an edge e which matches ē, then we update φ p to record this fact, written φ p [att(ē) = att(e)]. We also use φ p to record assignments of external nodes: if we assign the ith external node to v, we write φ p [ext p (i) = v].
Since our algorithm makes top-down predictions based on known external nodes, our boundary representation must cover the case where a subgraph is empty except for these nodes. If at some point we know that our subgraph has external nodes φ(ē), then we use the shorthand φ(ē) rather than the full boundary representation ⟨φ(ē), ∅, m ∈ φ(ē)⟩.
To keep notation uniform, we use a dummy nonterminal S*, added to N G , that derives S G via the production p S : S* → S G . For graph G, our system includes the axiom [φ p S (ē), p S : S* → · S G , φ p S ]. Our goal is to prove [b(G), p S : S* → S G · , φ p S ], where φ p S has a single edge ē in its domain which has label S G in R(p S ) and φ p S (ē) = ext G .
As in Earley's algorithm, we have three inference rules: PREDICT, SCAN and COMPLETE (Table 2). PREDICT is applied when the edge after the dot is nonterminal, assigning any external nodes that have been identified. SCAN is applied when the edge after the dot is terminal; using φ p , we may already know where some of the endpoints of the edge should be, so it requires the endpoints of the scanned edge to match. COMPLETE requires that each of the nodes of ē i in R(p) has been identified, that these nodes match up with the corresponding external nodes of the subgraph J, and that the subgraphs I and J are edge-disjoint. We provide a high-level proof that the recognizer is sound and complete.
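To make the deduction system concrete, here is a simplified, runnable sketch of the three rules. It deliberately departs from the paper's item form: graphs are lists of rank-2 terminal edges, items track a node binding and the set of consumed graph edges instead of the boundary representation, and the closure over items is computed naively. The grammar encoding and all names are our own illustrative choices.

```python
def recognize(grammar, start, graph_edges, root):
    """grammar: {nonterminal: [(ext_locals, rhs)]} where rhs is an ordered
    list of edges (label, att_locals, is_terminal); root is the graph node
    assumed known for the start symbol's external node."""
    prods = [(lhs, ext, tuple(rhs))
             for lhs, ps in grammar.items() for ext, rhs in ps]
    agenda = [(i, 0, frozenset({(prods[i][1][0], root)}), frozenset())
              for i in range(len(prods)) if prods[i][0] == start]
    chart = set()
    while agenda:
        item = agenda.pop()
        if item in chart:
            continue
        chart.add(item)
        i, dot, binding, used = item
        lhs, ext, rhs = prods[i]
        bind = dict(binding)
        if dot == len(rhs):
            # COMPLETE: advance items waiting on this finished nonterminal
            done = tuple(bind.get(v) for v in ext)
            for (k, d2, b2, u2) in list(chart):
                lhs2, ext2, rhs2 = prods[k]
                if d2 < len(rhs2):
                    lab2, att2, term2 = rhs2[d2]
                    if (not term2 and lab2 == lhs and not (used & u2) and
                            tuple(dict(b2).get(v) for v in att2) == done):
                        agenda.append((k, d2 + 1, b2, u2 | used))
            continue
        label, att, terminal = rhs[dot]
        if terminal:
            # SCAN: match the dotted terminal edge against the graph
            for j, (glab, src, tgt) in enumerate(graph_edges):
                if (glab == label and j not in used and
                        bind.get(att[0], src) == src and
                        bind.get(att[1], tgt) == tgt):
                    nb = dict(bind); nb[att[0]] = src; nb[att[1]] = tgt
                    agenda.append((i, dot + 1, frozenset(nb.items()),
                                   used | frozenset({j})))
        else:
            outer = tuple(bind.get(v) for v in att)
            if None in outer:
                continue
            # PREDICT: start every production for the dotted nonterminal
            for k, (lhs2, ext2, rhs2) in enumerate(prods):
                if lhs2 == label:
                    agenda.append((k, 0, frozenset(zip(ext2, outer)),
                                   frozenset()))
            # COMPLETE, other direction: finished subderivations in the chart
            for (k, d2, b2, u2) in list(chart):
                lhs2, ext2, rhs2 = prods[k]
                if (lhs2 == label and d2 == len(rhs2) and not (used & u2) and
                        tuple(dict(b2).get(v) for v in ext2) == outer):
                    agenda.append((i, dot + 1, binding, used | u2))
    every_edge = frozenset(range(len(graph_edges)))
    return any(prods[i][0] == start and dot == len(prods[i][2]) and
               used == every_edge and dict(binding).get(prods[i][1][0]) == root
               for (i, dot, binding, used) in chart)
```

The sketch enumerates candidate edges exhaustively during SCAN, so it does not exhibit the complexity bound of §3.3; it only illustrates how the three rules interlock.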
Proposition 1. Let G be a HRG and G a graph. Then G ∈ L(G) if and only if the goal item can be proved from the axiom.

Proof. We prove that, for each X ∈ N G , the item [b(G), p X : X* → X · , φ p X ] can be proved if and only if G ∈ L X (G), where the dummy nonterminal X* was added to the set of nonterminals and p X : X* → X was added to the set of productions. We prove this by induction on the number of edges in G.
We assume that each production in the grammar contains at least one terminal edge. If the HRG is not in this form, it can be converted into it; RGGs are already in this form by definition.
Base Case: Let G consist of a single edge. If: Assume G ∈ L X (G). Since G consists of one edge, there must be a production q : X → G. Apply PREDICT to the axiom and p X : X* → X to obtain the item [φ p X (X), q : X → · G, φ 0 q ]. Apply SCAN to the single terminal edge that makes up G to obtain [b(G), q : X → G · , φ q ], and finally apply COMPLETE to this and the axiom to reach the goal [b(G), p X : X* → X · , φ p X ]. Only if: Assume the goal can be reached from the axiom and G = e. Then the item [b(e), q : X → e · , φ q ] must have been reached at some point for some q ∈ P G . Therefore q : X → e is a production and so e = G ∈ L X (G).
Assumption: Assume that the proposition holds when G has fewer than k edges. Inductive Step: Assume G has k edges. If: Assume G ∈ L X (G). Then there is a production q : X → H where H has nonterminals Y 1 , . . . , Y n , and there are graphs H 1 , . . . , H n such that G = H[ē 1 /H 1 ] · · · [ē n /H n ] and H i ∈ L Y i (G) for each i ∈ [n]. Each H i has fewer than k edges, and so we apply the inductive hypothesis to show that we can prove the items [b(H i ), p Y i : Y i * → Y i · , φ p Y i ]. By applying COMPLETE to each such item and applying SCAN to each terminal edge of H we reach the goal [b(G), p X : X* → X · , φ p X ].
Only If: Assume the goal can be proved from the axiom. Then we must at some point have reached an item of the form [b(G), q : X → H · , φ q ] where H has nonterminals Y 1 , . . . , Y n . This means that there are graphs H 1 , . . . , H n such that G = H[ē 1 /H 1 ] · · · [ē n /H n ] and the item [b(H i ), p Y i : Y i * → Y i · , φ p Y i ] was proved for each i. Since each H i has fewer than k edges, we apply the inductive hypothesis to get that H i ∈ L Y i (G) for each i ∈ [n], and therefore G ∈ L X (G).
Example 5. Using the RGG in Table 1, we show how to recognize the graph in Figure 7, which can be derived by applying production s followed by production u, where the external nodes of Y are (v 3 , v 2 ). Assume the ordering of the edges in production s is arg1, arg0, Z; the top node is v̄ 1 ; the bottom node is v̄ 2 ; the node on the right is v̄ 3 ; and the marker node is not in this subgraph, so we elide reference to it for simplicity. Let v̄ 4 be the top node of R(u) and v̄ 5 be the bottom node of R(u). The external nodes of Y are determined top-down, so the recognition of this subgraph is triggered by an item in which the external nodes of Y are bound to (v 3 , v 2 ) and φ s (Z) = (v 1 ). Table 3 shows how we can prove the goal item for this subgraph; the boundary representation ⟨{v 3 , v 2 }, {e 3 , e 2 }⟩ in this item represents the whole subgraph shown in Figure 7.

Figure 7: A subgraph of the graph in Figure 3. To refer to nodes and edges in the text, they are labeled v1, v2, v3, e1, e2, and e3.

Normal Ordering
Our algorithm requires a fixed ordering of the edges in the right-hand sides of each production. We will constrain this ordering to exploit the structure of RGG productions, allowing us to bound recognition complexity. If s = ē 1 . . . ē n is an order, define s i:j = ē i . . . ē j .

Definition 7. Let s = ē 1 , . . . , ē n be an edge order of a right-hand side of a production. Then s is normal if it has the following properties: 1. ē 1 is connected to an external node; 2. s 1:j is a connected graph for all j ∈ [n]; 3. if ē i is nonterminal, each endpoint of ē i must be incident with some terminal edge ē j for which j < i.
Example 6. The ordering of the edges of production s in Example 5 is normal.
Arbitrary HRGs do not necessarily admit a normal ordering. For example, the graph in Figure 8 cannot satisfy Properties 2 and 3 simultaneously. However, RGGs do admit a normal ordering.

Table 3: The steps of recognizing that the subgraph shown in Figure 7 is derived from productions s and u in the grammar in Table 1.
Proposition 2. If G is an RGG, for every p ∈ P G , there is a normal ordering of the edges in R(p).
Proof. If R(p) contains a single node then it must be an external node, and it must have a terminal edge attached to it since R(p) must contain at least one terminal edge. If R(p) contains multiple nodes then by C2 there must be terminal internal paths between all of them, so there must be a terminal edge attached to the external node, which we use to satisfy Property 1. To produce a normal ordering, we next select terminal edges once one of their endpoints is connected to an ordered edge, and nonterminal edges once all of their endpoints are connected to ordered edges, which is possible by C2. Therefore, Properties 2 and 3 are satisfied.
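The constructive proof above suggests a greedy procedure, sketched below under our own encoding of a right-hand side as (label, attached-nodes, is-terminal) triples plus the set of external nodes; the name normal_order is ours. For RGG productions the argument in the proof guarantees the greedy search succeeds; for arbitrary HRG right-hand sides it may return None, as for the graph in Figure 8.

```python
def normal_order(edges, ext):
    """Greedily build an edge order satisfying Definition 7, or return
    None if the search gets stuck."""
    remaining, order = list(edges), []
    covered = set()   # nodes touched by already-ordered *terminal* edges
    # Property 1: start from a terminal edge attached to an external node
    for e in remaining:
        label, att, terminal = e
        if terminal and any(v in ext for v in att):
            order.append(e); remaining.remove(e); covered.update(att)
            break
    else:
        return None
    while remaining:
        for e in remaining:
            label, att, terminal = e
            if terminal:
                ok = any(v in covered for v in att)   # keeps s_1:j connected
            else:
                ok = all(v in covered for v in att)   # Property 3
            if ok:
                order.append(e); remaining.remove(e)
                if terminal:
                    covered.update(att)
                break
        else:
            return None   # stuck: no edge can be ordered next
    return order
```

Since nonterminal edges are only selected once all their endpoints lie on ordered terminal edges, Property 3 holds by construction, and the covered-node check enforces Property 2.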
A normal ordering tightly constrains the recognition of edges. Property 3 ensures that when we apply PREDICT, the external nodes of the predicted edge are all bound to specific nodes in the graph. Properties 1 and 2 ensure that when we apply SCAN, at least one endpoint of the edge is bound (fixed).

Recognition Complexity
Assume a normally-ordered RGG. Let the maximum number of edges in the right-hand side of any production be m; the maximum number of nodes in any right-hand side of a production k; the maximum degree of any node in the input graph d; and the number of nodes in the input graph n.
As previously mentioned, Drewes et al. (2015) also propose a recognizer for a subclass of HRG (incomparable to RGG) called the predictive top-down parsable grammars; their recognizer runs in O(n^2) time. A well-known bottom-up recognition algorithm for HRG was first proposed by Lautemann (1990), who showed it to be polynomial in the size of the input graph. Later, Chiang et al. (2013) formulate the same algorithm more precisely and show that the recognition complexity is O((3^d × n)^(k+1)), where k in their case is the treewidth of the grammar.

Remark 1. The maximum number of nodes in any right-hand side of a production (k) is also the maximum number of boundary nodes for any subgraph in the recognizer.
COMPLETE combines subgraphs I and J only when the entire subgraph derived from Y has been recognized. Boundary nodes of J are also boundary nodes of I because they are nodes in the terminal subgraph of R(p) where Y connects. The boundary nodes of I ∪ J are also bounded by k since they form a subset of the boundary nodes of I.

Remark 2. Given a boundary node, there are at most (d^m)^(k−1) ways of identifying the remaining boundary nodes of a subgraph that is isomorphic to the terminal subgraph of the right-hand side of a production.
The terminal subgraph of each production is connected by C2, with a maximum path length of m. For each edge in the path, there are at most d subsequent edges. Hence for the k − 1 remaining boundary nodes there are (d^m)^(k−1) ways of choosing them.
We count instantiations of COMPLETE for an upper bound on complexity (McAllester, 2002), using similar logic to Chiang et al. (2013). The number of boundary nodes of I, J and I ∪ J is at most k. Therefore, if we choose an arbitrary node to be some boundary node of I ∪ J, there are at most (d^m)^(k−1) ways of choosing its remaining boundary nodes. Each of these nodes has at most d attached boundary edges, each in one of three states (in I, in J, or in neither), giving at most (3^d)^k combinations. The total number of instantiations is O(n (d^m)^(k−1) (3^d)^k), linear in the number of input nodes and exponential in the degree of the input graph. Note that in the case of the AMR dataset (Banarescu et al. 2013), the maximum node degree is 17 and the average is 2.12.
We observe that RGGs could be relaxed to produce graphs with no external nodes by adding a dummy nonterminal S 0 with rank 0 and a single production S 0 → S G . To adapt the recognition algorithm, we would first need to guess where the graph starts. This would add a factor of n to the complexity, as the graph could start at any node.

Discussion and Conclusions
We have presented RGG as a formalism that could be useful for semantic representations and we have provided a top-down recognition algorithm for them. The constraints of RGG enable more efficient recognition than general HRG, and this tradeoff is reasonable since HRG is very expressive: when generating strings, it can express non-context-free languages (Engelfriet and Heyker, 1991; Bauer and Rambow, 2016), far more power than needed to express semantic graphs. On the other hand, RGG is so constrained that it may not be expressive enough: it would be more natural to derive the graph in Figure 3 from outermost to innermost predicate, but constraint C2 makes it difficult to express this, and the grammar in Table 1 does not. Perhaps we need less expressivity than HRG but more than RGG.

Figure 9: A Hasse diagram of various string, tree and graph language families (including RL, RTL, RGL, and DAGAL). An arrow from family A to family B indicates that family A is a subfamily of family B.
A possible alternative would be to consider Restricted DAG Grammars (RDG; Björklund et al. 2016). Parsing with a fixed such grammar can be achieved in quadratic time with respect to the size of the input graph. It is known that for a fixed HRG generating k-connected hypergraphs consisting of hyperedges of rank k only, parsing can be carried out in cubic time (k-HRG; Drewes 1993).
More general than RDGs is the class of graph languages recognized by DAG automata (DAGAL; Blum and Drewes 2016), for which the deterministic variant provides polynomial-time parsing. Note that RGGs can generate graph languages of unbounded node degree. With respect to expressive power, RDGs and k-HRGs are incomparable to RGGs. Figure 9 shows the relationships between the context-free and regular languages for strings, trees and graphs. Monadic second-order logic (MSOL; Courcelle and Engelfriet 2011) is a form of logic which, when restricted to strings, gives us exactly the regular string languages, and when restricted to trees gives us exactly the regular tree languages. RGLs lie in the intersection of HRL and MSOL on graphs, but they do not make up this entire intersection. Courcelle (1991) defined (non-constructively) this intersection to be the strongly context-free languages (SCFL). We believe that there may be other formalisms that are subfamilies of SCFL which may be useful for semantic representations. All inclusions shown in Figure 9 are strict. For instance, RGL cannot produce "star graphs" (one node that has edges to n other nodes), while DAGAL and HRL can produce such graphs. It is well known that HRL and MSOL are incomparable. There is a language in RGL that is not in DAGAL, for instance, "ladders" (two string graphs of n nodes each, with an edge between the ith node of each string).
Another alternative formalism to RGG that is defined as a restriction of HRG is Tree-like Grammars (TLG; Matheja et al. 2015). TLGs define a subclass of SCFL, i.e., they are MSO-definable. TLGs have been considered for program verification, where closure under intersection of the formalism is essential. Note that RGGs are also closed under intersection. While TLG and RDG are both incomparable to RGG, they share important characteristics, including the fact that the terminal subgraph of every production is connected. This means that our top-down recognition algorithm is applicable to both. In the future we would like to investigate larger, less restrictive (and more linguistically expressive) subfamilies of SCFL. We plan to implement and evaluate our algorithm experimentally.

Acknowledgments

This work was funded by the UK Engineering and Physical Sciences Research Council (grant EP/L016427/1) and the University of Edinburgh, and in part by a Google faculty research award (to AL). We thank Clara Vania, Sameer Bansal, Ida Szubert, Federico Fancellu, Antonis Anastasopoulos, Marco Damonte, and the anonymous reviewers for helpful discussion of this work and comments on previous drafts of the paper.