(Re)introducing Regular Graph Languages



Introduction
NLP systems for machine translation, summarisation, paraphrasing, and other problems often fail to preserve the compositional semantics of sentences and documents because they model language as bags of words, or at best syntactic trees. To preserve semantics, they must model semantics. In pursuit of this goal, several datasets have been produced which pair natural language with compositional semantic representations in the form of directed acyclic graphs (DAGs), including the Abstract Meaning Representation Bank (AMR; Banarescu et al. 2013), the Prague Czech-English Dependency Treebank (Hajič et al., 2012), Deepbank (Flickinger et al., 2012), and the Universal Conceptual Cognitive Annotation (Abend and Rappoport, 2013). To make use of this data, we require probabilistic models of graphs.

Figure 1: Semantic machine translation using AMR (Jones et al., 2012). The edge labels identify 'cat' as the subject of the verb 'miss', 'Anna' as the object of 'miss' and 'Anna' as the possessor of 'cat'.

Consider how we might use compositional semantic representations in machine translation (Figure 1). We first parse a source sentence to its semantic representation, and then generate a target sentence from this representation. To do this practically, we must be able to compose a string-to-graph model with a graph-to-string model, and we must be able to compute the probability of this composition. To compose the models, we need to be able to compute the intersection of the graph domains of each model. Hence, we must be able to define probability distributions over the graph domains and efficiently compute their intersection.
For NLP problems in which data is in the form of strings and trees, such distributions can be represented by finite automata (Mohri et al., 2008; Allauzen et al., 2014), which are closed under intersection and can be made probabilistic. It is therefore natural to ask whether there is a family of graph languages with similar properties to finite automata. Recent work in NLP has focused primarily on two families of graph languages: hyperedge replacement languages (HRL; Drewes et al. 1997), a context-free graph rewriting formalism that has been studied in an NLP context by several researchers (Chiang et al., 2013; Peng et al., 2015; Bauer and Rambow, 2016); and DAG automata languages (DAGAL; Kamimura and Slutzki 1981), studied by Quernheim and Knight (2012). Thomas (1991) showed that the latter are a subfamily of the monadic second order languages (MSOL), which are of special interest to us, since, when restricted to strings or trees, they exactly characterise the recognisable (or regular) languages of each (Büchi, 1960; Büchi and Elgot, 1958; Trakhtenbrot, 1961).
The HRL and MSOL families are incomparable: in particular, the context-free graph languages do not contain the recognisable graph languages, whereas the context-free languages of strings and trees do contain the regular ones (Courcelle, 1990). So, while each formalism has appealing characteristics, neither appears adequate for the problem outlined above: HRLs can be made probabilistic, but they are not closed under intersection; and while DAGAL and MSOL are closed under intersection, it is unclear how to make them probabilistic (Quernheim and Knight, 2012). [1] This situation suggests that we might want a family of languages that is a subfamily of both HRL and MSOL. Courcelle (1991) defines all such languages as the family of strongly context-free languages (SCFL). [2,3] Unfortunately, SCFLs are defined non-constructively, but Courcelle (1991) exhibits a constructively-defined subfamily: Regular Graph Languages (RGL), defined as a restricted form of HRL, which Courcelle demonstrates is also in MSOL.
Recently, two new graph grammar formalisms have been defined which are also restricted forms of HRL: Tree-like Grammars (TLG; Matheja et al. 2015) and Restricted DAG Grammars (RDG; Björklund et al. 2016). TLGs are claimed to be in SCFL, but the relationship of RDG to SCFL is unknown. The grammar restrictions of TLGs, RDGs and RGGs are incomparable, but they share important characteristics, which we discuss in §5.
This paper provides an accessible proof that RGL is a subfamily of MSOL, since only a sketch is provided in Courcelle (1991). Our aim in studying the proof is to gain insights into the relationship of RGL, TLG, and RDG, which might enable us to define more general classes of graph languages that are also within SCFL. Our discussion emphasises points at which Courcelle's proof relies on particular restrictions of RGL, and is intended to highlight the places where relaxations of these restrictions may be possible. Figure 2 summarises the relationship of RGL to other formalisms and their properties. The proof of each Lemma, Proposition and Theorem in this paper that does not appear here is provided in full in the supplementary materials.

[1] Semiring-weighted MSOLs have been defined, where weights may be in the tropical semiring (Droste and Gastin, 2005). However, for the weights to define a probability distribution, they must meet the stronger condition that the sum of multiplied weights over all definable objects is one. This does not appear to have been demonstrated for DAGAL, which violate the sufficient conditions that Booth and Thompson (1973) give for probabilistic languages. We suspect that there are DAGAL (hence MSOL) for which it is not possible.

[2] Courcelle's definition of strongly context-free is unrelated to the use of this term in NLP.

[3] The equality of SCFL and MSOL ∩ HRL was recently proved by Bojanczyk and Pilipczuk (2016).

Figure 2: Containment relationships for families of regular and context-free string and tree languages, hyperedge replacement languages (HRL), monadic second order definable graph languages (MSOL), directed acyclic graph automata languages (DAGAL), and the regular graph languages (RGL). * indicates that the family of languages is probabilistic and † indicates that the family of languages is intersectible.

Monadic Second-Order Logic
The regular string and tree languages precisely coincide with the monadic second-order logic (MSO) definable sets of strings and trees, respectively (Büchi, 1960; Büchi and Elgot, 1958; Trakhtenbrot, 1961), so it is natural to look at MSO over graphs. We use the following notation. If $n$ is an integer, $[n]$ denotes the set $\{1, \dots, n\}$. If $A$ is a set, $s \in A^*$ denotes that $s$ is a sequence of arbitrary length, each element of which is in $A$. We denote by $|s|$ the length of $s$. A ranked alphabet is an alphabet $A$ paired with an arity function $\mathrm{rank} : A \to \mathbb{N}$.
A hypergraph over a ranked alphabet $\Gamma$ is a tuple $G = (V_G, E_G, \mathrm{att}_G, \mathrm{lab}_G, \mathrm{ext}_G)$, where $V_G$ is a finite set of nodes; $E_G$ is a finite set of edges; $\mathrm{att}_G : E_G \to V_G^*$ maps each edge to a sequence of nodes; $\mathrm{lab}_G : E_G \to \Gamma$ maps each edge to a label such that $|\mathrm{att}_G(e)| = \mathrm{rank}(\mathrm{lab}_G(e))$; and $\mathrm{ext}_G$ is an ordered subset of $V_G$ called the external nodes of $G$. We assume that both the elements of $\mathrm{ext}_G$ and the elements of $\mathrm{att}_G(e)$ for each edge $e$ are pairwise distinct. An edge $e$ is attached to its nodes by tentacles, each labeled by an integer indicating the node's position in $\mathrm{att}_G(e) = (v_1, \dots, v_k)$. The tentacle from $e$ to $v_i$ has label $i$, so the tentacle labels lie in the set $[k]$ where $k = \mathrm{rank}(e)$. To express that a node $v$ is attached to the $i$-th tentacle of an edge $e$ we say $\mathrm{vert}(e, i) = v$. The nodes in $\mathrm{ext}_G$ are labeled by their position in $\mathrm{ext}_G$. In figures, the $i$-th external node is labeled $(i)$. The rank of an edge $e$ is $k$ if $\mathrm{att}(e) = (v_1, \dots, v_k)$ (or equivalently, $\mathrm{rank}(\mathrm{lab}(e)) = k$). The rank of a hypergraph $G$ is $|\mathrm{ext}_G|$. An induced subgraph of a hypergraph $G$ by edges $E' \subseteq E_G$ is the subgraph of $G$ formed by including all edges in $E'$ and their endpoints. Define $\mathcal{HG}_{\Sigma,\Gamma}$ to be the set of all hypergraphs with node labels in $\Sigma$ and edge labels in $\Gamma$ (the hypergraphs as defined here have no node labels so are in $\mathcal{HG}_{\emptyset,\Gamma}$).
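As an illustration of the definition above, the following is a minimal Python sketch of a hypergraph with ordered attachments and external nodes. The class name, field names and the example graph are assumptions made for illustration; in particular, the example is a hypothetical path-shaped graph and not the graph of Figure 3.

```python
from dataclasses import dataclass


@dataclass
class Hypergraph:
    """A hypergraph (V, E, att, lab, ext); all field names are illustrative."""
    nodes: set
    att: dict          # edge id -> tuple of attached nodes, in tentacle order
    lab: dict          # edge id -> edge label
    ext: tuple = ()    # external nodes, in order

    def check(self, rank):
        """Raise ValueError unless the conditions of the definition hold."""
        if set(self.att) != set(self.lab):
            raise ValueError("att and lab must be defined on the same edges")
        for e, vs in self.att.items():
            if len(vs) != rank[self.lab[e]]:
                raise ValueError(f"edge {e!r}: |att(e)| != rank(lab(e))")
            if len(set(vs)) != len(vs):
                raise ValueError(f"edge {e!r}: attached nodes must be distinct")
            if not set(vs) <= self.nodes:
                raise ValueError(f"edge {e!r}: attached to an unknown node")
        if len(set(self.ext)) != len(self.ext) or not set(self.ext) <= self.nodes:
            raise ValueError("external nodes must be distinct nodes of the graph")

    def vert(self, e, i):
        """Return the node on the i-th tentacle of edge e (1-indexed)."""
        return self.att[e][i - 1]

    def graph_rank(self):
        """The rank of a hypergraph is its number of external nodes."""
        return len(self.ext)


# Hypothetical example graph (not the graph of Figure 3).
RANK = {"a": 2, "b": 2, "X": 2}
G = Hypergraph(
    nodes={"v1", "v2", "v3", "v4"},
    att={"e1": ("v1", "v2"), "e2": ("v2", "v3"), "e3": ("v3", "v4")},
    lab={"e1": "a", "e2": "b", "e3": "X"},
    ext=("v1", "v4"),
)
G.check(RANK)
```

Storing attachments as tuples keeps the tentacle order explicit, so `vert(e, i)` is a plain index lookup.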
Example 1. Hypergraph $G$ in Figure 3 has four nodes (shown as black dots) and three hyperedges labeled $a$, $b$, and $X$ (shown boxed). The bracketed numbers (1) and (2) denote its external nodes and the numbers between edges and the nodes are tentacle labels. Call the top node $v_1$ and, proceeding clockwise, call the other nodes $v_2$, $v_3$, and $v_4$. Call its edges $e_1$, $e_2$ and $e_3$. Its definition would state: .

MSO on graphs quantifies over nodes, sets of nodes, edges, and sets of edges. The atomic formulas are $x \in X$, $x = y$, $\mathrm{lab}_\gamma(x)$, and $\mathrm{vert}(x, i) = y$. We construct MSO sentences using the atomic formulas, the connectives $\wedge, \vee, \neg, \Rightarrow$, and the quantifiers $\exists, \forall$. We allow $\mathrm{vert}(x, i) = y$ to hold only when $x$ is an edge and $y$ is a node. In the case of edge-labelled graphs, the $x$ in $\mathrm{lab}_\gamma(x)$ must be an edge. We define the formula

$$\mathrm{edg}(x, y_1, \dots, y_k) : \mathrm{vert}(x, 1) = y_1 \wedge \dots \wedge \mathrm{vert}(x, k) = y_k \wedge \bigwedge_{k' > k} \forall y\, \neg(\mathrm{vert}(x, k') = y)$$

which expresses $\mathrm{att}(x) = (y_1, \dots, y_k)$.
We can write down an MSO formula to express that sets $X_1, \dots, X_n$ partition the domain. $\mathrm{PART}(X_1, \dots, X_n)$:

We use $\exists!$ to denote unique existential quantification. For any formula $R$:

We can define an MSO statement expressing that the graph is a string by defining an edge-labelled graph where the edges have rank 2, there is exactly one node with no incoming edge, there is exactly one node with no outgoing edge, and every node has at most one incoming edge and at most one outgoing edge:

Let $\mathrm{First}(x)$ denote that $x$ has no incoming edges and $\mathrm{Last}(x)$ denote that $x$ has no outgoing edges.
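The formulas named in this paragraph can be written in several equivalent ways. The following is one standard formulation, offered as a sketch and not necessarily the exact form used in the original. The formulas for First and Last assume the convention that tentacle 1 of a rank-2 edge attaches to its source and tentacle 2 to its target.

```latex
\[
\mathrm{PART}(X_1, \dots, X_n) :\;
  \forall x \Big( \bigvee_{i \in [n]} x \in X_i \;\wedge\;
  \bigwedge_{i \neq j} \neg (x \in X_i \wedge x \in X_j) \Big)
\]
\[
\exists! x\, R(x) :\;
  \exists x \big( R(x) \wedge \forall y\, ( R(y) \Rightarrow y = x ) \big)
\]
\[
\mathrm{First}(x) :\; \neg \exists e\, \big( \mathrm{vert}(e, 2) = x \big)
\qquad
\mathrm{Last}(x) :\; \neg \exists e\, \big( \mathrm{vert}(e, 1) = x \big)
\]
```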
Example 2. Let $A$ be the automaton in Figure 4. The corresponding MSO formula quantifies over a subset $X_i$ for each state $q_i$ in the automaton. The subsets partition the nodes of the string graph to simulate a run of the automaton.
Finally, we encode each transition of the form From $A$, we construct the formula $\mathrm{aut}_A$ :

For a graph $G$ and an MSO statement $\varphi$ we say that $G \models \varphi$ (or $G$ satisfies $\varphi$) when there is an assignment of variables of $\varphi$ to nodes and edges of $G$ that makes $\varphi$ true.

Example 3. The string graph $G = aaba$ as shown in Figure 5 can be produced by automaton $A$. The letters are edge labels and call its nodes from left to

Let $\mathrm{aut}_A(X_0, X_1)$ be the MSO formula identical to $\mathrm{aut}_A$ with $\exists X_0 \exists X_1$ removed from the beginning of the formula. $X_0$ and $X_1$ are free variables of $\mathrm{aut}_A$, and we refer to the set of free variables of a formula as its parameters. Given a graph $G$ and a formula $\varphi(W)$ with parameters $W$, let $\alpha$ be a function from $W$ to subsets of nodes and edges in $G$. Then we say that $(G, \alpha) \models \varphi(W)$ if $G$ and $\alpha$ satisfy $\varphi(W)$. We call $\alpha$ a parameter assignment. The MSO interpretation of an automaton is satisfied if we can find a parameter assignment that simulates a run of the automaton; more precisely, $G \models \mathrm{aut}_A$ if there is an $\alpha$ such that $(G, \alpha) \models \mathrm{aut}_A(X_0, X_1)$. In general, there may be more than one such $\alpha$.
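We can use an MSO statement $\varphi$ to define a language, $L(\varphi) = \{G \mid G \models \varphi\}$, and we call the family of languages definable in this way MSOL. We define the intersection of two languages $L(\varphi_1)$ and $L(\varphi_2)$ by conjunction: $L(\varphi_1) \cap L(\varphi_2) = \{G \mid G \models \varphi_1 \wedge \varphi_2\} = L(\varphi_1 \wedge \varphi_2)$. This shows that MSO languages are closed under intersection.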
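The idea that a parameter assignment simulates a run of an automaton can be made concrete by brute force. The sketch below enumerates every assignment of string-graph nodes to states and keeps those that respect the initial state, the final states and the transitions. The two-state automaton used here is a hypothetical stand-in, since the automaton of Figure 4 is not reproduced in the text; the function and variable names are also assumptions.

```python
from itertools import product


def satisfying_assignments(word, n_states, delta, initial, final):
    """Yield every assignment alpha (node -> state) that simulates an accepting run.

    The string graph of `word` has nodes 0..len(word); edge k goes from node k
    to node k + 1 and is labelled word[k]. An assignment corresponds to the sets
    X_q = {node : alpha[node] == q}, which partition the nodes by construction.
    """
    n_nodes = len(word) + 1
    for alpha in product(range(n_states), repeat=n_nodes):
        if alpha[0] != initial or alpha[-1] not in final:
            continue
        if all((alpha[k], word[k], alpha[k + 1]) in delta for k in range(len(word))):
            yield alpha


# Hypothetical automaton: q0 -a-> q0, q0 -b-> q1, q1 -a-> q0; q0 is initial and final.
DELTA = {(0, "a", 0), (0, "b", 1), (1, "a", 0)}
runs = list(satisfying_assignments("aaba", 2, DELTA, initial=0, final={0}))
```

For this deterministic automaton there is at most one satisfying assignment per word; a nondeterministic automaton may admit several, matching the remark that more than one $\alpha$ can exist.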

MSO Transductions
One way to show that a language is MSO definable is to use the backwards translation theorem (Courcelle and Engelfriet, 2011), which depends on MSO transductions (MSOT), a generalisation of finite-state string and tree transductions. The theorem is a generalisation to graphs of the fact that regular string and tree languages are closed under inverse finite-state transductions (Hopcroft and Ullman, 1979).
Theorem 1 (Backwards Translation Theorem). If $L$ is an MSO definable graph language and $f$ is an MSO graph transduction then $f^{-1}(L)$ is effectively MSO definable.
$W$ is a set of parameters; $\rho(W)$ is a precondition which input graphs must satisfy; $\delta(x, W)$ is a domain formula defining the output domain (i.e. nodes); and $\theta_r(x_1, \dots, x_{N(r)}, W)$ is a relation formula defining relationships between the elements in the output domain (i.e. edges). The role of parameters here is to allow nondeterminism. Given a graph G and a parame-

Hyperedge Replacement Grammars
If $f$ is a function and $S$ is a set, $f|_S$ is the restriction of $f$ to domain elements in $S$. If $f, g$ are functions, $f \circ g$ is their composition.
Definition 3. Let $G$ be a hypergraph with an edge $e$ of rank $k$ and let $H$ be a hypergraph also of rank $k$ disjoint from $G$. The replacement of $e$ by $H$ is the graph $G' = G[e/H]$ obtained by removing $e$ from $G$ and identifying the $i$th external node of $H$ with the $i$th node attached to $e$, for each $i \in [k]$.

Example 5. Replacement is shown in Figure 3. We denote the replacement as $G[X/H]$ since the edge is unambiguous given its label.

Table 1: Productions of a HRG. The labels p, q, r, s, t, and u label the productions so that we can refer to them in the text. Note that Y can be rewritten either via production r or s.

A hyperedge replacement grammar (HRG) $\mathcal{G} = (N, T, P, S)$ consists of ranked alphabets $N$ of nonterminals and $T$ of terminals, a finite set of productions $P$, and a start symbol $S \in N$. Every production in $P$ is of the form $X \to H$ where $X \in N$ is of rank $k$ and $H$ is a hypergraph of rank $k$ over $N$ and $T$.
A HRG $\mathcal{G}$ produces graphs in $\mathcal{HG}_{\emptyset, T_{\mathcal{G}}}$. In each example, we only show terminal edges of rank 2, and depict them as directed edges where the direction is determined by the tentacle labels: tentacle 1 attaches to the source and 2 attaches to the target (Table 1). For each production $p : X \to G$, we use $L(p)$ to refer to its left-hand side ($X$) and $R(p)$ to refer to its right-hand side ($G$). An edge is a terminal edge if its label is terminal and a nonterminal edge if its label is nonterminal. A graph is terminal if all of its edges are labeled with terminal symbols. The terminal subgraph of a graph is the subgraph induced by its terminal edges. Let $\mathrm{NT}(p) = \{e_1, \dots, e_n\}$ be an enumeration of the nonterminal edges in $R(p)$, let $|\mathrm{NT}(p)|$ be the number of nonterminal edges in $R(p)$, and let $|\mathrm{NT}(P)| = \max_{p \in P} |\mathrm{NT}(p)|$.
Given a HRG $\mathcal{G}$, we say that graph $G$ derives graph $G'$, denoted $G \to G'$, iff there is an edge $e \in E_G$ and a nonterminal $X \in N$ such that $\mathrm{lab}_G(e) = X$ and $G' = G[e/H]$, where $X \to H$ is in $P$. We extend the idea of derivation to its transitive closure $G \to^* G'$. For every $X \in N$ we also use $X$ to denote the connected graph consisting of a single edge $e$ with $\mathrm{lab}(e) = X$ and nodes $(v_1, \dots, v_{\mathrm{rank}(X)})$ such that $\mathrm{att}(e) = (v_1, \dots, v_{\mathrm{rank}(X)})$, and we define the language $L_X(\mathcal{G}) = \{G \mid X \to^* G,\ G \text{ is terminal}\}$. The language of $\mathcal{G}$ is then $L(\mathcal{G}) = L_S(\mathcal{G})$. We call the family of languages that can be produced by any HRG the hyperedge replacement languages (HRL).
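A single derivation step can be sketched in Python as follows. This is an illustrative implementation of hyperedge replacement under the usual reading of the definition (the $i$th external node of $H$ is glued onto the $i$th node attached to $e$); the dictionary layout, the fresh-name scheme and the example graphs are all assumptions.

```python
from itertools import count


def replace(G, e, H):
    """Return G[e/H]: remove edge e from G and glue H in its place.

    Graphs are dicts with keys "nodes" (set), "att" (edge -> tuple of nodes),
    "lab" (edge -> label) and "ext" (tuple of external nodes).
    """
    attached = G["att"][e]
    if len(attached) != len(H["ext"]):
        raise ValueError("the rank of H must equal the rank of e")
    # The i-th external node of H is identified with the i-th node attached to e.
    rename = dict(zip(H["ext"], attached))
    fresh = count()
    for v in sorted(H["nodes"], key=repr):
        if v not in rename:
            rename[v] = ("new", e, next(fresh))  # hypothetical fresh-name scheme
    att = {f: vs for f, vs in G["att"].items() if f != e}
    lab = {f: l for f, l in G["lab"].items() if f != e}
    for f, vs in H["att"].items():
        att[(e, f)] = tuple(rename[v] for v in vs)  # copies of H's edges get new ids
        lab[(e, f)] = H["lab"][f]
    return {
        "nodes": G["nodes"] | set(rename.values()),
        "att": att,
        "lab": lab,
        "ext": G["ext"],
    }


# Hypothetical example: rewrite the single X-labelled edge of G0 using H0.
G0 = {"nodes": {1, 2}, "att": {"e": (1, 2)}, "lab": {"e": "X"}, "ext": (1,)}
H0 = {
    "nodes": {"x", "y", "z"},
    "att": {"f1": ("x", "y"), "f2": ("y", "z")},
    "lab": {"f1": "a", "f2": "Y"},
    "ext": ("x", "z"),
}
G1 = replace(G0, "e", H0)
```

The external nodes of the result are those of the host graph, so replacement never changes the rank of the graph being rewritten.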

HRL and MSOT
Since HRGs are context-free, for each HRG $\mathcal{G}$, there is an underlying regular tree grammar $\mathcal{T}_{\mathcal{G}}$ defining the derivation trees of the graphs in $L(\mathcal{G})$. Each $T \in \mathcal{T}_{\mathcal{G}}$ has node labels in $P_{\mathcal{G}}$ and edge labels in $[|\mathrm{NT}(P)|]$. If a node has label $p$ and $R(p)$ has $n$ nonterminals $X_1, \dots, X_n$ then for each $i \in [n]$, there is an $i$-labelled edge from $p$ to a node labelled $q$ where $L(q) = X_i$. The label of the root of $T$ must be $p$ for some $p$ with $L(p) = S$. Let $\mathrm{VAL} : L(\mathcal{T}_{\mathcal{G}}) \to L(\mathcal{G})$ be a mapping from derivation trees to graphs so that $G = \mathrm{VAL}(T)$ iff $T$ is a derivation tree of $G$. Since HRGs can be ambiguous, this mapping is not injective. Courcelle (1991) shows that VAL is an MSO transduction. This does not imply that HRLs are MSOL, since in general MSOL is not closed under MSOT. Hence an MSOT representing the inverse of VAL may not exist for an arbitrary HRG, but we later discuss a subfamily for which it does (§4), allowing us to apply Theorem 1.
To distinguish between elements of a graph and its derivation tree, we denote a grammar by $\mathcal{G}$, a graph by $G$, a derivation tree by $T$, and a derivation tree node by $v$; edges and nodes in productions are written with a bar ($\bar{v}$), and nodes and edges in $G$ are unmarked ($x$).
The transduction VAL preserves the terminal subgraph of every production used in a derivation and fuses nodes from different productions together in the output graph. Node fusion is determined by an equivalence relation $\sim$ generated by a relation $\sim_0$. Let $\mathrm{NT}(p) = (\bar{e}_1, \dots, \bar{e}_n)$ be the nonterminal edges of $R(p)$, let $\mathrm{NT}_i(p) = \bar{e}_i$, and let $\mathrm{ext}_G(i)$ be the $i$th external node of $G$.
Definition 5. Let $\mathcal{G}$ be a HRG and $T$ be a derivation tree of $G$, so that $G = \mathrm{VAL}(T)$. Define a binary relation $\sim_0$ on pairs $(\bar{x}, v)$ where $\bar{x}$ is a node in $R(p)$ for some $p \in P$ and $v$ is a node of $T$ with label $p$. Then $(\bar{x}, v) \sim_0 (\bar{y}, v')$ iff: 1. $v, v'$ are nodes in $T$ and $v'$ is the $i$th child of $v$ in $T$.
We define ∼ as the reflexive, symmetric, transitive closure of ∼ 0 .
The mapping VAL translates derivation trees to graphs in two steps. First, the terminal subgraph of every instance of every production used in the derivation tree is produced in the output. Then, all equivalent nodes under $\sim$ are fused. Courcelle (1991) shows that each step is an MSOT; their composition is also an MSOT.
Example 6. Figure 6 illustrates how VAL maps a derivation tree to a graph.
The mapping VAL can be defined in terms of two finer-grained mappings. Let $E_P = \bigcup_{p \in P} E_{R(p)}$ and $V_P = \bigcup_{p \in P} V_{R(p)}$. Then $h_e : E_P \times V_T \to E_G$ maps a pair $(\bar{e}, v)$ to its image $e$ in the graph, where $\bar{e}$ is a terminal edge in $p$ and $\mathrm{lab}(v) = p$. This mapping is one-to-one since edges cannot be fused. $h_v : V_P \times V_T \to V_G$ maps a pair $(\bar{x}, v)$ to its image $x$, where $\bar{x}$ is a node in $p$ and $\mathrm{lab}(v) = p$. It is not one-to-one since nodes can be fused.
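The two steps of VAL (copy the terminal edges of every production instance, then fuse equivalent nodes) can be sketched with a union-find structure. The sketch assumes the usual HRG reading of the generating relation: the $j$th node attached to the $i$th nonterminal edge of a production instance is fused with the $j$th external node of the production at its $i$th child. The data layout and the two toy productions are hypothetical.

```python
def val(tree, productions):
    """Map a derivation tree to the list of terminal edges of the derived graph.

    tree:        {tree node: (production name, [children in order])}
    productions: {name: {"terminals": [(label, nodes)],
                         "nonterminals": [nodes of the i-th nonterminal edge],
                         "ext": external nodes}}
    Graph nodes are represented by the union-find root of a pair (x, v).
    """
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a, b):
        parent[find(a)] = find(b)

    # Generate the base relation between a tree node and each of its children,
    # and close it under reflexivity, symmetry and transitivity via union-find.
    for v, (p, children) in tree.items():
        for i, child in enumerate(children):
            q = tree[child][0]
            for x, y in zip(productions[p]["nonterminals"][i], productions[q]["ext"]):
                union((x, v), (y, child))

    # Emit one copy of each terminal edge per production instance, with fused endpoints.
    edges = []
    for v, (p, _) in tree.items():
        for label, nodes in productions[p]["terminals"]:
            edges.append((label, tuple(find((x, v)) for x in nodes)))
    return edges


# Hypothetical grammar fragment: p has one nonterminal edge attached to x2,
# which is rewritten by q, whose only external node is y1.
PRODS = {
    "p": {"terminals": [("a", ("x1", "x2"))], "nonterminals": [("x2",)], "ext": ("x1",)},
    "q": {"terminals": [("b", ("y1", "y2"))], "nonterminals": [], "ext": ("y1",)},
}
TREE = {0: ("p", [1]), 1: ("q", [])}
derived = val(TREE, PRODS)
```

Edges are never merged, which mirrors the statement that the edge mapping is one-to-one while the node mapping is not.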
Lemma 1. Let G be a HRG, and let G be a graph in L(G) with derivation tree T. Ifx andx are nodes such that h and v is an ancestor of v in T.

Regular Graph Grammars
A regular graph grammar (RGG; Courcelle 1991) is a restricted form of HRG. To explain the restrictions, we first require some definitions.
Definition 6. Given a graph $G$, a path in $G$ from a node $v$ to a node $v'$ is a sequence The endpoints $v_0$ and $v_k$ of an internal path can be external.

Definition 7. A HRG $\mathcal{G}$ is a Regular Graph Grammar if each nonterminal in $N$ has rank at least one and for each $p \in P_{\mathcal{G}}$ the following hold:
(C1) $R(p)$ has at least one edge. Either it is a single terminal edge, all nodes of which are external, or each of its edges has at least one internal node.
(C2) Every pair of nodes in $R(p)$ is connected by a terminal and internal path.
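Conditions C1 and C2 can be checked mechanically for a single right-hand side. The sketch below assumes that a path is terminal when all of its edges are terminal, that it is internal when every node other than its endpoints is internal, and that a path may step between any two nodes attached to the same edge. These readings, the tuple layout and the example graphs are assumptions made for illustration.

```python
def satisfies_c1(nodes, ext, edges):
    """edges is a list of (label, attached nodes, is_terminal) triples."""
    internal = set(nodes) - set(ext)
    if not edges:
        return False
    if len(edges) == 1:
        _, attached, is_terminal = edges[0]
        if is_terminal and set(attached) <= set(ext):
            return True  # a single terminal edge, all nodes of which are external
    return all(set(attached) & internal for _, attached, _ in edges)


def satisfies_c2(nodes, ext, edges):
    """Check that every pair of nodes is joined by a terminal and internal path."""
    internal = set(nodes) - set(ext)
    for start in nodes:
        reached, frontier = {start}, [start]
        while frontier:
            x = frontier.pop()
            for _, attached, is_terminal in edges:
                if is_terminal and x in attached:
                    for y in attached:
                        if y not in reached:
                            reached.add(y)
                            if y in internal:  # only internal nodes may be passed through
                                frontier.append(y)
        if reached != set(nodes):
            return False
    return True


# Hypothetical right-hand sides with nodes 1, 2, 3 and external node 1.
ONLY_NONTERMINAL = [("a", (1, 2), True), ("X", (2, 3), False)]
CONNECTED = [("a", (1, 2), True), ("b", (2, 3), True), ("X", (2, 3), False)]
```

In `ONLY_NONTERMINAL`, node 3 is attached only to the nonterminal edge, so C1 holds but C2 fails; `CONNECTED` satisfies both conditions.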
RGLs are HRLs by definition; we will prove that they are also MSOLs by constructing the inverse of VAL, a transducer from RGL graphs to their derivation trees. Since the derivation trees are MSO definable, RGLs must also be MSO definable by Theorem 1. The construction requires a unique anchor element (a node or edge) for each production in the grammar. Given an input graph, the transducer first guesses, via parameter assignment, the preimage of each edge and the set of elements whose preimages are anchors. It then checks whether the guess satisfies constraints that must be true for every derived graph:
1. It must be possible to partition the graph into a set of edge-disjoint connected subgraphs, each isomorphic to the terminal subgraph of some production.
2. For each node that is in two such subgraphs, the node must be the image of two nodes in the productions that are allowed to be fused under the grammar.
If these constraints are satisfied, the transducer outputs each guessed anchor and an edge between anchors that it identifies to be in a parent-child relationship.
Every valid parameter assignment corresponds to a different output from the transducer, and we will show that all derivation trees for any input graph in the grammar lie in this output set.
The proof of each Lemma and Proposition in this section either appears here or in the supplementary materials. The proof of Theorem 2 is provided in §4.2.1.

Anchors and Parameters
There are two types of productions in RGGs: those with a single terminal edge, all nodes of which are external; and those where each edge has an internal node. We call the former ext-productions and the latter int-productions. For each int-production, we arbitrarily choose one of its internal nodes to be its anchor. For each ext-production, we choose its single terminal edge to be the anchor. By Lemma 1, this choice ensures that a pair of anchors cannot be fused, so the set of anchors in any derived graph is guaranteed to be in one-to-one correspondence with the nodes of its derivation tree. We define two sets of parameters: $E$ and $C$, where $E$ guesses preimages of edges, and $C$ guesses anchors (which may be either nodes or edges). To define $E$ precisely, we require some notation. Let $\mathcal{G}$ be an RGG, and for each $p \in P$, let $T(p) = \{\bar{f}_{p,1}, \dots, \bar{f}_{p,|T(p)|}\}$ enumerate the terminal edges of $R(p)$ and let $\gamma_{p,j}$ be the label of $\bar{f}_{p,j}$ for each $p \in P$ and $j \in [|T(p)|]$. Let $|\mathrm{NT}(p)|$ be the number of nonterminal edges in $p$ and let $|\mathrm{NT}(P)| = \max_{p \in P} |\mathrm{NT}(p)|$. Given a node $v$ in a derivation tree $T$, we say that $v$ is an $i$-child if it is the $i$th child of some other node in $T$. By convention, the root node is the only 0-child.
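Let $G$ be in $L(\mathcal{G})$ and let $T$ be a derivation tree of $G$. For each $i \in [0, |\mathrm{NT}(P)|]$, $p \in P$ and $j \in [|T(p)|]$, we define a parameter $E_{i,p,j}$:

$$E_{i,p,j} = \{ h_e(\bar{f}_{p,j}, v) \mid v \in V_T,\ \mathrm{lab}_T(v) = p,\ v \text{ is an } i\text{-child} \}.$$

For each $i \in [0, |\mathrm{NT}(P)|]$ and $p \in P$, define

$$C_{i,p} = \{ h(\bar{c}_p, v) \mid v \in V_T,\ \mathrm{lab}_T(v) = p,\ v \text{ is an } i\text{-child} \},$$

where $\bar{c}_p$ is the anchor of $p$ and $h = h_e \cup h_v$, since $\bar{c}_p$ can be either an edge or a node. Let $C = \bigcup_{i,p} C_{i,p}$. Let $W = E \cup C$.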
Example 7. Table 2 shows the productions of Table 1 with labels on each node and edge. Figure 7 shows the derivation tree and graph from Figure 6 with variable names added. We use these variable names to refer to specific nodes and edges in the text. For example, $h_v(\bar{c}_s, v_8) = v_1$, and $h_e(\bar{f}_{u,1}, v_9) = e_5$.

Example 8. Using the labels in Table 2 and Figure 7, we see that $E_{0,p,1} = \{e_9\}$, $E_{1,q,2} = \{e_{12}, e_{14}\}$, and $v_1 = h(\bar{c}_p, v_8)$ is an anchor.

Path Properties of RGLs
The precondition will exploit the properties of RGGs, particularly the properties of paths between nodes. Let $\mathcal{G}$ be an RGG, $G \in L(\mathcal{G})$, and let $T$ be a derivation tree of $G$. In the following, we relate paths within individual productions in $P$ (denoted $\pi$) to paths in $G$ (denoted $\lambda$). For each $e$ in $G$, we define $o(e) = (i, p, j)$ iff $e \in E_{i,p,j}$.

Now let π be a path
We denote by $h(\pi, v)$ the following path in $G$: . Note that $\mathrm{tr}(\pi) = \mathrm{tr}(h(\pi, v))$. The trace is a property that remains constant when a path is projected from a production into a graph. This projection is not one-to-one since a production can be applied several times; a trace appears in the graph once for each application of the corresponding production in a derivation. For $v \in V_T$, we write $\pi \in R(\mathrm{lab}_T(v))$ to denote that $\pi$ is a path in the production which is the label of $v$.
Lemma 2 (Lemma 5.5 of Courcelle, 1991). Let $\mathcal{G}$ be an RGG, $G$ be a graph in $L(\mathcal{G})$, and $T$ be a derivation tree of $G$. Let $\lambda$ be a path in $G$ of the form $h(\pi, v)$ for some $v \in V_T$ and some terminal path $\pi \in R(\mathrm{lab}_T(v))$. The final node of $\pi$ may be internal or external but every other node must be internal. If $\lambda'$ is another path in $G$ with the same trace and the same initial node as $\lambda$, then $\lambda' = \lambda$.
Lemma 2 guarantees a unique trace for every path in a graph that is the projection of a path in a single production. By property C2 of RGGs, this guarantee must hold for at least one path from the anchor node of an int-production to every other node in the production. For ext-productions, all paths are of the form $\pi = (\bar{e}, i, \bar{v}_i)$, where $\bar{e}$ is the single terminal edge; these paths are also guaranteed unique traces.

MSO Formulas for the Precondition
Given an assignment to our parameters, we can use the path property in Lemma 2 to define some useful MSO statements. The first, ANC, relates anchors to the nodes in the graph. Throughout this section, given a derivation tree $T$, we will refer to $\alpha_T$, which is the parameter assignment from $W$ to $V_G \cup E_G$ as defined above.

Lemma 3 (Lemma 5.6 of Courcelle, 1991). Let $\mathcal{G}$ be an RGG, $G$ be a graph in $L(\mathcal{G})$, and $T$ be a derivation tree of $G$. For every $p \in P$, every $i \in [0, |\mathrm{NT}(P)|]$, and every node $\bar{x} \in R(p)$, one can construct a formula $\mathrm{ANC}_{p,i,\bar{x}}(u, w, \{W\})$ such that, for every $u \in V_G \cup E_G$, $w \in V_G$:

We say that node $u$ anchors node $v$ if for some $p$, $i$ and $\bar{x}$, $\mathrm{ANC}_{p,i,\bar{x}}(u, v, \{W\})$ holds. We use the fact that a node or edge anchors itself to establish its corresponding production.
The next MSO formula we construct relates pairs of anchors to each other. Since the anchors define the output domain of the transducer, the formula PAR defines the edges of the output.
Lemma 4 (Lemma 5.7 of Courcelle, 1991). Let $\mathcal{G}$ be an RGG, $G$ be in $L(\mathcal{G})$, $T$ be a derivation tree of $G$, and $\alpha$ be the parameter assignment defined with respect to $T$. One can construct a formula $\mathrm{PAR}_{p,i,p',i'}(u, w, \{W\})$ such that, for $u, w \in V_G \cup E_G$:

If $\mathrm{PAR}_{p,i,p',i'}(u, u', \{W\})$ holds, then $u$ will become the parent of $u'$ in the output tree. The proof of this lemma relies on C1 of RGG.
As introduced in §3, we have a binary equivalence relation $\sim$ over pairs of the form $(\bar{x}, v)$ where $\bar{x}$ is a node in a production $p$ and $v$ is a node in the derivation tree with label $p$. We use this relation for the precondition of the transducer so that a pair of nodes are only fused if the grammar and derivation tree allow them to be. We project $\sim$ into the graph to construct a relation over anchors such that $\mathrm{FUSE}_{p,i,\bar{x},p',i',\bar{x}'}(u, u', \{W\})$

Lemma 5. Let $\mathcal{G}$ be an RGG, $G$ be in $L(\mathcal{G})$, and $T$ be a derivation tree of $G$. One can construct a formula $\mathrm{FUSE}_{p,i,\bar{x},p',i',\bar{x}'}(u, u', \{W\})$ such that, for $u, u' \in V_G \cup E_G$:

Example 12. From Table 2 and Figure 7, we can see that FUSE p,0,x 1 ,s,2,

The Precondition of the Transducer
Let $X$ be in $N$; then $P_X = \{p \in P \mid L(p) = X\}$, and an $X$-derivation tree is a derivation tree with respect to $X$ as the start symbol (in this case, the root will have a label in $P_X$). An $S$-derivation tree is referred to simply as a derivation tree.
Edge Requirements. For all $e \in \alpha(E_{i,p,j})$, $e$ has label $\gamma_{p,j}$; (E3) there is a unique $p \in P_X$ such that $\alpha(E_{0,p,j})$ has exactly one element for each $j \in [|T(p)|]$, and for every $p' \neq p$, $\alpha(E_{0,p',j})$ is empty for all $j$.
Decomposition into Subgraphs. This constraint partitions the graph into a set of connected subgraphs, each of which is isomorphic to the terminal subgraph of the right-hand side of some production. The requirements are:
(S1) every node in $G$ is attached to some edge;
(S2) for each anchor $u \in C_{i,p}$ we can identify a unique edge $e \in E_{i,p,j}$ for each $j \in [|T(p)|]$ such that $u$ anchors all of the endpoints of $e$;
(S3) for each edge $e \in E_{i,p,j}$ we can identify a unique anchor $u \in C_{i,p}$ such that $u$ anchors all of the endpoints of $e$.

$\mathrm{SUBGRAPH}_{i,p,j}(W)$ :

Lemma 7. Let $\mathcal{G}$ be an RGG and let $G \in L(\mathcal{G})$; then for each derivation tree $T$ of $G$, $(G, \alpha_T) \models \mathrm{SUBGRAPH}(W)$.

Subgraph Composition
We require that for a graph $G$ with derivation tree $T$, $\mathrm{FUSE}_{p,i,\bar{x},p',i',\bar{x}'}(u, u', \{W\})$ holds. This part of the precondition ensures that two different ways of looking at how nodes can be fused agree with one another. The first is that if a node can be anchored by two different anchors, then this node must be the image of two nodes from different production applications. The second is FUSE, the equivalence relation generated by a relation over neighbouring nodes in the derivation tree.

Lemma 8. Let $\mathcal{G}$ be an RGG and let $G \in L(\mathcal{G})$; then for each derivation tree $T$ of $G$, there exists $\alpha_T$ such that $(G, \alpha_T) \models \mathrm{SHARE}(W)$.

Example 15. Looking at Table 2 and Figure 7,

The proof of each of the above lemmas is available in the supplementary materials. In each of these proofs, we prove by induction on the size of $T$ that $(G, \alpha_T) \models R(W)$ for $R \in \{\mathrm{EDGE}, \mathrm{SUBGRAPH}, \mathrm{SHARE}\}$. In each induction, we use the equations (defined below) which express $\alpha_T$ in terms of the parameter assignments of sub-trees of $T$.
Let $G \in L_X(\mathcal{G})$ and $q : X \to H$ such that $H$ has nonterminals $Y_1, \dots, Y_n$ and $G = H[Y_1/H_1] \dots [Y_n/H_n]$. Then $H_\eta \in L_{Y_\eta}(\mathcal{G})$ for each $\eta \in [n]$. Let $T_\eta$ be a derivation tree for $H_\eta$ and let $\alpha_{T_\eta}$ be the assignment of $W$ to the nodes and edges in $H_\eta$. Then we can define $\alpha_T(E)$ in terms of the set of $\alpha_{T_\eta}(E)$s:

$$
\alpha_T :
\begin{cases}
e \in E_{i,p,j} & \text{if } e \in E_{H_\eta},\ \alpha_{T_\eta} : e \in E_{i,p,j},\ i \neq 0 \\
e \in E_{\eta,p,j} & \text{if } e \in E_{H_\eta},\ \alpha_{T_\eta} : e \in E_{0,p,j} \\
e \in E_{0,q,j} & \text{if } e \in E_H,\ e = h_e(\bar{f}_{q,j}, v_0)
\end{cases}
\tag{2}
$$

where $e = h_e(\bar{f}_{q,j}, v_0)$ means that $e$ can be uniquely identified as corresponding to $\bar{f}_{q,j}$, since $H$ and $R(q)$ are isomorphic and $v_0$ is the root of $T$. For the anchor set, where $c = h(\bar{c}_q, v_0)$.

RGLs Satisfy the Precondition
The precondition of the transducer is the conjunction of each of these formulas,

$$\rho_X(W) : \mathrm{EDGE}_X(W) \wedge \mathrm{SHARE}(W) \wedge \mathrm{SUBGRAPH}(W)$$

Define $\rho(W) = \rho_S(W)$.

Proposition 1. Let $\mathcal{G}$ be an RGG and let $G \in L(\mathcal{G})$; then for each derivation tree $T$ of $G$, there exists a parameter assignment $\alpha_T$ such that $(G, \alpha_T) \models \rho(W)$.

Parsing as Transduction
The transducer is made up of three types of formulas: the precondition, the domain formulas, and the relation formulas. We have established the precondition $\rho(W)$ and next we define the domain and relation formulas. The domain formulas define the nodes of the derivation tree and so we write $\mathrm{node}(x, \{W\})$. The relation formulas define which output node is the $i$th child of another output node, written $\mathrm{child}_i(x, y, \{W\})$, and the labels of the output nodes, written $\mathrm{lab}_p(x, \{W\})$. The domain of the output for a parameter assignment $\alpha$ is $D_T$ where:

The relation formula $\mathrm{child}_{i'}(x, y, \{W\})$ defines the edges of the output of the transducer. We use the formula $\mathrm{PAR}_{p,i,p',i'}(u, u', \{W\})$ from Lemma 4, which encodes that the derivation tree node corresponding to $u'$ is the $i'$th child of the node corresponding to $u$ (which itself is the $i$th child of some other node).

$$\mathrm{child}_{i'}(x, y, \{W\}) : \bigvee_{i,p,p'} \mathrm{PAR}_{p,i,p',i'}(x, y, \{W\})$$

We also need to assign labels to the tree nodes, which can be done via the unary relation:

Example 16. Figure 8 shows the output of the transducer when it takes Figure 7 as input with $\alpha$ defined as in the previous examples. The domain formulas specify the existence of the 9 nodes and the relation formulas specify the edges between the nodes, labelled by PAR formulas, and the labels of the nodes, according to the $C_{i,p}$ sets.

Transducer Output and Derivation Trees
We will show that for each $G \in L(\mathcal{G})$, if $T$ is a derivation tree of $G$ then $T \in \tau(G)$. We will also show that for each $T \in \tau(G)$, if it is a derivation tree in $\mathcal{T}_{\mathcal{G}}$ then it is a derivation tree of $G$.
Proposition 2. Let $\mathcal{G}$ be an RGG and $\tau$ be the corresponding transducer. Let $G \in L(\mathcal{G})$ and $T$ be a derivation tree of $G$. Then $T \in \tau(G)$.
Proposition 3. Let $\mathcal{G}$ be an RGG and $G \in L(\mathcal{G})$. Let $\alpha$ be a parameter assignment such that $(G, \alpha) \models \rho(W)$. If $T = \tau(G, \alpha)$ is in $\mathcal{T}_{\mathcal{G}}$, then $\mathrm{VAL}(T) = G$.
Proof. Let $\mathcal{G}$ be an RGG and $\tau$ be the corresponding transducer. By Propositions 2 and 3, for each $G \in L(\mathcal{G})$, $\tau(G)$ is a set which contains all of the derivation trees of $G$ and possibly other elements, none of which are derivation trees of any $G' \in L(\mathcal{G})$ where $G' \neq G$. Therefore, for each $G \in L(\mathcal{G})$, $\tau(G) \cap \mathcal{T}_{\mathcal{G}} = \{T \in \mathcal{T}_{\mathcal{G}} \mid \mathrm{VAL}(T) = G\}$. Therefore, $\tau(L(\mathcal{G})) \cap \mathcal{T}_{\mathcal{G}} = \{T \in \mathcal{T}_{\mathcal{G}} \mid \mathrm{VAL}(T) = G,\ G \in L(\mathcal{G})\}$.

Conclusions and Discussion
Property C1 of RGGs is used repeatedly in the proof that RGL is in MSOL. This property implies connectedness of the terminal subgraph, a property that both Tree-like Grammars (Matheja et al., 2015) and Restricted DAG Grammars (Björklund et al., 2016) share, although both of these formalisms allow nodes that are connected only to nonterminals, which is forbidden in RGG. We suspect that all three families of languages are incomparable. That these restricted forms of HRG all share the property of connectedness suggests that it may be an important property. In particular, we plan to investigate whether connectedness of terminal subgraphs implies that an HRL is in MSOL.
Languages which contain graphs of the form shown in Figure 9 are MSOL but not in RGL or TLG; hence both RGL and TLG are proper subfamilies of SCFL. Languages of this form can be produced by RDG, whose relationship to SCFL is unknown. To produce graphs like this, we must allow productions containing nonterminals that are not incident to any internal node. We would need to allow this only in certain circumstances, however, as we could easily produce a language of graphs that look like the graph in Figure 9 with equal numbers of a-labelled and b-labelled edges; such languages are not MSO-definable. On a technical level, allowing such extensions would mean that PAR no longer holds. Courcelle (1991) discusses this problem and introduces an alternative representation of derivation trees called reduced trees, which enable some cases of this type to be defined in MSOL. This point requires further investigation.
Another possible extension would be to consider alternative forms of Lemma 2. Every MSO formula in the transducer depends on this lemma. We could potentially extend RGG if we can define other cases in which a path could be defined in terms of its trace and initial vertex. We intend to investigate such cases in future work.
Figure 9: A graph where every edge is labelled a and has the same tail but each edge has a unique head.