Graph parsing with s-graph grammars

A key problem in semantic parsing with graph-based semantic representations is graph parsing, i.e. computing all possible analyses of a given graph according to a grammar. This problem arises in training synchronous string-to-graph grammars, and when generating strings from them. We present two algorithms for graph parsing (bottom-up and top-down) with s-graph grammars. On the related problem of graph parsing with hyperedge replacement grammars, our implementations outperform the best previous system by several orders of magnitude.


Introduction
Recent years have seen an increased interest in semantic parsing, the problem of deriving a semantic representation for natural-language expressions with data-driven methods. With the recent availability of graph-based meaning banks (Banarescu et al., 2013; Oepen et al., 2014), much work has focused on computing graph-based semantic representations from strings (Jones et al., 2012; Flanigan et al., 2014; Martins and Almeida, 2014).
One major approach to graph-based semantic parsing is to learn an explicit synchronous grammar which relates strings with graphs. One can then apply methods from statistical parsing to parse the string and read off the graph. Chiang et al. (2013) and Quernheim and Knight (2012) represent this mapping of a (latent) syntactic structure to a graph with a grammar formalism called hyperedge replacement grammar (HRG; Drewes et al., 1997). As an alternative to HRG, Koller (2015) introduced s-graph grammars and showed that they support linguistically reasonable grammars for graph-based semantics construction.
One problem that is only partially understood in the context of semantic parsing with explicit grammars is graph parsing, i.e. the computation of the possible analyses the grammar assigns to an input graph (as opposed to a string). This problem arises whenever one tries to generate a string from a graph (e.g., on the generation side of an MT system), but also in the context of extracting and training a synchronous grammar, e.g. in EM training. The state of the art is defined by the bottom-up graph parsing algorithm for HRG by Chiang et al. (2013), implemented in the Bolinas tool.
We present two graph parsing algorithms (top-down and bottom-up) for s-graph grammars. S-graph grammars are equivalent to HRGs, but employ a more fine-grained perspective on graph-combining operations. This simplifies the parsing algorithms, and facilitates reasoning about them. Our bottom-up algorithm is similar to Chiang et al.'s, and derives the same asymptotic number of rule instances. The top-down algorithm is novel, and achieves the same asymptotic runtime as the bottom-up algorithm by reasoning about the biconnected components of the graph. Our evaluation on the "Little Prince" graph-bank shows that our implementations of both algorithms outperform Bolinas by several orders of magnitude. Furthermore, the top-down algorithm can be more memory-efficient in practice.

Related work
The AMR-Bank (Banarescu et al., 2013) annotates sentences with abstract meaning representations (AMRs), like the one shown in Fig. 1(a). These are graphs that represent the predicate-argument structure of a sentence; notably, phenomena such as control are represented by reentrancies in the graph. Another major graph-bank is the dataset of the SemEval-2014 shared task on semantic dependency parsing (Oepen et al., 2014). The primary grammar formalism currently in use for synchronous graph grammars is hyperedge replacement grammar (HRG) (Drewes et al., 1997), which we sketch in Section 4.3. An alternative is offered by Koller (2015), who introduced s-graph grammars and showed that they lend themselves to manually written grammars for semantic construction. In this paper, we show the equivalence of HRG and s-graph grammars and work out graph parsing for s-graph grammars.
The first polynomial graph parsing algorithm for HRGs on graphs with limited connectivity was presented by Lautemann (1988). Lautemann's original algorithm is a top-down parser, which is presented at a rather abstract level that does not directly support implementation or detailed complexity analysis. We extend Lautemann's work by showing how new parse items can be represented and constructed efficiently. Finally, Chiang et al. (2013) presented a bottom-up graph parser for HRGs, in which the representation and construction of items was worked out for the first time. It produces O((n · 3^d)^(k+1)) instances of the rules in a parsing schema, where n is the number of nodes of the graph, d is the maximum degree of any node, and k is a quantity called the tree-width of the grammar.

An algebra of graphs
We start by introducing the exact type of graphs that our grammars and parsers manipulate, and by developing some theory.
Throughout this paper, we define a graph G = (V, E) as a directed graph with edge labels from some label alphabet L. The graph consists of a finite set V of nodes and a finite set E ⊆ V × V × L of edges e = (u, v, l), where u and v are the nodes connected by e, and l ∈ L is the edge label. We say that e is incident to both u and v, and call the number of edges incident to a node its degree. We write u ↔e v if either e = (u, v, l) or e = (v, u, l) for some l; we drop the e if the identity of the edge is irrelevant. Edges with u = v are called loops; we use them here to encode node labels. Given a graph G, we write n = |V|, m = |E|, and d for the maximum degree of any node in V.
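These definitions can be transcribed almost literally into code; the following is a minimal sketch (the tuple-based representation and function names are our own, not from the paper):

```python
def make_graph(nodes, edges):
    """A graph G = (V, E): V a set of nodes, E a set of (u, v, label)
    triples. Loops (u == v) encode node labels, as in the text."""
    assert all(u in nodes and v in nodes for (u, v, _) in edges)
    return frozenset(nodes), frozenset(edges)

def degree(graph, node):
    """Number of edges incident to `node` (a loop is one incident edge)."""
    _, edges = graph
    return sum(1 for (u, v, _) in edges if node in (u, v))

def max_degree(graph):
    """The quantity d from the text."""
    nodes, _ = graph
    return max((degree(graph, v) for v in nodes), default=0)
```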
If f : A ⇀ B and g : A ⇀ B are partial functions, we say that f ∪ g is defined if f(a) = g(a) for all a ∈ A on which both f and g are defined. In that case, we let (f ∪ g)(a) = f(a) where f(a) is defined, (f ∪ g)(a) = g(a) where g(a) is defined, and leave (f ∪ g)(a) undefined otherwise.
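The union of partial functions, which the merge operation below relies on, can be sketched with dicts (a hypothetical helper of our own; None stands for "undefined"):

```python
def union_partial(f, g):
    """Union f ∪ g of two partial functions given as dicts.
    Defined only if f and g agree wherever both are defined."""
    for a in f.keys() & g.keys():
        if f[a] != g[a]:
            return None  # f ∪ g is undefined
    return {**f, **g}
```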

The HR algebra of graphs with sources
Our grammars describe how to build graphs from smaller pieces. They do this by accessing nodes (called source nodes) which are assigned "public names". We define an s-graph (Courcelle and Engelfriet, 2012) as a pair SG = (G, φ) of a graph G and a source assignment, i.e. a partial, injective function φ : S ⇀ V that maps source names from a finite set S to the nodes of G. We call the nodes in φ(S) the source nodes or sources of SG; all other nodes are internal nodes. If φ is defined on the source name σ, we call φ(σ) the σ-source of SG. Throughout, we let s = |S|.
Examples of s-graphs are given in Fig. 1. We use numbers as node names and lowercase strings as edge names (except in the concrete graphs of Fig. 1, where the edges are marked with edge labels instead). Source nodes are drawn in black, with source names drawn on the inside. Fig. 1(b) shows an s-graph SG_want with three nodes and four edges. The three nodes are marked as the R-, S-, and O-source, respectively. Likewise, the s-graph SG_sleep in (c) has two nodes (one of which is an R-source and the other an S-source) and two edges.
We can now apply operations to these graphs. First, we can rename the R-source of (c) to an O-source. The result, denoted SG_d = SG_sleep[R → O], is shown in (d). Next, we can merge SG_d with SG_want. This copies the edges and nodes of SG_d and SG_want into a new s-graph; but crucially, for every source name σ the two s-graphs have in common, the σ-sources of the graphs are fused into a single node (and become a σ-source of the result). We write || for the merge operation; thus we obtain SG_e = SG_d || SG_want, shown in (e). Finally, we can forget source names. The graph SG_f = f_S(f_O(SG_e)), in which we forgot S and O, is shown in (f). We refer to Courcelle and Engelfriet (2012) for technical details.[1]

We can take the set of all s-graphs, together with these operations, as an algebra of s-graphs. In addition to the binary merge operation and the unary operations for forget and rename, we fix some finite set of atomic s-graphs and take them as constants of the algebra, which evaluate to themselves. Following Courcelle and Engelfriet, we call this algebra the HR algebra. We can evaluate any term τ consisting of these operation symbols into an s-graph ⟦τ⟧ as usual. For instance, the following term encodes the merge, forget, and rename operations from the example above, and evaluates to the s-graph in Fig. 1(f):

(1)  f_S(f_O(SG_sleep[R → O] || SG_want))

The set of s-graphs that can be represented as the value ⟦τ⟧ of some term τ over the HR algebra depends on the source set S and on the constants. For simplicity, we assume here that we have a constant for each s-graph consisting of a single labeled edge (or loop), and that the values of all other constants can be expressed by combining these using merge, rename, and forget.
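The three HR-algebra operations are straightforward to prototype. The sketch below represents an s-graph as a triple (nodes, edges, sources) and replays the running example; node names and edge labels are illustrative stand-ins of our own, since the figures are not reproduced here:

```python
def rename(sg, old, new):
    """sg[old -> new]: the old-source becomes the new-source."""
    nodes, edges, src = sg
    assert old in src and new not in src
    return nodes, edges, {(new if n == old else n): v for n, v in src.items()}

def forget(sg, name):
    """f_name(sg): drop the source name; its node becomes internal."""
    nodes, edges, src = sg
    return nodes, edges, {n: v for n, v in src.items() if n != name}

def merge(sg1, sg2):
    """sg1 || sg2: union of the two s-graphs, except that sources with a
    common name are fused into one node. Assumes disjoint node names;
    injectivity of the merged source assignment is not checked here."""
    n1, e1, s1 = sg1
    n2, e2, s2 = sg2
    fuse = {s2[name]: s1[name] for name in s1.keys() & s2.keys()}

    def ren(v):
        return fuse.get(v, v)

    nodes = n1 | {ren(v) for v in n2}
    edges = e1 | {(ren(u), ren(v), l) for (u, v, l) in e2}
    sources = {name: ren(v) for name, v in s2.items()}
    sources.update(s1)
    return nodes, edges, sources

# Replaying the example of Fig. 1 (labels invented for illustration):
sg_want = ({1, 2, 3},
           {(1, 2, 'ARG0'), (1, 3, 'ARG1'), (1, 1, 'want'), (2, 2, 'boy')},
           {'R': 1, 'S': 2, 'O': 3})
sg_sleep = ({4, 5}, {(4, 5, 'ARG0'), (4, 4, 'sleep')}, {'R': 4, 'S': 5})

sg_d = rename(sg_sleep, 'R', 'O')      # SG_sleep[R -> O]
sg_e = merge(sg_d, sg_want)            # fuses the common S- and O-sources
sg_f = forget(forget(sg_e, 'O'), 'S')  # f_S(f_O(SG_e))
```

After the merge, sg_e has three nodes (two pairs of sources were fused) and six edges; after forgetting S and O, only the R-source remains, as in Fig. 1(f).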

S-components
A central question in graph parsing is how an s-graph that is a subgraph of a larger s-graph SG (a sub-s-graph) can be represented as the merge of two smaller sub-s-graphs of SG. In general, SG_1 || SG_2 is defined for any two s-graphs SG_1 and SG_2. However, if we see SG_1 and SG_2 as subgraphs of SG, SG_1 || SG_2 may no longer be a subgraph of SG. For instance, we cannot merge the s-graphs (b) and (c) in Fig. 2 as part of the graph (a): the startpoints of the edges a and d are both A-sources and would thus become the same node (unlike in (a)), and furthermore the edge d would have to be duplicated. In graph parsing, we already know the identity of all nodes and edges in sub-s-graphs (as nodes and edges in SG), and must thus ensure that merge operations do not accidentally fuse or duplicate them. In particular, two sub-s-graphs cannot be merged if they have edges in common.

[1] Note that the rename operation of Courcelle and Engelfriet (2012) allows for swapping source assignments and making multiple renames in one step. We simplify the presentation here, but all of our techniques extend easily.

Figure 2: (a) An s-graph with (b,c) some sub-s-graphs, (d) its BCCs, and (e) its block-cutpoint graph.
We call a sub-s-graph SG_1 of SG extensible if there is another sub-s-graph SG_2 of SG such that SG_1 || SG_2 contains the same edges as SG. An example of a sub-s-graph that is not extensible is the one shown in Fig. 2(b), as a sub-s-graph of the s-graph in Fig. 2(a). Because sources can only be renamed or forgotten by the algebra operations, but never introduced, we can never attach the missing edge a: this would only be possible if its endpoints 1 and 2 were sources. As a general rule, a sub-s-graph can only be extensible if it contains all edges of SG that are incident to its internal nodes. Obviously, a graph parser only needs to concern itself with sub-s-graphs that are extensible.
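The necessary condition just stated — an extensible sub-s-graph must contain every edge of SG that is incident to one of its internal nodes — is easy to check directly. A sketch, with parameter conventions of our own:

```python
def satisfies_extensibility_condition(sub_edges, sub_nodes, source_nodes,
                                      graph_edges):
    """Necessary condition for extensibility: the sub-s-graph must contain
    every edge of the full graph that is incident to one of its internal
    (non-source) nodes."""
    internal = sub_nodes - source_nodes
    required = {e for e in graph_edges if e[0] in internal or e[1] in internal}
    return required <= sub_edges
```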
We can further clarify the structure of extensible sub-s-graphs by looking at the s-components of a graph. Let U ⊆ V be some set of nodes. This set splits the edges of G into equivalence classes that are separated by U: we say that two edges e and f are equivalent, e ∼U f, if we can reach f from an endpoint of e without visiting a node in U. We call the equivalence classes of E with respect to ∼U the s-components of G and denote the s-component that contains an edge e by [e]. It can be shown that for any s-graph SG = (G, φ'), a sub-s-graph SH with source nodes U is extensible iff its edge set is the union of a set of s-components of G with respect to U. We let an s-component representation C = (C, φ) in the s-graph SG = (G, φ') consist of a source assignment φ : S ⇀ V and a set C of s-components of G with respect to the set VS_C = φ(S) ⊆ V of source nodes of φ. We can then represent every extensible sub-s-graph SH = (H, φ) of SG by the s-component representation C = (C, φ), where C is the set of s-components of which SH consists. Conversely, we write T(C) for the unique extensible sub-s-graph of SG represented by the s-component representation C.
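Computing the s-components for a given separating set U amounts to a traversal over edges that never walks through a node of U; a minimal sketch:

```python
from collections import defaultdict

def s_components(edges, U):
    """Partition `edges` into equivalence classes with respect to the
    separating node set U: two edges are equivalent iff they are joined
    by a path that avoids the nodes in U."""
    adj = defaultdict(set)  # node -> incident edges, for nodes not in U
    for e in edges:
        for v in (e[0], e[1]):
            if v not in U:
                adj[v].add(e)
    comps, seen = [], set()
    for e in edges:
        if e in seen:
            continue
        comp, stack = set(), [e]
        while stack:
            f = stack.pop()
            if f in comp:
                continue
            comp.add(f)
            for v in (f[0], f[1]):
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(frozenset(comp))
    return comps
```

For a path 1–2–3–4 separated at node 2, this yields two s-components: one containing the edge between 1 and 2, and one containing the remaining two edges.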
The utility of s-component representations derives from the fact that merge can be evaluated on these representations alone, as follows.

Boundary representations
If there is no C such that all conditions of Lemma 1 are satisfied, then T(C_1) || T(C_2) is not defined. In order to check this efficiently in the bottom-up parser, it will be useful to represent s-components explicitly via their boundary.
Consider an s-component representation C = (C, φ) in SG, and let E be the set of all edges that are incident to a source node in VS_C and contained in an s-component in C. Then we let the boundary representation (BR) β of C in the s-graph SG be the pair β = (E, φ). That is, β represents the s-components through the in-boundary edges, i.e. those edges inside the s-components (and thus inside the sub-s-graph) which are incident to a source. The BR β specifies C uniquely if the base graph SG is connected, so we write T(β) for T(C) and VS_β for VS_C.
In Fig. 2(a), the bold sub-s-graph is represented by β = ({d, e, f, g}, {A:4, B:5}), indicating that it contains the A-source 4 and the B-source 5, and that its edge set is the union of the s-components containing the in-boundary edges d, e, f, and g. The edge h (which is also incident to 5) is not in-boundary, and therefore not in the sub-s-graph.
The following lemma can be shown about computing merge on boundary representations. Intuitively, the conditions (b) and (c) guarantee that the component sets are disjoint; the lemma then follows from Lemma 1.
Lemma 2. Let SG be an s-graph, and let β_1 = (E_1, φ_1) and β_2 = (E_2, φ_2) be two boundary representations in SG. Then T(β_1) || T(β_2) is defined within SG iff the following conditions hold: (a) φ_1 ∪ φ_2 is defined and injective; (b) the two BRs have no in-boundary edges in common, i.e. E_1 ∩ E_2 = ∅; (c) for every source node v of β_1, the last edge on the path in SG from v to the closest source node of β_2 is not an in-boundary edge of β_2, and vice versa.

Furthermore, if these conditions hold, we have T(β_1) || T(β_2) = T(β) with β = (E_1 ∪ E_2, φ_1 ∪ φ_2).
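Lemma 2 translates directly into a merge routine over boundary representations. The sketch below assumes a precomputed all-pairs table of shortest undirected path lengths and their last edges (such a table is discussed for the bottom-up parser in Section 5); the function and table names are ours:

```python
def merge_brs(br1, br2, dist, last_edge):
    """Evaluate T(br1) || T(br2) on boundary representations (E, phi).
    `dist[(u, v)]` and `last_edge[(u, v)]` come from a precomputed
    shortest-path table over the full graph (undirected). Returns the
    merged BR, or None if the merge is undefined."""
    E1, phi1 = br1
    E2, phi2 = br2
    # (a) phi1 ∪ phi2 must be defined and injective
    phi = dict(phi1)
    for name, v in phi2.items():
        if phi.get(name, v) != v:
            return None
        phi[name] = v
    if len(set(phi.values())) != len(phi):
        return None
    # (b) no common in-boundary edges
    if E1 & E2:
        return None
    # (c) the last edge on the path from each source of one BR to the
    # closest source of the other BR must not be in-boundary there
    for my_phi, other_phi, other_E in ((phi1, phi2, E2), (phi2, phi1, E1)):
        others = set(other_phi.values())
        for u in my_phi.values():
            if u in others or not others:  # shared sources satisfy (c)
                continue
            v = min(others, key=lambda w: dist[(u, w)])
            if last_edge[(u, v)] in other_E:
                return None
    return E1 | E2, phi
```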

S-graph grammars
We are now ready to define s-graph grammars, which describe languages of s-graphs. We also introduce graph parsing and relate s-graph grammars to HRGs.

Grammars for languages of s-graphs
We use interpreted regular tree grammars (IRTGs; Koller and Kuhlmann (2011)) to describe languages of s-graphs. IRTGs are a very general mechanism for describing languages over and relations between arbitrary algebras. They conceptually separate the generation of a grammatical derivation from its interpretation as a string, tree, graph, or some other object.
Consider, as an example, the tiny grammar in Fig. 3; see Koller (2015) for linguistically meaningful grammars. The left column consists of a regular tree grammar G (RTG; see e.g. Comon et al. (2008)) with two rules. This RTG describes a regular language L(G) of derivation trees (in general, it may be infinite). In the example, we can derive S ⇒ r1(VP) ⇒ r1(r2); therefore we have t = r1(r2) ∈ L(G).
We then use a tree homomorphism h to rewrite the derivation trees into terms over an algebra; in this case the HR algebra. In the example, the values h(r1) and h(r2) are specified in the second column of Fig. 3. We compute h(t) by substituting the variable x1 in h(r1) with h(r2). The term h(t) is thus the one shown in (1). It evaluates to the s-graph SG_f in Fig. 1(f).

Figure 3: An example s-graph grammar.
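The RTG side of this example is small enough to enumerate directly. A sketch of the derivation-tree language, with a rule encoding of our own (nested tuples as trees):

```python
from itertools import product

# RTG from Fig. 3: two rules, S -> r1(VP) and VP -> r2
rtg = {'S': [('r1', ['VP'])], 'VP': [('r2', [])]}

def derive(nt):
    """Enumerate the derivation-tree language below nonterminal `nt` as
    nested tuples (finite here; in general it may be infinite)."""
    for label, children in rtg[nt]:
        for kids in product(*(derive(c) for c in children)):
            yield (label,) + kids
```

Running derive('S') yields the single derivation tree t = r1(r2); the homomorphism h would then rewrite it into the term (1).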

In general, the IRTG G = (G, h, A) generates the language L(G) = {⟦h(t)⟧ | t ∈ L(G)}, where ⟦·⟧ is evaluation in the algebra A. Thus, in the example, we have L(G) = {SG_f}.
In this paper, we focus on IRTGs that describe languages L(G) ⊆ A of objects in an algebra; specifically, of s-graphs in the HR algebra. However, IRTGs extend naturally to a synchronous grammar formalism by adding more homomorphisms and algebras. For instance, the grammars in Koller (2015) map each derivation tree simultaneously to a string and an s-graph, and therefore describe a binary relation between strings and s-graphs. We call IRTGs in which at least one algebra is the HR algebra s-graph grammars.

Parsing with s-graph grammars
In this paper, we are concerned with the parsing problem of s-graph grammars. In the context of IRTGs, parsing means that we are looking for those derivation trees t that are (a) grammatically correct, i.e. t ∈ L(G), and (b) match some given input object a, i.e. h(t) evaluates to a in the algebra. Because the set P of such derivation trees may be large or infinite, we aim to compute an RTG G_a such that L(G_a) = P. This RTG plays the role of a parse chart, which represents the possible derivation trees compactly.
In order to compute G_a, we need to solve two problems. First, we need to determine all the possible ways in which a can be represented by terms τ over the algebra A. This is familiar from string parsing, where a CKY parse chart spells out all the ways in which larger substrings can be decomposed into smaller parts by concatenation. Second, we need to identify all those derivation trees t ∈ L(G) that map to such a decomposition τ, i.e. for which h(t) evaluates to a. In string parsing, this corresponds to retaining only such decompositions into substrings that are justified by the grammar rules.
While any parsing algorithm must address both of these issues, they are usually conflated, in that parse items combine information about the decomposition of a (such as a string span) with information about grammaticality (such as nonterminal symbols). In IRTG parsing, we take a different, more generic approach. We assume that the set D of all decompositions τ, i.e. of all terms τ that evaluate to a in the algebra, can be represented as the language D = L(D_a) of a decomposition grammar D_a. D_a is an RTG over the signature of the algebra. Crucially, D_a only depends on the algebra and a itself, and not on G or h, because D contains all terms that evaluate to a and not just those that are licensed by the grammar. However, we can compute G_a from D_a efficiently by exploiting the closure of regular tree languages under intersection and inverse homomorphism; see Koller and Kuhlmann (2011) for details.
In practice, this means that whenever we want to apply IRTGs to a new algebra (as, in this paper, to the HR algebra), we can obtain a parsing algorithm by specifying how to compute decomposition grammars over this algebra. This is the topic of Section 5.

Relationship to HRG
We close our exposition of s-graph grammars by relating them to HRGs. It is known that the graph languages that can be described with s-graph grammars are the same as the HRG languages (Courcelle and Engelfriet, 2012, Prop. 4.27). Here we establish a more precise equivalence result, so we can compare our asymptotic runtimes directly to those of HRG parsers.
An HRG rule, such as the one shown in Fig. 4, rewrites a nonterminal symbol into a graph. The example rule constructs a graph for the nonterminal S by combining the graph G_r in the middle (with nodes 1, 2, 3 and edges e, f) with graphs G_X and G_Y that are recursively derived from the nonterminals X and Y. The combination happens by merging the external nodes of G_X and G_Y with nodes of G_r: the squiggly lines indicate that the external node I of G_X should be 1, and the external node II should be 2. Similarly, the external nodes of G_Y are unified with 1 and 3. Finally, the external nodes I and II of the HRG rule for S itself, shaded gray, are 1 and 3.
The fundamental idea of the HRG-to-IRTG translation is to encode external nodes as sources, and to use rename and merge to unify the nodes of the different graphs. In the example, we might say that the external nodes of G_X and G_Y are represented using the source names I and II, and extend G_r to an s-graph by saying that the nodes 1, 2, and 3 are its I-source, III-source, and II-source, respectively. This results in the expression

(2)  f_III((I –e→ III) || x_1[II → III] || (I –f→ II) || x_2)

where we write "I –e→ III" for the s-graph consisting of the edge e, with node 1 as I-source and 2 as III-source.
However, this requires the use of three source names (I, II, and III). The following encoding of the rule uses the sources more economically:

(3)  f_II((I –e→ II) || x_1) || (I –f→ II) || x_2

This term uses only two source names. It forgets II as soon as we are finished with the node 2, and frees the name up for reuse for 3. The complete encoding of the HRG rule consists of the RTG rule S → r(X, Y) with h(r) = (3).
In the general case, one can "read off" possible term encodings of an HRG rule from its tree decompositions; see Chiang et al. (2013) or Def. 2.80 of Courcelle and Engelfriet (2012) for details. A tree decomposition is a tree, each of whose nodes π is labeled with a subset V_π of the nodes in the HRG rule. We can construct a term encoding from a tree decomposition bottom-up. Leaves map to variables or constants; binary nodes introduce merge operations; and we use rename and forget operations to ensure that the subterm for the node π evaluates to an s-graph in which exactly the nodes in V_π are source nodes.[2] In the example, we obtain (3) in this way from the tree decomposition in Fig. 4.
The tree-width k of an HRG rule is determined by finding a tree decomposition of the rule whose node sets have the smallest maximum size w, and setting k = w − 1. It is a crucial measure because Chiang et al.'s parsing algorithm is exponential in k. The translation we just sketched uses w source names. Thus an HRG whose rules have tree-width ≤ k can be encoded into an s-graph grammar with k + 1 source names. (The converse also holds.)

Graph parsing with s-graph grammars
Now we show how to compute decomposition grammars for the s-graph algebra. As we explained in Section 4.2, we can then obtain a complete parser for s-graph grammars through generic methods.
[2] This uses the swap operations mentioned in Footnote 1.

Given an s-graph SG, the language of the decomposition grammar D_SG is the set of all terms over the HR algebra that evaluate to SG. For example, the decomposition grammar for the graph SG in Fig. 1(a) contains, among many others, the following two rules:

(4)  SG → f_R(SG_f)
(5)  SG_e → ||(SG_b, SG_d)

where SG_f, SG_e, SG_b, and SG_d are the graphs from Fig. 1 (see Section 3.1). In other words, D_SG keeps track of sub-s-graphs in the nonterminals, and the rules spell out how "larger" sub-s-graphs can be constructed from "smaller" sub-s-graphs using the operations of the HR algebra. The algorithms below represent sub-s-graphs compactly using s-component and boundary representations.
Because the decomposition grammars in the s-graph algebra can be very large (see Section 6), we will not usually compute the entire decomposition grammar explicitly. Instead, it is sufficient to maintain a lazy representation of D_SG, which allows us to answer queries to the decomposition grammar efficiently. During parsing, such queries will be generated by the generic part of the parsing algorithm. Specifically, we will show how to answer the following types of query:

• Top-down: given an s-component representation C of some s-graph and an algebra operation o, enumerate all the rules C → o(C_1, ..., C_k) in D_SG. This asks how a larger sub-s-graph can be derived from other sub-s-graphs using the operation o. In the example above, a query for SG and f_R(·) should yield, among others, the rule in (4).
• Bottom-up: given boundary representations β_1, ..., β_k and an algebra operation o, enumerate all the rules β → o(β_1, ..., β_k) in D_SG. This asks how smaller sub-s-graphs can be combined into a bigger one using the operation o. In the example above, a merge query for SG_b and SG_d should yield the rule in (5). Unlike in the top-down case, every bottom-up query returns at most one rule.
The runtime of the complete parsing algorithm is bounded by the number I of different queries to D_SG that we receive, multiplied by the per-rule runtime T that we need to answer each query. The factor I is analogous to the number of rule instances in schema-based parsing (Shieber et al., 1995). The factor T is often ignored in the analysis of parsing algorithms, because in parsing schemata for strings, we typically have T = O(1). This need not be the case for graph parsers. In the HRG parsing schema of Chiang et al. (2013), we have I = O(n^(k+1) · 3^(d(k+1))), where k is the tree-width of the HRG. In addition, each of their rule instances takes time T = O(d(k + 1)) to actually calculate the new item.
Below, we show how we can efficiently answer both bottom-up and top-down queries to D_SG. Every s-graph grammar has an equivalent normal form in which every constant describes an s-graph with a single edge. Assuming that the grammar is in this normal form, queries of the form β → g (resp. C → g), where g is a constant of the HR algebra, are trivial, and we will not consider them further. Table 1 summarizes our results.

Bottom-up decomposition
Forget and rename. Given a boundary representation β' = (E', φ'), answering the bottom-up forget query β → f_A(β') amounts to verifying that all edges incident to φ'(A) are in-boundary in β', since otherwise the result would not be extensible. This takes time O(d). We then let β = (E, φ), where φ is like φ' but undefined on A, and E is the set of edges in E' that are still incident to a source of φ. Computing β thus takes time O(d + s).
The rename operation works similarly, but since the edge set remains unmodified, the per-rule runtime is O(s).
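Both queries are cheap because they only touch the boundary. A sketch of the forget case; the `incident_edges` table is an assumed precomputed index from nodes to the full graph's edges:

```python
def forget_br(br, name, incident_edges):
    """Bottom-up forget f_name on a boundary representation (E, phi).
    Defined only if every edge of the full graph that is incident to
    phi[name] is already in-boundary (otherwise the result would not be
    extensible). `incident_edges[v]` lists the full graph's edges at v."""
    E, phi = br
    v = phi[name]
    if not set(incident_edges[v]) <= E:
        return None
    phi2 = {n: u for n, u in phi.items() if n != name}
    keep = set(phi2.values())
    E2 = {e for e in E if e[0] in keep or e[1] in keep}
    return E2, phi2
```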
A BR is fully determined by specifying, for each source name, its node and its in-boundary edges, so there are at most O((n · 2^d)^s) different BRs. Since the result of a forget or rename rule is determined by the child β', this is also an upper bound on the number I of rule instances of forget or rename.
Merge. We can check whether T(β_1) || T(β_2) is defined by going through the conditions of Lemma 2. The only nontrivial condition is (c). In order to check it efficiently, we precompute a data structure which contains, for any two nodes u, v ∈ V, the length of the shortest undirected path u = v_1 ↔ ... ↔ v_k = v and the last edge e on this path. This can be done in time O(n^3) using the Floyd-Warshall algorithm. Checking (c) for every source pair then takes time O(s^2) per rule, but because sources that are common to both β_1 and β_2 automatically satisfy (c) due to (a), one can show that the total runtime of checking (c) for all merge rules of D_SG is O(n^s · 3^(ds) · s).
Observe finally that there are I = O(n^s · 3^(ds)) instances of the merge rule, because each of the O(ds) edges that are incident to a source node can be either in β_1, in β_2, or in neither. Therefore the runtime for checking (c) amortizes to O(s) per rule. The Floyd-Warshall step amortizes to O(1) per rule for s ≥ 3; for s ≤ 2, the node table can be computed in amortized O(1) using more specialized algorithms. This yields a total amortized per-rule runtime T for bottom-up merge of O(ds).
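The precomputation behind condition (c) is a standard Floyd-Warshall run that additionally records the last edge of one shortest path per node pair; a sketch:

```python
def shortest_path_last_edge(nodes, edges):
    """Floyd-Warshall on the undirected version of the graph, additionally
    recording the last edge of one shortest path per ordered node pair
    (the table used for condition (c) of Lemma 2)."""
    INF = float('inf')
    dist = {(u, v): (0 if u == v else INF) for u in nodes for v in nodes}
    last = {}
    for e in edges:
        u, v, _ = e
        for a, b in ((u, v), (v, u)):
            if dist[(a, b)] > 1:
                dist[(a, b)] = 1
                last[(a, b)] = e
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[(i, k)] + dist[(k, j)] < dist[(i, j)]:
                    dist[(i, j)] = dist[(i, k)] + dist[(k, j)]
                    last[(i, j)] = last[(k, j)]
    return dist, last
```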

Top-down decomposition
For the top-down queries, we specify sub-s-graphs in terms of their s-component representations. The number I of instances of each rule type is the same as in the bottom-up case because of the one-to-one correspondence between s-component and boundary representations. We focus on merge and forget queries; rename is as above.
Merge. Given an s-component representation C = (C, φ), a top-down merge query asks us to enumerate the rules C → ||(C_1, C_2) such that T(C_1) || T(C_2) = T(C). By Lemma 1, we can do this by using every distribution of the s-components in C over C_1 and C_2 and restricting φ accordingly. This brings the per-rule time of top-down merge to O(ds), a bound on the maximum number of s-components in C.
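Enumerating the top-down merge rules is then a matter of iterating over all two-way splits of the component set. A sketch; restricting φ to the sources touched by each child is our reading of "restricting φ accordingly", and splits with an empty child are skipped here:

```python
def restrict(phi, comps):
    """Keep the sources whose node occurs in one of the components."""
    touched = {v for comp in comps for (x, y, _) in comp for v in (x, y)}
    return {name: v for name, v in phi.items() if v in touched}

def topdown_merge_rules(comps, phi):
    """All rules C -> ||(C1, C2): distribute the s-components of
    C = (comps, phi) over the two children."""
    comps = list(comps)
    for mask in range(1, 2 ** len(comps) - 1):
        c1 = [c for i, c in enumerate(comps) if mask >> i & 1]
        c2 = [c for i, c in enumerate(comps) if not mask >> i & 1]
        yield (c1, restrict(phi, c1)), (c2, restrict(phi, c2))
```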
Block-cutpoint graphs. The challenging query to answer top-down is forget. We first describe the problem, and then introduce a data structure that supports efficient top-down forget queries.
Consider top-down forget queries on the sub-s-graph SG_1 drawn in bold in Fig. 2(a). An algorithm for top-down forget must be able to determine whether promoting an internal node to a source splits an s-component or not. To decide this, let G be the input graph. We create an undirected auxiliary graph G_U from G and a set U of (source) nodes. G_U contains all nodes in V \ U, and for each edge e that is incident to a node u ∈ U, it contains a node (u, e). Furthermore, G_U contains undirected versions of all edges in G; if an edge e ∈ E is incident to a node u ∈ U, it becomes incident to (u, e) in G_U instead. The auxiliary graph G_{4,5} for our example graph is shown in Fig. 2(d).
Two edges are connected in G_U if and only if they are equivalent with respect to U in G. Therefore, promoting u splits an s-component iff u is a cutpoint in G_U, i.e. a node whose removal disconnects the graph. Cutpoints can be characterized as those nodes that belong to multiple biconnected components (BCCs) of G_U, i.e. the maximal subgraphs in which any single node can be removed without disconnecting the subgraph. In Fig. 2(d), the BCCs are indicated by the dotted boxes. Observe that 3 is a cutpoint and 1 is not.
For any given U , we can represent the structure of the BCCs of G U in its block-cutpoint graph. This is a bipartite graph whose nodes are the cutpoints and BCCs of G U , and a BCC is connected to all of its cutpoints; see Fig. 2(e) for the blockcutpoint graph of the example. Block-cutpoint graphs are always forests, with the individual trees representing the s-components of G. Promoting a cutpoint u splits the s-component into smaller parts, each corresponding to an incident edge of u. We annotate each edge with that part.

Forget.
We can now answer a top-down forget query C → f_A(C') efficiently from the block-cutpoint graph for the sources of C = (C, φ). We iterate over all components c ∈ C, and then over all internal nodes u of c, and let φ' be φ extended by making u the A-source. If u is not a cutpoint, we simply let C' = (C, φ'). Otherwise, we let C' = (C', φ'), where C' is obtained from C by removing c and adding the new s-components annotated on the edges adjacent to u in the block-cutpoint graph. The query returns rules for all C' that can be constructed in this way.
The per-rule runtime of top-down forget is O(ds), the time needed to compute C' in the cutpoint case. We furthermore precompute the block-cutpoint graphs of the input graph with respect to all sets U ⊆ V of nodes with |U| ≤ s − 1. For each U, we can compute the block-cutpoint graph and annotate its edges in time O(nd · 2^s). Thus the total time for the precomputation is O(n^s · d · 2^s), which amortizes to O(1) per rule.
Top-down versus bottom-up. Fig. 5 compares the performance of the top-down and the bottom-up algorithm, on a grammar with three source names, sampled from all 1261 graphs with up to 10 nodes. Each point in the figure is the geometric mean of runtimes for all graphs with a given number of nodes; note the log scale. We aborted the top-down parser after its runtimes grew too large.
We observe that the bottom-up algorithm outperforms the top-down algorithm, and yields practical runtimes even for nontrivial graphs. One possible explanation for the difference is that the top-down algorithm spends more time analyzing ungrammatical s-graphs, particularly subgraphs that are not connected.
Comparison to Bolinas. We also compare our implementations to Bolinas. Because Bolinas is much slower than Alto, we restricted ourselves to two source names (= tree-width 1) and sampled the grammar from 30 randomly chosen AMRs each of size 2 to 8, plus the 21 AMRs of size one. Fig. 6 shows the runtimes. Our parsers are generally much faster than in Fig. 5, due to the decreased number of sources and the smaller grammar. They are also both much faster than Bolinas. Measuring the total time for parsing all 231 AMRs, our bottom-up algorithm outperforms Bolinas by a factor of 6722. The top-down algorithm is slower, but still outperforms Bolinas by a factor of 340.
Further analysis. In practice, memory use can be a serious issue. For example, the decomposition grammar for s = 3 for AMR #194 in the corpus has over 300 million rules. However, many uses of decomposition grammars, such as sampling for grammar induction, can be phrased purely in terms of top-down queries. The top-down algorithm can answer these without computing the entire grammar, alleviating the memory problem.
Finally, we analyzed the asymptotic runtimes in Table 1 in terms of the maximum number d · s of in-boundary edges. However, the top-down parser does not manipulate individual edges, but entire s-components. The maximum number D_s of s-components into which a set of s sources can split a graph is called the s-separability of G by Lautemann (1990). We can analyze the runtime of the top-down parser more carefully as O(n^s · 3^(D_s) · ds); as the dotted line in Fig. 5 shows, this predicts the runtime well. Interestingly, D_s is much lower in practice than its theoretical maximum. In the "Little Prince" AMR-Bank, the mean of D_3 is 6.0, whereas the mean of 3 · d is 12.7. Thus exploiting the s-component structure of the graph can improve parsing times.

Conclusion
We presented two new graph parsing algorithms for s-graph grammars. These were framed in terms of top-down and bottom-up queries to a decomposition grammar for the HR algebra. Our implementations outperform Bolinas, the previously best system, by several orders of magnitude.
We have made them available as part of the Alto parser.
A challenge for grammar-based semantic parsing is grammar induction from data. We will explore this problem in future work. Furthermore, we will investigate methods for speeding up graph parsing further, e.g. with different heuristics.