Generic Axiomatization of Families of Noncrossing Graphs in Dependency Parsing

We present a simple encoding for unlabeled noncrossing graphs and show how its latent counterpart helps us to represent several families of directed and undirected graphs used in syntactic and semantic parsing of natural language as context-free languages. The families are separated purely on the basis of forbidden patterns in the latent encoding, eliminating the need to differentiate the families of noncrossing graphs in inference algorithms: one algorithm works for all when the search space can be controlled in the parser input.


Introduction
Dependency parsing has received wide attention in recent years, as accurate and efficient dependency parsers applicable to many languages have appeared. Traditionally, dependency parsers have produced syntactic analyses in tree form, using exact inference algorithms that search for maximum projective trees (Eisner and Satta, 1999) or maximum spanning trees (McDonald et al., 2005) in weighted digraphs, as well as greedy and beam-search approaches that forgo exact search for extra efficiency (Zhang and Nivre, 2011).
Recently, there has been growing interest in providing a richer analysis of natural language by going beyond trees. In semantic dependency parsing (Oepen et al., 2015; Kuhlmann and Oepen, 2016), the desired representations can have indegree greater than 1 (re-entrancy), suggesting the search for maximum acyclic subgraphs (Schluter, 2014, 2015). As this inference task is intractable (Guruswami et al., 2011), noncrossing digraphs have been studied instead, e.g., by Kuhlmann and Johnsson (2015), who provide an O(n^3) parser for maximum noncrossing acyclic subgraphs. Yli-Jyrä (2005) studied how to axiomatize dependency trees as a special case of noncrossing digraphs. This gave rise to a new homomorphic representation of context-free languages that proves the classical Chomsky and Schützenberger theorem using a quite different internal language. In this language, the brackets indicate arcs in a dependency tree in a way that is reminiscent of encoding schemes used earlier by Greibach (1973) and Oflazer (2003). Cubic-time parsing algorithms that are incidentally or intentionally applicable to this kind of homomorphic representation have been considered, e.g., by Nederhof and Satta (2003), Hulden (2011), and Yli-Jyrä (2012).
Extending these insights to arbitrary noncrossing digraphs, or to relevant families of them, is far from obvious. In this paper, we (1) develop a linear encoding supporting general noncrossing digraphs, (2) show that the encoded noncrossing digraphs form a context-free language, and (3) give this language two homomorphic, nonderivative representations, using the latent local features of the latter to characterize various families of digraphs.
Apart from the obvious relevance to the theory of context-free languages, this contribution has the practical potential to enable (4) generic context-free parsers that produce different families of noncrossing graphs with the same set of inference rules, while the search space in each case is restricted with lexical features and the grammar.
Outline After some background on graphs and parsing as inference (Section 2), we use an ontology of digraphs to illustrate natural families of noncrossing digraphs in Section 3. We then develop, in Section 4, the first latent context-free representation for the set of noncrossing digraphs, which is extended in Section 5 with additional latent states that support our finite-state axiomatization of digraph properties and allow us to control the search space using the lexicon. The experiments in Section 6 cross-validate our axioms and sample the growth of the constrained search spaces. Section 7 outlines the applications for practical parsing, and Section 8 concludes.

Background
Graphs and Digraphs A graph is a pair (V, E) where V is a finite set of vertices and E ⊆ {{u, v} : u, v ∈ V} is a set of edges. A sequence of edges of the form {v_0, v_1}, {v_1, v_2}, ..., {v_{m−1}, v_m}, with no repetitions in v_1, ..., v_m, is a path between vertices v_0 and v_m, and is empty if m = 0. A graph is a forest if no vertex has a non-empty path to itself, and connected if every pair of vertices is joined by a path. A tree is a connected forest.
We will focus on loop-free digraphs unless otherwise specified, and denote them simply by DIGRAPH. A digraph is d-acyclic (ACYC_D), aka a dag, if no vertex has a non-empty directed path to itself; u-acyclic (ACYC_U), aka an m(ixed)-forest, if its underlying graph is a forest; and weakly connected (w.c., CONN_W) if its underlying graph is connected.
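The three nonlocal properties just defined can be made concrete with brute-force reference checkers. The following is a minimal Python sketch, not the paper's implementation: it assumes vertices numbered 1..n and a digraph given as a set of arcs (i, j), and all function names are illustrative. Following the set-based definition of edges above, a pair of inverse arcs i → j, j → i contributes a single edge {i, j} to the underlying graph, so bidirectional edges are allowed in an m-forest.

```python
from collections import defaultdict

def underlying_edges(arcs):
    """Edges of the underlying graph: each arc i -> j becomes {i, j}.
    Inverse arcs collapse into one edge because E is a set."""
    return {frozenset(a) for a in arcs if a[0] != a[1]}

def is_d_acyclic(n, arcs):
    """ACYC_D: no non-empty directed path from a vertex to itself (Kahn's algorithm)."""
    indeg = {v: 0 for v in range(1, n + 1)}
    succ = defaultdict(list)
    for i, j in arcs:
        succ[i].append(j)
        indeg[j] += 1
    queue = [v for v in indeg if indeg[v] == 0]
    removed = 0
    while queue:
        v = queue.pop()
        removed += 1
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return removed == n  # every vertex removed <=> no directed cycle

def _merges(n, edges):
    """Union-find: count the edges that join two different components."""
    parent = list(range(n + 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    merged = 0
    for e in edges:
        u, v = tuple(e)
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            merged += 1
    return merged

def is_u_acyclic(n, arcs):
    """ACYC_U: the underlying graph is a forest (every edge merges two components)."""
    edges = underlying_edges(arcs)
    return _merges(n, edges) == len(edges)

def is_weakly_connected(n, arcs):
    """CONN_W: the underlying graph is connected (exactly n - 1 merges)."""
    return _merges(n, underlying_edges(arcs)) == n - 1
```

For example, {(1, 2), (2, 1), (2, 3)} is u-acyclic and weakly connected but not d-acyclic.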
Dependency Parsing The complete digraph G_S = (V, A) of a sentence S = x_1 ... x_n consists of the vertices V = {1, ..., n} and all possible arcs A = (V × V) − {(i, i) : i ∈ V}. The vertex i ∈ V corresponds to the word x_i, and the arc i → j ∈ A corresponds to a possible dependency between the words x_i and x_j.
The task of dependency parsing is to find a constrained subgraph G′_S = (V, A′) of the complete digraph G_S of the sentence. The standard target structure is either a rooted directed tree, called a dependency tree, or a dag, called a dependency graph.
Constrained Inference In arc-factored parsing (McDonald et al., 2005), each possible arc i → j is equipped with a positive weight w_ij, usually computed as a weighted sum w_ij = w · Φ(S, i → j), where w is a weight vector and Φ(S, i → j) is a feature vector extracted from the sentence S, considering the dependency relation from word x_i to word x_j. Parsing then consists of finding an arc subset A′ ⊆ A that gives us a constrained subgraph (V, A′) ∈ Constrained(V, A) of the complete digraph (V, A) with maximum sum of arc weights:

$$(V, A') = \arg\max_{(V, A^*) \in \mathrm{Constrained}(V, A)} \sum_{i \to j \in A^*} w_{ij}$$

The complexity of this inference task depends on the constraints imposed on the subgraph. Under no constraints, we simply set A′ = A, since all weights are positive. Inference over dags is intractable (Guruswami et al., 2011). Efficient solutions are known for projective trees (Eisner, 1996), various classes of mildly nonprojective trees (Gómez-Rodríguez, 2016), unrestricted spanning trees (McDonald et al., 2005), and both unrestricted and weakly connected noncrossing dags (Kuhlmann and Johnsson, 2015).
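To make the objective concrete, here is a brute-force sketch of constrained arc-factored inference for tiny sentences. It is exponential in the number of arcs and only illustrates the arg max; it is not an efficient parser. The weight dictionary and the helper names are hypothetical, and the OUT property (no vertex with two incoming arcs) is used as an example constraint.

```python
from itertools import combinations

def all_arcs(n):
    """Arcs of the complete digraph G_S: A = V x V minus the loops (i, i)."""
    return [(i, j) for i in range(1, n + 1) for j in range(1, n + 1) if i != j]

def has_out_property(n, arcs):
    """Example constraint (OUT): no vertex receives two incoming arcs."""
    heads = [j for _, j in arcs]
    return len(heads) == len(set(heads))

def best_subgraph(n, weight, constrained):
    """Return (A', score) maximizing sum(weight[a] for a in A') over subsets A'
    of A for which constrained(n, A') holds. Exhaustive search: tiny n only."""
    arcs = all_arcs(n)
    best, best_score = None, float("-inf")
    for k in range(len(arcs) + 1):
        for subset in combinations(arcs, k):
            chosen = set(subset)
            if not constrained(n, chosen):
                continue
            score = sum(weight[a] for a in chosen)
            if score > best_score:
                best, best_score = chosen, score
    return best, best_score

if __name__ == "__main__":
    # Hypothetical positive arc weights for a 3-word sentence.
    W = {(1, 2): 3, (1, 3): 1, (2, 1): 2, (2, 3): 4, (3, 1): 5, (3, 2): 1}
    print(best_subgraph(3, W, lambda n, a: True)[1])  # unconstrained: all arcs, 16
    print(best_subgraph(3, W, has_out_property)[1])   # 12
```

With positive weights the unconstrained optimum is indeed A′ = A. Under OUT alone, the optimum here is the directed cycle 3 → 1 → 2 → 3, which shows that OUT by itself does not enforce acyclicity.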
Parsimony Semantic parsers must be able to produce more than projective trees, because projective trees make up only a small share (under 3%) of semantic graph banks (Kuhlmann and Johnsson, 2015). However, if we know that the parses obey some restrictions, it is better to use them to restrict the search space as much as possible.
There are two strategies for reducing the search space. One is to develop a specialized inference algorithm for a particular natural language or family of dags, such as weakly connected graphs (Kuhlmann and Johnsson, 2015). The other strategy is to control the local complexity of digraphs through lexical categories (Baldridge and Kruijff, 2003) or equivalent mechanisms. This strategy produces a more sensitive model of the language, but requires a principled insight into how the complexity of digraphs can be characterized.

Constraints on the Search Space
We will now present a classification of digraphs on the basis of their formal properties.
The Noncrossing Property For convenience, graphs and digraphs may be ordered like in a complete digraph of a sentence. Two edges {i, j}, {k, l} in an ordered graph or arcs i → j, k → l in an ordered digraph are said to be crossing if min{i, j} < min{k, l} < max{i, j} < max{k, l}. A graph or digraph is noncrossing if it has no crossing edges or arcs. Noncrossing (di)graphs (NC-(DI)GRAPH) are thus exactly the (di)graphs that can be drawn with their vertices placed in order on a circle and their edges or arcs drawn as chords, without any two chords crossing. In the following, we assume that all digraphs and graphs are noncrossing.
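The crossing condition translates directly into code. A minimal sketch, assuming arcs are given as (i, j) pairs of vertex positions; the function names are illustrative. Since the names of the two edges in the definition are arbitrary, the check is applied in both orders, and arcs are compared by their endpoints only, so inverse arcs never cross each other.

```python
from itertools import combinations

def crossing(e, f):
    """True iff min e < min f < max e < max f, or the same with e and f swapped."""
    (a, b), (c, d) = sorted(e), sorted(f)
    return a < c < b < d or c < a < d < b

def is_noncrossing(arcs):
    """An ordered (di)graph is noncrossing if no two of its edges or arcs cross.
    Arcs sharing an endpoint, nested arcs and inverse pairs never cross."""
    return not any(crossing(e, f) for e, f in combinations(arcs, 2))
```

For instance, 1 → 3 and 4 → 2 cross, whereas the nested pair {1, 4}, {2, 3} and the adjacent pair {1, 2}, {2, 3} do not.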
Ontology Fig. 1 presents an ontology of the families of loop-free noncrossing digraphs that can be distinguished using digraphs with 5 vertices.
In the digraph ontology, a multitree, aka mangrove, is a dag that is strongly unambiguous (UNAMB_S), i.e., given two distinct vertices, there is at most one repeat-free path between them (Lange, 1997).¹ A polytree (Rebane and Pearl, 1987) is a multitree whose underlying graph is a tree. The out property (OUT) of a digraph (V, E) means that no vertex i ∈ V has two incoming arcs {j, k} → i such that j ≠ k.

Figure 1: Basic properties split the set of 62464 noncrossing digraphs for 5 vertices into 23 classes.

An ordered digraph is weakly projective (PROJ_W) if for all vertices i, j and k, if k → j → i, then either {i, j} < k or {i, j} > k. In other words, this constraint, aka the outside-to-inside constraint (Yli-Jyrä, 2005), states that no outgoing arc of a vertex properly covers an incoming arc. It is implied by a stronger constraint known as Harper, Hays, Lecerf and Ihm projectivity (Marcus, 1967).
We can embed the ontology of graphs (unrestricted, connected, forests and trees) into the ontology of digraphs by viewing an undirected graph as a digraph in which every arc has an inverse (INV), whereas a digraph with no pair of inverse arcs is oriented (this defines the property ORIENTED). Out forests and trees are, by convention, oriented digraphs with an underlying forest or tree, respectively.

¹ A different definition forbids diamonds as minors.
Distinctive Properties A few important properties of digraphs are local and can be verified by inspecting each vertex separately with its incident arcs. These include (i) the out property (OUT), (ii) the nonstandard projectivity property (PROJ_W), (iii) the inverse property (INV) and (iv) the orientedness property (ORIENTED).
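Locality means that each of these four properties can be decided by a predicate that looks only at one vertex and its incident arcs, and a digraph has the property iff the predicate holds at every vertex. The sketch below illustrates this under two stated assumptions: INV is read as "every arc has its inverse" and ORIENTED as "no arc has its inverse"; for PROJ_W, "properly covers" is read as strict containment, which agrees with the formula {i, j} < k or {i, j} > k whenever i, j and k are distinct (an inverse pair is treated as allowed). Function names are illustrative.

```python
def incoming(arcs, v):
    return {i for i, j in arcs if j == v}

def outgoing(arcs, v):
    return {j for i, j in arcs if i == v}

def out_ok_at(arcs, v):
    """OUT at v: at most one incoming arc."""
    return len(incoming(arcs, v)) <= 1

def proj_w_ok_at(arcs, j):
    """PROJ_W at j: no outgoing arc j -> i strictly covers an incoming arc k -> j,
    i.e. k never lies strictly between j and i (assumed reading, see above)."""
    return not any(min(i, j) < k < max(i, j)
                   for k in incoming(arcs, j) for i in outgoing(arcs, j))

def inv_ok_at(arcs, v):
    """INV at v (assumed reading): every arc leaving v has its inverse."""
    return all((j, v) in arcs for j in outgoing(arcs, v))

def oriented_ok_at(arcs, v):
    """ORIENTED at v (assumed reading): no arc leaving v has its inverse."""
    return not any((j, v) in arcs for j in outgoing(arcs, v))

def holds_locally(check, n, arcs):
    """A local property holds iff its vertex check succeeds at every vertex."""
    return all(check(arcs, v) for v in range(1, n + 1))
```

For example, the arcs 2 → 1 and 1 → 3 violate PROJ_W at vertex 1, because the outgoing arc 1 → 3 properly covers the incoming arc 2 → 1.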
Properties UNAMB S , ACYC D , CONN W , and ACYC U are nonlocal properties of digraphs and cannot be generally verified locally, through finite spheres of vertices (Grädel et al., 2005). The following proposition covers the configurations that we have to detect in order to decide the nonlocal properties of noncrossing digraphs.
• If G ∉ UNAMB_S, then the digraph contains one of the following four configurations or their reversals: [configuration diagrams not reproduced]

Proposition 1 gives us a means to implement the property tests in practice. It tells us, intuitively, that although the paths can be arbitrarily long, any underlying cycle containing more than 2 arcs consists of one covering arc and a linear chain of edges between its end points.
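As a nonlocal reference point for UNAMB_S and multitrees, the following brute-force sketch counts repeat-free directed paths by depth-first search and stops as soon as a second path is found. It is exponential in the worst case and only meant for small test digraphs; it does not use Proposition 1. The function names are illustrative. The test data reuses the arcs mentioned in the Example of Section 5 (1 → 2, 2 → 7, 7 → 1 and 2 → 1); the rest of that digraph is not reproduced.

```python
from itertools import permutations

def count_simple_paths(arcs, u, v, limit=2):
    """Count repeat-free directed paths from u to v, stopping once `limit` are found."""
    succ = {}
    for i, j in arcs:
        succ.setdefault(i, []).append(j)
    count = 0

    def dfs(x, seen):
        nonlocal count
        if count >= limit:
            return
        if x == v:
            count += 1
            return
        for y in succ.get(x, []):
            if y not in seen:
                dfs(y, seen | {y})

    dfs(u, {u})
    return count

def is_strongly_unambiguous(n, arcs):
    """UNAMB_S: at most one repeat-free directed path between two distinct vertices."""
    return all(count_simple_paths(arcs, u, v) <= 1
               for u, v in permutations(range(1, n + 1), 2))

def is_multitree(n, arcs):
    """Multitree (mangrove): a strongly unambiguous dag. An arc i -> j lies on a
    directed cycle iff j reaches i."""
    has_cycle = any(count_simple_paths(arcs, j, i, limit=1) > 0 for i, j in arcs)
    return not has_cycle and is_strongly_unambiguous(n, arcs)
```

A diamond 1 → 2 → 4, 1 → 3 → 4 is a dag but not a multitree, and a pair of inverse arcs is strongly unambiguous but not a dag.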

The Set of Digraphs as a Language
In this section, we show that the set of noncrossing digraphs is isomorphic to an unambiguous context-free language over a bracket alphabet.

Basic Encoding
Any noncrossing ordered graph ([1, ..., n], E), even with self-loops, can be encoded as a string of brackets using the algorithm enc in Fig. 2. In this way, we can simply encode the digraph …

Context-Freeness Arbitrary strings with balanced brackets form a context-free language that is known, generically, as a Dyck language. It is easy to see that the graphs NC-GRAPH are encoded with strings that belong to the Dyck language D_2 generated by the context-free grammar S → [S]S | {S}S | ε. The encoded graphs, L_NC-GRAPH, are, however, generated exactly by the context-free grammar …

Proposition 3. The encoded graphs, L_NC-GRAPH, form an unambiguous context-free language.
The practical significance of Proposition 3 is that there is a bijection between L NC-GRAPH and the derivation trees of a context-free grammar.
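Since the algorithm enc of Fig. 2 is not reproduced here, the following is only a hypothetical encoding in the same spirit, consistent with the D_2 grammar above and with the vertex constraint G_n = B*({}B*)^{n−1} used later: "{}" separates consecutive vertices, and each edge {i, j} with i < j contributes "[" at vertex i and "]" at vertex j. The order of brackets within a vertex (closing brackets first, then opening ones) is our assumption, and self-loops are omitted for simplicity. The sketch shows why the noncrossing property matters: decoding by bracket matching recovers exactly the noncrossing graphs, while a crossing graph is silently turned into a different one.

```python
def encode(n, edges):
    """Hypothetical bracket encoding of a loop-free graph on vertices 1..n."""
    norm = {tuple(sorted(e)) for e in edges}
    segments = []
    for v in range(1, n + 1):
        closes = sum(1 for i, j in norm if j == v)  # edges arriving from the left
        opens = sum(1 for i, j in norm if i == v)   # edges leaving to the right
        segments.append("]" * closes + "[" * opens)
    return "{}".join(segments)

def decode(s):
    """Match brackets with a stack; returns (n, edges). Inverse of encode on
    noncrossing graphs only."""
    stack, edges, v, k = [], set(), 1, 0
    while k < len(s):
        if s.startswith("{}", k):  # vertex boundary
            v, k = v + 1, k + 2
            continue
        if s[k] == "[":
            stack.append(v)
        else:  # "]" closes the most recently opened edge
            edges.add((stack.pop(), v))
        k += 1
    return v, edges

def in_dyck2(s):
    """Membership in D_2, generated by S -> [S]S | {S}S | eps."""
    pairs, stack = {"]": "[", "}": "{"}, []
    for c in s:
        if c in "[{":
            stack.append(c)
        elif not stack or stack.pop() != pairs[c]:
            return False
    return not stack
```

For the triangle on three vertices, encode yields "[[{}][{}]]", a D_2 string with two vertex boundaries that decodes back to the same graph; the crossing graph {1, 3}, {2, 4} yields "[{}[{}]{}]", which decodes to {2, 3}, {1, 4} instead.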

Bracketing Beyond the Encoding
Non-Derivational Representation A nonderivational representation for any context-free language L has been given by Chomsky and Schützenberger (1963). It replaces the stack with a Dyck language D and the grammar rules with co-occurrence patterns specified by a regular language Reg. To hide the internal alphabet from the strings of the represented language, there is a homomorphism h that cleans the internal strings of Reg and D from internal markup to get actual strings of the target language:

$$L = h(D \cap \mathrm{Reg})$$

To make this concrete, replace the previous context-free grammar by S → …

The representation L = h(D ∩ Reg) is unambiguous if, for every word w ∈ L, the preimage h^{−1}(w) ∩ D ∩ Reg is a single string. This implies that L is an unambiguous context-free language.
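The shape L = h(D ∩ Reg) can be illustrated with a toy language that is unrelated to digraphs: {a^k b^k} is obtained from the one-bracket Dyck language D_1 over "(" and ")", the regular language Reg = (* )* (all opening brackets before all closing ones), and the homomorphism h: ( ↦ a, ) ↦ b. This sketch enumerates strings up to a length bound and records the preimages of each image, so unambiguity can be checked by inspection. Here h is a letter-to-letter bijection, so unambiguity is immediate; in the representations of this paper, h erases internal markup and unambiguity is a genuine property.

```python
import re
from itertools import product

def in_dyck1(s):
    """Membership in D_1 over the brackets ( and )."""
    depth = 0
    for c in s:
        depth += 1 if c == "(" else -1
        if depth < 0:
            return False
    return depth == 0

REG = re.compile(r"\(*\)*")    # toy Reg: every "(" precedes every ")"
H = str.maketrans("()", "ab")  # toy homomorphism h: ( -> a, ) -> b

def represented(max_len):
    """Map each word of h(D_1 ∩ Reg) up to max_len to its list of preimages."""
    preimages = {}
    for k in range(max_len + 1):
        for t in product("()", repeat=k):
            s = "".join(t)
            if in_dyck1(s) and REG.fullmatch(s):
                preimages.setdefault(s.translate(H), []).append(s)
    return preimages
```

Up to length 6 this yields "", "ab", "aabb" and "aaabbb", each with exactly one preimage: "()()" lies in D_1 but not in Reg, so it is filtered out.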
Proposition 4. The set of encoded digraphs, L NC-DIGRAPH , has an unambiguous representation.
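Proposition 5. … Let h(D ∩ R_0) be an unambiguous representation, and let L_1 = h(D ∩ R_1) and L_2 = h(D ∩ R_2), where R_1, R_2 ⊆ R_0. Then L_3 = h(D ∩ R_1 ∩ R_2) is an unambiguous context-free language and the same as L_1 ∩ L_2.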
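Proof. It is immediate that L_3 ⊆ L_1 ∩ L_2 and that L_3 is an unambiguous context-free language. To show that L_1 ∩ L_2 ⊆ L_3, take an arbitrary s ∈ L_1 ∩ L_2. Then s has a preimage in D ∩ R_1 and a preimage in D ∩ R_2. Since R_1, R_2 ⊆ R_0 and the representation with R_0 is unambiguous, these two preimages coincide, so there is a unique s′ ∈ h^{−1}(s) such that s′ ∈ D ∩ (R_1 ∩ R_2). Thus s ∈ L_3.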

Latent Bracketing
In this section, we extend the internal strings of the non-derivational representation of L NC-DIGRAPH in such a way that the configurations given in Proposition 1 can be detected locally from these.

Classification of Underlying Chains
A maximal linear chain is a maximally long sequence of one or more edges that correspond to a left-to-right path in the underlying graph, such that no edge in the chain is properly covered by an edge that does not properly cover all the edges in the chain. For example, the graph [graph not reproduced] contains six maximal linear chains, indicated with Roman numerals on each arc. We decide nonlocal properties of noncrossing digraphs by recognizing maximal linear chains as parts of the configurations presented in Proposition 1. Every loose chain (like V and VI) starts with a bracket that is adjacent to a }-bracket. Such a chain can contribute only a covering edge to an underlying cycle. In contrast, a bracket with an apostrophe marks the beginning of a non-loose chain, which can either start at the first vertex or share its starting point with a covering chain. When a non-loose chain is covered, it can be touched twice by a covering edge. The prefixes of chains are classified incrementally, from left to right, with a finite automaton (Figure 4). All states of the automaton are final and correspond to distinct classes of chains. These classes are encoded into an extended set of brackets.

Figure 4: The finite automaton whose state 0 begins non-loose chains and state 1 loose chains

The automaton is symmetric: states with uppercase names are symmetrically related to the corresponding lowercase states. Thus, it suffices to define the initial and uppercase-named states:
• 0: the initial state for a non-loose chain;
• I: a bidirectional chain: u ↔ (v ↔) y;
• A: a primarily bidirectional forward chain: u ↔ v → y;
• F: a forward chain: u → v → y;
• Q: a primarily forward chain: u → v ↔ (· · · →) y;
• C: a primarily forward 1-turn chain: u → v ← y;
• E: a primarily forward 2-turn chain: u → v ← x → y;
• Z: a 3-turn chain;
• 1: the initial (and only) state for a loose chain.

Recognition of ambiguous paths in configurations
To support the recognition, subtypes of edges are defined according to the chains they cover. The brackets >I', \I', >I, \I, \A, >a, \Q, >Q, >q,\q, >C, \c, \E, >e indicate edges that constitute a cycle with the chain they cover. The brackets >V', \v', >V, \v indicate edges that cover 2-turn chains. Not all states make these distinctions.

Extended Representation
The extended brackets encode the latent structure of digraphs: the orientation and subtype of each edge and the class of its chain. The total alphabet Σ now contains the boundary brackets {} and 54 pairs of edge brackets (Figure 4), from which we obtain a new Dyck language, D_55, and an extended homomorphism h_lat. The Reg component of the language representation is replaced with Reg_lat, the intersection of (1) the inverse homomorphic image of Reg over the extended alphabet, (2) a local language that constrains adjacent edges according to Figure 4, (3) a local language specifying how the chains start, and (4) a local language that distinguishes pure oriented edges from those that cover a cycle or a 2-turn chain. The new component requires only 24 states as a deterministic automaton.
Proposition 6. h_lat(D_55 ∩ Reg_lat) is an unambiguous representation for L_NC-DIGRAPH.
The internal language L^lat_NC-DIGRAPH = D_55 ∩ Reg_lat is called the set of latent encoded digraphs.
Example Here is a digraph with its latent encoding: [digraph and encoding not reproduced] The brackets in the extended representation contain information that helps us recognize, through local patterns, that this graph has a directed cycle (directed path 1 → 2 → 7 → 1), is strongly ambiguous (two directed paths 2 → 1 and 2 → 7 → 1) and is not weakly connected (vertices 5 and 6 are not connected to the rest of the digraph).

Expressing Properties via Forbidden Patterns
We now demonstrate that all the mentioned nonlocal properties of graphs have become local in the extended internal representation of the code strings L_NC-DIGRAPH for noncrossing digraphs. These distinctive properties of graph families reduce to forbidden patterns in bracket strings and then compile into regular constraint languages. These are presented in Table 1. To keep the patterns simple, subsets of brackets are defined: Σ_or, all brackets for oriented edges, and Σ_inv, all brackets for inverted edges.

Table 1 (partially recovered rows):
• INV: forbidden pattern "an arc without inverse", i.e., INV = Σ* − Σ* Σ_or Σ*;
• OUT: forbidden pattern "a state with more than 2 incoming arcs" (regular expression not recoverable).
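A constraint language of the form Σ* − Σ* P Σ* accepts exactly the strings that contain no occurrence of the forbidden pattern P, and the conjunction of two such properties simply forbids the union of their patterns, so the Dyck language and homomorphism stay shared. The sketch below illustrates only this mechanism. It does not use the latent alphabet of Table 1; instead it assumes a hypothetical plain encoding in which "[" opens an edge to the right, "]" closes an edge from the left, "{}" separates vertices, and within each vertex closing brackets precede opening ones, so that "[[" occurs iff some vertex opens two edges.

```python
import re

def constraint_language(forbidden):
    """Membership test for Sigma* minus the union of Sigma* P Sigma* over the
    forbidden factors P, each given as a regular expression."""
    compiled = [re.compile(p) for p in forbidden]
    return lambda s: not any(p.search(s) for p in compiled)

# Hypothetical toy properties over the plain encoding described above.
AT_MOST_ONE_RIGHT = constraint_language([r"\[\["])  # no vertex opens two edges
AT_MOST_ONE_LEFT = constraint_language([r"\]\]"])   # no vertex closes two edges
BOTH = constraint_language([r"\[\[", r"\]\]"])      # conjunction = union of patterns
```

Under these assumptions, the path 1 - 2 - 3 encoded as "[{}][{}]" satisfies both toy properties, while the triangle "[[{}][{}]]" violates both.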

Validation Experiments
The current experiments were designed (1) to help in developing the components of Reg lat and the constraint languages of axiomatic properties, (2) to validate the representation, the constraint languages and their unambiguity, (3) to learn about the ontology and (4) to sample the integer sequences associated with the cardinality of each family in the ontology.

Finding the Components Representations of Reg_lat were built with scripts written using a finite-state toolkit (Hulden, 2009) that supports rapid exploration with regular languages and transducers.
Validation of Languages Our scripts implemented alternative approaches to computing the languages of encoded digraphs with n vertices, up to n = 9. We also implemented a Python script that enumerated the elements of families of graphs up to n = 6. The solutions were used to cross-validate one another. The constraint G_n = B*({}B*)^{n−1} ensures n vertices in encoded digraphs. The finite set of encoded acyclic 5-vertex digraphs was computed with a finite-state approach that takes the input projection of the composition … where Id defines an identity relation and the transducer T_55 eliminates matching adjacent brackets. This composition differs from the typical use, where the purpose is to construct a regular relation (Kaplan and Kay, 1994) or its output projection (Roche, 1996; Oflazer, 2003). For digraphs with many vertices, we could employ a dynamic programming scheme (Yli-Jyrä, 2012) that uses weighted transducers.
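An enumeration of the kind used for cross-validation can be sketched in a few lines; this is not the authors' script. It enumerates all noncrossing edge sets on n ordered vertices and weights each edge set by the number of ways to orient its edges: 1 way for graphs, and 3 ways (i → j, j → i, or both) for loop-free digraphs. The search is exhaustive over 2^(n(n−1)/2) edge sets, so it is only practical for small n.

```python
from itertools import combinations

def crossing(e, f):
    """Chords e = (a, b) and f = (c, d) with a < b, c < d cross iff they interleave."""
    (a, b), (c, d) = e, f
    return a < c < b < d or c < a < d < b

def count_noncrossing(n, orientations):
    """Sum of orientations ** |E| over all noncrossing edge sets E on n ordered
    vertices: orientations=1 counts graphs, orientations=3 loop-free digraphs."""
    chords = list(combinations(range(1, n + 1), 2))
    total = 0
    for mask in range(1 << len(chords)):
        chosen = [c for b, c in enumerate(chords) if mask >> b & 1]
        if not any(crossing(e, f) for e, f in combinations(chosen, 2)):
            total += orientations ** len(chosen)
    return total
```

For n = 5, count_noncrossing(5, 3) gives 62464, the number of noncrossing digraphs stated in the caption of Figure 1; it can be checked by hand as 4^5 choices for the five sides of the pentagon times 1 + 5·3 + 5·3^2 = 61 choices for its pairwise noncrossing diagonals. For graphs, the sequence starts 2, 8, 48, 352 for n = 2, ..., 5.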
Building the Ontology To build the ontology in Figure 1, we first determined which combinations of digraph properties co-occur to define distinguishable families of digraphs. Once the nodes of the lattice were found, we could read off the partial order between them.
Integer Sequences We sampled, for important families of digraphs, the prefixes of their related integer sequences. We found that each family of graphs is largely characterized by its cardinality sequence, see Table 2. In many cases, the number sequence was already well documented (OEIS Foundation Inc., 2017).
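Table 2 is not reproduced here, but the sampling idea can be illustrated with one undirected family whose sequence is certainly well documented: noncrossing spanning trees on n ordered vertices, counted by 1, 3, 12, 55, 273 for n = 2, ..., 6 (OEIS A001764), with the closed form C(3n−3, n−1)/(2n−1). The sketch below is exhaustive and illustrative only; the function names are hypothetical.

```python
from itertools import combinations
from math import comb

def crossing(e, f):
    """Chords e = (a, b) and f = (c, d) with a < b, c < d cross iff they interleave."""
    (a, b), (c, d) = e, f
    return a < c < b < d or c < a < d < b

def is_tree(n, edges):
    """n - 1 edges without a cycle (checked by union-find) form a spanning tree."""
    parent = list(range(n + 1))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge would close a cycle
        parent[ru] = rv
    return len(edges) == n - 1

def count_noncrossing_trees(n):
    """Count noncrossing spanning trees on n ordered vertices by enumeration."""
    chords = list(combinations(range(1, n + 1), 2))
    return sum(
        1
        for edges in combinations(chords, n - 1)
        if not any(crossing(e, f) for e, f in combinations(edges, 2))
        and is_tree(n, edges)
    )

def closed_form(n):
    """Known closed form C(3n - 3, n - 1) / (2n - 1) for noncrossing trees."""
    return comb(3 * n - 3, n - 1) // (2 * n - 1)
```

Comparing an enumerated prefix against a known sequence in this way is one cheap sanity check on an enumerator.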

The Formal Basis of Practical Parsing
While presenting a practical parser implementation is outside the scope of this paper, which focuses on the theory, we outline in this section the aspects to take into account when applying our representation to build practical natural language parsers.
Positioned Brackets In order to do inference in arc-factored parsing, we incorporate weights into the representation. For each vertex in G_n, the brackets are decorated with the respective position number. Then, we define an input-specific grammar representation where each pair of brackets in D gets an arc-factored weight given the directions and the vertex numbers associated with the brackets.

Grammar Intersection
We associate with each G_n a quadratic-size context-free grammar that generates all noncrossing digraphs with n vertices. This grammar is obtained by computing (or even precomputing) the intersection D_55 ∩ Reg_lat ∩ G_n in any order, exploiting the closure of context-free languages under intersection with regular languages (Bar-Hillel et al., 1961). The introduction of position numbers and weights in the Dyck language gives us, instead, a weighted grammar and its intersection (Lang, 1994). This grammar is a compact representation of a finite set of weighted latent encoded digraphs. Additional constraints applied during the intersection tailor the grammar to different families of digraphs.
Dynamic Programming The heaviest digraph is found with a dynamic programming algorithm that computes, for each nonterminal in the grammar, the weight of the heaviest subtree. A careful reader may notice some connections to the Eisner algorithm (Eisner and Satta, 1999), context-free parsing through intersection (Nederhof and Satta, 2003), and a dynamic programming scheme that uses contracting transducers and factorized composition (Yli-Jyrä, 2012). Unfortunately, space does not permit discussing the connections here.
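For intuition about why such dynamic programming is cubic, here is a direct interval recursion for the unconstrained case only. It is not the grammar-based procedure described above and it cannot enforce any of the family constraints. g(i, j) is the best weight achievable with arcs inside the span [i, j]: the pair {i, j} contributes independently (no arc, either arc, or both), and the remaining arcs split at some k with no arc spanning over k. Such a k always exists: take the farthest inside neighbour of i, or i + 1 if i has none. Arc weights are given in a hypothetical dictionary w; missing arcs are treated as unavailable.

```python
def heaviest_noncrossing_digraph(n, w):
    """Weight of the heaviest noncrossing digraph on vertices 1..n (no family
    constraints). O(n^3) interval recursion over spans [i, j]."""

    def pair_gain(i, j):
        # Best contribution of the vertex pair {i, j}: nothing, one arc, or both.
        a, b = w.get((i, j)), w.get((j, i))
        options = [0]
        if a is not None:
            options.append(a)
        if b is not None:
            options.append(b)
        if a is not None and b is not None:
            options.append(a + b)
        return max(options)

    g = {}
    for length in range(1, n):
        for i in range(1, n - length + 1):
            j = i + length
            # No arc spans over the split point k, so the two halves are independent.
            inner = max((g[i, k] + g[k, j] for k in range(i + 1, j)), default=0)
            g[i, j] = pair_gain(i, j) + inner
    return g[1, n] if n > 1 else 0
```

For example, with w(1, 3) = 5, w(3, 1) = 2, w(2, 4) = 4, w(4, 2) = −1, w(1, 2) = 1 and w(2, 1) = −1, the optimum keeps both arcs between 1 and 3 plus 1 → 2, for a total of 8, since {1, 3} and {2, 4} cannot coexist. Backpointers could be added to recover the arcs themselves.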
Lexicalized Search Space In practical parsing, we want the parser behavior and the dependency structure to be sensitive to the lexical entries or features of each word. We can replace the generic vertex description B * in G n with subsets that depend on respective lexical entries. Graphical constraints can be applied to some vertices but relaxed for others. This application of current results gives a principled, graphically motivated solution to lexicalized control over the search space.
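The idea of replacing the generic vertex description B* in G_n by lexically selected subsets can be sketched with ordinary regular expressions. This is an assumption-laden toy, not the paper's construction: it uses only the two plain edge brackets "[" and "]" instead of the latent alphabet, and it assumes that "[" opens an edge to the right and "]" closes an edge from the left. The hypothetical lexical restriction lets word 2 take edges only from its left.

```python
import re

GENERIC = r"[\[\]]*"  # generic vertex description B*: any run of edge brackets
BOUNDARY = r"\{\}"    # the {} marker between consecutive vertices

def g_n(vertex_patterns):
    """Regex for B_1 {} B_2 {} ... {} B_n with one (possibly lexical) pattern per vertex."""
    return re.compile(BOUNDARY.join(vertex_patterns))

generic3 = g_n([GENERIC] * 3)
# Hypothetical lexical entry: word 2 only closes edges coming from its left.
lexical3 = g_n([GENERIC, r"\]*", GENERIC])
```

Intersecting such a per-sentence G_n with the Dyck language and the property constraints would then restrict the search space word by word; here, "[[{}][{}]]" is accepted by generic3 but rejected by lexical3, because word 2 opens an edge.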

Conclusion
We have investigated the search space of parsers that produce noncrossing digraphs. Parsers that can be adapted to different needs are less dependent on artificial assumptions on the search space. Adaptivity gives us freedom to model how the properties of digraphs are actually distributed in linguistic data. As the adaptive data analysis deserves to be treated in its own right, the current work focuses on the separation of the parsing algorithm from the properties of the search space. This paper makes four significant contributions.
Contribution 1: Digraph Encoding The paper introduces, for noncrossing digraphs, an encoding that uses brackets to indicate edges. Bracketed trees are widely used in generative syntax, treebanks and structured document formats. There are established conversions between phrase structure and projective dependency trees, but the currently advocated edge bracketing is expressive and captures more than just projective dependency trees. This capacity is welcome as syntactic and semantic analysis with dependency graphs is a steadily growing field.
The edge bracketing creates new avenues for the study of connections between noncrossing graphs and context-free languages, as well as their recognizable properties. By demonstrating that digraphs can be treated as strings, we suggest that practical parsing to these structures could be implemented with existing methods that restrict context-free grammars to a regular yield language.

Contribution 2: Context-Free Properties Acyclicity and other important properties of noncrossing digraphs are expressible as unambiguous context-free sets of encoded noncrossing digraphs. This facilitates the incorporation of property testing into dynamic programming algorithms that implement exact inference.

[Table 2 key, partially recovered: … (2015) or Kuhlmann and Johnsson (2015), YJ = Yli-Jyrä (2012)]
Descriptive complexity helps us understand to which degree various graphical properties are local and could be incorporated into efficient dynamic programming during exact inference. It is well known that acyclicity and connectivity are not definable in first-order logic (FO), while they can easily be defined in monadic second-order logic (MSO) (Courcelle, 1997). MSO involves set-valued variables, whose use in dynamic programming algorithms and tabular parsing is inefficient. MSO queries have a brute-force transformation to FO, but this does not help in general either, as it is well known that MSO can express intractable problems.
The interesting observation of the current work is that some MSO-definable properties of digraphs become local in our extended encoding. This encoding is linear in the size of the digraphs: each string over the extended bracket alphabet encodes a fixed assignment of MSO variables. The properties of noncrossing digraphs now reduce to properties of bracketed trees with a linear amount of latent information that is fixed for each digraph. A deeper explanation for our observation comes from the fact that the treewidth of noncrossing and other outerplanar graphs is bounded by 2. When the treewidth is bounded, all MSO-definable properties, including ones that are intractable in general, become linear-time decidable for individual structures (Courcelle, 1990). They can also be decided in a logarithmic amount of writable space (Elberfeld et al., 2010), e.g., with element indices instead of sets. By combining this insight with Proposition 1, we obtain a logspace solution for testing acyclicity of a noncrossing graph (Figure 5).

Figure 5: Testing ACYC_U in logarithmic space
Although bounded treewidth is a weaker constraint than so-called bounded treedepth, which would immediately guarantee first-order definability (Elberfeld et al., 2016), it can sometimes turn intractable search problems into dynamic programming algorithms (Akutsu and Tamura, 2012). In our case, Proposition 1 gave rise to unambiguous context-free subsets of L_NC-DIGRAPH. These can be recognized with dynamic programming and used in efficient constrained inference when we add vertex indices to the brackets and weights to the grammar of the corresponding Dyck language.
Contribution 3: Digraph Ontology The context-free properties of encoded digraphs have elegant nonderivative language representations, and they generate a semilattice under language intersection. Although context-free languages are not closed under intersection in general, all combinations of the properties in this lattice are context-free and define natural families of digraphs. The nonderivative representations for our axiomatic properties share the same Dyck language D_55 and homomorphism, but differ in terms of forbidden patterns. As a consequence, any conjunctive combination of these properties also shares these components and thus defines a context-free language. The obtained semilattice is an ontology of families of noncrossing digraphs.
Our ontology contains important families of noncrossing digraphs used in syntactic and semantic dependency parsing: out trees, dags, and weakly connected digraphs. It shows the entailment between the properties and proves the existence of less known families of noncrossing digraphs such as strongly unambiguous digraphs and oriented graphs, multitrees, oriented forests and polytrees. These are generalizations of out oriented trees. However, these families can still be weakly projective. Table 2 shows integer sequences obtained by enumerating digraphs in each family. At least twelve of these sequences are previously known, which indicates that the families are natural.
We used a finite-state toolkit to build the components of the nongenerative language representation for latent encoded digraphs and the axioms.²

Contribution 4: Generic Parsing The fourth contribution of this paper is to show that parsing algorithms can be separated from the formal properties of their search space.