Transition-Based Coding and Formal Language Theory for Ordered Digraphs

Transition-based parsing of natural language uses transition systems to build directed annotation graphs (digraphs) for sentences. In this paper, we define, for an arbitrary ordered digraph, a unique decomposition and a corresponding linear encoding that are associated bijectively with each other via a new transition system. These results give us an efficient and succinct representation for digraphs and sets of digraphs. Based on the system and our analysis of its syntactic properties, we give structural bounds under which the set of encoded digraphs is restricted and becomes a context-free or a regular string language. The context-free restriction is essentially a superset of the encodings used previously to characterize properties of noncrossing digraphs and to solve maximal subgraph problems. The regular restriction with a tight bound is shown to capture the Universal Dependencies v2.4 treebanks in linguistics.

The study of transition systems in general ranges from Turing complete transition systems (Woods, 1970; Goldin et al., 2004; Thomas, 2002) to well-understood transition systems that build projective dependency structures, context-free parse trees and noncrossing graphs (Nivre, 2003, 2004; Goldberg and Elhadad, 2010; Kuhlmann et al., 2011; Sagae and Tsujii, 2008; Honnibal and Johnson, 2015). The need to model nonlocal dependencies and crossing edges in parses has motivated the study of transition systems that balance the computational complexity and the coverage of the possible outputs. Many of these systems are extensions of stack-based transition systems (Attardi, 2006; Nivre, 2009; Gómez-Rodríguez and Nivre, 2013; de Lhoneux et al., 2017; Qi and Manning, 2017), but there are also some proposals for transition systems that are based, solely or additionally, on some other memory model, such as a shack, a list, registers, a set, or a cache (Kornai and Tuza, 1992; Covington, 2000; Choi and McCallum, 2013; Pitler and McDonald, 2015; Fernández-González and Gómez-Rodríguez, 2018; Gildea et al., 2018; Coavoux and Cohen, 2019).
In this paper, we present a transition system that implements an efficient, invertible function between its action sequences and arbitrary ordered digraphs. The action sequences of the system can also be viewed as strings of balanced brackets, constituting formal languages that have elegant Chomsky-Schützenberger representations and many desirable characteristics of input-driven languages. The transition-based transformation between the relational and sequential representations of digraphs opens a possibility to apply classical formal language theory of subsets of free monoids to the classes of digraphs.
Our transition system takes advantage of a new kind of decomposition of a digraph: the rope decomposition views the underlying graph as a union of subgraphs that we call ropes. The longest edge of a rope shares exactly one endpoint with each of the other edges in it.
The theoretical notions of rope decompositions and the new transition system are introduced in Sections 3-4. The action sequences of the transition system are related to formal languages in Sections 5-6. Sections 7-8 contain corpus-based empirical evaluation and a discussion that argues that the developed encoding for digraphs contributes to the work in some related and important research areas in graph theory and computer science.

Basic Definitions
Denote the empty string with ϵ. Denote the transpose of a binary relation $X$ as $X^T$. Define the composition of two binary relations $X, Y$ as $X \circ Y = \{(x, z) \mid \exists y.\ (x, y) \in X \wedge (y, z) \in Y\}$. Abbreviate an assignment $S \leftarrow S \cup T$ as $S \stackrel{\cup}{\leftarrow} T$. Let $V_n$ denote the finite set of integers $\{1, ..., n\}$. Let the parameter $d \in \{<, >, <>\}$ indicate the choice between leftward, rightward and bidirectional orientation of arcs in actions that produce these arcs.
A (finite ordered) graph is a pair $(V_n, E)$ where $V_n$ is a finite set of ordered vertices and $E \subseteq \{(u, v) \in V_n \times V_n \mid u < v\}$ is a set of edges. For each edge $(i, j) \in E$, we call $i$ the left index and $j$ the right index of the edge. A (finite ordered) digraph is a pair $(V_n, A)$ where $V_n$ is a set of vertices and [...]

Rope Cover
Definition 3.1. Let $(V_n, E)$ be an ordered graph. In this graph, edge $(h, k)$, where $h < k$, is a (properly-longer shared-endpoint) covering edge for a shorter edge $(i, j)$ if either $h = i$ and $i < j < k$, or $j = k$ and $h < i < j$. Denote this situation by $(h, k) : (i, j)$.
Proof. Let $(V_n, E)$ be an arbitrary graph and let $R, R' \subseteq E$ be two PRCs of the graph. Assuming that $R \neq R'$ and that there is $(x_0, y_0) \in R \setminus R'$, we show, by induction, that there is an infinite sequence of distinct edges $(x_0, y_0), (x_2, y_2), (x_4, y_4), ... \in R \setminus R'$ and $(x_1, y_1), (x_3, y_3), (x_5, y_5), ... \in R' \setminus R$ where $(x_{i+1}, y_{i+1}) : (x_i, y_i)$ for every $i \ge 0$. To prove the required induction steps, there is, by the definition of a PRC, a covering [...] Such an infinite sequence of distinct edges requires $E$ to be infinite. By contradiction, the PRC of the graph is unique.
Definition 3.3. For graph $(V_n, E)$ with a PRC $R$, the rope-thickness of a vertex $i \in V_{n-1}$ is the number of edges $(h, j) \in R$ satisfying $h \le i < j$. The rope-thickness of the graph is the maximum over the rope-thicknesses of all vertices $i \in V_{n-1}$ in the graph.
Proof. Let $(V_n, E)$ be a graph. To construct the PRC, start with $R_0 = \emptyset$ and $E_0 = E$. Given $R_i$ and $E_i$, $i \ge 0$, construct $R_{i+1}$ as the set of all edges in $E_i$ that do not have a covering edge in $E_i$, and $E_{i+1}$ as the set of all edges in $E_i$ that do not have a covering edge in $R_{i+1}$. Each such iteration is computed in $O(n^2)$. Clearly, $E_{i+1} \subsetneq E_i$ unless $E_i = \emptyset$, and $E_{\lfloor n/2 \rfloor} = \emptyset$. The PRC of the graph is the set $R = R_1 \cup R_2 \cup ... \cup R_{\lfloor n/2 \rfloor}$, and it is constructed in $O(n^3)$ time.
Corollary 3.5. The rope thickness of a graph can be computed in cubic time.
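The layered construction in the proof above can be sketched in Python. This is a minimal sketch rather than the implementation behind the paper: the names `covers`, `proper_rope_cover` and `rope_thickness` are hypothetical, and the sketch assumes that a PRC is a set R of edges in which no edge is covered by another edge of R while every edge outside R is covered by some edge of R, which is how the uniqueness proof uses the notion. It further assumes that each new layer leaves the pool of remaining edges together with the edges it covers.

```python
from typing import Set, Tuple

Edge = Tuple[int, int]  # (left index, right index) with left < right


def covers(long: Edge, short: Edge) -> bool:
    """True if `long` is a covering edge for `short` (Definition 3.1)."""
    (h, k), (i, j) = long, short
    return (h == i and i < j < k) or (j == k and h < i < j)


def proper_rope_cover(edges: Set[Edge]) -> Set[Edge]:
    """Layered construction: keep uncovered edges, drop what they cover, repeat."""
    remaining = set(edges)
    cover: Set[Edge] = set()
    while remaining:
        # Edges without a covering edge among the remaining ones join the cover.
        layer = {e for e in remaining
                 if not any(covers(f, e) for f in remaining)}
        cover |= layer
        # Assumption: the layer itself and every edge it covers leave the pool.
        remaining = {e for e in remaining - layer
                     if not any(covers(f, e) for f in layer)}
    return cover


def rope_thickness(n: int, edges: Set[Edge]) -> int:
    """Maximum number of cover edges (h, j) with h <= i < j over vertices i < n."""
    cover = proper_rope_cover(edges)
    return max((sum(1 for (h, j) in cover if h <= i < j)
                for i in range(1, n)), default=0)


if __name__ == "__main__":
    small = {(1, 3), (2, 3), (2, 4)}  # a small illustrative graph on 4 vertices
    print(sorted(proper_rope_cover(small)))
    print(rope_thickness(4, small))
```

The loop terminates because a longest remaining edge never has a covering edge, so every pass adds at least one edge to the cover.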

Convention 3.5 ("Indirect Edges").
When an edge (i, j) has a covering edge (h, j), h < i < j, we refer to the edge (i, j) indirectly, via the pair (i, h) where h is the left index of the covering edge.
The PRC of the graph (4) comprises the edges {(1, 6), (2, 5)}, while the remaining edges are covered by these. Edge (1, 5) is a usual edge, and there are indirect edges, (3, 2) and (4, 2), that we draw under the vertices. The rope thickness of vertices 2-4 is two, which is also the maximum for the whole graph.
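The numbers quoted for graph (4) can be checked with a few lines of Python. This is only a sketch under stated assumptions: the PRC is copied from the text, the graph is assumed to have six vertices, and the helper names `vertex_thickness` and `decode_indirect` are hypothetical. Decoding assumes that at most one PRC edge has a given left index.

```python
# Proper rope cover of graph (4), as stated in the text.
prc = {(1, 6), (2, 5)}
n = 6  # assumed number of vertices


def vertex_thickness(i, cover):
    """Number of cover edges (h, j) with h <= i < j."""
    return sum(1 for (h, j) in cover if h <= i < j)


def decode_indirect(i, h, cover):
    """Recover the edge (i, j) from the indirect pair (i, h)."""
    # Assumption: exactly one cover edge has left index h.
    (j,) = [k for (left, k) in cover if left == h]
    assert h < i < j
    return (i, j)


thickness = {i: vertex_thickness(i, prc) for i in range(1, n)}
decoded = [decode_indirect(i, h, prc) for (i, h) in [(3, 2), (4, 2)]]
print(thickness)
print(decoded)
```

Under these assumptions the indirect edges (3, 2) and (4, 2) stand for the edges (3, 5) and (4, 5), and vertices 2 to 4 are the only ones crossed by both PRC edges.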

Ropes
Definition 3.6. An ordered graph $(V_n, E)$ is called a rope if $n = 1$ or the PRC of the graph [...]
Ropes can be used in algorithms that construct graphs while processing vertices. Example (5) shows a digraph whose underlying graph is a complete rope. Some of its edges would cross one another if the edges were drawn above a line containing the vertices.
Two-way algorithms can build a complete rope in two passes with a vertex counter and one memory unit that contains a reference to one index of the covering edge. After memorizing the left index of the covering edge to variable x, such an algorithm processes vertices 2 to 5 in Example (5) and builds edges (x, 2), (x, 3), (x, 4) and (x, 5). During the backward pass over the vertices, the algorithm saves the right index of the covering edge to variable x and builds edges (4, x), (3, x), and (2, x).
One-way algorithms process each vertex only once as their output can represent the edges (i, n), 2 ≤ i ≤ n − 1, in Example (5) indirectly, by a reference to the left index of the respective covering edge. In the output, the edge (2, 5) is represented as an indirect edge (2, 1) where 1 identifies the left index of the covering edge (1, 5). After processing the vertices, the composition of (2, 1) and (1, 5) is computed to obtain the actual edge (2, 5).
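The two strategies described above can be compared in a short Python sketch. It is an illustration under assumptions, not the paper's algorithm: the function names are hypothetical, the complete rope is taken to have the covering edge (1, n) with n ≥ 2, and the one-way variant resolves its indirect pairs in a separate composition step at the end.

```python
def build_two_way(n):
    """Two passes over the vertices with a single memory cell x."""
    edges = []
    x = 1                          # left index of the covering edge (1, n)
    for v in range(2, n + 1):      # forward pass builds (x, 2), ..., (x, n)
        edges.append((x, v))
    x = n                          # right index of the covering edge
    for v in range(n - 1, 1, -1):  # backward pass builds (n-1, x), ..., (2, x)
        edges.append((v, x))
    return edges


def build_one_way(n):
    """One pass; edges (v, n) are emitted indirectly as pairs (v, x)."""
    direct, indirect = [], []
    x = 1
    for v in range(2, n + 1):
        direct.append((x, v))
        if v < n:
            indirect.append((v, x))  # stands for the edge (v, n)
    covering = (x, n)
    # Compose each indirect pair (v, h) with the covering edge (h, n).
    resolved = [(v, covering[1]) for (v, h) in indirect if h == covering[0]]
    return direct + resolved


if __name__ == "__main__":
    print(sorted(build_two_way(5)))
    print(sorted(build_one_way(5)))
```

Both functions yield the same seven edges for n = 5; the one-way variant never needs to know the right index n while it scans the vertices.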

Rope Assignment
Theorem 3.6. There are graphs where an edge has two distinct covering edges in the PRC.
Proof. The PRC of the graph of Example (6) is {(1, 3), (2, 4)}. The edge (2, 3) is covered by the edge (1, 3) with which it shares the right index, and by the edge (2, 4) with which it shares the left index.
When an edge has two covering edges, we need a consistent policy for treating them. In (6), we can assign the edge (2, 3) to either of the covering edges (1, 3), (2, 4), or both.
Convention 3.7 ("Earliest"). We adopt a convention according to which the ambiguity between two possible covering edges is resolved by assigning the edge to the earliest available covering edge.
By Convention 3.7, the arc (2, 3) in Example (6) will be assigned to the Earliest covering edge (1, 3), which can be identified by its left index 1, by Convention 3.4. The covered arc (2, 3) is thus represented as an indirect edge (2, 1) by Convention 3.5.
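The assignment policy can be sketched as follows. The sketch assumes that "earliest" means the covering edge with the smallest left index, which agrees with the treatment of Example (6) above; the function names `assign_earliest` and `represent` and the tags "cover", "direct" and "indirect" are hypothetical.

```python
def covers(long, short):
    """True if `long` is a covering edge for `short` (Definition 3.1)."""
    (h, k), (i, j) = long, short
    return (h == i and i < j < k) or (j == k and h < i < j)


def assign_earliest(edge, cover):
    """Return the covering edge in `cover` with the smallest left index, if any."""
    candidates = [c for c in cover if covers(c, edge)]
    # Tuples compare by their first component first, i.e. by left index.
    return min(candidates) if candidates else None


def represent(edge, cover):
    """Tag an edge as a cover edge, a direct edge (i, j) or an indirect pair (i, h)."""
    chosen = assign_earliest(edge, cover)
    if chosen is None:
        return ("cover", edge)
    (i, j), (h, k) = edge, chosen
    if k == j and h < i:
        return ("indirect", (i, h))  # shares the right index with its covering edge
    return ("direct", (i, j))        # shares the left index with its covering edge


if __name__ == "__main__":
    prc = {(1, 3), (2, 4)}  # PRC of the graph of Example (6)
    print(assign_earliest((2, 3), prc))
    print(represent((2, 3), prc))
```

A covering edge that shares the right index always starts to the left of the covered edge, so under this reading the indirect representation wins whenever both kinds of covering edge exist.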

Rope Decomposition
This section introduces a new representation for digraphs. The idea is to start from the PRC of the underlying graph, and then assign the remaining arcs to covering edges.
[...] satisfying the following conditions:
$A_<, A_> \subseteq \{(i, j) \mid (i, k) \in R,\ i < j \le k\}$ are, respectively, left and right arcs whose left index coincides with the left index of the covering edge,
$I_<, I_> \subseteq \{(i, h) \mid (h, j) \in R,\ h < i < j\}$ are, respectively, indirect representations for arcs whose right index coincides with the respective covering edge but is represented indirectly, via the left index of the respective covering edge.

The four sets of arcs represent together the original set of arcs: [...]
Lemma 3.7. Under the "Earliest" convention, the relation between digraphs and rope decompositions is a bijection.
Proof. (⇒): Let $G = (V_n, A)$ be a digraph and $G' = (V_n, E_A)$ its underlying graph. By Theorem 3.3, $G'$ has a unique PRC $R \subseteq E_A$. By the "Earliest" convention, we choose the earliest available covering edge for each edge and first construct the sets of indirect arcs [...] After this, we construct the sets of arcs whose left index coincides with the covering edges: [...]
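One way to make the round trip between digraphs and rope decompositions concrete is the Python sketch below. It rests on several assumptions that the text does not spell out here: arcs are pairs (source, target) without loops, a pair stored in a "<" set stands for the leftward arc and a pair in a ">" set for the rightward arc, PRC edges are stored among the direct arcs, and "earliest" means the covering edge with the smallest left index. All function names are hypothetical.

```python
def covers(long, short):
    (h, k), (i, j) = long, short
    return (h == i and i < j < k) or (j == k and h < i < j)


def proper_rope_cover(edges):
    """Layered construction of the PRC (assumed reading of the construction proof)."""
    remaining, cover = set(edges), set()
    while remaining:
        layer = {e for e in remaining
                 if not any(covers(f, e) for f in remaining)}
        cover |= layer
        remaining = {e for e in remaining - layer
                     if not any(covers(f, e) for f in layer)}
    return cover


def decompose(arcs):
    """Split a loop-free arc set into (R, A_lt, A_gt, I_lt, I_gt)."""
    edges = {(min(u, v), max(u, v)) for (u, v) in arcs}
    R = proper_rope_cover(edges)
    A_lt, A_gt, I_lt, I_gt = set(), set(), set(), set()
    for (i, j) in edges:
        owners = [c for c in R if covers(c, (i, j))]
        owner = min(owners) if owners else (i, j)  # PRC edges own themselves
        if owner[1] == j and owner[0] < i:
            pair, lt, gt = (i, owner[0]), I_lt, I_gt   # indirect pair (i, h)
        else:
            pair, lt, gt = (i, j), A_lt, A_gt          # direct pair (i, j)
        if (j, i) in arcs:
            lt.add(pair)   # leftward arc j -> i
        if (i, j) in arcs:
            gt.add(pair)   # rightward arc i -> j
    return R, A_lt, A_gt, I_lt, I_gt


def recompose(R, A_lt, A_gt, I_lt, I_gt):
    """Rebuild the arc set using transposition and composition with R."""
    right_of = {h: j for (h, j) in R}  # left index -> right index of a PRC edge
    arcs = set(A_gt)
    arcs |= {(j, i) for (i, j) in A_lt}
    arcs |= {(i, right_of[h]) for (i, h) in I_gt}
    arcs |= {(right_of[h], i) for (i, h) in I_lt}
    return arcs


if __name__ == "__main__":
    arcs = {(1, 3), (4, 2), (3, 2)}
    parts = decompose(arcs)
    print(parts)
    print(recompose(*parts) == arcs)
```

For the arc set in the usage lines, the leftward arc (3, 2) ends up in the "<" set of indirect pairs as (2, 1), and recomposition returns the original arcs.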

A New Transition System
By Lemma 3.7, there is a bijection between (a class of) rope decompositions and digraphs. We complement this result by relating each rope decomposition bijectively to a sequence of actions. The actions are controlled by a transition system.
Our transition system has a buffer β ∈ N, a main stack σ ∈ N * and an auxiliary stack τ ∈ N * , each containing vertex indices. The tuple (σ, τ, β) of these three structures forms the core of the configurations between which the transition system moves. As an input, the system takes a sequence of actions that tell how to build a rope decomposition in an incremental manner. The possible types of actions of the transition system are listed in Table 1.
Initially, both the stacks are empty and the buffer β consists of the list of positive integers. The final configurations of the system consist of all those configurations (ϵ, ϵ, β) where both stacks are empty, and β contains a suffix [n, n + 1, ...] of the list of positive integers. When the system reaches a final configuration, it has produced a relational structure [...] of a rope decomposition. By doing so, the transition system maps the input sequence of actions to a rope decomposition that represents a digraph. It is not too difficult to define the inverse of this function, but we suppress the details in the interest of space.

Main Actions
The most important actions of the transition system create the set of edges in the PRC $R$. By a shift (sh) action, the system removes a vertex index $i$ from the front of the buffer and places it on top of the stack in order to prepare for a future situation where the index $i$ is the left index of an edge in the PRC, i.e., $\exists j.\, R(i, j)$. When the index $j$ becomes available in the front of the buffer, the system creates the edge $(i, j) \in R$ by a reduce (re) action and removes the index $i$ from the stack. The specifier $d \in \{<, >, <>\}$ of the action tells whether the corresponding arc $(i, j)$ is to be added to $A_<$, $A_>$ or both. Only one reduce action is allowed in a row. By a next (nx) action, the system removes an index $i$ from the front of the buffer to secure the situation where no covering edge has vertex $i$ as its left index, i.e., $\neg \exists j.\, R(i, j)$.
The PRC $R$, and the corresponding arcs in $A_<$ and $A_>$ of the rope decomposition of this digraph are created by the action sequence: [...] The main actions involved in this example do not use the auxiliary stack. The edges in $R$ and the corresponding arcs are created by the reduce actions as follows: [...]
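Independently of the paper's own example, the behaviour of the three main actions can be sketched with a small simulator. The sketch is based only on the prose description of shift, reduce and next: the action spellings ("sh", "nx", "re<", "re>", "re<>"), the function name and the assumption that a reduce leaves the buffer untouched are all hypothetical, and the action sequence in the usage lines is an invented illustration. The buffer is represented by the index at its front.

```python
def run_main_actions(actions):
    """Simulate shift / reduce / next without the auxiliary stack."""
    stack = []     # main stack of vertex indices
    front = 1      # buffer, represented by its first vertex index
    R, A_lt, A_gt = set(), set(), set()
    last_was_reduce = False
    for act in actions:
        if act == "sh":
            stack.append(front)      # the vertex may start a PRC edge later
            front += 1
            last_was_reduce = False
        elif act == "nx":
            front += 1               # the vertex starts no PRC edge
            last_was_reduce = False
        elif act in ("re<", "re>", "re<>"):
            if last_was_reduce or not stack:
                raise ValueError("reduce is not available here")
            edge = (stack.pop(), front)
            R.add(edge)
            d = act[2:]
            if "<" in d:
                A_lt.add(edge)       # leftward orientation
            if ">" in d:
                A_gt.add(edge)       # rightward orientation
            last_was_reduce = True
        else:
            raise ValueError("unknown action: " + act)
    if stack:
        raise ValueError("the main stack is not empty at the end")
    return front, R, A_lt, A_gt


if __name__ == "__main__":
    print(run_main_actions(["sh", "sh", "re>", "nx", "re<"]))
```

In the usage lines, the run ends with the buffer front at vertex 4 and both PRC edges (2, 3) and (1, 4) created, so the last vertex is still in the buffer when the final configuration is reached.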

Intermediate Actions
In order to create arcs whose [...]
The pass (pass) and insert (ins) actions create arcs whose underlying edges do not belong to the PRC: [...]
To prevent multiple re-entry to the same configuration and repeating the intermediate actions, the detailed configurations of the transition system include control variables that restrict the available actions in different phases of the transition system. As the insert actions set +ins, the reduce and pass actions become blocked until the next shift/next action.
[...] {(1, 3), (2, 4)} that is created through a sequence that involves intermediate actions. The preceding pass and the following insert actions allow the first reduce action to get access to a non-top element of the main stack and to remove it from the stack: [...]
The configuration ([1], ϵ, [2..]) allows a combination of pass and insert actions. The insert action puts, to $I_<$, the pair (2, 1) representing the arc (3, 2) in Example (6).

Correctness
Lemma 4.1. For every digraph, there is a unique action sequence that creates the edges of its PRC.
Proof sketch. By Proposition 3.1, no vertex needs to start or finish more than one edge in the PRC. The transition system allows one covering edge per vertex to be started with a shift action and any previously started covering edge to be finished with a reduce action. A crossing covering edge can be finished by accessing the non-topmost stack elements with pass 0 and ins 0 actions at the appropriate time.
Theorem 4.2. The transition system is able to produce every possible rope decomposition, capturing every digraph.
Proof sketch. By Lemma 4.1, we have a way to create the PRC and the corresponding arcs. The set of arcs is extended to represent the rest of the arcs via pass and insert actions that create the arcs that are properly covered by the edges in the PRC.

Linear Encoding
The action sequences of the transition system can be seen as linearisations for the digraphs.
In particular, the undirected graph in Example (6) is encoded as the action sequence in (14). To make the linearisation easier to read, we replace actions with brackets and other symbols in (15).
Convention 5.1. By convention, the bracketing scheme renames the actions of the transition system as follows: [...]
The convention improves the readability of action sequences and makes them compact: in particular, the digraph in Example (7) is encoded as the string [...]
Example (17) demonstrates how the bracket(s) now correspond almost iconically to the represented arcs.
Convention 5.1 benefits us when we analyse the formal, language-theoretic properties of such action sequences, although the encoding is not otherwise meant for human inspection and its direct manipulation by hand is prone to errors. The convention borrows ideas from Yli-Jyrä (2017) but differs from it in four important aspects: [...] These changes are necessary to deal with crossing arcs and the actions that operate on two stacks.

Syntactic Properties
In order to analyse the formal properties of the transition system, we need to understand how the actions, or the corresponding brackets, form strings that correspond to ordered graphs in a bijective manner. We induce the following principles:
1. Each vertex corresponds to a sequence of closing brackets followed by a sequence of opening brackets.
2. [...] This corresponds to Convention 3.7, which always chooses the "earliest" covering edge in the PRC of the underlying graph. We do not need "]"-brackets to produce arcs as we have an opportunity to produce the same arcs with "["-brackets earlier.
3. The unnecessary pair of pass and insert actions, marked with bracket substring "] [...]
5. The maximum number of brackets per vertex is n − 1.

Left brackets
7. The rope thickness of the (di)graph is the maximum number of momentarily open brackets in its encoding.
According to the principles 1-4, the bracket substrings that correspond to different vertices constitute a context-free language $W$ that is generated by the grammar $G_W$: [...]
Lemma 5.1. The action sequences that conform to the seven principles allow only one way to represent each ordered digraph.
Proof. The strong brackets are crucial for encoding all arcs of the digraph and the PRC of its underlying graph in particular. Every digraph (V n , A) has a unique PRC, and, due to the principles 1-4, it is not possible to build the same PRC with the correct arc orientations in two different ways. By the second principle, all non-PRC arcs are assigned to a unique covering edge. There is thus only one moment when the right combination of indices is available in the configuration for constructing each arc, and there is only one action sequence that can construct any given digraph.
We also observe that string concatenation of two action sequences gives an action sequence that produces a digraph concatenation of two ordered digraphs with one shared vertex.
Proposition 5.2. The encoding from digraphs to strings is a mapping that preserves the structure of the digraph concatenation monoid and sends it to the structure of a string concatenation monoid.
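The digraph side of this correspondence can be illustrated with a few lines of Python. This is a sketch under assumptions: a digraph is modelled as a pair (n, arcs), concatenation is taken to identify the last vertex of the first digraph with the first vertex of the second, and the one-vertex digraph without arcs is taken as the unit. The names `concat` and `UNIT` are hypothetical, and the sketch does not implement the encoding itself.

```python
def concat(g1, g2):
    """Concatenate two ordered digraphs that share one vertex."""
    (n1, a1), (n2, a2) = g1, g2
    shift = n1 - 1  # the first vertex of g2 becomes vertex n1
    shifted = frozenset((u + shift, v + shift) for (u, v) in a2)
    return (n1 + n2 - 1, frozenset(a1) | shifted)


# Assumed unit element: one vertex and no arcs.
UNIT = (1, frozenset())

if __name__ == "__main__":
    g1 = (2, frozenset({(1, 2)}))
    g2 = (3, frozenset({(3, 1)}))
    g3 = (2, frozenset({(2, 1)}))
    print(concat(g1, g2))
    print(concat(concat(g1, g2), g3) == concat(g1, concat(g2, g3)))
    print(concat(UNIT, g1) == g1 == concat(g1, UNIT))
```

Associativity and the unit laws hold for this operation, which is what makes it a monoid operation comparable to string concatenation with the empty string as unit.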

Formal Language Theory
Chomsky-Schützenberger (CS) parsing (Yli-Jyrä, 2005, 2012; Hulden, 2009; Yli-Jyrä and Gómez-Rodríguez, 2017; Ruprecht and Denkinger, 2019) combines a particular kind of language representation with weighted automata techniques. A prototypical CS style language representation $h(L \cap D)$ involves a homomorphic mapping ($h$) applied to an intersection of a regular language component $L$ and a Dyck language $D$. Yli-Jyrä and Gómez-Rodríguez (2017) used this kind of language representation to show that their encoding for the noncrossing digraphs ($L_{\text{NC-DIGRAPH}}$) is a context-free language and admits an efficient algorithm for finding maximal constrained subdigraphs in a weighted complete digraph.
This section gives a CS style representation for the language of all encoded digraphs ($L_{\text{DIGRAPH}}$) by relaxing the requirement that the $L$ component of the language representation is a regular language. The represented language is then not context-free, but the representation is, loosely speaking, of "the CS style". The similarity becomes more obvious when we derive a context-free approximation of it.
Lemma 6.1. There is a CS style representation for the language $L_{\text{DIGRAPH}}$.
Proof. We start by defining a Dyck language $D$ that checks for balanced bracketing. Let the internal alphabet of the representation be $\Sigma = \{\,\{, \}\,\} \cup \{\,[_{(l,r)}, ]_{(l,r)} \mid l \in B_L, r \in B_R\,\}$. Let $D$ be the language generated by the grammar [...] where $l \in B_L$ and $r \in B_R$. Let $h : \Sigma^* \to (\{\bullet\} \cup B_L \cup B_R)^*$ be a homomorphism defined in such a way that, for all $l \in B_L$, $r \in B_R$, [...] Instead of the usual regular component of CS representations, we use a marked concatenation closure of a context-free language: let $L$ be the context-free language $W(\bullet W)^{n-1}$, whose inverse homomorphism $h^{-1}(L)$ is also a marked concatenation closure of a context-free language.
The set of encoded digraphs is now given as [...]
Lemma 6.2. The subset of the encoded digraphs $L_{\text{DIGRAPH}}$, where the number of brackets per vertex is bounded by $k$, is context-free.
Proof. We start from the CS style representation (the proof of Lemma 6.1) for $L_{\text{DIGRAPH}}$ and replace the context-free language $L$ with a regular approximation $L_{<k} = W_{<k}(\bullet W_{<k})^*$ where $W_{<k}$ is a finite subset of $W$ restricted to contain at most $k$ nested brackets in the strings. This gives a more prototypical CS representation [...], which yields a context-free subset of $L_{\text{DIGRAPH}}$.
Lemma 6.3. The subset of encoded graphs $L_{\text{DIGRAPH}}$, where the rope thickness of the encoded digraphs is bounded by $t$, is regular.
Proof. As the rope thickness is bounded by $t$, the number of brackets per vertex is bounded by $2t$. Thus, we start from the CS style representation (the proof of Lemma 6.2) for encoded graphs where the number of brackets per vertex is bounded. By the bound $t$ for the rope thickness, we replace the context-free language $D$ with a regular subset $D_t \subset D$ that can contain $t$ levels of nested brackets. The $2t, t$-bounded set of encoded graphs is given by $L_{\text{DIGRAPH},2t,t}$ [...] By the closure properties of regular languages, $L_{\text{DIGRAPH},2t,t}$ is regular.
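The step from the Dyck language to a depth-bounded regular subset can be illustrated with an explicit finite automaton. The sketch below is not the paper's construction: it uses a hypothetical alphabet with two plain bracket pairs instead of the indexed brackets of Σ, and the names `dyck_bounded_dfa` and `accepts` are invented. Its only point is that bounding the nesting depth by t leaves finitely many stack contents, each of which can serve as a state.

```python
from itertools import product

# Hypothetical bracket alphabet; the paper's alphabet uses indexed brackets.
PAIRS = {"[": "]", "{": "}"}


def dyck_bounded_dfa(t):
    """States are tuples of currently open brackets, of length at most t."""
    opens = list(PAIRS)
    states = [s for depth in range(t + 1)
              for s in product(opens, repeat=depth)]
    delta = {}
    for s in states:
        if len(s) < t:
            for o in opens:
                delta[(s, o)] = s + (o,)       # open one more bracket
        if s:
            delta[(s, PAIRS[s[-1]])] = s[:-1]  # close the innermost bracket
    return states, delta


def accepts(word, t):
    """True if `word` is balanced and never nests deeper than t."""
    _, delta = dyck_bounded_dfa(t)
    state = ()
    for ch in word:
        state = delta.get((state, ch))
        if state is None:  # missing transition: reject
            return False
    return state == ()


if __name__ == "__main__":
    print(len(dyck_bounded_dfa(2)[0]))
    print(accepts("[{}]", 2), accepts("[{}]", 1))
```

With two bracket pairs and t = 2 the automaton has 1 + 2 + 4 = 7 states; the state count grows with t but stays finite, which is the property the regularity argument relies on.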

Evidence for Linguistic Relevance
To assess the linguistic relevance of the currently presented encoding, we carried out a small experiment where we computed the rope-thickness of dependency trees in the Universal Dependencies v2.4 treebanks (Nivre et al., 2019). The compacted results are presented in Table 2. The results indicate that a very high proportion of the observations is captured once the bound on rope-thickness is 4 or higher.
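A measurement of this kind could be set up along the following lines. The sketch is not the script used for the experiment: it takes a dependency tree as a list of head indices, assumes that the artificial root (head 0) contributes no edge, computes the PRC with a layered construction under the assumed reading that no PRC edge covers another and every other edge is covered by a PRC edge, and uses invented head lists as input. The function name is hypothetical.

```python
def rope_thickness_of_tree(heads):
    """heads[i - 1] is the head of token i; 0 marks the root (no edge added)."""
    n = len(heads)
    edges = {(min(d, h), max(d, h))
             for d, h in enumerate(heads, start=1) if h != 0}

    def covers(long, short):
        (h, k), (i, j) = long, short
        return (h == i and i < j < k) or (j == k and h < i < j)

    # Layered construction of the proper rope cover.
    cover, remaining = set(), set(edges)
    while remaining:
        layer = {e for e in remaining
                 if not any(covers(f, e) for f in remaining)}
        cover |= layer
        remaining = {e for e in remaining - layer
                     if not any(covers(f, e) for f in layer)}

    # Maximum number of cover edges passing over a vertex i < n.
    return max((sum(1 for (h, j) in cover if h <= i < j)
                for i in range(1, n)), default=0)


if __name__ == "__main__":
    print(rope_thickness_of_tree([2, 0, 2, 5, 3]))  # invented projective tree
    print(rope_thickness_of_tree([3, 4, 0, 3]))     # invented tree with crossing edges
```

Applying such a function to every sentence of a treebank and tabulating the values would give a distribution of the kind summarised in Table 2.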
According to our preliminary experiments on graph banks, a very similar distribution of rope-thickness is observed in more general annotation graphs.

Discussion
Among the earliest encoding schemes for graphs are the Prüfer sequences for labeled trees (Prüfer, 1918) that have been extended to DAGs (Steinsky, 2003). More recently, Turán (1984) introduced the problem of graph representation given an adjacency matrix. There are now some efficient representations for unlabeled and labeled graphs (Turán, 1984; Naor, 1990; Farzan and Munro, 2013). Our representation for digraphs is also efficient: it has a cubic-time encoder and a linear-time decoder.
The currently presented encoding for digraphs is a generalisation of an earlier representation (Yli-Jyrä, 2017, 2019) that is itself an optimized alternative for the balanced bracketing proposed for weighted dependency parsing in Yli-Jyrä (2012). Several edge-weighted parsing algorithms have been presented earlier (Dixon et al., 1992; Charniak et al., 1998; Sasano et al., 2000; Kuhlmann and Jonsson, 2015), but these newer methods apply to up to 50 families of dependency graphs and the currently presented encoding is expected to help in their generalization. It would also be interesting to study how rope graphs relate to 1-endpoint crossing graphs (Pitler et al., 2013; Kurtz and Kuhlmann, 2017).
The languages of encoded graphs have applications to constrained graph enumeration problems. Hoppe and Petrone (2016) have exhaustively enumerated all simple, connected graphs of a finite order and computed a selection of invariants over the sets in order to discover and add 141 new integer sequences to the Online Encyclopedia of Integer Sequences (OEIS). Our previous encoding scheme (Yli-Jyrä and Gómez-Rodríguez, 2017) gave context-free characterisations for some graph properties. This led to the discovery of dozens of known and new integer sequences by graph enumeration. These new computational methods complement the research that spans from the "Abzählsatz" of Pólya (1937) to more recent work on graph enumeration (Wormald, 1979; McKay, 1983; Kapoor and Ramesh, 2000; Acuña et al., 2012; Conte et al., 2018; Equi et al., 2019).
The language $L_{\text{DIGRAPH}}$ is not only context-sensitive but even an indexed language (Aho, 1968): it is possible to construct an indexed grammar that generates the same set of strings. However, the existence of a simple transition system, a finite representation, and a finite indexed grammar for the encoded digraphs should not be confused with the condition under which digraphs themselves become finitely generated. Ogawa (2004) has presented a complete, infinite set of generators for the graphs. We also need an infinite set of generators for the language $L_{\text{DIGRAPH}}$, because the paths that take the transition system from one final configuration to another final configuration constitute an infinite set of code words over which the encoded digraphs are generated. This set remains infinite even for digraphs with bounded rope thickness, but the context-free and regular subsets of $L_{\text{DIGRAPH}}$ may have some other ways to motivate finite algebraic axiomatisations.

Conclusion
This paper contributes to the research on graph representations (Turán, 1984) by developing a linear-time decodable encoding for arbitrary labeled digraphs that we prefer to call ordered digraphs. The particular design of our linear encoding is motivated by the success of similar representations (Yli-Jyrä, 2005, 2012; Yli-Jyrä and Gómez-Rodríguez, 2017) in the characterisations of several families of noncrossing digraphs and by the effectiveness of the recently improved representation (Yli-Jyrä, 2017, 2019). Evidently, both kinds of graph representations have potential applications in graph enumeration (Yli-Jyrä and Gómez-Rodríguez, 2017) and weighted Chomsky-Schützenberger parsing (Yli-Jyrä, 2005, 2012; Hulden, 2009; Yli-Jyrä and Gómez-Rodríguez, 2017; Ruprecht and Denkinger, 2019).
More specifically, the paper contributes a general transition system that decodes arbitrary digraphs from linear action sequences. Two crucial notions, the proper rope cover (PRC) and the related rope decomposition, are defined and used in this transition system. The first PRC-based measure for the complexity of the graphs is introduced. Context-free and regular approximations of the encoded graphs are defined and shown to contain the dependency annotations of the UD 2.4 treebanks.