Graph Transductions and Typological Gaps in Morphological Paradigms

Several typological gaps have attracted a lot of interest in the linguistic literature recently. These concern the Person Case Constraint and the absence of ABA patterns in adjectival gradation, pronoun suppletion, case syncretism, and singular noun allomorphy, among others. This paper is the first to provide a unified explanation of all these phenomena, and it does so via weakly non-inverting graph transductions. A pattern P is absent from the typology whenever such transductions cannot produce the graph corresponding to P from some fixed underlying base graph. I show that weakly non-inverting graph transductions are particularly simple from a computational perspective, and consequently all these typological gaps follow from general simplicity desiderata.


Introduction
One peculiar property of natural language is that its typology rarely covers the full range of logically possible options. The Person Case Constraint (PCC), for instance, blocks certain combinations of direct objects (DOs) and indirect objects (IOs) based on their person specification.
(1) a. * Roger
A similar case of limited variation is the *ABA generalization, which was first stated by Bobaljik (2012) with respect to adjectival gradation. While many adjectives have regular comparative and superlative forms (smart, smarter, smartest), some adjectives display stem suppletion (good, better, best). Bobaljik (2012) claims that there are no languages where the comparative is suppletive while the superlative is regular (good, better, goodest); in other words, there are no ABA patterns. Since then the *ABA generalization has been observed in a large number of morphological paradigms, and many proposals have been put forward to explain the absence of ABA patterns.
However, cases of limited variation like the PCC and the *ABA generalization have not received much attention from mathematical linguists. One reason may be that these restrictions on natural languages do not seem to line up with the usual notions of generative capacity, computational complexity, learnability, or minimal description length. The PCC, for instance, is utterly unremarkable from a formal perspective: the constrained elements are string-adjacent clitics, and the sets of permitted and blocked configurations are both finite. As a result, every PCC variant is strictly 2-local over strings (McNaughton and Papert, 1971), making it even less complex than simple phonological processes such as intervocalic voicing and locally bounded vowel harmony (Heinz, 2015). Since different PCCs only vary in which of the six possible IO-DO combinations they allow, there are no quantifiable consequences for learnability, either. The tools of mathematical linguistics are geared towards vertical variation (hierarchies of expressivity and complexity), whereas phenomena like the PCC and the *ABA generalization pertain to horizontal variation, i.e. limitations that seem arbitrary and pointless from a computational perspective. I argue in this paper that mathematical linguistics does in fact have a lot to say about such cases of horizontal variation. Not only does a mathematically informed perspective allow for a level of abstraction where the PCC and the *ABA generalization can be given a unified explanation, it even allows us to derive the limits of variation from computational considerations. Contrary to initial appearances, then, horizontal variation turns out to be interwoven with vertical variation upon closer inspection.
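To make the strict 2-locality claim concrete, here is a minimal sketch that assumes a hypothetical token encoding in which each clitic is written as its role plus its person (e.g. `IO1`, `DO3`) and the IO clitic immediately precedes the DO clitic; neither the encoding nor the function name comes from the literature. A strictly 2-local grammar is simply a finite set of forbidden bigrams, instantiated here for the S-PCC.

```python
def sl2_accepts(tokens, forbidden):
    """Strictly 2-local check: a string is rejected iff some pair of adjacent
    symbols (including the word boundary '#') is a forbidden bigram."""
    padded = ["#"] + list(tokens) + ["#"]
    return all((a, b) not in forbidden for a, b in zip(padded, padded[1:]))


# S-PCC: DO must be 3, so every IO-DO bigram with a 1st or 2nd person DO is
# forbidden (same-person combinations are set aside, as in the text).
S_PCC = {(f"IO{i}", f"DO{d}") for i in (1, 2, 3) for d in (1, 2) if i != d}

print(sl2_accepts(["IO1", "DO3"], S_PCC))  # True
print(sl2_accepts(["IO2", "DO1"], S_PCC))  # False
print(sl2_accepts(["IO3", "DO2"], S_PCC))  # False
```

Because the set of blocked bigrams is finite, the other PCC variants differ only in which bigrams are listed, not in the shape of the check.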
More concretely, I show that both the PCC and the *ABA generalization can be decomposed into two components: a base hierarchy that is represented by a graph, and a graph transduction that produces a language-specific ordering from the base hierarchy. The restrictions on cross-linguistic variation arise from limitations on how the graph transduction may change the ordering relations in the base hierarchy. These limitations, in turn, guarantee that the transductions belong to an especially weak class of mappings. Viewed from the perspective of strings, they are input strictly 1-local relations (Chandlee, 2014).
The paper is laid out as follows. The few required basics of graph theory are summarized and exemplified in Sec. 2 in an effort to accommodate readers from various backgrounds. I then discuss Graf's (2014) algebraic account of the PCC (Sec. 3), which forms the basis of my graph-theoretic analysis. This analysis is subsequently extended to a number of phenomena in Sec. 4. All of them are instances of the *ABA generalization or at least closely related to it: adjectival gradation, pronoun allomorphy, case syncretism, and noun stem allomorphy. With the full analysis in place, I then turn to the computational investigation (Sec. 5) and address some methodological concerns about the viability of studying horizontal variation across natural languages (Sec. 6).

Preliminaries
Even though the paper presupposes only minimal familiarity with graph theory, I include a slightly more accessible explanation of the basic concepts due to the interdisciplinary subject matter, which might attract readers without the expected mathematical background. The reader can safely skip this section if they are not puzzled by terms like weakly connected graph and graph transduction.
A directed graph G := ⟨V, E⟩ consists of a set V of vertices and a set E ⊆ V × V of edges that connect these vertices. Both V and E may be empty, so there is no requirement for a graph to contain any vertices or for any of its vertices to be connected by edges. We say that vertex v is immediately reachable from vertex u iff there is an edge from u to v (i.e. ⟨u, v⟩ ∈ E). In the special case where u = v the edge is called a loop. Furthermore, v is reachable from u iff there are vertices v_0, v_1, . . . , v_n (n ≥ 1) such that v_0 = u, v_n = v, and v_{i+1} is immediately reachable from v_i for all 0 ≤ i < n. Reachability thus holds iff ⟨u, v⟩ ∈ E⁺, where E⁺ is the transitive closure of E. In this case we also write u ⊳ v. If a vertex is reachable from itself, the graph contains a cycle.
As an example, consider the directed graph G whose set of vertices is {1, 2, 3, 4} and whose set of edges is {⟨1, 2⟩, ⟨2, 1⟩, ⟨1, 3⟩, ⟨2, 3⟩}. Therefore 2 is immediately reachable from 1, and the other way round. We also see that 3 is immediately reachable from 1 and 2, but no vertex is immediately reachable from 3. Moreover, 1 is reachable from itself even though it is not immediately reachable from itself. This is the case because 2 is immediately reachable from 1 and from there we can immediately reach 1. Formally, we have ⟨1, 2⟩ ∈ E and ⟨2, 1⟩ ∈ E, which implies ⟨1, 1⟩ ∈ E⁺. This also entails that G contains a cycle even though there is no loop ⟨1, 1⟩ ∈ E.
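These reachability facts can be checked mechanically. The following Python sketch computes E⁺ by naive fixpoint iteration, chosen for transparency rather than efficiency (the function name is illustrative), and confirms that the example graph has a cycle but no loop.

```python
def transitive_closure(edges):
    """Return E+, the transitive closure of an edge set E of pairs (u, v)."""
    reach = set(edges)
    changed = True
    while changed:
        changed = False
        for (u, v) in list(reach):
            for (w, x) in list(reach):
                # u reaches v and v reaches x, so u reaches x
                if v == w and (u, x) not in reach:
                    reach.add((u, x))
                    changed = True
    return reach


V = {1, 2, 3, 4}
E = {(1, 2), (2, 1), (1, 3), (2, 3)}
E_plus = transitive_closure(E)

has_loop = any(u == v for (u, v) in E)         # a loop is an edge <u, u> in E
has_cycle = any((v, v) in E_plus for v in V)   # a cycle: some vertex reaches itself

print(sorted(E_plus))  # [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)]
print(has_loop, has_cycle)  # False True
```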
A graph is undirected iff its edge relation is symmetric: in an undirected graph, u is (immediately) reachable from v iff v is (immediately) reachable from u. A directed graph is
• connected iff every node is reachable from every other node;
• weakly connected iff adding ⟨v, u⟩ to E for every ⟨u, v⟩ ∈ E yields an undirected graph that is connected.
The example graph G above is not connected because I) 1 and 2 are not reachable from 3, and II) no node can reach 4 or be reached from 4. Due to II, G is not weakly connected either. If G were weakly connected, then we could turn it into a connected undirected graph by adding the symmetric counterpart of every existing edge. But this only grows the edge relation E of G from {⟨1, 2⟩, ⟨2, 1⟩, ⟨1, 3⟩, ⟨2, 3⟩} to {⟨1, 2⟩, ⟨2, 1⟩, ⟨1, 3⟩, ⟨3, 1⟩, ⟨2, 3⟩, ⟨3, 2⟩}. The resulting graph is still not connected because there is no edge from or to 4. But if 4 were removed from the set of vertices, the graph would indeed be weakly connected (but not connected).
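Both notions of connectedness can be tested directly from their definitions. In this sketch (helper names are hypothetical), `is_connected` requires every vertex to reach every other vertex, and `is_weakly_connected` first adds the symmetric counterpart of each edge.

```python
def reachable(edges, start):
    """Vertices reachable from start via one or more edges."""
    seen, frontier = set(), [start]
    while frontier:
        u = frontier.pop()
        for (a, b) in edges:
            if a == u and b not in seen:
                seen.add(b)
                frontier.append(b)
    return seen


def is_connected(vertices, edges):
    # every vertex must reach every *other* vertex
    return all(vertices - {u} <= reachable(edges, u) for u in vertices)


def is_weakly_connected(vertices, edges):
    # add the symmetric counterpart of every edge, then test connectedness
    symmetric = set(edges) | {(v, u) for (u, v) in edges}
    return is_connected(vertices, symmetric)


E = {(1, 2), (2, 1), (1, 3), (2, 3)}
print(is_connected({1, 2, 3, 4}, E), is_weakly_connected({1, 2, 3, 4}, E))  # False False
print(is_connected({1, 2, 3}, E), is_weakly_connected({1, 2, 3}, E))        # False True
```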
A graph transduction τ is any computable binary relation between graphs. In this paper, however, I only consider transductions that do not change the set of vertices. Given a graph G, τ(G) := {G′ | ⟨G, G′⟩ ∈ τ}. In order to distinguish reachability in G from reachability in some G′ ∈ τ(G), I sometimes use the symbol ▶ instead of ⊳.
Graph transductions generalize string transductions and tree transductions from strings and trees to arbitrary graphs. String transductions are closely related to phonological and morphological rewrite rules (Johnson, 1972; Kaplan and Kay, 1994; Mohri, 1997; Chandlee, 2014, 2016). Tree transductions are the formal counterpart to syntactic transformations, as is explicitly mentioned in Rounds (1970), one of the earliest papers on tree transducers; others include Engelfriet (1975) and Baker (1978; 1979). For modern surveys see Knight (2007) and Maletti (2010). For the purposes of this paper, the technical aspects of graph transductions are of little concern. The only relevant point is that just like string and tree transductions, graph transductions differ in their computational requirements so that some transductions are easier to compute than others. For a more formal perspective on graph transductions, the reader is referred to Courcelle (1992) and Courcelle and Engelfriet (2012).

Person Case Constraint
The vantage point for this project is the algebraic analysis of the Person Case Constraint (PCC) in Graf (2014). Once the analysis is recast in graph-theoretic terms, it is easily extended to the *ABA generalization in Sec. 4. From a didactic perspective this order of topics is slightly lopsided because Graf's (2014) PCC treatment is more complex than the morphological paradigms I extend it to. But it is still fairly simple, and mastering the complex case first will greatly speed up the discussion of the simpler phenomena later on.
As I mentioned in the introduction, the PCC renders the well-formedness of DO-IO combinations contingent on their person specifications. Four PCCs are attested in the literature (Walkow, 2012). Using 1, 2, and 3 as shorthands for first, second, and third person, respectively, they are defined as follows:
S(trong)-PCC: DO must be 3. (Bonet, 1994)
U(ltrastrong)-PCC: DO is less prominent than IO, where 3 is less prominent than 2, and 2 is less prominent than 1. (Nevins, 2007)
W(eak)-PCC: 3IO combines only with 3DO. (Bonet, 1994)
M(e first)-PCC: If IO is 2 or 3, then DO is not 1. (Nevins, 2007)
Note that cases where IO and DO have the same person feature are frequently treated separately in the literature, so I will not consider them here either. Graf (2014) provides a mathematical account of the PCC that gradually moves from presemilattices as a purely descriptive device to a more theoretical proposal that can be recast in graph-theoretic terms. Rather than reiterate this gradual development, I immediately skip ahead to the three essential components of the final account.
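The four PCC definitions can be stated as predicates over IO-DO pairs, which makes it easy to list the combinations each variant permits. The encoding below (persons as integers, a dictionary keyed by variant name) is an illustrative assumption, not part of the original proposals.

```python
# Persons are encoded as integers; a combination is a pair (IO, DO).
# Same-person combinations are excluded, as in the text.
COMBOS = [(io, do) for io in (1, 2, 3) for do in (1, 2, 3) if io != do]

PCC = {
    # S-PCC: DO must be 3
    "S": lambda io, do: do == 3,
    # U-PCC: DO less prominent than IO, with 3 < 2 < 1 (higher number = less prominent)
    "U": lambda io, do: do > io,
    # W-PCC: a 3rd person IO combines only with a 3rd person DO
    "W": lambda io, do: io != 3 or do == 3,
    # M-PCC: if IO is 2 or 3, then DO is not 1
    "M": lambda io, do: not (io in (2, 3) and do == 1),
}

allowed = {name: sorted(c for c in COMBOS if ok(*c)) for name, ok in PCC.items()}
for name, combos in allowed.items():
    print(f"{name}-PCC allows (IO, DO): {combos}")
```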
1. All variants of the PCC are subsumed under the G(eneralized)-PCC, which states that IO must not be (strictly) less prominent than DO, i.e. IO < DO is banned. This constraint will produce exactly the four attested PCC variants if combined with the directed graphs in Fig. 1, where m is more prominent than n (n < m) iff n is reachable from m.
2. The independently motivated person hierarchy 3 < 2 < 1 of Zwicky (1977) is posited as a universal base ordering for person. From our perspective, Zwicky's person hierarchy is identical to the graph for the U-PCC.
3. The four PCC-specific prominence rankings in Fig. 1 are obtained from Zwicky's hierarchy by a graph transduction τ that adds or removes edges while preserving three essential properties of the base structure. In the following, ⊳ and ▶ denote the transitive closures of the edge relations in the input and output graph, respectively, and all graphs are assumed to contain no loops.
Weak connectedness: The output graph produced by τ must be weakly connected.
Weak maximality: If there is no y such that y ⊳ x, then z ▶ x only if we also have x ▶ z.
Strong minimality: If there is no y such that x ⊳ y, then there is no z such that x ▶ z.
The reader is invited to verify for themselves that the four graphs in Fig. 1, and only those, can be obtained from Zwicky's person hierarchy without violating any of the three constraints above.
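The invitation to verify this claim can be taken up by brute force. The sketch below identifies each graph with its reachability relation (since < is defined via reachability), enumerates all 64 loop-free graphs on {1, 2, 3}, keeps those satisfying weak connectedness, weak maximality, and strong minimality relative to Zwicky's hierarchy (encoded as edges 1 → 2 → 3, so that 3 < 2 < 1), and applies the G-PCC to each survivor. All names are illustrative.

```python
from itertools import combinations

V = (1, 2, 3)
PAIRS = [(u, v) for u in V for v in V if u != v]   # all loop-free edges
BASE = {(1, 2), (2, 3)}                            # Zwicky: 1 -> 2 -> 3, i.e. 3 < 2 < 1
COMBOS = [(io, do) for io in V for do in V if io != do]


def closure(edges):
    """Transitive closure of a set of edges."""
    reach = set(edges)
    while True:
        new = {(a, d) for (a, b) in reach for (c, d) in reach if b == c}
        if new <= reach:
            return reach
        reach |= new


def weakly_connected(R):
    sym = closure(R | {(v, u) for (u, v) in R})
    return all((u, v) in sym for u in V for v in V if u != v)


B = closure(BASE)
base_max = [x for x in V if not any((y, x) in B for y in V)]  # vertex 1
base_min = [x for x in V if not any((x, y) in B for y in V)]  # vertex 3


def weak_maximality(R):
    # anything that reaches a base-maximal x must also be reached by x
    return all((x, z) in R for x in base_max for z in V if (z, x) in R)


def strong_minimality(R):
    # a base-minimal x must not reach anything in the output
    return not any((x, z) in R for x in base_min for z in V)


def g_pcc_allowed(R):
    # IO < DO iff IO is reachable from DO; the G-PCC bans exactly these pairs
    return frozenset(c for c in COMBOS if (c[1], c[0]) not in R)


surviving = set()
for r in range(len(PAIRS) + 1):
    for edges in combinations(PAIRS, r):
        R = closure(edges)
        if weakly_connected(R) and weak_maximality(R) and strong_minimality(R):
            surviving.add(frozenset(R))

EXPECTED = {
    "S": {(1, 3), (2, 3)},
    "U": {(1, 2), (1, 3), (2, 3)},
    "W": {(1, 2), (1, 3), (2, 1), (2, 3)},
    "M": {(1, 2), (1, 3), (2, 3), (3, 2)},
}
names = {frozenset(v): k for k, v in EXPECTED.items()}

print(len(surviving))  # 4 distinct reachability relations
for R in surviving:
    print(names.get(g_pcc_allowed(R), "unattested"), sorted(R))
```

Each of the four surviving reachability relations should be labeled with one of S, U, W, and M, and none with "unattested".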
As an example of how this account enforces a specific PCC, consider the S-PCC, which only allows IO-DO combinations if DO is 3. Hence the only allowed combinations are IO1-DO3 and IO2-DO3. The G-PCC bans IO < DO, and the graph for the S-PCC establishes 2 < 1, 1 < 2, 3 < 1, and 3 < 2. So any instance where DO is 2 or 1 necessarily results in a violation of the G-PCC: if DO is 1, IO cannot be 2 or 3, and if DO is 2, IO cannot be 1 or 3. With a third person DO, on the other hand, IO can freely vary between 1 and 2 because neither 1 < 3 nor 2 < 3 holds. Consequently, the only allowed combinations are indeed those where DO is 3.
This graph-based account is remarkably simple in comparison to syntactic proposals, which not only have to capture the typological variation but must also provide a syntactic encoding for both the G-PCC and the person hierarchy (Anagnostopoulou, 2005; Adger and Harbour, 2007; Nevins, 2007). The specificity of linguistic proposals has certain advantages, as I discuss at the end of Sec. 5, but it also comes with its fair share of problems that the graph-theoretic view avoids. In particular, abstracting away from the details of syntactic implementation provides a greater degree of flexibility and makes the account more accommodating to new data. For example, recent results from Slovenian (Stegovec, 2016) suggest that there are inverted variants of the PCC where the G-PCC is DO < IO instead of IO < DO. Most syntactic accounts are entirely built around the idea that all instances of the PCC involve an IO < DO asymmetry, and thus they must now either be rethought from the ground up or reinterpret the Slovenian data. The graph-based proposal, by contrast, ends up even less complex because the very specific G-PCC has now been reduced to a general ban against prominence mismatches: x < y, with languages differing in how they instantiate x and y as IO and DO.
For the purposes of this paper, however, the more interesting aspect of the graph-theoretic view is how it captures typological variation in the PCC: languages all start out with the same base hierarchy but may modify it as long as the distinguished roles of the top and bottom positions are not completely destroyed. In fact, weak maximality and strong minimality are instances of more general order preservation properties.
Strongly non-inverting: If x ⊳ y, then it is not the case that y ▶ x.
Weakly non-inverting: If x ⊳ y, then y ▶ x only if x ▶ y.
The transductions that produce the PCC graphs in Fig. 1 are weakly non-inverting but go a little bit beyond that because they are all strongly non-inverting with respect to 3.
If weak maximality and strong minimality were completely replaced by the property of being weakly non-inverting, this would allow for several new graphs. However, the prominence ranking < is defined in terms of reachability rather than immediate reachability, and all the new graphs turn out to define the same reachability relations as one of the two graphs depicted in Fig. 2. One is a variant of the U-PCC where we also have 1 < 2 and 2 < 3, the other a version of the M-PCC with 2 < 3 and 3 < 2. As noted in Graf (2014), the former is actually attested in Cairene Arabic as a ban against all DO-IO combinations (Shlonsky, 1997). Although this phenomenon may not be a genuine PCC, we may classify it as the I(ndiscriminate)-PCC. The second PCC variant changes the Me first-PCC into a Me second-PCC. This M2-PCC is still unattested. Whether the mathematically more pleasing notion of being weakly non-inverting fully captures the PCC thus has to remain an open question.
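The effect of replacing weak maximality and strong minimality by weak non-inversion can be explored in the same brute-force fashion. This self-contained sketch (illustrative names, graphs again identified with their reachability relations) should find six distinct reachability relations: the four PCC graphs plus the I-PCC, which bans every combination, and the M2-PCC, which only allows a first person IO.

```python
from itertools import combinations

V = (1, 2, 3)
PAIRS = [(u, v) for u in V for v in V if u != v]   # all loop-free edges
COMBOS = [(io, do) for io in V for do in V if io != do]


def closure(edges):
    """Transitive closure of a set of edges."""
    reach = set(edges)
    while True:
        new = {(a, d) for (a, b) in reach for (c, d) in reach if b == c}
        if new <= reach:
            return reach
        reach |= new


B = closure({(1, 2), (2, 3)})   # reachability in Zwicky's hierarchy


def weakly_connected(R):
    sym = closure(R | {(v, u) for (u, v) in R})
    return all((u, v) in sym for u in V for v in V if u != v)


def weakly_non_inverting(R):
    # whenever x reaches y in the base, y may reach x in the output only if
    # x still reaches y in the output
    return all((x, y) in R for (x, y) in B if (y, x) in R)


def g_pcc_allowed(R):
    # the G-PCC bans IO < DO, i.e. IO reachable from DO
    return frozenset(c for c in COMBOS if (c[1], c[0]) not in R)


surviving = {
    frozenset(R)
    for r in range(len(PAIRS) + 1)
    for edges in combinations(PAIRS, r)
    for R in [closure(edges)]
    if weakly_connected(R) and weakly_non_inverting(R)
}

print(len(surviving))  # 6
for R in surviving:
    print(sorted(g_pcc_allowed(R)))
```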
Even though not all weakly non-inverting graph transductions may be suitable for the PCC, it is certainly the case given our current data that all PCC graph transductions are weakly non-inverting. In the next section, I argue that many aspects of morphology are also closely tied to weakly non-inverting graph transductions. In particular, the restriction to weakly non-inverting graph transductions is sufficient to derive the ban against ABA patterns that has attracted a great deal of attention since Bobaljik (2012).

Stem Suppletion in Adjectival Gradation
The *ABA generalization refers to a particular gap in various morphological paradigms. Given a morphological subsystem where one may posit an underlying hierarchy x < y < z, z cannot pattern with x to the exclusion of y. At the beginning of this paper I already presented an example from suppletion in adjectival gradation, analyzed in great depth in Bobaljik (2012). Bobaljik points out that if a language allows for stem suppletion in either comparatives or superlatives, it must allow for both. Data illustrating this generalization are given in Tab. 1. If one follows the convention to list the three forms in the order positive, comparative, superlative and uses letters to indicate which forms use the same stem, one can decompose the gap into two constraints, *AAB and *ABA.
In Bobaljik (2012), these constraints are explained via structural assumptions. Bobaljik decomposes adjectival forms into a tree template such that comparatives contain the positive base form as a subtree and are in turn themselves subtrees of the corresponding superlative forms. Then *AAB and *ABA follow from specific assumptions about the rewrite rules (tree-to-string transducers in computational terms) that map these tree structures to the output string. Bobaljik and Sauerland (2017) provide a less stipulative explanation grounded in the combinatorics of feature systems, which is closer to my graph-theoretic proposal (although they ultimately reject this content-agnostic solution in favor of the structural account). Both works, however, agree that *ABA is the more important constraint of the two: *AAB seems to be specific to adjectival suppletion whereas *ABA holds for many morphological paradigms (more on that in the next subsections).
The increased importance of *ABA relative to *AAB is noteworthy because the former is indeed simpler than the latter from the perspective of graph transductions. Suppose that there is a universal underlying hierarchy of the form H_U := positive < comparative < superlative, which we may identify with the U-PCC graph and thus abbreviate as 1 < 2 < 3 (I stipulate that x < y iff x ⊳ y rather than y ⊳ x in order to stay close to linguistic intuitions, but this is immaterial for the actual account). Assume furthermore that two forms m and n of an adjective involve the same (original or suppletive) stem only if m < n and n < m in the language-specific hierarchy H_L. Applying this idea to the graphs in Fig. 1 and 2 yields the stem patterns listed in Tab. 2. Crucially, ABA is not among these patterns. So the ABA pattern cannot be produced from the 1 < 2 < 3 base order assuming that the graph transductions
• are weakly non-inverting, and
• produce weakly connected graphs, and
• do not relabel any nodes, and
• do not delete any nodes.
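The step from graphs to stem patterns can be sketched as follows, under the idealizing assumption that mutually reachable vertices always share a stem and all other vertices receive distinct stems. Vertices 1, 2, 3 stand for positive, comparative, superlative, and the sketch enumerates every weakly connected output of a weakly non-inverting transduction from H_U. The helper names are hypothetical.

```python
from itertools import combinations

V = (1, 2, 3)                      # 1 = positive, 2 = comparative, 3 = superlative
PAIRS = [(u, v) for u in V for v in V if u != v]
B = {(1, 2), (2, 3), (1, 3)}       # reachability in H_U (already transitively closed)


def closure(edges):
    """Transitive closure of a set of edges."""
    reach = set(edges)
    while True:
        new = {(a, d) for (a, b) in reach for (c, d) in reach if b == c}
        if new <= reach:
            return reach
        reach |= new


def weakly_connected(R):
    sym = closure(R | {(v, u) for (u, v) in R})
    return all((u, v) in sym for u in V for v in V if u != v)


def weakly_non_inverting(R):
    return all((x, y) in R for (x, y) in B if (y, x) in R)


def stem_pattern(R):
    """Idealization: mutually reachable vertices share a stem, all others differ."""
    letters = {}
    for v in V:
        shared = [u for u in letters if (u, v) in R and (v, u) in R]
        letters[v] = letters[shared[0]] if shared else "ABC"[len(set(letters.values()))]
    return "".join(letters[v] for v in V)


patterns = set()
for r in range(len(PAIRS) + 1):
    for edges in combinations(PAIRS, r):
        R = closure(edges)
        if weakly_connected(R) and weakly_non_inverting(R):
            patterns.add(stem_pattern(R))

print(sorted(patterns))   # ['AAA', 'AAB', 'ABB', 'ABC']
print("ABA" in patterns)  # False
```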
Most of these assumptions are innocent from a linguistic perspective. Deletion of nodes makes no sense in this case as it would amount to removing the positive, comparative, or superlative form, but we are only interested in languages with all three forms because the * ABA generalization is trivially satisfied otherwise. Relabeling nodes would create an "anything goes" scenario where adjectival gradation hierarchies could even be mapped to person and number with no rhyme or reason. And the output graphs must be weakly connected because hierarchies in natural language never allow for elements that are completely unordered with respect to the other elements in the hierarchy. This leaves only two non-trivial assumptions that do the actual work of blocking ABA patterns: graph transductions must be weakly non-inverting, and the base hierarchy is H U := positive < comparative < superlative.
Note that positing this hierarchy does not entail that the ordering needs to be reflected in the structure of adjectives as proposed by Bobaljik (2012). Instead, the hierarchy may be taken to reflect the semantics of these constructions or arise from some other unknown factor. For our purposes, it only matters that we have such an underlying base hierarchy, not what its origins may be. And this is not a peculiarity of this approach: even stating the *ABA generalization for purely descriptive purposes presupposes this order. If one instead assumed an order of, say, comparative < superlative < positive, then the banned pattern would be BAA instead of ABA. But BAA is equivalent to ABB, which is allowed in many other morphological paradigms that have nothing to do with adjectives. So an implicit commitment to H_U is required whenever one seeks to analyze adjectival stem suppletion as an instance of the general ban against ABA patterns.
As a matter of fact, though, our finding can be strengthened so that it is compatible with a number of underlying hierarchies rather than just H_U. As long as the directed graph we start with is one of the connected PCC graphs, the ABA pattern cannot be produced.
Theorem 1. Let τ be a non-deleting, weakly non-inverting graph transduction that does not relabel any nodes and only produces connected graphs, and let S be one of the connected graphs in Fig. 1 and 2. Then no G ∈ τ(S) allows for the ABA pattern.
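Proof. Recall that by definition two vertices u and v may have the same realization iff u ▶ v and v ▶ u. Therefore the ABA pattern can only be produced by graphs where both 1 ▶ 3 and 3 ▶ 1 hold but for all x ∈ {1, 3}, x ▶ 2 holds only if 2 ▶ x does not. Each choice of S satisfies 1 ⊳ 2 and 2 ⊳ 3. Now 3 ▶ 2 implies 2 ▶ 3 because τ is weakly non-inverting, and likewise 2 ▶ 1 implies 1 ▶ 2; either way 2 would share its realization with 3 or 1. Hence neither 3 ▶ 2 nor 2 ▶ 1 holds. But then 1 ▶ 2 is impossible, too, since together with 3 ▶ 1 it would yield 3 ▶ 2, and 2 ▶ 3 is impossible since together with 3 ▶ 1 it would yield 2 ▶ 1. So 2 is unrelated to both 1 and 3, and the output graph is not even weakly connected. Consequently, no graph with the ABA pattern can be produced by τ from any of the four choices for S without deleting 2 or relabeling nodes.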
The proof reveals that the *ABA generalization is compatible with any universal base hierarchy that specifies at least positive ≤ comparative and comparative ≤ superlative.
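Theorem 1 and the remark after it can be cross-checked exhaustively for three vertices. The sketch below takes every base graph in which positive reaches comparative and comparative reaches superlative (these turn out to be exactly four reachability relations), enumerates all weakly connected outputs of weakly non-inverting transductions, and searches for an ABA pattern, again assuming that exactly the mutually reachable vertices share a stem. It only requires weak connectedness, which is weaker than the connectedness assumed in the theorem, so a negative result is if anything stronger. For contrast, it also checks that ABA does appear once weak non-inversion is dropped. All names are illustrative.

```python
from itertools import combinations

V = (1, 2, 3)                        # 1 = positive, 2 = comparative, 3 = superlative
PAIRS = [(u, v) for u in V for v in V if u != v]


def closure(edges):
    """Transitive closure of a set of edges."""
    reach = set(edges)
    while True:
        new = {(a, d) for (a, b) in reach for (c, d) in reach if b == c}
        if new <= reach:
            return reach
        reach |= new


def weakly_connected(R):
    sym = closure(set(R) | {(v, u) for (u, v) in R})
    return all((u, v) in sym for u in V for v in V if u != v)


def stem_pattern(R):
    """Idealization: mutually reachable vertices share a stem, all others differ."""
    letters = {}
    for v in V:
        shared = [u for u in letters if (u, v) in R and (v, u) in R]
        letters[v] = letters[shared[0]] if shared else "ABC"[len(set(letters.values()))]
    return "".join(letters[v] for v in V)


# every reachability relation on V that arises from some loop-free graph
CLOSURES = {
    frozenset(closure(edges))
    for r in range(len(PAIRS) + 1)
    for edges in combinations(PAIRS, r)
}

# candidate bases: positive reaches comparative, comparative reaches superlative
BASES = [B for B in CLOSURES if {(1, 2), (2, 3)} <= B]


def outputs(base, non_inverting=True):
    """Weakly connected outputs, optionally required to be weakly non-inverting."""
    for R in CLOSURES:
        if not weakly_connected(R):
            continue
        if non_inverting and not all((x, y) in R for (x, y) in base if (y, x) in R):
            continue
        yield R


aba_with = any(stem_pattern(R) == "ABA" for B in BASES for R in outputs(B))
aba_without = any(stem_pattern(R) == "ABA" for B in BASES for R in outputs(B, False))
print(len(BASES), aba_with, aba_without)  # 4 False True
```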
While *ABA follows immediately if the graph transductions must be weakly non-inverting, *AAB is much harder to derive. As shown in Tab. 2, AAB patterns are produced by the S-PCC graph. In order to block this graph, one has to disallow 2 ▶ 1. But then the I-PCC graph would be blocked, too. This only leaves the stipulative option of banning 2 ▶ 1 unless 3 ▶ 1. Intuitively, this states that 1 loses its privileged status only if 1, 2, and 3 are all equally prominent. Just as in the case of the PCC, then, we have to slightly strengthen the requirements on the graph transduction to avoid overgeneration. That this strengthening pertained to 3 in the case of the PCC but to 1 in the case of adjectives is not significant since the two are each other's duals. We could just as well have identified H_U with the inverse of the U-PCC graph and obtained a strengthening requirement with respect to 3 this way.
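The effect of the extra stipulation can be sketched by adding it as a filter on the outputs. Under the same idealized stem assignment (mutually reachable vertices share a stem), banning the case where 2 reaches 1 but 3 does not should remove AAB and leave exactly AAA, ABB, and ABC. The function names and the encoding of the stipulation are hypothetical.

```python
from itertools import combinations

V = (1, 2, 3)                        # 1 = positive, 2 = comparative, 3 = superlative
PAIRS = [(u, v) for u in V for v in V if u != v]
B = {(1, 2), (2, 3), (1, 3)}         # reachability in H_U


def closure(edges):
    """Transitive closure of a set of edges."""
    reach = set(edges)
    while True:
        new = {(a, d) for (a, b) in reach for (c, d) in reach if b == c}
        if new <= reach:
            return reach
        reach |= new


def admissible(R):
    """Weakly connected output of a weakly non-inverting transduction."""
    sym = closure(R | {(v, u) for (u, v) in R})
    connected = all((u, v) in sym for u in V for v in V if u != v)
    non_inverting = all((x, y) in R for (x, y) in B if (y, x) in R)
    return connected and non_inverting


def adjectival_stipulation(R):
    # 2 may reach 1 in the output only if 3 also reaches 1
    return (2, 1) not in R or (3, 1) in R


def stem_pattern(R):
    letters = {}
    for v in V:
        shared = [u for u in letters if (u, v) in R and (v, u) in R]
        letters[v] = letters[shared[0]] if shared else "ABC"[len(set(letters.values()))]
    return "".join(letters[v] for v in V)


outputs = [
    R
    for r in range(len(PAIRS) + 1)
    for edges in combinations(PAIRS, r)
    for R in [closure(edges)]
    if admissible(R)
]
unrestricted = {stem_pattern(R) for R in outputs}
restricted = {stem_pattern(R) for R in outputs if adjectival_stipulation(R)}
print(sorted(unrestricted))  # ['AAA', 'AAB', 'ABB', 'ABC']
print(sorted(restricted))    # ['AAA', 'ABB', 'ABC']
```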
Putting aside these minor details, we can now say with certainty that the PCC and the *ABA generalization are remarkably similar from a graph-theoretic perspective. Both operate within a class of graphs that are obtained from an underlying base order by some weakly non-inverting graph transduction. Each one puts an additional restriction on the transduction, and in each case the restriction is designed to preserve the special status of an element at the top/bottom of the underlying hierarchy.

Other Morphological Paradigms
As mentioned earlier on, the ban against ABA patterns also holds with respect to other morphological paradigms. Some of those can be explained in exactly the same manner as the ABA ban with adjectives, whereas others require minor modifications.
Pronoun allomorphy The simplest case arises with pronoun allomorphy. Harbour (2015) conducts an extensive survey of pronoun systems and shows that all of them adopt one of four systems with respect to person:
• all persons are the same (AAA),
• first and second person are the same (AAB),
• second and third person are the same (ABB),
• all persons are different (ABC).
Again the ABA pattern is missing, and this fact is expected if graph transductions must be weakly non-inverting and the underlying person hierarchy fixes 3 < 2 and 2 < 1, as we already had to assume for the PCC. However, a quick glance at Tab. 2 reveals that an even stronger result holds: AAA, AAB, ABB, and ABC are exactly the patterns that can be generated under our account. Pronominal systems, then, are the first instance where our base assumptions give a full characterization of the morphological paradigm and no extra stipulations are needed.
Case syncretism Caha (2009; 2013) proposes the Strong Case Contiguity Hypothesis according to which case syncretism may only target contiguous areas of Blake's Case Hierarchy (Blake, 2001): This means that a language may mark, say, accusative, genitive, dative and instrumental the same, but not accusative and instrumental to the exclusion of dative and genitive. In other words, the Strong Case Contiguity Hypothesis extends the *ABA generalization beyond systems with three-way contrasts.
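Contiguity itself is easy to state procedurally: a syncretism pattern respects the Strong Case Contiguity Hypothesis iff each distinct form covers an unbroken stretch of the hierarchy. The sketch below assumes a truncated linear order NOM, ACC, GEN, DAT, INS, chosen only to be consistent with the examples in the text; the full hierarchy should be taken from Blake (2001) and Caha (2009). Form labels are arbitrary placeholders. Restricted to three cases, the check amounts to *ABA.

```python
# Assumed, truncated hierarchy (illustrative only).
HIERARCHY = ["NOM", "ACC", "GEN", "DAT", "INS"]


def contiguous(paradigm):
    """paradigm maps each case in HIERARCHY to a form label.
    True iff every form occupies an unbroken stretch of the hierarchy."""
    positions = {}
    for i, case in enumerate(HIERARCHY):
        positions.setdefault(paradigm[case], []).append(i)
    # indices are collected in increasing order, so a stretch is unbroken
    # iff its span equals the number of cases it covers
    return all(idx[-1] - idx[0] + 1 == len(idx) for idx in positions.values())


acc_to_ins = {"NOM": "x", "ACC": "y", "GEN": "y", "DAT": "y", "INS": "y"}
acc_ins_only = {"NOM": "x", "ACC": "y", "GEN": "z", "DAT": "w", "INS": "y"}
print(contiguous(acc_to_ins))    # True
print(contiguous(acc_ins_only))  # False: ACC and INS skip GEN and DAT
```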
Using Blake's Case Hierarchy as a baseline, it is possible with our current assumptions to generate graphs that instantiate some ABA patterns. Two examples are displayed in Fig. 3. A notable property of these graphs is that they are not connected, even though they are weakly connected. If the graph transductions are limited to producing connected graphs, then no ABA patterns can be generated anymore. So case syncretism may once again not be too different from the PCC, adjectival gradation, or pronoun allomorphy, except that it puts more stringent restrictions on what a valid case hierarchy may look like: no two cases may be unordered with respect to each other. That said, Harðarson (2016) points out some apparent exceptions to Caha's Strong Case Contiguity Hypothesis in Germanic languages, which display accusative-dative syncretism in some case paradigms but not accusative-genitive-dative syncretism. One solution would be to posit a more relaxed version of Blake's hierarchy where genitive and dative are unordered with respect to each other. This allows for all the syncretism patterns of Blake's original hierarchy but also includes accusative-dative syncretism. Whether this is the right way to deal with these exceptions has to remain an open issue for now. Thankfully the typological literature on this topic is very rich (see Zompí, 2016, and references therein), so a deeper exploration should be possible in the near future.
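The role of connectedness can be illustrated with a four-vertex example. The graph below is a hypothetical construction, not necessarily one of the graphs in Fig. 3: starting from a total base order in which 1 reaches 2, 2 reaches 3, and 3 reaches 4, it makes 1 and 3 mutually reachable while 2 only hangs off 4. The sketch checks that this output is weakly connected and weakly non-inverting, yet leaves some pairs of vertices unordered and, assuming that exactly the mutually reachable vertices share a form, yields an ABA stretch.

```python
V = (1, 2, 3, 4)
BASE = {(1, 2), (2, 3), (3, 4)}                 # total base order 1, 2, 3, 4
OUT = {(1, 3), (3, 1), (1, 4), (3, 4), (2, 4)}  # hypothetical output graph


def closure(edges):
    """Transitive closure of a set of edges."""
    reach = set(edges)
    while True:
        new = {(a, d) for (a, b) in reach for (c, d) in reach if b == c}
        if new <= reach:
            return reach
        reach |= new


B, R = closure(BASE), closure(OUT)

sym = closure(R | {(v, u) for (u, v) in R})
weakly_connected = all((u, v) in sym for u in V for v in V if u != v)
weakly_non_inverting = all((x, y) in R for (x, y) in B if (y, x) in R)
unordered = [(u, v) for u in V for v in V
             if u < v and (u, v) not in R and (v, u) not in R]


def pattern(R):
    """Mutually reachable vertices share a letter, all others differ."""
    letters = {}
    for v in V:
        shared = [u for u in letters if (u, v) in R and (v, u) in R]
        letters[v] = letters[shared[0]] if shared else "ABCD"[len(set(letters.values()))]
    return "".join(letters[v] for v in V)


print(weakly_connected, weakly_non_inverting)  # True True
print(unordered)                               # [(1, 2), (2, 3)]
print(pattern(R))                              # ABAC
```

Requiring that no two vertices be unordered rules out exactly this kind of configuration.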
Noun stem allomorphy Case syncretism has also been studied with respect to the noun stems that are chosen for specific cases. In Latin, for example, the nominative of 'man' is hom-o, whereas the accusative is homin-em. Nominative and accusative thus are formed with different stems of the same noun. In the following, I only consider the behavior of singular stems because the typology of plural stem allomorphy is still understudied to the best of my knowledge.
McFadden (2017) proposes that all languages obey the Nominative Stem-Allomorphy Generalization: if noun stem allomorphy is conditioned by case, it distinguishes the nominative from all other cases. In other words, noun stem allomorphy always displays an AB^n pattern. For a language with three cases, McFadden's generalization permits only AAA and ABB while excluding AAB, ABA, and ABC. This is an even more restrictive paradigm than the one we encountered for case syncretism. But it can still be explained in terms that fit naturally into the graph-theoretic framework. Note that, as indicated in Tab. 2, AAA and ABB are exactly the patterns generated by the graphs in Fig. 2, the complement set of our four main PCC graphs from Fig. 1. While counterintuitive at first, this actually makes it possible to describe noun stem allomorphy as the combination of case syncretism with an inverted PCC. First, suppose once more that the graph transduction must produce a connected graph, as we did for case syncretism. Then we only need to enforce two more properties for graph transductions. The first one is weak maximality, which was also part of our account for the PCC. The second is weak non-maximality:
Weak non-maximality: If there is a y such that y ⊳ x, then x ▶ z iff z ▶ x.
When applied to Blake's hierarchy, these two properties ensure that nominative is always a maximal vertex, whereas all other vertices are reachable from each other. This guarantees that only AB^n and A^n patterns are possible.

Interim summary
We have looked at five different phenomena where typological variation is much narrower than one would expect from a computational perspective: the PCC, adjectival gradation, pronoun suppletion, case syncretism, and singular noun stem allomorphy. In all five cases, the typology could be derived from a very natural and independently motivated base hierarchy in combination with certain assumptions about structure preservation. For each language, the base hierarchy is converted into a language-specific hierarchy by some graph transduction τ that must not delete or relabel any nodes, has to produce weakly connected graphs, and, crucially, is weakly non-inverting. In some cases, this already explains the full range of variation, while other paradigms seem to require additional restrictions on τ. A succinct overview is given in Tab. 3.

Why These Properties?
The previous two sections have established that the range of typological variation across many morphological paradigms is accurately delimited if one assumes that there are universally shared base hierarchies that may only be manipulated in narrowly restricted ways. At the very center of my formal investigation was the requirement that graph transductions be weakly non-inverting. While this requirement is descriptively adequate, it seems puzzling that natural languages should obey such a particular property. And even if one grants that being weakly non-inverting is advantageous for some reason, why is the requirement not strengthened so that all graph transductions must be strongly non-inverting? If one is good, then the other should be even better. I contend that there are indeed reasons that make weakly non-inverting graph transductions particularly simple from a computational perspective, whereas strongly non-inverting graph transductions do not further improve on this simplicity. Weakly non-inverting graph transductions therefore represent a sweet spot between flexibility and computational simplicity.
Several computational considerations underlie this claim. First, the property of being weakly non-inverting enforces a limited amount of order preservation, and order preservation is known to play a central role for other aspects of language, too. Mönnich (2006; …) shows that standard Minimalist grammars (Stabler, 1997), a formalization of Minimalist syntax (Chomsky, 1995), generate tree languages that are the image of regular tree languages under direction-preserving MSO transductions. The tree languages of tree adjoining grammars (Joshi, 1985), on the other hand, are the image of regular tree languages under inversely direction-preserving MSO transductions (Mönnich, 2006, 2012). Either way, order preservation seems to be an important aspect of tree transductions in syntax, so it is not unreasonable to expect graph transductions in morphology to display similar limitations.
But there are stronger arguments that go beyond mere analogy. If the computational complexity of transductions is severely restricted, they are simply incapable of reversing order and hence are necessarily weakly non-inverting. Unfortunately, current knowledge of very weak graph transductions is not as well developed as that of string and tree transductions, so I will illustrate my point with string transductions instead.
Note first that all the graphs in this paper are strings or string-like. When a graph is not a string, that is because there are two vertices that either form a cycle or are such that neither is immediately reachable from the other. We may use the dedicated symbols - and | for these cases such that u-v means that u and v form a cycle, and u|v denotes that neither u nor v is immediately reachable from the other. With this notation, the four PCC graphs in Fig. 1 correspond to the strings 1-2 3, 1 2 3, 1|2 3, and 1 2|3, respectively. This notation brings to light that the weakly non-inverting graph transductions invoked in this paper correspond to extremely weak string transductions.
Towards the end of the discussion of case syncretisms, I entertained the hypothesis that the base hierarchy might not be totally ordered. In order to emulate such cases, the string transduction τ must also be allowed to delete the symbols - and |. Now suppose that our base is 1 2 3|4 5, which may be regarded as a truncated version of the partially ordered case hierarchy I proposed. Then τ yields all strings of the form 1 u 2 v 3 x 4 y 5, where u, v, x, and y may each be |, -, or the empty string ε. Close inspection of this pattern reveals that τ still encodes a weakly non-inverting graph transduction. Among string transductions, τ belongs to a very weak class. It is computed by a transducer with a single state (Fig. 4) and can be regarded as input strictly 1-local (ISL-1) in the sense of Chandlee (2014). Transductions that invert the order of symbols in arbitrary strings do not belong to this class; even switching the order of two adjacent symbols cannot be accomplished. This is very clear from the ISL perspective. A transduction is k-ISL iff the output for a given input symbol depends only on that symbol and the preceding k − 1 input symbols. Chandlee (2014) proves that k-ISL transductions are only capable of k-bounded metathesis, which means that two symbols in the input string can be switched iff they are separated by at most k − 2 symbols. This immediately entails that the order of two symbols can be reversed iff k ≥ 2, so 1-ISL transductions are incapable of reversing order.

Table 3: Parameters of each morphological paradigm

Paradigm              | Target graph      | Additional properties of τ
PCC                   | weakly connected  | weak maximality, strong minimality
Adjectival gradation  | weakly connected  | 2 1 → 3 1
Pronoun allomorphy    | weakly connected  | none
Case syncretism       | connected         | none
Noun stem suppletion  | connected         | weak maximality, weak non-maximality
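The contrast between ISL-1 and ISL-2 can be illustrated with a short Python sketch. It is an illustration under assumptions, not Chandlee's formalism and not the transducer of Fig. 4: graphs are encoded as token lists in which vertices alternate with separators, and the plain separator (written as ε in the text) is represented as the token " ". The names `tau`, `swap_ab`, and `one_isl_swap_candidates` are hypothetical. `tau` enumerates the relation from the text by choosing an output for each token independently, `swap_ab` performs adjacent metathesis ab → ba while looking back at exactly one input symbol, and `one_isl_swap_candidates` searches for a per-symbol (1-ISL) function that achieves the same swap.

```python
# Sketch of input strictly local (ISL) string transductions.
# Assumptions (not from the paper): vertices alternate with separators in a
# token list, and the plain separator epsilon is represented as " ".
from itertools import product

SEPARATORS = (" ", "-", "|")


def tau(tokens):
    """ISL-1 relation: each output chunk depends on the current input token
    alone. Vertices are copied unchanged; every separator may surface as any
    separator, so "-" and "|" can also be deleted (rewritten as " ")."""
    options = [SEPARATORS if t in SEPARATORS else (t,) for t in tokens]
    return {"".join(choice) for choice in product(*options)}


def swap_ab(s):
    """Deterministic 2-ISL function performing adjacent metathesis ab -> ba.
    The output for each symbol depends only on that symbol and the single
    preceding input symbol (k - 1 = 1)."""
    out = []
    prev = None
    for cur in s:
        held = "a" if prev == "a" else ""  # an "a" whose output was delayed
        if cur == "a":
            out.append(held)               # release the old "a", delay this one
        elif cur == "b" and held:
            out.append("ba")               # metathesis: a b -> b a
        else:
            out.append(held + cur)
        prev = cur
    # The final output also depends only on the last input symbol.
    out.append("a" if prev == "a" else "")
    return "".join(out)


def one_isl_swap_candidates(max_len=2):
    """Brute-force search for a 1-ISL function (per-symbol outputs fa, fb and
    a constant final string c) that leaves "a" and "b" unchanged but maps
    "ab" to "ba". Outputs are limited to strings of length <= max_len."""
    strings = ["".join(p) for n in range(max_len + 1)
               for p in product("ab", repeat=n)]
    return [
        (fa, fb, c)
        for fa, fb, c in product(strings, repeat=3)
        if fa + c == "a" and fb + c == "b" and fa + fb + c == "ba"
    ]


base = ["1", " ", "2", " ", "3", "|", "4", " ", "5"]  # the base 1 2 3|4 5
outputs = tau(base)  # all strings 1 u 2 v 3 x 4 y 5
```

Every string in `outputs` keeps the vertices in the order 1 2 3 4 5, `swap_ab` needs its one-symbol lookback to delay an "a" until it knows whether a "b" follows, and the search finds no per-symbol function that swaps "ab" (within the length bound), in line with the k ≥ 2 requirement.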
It seems, then, that the restriction to weakly non-inverting graph transductions can be derived from general simplicity desiderata. The recourse to strings is inelegant but unfortunately necessary as long as the class of ISL transductions has not been lifted from strings to graphs. Hopefully this will be rectified in the near future.
This still leaves open the question, though, of why subparts of morphology and morphosyntax should impose additional criteria, in particular odd ones like 2 1 → 3 1 for adjectival gradation. While it is of course possible that a better computational understanding of graph transductions may eventually offer a satisfying explanation, a more likely scenario is that these properties are "echoes" of mechanisms that operate at a lower level of description. The graph-theoretic approach provides a more unified perspective than alternative proposals in the literature because it deliberately abstracts away from how these graphs and transductions are implemented in the grammar. There is no mention of features, agreement operations, or structural constraints because those vary wildly across domains and would obscure what the phenomena have in common. But these low-level processes might be subject to additional constraints that limit the range of typological variation even more. Abstracting away from them means losing the motivation behind those restrictions.
This highlights that the graph-theoretic view supplements existing approaches in linguistics, rather than replacing them. Its abstract nature makes it a lot easier to state general properties that are shared by all morphological paradigms. But when studying a single phenomenon in depth, the more fine-grained approaches favored by linguists may provide the necessary level of detail to explain aspects that are reduced to ad hoc stipulations in the graph-theoretic view.

Remarks on Data Reliability
This paper is, in essence, a mathematical exploration of a few particularly prominent typological universals. A common concern in this regard is the reliability of the data on the basis of which these universals are posited. While the generalizations I discussed in Secs. 3 and 4 draw from a wide range of typologically diverse languages, even very extensive surveys such as Smith et al. (2016) include only about 60 languages. Considering that an estimated 6000 languages are spoken today, this covers only 1% of the potential data. What is more, languages frequently display a large amount of variation across their dialects, so the amount of undetected "typological dark matter" may be even larger. One has to wonder, then, how reliable these generalizations are and whether it is even worthwhile to explore them from a formal perspective.
Unsurprisingly, I believe that they are worth exploring and that the arguments that are commonly marshaled against the enterprise do not stand up to closer scrutiny. If claims about substantive universals are unreliable due to the relative scarcity of data, then mathematical linguists should not put much stock in the mild context-sensitivity hypothesis either. After all, there may be unknown languages out there that are not even context-sensitive. One may argue that this is unlikely because the parsing algorithm for such a language could not run in polynomial time, but this holds only if one adopts the competence-performance distinction.
More importantly, it is just as conceivable that variation in the realm of substantive universals is also limited by independent factors: I presented a computational argument along these lines in Sec. 5, but beyond that there may be general principles of human cognition that prefer, say, a base ordering of 1 < 2 < 3 over 3 < 1 < 2. Therefore the exploration of substantive universals is methodologically no different from the study of formal universals; if the latter is a viable enterprise, so is the former. Being doubtful about all substantive universals while embracing formal universals cannot be motivated on logical grounds. And refraining from making any claims in the absence of rock-solid data is unscientific: science proceeds in the absence of perfect knowledge, and every inductive step necessarily requires a leap of faith regarding the universality of the existing data.
That said, there is of course a bigger risk of overfitting the data in the area of morphosyntax because the models must characterize much smaller classes. The mild context-sensitivity hypothesis leaves a lot more room for crosslinguistic variation, and if it were disproved, the problem could be remedied by adding a mechanism that pushes weak generative capacity to the required level. Designing a model around four attested variants of the PCC or the *ABA generalization increases the risk that a single data point will render the whole model unsalvageable.
This has happened before: all Minimalist accounts of the PCC assumed a strict asymmetry with DO more prominent than IO, and consequently they may now need to be redesigned from the ground up if some languages do indeed display a mirror-PCC with IO more prominent than DO (Stegovec, 2016). However, the root of the problem is not that these proposals treated the data that was known at that point as an upper bound on the range of variation, but rather that their parameters were too tightly intertwined to allow for easy modification in the future.
The graph-transduction perspective in this paper, on the other hand, is like other mathematical approaches in that it is malleable enough to accommodate a shifting empirical landscape. Suppose for the sake of argument that an ABA pattern exists in some languages for some morphological or morphosyntactic domain. That would disprove the *ABA generalization, but it would not change the fact that ABA patterns are much rarer than any of the alternatives. From a formal perspective, this is easy to accommodate by moving to weighted graph transductions that penalize reversal and thus make ABA patterns more costly. The main insights about the importance of being weakly non-inverting stay the same, but they are extended from the Boolean domain to a weighted one.
The move towards a quantitative perspective is prudent anyway because it generalizes claims about the possibility of certain paradigms to claims about their relative frequency, which can be tested even with a non-exhaustive data set. For example, Tab. 2 lists three different graphs that produce ABC patterns, whereas only one graph each gives rise to AAB, ABB, and AAA. It seems unlikely that ABC is typologically three times more common than AAA, but a more sophisticated analysis may be able to derive better quantitative predictions. At any rate, the approach presented in this paper has the requisite flexibility to be viable even with limited data, and in particular to avoid irreparable damage due to overfitting.
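The naive frequency prediction that the text calls implausible can be spelled out in a few lines. This is a sketch under explicit assumptions: the counts are the ones cited from Tab. 2, these six graphs are treated as the only ones under consideration, and every graph is assumed to be equally likely. The names `graphs_per_pattern` and `uniform_prediction` are hypothetical.

```python
# Naive frequency estimate from graph counts (illustrative only).
# Assumptions: counts as cited from Tab. 2, these are the only graphs under
# consideration, and every graph is equally likely to be selected.
from fractions import Fraction

graphs_per_pattern = {"AAA": 1, "AAB": 1, "ABB": 1, "ABC": 3}


def uniform_prediction(counts):
    """Predicted share of each pattern if each graph is equally likely."""
    total = sum(counts.values())
    return {pattern: Fraction(n, total) for pattern, n in counts.items()}


shares = uniform_prediction(graphs_per_pattern)
ratio = shares["ABC"] / shares["AAA"]  # ABC predicted 3 times as common as AAA
```

A more sophisticated analysis would replace the uniform weights with weights derived from the transductions themselves.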

Conclusion
The account proposed in this paper derives typological gaps from two components: a fixed underlying hierarchy shared across all languages (a person hierarchy, case hierarchy, and so on), and heavily restricted graph transductions that generate the language-specific graph(s) from said hierarchy. The most important restriction is that the transductions be weakly non-inverting. Not only does this property severely limit their ability to alter the underlying hierarchy, it also reduces their complexity tremendously. Applying concepts from the theory of subregular string transductions, we may view these transductions as input strictly 1-local, which is the weakest nontrivial class of transductions. Overall, then, the graph-theoretic view sheds new light on these typological gaps and demonstrates the virtues of a mathematical approach that abstracts away from matters of implementation.
Of course, a lot of work remains to be done. The literature on typological generalizations is enormous, and only a few generalizations could be touched on here. It will be particularly important to extend this approach to phenomena where multiple hierarchies are combined, e.g. number and person in pronoun hierarchies. Some other phenomena, such as resolved agreement, have more of a group-theoretic flavor. Resolved agreement refers to cases where an adjective agrees with multiple coordinated noun phrases. In Icelandic, for example, the adjective displays masculine agreement if all noun phrases are masculine, feminine if all noun phrases are feminine, and neuter in all other cases. It is still unclear whether the graph-theoretic perspective can be fruitfully expanded to such phenomena or whether algebraic techniques might provide a better fit. Irrespective of the final answer, there is no doubt that the abstraction and flexibility of mathematical approaches will be a great aid in the study of typological gaps.
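One way to see the algebraic flavor of the Icelandic rule is to state it as a binary operation on genders that is folded over the conjuncts. The sketch below is an illustration under assumptions: the names `resolve_pair` and `resolve` are hypothetical, and the one-letter gender labels are an arbitrary encoding. It simply checks by brute force that the pairwise operation is associative and commutative, so the result does not depend on how the conjuncts are grouped or ordered.

```python
# Icelandic-style gender resolution as a folded binary operation (sketch).
# Assumption: genders are encoded as "m" (masculine), "f" (feminine),
# "n" (neuter); the function names are hypothetical.
from functools import reduce
from itertools import product

GENDERS = ("m", "f", "n")


def resolve_pair(x, y):
    """Identical genders survive; any mismatch yields neuter."""
    return x if x == y else "n"


def resolve(conjuncts):
    """Gender of an adjective agreeing with all coordinated noun phrases."""
    return reduce(resolve_pair, conjuncts)


is_associative = all(
    resolve_pair(resolve_pair(x, y), z) == resolve_pair(x, resolve_pair(y, z))
    for x, y, z in product(GENDERS, repeat=3)
)
is_commutative = all(
    resolve_pair(x, y) == resolve_pair(y, x)
    for x, y in product(GENDERS, repeat=2)
)
```

Because the operation has no inverses, this is a commutative semigroup rather than a group; the sketch only illustrates the kind of structure algebraic techniques would work with.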