Sensing Tree Automata as a Model of Syntactic Dependencies

Various aspects of syntax have recently been characterized in subregular terms. However, these characterizations operate over very different representations, including string encodings of c-command relations as well as tiers projected from derivation trees. We present a way to unify these approaches via sensing tree automata over Minimalist grammar dependency trees. Sensing tree automata are deterministic top-down tree automata that may inspect the labels of all daughter nodes before assigning them speciﬁc states. It is already known that these automata cannot correctly enforce all movement dependencies in Minimalist grammars, but we show that this result no longer holds if one takes into account several well-established empirical restrictions on movement. Sensing tree automata thus furnish a strong yet uniform upper bound on the complexity of syntactic dependencies.


Introduction
This paper proposes a novel, unified upper bound on the subregular complexity of syntactic dependencies. Since the merit of this result might be opaque to a reader who is not intimately familiar with the most recent developments in subregular complexity, we start out with a very detailed background discussion.
The subregular program seeks to identify formal machinery that provides a tighter characterization of natural language than the familiar classes of the Chomsky hierarchy. Subregular phonology took as its vantage point the well-known result that phonology is regular (Johnson, 1972;Kaplan and Kay, 1994) and then identified proper subclasses that are still powerful enough for specific types of phonological dependencies. Among these subclasses are SL, SP (Rogers et al., 2010), IBSP (Graf, 2017(Graf, , 2018a, TSL (Heinz et al., 2011;Mc-Mullin, 2016), and several extensions of the latter (Baek, 2018;Graf and Mayer, 2018). The highly restrictive nature of these classes has allowed for new learning algorithms (Heinz et al., 2012;Jardine and McMullin, 2017) and also furnishes computational explanations for typological gaps.
Syntax cannot be subregular in this strict sense by virtue of being at least mildly context-sensitive (Huybregts, 1984;Shieber, 1985;Michaelis and Kracht, 1997;Kobele, 2006, a.o.). However, linguists formulate syntactic dependencies over trees, not strings. This shift in perspective has also taken place within formal grammar, first with modeltheoretic syntax (Blackburn et al. 1993;Backofen et al. 1995;Cornell and Rogers 1998;Rogers , 2003 and then the two-step approach (see Morawietz 2003, Mönnich 2006, andreferences therein). The two-step approach highlighted how the mildly context-sensitive nature of syntax arises from the interaction of two finitestate components: a regular tree language that encodes a kind of "deep structure" and a finite-state tree transduction to the intended "surface structure". This perspective has proven particularly fruitful in Minimalist grammars (MGs; Stabler, 1997Stabler, , 2011a, where derivation trees provide the regular tree language and the transduction is a formal analogue to Chomsky's notion of movement (Kobele et al., 2007;Graf, 2012b). With the regular nature of syntax properly identified, a subregular characterization of syntax is suddenly feasible. Graf (2012a) provided the first subregular (un)definability results for MG derivation tree languages, but these results were expanded on only recently. Notably, all follow-up work strived to remain closer to the classes that enjoy prominence in subregular phonology. This, however, also led to a marked divergence in approaches. Graf (2018b) operates directly over MG derivation trees. Following the tradition of model-theoretic syntax, Graf equates the MG operations Merge and Move with constraints on MG derivation tree languages and shows that they belong to the tree analogue of the subregular string class TSL. This view is also adopted by Vu (2018) and Vu et al. (2019) in the analysis of negative-polarity items and case licensing, respectively.  and , on the other hand, pursue a purely string-based perspective of syntactic dependencies. For each node they identify its string of c-commanders, the shape of which must follow the constraints imposed by, say, Principle A or NPI-licensing. When construed as such string constraints, syntactic licensing conditions not only turn out to be subregular, they also fit into classes that have been proposed for subregular phonology. The work so far thus has unearthed two distinct subregularity results: MG operations are subregular over MG derivation trees, and licensing conditions are subregular over a specific string representation grounded in c-command (cf. Frank and Vijay-Shanker, 2001).
Even though each perspective is worthwhile and has proven very fruitful, their apparent incommensurability raises the question how these two notions of subregularity can be brought to bear on each other. The central contribution of our paper is a uniform upper bound on syntax that encompasses both MG operations and licensing conditions. This upper bound takes the form of sensing tree automata (STAs) operating over dependency tree representations of MG derivations (these dependency trees are distinct from the MG dependency trees of Boston et al. 2010). STAs provide a minimal amount of look-ahead to deterministic top-down tree automata: the automaton may inspect the labels of all daughter nodes before assigning them specific states. Far more than just a mathematical curiosity, limiting syntax to STA-recognizable constraints over MG dependency trees is very natural in several respects: 1. MG dependency trees are a natural encoding of head-argument relations.
2. STAs explain why licensing conditions are mediated by c-command instead of the many alternative command relations one could imagine (Barker and Pullum, 1990).
3. In order for movement to be regulated by STAs, it must obey additional restrictions beyond those of the standard MG formalism. These restrictions coincide with well-known empirical phenomena such as the Specifier Island Constraint and the Coordinate Structure Constraint.
4. As a single STA can handle movement and licensing conditions at the same time, it is unsurprising that the two occasionally interact, e.g. when movement induces a licensing configuration or violates one.
5. Since STAs are deterministic top-down automata with a minimal amount of look-ahead, they are a natural match for top-down parsing, which has been argued to play a central role in human sentence processing (Stabler, 2013;Kobele et al., 2013;.
Hence our contribution does not merely unify the two existing approaches to subregular syntax, it also accounts for empirical aspects of syntax that the latter leave unexplained. At the same time, we do not intend for our perspective to supplant the existing ones. Each one provides useful insights and can now be safely pursued in the knowledge that there is a principled formal connection to the other approaches.
The paper is laid out as follows: Section 2 introduces our mathematical notation for trees and tree languages ( §2.1), which form the basis for our definition of STAs ( §2.2). We then define MGs and their dependency trees in §3. In order to simplify some of the subsequent proofs, we do not follow the standard definitions and instead adopt a format that is partly inspired by Kobele et al. (2007). Sections 4 and 5 cover the two core results of this paper: dependency trees of MGs with the Specifier Island Constraint are STA-recognizable, and so are licensing conditions based on c-command. The last section briefly sketches some extensions of these results, including interactions of movement and c-command ( §6.1), subcommand ( §6.2), recognizability of MG derivation trees ( §6.3), adjunct islands and the coordinate structure constraint ( §6.4, §6.5), and connections to top-down parsing ( §6.6).

Trees and Tree Languages
A ranked alphabet Σ is a finite set of symbols, each one of which has a rank or arity assigned by the function r : Σ → N. We write Σ (n) to denote {σ ∈ Σ | r(σ) = n}, and σ (n) indicates that σ has rank n. Given a ranked alphabet Σ, the set T (Σ) of Σ-trees contains all σ (0) and all terms σ (n) (t 1 , . . . , t n ) (n ≥ 0) such that t 1 , . . . , t n ∈ T (Σ).
Given a term m (n) (s 1 , . . . , s n ) where each s i is a subtree with root d i , we call m the mother of the daughters d 1 , . . . , d n (1 ≤ i ≤ n). If two distinct nodes have the same mother, they are siblings. We use the term proper dominance for the transitive closure of the mother-of relation, and reflexive dominance for the reflexive, transitive closure. A node a is an ancestor of node n iff a properly dominates n.
Every node u in a Σ-tree has a unique address g(u) as defined in Gorn (1967). The subtree of Σtree t rooted in Gorn address g(u) is denoted by t/g(u). Given two Σ-trees s and t and node u in t, t[g(u) ← s] is the result of replacing t/g(u) in t by s. Throughout the paper, we write u instead of g(u) so that t/u and t[u ← s] are shorthands for t/g(u) and t[g(u) ← s], respectively.

Sensing Tree Automata
A sensing tree automaton (STA; Martens et al., 2008) is a deterministic top-down tree automaton that may also take the labels of a node's daughters into account before assigning states to them. In other words, these automata have a finite lookahead of 1. This intuition is reflected in our definition of sensing rules below. Following the format of Comon et al. (2008, p. 38) for top-down tree automata, we deliberately define sensing rules in a tree transducer format. This is a marked deviation from Martens et al. (2008), but it should make it easier for future work to extend our perspective from constraints to transformations.
Let Σ be some ranked alphabet and Q a set of symbols of rank 1 disjoint from Σ. Members of Q are called states. In addition, X is a countably infinite set of variable symbols (X ∩ Σ = X ∩ Q = ∅). For the rest of this section, we use σ i as a shorthand for σ i (x i 1 , . . . , x in ), with σ i ∈ Σ (n) and x i 1 , . . . , x in ∈ X.
Note that for σ (0) , sensing rules are of the form q(σ (0) ) → σ (0) . Such sensing rules effectively remove states from leaf nodes, whereas sensing rules for σ (≥1) remove the state of σ and instead add a state on top of each daughter of σ.
Given a set ∆ of sensing rules, we write t → ∆ t iff t → δ t for some δ ∈ ∆. The reflexive, transitive closure of → ∆ is denoted by → * ∆ . Intuitively, t → * ∆ t iff t can be obtained from t by a sequence of sensing rules.
Definition 3 (Sensing tree automaton). A sensing tree automaton (STA) over Σ is a 4-tuple A := Q, Σ, q I , ∆ such that Q is a finite set of states disjoint from alphabet Σ, q I ∈ Q is the initial state, and ∆ is a finite set of sensing rules over Σ and Q. The tree language recognized by A is L(A) := {t ∈ T (Σ) | q I (t) → * ∆ t}. The class of all tree languages that are recognized by at least one STA is called STA.
An STA thus recognizes a tree t iff there is a sequence of sensing rules that starts from the initial state q I and passes states through t until they are all removed again at the leaves of t. If the STA ever gets stuck because there is no suitable sensing rule, t is rejected.
It will also be convenient in some proofs to employ a substitution-based characterization of the tree languages that are recognized by STAs. The characterization capitalizes on the fact that every STA only has a look-ahead of 1. Consequently, the state it assigns to a node n depends only on three components: I) the label of n and its siblings, II) the states assigned to n's ancestors, and III) the states assigned to the siblings of each ancestor. If all of those are kept constant, n will always receive the same state no matter what the rest of the tree looks like.
Definition 4 (Spine closure). Given a node u of some Σ-tree t, lsib t (u) is the string consisting of the label of u's left siblings (if they exist) followed by the label of u. Analogously, rsib t (u) is the string consisting of the label of u and the label of its right siblings (if they exist). Also, and ⇓ are two distinguished symbols not in Σ. 1 Let u 1 , . . . , u n be the shortest path of nodes extending from the root of t to u. That is to say, each u i is the mother of u i+1 (1 ≤ i < n), u 1 is the root, and u n = u. By spine t (u) we denote the string recursively defined by spine t (u 1 ) = u 1 u 1 and spine t (u 1 , . . . , u n ) = spine t (u 1 , . . . , u n−1 ) ⇓ lsib t (u n ) rsib t (u n ) A regular tree language L is spine-closed iff it holds for all trees s, t ∈ L and nodes u and v belonging to s and t, respectively, that spine s Theorem 1 (Martens 2006). A regular tree language L belongs to the class STA iff L is spineclosed.
Example 3. Consider the left tree l and right tree r in Fig. 1. Let l0 and r0 be the left daughter of the root in l and r, respectively. Then spine l (l0) = spine r (r0) = merge merge ⇓ merge merge merge. Hence a tree language that contains both l and r is an STA language iff it also contains r[r0 ← r/l0].
In linguistic terms, spine-closure tells us that two subtrees s/u and t/v with identical root labels can be freely exchanged whenever they have the same ancestors and the same c-commanders. This will be of great importance throughout this paper.

Minimalist Grammars
Minimalist grammars (MGs; Stabler, 1997) are a formalization of Minimalist syntax (Chomsky, 1995). Readers who are unfamiliar with the formalism should consult Stabler (2011a) for a more accessible introduction.
Every MG consists of a finite set of feature annotated lexical items. Each lexical item is a pair of a phonetic exponent and a finite, non-empty string of features. There are four types of features, whose job it is to trigger the structure-building operations Merge and Move. Merge establishes head-argument relations and is triggered when a selector feature F + on a head finds a matching category feature F − on an argument. For example, the noun guest carries a feature N − , for which we also write guest :: N − . It can be merged with the :: N + D − to yield a DP, thanks to the matching category and selector features. Move displaces a subtree from its current position to a higher position in the syntactic structure. Move takes place when a licensor features f + on a head that provides a landing site can be checked by a corresponding licensee feature f − on a mover. The order of features on a lexical item determines the order in which the corresponding operations are triggered. Hence the determiner which :: N + D − wh − would first select a noun phrase, get merged with a head looking for a DP, and then undergo whmovement.
The sequence of Merge and Move steps is commonly represented as a derivation tree, e.g. in Fig. 2. However, we will use a dependency tree representation instead. Our dependency trees are merely a more compact encoding of MG derivation trees and have no connection to the MG dependency trees of Boston et al. (2010). We will directly define MGs as sets of well-formed dependency trees, mirroring earlier definitions in terms of derivation trees (primarily Kobele et al. 2007).
We pick a ranked alphabet Σ such that Σ (n) contains all lexical items of MG G, and only those, that carry exactly n selector features. For simplicity, we use G to also refer to this alphabet. Not every G-tree is a well-formed dependency tree, though, due to the constraints of the MG feature calculus. The calculus is illustrated in Fig. 2, where each node in the dependency tree is annotated with the feature configuration corresponding to its subtree. We formalize this calculus via a recursive function feat that computes these values based on more primitive functions for the feature checking steps that trigger Merge and Move.
Definition 5 (Dependency tree language). If G is an MG with n distinct licensee features, then the set dep(G) of well-formed dependency trees of G is {t ∈ T (G) | feat(t) = C − , ε 1 , . . . , ε n }.
The remainder of this section defines feat in terms of feature checking operators M and ⊗ for Move and Merge, respectively. All operations manipulate one or more sequences of feature strings. The Shortest Move Constraint (SMC) will be used to filter out illicit sequences.
Definition 6 (SMC). Let G be an MG and Lce its set of n licensee features. Given a sequence s := s 1 , . . . , s m (m ≥ 0) of strings in Lce * , SMC : The SMC ensures that Move is unambiguous in the sense that there can never be more than one active mover of a specific type (wh, topicalization, and so on). It is an integral part of MGs, and removing it would greatly alter their expressivity (Salvati, 2011).
Next we add a helper function sort that orders SMC-approved sequences based on the first licensee feature of each feature string.
Definition 7 (sort). Let G and Lce be as before. Now fix some bijection b between Lce and {1, . . . , n}. Then sort maps s := s 1 , . . . , s m ∈ Lce * to the sequence s 1 , . . . , s n such that Now we can finally define M for Move, which is also a crucial part of the Merge operator ⊗. Throughout we use γ as a shorthand for any string of features, and δ for a (possibly empty) string of licensee features.
if f is the licensor feature f + , and e otherwise.
Note how Merge is always followed by an application of the Move operator, but this does not trigger any feature checking unless γ starts with a licensor feature. Merge steps are thus interleaved with movement checks, not all of which may actually result in movement.

CP C TP
Lemma 2. Let G be an MG with n distinct licensee features and s a subtree of some t ∈ dep(G). Then feat(s) must be of the form F − δ, δ 1 , . . . , δ n for some category feature F − .
Lemma 2 is apparent from the derivation tree example in Fig. 2. It is a minor extension of the well-known fact that a lexical item may occur in a well-formed MG derivation iff its feature string is of the form φF δ, where φ is either ε or a selector feature followed by 0 or more selector and licensor features, F is a category feature, and δ is a (possibly empty) string of licensee features.

Merge and Move via STAs
With all the preliminaries in place, we can finally turn to the core results regarding the STArecognizability of MGs with respect to Merge and Move. The next sections then extend this to licensing conditions and some other special cases. Graf (2012a) uses the argument from example 3 in §2.2 to prove that MG derivation tree languages are not STA languages. However, this proof does not carry over to MG dependency trees. Adopting the terminology of Graf (2012a), we use MDEP[merge, move] for the full class of MG de-pendency tree languages and MDEP[merge] for the subclass of movement-free MGs (no lexical item carries any licensee features).

Theorem 2. MDEP[merge] STA
This is just a corollary of a more fundamental property of MGs. For any arbitrary L ∈ MDEP[merge] and nodes u and v of s, t ∈ L, spine s (u) = spine t (v) necessarily entails that feat(s/u) = feat(t/v), so that Theorem 2 immediately follows from Lemma 1. We omit a full proof here as Lemma 3 will cover a more complex case that subsumes this one.
Even with the dependency tree format, though, STAs are too weak for standard MGs with both Merge and Move. The example in Fig. 3 is peculiar, though. The left dependency tree encodes the derivation for Who does a teacher of like the father of John, which is severely degraded. It has been argued that such cases of left-branch subextraction are generally forbidden. MGs can be equipped with the Specifier Island Constraint (SpIC) to rule out movement from within a specifier. This takes the form of an additional restriction on feat that only allows complements to properly contain unchecked licensee features. Since the complement of a head is its rightmost daughter in the dependency tree (rather than the leftmost one), the SpIC amounts to a restriction on all daughters except the last one.
Note that any licensee features on the head of a specifier are part of γ, not δ j , so specifiers can still move without violating the SpIC. Only extraction of a proper subtree from within a specifier is not allowed. Also note that our version of the SpIC only bans extraction from base specifiers, but not from specifiers that are derived via movement. This is why it can be easily stated over dependency trees. Even so, this limited version of the SpIC greatly limits the weak generative capacity of MGs with the SMC, while still keeping them mildly context-sensitive (Michaelis, 2004(Michaelis, , 2009Kobele and Michaelis, 2011). For our purposes, though, the major contribution of the SpIC is that it also lowers the complexity of MG dependency trees into STA.
Lemma 3. Given an MG G that obeys the SpIC, pick arbitrary nodes u and v of s, t ∈ dep(G), respectively. If spine s (u) = spine t (v), then feat(s/ u) = feat(t/v).
Proof. Since s and t are well-formed, both feat(s/u) and feat(t/v) must be defined. By Lemma 2 we may assume w.l.o.g. that feat(s/ u) := F − δ, δ 1 , . . . , δ n and feat(t/v) := F − δ , δ 1 , . . . , δ n . As u and v have identical spines, u and v themselves must be identical. Therefore both F − = F − and δ = δ hold. Now suppose that δ i = δ i for some 1 ≤ i ≤ n. Then either s/u or t/v is missing a licensee feature f − that is present in the other. Suppose it is s/u that is missing a feature present in t/v. Note that this immediately entails by the SpIC that t/v is a complement, wherefore s/u is also a complement because u and v have identical spines. Since both s and t are well-formed, whatever feature is missing in s/u must occur somewhere else in s to match some f + in the spine of u. But by the SpIC, f − cannot occur properly inside any specifier. We already know that s/u is a complement, so f − can only occur on an ancestor of u or on one of its left siblings. But then it would occur on a node in the spine of u, which contradicts our initial assumption that spine s (u) = spine t (v). Hence δ i = δ i after all, wherefore feat(s/u) = feat(t/v). 2 Theorem 4. For every MG G that obeys the SpIC, it holds that dep(G) ∈ STA.
Proof. Lemma 1 and 3 jointly imply that dep(G) is spine-closed, which guarantees that it can be recognized by an STA (Thm. 1). 2 Intuitively, Theorem 4 holds because the SpIC creates a unique "elsewhere case" for missing movers. Suppose that an STA is at node n in a Gtree t. Since it has processed t top-down, it knows exactly which movers it has to look for by virtue of the licensor features it has come across. With its look-ahead of 1, the STA can scan the daughters of n to see if any of them carry some of the desired licensee features. Any licensee features that are not among them must be embedded deeper in the tree. Due to the SpIC, though, they can only reside in the complement, i.e. the subtree rooted in the rightmost daughter of n.
The SpIC is just one way of creating such an elsewhere case. STA-recognizability would also hold if movers could only escape from the leftmost argument, or if the label of the selecting head decides which one of its arguments can be extracted from. Extraction from arguments could also be parameterized for each feature so that whmovers may only leave complements whereas topicalization is only allowed from the last but one argument, if it exists. Or the STA could switch between these four constraints depending on the number of ancestors of the current node modulo 4. STA-recognizability holds as long as distributing the head's δ i across its arguments is fully deterministic based on the information available to a sensing rule (current state, label of selecting head, labels of selected heads). Hence the class of STArecognizable MG dependency tree languages is larger than what is allowed by the SpiC.
This does not change the fact, though, that the initial finding of Graf (2012a) regarding the insufficiency of STAs is incomplete. In the case of Merge, the insufficiency disappears with the more compact representation format of MG dependency trees. Alternatively, one could also keep the derivation tree format while increasing the STA look-ahead beyond just one levelthis is a point we will revisit soon in §6.2 and §6.3, for very different reasons. With respect to Move, the choice of representation format is immaterial. Neither derivation trees nor dependency trees make movement as defined in MGs STA-recognizable. However, the SpIC does make movement STA-recognizable because specifiers do not need to be probed deeper than their head. For MG dependency trees, this coincides with the 1-level look-ahead of STAs, whereas derivation trees once again require a more generous lookahead window. The choice of representation thus has an impact on the amount of required lookahead, but the essence of our STA-recognizability result for MGs rests on the SpIC, not the tree format.

STAs and C-Command Conditions
The previous section has successfully established that the central operations of MGs can be han-dled by STAs, assuming that I) they are construed as constraints on MG dependency trees, and II) movement is subject to the SpIC. But the structure-building operations Merge and Move are just one part of syntax. Licensing conditions also play a major role, in particular those rooted in ccommand. This section shows that these conditions are also captured by STAs.
Licensing conditions based on c-command are ubiquitous in the syntactic literature. They were recently studied from a subregular perspective by  and . Both papers use similar ideas, but define them very differently. We adopt the formalism of  because it defines all essential concepts directly in terms of dependency trees.
Definition 12 (C-string). Let t be some MG dependency tree. For every node n of t in configuration m(d 1 , . . . , d i , n, d i+1 , . . . , d j ) , its immediate c[ommand]-string is ics(n) = d 1 · · · d i n. The augmented c[ommand]-string acs(n) of n is recursively defined as shown below, where ↑ is a distinguished symbol: Licensing conditions are then formalized as constraints on the shape of permissible c-strings. This is comparable to restricting a tree via its path language, except that paths are now replaced by c-strings. Definition 13 (C-string constraints). A c-string constraint C is some string language L over Σ ∪ {↑}. A Σ-tree t is well-formed with respect to C iff acs(n) ∈ L for every node n of t. Which subregular class provides the most appropriate fit for syntactic licensing conditions is still a matter of debate.  propose IO-TSL as a generous upper bound. IO-TSL is a subregular string language recently defined in (Graf and Mayer, 2018) in their analysis of Sanskrit n-retroflxion. , on the other hand, argue that at least island constraints are best captured by IBSP, another class from subregular work on phonology (Graf, 2017(Graf, , 2018a. Neither class is particularly well-understood at this point. Intuitively, IO-TSL treats local dependencies as primitive and reduces non-local constraints to local ones over enriched representations. IBSP, on the other hand, takes all constraints to be nonlocal and then uses locality domain to prune down their reach. Either way there is ample evidence that all attested conditions that can be correctly stated over c-strings are at most regular. That's all we need to show that they can be enforced by an STA. Before we proceed, the reader should take note that c-strings as defined in  do not quite capture the standard notion of ccommand over phrase structure trees. First of all, movement is factored out, so that c-strings only capture the c-command relations between the base positions where arguments enter the derivation.  point out several options for adding movement, but they do not explore them in depth. We will provide our own STA account for movement interactions later on ( §6.1). Another, less important deviation from standard ccommand pertains to the status of heads and specifiers. If one interprets linear precedence in command strings as c-command, then specifiers do not c-command their selecting head even though they c-command the head's object. At the same time, the head c-commands the specifier. While there seem to be no cases in the syntactic literature where this difference to c-command matters, it nonetheless highlights that c-strings only approximate the standard notion of c-command.
This approximation of c-command is very convenient for our purposes, though. The construction of a node's c-string closely mirrors the definition of a node's spine, so it is perhaps unsurprising that every regular c-string constraint can be enforced by an STA. The construction of an STA automaton for this purpose is remarkably straightforward.
First, we simplify c-string constraints by considering only constraints that generate prefixclosed sets of c-strings. The lemma below establishes that this assumption is innocuous for determining tree well-formedness.
Lemma 4. Let L be some regular language of well-formed c-strings, and let L p be the largest subset of L such that u / ∈ L p entails uv / ∈ L p for all u ∈ (Σ ∪ {↑}) + and v ∈ (Σ ∪ {↑}) * . Then a Σ-tree is well-formed with respect to L iff it is well-formed with respect to L p .
Proof. We only consider c-strings that do not start or end with ↑ as these never occur in any trees to begin with. Since L p ⊆ L and, by Def. 13, a tree is well-formed with respect to c-string set C iff all its c-strings are members of C, two entailments follow immediately: I) if t is well-formed with respect to L p , it is well-formed with respect to L, and II) if t is ill-formed with respect to L, it is illformed with respect to L p .
Next, suppose t is well-formed with respect to L. Since it contains no illicit c-string, t could only be ill-formed with respect to L p if it contains some licit c-string uv such that u / ∈ L p , wherefore uv / ∈ L p . But if uv is a c-string of t, then so is every non-empty prefix of uv that does not end in ↑. By our initial assumption, t is well-formed, and hence every non-empty prefix of uv is a member of L. It then must also be a member of L p because, by definition, L p must be largest among the prefix-closed subsets of L. It follows that t is well-formed with respect to L p , too.
Finally, consider the case where t is ill-formed with respect to L p . Then there is some c-string u of t such that u / ∈ L p but every non-empty prefix of u (that does not end in ↑) is a member of L p . In this case it must also hold that u / ∈ L, for otherwise L p is either not largest or violates prefix closure. Consequently, t is also ill-formed with respect to L. 2 Intuitively, Lemma 4 capitalizes on the fact that whenever a tree contains at least one unlicensed node, the status of other nodes no longer matters because the tree is already ill-formed. Hence we may freely assume that the c-string of node n is illicit as soon as a prefix of that c-string is illicit, even in cases where n itself would be licensed. Now let D := Σ ∪ {↑} , Q, q I , F, δ be the complete, deterministic finite-state string automaton that recognizes L p as defined in Lemma 4. As D is deterministic, it has a unique initial state q I . Its transition relation δ : (Q × Σ) × Q is a function. We expand δ to strings such that δ(q, σ 1 σ 2 · · · σ n ) := δ(· · · δ(δ(q, σ 1 ), σ 2 ) · · · , σ n ). Since L p is prefixclosed, it also holds that Q = F ∪ {s}, where s / ∈ F is some sink state from which no other state can be reached except s itself. Prefix closure also entails that q I ∈ F , so that empty c-strings are always allowed (since by definition c-strings are never empty, this is innocuous).
We then construct the corresponding STA A D := Σ, Q, q I , ∆ . For n ≥ 1, each sensing rule in ∆ is of the form q(σ (n) ( σ 1 , σ 2 , . . . , σ n )) → σ(δ(q, σ ↑)( σ 1 ), . . . , δ(q, σ ↑ σ 1 σ 2 · · · σ n−1 )( σ n ))) where q ∈ F and, as previously defined at the beginning of §2.2, the use of σ i with some symbol σ (n) i is a shorthand for σ i (x i 1 , . . . , x in ). For leaf nodes, we require q(σ (0) ) → σ (0) ∈ ∆ iff both q ∈ F and δ(q, σ) ∈ F . This construction effectively simulates runs of the string automaton D over c-strings. The only complication is that the process of assigning a state to σ i does not consider σ i itself. But σ i does affect the states of its right siblings and all its daughters. And if σ i is a leaf, it indirectly determines whether the state can be removed so that the tree may be accepted.
Theorem 5. Let L be a regular language of wellformed c-strings. Then there is some STA A such that L(A) is the set of all Σ-trees that are wellformed with respect to L.
Proof. Following Lemma 4, we replace L by L p and consider the complete, deterministic automaton D that generates L p . We construct A from D in the manner described above. We then give a proof by induction on the depth of Σ-trees.
Pick some Σ-tree t and suppose t ∈ Σ (0) . The only c-string of t is t. Then A recognizes t iff q I ∈ F and δ(q I , t) ∈ F . The former holds by definition, so t ∈ L(A) iff t ∈ L(D). This establishes the base case.
Next, consider any arbitrary configuration q(σ (n) (σ 1 , . . . , σ n )). By our induction assumption, q = δ(q I , u) with acs(σ) = uσ. Suppose acs(σ i ) / ∈ L p for some 1 ≤ i ≤ n. Then δ(q I , acs(σ i )) = δ(q, σ ↑ σ 1 · · · σ i ) / ∈ F . If i < n, A will assign some non-final state to σ i+1 . If i = n and σ i is not a leaf, A assigns the nonfinal state to the leftmost daughter of σ i . In both cases, t now contains some node with a non-final state. But there is no sensing rule with a nonfinal state on its left-hand side, so that A correctly rejects t. If i = n and σ i is a leaf, then δ(δ(q, σ ↑ σ 1 · · · σ i−1 ), σ i ) is not final and consequently there is no suitable leaf rewrite rule. As this covers all possible configurations for σ i , we conclude that A rejects every tree that is ill-formed with respect to L p .
In the other direction, suppose A rejects t.
Then there must be some configuration σ(q 1 (σ 1 ), . . . , q i (σ i ), . . . , q n (σ n )) such that q i is not a final state. But then acs(σ i−1 ) / ∈ L p , as desired. 2 Theorem 5 establishes that the same subregular machinery of STAs can be used for both the structure-building operations Merge and Move and the c-command licensing conditions that deeply permeate syntax. The STA perspective also provides a new answer as to why c-command should play such an important role in syntax. How an STA treats a given node depends solely on the spine of that node. The spine contains the node itself, all its ancestors, and the siblings of all the nodes in the spine. This immediately precludes generalized notion of command like the S-command relation of Barker and Pullum (1990), which in modern terminology would be CP-command: x CP-commands y iff x does not reflexively dominate y (or the other way round) and every CP that properly dominates x properly dominates y. As x and y can be arbitrarily deep within distinct subtrees while S-commanding each other, this is not an STA-recognizable command relation. C-command, on the other hand, stays within the narrow confines of STAs.
Admittedly STAs could also selectively ignore some c-commanders, operate with "inverse ccommand" where a complement c-commands into its specifiers, or switch between different notions of c-command based on some modulo counting condition. So just as with the SpIC, the power of STAs goes quite a bit beyond what is desirable for c-command. Still, it is striking that ccommand is a very natural relation from STA perspective, whereas more global notions of command are correctly ruled out. As long as one is willing to accept dependency structures as a natural representation that arises from head-argument relations, c-command is a natural companion of STA-recognizability.
A lot of work remains to be done, though. As just discussed, STAs allow for some very unnatural command relations. Even more troubling is that any arbitrary regular constraint over c-strings can be enforced by an STA. Seeing how syntactic constraints are very limited in the shape of depen-dencies they enforce, STAs are overly powerful. Hence STAs can only act as an upper bound.
At the same time, our current STA approach is too limited. In cases where licensing conditions interact with movement, the licensing element might not be part of the c-string of a node. Our STA construction, which is based purely on c-strings, will necessarily fail in these cases. In addition, some cases of licensing involve generalized notions of c-command that go beyond what STAs can handle. We are confident, though, that these issues can be addressed in future work. The next section briefly sketches the solutions we have in mind.
6 Expanding the Core Results

Interactions of Movement and Licensing
Linguists have identified many cases where movement obfuscates licensing configurations by displacing the licensed element. A simple example would be [which book about himself] i does John like t i , where the reflexive is not c-commanded by its antecedent John in the corresponding phrase structure tree. These cases are entirely unproblematic for STAs since the MG dependency trees fully factor out movement, so which book about himself remains in the object position where it is ccommanded by the subject John. The problematic cases are much rarer, to such a degree that convincing examples are hard to come by: I) movement of a licensed element bleeding licensing, and II) movement of a licensing element feeding licensing.
The first case covers configurations where a licensed element -or a subtree containing itmoves to a position above the element's licensor and where the subsequent change in c-command relations does make licensing impossible (in contrast to the binding example above, where licensing holds nonetheless). In MGs with the SpIC this can be captured by an STA. The states of the STA would be n-tuples similar to the output of the featfunction we defined for MGs. The first component records c-string states in the usual fashion, whereas each other component i records the most recent head h with an f + i -feature, plus the state that was assigned to h. When an f i -mover m is found, the information in the i-th component is used to compute the state for m as if m were a left daughter of h. In this case, m is also excluded from the state computation of its siblings in the dependency tree. The details remain to be worked out, but movement of a licensee should be easy enough to handle because the STA can separately keep track of the c-command configurations for each position that is targeted by a mover.
The second option is more complicated. Here some phrase must move into a higher position from where it can c-command the element that must be licensed. In combination with the SpIC, this means that the licensing of a node can be contingent on the nodes contained by its rightmost sibling. An STA can still handle this, but it requires a very different construction from the one described in this paper. As in the bleeding case, a state consists of multiple components, each one of which corresponds to a movement feature. Component i records all the types of licensing conditions that still must be met for nodes that are ccommanded from the most recent head with licensor feature f + i . The states also keep track of the ccommand relations between the heads hosting the relevant licensor features. When a mover with f − i is encountered, the STA checks if the mover can satisfy any of the licensing requirements in component i. If so, those are removed from component i and all other components whose heads are c-commanded by the head for component i. At the end, no state may contain any non-empty components. This strategy reimplements licensing as a mechanism where the STA accrues "licensing debt" while moving through the tree. This debt has to be paid off by movers at a later point. The strategy works because the SpIC allows us to correctly synchronize the licensing debt across the states of all daughter nodes.
While each strategy is relatively simple on its own, integrating them is more difficult. An element may move to a higher position p from where it is only licensed by another element that moves to an even higher position. This requires keeping track of potential licensing debts for p which are then narrowed down to the actual licensing debt once it is known what actually moves to p, and then this debt must be paid off by whatever moves to a position above p. Further complicating the picture, some licensing requirements only need to be satisfied once (e.g. Principle A in the example above), whereas others hold throughout the derivation and are enforced after each movement step. The individual components of the automaton states thus must be synchronized in just the right way, which complicates the construction of the STA even more.

Subcommand
It has been argued that some cases of longdistance binding involve subcommand instead of c-command (see Tang 1989 andHuang andLiu 2001, a.o.). A node x subcommands y iff x ccommands y or x is a specifier of some z that ccommands y. From the perspective of c-strings, x c-commands y iff there is some z in the c-string of y such that x = z or x is the left sibling of a daughter of z. For instance, if Principle A in English allowed for subcommand instead of just c-command, then John's picture pleases himself would be well-formed as John is a specifier of the subject DP and thus subcommands himself.
Subcommand is beyond the reach of STAs because it makes the status of a node n dependent on the daughters of n's left siblings. Similar to the case of movement interactions, there are two possible replies to this. One could point out the rarity of subcommand, and that in the few cases where it arises, it serves as a means to furnish additional antecedents for reflexives beyond those that are already provided via c-command. Hence subcommand might be limited to the syntax-semantics interface and may not directly factor into licensing. Alternatively, one could simply increase the lookahead of STAs from 1 to 2 so that the daughters of daughters are also taken into account. Subcommand then is just a more demanding case of c-command.

Extension to MG Derivation Trees
Once one equips STAs with a more powerful lookahead mechanism to handle subcommand, MG derivation trees once again become a viable alternative to MG dependency trees. All the results in this paper extend from dependency tree to derivation trees if one generalizes STAs to deterministic top-down tree automata with finite look-ahead. That is not surprising because dependency trees can be converted to derivation trees by a linear tree transduction. This shows that derivation trees are the result of separating mothers and daughters in a dependency tree by a finitely bounded amount of material (Merge and Move nodes). Finite lookahead is a means to reconstruct the relations of the dependency tree that have been obfuscated by this additional material.
It remains to be seen which one of the two representation formats ultimately provides for more insightful characerizations. One issue that derivation trees might shed some light on is the monotonic nature of c-command. With an STA over dependency trees, the daughters could be evaluated in any order to determine the states for a c-string constraint. While our current model proceeds leftto-right, we could have just as well gone right-toleft, inside out, or switched between those options based on the label of the mother. Yet only the first option corresponds to c-command as we know it. In a derivation tree, all information is conveyed via dominance. A specifier is not a sibling of the complement but rather resides in a structurally higher position (cf. the positions of John and the CP-complement in Fig. 2). Among all the command variants we just described, the empirically attested one in the form of c-command is the only one that is monotonic with respect to dominance.

Adjuncts
Another problem of dependency trees is the status of adjuncts. In order to obtain the correct ccommand relations, an adjunct of node n would have to be treated as a left sibling of all the specifiers of n. But since adjunction is unbounded, this would mean that a node can have arbitrarily many daughters, whereas STAs are usually defined for trees with a finitely bounded arity. Adjuncts in derivation trees, on the other hand, do not create such issues because each adjunct grows the tree vertically rather than horizontally.
Either way, adjuncts must be subject to the same restriction as specifiers to preserve recognizability by STAs or a suitably generalized variant: even though an adjunct may move on its own, nothing may move out of an adjunct. This is of course a well-established property of adjuncts, known as the Adjunct Island Constraint. Once again, then, our specific subregular perspective derives a wellknown limitation of movement.
The Adjunct Island Constraint has some principled exceptions such as [which car] i did John drive Mary crazy while trying to fix t i (Truswell, 2007). It remains to be seen how these exceptions can be reconciled with our approach.

Across-the-Board Movement
Another well-known island constraint is the Coordinate Structure Constraint (CSC), which forbids extraction from a conjunct. The CSC itself is easy enough to enforce with an STA. It is the exception to the CSC that is of interest here: extraction from a conjunct is permitted if extraction takes place from all conjuncts in the same coordination. Hence which beer did Ed buy and Greg drink is well-formed even though which beer did Ed buy wine and Greg drink is illicit. This is known as across-the-board movement (ATB).
Curiously, the ATB-exception is very natural from the STA perspective. STAs fail if one cannot clearly tell from the local configuration which one of several subtrees a move feature should be passed into. The SpIC and the Adjunct Island Constraint address this by excluding specifiers and adjuncts from this equation, leaving the complement as the only subtree that might contain additional movers. ATB-movement constitutes the opposite solution where the feature is instead passed into every subtree. This, too, is a fully deterministic process and thus within the limits of STAs. If even one conjunct did not need to contain a mover, then STAs would be faced with a non-deterministic choice that they cannot handle. While this has to be explored in greater detail based on a proper formalization of ATBmovement (e.g. Torr and Stabler 2016), it is remarkable that the abilities of STAs closely line up with the attested movement configurations.

Connection to Parsing
The preceding discussion shows that STAs not only furnish a tighter upper bound on syntactic complexity, they also explain core aspects of syntax: the importance of island constraints, the availability of ATB-movement, and the central role of c-command, which merely co-opts mechanisms that are independently needed for movement. But this in turn raises the fundamental question why STA-recognizability should be a relevant concept in syntax. We conjecture that STAs exhibit two properties that are attractive for parsing: top-down recognition, and determinism.
The highly predictive nature of human sentence processing suggests that some top-down strategy is employed, either directly in the form of recursive descent parser, or as a top-down filter of a leftcorner parser. Given a choice between bottom-up or top-down recognition, the latter is a much more natural match for such parsing algorithms. Precompiling a top-down filter into the parse schema of an MG parser like the one in Stabler (2011b) or Stanojević and Stabler (2018) is not trivial, but feasible. Determinism ensures that this precompilation does not explode the parse space, which would slow down the parser. Hence STAs are a good match for current MG models of human sentence processing (Kobele et al., 2013;Gerth, 2015;. Of course a deterministic top-down automaton would also exhibit these properties, but these automata cannot even handle Merge, let alone Move. STAs are a minimal extension of deterministic topdown automata while also furnishing plenty of power for syntactic dependencies. They require some restrictions on movement, but each one of them also improves parsing performance by exponentially reducing the search space (see the discussion of the SpIC's impact on parsing performance in Stabler 2013).
If one assumes that the grammar is but a highlevel description of the parser, the restriction to STAs may be an abstract counterpart to various parser constraints that are meant to improve efficiency. Subregular complexity in syntax may thus be closely connected to parsing.

Conclusion
We have shown that all current results on the subregular complexity of syntax are insightfully subsumed by sensing tree automata operating over MG dependency trees.
The limits of STA-recognizability line up closely with wellestablished restrictions on movement and syntactic licensing conditions. A lot of issues remain to be formally worked out, but we are confident that the perspective developed in this paper will greatly expand our understanding of subregular syntax.