On the Logical Complexity of Autosegmental Representations

Autosegmental mapping from disjoint strings of tones and tone-bearing units, a commonly used mechanism in phonological analyses of tone patterns, is shown to not be definable in monadic second-order logic. This is abnormally complex in comparison to other phonological mappings, which have been shown to be monadic second-order definable. In contrast, generation of autosegmental structures from strings is demonstrated to be first-order definable.


Introduction
This paper applies logical transductions as introduced by Courcelle (1994) to study the cognitive complexity of non-string representations and transformations in phonology. Generative phonology studies phonological patterns both in terms of transformations, or relations between input underlying representations (URs) and output surface representations (SRs), and in terms of phonotactics, or generalizations about the well-formedness of SRs. Studies of the computational complexity of these patterns have established clear bounds on the expressivity needed to describe them. For example, Johnson (1972) and Kaplan and Kay (1994) showed that the ordered rewrite-rule grammars of Chomsky and Halle (1968) describe exactly Regular string relations, and more recent cross-linguistic studies have shown that phonological transformations fall into more restrictive subclasses of the Regular class (Chandlee and Heinz, 2012; Chandlee, 2014; Heinz and Lai, 2013; Payne, 2014; Jardine, 2016a). Similarly, phonotactic patterns have been shown to fall into sub-Regular classes of formal languages (Heinz, 2009, 2010; Rogers et al., 2013). This has led to a hypothesis that there is a sub-Regular bound on phonology (see, e.g., Heinz and Idsardi, 2013), which has clear connections to cognitive complexity (Rogers and Pullum, 2011; Rogers et al., 2013) and to how humans learn sound patterns (Heinz, 2009, 2010; Lai, 2015).
However, these complexity classes are defined in terms of strings, and since the advent of autosegmental phonology (Goldsmith, 1976), generative phonology has commonly employed non-string structures. Perhaps the most commonly used of these are autosegmental representations (ARs), which represent words with graph structures in which disjoint strings are associated to one another in some fashion. For example, Fig. 1 shows an autosegmental derivation for the Mende word [félàmà] 'junction', which consists of a high-toned syllable followed by two low-toned syllables (following convention, syllables are represented with σ). In the AR, the tone pattern of [félàmà] is represented as a HL (high-low) string associated to three syllables, as depicted on the right-hand side of Fig. 1 (with association depicted by straight lines). While finite-state models of ARs and AR transformations exist, much of this work has found the need to use enriched automata with additional tapes (Kay, 1987; Wiebe, 1992; Kornai, 1995) or synchronized states (Bird and Ellison, 1994).
Instead, this paper takes a logical approach to studying the complexity of autosegmental representations (thus following the work of Bird and Klein, 1990; Jardine, 2014), as it allows for more flexibility with respect to the structures we can describe. This builds on a few key results. First, the Regular sets of strings are exactly those definable by monadic second-order (MSO) logic (Büchi, 1960; Elgot, 1961; Trakhtenbrot, 1961). Second, Courcelle (1994) introduced MSO transductions for graph structures, in which the output structure is determined as an MSO interpretation of the input structure. As MSO-definable string transductions subsume Regular functions (Filiot and Reynier, 2016), we can then recast the Regular hypothesis for phonology in logical terms: phonology is at most MSO-definable. This leads to the two results of this paper, one negative and one positive. First, tone mapping transformations, in which an unassociated AR is mapped to a fully associated one (as exemplified in Fig. 1), are not MSO-definable. Second, (at least some) ARs are first-order (FO) definable from strings; i.e., we can write a FO transduction from a string representing a sequence of toned syllables to its corresponding AR. Because this transduction is defined in terms of the FO language of the input, any FO formula we write over these ARs can be translated into the FO logic of their corresponding strings. Consequently, any FO constraint written over these ARs still describes a Regular set of strings; that is, ARs are not significantly more expressive than strings.
This paper is structured as follows. §2 introduces string models and logic, and §3 details how this relates to the study of phonology. §4 discusses the non-definability of tone mapping in MSO, and §5 discusses the FO-definability of ARs from strings. §6 concludes.

String models and logics
Let an alphabet $\Sigma$ be a finite set of symbols and a string $w$ be a finite sequence of symbols in $\Sigma$; let $|w|$ denote the length of $w$. Let $\Sigma^*$ represent all possible strings over $\Sigma$, including the empty string $\lambda$ ($|\lambda| = 0$). A stringset (or formal language) is some subset of $\Sigma^*$. For some $\sigma \in \Sigma$, $\sigma^n$ denotes the string consisting of $n$ repetitions of $\sigma$.
A relational model $\langle U, R_1, R_2, \ldots, R_n \rangle$ is a representation of some structure with a universe $U$ of elements and $n$ relations, each $R_i \subseteq U^k$ for some finite $k$. We can represent a string $w \in \Sigma^*$ with a finite relational model $M_w = \langle U, \prec, (P_\sigma)_{\sigma \in \Sigma} \rangle$, where $U = \{1, 2, \ldots, |w|\}$ is an initial segment of the natural numbers representing the positions in the string, $\prec$ is a binary relation representing the natural order over the positions in the string, and each $P_\sigma$ is a unary relation representing the set of positions containing the symbol $\sigma$. For example, for $\Sigma = \{a, b\}$ the model for the string $aba$ is
$$M_{aba} = \langle \{1, 2, 3\},\ \{(1,2), (1,3), (2,3)\},\ P_a = \{1, 3\},\ P_b = \{2\} \rangle.$$
We can use models of this form to define a first order (FO) predicate logic over strings in $\Sigma^*$. Let $x, y, \ldots$ denote variables that range over positions in a string. For variables $x$ and $y$, we can then use $x \prec y$ and $\sigma(x)$ for each $\sigma \in \Sigma$ as atomic predicates, which are true when $x$ and $y$ are interpreted as positions related by $\prec$ in a string model and when $x$ is interpreted as a position in the unary relation $P_\sigma$ of a model, respectively. We also assume an additional atomic predicate $x = y$, which is true when $x$ and $y$ are interpreted as the same position. A FO logic is then the set of formulas built recursively out of these atomic predicates, the logical connectives $\neg, \wedge, \vee, \rightarrow$, and the quantifiers $\exists, \forall$ in the usual way. A free variable is a variable not bound by a quantifier; we write $\varphi(x_1, x_2, \ldots, x_n)$ to indicate that $x_1, x_2, \ldots, x_n$ is the exhaustive set of free variables in a FO formula $\varphi$. For example, we can define useful formulas with one free variable that pick out the first and last positions of a string:
$$\mathrm{first}(x) \overset{\text{def}}{=} \neg(\exists y)[y \prec x], \qquad \mathrm{last}(x) \overset{\text{def}}{=} \neg(\exists y)[x \prec y].$$
We also define a two-variable formula for the successor relation (using infix notation):
$$x \triangleright y \overset{\text{def}}{=} x \prec y \wedge \neg(\exists z)[x \prec z \wedge z \prec y].$$
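A formula $\varphi$ with no free variables is called a sentence. Let satisfaction of $\varphi$ by a model $M$, written $M \models \varphi$, be defined in the usual way. The set of strings $L(\varphi)$ described by $\varphi$ is the set of strings $\{w \in \Sigma^* \mid M_w \models \varphi\}$. For example, if
$$\mathrm{last}_a \overset{\text{def}}{=} (\exists x)[\mathrm{last}(x) \wedge a(x)],$$
then $L(\mathrm{last}_a)$ is the set of strings that end in $a$.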
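The relationship between string models, satisfaction, and the stringset of a sentence can be made concrete with a brute-force evaluator. The following is only an illustrative sketch: the function names (`string_model`, `last`, `last_a`, `language`) are hypothetical, and `last` and `last_a` are written under the assumption that they mean "no position follows x" and "some last position is labeled a", respectively.

```python
from itertools import product

def string_model(w):
    """Build the relational model of w: universe, precedence relation, label sets."""
    universe = list(range(1, len(w) + 1))
    precedes = {(i, j) for i in universe for j in universe if i < j}
    labels = {}
    for i, sym in zip(universe, w):
        labels.setdefault(sym, set()).add(i)
    return universe, precedes, labels

def last(model, x):
    """Assumed reading of last(x): no position y satisfies x < y."""
    universe, precedes, _ = model
    return not any((x, y) in precedes for y in universe)

def last_a(model):
    """Assumed reading of the sentence last_a: some position is last and labeled a."""
    universe, _, labels = model
    return any(last(model, x) and x in labels.get("a", set()) for x in universe)

def language(sentence, alphabet, max_len):
    """List all strings of length <= max_len whose model satisfies the sentence."""
    result = []
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            w = "".join(symbols)
            if sentence(string_model(w)):
                result.append(w)
    return result
```

For instance, `language(last_a, "ab", 2)` enumerates the strings of length at most 2 that end in a; quantifiers are simply loops over the finite universe.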
A monadic second order (MSO) logic is a FO logic extended with the ability to quantify over arbitrary sets of positions in the string. Let $X, Y, \ldots$ denote set variables, which range over sets of positions in a string. A MSO logic is thus FO logic to which we add the atomic formulas $X(x)$, $Y(x)$, etc., which are true when $x$ is interpreted as a position in the set assigned to $X$, $Y$, etc., and in which $\exists$ and $\forall$ can also bind set variables.

Logically definable transductions
We can also use logic to define a transduction from an input structure to an output structure, as first introduced by Courcelle (1994) for graphs and later related to string transductions and their automata-theoretic characterizations (Engelfriet and Hoogeboom, 2001;Filiot, 2015). (For an overview of related work see Filiot and Reynier 2016.) In such a logical transduction, the output structure is defined by an interpretation over a finite number of copies of the input structure (where 'interpretation' is used in the sense of a translation from the logical language of one structure into that of another; see, e.g. Hodges 1997). MSO and FO transductions are defined as follows.
Definition 1 (MSO/FO transduction) Given some natural number $k$, an input alphabet $\Sigma$, and an output alphabet $\Gamma$, an MSO (resp. FO) transduction is defined by
• $\varphi_{dom}$, a domain formula, or sentence in the MSO (FO) logic of the input that defines the domain of the transduction,
• for each $1 \le n \le k$ and $\gamma \in \Gamma$, a formula $\varphi^n_\gamma(x)$ in the MSO (FO) logic of the input with exactly one free variable, and
• for each $1 \le n, m \le k$, a formula $\varphi^{n,m}_{\prec_\Gamma}(x, y)$ in the MSO (FO) logic of the input with exactly two free variables.
To restrict our domain to strings in $\Sigma^*$, we include in our domain formula the sentence $\mathrm{string}_\Sigma$, which is satisfied exactly by models that represent strings in $\Sigma^*$.
The output of such a transduction is defined as follows. For each position $x$ in the input and for each $n$ for which exactly one $\varphi^n_\gamma(x)$ is true, a copy of $x$ labeled $\gamma$ appears in the output. For each pair of positions $x, y$ and for each pair $n, m$ for which $\varphi^{n,m}_{\prec_\Gamma}(x, y)$ is true, the $n$th copy of $x$ precedes the $m$th copy of $y$ with respect to the output ordering $\prec_\Gamma$ on positions in the output. For example, to rewrite strings over the alphabet $\Sigma = \Gamma = \{a, b\}$ such that each $b$ immediately following another $b$ in the input is written out as an $a$, we set $k = 1$ and
$$\varphi^1_a(x) \overset{\text{def}}{=} a(x) \vee \big(b(x) \wedge (\exists y)[y \triangleright x \wedge b(y)]\big),$$
$$\varphi^1_b(x) \overset{\text{def}}{=} b(x) \wedge \neg(\exists y)[y \triangleright x \wedge b(y)],$$
$$\varphi^{1,1}_{\prec_\Gamma}(x, y) \overset{\text{def}}{=} x \prec y.$$
This transduction is illustrated in Fig. 2 for an input string $abba$. First, $\varphi^1_a(x)$ is defined to be true for all positions $x$ in the input that are either labeled $a$ or are labeled $b$ but also succeed some input $b$. Thus, the copies corresponding to each $a$ in the input are labeled $a$ in the output, as is the copy of the second $b$ in the input. Likewise, $\varphi^1_b(x)$ is defined to be true for all positions $x$ in the input that are labeled $b$ and do not succeed another $b$. In Fig. 2, this is true for the first $b$ in the input, so its copy is also labeled $b$. As the output order $\varphi^{1,1}_{\prec_\Gamma}(x, y)$ is defined to be true exactly when the input order $x \prec y$ is true, the order is preserved exactly in the output.
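The b-to-a example can be mimicked procedurally. The sketch below is not the paper's formalism itself but a hypothetical Python rendering of it: `phi_a` and `phi_b` play the role of the two labeling formulas for the single copy set (k = 1), each evaluated only against the input string, and the output order simply copies the input order. All function names are illustrative assumptions.

```python
def rewrite_bb(w):
    """k = 1 transduction sketch: a b that immediately follows an input b is output as a."""
    positions = range(len(w))

    def succeeds_b(x):
        # Exists y such that y is the immediate predecessor of x and y is labeled b.
        return any(y + 1 == x and w[y] == "b" for y in positions)

    def phi_a(x):
        # Labeled a in the output: input a, or input b that succeeds an input b.
        return w[x] == "a" or (w[x] == "b" and succeeds_b(x))

    def phi_b(x):
        # Labeled b in the output: input b that does not succeed an input b.
        return w[x] == "b" and not succeeds_b(x)

    out = []
    for x in positions:  # output order mirrors input order (x < y)
        if phi_a(x) and not phi_b(x):
            out.append("a")
        elif phi_b(x) and not phi_a(x):
            out.append("b")
    return "".join(out)
```

Because both formulas consult only the input, `rewrite_bb("bbb")` yields `baa` rather than `bab`: the third b succeeds an input b even though that b is itself rewritten.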
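Note that, as there is only one interpretation per input, these transductions are functional. (For non-functional MSO transductions, see Engelfriet and Hoogeboom 2001.) To give one more example, we can define a transduction that 'doubles' a string, i.e., given an input $w \in \Sigma^*$, outputs $ww$. We set $k = 2$, $\varphi_{dom} \overset{\text{def}}{=} \mathrm{string}_\Sigma$, and
$$\varphi^1_a(x) = \varphi^2_a(x) \overset{\text{def}}{=} a(x), \qquad \varphi^1_b(x) = \varphi^2_b(x) \overset{\text{def}}{=} b(x),$$
$$\varphi^{1,1}_{\prec_\Gamma}(x, y) = \varphi^{2,2}_{\prec_\Gamma}(x, y) \overset{\text{def}}{=} x \prec y,$$
$$\varphi^{1,2}_{\prec_\Gamma}(x, y) \overset{\text{def}}{=} \mathrm{True}, \qquad \varphi^{2,1}_{\prec_\Gamma}(x, y) \overset{\text{def}}{=} \mathrm{False}.$$
(True and False indicate that a formula evaluates to true or false for any input positions.) This is interpreted as follows. Each $a$ and $b$ in the input is given two identical copies in the output. As both $\varphi^{1,1}_{\prec_\Gamma}(x, y)$ and $\varphi^{2,2}_{\prec_\Gamma}(x, y)$ are set equal to $x \prec y$, the first set of copies has the same order as the input, as does the second. That $\varphi^{1,2}_{\prec_\Gamma}(x, y)$ is set to True states that all copies in the first set precede all copies in the second set; this establishes an order between the two sets of copies. Finally, setting $\varphi^{2,1}_{\prec_\Gamma}(x, y)$ to False ensures that second copies never precede first copies. An example with $abba$ in the input is depicted in Fig. 3.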
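The doubling transduction can likewise be sketched in code. In this hypothetical rendering, output nodes are pairs (copy index, input position), `label` stands in for the labeling formulas, and `precedes` stands in for the four order formulas; the output string is read off by sorting the nodes according to `precedes` alone. The names are assumptions made for illustration.

```python
from functools import cmp_to_key

def double(w):
    """k = 2 transduction sketch: output ww, using only formulas over the input."""
    positions = range(len(w))

    def label(n, x):
        # Both copies keep the input label.
        return w[x]

    def precedes(n, m, x, y):
        # Order formulas: within a copy set use the input order;
        # copy set 1 always precedes copy set 2, never the reverse.
        if n == m:
            return x < y
        return (n, m) == (1, 2)

    # Deliberately build the nodes in a scrambled order; the output order
    # must come from the order formulas only.
    nodes = [(n, x) for x in positions for n in (2, 1)]

    def compare(p, q):
        if precedes(p[0], q[0], p[1], q[1]):
            return -1
        if precedes(q[0], p[0], q[1], p[1]):
            return 1
        return 0

    nodes.sort(key=cmp_to_key(compare))
    return "".join(label(n, x) for n, x in nodes)
```

Sorting is safe here because the order formulas define a strict total order on the output nodes; for outputs that are not strings, one would keep the relation itself instead of linearizing it.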

Figure 3: Doubling abba
This second example shows that we can freely manipulate the order of elements in the output; indeed, the output need not be a string. In fact, we can define the output structure to have a new binary relation $R$ not present in the input structure by defining a predicate $\varphi^{m,n}_R(x, y)$ in terms of the MSO logic of the input structure. We make use of this in §4.2. Importantly, as they are defined in terms of interpretations, both MSO and FO transductions are closed under composition for graph structures in general (Courcelle, 1994).

Logic and phonology
Because of its well-known connections to computational complexity, we can apply logic to the study of the complexity of phonological patterns. This section reviews relevant results from the study of phonotactic (phonological surface well-formedness) patterns as stringsets and phonological transformations (mappings from URs to SRs) as transductions. Both show that MSO-definability is a clear, if loose, bound on the complexity of phonology.

Stringsets
Phonotactics are language-specific well-formedness constraints on how sounds can be combined to create words. An example from Kagoshima Japanese is given in Table 1: all words have a high tone either on the final or penultimate mora (Kubozono, 2012).
All previous work on natural language phonotactics as stringsets has found these patterns to be at most Regular stringsets, with all but a few exceptions being sub-Star Free (Heinz, 2007, 2009, 2010; Heinz et al., 2011; Rogers et al., 2013). In logical terms, this means that definability in MSO is a clear bound on the complexity of phonotactics, with most patterns being FO-definable. To illustrate, the stringset representing the Kagoshima tone pattern can be defined with an FO sentence such as
$$(\mathrm{last}_H \vee \mathrm{penult}_H) \wedge (\forall x, y)[(H(x) \wedge H(y)) \rightarrow x = y],$$
where $\mathrm{last}_H$ is defined as $\mathrm{last}_a$ above and $\mathrm{penult}_H$ is defined as $(\exists x, y)[x \triangleright y \wedge \mathrm{last}(y) \wedge H(x)]$. This sentence describes the set of strings that have exactly one H, either in final or penultimate position. These results are important for a theory of phonology because they allow for the hypothesis that phonotactics are at most MSO-definable, a hypothesis which can be interpreted in terms of cognitive complexity (Rogers and Pullum, 2011; Rogers et al., 2013) and how humans learn phonotactics (Heinz, 2010; Lai, 2015; McMullin and Hansson, 2015). More restrictive characterizations exist, based on subclasses of the Star-Free stringsets (see, e.g., Heinz, 2010), but for the present purposes it is enough to consider FO- and MSO-definability.
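As a rough illustration of this phonotactic constraint viewed as a stringset, the following sketch checks membership directly. It assumes, hypothetically, that words are encoded as strings over {H, L} with one symbol per mora; the function name `kagoshima_ok` is likewise an assumption.

```python
def kagoshima_ok(w):
    """Check the pattern: exactly one H, located on the final or penultimate mora.

    Assumes w is a string over {"H", "L"}, one symbol per mora.
    """
    positions = range(len(w))
    # last_H: some position is final and labeled H.
    last_h = any(x == len(w) - 1 and w[x] == "H" for x in positions)
    # penult_H: some position immediately precedes the final one and is labeled H.
    penult_h = any(x == len(w) - 2 and w[x] == "H" for x in positions)
    # Uniqueness: any two positions labeled H are the same position.
    at_most_one_h = all(x == y for x in positions for y in positions
                        if w[x] == "H" and w[y] == "H")
    return (last_h or penult_h) and at_most_one_h
```

Each of the three conditions inspects only bounded neighborhoods and pairs of positions, which is why no set quantification is needed to state the pattern.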

Transductions
We can also fruitfully apply logical transductions to phonological theory (Heinz, forthcoming), as mainstream theories of generative phonology aim to explain linguistic sound patterns through a transformation from an input UR to an output SR (Chomsky and Halle, 1968; Prince and Smolensky, 2004). Indeed, these transformations have been studied from an automata-theoretic perspective, leading to restrictive characterizations of phonology. Johnson (1972) and Kaplan and Kay (1994) show that the phonological rewrite rules of Chomsky and Halle (1968) are describable with finite state machines; that is, that they describe Regular relations. Subsequent work on phonological transformations has demonstrated that a wide variety of processes, including local assimilation, deletion, and epenthesis (Chandlee, 2014), dissimilation (Payne, 2014), metathesis (Chandlee and Heinz, 2012), and vowel harmony (Heinz and Lai, 2013), lie in even more restrictive subclasses of Regular string transductions. The single known possible exception to this is full reduplication, i.e., the copying over of an entire input form, as in the Indonesian bukubuku 'books', lit. 'book-book' (Sneddon et al., 2010). This is not a Regular relation, although it can be argued that this process is morphological and not phonology proper (for discussion see Chandlee and Heinz, 2012).
From the logical perspective, all of these results place phonological transformations squarely within the class of MSO-definable transductions. Any functional Regular relation is MSO-definable (Filiot and Reynier, 2016), so any phonological transformation describable with a (functional) rewrite rule is MSO-definable. Even full reduplication is FO-definable, as witnessed by the string doubling transduction defined in §2.2. Thus, MSO-definability appears to be a loose, yet clear, bound on the computational complexity of phonological transformations.

Interim summary
The above has reviewed the evidence for MSO-definability as a complexity bound on phonology. The advantage of viewing such a complexity bound in logical terms is that we are able to view the complexity of both phonotactics and transformations in unified terms.
A further advantage of the logical perspective is that it extends to non-string representations. The remainder of the paper studies autosegmental representations (ARs) in these same terms.

The complexity of tone mapping
This section motivates tone mappings and ARs using a well-known empirical case, and then shows that tone mapping is not MSO-definable.

Tone mapping in Mende
Mende (Leben, 1973, 1978) is a classic example of a tone pattern which has been argued to be best analyzed in terms of autosegmental mapping of tones to syllables. Mende nouns fall into one of five categories, the first being words for which all syllables are pronounced with a high tone; the categories are exemplified in Table 2 (Leben, 1978). Of interest is the fact that contour tones (that is, rising- and falling-toned syllables) and plateaus, or sequences of like-toned syllables, only occur on the right edge of the word. For example, [nyàhâ] 'woman' is attested, but a word like *[nyâhá], with a contour-toned syllable on the left edge, is not attested. Likewise, words like [félàmà] 'junction', with a sequence of two low-toned syllables on the right edge, are commonly attested, whereas words like *[fèlàmá] are rare.¹ Furthermore, Mende words conform to one of the five tonal shapes exemplified in Table 2; words showing a falling-rising pattern, for example, are unattested.
Furthermore, these tonal shapes are maintained when toneless suffixes are affixed to the noun, resulting in the tone of the suffix varying depending on the tone pattern of its root. The data in Table 3 (Leben, 1978) illustrate this with the toneless suffix /-ma/ 'on'. Note that the suffixed forms also preserve the generalizations noted above restricting contours and plateaus to the right edge, and so the tone pattern 'stretches' to accommodate the new syllable. Thus [mbû] 'owl', which has a contour falling tone in isolation, is realized with a sequence of pure high- and low-toned syllables as [mbú-mà] 'on owl' when suffixed.
Following a proposal by Leben (1973) and subsequent work in autosegmental phonology (e.g., Goldsmith, 1976; Pulleyblank, 1986; Yip, 2002), tone patterns like Mende's have been explained by a left-to-right mapping of a melody, or string of tonal units, to a string of syllables. These disjoint strings are referred to as tiers, and the representation as a whole is an AR. For example, the words in Table 2, Row 3 share a HL (high-low) melody, which is then mapped to the syllables in the words as depicted in Fig. 4. Following convention, syllables are denoted with σ and the association relation is depicted as lines drawn between units on distinct tiers.
Figure 4: ARs illustrating mapping of HL melody to words of various syllable length
Thus, the contour falling tone of [mbû] 'owl' is the result of an HL sequence associating to a single syllable; likewise the plateau of low-toned syllables in [félàmà] 'junction' is the result of an L tone associating to multiple syllables. The question then is how to restrict association such that this multiple association occurs only on the right edge of the word. Because this association pattern holds for all lexical items, including the suffixed forms in Table 3, it is entirely predictable and is taken not to be present in the UR of a word. Thus, it must be created by some phonological transformation that associates tones to syllables. This transformation has been analyzed as proceeding according to laws often referred to as the well-formedness conditions (WFCs). The following definition is due to Yip (2002).
Definition 2 The well-formedness conditions
a. Every syllable must have a tone.
b. Every tone must be associated to some syllable.
c. Association proceeds one-to-one, left-to-right.
d. Association lines must not cross.
Intuitively, the WFCs ensure that in the SR, every tone is associated to some syllable, and vice versa, by a step-by-step process in which the first tone and first syllable are associated, then the second tone and second syllable, and so on. (It bears mentioning that 'one-to-one' here is used not as it is to describe mathematical functions, but in terms of how pairs of tones and syllables are associated one after another.) If there are remaining tones or syllables on the right edge of a tier that have not been paired off, WFCs (a), (b), and (d) associate them to the rightmost unit on the opposite tier: (a) and (b) require all units to be associated, but (d) forbids the crossing of any existing associations to do so. Fig. 5 shows how this process works for [nyàhâ] 'woman' and [nyàhá-mà] 'on woman', both of which have an underlying LHL melody, and for [félàmà] 'junction', which has an underlying HL melody. This figure demonstrates that the WFCs explain the generalization in Mende that contours and plateaus only occur on the right edge of the word through a transformation from a UR with no association to a fully associated SR.
Figure 5: Step-by-step breakdown of the association transformation
The WFCs have been shown not to be strictly universal; whether tones are associated from left-to-right or right-to-left, and whether contours are built out of leftover tones or leftover tones are simply left unpronounced, have been shown to vary from language to language (Goldsmith, 1976; Newman, 1986; Pulleyblank, 1986; Hewitt and Prince, 1989; Archangeli and Pulleyblank, 1994; Yip, 2002). Also, there have since been non-derivational approaches to tone mapping (Zoll, 2003). However, all generative explanations of tone mapping patterns that use ARs rely on the idea of an unassociated UR being transformed into a SR with associations, with one-to-one association forming the basis of the transformation. The following shows that such a transformation is not a MSO-definable transduction.

Tone mapping as a logical transduction
We can characterize this transformation, as depicted in Fig. 6, as a transduction that takes two strings of length $n$ and $m$, respectively ($n, m > 0$), as input and adds an association relation between the positions in the strings that follows the WFCs outlined in Def. 2. As long as we can assume there is some property that distinguishes between units on each tier, we can abstract away from distinctions between units on a particular tier and instead focus on predicates $a(x)$ and $b(x)$, which are true if and only if $x$ is on the 'upper' and 'lower' tier, respectively. (For example, $a(x)$ can mean '$x$ is a tone' and $b(x)$ can mean '$x$ is a syllable'.) In terms of relational models, the transduction takes models of the form $\langle U, \prec, P_a, P_b \rangle$ and returns models of the form $\langle U, \prec, P_a, P_b, \bullet \rangle$, where $\bullet$ denotes a new relation representing association. This relation must conform to the WFCs in Def. 2; this is formalized in the definition of the association relation in Def. 3.
Definition 3 For an AR whose tiers are a pair of disjoint strings $a_1 a_2 \ldots a_n$ and $b_1 b_2 \ldots b_m$, and for $\ell = \min(n, m)$, the association relation $\bullet$ is the unique relation that is the symmetric closure of the set consisting of all pairs $(a_i, b_i)$ for $1 \le i \le \ell$, together with $(a_n, b_{n+1}), \ldots, (a_n, b_m)$ if $n = \ell$, or $(a_{m+1}, b_m), \ldots, (a_n, b_m)$ if $m = \ell$.
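Def. 3 can be read procedurally. The sketch below (with the hypothetical name `associate`) returns only the tone-to-syllable direction of the relation as 1-based index pairs, leaving the symmetric closure implicit; it assumes the a tier holds tones and the b tier holds syllables.

```python
def associate(n, m):
    """Association pairs (tone index, syllable index) for tiers of length n and m.

    Follows Def. 3: pair off units one-to-one from the left, then link any
    leftover units to the final unit of the shorter tier.
    """
    if n < 1 or m < 1:
        raise ValueError("both tiers must be non-empty")
    shorter = min(n, m)
    pairs = [(i, i) for i in range(1, shorter + 1)]
    if n <= m:
        # Leftover syllables all associate to the last tone (a plateau).
        pairs += [(n, j) for j in range(n + 1, m + 1)]
    else:
        # Leftover tones all associate to the last syllable (a contour).
        pairs += [(i, m) for i in range(m + 1, n + 1)]
    return pairs
```

For example, `associate(2, 3)` corresponds to an HL melody on a three-syllable word such as [félàmà], and `associate(2, 1)` to the falling contour of [mbû]. The loop-free definition is easy to compute, but note that it depends on comparing n and m, which are unbounded.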
No MSO definition of this relation is possible. To show how, we can rely on two facts established above: 1) MSO transductions are essentially interpretations where relations in the output structure are defined in terms of the logical language of the input structure; and 2) MSO transductions are closed under composition.
First, we can instead consider as input strings of the form $a^n b^m$ (again, with $n, m > 0$). This set is MSO-definable; one formula that witnesses this is
$$\mathrm{string}_{a^n b^m} \overset{\text{def}}{=} \mathrm{string}_\Sigma \wedge (\exists x)[a(x)] \wedge (\exists y)[b(y)] \wedge (\forall x, y)[(a(x) \wedge b(y)) \rightarrow x \prec y].$$
These strings are equivalent, in terms of MSO, to pairs of disjoint strings of shape $a^n$ and $b^m$. The reason is that we can write a MSO transduction from one to the other. If $\prec$ is the ordering in the input $a^n b^m$ string, we simply define the order $\prec_\Gamma$ for the output structure such that it omits all precedence between $a$ positions and $b$ positions.
An example of this is given in Fig. 7.
Figure 7: Relating $a^n b^m$ strings to disjoint tiers $a^n$ and $b^m$.
Thus, we know that MSO statements over pairs of disjoint strings $a^n$ and $b^m$ are equivalent to statements over strings in $a^n b^m$. We can then use this to prove Theorem 1.
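The step from single strings to disjoint tiers can be sketched as follows. This is an illustrative rendering only: the function name `to_tiers` and the use of a regular expression for the domain check are assumptions. The output order keeps a precedence pair only when both positions carry the same label, so no a position is ordered with respect to any b position.

```python
import re

def to_tiers(w):
    """Map a string of shape a^n b^m (n, m > 0) to two unordered-relative tiers.

    Returns (labels, order), where order keeps precedence only between
    positions on the same tier.
    """
    if not re.fullmatch(r"a+b+", w):
        raise ValueError("input must have the shape a^n b^m with n, m > 0")
    positions = range(len(w))
    labels = {x: w[x] for x in positions}
    # Output order: x precedes y in the input AND x, y have the same label.
    order = {(x, y) for x in positions for y in positions
             if x < y and w[x] == w[y]}
    return labels, order
```

The output order is defined by a quantifier-free condition on the input, which is why this step stays well within MSO (indeed FO).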
Theorem 1 Association between two disjoint strings according to the WFCs in Definition 2 is not MSO definable.
Proof: The proof is by contradiction. Assume the contrary, that we can define using MSO a transduction that takes disjoint pairs of strings $a^n$ and $b^m$ and outputs them as associated autosegmental representations with some association relation $\bullet$ that obeys the WFCs as defined in Def. 3.
With this relation we can write a sentence $\varphi_{eq} \overset{\text{def}}{=} (\forall x, \exists y, \forall z)[x \bullet y \wedge (x \bullet z \rightarrow z = y)]$, which holds when every position is associated to exactly one position. If $\bullet$ obeys the WFCs, then per the discussion of the structure of $\bullet$ in Def. 3, $\varphi_{eq}$ is only true for structures whose association relation is the set of pairs of the form $(a_i, b_i)$ (and their converses). That is, the only structures for which $\varphi_{eq}$ is true are pairs of disjoint strings $a^n$ and $b^m$ for which $n = m$. Now consider strings of the form $a^n b^m$. As shown above, there is a MSO transduction from these strings to disjoint pairs of $a^n$ and $b^m$. By assumption, there is then a MSO transduction that adds an association relation $\bullet$ to these pairs. Because MSO transductions are closed under composition, $\varphi_{eq}$ can be translated into the MSO language of strings over $\{a, b\}$. Thus the sentence $\mathrm{string}_{a^n b^m} \wedge \varphi_{eq}$ describes exactly the set of strings $a^n b^m$ for which $n = m$. It is well known that this set is not Regular (see, e.g., Hopcroft et al., 2006), and thus not MSO-definable. Thus we have a contradiction, and so the assumption must be false.
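The key step of the proof, that the sentence requiring every position to have exactly one associate holds only when the tiers have equal length, can be checked by brute force for small tiers. The sketch below is a sanity check rather than a proof; the names `association` and `phi_eq` are hypothetical, and positions are encoded as (tier, index) pairs.

```python
def association(n, m):
    """Symmetric association relation of Def. 3 over positions ("a", i) and ("b", j)."""
    shorter = min(n, m)
    pairs = {(("a", i), ("b", i)) for i in range(1, shorter + 1)}
    # Leftover b positions (if m > n) link to the last a position.
    pairs |= {(("a", n), ("b", j)) for j in range(shorter + 1, m + 1)}
    # Leftover a positions (if n > m) link to the last b position.
    pairs |= {(("a", i), ("b", m)) for i in range(shorter + 1, n + 1)}
    return pairs | {(y, x) for x, y in pairs}

def phi_eq(n, m):
    """True iff every position is associated to exactly one position."""
    rel = association(n, m)
    universe = ([("a", i) for i in range(1, n + 1)]
                + [("b", j) for j in range(1, m + 1)])
    return all(sum((x, y) in rel for y in universe) == 1 for x in universe)
```

On every pair of tier lengths tried, `phi_eq(n, m)` agrees with `n == m`, which is the length comparison that a Regular (MSO-definable) stringset cannot perform.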
Importantly, because MSO transductions are closed under composition, there is no breakdown of this process into a finite set of composite steps (such as those illustrated in Fig. 5) that are themselves MSO-definable. Note also that the proof highlights that in particular it is the one-to-one requirement on association that makes it not MSO-definable: it is this property that introduces the ability to check whether the a and b tiers have the same length.

Interpreting the result
We have thus shown that, as a transduction from an unassociated pair of tiers to an associated one following the WFCs in Def. 2, tone mapping is not MSO-definable. This puts it in sharp contrast to all other phonological UR-SR transductions for which complexity results exist: as discussed in §3.2, these processes have been shown to be well within MSO-definability. This makes tone mapping aberrant in terms of its computational complexity. How do we interpret this result?
One answer is to take this as evidence that tone has access to more computational power than other parts of phonology. In fact, this has been argued by Jardine (2016a) on the basis of comparisons between the complexity of tonal phenomena and segmental phenomena when viewed as string transductions. However, an issue with this interpretation is that the tonal phenomena that Jardine cites are still Regular relations and thus MSO-definable. Thus tone mapping is still highly complex, even compared to other tonal phenomena.
Another possible interpretation is that tone mapping is simply an incorrect characterization of the data available. For example, both Dwyer (1978) and Shih and Inkelas (2015) take issue with Leben's (1973) tone mapping characterization of Mende, and offer alternative explanations using representational assumptions that do not require tone mapping (or an analogue thereof). However, other tone patterns that have been successfully accounted for using tone mapping include those of Hausa (Newman, 1986, 2000), Kukuya (Hyman, 1987), and the wide variety of languages analyzed in Goldsmith (1976), Pulleyblank (1986), and Zoll (2003). The alternative explanations mentioned above have yet to be shown to enjoy the same broad empirical coverage (though future work may indeed show this).
Another explanation is that it is wrong to assume that the mapping generalization holds for tiers of unbounded length. The proof above relies on the fact that the input strings are of the form $a^n b^m$ for any $n, m$; if either $n$ or $m$ had some bound, the proof would no longer hold. Indeed, Yli-Jyrä (2013) gives a finite-state (and thus MSO-definable) implementation of tone mapping given the assumption that the tonal tier is bounded. However, it is not clear that this can be assumed for all cases. For example, in Kikuyu (Clements and Ford, 1979), morphological concatenation can extend the tonal tier before mapping occurs. Regardless, "tonal tiers must be bounded" is a hypothesis worth further testing, as the result here shows it has consequences for the complexity of phonology.
A final interpretation of the result is to posit that the one-to-one property of tone mapping is a phonological universal and is thus not relevant to the study of the complexity of language-specific phonological phenomena. As noted in §4, there are languages whose patterns have been shown to violate the WFCs in Def. 2 with respect to directionality and whether or not all tones or syllables are associated. However, the one-to-one property appears to be shared by all such patterns. Jardine (2016b, to appear) demonstrates for many of these patterns that, if one-to-one association is assumed in the representation, these language-specific constraints on association can be described with a restricted propositional logic, well within the complexity of MSO. Thus, if we isolate the one-to-one property of association, shown in the proof of Thm. 1 to be responsible for its non-definability in MSO, from the other aspects of tone mapping, then we can maintain MSO-definability as a cohesive bound on the complexity of language-specific phonological phenomena. How this separation might be implemented in a theory of tone will be left for future work.

Deriving autosegmental representations from single strings
The result in the previous section raises an important question: How powerful are ARs? Specifically, does invoking ARs allow for grammars that are too expressive to provide a reasonable theory of phonological patterns? To put it in more concrete terms, we can represent the tone pattern of a word either as a string of toned syllables or as an AR.² Table 4 gives some examples, two from Mende and one hypothetical, where strings are over an alphabet $\{H, L, F, R\}$ whose symbols represent high-, low-, falling-, and rising-toned syllables, respectively. Thus, for any string over $\{H, L, F, R\}$, there is a corresponding autosegmental representation. Note that these ARs obey WFCs (a), (b), and (d) from Def. 2, in that each tone is associated to a syllable and vice versa, and these association lines do not cross. However, the AR for LLRH violates WFC (c), as the tones have not associated in a left-to-right manner. We can thus talk about ARs that obey (a), (b), and (d), but will ignore (c), as the latter would restrict us to a subset of $\{H, L, F, R\}^*$.
We can then talk about strings and their corresponding ARs. Jardine and Heinz (2015) define such a relationship in terms of concatenation, but they do not address how this relationship connects to complexity. As discussed in §3.1, natural language phonotactics are largely describable with FO-definable stringsets. The question then is, given, for example, an FO logic over ARs, can we describe sets of strings that are not FO-definable? This is a valid question as, for example, even restrictions on FO over trees can generate Context Free stringsets (Rogers, 1997).
The following demonstrates otherwise: ARs are FO-definable from strings, and thus any FO formula over ARs is translatable into a FO formula over strings (given $\prec$). It should be noted that autosegmental phonology is not a monolithic theory, and in practice various definitions of ARs have been proposed (one formal overview can be found in Coleman and Local, 1991). A full survey of these and how they might be defined is beyond the scope of this paper; instead, this section focuses on demonstrating that basic ARs obeying the WFCs (a), (b), and (d) in Def. 2 are FO-definable from strings over $\{H, L, F, R\}$. In other words, we formalize the relationship between strings and ARs exemplified in Table 4. This illustrates that the fundamental idea of autosegmental structure, namely distinct tiers associated to one another according to some well-formedness conditions, is FO-definable from strings.

Definition of transduction
We define a transduction from string models of the form ⟨U, ≺, P_H, P_L, P_F, P_R⟩ to autosegmental models whose signature comprises a successor relation ⊳′, an association relation •, and the unary labels H, L, and σ. Essentially, we define the transformation from the second column of Table 4 to the third, for all strings in {H, L, F, R}*. We do this by defining the notion of a span, or a series of consecutive positions in U that share the same tone and thus will be associated to the same tone on the melody tier in the output AR. We then create extra copies of each element in U that represents a change in spans. These extra copies become the tones in the melody tier. Note that the order in the output is a successor relation ⊳′; this is not essential, but was chosen for two reasons. One, it is more straightforward to depict in the examples below. Two, its definition gives formal weight to an idea long noted by phonologists: local relationships (i.e. those over ⊳′) in ARs correspond to long-distance relationships (i.e. those over ≺) in strings (see, e.g., Odden, 1994).
First, we define some useful formulas and notational shortcuts. The first denotes when y lies between some x and z.
We then define formulas in the logic of the input string that represent the tonal relationships between units in the string. The following formula sametone(x, y) is true when x ends with the same tone that y begins with (thus it is true for an H and an F pair as F starts high).
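One way to spell out this condition in the FO logic of the string model is sketched below. It is a reconstruction from the prose description only, assuming the unary predicates P_H, P_L, P_F, and P_R of the input signature: x ends high if it is an H or an R, x ends low if it is an L or an F, y begins high if it is an H or an F, and y begins low if it is an L or an R.

```latex
\[
\mathrm{sametone}(x, y) \;:=\;
\bigl( (P_H(x) \lor P_R(x)) \land (P_H(y) \lor P_F(y)) \bigr)
\;\lor\;
\bigl( (P_L(x) \lor P_F(x)) \land (P_L(y) \lor P_R(y)) \bigr)
\]
```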
We do this because we will create tones on the melody tier exactly at syllables where there is a change in tone. This marks the beginning of a span of one or more like-toned syllables.
For example, consider the strings HLL, LF, and LLRH. The positions that satisfy spanfirst(x) are the H and the first L in HLL, both positions in LF, and only the first L in LLRH. Note that neither the R nor the H in LLRH satisfies this formula, because sametone(x, y) returns true when x is L and y is R and likewise when x is R and y is H.
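The behaviour of these two predicates on the three example strings can be checked with a minimal Python sketch. It assumes a particular reading of spanfirst(x), namely that x is either string-initial or immediately preceded by a position whose final tone differs from the initial tone of x; the tables FIRST and LAST and the helper spanfirst_positions are illustrative names, not part of the formal definitions.

```python
# Tone that each symbol begins and ends with (F = high-low, R = low-high).
FIRST = {"H": "H", "L": "L", "F": "H", "R": "L"}
LAST = {"H": "H", "L": "L", "F": "L", "R": "H"}


def sametone(s, x, y):
    """True when position x ends with the tone that position y begins with."""
    return LAST[s[x]] == FIRST[s[y]]


def spanfirst(s, x):
    """Assumed reading: x is string-initial, or the tone changes between x-1 and x."""
    return x == 0 or not sametone(s, x - 1, x)


def spanfirst_positions(s):
    """0-indexed positions of s that begin a span."""
    return [x for x in range(len(s)) if spanfirst(s, x)]


for word in ("HLL", "LF", "LLRH"):
    print(word, spanfirst_positions(word))
```

Positions are 0-indexed here, so for LLRH only position 0 (the first L) is reported as span-initial.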
We can then define the transduction from strings to autosegmental representations by setting k = 3. The first set of copies carries over the syllables, and the second carries over the span-initial tones. The third set of copies is for creating the additional tones in the F and R contours. (In general, for strings whose symbols represent contours of at most length n, k = n + 1.) We define the unary labeling relations in the autosegmental representation as follows.
All other unary formulas are set to False. This works as depicted in Fig. 8 for the strings HLL, LF, and LLRH. Each set of copies in the output is organized into a labeled row.
As ϕ^1_σ(x) is set to True, every element in the input has a copy in set 1 labeled σ. This corresponds to the intuition that each position in the input string represents a syllable. In copy set 2, Hs and Ls are copied over only at the positions in a string representing a change in tone, as ϕ^2_H(x) and ϕ^2_L(x) are defined to be true only when spanfirst(x) is true. (Note again that for the string LLRH, this is false for the positions labeled R and H.) Finally, ϕ^3_H(x) and ϕ^3_L(x) ensure that the contour-toned syllables F and R in the input are given a third copy labeled L and H, respectively, in the output. The next step is to define an order on the elements of the output. Again, we define the successor relation ⊳′. Like the definition of the unary relations, this definition will make use of the spanfirst(x) and span(x, y) formulas.
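The labeling of the three copy sets can be illustrated with a short Python sketch. It assumes that spanfirst(x) holds exactly when x is string-initial or follows a position ending in a different tone; the function label_copies and the (position, copy) keys are illustrative conventions, with positions 0-indexed and copies numbered 1 to 3.

```python
# Tone that each symbol begins and ends with (F = high-low, R = low-high).
FIRST = {"H": "H", "L": "L", "F": "H", "R": "L"}
LAST = {"H": "H", "L": "L", "F": "L", "R": "H"}


def spanfirst(s, x):
    """Assumed reading: x is string-initial, or the tone changes between x-1 and x."""
    return x == 0 or LAST[s[x - 1]] != FIRST[s[x]]


def label_copies(s):
    """Map each licensed (position, copy) pair to its label; absent pairs are unlabeled."""
    labels = {}
    for x, sym in enumerate(s):
        labels[(x, 1)] = "σ"              # copy 1: every position is a syllable
        if spanfirst(s, x):
            labels[(x, 2)] = FIRST[sym]   # copy 2: the tone that opens a new span
        if sym in ("F", "R"):
            labels[(x, 3)] = LAST[sym]    # copy 3: the second tone of a contour
    return labels


for word in ("HLL", "LF", "LLRH"):
    print(word, label_copies(word))
```

For LLRH this yields four σ copies, a single second copy (the L at position 0), and a single third copy (the H contributed by R), in line with the remark that the R and H positions receive no second copy.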
We set ϕ^{m,n}_{⊳′}(x, y) for all other values of m and n to False. These definitions work, as illustrated in Fig. 9, as follows. The formula ϕ^{1,1}_{⊳′}(x, y) copies the successor relation from the input faithfully for the first copies (i.e. those labeled σ). Next, ϕ^{2,2}_{⊳′}(x, y) draws a successor relation between the second copies of the initial positions of adjacent spans. Finally, the remaining formulas deal with the extra elements in a contour. The formula ϕ^{2,3}_{⊳′}(x, y) draws a successor relation from the second copy of a contour-toned syllable to its third (i.e., between the two tones in the contour) when the contour is first in a span (e.g. F in LF). In case the first part of a contour is part of a previous span (e.g. in the case of R in LLRH), it draws a successor relation from the first position in the previous span to the second part of the contour. The formulas ϕ^{3,2}_{⊳′}(x, y) and ϕ^{3,3}_{⊳′}(x, y) then similarly draw a successor relation from the second part of a contour to the initial position of the next span or to the second part of the next contour, respectively (these latter two formulas are not used in the examples).
Finally, we draw the associations between the tiers. This is relatively simple: we define formulas that relate a second or third copy of a node with its own first copy as well as any first copies in its span (and vice versa, to obtain a symmetric relation).
We set ϕ^{m,n}_•(x, y) for all other values of m and n to False. This obtains the association relations as depicted in Fig. 10.
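Putting the labels, the output successor relation, and the association relation together, the whole construction can be sketched procedurally in Python. This is an illustration under stated assumptions rather than the logical definition itself: it walks the string left to right instead of evaluating FO formulas, it assumes that a span begins wherever the tone changes, and string_to_ar and its return format are hypothetical.

```python
# Tone that each symbol begins and ends with (F = high-low, R = low-high).
FIRST = {"H": "H", "L": "L", "F": "H", "R": "L"}
LAST = {"H": "H", "L": "L", "F": "L", "R": "H"}


def string_to_ar(s):
    """Return (labels, succ, assoc) over (position, copy) nodes for a nonempty s."""
    n = len(s)
    labels = {(x, 1): "σ" for x in range(n)}
    melody = []   # tone nodes in left-to-right melody order
    assoc = set()
    for x, sym in enumerate(s):
        if x == 0 or LAST[s[x - 1]] != FIRST[sym]:
            labels[(x, 2)] = FIRST[sym]       # a new span starts here
            melody.append((x, 2))
        assoc.add((melody[-1], (x, 1)))       # first tone of syllable x
        if sym in ("F", "R"):
            labels[(x, 3)] = LAST[sym]        # second tone of the contour
            melody.append((x, 3))
            assoc.add((melody[-1], (x, 1)))
    # Successor on the syllable tier, then successor on the melody tier.
    succ = {((x, 1), (x + 1, 1)) for x in range(n - 1)}
    succ |= set(zip(melody, melody[1:]))
    # Association is symmetric.
    assoc |= {(b, a) for (a, b) in assoc}
    return labels, succ, assoc


labels, succ, assoc = string_to_ar("LLRH")
print(sorted(succ))
print(sorted(assoc))
```

For LLRH the melody tier consists of the second copy of position 0 (labeled L) followed by the third copy of position 2 (labeled H), so a single pair from a second copy to a third copy provides the melody successor, matching the case described for R in LLRH.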
As the reader can confirm, we have thus obtained the relationship between strings and ARs as originally exemplified in Table 4.
Proof: (Sketch.) WFCs (a) and (b) in Def. 2 stipulate that every syllable is associated to a tone and vice versa. Note that for any position x in the input string, either spanfirst(x) is true or span(y, x) is true for some y. The formulas defining the • relation ensure that all such pairs are associated, that each position satisfying spanfirst(x) is associated with its copy on the melody tier, and that this relation is symmetric. For WFC (d), the fact that the definitions depend on both spanfirst(x) and span(x, y) means that no association line will 'cross' into a new span.
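The three conditions discussed in the proof sketch can also be checked empirically on short strings. The Python sketch below is an assumption-laden illustration, not a substitute for the proof: it builds the melody by merging adjacent identical tones, represents associations as (tone index, syllable index) pairs, and uses the hypothetical helpers string_to_ar and well_formed.

```python
from itertools import product

# Tonal content of each symbol (F = high-low, R = low-high).
TONES = {"H": "H", "L": "L", "F": "HL", "R": "LH"}


def string_to_ar(s):
    """Return (melody, assoc): a list of tones and a set of (tone, syllable) index pairs."""
    melody, assoc = [], set()
    for j, sym in enumerate(s):
        for tone in TONES[sym]:
            if not melody or melody[-1] != tone:
                melody.append(tone)          # the tone changes: add a melody unit
            assoc.add((len(melody) - 1, j))  # link syllable j to the current unit
    return melody, assoc


def well_formed(s):
    """Check WFCs (a), (b), and (d) as described in the text."""
    melody, assoc = string_to_ar(s)
    every_syllable = {j for _, j in assoc} == set(range(len(s)))
    every_tone = {t for t, _ in assoc} == set(range(len(melody)))
    no_crossing = all(
        not (t1 < t2 and j1 > j2)
        for (t1, j1) in assoc
        for (t2, j2) in assoc
    )
    return every_syllable and every_tone and no_crossing


# Exhaustively check all strings over {H, L, F, R} of length 1 to 4.
print(all(well_formed("".join(w))
          for n in range(1, 5)
          for w in product("HLFR", repeat=n)))
```

An exhaustive check of this kind only covers bounded lengths; the general claim rests on the argument in the proof sketch.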

Discussion
We have thus demonstrated a set of ARs that are FO-definable from strings representing sequences of toned syllables. Because this transduction is defined as an interpretation of the input structure, the relations in the AR model are equivalent to FO statements in the string model. For example, the atomic formula x ⊳′ y in the AR model is true when any of the formulas ϕ^{m,n}_{⊳′} is true. In other words, x ⊳′ y ≡ ϕ^{1,1}_{⊳′}(x, y) ∨ ϕ^{2,2}_{⊳′}(x, y) ∨ ϕ^{2,3}_{⊳′}(x, y) ∨ ϕ^{3,2}_{⊳′}(x, y) ∨ ϕ^{3,3}_{⊳′}(x, y). The same is true for the other atomic formulas x • y, H(x), L(x), and σ(x) in the FO logic of the AR model. This means that any FO formula in the logic of the AR model can be translated into the FO logic of the string model. Thus, FO over these ARs is no more expressive than FO over the string model.
One caveat is that in the definition for the AR successor order, ϕ^{2,2}_{⊳′}(x, y), ϕ^{2,3}_{⊳′}(x, y), ϕ^{3,2}_{⊳′}(x, y), and ϕ^{3,3}_{⊳′}(x, y) all used the string precedence predicate x ≺ y, either directly in the definition or through the use of the predicate span(x, y). While concerns for space preclude a full proof, it is easy to see that these same predicates could not be defined using the string successor x ⊳ y and still account for spans of arbitrary length. This means that including the precedence relation ≺ in the string model is crucial for the definition of the AR successor ⊳′ (note that x ⊳ y is FO-definable from x ≺ y but the reverse is not true). As mentioned above, this means that successor in the AR, specifically successor on the melody tier, corresponds to precedence in the string model.
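The parenthetical remark about definability can be made concrete. Assuming ≺ is the strict precedence order on string positions, the standard FO definition of successor says that y follows x with nothing in between:

```latex
\[
x \triangleright y \;:=\; x \prec y \;\land\; \lnot \exists z \, ( x \prec z \land z \prec y )
\]
```

The converse direction would require expressing the transitive closure of successor, which is not available in FO.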
To summarize, this section has introduced a method for defining ARs in FO from strings representing sequences of toned syllables. Thus, FO statements over ARs are no more powerful than FO statements over strings (with ≺). Note again that this definition is categorically different from the tone-mapping transformation discussed in the previous section, which was shown to not be MSO-definable.

Conclusion
This paper has presented two new results, one negative and one positive, regarding complexity and autosegmental representations in phonology. The first result is that tone-mapping transformations assigning units on one tier to units on another tier in a one-to-one fashion are not MSO-definable. This is in sharp contrast to other phonological patterns, which have been shown to be MSO-definable and, in most cases, FO-definable. The second, positive, result is that ARs are FO-definable from strings, showing that they do not significantly increase the expressive power of phonotactic grammars. It is thus also likely that they do not significantly increase the expressive power of string mappings, although the logical study of phonological transformations is still ongoing (see, e.g., Heinz, forthcoming). This work thus represents one of many steps towards an understanding of phonological computation and representation.