Harmonic Serialism and Finite-State Optimality Theory

This paper presents a new finite-state model of Optimality Theory (OT). In this model, two assumptions are imposed on the OT framework. Firstly, I adopt the Harmonic Serialism version of OT, in which output forms are derived from input forms via a series of incremental changes. Secondly, constraints are assumed to be strictly local in the sense that each markedness constraint specifies a set of banned sequences, each occurrence of which is penalized. I show that these two assumptions suffice to reduce the power of OT to rational relations.


Introduction
The seminal paper of Frank and Satta (1998) showed that grammars in the Optimality Theory (OT) framework can generate non-rational relations, but that a finite-state implementation is possible if each grammar specifies a bound on the number of violations that can be assigned by a constraint. Since then, various finite-state approximations of OT have been developed that achieve rationality by modifying the framework to reduce its computational power. Karttunen (1998), for example, implemented Frank and Satta's violation-bounded proposal by composing constraints using an operation called lenient composition. Improving upon this, Gerdemann and Van Noord (2000) and Gerdemann and Hulden (2012) developed a technique called matching that compares candidates based on the locations where violations are assigned. Eisner (2000) and Eisner (2002) propose a model called directional OT that prefers candidates whose violations are incurred as close as possible to the left or right boundary of the string. Finally, Riggle (2004) presents an algorithm, called the Optimality Transducer Construction Algorithm (OTCA), that takes an OT grammar as input and produces a finite-state transducer that correctly computes the grammar if and only if the grammar defines a rational relation.
In this paper, I present a new formalization of OT that limits the generative capacity of OT in two ways. Firstly, I adopt the Harmonic Serialism (HS) version of OT. Whereas the standard version of OT simply maps each input to the candidate that best satisfies a sequence of constraints, HS produces outputs by effecting a series of incremental changes to the input. Secondly, I assume that all constraints are strictly local, in the sense that each constraint designates a set of marked sequences and assigns a violation for each occurrence of a marked sequence. I show that these two assumptions suffice to reduce the power of OT to rational relations.
The structure of this paper is as follows. In Section 2, I introduce technical definitions and terminology used in this paper. Section 3 motivates the use of strictly local constraints and HS as restrictions on OT. Section 4 presents a formalization of HS, and Section 5 presents a finite-state model of HS. Section 6 concludes.

Preliminaries
As usual, $\mathbb{Z}$ is the set of integers, and $\mathbb{N} \subseteq \mathbb{Z}$ is the set of non-negative integers. Unless otherwise specified, $\Sigma$ denotes a finite alphabet, $\Sigma^*$ denotes the set of all strings over $\Sigma$, and $\Sigma^+$ denotes the set of all nonempty strings over $\Sigma$. The special symbols $\rtimes$ and $\ltimes$ are assumed not to be elements of $\Sigma$. When used, these symbols represent the left and right boundaries of a string, respectively. The length of a string $x$ is denoted $|x|$, and $\lambda$ denotes the empty string, the string of length 0. Symbols from $\Sigma$ are identified with strings of length 1, and for any $k$, $\Sigma^k$ denotes the set of strings of length $k$ over $\Sigma$. For any strings $a, b \in \Sigma^*$, $ab$ is the concatenation of $a$ and $b$. If $A, B \subseteq \Sigma^*$, then $AB = \{ab \mid a \in A, b \in B\}$. If $a \in \Sigma^*$ and $B \subseteq \Sigma^*$, then $aB = \{a\}B$ and $Ba = B\{a\}$. A string $a$ is a substring or a subsequence of $b$ if one can write $b = lar$ for some $l, r \in \Sigma^*$.
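For any sets $A$ and $B$, $A \times B$ denotes the set $\{\langle a, b \rangle \mid a \in A, b \in B\}$. A relation over $A$ and $B$ is a subset $R \subseteq A \times B$. The transitive closure of a relation $R \subseteq A \times A$ is the smallest relation $\hat{R} \subseteq A \times A$ such that $R \subseteq \hat{R}$ and if $\langle x, y \rangle, \langle y, z \rangle \in \hat{R}$, then $\langle x, z \rangle \in \hat{R}$. For any relations $R$ and $S$, the composition of $R$ and $S$ is the relation $R \circ S = \{\langle x, z \rangle \mid \langle x, y \rangle \in R \text{ and } \langle y, z \rangle \in S \text{ for some } y\}$.
A finite-state transducer is a tuple $T = \langle Q, \Sigma, \Gamma, I, F, \delta \rangle$, where
• $Q$ is a finite set of states;
• $\Sigma$ is an alphabet called the input alphabet;
• $\Gamma$ is an alphabet called the output alphabet;
• $I \subseteq Q$ is the set of initial states;
• $F \subseteq Q$ is the set of final states; and
• $\delta \subseteq Q \times (\Sigma \cup \{\lambda\}) \times (\Gamma \cup \{\lambda\}) \times Q$ is the transition relation.
The extended transition relation of $T$ is the smallest set $\hat{\delta} \subseteq Q \times \Sigma^* \times \Gamma^* \times Q$ such that $\delta \subseteq \hat{\delta}$; for every $q \in Q$, $\langle q, \lambda, \lambda, q \rangle \in \hat{\delta}$; and if $\langle q, x, y, r \rangle \in \hat{\delta}$ and $\langle r, a, b, s \rangle \in \delta$, then $\langle q, xa, yb, s \rangle \in \hat{\delta}$.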
The behavior of $T$ is the relation $[T]$ such that $\langle x, y \rangle \in [T]$ if and only if for some $q \in I$ and $r \in F$, $\langle q, x, y, r \rangle \in \hat{\delta}$. A relation is rational if it is the behavior of a finite-state transducer. A language $L$ is $k$-strictly local if, for some set $S \subseteq (\Sigma \cup \{\rtimes, \ltimes\})^k$, $L$ is the set of strings $x$ such that every substring of $\rtimes x \ltimes$ of length $k$ is in $S$.
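As a concrete illustration of the last definition, the following sketch tests membership in a $k$-strictly local language. The function names and the ASCII stand-ins `<` and `>` for the left and right boundary symbols are assumptions made for this example only.

```python
from itertools import product

def k_factors(x, k, left="<", right=">"):
    """Return the set of length-k substrings of the boundary-marked string."""
    marked = left + x + right
    return {marked[i:i + k] for i in range(len(marked) - k + 1)}

def in_strictly_local(x, permitted, k):
    """True if every length-k substring of <x> belongs to the permitted set."""
    return k_factors(x, k) <= set(permitted)

# Example: the 2-strictly local language over {a, b} in which "ab" never occurs.
symbols = "ab<>"
permitted = {p + q for p, q in product(symbols, repeat=2)} - {"ab"}

print(in_strictly_local("bbaa", permitted, 2))  # True
print(in_strictly_local("aab", permitted, 2))   # False
```

Membership depends only on the set of length-$k$ windows of the string, which is what makes such languages recognizable by a scanner with bounded memory.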

Restrictions on Constraints
Finite-state models of OT have typically achieved finite-stateness by imposing limitations on the power of constraints. The standard assumption, due to Ellison (1994), is that markedness constraints are finite-state mappings from strings to sequences of violation marks. The violation bound of Frank and Satta (1998) and Karttunen (1998) and the directional evaluation mechanism of Eisner (2000) and Eisner (2002) both refine the class of possible constraints to a strict subclass of rational functions from $\Sigma^*$ to $\mathbb{N}$.

| aaabb | DEP | ID | AGR | MAX |
| --- | --- | --- | --- | --- |
| a. aaabb | | | *! | |
| b. aaacbb | *! | | | |
| c. aaaaa | | *! | | |
| d. aaa | | | | ** |
| e. bb | | | | ***! |

Figure 1: Tableau for a non-rational OT grammar
In this paper, I propose to restrict constraints to a proper subclass of rational functions motivated by recent work on the subregular hierarchy. The subregular hierarchy consists of subclasses of regular languages and rational relations that characterize empirically attested patterns in phonology. Among these are the strictly local languages of McNaughton and Papert (1971), the tier-based strictly local languages of Heinz et al. (2011), and the input strictly local and output strictly local functions of Chandlee (2014). All four subclasses formalize the observation that markedness constraints in phonology generally designate a set of undesirable sequences as marked, and penalize strings that contain such sequences. Based on this intuition, I define a class of constraints called strictly local constraints.
Definition 1. A strictly local constraint is a function $c : \Sigma^* \to \mathbb{N}$ such that for some finite set $S_c \subseteq (\Sigma \cup \{\rtimes, \ltimes\})^*$, $c(x)$ is the number of unique decompositions $\rtimes x \ltimes = wyz$ such that $y \in S_c$. We say that $c$ bans the sequence $y$ if $y \in S_c$.
A strictly local constraint is a constraint of the form "assign one violation for every instance of $s_1, s_2, \ldots,$ or $s_n$," where each $s_i$ is a marked sequence. It can be easily shown that strictly local constraints are input strictly local functions from $\Sigma^*$ to $\mathbb{N}$.
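To make the counting in Definition 1 concrete, here is a minimal sketch of a strictly local constraint as a violation counter. It assumes that banned sequences are nonempty and that occurrences are counted in the boundary-marked string, with `<` and `>` as hypothetical ASCII stand-ins for the boundary symbols; the names are illustrative.

```python
def strictly_local_constraint(banned, left="<", right=">"):
    """Build c(x): the number of occurrences of banned sequences in <x>."""
    banned = set(banned)

    def c(x):
        marked = left + x + right
        return sum(
            marked.startswith(s, i)
            for s in banned
            for i in range(len(marked) - len(s) + 1)
        )

    return c

# A constraint banning the sequences ab and ba.
agr = strictly_local_constraint({"ab", "ba"})
# A hypothetical constraint banning a word-final b, using the right boundary.
no_final_b = strictly_local_constraint({"b>"})

print(agr("aaabb"))       # 1
print(agr("abab"))        # 3
print(no_final_b("aab"))  # 1
```

Each occurrence is counted separately, so overlapping occurrences such as the ab, ba, ab in abab contribute three violations.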
Unfortunately, strict locality of markedness constraints is not a sufficient condition for finite-stateness. Gerdemann and Hulden (2012) construct a non-finite-state OT grammar using only strictly local constraints as follows. The sole markedness constraint is AGR, which penalizes occurrences of the sequences ab and ba. AGR is outranked by the standard faithfulness constraints DEP and ID, which penalize insertion and substitution of symbols, respectively, while MAX, which penalizes deletion, ranks below all other constraints. This constraint ranking requires that ab and ba sequences be destroyed by deleting segments. In order to remove all instances of ab or ba, either all as must be deleted, or all bs must be deleted. Between these two options, MAX favors the one that involves less deletion. Thus, if the input has more as than bs, then the bs will be deleted; otherwise, the as will be deleted. To illustrate, the tableau in Figure 1 shows the derivation of the output aaa, obtained by deleting all bs from the input aaabb.

| aaabb | DEP | ID | AGR | MAX |
| --- | --- | --- | --- | --- |
| a. aaabb | | | * | |
| b. aaacbb | *! | | | |
| c. aaaab | | *! | * | |
| d. aaab | | | * | *! |
| e. aabb | | | * | *! |

Figure 2: HS version of Figure 1
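The standard OT derivation summarized in Figure 1 can be checked by brute force. The sketch below is a simplification under a stated assumption: because DEP and ID outrank AGR, any winner must avoid insertion and substitution, so only candidates obtained by deletion are enumerated. The helper names are hypothetical.

```python
from itertools import combinations

def agr(y):
    """Number of occurrences of ab or ba in y."""
    return sum(y[i:i + 2] in ("ab", "ba") for i in range(len(y) - 1))

def deletion_candidates(x):
    """All strings obtainable from x by deleting any number of symbols."""
    n = len(x)
    return {
        "".join(x[i] for i in keep)
        for r in range(n + 1)
        for keep in combinations(range(n), r)
    }

def standard_ot_winners(x):
    """Candidates minimizing (AGR, MAX) lexicographically; DEP and ID are 0 throughout."""
    def cost(y):
        return (agr(y), len(x) - len(y))
    candidates = deletion_candidates(x)
    best = min(cost(y) for y in candidates)
    return {y for y in candidates if cost(y) == best}

print(standard_ot_winners("aaabb"))  # {'aaa'}
print(standard_ot_winners("abb"))    # {'bb'}
```

The enumeration is exponential in the length of the input, which is acceptable for a sanity check on short strings but is not meant as an efficient procedure.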
Deleting the least frequent symbol from a string is non-finite-state because such a mapping requires counting the number of occurrences of each symbol. Since MAX adjudicates between the two candidates obtained by deletion, MAX is responsible for counting in this example. While strict locality limits the power of markedness constraints, this observation suggests that the power of faithfulness constraints should be limited as well. I propose to do this using Harmonic Serialism (HS), an alternate version of OT described in McCarthy (2000). In HS, GEN only produces candidates that differ from the input by one symbol. The winner chosen by EVAL is fed back into the grammar until a faithful mapping is obtained. To show how HS can restrict the power of MAX, Figure 2 shows an HS version of the tableau in Figure 1. Due to the restricted power of GEN, only one deletion can be performed at a time. The ab sequence in the input aaabb cannot be destroyed by deleting only one symbol, so MAX simply chooses the faithful candidate. Since a faithful mapping is obtained, this candidate is not fed back into the grammar.
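The HS evaluation summarized in Figure 2 can be reproduced with a short sketch of a single pass of GEN and EVAL under the ranking DEP >> ID >> AGR >> MAX. The alphabet {a, b, c} and all function names are assumptions made for illustration.

```python
SIGMA = "abc"  # assumed alphabet; c appears only in the inserted candidate aaacbb

def single_changes(x, sigma=SIGMA):
    """Pairs (candidate, kind) for the identity and every one-symbol change to x."""
    out = {(x, "identity")}
    for i in range(len(x) + 1):
        for s in sigma:
            out.add((x[:i] + s + x[i:], "insert"))
    for i in range(len(x)):
        out.add((x[:i] + x[i + 1:], "delete"))
        for s in sigma:
            if s != x[i]:
                out.add((x[:i] + s + x[i + 1:], "substitute"))
    return out

def agr(y):
    """Number of occurrences of ab or ba in y."""
    return sum(y[i:i + 2] in ("ab", "ba") for i in range(len(y) - 1))

def cost(y, kind):
    """Violation vector under the ranking DEP >> ID >> AGR >> MAX."""
    return (int(kind == "insert"), int(kind == "substitute"), agr(y), int(kind == "delete"))

def hs_step(x):
    """Winners of one pass of GEN and EVAL on input x."""
    candidates = single_changes(x)
    best = min(cost(y, kind) for y, kind in candidates)
    return {y for y, kind in candidates if cost(y, kind) == best}

print(hs_step("aaabb"))             # {'aaabb'}
print(cost("aaaab", "substitute"))  # (0, 1, 1, 0)
```

Since no single deletion removes the ab sequence from aaabb, every deletion candidate keeps its AGR violation and adds a MAX violation, so the faithful candidate wins and the derivation stops immediately.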

Formalization of Harmonic Serialism
Having motivated the use of HS and strictly local constraints, I now present a formal model of HS.
An HS grammar, like a standard OT grammar, computes a relation $R \subseteq \Sigma^* \times \Sigma^*$ via the three components GEN, CON, and EVAL. At the beginning of the computation, the grammar takes a string $x$ as input. GEN reads this input and returns a set of candidates. CON assigns to each candidate a vector of natural numbers known as violations. Finally, EVAL, based on a linear ordering of $\mathbb{N} \times \mathbb{N} \times \cdots \times \mathbb{N}$, reads the set of candidates and their violations and returns the candidate $y$ with the optimal violation vector. If $y = x$, then $y$ is the output of the grammar. Otherwise, a recursive call to the grammar is made with $y$ as the input, and the output from this call is the output of the grammar.
As discussed in the previous section, HS differs from standard OT in two ways. Firstly, recursive calls to the grammar are not featured in standard OT; instead, EVAL chooses the output in "one fell swoop." Secondly, in HS, GEN is restricted so that changes can only be made to the input "one at a time," so that each call to the grammar produces an optimal candidate that is only minimally different from the input.
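These ideas are formalized in the remainder of this section. Let us begin with the notion of a change. A single change to a string is defined as insertion, substitution, or deletion of a single symbol in that string.
Definition 2. An operation is an ordered pair $\langle x, y \rangle$, where $x, y \in \Sigma \cup \{\lambda\}$.
Definition 3. An application of an operation $\langle x, y \rangle$ is an ordered pair $\langle a, b \rangle \in \Sigma^* \times \Sigma^*$ for which there exist $u, v \in \Sigma^*$ such that $a = uxv$ and $b = uyv$. An application of an operation $\langle x, y \rangle$ is called a change when the operation $\langle x, y \rangle$ is not specified.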
Strictly local constraints, defined in the previous section, formalize markedness constraints. To treat faithfulness constraints, I adopt the standard view that faithfulness constraints militate against certain kinds of changes to the input. I will assume that GEN is restricted so that on input $a$, GEN only produces candidates $b$ such that $\langle a, b \rangle$ is a change. Since only one change can be made to $a$, faithfulness constraints can be seen as binary functions that penalize applications of banned operations.
Definition 4. A faithfulness constraint is a function $f : \Sigma^* \times \Sigma^* \to \mathbb{N}$ such that for some set $O_f$ of operations not including identities, $f(a, b) = 1$ if $\langle a, b \rangle$ is an application of some $\langle x, y \rangle \in O_f$, and $f(a, b) = 0$ otherwise. If $\langle x, y \rangle \in O_f$, then we say that $f$ bans $\langle x, y \rangle$.
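The notions of operation, application and faithfulness constraint can be sketched as follows. The empty Python string stands in for $\lambda$, and the helper names `operations` and `faithfulness_constraint` are hypothetical; the three constraints at the end are the usual MAX, DEP and ID over an assumed alphabet {a, b}.

```python
LAMBDA = ""  # the empty string stands in for the empty symbol

def operations(a, b):
    """Non-identity operations (x, y) of which the pair (a, b) is an application."""
    if len(b) == len(a) + 1:      # insertion: x is empty, y is the inserted symbol
        return {(LAMBDA, b[i]) for i in range(len(b)) if b[:i] + b[i + 1:] == a}
    if len(b) == len(a) - 1:      # deletion: x is the deleted symbol, y is empty
        return {(a[i], LAMBDA) for i in range(len(a)) if a[:i] + a[i + 1:] == b}
    if len(a) == len(b):          # substitution at exactly one position
        diff = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diff) == 1:
            return {(a[diff[0]], b[diff[0]])}
    return set()

def faithfulness_constraint(banned_ops):
    """Build f(a, b): 1 if (a, b) applies a banned operation, 0 otherwise."""
    banned_ops = set(banned_ops)

    def f(a, b):
        return int(bool(operations(a, b) & banned_ops))

    return f

SIGMA = "ab"
MAX = faithfulness_constraint({(s, LAMBDA) for s in SIGMA})   # bans deletion
DEP = faithfulness_constraint({(LAMBDA, s) for s in SIGMA})   # bans insertion
ID = faithfulness_constraint({(s, t) for s in SIGMA for t in SIGMA if s != t})

print(MAX("ab", "a"), DEP("ab", "aab"), ID("ab", "bb"))  # 1 1 1
print(MAX("ab", "ab"))                                   # 0
```

Because identities are excluded from the banned set, the faithful candidate never incurs a faithfulness violation.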
CON contains a set of constraints, which are assumed to be ranked with respect to one another. The ranking is represented here as a sequence of constraints.
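Definition 5. For any strictly local constraint $c : \Sigma^* \to \mathbb{N}$, let $c$ be extended to a function $c : \Sigma^* \times \Sigma^* \to \mathbb{N}$ defined by $c(x, y) = c(y)$. A constraint ranking is a sequence of functions $\langle c_1, c_2, \ldots, c_n \rangle$ where for each $i$, $c_i : \Sigma^* \times \Sigma^* \to \mathbb{N}$ is either a strictly local constraint or a faithfulness constraint. For any constraint ranking $C$, the number $k_C \geq 0$ is the length of the longest sequence banned by a strictly local constraint of $C$.
Among the candidates produced by GEN, EVAL chooses the one that violates the constraints the least. Given a constraint ranking $C = \langle c_1, c_2, \ldots, c_n \rangle$ and an input $x$, this is determined by considering for each candidate $y$ the value $c_i(x, y)$. The winner chosen by EVAL is the one that minimizes this value for the most highly ranked constraints possible. To compare different candidates, I define here the notions of cost, benefit, and harmonicity.
Definition 6. The cost of a change $\langle x, y \rangle$ with respect to a constraint ranking $C = \langle c_1, c_2, \ldots, c_n \rangle$ is the vector $\langle c_1(x, y), c_2(x, y), \ldots, c_n(x, y) \rangle$. The benefit of $\langle x, y \rangle$ with respect to $C$ is the vector $b_C(x, y)$ whose $i$th component is the difference between $c_i(x, x)$, the cost of leaving $x$ unchanged, and $c_i(x, y)$.
Definition 7. A vector $a = \langle a_1, a_2, \ldots, a_n \rangle \in \mathbb{Z}^n$ is more harmonic than a vector $b = \langle b_1, b_2, \ldots, b_n \rangle \in \mathbb{Z}^n$ if there exists $j$ such that $a_j < b_j$ and for all $i < j$, $a_i = b_i$.
Putting these definitions together, an HS grammar is defined as a system, parameterized by a constraint ranking $C$, that takes a string as input and applies the change that results in the greatest benefit with respect to $C$. Recursion is performed until the most beneficial change is an identity.
Definition 8. An HS grammar is an ordered triple $\langle C, H_C, H^*_C \rangle$, where
• $C$ is a constraint ranking;
• $H_C \subseteq \Sigma^* \times \Sigma^*$ is the relation defined by $H_C = \{\langle u, v \rangle \mid \langle u, v \rangle \text{ is a change and } b_C(u, v) = \max_y b_C(u, y)\}$, where max is taken with respect to harmonicity ($H$) over strings $y$ such that $\langle u, y \rangle$ is a change; and
• letting $\hat{H}_C$ be the transitive closure of $H_C$, the relation $H^*_C$ is defined by $H^*_C = \{\langle x, y \rangle \in \hat{H}_C \mid \langle y, z \rangle \in H_C \text{ implies } z = y\}$.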
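The following sketch puts the pieces of an HS grammar together. It works with cost vectors directly rather than benefits, on the assumption that choosing the most beneficial change for a fixed input amounts to choosing the candidate whose cost vector is most harmonic. The ranking DEP >> ID >> *b >> MAX and the names `gen`, `h` and `h_star` are hypothetical and serve only to illustrate a derivation that needs more than one recursive call.

```python
def more_harmonic(a, b):
    """Definition 7: a wins at the first coordinate where a and b differ."""
    for ai, bi in zip(a, b):
        if ai != bi:
            return ai < bi
    return False

def gen(x, sigma):
    """The input itself plus every string one insertion, deletion or substitution away."""
    out = {x}
    for i in range(len(x) + 1):
        out |= {x[:i] + s + x[i:] for s in sigma}
    for i in range(len(x)):
        out.add(x[:i] + x[i + 1:])
        out |= {x[:i] + s + x[i + 1:] for s in sigma}
    return out

def h(x, ranking, sigma):
    """One call to the grammar: candidates whose cost vector no other candidate beats."""
    cost = {y: tuple(c(x, y) for c in ranking) for y in gen(x, sigma)}
    return {y for y in cost if not any(more_harmonic(cost[z], cost[y]) for z in cost)}

def h_star(x, ranking, sigma, max_steps=100):
    """Outputs reached by feeding winners back in until a form maps only to itself."""
    outputs, frontier, seen = set(), {x}, set()
    for _ in range(max_steps):  # guard against rankings that never converge
        if not frontier:
            break
        seen |= frontier
        winners_of = {u: h(u, ranking, sigma) for u in frontier}
        outputs |= {u for u, w in winners_of.items() if w == {u}}
        frontier = set().union(*winners_of.values()) - seen
    return outputs

# Hypothetical ranking over {a, b}: DEP >> ID >> *b >> MAX.
def dep(x, y): return int(len(y) > len(x))
def ident(x, y): return int(len(y) == len(x) and y != x)
def no_b(x, y): return y.count("b")   # strictly local: bans the sequence b
def max_(x, y): return int(len(y) < len(x))

ranking = [dep, ident, no_b, max_]
print(h("abb", ranking, "ab"))       # {'ab'}
print(h_star("abb", ranking, "ab"))  # {'a'}
```

On the input abb, each call deletes one b, so the derivation abb, ab, a takes one recursive call per banned symbol before the faithful candidate wins.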

Finite-State Harmonic Serialism
The central result of this paper is that for any HS grammar $\langle C, H_C, H^*_C \rangle$, the relation $H^*_C$ is rational. In this section, I derive this result in two steps. Firstly, I construct a finite-state transducer whose behavior is $H_C$. This shows that a single nonrecursive call to a Harmonic Serialism grammar can be modelled as a rational relation. Secondly, I show that this transducer can be extended to a transducer whose behavior is $H^*_C$.

$H_C$ as a Rational Relation
This subsection describes a construction for a finite-state transducer that, for any constraint ranking $C$, computes $H_C$. The construction relies on the property that the benefit of an application $\langle a, b \rangle$ of $\langle x, y \rangle$ can be computed using only information about a context of bounded size around the position of $x$ in $a$ and $y$ in $b$. Since there are only finitely many such contexts, this locality property allows us to reduce the set of possible changes performed by $H_C$ to a finite number of cases, one for each possible context. For each context, we can then construct a transducer that effects the most beneficial change for that context while ensuring that no context allowing for a more beneficial change is available. The union of all such transducers computes $H_C$.
Let us now prove the locality property. To that end, I first introduce the definition of a context-sensitive rule, which captures the notion of an operation that only applies in a certain context.
Definition 9. A rule is an ordered quadruple $\langle x, y, c, d \rangle$, where $\langle x, y \rangle$ is an operation and $c, d \in (\Sigma \cup \{\rtimes, \ltimes\})^*$. We denote $\langle x, y, c, d \rangle$ by $x \to y / c \_ d$.
Definition 10. An application of a rule $x \to y / c \_ d$ is a pair $\langle a, b \rangle$ such that for some $u, v \in (\Sigma \cup \{\rtimes, \ltimes\})^*$, $a = ucxdv$ and $b = ucydv$.
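Rule application in the sense of Definition 10 can be sketched as a function that returns every result of applying a rule once. The sketch assumes that the input is first wrapped in boundary symbols, written `<` and `>` here, that the boundaries are stripped from the result, and that the empty string stands for the empty symbol; the function name is hypothetical.

```python
def apply_rule(a, x, y, c, d, left="<", right=">"):
    """All strings b such that (a, b) is an application of the rule x -> y / c _ d."""
    marked = left + a + right
    target = c + x + d
    results = set()
    for start in range(len(marked) - len(target) + 1):
        lo = start + len(c)   # position where x begins
        hi = lo + len(x)      # position just past x
        # The rewritten material must lie strictly inside the boundary symbols.
        if marked.startswith(target, start) and lo >= 1 and hi <= len(marked) - 1:
            results.add((marked[:lo] + y + marked[hi:])[1:-1])
    return results

print(apply_rule("aaabb", "b", "", "a", "b"))  # {'aaab'}
print(apply_rule("abab", "b", "a", "a", ">"))  # {'abaa'}
print(apply_rule("ab", "", "c", "a", "b"))     # {'acb'}
```

The second call shows how a context containing the right boundary restricts the rule to the end of the string.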
The locality property then states that every application of a rule with a sufficiently large context has the same benefit.
Proposition 11. Let $C$ be a constraint ranking, and suppose $\langle a_1, b_1 \rangle$ and $\langle a_2, b_2 \rangle$ are applications of a rule $x \to y / c \_ d$. If $|c|, |d| \geq k_C - 1$, then $b_C(a_1, b_1) = b_C(a_2, b_2)$.
Proof. Write $C = \langle c_1, c_2, \ldots, c_n \rangle$. We need to show that for each $i$, $c_i(a_1, a_1) - c_i(a_1, b_1) = c_i(a_2, a_2) - c_i(a_2, b_2)$. Fix any $i \in \{1, 2, \ldots, n\}$. Note that $\langle a_1, b_1 \rangle$ and $\langle a_2, b_2 \rangle$ are applications of the same operation. Therefore, if $c_i$ is a faithfulness constraint, then $c_i(a_1, b_1) = c_i(a_2, b_2)$. Since a faithfulness constraint cannot ban the application of an identity, $c_i(a_1, a_1) = c_i(a_2, a_2) = 0$. From this the equation above follows. Now suppose $c_i$ is a strictly local constraint. The equation above can then be rewritten as $c_i(a_1) - c_i(b_1) = c_i(a_2) - c_i(b_2)$.
Let $S_i$ be the set of sequences banned by $c_i$. For any string $w$, $c_i(w)$ is the number of occurrences of elements of $S_i$ in $w$. Since $b_1$ and $a_1$, as well as $b_2$ and $a_2$, only differ by $x$ and $y$, any occurrence of an element of $S_i$ in $b_1$ but not $a_1$, or in $b_2$ but not $a_2$, must contain the $y$ that replaces $x$. Similarly, any occurrence of an element of $S_i$ in $a_1$ but not $b_1$, or in $a_2$ but not $b_2$, must contain the $x$ that is replaced by $y$. Since $|c|, |d| \geq k_C - 1$ and $|s| \leq k_C$ for all $s \in S_i$, any occurrence of some $s \in S_i$ that includes either the $x$ or the $y$ must be a substring of $cxd$ or $cyd$. Thus, the occurrences gained and lost are the same in both applications, and we have $c_i(a_1) - c_i(b_1) = c_i(a_2) - c_i(b_2)$, giving us the equation above.
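The locality property can be checked empirically on small cases. The sketch below is an illustration rather than a proof: it takes the single markedness constraint banning ab and ba, so that the longest banned sequence has length 2, and confirms that with contexts of length 1 the change in violations does not depend on the material outside the context. Boundary symbols are omitted because the banned sequences here do not mention them; all names are hypothetical.

```python
from itertools import product

BANNED = {"ab", "ba"}            # the banned sequences
K = max(len(s) for s in BANNED)  # length of the longest banned sequence (here 2)

def violations(w):
    """Number of occurrences of banned sequences in w."""
    return sum(w.startswith(s, i) for s in BANNED for i in range(len(w)))

def strings(max_len, sigma="ab"):
    """All strings over sigma of length at most max_len."""
    for n in range(max_len + 1):
        for symbols in product(sigma, repeat=n):
            yield "".join(symbols)

def violation_changes(x, y, c, d, max_len=3):
    """Values of violations(ucxdv) - violations(ucydv) over all short u and v."""
    return {
        violations(u + c + x + d + v) - violations(u + c + y + d + v)
        for u in strings(max_len)
        for v in strings(max_len)
    }

# Contexts of length K - 1 = 1: the change in violations never depends on u and v.
print(all(
    len(violation_changes(x, y, c, d)) == 1
    for x, y in product(["", "a", "b"], repeat=2)
    for c, d in product("ab", repeat=2)
))  # True

# With empty contexts the value does depend on the surrounding material.
print(len(violation_changes("a", "", "", "")) > 1)  # True
```

The second check shows why the bound on the context length matters: deleting an a changes the violation count by 2 inside bab but by 0 when the a stands alone.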
Definition 12. Let $C$ be a constraint ranking. The benefit of a rule $r = x \to y / c \_ d$ with respect to $C$, denoted $b_C(r)$, is defined by $b_C(r) = b_C(c'xd', c'yd')$, where $c'$ and $d'$ are $c$ and $d$, respectively, with occurrences of $\rtimes$ and $\ltimes$ replaced by $\lambda$.
Using Proposition 11, we can now construct a transducer computing $H_C$ by considering all possible rules $x \to y / c \_ d$, where $|c| = |d| = k_C - 1$. For any input $a$, $\langle a, b \rangle \in H_C$ if $\langle a, b \rangle$ is an application of the most beneficial rule that could be applied to $a$. Thus, for each rule $r$, we can construct a transducer that checks whether $r$ is the most beneficial rule that is applicable to its input, and if so, applies $r$ to its input. The following lemma gives us a way to check whether $r$ is the most beneficial rule possible.
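Lemma 13. Let $C$ be a constraint ranking, and suppose that $\langle a, b \rangle$ is an application of $r = x \to y / c \_ d$, where $|c| = |d| = k_C - 1$. Suppose further that for all $y' \in \Sigma \cup \{\lambda\}$, the rule $x \to y' / c \_ d$ is not more beneficial than $r$. Then, there is a set $F_r \subseteq (\Sigma \cup \{\rtimes, \ltimes\})^{2k_C - 1}$ such that $\langle a, b \rangle \in H_C$ if and only if $a$ does not contain any element of $F_r$ as a substring.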
Proof. Since $\langle a, b \rangle$ is an application of $r$, the length of $a$ must be at least $|cxd| = 2k_C - 1$. Thus, for any $b' \in \Sigma^*$, $\langle a, b' \rangle \in H_C$ only if $\langle a, b' \rangle$ is an application of a rule $r' = x' \to y' / c' \_ d'$ with $|c'| = |d'| = k_C - 1$.
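Such a rule $r'$ is applicable to $a$ if and only if $a$ contains the substring $c'x'd'$. Thus, let us define $F_r$ by $F_r = \{c'x'd' \mid |c'| = |d'| = k_C - 1 \text{ and } x' \to y' / c' \_ d' \text{ is more beneficial than } r \text{ for some } y'\}$. By hypothesis, $cxd \notin F_r$, so a rule more beneficial than $r$ can be applied to $a$ if and only if $a$ contains a substring from $F_r$. But $\langle a, b \rangle \in H_C$ if and only if no rule more beneficial than $r$ can be applied to $a$, hence the lemma.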
To use Lemma 13, we only consider rules $r = x \to y / c \_ d$ such that no $y'$ makes the rule $x \to y' / c \_ d$ more beneficial than $r$. To check whether a rule $r$ is the most beneficial rule applicable to a string $a$, we simply construct the set $F_r$ and check that $a$ does not contain any element of $F_r$ as a substring.
We are now ready to present the construction of a finite-state transducer for $H_C$.
Theorem 14. Let $C = \langle c_1, c_2, \ldots, c_n \rangle$ be a constraint ranking. Then, $H_C$ is a rational relation.
Proof. We need to construct a finite-state transducer $T$ such that on input $a$, $T$ outputs $b$ if and only if $\langle a, b \rangle \in H_C$. To do this, we consider two possible cases: either $|a| < 2k_C - 1$, or $|a| \geq 2k_C - 1$. In the first case, $\langle a, b \rangle$ is generally not an application of a rule $x \to y / c \_ d$ with $|c|, |d| \geq k_C - 1$, so Proposition 11 does not apply. Instead, we simply observe that the relation $\{\langle a, b \rangle \in H_C \mid |a| < 2k_C - 1\}$ is finite, so it is automatically rational. Let $T_0$ be a transducer whose behavior is this relation.
Now, let us assume that $|a| \geq 2k_C - 1$. Then, $\langle a, b \rangle$ is an application of a rule $x \to y / c \_ d$ with $|c| = |d| = k_C - 1$, so we can use the technique discussed in this subsection. To that end, let $R$ be the set of all rules $r = x \to y / c \_ d$ such that $|c| = |d| = k_C - 1$, $x \neq y$, and no $y'$ makes the rule $x \to y' / c \_ d$ more beneficial than $r$. $R$ is precisely the set of all rules $r$ with $|c| = |d| = k_C - 1$, other than the identity, such that some application of $r$ is in $H_C$. For each $r \in R$, let $F_r$ be defined as in Lemma 13.
For each rule $r = x \to y / c \_ d \in R$, we need to construct a transducer $T_r$ that applies $r$ if it is the most beneficial rule applicable to its input. As discussed earlier, this amounts to checking that the input does not contain any element of $F_r$ as a substring, and then applying the rule. Observe that the set of strings without substrings from $F_r$ forms a $(2k_C - 1)$-strictly local language $S_r$, so to achieve this effect, we can simply take a transducer applying the rule and restrict its domain to $S_r$. Let us call this transducer $I_r$, whose behavior is $[I_r] = \{\langle a, b \rangle \mid a \in S_r \text{ and } \langle a, b \rangle \text{ is an application of } r\}$.
To construct $T_r$, we simply add the boundary symbols $\rtimes$ and $\ltimes$ to the input, apply $I_r$, and then remove the boundary symbols.
Finally, to construct a transducer $T$ computing $H_C$, we simply take the union of all the $T_r$s, along with $T_0$ and the identity relation on any string for which no rule in $R$ is applicable.
It is clear that $[T] = H_C$, so $H_C$ is rational.

Transducing Recursion
Having shown that $H_C$ is a rational relation for any constraint ranking $C$, it remains to show that $H^*_C$ is rational as well. Recall that the behavior of $H^*_C$ is to repeatedly apply $H_C$ until a fixed point is reached. Since rational relations are closed under composition, the naïve approach to transducing $H^*_C$ is to take the transducer $T$ constructed in the previous subsection and compose it with itself multiple times. This approach is not correct, however, because there is no bound on the recursion depth of an HS grammar. A string can, in principle, contain arbitrarily many instances of a sequence banned by a strictly local constraint, and such a string could require a recursive call for each instance of a banned sequence.
To address the problem of unbounded recursion depth, I rely on techniques from regular model checking, a discipline that analyzes automated systems with infinitely many state configurations and attempts to find the set of states reachable from the initial states. Under a paradigm introduced by  and Bouajjani et al. (2000), the set of possible states of a system are represented as a regular language, and the possible transitions between states are modelled as a rational relation. Finding the set of reachable states amounts to finding the transitive closure of the transition relation, since the transitive closure is exactly the relation obtained by applying the transition relation to itself arbitrarily many times. Accordingly, much has been written in the model checking literature about how the transitive closure of rational relations might be computed. Surveys of these results can be found in Nilsson (2000), Abdulla et al. (2004), and Abdulla (2012).
Using these techniques, we can take the transducer $T$ computing $H_C$ and compute its transitive closure $\hat{T}$. The effect of $\hat{T}$ is to apply $H_C$ to a string arbitrarily many times, so $\hat{T}$ is able to handle the problem of arbitrary recursion depth.
The intuition behind the technique for computing the transitive closure is as follows. Consider the transducer $T$ shown at the top of Figure 3. This transducer reads strings over $\Sigma = \{a, b\}$ and changes the first $b$ to an $a$. This is done by having $T$ begin in state 0, and enter state 1 when a $b$ is read. Now, let us consider how a transducer computing $[T] \circ [T]$, which changes the first two $b$s in the input to $a$s, might be constructed. Recall that on an input $x$, the run of $T$ on $x$ is the sequence $q_0 q_1 \ldots q_n$ of states that $T$ enters into during its computation. On input $x$, $T$ produces an output $y$ by changing the first $b$ of $x$ to an $a$. $T$ is then run a second time on $y$, producing an output $z$. This produces another run $p_0 p_1 \ldots p_n$. The two runs are visualized in Figure 4, taking $x$ to be a sample input beginning with $aabb$. A transducer $T_2$ computing $[T] \circ [T]$ can be constructed by stacking the two runs on top of one another. Each state of $T_2$ represents a column of the diagram in Figure 4: an ordered pair encoding the state of $T$ during its first and its second iterations. Transitions can then be defined between the columns so as to match the behavior of $T$ during its two passes. The resulting transducer $T_2$ is shown at the bottom of Figure 3. By inspection, it is clear that this transducer changes the first two $b$s of its input to $a$s.
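The stacking of two runs can be sketched as a product construction on letter-to-letter transducers. The dictionary encoding, the function names, and the choice to make both states final (so that strings with no b are accepted unchanged) are assumptions of this example, since the figures themselves are not reproduced here.

```python
from itertools import product

# Transitions: (state, input symbol) -> set of (output symbol, next state).
T = {
    "initial": {0},
    "final": {0, 1},   # assumption: strings with no b are accepted unchanged
    "delta": {
        (0, "a"): {("a", 0)},
        (0, "b"): {("a", 1)},   # rewrite the first b as a
        (1, "a"): {("a", 1)},
        (1, "b"): {("b", 1)},
    },
}

def run(t, x):
    """Return the set of outputs of transducer t on input x."""
    configs = {(q, "") for q in t["initial"]}
    for sym in x:
        configs = {
            (r, out + o)
            for q, out in configs
            for o, r in t["delta"].get((q, sym), ())
        }
    return {out for q, out in configs if q in t["final"]}

def compose(t1, t2):
    """Product construction: each state is a column (state of pass 1, state of pass 2)."""
    delta = {}
    for (q, a), moves1 in t1["delta"].items():
        for b, q_next in moves1:
            for (p, b_in), moves2 in t2["delta"].items():
                if b_in != b:
                    continue
                for c, p_next in moves2:
                    delta.setdefault(((q, p), a), set()).add((c, (q_next, p_next)))
    return {
        "initial": set(product(t1["initial"], t2["initial"])),
        "final": set(product(t1["final"], t2["final"])),
        "delta": delta,
    }

T2 = compose(T, T)
print(run(T, "aabb"))   # {'aaab'}
print(run(T2, "aabb"))  # {'aaaa'}
```

The construction only covers transducers that read and write exactly one symbol per transition, which is why length preservation is needed before it can be applied to insertions and deletions.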
Let us now make these ideas explicit by defining the notion of a column transducer. This definition was introduced by Abdulla et al. (2002).
Definition 15. Let $T = \langle Q, \Sigma, \Sigma, \{q_0\}, F, \delta \rangle$ be a finite-state transducer such that $\langle x, y \rangle \in [T]$ implies $|x| = |y|$. The column transducer for $T$ is the transducer $T^+ = \langle Q^+, \Sigma, \Sigma, q_0^+, F^+, \rho \rangle$, where $\langle q_1 q_2 \ldots q_m, a, b, r_1 r_2 \ldots r_m \rangle \in \rho$ if and only if there exist $a_0, a_1, \ldots, a_m$ such that $a = a_0$, $b = a_m$, and for each $i$, $\langle q_i, a_{i-1}, a_i, r_i \rangle \in \delta$.
Abdulla et al. (2002) show that for any transducer $T$ computing a length-preserving relation, $[T^+]$ is indeed the transitive closure of $[T]$. However, this is not enough to show that the transitive closure of $[T]$ is rational. In the example of Figures 3 and 4, the transducer $T_2$ only computes two iterations of $T$, so the states of $T_2$ are columns of length 2. However, the column transducer $T^+$ has states of arbitrary length, so $T^+$ has infinitely many states, and is therefore not a finite-state transducer.
To remedy this, Abdulla et al. (2002), noting that different states often exhibit the same behavior, define an equivalence relation $\simeq$ on $Q^+$ in hopes that only finitely many columns in $Q^+ / {\simeq}$ might be reachable.
Definition 16. Let $T = \langle Q, \Sigma, \Sigma, \{q_0\}, \{q_f\}, \delta \rangle$ be a finite-state transducer. A state $q \in Q$ is left-copying if $\langle q_0, x, y, q \rangle \in \hat{\delta}$ implies $x = y$. A state $q \in Q$ is right-copying if $\langle q, x, y, q_f \rangle \in \hat{\delta}$ implies $x = y$. A state is non-copying if it is neither left-copying nor right-copying.
Definition 17. Let $p, q \in Q^+$. We write $p \simeq q$ if there exist $m_1, m_2, \ldots, m_k, n_1, n_2, \ldots, n_k > 0$ and $q_1, q_2, \ldots, q_k \in Q$ such that
• $p = q_1^{m_1} q_2^{m_2} \cdots q_k^{m_k}$,
• $q = q_1^{n_1} q_2^{n_2} \cdots q_k^{n_k}$, and
• for each $i$, if $q_i$ is non-copying, then $m_i = n_i = 1$.
Taking the quotient of $Q^+$ by $\simeq$ does not affect the behavior of $T^+$. This result shows that the transitive closure of $[T]$ is rational if only finitely many states in $Q^+ / {\simeq}$ are reached. By inspecting Definition 17, we see that this is possible if each reachable column contains finitely many non-copying states, and if each column contains finitely many alternations between different copying states. Abdulla et al. (2003) introduce a technique known as bideterminization for ensuring that the latter condition is always met, so the former condition is sufficient to ensure that the transitive closure of $[T]$ is rational.
We are now ready to use Theorem 18 to show that $H^*_C$ is rational. To do so, we first need to modify the construction from Theorem 14 for the transducer $T$ computing $H_C$ so that $[T]$ is length-preserving. This can be done by padding strings with symbols that are treated like $\lambda$s. Insertions are then performed by replacing these special symbols with symbols from $\Sigma$, while deletions are performed by replacing symbols from $\Sigma$ with special symbols. This allows insertions and deletions to be simulated without changing the length of the input. Once $T$ has been made to preserve length, we construct its transitive closure $\hat{T}$, and restrict its output to strings $x$ such that $\langle x, y \rangle \in H_C$ only if $x = y$.
Theorem 19. Let $C$ be a constraint ranking. Then, $H^*_C$ is rational.
Proof. Let $T = \langle Q, \Sigma, \Sigma, \{q_0\}, F, \delta \rangle$ be the finite-state transducer such that $[T] = H_C$. We shall first show that the transitive closure of $[T]$ is rational, and then use the transitive closure to construct a finite-state transducer whose behavior is $H^*_C$.
Let $B = \{i, d\}$ be the special symbols used to pad strings so that $[T]$ can be made length-preserving. Insertions are made by changing $i$s to other symbols, and deletions are made by changing symbols to $d$s. For any string $x = x_1 x_2 \ldots x_n$, define $\iota(x) = B^* x_1 B^* x_2 B^* \cdots B^* x_n B^*$. In other words, $\iota$ freely inserts special symbols to a string.
Now, let us modify $T$ by again considering the two cases where the length of $T$'s input $a$ is at most or greater than $2k_C - 1$. In the former case, for any $\langle a, b \rangle \in [T]$, write $a = uxv$ and $b = uyv$. We replace $\langle a, b \rangle$ with its padded, length-preserving counterparts, in which an inserted symbol overwrites an $i$ and a deleted symbol is overwritten by a $d$. In the case where $|a| > 2k_C - 1$, let $R$ be defined as in the proof of Theorem 14. Each rule $r = x \to y / c \_ d$ in $R$ is replaced by its padded, length-preserving counterparts in the same way. Let us call the set of these new rules $R'$.
In this modified version of $T$, the only modifications that could be made to the input are changing $i$s to alphabet symbols and changing alphabet symbols to $d$s. In particular, $d$s can never be changed by $T$. Therefore, if $T$ is applied to an input arbitrarily many times, then for any $i$, the $i$th position changes at most twice. This means that in the column transducer $T^+$, the column reached at the $i$th position can only contain at most two non-copying states, so in the quotient of $T^+$ by $\simeq$, only finitely many states are reachable. Removing unreachable states makes this quotient a finite-state transducer, so by Theorem 18, the transitive closure of $[T]$ is rational.
To complete the proof, let us use this transducer for the transitive closure to construct a finite-state transducer $M$ for $H^*_C$. Define the transducers $E$ and $D$, which freely insert and remove padding symbols, respectively, as follows.
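$[E] = \{\langle x, y \rangle \mid x \in \Sigma^*, y \in \iota(x)\}$
$[D] = \{\langle x, y \rangle \mid y \in \Sigma^*, x \in \iota(y)\}$
$M$ must first insert padding symbols to its input, then apply the transitive closure of the modified $T$, and then remove padding symbols. Afterwards, the range of these operations must be intersected with the set of strings such that the most beneficial change is the identity. Letting $S_r$ be defined for each $r$ as in Theorem 14, recall that this set of strings, which we call $S$, is determined by the sets $S_r$. Since each $S_r$ is regular, so is $S$. Therefore, $M$ is obtained by composing $E$, the transducer for the transitive closure, and $D$, and restricting the range of the result to $S$, completing the construction.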
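The padding device used in the proof can be illustrated with a small sketch. It assumes that the alphabet is {a, b} and does not contain the padding symbols i and d, and it checks on one padded string that overwriting symbols in place simulates ordinary insertions, deletions and substitutions. The function names are hypothetical.

```python
PAD = {"i", "d"}  # i marks an empty slot for insertion; d marks a deleted symbol

def strip_padding(w):
    """Remove padding symbols, recovering the string over the original alphabet."""
    return "".join(s for s in w if s not in PAD)

def padded_changes(w, sigma="ab"):
    """Length-preserving counterparts of single insertions, deletions and substitutions."""
    out = set()
    for pos, s in enumerate(w):
        if s == "i":       # insertion: fill an empty slot
            out |= {w[:pos] + t + w[pos + 1:] for t in sigma}
        elif s in sigma:   # deletion and substitution overwrite an alphabet symbol
            out.add(w[:pos] + "d" + w[pos + 1:])
            out |= {w[:pos] + t + w[pos + 1:] for t in sigma if t != s}
    return out

def plain_changes(x, sigma="ab"):
    """Ordinary single changes to x, excluding the identity."""
    out = set()
    for pos in range(len(x) + 1):
        out |= {x[:pos] + t + x[pos:] for t in sigma}
    for pos in range(len(x)):
        out.add(x[:pos] + x[pos + 1:])
        out |= {x[:pos] + t + x[pos + 1:] for t in sigma if t != x[pos]}
    return out

padded = "iaibi"  # one padded version of ab, with a free slot at every position
results = padded_changes(padded)
print(all(len(v) == len(padded) for v in results))                 # True
print({strip_padding(v) for v in results} == plain_changes("ab"))  # True
```

A padded string only supports as many insertions at a given position as it has free slots there, which is why padding is inserted freely before the length-preserving transducer is applied.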

Conclusion
In this paper, I have shown that the Harmonic Serialism version of Optimality Theory defines rational relations if markedness constraints are assumed to be strictly local. This was done by constructing a finite-state transducer relating each input with the winner chosen by EVAL after a single iteration of the grammar. This transducer was extended to a transducer that makes recursive calls to the grammar by relying on techniques from regular model checking for computing the transitive closure of rational relations satisfying certain conditions. The assumption that markedness constraints are strictly local allowed us to show that $H_C$ is rational by partitioning the space of possible changes effected by the grammar into a finite number of cases. The limitation of GEN to "one change at a time" allowed us to construct the transitive closure of $H_C$ in such a way that only finitely many states in the quotient transducer are reachable.
For computational phonology, this paper contributes a new finite-state model of OT that incorporates ideas from recent work on the subregular hierarchy and provides an example of how the property of locality could be exploited to restrict the power of OT to rational relations. The model presented here is also the first to achieve finite-stateness using restrictions on OT originating in the phonological literature: most markedness constraints proposed in OT analyses are indeed strictly local, and Harmonic Serialism was first introduced in the original manuscript of Prince and Smolensky (1993).
This paper also has implications for theoretical phonology. While Harmonic Serialism is generally known as a way to model phonological opacity, McCarthy (2000) mentions that in many cases, HS analyses are not distinguishable from standard OT analyses. On the other hand, the ability of HS grammars to make recursive calls is traditionally seen as a significant increase in the complexity of OT, so a standard OT analysis is usually preferable to a similar HS analysis. The proposal of this paper, however, provides evidence against that intuition: since standard OT with strictly local constraints is more powerful than rational relations, the finite-state model presented here shows that HS is weaker than standard OT in language-theoretic terms. Thus, this paper supports the viewpoint, originating from Moreton (1999)'s proof that the recursion of EVAL always converges to a fixed point, that HS is in fact less complex than standard OT. While the ability to feed the output of EVAL back into GEN seems to increase the power of OT, this increase in power is offset by the restriction of GEN to one operation at a time. The rationality of HS provides an interesting distinction between standard OT and HS, and presents motivation for further work on HS phonology.
To conclude, several issues should be addressed in future work on this topic. Firstly, the results presented in this paper are purely theoretical. An implementation of the two constructions described in Section 5 needs to be developed if the ideas from this paper are to be used in NLP applications. Secondly, while the class of strictly local constraints is motivated in part by empirical studies regarding phonological patterns in natural language, many constraints found in OT fall outside of this class. Future work should determine the extent to which the power of constraints can be extended while still ensuring that HS grammars define rational relations. One possibility would be to extend strictly local constraints to a class of constraints corresponding to the tier-based strictly local languages. Finally, for the sake of formal completeness, many commitments were made in Section 4 in the development of the formal model of HS used in this paper. In particular, I have assumed that "one change at a time" means insertion, deletion, or substitution of a single symbol. I have also assumed that if a single iteration of EVAL chooses multiple winners, each of these winners is passed back to GEN independently. In reality, multiple proposals exist in the HS literature regarding the implementational details of the framework. By modifying the formalism of Section 4, further studies could investigate which of these details affect the generative power of HS, and which do not.