Output Strictly Local Functions

This paper characterizes a subclass of subsequential string-to-string functions called Output Strictly Local (OSL) and presents a learning algorithm which provably learns any OSL function in polynomial time and data. This algorithm is more efficient than other existing ones capable of learning this class. The OSL class is motivated by the study of the nature of string-to-string transformations, a cornerstone of modern phonological grammars.


Introduction
Motivated by questions in phonology, this paper studies the Output Strictly Local (OSL) functions originally defined by Chandlee (2014) and . The OSL class is one way Strictly Local (SL) stringsets can be generalized to string-to-string maps. Their definition is a functional version of a defining characteristic of SL stringsets called Suffix Substitution Closure (Rogers and Pullum, 2011). Like the SL stringsets, the OSL functions contain nested subclasses parameterized by a value k, which is the length of the suffix of output strings that matters for computing the function.
As Chandlee (2014) argues, almost all local phonological processes can be modeled with Input Strictly Local (ISL) functions. Yet there is one notable class of exceptions: so-called spreading processes, in which a feature like nasality iteratively assimilates over a contiguous span of segments. As we show, the OSL functions are needed to describe this sort of phenomenon.
Here we provide a slight, but important, revision to the original definition of OSL functions in Chandlee (2014) and , which allows two important theoretical contributions while preserving the previous results. The first is a finite-state transducer (FST) characterization of OSL functions, which leads to the second result, the OSLFIA (OSL Function Inference Algorithm) and a proof that it efficiently identifies the k-OSL functions from positive examples. We compare this algorithm to three alternatives: OSTIA (Onward Subsequential Transducer Inference Algorithm, Oncina et al. (1993)), which identifies total subsequential functions in cubic time; its modifications OSTIA-D and OSTIA-R, which can learn particular subclasses of subsequential functions using domain and range information, respectively, in at least cubic time (Oncina and Varò, 1996; Castellanos et al., 1998); and SOSFIA (Structured Onward Subsequential Inference Algorithm, Jardine et al. (2014)), which can learn particular subclasses of subsequential functions in linear time and data. We show these algorithms either cannot learn the OSL functions or do so less efficiently than the OSLFIA. These contributions were missing from the initial research on OSL functions (except for a preliminary FST characterization in Chandlee (2014)). Finally, we explain how a unified theory of local phonology will have to draw insights from both the ISL and OSL classes and offer an idea of how this might work. Thus, this paper is a necessary intermediate step towards an empirically adequate but restrictive characterization of phonological locality.
The remainder of the paper is organized as follows. Motivation and related work are given in section 2, including an example of the spreading processes that cannot be modeled with ISL functions. Notations and background concepts are presented in section 3. In section 4 we define OSL functions, and the theoretical characterization and learning results are given in sections 5 and 6. In section 7, we explain how OSL functions model spreading processes. In section 8 we elaborate on a few important areas for future work, and in section 9 we conclude.

Background and related work
A foundational principle of modern generative phonology is that systematic variation in morpheme pronunciation is best explained with a single underlying representation of the morpheme that is transformed into various surface representations based on context (Kenstowicz and Kisseberth, 1979; Odden, 2014). Thus, much of generative phonology is concerned with the nature of these transformations.
One way to better understand the nature of linguistic phenomena is to develop strong computational characterizations of them. Discussing SPE-style phonological rewrite rules (Chomsky and Halle, 1968), Johnson (1972, p. 43) expresses the reasoning behind this approach:
"It is a well-established principle that any mapping whatever that can be computed by a finitely statable, well-defined procedure can be effected by a rewriting system (in particular, by a Turing machine, which is a special kind of rewriting system). Hence any theory which allows phonological rules to simulate arbitrary rewriting systems is seriously defective, for it asserts next to nothing about the sorts of mappings these rules can perform."
This leads to an important question: what kinds of transformations ought a theory of phonology to allow?
Earlier work suggests that phonological theories ought to exclude nonregular relations (Johnson, 1972; Kaplan and Kay, 1994; Frank and Satta, 1998; Graf, 2010). More recently, it has been hypothesized that phonological theory ought to only allow certain subclasses of the regular relations (Gainor et al., 2012; Payne, 2013; Luo, 2014; Heinz and Lai, 2013). This research places particular emphasis on subsequential functions, which can informally be characterized as functions definable with a weighted, deterministic finite-state acceptor where the weights are strings and multiplication is concatenation. The aforementioned work suggests that this hypothesis enjoys strong support in segmental phonology, with interesting and important exceptions in the domain of tone (Jardine, 2014).
Recent research has also shown an increased awareness and understanding of subregular classes of stringsets (formal languages) and their importance for theories of phonotactics (Heinz, 2007; Heinz, 2009; Heinz, 2010; Rogers et al., 2010; Rogers and Pullum, 2011; Rogers et al., 2013). While many of these classes and their properties were studied much earlier (McNaughton and Papert, 1971; Thomas, 1997), little to no attention has been paid to similar classes properly contained within the subsequential functions. Thus, at least within the domain of segmental phonology, there is an important question of whether stronger computational characterizations of phonological transformations are possible, as seems to be the case for phonotactics.
As mentioned above, Chandlee (2014) shows that many phonological processes belong to a subclass of subsequential functions, the Input Strictly Local (ISL) functions. Informally, a function is $k$-ISL if the output of every input string $a_0 a_1 \cdots a_n$ is $u_0 u_1 \cdots u_n$ where $u_i$ is a string which only depends on $a_i$ and the $k-1$ input symbols before $a_i$ (so $a_{i-k+1} a_{i-k+2} \cdots a_{i-1}$). (A formal definition is given in section 4.) ISL functions can model a range of processes including local substitution, epenthesis, deletion, and metathesis. For more details on the exact range of ISL processes, see Chandlee (2014) and Chandlee and Heinz (to appear).
Processes that aren't ISL include long-distance processes as well as local iterative spreading processes. As an example of the latter, consider nasal spreading in Johore Malay (Onn, 1980). As shown in (1), contiguous sequences of vowels and glides are nasalized following a nasal. This process is not ISL, because the initial trigger of the spreading (the nasal) can be arbitrarily far from a target (as suggested by the nasalization of the glide and the second [ã]) when the distance is measured on the input side. However, on the output side, the triggering context is local; the second [ã] is nasalized because the preceding glide on the output side is nasalized. Every segment between the trigger and target is affected; nasalization applies to a contiguous, but arbitrarily long, substring. It is this type of process that we will show requires the notion of Output Strict Locality. Processes in which a potentially unbounded number of unaffected segments can intervene between the trigger and target, such as long-distance consonant agreement (Hansson, 2010; Rose and Walker, 2004), vowel harmony (Nevins, 2010; Walker, 2011), and consonant dissimilation (Suzuki, 1998; Bennett, 2013), are neither ISL nor OSL. More will be said about such long-distance processes in §7.
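The contrast between input-side and output-side locality can be sketched in Python. The encoding is a hypothetical schematic alphabet introduced only for illustration (N = nasal consonant, V = oral vowel, G = oral glide, C = any other consonant, lowercase v and g = nasalized vowel and glide); it is not the Johore Malay data. Both function names are assumptions. One function decides each output symbol from the previous output symbol, the other from the previous input symbol only.

```python
# Hypothetical schematic alphabet (an assumption for illustration only):
# N = nasal, V = oral vowel, G = oral glide, C = other consonant,
# v / g = nasalized vowel / glide in the output.
NASALIZED = {"V": "v", "G": "g"}
NASAL_OUTPUTS = {"N", "v", "g"}


def spread_output_local(word):
    """Nasalize a vowel or glide when the previous OUTPUT symbol is nasal."""
    out = []
    for sym in word:
        prev = out[-1] if out else ""
        if sym in NASALIZED and prev in NASAL_OUTPUTS:
            out.append(NASALIZED[sym])
        else:
            out.append(sym)
    return "".join(out)


def spread_input_local(word):
    """Nasalize a vowel or glide when the previous INPUT symbol is N."""
    out = []
    for i, sym in enumerate(word):
        prev = word[i - 1] if i > 0 else ""
        nasalize = sym in NASALIZED and prev == "N"
        out.append(NASALIZED[sym] if nasalize else sym)
    return "".join(out)


print(spread_output_local("CNVGVCV"))  # CNvgvCV
print(spread_input_local("CNVGVCV"))   # CNvGVCV
```

With an output window of two symbols the spreading covers the whole vowel and glide span, whereas the input-window version stops after one segment. Widening the input window only postpones the failure, since the span can be arbitrarily long.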

Preliminaries
The set of all possible finite strings of symbols from a finite alphabet $\Sigma$ and the set of strings of length $\leq n$ are $\Sigma^*$ and $\Sigma^{\leq n}$, respectively. The cardinality of a set $S$ is denoted $\mathrm{card}(S)$. The unique empty string is represented with $\lambda$. The length of a string $w$ is $|w|$, so $|\lambda| = 0$. If $w_1$ and $w_2$ are strings then $w_1 w_2$ is their concatenation. The prefixes of $w$, $\mathrm{Pref}(w)$, is $\{p \in \Sigma^* \mid (\exists s \in \Sigma^*)[w = ps]\}$, and the suffixes of $w$, $\mathrm{Suff}(w)$, is $\{s \in \Sigma^* \mid (\exists p \in \Sigma^*)[w = ps]\}$. For all $w \in \Sigma^*$ and $n \in \mathbb{N}$, $\mathrm{Suff}^n(w)$ is the single suffix of $w$ of length $n$ if $|w| \geq n$; otherwise $\mathrm{Suff}^n(w) = w$. The following reduction will prove useful later.
Remark 1. For all $w, v \in \Sigma^*$ and $n \in \mathbb{N}$, $\mathrm{Suff}^n(\mathrm{Suff}^n(w)v) = \mathrm{Suff}^n(wv)$.
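The notation above translates directly into a few lines of Python. The sketch below is illustrative only: the helper names (`prefixes`, `suffixes`, `suff`, `all_strings`) are assumptions, and Remark 1 is merely checked exhaustively on short strings over a two-letter alphabet, not proved.

```python
from itertools import product


def prefixes(w):
    """Pref(w): all prefixes of w, including the empty string and w itself."""
    return {w[:i] for i in range(len(w) + 1)}


def suffixes(w):
    """Suff(w): all suffixes of w, including the empty string and w itself."""
    return {w[i:] for i in range(len(w) + 1)}


def suff(w, n):
    """Suff^n(w): the suffix of length n, or w itself when |w| < n."""
    return w[len(w) - n:] if len(w) >= n else w


def all_strings(alphabet, max_len):
    """Enumerate every string over `alphabet` of length at most `max_len`."""
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


# Exhaustive check of Remark 1 on short strings.
remark_holds = all(
    suff(suff(w, n) + v, n) == suff(w + v, n)
    for w in all_strings("ab", 3)
    for v in all_strings("ab", 3)
    for n in range(4)
)
print(remark_holds)  # True
```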
The longest common prefix of a set of strings $S$, $\mathrm{lcp}(S)$, is the $p \in \bigcap_{w \in S} \mathrm{Pref}(w)$ such that $\forall p' \in \bigcap_{w \in S} \mathrm{Pref}(w)$, $|p'| \leq |p|$. Let $f : A \to B$ be a function $f$ with domain $A$ and co-domain $B$. When $A$ and $B$ are stringsets, the input and output languages of  introduce delimited subsequential FSTs (DSFSTs). The class of functions describable with DSFSTs is exactly the class representable by traditional subsequential FSTs (Oncina and Garcia, 1991; Oncina et al., 1993; Mohri, 1997), but DSFSTs make explicit use of symbols marking both the beginnings and ends of input strings.
Definition 1. A delimited subsequential FST (DSFST) is a 6-tuple $\langle Q, q_0, q_f, \Sigma, \Delta, \delta \rangle$ where $Q$ is a finite set of states, $q_0 \in Q$ is the initial state, $q_f \in Q$ is the final state, $\Sigma$ and $\Delta$ are finite alphabets of symbols, $\delta \subseteq Q \times (\Sigma \cup \{\rtimes, \ltimes\}) \times \Delta^* \times Q$ is the transition function (where $\rtimes \notin \Sigma$ indicates the 'start of the input' and $\ltimes \notin \Sigma$ indicates the 'end of the input'), and the following hold:
1. if $(q, \sigma, u, q') \in \delta$ then $q \neq q_f$ and $q' \neq q_0$,
2. if $(q, \sigma, u, q_f) \in \delta$ then $\sigma = \ltimes$ and $q \neq q_0$,
3. if $(q_0, \sigma, u, q') \in \delta$ then $\sigma = \rtimes$, and if $(q, \rtimes, u, q') \in \delta$ then $q = q_0$,
4. if $(q, \sigma, w, r), (q, \sigma, v, s) \in \delta$ then $(r = s) \wedge (w = v)$.
In words, in a DSFST, initial states have no incoming transitions (1) and exactly one outgoing transition, for input $\rtimes$ (3), which leads to a nonfinal state (2), and final states have no outgoing transitions (1) and every incoming transition comes from a noninitial state and has input $\ltimes$ (2). DSFSTs are also deterministic on the input (4). In addition, the transition function may be partial. We extend the transition function to $\delta^*$ recursively in the usual way: $\delta^*$ is the smallest set containing $\delta$ which is closed under the following condition: if $(q, w, u, q') \in \delta^*$ and $(q', \sigma, v, q'') \in \delta$ then $(q, w\sigma, uv, q'') \in \delta^*$. Note that no elements of the form $(q, \lambda, \lambda, q')$ are elements of $\delta^*$.
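A minimal Python sketch of a DSFST may make the four conditions concrete. The dictionary representation, the function names `is_dsfst` and `run`, the state names, and the example machine (a schematic nasal-spreading transducer over the hypothetical symbols C, N, V with v for a nasalized vowel) are all assumptions made for illustration. Keying the dictionary by (state, input symbol) enforces condition 4 by construction.

```python
START, END = "⋊", "⋉"  # delimiters, assumed not to belong to the input alphabet


def is_dsfst(delta, q0, qf):
    """Check conditions 1-3 of the definition on a transition dictionary
    mapping (state, input symbol) to (output string, next state)."""
    for (q, sym), (_out, r) in delta.items():
        if q == qf or r == q0:                    # condition 1
            return False
        if r == qf and (sym != END or q == q0):   # condition 2
            return False
        if (q == q0) != (sym == START):           # condition 3
            return False
    return True


def run(delta, q0, qf, word):
    """Read START + word + END; return the output, or None if undefined."""
    state, output = q0, ""
    for sym in [START, *word, END]:
        if (state, sym) not in delta:  # the transition function may be partial
            return None
        out, state = delta[(state, sym)]
        output += out
    return output if state == qf else None


# Hypothetical example machine: V is written as v after a nasal output.
delta = {
    ("q0", START): ("", "oral"),
    ("oral", "C"): ("C", "oral"),
    ("oral", "V"): ("V", "oral"),
    ("oral", "N"): ("N", "nasal"),
    ("nasal", "C"): ("C", "oral"),
    ("nasal", "V"): ("v", "nasal"),
    ("nasal", "N"): ("N", "nasal"),
    ("oral", END): ("", "qf"),
    ("nasal", END): ("", "qf"),
}

print(is_dsfst(delta, "q0", "qf"))       # True
print(run(delta, "q0", "qf", "CNVVCV"))  # CNvvCV
```

Because every transition on the end marker writes the empty string, this example machine is also sequential in the sense defined below.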
A DSFST $T$ defines the following relation: $R(T) = \{(x, y) \in \Sigma^* \times \Delta^* \mid (q_0, \rtimes x \ltimes, y, q_f) \in \delta^*\}$. Since DSFSTs are deterministic, the relations they recognize are (possibly partial) functions. Sequential functions are defined as those representable with DSFSTs for which for all $(q, \ltimes, u, q_f) \in \delta$, $u = \lambda$. For any function $f : \Sigma^* \to \Delta^*$ and $x \in \Sigma^*$, let the tails of $x$ with respect to $f$ be defined as $\mathrm{tails}_f(x) = \{(y, v) \mid f(xy) = uv \wedge u = \mathrm{lcp}(f(x\Sigma^*))\}$.
If $x_1, x_2 \in \Sigma^*$ have the same set of tails with respect to $f$, they are tail-equivalent with respect to $f$, written $x_1 \sim_f x_2$.
Theorem 1 (Oncina and Garcia, 1991). A function $f$ is subsequential iff $\sim_f$ partitions $\Sigma^*$ into finitely many blocks.
The above theorem can be seen as the functional analogue to the Myhill-Nerode theorem for regular languages. Recall that for any stringset $L$, the tails of a word $w$ w.r.t. $L$ is defined as $\mathrm{tails}_L(w) = \{u \mid wu \in L\}$. These tails can be used to partition $\Sigma^*$ into a finite set of equivalence classes iff $L$ is regular. Furthermore, these equivalence classes are the basis for constructing the (unique up to isomorphism) smallest deterministic acceptor for a regular language. Likewise, Oncina and Garcia's proof of Theorem 1 shows how to construct the (unique up to isomorphism) smallest subsequential transducer for a subsequential function $f$. With little modification to their proof, the smallest DSFST for $f$ can also be constructed. We refer to this DSFST as the canonical DSFST for $f$ and denote it $T_C(f)$. (If $f$ is understood from context, we may write $T_C$.) States of $T_C(f)$ which are neither initial nor final are in one-to-one correspondence with $\mathrm{tails}_f(x)$ for all $x \in \Sigma^*$ (Oncina and Garcia, 1991). Observe that unlike the traditional construction, the initial state $q_0$ is not $\mathrm{tails}_f(\lambda)$. The single outgoing transition from $q_0$, however, goes to this state with the input $\rtimes$. Canonical DSFSTs have an important property called onwardness.
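Informally, this means that the writing of output is never delayed. For all $q \in Q$ let the outputs of the edges out of $q$ be $\mathrm{outputs}(q) = \{u \mid (\exists \sigma \in \Sigma \cup \{\rtimes, \ltimes\})(\exists q' \in Q)[(q, \sigma, u, q') \in \delta]\}$.
Lemma 1. If DSFST $T$ recognizes $f$ and is onward then $\forall q \neq q_0$, $\mathrm{lcp}(\mathrm{outputs}(q)) = \lambda$ and $\mathrm{lcp}(\mathrm{outputs}(q_0)) = \mathrm{lcp}(f(\Sigma^*))$.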
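The first condition of Lemma 1 is easy to test mechanically. The following Python sketch is illustrative only: the dictionary encoding of transitions (from (state, input symbol) to (output, next state)), the function names, and the two toy machines are assumptions. It checks only the condition on non-initial states, which Lemma 1 gives as a consequence of onwardness, not the definition of onwardness itself.

```python
def lcp(strings):
    """Longest common prefix of a collection of strings ('' if empty)."""
    strings = list(strings)
    if not strings:
        return ""
    shortest = min(strings, key=len)
    for i, ch in enumerate(shortest):
        if any(s[i] != ch for s in strings):
            return shortest[:i]
    return shortest


def noninitial_lcps_empty(delta, q0):
    """True iff lcp(outputs(q)) is the empty string for every state q != q0."""
    outputs = {}
    for (q, _sym), (u, _r) in delta.items():
        outputs.setdefault(q, []).append(u)
    return all(lcp(us) == "" for q, us in outputs.items() if q != q0)


# Hypothetical machine that delays writing 'a' until it leaves state p.
delayed = {
    ("q0", "⋊"): ("", "p"),
    ("p", "a"): ("ab", "p"),
    ("p", "b"): ("ac", "p"),
    ("p", "⋉"): ("a", "qf"),
}
# Same function, with the shared 'a' pushed onto the transitions entering p.
pushed = {
    ("q0", "⋊"): ("a", "p"),
    ("p", "a"): ("ba", "p"),
    ("p", "b"): ("ca", "p"),
    ("p", "⋉"): ("", "qf"),
}

print(noninitial_lcps_empty(delayed, "q0"))  # False
print(noninitial_lcps_empty(pushed, "q0"))   # True
```

Both machines map the input ab to abaca; only the second writes the shared a as early as possible.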

Output Strictly Local functions
Here we define Output Strictly Local (OSL) functions, which were originally introduced by Chandlee (2014) and  along with the Input Strictly Local (ISL) functions. Both classes generalize SL stringsets to functions based on a defining property of SL languages, the Suffix Substitution Closure (Rogers and Pullum, 2011).
Theorem 2 (Suffix Substitution Closure). $L$ is Strictly Local iff there exists $k \in \mathbb{N}$ such that for all strings $u_1, v_1, u_2, v_2$ and any string $x$ of length $k-1$, if $u_1 x v_1, u_2 x v_2 \in L$, then $u_1 x v_2 \in L$.
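For a finite stringset and a fixed k, the closure condition in Theorem 2 can be checked by brute force. The Python sketch below is an illustration under that finiteness assumption; the function name `ssc_violations` is hypothetical. It returns the strings that closure would require but that are absent from the set.

```python
def ssc_violations(language, k):
    """Strings u1 + x + v2 with |x| = k - 1 that suffix substitution closure
    requires but that are missing from the finite set `language`."""
    language = set(language)
    splits = {}  # x -> list of (u, v) such that u + x + v is in the language
    for w in language:
        for i in range(len(w) - (k - 1) + 1):
            x = w[i:i + k - 1]
            splits.setdefault(x, []).append((w[:i], w[i + k - 1:]))
    missing = set()
    for x, pairs in splits.items():
        for u1, _v1 in pairs:
            for _u2, v2 in pairs:
                if u1 + x + v2 not in language:
                    missing.add(u1 + x + v2)
    return missing


print(sorted(ssc_violations({"ab", "ba"}, 2)))  # ['a', 'aba', 'b', 'bab']
print(sorted(ssc_violations({"ab", "ba"}, 3)))  # []
```

The set {ab, ba} is therefore not closed for k = 2 but is closed for k = 3, where no two of its strings share a substring of length 2 at which they could be recombined.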
An important corollary of this theorem follows.
Input and Output Strictly Local functions were defined by Chandlee (2014) and  in the manner suggested by the corollary.
While Definition 3 led to an automata-theoretic characterization and learning results for ISL, such results do not appear possible with the original definition of OSL. The trouble is with subsequential functions that are not sequential. The value returned by the function includes the writing that occurs when the input string has been fully read (i.e., the output of transitions going to the final state in a corresponding DSFST). This creates a problem because it does not allow for separation of what happens during the computation from what happens at its end. Figure 1 illustrates the distinction Definition 4 is unable to make. Function $f$ is sequential, but $g$ is not. Otherwise, they are identical. While $f(bab) = bba$, $g(bab) = bbaa$. With the original OSL definition, there is no way to refer to the output for input $bab$ before the final output string has been appended.
To deal with this problem we first define the prefix function associated to a subsequential function.
Definition 5 (Prefix function). Let $f : \Sigma^* \to \Delta^*$ be a subsequential function. We define the prefix function $f^p : \Sigma^* \to \Delta^*$ associated to $f$ such that $f^p(w) = \mathrm{lcp}(f(w\Sigma^*))$.
Remark 2. If $T$ is an onward DSFST for $f$, then
We can now revise the definition of OSL functions.
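The prefix function of Definition 5 ranges over infinitely many extensions, so it cannot be computed by enumeration. The Python sketch below approximates it using only extensions up to a bounded length, which is an assumption of the sketch: the value returned always has the true prefix function value as a prefix, and may be longer if the bound is too small. The function `g` is a hypothetical non-sequential function (the identity plus a final z), not the function g of Figure 1.

```python
from itertools import product


def lcp(strings):
    """Longest common prefix of a collection of strings ('' if empty)."""
    strings = list(strings)
    if not strings:
        return ""
    shortest = min(strings, key=len)
    for i, ch in enumerate(shortest):
        if any(s[i] != ch for s in strings):
            return shortest[:i]
    return shortest


def approx_prefix_function(f, w, alphabet, max_ext):
    """lcp of f(w + x) over all extensions x with |x| <= max_ext."""
    extensions = (
        "".join(letters)
        for n in range(max_ext + 1)
        for letters in product(alphabet, repeat=n)
    )
    return lcp(f(w + x) for x in extensions)


def g(w):
    """Hypothetical example: copy the input, then write 'z' at the end."""
    return w + "z"


print(g("ab"))                                   # abz
print(approx_prefix_function(g, "ab", "ab", 2))  # ab
```

The final z is part of g(ab) but not of the prefix function value, which is the separation between computation and end-of-input output that the revised definition relies on.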
Definition 6 (Output Strictly Local Function (revised)). We say that a subsequential function $f$ is $k$-OSL if for all $w_1, w_2 \in \Sigma^*$, $\mathrm{Suff}^{k-1}(f^p(w_1)) = \mathrm{Suff}^{k-1}(f^p(w_2)) \Rightarrow \mathrm{tails}_f(w_1) = \mathrm{tails}_f(w_2)$.
Chandlee et al. (2014) provide several theorems which relate ISL functions, OSL functions (defined as in Definition 4), and SL stringsets. Here we explain why those results still hold with Definition 6. The proofs for those results depend on the six functions ($f_i$, $1 \leq i \leq 6$) reproduced here in Figure 2. The transducers shown there are not DSFSTs but traditional subsequential transducers; readers are referred to  for formal definitions. With the exception of $f_5$, these functions are clearly sequential since each state outputs $\lambda$ on $\ltimes$ (shown as # in Figure 2). The transducer for $f_5$ is not onward, but an onward, sequential version of this transducer recognizing exactly the same function is obtained by suffixing $a$ (which is the lcp of the outputs of state 1) onto the output of state 1's incoming transition. Thus, $f_5$ is also sequential. By Remark 3 then, Theorems 4, 5, 6, and 7 of that paper still hold under Definition 6.

Automata characterization
First we show, for any non-initial state of any canonical transducer recognizing an OSL function, that if reading a letter $a$ implies writing $\lambda$, then this corresponds to a self-loop. So writing the empty string never causes a change of state (except from $q_0$).
Lemma 2. For any OSL function $f$ whose canonical DSFST is $T_C$, if $\exists q \neq q_0$, $a \in \Sigma$, and $q' \in Q$ such that $(q, a, \lambda, q') \in \delta_C$ then $q' = q$.
Proof. Consider $w$ and $u$ such that $(q_0, \rtimes w, u, q) \in \delta_C^*$ and suppose $(q, a, \lambda, q') \in \delta_C$. Then $f^p(w) = f^p(wa)$, which implies $\mathrm{Suff}^{k-1}(f^p(w)) = \mathrm{Suff}^{k-1}(f^p(wa))$. As $f$ is $k$-OSL, $\mathrm{tails}_f(w) = \mathrm{tails}_f(wa)$. As $T_C$ is canonical, the non-initial and non-final states correspond to unique tail-equivalence classes, and two distinct states correspond to two different classes. Therefore $q' = q$.
Lemma 4. Any k-OSL DSFST corresponds to a k-OSL function.
We now need to show that every k-OSL function can be represented by a k-OSL DSFST. An issue here is that one cannot work from T C since its states are defined in terms of its tails, which themselves are defined in terms of input strings, not output strings. Hence, the proof below is constructive.
Theorem 3. Let $f$ be a $k$-OSL function. The DSFST $T$ defined as follows computes $f$: The diagram below helps express pictorially how the transitions are organized per the second and third bullets above. The input is written above the arrows, and the output written below.
Note that $T$ is a $k$-OSL SFST. To prove this result, we first show the following lemma:
Lemma 5. Let $T$ be the transducer defined in Theorem 3. We have:
Proof. ($\Rightarrow$) By induction on the length of $w$. Suppose $(q_0, \rtimes w, u, q) \in \delta^*$ and $|w| = 0$. Then $(q_0, \rtimes, u, q) \in \delta$; by construction, $q = \mathrm{Suff}^{k-1}(u)$ and $f^p(\lambda) = u$, which validates the initial case.
We can now prove Theorem 3.
Learning OSL functions

Learning criterion
We adopt the identification in the limit learning paradigm (Gold, 1967), with polynomial bounds on time and data (de la Higuera, 1997). The underlying idea of the paradigm is that if the data available to the algorithm does not contain enough information to distinguish the target from other potential targets, then it is impossible to learn.
We first need to define the following notions. A class $T$ of functions is represented by a class $R$ of representations if every $r \in R$ is of finite size and there is a total and surjective naming function $L : R \to T$ such that $L(r) = t$ if and only if for all $w \in \mathrm{pre\_image}(t)$, $r(w) = t(w)$, where $r(w)$ is the output of representation $r$ on the input $w$. We observe that the class of $k$-OSL functions can be represented by the class of $k$-OSL DSFSTs.
Definition 8. Let $T$ be a class of functions represented by some class $R$ of representations.
1. A sample $S$ for a function $t \in T$ is a finite set of data consistent with $t$, that is to say $(w, v) \in S$ implies $t(w) = v$. The size of a sample $S$ is the sum of the lengths of the strings it is composed of: $|S| = \sum_{(w,v) \in S} (|w| + |v|)$.
2. A $(T, R)$-learning algorithm $A$ is a program that takes as input a sample for a function $t \in T$ and outputs a representation from $R$.
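The two notions in item 1 of Definition 8 can be written down in a few lines of Python. This is an illustrative sketch only: the function names and the schematic target function (V becomes v after a nasal output, over the hypothetical symbols C, N, V) are assumptions.

```python
def is_sample_for(sample, t):
    """True iff every pair (w, v) in the finite set `sample` satisfies t(w) = v."""
    return all(t(w) == v for w, v in sample)


def sample_size(sample):
    """|S|: the sum of |w| + |v| over all pairs (w, v) in the sample."""
    return sum(len(w) + len(v) for w, v in sample)


def target(w):
    """Hypothetical target function: V is written as v after N or v."""
    out = ""
    for a in w:
        out += "v" if a == "V" and out[-1:] in ("N", "v") else a
    return out


sample = {("", ""), ("N", "N"), ("NV", "Nv")}
print(is_sample_for(sample, target))  # True
print(sample_size(sample))            # 6
```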
The paradigm relies on the notion of characteristic sample, adapted here for functions:
Definition 9 (Characteristic sample). For a $(T, R)$-learning algorithm $A$, a sample $CS$ is a characteristic sample of a function $t \in T$ if for all samples $S$ for $t$ such that $CS \subseteq S$, $A$ returns a representation $r$ such that $L(r) = t$.
This definition is the one used in the proof of the OSTIA algorithm. The learning paradigm can now be defined as follows.
Definition 10 (Identification in polynomial time and data). A class $T$ of functions is identifiable in polynomial time and data if there exists a $(T, R)$-learning algorithm $A$ and two polynomials $p()$ and $q()$ such that:
1. For any sample $S$ of size $m$ for $t \in T$, $A$ returns a hypothesis $r \in R$ in $O(p(m))$ time.
2. For each representation $r \in R$ of size $n$, with $t = L(r)$, there exists a characteristic sample of $t$ for $A$ of size at most $O(q(n))$.

Learning algorithm
We show here that Algorithm 1 learns the OSL functions under the criterion introduced. We call this the Output Strictly Local Function Inference Algorithm (OSLFIA). We assume $\Sigma$, $\Delta$, and $k$ are fixed and not part of the input to the learning problem. Essentially, the algorithm computes a breadth-first search through the states that are reachable given the learning sample: the set $C$ contains the states already checked while $R$ is a queue made of the states that are reachable but have not been treated yet. Initially, the only transition leaving the initial state writes the lcp of the output strings of the sample and reaches the state corresponding to the $k-1$ suffix of this lcp. At each step of the main loop, OSLFIA treats the first state that is in the queue $R$ and computes whenever possible the transitions that leave that state. The outputs associated with each added transition are the longest common prefixes of the outputs associated with the smallest input prefix in the sample that allows the state to be reachable. We show that, provided the algorithm is given a sufficient sample, the transducer outputted by OSLFIA is onward and in fact a $k$-OSL transducer. After adding transitions with input letters from $\Sigma$ to a state, the transition to the final state is added, provided it can be calculated.
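The prose description above can be turned into a short Python sketch. It is a reading of that description, not a transcription of Algorithm 1, and several choices are assumptions: the sample is a dictionary from input strings to output strings, the transition on the start marker is folded into an initial output and a start state, states are named by suffixes of length at most k - 1 of the output written so far, and all function names and the schematic target function are hypothetical.

```python
from collections import deque
from itertools import product


def lcp(strings):
    """Longest common prefix of a collection of strings ('' if empty)."""
    strings = list(strings)
    if not strings:
        return ""
    shortest = min(strings, key=len)
    for i, ch in enumerate(shortest):
        if any(s[i] != ch for s in strings):
            return shortest[:i]
    return shortest


def suff(w, n):
    """Suffix of w of length n, or w itself when it is shorter than n."""
    return w[len(w) - n:] if len(w) >= n else w


def oslfia_sketch(sample, k, sigma):
    """Breadth-first construction of a k-OSL transducer from a sample."""
    init_out = lcp(sample.values())
    start = suff(init_out, k - 1)
    smallest = {start: ""}   # shortest input prefix found to reach each state
    out = {start: init_out}  # lcp of sample outputs for inputs extending it
    transitions, finals = {}, {}
    queue = deque([start])   # reachable states that are not yet treated
    while queue:
        q = queue.popleft()
        s = smallest[q]
        for a in sorted(sigma):
            ys = [y for w, y in sample.items() if w.startswith(s + a)]
            if not ys:
                continue               # the sample gives no evidence here
            v = lcp(ys)
            u = v[len(out[q]):]        # strip what was already written
            r = suff(q + u, k - 1)     # destination fixed by the output
            transitions[(q, a)] = (u, r)
            if r not in smallest:
                smallest[r], out[r] = s + a, v
                queue.append(r)
        if s in sample:                # final output, when it can be computed
            finals[q] = sample[s][len(out[q]):]
    return init_out, start, transitions, finals


def transduce(machine, w):
    """Apply a learned machine to the input string w."""
    result, state, transitions, finals = machine
    for a in w:
        u, state = transitions[(state, a)]
        result += u
    return result + finals[state]


def target(w):
    """Hypothetical 2-OSL target: V is written as v after N or v."""
    result = ""
    for a in w:
        result += "v" if a == "V" and result[-1:] in ("N", "v") else a
    return result


inputs = ("".join(p) for n in range(4) for p in product("CNV", repeat=n))
sample = {w: target(w) for w in inputs}
machine = oslfia_sketch(sample, k=2, sigma="CNV")
print(transduce(machine, "CNVVCVNV"))  # CNvvCVNv
```

On this sample (all inputs of length at most three) the sketch builds five states, one per possible last output symbol plus the empty suffix, and generalizes correctly to longer inputs. A smaller sample may leave transitions or final outputs undefined.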

Theoretical results
Here we establish the theoretical results, which culminate in the theorem that OSLFIA identifies the k-OSL functions in polynomial time and data.
Lemma 6. For any input sample S, OSLFIA produces its output in time polynomial in the size of S.
Proof. The main loop is executed at most $|\Delta|^{k-1}$ times, which is constant since both $\Delta$ and $k$ are fixed for any learning sample. The smaller loop is executed $|\Sigma|$ times. At each execution: the first conditional can be tested in time linear in $n$, where $n = \sum_{(w,u) \in S} |w|$; the computation of the lcp can be done in $nm$ steps, where $m = \max\{|u| : (w, u) \in S\}$, with an appropriate data structure (for instance a prefix tree); computing the suffix requires at most $m$ steps. The second conditional can be tested in at most $\mathrm{card}(S) \cdot m$ steps; the computation of the final transitions can be done in less than $m$ steps; all the other instructions can be done in constant time. The overall computation time is thus $O(|\Delta|^{k-1}|\Sigma|(n + nm + \mathrm{card}(S) \cdot m + 2m)) = O(n + m(n + \mathrm{card}(S)))$, which is polynomial (in fact bounded by a quadratic function) in the size of the learning sample.
Next we show that for each k-OSL function f , there is a finite kernel of data consistent with f (a 'seed') that is a characteristic sample for OSLFIA.
Definition 11 (An OSLFIA seed). Given a $k$-OSL transducer $\langle Q, q_0, q_f, \Sigma, \Delta, \delta \rangle$ computing a $k$-OSL function $f$, a sample $S$ is an OSLFIA seed for $f$ if
• For all $q \in Q$ such that $\exists v \in \Delta^*$, $(q, \ltimes, v, q_f) \in \delta$, $(w_q, f(w_q)) \in S$, where $w_q = \min_{\lhd} \{w \mid \exists u, (q_0, w, u, q) \in \delta^*\}$
• For all $(q, a, u, q') \in \delta$ with $q' \neq q_f$ and $a \in \{\rtimes\} \cup \Sigma$, for all $b \in \Sigma$ such that there exists $(q', b, u', q'') \in \delta$, there exists $(w, f(w)) \in S$ and $x \in \Sigma^*$ such that $w = w_q a b x$ and
In what follows, we let $T^{\diamond} = \langle Q^{\diamond}, q_0^{\diamond}, q_f^{\diamond}, \Sigma, \Delta, \delta^{\diamond} \rangle$ be the target $k$-OSL transducer, $f$ the function it computes, and $T = \langle Q, q_0, q_f, \Sigma, \Delta, \delta \rangle$ the transducer OSLFIA constructs on a sample that contains a seed.
First we show that IH1 also implies that $s = \mathrm{smallest}(q)$ is such that $s = w_q$. Since the algorithm searches breadth-first, $s$ is the smallest input that reaches $q$ in $T$. If $w_q \lhd s$ then $\exists q' \neq q$ such that $(q_0, w_q, u', q') \in \delta^*$, because $w_q$ is a prefix of an input string of the sample $S$ (since $S$ contains a seed). Since $w_q \lhd s$ and $|s| \leq n$, by IH1 the corresponding path also exists in the target transducer, which implies $q' = q$, which contradicts the supposition that $w_q \lhd s$. If $s \lhd w_q$, then again since $(q_0, s, u', q) \in \delta^*$, by IH1 the corresponding path also exists in the target transducer. This contradicts the definition of $w_q$. Therefore $s = w_q$.
We first show that $s = \mathrm{smallest}(q) = w_q$. Suppose $s \lhd w_q$. By construction of the SFST, $s$ is a prefix of an element of $S$, which means there exists $q'$ such that $(q_0, s, f^p(s), q') \in \delta^*$. But by IH2, this implies that $q' = q$, and the definition of $w_q$ contradicts $s \lhd w_q$. Suppose now that $w_q \lhd s$. By the construction of the seed, $w_q$ is a prefix of an element of the sample, which implies it is considered by the algorithm. As $(q_0, w_q, u_1, q) \in \delta^*$ by IH2, $w_q$ is a smaller prefix than $s$ that reaches the same state, which is impossible as $s$ is the earliest prefix that makes the state $q$ reachable. Therefore $w_q = s$ and thus the transition from state $q$ reading $a$ is created when $s = w_q$.
Next we show that $f^p(w_q) = \mathrm{out}(q)$. By construction of the seed, there is an element $(w_q a w', f(w_q a w')) \in S$ for all transitions $(q, a, x, q')$ of the target leaving $q$, and $(w_q, f(w_q)) \in S$ if $\exists v$ such that $(q, \ltimes, v, q_f)$ is a transition of the target. As the target is onward, the lcp of the outputs $x$ of its transitions $(q, \sigma, x, q')$ with $\sigma \in \Sigma \cup \{\ltimes\}$ is $\lambda$ (Lemma 1). This implies $\mathrm{out}(q) = \mathrm{lcp}(\{y \mid \exists a, x, (sax, y) \in S\}) = \mathrm{lcp}(\{y \mid \exists a, x, (w_q a x, y) \in S\}) = \mathrm{lcp}(f(w_q \Sigma^*)) = f^p(w_q) = f^p(s)$. Now let $v = \mathrm{lcp}(\{y \mid \exists b, x, (sabx, y) \in S\})$. Since $s = w_q$, the target reaches a state $r$ on input $w_q a$ having written $v$ since, as before, the onwardness of the target implies the lcp of the output written from $r$ is $\lambda$. This is because each possible output from $r$ is in $S$ (because it is in the seed according to the second item of Definition 11).
Consequently $v = f^p(w_q a) = f^p(sa)$.
Together these results imply that As the target is a $k$-OSL transducer (and thus deterministic), $\mathrm{Suff}^{k-1}(q u_2) = r$. Therefore the transition $(q, a, \mathrm{out}(q)^{-1} \cdot v, r)$ that is added to $\delta$ is the same as the transition $(q, a, u_2, r)$ of the target. This implies that the target reaches $r$ on input $wa$ having written $u$, and proves the lemma.
Lemma 8. Any seed for the OSL Learner is a characteristic sample for this algorithm.
Proof. A corollary of Lemma 7 is that if a seed is contained in a learning sample we have $(q_0, w, u, q) \in \delta^* \iff f^p(w) = u$ (Lemma 3), as the target transducer is $k$-OSL. For all states $q$ where $\exists v$ such that $(q, \ltimes, v, q_f)$ is a transition of the target, we have $(w_q, f(w_q))$ in the seed, which implies the algorithm will add $(q, \ltimes, f^p(w_q)^{-1} \cdot f(w_q), q_f)$ to $\delta$, which is exactly the output function of the target. As every state is treated only once, this holds for any learning set containing a seed. Therefore, from any superset of a seed, for any $w$, the function computed by the outputted transducer of Algorithm 1 is equal to $f(w)$.
Observe that OSLFIA is designed to work with seeds, which contain minimal strings. We believe both the seed and algorithm can be adjusted to relax this requirement, though this is left for future work.
Lemma 9. Given any k-OSL transducer T , there exists a seed for the OSL learner that is of size polynomial in the size of T .
Proof. Let $T = \langle Q, q_0, q_f, \Sigma, \Delta, \delta \rangle$. There are at most $\mathrm{card}(Q)$ pairs $(w_q, f(w_q))$ in a seed that correspond to the first item of Definition 11, each of which is such that $|w_q| \leq \mathrm{card}(Q)$ and $|f(w_q)| \leq \sum_{(q,\sigma,u,q') \in \delta} |u|$. We denote by $m$ this last quantity and note that $m = O(|T|)$.
For the elements of the second item of Definition 11 we restrict ourselves without loss of generality to pairs $(w_q a b w', f(w_q a b w'))$ where $w' = \min_{\lhd} \{x : f(w_q a b x) \text{ is defined}\}$. We have $|w'| \leq \mathrm{card}(Q)$ and $|f(w_q a b w')|$ is in $O(\mathrm{card}(Q)\, m)$. There are at most $|\Sigma|$ pairs $(w_q a b w', f(w_q a b w'))$ for a given transition $(q, a, u, q')$, which implies that the overall bound on the number of such pairs is in $O(|\Sigma|\, \mathrm{card}(\delta))$. The overall length of the elements in the seed that fulfill the second item of the definition is in $O(\mathrm{card}(Q)(\mathrm{card}(Q) + m + |\Sigma|\, \mathrm{card}(\delta)\, m))$.
The size of the seed studied in this proof is thus in $O((m + |Q|)(|Q| + |\Sigma|\, \mathrm{card}(\delta)))$, which is polynomial (in fact quadratic) in the size of the target transducer.
Theorem 4. OSLFIA identifies the k-OSL functions in polynomial time and data.
We conclude this section by comparing this result to other subsequential function-learning algorithms.
OSTIA (Oncina et al., 1993) is a state-merging algorithm which can identify the class of total subsequential functions in cubic time. (Partial subsequential functions cannot be learned exactly; for a partial function, OSTIA will learn some superset of it.) k-OSL functions include both partial and total functions, so the classes exactly learnable by OSTIA and OSLFIA are, strictly speaking, incomparable.
SOSFIA  identifies subclasses of subsequential functions in linear time and data. These subclasses are determined by fixing the structure of a transducer in advance. For every input string, SOSFIA knows exactly which state in the transducer is reached. The sole carrier of information regarding reached states is the input string. But for k-OSL functions, the output strings carry the information about the states reached. As the theorems demonstrate, the destination of a transition is only determined by the output of the transition. Thus no class learned by SOSFIA contains any k-OSL class.
OSTIA-D (OSTIA-R) (Oncina and Varò, 1996; Castellanos et al., 1998) identifies a class of subsequential functions with a given domain D (range R) in at least cubic time because it adds steps to OSTIA to prevent merging states that would result in a transducer whose domain (range) is not compatible with D (R). OSTIA-D cannot represent k-OSL functions for the same reasons SOSFIA cannot: domain information is about input strings, not output strings. On the other hand, the range of a k-OSL function is a k-OSL stringset which can be represented with a single acceptor, and thus OSL functions may be learned by OSTIA-R. However, OSLFIA is more efficient both in time and data. To sum up, of the algorithms considered here, OSLFIA is the most efficient for learning k-OSL functions.

Phonology
The example of Johore Malay nasal spreading given in §2 is an example of progressive spreading, since it proceeds from a triggering segment (the nasal) to vowels and glides that follow it. There also exist regressive spreading processes, in which the trigger follows the target(s). An example from the Mòbà dialect of Yoruba (Ajíbóyè, 2001;Ajíbóyè and Pulleyblank, 2008;Walker, 2014) is shown in (2). An underlying nasalized vowel spreads its nasality to preceding oral vowels and glides.
(2) /ujĩ/ → [ũjĩ], 'praise(n.)'
The difference between progressive and regressive spreading corresponds to reading the input from left-to-right or right-to-left, respectively (Heinz and Lai, 2013). Regressive spreading cannot be modeled with OSL in a left-to-right fashion, because the output of the preceding vowels and glides depends on the presence or absence of a following nasal that could be an unbounded number of segments away. By reading from right-to-left, that nasal trigger will always be read before the target(s), making it akin to progressive spreading. Thus there are two overlapping but non-identical classes, which we call left(-to-right) OSL and right(-to-left) OSL.
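The relationship between the two reading directions can be sketched in Python by reversing the input, applying a left-to-right function, and reversing the output. The alphabet is a hypothetical schematic encoding (V = oral vowel, G = oral glide, C = other consonant, N = nasal consonant, lowercase v and g = nasalized vowel and glide) and does not reproduce the Mòbà data; the function names are assumptions.

```python
NASALIZED = {"V": "v", "G": "g"}
NASAL_OUTPUTS = {"N", "v", "g"}


def progressive_spread(word):
    """Left-to-right: nasalize a vowel or glide after a nasal OUTPUT symbol."""
    out = ""
    for sym in word:
        if sym in NASALIZED and out[-1:] in NASAL_OUTPUTS:
            out += NASALIZED[sym]
        else:
            out += sym
    return out


def regressive_spread(word):
    """Right-to-left: the same computation applied to the reversed string."""
    return progressive_spread(word[::-1])[::-1]


print(progressive_spread("CVGv"))  # CVGv
print(regressive_spread("CVGv"))   # Cvgv
```

Read left-to-right, the trigger (the final nasalized vowel) comes too late to affect the preceding vowel and glide; read right-to-left, it is seen first and the spreading proceeds exactly as in the progressive case.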
(3) . What matters is whether the left and right contexts of the rule match the input or output string: if both match the input it is simultaneous application, and if one side matches the input and the other the output it is left-to-right or right-to-left.
ISL functions always match contexts against the input and therefore they cannot model ə-deletion. In this respect, ISL functions model simultaneous rule application. But there is also a problem with modeling the process as OSL, which is what to output when the ə that will be deleted is read. Consider the input VCəCV. When the DSFST reads the ə, it cannot decide what to output, because whether or not that ə is deleted depends on whether or not the next two symbols in the input are CV. But since the DSFST is deterministic, it must make a decision at this point. It could postpone the decision and output λ. But that would require it to loop at the current state (Lemma 2), which in turn means it cannot distinguish VCəCV from VCəəəCV, a significant problem since only the former meets the context for deletion.
Thus the range of phonological processes that can be modeled with OSL functions is limited to those with one-sided contexts (e.g., either C ___ or ___ D, the former being left OSL and the latter right OSL). In such cases the entire triggering context will be read before the potential target, so there is never a need to delay the decision about what to output. To summarize, phonological rules that apply simultaneously are ISL, and phonological rules with one-sided contexts that apply left-to-right or right-to-left are OSL.
In addition to iterative rules with two-sided contexts, long-distance processes like vowel harmony and consonant agreement and dissimilation are also excluded from the current analysis. While such processes have been shown to be subsequential and therefore subregular (see Gainor et al. (2012); Luo (2014); Payne (2013); Heinz and Lai (2013)), they are neither ISL nor OSL because the target and triggering context are not within a fixed window of length k in either the input or output. An example is the long-distance nasal assimilation process in Kikongo (Rose and Walker, 2004), as in (4).
(4) /tu+nik+idi/ → [tunikini] 'we ground'
In Kikongo, the alveolar stop in the suffix /-idi/ surfaces as a nasal when joined to a stem containing a nasal. Since stem nasals appear to occur arbitrarily far from the suffix, there is no k such that the target /d/ and the trigger /n/ are within a window of size k. Thus the process is neither ISL nor OSL.

Future Work
Processes like French ə-deletion that have two-sided contexts, with one being on the output side, suggest a class that combines the ISL and OSL properties. We are tentatively calling this class 'Input-Output SL' and are currently working on its properties, FST characterization, and learning algorithm. For long-distance processes, we expect other functional subclasses will strongly characterize these. SL stringsets are just one region of the Subregular Hierarchy (Rogers and Pullum, 2011; Rogers et al., 2013), so we expect functional counterparts of the other regions can be defined. Some of these other regions model long-distance phonotactics (Heinz, 2007; Heinz, 2010; Rogers et al., 2010), so their functional counterparts may prove equally useful for modeling and learning long-distance phonology.

Conclusion
We have defined a subregular class of functions called the OSL functions and provided both language-theoretic and automata-theoretic characterizations. The structure of this class is sufficient to allow any k-OSL function to be efficiently learned from positive data. It was shown that the OSL functions, unlike the ISL functions, can model local iterative spreading processes. Future work will aim to combine the results for both ISL and OSL to model iterative processes with two-sided contexts.