Modeling Reduplication with 2-way Finite-State Transducers

This article describes a novel approach to the computational modeling of reduplication. Reduplication is a well-studied linguistic phenomenon. However, it is often treated as a stumbling block within finite-state treatments of morphology. Most finite-state implementations of computational morphology cannot adequately capture the productivity of unbounded copying in reduplication, nor can they adequately capture bounded copying. We show that an understudied type of finite-state machines, two-way finite-state transducers (2-way FSTs), captures virtually all reduplicative processes, including total reduplication. 2-way FSTs can model reduplicative typology in a way which is convenient, easy to design and debug in practice, and linguistically-motivated. By virtue of being finite-state, 2-way FSTs are likewise incorporable into existing finite-state systems and programs. A small but representative typology of reduplicative processes is described in this article, alongside their corresponding 2-way FST models.


Introduction
Reduplication is a cross-linguistically common word-formation process. Reduplication is roughly divided into two categories, total reduplication where an unbounded number of segments are copied (1) vs. partial reduplication where a bounded number of segments are copied (2). In spoken language, reduplication usually involves making at most two copies, though making three copies is attested in spoken language (3) and is common in sign language (Wilbur, 2005).

1) wanita→wanita∼wanita 'woman'→'women' Indonesian (Cohn, 1989, 308)
2) takki→tak∼takki 'leg'→'legs' Agta (Moravcsik, 1978, 311)
3) roar→roar∼roar∼roar 'give a shudder'→'continue to shudder' Mokilese (Moravcsik, 1978, 301)

Most of the world's languages include at least one reduplicative process, with the most common reduplicative process being total reduplication. The WALS database documents that 278 out of 368 (75%) languages use both partial reduplication and total reduplication as productive morphological operations (Rubino, 2013). An extra 35 (10%) use only total reduplication as a productive morphological operation. The 55 (15%) remaining languages with no reduplicative processes include most Indo-European languages.
Although reduplication has a rich history in morpho-phonology, it continues to present challenges for computational and mathematical linguistics (Sproat, 1992; Roark and Sproat, 2007). Within computational linguistics, most of morphology and phonology have been analyzed with finite-state calculus as rational languages and transductions (Kaplan and Kay, 1994; Beesley and Karttunen, 2003). However, reduplication cannot be easily modeled with the same finite-state systems used to model the rest of morpho-phonology. In the case of total reduplication, this is because those finite-state systems cannot express unbounded copying in the first place (Culy, 1985). As for partial reduplication, those finite-state systems are often discussed as being burdensome models because of the state explosion that partial reduplication causes (Roark and Sproat, 2007). This has led some researchers to develop finite-state approximations of total reduplication (Walther, 2000; Beesley and Karttunen, 2000; Cohen-Sygal and Wintner, 2006; Hulden, 2009a; Hulden and Bischoff, 2009). These are approximations because they cannot model the productivity of total reduplication, the most common reduplicative process. Another alternative is to use formalisms that are beyond finite-state, e.g. queue-based CFGs (Savitch, 1989), MCFGs (Albro, 2000, 2005), and HPSG (Crysmann, 2017).
This article shows how a specific understudied type of finite-state technology actually can account for virtually all forms of bounded and unbounded reduplication as they are found in typological studies (Moravcsik, 1978; Rubino, 2005). This finite-state technology not only describes reduplication as a process which applies to infinitely many words of unbounded size, but it does so without the state-space explosion. The type of transducer which accomplishes this is known as a 2-way Finite-State Transducer or 2-way FST (Savitch, 1982; Engelfriet and Hoogeboom, 2001; Filiot and Reynier, 2016). While these computer scientists are well aware that 2-way FSTs can model unbounded copying, this is the first use of 2-way FSTs within computational linguistics to our knowledge. 2-way FSTs are distinguished from the more well-known (1-way) finite-state transducers or 1-way FSTs by allowing the machine to move back and forth on the input tape, but not on the output tape. It is this increased power of 2-way FSTs that allows them to adequately model reduplication without the difficulties of using 1-way FSTs.
In this article, we focus on deterministic 2-way FSTs. Like 1-way FSTs, 2-way FSTs can be either deterministic or non-deterministic on the input. Deterministic 1-way FSTs are less expressive than non-deterministic 1-way FSTs (Elgot and Mezei, 1965; Schützenberger, 1975; Choffrut, 1977; Mohri, 1997; Heinz and Lai, 2013). Similarly, deterministic 2-way FSTs are less expressive than non-deterministic 2-way FSTs (Culik and Karhumäki, 1986). For the typology of reduplication studied in this article, deterministic 2-way FSTs are sufficient. This result is in line with work showing that various phonological and morphological processes can be described with deterministic finite-state technology (Gainor et al., 2012; Heinz and Lai, 2013; Chandlee, 2014; Luo, 2017; Payne, 2014, 2017).
This article is organized as follows. 2-way finite-state transducers (2-way FSTs) are introduced in §2, where we provide a formal definition (§2.1), discuss their computational properties (§2.2), and discuss their computational complexity (§2.3). In §3, we illustrate how 2-way FSTs can model reduplication, notably total reduplication (§3.1) and partial reduplication (§3.2). In §4, we contrast 2-way FSTs with 1-way FSTs and show how the former are empirically adequate, practically convenient or useful, and linguistically motivated for modeling reduplication. To illustrate this, we briefly discuss how we have used 2-way FSTs to develop the RedTyp database, a database of reduplicative processes with corresponding 2-way FSTs. Conclusions and directions for future research are in §5.
Two-way finite-state transducers: definition and properties

Definition
It is useful to imagine a 2-way FST as a machine operating on an input tape and writing to an output tape. The symbols on the input tape are drawn from an alphabet Σ and the symbols written to the output tape are drawn from an alphabet Γ. For an input string $w = \sigma_1 \ldots \sigma_n$, the initial configuration is that the FST is in some internal state $q_0$, the read head is at the first position of the tape reading $\sigma_1$, and the writing head is positioned at the beginning of an empty output tape. After the FST reads the symbol under the read head, three things occur:
• The internal state of the FST changes.
• The FST writes some string, possibly empty, to the output tape.
• The read head may move in one of three ways: it can move to the left (-1), move to the right (+1), or stay (0).
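This process repeats until the read head "falls off" one of the edges of the input tape. If for some input string w, the FST falls off the right edge of the input tape when the FST is in an accepting state after writing u on the output tape, we say the FST transduces, transforms, or maps, w to u. If for some input string w, the FST falls off the left edge, falls off the right edge while in a non-accepting state, or never falls off an edge, then the FST is undefined at w. Note the writing head of the FST can never move back along the output tape. It only ever advances as strings are written.
Below is a formalization of 2-way FSTs based on Filiot and Reynier (2016) and Shallit (2008). We adopt the convention that inputs to a 2-way FST are flanked with the start ($\rtimes$) and end ($\ltimes$) boundaries. This larger alphabet is denoted by $\Sigma_{\rtimes\ltimes}$.

4) Definition: A 2-way, deterministic FST is a six-tuple $(Q, \Sigma_{\rtimes\ltimes}, \Gamma, q_0, F, \delta)$ such that:
• $Q$ is a finite set of states,
• $\Sigma_{\rtimes\ltimes} = \Sigma \cup \{\rtimes, \ltimes\}$ is the input alphabet,
• $\Gamma$ is the output alphabet,
• $q_0 \in Q$ is the initial state,
• $F \subseteq Q$ is the set of accepting states,
• $\delta : Q \times \Sigma_{\rtimes\ltimes} \to Q \times \{-1, 0, +1\} \times \Gamma^*$ is the transition function.

A configuration of a 2-way FST $T$ is an element of $\Sigma_{\rtimes\ltimes}^* Q \Sigma_{\rtimes\ltimes}^* \times \Gamma^*$. The meaning of the configuration $(wqx, u)$ is that the input to $T$ is $wx$ and the machine is currently in state $q$ with the read head on the first symbol of $x$ (or has fallen off the right edge of the input tape if $x = \lambda$) and that $u$ is currently written on the output tape.
If the current configuration is $c = (wqax, u)$ with $a \in \Sigma_{\rtimes\ltimes}$ and $\delta(q, a) = (r, d, v)$, then the next configuration $c'$, written $c \to c'$, is $(wrax, uv)$ if $d = 0$, $(warx, uv)$ if $d = +1$, and $(w'rbax, uv)$ if $d = -1$ and $w = w'b$.
The transitive closure of $\to$ is denoted with $\to^+$. Thus, if $c \to^+ c'$ then there exists a finite sequence of configurations $c_1, c_2, \ldots, c_n$ with $n > 1$ such that $c = c_1 \to c_2 \to \ldots \to c_n = c'$.
Next we define the function that a 2-way FST $T$ computes. For each string $w \in \Sigma^*$, $f_T(w) = u \in \Gamma^*$ provided there exists $q_f \in F$ such that $(q_0 \rtimes w \ltimes, \lambda) \to^+ (\rtimes w \ltimes q_f, u)$.
There are situations where a 2-way FST $T$ crashes on some input $w$ and hence $f_T(w)$ is undefined. If the configuration is $(qax, u)$ and $\delta(q, a) = (r, -1, v)$ then the derivation crashes and the transduction $f_T(ax)$ is undefined. Likewise, if the configuration is $(wq, u)$ and $q \notin F$ then the transducer crashes and the transduction $f_T$ is undefined on input $w$.
There is one more way in which $f_T$ may be undefined for some input. The input may cause the transducer to go into an infinite loop. This occurs for input $wx \in \Sigma_{\rtimes\ltimes}^*$ whenever there exist $q \in Q$ and $u, v \in \Gamma^*$ such that $(q_0 wx, \lambda) \to^+ (wqx, u) \to^+ (wqx, uv)$.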
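The definition above can be turned into a small interpreter. The following Python sketch is illustrative only: the function name `run_2way_fst`, the dictionary encoding of δ, the treatment of a missing transition as a crash, the step budget used to cut off infinite loops, and the toy `identity` machine are all assumptions of this sketch rather than part of the formal definition.

```python
from typing import Dict, Optional, Set, Tuple

START, END = "⋊", "⋉"  # boundary symbols flanking every input

# delta maps (state, input symbol) -> (next state, head move, output string)
Delta = Dict[Tuple[str, str], Tuple[str, int, str]]


def run_2way_fst(delta: Delta, q0: str, finals: Set[str], word: str,
                 max_steps: int = 10_000) -> Optional[str]:
    """Return the output for `word`, or None if the transduction is undefined."""
    tape = [START, *word, END]
    state, pos, out = q0, 0, []
    for _ in range(max_steps):
        if pos < 0:                 # fell off the left edge: crash
            return None
        if pos >= len(tape):        # fell off the right edge
            return "".join(out) if state in finals else None
        step = delta.get((state, tape[pos]))
        if step is None:            # assumption: no transition means a crash
            return None
        state, move, written = step
        out.append(written)         # the output tape only ever grows
        pos += move                 # move is -1, 0, or +1
    return None                     # budget exhausted: treated as a loop


# Hypothetical toy machine: the identity function over {a, b}.
identity: Delta = {
    ("q0", START): ("q1", +1, ""),
    ("q1", "a"): ("q1", +1, "a"),
    ("q1", "b"): ("q1", +1, "b"),
    ("q1", END): ("qf", +1, ""),
}

print(run_2way_fst(identity, "q0", {"qf"}, "ab"))
```

The three `None` branches correspond to the three ways in which the function can be undefined: falling off the left edge, halting at the right edge in a non-accepting state (or with no applicable transition), and never halting.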

Computational properties
With respect to acceptors, 1-way and 2-way finite-state acceptors are equivalent in expressive power. Both define the regular languages (Hopcroft and Ullman, 1969; Shallit, 2008). However, with respect to transducers, 1-way FSTs are strictly less expressive than 2-way FSTs (Savitch, 1982; Aho et al., 1969). For a 1-way FST, both the input language and the output language must be regular languages. A 1-way FST thus cannot have its output language be the non-regular copy language $L_{ww} = \{ww \mid w \in \Sigma^*\}$. In contrast, as we will see, the output language of a 2-way FST can be a non-regular language such as $L_{ww}$. The next section will show that this additional power allows 2-way FSTs to productively model reduplication.
2-way FSTs are closed under composition (Chytil and Jákl, 1977) and their non-deterministic variants are invertible (Courcelle and Engelfriet, 2012). They are less powerful than Turing machines because they cannot move back and forth over the output tape.
Note that given the difference in expressive power between 1-way and 2-way FSTs, it makes sense to give the classes of functions that they compute different names. We follow Filiot and Reynier (2016) who identify the class of functions describable with a 1-way deterministic FST as 'rational functions', and they reserve the term 'regular functions' for functions describable with 2-way deterministic FSTs.

Computational complexity
Deterministic 1-way FSTs run in time linear in the length of the input string. Since 2-way FSTs can reread the input string, is this still the case? One useful metric for measuring the complexity of deterministic 2-way FSTs is the number of times the 2-way FST passes through the input (Baschenis et al., 2016). In the case of the reduplication examples in §3, a deterministic 2-way FST can be designed with only two passes through the input per copy. Thus, the run time for a deterministic 2-way FST modeling reduplication which makes at most n copies of an input string of length m is on the order of 2n · m steps. Since n is fixed by the reduplicative morpheme, the run time is still linear in the size of the input string.
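As a worked instance of this estimate, take total reduplication, which makes n = 2 copies, and the four-segment input /buku/, so m = 4. The symbol T for the step estimate is introduced here only for illustration, and the arithmetic takes the stated two passes per copy at face value while ignoring the boundary symbols:

```latex
\[
  T(n, m) \approx 2n \cdot m, \qquad
  T(2, 4) \approx 2 \cdot 2 \cdot 4 = 16, \qquad
  T(2, 8) \approx 2 \cdot 2 \cdot 8 = 32 .
\]
```

Doubling the input length doubles the estimate, which is what linear growth in m amounts to once n is held fixed.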
Also, to our knowledge existing applications of regular functions have been efficient (Alur and Černý, 2011; Alur et al., 2014).

Illustrative use of two-way transducers for reduplication
Having established what 2-way FSTs are and how they behave, this section illustrates how they can be used to model reduplication. We provide two illustrative examples: total reduplication (§3.1) and partial reduplication (§3.2).

Total reduplication
Total reduplication is cross-linguistically the most common reduplicative process (Rubino, 2005), and it is used in an estimated 85% of the world's languages (Rubino, 2013). A canonical example is total reduplication in Indonesian which marks plurality (Cohn, 1989). Examples are in Table 1. Figure 1 shows a 2-way FST that captures this total reduplication process. Basically, the 2-way FST in Figure 1 operates by:
1. reading the input tape once from left to right in order to output the first copy,
2. going back to the start of the input tape by moving left until the start boundary is reached,
3. reading the input tape once more from left to right in order to output the second copy.
Specifically, this figure is interpreted as follows. The symbol Σ stands for any segment in the alphabet except for the boundaries $\{\rtimes, \ltimes\}$. The arrow from $q_1$ to itself means this 2-way FST reads Σ, writes Σ, and advances the read head one step to the right on the input tape. The boundary symbol ∼ is a symbol in the output alphabet Γ, and is not necessary. We include it only for illustration.
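The three-pass procedure can be sketched in Python as follows. This is a hypothetical encoding rather than a transcription of Figure 1: the state names q0 to q4, the helper names `total_reduplication_fst` and `transduce`, and the use of an ASCII `~` in place of ∼ are assumptions made for illustration. The helper also returns the number of transitions taken.

```python
START, END = "⋊", "⋉"  # input boundary symbols


def total_reduplication_fst(alphabet):
    """Build the transition table of a total-reduplication 2-way FST."""
    delta = {
        ("q0", START): ("q1", +1, ""),   # step onto the first segment
        ("q1", END): ("q2", -1, "~"),    # first copy done, turn around
        ("q2", START): ("q3", +1, ""),   # back at the start, turn again
        ("q3", END): ("q4", +1, ""),     # second copy done, fall off right
    }
    for seg in alphabet:
        delta[("q1", seg)] = ("q1", +1, seg)  # pass 1: output first copy
        delta[("q2", seg)] = ("q2", -1, "")   # pass 2: rewind silently
        delta[("q3", seg)] = ("q3", +1, seg)  # pass 3: output second copy
    return delta


def transduce(delta, word, q0="q0", finals=("q4",)):
    """Run the machine; return (output or None, number of transitions)."""
    tape = [START, *word, END]
    state, pos, out, steps = q0, 0, "", 0
    # This particular machine always halts, so no loop guard is included.
    while 0 <= pos < len(tape):
        if (state, tape[pos]) not in delta:
            return None, steps            # undefined transition: crash
        state, move, written = delta[(state, tape[pos])]
        out += written
        pos += move
        steps += 1
    accepted = pos == len(tape) and state in finals
    return (out if accepted else None), steps


delta = total_reduplication_fst("buk")
print(transduce(delta, "buku"))
```

Because the table has one entry per state and segment, enlarging the alphabet adds transitions but no states.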
We show an example derivation in Figure 2 of /buku/→[buku∼buku] using the 2-way FST in Figure 1. The derivation shows, step by step, the configurations of the computation for the input /buku/. Each tuple consists of four parts: input string, output string, current state, transition. In the input string, we underline the input symbol which the FST will read next. The output string is what the 2-way FST has outputted up to that point. The symbol λ marks the empty string. The current state is the state the FST is currently in. The transition represents the transition arc used, from input to output. In the first tuple, there is no transition arc used (N/A). But for other tuples, the form of the arc is:

Partial reduplication
Partial reduplication processes are also very common. A canonical example is initial-CV reduplication found in many Austronesian languages (Rubino, 2005). This section presents a simplified version of initial-CV reduplication from Bikol that is used to mark imperfective aspect (Mattes, 2007). Examples are in Table 2. In initial-CV reduplication in Bikol, two phonological modification processes apply to the reduplicant, i.e. the smaller copy. The 2-way FST in Figure 3 captures the partial reduplication pattern and its modifications. The symbol $V_M$ stands for monophthongs, $V_D$ for diphthongs, and C for consonants. An example derivation of /draIf/→[da∼draIf] using our 2-way FST is provided in Figure 4 (see footnote 6).
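A machine of this kind can be sketched in Python. The sketch below is a hypothetical design, not a transcription of Figure 3: the state names, the toy segment inventories, the diphthong-to-monophthong table, and the two modifications it implements (copying only the first consonant of an initial cluster, and shortening a diphthong in the reduplicant) are assumptions read off the example /draIf/→[da∼draIf]. Inputs are lists of segments so that a diphthong counts as one symbol, and vowel-initial words are simply left undefined.

```python
START, END = "⋊", "⋉"
CONSONANTS = set("bdfgklmnprst")        # assumed toy inventory (C)
MONOPHTHONGS = set("aiu")               # assumed toy inventory (V_M)
DIPHTHONGS = {"aI": "a", "aU": "a"}     # V_D -> shortened vowel (assumed)


def build_initial_cv_fst():
    delta = {
        ("q0", START): ("q1", +1, ""),
        ("q3", START): ("q4", +1, ""),   # rewound, start the full copy
        ("q4", END): ("q5", +1, ""),     # fall off the right edge
    }
    for c in CONSONANTS:
        delta[("q1", c)] = ("q2", +1, c)    # copy the first consonant
        delta[("q2", c)] = ("q2", +1, "")   # skip the rest of a cluster
    for v in MONOPHTHONGS:
        delta[("q2", v)] = ("q3", -1, v + "~")
    for d, short in DIPHTHONGS.items():
        delta[("q2", d)] = ("q3", -1, short + "~")  # shorten diphthong
    for seg in CONSONANTS | MONOPHTHONGS | set(DIPHTHONGS):
        delta[("q3", seg)] = ("q3", -1, "")   # rewind silently
        delta[("q4", seg)] = ("q4", +1, seg)  # output the base unchanged
    return delta


def transduce(delta, segments, q0="q0", finals=("q5",)):
    """Return the output string, or None if the machine crashes."""
    tape = [START, *segments, END]
    state, pos, out = q0, 0, []
    while 0 <= pos < len(tape):
        step = delta.get((state, tape[pos]))
        if step is None:
            return None
        state, move, written = step
        out.append(written)
        pos += move
    return "".join(out) if pos == len(tape) and state in finals else None


fst = build_initial_cv_fst()
print(transduce(fst, ["d", "r", "aI", "f"]))
print(transduce(fst, ["p", "a", "t"]))
```

The machine has six states regardless of how many consonants and vowels the inventories contain.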

Contrasting 2-way FSTs with 1-way FSTs
Having illustrated how 2-way FSTs can model reduplication, here we contrast 2-way FSTs with 1-way FSTs on three criteria: empirical coverage, practical utility, and intensional description. We do not contrast 2-way FSTs with more powerful formalisms like pushdown transducers (Allauzen and Riley, 2012). We do not assume the former are superior to other such formalisms. Our goal is to show 2-way FSTs have practical and scientific utility in computational linguistics; thus, they merit further study.

Empirical coverage
In terms of empirical coverage, 2-way FSTs can model virtually the entire typology of reduplication (Moravcsik, 1978; Hurch, 2005; Inkelas and Zoll, 2005; Rubino, 2005; Samuels, 2010). This includes not only local reduplication (as in the two examples from §3), but also non-local or 'wrong-side' reduplication (Riggle, 2004), internal reduplication (Broselow and McCarthy, 1983), multiple reduplication (Urbanczyk, 1999), subconstituent reduplication (Downing, 1998), and cases of interactions between reduplication and opaque phonological processes (overapplication, underapplication, backcopying) (McCarthy and Prince, 1995). 1-way FSTs do not reach this coverage. This is especially the case for total reduplication, which is the most widespread reduplicative process (Rubino, 2013) but which cannot be modeled with 1-way FSTs; they can only approximate it. In most cases, such an approximation will be inadequate because total reduplication is a productive grammatical process (Rubino, 2005, 2013).
We emphasize the term virtually because in our investigation we have found only two marginal cases of reduplication in the literature which cannot be modeled by 2-way FSTs unless certain plausible assumptions are made. These two cases involve reduplication producing suppletive allomorphs of morphemes as in Sye (Inkelas and Zoll, 2005, 52), and reduplication being blocked by homophony or haplology as in Kanuri (Moravcsik, 1978, 313). These two cases of 'under-generation' can be solved if we assume the language contains a finite number of suppletive allomorphs, and if we assume that there is either a finite number of banned identical sequences or a separate linguistic mechanism that filters out ill-formed homophonies.

Footnote 6: The FST treats the diphthong /aI/ as a single segment.
Of course there are cases where 2-way FSTs can 'over-generate' and model unattested types of reduplication, e.g. reduplicate a word n times for some natural number n or reduplicate a word by reversing it. This over-generation can be addressed by either restricting the class of 2-way FSTs used (Dolatian and Heinz, 2018) or by not treating 2-way FSTs as having to be exact models of human cognition (Potts and Pullum, 2002). For further discussion and solutions on how 2-way FSTs can over- and under-generate, see Dolatian and Heinz (In press).

Practical utility
To showcase the empirical coverage of 2-way FSTs and their practical utility, we have constructed the RedTyp database, which contains entries for 138 reduplicative processes from 91 languages gleaned from various surveys (Rubino, 2005; Inkelas and Downing, 2015). Fifty of these processes were from Moravcsik (1978), an early survey which is representative of the cross-linguistically most common reduplicative patterns. RedTyp contains 57 distinct 2-way FSTs that model the 138 processes. Each 2-way FST was designed manually, implemented in Python, and checked for correctness. On average, these 2-way FSTs had 8.8 states. This shows that 2-way FSTs are concise and convenient computational descriptions and models for reduplicative morphology.
This is in contrast to 1-way FSTs, which suffer from an explosion of states when modeling partial reduplication. On average, a language's phoneme inventory would include 22 consonants and 5 vowels (Maddieson, 2013a,b). In order to handle initial-CV, initial-CVC, or initial-CVCV reduplication with a 1-way FST, the FST would require at least an estimated 22, 110, and 2420 states respectively.
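The three estimates can be reproduced with a few lines of Python. The reading assumed here, which is an interpretation rather than something stated explicitly, is that a 1-way FST must remember every segment of the reduplicant except the last one before it can start writing, so the count is the product of the inventory sizes over all but the final slot of the template. The function name `estimated_states` is hypothetical.

```python
from math import prod


def estimated_states(template, n_consonants=22, n_vowels=5):
    """Count the distinct strings filling all but the last template slot."""
    sizes = {"C": n_consonants, "V": n_vowels}
    return prod(sizes[slot] for slot in template[:-1])


for template in ("CV", "CVC", "CVCV"):
    print(template, estimated_states(template))
```

Under this reading the count grows multiplicatively with each extra slot in the template and with the size of the inventory.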

Linguistic motivation and origin semantics
Finally, using 2-way FSTs for reduplication is linguistically motivated and matches the intensional descriptions behind the linguistic generalizations on reduplication. 2-way FSTs do not approximate reduplication like 1-way FSTs do. They can fully and productively model reduplicative processes as they appear in the typology, including both partial and total reduplication. As noted above, this is because 1-way FSTs simply remember the possible shapes for a reduplicant when the number of possible shapes is (large yet) finite as in partial reduplication. When the number of possible shapes to remember is unbounded as in total reduplication, a 1-way FST cannot productively model reduplication. In contrast, a 2-way FST does not need to remember strings of segments in order to copy them, but actively copies them. This contrast between copying and remembering can be formalized with the notion of the origin semantics of a transduction (Bojańczyk, 2014). Given a string-to-string function, the origin semantics of the function is the origin information of each symbol $o_n$ in the output string. This is the position $i_m$ of the read head on the input tape when the transducer outputted $o_n$.
To illustrate, consider a string-to-string function $f_{ab}$ which maps ab to itself, and every other string to the empty string: $f_{ab} = \{(ab, ab), (a, \lambda), (b, \lambda), \ldots\}$. This function can be modeled with at least two different 1-way FSTs as in Figure 5, which differ in when they output the output symbols a and b. In Figure 6, we show the origin information created by the two 1-way FSTs from Figure 5 for the mapping (ab, ab). The two FSTs model the same function and are equivalent in their general semantics of what they output; however, they are not equivalent in their origin semantics because they create different origin information for their output.
Turning to reduplication, initial-CV reduplication can be modeled by the same 2-way FST in Figure 3. Because of the bound on the size of the reduplicant, this function can also be modeled with the 1-way FST in Figure 7. The two transducers in Figures 3 and 7 are equivalent in their general semantics because they can output the same string. For example, given the input /pat/, both FSTs will output [pa∼pat]. However, the two FSTs differ in their origin semantics. Given the mapping /pat/→[pa∼pat], the two FSTs will create different origin information. Setting aside the word boundaries and reduplicant boundary ∼, the 1-way FST associates the second pa string of the output with the vowel a of the input as in Figure 8a. This is because the second pa was outputted when the 1-way FST was reading the a in the input. In contrast, the 2-way FST associates each segment in the output with an identical segment in the input as in Figure 8b.
The origin information created by the 2-way FST matches theoretical treatments of how the reduplicant's segments are individually associated with identical segments in the input (Marantz, 1982; Inkelas and Zoll, 2005). In contrast, the origin information created by the 1-way FST does not match any linguistic intuitions of reduplication because non-identical segments are associated. This difference in the origin semantics of the 1-way FST and 2-way FST formalizes their difference in behavior: the 1-way FST simply remembers what strings of segments to output twice, while the 2-way FST actively copies.
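The contrast for /pat/→[pa∼pat] can be made concrete with a short Python sketch. The two emission lists below are hand-written assumptions about when each machine writes its output (in particular, the point at which ∼ is written is a guess), not transcriptions of the figures. Each entry pairs the input position of the read head with the string written at that step, and `~` stands in for ∼.

```python
def origin_info(emissions):
    """Expand (input position, written string) steps into (symbol, origin) pairs."""
    return [(sym, pos) for pos, written in emissions for sym in written]


word = "pat"  # input positions: p = 1, a = 2, t = 3 (boundaries set aside)

# 1-way FST: must write the second "pa" from memory while reading "a".
one_way = [(1, "p"), (2, "a~pa"), (3, "t")]

# 2-way FST: returns to each segment and writes it while reading it.
two_way = [(1, "p"), (2, "a~"), (1, "p"), (2, "a"), (3, "t")]

for name, run in (("1-way", one_way), ("2-way", two_way)):
    pairs = origin_info(run)
    print(name, "".join(sym for sym, _ in pairs), pairs)
```

Both runs spell out the same output string, but only in the 2-way run does every non-boundary output symbol originate at an identical input segment.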
In Base-Reduplicant correspondence theory (BRCT), what matters for reduplication is not the relationship or correspondence between input and output segments in the reduplication, but between the two copies in the output (McCarthy and Prince, 1995). Origin semantics might be able to formalize the intuition behind BRCT with finite-state technology (output symbols with the same origin are in correspondence). The only computational implementation of BRCT to our knowledge (Albro, 2000, 2005) uses MCFGs to do so. Note however that the empirical validity of BRCT is questionable (Inkelas and Zoll, 2005; McCarthy et al., 2012).

Conclusion
In summary, finite-state technology has often been argued to be incapable of adequately and efficiently capturing productive reduplication as used in natural language. However, this article shows that an understudied type of finite-state machinery, 2-way finite-state transducers, can exactly model reduplication and its wide typology.
2-way FSTs can model virtually the entire typology of reduplication, without needing to approximate any processes (unlike 1-way FSTs). They likewise do not suffer from a state explosion for partial reduplication because the size of the 2-way FST is not dependent on the size of the alphabet. This allows 2-way FSTs to directly capture the copying aspect of reduplication instead of remembering all potential reduplicants. This makes 2-way FSTs a practical, convenient, and concise tool to model reduplication. As a sign of their empirical coverage and utility, we developed the RedTyp database of reduplicative processes that contains 57 distinct 2-way FSTs which model common and uncommon reduplicative processes covered in the literature (Moravcsik, 1978).
Having showcased their utility, several avenues of future research remain, of which we highlight four. First, we have approached reduplication from the perspective of morphological generation. Given an input /buku/, a 2-way FST can generate the output [buku∼buku] easily. On the other hand, it is an open question as to how to do morphological analysis with 2-way FSTs to get the inverse relation [buku∼buku]→/buku/. A second, more practical, area of research is the integration of 2-way FSTs into natural language processing. This obviously has many aspects. A first step may be the integration of 2-way FSTs into existing platforms such as xfst (Beesley and Karttunen, 2003), foma (Hulden, 2009b), OpenFst (Allauzen et al., 2007), and Pynini (Gorman, 2016). Third, it is theoretically interesting that within morpho-phonology, only reduplication requires the bidirectional power of 2-way FSTs. The bulk of morphology and phonology can be modeled with non-deterministic 1-way finite-state transducers (Beesley and Karttunen, 2003; Jardine, 2016) or subclasses of them (Chandlee, 2017). As a copying process, reduplication requires more than just 1-way finite-state technology. This may be a sign that it is of a different nature than the rest of morpho-phonology (Inkelas and Zoll, 2005; Urbanczyk, 2017). It is an open question if 2-way FSTs can likewise be used to model copying in other areas of natural language, including syntactic copying (Kobele, 2006).
Fourth, in the same way that Chandlee (2014, 2017) and Chandlee et al. (2015) have studied subclasses of 1-way FSTs and shown how they map to subclasses of morpho-phonology, we are currently investigating what proper subclasses of 2-way FSTs can be designed in order to make a tighter fit with reduplicative typology. This would open doors not only to better understanding the computational properties of reduplication, but also to developing learning algorithms for reduplication. As of now, we hypothesize that a large majority of reduplicative processes fall under a subclass of 2-way FSTs (that we have discovered) based on a 2-way extension of the Output-Strictly Local subclass of 1-way FSTs (Chandlee et al., 2015). For more discussion of this subclass for reduplication and its learnability, see Dolatian and Heinz (2018).
In sum, the present study is the initial step in formalizing the wide typology of reduplicative processes into mathematically sound, yet expressively adequate, formal-language theoretic terms. Future work will include incorporating this technology into existing platforms and NLP systems, and further bridging the gaps between computational and theoretical morpho-phonology.