Recurrent Inference in Text Editing

In neural text editing, prevalent sequence-to-sequence based approaches directly map the unedited text either to the edited text or to the editing operations. Their performance is degraded by the limited source text encoding and by long, varying decoding steps. To address this problem, we propose a new inference method, Recurrence, that iteratively performs editing actions, significantly narrowing the problem space. In each iteration, Recurrence encodes the partially edited text, decodes the latent representation, generates a short, fixed-length action, and applies the action to complete a single edit. For a comprehensive comparison, we introduce three types of text editing tasks: Arithmetic Operators Restoration (AOR), Arithmetic Equation Simplification (AES), and Arithmetic Equation Correction (AEC). Extensive experiments on these tasks with varying difficulties demonstrate that Recurrence achieves improvements over conventional inference methods.


Introduction
For text editing, the sequence-to-sequence (seq2seq) framework has been applied to text simplification (Narayan and Gardent, 2014; Dong et al., 2019), punctuation restoration (Tilk and Alumäe, 2016; Kim, 2019), grammatical error correction (Ge et al., 2018; Lichtarge et al., 2018), machine translation post-editing (Libovický et al., 2016; Bérard et al., 2017), and so on. We observe that current inference methods can be roughly grouped into two categories: End-to-end (End2end) (Nisioi et al., 2017; See et al., 2017; Tan et al., 2017; Junczys-Dowmunt et al., 2018) and Tagging (Filippova et al., 2015; Che et al., 2016; Libovický et al., 2016; Wang et al., 2017; Alva-Manchego et al., 2017; Kim, 2019). For models from both categories, the encoders extract and encode information from the source text sequence. Yet, the goal of the decoders differs between End2end and Tagging. Upon receiving the encoder's hidden states that comprise the source text information, the decoder of End2end directly decodes the hidden states and generates the completely edited target text sequence. The decoder of Tagging, in contrast, produces a sequence of editing operations, such as deletion and insertion, that is later applied to the source text to yield the edited text via a realization step (Malmi et al., 2019). The mechanisms of End2end and Tagging are illustrated in Figure 1.
However, both End2end and Tagging are problematic because as decoding progresses, the divergence between the partially edited text and the original text grows, rendering the encoder hidden states less and less helpful for decoding the edited text or editing operations toward the end of the editing process; and as the number of decoding steps increases with edited text length, decoding the completely edited text or the full editing operation sequence becomes more and more demanding.
To tackle the aforementioned issues, we propose a recurrent inference method, Recurrence, for text editing with the encoder-decoder framework. Recurrence consists of two components as illustrated in Figure 1: (i) an encoder-decoder model, namely the programmer; (ii) an interpreter. For a given source sequence, the programmer determines an editing action that consists of an editing operation with the tokens it needs and the position in the source sequence at which to apply the operation. After the interpreter executes the editing action, the partially edited text is again fed to the programmer to determine the next appropriate editing action. This process repeats until the programmer decides that no further editing is needed.
Intuitively, Recurrence is advantageous because (i) as a novel recurrent inference process, it is not constrained by model structures and is generally applicable; (ii) the programmer only produces one single editing step, easing the learning difficulty; (iii) the encoder hidden states are updated for each decoding step, providing faithful latent representations; (iv) the decoder outputs an editing action of fixed sequence length, alleviating the problem caused by long decoding steps. Empirically, through three text editing tasks, namely Arithmetic Operators Restoration (AOR), Arithmetic Equation Simplification (AES) and Arithmetic Equation Correction (AEC), we show that Recurrence is data-efficient and more resilient to the text sequence length and the vocabulary size.
Our contributions are as follows: (1) we demonstrate that many text editing tasks can be solved recurrently through multiple inference steps; (2) we propose a novel recurrent inference method, Recurrence, for text editing that breaks an editing task down into iterations of editing actions; (3) we design three easily reproducible, proof-of-concept text editing tasks, AOR, AES and AEC; (4) we show that Recurrence outperforms End2end and Tagging in all three text editing tasks and is (i) less sensitive to longer sequences; (ii) less sensitive to larger vocabulary sizes; (iii) less data-hungry when achieving superior or competitive performance.
The code for the three inference methods, text editing tasks, data generation, and experiments in this work is available at: https://github.com/ShiningLab/Recurrent-Text-Editing.

Related Work
Text Editing is a Natural Language Processing (NLP) task in which systems change texts by inserting, deleting, and rephrasing words to meet certain needs. According to the length relationship between input and output texts, we group text editing tasks into three types: short-to-long, long-to-short, and mixed. End-to-end is one of the early methods to perform text editing by casting the job as seq2seq text generation. Without complicated preparation and subsequent processing, End2end has been proven to accomplish text editing well in all three types (Tilk and Alumäe, 2016; Nisioi et al., 2017; See et al., 2017; Tan et al., 2017; Junczys-Dowmunt et al., 2018). Yet, conventional seq2seq-based approaches are well-known for their drawbacks, including dependency on large amounts of data, unexplainable processes, and uncontrollable outcomes (Wiseman et al., 2018). When texts do not need a complete modification, there are more appropriate methods than learning a direct mapping from unedited texts to edited texts.
Tagging solves text editing in two steps instead. It first employs a seq2seq framework to produce tag sequences and then edits input texts according to the tag sequences (the "realization" step) (Malmi et al., 2019). Tagging assigns the tag KEEP to words that do not need to be changed, so it does not need to learn a copy mechanism. Some have reported that Tagging is better than End2end in short-to-long (Che et al., 2016; Kim, 2019), long-to-short (Filippova et al., 2015; Alva-Manchego et al., 2017; Wang et al., 2017), and mixed editing (Libovický et al., 2016; Bérard et al., 2017; Malmi et al., 2019). One notable member of the Tagging family is Neural Programmer-Interpreter (NPI), a recurrent and compositional neural network (Reed and de Freitas, 2016). NPI is adopted in text editing to predict tags, such as KEEP, DELETE, and INSERT, and to execute operations simultaneously during decoding. NPI-based methods have achieved state-of-the-art results in long-to-short (Dong et al., 2019; Gu et al., 2019) and mixed editing (Vu and Haffari, 2018). Nevertheless, like other Tagging methods, NPI's encoder hidden states are not updated during editing. Its decoder considers operations and executions from previous time steps to predict the current operation, which puts massive pressure on the decoder (Hochreiter, 1998; Bahdanau et al., 2015; Cho et al., 2014). Also, Tagging in general suffers from a performance decline caused by a large vocabulary that combines tags and words, or by too many decoding steps to assign tags.
To resolve the aforementioned problems with Tagging, in Recurrence, we update the encoder hidden states iteratively and free the interpreter from the decoder to complete text editing in several program-interpret iterations (recurrent inference). NPI belongs to neural program induction (Devlin et al., 2017), but Recurrence is part of neural program synthesis (Ellis et al., 2019). Consequently, Recurrence always follows the latest hidden representation of its input text rather than a static context matrix and only needs to decode an editing action of a fixed length in each iteration.
Figure 2: Recurrence inference for text editing, illustrated on a number ordering task. The number sequence [0, 2, 1, 4, 3, 5] is edited to [0, 1, 2, 3, 4, 5] via action $a^{(1)}$ = [<swap>, pos 1], which instructs the interpreter to swap numbers 1 and 2, and action $a^{(2)}$ = [<swap>, pos 3], which instructs the interpreter to swap numbers 3 and 4, imitating the bubble sort algorithm. Finally, the interpreter halts inference and outputs the completely edited sequence $y^{(c)}$ after receiving the termination action $a^{(3)}$ = [<done>, <done>].
Multi-Step Learning solves a problem in several steps. Recent work in text editing prefers multi-step learning, especially for long-to-short (Narayan and Gardent, 2014; Zhang and Lapata, 2017) and mixed editing (Ge et al., 2018; Lichtarge et al., 2018). For example, Tagging can also be regarded as two-step learning. However, these studies usually edit texts incrementally through a multi-round seq2seq inference. To the best of our knowledge, Recurrence is the first inference method that divides a text editing task into multiple independent sub-tasks and completes them recurrently.

Method Overview
Recurrence breaks the text editing task down into iterations of editing actions, and each editing action is determined from the hidden representation of the partially edited sequence. Conceptually, it performs a predefined underlying iterative algorithm that is designed to achieve some text editing goal. There are two components in Recurrence: the programmer and the interpreter. Given a source sequence $x = x_1, \cdots, x_{|x|}$, the programmer determines a single editing action, $a^{(1)}$, to be applied to $x$. Then, the interpreter executes the action $a^{(1)}$ on $x$ and produces the partially edited sequence with one edit, $y^{(1)}$. Next, $y^{(1)}$ is fed to the programmer to determine the next action $a^{(2)}$. This process continues until the programmer determines that the text is fully edited and outputs a termination action to stop further editing. The inference also ends if the number of iterations reaches a predefined limit. Finally, the interpreter outputs the completely edited sequence $y^{(\text{complete})}$. This recurrent editing process is illustrated with an example from the number ordering task in Figure 2.
The hypothesis is that it is easier to let a model learn a single editing step than the whole mapping between original and edited sequences. Also, being able to observe the latest text status leads to a more accurate input representation. Furthermore, Recurrence is explainable in the sense that not only can we understand the intention of each editing step taken by the model, but we can also actively participate in designing the editing procedure.
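The recurrent loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `rule_based_programmer` is a hypothetical hand-written stand-in for the trained encoder-decoder programmer, positions are 0-indexed, and the names `recurrent_inference` and `max_iters` are ours rather than taken from the released code.

```python
from typing import Callable, List, Tuple

DONE = "<done>"
Action = Tuple[object, object]  # (operation, position)


def rule_based_programmer(seq: List[int]) -> Action:
    """Hypothetical stand-in for the trained programmer.

    It returns a swap at the leftmost adjacent out-of-order pair,
    imitating one step of bubble sort, or the termination action.
    """
    for i in range(len(seq) - 1):
        if seq[i] > seq[i + 1]:
            return ("<swap>", i)
    return (DONE, DONE)


def interpreter(seq: List[int], action: Action) -> List[int]:
    """Parameter-free execution of a single swap action."""
    _, pos = action
    out = list(seq)
    out[pos], out[pos + 1] = out[pos + 1], out[pos]
    return out


def recurrent_inference(
    x: List[int],
    programmer: Callable[[List[int]], Action],
    max_iters: int = 32,
) -> Tuple[List[int], List[Action]]:
    """Alternate programmer and interpreter until <done> or the limit."""
    y = list(x)
    actions: List[Action] = []
    for _ in range(max_iters):
        action = programmer(y)  # re-read the partially edited text
        actions.append(action)
        if action[0] == DONE:
            break
        y = interpreter(y, action)
    return y, actions


edited, trace = recurrent_inference([0, 2, 1, 4, 3, 5], rule_based_programmer)
```

On the sequence from Figure 2, the trace consists of two swap actions followed by the termination action. The point of the sketch is that the programmer always sees the current sequence, never the original one.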

Programmer
Broadly speaking, the programmer determines the action for a given input in accordance with the underlying algorithm that the programmer is trained to mimic. In the programmer, the encoder extracts relevant information from an input text sequence x and then the decoder decides a single step of action that should be applied to x. The programmer can be any model that is able to produce editing actions based on textual information. In our experiments, the programmer is a seq2seq model with an encoder-decoder architecture.

Editing Actions
An editing action contains (i) the type of editing operation, (ii) the position the editing occurs, and (iii) a text symbol.
Formally, the set of editing actions is defined by $A := \{a = (e, p, s) \mid \forall e \in E, p \in P, s \in S\}$, where $E$ is the set of all operations, $P$ is the set of all positions, and $S$ is the set of symbols. The definitions of $E$, $P$ and $S$ are determined by the specific text editing task and the underlying text editing algorithm. For example, each $p \in P$ contains a single position or multiple positions (i.e., a tuple of position indices) depending on the operation. Also, if an editing task contains only a single type of operation, then the operation can be omitted. Some operations, such as deletion, do not need a symbol input, so the symbol component can also be omitted. DONE is required to be a member of each of $E$, $P$, and $S$ to indicate termination.
Editing actions allow the design of the editing order. Given $a^{(1)}, \cdots, a^{(n)}$, the position sequence $p^{(1)} \in a^{(1)}, \cdots, p^{(n)} \in a^{(n)}$ determines the editing order. This could be beneficial since empirical results have shown that ordering matters for text generation (Ford et al., 2018). For the sake of simplicity, in our experiments, we choose to arrange positions across actions in increasing order, editing a sequence from left to right.
Due to the liberty given by the definition of the action, we believe Recurrence can be applied to a much broader field of applications. In the scope of this paper, we are only concerned with text editing.

Interpreter
The interpreter is a parameter-free function that executes the editing action produced by the programmer. Specifically, the interpreter first checks whether the action is the termination action. If so, the interpreter halts inference and directly outputs its input sequence as the completely edited text, $y^{(\text{complete})}$. Otherwise, the interpreter applies the received action to its input sequence and produces a partially edited sequence. Then, Recurrence continues by feeding the partially edited sequence into the programmer to determine the next editing action.
It is possible for the programmer to output illegal actions that do not follow the predefined action template (e.g., actions missing a position component), especially when the programmer is not fully trained. Therefore, the interpreter checks whether an action is valid and skips invalid actions by returning the input sequence unchanged.
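As a minimal sketch of this behaviour, the function below executes one action of the form (e, p, s) and returns the input unchanged when the action is malformed. The operation tokens `<insert>`, `<delete>` and `<substitute>`, the helper `is_valid`, and the `(sequence, halted)` return convention are assumptions made for illustration; the released implementation may organise this differently.

```python
from typing import List, Sequence, Tuple

DONE = "<done>"
OPERATIONS = {"<insert>", "<delete>", "<substitute>"}  # assumed token names


def is_valid(seq: Sequence[str], action: Sequence[object]) -> bool:
    """Check that an action follows the (operation, position, symbol) template."""
    if len(action) != 3:
        return False
    op, pos, _ = action
    if op not in OPERATIONS:
        return False
    if not isinstance(pos, int) or isinstance(pos, bool):
        return False
    if op == "<insert>":
        return 0 <= pos <= len(seq)  # inserting at the end is allowed
    return 0 <= pos < len(seq)


def interpret(seq: Sequence[str], action: Sequence[object]) -> Tuple[List[str], bool]:
    """Execute one action; return (sequence, halted)."""
    # Termination action: every component is DONE.
    if len(action) > 0 and all(a == DONE for a in action):
        return list(seq), True
    # Invalid actions are skipped by returning the input as it is.
    if not is_valid(seq, action):
        return list(seq), False
    op, pos, sym = action
    out = list(seq)
    if op == "<insert>":
        out.insert(pos, sym)  # insert sym before position pos
    elif op == "<delete>":
        del out[pos]  # the symbol slot is ignored for deletion
    else:
        out[pos] = sym
    return out, False
```

Returning the unchanged sequence for an invalid action means the programmer is simply queried again on the same input in the next iteration, so the iteration limit is what guarantees that inference ends.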

Offline Training
Training text editing models requires pairs of source sequence $x$ and target sequence $y$, but different inference methods employ different generation algorithms to produce appropriate target sequences to form suitable training pairs. For the conventional inference methods, End2end maps unedited text sequences to target text sequences directly, and Tagging maps unedited text sequences to target tag sequences before realizing the target text sequences. Hence, for the training data, the source sequences are the original sequences, while the target sequences are edited text sequences for End2end and editing operation sequences for Tagging. In our experiments, we name the training mode used by the conventional methods offline training.

Online Training
To train the programmer, we compute all intermediate actions $a^{(1)}, \cdots, a^{(n)}$ that are required to edit input $x$ to target $y^{(\text{complete})}$. Applying these editing actions, we obtain the partially edited sequences $y^{(1)} = x, y^{(2)}, \cdots, y^{(n)} = y^{(\text{complete})}$. After that, the training list of pairs for the programmer is $(y^{(1)}, a^{(1)}), (y^{(2)}, a^{(2)}), \cdots, (y^{(n)}, a^{(n)})$, where $a^{(n)}$ is the termination action. We uniformly sample one source-target pair from this list as the training data instance. Because the selected training pair for each source sequence $x$ varies during training, we name this training mode online training. For the thoroughness of experiments, we examine the three inference methods with both training modes. In the training phase, intermediate training instances are exposed to End2end Online and Tagging Online. Only the immediate editing action, $(y^{(1)}, a^{(1)})$, is fed to Recurrence Offline.
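The construction of the training list can be sketched for an insertion-only task such as AOR, where an action is a (position, symbol) pair. The function names `aor_trajectory`, `sample_online_pair` and `offline_pair` are hypothetical, and the sketch assumes that actions are ordered from left to right.

```python
import random
from typing import List, Sequence, Tuple

DONE = "<done>"
Pair = Tuple[List[str], Tuple[object, object]]


def aor_trajectory(x: Sequence[str], y: Sequence[str]) -> List[Pair]:
    """List of (partially edited sequence, next action) pairs from x to y.

    Assumes y can be obtained from x by insertions only.
    """
    state = list(x)
    pairs: List[Pair] = []
    for pos, tok in enumerate(y):
        # A mismatch at pos means tok still has to be inserted there.
        if pos >= len(state) or state[pos] != tok:
            pairs.append((list(state), (pos, tok)))
            state.insert(pos, tok)
    pairs.append((list(state), (DONE, DONE)))  # termination action
    return pairs


def sample_online_pair(x: Sequence[str], y: Sequence[str], rng: random.Random) -> Pair:
    """Online training: uniformly sample one pair from the trajectory."""
    return rng.choice(aor_trajectory(x, y))


def offline_pair(x: Sequence[str], y: Sequence[str]) -> Pair:
    """Offline training for Recurrence: only the first pair is used."""
    return aor_trajectory(x, y)[0]


x = ["2", "3", "4", "10"]
y = ["2", "*", "3", "+", "4", "==", "10"]
pairs = aor_trajectory(x, y)
```

For this example the list has four pairs: three insertions and one termination action whose source is the completely edited equation. Calling `sample_online_pair` once per epoch gives the programmer a different intermediate state of the same example over the course of training.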

Tasks
An increasing number of studies use synthetic benchmark tasks to examine ideas before extending them to open-domain natural language data (Zaremba and Sutskever, 2014; Lake and Baroni, 2018; Nangia and Bowman, 2018; Lample and Charton, 2020). Following the fruitful results of previous work, we aim to evaluate the three inference methods in the domain of arithmetic problems (Hosseini et al., 2014; Roy and Roth, 2015; Ling et al., 2017), which can be treated as test-beds for text editing. We introduce three tasks, namely AOR, AES, and AEC, corresponding to the three types of text editing tasks: short-to-long, long-to-short, and mixed. Being able to control the aspects of the datasets allows us to compare the characteristics of the three inference methods more thoroughly and to analyze the situations in which each method is appropriate.

Arithmetic Equation
Our arithmetic equation consists of integer numbers $\mathcal{N} \in \mathbb{Z}_{\geq 2}$, an equal sign ("=="), and operators $O$ = {"+", "-", "*", "/"}. We use these symbols so that the Python built-in function eval() can be applied. For convenience, we restrict the right-hand side of the equation to a number. The equation holds if the value of the left-hand side equals the number on the right-hand side. Operators in $O$ are placed between two numbers, and the subtraction operator "-" can also be put to the left of any single number. We consider equations as sequences of mathematical symbols (Saxton et al., 2019) instead of tree structures (Lample and Charton, 2020). We describe an arithmetic equation dataset from three aspects: (1) $N = |\mathcal{N}|$ defines the number of unique integers; (2) $L \in \mathbb{Z}^{*}$ defines the number of integers in an equation; (3) $D \in \mathbb{Z}^{*}$ defines the number of unique equations.
Note that since we only consider binary operations, the sequence length of a valid arithmetic expression is always $2L$ or $2L - 1$, depending on whether there is a subtraction operator before the first number. Intuitively, it is reasonable to assume that the greater $N$ and $L$ become, the harder the task gets, whereas the larger $D$ is, the easier the task becomes.
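Since the equations are written with symbols that Python can evaluate, checking whether a token sequence is a valid equation can be sketched with eval(). The helper names `is_token`, `equation_holds` and `allowed_lengths` are assumptions for illustration; the token whitelist and the empty builtins dictionary are precautions we add, not something the paper describes.

```python
from typing import List, Set

OPERATORS = {"+", "-", "*", "/"}
EQUAL = "=="


def is_token(tok: str) -> bool:
    """Accept only integers, the four operators and the equal sign."""
    return tok.isdigit() or tok in OPERATORS or tok == EQUAL


def equation_holds(tokens: List[str]) -> bool:
    """True if the token sequence is a well-formed equation that holds."""
    if not tokens or not all(is_token(t) for t in tokens):
        return False
    try:
        # Evaluate with no builtins available; tokens are whitelisted above.
        return eval(" ".join(tokens), {"__builtins__": {}}, {}) is True
    except (SyntaxError, ZeroDivisionError, TypeError):
        return False


def allowed_lengths(num_integers: int) -> Set[int]:
    """Sequence lengths 2L - 1 and 2L for an equation with L integers."""
    return {2 * num_integers - 1, 2 * num_integers}


equation = ["2", "*", "3", "+", "4", "==", "10"]
ok = equation_holds(equation) and len(equation) in allowed_lengths(4)
```

Malformed sequences, such as two adjacent integers, raise a SyntaxError inside eval() and are reported as not holding.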

Arithmetic Operators Restoration
The goal of AOR is to convert a sequence of integer numbers into a valid arithmetic equation. For a given source sequence of integer numbers, $x \in \mathcal{N}^{L}$, a model for AOR inserts appropriate operators from $O$ between the first $L - 1$ integers in $x$ and inserts an equal sign ("==") before the $L$-th element in $x$ so that the resulting arithmetic expression sequence (target sequence) is valid. Each integer sequence potentially corresponds to several valid arithmetic equations. Thus, AOR is one-to-many learning. To obtain integer sequences for AOR, we first generate valid arithmetic equations and then remove all the operators and equal signs (see Table 1).

Arithmetic Equation Simplification
Here, we involve two more mathematical symbols ("(", ")"). In an equation, parentheses help to group parts of an expression and indicate the order of precedence. In this task, we aim to simplify equations by calculating the parts in parentheses and removing the parentheses from the equations. An equation that has no parentheses is already in its simplest form, so there is no need to change it. We generate complicated versions of a simplified equation by randomly replacing some integers (including the one on the right-hand side) with their equivalent bracketed expressions. Since these variants share the same simplified form, AES is many-to-one learning (see Table 1).
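The generation of complicated variants can be sketched as follows. The paper does not specify how the bracketed expressions are drawn, so this sketch assumes a single binary operation over a given integer range; the names `bracketed`, `complicate` and the replacement probability `prob` are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def bracketed(value: int, numbers: Sequence[int], rng: random.Random) -> Optional[List[str]]:
    """Return a parenthesised binary expression equal to value, if one exists."""
    candidates: List[List[str]] = []
    for a in numbers:
        for b in numbers:
            if a + b == value:
                candidates.append(["(", str(a), "+", str(b), ")"])
            if a - b == value:
                candidates.append(["(", str(a), "-", str(b), ")"])
            if a * b == value:
                candidates.append(["(", str(a), "*", str(b), ")"])
            if b != 0 and a % b == 0 and a // b == value:
                candidates.append(["(", str(a), "/", str(b), ")"])
    return rng.choice(candidates) if candidates else None


def complicate(
    equation: Sequence[str],
    numbers: Sequence[int],
    rng: random.Random,
    prob: float = 0.5,
) -> List[str]:
    """Randomly replace integers by equivalent bracketed expressions."""
    out: List[str] = []
    for tok in equation:
        expr = None
        if tok.isdigit() and rng.random() < prob:
            expr = bracketed(int(tok), numbers, rng)
        # Keep the original token when it is not replaced.
        out.extend(expr if expr is not None else [tok])
    return out


simple = ["2", "*", "3", "+", "4", "==", "10"]
variant = complicate(simple, list(range(2, 12)), random.Random(0))
```

Because every replacement is wrapped in parentheses, the value of each side is preserved, so all variants produced from one equation map back to the same simplified target.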

Arithmetic Equation Correction
AEC is a more comprehensive text editing task in that a model needs to detect and correct possible mistakes. To generate mistakes, we corrupt a valid equation by deleting, substituting, or inserting random tokens at random positions. We do not touch the right-hand side integer, which guarantees that the corrected left-hand side (including "==") evaluates to the same value. We fix the maximum number of errors to three, regardless of the values of $N$, $L$, and $D$. No change is made if there is no error. We generate many wrong equations based on one correct equation. Meanwhile, a wrong equation can be modified into multiple correct equations. Hence, AEC is many-to-many learning (see Table 1).
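A minimal sketch of this corruption procedure is given below. The function name `corrupt`, the uniform choice among the three error types, and the uniform draw of the number of errors are assumptions; only the error types, the untouched right-hand side, and the cap of three errors come from the description above.

```python
import random
from typing import List, Sequence


def corrupt(
    equation: Sequence[str],
    vocab: Sequence[str],
    rng: random.Random,
    max_errors: int = 3,
) -> List[str]:
    """Inject up to max_errors random edits to the left of the final integer."""
    eq_idx = list(equation).index("==")
    # The editable part includes "=="; the right-hand side integer is kept.
    lhs = list(equation[: eq_idx + 1])
    rhs = list(equation[eq_idx + 1:])
    for _ in range(rng.randint(0, max_errors)):
        kind = rng.choice(["delete", "substitute", "insert"])
        if kind == "insert":
            lhs.insert(rng.randint(0, len(lhs)), rng.choice(vocab))
        elif lhs:
            pos = rng.randrange(len(lhs))
            if kind == "delete":
                del lhs[pos]
            else:
                lhs[pos] = rng.choice(vocab)
    return lhs + rhs


vocab = [str(n) for n in range(2, 12)] + ["+", "-", "*", "/", "=="]
correct = ["2", "*", "3", "+", "4", "==", "10"]
wrong = corrupt(correct, vocab, random.Random(0))
```

A substitution may draw the token that is already in place, so the number of effective errors is at most the number drawn; a draw of zero leaves the equation unchanged.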

Experiments
We test Recurrence in comparison with End2end and Tagging across AOR, AES, and AEC. We describe the results conditioned on specific $N$, $L$, and $D$. Later, we analyze the impact of each of them in Section 6.
Data. In all tasks, the dataset is divided into three subsets: 70% for training, 15% for validation, and 15% for testing. For AES (many-to-one learning) and AEC (many-to-many learning), we feed the training set to a data generator in every epoch to expose all the variants of targets as input sequences (see Section 4). For the sake of fairness, we examine the three methods in both online and offline training modes. To train End2end Online and Tagging Online, in each epoch, we keep the targets but uniformly pick a partially edited $y^{(i)}$ to replace the original input $x$ as the source sequence. The target equations can be used to train End2end directly. By contrast, further pre-processing is necessary for Tagging and Recurrence. Training targets for Tagging are tag sequences, while those for Recurrence are editing actions.
Models. After testing Transformer (Vaswani et al., 2017) and a range of modern RNNs (Mikolov et al., 2010; LeCun et al., 2015), we focus on the overall best-performing architecture: a bidirectional LSTM (Schuster and Paliwal, 1997; Hochreiter and Schmidhuber, 1997) with an attention mechanism (Luong et al., 2015). Throughout all the experiments, the three inference methods share the same model structure with $d_{\text{model}} = 512$, $d_{\text{embedding}} = 512$, $n_{\text{layers}} = 1$, $r_{\text{learning}} = 10^{-5}$, $r_{\text{teacher forcing}} = 0.5$, and $r_{\text{dropout}} = 0.5$ (Srivastava et al., 2014). Parameters are uniformly initialized. To prevent uncontrolled interference, we train all models from scratch instead of pre-training. We use the Adam optimizer (Kingma and Ba, 2015) with an L2 gradient clipping of 5.0 (Pascanu et al., 2013).
Evaluation. We evaluate methods by three metrics: token accuracy, sequence accuracy, and equation accuracy. Token accuracy is the number of correct predictions at the token level divided by the target sequence length, averaged over the test set. Sequence accuracy is the number of correct predictions at the sequence level divided by the test size. Equation accuracy is the number of predicted equations that hold divided by the test size; it emphasizes whether an equation holds rather than whether an equation is the same as the target. We evaluate the performance via equation accuracy for AOR (one-to-many), sequence accuracy for AES (many-to-one), and both equation accuracy and sequence accuracy for AEC (many-to-many). Sequence accuracy is accompanied by token accuracy for additional reference.
Training. We train on a single GeForce RTX Titan with a batch size of 256. The last batch is dropped if it does not contain 256 samples. To ensure convergence, we adopt early stopping (Prechelt, 1998) with a patience of 512 epochs.
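The three metrics can be sketched as below. The position-wise alignment used for token accuracy and the eval()-based check in `holds` are assumptions made for illustration; the function names are ours.

```python
from typing import List, Sequence


def holds(tokens: Sequence[str]) -> bool:
    """True if the predicted token sequence evaluates to a true equation."""
    try:
        return eval(" ".join(tokens), {"__builtins__": {}}, {}) is True
    except Exception:
        return False


def token_accuracy(preds: List[List[str]], targets: List[List[str]]) -> float:
    """Position-wise matches over target length, averaged over the test set."""
    total = 0.0
    for pred, target in zip(preds, targets):
        correct = sum(1 for p, t in zip(pred, target) if p == t)
        total += correct / len(target)
    return total / len(targets)


def sequence_accuracy(preds: List[List[str]], targets: List[List[str]]) -> float:
    """Fraction of predictions identical to their target."""
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)


def equation_accuracy(preds: List[List[str]]) -> float:
    """Fraction of predictions that are equations which hold."""
    return sum(holds(p) for p in preds) / len(preds)


targets = [["2", "+", "3", "==", "5"]] * 3
preds = [
    ["2", "+", "3", "==", "5"],  # identical to the target
    ["3", "+", "2", "==", "5"],  # different sequence, but the equation holds
    ["2", "*", "3", "==", "5"],  # wrong operator, the equation does not hold
]
```

The second prediction shows why equation accuracy suits the one-to-many AOR task: it counts as correct for equation accuracy but not for sequence accuracy.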

Arithmetic Operators Restoration
Data. Experiments are performed on a dataset with $N = 10$, $L = 5$, and $D$ = 10K. For Tagging, the tags are KEEP and INSERT_TOKEN_AOR, where TOKEN_AOR stands for the inserted token. For Recurrence, the set of editing actions is defined as $A_{\text{AOR}} := \{a = (e, p, s) \mid \forall e \in E, p \in P, s \in S\}$, where $E$ is an empty set, since there is only one operation, insertion, and is thus omitted; $P := \{p \mid p \in \{0, \cdots, |x|\}\}$; and $S$ = TOKEN_AOR. For a given action $a = (p, s)$, the interpreter inserts $s$ before $x_p$ (see Table 2).
Results. As shown in Table 3, Recurrence Online outperforms End2end Online by 29.20% and Tagging Online by 7.13%, achieving an equation accuracy of 58.53%. Hence, Recurrence Online has the best performance. Note that online training is critical for Recurrence to achieve good performance, as Recurrence Online outperforms Recurrence Offline by 27.40%, whilst online training only improves the performance of Tagging by 0.87% and End2end by 2.86%.
Table 3: Evaluation results of three inference methods on AOR, AES, and AEC with specific $N$, $L$, and $D$.

Arithmetic Equation Simplification
Data. We first experiment with $N = 10$, $L = 5$, and $D$ = 10K, but all methods can reach a near-perfect sequence accuracy (see Figure 3). Therefore, we adjust $N$ from 10 to 100 to make the task more challenging. A target sequence to train Tagging is a sequence of tags consisting of KEEP, DELETE, and SUBSTITUTE_TOKEN_AES, where TOKEN_AES $\in \mathcal{N}$. For Recurrence, target editing actions are $A_{\text{AES}} := \{a = (e, p, s) \mid \forall e \in E, p \in P, s \in S\}$, where the default operation is substitution, so $E$ is an empty set and omitted; $P := \{p = [p_1, p_2] \mid p_i \in \{0, \cdots, |x|\}, \forall i = 1, 2\}$; and $S$ = TOKEN_AES. This editing action instructs the interpreter to replace the part between $x_{p_1}$ and $x_{p_2}$ with TOKEN_AES (see Table 2).

Arithmetic Equation Correction
Data. We use a dataset with $N = 10$, $L = 5$, and $D$ = 10K. To keep the sequence length of $a$ fixed, we repeat $p$ at $a_3$ in place of $s$ when $e$ = DELETE. During interpretation, $e$ = DELETE removes $x_p$; $e$ = SUBSTITUTE replaces $x_p$ with $s$; and $e$ = INSERT inserts $s$ before $x_p$ (see Table 2).
Results. Recurrence Online attains higher scores than the other two methods, with a sequence accuracy of 57.47% and an equation accuracy of 58.27%. The performance edge of Recurrence is not obvious due to the task setting. In Section 6, we adjust the task to distinguish the performance of each method more easily. When applying online training, we observe improvements in all three methods. In particular, Recurrence Online takes around 50K fewer epochs than Recurrence Offline and attains a better performance.

Analysis
As shown in Section 5, Recurrence outperforms End2end and Tagging in all three tasks in our experiment settings. In this section, we explore the limits of Recurrence by running experiments with varying values of $N$, $L$ and $D$, so as to determine in which scenarios Recurrence performs well (see Figure 4).
The Impact of N. We conduct experiments with $L = 5$, $D$ = 50K, and $N$ increasing from 10 to 50 with an interval of 10 for AOR; $L = 5$, $D$ = 10K, and $N$ increasing from 100 to 300 with an interval of 50 for AES; and $L = 5$, $D$ = 10K, and $N$ increasing from 10 to 50 with an interval of 10 for AEC. For AOR, Recurrence Online and Tagging show similar resilience; however, Tagging Offline performs better when $N \geq 20$. For AES, Recurrence Online performs much better than Tagging and End2end (by at least 20%) when $N \leq 150$. Note that End2end performs poorly when $N \geq 100$, with End2end Offline learning hardly anything. We also observe that End2end Offline can achieve a near-perfect performance when $N = 10$. These results indicate that the performance of End2end Offline declines rapidly as $N$ increases and that it requires a much larger $D$-to-$N$ ratio to perform well. Finally, for AEC, Recurrence Online displays the most resilience and performs the best.
The Impact of L. We conduct experiments with $N = 10$, $D$ = 50K, and $L$ increasing from 5 to 9 with an interval of 1 for AOR; $N = 10$, $D$ = 50K, and $L$ increasing from 3 to 7 with an interval of 1 for AES; and settings identical to AOR for AEC. For AOR and AES, Recurrence Online and Tagging show a similar trend; however, Recurrence Online performs the best. For AEC, while Recurrence Online still outperforms Tagging, End2end Offline performs the best for $L \geq 7$ and shows more resilience. We think that when $N = 10$, the AEC task is too easy for End2end with 50K training data. Thus, we increase $N$ from 10 to 100 and find that End2end cannot gain any performance within 512 epochs (i.e., accuracy is 0%). We want to stress that when the amount of data cannot counter the increase of $L$, which is the case for AOR and AES, the performance of End2end declines faster than that of Recurrence and Tagging.
The Impact of D. We conduct experiments with $N = 10$, $L = 5$, and $D$ increasing from 10K to 50K with an interval of 10K for AOR; $N = 100$, $L = 5$, and $D$ increasing from 10K to 50K with an interval of 10K for AES; and settings identical to AOR for AEC. All models benefit from the increase of $D$, as expected. However, it is clear that Recurrence Online is the best performing model when $D$ is small. The only exception is AEC, for which End2end has a performance trend similar to that of Recurrence. As discussed before, this is likely because End2end performs well with small $N$.
The Impact of Online Training. When comparing the performance of online and offline training, online training, as expected, generally performs better than offline training for End2end and Tagging, with only a few exceptions. Note that online training is not part of the standard training procedure for End2end and Tagging; however, we use online training with End2end and Tagging for the sake of a fair comparison. Therefore, for End2end and Tagging, online training acts like a data augmentation technique, providing more data points for training. Surprisingly, offline training also allows Recurrence to gain some editing ability, at times better than End2end and Tagging. We believe that for text editing tasks with very localized editing actions, such as AES, showing the immediate editing actions is enough for the model to generalize to proper editing actions. In other words, when the editing actions are less sequentially dependent, even offline training enables Recurrence to achieve better performance than End2end and Tagging. This supports our intuition that letting the programmer produce one single editing step reduces the learning difficulty.
The Impact of Ordering. In early experiments, we find that the programmer cannot converge if the data guide it to edit a sequence in a random order (a mixture of both left-to-right and right-to-left). Hence, we think ordering matters not only for text generation (Ford et al., 2018) but also for Recurrence in text editing. One of our assumptions is that random ordering may assign various actions to the same text state and thus cause confusion in the list of actions used to edit the input text $x$ to the output text $y$. When there are conflicting sample pairs in the training data set, the model cannot easily converge. We leave this problem for future work.
To summarize our findings, under settings with moderate or large $N$ and $L$, End2end performs much worse than Tagging and Recurrence with limited data. Tagging performs slightly better than Recurrence when $N$ gets larger with fixed $D$ and $L$ in AOR (short-to-long). However, Tagging performs worse than Recurrence in all other cases. Therefore, we conclude that Recurrence is more data-efficient and performs better overall than End2end and Tagging in most situations, especially in AES (long-to-short).

Conclusions and Future Work
We propose a recurrent inference method, Recurrence, that edits a given text sequence iteratively such that in each iteration the programmer determines a single editing action and the interpreter executes the action. Our method outperforms the other two inference methods, End2end and Tagging, in the three arithmetic equation editing tasks we introduced. For future work, we plan to apply Recurrence to open-domain natural language data and investigate how to relax its need for intermediate editing steps as extra supervision signals. We also wish to experiment with applying pointer attention to replace the position component in actions.