Fact-based Text Editing

We propose a novel text editing task, referred to as fact-based text editing, in which the goal is to revise a given document to better describe the facts in a knowledge base (e.g., several triples). The task is important in practice because reflecting the truth is a common requirement in text editing. First, we propose a method for automatically generating a dataset for research on fact-based text editing, where each instance consists of a draft text, a revised text, and several facts represented in triples. We apply the method to two public table-to-text datasets, obtaining two new datasets consisting of 233k and 37k instances, respectively. Next, we propose a new neural network architecture for fact-based text editing, called FactEditor, which edits a draft text by referring to given facts using a buffer, a stream, and a memory. A straightforward approach to the problem would be to employ an encoder-decoder model. Our experimental results on the two datasets show that FactEditor outperforms the encoder-decoder approach in terms of fidelity and fluency. The results also show that FactEditor performs inference faster than the encoder-decoder approach.


Introduction
Automatic editing of text by computer is an important application, which can help human writers write better documents in terms of accuracy, fluency, etc. The task is easier and more practical than the automatic generation of texts from scratch and has been attracting attention recently (Yin et al., 2019). In this paper, we consider a new and specific setting of it, referred to as fact-based text editing, in which a draft text and several facts (represented in triples) are given, and the system aims to revise the text by adding missing facts and deleting unsupported facts. Table 1 gives an example of the task. To the best of our knowledge, no previous work has addressed this problem.
In text-to-text generation, given a text, the system automatically creates another text, where the new text can be a text in another language (machine translation), a summary of the original text (summarization), or a text in better form (text editing). In table-to-text generation, given a table containing facts in triples, the system automatically composes a text that describes the facts. The former is a text-to-text problem, and the latter a table-to-text problem. In comparison, fact-based text editing can be viewed as a 'text&table-to-text' problem.

Revised text: Baymax was created by American creators Duncan Rouleau and Steven T. Seagle . Baymax is a character in Big Hero 6 which stars Scott Adsit .
Table 1: Example of fact-based text editing. Facts are represented in triples. The facts in green appear in both draft text and triples. The facts in orange are present in the draft text, but absent from the triples. The facts in blue do not appear in the draft text, but in the triples. The task of fact-based text editing is to edit the draft text on the basis of the triples, by deleting unsupported facts and inserting missing facts while retaining supported facts.

* The work was done when Hayate Iso was a research intern at ByteDance AI Lab.
First, we devise a method for automatically creating a dataset for fact-based text editing. Recently, several table-to-text datasets have been created and released, consisting of pairs of facts and corresponding descriptions. We leverage such data in our method. We first retrieve facts and their descriptions. Next, we take the descriptions as revised texts and automatically generate draft texts based on the facts using several rules. We build two datasets for fact-based text editing on the basis of WEBNLG (Gardent et al., 2017) and ROTOWIRE (Wiseman et al., 2017), consisting of 233k and 37k instances, respectively.
Second, we propose a model for fact-based text editing called FACTEDITOR. One could employ an encoder-decoder model to perform the task. The encoder-decoder model implicitly represents the actions for transforming the draft text into a revised text. In contrast, FACTEDITOR explicitly represents the actions for text editing, namely Keep, Drop, and Gen, which mean retention, deletion, and generation of a word, respectively. The model utilizes a buffer for storing the draft text, a stream for storing the revised text, and a memory for storing the facts. It also employs a neural network to control the entire editing process. FACTEDITOR has a lower time complexity than the encoder-decoder model, and thus it can edit a text more efficiently.
Experimental results show that FACTEDITOR outperforms the baseline model that uses an encoder-decoder for text editing in terms of fidelity and fluency, and also show that FACTEDITOR performs text editing faster than the encoder-decoder model.

Text Editing
The rise of encoder-decoder models (Cho et al., 2014; Sutskever et al., 2014), as well as the attention (Bahdanau et al., 2015; Vaswani et al., 2017) and copy mechanisms (Gu et al., 2016; Gulcehre et al., 2016), has dramatically changed the landscape, and now one can perform text editing relatively easily with an encoder-decoder model such as the Transformer, provided that a sufficient amount of data is available. For example, Li et al. (2018) introduce a deep reinforcement learning framework for paraphrasing, consisting of a generator and an evaluator. Yin et al. (2019) formalize the problem of text editing as the learning and utilization of edit representations and propose an encoder-decoder model for the task. Zhao et al. (2018) integrate paraphrasing rules with the Transformer model for text simplification. Zhao et al. (2019) propose a method for English grammar correction using a Transformer and a copy mechanism.
Another approach to text editing is to view the problem as sequential tagging instead of encoder-decoder generation. In this way, the efficiency of learning and prediction can be significantly enhanced. Vu and Haffari (2018) and  conduct automatic post-editing and text simplification on the basis of edit operations and employ the Neural Programmer-Interpreter (Reed and De Freitas, 2016) to predict the sequence of edits given a sequence of words, where the edits include KEEP, DROP, and ADD. Malmi et al. (2019) propose a sequential tagging model that assigns a tag (KEEP or DELETE) to each word in the input sequence and also decides whether to add a phrase before the word. Our proposed approach is also based on sequential tagging of actions. However, it is designed for fact-based text editing, not text-to-text generation.

Table-to-Text Generation
Table-to-text generation is the task of generating a text from structured data (Reiter and Dale, 2000; Gatt and Krahmer, 2018), for example, a text from an infobox about a term in biology in Wikipedia (Lebret et al., 2016) or a description of a restaurant from a structured representation (Novikova et al., 2017). Encoder-decoder models can also be employed in table-to-text generation, with structured data as input and generated text as output, as in Lebret et al. (2016). Puduppully et al. (2019) and Iso et al. (2019) propose utilizing an entity tracking module for document-level table-to-text generation.
One issue with table-to-text generation is that the style of generated texts can be diverse (Iso et al., 2019). Researchers have developed methods to deal with the problem using other texts as templates (Peng et al., 2019). The difference between this approach and fact-based text editing is that the former is about table-to-text generation based on other texts, while the latter is about text-to-text generation based on structured data.

Table 2 (a): Example for insertion.
$y'$: AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission that was operated by PATIENT-2 .
$\hat{x}'$: AGENT-1 served as PATIENT-3 was a crew member of the BRIDGE-1 mission .
$x'$: AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission .
The revised template $y'$ and the reference template $\hat{x}'$ share subsequences. The set of triple templates $\mathcal{T}' \setminus \hat{\mathcal{T}}'$ is {(BRIDGE-1, operator, PATIENT-2)}. Our method removes "that was operated by PATIENT-2" from the revised template $y'$ to create the draft template $x'$.

Table 2 (b): Example for deletion.
$y'$: AGENT-1 was created by BRIDGE-1 and PATIENT-2 .
$\hat{x}'$: The character of AGENT-1 , whose full name is PATIENT-1 , was created by BRIDGE-1 and PATIENT-2 .
$x'$: AGENT-1 , whose full name is PATIENT-1 , was created by BRIDGE-1 and PATIENT-2 .
The revised template $y'$ and the reference template $\hat{x}'$ share subsequences. The set of triple templates $\hat{\mathcal{T}}' \setminus \mathcal{T}'$ is {(AGENT-1, fullName, PATIENT-1)}. Our method copies "whose full name is PATIENT-1" from the reference template $\hat{x}'$ to create the draft template $x'$.

Data Creation
In this section, we describe our method of data creation for fact-based text editing. The method automatically constructs a dataset from an existing table-to-text dataset.

Data Sources
There are two benchmark datasets for table-to-text generation, WEBNLG (Gardent et al., 2017)² and ROTOWIRE (Wiseman et al., 2017)³. We create two datasets on the basis of them, referred to as WEBEDIT and ROTOEDIT, respectively. In these datasets, each instance consists of a table (structured data) and an associated text (unstructured data) describing almost the same content⁴. For each instance, we take the table as triples of facts and the associated text as a revised text, and we automatically create a draft text. The set of triples is represented as $\mathcal{T} = \{t\}$. Each triple $t$ consists of a subject, a predicate, and an object, denoted as $t = (subj, pred, obj)$. For simplicity, we refer to the nouns or noun phrases of subjects and objects simply as entities. The revised text is a sequence of words denoted as $y$. The draft text is a sequence of words denoted as $x$.
² The data is available at https://github.com/ThiagoCF05/webnlg. We utilize version 1.5.
³ We utilize the ROTOWIRE-MODIFIED data provided by Iso et al. (2019), available at https://github.com/aistairc/rotowire-modified. The authors also provide an information extractor for processing the data.
⁴ In ROTOWIRE, we discard redundant box-scores and unrelated sentences using the information extractor and heuristic rules.
Given the set of triples $\mathcal{T}$ and the revised text $y$, we aim to create a draft text $x$ such that, unlike $y$, $x$ is not in accordance with $\mathcal{T}$, and therefore text editing from $x$ to $y$ is needed.

Procedure
Our method first creates templates for all the sets of triples and revised texts and then constructs a draft text for each set of triples and revised text based on their related templates.

Creation of templates
For each instance, our method first delexicalizes the entity words in the set of triples $\mathcal{T}$ and the revised text $y$ to obtain a set of triple templates $\mathcal{T}'$ and a revised template $y'$. For example, given $\mathcal{T}$ = {(Baymax, voice, Scott Adsit)} and $y$ = "Scott Adsit does the voice for Baymax", it produces the set of triple templates $\mathcal{T}'$ = {(AGENT-1, voice, PATIENT-1)} and the revised template $y'$ = "PATIENT-1 does the voice for AGENT-1". Our method then collects all the sets of triple templates $\mathcal{T}'$ and revised templates $y'$ and retains them in a key-value store, with $y'$ as the key and $\mathcal{T}'$ as the value.
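The template-creation step can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the role assignment (AGENT for entities that occur only as subjects, PATIENT for entities that occur only as objects, BRIDGE for entities that occur as both) is an assumption consistent with the examples, and the naive string replacement and the helper names `delexicalize` and `lexicalize` are hypothetical. The inverse mapping corresponds to the lexicalization of draft templates described later in this section.

```python
from typing import Dict, FrozenSet, List, Tuple

Triple = Tuple[str, str, str]


def delexicalize(triples: List[Triple], text: str):
    """Replace entity strings with role placeholders (hypothetical helper)."""
    subjects = {s for s, _, _ in triples}
    objects = {o for _, _, o in triples}
    counters = {"AGENT": 0, "PATIENT": 0, "BRIDGE": 0}
    mapping: Dict[str, str] = {}
    for s, _, o in triples:
        for entity in (s, o):
            if entity in mapping:
                continue
            # Assumed convention: BRIDGE = subject and object, AGENT = subject only,
            # PATIENT = object only; numbered by order of first appearance.
            if entity in subjects and entity in objects:
                role = "BRIDGE"
            elif entity in subjects:
                role = "AGENT"
            else:
                role = "PATIENT"
            counters[role] += 1
            mapping[entity] = f"{role}-{counters[role]}"
    triple_templates = frozenset((mapping[s], p, mapping[o]) for s, p, o in triples)
    template = text
    # Naive substring replacement; longer entity names first so that an entity
    # contained in another one is not replaced prematurely.
    for entity in sorted(mapping, key=len, reverse=True):
        template = template.replace(entity, mapping[entity])
    return triple_templates, template, mapping


def lexicalize(template: str, mapping: Dict[str, str]) -> str:
    """Invert the mapping; longer placeholders first (e.g. AGENT-10 before AGENT-1)."""
    inverse = {placeholder: entity for entity, placeholder in mapping.items()}
    text = template
    for placeholder in sorted(inverse, key=len, reverse=True):
        text = text.replace(placeholder, inverse[placeholder])
    return text


# Key-value store: revised template y' -> set of triple templates T'.
store: Dict[str, FrozenSet[Triple]] = {}
t_tmpl, y_tmpl, mapping = delexicalize(
    [("Baymax", "voice", "Scott Adsit")], "Scott Adsit does the voice for Baymax"
)
store[y_tmpl] = t_tmpl
```

Here `y_tmpl` becomes "PATIENT-1 does the voice for AGENT-1", and `lexicalize(y_tmpl, mapping)` restores the original sentence.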

Creation of draft text
Next, our method constructs a draft text $x$ using a set of triple templates $\mathcal{T}'$ and a revised template $y'$. For simplicity, it only considers either insertion or deletion in text editing; the method can easily be extended to more complex settings. Note that the process of data creation is the reverse of that of text editing.
Given a pair of $\mathcal{T}'$ and $y'$, our method retrieves another pair, denoted as $\hat{\mathcal{T}}'$ and $\hat{x}'$, such that $y'$ and $\hat{x}'$ have the longest common subsequence. We refer to $\hat{x}'$ as a reference template. There are two possibilities: $\hat{\mathcal{T}}'$ is a subset or a superset of $\mathcal{T}'$ (we ignore the case in which they are identical). Our method then changes $y'$ into a draft template, denoted as $x'$, on the basis of the relation between $\mathcal{T}'$ and $\hat{\mathcal{T}}'$. If $\hat{\mathcal{T}}' \subsetneq \mathcal{T}'$, then the draft template $x'$ is created for insertion, and if $\hat{\mathcal{T}}' \supsetneq \mathcal{T}'$, then the draft template $x'$ is created for deletion.
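A minimal sketch of the retrieval step, assuming templates are compared as whitespace-tokenized sequences and that candidates are restricted to pairs whose triple set is a proper subset or superset of $\mathcal{T}'$ (the two cases the method considers). The function names are hypothetical, and in the usage example only (BRIDGE-1, operator, PATIENT-2) comes from the text; the other predicates are invented for illustration.

```python
from typing import Dict, FrozenSet, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence (rolling-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        cur = [0]
        for j, tok_b in enumerate(b):
            cur.append(prev[j] + 1 if tok_a == tok_b else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def retrieve_reference(
    y_tmpl: str, triples: FrozenSet[Triple], store: Dict[str, FrozenSet[Triple]]
) -> Optional[Tuple[str, FrozenSet[Triple], str]]:
    """Return (reference template, its triple set, 'insertion' or 'deletion')."""
    best, best_score = None, -1
    y_tokens = y_tmpl.split()
    for x_ref, ref_triples in store.items():
        if ref_triples < triples:      # reference describes fewer facts: insertion
            kind = "insertion"
        elif ref_triples > triples:    # reference describes more facts: deletion
            kind = "deletion"
        else:
            continue                   # identical or incomparable sets are skipped
        score = lcs_length(y_tokens, x_ref.split())
        if score > best_score:
            best, best_score = (x_ref, ref_triples, kind), score
    return best


mission = ("AGENT-1", "mission", "BRIDGE-1")         # hypothetical predicate
occupation = ("AGENT-1", "occupation", "PATIENT-3")  # hypothetical predicate
operator = ("BRIDGE-1", "operator", "PATIENT-2")
store = {
    "AGENT-1 served as PATIENT-3 was a crew member of the BRIDGE-1 mission .":
        frozenset({mission, occupation}),
    "AGENT-1 was born in PATIENT-1 .": frozenset({("AGENT-1", "birthPlace", "PATIENT-1")}),
}
y_tmpl = "AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission that was operated by PATIENT-2 ."
result = retrieve_reference(y_tmpl, frozenset({mission, occupation, operator}), store)
```

The retrieved reference corresponds to Table 2a: its triple set lacks the operator triple, so the pair is classified as an insertion case.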
For insertion, the revised template $y'$ and the reference template $\hat{x}'$ share subsequences, and the set of triples $\mathcal{T}' \setminus \hat{\mathcal{T}}'$ appears in $y'$ but not in $\hat{x}'$. Our method keeps the shared subsequences in $y'$, removes the subsequences in $y'$ about $\mathcal{T}' \setminus \hat{\mathcal{T}}'$, and copies the rest of the words in $y'$ to create the draft template $x'$. Table 2a gives an example. The shared subsequences "AGENT-1 performed as PATIENT-3 on BRIDGE-1 mission" are kept. The set of triple templates $\mathcal{T}' \setminus \hat{\mathcal{T}}'$ is {(BRIDGE-1, operator, PATIENT-2)}. The subsequence "that was operated by PATIENT-2" is removed. Note that the word "served" is not copied because it is not shared by $y'$ and $\hat{x}'$.
For deletion, the revised template $y'$ and the reference template $\hat{x}'$ share subsequences, and the set of triples $\hat{\mathcal{T}}' \setminus \mathcal{T}'$ appears in $\hat{x}'$ but not in $y'$. Our method retains the shared subsequences in $y'$, copies the subsequences in $\hat{x}'$ about $\hat{\mathcal{T}}' \setminus \mathcal{T}'$, and copies the rest of the words in $y'$ to create the draft template $x'$. Table 2b gives an example. The subsequence "AGENT-1 was created by BRIDGE-1 and PATIENT-2" is retained. The set of triple templates $\hat{\mathcal{T}}' \setminus \mathcal{T}'$ is {(AGENT-1, fullName, PATIENT-1)}. The subsequence "whose full name is PATIENT-1" is copied. Note that the subsequence "The character of" is not copied because it is not shared by $y'$ and $\hat{x}'$.
After obtaining the draft template $x'$, our method lexicalizes it to obtain a draft text $x$, where the entity words are taken from the corresponding revised text $y$.
We obtain two datasets with our method, referred to as WEBEDIT and ROTOEDIT, respectively. Table 3 gives the statistics of the datasets.
In the WEBEDIT data, some entities appear only as subjects of triples. In such cases, we also make them appear as objects. To do so, we introduce an additional triple (ROOT, IsOf, subj) for each subj, where ROOT is a dummy entity.
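One plausible reading of this step is that, because the copy mechanism of FACTEDITOR copies only the objects of triples, the extra triples make subject entities copyable as well. A minimal sketch, assuming one extra triple per distinct subject as the text states; `add_root_triples` is a hypothetical helper name.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def add_root_triples(triples: List[Triple]) -> List[Triple]:
    """Append (ROOT, IsOf, subj) once for every distinct subject."""
    seen = set()
    extra = []
    for subj, _, _ in triples:
        if subj not in seen:
            seen.add(subj)
            extra.append(("ROOT", "IsOf", subj))  # ROOT is a dummy entity
    return triples + extra


triples = [
    ("Ardmore Airport", "runwayLength", "1411.0"),
    ("Ardmore Airport", "elevationAboveTheSeaLevel", "34.0"),
]
augmented = add_root_triples(triples)
```

After augmentation, "Ardmore Airport" also occurs as an object, so it can be produced through copying.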

FACTEDITOR
In this section, we describe our proposed model for fact-based text editing referred to as FACTEDITOR.

Model Architecture
FACTEDITOR transforms a draft text into a revised text based on given triples. The model consists of three components: a buffer for storing the draft text and its representations, a stream for storing the revised text and its representations, and a memory for storing the triples and their representations, as shown in Figure 1. FACTEDITOR scans the text in the buffer, copies parts of the text from the buffer into the stream if they are described by the triples in the memory, deletes parts of the text if they are not mentioned in the triples, and inserts into the stream new text that is present only in the triples.
The architecture of FACTEDITOR is inspired by those in sentence parsing (Dyer et al., 2015; Watanabe and Sumita, 2015). In effect, FACTEDITOR generates a sequence of words in the stream from the given sequence of words in the buffer and the set of triples in the memory. A neural network is employed to control the entire editing process.

Neural Network
Initialization FACTEDITOR first initializes the representations of content in the buffer, stream, and memory.
There is a feed-forward network associated with the memory, utilized to create the embeddings of triples. Let $M$ denote the number of triples. The embedding of triple $t_j$, $j = 1, \dots, M$, is calculated as
$$t_j = \tanh\left(W_t \cdot [e^{subj}_j; e^{pred}_j; e^{obj}_j] + b_t\right),$$
where $W_t$ and $b_t$ denote parameters, $e^{subj}_j$, $e^{pred}_j$, and $e^{obj}_j$ denote the embeddings of the subject, predicate, and object of triple $t_j$, and $[\,;\,]$ denotes vector concatenation.
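The triple encoder can be illustrated with a dependency-free Python sketch. It assumes a tanh feed-forward layer over the concatenated subject, predicate, and object embeddings; the tiny dimensions and weights below are made up for illustration and are not the settings used in the experiments.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def affine(W: Matrix, v: Vector, b: Vector) -> Vector:
    """Compute W v + b with plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) + b_i for row, b_i in zip(W, b)]


def triple_embedding(e_subj: Vector, e_pred: Vector, e_obj: Vector,
                     W_t: Matrix, b_t: Vector) -> Vector:
    concat = e_subj + e_pred + e_obj  # [e_subj; e_pred; e_obj] via list concatenation
    return [math.tanh(h) for h in affine(W_t, concat, b_t)]  # assumed tanh activation


# Toy sizes: 2-dimensional embeddings, 6-dimensional concatenation, 2-dimensional output.
W_t = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]]
b_t = [0.0, 0.0]
t_j = triple_embedding([0.5, 0.0], [0.0, 0.0], [0.0, -0.5], W_t, b_t)
```

Applying the same function to all $M$ triples yields the memory that the attention step reads from.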
There is a bi-directional LSTM associated with the buffer, utilized to create the embeddings of the words of the draft text. The embeddings are obtained as $b = \mathrm{BiLSTM}(x)$, where $x = (x_1, \dots, x_N)$ is the list of word embeddings, $b = (b_1, \dots, b_N)$ is the list of word representations, and $N$ denotes the number of words.
There is an LSTM associated with the stream for representing the hidden states of the stream. Its first hidden state is initialized by a feed-forward transformation with parameters $W_s$ and $b_s$.

Action prediction
FACTEDITOR predicts an action at each time step $t$ using the LSTM. There are three types of actions, namely Keep, Drop, and Gen. First, it composes a context vector $\tilde{t}_t$ of the triples at time $t$ using attention,
$$\tilde{t}_t = \sum_{j=1}^{M} \alpha_{t,j}\, t_j,$$
where $\alpha_{t,j}$ is an attention weight computed with parameters $v_\alpha$ and $W_\alpha$. Then, it creates the hidden state $z_t$ for action prediction at time $t$, with parameters $W_z$ and $b_z$. Next, it calculates the probability of action $a_t$,
$$P(a_t \mid z_t) = \mathrm{softmax}(W_a \cdot z_t),$$
where $W_a$ denotes parameters, and chooses the action with the largest probability.
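The following sketch walks through one action-prediction step with toy numbers. The text specifies only the parameter names, so the additive attention score $v_\alpha^\top \tanh(W_\alpha [s_t; b_t; t_j])$ and the layer $z_t = \tanh(W_z [s_t; b_t; \tilde{t}_t] + b_z)$ used here are assumptions about which inputs are combined.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]
ACTIONS = ("Keep", "Drop", "Gen")


def matvec(W: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_action(s_t, b_t, triples, v_alpha, W_alpha, W_z, b_z, W_a):
    # Assumed additive attention: score_j = v_alpha . tanh(W_alpha [s_t; b_t; t_j]).
    scores = [
        sum(v * math.tanh(h) for v, h in zip(v_alpha, matvec(W_alpha, s_t + b_t + t_j)))
        for t_j in triples
    ]
    alpha = softmax(scores)
    # Context vector: attention-weighted sum of the triple embeddings.
    context = [sum(a * t_j[k] for a, t_j in zip(alpha, triples)) for k in range(len(triples[0]))]
    # Assumed hidden state for action prediction.
    z_t = [math.tanh(h + bias) for h, bias in zip(matvec(W_z, s_t + b_t + context), b_z)]
    probs = softmax(matvec(W_a, z_t))  # P(a_t | z_t) = softmax(W_a z_t)
    best = ACTIONS[max(range(len(ACTIONS)), key=probs.__getitem__)]
    return alpha, probs, best


alpha, probs, best = predict_action(
    s_t=[0.1], b_t=[0.2], triples=[[1.0], [-1.0]],
    v_alpha=[1.0], W_alpha=[[1.0, 1.0, 1.0]],
    W_z=[[1.0, 1.0, 1.0]], b_z=[0.0],
    W_a=[[1.0], [0.0], [-1.0]],
)
```

With these weights the first triple receives most of the attention, $z_t$ is positive, and Keep gets the largest probability.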

Action execution
FACTEDITOR takes an action based on the prediction result at time $t$. For Keep at time $t$, FACTEDITOR pops the top embedding $b_t$ from the buffer and feeds the combination of $b_t$ and the context vector of triples $\tilde{t}_t$ into the stream, as shown in Fig. 1a. The state of the stream is updated with the LSTM as $s_{t+1} = \mathrm{LSTM}([\tilde{t}_t; b_t], s_t)$. FACTEDITOR also copies the top word in the buffer into the stream.
For Drop at time $t$, FACTEDITOR pops the top embedding from the buffer and proceeds to the next state, as shown in Fig. 1b. The state of the stream is left unchanged, i.e., $s_{t+1} = s_t$. Note that no word is written into the stream.
For Gen at time $t$, FACTEDITOR does not pop the top embedding from the buffer. It feeds the combination of the context vector of triples $\tilde{t}_t$ and the linearly projected embedding of the generated word into the stream, as shown in Fig. 1c. The state of the stream is updated with the LSTM as $s_{t+1} = \mathrm{LSTM}([\tilde{t}_t; W_p y_t], s_t)$, where $y_t$ is the embedding of the generated word $y_t$ and $W_p$ denotes parameters. In addition, FACTEDITOR writes the generated word $y_t$ into the stream. FACTEDITOR continues the actions until the buffer becomes empty.

Draft text x: Bakewell pudding is Dessert that can be served Warm or cold .
Revised text y: Bakewell pudding is Dessert that originates from Derbyshire Dales .
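The three actions can be replayed symbolically, without the neural components, to see how the buffer and the stream evolve. This is a minimal sketch: the action sequence below is written by hand for the Bakewell pudding example, and `execute` is a hypothetical helper, not part of the paper's implementation.

```python
from typing import List, Optional, Tuple

Action = Tuple[str, Optional[str]]  # (action name, generated word for Gen, else None)


def execute(draft: List[str], actions: List[Action]) -> List[str]:
    buffer = list(reversed(draft))  # the last element is the top of the buffer
    stream: List[str] = []
    for name, word in actions:
        if not buffer:
            break  # editing stops once the buffer is empty
        if name == "Keep":
            stream.append(buffer.pop())  # pop the top word and copy it to the stream
        elif name == "Drop":
            buffer.pop()                 # pop the top word; the stream is unchanged
        elif name == "Gen":
            stream.append(word)          # write a new word; the buffer is not popped
        else:
            raise ValueError(f"unknown action: {name}")
    return stream


draft = "Bakewell pudding is Dessert that can be served Warm or cold .".split()
actions = (
    [("Keep", None)] * 5
    + [("Gen", w) for w in ["originates", "from", "Derbyshire", "Dales"]]
    + [("Drop", None)] * 6
    + [("Keep", None)]
)
revised = " ".join(execute(draft, actions))
```

Because Gen does not consume the buffer, the four generated words are inserted before the six draft words that are dropped.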

Word generation
FACTEDITOR generates a word $y_t$ at time $t$ when the action is Gen, with the generation probability
$$P_{gen}(y_t \mid z_t) = \mathrm{softmax}(W_y \cdot z_t),$$
where $W_y$ denotes parameters.
To avoid generating OOV words, FACTEDITOR exploits the copy mechanism. It calculates the probability $P_{copy}(o_j \mid z_t)$ of copying the object $o_j$ of triple $t_j$, with parameters $v_c$ and $W_c$. It also calculates a gating probability $\lambda_t$, with parameters $w_g$ and $b_g$. Finally, it calculates the probability of generating a word $y_t$ through either generation or copying,
$$P(y_t \mid z_t) = \lambda_t\, P_{gen}(y_t \mid z_t) + (1 - \lambda_t) \sum_{j:\, o_j = y_t} P_{copy}(o_j \mid z_t),$$
where it is assumed that the triples in the memory have the same subject, and thus only objects need to be copied.
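A toy sketch of how the generation and copy distributions can be combined. It assumes the gate $\lambda_t$ is a sigmoid of a scalar logit and that it weights the vocabulary distribution (with $1 - \lambda_t$ on copying); the logits and the underscore-joined object tokens are invented for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_generate_and_copy(
    vocab: List[str], gen_logits: List[float],
    objects: List[str], copy_scores: List[float], gate_logit: float,
) -> Dict[str, float]:
    """P(y_t | z_t) over the vocabulary extended with the objects of the triples."""
    lam = 1.0 / (1.0 + math.exp(-gate_logit))  # probability of generating from the vocabulary
    dist: Dict[str, float] = defaultdict(float)
    for word, p in zip(vocab, softmax(gen_logits)):
        dist[word] += lam * p
    for obj, p in zip(objects, softmax(copy_scores)):
        dist[obj] += (1.0 - lam) * p  # an object that is also in vocab receives both terms
    return dict(dist)


dist = mix_generate_and_copy(
    vocab=["is", "from", "."], gen_logits=[0.0, 0.0, 0.0],
    objects=["Derbyshire_Dales", "Dessert"], copy_scores=[2.0, 0.0],
    gate_logit=0.0,
)
```

An out-of-vocabulary entity such as `Derbyshire_Dales` receives probability only through the copy term, which is how the mechanism avoids OOV failures.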

Model Learning
The conditional probability of the sequence of actions $a = (a_1, a_2, \dots, a_T)$ given the set of triples $\mathcal{T}$ and the sequence of input words $x$ can be written as
$$P(a \mid \mathcal{T}, x) = \prod_{t=1}^{T} P(a_t \mid z_t),$$
where $P(a_t \mid z_t)$ is the conditional probability of action $a_t$ given state $z_t$ at time $t$, and $T$ denotes the number of actions.
The conditional probability of the sequence of generated words $y = (y_1, y_2, \dots, y_T)$ given the sequence of actions $a$ can be written as
$$P(y \mid a) = \prod_{t=1}^{T} P(y_t \mid a_t),$$
where $P(y_t \mid a_t)$ is the conditional probability of the generated word $y_t$ given action $a_t$ at time $t$, which is calculated as
$$P(y_t \mid a_t) = \begin{cases} P(y_t \mid z_t) & \text{if } a_t = \text{Gen}, \\ 1 & \text{otherwise}. \end{cases}$$
Note that not all positions have a generated word. In such a case, $y_t$ is simply a null word.
The model is trained via supervised learning. The objective of learning is to minimize the negative log-likelihood of $P(a \mid \mathcal{T}, x)$ and $P(y \mid a)$,
$$\mathcal{L}(\theta) = -\log P(a \mid \mathcal{T}, x) - \log P(y \mid a),$$
where $\theta$ denotes the parameters.
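Given the per-step probabilities of the gold actions and of the gold generated words, the objective reduces to a sum of negative log-probabilities, where positions without a generated word (null words) contribute nothing. A minimal sketch with made-up probabilities; `fact_editor_nll` is a hypothetical name.

```python
import math
from typing import List, Optional


def fact_editor_nll(action_probs: List[float], word_probs: List[Optional[float]]) -> float:
    """-log P(a | T, x) - log P(y | a) for one training instance."""
    nll = 0.0
    for p_action, p_word in zip(action_probs, word_probs):
        nll -= math.log(p_action)   # every step predicts an action
        if p_word is not None:      # only Gen steps predict a word
            nll -= math.log(p_word)
    return nll


# Three steps: Keep, Gen, Drop (probabilities are invented for illustration).
loss = fact_editor_nll([0.9, 0.5, 0.8], [None, 0.25, None])
```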
A training instance consists of a draft text, a revised text, and a set of triples, denoted as $x$, $y$, and $\mathcal{T}$, respectively. For each instance, our method derives a sequence of actions, denoted as $a$, in a similar way as that in . It first finds the longest common subsequence between $x$ and $y$, and then selects an action of Keep, Drop, or Gen at each position, according to how $y$ is obtained from $x$ and $\mathcal{T}$ (cf. Table 4). Action Gen is preferred over action Drop when both are valid.

Figure 2: Model architectures of the baselines, including (c) ENCDECEDITOR (a table encoder over $\mathcal{T}$ and a text encoder over $x$ feeding a decoder that outputs $y$). All models employ attention and copy mechanisms.
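The action-derivation procedure can be sketched with a standard LCS table. This is an illustrative reconstruction, not the authors' script: tokens matched along the LCS become Keep, other revised-text tokens become Gen, other draft-text tokens become Drop, and Gen is chosen over Drop whenever both keep the alignment optimal. Trailing Gen actions after the draft is exhausted are emitted here for completeness, although FACTEDITOR itself stops once the buffer is empty.

```python
from typing import List, Sequence, Tuple


def lcs_suffix_table(x: Sequence[str], y: Sequence[str]) -> List[List[int]]:
    """dp[i][j] = length of the LCS of x[i:] and y[j:]."""
    dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(len(x) - 1, -1, -1):
        for j in range(len(y) - 1, -1, -1):
            if x[i] == y[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    return dp


def oracle_actions(x: Sequence[str], y: Sequence[str]) -> List[Tuple[str, str]]:
    """Derive (action, word) pairs that turn draft x into revised y."""
    dp = lcs_suffix_table(x, y)
    i = j = 0
    actions: List[Tuple[str, str]] = []
    while i < len(x) or j < len(y):
        if i < len(x) and j < len(y) and x[i] == y[j]:
            actions.append(("Keep", x[i]))   # matching tokens are always LCS-optimal
            i, j = i + 1, j + 1
        elif j < len(y) and dp[i][j] == dp[i][j + 1]:
            actions.append(("Gen", y[j]))    # Gen preferred over Drop when both are optimal
            j += 1
        else:
            actions.append(("Drop", x[i]))
            i += 1
    return actions


x = "Bakewell pudding is Dessert that can be served Warm or cold .".split()
y = "Bakewell pudding is Dessert that originates from Derbyshire Dales .".split()
actions = oracle_actions(x, y)
```

For this pair the derived sequence is five Keep actions, four Gen actions, six Drop actions, and a final Keep for the period.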

Time Complexity
The time complexity of inference in FACTEDITOR is $O(NM)$, where $N$ is the number of words in the buffer and $M$ is the number of triples. Scanning the data in the buffer takes $O(N)$ steps, and generating an action at each step requires attention over the triples, which costs $O(M)$. Usually, $N$ is much larger than $M$.

Baseline
We consider a baseline method using the encoder-decoder architecture, which takes the set of triples and the draft text as input and generates a revised text. We refer to the method as ENCDECEDITOR. The encoder of ENCDECEDITOR is the same as that of FACTEDITOR. The decoder is the standard attention and copy model, which creates and utilizes a context vector and predicts the next word at each time step.
The time complexity of inference in ENCDECEDITOR is $O(N^2 + NM)$ (cf. Britz et al. (2017)), because at each of its roughly $N$ decoding steps the decoder attends over the $N$ draft words and the $M$ triples. Since $N$ is usually large in fact-based text editing, ENCDECEDITOR is less efficient than FACTEDITOR.
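Ignoring constant factors, and assuming that both models run for a number of steps proportional to $N$, the ratio of the two inference costs makes the gap explicit:

$$\frac{N^2 + NM}{NM} = 1 + \frac{N}{M}.$$

For instance, with hypothetical sizes of $N = 300$ draft words and $M = 30$ triples, the operation-count ratio is $1 + 300/30 = 11$. This is only an asymptotic illustration; constant factors, batching, and hardware parallelism can make measured speedups differ substantially from this ratio.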

Experiment
We conduct experiments to compare FACTEDITOR with the baselines on the two datasets, WEBEDIT and ROTOEDIT.

Experiment Setup
The main baseline is the encoder-decoder model ENCDECEDITOR, as explained above. We further consider three baselines: No-Editing, Table-to-Text, and Text-to-Text. In No-Editing, the draft text is used directly as the output. In Table-to-Text, a revised text is generated from the triples using an encoder-decoder model. In Text-to-Text, a revised text is created from the draft text using an encoder-decoder model. Figure 2 illustrates the baselines.
We evaluate the revised texts produced by the models in terms of fluency and fidelity.
We use ExactMatch (EM), BLEU (Papineni et al., 2002), and SARI (Xu et al., 2016) scores as evaluation metrics for fluency. We also use precision, recall, and F1 score as evaluation metrics for fidelity. For WEBEDIT, we extract the entities from the generated text and the reference text and then calculate the precision, recall, and F1 scores. For ROTOEDIT, we use the information extraction tool provided by Wiseman et al. (2017) to calculate the scores.
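The WEBEDIT fidelity scores can be illustrated with a set-based sketch. Whether duplicate entities are counted is not specified here, so treating each text as a set of entity strings is an assumption, as is defining ExactMatch as the fraction of outputs identical to their references; the entity lists below are hand-picked and do not correspond to any reported result.

```python
from typing import Iterable, List, Tuple


def entity_prf(predicted: Iterable[str], reference: Iterable[str]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over sets of extracted entity strings."""
    pred, ref = set(predicted), set(reference)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def exact_match(hypotheses: List[str], references: List[str]) -> float:
    """Fraction of outputs that are string-identical to their references."""
    return sum(h == r for h, r in zip(hypotheses, references)) / len(references)


p, r, f1 = entity_prf(
    ["Ardmore Airport", "UTAA", "Poaceae", "34.0"],
    ["Ardmore Airport", "Poaceae", "34.0", "03R/21L"],
)
```

Here the unsupported entity "UTAA" lowers precision, and the missing "03R/21L" lowers recall.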
For the embeddings of subjects and objects in both datasets and the embedding of predicates in ROTOEDIT, we simply use an embedding lookup table. For the embedding of predicates in WEBEDIT, we first tokenize the predicate, look up the embeddings of the lower-cased words in the table, and use the averaged embedding to deal with the OOV problem (Moryossef et al., 2019).
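One way to implement the WEBEDIT predicate handling is sketched below. The camelCase and whitespace tokenization rule, skipping words missing from the lookup table, and the zero-vector fallback are assumptions for illustration; the toy two-dimensional table stands in for real word embeddings.

```python
import re
from typing import Dict, List


def tokenize_predicate(pred: str) -> List[str]:
    """Split on whitespace/underscores, then on camelCase and digit boundaries; lower-case."""
    tokens: List[str] = []
    for chunk in re.split(r"[\s_]+", pred):
        tokens.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+\w*", chunk))
    return [t.lower() for t in tokens]


def predicate_embedding(pred: str, table: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the embeddings of known tokens; zero vector if none is known (assumption)."""
    vectors = [table[t] for t in tokenize_predicate(pred) if t in table]
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


table = {"runway": [1.0, 0.0], "name": [0.0, 1.0]}
embedding = predicate_embedding("runwayName", table, 2)
```

A predicate such as "elevationAboveTheSeaLevel" never needs its own table entry; it is represented through its component words.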
We tune the hyperparameters based on the BLEU score on a development set. For WEBEDIT, we set the sizes of the embeddings, buffer, and triples to 300 and the size of the stream to 600. For ROTOEDIT, we set the size of the embeddings to 100 and the sizes of the buffer, triples, and stream to 200. The initial learning rate is 2e-3, and AMSGrad is used for automatically adjusting the learning rate (Reddi et al., 2018). Our implementation makes use of AllenNLP (Gardner et al., 2018).

Quantitative evaluation
We present the performance of our proposed model FACTEDITOR and the baselines on fact-based text editing in Table 5. Several conclusions can be drawn from the results.
First, our proposed model, FACTEDITOR, achieves significantly better performance than the main baseline, ENCDECEDITOR, in terms of almost all measures. In particular, FACTEDITOR obtains significant gains in the DELETE scores on both WEBEDIT and ROTOEDIT. Second, the fact-based text editing models (FACTEDITOR and ENCDECEDITOR) significantly improve upon the other models in terms of fluency scores and achieve similar performance in terms of fidelity scores.
Third, compared to No-Editing, Table-to-Text has higher fidelity scores, but lower fluency scores. Text-to-Text has almost the same fluency scores, but lower fidelity scores on ROTOEDIT.

Qualitative evaluation
We also manually evaluate 50 randomly sampled revised texts from WEBEDIT. We check whether the revised texts given by FACTEDITOR and ENCDECEDITOR include all the facts, and we categorize the factual errors made by the two models. Table 6 shows the results. FACTEDITOR covers more facts than ENCDECEDITOR and makes fewer factual errors.
FACTEDITOR makes a larger number of correct edits (CQT) than ENCDECEDITOR for fact-based text editing. In contrast, ENCDECEDITOR often makes a larger number of unnecessary rephrasings (UPARA) than FACTEDITOR. There are four types of factual errors: fact repetitions (RPT), missing facts (MS), unsupported facts (USUP), and different relations (DREL). Both FACTEDITOR and ENCDECEDITOR often fail to insert missing facts (MS), but rarely insert unsupported facts (USUP). ENCDECEDITOR often generates the same facts multiple times (RPT) or facts in different relations (DREL). In contrast, FACTEDITOR seldom makes such errors.
Table 7 shows an example of results given by ENCDECEDITOR and FACTEDITOR. The revised texts of both ENCDECEDITOR and FACTEDITOR appear to be fluent, but that of FACTEDITOR has higher fidelity than that of ENCDECEDITOR. ENCDECEDITOR cannot effectively eliminate the description of an unsupported fact appearing in the draft text ("ICAO Location Identifier UTAA"). In contrast, FACTEDITOR handles the problem well. In addition, ENCDECEDITOR makes an unnecessary substitution in the draft text ("34.0 above sea level" becomes "34.0 m long"). FACTEDITOR tends to avoid such unnecessary editing.

Table 7: Example of results given by ENCDECEDITOR and FACTEDITOR.
Set of triples: {(Ardmore Airport, runwayLength, 1411.0), (Ardmore Airport, 3rd runway SurfaceType, Poaceae), (Ardmore Airport, operatingOrganisation, Civil Aviation Authority of New Zealand), (Ardmore Airport, elevationAboveTheSeaLevel, 34.0), (Ardmore Airport, runwayName, 03R/21L)}
Draft text: Ardmore Airport , ICAO Location Identifier UTAA . Ardmore Airport 3rd runway is made of Poaceae and Ardmore Airport . 03R/21L is 1411.0 m long and Ardmore Airport is 34.0 above sea level .
Revised text: Ardmore Airport is operated by Civil Aviation Authority of New Zealand . Ardmore Airport 3rd runway is made of Poaceae and Ardmore Airport name is 03R/21L . 03R/21L is 1411.0 m long and Ardmore Airport is 34.0 above sea level .
ENCDECEDITOR: Ardmore Airport , ICAO Location Identifier UTAA , is operated by Civil Aviation Authority of New Zealand . Ardmore Airport 3rd runway is made of Poaceae and Ardmore Airport . 03R/21L is 1411.0 m long and Ardmore Airport is 34.0 m long .
FACTEDITOR: Ardmore Airport is operated by Civil Aviation Authority of New Zealand . Ardmore Airport 3rd runway is made of Poaceae and Ardmore Airport . 03R/21L is 1411.0 m long and Ardmore Airport is 34.0 above sea level .

Runtime analysis
We conduct a runtime analysis of FACTEDITOR and the baselines in terms of the number of processed words per second, on both WEBEDIT and ROTOEDIT. Table 8 gives the results when the batch size is 128 for all methods. Table-to-Text is the fastest, followed by FACTEDITOR. FACTEDITOR is always faster than ENCDECEDITOR, apparently because it has a lower time complexity, as explained in Section 4. The texts in WEBEDIT are relatively short, and thus FACTEDITOR and ENCDECEDITOR have similar runtime speeds. In contrast, the texts in ROTOEDIT are relatively long, and thus FACTEDITOR runs approximately twice as fast as ENCDECEDITOR.

Conclusion
In this paper, we have defined a new task referred to as fact-based text editing and made two contributions to research on the problem. First, we have proposed a data construction method for fact-based text editing and created two datasets. Second, we have proposed a model for fact-based text editing, named FACTEDITOR, which performs the task by generating a sequence of actions. Experimental results show that the proposed model FACTEDITOR performs better and faster than the baselines, including an encoder-decoder model.