Controlling Text Complexity in Neural Machine Translation

This work introduces a machine translation task where the output is aimed at audiences of different levels of target language proficiency. We collect a high quality dataset of news articles available in English and Spanish, written for diverse grade levels and propose a method to align segments across comparable bilingual articles. The resulting dataset makes it possible to train multi-task sequence to sequence models that can translate and simplify text jointly. We show that these multi-task models outperform pipeline approaches that translate and simplify text independently.


Introduction
Generating text at the right level of complexity can make machine translation (MT) more useful for a wide range of users.As Xu et al. (2015) note, simplifying text makes it possible to develop reading aids for people with low-literacy (Watanabe et al., 2009;De Belder and Moens, 2010), for non-native speakers and language learners (Petersen and Ostendorf, 2007;Allen, 2009), for people who suffer from language impairments (Carroll et al., 1999;Canning et al., 2000;Inui et al., 2003), and for readers lacking expert knowledge of the topic discussed (Elhadad and Sutaria, 2007;Siddharthan and Katsos, 2010).Such readers would also benefit from MT output that is better targeted to their needs by being easier to read than the original.
Prior work on text complexity has focused on simplifying input text in one language, primarily English (Chandrasekar et al., 1996;Coster and Kauchak, 2011;Siddharthan, 2014;Xu et al., 2015;Zhang and Lapata, 2017;Scarton and Specia, 2018;Kriz et al., 2019;Nishihara et al., 2019).Simplification has been used to improve MT by restructuring complex sentences into shorter and simpler segments that are easier to translate (Gerber and Hovy, 1998;Štajner and Popovic, 2016;Hasler et al., 2017).Contemporaneously to our work, Marchisio et al. (2019) show that tagging the English side of parallel corpora with automatic readability scores can help translate the same input in a simpler or more complex form.Our work shares the goal of controlling translation complexity, but considers a broader range of reading grade levels and simplification operations grounded in professionally edited text simplification corpora.
Building a model for this task ideally requires rich annotation for evaluation and supervised training that is not available in bilingual parallel corpora typically used in MT.Controlling the complexity of Spanish-English translation ideally requires examples of Spanish sentences paired with several English translations that span a range of complexity levels.We collect such a dataset of English-Spanish segment pairs from the Newsela website, which provides professionally edited simplifications and translations.By contrast with MT parallel corpora, the English and Spanish translations at different grade levels are only comparable.They differ in length and sentence structure, reflecting complex syntactic and lexical simplification operations.
We adopt a multi-task approach to control complexity in neural MT and evaluate it on complexity-controlled Spanish-English translation.Our extensive empirical study shows that multitask models produce better and simpler translations than pipelines of independent translation and simplification models.We then analyze the strengths and weaknesses of multitask models, focusing on the degree to which they match the target complexity, and the impact of training data types and reading grade level annotation.Given corpora of parallel complex-simple segments, text simplification can naturally be framed as a translation task, borrowing and adapting model architectures originally designed for MT. Xu et al. (2016) provide a thorough study of statistical MT techniques for English text simplification, and introduce novel objectives to measure simplification quality.Interestingly, they indirectly make use of parallel translation corpora to derive simplification paraphrasing rules by bilingual pivoting (Callison-Burch, 2007).Zhang and Lapata (2017) train sequence-to-sequence models to translate from complex to simple English using reinforcement learning to directly optimize the metrics that evaluate complexity (SARI) and fluency and adequacy (BLEU).Scarton and Specia (2018) address the task of producing text of varying levels of complexity for different target audiences.They show that neural sequence-tosequence models informed by target-complexity tokens inserted in the input sequence perform well on this task.While the vast majority of text simplification work has focused on English, Spanish (Štajner et al., 2015), Italian (Brunato et al., 2016;Aprosio et al., 2019) and German (Klaper et al., 2013) have also received some attention.
While most MT approaches only indirectly capture style properties (e.g., via domain adaptation), a growing number of studies share the goal of considering source texts and their translation in their pragmatic context.Mirkin and Meunier (2015) introduce personalized MT.Rabinovich et al. (2016) and Vanmassenhove et al. (2018) suggest that the gender of the author is implicitly marked in the source text and that dedicated statistical and neural systems better preserve gender traits in MT output.Neural MT has enabled more flexible ways to control stylistic properties of MT output.Sennrich et al. (2016) first propose to append a special token to the source that neural MT models can attend to and to select the formal (Sie) or informal (du) version of second person pronouns when translating into German.Niu et al. (2018) show that multi-task models can jointly translate between languages and styles, producing formal and informal translations with broader lexical and phrasal at https://Newsela.com/data/.Scripts to replicate our model configurations and our cross-lingual segment aligner are available at https://github.com/sweta20/ComplexityControlledMT.changes than the local pronoun changes in Sennrich et al. (2016).Closest to our goal, Marchisio et al. (2019) address the task of producing either simple or complex translations of the same input, using automatic readability scoring of parallel corpora.They show that training distinct decoders for simple and complex language allows better complexity control than using the target complexity as a side-constraint.By contrast, our approach exploits text simplification corpora for richer supervision for both training and evaluation.

A Multi-Task Approach to Complexity
Controlled MT Task We define complexity controlled MT as a task that takes two inputs: an input language segment s i and a target complexity c.The goal is to produce a translation s o in the output language that has complexity c.For instance, given input Spanish sentences in Table 1, complexity controlled MT aims to produce English translations at a specific level of complexity, which might differ from the complexity of the original Spanish.
Model We model P (s o |s i , c; θ) as a neural encoder-decoder with attention (Bahdanau et al., 2015).This architecture has been used successfully for the two related tasks of text simplification (Wang et al., 2016;Zhang and Lapata, 2017;Nisioi et al., 2017;Scarton and Specia, 2018) and machine translation (Bahdanau et al., 2015).The encoder constructs hidden representation for each word in the input sequence, while the decoder generates the target sequence, conditioned on hidden source representations.We hypothesize that training a single encoder-decoder model to perform the two distinct tasks of machine translation and text simplification will yield a model that can perform complexity controlled MT.We adopt the multitask framework proposed by Johnson et al. (2016) to train multilingual neural MT systems.
Representing target complexity Target complexity c can be incorporated in sequence-tosequence models as a special token appended to the beginning of the input sequence, which acts as a side constraint.The encoder encodes this token in its hidden states as any other vocabulary token, and the decoder can attend to this representation to guide the generation of the output sequence.This simple strategy has been used in MT to control second person pronoun forms when translat- ing into German (Sennrich et al., 2016), to indicate the target language in multilingual MT (Johnson et al., 2016), and to obtain formal or informal translations of the same input (Niu et al., 2018).In monolingual text simplification tasks (Scarton and Specia, 2018), the reading grade level has been encoded as such a special token.
• MT samples (s i , s o ): These are sentence pairs drawn from parallel corpora.They are available in large quantities for many language pairs (Tiedemann, 2012) and are used to define the MT loss (2) The multi-task loss is simply obtained by summing the losses from individual tasks: 4 The Newsela Cross-Lingual Simplification Dataset We build on prior work that used the Newsela dataset for English or Spanish text simplification by automatically aligning English and Spanish segments of different complexity to enable complexity-controlled machine translation.
The Newsela website provides high quality data to study text simplification.Xu et al. (2015) argue that text simplification research should be grounded in texts that are simplified by professional editors for specific target audiences, rather than more general-purpose crowd-sourced simplifications such as those available on Wikipedia.They show that Wikipedia is prone to sentence alignment errors, contains a non-negligible amount of inadequate simplifications, and does not generalize well to other text genres.By contrast, Newsela is an instructional content platform meant to help teachers prepare curriculum that match the language skills required at each grade level.The Newsela corpus consists of English articles in their original form, 4 or 5 different versions rewritten by professionals to suit different grade levels as well as optional translations of original and/or simplified English articles into Spanish resulting in 23,130 English and 5,320 Spanish articles with grade annotations respectively.
This section introduces our method to align English and Spanish segments across complexity levels, and the resulting bilingual dataset.

Cross-Lingual Segment Alignment
Extracting training examples from this corpus requires aligning segments within documents.This is challenging because text is neither simplified nor translated sentence by sentence, and as a result, equivalent content might move from one sentence to the next.Past work has introduced techniques to align segments of different complexity within documents of the same language (Xu et al., 2015;Paetzold et al., 2017;Štajner et al., 2018).
Complexity controlled MT requires aligning segments of different complexity in English and Spanish.Existing methods for aligning sentences in English and Spanish parallel corpora are not well suited to this task.For instance, the Gale-Church algorithm (Gale and Church, 1993) assumes that aligned sentences should have similar length.This assumption does not hold if the English article is a simplification of the Spanish article.Consider the following Spanish text and its English translation in Newsela: Spanish: LA HAYA, Holanda -Te has tomado alguna vez una selfie?, Hoy en día es muy fácil.Solo necesitas un teléfono inteligente.
Google Translated English: THE HAGUE, Netherlands -Have you ever taken a selfie?Today is very easy.You only need a smart phone.
Original English Version: THE HAGUE, Netherlands -All you need is a smartphone to take a selfie.It is that easy.
As a result, we adapt a monolingual text simplification aligner for cross-lingual alignment.MAS-SAlign (Paetzold et al., 2017) is a Python library designed to align segments of different length within comparable corpora of the same language.It employs a vicinity-driven search approach, based on the assumption that the order in which information appears is roughly constant in simple and complex texts.A similarity matrix is created between the paragraphs/sentences of aligned documents/paragraphs using a standard bag-of-words TF-IDF model.It finds a starting point to begin the search for an alignment path, allowing long-distance alignment skips, capturing 1-N and N-1 alignments.To leverage this alignment flexibility, we apply MASSAlign to English articles and Spanish articles machine translated into English by Google translate.2An important property of Google translated articles is that they are aligned 1-1 at the sentence level.This lets us deterministically find the Spanish replacement for the aligned Google translated English version returned by MASSAlign.Translation quality is high for this language pair, and even noisy translated articles contain enough signal to construct the similarity matrix required by MASSAlign.

Resulting Dataset
We thus create: both samples for complexity controlled MT (s i , c o , s o ) and traditional monolingual text simplification samples s o , c s o , s o that can be used by the multi-task model (Section 3).Since the properties of Newsela monolingual simplification samples have been studied thoroughly by Xu et al. (2015), we present key statistics for the cross-lingual simplification examples only.Table 2 contrasts Newsela parallel segments with bilingual parallel sentences drawn from the OPUS corpus (Tiedemann, 2009).We use Global Voices and News Commentary from OPUS corpus as it has the most similar domain to the Newsela data.Aligned segments in Newsela are about twice as long as segments in parallel corpora, and contain more than two sentences on each side on average.By contrast, parallel corpora samples align sentences one-to-one on average.Articles are distributed across reading levels spanning grades 2 to 12 for both English and English-Spanish pairs.Table 3 highlights the vocabulary differences among the different grade levels for the Newsela Spanish-English corpus.The vocabulary size of the corpus corresponding to lower grade level is smaller as compared to higher complexity levels.Also, complex sentences have more words per sentence on average but fewer sentences per segment compared to their simplified counterparts.Simple sentences differ from complex sentences in various ways, ranging from sentence splitting and content deletion to paraphrasing and lexical substitutions, as illustrated in Table 1.

Experiment Settings
We evaluate complexity controlled MT using a subset of the 150k Spanish-English segment pairs extracted from Newsela as described in Section 4. We select Spanish and English segments that have different reading grade levels, so that given a Spanish input, the task consists in producing an English translation which is simpler (lower reading grade level) than the Spanish input.The train/development/test split ensures that there is no overlap between articles held out for testing and articles used for training.We refer to the corresponding training examples as MT+simplify since it represents the joint task of translation and simplification.

Evaluation Metrics
We evaluate the truecased detokenized output of our models using three automatic evaluation metrics, drawing from both machine translation and text simplification evaluation.BLEU (Papineni et al., 2002) estimates translation quality based on n-gram overlap between system output and references.However it does not separate mismatches due to meaning errors and mismatches due to simplification errors.SARI (Xu et al., 2016) 3 is designed to evaluate text simplification systems by comparing system output against references and against the input sentence.It explicitly measures the goodness of words that are added, deleted and kept by the systems.Xu et al. (2016) showed that BLEU shows high correlation with human scores for grammaticality and meaning preservation and SARI shows high correlation with human scores for simplicity.In the cross-lingual setting, we cannot directly compare the Spanish input with English hypotheses and references, therefore we use the baseline machine translation of Spanish into English as a pseudo-source text.The resulting SARI score directly measures the improvement over baseline machine translation.
In addition to BLEU and SARI, we report Pearson's correlation coefficient (PCC) to measure the strength of the linear relationship between the complexity of our system outputs and the complexity of reference translations.Heilman et al. (2008) use it to evaluate the performance of reading difficulty prediction.Here we estimate the reading grade level complexity of MT outputs and reference translations using the Automatic Readability Index (ARI) 4 score, which combines evidence from the number of characters per word and number of words per sentence using hand-tuned Source (Spanish)  weights (Senter and Smith, 1967): All datasets are pre-processed using Moses tools for normalization, tokenization and truecasing (Koehn et al., 2007).We further segment tokens into subwords using a joint source-target byte pair encoding model with 32,000 operations (Sennrich et al., 2015).

Sequence-to-Sequence Model Configuration
We use the standard encoder-decoder architecture implemented in the Sockeye toolkit (Hieber et al., 2017).Both encoder and decoder have two Long Short Term Memory (LSTM) layers (Bahdanau et al., 2015), hidden states of size 500 and dropout of 0.3 applied to the RNNs of the encoder and decoder which is same as what was used by Scarton and Specia (2018).We observe that dot product based attention works best in our scenario, perhaps indicating that the task of complexity controlled translation requires mostly local changes that do not lead to long distance reorderings across sentences.We train using the Adam (Kingma and Ba, 2014) optimizer with a batch size of 256 segments and checkpoint the model every 1000 updates.Training stops after 8 checkpoints without improvement of validation perplexity.The vocabulary size is limited to 50000.We decode with a beam size of 5. Grade side-constraints are defined using a distinct special token for each grade level (from 2 to 12).The constraint token corresponds to the grade level of the target instance.

Baseline
We contrast the multi-task system with pipeline based approaches, where translation and simplification are treated as independent consecutive steps.We train a neural MT model to perform translation from Spanish to English and other neural MT models to perform monolingual text simplification for Spanish and English respectively.In the first pipeline setup, the output from the translation model is fed as input to an English simplification model while in the other, the output from the Spanish simplification model is fed as input to an translation model.As Scarton and Specia (2018), we simply use grade level tokens as side constraints on English simplification examples to control output complexity. 5

Evaluation of Complexity Controlled MT
We compare pipeline and multitask models on the Newsela complexity controlled MT task (Table 4).Overall, results show that compared to pipeline models, multitask models produce complexity controlled translations that better match human references according to BLEU.SARI suggests that multitask translations are simpler than baseline translations, and their resulting complexity correlates better with reference grade levels according to PCC.The two pipeline models use the same MT system, therefore the difference between them comes from text simplification: using English simplification (first pipeline) outperforms Spanish simplification (second pipeline) according to BLEU and PCC, but not SARI.This can be explained by the smaller amount of Spanish simplification training data, which yields a model that generalizes poorly.
The "All tasks" model highlights the strengths of the multi-task approach: combining training samples from many tasks yields improvements over the "Translate and Simplify" multitask model which is trained on the exact same data as the pipelines.However, even without additional training data, the multitask "Translate and Simpifly" model improves over baselines mainly by simplifying the output more, which suggests that the simplification component of the multitask model benefits from the additional MT training data. 5Additional constraints based on simplification operations were also used in that work but did not provide substantial benefits when operations are predicted based on the input.
Qualitative analysis suggests that the multi-task model is capable of distinguishing among different grade levels and the simplification operations performed for different grade levels are gradual.Table 5 illustrates simplification operations observed for a fixed grade 12 Spanish input into English with target grade levels ranging from 9 to 3. When translating to a nearby grade level, for example 9, the model roughly translates the entire input.For lower grade levels such as 7 and 5, lexical simplification and sentence splitting is observed.
For the simplest grade level, the model deletes additional content.More examples are provided in the Appendix ( Table 13

Output Grade Analysis
We aim to better understand to what degree models simplify the input text: how often does the output complexity exactly matches that of the reference?Does this change depend on the distance between input and output complexity levels?Table 6 compiles Adjacency Accuracy scores (Heilman et al., 2008), which represent the percentage of sentences where the system output complexity is within 1 or 2 grades of the reference text.We derive the reading grade levels from ARI (Senter and Smith, 1967) and conduct this analysis for the best pipeline ("Translate then Simplify") and multitask models ("All Tasks").These adjacency scores are broken down according to the distance between input and target grade levels.siglo XVII, que destaca las similitudes y diferencias entre las fotos modernas y las obras de arte históricas.9 Now the museum Mauritois is launching an exhibition dedicated to the 18th century authoritations, highlighting the similarities and differences between modern photos and historical artworks.7 The museum is now set to open an exhibition dedicated to the 18th century authoritations, highlighting the similarities and differences between modern photos and historical artworks.5 The museum is now set to open an exhibit dedicated to the 18th century.It highlights the similarities and differences between modern photos and historical artworks.3 The museum is now set to open an exhibit dedicated to the 18th century.It shows the similarities and differences between modern photos and art works.6: Adjacency ARI accuracy within grade level given by Adjacency level for the system output with respect to the target grade: Multitask model is able to better capture the target grade than the pipeline model when the difference between the source and the target grade is greater than 3.
When the source and target grades are close, roughly 60% of system outputs that are within a ±1 window of the correct grade level.The pipeline model matches the target grade slightly better than the multitask model.However, in the more difficult case where the difference between source and target grades is larger than three, the multitask model outperforms the pipeline.Increasing the adjacency window to ±2 pushes the percentage of matches in the 70s.
Overall these results show that multitask and pipeline models are able to translate and simplify, but that they do not yet fully succeed at precisely controlling the complexity of their output to match a specific target reading grade.

Ablation Experiments
Table 7 shows the impact of different training data types on the multitask model using ablation experiments.OPUS improves BLEU and SARI performance across the board.However, using OPUS without any Newsela MT data (Row 4) hurts the correlation score, indicating the importance of in-domain MT data to control complex-ity.The difference between the performance when using joint translation and simplification (MT+S) examples (Row 2) vs. simplification only (S in Row 3) is small in terms of BLEU (+0.11) and PCC (0.012), indicating that the monolingual simplification dataset can provide simplification supervision when MT+Simplify data is unavailable.The overall best performance for the task is obtained by using all types of training examples.6

Evaluation on Auxiliary Tasks
In addition to complexity controlled MT, the multi-task model can be used to simplify English text, and to translate from Spanish-to-English without changing the complexity.For completeness, we evaluate on these two auxiliary tasks.
Table 8 summarizes the results: the multitask model slightly outperforms a dedicated simplification model on English simplification, showing the benefits of the additional training data from other tasks.By contrast, on the resource-rich MT task, the standalone translation system performs better.This can be explained by the fact that the standalone system is only responsible for text translation, while the multi-task model is exposed to samples of more diverse complexity levels during training which damage its ability to preserve complexity.

Provenance of Reading Grade Level
Our models control complexity using the gold reading grade level assigned by professional Newsela editors.We investigate the impact of replacing these gold labels by automatic predictions from the ARI metric.ARI can be computed for any English segment, including for MT parallel corpora that are not annotated for complexity.with ARI grades, including the OPUS parallel corpus, hurts BLEU.We attribute this result to the differences in length and number of sentences per segment in OPUS vs. Newsela (Table 2): segments of vastly different lengths can have the same ARI score (Equation 4), thus confusing the multitask model.

Conclusion
We introduce a new task that aims to control complexity in machine translation output, as a proxy for producing translations targeted at audiences with different reading proficiency levels.We construct a Spanish-English dataset drawn from the Newsela corpus for training and evaluation, and adopt a sequence-to-sequence model trained in a multitask fashion.
We show that the multitask model improves performance over translation and simplification pipelines, according to both machine translation and simplification metrics.The reading grade level of the multi-task outputs correlate better with target grade levels than with pipeline outputs.Analysis shows that these benefits come from their ability to combine larger training data from different tasks.Manual inspection also shows that the multi-task model successfully produces different translations for increasingly lower grades given the same Spanish input.
However, even when simplifying translations, multitask models are not yet able to exactly match the desired complexity level, and the gap between the complexity achieved and the target complexity increases with the amount of simplification required.Our datasets and models thus provide a foundation to investigate strategies for a tighter control on output complexity in future work.A Supplemental Material    12 El gobierno federal realizó un contrato con el centro de detención juvenil en Vicennes, Indiana, desde el año 2004 hasta el año 2010, para que alojara a aquellos niños inmigrantes considerados como los más peligrosos.9 The federal government conducted a contract with the youth detention center at Vicennes, Indiana, since 2004 to 2010 to host those immigrant children considered as the most dangerous ones.7 The federal government conducted a contract with the youth detention center at Vicennes, Indiana, since 2004 to 2010, so that they hosted those immigrant children considered as the most dangerous ones.5 The federal government made a contract with the youth detention center in Vice-Year.It is in the United States since 2004 to 2010. 3 The government made a deal with the youth detention center in Vice-Year.It is in the United States since 2004 to 2010.It was to host those immigrant children as the most dangerous in 2010.
12 Poco después de que el dron despegó durante la prueba de Verizon en Cape May, una "aeronave de seguimiento" salió tras él, para garantizar que el dron pudiera evadir otros aviones en caso de volar dentro de un espacio aéreo designado.9 Shortly after the drone took off from Verizon in Cape May, a "unmanned aircraft following" came out after him, to ensure that the drone could evade other planes in case of flying inside a designated airspace.6 Shortly after the drone took off in Cape May, a "unmanned aircraft force" came out behind him.They could ensure that the drone could evade other planes in case of flying inside a designated airspace.2 Not long after the drone took off from Verizon in Cape May, a "unmanned aircraft help" came out after him.It is to make sure the drone could evade other planes in case of flying inside a air space.
Table 13: Examples of simplification operations observed when simplifying from a higher Grade level into different lower grade levels using the multitask model (All tasks).
Supporters of the reserve say it sets an example for multiple countries working together to protect a large area of ocean.The area is not controlled by any single nation.
Supporters of the reserve say that marks a precedent for many countries that work together to protect a large portion of the ocean.
The parents did not react well to their daughters surfing.They worried about the danger and what other people would think.
Parents did not want to hear that their daughters were surfing their daughters because they were worried about the danger and their reputation. 12 1 1 Researchers can request the bilingual Newsela data arXiv:1911.00835v1[cs.CL] 3 Nov 2019

Table 1 :
Cross-lingual Newsela examples: the Spanish text s i of complexity, or reading grade level, c i is automatically aligned to English text s o of c o .Simplification transformations range from sentence splitting and deletions to paraphrasing and lexical substitution.

Table 2 :
Comparisons of the English-Spanish Newsela corpus with machine translation corpora from OPUS drawn from Global Voices and News Commentary.

Table 3 :
Grade level Statistics of the Newsela Spanish-English corpus.Vocabulary size decreases with the reading grade level.Simpler segments contain fewer sentences and are often shorter that complex segments. ).
Table4: Compared to pipeline models, multitask models produce complexity controlled translations that better match human references (BLEU), that are simpler (SARI), and whose resulting complexity correlates better with the target grade level (PCC).Pipeline models are trained on Newsela Simplification data and MT parallel data from Newsela and OPUS."Translate and Simplify" uses the exact same data in a multi-task model.The "All tasks" model uses all data available, including Newsla MT+Simplify examples.

Table 5 :
Example of multi-task model outputs when translating grade 12 Spanish into increasingly simpler English.

Table 8 :
Evaluation on auxiliary tasks: Multitask models trained on both the translation and simplification dataset improves the performance for the task of English Simplification.

Table 9 :
Complexity controlled MT with automatic vs. manual reading grade level tags: ARI provides an adequate substitute for manually assigned grade levels.
Table10and 11 provides the statistics of grade pair distribution in the Newsela English and Newsela Spanish-English dataset.

Table 10 :
Number of text segments per grade level pair in our Newsela English Corpus

Table 11 :
Number of text segments per grade level pair in our Newsela English-Spanish Corpus

Table 12 :
Scarton and Specia (2018)y published results on Newsela English text simplification.Our implementation yields BLEU and SARI score that are close to those reported inScarton and Specia (2018).The difference in Flesch score can be attributed to changes in the number and complexity of articles available in newsela at the time the datasets were extracted.12Se estima que 75 personas han expresado interés en alojar al menos a un solicitante de asilo, dijo Cronk.Algunas de estas personas viven en las principales áreas metropolitanas de California y Nueva York.Otros son de zonas remotas y rurales de Montana y Dakota del Norte.8 An estimated 75 people have expressed interest in hosting at least one asylum-seeker said Cronk.Some of these people live in major California and New York area.Others are from remote and rural areas of Montana and North Dakota.6 An estimated 75 people have expressed interest in hosting at least one asylum-seeker said Cronk.Some of these people live in major California and New York area.Others are from remote and rural areas of Montana and North Dakota.4 An estimated 75 people have expressed interest in hosting at least one asylum-seeker said Cronk.Some of these people live in the main areas of California and New York.Others are from remote and rural areas of Montana and North Dakota. 2 Many people live in the United States and New York City.Some are from remote areas of Montana and North Dakota.

Table 14 :
Example translations produced by our best multitask model.Refer Table 7 (Row 1).