Simultaneous Translation and Paraphrase for Language Education

Stephen Mayhew, Klinton Bicknell, Chris Brust, Bill McDowell, Will Monroe, Burr Settles


Abstract
We present the task of Simultaneous Translation and Paraphrasing for Language Education (STAPLE). Given a prompt in one language, the goal is to generate a diverse set of correct translations that language learners are likely to produce. This is motivated by the need to create and maintain large, high-quality sets of acceptable translations for exercises in a language-learning application, and synthesizes work spanning machine translation, MT evaluation, automatic paraphrasing, and language education technology. We developed a novel corpus with unique properties for five languages (Hungarian, Japanese, Korean, Portuguese, and Vietnamese), and report on the results of a shared task challenge which attracted 20 teams to solve the task. In our meta-analysis, we focus on three aspects of the resulting systems: external training corpus selection, model architecture and training decisions, and decoding and filtering strategies. We find that strong systems start with a large amount of generic training data, and then fine-tune with in-domain data, sampled according to our provided learner response frequencies.
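The abstract reports that strong systems fine-tune on in-domain data sampled according to the provided learner response frequencies. Below is a minimal Python sketch of that sampling step. It assumes each prompt comes with a dict that maps accepted translations to a learner response frequency. The function name `sample_training_pairs`, that data layout, and the example sentences and weights are all hypothetical. They are not taken from the shared-task data, its official code, or any participant's system.

```python
import random
from collections import Counter


def sample_training_pairs(prompt, translations, num_samples, seed=0):
    """Illustrative sketch (hypothetical helper, not the shared task's code).

    Return `num_samples` (prompt, translation) pairs, drawing each
    translation with replacement and with probability proportional to
    its learner response frequency.
    """
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    rng = random.Random(seed)  # seeded so the sampled corpus is reproducible
    texts = list(translations)
    weights = [translations[t] for t in texts]
    chosen = rng.choices(texts, weights=weights, k=num_samples)
    return [(prompt, text) for text in chosen]


# Hypothetical prompt, accepted translations, and frequencies (made up for illustration).
example = {
    "estou com fome": 0.6,
    "eu estou com fome": 0.3,
    "tenho fome": 0.1,
}
pairs = sample_training_pairs("i am hungry", example, num_samples=1000)
counts = Counter(text for _, text in pairs)
print(counts.most_common())
```

Because sampling is done with replacement, frequent learner responses dominate the fine-tuning set. Rare but still acceptable translations continue to appear occasionally, which is one plausible way to balance accuracy against coverage of the full set of correct answers.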
Anthology ID:
2020.ngt-1.28
Volume:
Proceedings of the Fourth Workshop on Neural Generation and Translation
Month:
July
Year:
2020
Address:
Online
Editors:
Alexandra Birch, Andrew Finch, Hiroaki Hayashi, Kenneth Heafield, Marcin Junczys-Dowmunt, Ioannis Konstas, Xian Li, Graham Neubig, Yusuke Oda
Venue:
NGT
Publisher:
Association for Computational Linguistics
Pages:
232–243
URL:
https://aclanthology.org/2020.ngt-1.28
DOI:
10.18653/v1/2020.ngt-1.28
Cite (ACL):
Stephen Mayhew, Klinton Bicknell, Chris Brust, Bill McDowell, Will Monroe, and Burr Settles. 2020. Simultaneous Translation and Paraphrase for Language Education. In Proceedings of the Fourth Workshop on Neural Generation and Translation, pages 232–243, Online. Association for Computational Linguistics.
Cite (Informal):
Simultaneous Translation and Paraphrase for Language Education (Mayhew et al., NGT 2020)
PDF:
https://aclanthology.org/2020.ngt-1.28.pdf
Code:
duolingo/duolingo-sharedtask-2020
Data:
Duolingo STAPLE Shared Task