Optimizing Statistical Machine Translation for Text Simplification

Wei Xu, Courtney Napoles, Ellie Pavlick, Quanze Chen, Chris Callison-Burch


Abstract
Most recent sentence simplification systems use basic machine translation models to learn lexical and syntactic paraphrases from a manually simplified parallel corpus. These methods are limited by the quality and quantity of manually simplified corpora, which are expensive to build. In this paper, we conduct an in-depth adaptation of statistical machine translation to perform text simplification, taking advantage of large-scale paraphrases learned from bilingual texts and a small amount of manual simplifications with multiple references. Our work is the first to design automatic metrics that are effective for tuning and evaluating simplification systems, which will facilitate iterative development for this task.
Anthology ID:
Q16-1029
Volume:
Transactions of the Association for Computational Linguistics, Volume 4
Year:
2016
Address:
Cambridge, MA
Editors:
Lillian Lee, Mark Johnson, Kristina Toutanova
Venue:
TACL
Publisher:
MIT Press
Pages:
401–415
URL:
https://aclanthology.org/Q16-1029
DOI:
10.1162/tacl_a_00107
Cite (ACL):
Wei Xu, Courtney Napoles, Ellie Pavlick, Quanze Chen, and Chris Callison-Burch. 2016. Optimizing Statistical Machine Translation for Text Simplification. Transactions of the Association for Computational Linguistics, 4:401–415.
Cite (Informal):
Optimizing Statistical Machine Translation for Text Simplification (Xu et al., TACL 2016)
PDF:
https://aclanthology.org/Q16-1029.pdf
Code:
cocoxu/simplification
Data:
TurkCorpus