Neural Text Simplification in Low-Resource Conditions Using Weak Supervision

Alessio Palmero Aprosio, Sara Tonelli, Marco Turchi, Matteo Negri, Mattia A. Di Gangi


Abstract
Neural text simplification has gained increasing attention in the NLP community thanks to recent advancements in deep sequence-to-sequence learning. Most recent efforts with such a data-demanding paradigm have dealt with the English language, for which sizeable training datasets are currently available to deploy competitive models. Similar improvements on less resource-rich languages depend either on intensive manual work to create training data, or on the design of effective automatic generation techniques to bypass the data acquisition bottleneck. Inspired by the machine translation field, in which synthetic parallel pairs generated from monolingual data yield significant improvements to neural models, in this paper we exploit large amounts of heterogeneous data to automatically select simple sentences, which are then used to create synthetic simplification pairs. We also evaluate other solutions, such as oversampling and feeding external word embeddings to the neural simplification system. Our approach is evaluated on Italian and Spanish, for which only a few thousand gold sentence pairs are available. The results show that these techniques yield performance improvements over a baseline sequence-to-sequence configuration.
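As a rough illustration of the oversampling idea mentioned in the abstract, the sketch below repeats a small set of gold pairs several times and mixes them with a larger set of synthetic pairs before training. It is not the authors' implementation: the function name `build_training_set`, the `oversample_factor` parameter, and the toy sentences are all assumptions made for illustration, and the abstract does not state how oversampling was actually configured.

```python
import random
from typing import List, Tuple

# A simplification pair: (complex sentence, simple sentence).
Pair = Tuple[str, str]


def build_training_set(
    gold: List[Pair],
    synthetic: List[Pair],
    oversample_factor: int = 1,  # hypothetical knob, not taken from the paper
    seed: int = 0,
) -> List[Pair]:
    """Repeat the gold pairs `oversample_factor` times, append the
    synthetic pairs, and shuffle the result deterministically."""
    if oversample_factor < 1:
        raise ValueError("oversample_factor must be >= 1")
    combined = gold * oversample_factor + synthetic
    random.Random(seed).shuffle(combined)
    return combined


# Toy data, invented for this example only.
gold_pairs = [
    ("The committee deliberated at length.", "The committee talked for a long time."),
]
synthetic_pairs = [
    ("The feline reposed upon the mat.", "The cat sat on the mat."),
    ("He commenced the journey at dawn.", "He started the trip at dawn."),
]

train = build_training_set(gold_pairs, synthetic_pairs, oversample_factor=3)
print(len(train))  # 3 copies of 1 gold pair + 2 synthetic pairs
```

Repeating the gold pairs keeps the manually created data from being swamped when the synthetic set is much larger; the right factor would have to be tuned on held-out data.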
Anthology ID:
W19-2305
Volume:
Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Editors:
Antoine Bosselut, Asli Celikyilmaz, Marjan Ghazvininejad, Srinivasan Iyer, Urvashi Khandelwal, Hannah Rashkin, Thomas Wolf
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
37–44
URL:
https://aclanthology.org/W19-2305
DOI:
10.18653/v1/W19-2305
Cite (ACL):
Alessio Palmero Aprosio, Sara Tonelli, Marco Turchi, Matteo Negri, and Mattia A. Di Gangi. 2019. Neural Text Simplification in Low-Resource Conditions Using Weak Supervision. In Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation, pages 37–44, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
Neural Text Simplification in Low-Resource Conditions Using Weak Supervision (Palmero Aprosio et al., NAACL 2019)
PDF:
https://aclanthology.org/W19-2305.pdf
Data
Newsela