Multilingual AMR-to-Text Generation

Angela Fan, Claire Gardent


Abstract
Generating text from structured data is challenging because it requires bridging the gap between (i) structure and natural language (NL) and (ii) semantically underspecified input and fully specified NL output. Multilingual generation brings in an additional challenge: that of generating into languages with varied word order and morphological properties. In this work, we focus on Abstract Meaning Representations (AMRs) as structured input, where previous research has overwhelmingly focused on generating only into English. We leverage advances in cross-lingual embeddings, pretraining, and multilingual models to create multilingual AMR-to-text models that generate in twenty-one different languages. In eighteen of these languages, our multilingual models surpass baselines that generate into a single language, based on automatic metrics. We analyze the ability of our multilingual models to accurately capture morphology and word order using human evaluation, and find that native speakers judge our generations to be fluent.
Anthology ID:
2020.emnlp-main.231
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Editors:
Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
2889–2901
URL:
https://aclanthology.org/2020.emnlp-main.231
DOI:
10.18653/v1/2020.emnlp-main.231
Cite (ACL):
Angela Fan and Claire Gardent. 2020. Multilingual AMR-to-Text Generation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2889–2901, Online. Association for Computational Linguistics.
Cite (Informal):
Multilingual AMR-to-Text Generation (Fan & Gardent, EMNLP 2020)
PDF:
https://aclanthology.org/2020.emnlp-main.231.pdf
Video:
https://slideslive.com/38938791
Data
CCNet