Automatic Interlinear Glossing for Under-Resourced Languages Leveraging Translations

Xingyuan Zhao, Satoru Ozaki, Antonios Anastasopoulos, Graham Neubig, Lori Levin


Abstract
Interlinear Glossed Text (IGT) is a widely used format for encoding linguistic information in language documentation projects and scholarly papers. Manual production of IGT takes time and requires linguistic expertise. We attempt to address this issue by creating automatic glossing models, using modern multi-source neural models that additionally leverage easy-to-collect translations. We further explore cross-lingual transfer and a simple output length control mechanism, further refining our models. Evaluated on three challenging low-resource scenarios, our approach significantly outperforms a recent, state-of-the-art baseline, particularly improving on overall accuracy as well as lemma and tag recall.
Anthology ID:
2020.coling-main.471
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
SIG:
Publisher:
International Committee on Computational Linguistics
Note:
Pages:
5397–5408
Language:
URL:
https://aclanthology.org/2020.coling-main.471
DOI:
10.18653/v1/2020.coling-main.471
Bibkey:
Cite (ACL):
Xingyuan Zhao, Satoru Ozaki, Antonios Anastasopoulos, Graham Neubig, and Lori Levin. 2020. Automatic Interlinear Glossing for Under-Resourced Languages Leveraging Translations. In Proceedings of the 28th International Conference on Computational Linguistics, pages 5397–5408, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
Automatic Interlinear Glossing for Under-Resourced Languages Leveraging Translations (Zhao et al., COLING 2020)
Copy Citation:
PDF:
https://aclanthology.org/2020.coling-main.471.pdf