Improving NMT Quality Using Terminology Injection

Duane K. Dougal, Deryle Lonsdale


Abstract
Many organizations use domain- or organization-specific words and phrases. This paper explores the use of vetted terminology as an input to neural machine translation (NMT) for improved results: ensuring that the translation of individual terms is consistent with an approved multilingual terminology collection. We discuss, implement, and evaluate a method for injecting terminology and for evaluating terminology injection. Our use of the long short-term memory (LSTM) attention mechanism prevalent in state-of-the-art NMT systems involves attention vectors for correctly identifying semantic entities and aligning the tokens that represent them, both in the source and the target languages. Appropriate terminology is then injected into matching alignments during decoding. We also introduce a new translation metric more sensitive to approved terminological content in MT output.
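The injection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the decoder exposes one attention vector over the source tokens per output step, treats the source token with the highest attention weight as the alignment, and substitutes the approved term whenever that source token appears in a termbase. The function name `inject_terminology`, the single-token termbase, and the toy sentence and attention values are all hypothetical.

```python
from typing import Dict, List, Sequence


def inject_terminology(
    source_tokens: Sequence[str],
    proposed_tokens: Sequence[str],
    attention: Sequence[Sequence[float]],
    termbase: Dict[str, str],
) -> List[str]:
    """Replace proposed target tokens with approved terms.

    attention[t][s] is the (assumed) attention weight that output step t
    places on source token s. Multi-token terms are not handled here.
    """
    output: List[str] = []
    for step, proposed in enumerate(proposed_tokens):
        weights = attention[step]
        # Treat the most-attended source token as the alignment for this step.
        aligned = max(range(len(source_tokens)), key=lambda i: weights[i])
        source_word = source_tokens[aligned]
        # Inject the approved term if the aligned source token is in the termbase;
        # otherwise keep the token the model proposed.
        output.append(termbase.get(source_word, proposed))
    return output


# Toy data, invented for illustration only.
source = ["the", "ward", "gathers"]
proposed = ["la", "salle", "se", "rassemble"]
attention = [
    [0.90, 0.06, 0.04],  # "la"        -> "the"
    [0.05, 0.90, 0.05],  # "salle"     -> "ward"
    [0.05, 0.10, 0.85],  # "se"        -> "gathers"
    [0.03, 0.07, 0.90],  # "rassemble" -> "gathers"
]
termbase = {"ward": "paroisse"}  # hypothetical approved term

result = inject_terminology(source, proposed, attention, termbase)
print(" ".join(result))
```

A real system would also need to handle multi-token terms, subword segmentation, and agreement with the surrounding words, none of which this sketch attempts.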
Anthology ID:
2020.lrec-1.593
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
4820–4827
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.593
Cite (ACL):
Duane K. Dougal and Deryle Lonsdale. 2020. Improving NMT Quality Using Terminology Injection. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 4820–4827, Marseille, France. European Language Resources Association.
Cite (Informal):
Improving NMT Quality Using Terminology Injection (Dougal & Lonsdale, LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.593.pdf