A Comparison of Various Types of Extended Lexicon Models for Statistical Machine Translation

Matthias Huck, Martin Ratajczak, Patrick Lehnen, Hermann Ney


Abstract
In this work we give a detailed comparison of the impact of the integration of discriminative and trigger-based lexicon models in state-of-the-art hierarchical and conventional phrase-based statistical machine translation systems. As both types of extended lexicon models can grow very large, we apply certain restrictions to discard some of the less useful information. We show how these restrictions facilitate the training of the extended lexicon models. We finally evaluate systems that incorporate both types of models with different restrictions on a large-scale translation task for the Arabic-English language pair. Our results suggest that extended lexicon models can be substantially reduced in size while still giving clear improvements in translation performance.
Anthology ID:
2010.amta-papers.32
Volume:
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers
Month:
October 31-November 4
Year:
2010
Address:
Denver, Colorado, USA
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
URL:
https://aclanthology.org/2010.amta-papers.32
Cite (ACL):
Matthias Huck, Martin Ratajczak, Patrick Lehnen, and Hermann Ney. 2010. A Comparison of Various Types of Extended Lexicon Models for Statistical Machine Translation. In Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers, Denver, Colorado, USA. Association for Machine Translation in the Americas.
Cite (Informal):
A Comparison of Various Types of Extended Lexicon Models for Statistical Machine Translation (Huck et al., AMTA 2010)
PDF:
https://aclanthology.org/2010.amta-papers.32.pdf