Enhancement of Encoder and Attention Using Target Monolingual Corpora in Neural Machine Translation

Kenji Imamura, Atsushi Fujita, Eiichiro Sumita


Abstract
A large-scale parallel corpus is required to train encoder-decoder neural machine translation. The method of using synthetic parallel texts, in which target monolingual corpora are automatically translated into source sentences, is effective for improving the decoder but unreliable for enhancing the encoder. In this paper, we propose a method that enhances the encoder and attention using target monolingual corpora by generating multiple source sentences via sampling. Using multiple source sentences, the synthetic data achieve a diversity close to that of human translations. Our experimental results show that translation quality improves as the number of synthetic source sentences per target sentence increases, approaching the quality obtained with a manually created parallel corpus.
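The core idea of the abstract, drawing several sampled source sentences per target sentence and pairing each with the target, can be sketched as follows. This is an illustrative toy, not the paper's implementation: a hand-written per-position word distribution (`cond_dist`) stands in for the target-to-source NMT model that back-translation would normally use, and all names and data are hypothetical.

```python
import random

def sample_source(target_sentence, cond_dist, rng):
    """Sample one synthetic source sentence token by token from a
    toy per-position distribution (a stand-in for an NMT sampler)."""
    return [rng.choices(words, weights=probs)[0]
            for words, probs in cond_dist[target_sentence]]

def synthesize(targets, cond_dist, n_samples, seed=0):
    """For each target sentence, draw n_samples synthetic sources.

    The resulting (sampled source, target) pairs would be mixed with
    the genuine parallel data to train the source-to-target model.
    """
    rng = random.Random(seed)
    pairs = []
    for tgt in targets:
        for _ in range(n_samples):
            pairs.append((sample_source(tgt, cond_dist, rng), tgt))
    return pairs

# Toy distribution: for the target "er läuft", each source position has
# several candidate words, so sampling yields diverse synthetic sources.
cond_dist = {"er läuft": [(["he", "it"], [0.7, 0.3]),
                          (["runs", "walks", "jogs"], [0.5, 0.3, 0.2])]}
pairs = synthesize(["er läuft"], cond_dist, n_samples=4)
```

Because each draw is stochastic, the four synthetic sources for the same target can differ, which is what exposes the encoder and attention to varied source-side input.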
Anthology ID:
W18-2707
Volume:
Proceedings of the 2nd Workshop on Neural Machine Translation and Generation
Month:
July
Year:
2018
Address:
Melbourne, Australia
Venues:
ACL | WS
Publisher:
Association for Computational Linguistics
Pages:
55–63
URL:
https://www.aclweb.org/anthology/W18-2707
DOI:
10.18653/v1/W18-2707
PDF:
https://www.aclweb.org/anthology/W18-2707.pdf