A Bag of Useful Tricks for Practical Neural Machine Translation: Embedding Layer Initialization and Large Batch Size

Masato Neishi, Jin Sakuma, Satoshi Tohda, Shonosuke Ishiwatari, Naoki Yoshinaga, Masashi Toyoda


Abstract
In this paper, we describe the team UT-IIS’s system and results for the WAT 2017 translation tasks. We investigated several practical tricks: a novel technique for initializing embedding layers using only the parallel corpus, which increased the BLEU score by 1.28; a practical large batch size of 256; and insights regarding hyperparameter settings. Ultimately, our system obtained a better result than the state-of-the-art system of WAT 2016. Our code is available at https://github.com/nem6ishi/wat17.
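The abstract's embedding-initialization trick can be illustrated with a short sketch: train word2vec on one side of the parallel training data and copy the resulting vectors into the NMT model's embedding matrix. This is not the authors' released implementation (see the repository linked above), only a minimal illustration under assumptions: gensim 4.x for word2vec, a 512-dimensional embedding, the file name train.src, and the PyTorch copy step are all hypothetical choices.

import numpy as np
from gensim.models import Word2Vec

EMB_DIM = 512  # assumed embedding size; the paper's actual dimension may differ

def read_corpus(path):
    # Yield whitespace-tokenized sentences from one side of the parallel corpus.
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield line.split()

def build_embedding_matrix(corpus_path, vocab):
    # Train skip-gram word2vec on the given corpus side only (no external data),
    # then pack the vectors into a matrix aligned with the NMT vocabulary.
    w2v = Word2Vec(sentences=list(read_corpus(corpus_path)),
                   vector_size=EMB_DIM, window=5, min_count=1, sg=1)
    # Words missing from the word2vec vocabulary keep small random vectors.
    matrix = np.random.uniform(-0.1, 0.1, (len(vocab), EMB_DIM)).astype(np.float32)
    for idx, word in enumerate(vocab):
        if word in w2v.wv:
            matrix[idx] = w2v.wv[word]
    return matrix

# Hypothetical usage with a PyTorch encoder embedding layer:
#   import torch, torch.nn as nn
#   enc_emb = nn.Embedding(len(src_vocab), EMB_DIM)
#   weights = build_embedding_matrix("train.src", src_vocab)
#   enc_emb.weight.data.copy_(torch.from_numpy(weights))

The same procedure would be repeated on the target side of the parallel corpus to initialize the decoder's embedding layer.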
Anthology ID:
W17-5708
Volume:
Proceedings of the 4th Workshop on Asian Translation (WAT2017)
Month:
November
Year:
2017
Address:
Taipei, Taiwan
Venues:
WAT | WS
Publisher:
Asian Federation of Natural Language Processing
Pages:
99–109
URL:
https://www.aclweb.org/anthology/W17-5708
PDF:
https://www.aclweb.org/anthology/W17-5708.pdf