A Bag of Useful Tricks for Practical Neural Machine Translation: Embedding Layer Initialization and Large Batch Size

Masato Neishi, Jin Sakuma, Satoshi Tohda, Shonosuke Ishiwatari, Naoki Yoshinaga, Masashi Toyoda


Abstract
In this paper, we describe the team UT-IIS’s system and results for the WAT 2017 translation tasks. We further investigated several tricks, including a novel technique for initializing embedding layers using only the parallel corpus (which increased the BLEU score by 1.28), found a practical large batch size of 256, and gained insights into hyperparameter settings. Ultimately, our system obtained a better result than the state-of-the-art system of WAT 2016. Our code is available at https://github.com/nem6ishi/wat17.
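The embedding-initialization trick named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (see nem6ishi/wat17 for that): it assumes the approach of pre-training word2vec vectors on one side of the parallel training corpus itself and copying them into the NMT model's embedding matrix before training. The file name, vocabulary handling, and PyTorch layer are placeholders.

```python
# Hypothetical sketch: pre-train word2vec on the parallel corpus itself,
# then copy the vectors into an NMT embedding layer before training.
import numpy as np
import torch
import torch.nn as nn
from gensim.models import Word2Vec

EMB_DIM = 512  # assumed embedding size

# 1. Train word2vec on one side of the parallel training corpus.
#    "train.src" is a placeholder: one tokenized sentence per line.
sentences = [line.split() for line in open("train.src", encoding="utf-8")]
w2v = Word2Vec(sentences, vector_size=EMB_DIM, window=5, min_count=1, workers=4)

# 2. Build the NMT vocabulary (here, simply all word2vec words plus specials).
specials = ["<pad>", "<unk>", "<s>", "</s>"]
itos = specials + list(w2v.wv.index_to_key)
stoi = {w: i for i, w in enumerate(itos)}

# 3. Fill the embedding matrix: pre-trained vectors where available,
#    small random values for the special tokens.
weights = np.random.uniform(-0.1, 0.1, (len(itos), EMB_DIM)).astype("float32")
for w in w2v.wv.index_to_key:
    weights[stoi[w]] = w2v.wv[w]

embedding = nn.Embedding(len(itos), EMB_DIM, padding_idx=stoi["<pad>"])
embedding.weight.data.copy_(torch.from_numpy(weights))
# "embedding" now replaces the randomly initialized encoder embedding layer;
# the same procedure applies to the target side for the decoder.
```

The same pipeline, applied once per language side, is all the trick requires; no data beyond the parallel corpus is used.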
Anthology ID:
W17-5708
Volume:
Proceedings of the 4th Workshop on Asian Translation (WAT2017)
Month:
November
Year:
2017
Address:
Taipei, Taiwan
Editors:
Toshiaki Nakazawa, Isao Goto
Venue:
WAT
Publisher:
Asian Federation of Natural Language Processing
Pages:
99–109
URL:
https://aclanthology.org/W17-5708
Cite (ACL):
Masato Neishi, Jin Sakuma, Satoshi Tohda, Shonosuke Ishiwatari, Naoki Yoshinaga, and Masashi Toyoda. 2017. A Bag of Useful Tricks for Practical Neural Machine Translation: Embedding Layer Initialization and Large Batch Size. In Proceedings of the 4th Workshop on Asian Translation (WAT2017), pages 99–109, Taipei, Taiwan. Asian Federation of Natural Language Processing.
Cite (Informal):
A Bag of Useful Tricks for Practical Neural Machine Translation: Embedding Layer Initialization and Large Batch Size (Neishi et al., WAT 2017)
PDF:
https://aclanthology.org/W17-5708.pdf
Code:
nem6ishi/wat17
Data:
ASPEC