GTCOM Neural Machine Translation Systems for WMT20

Chao Bei, Hao Zong, Qingmin Liu, Conghu Yuan


Abstract
This paper describes Global Tone Communication Co., Ltd.’s submission to the WMT20 shared news translation task. We participate in four directions: English to Khmer, English to Pashto, Khmer to English, and Pashto to English. Our systems achieve the best BLEU scores among all participants in English to Pashto, Pashto to English, and Khmer to English (13.1, 23.1, and 25.5, respectively). The submitted systems are unconstrained and build on mBART (Multilingual Bidirectional and Auto-Regressive Transformers), back-translation, and forward-translation. We also apply rules, a language model, and a RoBERTa model to filter monolingual, parallel, and synthetic sentences. Finally, we examine the difference between vocabularies built from monolingual data and from parallel data.
Anthology ID:
2020.wmt-1.6
Volume:
Proceedings of the Fifth Conference on Machine Translation
Month:
November
Year:
2020
Address:
Online
Editors:
Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Yvette Graham, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, André Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
100–104
URL:
https://aclanthology.org/2020.wmt-1.6
Cite (ACL):
Chao Bei, Hao Zong, Qingmin Liu, and Conghu Yuan. 2020. GTCOM Neural Machine Translation Systems for WMT20. In Proceedings of the Fifth Conference on Machine Translation, pages 100–104, Online. Association for Computational Linguistics.
Cite (Informal):
GTCOM Neural Machine Translation Systems for WMT20 (Bei et al., WMT 2020)
PDF:
https://aclanthology.org/2020.wmt-1.6.pdf
Video:
https://slideslive.com/38939603
Data
FLoRes