Character-level Chinese-English Translation through ASCII Encoding

Nikola I. Nikolov, Yuhuang Hu, Mi Xue Tan, Richard H.R. Hahnloser


Abstract
Character-level Neural Machine Translation (NMT) models have recently achieved impressive results on many language pairs. They mainly do well for Indo-European language pairs, where the languages share the same writing system. However, for translating between Chinese and English, the gap between the two different writing systems poses a major challenge because of a lack of systematic correspondence between the individual linguistic units. In this paper, we enable character-level NMT for Chinese by breaking down Chinese characters into linguistic units similar to those of Indo-European languages. We use the Wubi encoding scheme, which preserves the original shape and semantic information of the characters, while also being reversible. We show promising results from training Wubi-based models at the character and subword levels with recurrent as well as convolutional models.
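The idea of a reversible ASCII encoding can be illustrated with a minimal sketch. It is not the authors' implementation: the names `TOY_TABLE`, `encode`, and `decode` are hypothetical, the two table entries are illustrative stand-ins rather than verified Wubi codes, and the sketch assumes a one-to-one table and input text that contains no spaces.

```python
# Hypothetical toy table mapping a character to an ASCII code.
# The entries are illustrative placeholders, not verified Wubi codes;
# a real system would load a complete Wubi dictionary instead.
TOY_TABLE = {"中": "khk", "国": "lgyi"}

# Reverse lookup; this is only well defined because the toy table is one-to-one.
REVERSE_TABLE = {code: char for char, code in TOY_TABLE.items()}


def encode(text: str) -> str:
    """Replace each mapped character with its ASCII code, separated by spaces."""
    return " ".join(TOY_TABLE.get(ch, ch) for ch in text)


def decode(encoded: str) -> str:
    """Invert encode() by mapping each space-separated code back to its character."""
    return "".join(REVERSE_TABLE.get(tok, tok) for tok in encoded.split(" "))


encoded = encode("中国")
print(encoded)          # ASCII-only sequence a character-level model could consume
print(decode(encoded))  # round trip back to the original characters
```

The space separator is what keeps this toy encoding invertible; how real code collisions and whitespace are handled is left open here.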
Anthology ID:
W18-6302
Volume:
Proceedings of the Third Conference on Machine Translation: Research Papers
Month:
October
Year:
2018
Address:
Brussels, Belgium
Editors:
Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Lucia Specia, Marco Turchi, Karin Verspoor
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
10–16
URL:
https://aclanthology.org/W18-6302
DOI:
10.18653/v1/W18-6302
Cite (ACL):
Nikola I. Nikolov, Yuhuang Hu, Mi Xue Tan, and Richard H.R. Hahnloser. 2018. Character-level Chinese-English Translation through ASCII Encoding. In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 10–16, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Character-level Chinese-English Translation through ASCII Encoding (Nikolov et al., WMT 2018)
PDF:
https://aclanthology.org/W18-6302.pdf
Code:
duguyue100/wmt-en2wubi