HausaMT v1.0: Towards English–Hausa Neural Machine Translation

Adewale Akinfaderin


Abstract
Neural Machine Translation (NMT) for low-resource languages suffers from poor performance due to the lack of large amounts of parallel data and of language diversity. To help address this problem, we built a baseline model for English–Hausa machine translation, which is considered a low-resource translation task. Hausa is the second-largest Afro-Asiatic language in the world after Arabic, and it is the third-largest trade language across a large swath of West African countries, after English and French. In this paper, we curated several datasets containing Hausa–English parallel corpora for our translation experiments. We trained baseline models using recurrent and Transformer encoder–decoder architectures and evaluated their performance with two tokenization approaches: standard word-level tokenization and Byte Pair Encoding (BPE) subword tokenization.
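The abstract does not describe how the BPE subword vocabulary was built. The following is a minimal sketch of the standard BPE merge-learning procedure, offered only to illustrate how subword tokenization differs from word-level tokenization; it is not the author's implementation. The function names (`learn_bpe`, `segment`) and the toy English word counts are hypothetical and are not taken from the paper's data.

```python
import collections
import re


def get_pair_stats(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[a, b] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace each whole-symbol occurrence of `pair` with its concatenation."""
    merged = "".join(pair)
    # Lookarounds ensure we only match complete symbols, not parts of longer ones.
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub(lambda _m: merged, word): freq for word, freq in vocab.items()}


def learn_bpe(word_counts, num_merges):
    """Learn up to `num_merges` merge operations from word frequencies (hypothetical helper)."""
    # Start from characters, with an end-of-word marker so suffixes stay distinct.
    vocab = {" ".join(word) + " </w>": freq for word, freq in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_stats(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair; ties go to the first seen
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


def segment(word, merges):
    """Split a word into subwords by applying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for a, b in merges:
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == a and symbols[i + 1] == b:
                symbols[i:i + 2] = [a + b]
            else:
                i += 1
    return symbols


if __name__ == "__main__":
    counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    merges, vocab = learn_bpe(counts, num_merges=3)
    print(merges)
    print(segment("lowest", merges))
```

With these toy counts, the three learned merges are `('e', 's')`, `('es', 't')`, and `('est', '</w>')`, so an unseen word such as "lowest" is segmented as `['l', 'o', 'w', 'est</w>']`. A word-level tokenizer would instead map "lowest" to a single out-of-vocabulary token. Because subword units are shared across related word forms, this kind of segmentation is generally considered helpful for low-resource pairs such as English–Hausa.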
Anthology ID:
2020.winlp-1.38
Volume:
Proceedings of the Fourth Widening Natural Language Processing Workshop
Month:
July
Year:
2020
Address:
Seattle, USA
Editors:
Rossana Cunha, Samira Shaikh, Erika Varis, Ryan Georgi, Alicia Tsai, Antonios Anastasopoulos, Khyathi Raghavi Chandu
Venue:
WiNLP
Publisher:
Association for Computational Linguistics
Pages:
144–147
URL:
https://aclanthology.org/2020.winlp-1.38
DOI:
10.18653/v1/2020.winlp-1.38
Cite (ACL):
Adewale Akinfaderin. 2020. HausaMT v1.0: Towards English–Hausa Neural Machine Translation. In Proceedings of the Fourth Widening Natural Language Processing Workshop, pages 144–147, Seattle, USA. Association for Computational Linguistics.
Cite (Informal):
HausaMT v1.0: Towards English–Hausa Neural Machine Translation (Akinfaderin, WiNLP 2020)
Video:
http://slideslive.com/38929578