Semi-supervised Fine-grained Approach for Arabic dialect detection task

Nitin Nikamanth Appiah Balaji, Bharathi B


Abstract
Because Arabic has numerous dialects, it is important to devise techniques that distinguish between them efficiently. This paper focuses on fine-grained classification of Arabic dialects at the country level and the province level. The experiments described here are our submissions to the NADI 2020 shared task on dialect detection. Several text feature extraction techniques are studied, including TF-IDF, AraVec, multilingual BERT and fastText embeddings. We propose a text-embedding-based model that, with the help of a semi-supervised learning approach, achieves a macro-averaged F1 score of 0.2232 on Task 1 and 0.0483 on Task 2.
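The scores above are macro-averaged F1, which gives every dialect label equal weight regardless of how many examples it has. The following is a minimal pure-Python sketch of that metric, assuming the common convention of averaging over every label that appears in either the gold or the predicted list. The function name `macro_f1` and the toy country labels are illustrative only; they are not taken from the paper or from the official NADI scorer.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1          # correct prediction for this class
        else:
            fp[pred] += 1          # predicted class gets a false positive
            fn[gold] += 1          # gold class gets a false negative
    f1_scores = []
    for label in labels:
        prec_den = tp[label] + fp[label]
        rec_den = tp[label] + fn[label]
        precision = tp[label] / prec_den if prec_den else 0.0
        recall = tp[label] / rec_den if rec_den else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    # Macro average: each label counts equally, however rare it is.
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy example: a classifier that always predicts the majority class.
y_true = ["Egypt", "Egypt", "Egypt", "Egypt", "Iraq", "Oman"]
y_pred = ["Egypt"] * 6
print(round(macro_f1(y_true, y_pred), 3))  # 0.267
```

In this toy case the majority-class predictor is right on 4 of 6 examples, yet its macro F1 is only about 0.267 because the two rare labels each contribute an F1 of 0. This illustrates how the metric penalizes systems that neglect infrequent labels, which matters most when the label set is large and fine-grained.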
Anthology ID: 2020.wanlp-1.25
Volume: Proceedings of the Fifth Arabic Natural Language Processing Workshop
Month: December
Year: 2020
Address: Barcelona, Spain (Online)
Editors: Imed Zitouni, Muhammad Abdul-Mageed, Houda Bouamor, Fethi Bougares, Mahmoud El-Haj, Nadi Tomeh, Wajdi Zaghouani
Venue: WANLP
Publisher: Association for Computational Linguistics
Pages: 257–261
URL: https://aclanthology.org/2020.wanlp-1.25
Cite (ACL): Nitin Nikamanth Appiah Balaji and Bharathi B. 2020. Semi-supervised Fine-grained Approach for Arabic dialect detection task. In Proceedings of the Fifth Arabic Natural Language Processing Workshop, pages 257–261, Barcelona, Spain (Online). Association for Computational Linguistics.
Cite (Informal): Semi-supervised Fine-grained Approach for Arabic dialect detection task (Appiah Balaji & B, WANLP 2020)
PDF: https://aclanthology.org/2020.wanlp-1.25.pdf