Offensive language detection in Arabic using ULMFiT

Mohamed Abdellatif, Ahmed Elgammal


Abstract
In this paper, we approach the shared task OffensEval 2020 (Mubarak et al., 2020) using ULMFiT (Howard and Ruder, 2018) pre-trained on Arabic Wikipedia (Khooli, 2019), which we use as a starting point and fine-tune on the target dataset. The task's dataset is highly imbalanced. We train forward and backward models and ensemble their results. We report the confusion matrix, accuracy, precision, recall, and F1 on the development set, and summarized results on the test set. Transfer learning with ULMFiT shows potential for Arabic text classification.

References
H. Mubarak, K. Darwish, W. Magdy, T. Elsayed, and H. Al-Khalifa. Overview of OSACT4 Arabic offensive language detection shared task. 4, 2020.
J. Howard and S. Ruder. Universal language model fine-tuning for text classification. arXiv preprint arXiv:1801.06146, 2018.
A. Khooli. Applied data science. https://github.com/abedkhooli/ds2, 2019.
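ULMFiT (Howard and Ruder, 2018) fine-tunes a pre-trained language model using discriminative (per-layer) learning rates and a slanted triangular learning-rate schedule. The abstract does not state which settings were used for this task, so the sketch below is only an illustration. It assumes the defaults given in the ULMFiT paper's description of these techniques (eta_max = 0.01, cut_frac = 0.1, ratio = 32, layer factor 2.6). The function names are hypothetical and do not belong to fastai or any other library.

```python
import math
from typing import List


def slanted_triangular_lr(t: int, total_steps: int, eta_max: float = 0.01,
                          cut_frac: float = 0.1, ratio: float = 32) -> float:
    """Learning rate at iteration t under ULMFiT's slanted triangular schedule.

    The rate rises linearly for the first cut_frac of training, then decays
    linearly; ratio is how much smaller the lowest rate is than eta_max.
    Default values are assumptions taken from Howard and Ruder (2018).
    """
    # max(1, ...) is an added guard for very short runs, not part of the paper's formula.
    cut = max(1, math.floor(total_steps * cut_frac))
    if t < cut:
        p = t / cut                                      # warm-up phase
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))   # decay phase
    return eta_max * (1 + p * (ratio - 1)) / ratio


def discriminative_lrs(top_lr: float, n_layers: int, factor: float = 2.6) -> List[float]:
    """Per-layer learning rates, lowest layer first.

    Each layer uses the rate of the layer above it divided by `factor`,
    i.e. eta^(l-1) = eta^l / factor; the top layer gets top_lr.
    """
    return [top_lr / factor ** (n_layers - 1 - i) for i in range(n_layers)]


if __name__ == "__main__":
    for step in (0, 50, 100, 550, 1000):
        print(step, slanted_triangular_lr(step, total_steps=1000))
    print(discriminative_lrs(0.01, n_layers=3))
```

With total_steps=1000 the rate starts at eta_max/ratio (0.0003125), peaks at 0.01 at step 100, and decays back to 0.0003125 at step 1000. The short warm-up followed by a long decay is the intended shape: the model moves quickly toward a good region for the target data and is then refined.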
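The abstract says forward and backward models are trained and their results ensembled, but it does not say how the outputs are combined. This sketch assumes a common choice with ULMFiT: average the two classifiers' predicted probabilities, where the backward model reads each text with its token order reversed. The helper names, the 0.5 threshold, and the OFF/NOT_OFF label strings are illustrative assumptions. The probabilities below are made up and stand in for real model outputs.

```python
from typing import List, Sequence


def backward_view(tokens: Sequence[str]) -> List[str]:
    """Token order seen by the backward model: the same text, reversed."""
    return list(reversed(tokens))


def ensemble_probs(p_fwd: Sequence[float], p_bwd: Sequence[float]) -> List[float]:
    """Average P(offensive) from the forward and backward classifiers, per example."""
    if len(p_fwd) != len(p_bwd):
        raise ValueError("forward and backward models must score the same examples")
    return [(f + b) / 2 for f, b in zip(p_fwd, p_bwd)]


def to_labels(probs: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Map averaged probabilities to labels with a fixed (assumed) decision threshold."""
    return ["OFF" if p >= threshold else "NOT_OFF" for p in probs]


if __name__ == "__main__":
    # Hypothetical probabilities for three examples, for illustration only.
    forward = [0.9, 0.2, 0.6]
    backward = [0.7, 0.4, 0.3]
    print(to_labels(ensemble_probs(forward, backward)))  # ['OFF', 'NOT_OFF', 'NOT_OFF']
```

The third example shows why an ensemble can help. The forward model alone (0.6) would flag it as offensive, but the backward model disagrees, and their average (about 0.45) falls below the threshold.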
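Because the dataset is highly imbalanced, a classifier that never predicts the minority class can still score high accuracy. This is why per-class precision, recall, and F1 are worth reporting alongside accuracy. The dependency-free sketch below computes the metrics named in the abstract for a two-label setup. The label strings and the macro-averaged F1 are illustrative assumptions, since the abstract does not say which averaging was used.

```python
from typing import Dict, List, Sequence, Tuple

LABELS = ("NOT_OFF", "OFF")  # assumed label strings; OFF is the minority class


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str] = LABELS) -> List[List[int]]:
    """Rows are gold labels, columns are predicted labels, both in `labels` order."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for gold, pred in zip(y_true, y_pred):
        matrix[index[gold]][index[pred]] += 1
    return matrix


def precision_recall_f1(matrix: List[List[int]], k: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for class k, treating every other class as negative."""
    tp = matrix[k][k]
    predicted_k = sum(row[k] for row in matrix)  # column sum: everything predicted as k
    actual_k = sum(matrix[k])                    # row sum: everything whose gold label is k
    precision = tp / predicted_k if predicted_k else 0.0
    recall = tp / actual_k if actual_k else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def evaluate(y_true: Sequence[str], y_pred: Sequence[str],
             labels: Sequence[str] = LABELS) -> Dict[str, object]:
    """Confusion matrix, accuracy, per-class (P, R, F1), and macro-averaged F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    matrix = confusion_matrix(y_true, y_pred, labels)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    per_class = {label: precision_recall_f1(matrix, i) for i, label in enumerate(labels)}
    macro_f1 = sum(f1 for _, _, f1 in per_class.values()) / len(labels)
    return {"confusion": matrix, "accuracy": correct / len(y_true),
            "per_class": per_class, "macro_f1": macro_f1}


if __name__ == "__main__":
    # Toy imbalanced set: 8 non-offensive, 2 offensive; the "classifier" always says NOT_OFF.
    gold = ["NOT_OFF"] * 8 + ["OFF"] * 2
    pred = ["NOT_OFF"] * 10
    result = evaluate(gold, pred)
    print(result["confusion"])           # [[8, 0], [2, 0]]
    print(result["accuracy"])            # 0.8
    print(result["per_class"]["OFF"])    # (0.0, 0.0, 0.0)
    print(round(result["macro_f1"], 3))  # 0.444
```

The toy run makes the imbalance problem concrete. The majority-class predictor reaches 0.8 accuracy, yet it finds none of the offensive examples, which the OFF row of the confusion matrix and its zero F1 expose immediately.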
Anthology ID:
2020.osact-1.13
Volume:
Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Hend Al-Khalifa, Walid Magdy, Kareem Darwish, Tamer Elsayed, Hamdy Mubarak
Venue:
OSACT
Publisher:
European Language Resource Association
Pages:
82–85
Language:
English
URL:
https://aclanthology.org/2020.osact-1.13
Cite (ACL):
Mohamed Abdellatif and Ahmed Elgammal. 2020. Offensive language detection in Arabic using ULMFiT. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection, pages 82–85, Marseille, France. European Language Resource Association.
Cite (Informal):
Offensive language detection in Arabic using ULMFiT (Abdellatif & Elgammal, OSACT 2020)
PDF:
https://aclanthology.org/2020.osact-1.13.pdf
Code:
abedkhooli/ds2