Benchmarking Transformer-based Language Models for Arabic Sentiment and Sarcasm Detection

Ibrahim Abu Farha, Walid Magdy


Abstract
The introduction of transformer-based language models has been a revolutionary step for natural language processing (NLP) research. These models, such as BERT, GPT and ELECTRA, have led to state-of-the-art performance on many NLP tasks. Most of these models were initially developed for English; versions for other languages followed later. Recently, several Arabic-specific models have started to emerge, but direct comparisons between them remain limited. In this paper, we evaluate the performance of 24 of these models on Arabic sentiment and sarcasm detection. Our results show that the best-performing models are those trained exclusively on Arabic data, including dialectal Arabic, and that use a larger number of parameters, such as the recently released MARBERT. However, we found that AraELECTRA is among the top-performing models while being far more computationally efficient. Finally, experiments on the AraGPT2 variants showed low performance compared to the BERT-style models, indicating that AraGPT2 may not be well suited to classification tasks.
Anthology ID:
2021.wanlp-1.3
Volume:
Proceedings of the Sixth Arabic Natural Language Processing Workshop
Month:
April
Year:
2021
Address:
Kyiv, Ukraine (Virtual)
Editors:
Nizar Habash, Houda Bouamor, Hazem Hajj, Walid Magdy, Wajdi Zaghouani, Fethi Bougares, Nadi Tomeh, Ibrahim Abu Farha, Samia Touileb
Venue:
WANLP
Publisher:
Association for Computational Linguistics
Pages:
21–31
URL:
https://aclanthology.org/2021.wanlp-1.3
Cite (ACL):
Ibrahim Abu Farha and Walid Magdy. 2021. Benchmarking Transformer-based Language Models for Arabic Sentiment and Sarcasm Detection. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pages 21–31, Kyiv, Ukraine (Virtual). Association for Computational Linguistics.
Cite (Informal):
Benchmarking Transformer-based Language Models for Arabic Sentiment and Sarcasm Detection (Abu Farha & Magdy, WANLP 2021)
PDF:
https://aclanthology.org/2021.wanlp-1.3.pdf
Data
ArSarcasm, ArSarcasm-v2, OSCAR