An Algerian Corpus and an Annotation Platform for Opinion and Emotion Analysis

Leila Moudjari; Karima Akli-Astouati; Farah Benamara

Abstract

In this paper, we address the lack of resources for opinion and emotion analysis in North African dialects, targeting the Algerian dialect. We present TWIFIL (TWItter proFILing), a collaborative annotation platform for crowdsourcing the annotation of tweets at different levels of granularity. The platform allowed the creation of the largest Algerian dialect dataset annotated for sentiment (9,000 tweets), emotion (about 5,000 tweets), and extra-linguistic information including author profiling (age and gender). The annotation also resulted in the creation of the largest Algerian dialect subjectivity lexicon, with about 9,000 entries, which can constitute a valuable resource for the development of future NLP applications for the Algerian dialect. To test the validity of the dataset, a set of deep learning experiments was conducted to classify a given tweet as positive, negative, or neutral. We discuss our results and provide an error analysis to better identify classification errors.

Anthology ID: 2020.lrec-1.151
Volume: Proceedings of the Twelfth Language Resources and Evaluation Conference
Month: May
Year: 2020
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 1202–1210
Language: English
URL: https://aclanthology.org/2020.lrec-1.151
Cite (ACL): Leila Moudjari, Karima Akli-Astouati, and Farah Benamara. 2020. An Algerian Corpus and an Annotation Platform for Opinion and Emotion Analysis. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 1202–1210, Marseille, France. European Language Resources Association.
Cite (Informal): An Algerian Corpus and an Annotation Platform for Opinion and Emotion Analysis (Moudjari et al., LREC 2020)
PDF: https://aclanthology.org/2020.lrec-1.151.pdf
