CyberTronics at SemEval-2020 Task 12: Multilingual Offensive Language Identification over Social Media

Sayanta Paul, Sriparna Saha, Mohammed Hasanuzzaman


Abstract
The SemEval-2020 Task 12 (OffensEval) challenge focuses on detecting offensive language in social media posts and comments. The task was organized for several languages, including Arabic, Danish, English, Greek and Turkish. For English, it featured three related sub-tasks: sub-task A was to discriminate between offensive and non-offensive posts, sub-task B focused on the type of offensive content in a post, and sub-task C required systems to identify the target of an offensive post. The corpus for each language was built from posts and comments on Twitter, a popular social media platform. We participated in this challenge and submitted results for several languages. This work presents a range of machine learning and deep learning techniques, covering various classifiers and feature engineering schemes, and analyzes their performance on offensiveness prediction. Experimental analysis on the training set shows that an SVM using language-specific pre-trained word embeddings (fastText) outperforms the other methods. Our system achieves a macro-averaged F1 score of 0.45 for Arabic, 0.43 for Greek and 0.54 for Turkish.
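For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of how a macro-averaged F1 score can be computed for a binary offensive/non-offensive task. It is not the authors' or the organizers' evaluation code; the function name `macro_f1`, the label strings, and the toy predictions are illustrative assumptions.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of the per-class F1 scores (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    labels = sorted(set(y_true) | set(y_pred))
    per_class_f1 = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        per_class_f1.append(2 * tp / denom if denom else 0.0)

    # Every class counts equally, regardless of how many posts it has.
    return sum(per_class_f1) / len(per_class_f1)


# Toy data (made up for illustration): "OFF" = offensive, "NOT" = not offensive.
gold = ["OFF", "NOT", "NOT", "NOT", "OFF", "NOT"]
pred = ["OFF", "NOT", "NOT", "OFF", "NOT", "NOT"]
print(macro_f1(gold, pred))  # F1(OFF) = 0.5, F1(NOT) = 0.75, macro = 0.625
```

Because each class contributes equally to the average, a system that performs poorly on the minority offensive class is penalized even when its overall accuracy is high.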
Anthology ID:
2020.semeval-1.253
Volume:
Proceedings of the Fourteenth Workshop on Semantic Evaluation
Month:
December
Year:
2020
Address:
Barcelona (online)
Editors:
Aurelie Herbelot, Xiaodan Zhu, Alexis Palmer, Nathan Schneider, Jonathan May, Ekaterina Shutova
Venue:
SemEval
SIG:
SIGLEX
Publisher:
International Committee for Computational Linguistics
Pages:
1925–1931
URL:
https://aclanthology.org/2020.semeval-1.253
DOI:
10.18653/v1/2020.semeval-1.253
Cite (ACL):
Sayanta Paul, Sriparna Saha, and Mohammed Hasanuzzaman. 2020. CyberTronics at SemEval-2020 Task 12: Multilingual Offensive Language Identification over Social Media. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 1925–1931, Barcelona (online). International Committee for Computational Linguistics.
Cite (Informal):
CyberTronics at SemEval-2020 Task 12: Multilingual Offensive Language Identification over Social Media (Paul et al., SemEval 2020)
PDF:
https://aclanthology.org/2020.semeval-1.253.pdf