CoLi at UdS at SemEval-2020 Task 12: Offensive Tweet Detection with Ensembling

Kathryn Chapman, Johannes Bernhard, Dietrich Klakow


Abstract
We present our submission and results for SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020), in which we participated in the offensive tweet classification tasks for English, Arabic, Greek, Turkish and Danish. Our approach combines classical machine learning models, such as support vector machines and logistic regression, in an ensemble with a multilingual transformer-based model (XLM-R). The transformer model is trained on all languages combined, creating a single multilingual model that can share knowledge across languages. The hyperparameters of the classical machine learning models are tuned, and the statistically best-performing models are included in the final ensemble.
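The abstract does not say how the ensemble combines the predictions of its members. As an illustration only, the sketch below assumes hard majority voting over binary labels (written here as `OFF` and `NOT`), with each model producing one label per tweet. The function name `majority_vote` and the tie-breaking rule are hypothetical choices, not the authors' method.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-model label lists by hard majority voting (illustrative sketch).

    predictions[m][i] is the label that model m assigns to tweet i.
    Hypothetical tie-breaking: among tied labels, the one predicted by the
    earliest model in `predictions` wins.
    """
    if not predictions:
        raise ValueError("need at least one model")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must label the same tweets")

    combined = []
    for i in range(n):
        votes = [p[i] for p in predictions]  # one vote per model for tweet i
        counts = Counter(votes)
        top = max(counts.values())
        # Walk votes in model order, so ties go to the earliest model's label.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


# Hypothetical outputs of three ensemble members on four tweets.
svm = ["OFF", "NOT", "NOT", "OFF"]
logreg = ["OFF", "OFF", "NOT", "NOT"]
xlmr = ["NOT", "OFF", "NOT", "OFF"]
print(majority_vote([svm, logreg, xlmr]))  # ['OFF', 'OFF', 'NOT', 'OFF']
```

With three members and two labels, a tie cannot occur. The tie-breaking rule only matters for an even number of models. Soft voting, which averages predicted probabilities instead of counting labels, is a common alternative.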
Anthology ID:
2020.semeval-1.252
Volume:
Proceedings of the Fourteenth Workshop on Semantic Evaluation
Month:
December
Year:
2020
Address:
Barcelona (online)
Editors:
Aurelie Herbelot, Xiaodan Zhu, Alexis Palmer, Nathan Schneider, Jonathan May, Ekaterina Shutova
Venue:
SemEval
SIG:
SIGLEX
Publisher:
International Committee for Computational Linguistics
Pages:
1916–1924
URL:
https://aclanthology.org/2020.semeval-1.252
DOI:
10.18653/v1/2020.semeval-1.252
Cite (ACL):
Kathryn Chapman, Johannes Bernhard, and Dietrich Klakow. 2020. CoLi at UdS at SemEval-2020 Task 12: Offensive Tweet Detection with Ensembling. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 1916–1924, Barcelona (online). International Committee for Computational Linguistics.
Cite (Informal):
CoLi at UdS at SemEval-2020 Task 12: Offensive Tweet Detection with Ensembling (Chapman et al., SemEval 2020)
PDF:
https://aclanthology.org/2020.semeval-1.252.pdf