Learning Representations for Detecting Abusive Language

Magnus Sahlgren, Tim Isbister, Fredrik Olsson


Abstract
This paper addresses the question of whether it is possible to learn a generic representation that is useful for detecting various types of abusive language. The approach is inspired by recent advances in transfer learning and word embeddings, and we learn representations from two different datasets containing various degrees of abusive language. We compare the learned representation with two standard approaches: one based on lexica, and one based on data-specific n-grams. Our experiments show that learned representations do contain useful information that can be used to improve detection performance when training data is limited.
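The abstract mentions a data-specific n-gram baseline as one of the comparison approaches. The paper's exact configuration is not given here; as a rough illustrative sketch (not the authors' setup), a character n-gram feature extractor for such a baseline might look like:

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Extract overlapping character n-grams from a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def ngram_features(text, n=3):
    """Bag-of-character-n-grams counts, a simple data-specific
    baseline representation (illustrative, not the paper's exact method)."""
    return Counter(char_ngrams(text.lower(), n))

# Example: feature counts for a short string
feats = ngram_features("abusive text", n=3)
```

Feature vectors like these are typically fed to a linear classifier; the learned representations studied in the paper would replace this hand-crafted feature stage.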
Anthology ID:
W18-5115
Volume:
Proceedings of the 2nd Workshop on Abusive Language Online (ALW2)
Month:
October
Year:
2018
Address:
Brussels, Belgium
Editors:
Darja Fišer, Ruihong Huang, Vinodkumar Prabhakaran, Rob Voigt, Zeerak Waseem, Jacqueline Wernimont
Venue:
ALW
Publisher:
Association for Computational Linguistics
Pages:
115–123
URL:
https://aclanthology.org/W18-5115
DOI:
10.18653/v1/W18-5115
Cite (ACL):
Magnus Sahlgren, Tim Isbister, and Fredrik Olsson. 2018. Learning Representations for Detecting Abusive Language. In Proceedings of the 2nd Workshop on Abusive Language Online (ALW2), pages 115–123, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Learning Representations for Detecting Abusive Language (Sahlgren et al., ALW 2018)
PDF:
https://aclanthology.org/W18-5115.pdf