Racial Bias in Hate Speech and Abusive Language Detection Datasets

Thomas Davidson, Debasmita Bhattacharya, Ingmar Weber


Abstract
Technologies for abusive language detection are being developed and applied with little consideration of their potential biases. We examine racial bias in five different sets of Twitter data annotated for hate speech and abusive language. We train classifiers on these datasets and compare the predictions of these classifiers on tweets written in African-American English with those written in Standard American English. The results show evidence of systematic racial bias in all datasets, as classifiers trained on them tend to predict that tweets written in African-American English are abusive at substantially higher rates. If these abusive language detection systems are used in the field, they will therefore have a disproportionate negative impact on African-American social media users. Consequently, these systems may discriminate against the groups who are often the targets of the abuse we are trying to detect.
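The comparison the abstract describes amounts to measuring, per dialect group, how often a trained classifier labels tweets as abusive. The following minimal Python sketch illustrates that measurement; it is not the authors' code, and the tweets, dialect labels, and classify function are hypothetical placeholders standing in for real data and a trained model.

from collections import defaultdict

def abusive_rate_by_dialect(tweets, dialects, classify):
    """Return the fraction of tweets per dialect group predicted as abusive.

    tweets   -- list of tweet texts
    dialects -- parallel list of dialect labels, e.g. "AAE" or "SAE"
    classify -- function mapping a tweet text to a predicted label
    """
    counts = defaultdict(lambda: [0, 0])  # dialect -> [abusive count, total count]
    for text, dialect in zip(tweets, dialects):
        counts[dialect][1] += 1
        if classify(text) == "abusive":
            counts[dialect][0] += 1
    return {d: abusive / total for d, (abusive, total) in counts.items()}

# Toy usage with a placeholder classifier (a trained model would go here):
tweets = ["example tweet one", "example tweet two"]
dialects = ["AAE", "SAE"]
print(abusive_rate_by_dialect(tweets, dialects, lambda t: "not abusive"))

A gap between the resulting rates for the two dialect groups is the kind of disparity the paper reports across all five datasets.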
Anthology ID:
W19-3504
Volume:
Proceedings of the Third Workshop on Abusive Language Online
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Sarah T. Roberts, Joel Tetreault, Vinodkumar Prabhakaran, Zeerak Waseem
Venue:
ALW
Publisher:
Association for Computational Linguistics
Pages:
25–35
URL:
https://aclanthology.org/W19-3504
DOI:
10.18653/v1/W19-3504
Cite (ACL):
Thomas Davidson, Debasmita Bhattacharya, and Ingmar Weber. 2019. Racial Bias in Hate Speech and Abusive Language Detection Datasets. In Proceedings of the Third Workshop on Abusive Language Online, pages 25–35, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Racial Bias in Hate Speech and Abusive Language Detection Datasets (Davidson et al., ALW 2019)
PDF:
https://aclanthology.org/W19-3504.pdf
Data:
Hate Speech and Offensive Language