Mitigating Language-Dependent Ethnic Bias in BERT

Jaimeen Ahn, Alice Oh


Abstract
In this paper, we study ethnic bias and how it varies across languages by analyzing and mitigating ethnic bias in monolingual BERT for English, German, Spanish, Korean, Turkish, and Chinese. To observe and quantify ethnic bias, we develop a novel metric called the Categorical Bias score. We then propose two mitigation methods: the first uses a multilingual model, and the second uses contextual word alignment of two monolingual models. We compare our proposed methods with monolingual BERT and show that they effectively alleviate ethnic bias. Which of the two methods works better depends on the amount of NLP resources available for the language in question. We additionally experiment with Arabic and Greek to verify that our proposed methods work for a wider variety of languages.
Anthology ID:
2021.emnlp-main.42
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
533–549
URL:
https://aclanthology.org/2021.emnlp-main.42
DOI:
10.18653/v1/2021.emnlp-main.42
Cite (ACL):
Jaimeen Ahn and Alice Oh. 2021. Mitigating Language-Dependent Ethnic Bias in BERT. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 533–549, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Mitigating Language-Dependent Ethnic Bias in BERT (Ahn & Oh, EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.42.pdf
Video:
https://aclanthology.org/2021.emnlp-main.42.mp4
Code:
jaimeenahn/ethnic_bias
Data:
XNLI