Unequal Representations: Analyzing Intersectional Biases in Word Embeddings Using Representational Similarity Analysis

We present a new approach for detecting human-like social biases in word embeddings using representational similarity analysis. Specifically, we probe contextualized and non-contextualized embeddings for evidence of intersectional biases against Black women. We show that these embeddings represent Black women as simultaneously less feminine than White women and less Black than Black men. This finding aligns with intersectionality theory, which argues that multiple identity categories (such as race or sex) combine to create unique modes of discrimination that are not shared by any individual category.


Introduction
Word embeddings are ubiquitous components of modern natural language processing systems, and have enabled performance increases on a wide variety of NLP tasks (Devlin et al., 2019; Peters et al., 2018; Pennington et al., 2014). They provide vector representations of words, and are trained on a large corpus of text in order to capture the distributional semantics of lexical items. Many studies have shown that these embeddings also encode a variety of dangerous social biases, as these biases are present in naturally-occurring English text (see Mehrabi et al. (2019) for a survey of such work).
The present work studies intersectional biases in both contextualized and non-contextualized word embeddings. The theory of intersectionality states that multiple different aspects of a person's identity often combine to create unique modes of discrimination (Crenshaw, 1989; Crenshaw, 1990). For example, specific biases against Black women are not necessarily shared by either White women or Black men. Furthermore, discourses on discrimination and public health typically focus on sexism or racism, each implicitly excluding the other (Crenshaw, 1989; Bowleg, 2012).
It is further argued that these discourses are focused on the most privileged members of a group (Crenshaw, 1989). Thus, we might expect that the ostensibly race-neutral category 'Woman' is implicitly White, and that the ostensibly non-gendered category 'Black' is implicitly male. Using representational similarity analysis (RSA; Kriegeskorte et al. (2008)), we find that both non-contextualized GloVe embeddings (Pennington et al., 2014) and contextualized BERT embeddings (Devlin et al., 2019) display exactly these biases. We also find that the BERT embeddings of the names of Black women are semantically impoverished, compared to the embeddings of the names of White women.

Related Work
Many recent studies have sought to understand social biases in NLP systems. Notably, Caliskan et al. (2017) showed that the human biases revealed through the Implicit Association Test are also present in word embeddings, prompting other studies that seek to detect bias in word embeddings (Kurita et al., 2019; Garg et al., 2018). Other studies have attempted to detect bias in other NLP systems (Rudinger et al., 2018), to automatically discover and measure intersectional biases in word embeddings (Guo and Caliskan, 2020), to analyze the distributions of gendered words in training sets (Rudinger et al., 2017), and to automatically mitigate the effects of social biases on NLP systems (Sun et al., 2019). Another study has produced empirical evidence for the claims of intersectionality theory by leveraging distributional representations (Herbelot et al., 2012).
Perhaps most relevant, some prior work has attempted to understand whether intersectional differences are present in contextualized word embeddings by trying to detect whether these embeddings encode intersectional stereotypes (May et al., 2019; Tan and Celis, 2019). However, detecting these stereotypes is only a proxy for detecting intersectional biases in embeddings: an embedding might encode intersectional differences without encoding more complex intersectional stereotypes. This work presents a method that leverages representational similarity analysis to directly probe word embeddings for intersectional biases.

Methods: Representational Similarity Analysis
Representational similarity analysis (RSA) is a technique first used in cognitive neuroscience for analyzing distributed activity patterns in the brain (Kriegeskorte et al., 2008), but recent work has used it to analyze artificial neural systems (Lepori and McCoy, 2020; Chrupała, 2019; Chrupała and Alishahi, 2019; Abnar et al., 2019; Bouchacourt and Baroni, 2018). RSA analyzes the representational geometry of a system, which is defined by the pairwise dissimilarities between representations of a set of stimuli. Specifically, we use RSA to compare the representational geometry of the word embeddings under study to the representational geometries of well-understood hypothesis models. If the representational geometries are similar, one can infer that the word embeddings represent the particular information expressed by a hypothesis model.
Applying RSA: The approach proceeds as follows: First, we create a corpus that consists of three types of items: group 1 items, $G_1$; group 2 items, $G_2$; and concept items, $C$. Next, we define a reference model $M_{\text{Ref}}$, which consists of the set of embeddings that we are interested in. Then, we define two hypothesis models, $M_{\text{Hyp1}}$ and $M_{\text{Hyp2}}$. $M_{\text{Hyp1}}$ instantiates the hypothesis that 'Group 1 is associated with the concept under study, while Group 2 is not'. Likewise, $M_{\text{Hyp2}}$ instantiates the hypothesis that 'Group 2 is associated with the concept under study, while Group 1 is not'. We then draw a sample $c$, consisting of $n_1$ items from $G_1$, $n_2$ items from $G_2$, and $n_3$ items from $C$, and calculate the $(n_1+n_2+n_3) \times (n_1+n_2+n_3)$ representational geometries $R$ of each model using a dissimilarity metric $D$. In all analyses, $n_1 = n_2 = n_3 = 10$.
We define $D = 1 - \text{Spearman's } \rho$ for $M_{\text{Ref}}$. For $M_{\text{Hyp1}}$, the dissimilarity metric is equal to 1 if one item is from $G_2$ and the other item is from $G_1$ or $C$, and 0 otherwise. Likewise, the dissimilarity metric for $M_{\text{Hyp2}}$ is equal to 1 if one item is from $G_1$ and the other item is from $G_2$ or $C$, and 0 otherwise. Clearly, these hypothesis models are not expected to provide a very good fit to the representational geometry of a set of word embeddings. However, if one hypothesis model exhibits a significantly better fit than the other, then there is evidence for a problematic bias.
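As a concrete illustration, the sketch below constructs both kinds of dissimilarity matrices. It is a minimal sketch under our own naming conventions (the helper names and the 'g1'/'g2'/'c' labels are ours, not the original implementation's); it assumes each item has already been mapped to an embedding vector.

```python
# Minimal sketch of the reference and hypothesis dissimilarity matrices.
# Helper names and group labels ("g1", "g2", "c") are our own conventions.
import numpy as np
from scipy.stats import spearmanr

def reference_rdm(vectors):
    """Representational geometry of the embeddings:
    D = 1 - Spearman's rho between each pair of embedding vectors."""
    n = len(vectors)
    rdm = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            rho, _ = spearmanr(vectors[i], vectors[j])
            rdm[i, j] = 1.0 - rho
    return rdm

def hypothesis_rdm(labels, out_group):
    """Binary hypothesis geometry: dissimilarity 1 iff exactly one item of
    a pair belongs to `out_group` (the group hypothesized NOT to share the
    concept), and 0 otherwise."""
    n = len(labels)
    rdm = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if (labels[i] == out_group) != (labels[j] == out_group):
                rdm[i, j] = 1.0
    return rdm
```

Under this construction, $M_{\text{Hyp1}}$ corresponds to `hypothesis_rdm(labels, out_group="g2")`, since that model treats Group 2 items as maximally dissimilar from both Group 1 and concept items.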
Finally, we calculate the similarity, $s$, between the representational geometries of our hypothesis models and our reference model using a similarity metric, $\text{sim}$. Because the $R$ matrices are symmetric, $\text{sim}$ operates only on the upper triangle of each $R$ matrix. We define $\text{sim} = \text{Spearman's } \rho$.
We then repeat the process on 100 samples from our corpus in order to create two 100-length vectors of representational similarities, $S_{\text{Hyp1}}$ and $S_{\text{Hyp2}}$. We can then apply a nonparametric sign test to the difference of these vectors, $S_{\text{Hyp1}} - S_{\text{Hyp2}}$, in order to test for a consistent difference between measurements of $s_{\text{Hyp1}}$ and $s_{\text{Hyp2}}$. See Figure 1 for a visual summary of this approach.
Figure 1: A summary of our approach, applied to BERT embeddings.
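To make the full procedure concrete, here is a sketch of the comparison loop and sign test, reusing `reference_rdm` and `hypothesis_rdm` from the sketch above; the `samples` argument stands in for a hypothetical data-preparation step that draws 10 items of each type per sample.

```python
# Sketch of the full comparison and sign test. Assumes reference_rdm and
# hypothesis_rdm from the previous sketch; `samples` is a hypothetical list
# of 100 (vectors, labels) pairs, each drawn as described in the text.
import numpy as np
from scipy.stats import spearmanr, binomtest

def similarity(rdm_a, rdm_b):
    """sim = Spearman's rho between the upper triangles of two RDMs."""
    iu = np.triu_indices_from(rdm_a, k=1)
    rho, _ = spearmanr(rdm_a[iu], rdm_b[iu])
    return rho

def rsa_bias_test(samples):
    """Sign test on S_Hyp1 - S_Hyp2 across samples of (vectors, labels)."""
    s_hyp1, s_hyp2 = [], []
    for vectors, labels in samples:
        r_ref = reference_rdm(vectors)
        s_hyp1.append(similarity(r_ref, hypothesis_rdm(labels, "g2")))
        s_hyp2.append(similarity(r_ref, hypothesis_rdm(labels, "g1")))
    diffs = np.array(s_hyp1) - np.array(s_hyp2)
    # Nonparametric sign test: count positive differences among non-zero ones.
    return binomtest(int((diffs > 0).sum()), n=int((diffs != 0).sum()), p=0.5)
```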

Data
In order to perform the analyses described in Section 2, we need to create sets of group items for all of the groups under study, as well as concept items for the identity concepts under study. For analyses involving GloVe embeddings, these sets can contain only single-word entries. For analyses involving BERT embeddings, entries can be full sentences. See Appendix A for more details about all of the following datasets.

Single-Word Sets
Group Items: For our single-word group items for the Black women, Black men, White women, and White men groups, we use the dataset curated in Sweeney (2013). This dataset includes names commonly associated with each of these identity groups. We rely on name data because identity categories like Black woman do not have single-word terms (May et al., 2019). See Section 5 for a discussion of the limitations of using name data.
Concept Items: For our single-word, ostensibly race-neutral concept items that are associated with the female identity category, we combine the datasets from Nosek et al. (2002a) and . The United States Census Bureau defines the Black/African American identity category as "all individuals who identify with one or more nationalities or ethnic groups originating in any of the black racial groups of Africa" (Census Bureau, 2020). The Bureau then non-exhaustively lists several nationalities that meet this definition. All 10 single-word country names corresponding to the listed nationalities are included in our set of single-word, ostensibly gender-neutral concept items that are associated with the Black identity category. We also include the word Black, as well as the word Africa, in keeping with the Census Bureau's definition. See Section 5 for a discussion of the limitations of this approach.

Multi-Word Sets
We leverage the semantically bleached sentence templates introduced by May et al. (2019) in order to generate our sentence data. These sentences "make heavy use of deixis and are designed to convey little specific meaning beyond that of the terms inserted into them". Some examples include: "This is a <word/phrase>" and "The <word/phrase> is here".
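As a minimal illustration, the sentence stimuli can be generated by simple string substitution into these templates; the two template strings below follow the examples quoted above, and the inserted terms are placeholders rather than items from the actual datasets.

```python
# Minimal sketch of stimulus generation from semantically bleached templates.
# Templates follow the quoted examples; the terms below are placeholders.
TEMPLATES = ["This is a {}.", "The {} is here."]

def fill_templates(terms):
    """Insert each word/phrase into every template."""
    return [template.format(term) for template in TEMPLATES for term in terms]

print(fill_templates(["person", "place"]))
# ['This is a person.', 'This is a place.',
#  'The person is here.', 'The place is here.']
```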

GloVe Experiments
GloVe embeddings are non-contextualized word embeddings: each word is assigned a single vector, regardless of the specific context in which it appears. These experiments seek to test whether RSA can be used to identify intersectional biases in GloVe embeddings. We use 300-dimensional embeddings trained on the Wikipedia 2014 + Gigaword 5 corpus throughout this study.
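For reference, a minimal way to load these pretrained vectors is sketched below; the filename refers to the publicly released glove.6B.300d.txt file (trained on Wikipedia 2014 + Gigaword 5), and the local path is an assumption about your setup.

```python
# Minimal sketch of loading pretrained GloVe vectors from the plain-text
# release format ("word v1 v2 ... v300" per line). Path is an assumption.
import numpy as np

def load_glove(path="glove.6B.300d.txt"):
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            vectors[word] = np.asarray(values, dtype=np.float32)
    return vectors

glove = load_glove()
# GloVe's vocabulary is lowercased, so names must be lowercased for lookup:
# sample_vectors = [glove[w.lower()] for w in names + concept_words]
```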

Black Female Name Embedding Check
First, we ensure that the GloVe embeddings of the names of Black women contain sufficient semantic detail, such that more in-depth experiments can be performed. Our concept items are a set of words associated with the female identity category. Our Group 1 items are a set of proper names associated with Black women, and our Group 2 items are a set of proper names associated with Black men. See Section 3 for details about these sets. We would expect the reference model to display greater representational similarity to the hypothesis model that associates female attributes with Black female names. From Table 1, we see that this is indeed the case.

Female Concept
We investigate whether the GloVe embeddings of the names of Black women encode the 'female' concept less than the embeddings of the names of White women. Our concept items are a set of words associated with the female identity category. Our Group 1 items are a set of proper names associated with Black women, and our Group 2 items are a set of proper names associated with White women. See Section 3 for details about these sets. From Table 1, we see that the reference model exhibits greater representational similarity to the hypothesis model that associates the concept items with White female names.

Black Concept
We investigate whether the GloVe embeddings of the names of Black women encode the 'Black' concept less than the embeddings of the names of Black men. Our concept items are a set of words associated with the Black identity category. Our Group 1 items are a set of proper names associated with Black women, and our Group 2 items are a set of proper names associated with Black men. From Table 1, we see that the reference model exhibits greater representational similarity to the hypothesis model that associates the concept items with Black male names.

BERT Experiments
BERT embeddings are contextualized word embeddings, and so they represent information from a word in context. This allows us to analyze group terms, using the semantically-bleached sentence data from May et al. (2019), which are designed to convey little meaning beyond that of the group term that we insert into them. We also attempt to reproduce our findings using the single-word proper name data used in Section 4.1. We use the pretrained BERT-base-uncased model for all multi-word studies.
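As an illustration of how such contextualized representations can be extracted, the sketch below inserts a term into a bleached template and mean-pools the final-layer vectors of the term's subword tokens using the HuggingFace transformers library. The pooling scheme and helper names are our assumptions; the paper does not specify these details here.

```python
# Sketch of extracting a contextualized embedding for a group term inserted
# into a bleached template. Mean-pooling over the term's subword tokens is
# our assumption, not a detail specified in the text.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def term_embedding(template, term):
    sentence = template.format(term)
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, 768)
    # Locate the term's subword tokens within the sentence and mean-pool them.
    term_ids = tokenizer(term, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(term_ids) + 1):
        if ids[i:i + len(term_ids)] == term_ids:
            return hidden[i:i + len(term_ids)].mean(dim=0).numpy()
    raise ValueError("term not found in tokenized sentence")

vec = term_embedding("This is a {}.", "black woman")  # hypothetical term
```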
Black Female Name Embedding Check
First, we use the pretrained BERT-base-cased model to attempt to reproduce our results from Section 4.1. However, we see from

Limitations
One notable limitation of this work is that we study artificial binary gender and race distinctions. Unfortunately, these binary distinctions exclude many individuals from our analysis. Future work should expand upon the methodological framework presented here in order to assess biases against groups that are not represented by these distinctions.

Furthermore, Gaddis (2017) describes several limitations of relying on name data to represent racial categories. Perhaps most importantly, that work demonstrates that 'Black names' are not one monolithic category, that perceptions of the Blackness of names are correlated with other confounding variables (such as education status), and that the associations between identity categories and proper names are only loosely reflected in real-world naming practices.

Additionally, we rely on an incomplete set of country names provided by the Census Bureau to generate concept sets for the Black identity category. We acknowledge that this excludes many nationalities from our analysis, and also acknowledge that any attempt to exhaustively list nations associated with the Black identity category is destined to fail. We hope that our results inspire future work that aspires to more robust coverage of the Black identity group.

Finally, it is important to note that this method can only provide positive evidence: if this method does not detect intersectional biases for a particular system, that does not provide evidence that the system does not contain such biases.

Conclusion
Both contextualized and non-contextualized word embeddings learn to represent human-like biases. We apply RSA to the task of detecting these biases, and find that both GloVe embeddings and BERT embeddings reproduce (and thus perpetuate) the dual marginalization of Black women within both the Black community and the female community, as predicted by the theory of intersectionality. Specifically, we show that word embeddings represent the female identity category as implicitly White, and the Black identity category as implicitly male. To our knowledge, this is the first method that can probe word embeddings for these intersectional biases directly. Aside from addressing the issues discussed in Section 5, future work should investigate whether this method can be used to detect other sorts of intersectional biases, particularly those facing the LGBTQ community.