Learning Gender-Neutral Word Embeddings

Word embedding models have become a fundamental component in a wide range of Natural Language Processing (NLP) applications. However, embeddings trained on human-generated corpora have been demonstrated to inherit strong gender stereotypes that reflect social constructs. To address this concern, in this paper, we propose a novel training procedure for learning gender-neutral word embeddings. Our approach aims to preserve gender information in certain dimensions of word vectors while compelling other dimensions to be free of gender influence. Based on the proposed method, we generate a Gender-Neutral variant of GloVe (GN-GloVe). Quantitative and qualitative experiments demonstrate that GN-GloVe successfully isolates gender information without sacrificing the functionality of the embedding model.


Introduction
Word embedding models have been designed to represent the meaning of words in a vector space. These models have become a fundamental NLP technique and have been widely used in various applications. However, prior studies show that such models learned from human-generated corpora are often prone to exhibit social biases, such as gender stereotypes (Bolukbasi et al., 2016; Caliskan et al., 2017). For example, the word "programmer" is neutral to gender by its definition, but an embedding model trained on a news corpus associates "programmer" more closely with "male" than with "female".
Such a bias substantially affects downstream applications. Zhao et al. (2018) show that a coreference resolution system is sexist because of the word embedding component it uses. This concerns practitioners who use embedding models to build gender-sensitive applications, such as resume filtering or job recommendation systems, as the automated system may discriminate against candidates based on their gender, as reflected by their name. Besides, biased embeddings may implicitly affect downstream applications used in our daily lives. For example, when searching for "computer scientist" with a search engine, as this phrase is closer to male names than to female names in the embedding space, a search algorithm that uses an embedding model in its backbone tends to rank male scientists higher than female ones, hindering women from being recognized and further exacerbating gender inequality in the community.
To alleviate gender stereotypes in word embeddings, Bolukbasi et al. (2016) propose a post-processing method that projects gender-neutral words onto a subspace perpendicular to the gender direction defined by a set of gender-definition words. However, their approach has two limitations. First, the method is essentially a pipeline approach and requires gender-neutral words to be identified by a classifier before the projection is applied. If the classifier makes a mistake, the error propagates and affects the performance of the model. Second, their method completely removes gender information from those words, although such information is essential in some domains such as medicine and social science (Back et al., 2010; McFadden et al., 1992).
To overcome these limitations, we propose a learning scheme, Gender-Neutral Global Vectors (GN-GloVe), for training word embedding models with protected attributes (e.g., gender) based on GloVe (Pennington et al., 2014). GN-GloVe represents protected attributes in certain dimensions while neutralizing the others during training. As the information of the protected attribute is restricted to certain dimensions, it can easily be removed from the embedding. By jointly identifying gender-neutral words while learning word vectors, GN-GloVe does not require a separate classifier to identify gender-neutral words; therefore, the error propagation issue is eliminated. The proposed approach is generic: it can be incorporated with other word embedding models and applied to reduce other societal stereotypes.
Our contributions are summarized as follows: 1) To the best of our knowledge, GN-GloVe is the first method to learn word embeddings with protected attributes; 2) by capturing protected attributes in certain dimensions, our approach improves the interpretability of word representations; 3) qualitative and quantitative experiments demonstrate that GN-GloVe effectively isolates the protected attributes and preserves word proximity.

Related Work
Word Embeddings Word embeddings serve as a fundamental building block for a broad range of NLP applications (dos Santos and Gatti, 2014; Bahdanau et al., 2014; Zeng et al., 2015), and various approaches (Mikolov et al., 2013b; Pennington et al., 2014; Levy et al., 2015) have been proposed for training word vectors. Improvements have been made by leveraging semantic lexicons and morphology (Luong et al., 2013; Faruqui et al., 2014), disambiguating multiple senses (Šuster et al., 2016; Arora et al., 2018; Upadhyay et al., 2017), and modeling contextualized information with deep neural networks (Peters et al., 2018). However, none of these works attempts to tackle the problem of stereotypes exhibited in embeddings.
Stereotype Analysis Implicit stereotypes have been observed in applications such as online advertising systems (Sweeney, 2013), web search (Kay et al., 2015), and online reviews (Wallace and Paul, 2016). Besides, Zhao et al. (2017) and Rudinger et al. (2018) show that coreference resolution systems are gender biased: the systems can successfully predict the link between "the president" and a male pronoun but fail with a female one. Rudinger et al. (2017) use pointwise mutual information to test the SNLI (Bowman et al., 2015) corpus and demonstrate gender stereotypes as well as varying degrees of racial, religious, and age-based stereotypes in the corpus. A temporal analysis of word embeddings (Garg et al., 2018) captures changes in gender and ethnic stereotypes over time. Researchers have attributed such problems partly to biases in the datasets (Zhao et al., 2017; Yao and Huang, 2017) and in the word embeddings (Garg et al., 2017; Caliskan et al., 2017) but did not provide constructive solutions.

Methodology
In this paper, we take GloVe (Pennington et al., 2014) as the base embedding model and gender as the protected attribute. It is worth noting that our approach is general and can be applied to other embedding models and attributes. Following GloVe (Pennington et al., 2014), we construct a word-to-word co-occurrence matrix X, denoting by X_{i,j} the frequency of the j-th word appearing in the context of the i-th word. w, w̃ ∈ R^d stand for the embeddings of a center word and a context word, respectively, where d is the dimension. In our embedding model, a word vector w consists of two parts, w = [w^{(a)}; w^{(g)}]. w^{(a)} ∈ R^{d−k} and w^{(g)} ∈ R^k stand for the neutralized and gendered components, respectively, where k is the number of dimensions reserved for gender information. Our proposed gender-neutralizing scheme reserves the gender feature, known as the "protected attribute", in w^{(g)}; therefore, the information encoded in w^{(a)} is independent of gender influence. We use v_g ∈ R^{d−k} to denote the direction of gender in the embedding space. We categorize all vocabulary words into three subsets: male-definition Ω_M, female-definition Ω_F, and gender-neutral Ω_N, based on their definitions in WordNet (Miller and Fellbaum, 1998).
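The vector layout can be sketched as follows; the helper name `split_vector` and the toy sizes are ours, for illustration only:

```python
import numpy as np

def split_vector(w, k):
    """Split a d-dimensional word vector into its gender-neutral
    part w_a (first d-k dims) and gendered part w_g (last k dims)."""
    return w[:-k], w[-k:]

d, k = 8, 1                        # toy embedding size and gendered dims
w = np.arange(d, dtype=float)      # toy word vector
w_a, w_g = split_vector(w, k)      # w_a has d-k dims, w_g has k dims
```

Any operation on w^{(a)} then simply works on the first d−k coordinates, which is what makes it easy to drop the gendered part later.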
Gender-Neutral Word Embedding Our minimization objective is designed in accordance with the above insights. It contains three components:

    J = J_G + λ_d J_D + λ_e J_E,    (1)

where λ_d and λ_e are hyper-parameters. The first component, J_G, originates from GloVe (Pennington et al., 2014) and captures word proximity:

    J_G = Σ_{i,j} f(X_{i,j}) (w_i^⊤ w̃_j + b_i + b̃_j − log X_{i,j})².

Here f(X_{i,j}) is a weighting function that reduces the influence of extremely large co-occurrence frequencies, and b_i and b̃_j are the respective linear bias terms for w_i and w̃_j. The other two terms aim to restrict gender information to w^{(g)}, such that w^{(a)} is neutral. Given male- and female-definition seed words Ω_M and Ω_F, we consider two distance metrics and form two types of objective functions.
In J_D^{L1}, we directly minimize the negative distances between the gendered components of words in the two groups:

    J_D^{L1} = − Σ_{w∈Ω_M} Σ_{w'∈Ω_F} ‖w^{(g)} − w'^{(g)}‖₁.

In J_D^{L2}, we restrict the values of word vectors to [β₁, β₂] and push w^{(g)} toward one of the extremes:

    J_D^{L2} = Σ_{w∈Ω_M} ‖w^{(g)} − β₁e‖² + Σ_{w∈Ω_F} ‖w^{(g)} − β₂e‖²,

where e ∈ R^k is a vector of all ones. β₁ and β₂ can be arbitrary values, and we set them to 1 and −1, respectively.
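The two distance terms can be sketched as below. This is our reading of the text (negative pairwise distances for the first variant, squared pulls toward β₁e and β₂e for the second); the exact norms used in the paper may differ:

```python
import numpy as np

def j_d_l1(male_g, female_g):
    # Negative pairwise L1 distance between gendered components:
    # minimizing this pushes the two groups apart.
    return -sum(float(np.abs(m - f).sum())
                for m in male_g for f in female_g)

def j_d_l2(male_g, female_g, beta1=1.0, beta2=-1.0):
    # Pull male w_g toward beta1 * e and female w_g toward beta2 * e,
    # where e is the all-ones vector.
    e = np.ones_like(male_g[0])
    return (sum(float(((m - beta1 * e) ** 2).sum()) for m in male_g)
            + sum(float(((f - beta2 * e) ** 2).sum()) for f in female_g))

# Toy gendered components (k = 1): one male word at +1, one female at -1.
male_g = [np.array([1.0])]
female_g = [np.array([-1.0])]
```

With these toy inputs the L2 term is already zero (each group sits exactly at its extreme), while the L1 term rewards pushing the groups further apart.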
Finally, for words in Ω_N, the last term encourages their w^{(a)} to lie in the null space of the gender direction v_g:

    J_E = Σ_{w∈Ω_N} (v_g^⊤ w^{(a)})²,

where v_g is estimated on the fly by averaging the differences between female words and their male counterparts in a predefined set Ω of gender word pairs:

    v_g = (1/|Ω|) Σ_{(w_m, w_f)∈Ω} (w_f^{(a)} − w_m^{(a)}).

We use stochastic gradient descent to optimize Eq. (1). To reduce the computational complexity of training the word embedding, we treat v_g as a fixed vector (i.e., we do not derive gradients w.r.t. v_g when updating w^{(a)}, ∀w ∈ Ω) and estimate v_g only at the beginning of each epoch.
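A minimal sketch of the gender-direction estimate and the neutrality penalty; normalizing v_g to unit length is our assumption, and `pairs` holds the neutral components w^{(a)} of (male, female) counterpart words:

```python
import numpy as np

def estimate_gender_direction(pairs):
    """Average the differences between female and male counterpart
    vectors; normalization to unit length is an assumption of ours."""
    v = np.mean([f - m for m, f in pairs], axis=0)
    return v / np.linalg.norm(v)

def j_e(neutral_a, v_g):
    # Penalize any component of a neutral word's w_a along v_g,
    # pushing neutral words into the null space of the gender direction.
    return sum(float(np.dot(w, v_g)) ** 2 for w in neutral_a)

# Toy pair: male at the origin, female shifted along the first axis.
pairs = [(np.array([0.0, 0.0]), np.array([2.0, 0.0]))]
v_g = estimate_gender_direction(pairs)
```

A neutral vector orthogonal to v_g incurs zero penalty, while any component along v_g is penalized quadratically, which is what drives gender information out of w^{(a)} during training.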

Experiments
In this section, we conduct the following qualitative and quantitative studies: 1) we visualize the embedding space and show that GN-GloVe separates the protected gender attribute from other latent aspects; 2) we measure the ability of GN-GloVe to distinguish between gender-definition words and gender-stereotype words on a newly annotated dataset; 3) we evaluate GN-GloVe on standard word embedding benchmark datasets and show that it performs well in estimating word proximity; 4) we demonstrate that GN-GloVe reduces gender bias in a downstream application, coreference resolution.
We compare GN-GloVe with two embedding models, GloVe and Hard-GloVe. GloVe is a widely used model (Pennington et al., 2014); we apply the post-processing step introduced in (Bolukbasi et al., 2016) to reduce gender bias in GloVe and refer to the result as Hard-GloVe. All the embeddings are trained on the 2017 English Wikipedia dump with the default hyper-parameters described in (Pennington et al., 2014). When training GN-GloVe, we constrain the value of each dimension within [−1, 1] to avoid numerical difficulty. We set both λ_d and λ_e to 0.8; in a preliminary study on development data, we observed that the model is not sensitive to these parameters. Unless otherwise stated, we use J_D^{L1} in the GN-GloVe model.
Separating the protected attribute First, we demonstrate that GN-GloVe preserves gender associations (either definitional or stereotypical) in w^{(g)}. To illustrate the distribution of gender information across different words, we plot Fig. 1a using w^{(g)} for the x-axis and a random value for the y-axis to spread out the words in the plot. As shown in the figure, gender-definition words, e.g., "waiter" and "waitress", fall far away from each other in w^{(g)}. In addition, words such as "housekeeper" and "doctor" are inclined toward different genders, and their w^{(g)} preserves this information. Next, we demonstrate that GN-GloVe reduces gender stereotypes using a list of profession titles from (Bolukbasi et al., 2016). All these profession titles are neutral to gender by definition. In Fig. 1b and Fig. 1c, we plot the cosine similarity between each word vector w^{(a)} and the gender direction v_g (i.e., w^⊤v_g / (‖w‖ ‖v_g‖)). The results show that words such as "doctor" and "nurse" possess no gender association by definition, but their GloVe word vectors exhibit strong gender stereotypes. In contrast, the gender projections of the GN-GloVe word vectors w^{(a)} are closer to zero, demonstrating that gender information has been substantially diminished from w^{(a)} in the GN-GloVe embedding.
We further quantify the gender information exhibited in the embedding models. For each model, we project the word vectors of occupational words onto the gender subspace defined by "he-she" and compute their average magnitude. A larger projection indicates a more biased embedding model. The average projection of GloVe is 0.080, that of Hard-GloVe is 0.019, and that of GN-GloVe is 0.052. Compared with GloVe, GN-GloVe reduces the bias by 35%. Although Hard-GloVe contains less gender information, we will show later that GN-GloVe better distinguishes gender-stereotype words from gender-definition words.
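The bias measure just described can be sketched as the average absolute projection onto the normalized he-she direction (the helper name and toy vectors are ours):

```python
import numpy as np

def avg_gender_projection(vectors, v_he, v_she):
    """Average absolute projection of occupation vectors onto the
    normalized he-she direction; larger means more gender bias."""
    g = v_he - v_she
    g = g / np.linalg.norm(g)
    return float(np.mean([abs(float(np.dot(w, g))) for w in vectors]))

# Toy 2-d setup: gender lives on the first axis.
v_he, v_she = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
occupations = [np.array([0.5, 1.0]), np.array([-0.25, 2.0])]
```

On this toy setup the measure is (|0.5| + |−0.25|) / 2 = 0.375; an unbiased embedding would place all occupation vectors orthogonal to g, giving a score of zero.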
Gender Relational Analogy To study the quality of the gender information present in each model, we follow SemEval 2012 Task 2 (Jurgens et al., 2012) to create an analogy dataset, SemBias, with the goal of identifying the correct analogy of "he - she" from four pairs of words. Each instance in the dataset consists of four word pairs: a gender-definition word pair (Definition; e.g., "waiter - waitress"), a gender-stereotype word pair (Stereotype; e.g., "doctor - nurse"), and two other pairs of words that have similar meanings (None; e.g., "dog - cat", "cup - lid"). The None pairs are sampled from the list of word pairs annotated with the "SIMILAR: Coordinates" relation in (Jurgens et al., 2012); the original list has 38 pairs, and 29 remain after removing gender-definition word pairs. We consider 20 gender-stereotype word pairs and 22 gender-definition word pairs and use their Cartesian product to generate 440 instances. Among the 22 gender-definition word pairs, 2 word pairs are not used as seed words during training.
To test the generalization ability of the model, we generate a subset of the data (SemBias (subset)) of 40 instances associated with these 2 pairs. Table 1 lists the percentage of times each class of pair is ranked on top by each word embedding model, using the analogy test of (Mikolov et al., 2013c). GN-GloVe achieves 97.7% accuracy in identifying gender-definition word pairs as an analogy to "he - she". In contrast, GloVe and Hard-GloVe make significantly more mistakes. On the subset, GN-GloVe also achieves significantly better performance than Hard-GloVe and GloVe, indicating that it can generalize from the gender pairs in the training set to identify other gender-definition word pairs.
Word Similarity and Analogy In addition, we evaluate the word embeddings on benchmark tasks to ensure their quality. The word similarity tasks measure how well a word embedding model captures the similarity between words compared to human-annotated rating scores. Embeddings are tested on multiple datasets: WS353-ALL (Finkelstein et al., 2001), RG-65 (Rubenstein and Goodenough, 1965), MTurk-287 (Radinsky et al., 2011), MTurk-771 (Halawi et al., 2012), RW (Luong et al., 2013), and MEN-TR-3k (Bruni et al., 2012). The analogy tasks are to answer the question "A is to B as C is to ?" by finding the word vector w that is closest to w_B − w_A + w_C in the embedding space. The Google (Mikolov et al., 2013a) and MSR (Mikolov et al., 2013c) datasets are used for this evaluation.
Table 2: Results on the benchmark datasets. Performance is measured in accuracy and in Spearman rank correlation for the word analogy and word similarity tasks, respectively.
The results are shown in Table 2, where the suffixes "-L1" and "-L2" on GN-GloVe denote GN-GloVe trained with J_D^{L1} and J_D^{L2}, respectively. Compared with the other models, GN-GloVe achieves higher scores on the similarity tasks, and its analogy score drops only slightly, indicating that GN-GloVe preserves proximity among words.
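The analogy test can be sketched as below; the toy vocabulary is ours and is chosen so the vector-offset structure holds exactly:

```python
import numpy as np

def solve_analogy(vocab, a, b, c):
    """Answer "A is to B as C is to ?" with the vocabulary word whose
    vector has the highest cosine similarity to w_B - w_A + w_C,
    excluding the three query words."""
    target = vocab[b] - vocab[a] + vocab[c]
    best, best_sim = None, -np.inf
    for word, vec in vocab.items():
        if word in (a, b, c):
            continue
        sim = float(np.dot(vec, target)) / (
            np.linalg.norm(vec) * np.linalg.norm(target))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

vocab = {
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([1.0, 1.0]),
    "king":  np.array([2.0, 0.0]),
    "queen": np.array([2.0, 1.0]),
    "apple": np.array([0.0, -1.0]),
}
```

Here w_woman − w_man + w_king = [2, 1], which coincides with w_queen, so the test is answered correctly.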
Coreference Resolution Finally, we investigate how gender bias in word embeddings affects a downstream application, coreference resolution. Coreference resolution aims to cluster the noun phrases referring to the same entity in a given text. We evaluate our models on the OntoNotes 5.0 (Weischedel et al., 2012) benchmark and the WinoBias dataset (Zhao et al., 2018); specifically, we conduct experiments on the Type 1 version of WinoBias. The WinoBias dataset is composed of pro-stereotype (PRO) and anti-stereotype (ANTI) subsets. The PRO subset consists of sentences in which a gender pronoun refers to a profession dominated by the same gender, e.g., "The CEO raised the salary of the receptionist because he is generous." In this sentence, the pronoun "he" refers to "CEO", and this reference is consistent with the societal stereotype. The ANTI subset contains the same set of sentences, but the gender pronoun in each sentence is replaced by the opposite gender; for instance, "he" is replaced by "she" in the aforementioned example. Although the sentence is almost identical, the gender pronoun now refers to a profession that is less represented by that gender. Details about the dataset are in (Zhao et al., 2018).

We train the end-to-end coreference resolution model (Lee et al., 2017) with different word embeddings on OntoNotes and report the performance in Table 3. For the WinoBias dataset, we also report the average (Avg) and absolute difference (Diff) of the F1 scores on the two subsets; a smaller Diff value indicates less bias in a system. Results show that GN-GloVe achieves performance comparable to GloVe and Hard-GloVe on the OntoNotes dataset while distinctly reducing the bias on the WinoBias dataset. When only the w^{(a)} portion of the embedding is used to represent words, GN-GloVe(w^{(a)}) further reduces the bias in coreference resolution.
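The WinoBias summary statistics are straightforward to compute; a minimal sketch (the function name and example scores are ours):

```python
def winobias_summary(f1_pro, f1_anti):
    """Average (Avg) and absolute difference (Diff) of the F1 scores
    on the pro- and anti-stereotype subsets; a smaller Diff value
    indicates a less biased system."""
    avg = (f1_pro + f1_anti) / 2.0
    diff = abs(f1_pro - f1_anti)
    return avg, diff

# A system scoring 70.0 F1 on PRO and 60.0 on ANTI: Avg 65.0, Diff 10.0.
avg, diff = winobias_summary(70.0, 60.0)
```

A perfectly unbiased system would score identically on both subsets, giving Diff = 0 regardless of its absolute F1.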

Conclusion and Discussion
In this paper, we introduced an algorithm for training gender-neutral word embeddings. Our method is general and can be applied to any language as long as a list of gender-definition words (e.g., gender pronouns) is provided as seeds. Future directions include extending the proposed approach to model other word properties, such as sentiment, and generalizing our analysis beyond binary gender.