Mitigating Gender Bias Amplification in Distribution by Posterior Regularization

Advanced machine learning techniques have boosted the performance of natural language processing. Nevertheless, recent studies, e.g., (CITATION), show that these techniques inadvertently capture the societal bias hidden in the corpus and further amplify it. However, their analysis is conducted only on models' top predictions. In this paper, we investigate the gender bias amplification issue from the distribution perspective and demonstrate that the bias is amplified in the predicted probability distribution over labels. We further propose a bias mitigation approach based on posterior regularization. With little performance loss, our method almost entirely removes the bias amplification in the distribution. Our study sheds light on understanding the bias amplification.


Introduction
Data-driven machine learning models have achieved high performance in various applications. Despite the impressive results, recent studies (e.g., Wang et al. (2019); Hendricks et al. (2018)) demonstrate that these models may carry the societal biases exhibited in the datasets they are trained on. In particular, Zhao et al. (2017) show that a model trained on a biased dataset may amplify the bias. Consider, for example, the task of labeling the activity and objects depicted in an image. The training set contains 30% more images with "woman cooking" than "man cooking"; however, when evaluating the top predictions of a trained model, the disparity between males and females is amplified to around 70%. Based on this observation, Zhao et al. (2017) conduct a systematic study and propose to calibrate the top predictions of a learned model by injecting corpus-level constraints to ensure that the gender disparity is not amplified.
However, when analyzed through top predictions, the model is forced to make a single decision. Even if the model assigns high scores to both "woman cooking" and "man cooking", it has to pick one as the prediction, and this process obviously risks amplifying the bias. To our surprise, however, we observe that gender bias is also amplified when analyzing the posterior distribution of the predictions. Since the model is trained with a regularized maximum likelihood objective, the bias in the distribution offers a more fundamental perspective for analyzing the bias amplification issue.
In this paper, we conduct a systematic study to quantify the bias in the predicted distribution over labels. Our analysis demonstrates that bias amplification exists in the distribution as well, though it is less severe than in the top predictions. About half of the activities show significant bias amplification in the posterior distribution, and on average they amplify the bias by 3.2%.
We further propose a new bias mitigation technique based on posterior regularization, because the approaches described in Zhao et al. (2017) cannot be straightforwardly extended to calibrate bias amplification in the distribution. With the proposed technique, we successfully remove the bias amplification in the posterior distribution while maintaining the performance of the model. In addition, the bias amplification in the top predictions based on the calibrated distribution is also mitigated, by around 30%. These results suggest that the bias amplification in top predictions comes both from the requirement of making hard predictions and from the bias amplification in the posterior distribution of the model predictions. Our study advances the understanding of the bias amplification issue in natural language processing models. The code and data are available at https://github.com/uclanlp/reducingbias.

Related Work
Algorithmic Bias Machine learning models are becoming increasingly prevalent in the real world, and algorithmic bias can have a great societal impact (Tonry, 2010; Buolamwini and Gebru, 2018). Researchers have found societal bias in applications such as coreference resolution (Rudinger et al., 2018; Zhao et al., 2018), machine translation (Stanovsky et al., 2019), and online advertisement (Sweeney, 2013). Without appropriate adjustments, a model can amplify the bias in its training data (Zhao et al., 2017). Different from previous work, we aim at understanding the bias amplification from the posterior perspective instead of directly looking at the top predictions of the model.

Posterior Regularization
The posterior regularization framework (Ganchev et al., 2010) aims to represent and enforce constraints on the posterior distribution. It has been shown effective for injecting domain knowledge into NLP applications. For example, Ji et al. (2012) and Gao et al. (2014) design similarity-based constraints to improve question answering and machine translation, respectively. Yang and Cardie (2014) propose constraints based on lexical patterns for sentiment analysis. Meng et al. (2019) apply corpus-level constraints to guide a dependency parser in the cross-lingual transfer setting. In this paper, we leverage corpus-level constraints to calibrate the output distribution. Our study resembles confidence calibration (Guo et al., 2017; Naeini et al., 2015); however, the temperature scaling and binning methods proposed in those papers cannot be straightforwardly extended to calibrate the bias amplification.

Background
We follow the settings in Zhao et al. (2017) and focus on the imSitu vSRL dataset (Yatskar et al., 2016), in which the goal is to predict the activity and the roles of the entities depicted in a given image; this can be regarded as a structured prediction task (see Fig. 1).
We apply a Conditional Random Field (CRF) model for this structured prediction task. We denote by $y$ a joint prediction for all instances, and by $y^i$ the prediction for instance $i$. We use $y_v$ to denote the predicted activity and $y_r$ to denote the predicted roles. An activity can have multiple roles, and usually one of them conveys the gender information. For an instance $i$, the CRF model predicts scores for every activity and role, and the score of a full prediction is the summation of these scores. Formally,

$$s_\theta(y^i, i) = s_\theta(y^i_v, i) + \sum_{e \in y^i_r} s_\theta(y^i_v, e, i),$$

where $s_\theta(y^i_v, i)$ and $s_\theta(y^i_v, e, i)$ are the score for activity $y^i_v$ of instance $i$ and the score for role $e$ of instance $i$ with activity $y^i_v$, respectively. We can infer the top structure for instance $i$ by

$$y^i = \arg\max_{y \in \mathcal{Y}^i} s_\theta(y, i),$$

where $\mathcal{Y}^i$ refers to all the possible assignments to the instance.
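To make the scoring concrete, here is a minimal sketch of this additive scoring and the argmax inference, assuming a toy label space with randomly generated scores (the array names and sizes are ours, not the actual imSitu model):

```python
import numpy as np

# Hypothetical toy setup: 3 activities, each with a single gendered agent role
# that takes one of 2 noun choices. Real imSitu structures are larger; this
# only illustrates the additive scoring above.
num_activities, num_nouns = 3, 2
rng = np.random.default_rng(0)
activity_scores = rng.normal(size=num_activities)           # s_theta(y_v, i)
role_scores = rng.normal(size=(num_activities, num_nouns))  # s_theta(y_v, e, i)

# Score of a full structure y^i = (activity v, noun e): sum of its part scores.
structure_scores = activity_scores[:, None] + role_scores   # shape (3, 2)

# MAP inference: the argmax over all assignments in Y^i.
v_star, e_star = np.unravel_index(structure_scores.argmax(),
                                  structure_scores.shape)
print(f"top structure: activity={v_star}, noun={e_star}")
```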

Bias Amplification Quantification and Corpus-level Constraints

Zhao et al. (2017) demonstrate bias amplification in the top predictions and present a bias mitigation technique based on inference with corpus-level constraints. In the following, we extend their study to analyze the bias amplification in the posterior distribution of the CRF model and define the corresponding corpus-level constraints. Formally, the probability of prediction $y^i$ for instance $i$ and the probability of the joint prediction $y$ under the CRF model with parameters $\theta$ are given by

$$p_\theta(y^i, i) = \frac{\exp s_\theta(y^i, i)}{\sum_{y \in \mathcal{Y}^i} \exp s_\theta(y, i)}, \qquad p_\theta(y) = \prod_i p_\theta(y^i, i), \quad (1)$$

since instances are mutually independent.
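For illustration, the per-instance distribution in Eq. (1) is simply a softmax over all candidate structures in $\mathcal{Y}^i$; a self-contained toy sketch (scores are made up):

```python
import numpy as np

# Toy per-instance scores over all candidate structures in Y^i (made-up).
scores = np.array([2.0, 0.5, -1.0, 1.5])

# Eq. (1): softmax over the candidate structures of one instance.
p_i = np.exp(scores - scores.max())  # subtract the max for numerical stability
p_i /= p_i.sum()
print(p_i)                           # p_theta(y^i, i) for each structure

# The joint distribution p_theta(y) over the whole corpus is the product of
# these per-instance distributions, since instances are mutually independent.
```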
In this section, we define how to quantify the bias and the bias amplification in the distribution, and introduce corpus-level constraints for restricting the bias in the distribution.
We focus on the gender bias of activities in the vSRL task. To quantify the gender bias for a particular activity $v^*$, Zhao et al. (2017) use the percentage of predictions in which $v^*$ occurs together with a male agent, among all predictions with a gendered agent. This evaluation focuses on the top predictions. In contrast, we define a bias function $B(p, v^*, D)$ w.r.t. a distribution $p$ and an activity $v^*$, evaluating the bias toward male in dataset $D$ based on the conditional probability $P(X \mid Y)$, where event $Y$: given an instance, its activity is predicted to be $v^*$ and its role is predicted to have a gender; event $X$: this instance is predicted to have gender male. Formally,

$$B(p, v^*, D) = \frac{\sum_{i \in D} \sum_{y^i:\, y^i_v = v^*,\, y^i_r \in M} p(y^i, i)}{\sum_{i \in D} \sum_{y^i:\, y^i_v = v^*,\, y^i_r \in M \cup W} p(y^i, i)}, \quad (2)$$

where $M$ and $W$ denote the sets of role assignments whose agent is male and female, respectively. This bias can come from the training set $D^{tr}$. Here we use $b^*(v^*, \text{male})$ to denote the "dataset bias" toward male in the training set, measured by the fraction of male agents among all gendered agents in the gold labels:

$$b^*(v^*, \text{male}) = \frac{\sum_{i \in D^{tr}} \mathbb{1}[\hat{y}^i_v = v^*,\, \hat{y}^i_r \in M]}{\sum_{i \in D^{tr}} \mathbb{1}[\hat{y}^i_v = v^*,\, \hat{y}^i_r \in M \cup W]},$$

where $\hat{y}^i$ denotes the gold label of instance $i$. Ideally, the bias in the distribution given by the CRF model should be consistent with the bias in the training set, since the CRF model is trained by maximum likelihood. However, amplification exists in practice. We use the difference between the bias in the posterior distribution and the bias in the training set to quantify the bias amplification, taking the sign such that amplification toward the gender favored in the training set counts as positive, and average it over all activities to quantify the amplification on the whole dataset:

$$A(p, v^*, D) = \begin{cases} B(p, v^*, D) - b^*(v^*, \text{male}), & \text{if } b^*(v^*, \text{male}) \ge 0.5,\\ b^*(v^*, \text{male}) - B(p, v^*, D), & \text{otherwise}, \end{cases} \qquad \bar{A}(p, D) = \frac{1}{|V|} \sum_{v^* \in V} A(p, v^*, D).$$

Note that if we replace $p$ in $A$ and $\bar{A}$ with the indicator function of the top prediction, we recover the definition of bias amplification in top predictions from Zhao et al. (2017).
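To illustrate these definitions, here is a small self-contained sketch computing $B$ and $A$ for one activity from made-up per-instance distributions (all names and numbers are hypothetical, including the dataset bias $b^*$):

```python
import numpy as np

# Made-up setup: 4 instances, 6 candidate structures per instance. For each
# structure we record whether it predicts activity v*, and its agent's gender.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(6), size=4)          # probs[i, y] = p(y^i, i)
is_v_star = np.array([1, 1, 0, 0, 1, 0], bool)     # structure's activity is v*
gender = np.array(["M", "W", "M", "W", "M", ""])   # agent gender, "" = none

# B(p, v*, D), Eq. (2): probability mass on (v*, male) over mass on
# (v*, any gender), summed over all instances.
male_mass = probs[:, is_v_star & (gender == "M")].sum()
both_mass = probs[:, is_v_star & ((gender == "M") | (gender == "W"))].sum()
B = male_mass / both_mass

b_star = 0.6      # hypothetical training-set bias b*(v*, male)
A = B - b_star    # amplification (here b* >= 0.5, so no sign flip is needed)
print(f"B = {B:.3f}, A = {A:+.3f}, constraint violated: {abs(A) > 0.05}")
```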
The corpus-level constraints aim at restricting the bias amplification on the test set $D^{ts}$ within a predefined margin $\gamma$:

$$\forall v^*, \quad |A(p, v^*, D^{ts})| \le \gamma. \quad (3)$$

Posterior Regularization
Posterior regularization (Ganchev et al., 2010) is a framework that leverages corpus-level constraints to regularize the posterior distribution of a structured model. Specifically, given corpus-level constraints and a distribution predicted by a model, we 1) define a feasible set of distributions with respect to the constraints; 2) find the distribution in the feasible set closest to the given distribution; and 3) perform maximum a posteriori (MAP) inference on that optimal feasible distribution. The feasible set $Q$ is defined by the corpus-level constraints in Eq. (3):

$$Q = \{ q : \forall v^*, \; |A(q, v^*, D^{ts})| \le \gamma \}. \quad (4)$$

Given the feasible set $Q$ and the model distribution $p_\theta$ defined in Eq. (1), we want to find the closest feasible distribution $q^*$:

$$q^* = \arg\min_{q \in Q} \mathrm{KL}(q \,\|\, p_\theta). \quad (5)$$
This optimization problem, whose variable is the constrained joint distribution $q$, is intractable in general. Fortunately, according to the results in Ganchev et al. (2010), if the feasible set $Q$ is defined in terms of constraint feature functions $\phi$ and bounds $c$ on their expectations,

$$Q = \{ q : \mathbb{E}_{y \sim q}[\phi(y)] \le c \}, \quad (6)$$

then Eq. (5) has a closed-form solution

$$q^*(y) = \frac{p_\theta(y) \exp(-\lambda^* \cdot \phi(y))}{Z(\lambda^*)}, \quad (7)$$

where $\lambda^*$ is the solution of the dual problem

$$\lambda^* = \arg\max_{\lambda \ge 0} \; -c \cdot \lambda - \log Z(\lambda). \quad (8)$$
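For intuition, the exponential form of Eq. (7) follows from the standard Lagrangian argument in Ganchev et al. (2010); a one-step sketch (suppressing the normalization multiplier):

```latex
\mathcal{L}(q, \lambda)
  = \mathrm{KL}(q \,\|\, p_\theta)
  + \lambda \cdot \big( \mathbb{E}_{y \sim q}[\phi(y)] - c \big),
\qquad \lambda \ge 0,
\\[4pt]
\frac{\partial \mathcal{L}}{\partial q(y)} = 0
\;\Longrightarrow\;
\log \frac{q(y)}{p_\theta(y)} + 1 + \lambda \cdot \phi(y) = 0
\;\Longrightarrow\;
q(y) \propto p_\theta(y)\, e^{-\lambda \cdot \phi(y)},
```

and substituting this form back into the Lagrangian gives the dual objective in Eq. (8).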
We can indeed rewrite our constraints in this form. We set $c = 0$ and decompose the feature function over instances:

$$\phi(y) = \sum_i \phi^i(y^i). \quad (9)$$

We can choose proper instance-level feature functions $\phi^i(y^i)$ to make Eq. (4) equal to Eq. (6); the detailed derivation and the definition of $\phi^i(y^i)$ are shown in Appendix A. We solve Eq. (8) by gradient-based methods to obtain $\lambda^*$, and then compute the closed-form solution in Eq. (7). Moreover, given the relation between $y$ and $y^i$ in Eq. (1) and Eq. (9), we can factorize the solution in Eq. (7) at the instance level:

$$q^*(y^i, i) = \frac{p_\theta(y^i, i) \exp(-\lambda^* \cdot \phi^i(y^i))}{Z^i(\lambda^*)}, \quad (10)$$

with derivation details in Appendix B. With this factorization, we can reuse the original inference algorithm to conduct MAP inference based on the distribution $q^*$ for every instance separately.
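A minimal sketch of this two-step procedure: projected gradient ascent on the dual of Eq. (8) with $c = 0$, followed by the per-instance reweighting of Eq. (10). The toy shapes and plain gradient ascent are our simplifications, not the paper's actual solver:

```python
import numpy as np

def reweight(p, phi, lam):
    """Eq. (10): q*(y, i) is proportional to p(y, i) * exp(-lam . phi^i(y)).
    p: (I, Y) per-instance distributions; phi: (I, Y, K) constraint features;
    lam: (K,) dual variables."""
    w = p * np.exp(-(phi @ lam))
    return w / w.sum(axis=1, keepdims=True)

def solve_dual(p, phi, lr=0.1, steps=500):
    """Eq. (8) with c = 0: maximize -log Z(lam) over lam >= 0.
    The gradient of -log Z(lam) is the expectation E_{q_lam}[phi]."""
    lam = np.zeros(phi.shape[-1])
    for _ in range(steps):
        q = reweight(p, phi, lam)
        grad = np.einsum("iy,iyk->k", q, phi)   # sum_i E_{q(.,i)}[phi^i]
        lam = np.maximum(0.0, lam + lr * grad)  # project back onto lam >= 0
    return lam

# Toy usage with made-up numbers:
rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(6), size=4)      # p_theta(y^i, i)
phi = rng.normal(size=(4, 6, 2))           # phi^i(y^i), 2 constraint entries
lam_star = solve_dual(p, phi)
q_star = reweight(p, phi, lam_star)        # calibrated distribution, Eq. (10)
```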

Experiments
We conduct experiments on the vSRL task to analyze the bias amplification issue in the posterior distribution and demonstrate the effectiveness of the proposed bias mitigation technique.
Dataset Our experimental settings follow Zhao et al. (2017). We evaluate on imSitu (Yatskar et al., 2016), in which activities are selected from verbs, roles are from FrameNet (Baker et al., 1998), and nouns are from WordNet (Fellbaum, 1998). We filter out non-human-oriented verbs and images whose labels do not indicate gender.
Model We analyze the model proposed together with the dataset. The score functions we describe in Sec. 3 are modeled by VGG (Simonyan and Zisserman, 2015) with a feedforward layer on top; the scores are then fed into the CRF for inference.

Bias Amplification in Distribution
Figures 2a and 2c demonstrate the bias amplification in the posterior distribution $p_\theta$ and in the top predictions $y$ defined in Sec. 4, respectively. For most activities biased toward male (i.e., with a higher bias score) in the training set, both the top predictions and the posterior distribution are even more biased toward male, and vice versa. If the bias were not amplified, the dots would scatter around the reference line; instead, most dots lie in the top-right or bottom-left, showing that the bias is amplified. The black regression line with slope > 1 also indicates the amplification. Quantitatively, 109 and 173 constraints are violated when analyzing the bias in the distribution and in the top predictions, respectively.
Most recent models are trained by minimizing the cross-entropy loss, which aims to fit the model's predicted distribution to the observed distribution on the training data. At inference time, the model outputs the top predictions based on the underlying predicted distribution. Besides, in practice, the distribution is often used as an indicator of confidence in the prediction. Therefore, understanding bias amplification in the distribution provides a better view of this issue.
To analyze the cause of bias amplification, we further show the degree of amplification along the learning curve of the model (see Fig. 3). We observe that when the model is overfitted, the distribution of the model predictions becomes more peaked. We suspect this is one of the key causes of the bias amplification.
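One simple way to quantify this peakedness on any model is to track the average entropy of the per-instance predicted distributions along the learning curve; a generic diagnostic sketch (ours, not the paper's analysis code):

```python
import numpy as np

def mean_entropy(probs, eps=1e-12):
    """Average entropy of per-instance distributions probs[i, y] = p(y^i, i);
    lower values mean peakier (more over-confident) predictions."""
    return float(-(probs * np.log(probs + eps)).sum(axis=1).mean())

# Evaluate after every training epoch: if mean entropy keeps dropping while
# dev accuracy has plateaued, the model is growing over-confident, which we
# suspect correlates with increased bias amplification.
```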

Bias Amplification Mitigation
We set the margin γ = 0.05 for every constraint in evaluation. However, we employ a stricter margin (γ = 0.001) when performing posterior regularization to encourage the model to achieve a better feasible solution. We use mini-batches to estimate the gradient w.r.t. λ with the Adam optimizer (Kingma and Ba, 2015) when solving the dual problem in Eq. (8). We set the batch size to 39 and optimize for 10 epochs. The learning rate is initialized to 0.1 and decays after every mini-batch with a decay factor of 0.998.
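For concreteness, here is a sketch of that optimization setup in PyTorch with toy stand-in data; only the hyperparameters (batch size 39, 10 epochs, Adam with initial learning rate 0.1 and per-batch decay 0.998) come from the text, everything else is our assumption:

```python
import torch

torch.manual_seed(0)
# Toy stand-ins for the model distribution and constraint features.
p = torch.softmax(torch.randn(390, 6), dim=1)    # p_theta(y^i, i)
phi = torch.randn(390, 6, 2)                     # phi^i(y^i)
loader = torch.utils.data.DataLoader(torch.arange(390), batch_size=39)

lam = torch.zeros(2, requires_grad=True)
opt = torch.optim.Adam([lam], lr=0.1)
sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.998)

for epoch in range(10):
    for idx in loader:
        opt.zero_grad()
        # Mini-batch estimate of log Z(lam) = sum_i log Z^i(lam); with c = 0,
        # minimizing log Z maximizes the dual objective of Eq. (8).
        w = p[idx] * torch.exp(-(phi[idx] @ lam))
        loss = torch.log(w.sum(dim=1)).sum()
        loss.backward()
        opt.step()
        sched.step()                 # learning-rate decay after every batch
        with torch.no_grad():
            lam.clamp_(min=0.0)      # dual variables must stay nonnegative
```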

Results
We then apply the posterior regularization technique to mitigate the bias amplification in the distribution. Results are shown in Figures 2b (distribution) and 2d (top predictions). Posterior regularization effectively calibrates the bias in the distribution, with only 5 constraints violated after calibration. The average bias amplification is close to 0 (Ā: 0.032 to −0.005). By reducing the amplification of bias in the distribution, the bias amplification in the top predictions is also reduced, by 30.9% (Ā: 0.097 to 0.067). At the same time, the model's performance is preserved (accuracy: 23.2% to 23.1%).
Note that calibrating the bias in the distribution cannot remove all of the bias amplification in the top predictions. We posit that the requirement of making hard predictions (i.e., maximum a posteriori estimation) also amplifies the bias when evaluating the top predictions.

Conclusion
We analyzed bias amplification from the posterior distribution perspective, which provides a better view for understanding the bias amplification issue in natural language processing models, as these models are trained with the maximum likelihood objective. We further proposed a bias mitigation technique based on posterior regularization and showed that it effectively reduces the bias amplification in the distribution. Due to the limitations of the data, we only analyze bias over binary gender. However, our analysis and mitigation framework are general and can be adapted to other applications and other types of bias.
One remaining open question is why the gender bias in the posterior distribution is amplified. We posit that regularization and the overfitting nature of deep learning models might contribute to the bias amplification. However, a comprehensive study is required to verify this conjecture, and we leave it as future work.

A Definition of the Feature Functions
The feature function for the joint prediction $y$ is defined as the summation of feature functions over instances, $\phi(y) = \sum_i \phi^i(y^i)$, where each $\phi^i(y^i)$ is a $2n$-dimensional vector and $n$ is the number of constraints. Each entry corresponds to one constraint and one direction of the inequality. Formally, for each activity $v^*$, writing $b^* = b^*(v^*, \text{male})$,

$$\phi^i_{v^*,-}(y^i) = (1 - b^* - \gamma)\, \mathbb{1}[y^i_v = v^*,\, y^i_r \in M] - (b^* + \gamma)\, \mathbb{1}[y^i_v = v^*,\, y^i_r \in W],$$

$$\phi^i_{v^*,+}(y^i) = (b^* - \gamma)\, \mathbb{1}[y^i_v = v^*,\, y^i_r \in W] - (1 - b^* + \gamma)\, \mathbb{1}[y^i_v = v^*,\, y^i_r \in M].$$
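A small sketch of how these $2n$ feature entries could be assembled per instance (our encoding of the definition above; the indicator arrays are hypothetical):

```python
import numpy as np

def phi_i(is_v_star, gender, b_star, gamma):
    """Feature entries for one constraint (one activity v*) over all candidate
    structures of an instance. is_v_star[y]: structure's activity is v*;
    gender[y] in {"M", "W", ""}. Returns an array of shape (num_structures, 2)
    holding [phi_{v*,-}, phi_{v*,+}] per structure."""
    m = (is_v_star & (gender == "M")).astype(float)
    w = (is_v_star & (gender == "W")).astype(float)
    minus = (1 - b_star - gamma) * m - (b_star + gamma) * w  # upper bound on B
    plus = (b_star - gamma) * w - (1 - b_star + gamma) * m   # lower bound on B
    return np.stack([minus, plus], axis=1)

# Stacking phi_i over all constraints (activities) gives the 2n-dimensional
# phi^i(y^i); summing over instances gives phi(y) as in Eq. (9).
```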

B Derivation of Feature Functions Expectation
Since instances are mutually independent, the expectation of the feature functions factorizes as

$$\mathbb{E}_{y \sim q}[\phi(y)] = \sum_i \mathbb{E}_{y^i \sim q(\cdot, i)}[\phi^i(y^i)].$$

Thus, the constraint in Eq. (6) with $c = 0$ is equivalent to

$$\forall v^*, \quad \sum_i \mathbb{E}_{y^i \sim q(\cdot, i)}[\phi^i_{v^*,-}(y^i)] \le 0, \qquad \sum_i \mathbb{E}_{y^i \sim q(\cdot, i)}[\phi^i_{v^*,+}(y^i)] \le 0.$$
The inequality for $\phi^i_{v^*,-}$ can be derived from the upper bound $B(q, v^*, D) \le b^* + \gamma$, i.e.,

$$\frac{\sum_i \sum_{y^i:\, y^i_v = v^*,\, y^i_r \in M} q(y^i, i)}{\sum_i \sum_{y^i:\, y^i_v = v^*,\, y^i_r \in M \cup W} q(y^i, i)} \le b^* + \gamma,$$

which, after multiplying through by the denominator and rearranging, becomes

$$(1 - b^* - \gamma) \sum_i \sum_{y^i:\, y^i_v = v^*,\, y^i_r \in M} q(y^i, i) - (b^* + \gamma) \sum_i \sum_{y^i:\, y^i_v = v^*,\, y^i_r \in W} q(y^i, i) \le 0.$$

The inequality for $\phi^i_{v^*,+}$ can be derived similarly from the lower bound $B(q, v^*, D) \ge b^* - \gamma$.