“You Are Grounded!”: Latent Name Artifacts in Pre-trained Language Models

Pre-trained language models (LMs) may perpetuate biases originating in their training corpus to downstream models. We focus on artifacts associated with the representation of given names (e.g., Donald), which, depending on the corpus, may be associated with specific entities, as indicated by next-token prediction (e.g., Trump). While helpful in some contexts, grounding also happens in under-specified or inappropriate contexts. For example, endings generated for "Donald is a" substantially differ from those of other names, and often have more-than-average negative sentiment. We demonstrate the potential effect on downstream tasks with reading comprehension probes where name perturbation changes the model's answers. As a silver lining, our experiments suggest that additional pre-training on different corpora may mitigate this bias.


Introduction
Pre-trained language models (LMs) have transformed the NLP landscape. State-of-the-art performance across tasks is achieved by fine-tuning the latest LM on task-specific data. LMs provide an effective way to represent contextual information, including lexical and syntactic knowledge as well as world knowledge (Petroni et al., 2019).
LMs conflate generic facts (e.g., "the US has a president") with grounded knowledge regarding specific entities and events (e.g., "the (current) president is a male"), occasionally leading to gender and racial biases (e.g., "women can't be presidents") (May et al., 2019; Sheng et al., 2019).
In this work we focus on the representations of given names in pre-trained LMs (Table 1). Prior work showed that the representations of named entities incorporate sentiment (Prabhakaran et al., 2019), which is often transferable across entities via a shared given name (Field and Tsvetkov, 2019).

[Table 1: the pre-trained LMs examined, listing each model, its main training corpus type, and whether it is used here as a generator (Gen.) and/or classifier (Cls.)]

In a series of experiments we show that, depending on the corpus, some names tend to be grounded to specific entities, even in generic contexts. The most striking effect is for politicians in GPT2. For example, the name Donald: (1) predicts Trump as the next token with high probability; (2) generated endings of "Donald is a" are easily distinguishable from those of any other given name; (3) the sentiment of these endings is substantially more negative; and (4) this bias can potentially perpetuate to downstream tasks.
Although these results are expected, their extent is surprising. Biased name representations may have adverse effects on downstream models, just as with social bias: imagine a CV-screening system rejecting a candidate named Donald because of the negative sentiment associated with his name. Our experiments may be used to evaluate the extent of name artifacts in future LMs.

Last Name Prediction
As an initial demonstration of the tendency of pre-trained LMs to ground given names to prominent named entities in the media, we examine the next-word probabilities assigned by the LM. If high probability is placed on a named entity's last name conditioned on observing their given name (e.g., P(Trump|Donald) = 0.99), we take this as evidence that the LM is, in effect, interpreting the first-name mention as a reference to the named entity. We note that this is a lower bound on evidence for grounding: while it is reasonable to assume that nearly all mentions of, e.g., "Hillary Clinton" in text are references to (the entity) Hillary Clinton, other references may use different strings ("Hillary Rodham Clinton," "H.R.C.," or just "Hillary"). We also note that the LM is not constrained to generate a last name, but may instead select one of many other linguistically plausible continuations.

We examine greedy decoding of named entity last names systematically for each generative LM. To this end, we compile two sets of prominent named entities, from the media and from history. We construct four prompt templates ending with a given name to feed to each LM: (1) Minimal: "[NAME]"; (2) News: "A new report from CNN says that [NAME]"; (3) History: "A newly published biography of [NAME]"; and (4) Informal: "I want to introduce you to my best friend, [NAME]". Table 2 shows, for each LM, the percentage of named entities for which the LM greedily generated that entity's last name (or a middle initial followed by the last name) conditioned on one of the four prompt templates.
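The probe above can be sketched in a few lines. In this sketch, `greedy_continuation` is a hypothetical stand-in for greedy decoding with whichever LM is under test; the toy `fake_lm` below is not a real model and is used only to make the metric concrete.

```python
# Sketch of the last-name prediction probe. `greedy_continuation` is any
# function mapping a prompt to its greedily decoded continuation; the
# toy `fake_lm` stands in for a real LM such as GPT2.

TEMPLATES = {
    "minimal": "{name}",
    "news": "A new report from CNN says that {name}",
    "history": "A newly published biography of {name}",
    "informal": "I want to introduce you to my best friend, {name}",
}

def fake_lm(prompt):
    # Toy stand-in: continues "Donald" with "Trump", everything else generically.
    return "Trump said" if prompt.endswith("Donald") else "is a person"

def last_name_hit(greedy_continuation, template, given, last):
    """True if greedy decoding yields the entity's last name, optionally
    preceded by a middle initial (e.g., "B. Spencer")."""
    tokens = greedy_continuation(template.format(name=given)).split()
    if not tokens:
        return False
    if tokens[0].strip(',.') == last:
        return True
    return (len(tokens) >= 2 and len(tokens[0]) == 2
            and tokens[0].endswith(".") and tokens[1].strip(',.') == last)

def hit_rate(greedy_continuation, template, entities):
    """Percentage of (given, last) pairs whose last name is greedily generated."""
    hits = sum(last_name_hit(greedy_continuation, template, g, l) for g, l in entities)
    return 100.0 * hits / len(entities)
```

With the toy model, `hit_rate(fake_lm, TEMPLATES["news"], [("Donald", "Trump"), ("Elizabeth", "Warren")])` yields 50.0.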
Overall, the GPT2 models (in particular, GPT2-XL), which are trained on web text (including news but excluding Wikipedia), are vastly more likely than other models to predict named entities from the news, across all prompts. The GPT2 models are also very likely to predict named entities from history, but primarily when conditioned on the History prompt. By contrast, the TransformerXL model, trained on Wikipedia articles, is overall more likely to predict historical named entities than any other model, and is substantially more likely to predict historical entities than news entities. The GPT model, trained on fiction, is the least likely of any model to generate named entities from the news. These results clearly demonstrate that (1) named entity grounding effects vary greatly across different LMs, and (2) these differences are likely at least partially attributable to differences in training data genre.

Table 3 focuses on GPT2-XL and shows the next-word prediction for 8 given names of named entities frequently appearing in the U.S. news media, which are also common in the general population. Due to the contextual nature of LMs, the prompt type affects the last-name probabilities. Intuitively, generating the last name of an entity seems appropriate and expected in news-like contexts ("A new report from CNN says that [NAME]") but less so in more personal contexts ("I want to introduce you to my best friend, [NAME]"). Indeed, Table 3 demonstrates that grounding effects are strongest in news-like contexts; however, these effects are still clearly present across all contexts, appropriate or not, for more prominent named entities in the U.S. media (Donald, Hillary, and Bernie). When prompted with the given name only, GPT2-XL predicts the last name of a prominent named entity in all but one case (Elizabeth). In three cases, the corresponding probability is well over 50% (Clinton, Trump, Sanders), and in one case it generates the full name of a white supremacist, Richard B. Spencer.

Table 4: Top 10 most predictable names from the "is a" endings for each model, using Nucleus sampling with p = 0.9 and limiting the number of generated tokens to 150. Bold entries mark given names that appear frequently in the media. Bottom: mean and STD of scores.

Given Name Recovery
Given a text discussing a certain person, can we recover their (masked) given name? Our hypothesis was that recovery would be more feasible for a given name prone to grounding, due to unique terms that appear across multiple texts discussing that person.
To answer this question, we compiled a list of the 100 most frequent male and female names in the U.S., to which we added the first names of the most discussed people in the media (Section 2). Using the template "[NAME] is a", we generated 50 endings of 150 tokens for each name with each of the generator LMs (Table 1), using Nucleus sampling (Holtzman et al., 2019) with p = 0.9. For each pair of same-gender given names, we trained a binary SVM classifier using the Scikit-learn library (Pedregosa et al., 2011) to predict the given name from the TF-IDF representation of the endings, excluding the name itself. Finally, we computed the average of pairwise F1 scores as a single score per given name.

Table 4 displays the top 10 names with the most distinguishable "is a" endings. Bold entries mark given names of media entities, most prominent in the GPT2 models, trained on web text. Apart from U.S. politicians, Virginia (the name of a state) and Irma (a widely discussed hurricane) are also predictable, presumably due to their other senses. The results are consistent for different generation lengths and sampling strategies (see Section B).

Figure 1 illustrates the ease of distinguishing texts discussing Hillary from others (GPT2-large). We masked the name ("[MASK] is a..."), computed the BERT vectors, and projected them to 2D using t-SNE (Maaten and Hinton, 2008). Similar results were observed for texts generated by other GPT2 models, for different names (e.g., Donald, Bernie), and with other input representations (TF-IDF).
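The pairwise probe can be illustrated with a minimal Scikit-learn sketch. The endings below are tiny synthetic stand-ins for the 50 generated endings per name; in the actual experiment the name itself is excluded from the representation, and the final per-name score averages F1 over all same-gender pairs.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def pairwise_f1(endings_a, endings_b, folds=2):
    """Cross-validated F1 of an SVM separating two names' endings,
    represented as TF-IDF vectors (names assumed already removed)."""
    texts = endings_a + endings_b
    labels = [0] * len(endings_a) + [1] * len(endings_b)
    vectors = TfidfVectorizer().fit_transform(texts)  # fit on all texts, for brevity
    return cross_val_score(LinearSVC(), vectors, labels,
                           cv=folds, scoring="f1").mean()

# Synthetic endings: one name grounded to a politician, one generic.
grounded = ["is a former secretary of state facing an email scandal"] * 4
generic = ["is a friendly teacher who loves the community garden"] * 4
score = pairwise_f1(grounded, generic)  # easily separable, so close to 1.0
```

A grounded name's endings share entity-specific vocabulary, so its average pairwise F1 is high; a hypothetical person's endings are harder to tell apart from other names'.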

Sentiment Analysis
Following Prabhakaran et al. (2019), we can expect endings (§3) discussing specific named entities to be associated with sentiment more consistently than those discussing hypothetical people. We predict sentiment using the AllenNLP sentiment analyzer (Gardner et al., 2018) trained on the Stanford Sentiment Treebank (Socher et al., 2013). Table 5 displays the top 10 most negative given names for each LM, where the per-name score is the average of negative sentiment scores over its endings.

Table 5: Top 10 names with the most negative sentiment for their "is a" endings on average, for each model. Bold entries mark given names that appear frequently in the media. Bottom: mean and STD of average negative scores. Endings were generated using Nucleus sampling with p = 0.9 and limiting the number of generated tokens to 150.
Again, many of the top names are given names of people discussed in the media, mainly U.S. politicians, and more so in the GPT2 models. We found the variation among the most positive scores to be low. We conjecture that LMs typically default to generating neutral texts about hypothetical people.
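The averaging-and-ranking step can be sketched as follows; `mock_negative_prob` is a hypothetical keyword-based stand-in for the AllenNLP sentiment model, used here only to make the per-name score concrete.

```python
def mock_negative_prob(text):
    """Toy stand-in for a sentiment model's negative-class probability."""
    negative_words = {"reckless", "corrupt", "dangerous", "irresponsible"}
    words = [w.strip('.,"') for w in text.lower().split()]
    return min(1.0, sum(w in negative_words for w in words) / 3)

def name_negativity(endings, negative_prob=mock_negative_prob):
    """Per-name score: mean negative sentiment over the name's endings."""
    return sum(negative_prob(e) for e in endings) / len(endings)

def most_negative_names(endings_by_name, top=10):
    """Rank names by their average negative sentiment, most negative first."""
    scores = {name: name_negativity(ends) for name, ends in endings_by_name.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top]
```

With endings like those shown in Table 13, a grounded name such as Hillary would rank above a hypothetical person's neutral endings.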

Effect on Downstream Tasks
Pre-trained LMs are now used as a starting point for a vast array of downstream tasks (Raffel et al., 2019), raising concerns about unintended consequences in such models. To study one aspect of this, we construct a set of 26 question-answer probe templates with [NAME1] and [NAME2] slots. We populate the templates with pairs of same-gender names sampled from the list in §2, and evaluate the expanded templates on a set of LMs fine-tuned for either SQuAD or Winogrande, counting how often the model's answer changes when the two names are swapped in the template (flips). Table 6 and Table 7 present the top names contributing to the name-swap fragility and the overall LM scores. SQuAD models exhibit a significant effect for all LMs, from weak to strong. Conversely, Winogrande models are mostly insulated from this effect. We speculate that the nature of the Winogrande training set, having exposed the models to many examples of names used in a generic fashion, has helped remove the inherent artifacts associated with names.
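The flip measurement can be sketched as follows. Here `answer` is a hypothetical wrapper around a fine-tuned QA model: a probe flips when the model's answer maps to a different role (first vs. second slot) once the names are swapped, and `biased_qa` is a toy model exhibiting a name artifact.

```python
def role_of(ans, name1, name2):
    """Map a surface answer back to the template slot it refers to."""
    return "first" if ans == name1 else "second" if ans == name2 else "other"

def flips(answer, template, question, name1, name2):
    """True if swapping the names changes which slot the model answers."""
    original = role_of(answer(template.format(NAME1=name1, NAME2=name2), question),
                       name1, name2)
    swapped = role_of(answer(template.format(NAME1=name2, NAME2=name1), question),
                      name2, name1)
    return original != swapped

def flip_rate(answer, probes, name_pairs):
    """Percentage of (probe, name pair) combinations that flip."""
    results = [flips(answer, t, q, a, b) for t, q in probes for a, b in name_pairs]
    return 100.0 * sum(results) / len(results)

def biased_qa(passage, question):
    # Toy model with a name artifact: always answers "Donald" if present.
    return "Donald" if "Donald" in passage else passage.split()[0]
```

With the toy model, a probe over (Donald, John) flips, because the model answers "Donald" in both directions, while (Mary, Sue) is answered consistently by role.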
We also note that extra pre-fine-tuning on RACE, although not helping noticeably with the original task, seems to increase robustness to name swaps.

Related Work

Named Entities. Field and Tsvetkov (2019) used pre-trained LMs to analyze power, sentiment, and agency aspects of entities, and found the representations were biased towards the LM training corpus. In particular, frequently discussed entities such as politicians biased the representations of their given names. Prabhakaran et al. (2019) showed that bias reflected in the language describing named entities is encoded into their representations.

Ethical Considerations and Conclusion
We explored biases in pre-trained LMs with respect to given names and the named entities that share them. We discuss two types of ethical considerations pertaining to this work: (1) the limitations of this work, and (2) the implications of our findings.
Our methodology has a number of limitations that should be considered in understanding the scope of our conclusions. First, we evaluated only English LMs, so we cannot assume these results will extend to LMs in other languages. Second, the lists of names we use to analyze these models are not broadly representative of English-speaking populations. The list of most common given names in the U.S. is over-representative of stereotypically white and Western names. The list of most frequently named people in the media, as well as A&E's (subjective) list of most influential people of the millennium, are both male-skewed, owing to many sources of gender bias, both historical and contemporary. For our last-name prediction experiment, we are forced to filter out named entities whose given names do not precede the surname, a cultural assumption that excludes naming conventions from many languages, such as Chinese and Korean. We used statistical resources that treat gender as a binary construct, which is a reductive view of gender. We hope future work may better address this limitation, as in the work of Cao and Daumé III (2019). Finally, there are many other important types of biases pertaining to given names that we do not focus on, including biases on the basis of perceived race or gender (e.g., Bertrand and Mullainathan, 2004; Moss-Racusin et al., 2012). While our experiments shed light on artifacts of certain common U.S. given names, an equally important question is how LMs treat very uncommon names, effects that would disproportionately impact members of minority groups.
What this work does do, however, is shed light on a particular behavior of pre-trained LMs that has potential ethical implications. Pre-trained LMs do not treat given names as interchangeable or anonymous; this has implications not only for the quality and accuracy of systems that employ these LMs, but also for the fairness of those systems. Furthermore, as we observed with GPT2-XL's free-form production of a white supremacist's name conditioned only on a common given name (Richard), further inquiry into the sources of training data of these models is warranted.
References

Marianne Bertrand and Sendhil Mullainathan. 2004. Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. American Economic Review, 94(4).

A Lists of Given Names
Tables 8 and 9 specify the given names used in this paper for females and males, respectively, along with the named entities sharing each given name, and the sections of the experiments in which they were included (2: last name prediction; 3: given name recovery; 4: sentiment analysis; 5: effect on downstream tasks).

B Given Name Prediction
In Section 3 we presented the most predictable given names from the generated texts, using Nucleus sampling with p = 0.9 and limiting the number of generated tokens to 150. Here we present the results with different hyperparameters. Specifically, Tables 10 and 11 display the results for different lengths, 75 and 300 respectively, while Table 12 shows the results with length 150 and top-k sampling with k = 25. The results are highly consistent across hyperparameter values. We omitted the results for beam search because it tends to generate very homogeneous texts for each name, making it trivial to classify all the names. Table 13 shows the most negative "is a" endings generated by GPT2-small for some of the people with the most negative average sentiment.

C Sentiment Analysis
In Section 4 we presented the most negative given names based on the generated texts, using Nucleus sampling with p = 0.9 and limiting the number of generated tokens to 150. Here we present the results with different hyperparameters. Specifically, Tables 16 and 14 display the results for different lengths, 75 and 300 respectively, while Table 15 shows the results with length 150 and top-k sampling with k = 25. The results are highly consistent across hyperparameter values. Figure 3 shows 6 (out of 26) example name-swap probing templates, along with the most affected given names for each model.

[Tables 8 and 9: given names (female and male, respectively) with their associated named entities from the media and history, and the experiment sections (2, 3-4, 5) in which each name was included.]

Table 10: Top 10 most predictable names from the "is a" endings for each model, using Nucleus sampling with p = 0.9 and limiting the number of generated tokens to 75. Bold entries mark given names that appear frequently in the media. Bottom: mean and STD of scores.

Table 11: Top 10 most predictable names from the "is a" endings for each model, using Nucleus sampling with p = 0.9 and limiting the number of generated tokens to 300. Bold entries mark given names that appear frequently in the media. Bottom: mean and STD of scores.

Table 12: Top 10 most predictable names from the "is a" endings for each model, using top-k sampling with k = 25 and limiting the number of generated tokens to 150. Bold entries mark given names that appear frequently in the media. Bottom: mean and STD of scores.
Table 13: The most negative "is a" endings generated by GPT2-small for some of the people with the most negative average sentiment.

Bernie is a progressive who would advocate for a $15 minimum wage as president.638 SHARES Facebook Twitter Google Whatsapp Pinterest Print Mail Flipboard A prominent think tank official conceded that Donald Trump is not a bigot after he asked people at the Trump rally to "go back to their country" before the audience voted on Donald Trump. Video: He was speaking at a rally in Springfield, Illinois on Thursday. At the rally, Trump has referred to immigrants as "rapists" and compared Muslim Americans to those from "disaster-ridden countries." Trump has referred to Mexican immigrants as rapists, drug dealers, and criminals. Video: The official at the conservative Manhattan Institute said, "Donald Trump has earned the right

Donald is a "dangerous ideologue" and has advocated "leaving the EU" -calling for a "popular uprising" that would return him to office. Donald Trump and other Republicans plan to rename the Supreme Court, claiming it should be abolished, since it has usurped the authority of the lower federal courts in deciding national security issues. However, in the first month of the Trump administration, the federal courts have already replaced four of the nine judges on the current nine-member court with Hillary Clinton appointees, and the judge appointed by Barack Obama has prevented a deportation injunction granted by a federal district court against a pro-immigration defendant from taking effect. Much of Trump's court-reforming rhetoric has involved his arguments that the liberal judiciary has

Hillary is a most reckless candidate. She shouldn't have the guts to mention, let alone say, that Russia is working with Donald Trump. Don't the people know better? She's one of the most irresponsible politicians in this country." Hillary's blatant corruption has been reported for years. It would not be the first time for a politician to praise Vladimir Putin for allegedly manipulating or exploiting his people.
Table 14: Top 10 names with the most negative sentiment for their "is a" endings on average, for each model. Bold entries mark given names that appear frequently in the media. Bottom: mean and STD of average negative scores. Endings were generated using Nucleus sampling with p = 0.9 and limiting the number of generated tokens to 300.

Table 15: Top 10 names with the most negative sentiment for their "is a" endings on average, for each model. Bold entries mark given names that appear frequently in the media. Bottom: mean and STD of average negative scores. Endings were generated using top-k sampling with k = 25 and limiting the number of generated tokens to 150.

Table 16: Top 10 names with the most negative sentiment for their "is a" endings on average, for each model. Bold entries mark given names that appear frequently in the media. Bottom: mean and STD of average negative scores. Endings were generated using Nucleus sampling with p = 0.9 and limiting the number of generated tokens to 75.