Interpretable Word Embeddings via Informative Priors

Word embeddings have demonstrated strong performance on NLP tasks. However, their lack of interpretability and unsupervised nature have limited their use within computational social science and digital humanities. We propose the use of informative priors to create interpretable and domain-informed dimensions for probabilistic word embeddings. Experimental results show that sensible priors can capture latent semantic concepts better than or on par with the current state of the art, while retaining the simplicity and generalizability of using priors.


Introduction
Increased availability of large digitized corpora and significant developments in natural language processing (NLP) have sparked a growing interest within computational social science and digital humanities (CSSDH) in using computational methods for textual data (Laver et al., 2003; Grimmer, 2010; DiMaggio et al., 2013; Jockers and Mimno, 2013; Tsur et al., 2015). Word embeddings, a family of unsupervised methods for representing words as dense vectors (Mikolov et al., 2013b; Pennington et al., 2014), are one such development. Although word embeddings have demonstrated strong performance on NLP tasks (Mikolov et al., 2013a,c), they have yet to gain widespread attention within CSSDH.
We believe two key limitations help explain the lack of applications within CSSDH. First, since the dimensions of word embeddings are largely uninterpretable, it is not clear how to disentangle why words are similar. Substantive interpretability is key for CSSDH research, so its absence is a major limitation. Second, off-the-shelf word embedding models generally lack a channel through which substantive research questions can be incorporated.
To improve interpretability, previous research suggests using sparsity constraints (Murphy et al., 2012; Sun et al., 2016; Faruqui et al., 2015) and rotation techniques (Park et al., 2017; Rothe and Schütze, 2016; Dufter and Schütze, 2019). Other work considers dimension-specific constraints to remove gender bias and, as a by-product, improve interpretability (Zhao et al., 2018). Within CSSDH, previous work derives interpretable dimensions via post-processing, in the form of antonym-pair vector algebra (Kozlowski et al., 2018; Garg et al., 2018) and ideal-point anchoring of antonym word pairs (Lauretig, 2019). Recent formulations of word embeddings as probabilistic models (Vilnis and McCallum, 2015; Rudolph et al., 2016; Barkan, 2017; Havrylov et al., 2018) enable the incorporation of domain knowledge through priors. In this paper, we add to the literature on interpretable word embeddings by proposing a novel use of informative priors to create predefined interpretable dimensions, thus leveraging the expressiveness and generalizability of the probabilistic framework.
* Equal contribution

Informative Priors in Word Embeddings
The central idea of this paper is to use informative priors to restrict the degree to which different words can inhabit different dimensions, such that one or more dimensions become interpretable and connected to one's research interest. Specifically, we place informative priors on word types that we expect to discriminate on a particular dimension, e.g. man-woman for a gender dimension. Let $V^{+}$ and $V^{-}$ be the sets of anchor word types on which informative priors are placed, e.g. $V^{+} = \{\text{man}\}$ and $V^{-} = \{\text{woman}\}$. Further, let $V^{\pm} = V^{+} \cup V^{-}$, and let $\overline{V^{\pm}}$ denote its complement, i.e. the word types without any informative priors.
Given a corpus with vocabulary size $V$, we represent each token $x_i \in \{0,1\}^V$ as a one-hot vector with a single nonzero entry at $v$, where $v \in \{1, \dots, V\}$ is the word type at position $i$ in the text. Following Rudolph et al. (2016), we model each individual entry $x_{iv} \in \{0,1\}$ of the one-hot vectors conditional on its context $x_{c_i}$, i.e. the tokens surrounding $x_i$, where $c_i$ denotes the positions belonging to the context. Each word type is associated with an embedding vector $\rho_v \in \mathbb{R}^K$, which governs the distribution of $x_i$, and a context vector $\alpha_v \in \mathbb{R}^K$, which governs the distributions of the tokens for which $x_i$ is part of $x_{c_i}$. The conditional distribution of $x_{iv}$ is Bernoulli, with its natural parameter given by the inner product of the embedding vector and the sum of the context vectors, i.e.
$$x_{iv} \mid x_{c_i} \sim \mathrm{Bernoulli}(p_{iv}), \qquad \mathrm{logit}(p_{iv}) = \eta_{iv} = \rho_v^{\top} \sum_{j \in c_i} \sum_{v'} \alpha_{v'} x_{jv'}, \qquad (1)$$
with the logit serving as the link function. Rudolph et al. (2016) place a zero-centered Gaussian prior with variance $\sigma$ on the embedding and context vectors:
$$\rho_v, \alpha_v \sim \mathcal{N}(0, \sigma I_K). \qquad (2)$$
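The Bernoulli likelihood in Eq. (1) can be illustrated with a small pure-Python sketch. It is not the authors' TensorFlow implementation: the dictionary-based parameter storage, the function names and the explicit list of negative samples are illustrative assumptions. It computes the natural parameter $\eta_{iv}$ for an observed word and a negative-sampled log-likelihood of the kind used when fitting such models.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def sigmoid(z: float) -> float:
    # Inverse of the logit link in Eq. (1); no overflow guard, fine for small demos.
    return 1.0 / (1.0 + math.exp(-z))


def natural_parameter(rho: Dict[str, Vector], alpha: Dict[str, Vector],
                      word: str, context: Sequence[str]) -> float:
    """eta_iv = rho_v^T (sum of the context vectors alpha of the tokens in c_i)."""
    k = len(rho[word])
    context_sum = [0.0] * k
    for w in context:
        for d in range(k):
            context_sum[d] += alpha[w][d]
    return sum(r * c for r, c in zip(rho[word], context_sum))


def neg_sampled_loglik(rho: Dict[str, Vector], alpha: Dict[str, Vector],
                       observed: str, context: Sequence[str],
                       negatives: Sequence[str]) -> float:
    """log p(x_iv = 1) for the observed word plus log p(x_iv' = 0) for each sampled negative v'."""
    ll = math.log(sigmoid(natural_parameter(rho, alpha, observed, context)))
    for v in negatives:
        ll += math.log(1.0 - sigmoid(natural_parameter(rho, alpha, v, context)))
    return ll


# Toy example with K = 2 (hypothetical values).
rho = {"king": [1.0, 0.0], "queen": [0.0, 1.0]}
alpha = {"the": [1.0, -1.0], "royal": [0.5, 0.5]}
eta = natural_parameter(rho, alpha, "king", ["the", "royal"])
ll = neg_sampled_loglik(rho, alpha, "king", ["the", "royal"], ["queen"])
print(eta)  # 1.5
```

Summing the context vectors before taking the inner product mirrors the double sum in Eq. (1); in practice the gradient of this objective plus the log-prior from Eq. (2) would be maximized to obtain MAP estimates.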
In addition, since the evolution of semantic concepts is of special interest in CSSDH (Tahmasebi et al., 2015), we also consider a dynamic word embedding model to capture temporal dynamics in the dimension of interest. We follow the specification in Rudolph and Blei (2017), which extends Eq. (1) by associating each token with a time slice $t$ and fitting a separate embedding vector $\rho_v^{(t)}$ for each time slice, while the context vectors $\alpha_v$ are shared across time slices. To share statistical strength between time points, a Gaussian random walk prior is placed on $\rho_v^{(t)}$:
$$\rho_v^{(0)} \sim \mathcal{N}(0, \sigma I_K), \qquad \rho_v^{(t)} \sim \mathcal{N}(\rho_v^{(t-1)}, \sigma_d I_K), \qquad (3)$$
where $\sigma_d = \sigma / 100$, as in Rudolph and Blei (2017), determines the smoothness of the trajectories. This shows how, in contrast to the state of the art (Kozlowski et al., 2018; Garg et al., 2018; Lauretig, 2019; Zhao et al., 2018), informative priors allow easy integration with other, more complex, probabilistic models.
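The smoothing effect of the random walk prior can be checked numerically. The sketch below evaluates the log density of one word's trajectory under Eq. (3); the zero-mean initial slice with variance $\sigma$ is an assumption of this sketch, and the trajectories are hypothetical one-dimensional examples.

```python
import math
from typing import Sequence


def gaussian_logpdf(x: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def random_walk_log_prior(trajectory: Sequence[Sequence[float]],
                          sigma: float, sigma_d: float) -> float:
    """Log density of rho_v^(0), ..., rho_v^(T) under
    rho^(0) ~ N(0, sigma I) and rho^(t) ~ N(rho^(t-1), sigma_d I)."""
    lp = sum(gaussian_logpdf(x, 0.0, sigma) for x in trajectory[0])
    for prev, cur in zip(trajectory[:-1], trajectory[1:]):
        lp += sum(gaussian_logpdf(c, p, sigma_d) for p, c in zip(prev, cur))
    return lp


sigma = 1.0
sigma_d = sigma / 100               # smoothness setting quoted from Rudolph and Blei (2017)
smooth = [[0.0], [0.01], [0.02]]    # hypothetical trajectories over 3 time slices
abrupt = [[0.0], [0.5], [0.0]]
print(random_walk_log_prior(smooth, sigma, sigma_d) > random_walk_log_prior(abrupt, sigma, sigma_d))  # True
```

Raising $\sigma_d$ (e.g. to the 0.05 used in the case study) penalizes abrupt jumps less, which is why a larger value makes sudden semantic shifts easier to detect.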

The Standard Basis Prior
In the following sections, we introduce a number of prior specifications that differ in how they restrict the degree to which words can occupy different dimensions. Letting $K$ represent the dimension that we want to make interpretable, and dimensions $1\!:\!K-1$ be standard word embedding dimensions not subject to interpretation, we define our first prior specification, the Standard Basis Prior, as
$$\theta^{K}_{V^{+}} = \mathcal{N}(1, \gamma), \qquad \theta^{K}_{V^{-}} = \mathcal{N}(-1, \gamma), \qquad \theta^{1:K-1}_{V^{\pm}} = \mathcal{N}(0, \omega I_{K-1}), \qquad \theta^{1:K}_{\overline{V^{\pm}}} = \mathcal{N}(0, \sigma I_K),$$
where $\theta^{K}_{V^{+}}$ and $\theta^{K}_{V^{-}}$ are priors on the dimension of interest ($K$) of $\rho_v$ and $\alpha_v$ for word types in $V^{+}$ and $V^{-}$ respectively, $\theta^{1:K-1}_{V^{\pm}}$ is the shared prior for all anchor word types on dimensions $1\!:\!K-1$, and $\theta^{1:K}_{\overline{V^{\pm}}}$ is the standard prior from Eq. (2), placed on all dimensions for all non-anchor word types. Hyperparameters $\omega$ and $\gamma$ are shared for $v \in V^{\pm}$ and control the strength of the prior. As $\gamma, \omega \to 0$, we force each $v \in V^{\pm}$ to essentially become a (signed) standard basis vector, defined by the word types in the prior. Consequently, these vectors are zero on all dimensions except $K$, so their dot products with any other vector depend only on dimension $K$, and the effect of the anchored word types on the rest of the vocabulary is governed exclusively by $K$.
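As a sketch of how the Standard Basis Prior assigns a (mean, variance) pair to each coordinate, the function below encodes the four cases above. The function names and the example anchor sets are hypothetical; the strict setting $\gamma = \omega = 10^{-6}$ follows the values reported in this paper.

```python
import math
from typing import Sequence, Set, Tuple


def standard_basis_prior(word: str, k: int, K: int, v_pos: Set[str], v_neg: Set[str],
                         gamma: float, omega: float, sigma: float) -> Tuple[float, float]:
    """(mean, variance) of the prior on dimension k (1-indexed) of rho_v or alpha_v."""
    if word in v_pos or word in v_neg:
        if k == K:  # dimension of interest: centred at +1 for V+, -1 for V-
            return (1.0 if word in v_pos else -1.0), gamma
        return 0.0, omega  # dimensions 1:K-1 for anchor words
    return 0.0, sigma  # non-anchor words: standard prior from Eq. (2)


def log_prior(word: str, vec: Sequence[float], v_pos: Set[str], v_neg: Set[str],
              gamma: float, omega: float, sigma: float) -> float:
    """Sum of independent Gaussian log densities over all K dimensions."""
    K = len(vec)
    total = 0.0
    for k, x in enumerate(vec, start=1):
        mean, var = standard_basis_prior(word, k, K, v_pos, v_neg, gamma, omega, sigma)
        total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)
    return total


v_pos, v_neg = {"man", "he"}, {"woman", "she"}   # hypothetical anchor sets
strict = dict(gamma=1e-6, omega=1e-6, sigma=1.0)
print(standard_basis_prior("she", 3, 3, v_pos, v_neg, **strict))  # (-1.0, 1e-06)
print(log_prior("man", [0.0, 0.0, 1.0], v_pos, v_neg, **strict)
      > log_prior("man", [0.0, 0.0, -1.0], v_pos, v_neg, **strict))  # True
```

The same prior is applied to both $\rho_v$ and $\alpha_v$, so in a MAP fit this log-prior would simply be added for each of the two vectors.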
However, this implies that, as $\gamma, \omega \to 0$, all word types within $V^{+}$ (and likewise within $V^{-}$) obtain exactly the same word embedding. For example, with $V^{+} = \{\text{brother}, \text{king}\}$, brother is treated as semantically identical to king. To address this issue, we consider increasing $\omega$, allowing the anchor word types to exist more freely in the first $K-1$ dimensions while remaining close to $1$ and $-1$, respectively, in the $K$th dimension. This permits the use of both brother and king as prior anchors without assuming that they are exactly the same word. We henceforth refer to a Standard Basis Prior with $\omega = 10^{-6}$ as strict and one with $\omega = 1$ as weak.
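The collapse under the strict prior can be made concrete by drawing two $V^{+}$ anchors (here brother and king) from the prior and comparing their cosine similarity. This is only an illustration of the prior itself, not of fitted posterior embeddings, which also depend on the data; the helper names and the random seed are arbitrary choices of this sketch.

```python
import math
import random
from typing import List


def sample_anchor(rng: random.Random, K: int, sign: float,
                  gamma: float, omega: float) -> List[float]:
    """One draw from the Standard Basis Prior for an anchor word (not a posterior estimate)."""
    vec = [rng.gauss(0.0, math.sqrt(omega)) for _ in range(K - 1)]  # dimensions 1:K-1
    vec.append(rng.gauss(sign, math.sqrt(gamma)))                   # dimension K
    return vec


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


rng = random.Random(0)
K, gamma = 100, 1e-6
results = {}
for name, omega in [("strict", 1e-6), ("weak", 1.0)]:
    brother = sample_anchor(rng, K, 1.0, gamma, omega)  # both words are in V+
    king = sample_anchor(rng, K, 1.0, gamma, omega)
    results[name] = cosine(brother, king)
print(results)
```

Under the strict setting the two draws are almost parallel (cosine close to 1), whereas under the weak setting the 99 free dimensions dominate and the vectors are far from identical.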

The Truncated Prior
A limitation of the prior specifications introduced thus far is that they implicitly assume (i) that anchor word types discriminate equally on the dimension of interest, and (ii) that we know their exact location in this dimension. To address this, we consider the Truncated Prior, which does not assume a basis for the anchored word types, but only that they live on the positive and negative real line, respectively:
$$\theta^{K}_{V^{+}} = \mathcal{N}^{+}(0, \gamma), \qquad \theta^{K}_{V^{-}} = \mathcal{N}^{-}(0, \gamma),$$
where $\mathcal{N}^{+}$ and $\mathcal{N}^{-}$ denote the zero-mean normal distribution truncated at 0 to the positive and negative real line, respectively.
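A minimal sketch of the truncated log density, assuming a zero-mean normal with variance $\gamma$ truncated at 0 (the value $\gamma = 1000$ is the one reported for the experiments). The function name is hypothetical.

```python
import math


def truncated_normal_logpdf(x: float, var: float, positive: bool) -> float:
    """Log density of N(0, var) truncated at 0 to the positive (or negative) real line."""
    if (positive and x <= 0.0) or (not positive and x >= 0.0):
        return float("-inf")  # anchor words may not cross 0 on dimension K
    # Truncating a zero-mean normal at 0 keeps half of its mass, so the density doubles.
    return math.log(2.0) - 0.5 * (math.log(2.0 * math.pi * var) + x * x / var)


gamma = 1000.0
print(truncated_normal_logpdf(-0.1, gamma, positive=True))  # -inf
# With a large variance the density is nearly flat on the allowed side (difference ~0.0125):
print(truncated_normal_logpdf(0.1, gamma, True) - truncated_normal_logpdf(5.0, gamma, True))
```

With a large $\gamma$ the prior acts almost purely as a sign constraint, which is exactly the relaxation of assumptions (i) and (ii): anchors may sit anywhere on their side of 0.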

Using Neutral Words
So far, we have only considered word types known to discriminate on the dimension of interest as prior anchors. However, domain knowledge might also inform us about words that are neutral in this dimension (e.g. stop words). We thus consider placing informative priors on a third set of prior anchors, $V^{*}$, containing neutral word types:
$$\theta^{K}_{V^{*}} = \mathcal{N}(0, \psi),$$
where $\psi$ is the variance for dimension $K$. By guiding neutral words close to 0 in the dimension of interest, explanatory power that might otherwise have been attributed to word types in $V^{*}$ is instead allocated to other words in $x_{c_i}$.
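Combining the Truncated Prior with neutral anchors gives a four-way case split for the prior on dimension $K$. The sketch below uses hypothetical anchor sets (the actual word lists are in the supplementary material) and the reported values $\gamma = 1000$ and $\psi = 0.01$.

```python
import math
from typing import Set


def normal_logpdf(x: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def dim_k_log_prior(word: str, x: float, v_pos: Set[str], v_neg: Set[str],
                    v_neutral: Set[str], gamma: float, psi: float, sigma: float) -> float:
    """Prior log density of a word's coordinate x on dimension K (Truncated Prior + neutral words)."""
    if word in v_pos or word in v_neg:
        wrong_side = x <= 0.0 if word in v_pos else x >= 0.0
        if wrong_side:
            return float("-inf")
        return math.log(2.0) + normal_logpdf(x, 0.0, gamma)  # truncated N(0, gamma)
    if word in v_neutral:
        return normal_logpdf(x, 0.0, psi)  # neutral anchors are pulled towards 0
    return normal_logpdf(x, 0.0, sigma)  # standard prior from Eq. (2)


anchors = dict(v_pos={"good"}, v_neg={"bad"}, v_neutral={"the", "of"},
               gamma=1000.0, psi=0.01, sigma=1.0)
print(dim_k_log_prior("the", 0.5, **anchors) < dim_k_log_prior("table", 0.5, **anchors))  # True
```

A neutral word at 0.5 on dimension $K$ is penalized far more than an unanchored word at the same position, which is the mechanism that pushes explanatory power onto other context words.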

Experiments
Our main empirical concern is how well the proposed priors can capture meaningful latent dimensions. We summarize our empirical questions as follows:
1. Which prior specification can best capture predefined dimensions?
2. How does the best prior specification compare with the state of the art in CSSDH (Garg et al., 2018; Kozlowski et al., 2018), henceforth referred to as SOTA?
We consider two semantic dimensions: gender, which is explored in SOTA, and sentiment, a dimension that has proven difficult to capture in standard word embedding models (Tang et al., 2014). We follow SOTA when choosing prior anchors for gender, and use the AFINN dictionary (Nielsen, 2011) to find prior anchors for sentiment. To evaluate the effect of "few" vs. "many" prior anchors, we run experiments using between 2 and 276 anchor words, depending on the dimension of interest and the dataset at hand. All prior word types used in the experiments can be found in Sec. B in the supplementary material. Following Rudolph et al. (2016), we obtain maximum a posteriori estimates of the parameters using TensorFlow (Abadi et al., 2015) with the Adam optimizer (Kingma and Ba, 2015) and negative sampling. We set the size of the embeddings to K = 100 and use a context window size of 8 and σ = 1 throughout all experiments.
We examine the proposed priors using three English corpora of sizes typical for textual analysis within CSSDH: the top 100 list of books in Project Gutenberg (2019), a sample from Twitter (Go et al., 2009), and the U.S. Congress Speeches 1981-2016 (Gentzkow et al., 2018). The number of tokens ranges from 1.8M to 40M after preprocessing (see Sec. A in the supplementary material for details). The varied origins, sizes and contents of these datasets serve as a check of the effect of the priors across different types of corpora.
To measure the extent to which the inferred dimension reflects the semantic concept of interest, we consider how well a number of prespecified hold-out word types (not part of $V^{\pm}$) are placed in this particular dimension. Specifically, accuracy is computed as the fraction of hold-out words that are placed on the correct side of 0 on the dimension of interest in the embedding and context vectors. For the gender dimension, hold-out word types include the 200 most common names of the last century (Social Security Administration, 2019) and gendered words, while for sentiment, a sample of the AFINN dictionary is used (see Sec. C in the supplementary material). The number of hold-out word types ranges between 213 and 275, since not all of them exist in each corpus.
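The sign-based accuracy can be sketched as follows. How the embedding and context vectors are combined is not spelled out here, so this sketch assumes the metric is applied to one set of vectors at a time; the coordinates and labels are hypothetical.

```python
from typing import Dict


def sign_accuracy(dim_k: Dict[str, float], holdout: Dict[str, int]) -> float:
    """Fraction of hold-out words whose coordinate on dimension K has the expected sign.

    dim_k maps word -> coordinate on dimension K; holdout maps word -> +1 or -1.
    Hold-out words missing from the corpus vocabulary are skipped.
    """
    present = [w for w in holdout if w in dim_k]
    correct = sum(1 for w in present if dim_k[w] * holdout[w] > 0)
    return correct / len(present)


# Hypothetical coordinates; +1 = the V+ side, -1 = the V- side.
dim_k = {"john": 0.8, "mary": -0.3, "alex": 0.1}
holdout = {"john": 1, "mary": -1, "alex": -1, "zelda": -1}
print(sign_accuracy(dim_k, holdout))  # 2 of the 3 words present are on the correct side
```

Skipping absent words reflects why the number of hold-out word types varies between corpora.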
The experimental configurations are compared to SOTA, which we implement by (i) fitting the probabilistic word embedding model in Eqs. (1) and (2) without informative priors, and (ii) deriving the interpretable dimension post hoc by subtracting normalized embedding vectors of antonym pairs, e.g. $\rho_{\text{gender}} = (\rho_{\text{man}} - \rho_{\text{woman}}) + (\rho_{\text{he}} - \rho_{\text{she}})$. Comparing the sign of the cosine similarity between hold-out words and the resulting vector allows us to contrast the accuracy of our method with that of SOTA. To obtain a measure of uncertainty, we calculate binomial confidence intervals using the normal approximation.
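The SOTA baseline and the uncertainty estimate can be sketched in a few lines. The antonym-pair direction follows the example above; the 95% level (z = 1.96) for the normal-approximation interval, the toy vectors and all function names are assumptions of this sketch.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def antonym_direction(rho: Dict[str, Vector], pairs: Sequence[Tuple[str, str]]) -> Vector:
    """Sum of differences of normalized vectors, e.g. (man - woman) + (he - she)."""
    direction = [0.0] * len(next(iter(rho.values())))
    for pos, neg in pairs:
        for d, (p, q) in enumerate(zip(normalize(rho[pos]), normalize(rho[neg]))):
            direction[d] += p - q
    return direction


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def normal_approx_ci(correct: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation binomial interval for accuracy = correct / n, clipped to [0, 1]."""
    p = correct / n
    half = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


# Hypothetical 2-D vectors from a model fitted without informative priors.
rho = {"man": [1.0, 0.2], "woman": [-1.0, 0.2], "he": [0.9, 0.1], "she": [-0.8, 0.3],
       "john": [0.5, 0.5], "mary": [-0.6, 0.4]}
gender = antonym_direction(rho, [("man", "woman"), ("he", "she")])
print(cosine(rho["john"], gender) > 0, cosine(rho["mary"], gender) < 0)  # True True
print(normal_approx_ci(200, 250))  # hypothetical: 200 of 250 hold-out words correct
```

Only the sign of the cosine matters for the accuracy comparison, so the post hoc direction and the prior-based dimension $K$ are evaluated with the same correct-side-of-0 criterion.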

Results
We begin by comparing the strict and weak Standard Basis Priors, with $\gamma = 10^{-6}$, varying the number of anchor words used to inform the dimension of interest. The general pattern in Fig. 1 shows that (i) increasing the number of anchor words improves accuracy, and (ii) this improvement is much greater for the weak than for the strict Standard Basis Prior. We explain this difference by the nature of the strict prior: it forces all anchor words within $V^{+}$ (and within $V^{-}$) to have exactly the same meaning, which is clearly untrue. These results speak against methods that transform the vector space based on strict standard basis vectors, as in Lauretig (2019). Average accuracy is also noticeably higher for the Senate corpus, which subsampling experiments indicate is driven by corpus size.
[Figure 1: accuracy per corpus (Gutenberg, Twitter, Senate) and dimension (Gender, Sentiment) for the Few, Many and Neutral configurations]
Relaxing the location assumptions with the Truncated Prior ($\gamma = 1000$) and placing informative priors on neutral words ($\psi = 0.01$) yield varying degrees of improvement. However, due to the limited number of hold-out test words, only large improvements produce significant differences, which we observe only for the sentiment dimension in the Senate corpus. Fig. 1 further shows that our proposed approach generally performs better than or on par with SOTA. The one exception is the gender dimension in the Senate corpus. Follow-up analysis shows that this difference is driven by misclassification of a cluster of ambiguous names, e.g. Madison (founding father) and Jordan (country). These names are correctly classified when using the full vectors, suggesting that gender has not been completely isolated in the Senate corpus (see Gonen and Goldberg (2019) for an in-depth discussion of issues of concept isolation in word embeddings).

Case Study of Semantic Change
Using the Truncated Prior with neutral words, we leverage the dynamic embedding model described in Eq. (3) to explore temporal patterns in the Senate corpus. We set $\sigma_d = 0.05$ to allow for the identification of abrupt temporal changes.
The left panel of Fig. 2 displays sentiment trajectories for two words that experience drastic changes in the sentiment dimension over three consecutive years: September between 1999 and 2001, and Oklahoma between 1993 and 1995. These changes capture two terror events: the attacks on the World Trade Center on September 11th 2001, and the Oklahoma City bombing in 1995. The change precedes the event due to the smoothness of the prior. In the years that follow, the words gradually regain their previous sentiment, reflecting a decline in their association with terror.
Second, we test and find support for the gender-occupation results in Kozlowski et al. (2018), i.e. that occupations' temporal positioning in the gender dimension correlates well with the proportion of men and women within those fields. The right panel of Fig. 2 displays the gender-dimension trajectories for the words Nurse and Lawyer, occupations with high shares of women and men, respectively (Kozlowski et al., 2018).
In sum, these examples showcase how interpretability can be gained using informative priors. By isolating prespecified concepts, e.g. sentiment, into one dimension ($K$), one can infer word type-concept associations such as September-sentiment for all non-anchor word types, allowing for more meaningful within- and between-word-type comparisons.

Conclusion
In this paper, we show how informative priors provide a simple and straightforward alternative for constructing interpretable dimensions for probabilistic word embeddings, allowing CSSDH researchers to explore how words relate to each other in prespecified dimensions of interest, e.g. gender and sentiment.
Our results demonstrate that the biggest gains in interpretability are obtained by (i) using many prior words and (ii) increasing the degree to which they can live outside the predefined dimension. The weak Standard Basis Prior and the Truncated Prior capture the predefined dimensions with similar overall accuracy. In line with previous research, the experimental results indicate some issues with incomplete isolation of concepts, which we believe could be addressed in future work by placing informative priors on multiple dimensions to capture more complex concepts, and by moving away from simplistic antonym-driven dimension definitions.
Finally, while being flexible and easily extended to other probabilistic word embedding models, our approach performs better than or on par with SOTA.