Analytical Methods for Interpretable Ultradense Word Embeddings

Word embeddings are useful for a wide variety of tasks, but they lack interpretability. By rotating word spaces, interpretable dimensions can be identified while preserving the information contained in the embeddings. In this work, we investigate three methods for making word spaces interpretable by rotation: Densifier (Rothe et al., 2016), linear SVMs and DensRay, a new method we propose. In contrast to Densifier, DensRay can be computed in closed form and is hyperparameter-free, and thus more robust. We evaluate the three methods on lexicon induction and set-based word analogy. In addition, we provide qualitative insights as to how interpretable word spaces can be used for removing gender bias from embeddings.


Introduction
Distributed representations of words have been of interest in natural language processing for many years; word embeddings in particular have been highly successful. On the downside, embeddings are generally not interpretable. But interpretability is desirable for several reasons. i) Semantically or syntactically similar words can be extracted, e.g., for lexicon induction. ii) Interpretable dimensions can be used to evaluate word spaces by examining which information is covered by the embeddings. iii) Computational advantage: for a high-quality sentiment classifier only a couple of dimensions of a high-dimensional word space are relevant. iv) By removing interpretable dimensions one can remove unwanted information (e.g., gender bias). v) Most importantly, interpretable embeddings support the goal of interpretable deep learning models.
Orthogonal transformations have been of particular interest in the literature. The reason is twofold: first, under the assumption that existing word embeddings are of high quality, one would like to preserve the original embedding structure, and orthogonal transformations preserve the original distances. Second, Park et al. (2017) provide evidence that rotating existing dense word embeddings achieves the best performance across a range of interpretability tasks.
In this work we modify the objective function of Densifier (Rothe et al., 2016) such that a closed form solution becomes available. We call this method DensRay. Following Amir et al. (2015) we compute simple linear SVMs, which we find to perform surprisingly well. We compare these methods on the task of lexicon induction.
Further, we show how interpretable word spaces can be applied to other tasks: first, we use interpretable word spaces for debiasing embeddings; second, we show how they can be used for solving the set-based word analogy task. To this end, we introduce the set-based method IntCos, which is closely related to LRCos introduced by Drozd et al. (2016). We find IntCos to perform comparably to LRCos, but to be preferable for analogies that are hard to solve.
Our contributions are: i) We modify Densifier's objective function and derive an analytical solution for computing interpretable embeddings. ii) We show that the analytical solution performs as well as Densifier but is more robust. iii) We provide evidence that simple linear SVMs are best suited for the task of lexicon induction. iv) We demonstrate how interpretable embedding spaces can be used for debiasing embeddings and solving the set-based word analogy task. The source code of our experiments is available.


Methods

Notation
We consider a vocabulary V := {v_1, v_2, ..., v_n} together with an embedding matrix E ∈ R^{n×d} where d is the embedding dimension. The ith row of E is the vector e_i; the vector corresponding to a word w is denoted e_w. We require an annotation for a specific linguistic feature (e.g., sentiment) and denote this annotation by l : V → {−1, 1}. The objective is to find an orthogonal matrix Q ∈ R^{d×d} such that EQ is interpretable, i.e., the values of the first k dimensions correlate well with the linguistic feature. We refer to the first k dimensions as the interpretable ultradense word space. We interpret x ∈ R^n as a column vector and x^⊤ as a row vector. Further, we normalize all word embeddings with respect to the Euclidean norm.
Densifier (Rothe et al., 2016) solves the following optimization problem:

argmax_q  α_≠ Σ_{(v,w) ∈ L_≠} ||q^⊤ d_{vw}||_2  −  α_= Σ_{(v,w) ∈ L_=} ||q^⊤ d_{vw}||_2

subject to q^⊤ q = 1 and q ∈ R^d, where d_{vw} := e_v − e_w, L_≠ := {(v,w) ∈ V × V : l(v) ≠ l(w)}, L_= := {(v,w) ∈ V × V : l(v) = l(w)}, and α_=, α_≠ ∈ [0, 1] are hyperparameters. We now modify the objective function: we use the squared Euclidean norm instead of the Euclidean norm, something that is frequently done in optimization to simplify the gradient. The problem then becomes

argmax_q  α_≠ Σ_{(v,w) ∈ L_≠} ||q^⊤ d_{vw}||_2^2  −  α_= Σ_{(v,w) ∈ L_=} ||q^⊤ d_{vw}||_2^2    (1)

subject to q^⊤ q = 1. Using ||x||_2^2 = x^⊤ x together with associativity of the matrix product we can simplify this to maximizing q^⊤ A q subject to q^⊤ q = 1, where A := α_≠ Σ_{(v,w) ∈ L_≠} d_{vw} d_{vw}^⊤ − α_= Σ_{(v,w) ∈ L_=} d_{vw} d_{vw}^⊤.
Thus we aim to maximize the Rayleigh quotient of A and q. Note that A is a real symmetric matrix. Then it is well known that the eigenvector belonging to the maximal eigenvalue of A solves the above problem (cf. Horn et al. (1990, Section 4.2)). We call this analytical solution DensRay.
A second dimension that is orthogonal to the first and encodes the linguistic feature second most strongly is given by the eigenvector corresponding to the second largest eigenvalue. More generally, the matrix whose columns are the eigenvectors of A, ordered by decreasing eigenvalue, yields the desired matrix Q for k > 1 (cf. Horn et al. (1990, Section 4.2)). Because A is a real symmetric matrix, Q is always orthogonal.
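A minimal sketch of this closed-form computation follows (assuming numpy; the function and the default pair weighting follow the description in this section, but the naming is ours and this is not the authors' released code):

```python
import numpy as np

def densray(E, labels, alpha_neq=None, alpha_eq=None):
    """Closed-form DensRay: return an orthogonal matrix Q whose first
    columns correlate most strongly with a binary linguistic feature.

    E      : (n, d) matrix of length-normalized word vectors
    labels : length-n array with values in {-1, +1}
    """
    E = np.asarray(E, dtype=float)
    labels = np.asarray(labels)
    pos, neg = E[labels == 1], E[labels == -1]

    def scatter(X, Y):
        # Sum of outer products of all pairwise difference vectors d_vw = x - y.
        # Expanding (x - y)(x - y)^T and summing avoids materializing all pairs.
        # When X is Y, self-pairs contribute zero and only negligibly affect averaging.
        nx, ny = len(X), len(Y)
        sx, sy = X.sum(axis=0), Y.sum(axis=0)
        gram = ny * X.T @ X + nx * Y.T @ Y - np.outer(sx, sy) - np.outer(sy, sx)
        return gram, nx * ny

    A_neq, n_neq = scatter(pos, neg)          # pairs with different labels
    A_pp, n_pp = scatter(pos, pos)
    A_nn, n_nn = scatter(neg, neg)
    A_eq, n_eq = A_pp + A_nn, n_pp + n_nn     # pairs with equal labels

    # Default weighting: average over pairs (cf. the geometric interpretation below).
    a_neq = 1.0 / n_neq if alpha_neq is None else alpha_neq
    a_eq = 1.0 / n_eq if alpha_eq is None else alpha_eq
    A = a_neq * A_neq - a_eq * A_eq

    # Eigenvectors of the symmetric matrix A, ordered by decreasing eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(A)
    order = np.argsort(eigvals)[::-1]
    Q = eigvecs[:, order]
    return Q                                  # interpretable space: E @ Q
```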

Comparison to Densifier
We have shown that DensRay is a closed-form solution to our new formalization of Densifier. This reformalization entails the following differences.
Case k > 1. Both methods, Densifier and DensRay, yield ultradense k-dimensional subspaces. While we show that the spaces are comparable for k = 1, we leave it to future work to examine how the subspaces differ for k > 1.
Multiple linguistic signals. Given multiple linguistic features, Densifier can obtain a single orthogonal transformation simultaneously for all linguistic features with chosen dimensions reserved for different features. DensRay can encode multiple linguistic features in one transformation only by iterative application.
Optimization. Densifier is based on solving an optimization problem using stochastic gradient descent with iterative orthogonalization of Q. DensRay, in contrast, is an analytical solution. Thus we expect DensRay to be more robust, which is confirmed by our experiments.

Geometric Interpretation
Assuming we normalize the difference vectors d_{vw}, one can interpret Eq. 1 as follows: we search for a unit vector q such that the square of the cosine similarity with d_{vw} is large if (v, w) ∈ L_≠ and small if (v, w) ∈ L_=. Thus, we identify dimensions that are parallel/orthogonal to difference vectors of words belonging to different/same classes. It seems reasonable to consider the average cosine similarity. Thus if n_≠, n_= are the numbers of elements in L_≠, L_=, one can choose α_≠ = n_≠^{−1} and α_= = n_=^{−1}.

Lexicon Induction
We show that DensRay and Densifier indeed perform comparably using the task of lexicon induction. We adopt Rothe et al. (2016)'s experimental setup and use their code for Densifier. Given a word embedding space and a sentiment/concreteness lexicon (binary or continuous scores, where we binarize continuous scores using the median), we identify a one-dimensional interpretable subspace. Subsequently we use the values along this dimension to predict a score for unseen words and report Kendall's τ rank correlation with the gold scores.
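A brief sketch of this evaluation protocol (assuming scipy; the lexicon format and helper names are ours, not part of the original setup):

```python
import numpy as np
from scipy.stats import kendalltau

def binarize_by_median(lexicon):
    """Map continuous gold scores to {-1, +1} by splitting at the median."""
    med = np.median(list(lexicon.values()))
    return {w: (1 if s > med else -1) for w, s in lexicon.items()}

def evaluate_lexicon_induction(E, vocab, q, test_lexicon):
    """Score unseen words by their value along the interpretable direction q
    and report Kendall's tau against the gold lexicon scores."""
    idx = {w: i for i, w in enumerate(vocab)}
    words = [w for w in test_lexicon if w in idx]
    predicted = np.array([E[idx[w]] @ q for w in words])   # ultradense value
    gold = np.array([test_lexicon[w] for w in words])
    tau, _ = kendalltau(predicted, gold)
    return tau
```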
To ensure comparability across methods we have redone all experiments in the same setting: we deduplicated lexicons, removed a potential train/test overlap and ignored neutral words in the lexicons. We set α_= = α_≠ = 0.5 to keep Densifier and DensRay comparable.
Additionally we report results for linear SVMs/SVRs, inspired by their good performance as demonstrated by Amir et al. (2015). While they did not use linear kernels, we require linear kernels to obtain interpretable dimensions. Naturally, the normal vector of the separating hyperplane of an SVM/SVR reflects an interpretable dimension. An orthogonal transformation can be computed by considering a random orthogonal basis of the null space of the interpretable dimension.

Table 1 shows the results. As expected, the performance of Densifier and DensRay is comparable (macro mean deviation of 0.001). We explain slight deviations between the results with the slightly different objective functions of DensRay and Densifier. In addition, the re-orthogonalization used in Densifier can result in an unstable training process.

Figure 1 assesses the stability by reporting mean and standard deviation for the concreteness task (BWK lexicon). We varied the size of the training lexicon as depicted on the x-axis and sampled 40 subsets of the lexicon with the prescribed size. For the sizes 512 and 2048 Densifier shows an increased standard deviation. This is because there is at least one sample for which the performance drops significantly. Removing the re-orthogonalization in Densifier prevents the drop and restores performance. Recent work (Zhao and Schütze, 2019) also finds that replacing the orthogonalization with a regularization is reasonable in certain circumstances. Given that DensRay and Densifier yield the same performance and DensRay is a stable closed-form solution that always yields an orthogonal transformation, we conclude that DensRay is preferable.
Surprisingly, simple linear SVMs perform best in the task of lexicon induction. SVR is slightly better when continuous lexica are used for training (line 8). Note that the eigendecomposition used in DensRay yields a basis with dimensions ordered by their correlation with the linguistic feature. An SVM can achieve this only by iterated application.
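A sketch of how an interpretable dimension and a full orthogonal transformation can be obtained from a linear SVM (assuming scikit-learn; the QR-based completion to an orthonormal basis is one possible realization of the null-space construction mentioned above, not necessarily the implementation used in our experiments):

```python
import numpy as np
from sklearn.svm import LinearSVC

def svm_rotation(E, labels, seed=0):
    """Use the normal vector of a linear SVM hyperplane as the interpretable
    dimension and complete it to an orthogonal transformation Q."""
    clf = LinearSVC().fit(E, labels)
    q1 = clf.coef_[0] / np.linalg.norm(clf.coef_[0])    # interpretable direction

    # Complete q1 to an orthonormal basis: orthonormalize random vectors
    # against q1 (equivalently, a basis of its null space).
    d = E.shape[1]
    rng = np.random.default_rng(seed)
    basis = np.concatenate([q1[None, :], rng.normal(size=(d - 1, d))], axis=0)
    Q, _ = np.linalg.qr(basis.T)          # columns form an orthonormal basis
    if Q[:, 0] @ q1 < 0:                  # QR may flip the sign of the first column
        Q[:, 0] *= -1
    return Q
```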

Removing Gender Bias
Word embeddings are well known for encoding prevalent biases and stereotypes (cf. Bolukbasi et al. (2016)). We demonstrate qualitatively that by identifying an interpretable gender dimension and subsequently removing it, one can remove part of the gender information that could cause bias in downstream processing. Given the original word space E we consider the interpretable space E′ := EQ, where Q is computed using DensRay. We denote by E′_{·,−1} ∈ R^{n×(d−1)} the word space with the first dimension removed and call it the "complement" space. We expect E′_{·,−1} to be a word space with less gender bias.
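A minimal sketch of the complement-space construction and the bias measure used below (Q is assumed to be a DensRay rotation trained on a gender lexicon; function names are ours):

```python
import numpy as np

def gender_debias(E, Q):
    """Rotate into the interpretable space and drop the first (gender) dimension."""
    E_rot = E @ Q                 # interpretable space E' = EQ
    return E_rot[:, 1:]           # complement space E'_{.,-1}, shape (n, d-1)

def bias_score(E, vocab, word, a="man", b="woman"):
    """Difference of cosine similarities to the two anchor words."""
    idx = {w: i for i, w in enumerate(vocab)}
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    w, ea, eb = E[idx[word]], E[idx[a]], E[idx[b]]
    return cos(w, ea) - cos(w, eb)
```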
To examine this approach qualitatively, we use a list of occupation names by Bolukbasi et al. (2016) and examine the cosine similarities of occupations with the vectors of "man" and "woman". Figure 2 shows the similarities in the original space E and the debiased space E′_{·,−1}. One can see that the similarities are closer to the identity line (i.e., equal similarity to "man" and "woman") in the complement space. To identify the occupations with the greatest bias, Table 3 lists the occupations for which sim(e_w, e_man) − sim(e_w, e_woman) is largest/smallest. One can clearly see a debiasing effect when considering the complement space. Extending this qualitative study to a more rigorous quantitative evaluation is part of future work.

Word Analogy
In this section we use interpretable word spaces for set-based word analogy. Given a list of analogy pairs [(a, a′), (b, b′), (c, c′), ...] the task is to predict a′ given a. Drozd et al. (2016) provide a detailed overview of different methods and find that their method LRCos performs best.
LRCos assumes two classes: all left elements of a pair (the "left class") and all right elements (the "right class"). A logistic regression (LR) is trained to differentiate between these two classes. The predicted score of the LR multiplied by the cosine similarity to the query word in the word space is the final score. The prediction for a′ is the word with the highest final score.
We train the classifier on all analogy pairs except for a single pair for which we then obtain the predicted score. In addition we ensure that no word belonging to the test analogy is used during training (splitting the data only on word analogy pairs is not sufficient).
Inspired by LRCos, we use interpretable word spaces for approaching word analogy: we train DensRay or an SVM on the class information described above to obtain interpretable embeddings E′ = EQ. We use a slightly different notation in this section: for a word w the ith component of its embedding is given by E′_{w,i}, and we denote by E′_{·,1} the first column of E′ (i.e., the most interpretable dimension). We min-max normalize E′_{·,1} such that words belonging to the right class have a high value (i.e., we flip the sign if necessary). For a query word a we now identify the corresponding a′ by solving

â′ = argmax_{v ∈ V} E′_{v,1} · sim(e_a, e_v),

where sim computes the cosine similarity.
Given the result from §4, we extend the above method by computing the cosine similarity in the orthogonal complement, i.e., sim(E′_{a,−1}, E′_{v,−1}). We call this method IntCos (INTerpretable, COSine). Depending on the space used for computing the cosine similarity, we add the word "Original" or "Complement" to the method name.
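A sketch of the IntCos scoring rule (our naming; the leave-one-out retraining of Q per test pair described above is omitted for brevity, and the first dimension is assumed to be already sign-aligned with the right class):

```python
import numpy as np

def intcos_predict(E, Q, vocab, query, exclude, space="original"):
    """Combine the min-max-normalized interpretable dimension (class membership)
    with the cosine similarity to the query word and return the top candidate."""
    idx = {w: i for i, w in enumerate(vocab)}
    E_rot = E @ Q                                   # interpretable space E' = EQ
    first = E_rot[:, 0]                             # assumed oriented towards the right class
    membership = (first - first.min()) / (first.max() - first.min())

    # Cosine either in the original space or in the orthogonal complement E'_{.,-1}.
    X = E if space == "original" else E_rot[:, 1:]
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = Xn @ Xn[idx[query]]

    scores = membership * sims
    scores[[idx[w] for w in exclude if w in idx]] = -np.inf   # mask training words
    scores[idx[query]] = -np.inf
    return vocab[int(np.argmax(scores))]
```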
We evaluate this method on two analogy datasets: the Google Analogy Dataset (GA) (Mikolov et al., 2013) and BATS (Drozd et al., 2016). As embedding spaces we use Google News embeddings (GN) (Mikolov et al., 2013) and fastText subword embeddings (FT) (Bojanowski et al., 2017). We consider the first 80k word embeddings from each space. Table 4 shows the results. The first observation is that there is no clear winner. IntCos Original performs comparably to LRCos, with slight improvements for GN/BATS: here the classes are widespread and exhibit low cosine similarity (IntraR and IntraL), which makes these analogies harder to solve. IntCos Complement maintains performance for GN/BATS and is beneficial for derivational analogies on GN. For most other analogies it harms performance.
Within IntCos Original it is favorable to use DensRay, as it gives slight performance improvements. Especially for harder analogies, where inter-class similarity is high and intra-class similarities are low (e.g., in GN/BATS), DensRay outperforms SVMs. In contrast to SVMs, DensRay also considers difference vectors within classes; this seems to be an advantage here.

Related Work
Identifying Interpretable Dimensions. Most relevant to our method is a line of work that uses transformations of existing word spaces to obtain interpretable subspaces. Rothe et al. (2016) compute an orthogonal transformation using shallow neural networks. Park et al. (2017) apply exploratory factor analysis to embedding spaces. Oriented principal component analysis (Diamantaras and Kung, 1996) is closely related to our method. However, both methods yield non-orthogonal transformations. Faruqui et al. (2015a) use semantic lexicons to retrofit embedding spaces. Thus they do not fully maintain the structure of the word space, in contrast to this work.

Interpretable Embedding Algorithms. Another line of work modifies embedding algorithms to yield interpretable dimensions (Koç et al., 2018; Luo et al., 2015; Shin et al., 2018; Zhao et al., 2018). There is also much work that generates sparse embeddings that are claimed to be more interpretable (Murphy et al., 2012; Faruqui et al., 2015b; Fyshe et al., 2015; Subramanian et al., 2018). Instead of learning new embeddings, we aim at making dense embeddings interpretable.

Conclusion
We investigated analytical methods for obtaining interpretable word embedding spaces. We examined the relevant methods on the tasks of lexicon induction, word analogy and debiasing.
Acknowledgments

We gratefully acknowledge funding through a Zentrum Digitalisierung.Bayern fellowship awarded to the first author. This work was supported by the European Research Council (# 740516).