Interpretable Textual Neuron Representations for NLP

Input optimization methods, such as Google Deep Dream, create interpretable representations of neurons for computer vision DNNs. We propose and evaluate ways of transferring this technology to NLP. Our results suggest that gradient ascent with a gumbel softmax layer produces n-gram representations that outperform naive corpus search in terms of target neuron activation. The representations highlight differences in syntax awareness between the language and visual models of the Imaginet architecture.


Introduction
Deep Neural Networks (DNNs) have led to advances in Natural Language Processing, but they are hard to interpret. This is partly due to the fact that their smallest components, i.e., neurons, lack interpretable representations.
For computer vision problems, Simonyan et al. (2014) propose to use gradient ascent to find an input image that maximizes the activation of a neuron of interest. Using these image representations, one can for instance show that lower level neurons in vision CNNs specialize in patterns such as stripes (Mordvintsev et al., 2015).
Applying gradient ascent input optimization to NLP is not straightforward, as discrete symbols are not open to continuous manipulation. A common alternative approach is to search existing corpora for optimal documents or n-grams (e.g., Kádár et al. (2017), Aubakirova and Bansal (2016)). As this strategy only covers the space of existing inputs, we suspect that it may lead to incorrect conclusions. For instance, the representation of a given neuron may suggest that syntax was learned, when in reality this is due to a lack of ungrammatical inputs in the corpus. Also, a neuron might attend to a set of concepts that do not usually appear together (e.g., it may fire in the presence of both food-related and sports-related words). In this case, a search-based representation may only reveal part of the whole picture.
In the following, we propose and test methods for gradient ascent input optimization in NLP. Our quantitative assessment suggests that one method, which is based on the gumbel softmax trick, produces inputs that are more highly activating than corpus search. By applying this method to the Imaginet architecture, we confirm that a language model pays attention to syntax to some degree, while a visual model looks for key content words and ignores function words.

Input optimization for NLP
In the following, we denote as $f(E)$ the activation of some neuron of interest when forward-feeding a sequence of embedding vectors $E = [e_1 \ldots e_T]$.

Embedding optimization
One straightforward approach to NLP input optimization is to treat $E$ like Simonyan et al. (2014) treat images, i.e., to apply gradient ascent directly to the embedding vectors, while keeping other model parameters constant: $\operatorname{argmax}_E f(E)$. However, there is no guarantee that the optimal vectors will correspond to the embedding vectors of real words, or even be close to them. In our experiments, the average cosine proximity to the closest real-word embedding is 0.24, suggesting that there is a divergence between the training goal (finding embedding vectors) and the real goal (finding a representation made up of real words).
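The procedure described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: a toy neuron (the mean of tanh(e_t · w) over positions) stands in for the real network, and the sizes, random matrices, step size and number of steps are all assumptions chosen for the example. The sketch runs gradient ascent on E and then maps each optimized vector to its nearest vocabulary embedding by cosine similarity.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, T = 50, 8, 5            # toy vocabulary size, embedding size, n-gram length (assumed)
M = rng.normal(size=(V, D))   # stand-in embedding matrix
w = rng.normal(size=D)        # weights of a hypothetical target neuron


def f(E):
    """Toy neuron: mean over positions of tanh(e_t . w)."""
    return float(np.tanh(E @ w).mean())


def grad_f(E):
    """Analytic gradient of f with respect to the embedding vectors E."""
    slope = (1.0 - np.tanh(E @ w) ** 2) / len(E)
    return slope[:, None] * w[None, :]


def optimize_embeddings(steps=200, lr=0.5):
    """Gradient ascent on E while the (toy) model parameters stay fixed."""
    E = rng.normal(scale=0.1, size=(T, D))
    for _ in range(steps):
        E = E + lr * grad_f(E)
    return E


def nearest_words(E):
    """Index and cosine similarity of the closest real embedding per position."""
    En = E / np.linalg.norm(E, axis=1, keepdims=True)
    Mn = M / np.linalg.norm(M, axis=1, keepdims=True)
    sims = En @ Mn.T
    return sims.argmax(axis=1), sims.max(axis=1)


E_opt = optimize_embeddings()
word_ids, cos = nearest_words(E_opt)
```

The returned cosine similarities show how far the optimized vectors lie from any real word, which is the mismatch the paragraph above points out.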

Word optimization
arXiv:1809.07291v1 [cs.CL] 19 Sep 2018
Note that the embedding operation can be written as $E = XM$, where $X \in \{0, 1\}^{T \times V}$ is a matrix of one-hot vectors and $M$ is the embedding matrix for all $V$ known words. If we relax the requirement that $X$ be one-hot, we can perform gradient ascent directly on $X$, while keeping $M$ constant: $\operatorname{argmax}_X f(XM)$. This approach has the undesirable effect that entries in $X$ can become very large or negative, and therefore unlike the one-hot vectors seen in training.
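The identity between embedding lookup and the matrix product of one-hot rows with the embedding matrix can be checked on a small example. The sizes, the random matrix and the word indices below are illustrative assumptions. The last lines show what relaxing the one-hot constraint means in practice: a row that spreads its mass over two words yields a weighted mixture of their embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 6, 4                      # toy vocabulary and embedding sizes (assumed)
M = rng.normal(size=(V, D))      # stand-in embedding matrix

word_ids = np.array([2, 0, 5])   # a 3-word input
X = np.eye(V)[word_ids]          # one-hot matrix of shape (T, V)

E_lookup = M[word_ids]           # usual embedding lookup
E_matmul = X @ M                 # the same embeddings written as a matrix product

# Relaxing the one-hot constraint: a row that splits its mass between
# words 2 and 5 yields a weighted mixture of their embeddings.
X_soft = X.copy()
X_soft[0] = 0.0
X_soft[0, 2], X_soft[0, 5] = 0.7, 0.3
E_soft = X_soft @ M
```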
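To enforce positive vectors that sum to one, we can use the softmax function across the vocabulary axis: $\operatorname{argmax}_X f(p^{\mathrm{smx}} M)$, where $p^{\mathrm{smx}}_{t,v} = \exp(x_{t,v}) / \sum_{v'} \exp(x_{t,v'})$. However, this input can still be unlike the inputs seen during training, as the optimal distribution may be smooth.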
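A small constructed example illustrates why a smooth optimum is a problem. The neuron below is hypothetical: it responds most strongly to an input halfway between the embeddings of two words. A softmax distribution that splits its mass between those two words activates the neuron strongly, but once the distribution is discretized with argmax, the activation drops. All sizes and values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 6, 4                          # toy vocabulary and embedding sizes (assumed)
M = rng.normal(size=(V, D))          # stand-in embedding matrix
center = 0.5 * (M[0] + M[1])         # hypothetical neuron prefers a blend of words 0 and 1


def f(e):
    """Toy neuron: peaks at 1.0 when the input embedding equals `center`."""
    return float(np.exp(-np.sum((e - center) ** 2)))


def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()


logits = np.array([5.0, 5.0, -5.0, -5.0, -5.0, -5.0])
p_smx = softmax(logits)                  # smooth: about half the mass on each of two words
soft_activation = f(p_smx @ M)           # activation of the relaxed input
hard_activation = f(M[p_smx.argmax()])   # activation after discretizing with argmax
```

In this construction the relaxed input is close to optimal for the toy neuron, while no single real word is, so the soft optimum says little about which discrete inputs the neuron prefers.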
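To remedy this situation, we use the gumbel softmax trick (Jang et al. (2017), Maddison et al. (2017)): $\operatorname{argmax}_X f(p^{\mathrm{gmb}} M)$, where $p^{\mathrm{gmb}}_{t,v} = \exp((x_{t,v} + g_{t,v}) / \tau) / \sum_{v'} \exp((x_{t,v'} + g_{t,v'}) / \tau)$, with $g_{t,v} = -\log(-\log(u_{t,v}))$ and $u_{t,v} \sim \mathrm{Uniform}(0, 1)$. The resulting probability distribution has the property that selecting its argmax is equivalent to sampling from the softmax distribution $p^{\mathrm{smx}}$. By slowly annealing $\tau$, we are able to transition from a smooth distribution to one where probability mass is highly concentrated, while at the same time avoiding instabilities caused by hard sampling (cf. Buckman and Neubig (2018)).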
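The following sketch shows gradient ascent through a gumbel softmax layer with an annealed temperature. It is an illustration under stated assumptions rather than the paper's implementation: a toy linear neuron replaces the network so that the gradient can be written analytically, and the geometric annealing schedule (from 1.0 to 0.1), learning rate and number of steps are choices made for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, T = 50, 8, 5            # toy vocabulary size, embedding size, n-gram length (assumed)
M = rng.normal(size=(V, D))   # stand-in embedding matrix
w = rng.normal(size=D)        # weights of a hypothetical linear target neuron
scores = M @ w                # activation contributed by each vocabulary word


def gumbel_softmax(X, tau):
    """Relaxed one-hot rows: softmax((X + g) / tau) with g ~ Gumbel(0, 1)."""
    u = rng.uniform(low=1e-9, high=1.0, size=X.shape)
    g = -np.log(-np.log(u))
    z = (X + g) / tau
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)


def f(P):
    """Toy linear neuron applied to the soft input E = P @ M."""
    return float(((P @ M) @ w).mean())


def grad_logits(P, tau):
    """Analytic gradient of f with respect to the logits X, noise held fixed."""
    expected = (P * scores).sum(axis=1, keepdims=True)
    return P * (scores - expected) / (tau * P.shape[0])


def optimize_words(steps=300, lr=1.0, tau_start=1.0, tau_end=0.1):
    X = np.zeros((T, V))
    for step in range(steps):
        # geometric annealing of the temperature (an assumed schedule)
        tau = tau_start * (tau_end / tau_start) ** (step / (steps - 1))
        P = gumbel_softmax(X, tau)
        X = X + lr * grad_logits(P, tau)
    return X


X_opt = optimize_words()
ngram_ids = X_opt.argmax(axis=1)          # discrete n-gram read off the logits
activation = f(np.eye(V)[ngram_ids])      # activation of the discrete n-gram
```

With a real network, the analytic gradient would be replaced by automatic differentiation; the noise sampling, temperature schedule and final argmax would stay the same in spirit.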

Model
We re-implement the Imaginet architecture from Kádár et al. (2017). It consists of a joint word embedding layer (embedding size 1024) and two separate unidirectional GRUs (hidden size 1024 each). One GRU serves as a language model, while the other predicts visual features of a scene described in the input sentence. The model is trained on 566,435 MSCOCO captions with visual features taken from Chrupała et al. (2017).

Quantitative evaluation
We evaluate the above-mentioned methods by the activation that their optimal representations achieve in target neurons. We assume that the higher the activation, the better the representation. For embedding optimization, representations are derived by finding the nearest real-word neighbor of the optimized embeddings in the embedding space. For word optimization, we take the argmax over the vocabulary dimension of X.

Projection layer
In the projection layer, we randomly select 160 target neurons and find an optimal representation for each one of them individually (Figure 1, upper boxplots). Note that in the language model, we maximize the linear pre-softmax score.

GRU hidden layer
In the GRU hidden layer, optimizing a single neuron is not very challenging, as the tanh activation function is easily saturated. Tang et al. (2016) report that, unlike LSTMs, GRUs use highly distributed activation patterns to convey meaningful signals. Therefore, we evaluate the methods by their ability to achieve high mean activation in disjoint groups of GRU hidden state neurons (Figure 1, lower boxplots). The groups are derived by hierarchical clustering with complete linkage. As the distance metric, we use negative activation correlation, as measured on n-grams from the corpus.
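The grouping step can be sketched with a naive agglomerative procedure. The function name, the synthetic activation matrix and the number of groups are assumptions for this example; the text does not state how the dendrogram is cut, so the sketch simply stops merging once a requested number of groups remains.

```python
import numpy as np


def complete_linkage_groups(activations, n_groups):
    """Group neurons (columns) using negative activation correlation as distance."""
    dist = -np.corrcoef(activations, rowvar=False)
    clusters = [[i] for i in range(dist.shape[0])]
    while len(clusters) > n_groups:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # complete linkage: distance between the two farthest members
                d = dist[np.ix_(clusters[a], clusters[b])].max()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 3))   # 200 synthetic "n-grams", 3 latent signals
# 12 toy neurons: neurons 0-3 follow factor 0, 4-7 factor 1, 8-11 factor 2
activations = np.repeat(factors, 4, axis=1) + 0.1 * rng.normal(size=(200, 12))
groups = complete_linkage_groups(activations, n_groups=3)
```

The nested loops are cubic in the number of neurons and are meant only for illustration; for a hidden layer of 1024 neurons a library routine would be the practical choice.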

Results
We find that while representations from embedding, logit and softmax optimization are not competitive, the gumbel softmax trick outperforms the corpus search strategy in terms of target neuron activation. Paired t-tests on the difference between corpus search and gumbel softmax representations were highly significant, with p < 0.001 in all cases: t = −23.5 (visual model projection layer), t = −33.6 (language model projection layer), t = −14.1 (visual model hidden layer), t = −21.7 (language model hidden layer). Table 1 shows optimal 5-grams for some neurons. We observe that, contrary to what corpus search suggests, optimal inputs for the visual model rarely contain function words, i.e., the model seems to ignore them. Optimal inputs for the language model sometimes display grammatically correct structures with function words directly before the predicted word (e.g., "stare to their [left]", "under an [umbrella]", see Table 1). This suggests that the language model pays attention to function words and has indeed learned some syntax, as suggested by Kádár et al. (2017).
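For reference, the statistic of a paired t-test can be computed as below. The activation values are made up for illustration and are not data from the experiments. The sketch assumes that differences are taken as corpus search minus gumbel softmax, in which case a negative t indicates higher activations for the gumbel softmax representations.

```python
import math
import statistics


def paired_t(a, b):
    """t statistic and degrees of freedom for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Made-up activations for five hypothetical target neurons (not paper data)
corpus_search = [3.1, 2.8, 4.0, 3.5, 2.9]
gumbel = [4.2, 3.9, 4.8, 4.9, 3.6]
t_stat, dof = paired_t(corpus_search, gumbel)
```

The p-value follows from the t distribution with the returned degrees of freedom, which a statistics library would normally supply.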

Qualitative observations
Furthermore, we observe that one neuron may pay attention to different concepts. For example, the "race" neuron in the language model is activated by both horse and motorbike racing words, as evidenced by the gumbel representation (Example 5 in Table 1).
Table 1: Examples of optimal 5-grams via corpus search and via gradient ascent with gumbel softmax. Spelling errors stem from the Imaginet dictionary.
973rd neuron ("left") in language model projection layer: "fest stares stares to their", activation 13.22

Conclusion
The gumbel softmax trick makes it possible to extend the input optimization method to NLP, and to find interpretable textual neuron representations via gradient ascent. Our experimental results suggest that this technique exceeds naive search on a large in-domain corpus in terms of target neuron activation. The representations also show interesting differences in syntax awareness based on target modality in Imaginet. Our code will be made available on https://github.com/NPoe/input-optimization-nlp.