Representation of Word Meaning in the Intermediate Projection Layer of a Neural Language Model

Performance in language modelling has been significantly improved by training recurrent neural networks on large corpora. This progress has come at the cost of interpretability and an understanding of how these architectures function, making principled development of better language models more difficult. We look inside a state-of-the-art neural language model to analyse how this model represents high-level lexico-semantic information. In particular, we investigate how the model represents words by extracting activation patterns where they occur in the text, and compare these representations directly to human semantic knowledge.


Introduction & Related Work
Language modelling involves learning to predict the next word in a sequence of words, using large text corpora as the training input. Language models must therefore learn to represent information from the preceding context which is relevant for future word prediction, and, intuitively, this should include information about the syntactic structure of the context and the meanings of constituent words. Today's state-of-the-art language models make use of Recurrent Neural Networks (RNNs) with Long Short-Term Memory cells (LSTMs) (Hochreiter and Schmidhuber, 1997), which can handle time-series information by remembering salient information in latent variables (Mikolov et al., 2010). Because of their wide applicability, there has been much interest in developing a better understanding of the inner workings of RNN models, and, in particular, researchers have investigated how syntactic knowledge is encoded and processed by such networks (Dyer et al., 2016; Linzen et al., 2016; Jozefowicz et al., 2016; McCoy et al., 2018; Gulordava et al., 2018). Karpathy et al. (2015) performed an in-depth analysis of the types of errors RNNs make, in order to understand how recurrent mechanisms can encode long-term dependency information. Linzen et al. (2016) present a more direct analysis, examining LSTM language models' ability to handle difficult long-range dependencies such as agreement between a verb and its noun subject. Recently, researchers have started to study the semantic embeddings generated by these networks (Chrupała et al., 2015), especially for models focused on encoding visual grounding (Kiela et al., 2017; Yoo et al., 2017). However, compared to syntax, there has been relatively little work on how LSTM networks represent lexical semantic knowledge.
In this work, we evaluate the latent semantic knowledge present in the LSTM activation patterns produced before and after the word of interest. We evaluate whether these activations predict human similarity ratings, human-derived property knowledge, and brain imaging data. In this way, we test the model's ability to encode important semantic information relevant to word prediction, and its relationship with human cognitive semantic representations.

Language Model Data
We make use of a state-of-the-art LSTM neural language model known as lm_1b (Jozefowicz et al., 2016), which consists of two LSTM layers followed by low-dimensional projections. To construct representations from the language model's LSTM projection layer, we first select a subset of 62.5 million sentences from the One Billion Word dataset (Chelba et al., 2013). We then choose a predefined set of target words, based on the overlap of the lm_1b vocabulary with the words used in three evaluation datasets, described in Section 3. To derive a model of the lexical representation for each of our target words, we sample 100 sentences in which that word occurs, and process each of those sentences with lm_1b. More specifically, at the position in the sentence where the word of interest has just been processed, we record the 1024-dimensional projection of the activations of the first LSTM layer, and we then average these vectors (one per sampled sentence) to obtain the final vector. On the assumption that the effects of context "average out" over the 100 sampled sentences for each word, we take this average vector to be a representation of the lexical content of the concept, independent of context. We also build a second model of lexical representation by recording the LSTM activations at the word just before the target word is presented to the network.
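The extraction procedure above can be sketched as follows. The function `projection_activations` is a hypothetical stand-in for the real lm_1b forward pass (here it just returns random vectors); in practice it would return the 1024-dimensional projection-layer activations for each token of a sentence:

```python
import numpy as np

DIM = 1024  # dimensionality of the lm_1b projection layer

def projection_activations(tokens):
    """Stand-in for running lm_1b over a sentence: returns one
    DIM-dimensional projection vector per token (random here)."""
    rng = np.random.default_rng(abs(hash(tuple(tokens))) % (2**32))
    return rng.standard_normal((len(tokens), DIM))

def word_representation(sentences, target, offset=0):
    """Average the projection activation at the target word's position
    (offset=0 -> 'after' model) or one token earlier (offset=-1 ->
    'before' model) over all sampled sentences."""
    vectors = []
    for tokens in sentences:
        i = tokens.index(target)
        if i + offset >= 0:
            acts = projection_activations(tokens)
            vectors.append(acts[i + offset])
    return np.mean(vectors, axis=0)

# Two toy sampled sentences; the paper uses 100 per target word
sentences = [["the", "green", "leaf", "fell"],
             ["a", "leaf", "grows", "on", "trees"]]
after = word_representation(sentences, "leaf", offset=0)
before = word_representation(sentences, "leaf", offset=-1)
```

The same averaging loop, run once with `offset=0` and once with `offset=-1`, yields the 'after' and 'before' lexical representations used in the evaluations below.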

Comparison to Similarity Judgments
We first investigate how well similarities between our model vectors predict human similarity judgments. We use WordSim353 (Finkelstein et al., 2001), a set of 353 word pairs with human similarity ratings. We split WordSim353 into semantic similarity and semantic relatedness subsets, following Agirre et al. (2009). On the hypothesis that the representations we derived from the language model reflect lexical content, we predicted that similarity, as calculated from the model, would correspond more closely to semantic similarity (i.e. shared hypernyms) than to semantic relatedness. We also anticipated that correlations with human judgments would be stronger for the 'after' model than for the 'before' model, since a word explicitly affects activations in the network only after it is encountered (however, the 'before' model provides an interesting test of whether lexical information can be predicted, drawing an analogy with models of human language comprehension (Kuperberg, 2016)).
For both the before and after models, correlations were stronger for the human semantic similarity ratings than for semantic relatedness, with the strongest correlation achieved for the 'after' model and similarity ratings (r=0.30). Furthermore, the after model corresponded more closely to the human similarities than the before model, though the before model still shows some correlation (r=0.21), indicating that the model may indeed encode information about upcoming concepts before they occur.
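One minimal way to compute such a comparison, assuming the averaged word vectors are already available, is to correlate model cosine similarities with human ratings. The vectors and ratings below are toy stand-ins, not actual WordSim353 data, and rank correlation is used as one plausible choice of statistic:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def spearman(x, y):
    """Spearman correlation = Pearson correlation on ranks
    (no tie handling; fine for the distinct toy values below)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])

# Toy stand-ins for averaged 'after' vectors and human ratings
rng = np.random.default_rng(0)
vectors = {w: rng.standard_normal(16)
           for w in ["tiger", "cat", "car", "journey", "coast", "shore"]}
pairs = [("tiger", "cat", 7.35), ("car", "journey", 5.85),
         ("coast", "shore", 9.10), ("tiger", "car", 4.50)]

model_sims = [cosine(vectors[a], vectors[b]) for a, b, _ in pairs]
human = [r for _, _, r in pairs]
rho = spearman(model_sims, human)
```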

Property Knowledge Prediction
To directly investigate how the language model encodes lexico-semantic content, we analysed whether the derived lexical representations can predict human-derived properties of the same concepts. We used a dataset of human-elicited property knowledge (the CSLB norms; Devereux et al. (2014)), which lists semantic properties for concepts (e.g. leaf has the properties is-green and grows-on-trees). To test how well the model representations can predict these properties, we largely follow Collell and Moens (2016) and Lucy and Gauthier (2017). For each property, we train an L2-regularized logistic regression classifier to predict whether that property is true of a given concept. We train two sets of logistic regression models, predicting properties from the vectors of the 'before' and 'after' models respectively. We use 5-fold cross-validation with stratified sampling to ensure that at least one positive case occurs in each validation fold. To obtain the final decodability score for a property under each model, we average the F1 scores over the test folds. Interestingly, semantic features were more decodable before the noun than after it.
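A sketch of this evaluation protocol for a single property, assuming scikit-learn is available; the concept vectors and binary property labels below are synthetic stand-ins for the averaged lm_1b vectors and CSLB annotations:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold

def property_f1(X, y, n_splits=5):
    """Mean F1 over stratified folds for predicting one binary
    semantic property from concept vectors."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
        clf.fit(X[train_idx], y[train_idx])
        pred = clf.predict(X[test_idx])
        scores.append(f1_score(y[test_idx], pred, zero_division=0))
    return float(np.mean(scores))

# Synthetic stand-in data: 60 concepts, 32-d vectors, one property
rng = np.random.default_rng(0)
y = (rng.random(60) < 0.4).astype(int)          # property labels
X = rng.standard_normal((60, 32)) + 0.8 * y[:, None]  # weak signal
score = property_f1(X, y)
```

Running this once per property, separately on the 'before' and 'after' vectors, yields the per-property decodability scores the section describes.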

Comparison to Brain Imaging Data
We compared the before and after representations from the language model to fMRI and MEG brain imaging data for 60 concepts available in BrainBench (Xu et al., 2016). We use the "2 vs. 2" test described in Xu et al. (2016) over all pairs of concepts to measure the correspondence between the models and the brain data. The 'before' and 'after' models perform similarly, though (somewhat surprisingly) the before model performs slightly better on the fMRI data than the after model. However, both models perform above chance, indicating that both are correlated with brain representations of the same nouns.
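The "2 vs. 2" test can be sketched as follows: for each pair of concepts, the correctly matched model/brain assignment should be more similar (here, by Pearson correlation, one common choice) than the swapped assignment, giving a chance accuracy of 0.5. The data below are synthetic stand-ins for the model and brain vectors:

```python
import numpy as np
from itertools import combinations

def corr(u, v):
    """Pearson correlation between two vectors."""
    return float(np.corrcoef(u, v)[0, 1])

def two_vs_two(model_vecs, brain_vecs):
    """Fraction of concept pairs whose correctly matched model/brain
    assignment correlates better than the swapped assignment."""
    hits, total = 0, 0
    for i, j in combinations(range(len(model_vecs)), 2):
        matched = corr(model_vecs[i], brain_vecs[i]) \
                + corr(model_vecs[j], brain_vecs[j])
        swapped = corr(model_vecs[i], brain_vecs[j]) \
                + corr(model_vecs[j], brain_vecs[i])
        hits += matched > swapped
        total += 1
    return hits / total

# Toy data: "brain" vectors are noisy copies of the model vectors,
# so accuracy should land well above the 0.5 chance level
rng = np.random.default_rng(0)
model = rng.standard_normal((10, 20))
brain = model + 0.1 * rng.standard_normal((10, 20))
accuracy = two_vs_two(model, brain)
```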

Conclusions
Our results suggest that LSTM language models not only encode probabilistic syntactic knowledge but also represent the semantic content of words in a way that is at least somewhat consistent with measures of human conceptual knowledge. Language models' ability to predict human property knowledge allows us to draw initial comparisons between these models and the activation (and pre-activation) of lexical information in human language comprehension.