Implicit Subjective and Sentimental Usages in Multi-sense Word Embeddings

In multi-sense word embeddings, contextual variation in a corpus may cause a univocal word to be embedded into several sense vectors. Shi et al. (2016) show that such pseudo multi-senses can be eliminated by linear transformations. In this paper, we show that pseudo multi-senses, though seemingly redundant, may arise from a uniform and meaningful phenomenon such as subjective and sentimental usage. We present an unsupervised algorithm that finds a linear transformation minimizing the transformed distance of a group of sense pairs, and we find that the major shrinking direction of this transformation is related to subjective shift. Therefore, we can not only eliminate pseudo multi-senses in multi-sense embeddings, but also identify these subjective senses and automatically tag the subjective and sentimental usage of words in the corpus.


Introduction
Multi-sense word embeddings are popular choices to represent polysemous words (Reisinger and Mooney, 2010; Huang et al., 2012; Neelakantan et al., 2014; Cheng and Kartsaklis, 2015; Lee and Chen, 2017). These methods learn word senses automatically by clustering the contexts in which words appear. However, contextual variation in a corpus may cause a univocal word to be embedded into different senses (Shi et al., 2016). For example, the context of "another" in the first sentence below is normal and narrative: "SouthTrust, another large bank headquartered in Birmingham, was acquired by Wachovia in 2004." In the second sentence, the word "another" appears in a subjective and emotional context with intense feelings: "He committed suicide after the woman he loved married another man." The word "another" has the same meaning in both sentences, but existing multi-sense word embedding models often embed it into two different senses. Shi et al. (2016) used a linear transformation to eliminate the vector differences between corresponding sense pairs with the same meaning, and improved performance on downstream tasks such as contextual word similarity (Huang et al., 2012). Such pairs were called pseudo multi-sense pairs. However, they did not give any explicit explanation of the eliminated vector difference in a pseudo multi-sense pair.
* Now at Toyota Technological Institute at Chicago, freda@ttic.edu.
† Corresponding author.
In this paper, we propose to explain the so-called pseudo multi-senses by slightly modifying the linear transformation proposed by Shi et al. (2016). We find that a large number of pseudo multi-senses can be viewed as pairs of i) a normal sense and ii) a subjective or sentimental sense. In addition, as shown in Figure 1, a group of words may share a similar normal-subjective/sentimental difference vector, indicating that subjectivity and sentiment are general sources of pseudo multi-senses.
In the first step of our approach, we identify the multi-sense pairs that are generated by a uniform contextual variation. Then we fit a linear transformation that minimizes the average Euclidean distance between the two sides of these pairs in the embedding space. We analyze the major shrinking directions of the embedding space w.r.t. the linear transformation, and consistently find that one of these directions is relevant to subjective and sentimental usage.
The motivation of our approach is that a group of pseudo multi-senses is often generated systematically, i.e., pseudo multi-senses in the same group arise for the same reason. Therefore, a linear transformation that eliminates the shift and minimizes the distance between senses may reflect a salient language phenomenon. Besides explicitly explaining pseudo multi-senses, experimental results also show that our approach can contribute to NLP tasks such as subjectivity and sentiment analysis.
In Section 2, we introduce related work. In Section 3, we present a method to mine a linear transformation that eliminates the semantic shift generating pseudo multi-senses. In Section 4, we analyze the language phenomenon represented and eliminated by that transformation, namely subjective and sentimental usage, and evaluate the subjective shift. Finally, we draw conclusions and propose future work.
Related Work

Multi-sense word embedding is a popular way to represent polysemous words (Reisinger and Mooney, 2010; Huang et al., 2012; Neelakantan et al., 2014; Guo et al., 2014; Li and Jurafsky, 2015; Iacobacci et al., 2015; Cheng and Kartsaklis, 2015; Lee and Chen, 2017). However, these methods decide senses by clustering contexts, making them sensitive to contextual variation and word usage, so they may embed a single sense into several vectors. We aim to mine such contextual variations. Supervised methods, in contrast, rely on external knowledge with manually defined senses (Chen et al., 2014; Cao et al., 2017).
Singular value decomposition has been used for latent semantic indexing, factorizing a term-document matrix to construct a "semantic space" (Deerwester et al., 1990). We use a similar approach to extract the language phenomena we mine.

Methodology
The general framework of our method includes the following four steps:

1. Start with randomly selected "pseudo multi-senses" as initial seeds, and train a linear transformation that minimizes the transformed distance of these sense pairs.
Let M denote the transformation matrix and (x, y) denote a vector pair of a pseudo multi-sense. Define the set of pairs sharing a uniform semantic shift as P. Ideally, we expect M to transform x closer to y while keeping y unmoved, for all (x, y) ∈ P. Therefore, we derive the following loss function:

L(M) = Σ_{(x,y)∈P} (||Mx − y||² + ||My − y||²) (1)

2. Update the set of pseudo multi-senses iteratively w.r.t. the loss function.
According to the hypothesis that a systematic contextual variation may generate a group of pseudo multi-senses, the linear transformation that eliminates this variation can be used to pick out the most typical pseudo multi-senses. We define the shrinking rate, which reveals the degree to which a pair of senses is combined by the transformation M:

ρ_M(x, y) = ||Mx − My|| / ||x − y||

The smaller ρ_M(x, y) is, the more likely (x, y) is generated by this contextual variation. We thus refine the set of pseudo multi-senses and use it to retrain the transformation matrix. This algorithm eventually converges to a stable solution.
3. Extract the eigen-directions of the linear transformation M with a singular value decomposition.
The major shrinking directions are the semantic directions shrunk and eliminated by the linear transformation. If there exists an obvious interpretation of such a direction, it can be viewed as a representative direction for a specific language phenomenon (e.g., subjective or sentimental usage) in the embedding space.
4. Observe the KNN (k-nearest neighbours) of the major shrinking directional vectors to reveal the language phenomena corresponding to the contextual variations.
The pseudo code of this procedure is shown in Algorithm 1.
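The four steps above can be sketched in NumPy as follows. The gradient-descent fit, the learning rate, the number of rounds, and the function names (`train_transform`, `shrinking_rate`, `shrinking_directions`, `mine_pairs`) are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def train_transform(X, Y, lr=0.05, steps=300):
    """Fit M to minimize sum ||Mx - y||^2 + ||My - y||^2 over pairs (x, y).

    X and Y hold the paired sense vectors as rows.
    """
    n, d = X.shape
    M = np.eye(d)
    for _ in range(steps):
        # Gradient of the loss with respect to M, averaged over pairs.
        grad = 2 * (X @ M.T - Y).T @ X + 2 * (Y @ M.T - Y).T @ Y
        M -= lr * grad / n
    return M

def shrinking_rate(M, x, y):
    """rho_M(x, y): pair distance after transformation over distance before."""
    return np.linalg.norm(M @ x - M @ y) / np.linalg.norm(x - y)

def shrinking_directions(M):
    """Singular directions of M, most-shrunk (smallest singular value) first."""
    U, s, Vt = np.linalg.svd(M)
    order = np.argsort(s)
    return Vt[order], s[order]

def mine_pairs(X, Y, k=50, rounds=5):
    """Alternate between fitting M and reselecting the k most-shrunk pairs."""
    idx = np.arange(len(X))
    for _ in range(rounds):
        M = train_transform(X[idx], Y[idx])
        rho = np.array([shrinking_rate(M, x, y) for x, y in zip(X, Y)])
        idx = np.argsort(rho)[:k]  # keep the k most typical pairs
    return M, idx
```

After convergence, the first row returned by `shrinking_directions` would correspond to the most salient shrunk phenomenon.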

Intuitive Results
Our experiments are based on the multi-sense skip-gram (MSSG) model (Neelakantan et al., 2014) and the Wikipedia corpus, training a 50-dimensional multi-sense embedding space.
With this method, we train a linear transformation from random seeds. Results show that the transformation converges to a stable point, which verifies the existence of systematic contextual variation in the corpus.
Furthermore, to understand the language phenomena that generate pseudo multi-senses, we observe the nearest neighbours of the eigen-directional vectors and find each eigen-direction meaningful. The eigenvalues of these eigen-directions are the expansion multipliers of the corresponding dimensions; therefore, the eigen-direction with the smallest eigenvalue represents the most salient language phenomenon. Interestingly, its nearest neighbours are words about sentiments and emotions. Under our observation, this major shrinking directional vector is likely the vector representing subjective usage; we denote it as subj-vec. The KNN of subj-vec is shown in Table 1.

Table 1:
subj-vec's KNN: feelings, song, strange, love, everything, emotional, never, something, girlfriend, always, eyes, smell, dialogue, smile, really, movie, sounds, things, sexual, mind, script
Reversed subj-vec's KNN: regional, administrative, township, located, racial, lies, avenue, virginia, approximately, historic, register, pennsylvania, municipality, served, delaware, situated, politician, operates, terminus, unincorporated
We thus find that the subjective usage of words is a salient language phenomenon in the multi-sense embedding space. Interestingly, we also observe that the reversed direction of subj-vec is related to regional and political topics, which matches human intuition.
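The nearest-neighbour probe behind these observations can be sketched as follows, assuming a sense-embedding matrix with one row per sense; the function name `knn` and the variable names are hypothetical:

```python
import numpy as np

def knn(direction, embeddings, vocab, k=10):
    """Return the k senses whose embeddings are most cosine-similar to `direction`."""
    d = direction / np.linalg.norm(direction)
    E = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = E @ d                    # cosine similarity to every sense
    top = np.argsort(-sims)[:k]
    return [vocab[i] for i in top]
```

Calling `knn(subj_vec, E, vocab)` and `knn(-subj_vec, E, vocab)` would yield the two word lists of Table 1.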

Sentence Classification
We use subjectivity and sentiment analysis tasks to evaluate the function of subj-vec.
We take two text classification tasks: SUBJ (Pang and Lee, 2004), a subjectivity detection task, and MPQA (Wiebe et al., 2005), an opinion polarity classification task. We use a logistic regression (LR) classifier with sentence-level features. We use word/sense embeddings as the encoder and decide the sense of every instance by Equation 2.
Sense(C(w)) = argmax_sense cos(V_context, V_sense) (2)

We express the sentence-level features with a contextual vector, denoted as context-vec, which is the sum of the sense embeddings in a sentence. We provide four groups of evaluation results with different encoders:
1. context-vec with the original embeddings.
2. context-vec with the embedding space whose subjective direction is stretched by Equation 3, in which the embedding of a sense s is denoted as v(s) and its embedding in the stretched space as v'(s):

v'(s) = v(s) + α (v(s) · subj-vec) subj-vec (3)

In Equation 3, subj-vec is the unit directional vector of subjective usage, so each embedding receives a bias in the subjective direction.
3. context-vec with the embedding space whose subjective direction is eliminated by Equation 4:

v'(s) = v(s) − (v(s) · subj-vec) subj-vec (4)

4. context-vec with single-sense embeddings.
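The sense selection and the two subspace edits can be sketched as follows, with sense embeddings as NumPy arrays; the scaling factor `alpha` and the exact algebraic form of the stretch and elimination are reconstructions, not the authors' code:

```python
import numpy as np

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def select_sense(context_vec, sense_vecs):
    """Equation 2: pick the sense most cosine-similar to the context vector."""
    return max(sense_vecs, key=lambda s: cos(context_vec, sense_vecs[s]))

def stretch(v, subj_vec, alpha=1.0):
    """Stretch the component of v along the subjective direction (cf. Equation 3)."""
    u = subj_vec / np.linalg.norm(subj_vec)
    return v + alpha * (v @ u) * u

def eliminate(v, subj_vec):
    """Project the subjective direction out of v (cf. Equation 4)."""
    u = subj_vec / np.linalg.norm(subj_vec)
    return v - (v @ u) * u
```

Setting `alpha` to 0 leaves the space untouched, so the original embeddings are the special case between the stretched and eliminated spaces.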
The results are shown in Table 2.
The subjective direction of the embedding space clearly improves performance on subjectivity and sentiment analysis; such improvement does not appear with single-sense embeddings. Meanwhile, eliminating the subjective direction worsens performance on every listed task. Li and Jurafsky (2015) argued that multi-sense word embeddings do not outperform single-sense word embeddings on several language tasks. In fact, we find that by adding features along the sentimental dimension, multi-sense embeddings can achieve better performance on subjectivity and sentiment analysis tasks.
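The classification setup can be sketched with scikit-learn; `sentence_vec` (summing sense embeddings into context-vec) and the variable names are assumptions, and SUBJ/MPQA dataset loading is omitted:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sentence_vec(tokens, embed):
    """context-vec: the sum of the embeddings of the tokens in a sentence."""
    vecs = [embed[t] for t in tokens if t in embed]
    if not vecs:
        return np.zeros_like(next(iter(embed.values())))
    return np.sum(vecs, axis=0)

def evaluate(train_sents, train_labels, test_sents, test_labels, embed):
    """Train an LR classifier on context-vec features and report test accuracy."""
    X_tr = np.stack([sentence_vec(s, embed) for s in train_sents])
    X_te = np.stack([sentence_vec(s, embed) for s in test_sents])
    clf = LogisticRegression(max_iter=1000).fit(X_tr, train_labels)
    return clf.score(X_te, test_labels)
```

Running `evaluate` with the original, stretched, and eliminated embedding tables as `embed` would reproduce the comparison of Table 2.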

Analogies
Moreover, since subj-vec represents subjective usage, we add it to some embeddings in the multi-sense embedding space to observe its effect on semantic shift. Table 3 shows the KNN of the original words and of the words with a subjective bias. By adding subj-vec, the subjective and sentimental properties of words change: in general, more emotional and subjective words appear in the KNN of the new location. This is another interesting property of subj-vec.

Conclusions And Future Work
In this article, we propose a methodology to represent language phenomena such as subjective usage by a uniform bias vector over sense pairs, and provide an unsupervised approach to mine it. Our evaluations show that subj-vec can improve the performance of multi-sense embeddings on subjectivity and sentiment analysis tasks. Many linguistic phenomena remain to be mined in future work.