Emotion Detection with Neural Personal Discrimination

There has been a recent line of work on automatically predicting the emotions of posts in social media. Existing approaches consider the posts individually and predict their emotions independently. Different from previous research, we explore the dependence among relevant posts via the authors' backgrounds, since authors with similar backgrounds, e.g., gender or location, tend to express similar emotions. However, such personal attributes are not easy to obtain on most social media websites, and it is hard to capture attributes-aware words to connect similar people. Accordingly, we propose a Neural Personal Discrimination (NPD) approach to address the above challenges by determining personal attributes from posts, and connecting relevant posts with similar attributes to jointly learn their emotions. In particular, we employ adversarial discriminators to determine the personal attributes, with attention mechanisms to aggregate attributes-aware words. In this way, the social correlations among different posts can be better addressed. Experimental results show the usefulness of personal attributes, and the effectiveness of our proposed NPD approach in capturing such personal attributes with significant gains over the state-of-the-art models.


Introduction
The advent of social media and its prosperity enable the creation of massive online user-generated content, including opinions and product reviews. Analyzing such user-generated content allows us to detect the users' emotional states, which is useful for various downstream applications.
In the literature, there is a large body of work on emotion detection (Roberts et al., 2012; Abdul-Mageed and Ungar, 2017; Gupta et al., 2017), and both discrete and neural models have been used to predict the emotions of posts in social media. For example, Roberts et al. (2012) used a series of binary SVM classifiers to detect the emotion of a post, while Gupta et al. (2017) used sentiment-based semantic embeddings and an LSTM model to learn the representation of a post for emotion detection.
Different from previous research, which considers each post individually, we argue that posts in social media are strongly correlated through their authors' backgrounds. The principle of homophily (Lazarsfeld et al., 1954), the idea that similarity and connection tend to co-occur, or "birds of a feather flock together", suggests that users connected by shared personal backgrounds may hold similar opinions toward a post (Thelwall, 2010). In the literature, personal attributes such as gender, location, and age have proved useful in constructing personal backgrounds: people with different attributes tend to express their emotions in different ways. For example, the happiness emotion in [E1] is expressed through words with a feminine sense, such as "little brother" and "handsome", while the happiness emotion in [E2] is expressed using the dialectal word "bashi" (comfortable), which is strongly characteristic of the Sichuan dialect. Therefore, it is necessary to detect the emotions of posts jointly with the personal attributes.
[E1] (Congratulations to the Chinese basketball team on becoming the champion, the little brother of Xinjiang is really handsome.)
[E2] (It is just bashi to eat this in such weather!)
However, personal attributes are not easy to obtain on most social media websites. On the one hand, most websites may not contain useful personal information. On the other hand, people are normally not willing to attach their personal information to their social media accounts. Besides, integrating personal attributes into emotion detection is challenging, since it is hard to capture attributes-aware words, such as "little brother" and "bashi" (comfortable), to connect the posts with similar backgrounds. Although there are some related works on either personal attribute extraction (Wang et al., 2014) or emotion detection with personal attributes, none of them addresses both challenges at the same time.
In this paper, we propose a Neural Personal Discrimination (NPD) model with both adversarial discriminators and attention mechanisms to tackle the above challenges. Here, the adversarial discriminators (Goodfellow et al., 2014) are used to determine the personal attributes, e.g., gender or location, of a post, providing the inherent correlation between emotions and personal backgrounds, while the attention mechanisms (Wang et al., 2016) are utilized to aggregate the representations of informative attributes-aware words into a vector for emotion prediction, providing insights into which words contribute to a personal background. Experimental results show the usefulness of personal attributes in emotion detection, and the effectiveness of our proposed NPD model with both adversarial discriminators and attention mechanisms over the state-of-the-art discrete and neural models.

Related Work
Earlier works on emotion detection are based on discrete models. For example, Yang et al. (2007) built a support vector machine (SVM) model and a conditional random field (CRF) model for emotion detection. Bhowmick et al. (2009) used a multi-label kNN model to classify a new sentence into multiple emotion categories. Quan et al. (2015) proposed a logistic regression model for social emotion detection. Recently, with the development of artificial intelligence, neural network models have been successfully applied to various NLP tasks (Collobert et al., 2011; Goldberg, 2016). However, few works use neural network models for emotion detection. Abdul-Mageed and Ungar (2017) used a gated recurrent neural network model for emotion detection with a large-scale dataset. Zhang et al. (2018) used an auxiliary and attention-based LSTM to detect emotion on a cross-lingual dataset.
Lexicon and social information are very important for emotion detection, and many studies focus on this topic. For example, Strapparava and Mihalcea (2008) used WordNet-Affect to compute the sentiment score of a post. In addition, Hovy (2015) used both the age and gender information of the authors to improve the performance of sentiment analysis. Vosoughi et al. (2016) explored the relationship among locations, date and time, authors, and sentiments.
Different from previous works, which consider each post individually, we argue that the posts in social media can be connected through the authors' backgrounds and that this connection should be modeled explicitly. On this basis, we propose a neural personal discrimination model to determine the personal background attributes from each post through adversarial discriminators, and to aggregate the representations of informative attributes-aware words through attention mechanisms.

Vanilla Model for Emotion Detection
In this section, we propose a vanilla model. In the next section, we show how to utilize the neural personal discrimination model to improve the vanilla model by capturing personal attributes.

Document Representation
In general, we denote a post as a document $d$ with $n$ words $\{w_1, w_2, \ldots, w_n\}$. Given the post, we use a standard Long Short-Term Memory (LSTM) model to learn the shared document representation. Specifically, we transform each token $w_i$ into a real-valued vector $x_i$ by looking up a pre-trained word embedding table $D$, whose embeddings are trained with the skip-gram algorithm (Mikolov et al., 2013). We then employ the LSTM model over $d$ to generate a hidden vector sequence $\{h_1, h_2, \ldots, h_n\}$. At each step $t$, the hidden vector $h_t$ of the LSTM model is computed from the current vector $x_t$ and the previous hidden vector $h_{t-1}$ as $h_t = \mathrm{LSTM}(x_t, h_{t-1})$. The initial state and all standard LSTM parameters are randomly initialized and tuned during training.
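The recurrence $h_t = \mathrm{LSTM}(x_t, h_{t-1})$ can be sketched in plain Python as follows. This is only an illustration, not the implementation used in the experiments: the standard gate equations, the toy dimensions, the initialization range, the zero initial state, and the helper names (`init_lstm`, `lstm_step`, `encode`) are all assumptions made here, and random vectors stand in for the pre-trained skip-gram embeddings.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_lstm(input_dim, hidden_dim, seed=0):
    """Randomly initialise one weight matrix and bias per gate (hypothetical layout)."""
    rng = random.Random(seed)
    params = {}
    for gate in ("i", "f", "o", "c"):  # input, forget, output, candidate
        params["W_" + gate] = [
            [rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
            for _ in range(hidden_dim)
        ]
        params["b_" + gate] = [0.0] * hidden_dim
    return params


def affine(W, b, v):
    """Compute W v + b for a matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(params, x_t, h_prev, c_prev):
    """One step h_t = LSTM(x_t, h_{t-1}) using the standard gate equations."""
    v = x_t + h_prev  # list concatenation [x_t ; h_{t-1}]
    i = [sigmoid(z) for z in affine(params["W_i"], params["b_i"], v)]
    f = [sigmoid(z) for z in affine(params["W_f"], params["b_f"], v)]
    o = [sigmoid(z) for z in affine(params["W_o"], params["b_o"], v)]
    g = [math.tanh(z) for z in affine(params["W_c"], params["b_c"], v)]
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t


def encode(params, embeddings, hidden_dim):
    """Run the LSTM over a post and return the hidden sequence {h_1, ..., h_n}."""
    # Zero initial state for simplicity (an assumption of this sketch).
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    hidden_states = []
    for x_t in embeddings:
        h, c = lstm_step(params, x_t, h, c)
        hidden_states.append(h)
    return hidden_states


# Toy post with n = 4 words and 3-dimensional stand-in embeddings.
rng = random.Random(1)
post = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
params = init_lstm(input_dim=3, hidden_dim=5)
states = encode(params, post, hidden_dim=5)
```

Each entry of `states` corresponds to one hidden vector $h_t$; in a trained model the gate parameters would be tuned by back-propagation rather than left at their initial values.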

Multi-label Emotion Detection
Emotion detection aims to predict the emotion labels of posts. We follow Wang et al. (2016), who adopt five kinds of emotions in their study. Since a post may express more than one emotion, emotion detection can be considered a multi-label classification task: we use $K$ emotion-specific binary perceptrons ($K = 5$) to predict whether the post has the corresponding emotion or not. The advantage of multi-label classification is that it learns and predicts all the emotion labels jointly.
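Formally, given an input vector $H$, a hidden layer is first used to induce a set of high-level features for each emotion $j$:
$H_j = \sigma(W_j H + b_j)$,
where $\sigma$ denotes a non-linear activation function. Then, $H_j$ is used as input to a softmax output layer:
$\hat{y}_j = \mathrm{softmax}(\hat{W}_j H_j + \hat{b}_j)$.
Here, $W_j$, $b_j$, $\hat{W}_j$, and $\hat{b}_j$ are model parameters.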
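A rough sketch of the $K$ emotion-specific binary classifiers is given below. It assumes a tanh hidden activation, toy dimensions, and randomly initialized parameters; the function names (`make_head`, `predict_emotions`) and the 0.5 decision threshold are illustrative choices, not details taken from the paper.

```python
import math
import random


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def affine(W, b, v):
    """Compute W v + b for a matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def make_head(doc_dim, hidden_dim, rng):
    """Parameters of one emotion-specific head: hidden layer plus 2-way output layer."""
    return {
        "W": random_matrix(hidden_dim, doc_dim, rng), "b": [0.0] * hidden_dim,
        "W_out": random_matrix(2, hidden_dim, rng), "b_out": [0.0, 0.0],
    }


def predict_emotions(heads, H):
    """Return, for each emotion j, the distribution [P(absent), P(present)]."""
    outputs = []
    for head in heads:
        # Hidden features H_j; tanh is an assumed activation.
        H_j = [math.tanh(z) for z in affine(head["W"], head["b"], H)]
        outputs.append(softmax(affine(head["W_out"], head["b_out"], H_j)))
    return outputs


EMOTIONS = ["happiness", "sadness", "fear", "anger", "surprise"]  # K = 5
rng = random.Random(0)
heads = [make_head(doc_dim=6, hidden_dim=4, rng=rng) for _ in EMOTIONS]
H = [rng.uniform(-1, 1) for _ in range(6)]  # stand-in document representation
y_hat = predict_emotions(heads, H)
predicted = [e for e, p in zip(EMOTIONS, y_hat) if p[1] > 0.5]
```

Because every head reads the same shared vector `H`, a post can receive zero, one, or several emotion labels, which is what makes the setting multi-label.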

Training
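Given the word sequence in a post, our training objective is to minimize the cross-entropy loss over a set of training examples $\{(x_i, y_i)\}_{i=1}^{N}$ with an L2 regularization term:
$J(\theta_y) = -\sum_{i=1}^{N} \sum_{j=1}^{K} y_i^j \log \hat{y}_i^j + \frac{\lambda}{2} \|\theta_y\|_2^2$,
where $y_i^j$ represents the label of the $j$-th emotion for $x_i$, $\hat{y}_i^j$ is the corresponding prediction, $\theta_y$ is the set of model parameters, and $\lambda$ is the parameter for L2 regularization. In this paper, the model parameters are optimized by AdaGrad (Duchi et al., 2011), and the skip-gram algorithm (Mikolov et al., 2013) is used for word embedding.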
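As a hedged illustration of this objective, the sketch below computes a multi-label cross-entropy with an L2 penalty. It assumes that each emotion is scored by the probability that it is present (so the two-class cross-entropy reduces to the binary form) and that the penalty is scaled by $\lambda / 2$; the function name and the toy numbers are hypothetical.

```python
import math


def multilabel_cross_entropy(y_true, y_prob, params, lam):
    """Cross-entropy summed over posts i and emotions j, plus an L2 penalty.

    y_true[i][j] is 1 if post i carries emotion j, else 0.
    y_prob[i][j] is the predicted probability that emotion j is present.
    params is a flat list of model weights; lam is the L2 coefficient.
    """
    eps = 1e-12  # guards against log(0)
    loss = 0.0
    for labels, probs in zip(y_true, y_prob):
        for y, p in zip(labels, probs):
            loss -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    l2 = sum(w * w for w in params)
    return loss + 0.5 * lam * l2


# Two toy posts, five emotions each (values are made up for illustration).
y_true = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 1]]
y_prob = [[0.9, 0.1, 0.2, 0.1, 0.3], [0.2, 0.7, 0.1, 0.1, 0.6]]
theta = [0.5, -0.25, 0.1]
loss = multilabel_cross_entropy(y_true, y_prob, theta, lam=0.01)
```

The loss shrinks toward the regularization term alone as the predicted probabilities approach the gold labels.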

Neural Personal Discrimination Model
The drawback of the above vanilla model is that it does not consider the deep personal correlations among different posts.
In this study, we argue that the posts in social media can be connected through the authors' backgrounds. Therefore, we propose a Neural Personal Discrimination (NPD) model to connect people and learn their emotions collectively. We use adversarial discriminators to determine the personal attributes that make up the personal profiles, and employ attention mechanisms to aggregate attributes-aware words. In this way, the social correlations between different posts can be well addressed. Figure 1 illustrates our proposed neural personal discrimination model for emotion detection. In particular, we first learn the representation of each post using an LSTM model, the same as in the vanilla joint model. Then, we use adversarial discriminators to determine the personal attributes of each post. Finally, we employ attention mechanisms to aggregate the representations of informative attributes-aware words into a vector for emotion prediction. In the rest of this section, we describe each component in turn.

Personal Adversarial Discriminators
A straightforward way to jointly detect personal attributes and the emotion of a post is to treat the attributes as additional labels in the multi-label classification. However, such a model may not be able to separate the posts with different attributes directly, and thus fails to learn the correlation between the emotion of a post and the personal background of its author. To address this issue, we utilize adversarial discriminators to determine the personal attributes of the authors, and to learn the emotion and the attributes of the authors collectively. Adversarial networks have achieved much success in various studies, especially in image and text generation (Goodfellow et al., 2014; Wang and Wan, 2018; Fedus et al., 2018). In this part, we propose two adversarial discriminators, i.e., a gender discriminator and a location discriminator, to determine the personal attributes of each post.
Gender Discriminator. The gender discriminator is employed to determine the gender of the author of each post. Let $g_i \in [0, 1]$ represent the probability of the gender label (female or male) for the gender discriminator, and let $f$ be the function parameterized by $\theta_f$ which maps an embedding vector to a hidden representation $h_i^g$ of the post $x_i$. The gender discriminator $G(h_i^g; \theta_g) \rightarrow \hat{g}_i$, parameterized by $\theta_g$, maps the hidden representation vector $h_i^g$ to a predicted gender label $\hat{g}_i$, and its loss function is defined over the labels $g_i$ and the predictions $\hat{g}_i$. In this study, the gender discriminator is trained towards a saddle point of the loss function by maximizing the loss over $\theta_g$ while minimizing the loss over $\theta_f$ (Ganin et al., 2017).
Location Discriminator. The location discriminator is employed to determine the location of the author of a post. Let $\ell_i^j \in [0, 1]$ represent the probability of the $j$-th location for the $i$-th post, with $j \in \{1, 2, \ldots, m\}$, where $m$ is the number of provinces. The location loss is defined over the labels $\ell_i^j$ and the predictions $\hat{\ell}_i^j = L(h_i^\ell; \theta_\ell)$, where $L$ is the location discriminator parameterized by $\theta_\ell$, and $h_i^\ell = f(x_i)$ is the hidden representation of the post $x_i$.
From the optimization of both discriminators, we can see that both $h^g$ and $h^\ell$ are latent feature representations of posts that integrate the discrimination of various kinds of personal information. The discriminators $G(\cdot; \theta_g)$ and $L(\cdot; \theta_\ell)$ try their best to determine the gender and location of the authors, and the adversarial network is trained with minimax optimization.

Personal Attention Mechanisms
In emotion detection, not all words contribute equally to the representation of emotions and personal attributes. Hence, we employ attention mechanisms to extract the words that are important to the personal backgrounds of posts, and to aggregate the representations of those informative attributes-aware words. Corresponding to the two adversarial discriminators, we propose attention mechanisms that build two representations ($v^g$ and $v^\ell$) from the gender and location discriminators respectively, and then concatenate them to construct the overall personal representation from the informative attributes-aware words.
Gender Attention. We use an attention function to aggregate the gender-aware representations of the salient words into the gender attention vector $v^g$. Here, the gender attention model outputs a continuous vector $v^g \in \mathbb{R}^{d \times 1}$ recurrently by taking the hidden representation vectors $\{h_1^g, h_2^g, \ldots, h_{n_t}^g\}$ as inputs. Specifically, $v^g$ is computed as a weighted sum of the $h_i^g$:
$v^g = \sum_{i=1}^{n_t} \alpha_i h_i^g$,
where $n_t$ is the hidden variable size, $\alpha_i \in [0, 1]$ is the weight of $h_i^g$, and $\sum_i \alpha_i = 1$. The weight $\alpha_i$ of each hidden state $h_i^g$ is produced by a scoring function applied to $h_i^g$ and normalized over all positions.
Location Attention. Similar to the gender attention mechanism, the location attention model outputs a continuous vector $v^\ell \in \mathbb{R}^{d \times 1}$ recurrently by taking the hidden representation vectors $\{h_1^\ell, h_2^\ell, \ldots, h_{n_k}^\ell\}$ as inputs. Specifically, $v^\ell$ is computed as a weighted sum of the $h_i^\ell$:
$v^\ell = \sum_{i=1}^{n_k} \beta_i h_i^\ell$,
where $n_k$ is the location hidden variable size, and $\beta_i$ is defined in the same way as $\alpha_i$. Finally, we concatenate $v^g$ and $v^\ell$ to capture the overall personal representation from all the personal attribute discriminators.
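The weighted-sum aggregation and the final concatenation can be sketched as follows. The scoring function used here (a dot product between a weight vector `w` and the tanh of each hidden state, followed by a softmax) is a stand-in chosen for illustration and is an assumption, as are the toy hidden states and the function name `attention_pool`.

```python
import math


def attention_pool(hidden_states, w):
    """Aggregate hidden states into one vector v = sum_i alpha_i * h_i."""
    # Stand-in scoring function (an assumption, not taken from the paper).
    scores = [sum(wk * math.tanh(hk) for wk, hk in zip(w, h)) for h in hidden_states]
    # Softmax normalisation: each alpha_i lies in [0, 1] and the weights sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(hidden_states[0])
    v = [sum(a * h[k] for a, h in zip(alphas, hidden_states)) for k in range(dim)]
    return v, alphas


# Toy hidden states for a three-word post (values are made up).
h_gender = [[0.1, -0.2, 0.3], [0.9, 0.8, -0.1], [0.0, 0.1, 0.0]]
h_location = [[0.2, 0.2, 0.2], [-0.5, 0.4, 0.1], [0.7, -0.3, 0.6]]
v_g, alpha = attention_pool(h_gender, w=[1.0, 1.0, 1.0])
v_l, beta = attention_pool(h_location, w=[0.5, -1.0, 2.0])
personal_repr = v_g + v_l  # list concatenation of the two attention vectors
```

Inspecting `alpha` and `beta` shows which positions dominate each attribute-specific vector, which is the kind of insight the attention weights are meant to provide.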

Adversarial Training with Neural Personal Discrimination
The proposed NPD model can be trained in an end-to-end manner once we obtain the loss functions of the emotion detector and the attribute discriminators. Our ultimate training goal is to minimize the overall loss function $J$ with parameters $\theta = \{\theta_f, \theta_y, \theta_g, \theta_\ell\}$, in which $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the weight parameters that balance the importance of the losses of the emotion detection and the two personal attribute discriminators. Specifically, Eq. 13 is optimized by finding a saddle point $\hat{\theta}_y, \hat{\theta}_f, \hat{\theta}_g, \hat{\theta}_\ell$ such that
$(\hat{\theta}_f, \hat{\theta}_y) = \arg\min_{\theta_f, \theta_y} J(\theta_f, \theta_y, \hat{\theta}_g, \hat{\theta}_\ell)$
$\hat{\theta}_g = \arg\max_{\theta_g} J(\hat{\theta}_f, \hat{\theta}_y, \theta_g, \hat{\theta}_\ell)$
$\hat{\theta}_\ell = \arg\max_{\theta_\ell} J(\hat{\theta}_f, \hat{\theta}_y, \hat{\theta}_g, \theta_\ell)$
As suggested previously, a saddle point is defined by Eqs. 14 to 16, and can be reached as a stationary point of the gradient updates, where $\mu$ is the learning rate.
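The idea of reaching a saddle point through simultaneous gradient updates can be illustrated on a toy problem. The objective $J(a, b) = a^2 + ab - b^2$ below is purely hypothetical and merely stands in for the NPD objective: `a` plays the role of the minimized parameters $(\theta_f, \theta_y)$, `b` plays the role of a maximized discriminator parameter, and `mu` is the learning rate.

```python
def grad_J(a, b):
    """Gradients of the toy objective J(a, b) = a**2 + a*b - b**2."""
    return 2 * a + b, a - 2 * b


def saddle_point_updates(a, b, mu=0.1, steps=200):
    """Gradient descent in a and gradient ascent in b, applied simultaneously."""
    for _ in range(steps):
        dJ_da, dJ_db = grad_J(a, b)
        a, b = a - mu * dJ_da, b + mu * dJ_db  # descend in a, ascend in b
    return a, b


# J is convex in a and concave in b, so its only saddle point is (0, 0).
a_hat, b_hat = saddle_point_updates(a=1.0, b=-1.0)
```

In this toy case the iterates spiral into the saddle point at the origin; for a neural model the same pattern is applied to parameter vectors, with gradients obtained by back-propagation.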

Experimental Settings
We collect the data from Weibo.com, one of the most popular SNS websites in China. We crawl the posts and the corresponding personal profiles from the website. The dataset contains 11,157 microblog posts from 839 users. We employed six graduate students to annotate the corpus with a well-defined annotation guideline. Every two annotators annotate the same part of the corpus; if they disagree on a post, we ask another annotator to vote with them. The annotation guideline is based on Lee et al. (2013). Five basic emotions are annotated, namely happiness, sadness, fear, anger, and surprise (Lee et al., 2013; Wang et al., 2017). Table 1 shows the statistics of each emotion. From the table, we find that the frequencies of happiness and sadness are similar. Moreover, fear and anger are much less frequent than the other three emotions. We randomly select 70% of the posts as the training data and the remaining 30% as the test data. For evaluation, the F1-measure is used to evaluate the performance of the proposed model on each emotion, and the average F1-measure is used to evaluate the overall performance over all emotions.
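The evaluation protocol can be sketched as below. The sketch assumes that the average F1-measure is the unweighted mean of the per-emotion F1 scores and that an emotion with no true positives scores 0; the gold and predicted label matrices are made-up toy data.

```python
def f1_score(y_true, y_pred):
    """F1 of one binary label; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evaluate(gold, pred, emotions):
    """Per-emotion F1 and the unweighted average over all emotions."""
    per_emotion = {}
    for j, name in enumerate(emotions):
        per_emotion[name] = f1_score([row[j] for row in gold], [row[j] for row in pred])
    average = sum(per_emotion.values()) / len(emotions)
    return per_emotion, average


EMOTIONS = ["happiness", "sadness", "fear", "anger", "surprise"]
# One row per post, one column per emotion (toy labels).
gold = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [1, 0, 0, 0, 1], [0, 0, 0, 1, 0]]
pred = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 1], [0, 0, 0, 0, 1], [0, 0, 0, 1, 0]]
scores, avg_f1 = evaluate(gold, pred, EMOTIONS)
```

Averaging without frequency weights gives rare emotions such as fear and anger the same influence on the overall score as frequent ones.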

Experimental Results
We compare the proposed Neural Personal Discrimination (NPD) model with several representative baseline models in Table 2, where:
1) SVM is a widely used baseline for predicting the emotion of a post in social media (Yang et al., 2007).
2) Abdul17 is a standard LSTM model which consists of an LSTM layer and a fully connected layer, and is modified from the model in Abdul-Mageed and Ungar (2017). The LSTM model yields state-of-the-art performance on emotion detection in recent research.
3) Vaswani17 is an improved LSTM model with a self-attention mechanism. The self-attention mechanism is used to capture structural information and has been successfully applied in various natural language processing tasks recently (Cheng et al., 2016; Vaswani et al., 2017).
From Table 2, we find that all of the neural models outperform SVM significantly. This indicates that neural models are much more effective than discrete models in emotion detection. In addition, our proposed NPD model significantly outperforms both the standard LSTM model (Abdul17) and the improved LSTM model with self-attention (Vaswani17). This shows the effectiveness of our proposed NPD model with both adversarial discriminators and attention mechanisms, as well as the usefulness of personal attributes for emotion detection. Moreover, we find that the performance of Vaswani17 is even lower than that of the standard LSTM model. This shows that simply integrating a self-attention mechanism may not be sufficient to capture informative words for emotion detection in social media.

Analysis and Discussion
In this subsection, we analyze the influence of different factors in the proposed NPD model, and give some statistics and examples to illustrate the effectiveness of the proposed NPD model with different personal attributes.

Influence of Personal Attributes
We illustrate the influence of personal attributes in the proposed NPD model in Table 3, where:
1) LSTM-attributes is an LSTM-based multi-label classification model, which predicts both the emotion and the attribute labels of each post collectively.
2) NPD-gender ablates the location attribute, i.e., it only considers the gender attribute in the NPD model.
3) NPD-location ablates the gender attribute, i.e., it only considers the location attribute in the NPD model.
From the table, we find that the performance of LSTM-attributes is much lower than that of the LSTM model. This indicates that a simple multi-label classification setting is not effective for integrating personal attributes. This may be because the basic multi-label setting fails to learn the correlation between the emotions of posts and the personal attributes of the authors well. In addition, both NPD-gender and NPD-location perform better than the LSTM model. This shows the effectiveness of the proposed NPD model with both adversarial and attention networks, and the usefulness of both gender and location attributes. Finally, the proposed NPD model with all the attributes significantly outperforms all the other models. This suggests that we should integrate all the personal attributes for emotion detection in social media.

Influence of Network Structures
After analyzing the influence of different attributes, we analyze the influence of network structures in Table 4, where LSTM-attention ablates the adversarial discriminators and only utilizes attention mechanisms with a multi-label classification setting, and LSTM-adversarial ablates the attention mechanisms and only utilizes adversarial discriminators for emotion detection. From Table 4, we can see that both the attention mechanisms (LSTM-attention) and the adversarial discriminators (LSTM-adversarial) are effective in emotion detection. Moreover, the adversarial discriminators are much more effective than the attention mechanisms. This shows that discriminating personal attributes is more important than learning informative attributes-aware words. Finally, the NPD model obtains the best results by integrating both adversarial discriminators and attention mechanisms.

Statistics
We give the statistics of gender and location to explore the correlation between emotion and personal attributes.
Distribution of Gender. Figure 2 illustrates the distribution of emotions between genders. Here, the Y-axis is the conditional probability of each emotion given the gender. From the figure, we find that women tend to express the sadness emotion, while men tend to express the anger emotion. This may be due to the fact that different personalities lead to different emotional expressions, i.e., the sentimentality of the female and the impulsiveness of the male.
Distribution of Location. Figure 3 is an example of the distribution of emotions between locations. As discussed in the above section, we use the province of the authors as the location attribute. Here, the Y-axis represents the conditional probability of each emotion given the location. From the figure, we find that the authors' location may influence their emotions in many respects. For example, in Jiangsu, one of the most comfortable and developed regions in China, people tend to express the positive emotion (i.e., happiness) more than the negative emotions. Due to air pollution and the large population, people in Beijing tend to express the negative emotions more than the positive emotion. In addition, people in Hong Kong always feel crowded and tend to express the sadness emotion. Finally, people always feel happy and comfortable in Sichuan, well known as the "country of paradise" in China.
From these statistics, we can see that personal attributes such as gender and location can affect emotion detection. The experimental results on the influence of personal attributes lead to the same conclusion.
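The conditional probabilities plotted in Figures 2 and 3 can be estimated from relative frequencies, as in the sketch below. The record layout, the function name `conditional_distribution`, and the four toy posts are hypothetical, and normalizing over emotion mentions (rather than over posts) is an assumption of this sketch.

```python
from collections import Counter, defaultdict


def conditional_distribution(posts, attribute):
    """Estimate P(emotion | attribute value) from relative frequencies."""
    counts = defaultdict(Counter)
    for post in posts:
        for emotion in post["emotions"]:
            counts[post[attribute]][emotion] += 1
    return {
        value: {e: c / sum(emotion_counts.values()) for e, c in emotion_counts.items()}
        for value, emotion_counts in counts.items()
    }


# Hypothetical toy records; they are not drawn from the Weibo corpus.
posts = [
    {"gender": "female", "location": "Sichuan", "emotions": ["happiness"]},
    {"gender": "female", "location": "Beijing", "emotions": ["sadness"]},
    {"gender": "male", "location": "Beijing", "emotions": ["anger"]},
    {"gender": "male", "location": "Sichuan", "emotions": ["happiness", "surprise"]},
]
by_gender = conditional_distribution(posts, "gender")
by_location = conditional_distribution(posts, "location")
```

Each inner dictionary sums to one, so the values for a given gender or province can be read directly as the bar heights of the corresponding group in such a plot.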

Case Study
We select three examples from the test set to compare the proposed NPD model with the LSTM model in Table 5. In [E3], though "bashi" (comfortable) is a strong cue word for the happiness emotion, the LSTM model treats it as a general word that does not indicate any location information. In contrast, the location information (Sichuan Province) determined by the location discriminator and the attention mechanism makes "bashi" (comfortable) critical in the NPD model, which enables the NPD model to determine the correct happiness emotion. In [E4] and [E5], both posts contain implicit gender information. For example, "Mr. Mcdreamy" implies a female's affection for a handsome male, and "dysmenorrhea" is a female physiological condition. Without such gender background, it is impossible to infer any potential emotional information from these two words. This explains why the LSTM model fails to detect any emotion in these examples (i.e., it predicts None). However, our NPD model successfully determines the gender of these two posts with the gender discriminator, and increases the weights of these two words for emotion detection with the attention mechanisms. In this way, the emotions of the three examples are correctly detected by the proposed NPD model.

Conclusion
Most previous studies consider each post individually in emotion detection, one of the most important tasks in sentiment analysis. However, since the posts in social media are generated by users, it is natural that these posts can be connected through the authors' personal background attributes. In this paper, we propose a neural personal discrimination model with both adversarial discriminators and attention mechanisms to connect posts with similar personal attributes. Here, the adversarial discriminators are used to determine the personal attributes, and the attention mechanisms are employed to aggregate attributes-aware words. In this way, the social correlations between different posts can be captured. The experimental results show the usefulness of the personal attributes and the effectiveness of our proposed neural personal discrimination model in modeling such personal attributes, with significant performance improvements over the state-of-the-art baselines.