Inferring Perceived Demographics from User Emotional Tone and User-Environment Emotional Contrast

We examine communications in a social network to study user emotional contrast – the propensity of users to express different emotions than those expressed by their neighbors. Our analysis is based on a large Twitter dataset, consisting of the tweets of 123,513 users from the USA and Canada. Focusing on Ekman’s basic emotions, we analyze differences between the emotional tone expressed by these users and their neighbors of different types, and correlate these differences with perceived user demographics. We demonstrate that many perceived demographic traits correlate with the emotional contrast between users and their neighbors. Unlike other approaches on inferring user attributes that rely solely on user communications, we explore the network structure and show that it is possible to accurately predict a range of perceived demographic traits based solely on the emotions emanating from users and their neighbors.


Introduction
The explosion of social media services like Twitter, Google+ and Facebook has led to a growing application potential for personalization in human-computer systems such as personalized intelligent user interfaces, recommendation systems, and targeted advertising. Researchers have started mining these massive volumes of personalized and diverse data produced in public social media with the goal of learning about user demographics (Burger et al., 2011; Zamal et al., 2012), personality (Golbeck et al., 2011; Kosinski et al., 2013), language variation (Bamman et al., 2014), likes and interests (Bachrach et al., 2012; Lewenberg et al., 2015), the emotions and opinions users express (Bollen et al., 2011b), their well-being (Schwartz et al., 2013) and their interactions with the online environment (Bachrach, 2015; Kalaitzis et al., 2016). Recent studies have shown that the environment in a social network has a strong influence on user behavior and on the tone of the messages users generate (Coviello et al., 2014; Ferrara and Yang, 2015a).
People vary in the ways they respond to the emotional tone of their environment in a social network. Some people tend to send out messages with a positive emotional tone, while others tend to express more negative emotions such as sadness or fear. Some of us are likely to share peer messages that are angry, whereas others filter out such messages. In this work we focus on the problem of predicting user perceived demographics by examining the emotions expressed by users and their immediate neighbors. We first define the user emotional tone, the environment emotional tone, and the user-environment emotional contrast.
Definition 1 Environment emotional tone is the proportion of tweets with a specific emotion produced by the user's neighbors. For example, if the majority of tweets sent by the user's neighbors express joy, that user has a positive environment. In contrast, a user is in a negative environment if most of his or her neighbors express anger.
Definition 2 User emotional tone is the proportion of tweets with a specific emotion produced by a user. If a user mostly sends sad messages, he or she generates a sad emotional tone, while a user who mostly sends joyful messages has a joyful tone.
Definition 3 User-environment emotional contrast is the degree to which user emotions differ from the emotions expressed by the user's neighbors. We say that users express more of an emotion when they express it more frequently than their neighbors, and that they express less of an emotion when they express it less frequently than their environment.
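To make Definitions 1-3 concrete, the following sketch computes an emotional tone as a proportion of emotion-labeled tweets and a simple signed contrast. The function names are ours, and the contrast here is a plain difference; the paper's actual measure is a normalized difference introduced in the methodology section.

```python
from collections import Counter

def emotional_tone(tweet_emotions):
    """Proportion of tweets carrying each emotion (Definitions 1-2)."""
    counts = Counter(tweet_emotions)
    total = len(tweet_emotions)
    return {e: c / total for e, c in counts.items()}

def emotional_contrast(user_tone, env_tone, emotion):
    """Signed difference (Definition 3): positive when the user expresses
    the emotion more often than his or her environment."""
    return user_tone.get(emotion, 0.0) - env_tone.get(emotion, 0.0)

# Toy example: a mostly joyful user embedded in a mostly sad environment.
user = emotional_tone(["joy", "joy", "sadness", "joy"])
env = emotional_tone(["sadness", "sadness", "joy", "anger"])
print(emotional_contrast(user, env, "joy"))  # 0.75 - 0.25 = 0.5
```

A positive value means the user "expresses more" of that emotion than the environment, a negative value "expresses less".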
There are two research questions we address in this work. First, we analyze how user demographic traits are predictive of the way users respond to the emotional tone of their environment in a social network. One hypothesis stipulates that the emotional response is a universal human trait, regardless of the specific demographic background (Wierzbicka, 1986; Cuddy et al., 2009). For example, men and women, or young and old people, should not differ in the way they respond to their emotional environment. The opposite hypothesis is the demographic-dependent emotional contrast hypothesis, stipulating that a user's demographic background is predictive of the emotional contrast with the environment. For example, one might expect users with lower income to express negative emotion even when their environment expresses mostly positive emotions (a high degree of emotional contrast), while users with higher income are more likely to express joy even if their environment expresses negative emotions (Kahneman and Deaton, 2010).
We provide an empirical analysis based on a large dataset sampled from the Twitter network, supporting the demographic-dependent emotional contrast hypothesis. We show that users predicted to be younger, without kids and with lower income tend to express more sadness compared to their neighbors, whereas older users, users with kids and those with higher income express less; users satisfied with life express less anger, whereas users dissatisfied with life express more anger than their neighbors; and optimists express more joy compared to their environment, whereas pessimists express less.
Furthermore, we investigate whether user demographic traits can be predicted from user emotions and user-environment emotional contrast. Earlier work on inferring user demographics has examined methods that use lexical features in social networks to predict demographic traits of the author (Burger et al., 2011; Van Durme, 2012; Conover et al., 2011; Bergsma et al., 2013; Bamman et al., 2014; Ruths et al., 2014; Sap et al., 2014). However, these are simply features of the text a user produces, and make limited use of the social embedding of the user in the network. Only a limited amount of work has briefly explored the network structure for user profiling (Pennacchiotti and Popescu, 2011a; Filippova, 2012; Zamal et al., 2012; Volkova et al., 2014; Culotta et al., 2015). In contrast, we investigate the predictive value of features that are completely dependent on the network: the emotional contrast between users and their neighbors. We also combine network (context) and text (content) features to further boost the performance of our models.
Our results show that the emotional contrast of users is very informative regarding their demographic traits. Even a very small set of features consisting of the emotional contrast between users and their environment for each of Ekman's six basic emotions and three sentiment types is sufficient to obtain high quality predictions for a range of user attributes.
Carrying out such an analysis requires a large dataset consisting of many users annotated with a variety of properties, and a large pool of their communications annotated with emotions and sentiments. Creating such a large dataset with ground-truth annotations is extremely costly; sensitive user demographics, e.g., income and age, are not available for the majority of social media platforms, including Twitter. Therefore, we base our analysis on a large Twitter dataset annotated with demographics and affects using predictive models that can accurately infer user attributes, emotions and sentiments, as discussed in Section 3.

Data
User-Neighbor Dataset For the main analysis we collected a sample of U = 10,741 Twitter users and randomly sampled their neighbors n ∈ N(u) of different types: friends (u follows n), mentions (u mentions n in his or her tweets, e.g., @modollar1), and retweets (u retweets n's tweets, e.g., RT @GYPSY). In total we sampled N = 141,034 neighbors for the U = 10,741 users.

Dataset Annotated with Demographics Unlike Facebook (Bachrach et al., 2012; Kosinski et al., 2013), Twitter profiles do not have personal information, e.g., gender, age or education, attached to the profile. Collecting self-reports (Burger et al., 2011; Zamal et al., 2012) introduces sampling biases that make models trained on self-reported data unsuitable for predictions over random Twitter users (Cohen and Ruths, 2013; Volkova et al., 2014). Asking social media users to fill in personality questionnaires (Kosinski et al., 2013; Schwartz et al., 2013) is time consuming. An alternative way to collect attribute annotations is through crowdsourcing, as has been done effectively in recent work (Flekova et al., 2015; Sloan et al., 2015; Preoiuc-Pietro et al., 2015). Thus, to infer sociodemographic traits for a large set of random Twitter users in our dataset we relied on pre-trained models learned from 5,000 user profiles annotated via crowdsourcing. We annotated 125,513 user and neighbor profiles with eight sociodemographic traits. We used only a subset of the sociodemographic traits from the original study so that our analysis relies on models trained on annotations with high or moderate inter-annotator agreement. Additionally, we validated the models learned from the crowdsourced annotations on several public datasets labeled with gender, as described in Section 2. Table 2 reports attribute class distributions and the number of profiles annotated.
Validating Crowdsourced Annotations To validate the quality of the perceived annotations, we applied models trained on the 4,998 crowdsourced user profiles to classify users from existing datasets annotated with gender using approaches other than crowdsourcing. We ran experiments across three datasets (including the perceived annotations), among them Burger et al.'s data (Burger et al., 2011). Note that although we randomly sample user neighbors, there may still be overlap between user neighborhoods by the design of the Twitter network: users can be retweeted or mentioned only if they are in the friend neighborhood, R ⊂ F, M ⊂ F. Data collection and perceived attribute annotation details are discussed in (Preoiuc-Pietro et al., 2015). Table 3 presents the cross-dataset comparison results. We consistently used logistic regression with L2 regularization and relied on word ngram features. Accuracies on the diagonal are obtained using 10-fold cross-validation. These results show that textual classifiers trained on perceived annotations have reasonable agreement with the alternative prediction approaches. This provides another indication that the quality of the crowdsourced annotations, at least for gender, is acceptable. There are no publicly available datasets annotated with the other attributes.

Emotion Dataset We collected our emotion dataset by bootstrapping noisy hashtag annotations for the six basic emotions proposed by Ekman, as has been done successfully before (De Choudhury et al., 2012). Although existing approaches do not disambiguate sarcastic hashtags, e.g., It's Monday #joy vs. It's Friday #joy, they still demonstrate that a hashtag is a reasonable representation of real feelings (González-Ibáñez et al., 2011). Moreover, in this work we relied on emotion hashtag synonyms collected from WordNet-Affect (Valitutti, 2004), Google synonyms and Roget's thesaurus to outweigh the sarcasm factor.
Overall, we collected T_E = 52,925 tweets annotated with anger (9.4%), joy (29.3%), fear (17.1%), sadness (7.9%), disgust (24.5%) and surprise (15.6%).
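The hashtag bootstrapping above is a form of distant supervision. A minimal sketch of the idea follows; the seed hashtag lists are hypothetical stand-ins for the synonym sets drawn from WordNet-Affect, Google synonyms and Roget's thesaurus, and stripping the matched hashtag from the text prevents a classifier from trivially memorizing the label.

```python
import re

# Hypothetical seed lists; the actual sets are expanded with synonyms
# from WordNet-Affect, Google synonyms, and Roget's thesaurus.
EMOTION_HASHTAGS = {
    "joy": {"#joy", "#happy", "#delighted"},
    "anger": {"#anger", "#angry", "#furious"},
    "sadness": {"#sad", "#sadness", "#depressed"},
}

def label_tweet(text):
    """Distantly label a tweet by its emotion hashtags, then remove
    the matched hashtags from the text used for training."""
    tags = set(t.lower() for t in re.findall(r"#\w+", text))
    for emotion, seeds in EMOTION_HASHTAGS.items():
        hit = tags & seeds
        if hit:
            cleaned = text
            for h in hit:
                cleaned = re.sub(re.escape(h), "", cleaned, flags=re.IGNORECASE).strip()
            return emotion, cleaned
    return None, text  # no emotion hashtag: leave unlabeled

print(label_tweet("Finally the weekend! #happy"))  # ('joy', 'Finally the weekend!')
```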

Methodology
Annotating User-Neighbor Data with Sociodemographics and Affects As shown in Figure 1, to perform our analysis we developed three machine learning components. The first component is a user-level demographic classifier Φ A (u), which can examine a set of tweets produced by any Twitter user and output a set of predicted demographic traits for that user, including age, education, etc. Each demographic classifier relies on features extracted from user content. The second and third components are tweet-level emotion and sentiment classifiers Φ E (t) and Φ S (t), which can examine any tweet to predict the emotion and sentiment expressed in the tweet.
For inferring user demographics, emotions and sentiments we trained log-linear models with L2 regularization using scikit-learn. Our models rely on word ngram features extracted from user or neighbor tweets, and on the affect-specific features described below. We prefer Ekman's emotion classification over others, e.g., Plutchik's, because we would like to compare the performance of our predictive models to other systems. (Scikit-learn toolkit: http://scikit-learn.org/stable/. Email svitlana.volkova@pnnl.gov for access to the pre-trained scikit-learn models and the data.)
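The training setup can be sketched as follows. The toy corpus and hyperparameters are illustrative only; the paper's models are trained on the annotated Twitter data described above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus standing in for hashtag-labeled tweets.
tweets = ["so happy today", "this is awful", "happy happy joy", "awful terrible day"]
labels = ["joy", "anger", "joy", "anger"]

# Log-linear (logistic regression) model with L2 regularization over
# word-ngram counts; the exact hyperparameters here are our own choice.
model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),
    LogisticRegression(penalty="l2", C=1.0),
)
model.fit(tweets, labels)
print(model.predict(["happy day"]))
```

The same pipeline shape serves both the user-level demographic classifiers (trained on concatenated user tweets) and the tweet-level affect classifiers.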
Perceived Attribute Classification Quality In Section 2 we compared attribute prediction models trained on crowdsourced data vs. other datasets. We showed that models learned from perceived annotations yield higher or comparable performance using the same features and learning algorithms. Given Twitter's data-sharing restrictions, we could only make an indirect comparison with other existing approaches. We found that our models report higher accuracy compared to the existing approaches for gender: +0.12 (Rao et al., 2010), +0.04 (Zamal et al., 2012); and ethnicity: +0.08 (Bergsma et al., 2013), +0.15 (Pennacchiotti and Popescu, 2011b). For previously unexplored attributes we present the ROC AUC numbers obtained using our log-linear models trained on lexical features, estimated using 10-fold cross-validation, in Table 6.
Affect Classification Quality For emotion and opinion classification we trained tweet-level classifiers using lexical features extracted from tweets annotated with sentiments and the six basic emotions. In addition to lexical features we extracted a set of stylistic features including emoticons, elongated words, capitalization, repeated punctuation and the number of hashtags, and took into account clause-level negation (Pang et al., 2002). Unlike other approaches (Wang and Manning, 2012), we observed that adding further linguistic features, e.g., higher-order ngrams, part-of-speech tags or lexicons, did not improve classification performance. We demonstrate our emotion model's prediction quality using 10-fold cross-validation on our hashtag emotion dataset and compare it to other existing datasets in Table 4. Our results significantly outperform the existing approaches and are comparable with the state-of-the-art system for Twitter sentiment classification (Mohammad et al., 2013; Zhu et al., 2014); evaluated on the official SemEval-2013 test set, our system yields an F1 as high as 0.66.
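A minimal re-implementation of these stylistic cues might look as follows; the regular expressions are our own approximations, not the paper's exact feature extractors, and the negation check is token-level rather than clause-level.

```python
import re

def stylistic_features(tweet):
    """Surface cues used alongside word ngrams (our approximation of
    the stylistic feature types listed above)."""
    tokens = tweet.split()
    return {
        # crude emoticon pattern, e.g. ":)", ";-(", "=D"
        "n_emoticons": len(re.findall(r"[:;=][-']?[)(DPp]", tweet)),
        # words with a character repeated 3+ times, e.g. "sooo"
        "n_elongated": sum(1 for t in tokens if re.search(r"(\w)\1{2,}", t)),
        # fully capitalized words of length > 1, e.g. "NOOO"
        "n_allcaps": sum(1 for t in tokens if t.isupper() and len(t) > 1),
        # repeated punctuation runs, e.g. "!!!" or "?!?"
        "n_repeat_punct": len(re.findall(r"[!?]{2,}", tweet)),
        "n_hashtags": tweet.count("#"),
        # simple (not clause-level) negation cue
        "has_negation": int(bool(re.search(r"\b(not|no|never|n't)\b", tweet.lower()))),
    }

print(stylistic_features("NOOO way!!! this is sooo unfair :("))
```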
Note that Twitter policy restricts sharing to tweet IDs or user IDs rather than complete tweets or user profiles; thus, some profiles may become private or be deleted over time. Note also that other existing work on inferring user attributes relies on classification with different category schemes or uses regression, e.g., for age (Nguyen et al., 2011), income (Preoiuc-Pietro et al., 2015), and education (Li et al., 2014).

Correlating User-Environment Emotional Contrast and Demographics We performed our user-environment emotional contrast analysis on a set of users U and neighbors N, where N(u) are the neighbors of u. For each user we defined a set of incoming tweets T^in and outgoing tweets T^out. We then classified the T^in and T^out tweets as containing a sentiment s ∈ S or an emotion e ∈ E, giving T_e^in, T_e^out and T_s^in, T_s^out, where E = {anger, joy, fear, surprise, disgust, sadness} and S = {positive, negative, neutral}.
We measured the proportion of a user's incoming and outgoing tweets containing a certain emotion or sentiment, e.g., p_sad^in = |T_sad^in| / |T^in|. Then, for every user, we estimated the user-environment emotional contrast as the normalized difference between the incoming p_e^in and outgoing p_e^out emotion and sentiment proportions:

∆e = (p_e^out − p_e^in) / (p_e^out + p_e^in).    (1)

We estimated the user environment emotional tone and the user emotional tone from the distributions over the incoming and outgoing affects, D^in and D^out. We measure the similarity between the user emotional tone and the environment emotional tone via the Jensen-Shannon divergence (JSD), a symmetric and finite variant of the KL divergence that measures the difference between two probability distributions:

JSD(D^in || D^out) = (1/2) I(D^in || D) + (1/2) I(D^out || D),    (2)

where D = (1/2)(D^in + D^out) and I(P || Q) = Σ_e P(e) ln (P(e)/Q(e)) is the KL divergence. Next, we compared emotion and sentiment differences between groups of users with different demographics A = {a_0, a_1}, e.g., a_0 = Male and a_1 = Female, using the non-parametric Mann-Whitney U test. For example, we measured the means µ_{∆e=joy}^{Male} and µ_{∆e=joy}^{Female} within the groups of users predicted to be male or female, and estimated whether these means are statistically significantly different. Finally, we used logistic regression to infer a variety of attributes for the U = 10,741 users using the feature sets below:
• outgoing emotional tone p_e^out, p_s^out: the overall emotional profile of a user, regardless of the emotions projected in his or her environment;
• user-environment emotional contrast ∆e, ∆s: whether a certain emotion ∆e or sentiment ∆s is expressed more or less by the user given the emotions he or she has been exposed to within the social environment;
• lexical features extracted from user content: the distribution of word unigrams over the vocabulary.
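The two statistics above, the JSD between incoming and outgoing affect distributions and the Mann-Whitney U test over contrast values, can be sketched as follows. The toy distributions and group values are illustrative only.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def jsd(p, q):
    """Jensen-Shannon divergence between two affect distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)  # the mixture distribution D
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))  # KL divergence I(a||b)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy incoming vs. outgoing emotion distributions for one user.
d_in = [0.5, 0.3, 0.2]
d_out = [0.2, 0.3, 0.5]
print(jsd(d_in, d_out))

# Group comparison: hypothetical joy-contrast values for two groups.
joy_contrast_a0 = [0.10, 0.05, 0.12, 0.08]
joy_contrast_a1 = [0.20, 0.25, 0.18, 0.22]
stat, p_value = mannwhitneyu(joy_contrast_a0, joy_contrast_a1)
print(p_value)
```

Note that `jsd` assumes strictly positive distributions (as here); zero entries would need masking in a production implementation.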

Experimental Results
For the sake of brevity, we refer to a user predicted to be male simply as a male, and to a tweet predicted to contain surprise as simply containing surprise. Despite this necessary shorthand, it is important to recall that a major contribution of this work is that these results are based on automatically predicted properties rather than ground truth. We argue that while such automatically predicted annotations may be imperfect at the individual user or tweet level, they support meaningful analysis in the aggregate.

Similarity between User and Environment Emotional Tones
We report similarities between the user emotional tone and the environment emotional tone for different groups of Twitter users using the Jensen-Shannon divergence defined in Eq. 2. Table 5 presents the mean JSD values estimated over users with two contrasting attribute values, e.g., predicted to be a_0 = Male vs. a_1 = Female.

Table 5: Mean Jensen-Shannon divergences (displayed as percentages) between the incoming D^in and outgoing D^out affects for contrastive attribute values a_0 and a_1. Mann-Whitney test results for differences between a_0 and a_1 JSD values are shown in blue (p-value ≤ 0.01), green (p-value ≤ 0.05), and gray (p-value ≤ 0.1).
In Table 5, user environment emotional tones are estimated over different user-neighbor environments, e.g., the retweet, friend, and all-neighbor (including user mentions) neighborhoods. We found that when user environment emotional tones are estimated from mentioned or retweeted neighbors, the JSD values are lower than for friend neighbors. This means that users are more emotionally similar to the users they mention or retweet than to their friends (the users they follow).
We show that the user incoming and outgoing sentiment tones D_s^in and D_s^out estimated over all neighbors are significantly different for the majority of attributes except ethnicity. The divergences are consistently pronounced across all neighborhoods for the income, age, education, optimism and children attributes (p-value ≤ 0.01). When the incoming and outgoing emotional tones D_e^in and D_e^out are estimated over all neighbors, they are significantly different for all attributes except education and life satisfaction.

User-Environment Affect Contrast
Our key findings, discussed below, confirm the demographic-dependent emotional contrast hypothesis. We found that, regardless of demographics, Twitter users tend to express more (U > N) sadness↑, disgust↑, joy↑ and neutral↑ opinions, and to express less (U < N) surprise↓, fear↓, anger↓, positive↓ and negative↓ opinions than their neighbors, with the exceptions noted below.
Users predicted to be older and to have kids express less sadness, whereas younger users and users without kids express more. This is consistent with the aging positivity effect, recently observed in social media, which states that older people are happier than younger people (Carstensen and Mikels, 2005). Users predicted to be pessimists express less joy compared to their neighbors, whereas optimists express more.
Users predicted to be dissatisfied with life express more anger compared to their environment, whereas users predicted to be satisfied with life express less. Users predicted to be older, with a degree and higher income express fewer neutral opinions compared to their environment, whereas users predicted to be younger, with lower income and high school education express more neutral opinions. Users predicted to be male and to have kids express more positive opinions compared to their neighbors, whereas female users and users without kids express less. We present a more detailed analysis of user-environment emotional contrast for different attribute-affect combinations in Figure 2.
Gender Female users have a stronger tendency to express more surprise and fear compared to their environment. They express less sadness compared to male users, supporting the claim that female users are more emotionally driven than male users in social media (Volkova et al., 2013). Male users have a stronger tendency to express more anger compared to female users. Female users tend to express less negative opinions compared to their environment.
Age Younger users express more sadness, whereas older users express a similar level of sadness to their environment, consistent with the aging positivity effect (Carstensen and Mikels, 2005). Older users have a stronger tendency to express less anger but more disgust compared to younger users. Younger users have a stronger tendency to express less fear and negative sentiment compared to older users.
Education Users with a college degree have a weaker tendency to express less sadness but a stronger tendency to express more disgust than their environment, compared to users with high school education. They have a stronger tendency to express less anger but a weaker tendency to express less fear. Users with high school education are likely to express more neutral opinions, whereas users with a college degree express fewer.
Children Users with children have a stronger tendency to express more joy, less surprise and fear from their environment compared to users without children. Users with children express less sadness and less positive opinions whereas users without children express more.
Income Users with higher annual income have a weaker tendency to express more sadness and have a stronger tendency to express more disgust, less anger and fear from their environment. They tend to express less neutral opinions whereas users with lower income express more.
Ethnicity Caucasian users have a stronger tendency to express more sadness and disgust from their environment whereas African American users have a stronger tendency to express more joy and less disgust. African American users have a stronger tendency to express less anger and surprise, but a weaker tendency to express less fear.
Optimism Optimists express more joy from their environment whereas pessimists do not. Instead, pessimists have a stronger tendency to express more sadness and disgust compared to optimists. Optimists tend to express less fear. Pessimists tend to express less positive but more neutral opinions.
Life Satisfaction User-environment emotional contrast for the life satisfaction attribute highly correlates with the optimism attribute. Users dissatisfied with life have a weaker tendency to express more joy but a stronger tendency to express more sadness and disgust. They express more anger, whereas users satisfied with life express less anger. Users satisfied with life have a stronger tendency to express less fear but a weaker tendency to express less positive and negative opinions.
In addition to our analysis of user-environment emotional contrast and demographics, we identified which users are more "opinionated" relative to their environment on Twitter, i.e., which demographic groups amplify fewer neutral but more subjective (positive or negative) tweets. As shown in Figure 2, male users are significantly more opinionated than female users; likewise, users with kids > users without kids, users with a college degree > users with high school education, older users > younger users, users with higher income > users with lower income, optimists > pessimists, users satisfied with life > users dissatisfied with life, and African American > Caucasian users.

Inferring User Demographics From User-Environment Emotional Contrast
Our findings in the previous sections indicate that predicted demographics correlate with the emotional contrast between users and their environment in social media. We now show that by using the user emotional tone and the user-environment emotional contrast we can quite accurately predict many demographic properties of a user. Table 6 presents the quality of demographic predictions in terms of the area under the ROC curve based on different feature sets. These results indicate that most user traits can be predicted quite accurately using solely the emotional tone and emotional contrast features. That is, given the emotions expressed by a user, and contrasting these with the emotions expressed by the user's environment, one can accurately infer many interesting properties of the user without using any additional information. We note that the emotional features have a strong influence on the prediction quality, resulting in significant absolute ROC AUC improvements over the lexical-only feature set.
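This evaluation setup can be illustrated as follows. The data below is a synthetic stand-in for the nine emotional-tone and contrast features (six emotions plus three sentiments), not the paper's dataset; it shows only the mechanics of estimating ROC AUC for a log-linear model via cross-validation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Synthetic stand-in: 9 emotion/sentiment contrast features per user.
X = rng.normal(size=(200, 9))
# Synthetic binary attribute driven by two of the features plus noise.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

clf = LogisticRegression(penalty="l2")
auc = cross_val_score(clf, X, y, cv=10, scoring="roc_auc").mean()
print(round(auc, 2))
```

With real features, the same `scoring="roc_auc"` setup yields the per-attribute numbers reported in Table 6.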
Furthermore, we analyzed correlations between users' emotional-contrast features and their demographic traits. We found that differences between users and their environment in sadness, joy, anger and disgust can be used to predict whether users have children. Similarly, negative and neutral opinions, as opposed to the joy, fear and surprise emotions, can be predictive of users with higher education.

Table 6: Sociodemographic attribute prediction results in ROC AUC using Lexical, EmoSent (user emotional tone + user-environment emotional contrast), and All (EmoSent + Lexical) features extracted from user content.

Discussion
We examined the expression of emotions in social media, an issue that has also been the focus of recent work analyzing emotion contagion using a controlled experiment on Facebook (Coviello et al., 2014). That study had important ethical implications, as it involved manipulating the emotional messages users viewed in a controlled way. It is not feasible for an arbitrary researcher to reproduce that experiment, as it was carried out on the proprietary Facebook network. Further, the significant criticism of the ethical implications of that study's experimental design (McNeal, 2014) indicates how problematic it is to carry out research on emotions in social networks using a controlled, interventional technique. Our methodology for studying emotions in social media therefore uses an observational method, focusing on Twitter. We collected subjective judgments on a range of previously unexplored user properties, and trained machine learning models to predict those properties for a large sample of Twitter users. We proposed a concrete quantitative definition of the emotional contrast between users and their network environment, based on the emotions emanating from the users versus their neighbors.
We showed that various demographic traits correlate with the emotional contrast between users and their environment, supporting the demographic-dependent emotional contrast hypothesis. We also demonstrated that it is possible to accurately predict many perceived demographic traits of Twitter users based solely on the emotional contrast between them and their neighbors. This suggests that the way in which the emotions we radiate differ from those expressed in our environment reveals a lot about our identity.
We note that our analysis and methodology have several limitations. First, we only study correlations between emotional contrast and demographics; as such, we do not make any causal inferences regarding these parameters. Second, our labels for the demographic traits of Twitter users were the result of subjective reports obtained through human annotation, i.e., subjective impressions (Flekova et al., 2016) of people rather than their true traits. Finally, we crawled both user and neighbor tweets within a short time frame (less than a week) and ensured that user and neighbor tweets were produced at the same time. Despite these limitations, our results indicate higher performance compared to earlier work, and given the large size of our dataset we believe our findings are robust.
Unlike the existing work, we not only focus on previously unexplored attributes e.g., having children, optimism and life satisfaction but also demonstrate that user attributes can be effectively predicted using emotion and sentiment features in addition to commonly used text features.

Emotion and Opinion Mining in Microblogs
Emotion analysis has been successfully applied to many kinds of informal and short texts including emails, blogs (Kosinski et al., 2013), and news headlines (Strapparava and Mihalcea, 2007), but emotions in social media, including Twitter and Facebook, have only been investigated recently. Researchers have used supervised learning models trained on lexical word ngram features, synsets, emoticons, topics, and lexicon frameworks (e.g., EmoTag: http://nil.fdi.ucm.es/index.php?q=node/186) to determine which emotions are expressed on Twitter (Roberts et al., 2012; Qadir and Riloff, 2013). In contrast, sentiment classification in social media has been extensively studied (Pang et al., 2002; Pang and Lee, 2008; Pak and Paroubek, 2010; Hassan Saif, Miriam Fernandez and Alani, 2013; Nakov et al., 2013; Zhu et al., 2014).
Emotion Contagion in Social Networks Emotional contagion theory states that emotions and sentiments of two messages posted by friends are more likely to be similar than those of two randomly selected messages (Hatfield and Cacioppo, 1994). There have been recent studies about emotion contagion in massively large social networks (Fan et al., 2013;Ferrara and Yang, 2015b;Bollen et al., 2011a;Ferrara and Yang, 2015a).
Unlike these papers, we do not aim to model the spread of emotions or opinions in a social network. Instead, given both homophilic and assortative properties of a Twitter social network, we study how emotions expressed by user neighbors correlate with user emotions, and whether these correlations depend on user demographic traits.

Summary
We examined a large-scale Twitter dataset to analyze the relation between perceived user demographics and the emotional contrast between users and their neighbors. Our results indicated that many sociodemographic traits correlate with user-environment emotional contrast. Further, we showed that one can accurately predict a wide range of perceived demographics of a user based solely on the emotions expressed by that user and user's social environment.
Our findings may advance the current understanding of the social media population, their online behavior and well-being (Nguyen et al., 2015). Our observations can effectively improve personalized intelligent user interfaces in a way that reflects and adapts to user-specific characteristics and emotions. Moreover, our models for predicting user demographics can be effectively used for a variety of downstream NLP tasks, e.g., text classification (Hovy, 2015), sentiment analysis (Volkova et al., 2013), paraphrasing, part-of-speech tagging (Johannsen et al., 2015) and visual analytics (Dou et al., 2015).