Linguistic Reflexes of Well-Being and Happiness in Echo

Different theories posit different sources for feelings of well-being and happiness. Appraisal theory grounds our emotional responses in our goals and desires and their fulfillment, or lack of fulfillment. Self-Determination theory posits that the basis for well-being rests on our assessments of our competence, autonomy and social connection. And surveys that measure happiness empirically note that people require their basic needs to be met for food and shelter, but beyond that tend to be happiest when socializing, eating or having sex. We analyze a corpus of private micro-blogs from a well-being application called Echo, where users label each written post about daily events with a happiness score between 1 and 9. Our goal is to ground the linguistic descriptions of events that users experience in theories of well-being and happiness, and then examine the extent to which different theoretical accounts can explain the variance in the happiness scores. We show that recurrent event types, such as obligation and incompetence, which affect people’s feelings of well-being are not captured in current lexical or semantic resources.


Introduction
There has recently been huge interest in wellbeing, with a recent review arguing that psychological well-being plays a causal role in promoting job success, physical health, and long-term relationships (Lyubomirsky et al., 2005; Kahneman, 1999). In this paper we analyze a corpus of private micro-blogs from a well-being application called ECHO, with the aim to detect, understand, and further advance systems that can improve both short and longer-term issues with well-being.

Figure 1: RECORDING and REFLECTION of Echo
RECORDING (Negative): I have to clean the kitchen since it's my chore this week, but I really don't want to do it!
REFLECTION (Positive): I'm glad I did it!! The kitchen was clean and I watched the kardashians while doing it!
RECORDING (Positive): I am having a lovely lunch with my two friends. We are eating at Pacific Thai. Tom yuumm!!
REFLECTION (Negative): I miss hanging out with friends, I've been so busy lately.
ECHO initiates user-written reactions to daily events, called RECORDINGS, as well as subsequent REFLECTIONS on those events at points in the future (Isaacs et al., 2013). Each reaction is labelled at the time of recording or reflection by the user, the first-person experiencer, with a happiness rating from 1 to 9. Note that all users' posts and ratings are private, distinguishing this corpus from public sources like LiveJournal, where the content of posts might be influenced by considerations of self-presentation. Figure 1 shows a RECORDING and REFLECTION from two users, after binning the happiness ratings into positive and negative.
Our goal is to ground the linguistic descriptions of events that users experience, such as those in Figure 1, in theories of well-being and happiness. Without such a grounding, it is difficult for the ECHO system to make recommendations to users to improve their well-being, or to explain the relationships between different event types and well-being, or to develop a policy that can do a good job of selecting events for targeted reflection (Konrad et al., 2015; Isaacs et al., 2013). That is, for ECHO's purposes, we need techniques that not only reliably categorize a user's scalar happiness level, but are explanatory with respect to the sources of that happiness level.
There are two principal challenges to this goal. First, different theories posit different sources for feelings of well-being and happiness. Second, the relevant computational resources for sentiment or mood are primarily lexically based, while many of the events can only be characterized well via their compositional semantics (Reschke and Anand, 2011).
Other research also shares our motivation of understanding the relationship between what people say and their levels of happiness and related moods. Mishne (2005) used a corpus of 340,000 posts from LiveJournal that were self-annotated with the 40 most common moods. Lexical features alone improved classification accuracy by 6 to 15% over a balanced baseline. These results were then improved considerably (Keshtkar and Inkpen, 2009). Mihalcea and Liu (2006) experimented with the subset of happy/sad posts, and used conditional probability to explore the "happiness factor" of various terms, and the relationship of these terms to well-being categories such as human-centeredness and socialness. Schwartz et al. (2016) extract 5,100 public status updates on Facebook and have Turkers annotate them using Seligman's dimensions for well-being: Positive Emotions, Engagement, Relationships, Meaning, and Accomplishment (Seligman et al., 2006; Forgeard et al., 2011). They then predict each dimension with lexical and LDA topic features.
A related line of work builds lexico-semantic resources for sentiment analysis with a focus on how the participants of an event are affected by it. Goyal and Riloff (2013) bootstrap a set of patient-polarity verbs from narratives and Ding and Riloff (2016) extract event-triples from blogs that reliably indicate positive or negative affect on one of the event participants. Reed et al. (2017) take a similar approach. Deng et al. (2013) annotate how participants of an event are affected, and Deng & Wiebe (2014) show that this assists inference about the author's sentiment towards entities or events. Balahur et al. (2012) use the narratives produced by the ISEAR questionnaire (Scherer et al., 1986) for first-person examples of particular emotions ("I felt angry when X and then Y happened") and extract sequences of subject-verb-object triples, which they then annotate for seven basic emotions. Choi & Wiebe (2014) use WordNet to try to learn similar patterns, and Ruppenhofer & Brandes (2015) annotate synsets in GermaNet based on an event decomposition framework. Russo et al. (2015) proposed a shared task for recognition of a set of pleasant and unpleasant events from a clinical framework for well-being (MacPhillamy and Lewinsohn, 1982). Work on AFINN, SentiWordNet and the Connotation Lexicon also aims to refine existing sentiment resources to capture more subtle notions of sentiment (Feng et al., 2013; Kang et al., 2014; Baccianella et al., 2010; Nielsen, 2011).
Here we report an exploratory study where we synthesize theoretical constructs associated with well-being and happiness from different sources. We then develop several methods for characterizing events in terms of these theories. We examine the extent to which different theoretical accounts can explain the variance in the happiness scores in ECHO. We show that each theory explains a part of the variance, but that our event characterizations need to be more fine-grained. We show that several recurrent event types which affect people's feelings of well-being, such as OBLIGATION and INCOMPETENCE, are not captured in current lexical or semantic resources.

Background and Motivation
ECHO is designed to encourage users to react to daily events as well as to periodically reflect on past events (Isaacs et al., 2013). Figure 2 depicts the user interface, showing a RECORDING from today, as well as prompts to reflect on events from the past. ECHO has been deployed with 134 users, in three different experiments on well-being (Konrad et al., 2016b,a). The total corpus consists of 10354 posts, where 7573 are RECORDINGS and 2781 are REFLECTIONS. While the corpus could be considered relatively small, these posts provide a window onto users' private thoughts as opposed to what users are willing to make public on social media. In addition, the annotations for happiness are provided by the user, the first-person experiencer, and not by a third party.
Our aim is to explain users' emotional reactions to different categories of events mentioned in ECHO posts, linking the user reactions directly to theories of well-being as exemplified in Table 1.
Influential accounts such as Appraisal Theory (Scherer et al., 2001, 1986; Ortony et al., 1990) hold that emotional responses are mediated by our goals and desires and by whether they are fulfilled, as illustrated in Rows 1 and 2 of Table 1. Such mediation arises because emotions have an important adaptive signaling function that serves to motivate future behaviors in relation to those goals. Row 1 provides a description from ECHO of successfully achieving goals. Appraisal theory posits that goal achievement promotes positive affect, which then serves to reinforce the relevant behavior. Row 2 provides an example of failing to achieve an important personal goal, which is posited to promote negative affect, motivating people to modify current behaviors to change that negative outcome.
There are significant critiques of the adaptive goal-based account espoused in Appraisal theory. Appraisal theory focuses on short-term personal goals, but Eudaimonic psychologists instead focus on what determines long-term happiness. Eudaimonic theorists suggest that certain fundamental psychological needs have to be satisfied for people to experience sustained positive long-term emotions. Self-determination theory argues that there are 3 basic psychological needs: AUTONOMY, COMPETENCE and CONNECTION (Deci and Ryan, 2010;Ryan and Deci, 2000;Bandura, 1977). We add these to our inventory in Table 1 in Rows 3 to 8. According to self-determination theory, satisfaction of these basic needs results in positive emotions. Row 3 describes a good day at work. Row 5 describes feeling competent because hard work led to an achievement, and Row 7 describes feeling connected with family. On the other hand, if these basic needs are not satisfied, then negative emotions will regularly arise. For example, obligations to do things one does not feel like doing (Row 4), or a job that does not engage personal decision making or involvement (lack of autonomy) can make one feel unhappy. Similarly, people may feel unhappy due to an experience where the demands of the situation outstrip one's basic abilities, such as doing poorly on a test (lack of competence), as in Row 6. In addition, bad things happening to friends (Row 8) as well as separation from family or friends often reduces happiness (lack of connection).
In addition, there is strong evidence from SAVOURING theory (Jose et al., 2012; Bryant et al., 2011) arguing that people often experience highly positive or negative emotions arising from situations that aren't directly goal-related, and that relate more directly to basic drives (Maslow, 1943; Elson, 2012). For example, experiences such as eating, experiencing nature, sex and physical exercise tend to engender positive emotions, whereas pain, discomfort and inactivity have the opposite effects, and these are documented in results from happiness surveys (Kahneman et al., 2004; Seligman et al., 2006). Thus while experiences such as eating may serve the survival goal of preventing starvation, avoiding starvation is unlikely to be a direct personal goal every time we eat, suggesting that such experiences are not explained by Appraisal theory. Similar arguments have been made by Lewinsohn and colleagues, who have shown that encouraging people to engage in certain simple activities (shopping, mowing the lawn, driving, personal hygiene) has quite predictable effects on mood without engaging significant personal goals (MacPhillamy and Lewinsohn, 1982; Lewinsohn et al., 1985; Lewinsohn and Amenson, 1978).

We start with the 10354 posts from the ECHO corpus and map happiness scores in [1, 4] to negative, and scores in [6, 9] to positive. For posts labelled 5 by the experiencer, we categorize a post as negative if its REFLECTION score decreases to lower than 5, and positive if its REFLECTION score increases. We label the rest of the 5s as neutral, and leave them aside. We then have 5997 positive posts and 3573 negative posts. We randomly sample 2868 posts as training data, and 478 as test data. We keep the remaining 6224 posts untouched for future work. Then we split the posts into sentences. Table 2 shows the splits for each class.
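The binning rule above can be sketched in a few lines of Python. This is only an illustration, not the code used for the corpus: the function name `bin_happiness` and its arguments are hypothetical, and we assume that a REFLECTION score "increases" when it is above 5.

```python
from typing import Optional


def bin_happiness(score: int, reflection_score: Optional[int] = None) -> str:
    """Map a 1-9 happiness rating to 'positive', 'negative' or 'neutral'.

    Hypothetical helper: ratings of 5 are resolved with the later
    REFLECTION rating when one is available (assumed rule: below 5 is
    negative, above 5 is positive), and stay neutral otherwise.
    """
    if not 1 <= score <= 9:
        raise ValueError("happiness scores range from 1 to 9")
    if score <= 4:
        return "negative"
    if score >= 6:
        return "positive"
    # score == 5: fall back on the REFLECTION rating, if any
    if reflection_score is not None:
        if reflection_score < 5:
            return "negative"
        if reflection_score > 5:
            return "positive"
    return "neutral"


# Toy ratings, not taken from the corpus
labels = [bin_happiness(2), bin_happiness(8), bin_happiness(5, 3), bin_happiness(5)]
```

Posts that come out as neutral would then be set aside, as described above.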
We first test the separability of the positive and negative sentences with an SVM classifier from Weka 3.8, using as baselines only unigrams and LIWC (Pennebaker et al., 2001) as features. Results for these baseline classifiers are in Table 3, illustrating that the positive and negative classes can be separated with F1 above .70, and that both unigrams and LIWC perform worse on the negative class. However, as discussed above, the word-level representations of the features in the baselines do not help us with our goal to understand how linguistic descriptions of events that affect well-being map onto theoretical constructs. Table 4 and Table 5 provide the most informative unigrams and LIWC categories. We cannot recommend to an ECHO user that they should, for example, try to use the word why less (Row 7) because it is correlated with negative feelings, or try to use less negation (Rows 9 and 10). It is difficult to associate these features with well-being classes. Even in cases where the words seem to be strongly related to a well-being category, a single word typically fails to provide enough information, e.g., "it was fun talking to him" and "worked on a fun project" belong to different well-being classes. Moreover, the mapping of LIWC categories to words is many-to-many, e.g. the "discrep" category contains words related to both Goals and Autonomy.

[Lexical items from a table in the source: taste, feel, delicious, tasty, sweet, food, coffee, bread, cheese, good, bad, great, better, best, horrible, worst, wonderful, weird, nice, relaxing, annoying, interesting, sad, weird, enjoyable, comforting, entertaining, unpleasant, hilarious, rest, relaxation, exhilarating, tiring, nicer, disturbing, disappointing, embarrassing, irritating, upsetting, heartbreaking, consoling, tedious, traumatic, chilling, calming, frightening, touching, pleasure, satisfying, fascinating, tired, exhausted, sleepy, hungry, nauseated, horny]
We posit that we need compositional semantic features to ground our Well-Being classification of events.
We thus explore two different methods for mapping these well-being event categories into lexical descriptions, one of which is top-down and the other bottom-up. Our top-down method is based on mapping general event types from FrameNet to the theoretical categories enumerated in Table 1. We take frame-specific features for each theoretical category from the lexical units for each frame. For example, GOALS are often discussed in terms of specific frames from the Desiring and the Intentionally act classes, as shown in the first two rows of Table 6.
We show that FrameNet features do provide an interesting level of generalization but much of the compositional semantics of events is still missing from this characterization (Section 4). Thus, our bottom-up method applies the AutoSlog linguistic-pattern learner to induce lexically-grounded predicate patterns from the ECHO data (Section 5). We show how many light verbs acquire a specific semantics with their arguments, and how common events like "Talking" are separated into positive and negative events depending on whether they are "Talking about" or "Talking with".

Table 6 provides our posited mapping from frame categories to the appraisal category of GOALS as well as to the eudaimonic categories of AUTONOMY, COMPETENCE and CONNECTION, and to the hedonic category of SAVOURING. To develop features related to these frame categories, we apply SEMAFOR (Das et al., 2013) to label the ECHO posts with their corresponding frames using FrameNet 1.5 (Baker et al., 2015; Baker, 2014). We partition frame features into subsets corresponding to the different theoretical constructs as defined in Table 6. We acknowledge that our mapping may not be perfect, and that some frames could conceivably be categorized as both goal-related and eudaimonic.

We train an SVM with each feature subset, and evaluate the models on our test set, with results in Table 7. The general ALL FRAME feature is also listed for comparison. The .67 F1 of FRAME is slightly lower than LIWC in Table 3, but in our view, more interpretable. In addition, the average count of FRAME features per sentence is an order of magnitude lower than that of LIWC features (and hence much lower than that of unigram features), suggesting the targeted power of these features. See Table 8. We posit that FRAMES are thus more discriminative than LIWC for well-being classes, and that FRAME features are more naturally categorized into well-being categories at a semantic level.
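The partition of frame features into theory-specific subsets could be sketched as follows. This is a minimal illustration under stated assumptions: only the GOALS entries (Desiring, Intentionally act) come from the text, while the CONNECTION and SAVOURING entries are hypothetical placeholders and do not reproduce Table 6. The names `FRAME_TO_CONSTRUCT` and `partition_frames` are ours.

```python
from typing import Dict, List, Set

# Mapping from theoretical construct to frame names.
# GOALS entries follow the text; the others are hypothetical placeholders.
FRAME_TO_CONSTRUCT: Dict[str, Set[str]] = {
    "GOALS": {"Desiring", "Intentionally_act"},
    "CONNECTION": {"Personal_relationship"},  # hypothetical
    "SAVOURING": {"Ingestion"},               # hypothetical
}


def partition_frames(
    frames: List[str], mapping: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """Split the frames evoked by one sentence into per-construct subsets.

    Frames that belong to no construct are dropped; a frame listed under
    several constructs would be copied into each of them.
    """
    subsets: Dict[str, List[str]] = {construct: [] for construct in mapping}
    for frame in frames:
        for construct, members in mapping.items():
            if frame in members:
                subsets[construct].append(frame)
    return subsets


# Frames that a semantic parser might assign to a single sentence (toy input)
sentence_frames = ["Desiring", "Ingestion", "Motion"]
subsets = partition_frames(sentence_frames, FRAME_TO_CONSTRUCT)
```

Each subset can then be turned into a feature vector for its own classifier, which keeps the per-theory models directly comparable.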
The Goals section of Table 7 shows that Appraisal theory does well at predicting positive events, but performs poorly for negative events, primarily due to low recall. All features achieve good F1 for the positive class, but not the negative class. This is consistent with the results in Table 3. The EUDAIMONIC features include Autonomy & Obligation, Competence and Connection. The SVM trained with just eudaimonic features produces the highest F1 score for the negative class, highlighting the role of eudaimonic-related events in negative well-being. See Table 7. The results of breaking the eudaimonic features into their constituent categories are in Table 9. The results show that most of our autonomy categories are related to negative autonomy, that is, to obligations that cause feelings of negative well-being. On the other hand, the results indicate that competence and connection play a large role in positive well-being.
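To make the per-class comparison concrete, the sketch below computes precision, recall and F1 for one target class. It is a generic illustration with toy labels, not the evaluation code or the numbers behind Tables 3, 7 and 9; the function name `precision_recall_f1` is ours.

```python
from typing import List, Tuple


def precision_recall_f1(
    gold: List[str], predicted: List[str], target: str
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one class, given parallel label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy example: a classifier that over-predicts the positive class
gold = ["pos", "pos", "pos", "neg", "neg", "neg"]
pred = ["pos", "pos", "pos", "pos", "pos", "neg"]
neg_scores = precision_recall_f1(gold, pred, "neg")
pos_scores = precision_recall_f1(gold, pred, "pos")
```

In this toy case the negative class has perfect precision but low recall, which is the pattern described above for the negative class.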

Frames and Well-Being
The top 25 most informative frame features are illustrated in Table 10 (out of 639 instantiated in ECHO). These illustrate general events for well-being, but compositional differences, such as "spending my nights by the side of my textbook" and "spending my nights with friends" are not captured. The first "spend (time)" evokes the theoretical construct of obligation, while "spend (time with)" is related to connection.

Linguistic Pattern Learning
We also apply AutoSlog-TS, a weakly supervised linguistic-pattern learner, as a way of learning some compositional patterns. AutoSlog only requires training documents labeled broadly into our two classes of POSITIVE or NEGATIVE. The learner uses a set of syntactic templates to define different types of linguistic expressions. In general, this method tends to produce high precision (and potentially low recall) markers of the particular classes that can seed further hypothesizing.
The left-hand side of Table 11 lists example pattern templates and the right-hand side illustrates a specific lexico-syntactic pattern (in bold) that represents an instantiation of each general pattern template for learning well-being patterns in our data. In order to enable selection of particular patterns, AutoSlog-TS computes statistics on the strength of association of each pattern with each class, i.e. P(POSITIVE | p) and P(NEGATIVE | p), along with the pattern's overall frequency. We define two tuning parameters for each class: θ_f, the frequency with which a pattern occurs, and θ_p, the probability with which a pattern is associated with the given class. AutoSlog lets us systematically explore tradeoffs between precision and recall. Here we select θ_f and θ_p to optimize F1 on our test set. For more detail, see Riloff (1996) and Oraby et al. (2015).
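The selection step can be illustrated with a short sketch that estimates P(class | p) from labeled pattern occurrences and keeps the patterns that pass both thresholds. This is not AutoSlog-TS itself: the function `select_patterns`, the toy observations, and the use of "greater than or equal" for both thresholds are our assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def select_patterns(
    observations: Iterable[Tuple[str, str]],
    target: str,
    theta_f: int,
    theta_p: float,
) -> Dict[str, Tuple[int, float]]:
    """Keep patterns with frequency >= theta_f and P(target | p) >= theta_p.

    `observations` holds one (pattern, class label) pair per occurrence of
    a pattern in a labeled sentence. The result maps each selected pattern
    to (overall frequency, P(target | p)).
    """
    total: Counter = Counter()
    in_class: Counter = Counter()
    for pattern, label in observations:
        total[pattern] += 1
        if label == target:
            in_class[pattern] += 1

    selected: Dict[str, Tuple[int, float]] = {}
    for pattern, freq in total.items():
        prob = in_class[pattern] / freq
        if freq >= theta_f and prob >= theta_p:
            selected[pattern] = (freq, prob)
    return selected


# Toy occurrences with made-up counts, for illustration only
toy = (
    [("WENT WITH", "POSITIVE")] * 3
    + [("WENT WITH", "NEGATIVE")]
    + [("NEED BUY", "NEGATIVE")] * 2
    + [("NEW SHIRT", "POSITIVE")]
)
positive_patterns = select_patterns(toy, "POSITIVE", theta_f=2, theta_p=0.7)
```

Raising θ_p favours precision, while lowering θ_f admits rarer patterns and so favours recall; sweeping both values gives the tradeoff described above.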
Our primary interest here is AutoSlog's ability to learn compositional patterns. AutoSlog can, in principle, provide three kinds of information: i) it can supplement the lexical units for a given frame; ii) it can supplement the frames in a well-being category; and iii) it can reveal reliable markers of mood that well-being categories do not capture. Because our interest in frames is ultimately as a way of relating well-being categories with linguistic signals, we will not distinguish (i) and (ii) here.
Here we discuss all patterns with θ_p > .7. Several lexico-syntactic patterns fit within our well-being categories but are not captured by frames, while as expected there are overlaps between FrameNet and AutoSlog as well. Examples are listed in Table 12. One large class includes straightforward lexical patterns: FINISHED, FINISH, and FINALLY, which we associate with feelings of competence. Verbal patterns with EAT and ATE indicate savouring, with NOT EAT reliably marking negative sentences. The frames also show many specific types of food (cake), and we use a comprehensive list from DBpedia (Lehmann et al., 2014) to collapse all these to the general type FOOD, allowing us to develop patterns such as MADE FOOD. AutoSlog also discovers many patterns syntactically linking content words (nouns and verbs) and function words (e.g., prepositions and light verbs). It thus furnishes a ready source for multi-word, partially compositional expressions of positivity or negativity. In what follows, we provide some examples (note that in the patterns below, expressions in brackets are used to indicate expressions not part of the pattern that correlate with it in the data).
There are 262 positive patterns of the form Verb/Noun + "with". There are 36 patterns with the string 'go', 12 positive (16 items) and 24 negative (40 items). There are 34 patterns involving the past tense form "went", which reverses the polarity to 25 positive patterns (273 items) and 9 negative (9 items). Across the two versions of the lemma, the positive patterns provide several expressions for savouring (WENT/GO ON BUY) and the negative class has 6 patterns with 'bought' and 16 with 'buy', all emphasizing buying necessities (BUY GROCERIES/TICKET, NEED/WANT BUY, NOT BUY). Thus, even though these expressions all involve the same verbs and prepositions, the surrounding environments, as reflected in the form of the verb, split between positive and negative sentence classes.
There are 73 bigram patterns of the form NEW X, 56 positive (83 items) and 17 negative (21 items). In general, the positive ones describe new objects (SHIRT, SHEETS, COMPUTER, CLOTHES, TEA) and acquaintances (NEW FRIEND), thus encompassing both Connection and possibly Savouring. In contrast, the negative patterns describe changes to routines (HABITS, school QUARTER, PROFESSOR, LIVING [conditions], or SCHEDULE), which are likely to engender a sense of instability, and hence be Eudaimonically negative.
Thus, these patterns illustrate that AutoSlog can serve as a high-precision method of building additional patterns, especially compositional ones, for a given well-being category.

Conclusions and Future Work
In this paper, we have advanced a synthetic categorization of the sources for well-being and happiness. We have used a corpus of private micro-blogs from the ECHO application to explore how well we can map linguistic expressions of well-being to this classification. We have shown that FrameNet provides useful generalizations, while the linguistic pattern learner AutoSlog illustrates the details and challenges of the compositional nature of users' descriptions of their daily experiences. Moreover, we have demonstrated that, independently, each of these methods can produce performance similar to that of conventional lexical methods with a feature space that is smaller, and, in the case of FrameNet features, psychologically grounded. Our AutoSlog exploration moreover reveals a way of exploring the space of patterns that our FrameNet mapping has missed. In future work, we aim to automatically combine these two methods and bring the AutoSlog patterns under the well-being categorization we have advocated here. We also plan to investigate new models with the untouched 6224 ECHO posts, as well as larger public corpora such as LiveJournal.
In addition, we plan to explore the source of the fact that there are more positive patterns (both as types and the tokens they capture) than the negative ones, which directly relates to the lower Neg recall for all classifiers we tested. While we could not find any clear reason in our examination of the data, this asymmetry may indicate that markers of negativity are more syntactically distributed than our current list of patterns looks for, or perhaps less linguistically reliable.