In your wildest dreams: the language and psychological features of dreams

In this paper, we provide the first quantified exploration of the structure of the language of dreams, their linguistic style and emotional content. We present a collection of digital dream logs as a viable corpus for the growing study of mental health through the lens of language, complementary to the work done examining more traditional social media. This paper is largely exploratory in nature to lay the groundwork for subsequent research in mental health, rather than optimizing a particular text classification task.


Introduction
Despite a prominent role in the origin of psychology (Freud, 2013;Jung, 2002), scientific research about the meaning and value of dreams has waned in the 21st century. Cartwright (2008), for one, has argued that dreams lost their prominence in the latter half of the 20th century as psychology attempted to become a more empirical science focused on observable behavior and mental activity and less reliant on memory. In the last decade, the distinctive brain patterns of dreaming have become more identifiable (Siclari et al., 2017) and research has amassed on the impact of dreams on waking life with links to mood (Cartwright, 2013), relationship health (Selterman et al., 2012) and decision-making (Morewedge and Norton, 2009). While scientists debate the purpose of dreams (Barrett, 2007;Cartwright et al., 2006), dreams continue to be a universal and time intensive experience across humanity.
Until recently, dreams remained an offline phenomenon, qualitatively separate from other forms of social interaction via social media. Online platforms such as Facebook and Twitter are fertile grounds for research in social science (Wilson et al., 2012; boyd and Ellison, 2007) and, more recently, in mental health via computational approaches in text analysis (Pennebaker et al., 2015; De Choudhury et al., 2013; Coppersmith et al., 2014) and network structure (Christakis and Fowler, 2014). However, dreams have remained a private, albeit important, conversational currency (Wax, 2004). When dreams are studied, they are gathered from sleep labs, psychotherapeutic and inpatient settings, personal dream journals and occasionally classroom settings where "most recent dreams" and "most vivid dreams" are collected (Domhoff, 2000). The recent development of a social network dedicated to dreams offers scientists unprecedented access to the language of dreams at scale, collected with consistent methodology. Understanding the structure of this large corpus of dreams gives us access to previously unobservable mental activity and enables future research to identify abnormal patterns in themes, emotional tone, and styles associated with mental health diagnoses and therapeutic outcomes.
We begin with a brief overview of the impetus for this work and a discussion of related work in the intersection of dreams and text analysis. We then provide details on the corpus of dreams and discuss our results organized around three research questions. The paper concludes with implications for subsequent research on dreams, both to better understand nuances in the medium, and for mental health purposes.

Previous research on dream content and text analysis
Dreams are challenging to understand. Dreams are a diverse medium that vary from being perceptual or cognitive, from involving simple settings to complicated narratives, which may be similar or dissimilar to waking life (Siclari et al., 2017). Analyzing them is similarly complex; researchers have put extensive effort into the development of systems to score their global content, specific themes, psychological intensity, and theoretical underpinnings (Schredl, 2010). Different researchers, research goals, collection vehicles and analytic techniques present issues in replication, reliability and the validity of standardized methods for the content analysis of dreams. The Hall-Van de Castle coding system is the most comprehensive protocol for content analysis of dreams, with eight main categories and over 300 sub scales in the dream manual (Hall and Castle, 1966). Categories include: Physical surroundings (e.g. indoor, outdoor), Characters (e.g. persons, animals), Social interactions (e.g. friendly vs. aggressive), Activities (e.g. communication, thinking), Achievement outcomes (e.g. success, failure), Environmental press (e.g. fortune, misfortune), Emotions (e.g. anger, happiness), Descriptive elements (e.g. size, age, color), and Theoretical scales (e.g. castration anxiety, regression).
A handful of studies have used automated text analysis to explore dreams, specifically to discern differences from waking narratives and identify the relationship between dream language and personality (Hawkins and Boyd, in press), for automated sentiment detection (Nadeau et al., 2006) and to distinguish linguistic features from personal narratives (Hendrickx et al., 2016). To our knowledge, no study has examined as large a sample of dreams from a naturalistic setting (neurotypical research participants, online social context) across methodologies for psychological purposes (i.e. non classification/ non hypothesis driven).
Hawkins and Boyd (in press) analyze dreams across three samples of recent dream reports, two undergraduate and one sample from Amazon's Mechanical Turk (Mechanical Turk users do short human intelligence tasks for small payments; for more see http://www.mturk.com). Using Linguistic Inquiry and Word Count (LIWC), they find a distinctive pattern for recent dreams that differs from the base rate norms for waking narratives, specifically characterized by more function words, common words, pronouns, personal pronouns, first person pronouns, past tense verbs, and more use of words describing leisure activities; less use of present tense and future tense verbs, causation words, second person pronouns, numbers, swear words, and assent words. They did not find consistent relationships between dream language features and personality. Hawkins & Boyd's research paves the way for understanding how and why a dream narrative differs from a waking narrative and what these differences mean from a psychological perspective. For example, what does it mean for a dream to have more function words than a waking narrative? What is the relationship between the content of dreams and the more "invisible" word differences (pronouns, prepositions, articles)? Nadeau et al. (2006) also used LIWC on dreams to gauge the efficacy of automated sentiment analysis to bypass human judges or dreamer estimates of emotion. Comparing the performance of LIWC, the General Inquirer (GI), a weighted lexicon (HM) and a standard bag of words approach, they find machine learning outperforms human judgments, and specifically demonstrate that LIWC and the GI have the best features for sentiment classification. While a step in a promising direction, Nadeau et al.'s sample was small (100 dreams from 29 individuals) and sentiment was classified on a limited negative scale (4-class, from neutral to highly negative), omitting nuance in the purported emotional content of dreams, cf. Cartwright (2013). Hendrickx et al. (2016) looked at the distinguishing features of dreams as compared to personal narratives (diary entries from Reddit and personal stories from Prosebox) via text classification, topic modeling and text coherence. The authors find dreams can be classified with near perfect precision based on the presence of uncertainty markers (somebody, remember, somewhere, recall) and descriptions of scenes (setting, riding, building, swimming, table, room), with lower discourse coherence. Personal narrative markers (non-dream) include time (2014, today, tonight, yesterday, day, months) and conversational expressions (please, :), ?, thanks). Hendrickx et al. also applied LDA topic modeling to explore the main themes in dreams as compared to personal narratives, validating the classification results. Dream topics span everyday activities, setting descriptions, and uncertainty expressions. The Hendrickx et al. research is notable in its exploration of male vs. female topic distributions within dreams in addition to comparisons across corpus type (dream vs. personal narrative), though it does not explore the relationship between topic and emotion and excludes the analysis of function words, which we believe is a critical piece in understanding the psychological value of dreams and dreamers, given previous findings.

Relevant research on mental health and text analysis
Computational text analysis allows for assessment of larger samples and proactive identification of mental illness. Language in social media can indicate the likelihood a user self-reports a particular mental disorder, or has received a mental health diagnosis (De Choudhury et al., 2013). The language of online dreams has yet to be analyzed relative to mental health conditions; however, prior laboratory research suggests that dream content may differ between clinical conditions. We refer the reader to Skancke et al.'s comprehensive review of dream content grouped by clinical disorder (Skancke et al., 2014). In brief, patterns in emotional tone, themes, and actor focus have been associated with diagnoses of mood and anxiety disorders, schizophrenia, personality, and eating disorders. It remains unclear, though, whether dream content can distinguish between clinical disorders. Nightmares are especially relevant to mental health, featuring as a diagnostic symptom for posttraumatic stress disorder (Campbell and Germain, 2016), and a common correlate with schizophrenia (Okorome Mume, 2009), depression and anxiety (Swart et al., 2013), and personality disorders (Schredl et al., 2012). Nightmare frequency and intensity have been positively correlated with incidence of suicidal thoughts and behaviors (Bernert et al., 2005), suggesting nightmares could be a near-term risk factor to assess during crisis. In sum, analysis of dream topics and emotional tone may provide some insight into the mental health of the dreamer.

Data
Dreams were collected from DreamsCloud, a social network for sharing dreams. DreamsCloud is available to the public; those who register for the site are informed that their data can be used for research purposes. DreamsCloud is moderated by professional dream reflectors who comment on dreams, in addition to the broader community of registered users who can also "like" and comment on dreams.
DreamsCloud has the largest available digital collection of dreams with over 119k dreams from 73k users and an overall community of over 300k registered users. Visitors to the site come from 234 countries (according to Google Analytics) and have shared dreams in 8 languages. DreamsCloud differs from online dream banks in that dreams are voluntarily shared for social purposes rather than collections from research studies.
A random sample of 10k English dreams over 100 words from September 1, 2013 through December 31, 2016 was used in this study. Data cleansing removed 322 dreams due to incorrectly classified language (Spanish), lyrics or news content copied from the Internet by the user, and duplicated data. The remaining sample included 9,678 dreams. No additional data about the gender, age, name, or ethnicity of the participants are included in our study. Only the original dream texts are analyzed. While DreamsCloud has comments and conversations around many of these dreams, we put off analysis of commentary for subsequent research and focus directly on the first-person accounts of dreams. The average length of dreams in the sample is 208 words (SD = 116.7). Data are organized by an encrypted alphanumeric Dreamer ID and a unique, encrypted alphanumeric Dream ID for each dream logged.
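The length and duplicate filters described above can be illustrated with a minimal sketch. It assumes dreams arrive as (dream ID, text) pairs and that duplicates are exact matches after case and whitespace normalization; the function names, the toy records and the duplicate rule are hypothetical, and language detection and removal of copied lyrics or news are not shown.

```python
def word_count(text):
    """Count whitespace-delimited tokens in a dream report."""
    return len(text.split())


def clean_sample(dreams, min_words=100):
    """Keep dreams with more than min_words words and drop repeated texts.

    `dreams` is an iterable of (dream_id, text) pairs. Two texts count as
    duplicates when they match after lowercasing and collapsing whitespace
    (an assumed rule, chosen only for illustration).
    """
    seen = set()
    kept = []
    for dream_id, text in dreams:
        if word_count(text) <= min_words:
            continue  # too short for the sample
        key = " ".join(text.lower().split())
        if key in seen:
            continue  # same text already kept
        seen.add(key)
        kept.append((dream_id, text))
    return kept


# Toy records with a low threshold so the example stays short.
toy = [
    ("d1", "I was flying over a city"),
    ("d2", "I was  flying over a CITY"),  # duplicate of d1 after normalization
    ("d3", "My teeth fell out"),
    ("d4", "ok"),                         # too short
]
cleaned = clean_sample(toy, min_words=3)
```

With the default `min_words=100` the same function applies the "over 100 words" criterion stated above.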

Ethical considerations
While community members agree to Terms of Service that explicitly state their content is owned by the company and will be used for research purposes, the nature of the content is very intimate. Because of the unknowns about the science behind why we dream, what our dreams mean, how dreams are related to life events, there is less of a stigma about sharing otherwise private or bizarre information. The site refers to dream-sharing as an "anonymous-as-you-want" activity. Although the analyses in this paper are structural and aggregate in nature, deeper analysis of this data could raise privacy concerns as well as questions about appropriate intervention. Our hope is that additional research in this area will shed light on the relationship between dreaming and waking life to help address these questions.

Results
Three approaches are used to examine the dream narratives: content analysis using an LDA topic model (Blei et al., 2003), analysis of linguistic style via function words using LIWC (Pennebaker et al., 2015), and categorization of emotions using an emotion classification model (Coppersmith et al., 2016).

The topical structure of dreams
Topic models are statistical models which discover topics in a corpus. Topic modeling is especially useful in large data, where it is too cumbersome to extract the topics manually. Due to the large volume of dreams in our corpus and the lack of prior knowledge about their subjects, we follow other content-based studies in employing topic modeling to understand the content of the dreams (Kireyev et al., 2009;Yin et al., 2011;Chae et al., 2012;Mitchell et al., 2015;Hendrickx et al., 2016). We analyzed the topical structure of the dream corpus using a popular topic modeling algorithm, latent Dirichlet allocation (LDA) (Blei et al., 2003). LDA is an algorithm for the automated discovery of topics. LDA treats documents as a mixture of topics, and topics as a mixture of words. Each topic discovered by LDA is represented by a probability distribution which conveys the affinity for a given word to that particular topic.
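To make the mixture view concrete, the following minimal sketch computes the probability of a word in a document as the topic-weighted sum of per-topic word probabilities. The two topics, their word distributions and the document weights are invented for illustration and are not estimates from the dream corpus.

```python
def word_probability(word, doc_topic, topic_word):
    """p(word | doc) = sum over topics k of p(k | doc) * p(word | k)."""
    return sum(
        weight * topic_word[topic].get(word, 0.0)
        for topic, weight in doc_topic.items()
    )


# Hypothetical topics: each maps words to probabilities that sum to 1.
topic_word = {
    "water": {"ocean": 0.5, "swim": 0.3, "boat": 0.2},
    "school": {"class": 0.6, "teacher": 0.3, "boat": 0.1},
}
# Hypothetical document: 75% "water" topic, 25% "school" topic.
doc_topic = {"water": 0.75, "school": 0.25}

p_boat = word_probability("boat", doc_topic, topic_word)  # 0.75*0.2 + 0.25*0.1
```

LDA infers both sets of distributions from the corpus; the sketch only shows how they combine once known.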
We used the LDA implementation available in the Mallet package (McCallum, 2002). We converted the text to lower case and, because the topic analysis is focused on content of dream narratives, excluded all function words and punctuation marks. (Function and style will be considered in the following section.) No reduction in inflection (i.e. stemming, lemmatization) was performed to satisfy the goals of exploring the nuance of dream narratives as a medium and subsequently make inferences about the psychological orientation of the authors (see section 3.2). Further, in order to make more valid comparisons to the existing literature based on human coding, it is important to understand how distributions of singular vs. plural nouns and present vs. past tense verbs, for example are distributed topically. We selected 25 topics for LDA to infer and used 2000 iterations of Gibbs sampling to fit the model. The number of topics was informed by maximizing the computed information gain of the resulting feature sets, while maintaining a reasonable training time.
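The preprocessing steps above (lowercasing, removing punctuation and function words, no stemming or lemmatization) can be sketched as follows. The function word list here is a tiny hypothetical stand-in, not the list used in this study, and the sketch is independent of Mallet.

```python
import string

# Hypothetical, deliberately tiny function word list for illustration only.
FUNCTION_WORDS = {"i", "the", "a", "and", "was", "in", "my", "to", "of", "it"}

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def preprocess(text):
    """Lowercase, strip punctuation, and drop function words.

    Inflected forms are left untouched (no stemming or lemmatization),
    so "cars" and "car" remain distinct tokens.
    """
    cleaned = text.lower().translate(_STRIP_PUNCT)
    return [tok for tok in cleaned.split() if tok not in FUNCTION_WORDS]


tokens = preprocess("I was driving the cars, and my teeth fell out!")
# tokens == ["driving", "cars", "teeth", "fell", "out"]
```

Keeping the plural "cars" and the past tense "fell" is what allows singular vs. plural and tense distinctions to surface in the topics.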
LDA provides insightful information about the topics in the corpus. However, interpreting the 'aboutness' of a topic based on a list of words requires human judgment based on term frequency, exclusivity, meaning, and subjective inference. Interestingly, we found 23 of 25 topics to be interpretable based on semantic meaning and 2 (Topics 17 and 22) which appeared more syntactically related. Most heavily weighted topic words are quoted in results tables, and the full 25-topic distribution with manual labeling is included in Appendix A. Note that the topic number is randomly assigned by LDA and does not indicate anything meaningful like rank, weight, or importance.
Although we utilize a 25-topic solution as compared to Hendrickx et al.'s 50-topic solution, we see some consistency in the topics identified as characteristic of dream narratives. Specifically, we see similar support for the continuity hypothesis of dreams -that dreams are a continuation of waking life activities -in topics such as Topic 19 about School, Topic 12 about food and eating, and Topic 15 about driving and cars. Similar to their research, we also see clustering of present tense verbs in Topic 0, a water topic (11), and home settings topic (5). We see an almost exact replication of their "dreaming in general," in our Topic 18. Comprehensive comparisons in distributions or characteristic words are not possible with the data their published research makes available.
In inspecting the topical distribution and noting the support for the continuity hypothesis, what also stands out is the lack of support for the 'dreams-as-psychotic-state' hypothesis. Beginning with Freud and Jung, researchers have drawn similarities between dreaming and psychosis. These similarities range from phenomenological to neurobiological, qualitatively manifested as a loosening of associations, incongruity and bizarreness of personal experience, and distortion of time and space parameters (Scarone et al., 2008). Reviewing the content of our 25-topic solution, we see no reason to interpret the clustering of words within any given topic as incongruous, nor do we detect support for the content to be evaluated as "bizarre" (Hobson et al., 1987). The topics instead appear closely aligned with reality, reflective of overt (actions) and covert (thoughts) behaviors, and demonstrate semantic congruity within topic. However, an automated approach to coding as subjective a construct as bizarreness demands inspection beyond content words alone.
LDA is an effective means to understand the distribution of content words in a given corpus. Importantly, it was developed for the purpose of dimensionality reduction -document summarization and information retrieval (Blei et al., 2003). Some of the assumptions that enable the algorithms behind topic models, such as the exclusion of words that have no content relevance (e.g. function words), leave room for additional methods to explore the psychological meaning of a given document, the author's mindset, and emotions.

The linguistic style of dreams
Recent research on language from a psychological perspective demonstrates that function word use reflects and is a reliable marker of personality and a range of social and psychological processes, cognitive thinking styles and psychological states (Pennebaker, 2011). Pennebaker proposes that function words are the infrastructure for thought and perspective: they connect (e.g. conjunctions, auxiliary verbs), shape (e.g. pronouns) and organize (e.g. articles, prepositions) content. Content is important in dreams, and often metaphorical (Lakoff, 1993). The style in which we remember and share our dreams can give important clues to how we make sense of our dreams, and in turn, ourselves. Said another way, our goals in this paper are not just to explore the stuff that dreams are made of but the style of dreams as a reflection of the dreamers' psychological states. With multiple lenses on the data, we can obtain an enhanced picture of the psychological value of the corpus.
LIWC categorizes the words in a given text into approximately 80 variables. Variables represent the proportion of words in a given document (i.e. dream) that correspond to a lexicon composed of different categories of words, including function words (pronouns, prepositions), affect words (positive emotion, anxiety), and content words (money, religion, leisure activities). We reduced the window of interest in LIWC categories to function words, affect, and cognitive processes, as justified by what remains from the LDA analysis (e.g. function words) and comparisons to results from the empirical literature described thus far (Hawkins and Boyd, in press; Nadeau et al., 2006). Table 1 shows the means and SDs for all LIWC categories within the Linguistic Processes dictionaries with Cognitive, Social and Affective Processes added. Unweighted means from the aggregated sample of expressive writing in Pennebaker et al. (2015) are provided for context.
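The idea of a lexicon-based category score can be illustrated with a minimal sketch. The lexicon below is a small invented stand-in (the LIWC dictionaries are proprietary and far larger), the category names are hypothetical, and scores are expressed as a percentage of all words in the document, which we assume matches the scale of the means reported here.

```python
import re

# Hypothetical mini-lexicon; a word may belong to more than one category.
LEXICON = {
    "i_pronoun": {"i", "me", "my"},
    "past_focus": {"was", "saw", "ran", "felt"},
    "percept": {"saw", "heard", "felt"},
}


def category_percentages(text, lexicon=LEXICON):
    """Return, per category, the percentage of words found in its word list."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {category: 0.0 for category in lexicon}
    return {
        category: 100.0 * sum(word in vocab for word in words) / total
        for category, vocab in lexicon.items()
    }


scores = category_percentages("I saw my sister and I ran")
# 7 words: 3 first-person pronouns, 2 past-focus verbs, 1 perceptual word
```

Because each score is normalized by document length, dreams of different lengths can be compared directly.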
As compared to the base rates from expressive writing (Pennebaker et al., 2015), a dream narrative comes across as a first person (1st person pronouns) account of a past event (past tense) with particular attention to people (family, friends, women, and men), objects (articles), locations (prepositions) and what is seen, heard, and felt (perceptual processes) more than known or understood (cognitive processes).
Low cognitive processes (M = 9.29; SD = 3.48) would suggest dreamers are not on a search for meaning in sharing their dreams; however, it is unclear if this is a case of displaced cognitive processing due to the more dominant perceptual experience of dreams. Previous research indicates that narrative coherence has an inverse relationship with cognitive processing words (Klein and Boals, 2010; Boals et al., 2011). Boals et al. (2011) show that cognitive process words are related to sense making as a process which occurs prior to the development of a narrative (sense making as an outcome). This might suggest that dreamers do not tend to be caught up in why they had a given dream as much as explaining what happened. In other words, dreams are shared as complete stories. A dream narrative's low proportion of emotion words (Mean Affect = 3.42, SD = 1.90) is unexpected given recent research on the emotion regulatory function of dreams and calls for additional investigation, which we address below. One possibility is the sensitivity of a lexicon-based instrument to the way in which emotions are expressed in dream narratives. In general, our findings are consistent with Hawkins and Boyd (in press), despite differences in the collection vehicle (recall: Hawkins and Boyd use the 'most recent dream' and 'most vivid dream' paradigm) and their use of a previous version of LIWC (2007 vs. 2015).

How is language style related to the content of dreams?
To explore the relationship between dream topic and language style, we focus on function words only: pronouns, prepositions, articles, auxiliary verbs, and negations. In particular, we use an index composed of the proportions of these classes of words called the Categorical Dynamic Index (CDI; Pennebaker et al., 2014). The CDI is a simple unit-weighted computation which adds the proportions of articles and prepositions and subtracts personal pronouns, impersonal pronouns, auxiliary verbs, conjunctions, adverbs and negations. It has been shown to be a reliable marker of cognitive style, which we use to understand differences in the experience of various topics in dreams. Being categorical versus dynamic are different ways of sense-making. One of the goals of our research is to understand how people use "the dream" as a medium on the path to self insight and social connection. In the most basic sense, do people share dreams about certain topics as narratives of personal experience indicating changes over time? Do certain topics lend themselves to a more distant style: stories of what happened to whom with precise descriptions of events and goals? The top five Categorical dream topics and top five Dynamic topics are depicted in Table 2. Topics that are the most categorical are primarily marked by physical environments: trees, sky, house, beach, road. Dynamic dream narratives are characterized by intimate relationships (baby, mom, boyfriend, sister) and experiences (remember, time). The CDI acts as a shortcut to identify those dreams that are experienced as a narrative, potentially offering cues to the role of the dreamer as the main character, a distinguishing factor in dreams of healthy controls as compared to psychiatric patient samples (Skancke et al., 2014). Additionally, this shortcut points to a style of dream that would be difficult to discern with a topical lens only; that is, interpersonal situations with multiple characters and complex relationships. Interestingly, Cartwright et al. (1984) find that complex dreams containing multiple characters and shifts of scenes were one marker of depression remission in their five month longitudinal REM tracking study. Appendix B includes two samples of dreams with high and low CDI scores.
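The unit-weighted CDI computation described above can be sketched in a few lines. The category keys are assumed labels for the eight word classes, and the two score dictionaries are made-up values chosen only to show the direction of the index; they are not results from this study.

```python
# Word classes added to and subtracted from the index (assumed key names).
ADDED = ("article", "prep")
SUBTRACTED = ("ppron", "ipron", "auxverb", "conj", "adverb", "negate")


def cdi(scores):
    """Unit-weighted index: articles + prepositions minus the six other classes.

    Higher values indicate a more categorical style, lower values a more
    dynamic (narrative) style.
    """
    return sum(scores[c] for c in ADDED) - sum(scores[c] for c in SUBTRACTED)


# Hypothetical per-dream category percentages.
landscape_dream = {"article": 10.0, "prep": 15.0, "ppron": 8.0, "ipron": 4.0,
                   "auxverb": 6.0, "conj": 6.0, "adverb": 4.0, "negate": 1.0}
family_dream = {"article": 5.0, "prep": 11.0, "ppron": 14.0, "ipron": 6.0,
                "auxverb": 9.0, "conj": 8.0, "adverb": 6.0, "negate": 2.0}

landscape_cdi = cdi(landscape_dream)  # 25 - 29 = -4
family_cdi = cdi(family_dream)        # 16 - 45 = -29
```

In this toy case the setting-focused dream scores higher (more categorical) than the relationship-focused dream, mirroring the contrast between physical environments and intimate relationships noted above.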

The emotional landscape of dreams
One of the goals of this paper is to investigate how emotions are revealed in dreams, which emotions, and how they vary with the topics that emerge. One prominent hypothesis in dream research posits that the function of dreams is to help regulate negative emotion by "intervening" between waking emotional concerns and post sleep mood (Cartwright, 2008). Much of the literature points to a central role for emotions in dreams, yet there are inconsistencies in the frequencies of the emotional array detected and their valence. The inconsistencies are dependent on a similar variety of reasons to those cited above which make standardized dream content analysis challenging, with the added challenges that make emotions difficult to detect and discern in the broader computer science literature (Sikka et al., 2014; Schredl and Doll, 1998). For example, Merritt et al. (1994) tested a small student population (n=20) and found that there are an average of 3.6 emotions per dream with 95% of dreams having at least one emotion, with fear being the most pervasive. This is directionally consistent with Hall and Castle (1966), who find negative emotions to be more prominent; however, the frequencies vary. Sikka et al. (2014) find consistent differences in the external judgments of emotions in dreams as compared to self ratings. The predicted labels of each dream narrative should not be taken as a definitive representation of the overall emotion of that narrative (a difficult task for even human annotators to accomplish consistently; see Purver and Battersby 2012). Instead, these results should be viewed as an additional feature of each narrative, able to be evaluated automatically and quickly to gain insight and explore broader trends.
In our exploration of language style with a lexicon-based approach, LIWC detected a low proportion of affect (Mean Affect = 3.42, SD = 1.90). To assess the emotional content of dreams in an unsupervised manner (i.e., without annotating each narrative manually), we turn to a model for classifying emotional content from text. (We briefly summarize here, but for complete details, see Coppersmith et al. 2016.) A series of character language models (one for each of anger, fear, joy, sadness, surprise, and no emotion) are trained on a large corpus of Twitter data with an included emotional hashtag, e.g., "#anger". Tweets containing indications of sarcasm were removed. Tweets were labeled by the emotional hashtag contained, and then that hashtag was removed for training the model, thus learning what words might contribute to something being tagged "#anger". A two-step semi-supervised process is used to produce the no-emotion model, since most tweets with emotional content are not labeled with #[emotion]. (We also scored each narrative using the Mohammad and Turney 2013 NRC Emotional Lexicon and opted for the character language models for greater vocabulary coverage and the possibility of an explicit "no emotion" label.) We apply each of the emotion character language models (CLM) to each of the dream narratives, producing a probability that each narrative's content results from each emotion's CLM. We then label that narrative with the maximum-probability emotion.
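The labeling step can be illustrated with a deliberately simplified sketch: one character bigram model per emotion with add-one smoothing, scores normalized under equal priors, and the label taken as the highest-scoring emotion. This is not the model of Coppersmith et al. (2016); the class, the training strings, the smoothing scheme and the alphabet size are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET_SIZE = 128  # assumed number of distinct characters, used for smoothing


class CharBigramModel:
    """Character bigram language model with add-one smoothing."""

    def __init__(self, texts):
        self.pair_counts = Counter()
        self.context_counts = Counter()
        for text in texts:
            padded = "^" + text.lower()  # "^" marks the start of the text
            for prev, cur in zip(padded, padded[1:]):
                self.pair_counts[(prev, cur)] += 1
                self.context_counts[prev] += 1

    def log_prob(self, text):
        """Log-probability of the text's character sequence under the model."""
        padded = "^" + text.lower()
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            numerator = self.pair_counts[(prev, cur)] + 1
            denominator = self.context_counts[prev] + ALPHABET_SIZE
            total += math.log(numerator / denominator)
        return total


def emotion_probabilities(text, models):
    """Normalize per-model likelihoods into probabilities (equal priors assumed)."""
    log_scores = {emotion: m.log_prob(text) for emotion, m in models.items()}
    top = max(log_scores.values())
    weights = {e: math.exp(s - top) for e, s in log_scores.items()}
    norm = sum(weights.values())
    return {e: w / norm for e, w in weights.items()}


# Hypothetical hashtag-labeled training texts (hashtags already removed).
TRAIN = {
    "fear": ["so scared right now", "terrified of the dark", "scared and afraid"],
    "joy": ["so happy today", "happy happy joy", "what a lovely happy day"],
}
models = {emotion: CharBigramModel(texts) for emotion, texts in TRAIN.items()}

probs = emotion_probabilities("i was scared", models)
label = max(probs, key=probs.get)  # maximum-probability emotion
```

Working at the character level lets a model score words it never saw in training, which is one reason such models offer broader vocabulary coverage than a fixed lexicon.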
Concretely, we expect dreams to have a mixture of emotions, and this technique is likely to surface the dominant emotion in the dream (as measured by the number of words used that indicate that emotion). The percent breakdown of predicted emotion labels was as follows: sadness, 31.6%; fear, 21.0%; surprise, 19.9%; joy, 18.7%; anger, 8.7%; no emotion, 0.0%. Only two narratives out of almost 10,000 were labeled no-emotion, and only 6 had the no-emotion label above 10% of the estimated emotional content within a dream; see caveats of this approach below.
To continue to deepen our understanding of the psychological value of the corpus and gain insight on the relationship between dream content and emotion, we correlate each emotion's CLM probability with each of the 25 LDA topics. Table  3 shows the most positively-correlated topic and most negatively-correlated topic for each emotion. Consistent with previous research (Merritt et al., 1994;Hall and Castle, 1966), we demonstrate emotions present in all dreams, with more negative than positive emotion: 61.3% negative emotions (sadness, fear, anger), and sadness as the dominant emotion. Drawbacks of this approach of relying on self-stated emotional content tags are outlined in Coppersmith et al. (2016). In short, even given the two-step semi-supervised method of obtaining the most emotionally neutral tweets possible to use as no-emotion exemplars, it is likely that some nontrivial percentage of the tweets contain significant emotional content. In addition, even in a single tweet, emotional content is often mixed, and the training method employed allows for only one label that may not be sufficiently descriptive. Perhaps the largest caveat of these results comes from the mismatch between the Twitter data the model was trained on and the dream data it is applied to here. The featurization and parameters of the model are optimized for Twitter messages that are constrained to 140 characters, while the dream narratives are 1,047 characters on average (SD 716). Content varies as well; the dream narratives, at least in theory, have a consistent purpose and theme: recounting the content of a dream. Content of tweets is incredibly varied, from a segment of a story, meant to be read in the context of additional tweets; to a single hyperlink, perhaps with a few words of commentary; to a single emoji repeated 140 times. Future research directions include training a semi-supervised emotion classifier that includes the dream narratives to generalize better across domains.
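The correlation step can be sketched as follows, assuming a Pearson correlation (the type of correlation is not specified above) between one emotion's per-dream probability and each topic's per-dream weight. The topic names and all numbers are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def extreme_topics(emotion_probs, topic_weights):
    """Return the most positively and most negatively correlated topics."""
    corr = {
        topic: pearson(emotion_probs, weights)
        for topic, weights in topic_weights.items()
    }
    return max(corr, key=corr.get), min(corr, key=corr.get), corr


# Hypothetical values for four dreams: fear probability and topic weights.
fear = [0.9, 0.7, 0.2, 0.1]
topics = {
    "chase": [0.8, 0.6, 0.1, 0.0],
    "food": [0.0, 0.1, 0.5, 0.7],
    "school": [0.2, 0.3, 0.3, 0.2],
}
most_positive, most_negative, corr = extreme_topics(fear, topics)
```

Repeating this for each emotion yields one most positively and one most negatively correlated topic per emotion, which is the layout of Table 3.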

Conclusion
Our paper presents three types of analyses on an innovative corpus. First we explored the content of dreams with LDA topic modeling. The results demonstrate topics easily interpreted by a human including everyday activity, dreaming itself, and themes common in the dream literature (teeth, animals, flying). These results are consistent with the limited amount of existing research in this area. Our second lens on the data using LIWC portrays dreams, in general, as first person accounts of past events with disproportionate social references and abstract descriptions of settings. Dreams tend to focus on perceptual processes more than cognitive processes. However, there are qualitative distinctions in the content of dreams such that certain topics are experienced as dynamic and others, more categorical. Lastly, we further explored the emotional content in dreams with an unsupervised approach. Our results indicate that emotion is present in dreams and is disproportionately negative, with the most common emotion being sadness. With a sensitive tool, emotion can help disambiguate content in dreams that would otherwise be lumped together, for example dreams about friends, romance, and love which show a complex configuration of emotion.
One major question that underlies this paper is whether we are investigating how we dream or how we story and share our dreams. In future research, we hope to compare dream data to other corpora to better understand how this way of knowing a person, through their dreams, is related to other forms of self expression. Identifying a reasonable comparative dataset for dreams collected from a social network is challenging. This data set is unique in its length (e.g. 140 character Tweets vs. 210 word dreams), content (intimate and quotidian content), and purpose (these dreams are shared for social connection and interaction) making most social media, which would otherwise present the appropriate scale and date range, a poor fit.
Interpreting topics in dreams is extra challenging because there is no ground truth. Language style and emotional classification enhance our understanding of topics and the mindset of a given dreamer, but it is as of yet unclear whether there are individual differences in the way dreams are experienced, or whether dreams are 'victims' of our memories and are yet another corpus to explore the same individual differences we might see in conscious thought. Continued research on dreams over time, dreamers across media and a variety of facets within dream data as compared to different outcome measures (personality, etc.) will help address this concern.
Another limitation in our research is lack of information about potential skew in the data. For example, there may be biases in who shares dreams and why; who knows about and has access to the social network. We also did not have access to ground truth of user mental health information, so we did not analyze dream content relative to clinical disorders. At this time, site behavior is unreliable at the level of dream reporting to tell us whether there is any systematic bias in who provides dreams. Future studies will certainly explore demographic variables including age, sex, race, socioeconomic status, education level, in addition to variables related to belief in dreams, dream frequency and other psychological attributes which would make people more or less likely to share their dreams. Additionally, future research could investigate associations between mental disorder diagnoses and the content of dreams. This is a preliminary investigation into a vast data set with many additional variables to explore.
Much like this field has used social media data as a lens to study the conscious waking perceptions, emotions, and thought processes of individuals with mental health conditions, we see this as a complementary set of quantifiable signals related to the person's unconscious processes. While more traditional social media data is a convolution of the person's internal state and the world they inhabit, we see this dream data as a convolution of their dreaming self, as recalled and recorded by their waking self. Considered in context of the Fluid Vulnerability Theory, dream content could serve as one of many dynamic, near-term risk factors for detecting transitions into psychological crisis (Rudd, 2006). Given the richness of social media data for uncovering unknown signals related to mental health, we strongly suspect this data may hold similar and complementary power.
In sum, our paper offers preliminary evidence that the language of dreams can be an insightful contribution to human-centric big data, as a means for an enhanced understanding of human behavior and cognition alongside standard psychological means and modern neuroimaging. Paired with large scale analysis of social media language, Internet behavior, and wearable sensor information that predict mental health, the language of dreams could serve as an additional data source from which to evaluate mental health by digital life traces.