Not Just Depressed: Bipolar Disorder Prediction on Reddit

Bipolar disorder, an illness characterized by manic and depressive episodes, affects more than 60 million people worldwide. We present a preliminary study on bipolar disorder prediction from user-generated text on Reddit, which relies on users’ self-reported labels. Our benchmark classifiers for bipolar disorder prediction outperform the baselines and reach accuracy and F1-scores of above 86%. Feature analysis shows interesting differences in language use between users with bipolar disorders and the control group, including differences in the use of emotion-expressive words.


Introduction
The World Health Organization (2017) and Wykes et al. (2015) report that up to 27% of the adult population in Europe suffer or have suffered from some kind of mental disorder. Unfortunately, as many as 35-50% of those affected go undiagnosed and receive no treatment for their illness. To counter that, the WHO's Mental Health Action Plan (Saxena et al., 2013) lists as one of its main objectives the gathering of information and evidence on mental conditions. At the same time, analysis of texts produced by authors affected by mental disorders is attracting increased attention in the natural language processing community. The research is geared toward a deeper understanding of mental health and the development of models for early detection of various mental disorders, especially on social networks.
In this paper we focus on bipolar disorder, a complex psychiatric disorder manifested by uncontrolled changes in mood and energy levels. Bipolar disorder is characterized by manic episodes, during which people feel abnormally elevated and energized, and depressive episodes, manifested in decreased activity levels and a feeling of hopelessness. The two phases are recurrent and differ in intensity and duration, greatly affecting the person's capacity to carry out daily tasks. Bipolar disorder affects more than 60 million people, or almost 1% of the world population (Anderson et al., 2012). The suicide rate in patients diagnosed with bipolar disorder is more than 6% (Nordentoft et al., 2011). There is thus a clear need for the development of systems capable of early detection of this illness.
As a first step toward that goal, in this paper we present a preliminary study on bipolar disorder prediction based on user-generated texts on social media. The main problem in detecting mental disorders from user-generated text is the lack of labeled datasets. We follow the recent strand of research (Gkotsis et al., 2016; De Choudhury et al., 2016; Shen and Rudzicz, 2017; Gjurković and Šnajder, 2018) and use Reddit as a rich and diverse source of high-volume data with self-reported labels. Our study consists of three parts. First, we test benchmark models for predicting Reddit users with bipolar disorder. Second, we carry out a feature analysis to determine which psycholinguistic features are good predictors of the disorder. Lastly, acknowledging that emotional swings are the main symptom of the disorder, we analyze the emotion-expressive textual features in bipolar disorder users and the non-bipolar control group of users.

Related Work
Psychologists have long studied language use in patients with mental disorders, including schizophrenia (Taylor et al., 1994), suicidal tendencies (Thomas and Duszynski, 1985), and depression (Schnurr et al., 1992). Lately, computer-based analysis with the LIWC (Linguistic Inquiry and Word Count) resource has been used to extract features for various studies regarding mental health (Pennebaker and King, 1999). For example, Stirman and Pennebaker (2001) found the increased use of first-person singular pronouns (I, me, my) in poems to be a good predictor of suicidal behavior, while Rude et al. (2004) detected an excessive use of the pronoun I in essays of depressed psychology students. In a recent study, however, Tackman et al. (2018) suggest that first-person singular pronouns may be better viewed as a marker of general distress or negative emotionality rather than as a specific marker of depression.
A number of studies looked into the use of emotion-expressive words. Rude et al. (2004) found that currently depressed students used more negative emotion words than never-depressed students. Halder et al. (2017) tracked linguistic changes of social network users over time to understand the progression of their emotional status. Kramer et al. (2004) found that conversations in bipolar support chat rooms contained more positively valenced words and slightly more negatively valenced emotions than casual conversations.
Much recent work has leveraged social media as a source of user-generated text for mental health profiling (Park et al., 2012). Most studies used Twitter data; e.g., De Choudhury et al. (2013) predicted depression in Twitter users, while the CLPsych 2015 shared task (Coppersmith et al., 2015b) addressed depression and post-traumatic stress disorder (PTSD). Bipolar disorder on Twitter is usually classified alongside other disorders. For example, Coppersmith et al. (2014, 2015a) achieved a precision of 0.64 at 10% false alarms, while Benton et al. (2017) used multi-task learning and achieved an AUC score of 0.752.
Reddit has only recently been used as a source for the analysis of mental disorders. Gkotsis et al. (2016) analyzed the language in different subreddits related to mental health, and showed that linguistic features such as vocabulary use and sentence complexity vary across different subreddits. De Choudhury et al. (2016) explored methods for automatic detection of individuals who could transition from mental health discourse to suicidal ideation. Shen and Rudzicz (2017) used topic modeling, LIWC, and language models to predict whether a Reddit post is related to anxiety. To our knowledge, there is no previous study on the analysis of bipolar disorder in Reddit users.

Dataset
Reddit is one of the largest social media sites in the world, with more than 85 million unique visitors per month. Reddit is suitable for our study not only because of its vast volume, but also because it offers user anonymity and covers a wide range of topics. Registered users can anonymously discuss various topics on more than 1 million subpages, called "subreddits". A considerable number of subreddits are dedicated to mental health in general, and to bipolar disorder in particular. All comments between 2005 and 2018 (more than 3 billion) are available as a Reddit dump database via Google BigQuery, which we used to obtain the data.

Bipolar disorder users.
To obtain a sample of users with bipolar disorder, we first retrieved all subreddits related to the disorder, i.e., bipolar, bipolar2, BipolarReddit, BipolarSOs, bipolarart, as well as the more generic mentalhealth subreddit. Then, following Beller et al. (2014) and Coppersmith et al. (2014), we looked for self-reported bipolar users by searching the users' comments for the string "I am diagnosed with bipolar" and its paraphrased versions. In addition, following Gjurković and Šnajder (2018), we inspected users' flairs, short descriptive texts that users can set for certain subreddits to appear next to their names. While a flair is not mandatory, we found that many users with bipolar disorder do use flairs on mental health subreddits to indicate their disorder.
The acquisition procedure yielded a set of 4,619 unique users with self-reported bipolar disorder. The users generated around 5 million comments, totaling more than 163 million tokens. To get an estimate of labeling quality, we randomly sampled 250 users and inspected their labels and text. As we found no false positives (i.e., all 250 users report being diagnosed with bipolar disorder), we gauge that the dataset is of high precision. The true precision of the dataset depends, of course, on the veracity of the self-reported diagnosis.
To make the subsequent analysis reliable and unbiased, we decided to additionally prune the dataset as follows. To mitigate the topic bias, we removed all comments by bipolar disorder users on bipolar subreddits, as well as on the general mental health subreddit. Additionally, any comment on any subreddit that mentions the words bipolar or BP (case insensitive) was also excluded. Finally, to increase the reliability, we retained in our dataset only the users who, after pruning, have at least 1000 words remaining. The final number of users in our dataset is 3,488.
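The self-report search described above can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used: the exact paraphrase list is not given in the text, so the regular expressions below, as well as the function name `is_self_reported`, are assumptions.

```python
import re

# Assumed paraphrase pattern: "I am / I'm / I was / I have been (recently)
# diagnosed with bipolar". The real list of paraphrases is not specified.
SELF_REPORT = re.compile(
    r"\bi(?:\s+am|\s+was|\s+have\s+been|\s+got|'m|'ve\s+been)"
    r"\s+(?:\w+\s+)?diagnosed\s+with\s+bipolar\b",
    re.IGNORECASE,
)

# Assumed flair pattern: "bipolar", "BP", "BP2", "BP II"; "BPD" is not matched.
FLAIR = re.compile(r"\b(?:bipolar|bp)\s*(?:[12]|i{1,2})?\b", re.IGNORECASE)


def is_self_reported(comments, flairs):
    """Return True if any comment or flair contains a self-report of bipolar disorder."""
    if any(SELF_REPORT.search(text) for text in comments):
        return True
    return any(FLAIR.search(flair) for flair in flairs)
```

A first-person anchor (`\bi ...`) keeps third-person mentions such as "my sister was diagnosed with bipolar" from matching; manual inspection of a sample, as done for the 250 users above, remains necessary.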
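The pruning steps can be illustrated with a short sketch. It assumes that each comment is available as a `(subreddit, text)` pair, that words are counted by whitespace splitting, and that "mentions" means a whole-word match; none of these details is stated in the text, and `prune_user` is a hypothetical helper name.

```python
import re

# Subreddits named in the text, lower-cased for comparison.
EXCLUDED_SUBREDDITS = {
    "bipolar", "bipolar2", "bipolarreddit", "bipolarsos", "bipolarart",
    "mentalhealth",
}
# Whole-word, case-insensitive match for "bipolar" or "BP" (assumption).
MENTION = re.compile(r"\b(?:bipolar|bp)\b", re.IGNORECASE)


def prune_user(comments, min_words=1000):
    """Drop topic-revealing comments; return the rest, or None if too little text remains.

    comments: list of (subreddit, text) pairs for one user.
    """
    kept = [
        (subreddit, text)
        for subreddit, text in comments
        if subreddit.lower() not in EXCLUDED_SUBREDDITS
        and not MENTION.search(text)
    ]
    n_words = sum(len(text.split()) for _, text in kept)
    return kept if n_words >= min_words else None
```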
Control group. The control group was sampled from the general Reddit community, serving as a representative of the mentally healthy population. To ensure that the topics discussed by the control group match those of bipolar disorder users, we sampled users that post in subreddits often visited by bipolar disorder users (i.e., subreddits where the posting frequency of bipolar disorder users was above the average). To also ensure that the control group is representative of the mentally healthy Reddit population, we removed all users with more than 1000 words on mental-health-related subreddits. As before, we only retained users that had more than 1000 words in all of their comments. The final number of users in the control group is 3,931, which is close to the number of bipolar users, with the purpose of having a balanced dataset. The total number of comments is about 20 million, which is four times more than for the bipolar disorder users.
Topic categories. The topic of discussion may affect language use, including the stylometric variables (Mikros and Argiri, 2007), which means that topic distribution may act as a confounder in our analysis. To minimize this effect, we split the dataset into nine topic categories, each consisting of a handful of subreddits on a similar topic. Table 1 shows the breakdown of the number of unique users from both groups across topic categories. AskReddit is the biggest subreddit and not bound to any particular topic; in this category, we also add other subreddits covering a wide range of topics, such as CasualConversation and Showerthoughts. To be included in a category, the user must have had at least 1000 words on subreddits from that category.
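The category membership rule can be sketched as below. Only the AskReddit grouping is named in the text; the remaining entries of `CATEGORY_OF` are hypothetical placeholders, and whitespace word counting is an assumption.

```python
from collections import Counter

# Partial, illustrative subreddit-to-category mapping.
CATEGORY_OF = {
    "askreddit": "AskReddit",
    "casualconversation": "AskReddit",
    "showerthoughts": "AskReddit",
    "gaming": "Gaming",  # assumed assignment
    "nfl": "Sports",     # assumed assignment
}


def user_categories(comments, min_words=1000):
    """Return the categories in which a user has at least min_words words.

    comments: list of (subreddit, text) pairs for one user.
    """
    words = Counter()
    for subreddit, text in comments:
        category = CATEGORY_OF.get(subreddit.lower())
        if category is not None:
            words[category] += len(text.split())
    return {category for category, n in words.items() if n >= min_words}
```

Under this rule a user can belong to several categories at once, or to none.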

Bipolar Disorder Prediction
Feature extraction. For each user, we extracted three kinds of features: (1) psycholinguistic features, (2) lexical features, and (3) Reddit user features. For the psycholinguistic features, in line with much previous work, we used LIWC (Pennebaker et al., 2015), a widely used tool in predicting mental health, which classifies words into dictionary-defined categories. We extracted 93 features, including syntactic features (e.g., pronouns, articles), topical features (e.g., work, friends), and psychological features (e.g., emotions, social context). In addition to LIWC, we used Empath (Fast et al., 2016), which is similar to LIWC but categorizes the words using similarities based on neural embeddings. We used the 200 predefined and manually curated categories, which Fast et al. have found to be highly correlated with LIWC categories (r=0.906). The lexical features are the tf-idf weighted bag-of-words, stemmed using the Porter stemmer from NLTK (Bird et al., 2009). Finally, Reddit user features are meant to model the user's interaction patterns. These include post-comment ratio, the number of gilded posts (posts awarded with money by other users), average controversiality, the average difference between ups and downs on the user's comments, and the time intervals between comments (the mean, median, selected percentiles, and the mode).
Experimental setup. We frame bipolar disorder prediction as a binary classification task, using the above-defined features and three classifiers: a support vector machine (SVM), logistic regression, and random forest ensemble (RF). We evaluated our models and tuned the hyperparameters using 10×5 nested cross-validation. To mitigate class imbalance, we use class weighting when training classifiers on the dataset split into categories.
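A nested cross-validation setup of this kind might look as follows in Scikit-learn. This is a sketch under stated assumptions rather than the configuration used in the experiments: "10×5" is read here as 10 outer and 5 inner folds, only logistic regression with a small hypothetical grid over `C` is shown, Porter stemming is omitted, and the toy texts are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline


def nested_cv_accuracy(texts, labels, seed=0):
    """Outer 10-fold loop estimates accuracy; inner 5-fold loop tunes C."""
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),
        # class_weight="balanced" reweights classes by inverse frequency.
        ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
    ])
    inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    search = GridSearchCV(
        pipeline, {"clf__C": [0.1, 1.0, 10.0]}, cv=inner, scoring="accuracy"
    )
    # Each outer fold refits the whole grid search on its training part only.
    return cross_val_score(search, texts, labels, cv=outer, scoring="accuracy")


# Invented toy data: 20 documents per class.
texts = [f"i feel tired and sad today {i}" for i in range(20)]
texts += [f"the match was great fun {i}" for i in range(20)]
labels = [1] * 20 + [0] * 20
scores = nested_cv_accuracy(texts, labels)
print(scores.mean())
```

Keeping the vectorizer inside the pipeline means that tf-idf statistics are computed on training folds only, so no information from held-out users leaks into the features.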
As baselines, we used a majority class classifier (MCC) for evaluating the accuracy score and a random classifier with class priors estimated from the training set for evaluating the F1-score (the F1-score is undefined for MCC). For implementation, we used Scikit-learn (Pedregosa et al., 2011). We use a two-sided t-test for all statistical significance tests and test at the p<0.001 level.
Results. Table 2 shows the accuracy and F1-scores for the different classifiers. The random forest classifier achieved the best results, with an accuracy of 0.869 and an F1-score of 0.863. All models outperform the baseline accuracies of 0.529 and 0.546, and the baseline F1-score of 0.453. Table 3 shows the accuracy of the models using different feature sets. We observe two trends: Empath generally performs worse than LIWC, and tf-idf features perform better than LIWC. However, looking at the scores of the random forest classifier as the best model, we find that there is no significant difference between LIWC and Empath. Tf-idf does perform significantly differently from both LIWC and Empath, while all features combined (including Reddit user features) do not differ from tf-idf alone. We speculate that tf-idf might yield better results in this case because essentially all the words that LIWC and Empath detect also exist as individual features in tf-idf. Similarly, Coppersmith et al. (2014) achieve better results using language models than LIWC, arguing that many relevant text signals go undetected by LIWC.
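The two baselines can be approximated with Scikit-learn's `DummyClassifier`. The sketch below assumes label 1 denotes bipolar disorder users and label 0 the control group; the helper name `baseline_scores` is hypothetical.

```python
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, f1_score


def baseline_scores(X_train, y_train, X_test, y_test, seed=0):
    """Accuracy of the majority class baseline and F1 of a prior-based random baseline."""
    # MCC: always predicts the most frequent training label.
    mcc = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
    # Random baseline: samples labels according to the training class priors.
    rnd = DummyClassifier(strategy="stratified", random_state=seed).fit(X_train, y_train)
    return {
        "mcc_accuracy": accuracy_score(y_test, mcc.predict(X_test)),
        "random_f1": f1_score(y_test, rnd.predict(X_test), zero_division=0),
    }
```

When the majority class is the negative one, MCC never predicts the positive label, so its precision has a zero denominator; this is why a separate random baseline is needed for the F1-score.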
Finally, Table 4 shows the accuracy across topic categories for the MCC baseline and the best classifier in each category. Our models outperform MCC in all categories, and the differences are significant for all categories except Sports.
Table 4: Accuracy of the MCC baseline and our models across topic categories. Accuracies marked with "*" are significantly different from the baseline.

Feature Analysis
We analyze the merit of the psycholinguistic features using a two-sided t-test, with the null hypothesis of no difference in feature values between users with bipolar disorder and control users. The lower the p-value, the higher the merit. We analyzed the features separately on the entire dataset and on the dataset split into categories.
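The feature ranking can be sketched with SciPy. The text does not say whether equal variances were assumed, so the default Student's t-test of `scipy.stats.ttest_ind` is used here as an assumption; `rank_features` is a hypothetical helper.

```python
from scipy.stats import ttest_ind


def rank_features(bipolar, control):
    """Rank features by the p-value of a two-sided t-test between the two groups.

    bipolar, control: dicts mapping a feature name to a list of per-user values.
    Returns a list of (feature, p_value) pairs, lowest p-value first.
    """
    results = []
    for name in bipolar:
        # ttest_ind is two-sided by default.
        p_value = ttest_ind(bipolar[name], control[name]).pvalue
        results.append((name, p_value))
    return sorted(results, key=lambda pair: pair[1])
```

With 93 LIWC features tested at once, a strict threshold such as the p<0.001 level used in this paper helps limit spurious findings.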
Between-group analysis. Ten LIWC features with the lowest p-value on the entire dataset are presented in Table 5, together with feature value means for the two groups. The values in the table are percentages of words in text from each category, except Authentic and Clout, which are "summary variables" devised by Pennebaker et al. (2015). Personal pronouns, especially the pronoun I, are used more often by bipolar disorder users. This observation is in accord with past studies on language of depressed people, which we can compare to because a bipolar depressive episode is almost identical to major depression (Anderson et al., 2012). Coppersmith et al. (2014) also report a significant difference in the use of I between Twitter users with bipolar disorder and the control group. The Authentic feature of Newman et al. (2003) reflects the authenticity of the author's text: a higher value of this feature in bipolar disorder users may perhaps be explained by them speaking about personal issues more sincerely, though further research would be required to confirm this. We also observe a higher use of words associated with feelings (feel), health, and biological processes (bio). Kacewicz et al. (2014) argue that pronoun use reflects standings in social hierarchies, expressed through Clout and power features: we observe a lower use of these words in users with bipolar disorder, which might indicate [...]
Per-category analysis. Significant features in specific categories follow a pattern similar to the features on the complete dataset. Pronoun I is statistically significant in all of the categories, as well as features Clout and Authentic.

Emotion Analysis
As emotional swings are one of the main symptoms of bipolar disorder, we expect that there will be a difference in the use of emotion words between users with bipolar disorder and the control group. We report the results for LIWC, as Empath gave very similar results.
Between-group differences. Table 6 shows means and standard deviations of the values of six LIWC emotion categories (posemo, negemo, anxiety, anger, sad, and affect) for the users with bipolar disorder and the control group. Users with bipolar disorder use significantly more words linked with general affect. Furthermore, we observe increased use of words related to sadness, while the control group uses more anger-related words. The results for sadness are in line with previous work on depressed authors. In addition, we find significantly higher use of anxiety words in users with bipolar disorder, similar to the findings of Coppersmith et al. (2014). Surprisingly, we find that users with bipolar disorder use more positive emotion words than the control group. This is in contrast to findings of Rude et al. (2004), who report no statistical significance in the use of positive emotion words in depressed authors. We speculate that this difference may be due to the characteristics of manic episodes, which do not occur in clinically depressed people.
Table 7: Averages of standard deviations in the use of emotion-expressive words for the two groups. All differences are significant except for "anger".
Per-category differences. The difference between users with bipolar disorder and the control group in the AskReddit, Animals, Movies/music/books, and Sex and relationships categories is significant in words related to sadness, anxiety, anger, and positive emotions. However, there is no significant difference in positive and negative emotions in the Jobs and Politics categories, while Sports, Gaming, and Religion differ only in positive emotions.
User-level variance. We hypothesize that, due to the alternation of manic and depressive episodes, users with bipolar disorder will have a higher variance across time in the use of emotion words than users from the control group. To verify this, we randomly sampled 100 users with bipolar disorder and 100 control users from all the users in our dataset with more than 100K words and split their comments into monthly chunks. For each of the 200 users, we calculated the LIWC features for each month and computed their standard deviations. We then measured the difference between standard deviations for the two groups. Table 7 shows the results. We find that bipolar users have significantly more variance in most emotion-expressive words, which supports our hypothesis.
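The per-user computation can be sketched as follows. Because the LIWC dictionaries are proprietary, a tiny hypothetical word list (`SAD_WORDS`) stands in for a LIWC emotion category; the `(unix_timestamp, text)` input format, whitespace tokenization, and the use of the sample standard deviation are likewise assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import stdev

# Hypothetical stand-in for a LIWC emotion category.
SAD_WORDS = {"sad", "cry", "hopeless"}


def monthly_std(comments, lexicon=SAD_WORDS):
    """Standard deviation, across months, of the percentage of lexicon words.

    comments: iterable of (unix_timestamp, text) pairs for one user.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for timestamp, text in comments:
        # Bucket each comment by calendar month (UTC).
        month = datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m")
        tokens = text.lower().split()
        totals[month] += len(tokens)
        hits[month] += sum(token in lexicon for token in tokens)
    rates = [100.0 * hits[m] / totals[m] for m in totals if totals[m] > 0]
    return stdev(rates) if len(rates) > 1 else 0.0
```

The resulting per-user standard deviations for the two samples of 100 users can then be compared with a two-sample significance test.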

Conclusion
We presented a preliminary study on bipolar disorder prediction from user comments on Reddit. Our classifiers outperform the baselines and reach accuracy and F1-scores of above 86%. Feature analysis suggests that users with bipolar disorder use more first-person pronouns and words associated with feelings. They also use more affective words, words related to sadness and anxiety, but also more positive words, which may be explained by the alternating episodes. There is also a higher variance in emotion words across time in users with bipolar disorder. Future work might look into the linguistic differences in manic and depressive episodes, and propose models for predicting them.