Automatic Detection and Prediction of Psychiatric Hospitalizations From Social Media Posts

We address the problem of predicting psychiatric hospitalizations using linguistic features drawn from social media posts. We formulate this novel task and develop an approach to automatically extract time spans of self-reported psychiatric hospitalizations. Using this dataset, we build predictive models of psychiatric hospitalization, comparing feature sets, user-level vs. post-level classification, and time windows of varying length. Our best model achieves an F1 of 0.718 using 7 days of posts. Our results suggest that this is a useful framework for collecting hospitalization data, and that social media data can be leveraged to predict acute psychiatric crises before they occur, potentially saving lives and improving outcomes for individuals with mental illness.


Introduction
Every year, approximately 1% of adults in the United States are hospitalized for psychiatric reasons, including increased suicidality and psychosis (Elflein, 2020). With the global COVID-19 pandemic, hospitalizations due to suicidality are projected to increase substantially, and there is already evidence of the adverse impact of the pandemic on the mental health of individuals around the world (Cullen et al., 2020). Psychiatric hospitalizations typically result from crises among individuals struggling with suicidality and mental illness. The present study aims to predict psychiatric hospitalization due to increased suicidality or a psychotic break before it occurs.
There are several motivations for this research goal. Improving our ability to predict psychiatric hospitalization helps identify early warning signs of these crises before they fully develop. Early detection and prediction of acute psychiatric crises is essential for lowering mortality rates and improving overall outcomes for individuals suffering from mental illness. Further, psychiatric hospitalizations place a tremendous burden on limited hospital resources, and involve steep costs for patients as well as taxpayers (Stensland et al., 2012; Owens et al., 2019).
Typically, prediction of psychiatric hospitalization has relied on rich and personalized clinical information for a particular patient. This requirement has limited the size of available datasets, and has also limited the possibility of reaching and helping potential patients who do not have a well-documented psychiatric medical history. In this work we circumvent this limitation by leveraging social media data to train and evaluate predictive models of psychiatric hospitalization. This is a necessary step towards the ultimate goal of predicting behavioral and cognitive changes that often lead to hospitalization. There is a rich literature of computer scientists, psychologists, and psychiatrists taking advantage of the vast amount of social media data (which includes language data of posts and comments, as well as meta-information such as preferences, engagement patterns, and group membership) to gain insights about mental states and behaviors of people with psychiatric disorders.
Building on this successful line of research, we detect engagement patterns combined with self-disclosures to identify potential periods of psychiatric hospitalization. We compile a dataset of these periods, or time spans, along with the posts preceding those periods, and conduct machine learning experiments to automatically predict whether a post precedes a hospitalization or not. Our results suggest that this is a potentially useful approach for predicting psychiatric hospitalizations before they occur. This can enable clinicians to mitigate and hopefully prevent a psychotic break or suicide attempt, helping to save patients' lives and improve outcomes.
The rest of this paper is organized as follows: Section 2 reviews related work and Section 3 describes our novel data collection approach. Section 4 presents our experiments to predict psychiatric hospitalizations, and Section 5 provides analyses of the data and the learned models to gain further insights about the dataset and our results. We conclude in Section 6 and discuss ideas for future work.

Related Work
Research over the past decade has supported and validated the use of computational linguistics techniques applied to social media data for predicting and detecting mental illness across a broad range of psychiatric conditions (Guntuku et al., 2017; Wongkoblap et al., 2017). To date, linguistic indicators of psychopathology have been identified for a wide range of psychiatric conditions (Zomick et al., 2019; Coppersmith et al., 2015; Birnbaum et al., 2017; Huang et al., 2017; De Choudhury et al., 2013; Shen and Rudzicz, 2017). Recent work has also looked at detecting and predicting suicidality using linguistic features from social media posts (Du et al., 2018; Coppersmith et al., 2018; Zirikly et al., 2019).
While the majority of past research has compared specific psychiatric conditions with healthy control groups, more recent work has begun analyzing and identifying unique differences and discriminators among psychiatric conditions (Jiang et al., 2020; Cohan et al., 2018a; Coppersmith et al., 2015). As this area progresses, we have begun to investigate whether this technology can be used beyond detection of mental illness for detecting severity of symptomatology and prediction of acute psychiatric episodes that result in hospitalization. This would benefit patients by alerting clinicians to worsening symptoms, allowing for early intervention care and potential mitigation. Relatedly, advancements in machine learning techniques have led to the development of advanced models for predicting psychiatric crises such as increased suicidality and psychotic episodes using a multimodal approach based on clinical data (Koutsouleris et al., 2021). However, to date, these studies have relied exclusively on clinical data and medical data. To our knowledge, this is the first study to leverage a large dataset of publicly available social media posts for predicting psychiatric hospitalization.

Data Collection
In this section we describe the pipeline components of our dataset construction process, in the order in which they are applied. Table 1 presents the overall statistics of our dataset.

Candidates   TI    SC    #Posts
95,904       318   128   7,077

Table 1: Overall dataset statistics. Candidates is the total number of users we examined, TI is the number of users for whom we extracted a hospitalization with time-span information, SC is the number of users with posts collected for the 21 days directly before the refined hospitalization span, and #Posts is the total number of posts from these spans.

Candidate Collection
We begin data collection by identifying candidate Reddit users who may be at risk for a psychiatric hospitalization. We focus on two user groups: those who self-identify with a psychiatric disorder, and those who self-identify with suicidal ideation or attempted suicide. To identify such users, we leverage subreddits, or forums on Reddit dedicated to specific topics. Following Shing et al. (2018), we collect posts from the r/SuicideWatch (SW) subreddit, and following Cohan et al. (2018b) and Jiang et al. (2020), we collect posts from subreddits related to 8 different mental health conditions: obsessive compulsive disorder (OCD), schizophrenia (SZ), borderline personality disorder (BPD), posttraumatic stress disorder (PTSD), eating disorder (ED), major depressive disorder (MDD), generalized anxiety disorder (GAD) and bipolar disorder. We then use regular expression matching to extract self-identification statements from these posts to form our candidate user pool. Our data collection methods yield 69,682 candidates for suicidal risk and 35,606 candidates for mental health conditions.
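As a rough illustration of the regular expression matching step, the following Python sketch builds a candidate pool from self-identification statements. The patterns, the condition subset, and the function names are hypothetical and chosen only for illustration; the expressions actually used to build the dataset are not listed here.

```python
import re

# Hypothetical surface forms for a few conditions (illustrative, not the
# patterns used to build the dataset).
CONDITION_TERMS = {
    "OCD": r"ocd|obsessive[- ]compulsive disorder",
    "SZ": r"schizophrenia",
    "BPD": r"bpd|borderline personality disorder",
    "PTSD": r"ptsd|post[- ]?traumatic stress disorder",
    "bipolar": r"bipolar(?: disorder)?",
}

# One first-person diagnosis template per condition, e.g.
# "I was diagnosed with OCD" or "I've been diagnosed with bipolar disorder".
SELF_ID = {
    label: re.compile(
        r"\bi(?: was| am| have been|'ve been| got) (?:recently |just )?"
        r"diagnosed with (?:an? )?(?:%s)\b" % terms,
        re.IGNORECASE,
    )
    for label, terms in CONDITION_TERMS.items()
}


def self_identified_conditions(post):
    """Return the set of condition labels the post self-identifies with."""
    return {label for label, pattern in SELF_ID.items() if pattern.search(post)}


def candidate_users(posts):
    """Map each self-identifying user to their conditions.

    posts is an iterable of (username, text) pairs.
    """
    pool = {}
    for user, text in posts:
        found = self_identified_conditions(text)
        if found:
            pool.setdefault(user, set()).update(found)
    return pool
```

Requiring a first-person subject keeps third-person mentions (for example, a relative's diagnosis) out of the pool, which favors precision over recall.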

Hospitalization Time Span Identification
After identifying nearly 100k candidate Reddit users at risk for psychiatric hospitalization, we designed an approach to identify users from that pool that have been hospitalized for psychiatric reasons.
While previous work has shown that regular expression matching alone is able to create high precision mental health datasets (Coppersmith et al., 2014; Cohan et al., 2018b; Jiang et al., 2020), it is far more difficult to automatically construct a dataset with more fine-grained information. MacAvaney et al. (2018) created a dataset of self-disclosures of depression on Reddit, which includes manually annotated temporal information about the diagnosis date. In our case, it is important to not only identify users that self-disclose psychiatric hospitalizations, but also to pinpoint the time span of the hospital stay. There are several challenges associated with this task. First, we need to ensure that the correct time span is identified when a user mentions multiple events in a single post, and avoid identifying a time span that is not associated with the identified hospitalization instance. Second, there are various ways an adverbial phrase of time could be attached to a predicate, making regular expression design difficult. Third, some time-related words have other common senses (e.g. "May"). We address these problems by (1) sentence-tokenizing the posts and performing all our matching at the sentence level; and (2) first running a state-of-the-art semantic role labeling (SRL) model to identify the likely span for regular expression matching. Specifically, we only parse the [ARGM-TMP] temporal argument related to the hospitalization event, identified by the pre-trained SRL model (Shi and Lin, 2019) provided by AllenNLP (Gardner et al., 2018). When the identification is precise to the date level we allow ±7 days of flexibility. In total, we extracted 72 hospitalization time spans from the SuicideWatch user group, and 349 time spans from the psychiatric disorders user group.
A clinical psychologist trainee manually reviewed all 421 spans and found that 69.12% of them were clearly correctly identified and relevant hospitalizations, while the other time spans were not incorrect but simply lacked enough context in the post for confident labeling. This validates our proposed time-span identification approach, and suggests that further context (e.g. other posts in the same thread) may be useful to improve time-span identification.
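To make the time-span step concrete, the Python sketch below resolves a temporal phrase into a calendar span. It assumes the temporal argument of the hospitalization predicate has already been extracted by an SRL model and is passed in as a plain string. The three patterns, the function names, and the reading of "last June" as the most recent June strictly before the post's month are illustrative assumptions, not the rules used to build the dataset.

```python
import calendar
import re
from datetime import date, timedelta

MONTH_NAMES = ["january", "february", "march", "april", "may", "june", "july",
               "august", "september", "october", "november", "december"]
MONTHS = {name: i + 1 for i, name in enumerate(MONTH_NAMES)}
MONTH_RE = "|".join(MONTH_NAMES)

# Hypothetical patterns, applied only to the temporal argument so that
# non-temporal uses of words such as "may" are never seen.
DATE_LEVEL = re.compile(r"\b(%s) (\d{1,2})(?:st|nd|rd|th)?,? (\d{4})\b" % MONTH_RE, re.I)
MONTH_YEAR = re.compile(r"\b(%s)(?: of)? (\d{4})\b" % MONTH_RE, re.I)
LAST_MONTH = re.compile(r"\blast (%s)\b" % MONTH_RE, re.I)


def month_span(year, month):
    """First and last day of the given month."""
    return date(year, month, 1), date(year, month, calendar.monthrange(year, month)[1])


def resolve_span(temporal_text, post_date, slack_days=7):
    """Turn a temporal phrase into a (start, end) date pair, or None."""
    m = DATE_LEVEL.search(temporal_text)
    if m:
        try:
            day = date(int(m.group(3)), MONTHS[m.group(1).lower()], int(m.group(2)))
        except ValueError:  # e.g. "June 31, 2019"
            return None
        slack = timedelta(days=slack_days)  # date-level mentions get +/- 7 days
        return day - slack, day + slack
    m = MONTH_YEAR.search(temporal_text)
    if m:
        return month_span(int(m.group(2)), MONTHS[m.group(1).lower()])
    m = LAST_MONTH.search(temporal_text)
    if m:
        month = MONTHS[m.group(1).lower()]
        # Assumption: the most recent such month strictly before the post's month.
        year = post_date.year if month < post_date.month else post_date.year - 1
        return month_span(year, month)
    return None
```

Relative phrases need the post's own timestamp as an anchor, which is why `post_date` is a required argument even though absolute mentions ignore it.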

Span Refinement
We observe that the most common duration of the identified spans is one month, and it is desirable to have hospitalization time identified on a more fine-grained scale. For example, a user might mention that they were hospitalized "last June," without providing specific start and end dates of their hospital stay. Coppersmith et al. (2018) show that social media provides information in the "clinical whitespace." Inspired by this work, we further identify rare media blackout periods within the previously found plausible hospitalization span, and use them as a proxy for a ground truth hospitalization period. To do this, we fit an exponential distribution to users' social media posting activity, and define a rare media blackout period as a time span of inactivity whose occurrence probability is less than a certain threshold r. This process also provides us with other benefits, as we are able to characterize irregularities like throwaway accounts. Figure 1 is an example of such irregularities, where the user became significantly more active after the identified span; therefore we hypothesize that most of their posts would be related to their mental health condition and perhaps their hospitalization experience. In contrast, Figure 2 is an example of users who actively use their social media before and after the hospitalization blackout. We believe these users and their posts are potentially more useful for research, because they include posts on a wide range of topics over long periods of time, both before and after a psychiatric hospitalization. However, in this paper we make no further use of these features other than to select posts that directly precede a blackout period. When multiple rare media blackout periods are found for an identified span, we empirically select the one with the longest overlap with the span.
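The blackout detection can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the exponential distribution is fit by maximum likelihood to the gaps between one user's consecutive posts, and the "occurrence probability" of a gap is read as the tail probability P(gap ≥ g) = exp(-g / mean gap). The function names and the default threshold value are illustrative.

```python
import math
from datetime import datetime, timedelta


def rare_blackouts(post_times, threshold=0.1):
    """Return (start, end) gaps whose exponential tail probability is below threshold."""
    times = sorted(post_times)
    gaps = list(zip(times, times[1:]))
    if not gaps:
        return []
    lengths = [(end - start).total_seconds() for start, end in gaps]
    mean_gap = sum(lengths) / len(lengths)
    if mean_gap == 0:
        return []
    rate = 1.0 / mean_gap  # maximum-likelihood estimate of the exponential rate
    return [gap for gap, g in zip(gaps, lengths) if math.exp(-rate * g) < threshold]


def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two intervals (zero if disjoint)."""
    return max(timedelta(0), min(a_end, b_end) - max(a_start, b_start))


def refine_span(post_times, span_start, span_end, threshold=0.1):
    """Pick the rare blackout with the longest overlap with the identified span."""
    scored = [(overlap(s, e, span_start, span_end), s, e)
              for s, e in rare_blackouts(post_times, threshold)]
    scored = [item for item in scored if item[0] > timedelta(0)]
    if not scored:
        return None
    _, start, end = max(scored, key=lambda item: item[0])
    return start, end
```

For a user who posts roughly daily, a three-week silence has a tail probability far below 0.1 and is flagged, while ordinary one-day gaps are not.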

Prediction of Psychiatric Hospitalization
Having collected a dataset of proposed hospitalization spans and preceding posts, we use it to build predictive models of psychiatric hospitalizations. We experiment with two different task formulations: post-level prediction and user-level prediction. Post-level prediction involves a binary classification for each post, determining whether the post is followed by a valid hospitalization span or not. User-level prediction classifies a group of posts from a user in a given time window to predict whether the user will be hospitalized. In order to train classification models, we first need to select negative samples as a control group for our experiments. We describe our methods of pairing negative samples in subsection 4.1. We experiment with three sets of features: unigram, bigram, and LIWC (Pennebaker et al., 2007, 2015) features. We perform a hyper-parameter grid search to optimize performance. For all features we use the Naive Bayes classifier, as it has been found to perform well on small datasets (Ng and Jordan, 2002). We pre-process the text by lower-casing all input posts and, following the guidelines of Benton et al. (2017), we de-identify posts by anonymizing URLs and replacing usernames with randomly generated strings.
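The pre-processing and classification setup can be illustrated with a small, dependency-free Python sketch. It is not the implementation used in our experiments: the URL and username patterns, the `<url>` placeholder, the tokenizer, and the `TinyNaiveBayes` class (a multinomial Naive Bayes with add-alpha smoothing) are all assumptions made for illustration, and LIWC features are omitted because they require the LIWC lexicon.

```python
import math
import re
import uuid
from collections import Counter, defaultdict

URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"(?<!\w)/?u/[\w-]+")   # Reddit-style mentions such as u/name
TOKEN_RE = re.compile(r"[a-z0-9_<>']+")


def deidentify(text, alias_map):
    """Lower-case a post, mask URLs, and swap usernames for random aliases."""
    text = URL_RE.sub("<url>", text.lower())

    def alias(match):
        name = match.group(0).lstrip("/")
        if name not in alias_map:
            alias_map[name] = "user_" + uuid.uuid4().hex[:8]
        return alias_map[name]

    return USER_RE.sub(alias, text)


def ngram_features(text, use_bigrams=True):
    """Unigram tokens, optionally followed by space-joined bigrams."""
    tokens = TOKEN_RE.findall(text)
    bigrams = [a + " " + b for a, b in zip(tokens, tokens[1:])] if use_bigrams else []
    return tokens + bigrams


class TinyNaiveBayes:
    """Multinomial Naive Bayes over feature lists, with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.counts = defaultdict(Counter)
        self.doc_counts = Counter(labels)
        for feats, label in zip(docs, labels):
            self.counts[label].update(feats)
        self.vocab = {f for c in self.counts.values() for f in c}
        return self

    def predict(self, feats):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            denom = sum(self.counts[label].values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)
            for f in feats:
                if f in self.vocab:  # features unseen in training are ignored
                    score += math.log((self.counts[label][f] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

For user-level prediction, the feature lists of all of a user's posts in the time window can be concatenated into one document before calling `predict`; the smoothing value `alpha` is the kind of hyper-parameter a grid search would sweep.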

Pairing Negative Samples
To form a challenging prediction task, we compile negative samples for classification by selecting control users from the same candidate pool that the target hospitalization group was selected from. The control users are those who do not have associated hospitalization time spans, but do have similar media blackout periods (described in subsection 3.3).
We group spans by the number of posts before the span in a prescribed time window of length d days. For each positive span we randomly sample a span from the negative span pool that has a similar number of posts, creating a balanced classification task. Note that we expect this task to be difficult because the control users either self-identified with mental health conditions or posted in the SW subreddit. For post-level classification, we use the same set of posts sampled on the user level. Table 2 shows mean F1 scores from cross-validation on both user-level and post-level tasks.
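A minimal Python sketch of the pairing procedure is given below. The `tolerance` parameter (how far apart two post counts may be and still count as "similar"), sampling without replacement, and the function name are assumptions introduced for illustration.

```python
import random
from collections import defaultdict


def pair_negatives(positive_counts, negative_counts, tolerance=0, seed=0):
    """Pair each positive span with a random negative span of similar size.

    Both arguments map a span id to the number of posts in the d-day window
    before that span. Returns a list of (positive_id, negative_id) pairs.
    """
    rng = random.Random(seed)
    by_count = defaultdict(list)
    for neg_id, n_posts in negative_counts.items():
        by_count[n_posts].append(neg_id)

    used = set()  # each negative span is paired at most once (assumption)
    pairs = []
    for pos_id, n_posts in positive_counts.items():
        pool = [neg_id
                for k in range(n_posts - tolerance, n_posts + tolerance + 1)
                for neg_id in by_count.get(k, [])
                if neg_id not in used]
        if not pool:
            continue  # no comparable control span; the positive span is dropped
        choice = rng.choice(pool)
        used.add(choice)
        pairs.append((pos_id, choice))
    return pairs
```

Matching on post count keeps the amount of text per sample comparable across classes, so a classifier cannot separate the groups by volume alone.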

Classification
In all experiments, we set the span selection probability threshold t = 0.1. For the user-level and post-level performance comparison, we set the number of days included to d = 21. The best performance of 0.698 F1 is obtained using bigrams for the user-level task. In general, user-level classification results in better F1 scores, indicating that more context is likely crucial to success in psychiatric hospitalization prediction. N-gram features outperform LIWC features for both tasks, and adding bigram features performs better than unigrams alone. Overall, the model performance with a small amount of data is promising, well above a 50% random baseline.

Performance Over Time
We again run experiments for user-level classification with another, more strictly paired control group that satisfies the pairing constraints mentioned in subsection 4.1 for d ∈ {1, 7, 14, 21}. Table 3 shows the performance change as the window length increases. The results suggest that using a wider context is useful in predicting hospitalization blackouts, and the best performance was obtained using unigrams extracted from 7 days of posts. Figure 3 shows the list of most predictive words for the unigram model. We see that many words correspond to time duration (e.g. "week", "month"), medical professions (e.g., "med", "doctor", "hospital") and conversation (e.g., "sorry", "thanks"). We hypothesize that these may correspond to users' frequent online posts seeking advice and describing conditions. Indeed, we observe some posts conforming to this pattern through manual examination.

Table 3: F1 performance with different features on different window lengths.
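The window selection underlying these experiments can be sketched in a few lines of Python. The function name and the choice to include the post that marks the start of the blackout (a closed interval on the right) are assumptions for illustration.

```python
from datetime import datetime, timedelta


def posts_in_window(posts, blackout_start, d):
    """Return texts of posts made in the d days up to and including blackout_start.

    posts is a list of (timestamp, text) pairs.
    """
    window_start = blackout_start - timedelta(days=d)
    return [text for ts, text in sorted(posts) if window_start <= ts <= blackout_start]


def windows_by_length(posts, blackout_start, lengths=(1, 7, 14, 21)):
    """Build one user-level sample per window length d."""
    return {d: posts_in_window(posts, blackout_start, d) for d in lengths}
```

Each wider window is a superset of the narrower ones, so differences in performance across d reflect added context, not a different sample of posts.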

Conclusion and Future Work
We present a novel social media data collection method for identifying hospitalization time spans and design a novel classification task for predicting psychiatric hospitalizations. We experiment with multiple linguistic feature sets and task formulations, including user-level and post-level classification, as well as varying the time window of posts used. Our results suggest that this is a useful framework for collecting data related to psychiatric hospitalization, and that social media data can be leveraged to predict psychiatric crises before they occur. In our ongoing and future work, we plan to conduct further analysis of the language of pre-hospitalization posts to gain insights about linguistic patterns and changes that occur as the user experiences a psychiatric crisis. We also plan to improve the data collection process to achieve better precision and to expand to a larger scale. We hope that an improved understanding of the linguistic cues that precede psychiatric hospitalizations, as well as improvements in automatic prediction of hospitalizations, will enable interventions that can potentially save lives and improve outcomes for individuals with mental illness.