Assessing the Efficacy of Clinical Sentiment Analysis and Topic Extraction in Psychiatric Readmission Risk Prediction

Knowing which patients are likely to be readmitted to a hospital within 30 days of discharge is valuable for clinical decision-making. Building a successful readmission risk classifier based on the content of Electronic Health Records (EHRs) has proved, however, to be a challenging task. Previously explored features consist mainly of structured information, such as sociodemographic data, comorbidity codes and physiological variables. In this paper we assess whether incorporating additional clinically interpretable NLP-based features, such as topic extraction and clinical sentiment analysis, helps predict early readmission risk in psychiatry patients.


Introduction and Related Work
Psychotic disorders affect approximately 2.5-4% of the population (Perälä et al., 2007) (Bogren et al., 2009). They are one of the leading causes of disability worldwide (Vos et al., 2015) and are a frequent cause of inpatient readmission after discharge (Wiersma et al., 1998). Readmissions are disruptive for patients and families, and are a key driver of rising healthcare costs (Mangalore and Knapp, 2007) (Wu et al., 2005). Assessing readmission risk is therefore critically needed, as it can help inform the selection of treatment interventions and implement preventive measures.
Predicting hospital readmission risk is, however, a complex endeavour across all medical fields. Prior work in readmission risk prediction has used structured data (such as medical comorbidity, prior hospitalizations, sociodemographic factors, functional status, physiological variables, etc.) extracted from patients' charts (Kansagara et al., 2011). NLP-based prediction models that draw on unstructured data from EHRs have also been developed with some success in other medical fields (Murff et al., 2011). In Psychiatry, due to the unique characteristics of medical record content (highly varied and context-sensitive vocabulary, abundance of multiword expressions, etc.), NLP-based approaches have seldom been applied (Vigod et al., 2015; Tulloch et al., 2016; Greenwald et al., 2017), and strategies to study readmission risk factors rely primarily on clinical observation and manual review (Olfson et al., 1999) (Lorine et al., 2015), which is effort-intensive and does not scale well.
In this paper we aim to assess the suitability of using NLP-based features like clinical sentiment analysis and topic extraction to predict 30-day readmission risk in psychiatry patients. We begin by describing the EHR corpus that was created using in-house data to train and evaluate our models. We then present the NLP pipeline for feature extraction that was used to parse the EHRs in our corpus. Finally, we compare the performance of our model when using only structured clinical variables and when incorporating features derived from free-text narratives.

Data
The corpus consists of a collection of 2,346 clinical notes (admission notes, progress notes, and discharge summaries), which amounts to 2,372,323 tokens in total (an average of 1,011 tokens per note). All the notes were written in English and extracted from the EHRs of 183 psychosis patients from McLean Psychiatric Hospital in Belmont, MA, all of whom had in their history at least one instance of 30-day readmission.
The age of the patients ranged from 20 to 67 (mean = 26.65, standard deviation = 8.73). 51% of the patients were male. The number of admissions per patient ranged from 2 to 21 (mean = 4, standard deviation = 2.85). Each admission contained on average 4.25 notes and 4,298 tokens. In total, the corpus contains 552 admissions, and 280 of those (50%) resulted in early readmissions.

Feature Extraction
The readmission risk prediction task was performed at the admission level. An admission consists of all the clinical notes for a given patient written by medical personnel between inpatient admission and discharge. Every admission was labeled as either 'readmitted' (i.e. the patient was readmitted within 30 days of discharge) or 'not readmitted'. The classification task therefore consists of creating a single feature representation of all the clinical notes belonging to one admission, plus the past medical history and demographic information of the patient, and establishing whether that admission will be followed by a 30-day readmission or not.
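As a minimal sketch of the labeling rule described above, the following Python snippet marks an admission as 'readmitted' when the same patient's next admission begins within 30 days of discharge. The `Admission` record, its field names, and the example dates are hypothetical and are not taken from the corpus.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class Admission:
    # Hypothetical record; field names are illustrative assumptions.
    patient_id: str
    admitted: date
    discharged: date


def label_admissions(admissions: List[Admission], window_days: int = 30) -> List[bool]:
    """Return one 'readmitted' label per admission, in input order."""
    labels = [False] * len(admissions)
    # Group admission indices by patient.
    by_patient: Dict[str, List[int]] = {}
    for idx, adm in enumerate(admissions):
        by_patient.setdefault(adm.patient_id, []).append(idx)
    for indices in by_patient.values():
        # Compare each admission with the patient's next one, chronologically.
        indices.sort(key=lambda i: admissions[i].admitted)
        for cur, nxt in zip(indices, indices[1:]):
            gap = (admissions[nxt].admitted - admissions[cur].discharged).days
            labels[cur] = 0 <= gap <= window_days
    return labels


example = [
    Admission("p1", date(2015, 1, 1), date(2015, 1, 10)),
    Admission("p1", date(2015, 1, 25), date(2015, 2, 3)),
    Admission("p1", date(2015, 6, 1), date(2015, 6, 8)),
]
example_labels = label_admissions(example)
```

Under these assumptions, only the first example admission is labeled as readmitted, since the second admission starts 15 days after the first discharge.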
45 clinically interpretable features per admission were extracted as inputs to the readmission risk classifier. These features can be grouped into three categories (see Table 1 for the complete list of features):
-Sociodemographics: gender, age, marital status, etc.
-Past medical history: number of previous admissions, history of suicidality, average length of stay (up until that admission), etc.
-Information from the current admission: length of stay (LOS), suicidal risk, number and length of notes, time of discharge, evaluation scores, etc.
The Current Admission feature group is the largest, with 29 features in this group alone. These features can be further stratified into two groups: 'structured' clinical features and 'unstructured' clinical features.

Structured Features
Structured features are features that were identified in the EHR using regular expression matching. They include rating scores that have been reported in the psychiatric literature as correlated with increased readmission risk, such as Global Assessment of Functioning, Insight and Compliance:

Global Assessment of Functioning (GAF): The psychosocial functioning of the patient ranging from 100 (extremely high functioning) to 1 (severely impaired) (AAS, 2011).
Insight: The degree to which the patient recognizes and accepts his/her illness (either Good, Fair or Poor).
Compliance: The ability of the patient to comply with medication and to follow medical advice (either Yes, Partial, or None).
These features are widely used in clinical practice and capture the general state and prognosis of the patient at the time of evaluation.
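The regular expression matching mentioned above could look roughly like the following sketch. The note layout it assumes (for example "GAF: 45" or "Insight: Fair") and the patterns themselves are hypothetical illustrations, not the expressions used on the corpus.

```python
import re
from typing import Dict, Optional, Union

# Hypothetical patterns: the note layout assumed here is an illustration only.
PATTERNS = {
    "gaf": re.compile(r"\bGAF\s*(?:score)?\s*[:=]?\s*(\d{1,3})\b", re.IGNORECASE),
    "insight": re.compile(r"\bInsight\s*[:\-]\s*(Good|Fair|Poor)\b", re.IGNORECASE),
    "compliance": re.compile(r"\bCompliance\s*[:\-]\s*(Yes|Partial|None)\b", re.IGNORECASE),
}


def extract_structured(note: str) -> Dict[str, Optional[Union[int, str]]]:
    """Extract GAF, Insight and Compliance from one note; None when absent."""
    features: Dict[str, Optional[Union[int, str]]] = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(note)
        # Normalize capitalization so "fair" and "Fair" map to the same value.
        features[name] = match.group(1).capitalize() if match else None
    if features["gaf"] is not None:
        value = int(features["gaf"])
        # GAF is defined on a 1-100 scale; discard out-of-range matches.
        features["gaf"] = value if 1 <= value <= 100 else None
    return features


sample_note = "Insight: fair. Compliance: Partial. GAF: 45 at discharge."
sample_features = extract_structured(sample_note)
```

Real clinical notes vary far more than this sample, so a production pattern set would need to cover many more phrasings.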

Unstructured Features
Unstructured features aim to capture the state of the patient in relation to seven risk factor domains (Appearance, Thought Process, Thought Content, Interpersonal, Substance Use, Occupation, and Mood) from the free-text narratives on the EHR. These seven domains have been identified as associated with readmission risk in prior work (Holderness et al., 2018).
These unstructured features include: 1) the relative number of sentences in the admission notes that involve each risk factor domain (out of the total number of sentences within the admission) and 2) clinical sentiment scores for each of these risk factor domains, i.e. sentiment scores that evaluate the patient's psychosocial functioning level (positive, negative, or neutral) with respect to each of these risk factor domains.
These sentiment scores were automatically obtained through the topic extraction and sentiment analysis pipeline introduced in our prior work (Holderness et al., 2019) and pretrained on in-house psychiatric EHR text. In that work we reported an overall performance of 0.828 F1 for the topic extraction component and 0.5 F1 for the clinical sentiment component.
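The first feature type, the relative number of sentences per risk factor domain, can be sketched as follows. The function name and input representation (one set of domain labels per sentence) are assumptions made for illustration; a sentence may carry several domains or none.

```python
from collections import Counter
from typing import Dict, List, Set

DOMAINS = [
    "Appearance", "Thought Process", "Thought Content", "Interpersonal",
    "Substance Use", "Occupation", "Mood",
]


def domain_proportions(sentence_domains: List[Set[str]]) -> Dict[str, float]:
    """Fraction of an admission's sentences that involve each domain."""
    total = len(sentence_domains)
    counts = Counter(d for domains in sentence_domains for d in domains)
    # Divide by the total number of sentences, including unlabeled ones.
    return {d: (counts[d] / total if total else 0.0) for d in DOMAINS}


# Hypothetical admission with four sentences.
example_sentences = [{"Mood"}, {"Mood", "Substance Use"}, set(), {"Occupation"}]
proportions = domain_proportions(example_sentences)
```

Because a sentence can belong to more than one domain, the seven proportions need not sum to 1.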
The clinical sentiment scores are computed for every note in the admission. Figure 1 details the data analysis pipeline that is employed for the feature extraction.
First, a multilayer perceptron (MLP) classifier is trained on EHR sentences (8,000,000 sentences consisting of 340,000,000 tokens) that are extracted from the Research Patient Data Registry (RPDR), a centralized regional data repository of clinical data from all institutions in the Partners HealthCare network. These sentences are automatically identified and labeled for their respective risk factor domain(s) by using a lexicon of clinician-identified domain-related keywords and multiword expressions, and thus require no manual annotation. The sentences are vectorized using the Universal Sentence Encoder (USE), a transformer attention network pretrained on a large volume of general-domain web data and optimized for greater-than-word length sequences.
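The lexicon-based labeling step that produces training data without manual annotation can be illustrated with a small sketch. The three-domain mini-lexicon below is invented for the example; the actual clinician-identified lexicon is not reproduced here.

```python
import re
from typing import Dict, List, Pattern, Set

# Hypothetical mini-lexicon; the real lexicon covers all seven domains.
LEXICON: Dict[str, List[str]] = {
    "Mood": ["depressed", "anxious", "euthymic"],
    "Substance Use": ["alcohol", "cannabis", "substance abuse"],
    "Occupation": ["employed", "job", "lost his job"],
}


def compile_lexicon(lexicon: Dict[str, List[str]]) -> Dict[str, Pattern[str]]:
    """Build one case-insensitive whole-word pattern per domain."""
    compiled = {}
    for domain, terms in lexicon.items():
        # Longer terms first so multiword expressions are tried before substrings.
        ordered = sorted(terms, key=len, reverse=True)
        alternation = "|".join(re.escape(term) for term in ordered)
        compiled[domain] = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return compiled


def label_sentence(sentence: str, compiled: Dict[str, Pattern[str]]) -> Set[str]:
    """Return every domain whose lexicon matches the sentence."""
    return {domain for domain, pattern in compiled.items() if pattern.search(sentence)}


compiled_lexicon = compile_lexicon(LEXICON)
labels_a = label_sentence("Patient reports feeling depressed since he lost his job.", compiled_lexicon)
labels_b = label_sentence("Slept well overnight.", compiled_lexicon)
```

Sentences labeled this way could then be vectorized and used to train the topic extraction classifier, while sentences with no match receive no domain label.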
Sentences that are marked for one or more of the seven risk factor domains are then passed to a suite of seven clinical sentiment MLP classifiers (one for each risk factor domain) that are trained on a corpus of 3,500 EHR sentences (63,127 tokens) labeled by a team of three clinicians involved in this project. To prevent overfitting to this small amount of training data, the models are designed to be more generalizable through the use of two hidden layers and a dropout rate (Srivastava et al., 2014) of 0.75.
The outputs of each clinical sentiment model are then averaged across notes to create a single value for each risk factor domain that corresponds to the patient's level of functioning on a -1 to 1 scale (see Figure 2).
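A minimal sketch of this aggregation step follows. It assumes that each note already has one sentiment score per domain in the range -1 to 1, and that domains never mentioned in an admission are reported as `None`; both choices are assumptions of this example rather than details stated above.

```python
from statistics import mean
from typing import Dict, List, Optional


def admission_sentiment(
    note_scores: List[Dict[str, float]], domains: List[str]
) -> Dict[str, Optional[float]]:
    """Average per-note domain scores into one value per domain."""
    result: Dict[str, Optional[float]] = {}
    for domain in domains:
        # Only notes that mention the domain contribute to its average.
        values = [scores[domain] for scores in note_scores if domain in scores]
        result[domain] = mean(values) if values else None
    return result


# Hypothetical scores for an admission with three notes.
notes = [
    {"Mood": -1.0, "Occupation": 0.5},
    {"Mood": 0.0},
    {"Mood": -0.5, "Occupation": -0.5},
]
admission_scores = admission_sentiment(notes, ["Mood", "Occupation", "Appearance"])
```

Since every input lies between -1 and 1, each averaged value stays on the same scale.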

Experiments and Results
We tested six different classification models: Stochastic Gradient Descent, Logistic Regression, C-Support Vector, Decision Tree, Random Forest, and MLP. All of them were implemented and fine-tuned using the scikit-learn machine learning toolkit (Pedregosa et al., 2011). Because an accurate readmission risk prediction model is designed to be used to inform treatment decisions, it is important to adopt a model architecture that is clinically interpretable and allows for an analysis of the specific contribution of each feature in the input. As such, we include a Random Forest classifier, which we also found to have the best performance out of the six models.
To systematically evaluate the importance of the clinical sentiment values extracted from the free text in EHRs, we first build a baseline model using the structured features, which are similar to those used in prior studies on readmission risk prediction (Kansagara et al., 2011). We then compare two models incorporating the unstructured features. In the "Baseline+Domain Sentences" model, we consider whether adding the counts of sentences per EHR that involve each of the seven risk factor domains, as identified by our topic extraction model, improves the model performance. In the "Baseline+Clinical Sentiment" model, we evaluate whether adding clinical sentiment scores for each risk factor domain improves the model performance. We also experimented with combining both sets of features and found no additional improvement.
Each model configuration was trained and evaluated 100 times and the features with the highest importance for each iteration were recorded. To further fine-tune our models, we also perform three-fold cross-validated recursive feature elimination 30 times on each of the three configurations and report the performances of the models with the best performing feature sets. These can be found in Table 2.
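Three-fold cross-validated recursive feature elimination with a Random Forest can be sketched with scikit-learn's `RFECV`. The data below are synthetic stand-ins with the same shape as the admission-level feature matrix (552 admissions, 45 features); the hyperparameters, the elimination step size, and the F1 scoring choice are assumptions of this example, not the settings used in the experiments.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

# Synthetic stand-in for the real feature matrix and readmission labels.
X, y = make_classification(
    n_samples=552, n_features=45, n_informative=10, random_state=0
)

# Assumed hyperparameters; a small forest keeps the sketch fast.
forest = RandomForestClassifier(n_estimators=50, random_state=0)

# Drop 5 features per round and score each subset with 3-fold cross-validation.
selector = RFECV(estimator=forest, step=5, cv=3, scoring="f1")
selector.fit(X, y)

selected_mask = selector.support_   # boolean mask over the 45 input features
n_selected = selector.n_features_   # size of the best-scoring feature subset
# Importances of the retained features, from the forest refit on that subset.
importances = selector.estimator_.feature_importances_
```

Repeating such a run with different random seeds and recording which features survive gives a rough view of feature stability, in the spirit of the repeated runs described above.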
Our baseline results show that the model trained using only the structured features performs comparably to previously reported models for readmission risk prediction across all healthcare fields (Artetxe et al., 2018). The two models that were trained using unstructured features both outperform the baseline. The "Baseline+Clinical Sentiment" model produced the best results, resulting in an F1 of

Conclusions
We have introduced and assessed the efficacy of adding NLP-based features like topic extraction and clinical sentiment features to traditional structured-feature based classification models for early readmission prediction in psychiatry patients. The approach we have introduced is a hybrid machine learning approach that combines deep learning techniques with linear methods to ensure clinical interpretability of the prediction model.
Results show not only that both the number of sentences per risk domain and the clinical sentiment analysis scores outperform the structured-feature baseline and contribute significantly to better classification results, but also that the clinical sentiment features produce the highest results in all evaluation metrics (F1 = 0.72).
These results suggest that clinical sentiment features for each of seven risk domains extracted from free-text narratives further enhance early readmission prediction. In addition, combining state-of-the-art MLP methods has potential utility in generating clinically meaningful features that can be used in downstream linear models with interpretable and transparent results. In future work, we intend to increase the size of the EHR corpus, increase the demographic spread of patients, and extract new features based on clinical expertise to improve model performance. Additionally, we intend to continue the clinical sentiment annotation project from Holderness et al. (2019) to increase the accuracy of that portion of our NLP pipeline.