A Corpus Analysis of Social Connections and Social Isolation in Adolescents Suffering from Depressive Disorders

Social connection and social isolation are associated with depressive symptoms, particularly in adolescents and young adults, but how these concepts are documented in clinical notes is unknown. This pilot study aimed to identify the topics relevant to social connection and isolation by analyzing 145 clinical notes from patients with a depression diagnosis. We found that providers, including physicians, nurses, social workers, and psychologists, document descriptions of both social connection and social isolation.


Introduction
Social connection and social isolation are associated with health problems, including mental health issues (Matthews et al., 2015; Williams & Galliher, 2006). The Institute of Medicine (IOM) recommends that healthcare providers collect social relationship information from individuals using the NHANES III Social Connection and Isolation Questions (IOM, 2015). For example, this survey inquires about how many times per week an individual speaks on the telephone with family, friends, or neighbors, gets together with friends or others, attends church or religious services, or attends meetings of clubs or organizations. While these questions capture the quantity of social interactions, the survey does not assess the quality of social relationships and interactions. The electronic health record (EHR) can be a rich source of clinical information. However, it is not clear whether the EHR contains adequate documentation to support a detailed assessment of social connection and social isolation. In this study, our goal is to understand how social connection and social isolation are documented in the clinical notes of patients diagnosed with major depressive disorder: (1) which providers most frequently document social connection and social isolation information? (2) what types of clinical notes most likely contain descriptions of social connection and social isolation? and (3) what types of social connection and social isolation are documented in clinical notes?

Method
In this Institutional Review Board (IRB)-approved pilot study, we selected a cohort of adolescent patients aged 12-25 (mean=17.14, standard deviation=3.61) admitted to a major healthcare system between 2013 and 2016 with at least one visit coded with International Classification of Diseases, version 9 (ICD-9) billing codes for major depressive disorder, resulting in 181,880 inpatient clinical notes. From this set, we originally planned to randomly sample 100 notes based on the distribution of notes generated by provider type: social worker (33.5%), therapist (28.8%), physician (22.3%), psychologist (12.0%), intern (1.8%), pharmacist (0.5%), dietitian (0.4%), nurse (0.1%), and other providers (0.6%). This sampling yielded only one note for each of five provider types (i.e., intern, pharmacist, dietitian, nurse, and other providers). We therefore randomly selected an additional 9 notes for each of those five provider types to supplement the sample. In total, 145 notes were used in this pilot study.
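The sampling step described above can be sketched as a stratified random draw by provider type. This is a minimal illustration, not the procedure actually used: the function name `stratified_sample`, the synthetic note identifiers, and the per-type quotas are all hypothetical, since the study does not report its per-type counts for the larger strata. The `1 + 9` quota reflects one reading of the text, in which a sparse provider type receives one proportional note plus nine supplementary notes.

```python
import random
from collections import defaultdict

def stratified_sample(notes, quotas, seed=0):
    """Draw quotas[provider] notes at random from each provider-type stratum.

    notes  : list of (note_id, provider_type) pairs
    quotas : dict mapping provider_type -> number of notes to draw
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_provider = defaultdict(list)
    for note_id, provider in notes:
        by_provider[provider].append(note_id)

    sample = {}
    for provider, k in quotas.items():
        pool = by_provider[provider]
        # Never request more notes than the stratum actually holds.
        sample[provider] = rng.sample(pool, min(k, len(pool)))
    return sample

# Hypothetical corpus: 50 synthetic notes for each of three provider types.
providers = ["social worker", "physician", "nurse"]
notes = [(f"{p}-{i}", p) for p in providers for i in range(50)]

# Hypothetical quotas: proportional counts for the large strata, and a
# sparse stratum topped up from 1 note to 10 (assumed reading of the text).
quotas = {"social worker": 34, "physician": 22, "nurse": 1 + 9}

sample = stratified_sample(notes, quotas)
print({p: len(ids) for p, ids in sample.items()})
```

Supplementing the sparse strata trades strict proportionality for enough notes per provider type to observe how each type documents social information.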
Social isolation (SI) is a lack of contact and engagement between oneself and society (Cacioppo & Cacioppo, 2014; Nicholson, 2012; Zavaleta, Samuel, & Mills, 2014). There are two types of social isolation: objective isolation, such as the absence or a limited number of meaningful social interactions; and subjective isolation, where an individual reports feeling socially isolated or lonely (Cacioppo & Cacioppo, 2014; Zavaleta et al., 2014). Social isolation has been associated with depression (Matthews et al., 2015; Tiwari & Ruhela, 2012).

Corpus Annotation
We developed an initial codebook based on the NHANES III questions for social connection (SC). Two annotators, both registered nurses with clinical experience that included caring for patients with depression, reviewed ten notes and developed a coding schema. Then, each annotator individually coded another ten notes, and inter-annotator agreement (IAA) was calculated. The IAA was high for both SC (observed agreement: 0.974; Cohen's kappa: 0.80) and SI (observed agreement: 0.997; Cohen's kappa: 0.90); hence, both annotators continued independently annotating mentions of SC and SI in the remaining 135 notes. The annotators reviewed the annotation results together and discussed any discrepancies. To be explicit about the codes for SC and SI, we used the concepts from NHANES III as subtypes of SC. We identified subtypes of SI from review of the clinical notes and from the literature. Qualitative data analysis software, NVivo (version 11), was used for this corpus analysis.
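The two agreement statistics reported above can be computed as in the following sketch. It uses the standard definition of Cohen's kappa, (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each annotator's label frequencies. The labels below are toy data invented for illustration, not the study's annotations, and the unit of comparison (here, one binary label per text span) is an assumption.

```python
from collections import Counter

def observed_agreement(a, b):
    """Fraction of items on which the two annotators chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = observed_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability both annotators pick the same label
    # if each labels independently at their own marginal rates.
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1:
        # Both annotators used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Toy data: 20 spans, 1 = mention present, 0 = no mention.
ann_a = [1] * 4 + [0] * 16
ann_b = [1] * 3 + [0] * 17

print(f"observed agreement: {observed_agreement(ann_a, ann_b):.3f}")
print(f"Cohen's kappa:      {cohens_kappa(ann_a, ann_b):.3f}")
```

Because kappa discounts agreement expected by chance, it can fall well below the observed agreement when one label dominates, which is the general pattern in the figures reported above.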

Corpus Analysis
For the corpus analysis, we (1) described the subtypes of SC and SI, (2) determined the distribution of SC and SI mentions by provider type, and (3) determined the distribution of subtypes of SC and SI across clinical notes.

Subtypes of SC and SI
We report the distribution of SC and SI subtypes based on the number of clinical notes and the number of mentions (Table 1). Seven SC subtypes were observed: family or relatives, school activity, friend, marital status, social-cultural, spiritual activities, and employment. They included activities or experiences with others: engaging in spiritual, academic, cultural, or work activities and committing to a personal relationship status. For example, "patient stated her parents and family are her biggest support." Eight SI subtypes were observed: restriction from contact with others, being asked to leave others or groups, distancing self from desired relationships, isolation, not being understood, lack of meaningful social institutions, feeling loneliness, and lack of meaningful social relationship. These subtypes can be divided into objective isolation (i.e., restriction from contact with others, being asked to leave others or groups, distancing self from desired relationships, and isolation) and subjective isolation (i.e., not being understood, lack of meaningful social institutions, feeling loneliness, and lack of meaningful social relationship) based on whether the mention related to self-expression. For example, "patient mentioned that Mom doesn't understand symptoms of depression and thinks I am lazy." The most frequent subtypes of SC were family or relatives, school activity, and friend. The most frequent subtypes of SI were restriction from contact with others, being asked to leave others or groups, distancing self from desired relationships, and isolation (Table 1).

Provider Types
In Table 2, we report the distribution of notes containing one or more mentions of SC and SI by provider type. Notes written by physicians and social workers contained the highest frequencies of SC and SI mentions.

Clinical Note Types
Among the 145 notes, more than 20 different note types were observed. The most common were behavioral health group notes (n=41, 28.3%), notes of unspecified type due to a missing note title (n=28, 19.3%), psychiatric attending daily progress notes (n=17, 11.7%), psychology progress notes (n=12, 8.3%), nutrition reassessments (n=9, 6.2%), and progress notes (n=9, 6.2%). Some note types are written by multiple provider types. For example, social workers, psychologists, or therapists can document behavioral health group notes. Similarly, a provider can author multiple note types. For example, a social worker can document behavioral health group notes, behavioral health social work notes, or discharge notes. The detailed distribution of each SC or SI subtype by note type is presented in Table 3.
Table 3. Distribution of note type by SC and SI subtypes.
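As a quick arithmetic check, the note-type percentages reported above follow from dividing each count by the 145 notes in the sample. The sketch below uses only the six counts stated in the text; the dictionary keys and the helper name `percent` are illustrative labels rather than anything defined by the study.

```python
TOTAL_NOTES = 145

# Counts as reported in the text for the six most common note types.
note_type_counts = {
    "behavioral health group note": 41,
    "unspecified (no note title)": 28,
    "psychiatric attending daily progress note": 17,
    "psychology progress note": 12,
    "nutrition reassessment": 9,
    "progress note": 9,
}

def percent(n, total=TOTAL_NOTES):
    """Share of the sample as a percentage, rounded to one decimal place."""
    return round(100 * n / total, 1)

for note_type, n in note_type_counts.items():
    print(f"{note_type}: n={n}, {percent(n)}%")

# Notes accounted for by the six listed types; the rest fall under
# the remaining, less frequent note types.
listed = sum(note_type_counts.values())
print(f"listed types: {listed} of {TOTAL_NOTES} notes ({percent(listed)}%)")
```

By this count, the six listed note types cover 116 of the 145 notes, leaving 29 notes spread across the remaining note types.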

Discussion
We conducted a corpus analysis to characterize the documentation of SC and SI mentions in clinical notes from patients diagnosed with depression. About a third of notes contain only mentions of SC, in contrast to 9% of notes with only mentions of SI. There are two possible explanations. First, the subtypes of SC were named or grouped by social entity because SC carries the meaning of belonging to certain social groups (Haslam et al., 2015). Such entities may be easier to identify in the notes, whereas the subtypes of SI describe specific situations, which may require more interpretation or judgment from the annotators. Second, we did not double-annotate mentions, yet some SC mentions with negative meanings could also be interpreted as SI. For example, the SC mention "Patient's mom states that my son has had difficulty his entire life making friends" implies that the mother has been paying attention to her son's friendships (a form of SC); however, this mention could also be annotated with the SI subtype lack of meaningful social relationship. Therefore, we plan to update the annotation protocol to allow double annotation of a mention when needed.
Mentions of SC often include interactions and relationships between the patient and other individuals, the most frequent of which describe family or relatives. In this context, most SC mentions describe receiving support from a parent or sibling or, in some cases, missing loved ones who live at a distance. School activity is among the most frequently annotated subtypes; this could be because most of the patients were of school age. School activity mentions include attending school (middle school through college) and living away from home (dorms).
Friends are also often reported as a source of connection, including descriptions of spending leisure time with a close other or having a roommate at home.
However, patients also report SC difficulties such as making friends, desiring a relationship, experiencing jealousy when not receiving attention from others, and ending close relationships. Marital status mentions consistently reported single status, which is not surprising given the age of our study population. Spiritual activity was not always a source of connection. For example, one patient reports not identifying with family religious values. More informative descriptions of SC include loving to learn at school and reporting high grades in classes, but also include patients' accounts of dealing with school stresses (bullying) and being expelled from school. Patients also report social-cultural factors as a reason for a lack of connection, including descriptions of ethnicity, language barriers, and moving cities.
Although less frequent than SC mentions, SI mentions were also observed. Common themes include general struggles with isolation as well as particular types of isolation: verbal isolation, e.g., being asked to leave others or groups ("getting kicked out of the house or dorms"); physical isolation, e.g., restriction from contact with others (avoiding others, being placed in time out, or having phone privileges revoked); and distancing self from desired relationships (voluntarily removing oneself from the group, asking others to leave, or refusing to talk with others). Patients report a lack of meaningful social group, e.g., being unable to find meaningful work, and a lack of meaningful social relationships, e.g., difficulty establishing relationships outside of the family. Implications and reasons for SI include feeling loneliness and not feeling understood by family, e.g., family "doesn't understand their illness or listen to them".
The notes containing the highest frequencies of SC and SI mentions were written by physicians, social workers, psychologists, and nurses. This suggests that future efforts could focus on notes from these specific provider types.

Limitations and Future Work
This pilot work has limitations. We annotated only 145 clinical notes, and new information about SC and SI may emerge with continued annotation of a larger sample. Therefore, we plan to continue the annotation work until no new information is identified. The patients were adolescents or young adults with depression; therefore, the findings may not generalize to other patient populations or clinical problems.

Conclusion
This is the first study to explore SC and SI in EHR clinical notes and is a precursor to more computational work on extracting SC and SI information from the notes. We found that SC and SI information is documented in the notes and can be reliably identified through human review, suggesting the content may be amenable to more automated methods (natural language processing). We are actively developing a linguistic model to support SC and SI information extraction and qualification of the relationship of SC and SI information as this relates to a patient's mental health status and outcomes of depression treatment.