Social media data as a lens onto care-seeking behavior among women veterans of the US armed forces

In this article, we examine social media data as a lens onto support-seeking among women veterans of the US armed forces. Social media data hold a great deal of promise as a source of information on needs and support-seeking among individuals who are excluded from or systematically prevented from accessing clinical or other institutions ostensibly designed to support them. We apply natural language processing (NLP) techniques to more than 3 million Tweets collected from 20,000 Twitter users. We find evidence that women veterans are more likely to use social media to seek social and community engagement and to discuss mental health and veterans’ issues significantly more frequently than their male counterparts. By contrast, male veterans tend to use social media to amplify political ideologies or to engage in partisan debate. Our results have implications for how organizations can provide outreach and services to this uniquely vulnerable population, and illustrate the utility of non-traditional observational data sources such as social media to understand the needs of marginalized groups.


Introduction
Women comprise a small but rapidly growing portion of the US military veteran population. Indeed, women are one of the fastest-growing demographics in this population (Danan et al., 2017), and saw a two-fold increase between 1988 and 2008 (Manning, 2008). This trend is reflected in the use of Department of Veterans Affairs (VA) services: since 2000, the number of women seeking care at the VA has increased from around 150,000 to half a million. The rapid increase in the number of women veterans, coupled with increases in the rates at which these women use veteran healthcare resources, represents a profoundly significant development for veteran service organizations (VSOs) and veteran healthcare providers. While women veterans face many of the same challenges as their male counterparts during and after deployment, there are also notable differences in the types and rates of specific challenges faced by these two groups. Perhaps the most striking example concerns military sexual trauma (MST). A recent study found that, among veterans diagnosed with PTSD, 31% of women reported MST, compared to just 1% of men (Maguen et al., 2010). The same study found that MST was, in turn, highly comorbid with depression, anxiety, and eating disorders in women veterans. While the incidence of PTSD, particularly related to combat exposure, and of substance use disorder seems to be greater among male veterans, women veterans are overall more likely to experience service-related disabilities than men (Frayne et al., 2007), and are also more likely than men to meet diagnostic criteria for depression and anxiety (Runnals et al., 2014). Finally, according to a recently released report from the VA, the suicide rate among women veterans increased by 6.5% between 2005 and 2017, making women veterans the veteran demographic group with the fastest-growing suicide rate (VA, 2019).
Not only do women veterans present a clinical picture that differs in systematic ways from that of their male counterparts, and therefore place distinctive demands on clinicians and other healthcare providers, but they are also a remarkably demographically diverse population. Relative to the general population, racial, ethnic, and sexual minorities are significantly over-represented among women veterans (Blosnich et al., 2013; Gates, 2010; Koo et al., 2015). Women in the US armed forces also tend to be younger, on average, and are more likely to belong to racial and ethnic minorities than their male counterparts (Maguen et al., 2010).

Barriers to care for women veterans
Are VSOs and the VA up to the challenge posed by such a seismic shift in the composition of the population they are intended to serve? While 30% of women veterans are estimated to interact with VSOs, the literature on this front is somewhat equivocal, with some evidence suggesting that women veterans face specific structural barriers to care. Relative to their male counterparts, women veterans are more likely to shoulder caretaking responsibilities for children and other family members (Mattocks et al., 2012), often forcing them to seek after-hours healthcare services that are frequently unavailable. Familial and domestic responsibilities may exacerbate one of the reasons veterans most commonly cite for not seeking care, namely distance to the nearest VA clinic (Institute of Medicine, 2014). Moreover, a high percentage of women veterans, particularly those ages 18-44, report being unsure of whether they qualify for VA services or believing (in many cases erroneously) that they do not (Mattocks et al., 2012). In addition to such structural barriers to care-seeking, women veterans also sometimes report feeling generally unwelcome or unaccommodated at veteran-oriented care centers such as the VA. For instance, women veterans have reported a lack of gender-sensitivity in health care services provided at the VA, especially non-academically affiliated VA centers with smaller and more male-dominated caseloads (Runnals et al., 2014), and have reported feeling unwelcome at VA centers, which tend to be male-dominated (VA, 2015). Finally, women veterans report sexual harassment by male VA clients with dismaying frequency (Steinhauer, 2019).

Social media and the "clinical whitespace"
The rapid increase in the proportion of young women veterans, coupled with well-documented barriers to care experienced by women veterans, highlights the need for better insight into the day-to-day challenges facing women veterans. Because women veterans report feeling uncomfortable in settings such as the VA, and are using VA services at a lower rate than their male counterparts (House Committee on Veterans Affairs, 2019), there are relatively few clinical encounters through which healthcare providers can gain a deep understanding of the specific issues facing women, the barriers to care as they perceive them, and the compensating behaviors they engage in when care is not available or not perceived to be available. Moreover, much of the literature cited above is based on data from women receiving VA care. It is therefore highly likely that much of the literature on barriers to care-seeking among women veterans systematically under-represents the women who are most likely to avoid encounters with the VA and other VSOs. Such gaps in health data derived from clinical encounters have previously been referred to as the "clinical whitespace" (Coppersmith et al., 2017).
A large body of previous work suggests that this whitespace can be to some degree filled by large-scale analysis of public social media data, such as Twitter. Social media data constitute a particularly rich and ecological data source for understanding a wide variety of physical and behavioral conditions through an epidemiological lens. Previous work has demonstrated the utility of social media and other sources of observational online data in identifying and understanding mental health conditions in a population (Coppersmith et al., 2015, 2018), tracking flu infections (Lamb et al., 2013), and screening for pancreatic cancer (Paparrizos et al., 2016), to name just a few examples. Here, we make the point (not previously made, to our knowledge) that such data sources may prove particularly illuminating and necessary in efforts to understand the health of marginalized groups, whose "clinical whitespaces" may be even greater than those of the general population. In the context of the current discussion, because women veterans are often excluded or marginalized in care settings, implicitly and explicitly, it is likely that emergent, informal social networks such as those that form through social media may prove to be a particularly important source of connection and information for women veterans.
In what follows, we use techniques from NLP and computational psychology to conduct an exploratory analysis into the motivations, beliefs, attitudes, and behaviors of women veterans based on the content of their social media posts. We present a series of analyses that, we believe, strongly suggest that women veterans use social media in systematically different ways from male veterans. Our results suggest that women veterans are more likely than men to use social media to discuss and form community around their experiences as veterans in ways that are constructive and positive. We argue that this finding has implications for how veteran-focused organizations educate, engage, and serve a diverse veteran population.

Data collection
The analyses reported below are based on a sample of approximately 3 million messages from 20,000 unique Twitter users.
Social media posts by US military veterans were collected from the Twitter API, using methods closely modeled on those described in Beller et al. (2014). Beller and colleagues describe techniques for automatically identifying profession and other fine-grained social roles on the basis of self-disclosure. We began by manually constructing a corpus of words judged by subject matter experts (see footnote 1) to indicate military experience (e.g., "veteran", "deployed", "USMC", and "OIF" (Operation Iraqi Freedom), to name a few). We first searched messages and user descriptions for these words, then further refined the search using regular expressions to determine when these words occurred in contexts indicating that the author identified with the role or experience (e.g., "I'm a [term]", "As a [term] I think"). Trained human annotators then manually inspected 10% of messages returned in this way to validate our search algorithm, which was found to have a 95% true positive rate. For each user who passed this filtering step, we collected additional public posts.
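The two-stage filter described above (keyword search, then regular expressions for self-identifying contexts) can be sketched roughly as follows. This is a minimal illustration, not the search algorithm actually used: the term list is a small subset of the examples given in the text, and the two patterns and the function name `self_identifies` are assumptions introduced here.

```python
import re

# Illustrative subset of the expert-curated term list (the full list is not given in the text).
MILITARY_TERMS = ["veteran", "USMC", "OIF"]
TERM_ALT = "|".join(re.escape(term) for term in MILITARY_TERMS)

# Hypothetical self-identification contexts, modeled on "I'm a ..." and "As a ... I think".
SELF_ID_PATTERNS = [
    re.compile(rf"\bI(?:['’]m| am) an? (?:{TERM_ALT})\b", re.IGNORECASE),
    re.compile(rf"\bas an? (?:{TERM_ALT}),? I\b", re.IGNORECASE),
]

def self_identifies(text):
    """Return True if the text uses a military term in a self-identifying context."""
    return any(pattern.search(text) for pattern in SELF_ID_PATTERNS)

examples = [
    "I'm a veteran and proud of it",
    "As a veteran I think we deserve better",
    "My neighbor is a veteran",
]
matches = [text for text in examples if self_identifies(text)]
```

In this sketch, the third example contains a keyword but is not retained, which is the distinction the regular-expression step is meant to capture.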
Following the initial data collection procedure described above, we applied age and gender classifiers to each user's data. These classifiers were previously trained on a separate sample of Twitter users who explicitly indicated their age or year of birth. Of the 20,000 users in our sample, 86% were male and 14% female (our gender classifier also includes an "other" category to capture individuals who do not conform to a gender binary).
Women veterans in the sample had a mean age of 29.7 years (SD=5.5), and men in our sample were closely matched, with a mean age of 29.8 years (SD=5.5). Our sample contains more women than the overall veteran population (9% in 2015, according to recent VA statistics (VA, 2017)), but closely matches the demographics of the currently serving military, in which 16% of enlisted service members and 19% of officers are women (Council on Foreign Relations, 2020). This difference from the overall veteran population was expected, given that social media users skew younger than the general population.

Topic modeling
To begin with, we pursued a bottom-up, data-driven analysis of the data (namely, topic modeling) to assess differences in how men and women veterans use social media. Topic modeling is an unsupervised machine learning technique that discovers latent semantic structure in collections of text by finding coherent "topics". Intuitively, topics can be thought of as bundles of words with high probabilities of co-occurrence. Topic modeling is therefore more sensitive to subtle linguistic patterns than simply counting word frequencies. For instance, simple word frequency analyses might reveal that the word "party" occurs frequently in a discourse. This finding, however, would be open to interpretation, owing to the ambiguity of the word "party" and the lack of context in word frequency (sometimes called unigram frequency). By contrast, topic modeling might reveal that "party" frequently co-occurs with "America", "freedom", and "liberty". These words would be said to form a "topic". Topic modeling assigns a distribution to word-topic combinations, such that for a given word or term we can estimate p(term|topic), the probability of that term under a given topic. To prepare the data for topic modeling, we first removed all retweets, URLs, hashtags, and mentions. Topic modeling using latent Dirichlet allocation (LDA; Blei et al., 2003) was applied to the 2.1 million posts that remained after data cleaning. Each post was then assigned the topic that maximizes the probability of all words in the post. Next, we computed the rate at which each user mentioned each topic by dividing the number of posts about each topic authored by the user by the total number of posts from the user; we will refer to this quotient as mention rate. We then computed an average topic mention rate for men and women by averaging by-user mention rates separately for the two groups.
For ease of exposition, we asked human annotators to assign human-readable labels to the topics generated by our analysis. Creating such topic labels is often as much art as science, and runs the risk of entrenching the biases of the annotators. In this context, we'll note first that we created these labels before conducting the analyses reported below. Second, the goal of this work is ultimately to inform and educate non-technical stakeholders and empower them to use methods such as these to, for example, programmatically identify women veterans on social media who may benefit from the services of particular VSOs. We feel that it's important, in early phases of this work especially, to present these results in a way that makes them accessible to the intended end users.
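The post-level topic assignment and the mention rate computation described above can be illustrated with a short sketch. It assumes that p(term|topic) has already been estimated (for example by an LDA implementation) and is available as a nested dictionary; the function names, the `floor` value for unseen terms, and the toy probabilities are assumptions made for illustration, not details of the original analysis.

```python
import math
from collections import Counter, defaultdict

def assign_topic(tokens, term_given_topic, floor=1e-9):
    """Pick the topic that maximizes the summed log-probability of the post's words."""
    def log_likelihood(topic):
        probs = term_given_topic[topic]
        return sum(math.log(probs.get(token, floor)) for token in tokens)
    return max(term_given_topic, key=log_likelihood)

def mention_rates(posts):
    """posts: iterable of (user, topic). Returns user -> {topic: share of that user's posts}."""
    per_user = defaultdict(Counter)
    for user, topic in posts:
        per_user[user][topic] += 1
    return {
        user: {topic: n / sum(counts.values()) for topic, n in counts.items()}
        for user, counts in per_user.items()
    }

def group_average(rates, users, topics):
    """Average the by-user mention rates over one group of users."""
    return {
        topic: sum(rates[user].get(topic, 0.0) for user in users) / len(users)
        for topic in topics
    }

# Toy p(term|topic) values, invented for illustration only.
toy_model = {
    "politics": {"party": 0.3, "freedom": 0.3, "liberty": 0.3, "cake": 0.1},
    "celebration": {"party": 0.3, "cake": 0.4, "birthday": 0.3},
}
topic = assign_topic(["party", "freedom"], toy_model)
```

Summing log-probabilities rather than multiplying raw probabilities avoids numerical underflow on longer posts; the floor for unseen terms is one simple smoothing choice among several.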
We began by finding which topics were, overall, most frequent in the posts of women veterans and male veterans. We found that the same five topics were most common among men and women, and these are shown in Figure 1. We found that the most common topic among male veterans was "Partisan politics", corresponding to around 3.75% of all posts. For women, the most common topic was "Personal reflection", corresponding to about 3.52% of all posts.
To identify more subtle differences between the two groups, we then computed the difference in mention rate across men and women (women's mention rate - men's mention rate). In Figure 2, we plot the difference in topic mention rate for the five topics with the largest positive difference (i.e., topics mentioned more by women) as well as the five topics with the largest negative difference (i.e., topics mentioned more by men).
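As a rough sketch of this comparison, the following takes group-level average mention rates and returns the topics with the largest positive and negative differences. The function names are hypothetical and the numeric rates are invented placeholders, not values from our data.

```python
def rate_differences(women_rates, men_rates):
    """Women's mention rate minus men's mention rate, per topic."""
    topics = set(women_rates) | set(men_rates)
    return {t: women_rates.get(t, 0.0) - men_rates.get(t, 0.0) for t in topics}

def most_divergent(diffs, k=5):
    """Return the k most women-favored and the k most men-favored topics."""
    ranked = sorted(diffs.items(), key=lambda item: item[1], reverse=True)
    women_favored = [item for item in ranked[:k] if item[1] > 0]
    men_favored = [item for item in ranked[::-1][:k] if item[1] < 0]
    return women_favored, men_favored

# Placeholder rates for illustration only.
women = {"Civic engagement": 0.030, "Partisan politics": 0.025, "Sports": 0.010}
men = {"Civic engagement": 0.020, "Partisan politics": 0.0375, "Sports": 0.015}
women_top, men_top = most_divergent(rate_differences(women, men), k=1)
```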

Topic modeling results and discussion
While both men and women frequently mention Civic Engagement and Partisan Politics, the former occurred substantially more frequently in women's tweets, while the opposite was true of the latter. Women-favored topics generally reflect more pro-social concerns (civic engagement), community and interpersonal relatedness, and the day-to-day business of running a family or business. By contrast, male-favored topics reflect an over-arching concern with abstract political topics, conflict with other users, and sports. These results suggest that women veterans may be more likely than their male counterparts to use social media to form community and relationships in ways that are constructive and positive. By contrast, male veterans appear more likely to use social media to engage with news and controversial, divisive current events.
These results are suggestive, and accord very broadly with the hypothesis under discussion, namely, that social media may be used by women towards healthier or more pro-social ends, and may therefore serve as a particularly important platform for connecting with women veterans. However, these results do not directly address the key question of how women are talking about both their status as veterans and their health. In the section that follows, we take a different methodological tack to explore this question.

Mental health and veteran keywords
In order to more directly examine differences in social media use between women veterans and male veterans and how this might pertain to care-seeking, we asked whether there are gender differences in the use of specific keywords related to veterans' issues and health issues of common concern to veterans, such as post-traumatic stress. To identify these keywords, we used word embeddings to find the semantic nearest neighbors of seed terms deemed a priori relevant to these subjects, in collaboration with subject matter experts (see footnote 1). Word embeddings encode each word in a collection of text as a numeric vector summarizing the contexts in which that word occurs. For instance, suppose we had a sample of English that consisted of the sentences "The visitor likes dogs" and "The dog likes visitors". Leaving aside singular/plural morphology, we would represent "visitor" and "dogs" in very similar terms, since they seem to occur in very similar types of contexts (after "the" or after "likes"), whereas "the" appears to occur in very different contexts. This is useful because it allows us to precisely define how similar two words are by computing the Euclidean distance between those two words in the encoding scheme. Here, we used word2vec (Mikolov et al., 2013) to compute embeddings for the words in our corpus. We then identified mental health-related keywords by finding the 50 terms most similar to "depression" and "PTSD". We identified veteran-related keywords by finding the 50 terms (specifically, lemmatized words) most similar to "veteran" and "service". For the sake of illustration, we display the top 15 terms in each group in Table 1.
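The two-sentence example above can be made concrete with a small sketch. Note that this uses simple neighbor counts within a one-word window rather than word2vec, which learns dense vectors by a different procedure; the lemma table, window size, and function names are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Tiny hand-written lemma table, standing in for "leaving aside singular/plural morphology".
LEMMAS = {"dogs": "dog", "visitors": "visitor"}

def tokenize(sentence):
    return [LEMMAS.get(word, word) for word in sentence.lower().split()]

def context_vectors(sentences, window=1):
    """Map each word to counts of the words appearing within `window` positions of it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts

def euclidean(u, v):
    """Euclidean distance between two sparse count vectors."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u[k] - v[k]) ** 2 for k in keys))

vectors = context_vectors(["The visitor likes dogs", "The dog likes visitors"])
d_visitor_dog = euclidean(vectors["visitor"], vectors["dog"])
d_visitor_the = euclidean(vectors["visitor"], vectors["the"])
```

In this toy setting "visitor" and "dog" end up with identical context counts, so their distance is smaller than the distance between "visitor" and "the". Finding the nearest neighbors of a seed term amounts to sorting all other words by this distance.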

Keywords results and discussion
To compare rates of keyword use between men and women, we computed the average number of unique user mentions per day, per gender, and compared the rate of keyword mentions across genders. Because the sample is not balanced, with far more men than women, we normalize the mention rate by dividing average daily unique user mentions by the total number of users. The results of this comparison are shown in Figure 3.
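One way to read this normalization is sketched below: count the unique users mentioning a keyword on each day, average over days, and divide by the number of users. The sketch assumes that the divisor is the number of users of the same gender and that the number of days in the observation window is known; both are interpretive assumptions, as are the function and argument names.

```python
from collections import defaultdict

def normalized_daily_mention_rate(mentions, n_users_by_gender, n_days):
    """mentions: iterable of (day, user_id, gender) for posts containing a keyword.

    Returns gender -> (average daily unique mentioning users) / (users of that gender).
    """
    users_per_day = defaultdict(set)  # (gender, day) -> users who mentioned a keyword that day
    for day, user, gender in mentions:
        users_per_day[(gender, day)].add(user)

    totals = defaultdict(int)  # gender -> unique user mentions summed over days
    for (gender, _day), users in users_per_day.items():
        totals[gender] += len(users)

    return {
        gender: (totals[gender] / n_days) / n_users
        for gender, n_users in n_users_by_gender.items()
    }

# Toy data for illustration: two women (one posting twice on day 1) and one of four men.
toy_mentions = [
    ("d1", "w1", "F"), ("d1", "w1", "F"), ("d1", "w2", "F"),
    ("d2", "w1", "F"), ("d1", "m1", "M"),
]
toy_rates = normalized_daily_mention_rate(toy_mentions, {"F": 2, "M": 4}, n_days=2)
```

Counting unique users per day, rather than posts, keeps a handful of very prolific accounts from dominating the rate.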
Women were overall significantly more likely to mention keywords related to both veterans' issues and mental health than men (p < .001). Moreover, both groups were more likely to mention veteran-related keywords than mental health-related keywords (p < .05). The latter pattern did not differ across the two groups, and the interaction between keyword type and gender was not significant (p > .5). This finding indicates that women veterans are significantly more likely than their male counterparts to use social media platforms to discuss both their mental health and their experiences as veterans. These results have wide-ranging significance, which we will elaborate on below. In order to provide further insight into the gender-driven differences in the use of these keywords, we ranked, by gender, the frequency with which each keyword was mentioned and computed the difference in rank between men and women by subtracting each keyword's ranking among women from its ranking among men (i.e., large values indicate that a word is used more often among women). In Figure 4 we plot the ten most women-biased and the ten most men-biased keywords in the corpus.
Women-favored keywords were largely consonant with the findings above. Women were much more likely to mention the word "caregiver", for example. This resonates with our own findings as well as the findings from existing literature summarized above, namely, that women veterans are far more likely to serve in caretaking roles in their families, and that this is particularly relevant to their experience as veterans. Perhaps most strikingly, four out of the ten most women-biased keywords were directly related to health, wellbeing, or care-seeking: "caregiver", "hosp", "asthma", and "drs" (Social Security Administration), "arthritis", "paramedic", "migraine". By contrast, only one of the ten most male-favored keywords relates to these topics in any straightforward way.
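The rank-difference computation can be sketched as follows, assuming per-gender keyword frequencies are already available. Rank 1 is the most frequent keyword; breaking ties alphabetically is an arbitrary choice made here, and the counts are invented for illustration.

```python
def ranks(counts):
    """Rank keywords by descending frequency (1 = most frequent); ties broken alphabetically."""
    ordered = sorted(counts, key=lambda keyword: (-counts[keyword], keyword))
    return {keyword: position + 1 for position, keyword in enumerate(ordered)}

def rank_differences(women_counts, men_counts):
    """Rank among men minus rank among women; large values mean more women-biased."""
    keywords = set(women_counts) | set(men_counts)
    women_ranks = ranks({k: women_counts.get(k, 0) for k in keywords})
    men_ranks = ranks({k: men_counts.get(k, 0) for k in keywords})
    return {k: men_ranks[k] - women_ranks[k] for k in keywords}

# Invented counts for illustration only.
toy_women = {"caregiver": 10, "army": 5, "sir": 1}
toy_men = {"sir": 10, "army": 5, "caregiver": 1}
toy_diffs = rank_differences(toy_women, toy_men)
```

Because ranks are computed within each group, the comparison is not affected by the much larger number of men in the sample.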

Discussion
These exploratory analyses provide an initial, rich set of indications that women veterans and male veterans use social media differently, and that these differences are relevant to the work of VSOs and the VA. Specifically, women veterans seem more likely than their male counterparts to use social media for finding community and for discussing or seeking information about healthcare. These findings have specific implications for the VSO community and the VA in considering opportunities for outreach to women veterans using social media platforms. Before proceeding further, however, we acknowledge that the method used to classify users according to gender, in its attempt to identify general patterns, could systematically misclassify individuals whose social media behavior does not conform to gender norms, or to their own gender identity. This kind of systematic bias falls under the heading of what is often known as "algorithmic bias". We believe that the best remedy for algorithmic bias, or for any other form of difficult-to-avoid bias, is to seek converging evidence from multiple methods and data sources, and thus avoid over-interpreting any one particular set of results.
Analyses like those reported here suggest several avenues for action to veteran organizations such as VSOs and mental health organizations that serve veteran populations. Specifically, our results suggest that social media-based outreach campaigns intended to promote self-care and treatment-seeking among veterans may be a particularly promising way to engage women veterans. Results from modeling efforts such as those reported here could be used to construct tailored audiences on platforms such as Facebook, Instagram, and Twitter. Indeed, pilot studies are underway to test the efficacy of constructing social media advertising campaigns for veteran-oriented, inpatient treatment based on the data reported here.
Moreover, topics, keywords, and n-grams (not explored here) constructed from corpora such as the one described in this paper are particularly well-suited to informing search-based advertising, in which VSOs pay for the ability to promote services to internet users based on the strings they search on platforms such as Google. In general, commonly-used strings cost more. Data such as these, derived directly from the population of interest, would allow VSOs to deploy often highly constrained advertising budgets to reach the individuals that stand to gain the most from their services. For example, we saw above that women veterans are likely to mention "pain" and "nightmares" in similar contexts as "depression". A targeted insight such as this would empower a nonprofit VSO to promote its services to women veterans and avoid spending large sums of money for strings explicitly containing the word "depression", the high cost of which is likely to be driven in part by life sciences companies with vastly larger advertising budgets.

Table 1: 15 terms most similar to "veteran" and "service" (left column) and "PTSD" and "mental health" (right column)
Veteran keywords    Mental health keywords
veteran             suicide
supporter           crime
family              murder
marines             medication
military            pain
army                ptsd
service             disability
unit                stress
commitment          depression
patriot             alcoholism
honorably           addiction
leader              anxiety
sir                 nightmares
doctor              mysteriously
medical             disorder

Figure 3: Normalized mention rate of veteran- and mental health-related keywords for men and women.

Finally, evidence that women veterans may be more likely to use social media for care- and community-seeking suggests the intriguing possibility that telehealth services such as web-based mental health counseling may be particularly welcomed by this community.
Beyond engagement and outreach in the traditional sense, it is important to recognize that social media constitutes an important outlet for seeking care and information about care for women veterans. It should therefore be a priority for VSOs and the VA to disseminate quality content on these platforms so that those using them for this purpose can find the right information at the right time and ultimately get connected to appropriate resources.

Conclusion
In this paper we presented a series of analyses strongly suggesting that women veterans of the US military use social media (specifically, Twitter) in a qualitatively different way from male veterans. Specifically, women veterans appear more likely to use social media platforms to engage in conversations about mental health and veteran-specific issues. These findings suggest that social media may be hugely important for veteran service organizations seeking to reach, connect with, and care for more women.