Investigating the Documentation of Electronic Cigarette Use in the Veteran Affairs Electronic Health Record: A Pilot Study

In this paper, we present pilot work on characterising the documentation of electronic cigarettes (e-cigarettes) in the United States Veterans Administration Electronic Health Record. The Veterans Health Administration is the largest health care system in the United States with 1,233 health care facilities nationwide, serving 8.9 million veterans per year. We identified a random sample of 2,000 Veterans Administration patients, coded as current tobacco users, from 2008 to 2014. Using simple keyword matching techniques combined with qualitative analysis, we investigated the prevalence and distribution of e-cigarette terms in these patients' clinical notes, discovering that for current smokers, 11.9% of patient records contain an e-cigarette related term.


Introduction
Electronic cigarettes (e-cigarettes) were developed in China in the early 2000s and first introduced to the US market in 2007. Once established in the US, the product experienced explosive growth, with the number of e-cigarette users doubling every year between 2008 and 2012 (Grana et al., 2014). In 2012 it was estimated that 75% of US adults had heard of e-cigarettes, and 8.1% had tried them. By 2014, the proportion of adult Americans who had tried e-cigarettes had increased to 12.6% (Schoenborn and Gindi, 2015).
Public health practitioners, government regulatory authorities, professional associations, the media, as well as individual clinicians and health workers are divided as to whether e-cigarettes represent an exciting new smoking cessation opportunity (Green et al., 2016; McNeill et al., 2015; Caponnetto et al., 2013) or are an untested, potentially dangerous technology that risks undermining recent successes in "denormalising" smoking (Choi et al., 2012; Etter et al., 2011; Gornall, 2015; U.S. Department of Health and Human Services, 2016; Department of Health and Human Services, 2014).
Currently, little is known about how clinicians "on-the-ground" advise patients who use, or are considering using, e-cigarettes. While Winden et al. (2015) have gone some way to describing e-cigarette Electronic Health Record (EHR) documentation behaviour in the context of a medical system in Vermont, national patterns in e-cigarette documentation have not been explored. In this paper, we present pilot work on characterising the documentation of e-cigarettes in the United States Veterans Administration Electronic Health Record. The Veterans Health Administration (VA) is the largest health care system in the United States with 1,233 health care facilities nationwide, serving 8.9 million veterans per year. VA EHR data provides the opportunity for nationwide population-health surveillance of e-cigarette use.
The remainder of this document consists of five sections. Following a discussion of related work in Section 2, Section 3 describes both our cohort selection procedure and our method of identifying e-cigarette documentation in clinical notes, while Sections 4 and 5 present the results of our analysis and some discussion of those results. The final section outlines some broad conclusions.

Background
The VA collects data about patient smoking history and status using several approaches at the time of a patient encounter. Most patient clinical encounters have an associated health factor (i.e. semi-structured data that describes patient smoking status or smoking history (Barnett et al., 2014)). In addition, if the veteran has received dental care, the VA dental data contains descriptions of patient smoking status as a coded database field. However, neither of these data sources can be used to determine what type of tobacco the patient uses and, more specifically, whether the patient uses e-cigarettes. This information is only found embedded in clinical text.
Given the rapid rise in popularity of e-cigarettes, and the lack of adequate public health surveillance systems currently focussing on these novel tobacco products, various methods and data sources have been used to understand changes in e-cigarette prevalence and usage patterns, including analysing search engine queries relevant to e-cigarettes (Ayers et al., 2011), mining social media data (Myslín et al., 2013), and, the focus of this paper, analysing EHR data for e-cigarette related documentation (Winden et al., 2015).
Previous work on smoking status identification in the EHR context has focussed on structured data. EHR corpus analysis has also been the focus of several research efforts in the tobacco domain. For example, Chen et al. (2014) investigated the documentation of general tobacco use in clinical notes from Vermont's Fletcher Allen Health Center, discovering that free-text clinical notes are frequently used to document amount of tobacco used, tobacco use frequency, and start and end dates of tobacco use (i.e. important clinical information that is difficult to represent with structured data). In follow-up work focussing specifically on e-cigarettes rather than general tobacco use, Winden et al. (2015), again using EHR data from Fletcher Allen Health Center, developed a sophisticated annotation scheme to code e-cigarette documentation, with categories including dose, device type, frequency, and use for smoking cessation. One result of particular note from this research is the observation that less than 1% of patients had e-cigarette mentions in their notes.
In this pilot study, our aim is to complete an initial corpus analysis of VA patient record data with the goal of quantifying the frequency with which e-cigarette usage is documented within the VA patient record.

Materials and Methods
We queried the VA dental record data found in the VA Corporate Data Warehouse to identify a national cohort of all Veterans Affairs patients with a coded history of current (or current and past) smoking between the years 2008-2014. Dental records were chosen as a data source as they are believed to be the most reliable indicators of smoking status in the VA context. From these data we identified 87,392 unique patients (77,491 current smokers, 9,901 current and past smokers). We then selected a random sample of 2,000 patients and extracted their associated clinical notes yielding 154,991 clinical notes. Note types include progress notes, consultation notes, consent documents, instructions, triage notes, history and physical notes, amongst others.
Based on an iterative process of corpus exploration, along with insights gleaned from previous work on e-cigarette related natural language processing (Myslín et al., 2013; Winden et al., 2015), we identified twenty e-cigarette related terms (listed in Table 1) and, using these terms, performed a keyword search within the patient clinical notes. We reviewed each e-cigarette term instance in its context to ascertain whether the instance actually referred to e-cigarette usage.
We report the precision of each e-cigarette term, defined as the proportion of all matches for that term that actually reference e-cigarette usage.
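The keyword search step can be illustrated with a minimal sketch. It assumes case-sensitive, whole-word matching (an assumption, suggested only by the fact that vapor and Vapor are counted as separate terms), uses just the six terms named in the running text rather than the full twenty-term list of Table 1, and runs on invented example notes; the function name `find_mentions` is hypothetical and is not the tooling used in the study.

```python
import re
from collections import Counter

# Subset of the search terms named in the text; the full list is in Table 1.
TERMS = ["vapor", "vaporizer", "Vapor", "vape", "ecig", "electronic cig"]

# Assumption: case-sensitive, whole-word matching. Longer terms come first
# in the alternation so multi-word terms are tried before shorter ones.
PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(t) for t in sorted(TERMS, key=len, reverse=True))
    + r")\b"
)

def find_mentions(note_text, window=40):
    """Return (term, start_offset, context) for each term match in one note."""
    hits = []
    for m in PATTERN.finditer(note_text):
        lo = max(0, m.start() - window)
        hi = min(len(note_text), m.end() + window)
        hits.append((m.group(0), m.start(), note_text[lo:hi]))
    return hits

# Invented example notes, for illustration only.
notes = {
    "note-1": "Pt smokes 10 tobacco cigs per day and uses vape.",
    "note-2": "Uses a vaporizer at home to increase room humidity.",
    "note-3": "Thinking about switching to ecig.",
}

# Per-term match counts across all notes; each hit still needs manual review.
counts = Counter(
    term for text in notes.values() for term, _, _ in find_mentions(text)
)
```

The context window returned with each hit is what a reviewer would read to decide whether the match refers to e-cigarette usage; the second example note shows the kind of match that manual review would reject.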

Results
We analysed notes from 2,000 VA patients. From these notes, we observed 238 patients (11.9%) with one or more e-cigarette mentions within their notes (see Figure 1). In total, there were 601 mentions, with 436 notes containing more than one mention. Of these 601 mentions, 199 (33.1%) described true e-cigarette usage (Table 1) as ascertained by manual inspection. The most frequent e-cigarette term matches were variants of the term vapor (vapor: 241, vaporizer: 192, Vapor: 73). These terms were also the most frequent sources of false positives (vapor: 160, vaporizer: 156, and Vapor: 69). Thirteen of the twenty terms yielded precision scores greater than 0.500. Of these high-precision terms, the most prevalent were vape: 19, ecig: 14, and electronic cig: 10.
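As a worked illustration of the precision measure, the sketch below recomputes per-term precision for the three vapor variants from the match and false-positive counts quoted above, along with the overall precision of 199 true mentions out of 601. The dictionary and function names are illustrative only.

```python
# (total matches, false positives) for the three terms quoted in the text.
REPORTED = {
    "vapor": (241, 160),
    "vaporizer": (192, 156),
    "Vapor": (73, 69),
}

def precision(total_matches, false_positives):
    """Proportion of term matches that truly reference e-cigarette usage."""
    true_positives = total_matches - false_positives
    return true_positives / total_matches

per_term = {term: precision(*c) for term, c in REPORTED.items()}
overall = 199 / 601  # true e-cigarette mentions over all mentions

for term, p in per_term.items():
    print(f"{term}: {p:.3f}")
print(f"overall: {overall:.3f}")
```

All three variants fall well below the 0.500 threshold met by thirteen of the twenty terms, which is consistent with their being the main source of false positives.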

Discussion
We observed a variety of linguistic contexts describing e-cigarette usage. Patients report use of e-cigarettes with other tobacco products (e.g., "smokes 10 tobacco cigs per day and uses vape"). Similar to tobacco cessation, clinicians report providing encouragement and counselling for patients to stop e-cigarette use. Patients often contemplate e-cigarettes as an alternative to tobacco usage (e.g., "thinking about switching to ecig") or as an approach to tobacco cessation (e.g., "uses nicotine vaporizer and hasn't smoked tobacco in 6 mos"). This was not a surprising finding given that, according to the Centers for Disease Control, "among current cigarette smokers who had tried to quit smoking in the past year, more than one-half had ever tried an e-cigarette and 20.3% were current e-cigarette users" (Schoenborn and Gindi, 2015). Patients reported differing experiences of using e-cigarettes as a smoking cessation aid, with one patient stating directly that e-cigarettes were an ineffective tool in his struggle to quit smoking. Consistent with current uncertainty regarding the safety of e-cigarettes and their utility as a smoking cessation aid, not all clinicians support the use of e-cigarettes as a safe alternative to tobacco usage (e.g., "I do not recommend ecig/vapor").
Analogous to the "packs-per-day" metric used by clinicians to document volume of combustible tobacco use, patients report their frequency of e-cigarette use in volume over time (e.g., "6mg/day"). E-cigarette usage goals are often set by both clinicians ("reducing consumption from 9 grams to 3 with goal of quitting") and patients ("using e cig and cutting back by half") alike. One clinician reported a patient's use of e-cigarettes with "no side effects with current meds", suggesting that clinicians are aware that side effects in combination with medication use are a possibility.
Although most of the twenty e-cigarette terms used in this study yielded precision scores greater than 0.500, we also observed a substantial proportion of term matches that did not indicate actual e-cigarette usage. Many false positives occurred due to the ambiguous nature of the word vaporizer and its variants. For example, the domestic use of a vaporizer to increase room humidity, the treatment of patients with over-the-counter sinus [...]
From notes containing matched e-cigarette variants, we discovered several co-occurring terms which could improve the term's precision, with examples including nicotine vaporizer, vapornicotine, vapor cig, vapor cigarettes, vapor pens, vapor cigarets, methonol vapor, and vapor nicotine.
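One way such co-occurring terms might be used is sketched below: a bare vapor match is treated as ambiguous unless it appears inside one of the disambiguating phrases listed above. This is a hypothetical filter, not something evaluated in this study; the case-insensitive matching, the three output labels, and the function name `classify_vapor_mention` are all assumptions made for illustration.

```python
import re

# Co-occurring phrases listed in the text, spelled as reported there.
DISAMBIGUATING_PHRASES = [
    "nicotine vaporizer", "vapornicotine", "vapor cig", "vapor cigarettes",
    "vapor pens", "vapor cigarets", "methonol vapor", "vapor nicotine",
]

# Assumption: case-insensitive matching for both patterns.
PHRASE_RE = re.compile(
    "|".join(re.escape(p) for p in DISAMBIGUATING_PHRASES), re.IGNORECASE
)
AMBIGUOUS_RE = re.compile(r"vapor", re.IGNORECASE)

def classify_vapor_mention(sentence):
    """Label a sentence by whether its vapor match has disambiguating context."""
    if PHRASE_RE.search(sentence):
        return "likely e-cigarette"
    if AMBIGUOUS_RE.search(sentence):
        return "ambiguous"
    return "no match"

examples = [
    "uses nicotine vaporizer and hasn't smoked tobacco in 6 mos",
    "uses a vaporizer to increase room humidity",
    "denies tobacco use",
]
labels = [classify_vapor_mention(s) for s in examples]
```

A filter of this kind would trade recall for precision, since true e-cigarette mentions that use a bare vapor term would remain in the ambiguous group and still need manual review.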
The pilot work described in this short paper has several limitations. First, our list of e-cigarette related keywords was limited to twenty. As indicated above, there may well be additional high-precision e-cigarette related terms that we did not use in this work. Second, unlike Winden et al. (2015), we have not conducted a large-scale annotation effort or mapped to an annotation scheme. Finally, while the VA is the largest integrated medical system in the United States, and the only nationwide system, VA patients are not necessarily representative of the general population. It is particularly important to note that approximately 92% of veterans are male (National Center for Veterans Analysis and Statistics, 2013).

Conclusion
In conclusion, we have demonstrated that for current smokers, e-cigarette terms are present in 11.9% (238) of VA patient records. Around two thirds of the e-cigarette term mentions in these records are estimated to be false positives, suggesting that around 4% of smokers have e-cigarette use documented in their clinical notes.
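The approximate 4% figure can be reproduced with a short back-of-the-envelope calculation, sketched below from the counts reported in this paper. It assumes that the mention-level precision (199 of 601) can be applied directly to the patient-level proportion (238 of 2,000), which is a simplification, since precision was measured per mention rather than per patient.

```python
patients_sampled = 2000
patients_with_term = 238   # patients with one or more e-cigarette term matches
mentions_total = 601
mentions_true = 199        # mentions judged to describe true e-cigarette usage

term_prevalence = patients_with_term / patients_sampled   # 0.119
mention_precision = mentions_true / mentions_total        # about one third

# Assumption: mention-level precision carries over to the patient level.
estimated_documented_use = term_prevalence * mention_precision

print(f"{estimated_documented_use:.1%}")
```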