Extraction of Lactation Frames from Drug Labels and LactMed

This paper describes a natural language processing (NLP) approach to extracting lactation-specific drug information from two sources: FDA-mandated drug labels and the NLM Drugs and Lactation Database (LactMed). A frame semantic approach is utilized, and the paper describes the selected frames, their annotation on a set of 900 sections from drug labels and LactMed articles, and the NLP system to extract such frame instances automatically. The ultimate goal of the project is to use such a system to identify discrepancies in lactation-related drug information between these resources.


Introduction
Medical information about prescription drugs is publicly available in a variety of sources, including the biomedical literature, consumer-focused websites, and the drug labels mandated by the U.S. Food & Drug Administration (FDA). But rapid advances in biomedicine, especially recently approved drugs, threaten to make these sources discordant. Synchronizing such sources is difficult due to their unstructured nature and the wide variety of ways in which they are organized. This paper presents initial work in an effort to align two such sources (drug labels and a single consumer health website) for information particular to a single sub-population: nursing mothers. This is a critical sub-population to provide with validated health information, especially for prescription drugs. Notably, randomized trials, the gold standard in drug evaluation, contain few if any nursing mothers in their trial populations. Thus, information about the effect a pharmaceutical substance has on such mothers and their children is scarce and of poor evidence quality, which only serves to promote misinformation and discourage mothers from taking needed medications. Authoritative guidance is critical regarding what is supported or contradicted by the limited evidence, as well as what is simply unknown. Several public sources attempt to provide such authoritative information. Here, two such sources are studied: a section of drug labels specific to nursing mothers and a government website specific to drugs and lactation, LactMed. By identifying discrepancies in the free-text narratives of these sources, further review can pinpoint information gaps, conflicting opinions, and out-of-date guidance.
The general strategy proposed in this paper involves (a) identifying seven key types of drug information specific to nursing mothers, (b) utilizing linguistically-motivated frame semantic representations for these information types, (c) annotating instances of these frames on both lactation information sources, and (d) developing natural language processing (NLP) methods to extract this information automatically from these sources.
Our specific contributions include:
1. The first NLP method to focus specifically on drug information for nursing mothers.
2. Development of frame representations for lactation-specific drug information.
3. Application of a deep learning-based system on two separate lactation information sources, drug labels and LactMed.
4. Evaluation of cross-corpus similarity in terms of important lactation information.
While this paper's scope is quite narrow, just lactation information from two sources, we posit the techniques described here are generalizable to other lactation information sources (with minimal annotation/training) as well as to other important pharmaceutical sub-populations (with, albeit, considerable annotation effort).

Related Work
The existing work related to that proposed here is broken down into information extraction efforts on drug labels (§2.1), maternal health in particular (§2.2), and frame semantics in biomedicine (§2.3).

Drug Label Information Extraction
Drug labels contain a wealth of unstructured information relating to FDA-approved pharmaceuticals, and thus have proven to be a consistent target of NLP-based systems interested in automatically creating knowledge bases (KBs) (Harpaz et al., 2014). For instance, SIDER (Kuhn et al., 2010, 2016) is a well-used KB, constructed from drug labels, for adverse drug reaction (ADR) information (i.e., side effects). The 2017 TAC ADR task utilized a corpus of 200 drug labels with sections specific to ADR information (Demner-Fushman et al., 2018b). On the other hand, Duke et al. (2013) demonstrated the dangers of using drug labels as an ADR KB by identifying numerous inconsistencies between the labels for bioequivalent drugs. Meanwhile, drug indications (i.e., the medical condition the drug is intended to treat) have also been well-studied (Névéol and Lu, 2010; Fung et al., 2013; Khare et al., 2014), as have drug interactions (Demner-Fushman et al., 2018a). All of these focus on general aspects of a drug, while hardly any work has focused on the information in drug labels related to specific populations, though both the TAC task and Culbertson et al. (2014) identified ADR-population relations.

Maternal Health Information Extraction
A few NLP methods have been applied to support maternal health. This includes processing biomedical literature to support evidence-based review of maternal mortality (de Groot et al., 2015) and identifying genes associated with placenta-mediated maternal diseases (Rodriguez et al., 2017). Electronic health record (EHR) data has been used to identify important maternal health information (Borra et al., 2013; Abhyankar and Demner-Fushman, 2013) and screen for suicide (Zhong et al., 2018, 2019). Social media has been used to identify pregnant women. Finally, only one known work focuses on drug labels for maternal health, specifically the identification of pregnancy risk categories (Rodriguez and Fushman, 2015).

Frame Semantics in Biomedicine
Frame semantics (Fillmore, 1976, 1982) is a linguistic theory that postulates the meaning of most words is understood in relation to a conceptual frame in which entities take part. E.g., the meaning of sell in the sentence "Jerry sold a car to Chuck" evokes a frame related to COMMERCE, which includes four elements: BUYER, SELLER, MONEY, and GOODS, though not all elements are required (as with MONEY here). Frames also include a lexical unit that triggers the frame ("sold" in the example). Frames provide a good connection between an abstract information representation and the actual text that specifies that information, and are thus a natural choice for a task such as identifying detailed lactation information in drug labels. Most notably, frame semantics have been operationalized in the large-scale resource FrameNet (Baker et al., 1998, 2003), though this resource is not specific to biomedicine. Several works have explicitly extended FrameNet for biomedical tasks. This includes frames for molecular biology information (Dolbey et al., 2006; Dolbey, 2009; Tan, 2014), cancer information from EHRs (Datta et al., 2017), and general medical information for Swedish (Kokkinakis, 2013). Many other works have implicitly used representations that are similar to frames, including the TAC ADR task data on drug labels (Demner-Fushman et al., 2018b).
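To make the notion of a frame instance concrete, the following minimal sketch encodes the COMMERCE example above as a small data structure. The class name, field names, and the use of character offsets are illustrative assumptions and are not taken from FrameNet or from the system described in this paper.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


@dataclass
class FrameInstance:
    """One frame evoked by a lexical unit in a sentence (hypothetical layout)."""
    frame: str                      # frame name, e.g. "COMMERCE"
    lexical_unit: Span              # span of the word(s) that trigger the frame
    elements: Dict[str, Span] = field(default_factory=dict)

    def text_of(self, sentence: str, element: str) -> str:
        """Return the surface text filling the given frame element."""
        start, end = self.elements[element]
        return sentence[start:end]


sentence = "Jerry sold a car to Chuck"
instance = FrameInstance(
    frame="COMMERCE",
    lexical_unit=(6, 10),           # "sold"
    elements={
        "SELLER": (0, 5),           # "Jerry"
        "GOODS": (11, 16),          # "a car"
        "BUYER": (20, 25),          # "Chuck"
        # MONEY is not expressed in this sentence, so it is simply absent.
    },
)

print(instance.text_of(sentence, "GOODS"))
```

Storing spans rather than strings keeps each element tied to the text that expresses it, which is the connection between representation and text that motivates the frame-based approach.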

Data
Two different datasets were used to create the text corpus for frame annotation. Section 3.1 describes the drug labels dataset and Section 3.2 describes the LactMed dataset.

Lactation Information in Drug Labels
Drug labels were downloaded in August 2018 from the full release collection made available by DailyMed. DailyMed is a public website operated by the National Library of Medicine (NLM) and is the official provider of FDA label information. These labels are maintained in a document markup standard approved by Health Level Seven (HL7) referred to as Structured Product Labeling (SPL), which specifies various drug label sections. For this work, only the lactation section was extracted. An example of this section is shown in Figure 1.

LactMed
LactMed 2 is a database created by the National Library of Medicine under the collection of TOXNET databases. LactMed provides information about various drugs and chemicals to which nursing mothers may be exposed and that may then be passed to their infants through breastfeeding. Information provided in LactMed includes the amount of a substance that may be excreted into breast milk, the absorption rate of an infant, and any potential adverse effects to a nursing infant. Data in LactMed is derived from reviews of the scientific literature, with each entry including references. Additionally, all records are peer-reviewed by a panel of experts. For this work, only the "Summary of Use During Lactation" section was extracted for each LactMed article. An example of this section is shown in Figure 2.

Preprocessing
Each individual drug label is stored in the DailyMed collection as a zip-compressed folder that includes the drug label as an XML file and scanned images of the label. We extracted the folders and parsed each XML document to identify the relevant lactation information. While the drug labels provide additional information regarding the use of the drug, only section 8.2, "Lactation", was extracted into individual documents for each label. Prior to a specification change in June 2015, this section was labeled as section 8.3, "Nursing Mothers". There were 37,005 separate drug labels parsed. Of those, lactation information was identified in 31,309 drug labels. Additionally, since many drug labels exist for the same drug, due to multiple manufacturers and dosage amounts, only the lactation information from the most recent label for a drug was extracted. After this process of selecting only unique drug labels based on name, a dataset of 4,486 documents was created.
2 https://toxnet.nlm.nih.gov/newtoxnet/lactmed.htm
Figure 2: Example "Use During Lactation" section from LactMed
The entirety of LactMed is made available as a single XML document. This file was parsed to identify the drug name and the Summary of Use During Lactation section. Each LactMed article was already unique, so no de-duplication was required. In total, 1,151 documents were created from LactMed.
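The drug label preprocessing steps (locate the lactation section under either of its two titles, then keep only the most recent label per drug name) can be sketched as follows. This is a minimal illustration, not the code used in this work: the XML layout with label, name, date, section, title, and text tags is a simplified, hypothetical stand-in for the real SPL schema, and the example labels are invented.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Optional, Tuple

# Section titles used before and after the June 2015 specification change.
LACTATION_TITLES = ("lactation", "nursing mothers")


def lactation_text(label_xml: str) -> Optional[Tuple[str, str, str]]:
    """Return (drug name, label date, lactation text), or None if absent.

    Assumes a simplified, hypothetical XML layout rather than real SPL.
    """
    root = ET.fromstring(label_xml)
    for section in root.iter("section"):
        title = (section.findtext("title") or "").lower()
        if any(t in title for t in LACTATION_TITLES):
            text = (section.findtext("text") or "").strip()
            return root.findtext("name", ""), root.findtext("date", ""), text
    return None


def most_recent_per_drug(labels: Iterable[str]) -> Dict[str, str]:
    """Keep the lactation text from the newest label for each drug name."""
    newest: Dict[str, Tuple[str, str]] = {}
    for label_xml in labels:
        found = lactation_text(label_xml)
        if found is None:
            continue  # label has no lactation information
        name, date, text = found
        key = name.lower()
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        if key not in newest or date > newest[key][0]:
            newest[key] = (date, text)
    return {name: text for name, (_, text) in newest.items()}


# Invented example labels for illustration only.
LABELS = [
    "<label><name>DrugA</name><date>2014-03-01</date>"
    "<section><title>8.3 Nursing Mothers</title><text>Old text.</text></section></label>",
    "<label><name>DrugA</name><date>2017-09-15</date>"
    "<section><title>8.2 Lactation</title><text>New text.</text></section></label>",
    "<label><name>DrugB</name><date>2016-01-01</date>"
    "<section><title>1 Indications</title><text>Other text.</text></section></label>",
]

print(most_recent_per_drug(LABELS))
```

In this toy input, the two DrugA labels collapse to the 2017 version and DrugB is dropped because it has no lactation section, mirroring the reduction from parsed labels to unique documents described above.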

Lactation Frames
Section 4.1 describes the frames annotated for both the drug labels and LactMed. Section 4.2 describes the annotation process.

Frame Descriptions
Since one of the primary purposes of annotating these two datasets is to compare information between them, a standard set of frames was chosen that would be applicable to both datasets. Seven lactation-related frames were chosen based on an initial review of sample drug labels and LactMed entries. These frames, detailed in Table 1, are: INFORMATION AVAILABILITY, EFFECT ON MILK SUPPLY, EXCRETION INTO MILK, ABSORPTION, ADVERSE REACTION, ALTERNATIVES, and VERDICT. For each of these frames, elements were selected that describe the individual attributes and relations for the frame. These elements are where the detailed semantic information is located. Certain elements were selected that exist across all frames; these are referred to as Non-Core Elements.

Table 1 (Element: Description)

Non-Core Elements - Elements that are common across all/most frames.
  ANIMAL: Marks non-humans to which the frame applies. Frequently information is only available in animal studies and has not been verified/observed in human studies.
  CONDITION: A condition (specific circumstance) under which the rest of the frame applies.
  DRUG: The name of the drug or the class of drugs to which the frame applies.
  INFORMATION: Any reference to how the information was obtained/published or the information quality that results in the frame's information.
  LIKELIHOOD: Any expression that suggests the frame is less than 100% positive, including hedging ("possible"), infrequency ("sometimes"), and negation ("no evidence").
  ???: Marks any span that the annotator feels is important but does not currently have an annotation to match.

INFORMATION AVAILABILITY - The quantity/quality of lactation information for the drug.
  QUALITY: A reference to the quality of information available (e.g., observational studies, randomized controlled trials).
  QUANTITY: A reference to the quantity of information available (e.g., a large number of studies, minimal information).
  SOURCE: The source of information (e.g., journal article, post marketing surveillance).

EFFECT ON MILK SUPPLY - The impact the drug has on the overall milk supply.
  QUALITY: A reference to the change in quality of the breast milk due to the drug.
  QUANTITY: A quantitative expression of the impact of the drug on the milk supply.
  TREND: The generalized trend (e.g., increases, decreases) in milk supply due to the drug.

EXCRETION INTO MILK - Information that the drug is excreted into the breast milk.
  QUANTITY: A quantitative expression of how much of the drug (or other substance) is excreted into the breast milk.
  TIMEFRAME: Either the span of time from taking the medication until initial excretion (e.g., "2 hours after taking") or the span of time (possibly half-life) until the drug will no longer be excreted (e.g., "within 4 days").

ABSORPTION - Information that the nursing infant absorbs the drug from the breast milk.
  QUANTITY: A quantitative expression of how much of the drug (or other substance) is actually absorbed by the infant from the breast milk.
  TIMEFRAME: Some span of time related to the absorption of the drug/substance by the infant.

ADVERSE REACTION - Reactions the infant may have from being exposed to the drug.
  REACTION: The adverse reaction resulting from the drug.

ALTERNATIVES - Alternative drug options for breastfeeding mothers.
  ALTERNATIVE: The name of the alternative drug, drug class, or agent.
  PREFERENCE: A statement about the preference for the alternative, which can be positive ("preferred") or negative ("not recommended").

VERDICT - Recommendations for nursing mothers using the drug.
  POLARITY: Positive or negative verdict.
  DECISION: What the nursing mother taking the drug should do (or not do).
  MONITOR: Statement that the mother/child should be monitored (e.g., for adverse reactions).
  REASON: The particular reason leading to the verdict decision.
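The frame inventory of Table 1 can also be written down as a small machine-readable schema, which is convenient for checking that an annotation only uses elements permitted for its frame. The sketch below is one possible encoding under our own assumptions: the constant and function names are hypothetical, and the placeholder "???" element is deliberately left out.

```python
# Elements shared across all/most frames (Non-Core Elements in Table 1).
NON_CORE = {"ANIMAL", "CONDITION", "DRUG", "INFORMATION", "LIKELIHOOD"}

# Frame-specific elements for each of the seven lactation frames.
CORE = {
    "INFORMATION AVAILABILITY": {"QUALITY", "QUANTITY", "SOURCE"},
    "EFFECT ON MILK SUPPLY": {"QUALITY", "QUANTITY", "TREND"},
    "EXCRETION INTO MILK": {"QUANTITY", "TIMEFRAME"},
    "ABSORPTION": {"QUANTITY", "TIMEFRAME"},
    "ADVERSE REACTION": {"REACTION"},
    "ALTERNATIVES": {"ALTERNATIVE", "PREFERENCE"},
    "VERDICT": {"POLARITY", "DECISION", "MONITOR", "REASON"},
}


def invalid_elements(frame, elements):
    """Return the element names that are not allowed for the given frame."""
    allowed = CORE[frame] | NON_CORE
    return sorted(e for e in elements if e not in allowed)


# TREND belongs to EFFECT ON MILK SUPPLY, so it is flagged here.
print(invalid_elements("ADVERSE REACTION", ["REACTION", "DRUG", "TREND"]))
```

Note that QUALITY and QUANTITY appear under several frames with different meanings, so an element name is only interpretable together with the frame it belongs to.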

Annotation
Equal-sized random subsets of documents from the drug labels dataset and the LactMed dataset were selected for manual annotation. Example annotations from LactMed articles and the drug labels are shown in Figure 3.
The annotation process was completed by three individuals using BRAT (Stenetorp et al., 2013). Documents were double-annotated, with a pair of individuals first annotating a collection of documents independently and then meeting to reconcile any differences. In cases where annotations could not be easily reconciled, the case was presented to two other individuals to help establish rules which could be used in similar situations moving forward. Annotation guidelines, which included frequently occurring lexical units for a given frame and example annotations, were developed in order to ensure consistency.
After annotation of each subset of documents was completed and reconciled, a final review was performed by one of the annotators to ensure that any newly-established guidelines were consistent throughout all documents.
In total, 900 documents were double-annotated, 450 drug labels and 450 LactMed entries. Within these 900 documents a total of 2,984 frames and 8,384 frame elements were annotated. The frequency breakdown for each frame and frame element type is shown in Table 2.
The most frequently identified frames were EXCRETION INTO MILK with 853 frames, VERDICT with 852 frames, and ADVERSE REACTION with 727 frames. Table 3 shows the inter-annotator agreement for each frame and frame element. When determining the inter-annotator agreement, only exact matches are considered, though partial disagreements were quite common. For example, if one annotator chose the lexical unit "breast milk" for an EXCRETION INTO MILK frame and the second annotator chose "milk", this would be considered a mismatch. (The annotation guidelines specify that "breast milk" is the correct lexical unit in such a case.)
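The effect of exact-match scoring on agreement can be illustrated with a short sketch. The agreement measure itself is not specified in this section; the code below assumes an F1 score between the two annotators' span sets, and the offsets and function name are hypothetical.

```python
from typing import Set, Tuple

Annotation = Tuple[str, int, int]  # (label, start offset, end offset)


def exact_match_f1(first: Set[Annotation], second: Set[Annotation]) -> float:
    """F1 between two annotators, counting only identical (label, span) pairs."""
    if not first and not second:
        return 1.0  # nothing to annotate and both agree on that
    matched = len(first & second)
    if matched == 0:
        return 0.0
    precision = matched / len(second)
    recall = matched / len(first)
    return 2 * precision * recall / (precision + recall)


# Sentence: "It is excreted in breast milk."
# Annotator 1 marks "breast milk" (18-29); annotator 2 marks only "milk" (25-29).
# Both agree on a second, unrelated annotation with assumed offsets.
annotator_1 = {("EXCRETION INTO MILK", 18, 29), ("VERDICT", 40, 47)}
annotator_2 = {("EXCRETION INTO MILK", 25, 29), ("VERDICT", 40, 47)}

print(exact_match_f1(annotator_1, annotator_2))
```

Although the two annotators overlap on the "milk" span, the partial disagreement counts as a full mismatch, so only one of the two annotations contributes to the agreement score.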

Extraction
A standard bi-directional Long Short-Term Memory (Bi-LSTM) Conditional Random Field (CRF) was utilized to extract lactation frames. The Bi-LSTM utilizes both character embeddings (dynamic) and word embeddings (static, described below). Specifically, a pipeline approach was used in which frames and frame elements are identified in two separate steps. The first step identifies lexical units in a sentence for all potential frames, essentially equivalent to a named entity recognition approach. The second step performs relation extraction for each identified lexical unit, identifying the frame elements associated with the frames identified in the first step.
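Because the first step is treated as a named entity recognition problem, lexical unit annotations have to be converted into per-token labels before a sequence tagger can be trained. The sketch below uses the common BIO scheme for this; the tagging scheme, the token-index span format, and the function name are assumptions, since the text does not state how labels were encoded.

```python
from typing import List, Tuple


def bio_tags(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert lexical-unit spans into one BIO tag per token.

    Each span is (first token index, end token index exclusive, frame name).
    """
    tags = ["O"] * len(tokens)
    for start, end, frame in spans:
        tags[start] = "B-" + frame          # first token of the lexical unit
        for i in range(start + 1, end):
            tags[i] = "I-" + frame          # remaining tokens of the unit
    return tags


tokens = ["The", "drug", "is", "excreted", "into", "breast", "milk", "."]
# "breast milk" is the lexical unit for an EXCRETION INTO MILK frame.
spans = [(5, 7, "EXCRETION_INTO_MILK")]

for token, tag in zip(tokens, bio_tags(tokens, spans)):
    print(token, tag)
```

A Bi-LSTM-CRF would then be trained to predict these tags from the token sequence; the second step would take each predicted lexical unit and tag the frame elements that relate to it.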

Evaluation
For our evaluation we created three separate collections of training, test, and validation sets by splitting the documents of the drug labels (DL), LactMed (LM), and the drug labels and LactMed combined (DL+LM). For each dataset, 80 percent of the documents were used for training, 10 percent for testing, and 10 percent for validation.
We then trained and tested on various combinations of these datasets, for example training on drug labels and LactMed (DL+LM) and testing on LactMed (LM), or training on LactMed (LM) and testing on the drug labels (DL).
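The 80/10/10 split can be sketched as below. The fixed random seed, the document identifiers, and the choice to build the combined DL+LM collection by merging the per-source splits (so that no test document of one source appears in the combined training set) are assumptions made for illustration; the text does not specify how the combined split was constructed.

```python
import random
from typing import List, Sequence, Tuple


def split_documents(doc_ids: Sequence[str], seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle documents and split them 80/10/10 into train, test, validation."""
    ids = sorted(doc_ids)
    random.Random(seed).shuffle(ids)
    n_train = len(ids) * 8 // 10
    n_test = len(ids) // 10
    train = ids[:n_train]
    test = ids[n_train:n_train + n_test]
    validation = ids[n_train + n_test:]
    return train, test, validation


# Hypothetical identifiers for the 450 annotated documents of each source.
dl_docs = [f"DL{i:03d}" for i in range(450)]
lm_docs = [f"LM{i:03d}" for i in range(450)]

dl_train, dl_test, dl_val = split_documents(dl_docs)
lm_train, lm_test, lm_val = split_documents(lm_docs)

# Assumed construction of the combined collection: merge the per-source splits.
both_train = dl_train + lm_train
both_test = dl_test + lm_test
both_val = dl_val + lm_val

print(len(dl_train), len(dl_test), len(dl_val), len(both_train))
```

With 450 documents per source this yields 360 training, 45 test, and 45 validation documents, and 720 training documents for the combined collection.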
Finally, to determine whether creating more manual annotations would improve the results of our model, we generated a learning curve using the full combined LactMed and drug label dataset. To generate the learning curve, the same test set was maintained and documents were added to the training set 50 at a time, with a new model trained and evaluated against the test set at each step.
Table 5 shows the results for the different combinations of training and testing datasets. These data show that training on the drug labels and LactMed together improves prediction performance on a single dataset, as opposed to training on that dataset alone. For example, the model that was trained on drug labels and LactMed performed better on the LactMed test set (frame F1 of 77.18) than the model that was trained only on the LactMed dataset (frame F1 of 71.54). This effect is likely caused by an increase in training data overcoming the differences between the datasets.
Table 6 shows the breakdown of the results by each frame and frame element for the model that was created using the embeddings that performed best on frame identification (MIMIC + GloVe (Wikipedia)) and the combined drug labels and LactMed dataset for training and testing.
Figure 4 shows the learning curve as additional documents were added to the training set and the effect this has on the overall F1-measure. For both frames and frame elements the curve is beginning to level off, but it suggests that additional training data may continue to have a positive effect on the overall F1.
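The learning-curve procedure (a fixed test set, with the training set grown 50 documents at a time) can be sketched as a simple loop. The training and evaluation routines are passed in as functions because the actual Bi-LSTM-CRF training code is not shown here; the toy stand-ins at the bottom exist only to make the sketch runnable and do not reflect real model behavior.

```python
from typing import Any, Callable, List, Sequence, Tuple


def learning_curve(
    train_docs: Sequence[Any],
    test_docs: Sequence[Any],
    train_fn: Callable[[Sequence[Any]], Any],
    eval_fn: Callable[[Any, Sequence[Any]], float],
    step: int = 50,
) -> List[Tuple[int, float]]:
    """Train on growing prefixes of train_docs and score each model on test_docs."""
    curve = []
    for size in range(step, len(train_docs) + step, step):
        subset = train_docs[:size]          # the final prefix may be shorter than `step`
        model = train_fn(subset)            # a new model is trained from scratch each time
        curve.append((len(subset), eval_fn(model, test_docs)))
    return curve


# Toy stand-ins (assumptions): the "model" is just the training-set size,
# and the "score" is a saturating function of that size.
def toy_train(docs):
    return len(docs)


def toy_eval(model, test_docs):
    return model / (model + 100.0)


for n_docs, score in learning_curve(list(range(120)), [], toy_train, toy_eval):
    print(n_docs, round(score, 3))
```

Keeping the test set fixed across all points ensures that changes in the score reflect only the amount of training data.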

Discussion
This paper addresses a critical component for assessing the consistency of drug information for nursing mothers, namely the information extraction techniques to extract semi-structured information from two drug information sources: manufacturer-supplied drug labels and expert-sourced LactMed. A frame-based approach was devised utilizing seven frames dealing with the availability/quality of lactation information, the effects the drug has on a mother's milk supply, the degree to which the drug is excreted into the milk, the degree to which that drug is absorbed into the child's body, any potential adverse reactions the child may experience due to breastfeeding, recommended alternative drugs while nursing, and any general statements or verdicts on what nursing mothers should do as it relates to the particular drug. Each of these seven frames was double-annotated on a corpus of 450 drug label sections and 450 LactMed article summaries. A standard Bi-LSTM-CRF combining character and word embeddings was trained to extract these frames automatically. Experiments were performed to assess the best set of embeddings to use, the transferability of drug label and LactMed annotations, and whether sufficient annotated data exists to maximize frame extraction performance. These experiments yield several observations that have implications for further development of such a frame extraction system. First, the fact that open-domain embeddings outperformed embeddings trained on the drug labels and LactMed (see Table 4) can be considered a negative, but not entirely conclusive, result. In our initial error analysis it was clear that the lack of embedding information for common terms in the dataset (such as particular drug names) resulted in numerous errors.
We did not experiment with additional embedding combinations, such as concatenating separate embeddings for Wikipedia, MIMIC, and drug label/LactMed, though this concatenation strategy has shown promise in other biomedical tasks (Roberts, 2016).
Second, the experiments demonstrated that training on both drug labels and LactMed improves performance over training on each individually (Table 5). This improvement occurs despite the fact that the drug label and LactMed data appear to be quite different, as can be seen by comparing the result of training on one and testing on the other. For instance, LactMed results are quite poor when only training on drug labels (51.49 F1) and improve significantly when training on LactMed (71.54), but improve further still when training on both drug labels and LactMed (77.18). This would suggest that there is sufficient similarity to train on both, but perhaps domain adaptation methods could be employed to gain the benefits of larger training datasets while still identifying source-specific differences in the data.
Third, the learning curve over the amount of training data (Figure 4) suggests that small gains are still likely to be expected given more data. Our error analysis, however, suggested that many of the "errors" could in fact be considered legitimate frame instances. This is typically a result of inconsistent frame annotation, which is of course quite common in complex semantic annotation tasks. It is therefore clear that further quality control on the existing annotations will likely be a more promising effort than adding further annotations.
Beyond the work described in this paper, there is still a good distance to go before an automatic method exists for detecting inconsistencies in lactation information sources. Notably, this work only extracts the basic frame instances from each of these sources, but does nothing to compare frames. Future work will thus be necessary to compare frame instances from a drug label and its corresponding LactMed article (not to mention that there are often multiple labels per drug). Comparing frames is certainly much easier than comparing full documents, but not without its challenges. Comparing individual frames requires both frame element-specific comparisons (e.g., is "safe during breastfeeding" equivalent to "no major concerns") as well as comparing to null frame elements (e.g., if one frame has a QUANTITY of "high levels" but the other frame has no QUANTITY at all). It is unlikely that simple rule-based procedures can be used to identify equivalent or contradictory frames with high accuracy. However, this need not be the goal. Instead of providing a complete list of all inconsistent labels/articles, a likely application of such a system is to provide a ranked list of labels/articles that are most likely to be incongruous. This approach may have greater robustness to errors in frame matching, and is a likely direction of future work.

Conclusion
This paper described a frame-based approach for lactation information extraction from drug labels and LactMed. Seven lactation-related frames were identified, manually annotated, and automatically extracted using a standard NLP approach. Future work will involve utilizing this system in order to identify discordant information present in drug information sources for nursing mothers.