Enhancing Drug-Drug Interaction Extraction from Texts by Molecular Structure Information

We propose a novel neural method to extract drug-drug interactions (DDIs) from texts using external drug molecular structure information. We encode textual drug pairs with convolutional neural networks and their molecular pairs with graph convolutional networks (GCNs), and then we concatenate the outputs of these two networks. In the experiments, we show that GCNs can predict DDIs from the molecular structures of drugs with high accuracy and that the molecular information can enhance text-based DDI extraction by 2.39 percentage points in F-score on the DDIExtraction 2013 shared task data set.


Introduction
When drugs are concomitantly administered to a patient, their effects may be enhanced or weakened, which may also cause side effects. These kinds of interactions are called drug-drug interactions (DDIs). Several drug databases, such as DrugBank (Law et al., 2014), the Therapeutic Target Database, and PharmGKB (Thorn et al., 2013), are maintained to summarize drug and DDI information. Automatic DDI extraction from texts is expected to support the maintenance of such databases with high coverage and quick updates, which in turn helps medical experts. Deep neural network-based methods have recently drawn considerable attention (Liu et al., 2016; Sahu and Anand, 2017; Zheng et al., 2017; Lim et al., 2018) since they achieve state-of-the-art performance without manual feature engineering.
In parallel to the progress in DDI extraction from texts, Graph Convolutional Networks (GCNs) have been proposed and applied to estimate physical and chemical properties of molecular graphs, such as solubility and toxicity (Duvenaud et al., 2015; Gilmer et al., 2017).
In this study, we propose a novel method that utilizes both textual and molecular information for DDI extraction from texts. Figure 1 gives an overview of the proposed model. We obtain the representations of drug pairs in molecular graph structures using GCNs and concatenate them with the representations of the textual mention pairs obtained by convolutional neural networks (CNNs). We trained the molecule-based model using interacting pairs mentioned in the DrugBank database and then trained the entire model using the labeled pairs in the text data set of the DDIExtraction 2013 shared task (SemEval-2013 Task 9) (Segura . In the experiments, we show that GCNs can predict DDIs from molecular graphs with high accuracy. We also show that molecular information can enhance the performance of DDI extraction from texts by 2.39 percentage points in F-score.
The contribution of this paper is three-fold:
• We propose a novel neural method to extract DDIs from texts with the related molecular structure information.
• We apply GCNs to pairwise drug molecules for the first time and show that GCNs can predict DDIs between drug molecular structures with high accuracy.
• We show that the molecular information is useful in extracting DDIs from texts.

Text-based DDI Extraction
Our model for extracting DDIs from texts is based on the CNN model by Zeng et al. (2014). When an input sentence $S = (w_1, w_2, \cdots, w_N)$ is given, we prepare the word embedding $\mathbf{w}_i^w$ of $w_i$ and the word position embeddings $\mathbf{w}_{i,1}^p$ and $\mathbf{w}_{i,2}^p$ that correspond to the relative positions from the first and second target entities, respectively. We concatenate these embeddings as in Equation (1), and we use the resulting vector as the input to the subsequent convolution layer:

$$\mathbf{w}_i = [\mathbf{w}_i^w; \mathbf{w}_{i,1}^p; \mathbf{w}_{i,2}^p] \quad (1)$$

where $[;]$ denotes concatenation. We calculate the expression for each filter $j$ with the window size $k_l$.
where $L$ is the number of windows, $\mathbf{W}_j^{conv}$ and $\mathbf{b}^{conv}$ are the weight and bias of the CNN, and max indicates max pooling (Boureau et al., 2010).
We convert the output of the convolution layer into a fixed-size vector that represents a textual pair as follows: where $J$ is the number of filters. We get a prediction $\hat{\mathbf{y}}_t$ by the following fully connected neural networks: where $\mathbf{W}_t^{(1)}$ and $\mathbf{W}_t^{(2)}$ are weights and $\mathbf{b}_t^{(1)}$ and $\mathbf{b}_t^{(2)}$ are bias terms.

Figure 1: Overview of the proposed model
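As an illustration of the position features described above, the following minimal Python sketch computes, for each token, its relative position to the first and second target entities; such offsets would typically index the two position embedding tables. It assumes that the two target mentions are marked with the placeholder tokens DRUG1 and DRUG2. The function name `relative_positions` and the clipping distance `max_dist` are hypothetical and are not taken from the paper.

```python
def relative_positions(tokens, max_dist=50):
    """Return (offset to DRUG1, offset to DRUG2) for every token.

    Hypothetical helper for illustration only. Offsets are clipped to
    [-max_dist, max_dist]; the clipping value is an assumption.
    """
    i1 = tokens.index("DRUG1")  # position of the first target entity
    i2 = tokens.index("DRUG2")  # position of the second target entity

    def clip(d):
        return max(-max_dist, min(max_dist, d))

    return [(clip(i - i1), clip(i - i2)) for i in range(len(tokens))]


tokens = "DRUG1 may increase the effect of DRUG2 .".split()
positions = relative_positions(tokens)
```

Each token thus receives two integer offsets, one per target entity, which is what allows the same word to be represented differently depending on where it sits relative to the drug pair.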

Molecular Structure-based DDI Classification
We represent drug pairs in molecular graph structures using two GCN methods: CNNs for fingerprints (NFP) (Duvenaud et al., 2015) and Gated Graph Neural Networks (GGNN). Both convert a drug molecule graph $G$ into a fixed-size vector $\mathbf{h}_g$ by aggregating the representation $\mathbf{h}_v^T$ of each atom node $v$ in $G$. We represent atoms as nodes and bonds as edges in the graph.
NFP first obtains the representation $\mathbf{h}_v^t$ by the following equations (Duvenaud et al., 2015).
These equations involve the degree of a node $v$ and a sigmoid function $\sigma$. NFP then acquires the representation of the graph structure, where $\mathbf{W}^t$ is a weight matrix.
GGNN first obtains the representation $\mathbf{h}_v^t$ by using Gated Recurrent Unit (GRU)-based recurrent neural networks as follows: where $\mathbf{A}_{e_{vw}}$ is a weight for the bond type of each edge $e_{vw}$. GGNN then acquires the representation of the graph structure.
where $i$ and $j$ are linear layers and $\odot$ is the element-wise product.
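We obtain the representation of a molecular pair by concatenating the molecular graph representations of drugs $g_1$ and $g_2$, i.e., $\mathbf{h}_m = [\mathbf{h}_{g_1}; \mathbf{h}_{g_2}]$. We get a prediction $\hat{\mathbf{y}}_m$ as follows: where $\mathbf{W}_m^{(1)}$ and $\mathbf{W}_m^{(2)}$ are weights and $\mathbf{b}_m^{(1)}$ and $\mathbf{b}_m^{(2)}$ are bias terms.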
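To make the graph encoding concrete, the sketch below runs a deliberately simplified neighbour-sum message passing over toy molecular graphs, sums the atom vectors into a fixed-size graph vector, and concatenates two graph vectors into a pair representation. It is neither NFP nor GGNN: it has no learned weights, gates, or bond-type parameters, and the names `propagate`, `readout`, and `pair_vector` as well as the toy feature values are assumptions made for illustration.

```python
def propagate(node_vecs, edges, steps=2):
    """At each step, add to every atom vector the sum of its neighbours' vectors."""
    neighbours = {v: [] for v in range(len(node_vecs))}
    for v, w in edges:  # bonds are treated as undirected edges
        neighbours[v].append(w)
        neighbours[w].append(v)
    h = [list(vec) for vec in node_vecs]
    for _ in range(steps):
        # The comprehension reads the previous h and builds the next one.
        h = [
            [x + sum(h[w][d] for w in neighbours[v]) for d, x in enumerate(h[v])]
            for v in range(len(h))
        ]
    return h


def readout(h):
    """Aggregate atom vectors into one fixed-size graph vector by summation."""
    return [sum(col) for col in zip(*h)]


def pair_vector(graph1, graph2):
    """Concatenate the two graph vectors into a molecular pair vector."""
    return readout(propagate(*graph1)) + readout(propagate(*graph2))


# Toy graphs given as (atom feature vectors, bond list); values are made up.
three_atom_chain = ([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [(0, 1), (1, 2)])
single_atom = ([[0.0, 1.0]], [])
h_m = pair_vector(three_atom_chain, single_atom)
```

Because the readout is a sum over atoms, the resulting vector has the same size regardless of how many atoms a molecule contains, which is the property that lets two molecules of different sizes be concatenated into one pair vector.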

DDI Extraction from Texts Using Molecular Structures
We realize the simultaneous use of textual and molecular information by concatenating the text-based and molecule-based vectors: $\mathbf{h}_{all} = [\mathbf{h}_t; \mathbf{h}_m]$. We normalize the molecule-based vectors.
We then use $\mathbf{h}_{all}$ instead of $\mathbf{h}_t$ in Equation 7.
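The combination step can be sketched as follows. The text states only that the molecule-based vectors are normalized; L2 normalization is assumed here, and `l2_normalize` and `combine` are hypothetical names. An all-zero molecule vector is returned unchanged so that it does not cause a division by zero.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale vec to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm < eps:
        return list(vec)
    return [x / norm for x in vec]


def combine(h_t, h_m):
    """Concatenate the text vector with the normalized molecule vector."""
    return list(h_t) + l2_normalize(h_m)


h_all = combine([0.2, -0.1], [3.0, 4.0])
h_all_unmatched = combine([0.2, -0.1], [0.0, 0.0])
```

Normalizing before concatenation keeps the scale of the molecule-based part comparable across drug pairs, so that it does not dominate the text-based part in the subsequent layers.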
In training, we first train the molecule-based DDI classification model by minimizing the loss function $L_m = -\sum \mathbf{y}_m \log \hat{\mathbf{y}}_m$. We then fix the parameters of the GCNs and train the text-based DDI extraction model by minimizing the loss function $L_t = -\sum \mathbf{y}_t \log \hat{\mathbf{y}}_t$.
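Both losses are cross-entropy losses. The following minimal sketch evaluates such a loss for a single instance, assuming a softmax output layer and a one-hot gold label; the score values are invented for the example and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_true, y_pred, eps=1e-12):
    """L = -sum_c y_c * log(yhat_c) for one instance."""
    return -sum(y * math.log(p + eps) for y, p in zip(y_true, y_pred))


# Five classes: Mechanism, Effect, Advice, Int, and no interaction.
scores = [2.0, 0.5, 0.1, -1.0, 0.3]  # made-up raw scores
y_hat = softmax(scores)
gold = [1, 0, 0, 0, 0]               # one-hot label for Mechanism
loss = cross_entropy(gold, y_hat)
```

With a one-hot label, the sum reduces to the negative log-probability that the model assigns to the gold class.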

Experimental Settings
In this section, we describe the textual and molecular data, the task settings, and the training settings.

Text Corpus and Task Setting
We followed the task setting of Task 9.2 in the DDIExtraction 2013 shared task for the evaluation. This data set is composed of documents annotated with drug mentions and their four types of interactions: Mechanism, Effect, Advice and Int. For the data statistics, please refer to the supplementary materials.
The task is a multi-class classification task, i.e., to classify a given pair of drugs into the four interaction types or no interaction. We evaluated the performance with micro-averaged precision (P), recall (R), and F-score (F) on all the interaction types. We used the official evaluation script provided by the task organizers.

Figure 2: Associating DrugBank entries with texts and molecular graph structures
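For reference, micro-averaged precision, recall, and F-score over the interaction types can be sketched as below. This is an illustrative re-implementation under the assumption that the no-interaction class is excluded from the counts; it is not the official evaluation script, and the label strings are hypothetical.

```python
def micro_prf(gold, pred, negative="none"):
    """Micro-averaged precision, recall, and F-score over interaction types."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != negative)
    n_pred = sum(1 for p in pred if p != negative)  # predicted interactions
    n_gold = sum(1 for g in gold if g != negative)  # annotated interactions
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f


gold = ["effect", "none", "mechanism", "advice", "none"]
pred = ["effect", "effect", "none", "advice", "none"]
p, r, f = micro_prf(gold, pred)
```

Micro-averaging pools the counts of all interaction types before computing the ratios, so frequent types weigh more than rare ones.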
As preprocessing, we split sentences into words using the GENIA tagger (Tsuruoka et al., 2005). We replaced the drug mentions of the target pair with DRUG1 and DRUG2 according to their order of appearance. We also replaced other drug mentions with DRUGOTHER. We did not employ negative instance filtering unlike other existing methods, e.g., Liu et al. (2016), since our focus is to evaluate the effect of the molecular information on texts.
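The mention replacement step can be sketched as follows. The sketch assumes single-token drug mentions whose positions are already known, and the drug names in the toy sentence are invented; `blind_drugs` is a hypothetical helper name.

```python
def blind_drugs(tokens, drug_indices, target_pair):
    """Replace drug mentions with placeholder tokens.

    tokens: list of words; drug_indices: positions of all drug mentions
    (single-token mentions assumed); target_pair: the two positions that
    form the candidate pair.
    """
    first, second = sorted(target_pair)  # order of appearance
    out = list(tokens)
    for i in drug_indices:
        if i == first:
            out[i] = "DRUG1"
        elif i == second:
            out[i] = "DRUG2"
        else:
            out[i] = "DRUGOTHER"
    return out


tokens = "DrugA and DrugB interact , unlike DrugC .".split()
blinded = blind_drugs(tokens, drug_indices=[0, 2, 6], target_pair=(2, 0))
```

A sentence with several drug mentions yields one such blinded instance per candidate pair, each with a different assignment of DRUG1, DRUG2, and DRUGOTHER.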
We linked mentions in texts to DrugBank entries by string matching. We lowercased the mentions and the names in the entries and chose the entries with the most overlaps. As a result, 92.15% and 93.09% of the drug mentions in the training and test data sets, respectively, matched DrugBank entries.
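One possible reading of this linking step is sketched below. The overlap measure is not specified in the text; the sketch assumes it is the number of shared lowercased tokens, and the entry identifiers and names are invented rather than real DrugBank records.

```python
def link_mention(mention, entry_names):
    """Return the id of the entry sharing the most lowercased tokens, or None."""
    mention_tokens = set(mention.lower().split())
    best_id, best_overlap = None, 0
    for entry_id, name in entry_names.items():
        overlap = len(mention_tokens & set(name.lower().split()))
        if overlap > best_overlap:  # ties keep the earlier entry
            best_id, best_overlap = entry_id, overlap
    return best_id


# Hypothetical ids and names for illustration only.
entries = {"ID1": "Alpha Hydrochloride", "ID2": "Beta Sodium", "ID3": "Alpha"}
linked = link_mention("alpha hydrochloride", entries)
unmatched = link_mention("gamma", entries)
```

Mentions for which no entry shares any token remain unlinked, which corresponds to the unmatched fraction of mentions reported above.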

Data and Task for Molecular Structures
We extracted 255,229 interacting (positive) pairs from DrugBank. We note that, unlike text-based interactions, DrugBank only contains the information of interacting pairs; there are no detailed labels and no information for non-interacting (negative) pairs. We thus generated the same number of pseudo-negative pairs by randomly pairing drugs and removing those in positive pairs. To avoid overestimation of the performance, we also deleted drug pairs mentioned in the test set of the text corpus. We split the positive and negative pairs at a ratio of 4:1 into training and test data, and we evaluated the classification accuracy using only the molecular information.
To obtain the graph of a drug molecule, we took the SMILES (Weininger, 1988) string encoding of the molecule from DrugBank and then converted it into a graph using RDKit (Landrum, 2016), as illustrated in Figure 2. For the atom features, we used randomly initialized embedding vectors for each atom type (i.e., C, O, N, ...). We also used four bond types: single, double, triple, and aromatic.
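The construction of pseudo-negative pairs and the 4:1 split can be sketched as follows. The sketch enumerates all candidate pairs, which is a simplification chosen for clarity, and applies the test-set exclusion only to the sampled negatives; the function names, the seed, and the toy drug identifiers are assumptions.

```python
import itertools
import random


def make_negative_pairs(drugs, positive_pairs, excluded_pairs, seed=0):
    """Sample as many non-positive pairs as there are positive pairs."""
    banned = {frozenset(p) for p in positive_pairs}
    banned |= {frozenset(p) for p in excluded_pairs}
    candidates = [
        pair for pair in itertools.combinations(sorted(drugs), 2)
        if frozenset(pair) not in banned
    ]
    # random.sample raises ValueError if there are too few candidates.
    return random.Random(seed).sample(candidates, len(positive_pairs))


def split_4_to_1(pairs, seed=0):
    """Shuffle and split into training (4 parts) and test (1 part) data."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]


drugs = ["A", "B", "C", "D", "E", "F"]
positives = [("A", "B"), ("C", "D"), ("E", "F"), ("A", "C"), ("B", "D")]
text_test_pairs = [("A", "F")]  # pairs mentioned in the text test set
negatives = make_negative_pairs(drugs, positives, text_test_pairs)
train_pos, test_pos = split_4_to_1(positives)
train_neg, test_neg = split_4_to_1(negatives)
```

Since the negatives are sampled rather than verified, some of them may in fact be unrecorded interactions, which is why they are called pseudo-negative pairs.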

Training Settings
We employed mini-batch training using the Adam optimizer (Kingma and Ba, 2015). We used L2 regularization to avoid over-fitting. We tuned the bias term $\mathbf{b}_t^{(2)}$ for negative examples in the final softmax layer. For the hyper-parameters, please refer to the supplementary materials.
We employed pre-trained word embeddings trained with the word2vec tool (Mikolov et al., 2013) on the 2014 MEDLINE/PubMed baseline distribution. The vocabulary size was 215,840. The embeddings of the drug placeholders, i.e., DRUG1 and DRUG2, were initialized with the pre-trained embedding of the word drug. The embeddings of training words that did not appear in the pre-trained embeddings were initialized with the average of all pre-trained word embeddings. Words that appeared only once in the training data were replaced with an UNK word during training, and the embeddings of words in the test data set that appeared in neither the training data nor the pre-trained embeddings were set to the embedding of the UNK word. Word position embeddings were initialized with random values drawn from a uniform distribution.
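The replacement of singleton words can be sketched as below; `replace_singletons` is a hypothetical helper and the two toy sentences are invented.

```python
from collections import Counter


def replace_singletons(sentences, unk="UNK"):
    """Replace words that occur exactly once in the training data with unk."""
    counts = Counter(w for sent in sentences for w in sent)
    return [[w if counts[w] > 1 else unk for w in sent] for sent in sentences]


train_sents = [
    ["DRUG1", "inhibits", "DRUG2"],
    ["DRUG1", "potentiates", "DRUG2"],
]
processed = replace_singletons(train_sents)
```

Training on these replaced tokens gives the UNK embedding a learned value, which can then be reused for unseen words at test time.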
We set the molecule-based vectors of unmatched entities to zero vectors.

Results
Table 1 shows the performance of the DDI extraction models. We show the performance without negative instance filtering or ensembles for a fair comparison. We observe increases in recall and F-score when molecular information is used. The improvements with both GCNs were statistically significant (p < 0.05 for NFP and p < 0.005 for GGNN) with a randomized shuffled test. Table 2 shows F-scores on individual DDI types. The molecular information improves F-scores especially on the Mechanism and Effect types.
We also evaluated the accuracy of binary classification on DrugBank pairs by using only the molecular information in Table 3. The performance is high, although the accuracy is evaluated on automatically generated negative instances.
Finally, we applied the molecule-based DDI classification model trained on DrugBank to the DDIExtraction 2013 task data set. Since DrugBank has no detailed labels, we mapped all four types of interactions to positive interactions and evaluated the classification performance. The results in Table 4 show that GCNs produce higher recall than precision and that the overall performance is low considering the high performance on DrugBank pairs. This might be because the interactions of drugs are not always mentioned in texts even if the drugs can interact with each other and because hedged DDI mentions are annotated as DDIs in the text data set. We also trained the DDI extraction model only with molecular information by replacing $\mathbf{h}_{all}$ with $\mathbf{h}_m$, but the F-scores were quite low (< 5%). These results show that we cannot predict textual relations only with molecular information.

Related Work
Various feature-based methods have been proposed during and after the DDIExtraction-2013 shared task. Kim et al. (2015) proposed a two-phase SVM-based approach that employed a linear SVM with rich features that consist of word, word pair, dependency graph, parse tree, and noun phrase-based constrained coordination features. Zheng et al. (2016) proposed a context vector graph kernel to exploit various types of contexts. Raihani and Laachfoubi (2017) also employed a two-phase SVM-based approach using non-linear kernels, and they proposed five groups of features: word, drug, pair of drugs, main verb, and negative sentence features. Our model does not use any hand-crafted features or kernels.
Various neural DDI extraction models have recently been proposed using CNNs and Recurrent Neural Networks (RNNs). Liu et al. (2016) built a CNN-based model based on word and position embeddings. Zheng et al. (2017) proposed a Bidirectional Long Short-Term Memory RNN (Bi-LSTM)-based model with an input attention mechanism, which obtained target drug-specific word representations before the Bi-LSTM. Lim et al. (2018) proposed a recursive neural network-based model with a subtree containment feature and an ensemble method. This model showed the state-of-the-art performance on the DDIExtraction 2013 shared task data set among systems that do not use negative instance filtering. These approaches did not consider molecular information, and they could also be enhanced by it. Vilar et al. (2017) focused on detecting DDIs from different sources such as pharmacovigilance sources, scientific biomedical literature, and social media. They did not use deep neural networks and they did not consider molecular information.
Learning representations of graphs is widely studied in several tasks such as knowledge base completion, drug discovery, and materials science (Gilmer et al., 2017). Several graph convolutional neural networks have been proposed, such as NFP (Duvenaud et al., 2015), GGNN, and Molecular Graph Convolutions (Kearnes et al., 2016), but they have not been applied to DDI extraction.

Conclusions
We proposed a novel neural method for DDI extraction using both textual and molecular information. The results show that DDIs can be predicted with high accuracy from molecular structure information and that the molecular information can improve DDI extraction from texts by 2.39 percentage points in F-score on the data set of the DDIExtraction 2013 shared task.
As future work, we would like to explore ways to model the textual and molecular representations jointly while alleviating the differences in labels. We will also investigate the use of other information in DrugBank.