MSIT_SRIB at MEDIQA 2019: Knowledge Directed Multi-task Framework for Natural Language Inference in Clinical Domain.

In this paper, we present the Biomedical Multi-Task Deep Neural Network (Bio-MTDNN) for the NLI task of the MEDIQA 2019 challenge. Bio-MTDNN utilizes a transfer-learning-based paradigm in which not only are the source and target domains different, but the source and target tasks also differ, although they are related. Further, Bio-MTDNN integrates knowledge from external sources such as clinical databases (UMLS), enhancing its performance in the clinical domain. Our proposed method outperformed the official baseline and other prior models (such as ESIM and InferSent on the dev set) by a considerable margin, as is evident from our experimental results.


Introduction
The task of natural language inference (NLI) intends to determine whether a given hypothesis can be inferred from a given premise. This task, also referred to as recognizing textual entailment (RTE), is one of the most prevalent tasks among NLP researchers. It has been a significant component of several other language applications such as Information Extraction (IE), Question Answering (QA) and Document Summarization. For example, Harabagiu and Hickl (2006) argue that RTE can enable QA systems to identify correct answers by filtering and reranking candidate answers with respect to a given question. Another approach is proposed by Ben Abacha and Demner-Fushman (2016), whereby the authors employ RTE in the IE/QA domain to answer a question queried by a consumer by retrieving similar questions that have already been answered well by professionals.
In order to address this simple yet challenging task of NLI, several open-domain datasets have been proposed, with Stanford Natural Language Inference (SNLI) (Bowman et al., 2015) and MultiNLI (Williams et al., 2018) being the most popular. They serve as a standard to assess recent NLI systems. However, only a few resources are available in specialized domains such as biomedicine. Language inference in the medical domain is extremely complex and remains less explored by the ML community. This scantiness of adequate resources (in terms of datasets) can be attributed to the fact that patient data is sensitive, is accessible only to authorized medical professionals, and requires domain experts to annotate it, unlike generic domains where one can rely on crowdsourcing-based techniques to acquire annotations.
To this end, Ben Abacha et al. (2019) released a new, expert-annotated dataset for NLI in the clinical domain, named MedNLI, made available through the MIMIC-III derived data repository. Along these lines, the MEDIQA 2019 challenge aims to foster the development of appropriate methods, techniques and standards for inference/entailment in the medical domain, specifically on the MedNLI dataset through a shared task. The task intends to recognize three inference relations between two sentences: entailment, neutral and contradiction.
Previous research associated with the present task includes the work of Romanov and Shivade (2018), who analyzed several state-of-the-art open-domain models for NLI on the MedNLI dataset; their results serve as the baseline for comparison in the above-mentioned shared task. Prior to this, efforts have been made towards the automatic construction of RTE datasets (Ben Abacha and Demner-Fushman, 2016; Abacha et al., 2015) and the application of active learning to small RTE data (Shivade et al., 2015).
Our approach to solving the NLI task on the MedNLI data is based on leveraging the transfer learning paradigm integrated with direct incorporation of domain-specific knowledge from medical knowledge bases (KB). Unlike Romanov and Shivade (2018), who use transfer learning to apply standard NLI models (such as InferSent and ESIM, trained specifically on the NLI task only) to the clinical domain, we employ a multi-task learning (MTL) framework with domain adaptation to learn representations across multiple natural language understanding (NLU) tasks. This approach not only leverages vast amounts of cross-task data but also benefits from a regularization effect that leads to better generalization and facilitates adaptation to new tasks and domains. Besides domain adaptation, we also directly infuse domain-specific knowledge from a database of medical terminologies so as to enable the system to perform well in the clinical domain. The rest of the paper is organized as follows: Section 2 describes the details of our approach, Section 3 demonstrates the experimental results, and we conclude in Section 4.

Approach
This section elaborates on the various methods we experimented with for the NLI task. In order to first establish a simple baseline, we utilize a feature-based system. The extracted features include word containment (Lyon et al., 2001) and Jaccard similarity (unigram, bigram, and trigram) based features. We also use similarity measures over distributed sentence representations obtained using the universal sentence encoder (Cer et al., 2018). We consider Levenshtein and Euclidean distance, negations, and the cosine function as similarity measures. In order to find the n-grams, we utilize the NLTK and scispaCy tokenizers (Neumann et al., 2019). We train a 3-class logistic regression classifier with the above-mentioned features to output the inference relations. We elaborate on the transfer learning and external knowledge integration based methods in the following subsections.
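The overlap features above can be sketched as follows. This is a minimal illustration of containment and n-gram Jaccard similarity on whitespace-tokenized text, not the paper's exact feature extractor (which also uses the NLTK/scispaCy tokenizers and sentence-embedding similarities):

```python
import numpy as np

def ngrams(tokens, n):
    # set of contiguous n-grams of a token list
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def jaccard(a, b):
    # |intersection| / |union|, 0 for two empty sets
    return len(a & b) / len(a | b) if a | b else 0.0

def containment(prem, hyp):
    # fraction of hypothesis words also present in the premise
    p, h = set(prem), set(hyp)
    return len(p & h) / len(h) if h else 0.0

def pair_features(premise, hypothesis):
    p, h = premise.lower().split(), hypothesis.lower().split()
    feats = [containment(p, h)]
    for n in (1, 2, 3):                      # unigram, bigram, trigram overlap
        feats.append(jaccard(ngrams(p, n), ngrams(h, n)))
    return np.array(feats)
```

The resulting feature vectors would then be fed to a 3-class logistic regression classifier over the entailment / neutral / contradiction labels.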

Transfer Learning
Given the vast amounts of data available for open-domain NLU tasks, we leverage them to attack the NLI task on MedNLI. Given a source domain D_S with a corresponding source task T_S, as well as a target domain D_T and a target task T_T, the objective of transfer learning is to learn the target conditional probability distribution P(Y_T | X_T) in D_T using the information gained from D_S and T_S, where D_S ≠ D_T and/or T_S ≠ T_T. X and Y denote the feature and label spaces respectively.
We consider the scenario where D_S ≠ D_T (D_S being open domain and D_T being the clinical domain) and T_S ≠ T_T, with two possibilities for the target task T_T. In the first scenario, we consider a single related T_T; in the second, we leverage a multi-task framework in which we augment T_T with multiple related NLU tasks. For both scenarios, we utilize sequential transfer: a model is pre-trained on the large source-domain data and fine-tuned on the limited target-domain (here, clinical) data. Next, we describe the neural network based models that we utilize.
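The sequential-transfer recipe can be illustrated with a toy model. The sketch below pretrains a logistic-regression classifier on abundant synthetic "source" data and then fine-tunes the same parameters on a handful of "target" examples; the paper's actual models are deep networks, and all data here is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

def train(w, X, y, lr=0.1, epochs=200):
    # plain gradient descent on the logistic loss
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))     # sigmoid predictions
        w = w - lr * X.T @ (p - y) / len(y)  # gradient step
    return w

# Source task: abundant labeled data around a "true" decision direction.
w_true = np.array([2.0, -1.0])
X_src = rng.normal(size=(1000, 2))
y_src = (X_src @ w_true > 0).astype(float)

# Target task: only a handful of examples from a related distribution.
X_tgt = rng.normal(size=(20, 2))
y_tgt = (X_tgt @ w_true > 0).astype(float)

w = train(np.zeros(2), X_src, y_src)    # pre-train on source domain
w = train(w, X_tgt, y_tgt, epochs=50)   # fine-tune on limited target data
```

The key point is that fine-tuning starts from the pre-trained weights rather than from scratch, so the small target set only needs to adapt an already-useful representation.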

Bi-CNN-MI
We leverage the Bi-CNN-MI model (Yin and Schütze, 2015) to realize the single transfer task scenario. This DNN model is trained on the related NLU task of paraphrase identification (PI), which is formalized as a binary classification task: given two sentences, determine whether they both convey roughly the same meaning.
Bi-CNN-MI compares two sentences at multiple levels of granularity (word, short n-gram, long n-gram and sentence) and learns corresponding sentence representations using a convolutional neural network (CNN) based Siamese network. It also captures the interactions between the two sentences by computing an interaction matrix at each level of granularity. This model has been reported to outperform various earlier approaches on PI (Yin and Schütze, 2015). We leverage this model for sequential transfer by learning the model parameters on the PI task and fine-tuning them on the MedNLI dataset. Note that the classification task in MedNLI can also benefit from capturing interactions at various levels of granularity, making it related to PI. At the same time, it differs from PI: the objective in MedNLI is not only to determine whether a pair of sentences conveys the same meaning, but also to distinguish whether they oppose each other or are unrelated.
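The per-granularity interaction matrices can be sketched as follows. Here random vectors stand in for the token representations that Bi-CNN-MI's Siamese CNN would learn, and average-pooled windows stand in for its n-gram features; only the interaction-matrix computation is the point of the sketch:

```python
import numpy as np

def ngram_reprs(emb, n):
    # average-pool consecutive n-gram windows (stand-in for CNN features)
    return np.stack([emb[i:i + n].mean(axis=0)
                     for i in range(len(emb) - n + 1)])

def interaction(a, b):
    # cosine-similarity matrix between all pairs of rows of a and b
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

rng = np.random.default_rng(0)
s1 = rng.normal(size=(5, 8))   # sentence 1: 5 tokens, 8-dim representations
s2 = rng.normal(size=(7, 8))   # sentence 2: 7 tokens

word_level = interaction(s1, s2)                                    # (5, 7)
bigram_level = interaction(ngram_reprs(s1, 2), ngram_reprs(s2, 2))  # (4, 6)
```

Each matrix records how strongly every unit of one sentence matches every unit of the other at that granularity; the full model stacks such matrices across all granularity levels.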

MT-DNN
In the second scenario of transfer learning, we augment the target task T_T with various related NLU tasks and train the model to perform on all of them. This approach not only leverages extensive amounts of data across multiple tasks but also benefits from a regularization effect, leading to better generalization. Essentially, we want to use the knowledge acquired by learning from related tasks to do well on the target task. For this approach, we utilize MT-DNN (Liu et al., 2019), which combines MTL with a pre-trained language model (BERT) to improve the text representations.
The MT-DNN model combines four types of NLU tasks: single-sentence classification (sentiment classification, grammatical acceptability), pairwise text classification (NLI on several corpora, and PI), text similarity scoring (STS-B), and relevance ranking (QNLI). Note that the pairwise text classification task covers the NLI task that we originally intended to address on MedNLI.
The model architecture of MT-DNN consists of lower layers that are shared across all tasks, while the top layers are task-specific output layers. The input X, comprising the premise P and hypothesis H, is concatenated and represented as a sequence of embedding vectors (layer L1). The Transformer encoder (BERT) then captures contextual information in the second layer (L2). This shared semantic representation is trained by the multi-task objectives.
MT-DNN, trained on all of the above-mentioned tasks on open-domain datasets, is then fine-tuned on the MedNLI dataset. In this fine-tuning step, we update the shared weights and the weights associated with only the pairwise text classification task. Essentially, we first capture knowledge from several related NLU tasks, and then adapt the model to the clinical domain.
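The selective update can be sketched as a parameter-group filter: during fine-tuning on MedNLI, only the shared encoder and the pairwise-classification head receive gradient updates. The module names below are illustrative, not MT-DNN's actual parameter names:

```python
# Illustrative parameter registry: shared encoder layers plus one head
# per MT-DNN task type (names are hypothetical).
params = {
    "encoder.layer0": "shared",
    "encoder.layer1": "shared",
    "head.single_sentence": "task",
    "head.pairwise_classification": "task",
    "head.similarity": "task",
    "head.relevance": "task",
}

def trainable_for_mednli(name):
    # fine-tune shared weights + only the pairwise text classification head
    return name.startswith("encoder.") or name == "head.pairwise_classification"

trainable = sorted(p for p in params if trainable_for_mednli(p))
```

In a deep-learning framework this filter would decide which parameter tensors are passed to the optimizer; the frozen heads keep their open-domain weights.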

Knowledge from External Sources
Medical texts often hold relations between entities that require domain-specific knowledge to analyze. For example, the knowledge that pneumonia is a lung disease may not be evident from the clinical text directly. In such scenarios, incorporating external knowledge that conveys such relationships can help. We utilize the UMLS database (restricted to the SNOMED-CT terminology), represented as a graph in which clinical concepts are nodes connected by edges representing relations such as synonymy and parent-child. Next, we discuss the mechanism for incorporating this external knowledge, elaborating our Bio-MTDNN model architecture.

Bio-MTDNN
We propose the Bio-MTDNN model, which integrates domain knowledge on top of the MT-DNN model in a way similar to how interactions are captured in Bi-CNN-MI. Specifically, we calculate an interaction matrix I ∈ R^(N×M) between all pairs of tokens P_i and H_j in the input premise (length N) and hypothesis (length M) respectively. The value in each cell is the length l_ij of the shortest path in SNOMED-CT between the corresponding concepts of the premise and the hypothesis. This matrix is then used to generate knowledge-attended representations P̃ and H̃. Each token representation P̃_i of the premise is a weighted sum of the embeddings H_j^e of the relevant tokens H_j of the hypothesis, with the weights derived from the interaction matrix. Finally, the two knowledge-directed representations (averaged over the token representations) of the premise P̃ and hypothesis H̃ are composed using elementary operations (concatenation, multiplication and subtraction) and fed to a single feed-forward layer. This composed representation is then concatenated with the L2 layer of MT-DNN before being passed to the task-specific layers.
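The knowledge-directed attention can be sketched as follows. The toy concept graph stands in for SNOMED-CT, and the weighting function (inverse of one plus path length, row-normalized) is an illustrative choice, not necessarily the paper's exact formulation:

```python
import numpy as np
from collections import deque

def shortest_path_len(graph, src, dst):
    # breadth-first search over an adjacency dict; inf if unreachable
    seen, q = {src}, deque([(src, 0)])
    while q:
        node, d = q.popleft()
        if node == dst:
            return d
        for nb in graph.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                q.append((nb, d + 1))
    return np.inf

def knowledge_attended(prem_concepts, hyp_concepts, hyp_emb, graph):
    # I[i, j] = shortest path between premise concept i and hyp concept j
    I = np.array([[shortest_path_len(graph, p, h) for h in hyp_concepts]
                  for p in prem_concepts])
    W = 1.0 / (1.0 + I)                    # shorter path -> larger weight
    W = W / W.sum(axis=1, keepdims=True)   # normalize weights per premise token
    return W @ hyp_emb                     # P~_i: weighted sum of hyp embeddings

# Toy fragment of a concept graph (hypothetical edges).
graph = {"pneumonia": ["lung_disease"],
         "lung_disease": ["pneumonia", "disease"],
         "disease": ["lung_disease"]}

hyp_emb = np.eye(2)                        # toy embeddings for 2 hyp tokens
P_tilde = knowledge_attended(["pneumonia"],
                             ["lung_disease", "disease"], hyp_emb, graph)
```

Here "pneumonia" is one hop from "lung_disease" and two from "disease", so its attended representation weights the closer concept more heavily.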
In the above process, the creation of the knowledge-directed representations relies on the input token embeddings of the premise (P_i^e) and hypothesis (H_j^e). One of the simplest options for token embeddings is GloVe (Pennington et al., 2014). However, these embeddings are not specific to the clinical domain, and many tokens would be mapped to the embedding of the unknown (UNK) token. To alleviate this issue, we learned a non-linear transformation (Sharma et al., 2018) that maps words from PubMed embeddings (Pyysalo et al., 2013) into the GloVe subspace. We train the DNN using the words common to both embedding vocabularies, and obtain transformed embeddings for all PubMed words not present in GloVe via the inference step of the learned DNN.
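The idea of fitting a map on the shared vocabulary and applying it to out-of-vocabulary words can be sketched as below. The paper learns a non-linear DNN (Sharma et al., 2018); here a linear least-squares map on synthetic embedding tables stands in for that idea:

```python
import numpy as np

rng = np.random.default_rng(0)
d_pubmed, d_glove, n_shared = 6, 4, 50

# Toy embedding tables for words present in BOTH vocabularies.
X_shared = rng.normal(size=(n_shared, d_pubmed))  # their PubMed vectors
W_true = rng.normal(size=(d_pubmed, d_glove))
Y_shared = X_shared @ W_true                      # their GloVe vectors (toy)

# Fit the map on the shared words...
W, *_ = np.linalg.lstsq(X_shared, Y_shared, rcond=None)

# ...then project a PubMed word that is absent from the GloVe vocabulary.
oov_pubmed = rng.normal(size=(1, d_pubmed))
oov_in_glove_space = oov_pubmed @ W
```

In the paper's setting the projected vectors replace the UNK embedding for clinical terms that GloVe does not cover, so the knowledge-attention step always operates on informative token embeddings.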
Note that we cannot use the embeddings learned in the first layer (L1) of MT-DNN here, as they incorporate segment embeddings of the premise and hypothesis concatenated together; the L1 layer thus learns the interactions between premise and hypothesis in an end-to-end manner. In contrast, we aim to learn interactions directed by the knowledge obtained from UMLS, enabling Bio-MTDNN to incorporate external information.

Setup and Implementation Details
For the feature-based system we used the logistic regression classifier from the scikit-learn library (Pedregosa et al., 2011). We use publicly available implementations of Bi-CNN-MI (https://github.com/chantera/bicnn-mi) and MT-DNN (https://github.com/namisan/mt-dnn). For external knowledge integration, the required SNOMED-CT medical concepts were identified in the premise and hypothesis sentences using MetaMap (Aronson and Lang, 2010). We used GloVe and PubMed word embeddings and the DNN of Sharma et al. (2018) for the non-linear projection. In all experiments we report the average result (on the dev set) of 5 runs with the same hyperparameters and different random seeds. For the best performing systems, we also report results on the test set.

In order to compare against other transfer-learning-based approaches (Romanov and Shivade, 2018), we also report the results of InferSent and ESIM (note that for both these models D_S ≠ D_T and T_S = T_T, unlike the scenarios we considered). Bio-MTDNN outperforms both ESIM and InferSent by significant margins. This can be attributed to the incorporation of external knowledge and to the MTL framework, which empowers the model to learn better shared representations. However, contrary to expectations, the Bi-CNN-MI model performs poorly on the dev set with only 54.1% accuracy, only slightly better than the feature-based baseline at 51.9%. This may be because the knowledge gained by Bi-CNN-MI when trained on the PI task (although a task related to NLI) is not sufficient for the model to separate contradicting premise-hypothesis pairs.

Conclusion
In this paper, we introduced Bio-MTDNN, a knowledge-directed, multi-task learning based language inference model for biomedical text mining. While MT-DNN was built for general-purpose language understanding, Bio-MTDNN effectively leverages domain-specific knowledge from UMLS, as demonstrated by our experimental study. We presented our results on the MedNLI dataset under the MEDIQA challenge. The incorporation of knowledge from external sources such as UMLS gives Bio-MTDNN a performance advantage: our proposed system outperformed the official baseline and other prior models (ESIM and InferSent on the dev set) by a considerable margin.