Wei-Hung Weng


2020

Entity-Enriched Neural Models for Clinical Question Answering
Bhanu Pratap Singh Rawat | Wei-Hung Weng | So Yeon Min | Preethi Raghavan | Peter Szolovits
Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing

We explore state-of-the-art neural models for question answering on electronic medical records and improve their ability to generalize to previously unseen (paraphrased) questions at test time. We do this by learning to predict logical forms as an auxiliary task alongside the main task of answer span detection. The predicted logical forms also serve as a rationale for the answer. We also incorporate medical entity information into these models via the ERNIE architecture. We train our models on the large-scale emrQA dataset and observe that our multi-task, entity-enriched models generalize to paraphrased questions ~5% better than the baseline BERT model.

2019

Publicly Available Clinical BERT Embeddings
Emily Alsentzer | John Murphy | William Boag | Wei-Hung Weng | Di Jin | Tristan Naumann | Matthew McDermott
Proceedings of the 2nd Clinical Natural Language Processing Workshop

Contextual word embedding models such as ELMo and BERT have dramatically improved performance for many natural language processing (NLP) tasks in recent months. However, these models have been minimally explored on specialty corpora such as clinical text; moreover, no publicly available pre-trained BERT models yet exist for the clinical domain. In this work, we address this need by exploring and releasing BERT models for clinical text: one for generic clinical text and another specifically for discharge summaries. We demonstrate that using a domain-specific model yields performance improvements on three of five clinical NLP tasks, establishing a new state of the art on the MedNLI dataset. We find that these domain-specific models are less performant on two clinical de-identification tasks, and argue that this is a natural consequence of the differences between de-identified source text and synthetically non-de-identified task text.