BERT-XML: Large Scale Automated ICD Coding Using BERT Pretraining

Zachariah Zhang, Jingshu Liu, Narges Razavian


Abstract
ICD coding is the task of classifying and coding all diagnoses, symptoms, and procedures associated with a patient's visit. The process is often manual, extremely time-consuming, and expensive for hospitals, as clinical interactions are usually recorded in free-text medical notes. In this paper, we propose a machine learning model, BERT-XML, for large scale automated ICD coding of EHR notes, utilizing recently developed unsupervised pretraining that has achieved state-of-the-art performance on a variety of NLP tasks. We train a BERT model from scratch on EHR notes, learning with a vocabulary better suited for EHR tasks and thus outperforming off-the-shelf models. We further adapt the BERT architecture for ICD coding with multi-label attention. We demonstrate the effectiveness of BERT-based models on the large scale ICD code classification task, using millions of EHR notes to predict thousands of unique codes.
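The abstract's key architectural idea, multi-label attention, lets each ICD code attend to different tokens of the note before classification. Below is a minimal PyTorch sketch of that idea; the layer names, initialization, and shapes are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class MultiLabelAttentionHead(nn.Module):
    """Per-label attention pooling over BERT hidden states (illustrative sketch)."""

    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        # One attention query vector per ICD code.
        self.label_queries = nn.Linear(hidden_size, num_labels, bias=False)
        # One binary classifier per ICD code, applied to its pooled vector.
        self.classifier_weights = nn.Parameter(torch.empty(num_labels, hidden_size))
        self.classifier_bias = nn.Parameter(torch.zeros(num_labels))
        nn.init.xavier_uniform_(self.classifier_weights)

    def forward(self, hidden_states, attention_mask):
        # hidden_states: (batch, seq_len, hidden); attention_mask: (batch, seq_len)
        scores = self.label_queries(hidden_states)           # (batch, seq_len, num_labels)
        scores = scores.masked_fill(attention_mask.unsqueeze(-1) == 0, float("-inf"))
        weights = torch.softmax(scores, dim=1)               # attend over tokens, per label
        # Label-specific document representations: (batch, num_labels, hidden)
        pooled = torch.einsum("bsl,bsh->blh", weights, hidden_states)
        # Per-label logit: dot product with that label's classifier vector.
        logits = (pooled * self.classifier_weights).sum(-1) + self.classifier_bias
        return logits                                        # (batch, num_labels)
```

Since ICD coding is multi-label (a visit can carry many codes), such a head would typically be trained with `nn.BCEWithLogitsLoss` on the per-label logits rather than a softmax over codes.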
Anthology ID:
2020.clinicalnlp-1.3
Volume:
Proceedings of the 3rd Clinical Natural Language Processing Workshop
Month:
November
Year:
2020
Address:
Online
Editors:
Anna Rumshisky, Kirk Roberts, Steven Bethard, Tristan Naumann
Venue:
ClinicalNLP
Publisher:
Association for Computational Linguistics
Pages:
24–34
URL:
https://aclanthology.org/2020.clinicalnlp-1.3
DOI:
10.18653/v1/2020.clinicalnlp-1.3
Cite (ACL):
Zachariah Zhang, Jingshu Liu, and Narges Razavian. 2020. BERT-XML: Large Scale Automated ICD Coding Using BERT Pretraining. In Proceedings of the 3rd Clinical Natural Language Processing Workshop, pages 24–34, Online. Association for Computational Linguistics.
Cite (Informal):
BERT-XML: Large Scale Automated ICD Coding Using BERT Pretraining (Zhang et al., ClinicalNLP 2020)
PDF:
https://aclanthology.org/2020.clinicalnlp-1.3.pdf
Video:
https://slideslive.com/38939836