ChiMed: A Chinese Medical Corpus for Question Answering

Yuanhe Tian, Weicheng Ma, Fei Xia, Yan Song


Abstract
Question answering (QA) is a challenging task in natural language processing (NLP), especially when it is applied to specific domains. While models trained in the general domain can be adapted to a new target domain, their performance often degrades significantly due to domain mismatch. Alternatively, one can require a large amount of domain-specific QA data, but such data are rare, especially for the medical domain. In this study, we first collect a large-scale Chinese medical QA corpus called ChiMed; second, we annotate a small fraction of the corpus to check the quality of the answers; third, we extract two datasets from the corpus and use them for the relevancy prediction task and the adoption prediction task. Several benchmark models are applied to the datasets, producing good results for both tasks.
Anthology ID:
W19-5027
Volume:
Proceedings of the 18th BioNLP Workshop and Shared Task
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
Venue:
BioNLP
SIG:
SIGBIOMED
Publisher:
Association for Computational Linguistics
Pages:
250–260
URL:
https://aclanthology.org/W19-5027
DOI:
10.18653/v1/W19-5027
Cite (ACL):
Yuanhe Tian, Weicheng Ma, Fei Xia, and Yan Song. 2019. ChiMed: A Chinese Medical Corpus for Question Answering. In Proceedings of the 18th BioNLP Workshop and Shared Task, pages 250–260, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
ChiMed: A Chinese Medical Corpus for Question Answering (Tian et al., BioNLP 2019)
PDF:
https://aclanthology.org/W19-5027.pdf
Code
 yuanheTian/ChiMed
Data
ChiMed-VL