Named Entity Recognition for Chinese biomedical patents

Yuting Hu, Suzan Verberne


Abstract
There is a large body of work on Biomedical Entity Recognition (Bio-NER) for English, but only a few attempts have addressed NER for Chinese biomedical texts. Because a growing number of Chinese biomedical discoveries are being patented, and NER models for patent data are lacking, we train and evaluate BERT-based NER models for the analysis of Chinese biomedical patent data. In doing so, we show the value and potential of this domain-specific NER task. To evaluate our methods, we built our own Chinese biomedical patent NER dataset, on which our optimized model achieved an F1 score of 0.54±0.15. Further biomedical analysis indicates that our solution can help detect meaningful biomedical entities and novel gene-gene interactions with limited labeled data, training time, and computing power.
Anthology ID:
2020.coling-main.54
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
627–637
URL:
https://aclanthology.org/2020.coling-main.54
DOI:
10.18653/v1/2020.coling-main.54
Cite (ACL):
Yuting Hu and Suzan Verberne. 2020. Named Entity Recognition for Chinese biomedical patents. In Proceedings of the 28th International Conference on Computational Linguistics, pages 627–637, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
Named Entity Recognition for Chinese biomedical patents (Hu & Verberne, COLING 2020)
PDF:
https://aclanthology.org/2020.coling-main.54.pdf
Code:
yukihuyt/chinese_biomed_patents_ner