Incorporating Figure Captions and Descriptive Text in MeSH Term Indexing

Xindi Wang, Robert E. Mercer


Abstract
The goal of text classification is to automatically assign categories to documents. Deep learning automatically learns effective features from data instead of adopting human-designed features. In this paper, we focus specifically on biomedical document classification using a deep learning approach. We present a novel multichannel TextCNN model for MeSH term indexing. Beyond the normal use of the text from the abstract and title for model training, we also consider figure and table captions, as well as paragraphs associated with the figures and tables. We demonstrate that these latter text sources are important feature sources for our method. A new dataset consisting of these text segments curated from 257,590 full text articles together with the articles’ MEDLINE/PubMed MeSH terms is publicly available.
Anthology ID:
W19-5018
Volume:
Proceedings of the 18th BioNLP Workshop and Shared Task
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Dina Demner-Fushman, Kevin Bretonnel Cohen, Sophia Ananiadou, Junichi Tsujii
Venue:
BioNLP
SIG:
SIGBIOMED
Publisher:
Association for Computational Linguistics
Pages:
165–175
URL:
https://aclanthology.org/W19-5018
DOI:
10.18653/v1/W19-5018
Cite (ACL):
Xindi Wang and Robert E. Mercer. 2019. Incorporating Figure Captions and Descriptive Text in MeSH Term Indexing. In Proceedings of the 18th BioNLP Workshop and Shared Task, pages 165–175, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Incorporating Figure Captions and Descriptive Text in MeSH Term Indexing (Wang & Mercer, BioNLP 2019)
PDF:
https://aclanthology.org/W19-5018.pdf
Code:
xdwang0726/Mesh