ERLKG: Entity Representation Learning and Knowledge Graph based association analysis of COVID-19 through mining of unstructured biomedical corpora

Sayantan Basu, Sinchani Chakraborty, Atif Hassan, Sana Siddique, Ashish Anand


Abstract
We introduce a generic, human-out-of-the-loop pipeline, ERLKG, to perform rapid association analysis of any biomedical entity with other existing entities from a corpus of the same domain. Our pipeline consists of a Knowledge Graph (KG) created from the open-source CORD-19 dataset by fully automating the procedure of information extraction using SciBERT. The best latent entity representations are then found by benchmarking different KG embedding techniques on the task of link prediction using a Graph Convolution Network Auto Encoder (GCN-AE). We demonstrate the utility of ERLKG with respect to COVID-19 through multiple qualitative evaluations. Due to the lack of a gold standard, we propose a relatively large intrinsic evaluation dataset for COVID-19 and use it for validating the top two performing KG embedding techniques. We find TransD to be the best performing KG embedding technique, with Pearson and Spearman correlation scores of 0.4348 and 0.4570, respectively. We demonstrate that a considerable number of ERLKG’s top protein, chemical, and disease predictions are currently in consideration for COVID-19 related research.
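The intrinsic evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the association between two entities is scored as the cosine similarity of their embeddings and that each evaluation pair carries a reference relatedness score; both are assumptions made for illustration, not details confirmed by the abstract. The function names, the toy vectors, and the toy scores are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


def evaluate(embeddings, scored_pairs):
    """Correlate embedding similarities with reference scores.

    embeddings: dict mapping entity name -> vector (hypothetical layout).
    scored_pairs: list of (entity_a, entity_b, reference_score) tuples.
    """
    predicted = [cosine(embeddings[a], embeddings[b]) for a, b, _ in scored_pairs]
    reference = [score for _, _, score in scored_pairs]
    return pearson(predicted, reference), spearman(predicted, reference)


# Toy data, invented for illustration only.
toy_embeddings = {
    "covid-19": [1.0, 0.0],
    "entity_a": [1.0, 0.0],
    "entity_b": [1.0, 1.0],
    "entity_c": [0.0, 1.0],
}
toy_pairs = [
    ("covid-19", "entity_a", 3.0),
    ("covid-19", "entity_b", 2.0),
    ("covid-19", "entity_c", 1.0),
]
toy_pearson, toy_spearman = evaluate(toy_embeddings, toy_pairs)
```

Pearson measures linear agreement between predicted similarities and reference scores, while Spearman only compares their rankings, which is why the two are reported side by side.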
Anthology ID:
2020.sdp-1.15
Volume:
Proceedings of the First Workshop on Scholarly Document Processing
Month:
November
Year:
2020
Address:
Online
Editors:
Muthu Kumar Chandrasekaran, Anita de Waard, Guy Feigenblat, Dayne Freitag, Tirthankar Ghosal, Eduard Hovy, Petr Knoth, David Konopnicki, Philipp Mayr, Robert M. Patton, Michal Shmueli-Scheuer
Venue:
sdp
Publisher:
Association for Computational Linguistics
Pages:
127–137
URL:
https://aclanthology.org/2020.sdp-1.15
DOI:
10.18653/v1/2020.sdp-1.15
Cite (ACL):
Sayantan Basu, Sinchani Chakraborty, Atif Hassan, Sana Siddique, and Ashish Anand. 2020. ERLKG: Entity Representation Learning and Knowledge Graph based association analysis of COVID-19 through mining of unstructured biomedical corpora. In Proceedings of the First Workshop on Scholarly Document Processing, pages 127–137, Online. Association for Computational Linguistics.
Cite (Informal):
ERLKG: Entity Representation Learning and Knowledge Graph based association analysis of COVID-19 through mining of unstructured biomedical corpora (Basu et al., sdp 2020)
PDF:
https://aclanthology.org/2020.sdp-1.15.pdf
Video:
https://slideslive.com/38940725
Code:
sayantanbasu05/ERKLG
Data:
BC5CDR, CORD-19, NCBI Disease