Scalable, Semi-Supervised Extraction of Structured Information from Scientific Literature

Kritika Agrawal, Aakash Mittal, Vikram Pudi


Abstract
As scientific communities grow and evolve, there is a high demand for improved methods for finding relevant papers, comparing papers on similar topics and studying trends in the research community. All these tasks involve the common problem of extracting structured information from scientific articles. In this paper, we propose a novel, scalable, semi-supervised method for extracting relevant structured information from the vast available raw scientific literature. We extract the fundamental concepts of “aim”, “method” and “result” from scientific articles and use them to construct a knowledge graph. Our algorithm makes use of domain-based word embedding and the bootstrap framework. Our experiments show that our system achieves precision and recall comparable to the state of the art. We also show the domain independence of our algorithm by analyzing the research trends of two distinct communities: computational linguistics and computer vision.
Anthology ID:
W19-2602
Volume:
Proceedings of the Workshop on Extracting Structured Knowledge from Scientific Publications
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Editors:
Vivi Nastase, Benjamin Roth, Laura Dietz, Andrew McCallum
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
11–20
URL:
https://aclanthology.org/W19-2602
DOI:
10.18653/v1/W19-2602
Cite (ACL):
Kritika Agrawal, Aakash Mittal, and Vikram Pudi. 2019. Scalable, Semi-Supervised Extraction of Structured Information from Scientific Literature. In Proceedings of the Workshop on Extracting Structured Knowledge from Scientific Publications, pages 11–20, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
Scalable, Semi-Supervised Extraction of Structured Information from Scientific Literature (Agrawal et al., NAACL 2019)
PDF:
https://aclanthology.org/W19-2602.pdf
Data:
SemEval-2017 Task-10