Scientific Discovery as Link Prediction in Influence and Citation Graphs

Fan Luo, Marco A. Valenzuela-Escárcega, Gus Hahn-Powell, Mihai Surdeanu


Abstract
We introduce a machine learning approach for the identification of “white spaces” in scientific knowledge. Our approach addresses this task as link prediction over a graph that contains over 2M influence statements such as “CTCF activates FOXA1”, which were automatically extracted using open-domain machine reading. We model this prediction task using graph-based features extracted from the above influence graph, as well as from a citation graph that captures scientific communities. We evaluated the proposed approach through backtesting. Although the data is heavily unbalanced (50 times more negative examples than positives), our approach predicts which influence links will be discovered in the “near future” with an F1 score of 27 points and a mean average precision of 68%.
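To make the link-prediction framing concrete, here is a minimal sketch of how graph-based features can score candidate links that are not yet in a graph. It is an illustration only, not the authors' feature set or model: the abstract does not list the features used, so common-neighbor count and Jaccard overlap are assumed here as standard stand-ins. The toy edges, the undirected simplification, and the function names (`build_neighbors`, `pair_features`, `rank_candidates`) are all hypothetical.

```python
from itertools import combinations

# Toy edges, invented for illustration. Only "CTCF"-"FOXA1" echoes the
# example in the abstract; the rest are made up. Real influence statements
# are directed; this sketch treats them as undirected for simplicity.
edges = [
    ("CTCF", "FOXA1"),
    ("CTCF", "ESR1"),
    ("FOXA1", "ESR1"),
    ("FOXA1", "GATA3"),
    ("ESR1", "GATA3"),
    ("GATA3", "TP53"),
]


def build_neighbors(edge_list):
    """Map each node to the set of nodes it shares an edge with."""
    neighbors = {}
    for u, v in edge_list:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return neighbors


def pair_features(neighbors, u, v):
    """Two classic link-prediction features for the node pair (u, v)."""
    common = neighbors[u] & neighbors[v]
    union = neighbors[u] | neighbors[v]
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
    }


def rank_candidates(edge_list):
    """Score every currently unlinked pair and sort by Jaccard, highest first."""
    neighbors = build_neighbors(edge_list)
    existing = {frozenset(e) for e in edge_list}
    scored = []
    for u, v in combinations(sorted(neighbors), 2):
        if frozenset((u, v)) in existing:
            continue  # already linked, so not a candidate "white space"
        scored.append((pair_features(neighbors, u, v)["jaccard"], u, v))
    # Ties are broken alphabetically so the ranking is deterministic.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scored


if __name__ == "__main__":
    for score, u, v in rank_candidates(edges):
        print(f"{u} - {v}: jaccard={score:.3f}")
```

In a backtesting setup like the one the abstract describes, such features would be computed on a graph snapshot from an earlier date and a classifier trained on them would be checked against links that appear later; that training and evaluation step is omitted from this sketch.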
Anthology ID:
W18-1701
Volume:
Proceedings of the Twelfth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-12)
Month:
June
Year:
2018
Address:
New Orleans, Louisiana, USA
Editors:
Goran Glavaš, Swapna Somasundaran, Martin Riedl, Eduard Hovy
Venue:
TextGraphs
Publisher:
Association for Computational Linguistics
Pages:
1–6
URL:
https://aclanthology.org/W18-1701
DOI:
10.18653/v1/W18-1701
Cite (ACL):
Fan Luo, Marco A. Valenzuela-Escárcega, Gus Hahn-Powell, and Mihai Surdeanu. 2018. Scientific Discovery as Link Prediction in Influence and Citation Graphs. In Proceedings of the Twelfth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-12), pages 1–6, New Orleans, Louisiana, USA. Association for Computational Linguistics.
Cite (Informal):
Scientific Discovery as Link Prediction in Influence and Citation Graphs (Luo et al., TextGraphs 2018)
PDF:
https://aclanthology.org/W18-1701.pdf