Mixture-of-Partitions: Infusing Large Biomedical Knowledge Graphs into BERT

Zaiqiao Meng, Fangyu Liu, Thomas Clark, Ehsan Shareghi, Nigel Collier


Abstract
Infusing factual knowledge into pre-trained models is fundamental for many knowledge-intensive tasks. In this paper, we propose Mixture-of-Partitions (MoP), an infusion approach that handles a very large knowledge graph (KG) by partitioning it into smaller sub-graphs and infusing the knowledge of each sub-graph into various BERT models using lightweight adapters. To leverage the overall factual knowledge for a target task, these sub-graph adapters are further fine-tuned along with the underlying BERT through a mixture layer. We evaluate MoP with three biomedical BERTs (SciBERT, BioBERT, PubMedBERT) on six downstream tasks (including NLI, QA, and classification). The results show that MoP consistently improves the task performance of the underlying BERTs and achieves new SOTA performance on five of the evaluated datasets.
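The mixture step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the form of the mixture layer, so the softmax gating below is an assumption, and the names `softmax`, `mix_adapter_outputs`, `adapter_outputs`, and `gate_scores` are hypothetical. It shows only how the outputs of K sub-graph adapters might be combined into one vector, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_adapter_outputs(adapter_outputs, gate_scores):
    """Combine K adapter output vectors into one vector.

    adapter_outputs: list of K vectors (lists of floats), one per sub-graph adapter.
    gate_scores: list of K unnormalized scores (assumed to come from a learned gate).
    Returns (mixed_vector, weights), where weights are the softmax of gate_scores.
    """
    if len(adapter_outputs) != len(gate_scores):
        raise ValueError("need exactly one gate score per adapter output")
    weights = softmax(gate_scores)
    dim = len(adapter_outputs[0])
    mixed = [0.0] * dim
    # Weighted sum of the adapter outputs, dimension by dimension.
    for w, vec in zip(weights, adapter_outputs):
        for i, v in enumerate(vec):
            mixed[i] += w * v
    return mixed, weights


# Two hypothetical sub-graph adapters with equal gate scores: the result is their mean.
mixed, weights = mix_adapter_outputs([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

In a real model the gate scores would be computed from the BERT hidden state and trained jointly with the adapters during fine-tuning; here they are passed in directly to keep the example self-contained.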
Anthology ID:
2021.emnlp-main.383
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
4672–4681
URL:
https://aclanthology.org/2021.emnlp-main.383
DOI:
10.18653/v1/2021.emnlp-main.383
Cite (ACL):
Zaiqiao Meng, Fangyu Liu, Thomas Clark, Ehsan Shareghi, and Nigel Collier. 2021. Mixture-of-Partitions: Infusing Large Biomedical Knowledge Graphs into BERT. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 4672–4681, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Mixture-of-Partitions: Infusing Large Biomedical Knowledge Graphs into BERT (Meng et al., EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.383.pdf
Video:
https://aclanthology.org/2021.emnlp-main.383.mp4
Code:
cambridgeltl/mop
Data:
MedQA, PubMedQA