Early Exiting BERT for Efficient Document Ranking

Ji Xin, Rodrigo Nogueira, Yaoliang Yu, Jimmy Lin


Abstract
Pre-trained language models such as BERT have shown their effectiveness in various tasks. Despite their power, they are known to be computationally intensive, which hinders real-world applications. In this paper, we introduce early exiting BERT for document ranking. With a slight modification, BERT becomes a model with multiple output paths, and each inference sample can exit early from these paths. In this way, computation can be effectively allocated among samples, and overall system latency is significantly reduced while the original quality is maintained. Our experiments on two document ranking datasets demonstrate up to 2.5x inference speedup with minimal quality degradation. The source code of our implementation can be found at https://github.com/castorini/earlyexiting-monobert.
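The abstract describes the mechanism only at a high level; the sketch below shows one way the early-exiting idea could look in PyTorch, with a small classifier ("off-ramp") after every transformer layer and a confidence threshold deciding where a given query-document pair stops. The class name EarlyExitRanker, the [CLS]-style pooling, and the max-probability exit criterion are illustrative assumptions, not the paper's released implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn


class EarlyExitRanker(nn.Module):
    """Toy BERT-like encoder with a relevance classifier after every layer.

    Training: every off-ramp produces logits so each can receive a loss.
    Inference: a sample exits at the first off-ramp whose prediction is
    confident enough, so easy pairs use fewer layers than hard ones.
    (Hypothetical sketch; hyperparameters and exit rule are assumptions.)
    """

    def __init__(self, num_layers=12, hidden=768, num_labels=2, threshold=0.9):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=12, batch_first=True)
            for _ in range(num_layers)
        )
        self.classifiers = nn.ModuleList(
            nn.Linear(hidden, num_labels) for _ in range(num_layers)
        )
        self.threshold = threshold

    def forward(self, x):
        # Training mode: collect logits from all off-ramps.
        all_logits = []
        for layer, clf in zip(self.layers, self.classifiers):
            x = layer(x)
            all_logits.append(clf(x[:, 0]))  # pool the first ([CLS]-style) token
        return all_logits

    @torch.no_grad()
    def infer(self, x):
        # Inference mode: process one (query, document) pair and stop at the
        # first off-ramp whose max softmax probability clears the threshold.
        for i, (layer, clf) in enumerate(zip(self.layers, self.classifiers)):
            x = layer(x)
            logits = clf(x[:, 0])
            if logits.softmax(-1).max() >= self.threshold or i == len(self.layers) - 1:
                return logits, i  # relevance logits and the exit layer index


# Minimal usage with a dummy, already-embedded input sequence.
model = EarlyExitRanker()
x = torch.randn(1, 128, 768)
logits, exit_layer = model.infer(x)
```

Lowering the threshold trades ranking quality for speed, since more pairs exit at shallow layers; a threshold of 1.0 recovers the full-depth model.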
Anthology ID:
2020.sustainlp-1.11
Volume:
Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing
Month:
November
Year:
2020
Address:
Online
Editors:
Nafise Sadat Moosavi, Angela Fan, Vered Shwartz, Goran Glavaš, Shafiq Joty, Alex Wang, Thomas Wolf
Venue:
sustainlp
Publisher:
Association for Computational Linguistics
Pages:
83–88
URL:
https://aclanthology.org/2020.sustainlp-1.11
DOI:
10.18653/v1/2020.sustainlp-1.11
Cite (ACL):
Ji Xin, Rodrigo Nogueira, Yaoliang Yu, and Jimmy Lin. 2020. Early Exiting BERT for Efficient Document Ranking. In Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing, pages 83–88, Online. Association for Computational Linguistics.
Cite (Informal):
Early Exiting BERT for Efficient Document Ranking (Xin et al., sustainlp 2020)
PDF:
https://aclanthology.org/2020.sustainlp-1.11.pdf
Video:
https://slideslive.com/38939433
Code:
castorini/earlyexiting-monobert
Data:
MS MARCO