On the weak link between importance and prunability of attention heads

Aakriti Budhraja, Madhura Pande, Preksha Nema, Pratyush Kumar, Mitesh M. Khapra


Abstract
Given the success of Transformer-based models, two directions of study have emerged: interpreting the role of individual attention heads and down-sizing the models for efficiency. Our work straddles these two streams: we analyse the importance of basing pruning strategies on the interpreted role of the attention heads. We evaluate this on Transformer and BERT models across multiple NLP tasks. Firstly, we find that a large fraction of the attention heads can be randomly pruned with limited effect on accuracy. Secondly, for Transformers, we find no advantage in pruning attention heads identified as important by existing studies that relate importance to the location of a head. On the BERT model, too, we find no preference for the top or bottom layers, though the latter are reported to have higher importance. However, strategies that avoid pruning the middle layers and consecutive layers perform better. Finally, during fine-tuning, the compensation for pruned attention heads is roughly equally distributed across the un-pruned heads. Our results thus suggest that interpretation of attention heads does not strongly inform pruning.
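
The random-pruning setting described in the abstract can be illustrated with a short sketch. The snippet below is an illustration, not the authors' exact experimental setup: it randomly removes a fixed fraction of attention heads from a pre-trained BERT model using the Hugging Face transformers library's prune_heads API. The model name and pruning fraction are arbitrary choices for the example.

```python
# Sketch: randomly prune a fraction of attention heads in BERT
# (illustrative only; not the paper's exact experimental protocol).
import random
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
num_layers = model.config.num_hidden_layers   # 12 for bert-base
num_heads = model.config.num_attention_heads  # 12 for bert-base

prune_fraction = 0.5  # fraction of heads to remove, chosen for illustration
all_heads = [(layer, head)
             for layer in range(num_layers)
             for head in range(num_heads)]
sampled = random.sample(all_heads, int(prune_fraction * len(all_heads)))

# Group the sampled heads by layer, as expected by `prune_heads`.
heads_to_prune = {}
for layer, head in sampled:
    heads_to_prune.setdefault(layer, []).append(head)

model.prune_heads(heads_to_prune)
# In the setting studied in the paper, the pruned model would then be
# fine-tuned on the downstream task so that the remaining (un-pruned)
# heads can compensate for the removed ones.
```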
Anthology ID:
2020.emnlp-main.260
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Editors:
Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
3230–3235
URL:
https://aclanthology.org/2020.emnlp-main.260
DOI:
10.18653/v1/2020.emnlp-main.260
Cite (ACL):
Aakriti Budhraja, Madhura Pande, Preksha Nema, Pratyush Kumar, and Mitesh M. Khapra. 2020. On the weak link between importance and prunability of attention heads. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 3230–3235, Online. Association for Computational Linguistics.
Cite (Informal):
On the weak link between importance and prunability of attention heads (Budhraja et al., EMNLP 2020)
PDF:
https://aclanthology.org/2020.emnlp-main.260.pdf
Video:
https://slideslive.com/38939353