Guiding Attention for Self-Supervised Learning with Transformers

Ameet Deshpande; Karthik Narasimhan

doi:10.18653/v1/2020.findings-emnlp.419

Abstract

In this paper, we propose a simple and effective technique to allow for efficient self-supervised learning with bi-directional Transformers. Our approach is motivated by recent studies demonstrating that self-attention patterns in trained models contain a majority of non-linguistic regularities. We propose a computationally efficient auxiliary loss function to guide attention heads to conform to such patterns. Our method is agnostic to the actual pre-training objective and results in faster convergence of models as well as better performance on downstream tasks compared to the baselines, achieving state-of-the-art results in low-resource settings. Surprisingly, we also find that linguistic properties of attention heads are not necessarily correlated with language modeling performance.
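
The abstract describes an auxiliary loss that pushes attention heads toward fixed, non-linguistic patterns. The following is a minimal sketch of that idea, not the authors' implementation: it assumes the auxiliary term is a mean squared error between a head's attention matrix and a target pattern, and that the target is "attend to the previous token". The function names, the choice of pattern, and the `weight` parameter are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only; names, pattern, and weighting are assumptions,
# not taken from the paper or its released code.

def prev_token_pattern(n):
    """Hypothetical target: position i attends to position i-1 (position 0 to itself)."""
    pattern = [[0.0] * n for _ in range(n)]
    for i in range(n):
        pattern[i][max(i - 1, 0)] = 1.0
    return pattern


def attention_guidance_loss(attention, pattern):
    """Mean squared error between one head's attention matrix and a target pattern."""
    n = len(attention)
    total = sum(
        (attention[i][j] - pattern[i][j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    return total / (n * n)


def total_loss(pretraining_loss, attentions, patterns, weight=1.0):
    """Add the auxiliary term to any pre-training loss (e.g. masked language modeling)."""
    aux = sum(
        attention_guidance_loss(a, p) for a, p in zip(attentions, patterns)
    )
    return pretraining_loss + weight * aux


if __name__ == "__main__":
    n = 3
    uniform = [[1.0 / n] * n for _ in range(n)]  # a head that attends evenly
    target = prev_token_pattern(n)
    print(attention_guidance_loss(uniform, target))
    print(total_loss(1.5, [uniform], [target], weight=0.5))
```

Because the auxiliary term depends only on the attention matrices, it can be added to whatever pre-training objective is in use, which is consistent with the abstract's claim that the method is agnostic to that objective.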

Anthology ID: 2020.findings-emnlp.419
Volume: Findings of the Association for Computational Linguistics: EMNLP 2020
Month: November
Year: 2020
Address: Online
Editors: Trevor Cohn, Yulan He, Yang Liu
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 4676–4686
URL: https://aclanthology.org/2020.findings-emnlp.419/
DOI: 10.18653/v1/2020.findings-emnlp.419
Cite (ACL): Ameet Deshpande and Karthik Narasimhan. 2020. Guiding Attention for Self-Supervised Learning with Transformers. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 4676–4686, Online. Association for Computational Linguistics.
Cite (Informal): Guiding Attention for Self-Supervised Learning with Transformers (Deshpande & Narasimhan, Findings 2020)
PDF: https://aclanthology.org/2020.findings-emnlp.419.pdf
Optional supplementary material: 2020.findings-emnlp.419.OptionalSupplementaryMaterial.zip
Video: https://slideslive.com/38940124
Video: https://slideslive.com/38939446
Code: ameet-1997/AttentionGuidance
Data: GLUE, MultiNLI, QNLI
