Investigating Learning Dynamics of BERT Fine-Tuning

Yaru Hao, Li Dong, Furu Wei, Ke Xu


Abstract
The recently introduced pre-trained language model BERT advances the state of the art on many NLP tasks through fine-tuning, but few studies investigate how the fine-tuning process improves model performance on downstream tasks. In this paper, we inspect the learning dynamics of BERT fine-tuning with two indicators. We use Jensen–Shannon (JS) divergence to detect changes in the attention mode and SVCCA distance to examine changes in the feature extraction mode during BERT fine-tuning. We conclude that BERT fine-tuning mainly changes the attention mode of the last layers and modifies the feature extraction mode of the intermediate and last layers. Moreover, we analyze the consistency of BERT fine-tuning across different random seeds and different datasets. In summary, we provide a distinctive understanding of the learning dynamics of BERT fine-tuning, which sheds some light on improving fine-tuning results.
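The two indicators mentioned in the abstract can be made concrete with a short sketch. The code below is a minimal illustration, not the authors' released implementation: it assumes NumPy arrays attn_before and attn_after holding per-head attention distributions over the same tokens (rows summing to 1), feat_before and feat_after holding token features from the same layer before and after fine-tuning, and an arbitrarily chosen SVD truncation size keep_dims.

import numpy as np
from scipy.spatial.distance import jensenshannon

def mean_js_divergence(attn_before, attn_after):
    """Average JS divergence between corresponding attention distributions."""
    # scipy's jensenshannon returns the JS *distance* (sqrt of the divergence),
    # so we square it before averaging over heads and query positions.
    js = jensenshannon(attn_before, attn_after, axis=-1) ** 2
    return float(np.nanmean(js))

def svcca_distance(feat_before, feat_after, keep_dims=20):
    """1 minus the mean canonical correlation after SVD-based reduction."""
    def _reduce(x):
        # Center the features and keep the top singular directions.
        x = x - x.mean(axis=0, keepdims=True)
        u, s, _ = np.linalg.svd(x, full_matrices=False)
        return u[:, :keep_dims] * s[:keep_dims]
    a, b = _reduce(feat_before), _reduce(feat_after)
    # Canonical correlations are the singular values of the product of
    # orthonormal bases for the two reduced feature spaces.
    qa, _ = np.linalg.qr(a)
    qb, _ = np.linalg.qr(b)
    rho = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return 1.0 - float(rho.mean())

# Example with random stand-ins for real BERT activations:
rng = np.random.default_rng(0)
attn_b = rng.dirichlet(np.ones(16), size=(12, 16))  # (heads, seq, seq)
attn_a = rng.dirichlet(np.ones(16), size=(12, 16))
feat_b = rng.normal(size=(128, 768))                # (tokens, hidden)
feat_a = feat_b + 0.1 * rng.normal(size=(128, 768))
print(mean_js_divergence(attn_b, attn_a), svcca_distance(feat_b, feat_a))

In this sketch, a larger JS divergence for a layer indicates a bigger shift in its attention mode, and a larger SVCCA distance indicates a bigger shift in its feature extraction mode.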
Anthology ID: 2020.aacl-main.11
Volume: Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing
Month: December
Year: 2020
Address: Suzhou, China
Editors: Kam-Fai Wong, Kevin Knight, Hua Wu
Venue: AACL
Publisher: Association for Computational Linguistics
Pages: 87–92
URL: https://aclanthology.org/2020.aacl-main.11
Cite (ACL): Yaru Hao, Li Dong, Furu Wei, and Ke Xu. 2020. Investigating Learning Dynamics of BERT Fine-Tuning. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, pages 87–92, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Investigating Learning Dynamics of BERT Fine-Tuning (Hao et al., AACL 2020)
PDF: https://aclanthology.org/2020.aacl-main.11.pdf
Data: MultiNLI, SST, SST-2