Similarity Analysis of Contextual Word Representation Models

John Wu, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, James Glass


Abstract
This paper investigates contextual word representation models through the lens of similarity analysis. Given a collection of trained models, we measure the similarity of their internal representations and attention. Critically, these models come from vastly different architectures. We use existing and novel similarity measures that aim to gauge the level of localization of information in the deep models and that facilitate the investigation of which design factors affect model similarity, without requiring any external linguistic annotation. The analysis reveals that models within the same family are more similar to one another, as may be expected. Surprisingly, different architectures have rather similar representations but different individual neurons. We also observe differences in information localization between lower and higher layers and find that higher layers are more affected by fine-tuning on downstream tasks.
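The contrast the abstract draws between similar representations and different individual neurons can be illustrated with a minimal sketch. It assumes two activation matrices of shape (tokens, neurons) and uses two commonly used measures: a neuron-level maximum-correlation score and a representation-level linear CKA score. The function names and the synthetic data are hypothetical and chosen for illustration; this is not the authors' implementation (see the linked code repository for that).

```python
import numpy as np

def max_corr_similarity(X, Y):
    """Neuron-level score: for each neuron (column) of X, take the largest
    absolute Pearson correlation with any neuron of Y, then average."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    Yz = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    corr = Xz.T @ Yz / X.shape[0]          # (neurons_X, neurons_Y) correlations
    return np.abs(corr).max(axis=1).mean()

def linear_cka(X, Y):
    """Representation-level score: linear centered kernel alignment,
    which is invariant to orthogonal transformations of either space."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    cross = np.linalg.norm(Xc.T @ Yc, "fro") ** 2
    return cross / (np.linalg.norm(Xc.T @ Xc, "fro") * np.linalg.norm(Yc.T @ Yc, "fro"))

# Synthetic stand-in for two models: Y holds the same information as X,
# but spread across neurons by a random rotation (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
Y = X @ Q

print("neuron-level max-corr:", max_corr_similarity(X, Y))
print("representation-level linear CKA:", linear_cka(X, Y))
```

In this toy setting the representation-level score stays at its maximum because the two spaces differ only by a rotation, while the neuron-level score drops because no single neuron in one matrix lines up with a single neuron in the other.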
Anthology ID:
2020.acl-main.422
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2020
Address:
Online
Editors:
Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
4638–4655
URL:
https://aclanthology.org/2020.acl-main.422
DOI:
10.18653/v1/2020.acl-main.422
Cite (ACL):
John Wu, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, and James Glass. 2020. Similarity Analysis of Contextual Word Representation Models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4638–4655, Online. Association for Computational Linguistics.
Cite (Informal):
Similarity Analysis of Contextual Word Representation Models (Wu et al., ACL 2020)
PDF:
https://aclanthology.org/2020.acl-main.422.pdf
Video:
http://slideslive.com/38928932
Code:
johnmwu/contextual-corr-analysis
Data:
GLUE, MultiNLI, SST, SST-2