Context Analysis for Pre-trained Masked Language Models

Yi-An Lai, Garima Lalwani, Yi Zhang


Abstract
Pre-trained language models that learn contextualized word representations from a large un-annotated corpus have become a standard component for many state-of-the-art NLP systems. Despite their successful applications in various downstream NLP tasks, the extent of contextual impact on word representations has not been explored. In this paper, we present a detailed analysis of contextual impact in Transformer- and BiLSTM-based masked language models. We follow two different approaches to evaluate the impact of context: a masking-based approach that is architecture agnostic, and a gradient-based approach that requires back-propagation through networks. The findings suggest significant differences in contextual impact between the two model architectures. Through a further breakdown of the analysis by syntactic category, we find that the contextual impact in Transformer-based MLMs aligns well with linguistic intuition. We further explore Transformer attention pruning based on our findings from the context analysis.
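The masking-based probe described in the abstract can be illustrated with a minimal sketch. The snippet below is not the authors' released code: it assumes HuggingFace's bert-base-uncased, a single masked context token, the final hidden layer, and one minus cosine similarity as the impact measure; all of these choices are illustrative assumptions rather than the paper's exact setup.

# Illustrative sketch of a masking-based context-impact probe.
# Model, layer, and metric are assumptions, not the authors' exact method.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def context_impact(sentence: str, target_idx: int, context_idx: int) -> float:
    """Change in the target token's final-layer representation when one
    context token is replaced by [MASK] (larger value = stronger impact)."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        base = model(**enc).last_hidden_state[0]          # (seq_len, hidden)

    masked_ids = enc["input_ids"].clone()
    masked_ids[0, context_idx] = tokenizer.mask_token_id  # hide one context token
    with torch.no_grad():
        perturbed = model(input_ids=masked_ids,
                          attention_mask=enc["attention_mask"]).last_hidden_state[0]

    return 1.0 - torch.nn.functional.cosine_similarity(
        base[target_idx], perturbed[target_idx], dim=0).item()

# Example: impact of masking "bank" (index 3) on "river" (index 2);
# indices count the [CLS] token at position 0.
print(context_impact("The river bank was muddy.", target_idx=2, context_idx=3))

The gradient-based alternative mentioned in the abstract would instead back-propagate from the target token's representation to the context tokens' input embeddings, which requires access to the model's gradients rather than only its forward pass.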
Anthology ID:
2020.findings-emnlp.338
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2020
Month:
November
Year:
2020
Address:
Online
Editors:
Trevor Cohn, Yulan He, Yang Liu
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3789–3804
URL:
https://aclanthology.org/2020.findings-emnlp.338
DOI:
10.18653/v1/2020.findings-emnlp.338
Cite (ACL):
Yi-An Lai, Garima Lalwani, and Yi Zhang. 2020. Context Analysis for Pre-trained Masked Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 3789–3804, Online. Association for Computational Linguistics.
Cite (Informal):
Context Analysis for Pre-trained Masked Language Models (Lai et al., Findings 2020)
PDF:
https://aclanthology.org/2020.findings-emnlp.338.pdf
Data:
CoNLL 2003, GLUE