Model Extraction and Adversarial Transferability, Your BERT is Vulnerable!

Xuanli He, Lingjuan Lyu, Lichao Sun, Qiongkai Xu


Abstract
Natural language processing (NLP) tasks, ranging from text classification to text generation, have been revolutionised by pretrained language models such as BERT. This allows corporations to easily build powerful APIs by encapsulating fine-tuned BERT models for downstream tasks. However, when a fine-tuned BERT model is deployed as a service, it may suffer from different attacks launched by malicious users. In this work, we first present how an adversary can steal a BERT-based API service (the victim/target model) on multiple benchmark datasets with limited prior knowledge and queries. We further show that the extracted model can lead to highly transferable adversarial attacks against the victim model. Our studies indicate that the potential vulnerabilities of BERT-based API services still hold even when there is an architectural mismatch between the victim model and the attack model. Finally, we investigate two defence strategies to protect the victim model, and find that unless the performance of the victim model is sacrificed, both model extraction and adversarial transferability can effectively compromise the target models.
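The attack pipeline described in the abstract (query the victim service, train an imitation model on its predictions, then craft adversarial examples against the imitation) can be illustrated with a short sketch. The code below is a minimal illustration under stated assumptions, not the authors' released implementation (see xlhex/extract_and_transfer below): the `query_victim` function, the file `attacker_queries.txt`, and the hyperparameters are hypothetical placeholders.

```python
# Minimal sketch of model extraction against a black-box text-classification API.
# Assumptions: `query_victim` stands in for the deployed victim service, and the
# attacker's unlabelled queries come from a plain-text file, one text per line.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def query_victim(texts):
    """Placeholder for the black-box victim API: one predicted label per text."""
    raise NotImplementedError("Replace with calls to the deployed service.")

# 1. Query the victim with the attacker's own (possibly out-of-domain) texts.
queries = [line.strip() for line in open("attacker_queries.txt")]
pseudo_labels = query_victim(queries)  # the victim's predictions become labels

# 2. Fine-tune the attacker's own BERT on the (query, pseudo-label) pairs.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(set(pseudo_labels)))
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):
    for i in range(0, len(queries), 16):  # simple fixed-size mini-batching
        inputs = tokenizer(queries[i:i + 16], padding=True,
                           truncation=True, return_tensors="pt")
        labels = torch.tensor(pseudo_labels[i:i + 16])
        loss = model(**inputs, labels=labels).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# `model` now approximates the victim and can be used offline to craft
# adversarial examples that transfer back to the victim service.
```

As the abstract notes, the extracted model need not share the victim's architecture for the subsequent adversarial examples to transfer.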
Anthology ID:
2021.naacl-main.161
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Editors:
Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
2006–2012
URL:
https://aclanthology.org/2021.naacl-main.161
DOI:
10.18653/v1/2021.naacl-main.161
Cite (ACL):
Xuanli He, Lingjuan Lyu, Lichao Sun, and Qiongkai Xu. 2021. Model Extraction and Adversarial Transferability, Your BERT is Vulnerable!. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2006–2012, Online. Association for Computational Linguistics.
Cite (Informal):
Model Extraction and Adversarial Transferability, Your BERT is Vulnerable! (He et al., NAACL 2021)
PDF:
https://aclanthology.org/2021.naacl-main.161.pdf
Video:
https://aclanthology.org/2021.naacl-main.161.mp4
Code:
xlhex/extract_and_transfer
Data:
AG News