Multi-granularity Textual Adversarial Attack with Behavior Cloning

Yangyi Chen, Jin Su, Wei Wei


Abstract
Recently, textual adversarial attack models have become increasingly popular due to their success in estimating the robustness of NLP models. However, existing works have obvious deficiencies. (1) They usually consider only a single granularity of modification strategy (e.g., word-level or sentence-level), which is insufficient to explore the holistic textual space for generation; (2) they need to query victim models hundreds of times to make a successful attack, which is highly inefficient in practice. To address these problems, in this paper we propose MAYA, a Multi-grAnularitY Attack model, to effectively generate high-quality adversarial samples with fewer queries to victim models. Furthermore, we propose a reinforcement-learning-based method to train a multi-granularity attack agent through behavior cloning with expert knowledge from our MAYA algorithm, further reducing the number of queries. Additionally, we adapt the agent to attack black-box models that output only labels without confidence scores. We conduct comprehensive experiments to evaluate our attack models by attacking BiLSTM, BERT, and RoBERTa in two different black-box attack settings on three benchmark datasets. Experimental results show that our models achieve overall better attack performance and produce more fluent and grammatical adversarial samples than baseline models. In addition, our adversarial attack agent significantly reduces the number of queries in both attack settings. Our code is released at https://github.com/Yangyi-Chen/MAYA.
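To make the behavior-cloning idea in the abstract concrete, the following is a minimal, hypothetical PyTorch sketch (not the authors' released implementation): an attack agent is trained by supervised imitation of (state, action) pairs recorded from an expert multi-granularity attack, so that at attack time the agent can choose edits without querying the victim model at every step. The state dimension, action space, and policy network below are illustrative assumptions.

# A minimal behavior-cloning sketch (assumptions only; see the authors' repo for the real method).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

STATE_DIM = 768      # hypothetical: size of a sentence-encoder feature for the current text
NUM_ACTIONS = 32     # hypothetical: number of candidate word-/sentence-level transformations per step

# Toy expert demonstrations standing in for trajectories collected from the expert attack algorithm.
states = torch.randn(1024, STATE_DIM)
expert_actions = torch.randint(0, NUM_ACTIONS, (1024,))
loader = DataLoader(TensorDataset(states, expert_actions), batch_size=64, shuffle=True)

# The agent is a simple policy network that scores candidate transformations given the state.
policy = nn.Sequential(nn.Linear(STATE_DIM, 256), nn.ReLU(), nn.Linear(256, NUM_ACTIONS))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# Behavior cloning reduces to supervised learning on the expert's (state, action) pairs.
for epoch in range(3):
    for batch_states, batch_actions in loader:
        logits = policy(batch_states)
        loss = criterion(logits, batch_actions)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

At attack time, such an agent would rank candidate edits itself rather than scoring each one with the victim model, which is how imitation of the expert can cut down the number of queries.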
Anthology ID:
2021.emnlp-main.371
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
4511–4526
URL:
https://aclanthology.org/2021.emnlp-main.371
DOI:
10.18653/v1/2021.emnlp-main.371
Cite (ACL):
Yangyi Chen, Jin Su, and Wei Wei. 2021. Multi-granularity Textual Adversarial Attack with Behavior Cloning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 4511–4526, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Multi-granularity Textual Adversarial Attack with Behavior Cloning (Chen et al., EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.371.pdf
Video:
https://aclanthology.org/2021.emnlp-main.371.mp4
Code:
yangyi-chen/maya
Data:
MultiNLI, SST, SST-2