BAE: BERT-based Adversarial Examples for Text Classification

Siddhant Garg, Goutham Ramakrishnan


Abstract
Modern text classification models are susceptible to adversarial examples: perturbed versions of the original text, indiscernible to humans, which get misclassified by the model. Recent works in NLP use rule-based synonym replacement strategies to generate adversarial examples. These strategies can lead to out-of-context and unnaturally complex token replacements, which are easily identifiable by humans. We present BAE, a black-box attack for generating adversarial examples using contextual perturbations from a BERT masked language model. BAE replaces and inserts tokens in the original text by masking a portion of the text and leveraging the BERT-MLM to generate alternatives for the masked tokens. Through automatic and human evaluations, we show that BAE performs a stronger attack than prior work while generating adversarial examples with improved grammaticality and semantic coherence.
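As a concrete illustration of the perturbation described above, the sketch below masks one token and queries a BERT masked language model for in-context replacements (the replace operation; the insert operation instead places a mask adjacent to a token). This is a minimal sketch using the Hugging Face transformers fill-mask pipeline, not the authors' implementation: the example sentence, the masked token, and the top_k value are illustrative assumptions, and BAE additionally ranks tokens by importance and filters candidates for semantic similarity.

from transformers import pipeline

# BERT masked LM used to propose contextual replacements (BAE uses BERT-MLM).
unmasker = pipeline("fill-mask", model="bert-base-uncased")

text = "the acting was really awful"  # illustrative input, not from the paper
tokens = text.split()

# Replace operation: mask the chosen token and ask the BERT-MLM for alternatives.
i = tokens.index("awful")             # in BAE, tokens are chosen by importance ranking
masked = " ".join(tokens[:i] + ["[MASK]"] + tokens[i + 1:])

for candidate in unmasker(masked, top_k=5):
    # Each candidate carries the filled-in token and its MLM probability;
    # BAE then keeps candidates that preserve semantics and checks whether
    # the victim classifier's prediction flips.
    print(candidate["token_str"], round(candidate["score"], 3))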
Anthology ID:
2020.emnlp-main.498
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Editors:
Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
6174–6181
URL:
https://aclanthology.org/2020.emnlp-main.498
DOI:
10.18653/v1/2020.emnlp-main.498
Cite (ACL):
Siddhant Garg and Goutham Ramakrishnan. 2020. BAE: BERT-based Adversarial Examples for Text Classification. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6174–6181, Online. Association for Computational Linguistics.
Cite (Informal):
BAE: BERT-based Adversarial Examples for Text Classification (Garg & Ramakrishnan, EMNLP 2020)
PDF:
https://aclanthology.org/2020.emnlp-main.498.pdf
Video:
https://slideslive.com/38938695
Code:
QData/TextAttack + additional community code
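The TextAttack library linked above ships a recipe for this attack. Below is a hedged usage sketch: BAEGarg2019, HuggingFaceModelWrapper, HuggingFaceDataset, Attacker, and AttackArgs are TextAttack's published API, while the victim checkpoint and dataset are assumptions chosen for illustration.

import transformers
from textattack import Attacker, AttackArgs
from textattack.attack_recipes import BAEGarg2019
from textattack.datasets import HuggingFaceDataset
from textattack.models.wrappers import HuggingFaceModelWrapper

# Assumed victim model: a sentiment classifier fine-tuned on Rotten Tomatoes (MR).
name = "textattack/bert-base-uncased-rotten-tomatoes"
model = transformers.AutoModelForSequenceClassification.from_pretrained(name)
tokenizer = transformers.AutoTokenizer.from_pretrained(name)
model_wrapper = HuggingFaceModelWrapper(model, tokenizer)

# Build the BAE attack recipe and run it on a handful of test examples.
attack = BAEGarg2019.build(model_wrapper)
dataset = HuggingFaceDataset("rotten_tomatoes", split="test")
attacker = Attacker(attack, dataset, AttackArgs(num_examples=10))
attacker.attack_dataset()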
Data:
IMDB-BINARY, MPQA Opinion Corpus, MR, SUBJ, TREC-10