Adversarial Attack and Defense of Structured Prediction Models

Wenjuan Han, Liwen Zhang, Yong Jiang, Kewei Tu


Abstract
Building an effective adversarial attacker and elaborating on countermeasures for adversarial attacks for natural language processing (NLP) have attracted a lot of research in recent years. However, most of the existing approaches focus on classification problems. In this paper, we investigate attacks and defenses for structured prediction tasks in NLP. Besides the difficulty of perturbing discrete words and the sentence fluency problem faced by attackers in any NLP tasks, there is a specific challenge to attackers of structured prediction models: the structured output of structured prediction models is sensitive to small perturbations in the input. To address these problems, we propose a novel and unified framework that learns to attack a structured prediction model using a sequence-to-sequence model with feedbacks from multiple reference models of the same structured prediction task. Based on the proposed attack, we further reinforce the victim model with adversarial training, making its prediction more robust and accurate. We evaluate the proposed framework in dependency parsing and part-of-speech tagging. Automatic and human evaluations show that our proposed framework succeeds in both attacking state-of-the-art structured prediction models and boosting them with adversarial training.
Anthology ID:
2020.emnlp-main.182
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Editors:
Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
2327–2338
Language:
URL:
https://aclanthology.org/2020.emnlp-main.182
DOI:
10.18653/v1/2020.emnlp-main.182
Bibkey:
Cite (ACL):
Wenjuan Han, Liwen Zhang, Yong Jiang, and Kewei Tu. 2020. Adversarial Attack and Defense of Structured Prediction Models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2327–2338, Online. Association for Computational Linguistics.
Cite (Informal):
Adversarial Attack and Defense of Structured Prediction Models (Han et al., EMNLP 2020)
Copy Citation:
PDF:
https://aclanthology.org/2020.emnlp-main.182.pdf
Video:
 https://slideslive.com/38939221
Code
 WinnieHAN/structure_adv