TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP

While there has been substantial research using adversarial attacks to analyze NLP models, each attack is implemented in its own code repository. It remains challenging to develop NLP attacks and utilize them to improve model performance. This paper introduces TextAttack, a Python framework for adversarial attacks, data augmentation, and adversarial training in NLP. TextAttack builds attacks from four components: a goal function, a set of constraints, a transformation, and a search method. TextAttack's modular design enables researchers to easily construct attacks from combinations of novel and existing components. TextAttack provides implementations of 16 adversarial attacks from the literature and supports a variety of models and datasets, including BERT and other transformers, and all GLUE tasks. TextAttack also includes data augmentation and adversarial training modules for using components of adversarial attacks to improve model accuracy and robustness. TextAttack is democratizing NLP: anyone can try data augmentation and adversarial training on any model or dataset, with just a few lines of code. Code and tutorials are available at https://github.com/QData/TextAttack.


Introduction
Over the last few years, there has been growing interest in investigating the adversarial robustness of NLP models, including new methods for generating adversarial examples and better approaches to defending against these adversaries (Alzantot et al., 2018; Kuleshov et al., 2018; Gao et al., 2018; Ebrahimi et al., 2017; Zang et al., 2020; Pruthi et al., 2019). It is difficult to compare these attacks directly and fairly, since they are often evaluated on different data samples and victim models. Re-implementing previous work as a baseline is often time-consuming and error-prone due to a lack of source code, and precisely replicating results is complicated by small details left out of the publication. These barriers make benchmark comparisons hard to trust and severely hinder the development of this field.

[Figure 1: An adversarial example generated with Jin et al. (2019)'s TextFooler for a BERT-based sentiment classifier. Swapping out "perfect" with the synonym "spotless" completely changes the model's prediction, even though the underlying meaning of the text has not changed.]
To encourage the development of the adversarial robustness field, we introduce TextAttack, a Python framework for adversarial attacks, data augmentation, and adversarial training in NLP.
To unify adversarial attack methods into one system, we decompose NLP attacks into four components: a goal function, a set of constraints, a transformation, and a search method. The attack attempts to perturb an input text such that the model output fulfills the goal function (which determines whether the attack is successful) and the perturbation adheres to the set of constraints (e.g., a grammar constraint or a semantic similarity constraint). A search method is used to find a sequence of transformations that produces a successful adversarial example.
This modular design enables us to easily assemble attacks from the literature while reusing components that are shared across attacks.
TextAttack provides clean, readable implementations of 16 adversarial attacks from the literature. For the first time, these attacks can be benchmarked, compared, and analyzed in a standardized setting. TextAttack is directly integrated with HuggingFace's transformers and nlp libraries, allowing users to test attacks on the models and datasets those libraries provide.
TextAttack provides dozens of pre-trained models (LSTM, CNN, and various transformer-based models) on a variety of popular datasets. Currently TextAttack supports a multitude of tasks including summarization, machine translation, and all nine tasks from the GLUE benchmark.
TextAttack also allows users to provide their own models and datasets.
Ultimately, the goal of studying adversarial attacks is to improve model performance and robustness. To that end, TextAttack provides easy-to-use tools for data augmentation and adversarial training. TextAttack's Augmenter class uses a transformation and a set of constraints to produce new samples for data augmentation. Attack recipes are re-used in a training loop that allows models to train on adversarial examples. These tools make it easier to train accurate and robust models.

The TextAttack Framework
TextAttack aims to implement attacks that, given an NLP model, find a perturbation of an input sequence that satisfies the attack's goal and adheres to certain linguistic constraints. In this way, attacking an NLP model can be framed as a combinatorial search problem: the attacker must search the space of potential transformations to find a sequence of transformations that generates a successful adversarial example.
Each attack can be constructed from four components:
1. A task-specific goal function that determines whether the attack is successful in terms of the model outputs.
2. A set of constraints that determine whether a perturbation is valid with respect to the original input.
3. A transformation that, given an input, generates a set of potential perturbations.
4. A search method that successively queries the model and selects promising perturbations from a set of transformations.
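For concreteness, here is a minimal sketch of composing the four components into an attack with TextAttack's Python API. The pre-trained model name and all parameter values are illustrative, and exact import paths may vary across versions:

    import transformers
    from textattack import Attack
    from textattack.models.wrappers import HuggingFaceModelWrapper
    from textattack.goal_functions import UntargetedClassification
    from textattack.constraints.semantics import WordEmbeddingDistance
    from textattack.transformations import WordSwapEmbedding
    from textattack.search_methods import GreedySearch

    # Wrap a pre-trained sentiment classifier so TextAttack can query it.
    model = transformers.AutoModelForSequenceClassification.from_pretrained(
        "textattack/bert-base-uncased-SST-2")
    tokenizer = transformers.AutoTokenizer.from_pretrained(
        "textattack/bert-base-uncased-SST-2")
    model_wrapper = HuggingFaceModelWrapper(model, tokenizer)

    # 1. Goal function: succeed when the predicted label changes.
    goal_function = UntargetedClassification(model_wrapper)
    # 2. Constraints: swapped words must stay close in embedding space.
    constraints = [WordEmbeddingDistance(min_cos_sim=0.8)]
    # 3. Transformation: swap words for their nearest neighbors in
    #    a counter-fitted word embedding space.
    transformation = WordSwapEmbedding(max_candidates=20)
    # 4. Search method: greedily apply the best transformation at each step.
    search_method = GreedySearch()

    attack = Attack(goal_function, constraints, transformation, search_method)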

Developing NLP Attacks with TextAttack
TextAttack is available as a Python package that can be installed from PyPI or downloaded directly from GitHub. TextAttack is also available for use through our demo web app, displayed in Figure 3.
Python users can test attacks by creating and manipulating Attack objects. The command-line API offers textattack attack, which allows users to specify attacks from their four components or from a single attack recipe and test them on different models and datasets.
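For example, a command along the lines of textattack attack --recipe textfooler --model bert-base-uncased-mr --num-examples 100 runs the TextFooler recipe against one of the provided pre-trained BERT classifiers on 100 examples. (The flag names here follow the repository's documentation and may differ across versions.)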

Benchmarking Existing Attacks with Attack Recipes
TextAttack's modular design allows us to implement many different attacks from past work in a shared framework, often by adding only one or two new components. Table 1 categorizes 16 attacks based on their goal functions, constraints, transformations, and search methods.
All of these attacks are implemented as "attack recipes" in TextAttack and can be benchmarked with just a single command. See Appendix A.3 for a comparison between the attack results reported in the original papers and the results achieved by running TextAttack.

Creating New Attacks by Combining Novel and Existing Components
As is clear from Table 1, many components are shared between NLP attacks. New attacks often reuse components from past work, adding one or two novel pieces. TextAttack allows researchers to focus on the generation of new components rather than replicating past results. For example, Jin et al. (2019) introduced TextFooler as a method for attacking classification and entailment models. If a researcher wished to experiment with applying TextFooler's search method, transformations, and constraints to attack translation models, all they would need to do is implement a translation goal function in TextAttack. They could then plug in this goal function to create a novel attack that could be used to analyze translation models.
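As a hedged sketch of what that might look like, the class below marks an attack successful once the model's translation diverges sufficiently from the original translation, measured by BLEU. It follows our reading of TextAttack's goal function interface; the TextToTextGoalFunction base class, the ground_truth_output attribute, and the threshold are assumptions based on the library's design and may differ by version:

    from nltk.translate.bleu_score import sentence_bleu
    from textattack.goal_functions import TextToTextGoalFunction

    class MinimizeTranslationBleu(TextToTextGoalFunction):
        """Succeeds when BLEU vs. the original translation drops below a threshold."""

        def __init__(self, *args, target_bleu=0.2, **kwargs):
            super().__init__(*args, **kwargs)
            self.target_bleu = target_bleu

        def _is_goal_complete(self, model_output, attacked_text):
            return self._get_score(model_output, attacked_text) >= 1.0 - self.target_bleu

        def _get_score(self, model_output, attacked_text):
            # self.ground_truth_output holds the translation of the
            # unperturbed input; the score rises as the output diverges.
            reference = self.ground_truth_output.split()
            candidate = model_output.split()
            return 1.0 - sentence_bleu([reference], candidate)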

Evaluating Attacks on TextAttack's Pre-Trained Models
As of the date of this submission, TextAttack provides users with 82 pre-trained models, including word-level LSTM, word-level CNN, BERT, and other transformer-based models pre-trained on various datasets provided by HuggingFace nlp. Since TextAttack is integrated with the nlp library, it can automatically load the test or validation dataset for the corresponding pre-trained model. While the literature has mainly focused on classification and entailment, TextAttack's pre-trained models enable research on the robustness of models across all GLUE tasks.

Utilizing TextAttack to Improve NLP Models

Evaluating Robustness of Custom Models
TextAttack is model-agnostic, meaning it can run attacks on models implemented in any deep learning framework. Model objects must be able to take a string (or list of strings) and return an output that can be processed by the goal function. For example, machine translation models take a list of strings as input and produce a list of strings as output; classification and entailment models return an array of scores. As long as the user's model meets this specification, it is compatible with TextAttack.
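As an illustration, a minimal wrapper for a HuggingFace-style PyTorch classifier might look like the sketch below (the ModelWrapper base class follows TextAttack's wrappers module; treat the details as assumptions rather than the definitive interface):

    from textattack.models.wrappers import ModelWrapper

    class CustomClassifierWrapper(ModelWrapper):
        """Adapts a model to TextAttack: list of strings in, score array out."""

        def __init__(self, model, tokenizer):
            self.model = model          # any framework: PyTorch, TensorFlow, sklearn...
            self.tokenizer = tokenizer

        def __call__(self, text_input_list):
            inputs = self.tokenizer(text_input_list, padding=True,
                                    truncation=True, return_tensors="pt")
            logits = self.model(**inputs).logits
            return logits.detach().cpu().numpy()  # one row of class scores per input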

Model Training
TextAttack users can train standard LSTM, CNN, and transformer-based models, or a user-customized model, on any dataset from the nlp library using the textattack train command. Just like pre-trained models, user-trained models are compatible with commands like textattack attack and textattack eval.
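For instance, a command along the lines of textattack train --model lstm --dataset rotten_tomatoes --epochs 50 would train the word-level LSTM on the Rotten Tomatoes dataset; exact flag names may vary between versions.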

Data Augmentation

While searching for adversarial examples, TextAttack's transformations generate perturbations of the input text, and its constraints verify their validity. These tools can be reused to dramatically expand the training dataset by introducing perturbed versions of existing samples. The textattack augment command gives users access to a number of pre-packaged recipes for augmenting their dataset. This is a stand-alone feature that can be used with any model or training framework. When using TextAttack's models and training pipeline, textattack train --augment automatically expands the dataset before training begins. Users can specify the fraction of words in each input that should be modified and how many additional versions of each example to create. This makes it easy to apply existing augmentation recipes to different models and datasets, and is a great way to benchmark new techniques. Figure 4 shows empirical results we obtained using TextAttack's augmentation: augmenting with TextAttack immediately improves the performance of a WordCNN model on small datasets.
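In Python, the same functionality is available through pre-built augmentation recipes. A brief sketch using the EmbeddingAugmenter (parameter values are arbitrary, chosen only to mirror the options described above):

    from textattack.augmentation import EmbeddingAugmenter

    # Swap roughly 10% of the words in each example and generate
    # four augmented versions per input.
    augmenter = EmbeddingAugmenter(pct_words_to_swap=0.1,
                                   transformations_per_example=4)
    augmented = augmenter.augment("The service was perfect.")
    # -> a list of four perturbed variants of the input sentence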

Adversarial Training
With textattack train --attack, attack recipes can be used to create new training sets of adversarial examples. After training for a number of epochs on the clean training set, the attack generates an adversarial version of each input. This perturbed version of the dataset is substituted for the original, and is periodically regenerated according to the model's current weaknesses. The resulting model can be significantly more robust against the attack used during training. Table 2 shows the accuracy of a standard LSTM classifier with and without adversarial training against different attack recipes implemented in TextAttack.
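As an illustrative (hypothetical) invocation, textattack train --model lstm --dataset rotten_tomatoes --attack deepwordbug --epochs 75 would train the word-level LSTM against adversarial examples generated by the DeepWordBug recipe; as above, exact flags may differ by version.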

TextAttack Under the Hood
TextAttack is optimized under the hood to make implementing and running adversarial attacks simple and fast.
AttackedText. A common problem with implementations of NLP attacks is that the original text is discarded after tokenization; thus, the transformation is performed on the tokenized version of the text. This causes issues with capitalization and word segmentation. Sometimes attacks swap a piece of a word for a complete word (for example, transforming "aren't" into "aren'too").
To solve this problem, TextAttack stores each input as an AttackedText object, which contains the original text and helper methods for transforming the text while retaining tokenization.
Instead of strings or tensors, classes in TextAttack operate primarily on AttackedText objects. When words are added, swapped, or deleted, an AttackedText can maintain proper punctuation and capitalization. The AttackedText also contains implementations for common linguistic functions like splitting text into words, splitting text into sentences, and part-of-speech tagging.
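A small illustration of the idea, assuming AttackedText is importable from textattack.shared and exposes the word-level helpers described above (method names reflect our reading of the library and may vary):

    from textattack.shared import AttackedText

    text = AttackedText("The service was perfect.")
    print(text.words)       # ['The', 'service', 'was', 'perfect']
    swapped = text.replace_word_at_index(3, "spotless")
    print(swapped.text)     # "The service was spotless." -- punctuation intact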
Caching. Search methods frequently encounter the same input at different points in the search. In these cases, it is wise to pre-store values to avoid unnecessary computation. For each input examined during the attack, TextAttack caches its model output, as well as whether or not it passed all of the constraints. For some search methods, this memoization can save a significant amount of time.
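The idea is ordinary memoization. A simplified sketch of the pattern (not TextAttack's actual internals):

    import functools

    def run_model(text):
        """Hypothetical stand-in for an expensive model query."""
        return 0.0

    @functools.lru_cache(maxsize=2**18)
    def cached_model_output(text):
        # Repeated queries for the same perturbed text hit the cache;
        # TextAttack similarly stores whether each text passed the constraints.
        return run_model(text)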

Related Work
We draw inspiration from the Transformers library (Wolf et al., 2019) as an example of a well-designed Natural Language Processing library. Some of TextAttack's models and tokenizers are implemented using Transformers.
cleverhans (Papernot et al., 2018) is a library for constructing adversarial examples for computer vision models. Like cleverhans, we aim to provide methods that generate adversarial examples across a variety of models and datasets; in some sense, TextAttack strives to be a cleverhans for the NLP community. Like cleverhans, attacks in TextAttack all implement a base Attack class. However, while cleverhans implements many disparate attacks in separate modules, TextAttack builds attacks from a library of shared components. Trickster proposes a method for attacking NLP models based on graph search, but lacks the ability to ensure that generated examples satisfy a given constraint (Kulynych et al., 2018). TEAPOT is a library for evaluating adversarial perturbations on text, but only supports the application of n-gram-based comparisons for evaluating attacks on machine translation models (Michel et al., 2019). Most recently, AllenNLP Interpret includes functionality for running adversarial attacks on NLP models, but is intended only for the purpose of interpretability, and only supports attacks via input reduction or greedy gradient-based word swap (Wallace et al., 2019). TextAttack has a broader scope than any of these libraries: it is designed to be extendable to any NLP attack.

Conclusion
We presented TextAttack, an open-source framework for testing the robustness of NLP models. TextAttack defines an attack in four modules: a goal function, a list of constraints, a transformation, and a search method. This design allows us to compose attacks from previous work out of these modules and to compare them in a shared environment. These attacks can be reused for data augmentation and adversarial training. As new attacks are developed, we will add their components to TextAttack. We hope TextAttack helps lower the barrier to entry for research into robustness and data augmentation in NLP.

Acknowledgements
The authors would like to thank everyone who has contributed to make TextAttack a reality: Hanyu Liu, Kevin Ivey, Bill Zhang, and Alan Zheng, to name a few. Thanks to the IGA creators for contributing an implementation of their algorithm to our framework. Thanks to the folks at HuggingFace for creating such easy-to-use software; without them, TextAttack would not be what it is today.