Textual Adversarial Attack as Combinatorial Optimization

Adversarial attacks are carried out to reveal the vulnerability of deep neural networks. Textual adversarial attack is challenging because text is discrete and any perturbation may cause a large semantic change. Word substitution is an effective class of textual attack methods and has been extensively explored. However, existing word substitution-based attack methods suffer from poor semantic preservation, insufficient candidate adversarial examples, or suboptimal attack results. In this paper, we formalize word substitution-based attack as a combinatorial optimization problem. We also propose a novel attack model, which combines a sememe-based word substitution strategy with the particle swarm optimization algorithm, to tackle these problems. In experiments, we evaluate our attack model on the sentiment analysis task. Experimental results demonstrate that our model achieves higher attack success rates with fewer modifications than the baseline methods. An ablation study also verifies that each of the two components of our model outperforms its counterparts in previous methods.


Introduction
Adversarial attack aims to generate adversarial examples (Szegedy et al., 2013; Goodfellow et al., 2014) by perturbing the original input to fool deep neural networks (DNNs). It is believed that adversarial attacks can reveal the vulnerability of DNNs and help improve their robustness and interpretability. Recently, adversarial attacks on images have been studied extensively (Szegedy et al., 2013; Chen et al., 2018).
Compared with images, adversarial attack on text is more challenging. Text is composed of discrete words, which makes gradient-based methods hard to apply for perturbing it. Moreover, readability and meaning preservation are of vital importance for textual adversarial examples, because any subtle perturbation is perceptible and can lead to a significant semantic difference. Various methods have been proposed to tackle these challenges, such as back-translation (Iyyer et al., 2018), searching in an underlying semantic space (Zhao et al., 2017), character flipping (Ebrahimi et al., 2018), and word substitution (Ribeiro et al., 2018; Alzantot et al., 2018; Ren et al., 2019).
Among these methods, word substitution is promising and has been extensively explored. This is because words are the basic units of a sentence, and properly substituting some words of a sentence rarely makes it unreadable or substantially changes its meaning. Word substitution-based textual attack can be formalized as a combinatorial optimization problem, whose goal is to find the best adversarial example among all sentences obtainable by substituting words. It consists of two steps: (1) generating a set of candidate adversarial examples and (2) searching the candidate set for the best one.
The first step uses a strategy to substitute some words of the original text and generates as many perturbed sentences as possible. Common word substitution strategies include choosing words with the closest word embeddings (Alzantot et al., 2018) and using synonyms as substitutes (Kang et al., 2018; Ren et al., 2019). They suffer from either poor semantic preservation or insufficient candidate adversarial examples. In the second step, a search algorithm is used to find the best adversarial example that successfully fools the target model; examples include the genetic algorithm (Alzantot et al., 2018) and a tailor-made word saliency-based method (Ren et al., 2019). However, these methods have difficulty finding the global optimum.
In this paper, we propose a novel word substitution-based textual attack model that improves both of the aforementioned steps. In the first step, we adopt a sememe-based word substitution strategy, which generates more candidate adversarial examples with better semantic preservation. In the second step, we use particle swarm optimization (Eberhart and Kennedy, 1995) as the adversarial example search algorithm for the first time, which is better at finding the global optimum. In experiments, we use our model to attack the widely used BiLSTM on the task of sentiment analysis. Experimental results demonstrate that our model achieves a higher attack success rate with less modification of the original text than baseline methods. An ablation study also verifies the superiority of both the sememe-based word substitution strategy and the particle swarm optimization search algorithm.
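As a toy illustration of this formalization, the Python sketch below gives each word position a candidate set, treats the search space as the Cartesian product of those sets, and exhaustively returns the sentence with the highest target score. The candidate lists and `toy_target_score` are hypothetical stand-ins for a real substitution strategy and a real victim model, and including the original word in each candidate set is our assumption. Exhaustive search is only feasible at this toy scale, because the space grows multiplicatively with sentence length.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple


def search_space_size(candidates: Sequence[Sequence[str]]) -> int:
    """Number of sentences obtainable by picking one candidate per position."""
    size = 1
    for options in candidates:
        size *= len(options)
    return size


def exhaustive_attack(
    candidates: Sequence[Sequence[str]],
    target_score: Callable[[List[str]], float],
) -> Tuple[Optional[List[str]], float]:
    """Brute-force combinatorial search: return the highest-scoring sentence."""
    best_sentence: Optional[List[str]] = None
    best_score = float("-inf")
    for choice in product(*candidates):
        sentence = list(choice)
        score = target_score(sentence)
        if score > best_score:
            best_sentence, best_score = sentence, score
    return best_sentence, best_score


# Hypothetical candidate sets; the first entry of each list is the original word.
candidates = [["the"], ["film"], ["is"], ["great", "fine", "decent"], ["fun", "enjoyable"]]


def toy_target_score(sentence: List[str]) -> float:
    # Hypothetical stand-in for P(y_target | x): weaker praise scores higher.
    weights = {"great": 0.1, "fine": 0.5, "decent": 0.7, "fun": 0.1, "enjoyable": 0.2}
    return sum(weights.get(w, 0.0) for w in sentence) / 2


size = search_space_size(candidates)
best, score = exhaustive_attack(candidates, toy_target_score)
print(size, best)  # 6 ['the', 'film', 'is', 'decent', 'enjoyable']
```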

Background
In this section, we first briefly introduce sememes and then give an overview of classical particle swarm optimization.

Concept and Applications of Sememes
In linguistics, a sememe is defined as the minimum semantic unit of human languages (Bloomfield, 1926), and the meaning of a word can be represented by the composition of its sememes.
In the field of NLP, sememe knowledge bases are built to utilize sememes in practical applications. HowNet (Dong and Dong, 2006) is the best-known one. It annotates over 100 thousand English and Chinese words with a predefined set of about 2,000 sememes. The sememe annotation of HowNet is sense-level, i.e., for a polysemous word, each of its senses is annotated with sememes separately.
With the help of large sememe knowledge bases like HowNet, sememes have been successfully applied to various NLP tasks such as word representation learning (Niu et al., 2017), sentiment analysis (Fu et al., 2013), language modeling (Gu et al., 2018), and semantic composition (Qi et al., 2019a).

Particle Swarm Optimization
Particle swarm optimization (PSO) was first proposed as an evolutionary computation paradigm (Eberhart and Kennedy, 1995). Inspired by social behavior such as bird flocking and fish schooling, PSO uses a collection of interacting individuals to search a given space for the optimal solution. The collection is called a swarm and the individuals are called particles. Each particle moves through the search space with an adaptable velocity. Formally, when searching in an n-dimensional space $S \subseteq \mathbb{R}^n$ with a swarm of N particles, the i-th particle is an n-dimensional vector

$$X_i = (X_{i1}, X_{i2}, \ldots, X_{in}) \in S,$$

and its velocity is also an n-dimensional vector

$$V_i = (V_{i1}, V_{i2}, \ldots, V_{in}).$$

Next we describe the process of PSO step by step, which is also illustrated in Figure 1.

Initialize Before searching, a swarm of N particles is initialized, and each particle is given a random position and a random velocity whose components lie in $[-V_{max}, V_{max}]$.
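The Initialize step can be sketched in Python as follows. The box-shaped search space `bounds` and the function name `init_swarm` are assumptions for illustration; the text only fixes the velocity range [-V_max, V_max].

```python
import random
from typing import List, Tuple

Swarm = List[List[float]]


def init_swarm(
    num_particles: int,
    dim: int,
    bounds: Tuple[float, float],
    v_max: float,
    rng: random.Random,
) -> Tuple[Swarm, Swarm]:
    """Random position in `bounds` and random velocity in [-v_max, v_max] for every particle."""
    low, high = bounds
    positions = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(num_particles)]
    velocities = [[rng.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(num_particles)]
    return positions, velocities


rng = random.Random(0)  # fixed seed for reproducibility
X, V = init_swarm(num_particles=5, dim=3, bounds=(-10.0, 10.0), v_max=2.0, rng=rng)
print(len(X), len(X[0]))  # 5 3
```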
Record For each particle, the best position it has encountered is recorded as its historical best position. For the swarm, the best position ever reached by any of its particles is recorded as the global best position.
Terminate When the global best position meets the termination condition, i.e., at least one of the particles has found a solution, the algorithm terminates and outputs the global best position.
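The Record step amounts to keeping a running maximum per particle and over the swarm. The Python sketch below assumes a fitness function to be maximized; `record_bests` and the toy objective `neg_sphere` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple


def record_bests(
    positions: Sequence[List[float]],
    personal_best: List[List[float]],
    personal_best_fit: List[float],
    fitness: Callable[[List[float]], float],
) -> Tuple[List[float], float]:
    """Update each particle's historical best in place and return the swarm's global best."""
    for i, x in enumerate(positions):
        f = fitness(x)
        if f > personal_best_fit[i]:
            personal_best[i] = list(x)
            personal_best_fit[i] = f
    # The global best is the best of all historical bests.
    g = max(range(len(personal_best)), key=lambda i: personal_best_fit[i])
    return list(personal_best[g]), personal_best_fit[g]


def neg_sphere(x: List[float]) -> float:
    # Toy fitness to maximize: its peak value 0 is at the origin.
    return -sum(v * v for v in x)


positions = [[1.0, 2.0], [0.5, -0.5], [3.0, 0.0]]
pbest = [list(x) for x in positions]
pbest_fit = [float("-inf")] * len(positions)
gbest, gbest_fit = record_bests(positions, pbest, pbest_fit, neg_sphere)
print(gbest, gbest_fit)  # [0.5, -0.5] -0.5
```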
Update Velocity and Position If the termination condition is not met, the velocity of each particle is updated according to its current position, its historical best position, and the global best position. The velocity and position of the d-th dimension of the i-th particle are updated as follows:

$$V_{id} = \omega V_{id} + c_1 r_1 (P_{id} - X_{id}) + c_2 r_2 (P_{gd} - X_{id}),$$

$$X_{id} = X_{id} + V_{id},$$

where ω is called the inertia weight, $c_1$ and $c_2$ are called acceleration coefficients, $r_1$ and $r_2$ are random numbers drawn uniformly from [0, 1], $P_i$ is the historical best position of the i-th particle, and $P_g$ is the global best position.

PSO is widely used to solve optimization problems such as evolving neural networks (Eberhart and Hu, 1999) and image classification (Omran et al., 2004). However, for many optimization problems whose search space is discrete, such as POS tagging (Silva et al., 2012) and text clustering (Cagnina et al., 2014), concepts such as the velocity and the position in the original PSO are no longer applicable. Kennedy and Eberhart (1997) propose a discrete version of PSO to solve this problem, where the velocity is updated as follows:

$$V_{id} = V_{id} + \phi (P_{id} - X_{id}) + \phi (P_{gd} - X_{id}),$$

where ϕ is a random positive number. In this binary setting, $X_{id} \in \{0, 1\}$ is set to 1 with probability $\mathrm{Sigmoid}(V_{id})$ and to 0 otherwise, where

$$\mathrm{Sigmoid}(V_{id}) = \frac{1}{1 + e^{-V_{id}}}.$$
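For reference, both update rules can be written as short Python functions. Two details are our assumptions rather than statements in the text: velocities are clamped to [-V_max, V_max] after each update (a common PSO practice), and each ϕ is drawn uniformly from [0, 2] (an assumed upper limit). The function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vec = List[float]


def clamp(z: float, v_max: float) -> float:
    return max(-v_max, min(v_max, z))


def update_continuous(x: Vec, v: Vec, p_i: Vec, p_g: Vec, omega: float,
                      c1: float, c2: float, v_max: float,
                      rng: random.Random) -> Tuple[Vec, Vec]:
    """V_id = w*V_id + c1*r1*(P_id - X_id) + c2*r2*(P_gd - X_id); X_id = X_id + V_id."""
    new_x, new_v = [], []
    for d in range(len(x)):
        r1, r2 = rng.random(), rng.random()  # fresh uniform [0, 1) draws per dimension
        vd = clamp(omega * v[d] + c1 * r1 * (p_i[d] - x[d]) + c2 * r2 * (p_g[d] - x[d]), v_max)
        new_v.append(vd)
        new_x.append(x[d] + vd)
    return new_x, new_v


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def update_binary(x: List[int], v: Vec, p_i: List[int], p_g: List[int],
                  v_max: float, rng: random.Random) -> Tuple[List[int], Vec]:
    """Binary PSO: V_id += phi*(P_id - X_id) + phi*(P_gd - X_id); X_id = 1 w.p. Sigmoid(V_id)."""
    new_x, new_v = [], []
    for d in range(len(x)):
        phi1, phi2 = rng.uniform(0.0, 2.0), rng.uniform(0.0, 2.0)  # assumed upper limit of 2
        vd = clamp(v[d] + phi1 * (p_i[d] - x[d]) + phi2 * (p_g[d] - x[d]), v_max)
        new_v.append(vd)
        new_x.append(1 if rng.random() < sigmoid(vd) else 0)
    return new_x, new_v


rng = random.Random(0)
# When a particle already sits on both best positions, only the inertia term acts.
x, v = update_continuous([0.0, 0.0], [1.0, -1.0], [0.0, 0.0], [0.0, 0.0],
                         omega=0.5, c1=2.0, c2=2.0, v_max=4.0, rng=rng)
print(x, v)  # [0.5, -0.5] [0.5, -0.5]
bits, vel = update_binary([0, 1, 0], [0.0, 0.0, 0.0], [1, 1, 0], [1, 0, 0], v_max=4.0, rng=rng)
```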

Methodology
In this section, we first describe our sememe-based word substitution strategy. Then we detail the PSO-based adversarial example search algorithm.

Sememe-based Word Substitution Strategy
By definition, the sememes of a word accurately depict its meaning. Therefore, it is sensible to select words with the same sememes as substitutes. On the one hand, unlike embedding-based word substitution, sememe-based word substitution does not replace a victim word with a merely related word; e.g., "car" and "road" may have close embeddings but have different sememes. Therefore, it better preserves the original semantics and readability. On the other hand, synonym-based substitution depends on thesauri such as WordNet (Miller, 1995), which provide no synonyms for named entities and only a limited number of synonyms for other words. In contrast, HowNet annotates sememes for all kinds of words, including named entities, and its sememe-based description of words is fine-grained, so the sememe-based word substitution strategy can generate many more candidate adversarial examples. In particular, we substitute only notional words and require each substitute to have the same POS tag as the victim word. To handle polysemy, word A can be substituted by word B if one sense of A has the same sememe annotation as one sense of B. Also, to avoid introducing grammatical mistakes, we require the substitute and the victim word to have the same morphological form, including verb tense and noun number.
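A minimal Python sketch of this substitution rule follows. The lexicon, its sememe labels, the POS and morphological-form tags, and the set of notional POS tags are hypothetical placeholders, not real HowNet annotations or the OpenHowNet API; a real implementation would read sense-level annotations from HowNet.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Tuple

NOTIONAL_POS = {"noun", "verb", "adj", "adv"}  # assumed notional-word POS tags


class Entry(NamedTuple):
    pos: str                            # POS tag
    form: str                           # morphological form, e.g. "singular", "plural", "past"
    senses: Tuple[FrozenSet[str], ...]  # one sememe set per sense (sense-level annotation)


def can_substitute(a: Entry, b: Entry) -> bool:
    """B may replace A if both are the same notional POS and form, and some sense of A
    has exactly the same sememe annotation as some sense of B."""
    if a.pos not in NOTIONAL_POS or a.pos != b.pos or a.form != b.form:
        return False
    return any(sa == sb for sa in a.senses for sb in b.senses)


def substitutes(word: str, lexicon: Dict[str, Entry]) -> List[str]:
    entry = lexicon[word]
    return [w for w, e in lexicon.items() if w != word and can_substitute(entry, e)]


# Toy lexicon with made-up sememe labels (not real HowNet data).
lexicon = {
    "car": Entry("noun", "singular", (frozenset({"vehicle", "land"}),)),
    "automobile": Entry("noun", "singular", (frozenset({"vehicle", "land"}),)),
    "cars": Entry("noun", "plural", (frozenset({"vehicle", "land"}),)),
    "road": Entry("noun", "singular", (frozenset({"route", "land"}),)),
    "bank": Entry("noun", "singular", (frozenset({"institution", "money"}),
                                       frozenset({"land", "waters"}))),
    "shore": Entry("noun", "singular", (frozenset({"land", "waters"}),)),
}

print(substitutes("car", lexicon))    # ['automobile']
print(substitutes("shore", lexicon))  # ['bank']
```

The polysemous "bank" matches "shore" through just one of its senses, while "cars" is rejected for its different morphological form and "road" for its different sememes.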

PSO-based Adversarial Example Searching
Before presenting our algorithm, we first explain some of its concepts:
• Particle, Swarm and Search Space Each particle in the swarm is a sentence of n words, which can be regarded as an n-dimensional vector. The set of substitutes for the i-th word in the sentence constitutes the search space of the i-th dimension.
• Target Label and Target Score The target label is the label that we want the target model to predict for the adversarial example. For example, if the true label of the original sentence is "positive", the target label is "negative", and vice versa. The target score is the probability of the target label predicted by the target model, denoted by $P(y_{target} \mid x)$, where $x$ is the input sentence and $y_{target}$ is the target label.
• Modification Rate The modification rate of a particle (sentence) is defined as

$$m(x) = \frac{\mathrm{dif}(x, x_{orig})}{\mathrm{length}(x)},$$

where $\mathrm{dif}(x, x_{orig})$ is the number of words in the perturbed sentence that differ from the original sentence and $\mathrm{length}(x)$ is the length of the perturbed sentence.
• Mutation Since we want to find adversarial examples with as few modifications as possible, we cannot initialize the particles in the swarm at random positions in the search space. Instead, we define an operation on each particle called mutation. For a particle (sentence), one mutation changes one of its dimensions (words).
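These definitions translate directly into small helper functions. The Python sketch assumes binary sentiment labels, that substitution preserves sentence length, and that each dimension's search space is a list of candidate words (possibly including the original word); the function names are hypothetical.

```python
import random
from typing import List, Sequence


def target_label(true_label: str) -> str:
    """Binary sentiment: the target label is the opposite of the true label."""
    return "negative" if true_label == "positive" else "positive"


def modification_rate(x: Sequence[str], x_orig: Sequence[str]) -> float:
    """m(x) = dif(x, x_orig) / length(x)."""
    dif = sum(1 for a, b in zip(x, x_orig) if a != b)
    return dif / len(x)


def mutate(x: Sequence[str], space: Sequence[Sequence[str]], rng: random.Random) -> List[str]:
    """One mutation: change the word in one randomly chosen dimension to another candidate."""
    changeable = [d for d, options in enumerate(space) if any(w != x[d] for w in options)]
    y = list(x)
    if changeable:
        d = rng.choice(changeable)
        y[d] = rng.choice([w for w in space[d] if w != x[d]])
    return y


x_orig = ["the", "film", "is", "great"]
space = [["the"], ["film", "movie"], ["is"], ["great", "good", "fine"]]
rng = random.Random(1)
y = mutate(x_orig, space, rng)
print(target_label("positive"), modification_rate(y, x_orig))  # negative 0.25
```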
We initialize the swarm by applying mutation to the original sentence N times and giving each particle a random velocity between $-V_{max}$ and $V_{max}$. To enhance the diversity of the population, at each iteration in which the termination condition is not reached, we apply mutation to each particle $x$ with probability $P(x)$. To prevent excessive modifications, $P(x)$ is defined in terms of $m(x)$, the modification rate of $x$. For our problem, we define the updating formula of the velocity (changing probability) as follows:

$$V_{id} = \omega V_{id} + (1 - \omega) \left[ \mathrm{equal}(P_{id}, X_{id}) + \mathrm{equal}(P_{gd}, X_{id}) \right],$$

where $V_{id}$ is restricted to $[-V_{max}, V_{max}]$. The function equal is defined as

$$\mathrm{equal}(a, b) = \begin{cases} 1, & a = b, \\ -1, & a \neq b. \end{cases}$$

Here ω is called the active factor. A higher value of ω makes the particle more active in the search space, while a lower value makes it move more easily toward the best positions. To make the swarm more active in the early stage and gather at the best positions in the later stage, we design a time-decreasing ω inspired by Shi and Eberhart (1998). Formally,

$$\omega = (\omega_1 - \omega_2) \times \frac{MaxIter - CurIter}{MaxIter} + \omega_2,$$

where $\omega_1$ and $\omega_2$ are constants between 0 and 1, MaxIter is the maximum number of iterations, and CurIter is the current iteration. As in Kennedy and Eberhart (1997), the particles move to the best positions with probability $\mathrm{Sigmoid}(V_{id})$.
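The discrete update can be sketched in Python as below. This sketch assumes the velocity rule V_id ← ω·V_id + (1 − ω)·[equal(P_id, X_id) + equal(P_gd, X_id)] with equal(a, b) = 1 if a = b and −1 otherwise, the linear schedule ω = (ω1 − ω2)(MaxIter − CurIter)/MaxIter + ω2, and one possible reading of "move to the best positions": each word is copied from the historical best, and then from the global best, each with probability Sigmoid(V_id). Function names are hypothetical, and the mutation probability P(x) is left out.

```python
import math
import random
from typing import List, Sequence, Tuple


def omega_schedule(cur_iter: int, max_iter: int, w1: float = 0.8, w2: float = 0.2) -> float:
    """Time-decreasing active factor: w1 at CurIter = 0, w2 at CurIter = MaxIter."""
    return (w1 - w2) * (max_iter - cur_iter) / max_iter + w2


def equal(a: str, b: str) -> int:
    return 1 if a == b else -1


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def update_particle(x: Sequence[str], v: Sequence[float], p_i: Sequence[str],
                    p_g: Sequence[str], omega: float, v_max: float,
                    rng: random.Random) -> Tuple[List[str], List[float]]:
    new_x, new_v = list(x), []
    for d in range(len(x)):
        vd = omega * v[d] + (1 - omega) * (equal(p_i[d], x[d]) + equal(p_g[d], x[d]))
        vd = max(-v_max, min(v_max, vd))  # keep V_id in [-V_max, V_max]
        new_v.append(vd)
        # Hypothetical move rule: copy each best word with probability Sigmoid(V_id).
        if rng.random() < sigmoid(vd):
            new_x[d] = p_i[d]
        if rng.random() < sigmoid(vd):
            new_x[d] = p_g[d]
    return new_x, new_v


rng = random.Random(0)
x = ["the", "film", "is", "great"]
new_x, new_v = update_particle(x, [0.0] * 4, x, x, omega=0.5, v_max=3.0, rng=rng)
print(new_x == x, new_v)  # True [1.0, 1.0, 1.0, 1.0]
```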
At each iteration, if the predicted label of any particle in the swarm is the target label, that particle is output and the algorithm terminates. Otherwise, all particles in the swarm update their velocities and positions, and the next iteration starts.
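Putting the pieces together, the loop below is a self-contained toy version of the search in Python. Everything specific here is an assumption for illustration: `toy_target_prob` is a hypothetical victim model; "predicted as the target label" is taken to mean a target probability above 0.5 (binary classification); the velocity rule is assumed to be ω·V + (1 − ω)·[equal(P_i, X) + equal(P_g, X)] with a linearly decreasing ω; and the mutation probability 1 − m(x) is a placeholder, not the paper's P(x).

```python
import math
import random
from typing import Callable, List, Optional, Sequence


def pso_attack(x_orig: List[str], space: Sequence[Sequence[str]],
               target_prob: Callable[[List[str]], float], n_particles: int = 8,
               max_iter: int = 20, w1: float = 0.8, w2: float = 0.2,
               v_max: float = 3.0, seed: int = 0) -> Optional[List[str]]:
    rng = random.Random(seed)
    n = len(x_orig)

    def sigmoid(z: float) -> float:
        return 1.0 / (1.0 + math.exp(-z))

    def equal(a: str, b: str) -> int:
        return 1 if a == b else -1

    def mutate(x: List[str]) -> List[str]:
        dims = [d for d in range(n) if any(w != x[d] for w in space[d])]
        y = list(x)
        if dims:
            d = rng.choice(dims)
            y[d] = rng.choice([w for w in space[d] if w != x[d]])
        return y

    def mod_rate(x: List[str]) -> float:
        return sum(a != b for a, b in zip(x, x_orig)) / n

    # Initialize: N mutations of the original sentence, random velocities.
    X = [mutate(x_orig) for _ in range(n_particles)]
    V = [[rng.uniform(-v_max, v_max) for _ in range(n)] for _ in range(n_particles)]
    P = [list(x) for x in X]             # historical best positions
    P_fit = [target_prob(x) for x in X]  # their target scores

    for cur_iter in range(max_iter):
        # Terminate: output the first particle predicted as the target label.
        for x in X:
            if target_prob(x) > 0.5:
                return x
        g = max(range(n_particles), key=lambda i: P_fit[i])  # global best index
        P_g = list(P[g])
        omega = (w1 - w2) * (max_iter - cur_iter) / max_iter + w2
        for i in range(n_particles):
            for d in range(n):
                vd = omega * V[i][d] + (1 - omega) * (
                    equal(P[i][d], X[i][d]) + equal(P_g[d], X[i][d]))
                V[i][d] = max(-v_max, min(v_max, vd))
                if rng.random() < sigmoid(V[i][d]):
                    X[i][d] = P[i][d]
                if rng.random() < sigmoid(V[i][d]):
                    X[i][d] = P_g[d]
            if rng.random() < 1 - mod_rate(X[i]):  # placeholder mutation probability
                X[i] = mutate(X[i])
            fit = target_prob(X[i])
            if fit > P_fit[i]:  # Record: update the historical best
                P[i], P_fit[i] = list(X[i]), fit
    for x in X:  # final check after the last update
        if target_prob(x) > 0.5:
            return x
    return None


x_orig = ["the", "film", "is", "great", "fun"]
space = [["the"], ["film", "movie"], ["is", "was"], ["great", "passable", "dull"], ["fun", "tedious"]]


def toy_target_prob(x: List[str]) -> float:
    # Hypothetical victim model: P(negative) rises with weak or negative words.
    weights = {"passable": 0.2, "dull": 0.5, "tedious": 0.5}
    return min(1.0, 0.05 + sum(weights.get(w, 0.0) for w in x))


adv = pso_attack(x_orig, space, toy_target_prob)
print(adv)
```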

Experiments
In this section, we evaluate our attack model on the sentiment analysis task and compare the attack performance of different word substitution strategies and different search algorithms.
Dataset We use OpenHowNet (Qi et al., 2019b), the open API of HowNet, to obtain the sememe annotations of words. We use the Stanford Sentiment Treebank (SST) dataset (Socher et al., 2013) as the evaluation dataset of the target model.

Target Model
We train a BiLSTM with max-pooling as the target model for evaluating the attack methods. We use 300-dimensional pre-trained GloVe (Pennington et al., 2014) word embeddings as the inputs of the model. The test accuracy of the target model is 83.8%.

Baseline Model
We choose the attack model of Alzantot et al. (2018) as our main baseline; it adopts embedding-based word substitution and uses a genetic algorithm (GA) to search for the best adversarial examples. To further analyze our model, we also conduct an ablation study that combines different word substitution strategies (embedding-, synonym-, and sememe-based) with different search algorithms (GA and PSO).
Evaluation Metrics First, we randomly sample 1,000 correctly classified examples from the test set as the evaluation set for the attack models. We then evaluate the attack models in terms of both success rate and modification rate. For each instance in the evaluation set, an attack counts as successful only if its adversarial example is classified by the target model with a different label (e.g., negative to positive) within at most G iterations.
Experimental Settings Following the settings of Alzantot et al. (2018), we set the maximum number of iterations G to 20. We tune PSO on the validation set of SST and set $\omega_1$ to 0.8 and $\omega_2$ to 0.2. We set the maximum particle velocity $V_{max}$ to 3, so the changing probability of the particles ranges from 0.047 (Sigmoid(−3)) to 0.953 (Sigmoid(3)).
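The two metrics can be computed as in the Python sketch below. Two choices are assumptions, since the text does not specify them: a failed attack is represented by `None`, and the modification rate is averaged over successful attacks only. The function name is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def attack_metrics(originals: Sequence[List[str]],
                   adversarials: Sequence[Optional[List[str]]]) -> Tuple[float, float]:
    """Return (success rate over all instances, mean modification rate over successes)."""
    rates = []
    for x_orig, x_adv in zip(originals, adversarials):
        if x_adv is not None:  # None marks an attack that never flipped the label within G iterations
            dif = sum(a != b for a, b in zip(x_adv, x_orig))
            rates.append(dif / len(x_adv))
    success_rate = len(rates) / len(originals)
    mean_mod_rate = sum(rates) / len(rates) if rates else 0.0
    return success_rate, mean_mod_rate


originals = [["a", "b", "c", "d"]] * 4
adversarials = [["a", "x", "c", "d"], None, ["y", "x", "c", "d"], None]
print(attack_metrics(originals, adversarials))  # (0.5, 0.375)
```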
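The quoted probability range can be checked numerically. The Python sketch below also evaluates a linearly decreasing ω with these settings; the linear form ω = (ω1 − ω2)(MaxIter − CurIter)/MaxIter + ω2 is our assumption about the schedule, not a value reported in the experiments.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def omega(cur_iter: int, max_iter: int = 20, w1: float = 0.8, w2: float = 0.2) -> float:
    # Assumed linear time-decreasing schedule from w1 down to w2.
    return (w1 - w2) * (max_iter - cur_iter) / max_iter + w2


V_MAX = 3.0
low, high = sigmoid(-V_MAX), sigmoid(V_MAX)
print(round(low, 3), round(high, 3))              # 0.047 0.953
print([round(omega(t), 2) for t in (0, 10, 20)])  # [0.8, 0.5, 0.2]
```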
Results The attack success rates and modification rates of all models are shown in Table 1. We can observe that:
1) Our model (PSO+Sememe) achieves a higher success rate and a lower modification rate than all the baseline methods, which demonstrates the effectiveness of our attack method.
2) Among the three word substitution strategies, the sememe-based strategy performs best in terms of both attack success rate and modification rate. We also give an example of substitutes in Table 2. It shows that the sememe-based word substitution strategy can find more diverse substitutes while preserving semantics and readability.
Original sentence: The mushroom soup and roast-duck are delicious and I also like the salt and pepper.
Embedding-based substitutes:
Sememe-based substitutes: Quanjude, mutton, roast, roast-chicken, duck-store, braised-chicken, snacks, oven cake, rice, porridge, bread, tofu, chocolate, wonton, food, set-meal
Table 2: An example of the substitutes found by the embedding-based and sememe-based strategies, where the victim word is colored green and substitutes that fit well in the context are colored red.
3) Compared with the genetic algorithm, our PSO-based search algorithm finds more successful adversarial examples with fewer modifications.

Conclusion and Future Work
In this paper, we formalize word substitution-based textual adversarial attack as a combinatorial optimization problem. We also propose an attack model comprising a sememe-based word substitution strategy and the particle swarm optimization algorithm. We evaluate our attack model on the task of sentiment analysis and find that it achieves higher attack success rates and lower modification rates than baseline methods. In the future, we will explore sememe-based adversarial defense methods.