BBAEG: Towards BERT-based Biomedical Adversarial Example Generation for Text Classification

Healthcare predictive analytics aids medical decision-making, diagnosis prediction and drug review analysis. Prediction accuracy is therefore an important criterion, which in turn necessitates robust predictive language models. However, deep learning models have been shown to be vulnerable to insignificantly perturbed input instances that humans are unlikely to misclassify. Recent efforts to generate adversarial examples using rule-based synonyms and BERT-MLMs have appeared in the general domain, but the ever-increasing biomedical literature poses unique challenges. We propose BBAEG (Biomedical BERT-based Adversarial Example Generation), a black-box attack algorithm for biomedical text classification that combines domain-specific synonym replacement for biomedical named entities with BERT-MLM predictions, spelling variation and number replacement. Through automatic and human evaluation on two datasets, we demonstrate that BBAEG performs a stronger attack with better language fluency and semantic coherence than prior work.


Introduction
Recent studies have exposed the importance of biomedical NLP for human well-being, in particular for the critical process of medical decision-making. However, dialogue management tools for medical conversations between patients and healthcare providers (Zhang et al., 2020), (Campillos Llanos et al., 2017), (Kazi and Kahanda, 2019), which assist diagnosis, may introduce insignificant perturbations (spelling errors, paraphrasing). When such text is fed to a classifier that determines the type of diagnosis required, detects adverse drug effects or recommends drugs, the classifier may perform unreasonably poorly. Insignificant perturbations might also creep in from the casual language used in tweets (Zilio et al., 2020). Thus, the classifier needs to be robust to these perturbations.
* The work started when the author was a student at IIT Kharagpur, India.
Generating adversarial examples in text is challenging compared to computer vision tasks because of (i) the discrete nature of the input space and (ii) the need to preserve semantic coherence with the original text. Initial works on attacking text models relied on introducing character-level errors or manipulating words (Feng et al., 2018) to generate adversarial examples, but the resulting grammatical disfluency makes these examples look very unnatural. Rule-based synonym replacement strategies (Alzantot et al., 2018), (Ren et al., 2019) have led to more natural-looking examples. (Jin et al., 2019) proposed TextFooler as a baseline for generating adversaries for text classification models. However, the adversarial examples created by TextFooler rely heavily on replacement by word-embedding-based word similarity rather than on overall sentence semantics. Recently, (Garg and Ramakrishnan, 2020) proposed BERT-MLM-based (Devlin et al., 2019) word replacements to create adversaries that better fit the overall context. Despite these advancements, much less attention has been paid to robust prediction in critical domains such as biomedicine, which come with unique challenges. (Araujo et al., 2020) proposed two types of rule-based adversarial attacks in the biomedical domain, inspired by natural human spelling errors and typos and by synonym replacement. Some challenges include:
1) Biomedical named entities are usually multi-word phrases such as colorectal adenoma. During token replacement the entire entity needs to be replaced, but an MLM (which replaces at the token level) fails to generate a correct synonym of the entity that fits the context. We therefore need a BioNER + Entity Linker (Martins et al., 2019), (Mondal et al., 2019) to link each entity to an ontology and generate correct synonyms.
2) Medical entities have several surface variations; for example, Type I Diabetes can be expressed as 'Type One Diabetes'. We therefore explore numeric entity expansion strategies for generating adversaries.
3) Spelling variations (keyboard swap, modification).
While we evaluate on two benchmark datasets, our method is general and applicable to any biomedical classification dataset.
In this paper, we present BBAEG (Biomedical BERT-based Adversarial Example Generation) 1, a novel black-box attack algorithm for biomedical text classification that combines BERT-MLM predictions for non-named-entity replacements with NER-linked synonyms for named entities, so that replacements better fit the overall context. In addition to replacing words with synonyms, we explore generating adversarial examples through typographical variations and numeric entity modification. Our BBAEG attack beats the existing baselines by a wide margin on both automatic and human evaluation across datasets and models. To the best of our knowledge, we are the first to introduce an algorithm for generating adversarial examples for biomedical text whose attack success is higher than that of existing baselines such as TextFooler and BAE (Garg and Ramakrishnan, 2020), (Li et al., 2020). The overall contributions of the paper are: 1) We explore several challenges of biomedical adversarial example generation. 2) We propose BBAEG, a biomedical adversarial example generation technique for text classification that combines several perturbation techniques. 3) We introduce three types of attacks for this purpose on two biomedical text classification datasets. 4) Through human evaluation, we show that BBAEG yields adversarial examples with improved naturalness.

Methodology
Problem Definition: Given a set of $n$ inputs $(D, Y) = [(D_1, y_1), \ldots, (D_n, y_n)]$ and a trained classifier $M : D \to Y$, we assume the soft-label black-box setting, where the attacker can only query the classifier for output probabilities on a given input and has no access to the model parameters, gradients or training data. For an input of length $l$ consisting of words $w_i$, where $1 \le i \le l$, $(D_i = [w_1, \ldots, w_l], y)$, we want to generate an adversarial example $D_{adv}$ such that $M(D_{adv}) \neq y$. We would like $D_{adv}$ to be grammatically correct and semantically similar to the original input.
1 https://github.com/Ishani-Mondal/BBAEG.git

BBAEG Algorithm:
Our proposed BBAEG algorithm consists of four steps: 1) tagging the biomedical entities in $D$ and preparing two classes, NE (named entities) and Non-NE (non-named entities); 2) ranking the important words for perturbation; 3) choosing perturbation schemes; 4) generating the final adversaries.
1) Named Entity Tagging: For each input instance $D_i$ (Line 1 in Algorithm), we apply sciSpacy with en-ner-bc5cdr-md to extract biomedical named entities (drugs and diseases), followed by its Entity Linker (drugs to DrugBank (Wishart et al., 2017), diseases to MeSH). After linking each NE to its ontology, we use pyMeshSim (for diseases) and DrugBank (for drugs) to obtain synonyms. In each $D_i$ of size $l$, $(w_1, w_2, \ldots, [w_i \ldots w_{i+2}], \ldots, w_l)$, multi-word expressions such as $[w_i \ldots w_{i+2}]$ are named entities. We put them in the Named Entity set ($S_{NE}$) and all other words in the non-Named Entity set ($S_{NNE}$).
2) Ranking of important words: We estimate the token importance $I_i$ of each $w_i \in D$ by deleting $w_i$ from $D$ and computing the decrease in the probability of predicting the correct label $y$ (Line 2), similar to (Jin et al., 2019). This yields a list of the tokens sorted in decreasing order of importance.
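The deletion-based ranking in step 2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: `rank_tokens_by_importance` and `toy_predict_proba` are hypothetical names, and the toy scorer merely stands in for the black-box classifier $M$, which in practice would be queried for output probabilities.

```python
from typing import Callable, List, Sequence, Tuple


def rank_tokens_by_importance(
    tokens: Sequence[str],
    true_label: int,
    predict_proba: Callable[[List[str]], Sequence[float]],
) -> List[Tuple[int, str, float]]:
    """Return (position, token, importance) sorted by decreasing importance.

    Importance of a token is the drop in the probability of the correct
    label when that token is deleted from the input.
    """
    base = predict_proba(list(tokens))[true_label]
    scored = []
    for i, tok in enumerate(tokens):
        reduced = list(tokens[:i]) + list(tokens[i + 1:])
        drop = base - predict_proba(reduced)[true_label]
        scored.append((i, tok, drop))
    return sorted(scored, key=lambda item: item[2], reverse=True)


def toy_predict_proba(tokens: List[str]) -> List[float]:
    """Hypothetical stand-in for the black-box classifier (binary ADE task)."""
    cues = {"rhabdomyolysis": 0.5, "developed": 0.2}
    p_pos = 0.1 + sum(weight for cue, weight in cues.items() if cue in tokens)
    return [1.0 - p_pos, p_pos]


ranking = rank_tokens_by_importance(
    "patient developed rhabdomyolysis".split(), 1, toy_predict_proba
)
for position, token, importance in ranking:
    print(position, token, round(importance, 3))
```

Each call to `predict_proba` is one black-box query, so the ranking costs one query per token plus one for the unmodified input.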
3) Choosing perturbation schemes: Given the input $D_i$, we describe a sieve-based approach to perturbing it. Sieves are ordered by precision, with the most precise sieve applied first.
Sieve 1: In the first sieve, we replace the tokens in $S_{NE}$ (Line 5-9) with synonyms obtained through ontology linking and the words in $S_{NNE}$ (Line 10-15) with BERT-MLM predicted tokens. This stems from the fact that replacing non-named entities using BERT-MLM generates reasonable predictions given the surrounding context (Garg and Ramakrishnan, 2020). If the token is part of $S_{NE}$, we replace it with its domain-specific synonyms one by one; if the token is part of $S_{NNE}$, we replace it with the top-K BERT-MLM predictions. To achieve high semantic similarity with the original text, we filter the set of top K tokens (K is a pre-defined constant) (Line 12) predicted by BERT-MLM for the masked token using a Sentence-Transformer (Reimers and Gurevych, 2019) based sentence similarity scorer. Additionally, we filter out predicted tokens that do not belong to the same part of speech as the original token. If this sieve generates adversaries for $D_i$, then $D_{adv}$ is returned.
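The candidate generation of Sieve 1 can be sketched as below, under stated assumptions. The function `sieve1_candidates` and all `toy_*` helpers are hypothetical: the real ontology lookup, BERT-MLM predictor, part-of-speech tagger and Sentence-Transformer scorer are passed in as callables and replaced here by toy stubs. The sketch also assumes that a multi-word named entity has already been merged into a single unit of the input sequence.

```python
from typing import Callable, Iterable, List, Sequence


def sieve1_candidates(
    units: Sequence[str],
    position: int,
    is_named_entity: bool,
    ontology_synonyms: Callable[[str], Iterable[str]],
    mlm_top_k: Callable[[Sequence[str], int, int], Iterable[str]],
    similarity: Callable[[str, str], float],
    pos_tag: Callable[[str], str],
    k: int = 10,
    alpha: float = 0.75,
) -> List[List[str]]:
    """Build perturbed copies of `units` with the unit at `position` replaced."""
    original = units[position]

    def substitute(replacement: str) -> List[str]:
        return list(units[:position]) + [replacement] + list(units[position + 1:])

    # Named entities: try every ontology synonym, one by one.
    if is_named_entity:
        return [substitute(s) for s in ontology_synonyms(original) if s != original]

    # Non-named entities: top-k MLM predictions, filtered by part of speech
    # and by sentence-level similarity to the original input.
    source = " ".join(units)
    kept = []
    for token in mlm_top_k(units, position, k):
        if token == original or pos_tag(token) != pos_tag(original):
            continue
        candidate = substitute(token)
        if similarity(source, " ".join(candidate)) >= alpha:
            kept.append(candidate)
    return kept


# Toy stubs (assumptions for illustration only).
def toy_synonyms(entity: str) -> List[str]:
    return {"clozapine": ["clozapinum"]}.get(entity, [])


def toy_mlm(seq: Sequence[str], pos: int, k: int) -> List[str]:
    return ["experienced", "rapidly", "avoided"][:k]


def toy_pos(word: str) -> str:
    return "ADV" if word.endswith("ly") else "VERB"


def toy_sim(a: str, b: str) -> float:
    return 0.4 if "avoided" in b else 0.9


units = ["patient", "developed", "rhabdomyolysis", "with", "clozapine"]
non_ne = sieve1_candidates(units, 1, False, toy_synonyms, toy_mlm, toy_sim, toy_pos)
ne = sieve1_candidates(units, 4, True, toy_synonyms, toy_mlm, toy_sim, toy_pos)
```

Each returned candidate would then be sent to the classifier; those that change the predicted label are the winning adversaries of this sieve.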
Sieve 2: (Line 20-28) If the first sieve does not generate an adversary, we introduce two types of typographical noise into the input: 1) Spelling Noise-N1: rotating $p$ random characters (Line 20); 2) Spelling Noise-N2: inserting symbols at the beginning or end of a word (Line 21). If this sieve generates adversaries for $D_i$, then $D_{adv}$ is returned.
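The two spelling-noise operations can be illustrated with the sketch below. The text does not specify exactly how characters are rotated or which symbols are inserted, so both functions are assumptions: `rotate_chars` rotates one randomly chosen window of `p` consecutive characters left by one position (an adjacent swap when `p = 2`), and `insert_affix` attaches `n` random lowercase letters to the start or end of the word. Both names are hypothetical.

```python
import random
import string
from typing import Optional


def rotate_chars(word: str, p: int = 2, rng: Optional[random.Random] = None) -> str:
    """Spelling Noise-N1 (assumed form): rotate a random window of p characters."""
    rng = rng or random.Random()
    if p < 2 or len(word) < p:
        return word  # nothing to rotate
    start = rng.randrange(len(word) - p + 1)
    window = word[start:start + p]
    return word[:start] + window[1:] + window[0] + word[start + p:]


def insert_affix(word: str, n: int = 2, rng: Optional[random.Random] = None) -> str:
    """Spelling Noise-N2 (assumed form): add n random letters at one end."""
    rng = rng or random.Random()
    affix = "".join(rng.choice(string.ascii_lowercase) for _ in range(n))
    return affix + word if rng.random() < 0.5 else word + affix


rng = random.Random(0)  # seeded only to make the illustration repeatable
noisy_n1 = rotate_chars("clozapine", p=2, rng=rng)
noisy_n2 = insert_affix("schizophrenia", n=2, rng=rng)
print(noisy_n1, noisy_n2)
```

Note that `rotate_chars` can return the word unchanged when the selected window consists of identical characters, so a caller may want to retry in that case.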
Sieve 3: (Line 29-31) If Sieve 2 does not generate an adversary, we replace numeric entities by spelling out their digits as words. For example, PMD1 can be rewritten as PMD One and Covid19 as Covid nineteen. If this sieve generates adversaries for $D_i$, then $D_{adv}$ is returned.
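A minimal sketch of numeric entity expansion is given below. It is an illustration under simplifying assumptions rather than the authors' implementation: the hypothetical `expand_numbers` only spells out cardinal numbers from 0 to 99, leaves larger numbers and ordinals such as 19th unchanged, emits lowercase words, and does not treat decimals specially.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

# A whole run of digits that is not followed by an ordinal suffix.
_NUMBER = re.compile(r"(?<!\d)\d+(?!\d)(?!(?:st|nd|rd|th)\b)")


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def expand_numbers(text: str) -> str:
    """Replace each cardinal number 0..99 in `text` by its spelled-out form."""

    def repl(match: "re.Match[str]") -> str:
        value = int(match.group())
        if value > 99:
            return match.group()  # out of scope for this sketch
        before = text[match.start() - 1] if match.start() > 0 else ""
        after = text[match.end()] if match.end() < len(text) else ""
        # Separate the words from adjacent letters, e.g. "Covid19".
        left = " " if before.isalpha() else ""
        right = " " if after.isalpha() else ""
        return left + number_to_words(value) + right

    return _NUMBER.sub(repl, text)


print(expand_numbers("Covid19"))
print(expand_numbers("A 21-year-old patient"))
```

Handling ordinals (19th to nineteenth) would need a separate word table, which is omitted here to keep the sketch short.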

4) Final adversaries generation:
For each of the three sieves, among all the winning adversaries, the one most similar to the original text, as measured by Sentence-Transformer embeddings (Reimers and Gurevych, 2019), is returned. If none of the sieves generates an adversary, we return the perturbed example that causes the maximum reduction in the probability of the correct output.
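The selection rule in step 4 can be sketched as follows. The `Candidate` record and `select_adversary` function are hypothetical names introduced for illustration; the sketch assumes that each perturbed example has already been scored by the classifier and by the similarity model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    text: str
    prob_true: float   # classifier probability of the original label
    flipped: bool      # True if the predicted label differs from the original
    similarity: float  # sentence similarity to the original text


def select_adversary(candidates: List[Candidate]) -> Optional[Candidate]:
    """Pick the final adversarial example from scored candidates."""
    if not candidates:
        return None
    winners = [c for c in candidates if c.flipped]
    if winners:
        # Successful attacks exist: keep the one closest to the original.
        return max(winners, key=lambda c: c.similarity)
    # No label flip: fall back to the largest drop in the correct-label
    # probability, i.e. the lowest remaining probability.
    return min(candidates, key=lambda c: c.prob_true)


pool = [
    Candidate("variant a", prob_true=0.30, flipped=True, similarity=0.81),
    Candidate("variant b", prob_true=0.45, flipped=True, similarity=0.93),
    Candidate("variant c", prob_true=0.70, flipped=False, similarity=0.99),
]
best = select_adversary(pool)
fallback = select_adversary([c for c in pool if not c.flipped])
```

In the sieve-based scheme described above, this selection would be applied to the candidates of the first sieve that yields at least one winner.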

Experimental setup
Datasets and Experimental Details: We evaluate BBAEG on two biomedical text classification datasets: 1) Adverse Drug Event (ADE) Detection (Gurulingappa et al., 2012) and 2) the Twitter ADE dataset (Rosenthal et al., 2017), both for the binary task of classifying whether a sentence contains a mention of an ADE. We fine-tune the target models on the training data of each corpus using the Adam optimizer (Kingma and Ba, 2015) with a learning rate of 0.00002 for 10 epochs, and perform the adversarial attack on the test data. For the BBAEG non-NER synonym attacks, we use the BERT-base-uncased MLM to predict the masked tokens. We consider the top K=10 synonyms from the BERT-MLM predictions and set a threshold α of 0.75 on the cosine similarity between the (Reimers and Gurevych, 2019) embeddings of the adversarial and input text. We set p=2 characters for rotation when introducing noise into the input. For more details, refer to the appendix.

Original:
Successful challenge with clozapine in a history of pulmonary eosinophilia ailment.

BAE (Using BERT-MLM):
Successful challenge with hydrochloride in a history of pulmonary disease ailment.

BBAEG (Best Combination):
Successful challenge with clozapinum in a history of Loeffler Syndrome ailment.

Original:
A 21-year-old patient developed rhabdomyolysis during 19th week of treatment with clozapine for schizophrenia.

BBAEG (Spelling Noise-N2):
A 21-year-old patient developed rhabdomyolysis during 19th week of treatment with inoclozapine for cdschizophrenia.

BBAEG (Synonyms):
A 21-year-old patient developed rhabdomyolysis during 19th week of treatment with Clozapinum for dementia Praecox.

BBAEG (Number Replacement):
A twenty-one-year-old patient developed rhabdomyolysis during nineteenth week of treatment with clozapine for schizophrenia.

Results
Automatic Evaluation Results: We examine the success of the adversarial attack using two criteria: (1) Performance Drop (Adrop): the difference between the original accuracy (accuracy on the original test set) and the after-attack accuracy (accuracy on the perturbed test set); (2) Perturbation of input (%): the percentage of perturbed words in the generated adversary. The success of an attack increases with criterion 1 and decreases with criterion 2.
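The two criteria can be computed as in the sketch below. The function names are hypothetical, and `perturbation_rate` assumes the original and adversarial inputs are aligned unit by unit (one replacement per position, with a multi-word entity counted as one unit), which is a simplification.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def accuracy_drop(
    original_preds: Sequence[int],
    attacked_preds: Sequence[int],
    labels: Sequence[int],
) -> float:
    """Criterion 1: original accuracy minus after-attack accuracy."""
    return accuracy(original_preds, labels) - accuracy(attacked_preds, labels)


def perturbation_rate(original: Sequence[str], adversarial: Sequence[str]) -> float:
    """Criterion 2: percentage of aligned units that were changed."""
    if len(original) != len(adversarial):
        raise ValueError("inputs must be aligned unit by unit")
    changed = sum(a != b for a, b in zip(original, adversarial))
    return 100.0 * changed / len(original)


labels = [1, 0, 1, 1]
drop = accuracy_drop([1, 0, 1, 1], [0, 0, 1, 0], labels)
rate = perturbation_rate(
    ["treatment", "with", "clozapine", "for", "schizophrenia"],
    ["treatment", "with", "clozapinum", "for", "schizophrenia"],
)
print(drop, rate)
```

A stronger attack in this sense has a larger `accuracy_drop` at a smaller `perturbation_rate`.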
Effectiveness: Table 1 shows the results of the BBAEG attack on the two datasets across all models. In our experiments with HAN (a general deep learning model), we observe that the attack is more successful than against the BERT variants and RoBERTa, and more successful than the existing baselines, on both criteria (1 and 2). The attack is also highly successful against BioBERT and Sci-BERT (35-45% and 40-50% accuracy drop, respectively). This stems from the fact that the vocabularies used in the datasets have already been seen by the contextual embeddings during pre-training, which makes these models more sensitive to small perturbations. Moreover, unlike BERT and HAN, RoBERTa is clearly far less susceptible to adversarial attacks (10-20% accuracy drop), with 20-25% of the words in the input perturbed. We also observe that BERT-MLM-based synonym replacement for non-NER words, combined with multi-word NER synonym replacement using entity linking, outperforms the TextFooler (TF) and BAE-based approaches in terms of accuracy drop.
Ablation Analysis: In Table 3, we perform an ablation analysis of the different perturbation schemes and of the effect of the attack using each sieve, with two fine-tuned contextual embedding models as the target models for ADE classification. Synonym replacement (S1) (average 35% accuracy drop) and character rotation (S2-1) (average 38% accuracy drop) seem to be the most promising approaches for successful attacks on biomedical text classification. Moreover, we conduct a deeper analysis to gain insight into how much the synonyms of NER vs. non-NER entities contribute to prediction change. We have found that the multi- [...] Table 4. During ablation analysis, we observe that the synonym-replaced perturbed samples looked more natural to the human evaluators than the spelling-perturbed samples and the number-replaced entities. When considered jointly, the number-replaced and synonym-replaced samples seemed more natural to the annotators than the spelling-perturbed samples. This is because annotators could easily interpret the meaning of the number-replaced entities correctly when shown them alongside the original sample. For instance, in the examples shown in Table 2, the number-replaced samples (21-year old → twenty-one-year old) look more natural and are more easily interpretable than the spelling-perturbed samples (clozapine → clpazoine).

Conclusion and Future Work
In this paper, we propose a new technique for generating adversarial examples that combines contextual perturbations based on BERT-MLM, synonym replacement of biomedical entities, typographical errors and numeric entity expansion. We explore several classification models to demonstrate the efficacy of our method. Experiments conducted on two benchmark biomedical datasets demonstrate the strength and effectiveness of our attack. As future work, we would like to explore retraining the models with the perturbed samples in order to improve model robustness.