Evaluating and Enhancing the Robustness of Neural Network-based Dependency Parsing Models with Adversarial Examples

Despite achieving prominent performance on many important tasks, neural networks have been reported to be vulnerable to adversarial examples. Previous studies along this line mainly focused on semantic tasks such as sentiment analysis, question answering and reading comprehension. In this study, we show that adversarial examples also exist in dependency parsing: we propose two approaches to study where and how parsers make mistakes by searching over perturbations to existing texts at the sentence and phrase levels, and design algorithms to construct such examples in both black-box and white-box settings. Our experiments with one of the state-of-the-art parsers on the English Penn Treebank (PTB) show that up to 77% of input examples admit adversarial perturbations. We also show that the robustness of parsing models can be improved by crafting high-quality adversaries and including them in the training stage, while suffering little to no performance drop on clean input data.


Introduction
Deep neural network-based machine learning (ML) models are powerful but vulnerable to adversarial examples. Exposing the targeted models to such maliciously crafted examples also yields broader insights into their behavior. The introduction of adversarial examples and adversarial training ushered in a new era of understanding and improving ML models, and has received significant attention recently (Szegedy et al., 2013; Goodfellow et al., 2015; Moosavi-Dezfooli et al., 2016; Papernot et al., 2016b; Carlini and Wagner, 2017; Yuan et al., 2019; Eykholt et al., 2018; Xu et al., 2019).
Even though generating adversarial examples for texts has proven to be a more challenging task than for images and audio due to their discrete nature, a few methods have been proposed to generate adversarial text examples and reveal the vulnerability of deep neural networks in natural language processing (NLP) tasks, including reading comprehension (Jia and Liang, 2017), text classification (Samanta and Mehta, 2017; Wong, 2017; Liang et al., 2018; Alzantot et al., 2018), machine translation (Zhao et al., 2018; Ebrahimi et al., 2018; Cheng et al., 2018) and dialogue systems. These recent methods attack text examples mainly by replacing, scrambling, and erasing characters, words, or other language units under certain semantics-preserving constraints.

Figure 1: Sentence-level attack on a neural dependency parser (Dozat and Manning, 2017). Replacing the word "stock" with an adversarially-chosen word "exchange" in the sentence causes the parser to make four mistakes (blue, dashed) in arc prediction. The adversarial example preserves the original syntactic structure, and the substitute word is assigned the same part of speech (POS) as the replaced one. The assigned POS tags (blue) are listed below the words.
Although adversarial examples have been studied recently for NLP tasks, previous work almost exclusively focused on semantic tasks, where the attacks aim to alter the semantic prediction of ML models (e.g., sentiment prediction or question answering) without changing the meaning of original texts. To the best of our knowledge, adversarial examples to syntactic tasks, such as dependency parsing, have not been studied in the literature. Motivated by this, we take the neural network-based dependency parsing algorithms as targeted models and aim to answer the following questions: Can we construct syntactic adversarial examples to fool a dependency parser without changing the original syntactic structure? And can we make dependency parsers robust with respect to these attacks?
To answer these questions, we propose two approaches to study where and how parsers make mistakes by searching over perturbations to existing texts at the sentence and phrase (corresponding to subtrees in a parse tree) levels. For the sentence-level attack, we modify an input sentence to fool a dependency parser while keeping the modification syntactically imperceptible to humans (see Figure 1). Any new error (excluding the arcs directly connected to the modified parts) made by the parser is counted as a successful attack.
For the phrase-level (or subtree-level) attack, we choose two phrases from a sentence, which are separated by at least k words (k ≥ 0), and modify one phrase to cause the parser's prediction errors in another target phrase (see Figure 2). Unlike the sentence-level attack, any error occurring outside the target subtree is not considered a successful attack trial. This setting helps us investigate whether an error in one part of a parse tree may exert long-range influence and cause cascading errors (Ng and Curran, 2015). We study the sentence-level and subtree-level attacks in both white-box and black-box settings. In the former setting, an attacker can access the model's architecture and parameters, while in the latter this is not allowed.
Our contributions are summarized as follows: (1) we explore the feasibility of generating syntactic adversarial sentence examples that cause a dependency parser to make mistakes without altering the original syntactic structures; (2) we propose two approaches to construct syntactic adversarial examples by searching over perturbations to existing texts at the sentence and phrase levels in both the black-box and white-box settings; (3) our experiments with a close to state-of-the-art parser on the English Penn Treebank show that up to 77% of input examples admit adversarial perturbations, and moreover that the robustness and generalization of parsing models can be improved by adversarial training with the proposed attacks. The source code is available at https://github.com/zjiehang/DPAttack.

Figure 2: Phrase-level attack: two separate subtrees in a parse tree are selected, and one of them (left, the subtree to be modified) is deliberately modified to cause a parser to make an incorrect arc prediction for another target subtree (right). For example, we can make a neural dependency parser (Dozat and Manning, 2017) attach the word "difference" in the target subtree to its sibling "in" instead of the correct head "lock" (the subtree's root) by maliciously manipulating the selected leftmost subtree only. An example sentence: In a stock-index arbitrage sell program, traders buy or sell big baskets of stocks and offset the trade in futures to lock in a price difference.

Related Work
Generating adversarial examples - inputs intentionally crafted to fool a model - has become an important means of exploring model vulnerabilities. Furthermore, adding adversarial examples in the training stage, also known as adversarial training, has become one of the most promising ways to improve a model's robustness. Although the literature on NLP adversarial examples is limited, some studies have been conducted on NLP tasks such as reading comprehension (Jia and Liang, 2017), text classification (Samanta and Mehta, 2017; Wong, 2017; Liang et al., 2018; Alzantot et al., 2018), machine translation (Zhao et al., 2018; Ebrahimi et al., 2018; Cheng et al., 2018), and dialogue systems.
Depending on the degree of access to the target model, adversarial examples can be constructed in two different settings: white-box and black-box (Xu et al., 2019; Wang et al., 2019). In the white-box setting, an adversary can access the model's architecture, parameters and input feature representations, while in the black-box setting it cannot. White-box attacks normally yield a higher success rate because knowledge of the target model can be used to guide the generation of adversarial examples. However, black-box attacks do not require access to target models, making them more practicable for many real-world attacks. Attacks can also be divided into targeted and non-targeted ones depending on the purpose of the adversary. Our phrase-level attack can be viewed as a targeted attack towards a specific subtree, while the sentence-level attack can be taken as a non-targeted one.
For text data, input sentences can be manipulated at character (Ebrahimi et al., 2018), sememe (the minimum semantic units) (Zang et al., 2019), or word (Samanta and Mehta, 2017;Alzantot et al., 2018) levels by replacement, alteration (e.g. deliberately introducing typos or misspellings), swap, insertion, erasure, or directly making small perturbations to their feature embeddings. Generally, we would like to ensure that the crafted adversarial examples are sufficiently similar to their original ones, and these modifications should be made within semantics-preserving constraints. Such semantic similarity constraints are usually defined based on Cosine similarity (Wong, 2017;Barham and Feizi, 2019;Jin et al., 2019;Ribeiro et al., 2018) or edit distance (Gao et al., 2018).
Text adversarial example generation usually involves two steps: determining an important position (or token) to change, and modifying it slightly to maximize the model's prediction error. This two-step process can be repeated iteratively until the model's prediction changes or certain stopping criteria are reached. Many methods have been proposed to determine the important positions: random selection (Alzantot et al., 2018), trial-and-error testing at each possible point (Kuleshov et al., 2018), analyzing the effects on the model of masking various parts of an input text (Samanta and Mehta, 2017; Gao et al., 2018; Jin et al., 2019; Yang et al., 2018), comparing attention scores, or gradient-guided optimization (Ebrahimi et al., 2018; Lei et al., 2019; Wallace et al., 2019; Barham and Feizi, 2019).
After the important positions are identified, the most popular way to alter text examples is to replace the characters or words at the selected positions with similar substitutes. Such substitutes can be chosen from nearest neighbours in an embedding space (Alzantot et al., 2018; Kuleshov et al., 2018; Jin et al., 2019; Barham and Feizi, 2019), synonyms in a prepared dictionary (Samanta and Mehta, 2017), visually similar alternatives like typos (Samanta and Mehta, 2017; Ebrahimi et al., 2018; Liang et al., 2018) or Internet slang and trademark logos (Eger et al., 2019), paraphrases (Lei et al., 2019), or even randomly selected words (Gao et al., 2018). Given an input instance, Zhao et al. (2018) proposed to search for adversaries in the neighborhood of its corresponding representation in latent space by sampling within a range that is recursively tightened. Jia and Liang (2017) inserted a few distraction sentences generated by a simple set of rules into text examples to mislead a reading comprehension system.
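As a concrete illustration of the nearest-neighbour substitution strategy mentioned above, the sketch below ranks candidate replacements by cosine similarity in a toy embedding space. The vocabulary, vectors, and function name are hypothetical and not taken from any of the cited methods.

```python
import numpy as np

def nearest_substitutes(word, embeddings, vocab, k=5):
    """Return the k words closest to `word` in the embedding space
    by cosine similarity (excluding the word itself)."""
    v = embeddings[vocab.index(word)]
    sims = embeddings @ v / (
        np.linalg.norm(embeddings, axis=1) * np.linalg.norm(v) + 1e-9)
    order = np.argsort(-sims)
    return [vocab[i] for i in order if vocab[i] != word][:k]

# Toy embedding space: "cat" and "dog" are close, "car" is far away.
vocab = ["cat", "dog", "car"]
emb = np.array([[1.0, 0.1], [0.9, 0.2], [-1.0, 0.8]])
print(nearest_substitutes("cat", emb, vocab, k=1))  # -> ['dog']
```

Real attacks apply the same ranking over pretrained embeddings such as GloVe or counter-fitted vectors, usually combined with a similarity threshold.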

Preliminary
Dependency parsing is the task of constructing a parse tree of a sentence that represents its syntactic structure and defines the relationships between "head" words and dependent ones, which modify their heads (see the arcs in Figure 1). In this section, we first describe a graph-based dependency parsing method, and then formally present the adversarial attack problem of dependency parsing.

Dependency Parsing
Graph-based parsing models learn parameters to score correct dependency graphs over incorrect ones, typically by factoring the graphs into their directed edges (or arcs), and perform parsing by searching for the highest-scoring graph for a given sentence.
Given a sentence x, we denote the set of all valid parse trees that can be constructed from x as Y(x). Assuming there exists a graph scoring function s, the dependency parsing problem can be formulated as finding the highest-scoring directed spanning tree for the sentence x:

y*(x) = argmax_{ŷ ∈ Y(x)} s(x, ŷ; θ),    (1)

where y*(x) is the parse tree with the highest score, and θ are all the parameters used to calculate the scores. Given a sentence x_[1:n] that is a sequence of n words x_i, 1 ≤ i ≤ n, the score of a graph is usually factorized into the sum of its arc scores to make the search tractable (McDonald et al., 2005):

s(x, ŷ; θ) = Σ_{(x_h, x_m) ∈ A(ŷ)} s(x_h, x_m; θ),    (2)

where A(ŷ) represents the set of directed edges in the parse tree ŷ. The score of an arc (x_h, x_m) represents the likelihood of creating a dependency from head x_h to modifier x_m in a dependency tree.
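To make the arc-factored formulation concrete, the sketch below scores a tree as the sum of its arc scores and finds the argmax by brute force over head assignments. This is an illustrative toy (real parsers use the MST or Eisner algorithms, and cycle checks are omitted); all names and score values are hypothetical.

```python
import itertools

def tree_score(heads, arc_scores):
    """Score of a dependency tree, factored as the sum of its arc scores.
    heads[m-1] = h means word m attaches to head h (0 is the ROOT)."""
    return sum(arc_scores[(h, m)] for m, h in enumerate(heads, start=1))

def best_tree(n, arc_scores):
    """Brute-force argmax over all head assignments; self-attachment is
    excluded, but projectivity and cycle checks are omitted for brevity."""
    candidates = itertools.product(range(0, n + 1), repeat=n)
    return max((h for h in candidates
                if all(h[m - 1] != m for m in range(1, n + 1))),
               key=lambda h: tree_score(h, arc_scores))

# Toy 2-word sentence: scores favor ROOT -> word1 and word1 -> word2.
scores = {(0, 1): 5.0, (0, 2): 1.0, (1, 2): 4.0, (2, 1): 2.0}
print(best_tree(2, scores))  # -> (0, 1)
```

The exhaustive search is exponential in sentence length; it only serves to show what Equations (1) and (2) compute.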

Problem Definition
A neural network can be considered as a mapping f : X → Y from an input x ∈ X to an output y ∈ Y with parameters θ. For classification problems, y is a label that lies in some finite set of categories. For dependency parsing, y is one of the valid parses that can be built from x. The model f maps x to the parse y* with the highest score, as defined in Equation (1).
Given the original input x, adversarial examples are crafted to cause an ML model to misbehave. Following the common definition in previous papers (e.g., Kuleshov et al., 2018), for a model f, we say x′ is a good adversarial example of x for an untargeted attack if

f(x′) ≠ y, subject to c(x, x′) ≤ ε,    (3)

where y is the ground-truth output for x. For a targeted attack, the goal is to turn f(x′) into a particular targeted class, denoted by y′, under the same constraint in (3). The constraint function c : X × X → R^g_+ and a vector of bounds ε ∈ R^g (g ≥ 1) reflect the notion of the "imperceptibility" of the perturbation, ensuring that the true label of x′ should be the same as that of x. In the context of image classification, popular choices of such constraints include the ℓ0, ℓ2 and ℓ∞ distances. For natural language tasks, x and x′ are sentences composed of discrete words, and previous methods often define c to measure the semantic similarity between them, so that x and x′ share the same semantic meaning while being predicted differently by the model f. In this paper, we instead consider syntactic similarity and propose various ways to define such constraints for the dependency parsing task (see Section 4).
Generating adversarial examples can be formulated as an optimization problem: maximize the probability of f(x′) ≠ y by choosing x′ for x subject to c(x, x′) ≤ ε. Algorithms for solving this problem include the fast gradient sign method (Goodfellow et al., 2015), iterative methods based on constrained gradient descent (Papernot et al., 2016a), GAN-based strategies (Wong, 2017), genetic algorithms (Alzantot et al., 2018), and submodular set function maximization (Lei et al., 2019).

Method
Adversarial examples are required to maintain the original functionality of the input. In the adversarial NLP literature, previous studies often expect adversarial examples to retain the same or similar semantic meaning as the original input (Samanta and Mehta, 2017; Wong, 2017; Alzantot et al., 2018; Zhao et al., 2018; Zang et al., 2019). However, in this paper we focus on the dependency parsing task, whose goal is to predict the syntactic structure of input sentences. Therefore, to expose regions of the input space where dependency parsers perform poorly, we would like the modified examples x′ to preserve the same syntactic structure as the original x, while slightly relaxing the constraint on their semantic similarity. A robust parser should perform consistently well on sentences that share the same syntactic properties but differ in meaning. For example, substituting the word "black" for "white", or "dog" for "cat", is an acceptable replacement because it is grammatically imperceptible to humans.

Adversarial Examples for Parsing
We craft adversarial examples mainly by replacing a few words in an input sentence with carefully selected ones. To preserve the same syntactic structure as the original sentence x, we impose the following three constraints on the word replacements used to generate an adversarial example x′: (i) the substitute word x′_i should fit in well with the context, maintaining both semantic and syntactic coherence; (ii) for any word x_i in the original example, the word x′_i that replaces x_i must have the same part-of-speech (POS) as x_i; (iii) pronouns, articles, conjunctions, numerals, interjections, interrogative determiners, and punctuation are not allowed to be replaced 1.
To select a substitute word that agrees well with the context of a sentence, we use BERT (Devlin et al., 2019) to generate a set of candidate words suitable to replace the original word, thanks to its bidirectional language model capable of capturing the wider context of the entire sentence 2. Words that are assigned the same POS generally have similar grammatical properties and display similar syntactic behavior. To enforce the second constraint, we require that the substitute x′_i be assigned the same part of speech as x_i by a POS tagger, as in (Samanta and Mehta, 2017; Ebrahimi et al., 2018). To enforce the third constraint, we filter out the word classes it lists.
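The POS-agreement filter (constraints ii and iii) can be sketched as follows. The toy tagger and the candidate list standing in for BERT's suggestions are hypothetical; a real pipeline would call a masked language model and an off-the-shelf tagger.

```python
def filter_candidates(original, candidates, pos_tag,
                      banned_pos=("PRP", "DT", "CC", "CD", "UH", "WDT", ".")):
    """Keep only substitutes that share the original word's POS tag
    (constraint ii); `pos_tag` is any word -> POS-tag function.
    Words in the banned classes are never replaced (constraint iii)."""
    orig_pos = pos_tag(original)
    if orig_pos in banned_pos:
        return []
    return [w for w in candidates if pos_tag(w) == orig_pos and w != original]

# Hypothetical tagger and candidate list for the word "stock".
toy_tags = {"stock": "NN", "exchange": "NN", "quickly": "RB", "the": "DT"}
cands = ["exchange", "quickly", "the"]
print(filter_candidates("stock", cands, toy_tags.get))  # -> ['exchange']
```

In practice the tagger should be run on the whole candidate sentence, since a word's POS depends on its context.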
We adopt the following two-step procedure for generating text adversarial examples: choose weak spots (or positions) to change, and then modify them to maximize the model's error. In the black-box setting, we first identify the weak spots of an input sentence with a greedy search strategy, replacing each word, one at a time, with a special "unknown" symbol (<unk>) and examining the changes in unlabeled attachment score (UAS), as in (Yang et al., 2018; Gao et al., 2018). For each identified weak spot, we replace it with a word from the candidate set proposed by BERT to form an attack. We select the substitute word that causes the greatest decrease in UAS while satisfying the aforementioned constraints to construct the adversarial example. This process is repeated until all candidate words are exhausted and every weak spot is tested (see Figure 3).
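The greedy <unk>-probing step can be sketched as below, with a mock parser standing in for the real UAS computation; the function names and the toy sentence are hypothetical.

```python
def find_weak_spots(words, parser_uas, skip=frozenset(), top_k=3):
    """Rank positions by how much replacing each word with <unk>
    lowers the parser's UAS; `parser_uas` maps a sentence to its UAS."""
    base = parser_uas(words)
    drops = []
    for i, w in enumerate(words):
        if w in skip:  # e.g. pronouns, articles (constraint iii)
            continue
        probe = words[:i] + ["<unk>"] + words[i + 1:]
        drops.append((base - parser_uas(probe), i))
    drops.sort(reverse=True)
    return [i for d, i in drops[:top_k] if d > 0]

# Mock parser: UAS collapses only when position 1 ("buy") is masked.
def mock_uas(ws):
    return 0.5 if "<unk>" in ws and ws[1] == "<unk>" else 0.95

print(find_weak_spots(["traders", "buy", "stocks"], mock_uas))  # -> [1]
```

Each returned position is then attacked with BERT-proposed, POS-filtered candidates as described above.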
In the white-box setting, full access to the target model's parameters and features enables us to launch a "surgical" attack by crafting more accurate adversarial examples. We propose a scoring function to determine which parts of an input sentence x of n words x_i (1 ≤ i ≤ n) are more vulnerable to adversarial attacks:

F(x, θ) = Σ_{(x_h, x_m) ∈ A(y)} max( s(x_h, x_m; θ) − max_{x_h′ ≠ x_h} s(x_h′, x_m; θ), −ε ),    (4)
S(x_i, θ) = ‖ ∂F(x, θ) / ∂e_{x_i} ‖,

where θ are all the parameters of the target dependency parser, e_{x_i} is the embedding of word x_i, and ε ≥ 0 denotes a confidence margin. A larger ε will lead to a more confident output and a higher success rate, but at the cost of more iterations. The function F(x, θ) sums up the differences between the score of each ground-truth arc (x_h, x_m) and that of the incorrect but highest-scoring arc with the same dependant x_m. Generally speaking, the greater the value of this function, the harder it is to find adversarial examples for the input x, because there is a larger margin between the true parse tree and any incorrect one. Minimizing this function maximizes the probability of causing the parser to misbehave. We determine the importance of words by their values of S(x_i, θ), namely the norm of the partial derivative of the function F(x, θ) with respect to the word x_i. The key idea is that we use the magnitude of the gradient to decide which words to attack. Assuming we have a set of candidate words C_{x_i}, we select the optimal one x*_i by:

x*_i = argmin_{x′ ∈ C_{x_i}} ‖ e_{x′} − ( e_{x_i} − α · ∇_{e_{x_i}} F(x, θ) / ‖ ∇_{e_{x_i}} F(x, θ) ‖ ) ‖,    (5)

where the coefficient α governs the relative importance of the normalized gradient term. We want the selected word to be as close to the replaced one x_i as possible in the embedding space according to the Euclidean distance, where the embedding of x_i is updated in the opposite direction of the gradient at the rate α. Such a replacement will lead to a decrease in the value of the function F(x, θ). Our algorithm for generating adversarial examples for dependency parsing in the white-box setting is shown in Figure 4.

Figure 3: The algorithm for generating adversarial examples in the black-box setting.
Inputs: x_[1:n]: an input sentence of n words x_i, 1 ≤ i ≤ n; f: a target parser; γ: the maximum percentage of words that can be modified; ψ: the size of the set of candidate words.
Output: an adversarial example x′ of x.
Algorithm:
1: κ = γ · n (the maximum number of words to be modified)
2: for each word x_i except those listed in constraint (iii)
3:   x̃_i = replace x_i with a special symbol "<unk>" in x;
4:   calculate the unlabeled attachment score of f(x̃_i).
5: sort the x̃_i by their UAS, and append the top-κ positions to an ordered index list [1 : κ];
6: for each position j in the list [1 : κ]
7:   generate a set of ψ candidate words C_j by BERT;
8:   remove words from C_j if they do not have the same part-of-speech as x_j;
9:   select the word x*_j ∈ C_j that causes the greatest decrease in UAS when x_j is replaced with x*_j in x;
10:  x′ = replace x_j with the word x*_j in x.
11: return x′.
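A minimal numeric sketch of the margin function F used in the white-box attack, assuming the parser's arc-score matrix is already available. The clipping at the confidence margin ε is one plausible reading of the description, and all score values are toy numbers.

```python
import numpy as np

def margin_loss(scores, gold_heads, eps=1.0):
    """For each word m, the gap between its gold arc score and the best
    incorrect head's score, clipped below at -eps (confidence margin).
    scores[h, m] is the arc score head h -> modifier m (row 0 = ROOT).
    Smaller values mean the parser is closer to a wrong prediction."""
    total = 0.0
    for m, h in enumerate(gold_heads, start=1):
        wrong = np.delete(scores[:, m], h)  # scores of all incorrect heads
        total += max(scores[h, m] - wrong.max(), -eps)
    return total

# Toy 2-word sentence; gold heads: word1 <- ROOT, word2 <- word1.
S = np.array([[0.0, 5.0, 1.0],
              [0.0, 0.0, 4.0],
              [0.0, 2.0, 0.0]])
print(margin_loss(S, gold_heads=[0, 1], eps=1.0))  # -> 6.0
```

In the actual attack this quantity is differentiated with respect to the word embeddings, and the gradient norms rank the words to replace.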

Sentence-level and Phrase-level Attacks
For the sentence-level attack, we simply use the algorithms listed in Figures 3 and 4 to form an attack. For the phrase-level attack, we first choose two phrases (corresponding to two subtrees in a parse) from a sentence, which do not overlap each other and are separated by at least k words. Then, we try to cause the parser to make mistakes in a target subtree by modifying another one. Unlike the sentence-level attack, any error occurring outside the target subtree is not counted as a successful trial. Note that even if we can force the parser to change its prediction on the head of the target subtree's root, it is still not considered a successful attack, because the changed edge connects to a word outside the subtree.
We require that all the subtrees should contain 4 to 12 words 3 , and the source subtree to be modified and its target share no word in common. Depending on the purpose of the adversary, adversarial attacks can be divided into two categories: targeted attack and non-targeted attack. The subtree-level attack can be viewed as a targeted attack while the sentence-level attack as a non-targeted one.
A small subtree can be taken as a relatively independent structure. If a parser is robust enough, it should always give a consistent result for a target subtree, even when there are errors in another source subtree that does not overlap with the target. Therefore, we relax some constraints in the case of phrase-level attacks, and allow the words in the source subtree to be replaced with any word in the vocabulary, provided the number of modified words does not exceed a given value. With the help of these adversarial examples, we can investigate whether an error in one part of a parse tree may exert long-range influence and successfully cause cascading errors.
In the black-box setting, we first collect all the subtrees from an input sentence, and then perform trial-and-error testing with every source-target pair. For each pair, we try to modify the source subtree up to κ words (say κ = 3) by replacing them with other randomly selected words. This process is repeated until a pair is found where the UAS of the target subtree decreases.
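The black-box trial-and-error procedure for a single source-target pair might look like the following sketch; the mock UAS function and one-word vocabulary are hypothetical stand-ins for the real parser and word list.

```python
import random

def attack_pair(words, src_span, tgt_span, target_uas, vocab,
                kappa=3, trials=20, seed=0):
    """Randomly replace up to kappa words inside the source span and keep
    the first perturbation that lowers UAS measured on the target span only."""
    rng = random.Random(seed)
    base = target_uas(words, tgt_span)
    for _ in range(trials):
        perturbed = list(words)
        k = min(kappa, src_span[1] - src_span[0])
        for i in rng.sample(range(*src_span), k=k):
            perturbed[i] = rng.choice(vocab)
        if target_uas(perturbed, tgt_span) < base:
            return perturbed
    return None  # no successful trial for this pair

# Mock: target-subtree UAS drops whenever the source words are corrupted.
sent = ["traders", "buy", "stocks", "fall"]
mock = lambda ws, span: 0.9 if "zzz" not in ws else 0.4
print(attack_pair(sent, (0, 2), (2, 4), mock, vocab=["zzz"]))
# -> ['zzz', 'zzz', 'stocks', 'fall']
```

The full attack enumerates all valid source-target subtree pairs and stops at the first pair for which this search succeeds.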
In the white-box setting, we can obtain a function F(x_[t], θ) as in Equation (4) for every possible target subtree (excluding its root), and then calculate a score for each source-target pair as follows:

S(x_[s], x_[t], θ) = Σ_{x_i ∈ x_[s]} ‖ ∂F(x_[t], θ) / ∂e_{x_i} ‖,    (6)

where x_[s] denotes a source subtree and x_[t] a target one. Such scores can be used to rank the source-target pairs by their potential to deliver a successful attack. Generally, the greater the score, the more vulnerable the target subtree is to the source one. If we remove the sum from the right-hand side of (6), we obtain the norm of the partial derivative of the function F(x_[t], θ) with respect to each word x_i in the source subtree, which helps us determine which words have higher priority to be changed. For an input sentence, we successively take pairs from the list of source-target pairs in the order of their scores. For each pair, we simultaneously replace three words in the source subtree, guided by their gradients as in Equation (5). More than one word is replaced at each iteration to avoid getting stuck in a local optimum. This two-step procedure is repeated until the parser's prediction changes.

Footnote 3: …tence examples for the experiment. According to our statistics on the English PTB test set, 35.14% of sentences have two such subtrees, 17.18% have three, and 8.98% have four or more.

Figure 4: The algorithm for generating adversarial examples in the white-box setting.
Inputs: x_[1:n]: an input sentence of n words x_i, 1 ≤ i ≤ n; f: a target parser; γ: the maximum percentage of words that can be modified; ψ: the size of the set of candidate words; ξ: the maximum number of trials.
Output: an adversarial example x′ of x.
Algorithm:
1: κ = γ · n (the maximum number of words to be modified)
2: while no decrease of UAS in the latest ξ trials do
3:   select the word x_i to be replaced as in Equation (4);
4:   if the number of words to replace is greater than κ then break;
5:   generate a set of ψ candidate words C_i by BERT;
6:   remove words from C_i if they do not have the same part-of-speech as x_i;
7:   choose the word x*_i ∈ C_i to replace x_i as in Equation (5);
8:   x′ = replace x_i with the word x*_i in x.
9: return x′.
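The gradient-guided substitute selection of Equation (5) can be sketched numerically: shift the original embedding against the normalized gradient by α, then take the nearest candidate embedding. All vectors here are toy values; the default α = 15 mirrors the setting used in the experiments.

```python
import numpy as np

def pick_substitute(e_orig, grad, cand_embs, alpha=15.0):
    """Equation (5) sketch: move the original embedding against the
    normalized gradient of F, then pick the nearest candidate embedding
    by Euclidean distance; returns the index of the chosen candidate."""
    target = e_orig - alpha * grad / (np.linalg.norm(grad) + 1e-9)
    dists = np.linalg.norm(cand_embs - target, axis=1)
    return int(np.argmin(dists))

# Toy setup: the gradient points along +x, so the candidate sitting at -x
# (i.e., in the direction that decreases F) is preferred.
e = np.array([0.0, 0.0])
g = np.array([1.0, 0.0])
cands = np.array([[15.0, 0.0], [-15.0, 0.0]])
print(pick_substitute(e, g, cands))  # -> 1
```

In the real attack the candidate embeddings come from the BERT-proposed, POS-filtered word set for the position being replaced.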

Experiments
We first describe the target parser and its three variants, the evaluation dataset, and the hyper-parameter settings. We then report the empirical results of the proposed adversarial attacks and adversarial training. We also list some adversarial examples generated by our attacking algorithms in Table 5.

Target Parser and Its Variants
We choose the graph-based dependency parser proposed by Dozat and Manning (2017) as our target model. This well-known parser achieved a 95.7% unlabeled attachment score (UAS) and a 94.1% labeled attachment score (LAS) on the English PTB dataset, and close to state-of-the-art performance on standard treebanks for five other natural languages (Buchholz and Marsi, 2006).
Specifically, Dozat and Manning (2017) extend the bidirectional LSTM-based approach of Kiperwasser and Goldberg (2016) with biaffine classifiers to predict arcs and labels. They presented two variants of their model: one takes only words as input, and the other takes both the words and their POS tags. We use the Stanford POS tagger (Toutanova et al., 2003) to generate the POS tag for each word. In addition to these two, we add a new character-based variant.

Table 1: Results of sentence-level adversarial attacks on a state-of-the-art parser with the English Penn Treebank in both the black-box and white-box settings. "Word-based", "Word + POS", and "Character-based" denote three variants of the model (Dozat and Manning, 2017) with differences in their input forms. "Max%" denotes the maximum percentage of words that are allowed to be modified, "UAS" unlabeled attachment scores, "#Word" the average number of words actually modified, and "Succ%" the success rate in terms of the number of sentences.

Table 3: The attack success rate and the corresponding changes in UAS by modifying the words with different parts of speech. "JJ" denotes adjective, "NN" noun, "RB" adverb, "VB" verb, and "IN" preposition.

Datasets and Hyper-parameter Settings
We evaluate our methods on the English Penn Treebank (PTB), converted into Stanford dependencies using version 3.3.0 of the Stanford dependency converter (de Marneffe et al., 2006) 4. We follow the standard PTB split, using sections 2-21 for training, section 22 for development, and section 23 for testing.
For the target parsing models, we use the same choice of hyperparameters as Dozat and Manning (2017): 100-dimensional uncased word embeddings and POS tag vectors; three bi-directional LSTM layers (400 dimensions in each direction); and 500- and 100-dimensional ReLU MLP layers for arc and label predictions, respectively. For the character-based variant, we use 100-dimensional character vectors and a 200-dimensional LSTM. The other hyper-parameters were tuned on the PTB 3.3.0 development set by trying only a few different settings. In the following experiments, the maximum size of the candidate word set ψ was set to 50, the coefficient α in Equation (5) to 15, and the maximum number of trials to 40. For each example, we terminate the trials immediately if the drop in UAS exceeds 30% in the white-box setting.

Results of the Sentence-level Attacks
We now report the empirical studies of the sentence-level adversarial attacks. In Table 1, we present both clean accuracy and accuracy under attack on the PTB with the three variants of the parsing model (Dozat and Manning, 2017), allowing three different word replacement budgets: 5%, 10% and 15%. The success rate is defined as the number of sentences successfully modified (causing the model to make errors) divided by the total number of sentences attempted. The results show that the proposed attacks are effective. With fewer than two words perturbed on average, our white-box attack can consistently achieve a success rate above 60%.
We also observe that the word-based model is the most vulnerable to adversarial examples among the three variants. Its performance drops by 15.17% in UAS, and 77% of sentence examples admit adversarial perturbations under the white-box attack with 15% word replacement. The model taking words and their POS tags as input ("Word + POS") appears more robust against adversarial examples in both settings. One reasonable explanation is that we require the substitute words to have the same part-of-speech as the original ones, so the model can produce more consistent results with the help of the POS tags. The white-box attacks are clearly much more effective than the black-box ones across the three variants of the parsing model and the different word replacement rates.
Despite the high success rates, we want to know whether the generated examples are syntactically faithful to and coherent with the original sentences. To evaluate the quality of these adversarial examples, we randomly collected 100 sentences and their adversarial examples generated in each of the black-box and white-box settings, and presented them to three human evaluators. The evaluators were asked to examine whether each generated example still preserves the original syntactic structure. We adopted a majority vote for the results, and found that 80% of the examples generated in the white-box setting and 75% in the black-box setting are considered unchanged in their syntactic structures.
The three human evaluators are postgraduate students with at least three years of research experience in syntactic parsing. Those three annotators' pairwise-agreement percentages are 90%, 82%, and 82% for the adversarial examples generated in the white-box setting, and 93%, 85%, 84% for those generated in the black-box setting. Their average Kappa coefficients are 53.8% (white-box), and 67.3% (black-box) respectively. In Table 5, we listed five sentences and their adversarial examples generated by our algorithms each in the black-box and white-box settings, which were randomly extracted from the PTB test set.
We would also like to know which type of word is most likely to form a successful attack when modified, as in (Hashemi and Hwa, 2016). In this experiment, we only allowed replacing words belonging to a single part of speech, and additionally tried to generate adversarial examples by replacing prepositions, which is forbidden in the experiments above. It can be seen from Table 3 that the following dependencies especially suffer: prepositional, verbal and adverbial phrases. Not surprisingly, most of the errors occur with structures that are inherently hard to attach in dependency parsing.


Results of the Phrase-level Attacks
For the phrase-level attacks, we aim to study whether changes in a source subtree can alter the prediction on another target subtree (see the illustration in Figure 2). We tried two different settings: one requires the source and target subtrees to be separated by at least one word (k ≥ 1), and the other only requires that the two subtrees do not overlap (k ≥ 0). In the case of k ≥ 0, we can find 1420 sentence examples in the test set, while for k ≥ 1 there are 1340 valid examples that can be used to deliver phrase-level attacks (the PTB test set contains 2416 sentences in total). Note that all the subtrees should contain 4 to 12 words. For each source-target pair, we allow up to 3 words in the source subtree to be modified. For some sentences, adversarial examples can be generated by replacing just one or two words.
The success rate for the phrase-level attacks is defined as the number of sentences containing at least one source-target subtree pair for which a modification in the source subtree causes the model to make errors in the target subtree, divided by the number of sentences that contain at least one valid source-target subtree pair, regardless of whether the model is caused to err. It can be seen from Table 4 that with only three words perturbed, the proposed white-box attack achieves a 27.47% success rate on average across all settings. The white-box attacks are again much more effective, and take less than half the time of the black-box ones to find the most vulnerable pairs. As in the sentence-level attacks, verbal and prepositional phrases are shown to be more susceptible to such attacks.
" We 're are after a little bigger niche , " he said .
Looking ahead to other big commodity markets this week .
The centers normally usually are closed through the weekend .
But at least most part of the increase could have come from higher prices , analysts said .
Posted yields on 30 year mortgage commitments for delivery within 30 years days priced at par .
But his release within the next few months is widely highly excepted .
The most popular such shows appeals focus on narrow national concerns .
Size Breadth and weight considerations also have limited screen displays .
Columbia savings officials were not available last for comment on the downgrade .
That would be the lowest worst level since the early 1970s .

Adversarial Training
We also investigated whether our adversarial examples can aid in improving model robustness. We randomly selected 50% of the training data and generated adversarial examples from them using the algorithms listed in Figure 3 and 4. We merged these adversarial examples with the original training set. Some previous studies show that the models tend to overfit the adversarial examples, and their performance on the clean data will drop if too many adversarial examples are used. Therefore, we used a similar training strategy.
The test and adversarial performance with and without adversarial training are listed in Table 2. Under all circumstances, adversarial training improved the generalization of the models and made them less vulnerable to the attacks, while suffering little to no loss on the clean data. For example, 88.69 (column 1, row 2) is the accuracy achieved by the original model on the adversarial examples generated in the black-box setting; 90.03 (column 2, row 2) and 89.98 (column 3, row 2) are the accuracies achieved on the perturbed test data under test-time adversarial attacks by the adversarially trained models. It is clear that the robustness of the parsing models was improved by adversarial training. Furthermore, the first row of Table 2 shows that these robust models suffer little to no performance drop on the clean test data.

Conclusion
In this paper, we study the robustness of neural network-based dependency parsing models. To the best of our knowledge, adversarial examples to syntactic tasks, such as dependency parsing, have not been explored in the literature. We develop the first adversarial attack algorithms for this task to successfully find the blind spots of parsers with high success rates. Furthermore, by applying adversarial training using the proposed attacks, we are able to significantly improve the robustness of dependency parsers without sacrificing their performance on clean data.