Adversarially Regularising Neural NLI Models to Integrate Logical Background Knowledge

Adversarial examples are inputs to machine learning models designed to cause the model to make a mistake. They are useful for understanding the shortcomings of machine learning models, interpreting their results, and for regularisation. In NLP, however, most example generation strategies produce input text by using known, pre-specified semantic transformations, requiring significant manual effort and in-depth understanding of the problem and domain. In this paper, we investigate the problem of automatically generating adversarial examples that violate a set of given First-Order Logic constraints in Natural Language Inference (NLI). We reduce the problem of identifying such adversarial examples to a combinatorial optimisation problem, by maximising a quantity measuring the degree of violation of such constraints and by using a language model for generating linguistically-plausible examples. Furthermore, we propose a method for adversarially regularising neural NLI models for incorporating background knowledge. Our results show that, while the proposed method does not always improve results on the SNLI and MultiNLI datasets, it significantly and consistently increases the predictive accuracy on adversarially-crafted datasets – up to a 79.6% relative improvement – while drastically reducing the number of background knowledge violations. Furthermore, we show that adversarial examples transfer among model architectures, and that the proposed adversarial training procedure improves the robustness of NLI models to adversarial examples.


Introduction
An open problem in Artificial Intelligence is quantifying the extent to which algorithms exhibit intelligent behaviour (Levesque, 2014). In Machine Learning, a standard procedure consists in estimating the generalisation error, i.e. the prediction error over an independent test sample (Hastie et al., 2001). However, machine learning models can succeed simply by recognising patterns that happen to be predictive on instances in the test sample, while ignoring deeper phenomena (Rimell and Clark, 2009; Paperno et al., 2016).
Adversarial examples are inputs to machine learning models designed to cause the model to make a mistake (Szegedy et al., 2014; Goodfellow et al., 2014). In Natural Language Processing (NLP) and Machine Reading, generating adversarial examples is useful for understanding the shortcomings of NLP models (Jia and Liang, 2017; Kannan and Vinyals, 2017) and for regularisation (Minervini et al., 2017).
In this paper, we focus on the problem of generating adversarial examples for Natural Language Inference (NLI) models in order to gain insights about the inner workings of such systems, and regularising them. NLI, also referred to as Recognising Textual Entailment (Fyodorov et al., 2000; Condoravdi et al., 2003; Dagan et al., 2005), is a central problem in language understanding (Katz, 1972; Bos and Markert, 2005; van Benthem, 2008; MacCartney and Manning, 2009), and thus it is especially well suited to serve as a benchmark task for research in machine reading. In NLI, a model is presented with two sentences, a premise p and a hypothesis h, and the goal is to determine whether p semantically entails h.
The problem of acquiring large amounts of labelled data for NLI was addressed with the creation of the SNLI (Bowman et al., 2015) and MultiNLI (Williams et al., 2017) datasets. In these processes, annotators were presented with a premise p drawn from a corpus, and were required to generate three new sentences (hypotheses) based on p, according to the following criteria: a) Entailment: h is definitely true given p (p entails h); b) Contradiction: h is definitely not true given p (p contradicts h); and c) Neutral: h might be true given p. Given a premise-hypothesis sentence pair (p, h), an NLI model is asked to classify the relationship between p and h, i.e. either entailment, contradiction, or neutral. Solving NLI requires fully capturing the sentence meaning by handling complex linguistic phenomena like lexical entailment, quantification, co-reference, tense, belief, modality, and lexical and syntactic ambiguities (Williams et al., 2017).
In this work, we use adversarial examples for: a) identifying cases where models violate existing background knowledge, expressed in the form of logic rules, and b) training models that are robust to such violations.
The underlying idea in our work is that NLI models should adhere to a set of structural constraints that are intrinsic to the human reasoning process. For instance, contradiction is inherently symmetric: if a sentence p contradicts a sentence h, then h contradicts p as well. Similarly, entailment is both reflexive and transitive. It is reflexive since a sentence a is always entailed by (i.e. is true given) a. It is also transitive, since if a is entailed by b, and b is entailed by c, then a is entailed by c as well.
Example 1 (Inconsistency). Consider three sentences a, b and c each describing a situation, such as: a) "The girl plays", b) "The girl plays with a ball", and c) "The girl plays with a red ball". Note that if a is entailed by b, and b is entailed by c, then a is also entailed by c. If an NLI model detects that b entails a, c entails b, but c does not entail a, we know that it is making an error (since its results are inconsistent), even though we may not be aware of the sentences a, b, and c and the true semantic relationships holding between them.
Our adversarial examples are different from those used in other fields such as computer vision, where they typically consist in small, semantically invariant perturbations that result in drastic changes in the model predictions. In this paper, we propose a method for generating adversarial examples that cause a model to violate pre-existing background knowledge (Section 4), based on reducing the generation problem to a combinatorial optimisation problem. Furthermore, we outline a method for incorporating such background knowledge into models by means of an adversarial training procedure (Section 5).
Our results (Section 8) show that, even though the proposed adversarial training procedure does not appreciably improve accuracy on SNLI and MultiNLI, it yields significant relative improvement in accuracy (up to 79.6%) on adversarial datasets. Furthermore, we show that adversarial examples transfer across models, and that the proposed method allows training significantly more robust NLI models.
Let $\mathcal{S}$ denote the set of all possible sentences, and let $a = (a_1, \ldots, a_{\ell_a}) \in \mathcal{S}$ and $b = (b_1, \ldots, b_{\ell_b}) \in \mathcal{S}$ denote two input sentences, representing the premise and the hypothesis, of length $\ell_a$ and $\ell_b$, respectively. In neural NLI models, all words $a_i$ and $b_j$ are typically represented by $k$-dimensional embedding vectors $\mathbf{a}_i, \mathbf{b}_j \in \mathbb{R}^k$. As such, the sentences $a$ and $b$ can be encoded by the sentence embedding matrices $\mathbf{a} \in \mathbb{R}^{k \times \ell_a}$ and $\mathbf{b} \in \mathbb{R}^{k \times \ell_b}$, where the columns $\mathbf{a}_i$ and $\mathbf{b}_j$ respectively denote the embeddings of words $a_i$ and $b_j$.
Given two sentences $a, b \in \mathcal{S}$, the goal of an NLI model is to identify the semantic relation between $a$ and $b$, which can be either entailment, contradiction, or neutral. For this reason, given an instance, neural NLI models compute the following conditional probability distribution over all three classes:

$$p_\Theta(y \mid a, b) = \mathrm{softmax}(\mathrm{score}_\Theta(\mathbf{a}, \mathbf{b}))_y \qquad (1)$$

where $\mathrm{score}_\Theta : \mathbb{R}^{k \times \ell_a} \times \mathbb{R}^{k \times \ell_b} \to \mathbb{R}^3$ is a model-dependent scoring function with parameters $\Theta$, and $\mathrm{softmax}(x)_i = \exp\{x_i\} / \sum_j \exp\{x_j\}$ denotes the softmax function. Several scoring functions have been proposed in the literature, such as the conditional Bidirectional LSTM (cBiLSTM) (Rocktäschel et al., 2016), the Decomposable Attention Model (DAM) (Parikh et al., 2016), and the Enhanced LSTM model (ESIM) (Chen et al., 2017). One desirable quality of the scoring function $\mathrm{score}_\Theta$ is that it should be differentiable with respect to the model parameters $\Theta$, which allows the neural NLI model to be trained from data via back-propagation.
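As a small illustration of Eq. (1), the sketch below turns three raw scores into class probabilities. It assumes the scores were already produced by some scoring function; the label order in `LABELS` and the function names are illustrative assumptions and are not taken from any of the cited models.

```python
import math

# Assumed label order; the paper does not fix one.
LABELS = ("entailment", "contradiction", "neutral")


def softmax(scores):
    """Numerically stable softmax over a list of real-valued scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores):
    """Return the most probable label together with its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


label, prob = predict([2.0, 0.5, -1.0])
print(label, round(prob, 3))
```

Subtracting the maximum score before exponentiating leaves the result unchanged but avoids overflow for large scores.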
Model Training. Let $\mathcal{D} = \{(x_1, y_1), \ldots, (x_m, y_m)\}$ represent an NLI dataset, where $x_i$ denotes the $i$-th premise-hypothesis sentence pair, and $y_i \in \{1, \ldots, K\}$ their relationship, where $K \in \mathbb{N}$ is the number of possible relationships (in the case of NLI, $K = 3$). The model is trained by minimising a cross-entropy loss $J_D$ on $\mathcal{D}$:

$$J_D(\mathcal{D}, \Theta) = -\sum_{i=1}^{m} \sum_{k=1}^{K} \mathbb{1}\{y_i = k\} \log \hat{y}_{i,k} \qquad (2)$$

where $\hat{y}_{i,k} = p_\Theta(y_i = k \mid x_i)$ denotes the probability of class $k$ on the instance $x_i$ inferred by the neural NLI model as in Eq. (1).
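The cross-entropy loss can be sketched in a few lines. The version below assumes the loss is summed (not averaged) over instances and uses 0-indexed class labels, whereas the text indexes classes from 1 to K; the function name is hypothetical.

```python
import math


def cross_entropy(probs, labels):
    """Sum over instances of -log p(gold class).

    probs[i][k] is the predicted probability of class k on instance i;
    labels[i] is the 0-indexed gold class of instance i.
    """
    return -sum(math.log(p[y]) for p, y in zip(probs, labels))


# Two toy instances with three classes each.
loss = cross_entropy([[0.5, 0.25, 0.25], [0.1, 0.8, 0.1]], [0, 1])
print(round(loss, 4))
```

Only the probability assigned to the gold class contributes, since the indicator in the loss is zero for every other class.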
In the following, we analyse the behaviour of neural NLI models by means of adversarial examples, i.e. inputs to machine learning models designed to cause the model to commit mistakes. In computer vision models, adversarial examples are created by adding a very small amount of noise to the input (Szegedy et al., 2014; Goodfellow et al., 2014): these perturbations do not change the semantics of the images, but they can drastically change the predictions of computer vision models. In our setting, we define an adversary whose goal is finding sets of NLI instances where the model fails to be consistent with available background knowledge, encoded in the form of First-Order Logic (FOL) rules. In the following sections, we define the corresponding optimisation problem, and propose an efficient solution.

Background Knowledge
For analysing the behaviour of NLI models, we verify whether they agree with the provided background knowledge, encoded by a set of FOL rules. Note that the three NLI classes (entailment, contradiction, and neutrality) can be seen as binary logic predicates, and we can define FOL formulas for describing the formal relationships that hold between them.
In the following, we denote the predicates associated with entailment, contradiction, and neutrality as ent, con, and neu, respectively. By doing so, we can represent semantic relationships between sentences via logic atoms. For instance, given three sentences $s_1, s_2, s_3 \in \mathcal{S}$, we can represent the fact that $s_1$ entails $s_2$ and $s_2$ contradicts $s_3$ by using the logic atoms ent($s_1$, $s_2$) and con($s_2$, $s_3$).

Table 1: First-Order Logic rules defining desired properties of NLI models: $X_i$ are universally quantified variables, and operators ∧, ¬, and ⊤ denote logic conjunction, negation, and tautology.
Let $X_1, \ldots, X_n$ be a set of universally quantified variables. We define our background knowledge as a set of FOL rules, each having the following body ⇒ head form:

$$\text{body}(X_1, \ldots, X_n) \Rightarrow \text{head}(X_1, \ldots, X_n) \qquad (3)$$

where body and head represent the premise and the conclusion of the rule: if body holds, head holds as well. In the following, we consider the rules $R_1, \ldots, R_5$ outlined in Table 1. Rule $R_1$ enforces the constraint that entailment is reflexive; rule $R_2$ that contradiction should always be symmetric (if $s_1$ contradicts $s_2$, then $s_2$ contradicts $s_1$ as well); rule $R_5$ that entailment is transitive; while rules $R_3$ and $R_4$ describe the formal relationships between the entailment, neutral, and contradiction relations.
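In Section 4 we propose a method to automatically generate sets of sentences that violate the rules outlined in Table 1, effectively generating adversarial examples. Then, in Section 5 we show how we can leverage such adversarial examples by generating them on-the-fly during training and using them for regularising the model parameters, in an adversarial training regime.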

Generating Adversarial Examples
In this section, we propose a method for efficiently generating adversarial examples for NLI models, i.e. examples that make the model violate the background knowledge outlined in Section 3.

Inconsistency Loss
We cast the problem of generating adversarial examples as an optimisation problem. In particular, we propose a continuous inconsistency loss that measures the degree to which a set of sentences causes a model to violate a rule.
Example 2 (Inconsistency Loss). Consider the rule $R_2$ in Table 1, i.e. con($X_1$, $X_2$) ⇒ con($X_2$, $X_1$). Let $s_1, s_2 \in \mathcal{S}$ be two sentences: this rule is violated if, according to the model, a sentence $s_1$ contradicts $s_2$, but $s_2$ does not contradict $s_1$. However, if we just use the final decision made by the neural NLI model, we can only check whether the rule is violated by two given sentences, without any information on the degree of such a violation.
Intuitively, for the rule being maximally violated, the conditional probability associated to con($s_1$, $s_2$) should be very high (≈ 1), while the one associated to con($s_2$, $s_1$) should be very low (≈ 0). We can measure the extent to which the rule is violated, which we refer to as inconsistency loss $J_I$, by checking whether the probability of the body of the rule is higher than the probability of its head:

$$J_I(S = \{X_1 \to s_1, X_2 \to s_2\}) = [p_\Theta(\text{con} \mid s_1, s_2) - p_\Theta(\text{con} \mid s_2, s_1)]_+$$

where $S$ is a substitution set that maps the variables $X_1$ and $X_2$ in $R_2$ to the sentences $s_1$ and $s_2$, $[x]_+ = \max(0, x)$, and $p_\Theta(\text{con} \mid s_i, s_j)$ is the (conditional) probability that $s_i$ contradicts $s_j$ according to the neural NLI model. Note that, in accordance with the logic implication, the inconsistency loss reaches its global minimum when the probability of the body is close to zero (i.e. the premise is false), and when the probabilities of both the body and the head are close to one (i.e. the premise and the conclusion are both true).
We now generalise the intuition in Ex. 2 to any FOL rule. Let $r = (\text{body} \Rightarrow \text{head})$ denote an arbitrary FOL rule in the form described in Eq. (3), and let $\text{vars}(r) = \{X_1, \ldots, X_n\}$ denote the set of universally quantified variables in the rule $r$.
Furthermore, let $S = \{X_1 \to s_1, \ldots, X_n \to s_n\}$ denote a substitution set, i.e. a mapping from variables in $\text{vars}(r)$ to sentences $s_1, \ldots, s_n \in \mathcal{S}$. The inconsistency loss associated with the rule $r$ on the substitution set $S$ can be defined as:

$$J_I(S) = [p(S; \text{body}) - p(S; \text{head})]_+ \qquad (4)$$

where $p(S; \text{body})$ and $p(S; \text{head})$ denote the probability of body and head of the rule, after replacing the variables in $r$ with the corresponding sentences in $S$. The motivation for the loss in Eq. (4) is that logic implications can be understood as "whenever the body is true, the head has to be true as well". In terms of NLI models, this translates as "the probability of the head should at least be as large as the probability of the body".
For calculating the inconsistency loss in Eq. (4), we need to specify how to calculate the probability of head and body. The probability of a single ground atom is given by querying the neural NLI model, as in Eq. (1). The head contains a single atom, while the body can be a conjunction of multiple atoms. Similarly to Minervini et al. (2017), we use the Gödel t-norm, a continuous generalisation of the conjunction operator in logic (Gupta and Qi, 1991), for computing the probability of the body of a clause:

$$p(S; a_1 \wedge a_2) = \min\{p(S; a_1), p(S; a_2)\}$$

where $a_1$ and $a_2$ are two clause atoms.
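The inconsistency loss and the Gödel t-norm can be sketched as follows. The probabilities are passed in directly as numbers; in practice they would come from querying an NLI model. All function names here are hypothetical, and only rules with a positive head atom (such as the symmetry and transitivity rules) are covered.

```python
def godel_and(*atom_probs):
    """Goedel t-norm: the probability of a conjunction is the minimum."""
    return min(atom_probs)


def inconsistency_loss(p_body, p_head):
    """[p(body) - p(head)]_+ : positive only if the body is more probable."""
    return max(0.0, p_body - p_head)


def symmetry_loss(p_con_12, p_con_21):
    """Loss for con(X1, X2) => con(X2, X1)."""
    return inconsistency_loss(p_con_12, p_con_21)


def transitivity_loss(p_ent_12, p_ent_23, p_ent_13):
    """Loss for ent(X1, X2) AND ent(X2, X3) => ent(X1, X3)."""
    return inconsistency_loss(godel_and(p_ent_12, p_ent_23), p_ent_13)


# The model is confident that s1 contradicts s2, but not the reverse.
print(round(symmetry_loss(0.9, 0.1), 2))
```

The loss is zero whenever the head is at least as probable as the body, which matches the reading of the implication given above.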
In this work, we cast the problem of generating adversarial examples as an optimisation problem: we search for the substitution set $S = \{X_1 \to s_1, \ldots, X_n \to s_n\}$ that maximises the inconsistency loss in Eq. (4), thus (maximally) violating the available background knowledge.

Constraining via Language Modelling
Maximising the inconsistency loss in Eq. (4) may not be sufficient for generating meaningful adversarial examples: they can lead neural NLI models to violate available background knowledge, but they may not be well-formed and meaningful.
For this reason, in addition to maximising the inconsistency loss, we also constrain the perplexity of generated sentences by using a neural language model (Bengio et al., 2000). In this work, we use an LSTM (Hochreiter and Schmidhuber, 1997) neural language model $p_L(w_1, \ldots, w_t)$ for generating low-perplexity adversarial examples.

Searching in a Discrete Space
As mentioned earlier in this section, we cast the problem of automatically generating adversarial examples, i.e. examples that cause NLI models to violate available background knowledge, as an optimisation problem. Specifically, we look for substitution sets $S = \{X_1 \to s_1, \ldots, X_n \to s_n\}$ that jointly: a) maximise the inconsistency loss described in Eq. (4), and b) are composed of sentences with a low perplexity, as defined by the neural language model in Section 4.2.
The search objective can be formalised by the following optimisation problem:

$$\underset{S}{\text{maximise}} \; J_I(S) \quad \text{subject to} \quad \log p_L(S) \le \tau \qquad (5)$$

where $\log p_L(S)$ denotes the log-probability of the sentences in the substitution set $S$, and $\tau$ is a threshold on the perplexity of generated sentences.
For generating low-perplexity adversarial examples, we take inspiration from Guu et al. (2017) and generate the sentences by editing prototypes extracted from a corpus. Specifically, for searching substitution sets whose sentences jointly have a high probability and are highly adversarial, as measured by the inconsistency loss in Eq. (4), we use the following procedure: a) we first sample sentences close to the data manifold (i.e. with a low perplexity), by either sampling from the training set or from the language model; b) we then make small variations to the sentences (analogous to adversarial images, which consist in small perturbations of training examples) so as to optimise the objective in Eq. (5).
When editing prototypes, we consider the following perturbations: a) change one word in one of the input sentences; b) remove one parse subtree from one of the input sentences; c) insert one parse subtree from a sentence in the corpus into the parse tree of one of the input sentences.
Note that the generation process can easily lead to ungrammatical or implausible sentences; however, these will be likely to have a high perplexity according to the language model (Section 4.2), and thus they will be ruled out by the search algorithm.
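The search procedure can be sketched as "perturb, filter, re-rank". The sketch below implements only the single-word substitution perturbation and treats the language model and the inconsistency loss as caller-supplied functions; `perplexity`, `inconsistency`, the vocabulary and all function names are hypothetical stand-ins, not the authors' implementation.

```python
def word_substitutions(sentence, vocabulary):
    """All sentences obtained by changing exactly one word of `sentence`."""
    words = sentence.split()
    variants = []
    for i, word in enumerate(words):
        for candidate in vocabulary:
            if candidate != word:
                variants.append(" ".join(words[:i] + [candidate] + words[i + 1:]))
    return variants


def most_adversarial(seed_pair, vocabulary, inconsistency, perplexity, tau):
    """Perturb one sentence of the pair, drop implausible candidates,
    and return the candidate pair with the highest inconsistency loss."""
    s1, s2 = seed_pair
    candidates = [(v, s2) for v in word_substitutions(s1, vocabulary)]
    candidates += [(s1, v) for v in word_substitutions(s2, vocabulary)]
    # Keep only pairs whose sentences all pass the perplexity threshold.
    plausible = [pair for pair in candidates
                 if all(perplexity(s) <= tau for s in pair)]
    if not plausible:
        return None
    return max(plausible, key=inconsistency)


# Toy stand-ins: "zzz" is implausible; pairs score higher when the first
# sentence mentions "red".
toy_perplexity = lambda s: 100.0 if "zzz" in s else 1.0
toy_inconsistency = lambda pair: pair[0].count("red")
best = most_adversarial(("a ball", "a girl"), ["red", "zzz"],
                        toy_inconsistency, toy_perplexity, tau=10.0)
print(best)
```

Filtering before re-ranking means ungrammatical candidates never compete on the inconsistency score, which is the role the language model plays in the text.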

Adversarial Regularisation
We now show how one can use the adversarial examples to regularise the training process. We propose training NLI models by jointly: a) minimising the data loss (Eq. (2)), and b) minimising the inconsistency loss (Eq. (4)) on a set of generated adversarial examples (substitution sets).
More formally, for training, we jointly minimise the cross-entropy loss defined on the data $J_D(\Theta)$ and the inconsistency loss on a set of generated adversarial examples $\max_S J_I(S; \Theta)$, resulting in the following optimisation problem:

$$\underset{\Theta}{\text{minimise}} \; J_D(\mathcal{D}, \Theta) + \lambda \max_S J_I(S; \Theta) \quad \text{subject to} \quad \log p_L(S) \le \tau \qquad (6)$$

where $\lambda \in \mathbb{R}_+$ is a hyperparameter specifying the trade-off between the data loss $J_D$ (Eq. (2)), and the inconsistency loss $J_I$ (Eq. (4)), measured on the generated substitution set $S$.
In Eq. (6), the regularisation term $\max_S J_I(S; \Theta)$ has the task of generating the adversarial substitution sets by maximising the inconsistency loss. Furthermore, the constraint $\log p_L(S) \le \tau$ ensures that the perplexity of generated sentences is lower than a threshold $\tau$. For this work, we used the max aggregation function. However, other functions can be used as well, such as the sum or mean of multiple inconsistency losses.
For minimising the regularised loss in Eq. (6), we alternate between two optimisation processes: generating the adversarial examples (Eq. (5)) and minimising the regularised loss (Eq. (6)). The algorithm is outlined in Algorithm 1. At each iteration, after generating a set of adversarial examples $S$, it computes the gradient of the regularised loss in Eq. (6), and updates the model parameters via a gradient descent step. On line 6, the algorithm generates a set of adversarial examples, each in the form of a substitution set $S$. On line 9, the algorithm computes the gradient of the adversarially regularised loss, a weighted combination of the data loss in Eq. (2) and the inconsistency loss in Eq. (4). The model parameters are finally updated on line 11 via a gradient descent step.

Algorithm 1 Solving the optimisation problem in Eq. (6) via Mini-Batch Gradient Descent
Require: Dataset $\mathcal{D}$, weight $\lambda \in \mathbb{R}_+$
Require: No. of epochs $\tau \in \mathbb{N}_+$
Require: No. of adv. substitution sets $n_a \in \mathbb{N}_+$
 1: // Initialise the model parameters $\Theta$
 2: $\Theta \leftarrow \text{initialise}()$
 3: for $i \in \{1, \ldots, \tau\}$ do
 4:   for $D_j \in \text{batches}(\mathcal{D})$ do
 5:     // Generate the adv. substitution sets $S_i$
 6:     $\{S_1, \ldots, S_{n_a}\} \leftarrow \text{generate}(D_j)$
 7:     // Compute the gradient of Eq. (6)
 8:     $L \leftarrow J_D(D_j, \Theta) + \lambda \sum_{k=1}^{n_a} J_I(S_k; \Theta)$
 9:     $g \leftarrow \nabla_\Theta L$
10:     // Update the model parameters
11:     $\Theta \leftarrow \Theta - \eta g$
12:   end for
13: end for
14: return $\Theta$
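The regularised objective combines the data loss with an aggregate of the inconsistency losses on the generated substitution sets. The sketch below makes the aggregation function explicit, since the text mentions max, sum and mean as options; the function name and the handling of an empty list are assumptions made for illustration.

```python
def regularised_loss(data_loss, inconsistency_losses, lam, aggregate=max):
    """J_D + lam * aggregate(J_I over the adversarial substitution sets)."""
    if not inconsistency_losses:
        # Assumption: with no adversarial sets, only the data loss remains.
        return data_loss
    return data_loss + lam * aggregate(inconsistency_losses)


# Same batch, two aggregation choices.
with_max = regularised_loss(1.0, [0.2, 0.5], lam=0.1)
with_sum = regularised_loss(1.0, [0.2, 0.5], lam=0.1, aggregate=sum)
print(round(with_max, 2), round(with_sum, 2))
```

With the sum aggregation the penalty grows with the number of substitution sets, so the same value of the weight has a stronger effect than under max.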
Premise: A man in a suit walks through a train station.
Hypothesis: Two boys ride skateboard.
Type: Contradiction

Premise: Two boys ride skateboard.
Hypothesis: A man in a suit walks through a train station.
Type: Contradiction

Premise: Two people are surfing in the ocean.
Hypothesis: There are people outside.
Type: Entailment

Premise: There are people outside.
Hypothesis: Two people are surfing in the ocean.
Type: Neutral

Table 2: Sample sentences from an Adversarial NLI Dataset generated using the DAM model, by maximising the inconsistency loss $J_I$.

Creating Adversarial NLI Datasets
We crafted a series of datasets for assessing the robustness of the proposed regularisation method to adversarial examples. Starting from the SNLI test set, we proceeded as follows. We selected the $k$ instances in the SNLI test set that maximise the inconsistency loss in Eq. (4) with respect to the rules $R_1$, $R_2$, $R_3$, and $R_4$ in Table 1. We refer to the generated datasets as $A^k_m$, where $m$ identifies the model used for selecting the sentence pairs, and $k$ denotes the number of examples in the dataset.
For generating each of the $A^k_m$ datasets, we proceeded as follows.
Let $\mathcal{D} = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ be an NLI dataset (such as SNLI), where each instance $x_i = (p_i, h_i)$ is a premise-hypothesis sentence pair, and $y_i$ denotes the relationship holding between $p_i$ and $h_i$. For each instance $x_i = (p_i, h_i)$, we consider two substitution sets:

$$S_i = \{X_1 \to p_i, X_2 \to h_i\} \quad \text{and} \quad S'_i = \{X_1 \to h_i, X_2 \to p_i\}$$

each corresponding to a mapping from variables to sentences.
We compute the inconsistency score associated to each instance $x_i$ in the dataset $\mathcal{D}$ as $J_I(S_i) + J_I(S'_i)$. Note that the inconsistency score only depends on the premise $p_i$ and hypothesis $h_i$ in each instance $x_i$, and it does not depend on its label $y_i$.
After computing the inconsistency scores for all sentence pairs in $\mathcal{D}$ using a model $m$, we select the $k$ instances with the highest inconsistency score, create two instances $x_i = (p_i, h_i)$ and $\hat{x}_i = (h_i, p_i)$, and add both $(x_i, y_i)$ and $(\hat{x}_i, \hat{y}_i)$ to the dataset $A^k_m$. Note that, while $y_i$ is already known from the dataset $\mathcal{D}$, $\hat{y}_i$ is unknown. For this reason, we find $\hat{y}_i$ by manual annotation.
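The selection step can be sketched as a top-k ranking by the summed inconsistency loss of the two orderings of each pair. The function `loss(x1, x2)` is a hypothetical stand-in for the inconsistency loss of the substitution set mapping $X_1$ to `x1` and $X_2$ to `x2`; labels are left out because the label of the reversed pair has to be annotated manually.

```python
import heapq


def select_adversarial(pairs, loss, k):
    """Return the k highest-scoring (premise, hypothesis) pairs together
    with their reversed counterparts."""
    def score(pair):
        premise, hypothesis = pair
        return loss(premise, hypothesis) + loss(hypothesis, premise)

    selected = []
    for premise, hypothesis in heapq.nlargest(k, pairs, key=score):
        selected.append((premise, hypothesis))
        selected.append((hypothesis, premise))  # label needs manual annotation
    return selected


# Toy loss: larger when the first sentence is longer than the second.
toy_loss = lambda a, b: max(0, len(a) - len(b))
picked = select_adversarial([("aaaa", "a"), ("aa", "a"), ("a", "a")], toy_loss, k=1)
print(picked)
```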

Related Work
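Adversarial examples are receiving considerable attention in NLP; their usage, however, is considerably limited by the fact that semantically invariant input perturbations in NLP are difficult to identify (Buck et al., 2017). Jia and Liang (2017) analyse the robustness of extractive question answering models on examples obtained by adding adversarially generated distracting text to SQuAD (Rajpurkar et al., 2016) dataset instances. Belinkov and Bisk (2017) also notice that character-level Machine Translation models are overly sensitive to random character manipulations, such as typos. Hosseini et al. (2017) show that simple character-level modifications can drastically change the toxicity score of a text. Iyyer et al. (2018) propose using paraphrasing for generating adversarial examples. Our model is fundamentally different in two ways: a) it does not need labelled data for generating adversarial examples, since the inconsistency loss can be maximised by just making an NLI model produce inconsistent results, and b) it incorporates adversarial examples during the training process, with the aim of training more robust NLI models.
Adversarial examples are also used for assessing the robustness of computer vision models (Szegedy et al., 2014; Goodfellow et al., 2014; Nguyen et al., 2015), where they are created by adding a small amount of noise to the inputs that does not change the semantics of the images, but drastically changes the model predictions.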

Experiments
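We trained DAM, ESIM and cBiLSTM on the SNLI corpus using the hyperparameters provided in the respective papers. The results obtained by such models on the SNLI and MultiNLI validation and test sets are reported in Table 3. In the case of MultiNLI, the validation set was obtained by removing 10,000 instances from the training set (originally composed of 392,702 instances), and the test set consists of the matched validation set.
Background Knowledge Violations. As a first experiment, we count how likely our model is to violate rules $R_1$, $R_2$, $R_3$, $R_4$ in Table 1.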
In Table 4 we report the number of sentence pairs in the SNLI training set where DAM, ESIM and cBiLSTM violate $R_1$, $R_2$, $R_3$, $R_4$. In the |B| column we report the number of times the body of the rule holds, according to the model. In the |B ∧ ¬H| column we report the number of times where the body of the rule holds, but the head does not, which is clearly a violation of available rules.
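The two counts can be illustrated for the symmetry-of-contradiction rule. The sketch below assumes a hypothetical `predict(s1, s2)` function returning the model's hard label for the ordered pair; it returns the number of pairs where the body holds and the number where the body holds but the head does not.

```python
def count_symmetry_violations(pairs, predict):
    """Return (|B|, |B and not H|) for con(X1, X2) => con(X2, X1)."""
    body_holds = 0
    violations = 0
    for s1, s2 in pairs:
        if predict(s1, s2) == "contradiction":
            body_holds += 1
            if predict(s2, s1) != "contradiction":
                violations += 1
    return body_holds, violations


# Toy predictions stored in a lookup table instead of a real model.
toy_predictions = {
    ("a", "b"): "contradiction", ("b", "a"): "neutral",
    ("c", "d"): "contradiction", ("d", "c"): "contradiction",
    ("e", "f"): "entailment", ("f", "e"): "neutral",
}
toy_predict = lambda s1, s2: toy_predictions[(s1, s2)]
body, violated = count_symmetry_violations([("a", "b"), ("c", "d"), ("e", "f")], toy_predict)
print(body, violated)  # prints: 2 1
```

Dividing the second count by the first gives a violation rate conditioned on the body holding, which is one way to read percentages of the kind reported for this rule.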
We can see that, in the case of rule $R_1$ (reflexivity of entailment), DAM and ESIM make a relatively low number of violations, namely 0.09% and 1.00%, respectively. However, in the case of cBiLSTM, a sentence $s \in \mathcal{S}$ in the SNLI training set fails to entail itself 23.76% of the time, which violates our background knowledge.
With respect to $R_2$ (symmetry of contradiction), we see that none of the models is completely consistent with the available background knowledge. Given a sentence pair $s_1, s_2 \in \mathcal{S}$ from the SNLI training set, if, according to the model, $s_1$ contradicts $s_2$, a significant number of times (between 9.84% and 46.17%) the same model also infers that $s_2$ does not contradict $s_1$. This phenomenon happens 16.70% of the time with DAM, 9.84% with ESIM, and 46.17% with cBiLSTM: this indicates that all considered models are prone to violating $R_2$ in their predictions, with ESIM being the most robust.
In Appendix A.2 we report several examples of such violations in the SNLI training set. We select those that maximise the inconsistency loss described in Eq. (4), violating rules $R_2$ and $R_3$. We can notice that the presence of inconsistencies is often correlated with the length of the sentences. The model tends to detect entailment relationships between longer (i.e., possibly more specific) and shorter (i.e., possibly more general) sentences.

Generation of Adversarial Examples
In the following, we analyse the automatic generation of sets of adversarial examples that make the model violate the existing background knowledge. We search in the space of sentences by applying perturbations to sampled sentence pairs, using a language model for guiding the search process. The generation procedure is described in Section 4.
The procedure was especially effective in generating adversarial examples; a sample is shown in Table 6. We can notice that, even though DAM and ESIM achieve results close to human-level performance on SNLI, they are likely to fail when faced with linguistic phenomena such as negation, hyponymy, and antonymy. Gururangan et al. (2018) recently showed that NLI datasets tend to suffer from annotation artefacts and limited linguistic variations: this allows NLI models to achieve nearly-human performance by capturing repetitive patterns and idiosyncrasies in a dataset, without being able to effectively capture textual entailment. This is visible, for instance, in example 5 of Table 6, where the model fails to capture the hyponymy relation between "male" and "man", incorrectly predicting an entailment in place of a neutral relationship. Furthermore, it is clear that models lack commonsense knowledge, such as the relation between "pushing" and "carrying" (example 1), and being outside and swimming (example 2). Generating such adversarial examples provides us with useful insights on the inner workings of neural NLI models, that can be leveraged for improving the robustness of state-of-the-art models.

Adversarial Regularisation
We evaluated whether our approach for integrating logical background knowledge via adversarial training (Section 5) is effective at reducing the number of background knowledge violations, without reducing the predictive accuracy of the model. We started with pre-trained DAM, ESIM, and cBiLSTM models, trained using the hyperparameters published in their respective papers.
After training, each model was then fine-tuned for 10 epochs, by minimising the adversarially regularised loss function introduced in Eq. (6). Table 3 shows results on the SNLI and MultiNLI development and test sets, while Fig. 1 shows the number of violations for different values of $\lambda$, where regularised models are much more likely to make predictions that are consistent with the available background knowledge.
We can see that, despite the drastic reduction of background knowledge violations, the improvement in accuracy on SNLI and MultiNLI may not be significant, supporting the idea that models achieving close-to-human performance on these datasets may be capturing annotation artefacts and idiosyncrasies in them (Gururangan et al., 2018).
Evaluation on Adversarial Datasets. We evaluated the proposed approach on 9 adversarial datasets $A^k_m$, with $k \in \{100, 500, 1000\}$, generated following the procedure described in Section 6; results are summarised in Table 5. We can see that the proposed adversarial training method significantly increases the accuracy on the adversarial test sets. For instance, consider $A^{100}_{\text{DAM}}$: prior to regularising ($\lambda = 0$), DAM achieves a very low accuracy on this dataset, i.e. 47.4%. By increasing the regularisation parameter $\lambda \in \{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}\}$, we noticed appreciable accuracy increases, yielding relative accuracy improvements up to 75.8% in the case of DAM, and 79.6% in the case of cBiLSTM.
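For readers less familiar with the term, a relative improvement is the accuracy gain divided by the baseline accuracy. The sketch below assumes this standard definition; the function name is hypothetical and the numbers are made up for illustration, not taken from Table 5.

```python
def relative_improvement(before, after):
    """Relative change of `after` with respect to `before`, as a fraction."""
    if before <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (after - before) / before


# Hypothetical accuracies: 50.0% before regularisation, 75.0% after.
gain = relative_improvement(50.0, 75.0)
print(f"{gain:.1%}")  # prints: 50.0%
```

Under this definition, a large relative improvement on a low baseline can correspond to a more modest absolute gain, which is worth keeping in mind when reading the adversarial-dataset results.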
From Table 5 we can notice that adversarial examples transfer across different models: an unregularised model is also likely to perform poorly on adversarial datasets generated by using different models, with ESIM being the most robust model to adversarially generated examples.
Furthermore, we can see that regularised models are generally more robust to adversarial examples, even when those were generated using different model architectures. For instance, we can see that, while cBiLSTM is also vulnerable to adversarial examples generated using DAM and ESIM, its adversarially regularised version cBiLSTM$^{AR}$ is generally more robust to any sort of adversarial examples.

Conclusions
In this paper, we investigated the problem of automatically generating adversarial examples that violate a set of given First-Order Logic constraints in NLI. We reduced the problem of identifying such adversarial examples to an optimisation problem, by maximising a continuous relaxation of the violation of such constraints, and by using a language model for generating linguistically-plausible examples. Furthermore, we proposed a method for adversarially regularising neural NLI models for incorporating background knowledge.
Our results showed that the proposed method consistently yields significant increases to the predictive accuracy on adversarially-crafted datasets (up to a 79.6% relative improvement) while drastically reducing the number of background knowledge violations. Furthermore, we showed that adversarial examples transfer across model architectures, and the proposed adversarial training procedure produces generally more robust models.
The source code and data for reproducing our results are available online, at https://github.com/uclmr/adversarial-nli/.

A.4 Optimisation algorithms
In Algorithm 2 we describe our algorithm for generating adversarial examples by perturbing sentences in a dataset, and using a language model for constraining the generation process. In Algorithm 3 we describe our adversarial training algorithm: it solves a minimax problem, where first a set of adversarial examples is generated by maximising the inconsistency loss $J_I$. Then, the model is trained by jointly minimising the data loss $J_D$ and inconsistency loss on the generated adversarial examples.

Algorithm 2 Generation of Adversarial Sentences via Stochastic Perturbation Re-ranking
Require: Perplexity threshold $\tau \in \mathbb{R}_+$
  // Sample seed sentences from the dataset
  $S \leftarrow \text{sample}(\mathcal{D})$
  // Generate a set of candidates, excluding the ones with a perplexity higher than $\tau$
  $P \leftarrow \{\tilde{S} \in \text{perturb}(S) \mid \log p_L(\tilde{S}) \le \tau\}$
  // Return the perturbations that maximise the inconsistency loss $J_I$
  return $\arg\max_{\tilde{S} \in P} J_I(\tilde{S})$

Algorithm 3 Solving the optimisation problem in Eq. (6) via Mini-Batch Gradient Descent
Require: Dataset $\mathcal{D}$, weight $\lambda \in \mathbb{R}_+$
Require: No. of epochs $\tau \in \mathbb{N}_+$
Require: No. of adv. substitution sets $n_a \in \mathbb{N}_+$
 1: // Initialise the model parameters $\Theta$
 2: $\Theta \leftarrow \text{initialise}()$
 3: for $i \in \{1, \ldots, \tau\}$ do
 4:   for $D_j \in \text{batches}(\mathcal{D})$ do
 5:     // Generate the adv. substitution sets $S_i$
 6:     $\{S_1, \ldots, S_{n_a}\} \leftarrow \text{generate}(D_j)$
 7:     // Compute the gradient of Eq. (6)
 8:     $L \leftarrow J_D(D_j, \Theta) + \lambda \sum_{k=1}^{n_a} J_I(S_k; \Theta)$
 9:     $g \leftarrow \nabla_\Theta L$
10:     // Update the model parameters
11:     $\Theta \leftarrow \Theta - \eta g$
12:   end for
13: end for
14: return $\Theta$

Figure 1: Number of violations (%) to rules in Table 1 made by ESIM on the SNLI test set.

Table 3: Accuracy on the SNLI and MultiNLI datasets with different neural NLI models before (left) and after (right) adversarial regularisation.

Table 4: Violations (%) of rules $R_1$, $R_2$, $R_3$, $R_4$ from Table 1 on the SNLI training set, yielded by cBiLSTM, DAM, and ESIM.

Table 5: Accuracy of unregularised and regularised neural NLI models DAM, cBiLSTM, and ESIM, and their adversarially regularised versions DAM$^{AR}$, cBiLSTM$^{AR}$, and ESIM$^{AR}$, on the datasets $A^k_m$.

Table 6: Inconsistent results produced by DAM on automatically generated adversarial examples. The notation segment one segment two denotes that the corruption process removes "segment one" and introduces "segment two" in the sentence, and $s_1 \xrightarrow{p} s_2$ indicates that DAM classifies the relation between $s_1$ and $s_2$ as contradiction, with probability $p$. We use different colours for representing the contradiction, entailment and neutral classes. Examples 1, 2, 3, and 4 violate the rule $R_2$, while example 5 violates the rule $R_5$. The annotation ".00 .99" indicates that the corruption process increases the inconsistency loss from .00 to .99, and the red boxes are used for indicating mistakes made by the model on the adversarial examples.

Table 8: Inconsistent results produced by DAM on adversarial examples generated using the discrete search procedure described in Section 4.3. The pattern segment one segment two denotes that the corruption process replaced "segment one" with "segment two". Examples {1, 2, 3} (resp. {4, 5, 6}) violate the rule $R_2$ (resp. $R_4$), while examples {7, 8, 9} violate the logic rule in $R_5$.

A.3 Background Knowledge Violations
In the following we report the number of violations (%) to rules in Table 1 made by DAM, ESIM, and cBiLSTM on the SNLI test set.