A Corpus of Negations and their Underlying Positive Interpretations

Negation often conveys implicit positive meaning. In this paper, we present a corpus of negations and their underlying positive interpretations. We work with negations from Simple Wikipedia, automatically generate potential positive interpretations, and then collect manual annotations that effectively rewrite the negation in positive terms. This procedure yields positive interpretations for approximately 77% of negations, and the final corpus includes over 5,700 negations and over 5,900 positive interpretations. We also present baseline results using seq2seq neural models.


Introduction
Negation is present in every human language. It is in the first place a phenomenon of semantic opposition. As such, negation relates an expression e to another expression with a meaning that is in some way opposed to the meaning of e (Horn and Wansing, 2015). Sentences containing negation are generally (a) less informative than affirmative ones (e.g., Milan is not the capital of Italy vs. Rome is the capital of Italy), (b) morphosyntactically more marked (all languages have negative markers while only a few have affirmative markers), and (c) psychologically more complex and harder to process (Horn and Wansing, 2015).
Negation often conveys implicit positive meanings (Rooth, 1992). This meaning ranges from implicatures to entailments, and we refer to it as positive interpretations. Consider the following text from Simple Wikipedia (https://simple.wikipedia.org/wiki/Abjad): An abjad is an alphabet in which all its letters are consonants. Though vowels can be added in some abjads, they are not needed to write a word correctly. Some examples of abjads are the Arabic alphabet and the Hebrew alphabet. Humans intuitively understand that the negation (second sentence) implies the following positive interpretation: Though vowels can be added in some abjads, only consonants are needed to write a word correctly.

Table 1: Sentences containing negation and their underlying positive interpretations.
1. Negation: Mr. Smith apologized for not getting involved.
   Positive interpretation: Mr. Smith apologized for staying passive.
2. Negation: I never heard of this guy before they started doing these commercials on television and radio.
   Positive interpretation: I heard of this guy after they started doing these commercials on television and radio.
3. Negation: In Hinduism, beef is not allowed to be eaten.
   Positive interpretation: In Hinduism, chicken is allowed to be eaten.
   Positive interpretation: In other religions, beef is allowed to be eaten.

Table 1 shows three sentences containing negation and their underlying positive interpretations. Positive interpretations do not have any negation cues (e.g., not, never), and Example 3 shows that some negations may have more than one underlying positive interpretation depending on the context. Revealing the underlying positive interpretation of negation is challenging. First, we need to identify which tokens are intended to be negated (e.g., getting involved and before in Examples 1 and 2 from Table 1). Second, we need to rewrite those tokens to generate an actual positive interpretation (e.g., getting involved: staying passive).
This paper presents a corpus of negations and their underlying positive interpretations. The main contributions are: (a) a deterministic procedure to generate potential positive interpretations from negations, (b) a corpus of negations and their manually annotated positive interpretations, and (c) a detailed analysis, including which subtrees in the dependency tree are more likely to be rewritten and a qualitative analysis of positive interpretations. Additionally, we establish baseline results with sequence-to-sequence neural models.

Background and Definitions
Negation is well understood in grammars, and the valid ways to express negation are documented (Quirk et al., 2000; van der Wouden, 1997). In this paper, we focus on verbal negations, i.e., when the negation mark (usually an adverb such as never and not) is grammatically associated with a verb.

Positive Interpretations. In philosophy and linguistics, it is accepted that negation conveys positive meaning (Horn, 1989). This positive meaning ranges from implicatures, i.e., what is suggested in an utterance even though neither expressed nor strictly implied (Blackburn, 2008), to entailments. Other terms used in the literature include implied meanings (Mitkov, 2005), implied alternatives (Rooth, 1985) and semantically similar (Agirre et al., 2013). We do not strictly adhere to any of these terms; instead, we reveal positive interpretations as intuitively done by humans when reading text. Note that a positive interpretation is a statement that does not contain negation, not a statement that conveys positive sentiment. For example, The seller didn't ship the right parts implicitly conveys The seller shipped the wrong parts, which has negative sentiment.

Potential Positive Interpretations. Given a sentence containing negation, we use the term potential positive interpretation to refer to positive interpretations that are automatically generated by replacing selected tokens with a placeholder. If the placeholder can be rewritten so that the result is an affirmative statement that is true given the original sentence, potential positive interpretations become actual positive interpretations.

Negation and natural language understanding. Generating positive interpretations from negation has several potential applications.
First, while neural machine translation is in general superior to phrase-based methods, that is not the case when translating negation (Bentivogli et al., 2016). Since our positive interpretations effectively rewrite negation-containing sentences to remove the negation, we argue that they have the potential to help machine translation.
Second, current benchmarks for natural language inference (Bowman et al., 2015) do not include challenging examples with negation. As a result, state-of-the-art approaches (Chen et al., 2017) trained on these benchmarks are unable to solve text-hypothesis pairs that contain negation. Indeed, we tested the aforecited systems with 100 text-hypothesis pairs from our corpus (text: sentence with negation, hypothesis: positive interpretation with correctness score of 4; see examples in Table 7), and discovered that 48 of them are predicted contradiction, 30 neutral and only 22 entailment (the correct prediction is entailment for all of them). While relatively small, we argue that the corpus presented here is a step towards language understanding when negation is present.

Previous Work
From a theoretical perspective, it is accepted that negation has scope and focus, and that the focus yields positive interpretations (Horn, 1989;Rooth, 1992). Scope is "the part of the meaning that is negated" and focus "the part of the scope that is most prominently or explicitly negated" (Huddleston and Pullum, 2002).
Identifying the focus of negation is generally more challenging than identifying the scope. The challenge lies in determining which tokens within the scope are intended to be negated. The largest corpus to date is PB-FOC, which was released as part of the *SEM-2012 Shared Task (Morante and Blanco, 2012). PB-FOC annotates the semantic role most likely to be the focus in the 3,993 negations in PropBank (Palmer et al., 2005). Anand and Martell (2012) refine PB-FOC and argue that 27.4% of negations with a focus annotated in PB-FOC do not actually have a focus. Sarabi and Blanco (2016) present a complementary approach grounded on syntactic dependencies. All of these efforts identify the tokens that are the focus of negation. We build upon them and generate actual positive interpretations from negation.

Corpus Creation
This section details our data collection and annotation effort. We follow five steps. First, we describe the source corpus. Second, we outline the procedure to select negations so that the annotation effort is feasible. Third, we discuss the steps to automatically generate potential positive interpretations. Fourth, we detail the annotation effort to rewrite placeholders in the potential positive interpretations to generate actual positive interpretations. Fifth, we present the final validation strategy to ensure quality of the final corpus.

Selecting the Source Corpus: Simple Wikipedia
We chose to work with Simple Wikipedia texts (version 2018-03-01; available at https://dumps.wikimedia.org/simplewiki/). Simple Wikipedia is a version of Wikipedia that is written in basic English. Compared to regular Wikipedia, articles in Simple Wikipedia use simpler words, shorter sentences, and simple grammar. These characteristics help us reduce the overhead of dealing with complex sentences and lead to a more realistic learning task. We process Simple Wikipedia with spaCy (Honnibal and Johnson, 2015) to obtain part-of-speech tags and dependency trees. Inspired by Fancellu et al. (2016), we identify sentences containing negation using the following cues: n't, not, never, no, nothing, nobody, none, nowhere. Note that this method selects negations that would be discarded if we relied only on dependency type neg. Table 2 shows basic counts for sentences containing at least one negation in Simple Wikipedia. 93% of them contain only one negation, and 67% have medium length (between 6 and 25 tokens). Table 3 categorizes the Simple Wikipedia negations based on their type. We identify negation types using the part-of-speech tag of the syntactic head of the negation cue, i.e., the syntactic parent or governor of the negation cue. More than 70% of the negations in Simple Wikipedia are verbal negations, and the verb is the root of the dependency tree in 44% of them. Finally, Figure 1 shows the most frequent verbal negations in Simple Wikipedia. We observe that many verbs, and in particular the verb to be, are very frequent, and there is a long tail of (relatively) infrequent verbs.
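The cue-based selection of sentences can be illustrated with a minimal sketch. The cue list is the one given above; everything else is an assumption made for illustration. In particular, the regular-expression tokenizer is only a stand-in for spaCy's tokenization, and the function names are hypothetical.

```python
import re

# Cue list as given in the text.
NEGATION_CUES = {"n't", "not", "never", "no", "nothing", "nobody", "none", "nowhere"}

# Assumption: a simple tokenizer that splits the contraction "n't" off its
# host word ("didn't" -> "did", "n't"). It stands in for spaCy's tokenizer.
TOKEN_RE = re.compile(r"n't|\w+(?=n't)|\w+|[^\w\s]", re.IGNORECASE)


def tokenize(sentence):
    """Split a sentence into word, contraction and punctuation tokens."""
    return TOKEN_RE.findall(sentence)


def find_negation_cues(sentence):
    """Return (token index, token) pairs for every negation cue found."""
    tokens = tokenize(sentence)
    return [(i, tok) for i, tok in enumerate(tokens) if tok.lower() in NEGATION_CUES]


def contains_negation(sentence):
    """True if the sentence contains at least one cue from the list."""
    return bool(find_negation_cues(sentence))
```

Matching on surface cues rather than on a single dependency label is what lets this kind of lookup pick up cues such as nothing or nowhere, at the cost of also matching non-verbal uses that later filters have to remove.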

Selecting Negations
Working with all negation types in Simple Wikipedia is out of the scope of this paper. After doing pilot annotations and manual examination, we decided to limit the negation types based on the counts presented in Section 4.1. Table 4 summarizes the filters and the number of negations that remain after running each filter. We sequentially apply five filters (Filters 1-5) on negations and four filters (Filters 6-9) on sentences. Filter 1 discards non-verbal negations (recall that 74.6% of negations are verbal, Table 3). Filter 2 discards those verbal negations which are not the root of the dependency tree. Filter 3 discards infrequent verbal negations, more specifically, those whose verbs occurred fewer than five times. Filter 4 caps the number of verbal negations per verb to 200 negations to increase verb coverage (recall that some verbs are negated very frequently, Figure 1). Filter 5 discards verbal negations with part-of-speech tag interjection (less than 1%, e.g., They said "no" to his offer). Filter 6 discards sentences whose length is not between 6 and 25 tokens (recall that most sentences containing negation satisfy this filter: 67.3%, Table 2). Filter 7 discards sentences with more than one verbal negation (93% of sentences containing negation only contain one, Table 2). Filter 8 discards negated sentences in question form (i.e., the first token has any of the following part-of-speech tags: WDT, WP, WRB). Filter 9 discards sentences that include any of the following tokens: because, until, but, if, except. The final dataset consists of 7,469 negations, which are approximately 10% of negations in Simple Wikipedia.
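The four sentence-level filters (Filters 6-9) can be sketched as a single predicate. This is a minimal illustration under stated assumptions: the input is a list of (token, part-of-speech tag) pairs produced by some tagger, the number of verbal negations is computed elsewhere, and punctuation counts towards sentence length. The function name is hypothetical.

```python
# Tags and tokens as listed in the text.
QUESTION_TAGS = {"WDT", "WP", "WRB"}
BLOCKED_TOKENS = {"because", "until", "but", "if", "except"}


def keep_sentence(tagged_tokens, num_verbal_negations):
    """Return True if a sentence passes Filters 6-9.

    tagged_tokens: list of (token, pos_tag) pairs (assumed input format).
    num_verbal_negations: count of verbal negations in the sentence.
    """
    tokens = [tok.lower() for tok, _ in tagged_tokens]
    # Filter 6: keep only sentences with 6 to 25 tokens.
    if not 6 <= len(tokens) <= 25:
        return False
    # Filter 7: discard sentences with more than one verbal negation.
    if num_verbal_negations > 1:
        return False
    # Filter 8: discard questions, detected by the tag of the first token.
    if tagged_tokens[0][1] in QUESTION_TAGS:
        return False
    # Filter 9: discard sentences containing any of the blocked tokens.
    if BLOCKED_TOKENS & set(tokens):
        return False
    return True
```

The length check runs first so that the question check can safely look at the first token.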

Generating Potential Positive Interpretations
We convert each negation into its positive counterpart in four steps following the rules by Huddleston and Pullum (2002): remove the negation cue, remove auxiliaries, fix third-person singular and past tense, and rewrite negatively-oriented polarity-sensitive items. These steps can be implemented using straightforward regular expressions. For example, the positive counterpart of The seller did not ship the right parts is The seller shipped the right parts. Then, we automatically generate all plausible positive interpretations of the negation by traversing the dependency tree and selecting all direct dependents of the negated verb. We filter out subtrees whose syntactic dependency is aux, auxpass, or punct (auxiliary, passive auxiliary and punctuation). We also exclude the verb. These exceptions were defined after manual examination of several examples. Finally, we replace the selected subtrees with a placeholder. Table 5 shows the number of negations depending on how many positive interpretations are generated. We generate two or more potential positive interpretations for over 84% of negations.
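The subtree-replacement step can be sketched as follows. This is an illustrative sketch, not the exact procedure used to build the corpus: it assumes the positive counterpart has already been produced and parsed, represents the parse with a hypothetical Token record (text, head index, dependency label, with the root pointing to itself), and assumes each subtree covers a contiguous span of tokens. The placeholder string is also an assumption.

```python
from dataclasses import dataclass

# Dependency labels filtered out, as listed in the text.
EXCLUDED_DEPS = {"aux", "auxpass", "punct"}
PLACEHOLDER = "____"  # assumed placeholder symbol


@dataclass
class Token:
    text: str
    head: int  # index of the syntactic head; the root points to itself
    dep: str   # dependency label from this token to its head


def subtree(tokens, root):
    """Return the indices of `root` and all of its descendants."""
    children = {}
    for i, tok in enumerate(tokens):
        if tok.head != i:
            children.setdefault(tok.head, []).append(i)
    stack, seen = [root], set()
    while stack:
        node = stack.pop()
        seen.add(node)
        stack.extend(children.get(node, []))
    return seen


def potential_interpretations(tokens, verb):
    """Replace each eligible direct dependent of the verb with a placeholder."""
    results = []
    for i, tok in enumerate(tokens):
        # Only direct dependents of the verb, never the verb itself.
        if i == verb or tok.head != verb or tok.dep in EXCLUDED_DEPS:
            continue
        span = subtree(tokens, i)
        first = min(span)
        words = []
        for j, t in enumerate(tokens):
            if j not in span:
                words.append(t.text)
            elif j == first:
                words.append(PLACEHOLDER)  # one placeholder per subtree
        results.append(" ".join(words))
    return results


# Hand-written parse of "The seller shipped the right parts ."
sentence = [
    Token("The", 1, "det"), Token("seller", 2, "nsubj"), Token("shipped", 2, "ROOT"),
    Token("the", 5, "det"), Token("right", 5, "amod"), Token("parts", 2, "dobj"),
    Token(".", 2, "punct"),
]
for candidate in potential_interpretations(sentence, verb=2):
    print(candidate)
```

For this example the sketch yields one candidate per eligible dependent of shipped: one with the subject replaced and one with the object replaced. An annotator could then rewrite the object placeholder as the wrong parts.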

Rewriting Placeholders
In order to rewrite placeholders in potential positive interpretations and collect actual positive interpretations, we implement an annotation interface. This rewriting process was done in-house by one linguistics student. A second annotator validated the rewrites independently (Section 4.5).
Each negation along with its context and all its potential positive interpretations are grouped into a Human Intelligence Task (HIT) for annotation purposes. Each HIT presents a set of instructions to the annotator along with examples. Potential positive interpretations are presented in consecutive rows, and each token in a cell. The placeholders generated in Section 4.3 are presented as blank cells and the annotator fills the blanks (or, in other words, the annotator rewrites placeholders) based on the context around the negation or world knowledge. A sample HIT along with the answers collected is shown in Figure 2.
In the rest of the paper, we use unknown answer to refer to placeholders for which the annotator cannot find a rewriting. We divide unknown answers into invalid and not specified, and ask the annotator to distinguish between them. Invalid is used to refer to placeholders that cannot be rewritten. Not specified describes placeholders that hypothetically can be rewritten but the answer is unknown given the context. We also provide an extra empty box at the bottom of the interface for additional positive interpretations. If the annotator cannot find any answers for the rewrites, she can write a positive interpretation from scratch.

Validating Positive Interpretations
In order to validate the rewrites of placeholders and resulting positive interpretations (Section 4.4), a second annotator validates them. We create a similar interface to the one in Figure 2, but this time we only show the negation in context (Text in Figure 2), and one positive interpretation at a time (i.e., potential positive interpretation for which the placeholder was rewritten). The annotator determines correctness and novelty as follows.

Figure 2: Sample negation along with its context and automatically-generated potential positive interpretations. The annotation process reveals three positive interpretations: "Relationships that end are normaly called breakups," "Marriages which end are rarely called breakups," and "Marriages which end are normaly called divorce."
Correctness measures whether a positive interpretation is true given the negation in context. It is measured using the following scale:
1. After reading the text, it is clear that the positive interpretation is false.
2. After reading the text, the positive interpretation is probably false, but I am not sure.
3. After reading the text, the positive interpretation is probably true, but I am not sure.
4. After reading the text, it is clear that the positive interpretation is true.

Novelty measures whether the meaning conveyed by a positive interpretation is already explicitly stated in the text, and it is measured using the following numeric scale:
1. The positive interpretation is stated explicitly in the text with the very same words. I could copy and paste chunks from text and get the positive interpretation.
2. The positive interpretation is not stated in the text with the same words. The positive interpretation and the text have synonyms in common, but I could not get the positive interpretation simply copying and pasting from text.
3. The positive interpretation is not stated in the text with the same words. Additionally, there are few synonyms in common between the positive interpretation and text.

Corpus Analysis
The procedure described in Section 4 generates 15,875 potential positive interpretations from 7,469 negations. Out of all potential positive interpretations, we rewrite 3,831 with an actual answer and annotate 12,044 with an unknown answer  (11,030 not specified and 1,014 invalid). We also rewrite a new positive interpretation from scratch for 2,158 negations for which we cannot find any actual rewrites. Overall, we rewrite 5,989 positive interpretations for 5,770 unique negations. In other words, the procedure in Section 4 yields a positive interpretation for 77% of negations. Table 6 shows the distribution of known vs unknown rewrites per dependency type, where dependency type refers to the dependency type from the selected subtree of the verb to the verb itself. Out of all dependency types, advmod and xcomp (adverbial modifier and open clausal complement respectively) have the highest ratios of known rewrites, and nsubj (nominal subject) has the most unknown answers. In other words, the easiest placeholders to rewrite are those whose syntactic function is adverbial modifier or open clausal complement, and the most challenging are those whose syntactic function is nominal subject.
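The counts reported above can be cross-checked with a few lines of arithmetic. The numbers are those stated in the text; the variable names are chosen here for illustration only.

```python
# Counts as reported in the text.
negations = 7469
potential_interpretations = 15875
rewritten = 3831          # placeholders rewritten with an actual answer
not_specified = 11030
invalid = 1014
from_scratch = 2158       # positive interpretations written from scratch
negations_with_interpretation = 5770

# Unknown answers are the sum of not specified and invalid.
unknown = not_specified + invalid
# Total positive interpretations: rewrites plus those written from scratch.
total_interpretations = rewritten + from_scratch
# Share of negations with at least one positive interpretation.
coverage = negations_with_interpretation / negations

print(unknown, total_interpretations, f"{coverage:.1%}")
```

The rewritten and unknown placeholders add up to the number of potential positive interpretations, and the coverage ratio rounds to the 77% figure reported.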
To understand high-level characteristics of negations and their positive interpretations beyond dependency types, we explore a random sample of 100 negations and all their positive interpretations. We discover six major categories (quantities, times, objects, adjectives, proper nouns and others) and 4 subcategories (Table 7):
• The first category is quantities and includes both specific and abstract quantities. An example of abstract quantity is Many do not use their real names, as Everett does and its corresponding positive interpretation Few use their real names, as Everett does. A fourth of positive interpretations in the sample were obtained after rewriting quantities.
• The second category is time and includes both actual and abstract times. An example of actual time is Since 2012, this channel never goes off the air during the day and its positive interpretation Before 2012, this channel went off the air during the day. 15% of positive interpretations in the sample were obtained after rewriting temporal expressions.
• The third category is objects and refers to positive interpretations obtained by rewriting verbal objects. An example is It does not need sunlight to grow and its positive interpretation It needs water to grow. 9% of positive interpretations in the sample were obtained after rewriting the verbal objects.
• The fourth category is adjectives and refers to positive interpretations obtained by rewriting adjectives. An example is Crops did not grow as well when they were close together and its positive interpretation Crops grew poorly when they were close together. 27% of positive interpretations in the sample were obtained after rewriting adjectives.
• The fifth category is proper nouns. [...] American open wheel racing series. 2% of positive interpretations in the sample were obtained after rewriting proper nouns.

Annotation Quality
To assess the quality of the rewrites and positive interpretations, we ask a second annotator to validate them based on two criteria: correctness and novelty (Section 4.5). Recall that correctness ranges from 1 (minimum) to 4 (maximum) and novelty from 1 (minimum) to 3 (maximum). We assess novelty only if positive interpretations are correct (correctness scores 3 or 4). Figure 3 reports the validation results. Out of all positive interpretations obtained during the annotation process, 90% are either correct (77%) or probably correct (13%) (correctness scores 4 and 3), and 95% of them are either very novel (52%) or novel (43%). These validation scores mean not only that positive interpretations are sound given the original negation (correctness score), but also that they are not explicitly stated in the context and thus reveal implicit meaning (novelty score). Table 8 presents three negations, all potential positive interpretations, and manual annotations along with the correctness and novelty scores. Example (1) is a simple negated clause. The procedure described in Section 4.3 generates four potential positive interpretations, and three of them were rewritten. Given Phosgene usually does not cause its worst effects right away and its context, the following positive interpretations are deemed correct (correctness = 4) with different degrees of novelty (2, 3 and 1 respectively): Phosgene rarely causes its worst effects right away (Interpretation 1.2), Phosgene usually causes mild effects right away (Interpretation 1.3), and Phosgene usually causes its worst effects 12 hours after a person breathes it in (Interpretation 1.4). Note that Interpretation 1.1 is most likely correct, but context does not provide clues about which chemicals cause their worst effects right away and thus it is annotated not specified (NS).

Table 8 (excerpt), context for Example 1: Phosgene can be a liquid or a gas. As a gas, it is heavier than air, so it can stay near the ground (where people can breathe it in for long periods of time). It smells like freshly cut grass or moldy hay. Along with being a choking agent, phosgene is also a blood agent. This means it keeps oxygen from getting into the body's cells. Without oxygen, a person's cells will die, and the person will suffocate. Phosgene usually does not cause its worst effects right away.

Annotation Examples
Example (2) has three potential positive interpretations, and we rewrite two of them. Note that Interpretation 2.2, Hungary has observed Central European Time since 1916, is correct but not novel because it is explicitly stated in the context. Interpretation 2.3 is correct but received a novelty score of 2 because it only replaces since with prior to.
Example (3) shows a case in which rewriting placeholders is not successful. The additional interpretation, however, reveals that He has the intention of getting more money. The context, which is not shown in Table 8, supports the correctness and validation scores (e.g., He is wealthy).

Experiments
The task of generating positive interpretations from a sentence containing negation can be approached with sequence-to-sequence (seq2seq) models (input: sentence containing negation, output: positive interpretation). In this section, we present baseline results with existing seq2seq models. Specifically, we experiment with a basic seq2seq model, two seq2seq models with attention (Luong et al., 2015), and Google's neural machine translation (NMT) system (Wu et al., 2016), which is also a seq2seq model with attention and arguably the most complex. We acknowledge that these systems are usually trained with orders of magnitude more examples, and comparing them when trained with our fairly small corpus may be unfair because they were designed for other tasks. Our goal is not to obtain the best results possible, but rather to provide baseline results for our task and corpus.
The 3,831 negations become source sentences and the correct positive interpretations become target sentences. We randomly select 100 short sentences (up to 12 tokens) and 100 long sentences (over 12 tokens) for testing, 200 sentences for development, and the remainder for training. All positive interpretations collected from a negation are assigned to the same split (testing, development or training) in order to ensure a more realistic scenario.

Evaluation and Results. We use three metrics to evaluate the models: BLEU-4, correctness and grammaticality. BLEU-4 is automated, convenient and useful for development purposes. While larger BLEU-4 scores generally indicate better correctness and grammaticality scores, we do not observe a linear correlation (Table 9). Correctness is measured manually with the scale presented in Section 4.5. Finally, grammaticality is measured manually using the following numeric scale:
1. The sentence is not grammatical at all, e.g., it does not contain a verb.
2. The sentence is mostly ungrammatical, e.g., it contains a verb but the word order is wrong.
3. The sentence has a few grammatical issues, e.g., the subject-verb agreement is wrong, missing punctuation.
4. The sentence is grammatically correct (regardless of its correctness).

Table 9 shows the results. In general terms, results are better for short sentences than long ones. This is not surprising given the small size of our corpus. The basic seq2seq model performs poorly: it barely generates any correct positive interpretations, and most are ungrammatical. Adding attention improves results. The best results are with the system by Luong et al. (2017): 30% of the short positive interpretations generated are correct, and 68% grammatical. We believe Google's NMT performs the worst because of the small corpus.
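One way to keep all positive interpretations of a negation in the same split is to shuffle and partition at the level of negations rather than individual pairs. The sketch below illustrates this idea only; the function name, the whitespace-based length count, the seed, and the decision to count split sizes in negations are all assumptions, not details taken from the experimental setup.

```python
import random
from collections import defaultdict


def split_by_negation(pairs, n_test_short=100, n_test_long=100, n_dev=200,
                      max_short_len=12, seed=0):
    """Split (negation, positive interpretation) pairs so that every
    interpretation of a given negation lands in the same split."""
    groups = defaultdict(list)
    for negation, interpretation in pairs:
        groups[negation].append(interpretation)

    rng = random.Random(seed)
    negations = sorted(groups)  # sort first so the shuffle is reproducible
    rng.shuffle(negations)

    # Assumption: sentence length is counted in whitespace-separated tokens.
    short = [n for n in negations if len(n.split()) <= max_short_len]
    long_ = [n for n in negations if len(n.split()) > max_short_len]

    test = short[:n_test_short] + long_[:n_test_long]
    rest = short[n_test_short:] + long_[n_test_long:]
    rng.shuffle(rest)  # mix short and long sentences before taking dev
    dev, train = rest[:n_dev], rest[n_dev:]

    def expand(keys):
        return [(n, p) for n in keys for p in groups[n]]

    return expand(train), expand(dev), expand(test)
```

Splitting at the negation level means a model is never tested on a negation for which it has already seen another positive interpretation during training.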
We also conduct a manual analysis of the correct positive interpretations generated by the best system. Following the categories described in Section 5 and Table 7, 37% of them belong to the adjectives category, 27% to abstract quantities, 17% to objects, and 10% to abstract time.

Conclusions
We have presented a corpus of negations and their positive interpretations. Positive interpretations do not contain negations, range from implicatures to entailments, and are intuitively understood by nonexperts when reading the negations. We work with verbal negations selected from Simple Wikipedia, automatically generate potential positive interpretations by replacing subtrees with placeholders, and manually collect rewrites for the placeholders in order to obtain actual positive interpretations. This strategy yields positive interpretations for 77% of negations, and a manual validation step ensures both correctness and novelty.
Neural machine translation struggles with negation, and natural language inference benchmarks do not account for the intricacies of negation (Section 2). While small, we believe the corpus presented here is a step towards enabling natural language understanding when negation is present.