Annotating Negation in Spanish Clinical Texts

In this paper we present on-going work on annotating negation in Spanish clinical documents. A corpus of anamnesis and radiology reports has been annotated by two domain expert annotators with negation markers and negated events. The Dice coefficient for inter-annotator agreement is higher than 0.94 for negation markers and higher than 0.72 for negated events. The corpus will be publicly released when the annotation process is finished, constituting the first corpus annotated with negation for Spanish clinical reports available for the NLP community.


Introduction
In this paper we present the UHU-HUVR corpus of Spanish clinical reports annotated with negation information. Negation processing (Morante and Blanco, 2012) is an emergent task in Natural Language Processing (NLP). Initially, negation processing systems were developed to detect negation in clinical texts in order to extract accurate information about the patients. The systems were rule-based (Chapman et al., 2001) because of the lack of annotated corpora to train machine learning systems. Currently, annotated corpora are still scarce due to the legal and ethical requirements that have to be fulfilled. In addition, most of the corpora only contain documents written in English. An example is the BioScope corpus (Szarvas et al., 2008) which contains 1,954 radiology reports from the Computational Medicine Center in Cincinnati, annotated for negations and uncertainty along with the scopes of each phenomenon.
For Spanish clinical reports, to the best of our knowledge, there have been only a few attempts to annotate negation, mostly as a secondary task within broader projects, and none of the annotated resources are publicly available. For example, Costumero et al. (2014) adapted the NegEx algorithm (Chapman et al., 2001) to detect negated clinical conditions in medical documents written in Spanish. Although the authors mention that 500 medical texts were manually annotated, they do not provide more information about this gold standard. Oronoz et al. (2015) annotated negated adverse drug reactions, but they did not annotate the words expressing negation; instead, negation was annotated as a modifier of the disorder or drug. Stricker et al. (2015) used a dataset composed of about 85,600 reports of ultrasonography studies performed in a Spanish public hospital. Clinical findings were annotated in a subset of the reports as being affirmed (if it was possible to infer that the finding was present in the patient) or negated (if the finding was absent). The annotations were recently expanded by Cotik et al. (2016) with the values probable (if it was not certain that the finding was present, but it was probable) and doubt (if the finding corresponded to the past or if it was not clear to the annotator whether the finding was present or not). None of the studies provides information about the availability of the corpora.
Being aware of the limitations of the current approaches and the necessity of filling this gap, we present the first version of the UHU-HUVR Corpus, a set of clinical documents annotated with negation markers and their linguistic scope. The annotated corpus and its guidelines will be made publicly available. The paper is structured as follows: Section 2 describes the annotation process and provides information about the corpus. Section 3 presents the results of the agreement analysis and discusses difficult or interesting cases. Finally, conclusions and future work are presented in Section 4.

Annotating Negation in Spanish Clinical Reports
The corpus that we present consists of 604 clinical reports from the Virgen del Rocío Hospital in Seville (Spain). Specifically, the set of documents consists of the diagnosis information of 276 radiology reports and the personal history of 328 anamnesis reports written in free text, as shown in Table 1. The reports were randomly selected from those written in the first semester of 2013, and were fully anonymized in order to satisfy legal and ethical requirements. The entire corpus was annotated by two domain experts who followed the guidelines that we developed. The annotation process started with two pilot experiments. In the first one, the authors annotated 7 reports in order to gain the insight needed to produce the first version of the guidelines, which was written by one of the authors, who is a linguistics expert. In the second one, 100 anamnesis documents were annotated by the two annotators in order to train them and to test and improve the guidelines. These documents are not included in the final corpus. Throughout the annotation process, the annotators were not allowed to communicate with each other. The problematic cases that they encountered were discussed with the expert linguist and author of the guidelines. As work in progress, another author, who is an expert annotator, is currently resolving the disagreement cases, acting as adjudicator, in order to generate the gold-standard corpus. The annotation tool used was CAT (Lenzi et al., 2012). Two markables were defined (negation marker and negated event), plus a negation relation between the marker and the event.
The annotation task consisted of annotating the events that are affected by contextual negation, as well as the words that express negation. We will refer to these words as markers. In the examples that follow, negated events are marked within brackets and negation markers in bold. In Example (1), the word no is the negation marker and alteraciones en el luminograma aéreo de tráquea is the event affected by this negation. Examples of events with negative polarity are the following ones:
The annotation guidelines closely follow the THYME corpus guidelines (Styler IV et al., 2014) with some adaptations. We define as a clinical event any event that is relevant for building the clinical chronology of a patient, such as diagnoses, tumors, habits, medical tests, or events related to the functional evaluation of the patient. Events that do not contribute clinical information are not annotated. The difficulty of annotating events lies not just in identifying what an event is, but also in determining which chain of words expresses the event. We decided to annotate all the words that express the event, excluding punctuation marks. In Example (1) the full NP is annotated as the event: alteraciones en el luminograma aéreo de tráquea. For negation markers, the minimum number of words is annotated.
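The two markables and the negation relation described above can be pictured with a small data structure. The following is only an illustrative sketch: the class and field names (`Markable`, `NegationRelation`, `span_text`) are hypothetical and do not reflect the CAT export format, and the token sequence is a simplified toy sentence built from the marker and event quoted for Example (1), not the original report text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Markable:
    """A contiguous token span; kind is 'marker' or 'event' (hypothetical schema)."""
    kind: str
    start: int  # index of the first token (inclusive)
    end: int    # index of the last token (inclusive)


@dataclass(frozen=True)
class NegationRelation:
    """Links one negation marker to one negated event."""
    marker: Markable
    event: Markable


def span_text(tokens, markable):
    """Return the surface text covered by a markable."""
    return " ".join(tokens[markable.start:markable.end + 1])


# Toy sentence (assumption): the marker immediately followed by the event.
tokens = "No alteraciones en el luminograma aéreo de tráquea".split()

# The marker covers the minimum number of words; the event covers the full NP.
marker = Markable("marker", 0, 0)
event = Markable("event", 1, 7)
relation = NegationRelation(marker, event)

print(span_text(tokens, relation.marker))
print(span_text(tokens, relation.event))
```

Keeping the relation as a separate object, rather than storing the event inside the marker, makes it straightforward to attach several events to the same marker, one relation per event.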
We do not always annotate mentions of negative test results as negation markers, because the fact that a test is negative does not necessarily imply that a clinical event is not happening. In Example (5) no negation is annotated because, although the results of the Z-N stain are negative, the test has taken place. However, when the name of the test is the same as the name of the clinical condition, such as a disease, the negative results are annotated as negation because the negative result indicates that the patient does not have the disease (see Example (6)). In addition, negative results are sometimes expressed with the "-" sign, which was annotated as a negation marker in the same cases as indicated above.
As for affixal negation, we take a pragmatic approach: when an event is negated by an affix, we annotate the full word in which the affix occurs both as a negation marker and as a negated event. This is the case of afebril (En. without fever), where a- is the negation marker. Due to limitations of the annotation tool, we could not split the affixes in order to mark them independently.
Finally, some coordination structures involve negation markers. In these cases, each negation marker in the structure has its own scope, as shown in Example (7), where the first coordination marker no scopes over alteraciones a nivel de los distintos ligamentos y estructuras músculotendinosas, and in Example (8), where the second coordination marker así como scopes over alteraciones ... de las restantes partes blandas. In clinical reports, some expressions like así como, which are usually not negation markers, are used as such. We therefore decided to mark them as negation markers, even though what brings negation to the event is the coordination structure as a whole and not only the second coordinating element. We have also marked "/" as a negation marker when it acts as a coordinating particle, as in Examples (9)

Negation in Spanish Clinical Texts
The annotations show that negation is a frequent phenomenon in Spanish clinical texts. As shown in Table 2, more than 22% of the sentences in radiology reports contain negation markers, whereas in anamnesis reports this percentage is even higher, 35.20%. This difference is related to the nature of the two types of documents: whereas a radiology report reflects the radiologist's observations, the anamnesis report describes the history of the patient, including clinical conditions that the patient has not gone through. In the Spanish clinical domain, Oronoz et al. (2015) reported that 27.58% of the diseases mentioned in a corpus of electronic health records are negated. Although this percentage is not directly comparable with the percentage of negated sentences shown in Table 2, the numbers seem similar. The frequency of negation in Spanish for the reviews domain has been reported to be noticeably higher: an analysis of 75% of the SFU Review SP-NEG corpus claimed that 46.64% of sentences contain at least one negation. However, an unpublished analysis of the entire SFU Review SP-NEG corpus showed that 31.90% of its sentences contain at least one negation, a value similar to the negation frequency in anamnesis reports. Negation markers in the UHU-HUVR corpus amounted to 69 in the anamnesis subcollection and 52 in the radiology subcollection, with the top 10 most frequent markers shown in Tables 3 and 4. It is interesting to note that the first two markers, no ('no') and sin ('without'), account for more than 70% of the total frequency of all the markers found in the corpus, while the remaining markers account for less than 30%. Twenty-eight of the markers are common to radiology and anamnesis reports. It is also interesting to see that signs such as "/" or "-" are considered to be negation markers. A specific negation marker of the radiology reports is namc, which stands for no alergias medicamentosas conocidas (En. no known drug allergies).
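The two kinds of figures discussed above (the share of sentences containing at least one negation marker, and the share of all marker occurrences covered by the most frequent markers) can be computed along the following lines. This is a minimal sketch under stated assumptions: the function names are hypothetical, each sentence is assumed to be represented by the list of markers annotated in it, and the toy data are invented for illustration and are not taken from the corpus.

```python
from collections import Counter


def negation_sentence_rate(sentence_markers):
    """Percentage of sentences that contain at least one negation marker."""
    if not sentence_markers:
        return 0.0
    negated = sum(1 for markers in sentence_markers if markers)
    return 100.0 * negated / len(sentence_markers)


def top_k_coverage(marker_counts, k):
    """Percentage of all marker occurrences covered by the k most frequent markers."""
    total = sum(marker_counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in marker_counts.most_common(k))
    return 100.0 * top / total


# Invented toy data: one list of annotated markers per sentence.
toy_sentences = [["no"], [], ["sin", "no"], []]
toy_counts = Counter(m.lower() for markers in toy_sentences for m in markers)

print(negation_sentence_rate(toy_sentences))
print(top_k_coverage(toy_counts, 2))
```

Note that the first measure is computed over sentences and the second over marker occurrences, which is why percentages of negated sentences and percentages of negated entities are not directly comparable.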
Table 5 shows the Inter-Annotator Agreement (IAA) between the two annotators in terms of the Dice coefficient (Dice, 1945), as calculated by the CAT annotation tool. We are currently working on the adjudication process. The IAA for markers is high and almost equal in radiology (0.949) and anamnesis (0.947) reports, whereas the IAA for events is lower than for markers. The IAA for events is lower in radiology reports (0.729) than in anamnesis reports (0.840). The lower scores for radiology reports can be explained by the fact that the average length of events in anamnesis reports is lower (2.10 words) than in radiology reports (2.81 words), and thus in radiology reports there are more tokens that can be annotated differently for the same event. Additionally, interpreting the information in the radiology reports requires more specialised knowledge than interpreting the information in the anamnesis reports. In the latter, the events are better delimited and there are certain frequent negated expressions that are repeated throughout the reports, such as No or Niega followed by an event, as in the examples below. In the radiology reports it is more difficult to agree on which event is affected by the marker, as in Example (14).
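For reference, the Dice coefficient between two sets A and B is 2|A ∩ B| / (|A| + |B|). The sketch below applies it to the sets of token positions that each annotator included in a markable. This is an assumption made for illustration: how CAT matches markables internally is not described here, and the function name `dice` and the toy token indices are hypothetical.

```python
def dice(a, b):
    """Dice coefficient between two collections of annotated token positions."""
    a, b = set(a), set(b)
    if not a and not b:
        # Both annotators marked nothing: treated here as full agreement.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Toy case: annotator 1 marks a two-token event, annotator 2 extends the
# same event by two more tokens (token indices within one sentence).
annotator_1 = {1, 2}
annotator_2 = {1, 2, 3, 4}

print(dice(annotator_1, annotator_2))
```

Under this token-level reading, longer events offer more positions at which the two annotators can differ, which is consistent with the lower agreement observed for events than for the typically one-word markers.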

Preliminary results
14. Sin [hallazgos en la distribución visualización de asas y mesos intestinales y órganos pélvicos]. (En. No findings in the distribution visualization of intestinal loops and mesenteries and pelvic organs.)
Furthermore, in many radiology reports incorrect syntactic constructions are used, as in Example (15), where two consecutive negation markers form the ungrammatical expression ni tampoco, which can be confusing for the annotators. The IAA is similar to the IAA reported for the clinical collection of the BioScope corpus (Szarvas et al., 2008) in English, where the IAA between the two annotators was 0.907 for the markers and 0.762 for the scope. This suggests that the difficulty of the task is independent of the language in which the documents are written and inherent to the type of texts. Disagreement cases are analysed by the adjudicator, who decides on the correct annotations. Most of the disagreements were the result of human error, i.e., the annotators missed a word or included a word that did not belong either to the event or to the marker. However, other cases of disagreement can be explained by the difficulty of the task and the lack of clear guidance.

15.
A great number of disagreements are related to the difficulty of detecting what an event is and which chain of words expresses it, as in Example (16), where the first annotator identified the words Contraste I as the event while the second one annotated Contraste I . V ., but neither annotated the correct span of the event. Cases in which negation is expressed by affixes were a source of disagreement due to the lack of initial guidelines. An example is the word afebril, which expresses absence of fever. In these cases, the whole word should be marked both as an event and as a marker. In contrast, expressions such as incontinencia urinaria ('urinary incontinence'), which contain a negation affix (in- in this case), do not have to be annotated as negation markers, since the clinical condition that they express is not negated.
More cases of disagreement involve a false double negation as shown in Example (17). One of the annotators identified two markers (no and tampoco) instead of one, which is the correct solution because this is not a case of double negation.
(En. (S)he has never been told that (s)he had anemia because (s)he did not go to the doctor either.)
Another source of disagreement is cases in which one marker negates several events, as in Example (18). One annotator annotated the word niega as a marker and marked the rest of the sentence as a single event, while the other annotator also identified the word niega as a marker but annotated each event separately, which is the correct solution. This case represents an enumeration in which several events are negated by the same marker. Therefore, each event had to be annotated and related to the marker independently, generating 4 negation relations. (Taboada et al., 2006) with negation information, i.e., lack of agreement between annotators about the scope, the event and the discontinuities. These types of errors amount to 63.26% of the total. They also mentioned semantic problems that arose from the interpretation of negation patterns in comparative and contrastive constructions.

Conclusions
We have presented the first version of the UHU-HUVR Corpus, a collection of 604 clinical documents written in Spanish and annotated with negation markers, negated events and their relations. The corpus contains two types of reports, anamnesis and radiology. In both of them negation is a frequent phenomenon that needs to be treated for natural language processing purposes. The high IAA obtained suggests that the task is well defined. As expected, the agreement is higher for negation markers than for negated events, and higher in the anamnesis reports than in the radiology reports. As future work we plan to perform a detailed disagreement analysis in order to improve the guidelines for future annotation projects and to gain insight into the complexity of the task.
Many thanks to the annotators Carmen Cirilo and José Manuel Asencio.