Building a Cross-document Event-Event Relation Corpus

We propose a new task of extracting event-event relations across documents. We present our efforts at designing an annotation schema and building a corpus for this task. Our schema includes five main types of relations: Inheritance, Expansion, Contingency, Comparison and Temporality, along with 21 subtypes. We also lay out the main challenges based on detailed inter-annotator disagreement and error analysis. We hope these resources can serve as a benchmark to encourage research on this new problem.


Introduction
The ultimate goal of Information Extraction (IE) is to construct "Information Networks" (Li et al., 2014) from unstructured texts. Most previous IE work has focused on constructing entity-centric Information Networks, where each node represents an entity and each edge represents a relation. We propose a novel task: constructing a new layer of event-centric Information Networks across multiple documents, where each node is an event and each edge captures a relation between two events. This task can provide building blocks for many important applications such as event knowledge base population and temporal event tracking (Do et al., 2012). The nodes can be extracted by existing fine-grained event extraction approaches (Ji and Grishman, 2008; Liao and Grishman, 2010; Hong et al., 2011; Li et al., 2013; Li et al., 2014). However, little previous work can be directly exploited to construct the edges.
In this paper we define a comprehensive schema that includes multiple fine-grained event-event relation types. Some types are similar to those in discourse parsing (Soricut and Marcu, 2003). However, event-event relations are fundamentally different from discourse relations: (1) the input consists of structured events instead of unstructured sentences; (2) for cross-document event pairs, there are neither explicit textual clues nor implicit information about the ordering of clauses that might indicate the relation. Following this schema, we annotated a cross-document event-event relation corpus built on top of the Automatic Content Extraction (ACE2005) event annotations. We define the task (Section 2), describe the annotation schema (Section 3) and present corpus statistics and annotation challenges (Section 4).

Task Definition
In an event-event relation schema, events form a crucial foundation because they serve as the nodes of event-centric information networks. We follow the definitions of events in the ACE guidelines:
Event trigger: the main word which most clearly expresses an event occurrence.
Event arguments: the entities, time expressions and values that are involved in an event.
Event mention: a phrase or sentence within which an event is described, including a trigger and a set of arguments.
Event: a set of coreferential event mentions within one document.
We define the event-event relation task as the annotation of all applicable logical relations between two events. For example, as illustrated in Figure 1, the following events are connected by Condition and Temporality relations:
Event 1: Media tycoon Barry Diller on Wednesday quit as chief of Vivendi Universal Entertainment.
Event 2: Parent company chairman Jean-Rene Fourtou will replace Diller as chief executive of US unit.
This example shows that a successor takes up a position only after (Temporality) and on the condition that (Condition) the predecessor vacates it.
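The definitions above can be pictured as a small data structure. The following is only a minimal sketch, not the corpus format: the class names, the identifiers `E1`/`E2`, the document ids and the argument role names are all hypothetical, chosen for illustration. It shows that one event pair may carry several relation labels at once.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EventMention:
    trigger: str           # main word expressing the event occurrence
    arguments: tuple = ()  # (role, text) pairs; role names are illustrative


@dataclass
class Event:
    event_id: str          # hypothetical identifier
    doc_id: str            # hypothetical document identifier
    mentions: list = field(default_factory=list)  # coreferential mentions in one document


@dataclass(frozen=True)
class EventRelation:
    source: str            # event_id of the first event
    target: str            # event_id of the second event
    label: str             # a relation label, e.g. "Contingency.Condition"


quit_event = Event("E1", "doc_a", [
    EventMention("quit", (("Person", "Barry Diller"), ("Time", "Wednesday"))),
])
replace_event = Event("E2", "doc_b", [
    EventMention("replace", (("Person", "Jean-Rene Fourtou"),)),
])

# All applicable relations are annotated, so the same pair appears twice.
relations = [
    EventRelation("E1", "E2", "Contingency.Condition"),
    EventRelation("E1", "E2", "Temporality"),
]


def labels_between(rels, source, target):
    """Return the sorted labels annotated from `source` to `target`."""
    return sorted(r.label for r in rels if r.source == source and r.target == target)


print(labels_between(relations, "E1", "E2"))
```

Storing relations as separate labelled edges, rather than one label per pair, keeps the multi-label nature of the task explicit.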

Event-Event Relation Schema
Our event-event relation schema includes 5 main Types (Inheritance, Expansion, Contingency, Comparison and Temporality) along with 21 Subtypes, as shown in Table 1. Table 1 also lists Roles: the events involved in a relation play certain roles. For example, an Attack event and an Injure event in a Contingency.Causality relation play the Cause and Result roles respectively. In the following we present a detailed definition of each subtype.
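As an illustration of how Type.Subtype labels and Roles fit together, here is a hedged sketch. It is deliberately partial: it lists only the subtypes named in the text of this paper (the full inventory of 21 is in Table 1), the placement of Reemergence and Variation under Inheritance is an assumption based on the discussion in the next section, and only the Cause/Result roles are stated in the text. The function names are hypothetical.

```python
# Partial schema: only subtypes named in the text, not all 21 from Table 1.
SCHEMA = {
    "Inheritance": {"Coreference", "Subevent", "Reemergence", "Variation"},
    "Expansion": {"Confirmation", "Conjunction", "Disjunction"},
    "Contingency": {"Causality", "Condition"},
    "Comparison": {"Opposition", "Negation", "Competition"},
    "Temporality": {"Meet", "Start", "Finish"},
}

# Only the roles of Contingency.Causality are given in the text.
ROLES = {("Contingency", "Causality"): ("Cause", "Result")}


def parse_label(label):
    """Split 'Type.Subtype' and check it against the partial schema."""
    main_type, sep, subtype = label.partition(".")
    if not sep or subtype not in SCHEMA.get(main_type, set()):
        raise ValueError(f"unknown relation label: {label!r}")
    return main_type, subtype


def assign_roles(label, first_event, second_event):
    """Map the two events of a relation to its roles, when the roles are known."""
    key = parse_label(label)
    if key not in ROLES:
        raise KeyError(f"roles not listed in this sketch for {label!r}")
    first_role, second_role = ROLES[key]
    return {first_role: first_event, second_role: second_event}


print(assign_roles("Contingency.Causality", "Attack", "Injure"))
```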

Inheritance and Expansion
Inheritance relations include both traditional Coreference relations and Subevent relations, which mark aggregation-to-component relations. Reemergence connects recurrent events, while Variation summarizes the prototype of an event.
Expansion relations include Confirmation, which encodes a concept-to-instance or "subset" relation, and Conjunction and Disjunction, which relate two subevents within a larger event, and mark two subevents as playing similar (Conjunction) or dissimilar (Disjunction) roles within the larger event. This kind of relation is useful, since a larger event is often not explicitly mentioned.
The combination of these two kinds of relations allows one to build hierarchical representations of parts of an event network, as shown in Figure 2.
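One way to picture such a hierarchy is sketched below. This is an assumption-laden illustration, not the procedure used for Figure 2: Subevent pairs are read as (larger event, component) edges, and a Conjunction pair with no annotated parent is grouped under a placeholder node standing for the larger event that is not explicitly mentioned. All event names and the `<implicit-n>` naming are hypothetical.

```python
from collections import defaultdict


def build_hierarchy(subevent_pairs, conjunction_pairs=()):
    """Build a parent -> children map from Subevent and Conjunction pairs."""
    children = defaultdict(list)
    parent_of = {}
    for parent, child in subevent_pairs:
        children[parent].append(child)
        parent_of[child] = parent
    for n, (first, second) in enumerate(conjunction_pairs):
        # Neither event has a known parent: add a placeholder for the
        # unmentioned larger event that both are assumed to belong to.
        if first not in parent_of and second not in parent_of:
            implicit = f"<implicit-{n}>"
            children[implicit].extend([first, second])
            parent_of[first] = parent_of[second] = implicit
    return dict(children)


def render(children, root, depth=0):
    """Return an indented outline of the hierarchy below `root`."""
    lines = ["  " * depth + root]
    for child in children.get(root, []):
        lines.extend(render(children, child, depth + 1))
    return lines


tree = build_hierarchy(
    subevent_pairs=[("conflict", "attack"), ("attack", "injure")],
    conjunction_pairs=[("protest", "strike")],
)
print("\n".join(render(tree, "conflict")))
```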

Contingency and Comparison
A Contingency relation indicates that one event either leads to the emergence of another event (Causality) or serves as its triggering condition (Condition).
Comparison relations indicate deeper logical contrasts between events. Opposition indicates a relation in which two events are mutually contradictory and unlikely both to be true. This has some similarity to Contrast.Opposition in the Penn Discourse Treebank (Miltsakaki et al., 2004) and to specific annotations of opposition (Feltracco et al., 2015; Takabatake et al., 2015). Negation indicates that while two events could both be true, one shows that the other is no longer true. Competition shows that two events are contrasting versions of the same underlying "event" (e.g., retreat versus escape in disorder).

Temporality
Last but not least, we also define subtypes of Temporality, which represents the temporal order of events. Temporality has been an active research topic for a long time. We arrange all categories and normalize the subtype names from the previous work to constitute our Temporality schema. Figure 3 illustrates the temporal relation subtypes.
In this work, we elaborate the subtypes of Temporality in comparison with conventional work by introducing Meet, Start and Finish, which emphasize the existence of time intervals among events.
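To make the interval view concrete, the sketch below classifies two time intervals as Meet, Start or Finish. It assumes these subtypes follow the usual interval-algebra readings (one interval ends exactly where the other begins, the two share a start point, or the two share an end point); the paper's own definitions are given in Figure 3 and may differ in detail. The `Interval` class and the integer time stamps are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: int  # hypothetical integer time stamp
    end: int

    def __post_init__(self):
        if self.start >= self.end:
            raise ValueError("an interval needs start < end")


def interval_relation(a, b):
    """Return the assumed subtype holding from `a` to `b`, or None."""
    if a.end == b.start:
        return "Meet"    # a ends exactly when b begins
    if a.start == b.start and a.end < b.end:
        return "Start"   # a begins together with b and ends earlier
    if a.end == b.end and a.start > b.start:
        return "Finish"  # a ends together with b and begins later
    return None          # other subtypes are not covered by this sketch


print(interval_relation(Interval(1, 3), Interval(3, 5)))
```

Point-based orderings such as "before" cannot distinguish these three cases, which is why an explicit interval representation is needed for them.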
The correct subtype of the Temporality relation has a great impact on the decision of whether the Start-Position and End-Position events have a Comparison.Opposition or a Contingency.Condition relation.

Corpus Annotation
Annotating event-event relations requires an annotator to gain a global view of the overall scenario or topic (e.g., MH17) before exhaustively annotating each event pair. In addition, our relation types are more fine-grained than those of previous work such as the Richer Event Descriptions (RED) (Ikuta et al., 2014). No existing annotation tool meets these needs, so we developed a new tool that visualizes trigger words, arguments and contexts for each event pair, to ensure that annotators fully understand the documents, background and storyline.
We created an event-event relation corpus based on gold standard events in ACE2005 newswire documents and some additional news documents about Malaysia Airlines Flight 17 (denoted as MH17). Tables 3 and 4 indicate that this is a very challenging task for annotators. The major challenge is determining whether a relation exists at all. Causality and Condition are the most challenging types, since they require annotators to work out the storyline of the documents and to exploit background knowledge. For example, the following two events are from the same document, but no explicit connective indicates the conditional relation between them:
Event 1: Edward Snowden claimed he was trained as a secret agent.
Event 2: The certification would also have given him some of the skills he needed to escape scrutiny.
Annotators A1 and A2 also tend to mistakenly label Subevent as Coreference. Such mistakes happen when the arguments of one event appear as more specific and detailed entities (e.g., an attack in Baghdad vs. an attack in Iraq). Moreover, as the event network becomes larger and more complicated, errors can propagate across types, e.g., incorrectly labeled Subevent pairs will also trigger Conjunction errors. We have also attempted to align the inventory here with other ongoing efforts to annotate within-document event-event relations. Table 5 shows a mapping between a subset of the relations proposed here and those used in the Richer Event Descriptions (RED) (Ikuta et al., 2014). Other similar resources, such as the Penn Discourse Treebank (Miltsakaki et al., 2004), could also be used.
Similar event-event relation schemas such as RED (Ikuta et al., 2014) are in general more coarse-grained and have fewer types and subtypes. Event-event relations differ from textual entailment (Dagan et al., 2013) and discourse relations (Soricut and Marcu, 2003; Miltsakaki et al., 2004; Radev, 2000), which focus on the relatedness between two sentences, in that they span a full document or multiple documents. We adopted some terminology (e.g., Causality and Expansion) from the taxonomy of discourse relations (Miltsakaki et al., 2004). We focus on a wider scope of cross-document events with richer and more fine-grained structured event representations.
If we consider each event-event relation instance as a frame (e.g., a Contingency.Causality event-event relation is similar to the frame Causation), the architecture of the Event Networks is also similar to FrameNet (Baker and Sato, 2003), and thus the ontological analysis and constraints of Ovchinnikova et al. (2010) are also applicable to our task.

Conclusions and Future Work
Our work broadens the scope of IE research from entity-centric to event-centric. In the future we will further expand the corpus, and compare and integrate it with other within-document event-event relation schemas such as RED. We also plan to develop a pilot system using these resources.