Constructing an Annotated Corpus for Protest Event Mining

We present a corpus for protest event mining that combines token-level annotation with the event schema and ontology of entities and events from protest research in the social sciences. The dataset uses newswire reports from the English Gigaword corpus. The token-level annotation is inspired by annotation standards for event extraction, in particular that of the Automated Content Extraction 2005 corpus (Walker et al., 2006). Domain experts perform the entire annotation task. We report competitive intercoder agreement results.


Introduction
Social scientists rely on event data to quantitatively study the behavior of political actors. Public protest (demonstrations, industrial strikes, petition campaigns, political and symbolic violence) accounts for a large part of events involving sub-state actors. Protest event data are central to the study of protest mobilization, political instability, and social movements (Hutter, 2014; Koopmans and Rucht, 2002).
To advance the machine coding of protest data, we have been building a manually annotated corpus of protest events. Our protest event coding follows guidelines adapted from successful manual coding projects. All coding decisions are supported by careful token-level annotation inspired by annotation standards for event extraction. Both event coding and token-level annotation are performed by domain experts. We find that domain experts without specialist linguistic knowledge can be trained well to follow token-level annotation rules and deliver sufficient annotation quality.
Contentious politics scholars often need more fine-grained information on protest events than can be delivered by available event coding software. Our event schema includes issues (the claims and grievances of protest actors) and the number of protesters. We also code protest events that are not the main topic of the report. This is often desirable (Kriesi et al., 1995), although event coding systems would not always code them by design.
We code newswire reports from the widely used English Gigaword corpus and will release all annotations.

Related Work

Machine coding of events
The machine coding of political event data from newswire text goes back to the early 1990s and was first applied to the study of international relations and conflicts (Gerner et al., 1994; Schrodt and Hall, 2006). Many widely used systems, e.g. TABARI (O'Brien, 2010) / PETRARCH and VRA-Reader (King and Lowe, 2003), have relied on pattern matching with large dictionaries of hand-crafted patterns. A system scans a news lead attempting to match an event and its source and target actors, thereby extracting who did what to whom; the date of the event is taken to be the date of publication. The common ontologies CAMEO (Gerner et al., 2002) and IDEA (Bond et al., 2003) define dozens of event types and hundreds of actors. The proprietary event coder BBN ACCENT, which uses statistical entity and relation extraction and co-reference resolution, considerably outperforms a pattern matching-based coder (Boschee et al., 2013; Boschee et al., 2015). O'Connor et al. (2013) present an unsupervised Bayesian coder, which models the gradual change in the types of events between actors.
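The pattern-matching approach described above can be illustrated with a toy sketch. It is not the implementation of any of the cited systems: the two small dictionaries, the labels in them, and the names `find_actors` and `code_lead` are all hypothetical, and real coders rely on far larger hand-crafted dictionaries and on ontology codes that are not reproduced here. The sketch only shows the general idea of matching an event pattern in a news lead, taking the nearest actor before it as the source and the nearest actor after it as the target, and using the publication date as the event date.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical toy dictionaries (assumption): surface form -> illustrative label.
ACTORS = {"farmers": "FARMERS", "police": "POLICE", "government": "GOV"}
EVENT_PATTERNS = {
    r"protest(?:ed|s)? against": "PROTEST",
    r"clash(?:ed|es)? with": "CLASH",
}


@dataclass(frozen=True)
class CodedEvent:
    source: str      # who
    event_type: str  # did what
    target: str      # to whom
    date: str        # taken to be the publication date


def find_actors(text: str) -> List[str]:
    """Return the labels of all dictionary actors in text, in order of appearance."""
    hits = []
    for word, code in ACTORS.items():
        for match in re.finditer(r"\b" + re.escape(word) + r"\b", text):
            hits.append((match.start(), code))
    return [code for _, code in sorted(hits)]


def code_lead(lead: str, pub_date: str) -> Optional[CodedEvent]:
    """Code the first event pattern that has an actor on each side of it."""
    text = lead.lower()
    for pattern, event_type in EVENT_PATTERNS.items():
        match = re.search(pattern, text)
        if match is None:
            continue
        sources = find_actors(text[:match.start()])
        targets = find_actors(text[match.end():])
        if sources and targets:
            # Nearest actor before the pattern is the source, nearest after is the target.
            return CodedEvent(sources[-1], event_type, targets[0], pub_date)
    return None
```

Calling `code_lead("Hundreds of farmers protested against the government in Paris on Monday.", "2005-03-14")` yields a single source-event-target triple dated by publication; a lead with no dictionary match yields `None`.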
Pattern-matching coders have been found to predict event types on a par with trained human coders (King and Lowe, 2003) and to be sufficiently accurate for near real-time event monitoring (O'Brien, 2010). Event coding is nonetheless hard, and coding instructions are often not rigorous enough, which manifests itself in low intercoder reliability (Schrodt, 2012). Boschee et al. (2015) report an intercoder agreement of F1 45% for two human coders coding 1,000 news reports using only the top event types of the CAMEO ontology.

Machine coding of protest events
Pattern matching-based systems have been employed to assist humans in coding protest events (Imig and Tarrow, 2001; Francisco, 1996). Some (Maher and Peterson, 2008) use only machine-coded protest events. More recently, statistical learning has been applied to the coding of protest events. Hanna (2014) trains a supervised learning system leveraging the events hand-coded by the Dynamics of Collective Action project (Earl et al., 2004). Nardulli et al. (2015) employ a human-in-the-loop coding system that learns from human supervision.

Corpora annotated with protest events
A major benchmark for event mining is the Automated Content Extraction (ACE) 2005 Multilingual Training Corpus (Walker et al., 2006). The corpus, distributed by the Linguistic Data Consortium, comes with token-level annotations of entities, relations, and events. Its event ontology includes the CONFLICT event type. Its sub-type ATTACK overlaps with violent protest; the other sub-type, DEMONSTRATE, is close to our understanding of demonstrative protest. Some important protest event types, e.g. petition campaign, industrial strike, and symbolic protest, are not included. Unlike in our project, the targets of ATTACK events are annotated (but not the targets of DEMONSTRATE events). Issues and the number of participants are not annotated.
Of some interest is the corpus of Latin American terrorism used in the Message Understanding Conference evaluations 3 and 4 (Chinchor et al., 1993). It comes with a highly complex event schema that includes detailed information on the actor, human and physical targets, and distinguishes several types of terrorist acts. The corpus predates information extraction by statistical learning from annotated text and thus does not contain token-level annotation.

Annotated Corpus of Protest Events
The main motivation for this work has been the connection of event coding, which is performed at the level of the document, to token-level annotation. In that respect, we follow the trend towards annotating for social science tasks below the document level (Card et al., 2015; Žukov-Gregorič et al., 2016). Unlike these projects, we have chosen to train domain experts to perform careful token-level annotation. The downside of having coders annotate in a linguistically unconstrained manner, an approach sometimes advocated for annotation tasks performed by domain experts (Stubbs, 2013), is that the resulting annotation requires extensive standardization. This is challenging in the case of a complex task like ours.
The overall coding procedure is thus twofold. The coders perform traditional event coding, which involves the identification of protest events and classification of their attributes (type, actors, etc.). In parallel, the coders carry out token-level annotation, which we motivate as a means of supporting coding decisions with the help of text. The coder connects the two by linking coded events to their mentions in the text. Figure 1 shows sample annotation.
All our coders are doctoral students in political science. All are non-native English speakers with a high command of English. One project leader is a trained linguist.

Figure 1: An annotation example. (1a) Coded events. The coder has identified four protest events, all of the same structure. Event one is a campaign event; the other events are the episode events of event one (e.g. "protested" refers to two episode events, which are differentiated based on the two distinct city-level locations). (1b) In-text annotations. Event mentions are in orange. In the superscript are the indices of the coded events that an event mention refers to. In the annotation interface, the coder draws links from event mentions to the mentions of event attributes (actor, location, date, etc.).

Event schema and ontology
A protest event has an event type, a date range, a country, a number of participants, a set of actor types, and a set of issue types. We distinguish ten protest event types, twenty-six issues, and twelve actor types. The types are organized hierarchically. We have not used a large ontology of entities and events that one typically finds in event coding. Our aim has been to ensure that each type occurs sufficiently often and the reliability of coding does not suffer due to codebook complexity.
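The event schema described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class name `ProtestEvent`, its field names, and the example type labels used with it are hypothetical, the hierarchy of types is not modeled, and the actual codebook defines the ten event types, twenty-six issues, and twelve actor types.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ProtestEvent:
    """One coded protest event; all field names are illustrative assumptions."""
    event_type: str                      # one of the protest event types
    start: date                          # first day of the date range
    end: date                            # last day of the date range
    country: str
    participants: Optional[int] = None   # None when the report gives no number
    actor_types: FrozenSet[str] = frozenset()
    issue_types: FrozenSet[str] = frozenset()

    def __post_init__(self) -> None:
        # Basic consistency checks on the coded attributes.
        if self.end < self.start:
            raise ValueError("date range ends before it starts")
        if self.participants is not None and self.participants < 0:
            raise ValueError("number of participants cannot be negative")


# Hypothetical example instance; the labels are placeholders, not codebook entries.
example = ProtestEvent(
    event_type="demonstration",
    start=date(2005, 3, 12),
    end=date(2005, 3, 12),
    country="France",
    participants=5000,
    actor_types=frozenset({"students"}),
    issue_types=frozenset({"education"}),
)
```

Using sets for actor and issue types mirrors the schema's "set of actor types" and "set of issue types"; making the record immutable keeps an adjudicated event from being changed by accident.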
The choice and definition of the types reflect our primary interest in European protest. Having examined a sample of recent European protest events coded using a larger codebook, we have selected some frequent actor types and issues and reworked the event ontology. For example, all specific issues now come with the stance on the issue fixed: against cultural liberalism, for regionalism, etc.
We code only asserted, specific past and currently unfolding events. This is in contrast to the ACE 2005 corpus, yet much like the lighter-weight DEFT ERE (Entities, Relations, Events) annotation standard (Aguilar et al., 2014), and it departs from the practice of coding planned future events in manual coding work.

Token-level annotation
In devising token-level annotation guidelines, we have relied on the ACE English Annotation Guidelines for Events and Entities (Doddington et al., 2004). We have borrowed many ideas, e.g. the annotation of event mentions largely as one-word triggers, which we have found to work well in practice. The ACE guidelines are written for annotators with a background in linguistics, not domain experts. We have found that it is often possible to convey more or less the same idea in less technical language, e.g. by simplifying "present participle in the nominal premodifier position" to "participle modifying a noun", and by providing extensive examples.
Not all ACE rules could be adapted in this way. We do not distinguish between heads and multiword extents, but rather annotate the one which appears easier for a given attribute. For example, we annotate collective actors ("postal workers", "leftwing protesters") as head nouns only and not full noun phrases, which would be more in line with the ACE guidelines but is challenging even for trained linguists. On the other hand, issue annotations are predominantly multi-word expressions.
The linking of coded events to token-level annotation is at the core of our approach. To consolidate the information about an event scattered across multiple sentences, we would need annotated event co-reference. Yet, annotating event co-reference (as well as non-strict identity relations) is hard (Hovy et al., 2013). In the annotation interface, the coders explicitly link coded events to event mentions that refer to them, and many events can be linked to the same event mention. Thus, unlike the ACE 2005 corpus, we do not explicitly define the co-reference relation between event mentions, but read it off the explicit references. We do not annotate entity co-reference.
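How co-reference might be read off the explicit links can be sketched as follows. The sketch assumes that links are available as (mention, event) identifier pairs and that two event mentions count as co-referent when they are linked to at least one common coded event; both the data format and this exact reading are assumptions made for illustration, and the function name `coreference_from_links` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def coreference_from_links(
    links: Iterable[Tuple[str, str]]
) -> Tuple[Dict[str, Set[str]], Set[Tuple[str, str]]]:
    """Group event mentions by coded event and derive co-referent mention pairs.

    links: (mention_id, event_id) pairs; one mention may link to many events.
    """
    mentions_by_event: Dict[str, Set[str]] = defaultdict(set)
    for mention_id, event_id in links:
        mentions_by_event[event_id].add(mention_id)

    coreferent: Set[Tuple[str, str]] = set()
    for mentions in mentions_by_event.values():
        # Every pair of mentions linked to the same coded event is co-referent.
        for a, b in combinations(sorted(mentions), 2):
            coreferent.add((a, b))
    return dict(mentions_by_event), coreferent


# Hypothetical links: mention m1 refers to two coded events, as "protested"
# does in the annotation example.
links = [("m1", "e1"), ("m1", "e2"), ("m2", "e1"), ("m3", "e3"), ("m4", "e3")]
by_event, pairs = coreference_from_links(links)
```

Grouping by coded event also gives, for each event, the set of mentions whose linked attributes would need to be consolidated.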

Workflow
We code newswire texts by Agence France-Presse (AFP) and the Associated Press Worldstream (APW) from the English Gigaword Third Edition Corpus (Graff et al., 2007). We sample from two-month periods in the 2000s. For a given period, we select all documents that mention a European location in the dateline and are found protest-relevant by a classifier that we have trained, for a related project, on newswire stories including AFP and APW. For each month and agency, we randomly sample forty documents, of which a project leader picks ten to twenty documents for coding. In this way, we typically get groups of documents from various dates, each group covering the same story.
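The automatic part of this selection, filtering followed by random sampling per month and agency, can be sketched as below. The document fields (`id`, `month`, `agency`, `european_dateline`, `protest_relevant`) and the function name `sample_candidates` are hypothetical; in particular, the relevance flag stands in for the output of the classifier, which is not reproduced here. The final manual pick of ten to twenty documents by a project leader is not modeled.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def sample_candidates(
    docs: List[dict], per_stratum: int = 40, seed: int = 0
) -> Dict[Tuple[str, str], List[str]]:
    """Randomly sample up to per_stratum document ids per (month, agency)."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    strata: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for doc in docs:
        # Keep only documents with a European dateline that were flagged as protest-relevant.
        if doc["european_dateline"] and doc["protest_relevant"]:
            strata[(doc["month"], doc["agency"])].append(doc["id"])
    return {
        key: rng.sample(sorted(ids), min(per_stratum, len(ids)))
        for key, ids in sorted(strata.items())
    }
```

Sorting the ids and the strata before sampling makes the result independent of the input order for a given seed.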
Each document gets coded by at least two coders. A project leader performs adjudication. We estimate that coding one document takes an average of fifteen minutes. Our budget allows for coding up to 300 documents. In the first rounds of coding, we have not used any pre-annotation.

Intercoder reliability
We achieve competitive intercoder agreement for the first batch of documents (Table 1). During the coding of this batch, the coders received general feedback on token-level annotation (Table 1b), which partly explains the high agreement. For reference, we show the agreement achieved by the ACE coders on newswire documents annotated with events of type CONFLICT. Crucially, the ACE documents are almost twice as long on average, which drags down agreement. While the agreement on coded events is low, as expected, our coders agree substantially on coding subsets of event attributes (Table 1d).
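For readers who want to compute agreement on their own annotations, a minimal sketch of pairwise F1 between two coders follows. This section does not state how the figures in Table 1 were computed, so everything here is an assumption: F1 is used because it is the measure reported by Boschee et al. (2015) in the work discussed earlier, annotations are represented as (start token, end token, label) triples, and only exact matches count. The function name `f1_agreement` is hypothetical.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int, str]  # (start token, end token, label); an assumed format


def f1_agreement(spans_a: Iterable[Span], spans_b: Iterable[Span]) -> float:
    """Exact-match F1 between two coders' annotations of the same document."""
    a, b = set(spans_a), set(spans_b)
    if not a and not b:
        return 1.0  # neither coder annotated anything: treated as full agreement
    matched = len(a & b)
    # With one coder as reference, precision = matched / |b| and recall = matched / |a|,
    # so F1 reduces to the symmetric expression below.
    return 2 * matched / (len(a) + len(b))


# Hypothetical annotations by two coders.
coder_1 = {(0, 1, "ACTOR"), (3, 3, "EVENT"), (7, 7, "LOCATION")}
coder_2 = {(0, 1, "ACTOR"), (3, 3, "EVENT"), (5, 5, "LOCATION"), (9, 9, "DATE")}
score = f1_agreement(coder_1, coder_2)
```

Because the expression is symmetric, it does not matter which coder is treated as the reference. Relaxing exact match to overlap-based matching would be a separate design choice.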

Conclusion and Future Work
We have presented our work on a corpus of protest events, which combines event coding with careful token-level annotation. The corpus comes with coded issues and numbers of participants. Overall, we observe substantial intercoder agreement. Little work has been done on the evaluation of event coders (Boschee et al., 2013), and none on widely available data despite interest (Schrodt, 2016). We would encourage the use of our corpus as an evaluation benchmark. That would require mapping our ontology of events and entities to CAMEO categories.
As we often code groups of documents covering the same sets of events (Section 3.3), the corpus could be extended to include cross-document event co-reference annotations.