Weakly Supervised Subevent Knowledge Acquisition

Subevents elaborate an event and widely exist in event descriptions. Subevent knowledge is useful for discourse analysis and event-centric applications. Acknowledging the scarcity of subevent knowledge, we propose a weakly supervised approach to extract subevent relation tuples from text and build the first large-scale subevent knowledge base. We first obtain the initial set of event pairs that are likely to have the subevent relation, by exploiting two observations: 1) subevents are temporally contained by the parent event, and 2) the definitions of the parent event can be used to further guide the identification of subevents. Then, we collect rich weak supervision using the initial seed subevent pairs to train a contextual classifier using BERT and apply the classifier to identify new subevent pairs. The evaluation showed that the acquired subevent tuples (239K) are of high quality (90.1% accuracy) and cover a wide range of event types. The acquired subevent knowledge has been shown useful for discourse analysis and identifying a range of event-event relations.


Introduction
A subevent is an event that happens as a part of another event (i.e., the parent event) spatio-temporally. Subevents, which elaborate and expand an event, widely exist in event descriptions. For instance, when describing election events, people usually describe typical subevents such as "nominate candidates", "debates", and "people vote". Knowing typical subevents of an event can help with analyzing several discourse relations (such as expansion and temporal relations) between text units. Furthermore, knowing typical subevents of an event is important for understanding the internal structure of the event (what is the event about?) and its properties (is this a violent or peaceful event?), and therefore has great potential to benefit event detection, event tracking, event visualization, and event summarization, among many other applications.

(Code and the knowledge base are available at https://github.com/wenlinyao/EMNLP20-SubeventAcquisition.)
Despite this high demand, little subevent knowledge can be found in existing knowledge bases. We therefore aim to extract subevent knowledge from text and build the first subevent knowledge base covering a large number of commonly seen events and their rich subevents.
Little research has focused on identifying the subevent relation between two events in text. Several datasets annotated with subevent relations exist (e.g., Araki et al., 2014), but they are extremely small, usually containing dozens to one or two hundred documents. Subevent relation classifiers trained on these small datasets are not suitable for extracting subevent knowledge from text, considering that subevent relations can appear in dramatically different contexts depending on topics and events.
We propose to conduct weakly supervised learning and train a wide-coverage contextual classifier to acquire diverse event pairs of the subevent relation from text. We start by creating weak supervision, where we aim to identify the initial set of subevent relation tuples from a text corpus. With no contextual classifier at the beginning, it is difficult to extract subevent relation tuples because subevent relations are rarely stated explicitly. Instead, we propose a novel two-step approach to indirectly obtain the initial set of subevent relation tuples, exploiting two key observations: (1) subevents are temporally contained by the parent event, and thus can be extracted with linguistic expressions that indicate the temporal containment relationship, and (2) the definition of the parent event is useful to prune spurious subevent tuples away to improve the quality.
Specifically, we first use several preposition patterns (e.g., e_i during e_j) that indicate the temporal relation contained_by between events to identify candidate subevent relation tuples. Then, we conduct an event definition-guided semantic consistency check to remove spurious subevent tuples that often include two temporally overlapping but semantically incompatible events. For example, a news article may report a bombing event that happened in parallel during a festival, but the intense bombing event is not semantically compatible with the entertaining event festival, as informed by the common definition of festival: A festival is an organized series of celebration events, or an organized series of concerts, plays, or movies, typically one held annually.
Next, we identify sentences from the text corpus that contain an event pair, and use these sentences to train a contextual classifier that can recognize the subevent relation in text. We train the contextual subevent relation classifier by fine-tuning the pretrained BERT model (Devlin et al., 2019). We then apply the contextual BERT classifier to identify new event pairs that have the subevent relation.
We have built a large knowledge base of 239K subevent relation tuples. The knowledge base contains subevents for 10,318 unique events, with each event associated with 20.1 subevents on average. Intrinsic evaluation demonstrates that the learned subevent relation tuples are of high quality (90.1% accuracy) and are valuable for event ontology building and exploitation.
The learned subevent knowledge has been shown useful for identifying subevent relations in text, including both intra-sentence and cross-sentence cases. In addition, the learned subevent knowledge is shown useful for identifying temporal and causal relations between events as well, for the challenging cross-sentence cases where we usually have few contextual clues to rely on. Furthermore, when incorporated into a recent neural discourse parser, the learned subevent knowledge noticeably improves the performance of identifying two types of implicit discourse relations, expansion and temporal relations.
In short, we make three main contributions: 1) We developed a novel weakly supervised approach to acquire subevent knowledge from text. 2) We built the first large-scale subevent knowledge base that is of high quality and covers a wide range of event types. 3) We performed extensive evaluations showing that the harvested subevent knowledge not only improves subevent relation extraction, but also improves a wide range of NLP tasks such as causal and temporal relation extraction and discourse parsing.

Related Work
Subevent Identification: Only a few studies have focused on identifying subevent relations in text. Araki et al. (2014) built a logistic regression model to classify the relation between two events into full coreference (FC), subevent parent-child (SP), subevent sister (SS), and no coreference (NC); they improved the prediction of SP relations by performing SS prediction first and using the SS predictions in a voting algorithm. Later work trained a logistic regression classifier using a range of lexical and syntactic features and then used Integer Linear Programming (ILP) to enforce document-level coherence for constructing coherent event hierarchies from news. Recently, Aldawsari and Finlayson (2019) outperformed previous models for subevent relation prediction using a linear SVM classifier, by introducing several new discourse and narrative features.

Subevent Knowledge Acquisition: Considering the generalizability issue of supervised contextual classifiers trained on small annotated datasets, our pilot research on subevent knowledge acquisition (Badgett and Huang, 2016) relies on heuristics: we first identify sentences in news articles that are likely to contain subevents by exploiting a sentential pattern, and then extract subevent phrases from those sentences using a phrasal pattern. In addition, that pilot work does not acquire the parent event together with its subevents; instead, it learns a list of subevent phrases from documents that are known to describe a certain type of event.
Specifically, that pilot work acquired only 610 subevent phrases for one type of parent event, civil unrest events. Recent work (Bosselut et al., 2019) uses generative language models to generate subevent knowledge among many other types of commonsense knowledge. We can potentially incorporate our learned subevent knowledge into a general event ontology to enrich its subevent links. For instance, the Rich Event Ontology (REO) (Brown et al., 2017) unifies two existing knowledge resources (FrameNet (Fillmore et al., 2003) and VerbNet (Kipper et al., 2008)) and two event-annotated datasets (ACE (Doddington et al., 2004) and ERE) to allow users to query multiple linguistic resources and combine event annotations. However, REO contains few subevent relation links between events.
Overview of the Weakly Supervised Approach

Figure 1 shows the overview of the weakly supervised learning approach for subevent knowledge acquisition. The key of this approach is to identify seed event pairs that are likely to have the subevent relation, in a two-step procedure (Section 4). We first use several temporal relation patterns (e.g., e_i during e_j) to identify candidate seed pairs, since a child event is usually temporally contained by its parent event; then, we conduct a definition-guided semantic consistency check to remove spurious subevent pairs that are semantically incompatible and unlikely to have the subevent relation, e.g., (festival, bombing). Next, we find occurrences of seed pairs in a large text corpus to quickly generate many subevent relation instances; we also create negative instances to train the subevent relation classifier (Section 5). The trained contextual classifier is then used to identify new event pairs of the subevent relation by examining multiple occurrences of an event pair in text (Section 6). We use the English Gigaword corpus (Napoles et al., 2012) as the text corpus.

Seed Event Pair Identification
We use six preposition patterns (i.e., during, in, amid, throughout, including, and within) to extract candidate seed event pairs. Specifically, we use dependency relations to recognize preposition patterns, and extract the governor word and dependent word of each pattern. We then check whether both words are event-triggering words, and try to attach an argument to each event word to form an event phrase that tends to be more expressive and self-contained than a single event word, e.g., sign agreement vs. sign, or attack on troops vs. attack. We consider both verb event phrases and noun event phrases (Appendix A provides more details). We further require that at least one argument is included in an event pair, which may be attached to the first or the second event. In other words, we do not consider event pairs in which neither event has an argument.
To select seed subevent pairs, we consider event pairs that co-occur with at least two different patterns at least three times in total. In this way, we identified around 43K candidate seed pairs from the Gigaword corpus. However, many candidate seed pairs identified by the preposition patterns only have the temporal contained_by relation but do not have the subevent relation. In order to remove such spurious subevent pairs, we next present an event definition-guided approach to conduct a semantic consistency check between the parent event and the child event of a candidate subevent relation tuple.
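The seed-selection criterion above can be sketched in a few lines of Python. This is an illustrative reimplementation, not the authors' released code; the tuple format (child event, pattern, parent event) and the toy extractions are our assumptions.

```python
from collections import defaultdict

def select_seed_pairs(extractions, min_patterns=2, min_count=3):
    """Keep candidate pairs that co-occur with at least `min_patterns`
    distinct preposition patterns and at least `min_count` times in total."""
    patterns_seen = defaultdict(set)   # (child, parent) -> {patterns}
    counts = defaultdict(int)          # (child, parent) -> total count
    for child, pattern, parent in extractions:
        patterns_seen[(child, parent)].add(pattern)
        counts[(child, parent)] += 1
    return {pair for pair in counts
            if len(patterns_seen[pair]) >= min_patterns
            and counts[pair] >= min_count}

# Toy extractions: (child event, preposition pattern, parent event).
extractions = [
    ("people vote", "during", "election"),
    ("people vote", "in", "election"),
    ("people vote", "during", "election"),
    ("bombing", "during", "festival"),   # seen once, one pattern: filtered out
]
seeds = select_seed_pairs(extractions)
```

With these toy extractions, only ("people vote", "election") survives the two thresholds.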

Definition-Guided Semantic Check
The intuition is that the definition of a parent event word describes important aspects of the event's meanings and signifies its potential subevents. For example, based on the definition of festival, events related to "celebrations", such as ceremony being held and set off fireworks, are likely to be correct subevents of festival; however, bomb explosion and people being killed may be distinct events that only happen temporally in parallel with festival.
Specifically, we perform semantic consistency checks collectively for many candidate event pairs by considering similarities between events as well as similarities between the definition of an event and its subevents, and we cluster event phrases into groups so that any two event phrases within a group are semantically compatible. Once clustering is complete, we recognize an event pair as a spurious subevent relation pair if its two events fall into different clusters. Next, we describe the details of graph construction and the clustering algorithm we used.

Graph Construction
Given a set of event pairs needing the semantic consistency check, we construct an undirected graph G(V, E), where each node in V represents a unique event phrase. We connect event phrases with two types of weighted edges. First, for each candidate subevent relation tuple, we create an edge of weight 1.0 between the parent event and the child event. Second, we create an edge between any two event phrases if their similarity is greater than a certain threshold, and the edge weight is their similarity score. To calculate the similarity between two event phrases, we pair each word from one event phrase (either the event word or an argument) with each word from the other event phrase and calculate the similarity between the two word embeddings; the similarity between the two event phrases is then the average of their word pair similarities. We set the similarity threshold to 0.3, after inspecting 200 randomly selected event pairs with their similarities. If two event phrases are already connected because they are a candidate subevent relation pair, we add their similarity score to the edge weight.
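The phrase similarity used for the second edge type (average pairwise cosine over word embeddings, thresholded at 0.3) can be sketched as follows. The 2-d embeddings below are made up purely for illustration; in practice these would be pretrained word vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def phrase_similarity(phrase1, phrase2, emb):
    """Average cosine similarity over all word pairs drawn from
    the two event phrases (event word plus arguments)."""
    sims = [cosine(emb[w1], emb[w2])
            for w1 in phrase1 for w2 in phrase2]
    return sum(sims) / len(sims)

THRESHOLD = 0.3  # add a similarity edge only above this value

# Toy 2-d embeddings (illustrative only).
emb = {
    "sign":      [1.0, 0.0],
    "agreement": [0.9, 0.1],
    "attack":    [0.0, 1.0],
    "troops":    [0.1, 0.9],
}
cross = phrase_similarity(["sign", "agreement"], ["attack", "troops"], emb)
same = phrase_similarity(["sign", "agreement"], ["sign", "agreement"], emb)
```

Here the dissimilar pair falls below the threshold (no edge), while a phrase compared with itself is far above it.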
Next, we incorporate event definitions by adding new nodes and new edges to the graph. Specifically, for each event phrase that appears as the parent event in some candidate subevent relation tuples, we create a new node for its event word representing the event word definition. If the event word has multiple meanings and therefore multiple definitions, we consider at most five definitions retrieved from WordNet (Miller, 1995) and create one node for each definition, assuming each definition of the parent event will attract different types of children events. Then, we connect each definition node of a parent event with its children events, if their similarity is over the same similarity threshold used previously. The similarity between a definition node and a child event is calculated by exhaustively pairing each non-stop word from the definition sentence and each word from the child event phrase and taking the average of word pair similarities.

The Clustering Algorithm
We use a graph propagation algorithm called the Speaker-Listener Label Propagation Algorithm (SLPA) (Xie et al., 2011). SLPA has been shown effective for detecting overlapping clusters (Xie et al., 2013), which is preferred because multiple types of events may share common subevents. For instance, people being injured is a commonly seen subevent of conflict events (e.g., combat) as well as disaster events (e.g., hurricane). In addition, SLPA is self-adapted and converges to an optimal number of clusters, with no pre-defined number of clusters needed. Event clusters often become stable soon after 50 iterations; to ensure convergence, we ran the algorithm for 60 iterations.
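A compact SLPA sketch follows: each node keeps a memory of labels, listeners collect one label from each neighbor per iteration, and labels occupying more than a threshold fraction of a node's memory become its (possibly overlapping) communities. This is our simplified reading of the algorithm, not the authors' implementation; the 0.1 memory threshold is an assumption.

```python
import random
from collections import Counter, defaultdict

def slpa(adj, iterations=60, threshold=0.1, seed=0):
    """Speaker-Listener Label Propagation (SLPA) sketch for
    overlapping community detection on an undirected graph."""
    rng = random.Random(seed)
    memory = {n: [n] for n in adj}          # each node starts with its own label
    for _ in range(iterations):
        order = list(adj)
        rng.shuffle(order)
        for listener in order:
            if not adj[listener]:
                continue
            # Each neighboring speaker sends one label sampled from its memory;
            # the listener keeps the most popular label it heard.
            heard = [rng.choice(memory[speaker]) for speaker in adj[listener]]
            winner = Counter(heard).most_common(1)[0][0]
            memory[listener].append(winner)
    communities = defaultdict(set)
    for node, labels in memory.items():
        counts = Counter(labels)
        for label, c in counts.items():
            if c / len(labels) > threshold:   # frequent labels define membership
                communities[label].add(node)
    return list(communities.values())

# Two disconnected event groups: labels can never propagate across components.
adj = {
    "festival": {"ceremony", "fireworks"},
    "ceremony": {"festival", "fireworks"},
    "fireworks": {"festival", "ceremony"},
    "war": {"bombing"},
    "bombing": {"war"},
}
groups = slpa(adj)
```

Because no edge crosses the two components, every detected community stays inside one of them, which mirrors how semantically incompatible events (e.g., festival vs. bombing) end up in different clusters.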
After performing the semantic consistency check, we retained around 30K seed event pairs. We found occurrences of these event pairs in the Gigaword corpus and obtained around 388K sentences containing an event pair. These sentences are used as positive instances to train the contextual classifier.

The Contextual Classifier Using BERT
Recently, BERT (Devlin et al., 2019), pretrained on massive data, has achieved high performance on various NLP tasks. We fine-tune a pretrained BERT model to build the contextual classifier for subevent relation identification. The BERT model is essentially a bi-directional Transformer-based encoder that consists of multiple layers, where each layer has multiple attention heads. Formally, given a sentence with N tokens, each attention head transforms a token vector t_i into query, key, and value vectors q_i, k_i, v_i through three linear transformations. Next, for each token, the head calculates self-attention scores against all tokens of the input sentence as softmax-normalized dot products between query and key vectors, α_ij = softmax_j(q_i · k_j / √d_k), and the output o_i of the attention head is a weighted sum of all value vectors: o_i = Σ_j α_ij v_j. In this way, we obtain N contextualized embeddings, one per token in a sentence, using the BERT model. To force the BERT encoder to look at context information other than the two event trigger words of a subevent pair (e.g., war, person battle), we replace the two event trigger words in a sentence with the special token [MASK], as the original BERT model does for masking. The contextualized embeddings at the two event triggers' positions (the two [MASK] positions) are concatenated and then fed into a feed-forward neural network with a softmax prediction layer for three-way classification, i.e., to predict the two subevent relations (parent-child and child-parent, depending on the textual order of the two events) or no subevent relation (Other).
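The masking and prediction-head mechanics can be sketched without the actual BERT encoder. The embeddings and head weights below are made-up toy values; in a real system the contextual embeddings would come from a fine-tuned BERT model.

```python
import math

MASK = "[MASK]"
LABELS = ["PARENT-CHILD", "CHILD-PARENT", "OTHER"]

def mask_triggers(tokens, i, j):
    """Replace the two event trigger tokens with [MASK] so the
    classifier must rely on the surrounding context."""
    out = list(tokens)
    out[i], out[j] = MASK, MASK
    return out

def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def classify(contextual_embs, i, j, weights, bias):
    """Concatenate the contextual embeddings at the two [MASK]
    positions and apply a linear layer + softmax (a stand-in for
    the feed-forward prediction head)."""
    feat = contextual_embs[i] + contextual_embs[j]   # list concatenation
    logits = [sum(w * x for w, x in zip(row, feat)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

tokens = ["clashes", "erupted", "during", "the", "war"]
masked = mask_triggers(tokens, 0, 4)

# Toy 2-d "contextual embeddings", one per token.
embs = [[0.2, 0.1], [0.0, 0.3], [0.5, 0.5], [0.1, 0.0], [0.4, 0.2]]
# Toy head: 3 output rows over the 4-d concatenated feature.
W = [[1.0, 0.0, 0.0, 1.0],
     [0.0, 1.0, 1.0, 0.0],
     [0.5, 0.5, 0.5, 0.5]]
b = [0.0, 0.0, 0.0]
probs = classify(embs, 0, 4, W, b)
```

The three probabilities correspond to the parent-child, child-parent, and Other labels in textual order.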
In our experiments, we use the pretrained BERT-base model provided by Devlin et al. (2019), with 12 Transformer block layers, a hidden size of 768, and 12 self-attention heads. We train the classifier using cross-entropy loss and the Adam optimizer (Kingma and Ba, 2015) with an initial learning rate of 1e-5, dropout of 0.5, batch size of 16, and 3 training epochs.

Negative Training Instances
High-quality negative training instances that can effectively compete with positive instances are important to enable the classifier to distinguish subevent relations from non-subevent relations. We include two types of negative instances to fine-tune the BERT classifier.
First, we randomly sample sentences that contain an event pair different from any seed pair or candidate pair (Section 6.1) as negative instances. We sample five times as many such negative sentences as positive sentences, considering that most sentences in a corpus do not contain a subevent relation. Second, we observe that the subevent relation is often confused with temporal and causal event relations, because a subevent is strictly temporally contained by its parent event. Therefore, to improve the discrimination capability of the classifier, we also include sentences containing temporally or causally related events as negative instances. Specifically, we apply a similar strategy: we use patterns to extract temporal and causal event pairs and then search for these pairs in the text corpus to collect sentences that contain a temporal or causal event pair. Event pairs that co-occur with temporal or causal patterns at least three times are retained. We collected 63K temporally related event pairs and 61K causally related event pairs, which were used to identify 371K sentences containing one of the event pairs. In total, we obtained around 1.8 million negative training instances.
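Assembling the training set from these sources can be sketched as below. For simplicity this sketch uses binary labels (the actual classifier is three-way), and the 5:1 sampling ratio follows the text; function names and toy sentences are our own.

```python
import random

def build_training_set(positives, random_negs, relation_negs, seed=0):
    """Combine positive subevent sentences with (a) random event-pair
    sentences, sampled at five times the number of positives, and
    (b) sentences containing temporally or causally related pairs."""
    rng = random.Random(seed)
    k = min(5 * len(positives), len(random_negs))
    sampled = rng.sample(random_negs, k)
    data = ([(s, "SUBEVENT") for s in positives]
            + [(s, "OTHER") for s in sampled]
            + [(s, "OTHER") for s in relation_negs])
    rng.shuffle(data)
    return data

positives = ["people voted during the election"]
random_negs = [f"random sentence {i}" for i in range(20)]
relation_negs = ["the earthquake caused a tsunami"]   # causal pair: hard negative
data = build_training_set(positives, random_negs, relation_negs)
```

The temporal/causal sentences act as hard negatives that teach the classifier to separate the subevent relation from related but distinct event-event relations.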

Identifying New Subevent Pairs
We next apply the contextual BERT classifier to identify new event pairs that express the subevent relation. It is unnecessary to test all possible pairs of events, since two random events that co-occur in a sentence are unlikely to have the subevent relation. In order to narrow down the search space, we first identify candidate event pairs that are likely to have the subevent relation.
Then, we apply the contextual classifier to examine instances of each candidate event pair in order to determine valid subevent relation pairs.

Candidate Event Pairs
We consider two types of candidate event pairs. First, the preposition patterns used to identify seed subevent relation tuples are again used to identify candidate event pairs, but with less strict conditions. Specifically, we consider event pairs that co-occur with any pattern at least twice as candidate event pairs. In this way, we identified 1.4 million candidate event pairs from the Gigaword corpus.
Second, when a subevent relation tuple appears in a sentence, it is common to observe other subevents of the same parent event in the surrounding context. Therefore, we collect sentences that contain a seed subevent relation tuple, and identify additional subevents of the same parent event in the two preceding and two following sentences. Furthermore, we observe that the additional subevents often share the subject or direct object with the subevent of the seed tuple; consequently, we only consider such event phrases found in the surrounding sentences and pair them with the parent event of the seed tuple to create new candidate event pairs. Using this method, we extracted around 89K candidate event pairs from the Gigaword corpus.

New Subevent Pair Selection Criteria
We identify a candidate event pair as a new subevent relation pair only if the majority of its sentential contexts, specifically more than 50% of them, were consistently labeled as showing the subevent relation by the BERT classifier. In addition, we disregard rare event pairs and require that at least three instances of an event pair have been labeled as showing the subevent relation.
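The two selection criteria just described (a strict majority of labeled contexts, and at least three positively labeled instances) reduce to a one-line test:

```python
def accept_pair(labels, min_ratio=0.5, min_positive=3):
    """Accept a candidate event pair as a subevent relation pair if
    more than `min_ratio` of its sentential contexts were labeled as
    showing the subevent relation, AND at least `min_positive`
    contexts were labeled so."""
    positive = sum(1 for label in labels if label == "SUBEVENT")
    return positive > min_ratio * len(labels) and positive >= min_positive

accepted = accept_pair(["SUBEVENT"] * 4 + ["OTHER"])        # 4/5 majority, >= 3
too_rare = accept_pair(["SUBEVENT"] * 2)                    # only 2 instances
tied = accept_pair(["SUBEVENT"] * 3 + ["OTHER"] * 3)        # exactly 50%, not a majority
```

Note that an exact 50% split is rejected, since the text requires more than half of the contexts to be labeled positive.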
The full weakly supervised learning process acquires 239K subevent relation pairs, including 30K seed pairs and 209K classifier identified pairs. The subevent knowledge base has 10,318 unique events shown as parent events, and each parent event is associated with 20.1 children events on average.

An Example Subevent Knowledge Graph
The initial exploration of the learned subevent knowledge shows two interesting observations of event hierarchies. Figure 2 shows an example event graph. First, we can draw a partition of the event space at multiple granularity levels by grouping  events based on subevents they share, e.g., the upper and the lower sections of the example event graph illustrate two broad event clusters sharing no subevent, and within each cluster, we see smaller event groups (colored) that share subevents extensively within a group while sharing fewer subevents between groups. Second, subevents encode event semantics and reveal different development stages of the parent events, e.g., subevents of natural disaster events (top left corner) reflect disaster response and recovery stages.

Precision of the Contextual Classifier
The contextual classifier is a key component of our learning approach. We evaluate the performance of the BERT contextual classifier on identifying subevent relations against several other types of event-event relations (e.g., temporal and causal relations), using the Richer Event Description (RED) corpus, which is comprehensively annotated with rich event-event relations. Since the contextual classifier mainly operates at the sentence level, we only consider intra-sentence subevent relations in the RED dataset. Table 1 compares two training settings: the BERT classifier trained on seed pairs before vs. after applying the semantic check (43K vs. 30K seed pairs) and their identified training instances. Conducting the semantic check improves the precision of the trained classifier by 11% with no loss in recall. Without using any annotated data, the classifier achieves a precision of 55.9%. While the precision on predicting each sentential context is not perfect, note that we retain a candidate subevent relation pair only if the majority, and at least three, of its sentential contexts show the subevent relation.

Accuracy of Acquired Subevent Pairs
We randomly sampled around 1% of the acquired subevent pairs, including 300 seed subevent pairs and 2,090 newly learned subevent pairs, and asked two human annotators to judge whether the subevent relation exists between the two events. The two annotators labeled 250 event pairs in common and agreed on 93.6% (234) of them; the remaining subevent pairs were evenly split between the two annotators. According to the human annotations, the accuracy of the seed pairs is 91.6% and the accuracy of the newly learned event pairs is 89.9%, for an overall accuracy of 90.1%.

[Table 2: Subevent Relation Identification, P/R/F1 (%). We predict Parent-Child and Child-Parent subevent relations and report the micro-average performance.]

Coverage of Acquired Subevent Pairs
To see whether the acquired subevent knowledge has good coverage of diverse event types, we compare the unique events appearing in the acquired subevent relation tuples with events annotated in two datasets, ACE (Doddington et al., 2004) and KBP, both annotated with rich event types and commonly used for event extraction evaluation. We found that 73.8% (656/889) of events in ACE and 66.9% (934/1396) of events in KBP match events in the acquired subevent pairs. Because we aim to evaluate coverage of general event types rather than specific events, we ignore event arguments and match only event word lemmas.
In addition, we compare our learned 239K subevent pairs with the 30K ConceptNet subevent pairs. Interestingly, the two sets have only 311 event pairs in common, which shows that our learning approach extracts subevent pairs from real texts that are often hard for crowdsourcing workers (the approach used by ConceptNet) to think of.

Subevent Relation Identification
To find out whether the learned subevent knowledge can be used to improve subevent relation identification in text, we conducted experiments on two datasets, RED and HiEve. In our experiments, we consider intra-sentence and cross-sentence event pairs separately. We randomly split the data into five folds and conduct cross-validation for evaluation. We fine-tune the same BERT model using RED or HiEve annotations to predict subevent relations vs. others. For the RED dataset, we consider all the annotated event-event relations other than subevent relations as others. For the HiEve dataset, we exhaustively create event mention pairs among all the annotated event mentions and consider all the mention pairs that were not annotated with the subevent relation as others; in this way, we generated 3.5K intra-sentence and 59.5K cross-sentence event mention pairs as others. Note that for cross-sentence event pairs, we simply concatenate the two sentences and insert in between the special token [SEP] used in the original BERT.

We propose two methods to incorporate the learned subevent knowledge. 1) Subevent links. For a pair of events to classify in the RED or HiEve dataset, we check whether they match our learned subevent relation tuples. We ignore event arguments when matching events and only match event word lemmas; for this reason, one pair of events might match multiple learned subevent relations. We count the subevent relations that match a given event pair (X, Y) in the two directions, X →(subevent) Y and Y →(subevent) X, separately, and encode the log values of the two counts in a vector. 2) Event embedding. Subevent relations can be used to build meaningful event embeddings, such that the embeddings of a parent event and a child event preserve the subevent relation between them. Therefore, we train a BiLSTM encoder to build event phrase embeddings, using the knowledge representation learning model TransE (Bordes et al., 2013), such that p + r ≈ c for a parent-child event pair (p, c) having the subevent relation r. We use the trained BiLSTM encoder to obtain an embedding for an event phrase in the RED or HiEve dataset.
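The TransE objective underlying the event embedding method (p + r ≈ c) can be illustrated with its margin ranking loss. This is a generic TransE sketch over toy 2-d vectors, not the paper's BiLSTM-based training setup; the margin value is an assumption.

```python
import math

def l2(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def transe_margin_loss(parent, rel, child, neg_parent, neg_child, gamma=1.0):
    """TransE margin ranking loss for the subevent relation r:
    we want parent + r ~ child, i.e., a small distance for the true
    pair and a distance larger by at least `gamma` for a corrupted pair."""
    pos = l2([p + r for p, r in zip(parent, rel)], child)
    neg = l2([p + r for p, r in zip(neg_parent, rel)], neg_child)
    return max(0.0, gamma + pos - neg)

r = [1.0, 0.0]                            # the subevent relation vector
parent, child = [0.0, 0.0], [1.0, 0.0]    # parent + r == child exactly
neg_p, neg_c = [0.0, 0.0], [0.0, 5.0]     # corrupted (non-subevent) pair

good = transe_margin_loss(parent, r, child, neg_p, neg_c)   # satisfied: zero loss
bad = transe_margin_loss(parent, r, [3.0, 0.0], neg_p, [1.0, 0.0])  # violated
```

Minimizing this loss pushes true parent/child embeddings to satisfy the translation p + r ≈ c while pushing corrupted pairs apart.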
Finally, for subevent relation identification, we concatenate the two event word representations obtained from the BERT encoder with either the subevent link vector or the two event embeddings obtained using the above two methods. Results are shown in Table 2. Compared to the basic BERT classifier, incorporating the learned subevent knowledge achieves better performance on both datasets, for both intra-sentence and cross-sentence cases.

Temporal and Causal Relation Identification
Subevents indicate how an event emerges and develops; therefore, the learned subevent knowledge can further be used to identify other semantic relations between events, such as temporal and causal relations. For evaluation, we use the same RED dataset plus two more datasets, TimeBank v1.2 (Pustejovsky et al., 2003) and the Event Storyline Corpus (ESC) v1.5 (Caselli and Inel, 2018), dedicated to evaluating temporal relation and causal relation identification systems, respectively. We use the same experimental settings, including 5-fold cross-validation and evaluating predictions of intra- and cross-sentence cases separately. In addition, we repurpose the BERT model to predict temporal relations vs. others or causal relations vs. others, and we use the same two methods to incorporate the learned subevent knowledge. Tables 4 and 5 show the results of temporal and causal relation identification. Subevent knowledge has little impact on identifying intra-sentence temporal and causal relations, which may rely heavily on local contextual patterns within a sentence. However, for the more challenging cross-sentence cases, which usually offer few contextual clues, the learned subevent knowledge noticeably improves system performance on both datasets, for both temporal relations and causal relations. Overall, the systems achieve the best performance when using the event embedding approach to incorporate subevent knowledge.

Implicit Discourse Relation Classification
We expect subevent knowledge to be useful for classifying discourse relations between two text units in general, because subevent descriptions often elaborate and provide a continued discussion of a parent event introduced earlier in the text. For experiments, we used our recent discourse parsing system (Dai and Huang, 2019), which easily incorporates external event knowledge as a regularizer into a two-level hierarchical BiLSTM model (Base Model) for paragraph-level discourse parsing. The experimental setting is exactly the same as in (Dai and Huang, 2019). Table 3 reports the performance of implicit discourse relation classification on PDTB 2.0 (Prasad et al., 2008). Incorporating the acquired subevent pairs (239K) into the Base Model improves the overall macro-average F1-score and accuracy by 2.0 and 2.6 points respectively, which is non-trivial considering the challenges of implicit discourse relation identification. The performance improvements are noticeable on both the expansion relation and the temporal relation categories.

(For the ESC dataset in the previous section, we exhaustively create event mention pairs among all the annotated event mentions and consider all the mention pairs that were not annotated with the causal relation as others; in this way, we generated 4.1K intra-sentence and 34K cross-sentence event mention pairs as others.)

Conclusions
We have presented a novel weakly supervised learning framework for acquiring subevent knowledge and built the first large-scale subevent knowledge base, containing 239K subevent tuples. Evaluation showed that the acquired subevent pairs are of high quality (90.1% accuracy) and cover a wide range of event types. We performed extensive evaluations showing that the harvested subevent knowledge not only improves subevent relation extraction, but also improves a wide range of NLP tasks such as causal and temporal relation extraction and discourse parsing. In the future, we would like to explore uses of the subevent knowledge base for other event-oriented applications such as event tracking.

Appendix A: Event Representations
We consider both verb event phrases and noun event phrases.

Verb Event Phrases: To ensure good coverage of regular event pairs, we consider all verbs (detected via POS tags) as event words, except possession verbs such as "own", "have", and "contain", which mainly express ownership status and were therefore discarded. The thematic patient of a verb refers to the object being acted upon and is essentially part of an event; therefore, we first consider the patient of a verb in forming an event phrase (in particular, we require a light verb, e.g., do, make, take, to have a direct object, because light verbs have little semantic content of their own). The agent is also useful to specify an event, especially for an intransitive verb, which has no patient; therefore, we include the agent of a verb event in an event phrase if no patient was found. The patient or agent of a verb is identified using Stanford dependency relations (Manning et al., 2014): we identify the patient as the direct object of an active verb or the subject of a passive verb, and the agent as the subject of an active verb or the object of the preposition by modifying a passive verb. If neither a patient nor an agent was found, we include a preposition phrase (a preposition and its object) that modifies the verb to form an event phrase. Example verb event phrases are "agreement be signed" and "occupy territory".

Noun Event Phrases: We include a preposition phrase (a preposition and its object) that modifies a noun event to form a noun event phrase. We first consider a preposition phrase headed by the preposition of, then one headed by the preposition by, and lastly one headed by any other preposition. Example noun event phrases are "ceremony at location" and "attack on troops".

Note that many nouns do not refer to an event, so we apply two strategies to quickly compile a list of noun event words. First, we obtain a list of 5,028 deverbal nouns by querying each noun in WordNet (Miller, 1995) and checking whether its root word form has a verb sense (derivative nouns ending with the suffixes -er and -or are discarded). Second, we identify five intuitive textual patterns and extract their prepositional objects as potential noun events. The five patterns are: participate in EVENT, involve in EVENT, engage in EVENT, play role in EVENT, and series of EVENT. We rank extractions first by the number of times they occur with these patterns and then by the number of unique patterns they occur with. We then went through the top 5,000 nouns and manually removed non-event words, resulting in 3,154 noun event words.

Event Phrase Generalization: Including arguments in event representations can produce event phrases that are too specific. To obtain generalized event phrase forms, we replace named entity arguments with their entity types (Manning et al., 2014) and replace personal pronouns with the entity type PERSON.
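The argument-attachment priorities described in this appendix (patient, then agent, then preposition phrase for verbs; of, then by, then any other preposition for nouns) can be sketched as follows. The surface realization is simplified for illustration and does not reproduce the paper's exact phrase forms.

```python
def form_verb_event_phrase(verb, patient=None, agent=None, prep_phrase=None):
    """Attach one argument to a verb event by priority:
    patient first, then agent (e.g., for intransitives),
    then a modifying preposition phrase, else the bare verb."""
    if patient:
        return f"{verb} {patient}"
    if agent:
        return f"{agent} {verb}"
    if prep_phrase:
        prep, obj = prep_phrase
        return f"{verb} {prep} {obj}"
    return verb

def form_noun_event_phrase(noun, prep_phrases):
    """Pick one preposition phrase to attach to a noun event,
    preferring 'of', then 'by', then any other preposition."""
    for target in ("of", "by"):
        for prep, obj in prep_phrases:
            if prep == target:
                return f"{noun} {prep} {obj}"
    if prep_phrases:
        prep, obj = prep_phrases[0]
        return f"{noun} {prep} {obj}"
    return noun

vp = form_verb_event_phrase("occupy", patient="territory")
vp_agent = form_verb_event_phrase("march", agent="protesters")
np1 = form_noun_event_phrase("attack", [("on", "troops")])
np2 = form_noun_event_phrase("ceremony", [("at", "location"), ("of", "opening")])
```

In the last example, the of-phrase wins over the at-phrase even though it appears second, reflecting the stated preference order.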