Joint Event Extraction with Hierarchical Policy Network

Most existing work on event extraction (EE) either follows a pipelined manner or uses a joint structure that is pipelined in essence. As a result, these efforts fail to utilize information interactions among event triggers, event arguments, and argument roles, which causes information redundancy. In view of this, we propose to exploit the role information of the arguments in an event and devise a Hierarchical Policy Network (HPNet) to perform joint EE. The whole EE process is fulfilled through a two-level hierarchical structure consisting of two policy networks, one for event detection and one for argument detection. This structure realizes deep information interactions among the subtasks and handles sentences with multiple events more naturally. Extensive experiments on ACE2005 and TAC2015 demonstrate the superiority of HPNet, which achieves state-of-the-art performance and is particularly effective on sentences with multiple events.


Introduction
Event extraction (EE) plays an important role in various real-life applications, such as information retrieval and news summarization (Glavas and Snajder, 2014; Daniel et al., 2003). It aims to discover events with triggers and their corresponding arguments. Typically, EE contains several subtasks: trigger identification, trigger classification, event argument identification, and argument role classification.
Some researchers handle these subtasks in a pipelined manner, i.e., they perform trigger prediction and then identify arguments in separate stages, assuming that gold-standard entities are provided (McClosky et al., 2011; Nguyen and Grishman, 2018; Yang et al., 2019). However, this staged manner has no strategy to take advantage of the deeper information interactions among these subtasks, so the upstream and downstream subtasks cannot interact with each other to improve their decisions. Although there have been approaches which aim to build a joint extractor (Yang and Mitchell, 2016; Nguyen and Nguyen, 2019), they still follow a pipelined framework by first jointly predicting entities and triggers, and then scanning every entity-event pair for arguments and argument roles. One limitation they share is that they produce redundant entity-event pair information, which introduces possible errors. Another is the mismatch between arguments and triggers when the sentence contains multiple events. For example, consider the following sentence: "In Baghdad, a cameraman died when an American tank fired on the Palestine hotel." In this sentence, "cameraman" is not only a Victim argument of event Die (trigger "died"), but also a Target argument of event Attack (trigger "fired"). However, as "cameraman" is far away from the trigger "fired" in the sentence, the extractor may fail to identify "cameraman" as an argument of event Attack.
In view of this, we propose a Hierarchical Policy Network (HPNet) to jointly solve the subtasks of EE, where 1) trigger identification and trigger classification are solved together, and 2) event argument identification and argument role classification are solved together by a hierarchy of an event-level policy network (PN) and an argument-level PN. This two-level hierarchical structure works as follows. During the scanning from the beginning to the end of a sentence, the event-level PN first detects the trigger and classifies its event type at each word. Once a certain event is detected, an argument-level PN is triggered to detect participating arguments and classify the roles they play in the current event. When the argument-level process under this event is finished, the event-level PN continues its scan to search for other events in the sentence until it scans to the end. The learning of the EE subtasks is formulated as a sequential decision problem, which can be addressed by policy gradient method (Sutton et al., 1999).
HPNet realizes deep information interactions among the subtasks by passing state representations and rewards in the two-level hierarchical structure: the event-level PN passes fine-grained semantic information to the argument-level PN through state representations when triggering it, and the argument-level PN passes rewards back to the event-level PN to convey how well it is completed. Besides, the event scheme information from the event-level decision also benefits the argument-level decision. As HPNet uses a hierarchical structure to detect the event first and then identify its participating arguments, redundant information and the mismatch issues can be potentially avoided.
To sum up, our main contribution is at least three-fold: (1) To our knowledge, this is the first work applying a policy network, a deep reinforcement learning method, to event extraction. (2) We utilize a two-level hierarchical structure to realize joint EE. With well-designed state representations, rewards, and an event scheme retrieval table, this structure fully explores the deep information interactions among EE subtasks and addresses the multiple events and mismatch issues. (3) We design comprehensive experiments on the widely used ACE2005 and TAC2015 datasets. The experimental results show that our method HPNet significantly outperforms other state-of-the-art methods, especially in dealing with multiple events.
This paper proceeds as follows. First, we discuss the work that is related to pipelined EE, joint EE, and policy networks (§2). Next, we present HPNet, a two-level policy network-based structure that contains a hierarchy of an event-level PN and an argument-level PN, followed by their hierarchical training details (§3). Then we describe the experiments on two benchmark datasets, ACE2005 and TAC2015, followed by the results and discussions (§4). We finish with a conclusion of the paper (§5).

Related Work
Pipelined Event Extraction The early work on EE has focused on the pipelined approach, which performs the subtasks of EE in separate stages (McClosky et al., 2011;Nguyen and Grishman, 2018). They either rely on manually designed features or use convolutional neural networks (CNNs) to construct completely separate classifiers for trigger labeling and argument role classification, regarding entities as being provided by external annotators.
Recently, some methods extract entities and events together to utilize the dependency between these two subtasks. This line of work exploits different structured prediction methods, including Markov Logic Networks (Poon and Vanderwende, 2010), dual decomposition (Riedel and McCallum, 2011), structured perceptron (Li et al., 2013) and attention-based graph CNN (Liu et al., 2018). Some other methods develop inference models for events and argument roles, including structured predictions with Markov Logic (Riedel et al., 2009) and parameter sharing (Sha et al., 2018). However, the above studies are limited to modeling only two subtasks, either assuming gold annotations for the other subtasks or simply ignoring them.
Joint Event Extraction There have been efforts towards joint modeling of event triggers, event types, arguments, and argument roles. Yang and Mitchell (2016) consider information interactions among these subtasks through a two-stage framework, in which the k-best outputs of triggers and entities are first selected, and re-ranking is then used for joint inference. Nguyen and Nguyen (2019) propose to share common encoding layers to enable information sharing, and decode the event triggers, arguments, and roles separately. Another approach uses transition systems for the dependency parsing of the input sentence and decodes the labels for the subtasks with score ranking. However, as mentioned in Section 1, these methods follow a pipelined decoding order, which exposes them to the information redundancy and mismatch problems. Moreover, they all rely strongly on the annotations of the training data.
Policy Network The policy network is one of the most important deep reinforcement learning methods and has appeared frequently in recent information extraction work. Zhang et al. (2018) employ a PN for structure discovery in sentence representations prepared for downstream text classification. Qin et al. (2018) propose a policy network-based distant supervision relation extraction method in which they use a PN to define a policy for the redistribution of false positives. Another method uses a PN to construct an instance selector that obtains a weak supervision signal for relation classification from noisy data. Takanobu et al. (2019) propose to apply hierarchical reinforcement learning to the relation extraction task, which gives highly competitive results. To the best of our knowledge, we are the first to incorporate a hierarchical policy network for the event extraction task.
Figure 1: An example of the EE process by HPNet on a sentence containing two events (Die and Attack). The straight arrow indicates the event-level and the argument-level process. The curved arrow marks a transition between the event-level and the argument-level process. $O_3$ indicates the event-level option (Die) for word "died" at time step 3, while $A_5$ is the argument-level action (B-Victim) for word "cameraman" at time step 5 under the event-level option $O_3$.

Overview
We introduce a reward-driven, policy-based method HPNet to hierarchically detect events and their participating arguments. As shown in Figure 1, given a sentence, HPNet realizes the entire extraction process as follows. As the agent scans this sentence sequentially, the event-level PN keeps following a policy to sample an option (indicating a non-trigger word or a trigger with a specific event type) at each time step. Once a certain event is detected, the agent transfers to the argument-level PN and follows a policy to select an action (assigning an exact argument tag). When the agent finishes its scanning for participating arguments under the current event, it scans the rest of the sentence for other events. Since a reward can be computed once the option/action is sampled, the process can be naturally addressed by policy gradient method (Sutton et al., 1999). By detecting an event first and then detecting participating arguments for that event, our method can achieve effective EE even when multiple events exist.
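The control flow of this scan can be sketched in a few lines of Python. This is only an illustration of the two-level loop, not the HPNet implementation: the function names `extract`, `toy_event_policy` and `toy_argument_policy` are hypothetical, and the learned policies are replaced by simple lookup rules built from the example sentence in Section 1.

```python
from typing import Callable, Dict, List

NE = "NE"  # event-level option for a non-trigger word


def extract(words: List[str],
            event_policy: Callable[[List[str], int], str],
            argument_policy: Callable[[List[str], int, str], List[str]]
            ) -> Dict[int, dict]:
    """Scan a sentence once; launch an argument-level pass per detected event."""
    events = {}
    for t in range(len(words)):
        option = event_policy(words, t)      # event-level decision at step t
        if option == NE:
            continue                         # stay at the event level
        # An event was detected: tag every word with respect to this event,
        # then resume the event-level scan at the next word.
        tags = argument_policy(words, t, option)
        events[t] = {"trigger": words[t], "type": option, "tags": tags}
    return events


# Toy stand-ins for the learned policies (hypothetical lookup rules).
def toy_event_policy(words, t):
    return {"died": "Die", "fired": "Attack"}.get(words[t], NE)


def toy_argument_policy(words, t, event_type):
    roles = {"Die": {"cameraman": "B-Victim"},
             "Attack": {"cameraman": "B-Target"}}
    return [roles[event_type].get(w, "NR") for w in words]


sentence = "a cameraman died when an American tank fired".split()
result = extract(sentence, toy_event_policy, toy_argument_policy)
```

In this toy run the same word "cameraman" receives a different tag in each argument-level pass, which is the behavior the hierarchy is meant to allow when a sentence contains several events.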

Event-level Policy Network
Given the input sentence $S = w_1, w_2, ..., w_L$, the event-level PN aims to detect the event type that the trigger word $w_i$ triggers. Specifically, at the current word/step $t$, the policy network adopts a stochastic policy $\mu$ over options and uses rewards to guide the policy learning. It samples an option according to a probability distribution conditioned on the current state, whose representation is history-dependent. We briefly introduce option, state, policy, and reward as follows.
Option The event-level option $o^e_t$ is sampled from an option set $O^e = \{NE\} \cup E$, where $NE$ indicates a non-trigger word and $E$ is the event type set predefined in the dataset, indicating which event type the current trigger evokes.
State The state $s^e_t \in S^e$ of the event-level process is history-dependent, encoding the previous environment as well as the current input. $s^e_t$ is the concatenation of 1) the state $s_{t-1}$ from the last time step (where $s_{t-1} = s^e_{t-1}$ if the agent launches an event-level PN at time step $t-1$, otherwise $s_{t-1} = s^r_{t-1}$), 2) the event type vector $v^e_t$, which is learned from the latest option $o^e_t$ that satisfies $o^e_t \in E$, and 3) the hidden state vector $h_t$ over the current input word embedding $w_t$, represented by $s^e_t = s_{t-1} \oplus v^e_t \oplus h_t$, where $\oplus$ denotes concatenation. We obtain the hidden state vector $h_t$ through a sequential word-level Bi-LSTM (Eq. 2). Finally, we use an MLP to represent the state as a continuous real-valued vector $F^e(s^e_t)$.
Policy The stochastic policy for event detection over the possible options, $\mu : S^e \rightarrow O^e$, samples an option $o^e_t \in O^e$ according to a probability distribution with parameters $W_e$ and $b_e$, computed from the state feature vector $F^e(s^e_t)$. During training, the option is sampled according to the probability in Eq. 3. During testing, the option with the maximal probability is chosen (i.e., $o^{e*}_t = \arg\max_{o^e} \mu(o^e | s^e_t)$).
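The difference between training-time sampling and test-time selection can be illustrated with a small sketch. It assumes that the distribution is a softmax over linear scores of the state feature vector, which is a common choice but is an assumption here; the scores, the option list and the helper name `choose_option` are hypothetical.

```python
import math
import random
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)                        # subtract the max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def choose_option(scores, options, training, rng=random):
    probs = softmax(scores)
    if training:
        # Training: sample an option from the distribution.
        return rng.choices(options, weights=probs, k=1)[0]
    # Test: take the option with the maximal probability.
    best = max(range(len(options)), key=probs.__getitem__)
    return options[best]


options = ["NE", "Die", "Attack"]
scores = [0.2, 2.5, 0.1]   # hypothetical values of W_e F^e(s) + b_e
probs = softmax(scores)
greedy = choose_option(scores, options, training=False)
sampled = choose_option(scores, options, training=True, rng=random.Random(0))
```

Sampling during training lets the agent explore options other than the currently most probable one, while the greedy choice at test time makes the output deterministic.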
Reward An intuition behind the rewards is that the event-level PN ultimately aims to recognize and categorize events, viewing triggers as intermediate results. Once an event-level option $o^e_t$ is sampled, the agent receives an instant reward which estimates the short-term return under option $o^e_t$. The instant reward is computed by consulting the gold-standard event type annotations $S^e_t$ of sentence $S$. Its definition uses the sign function sgn(·) and a switching function $I(NE)$, which distinguishes the reward of trigger and non-trigger words, with a bias weight $\alpha$. The smaller $\alpha$ ($\alpha < 1$) is, the smaller the reward on non-trigger words, which prevents the model from learning a trivial policy that predicts all words as $NE$ (non-trigger words).
The transition of the event-level PN depends on the option $o^e_t$. If $o^e_t = NE$ at a time step, the agent continues at a new event-level PN state. Otherwise, a specific event is detected, and the agent launches a new subtask and transfers to the argument-level PN to detect the participating arguments of this event. Hence, the agent starts the argument-level actions with an event-initialized state. It does not transfer back to the event-level PN until all the argument-level actions under the current event $o^e_t$ are finished. The event-level options are sampled until the option of the last word in $S$. When the agent finishes all the event-level options, it receives a terminal reward $r^e_{ter}$. This delayed reward at the terminal state is defined by the sentence-level event detection performance, measured by the $F_1$ score $F_1(\cdot)$, i.e., the harmonic mean of the sentence-level precision and recall of the event detection results.
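The two kinds of event-level reward can be made concrete with a short sketch. The exact form of the instant reward is an assumption here: it is taken to be the switching weight (alpha for the NE option, 1 otherwise) multiplied by +1 for a correct option and -1 for an incorrect one. The function names are hypothetical, and the terminal reward is computed as a plain F1 over (position, event type) pairs.

```python
NE = "NE"


def instant_reward(option: str, gold: str, alpha: float = 0.05) -> float:
    # Assumed form: I(NE) * (+1 if the option matches the gold label, else -1).
    scale = alpha if option == NE else 1.0
    sign = 1.0 if option == gold else -1.0
    return scale * sign


def terminal_reward(options, gold) -> float:
    # F1 over predicted vs. gold (position, event type) pairs, ignoring NE.
    pred = {(i, o) for i, o in enumerate(options) if o != NE}
    ref = {(i, g) for i, g in enumerate(gold) if g != NE}
    correct = len(pred & ref)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(ref)
    return 2 * precision * recall / (precision + recall)


gold = [NE, NE, "Die", NE, NE, NE, NE, "Attack"]
predicted = [NE, NE, "Die", NE, NE, NE, NE, NE]   # misses the Attack event
step_rewards = [instant_reward(o, g) for o, g in zip(predicted, gold)]
final = terminal_reward(predicted, gold)
```

With a small alpha, labeling every word as NE collects only tiny positive rewards on true non-triggers, so the trivial all-NE policy is not attractive to the agent.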

Argument-level Policy Network
Once a specific event is detected at a time step $t$ (i.e., $o^e_t \in E$), the agent transfers to the argument-level PN to predict the role that each argument plays under the event $o^e_t$. Specifically, at each word/step $t$, the argument-level PN adopts a stochastic policy $\pi$ over actions and uses rewards to guide the learning for participating arguments as well as their roles under the current event. To pass more fine-grained event information for the argument decision, the option $o^e_t$ as well as the state representation $s^e_t$ from the event-level process are taken as additional inputs throughout the whole argument-level process. We introduce action, state, policy and reward as follows.
Figure 2: Examples of argument tags at different moments for the sentence in Section 1. "cameraman" and "Palestine hotel" are both participating arguments in events Die and Attack, while "cameraman" has different roles under distinct events. Also, "American tank" is a participating argument of event Attack but is irrelevant to event Die. As a result, both "cameraman" and "American tank" have different tags at different moments.
Action The argument-level action $a^r_t$ assigns an argument tag to the current word. $a^r_t$ is selected from an action space $A^r = (\{B, I\} \times R) \cup \{O, NR\}$, where B/I represents the position information (Begin, Inside) of a word in an argument, $O$ tags arguments unrelated to the current event, $R$ is a predefined argument role set in the dataset, and $NR$ tags non-argument words. Note that the same argument may be assigned different tags depending on the event type concerned at the moment. In this way, the multiple events and mismatch issues can be naturally solved. Figure 2 illustrates this.
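The size and content of this action space follow directly from its definition, as the following sketch shows. The three role names are a small illustrative subset rather than the full role set of either dataset, and `build_action_space` is a hypothetical helper.

```python
from itertools import product
from typing import List


def build_action_space(roles: List[str]) -> List[str]:
    # ({B, I} x R): one Begin tag and one Inside tag per role.
    tagged = [f"{pos}-{role}" for pos, role in product("BI", roles)]
    # O: argument unrelated to the current event; NR: non-argument word.
    return tagged + ["O", "NR"]


actions = build_action_space(["Victim", "Target", "Place"])
```

For a role set of size |R| the space therefore contains 2|R| + 2 tags, which is eight for the three roles used here.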
Then we take advantage of the event scheme information, which specifies the possible argument roles of each event type, to filter the action space. The event scheme describes the specific argument roles for each event type, as shown in Table 1. Ding et al. (2018) use an event scheme to build a mask matrix to filter out irrelevant argument roles. Inspired by their work, we devise a predefined event scheme retrieval table of size $|E| \times |R|$ to narrow down the argument role set $R$. Finally, the action space is filtered so that only the roles permitted for the detected event type remain. A motivation for this event scheme retrieval table is that the implicit argument information from the event-level decision can be fully utilized. Besides, the retrieval table narrows down the action space, enabling a more efficient agent.
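A retrieval table of this kind can be sketched as a binary matrix with one row per event type and one column per role. The entries below are illustrative values chosen for the running example and are not the actual scheme of either dataset; `filtered_actions` is a hypothetical helper.

```python
from typing import List

EVENTS = ["Die", "Attack"]
ROLES = ["Victim", "Target", "Attacker"]
# TABLE[i][j] == 1 means role j is allowed for event type i (illustrative).
TABLE = [
    [1, 0, 0],   # Die
    [0, 1, 1],   # Attack
]


def filtered_actions(event_type: str) -> List[str]:
    row = TABLE[EVENTS.index(event_type)]
    allowed = [role for role, keep in zip(ROLES, row) if keep]
    # Keep B/I tags only for permitted roles; O and NR are always available.
    return [f"{pos}-{role}" for pos in "BI" for role in allowed] + ["O", "NR"]


die_actions = filtered_actions("Die")
attack_actions = filtered_actions("Attack")
```

Under these toy values the agent chooses among four tags for a Die event and six for an Attack event, instead of the eight tags of the unfiltered space.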
State The state $s^r_t \in S^r$ of the argument-level process is also history-dependent, encoding the previous environment, the initial event environment and the current input. $s^r_t$ is represented by the concatenation of 1) the state $s_{t-1}$ from the previous time step (where $s_{t-1}$ may be a state either from the event-level PN or from the argument-level PN), 2) the argument tag vector $v^r_t$, which is a learnable embedding of the action $a^r_{t-1}$, 3) the event state representation $G^e(s^e_t)$, and 4) the hidden vector $h_t$ obtained from the same kind of Bi-LSTM as in Eq. 2, i.e., $s^r_t = s_{t-1} \oplus v^r_t \oplus G^e(s^e_t) \oplus h_t$, where $\oplus$ denotes concatenation. The state is then represented as a real-valued vector $F^r(s^r_t)$ with an MLP.
Policy With the event type $o^e_t$ as an additional input, the stochastic policy for argument detection over actions, $\pi : S^r \rightarrow A^r$, selects an action $a^r_t$ according to a probability distribution in which $W_r$ and $b_r$ are the parameters, $F^r(s^r_t)$ is the argument-level state feature vector, $\mathbf{o}^e_t$ is the representation of the event $o^e_t$, and $W_\mu$ is an array of $|E|$ matrices. During training and testing, the actions are chosen in the same way as the options of the event-level PN.

Reward
Once an argument-level action $a^r_t$ is selected, the agent receives an instant reward $r^r_t$, provided by consulting the gold argument annotation $S^r_t(o^e_t)$ conditioned on the predicted event type $o^e_t$. Its definition uses a switching function $I(NR)$, which distinguishes the reward of argument and non-argument words, with a bias weight $\beta$. A smaller $\beta$ ($\beta < 1$) means a smaller reward on non-argument words, preventing the agent from learning a trivial policy that selects $NR$ for every action.
The actions are selected until the action of the last word. When the agent finishes all the argument-level actions under the current event $o^e_t$, it receives a terminal reward $r^r_{ter}$.

Hierarchical Training
To optimize the event-level PN and the argument-level PN, we aim to maximize the expected cumulative discounted rewards from the options and actions that the agent samples following the event-level policy $\mu(\cdot)$ and the argument-level policy $\pi(\cdot | o^e_t)$ at each time step $t$. In these objectives, $\gamma \in [0, 1]$ is the discount rate, $T_e$ is the total number of time steps that the event-level process takes before it terminates, and $T_r$ is the final time step of the argument-level process. We then decompose the cumulative rewards over the hierarchy, where $N$ is the number of time steps that the argument-level process continues under option $o^e_t$, so the next option of the agent is $o^e_{t+N}$. If $o^e_t = NE$, then $N = 1$. The optimization uses the policy gradient method (Sutton et al., 1999) and the REINFORCE algorithm (Williams, 1992), which update the parameters with stochastic gradients. We describe the entire training process of HPNet in Appendix A.
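The quantity that REINFORCE weights each sampled decision by is the discounted return from that step onward. The sketch below computes it for a single flat reward sequence using the recursion G_t = r_t + gamma * G_{t+1}; it does not model the hierarchical decomposition over N argument-level steps, and the reward values are hypothetical.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.9) -> List[float]:
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one computed after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


returns = discounted_returns([1.0, 0.0, 2.0], gamma=0.9)
```

In a REINFORCE-style update, the gradient of the log-probability of the decision taken at step t would be scaled by the corresponding entry of `returns`.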

Experimental Setup
Datasets and Evaluations We perform evaluation on two standard datasets: the Automatic Content Extraction program of 2005 (ACE2005) and the Event Nugget data of TAC 2015 (TAC2015) (Mitamura et al., 2015). They contain text documents from a variety of sources, such as newswire reports and discussion forums. While ACE2005 provides annotations for all subtasks involved (event triggers and event arguments), TAC2015 provides annotations only for event triggers. For ACE2005, we use the same setup as Liu et al. (2018). For TAC2015, we use the official training and test sets, and manually reserve some documents in the training set for development. Descriptive statistics of the two datasets are provided in Table 2. We utilize the Stanford CoreNLP toolkit to preprocess the sentences (i.e., POS tagging, chunking, and dependency parsing). We follow standard evaluation criteria for events and arguments (Ji and Grishman, 2008). A trigger is correct if its span and event type match a reference trigger. An argument is correct if its span, event type and role type match a reference argument. Our evaluations report the identification and classification of event triggers and arguments. An event trigger or argument is correctly identified if its offsets match a reference trigger or argument, and correctly classified if there is an additional match in event type or role. We report micro-averaged Precision (P), Recall (R) and F1 score (F1) in all evaluations.
Table 3: Main results on ACE2005 and TAC2015. The comparison between HPNet and JointTransition is significant with p < 0.05. A marker on a result indicates that it is adapted from the original paper. For TAC2015, "Event Trigger Identification" corresponds to the "Span" metric and "Event Trigger Classification" corresponds to the "Type" metric reported in the official evaluation (cf. Section 4.2 for detailed analysis).
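The distinction between identification and classification under these criteria can be illustrated with a minimal scoring sketch. The tuple layout (sentence id, start offset, end offset, label) and the function name `micro_prf` are assumptions made for this example; this is not the official scorer of either dataset.

```python
from typing import Iterable, Tuple


def micro_prf(predicted: Iterable[tuple], reference: Iterable[tuple]
              ) -> Tuple[float, float, float]:
    # Micro-averaging: pool all items from all sentences before counting.
    pred, ref = set(predicted), set(reference)
    correct = len(pred & ref)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(ref) if ref else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Items are (sentence id, start offset, end offset, event type).
pred = [(0, 2, 3, "Die"), (0, 7, 8, "Transport")]
gold = [(0, 2, 3, "Die"), (0, 7, 8, "Attack")]

classification = micro_prf(pred, gold)
# Identification only compares offsets, so the label is dropped.
identification = micro_prf([x[:3] for x in pred], [x[:3] for x in gold])
```

Here both triggers are identified correctly, but only one is classified correctly, so the identification scores are higher than the classification scores.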
Parameters and Training Details All hyper-parameters are tuned on the development set. The word embeddings are pre-trained with the GloVe algorithm (Pennington et al., 2014). The vectors of event types and argument tags are initialized randomly. The bias weight is $\alpha = 0.05$ in Eq. 4 and $\beta = 0.1$ in Eq. 10, and the discount rate is $\gamma = 0.9$.
Baselines We compare our method with several state-of-the-art methods, which can be divided into two categories: pipelined methods and joint learning methods. For the pipelined methods, the entity candidates are obtained through a CRF entity extractor (Yang and Mitchell, 2016). These methods include: (1) StagedMaxent is a typical feature-based two-stage approach that first detects event triggers and then event arguments (Yang and Mitchell, 2016). (2) TwoStageBeam is a pipelined model with structured perceptron and global features (Li et al., 2013). (3) DMCNN is the most successful pipelined model for EE, which uses dynamic multi-pooling convolutional neural networks. The joint methods include: (4) dbRNN adds dependency bridges over Bi-LSTM for joint EE. It uses the Stanford constituency parser to parse every sentence and takes predicted NP nodes as candidate arguments (Sha et al., 2018). (5) JMEE-NP is our reimplementation of the JMEE method (Liu et al., 2018), which also takes NP nodes from the Stanford constituency parser as candidate arguments. (6) JointTransition performs joint decoding for the subtasks using a neural transition-based method. This method currently has the state-of-the-art performance on the ACE2005 dataset. We also include: (7) TAC2015Best corresponds to the results reported by Hong et al. (2015), which include semi-supervised learning and achieved the best performance in the TAC2015 evaluation.

Main Results
Table 3 shows the results on ACE2005 and TAC2015. From the table, we observe that: (1) HPNet consistently and significantly outperforms all the baselines. Compared with them, HPNet gains at least 4.0% absolute F1 score improvement in triggers and 3.8% in arguments on ACE2005, and at least 2.1% absolute F1 score improvement in triggers on TAC2015.
(2) Regarding argument detection, the joint system with predicted entity mentions (i.e., dbRNN with 57.2% and 50.1% for arguments and argument roles respectively) performs worse than that with perfect entity mentions (i.e., 67.7% and 58.7% for arguments and argument roles respectively in Sha et al. (2018)). This is consistent with the significant performance drop of JMEE-NP compared to JMEE (Liu et al., 2018) with gold entity mentions. Such evidence reveals that these methods rely heavily on entity annotations from external tools. In contrast, our HPNet is significantly better than both methods with respect to all the F1 scores in Table 3, with no need for gold entity tagging.
(3) In general, the joint systems outperform the pipelined systems on both datasets for all the subtasks, which supports the advantage of joint modeling to some extent. (4) Compared with the current state-of-the-art method JointTransition, our HPNet gives a lower Recall in argument detection. The reason may be that the event type information from the event-level process limits the argument decisions to some extent, especially with the event scheme retrieval table. Nevertheless, our HPNet balances Precision and Recall, and achieves the highest F1 score with the hierarchical structure and policy network.

Effect of Hierarchical Structure
In this section, we examine the effectiveness of our two-level hierarchical structure for building deep interactions between events and arguments. We design a variant of HPNet, HPNet-Argument, which removes the argument-level PN and only keeps the event-level PN. We investigate event detection performance and report the results of event trigger classification from the event-level PN. Moreover, to show that HPNet can alleviate the multiple events issue through its hierarchical structure, we divide the test data (all) of ACE2005 into two parts (1/1 and 1/N) following previous work, where 1/1 means that only one event or one argument plays a role in one sentence, and 1/N contains all other cases. We perform evaluations on the two test sets separately. Statistically, 27.3% of the sentences have multiple events, and 23.2% of the arguments participate in multiple events within one sentence in the test set. Table 4 shows the performance (F1 score) of HPNet, the variant, DMCNN and two baseline systems, Embedding+T and CNN. Embedding+T uses word embeddings and the traditional sentence-level features (Li et al., 2013), while CNN is similar to DMCNN, except that it applies the standard pooling mechanism instead of the dynamic multi-pooling method. For comparison, we also include JMEE with gold entity mentions, which is the state-of-the-art system for dealing with the multiple events issue. From Table 4, we can see that our method achieves the best performance on both the 1/N subset and all. There is a significant performance drop (2.1%) of HPNet compared with HPNet-Argument on 1/1. The reason may be that, in the 1/1 test set, the interactions between the two policy networks have almost no influence on event detection, as 1/1 only has sentences containing one event. However, compared with the variant HPNet-Argument, our HPNet yields at least 29.8% improvement on 1/N, indicating that our two-level structure captures the interactions among the subtasks and that the event-level PN also benefits from the argument-level process.
Moreover, our HPNet yields a substantial improvement over all baselines and the variant on 1/N dataset, implying that our structure can better capture the event type information and is more powerful in extracting multiple events. Therefore, our two-level hierarchical structure indeed enhances the information interactions among the subtasks and alleviates the multiple events issue.

Effect of Policy Network
In this section, we examine the effectiveness of the policy network for improving the generalization ability of HPNet. We divide the triggers and arguments in the test data (all) of ACE2005 into two parts (A and B) following previous work, where situation A represents the triggers or arguments appearing in both the training set and the test set, and situation B denotes all other cases. Statistically, 34.9% of the triggers and 83.1% of the arguments in the test set never appear with the same event type in the training set. We perform evaluations and report the results in event detection and argument detection. Table 5 gives the results (F1 score) of HPNet and two baselines, Traditional and Reduced DMCNN. Traditional is described in previous work (Li et al., 2013) as the traditional method, and Reduced DMCNN is a simplified DMCNN which only uses word embeddings as lexical features. In Table 5, our HPNet makes significant improvements over the above methods in the classification of both events and arguments under all situations. Noticeably, there is a significant performance gap between situation A and situation B. All the methods suffer from data sparsity and cannot adequately handle a situation where a trigger or an argument does not appear in the training data. However, significant improvements (at least 36.8% for events and 19.7% for arguments) of HPNet can be observed in situation B. This occurs because the policy network adaptively communicates with the environments (including new environments) to obtain knowledge and uses rewards to guide the learning for the subtasks. As a result, it is more powerful in dealing with unseen data, which improves the generalization ability of HPNet.

Error Analysis
For event detection, we notice from Table 3 that the event trigger classification performance is quite close to that of trigger identification, suggesting that the main errors lie in event trigger identification. We therefore examine the predicted events to determine each event type's contribution to the trigger identification errors. There are two types of errors (MISSED and INCORRECT), where MISSED denotes that a trigger is missed during evaluation and INCORRECT indicates that a trigger is incorrectly detected. We report the top eight event types which appear in the errors and their proportion over the total number of errors in Figure 3. These top eight event types account for more than 90% of all errors.
As we can see from Figure 3, the majority of errors relate to missing triggers. The remaining INCORRECT errors involve spurious triggers (e.g., a non-trigger word is classified as a specific event type) and misclassified triggers (e.g., a trigger of event End-Position is mistaken for event Attack). Attack and Transport are the event types that account for a large percentage of both error types. A closer look at the errors reveals that the MISSED errors are mainly due to lexical sparsity. For example, consider the following sentence: "Vivendi earlier this week ... that it planned to shed its entertainment assets ..." Our model fails to recognize "shed" as a trigger of event Transfer-Ownership, as it never saw "shed" with this sense during training and rarely saw it in testing. Although using a policy network can improve the generalization ability and alleviate the data sparsity issue to some extent, some instances are still necessary to guide the policy learning. The INCORRECT errors mainly result from ambiguous context. For example, consider the following sentence: "... Davies is scheduled to leave, to become the chairman of the London." The word "leave" (event End-Position) can be easily misinterpreted as an event Movement due to its ambiguous context words "scheduled" and "London". The ambiguity of contexts requires better modeling of the input sentence. Other reasons for both types of event detection errors include complex language such as metaphors, idioms, and sarcasm, which require rich background knowledge and deeper reasoning.
For argument detection, we notice from Table 3 that the performance of event argument identification (60.9%) is better than that of argument role classification (56.8%). This suggests that a large number of correctly identified arguments cannot be classified properly. Among the misclassified arguments, most errors (87.3%) come from incorrect roles (e.g., an argument with role Target is misclassified as one with role Agent). The remaining errors (12.7%) are associated with an incorrect event type (i.e., an argument role is correctly classified but is assigned to an incorrect event type). A closer look at the errors shows that the majority of incorrect roles belong to co-occurring antonymous roles, such as the argument "Hassan" (with role Target) in the following sentence: "British officials say they believe Hassan was a blindfolded woman seen being shot in the head ..." The model wrongly detects that "Hassan" refers to the Attacker who conducted the "shot" while "woman" was the Target of the "shot". This is mainly due to the complex context between the words "Hassan" and "shot". Similar confusable role pairs include Attacker/Victim, Origin/Destination, assigner/receiver and so on.

Conclusion
This paper presents a novel model, HPNet, which approaches event extraction via a hierarchical policy network. In HPNet, the extraction process follows a two-level hierarchical structure: an event-level PN for events and an argument-level PN for participating arguments. Thanks to the hierarchical structure, HPNet is good at modeling deep information interactions among the subtasks, and is particularly effective in dealing with the multiple events issue. Comprehensive experiments demonstrate that HPNet outperforms the state-of-the-art methods. As future work, we plan to apply HPNet to cross-event or cross-document EE.