TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition

Training neural models for named entity recognition (NER) in a new domain often requires additional human annotations (e.g., tens of thousands of labeled instances) that are usually expensive and time-consuming to collect. Thus, a crucial research question is how to obtain supervision in a cost-effective way. In this paper, we introduce "entity triggers," an effective proxy of human explanations for facilitating label-efficient learning of NER models. An entity trigger is defined as a group of words in a sentence that helps to explain why humans would recognize an entity in the sentence. We crowd-sourced 14k entity triggers for two well-studied NER datasets. Our proposed model, the Trigger Matching Network, jointly learns trigger representations and a soft matching module with self-attention, enabling the model to generalize to unseen sentences easily for tagging. Our framework is significantly more cost-effective than traditional neural NER frameworks. Experiments show that using only 20% of the trigger-annotated sentences results in performance comparable to using 70% of conventionally annotated sentences.


Introduction
Named entity recognition (NER) is a fundamental information extraction task that focuses on extracting entities from a given text and classifying them using pre-defined categories (e.g., persons, locations, organizations) (Nadeau and Sekine, 2007). Recent advances in NER have primarily focused on training neural network models with an abundance of human annotations, yielding state-of-the-art results (Lample et al., 2016). However, collecting human annotations for NER is expensive and time-consuming, especially in social media messages (Lin et al., 2017a) and technical domains such as biomedical publications, financial documents, legal reports, etc. As we seek to advance NER into more domains with less human effort, how to learn neural models for NER in a cost-effective way becomes a crucial research problem.
The standard protocol for obtaining an annotated NER dataset involves an annotator selecting token spans in a sentence as mentions of entities and labeling them with an entity type. However, such an annotation process provides limited supervision per example. Consequently, one would need a large number of annotations in order to train high-performing models for a broad range of entity types, which can clearly be cost-prohibitive. The key question is then: how can we learn an effective NER model in the presence of limited quantities of labeled data?
We, as humans, recognize an entity within a sentence based on certain words or phrases that act as cues. For instance, we could infer that 'Kasdfrcxzv' is likely to be a location entity in the sentence "Tom traveled a lot last year in Kasdfrcxzv." We recognize this entity because of the cue phrase "travel ... in," which suggests there should be a location entity following the word 'in'. We call such phrases "entity triggers." Similar to the way these triggers guide our recognition process, we hypothesize that they can also help the model learn to generalize efficiently. Specifically, we define an "entity trigger" (or trigger for simplicity) as a group of words that can help explain the recognition process of a particular entity in the same sentence. For example, in Figure 1, "had ... lunch at" and "where the food" are two distinct triggers associated with the RESTAURANT entity "Rumble Fish."

Figure 1: "We had a fantastic lunch at Rumble Fish yesterday, where the food is my favorite."

An entity trigger should be a necessary and sufficient cue for humans to recognize its associated entity even if we mask the entity with a random word. Thus, unnecessary words such as "fantastic" should not be considered part of the entity trigger.
In this paper, we argue that a combination of entity triggers and standard entity annotations can enhance the generalization power of NER models. This approach is more powerful because unlabeled sentences, such as "Bill enjoyed a great dinner with Alice at Zcxlbz.", can be matched with the existing trigger "had ... lunch at" via their semantic relatedness. This makes it easier for a model to recognize "Zcxlbz" as a RESTAURANT entity. In contrast, if we only have the entity annotation itself (i.e., "Rumble Fish") as supervision, the model will require many similar examples in order to learn this simple pattern. Annotating triggers in addition to entities does not incur significant additional effort because triggers are typically short and, more importantly, the annotator has already comprehended the sentence to identify its entities, as required in traditional annotation. On the benchmark datasets we consider, the average length of a trigger in our crowd-sourced dataset is only 1.5-2 words. Thus, we hypothesize that using triggers as additional supervision is a more cost-effective way to train models.
We crowd-sourced annotations of 14,708 triggers on two well-studied NER datasets to study their usefulness for the NER task. We propose a novel framework named Trigger Matching Network (TMN) that learns trigger representations indicative of entity types during the training phase, and identifies triggers in an unlabeled sentence at inference time to guide a traditional entity tagger toward better overall NER performance. Our TMN framework consists of three components: 1) a trigger encoder to learn meaningful trigger representations for an entity type, 2) a semantic trigger matching module for identifying triggers in a new sentence, and 3) an entity tagger that leverages trigger representations for entity recognition (as present in existing NER frameworks). Different from conventional training, our learning process consists of two stages: the first stage jointly trains a trigger classifier and the semantic trigger matcher, and the second stage leverages the trigger representations and the encoding of the given sentence, via an attention mechanism, to learn a sequence tagger.
Our contributions in this paper are as follows:
• We introduce the concept of "entity triggers," a novel form of explanatory annotation for named entity recognition problems. We crowd-source and publicly release 14k annotated entity triggers on two popular datasets: CoNLL03 (generic domain) and BC5CDR (biomedical domain).
• We propose a novel learning framework, named Trigger Matching Network, which encodes entity triggers and softly grounds them on unlabeled sentences to increase the effectiveness of the base entity tagger (Section 3).
• Experimental results (Section 4) show that the proposed trigger-based framework is significantly more cost-effective: the TMN uses 20% of the trigger-annotated sentences from the original CoNLL03 dataset while achieving performance comparable to the conventional model using 70% of the annotated sentences.

Problem Formulation
We consider the problem of how to cost-effectively learn a model for NER using entity triggers. In this section, we introduce basic concepts and their notations, present the conventional data annotation process for NER, and provide a formal task definition for learning using entity triggers.
In the conventional setup for supervised learning for NER, we let x = [x^(1), x^(2), ..., x^(n)] denote a sentence in the labeled training corpus D_L. Each labeled sentence has an NER-tag sequence y = [y^(1), y^(2), ..., y^(n)], where y^(i) ∈ Y and Y can be {O, B-PER, I-PER, B-LOC, I-LOC, ...}. The possible tags come from a BIO or BIOES tagging schema for segmenting and typing entity tokens. Thus, we have D_L = {(x_i, y_i)} and an unlabeled corpus D_U = {x_i}.
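As a concrete illustration of the BIO schema above, the following minimal Python sketch (our own illustration, not part of any released code) decodes a tag sequence y into typed entity spans; the RESTAURANT type and the example sentence come from Figure 1, with 0-based indices:

```python
# Illustrative (x, y) pair in the BIO schema; types are example-specific.
x = ["We", "had", "a", "fantastic", "lunch", "at", "Rumble", "Fish", "yesterday"]
y = ["O", "O", "O", "O", "O", "O", "B-RESTAURANT", "I-RESTAURANT", "O"]

def extract_entities(tokens, tags):
    """Decode BIO tags into (entity_text, type, start_index) triples."""
    entities, current = [], None
    for i, (tok, tag) in enumerate(zip(tokens, tags)):
        if tag.startswith("B-"):          # a new entity begins here
            if current:
                entities.append(current)
            current = ([tok], tag[2:], i)
        elif tag.startswith("I-") and current and tag[2:] == current[1]:
            current[0].append(tok)        # continuation of the same entity
        else:
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return [(" ".join(toks), typ, start) for toks, typ, start in entities]
```

For the Figure 1 sentence, this yields a single RESTAURANT span, "Rumble Fish", starting at token index 6 (0-based).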
We propose to annotate entity triggers in sentences. We use T(x, y) to represent the set of annotated entity triggers, where each trigger t_i ∈ T(x, y) is associated with an entity index e and a set of word indices {w_i}. Note that we use the index of the first word of an entity as its entity index. That is, t = ({w_1, w_2, ...} → e), where e and the w_i are integers in the range [1, |x|]. For instance, in the example shown in Figure 1, the trigger "had ... lunch at" can be represented as t_1 = ({2, 5, 6} → 7), because this trigger specifies the entity starting at index 7, "Rumble", and it contains the words "had" (2), "lunch" (5), and "at" (6). Similarly, we can represent the second trigger "where the food" as t_2 = ({11, 12, 13} → 7). Thus, we have T(x, y) = {t_1, t_2} for this sentence.
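The trigger notation can be made concrete with a small Python sketch (the data structures and names here are our own illustration, not the released annotation format); indices are 1-based, as in the text:

```python
# The Figure 1 sentence, tokenized; word indices below are 1-based.
sentence = ["We", "had", "a", "fantastic", "lunch", "at",
            "Rumble", "Fish", "yesterday", ",", "where", "the",
            "food", "is", "my", "favorite", "."]

# A trigger t = ({w_1, w_2, ...} -> e): word-index set plus entity index.
t1 = (frozenset({2, 5, 6}), 7)     # "had ... lunch at"  -> "Rumble" (index 7)
t2 = (frozenset({11, 12, 13}), 7)  # "where the food"    -> "Rumble"
triggers = {t1, t2}                # T(x, y) for this sentence

def trigger_words(sent, trig):
    """Recover the surface words of a trigger from its 1-based indices."""
    word_indices, _entity_index = trig
    return [sent[i - 1] for i in sorted(word_indices)]
```

Note that a trigger's word set may be discontinuous ("had ... lunch at" skips "a" and "fantastic"), which is exactly what distinguishes NER triggers from the contiguous rules used in sentence classification.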
Adding triggers creates a new form of data, D_T = {(x_i, y_i, T(x_i, y_i))}. Our goal is to learn a model for NER from a trigger-labeled dataset D_T such that we can achieve learning performance comparable to a model trained on a much larger D_L.

Trigger Matching Networks
We now present our framework for a more cost-effective learning method for NER using triggers. We assume that we have collected entity triggers (the trigger collection process is discussed in Section 4.1). At a high level, we aim to learn trigger representations for entity types that allow the entity tagger to generalize to sentences beyond the training phase. Our intuition is that triggers acting as cues for the same named-entity type should have similar trigger representations, so triggers can be identified in an unlabeled sentence at inference time by soft-matching between the sentence representation and the trigger representations seen during training. We perform such soft-matching using a self-attention mechanism.
We propose a straightforward yet effective framework, named Trigger Matching Networks (TMN), consisting of a trigger encoder (TrigEncoder), a semantic-based trigger matching module (TrigMatcher), and a base sequence tagger (SeqTagger). The framework has two learning stages: the first stage (Section 3.1) jointly learns the TrigEncoder and TrigMatcher, and the second stage (Section 3.2) uses the trigger vectors to learn NER tag labels. Figure 2 shows this pipeline. We introduce the inference procedure in Section 3.3.

Trigger Encoding & Semantic Matching
Learning trigger representations and semantically matching them with sentences are inseparable tasks. The desired trigger vectors capture semantics in an embedding space shared with the token hidden states, such that sentences and triggers can be semantically matched. Recall the example we discussed in Sec. 1: "enjoyed a great dinner at" versus "had ... lunch at." Learning an attention-based matching module between entity triggers and sentences is necessary so that triggers and sentences can be semantically matched. Therefore, in the first stage, we propose to jointly train the trigger encoder (TrigEncoder) and the attention-based trigger matching module (TrigMatcher) using a shared embedding space.
Specifically, for a sentence x with multiple entities {e_1, e_2, ...}, for each entity e_i we assume that there is a set of triggers T_i = {t_1^(i), t_2^(i), ...}. We then create a training instance by pairing each entity with one of its triggers, denoted (x, e_i, t_j^(i)), without loss of generality. To enable more efficient batch-based training, we reformat the trigger-annotated dataset D_T such that each new sequence contains only one entity and one trigger. For each reformed training instance (x, e, t), we first apply a bidirectional LSTM (BLSTM) on the sequence of word vectors of x, obtaining a sequence of hidden states that are the contextualized word representations h_i for each token x_i in the sentence. We use H to denote the matrix containing the hidden vectors of all of the tokens, and we use Z to denote the matrix containing the hidden vectors of all trigger tokens inside the trigger t.
In order to learn an attention-based representation of both triggers and sentences, we follow the self-attention method introduced by Lin et al. (2017b):

a_sent = SoftMax(W_2 tanh(W_1 H^T)),   g_s = a_sent H,
a_trig = SoftMax(W_2 tanh(W_1 Z^T)),   g_t = a_trig Z,

where W_1 and W_2 are two trainable parameters for computing the self-attention score vectors a_sent and a_trig. The final sentence vector g_s is the weighted sum of the token vectors in the entire sentence; similarly, the final trigger vector g_t is the weighted sum of the token vectors in the trigger. We want to use the type of the associated entity as supervision to guide the trigger representation. Thus, the trigger vector g_t is further fed into a multi-class classifier to predict the type of the associated entity e (such as PER, LOC, etc.), which we denote type(e). The loss of the trigger classification is:

L_TC = - log P(type(e) | g_t; θ_TC),

where θ_TC is a model parameter to learn. To learn to match triggers and sentences based on their attention-based representations, we use a contrastive loss (Hadsell et al., 2006). The intuition is that matched triggers and sentences should have close representations (i.e., a small distance d between them). Because the TrigMatcher needs to be trained with both positive and negative examples of the form (sentence, trigger, label), we create negative examples (i.e., mismatches) by randomly mixing triggers and sentences. For the negative examples, we expect a margin m between their embeddings. The contrastive loss of the soft matching is defined as follows, where d = ||g_s - g_t||_2 and 1_matched is 1 if the trigger was originally in this sentence and 0 otherwise:

L_SM = (1/2) [ 1_matched d^2 + (1 - 1_matched) max(0, m - d)^2 ].

The joint loss of the first stage is thus L = L_TC + λ L_SM, where λ is a hyper-parameter to tune.
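The first-stage computation can be sketched in numpy as follows; the dimensions, weight shapes, and function names are our own assumptions, but the attention pooling and the contrastive loss mirror the formulas above:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_pool(H, W1, W2):
    """Self-attention pooling (Lin et al., 2017b): H is (n, d) hidden states.
    Returns the attention-weighted sum of the token vectors, a (d,) vector."""
    scores = W2 @ np.tanh(W1 @ H.T)   # (1, n) unnormalized attention scores
    a = softmax(scores.ravel())       # normalized over tokens
    return a @ H                      # weighted sum of token vectors

def contrastive_loss(g_s, g_t, matched, margin=1.0):
    """L_SM: pull matched (sentence, trigger) pairs together; push
    mismatched pairs apart until they are at least `margin` away."""
    d = np.linalg.norm(g_s - g_t)     # L2 distance between the two vectors
    if matched:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2
```

The same `attend_pool` is applied to H (all tokens) to get g_s and to Z (trigger tokens only) to get g_t; a cross-entropy classifier over g_t (omitted here) provides L_TC.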

Trigger-Enhanced Sequence Tagging
The learning objective in this stage is to output the tag sequence y. Following the most common design of neural NER architecture, BLSTM-CRF (Ma and Hovy, 2016), we incorporate the entity triggers as attention queries to train a trigger-enhanced sequence tagger for NER. Note that the BLSTM used in the TrigEncoder and TrigMatcher modules is the same BLSTM we use in the SeqTagger to obtain H, the matrix containing the hidden vectors of all of the tokens. Given a sentence x, we use the previously trained TrigMatcher to compute ĝ_t, the mean of all the trigger vectors associated with this sentence. Following the conventional attention method (Luong et al., 2015), we incorporate the mean trigger vector as the query, creating a sequence of attention-based token representations H':

α = SoftMax(v tanh(U_1 H^T + U_2 ĝ_t^T)),   H' = α H,

where U_1, U_2, and v are trainable parameters for computing the trigger-enhanced attention scores for each token. Finally, we concatenate the original token representation H with the trigger-enhanced one H' as the input ([H; H']) to the final CRF tagger. Note that in this stage, our learning objective is the same as in conventional NER: to correctly predict the tag for each token.
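A rough numpy sketch of this trigger-enhanced attention (names and dimensions are our own; we assume each token vector is scaled by its attention score so that H' keeps the (n, d) shape needed for the concatenation [H; H']):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def trigger_attention(H, g_hat, U1, U2, v):
    """H: (n, d) token hidden states; g_hat: (d,) mean trigger vector.
    Scores each token against the trigger query, scales tokens by their
    attention weights, and returns the CRF input [H; H'] of shape (n, 2d)."""
    scores = v @ np.tanh(U1 @ H.T + (U2 @ g_hat)[:, None])  # (n,) scores
    alpha = softmax(scores)                # attention weight per token
    H_prime = alpha[:, None] * H           # trigger-attended token vectors
    return np.concatenate([H, H_prime], axis=1)
```

The output would feed a linear-chain CRF layer (not shown), trained with the usual NER tagging objective.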

Inference on Unlabeled Sentences
When inferring tags on unlabeled sentences, we do not know the sentence's triggers. Instead, we use the TrigMatcher to compute the similarities between the self-attended sentence representations and the trigger representations, using the most suitable triggers as additional inputs to the SeqTagger. Specifically, we build a trigger dictionary from our training data, T = {t | (•, •, t) ∈ D_T}. Recall that we have learned a trigger vector for each of these triggers; we can load these trigger vectors as a look-up table in memory. For each unlabeled sentence x, we first compute its self-attended vector g_s as we do when training the TrigMatcher. Since the contrastive loss is computed with L2-norm distances, we can efficiently retrieve the most similar triggers in the embedding space shared by the sentence and trigger vectors.
Then, we calculate ĝ_t, the mean of the top-k nearest semantically matched triggers, as this serves as a proxy for the triggers mentioned for the entity type in the labeled data. We then use it as the attention query for the SeqTagger, as in Sec. 3.2. Now, we can produce trigger-enhanced sequence predictions on unlabeled data, as shown in Fig. 3.
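The retrieval step can be sketched as a simple nearest-neighbor lookup over the stored trigger vectors (a simplified illustration under our own naming, not the released implementation):

```python
import numpy as np

def match_triggers(g_s, trigger_table, k=3):
    """Retrieve the k triggers nearest to the sentence vector g_s by L2
    distance, and average their vectors into the attention query g_hat.
    trigger_table: (m, d) array of trigger vectors learned in stage one."""
    dists = np.linalg.norm(trigger_table - g_s, axis=1)  # distance to each trigger
    nearest = np.argsort(dists)[:k]                      # indices of the k closest
    g_hat = trigger_table[nearest].mean(axis=0)          # mean trigger vector
    return nearest, g_hat
```

In practice the look-up table would hold one vector per trigger in T, so an unseen cue phrase like "head of ... team" can land near a seen trigger like "leader of ... group".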

Experiments
In this section, we first discuss how we collect entity triggers, and then empirically study the data efficiency of our proposed framework.

Annotating Entity Triggers as Explanatory Supervision
We use a general-domain dataset, CoNLL2003 (Tjong Kim Sang and De Meulder, 2003), and a biomedical-domain dataset, BC5CDR (Li et al., 2016). Both datasets are well-studied and popular for evaluating the performance of neural named entity recognition models such as BLSTM-CRF (Ma and Hovy, 2016).
In order to collect the entity triggers from human annotators, we use Amazon SageMaker Ground Truth to crowd-source entity triggers. More recently, Lee et al. (2020) developed an annotation framework, named LEAN-LIFE, which supports our proposed trigger annotation. Specifically, we sample 20% of each training set as our inputs, and then reformat them into the format discussed in Section 2. Annotators are asked to annotate a group of words that would be helpful in typing and/or detecting the occurrence of a particular entity in the sentence. We mask the entity tokens with their types so that human annotators focus on the non-entity words in the sentence when considering the triggers. We consolidate the multiple triggers for each entity by taking the intersection of the three annotators' results.
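The consolidation step can be sketched as a set intersection over the annotators' word-index choices (a toy illustration with hypothetical annotations):

```python
def consolidate(annotations):
    """Consolidate one entity's trigger: keep only the word indices that
    all annotators selected. annotations: list of sets of word indices."""
    result = set(annotations[0])
    for a in annotations[1:]:
        result &= set(a)   # set intersection across annotators
    return result
```

For example, if one annotator also marked "fantastic" but the other two did not, the intersection drops it, keeping only the words all three agreed on.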
Statistics of the final curated triggers are summarized in Table 1. We release the 14k triggers to the community for future research on trigger-enhanced NER.

Base model
We require a base model to compare with our proposed TMN model in order to validate whether the TMN model effectively uses triggers to improve performance in a limited-label setting. We choose CNN-BLSTM-CRF (Ma and Hovy, 2016) as our base model for its wide usage in research on neural NER models and applications. Our TMNs are implemented within the same codebase and use the same external word vectors from GloVe (Pennington et al., 2014).
The hyper-parameters of the CNNs, BLSTMs, and CRFs are also the same.This ensures a fair comparison between a typical non-trigger NER model and our trigger-enhanced framework.

Results and analysis
Labeled data efficiency. We first seek to study the cost-effectiveness of using triggers as an additional source of supervision. Accordingly, we compare the performance of our model and the baseline model trained on increasing fractions of the data; the results are shown in Table 2. We can see that by using only 20% of the trigger-annotated data, the TMN model delivers performance comparable to the baseline model using 50-70% of the traditional training data. The drastic improvement in model performance obtained using triggers thus justifies the slightly higher cost incurred in annotating triggers.
Self-training with triggers. We also conduct a preliminary investigation of adopting self-training (Rosenberg et al., 2005) with triggers. We make inferences on unlabeled data and take the predictions with high confidence as weak training examples for continually training the model. The confidence is computed following the MNLP metric (Shen et al., 2017), and we take the top 20% every epoch. With this self-training method, we further improve the TMN model's F1 scores by about 0.5-1.0%.

Annotation time vs. performance. Although it is hard to accurately measure the time cost on the crowd-sourcing platform we use (annotators may suspend jobs and resume them without interacting with the platform), based on our offline simulation we estimate that annotating both triggers and entities takes about 1.5 times as long as annotating entities only. In Figure 4, the x-axis for BLSTM-CRF is the number of sentences annotated with only entities, while for TMN it is the number of sentences tagged with both entities and triggers. To reflect human annotators spending 1.5 to 2 times as long annotating triggers and entities as they spend annotating only entities, we stretch the x-axis for BLSTM-CRF. For example, the line labeled "BLSTM-CRF (x2)" associates the actual F1 score of the model trained on 40% of the sentences with the x-axis value of 20%. We can clearly see that the proposed TMN outperforms the BLSTM-CRF model by a large margin. Even in the extreme case that tagging triggers requires twice the human effort ("BLSTM-CRF (x2)"), the TMN is still significantly more labor-efficient in terms of F1 score.
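The selection step of this self-training loop can be sketched as follows, assuming an MNLP-style confidence (length-normalized sequence log-probability, following Shen et al., 2017); the data format is our own illustration:

```python
def select_confident(predictions, fraction=0.2):
    """Keep the top `fraction` most confident predictions as weak labels.
    predictions: list of (sentence, predicted_tags, token_log_probs);
    confidence is the mean token log-probability (higher = more confident)."""
    scored = [(sum(lp) / len(lp), sent, tags)
              for sent, tags, lp in predictions]
    scored.sort(key=lambda s: s[0], reverse=True)   # most confident first
    keep = max(1, int(len(scored) * fraction))
    return [(sent, tags) for _, sent, tags in scored[:keep]]
```

The kept (sentence, tags) pairs would then be added to the training pool for the next epoch of continued training.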
Interpretability. Figure 5 shows two examples illustrating how the trigger attention scores help the TMN model recognize entities. The training data has 'per day' as a trigger phrase for chemical-type entities, and this trigger matches the phrase 'once daily' in an unseen sentence during the inference phase of the TrigMatcher. Similarly, in CoNLL03, the training-data trigger phrase 'said it' matches the phrase 'was quoted as saying' in an unlabeled sentence. These results not only support our argument that trigger-enhanced models such as TMN can learn effectively, but also demonstrate that trigger-enhanced models can provide reasonable interpretations, something other neural NER models lack.

Related Work
Toward low-resource learning for NER, recent works have mainly focused on dictionary-based distant supervision (Shang et al., 2018; Yang et al., 2018; Liu et al., 2019). These approaches create a large external dictionary of entities, and then regard hard-matched sentences as additional, noisily labeled data for learning an NER model. Although these approaches largely reduce human effort in annotating, the quality of the matched sentences is highly dependent on the coverage of the dictionary and the quality of the corpus. The learned models tend to be biased towards entities with surface forms similar to those in the dictionary. Without further tuning under better supervision, these models have low recall (Cao et al., 2019). The linking-rules approach (Safranchik et al., 2020) focuses on votes over whether adjacent elements in the sequence belong to the same class. Unlike these works, which aim to do away with training data or human annotations, our work focuses on how to utilize human effort more cost-effectively.
Another line of research that also aims to use human effort more cost-effectively is active learning (Shen et al., 2017; Lin et al., 2019). This approach focuses on instance sampling and the human annotation UI, asking workers to annotate the most useful instances first. However, a recent study (Lowell et al., 2019) argues that actively annotated data barely helps when training new models. Transfer learning approaches (Lin and Lu, 2018) and aggregating multi-source supervision (Lan et al., 2020) have also been studied as ways to use less expensive supervision for NER, though these methods usually lack the clear rationales for guiding the annotation process that trigger annotations provide.
Inspired by recent advances in learning sentence classification tasks (e.g., relation extraction and sentiment classification) with explanations or human-written rules (Li et al., 2018; Hancock et al., 2018; Wang* et al., 2020; Zhou et al., 2020), we propose the concept of an "entity trigger" for the task of named entity recognition. These prior works primarily focused on sentence classification, in which the rules (parsed from natural language explanations) are usually continuous token sequences and there is a single label for each input sentence. The unique challenge in NER is that we have to deal with rules that are discontinuous token sequences, and multiple rules may apply at the same time to an input instance. We address this problem in TMN by jointly learning trigger representations and a soft matching module that operates at inference time.
We argue that either dictionary-based distant supervision or active learning can be used in the context of trigger-enhanced NER learning via our framework. For example, one could create a dictionary using a high-quality corpus and then apply active learning by asking human annotators to annotate the triggers chosen by an active sampling algorithm designed for TMN. We believe our work sheds light on future research toward more cost-effective use of human effort in learning NER models.

Conclusion
In this paper, we introduce the concept of the "entity trigger" as a complementary annotation. Individual entity annotations provide limited explicit supervision. Entity-trigger annotations add complementary supervision signals and thus help the model learn and generalize more efficiently. We crowdsourced triggers on two mainstream datasets and will release them to the community. We also propose a novel framework, TMN, which jointly learns trigger representations and a soft-matching module with self-attention, enabling the model to generalize to unseen sentences easily when tagging named entities.
Future directions for TriggerNER include: 1) developing models for automatically extracting novel triggers, 2) transferring existing entity triggers to low-resource languages, and 3) improving trigger modeling with better structured inductive bias (e.g., OpenIE).

Figure 2: Two-stage training of the Trigger Matching Network. We first jointly train the TrigEncoder (via trigger classification) and the TrigMatcher (via contrastive loss). Then, we reuse the training data trigger vectors as attention queries in the SeqTagger.

Figure 3: The inference process of the TMN framework. It uses the TrigMatcher to retrieve the k nearest triggers and averages their trigger vectors as the attention query for the trained SeqTagger. Thus, an unseen phrase (e.g., "head of ... team") can be matched with a seen trigger (e.g., "leader of ... group").

Figure 4: The cost-effectiveness study. We stretch the curve of BLSTM-CRF parallel to the x-axis by 1.5x/2x. Even if we assume annotating entity triggers costs 150%/200% of the human effort of annotating entities only, TMN is still much more effective.

Figure 5: Two case studies of trigger attention during inference. The darker cells have higher attention weights.

Table 1: Statistics of the crowd-sourced entity triggers.