Rewarding Coreference Resolvers for Being Consistent with World Knowledge

Unresolved coreference is a bottleneck for relation extraction, and high-quality coreference resolvers can produce output that makes it much easier to extract knowledge triples. We show how to improve coreference resolvers by forwarding their predictions to a relation extraction system and rewarding the resolvers for producing triples that are found in knowledge bases. Since relation extraction systems can rely on different forms of supervision and be biased in different ways, we obtain the best performance, improving over the state of the art, using multi-task reinforcement learning.


Introduction
Coreference annotations are costly and difficult to obtain, since trained annotators with sufficient world knowledge are necessary for reliable annotations. This paper presents a way to simulate such annotators using reinforcement learning. To motivate our approach, we rely on the following example from Martschat and Strube (2014), with brackets and indices added to mark entity mentions:

(1) [Lynyrd Skynyrd]_1 was formed in [Florida]_2. Other bands from [the Sunshine State]_2 include Fireflight and Marilyn Manson.

Martschat and Strube (2014) cite the association between Florida and the Sunshine State as an example of a common source of name-name recall errors for state-of-the-art coreference resolution systems. The challenge is that the two names co-occur relatively infrequently and are unlikely to do so in a moderate-sized, manually annotated training corpus. A state-of-the-art system may be able to infer the relation using distributional information about the phrase the Sunshine State, but is likely to have limited evidence for the decision that it is coreferential with Florida rather than with one of the other mentions.

While coreference-annotated data is scarce, knowledge bases containing factual information (such as the fact that Fireflight is from Florida) are increasingly available. For a human annotator unaware that Florida is sometimes referred to as the Sunshine State, the information that Fireflight is from Florida is sufficient to establish that Florida and the Sunshine State are (with high probability) coreferential. This paper explores a novel architecture for making use of such information from knowledge bases by tying a coreference resolution system to a relation extraction system, enabling us to reward the coreference system for making predictions that lead us to infer facts consistent with such knowledge bases. This potentially provides more evidence for resolving coreference cases such as (1).
We propose a training strategy (Figure 1) in which we pass on the predictions of a neural coreference resolver to an open relation extraction (OpenRE) system, matching relations extracted from resolved sentences against a knowledge base. We show how checking the produced relations for consistency with the knowledge base yields a reward that is, indirectly, a signal about the quality of the coreference resolution. In order to generalize this signal beyond the coverage of the knowledge base, we train a Universal Schema model (Riedel et al., 2013) and use its confidence as our reward function. With this reward function, we perform policy-gradient fine-tuning of our coreference resolver, effectively optimizing the consistency of its predictions with world knowledge.

Contributions
We demonstrate that training a coreference resolver by reinforcement learning, with rewards from a relation extraction system, results in improvements for coreference resolution. Our code is publicly available at https://github.com/rahular/coref-rl

Consistency Reward for Coreference Resolution
In order to reward a coreference resolver for being consistent with world knowledge, we propose a simple training strategy based on relation extraction, sketched below: (i) sample a Wikipedia document at random, (ii) replace mentions with their antecedents using a coreference resolver, (iii) apply an off-the-shelf OpenRE system to each rewritten document, (iv) score relations that include coreferent mentions using Universal Schema, and (v) use the score as a reward for training the coreference resolver.
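For concreteness, the five steps can be written as a minimal Python sketch. All helper objects and methods here (corpus.sample, resolver.resolve, openre.extract, t.involves, reward_fn) are hypothetical stand-ins for the components described above, not APIs from our released code.

```python
# A minimal sketch of the consistency-reward loop; every helper is a
# hypothetical placeholder for a component named in the text.

def consistency_reward_step(resolver, openre, reward_fn, corpus):
    # (i) Sample a Wikipedia document at random.
    doc = corpus.sample()
    # (ii) Replace mentions with their antecedents using the resolver.
    resolved_doc, clusters = resolver.resolve(doc)
    # (iii) Extract subject-relation-object triples from the rewritten text.
    triples = openre.extract(resolved_doc)
    # (iv) Score triples that involve coreferent mentions with Universal Schema.
    rewards = [reward_fn(t) for t in triples if t.involves(clusters)]
    # (v) Use the aggregate score as the training reward.
    return sum(rewards) / max(len(rewards), 1)
```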
Reward functions To model consistency with world knowledge, we train different Universal Schema models (Riedel et al., 2013), resulting in three reward functions (Figure 2): RE-KG (Knowledge Graph Universal Schema) is trained to predict whether two entities are linked in Wikidata; RE-Text (Text-based Universal Schema) is trained to predict whether two entities co-occur in Wikipedia; and RE-Joint (Joint Universal Schema) is trained to predict whether two entities are both linked and co-occur. The three rewards focus on different aspects of relationships between entities, giving complementary views of which entities are related. We parameterize candidate relation phrases with a BiLSTM (Graves and Schmidhuber, 2005), and use pretrained Wikidata BigGraph embeddings (Lerer et al., 2019) as the entity representations. We apply a one-layer MLP to the concatenated representations to obtain the reward value.
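A minimal sketch of one such reward model in PyTorch follows. The embedding and hidden dimensions, the pooling of the BiLSTM states, and the sigmoid output are illustrative assumptions; the text fixes only the BiLSTM encoder, the BigGraph entity embeddings, and the one-layer MLP over the concatenated representations.

```python
import torch
import torch.nn as nn

class UniversalSchemaReward(nn.Module):
    """Scores a (subject, relation phrase, object) candidate.

    Entity vectors are assumed to come from pretrained Wikidata BigGraph
    embeddings; all dimensions below are illustrative guesses.
    """

    def __init__(self, word_dim=300, entity_dim=200, hidden_dim=256):
        super().__init__()
        # BiLSTM over the tokens of the candidate relation phrase.
        self.relation_encoder = nn.LSTM(
            word_dim, hidden_dim, bidirectional=True, batch_first=True)
        # One-layer MLP over [subject; relation; object] representations.
        self.scorer = nn.Sequential(
            nn.Linear(2 * hidden_dim + 2 * entity_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, subj_emb, relation_tokens, obj_emb):
        # relation_tokens: (batch, seq_len, word_dim)
        _, (h_n, _) = self.relation_encoder(relation_tokens)
        # Concatenate the final forward and backward hidden states.
        rel = torch.cat([h_n[-2], h_n[-1]], dim=-1)
        features = torch.cat([subj_emb, rel, obj_emb], dim=-1)
        # Confidence that the triple is consistent with the knowledge base.
        return torch.sigmoid(self.scorer(features)).squeeze(-1)
```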
Updating the coreference resolver Each resolved document is converted into n subject-relation-object (SRO) triples by an open information extraction system (Angeli et al., 2015). Each triple t_i is then scored using a reward function to obtain a reward r_i for i ∈ {1, . . . , n}. The final document-level reward R is the sum of the individual rewards, normalized against a moving window R_h containing the previous h = 100 normalized reward values.
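A minimal sketch of this reward normalization, assuming a mean/standard-deviation baseline over the window R_h (the exact functional form is our assumption, not a quotation of the paper's equation):

```python
from collections import deque

class MovingWindowNormalizer:
    """Normalizes document-level rewards against recent history (h = 100)."""

    def __init__(self, h=100):
        self.history = deque(maxlen=h)  # the moving window R_h

    def __call__(self, triple_rewards):
        # Aggregate the n per-triple rewards into one document reward.
        raw = sum(triple_rewards) / max(len(triple_rewards), 1)
        # Mean/std over the window; neutral defaults when history is empty.
        mean = sum(self.history) / len(self.history) if self.history else 0.0
        var = (sum((x - mean) ** 2 for x in self.history) / len(self.history)
               if self.history else 1.0)
        normalized = (raw - mean) / (var ** 0.5 + 1e-8)
        self.history.append(normalized)
        return normalized
```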
Since R is not differentiable with respect to the coreference resolver's parameters, we use policy gradient training to update the coreference resolver. We select the best action according to the current policy, with random exploration of alternative solutions with probability p = 0.1.
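One REINFORCE-style realization of this update is sketched below, treating antecedent selection as the action. The resolver API (mentions, antecedent_scores) is hypothetical, and the loss form is an assumption beyond what the text specifies; only the ε = 0.1 exploration follows the text directly.

```python
import random
import torch

def policy_gradient_step(resolver, optimizer, doc, reward, epsilon=0.1):
    """One REINFORCE-style update of the coreference policy (a sketch)."""
    optimizer.zero_grad()
    log_prob = torch.zeros((), requires_grad=True)
    for mention in resolver.mentions(doc):
        # Hypothetical API: unnormalized scores over candidate antecedents.
        scores = resolver.antecedent_scores(mention)
        probs = torch.softmax(scores, dim=-1)
        if random.random() < epsilon:
            # Random exploration of alternative antecedents (p = 0.1).
            action = random.randrange(len(probs))
        else:
            # Otherwise take the best action under the current policy.
            action = int(torch.argmax(probs))
        log_prob = log_prob + torch.log(probs[action])
    # Scale the log-likelihood of the chosen actions by the document reward.
    loss = -reward * log_prob
    loss.backward()
    optimizer.step()
```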
Multi-task reinforcement learning Our overall training procedure is presented in Algorithm 1. After training the three aforementioned reward models, we create RE-Distill by interpolating their trained weights. Next, we pre-train a coreference resolver using supervised learning, and fine-tune it using each of the three reward functions to get three different coreference policies: Coref-KG, Coref-Text, and Coref-Joint, respectively. We then use multi-task reinforcement learning to combine these three policies into Coref-Distill.
Our approach is a particular instance of Distral (Teh et al., 2017), using policy gradient and model interpolation. Finally, Coref-Distill is fine-tuned with rewards from RE-Distill.

Algorithm 1 Multi-task Reinforcement Learning
Require: Baseline-initialized policies θ_n for n ∈ {1, 2, 3}
Require: Reward functions reward_n for n ∈ {1, 2, 3}
Require: Distilled reward function reward*
while stopping criterion not met do
    Sample k documents D_k
    for d ∈ D_k do
        for n ∈ {1, 2, 3} do
            C_d = entity clusters with θ_n
            d′ = resolve d with C_d
            T = obtain OpenIE triples for d′
            r = reward_n(d′)
            ĝ_k = policy gradient for θ_n with reward r
            θ_n^{k+1} = θ_n^k + α_k ĝ_k
        end for
    end for
end while
Distilled policy θ* = (θ_1 + θ_2 + θ_3) / 3
Sample k documents D_k
for d ∈ D_k do
    d′ = resolve d with θ*
    T = obtain OpenIE triples for d′
    r = reward*(d′)
    ĝ_k = policy gradient for θ* with reward r
    θ*^{k+1} = θ*^k + α_k ĝ_k
end for
return Distilled policy θ*
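Read as Python, Algorithm 1 amounts to the sketch below. Parameter averaging is expressed over PyTorch-style state dicts, and resolve, extract_triples, policy_gradient_update, and clone_with are hypothetical stand-ins for the pipeline components above.

```python
def distill_policies(policies, reward_fns, reward_star, sample_docs, steps):
    """Multi-task RL following Algorithm 1 (a sketch): fine-tune each policy
    on its own reward, average the weights, then fine-tune on reward*."""
    for _ in range(steps):                           # stopping criterion
        for doc in sample_docs():                    # sample k documents
            for policy, reward_fn in zip(policies, reward_fns):
                resolved = policy.resolve(doc)       # apply entity clusters
                triples = extract_triples(resolved)  # OpenIE step
                policy.policy_gradient_update(reward_fn(triples))
    # Distilled policy theta*: elementwise average of the three weight sets.
    averaged = {name: sum(p.state_dict()[name] for p in policies) / len(policies)
                for name in policies[0].state_dict()}
    theta_star = policies[0].clone_with(averaged)
    for doc in sample_docs():                        # final fine-tuning pass
        resolved = theta_star.resolve(doc)
        theta_star.policy_gradient_update(reward_star(extract_triples(resolved)))
    return theta_star
```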

Experiments
We use a state-of-the-art neural coreference resolution model (Lee et al., 2018) as our baseline coreference resolver. This model extends Lee et al. (2017) with coarse-to-fine inference and ELMo pretrained embeddings (Peters et al., 2018).

Results
Since the quality of our reward models is essential to the performance of the coreference resolver adaptations, we first report the validation accuracy and F1 scores of the four reward models in Table 1. We clearly see the advantage of distillation, with a 5% absolute difference between the best single model (RE-Text) and RE-Distill.

The coreference resolution results are similar to the relation extraction results: using a distilled policy, learned through multi-task reinforcement learning, leads to better results on both datasets. While improvements over the current state of the art are relatively small, they reflect significant progress, as they demonstrate the ability to successfully augment coreference resolvers with "free" data from large-scale knowledge bases like Wikidata. For relation extraction, this could have positive downstream effects, and also ensure that extracted relations are consistent with real-world knowledge. Moreover, this approach has the potential to benefit coreference resolution in low-resource languages, where less annotated data is available, since Wikidata triples are abundant for many languages.

Analysis
Empirically, we find that fine-tuning the coreference resolver on Wikidata results in two kinds of improvements.

Better mention detection Since the model is rewarded if the SRO triples produced from the resolved document are present in Wikidata, the model can only do well if it correctly resolves the subject and object, which are usually named entities (more generally, noun phrases). Indeed, we see an improvement in mention detection, as exemplified in the first example of Figure 3. Compared to the baseline, the fine-tuned model identifies a larger number of entities, including "southern hemisphere", "Cambridge", and "Oxford", which are missed by the baseline model.
Better linking As a direct consequence of the above, the model is also inclined to link noun phrases that are not entities. In the second example of Figure 3, we see that "This attempt" is linked to "releasing" by the fine-tuned model. Interestingly, we do not see this type of eventive noun phrase linking either in OntoNotes or in the predictions of the baseline model. This phenomenon, however, also has the side-effect of producing singleton clusters and spurious links, which adversely affect recall. On the OntoNotes test data, while the average precision of the best-performing fine-tuned model is higher than the baseline's (75.62 vs. 73.80), a drop in recall (70.75 vs. 71.34) causes the final F1 score to improve only marginally.
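For concreteness, with F1 = 2PR/(P + R), these averages give 2 · 75.62 · 70.75/(75.62 + 70.75) ≈ 73.10 for the fine-tuned model against 2 · 73.80 · 71.34/(73.80 + 71.34) ≈ 72.55 for the baseline: the precision gain only slightly outweighs the recall drop.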

Related Work
Coreference resolution Among neural coreference resolvers (Wu and Ma, 2017; Meng and Rumshisky, 2018), Lee et al. (2017) were the first to propose an end-to-end resolver that did not rely on hand-crafted rules or a syntactic parser. Extending this work, Lee et al. (2018) introduced a novel attention mechanism for iteratively ranking spans of candidate coreferent mentions, thereby improving the identification of long-distance coreference chains. Zhang et al. (2019) improve pronoun coreference resolution by 2.2 F1 using linguistic features (gender, animacy, and plurality) and a frequency-based predicate-argument selection preference as external knowledge. Emami et al. (2018) incorporate knowledge into coreference resolution by means of information retrieval, finding sentences that are syntactically similar to a given instance and improving F1 by 0.16.
Reinforcement learning RL has been used for many NLP tasks, including coreference resolution (Clark and Manning, 2016) and relation extraction (Zeng et al., 2018). Clark and Manning (2016) use RL to improve coreference resolution by optimizing their mention-ranking model, directly using the standard evaluation metrics as rewards. We, on the other hand, perform end-to-end optimization by rewarding the model's consistency with real-world knowledge using relation extraction. To our knowledge, we are the first to use consistency with world knowledge as a reward for tasks other than knowledge base construction.

[Figure 3: Example outputs of the baseline system and the best-performing fine-tuned system (Coref-Distill), showing mention detection (top) and linking (bottom). Mentions of the same color are linked to form a coreference cluster. Example text: "According to the library's publications, it is the largest academic library in the southern hemisphere. The university has a number of residential college and halls of residence, based on the college system of Cambridge and Oxford universities."]

Knowledge bases Knowledge bases have been leveraged across many NLP tasks (Bordes et al., 2011; Chang et al., 2014; Lin et al., 2015; Toutanova et al., 2015; Yang and Mitchell, 2017). Specifically for coreference resolution, Prokofyev et al. (2015) implement a resolver that ensures semantic relatedness of the resulting coreference clusters by leveraging Semantic Web annotations. Their work incorporates knowledge graph information only in the final stage of the resolver's pipeline, not during training. In contrast, our work injects information from the knowledge base directly into the training pipeline. Also, they use DBpedia (Auer et al., 2007) as their ontology. Although both Wikidata and DBpedia are designed to support working with Wikipedia articles, DBpedia can be considered a subset of Wikidata, as Wikipedia infoboxes are its main data source. The advantage of Wikidata over DBpedia is its size, and the fact that it is multilingual, which will allow our method to be applied to other languages in the future.

Conclusion
We presented an architecture for adapting coreference resolvers by rewarding them for being consistent with world knowledge. Using simple multi-task reinforcement learning and a knowledge extraction pipeline, we achieved improvements over the state of the art across two datasets.