Unsupervised Entity Linking with Abstract Meaning Representation

Most successful Entity Linking (EL) methods aim to link mentions to their referent entities in a structured Knowledge Base (KB) by comparing their respective contexts, often using similarity measures. While the KB structure is given, current methods have suffered from impoverished information representations on the mention side. In this paper, we demonstrate the effectiveness of Abstract Meaning Representation (AMR) (Banarescu et al., 2013) in selecting high-quality sets of entity “collaborators” to feed a simple similarity measure (Jaccard) for linking entity mentions. Experimental results show that AMR captures contextual properties discriminative enough to make linking decisions, without the need for EL training data, and that a system using AMR parsing output outperforms one using hand-labeled traditional semantic roles as the context representation for EL. Finally, we show promising preliminary results for using AMR to select sets of “coherent” entity mentions for collective entity linking.


Introduction
The Entity Linking (EL) task (Ji et al., 2010) aims at automatically linking each named entity mention appearing in a source text document to its unique referent in a target knowledge base (KB). For example, consider the following sentence posted to a discussion forum during the 2012 U.S. presidential election: "Where would McCain be without Sarah?". An Entity Linker should link the entity mentions "McCain" and "Sarah" to the entities John McCain and Sarah Palin, respectively, which serve as unique identifiers for the real people.
A typical EL system works as follows. Given a mention m (a string in a source document), the top N most likely entity referents from the KB are enumerated based on prior knowledge about which entities are most likely referred to using m. The candidate entities are then re-ranked to ultimately link each mention to the top entity in its candidate list. Re-ranking consists of two key elements: context representation and context comparison. For a given mention, candidate entities are re-ranked based on a comparison of information obtained from the context of m with known structured and/or unstructured information associated with the top N KB entities, which can be considered the "context" of the KB entity. The basic intuition is that the entity referents of m and related mentions should be similarly connected in the KB.
However, there might be many entity mentions in the context of a target entity mention that could potentially be leveraged for disambiguation. In this paper, we show that a deeper semantic knowledge representation, including the Abstract Meaning Representation (AMR) (Banarescu et al., 2013), can capture contextual properties that are discriminative enough to disambiguate entity mentions that current state-of-the-art systems cannot handle, without the need for EL training data. Specifically, for a given entity mention, using AMR provides a rich context representation, facilitating the selection of an optimal set of collaborator entity mentions, i.e., those co-occurring mentions most useful for disambiguation. In previous approaches, collaborator sets have tended to be too narrow or too broad, introducing noise. We then use unsupervised graph inference for context comparison, achieving results comparable with state-of-the-art supervised methods and substantially outperforming context representation based on traditional Semantic Role Labeling.
In addition, most state-of-the-art EL approaches now rely on collective inference, where a set of coherent mentions is linked simultaneously by choosing an "optimal" or maximally "coherent" set of named entity targets, with one target entity for each mention in the coherent set. We show preliminary results suggesting that AMR is effective for the partitioning of all mentions in a document into coherent sets for collective linking.
We evaluate our approach using both human and automatic AMR annotation, limiting target named entity types to person (PER), organization (ORG), and geo-political entities (GPE) 3 .

Related Work
In most recent collective inference methods for EL (e.g., Kulkarni et al., 2009; Pennacchiotti and Pantel, 2009; Fernandez et al., 2010; Radford et al., 2010; Cucerzan, 2011; Guo et al., 2011; Han and Sun, 2011; Ratinov et al., 2011; Chen and Ji, 2011; Kozareva et al., 2011; Dalton and Dietz, 2013), the target entity mention's "collaborators" may simply include all mentions which co-occur in the same discourse (sentence, paragraph or document) (Ratinov et al., 2011; Nguyen et al., 2012). But this approach usually introduces many irrelevant mentions, and it is very difficult to automatically determine the scope of discourse. In contrast, some recent work exploited more restricted measures by only choosing those mentions which are topically related (Cassidy et al., 2012; Xu et al., 2012), bear a relation from a fixed set (Cheng and Roth, 2013), are coreferential (Nguyen et al., 2012; Huang et al., 2014), socially related (Cassidy et al., 2012; Huang et al., 2014), dependent (Ling et al., 2014), or a combination of these through meta-paths (Huang et al., 2014). These measures can collect more precise collaborators but suffer from low coverage of predefined information templates and the unsatisfying quality of state-of-the-art coreference resolution, relation and event extraction.
3 The mapping from AMR entity types to these three main types is at: amr.isi.edu/lib/ne-type-sc.txt
In this paper, we demonstrate that AMR is an appropriate and elegant way to acquire, select, represent and organize deeper knowledge in text. Together with our novel utilization of the rich structures in merged KBs, the whole framework carries rich enough evidence for effective EL, without the need for any labeled data, collective inference, or sophisticated similarity.

Knowledge Network Construction from Source
Abstract Meaning Representation (AMR) (Banarescu et al., 2013) is a sembanking language that captures whole sentence meanings in a rooted, directed, labeled, and (predominantly) acyclic graph structure. AMR utilizes multi-layer linguistic analysis such as PropBank frames, non-core semantic roles, coreference, named entity annotation, modality and negation to represent the semantic structure of a sentence. AMR strives for a more logical, less syntactic representation. Compared to traditional dependency parsing and semantic role labeling, the nodes in AMR are entities instead of words, and the edge types are much more fine-grained. AMR thus captures deeper meaning compared with other representations more commonly used to represent mention context in EL. We use AMR to represent semantic information about entity mentions expressed in their textual context. Specifically, given an entity mention m, we use a rule-based method to construct a Knowledge Network, which is a star-shaped graph with m at the hub, with leaf nodes obtained from entity mentions reachable by AMR graph traversal from m, as well as AMR node attributes such as entity type. A subset of the leaf nodes are selected as m's collaborators using rules presented in the following subsections. Note that while we only evaluate linking of PER, ORG, and GPE entities, collaborators may be of any type. We also outline preliminary efforts to use AMR to create sets of coherent entity mentions.
In each of the following subsections we describe elements of AMR useful for context representation in EL. For each element we explain how our current system makes use of it (primarily, by using it to add entity mentions to a particular entity mention's set of collaborators). In doing so, we mainly refer to several examples from political discussion forums about "Mitt Romney", "Ron Paul" and "Gary Johnson". Their AMR graphs are depicted in Figure 1.

Entity Nodes
Each AMR node represents an entity mention, and contains its canonical name as inferred from sentential context. This property is called name expansion. Consider the following sentence: "Indonesia lies in a zone where the Eurasian, Philippine and Pacific plates meet and occasionally shift, causing earthquakes and sometimes generating tsunamis.". Here, the nodes representing the three plates will be labeled as "Eurasian Plate", "Philippine Plate" and "Pacific Plate" respectively, even though these strings do not occur in the sentence. Note that these labels may be recovered primarily by appealing to syntactic reasoning, without consulting a KB. In our implementation we consider these expanded names as mentions (these strings supersede raw mentions as input to the salience based candidate enumeration (Section 5.2)). Because the initial enumeration of entity candidates depends heavily on the mention's surface form, independent of context, name expansion will help us link "Philippine" to "Philippine Sea Plate" as opposed to the country.
An AMR node also contains an entity type. AMR defines 8 main entity types (Person, Organization, Location, Facility, Event, Product, Publication, Natural object) plus Other, and over one hundred fine-grained subtypes. For example, company, government organization, military, criminal organization, political party, school, university, research institute, team and league are subtypes of organization. The fine-grained entity types defined in AMR help us restrict KB entity candidates for a given mention by encouraging entity type matching. For example, in the sentence "[...] year but the bulava nuclear-armed missile developed to equip the submarine has failed tests and the deployment prospects are uncertain.", AMR labels "Yuri dolgoruky" as a product instead of a person. We manually mapped AMR entity types to equivalent DBpedia types to inform type matching restrictions. However, to make our context comparison algorithm less dependent on the quality of this mapping, and on automatic AMR name type assignment, we add a mention's type to its collaborators. In future work we plan to investigate the effects of different type matching techniques with varying degrees of strictness.

Semantic Roles
AMR defines core roles based on the OntoNotes (Hovy et al., 2006) semantic role layer. Each predicate is associated with a sense and frame description. If a target entity mention m and a context entity mention n are both playing core roles for the same predicate, we consider n as a collaborator of m. Consider the following post: "Did Palin apologize to Giffords? He needs to conduct a beer summit between Palin and NBC.". We add "Giffords" and "NBC" as collaborators of "Palin", because they play core roles alongside "Palin" in the "apologize-01" and "meet-03" events, respectively. AMR defines new core semantic roles which did not exist in PropBank (Palmer et al., 2005), NomBank (Meyers et al., 2004), or OntoNotes (Hovy et al., 2006). Intuitively, the following special roles should provide discriminative collaborators:
• The ARG2 role of the have-org-role-91 frame indicates the title held by an entity (ARG0), such as President and Governor, within a particular organization (ARG1).
• ARG2 and ARG3 of have-rel-role-91 are used to describe two related entities of the same type, such as family members.
AMR defines a rich set of general semantic relations through non-core semantic roles. We choose the following subset of non-core roles to provide collaborators for entity mentions: domain, mod, cause, concession, condition, consist-of, extent, part, purpose, degree, manner, medium, instrument, ord, poss, quant, subevent, subset, topic.
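The role-based selection rule described in this subsection can be sketched in a few lines. This is a minimal illustration, not the system's implementation: it assumes the AMR graph has already been flattened into hypothetical (head concept, role, mention) triples, and the ARG numbers in the toy input are illustrative rather than taken from the actual frame files.

```python
import re
from collections import defaultdict

# Non-core roles listed in the text as collaborator sources.
NON_CORE_ROLES = {
    "domain", "mod", "cause", "concession", "condition", "consist-of",
    "extent", "part", "purpose", "degree", "manner", "medium",
    "instrument", "ord", "poss", "quant", "subevent", "subset", "topic",
}
CORE_ROLE = re.compile(r"^ARG\d+$")


def role_collaborators(edges, target):
    """Return mentions that share a head concept with `target` via a selected role.

    `edges` is a hypothetical flattened AMR: (head, role, mention) triples.
    """
    by_head = defaultdict(set)
    for head, role, mention in edges:
        if CORE_ROLE.match(role) or role in NON_CORE_ROLES:
            by_head[head].add(mention)
    collaborators = set()
    for mentions in by_head.values():
        if target in mentions:
            collaborators |= mentions - {target}
    return collaborators


# Toy input modeled on the "Palin" post; role labels are assumptions.
edges = [
    ("apologize-01", "ARG0", "Palin"),
    ("apologize-01", "ARG2", "Giffords"),
    ("meet-03", "ARG0", "Palin"),
    ("meet-03", "ARG1", "NBC"),
    ("say-01", "ARG0", "Obama"),
]
print(sorted(role_collaborators(edges, "Palin")))
```

Treating core and non-core roles uniformly keeps the sketch short; a fuller version would also handle the special have-org-role-91 and have-rel-role-91 frames separately.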

Background Time and Location
AMR provides rich temporal and spatial information about entities and events. Types instantiated in AMR include time, year, month, day, source, destination, path and location. We exploit time and location entities as collaborators for entity mentions when they each play a role in the same predicate. For example, in the following post, the time role of the "die-01" event is "2008": "I just Read of Clark's death in 2008". We can link "Clark" to Arthur C Clark in the KB, which contains the triple ⟨Arthur C Clark, date-of-death, 2008-03-19⟩ (see Section 4). Similarly, it is very challenging to link the abbreviation "BMKG" in the following sentence to the correct target entity Indonesian Agency for Meteorology, Climatology and Geophysics: "It keeps on shaking. Jakarta BMKG spokesman Mujuhidin said". Here, "Jakarta" is added as a collaborator of "BMKG" since AMR labels it as the location of the organization, which facilitates the correct link because DBpedia lists Jakarta as the agency's headquarters.
Authors often assume that readers will infer implicit temporal information about events. In fact, half of the events extracted by information extraction (IE) systems lack time arguments (Ji et al., 2009). Therefore, if an AMR parse includes no time information, we use the document creation time as an additional collaborator for the mention in question. For example, knowing the document creation time "2005-06-05" can help us link "Hsiung Feng" in the sentence "The BBC reported that Taiwan has successfully test fired the Hsiung Feng, its first cruise missile." to Hsiung Feng IIE, which was deployed in 2005. Similarly, we include the document creation location as a global collaborator.

Coreference
For linking purposes, we treat a coreferential chain of mentions as a single "mention". In doing so, the collaborator set for the entire chain is computed as the union over all of the chain's mentions' collaborator sets. From here on we refer to a coreferential chain of mentions as simply a "mention".
AMR currently only represents sentence-level coreference resolution. In order to construct a knowledge network across sentences, we use the following heuristic rules. If two names have a substring match (on a token-wise basis with stop words removed), or one name consists of the initials of another in all capital letters, then we mark them as coreferential. We replace all names in a coreferential chain with their canonical name, which may have been derived via name expansion (Section 3.1): full names for people and abbreviations for organizations.
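The two cross-sentence heuristics above can be sketched as follows. This is one possible reading under stated assumptions: "substring match on a token-wise basis" is interpreted as the shorter token sequence occurring contiguously in the longer one, the stop-word list is illustrative, and initials are checked both with and without stop words because the text does not say which is used.

```python
STOP_WORDS = {"the", "of", "and", "for", "in", "a", "an"}  # illustrative list


def content_tokens(name):
    """Lower-cased tokens of a name with stop words removed."""
    return [t for t in name.lower().split() if t not in STOP_WORDS]


def is_initialism(short, long):
    """True if `short` is all caps and spells the initials of `long`."""
    words = long.split()
    content = [w for w in words if w.lower() not in STOP_WORDS]
    candidates = {
        "".join(w[0] for w in ws).upper()
        for ws in (words, content)
        if len(ws) > 1
    }
    return short.isupper() and short in candidates


def coreferential(a, b):
    """Apply the token-wise substring rule, then the initials rule."""
    ta, tb = content_tokens(a), content_tokens(b)
    shorter, longer = sorted((ta, tb), key=len)
    n = len(shorter)
    contiguous = n > 0 and any(
        longer[i:i + n] == shorter for i in range(len(longer) - n + 1)
    )
    return contiguous or is_initialism(a, b) or is_initialism(b, a)


print(coreferential("Romney", "Mitt Romney"))
print(coreferential("SCOTUS", "Supreme Court of the United States"))
print(coreferential("Ron Paul", "Gary Johnson"))
```

Rules this permissive will also merge unrelated names that share a token, so in practice they would likely be restricted to mentions within the same document.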

Knowledge Networks for Coherent Mentions
AMR defines a rich set of conjunction relations: "and", "or", "contrast-01", "either", "compared to", "prep along with", "neither", "slash", "between" and "both". These relations are often expressed between entities that have other relations in common. We therefore group mentions connected by conjunction relations into sets of coherent mentions. This representation is used only in preliminary experiments on collective entity linking. Figure 2 shows the expanded knowledge network that includes results from individual networks for each of the coherent mentions from the walkthrough example. For each coherent set, we merge the knowledge networks of all of its mentions.
[...], and the edges represent relations. We use this structure for context representation for entities, which together with context representation for mentions (Section 3) feeds re-ranking based on context comparison. The KB is formally represented by triples ⟨Entity, EdgeLabel, Node⟩, where Entity is the entity's unique identifier, EdgeLabel is the relation type, and Node is the corresponding relation value: either another Entity or a constant. These triples are derived from typed relations expressed within Wikipedia infoboxes, Templates, and Categories, untyped hyperlinks within Wikipedia article text, typed relations within DBpedia (dbpedia.org) and Freebase (www.freebase.com), and Google's "people also searched for" list. Figure 3 shows a portion of the KB pertaining to the example in Figure 1.
In order to merge nodes from multiple KBs, we use the Wikipedia title as a primary key, and then use DBpedia wikiPageID and Freebase Key relations.
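As a rough sketch of the triple representation, the snippet below groups (Entity, EdgeLabel, Node) triples into star-shaped networks keyed by entity identifier, which stands in for the Wikipedia title used as the merge key. The function names are hypothetical, and apart from the date-of-death triple quoted earlier in the text, the edge labels and values are invented for illustration.

```python
from collections import defaultdict


def build_kb_networks(triples):
    """Group (entity, edge_label, node) triples by entity identifier."""
    networks = defaultdict(set)
    for entity, edge_label, node in triples:
        networks[entity].add((edge_label, node))
    return networks


def leaf_nodes(networks, entity):
    """Collaborator set of an entity: leaf values with edge labels dropped."""
    return {node for _, node in networks.get(entity, set())}


# Triples from different sources merge naturally when they share the same key.
triples = [
    ("Arthur C Clark", "date-of-death", "2008-03-19"),   # from the text
    ("Ron Paul", "infobox: party", "Republican"),        # invented edge label
    ("Ron Paul", "hyperlink", "Texas"),                  # invented
]
kb = build_kb_networks(triples)
print(sorted(leaf_nodes(kb, "Ron Paul")))
```

Dropping edge labels in `leaf_nodes` mirrors the label-agnostic comparison used for re-ranking; the labels remain available in `kb` for inspection.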

Overview
In this section we present our detailed algorithm to link each mention to a KB entity using a simple similarity measure over knowledge networks. Recall that a rule-based method has already been employed to construct star-shaped knowledge networks for individual mentions and entities (see Sections 3 and 4; a KB knowledge network is the subnetwork of the entire KB centered at a candidate entity).
For each mention to be linked, an initial list of candidate entities is enumerated based on entity salience with respect to the mention, independent of mention context (Section 5.2). Context collaborator re-ranking proceeds in an unsupervised fashion, agnostic to knowledge network edge labels, using the Jaccard similarity measure computed between the mention and each entity, taking their collaborator sets as inputs (Section 5.3). In Section 5.4 we also describe Context Coherence re-ranking in terms of KB knowledge networks only, which constitutes a preliminary step toward unsupervised collective entity linking based on the notion of coherence described in Section 3.5. We leave a combination of the two re-ranking approaches to future work.

Salience
We use commonness (Medelyan and Legg, 2008) as a measure of context independent salience for each mention m, to generate an initial ranked list of candidate entities $E = (e_1, \ldots, e_N)$, where N is the cutoff for the number of candidates. In all experiments, we used N = 15, which gives an oracle accuracy of 97.58%.
$$\mathrm{Commonness}(m, e) = \frac{\mathrm{count}(m, e)}{\sum_{e'} \mathrm{count}(m, e')}$$
Here, $\mathrm{count}(m, e)$ is the number of hyperlinks with anchor text m and entity e within all of Wikipedia. As illustrated in Figure 3, using this salience measure "Romney" is successfully linked to Mitt Romney. For the mention "Paul", the politician Ron Paul is ranked second (less popular than the musician Paul McCartney). For the mention "Johnson", the correct entity Gary Johnson is ranked ninth, after more popular entities such as Lyndon B. Johnson and Andrew Johnson.
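A small sketch of commonness-based candidate enumeration is given below, assuming anchor-text statistics have already been collected into a mapping from mention string to per-entity hyperlink counts. The function name and the counts are invented for illustration and do not reflect real Wikipedia statistics.

```python
from collections import Counter


def commonness_ranking(anchor_counts, mention, n=15):
    """Rank candidate entities for `mention` by count(m, e) / sum of count(m, e')."""
    counts = anchor_counts.get(mention, Counter())
    total = sum(counts.values())
    if total == 0:
        return []  # mention never observed as anchor text
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(entity, count / total) for entity, count in ranked[:n]]


# Invented counts: hyperlinks whose anchor text is "Paul", grouped by target entity.
anchor_counts = {
    "Paul": Counter({"Paul McCartney": 60, "Ron Paul": 30, "Paul the Apostle": 10}),
}
for entity, score in commonness_ranking(anchor_counts, "Paul"):
    print(entity, score)
```

The cutoff `n` corresponds to N in the text (15 in the experiments); the scores are context independent, so this list only seeds the re-ranking steps.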

Context Collaborator Based Re-ranking
Context collaborator based re-ranking is driven by the similarity between mention and entity knowledge networks. We construct a knowledge network $g(m)$ for each mention m, and a knowledge network $g(e_i)$ for each entity candidate $e_i$ in m's entity candidate list E. We re-rank E according to Jaccard Similarity, which computes the similarity between $g(m)$ and $g(e_i)$:
$$\mathrm{Jaccard}(g(m), g(e_i)) = \frac{|g(m) \cap g(e_i)|}{|g(m) \cup g(e_i)|}$$
Note that the edge labels (e.g., nominate-01 for a mention, or infobox: religion for an entity) are ignored, as the similarity metric operates over sets of collaborators (leaf nodes in the knowledge networks). For set intersection and union computation, elements are treated as lists of lower-cased tokens with stop words removed, and two elements are considered equal if and only if they have one or more tokens in common. Due to the support from their neighbor Republican in the KB (Figure 3), which matches the neighbor "Republican" of mentions "Paul" and "Johnson" (Figure 2), Ron Paul and Gary Johnson are promoted to first and third place respectively. Gary Johnson is still behind two former U.S. presidents, Andrew Johnson and Lyndon B. Johnson, who also share the neighbor Republican in the KB.
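The re-ranking step can be sketched as below. Because token-overlap equality is not transitive, the intersection and union sizes are not uniquely determined by the description; this sketch assumes the intersection counts mention-side elements matched by at least one entity-side element, and takes the union as the two sizes minus that count. The stop-word list and the toy collaborator sets are invented.

```python
STOP_WORDS = {"the", "of", "and", "in", "a", "an"}  # illustrative list


def tokenize(element):
    """Lower-cased token set of a collaborator string, stop words removed."""
    return frozenset(t for t in element.lower().split() if t not in STOP_WORDS)


def soft_jaccard(mention_collabs, entity_collabs):
    """Jaccard similarity where two elements are equal if they share a token."""
    a = [tokenize(x) for x in mention_collabs]
    b = [tokenize(x) for x in entity_collabs]
    # Assumption: intersection = elements of a matched by some element of b.
    matched = sum(1 for x in a if any(x & y for y in b))
    union = len(a) + len(b) - matched
    return matched / union if union else 0.0


def rerank(mention_collabs, candidates):
    """Sort candidate entities by similarity; ties keep their input order."""
    return sorted(
        candidates,
        key=lambda e: soft_jaccard(mention_collabs, candidates[e]),
        reverse=True,
    )


mention_collabs = {"Republican", "Mitt Romney"}        # collaborators of "Paul"
candidates = {                                         # invented KB leaf nodes
    "Paul McCartney": {"The Beatles", "Liverpool"},
    "Ron Paul": {"Republican Party", "Texas"},
}
print(rerank(mention_collabs, candidates))
```

Passing candidates in commonness order means the stable sort falls back to salience when similarities tie, which is one reasonable tie-breaking choice among several.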

Context Coherence Based Re-ranking
Context coherence based re-ranking is driven by the similarity among KB entities. Let $R_m$ be a set of coherent entity mentions, and $R_E$ be the set of corresponding entity candidate lists, which are generated according to salience. Given $R_E$, we generate every combination of possible top candidate lists for the mentions in $R_m$, and denote the set of these combinations $C_m$. Formally, $C_m$ is the Cartesian product of all candidate lists $E \in R_E$. In the walk-through example, $R_m$ contains ["Romney", "Paul", "Johnson"], and $C_m$ contains [Mitt Romney, Ron Paul, Gary Johnson], [Mitt Romney, Paul McCartney, Lyndon Johnson], etc. We compute coherence for each combination $c \in C_m$ as Jaccard Similarity, by applying a form of Equation 5.3 generalized to take any number of arguments to the set of knowledge networks for all entities in c, i.e., $\{g(e) \mid e \in c\}$. The highest similarity combination is selected, yielding a top candidate for each $m \in R_m$. For example, compared to Andrew Johnson and Lyndon Johnson, Gary Johnson is more coherently connected with Mitt Romney and Ron Paul, and is therefore promoted to first place by the coherence measure.
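The combination search can be sketched with a Cartesian product over candidate lists. The text does not spell out the generalized Jaccard, so this sketch assumes the natural n-ary form (size of the intersection of all sets over size of their union) and uses exact element equality for brevity. The toy knowledge networks are invented.

```python
from itertools import product


def generalized_jaccard(sets):
    """|intersection of all sets| / |union of all sets| (assumed n-ary form)."""
    sets = [set(s) for s in sets]
    union = set().union(*sets)
    if not sets or not union:
        return 0.0
    return len(set.intersection(*sets)) / len(union)


def most_coherent(candidate_lists, networks):
    """Pick one entity per mention so that the chosen networks overlap most."""
    best = max(
        product(*candidate_lists),
        key=lambda combo: generalized_jaccard([networks[e] for e in combo]),
    )
    return list(best)


# Candidate lists for ["Romney", "Paul", "Johnson"], in salience order.
candidate_lists = [
    ["Mitt Romney"],
    ["Paul McCartney", "Ron Paul"],
    ["Lyndon B. Johnson", "Gary Johnson"],
]
networks = {  # invented leaf nodes
    "Mitt Romney": {"Republican", "2012 election"},
    "Ron Paul": {"Republican", "2012 election", "Texas"},
    "Gary Johnson": {"Republican", "2012 election", "New Mexico"},
    "Paul McCartney": {"The Beatles"},
    "Lyndon B. Johnson": {"Democratic", "Texas"},
}
print(most_coherent(candidate_lists, networks))
```

The number of combinations grows as N to the power of the coherent set's size, so exhaustive enumeration like this is only practical for small coherent sets or short candidate lists.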

Data And Scoring Metric
For our experiments we use a publicly available AMR R3 corpus (LDC2013E117) that includes manual EL annotations for all entity mentions (LDC2014E15).
For evaluation we used all the discussion forum posts (DF), and one tenth of the news documents (News), selected after sorting by document ID in alphabetical order. The detailed data statistics are presented in Table 1. For each mention, we check whether the KB entity returned by an approach is correct or not. We compute accuracy for an approach as the proportion of mentions correctly linked.

Experiment Results
We focus primarily on context collaborator based reranking results. We compare our results with several baseline and state-of-the-art approaches in Table 2. In Table 3 we present preliminary results for collective linking.
Our Unsupervised Context Collaborator Approach substantially outperforms the popularity based methods. More importantly, we see that AMR provides the best context representation for collaborator selection. Even system (automatically parsed) AMR outperforms not only the baseline co-occurrence based collaborator selection methods, but also the collaborator selection method based on human annotated core semantic roles. Figure 4 depicts how accuracy increases as more AMR annotation is used in selecting collaborators. From the commonness baseline, additional knowledge about individual names leads to substantial gains, followed by additional gains after incorporating links denoting semantic roles. Note that coreference here includes cross-sentence coreference not based on AMR (Section 3.4). Furthermore, the results using human annotated AMR outperform the state-of-the-art supervised methods trained from a large scale EL training corpus, which rely on collective inference. These results all verify the importance of incorporating a wider range of deep knowledge. Finally, Table 2 also covers the setting in which the context coherence method is used where possible (i.e., for those 215 mentions that are members of coherent sets according to our criteria as described in Section 3.5), and the context collaborator approach based on human AMR annotation is applied elsewhere.
Method descriptions from Table 2: the supervised system is state-of-the-art supervised re-ranking using multi-level linguistic features for collaborators and collective inference, trained from 20,000 entity mentions from TAC-KBP2009-2014; we combined two systems (Chen and Ji, 2011; Cheng and Roth, 2013). Human SRL uses human annotated core semantic roles defined in PropBank (Palmer et al., 2005) and NomBank (Meyers et al., 2004).
Table 3 focuses on the 215 mentions that met our narrow criteria for forming a coherent set of mentions. We applied the context coherence based re-ranking method (Section 5.4) to collectively link those mentions.
This approach substantially outperforms the co-occurrence baseline, and even outperforms the context collaborator approach applied to those 215 mentions, especially for discussion forum data.

Remaining Error Analysis and Discussion
A challenging source of errors pertains to the knowledge gap between the source text and KB. News and social media are source text genres that tend to focus on new information, trending topics, breaking events, or even mundane details about the entity. In contrast, the KB usually provides a snapshot summarizing only the entity's most representative and important facts. A source-KB similarity driven approach alone will not suffice when a mention's context differs substantially from anything on the KB side. AMR annotation's synthesis of words and phrases from the surface texts into concepts only provides a first step toward bridging the knowledge gap. Successful linking may require (1) reasoning using general knowledge, or (2) retrieval of other sources that contain additional useful linking information. Table 4 illustrates two relevant examples that our system does not correctly link.
Table 4: Examples of Knowledge Gap.
Type | Source | Knowledge Base
General Knowledge | [Christies]m denial of marriage privledges to gays will alienate independents and his "I wanted to have the people vote on it" will ring hollow. | [Chris Christie]e has said that he favoured New Jersey's law allowing same-sex couples to form civil unions, but would veto any bill legalizing same-sex marriage in New Jersey.
External Knowledge | Translation out of hype-speak: some kook made threatening noises at [Brownback]m and go arrested. | [Samuel Dale "Sam" Brownback]e (born September 12, 1956) is an American politician, the 46th and current Governor of Kansas.
In the first example, if we don't already know that Christie is the topic of discussion, as humans we might use our general knowledge that "governors veto bills" to pick the correct entity. Using this type of knowledge presents interesting challenges (e.g., governors don't always veto bills, nor are they the only ones who can do so). In the second example, the rumor about this politician is not important enough to be reported in his Wikipedia page. We might first figure out, using cross-document coreference techniques, that a news article with the headline "Man Accused Of Making Threatening Phone Call To Kansas Gov. Sam Brownback May Face Felony Charge..." is talking about the same rumor. Then we might use biographical facts (e.g., Brownback is the governor of Kansas) from the article to enrich Brownback's knowledge network on the source side.
Sometimes helpful neighbor concepts are omitted because the current collaborator selection criteria are too restrictive. For example, "armed" and "conflicts" are informative words for linking "The Stockholm Institute" to Stockholm International Peace Research Institute in the following sentence: "The Stockholm Institute stated that 23 of 25 major armed conflicts in the world in 2000 occurred in impoverished nations.", but they were not selected as context collaborators. In addition, our cross-sentence coreference resolution is currently limited to proper names. Expanding it to include nominals could further enrich context collaborators to overcome some remaining errors. For example, in the sentence "The first woman to serve on SCOTUS", if we know "The first woman" is coreferential with "Sandra Day O'Connor" in the previous sentence, we can link "SCOTUS" to Supreme Court of the United States instead of Scotus College.

Conclusions and Future Work
EL requires a representation of the relations among entities in text. We showed that the Abstract Meaning Representation (AMR) can better capture and represent the contexts of entity mentions for EL than previous approaches. We plan to improve AMR representation as well as automatic annotation. We showed that AMR enables EL performance comparable to the supervised state of the art using an unsupervised, non-collective approach. We plan to combine collaborator and coherence methods into a unified approach, and to use edge labels in knowledge networks for context comparison (note that the last of these is quite challenging due to normalization, polysemy, and semantic distance issues). We have only applied a subset of AMR representations to the EL task, but we aim to explore how more AMR knowledge can be used for other more challenging Information Extraction and Knowledge Base Population tasks.