Improving Entity Linking through Semantic Reinforced Entity Embeddings

Entity embeddings, which, like word embeddings, represent the different aspects of each entity with a single vector, are a key component of neural entity linking models. Existing entity embeddings are learned from canonical Wikipedia articles and from the local contexts surrounding target entities. Such entity embeddings are effective, but too distinctive for linking models to learn contextual commonality. We propose a simple yet effective method, FGS2EE, to inject fine-grained semantic information into entity embeddings to reduce their distinctiveness and facilitate the learning of contextual commonality. FGS2EE first uses the embeddings of semantic type words to generate semantic embeddings, and then combines them with existing entity embeddings through linear aggregation. Extensive experiments show the effectiveness of such embeddings. Based on our entity embeddings, we achieved new state-of-the-art performance on entity linking.


Introduction
Entity Linking (EL), or Named Entity Disambiguation (NED), is the task of automatically resolving the ambiguity of entity mentions in natural language by linking them to concrete entities in a Knowledge Base (KB). For example, in Figure 1, the mentions "Congress" and "Mr. Mueller" are linked to their corresponding Wikipedia entries.
Neural entity linking models use local and global scores to rank and select a set of entities for the mentions in a document. Entity embeddings are critical for the local and global score functions. But current entity embeddings (Ganea and Hofmann, 2017) encode too many details of entities and are thus too distinctive for linking models to learn contextual commonality.

We hypothesize that fine-grained semantic types of entities can help linking models learn contextual commonality about semantic relatedness. For example, rugby-related documents would contain entities of type rugby player and rugby team. If a linking model learns the contextual commonality of rugby-related entities, it can correctly select entities of similar types using similar contextual information.
In this paper, we propose a method, FGS2EE, to inject fine-grained semantic information into entity embeddings to reduce their distinctiveness and facilitate the learning of contextual commonality. FGS2EE uses the word embeddings of semantic words that represent the hallmarks of entities (e.g., writer, carmaker) to generate semantic embeddings. We find that training converges faster when using semantic reinforced entity embeddings.
Our proposed FGS2EE consists of four steps: (i) building a dictionary of fine-grained semantic type words; (ii) extracting semantic type words for each entity; (iii) generating a semantic entity embedding from the word embeddings of those type words; (iv) combining the semantic embedding with the existing entity embedding through linear aggregation.

The local score Ψ(e_i, c_i) (Ganea and Hofmann, 2017) measures the relevance of the entity candidates of each mention independently:

Ψ(e_i, c_i) = e_i^T B f(c_i)

where e_i ∈ R^d is the embedding of candidate entity e_i, f(c_i) ∈ R^d is a function of the local context c_i, and B ∈ R^{d×d} is a parameter matrix. In addition to the local score, the global score adds a pairwise score Φ(e_i, e_j, D) to take the coherence of the entities in document D into account:

Φ(e_i, e_j, D) = e_i^T C e_j

where e_i, e_j ∈ R^d are the embeddings of entities e_i, e_j, which are candidates for mentions m_i and m_j respectively, and C ∈ R^{d×d} is a diagonal matrix. The pairwise score of Le and Titov (2018) considers K relations between entities:

Φ(e_i, e_j, D) = Σ_{k=1}^{K} α_{ijk} e_i^T R_k e_j

where α_{ijk} is the weight for relation k, and R_k is a diagonal matrix measuring relation k between two entities.
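The relation-aware pairwise score can be sketched in a few lines; a minimal numpy sketch, assuming each diagonal matrix R_k is stored as its diagonal vector (the function name and data layout are our own, not from the paper):

```python
import numpy as np

def pairwise_score(e_i, e_j, alpha, R):
    # Phi(e_i, e_j, D) = sum_k alpha_ijk * e_i^T R_k e_j.
    # With a diagonal R_k, e_i^T R_k e_j reduces to sum(e_i * r_k * e_j),
    # where r_k is the diagonal of R_k.
    return sum(a * float(np.sum(e_i * r * e_j)) for a, r in zip(alpha, R))
```

With K = 1 and R_1 = I this degenerates to a weighted dot product between the two entity embeddings, which is the single-relation case.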

Related Work
Our research focuses on improving the vector representations of entities through fine-grained semantic types. Related topics are as follows.
Entity Embeddings Similar to word embeddings, entity embeddings are vector representations of entities. The methods of Yamada et al. (2016), Fang et al. (2016), and Zwicklbauer et al. (2016) use data about entity-entity co-occurrences to learn entity embeddings and often suffer from the sparsity of co-occurrence statistics. Ganea and Hofmann (2017) learned entity embeddings using words from canonical Wikipedia articles and from the local context surrounding anchor links. They used the Word2Vec vectors (Mikolov et al., 2013) of positive words and random negative words as input to the learning objective. Thus their entity embeddings are aligned with the Word2Vec word embeddings.
Fine-grained Entity Typing Fine-grained entity typing is the task of classifying entities into fine-grained types (Ling and Weld, 2012) or ultra fine-grained semantic labels (Choi et al., 2018). Bhowmik and de Melo (2018) used a memory-based network to generate a short description of an entity; e.g., "Roger Federer" is described as "Swiss tennis player". In this paper, we heuristically extract fine-grained semantic types from the first sentence of Wikipedia articles.
Embeddings Aggregation Our research is closely related to work on the aggregation and evaluation of the information content of embeddings from different sources (e.g., polysemous words have multiple sense embeddings), and on the fusion of multiple data sources (Wang et al., 2018). Arora et al. (2018) hypothesize that the global embedding of a word is a linear combination of its sense embeddings; they showed that senses can be recovered through sparse coding. Mu et al. (2017) showed that senses and word embeddings are linearly related and that sense sub-spaces tend to intersect over a line. Yaghoobzadeh et al. (2019) probe the aggregated word embeddings of polysemous words for semantic classes. They created the WIKI-PSE corpus, in which word and semantic class pairs are annotated using Wikipedia anchor links; e.g., "apple" has two semantic classes: food and organization. A separate embedding for each semantic class was learned based on the WIKI-PSE corpus. They found that the linearly aggregated embeddings of polysemous words represent their semantic classes well.
The most similar work is that of Gupta et al. (2017), but there are several differences: (i) they use the FIGER (Ling and Weld, 2012) type taxonomy, which contains 112 manually curated types organized into 2 levels, whereas we employ over 3,000 vocabulary words as types and treat them as a flat list; (ii) they mapped Freebase types to FIGER types, but this mapping is less credible, as noted by Gillick et al. (2014), whereas we extract type words directly from Wikipedia articles, which is more reliable; (iii) their entity vectors and type vectors are learned jointly on a limited corpus, whereas ours are linear aggregations of existing entity vectors and word vectors learned from a large corpus. Such fine-grained semantic word embeddings are helpful for capturing informative context.

Motivation
Coarse-grained semantic types (e.g., person) have been used for candidate selection (Ganea and Hofmann, 2017). We observe that fine-grained semantic words appear frequently as appositions (e.g., Defense contractor Raytheon), coreferences (e.g., the company) or anonymous mentions (e.g., American defense firms). These fine-grained types can help capture the local contexts and relations of entities.
Some of these semantic words have been used for learning entity embeddings, but they are diluted by other unimportant or noisy words.We reinforce entity embeddings with such fine-grained semantic types.

Extracting Fine-grained Semantic Types
We first create a dictionary of fine-grained semantic types, then we extract fine-grained types for each entity.

Semantic Type Dictionary
We select words that can encode the hallmarks of individual entities. Desiderata are as follows:
• profession/subject, e.g., footballer, soprano, biology, rugby.
• title, e.g., president, ceo, head, director.
We extract noun frequencies from the first sentence of each entity's article in the Wikipedia dump. Some seed words are then manually selected from the frequent nouns. We use word similarity to extend these seed words and finally obtain a dictionary of 3,227 fine-grained semantic words.
Specifically, we use spaCy to compute the similarity between words. For each seed word, we find the top 100 most similar words that also appear in Wikipedia articles. We then manually select semantic words from these extended words.
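The seed-extension step amounts to a nearest-neighbour search in word-vector space. A minimal sketch with plain numpy, abstracting spaCy's similarity to a dot product over a word-to-unit-vector table (the function name and the `vectors` layout are our assumptions, not the paper's code):

```python
import numpy as np

def extend_seeds(seed, vectors, top_n=100):
    # Rank all other vocabulary words by cosine similarity to the seed word.
    # Vectors are assumed unit-normalised, so the dot product is the cosine.
    seed_vec = vectors[seed]
    sims = {w: float(seed_vec @ v) for w, v in vectors.items() if w != seed}
    return sorted(sims, key=sims.get, reverse=True)[:top_n]
```

The candidates returned this way would still be filtered manually and restricted to words that appear in Wikipedia articles, as described above.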

Extracting Semantic Types
For each entity, we extract at most 11 dictionary words (or phrases) from its Wikipedia article. For example, "Robert Mueller" in Figure 1 is typed as [american, lawyer, government, official, director].
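A minimal sketch of this extraction step as we read it; the paper only states that at most 11 dictionary words are kept, so the exact matching heuristic (lowercased token lookup in order of appearance) is our assumption:

```python
def extract_types(first_sentence, type_dict, max_types=11):
    # Keep dictionary words from the entity's first sentence,
    # in order of appearance, up to max_types of them.
    types = []
    for token in first_sentence.lower().split():
        word = token.strip(".,;:()")
        if word in type_dict and word not in types:
            types.append(word)
        if len(types) == max_types:
            break
    return types
```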

Remapping Semantic Words
For some semantic words (e.g., conchologist) or semantic phrases (e.g., rugby league), no word embeddings are available for generating the semantic entity embeddings. We remap these words to semantically similar words that are more common. For example, conchologist is remapped to zoologist.

FGS2EE: Injecting Fine-Grained Semantic Information into Entity Embeddings
FGS2EE first uses the semantic words of each entity to generate a semantic entity embedding, and then combines it with the existing entity embedding to generate a semantic reinforced entity embedding.

Semantic Entity Embeddings
Based on the semantic words of each entity, we can produce a semantic entity embedding. We treat each semantic word as a sense of the entity; the embedding of each sense is the Word2Vec embedding of the semantic word. Suppose we consider at most T semantic words for each entity, and denote the set of semantic words of entity e as S_e (|S_e| ≤ T). Then the semantic entity embedding e_s of entity e is generated by averaging:

e_s = (1 / |S_e|) Σ_{w_i ∈ S_e} e_{w_i}    (1)

where w_i ∈ S_e is the i-th semantic word and e_{w_i} is the Word2Vec embedding of w_i.
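Assuming the semantic entity embedding is a simple average of the semantic words' Word2Vec vectors, the computation is a one-liner; `word_vectors` is a hypothetical word-to-vector mapping:

```python
import numpy as np

def semantic_entity_embedding(semantic_words, word_vectors):
    # Average the Word2Vec vectors of the entity's semantic words;
    # words without a vector are skipped (they are remapped beforehand).
    vecs = [word_vectors[w] for w in semantic_words if w in word_vectors]
    return np.mean(vecs, axis=0)
```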

Semantic Reinforced Entity Embeddings
We create a semantic reinforced embedding for each entity by linearly aggregating the semantic entity embeddings and the Word2Vec-style entity embeddings of Ganea and Hofmann (2017) (hereafter referred to as "Wikitext entity embeddings"). Our semantic entity embeddings tend to be homogeneous. If we simply averaged them with the Wikitext embeddings, the aggregated embeddings would be homogeneous too, and the entity linking model would not be able to distinguish between similar candidates. Our semantic reinforced entity embedding is therefore a weighted sum of the semantic entity embedding and the Wikitext entity embedding, similar to (Yaghoobzadeh et al., 2019). We use a parameter α to control the weight of the semantic entity embeddings. Thus the aggregated (semantic reinforced) entity embeddings achieve a trade-off between homogeneity and heterogeneity.
e_r = α e_s + (1 − α) e_w    (2)

where e_w is the Wikitext entity embedding of entity e and e_r is its semantic reinforced embedding.
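Assuming the aggregation is the weighted sum described above (with weight α on the semantic embedding and 1 − α on the Wikitext embedding), it can be sketched as:

```python
import numpy as np

def semantic_reinforced_embedding(e_s, e_w, alpha=0.2):
    # Weighted sum: alpha controls the weight of the semantic embedding,
    # trading off homogeneity (semantic) against heterogeneity (Wikitext).
    return alpha * e_s + (1.0 - alpha) * e_w
```

A small α keeps most of the distinctive Wikitext information while nudging entities of the same fine-grained type toward each other.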

Datasets and Evaluation Metric
We use the Wikipedia dump 20190401 to extract the fine-grained semantic type dictionary and the semantic types for entities. We use the Wikitext entity embeddings shared by Le and Titov (2018, 2019). For entity linking corpora, we use the datasets shared by Ganea and Hofmann (2017) and Le and Titov (2018, 2019). We use the standard micro F1-score as the evaluation metric. Our data and source code are publicly available at https://github.com/fhou80/EntEmb/.

Experimental Settings
The parameters T in Equation (1) and α in Equation (2) are critical for the effectiveness of our semantic reinforced entity embeddings. We obtained two sets of entity embeddings with two combinations of parameters: T = 6, α = 0.1 and T = 11, α = 0.2. To test the effectiveness of our semantic reinforced entity embeddings, we use the publicly available entity linking models mulrel (Le and Titov, 2018) (ment-norm, K = 3) and wnel (Le and Titov, 2019). We do not optimize their entity linking code; we just replace the entity embeddings with our semantic reinforced entity embeddings.
Similar to Ganea and Hofmann (2017) and Le and Titov (2018, 2019), we run our system 5 times for each combination of entity embeddings and linking model, and report the mean and 95% confidence interval of the micro F1 score. The results on the six test datasets are shown in Table 1. For the mulrel model, our entity embeddings (T = 11, α = 0.2) improved performance drastically on MSNBC, ACE2004 and the average of the out-domain test sets. Note that CWEB and WIKI are believed to be less reliable (Ganea and Hofmann, 2017). For the wnel model, both our sets of entity embeddings are more effective on four of the five out-domain test sets and on average. Our entity embeddings outperform those of Ganea and Hofmann (2017) when tested with the mulrel (Le and Titov, 2018) (ment-norm, K = 3) and wnel (Le and Titov, 2019) linking models. Ganea and Hofmann (2017) showed that their entity embeddings are better than those of Yamada et al. (2016) using entity relatedness metrics.

Results
One notable property of our semantic reinforced entity embeddings is that training converges much faster than with the Wikitext entity embeddings, as shown in Figure 2. One reasonable explanation is that the fine-grained semantic information lets the linking models capture the commonality of semantic relatedness between entities and contexts, hence facilitating training.
The properties of the two different sets of entity embeddings are visually manifested in Figure 3. Our semantic reinforced entity embeddings draw entities of similar types closer and push entities of different types further apart. For example, our semantic reinforced embeddings of "John F. Kennedy University" and "Harvard University" are closer than the corresponding Wikitext embeddings, while our embeddings of "John F. Kennedy International Airport" and "John F. Kennedy" are further apart. We believe this property contributes to the faster convergence.

Conclusion
In this paper, we presented a simple yet effective method, FGS2EE, to inject fine-grained semantic information into entity embeddings to reduce their distinctiveness and facilitate the learning of contextual commonality. FGS2EE first uses the word embeddings of semantic type words to generate semantic embeddings, and then combines them with existing entity embeddings through linear aggregation. Our entity embeddings draw entities of similar types closer, while entities of different types are pushed further apart. This can facilitate the learning of semantic commonalities about entity-context and entity-entity relations. We have achieved new state-of-the-art performance using our entity embeddings.
For future work, we plan to extract fine-grained semantic types from unlabelled documents and use the relatedness between fine-grained types and contexts as distant supervision for entity linking.

Figure 1: Entity linking with embedded fine-grained semantic types.

Figure 2: Learning curves of mulrel (Le and Titov, 2018) using two different sets of entity embeddings.

Table 1: F1 scores on the six test sets. The last column is the average of F1 scores on the five out-domain test sets.