Connecting Embeddings for Knowledge Graph Entity Typing

Knowledge graph (KG) entity typing aims at inferring possible missing entity type instances in a KG, which is a significant but still under-explored subtask of knowledge graph completion. In this paper, we propose a novel approach for KG entity typing which is trained by jointly utilizing local typing knowledge from existing entity type assertions and global triple knowledge in KGs. Specifically, we present two distinct, effective knowledge-driven mechanisms of entity type inference. Accordingly, we build two novel embedding models to realize the mechanisms. Afterward, a joint model that connects them is used to infer missing entity type instances, which favors inferences that agree with both entity type instances and triple knowledge in KGs. Experimental results on two real-world datasets (Freebase and YAGO) demonstrate the effectiveness of our proposed mechanisms and models for improving KG entity typing. The source code and data of this paper can be obtained from: https://github.com/Adam1679/ConnectE .


Introduction
The past decade has witnessed great progress in building web-scale knowledge graphs (KGs), such as Freebase (Bollacker et al., 2008), YAGO (Suchanek et al., 2007), and Google Knowledge Graph (Dong et al., 2014), which usually consist of a huge number of triples in the form of (head entity, relation, tail entity) (denoted (e, r, ẽ)). KGs usually suffer from incompleteness and miss important facts, jeopardizing their usefulness in downstream tasks such as question answering (Elsahar et al., 2018), semantic parsing (Berant et al., 2013), and relation classification (Zeng et al., 2014). Hence, the task of knowledge graph completion (KGC, i.e. completing knowledge graph entries) is highly significant and attracts wide attention.
This paper concentrates on KG entity typing, i.e. inferring missing entity type instances in KGs, which is an important sub-problem of KGC. Entity type instances, each of which is in the form of (entity, entity type) (denoted (e, t)), are essential entries of KGs and are widely used in many NLP tasks such as relation extraction (Zhang et al., 2018; Jain et al., 2018), coreference resolution (Hajishirzi et al., 2013), and entity linking (Gupta et al., 2017). Most previous work on KGC focuses on inferring missing entities and relationships (Bordes et al., 2013; Wang et al., 2014; Lin et al., 2015; Dettmers et al., 2017; Ding et al., 2018; Nathani et al., 2019), paying less attention to entity type prediction. However, KGs also usually suffer from entity type incompleteness. For instance, 10% of the entities in FB15k (Bordes et al., 2013) that have the /music/artist type miss the /people/person type (Moon et al., 2017). KG entity type incompleteness makes some type-dependent algorithms in KG-driven tasks grossly inefficient or even unusable.
To solve the KG entity type incompleteness issue, in this paper we propose a novel embedding methodology to infer missing entity type instances that not only employs local typing knowledge from entity type assertions, as most conventional models do, but also leverages global triple knowledge from KGs. Accordingly, we build two distinct knowledge-driven type inference mechanisms with these two kinds of structural knowledge.
arXiv:2007.10873v1 [cs.CL] 21 Jul 2020
Mechanism 1. Missing entity types of an entity can be found from other entities that are close to the entity in the embedding space, using local typing knowledge as in Fig. 1(Mech.1).
Mechanism 2. Missing entity types of a (head or tail) entity can be inferred from the types of other (tail or head) entities through their relationships, using global triple knowledge as in Fig. 1(Mech.2).
The main idea behind Mech.1 is based on the observation that the entity embeddings learned by conventional KG embedding methods (Ji et al., 2016; Xie et al., 2016) cluster well according to their types in vector space. For instance, in Fig. 1(Mech.1), given an entity Barack Obama, its missing hierarchical type /people/person can be induced from the given hierarchical type of the similar entity Donald Trump. In addition, the key motivation behind Mech.2 is that the relationship shall remain unchanged if the entities in a triple fact are replaced with their corresponding hierarchical types. For instance, given a global triple fact (Barack Obama, born in, Honolulu), under this assumption we can induce a new type triple (/people/person, born in, /location/location). Formally, Honolulu − Barack Obama = /location/location − /people/person (= born in), which can be used to infer missing entity types, e.g. (Barack Obama, type=?) via Barack Obama − Honolulu + /location/location = /people/person, as Mech.2 does. Fig. 1 illustrates both mechanisms of entity type inference.

Both mechanisms are utilized to build our final composite model. Specifically, we build two embedding models to realize the two mechanisms respectively. First, considering that entities and entity types are completely distinct objects, we build two distinct embedding spaces for them, i.e., entity space and entity type space. Accordingly, we encode an entity type instance (e, t) by projecting the entity from entity space to entity type space with a mapping matrix M; hence we have (1): $\mathbf{M} \cdot \mathbf{e} \simeq \mathbf{t}$, called E2T. Moreover, we learn the plausibility of a global type triple (t_e, r, t_ẽ) that is newly generalized from the global triple fact (e, r, ẽ), even though this type triple is not originally present. Following the translating assumption (Bordes et al., 2013), we have (2): $\mathbf{t}_{\tilde{e}} - \mathbf{r}^{\circ} \simeq \mathbf{t}_{e}$, called TRT. E2T and TRT are the implementation models of the two mechanisms. Fig. 2 gives a brief illustration of our models.

A ranking-based embedding framework is used to train our models. Thereby, entities, entity hierarchical types, and relationships are all embedded into low-dimensional vector spaces, where the composite energy score of E2T and TRT is computed and used to determine the optimal types for incomplete (entity, entity type=?) assertions. The experimental results on real-world datasets show that our composite model achieves significant and consistent improvement over all baselines in entity type prediction and achieves comparable performance in entity type classification.
Our contributions are as follows:
• We propose a novel framework for inferring missing entity type instances in KGs by connecting entity type instances and global triple information and correspondingly present two effective mechanisms.
• Under these mechanisms, we propose two novel embedding-based models: one for predicting entity types given entities and another one to encode the interactions among entity types and relationships from KGs. A combination of both models is utilized to conduct entity type inference.
• We conduct empirical experiments on two real-world datasets for entity type inference, which demonstrate our model can successfully take into account global triple information to improve KG entity typing.

Related Works
Entity typing is valuable for many NLP tasks (Yaghoobzadeh et al., 2018), such as knowledge base population (Zhou et al., 2018) and question answering (Elsahar et al., 2018). In recent years, researchers have attempted to mine fine-grained entity types (Yogatama et al., 2015; Choi et al., 2018; Xu and Barbosa, 2018; Yuan and Downey, 2018) with external text information, such as web search query logs (Pantel et al., 2012) and textual surface patterns (Yao et al., 2013). Despite their success, existing methods rely on additional external sources, which might not be available for some KGs.
To be more universal, Neelakantan et al. (2015) propose two embedding models, i.e. the linear model (LM) and the projection embedding model (PEM), which can infer missing entity types using only the KG itself. Although PEM has more expressive power than LM, both ignore global triple knowledge, which could also help encode entity type assertions via shared entity embeddings. To address this issue, Moon et al. (2017) propose a state-of-the-art model (ETE) that combines triple knowledge and entity type instances for entity type prediction, and build two entity type embedding methodologies: (1) Synchronous training: treat (entity, entity type) assertions as special triple facts that have a unique relationship "rdf:type", e.g. (Barack Obama, "rdf:type", person), and encode all mixed triple facts (original triple data fused with all generated special ones) with conventional entity relation embedding models, such as RESCAL (Nickel et al., 2011), HOLE (Nickel et al., 2016) and TransE (Bordes et al., 2013). (2) Asynchronous training: first learn the entity embeddings e with the conventional entity relation embedding models mentioned above, and then update only the entity type embeddings t to minimize $\|\mathbf{e} - \mathbf{t}\|_{1}$ while keeping e fixed; the resulting models are called RESCAL-ET, HOLE-ET, TransE-ET and ETE. Although these approaches aim to exploit global triple knowledge for entity type prediction, they still lack expressive ability because of the simplicity of their embeddings. In addition, they assume, without justification, that the embeddings of entities and entity types lie in the same latent space ($\in \mathbb{R}^{\kappa}$). Since entities and entity types are completely distinct objects, it may not be reasonable to represent them in a common semantic space.
In this paper, we introduce an enhanced KG entity type embedding model with better expressive and reasoning capability, considering both local entity typing information and global triple knowledge in KGs. Note that incorporating more external information (Jin et al., 2018; Neelakantan et al., 2015) is not the main focus of this paper, as we only consider the internal structural information in KGs, which makes our work more challenging because of the limited information, but also more universal and flexible. Recently, Lv et al. (2018) and Hao et al. (2019) also attempt to embed structural information in KGs. However, their goals and models are very different from ours: they encode concepts rather than hierarchical types, whereas we focus on hierarchical types. Table 1 summarizes the energy functions and other settings of entity type embedding models.

Embedding-based Framework
We consider a KG containing entity type instances of the form (e, t) ∈ H (H is the training set, consisting of many (entity, entity type) assertions), where e ∈ E (E is the set of all entities) is an entity in the KG with the type t ∈ T (T is the set of all types). For example, e could be Barack Obama and t could be /people/person. As a single entity can have multiple types, entities in a KG often miss some of their types. The aim of this work is to infer missing entity type instances in KGs.
Our work concerns energy-based methods, which learn low-dimensional vector representations (embeddings) of atomic symbols (i.e. entities, entity hierarchical types, relationships). In this framework, we learn two submodels: (1) one for predicting entity types given entities, and (2) another one to encode the interactions among entity types and relationships from KGs. The joint action of both models in prediction allows us to use the connection between triple knowledge and entity type instances to perform KG entity typing.

E2T: Mapping Entities to Types
The first model (E2T) of the framework concerns the learning of a function $S_{e2t}(e, t)$ with local typing knowledge from entity type instances, which is designed to score the similarity of an entity e and a type t. The main ideas behind this model are as follows: (1) Since the learned entity embeddings cluster well when they have the same or similar types, it is rather intuitive that the entity type embedding represents the projective common concept representation of a cluster of entities, i.e., $f_{proj}(\mathbf{e}) \simeq \mathbf{t}_{e}, \forall e \in E$. Here $\mathbf{e} \in \mathbb{R}^{\kappa}$ is the embedding of the entity e, and $\mathbf{t}_{e} \in \mathbb{R}^{\ell}$ is the embedding of the type $t_e$. Because the entity type embedding represents the common information of its entities, it should have fewer variates, i.e., $\ell < \kappa$. (2) Since entities and entity types are totally distinct objects, we build two separate embedding spaces for them, i.e., entity space and entity type space.
(3) Inspired by the previous work TranSparse (Ji et al., 2016), which projects entities from entity space to relation space with an operation matrix M, we adapt this idea by replacing relation space with entity type space, and thus define $f_{proj}(\mathbf{e}) = \mathbf{M} \cdot \mathbf{e}\ (\simeq \mathbf{t}_{e})$. Therefore, this model consists of first projecting the entity embedding into entity type space, and then computing a similarity measure between this projection and an entity type embedding. The scoring function of E2T for a pair (e, t), $S_{e2t}(e, t)$, thus measures how far the projection $\mathbf{M} \cdot \mathbf{e}$ lies from the type embedding $\mathbf{t}$, where $\mathbf{M} \in \mathbb{R}^{\ell \times \kappa}$ is a transfer matrix mapping entity embeddings into entity type space. The score is expected to be lower for a golden entity type instance and higher for an incorrect one.

(2) Since the relationship r remains unchanged under the replacement, we build two differentiated embeddings for the i-th relationship $r_i$ in the two embedding spaces: $\mathbf{r}_{i} \in \mathbb{R}^{\kappa}$ in entity space and $\mathbf{r}^{\circ}_{i} \in \mathbb{R}^{\ell}$ in entity type space.
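(3) Given an entity type triple (t_e, r, t_ẽ), under the translation assumption as in (Bordes et al., 2013), we have $\mathbf{t}_{\tilde{e}} - \mathbf{r}^{\circ} \simeq \mathbf{t}_{e}$. Hence, the scoring function of TRT measures how far $\mathbf{t}_{e} + \mathbf{r}^{\circ}$ lies from $\mathbf{t}_{\tilde{e}}$, where $\mathbf{t}_{e}, \mathbf{r}^{\circ}, \mathbf{t}_{\tilde{e}} \in \mathbb{R}^{\ell}$. The model returns a lower score if the two entity types are close under this relationship and a higher one otherwise. Fig. 2 shows an illustration of E2T and TRT.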
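The two energy functions described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the squared L2 distance is an assumption (the text only requires a distance that is low for correct instances), and the names `score_e2t`, `score_trt`, `matvec` and `sq_l2` are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Multiply an (l x kappa) matrix by a kappa-dimensional vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def sq_l2(v: Vector) -> float:
    """Squared L2 norm. The choice of norm is an assumption of this sketch."""
    return sum(x * x for x in v)


def score_e2t(M: Matrix, e: Vector, t: Vector) -> float:
    """E2T energy: distance between the projected entity M.e and the type t."""
    projected = matvec(M, e)
    return sq_l2([p - t_i for p, t_i in zip(projected, t)])


def score_trt(t_head: Vector, r_type: Vector, t_tail: Vector) -> float:
    """TRT energy: distance between t_head + r (type-space relation) and t_tail."""
    return sq_l2([h + r - t for h, r, t in zip(t_head, r_type, t_tail)])


# Toy example: kappa = 3 (entity space), l = 2 (entity type space).
M = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
e = [0.5, -1.0, 2.0]

print(score_e2t(M, e, [0.5, -1.0]))                    # projection matches the type
print(score_trt([0.0, 1.0], [1.0, 0.0], [1.0, 1.0]))   # translation holds exactly
```

Both toy calls describe a perfect match, so the energies are zero; any mismatch between the projection (or translation) and the target type embedding increases the score.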

Implementation for Entity Type Prediction
Our framework can be used for entity type prediction in the following way. First, for each entity e that appears in the testing set, a prediction by E2T is performed with:

$$\hat{t}_{e} = \arg\min_{t \in T} S_{e2t}(e, t). \quad (3)$$

In addition, a composite score (E2T+TRT), $S_{e2t+trt}(e, t_e)$, is defined by connecting entity type instances and entity type triples through the embedding models; we call the resulting model ConnectE. In this score, λ is a hyperparameter for the trade-off.
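As a rough sketch of how such a composite prediction could work, the code below assumes the composite score is a λ-weighted sum of the E2T energy and the average TRT energies over the type triples in which the entity appears as head or as tail. That exact form, the squared Euclidean distance, and all names (`composite_score`, `predict_type`, `as_head`, `as_tail`) are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
# (relation embedding in type space, type embedding of the entity at the other end)
Context = Tuple[Vector, Vector]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (the choice of norm is an assumption)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def translate(t: Vector, r: Vector) -> Vector:
    """Translate a type embedding by a relation embedding: t + r."""
    return [x + y for x, y in zip(t, r)]


def composite_score(proj_e: Vector, t: Vector, as_head: Sequence[Context],
                    as_tail: Sequence[Context], lam: float) -> float:
    """Assumed composite energy: lam * E2T + (1 - lam) * averaged TRT terms."""
    s_e2t = sq_dist(proj_e, t)
    s_trt = 0.0
    if as_head:  # e is a head entity: candidate t plays the head-type role
        s_trt += sum(sq_dist(translate(t, r), t_tail)
                     for r, t_tail in as_head) / len(as_head)
    if as_tail:  # e is a tail entity: candidate t plays the tail-type role
        s_trt += sum(sq_dist(translate(t_head, r), t)
                     for r, t_head in as_tail) / len(as_tail)
    return lam * s_e2t + (1.0 - lam) * s_trt


def predict_type(proj_e: Vector, type_embs: Dict[str, Vector],
                 as_head: Sequence[Context], as_tail: Sequence[Context],
                 lam: float = 0.5) -> str:
    """Return the candidate type with the lowest composite energy."""
    return min(type_embs, key=lambda name: composite_score(
        proj_e, type_embs[name], as_head, as_tail, lam))


# Toy data: the projected entity sits slightly closer to the wrong type.
types = {"/people/person": [0.0, 0.0], "/location/location": [1.0, 0.0]}
born_in = [1.0, 0.0]                       # toy type-space relation embedding
proj_obama = [0.6, 0.0]                    # M . e, already projected
as_head = [(born_in, types["/location/location"])]

e2t_only = predict_type(proj_obama, types, as_head, [], lam=1.0)
combined = predict_type(proj_obama, types, as_head, [], lam=0.5)
print(e2t_only, combined)
```

In this toy setting the E2T term alone (λ = 1) picks /location/location, while the added triple evidence (born in, /location/location) pulls the combined prediction to /people/person, which mirrors the motivation for connecting the two models.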

Datasets
We conduct the experiments on two real-world datasets (D) widely used in the KG embedding literature, i.e. FB15k (Bordes et al., 2013) and YAGO43k (Moon et al., 2017), which are subsets of Freebase (Bollacker et al., 2008) and YAGO (Suchanek et al., 2007) respectively. They consist of triples, each of which has the form (left entity, relationship, right entity). We utilize two entity type datasets (H, each element of which has the form (entity, entity type)) built in (Moon et al., 2017), called FB15kET and YAGO43kET, in which the entity types are mapped to entities from FB15k and YAGO43k respectively. Moreover, we build new type triple datasets (Z, each element of which has the form (head type, relationship, tail type)) to train our model. They are built from D and H. First, for each triple (e, r, ẽ) ∈ D, we replace the head and the tail with their types according to H. The generated datasets are called FB15kTRT(full) and YAGO43kTRT(full). Second, considering the scalability of the proposed approach to full KGs, we further modify the generation method of type triples, which is the major training bottleneck: we discard newly generated type triples with low frequency (i.e. #frequency = 1). After that, the sizes of both FB15kTRT(full) and YAGO43kTRT(full) decrease by about 90%; the reduced datasets are called FB15kTRT(disc.) and YAGO43kTRT(disc.) respectively. The statistics of the datasets are shown in Table 2. To save space, we put more data processing details (including cleaning H, building Z, etc.) on our GitHub website.
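The construction of Z from D and H can be sketched as follows. This is a simplified illustration on toy data: pairing every head type with every tail type when an entity has several types is an assumption, and the function names `build_type_triples` and `discard_rare` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def build_type_triples(D: Iterable[Triple], H: Dict[str, Set[str]]) -> Counter:
    """Replace head and tail entities with their types and count each type triple.

    Assumption: every (head type, tail type) combination is generated.
    """
    counts: Counter = Counter()
    for head, rel, tail in D:
        for t_head in H.get(head, ()):
            for t_tail in H.get(tail, ()):
                counts[(t_head, rel, t_tail)] += 1
    return counts


def discard_rare(counts: Counter, min_freq: int = 2) -> List[Triple]:
    """Keep type triples seen at least min_freq times (drops frequency-1 triples)."""
    return sorted(tt for tt, n in counts.items() if n >= min_freq)


# Toy data, for illustration only.
D = [("Barack Obama", "born in", "Honolulu"),
     ("Donald Trump", "born in", "New York")]
H = {"Barack Obama": {"/people/person"},
     "Donald Trump": {"/people/person"},
     "Honolulu": {"/location/location"},
     "New York": {"/location/location", "/location/citytown"}}

full = build_type_triples(D, H)
disc = discard_rare(full)
print(len(full), disc)
```

Here the "full" set holds two distinct type triples, and the frequency filter keeps only (/people/person, born in, /location/location), which is supported by both facts.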

Entity Type Prediction
This task aims to complete a pair (entity, entity type) when its type is missing, which verifies the capability of our model for inferring missing entity type instances.
Evaluation Protocol. We focus on entity type prediction determined by Formulas (3) and (4). We use ranking criteria for evaluation. First, for each test pair, we remove the type and replace it with each of the types in T in turn. The scores of these candidate pairs are computed by the related models and then sorted in ascending order, which gives the exact rank of the correct type among the candidates. Finally, we use two metrics for comparison: (1) the mean reciprocal rank (MRR), and (2) the proportion of correct entity types ranked in the top 1/3/10 (HITS@1/3/10) (%). Since the "Raw" evaluation setting is not as accurate as "Filter" (Bordes et al., 2013), we only report the experimental results with the latter setting in this paper.
$$\mathrm{MRR} = \frac{1}{|C|} \sum_{i=1}^{|C|} \frac{1}{rank_{i}},$$

where C is the set of test pairs, and $rank_i$ is the rank position of the true entity type for the i-th pair.
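The ranking metrics can be computed as in the sketch below. The filtering step follows the usual reading of the "Filter" setting (candidates already known to be correct for the entity are skipped), which is an assumption here, as are the helper names.

```python
from typing import Dict, Sequence, Set


def filtered_rank(scores: Dict[str, float], true_type: str,
                  known_types: Set[str]) -> int:
    """Rank of the true type by ascending energy, skipping other known-true types."""
    true_score = scores[true_type]
    better = sum(1 for t, s in scores.items()
                 if t != true_type and t not in known_types and s < true_score)
    return better + 1


def mrr(ranks: Sequence[int]) -> float:
    """Mean reciprocal rank over all test pairs."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Percentage of test pairs whose true type is ranked in the top k."""
    return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)


# Toy energies for one entity: lower means more plausible.
scores = {"a": 0.1, "b": 0.2, "c": 0.3}
raw = filtered_rank(scores, "c", known_types=set())
filt = filtered_rank(scores, "c", known_types={"a"})

ranks = [1, 2, 4]
print(raw, filt, mrr(ranks), hits_at_k(ranks, 1), hits_at_k(ranks, 3))
```

In the toy example, filtering out the known-true type "a" improves the rank of "c" from 3 to 2, which is why the filtered setting gives a fairer picture than the raw one.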
We select the parameters based on MRR on the validation set. The optimal configurations are: … 85} on YAGO43k/ET/TRT. We run 800 epochs on both datasets, and the batch size is 4096.
Experimental Results. We can see from Table 3 that our ConnectE models outperform all baselines for entity type prediction in terms of all metrics on FB15kET and YAGO43kET. This confirms the capability of the ConnectE models in modeling with local typing and global triple knowledge and in inferring missing entity type instances in KGs. The model ConnectE-(E2T+TRT)(full) achieves the highest scores.

Analysis.
(1) In E2T, we utilize a mapping matrix M which compresses entity embeddings into type embedding space, considering that an entity type embedding represents the common information of all the entities that belong to this type. The type embedding should therefore lie in a shared subspace of the entity embeddings. The experimental results of E2T compared with the baselines suggest that this assumption is reasonable. (2) In E2T+TRT, we build new type-relation-type data, and then connect them with entity type instances. This approach provides more direct and useful information to (weakly) supervise entity type prediction. For example, given a fact that head entity Barack Obama belongs to type /people/person

Entity Type Classification
This task aims to judge whether each entity type instance in the testing data holds or not, which can be viewed as a binary classification problem.
Evaluation Protocol. Since there are no explicit negative entity type instances in existing KGs, we create datasets for classification by building negative facts: we randomly switch the type in entity type pairs from the validation and testing sets, keeping the numbers of positive and negative examples equal.
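The negative-example construction can be sketched as below. Excluding types already known to hold for the entity is an assumption of this sketch (the text only says types are switched at random), and `build_negatives` is a hypothetical helper name.

```python
import random
from typing import List, Sequence, Set, Tuple

Pair = Tuple[str, str]


def build_negatives(positives: Sequence[Pair], all_types: Sequence[str],
                    known: Set[Pair], seed: int = 0) -> List[Pair]:
    """Create one negative per positive by switching its type at random.

    Assumption: a switched type must not form a known positive pair.
    """
    rng = random.Random(seed)
    negatives: List[Pair] = []
    for entity, _ in positives:
        candidates = [t for t in all_types if (entity, t) not in known]
        if not candidates:
            continue  # the entity already has every type; no negative exists
        negatives.append((entity, rng.choice(candidates)))
    return negatives


# Toy data, for illustration only.
all_types = ["/people/person", "/location/location", "/music/artist"]
positives = [("Barack Obama", "/people/person"),
             ("Honolulu", "/location/location")]
known = set(positives)

negatives = build_negatives(positives, all_types, known)
print(negatives)
```

Each positive pair yields exactly one negative pair for the same entity, so the resulting classification set is balanced.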
Inspired by the evaluation metric of triple classification in (Socher et al., 2013), we calculate the scores of all entity type instances with the model energy function, and rank all instances in the testing set by these scores. Instances with lower scores are considered to be true. We use precision/recall curves to show the performance of all models. Moreover, we also compare the accuracy of the different models. We first use the validation set to find the best threshold η. For instance, if the model score $S_{e2t+trt}(e, t_e) \le \eta$, the entity type instance is classified as positive, otherwise as negative. The final accuracy is based on how many facts are classified correctly.
Implementation. We utilize the source code and parameter settings of several baselines provided by (Moon et al., 2017) for this task. The optimal parameter settings for our proposed models are:

Case Study
Table 5 shows examples of entity type prediction by our model on FB15k/ET/TRT, which support our motivation for Mech. 2: the head type and tail type indeed maintain the relationship between the head entity and the tail entity. Given the entity Peter Berg, TRT can find the HITS@1 type prediction /people/person for it via the existing entity type assertion (New York, /location/location) and the relationship (/loc./loc./people_born_here) between them, i.e. Peter Berg − New York + /location/location = /people/person.

Figure 1: Effective mechanisms of entity type inference with local typing knowledge and global triple knowledge.

TRT: Encoding Triples in KGs
Using only entity type instances for training ignores much of the relational knowledge that can be leveraged from triple facts in KGs. In order to connect this relational data with our model, we propose to learn entity type and relationship embeddings from global triple knowledge in KGs. The key motivations behind this model are: (1) As mentioned above, the entities cluster well according to their types. Therefore, we believe that an essential premise for a triple (head entity, relationship, tail entity) to hold is that its corresponding entity types should first conform to this relationship. Hence, we can build a new entity type triple (head type, relationship, tail type) by replacing both the head entity and the tail entity with their corresponding types, i.e. $(e, r, \tilde{e}) \xrightarrow{\text{replace}} (t_{e}, r, t_{\tilde{e}})$. Here (e, r, ẽ) ∈ D, where D is the training set consisting of many triples, and r ∈ R (R is the set of relationships). t_e and t_ẽ stand for the hierarchical types of the left entity e and the right entity ẽ respectively.

Table 1: Entity type embedding models.
… (Kingma and Ba, 2014) … the embeddings of entity types (∀t ∈ T) and the projecting matrix M, not the entities' embeddings that have been trained in $J_1$. (3) $J_3$: We only update the embeddings of relationships ($\forall \mathbf{r}^{\circ} \in R^{\circ}$) while keeping the entity types' embeddings fixed. The training is performed using Adagrad (Kingma and Ba, 2014). All embeddings in Θ are initialized with a uniform distribution. The procedure, from $J_1$, $J_2$ to $J_3$, is iterated for a given number of iterations.

We have: … $P = \{t_{\tilde{e}} \mid t_{\tilde{e}} \in T, (e, r, \tilde{e}) \in D\}$ (i.e. given that e is a head entity, P is the set of all corresponding tail entities' types), and $Q = \{t_{\bar{e}} \mid t_{\bar{e}} \in T, (\bar{e}, r, e) \in D\}$ (i.e. given that e is a tail entity, Q is the set of all corresponding head entities' types). |P| and |Q| represent the total number of entity types in P and Q respectively. A prediction is performed with:

$$\hat{t}_{e} = \arg\min_{t_e \in T} S_{e2t+trt}(e, t_e).$$

… in which we update the embeddings of entities (∀e ∈ E) and the embeddings of relationships (∀r ∈ R). (2) $\gamma_1, \gamma_2, \gamma_3 > 0$ are margin hyperparameters, and the corrupted datasets are built as follows:

$$D' := \{(e', r, \tilde{e}) \mid (e, r, \tilde{e}) \in D, e' \in E, e' \neq e\} \cup \{(e, r, \tilde{e}') \mid (e, r, \tilde{e}) \in D, \tilde{e}' \in E, \tilde{e}' \neq \tilde{e}\},$$

$$H' := \{(e', t_e) \mid (e, t_e) \in H, e' \in E, e' \neq e\} \cup \{(e, t'_e) \mid (e, t_e) \in H, t'_e \in T, t'_e \neq t_e\},$$

$$Z' := \{(t'_e, r, t_{\tilde{e}}) \mid (t_e, r, t_{\tilde{e}}) \in Z, t'_e \in T, t'_e \neq t_e\} \cup \{(t_e, r, t'_{\tilde{e}}) \mid (t_e, r, t_{\tilde{e}}) \in Z, t'_{\tilde{e}} \in T, t'_{\tilde{e}} \neq t_{\tilde{e}}\}.$$

D and H are the training datasets of triple facts and entity type instances in the KG. Z is the training data of type triples, built by replacing the entities in D with their corresponding entity types.
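To make the role of the margins and the corrupted datasets concrete, the sketch below builds a corrupted (entity, type) pair in the style of H′ and evaluates a margin-based hinge loss. The hinge form max(0, γ + S(pos) − S(neg)) is an assumption in the spirit of ranking-based training, not a formula taken from the text, and the names `corrupt_pair` and `margin_ranking_loss` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]


def corrupt_pair(pair: Pair, entities: Sequence[str], types: Sequence[str],
                 rng: random.Random) -> Pair:
    """Corrupt an (entity, type) pair by replacing exactly one side."""
    e, t = pair
    if rng.random() < 0.5:
        return (rng.choice([x for x in entities if x != e]), t)
    return (e, rng.choice([x for x in types if x != t]))


def margin_ranking_loss(pos_scores: Sequence[float],
                        neg_scores: Sequence[float], gamma: float) -> float:
    """Assumed hinge loss: positives should score lower than negatives by gamma."""
    return sum(max(0.0, gamma + p - n) for p, n in zip(pos_scores, neg_scores))


# Toy data, for illustration only.
entities = ["Barack Obama", "Donald Trump", "Honolulu"]
types = ["/people/person", "/location/location"]
rng = random.Random(0)

positive = ("Barack Obama", "/people/person")
negative = corrupt_pair(positive, entities, types, rng)

satisfied = margin_ranking_loss([0.2], [1.5], gamma=1.0)  # margin already met
violated = margin_ranking_loss([0.9], [1.0], gamma=1.0)   # margin not met
print(negative, satisfied, violated)
```

A pair of scores that already respects the margin contributes nothing to the loss, so training effort concentrates on corrupted examples that the model still scores too favorably.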

Table 2: Statistics of D, H, Z.

Table 3: Entity type prediction results. Evaluation of different models on FB15kET and YAGO43kET.
… YAGO43kET. On both datasets, we train on all the training data for 800 epochs with a batch size of 4096. After training, we first draw PR-curves with dynamic thresholds. We then select the best threshold based on the accuracy on the validation set and use it to calculate the accuracy on the test set.
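The threshold selection step can be sketched as follows. Trying every observed validation score as a candidate threshold is an assumption of this sketch, and the function names are hypothetical; the decision rule (score ≤ η means positive) follows the description of the classification task.

```python
from typing import List, Sequence, Tuple


def accuracy(scores: Sequence[float], labels: Sequence[bool], eta: float) -> float:
    """Fraction of instances classified correctly with the rule score <= eta."""
    correct = sum(1 for s, y in zip(scores, labels) if (s <= eta) == y)
    return correct / len(scores)


def best_threshold(scores: Sequence[float],
                   labels: Sequence[bool]) -> Tuple[float, float]:
    """Pick the threshold with the highest validation accuracy.

    Assumption: candidate thresholds are the observed validation scores.
    """
    best_eta, best_acc = scores[0], -1.0
    for eta in sorted(set(scores)):
        acc = accuracy(scores, labels, eta)
        if acc > best_acc:
            best_eta, best_acc = eta, acc
    return best_eta, best_acc


# Toy energies: positives (True) tend to have lower scores.
valid_scores: List[float] = [0.1, 0.2, 0.8, 0.9]
valid_labels: List[bool] = [True, True, False, False]
eta, valid_acc = best_threshold(valid_scores, valid_labels)

test_acc = accuracy([0.15, 0.5], [True, False], eta)
print(eta, valid_acc, test_acc)
```

The threshold is fixed on validation data and then applied unchanged to the test data, so the reported test accuracy is not tuned on the test set itself.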

Table 5: Entity type prediction examples. Extraction from FB15k/ET/TRT.

In this paper, we described a framework for leveraging global triple knowledge to improve KG entity typing by training not only on (entity, entity type) assertions but also on newly generated (head type, relationship, tail type) type triples. Specifically, we propose two novel embedding-based models to encode entity type instances and entity type triples respectively. The connection of both models is utilized to infer missing entity type instances. The empirical experiments demonstrate the effectiveness of our proposed model. Our modeling method is general and should apply to other type-oriented tasks. Next, we plan to use this framework to conduct KG entity type noise detection.