AutoETER: Automated Entity Type Representation for Knowledge Graph Embedding

Recent advances in Knowledge Graph Embedding (KGE) allow for representing entities and relations in continuous vector spaces. Some traditional KGE models leverage additional type information to improve the representation of entities, but they either rely entirely on explicit types or neglect the diverse type representations specific to various relations. Besides, none of the existing methods is capable of simultaneously inferring all the relation patterns of symmetry, inversion and composition as well as the complex properties of 1-N, N-1 and N-N relations. To explore the type information of any KG, we develop a novel KGE framework with Automated Entity TypE Representation (AutoETER), which learns the latent type embedding of each entity by regarding each relation as a translation operation between the types of its two entities under a relation-aware projection mechanism. In particular, our automated type representation learning mechanism is a pluggable module that can be easily incorporated into any KGE model. Moreover, our approach can model and infer all the relation patterns and complex relations. Experiments on four datasets demonstrate the superior performance of our model compared to state-of-the-art baselines on the link prediction task, and the visualization of type clustering explains the similarity of type embeddings and verifies the effectiveness of our model.


Introduction
In recent years, knowledge graphs (KGs) have been viewed as a significant technique for recognition systems and are prevalent in many fields such as E-commerce, intelligent healthcare and public security. Knowledge graphs collect and store a great deal of commonsense or domain knowledge in the form of factual triples, which are composed of entity pairs with their relations.

Figure 1: An actual example of entity-specific triples and type-specific triples with the relation-aware projection mechanism. Will Smith has multiple types such as Singer and Actor, but only the type Singer should be focused on for the relation SangSong.
However, the existing KGs are inevitably incomplete whether they are constructed manually or automatically, which limits their effectiveness when exploited for downstream applications. Some existing KG inference approaches, such as the inductive logic programming algorithm (Ray, 2009), the Markov logic network-based method (Qu and Tang, 2019) and the reinforcement learning-based approach (Lin et al., 2018), succeed in predicting entities or relations in KGs but suffer from limited performance and low efficiency. Compared to these approaches, knowledge graph embedding models learn latent representations of entities and relations and show the best performance on the KG completion task. However, most KG embedding models, such as TransE (Bordes et al., 2013) and its variants TransH (Wang et al., 2014) and TransR (Lin et al., 2015b), learn KG embeddings relying on single triples, which exploit only the structure information implied in KGs.
Entity types define categories of entities and are valuable for enhancing entity representations. In many type-embodied models such as TKRL (Xie et al., 2016) and TransT (Ma et al., 2017), explicit types are necessary, while some KGs (e.g. WordNet) lack them, which limits the versatility of these approaches. JOIE (Hao et al., 2019) jointly encodes both the ontology and instance views of KGs, but the concepts in ontologies always represent the general categories of entities and cannot reflect the specific types associated with different relations. (Jain et al., 2018) learned type embeddings by defining the compatibility between an entity type and a relation, but it ignores the semantics implied in a whole triple, which consists of a relation jointly with its two linked entity types. Moreover, all the previous type-based approaches neglect the diversity of entity type representations specific to various relations. As Figure 1 shows, contrary to previous research considering entity types, triples at the entity level can be extended to triples at the type level. In particular, each entity has multiple types, and a different type should be focused on for each specific relation.
Moreover, some models embed entities and relations into a complex vector space instead of the frequently-used real space to improve the capability of representation learning, including ComplEx (Trouillon et al., 2016) and RotatE (Sun et al., 2019). Nevertheless, none of the existing embedding models can simultaneously model and infer all the relation patterns and the complex 1-N, N-1 and N-N relations.
To conduct KG inference from the perspectives of both entity-specific triples and type-specific triples on any KG, whether or not explicit types exist in it, we propose AutoETER to automatically learn the diverse type representations of each entity with respect to its various associated relations. Intuitively, the high-dimensional entity embeddings capture the individual features that distinguish entities, while the low-dimensional type embeddings capture the general features that reveal the similarity of entities according to their categories. Inspired by the translational principle of TransE, we expect that given a head entity and its associated relation, the tail entity's type representation can be obtained by type_head + relation = type_tail. In particular, the latent type embeddings of two head (or two tail) entities focused on the same relation should be close to each other since they imply the same type. Furthermore, the embeddings of both the entity-specific triples and the type-specific triples are capable of modeling and inferring the relation patterns of symmetry, inversion and composition as well as the complex 1-N, N-1 and N-N relations.
The contributions of this work are summarized as follows:

• We model type representations to enrich the general features of entities. A novel model AutoETER is proposed to learn the embeddings of entities, relations and entity types from entity-specific triples and type-specific triples without explicit types in KGs. Furthermore, the type embeddings can be incorporated with the entity embeddings for inference.
• To the best of our knowledge, we are the first to model and infer all the relation patterns including symmetry, inversion and composition as well as the complex relations of 1-N, N-1 and N-N for KG inference.
• We conduct extensive experiments on link prediction over four real-world benchmark datasets. The evaluation results demonstrate the superiority of our proposed model over other state-of-the-art algorithms, and the visualization of clustering type embeddings validates the effectiveness of automatically representing entity types with relation-aware projection.
Related Work

Knowledge Graph Inference
To address the inherent incompleteness of KGs, multiple KG inference methods have been investigated and have made great progress. Traditional research devotes itself to generating logic rules based on inductive logic programming, such as HAIL (Ray, 2009), to predict the missing entities in KGs. However, employing logic rules in KG inference limits generalization performance. The path ranking algorithm (PRA) (Lao et al., 2011) extracts relational path features based on random walks to infer the relationships between entity pairs. DeepPath (Lin et al., 2018) is a foundational approach that formulates multi-hop reasoning as a Markov decision process and leverages reinforcement learning (RL) to find paths in KGs. However, RL-based multi-hop KG reasoning approaches consume much time in searching for paths.

KG Embedding Models
Various KG embedding models have been extensively developed for KG inference in recent years. KGE models are capable of capturing latent representations of entities and relations in KGs independently of hand-crafted rules, and they have shown a strong capacity for efficient computation in many knowledge-aware applications (Ji et al., 2020). TransE (Bordes et al., 2013) is the foundational translation-based method, which regards a relation as a translation operation from the head entity to the tail entity. Following TransE, multiple variants have been proposed to improve the embedding performance of KGs (Niu et al., 2020; Yuan et al., 2019; Xiao et al., 2016). ConvE (Dettmers et al., 2018) is a typical method representing entities and relations based on convolutional neural networks (CNNs). Another category of KG embedding contains tensor decomposition models, including DistMult (Yang et al., 2015). In particular, ComplEx (Trouillon et al., 2016) extends DistMult to learn KG embeddings in the complex space. RotatE (Sun et al., 2019) defines a relation as a rotation from source to target entities in a complex space but cannot infer the complex 1-N, N-1 and N-N relations. Moreover, all the approaches above depend purely on the triples directly observed in KGs.

Models Incorporating Entity Types
To further improve the performance of KG embedding, various auxiliary information has been introduced, such as paths (Lin et al., 2015a; Niu et al., 2020), graph structure (Michael et al., 2018) and entity types (Xie et al., 2016; Krompaß et al., 2015; Ma et al., 2017). Among such information, entity types contain less noise and are appropriate for providing more general semantics for each individual entity. TKRL (Xie et al., 2016) projects each entity with type-specific projection matrices. TransT (Ma et al., 2017) measures the semantic similarity of entities and relations utilizing types. However, all the above type-based KG embedding models require the supervision of explicit types and cannot work on KGs without them. JOIE (Hao et al., 2019) links entities to their concepts in the ontology to jointly embed the instance-view graph and the ontology-view graph, but the concepts in ontologies provide too broad or even noisy information to represent the specific and precise types of each entity. (Jain et al., 2018) introduces the compatibility between the embeddings of an entity type and a relation for link prediction. However, all the existing type-enhanced models neglect that the diverse types of an entity should be focused on when the entity is associated with various relations. Meanwhile, the association property implied in the embeddings of the type-specific triples has not been well modeled.

AutoETER: KGE with Automated Entity Type Representation
To cope with the above limitations, we describe the proposed model AutoETER, which aims to automatically learn a variety of type representations semantically compatible with various relations and to infer all the relation patterns as well as complex relations. As Figure 2 shows, we first embed the entities and relations into a complex space via the entity-specific triple encoder with a hyper-plane projection strategy (3.1). Additionally, the type-specific triple encoder is developed to learn type embeddings incorporating a relation-aware projection mechanism (3.2). Meanwhile, the type embeddings are constrained by their similarity derived from the associated relations (3.3). Afterward, we propose the overall optimization objective with both the entity-specific triple and type-specific triple representations as well as the similarity constraint on the type embeddings (3.4).

Entity-specific Triple Encoder
We embed the entities and relations into the complex space and regard a relation as a rotation operation from the head entity to the tail entity as in RotatE (Sun et al., 2019). To further model and infer the complex relations such as 1-N, N-1 and N-N, we project entities onto their associated relation hyper-planes so that each entity has various representations with respect to specific relations.
In terms of an entity-specific triple (h, r, t), the head and tail entities are first projected onto the hyper-plane of relation r:

e_{h,r} = h − w_r⊤ h w_r,  e_{t,r} = t − w_r⊤ t w_r    (1)

and the energy function E_1(h, r, t) is defined as

E_1(h, r, t) = ‖e_{h,r} ◦ r − e_{t,r}‖    (2)

where h ∈ C^k, t ∈ C^k and r ∈ C^k are the embeddings of the head entity h, the tail entity t and the relation r in the complex space with dimension k. w_r ∈ R^k denotes the normal vector of the hyper-plane associated with the relation r. e_{h,r} ∈ C^k and e_{t,r} ∈ C^k represent the entity embeddings of h and t projected onto the hyper-plane w_r, and ◦ is the Hadamard product.

Figure 2: The architecture of AutoETER. Given a triple fact (h, r, t), e_{h,r} and e_{t,r} are the projected entity embeddings in the hyper-plane of relation r, and y_{h,r} and y_{t,r} are the type embeddings focusing on relation r. We expect the embeddings of the entity-specific triple to satisfy a rotation operation and those of the type-specific triple to satisfy a translation operation from the head to the tail entity. Type embeddings associated with the same relation r are constrained to be closer, where γ is the margin enforced between two clusters of type embeddings related to different relations.

On account of the embeddings of entity-specific triples, our model is able to infer all the relation patterns via the rotation operation from head to tail entities as in RotatE. In particular, r is constrained to |r_i| = 1, i = 1, 2, ..., k for inferring the symmetric relation pattern, and at least one element of r is required to be −1 to ensure diverse representations of head and tail entities. Moreover, the projection operation shown in Eq. 1 enables our model to infer the complex relations via the various representations of entities with regard to different relations.
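As a minimal sketch (not the authors' implementation), the projection and rotation of Eqs. 1 and 2 can be written with NumPy complex arrays; the function names and the phase parameterization of r are our own assumptions:

```python
import numpy as np

def project(e, w_r):
    # Hyper-plane projection of Eq. 1; w_r is assumed to be a unit real normal vector
    return e - np.dot(w_r, e) * w_r

def entity_energy(h, theta_r, t, w_r):
    # Energy of Eq. 2: rotate the projected head by the unit-modulus relation
    # r_i = exp(i * theta_i) and measure the distance to the projected tail
    r = np.exp(1j * theta_r)  # |r_i| = 1 holds by construction
    return np.linalg.norm(project(h, w_r) * r - project(t, w_r))
```

Because w_r has unit norm, the projected embedding has no component along w_r, which is what gives each entity a relation-specific representation.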

Type-specific Triple Encoder
Given an entity e and its associated relation r in a triple, we aim to learn the type and relation embeddings with a relation-aware projection mechanism that selects the most important information of the type representations:

y_{e,r} = M_r y_e    (3)

where y_e ∈ R^d denotes the type embedding of entity e in the real space with dimension d, and M_r ∈ R^{d×d} is the projection weight matrix associated with the relation r, which automatically selects the latent information of each type embedding most relevant to r.
With the relation-aware projection defined in Eq. 3, the energy function for type-specific triples is defined as

E_2(h, r, t) = ‖y_{h,r} + y_r − y_{t,r}‖    (4)

where y_{h,r} ∈ R^d and y_{t,r} ∈ R^d are the type embeddings of entities h and t, both focusing on the relation r, and y_r ∈ R^d denotes the embedding of the relation r in the type-specific triple. In terms of the energy function in Eq. 4, we expect that

y_{h,r} + y_r = y_{t,r}    (5)

Furthermore, with the type and relation embeddings learned in the real space, our model costs fewer parameters and can model and infer all the relation patterns including symmetry (Lemma 1), inversion (Lemma 2) and composition (Lemma 3) as well as the complex properties of relations.

Lemma 1. Our model can infer the relation pattern of symmetry by type-specific triple embeddings.
Proof. If a relation r is symmetric, the two triples (h, r, t) and (t, r, h) both hold. From Eq. 5, the correlations among the embeddings of types and relations are

y_{h,r} + y_r = y_{t,r},  y_{t,r} + y_r = y_{h,r}    (6)

From Eq. 6, we can further derive that

y_{h,r} = y_{t,r},  y_r = 0    (7)

Thus the embedding of a symmetric relation should be the zero vector, and the type embeddings of the head and tail entities should be equal. This result is reasonable because the focused types of two entities linked by a symmetric relation are supposed to be the same.
Lemma 2. Our model is able to infer the relation pattern of inversion by type-specific triple embeddings.
Proof. With the inverse relations r_1 and r_2, the two triples (h, r_1, t) and (t, r_2, h) hold. From Eqs. 3, 4 and 5, it can be retrieved that

M_{r1} y_h + y_{r1} = M_{r1} y_t    (8)
M_{r2} y_t + y_{r2} = M_{r2} y_h    (9)

We can define a transform matrix P ∈ R^{d×d} that satisfies

M_{r1} = P M_{r2}    (10)

Substituting Eq. 10 into Eq. 9, the latter can be rewritten as

M_{r1} y_t + P y_{r2} = M_{r1} y_h    (11)

Then, substituting Eq. 11 into Eq. 8, it yields

y_{r1} = −P y_{r2}    (12)

Thus we can model and infer inverse relations with relation embeddings satisfying the relationship in Eq. 12.
Lemma 3. Our model is capable of inferring the relation pattern of composition by type-specific triple embeddings.
Proof. For the composition pattern r_3(a, c) ⇐ r_1(a, b) ∧ r_2(b, c), the corresponding triples (a, r_1, b), (b, r_2, c) and (a, r_3, c) hold. Considering Eqs. 3, 4 and 5, we obtain

M_{r1} y_a + y_{r1} = M_{r1} y_b    (13)
M_{r2} y_b + y_{r2} = M_{r2} y_c    (14)
M_{r3} y_a + y_{r3} = M_{r3} y_c    (15)

We can define two transform matrices P ∈ R^{d×d} and Q ∈ R^{d×d} that satisfy

M_{r3} = P M_{r1}    (16)
M_{r3} = Q M_{r2}    (17)

Substituting Eq. 16 into Eq. 13 and Eq. 17 into Eq. 14, respectively, we derive

M_{r3} y_a + P y_{r1} = M_{r3} y_b    (18)
M_{r3} y_b + Q y_{r2} = M_{r3} y_c    (19)

Substituting Eq. 18 into Eq. 19, it can be retrieved that

M_{r3} y_a + P y_{r1} + Q y_{r2} = M_{r3} y_c    (20)

Combining Eqs. 15 and 20, we can model the correlation among the relation embeddings of the composition pattern as

y_{r3} = P y_{r1} + Q y_{r2}    (21)

Thus we can model and infer the composition pattern for type-specific triples with relation embeddings as shown in Eq. 21.
For inference on type-specific triples with relations of the complex properties 1-N, N-1 and N-N, we can exploit the various representations of an entity type associated with different relations via the relation-aware projection mechanism defined in Eq. 3.
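The type-specific energy of Eqs. 3 and 4 and the algebra behind Lemmas 1 and 2 can be checked numerically with a small NumPy sketch (the dimension, seed and variable names are illustrative, not from the paper):

```python
import numpy as np

def type_energy(y_h, y_t, y_r, M_r):
    # Eq. 4 after the relation-aware projection of Eq. 3
    return np.linalg.norm(M_r @ y_h + y_r - M_r @ y_t)

rng = np.random.default_rng(0)
d = 8

# Lemma 1: a symmetric relation forces y_r = 0 and equal projected types,
# so the translation holds with zero energy in both directions.
M_r, y = rng.normal(size=(d, d)), rng.normal(size=d)
sym_energy = type_energy(y, y, np.zeros(d), M_r)

# Lemma 2: with M_r1 = P @ M_r2, relation embeddings that make (h, r1, t)
# and (t, r2, h) hold exactly satisfy y_r1 = -P @ y_r2 (Eq. 12).
P, M_r2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
M_r1 = P @ M_r2
y_h, y_t = rng.normal(size=d), rng.normal(size=d)
y_r1 = M_r1 @ (y_t - y_h)  # zero-energy choice for (h, r1, t)
y_r2 = M_r2 @ (y_h - y_t)  # zero-energy choice for (t, r2, h)
```

Running the check confirms that both constructed triples have zero energy and that the inverse-relation identity of Eq. 12 holds.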

Type Embeddings Similarity Constraint
In addition to learning type embeddings via the type-specific triple encoder (3.2), the type representations should be constrained by the similarity between entity types: the type embeddings of head entities involved in triples with the same relation should be close to each other (and likewise for the type embeddings of tail entities). Thus, for two triples with the same relation r, we expect that

y_{h1,r} = y_{h2,r},  y_{t1,r} = y_{t2,r}    (22)

where y_{h1,r} and y_{h2,r} are the type embeddings of the head entities and y_{t1,r} and y_{t2,r} are the type embeddings of the tail entities, all focusing on the same relation r via the relation-aware projection mechanism of Eq. 3. Now, considering any two triples (h_1, r_1, t_1) and (h_2, r_2, t_2), we design the energy function evaluating the dissimilarity of their type embeddings as

E_3((h_1, r_1, t_1), (h_2, r_2, t_2)) = 1/2 (‖y_{h1,r1} − y_{h2,r2}‖ + ‖y_{t1,r1} − y_{t2,r2}‖)    (23)

where y_{h1,r1} and y_{h2,r2} are the two head entity type embeddings, y_{t1,r1} and y_{t2,r2} are the two tail entity type embeddings, and each is associated with its own relation r_1 or r_2. Therefore, we expect the value of Eq. 23 to be smaller when r_1 and r_2 are the same relation.
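Eq. 23 is straightforward to sketch (the function name is ours):

```python
import numpy as np

def similarity_energy(y_h1, y_t1, y_h2, y_t2):
    # Eq. 23: average distance between the projected head-type pair
    # and the projected tail-type pair of two triples
    return 0.5 * (np.linalg.norm(y_h1 - y_h2) + np.linalg.norm(y_t1 - y_t2))
```

The value is zero exactly when both triples focus on identical projected types, matching the expectation of Eq. 22 for a shared relation.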

Optimization Objective
The designed entity-specific triple encoder, type-specific triple encoder and type representation similarity constraint are trained end to end as a unified model. We optimize our model according to a three-component objective function:

L = L_1 + α_1 L_2 + α_2 L_3    (24)

where L_1 and L_2 are two pairwise loss functions corresponding to the entity-specific triple encoder and the type-specific triple encoder, respectively, and L_3 is a triplet loss function for constraining the type embeddings. α_1 and α_2 denote the weights of L_2 and L_3 for the trade-off among the entity-specific triples, the type-specific triples and the type similarity constraint. S contains all the triples in the training set and S' is the negative sample set generated by replacing the entities in S. Specifically, L_1, L_2 and L_3 are defined as

L_1 = Σ_{(h,r,t)∈S} [ −log σ(γ_1 − E_1(h, r, t)) − Σ_{(h',r,t')∈S'} log σ(E_1(h', r, t') − γ_1) ]    (25)
L_2 = Σ_{(h,r,t)∈S} [ −log σ(γ_2 − E_2(h, r, t)) − Σ_{(h',r,t')∈S'} log σ(E_2(h', r, t') − γ_2) ]    (26)
L_3 = Σ max[0, E_3((h, r, t), (h_p, r, t_p)) + γ_3 − E_3((h, r, t), (h_n, r', t_n))]    (27)

where γ_1, γ_2 and γ_3 denote the fixed margins in L_1, L_2 and L_3, respectively. In particular, L_3 can be viewed as a regularizer restraining the entity type embeddings. σ denotes the sigmoid function and max[0, x] selects the larger value between 0 and x. In Eq. 27, the triple (h, r, t) is regarded as the anchor instance and (h_p, r, t_p) is a positive instance in the set Y containing other triples with the same relation r, while (h_n, r', t_n) is any negative instance in the set Y' containing triples without the relation r. Besides, we employ self-adversarial negative sampling as in RotatE (Sun et al., 2019).
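The shapes of the loss terms can be sketched as follows; this is a hedged reconstruction of Eqs. 25-27 (the self-adversarial temperature of 1 and the function names are our assumptions, not the authors' code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pairwise_loss(e_pos, e_negs, gamma):
    # RotatE-style self-adversarial pairwise loss (shape of Eqs. 25-26):
    # harder negatives (lower energy) receive larger softmax weights
    w = np.exp(-e_negs) / np.exp(-e_negs).sum()
    return -np.log(sigmoid(gamma - e_pos)) - np.sum(w * np.log(sigmoid(e_negs - gamma)))

def triplet_loss(e_pos_pair, e_neg_pair, gamma):
    # Eq. 27: keep same-relation type embeddings at least gamma closer
    # together than different-relation ones
    return max(0.0, e_pos_pair + gamma - e_neg_pair)
```

With a low-energy positive and high-energy negatives, both terms stay small; the triplet term vanishes once the margin is satisfied.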

Experiment Results
In this section, we evaluate our model AutoETER for KG completion on four real-world benchmark datasets. Additionally, we visualize the clustering results of type embeddings for demonstrating the effectiveness of representing types automatically.

Evaluation Protocol
The link prediction task aims to predict the missing head or tail entity of a triple in the test set. For link prediction, every entity in the KG is used in turn to replace the missing entity to generate candidate triples. Then, for each candidate triple (h, r, t), we combine the two perspectives of the entity-specific triple and the type-specific triple to evaluate its plausibility with the energy function

E_pred(h, r, t) = E_1(h, r, t) + α_1 E_2(h, r, t)    (28)

which is composed of the energy functions E_1(h, r, t) (for the entity-specific triple) and E_2(h, r, t) (for the type-specific triple) defined in Eqs. 2 and 4, respectively. α_1 is the same trade-off weight as in Eq. 24. All the candidate triples are then scored by Eq. 28, the scores are sorted in ascending order, and the rank of the correct triple is obtained. Three standard metrics are employed to evaluate the performance of link prediction: 1) Mean Rank (MR) of the correct triples; 2) Mean Reciprocal Rank (MRR) of the correct triples; 3) Hits@n, the proportion of correct triples among the top-n candidate triples. We also follow the filtered setting of the previous study (Dettmers et al., 2018), which evaluates performance after filtering out corrupted triples that already exist in the KG.
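Given the 1-based filtered ranks of the correct entities, the three metrics reduce to a few lines (the helper name is ours):

```python
def rank_metrics(ranks, ns=(1, 3, 10)):
    # MR, MRR and Hits@n computed from 1-based ranks of the correct triples
    mr = sum(ranks) / len(ranks)
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = {n: sum(r <= n for r in ranks) / len(ranks) for n in ns}
    return mr, mrr, hits
```

Lower is better for MR, higher for MRR and Hits@n, which is why the candidate scores of Eq. 28 are ranked in ascending order of energy.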

Baselines and Hyper-parameters
We compare the developed model AutoETER with two categories of state-of-the-art baselines: (1) models only considering entity-specific triples, including TransE, DistMult, HolE, ComplEx, ConvE, RotatE and QuatE; (2) models introducing additional information, such as TKRL with explicit types, the type-sensitive model TypeComplex, R-GCN with graph structure and PTransE with paths. All the baselines are selected because they achieve good performance and provide source code, ensuring the reliability and reproducibility of the results. The results of R-GCN are from (Zhang et al., 2019), and the results of TKRL are from (Xie et al., 2016).

Evaluation Results and Analyses
We tune our model using a grid search to select the optimal hyper-parameters. The optimal configuration is: batch size 1024, learning rate lr = 0.0001, and optimization weights α_1 = 0.1, α_2 = 0.5. The dimension of the entity and relation embeddings in entity-specific triples is k = 1000, and the dimension of the type and relation embeddings in type-specific triples is d = 200. For the datasets FB15K and YAGO3-10, the three fixed margins are set to γ_1 = 22, γ_2 = 8, γ_3 = 6; for WN18 and FB15K-237, γ_1 = 10, γ_2 = 6, γ_3 = 3.

Table 2 and Table 3 report the evaluation results of link prediction on the four datasets. We observe that AutoETER achieves performance similar to the state-of-the-art model QuatE and outperforms the other baselines. This demonstrates the benefit of modeling and inferring all the relation patterns as well as the complex relations. Specifically, AutoETER performs better than the type-embodied models TKRL and TypeComplex, which indicates that the type representations learned automatically with relation-aware projection by AutoETER are more effective for inference than relying entirely on explicit types or ignoring the diversity of type embeddings focusing on various relations. Furthermore, AutoETER outperforms RotatE because AutoETER can infer the complex relations and takes advantage of type representations. Additionally, our model obtains better Hits@1 than QuatE on all three datasets, which verifies that the type representations learned from KGs help predict entities more accurately by restricting the candidate entities with type embeddings.
Considering the more diverse relations in FB15K compared with the other three datasets, we select FB15K to evaluate link prediction performance on the relation categories 1-1, 1-N, N-1 and N-N. As shown in Table 4, our model achieves better performance on both head entity prediction and tail entity prediction compared to the other baselines, particularly RotatE, which illustrates the advantage of capturing various representations of entities specific to different relations with the relation-aware projection mechanism for representing entity types.

Table 5: Ablation study on FB15K-237. "H@" is the abbreviation of "Hits@".

Ablation Study
We conduct an ablation study of our model on the dataset FB15K-237, omitting the type representation (-TR) or only the type similarity constraint (-TSC). Table 5 shows that our full model performs better than the two ablated models. This illustrates that the type representation and the type similarity constraint both have significant impacts on link prediction performance and suggests that our automatically learned type representations play a pivotal role in our approach.

Visualization of Clustering Entity Type Representations
We utilize K-means to cluster the type embeddings and further employ t-SNE for dimensionality reduction to obtain a 2D visualization. As Figure 3(a) shows, some type embeddings are clustered into independent categories while some clusters stay close to each other because their entities share many common types. For instance, johnny & june and the two towers are clustered into the same category, which actually represents the type movie. Figure 3(b) shows the clustering of the entity embeddings. It can be clearly observed that entity type clustering has better compactness than entity clustering, which demonstrates that the entity type embeddings reflect the characteristics of types. The type embeddings focusing on the relation /award/award_category/nominated_for are visualized in Figure 3(c). It is evident that type embeddings representing the type award, such as academy award for best story and cannes best actor award, are clustered into the same category while others stay far away. These visualization results explain the effectiveness of our type embeddings learned automatically with relation-aware projection from the KG.
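The clustering step can be reproduced in spirit with a minimal K-means over the type embeddings (a toy Lloyd's-algorithm sketch, not the paper's pipeline; the t-SNE projection used only for plotting is omitted):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    # Lloyd's algorithm: alternate nearest-center assignment and center update
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):  # guard against empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    return labels
```

On well-separated type embeddings this recovers the compact clusters seen in Figure 3(a), which is the qualitative behavior the visualization relies on.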

Conclusion and Future Work
In this paper, we propose the AutoETER framework to automatically learn type representations for enriching KG embeddings. We introduce two encoders to learn entity-specific triple and type-specific triple embeddings, which can model and infer all the relation patterns of symmetry, inversion and composition as well as the complex 1-N, N-1 and N-N relations. We also constrain the type embeddings by type similarity. Our experiments on four real-world datasets for link prediction illustrate the superiority of our model, and the visualization of type embedding clustering verifies the validity of representing types automatically. In future work, we intend to extend our approach to obtain better type representations by incorporating the supervision of ontologies.