Knowledge Association with Hyperbolic Knowledge Graph Embeddings

Capturing associations for knowledge graphs (KGs) through entity alignment, entity type inference and other related tasks benefits NLP applications with comprehensive knowledge representations. Recent related methods built on Euclidean embeddings are challenged by the hierarchical structures and different scales of KGs. They also depend on high embedding dimensions to achieve sufficient expressiveness. In contrast, we explore low-dimensional hyperbolic embeddings for knowledge association. We propose a hyperbolic relational graph neural network for KG embedding and capture knowledge associations with a hyperbolic transformation. Extensive experiments on entity alignment and type inference demonstrate the effectiveness and efficiency of our method.


Introduction
Knowledge graphs (KGs) have emerged as the driving force of many NLP applications, e.g., KBQA (Hixon et al., 2015), dialogue generation (Moon et al., 2019) and narrative prediction. Different KGs are usually extracted from separate data sources or contributed by people with different expertise. Therefore, it is natural for these KGs to constitute complementary knowledge of the world that can be expressed in different languages, structures and levels of specificity (Lehmann et al., 2015; Speer et al., 2017). Associating multiple KGs via entity alignment (Chen et al., 2017) or type inference (Hao et al., 2019) particularly provides downstream applications with more comprehensive knowledge representations.
Entity alignment and type inference seek to find two kinds of knowledge associations, i.e., sameAs and instanceOf, respectively. An example showing such associations is given in Figure 1. Specifically, entity alignment is to find equivalent entities from different entity-level KGs, such as United States in DBpedia and United States of America in Wikidata. Type inference, on the other hand, associates a specific entity with a concept describing its type information, such as United States and Country. The main difference lies in whether such knowledge associations express the same level of specificity or not. Challenged by the diverse schemata, relational structures and granularities of knowledge representations in different KGs (Nikolov et al., 2009), traditional symbolic methods usually fall short of supporting heterogeneous knowledge association (Suchanek et al., 2011; Lacoste-Julien et al., 2013; Paulheim and Bizer, 2013). Recently, increasing efforts have been put into exploring embedding-based methods (Chen et al., 2017; Trivedi et al., 2018; Jin et al., 2019). Such methods capture the associations of entities or concepts in a vector space, which can help overcome the symbolic and schematic heterogeneity.

Embedding-based knowledge association methods still face challenges in the following aspects. (i) Hierarchical structures. A KG usually consists of many local hierarchical structures (Hu et al., 2015). Besides, a KG also usually comes with an ontology to manage the relations (e.g., subClassOf) of concepts (Hao et al., 2019), which typically form hierarchical structures as illustrated in Figure 1. It is particularly difficult to preserve such hierarchical structures in a linear embedding space (Nickel et al., 2014). (ii) High parameter complexity. To enhance the expressiveness of KG embeddings, many methods require high embedding dimensions, which inevitably causes excessive memory consumption and intractable parameter complexity.
For example, for the entity alignment method GCN-Align (Wang et al., 2018), the embedding dimension is selected to be as large as 1,000. Reducing the dimensions can effectively decrease memory cost and training time. (iii) Different scales. The KGs that we manipulate may differ in scale. For example, while the English DBpedia contains 4,233,000 entities, its ontology contains less than a thousand concepts. Capturing the associations between entities and concepts has to deal with drastically different scales of structures and search spaces, yet most existing methods do not consider such differences.
To tackle these challenges, we propose a novel hyperbolic knowledge association method, namely HyperKA, inspired by the recent success of hyperbolic representation learning (Nickel and Kiela, 2017; Dhingra et al., 2018; Tifrea et al., 2019). Unlike the Euclidean circle circumference that grows linearly w.r.t. the radius, the hyperbolic space grows exponentially with the radius. This property makes the hyperbolic geometry particularly suitable for embedding hierarchical structures whose sizes expand drastically along with their levels. It is also capable of achieving superior expressiveness at a low dimension. To leverage such merits, HyperKA employs a hyperbolic relational graph neural network (GNN) for KG embedding and captures multi-granular knowledge associations with a hyperbolic transformation between embedding spaces. For each KG, HyperKA first incorporates hyperbolic translational embeddings at the input layer of the GNN. Then, several hyperbolic graph convolution layers are stacked over the inputs to aggregate neighborhood information and obtain the final embeddings of entities or concepts. On top of the KG embeddings, a hyperbolic transformation is jointly trained to capture the associations. We conduct extensive experiments on entity alignment and type inference. HyperKA outperforms SOTA methods on both tasks at a moderate dimension (e.g., 50 or 75). Even with a small dimension (e.g., 10), our method still shows competitive performance.

Knowledge Association
Knowledge association aims at capturing the correspondence between structured knowledge that is described under the same or different specificity. In this paper, we consider two knowledge association tasks, i.e., entity alignment between two entity-level KGs and type inference from an entity-level KG to an ontological one. We define a KG as a 3-tuple K = {E, R, T}, where E denotes the set of objects such as entities or concepts, R denotes the set of relations, and T ⊆ E × R × E denotes the set of triples. Each triple τ = (h, r, t) records a relation r between the head and tail objects h and t. On top of this, the associations between two entity-level KGs (or between an entity-level KG and an ontological one) K1 and K2 are denoted as A = {(i, j) ∈ E1 × E2 | i → j}, where → denotes a kind of association, such as the sameAs relationship for entity alignment or the instanceOf relationship in the case of type inference. A small subset of associations A+ ⊂ A is usually given as training data, and we aim at finding the remaining ones.
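As an illustration only (the names `KG` and `is_well_formed` are ours, not from the paper), the 3-tuple definition K = {E, R, T} can be sketched as a tiny data structure:

```python
from dataclasses import dataclass

@dataclass
class KG:
    """K = {E, R, T}: a set of objects (entities or concepts),
    a set of relations, and a set of triples (h, r, t)."""
    objects: set
    relations: set
    triples: set  # subset of E x R x E

def is_well_formed(kg: KG) -> bool:
    # Every triple must relate two known objects via a known relation.
    return all(h in kg.objects and r in kg.relations and t in kg.objects
               for (h, r, t) in kg.triples)
```

An association set A would then pair objects across two such KGs.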

Related Work
Knowledge association tasks and methods. Entity alignment or type inference between KGs can be viewed as a knowledge association task. A typical method of entity alignment is MTransE (Chen et al., 2017). It jointly conducts translational embedding learning (Bordes et al., 2013) and alignment learning to capture the matches of entities based on embedding distances or transformations. As for type inference, JOIE (Hao et al., 2019) deploys a similar framework to learn associations between entities and concepts. Later studies explore three lines of techniques for improvement. (i) KG embedding. Besides translational embeddings, some studies employ other relational learning techniques such as circular correlations (Hao et al., 2019; Shi and Xiao, 2019), recurrent skipping networks, and adversarial learning (Pei et al., 2019a,b). Others employ various GNNs to seize the relatedness of entities based on neighborhood information, including GCN (Wang et al., 2018), GAT (Zhu et al., 2019; Mao et al., 2020) and relational GCNs (Wu et al., 2019a,b; Sun et al., 2020a). These techniques seek to better induce embeddings with more comprehensive relational modeling. Other studies on ontology embeddings consider relative positions between spheres as the hierarchical relationships of corresponding concepts. However, they are still limited to linear embeddings, and hence may easily fall short of preserving the deep hierarchical structures of KGs. (ii) Auxiliary information. Besides relational structures, some studies characterize entities based on auxiliary information, including numerical attributes (Trisedya et al., 2019), literals (Gesese et al., 2019) and descriptions (Chen et al., 2018; Jin et al., 2019). They capture associations based on alternative resources, but are also challenged by the limited availability of auxiliary information in many KGs (Speer et al., 2017; Mitchell et al., 2018). (iii) Semi-supervised learning.
Another group of studies seeks to infer associations with limited supervision, including self-learning (Sun et al., 2018; Zhu et al., 2019) and co-training (Chen et al., 2018). These methods are competent in inferring one-to-one entity alignment, but do not consider associations between entities and concepts. A recent survey by Sun et al. (2020b) systematically summarizes all three lines of studies.

Hyperbolic representation learning. Different from Euclidean embeddings, some studies characterize structures in hyperbolic embedding spaces, and use the non-linear hyperbolic distance to capture the relations between objects (Nickel and Kiela, 2017; Sala et al., 2018). This technique has shown promising performance in embedding hierarchical data, e.g., co-purchase records (Vinh et al., 2018), taxonomies (Le et al., 2019; Aly et al., 2019) and organizational charts (Chen and Quirk, 2019). Further work extends hyperbolic embeddings to capture relational hierarchies of sentences (Dhingra et al., 2018), neighborhood aggregation (Chami et al., 2019) and missing triples of a KG (Kolyvakis et al., 2020; Balazevic et al., 2019). These studies mainly focus on the scenario of a single independent structure. Learning associations across multiple KG structures with hyperbolic embeddings is still an unsolved issue, which is exactly the focus of this paper.

Hyperbolic Geometry
The hyperbolic space is one of the three kinds of isotropic spaces. Table 1 lists some key properties of the Euclidean (flat), spherical (positively curved) and hyperbolic (negatively curved) spaces. Compared with the Euclidean and spherical spaces, the amount of space covered by a hyperbolic geometry increases exponentially rather than polynomially w.r.t. the radius. This property allows us to capture KG structures at a very low dimension, and particularly suits those forming hierarchies. For the hyperbolic geometry, there are several important models including the hyperboloid model (Reynolds, 1993), the Klein disk model (Nielsen and Nock, 2014) and the Poincaré ball model (Cannon et al., 1997). In this paper, we choose the Poincaré ball model due to its feasibility for gradient optimization (Balazevic et al., 2019). Specifically, the Poincaré ball D^{n,c} = {x ∈ R^n | c‖x‖² < 1} is an open ball of radius 1/√c. For simplicity, we follow (Ganea et al., 2018) and let c = 1. We hereby introduce some basic operations of hyperbolic geometry, which we use extensively.

Hyperbolic distance. The distance between vectors u and v in the Poincaré ball is given by:

d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))).
When points move from the origin towards the ball boundary, their distance increases exponentially, offering a much larger volume of space for embedding learning.

Vector translation. The vector translation in the Poincaré ball is defined by the Möbius addition:

u ⊕ v = ((1 + 2⟨u, v⟩ + ‖v‖²)u + (1 − ‖u‖²)v) / (1 + 2⟨u, v⟩ + ‖u‖²‖v‖²).

Transformation. A transformation is the backbone of both GNNs (Chami et al., 2019) and transformation-based associations (Chen et al., 2017; Hao et al., 2019). The work in (Ganea et al., 2018) defines the matrix-vector multiplication between Poincaré balls using the exponential and logarithmic maps. The hyperbolic vectors are first projected into the tangent space at 0 using the logarithmic map (log_0 : D^{n,1} → T_{0,n} D^{n,1}), then multiplied by the transformation matrix as in the Euclidean space, and finally projected back onto the manifold with the exponential map (exp_0 : T_{0,n} D^{n,1} → D^{n,1}). Specifically, the two projections on a vector u ∈ D^{n,1} are defined as follows:

exp_0(u) = tanh(‖u‖) u / ‖u‖,    log_0(u) = artanh(‖u‖) u / ‖u‖.

Through such inverse projections, theoretically, we can apply any Euclidean counterpart operation on hyperbolic vectors. The transformation that maps a vector u ∈ D^{n,c} into D^{m,c} can be done using the Möbius version of matrix-vector multiplication:

M ⊗ u = exp_0(M log_0(u)).
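As a minimal NumPy sketch (ours, not the paper's code), the operations above can be written as follows; the distance is computed in the equivalent form d(u, v) = 2 artanh(‖−u ⊕ v‖), with c = 1 throughout:

```python
import numpy as np

def mobius_add(u, v):
    # Möbius addition in the Poincaré ball (c = 1)
    uv, u2, v2 = np.dot(u, v), np.dot(u, u), np.dot(v, v)
    return ((1 + 2 * uv + v2) * u + (1 - u2) * v) / (1 + 2 * uv + u2 * v2)

def hyp_dist(u, v):
    # hyperbolic distance: d(u, v) = 2 artanh(||-u (+) v||)
    return 2 * np.arctanh(np.linalg.norm(mobius_add(-u, v)))

def exp0(u):
    # exponential map at the origin: tangent space -> Poincaré ball
    n = np.linalg.norm(u)
    return np.tanh(n) * u / n if n > 0 else u

def log0(u):
    # logarithmic map at the origin: Poincaré ball -> tangent space
    n = np.linalg.norm(u)
    return np.arctanh(n) * u / n if n > 0 else u

def mobius_matvec(M, u):
    # Möbius matrix-vector multiplication: M (x) u = exp0(M log0(u))
    return exp0(M @ log0(u))
```

Note that `log0` and `exp0` are mutual inverses, so applying the identity matrix through `mobius_matvec` leaves a vector unchanged.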

Hyperbolic Knowledge Association
In this section, we introduce the technical details of HyperKA, the hyperbolic GNN-based representation learning method for knowledge association. Different from existing relational GNNs like R-GCN (Schlichtkrull et al., 2018), AVR-GCN (Ye et al., 2019) and CompGCN (Vashishth et al., 2020) that perform a relation-specific transformation on relational neighbors before aggregation, our method models relations as translations between entity vectors at the input layer, and performs neighborhood aggregation on top of them to derive the final entity embeddings. This allows our method to benefit from both relation translation and neighborhood aggregation without increasing computational complexity.

Hyperbolic Relation Translation
Given a triple from the KG, the translational technique (Bordes et al., 2013) models a relation as a translation vector between its head and tail entities. This technique has shown promising performance on many downstream tasks such as relation prediction, triple classification and entity alignment (Bordes et al., 2013; Chen et al., 2017). An apparent issue of using such translations to embed hierarchies in the Euclidean space is that it requires a large space to preserve the successive relation translations in a hierarchical structure. The data in a hierarchy grows exponentially w.r.t. its levels, while the amount of space grows linearly in a Euclidean space. As a result, the Euclidean embeddings usually come with a high dimension so as to achieve enough expressiveness for the aforementioned hierarchical structures. However, such modeling can be easily done in the hyperbolic space with a low dimension, where the distance between two points increases exponentially as they move towards the boundary of the ball. In our method, we seek to migrate the original translation operation to the hyperbolic space in a compact way. Accordingly, the following energy function is defined for a triple τ = (h, r, t):

f(τ) = d(u_h ⊕ u_r, u_t),

where u_h, u_r, u_t ∈ D^{n,c} denote the embeddings of h, r and t at the input layer, respectively. Our method is different from some existing methods (Balazevic et al., 2019; Kolyvakis et al., 2020) that use hyperbolic relation-specific transformations on entity representations and may easily cause high complexity overhead. The parameter complexity of our translation operation remains the same as that of TransE. We prefer low energy for positive triples and high energy for negatives. Hence, we minimize the following contrastive learning loss:

L1 = Σ_{τ ∈ T} f(τ) + Σ_{τ' ∈ T−} [λ1 − f(τ')]+,

where T− denotes the set of negative triples generated by corrupting positive triples (Sun et al., 2018), and λ1 is the margin, i.e., we expect f(τ') > λ1 for negative triples.
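The translation energy f(τ) = d(u_h ⊕ u_r, u_t) and the margin-based loss can be sketched as follows (illustrative NumPy code of ours, not the paper's implementation):

```python
import numpy as np

def mobius_add(u, v):
    # Möbius addition in the Poincaré ball (c = 1)
    uv, u2, v2 = np.dot(u, v), np.dot(u, u), np.dot(v, v)
    return ((1 + 2 * uv + v2) * u + (1 - u2) * v) / (1 + 2 * uv + u2 * v2)

def hyp_dist(u, v):
    # hyperbolic distance d(u, v) = 2 artanh(||-u (+) v||)
    return 2 * np.arctanh(np.linalg.norm(mobius_add(-u, v)))

def energy(h, r, t):
    # f(tau) = d(h (+) r, t): translate the head by the relation,
    # then measure the hyperbolic distance to the tail
    return hyp_dist(mobius_add(h, r), t)

def translation_loss(pos, neg, margin):
    # low energy for positive triples; negatives pushed beyond the margin
    loss = sum(energy(h, r, t) for h, r, t in pos)
    loss += sum(max(0.0, margin - energy(h, r, t)) for h, r, t in neg)
    return loss
```

A positive triple whose tail equals the translated head contributes zero energy, while a distant corrupted tail beyond the margin contributes nothing to the hinge term.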

Hyperbolic Neighborhood Aggregation
GNNs (Kipf and Welling, 2017) have recently become the paradigm for graph representation learning. Particularly, for the entity alignment task, the main merit of GNN-based methods lies in capturing the high-order proximity of entities based on their neighborhood information (Wang et al., 2018). Inspired by the recent proposal of hyperbolic GNNs (Chami et al., 2019), we seek to use the hyperbolic graph convolution to learn embeddings for knowledge association. The typical message passing process of GNNs consists of two phases, i.e., aggregating neighborhood features and combining node and neighborhood information:

u_N(i)^(l) = AGG({u_j^(l−1) | j ∈ N(i)}),    u_i^(l) = COMBINE(u_i^(l−1), u_N(i)^(l)),

where N(i) denotes the neighbors of object i. Different aggregation and combination functions lead to different variants of GNNs. We choose the message passing technique that highlights the representations of central objects, to benefit from the translational embeddings at the input layer. Specifically, the message passing process of our hyperbolic GNN from the (l−1)-th layer to the l-th layer is defined as follows:

u_i^(l) = σ(W^(l) ⊗ AGG({u_j^(l−1) | j ∈ N(i) ∪ {i}}) ⊕ b^(l)),

where W^(l) is the transformation matrix and b^(l) is the bias vector at the l-th layer, and σ is an activation function. We adopt mean-pooling in the tangent space as the AGG function.
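One layer of this aggregation scheme can be sketched as follows (our illustration, not the paper's code; we assume mean-pooling in the tangent space at the origin, with the central node included in its own neighborhood):

```python
import numpy as np

def exp0(u):
    # tangent space -> Poincaré ball
    n = np.linalg.norm(u)
    return np.tanh(n) * u / n if n > 0 else u

def log0(u):
    # Poincaré ball -> tangent space
    n = np.linalg.norm(u)
    return np.arctanh(n) * u / n if n > 0 else u

def gnn_layer(U, neighbors, W, b):
    """One hyperbolic graph convolution layer (sketch).
    U: (N, n) array of node embeddings in the Poincaré ball;
    neighbors[i]: neighbor indices of node i (the node itself is
    added so the central object's representation is highlighted)."""
    out = []
    for i in range(len(U)):
        idx = [i] + list(neighbors[i])
        # mean-pooling of the neighborhood in the tangent space
        agg = np.mean([log0(U[j]) for j in idx], axis=0)
        # linear transform + bias + tanh activation, mapped back to the ball
        out.append(exp0(np.tanh(W @ agg + b)))
    return np.array(out)
```

Because the final step applies `exp0`, every output embedding stays strictly inside the unit ball.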

Hyperbolic Knowledge Projection
Once each KG is embedded in a hyperbolic space, the next step is to capture the associations between different KGs. Many previous studies jointly embed different KGs into a unified space (Wang et al., 2018) and infer the associations based on the similarity of entity embeddings. However, pursuing similar embeddings in a shared space is ill-posed for KGs with inconsistent structures, especially in cases with different scales of knowledge representations. We hereby tackle this challenge with a knowledge projection technique in the hyperbolic space. Given a seed knowledge association (i, j) ∈ A+, we use the Möbius multiplication to project u_i towards the target u_j in the other space. The transformation error is defined as the hyperbolic distance between the projected embeddings:

f(i, j) = d(M ⊗ u_i, u_j),

where M ∈ R^{n×m} serves as the linear transformation from the hyperbolic space D^{n,c} of K1 to D^{m,c} of K2. The two hyperbolic spaces are not necessarily of the same dimension, i.e., we may have n ≠ m. The projection loss is defined as follows:

L2 = Σ_{(i,j) ∈ A+} f(i, j) + Σ_{(i',j') ∈ A−} [λ2 − f(i', j')]+,

where A− is the set of negative samples of knowledge associations, and λ2 > 0 is a margin.
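The projection and its margin-based loss can be sketched as follows (illustrative NumPy code of ours; for convenience the matrix is stored as m × n so that `M @ u` maps an n-dimensional vector to m dimensions):

```python
import numpy as np

def exp0(u):
    n = np.linalg.norm(u)
    return np.tanh(n) * u / n if n > 0 else u

def log0(u):
    n = np.linalg.norm(u)
    return np.arctanh(n) * u / n if n > 0 else u

def mobius_add(u, v):
    uv, u2, v2 = np.dot(u, v), np.dot(u, u), np.dot(v, v)
    return ((1 + 2 * uv + v2) * u + (1 - u2) * v) / (1 + 2 * uv + u2 * v2)

def hyp_dist(u, v):
    return 2 * np.arctanh(np.linalg.norm(mobius_add(-u, v)))

def proj_error(M, ui, uj):
    # f(i, j) = d(M (x) u_i, u_j): Möbius multiplication, then distance
    return hyp_dist(exp0(M @ log0(ui)), uj)

def projection_loss(M, pos, neg, margin):
    # pull seed pairs together; push negative pairs beyond the margin
    loss = sum(proj_error(M, ui, uj) for ui, uj in pos)
    loss += sum(max(0.0, margin - proj_error(M, ui, uj)) for ui, uj in neg)
    return loss
```

With the identity transformation and identical endpoints, the error is zero, so the hinge term for such a pair used as a negative equals the full margin.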

Training
The overall loss of the proposed method is the combination of the relation translation loss L1 and the knowledge projection loss L2, which is given by:

L = L1 + L2.

The embedding vectors are initialized using the Xavier normal initializer. Then, we use the exponential map to project the vectors into the Poincaré ball. We adopt Riemannian gradient descent (Bonnabel, 2013) to optimize the loss function. Let θ be the trainable parameters. The Riemannian gradient ∇_H at θ_t is computed as follows:

∇_H = ((1 − ‖θ_t‖²)² / 4) ∇_E,

where ∇_E denotes the Euclidean gradient. We use Adam (Kingma and Ba, 2015) as the underlying optimizer.
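The Riemannian rescaling of the Euclidean gradient is a one-liner (a sketch; the function name is ours):

```python
import numpy as np

def riemannian_grad(theta, grad_e):
    # Poincaré-ball Riemannian gradient (c = 1): rescale the Euclidean
    # gradient by the inverse metric factor (1 - ||theta||^2)^2 / 4.
    return ((1 - np.dot(theta, theta)) ** 2 / 4) * grad_e
```

Near the origin the factor approaches 1/4; near the ball boundary it vanishes, so updates slow down exactly where distances blow up.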

Experiments
We evaluate the proposed method HyperKA on two knowledge association tasks, i.e., entity alignment (Section 5.1) and entity type inference (Section 5.2). The source code is publicly available.

Entity Alignment
Entity alignment aims at matching the counterpart entities that describe the same real-world identity across two entity-level KGs. The inference of entity alignment is based on the embedding distances.

Experimental Setup
Datasets. We use the widely-adopted entity alignment dataset DBP15K for evaluation. It is extracted from DBpedia (Lehmann et al., 2015) and consists of three settings, namely ZH-EN (Chinese-English), JA-EN (Japanese-English) and FR-EN (French-English). Each setting contains 15 thousand pairs of aligned entities. The dataset splits are consistent with those in previous studies (Sun et al., 2018), resulting in 30% of the entity alignment pairs being used in training. The statistics of DBP15K are reported in Appendix A.

Baselines. We compare HyperKA with nine recent structure-based entity alignment methods, including five relation-based methods (e.g., MTransE (Chen et al., 2017)) and four GNN-based methods. We also compare with the semi-supervised methods BootEA (Sun et al., 2018), NAEA (Zhu et al., 2019) and TransEdge, as they achieve high performance by bootstrapping from unlabeled entity pairs. We describe the implementation of the semi-supervised HyperKA variant and the corresponding experimental results in Section 5.1.5.
Model configuration. In the main experiment, we use two GNN layers, and set the dimension of all layers in HyperKA to 75. The dimensions for the two KGs are the same, i.e., n = m = 75. This is the smallest dimension adopted by any baseline method. Note that we also evaluate our method with a range of dimensions from 10 to 150 to assess its robustness. We report in Appendix B the implementation details of HyperKA and the selected values of the hyper-parameters, including the learning rate, the batch size, the margin values λ1 and λ2, etc. Following convention, we report three metrics for entity alignment, i.e., H@1 (precision), H@10 (the proportion of correct alignments ranked within the top 10) and MRR (mean reciprocal rank). Higher scores on these metrics indicate better performance.

Main Results
We report the entity alignment results on DBP15K in Table 2. Note that the embedding dimension for HyperKA is set to 75 (the smallest setting among the baseline methods). We can observe that HyperKA consistently outperforms all baseline methods on all three datasets, especially the GNN-based ones. For example, on DBP15K FR-EN, the H@1 score of HyperKA reaches 0.597, surpassing MuGNN by 0.102 and AliNet by 0.045, even though HyperKA uses a smaller dimension than these methods. Compared against the baselines with a dimension of 75, HyperKA also achieves much better performance. For instance, on the ZH-EN dataset, it surpasses AlignE by 0.1 in H@1. Overall, HyperKA significantly outperforms the SOTA Euclidean methods, while using the same or much smaller dimension settings. This shows that the hyperbolic embeddings have superior expressiveness over the linear embeddings. As for the comparison between the two variants of HyperKA, we can see that the one with relation embeddings performs notably better. This demonstrates the effectiveness of incorporating relation translation into GNNs.

Analysis on Dimensions
We further analyze the effect of different dimensions on performance and training efficiency. We report the H@1 results of different dimensions in Table 3. We observe that the H@1 scores of HyperKA drop along with the decrease of embedding dimensions. This observation is generally in line with our expectations, because a small dimension limits the expressiveness of KG embeddings. However, HyperKA still exhibits satisfying performance at very small dimensions in comparison to other methods, such as under the dimensions of 10 and 25. Specifically, HyperKA with dimension 25 even outperforms a number of methods in Table 2 with much higher dimensions, e.g., AlignE, GCN-Align and KECG. Note that HyperKA with dimension 35 achieves very similar results to AliNet with layer dimensions of (500, 400, 300) and also outperforms the other baseline methods. HyperKA with dimension 150 establishes a new SOTA performance for structure-based entity alignment. Overall, the low-dimensional hyperbolic representations of HyperKA demonstrate more precise and robust inference of counterpart entities across KGs. We also report the GPU memory costs and training time under different dimensions. A higher dimension leads to more GPU memory cost and training time, although it also leads to better performance as shown in Table 3. HyperKA can achieve satisfying performance with limited GPU memory costs.

Analysis on Expressiveness
To further understand the expressiveness of our hyperbolic KG embeddings, we compare the dimensions and GPU memory costs of HyperKA, its Euclidean counterpart HyperKA (Euc.) and AliNet when the three achieve similar performance. HyperKA (Euc.) is implemented by replacing hyperbolic operations with their corresponding Euclidean operations. For example, the Möbius addition ⊕ is replaced with vector addition +. We select the dimension of HyperKA (Euc.) in {75, 100, 150, 200, 300, 500}, and its best-performing model, under the dimension of 200, achieves performance similar to AliNet. By contrast, HyperKA only needs a dimension of 35, as shown in Table 3. Their GPU memory costs on ZH-EN are shown in Figure 3. We observe similar results on JA-EN and FR-EN. Specifically, HyperKA only costs about 45.09% of the memory of HyperKA (Euc.) and 29.97% of that of AliNet to achieve similar performance. This shows that hyperbolic embeddings can achieve satisfying expressiveness with a small dimension and efficient memory costs. We report in Table 4 the entity alignment results of HyperKA (Euc.) on DBP15K. We find that HyperKA (Euc.) with a high dimension (e.g., 300) can also achieve performance similar to HyperKA at a low dimension of 75. This is because the Euclidean embeddings also have enough expressiveness to represent hierarchical structures if given a large dimension. However, hyperbolic embeddings only need a small dimension, bringing along the substantial advantage of saving memory.

Semi-supervised Entity Alignment
Semi-supervised entity alignment methods use self-training or co-training techniques to augment training data by iteratively finding new alignment labels (Sun et al., 2018; Zhu et al., 2019). Following BootEA (Sun et al., 2018), we use the self-training strategy to iteratively propose more aligned entity pairs A' = {(i, j) | d(M ⊗ u_i, u_j) < ε} to augment the training data, where ε is a distance threshold. As these proposed pairs inevitably contain errors (Sun et al., 2018), we apply a small weight μ when using them for training, resulting in the following loss:

L_semi = μ Σ_{(i,j) ∈ A'} d(M ⊗ u_i, u_j).

Accordingly, the semi-supervised HyperKA variant minimizes the joint loss L + L_semi. The selected settings are ε = 0.25 and μ = 0.05, and the training takes 800 epochs. Table 5 lists the H@1 and MRR results, where HyperKA shows drastic improvement over BootEA and NAEA. It also achieves noticeably better H@1 than the latest semi-supervised method TransEdge, especially on the FR-EN setting. The good performance of TransEdge comes with prohibitive memory overhead. Its parameter complexity is O(2 N_e n + N_r n), where N_e and N_r denote the numbers of entities and relations in the KGs, respectively, and n is the dimension. By contrast, the complexity of our method is O(N_e n + N_r n + L n²), and we have N_e ≫ L n in practice, where L is the number of GNN layers. In this case, HyperKA outclasses TransEdge in both effectiveness and efficiency. Compared with our results in Table 2, we find that self-training, being an optional and compatible technique, brings an improvement of more than 0.14 on H@1.

Figure 4: Visualization of the embeddings generated by HyperKA for two related concepts "Film" and "Album" along with their entities in DB111K-174. The black up triangle denotes "Film" and the surrounding red ones are its entities. The black down triangle denotes "Album" and the blue ones are its entities.
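The self-training step can be sketched as follows (our illustration, not the paper's code; we additionally assume a mutual-nearest-neighbor check, a common heuristic that the paper does not necessarily use):

```python
import numpy as np

def propose_pairs(src_emb, tgt_emb, dist_fn, eps):
    """Propose new alignment pairs whose distance is below the
    threshold eps; a pair is kept only if the two entities are
    mutually nearest neighbors (a precaution against noisy labels)."""
    proposed = []
    for i, u in enumerate(src_emb):
        d = [dist_fn(u, v) for v in tgt_emb]
        j = int(np.argmin(d))
        # check that i is also the nearest source entity of target j
        back = [dist_fn(tgt_emb[j], w) for w in src_emb]
        if int(np.argmin(back)) == i and d[j] < eps:
            proposed.append((i, j))
    return proposed
```

In the full method, `dist_fn` would be the hyperbolic distance between projected embeddings, and the proposed pairs then enter training with the small weight μ.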

Type Inference
The main difference between type inference and entity alignment lies in that, for the former, the knowledge to associate differs greatly in scale and specificity. This causes many related methods based on shared embedding spaces to fall short.

Experimental Setup
Datasets. The experiments for this task are conducted on the datasets YAGO26K-906 and DB111K-174 (Hao et al., 2019), which are extracted from YAGO and DBpedia, respectively. Each dataset has an entity-level KG and an ontological KG for concepts (types). Their statistics are reported in Appendix A. To compare with the previous work (Hao et al., 2019), we use the original data splits, and report H@1, H@3 and MRR results. The hyper-parameter settings are listed in Appendix B.

Baselines. So far, only a few methods have been applied to the type inference task on KGs. We compare with the SOTA method JOIE (Hao et al., 2019), and four other baseline methods, i.e., TransE (Bordes et al., 2013), DistMult (Yang et al., 2015), HolE (Nickel et al., 2016) and MTransE (Chen et al., 2017), that are reported in the same paper. For JOIE, we choose its best-performing variant based on the translational encoder with cross-view transformation. A related method (Jin et al., 2019) is not taken into comparison as it requires entity attributes that are unavailable in our problem setting.

Main Results
In this task, the embedding dimensions for entities and concepts are different, i.e., n > m, as an entity-level KG usually contains many more entities than there are concepts in the related ontological (or concept-level) KG. For HyperKA, we evaluate two dimension settings: n = 75, m = 15 and n = 150, m = 30. Both are much smaller than the dimensions of the baseline methods. The results are reported in Table 6. We can observe that HyperKA (75, 15) outperforms JOIE in terms of H@1 on both datasets, especially on DB111K-174, although HyperKA uses a much smaller dimension. For example, the H@1 score of HyperKA (75, 15) on DB111K-174 reaches 0.778, a gain of 0.022 over JOIE in its best setting. HyperKA (150, 30) achieves the best performance in terms of H@1 and MRR. We also try the dimension setting of (300, 50), but observe no further improvement. We believe this is because the setting (150, 30) is sufficient for type inference, as the concept-level KG is small. Meanwhile, once we apply the same small-dimension setting (75, 15) as HyperKA to the baseline methods, their performance becomes much worse. For example, MTransE achieves no more than 0.357 in H@1 using this small dimension.

Case Study
For the case study, we visualize the embeddings of two related concepts "Film" and "Album" in DB111K-174, along with their associated entities, in the PCA-projected space in Figure 4. Although these two groups of entities are closely related, the embeddings learned by HyperKA are able to clearly distinguish between them. We can see that the entities of the same type are embedded closely after transformation, while the two clusters are generally well differentiated by a clear margin (with only a few exceptions). This displays how the hyperbolic transformation is able to capture the multi-granular associations while preserving the gap between the entities associated with different concepts.

Conclusion and Future Work
We propose a method to capture knowledge associations with a new hyperbolic GNN-based representation learning model. The proposed HyperKA method extends translational and GNN-based techniques to hyperbolic spaces, and captures associations by a hyperbolic transformation. Our method outperforms SOTA baselines using lower embedding dimensions on both entity alignment and type inference. For future work, we plan to incorporate hyperbolic RNNs (Ganea et al., 2018) to encode auxiliary information for zero-shot entity and concept representations. Another meaningful direction is to use HyperKA to infer the associations between snapshots in temporally dynamic KGs (Xu et al., 2020). We also seek to investigate the use of HyperKA for cross-domain representations of biological and medical knowledge (Hao et al., 2020).


A Dataset Statistics
Table 7 lists the statistics of the entity alignment dataset DBP15K, as well as the two type inference datasets YAGO26K-906 and DB111K-174 (Hao et al., 2019). For a fair comparison, we reuse the original splits of associations in these datasets for training and evaluation, i.e., 30% of the alignment pairs in DBP15K and around 60% of the associations in YAGO26K-906 and DB111K-174 as training data. We can see that the two KGs of the type inference datasets differ far more in the scales of objects and triples than those of the entity alignment dataset, which brings along more challenges to knowledge association.

B Hyper-parameter Settings
In this section, we report the implementation details and hyper-parameter settings of HyperKA on the two knowledge association tasks. We select each hyper-parameter setting within a wide range of values. We use cross-domain similarity local scaling (CSLS) for the entity alignment task. The training takes 800 epochs on DBP15K, 60 epochs on YAGO26K-906 and 100 epochs on DB111K-174. The activation function used in our method is tanh.