Knowledge Graph Alignment with Entity-Pair Embedding

Knowledge Graph (KG) alignment aims to match entities in different KGs, which is important for knowledge fusion and integration. Recently, a number of embedding-based approaches for KG alignment have been proposed and achieved promising results. These approaches first embed entities in low-dimensional vector spaces, and then obtain entity alignments by computations on their vector representations. Although continuous improvements have been achieved by recent work, the performance of existing approaches is still not satisfactory. In this work, we present a new approach that directly learns embeddings of entity-pairs for KG alignment. Our approach first generates a pair-wise connectivity graph (PCG) of two KGs, whose nodes are entity-pairs and whose edges correspond to relation-pairs; it then learns node (entity-pair) embeddings of the PCG, which are used to predict equivalent relations of entities. To obtain desirable embeddings, a convolutional neural network is used to generate similarity features of entity-pairs from their attributes, and a graph neural network is employed to propagate the similarity features and produce the final embeddings of entity-pairs. Experiments on five real-world datasets show that our approach achieves state-of-the-art KG alignment results.


Introduction
Knowledge graphs (KGs) have been built and applied in several domains, including question answering, recommendation (Sun et al., 2018b), and information extraction (Yang and Mitchell, 2017). Most existing KGs are built separately by different organizations, using different data sources and languages. KGs are therefore heterogeneous: the same entity may appear in different KGs in different surface forms. On the other hand, KGs can be complementary to each other; knowledge about the same entity may be distributed across several KGs. To handle the heterogeneity problem and integrate knowledge in different KGs, it is essential to perform KG alignment, i.e., matching entities in separate KGs.
Recently, KG embedding models have been explored for the problem of KG alignment. A number of embedding-based approaches have been proposed, including MTransE (Chen et al., 2017), JAPE, IPTransE (Zhu et al., 2017), GCN-Align (Wang et al., 2018), RDGCN, and MultiKE (Zhang et al., 2019). These approaches first embed entities in low-dimensional vector spaces, and then obtain entity alignments by computations on their vector representations. Compared with traditional similarity-based approaches, embedding-based ones can effectively model different kinds of information in KGs and align entities without manually designed similarity features. Most recently, continuous improvements have been achieved by combining multiple kinds of information in KGs or by using more sophisticated embedding models. However, the performance of most approaches is still not satisfactory: according to the results in a recent work (Zhang et al., 2019), a traditional unsupervised alignment approach, LogMap (Jiménez-Ruiz and Cuenca Grau, 2011), outperforms most existing embedding-based approaches.

To get more accurate alignment results, we propose an entity-pair embedding approach for KG alignment (EPEA). Instead of learning embeddings of single entities, our approach directly learns representations of entity-pairs. Similarity features of entities' attribute information are automatically extracted and then propagated using the structure information of entities. Equivalent relations of entities can be accurately predicted based on the learned embeddings of entity-pairs. Specifically, our work makes the following contributions:

• We introduce the definition of the pairwise connectivity graph (PCG) of KGs, whose nodes are entity-pairs and whose edges correspond to relation-pairs. We solve the KG alignment problem via node embedding of the PCG.
• We propose a similarity feature extraction method based on convolutional neural network (CNN), which automatically generates feature vectors of entity-pairs encoding their attribute similarities.
• We propose a graph neural network (GNN) with edge-aware attentions to propagate similarity features in the PCG. Similarity features are propagated among the neighbors of entity-pairs, which incorporates structure similarity into the embeddings of entity-pairs.
• In the experiments on aligning real-world KGs, our approach outperforms the compared approaches, and achieves the state-of-the-art results.
The rest of this paper is organized as follows: Section 2 formalizes the entity alignment problem, Section 3 describes our proposed approach, Section 4 presents the evaluation results, Section 5 discusses some related work, and Section 6 is the conclusion.

KG and KG Alignment
KGs represent structural information about real-world entities as triples of the form ⟨s, p, o⟩. In this work, our KG alignment model considers both relational and attributional triples in KGs. Relational triples describe relations between entities, and attributional triples describe attributes of entities. We formally represent a KG as G = (E, R, A, L, T), where E, R, A and L are the sets of entities, relations, attributes, and literals, and T is the set of triples. Given two KGs G = (E, R, A, L, T) and G′ = (E′, R′, A′, L′, T′), the task of KG alignment is to find, for each entity in E, the equivalent entity in E′.

Pair-wise Connectivity Graph
Pair-wise connectivity graph (PCG) can capture interactions of node-pairs of two directed graphs (Wang et al., 2012; Melnik et al., 2002). In this work, we define the PCG of KGs. For two KGs, each node in their PCG corresponds to an entity-pair from the two KGs, and each edge connecting two nodes reflects the correlation between two entity-pairs. By generating the PCG of two KGs, the problem of KG alignment is transformed to node embedding and classification (i.e., equivalent or non-equivalent) in the PCG.

For two KGs G = (E, R, A, L, T) and G′ = (E′, R′, A′, L′, T′), their PCG is G(G, G′) = (Ê, R̂, T̂), where Ê, R̂ and T̂ are the sets of nodes, edge types and edges. Each element in Ê corresponds to an entity-pair between G and G′, and each element in R̂ corresponds to a relation-pair. T̂ is a set of typed edges between nodes; each edge is established as follows:

((e1, e1′), (r, r′), (e2, e2′)) ∈ T̂ ⟺ (e1, r, e2) ∈ T ∧ (e1′, r′, e2′) ∈ T′. (1)

[Figure 1: Pair-wise connectivity graph of G and G′.]

Figure 1 shows an example of the PCG of two KGs, each having three entities. Their PCG contains nine nodes, representing all the possible entity-pairs of the two KGs, and four typed edges. Since the PCG represents the connections of entity-pairs between two KGs, we use it to capture the interactions of possible entity alignments. In our approach, the problem of KG alignment is solved via node embedding of the PCG; equivalent relations of entities are predicted based on the learned embeddings.

Figure 2 shows the framework of our approach. Given two KGs, our approach first generates their PCG. Then, a CNN-based feature extraction method generates node representations from the attribute information of entities. Finally, an attention-based feature propagation is performed over the PCG to incorporate structure information into the node representations, and entity alignments are predicted based on the learned embeddings of entity-pairs. In the following, we present our approach in detail.
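As an illustration, the PCG edge-construction rule can be sketched in a few lines. This is a naive version that pairs every entity of the two KGs (the next section restricts the node set); the function name and triple format are our own illustrative choices, not the paper's code:

```python
from itertools import product

def build_pcg(triples1, triples2):
    """Build the pair-wise connectivity graph (PCG) of two KGs.

    triples1/triples2: relational triples (head, relation, tail).
    An edge of type (r, r') links node (h, h') to node (t, t')
    whenever (h, r, t) is in KG1 and (h', r', t') is in KG2.
    """
    nodes, edges = set(), set()
    for (h, r, t), (h2, r2, t2) in product(triples1, triples2):
        src, dst = (h, h2), (t, t2)
        nodes.update([src, dst])
        edges.add((src, (r, r2), dst))
    return nodes, edges
```

For two KGs with three entities each, this produces the nine candidate nodes and the typed edges illustrated in Figure 1.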

Generating the PCG
To generate the PCG of two KGs, we could first pair all the entities from the two KGs as nodes, and then use Equation 1 to generate edges between nodes. However, KGs usually contain large numbers of entities, so the PCG of two large-scale KGs would contain a huge number of nodes. To avoid pairing all the entities and to control the size of the PCG, our approach selects entity-pairs with high equivalence likelihood as nodes in the PCG. Specifically, Locality-Sensitive Hashing (LSH) is employed to efficiently find similar entities between the two KGs; LSH hashes similar items into the same bucket with higher probability than dissimilar items. Before using LSH, our approach first uses one of the following methods to generate set-representations of entities, which are used in the hashing process.
• N-grams of Names. If entity names are available and in the same language, this method generates a set of character-level n-grams of entities' names as the set-representations of entities.
• N-grams of Attributes. This method treats attribute values of an entity as text strings, and generates character-level n-grams of all the attribute values for each entity. All the n-grams are then merged into a set as the representation of the entity.
• Seeding alignments. If seeding alignments between two KGs are available, a set of aligned entities in an entity's neighborhood will be taken as the set-representation.
After being represented as sets of elements (n-grams or neighboring entities) by one of the above methods, all the entities in the two KGs are hashed using LSH. To select entity-pairs as nodes in the PCG of G and G′, our approach efficiently finds, for each entity e ∈ G, a set of entities C_e = {e′ | e′ ∈ G′, J(e, e′) > δ} as its alignment candidates, where J(e, e′) is the Jaccard similarity of two entities and δ is a predefined threshold. Entity e is then paired with all the entities in C_e to form the nodes in the PCG.
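The candidate-selection step can be sketched as follows. For brevity we compare set-representations directly by Jaccard similarity rather than through LSH buckets; the function names, the use of entity names as set-representations, and the default threshold are illustrative assumptions:

```python
def char_ngrams(s, n=3):
    """Set of character-level n-grams of a string (lower-cased)."""
    s = s.lower()
    return {s[i:i + n] for i in range(max(len(s) - n + 1, 1))}

def jaccard(a, b):
    """Jaccard similarity of two sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def candidates(entity_names1, entity_names2, delta=0.3, n=3):
    """For each entity of KG1, keep the KG2 entities whose n-gram
    Jaccard similarity exceeds delta. (The paper uses LSH to avoid
    this quadratic comparison; we compare directly for clarity.)"""
    reps2 = {e: char_ngrams(name, n) for e, name in entity_names2.items()}
    cand = {}
    for e, name in entity_names1.items():
        rep = char_ngrams(name, n)
        cand[e] = [e2 for e2, rep2 in reps2.items()
                   if jaccard(rep, rep2) > delta]
    return cand
```

Each entity is then paired with its surviving candidates to form the PCG nodes.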

Attribute Feature Generation
Entities having the same or similar attribute values tend to be equivalent. Therefore, comparing the attribute values of two entities is important for discovering entity alignments. In traditional approaches, attributes have to be matched manually first; then the values of corresponding attributes can be compared to get similarities between entities. In some embedding-based approaches, attribute types or values are used to generate attribute embeddings, which are integrated with the structure embeddings of entities to get more accurate entity alignments. In this work, we extract similarity features from entities' attributes automatically.

CNN-based Feature Extraction
We propose an attribute feature extraction method based on Convolutional Neural Network (CNN). Our method can automatically obtain useful similarity features of entity-pairs without any human effort. It generates a vector representation of each entity-pair in the PCG, which captures attribute similarities of two entities.
Given an entity-pair (e, e′) with e ∈ G and e′ ∈ G′, let A = {A_1, ..., A_n} and A′ = {A′_1, ..., A′_m} be the sets of all attributes in G and G′, respectively, and let A_i(e) denote the value of the i-th attribute of e and A′_j(e′) the value of the j-th attribute of e′. To capture various similarities between e and e′, a similarity matrix M_{m×n} is computed by comparing the values of every attribute pair of the two entities; each element m_ij in M is the similarity of A_i(e) and A′_j(e′). Attribute values in KGs may have various types, for example date, time, float, integer and string. For simplicity and effectiveness, our approach treats all attribute values as strings, and computes similarities of attribute values as n-gram-based Jaccard similarities:

J(s, t) = |NG(s) ∩ NG(t)| / |NG(s) ∪ NG(t)|, (2)

where NG(s) and NG(t) are the n-grams of strings s and t.

[Figure 2: The framework of our approach: attribute similarities and name similarities are combined (⊕), propagated through GNN layers, and used for alignment prediction.]
Usually, an entity is described by only a small number of attributes in a KG, so for a given entity the values of many attributes are empty. The similarity matrix of two entities is therefore usually sparse, with a large proportion of zeros. Meanwhile, the similarities between some attributes may be useless for detecting alignments. To automatically find useful similarity patterns of attribute values, we use a CNN model to encode the sparse similarity matrix into a short, dense vector.
The input of the CNN is the similarity matrix M of two entities, and two convolution layers are used to generate a dense similarity vector from M. The output of the l-th convolution layer is computed as:

X_k^(l) = σ(W_k^(l) ⊗ X^(l−1) + b_k^(l)),

where X^(l−1) is the input of the l-th layer (X^(0) = M for the first layer); multiple filters are used to extract useful similarity features from the input, with W_k^(l) and b_k^(l) the weight and bias of the k-th filter in the l-th layer; ⊗ is the convolution operator. A max-pooling layer follows each convolution layer, and the output features of the last max-pooling layer form the similarity vector of the entity-pair.
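The construction of the sparse similarity matrix M fed to the CNN can be sketched as follows. The function names, the dictionary layout of attribute values, and the n-gram size are our own illustrative assumptions:

```python
import numpy as np

def ngrams(s, n=2):
    """Set of character-level n-grams of a value, treated as a string."""
    s = str(s).lower()
    return {s[i:i + n] for i in range(max(len(s) - n + 1, 1))}

def similarity_matrix(attrs_e, attrs_e2, attributes1, attributes2, n=2):
    """Similarity matrix M of an entity-pair (Equation 2): M[j, i] is
    the n-gram Jaccard similarity of the i-th attribute value of e and
    the j-th attribute value of e'. Missing values yield 0, so M is
    typically sparse."""
    M = np.zeros((len(attributes2), len(attributes1)))
    for i, a in enumerate(attributes1):
        for j, b in enumerate(attributes2):
            if a in attrs_e and b in attrs_e2:
                ga, gb = ngrams(attrs_e[a], n), ngrams(attrs_e2[b], n)
                M[j, i] = len(ga & gb) / len(ga | gb)
    return M
```

The CNN then compresses this matrix into a short, dense similarity vector.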

Name Similarity Features
In this work, the name or label of an entity is treated as a special attribute, which is an important clue for determining whether two entities are equivalent. If entity names are available in the KGs, our approach computes a name similarity vector for each entity-pair, which is concatenated with the similarity vector generated by the CNN model. To capture similarity features of entity names from different aspects, we use multiple string-based similarity metrics that are widely used in traditional similarity-based alignment approaches. If the entity names of the two KGs are in different languages, a machine translation tool is used to translate names from one language to the other. Let s and t be the names of two entities; the following similarity measures are used in our approach.
• String Equality. It measures whether two strings are the same: z_1(s, t) = 1 if s = t, and z_1(s, t) = 0 otherwise.
• Edit Distance. It evaluates the minimal number of edit operations that have to be applied to one of the strings to obtain the other: z_2(s, t) = 1 − |{ops}| / max(len(s), len(t)), where {ops} denotes the set of operations and len(·) is the string length.
• Jaccard Similarity. It computes the Jaccard similarity of the character-level n-grams of two strings, as defined in Equation 2; we denote this similarity as z_4(s, t).
• Substring Similarity. It is computed from the longest common substring of two strings: z_3(s, t) = 2 · len(LCS(s, t)) / (len(s) + len(t)), where LCS(s, t) is the longest common substring of s and t.
Let z = [z_1, z_2, z_3, z_4] denote the name similarities of an entity-pair; it is concatenated with the similarity vector x generated by the CNN to form the initial feature vector of the entity-pair. The feature vectors of all entity-pairs are then passed to an attention-based propagation process to generate the final embeddings.
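The four name similarity features can be sketched directly. The helper names are illustrative; the substring measure follows the common 2·|LCS|/(|s|+|t|) normalization, which is our assumption for the unnumbered formula above:

```python
def edit_distance(s, t):
    """Standard dynamic-programming Levenshtein distance."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]

def lcs_len(s, t):
    """Length of the longest common (contiguous) substring."""
    best, prev = 0, [0] * (len(t) + 1)
    for cs in s:
        cur = [0]
        for j, ct in enumerate(t, 1):
            cur.append(prev[j - 1] + 1 if cs == ct else 0)
            best = max(best, cur[-1])
        prev = cur
    return best

def ngrams(s, n=2):
    return {s[i:i + n] for i in range(max(len(s) - n + 1, 1))}

def name_similarities(s, t, n=2):
    """z = [z1, z2, z3, z4]: equality, edit-distance similarity,
    substring similarity, and n-gram Jaccard similarity."""
    z1 = 1.0 if s == t else 0.0
    z2 = 1.0 - edit_distance(s, t) / max(len(s), len(t))
    z3 = 2.0 * lcs_len(s, t) / (len(s) + len(t))
    gs, gt = ngrams(s, n), ngrams(t, n)
    z4 = len(gs & gt) / len(gs | gt)
    return [z1, z2, z3, z4]
```

Each measure captures a different aspect of name similarity, so the concatenated vector is more informative than any single score.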

Attention-based Feature Propagation
Equivalent entities in two KGs usually have other equivalent entities in their neighborhoods. Therefore, structure information in KGs is very important for discovering entity alignments. In our work, edges between nodes in the PCG reflect the neighboring information of entity-pairs. To obtain feature representations of entity-pairs that contain their neighbors' information, our approach propagates attribute features of entity-pairs along these edges. Specifically, our approach uses a Graph Neural Network (GNN) to propagate the attribute features of entity-pairs over the PCG. GNNs learn node representations in a graph by recursively aggregating the feature vectors of each node's neighbors, and are thus able to combine node features and structure information in the graph. Several approaches have exploited GNNs for embedding-based KG alignment and achieved promising results. In these previous approaches, GNNs are used for learning representations of entities; in this work, we design a new GNN model for learning vector representations of entity-pairs.
Our model is a residual GNN with edge-aware attentions, built by modifying the attention mechanism of the GAT model (Velickovic et al., 2017). Our GNN model has two layers. Each layer takes a set of node features H = {h_1, h_2, ..., h_N} as input, where h_i ∈ R^F, N is the number of nodes in the PCG, and F is the dimension of the input features. Each layer generates a new set of node representations H′ = {h′_1, h′_2, ..., h′_N} with h′_i ∈ R^{F′}, computed as:

h′_i = σ( Σ_{j∈N_i} α_ij W h_j ),

where N_i is the set of neighboring nodes of the i-th node (ignoring edge directions in the PCG), W ∈ R^{F′×F} is a shared weight matrix, and α_ij is a learnable attention weight indicating the importance of the j-th node to the i-th node.

Edge-aware Attention Mechanism
In the GAT model, the attention α_ij is computed based only on the features of nodes i and j. In the task of KG alignment, we consider the type of the edge between two nodes to be important, and it should not be ignored. Therefore, we use an edge-aware attention mechanism to compute α_ij. A shared attentional mechanism R^{F′} × R^{F′} × R^{F′} → R computes the attention coefficients:

e_ij = LeakyReLU( aᵀ [W h_i ‖ W h_j ‖ t_(i→j)] ), (8)

where (i → j) denotes the index of the edge-type linking the i-th node to the j-th node, t_(i→j) ∈ R^{F′} is the vector representation of that edge-type, a ∈ R^{3F′} is the weight vector of a single-layer feedforward neural network for computing the attention coefficients, and ‖ denotes vector concatenation.
Here the vector of an edge-type is computed from the vectors of the nodes it connects. For an edge-type t_k, let S_k and T_k be the sets of indices of nodes having outgoing and incoming edges of that type in the PCG, respectively. The vector representation of t_k is computed as:

t_k = | (1/|S_k|) Σ_{i∈S_k} h_i − (1/|T_k|) Σ_{j∈T_k} h_j |, (9)

i.e., the element-wise absolute difference between the mean vectors of the source and target nodes connected by t_k.
Once the attention coefficients are obtained following Equation 8, normalized attentions are computed using a softmax function over all the coefficients of each node's neighbors:

α_ij = exp(e_ij) / Σ_{k∈N_i} exp(e_ik),

where N_i is the set of neighboring nodes of the i-th node.
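One pass of the edge-aware attention (Equations 8 and the softmax normalization) can be sketched in NumPy. The dense per-node loop, function names, and data layout are our own simplifications, not the paper's implementation:

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def edge_aware_attention(H, W, a, neighbors, edge_vecs):
    """Edge-aware attention layer (sketch).

    H: (N, F) node features; W: (F', F'... here F'=F) shared projection;
    a: (3F,) attention weight vector; neighbors[i]: neighbor indices of
    node i; edge_vecs[(i, j)]: (F,) vector of the edge-type linking i to j.
    Returns the aggregated neighbor features for each node (before the
    nonlinearity and residual connection)."""
    WH = H @ W
    out = np.zeros_like(WH)
    for i, nbrs in neighbors.items():
        # attention coefficient e_ij per neighbor (Equation 8)
        e = np.array([leaky_relu(a @ np.concatenate(
                          [WH[i], WH[j], edge_vecs[(i, j)]]))
                      for j in nbrs])
        alpha = np.exp(e - e.max())
        alpha /= alpha.sum()            # softmax over the neighborhood
        out[i] = sum(w * WH[j] for w, j in zip(alpha, nbrs))
    return out
```

With a single neighbor, the softmax weight is 1 and the node simply receives its neighbor's projected features, which matches the propagation intuition above.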

Residual Connections in GNN
To let the entity-pair embeddings memorize the original attribute features, we add residual connections from the input features to the output layer of the GNN model. We let F′ = F, i.e., the input and output node vectors of each GNN layer have the same size. A shortcut connection between the input and output layers is added, and the final representation of a node is computed by the element-wise addition of h_i^(0) and h_i^(L), where h_i^(0) = [x_i ‖ z_i] and h_i^(L) are the input and output features of the i-th node.

Model Training
There are two neural network models in our approach: the CNN model for attribute feature extraction and the GNN model for feature propagation. These two separate models are trained sequentially, using the same training data. For two KGs G and G′, a set of known entity alignments between them is used as training data for both models.
For the CNN model, let x_i be the attribute feature vector of entity-pair (e_i, v_i) generated by the model. We use one fully-connected layer to generate a score for each entity-pair, taking x_i as the input:

S_CNN(e_i, v_i) = σ(cᵀ x_i + α),

where c ∈ R^d and α ∈ R are parameters and σ is the sigmoid function. For the GNN model, let h_i be the feature vector of entity-pair (e_i, v_i) after feature propagation. A similar score function is defined as:

S_GNN(e_i, v_i) = σ(gᵀ h_i + β),

where g ∈ R^d and β ∈ R are parameters and σ is again the sigmoid function. For both models, we want aligned entity-pairs to have higher scores than non-aligned entity-pairs. Therefore, both models are trained by minimizing the following margin-based ranking loss:

L = Σ_{(e,v)} Σ_{(e′,v′)∈A_(e,v)} [γ − S(e, v) + S(e′, v′)]_+, (13)

where the outer sum is over the known alignments, [x]_+ = max{0, x}, γ > 0 is a margin hyperparameter, and A_(e,v) denotes the set of non-aligned entity-pairs in the PCG containing entity e or v. The score S is either S_CNN or S_GNN, depending on which model is being trained.
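As a sketch, the margin-based ranking loss of Equation 13 can be computed as follows; the function name and the list-of-negatives data layout are illustrative assumptions:

```python
import numpy as np

def margin_ranking_loss(pos_scores, neg_scores, gamma=1.0):
    """Margin-based ranking loss (Equation 13, sketch): each aligned
    pair's score should exceed the scores of its non-aligned pairs by
    at least gamma.

    pos_scores: list of S(e, v) for the aligned pairs;
    neg_scores: matching list, one array of S(e', v') per aligned pair."""
    loss = 0.0
    for s_pos, s_negs in zip(pos_scores, neg_scores):
        # hinge term [gamma - S(e, v) + S(e', v')]_+ summed over negatives
        loss += np.maximum(0.0, gamma - s_pos + np.asarray(s_negs)).sum()
    return loss
```

When an aligned pair already beats all its negatives by the margin, its contribution to the loss is zero, so training focuses on the violated rankings.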

Datasets
Five datasets are used to evaluate our approach; each dataset contains two knowledge graphs to be aligned. Table 1 outlines the detailed information of these datasets. DBP15K_ZH-EN, DBP15K_JA-EN and DBP15K_FR-EN are generated from DBpedia, and each contains 15 thousand aligned entity-pairs between two language versions of DBpedia. DBP-WD and DBP-YG were first used in (Sun et al., 2018a) and are generated from DBpedia, Wikidata and YAGO3; each contains 100 thousand aligned entity-pairs. For all the datasets, we use the same training/testing split of aligned entity-pairs as previous work (Sun et al., 2018a): 30% for training and 70% for testing.

Experiment Settings
We implement our approach using TensorFlow, and run experiments on a workstation with an Intel Xeon 2.1GHz CPU, an NVIDIA Tesla P100 GPU and 64 GB memory. We use Hits@k and MRR (mean reciprocal rank) as the evaluation metrics, which are widely used in other KG alignment work. Hits@k measures the percentage of correct alignments ranked in the top k candidates; MRR is the average of the reciprocal ranks of the results. Higher Hits@k and MRR indicate better performance. The dimensions of the similarity features and the final embeddings of entity-pairs are set to the same value, chosen from {30, 60, 100, 120}; the learning rate of the two models is chosen from {0.1, 0.01, 0.002, 0.001}, and the margin γ in the loss functions from {1, 2, 4, 10}.
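For clarity, the two metrics can be computed from the 1-based rank of each correct alignment among its candidates. This is an illustrative sketch, not the paper's evaluation code:

```python
def hits_at_k_and_mrr(ranks, k=10):
    """ranks: 1-based rank of the correct alignment for each test
    entity. Returns (Hits@k as a percentage, MRR)."""
    hits = 100.0 * sum(r <= k for r in ranks) / len(ranks)
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    return hits, mrr
```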

Results
Overall Comparisons. Impact of Seed Alignments. To investigate how the size of the seed alignments (pre-aligned entity-pairs for training) affects the results, we run our approach with different numbers of seed alignments. The proportion of seed alignments ranges from 5% to 30% in steps of 5%. Figure 3 shows the Hits@1 and Hits@10 of EPEA on two datasets, DBP-YG and DBP15K_FR-EN. EPEA gets nearly optimal Hits on DBP-YG using 10% seed alignments, and both Hits@1 and Hits@10 reach 100% when more than 15% seed alignments are used. This is because DBP-YG contains rich attribute information of entities, including entity names, so our approach can fully utilize attribute and structure information to accurately predict entity alignments even with a small number of seed alignments. On the DBP15K_FR-EN dataset, our approach gets >70% Hits@1 and >95% Hits@10 when only 10% seed alignments are used, outperforming most of the compared approaches in Table 2, which use 30% seed alignments. As the number of seed alignments increases, our approach steadily improves the alignment results.

Related Work
A number of embedding-based entity alignment approaches have been proposed recently. Some approaches mainly rely on the structure information in KGs to find alignments, including MTransE (Chen et al., 2017), IPTransE (Zhu et al., 2017), BootEA (Sun et al., 2018a), MuGNN (Cao et al., 2019), NAEA (Zhu et al., 2019), RDGCN and AliNet (Zequn Sun, 2020). In these approaches, entity embeddings are learned from information about entities and their relations. MTransE encodes the structure information of KGs in separate spaces, and then performs transitions from one space to the other. IPTransE and BootEA are both iterative alignment approaches, which use newly discovered alignments to expand the seeding alignments. MuGNN employs a multi-channel GNN to learn alignment-oriented KG embeddings. NAEA enhances the TransE model by learning embeddings with a neighborhood-aware attentional representation method. RDGCN uses a relation-aware dual-graph convolutional network to incorporate relation information via attentive interactions between a KG and its dual relation counterpart. AliNet is a GNN-based model which aggregates both direct and distant neighborhood information.
To get improved results, some approaches utilize entity attributes or names in KGs. JAPE performs attribute embedding with a Skip-Gram model, which captures the correlations of attributes in KGs. GCN-Align (Wang et al., 2018) encodes the attribute information of entities into their embeddings using GCNs. MultiKE (Zhang et al., 2019) uses a framework unifying the views of entity names, relations and attributes to learn embeddings for aligning entities. CEA (Zeng et al., 2020) combines structural, semantic and string features of entities, which are integrated with dynamically assigned weights.
Compared with the previous approaches, ours directly learns embeddings of entity-pairs, instead of entities. Attribute and structure information are encoded in the embeddings sequentially, and experiments validate the effectiveness of our approach.

Conclusion
This paper presents a new entity-pair embedding approach for KG alignment. Our approach first extracts useful attribute features of entity-pairs using a convolutional neural network, and then propagates the features among the neighbors of entity-pairs using a graph neural network with edge-aware attentions. The embeddings are learned with the objective of separating equivalent and non-equivalent entity-pairs. Experiments on five real-world datasets show that our approach achieves state-of-the-art results.