Cross-lingual Knowledge Graph Alignment via Graph Convolutional Networks

Multilingual knowledge graphs (KGs) such as DBpedia and YAGO contain structured knowledge of entities in several distinct languages, and they are useful resources for cross-lingual AI and NLP applications. Cross-lingual KG alignment is the task of matching entities with their counterparts in different languages, which is an important way to enrich the cross-lingual links in multilingual KGs. In this paper, we propose a novel approach for cross-lingual KG alignment via graph convolutional networks (GCNs). Given a set of pre-aligned entities, our approach trains GCNs to embed entities of each language into a unified vector space. Entity alignments are discovered based on the distances between entities in the embedding space. Embeddings can be learned from both the structural and attribute information of entities, and the results of structure embedding and attribute embedding are combined to get accurate alignments. In the experiments on aligning real multilingual KGs, our approach gets the best performance compared with other embedding-based KG alignment approaches.


Introduction
Knowledge graphs (KGs), which represent human knowledge in a machine-readable format, are becoming an important basis of many applications in artificial intelligence and natural language processing. Multilingual KGs such as DBpedia (Bizer et al., 2009), YAGO (Suchanek et al., 2008; Rebele et al., 2016), and BabelNet (Navigli and Ponzetto, 2012) are especially valuable for building cross-lingual applications. Besides the knowledge encoded in each distinct language, multilingual KGs also contain rich cross-lingual links that match equivalent entities in different languages. These cross-lingual links play an important role in bridging the language gap in a multilingual KG; however, in most multilingual KGs not all equivalent entities are connected by cross-lingual links. Therefore, a growing body of research studies the problem of cross-lingual KG alignment, which aims to automatically match entities in different languages in a multilingual KG.
Traditional cross-lingual KG alignment approaches either rely on machine translation or define various language-independent features to discover cross-lingual links. More recently, several embedding-based approaches have been proposed for cross-lingual KG alignment, including MTransE (Chen et al., 2017) and JAPE (Sun et al., 2017). Given two KGs and a set of pre-aligned entities between them, embedding-based approaches project entities into low-dimensional vector spaces; entities are then matched based on computations over their vector representations. Following very similar ideas, JE (Hao et al., 2016) and ITransE (Zhu et al., 2017) are embedding-based approaches for matching entities between heterogeneous KGs, and they can also be applied to cross-lingual KG alignment. These embedding-based approaches achieve promising performance without machine translation or feature engineering.
However, we find that the above approaches all try to jointly model cross-lingual knowledge and monolingual knowledge in one unified optimization problem. The losses of the two kinds of knowledge have to be carefully balanced during optimization. For example, JE, MTransE, and ITransE all use hyper-parameters to weight the loss of entity alignments in the loss functions of their models; JAPE uses the pre-aligned entities to combine two KGs into one, and adds weights to the scores of negative samples in its loss function. In these approaches, the embeddings of entities have to encode both the structural information in KGs and the equivalence relations between entities. Furthermore, the attributes of entities (e.g., the age of a person, the population of a country) have not been fully utilized in the existing models. MTransE and ITransE cannot use attributional information in KGs; although JAPE includes attribute types in its model, the attribute values of entities are ignored. We believe that considering attribute values can further improve the results of KG alignment.
Based on the above observations, we propose a new embedding-based KG alignment approach which directly models the equivalence relations between entities by using graph convolutional networks (GCNs). A GCN is a kind of convolutional network which operates directly on graph-structured data; it generates node-level embeddings by encoding information about the nodes' neighborhoods. The neighborhoods of two equivalent entities in KGs usually contain other equivalent entities, so we choose GCNs to generate neighborhood-aware embeddings of entities, which are used to discover entity alignments. Our approach also provides a simple and effective way to include entities' attribute values in the alignment model. More specifically, our approach has the following advantages:
• Our approach uses the entity relations in each KG to build the network structure of GCNs, and it only considers the equivalence relations between entities in model training. Our approach has small model complexity and can achieve encouraging alignment results.
• Our approach only needs pre-aligned entities as training data, and it does not require any pre-aligned relations or attributes between KGs.
• Entity relations and entity attributes are effectively combined in our approach to improve the alignment results.
In the experiments on aligning real multilingual KGs, our approach achieves the best performance compared with the baseline methods. The rest of this paper is organized as follows: Section 2 reviews related work, Section 3 introduces background knowledge, Section 4 describes our proposed approach, Section 5 presents the evaluation results, and Section 6 concludes and discusses future work.

KG Embedding
In the past few years, much work has been done on the problem of KG embedding. KG embedding models embed the entities and relations of a KG into a low-dimensional vector space while preserving the original knowledge. The embeddings are usually learned by minimizing a global loss function over all the entities and relations in a KG, and can then be used for relation prediction, information extraction, and other tasks. TransE (Bordes et al., 2013) is a representative KG embedding approach, which projects both entities and relations into the same vector space; if a triple (h, r, t) holds, TransE expects that h + r ≈ t. The embeddings are learned by minimizing a margin-based ranking criterion over the training set. The TransE model is simple but powerful, and it achieves promising results on link prediction and triple classification. To further improve TransE, several enhanced models based on it have been proposed, including TransR (Lin et al., 2015), TransH (Wang et al., 2014), and TransD (Ji et al., 2015). By introducing new representations of relational translation, these later approaches achieve better performance at the cost of increased model complexity. There are many other KG embedding approaches; recent surveys (Wang et al., 2017; Nickel et al., 2016) give detailed introductions and comparisons.
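As a rough illustration of the translation idea described above (a toy sketch, not the implementation of Bordes et al.), the following Python snippet scores a triple by the distance between h + r and t and evaluates a margin-based ranking loss against a corrupted triple. The toy vectors, the margin of 1.0, the choice of the L1 norm, and all function names are assumptions made for illustration.

```python
# Toy TransE-style scoring; vectors, margin, and the L1 norm are illustrative assumptions.

def l1_distance(x, y):
    """L1 distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


def transe_score(h, r, t):
    """Dissimilarity of a triple: small when h + r is close to t."""
    translated = [hi + ri for hi, ri in zip(h, r)]
    return l1_distance(translated, t)


def margin_ranking_loss(pos_score, neg_score, margin=1.0):
    """Hinge loss that pushes a true triple to score lower than a corrupted one."""
    return max(0.0, margin + pos_score - neg_score)


h = [0.5, 0.25]
r = [0.25, 0.5]
t_true = [0.75, 0.75]      # satisfies h + r = t exactly
t_corrupt = [0.0, 0.0]     # randomly replaced tail entity

pos = transe_score(h, r, t_true)
neg = transe_score(h, r, t_corrupt)
loss = margin_ranking_loss(pos, neg)
print(pos, neg, loss)
```

In this toy case the corrupted triple is already further away than the margin, so the loss is zero and no update would be needed.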

Embedding-based KG Alignment
Here we introduce the KG alignment approaches most related to ours, and discuss the main differences between our approach and them. JE (Hao et al., 2016) jointly learns the embeddings of multiple KGs in a uniform vector space to align entities in KGs. JE uses a set of seed entity alignments to connect two KGs, and then learns the embeddings by using a modified TransE model, which adds a loss of entity alignments to its global loss function.
MTransE (Chen et al., 2017) encodes the entities and relations of each KG in a separate embedding space by using TransE; it also provides transitions for each embedding vector to its cross-lingual counterparts in other spaces. The loss function of MTransE is the weighted sum of the losses of two component models (i.e., the knowledge model and the alignment model). To train the alignment model, MTransE needs a set of aligned triples of two KGs.
JAPE (Sun et al., 2017) combines structure embedding and attribute embedding to match entities in different KGs. Structure embedding follows the TransE model, which learns vector representations of entities in the overlay graph of two KGs. Attribute embedding follows the Skip-gram model, which aims to capture the correlations of attributes. To get desirable results, JAPE needs the relations and attributes of two KGs to be aligned in advance.
ITransE (Zhu et al., 2017) is a joint knowledge embedding approach for multiple KGs, which is also suitable for the cross-lingual KG alignment problem. ITransE first learns both entity and relation embeddings following TransE; then it learns to map the knowledge embeddings of different KGs into a joint space according to a set of seed entity alignments. ITransE performs iterative entity alignment by using the newly discovered entity alignments to update the joint embeddings of entities. ITransE requires all relations to be shared among KGs.
The above approaches follow a similar framework to match entities in different KGs. They all rely on the TransE model to learn entity embeddings, and then define some kind of transformation between the embeddings of aligned entities. Compared with these approaches, our approach uses an entirely different framework; it uses GCNs to embed entities in a unified vector space, where aligned entities are expected to be as close as possible. Our approach only focuses on matching entities in two KGs, and it does not learn embeddings of relations. MTransE, JAPE, and ITransE all require relations to be aligned or shared between KGs; our approach does not need this kind of prior knowledge.

Problem Formulation
KGs represent knowledge about real-world entities as triples. Here we consider two kinds of triples in KGs: relational triples and attributional triples. Relational triples represent relations between entities and have the form ⟨entity_1, relation, entity_2⟩. Attributional triples describe attributes of entities and have the form ⟨entity, attribute, value⟩. For example, in YAGO, graduatedFrom is a relation, and (Albert Einstein, graduatedFrom, ETH Zurich) is a relational triple; diedOnDate is an attribute, and (Albert Einstein, diedOnDate, 1955) is an attributional triple. Both relational and attributional triples describe important information about entities, so we take both of them into account in the task of cross-lingual KG alignment.
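Formally, we represent a KG as $G = (E, R, A, T^R, T^A)$, where $E$, $R$, and $A$ are the sets of entities, relations, and attributes, respectively; $T^R \subseteq E \times R \times E$ is the set of relational triples, and $T^A$ is the set of attributional triples. Let $G_1 = (E_1, R_1, A_1, T^R_1, T^A_1)$ and $G_2 = (E_2, R_2, A_2, T^R_2, T^A_2)$ be two KGs in different languages, and let $S = \{(e_{i_1}, e_{i_2})\}_{i=1}^{m}$, with $e_{i_1} \in E_1$ and $e_{i_2} \in E_2$, be a set of pre-aligned entity pairs between $G_1$ and $G_2$. We define the task of cross-lingual KG alignment as finding new entity alignments based on the existing ones. In multilingual KGs such as DBpedia and YAGO, the existing cross-lingual links can be used to build the sets of pre-aligned entity pairs. The already known entity alignments are used as seeds or training data in the process of KG alignment.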
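To make the formulation concrete, the sketch below holds the five components of a KG and a seed alignment set in plain Python containers. It is only an illustration: the class name `KG`, its field and method names, and the identifiers of the second-language KG are hypothetical; only the Einstein triples come from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class KG:
    """Container for entities, relations, attributes, and the two triple sets (names are illustrative)."""
    entities: set = field(default_factory=set)
    relations: set = field(default_factory=set)
    attributes: set = field(default_factory=set)
    rel_triples: set = field(default_factory=set)    # (entity_1, relation, entity_2)
    attr_triples: set = field(default_factory=set)   # (entity, attribute, value)

    def add_relational(self, head, relation, tail):
        """Register a relational triple and the entities and relation it mentions."""
        self.entities.update((head, tail))
        self.relations.add(relation)
        self.rel_triples.add((head, relation, tail))

    def add_attributional(self, entity, attribute, value):
        """Register an attributional triple; the value is not treated as an entity."""
        self.entities.add(entity)
        self.attributes.add(attribute)
        self.attr_triples.add((entity, attribute, value))


g1 = KG()
g1.add_relational("Albert Einstein", "graduatedFrom", "ETH Zurich")
g1.add_attributional("Albert Einstein", "diedOnDate", "1955")

# Hypothetical second-language KG with made-up identifiers.
g2 = KG()
g2.add_relational("fr/Albert_Einstein", "fr/almaMater", "fr/ETH_Zurich")

# Seed alignments: pairs of equivalent entities, one from each KG.
seeds = [("Albert Einstein", "fr/Albert_Einstein")]
print(len(g1.entities), len(g1.rel_triples), len(g1.attr_triples), seeds)
```

The alignment task then amounts to proposing further pairs such as ("ETH Zurich", "fr/ETH_Zurich") that are not in the seed set.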

The Proposed Approach
The framework of our proposed approach is shown in Figure 1. Given two KGs $G_1$ and $G_2$ in different languages, and a set of known aligned entity pairs $S = \{(e_{i_1}, e_{i_2})\}_{i=1}^{m}$ between them, our approach automatically finds new entity alignments based on GCN-based entity embeddings. The basic idea of our approach is to use GCNs to embed entities from different languages into a unified vector space, where equivalent entities are expected to be as close as possible. Entity alignments are predicted by applying a pre-defined distance function to the GCN representations of entities.

GCN-based Entity Embedding
GCNs (Bruna et al., 2014; Henaff et al., 2015; Defferrard et al., 2016) are a type of neural network that operates directly on graph data. GCNs allow end-to-end learning of prediction pipelines whose inputs are graphs of arbitrary size and shape. The inputs of a GCN are the feature vectors of nodes and the structure of the graph; the goal of a GCN is to learn a function of features on the input graph and produce a node-level output. GCNs can encode information about the neighborhood of a node as a real-valued vector, which is usually used for classification or regression. When solving the problem of KG alignment, we assume that (1) equivalent entities tend to have similar attributes, and (2) equivalent entities are usually neighbored by some other equivalent entities. GCNs can combine attribute information and structure information; therefore, our approach uses GCNs to project entities into a low-dimensional vector space, where equivalent entities are close to each other.
A GCN model consists of multiple stacked GCN layers. The input to the $l$-th layer of the GCN model is a vertex feature matrix $H^{(l)} \in \mathbb{R}^{n \times d^{(l)}}$, where $n$ is the number of vertices and $d^{(l)}$ is the number of features in the $l$-th layer. The output of the $l$-th layer is a new feature matrix $H^{(l+1)}$, obtained by the following convolutional computation:
$$H^{(l+1)} = \sigma\left(\hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right) \qquad (1)$$
where $\sigma$ is an activation function; $A$ is an $n \times n$ connectivity matrix that represents the structure information of the graph; $\hat{A} = A + I$, and $I$ is the identity matrix; $\hat{D}$ is the diagonal node degree matrix of $\hat{A}$; $W^{(l)} \in \mathbb{R}^{d^{(l)} \times d^{(l+1)}}$ is the weight matrix of the $l$-th layer in the GCN, and $d^{(l+1)}$ is the dimensionality of the new vertex features.
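The layer computation described above can be sketched in a few lines of dependency-free Python. This is a minimal illustration rather than the authors' implementation: it assumes ReLU as the activation function σ (the text leaves σ unspecified), uses a toy two-node graph, and all function names are hypothetical.

```python
import math


def matmul(X, Y):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def normalize_adjacency(A):
    """Add self-loops (A + I) and scale each entry by 1 / sqrt(deg_i * deg_j)."""
    n = len(A)
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]  # diagonal of the degree matrix of A + I
    return [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def relu(X):
    """Element-wise ReLU, assumed here as the activation function."""
    return [[max(0.0, v) for v in row] for row in X]


def gcn_layer(A_norm, H, W):
    """One GCN layer: activation(A_norm @ H @ W)."""
    return relu(matmul(matmul(A_norm, H), W))


A = [[0, 1], [1, 0]]             # two nodes joined by one edge
H0 = [[1.0, 0.0], [0.0, 1.0]]    # toy input features
W0 = [[1.0, -1.0], [0.0, 1.0]]   # toy weight matrix

H1 = gcn_layer(normalize_adjacency(A), H0, W0)
print(H1)
```

Because the two nodes share the same neighborhood once self-loops are added, the layer maps them to identical output vectors, which is the neighborhood-smoothing behavior the alignment approach relies on.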
Structure and Attribute Embedding. In our approach, GCNs are used to embed the entities of two KGs in a unified vector space. To utilize both the structure and the attribute information of entities, our approach assigns two feature vectors to each entity in the GCN layers: a structure feature vector $h_s$ and an attribute feature vector $h_a$. In the input layer, $h_s^{(0)}$ is randomly initialized and updated during the training process; $h_a^{(0)}$ is the attribute vector of the entity and is fixed during model training. Let $H_s$ and $H_a$ be the structure and attribute feature matrices of all the entities; we redefine the convolutional computation as:
$$[H_s^{(l+1)}; H_a^{(l+1)}] = \sigma\left(\hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}} [H_s^{(l)} W_s^{(l)}; H_a^{(l)} W_a^{(l)}]\right) \qquad (2)$$
where $W_s^{(l)}$ and $W_a^{(l)}$ are the weight matrices for the structure features and the attribute features in the $l$-th layer, respectively, and $[\,;\,]$ denotes the concatenation of two matrices.
Model Configuration. More specifically, our approach uses two 2-layer GCNs, and each GCN processes one KG to generate embeddings of its entities. As defined in Section 3, we denote the two KGs as $G_1 = (E_1, R_1, A_1, T^R_1, T^A_1)$ and $G_2 = (E_2, R_2, A_2, T^R_2, T^A_2)$, and we denote their corresponding GCN models as $GCN_1$ and $GCN_2$. For the structure feature vectors of entities, we set the dimensionality of the feature vectors to $d_s$ in all the layers of $GCN_1$ and $GCN_2$; the two GCN models share the weight matrices $W_s^{(1)}$ and $W_s^{(2)}$ for the structure features in the two layers. For the attribute vectors of entities, we set the dimensionality of the output feature vectors to $d_a$. Because the two KGs may have different numbers of attributes (i.e., $|A_1| \neq |A_2|$), the dimensionalities of the input attribute feature vectors in the two GCN models are different. The first layer of each GCN model transforms the input attribute feature vectors into vectors of size $d_a$, so that the two GCN models generate attribute embeddings of the same dimensionality. Table 1 outlines the parameters of the two GCNs in our approach. The final outputs of the two GCNs are $(d_s + d_a)$-dimensional embeddings of entities, which are further used to discover entity alignments.
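The connectivity matrix $A$ is computed from two measures of each relation $r$, its functionality and inverse functionality:
$$fun(r) = \frac{\#Head\_Entities\_of\_r}{\#Triples\_of\_r} \qquad (3)$$
$$ifun(r) = \frac{\#Tail\_Entities\_of\_r}{\#Triples\_of\_r} \qquad (4)$$
where $\#Triples\_of\_r$ is the number of triples of relation $r$; $\#Head\_Entities\_of\_r$ and $\#Tail\_Entities\_of\_r$ are the numbers of head entities and tail entities of $r$, respectively. To measure the influence of the $i$-th entity over the $j$-th entity, we set $a_{ij} \in A$ as:
$$a_{ij} = \sum_{\langle e_i, r, e_j \rangle \in G} ifun(r) + \sum_{\langle e_j, r, e_i \rangle \in G} fun(r) \qquad (5)$$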
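The following Python sketch shows one way such a connectivity matrix could be built from relational triples. It assumes that fun(r) and ifun(r) are the ratios of distinct head entities and distinct tail entities of r to the number of triples of r, and that each triple from e_i to e_j adds ifun(r) to a_ij and fun(r) to a_ji. The toy triples and the function names are made up for illustration.

```python
from collections import defaultdict


def functionality(triples):
    """Return (fun, ifun) dicts: distinct heads / triples and distinct tails / triples per relation."""
    n_triples = defaultdict(int)
    heads = defaultdict(set)
    tails = defaultdict(set)
    for h, r, t in triples:
        n_triples[r] += 1
        heads[r].add(h)
        tails[r].add(t)
    fun = {r: len(heads[r]) / n_triples[r] for r in n_triples}
    ifun = {r: len(tails[r]) / n_triples[r] for r in n_triples}
    return fun, ifun


def connectivity_matrix(triples, entity_index):
    """Accumulate a_ij over all triples that connect entity i and entity j."""
    fun, ifun = functionality(triples)
    n = len(entity_index)
    A = [[0.0] * n for _ in range(n)]
    for h, r, t in triples:
        i, j = entity_index[h], entity_index[t]
        A[i][j] += ifun[r]  # triple <e_i, r, e_j> contributes ifun(r) to a_ij
        A[j][i] += fun[r]   # the same triple contributes fun(r) to a_ji
    return A


# Toy triples with made-up entity and relation names.
triples = [("a", "bornIn", "x"), ("b", "bornIn", "x"), ("a", "knows", "b")]
entity_index = {"a": 0, "b": 1, "x": 2}

fun, ifun = functionality(triples)
A = connectivity_matrix(triples, entity_index)
print(fun, ifun)
print(A)
```

Here bornIn has two head entities but only one tail entity over two triples, so it passes less weight from a person to the shared birthplace than in the opposite direction.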

Alignment prediction
Entity alignments are predicted based on the distances between entities from two KGs in the GCN representation space. For entities $e_i$ in $G_1$ and $v_j$ in $G_2$, we compute the following distance measure between them:
$$D(e_i, v_j) = \beta \frac{f(h_s(e_i), h_s(v_j))}{d_s} + (1 - \beta) \frac{f(h_a(e_i), h_a(v_j))}{d_a} \qquad (6)$$
where $f(x, y) = \|x - y\|_1$; $h_s(\cdot)$ and $h_a(\cdot)$ denote the structure embedding and the attribute embedding of an entity, respectively; $d_s$ and $d_a$ are the dimensionalities of the structure embeddings and the attribute embeddings; $\beta$ is a hyper-parameter that balances the importance of the two kinds of embeddings.
The distance is expected to be small for equivalent entities and large for non-equivalent ones. For a specific entity $e_i$ in $G_1$, our approach computes the distances between $e_i$ and all the entities in $G_2$, and returns a ranked list of entities as candidate alignments. The alignment can also be performed from $G_2$ to $G_1$. In the experiments, we report the results of both directions of KG alignment.
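The prediction step can be sketched as follows. The snippet assumes that the distance is a β-weighted sum of the L1 distances between structure embeddings and between attribute embeddings, each divided by its dimensionality, as the definitions of f, d_s, d_a, and β above suggest. The embeddings are toy values and the function names are hypothetical.

```python
def l1(x, y):
    """L1 distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


def distance(hs_e, ha_e, hs_v, ha_v, beta=0.9):
    """Weighted, dimension-normalized distance between two entities."""
    d_s, d_a = len(hs_e), len(ha_e)
    return beta * l1(hs_e, hs_v) / d_s + (1 - beta) * l1(ha_e, ha_v) / d_a


def rank_candidates(query, candidates, beta=0.9):
    """Rank the entities of the other KG by increasing distance to the query entity."""
    hs_q, ha_q = query
    scored = [(distance(hs_q, ha_q, hs, ha, beta), name)
              for name, (hs, ha) in candidates.items()]
    return [name for _, name in sorted(scored)]


# Toy (structure, attribute) embeddings; beta = 0.5 keeps the arithmetic exact.
e1 = ([1.0, 0.0], [0.0])
candidates = {
    "v1": ([1.0, 0.0], [0.0]),
    "v2": ([0.0, 1.0], [1.0]),
}

d_far = distance(e1[0], e1[1], candidates["v2"][0], candidates["v2"][1], beta=0.5)
ranking = rank_candidates(e1, candidates, beta=0.5)
print(d_far, ranking)
```

Swapping the roles of the query and the candidate set gives the ranking for the opposite alignment direction.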

Model Training
To enable GCNs to embed equivalent entities as close as possible in the vector space, we use a set of known entity alignments $S$ as training data to train the GCN models. The model training is performed by minimizing the following margin-based ranking loss functions:
$$L_s = \sum_{(e,v) \in S} \sum_{(e',v') \in S'_{(e,v)}} \left[ f(h_s(e), h_s(v)) + \gamma_s - f(h_s(e'), h_s(v')) \right]_+ \qquad (7)$$
$$L_a = \sum_{(e,v) \in S} \sum_{(e',v') \in S'_{(e,v)}} \left[ f(h_a(e), h_a(v)) + \gamma_a - f(h_a(e'), h_a(v')) \right]_+ \qquad (8)$$
where $[x]_+ = \max\{0, x\}$; $S'_{(e,v)}$ denotes the set of negative entity alignments constructed by corrupting $(e, v)$, i.e., replacing $e$ or $v$ with a randomly chosen entity in $G_1$ or $G_2$; $\gamma_s, \gamma_a > 0$ are margin hyper-parameters separating positive and negative entity alignments. $L_s$ and $L_a$ are the loss functions for structure embedding and attribute embedding, respectively; they are independent of each other and hence are optimized separately. We adopt stochastic gradient descent (SGD) to minimize the above loss functions.
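As a sketch of the training objective, the snippet below builds negative pairs by corrupting one side of a seed alignment and evaluates a margin-based ranking loss over one kind of embedding. It only computes the loss value; the SGD update is omitted. The embeddings, the margin values, and the function names are illustrative assumptions, and the random corruption may occasionally return the original pair, which a real implementation would filter out.

```python
import random


def l1(x, y):
    """L1 distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


def corrupt(pair, entities_1, entities_2, rng):
    """Replace either side of an aligned pair with a randomly chosen entity."""
    e, v = pair
    if rng.random() < 0.5:
        return (rng.choice(entities_1), v)
    return (e, rng.choice(entities_2))


def margin_ranking_loss(emb_1, emb_2, negatives, gamma=3.0):
    """Sum of hinge terms [f(pos) + gamma - f(neg)]_+ over all positive/negative pairs."""
    total = 0.0
    for (e, v), neg_pairs in negatives.items():
        pos = l1(emb_1[e], emb_2[v])
        for e_neg, v_neg in neg_pairs:
            total += max(0.0, pos + gamma - l1(emb_1[e_neg], emb_2[v_neg]))
    return total


# Toy embeddings for two entities in each KG.
emb_1 = {"e1": [0.0, 0.0], "e2": [4.0, 0.0]}
emb_2 = {"v1": [0.0, 0.0], "v2": [4.0, 0.0]}

# Hand-picked negatives for the seed pair (e1, v1), to keep the example deterministic.
negatives = {("e1", "v1"): [("e1", "v2"), ("e2", "v1")]}

loss_small_margin = margin_ranking_loss(emb_1, emb_2, negatives, gamma=3.0)
loss_large_margin = margin_ranking_loss(emb_1, emb_2, negatives, gamma=5.0)

rng = random.Random(0)
sampled = corrupt(("e1", "v1"), list(emb_1), list(emb_2), rng)
print(loss_small_margin, loss_large_margin, sampled)
```

Running the same function once with structure embeddings and once with attribute embeddings corresponds to treating the two losses as independent objectives.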

Datasets
We use the DBP15K datasets in the experiments, which were built by Sun et al. (2017). The datasets were generated from DBpedia, a large-scale multilingual KG containing rich inter-language links between different language versions. Subsets of the Chinese, English, Japanese, and French versions of DBpedia were selected following certain rules. Table 2 outlines the detailed information of the datasets. Each dataset contains two KGs in different languages and 15 thousand inter-language links connecting equivalent entities in the two KGs. In the experiments, the known equivalent entity pairs are used for model training and testing.

Experiment Settings
In the experiments, we compare our approach with JE, MTransE, and JAPE. We also build JAPE′, a variant of JAPE which does not use pre-aligned relations and attributes. Because ITransE performs iterative alignment and requires the two KGs to share the same relations, we do not include it in the comparison. The inter-language links in each dataset are used as the gold standard of entity alignments. For all the compared approaches, we use 30% of the inter-language links for training and 70% of them for testing; the split into training and testing data is the same for all approaches. We use Hits@k as the evaluation measure to assess the performance of all the approaches. Hits@k measures the proportion of correctly aligned entities ranked in the top k candidates. For the parameters of our approach, we set $d_s = 1000$ and $d_a = 100$; the margins are $\gamma_s = \gamma_a = 3$ in the loss functions, and $\beta$ in the distance measure is empirically set to 0.9.
Table 3 shows the results of all the compared approaches on the DBP15K datasets. We report Hits@1, Hits@10, and Hits@50 of each approach on each dataset. Because we use the same datasets as Sun et al. (2017), the results of JE, MTransE, and JAPE are taken from Sun et al. (2017). JAPE and JAPE′ each have three variants: Structure Embedding without negative triples (SE w/o neg.), Structure Embedding (SE), and Structure and Attribute joint Embedding (SE+AE). We use GCN(SE) and GCN(SE+AE) to denote two variants of our approach: one only uses relational triples to perform structure embedding, and the other uses both relational and attributional triples to perform structure and attribute embedding.
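The Hits@k measure used in the evaluation can be computed as in the short sketch below. The ranked lists and gold alignments are made-up toy data, and the function name `hits_at_k` is hypothetical.

```python
def hits_at_k(ranked_lists, gold, k):
    """Fraction of test entities whose correct counterpart appears in the top k candidates."""
    hits = sum(1 for source, target in gold.items() if target in ranked_lists[source][:k])
    return hits / len(gold)


# Toy ranked candidate lists (best first) and gold alignments.
ranked_lists = {
    "e1": ["v1", "v2", "v3"],
    "e2": ["v3", "v2", "v1"],
}
gold = {"e1": "v1", "e2": "v2"}

h1 = hits_at_k(ranked_lists, gold, 1)
h2 = hits_at_k(ranked_lists, gold, 2)
print(h1, h2)
```

In this toy case only e1 has its counterpart ranked first, while both counterparts appear within the top two candidates.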

GCN(SE) vs. GCN(SE+AE)
We first compare the results of GCN(SE) and GCN(SE+AE) to see whether attributional information is helpful in the KG alignment task. According to the results, adding attributes in our approach does lead to slightly better results. The improvements range from 1% to 10%, which is very similar to the improvements of JAPE(SE+AE) over JAPE(SE). This shows that KG alignment mainly relies on the structural information in KGs, but that attributional information is still useful. Our approach uses the same framework for embedding structure and attribute information, and the combination of the two kinds of embeddings works effectively.

GCN(SE+AE) vs. Baselines
On the DBP15K ZH-EN dataset, JAPE(SE+AE) performs best and achieves five of the best Hits@k values; our approach GCN(SE+AE) achieves the best Hits@1 in the alignment direction ZH→EN. GCN(SE+AE) and JAPE achieve very close results regarding Hits@1 and Hits@10 in the direction ZH→EN. In the direction EN→ZH, JAPE(SE+AE) outperforms GCN(SE+AE) by about 2-3%. It should be noted, however, that JAPE uses additional aligned relations and attributes as its inputs, whereas our approach does not.
Compared with the baselines, both GCN(SE) and GCN(SE+AE) outperform JE and MTransE significantly. Among all the baselines, JAPE is the strongest one; this may be due to its ability to use both relational and attributional triples, and to the extra alignments of relations and attributes that it consumes. Our approach achieves better results than JAPE on two datasets; although JAPE performs better than our approach on DBP15K ZH-EN, the differences between their results are small. If there are no existing relation and attribute alignments between two KGs, our approach will have a distinct advantage over JAPE.

GCN vs. JAPE using different sizes of training data
To investigate how the size of the training set affects the results of our approach, we further compare our approach with JAPE using different numbers of pre-aligned entities as training data. For JAPE, the pre-aligned entities are used as seeds to make their vectors overlap. In our approach, all the pre-aligned entities are used to train the GCN models. Intuitively, the more pre-aligned entities are used, the better the results obtained by both GCN and JAPE should be.
Here we use different proportions of pre-aligned entities as training data, ranging from 10% to 50% in steps of 10%; all the remaining pre-aligned entities are used for testing. Figure 2 shows the Hits@1 of the two approaches on three datasets. Both approaches perform better as the size of the training data increases. Our approach always outperforms JAPE, except when using 40% of the pre-aligned entities as training data in Figure 2(a). Especially in the tasks of aligning Japanese to English and French to English, our approach has a distinct advantage over JAPE.

Conclusion and Future Work
This paper presents a new embedding-based KG alignment approach which discovers entity alignments based on the entity embeddings learned by GCNs. Our approach can make use of both the relational and the attributional triples in KGs to discover the entity alignments. We evaluate our method on the data of real multilingual KGs, and the results show the advantages of our approach over the compared baselines.
In future work, we will explore more advanced GCN models for the KG alignment task, such as Relational GCNs (Schlichtkrull et al., 2017) and Graph Attention Networks (GATs) (Velickovic et al., 2017). Furthermore, how to iteratively discover new entity alignments within the framework of our approach is another interesting direction that we will study in the future.