Representation Learning with Ordered Relation Paths for Knowledge Graph Completion

Incompleteness is a common problem for existing knowledge graphs (KGs), and KG completion, which aims to predict missing links between entities, is a challenging task. Most existing KG completion methods only consider the direct relation between nodes and ignore relation paths, which contain useful information for link prediction. Recently, a few methods have taken relation paths into consideration but pay little attention to the order of relations in a path, which is important for reasoning. In addition, these path-based models typically ignore nonlinear contributions of path features to link prediction. To solve these problems, we propose a novel KG completion method named OPTransE. Instead of embedding both entities of a relation into the same latent space as in previous methods, we project the head entity and the tail entity of each relation into different spaces to guarantee the order of relations in the path. Meanwhile, we adopt a pooling strategy to extract nonlinear and complex features of different paths to further improve the performance of link prediction. Experimental results on two benchmark datasets show that the proposed model OPTransE outperforms state-of-the-art methods.


Introduction
Knowledge graphs (KGs) are built to store structured facts which are encoded as triples, e.g., (Beijing, CapitalOf, China) (Lehmann et al., 2015). Each triple (h, r, t) consists of two entities h, t and a relation r, indicating there is a relation r between h and t. Large-scale KGs such as YAGO (Suchanek et al., 2007), Freebase (Bollacker et al., 2008) and WordNet (Miller, 1995) contain billions of triples and have been widely applied in various fields (Riedel et al., 2013; Dong et al., 2015). However, a common problem with these KGs is that they are far from complete, which has limited the development of KG applications. Thus, KG completion, with the goal of filling in missing parts of the KG, has become an urgent issue. Specifically, KG completion aims to predict whether a relationship between two entities is likely to be true, a task defined as link prediction in KGs.
Most existing KG completion methods are based on representation learning, which embeds both entities and relations into continuous low-dimensional spaces. TransE (Bordes et al., 2013) is one of the most classical KG completion models; it embeds entities and relations into the same latent space. To better deal with complex relations like 1-to-N, N-to-1 and N-to-N, TransH (Wang et al., 2014) and TransR (Lin et al., 2015b) employ relation-specific hyperplanes and relation-specific spaces, respectively, to separate triples according to their corresponding relations. Unfortunately, these models ignore the relation paths between entities, which are helpful for reasoning. For example, if we know A is B's brother, and B is C's parent, then we can infer that A is C's uncle.
Recently, a few researchers have taken relation paths in KGs as additional information for representation learning and attempted to project paths into latent spaces, achieving better performance than conventional methods. PTransE-ADD (Lin et al., 2015a) considers relation paths as translations between entities and represents each path as the vector sum of all the relations in the path. Moreover, RPE (Lin et al., 2018) extends the TransR model by incorporating a path-specific projection. However, these methods pay little attention to the order of relations in paths, which is important for link prediction. Figure 1 shows an example of how the meaning changes when the order of relations is altered. In addition, these path-based models assume that information from different paths between an entity pair contributes to the relation inference only linearly, and they ignore other complex interactions between paths.
To address these issues, we propose a novel KG completion model named OPTransE. In the model, we project the head entity and the tail entity of each relation into different spaces and introduce sequence matrices to keep the order of relations in the path. Moreover, a pooling strategy is adopted to extract nonlinear features of different paths for relation inference. Experimental results on two benchmark datasets WN18 and FB15K show that OPTransE significantly outperforms state-of-the-art methods.
The remainder of this paper is organized as follows. Section 2 discusses related work. Section 3 presents the proposed model and algorithm in detail. Empirical evaluation of the proposed algorithm and comparison with other state-of-the-art algorithms are presented in Section 4. Finally, Section 5 summarises the whole paper and points out some future work.

Translation-based Models
In recent years, there has been a great deal of work on representation learning for KG completion, and most studies concentrate on translation-based models. These models embed both entities and relations into a continuous low-dimensional vector space according to some distance-based scoring function.
TransE (Bordes et al., 2013) is one of the most fundamental and representative translation-based models. It encodes the entities and relations in KGs as vectors in the same space. For each fact (h, r, t), TransE assumes that h + r ≈ t when (h, r, t) holds. Thus, the scoring function is defined as

f_r(h, t) = ||h + r - t||,

where h, r and t represent the vectors of head entity h, relation r and tail entity t, respectively, and ||·|| is the L1 or L2 norm. If the fact (h, r, t) is true, its score f_r(h, t) tends to be close to zero. TransE is a simple and efficient method for KG completion. However, its simple structure has flaws in dealing with complicated relations like 1-to-N, N-to-1 and N-to-N. In order to address this problem, TransH (Wang et al., 2014) introduces relation-specific hyperplanes and projects entity vectors onto the given hyperplanes. Similar to TransH, TransR (Lin et al., 2015b) also aims to cope with complicated relations. Instead of employing hyperplanes like TransH, TransR proposes a matrix W_r ∈ R^{m×n} to project entity vectors into a relation-specific space. Moreover, STransE (Nguyen et al., 2016) extends TransR by introducing two projection matrices for the head entity and the tail entity, respectively. Therefore, the head and tail entities in a triple are projected differently into the corresponding relation space.
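As a concrete illustration, TransE's scoring function can be sketched in a few lines (a minimal NumPy sketch with toy vectors, not learned embeddings):

```python
import numpy as np

def transe_score(h, r, t, norm=1):
    """TransE scoring function f_r(h, t) = ||h + r - t||.
    A low score suggests the triple (h, r, t) is plausible."""
    return np.linalg.norm(h + r - t, ord=norm)

# Toy vectors: when h + r equals t exactly, the score is 0.
h = np.array([0.1, 0.2, 0.3])
r = np.array([0.4, 0.1, 0.0])
t = h + r
print(transe_score(h, r, t))  # -> 0.0
```

A corrupted tail (any vector other than h + r) yields a strictly positive score, which is what the margin-based training objective exploits.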

Incorporating Relation Paths
The models introduced so far only exploit facts observed in KGs to conduct representation learning. In fact, there is a large amount of useful information in relation paths that can be incorporated into translation-based models to improve the performance of link prediction. Lin et al. (2015a) propose a path-based translation model named PTransE for KG completion. It regards relation paths as translations between entities for representation learning and utilizes a path-constraint resource allocation algorithm to evaluate the reliability of relation paths. RTransE (García-Durán et al., 2015) and TransE-COMP (Guu et al., 2015) take the sum of the vectors of all relations in a path as the representation of the path. The Bilinear-COMP model (Guu et al., 2015) and the PRUNED-PATHS model (Toutanova et al., 2016) represent each relation as a diagonal matrix and evaluate the relation path by matrix multiplication. Most recently, the PaSKoGE model (Jia et al., 2018) was proposed for KG embedding by minimizing a path-specific margin-based loss function. Moreover, RPE (Lin et al., 2018), inspired by PTransE, extends the TransR model by incorporating a path-specific projection for paths between entity pairs.
These methods incorporate information from relation paths to achieve better performance. However, they pay little attention to the order of relations in a path when learning representations of the path. In fact, changing the relation order of a path can alter its meaning to a great extent (as shown in Figure 1). Moreover, the methods stated above assume that information from different paths between an entity pair contributes to the relation inference only linearly; unfortunately, they ignore the complex nonlinear features of different paths. In order to solve these problems, we propose OPTransE, a novel KG completion model, which learns representations of ordered relation paths and designs a pooling method to better extract nonlinear features from various relation paths.

Our Model
To infer the missing parts of KGs, we propose a KG completion model called OPTransE, whose architecture is shown in Figure 2. We first embed the entities and relations of the KG into latent spaces with consideration of the order of relations in paths. Then, we try to infer the missing relations using these latent representations. Different from previous methods, which embed the head and tail of a relation into the same latent space, we project them into different spaces, which allows us to distinguish the order of relations in a path. To extract complex and nonlinear path information for relation reasoning, we design a two-layer pooling strategy to fuse the information from different paths.
In this section, we first introduce the embedding representations of ordered relation paths. After that, we utilize a two-layer pooling strategy to construct the total energy function of triples, and then the objective function is presented. Finally, we describe the details of the model implementation and analyze the complexity of the model.

Ordered Relation Paths Representation
For each triple (h, r, t) in the KG, we employ vectors to represent the entity pair and the relation. Specifically, h ∈ R^d denotes the head entity h, t ∈ R^d denotes the tail entity t, and r ∈ R^d denotes the relation r.
We assume the paths connecting two entities contain indicative information for the direct relation between these two entities. To measure these indicative effects while guaranteeing the order of relations in a path, we define an energy function in Equation (2). Let p_{s=n} denote one of the n-step paths from h to t, i.e., h -r_1-> t^(1) -r_2-> ... -r_{n-1}-> t^(n-1) -r_n-> t. If the relation path is reasonable from h to t, it will obtain a lower energy value:

E(h, p_{s=n}, t) = ||h_p + Σ_{i=1}^{n} S_p^i r_i - t_p||,   (2)

where h_p and t_p denote the representations of the head entity h and the tail entity t in the ordered relation path p, respectively, and S_p^i ∈ R^{d×d} denotes the sequence matrix with respect to the i-th relation in the given path p. Note that a triple (h, r, t) in the KG can be seen as a one-step path between h and t. Thus, the value of E(h, r, t) can be obtained by substituting the direct relation r as p_{s=1} into Equation (2).
From Equation (2) we can observe that the sequence matrix S_p^i before each relation r_i is different. If the order of several relations in a path is altered, the value of the energy function changes accordingly. Therefore, paths with the same relation set but a different relation order will infer distinct direct relations in our model. The specific representation of the ordered relation path is described below.
To keep the order information of relations in paths, we project the head and tail entities of a relation into different spaces by introducing two matrices for each relation. Let W_{r,1} ∈ R^{d×d} and W_{r,2} ∈ R^{d×d} denote the projection matrices of the head entity and the tail entity for relation r, respectively. With these two matrices, we project the head and tail entities into distinct spaces with respect to the same relation. Suppose there is a path r_1, r_2, . . . , r_n from h to t; ideally, we define the following equations:

W_{r_1,1} h + r_1 = W_{r_1,2} t^(1),   (3)
W_{r_2,1} t^(1) + r_2 = W_{r_2,2} t^(2),   (4)
. . .
W_{r_n,1} t^(n-1) + r_n = W_{r_n,2} t,   (5)
where t^(i) indicates the i-th passing node on the path.
For an entity pair with a relation path, we obtain their representations after eliminating the passing nodes from Equations (3)-(5). Thus, the concrete forms of the variables in Equation (2) are as follows:

h_p = W_{r_1,1} h,   (6)
t_p = W_{p_{s=n}} t,   (7)
S_p^i = T_i,   (8)
T_1 = I,  T_i = T_{i-1} M(r_i, r_{i-1})  for i > 1,   (9)

where W_{p_{s=n}} ∈ R^{d×d} indicates the projection matrix for path p_{s=n}, which aims to project the tail entity in a path into the space of p_{s=n}. Moreover, I in Equation (9) denotes the identity matrix and M(r_k, r_{k-1}) ∈ R^{d×d} is the space transition matrix from the head entity space of r_k to the tail entity space of r_{k-1}, i.e., M(r_k, r_{k-1}) W_{r_k,1} = W_{r_{k-1},2}. Figure 3 illustrates the representation of the relation path in our model. Suppose there is a 2-step path from h to t passing through t', i.e., h -r_1-> t' -r_2-> t. Clearly, t' acts as the tail entity of relation r_1 and as the head entity of relation r_2 at the same time, as shown in the top part of Figure 3. To connect relations in different spaces, we unify the passing nodes in the path into the same space. As defined in Equation (9), T_2 is utilized to transfer the passing node t' from the head entity space of r_2 to the tail entity space of r_1. Moreover, T_2 is also applied to the relation r_2 and the tail entity t. Note that the tail entity t is projected into the space of path p as defined in Equation (7). Finally, the path from h_p to t_p passes through r_1 and T_2 r_2, as shown in the bottom part of Figure 3.
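The 2-step example above (h -r_1-> t' -r_2-> t, as in Figure 3) can be sketched in code. This is only an illustrative sketch under our own assumptions: the projection matrices and relation vectors are random rather than learned, we realize M(r_2, r_1) via a matrix inverse so that M(r_2, r_1) W_{r_2,1} = W_{r_1,2} holds exactly, and the form of the tail projection is our reading of the construction; the model itself learns all of these jointly.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Hypothetical per-relation projection matrices (W_{r,1} for heads,
# W_{r,2} for tails) and relation vectors; random here for illustration.
W1 = {r: rng.normal(size=(d, d)) for r in ("r1", "r2")}
W2 = {r: rng.normal(size=(d, d)) for r in ("r1", "r2")}
rel = {r: rng.normal(size=d) for r in ("r1", "r2")}

# Assumed form of the transition matrix M(r2, r1): it satisfies
# M(r2, r1) @ W_{r2,1} = W_{r1,2}, moving r2's head space into r1's
# tail space.
T2 = W2["r1"] @ np.linalg.inv(W1["r2"])

def path_energy_2step(h, t):
    """Energy of the 2-step path h -r1-> t' -r2-> t: the translation
    h_p + r1 + T2 r2 should land near t_p for a plausible path."""
    h_p = W1["r1"] @ h            # head projected by W_{r1,1}
    t_p = T2 @ (W2["r2"] @ t)     # tail moved into the same space (assumed)
    return np.linalg.norm(h_p + rel["r1"] + T2 @ rel["r2"] - t_p, ord=1)

h_vec, t_vec = rng.normal(size=d), rng.normal(size=d)
energy = path_energy_2step(h_vec, t_vec)
```

Because T_2 differs between the paths (r_1, r_2) and (r_2, r_1), swapping the relation order changes the energy, which is exactly the ordering property the sequence matrices are meant to capture.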

Pooling Strategy
We design a two-layer pooling strategy to fuse the information from different paths. First, we utilize a minimum pooling method to extract feature information from paths with i steps and define an energy function as follows:

E(h, P_r^{s=i}, t) = min_{p_{s=i} ∈ P_r^{s=i}} E(h, p_{s=i}, t),   (10)

where P_r^{s=i} indicates the set of all i-step paths from the head entity h to the tail entity t which are relevant to the relation r. To obtain P_r^{s=i}, we introduce a conditional probability Pr(r|p_{s=i}) to represent the reliability of a path p_{s=i} associated with the given relation r:

Pr(r|p_{s=i}) = Pr(r, p_{s=i}) / Pr(p_{s=i}) = (N(r, p_{s=i}) / N(p)) / (N(p_{s=i}) / N(p)) = N(r, p_{s=i}) / N(p_{s=i}),   (11)

where Pr(r, p_{s=i}) denotes the joint probability of r and p_{s=i}, and Pr(p_{s=i}) denotes the marginal probability of p_{s=i}. In addition, N(r, p_{s=i}) denotes the number of cases where r and p_{s=i} link the same entity pair in the KG, N(p_{s=i}) denotes the number of occurrences of the path p_{s=i} in the KG, and N(p) denotes the total number of paths in the KG. Since N(p) can be removed from both the numerator and the denominator, we finally convert the probability into a frequency for computation. We filter the paths by choosing all p_{s=i} from h to t whose Pr(r|p_{s=i}) > 0; P_r^{s=i} is then the set of all filtered p_{s=i}. Sometimes we can infer a fact not from the direct relation r but from a path, which means the value of E(h, P_r^{s=i}, t) may be less than that of E(h, r, t).
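The frequency form of Equation (11), Pr(r|p_{s=i}) = N(r, p_{s=i}) / N(p_{s=i}), can be sketched as follows (a simplified sketch: we count one observation per entity pair that both the relation and the path connect; the paper's counting over the full KG may differ in detail):

```python
from collections import Counter

def path_reliability(observations):
    """observations: list of (relation, path) pairs, one per entity pair
    connected by both that relation and that path.
    Returns Pr(r | p) = N(r, p) / N(p) for every observed (r, p)."""
    joint = Counter(observations)                 # N(r, p)
    path_count = Counter(p for _, p in observations)  # N(p)
    return {(r, p): joint[(r, p)] / path_count[p] for r, p in joint}

obs = [("uncle", ("brother", "parent")),
       ("uncle", ("brother", "parent")),
       ("friend", ("brother", "parent"))]
rel = path_reliability(obs)
print(rel[("uncle", ("brother", "parent"))])  # -> 0.666...
```

Paths with reliability 0 for a relation simply never appear in the returned dictionary, which mirrors the Pr(r|p_{s=i}) > 0 filtering step.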
Furthermore, we utilize a minimum pooling method to fuse information from paths with different lengths and define the final energy function as follows:

E_final(h, r, t) = min( E(h, r, t), min_i E(h, P_r^{s=i}, t) ),   (12)

where E(h, r, t) indicates the energy value of the direct relation r and is calculated by substituting r as p_{s=1} into Equation (2). E(h, P_r^{s=i}, t) is initialized as infinite, so it does not influence the outcome of the final energy function when there is no i-step path between h and t.
In summary, we adopt the min-pooling strategy twice in our model. For E(h, P_r^{s=i}, t), min-pooling chooses the path that best matches r among all i-step paths. For the final energy function, min-pooling extracts nonlinear features from paths of various lengths. In addition, the min-pooling method handles the case where there are no relation paths between h and t.
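The two-layer min-pooling described above can be sketched as:

```python
import math

def fuse_energies(direct_energy, energies_by_length):
    """Two-layer min-pooling over path energies:
    first take the minimum over all i-step paths for each length i,
    then the minimum across the direct relation and all lengths.
    Lengths with no paths default to +inf, so they never win the min."""
    per_length = {i: min(es) if es else math.inf
                  for i, es in energies_by_length.items()}
    return min([direct_energy] + list(per_length.values()))

# The direct relation scores 0.9, but a 2-step path fits better (0.4);
# length 3 has no paths at all and is ignored.
print(fuse_energies(0.9, {2: [0.4, 1.2], 3: []}))  # -> 0.4
```

Note how the min over lengths is what makes the fused score a nonlinear function of the individual path energies, unlike a weighted sum.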

Objective Function
The objective function for the proposed model OPTransE is formalized as

L = Σ_{(h,r,t)} [ L(h, r, t) + λ Σ_i (1/Z_i) Σ_{p_{s=i} ∈ P_r^{s=i}} Pr(p_{s=i}|h, t) Pr(r|p_{s=i}) L(h, p_{s=i}, t) ],   (13)

where L(h, r, t) indicates the loss function for the triple (h, r, t), and L(h, p_{s=i}, t) represents the loss value with respect to the relation path p_{s=i}. The probability Pr(p_{s=i}|h, t) indicates the reliability of the relation path p_{s=i} given the entity pair (h, t), and Pr(r|p_{s=i}) denotes the reliability of a path p_{s=i} associated with the given relation r. The details of Pr(p_{s=i}|h, t) are given in (Lin et al., 2015a); it is computed by a path-constraint resource allocation algorithm. Z_i = Σ_{p_{s=i} ∈ P_r^{s=i}} Pr(p_{s=i}|h, t) Pr(r|p_{s=i}) is a normalization factor, and λ is utilized to balance the triple loss and the path losses.
We adopt the margin-based loss in our model, i.e.,

L(h, p, t) = Σ_{(h',r,t') ∈ S'} [γ_i + E(h, p, t) - E(h', p, t')]_+,   (14)

where p is the simple form of p_{s=i} and [x]_+ = max(x, 0) returns the larger of x and 0. γ_i is the margin separating positive and negative samples. It is noteworthy that we employ a different margin γ_i for paths with different numbers of steps, because the noise of the energy function is magnified as the number of steps increases. The corrupted triple set S' for (h, r, t) is denoted as follows:

S' = {(h', r, t)} ∪ {(h, r, t')},   (15)

where we replace the head entity or the tail entity in the triple randomly and guarantee that the new triple is not an existing valid triple. Our goal is to minimize the total loss. Valid relation paths will obtain lower energy values after the optimization, so that paths can sometimes replace direct relations when performing prediction.
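A minimal sketch of the margin-based loss and the corrupted-triple sampling (the function names are ours; the sampling here is uniform, for illustration only):

```python
import random

def margin_loss(pos_energy, neg_energy, gamma):
    """Margin-based loss [gamma + E(pos) - E(neg)]_+ for one
    positive/negative pair."""
    return max(gamma + pos_energy - neg_energy, 0.0)

def corrupt(triple, entities, facts, rng=random):
    """Replace the head or tail with a random entity, resampling until
    the corrupted triple is not an existing valid triple."""
    h, r, t = triple
    while True:
        e = rng.choice(entities)
        cand = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if cand not in facts and cand != triple:
            return cand

facts = {("a", "r", "b")}
neg = corrupt(("a", "r", "b"), ["a", "b", "c"], facts)
print(margin_loss(0.2, 1.0, gamma=0.5))  # -> 0.0 (margin already met)
```

When the positive energy exceeds the negative one by more than the margin, the loss grows linearly, which is what pushes valid triples and paths toward lower energies.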

Parameter Learning
We utilize stochastic gradient descent (SGD) to optimize the objective function in Equation (13) and learn the parameters of the model. To ensure the convergence of the model, we impose constraints on the norms of the vectors, i.e., ||h||_2 ≤ 1, ||r||_2 ≤ 1, ||t||_2 ≤ 1, ||W_{r,1} h||_2 ≤ 1, ||W_{r,2} t||_2 ≤ 1. Moreover, we note that the objective function defined in Equation (13) has two parts: the first is for the basic triple and the second is for the relation paths. To focus on the representation of ordered relation paths in the second part, we only update the parameters of the relation vectors in the path when optimizing that part of the model.
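Such norm constraints are commonly enforced by projecting vectors back onto the unit ball after each SGD step; a minimal sketch of that projection (our assumption of the enforcement mechanism, which the text does not spell out):

```python
import numpy as np

def clip_norm(v, max_norm=1.0):
    """Project a vector back onto the L2 ball of radius max_norm,
    enforcing constraints such as ||h||_2 <= 1 after a gradient step."""
    n = np.linalg.norm(v)
    return v if n <= max_norm else v * (max_norm / n)

v = np.array([3.0, 4.0])             # norm 5, violates the constraint
print(np.linalg.norm(clip_norm(v)))  # -> 1.0
```

Vectors already inside the ball are returned unchanged, so the projection only activates when a step pushes a parameter outside the feasible region.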
In addition, we follow PTransE (Lin et al., 2015a) and generate a reverse relation r^-1 for each relation to enlarge the training set, so that inference in KGs can also proceed through reverse paths. For instance, for the fact (Honolulu, CapitalOf, Hawaii), we also add a fact with the reverse relation to the KG, i.e., (Hawaii, CapitalOf^-1, Honolulu).
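A sketch of this reverse-relation augmentation (the `^-1` suffix is only a naming convention for illustration):

```python
def add_reverse_relations(triples):
    """Augment the training set with reverse facts:
    every (h, r, t) also yields (t, r^-1, h)."""
    out = list(triples)
    out += [(t, r + "^-1", h) for h, r, t in triples]
    return out

kg = [("Honolulu", "CapitalOf", "Hawaii")]
print(add_reverse_relations(kg)[1])
# -> ('Hawaii', 'CapitalOf^-1', 'Honolulu')
```

This doubling of relations and training triples is exactly why the dataset statistics reported later are doubled after augmentation.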

Complexity Analysis
Let d denote the dimension of entities and relations, and let N_e and N_r denote the number of entities and relations, respectively. The number of model parameters for OPTransE is N_e d + N_r d + 2 N_r d^2, which is the same as that of STransE.
Moreover, let N_p denote the expected number of relation paths between an entity pair, N_t the number of triples for training, and k the maximum length of relation paths. According to the objective function in Equation (13) and the details of parameter learning in Section 3.4, the time complexity of OPTransE for optimization is O(k^2 d^3 N_p N_t), which is of the same magnitude as that of RPE (MCOM) (Lin et al., 2018).
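The parameter count N_e d + N_r d + 2 N_r d^2 is easy to check concretely (the FB15K sizes below are the commonly published statistics; d = 100 is illustrative, not the paper's reported setting):

```python
def optranse_param_count(num_entities, num_relations, d):
    """Parameter count N_e*d + N_r*d + 2*N_r*d^2: one d-dim vector per
    entity, one per relation, and two d x d projection matrices
    (W_{r,1}, W_{r,2}) per relation."""
    return num_entities * d + num_relations * d + 2 * num_relations * d * d

# FB15K-scale example: 14951 entities, 1345 relations, d = 100.
print(optranse_param_count(14951, 1345, 100))  # -> 28529600
```

The 2 N_r d^2 term from the projection matrices dominates, which is why relation-specific matrices are the main memory cost relative to TransE.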

Datasets
To evaluate the proposed model OPTransE, we use two benchmark datasets, WN18 and FB15K, as experimental data. They are subsets of the knowledge graphs WordNet (Miller, 1995) and Freebase (Bollacker et al., 2008), respectively (Bordes et al., 2013). These two datasets have been widely employed for KG completion (Jia et al., 2018; Lin et al., 2018). Statistics of the two datasets are shown in Table 1. In our experiments, since we add triples of reverse relations to the datasets, the numbers of relations and training triples are doubled.

Experimental Settings
We adopt the idea from TransR (Lin et al., 2015b) and initialize the vectors and matrices of OPTransE with an existing method, STransE (Nguyen et al., 2016). Following TransH (Wang et al., 2014), the Bernoulli sampling method is applied for generating head or tail entities when sampling corrupted triples.
As the length of paths increases, the reliability of a path declines accordingly. To determine the maximum path length for the experiments, before the test on FB15K we evaluated OPTransE with 3-step paths on WN18. However, OPTransE (3-step) performs comparably to OPTransE (2-step) at a higher computational cost. This indicates that longer paths hardly contain more useful information, so it is unnecessary to enumerate them. Therefore, considering computational efficiency, we limit the maximum length of relation paths to 2 steps.

Evaluation Metrics and Baselines
Following previous work (Bordes et al., 2013; Nguyen et al., 2016), we evaluate the proposed model OPTransE on the link prediction task. This task aims to predict the missing entity in a triple (h, r, t), i.e., predicting h when r and t are given, or predicting t given h and r. When testing a fact (h, r, t), we replace either the head or the tail entity with every entity in the dataset and calculate the scores of the generated triples according to Equation (12). We then sort the entities by their scores in ascending order to locate the rank of the target entity.
For specific evaluation metrics, we employ the widely used mean rank (MR) and Hits@10. Mean rank indicates the average rank of correct entities, and Hits@10 is the proportion of correct entities ranked in the top 10. A higher Hits@10 or a lower mean rank implies better performance on the link prediction task. Moreover, note that a generated triple used for testing may itself exist in the dataset as a fact; such triples affect the final rank of the target entity. Hence, we can filter out generated triples that are facts in the dataset before ranking. If this filtering is performed, the result is denoted as "Filtered"; otherwise it is denoted as "Raw".
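The ranking protocol and both metrics can be sketched as follows (toy scores; `filter_ids` holds the other known-correct entities that the "Filtered" setting removes before ranking):

```python
def rank_of_target(scores, target, filter_ids=()):
    """Rank candidate entities by ascending score and return the rank of
    the target. Entities in filter_ids (other known-correct answers)
    are removed first, as in the "Filtered" setting."""
    kept = {e: s for e, s in scores.items()
            if e == target or e not in filter_ids}
    ordered = sorted(kept, key=kept.get)
    return ordered.index(target) + 1

def mean_rank(ranks):
    """MR: average rank of the correct entities (lower is better)."""
    return sum(ranks) / len(ranks)

def hits_at_10(ranks):
    """Hits@10: fraction of correct entities ranked in the top 10."""
    return sum(r <= 10 for r in ranks) / len(ranks)

scores = {"Beijing": 0.1, "Paris": 0.5, "Tokyo": 0.3}
print(rank_of_target(scores, "Tokyo"))  # -> 2 (raw)
print(rank_of_target(scores, "Tokyo", {"Beijing"}))  # -> 1 (filtered)
```

The filtered rank can only improve on the raw rank, which is why filtered MR and Hits@10 numbers are consistently better in the reported tables.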

Results
From Table 2 we can observe that: (1) PTransE performs better than its basic model TransE, and RPE outperforms its original method TransR. This indicates that additional information from relation paths between entity pairs is helpful for link prediction. Note that OPTransE outperforms the baselines which do not take relation paths into consideration in most cases. These results demonstrate the effectiveness of OPTransE in taking advantage of path features in the KG.
(2) OPTransE performs better than previous pathbased models like RTransE, PTransE, PaSKoGE and RPE on all metrics. This implies that the order of relations in paths is of great importance for reasoning, and learning representations of ordered relation paths can significantly improve the accuracy of link prediction. Moreover, the proposed pooling strategy which aims to extract nonlinear features from different relation paths also contributes to the improvements of performance.
Specific evaluation results on FB15K by mapping properties of relations (1-to-1, 1-to-N, N-to-1, and N-to-N) are shown in Table 3. Several methods which have reported these results are listed as baselines. OPTransE achieves the highest scores in all sub-tasks. We note that it is more difficult to predict head entities of N-to-1 relations and tail entities of 1-to-N relations, since the prediction accuracy on these two sub-tasks is generally lower than on the other sub-tasks. Notably, OPTransE achieves significant improvements on these two sub-tasks. In particular, when predicting tail entities of 1-to-N relations, OPTransE raises Hits@10 to 87.4%, which is 8.3% higher than the best performance among the baselines. Meanwhile, since the average prediction accuracy for N-to-N relations of OPTransE on the two datasets reaches 91.1%, we can also infer that our model has a strong ability to deal with N-to-N relations. OPTransE projects the head and tail entities of a triple into different relation-specific spaces, and is thus able to better discriminate the relevant entities. Furthermore, these results also confirm that the ordered relation paths between entity pairs exploited by OPTransE contain useful information and help perform more accurate inference for complex relations.

Conclusion and Future Work
In this paper, we propose a novel KG completion model named OPTransE, which aims to address the issue of relation orders in paths. In our model, we project the head entity and the tail entity of each relation into different spaces to guarantee the order of the path. In addition, a pooling method is applied to extract complex and nonlinear features from numerous relation paths. Finally, we evaluate our proposed model on two benchmark datasets and experimental results demonstrate the effectiveness of OPTransE.
In the future, we will explore the following research directions: (1) we will study the applications of the proposed models in various domains, like personalized recommendation (Liu et al., 2018); (2) we will explore other techniques to fuse the ordered relation information from different paths (Liu et al., 2019).