A Translation-Based Knowledge Graph Embedding Preserving Logical Property of Relations

This paper proposes a novel translation-based knowledge graph embedding that preserves the logical properties of relations, such as transitivity and symmetricity. The embedding spaces generated by existing translation-based embeddings do not represent transitive and symmetric relations precisely, because these embeddings ignore the role of entities in triples. We therefore introduce a role-specific projection which maps an entity to distinct vectors according to its role in a triple: a head entity is projected onto the embedding space by a head projection operator, and a tail entity by a tail projection operator. This idea is applied to TransE, TransR, and TransD to produce lppTransE, lppTransR, and lppTransD, respectively. According to the experimental results on link prediction and triple classification, the proposed logical property preserving embeddings achieve state-of-the-art performance in both tasks. These results show that it is critical to preserve the logical properties of relations when embedding knowledge graphs, and that the proposed method does so effectively.


Introduction
Representing knowledge as a graph is one of the most effective ways to utilize human knowledge with a machine, and various large-scale knowledge graphs such as Freebase (Bollacker et al., 2008) and Yago (Suchanek et al., 2007) are available these days. However, the sparsity of the graphs makes it difficult to utilize them in real world applications. In spite of their huge volume, the relations among entities in the graphs are insufficient, which results in very limited inference of the knowledge of the graphs. Therefore, it is of importance to resolve such sparsity of knowledge graphs.
One of the most promising methods to complete knowledge graphs is to embed the graphs in a low-dimensional continuous vector space. This method learns a vector representation of a knowledge graph, and the plausibility of a piece of knowledge within the graph is measured with algebraic operations in the vector space. Thus, new knowledge can be harvested from the space by finding knowledge instances with high plausibility.
Among various knowledge-embedding models, the translation-based models show state-of-the-art performance in knowledge graph completion (Bordes et al., 2013; Wang et al., 2014; Lin et al., 2015; Ji et al., 2015). TransE (Bordes et al., 2013) is one of the best-known translation-based approaches to this problem. When a set of knowledge triples (h, r, t), each composed of a relation (r) and two entities (h and t), is given, it finds vector representations of h, t, and r by compelling the vector of t to be the same as the sum of the vectors of h and r. While TransE embeds all relations in a single vector space, TransH (Wang et al., 2014) and TransR (Lin et al., 2015) assume that each relation has its own embedding space. On the other hand, Ji et al. (2015) found that even a single relation or a single entity usually has multiple types. Thus, they proposed TransD, which allows multiple mapping matrices for entities and relations.
Even though these translation-based models achieve high performance in knowledge graph completion, they all ignore the logical properties of relations. That is, transitive and symmetric relations lose their transitivity or symmetricity in the vector space generated by the translation-based models. As a result, the models cannot complete not only new knowledge involving such relations, but also new knowledge whose relations are affected by them. In most knowledge graphs, transitive or symmetric relations are common. For instance, FB15K, one of the benchmark datasets for knowledge graph completion, has a number of transitive and symmetric relations: about 20% of the triples in FB15K have a transitive or a symmetric relation. Therefore, ignoring the logical properties of relations becomes a serious problem in knowledge graph completion.
The main reason why existing translation-based embeddings cannot reflect the logical properties of relations is that they do not consider the role of entities. An entity should be represented by a different vector in the embedding space according to its role. Therefore, the way to preserve the logical properties of relations in the embedding space is to distinguish the roles of entities while embedding entities and relations.
In this paper, we propose a role-specific projection to preserve the logical properties of relations in an embedding space. It is implemented by projecting a head entity onto the embedding space with a head projection operator and a tail entity with a tail projection operator. As a result, an identical entity is represented as two distinct vectors. This idea can be applied to various translation-based models, including TransE, TransR, and TransD. Therefore, we also propose how to modify existing translation-based models to preserve logical properties. The effectiveness of the proposed idea is verified on two tasks, link prediction and triple classification, using the standard benchmark datasets of WordNet and Freebase. According to the experimental results, the logical property preserving embeddings achieve state-of-the-art performance in both tasks.

Related Work
The sparsity of knowledge graphs is one of the most critical issues in utilizing them in real-world applications. Thus, there have been a number of studies on completing knowledge graphs as a solution to overcome the sparsity. Link prediction is one of the promising ways for knowledge graph completion. This task predicts new relations between entities of a knowledge graph by investigating the existing relations of the graph (Nickel et al., 2015; Neelakantan and Chang, 2015). The methods used for link prediction can be categorized into three groups. The first group consists of methods based on graph features. The observable features used in these methods are the paths between entity pairs (Lao and Cohen, 2010; Lao et al., 2011) and subgraphs (Gardner and Mitchell, 2015). The second group is composed of methods based on Markov random fields. The studies belonging to this group infer new relations with probabilistic soft logic (Pujara et al., 2013) and first-order logic (Jiang et al., 2012).
Knowledge graph embedding, the third group, is another prominent method for link prediction (Bordes et al., 2011; Nickel et al., 2011; Guo et al., 2015; Neelakantan et al., 2015). It embeds the entities of a knowledge graph into a continuous low-dimensional space as vectors, and embeds relations as vectors or matrices. These vectors are optimized by the score function of each knowledge graph embedding model. The Semantic Matching Energy (SME) model proposed by Bordes et al. (2014) finds vector representations of entities and relations using a neural network. When a triple (h, r, t) is given, SME makes two relation-dependent embeddings of (h, r) and (r, t). Its score function for a triple is the similarity between the embeddings. Since relations are expressed as vectors instead of matrices, the complexity of SME is relatively low compared to other embedding methods. Jenatton et al. (2012) suggested the Latent Factor Model (LFM). In order to capture both the first-order and the second-order interactions between two entities, LFM adopts a bilinear function as its score function. In addition, it represents a relation as a weighted sum of sparse latent factors to cope with a large number of relations. Socher et al. (2013) proposed the Neural Tensor Network (NTN) model. NTN is a highly expressive embedding model which has a bilinear tensor layer instead of a standard linear neural network layer. As a result, it can model various interactions between entity vectors. However, it is difficult to process large-scale knowledge graphs with NTN due to its high complexity.
The current mainstream of knowledge graph embedding is the translation-based embedding approach. The basic idea of this embedding is that entities and relations are represented as vectors, and relations are treated as operators that translate entities to other positions in an embedding space. Thus, these models try to find vector representations of h, t, and r so that the vector of t becomes the sum of the vectors of h and r. TransE (Bordes et al., 2013) is the simplest translation-based graph embedding. It assumes that all vectors of entities and relations lie in a single vector space. As a result, it fails to deal with reflexive relations and with relation multiplicities other than 1-to-1. The solution to this problem is to allow entities to play different roles according to relations, but TransE is unable to do so.
An entity plays multiple roles in TransH (Wang et al., 2014), since TransH allows entities to have multiple vector representations. In order to obtain multiple representations of an entity, TransH projects an entity vector onto relation-specific hyperplanes. TransR (Lin et al., 2015) also solves the problems of TransE by introducing relation spaces. It allows an entity to have various vector representations by mapping an entity vector into relation-specific spaces rather than relation-specific hyperplanes. Although both TransH and TransR overcome the limitations of TransE, they are still unable to handle the multiple types of a relation which are determined by the head and tail entities of the relation. For example, let us consider the two triples (California, part of, USA) and (arm, part of, body). Both triples share the relation part of, but the relation should be interpreted differently in each triple. Ji et al. (2015) proposed TransD, in which a relation can have multiple relation spaces according to its entities. TransD constructs relation mapping matrices dynamically by considering entities and a relation simultaneously. For this, it introduces projection vectors for entities and relations, and then constructs the mapping matrices by multiplying these entity and relation projection vectors. As a result, every relation in TransD has multiple entity-specific spaces.

Loss of Logical Properties in Translation-Based Embeddings
Translation-based embeddings aim to find vector representations of knowledge graph entities in an embedding space by regarding relations as translations of entities in the space. Since they map the entities onto a vector space regardless of the roles of the entities, they cannot express logical properties of relations such as transitivity and symmetricity. That is, the vectors of transitive or symmetric relations do not deliver transitivity or symmetricity in the embedding spaces produced by translation-based embeddings. For instance, let us consider a transitive relation. Assume that we have three triples (e1, r1, e2), (e2, r1, e3), and (e1, r1, e3), where r1 is a transitive relation. When the vector of r1 is not a zero vector, there can be three configurations of entity vectors, as shown in Figure 1. In Figure 1-(a), e1, e2, and e3 are placed linearly. In this case, (e1, r1, e3) cannot be expressed. When e1 and e2 are placed at the same point, as in Figure 1-(b), (e1, r1, e2) cannot be expressed. In Figure 1-(c), where e2 and e3 coincide, (e2, r1, e3) cannot be expressed. In a similar way, translation-based embeddings cannot express symmetric relations perfectly.
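The contradiction for a transitive relation can also be checked numerically. The sketch below (a toy NumPy example with hand-picked values, not learned embeddings) constructs entity vectors that satisfy the first two triples exactly under the translation assumption h + r = t, and shows that the third triple then carries an error of exactly ||r1|| whenever r1 is non-zero:

```python
import numpy as np

# Toy 2-D TransE embedding in which the first two triples of a
# transitive chain hold exactly (illustrative values only).
r1 = np.array([1.0, 0.5])   # a non-zero relation vector
e1 = np.array([0.0, 0.0])
e2 = e1 + r1                # (e1, r1, e2): e1 + r1 == e2
e3 = e2 + r1                # (e2, r1, e3): e2 + r1 == e3

# The transitive triple (e1, r1, e3) would require e1 + r1 == e3,
# i.e. r1 == 2 * r1, which forces r1 to be the zero vector.
residual = np.linalg.norm(e1 + r1 - e3)
print(residual)             # equals ||r1||, non-zero, so the triple is violated
```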
The problem caused by the imprecise expression of transitive and symmetric relations is serious for two reasons. First, relations with logical properties are common in knowledge bases. The two benchmark datasets FB15K and WN18 in Table 2 show this. There are 483,142 triples in FB15K, and 84,172 (= 47,841 + 36,331) of them have a transitive or symmetric relation. That is, translation-based embeddings do not express triples precisely for about 17% of the triples in FB15K. In WN18, 22.4% of the triples also have a transitive or symmetric relation. Second, transitive or symmetric relations affect not only the entities directly connected by them, but also other entities that share non-transitive and non-symmetric relations with those entities. Therefore, it is important for translation-based embeddings to represent transitive and symmetric relations precisely.

Role-Specific Projection of Entity Vectors
The main reason why transitive or symmetric relations are not represented precisely by existing translation-based embeddings is that they ignore the roles of entities when embedding them onto a vector space. That is, when a triple (h, r, t) is given, h and t play different roles. However, the existing embeddings treat them equally and embed them into a space in the same way. Therefore, in order to express entities and relations more precisely, entities should be represented differently according to their role in a triple. Figure 2 shows how entities can be represented according to their role. In this figure, solid lines represent entity mappings in head roles, and dotted lines mean that entities are mapped as tails. Assume that three triples (e1, r1, e2), (e2, r1, e3), and (e1, r1, e3) are given with a transitive relation r1. e1 plays only a head role and e3 plays only a tail role, while e2 plays both roles. Then, the entity vectors in the entity space are mapped into the space of r1 using two mapping matrices, M_r1,h and M_r1,t. That is, head entities are mapped by M_r1,h, while tail entities are projected by M_r1,t. Let e1⊥h and e2⊥h be the projections of e1 and e2 by M_r1,h, and let e2⊥t and e3⊥t be the projections of e2 and e3 by M_r1,t. From (e2, r1, e3) and (e1, r1, e3), e1⊥h and e2⊥h are placed at the same point in the space of r1. Similarly, from (e1, r1, e2) and (e1, r1, e3), e2⊥t and e3⊥t coincide. Since e2 is used as both a head and a tail, it is mapped to two different vectors, e2⊥h and e2⊥t. Note that all three triples are well expressed in this space. Symmetric relations can also be expressed precisely by logical property preserving knowledge graph embedding. Assume that two triples (e4, r2, e5) and (e5, r2, e4) are given with a symmetric relation r2. Figure 3 shows how these triples are represented.
The solid lines imply that entities are mapped by M_r2,h, while the dotted lines mean that entities are mapped by M_r2,t. By placing e4⊥h and e5⊥h at the same point, and e4⊥t and e5⊥t at the same point, r2 is precisely expressed as a symmetric relation in the embedding space.
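A small numerical sketch makes the idea concrete. The matrices and vectors below are hand-picked for illustration (not learned): with separate head and tail projections, all three triples of a transitive chain hold exactly even though the relation vector is non-zero and all entity vectors are distinct.

```python
import numpy as np

# Hand-picked 2-D example (illustrative values, not learned embeddings).
Mh = np.array([[1.0, 0.0],
               [0.0, 0.0]])   # head projection: keep the x-coordinate
Mt = np.array([[0.0, 1.0],
               [0.0, 0.0]])   # tail projection: move the y-coordinate to x
r1 = np.array([1.0, 0.0])     # non-zero relation vector

e1 = np.array([0.0, 5.0])     # three distinct entity vectors
e2 = np.array([0.0, 1.0])
e3 = np.array([7.0, 1.0])

def score(h, t):
    """Residual of the translation Mh h + r1 = Mt t in the relation space."""
    return np.linalg.norm(Mh @ h + r1 - Mt @ t)

# All three triples of the transitive chain hold simultaneously.
scores = [score(e1, e2), score(e2, e3), score(e1, e3)]
```

Here the head projection collapses e1 and e2 to the same point and the tail projection collapses e2 and e3, exactly as in Figure 2.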

Realization of Logical Property Preserving Embedding
Due to the simplicity of role-specific projection of entity vectors, it can be applied to various translation-based embeddings. In this paper, we apply it to TransE, TransR, and TransD.

TransE
The score function of TransE is

f_r(h, t) = ||h + r − t||,

where h, t ∈ R^n and r ∈ R^n are the vectors of a head entity, a tail entity, and a relation in a single embedding space. In the logical property preserving TransE (lppTransE), h and t should be mapped differently. For this purpose, we adopt head and tail space mapping matrices M_h ∈ R^{n×n} and M_t ∈ R^{n×n}. As a result, the score function of lppTransE becomes

f_r(h, t) = ||M_h h + r − M_t t||.

This is similar to the score function of TransR. The difference between lppTransE and TransR is that both h and t are mapped by a single mapping matrix for r in TransR, whereas in lppTransE h is mapped by M_h and t by M_t.
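The two score functions can be sketched in a few lines (a minimal NumPy sketch; function names are ours). With identity mapping matrices, lppTransE reduces exactly to TransE:

```python
import numpy as np

def transe_score(h, r, t, norm=2):
    """TransE: f_r(h, t) = ||h + r - t||."""
    return np.linalg.norm(h + r - t, ord=norm)

def lpp_transe_score(h, r, t, Mh, Mt, norm=2):
    """lppTransE: f_r(h, t) = ||M_h h + r - M_t t||."""
    return np.linalg.norm(Mh @ h + r - Mt @ t, ord=norm)

n = 4
rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, n))

# Sanity check: identity projections recover the plain TransE score.
s_plain = transe_score(h, r, t)
s_lpp = lpp_transe_score(h, r, t, np.eye(n), np.eye(n))
```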

TransR
The entities in TransR are mapped into vectors in a different relation space for each relation. Thus, its score function is defined as

f_r(h, t) = ||M_r h + r − M_r t||,

where M_r ∈ R^{m×n} is the mapping matrix for a relation r represented as r ∈ R^m. In the logical property preserving TransR (lppTransR), the mapping matrix of each relation is split into a head mapping matrix M_rh ∈ R^{m×n} and a tail mapping matrix M_rt ∈ R^{m×n}. Then, the score function of lppTransR is

f_r(h, t) = ||M_rh h + r − M_rt t||.

With these two distinct mapping matrices, entities can have two different vector representations in the same relation space.
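As a sketch (names and toy values are ours), lppTransR keeps one head and one tail mapping matrix per relation, both mapping from the n-dimensional entity space into the m-dimensional relation space:

```python
import numpy as np

n, m = 4, 3                        # entity / relation space dimensions
rng = np.random.default_rng(1)

# One head and one tail mapping matrix per relation (toy random values).
params = {"part_of": {"r":    rng.normal(size=m),
                      "M_rh": rng.normal(size=(m, n)),
                      "M_rt": rng.normal(size=(m, n))}}

def lpp_transr_score(h, t, rel, norm=2):
    """lppTransR: f_r(h, t) = ||M_rh h + r - M_rt t|| in the relation space."""
    p = params[rel]
    return np.linalg.norm(p["M_rh"] @ h + p["r"] - p["M_rt"] @ t, ord=norm)

h, t = rng.normal(size=(2, n))
s = lpp_transr_score(h, t, "part_of")
```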

TransD
TransD maps entity vectors into different vectors in relation spaces according to entity and relation types. That is, the entity vectors are mapped by entity-relation specific mapping matrices. Thus, its score function is defined as

f_r(h, t) = ||M_rh h + r − M_rt t||,

where M_rh ∈ R^{m×n} and M_rt ∈ R^{m×n} are entity-relation specific mapping matrices. These mapping matrices are computed by multiplying the projection vectors of an entity and a relation as follows:

M_rh = r_p h_p^T + I,
M_rt = r_p t_p^T + I,

where h_p, t_p, and r_p are the projection vectors for a head, a tail, and a relation, and I ∈ R^{m×n} is an identity matrix. The logical property preserving TransD (lppTransD) divides r_p into two projection vectors, r_ph and r_pt, to reflect the roles of entities. The mapping matrices then become

M_rh = r_ph h_p^T + I,
M_rt = r_pt t_p^T + I,

and its score function is

f_r(h, t) = ||M_rh h + r − M_rt t||.
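The dynamic construction of the mapping matrices can be sketched as follows (function names are ours; I denotes the rectangular identity, as in TransD). Setting r_ph = r_pt recovers the original TransD matrices:

```python
import numpy as np

def lpp_transd_matrices(h_p, t_p, r_ph, r_pt):
    """M_rh = r_ph h_p^T + I and M_rt = r_pt t_p^T + I (both m x n)."""
    m, n = len(r_ph), len(h_p)
    I = np.eye(m, n)                       # rectangular identity
    return np.outer(r_ph, h_p) + I, np.outer(r_pt, t_p) + I

def lpp_transd_score(h, t, r, h_p, t_p, r_ph, r_pt, norm=2):
    """lppTransD: f_r(h, t) = ||M_rh h + r - M_rt t||."""
    M_rh, M_rt = lpp_transd_matrices(h_p, t_p, r_ph, r_pt)
    return np.linalg.norm(M_rh @ h + r - M_rt @ t, ord=norm)

n, m = 4, 3
rng = np.random.default_rng(2)
h, t, h_p, t_p = rng.normal(size=(4, n))
r, r_ph, r_pt = rng.normal(size=(3, m))
s = lpp_transd_score(h, t, r, h_p, t_p, r_ph, r_pt)
```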

Training Logical Property Preserving Embeddings
Since all logical property preserving embeddings are based on the score functions of previous translation-based knowledge graph embeddings, they can be trained using a margin-based ranking loss defined as

L = Σ_{(h,r,t) ∈ P} Σ_{(h',r,t') ∈ N} max(0, f*_r(h, t) + γ − f*_r(h', t')),

where f*_r is the score function of the corresponding logical property preserving embedding. Here, P and N are the sets of correct and incorrect triples, and γ is a margin. N is constructed by replacing the head or tail entity of an existing triple, because a knowledge graph contains only correct triples. All logical property preserving embeddings are optimized by stochastic gradient descent.
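The loss and the negative-sampling step can be sketched as follows (a minimal NumPy version; helper names are ours, and the score function itself is left abstract):

```python
import numpy as np

def corrupt(triple, entities, rng):
    """Build a negative triple by replacing the head or the tail entity."""
    h, r, t = triple
    if rng.random() < 0.5:
        return (rng.choice(entities), r, t)
    return (h, r, rng.choice(entities))

def margin_ranking_loss(pos_scores, neg_scores, gamma):
    """Sum of max(0, f_r(h, t) + gamma - f_r(h', t')) over paired triples."""
    pos = np.asarray(pos_scores)
    neg = np.asarray(neg_scores)
    return float(np.sum(np.maximum(0.0, pos + gamma - neg)))

# A positive triple scoring 0.2 against a negative scoring 0.9, margin 1.0:
loss = margin_ranking_loss([0.2], [0.9], gamma=1.0)   # max(0, 0.2 + 1.0 - 0.9) = 0.3
```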
The complexities of the logical property preserving embeddings are shown in Table 1. N_e and N_r in this table are the numbers of entities and relations, and n and m are the dimensions of the entity and relation embedding spaces. The complexity mainly depends on the number of relations. Thus, the increase in complexity of the logical property preserving embeddings is not significant compared with TransE, TransR, and TransD.

Experiments
The superiority of the proposed logical property preserving embeddings is shown through two tasks. The first is link prediction (Bordes et al., 2013), which predicts the entity missing from a given incomplete triple. The other is triple classification (Socher et al., 2013), which decides whether a given triple is correct or not.

Data Sets
Two popular knowledge graphs, WordNet (Miller, 1995) and Freebase (Bollacker et al., 2008), are used for evaluating the embeddings. WordNet provides semantic relations among words, and two widely-used subsets of it exist: WN11 (Socher et al., 2013) and WN18 (Bordes et al., 2014). WN18 is used for link prediction, and WN11 is adopted for triple classification. Freebase represents general facts about the world. It has two subsets, FB13 (Socher et al., 2013) and FB15K (Bordes et al., 2014). FB15K is used for both triple classification and link prediction, while FB13 is employed only for triple classification. Table 2 summarizes the statistics of each dataset. #Triples is the number of training triples in each benchmark dataset. #Transitive and #Symmetric are the numbers of triples with transitive and symmetric relations, respectively. Ratio denotes the proportion of triples whose relation is transitive or symmetric. As shown in this table, triples with a transitive or symmetric relation constitute a large proportion of WN18 and FB15K.

Link Prediction
For the evaluation of link prediction, we followed the evaluation protocols and metrics used in previous studies (Socher et al., 2013; Bordes et al., 2013; Lin et al., 2015; Ji et al., 2015). Two metrics, mean rank and Hits@10, are used to compare the methods for this task. The mean rank measures the average rank of all correct entities, and Hits@10 is the proportion of correct triples ranked in the top 10. Since there are two evaluation settings, "raw" and "filter", in this task (Bordes et al., 2013), we report both results. In addition, we report results for the two sampling methods "bern" and "unif" (Wang et al., 2014), as the previous studies did.
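Both metrics are straightforward to compute from the rank of each correct entity among all candidate entities (a minimal sketch; the function name and toy ranks are ours):

```python
import numpy as np

def link_prediction_metrics(ranks, k=10):
    """Mean rank and Hits@k over the ranks of the correct entities."""
    ranks = np.asarray(ranks, dtype=float)
    return float(ranks.mean()), float(np.mean(ranks <= k))

# Toy ranks for five test queries: three fall within the top 10.
mean_rank, hits_at_10 = link_prediction_metrics([1, 3, 250, 7, 12])
# mean_rank = 54.6, hits_at_10 = 0.6
```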
There are five parameters in the proposed logical property preserving embeddings: a learning rate α, the number of training triples in each mini-batch B (α and B are parameters of stochastic gradient descent), a margin γ, the embedding dimensions for entities and relations (n and m), and the dissimilarity measure in the embedding score functions (D.S). The parameter values used in our experiments are given in Table 3. The number of stochastic gradient descent iterations is 1,000. Table 4 shows the results on link prediction. The results of previous studies are taken from their reports, since the same datasets are used. The values in parentheses are the improvement over the base models. The logical property preserving embeddings outperform all other methods for both "bern" and "unif" on WN18, except lppTransR with "bern" in the raw setting. In the raw setting, lppTransE, lppTransR, and lppTransD achieve 79.5%, 79.6%, and 80.5% Hits@10 respectively in "bern", which is 4.1%, -0.2%, and 0.9% higher than TransE, TransR, and TransD. The logical property preserving embeddings show even higher performance in the filter setting. Hits@10 of lppTransD in "bern" is 94.3%, while that of TransD is 92.5% in "unif" and 92.2% in "bern". Thus, lppTransD improves on TransD by 2.1% in "bern". In particular, this performance of lppTransD is 1.8% higher than that of TransD in "unif", the previous state-of-the-art.
The logical property preserving embeddings also outperform their base models on FB15K. TransE is improved most significantly on this dataset. The improvements by lppTransE in the raw setting are 12.5% in "unif" and 14.0% in "bern", while those in the filter setting are 25.8% in "unif" and 26.0% in "bern". In addition, its Hits@10 exceeds those of TransH and TransR. That is, even though TransH and TransR were proposed to tackle the problems of TransE, the proposed lppTransE solves them better than TransH and TransR. lppTransD achieves slightly lower Hits@10 than TransD in "bern", but the improvements in the filter setting are noticeable. Since Hits@10 of TransD in "bern" is the best performance ever reported, that of lppTransD in "bern" becomes a new state-of-the-art. Table 5 exhibits Hits@10 according to the mapping properties of the relations of FB15K. The notable trend in this table is that the logical property preserving embeddings show much higher Hits@10 than their base models in N-to-1 and N-to-N, while their Hits@10 is similar to that of their base models in 1-to-1 and 1-to-N. This is most notable for TransD and lppTransD. lppTransD improves on TransD, the previous state-of-the-art method, by 7.3% (N-to-1) and 3.7% (N-to-N) in predicting heads, and by 0.9% (N-to-1) and 0.3% (N-to-N) in predicting tails. Note that it is important to verify whether logical property preserving embeddings achieve good performance in N-to-N, since all transitive relations and some symmetric relations are, in general, N-to-N. According to this table, the Hits@10 of most logical property preserving embeddings improves significantly in N-to-N, which shows that the proposed method solves the transitivity and symmetricity problems of previous embeddings.

Triple Classification
Three datasets, WN11, FB13, and FB15K, are used in this task. WN11 and FB13 have negative triples, but FB15K has only positives. Thus, we generated negative triples for FB15K by following the strategy of Socher et al. (2013). As a result, the classification accuracies on FB15K cannot be compared directly with previous studies; the accuracies on FB15K reported here are those obtained with our dataset. The parameter values for training TransE, TransH, TransR, and TransD are borrowed from their reports, and those for the logical property preserving embeddings are shown in Table 6. Table 7 shows the accuracies of triple classification on the three datasets. The logical property preserving embeddings in general outperform their base models. lppTransE always shows higher accuracy than TransE, lppTransR higher than TransR except on WN11, and lppTransD higher than TransD on FB15K.
One thing to note is that improvements by the logical property preserving embeddings are always observed on FB15K, while those on WN11 and FB13 are small or slightly negative. This can be explained by the number of triples with a transitive or symmetric relation in each dataset. As shown in Table 2, triples with such a relation take up only a small portion of WN11 and FB13; the ratio of those triples is less than 2.3% in these datasets. However, as noted before, more than 17% of the triples in FB15K are such triples. Thus, the improvement on FB15K is remarkable. The other thing to note is that lppTransD shows the best accuracy on FB15K. These results imply that the proposed logical property preserving embeddings solve the problems of existing translation-based embeddings effectively.
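The standard decision rule in this line of work (Socher et al., 2013) labels a triple as positive when its score falls below a relation-specific threshold δ_r tuned on validation data. A minimal sketch of that rule (helper names and toy scores are ours):

```python
import numpy as np

def classify(scores, delta_r):
    """Predict positive when the score is below the relation threshold."""
    return np.asarray(scores) < delta_r

def tune_threshold(pos_scores, neg_scores):
    """Pick the delta_r that maximizes accuracy on validation scores."""
    pos, neg = np.asarray(pos_scores), np.asarray(neg_scores)
    best_t, best_acc = 0.0, -1.0
    for t in np.sort(np.concatenate([pos, neg, [np.inf]])):
        acc = (np.sum(pos < t) + np.sum(neg >= t)) / (len(pos) + len(neg))
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

# Well-separated validation scores: positives score low, negatives high.
delta = tune_threshold([0.2, 0.3], [0.8, 0.9])
preds = classify([0.25, 0.85], delta)   # first is positive, second negative
```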

Conclusion
This paper has proposed a new translation-based knowledge graph embedding that preserves the logical properties of relations. Transitivity and symmetricity are very important characteristics of relations for representing and inferring knowledge, and triples with such relations take up a large proportion of real-world knowledge graphs. In order to preserve the logical properties in an embedding space, an entity is forced to have multiple vector representations according to its role in a triple. This idea has been applied to TransE, TransR, and TransD, yielding lppTransE, lppTransR, and lppTransD. Their superiority was shown through two tasks, link prediction and triple classification. The logical property preserving embeddings showed improved performance over their base models. In particular, lppTransD showed state-of-the-art performance in both tasks. These results imply that the proposed role-specific projection is an effective way to preserve the logical properties of relations.