STransE: a novel embedding model of entities and relationships in knowledge bases

Knowledge bases of real-world facts about entities and their relationships are useful resources for a variety of natural language processing tasks. However, because knowledge bases are typically incomplete, it is useful to be able to perform link prediction, i.e., predict whether a relationship not in the knowledge base is likely to be true. This paper combines insights from several previous link prediction models into a new embedding model STransE that represents each entity as a low-dimensional vector, and each relation by two matrices and a translation vector. STransE is a simple combination of the SE and TransE models, but it obtains better link prediction performance on two benchmark datasets than previous embedding models. Thus, STransE can serve as a new baseline for the more complex models in the link prediction task.


Introduction
Knowledge bases (KBs), such as WordNet (Fellbaum, 1998), YAGO (Suchanek et al., 2007), Freebase (Bollacker et al., 2008) and DBpedia (Lehmann et al., 2015), represent relationships between entities as triples (head entity, relation, tail entity). Even very large knowledge bases are still far from complete (Socher et al., 2013; West et al., 2014). Link prediction or knowledge base completion systems (Nickel et al., 2015) predict which triples not in a knowledge base are likely to be true (Taskar et al., 2004; Bordes et al., 2011). A variety of different kinds of information is potentially useful here, including information extracted from external corpora (Riedel et al., 2013; Wang et al., 2014a) and the other relationships that hold between the entities (Angeli and Manning, 2013; Zhao et al., 2015). For example, Toutanova et al. (2015) used information from the external ClueWeb-12 corpus to significantly enhance performance.
While integrating a wide variety of information sources can produce excellent results, there are several reasons for studying simpler models that directly optimize a score function for the triples in a knowledge base, such as the one presented here. First, additional information sources might not be available, e.g., for knowledge bases in specialized domains. Second, models that do not exploit external resources are simpler and thus typically much faster to train than the more complex models using additional information. Third, the more complex models that exploit external information are typically extensions of these simpler models, and are often initialized with parameters estimated by such simpler models, so improvements to the simpler models should yield corresponding improvements to the more complex models as well.
arXiv:1606.08140v1 [cs.CL] 27 Jun 2016

Let (h, r, t) represent a triple. In all of the models discussed here, the entities h and t are represented by vectors h, t ∈ R^k, respectively.

The Translation-based Embedding (TransE) model (Bordes et al., 2013) is inspired by models such as Word2Vec (Mikolov et al., 2013), where relationships between words often correspond to translations in latent feature space. The TransE model represents each relation r by a translation vector r ∈ R^k, which is chosen so that h + r ≈ t.
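As a concrete illustration, the TransE score can be sketched in a few lines of numpy (an illustrative sketch, not the authors' implementation; the function name is ours):

```python
import numpy as np

def transe_score(h, r, t, norm=1):
    """TransE implausibility of a triple: ||h + r - t|| under the
    l1 or l2 norm. Lower scores mean more plausible triples."""
    return np.linalg.norm(h + r - t, ord=norm)

# When the translation holds exactly (h + r == t), the score is zero.
h = np.array([1.0, 0.0, 2.0])
r = np.array([0.0, 1.0, -1.0])
t = np.array([1.0, 1.0, 1.0])
print(transe_score(h, r, t))  # prints 0.0
```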

The Structured Embedding (SE) model (Bordes et al., 2011) instead represents each relation r by two matrices W_{r,1} and W_{r,2} ∈ R^{k×k}, which are chosen so that W_{r,1} h ≈ W_{r,2} t.
The primary contribution of this paper is to show that two very simple relation-prediction models, SE and TransE, can be combined into a single model, which we call STransE. Specifically, we use relation-specific matrices W_{r,1} and W_{r,2}, as in the SE model, to identify the relation-dependent aspects of both h and t, and use a vector r, as in the TransE model, to describe the relationship between h and t in this subspace. That is, our new KB completion model STransE chooses W_{r,1}, W_{r,2} and r so that W_{r,1} h + r ≈ W_{r,2} t. A TransE-style relationship thus holds in some relation-dependent subspace, and crucially, this subspace may involve very different projections of the head h and tail t. Hence W_{r,1} and W_{r,2} can highlight, suppress, or even change the sign of relation-specific attributes of h and t. For example, for the "purchases" relationship, certain attributes of individuals h (e.g., age, gender, marital status) are presumably strongly correlated with very different attributes of objects t (e.g., sports car, washing machine and the like).
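The relation W_{r,1} h + r ≈ W_{r,2} t translates directly into a score function. The sketch below (our own illustration, with hypothetical names, not the authors' code) also checks that STransE reduces to TransE when both projection matrices are the identity:

```python
import numpy as np

def stranse_score(h, r, t, W1, W2, norm=1):
    """STransE implausibility: ||W_{r,1} h + r - W_{r,2} t||, where W1
    and W2 are the relation-specific k x k projections applied to the
    head and tail entity vectors, respectively."""
    return np.linalg.norm(W1 @ h + r - W2 @ t, ord=norm)

k = 4
rng = np.random.default_rng(0)
h, t = rng.normal(size=k), rng.normal(size=k)
I = np.eye(k)

# With identity projections STransE is exactly TransE, so the score of
# the "perfect" translation r = t - h is (numerically) zero.
print(stranse_score(h, t - h, t, I, I))
```

Choosing W1 = W2 = I recovers TransE, while choosing r = 0 recovers SE, which is why STransE subsumes both models.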
As we show below, STransE performs better than the SE and TransE models, as well as other state-of-the-art link prediction models, on the two standard link prediction datasets WN18 and FB15k, so it can serve as a new baseline for KB completion. We expect that STransE will also be able to serve as the basis for extended models that exploit a wider variety of information sources, just as TransE does.

Our approach
Let E denote the set of entities and R the set of relation types. For each triple (h, r, t), where h, t ∈ E and r ∈ R, the STransE model defines a score function f_r(h, t) measuring its implausibility. Our goal is to choose f such that the score f_r(h, t) of a plausible triple (h, r, t) is smaller than the score f_r(h′, t′) of an implausible triple (h′, r, t′). We define the STransE score function f as follows:

f_r(h, t) = || W_{r,1} h + r − W_{r,2} t ||

using either the ℓ1- or the ℓ2-norm (the choice is made using validation data; in our experiments we found that the ℓ1-norm gave slightly better results). To learn the vectors and matrices we minimize the following margin-based objective function:

L = Σ_{(h,r,t) ∈ G} Σ_{(h′,r,t′) ∈ G′_{(h,r,t)}} [γ + f_r(h, t) − f_r(h′, t′)]_+

where [x]_+ = max(0, x), γ is the margin hyper-parameter, G is the training set consisting of correct triples, and

G′_{(h,r,t)} = {(h′, r, t) | h′ ∈ E, (h′, r, t) ∉ G} ∪ {(h, r, t′) | t′ ∈ E, (h, r, t′) ∉ G}

is the set of incorrect triples generated by corrupting the correct triple (h, r, t) ∈ G.
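The margin-based objective pairs each correct triple with corrupted triples. A minimal sketch of one hinge term and of the corruption step (illustrative names, not the authors' code):

```python
import random

def hinge(pos_score, neg_score, gamma=1.0):
    """One term of the margin-based objective:
    [gamma + f(correct) - f(corrupt)]_+."""
    return max(0.0, gamma + pos_score - neg_score)

def corrupt(triple, entities, known_triples):
    """Corrupt a correct triple by replacing its head or its tail with
    a random entity, rejecting candidates that are themselves in the KB."""
    h, r, t = triple
    while True:
        e = random.choice(entities)
        cand = (e, r, t) if random.random() < 0.5 else (h, r, e)
        if cand not in known_triples:
            return cand

# A confidently separated pair (low score for the correct triple, high
# score for the corrupted one) contributes no loss.
print(hinge(0.2, 2.0))  # prints 0.0
print(hinge(1.0, 0.5))  # prints 1.5
```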
We use Stochastic Gradient Descent (SGD) to minimize L, and impose the following constraints during training: ||h||_2 ≤ 1, ||r||_2 ≤ 1, ||t||_2 ≤ 1, ||W_{r,1} h||_2 ≤ 1 and ||W_{r,2} t||_2 ≤ 1.
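TransE-style models typically enforce norm constraints of this kind by projecting vectors back onto the unit ℓ2-ball after each SGD update. A sketch of that projection step (our illustration, not the authors' code):

```python
import numpy as np

def project_to_unit_ball(v):
    """Rescale v so that ||v||_2 <= 1; vectors already inside the unit
    ball are left untouched."""
    n = np.linalg.norm(v)
    return v / n if n > 1.0 else v

print(project_to_unit_ball(np.array([3.0, 4.0])))  # rescaled to norm 1
print(project_to_unit_ball(np.array([0.3, 0.4])))  # unchanged
```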

Related work
Table 1 summarizes related embedding models for link prediction and KB completion. The models differ in their score functions f_r(h, t) and in the algorithms used to optimize the margin-based objective function, e.g., SGD, AdaGrad (Duchi et al., 2011), AdaDelta (Zeiler, 2012) and L-BFGS (Liu and Nocedal, 1989).
DISTMULT (Yang et al., 2015) is based on the Bilinear model (Nickel et al., 2011; Bordes et al., 2012; Jenatton et al., 2012), with each relation represented by a diagonal rather than a full matrix. The neural tensor network (NTN) model (Socher et al., 2013) uses a bilinear tensor operator to represent each relation. Similar quadratic forms are used to model entities and relations in KG2E (He et al., 2015) and TATEC (Garcia-Duran et al., 2015b).
The TransH model (Wang et al., 2014b) associates each relation with a relation-specific hyperplane and uses a projection vector to project entity vectors onto that hyperplane. TransD (Ji et al., 2015) and TransR/CTransR (Lin et al., 2015b) extend the TransH model by using two projection vectors and a matrix, respectively, to project entity vectors into a relation-specific space. TransD learns a relation-role specific mapping just as STransE does, but represents this mapping by projection vectors rather than by full matrices as in STransE. STransE can thus be viewed as an extension of the TransR model in which head and tail entities are associated with their own projection matrices, rather than sharing a single matrix as in TransR and CTransR. Guu et al. (2015) showed that relation paths between entities in KBs provide richer information and improve relationship prediction. Nickel et al. (2015) review other approaches for learning from KBs and multi-relational data.

Experiments
For link prediction evaluation, we conduct experiments and compare the performance of our STransE model with published results on the benchmark WN18 and FB15k datasets (Bordes et al., 2013). Information about these datasets is given in Table 2.

Task and evaluation protocol
The link prediction task (Bordes et al., 2011; Bordes et al., 2012; Bordes et al., 2013) predicts the head or tail entity given the relation type and the other entity, i.e. predicting h given (?, r, t) or predicting t given (h, r, ?), where ? denotes the missing element. The results are evaluated using the ranking induced by the score function f_r(h, t) on test triples.
For each test triple (h, r, t), we corrupt it by replacing either h or t with each of the possible entities in turn, and then rank these candidates in ascending order of the implausibility values computed by the score function. Following the protocol described in Bordes et al. (2013), we remove any corrupted triples that already appear in the knowledge base, to avoid cases where a correct corrupted triple would be ranked higher than the test triple. We report the mean rank and Hits@10 (i.e., the proportion of test triples for which the target entity is ranked in the top 10 predictions) for each model. A lower mean rank or a higher Hits@10 indicates better link prediction performance.
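This filtered ranking protocol can be sketched as follows (illustrative code, assuming a score function where lower means more plausible; names are ours):

```python
def filtered_tail_rank(test_triple, entities, known_triples, score):
    """Rank of the correct tail among all candidate tails, after
    filtering out corrupted triples that already appear in the KB
    (the 'filtered' setting of Bordes et al., 2013)."""
    h, r, t = test_triple
    target = score(h, r, t)
    rank = 1
    for e in entities:
        if e == t or (h, r, e) in known_triples:
            continue  # skip the gold tail and other correct triples
        if score(h, r, e) < target:
            rank += 1
    return rank
```

Mean rank averages this quantity over all test triples (for both head- and tail-corruption), and Hits@10 is the fraction of test triples with rank ≤ 10.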

Main results
Table 3 compares the link prediction results of our STransE model with results reported in prior work, using the same experimental setup. The first twelve rows report the performance of models that do not exploit information about alternative paths between head and tail entities. The next two rows report results of the RTransE and PTransE models, which are extensions of the TransE model that exploit information about relation paths. The last row presents results for the log-linear model Node+LinkFeat (Toutanova and Chen, 2015), which makes use of textual mentions derived from the large external ClueWeb-12 corpus.
It is clear that Node+LinkFeat, with the additional external corpus information, obtains the best results. In future work we plan to extend the STransE model to incorporate such additional information. Table 3 also shows that RTransE and PTransE, which employ path information, achieve better results than models that do not use such information. Among the models that exploit neither path information nor external information, the STransE model scores better than the other models on WN18 and produces the highest Hits@10 score on FB15k. Compared to the closely related models SE, TransE, TransR, CTransR and TransD, STransE does better on both WN18 and FB15k. The rows of Table 3 recoverable here are (MR = mean rank, H10 = Hits@10):

Method | WN18 MR | WN18 H10 | FB15k MR | FB15k H10
SE (Bordes et al., 2011) | 985 | 80.5 | 162 | 39.8
Unstructured (Bordes et al., 2012) | 304 | 38.2 | 979 | 6.3
TransE (Bordes et al., 2013) | 251 | 89.2 | 125 | 47.1
TransH (Wang et al., 2014b) | 303 | 86.7 | 87 | 64.4
TransR (Lin et al., 2015b) | 225 | 92.0 | 77 | 68.7
CTransR (Lin et al., 2015b) | 218 | 92.3 | 75 | 70.2
KG2E (He et al., 2015) | 348 | 93.2 | 59 | 74.0
TransD (Ji et al., 2015) | 212 | (remaining values lost in extraction)

Results for NTN are not included, since NTN was originally evaluated on different datasets. The results marked with + are obtained using the optimal hyper-parameters chosen to optimize Hits@10 on the validation set; trained in this manner, STransE obtains a mean rank of 244 and Hits@10 of 94.7% on WN18, while producing the same results on FB15k.

Following Bordes et al. (2013), Table 4 analyzes Hits@10 results on FB15k with respect to relation categories defined as follows: for each relation type r, we compute the average number a_h of heads h per (r, t) pair and the average number a_t of tails t per (h, r) pair. If a_h < 1.5 and a_t < 1.5, then r is labeled 1-1; if a_h ≥ 1.5 and a_t < 1.5, then r is labeled M-1; if a_h < 1.5 and a_t ≥ 1.5, then r is labeled 1-M; and if a_h ≥ 1.5 and a_t ≥ 1.5, then r is labeled M-M. 1.4%, 8.9%, 14.6% and 75.1% of the test triples belong to relation types classified as 1-1, 1-M, M-1 and M-M, respectively.
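The 1.5-threshold categorization above can be sketched as follows (illustrative code, not from the original evaluation scripts):

```python
from collections import defaultdict

def label_relations(triples):
    """Label each relation 1-1, 1-M, M-1 or M-M from the average number
    of distinct heads per (r, t) pair and distinct tails per (h, r)
    pair, using the 1.5 threshold of Bordes et al. (2013)."""
    heads = defaultdict(set)  # (r, t) -> heads seen with that pair
    tails = defaultdict(set)  # (h, r) -> tails seen with that pair
    for h, r, t in triples:
        heads[(r, t)].add(h)
        tails[(h, r)].add(t)
    labels = {}
    for rel in {r for _, r, _ in triples}:
        a_h = (sum(len(s) for (r, _), s in heads.items() if r == rel)
               / sum(1 for (r, _) in heads if r == rel))
        a_t = (sum(len(s) for (_, r), s in tails.items() if r == rel)
               / sum(1 for (_, r) in tails if r == rel))
        labels[rel] = (("M" if a_h >= 1.5 else "1") + "-"
                       + ("M" if a_t >= 1.5 else "1"))
    return labels

# Each person is born in one city, but a city has many people born in
# it, so "bornIn" comes out as a many-to-one (M-1) relation.
triples = [("ann", "bornIn", "paris"), ("bob", "bornIn", "paris"),
           ("eve", "bornIn", "rome")]
print(label_relations(triples))  # prints {'bornIn': 'M-1'}
```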
Table 4 shows that, among prior models not using path information, STransE obtains the highest Hits@10 result for the M-M relation category, at (80.1% + 83.1%)/2 = 81.6%. In addition, STransE also performs better than TransD for the 1-M and M-1 relation categories. We believe the improved performance of the STransE model is due to its use of full matrices, rather than just projection vectors as in TransD. This permits STransE to model diverse and complex relation categories (such as 1-M, M-1 and especially M-M) better than TransD and other similar models. However, STransE is not as good as TransD for 1-1 relations. Perhaps the extra parameters in STransE hurt performance in this case (note that 1-1 relations are relatively rare, so STransE does better overall).

Conclusion and future work
This paper presented a new embedding model for link prediction and KB completion. Our model, STransE, combines insights from several simpler embedding models, specifically the Structured Embedding model (Bordes et al., 2011) and the TransE model (Bordes et al., 2013), by using a low-dimensional vector and two projection matrices to represent each relation. STransE, while conceptually simple, produces highly competitive results on standard link prediction evaluations, and scores better than the embedding-based models it builds on. It is thus a suitable candidate to serve as a new baseline for more complex models in the link prediction task.
In future work we plan to extend STransE to exploit relation path information in knowledge bases, in a manner similar to Lin et al. (2015a), Garcia-Duran et al. (2015a) or Guu et al. (2015).

Table 1 :
The score functions f_r(h, t) and the optimization methods (Opt.) of several prominent embedding models for KB completion. In all of these models, the entities h and t are represented by vectors h, t ∈ R^k, respectively.

Table 2 :
Statistics of the experimental datasets used in this study (and in previous work). #E is the number of entities, #R is the number of relation types, and #Train, #Valid and #Test are the numbers of triples in the training, validation and test sets, respectively.