Knowledge Graph Embedding with Numeric Attributes of Entities

Knowledge Graph (KG) embedding projects entities and relations into a low-dimensional vector space, and has been successfully applied to the KG completion task. Previous embedding approaches only model entities and their relations, ignoring the large number of entities' numeric attributes in KGs. In this paper, we propose a new KG embedding model that jointly models entity relations and numeric attributes. Our approach combines an attribute embedding model with a translation-based structure embedding model, learning the embeddings of entities, relations, and attributes simultaneously. Experiments on link prediction on YAGO and Freebase show that the performance is effectively improved by adding entities' numeric attributes to the embedding model.

KGs encode structured information about entities in the form of triplets (e.g. ⟨Microsoft, isLocatedIn, UnitedStates⟩), and have been successfully applied in many real-world applications. Although KGs contain a huge number of triplets, most KGs are incomplete. In order to further expand KGs, much work on KG completion has been done, which aims to predict new triplets based on the existing ones in a KG. A promising line of research for KG completion is known as KG embedding. KG embedding approaches project entities and relations into a continuous vector space while preserving the original knowledge in the KG. KG embedding models achieve good performance in KG completion in terms of efficiency and scalability. TransE is a representative KG embedding approach (Bordes et al., 2013), which projects both entities and relations into the same vector space: if a triplet ⟨head entity, relation, tail entity⟩ (denoted ⟨h, r, t⟩) holds, TransE requires that h + r ≈ t. The embeddings are learned by minimizing a margin-based ranking criterion over the training set. The TransE model is simple but powerful, and it obtains promising results on link prediction and triple classification problems. There are several enhanced models of TransE, including TransR (Lin et al., 2015), TransH (Wang et al., 2014), and TransD (Ji et al., 2015). By introducing new representations of relational translation, later approaches achieve better performance at the cost of increased model complexity. Recent surveys (Wang et al., 2017; Nickel et al., 2016) give detailed introductions and comparisons of various KG embedding approaches.
However, most of the existing KG embedding approaches only model relational triplets (i.e. triplets of entity relations), while ignoring the large number of attributive triplets (i.e. triplets of entity attributes, e.g. ⟨Microsoft, wasFoundedOnDate, 1975⟩) in KGs. Attributive triplets describe various attributes of entities, such as the ages of people or the area of a city. There is a huge number of attributive triplets in real KGs, and we believe that the information encoded in these triplets is also useful for predicting entity relations. With the above motivation, we propose a new KG embedding approach that jointly models entity relations and entities' numeric attributes. Our approach consists of two component models: a structure embedding model and an attribute embedding model. The structure embedding model is a translational distance model that preserves the knowledge of entity relations; the attribute embedding model is a regression-based model that preserves the knowledge of entity attributes. The two component models are jointly optimized to obtain the embeddings of entities, relations, and attributes. Experiments on link prediction on YAGO and Freebase show that the performance is effectively improved by adding entities' numeric attributes to the embedding model.

Our Approach
To effectively utilize numeric attributes of entities in KG embedding, we propose TransEA, which combines a new attribute embedding model with the structure embedding model of TransE. The two component models in TransEA share the embeddings of entities, and they are jointly optimized in the training process.

Structure Embedding
The structure embedding directly adopts the translation-based method of TransE to model the relational triplets in a KG. Both entities and relations are represented in the same vector space R^d. In a triplet ⟨h, r, t⟩, the relation is considered as a translation vector r that connects the entity vectors h and t with low error, i.e. h + r ≈ t. The score function of a given triplet ⟨h, r, t⟩ is defined as

f_r(h, t) = ||h + r − t||_{1/2},    (1)

where ||x||_{1/2} denotes either the L1 or the L2 norm. For all the relational triplets in the KG, the loss function of the structure embedding is defined as

L_R = Σ_{⟨h,r,t⟩∈S} Σ_{⟨h',r,t'⟩∈S'} [γ + f_r(h, t) − f_r(h', t')]_+,    (2)

where [x]_+ = max{0, x}; S is the set of relational triplets in the KG, and S' denotes the set of negative triplets constructed by corrupting ⟨h, r, t⟩, i.e. replacing h or t with a randomly chosen entity in the KG; γ > 0 is a margin hyperparameter separating positive and negative triplets.
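The score function (1) and one term of the margin-based loss (2) can be sketched as follows; this is a minimal NumPy illustration with toy, hand-picked vectors, not the paper's TensorFlow implementation.

```python
import numpy as np

def transe_score(h, r, t, norm=1):
    """TransE score f_r(h, t) = ||h + r - t||; lower means more plausible."""
    return np.linalg.norm(h + r - t, ord=norm)

def margin_loss(pos_score, neg_score, gamma=1.0):
    """One term [gamma + f_r(h, t) - f_r(h', t')]_+ of the ranking loss."""
    return max(0.0, gamma + pos_score - neg_score)

# Toy example with d = 3 (vectors are illustrative, not learned).
h = np.array([0.1, 0.2, 0.3])
r = np.array([0.4, 0.1, -0.1])
t = np.array([0.5, 0.3, 0.2])           # true tail: h + r equals t exactly
t_corrupt = np.array([1.0, -1.0, 0.0])  # randomly chosen negative tail

pos = transe_score(h, r, t)             # 0.0: the triplet holds perfectly
neg = transe_score(h, r, t_corrupt)     # 2.0 under the L1 norm
loss = margin_loss(pos, neg)            # 0.0: negative is beyond the margin
```

In training, the full loss sums this term over all positive triplets and their sampled corruptions, pushing true triplets at least γ closer than corrupted ones.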

Attribute Embedding
The attribute embedding model takes all the attributive triplets in a KG as input and learns embeddings of entities and attributes. Both entities and attributes are represented as vectors in the space R^d. In an attributive triplet ⟨e, a, v⟩, e is an entity, a is an attribute, and v is the value of the entity's attribute. In our approach, we only consider attributive triplets containing numeric values or values that can be easily converted into numeric ones. For a triplet ⟨e, a, v⟩, we define the score function as

f_a(e, v) = ||a⊤e + b_a − v||_{1/2},    (3)

where a and e are the vectors of attribute a and entity e, and b_a is a bias for attribute a. The idea of this score function is to predict the attribute value by a linear regression model for attribute a; the vector a and bias b_a are the parameters of the regression model. For all the attributive triplets in the KG, the loss function of the attribute embedding is defined as

L_A = Σ_{⟨e,a,v⟩∈T} f_a(e, v),    (4)

where T is the set of all attributive triplets with numeric values in the KG.
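The regression residual (3) for a single attributive triplet can be sketched as below; the entity and attribute vectors and the value are toy inputs of our own, and since v is a scalar the L1 and L2 norms coincide with the absolute value.

```python
import numpy as np

def attr_score(e, a, b_a, v):
    """Residual |a . e + b_a - v| of the linear regression for attribute a."""
    return abs(float(np.dot(a, e)) + b_a - v)

# Toy attributive triplet <e, a, v> with d = 3.
e = np.array([0.2, 0.4, 0.1])    # entity vector
a = np.array([1.0, 2.0, 0.0])    # attribute vector (regression weights)
b_a = 0.5                        # attribute bias
v = 1.5                          # observed numeric attribute value

res = attr_score(e, a, b_a, v)   # a.e + b_a = 1.5, so the residual is 0.0
```

Summing this residual over all triplets in T gives the attribute loss L_A of equation (4).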

Joint Model
To combine the above two component models, TransEA minimizes the following joint loss function:

L = (1 − α) · L_R + α · L_A,    (5)

where α is a hyper-parameter that balances the importance of the structure and attribute embeddings. In the joint model, the embeddings of entities are shared by the two component models. Entities, relations, and attributes are all represented by vectors in R^d. We implement our approach using TensorFlow, and the loss function is minimized by stochastic gradient descent.
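The joint objective can be sketched as a simple convex combination of the two component losses; the (1 − α)/α weighting shown here is our reading of the balancing hyper-parameter, and the numeric loss values are illustrative only.

```python
def joint_loss(loss_structure, loss_attribute, alpha):
    """Joint objective L = (1 - alpha) * L_R + alpha * L_A (assumed form)."""
    return (1.0 - alpha) * loss_structure + alpha * loss_attribute

# With alpha = 0.3, the structure loss dominates the objective.
L = joint_loss(2.0, 1.0, alpha=0.3)   # 0.7 * 2.0 + 0.3 * 1.0 = 1.7
```

Because both losses are differentiable in the shared entity vectors, a single SGD step updates those vectors using gradients from both relational and attributive triplets.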

Datasets
The following two datasets are used in the experiments; Table 1 shows their detailed statistics.
YG58K. YG58K is a subset of YAGO3 (Mahdisoltani et al., 2015) which contains about 58K entities. YG58K is built by removing entities from YAGO3 that appear fewer than 25 times or have no attributive triplets. All the remaining triplets are then randomly split into training/validation/test sets.
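The filtering procedure for building such a subset can be sketched as follows; the function name, data layout, and toy triplets are our own illustration of the YG58K-style filtering, not the authors' actual pipeline.

```python
from collections import Counter

def build_subset(triplets, entities_with_attrs, min_count=25):
    """Keep triplets whose head and tail both appear >= min_count times
    and have at least one attributive triplet (YG58K-style filtering)."""
    freq = Counter()
    for h, r, t in triplets:
        freq[h] += 1
        freq[t] += 1
    keep = {e for e, c in freq.items() if c >= min_count} & entities_with_attrs
    return [(h, r, t) for (h, r, t) in triplets if h in keep and t in keep]

# Toy run with min_count lowered to 2 for illustration: entity "c"
# appears only once, so the triplet touching it is dropped.
trips = [("a", "r1", "b"), ("a", "r2", "c"), ("b", "r1", "a")]
kept = build_subset(trips, entities_with_attrs={"a", "b", "c"}, min_count=2)
```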
FB15K. FB15K is a subset of triplets extracted from Freebase. This subset of Freebase was originally used in (Bordes et al., 2013), and has since been widely used for evaluating KB completion approaches. Since our approach consumes attributive triplets, we extract all the attributive triplets of entities in FB15K from Freebase to build the evaluation dataset.

Experimental setup
In the experiments, Mean Rank (the mean rank of the original correct entity), Hits@k (the proportion of test triplets whose correct entity is ranked in the top k), and MRR (the mean reciprocal rank) are used as evaluation metrics. Given a testing triplet ⟨h, r, t⟩, we replace the head h by every entity in the KG and calculate the dissimilarity according to the score function f_r. Ranking the scores in ascending order, we then obtain the rank of the original correct triplet to compute the evaluation metrics, and we repeat the procedure by replacing the tail t instead of the head h. We call this evaluation setting "Raw". Since corrupted triplets that already appear in the train/valid/test sets (other than the test triplet itself) may cause the metrics to be underestimated, we also filter out such corrupted triplets before computing the rank of each testing triplet; we call this setting "Filter". Because our approach is built on TransE, we compare our approach with TransE to see whether adding attribute embedding improves the performance of link prediction. For TransE and TransEA, we select the learning rate λ among {0.1, 0.01, 0.001}, the margin γ among {1, 2, 4, 10}, the embedding dimension d among {20, 50, 100, 150}, the norm in the two score functions among {L1, L2}, and α among {0.2, 0.3, 0.4, 0.5, 0.6}. Based on the Mean Rank on the validation set, we select the best configuration for each approach. On the YG58K dataset, the best parameter configuration for TransE is (λ = 0.1, γ = 4, d = 50, f_r = L1, f_a = L1), and for TransEA is (λ = 0.001, γ = 4, d = 50, f_r = L1, f_a = L1, α = 0.6). On the FB15K dataset, the best parameter configuration for TransE is (λ = 0.01, γ = 1, d = 50, f_r = L1, f_a = L1), and for TransEA is (λ = 0.001, γ = 2, d = 100, f_r = L1, f_a = L1, α = 0.3). Table 2 shows the results of link prediction on the YG58K and FB15K datasets.
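The ranking protocol and metrics above can be sketched as follows; the candidate scores and ranks are toy values, and the strictly-less comparison is one common way to handle ties (an assumption, not necessarily the authors' choice).

```python
import numpy as np

def rank_of(correct, scores, known_true=()):
    """Rank of the correct entity when dissimilarities are sorted ascending.
    known_true: indices of other entities forming true triplets, dropped
    in the 'Filter' setting (pass () for the 'Raw' setting)."""
    skip = set(known_true)
    better = sum(1 for i, s in enumerate(scores)
                 if i != correct and i not in skip and s < scores[correct])
    return better + 1

def metrics(ranks, k=10):
    """Mean Rank, MRR, and Hits@k over a list of per-triplet ranks."""
    ranks = np.asarray(ranks, dtype=float)
    return {"MeanRank": ranks.mean(),
            "MRR": (1.0 / ranks).mean(),
            "Hits@k": (ranks <= k).mean()}

scores = [0.9, 0.1, 0.5, 0.3]             # f_r dissimilarity per candidate
rank = rank_of(correct=1, scores=scores)  # lowest score, so rank 1
m = metrics([1, 3, 12], k=10)             # Hits@10 counts ranks 1 and 3
```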
The results of predicting head and tail entities are reported separately, and we also report the overall results covering the prediction of both head and tail entities. According to the overall results, TransEA outperforms TransE on both datasets in terms of all three metrics. TransEA lowers Mean Rank by about 10 on the YG58K dataset; the MRR and Hits@k of the two approaches are very close, with TransEA obtaining slightly better results: the improvements in MRR and Hits@k are 0.1-0.2% and 0-0.3%, respectively. On the FB15K dataset, TransEA lowers Mean Rank by 13, and it also obtains better results than TransE in terms of MRR, Hits@10, and Hits@3. Table 3 shows the results by relational category. In general, TransEA is superior on both datasets, except for one-to-many relations when replacing the head entity on YG58K. The improvements on FB15K are larger than those on YG58K.

Results
In order to figure out which relations are predicted more accurately by TransEA, Table 4 lists the top 5 most improved relations in terms of Hits@10 on YG58K. It shows that the largest improvement in Hits@10 is 25%, for the relation isInterestedIn. The second is 12.5% for hasAcademicAdvisor, and the third is 6.3% for wroteMusicFor.
Entities of these three relations have plenty of numeric attributes describing people (e.g. wasBornOnDate, diedOnDate), which we believe are helpful for improving the embeddings of entity relations. Entities in relational triplets of livesIn (e.g. ⟨HankAzaria, livesIn, NewYork⟩) also have several numeric attributes (hasLatitude, hasLongitude, hasNumberOfPeople, etc.), and accordingly TransEA obtains a 5% improvement in Hits@10.
On the FB15K dataset, five relations show 100% improvements in Hits@10, because TransE does not rank any correct triplet in the top 10. We find that these relations each have only a single sample in the test set, so Table 5 lists their Mean Rank instead. Clearly, TransEA improves their Mean Rank substantially. Entities in triplets of these five relations have only a few attributes. For example, the relation business/brand/company has only one numeric attributive triplet, about organization/dateFounded. And the relation music/artists supported has two triplets with the numeric attribute person/dateOfBirth and one triplet with person/heightMeters. Therefore, the quality of predicted links can be improved even with only a small number of entities' numeric attributes.

Conclusion
In this paper, we propose TransEA, an embedding approach that jointly models relational and attributive triplets in KGs. TransEA combines an attribute embedding model with the translation-based embedding model of TransE. Experiments on YAGO and Freebase show that TransEA achieves better performance than TransE on the link prediction task. In the future, we will study how to predict missing attribute values in KGs based on KG embedding.