Encoding Temporal Information for Time-Aware Link Prediction

Most existing knowledge base (KB) embedding methods learn solely from time-unknown fact triples and neglect the temporal information in the knowledge base. In this paper, we propose a novel time-aware KB embedding approach that takes advantage of the time at which facts happen. Specifically, we use temporal order constraints to model the transformation between time-sensitive relations and enforce the embeddings to be temporally consistent and more accurate. We empirically evaluate our approach on two tasks, link prediction and triple classification. Experimental results show that our method consistently outperforms the baselines on both tasks.


Introduction
Knowledge bases (KBs) such as Freebase (Bollacker et al., 2008) and YAGO (Fabian et al., 2007) play a pivotal role in many NLP-related applications. KBs consist of facts in the form of triplets $(e_i, r, e_j)$, indicating that head entity $e_i$ and tail entity $e_j$ are linked by relation $r$. Although KBs are large, they are far from complete. Link prediction aims to predict relations between entities based on existing triplets, which can alleviate the incompleteness of current KBs. A recent and promising approach to this task, knowledge base embedding (Nickel et al., 2011; Bordes et al., 2011; Socher et al., 2013), embeds entities and relations into a continuous vector space while preserving certain information of the KB graph. TransE (Bordes et al., 2013) is a typical model that treats the relation vector as a translation between the head and tail vectors, i.e., $e_i + r \approx e_j$ when $(e_i, r, e_j)$ holds.
Most existing KB embedding methods learn solely from time-unknown facts and ignore the useful temporal information in the KB. In fact, there are many temporal facts (or events) in the KB: for example, (Obama, wasBornIn, Hawaii) happened on August 4, 1961, and (Obama, presidentOf, USA) has been true since 2009. Current KBs such as YAGO and Freebase store such temporal information either directly or indirectly. The time at which time-sensitive facts happen may indicate a particular temporal order of facts and time-sensitive relations. For example, (Einstein, wasBornIn, Ulm) happened in 1879, (Einstein, wonPrize, Nobel Prize) happened in 1922, and (Einstein, diedIn, Princeton) occurred in 1955. From all facts of this kind, we can infer the temporal order of time-sensitive relations: wasBornIn → wonPrize → diedIn. Traditional KB embedding models such as TransE often confuse relations such as wasBornIn and diedIn when predicting (person, ?, location), because TransE learns only from time-unknown facts and cannot distinguish relations with similar semantic meaning. To make more accurate predictions, it is non-trivial for existing KB embedding methods to incorporate temporal order information.
This paper mainly focuses on incorporating temporal order information and proposes a time-aware link prediction model. A new temporal dimension is added to fact triples, giving a quadruple $(e_i, r, e_j, t_r)$, which indicates that the fact happened at time $t_r$. To make the embedding space compatible with the observed triples in the fact dimension, relation vectors behave as translations between entity vectors, as in TransE models. To incorporate temporal order information between pairs of temporal facts, we assume that a prior time-sensitive relation vector can evolve into a subsequent time-sensitive relation vector through a temporal transition. For example, given two temporal facts sharing the same head entity, $(e_i, r_i, e_j, t_1)$ and $(e_i, r_j, e_k, t_2)$, and the temporal order constraint $t_1 < t_2$, i.e., $r_i$ happens before $r_j$, we assume that the prior relation $r_i$ after temporal transition should lie close to the subsequent relation $r_j$, i.e., $r_i M \approx r_j$, where the matrix $M$ captures the temporal order information between relations. In this way, both semantic and temporal information are embedded into a continuous vector space during learning.
To the best of our knowledge, we are the first to consider such temporal information for KB embedding. We evaluate our approach on publicly available datasets, and our method outperforms state-of-the-art methods in the time-aware link prediction and triple classification tasks.

Time-Aware KB Embedding
Traditional KB embedding methods encode only observed fact triples but neglect temporal constraints between time-sensitive entities and facts. To deal with this limitation, we introduce Time-Aware KB Embedding which constrains the task by incorporating temporal constraints.
To take the time at which facts happen into account, we formulate the temporal order constraint as an optimization problem based on a manifold regularization term. Specifically, the temporal order of relations in time-sensitive facts should affect the KB representation. If $r_i$ and $r_j$ share the same head entity $e_i$, and $r_i$ occurs before $r_j$, then the prior relation's vector $r_i$ can evolve into the subsequent relation's vector $r_j$ in the temporal dimension.
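As an illustration of how such ordered relation pairs might be collected from quadruples, the following is a minimal sketch rather than the procedure used in our experiments. The function name `relation_order_pairs` is hypothetical, and skipping pairs of identical relations is an assumption made here for simplicity.

```python
from collections import defaultdict
from itertools import combinations


def relation_order_pairs(quads):
    """Collect ordered relation pairs (r_i, r_j) for each head entity.

    `quads` is an iterable of (head, relation, tail, time) tuples. A pair is
    kept when two facts share a head and the first happens strictly earlier.
    """
    by_head = defaultdict(list)
    for head, rel, _tail, time in quads:
        by_head[head].append((time, rel))

    pairs = defaultdict(set)
    for head, facts in by_head.items():
        # Sorting by time makes every combination run from earlier to later.
        for (t1, r1), (t2, r2) in combinations(sorted(facts), 2):
            # Assumption: pairs of the same relation carry no order signal.
            if t1 < t2 and r1 != r2:
                pairs[head].add((r1, r2))
    return dict(pairs)


quads = [
    ("Einstein", "wasBornIn", "Ulm", 1879),
    ("Einstein", "wonPrize", "Nobel Prize", 1922),
    ("Einstein", "diedIn", "Princeton", 1955),
]
einstein_pairs = relation_order_pairs(quads)["Einstein"]
```

Here `einstein_pairs` holds the three pairs implied by wasBornIn → wonPrize → diedIn; reversing each pair yields the corresponding negative examples.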
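To encode the transition between time-sensitive relations, we define a transition matrix $M \in \mathbb{R}^{n \times n}$ between each pair of temporally ordered relations $(r_i, r_j)$. Our optimization requires that positive temporal ordering relation pairs have lower scores (energies) than negative pairs, so we define a temporal order score function as

$$g(r_i, r_j) = \| r_i M - r_j \|_1, \tag{1}$$

which is expected to be low when the relation pair is in chronological order, and high otherwise.
To make the embedding space compatible with the observed triples, we make use of the triple set $\Delta$ and follow the same strategy adopted in previous methods such as TransE:

$$f(e_i, r, e_j) = \| e_i + r - e_j \|_1. \tag{2}$$

For each candidate triple, positive triples are required to have lower scores than negative triples. The optimization is to minimize the joint score function

$$L = \sum_{x^+ \in \Delta} \Big[ \sum_{x^- \in \Delta'} \big[ \gamma_1 + f(x^+) - f(x^-) \big]_+ + \lambda \sum_{y^+ \in \Omega_{e_i},\, y^- \in \Omega'_{e_i}} \big[ \gamma_2 + g(y^+) - g(y^-) \big]_+ \Big], \tag{3}$$

where $[z]_+ = \max(0, z)$, $x^+ = (e_i, r_i, e_j) \in \Delta$ is a positive triple, $x^- \in \Delta'$ is a corresponding negative triple, $y^+ = (r_i, r_j) \in \Omega_{e_i}$ are the positive relation order pairs, in which $r_i$ and $r_j$ share the same head entity $e_i$, and $y^- = (r_j, r_i) \in \Omega'_{e_i}$ are the corresponding negative relation order pairs obtained by inversion. In experiments, we enforce the constraints $\|e_i\|_2 \le 1$, $\|r_i\|_2 \le 1$, $\|r_j\|_2 \le 1$ and $\|r_i M\|_2 \le 1$.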
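The two score functions can be sketched in a few lines of plain Python. This is an illustrative sketch, not our implementation: it assumes the temporal order score takes the form $\| r_i M - r_j \|_1$, in line with the assumption $r_i M \approx r_j$, and the function names and toy two-dimensional vectors are hypothetical.

```python
def l1_norm(vec):
    """Sum of absolute values of the components."""
    return sum(abs(x) for x in vec)


def triple_score(e_i, r, e_j):
    """Translation score f(e_i, r, e_j) = ||e_i + r - e_j||_1."""
    return l1_norm([a + b - c for a, b, c in zip(e_i, r, e_j)])


def vec_mat(vec, mat):
    """Row vector times matrix: (v M)_k = sum_d v[d] * M[d][k]."""
    return [sum(vec[d] * mat[d][k] for d in range(len(vec)))
            for k in range(len(mat[0]))]


def temporal_score(r_i, r_j, mat):
    """Temporal order score g(r_i, r_j) = ||r_i M - r_j||_1 (assumed form)."""
    return l1_norm([a - b for a, b in zip(vec_mat(r_i, mat), r_j)])


# Toy embeddings (hypothetical values, n = 2).
r_born, r_died = [1, 0], [0, 1]
M = [[0, 1],
     [0, 0]]  # maps r_born onto r_died, but not the reverse

in_order = temporal_score(r_born, r_died, M)      # chronological pair
reversed_order = temporal_score(r_died, r_born, M)  # inverted pair
perfect_triple = triple_score([0.5, 0.25], [0.25, 0.25], [0.75, 0.5])
```

With this toy matrix, the chronological pair receives a lower score than the inverted pair, which is the behaviour the temporal order constraint is meant to encourage.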
The first term in Equation (3) makes the resulting embedding space compatible with all the observed triples, and the second term further requires the space to be temporally consistent and more accurate. The hyperparameter λ controls the trade-off between the two terms. Stochastic gradient descent (in mini-batch mode) is adopted to solve the minimization problem.
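To make the roles of the two terms and of λ concrete, the sketch below evaluates a joint objective of this kind on precomputed scores. It assumes both terms are margin-based hinge losses with margins γ1 and γ2; the function name `joint_loss`, its argument layout, and the toy scores are hypothetical, and the gradient step itself is omitted.

```python
def joint_loss(triple_terms, pair_terms, gamma1=4.0, gamma2=4.0, lam=0.01):
    """Margin-based joint objective on precomputed scores (assumed form).

    triple_terms: list of (f_pos, f_neg) scores for positive/negative triples.
    pair_terms:   list of (g_pos, g_neg) scores for relation order pairs and
                  their inverted counterparts.
    """
    # First term: positive triples should score at least gamma1 below negatives.
    triple_part = sum(max(0.0, gamma1 + f_pos - f_neg)
                      for f_pos, f_neg in triple_terms)
    # Second term: chronological pairs should score gamma2 below inverted ones.
    pair_part = sum(max(0.0, gamma2 + g_pos - g_neg)
                    for g_pos, g_neg in pair_terms)
    return triple_part + lam * pair_part


# Toy scores: the first triple already satisfies the margin, the second does not.
loss = joint_loss(triple_terms=[(1.0, 6.0), (2.0, 3.0)],
                  pair_terms=[(0.0, 1.0)])
no_temporal = joint_loss(triple_terms=[(1.0, 6.0), (2.0, 3.0)],
                         pair_terms=[(0.0, 1.0)], lam=0.0)
```

Setting `lam=0.0` removes the temporal term, so the objective reduces to a margin loss over triples only.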

Experiments
We adopt the same evaluation metrics for time-aware KB embedding in two tasks: link prediction (Bordes et al., 2013) and triple classification (Socher et al., 2013). Table 1. This dataset is denoted YG15k. Second, to make a mixed dataset, we created YG36k, where 50% of the facts have time annotations and 50% do not. We will release the data upon request.

Link Prediction
Link prediction is to complete the triple (h, r, t) when h, r or t is missing.

Entity Prediction
Evaluation protocol. For each test triple with a missing head or tail entity, each method computes scores for all candidate entities and ranks them in ascending order of score, since lower scores indicate more plausible triples. We use two metrics for our evaluation, as in (Bordes et al., 2013): the mean of the correct entity ranks (Mean Rank) and the proportion of valid entities ranked in the top 10 (Hits@10). As mentioned in (Bordes et al., 2013), the metrics are desirable but flawed when a corrupted triple exists in the KB. As a countermeasure, we filter out all these valid triples in the KB before ranking. We name the first evaluation setting Raw and the second Filter.
Baseline methods. For comparison, we select translating methods such as TransE (Bordes et al., 2013), TransH (Wang et al., 2014b) and TransR (Lin et al., 2015b) as our baselines. We then apply time-aware embedding on top of these methods to obtain the corresponding time-aware embedding models. A model with time-aware embedding is denoted, for example, as "tTransE".
Implementation details. For all methods, we create 100 mini-batches on each data set. The dimension of the embedding $n$ is chosen from {20, 50, 100}, and the margins $\gamma_1$ and $\gamma_2$ are chosen from {1, 2, 4, 10}. The learning rate is chosen from {0.1, 0.01, 0.001}. The regularization hyperparameter $\lambda$ is tuned in $\{10^{-1}, 10^{-2}, 10^{-3}, 10^{-4}\}$. The best configuration is determined according to the Mean Rank on the validation set. The optimal configuration is $n = 100$, $\gamma_1 = \gamma_2 = 4$, $\lambda = 10^{-2}$, a learning rate of 0.001, and the $\ell_1$-norm.
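The difference between the Raw and Filter settings can be illustrated with a short sketch. It assumes that lower scores indicate more plausible triples; the helper names (`rank_of`, `mean_rank`, `hits_at`) and the toy candidate scores are hypothetical and are not taken from our evaluation code.

```python
def rank_of(target, scores, known=()):
    """Rank (1 = best) of `target` among candidates, by ascending score.

    Candidates listed in `known` are other valid answers already in the KB;
    passing them gives the Filter setting, omitting them gives Raw.
    """
    target_score = scores[target]
    better = sum(1 for cand, score in scores.items()
                 if cand != target and cand not in known
                 and score < target_score)
    return 1 + better


def mean_rank(ranks):
    """Average rank of the correct entities."""
    return sum(ranks) / len(ranks)


def hits_at(ranks, k=10):
    """Proportion of correct entities ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Toy candidate scores for one test triple whose correct tail is "Ulm".
scores = {"Ulm": 0.9, "Princeton": 0.4, "Bern": 0.2, "Zurich": 1.5}
raw_rank = rank_of("Ulm", scores)
# Suppose "Bern" also forms a valid triple in the KB, so it is filtered out.
filtered_rank = rank_of("Ulm", scores, known={"Bern"})
ranks = [raw_rank, filtered_rank]
```

Filtering can only keep a rank the same or improve it, which is why Filter results are never worse than Raw results for the same model.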
Results. Table 2 reports the results on the test set.
From the results, we can see that time-aware embedding methods outperform all the baselines on all the data sets and on all the metrics. The improvements are usually quite significant: the Mean Rank drops by about 75%, and Hits@10 rises by about 19% to 30%. This demonstrates the superiority and generality of our method. On the sparse dataset YG15k, all the temporal information is utilized to model temporal associations and make the embeddings more accurate, so the improvement is larger than on YG36k, where time-unknown triples are mixed in.

Relation Prediction
Relation prediction aims to predict the relation between two given entities. Due to limited space, evaluation results are shown in Table 3 for YG15k only, where we report Hits@1 instead of Hits@10. Example prediction results for TransE and tTransE are compared in Table 4. For example, when testing (Billy Hughes, ?, London, 1862), TransE easily confuses the relations wasBornIn and diedIn because they behave similarly for a person and a place. However, it is known that (Billy Hughes, isAffiliatedTo, National Labor Party) happened in 1916, and tTransE has learnt the temporal order wasBornIn → isAffiliatedTo → diedIn. The regularization term $\| r_{born} M - r_{affiliated} \|_1$ is therefore smaller than $\| r_{died} M - r_{affiliated} \|_1$, so the correct answer wasBornIn ranks higher than diedIn.

Table 4: Example relation prediction results for TransE and tTransE.
Test quadruple | TransE predictions | tTransE predictions
(Billy Hughes, ?, London, 1862) | diedIn, wasBornIn | wasBornIn, diedIn
(John Schoenherr, ?, Caldecott Medal, 1988) | owns, hasWonPrize | hasWonPrize, owns
(John G. Thompson, ?, University of Cambridge, 1961) | graduatedFrom, worksAt | worksAt, graduatedFrom
(Tommy Douglas, ?, New Democratic Party, 1961) | isMarriedTo, isAffiliatedTo | isAffiliatedTo, worksAt

Triple Classification
Triple classification aims to judge whether an unseen triple is correct or not.
Evaluation protocol. We follow the same evaluation protocol used in Socher et al. (2013). To create labeled data for classification, for each triple in the test and validation sets, we construct a corresponding negative triple by randomly corrupting the entities. To corrupt a position (head or tail), only entities that have appeared in that position are allowed. During triple classification, a triple is predicted as positive if its score is below a relation-specific threshold $\delta_r$, and as negative otherwise. We report averaged accuracy on the test sets.
Implementation details. We use the same hyperparameter settings as in the link prediction task. The relation-specific threshold $\delta_r$ is determined by maximizing averaged accuracy on the validation sets.
Results. Table 5 reports the results on the test sets. The results indicate that time-aware embedding consistently outperforms all the baselines. Temporal order information may help to distinguish positive from negative triples, as different head entities may have different temporally associated relations. If the temporal order of a triple agrees with that of most facts, the regularization term helps it obtain a lower energy, and vice versa.
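Threshold selection for a single relation can be sketched as follows. This is a simplified illustration under stated assumptions: candidate thresholds are taken from the observed validation scores plus one value above the maximum, and the names `accuracy` and `fit_threshold` as well as the toy scores are hypothetical.

```python
def accuracy(scores, labels, delta):
    """Fraction of triples classified correctly with threshold `delta`.

    A triple is predicted positive when its score is below `delta`.
    """
    correct = sum(1 for s, y in zip(scores, labels) if (s < delta) == y)
    return correct / len(scores)


def fit_threshold(scores, labels):
    """Pick the candidate threshold with the best validation accuracy."""
    # Assumption: try each observed score, plus one value above the maximum
    # so that "predict everything positive" is also considered.
    candidates = sorted(set(scores)) + [max(scores) + 1.0]
    return max(candidates, key=lambda d: accuracy(scores, labels, d))


# Toy validation scores for one relation (True = positive triple).
valid_scores = [0.5, 0.8, 2.0, 3.1]
valid_labels = [True, True, False, False]
delta_r = fit_threshold(valid_scores, valid_labels)
valid_acc = accuracy(valid_scores, valid_labels, delta_r)
```

In practice the same search would be repeated for every relation, giving one threshold per relation that is then applied unchanged to the test set.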

Related Work
Many models have been proposed for KB embedding (Nickel et al., 2011; Bordes et al., 2013; Socher et al., 2013). External information has been employed to improve KB embedding, such as text (Riedel et al., 2013; Wang et al., 2014a; Zhao et al., 2015), entity type and relationship domain (Guo et al., 2015; Chang et al., 2014), and relation path (Lin et al., 2015a; Gu et al., 2015). However, these methods rely solely on triple facts and neglect temporal order constraints between facts. Temporal information such as relation ordering in text has been explored (Talukdar et al., 2012; Bethard, 2013; Chambers et al., 2007; Chambers and Jurafsky, 2008). This paper proposes a time-aware embedding approach that employs temporal order constraints to improve KB embedding.

Conclusion and Future Work
In this paper, we propose a general time-aware KB embedding approach, which incorporates the creation time of entities and imposes temporal order constraints on the geometric structure of the embedding space, enforcing it to be temporally consistent and accurate. As future work: (1) We will incorporate the valid time of facts.
(2) Since some time-sensitive facts in YAGO2 lack temporal information, we will mine such temporal information from texts.