Learning to Update Knowledge Graphs by Reading News

News streams contain rich up-to-date information which can be used to update knowledge graphs (KGs). Most current text-based KG updating methods rely on elaborately designed information extraction (IE) systems and carefully crafted rules, which are often domain-specific and hard to maintain. Moreover, such methods rarely pay enough attention to the implicit information that lies underneath texts. In this paper, we propose a novel neural network method, GUpdater, to tackle these problems. GUpdater is built upon graph neural networks (GNNs) with a text-based attention mechanism that guides the updating messages passing through the KG structures. Experiments on a real-world KG updating dataset show that our model can effectively broadcast the news information to the KG structures and perform necessary link-adding or link-deleting operations to keep the KG up-to-date according to news snippets.


Introduction
Knowledge graphs have been widely used in different areas, where keeping the knowledge triples up-to-date is a crucial step to guarantee the KG quality.
Existing works attempt to synchronize KGs with encyclopedia sources (Morsey et al., 2012;Liang et al., 2017), which basically leverage structured data, while directly updating KGs using plain texts, such as news snippets, is not an easy task and still remains untouched. For example, given a KG recording the rosters of all NBA teams, we need to update this KG according to a news article which describes a trade between different teams. Given the following news snippet that reports a trade between Minnesota Timberwolves and Philadelphia 76ers, one has to add and delete many links in the KG, as illustrated in Figure 1.
The Minnesota Timberwolves has acquired forward Robert Covington, forward Dario Šarić, guard Jerryd Bayless and a 2022 second-round draft pick from the Philadelphia 76ers in exchange for forward Jimmy Butler and center Justin Patton.
Current approaches either rely on manual updating, or solve it in a two-step manner: first, using an off-the-shelf information extraction (IE) tool to extract triples from the text, and then modifying the KG according to certain rules predefined by domain experts. However, we should notice that besides what we can explicitly extract from the text, the implicit information, i.e., information that is not mentioned in the text but can be inferred from it, should also be taken into consideration. This requires far more complicated updating rules. In the above example, if Robert Covington (RC) is traded from the Philadelphia 76ers to the Minnesota Timberwolves, his teammates should all be changed accordingly, which is not mentioned in the news at all. This is what we refer to as implicit information in this scenario.
We should point out that both explicit and implicit information are just different aspects of the same event, which is different from another stream of research that focuses on the evolution of graphs (Pareja et al., 2019;Jin et al., 2019). For example, changing the head coach may trigger a possible trade for his/her favorite players in the future, but these are actually two events, thus beyond the scope of our work.
Intuitively, the implicit information behind the news is related to the KG structures. Many recent works focus on embedding a KG into a continuous vector space, which can be used to fill in missing links in KGs (Bordes et al., 2013; Wang et al., 2014; Lin et al., 2015; Yang et al., 2014; Trouillon et al., 2016; Schlichtkrull et al., 2018). However, most of them learn from static KGs and are thus unable to help with our task, since they cannot dynamically add new links or delete obsolete links according to extra text information.

Figure 1: An example illustrating a trade in the KG of NBA teams. Note that all changes described by the news happen in the text-subgraph; the implicit changes, e.g., the teammate changes of player Robert Covington (RC), are located outside the text-subgraph but inside the 1hop-subgraph. The part outside the 1hop-subgraph, e.g., the Atlanta Hawks part, will not be affected by this trade.
In this paper, we propose a novel neural model, GUpdater, to tackle this problem, which features a graph based encoder to learn latent KG representations with the guidance from the news text, and a decoder to score candidate triples with reconstructing the KG as the objective. Our encoder is built upon the combination of recently proposed R-GCN (Schlichtkrull et al., 2018) and GAT (Velickovic et al., 2017) with two new key factors designed specifically for our task. The main idea is to control the updating message from the text passing through the KG. First, we use the given text to generate all the graph attention weights in order to selectively control message passing. Second, we link all entities mentioned in the news text together as shortcuts in order to let the message pass to each other even if they are topologically far from each other in the KG. For the decoder, we simply use DistMult (Yang et al., 2014) to score related triples to be updated. To evaluate our method, we construct a new real-world dataset, NBAtransactions, for this task. Experimental results show that our model can effectively use the news text to dynamically add new links and remove obsolete links accordingly.
Our contributions are two-fold: 1. We propose a new text-based KG updating task and release a new real-world dataset to the community for further research.
2. We design a novel neural method, GUpdater, to read text snippets, which features an attention mechanism to selectively control the message passing over KG structures. This novel architecture enables us to perform both link-adding and link-deleting to keep the KG up-to-date.

Task Formulation
We first formally define the task. Given a knowledge graph G = (E, R, T), where E, R and T are the entity set, relation set and KG triple set, respectively, and a news text snippet S = {w_1, w_2, ..., w_{|S|}}, for which entity linking has been performed to build the mentioned entity set L ⊂ E, the text-based knowledge graph updating task is to read the news snippet S and update T accordingly to get the final triple set T′ and the updated graph G′ = (E, R, T′). In this paper, we focus on the scenarios where the entity set remains unchanged.
As a single event can only affect the KG in a limited range, for each event that is described by a news snippet, we consider two kinds of subgraphs that are defined by their entity sets, as illustrated in Figure 1:

Text-Subgraph Subgraphs with entity set E_text = L, i.e., all the entities are mentioned in news snippet S. Explicit information in the texts can lead to changes in these graphs only.
1Hop-Subgraph If we expand text-subgraphs by one hop, we get 1hop-subgraphs. For each entity in a 1hop-subgraph, either the entity itself or one of its neighbors is mentioned in the news. The one-hop entity set is formally defined as E_1hop = {e | e ∈ L ∨ (((e, r, t) ∈ T ∨ (t, r, e) ∈ T) ∧ t ∈ L)}.
We argue that updating in 1hop-subgraphs only strikes a good balance between effectiveness and efficiency. On the one hand, implicit changes always occur around explicit changes, usually within a short distance. In fact, most of the implicit changes can be captured in the one-hop range. Here we give an extreme example: given the news snippet "Barack Obama was elected president.", the explicit change is to add (Barack Obama, Position, President) to the KG, while one of the implicit changes is to add (Michelle Obama, Position, First Lady) to the KG. Note that in this case, neither "Michelle Obama" nor "First Lady" is mentioned in the news, but because they are neighbors of "Barack Obama" and "President", respectively, this implicit change can be found in the 1hop-subgraph. On the other hand, updating in small subgraphs avoids a huge amount of meaningless computation and meanwhile reduces mispredictions. For these two reasons, we select the one-hop range, i.e., the 1hop-subgraph, as the main testbed for this task. Also note that this one-hop setting can be easily extended to two-hop or larger scopes if necessary.
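The one-hop expansion defined above can be sketched in a few lines; this is our own illustrative code (function and variable names are ours, not from the released implementation), assuming triples are stored as (head, relation, tail) tuples:

```python
def one_hop_entities(triples, mentioned):
    """Compute E_1hop: every entity that is mentioned in the news (set L),
    plus every direct neighbor of a mentioned entity in the triple set T."""
    e_1hop = set(mentioned)
    for h, r, t in triples:
        if h in mentioned:
            e_1hop.add(t)  # tail is a one-hop neighbor of a mentioned head
        if t in mentioned:
            e_1hop.add(h)  # head is a one-hop neighbor of a mentioned tail
    return e_1hop
```

Extending to two-hop or larger scopes amounts to re-applying the same expansion to the result.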

Our Method
The overview of our model is illustrated in Figure 2. As mentioned before, our GUpdater follows an encoder-decoder framework with the objective to reconstruct the modified KGs.

Relational Graph Attention Layer
In order to better capture the KG structures, we propose a new graph encoding layer, the relational graph attention layer (R-GAT), as the basic building block of GUpdater's encoder. R-GAT can be regarded as a combination of R-GCN (Schlichtkrull et al., 2018) and GAT (Velickovic et al., 2017), which benefits from both the ability to model relational data and the flexibility of the attention mechanism. Recall that the layer-wise propagation rule of R-GCN can be written as follows:

H^{l+1} = σ(Σ_{r∈R} Â^l_r H^l W^l_r),    (1)

where Â^l_r is the normalized adjacency matrix for relation r, W^l_r is a layer-specific trainable weight matrix for relation r, and σ(·) denotes an activation function; here we use ReLU. H^l is the matrix of latent entity representations in the l-th layer.
Upon R-GCN, we can easily introduce attention mechanisms by computing Â^l_r with an attention function:

Â^l_{r,ij} = softmax_{j∈N^r_i}(att^{lr}(h^l_i, h^l_j)),    (2)

where Â^l_{r,ij} is the i-th row and j-th column element of Â^l_r, and N^r_i denotes the set of neighbor indices of node i under relation r. att^{lr}(·, ·) is the attention function. h^l_i and h^l_j are the i-th and the j-th entity representations of layer l, respectively.
A known issue for R-GCN (Eq. (1)) is that the number of model parameters grows rapidly with the number of relation types. We thus use basis-decomposition (Schlichtkrull et al., 2018) for regularization. In a standard R-GCN layer, if there are R relations involved, there will be R weight matrices per layer. Basis-decomposition regularizes these matrices by defining each weight matrix as a linear combination of B (B < R) basis matrices, which significantly decreases the number of parameters.
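To make the layer concrete, the following is a minimal PyTorch sketch of one R-GAT layer with basis decomposition. This is our own illustrative implementation, not the released code; the edge-list format, attention parameterization, and initialization scale are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RGATLayer(nn.Module):
    """One relational graph attention layer with basis decomposition:
    W_r = sum_b coef[r, b] * bases[b], attention weights per relation."""
    def __init__(self, in_dim, out_dim, num_rels, num_bases):
        super().__init__()
        # B shared basis matrices and per-relation combination coefficients
        self.bases = nn.Parameter(torch.randn(num_bases, in_dim, out_dim) * 0.01)
        self.coef = nn.Parameter(torch.randn(num_rels, num_bases) * 0.01)
        # per-relation graph guidance vector (GAT-style attention parameters)
        self.att = nn.Parameter(torch.randn(num_rels, 2 * in_dim) * 0.01)

    def forward(self, h, edges):
        # h: [N, in_dim]; edges: dict mapping relation id -> list of (dst, src)
        w = torch.einsum('rb,bio->rio', self.coef, self.bases)  # [R, in, out]
        out = torch.zeros(h.size(0), w.size(2))
        for r, pairs in edges.items():
            dst = torch.tensor([i for i, _ in pairs])
            src = torch.tensor([j for _, j in pairs])
            # unnormalized attention logits on each edge
            e = F.leaky_relu((torch.cat([h[dst], h[src]], dim=1) * self.att[r]).sum(-1))
            # softmax over each destination node's neighborhood N^r_i
            a = torch.zeros_like(e)
            for i in dst.unique():
                mask = dst == i
                a[mask] = F.softmax(e[mask], dim=0)
            msg = h[src] @ w[r]  # relation-specific transform of neighbor messages
            out.index_add_(0, dst, a.unsqueeze(1) * msg)
        return F.relu(out)
```

The per-node softmax loop is written for clarity; a practical implementation would use a scatter-based segment softmax instead.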

Text-based Attention
The core idea of GNNs is to gather neighbors' information. In our case, we propose a text-based attention mechanism to utilize the news snippet to guide the message passing along the KG structure within the R-GAT layer.
We first use a bi-GRU (Cho et al., 2014) to encode the given news text S into a sequence of representations {u_1, u_2, ..., u_{|S|}}, then we leverage the sequence attention mechanism (Luong et al., 2015) to compute the context vector:

c^{lr} = Σ_t b^{lr}_t u_t,    (3)

where b^{lr}_t is the text attention weight, computed as follows:

b^{lr}_t = softmax_t(g^{lr}_text · u_t),    (4)

where g^{lr}_text is a trainable guidance vector to guide the extraction of relation-dependent context.
Recall that in GAT, the attention weights are computed in a way similar to the formula below:

att^{lr}(h^l_i, h^l_j) = LeakyReLU(g^{lr}_graph · [h^l_i || h^l_j]),    (5)

where || denotes the concatenation operation, and g^{lr}_graph is a relation-specific trainable vector that serves as a graph guidance vector to decide which edge to pay attention to.
Here, we generate the final guidance vector g^{lr}_fin, which combines the textual information with the graph guidance vector by simple linear interpolation:

g^{lr}_fin = α^{lr} g^{lr}_graph + (1 − α^{lr}) U^{lr} c^{lr},    (6)

and we replace g^{lr}_graph in Eq. (5) by g^{lr}_fin to get our final attention function:

att^{lr}(h^l_i, h^l_j) = LeakyReLU(g^{lr}_fin · [h^l_i || h^l_j]),    (7)

where U^{lr} is a trainable transformation matrix, and α^{lr} ∈ [0, 1] can either be trainable or fixed. If we set α^{lr} = 1, Eq. (7) degenerates to Eq. (5) and our encoder degenerates to R-GAT. This makes it easy to pre-train the model to get good embeddings for all entities and relations when news snippets are not provided.
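The text-attention and interpolation steps (Eqs. (3), (4) and (6)) can be sketched as follows. Shapes and names are our assumptions for illustration, not the released implementation:

```python
import torch
import torch.nn.functional as F

def guidance_vector(u, g_text, g_graph, U, alpha):
    """Compute the final guidance vector g_fin from news-token encodings.
    u:       [|S|, d]  bi-GRU outputs for the news tokens
    g_text:  [d]       trainable text guidance vector (relation-specific)
    g_graph: [2d']     relation-specific graph guidance vector
    U:       [2d', d]  trainable transformation matrix
    alpha:   scalar interpolation weight in [0, 1]
    """
    b = F.softmax(u @ g_text, dim=0)   # text attention weights b_t, Eq. (4)
    c = (b.unsqueeze(1) * u).sum(0)    # context vector c, Eq. (3)
    # linear interpolation of graph and text-derived guidance, Eq. (6)
    return alpha * g_graph + (1 - alpha) * (U @ c)
```

With alpha = 1 the text is ignored and the result is exactly the graph guidance vector, matching the degeneration to plain R-GAT described above.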

Shortcuts
Shortcuts In practice, entities in L can be far from each other in the KG, and sometimes even unreachable, while a single- or two-layer R-GAT can only gather information from a near neighborhood. In order to encourage more direct interactions through the graph structure, before running the model, we simply link all entities in L to each other and assign these links a new relation label, SHORTCUT.
Shortcuts with labels Actually, the relations between entities in L can differ. For example, if a player is traded from one team to another, the relations between this player and those two teams must be opposite. Thus a unified shortcut label may make it difficult to correctly pass opposite messages, even with the help of the attention mechanism. We therefore extend the shortcut labels to three types: ADD, DEL and OTHER. ADD (DEL) denotes that the triple is explicitly mentioned in the text as one to be added to (deleted from) the KG. The rest are labeled OTHER.
Off-the-shelf IE tools can be used to generate these labeled shortcuts. Here, we build a simple extraction model using our GUpdater encoder. We get the entity representations from GUpdater's encoder (with unified shortcuts); then for each entity pair (e_i, e_j), where e_i, e_j ∈ L and i ≠ j, we use an MLP classifier to get the probability of each label:

p(z | e_i, e_j) = softmax(MLP([h_i || h_j])),    (8)

where the shortcut label z ∈ {ADD, DEL, OTHER}, and h_i and h_j are the entity representations for entities e_i and e_j, respectively.
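A minimal sketch of this shortcut-labeling classifier (hidden size and layer count are our assumptions, not the paper's hyperparameters):

```python
import torch
import torch.nn as nn

class ShortcutLabeler(nn.Module):
    """Predict a shortcut label (ADD / DEL / OTHER) for an entity pair from
    its encoder representations: p(z | e_i, e_j) = softmax(MLP([h_i ; h_j]))."""
    LABELS = ("ADD", "DEL", "OTHER")

    def __init__(self, ent_dim, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * ent_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, len(self.LABELS)),
        )

    def forward(self, h_i, h_j):
        # concatenate the pair's representations, then classify
        return torch.softmax(self.mlp(torch.cat([h_i, h_j], dim=-1)), dim=-1)
```

In training one would use the pre-softmax logits with a cross-entropy loss; the softmax here just exposes the label probabilities of Eq. (8).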
In our experiments, we train the shortcut-labeling module separately, as we find that the results are good enough while saving training time. One can surely perform the extraction step in a joint training fashion.

Decoder
We use DistMult (Yang et al., 2014), which is known to have good performance on standard KG completion tasks, followed by a sigmoid function, as the decoder, i.e., for each possible triple (e_i, r_k, e_j), where e_i, e_j ∈ E_1hop and r_k ∈ R, the probability of this triple appearing in the final KG is computed as follows:

p((e_i, r_k, e_j) ∈ T′) = σ(Σ_d (h_i • r_k • h_j)_d),    (9)

where • denotes element-wise multiplication.
Since we formulate KG updating as a binary classification task, we use the cross-entropy loss to train the model.
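A minimal sketch of the decoder and the training objective (illustrative code under our assumptions, not the authors' implementation):

```python
import torch
import torch.nn.functional as F

def distmult_prob(h_i, r_k, h_j):
    """DistMult score followed by sigmoid, Eq. (9):
    p = sigmoid(sum_d(h_i * r_k * h_j)), with * element-wise."""
    return torch.sigmoid((h_i * r_k * h_j).sum(-1))

def reconstruction_loss(scores, labels):
    """Binary cross-entropy over all candidate triples in the 1hop-subgraph;
    labels are 1 for triples present in the updated KG, 0 otherwise."""
    return F.binary_cross_entropy(scores, labels)
```

Note that DistMult is symmetric in its two entity arguments, which matches the undirected KGs used in this task.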

Dataset
We construct a new dataset, NBAtransactions, to evaluate our method. NBAtransactions contains 4,100 transaction-news pairs in the NBA from 2010 to 2019: 3,261 for training, 417 for validation and 422 for testing. We consider 8 different kinds of transactions, which can be divided into 3 categories according to the updating patterns: 1) Adding edges only: free agency and draft. 2) Deleting edges only: released, overseas, retirement and d-league. 3) Both adding and deleting edges: trade and head coach.
For each transaction, we provide a news snippet S with mentioned entity set L and two undirected KG fragments, G_1hop and G′_1hop, representing the corresponding 1hop-subgraphs before and after the transaction, respectively. On average, one subgraph contains 27.86 entities, 4 types of relations and 239.28 triples, and one news snippet contains 29.10 words and 2.66 KG entities. Each transaction causes the addition of 17.07 edges and the deletion of 18.37 edges on average; among them, only 0.95 (for adding) and 1.03 edges (for deleting) are in the text-subgraph, i.e., only 5.6% of the edge changes are explicitly mentioned in the news texts. Detailed statistics are shown in Table 1.
Our dataset is collected from several NBA-related websites. The KGs in the dataset are constructed using the roster of each NBA team in each season, collected from Basketball-Reference. For each NBA season, we build a large KG that records all teams, players, head coaches, general managers and their relations at the beginning of that season, so there are 9 large KGs in total, one per NBA season. Wikipedia records all NBA transactions from 1946 to 2019 in a structured form and provides URLs of news sources for most of the transactions after 2010. We crawled all the available news and the corresponding transactions. For each transaction, G_1hop can be easily extracted from the corresponding large KG. However, we cannot get G′_1hop directly, as our KGs only record the rosters at the beginning of each season. To generate G′_1hop, we manually create a large set of complicated conversion rules for each transaction type to modify G_1hop and obtain the 1hop-subgraph after the transaction. We use string matching algorithms to perform entity linking and meanwhile generate the mentioned entity set L.

Setup
The dimensions of all embeddings (words, entities, and relations) are all set to 128, and the hidden dimension is 256. We use a single layer encoder, as we find that more layers do not bring any benefit. The basis number of basis-decomposition is 2. We replace the word embeddings of entity mentions in the text by the entity embeddings for better alignment of word embedding space and entity embedding space. The entity embeddings and relation embeddings are pre-trained using R-GAT, and the word embeddings are randomly initialized. We set dropout (Srivastava et al., 2014) rate to 0.5. The batch size in our experiments is 1. We use Adam optimizer (Kingma and Ba, 2014) with a learning rate of 0.001 for training.
We consider updating in two different scopes: 1) Text-Subgraphs: changes in these subgraphs correspond to the explicit information mentioned in the texts. 2) 1Hop-Subgraphs: all explicit and implicit changes happen in these subgraphs, thus this setting is treated as the overall/real-world updating evaluation.

Metrics
As our model aims to reconstruct the KG via binary classification, i.e., deciding whether a possible triple should be in the modified KG, we use accuracy, precision, recall and F1 score as the evaluation metrics. Further, to evaluate the abilities of link-adding, link-deleting and link-preserving, we collect all added edges, deleted edges and unchanged edges, and compute the prediction accuracies separately, which we denote as Added Acc, Deleted Acc and Unchanged Acc, respectively.
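These per-category accuracies can be sketched as set operations over the before/after triple sets (an illustrative implementation with names of our choosing):

```python
def updating_metrics(before, after, predicted):
    """Compute Added/Deleted/Unchanged Acc from the gold triple sets before
    and after the event and the model's predicted triple set."""
    added = after - before        # edges the update must create
    deleted = before - after      # edges the update must remove
    unchanged = before & after    # edges the update must preserve

    def acc(target, should_be_present):
        if not target:
            return None
        hit = sum((t in predicted) == should_be_present for t in target)
        return hit / len(target)

    return {
        "Added Acc": acc(added, True),         # added edges must appear
        "Deleted Acc": acc(deleted, False),    # deleted edges must be absent
        "Unchanged Acc": acc(unchanged, True), # unchanged edges must persist
    }
```

Precision, recall and F1 are then computed over all candidate triples in the 1hop-subgraph in the usual way.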

Baseline Models
Because most current text-based KG updating methods work in two steps (first extracting information from texts with an IE tool, then adding and removing links in the KG), we select non-rule-based models that perform well on these two steps, as well as their combination, as our baseline models.
PCNN (Zeng et al., 2015) is a strong baseline for relation extraction, which divides the sentence into three pieces and applies max-pooling in a piecewise manner after the convolution layer.
IE-gold is a simulation of an ideal IE model that can perfectly extract explicit information from given texts. This is an upper bound for the information extraction step.
DistMult (Yang et al., 2014) is a widely-used multiplication-based triple scoring function for KG completion, which computes the three-way inner-product of the triples. Its symmetric nature is suitable for NBAtransactions as the KGs are undirected.
R-GCN (Schlichtkrull et al., 2018) is one of the strongest baselines for KG completion; it first encodes entities by gathering neighbors' information, then decodes them with DistMult. Note that compared to traditional KG completion tasks, here the target KGs are the modified KGs and usually differ from the input KGs. Lacking a graph network structure, DistMult alone cannot be trained in this way.
IE-gold + R-GCN is the combination of IE-gold and R-GCN. We first use IE-gold to generate a perfectly updated text-subgraph, then use R-GCN to predict other changes outside the text-subgraph.

Main Results
We summarize the main results in Table 2. As we can see, PCNN performs quite well in extracting explicit information from the news (over 0.96 in Added Acc, Deleted Acc and Unchanged Acc in Text-Subgraphs). However, explicit changes only account for 5.8% of the changes in the testing set, while the majority are implicit. Therefore, even a perfect IE model, IE-gold, which never makes any wrong predictions, can only correctly predict about 6% of the changes in the 1hop-subgraphs.
The implicit changes are highly related to KG structures. Compared to DistMult, which just preserves the original structures and scores 0.0656 in Added Acc and 0.1220 in Deleted Acc in the 1Hop-Subgraphs, R-GCN can use the modified KGs for training and learns to predict changes without reading the news. R-GCN beats DistMult by 0.37 in Added Acc and 0.08 in Deleted Acc in the 1Hop-Subgraphs. Such a huge improvement suggests that the blind prediction is not completely arbitrary, and that R-GCN indeed learns certain updating patterns from the KG structures. Besides, R-GCN scores 0.9393 in F1 in the 1hop-subgraphs, outperforming IE-gold by 1%, which once again underlines the importance of implicit changes. IE-gold + R-GCN combines the advantages of IE-gold and R-GCN, and performs best among the baselines. However, in the 1hop-subgraphs, its Added Acc and Deleted Acc are only 0.4681 and 0.2448, which are still quite low. Although IE-gold + R-GCN can perfectly extract explicit information from the news, there is no mechanism for the IE model to pass the extracted information to the R-GCN module, so this two-step method still performs poorly at finding the implicit changes.
For overall performance, GUpdater significantly outperforms all baseline models, which coincides with our intuition. In particular, in the 1hop-subgraphs, GUpdater beats IE-gold + R-GCN by 0.42 in Added Acc and 0.65 in Deleted Acc, indicating that GUpdater can well capture both explicit and implicit information in the news snippets.

Ablation Analysis
We also perform an ablation study to figure out the effect of each component of GUpdater. As shown in Table 3, compared to R-GAT, both GUpdater-shortcut (GUpdater without shortcuts) and GUpdater-text (GUpdater without text-based attention) improve the overall performance to different degrees, showing that both the text-based attention and the shortcuts are helpful for this task.
Generally, we can see that adding shortcuts is key to this task, as it brings a giant leap in nearly all indicators. In fact, updating KGs using news can be regarded as information injection from one semantic space into another; the text-mentioned entities can be seen as junctions of these two spaces and serve as entrances for the injection. Thus, in the KG, the text-mentioned entities are information sources that send messages to the explicitly and implicitly related target entities. However, the targets may be very far away and hard to reach, and it is the shortcuts that make successful information delivery possible.
However, if there are too many shortcuts, messages can easily reach the wrong targets. The text-based attention mechanism can be regarded as a set of gates that selectively opens the few shortcuts most useful for message passing, and this brings the model steady improvements of 2%-3% in both Added Acc and Deleted Acc.
When adding explicit labels to shortcuts, we get much better performance than the basic GUpdater (a 7% improvement in Added Acc and an 8% improvement in Deleted Acc in 1hop-subgraphs), which indicates that splitting the shortcuts into different channels for different kinds of messages is crucial for better information delivery and makes it easier to dig out implicit information. The result also indicates that our model is highly extensible and can easily incorporate off-the-shelf IE tools.

Visualization of Attention on Shortcuts
In order to explore how the information actually passes along shortcuts, we select 5 different types of trades, as shown in Table 4, and we get the corresponding attention weight on each shortcut in  GUpdater. We find that most shortcuts are closed by the attention mechanism, i.e., their corresponding attention weights are very close to 0, and only a small portion of shortcuts are open for message passing.
For each trade type, the remaining shortcuts are organized in a similar pattern: there is a central team, selected by the model, that sends messages to all other entities and receives messages from another team. For symmetric trades, i.e., trades in which each team plays the same role (T2, T3 in Table 4), the selection of the central team seems to be arbitrary, while for asymmetric trades (T4, T5), the central teams are the teams involved in more transactions.
It seems that GUpdater learns a 2-team trade rule from that pattern: if two teams send messages mutually through shortcuts, they will exchange all their players in the text-subgraphs. T1 and T2 are the simplest 2-team trades, so our model updates the graphs perfectly with this message passing pattern. T4 and T5 look difficult and complicated, but they are actually direct combinations of two simple 2-team trades; e.g., T4 can be decomposed into two trades: the Grizzlies trade JS to the Celtics for FM, and the Thunder trade RG to the Celtics. For these trades, GUpdater also performs quite well, as only two not-that-important triples are missing in T5. However, in T3, three teams exchange players in a rotation that cannot be decomposed into several 2-team trades, which leads to a severe misprediction: (TS, player, Jazz), as placing a player in the wrong team may cause more teammate errors in the 1hop-subgraph. The inability to perfectly handle rotational three-team trades may be due to the lack of training instances, as in the past 10 years this kind of trade happened only about once a year.

Related Work
KG Representation Learning aims to embed a KG into a continuous vector space that preserves the KG structures. There are mainly two streams of research. The first is addition-based models (also called translation-based models), which are based on the principle that for every valid triple (h, r, t), the embeddings satisfy h + r ≈ t. TransE (Bordes et al., 2013) is the first such model. TransH (Wang et al., 2014) projects entity embeddings into relation-specific hyperplanes. TransR (Lin et al., 2015) generalizes TransH by extending the projection to a linear transformation. TransD (Ji et al., 2015) simplifies TransR by decomposing the transformation matrix into the product of two vectors. The second is multiplication-based models, which come from the idea of tensor decomposition. RESCAL (Nickel et al., 2011) is one of the earliest studies, using a bilinear scoring function. DistMult (Yang et al., 2014) simplifies RESCAL by using a diagonal matrix. ComplEx (Trouillon et al., 2016) extends DistMult into the complex space to handle asymmetric relations. SimplE (Kazemi and Poole, 2018) learns two embeddings for each entity dependently. A direct application of these models is KG completion.
Graph Neural Networks allow passing information horizontally among nodes (Gilmer et al., 2017). Recently, many techniques have been incorporated into GNN models, and many attempts have been made to promote GNNs in more application scenarios. GCN (Kipf and Welling, 2016) leverages graph spectra for semi-supervised node classification. GAT (Velickovic et al., 2017) extends GCN by bringing the attention mechanism into GNNs. To better capture KG information, R-GCN (Schlichtkrull et al., 2018) was proposed to incorporate relation embeddings into GNNs.

Table 4: Examples of visualization for the attention on shortcuts. Each row shows an example of one type of trade. G_text and G′_text represent the actual text-subgraphs before and after the trade, respectively. Selected Shortcuts represents the shortcuts selected by the attention mechanism, and the arrows indicate the directions of message passing. Results in Text-Subgraph lists the prediction errors of GUpdater.
Relation Extraction aims to extract relations from texts. Current relation extraction models mainly rely on distant supervision (Mintz et al., 2009) and are trained at the bag level. PCNN (Zeng et al., 2015) divides the sentence into three pieces and applies max-pooling in a piecewise manner after the convolution layer. APCNN (Ji et al., 2017) uses sentence-level attention to select multiple valid sentences with different weights in a bag. Luo et al. (2017) use a transition matrix to model the noise and curriculum learning for training.

Conclusion
In this paper, we propose a new text-based KG updating task and construct a dataset, NBAtransactions, for evaluation. We design a novel GNN-based model, GUpdater, which uses text information to guide the message passing through the KG structure. Experiments show that our model can effectively handle both explicit and implicit information and perform the necessary link-adding and link-deleting operations accordingly. In the future, we will investigate how to update KGs when entities are involved in several successive events.