Learning Collaborative Agents with Rule Guidance for Knowledge Graph Reasoning

Walk-based models have shown their unique advantages in knowledge graph (KG) reasoning by achieving state-of-the-art performance while allowing for explicit visualization of the decision sequence. However, the sparse reward signals offered by the KG during a traversal are often insufficient to guide a sophisticated reinforcement learning (RL) model. An alternative approach to KG reasoning is to use traditional symbolic methods (e.g., rule induction), which achieve high precision without learning but are hard to generalize due to the limitations of symbolic representation. In this paper, we propose to fuse these two paradigms to get the best of both worlds. Our method leverages high-quality rules generated by symbolic-based methods to provide reward supervision for walk-based agents. Because symbolic rules are structured around entity variables rather than concrete entities, we can separate our walk-based agent into two sub-agents, allowing for additional efficiency. Experiments on public datasets demonstrate that walk-based models benefit significantly from rule guidance.


Introduction
While knowledge graphs (KGs) are widely adopted in natural language processing applications, a major bottleneck hindering their usage is the sparsity of manually curated facts (Min et al., 2013), leading to extensive studies on KG completion (or reasoning) (Trouillon et al., 2016; Dettmers et al., 2018; Das et al., 2017; Xiong et al., 2017; Lin et al., 2018; Meilicke et al., 2019). Traditional approaches to the KG reasoning task are mainly based on logic rules (Landwehr et al., 2007, 2010; Galárraga et al., 2013, 2015). They represent relations as predicates and ontological constraints as first-order logic rules. These
methods are referred to as symbolic-based methods. Despite their good performance in recent work (Meilicke et al., 2019, 2020), symbolic-based methods are inherently limited by the symbolic representation and rely on whether the associated relations of the given rules generalize well, as shown in Section 3.2. To resolve the limitation of symbolic representations, embedding-based methods were proposed. They learn low-dimensional distributed representations for entities and relations in order to capture semantic meanings. These methods (Bordes et al., 2013; Socher et al., 2013; Wang et al., 2014; Yang et al., 2014; Trouillon et al., 2016; Dettmers et al., 2017, 2018; Sun et al., 2019; Zhang et al., 2019) have shown superior performance on various benchmark datasets. However, embedding-based methods apply "one-hop" reasoning directly and thus fail to provide human-friendly interpretations for either the learned embeddings or the reasoning modules.
To make the decision process more interpretable, many recent efforts formulate KG reasoning as a multi-hop reasoning process (Xiong et al., 2017; Das et al., 2017; Shen et al., 2018; Lin et al., 2018). A major issue of walk-based methods is that, during the training phase, the agent only receives sparse "hit or not" reward signals after a long sequence of decisions. Lin et al. (2018) try to alleviate this issue by shaping the reward with an embedding-based distance measurement. However, the path with the highest probability does not always have the highest shaped reward. For this reason, walk-based methods can still be improved. Fortunately, we observe that walk-based and symbolic-based methods are complementary to each other. On one hand, symbolic rules can be fetched even without learning; however, due to the limitation of symbols, they are not easy to generalize. On the other hand, walk-based methods benefit from embeddings that encapsulate rich semantic information and thus have better generalizability, but they are hard to train with only a sparse signal, i.e., the "hit or not" reward.
In this work, we aim to tackle the lack of reward in walk-based methods via symbolic-based methods. Given a KG, a symbolic-based model is first applied to fetch a set of symbolic rules. A set of high-confidence rules is then leveraged to guide the training process of a walk-based model by providing additional rewards when its agent uncovers a path that is within the rule set. We note that the agent can have efficiency issues with a huge action space, as the graph can be extremely dense with various relation-entity combinations. Fortunately, because symbolic rules are represented by relations and variables instead of concrete entities, we can separate the agent into a relation agent that focuses on selecting a path of relations (to form a symbolic rule), and an entity agent that selects the concrete entity given the current relation, so as to scale to large and dense graphs. The process of selecting a relation-entity combination is thus divided into two steps: the relation agent first selects a relation, and the entity agent then selects an entity. Moreover, the two agents can interpret the reward from a rule miner in different ways.

Related Work
Symbolic-based methods attempt to reveal symbolic patterns of relation paths in the form of first-order logic rules.
Early works include FOIL and its follow-up works (Quinlan, 1990; Landwehr et al., 2007, 2010). Later, AMIE and AMIE+ (Galárraga et al., 2013, 2015) were developed for efficient mining without counterexamples. Recently, AnyBURL (Meilicke et al., 2019, 2020) has shown comparable performance to state-of-the-art neural models with significant advantages in efficiency. Embedding-based methods learn a matching score for a target triple with distributed representations, and thus can capture rich semantic information of entities and relations (Bordes et al., 2013; Socher et al., 2013; Yang et al., 2014; Trouillon et al., 2016; Dettmers et al., 2017; Zhang et al., 2018; Sun et al., 2019). They have achieved superior performance on various datasets. However, they are one-hop prediction models that ignore the complex patterns of complete reasoning paths.
They also lack interpretability of their embeddings and matching functions.
Reasoning-based methods capture advantages of both symbolic and embedding representations to generate reasoning paths (Xiong et al., 2017; Das et al., 2017; Shen et al., 2018). Recently, the Multi-Hop model (Lin et al., 2018) was proposed to leverage pretrained embedding models to compensate for the false-negative rewards caused by the incomplete graph.

Problem and Preliminaries
In this section, we review the KG reasoning task. We also describe the current symbolic-based and walk-based methods, which are leveraged in the proposed method.

Problem Formulation
A KG consisting of fact triples is represented as $\mathcal{G} = \{(e_i, r, e_j)\} \subseteq \mathcal{E} \times \mathcal{R} \times \mathcal{E}$, where $\mathcal{E}$ and $\mathcal{R}$ are the sets of entities and relations, respectively. Given a query $(e_s, r_q, ?)$, where $e_s$ is a subject entity and $r_q$ is a query relation, the task of KG reasoning is to find a set of object entities $E_o$ such that $(e_s, r_q, e_o)$, where $e_o \in E_o$, is a fact triple missing from $\mathcal{G}$. We denote the queries $(e_s, r_q, ?)$ as tail queries. We note that we can also perform the reversed version $(?, r_q, e_o)$, i.e., head queries. However, in this paper, we only evaluate our method on tail queries to keep consistent with the majority of existing works.
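To make the data model concrete, here is a minimal sketch (with hypothetical entities and relations) of how a KG and a tail query can be represented; indexing the triples by subject and relation turns the lookup of known facts for a query into a dictionary access:

```python
from collections import defaultdict

# Hypothetical minimal KG: G is a set of (subject, relation, object) triples.
G = {
    ("alice", "works_at", "acme"),
    ("acme", "located_in", "nyc"),
    ("alice", "lives_in", "nyc"),
}

# Index triples by (subject, relation) so a tail query (e_s, r_q, ?)
# over *known* facts reduces to a dictionary lookup.
by_subj_rel = defaultdict(set)
for s, r, o in G:
    by_subj_rel[(s, r)].add(o)

def known_objects(e_s, r_q):
    """Object entities already in G for a tail query (e_s, r_q, ?).

    KG reasoning asks for objects *missing* from G; this lookup only
    illustrates the query format and serves to filter known answers.
    """
    return by_subj_rel[(e_s, r_q)]

print(known_objects("alice", "works_at"))  # {'acme'}
```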

Symbolic-based Methods
Some previous work used symbolic Horn rules to perform KG reasoning. They mine rules directly from the KG and predict missing facts by grounding these rules. In this task, a Horn rule can be represented by binary predicates, i.e. relations, and variables which can be grounded by constants, i.e. entities.
Horn rules can be categorized into C rules and AC rules (Meilicke et al., 2019), which are generalized from cyclic and acyclic paths in $\mathcal{G}$, respectively:

$$r(X, Y) \leftarrow b_1(X, A_2) \wedge b_2(A_2, A_3) \wedge \cdots \wedge b_n(A_n, Y)$$

$$r(X, c_0) \leftarrow b_1(X, A_2) \wedge b_2(A_2, A_3) \wedge \cdots \wedge b_n(A_n, c_n)$$

In these two logical formulas, lower-case arguments (i.e., $c_0$ and $c_n$) represent constants, and upper-case ones (i.e., $X$, $Y$, and $A_n$) stand for variables. We use $r(\cdots)$ to denote a rule head and the conjunction of atoms $b_1(\cdots), \ldots, b_n(\cdots)$ to denote a rule body. We note that $r(c_i, c_j)$ is equivalent to the fact triple $(c_i, r, c_j)$.
For a query $r(X, Y)$, if there are other paths starting from entity $X$ and ending at entity $Y$ without using relation $r$, these cyclic paths can be treated as reasoning paths to find the target entity $Y$; they form C rules. For AC rules, a path's ending entity does not need to be $Y$: if there is a strong correlation between entities $c_0$ and $c_n$, and an acyclic path ends at entity $c_n$, then we can predict the target entity $Y$ to be $c_0$ based on this AC rule. AC rules are less generalizable than C rules, especially on large KGs with a huge entity space, since they require specific entity constants. Thus, in this work, we only use C rules as symbolic guidance.
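Since every argument of a C rule is a variable, such a rule is fully determined by its head relation and the ordered sequence of body relations. The following sketch, with a hypothetical rule set and illustrative confidence values, shows how a traversed relation path can be matched against C rules:

```python
# A C rule r(X, Y) <- b1(X, A2) ^ ... ^ bn(An, Y) is fully determined by
# its head relation and its sequence of body relations, so a rule set
# can be stored as {head_relation: {body_relation_tuple: confidence}}.
rules = {
    "lives_in": {
        ("works_at", "located_in"): 0.8,  # illustrative confidence
    },
}

def path_matches_rule(head_rel, relation_path, rules):
    """Return the confidence of the C rule grounded by this relation
    path for the query relation head_rel, or None if no rule matches."""
    return rules.get(head_rel, {}).get(tuple(relation_path))

print(path_matches_rule("lives_in", ["works_at", "located_in"], rules))  # 0.8
```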
One recent symbolic-based method, AnyBURL (Meilicke et al., 2019), has been shown to achieve performance comparable to state-of-the-art neural models. It first mines symbolic rules by sampling paths from $\mathcal{G}$, and then predicts object entities by matching the subject entities and query relations of the given queries against the rules.
However, such methods have limitations. The upper part of Figure 1 shows a counterexample: a rule that has only one instance in the KG cannot be generalized well. In addition, since a symbolic reasoner commonly relies on high-quality rules, it can behave inconsistently across datasets. The lower part of Figure 1 shows the average quality of the top 15 rules mined from two datasets, where the y-axis indicates the average percentage of rules that successfully hit the target entities in the KG. For WN18RR, the average percentage of the top 1 rule is much larger than those of the other rules, so a symbolic method can easily single out these top rules for reasoning.

Neural Reasoning Methods
To capture semantic meanings, given a query $(e_s, r_q, ?)$, embedding-based approaches use one-hop reasoning: they learn confidence scores for all triples via a matching function $f(e_s, r_q, \cdot)$ and return the $e_o$ such that $f(e_s, r_q, e_o)$ has the highest score. To make the neural approach more interpretable, given $e_s$ and $r_q$, walk-based methods learn an agent that reaches the target $e_o$ by finding a path from $e_s$ to $e_o$ that implies the query relation $r_q$. At step $t$, the current state is represented by a tuple $s_t = (e_t, (e_s, r_q))$, where $e_t$ is the entity being visited at step $t$. The agent then samples the next relation-entity pair to visit from the possible actions $A_t = \{(r', e') \mid (e_t, r', e') \in \mathcal{G}\}$. A pseudo self-loop relation $r_s$ that forms the action $(e_t, r_s, e_t)$ is also included and indicates termination. The number of hops is fixed for each query's reasoning process.
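A minimal sketch of the walk-based action space, assuming the triple-set representation of $\mathcal{G}$ from the earlier sketch; the self-loop action that signals termination is appended to each entity's outgoing edges:

```python
from collections import defaultdict

SELF_LOOP = "r_self"  # pseudo relation r_s indicating termination

def build_outgoing_index(G):
    """Map each entity e to its outgoing (relation, entity) edges in G."""
    out = defaultdict(list)
    for s, r, o in G:
        out[s].append((r, o))
    return out

def actions(out, e_t):
    """Action space A_t at entity e_t: every outgoing edge (r', e') with
    (e_t, r', e') in G, plus a self-loop so the agent can stop early
    within the fixed hop budget."""
    return out[e_t] + [(SELF_LOOP, e_t)]
```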

Proposed Method
In this study, we leverage both symbolic and neural reasoning approaches. In the symbolic method, a C rule's confidence score depends only on its associated relations; however, the entities inside a reasoning path can also contribute to the confidence of that path. We first tried to rerank the pre-mined rules based on the probabilities of generating them with a neural reasoning approach. Doing so requires masking out all reasoning paths that are not in the rule set, and due to the large search space, the resulting signal is too sparse to train such a model. Therefore, instead of strictly masking reasoning paths, we give the reasoning agent an extra reward when the selected path grounds a rule. We show this result in the ablation study (Section 5.3). Since C rules are represented by relation constants and entity variables, we can significantly prune the action space by separating the agent into a relation agent and an entity agent. The two agents work alternately and communicate with each other to perform KG reasoning, and extensive experiments show the effectiveness of this joint framework. Specifically, at each reasoning step $t$, starting from entity $e_t$, the relation agent first chooses a relation $r_t$, and the entity agent then selects an entity $e_{t+1}$ such that $(e_t, r_t, e_{t+1}) \in \mathcal{G}$. In this way, the overall action space is greatly reduced. Our framework is depicted in Figure 2.

Figure 2: The architecture of the proposed dual agent. The relation and entity agents interact with each other to generate a path. At each step, the entity agent first generates a distribution over the selectable entities; after the relation agent samples a relation, the entity space is pruned based on the selected relation, and the entity agent then samples an entity from the pruned space. At the final step, the agents receive a hit reward based on the last selected entity and a rule guidance reward from the pre-mined rule set based on the selected relation path.

Relation agent
At each step $t$, the relation agent selects a single relation $r_t$ that is incident to the current entity $e_t$. Given a query and a set of pre-mined rules, the agent first filters out rules whose heads do not match the query relation, and then selects $r_t$ from the $t$-th atoms of the remaining rule bodies, i.e., $b_t(\cdots)$ in the C rule pattern. However, if no rule significantly outperforms the others, it is difficult to select a good $r_t$.
We alleviate this issue by training the relation agent with reinforcement learning techniques. During the pretraining phase, described below, it learns the confidence score distribution of all rules. During the training phase, it then applies the pretrained symbolic strategy and keeps tuning the relation distribution by utilizing the semantic information of the embeddings to increase performance. In other words, the relation agent leverages both the confidence scores of pre-mined rules and the embedding-shaped hit rewards.
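As a sketch of the symbolic side of this policy, the rule-based candidate set at step $t$ can be read off directly from the pre-mined rules, reusing the head-to-body-sequence dictionary format assumed in the earlier sketch; the learned policy then scores these candidates alongside the other relations incident to the current entity:

```python
def candidate_relations(r_q, t, rules):
    """Relations suggested by pre-mined C rules at step t (0-indexed):
    keep only rules whose head matches the query relation r_q, then
    collect the t-th body atom b_t of each remaining rule body."""
    candidates = set()
    for body in rules.get(r_q, {}):
        if t < len(body):
            candidates.add(body[t])
    return candidates
```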

Entity agent
At each step $t$, the entity agent first calculates the probabilities of all candidate entities in the current entity space, based on $e_s$, $r_q$, and the entity history $\mathbf{h}^E_t$. After the relation agent selects the current relation $r_t$, the entity space is pruned to contain only the entities incident on that relation. The entity agent then selects an entity from the pruned entity space. In this way, the entity and relation agents can reason independently.
In experiments, we also tried letting the entity agent generate its distribution over the entity space already pruned by the relation agent. In this way, the entity agent takes in the selected relation and can leverage information from the relation agent. However, the pruned entity space may be extremely small, so unlikely candidates may become relatively confident, and at inference time the entities in a small pruned space may be more likely to be chosen. This relation pruning bias is caused by the information loss of the pruned entity space, and it makes the entity agent less effective, especially on large and dense KGs.
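The score-then-prune order described above can be sketched as masking: the entity agent scores the full candidate space first, and the relation agent's choice only masks out unreachable entities afterwards, so scores stay comparable across pruned spaces of different sizes. A minimal PyTorch illustration, with hypothetical inputs:

```python
import torch

def select_entity(entity_scores, entity_ids, reachable):
    """entity_scores: logits over the full candidate entity space,
    computed before the relation is chosen; reachable: entities incident
    on the relation the relation agent selected. Masking after scoring
    avoids inflating the confidence of unlikely candidates in a tiny
    pruned space."""
    mask = torch.tensor([e in reachable for e in entity_ids])
    probs = torch.softmax(entity_scores.masked_fill(~mask, float("-inf")), dim=-1)
    return entity_ids[torch.multinomial(probs, 1).item()]
```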

Policy Network
The relation agent's search policy is parameterized by the embedding of the query relation $\mathbf{r}_q$ and the relation history $\mathbf{h}^R_t$. The relation history is encoded using an LSTM (Hochreiter and Schmidhuber, 1997):

$$\mathbf{h}^R_t = \mathrm{LSTM}(\mathbf{h}^R_{t-1}, \mathbf{r}_{t-1}),$$

where $\mathbf{r}_0 = \mathbf{r}_s$ is a special start relation embedding that forms an initial relation-entity pair with the source entity embedding $\mathbf{e}_s$. The relation space embeddings $\mathbf{R}_t \in \mathbb{R}^{|R_t| \times d}$ consist of the embeddings of all relations in the relation space $R_t$ at step $t$. Finally, the relation agent outputs a probability distribution $\mathbf{d}^R_t$ and samples a relation from it:

$$\mathbf{d}^R_t = \sigma\left(\mathbf{R}_t \times \mathbf{W}_2\, \mathrm{ReLU}\left(\mathbf{W}_1 \left[\mathbf{h}^R_t ; \mathbf{r}_q\right]\right)\right),$$

where $\sigma$ is the softmax operator and $\mathbf{W}_1$ and $\mathbf{W}_2$ are trainable parameters. We design the relation agent's history-dependent policy as $\pi^R = (\mathbf{d}^R_1, \mathbf{d}^R_2, \ldots, \mathbf{d}^R_T)$. The entity agent analogously uses the embedding of the last-step entity $\mathbf{e}_{t-1}$, the entity space embeddings $\mathbf{E}_t$, and its history $\mathbf{h}^E_t = \mathrm{LSTM}(\mathbf{h}^E_{t-1}, \mathbf{e}_{t-1})$ to compute the probability distribution over entities $\mathbf{d}^E_t$ as follows:

$$\mathbf{d}^E_t = \sigma\left(\mathbf{E}_t \times \mathbf{W}_4\, \mathrm{ReLU}\left(\mathbf{W}_3 \left[\mathbf{h}^E_t ; \mathbf{e}_s ; \mathbf{r}_q\right]\right)\right),$$

where $\mathbf{W}_3$ and $\mathbf{W}_4$ are trainable parameters. Note that the entity agent uses a different LSTM to encode the entity history.
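A minimal PyTorch sketch of the relation agent's policy network under the equations above; the dimension names and module layout are assumptions, and biases are kept for simplicity:

```python
import torch
import torch.nn as nn

class RelationPolicy(nn.Module):
    """Sketch of the relation agent: an LSTM over the relation history
    plus a two-layer scoring head (W1, W2 with a ReLU in between)."""

    def __init__(self, num_relations, dim):
        super().__init__()
        self.rel_emb = nn.Embedding(num_relations, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.w1 = nn.Linear(2 * dim, dim)  # acts on [h_t ; r_q]
        self.w2 = nn.Linear(dim, dim)

    def forward(self, rel_history, r_q, rel_space):
        # rel_history: (B, t) relation ids r_0..r_{t-1}, with r_0 = r_s;
        # r_q: (B,) query relation ids; rel_space: (B, |R_t|) candidate ids.
        h, _ = self.lstm(self.rel_emb(rel_history))
        h_t = h[:, -1]                                   # (B, dim)
        query = torch.cat([h_t, self.rel_emb(r_q)], -1)  # (B, 2*dim)
        # Score each candidate relation embedding against the query vector.
        scores = self.rel_emb(rel_space) @ self.w2(torch.relu(self.w1(query))).unsqueeze(-1)
        return torch.softmax(scores.squeeze(-1), dim=-1)  # d_t over R_t
```

A relation is then sampled from the returned distribution, e.g., with `torch.multinomial`.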

Reward
Intuitively, the relation agent prefers paths that mostly direct the way to the correct entity for a query relation. Thus, given a relation path, we assign a reward according to its confidence level, which we call the rule guidance reward $R_r$. These confidence scores are retrieved from the rule miner; in our experiments, we only use C rules generated by AnyBURL (Meilicke et al., 2019) to compute the reward. A confidence score is defined as the number of grounded rule paths divided by the number of grounded body relations, and we apply Laplace smoothing with $p_c = 5$ to the confidence score to obtain the final rule guidance reward. In addition to the rule guidance reward, the agent also receives a hit reward $R_h$, which is 1 if the predicted triple $(e_s, r_q, e_T) \in \mathcal{G}$; otherwise, we use embeddings to shape the reward as in Lin et al. (2018):
$$R_h = \mathbb{I}\left\{(e_s, r_q, e_T) \in \mathcal{G}\right\} + \left(1 - \mathbb{I}\left\{(e_s, r_q, e_T) \in \mathcal{G}\right\}\right) f(e_s, r_q, e_T),$$

where $\mathbb{I}(\cdot)$ is an indicator function and $f(e_s, r_q, e_T)$ is a composition function for reward shaping using embeddings.
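A sketch of the smoothed rule confidence used for $R_r$; placing the smoothing constant $p_c$ in the denominator follows the AnyBURL-style definition and is our reading of the "Laplace smoothing" above:

```python
def rule_confidence(num_rule_groundings, num_body_groundings, p_c=5):
    """Smoothed confidence of a mined rule: grounded rule paths over
    grounded body relations, with the additive constant p_c in the
    denominator (assumed placement of the Laplace smoothing)."""
    return num_rule_groundings / (num_body_groundings + p_c)
```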

Training
We train the framework in four stages. First, we use embedding approaches to train relation and entity embeddings for one-hop reasoning. Second, we use a symbolic approach to retrieve C rules and their associated confidence scores. Third, we pre-train the relation agent by freezing the entity agent and asking the relation agent to sample paths; we only use the rule miner to evaluate each path and compute $R_r$ based on the pre-mined confidence scores. The model thus learns as many high-quality rules as possible and behaves like a symbolic model with a similar confidence score distribution. Finally, we jointly train the relation and entity agents to leverage the semantic information of the embeddings for hit reward shaping. The final reward $R$ combines $R_r$ and $R_h$ with a constant factor $\lambda$: $R = \lambda R_r + (1 - \lambda) R_h$.
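The final reward combination is then a one-liner; the default below ($\lambda = 0.65$, the value reported for WN18RR in Section 5) is illustrative:

```python
def total_reward(rule_conf, hit, shaped_score, lam=0.65):
    """Final reward R = lam * R_r + (1 - lam) * R_h. R_r is the rule
    guidance reward (smoothed confidence of the traversed relation path,
    0 if the path grounds no pre-mined rule); R_h is 1 on a hit and
    otherwise the embedding-shaped score f(e_s, r_q, e_T)."""
    r_r = rule_conf if rule_conf is not None else 0.0
    r_h = 1.0 if hit else shaped_score
    return lam * r_r + (1.0 - lam) * r_h
```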

Optimization
We train the policy networks of the two agents using the REINFORCE algorithm (Williams, 1992) to maximize the expected reward:

$$J(\theta) = \mathbb{E}_{(e_s, r_q, e_o) \in \mathcal{G}}\left[\,\mathbb{E}_{r_1, \ldots, r_T \sim \pi_\theta}\left[R(e_T, r_1, \cdots, r_T \mid e_s, r_q)\right]\right].$$
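A minimal sketch of the corresponding policy-gradient update for one sampled rollout, assuming the log-probabilities of each relation and entity choice have been collected along the path:

```python
import torch

def reinforce_loss(log_probs, reward):
    """One-sample REINFORCE estimate: since the gradient of E[R] is
    E[R * grad log pi(path)], we minimise -R * sum_t log pi(a_t) for a
    sampled path. `log_probs` is a list of 0-dim tensors holding the
    log-probability of every relation and entity choice in the rollout;
    `reward` is the scalar R = lam * R_r + (1 - lam) * R_h."""
    return -reward * torch.stack(log_probs).sum()
```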

Experiments
In this section, we test our model on three datasets and compare its performance with symbolic-based, embedding-based, and walk-based approaches. We describe the experimental setting, results, and analysis in detail.

Experimental Setup
Datasets We evaluate the effectiveness of the proposed method on three benchmark datasets: (1) FB15k-237 (Toutanova et al., 2015), (2) WN18RR (Dettmers et al., 2018), and (3) NELL-995 (Xiong et al., 2017).

Results Table 2 shows the evaluation results of our proposed approach compared to symbolic-based, embedding-based, and walk-based baselines. Our model achieves state-of-the-art results on WN18RR and NELL-995, and competitive performance on FB15k-237. This observation is expected given AnyBURL's performance on these datasets: in FB15k-237, the relation space is much larger, which makes it harder for the relation agent to select a valid rule that is traceable. We find that symbolic-based models perform strongly on WN18RR, so we set $\lambda$ to 0.65 to encourage the agent to generate more rule paths. We also observe that embedding-based methods perform better than walk-based methods despite their simplicity. One possible reason is that embedding-based methods implicitly encode the connectivity of the whole graph into the embedding space (Lin et al., 2018). By leveraging rules, we also incorporate some global information as guidance to make up for the potential search space loss during the discrete inference process. Table 3 shows the percentage of rules used on the development set with the ComplEx embedding in the pre-training and training phases; it shows that our model abandons a few rules to further improve hit performance during the training phase.

Performance Analysis
Ablation Study We performed an ablation study on WN18RR in which we removed the pre-training step, froze the relation agent after pre-training, or used a single agent; all variants use the ComplEx embedding for consistency. As shown in Table 4, we observe a hit@1 and MRR decrease on the development set in every case. The performance decrease when freezing the pre-trained agent shows that the rule reward alone is not enough: the model still needs the hit performance signal to improve further. The performance decrease when pre-training is removed shows that learning to reason as a symbolic approach first and then refining it with neural reasoning yields better performance. The single-agent performance drop shows the effectiveness of pruning the action space.
Confidence Score Threshold In addition, to analyze the utility of the pre-mined rule set, we set a confidence score threshold and examined whether the model is affected by it. The maximum threshold we use is a confidence score of 0.15, since the most confident rules are mostly distributed within [0.15, 0.20]. As the results in Table 5 show, there is no observable pattern. One potential reason is that walk-based reasoning paths and less confident rules may perform similarly on certain queries. We therefore treat the threshold as a hyperparameter and use the value with the best performance on the development set.

Conclusions
We introduced high-precision symbolic rule guidance rewards to alleviate the sparse hit reward signal for walk-based models in the KG reasoning task. We proposed a collaborative framework with an entity agent and a relation agent that effectively utilizes the structure of symbolic rules, which contain entity variables, and significantly reduces the action space. Experimentally, our approach improves over state-of-the-art walk-based models on several benchmark KGs.