Faithfully Explainable Recommendation via Neural Logic Reasoning

Knowledge graphs (KGs) have become increasingly important for endowing modern recommender systems with the ability to generate traceable reasoning paths that explain the recommendation process. However, prior research rarely considers the faithfulness of the derived explanations in justifying the decision-making process. To the best of our knowledge, this is the first work that models and evaluates faithfully explainable recommendation under the framework of KG reasoning. Specifically, we propose neural logic reasoning for explainable recommendation (LOGER), which draws on interpretable logical rules to guide the path reasoning process for explanation generation. We experiment on three large-scale datasets in the e-commerce domain, demonstrating the effectiveness of our method in delivering high-quality recommendations as well as ascertaining the faithfulness of the derived explanations.


Introduction
Compared with traditional recommender systems (RS), explainable recommendation is not only capable of providing high-quality recommendation results but also offers personalized and intuitive explanations. Incorporating a knowledge graph (KG) into recommender systems has become increasingly popular, since KG reasoning is able to generate explainable paths connecting users to relevant target item entities. At the same time, there is increasing demand for systems to ascertain the faithfulness of the generated explanation, i.e., to assess whether it faithfully reflects the reasoning process of the model and is consistent with historic user behavior.
However, previous work has largely neglected faithfulness in KG-enhanced explainable recommendation. A number of studies (Lakkaraju et al., 2019; ter Hoeve et al., 2018; Wu and Mooney, 2018) argue that faithful explanations should also be personalized and able to reflect individual users' historic behavior. However, to the best of our knowledge, none of the existing KG-based explainable recommendation models have considered faithfulness in the explainable reasoning process or its evaluation on the generated explainable paths. For instance, PGPR (Xian et al., 2019; Zhao et al., 2020) infers explainable paths over the KG without considering personalized user behavior, and its prediction of the next potential entities is based merely on overall knowledge-driven rewards. CAFE builds user module profiles to guide the path inference procedure. However, as illustrated in Subramanian et al. (2020), such neural module networks only implicitly abstract the reasoning process and do not consider the faithfulness of explanations.
In this paper, we propose a new KG-enhanced recommendation model called LOGER to produce faithfully explainable recommendations via neural logic reasoning. To fully account for heterogeneous information and rules about users and items in the KG, we leverage an interpretable neural logic model for logical reasoning, enhanced by a general graph encoder that learns KG representations to capture semantic aspects of entities and relations. These two components are iteratively trained via the EM algorithm, marrying the interpretability of logical rules with the expressiveness of KG embeddings. Subsequently, the learned rule weights are leveraged to guide the path reasoning to generate faithful explanations. The derived logical rules are expected to be consistent with historic user behavior, and the resulting paths genuinely reflect the decision-making process in KG reasoning. We experiment on three large-scale datasets for e-commerce recommendation that cover rich user behavior patterns. The results demonstrate the superior recommendation performance achieved by our model compared to state-of-the-art baselines, with a guarantee of faithfulness of the generated path-based explanations. The contributions of this paper are threefold.
• We highlight the significance of considering faithfulness in explainable recommendation.
• We propose a novel approach that incorporates interpretable logical rules into KG path reasoning for recommendation and explanation generation.
• We experiment on three large-scale datasets showing promising recommendation performance as well as faithful path-based explanations.

Problem Formulation
A knowledge graph (KG) for recommendation is defined as G = {(e_h, r, e_t) | e_h, e_t ∈ E, r ∈ R}, where E denotes the entity set, consisting of the set of users U, the set of items I, and other entities, and R denotes the relation set. Each triplet (e_h, r, e_t) represents a fact indicating that head entity e_h interacts with tail entity e_t via relation r. In recommendation tasks, we are particularly interested in user-item interactions {(u, r_ui, v) | u ∈ U, r_ui ∈ R, v ∈ I}, where the special relation r_ui means purchase in e-commerce or like in movie recommendation. The problem of KG reasoning for explainable recommendation is formulated as follows. Given an incomplete KG G with missing user-item interactions, for every user u ∈ U, the goal is to select a set of items {v | (u, r_ui, v) ∉ G, v ∈ I} as recommendations, along with a set of paths as explanations connecting each pair of the user and a predicted item. The key challenge is to not only guarantee recommendation quality using the rich information in the KG, but also generate faithful explanations that reflect the actual decision-making process of the recommendation model and are consistent with historic user behavior.
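As a concrete illustration of this formulation, the following sketch builds a toy KG as a set of triplets and enumerates user-to-item paths that could serve as explanation candidates. The entity and relation names are hypothetical, not from our datasets, and the exhaustive traversal is only for illustration:

```python
from collections import defaultdict

def build_adjacency(triples):
    """Index triples (e_h, r, e_t) by head entity for path traversal."""
    adj = defaultdict(list)
    for h, r, t in triples:
        adj[h].append((r, t))
    return adj

def paths_to_items(adj, user, items, max_len=3):
    """Enumerate paths from `user` to any item in `items`, up to max_len hops."""
    results, stack = [], [(user, [user])]
    while stack:
        node, path = stack.pop()
        if node in items and len(path) > 1:
            results.append(path)
        if (len(path) - 1) // 2 < max_len:  # path alternates entity, relation, entity
            for r, t in adj[node]:
                if t not in path:
                    stack.append((t, path + [r, t]))
    return results

triples = [
    ("u1", "purchase", "i1"),
    ("u1", "mention", "f1"),
    ("i2", "described_by", "f1"),
]
adj = build_adjacency(triples)
# traverse reverse edges too, mirroring the reverse-relation trick in Appendix B
for h, r, t in list(triples):
    adj[t].append((r + "^-1", h))
candidate_paths = paths_to_items(adj, "u1", {"i1", "i2"})
```

Here `candidate_paths` contains both the direct purchase path and a two-hop path through the shared feature `f1`, which is the kind of connection the path reasoner later scores.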

Proposed Method
We introduce the novel neural LOGic Explainable Recommender (LOGER) for producing faithfully explainable recommendations with a KG. As illustrated in Fig. 1, it consists of three components: (i) a KG encoder that learns embeddings of KG entities and relations to capture their semantics, (ii) a neural logic model that conducts interpretable logical reasoning to make recommendations, and (iii) a rule-guided path reasoner that generates faithfully explainable paths. The KG encoder and the neural logic model are trained iteratively via the EM algorithm (Neal and Hinton, 1998) so that they mutually benefit each other in making recommendations via logical reasoning. Additionally, personalized rule importance scores are derived for every user and leveraged to guide the path reasoning for faithful explanation generation.

KG Encoder
Let X_hrt be a binary random variable indicating whether a triplet (e_h, r, e_t) is true, X_G = {X_hrt | (e_h, r, e_t) ∈ G} be a random variable over all observed triplets in the KG G, and X_H = {X_hrt | (e_h, r, e_t) ∈ H} be a random variable over hidden user-item interactions in H. The KG encoder is generally defined as a triplet-wise function f_θ : E × R × E → [0, 1], parametrized by θ, that maps each triplet to a real-valued score. For any triplet (e_h, r, e_t) ∈ G ∪ H, we can interpret its truth probabilistically via the KG encoder f_θ as q(X_hrt | θ) = Bernoulli(X_hrt | f_θ(e_h, r, e_t)). The KG encoder f_θ can be instantiated with any existing KG embedding (Ji et al., 2020) or graph neural network (Wu et al., 2020) model.
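As a minimal sketch of such a triplet-wise encoder, the snippet below instantiates f_θ with TransE-style distance scoring squashed into (0, 1). The random embeddings and the shifted sigmoid are illustrative assumptions, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 100  # embedding dimensionality, matching Appendix B
entities = {"u1": rng.normal(size=DIM), "i1": rng.normal(size=DIM)}
relations = {"purchase": rng.normal(size=DIM)}

def f_theta(e_h, r, e_t):
    """Map a triplet to a truth score in (0, 1) via a shifted sigmoid of the
    TransE distance ||e_h + r - e_t|| (smaller distance => higher score)."""
    dist = np.linalg.norm(entities[e_h] + relations[r] - entities[e_t])
    return 1.0 / (1.0 + np.exp(dist - DIM**0.5))

score = f_theta("u1", "purchase", "i1")
```

The Bernoulli interpretation q(X_hrt | θ) then uses `score` as the success probability for that triplet.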

Neural Logic Model
We focus on composition rules for user-item interactions, i.e., r_ui is a composition of relations r_1, . . . , r_j if (u, r_1, e_1) ∧ · · · ∧ (e_{j-1}, r_j, v) ⇒ (u, r_ui, v), ∀u ∈ U, v ∈ I, e_1, . . . , e_{j-1} ∈ E. Given a set of logical rules L mined from the KG, the goal of this component is, for every user u ∈ U, to emit a set of personalized rule importance scores y_u = {y_{u,l}}_{l∈L} that capture the historic user behavior. To achieve this, we build upon Markov Logic Networks (Qu and Tang, 2019), an interpretable probabilistic logic reasoning method that models the joint distribution of all triplets via a set of logical rules L, i.e., p(X_G, X_H | w) = (1/Z) exp(Σ_{l∈L} w_l n_l), where w = {w_l}_{l∈L} with w_l being the global weight of rule l ∈ L, and n_l denotes the number of true groundings of rule l over observed and hidden triplets. Accordingly, we define the personalized rule importance score as y_{u,l} = w_l n_l(u) / Σ_{l'∈L} n_{l'}(u), where n_l(u) is the number of groundings of rule l over the observed triplets in {(u, r_ui, v) ∈ G}. However, it is intractable to directly maximize the log-likelihood of the observed triplets, max_w log p(X_G | w), to learn the global weights w. Instead, we employ the EM algorithm to iteratively optimize this objective and acquire the optimal global weights.
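The personalized score y_{u,l} = w_l n_l(u) / Σ_{l'} n_{l'}(u) can be computed directly from grounding counts. The rule names, weights, and counts below are toy values for illustration:

```python
def rule_importance(global_weights, grounding_counts):
    """grounding_counts[l] = n_l(u): groundings of rule l in u's observed triplets.
    Returns y_{u,l} = w_l * n_l(u) / sum_{l'} n_{l'}(u) for each rule l."""
    total = sum(grounding_counts.values())
    return {l: global_weights[l] * n / total for l, n in grounding_counts.items()}

# toy global weights w_l and per-user grounding counts n_l(u)
w = {"purchase->also_bought": 1.2, "mention->described_by": 0.8}
n_u = {"purchase->also_bought": 6, "mention->described_by": 2}
y_u = rule_importance(w, n_u)  # {"purchase->also_bought": 0.9, "mention->described_by": 0.2}
```

Users with different interaction histories thus get different importance scores even under the same global rule weights, which is what makes the scores personalized.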

E-Step
We introduce a mean-field variational distribution q(X_H | θ) ≈ Π_{(e_h,r,e_t)∈H} q(X_hrt | θ) over the hidden user-item interactions in H. The goal of the E-step is to estimate q(X_H | θ) by minimizing the KL divergence between q(X_H | θ) and the posterior distribution p(X_H | X_G, w) with w fixed. For each triplet (e_h, r, e_t) ∈ H, we denote by L_hrt the set of rules associated with the triplet and by G_hrt the corresponding groundings of all logical rules in L_hrt. Following Qu and Tang (2019), the optimal q(X_H | θ) is achieved under the fixed-point condition q(X_hrt | θ) ≈ p(X_hrt | X_{G_hrt}, w) for all (e_h, r, e_t) ∈ H. Here, q(X_hrt | θ) is approximated by the KG encoder f_θ, and p(X_hrt | X_{G_hrt}, w) can be estimated with the global weights w of the rules in L_hrt from the last iteration as p(X_hrt = 1 | X_{G_hrt}, w) = σ(Σ_{l∈L_hrt} w_l Δn_l), where σ(·) is the sigmoid function and Δn_l is the change in the number of true groundings of rule l when X_hrt is flipped from 0 to 1. In other words, if a hidden triplet (e_h, r, e_t) is asserted to be true by the rules (e.g., p(X_hrt = 1 | X_{G_hrt}, w) > 0.5), the probability q(X_hrt = 1 | θ) given by the KG encoder is also expected to be high. Therefore, to learn the parameter θ, we maximize the log-likelihood over all observed triplets in G and the plausibly true hidden triplets in H+ = {(e_h, r, e_t) | p(X_hrt = 1 | X_{G_hrt}, w) ≥ τ}, which leads to the objective max_θ Σ_{(e_h,r,e_t)∈G∪H+} log q(X_hrt = 1 | θ), where τ is a hyperparameter.
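The construction of H+ then amounts to thresholding the rule-based posterior at τ. A small sketch, where the posterior values are toy numbers standing in for p(X_hrt = 1 | X_{G_hrt}, w):

```python
def select_plausible(hidden_posteriors, tau=0.5):
    """Keep hidden triplets (e_h, r, e_t) whose posterior p(X_hrt=1 | ...) >= tau."""
    return {t for t, p in hidden_posteriors.items() if p >= tau}

posteriors = {
    ("u1", "purchase", "i2"): 0.8,  # asserted true by the rules
    ("u1", "purchase", "i3"): 0.3,  # below threshold, excluded from H+
}
H_plus = select_plausible(posteriors, tau=0.5)
```

Triplets in `H_plus` are treated as positives alongside G when maximizing the encoder's log-likelihood.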

M-Step
The goal of the M-step is to learn the global rule weights w by maximizing the log-likelihood of all triplets given a fixed θ from the E-step. Since the log-likelihood term models the joint distribution over all triplets, which is hard to compute for a large KG, we approximate it with the pseudo-likelihood (Besag, 1975): Σ_{(e_h,r,e_t)∈G∪H} E_{q(X_hrt|θ)} [log p(X_hrt | X_{G_hrt}, w)]. Then, we can invoke gradient ascent to acquire the optimal w, with the gradient for each rule weight w_l defined as ∇_{w_l} = Σ_{(e_h,r,e_t)∈G∪H} (E_{q(X_hrt|θ)}[X_hrt] − p_hrt) Δn_l, where p_hrt = p(X_hrt = 1 | X_{G_hrt}, w). Once the optimal global weights are acquired, we can make a recommendation by calculating the ranking score of a user u ∈ U and an item v ∈ I as q(X_urv | θ) + α p(X_urv = 1 | X_{G_urv}, w), where r = r_ui and α ∈ R is a hyperparameter.
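Once both components are trained, the final ranking combines the encoder score and the logic score as q + α·p. A sketch with toy per-item scores and the default α = 0.3 from Appendix B:

```python
def ranking_score(q_score, p_score, alpha=0.3):
    """Combined score q(X_urv | theta) + alpha * p(X_urv = 1 | X_G, w)."""
    return q_score + alpha * p_score

def rank_items(q, p, alpha=0.3):
    """Rank candidate items for a user by the combined score, descending."""
    return sorted(q, key=lambda v: ranking_score(q[v], p[v], alpha), reverse=True)

q = {"i1": 0.7, "i2": 0.6}  # toy encoder scores
p = {"i1": 0.1, "i2": 0.9}  # toy rule-based posteriors
ranked = rank_items(q, p)   # i2 (0.6 + 0.27) overtakes i1 (0.7 + 0.03)
```

Note how the logic component can reorder items that the encoder alone would rank differently, which is the point of combining the two scores.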

Rule-Guided Path Reasoner
We draw on the KG encoder f_θ and the personalized rule importance scores y_u from the previous two steps to generate explainable paths for every user u. Specifically, we train an LSTM-based path reasoning network φ that takes the starting user embedding as input and predicts a sequence of entities and relations to form a path. For every user u, we restrict the reasoner to generate paths that follow the rules with the largest scores in y_u. The details of φ and path reasoning are described in the Appendix.
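Restricting the reasoner reduces to selecting, per user, the top-k rules by importance score. A minimal sketch, with illustrative rule names and scores:

```python
def guided_rules(y_u, k=2):
    """Return the k rules with the largest personalized importance scores;
    the path reasoner is then constrained to these rule shapes."""
    return [l for l, _ in sorted(y_u.items(), key=lambda kv: -kv[1])[:k]]

y_u = {"purchase->also_bought": 0.9, "mention->described_by": 0.2, "purchase->brand": 0.5}
allowed = guided_rules(y_u, k=2)
```

Any path the reasoner emits for this user must then match one of the relation sequences in `allowed`.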

Experiment
Dataset We experiment on three domain-specific e-commerce datasets from Amazon, namely Cellphones, Grocery, and Automotive. Two requirements led to the selection of these categories. First, the constructed KG should contain rich user behavior patterns, e.g., user-mentioned features or preferred styles. This is the major difference from most existing work (Zhao et al., 2019), which only extends knowledge on the item side. Second, the KGs are assumed to be large-scale, so we select several large subsets from the Amazon data.
Baselines We compare against state-of-the-art KG-based recommenders, including PGPR, HeteroEmbed, and KGAT, which explicitly models higher-order KG connectivity and learns node representations by propagating the embeddings of neighbors with corresponding importance discriminated by an attention mechanism.
Metrics We adopt the same metrics as Ai et al. (2018) to evaluate the recommendation performance of all models: Precision, Recall, Normalized Discounted Cumulative Gain (NDCG), and Hit Rate (HR).

Recommendation Results
We first evaluate the recommendation quality of our model. The results of all methods across the three datasets are reported in Table 1. In general, our method significantly outperforms all state-of-the-art baselines on all metrics. Taking Cellphones as an example, our method achieves an improvement of 6.01% in NDCG over the best baseline (underlined), and an improvement of 5.82% in Hits@10. Similar trends can be observed on the other benchmarks as well. Note that both our model and HeteroEmbed adopt TransE for KG representation learning, yet our model achieves better performance, mainly attributable to the iterative learning of the graph encoder and the neural logic model.

Faithfulness of Explanation
We aim to measure whether the generated explainable paths are consistent with the historic user behavior via a faithfulness metric and a user study.
Measuring Faithfulness Inspired by previous work (Maaten and Hinton, 2008; Serrano and Smith, 2019; Subramanian et al., 2020), we define faithfulness via the Jensen-Shannon (JS) divergence of rule-related distributions from the training and test sets. Specifically, we randomly sample 50 users from the training set. For each user u, we further sample around 1,000 paths between the user and the connected item nodes, and calculate the rule distribution over these paths, denoted by F(u). We compare the proposed LOGER with two baselines, PGPR and KGAT, each of which is used to generate 20 explainable paths for every selected user in the test phase. Similarly, we calculate the rule distribution over these 20 paths, denoted by Q_f(u). The two JS scores are defined as JS_f(u) = JS(F(u) ∥ Q_f(u)) and JS_w(u) = JS(F(u) ∥ Q_w(u)).
Here, Q_w(u) is the rule weight distribution derived from the personalized rule importance scores of our method, or from the path weights of the baselines. Smaller values of the two JS scores correspond to better faithfulness of the explainable paths. This faithfulness evaluation is motivated by the consistency of the explainable paths with respect to historic user behavior.
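The JS scores can be computed straightforwardly from a pair of rule distributions. The sketch below uses toy distributions over three rules (natural-log base, so values lie in [0, ln 2]):

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

F_u = [0.6, 0.3, 0.1]  # toy rule distribution over training paths
Q_u = [0.5, 0.4, 0.1]  # toy rule distribution over generated test paths
score = js(F_u, Q_u)   # smaller = explanations more consistent with history
```

The same function applies whether `Q_u` is the generated-path distribution Q_f(u) or the rule-weight distribution Q_w(u).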
User Study Additionally, we conduct a user study to evaluate the faithfulness of the explainable paths. We display 50 sampled KG paths starting from one user towards purchased items in the training set as examples of the user's historic behavior. For comparison, we also present 10 explainable paths generated by each of the three methods for the same user in the test set. We ask 20 human subjects to rank the methods based on whether the generated paths are consistent with those from the training set. Then, we calculate the average ranking score (Avg. Rank) by averaging the ranks given by each human tester to each method.

Results
The results on the Cellphones and Grocery datasets are reported in Table 2. We observe that our method LOGER achieves the lowest JS scores and the lowest average ranking score, revealing the effectiveness of our model in producing more faithful explanations under both the quantitative measurements and the user study.

Ablation Study
We further study how the hidden triplets used in training the KG encoder (Eq. 2) influence recommendation performance. We experiment on the Cellphones data under different sizes of the hidden triplet set H+. We choose sizes in {10, 20, 30, 40, 50} and keep all other settings unchanged. The results are plotted in Fig. 2, including our model (red circles) and the best baseline HeteroEmbed (blue crosses). We find that our model consistently outperforms the baseline on all metrics under different numbers of hidden triplets. Better recommendation performance is achieved with more hidden triplets included in training the KG encoder, because more candidate items enhance the capability of our model to discern logical rules of good quality and hence benefit recommendation prediction.

Conclusion
In this paper, we propose LOGER for faithfully explainable recommendation, which generates explainable paths based on personalized rule importance scores via neural logic reasoning that adequately captures historic user behavior. We experiment on three large-scale datasets for e-commerce recommendation showing superior recommendation quality of LOGER as well as the faithfulness of the generated explanations both quantitatively and qualitatively. We hope to encourage future work that values explainability and in particular the faithfulness of explanations. Our code is available at https://github.com/orcax/LOGER.

A Detail of Rule-Guided Path Reasoning
Our LSTM-based path reasoner φ is based on the graph walker in Moon et al. (2019). It takes as input the embedding of the current entity e_{t-1} and outputs the embeddings of the next relation r_t and the next entity e_t, i.e., r_t, e_t = φ(e_{t-1}). In particular, the next relation embedding is defined as r_t = Σ_{r∈R} α_{t,r} r, with α_t = softmax(W_α e_{t-1} + b_α), where W_α, b_α are parameters and α_t are the attention weights over all relations in the KG. The next entity embedding is defined as e_t = o_t ⊙ tanh(i_t ⊙ z_t), where z_t is the context vector computed by the LSTM from the concatenation [e_{t-1}; r_t]. Here, [;] denotes concatenation, ⊙ is element-wise multiplication, and i_t, o_t are the vectors passing through the corresponding input and output gates. During training, for every user and its observed user-item triplets, we sample a set of training paths following the rules, with numbers proportional to the rule weights. The goal is to make the path reasoner φ generate paths that are close to the training samples, which is optimized with a hinge loss.
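One way to sketch the attention-based next-relation prediction, assuming toy shapes for the parameters W_α, b_α and a small relation embedding matrix (an illustration, not the trained reasoner):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - x.max())
    return e / e.sum()

def next_relation(e_prev, rel_mat, W, b):
    """Attend over all relation embeddings: alpha_t = softmax(W @ e_{t-1} + b),
    then r_t = alpha_t @ rel_mat. Returns (r_t, alpha_t)."""
    alpha = softmax(W @ e_prev + b)
    return alpha @ rel_mat, alpha

rel_mat = np.eye(3, 2)  # 3 relations, embedding dimension 2 (toy values)
W = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
b = np.zeros(3)
r_t, alpha = next_relation(np.array([5.0, 0.0]), rel_mat, W, b)
```

The attention weights concentrate on the relation most compatible with the current entity embedding, so `r_t` is close to that relation's embedding.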
The inference pipeline using the trained path-reasoning network is described in Alg. 1. Starting with a user u encoded as e_0 = u, the estimated entity embedding e_t and relation embedding r_t at the t-th hop are produced by the model φ. At each hop, for all potential neighbors, we calculate a ranking score based on the dot product between each neighbor and the estimated (e_t, r_t). After ranking the neighbors by these scores, we filter a set of candidate neighbors and invoke beam search to identify a set of paths, as well as the corresponding items, for u.
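The per-hop ranking and beam expansion in Alg. 1 can be sketched as follows. The embeddings, graph, and dot-product scoring against the predicted (e_t, r_t) are toy stand-ins for the trained reasoner φ:

```python
import numpy as np

def step_scores(pred_e, pred_r, neighbors, ent_emb, rel_emb):
    """Score each (relation, tail) neighbor by dot product with the reasoner's
    predicted entity and relation embeddings."""
    return {(r, t): float(ent_emb[t] @ pred_e + rel_emb[r] @ pred_r)
            for r, t in neighbors}

def beam_step(beams, adj, pred_e, pred_r, ent_emb, rel_emb, width=2):
    """Expand each partial path by its top-`width` scored neighbors."""
    new_beams = []
    for path in beams:
        scores = step_scores(pred_e, pred_r, adj[path[-1]], ent_emb, rel_emb)
        for (r, t), _ in sorted(scores.items(), key=lambda kv: -kv[1])[:width]:
            new_beams.append(path + [r, t])
    return new_beams

# toy 1-d embeddings and a single-hop neighborhood
ent_emb = {"u": np.array([1.0]), "a": np.array([2.0]),
           "b": np.array([1.0]), "c": np.array([0.5])}
rel_emb = {"r": np.array([1.0])}
adj = {"u": [("r", "a"), ("r", "b"), ("r", "c")]}
beams = beam_step([["u"]], adj, np.array([1.0]), np.array([0.0]), ent_emb, rel_emb)
```

Repeating `beam_step` for each hop, and keeping the paths that end at item entities, yields the explanation paths and recommended items for u.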

B Implementation Details
In order to guarantee path connectivity, we add reverse relations to the knowledge graph, i.e., if (e_h, r, e_t) ∈ G, then (e_t, r^{-1}, e_h) ∈ G. We restrict the length of candidate rules to 3. We adopt TransE (Bordes et al., 2013) as the KG encoder f_θ, with the dimensionality of entity and relation embeddings set to 100.

Algorithm 1 Rule-guided path reasoning
1: Input: KG G, user u, item v, rule set L
2: Output: a set of paths P
3: procedure MAIN()
4:   P ← {{u}}
To learn the global rule weights, we first generate the hidden triplet set according to the result of the KG encoder. For each user, the top 50 estimated items with the highest scores predicted by KG encoder are taken as the hidden triplet set H + . The threshold τ is set to 0.5 and the weighting factor α is set to 0.3 by default. In the path reasoning algorithm, we set the neighboring size β to 10. Other training details can be found in Table 3.

C Dataset Statistics
The statistics of our datasets are shown in Table 4.