Towards Knowledge-Based Recommender Dialog System

In this paper, we propose a novel end-to-end framework called KBRD, which stands for Knowledge-Based Recommender Dialog System. It integrates the recommender system and the dialog generation system. The dialog generation system can enhance the performance of the recommendation system by introducing information about users’ preferences, and the recommender system can improve that of the dialog generation system by providing recommendation-aware vocabulary bias. Experimental results demonstrate that our proposed model has significant advantages over the baselines in both the evaluation of dialog generation and recommendation. A series of analyses show that the two systems can bring mutual benefits to each other, and the introduced knowledge contributes to both their performances.


Introduction
Dialog in e-commerce has great commercial potential. In conventional recommender systems, personalized recommendation is highly based on the previous actions of users, including searching, clicking and purchasing. These actions can be regarded as users' feedbacks that reflect users' interest. However, due to its implicitness, such feedback can only reflect a part of users' interest, causing inaccuracy in recommendation. Another information source about user preferences is the dialog between users and services. In such dialog, users often provide more information about their preferences. They often ask for tips or recommendation in the dialog. In this process, services can guide them to speak out their interests in order to solve users' problems and meet their requirements. Compared with the implicit feedback, the USER: Hello! RECOMMENDER: What kind of movies do you like? USER: I am looking for a movie recommendation. When I was younger I really enjoyed the A Nightmare on Elm Street (1984).

BASELINE:
Have you seen It (2017)  feedback from the dialog is more explicit and more related to users' preferences. Therefore, a recommender dialog system possesses high commercial potential. A recommender dialog system can be regarded as a combination of a recommender system and a dialog system. A dialog system should respond to users' utterances with informative natural language expressions, and a recommender system should provide high-quality recommendation based on the content of users' utterances. We demonstrate an example in Table 1. In brief, a recommender dialog system should perform well in both tasks.
An ideal recommender dialog system is an endto-end framework that can effectively integrate the two systems so that they can bring mutual benefits to one another. In this setting, information from the recommender system can provide vital information to maintain multi-turn dialog, while information from the dialog system that contains implication of users' preferences can enhance the qual-ity of recommendation. Besides, the incorporation of external knowledge can strengthen the connections between systems and enhance their performances. Therefore, driven by the motivations, we propose a novel end-to-end framework that integrates the two systems. We name it KBRD, standing for Knowledge-Based Recommender Dialog System.
Specifically, the dialog generation system provides contextual information about items to the recommender system. For instance, for a movie recommendation system, contextual information can be director, actor/actress and genre. Thus, even with no item mentioned in the dialog, the recommender system can still perform high-quality recommendation based on the contextual information. In return, the recommender system provides recommendation information to promote the dialog, such as recommendation-aware vocabulary bias. Furthermore, we incorporate external knowledge into our framework. The knowledge graph helps bridge the gap between systems and enhances both their performances.
We conduct a series of experiments that demonstrate the effects of our framework in both the evaluation of recommendation and dialog generation. Moreover, the analyses show that dialog information effectively tackles the cold-start problem in recommendation, and the recommendation-aware vocabulary bias from the recommender system improves the quality of generated dialogs. Also, the biased words can be parts of reasons to explain the system's decisions for recommendation.

Preliminary
Before we introduce our proposed framework, we provide an illustration of the basic framework of the recommender dialog system to show how the recommendation system and the dialog system are organized for end-to-end training.

Recommender System
Provided with a user's information, a recommender system is aimed at retrieving a subset of items that meet the user's interest from all the items. In a cold-start setting, the recommender system initially has no knowledge about the user. With the progress of the dialog, the recommender system accumulates user's information and builds a user profile. Thus, it can provide reasonable recommendation based on the user preferences re-flected in the conversation.
To implement an effective recommender system in this task, it is available to build a recommender system based on conventional collaborative filtering algorithms (Sarwar et al., 2001) or based on neural networks (He et al., 2017). For example, Li et al. (2018) applies a user-based autoencoder (Sedhain et al., 2015) to recommend new items based on previously mentioned items in the dialog.

Dialog System
The dialog system in the basic framework is in charge of generating multi-turn dialog with a natural language generation model. The pioneering work (Li et al., 2018) on conversational recommendation task adopted Hierarchical Recurrent Encoder Decoder (HRED) (Sordoni et al., 2015b,a;Serban et al., 2016) for this part. The HRED is an encoder-decoder framework for sequence-to-sequence learning (Sutskever et al., 2014). In the framework, an encoder receives the dialog history as input and encodes it to highlevel representation, while a decoder generates responses based on the encoded representation. By recursively encoding and decoding the previous information, the system makes utterances in the multi-turn dialog.

End-to-End System
In order to perform end-to-end training, we demonstrate the combination of the recommender system and conversation system. Specifically, the input of the recommender system is constructed based on the dialog history, which is a representation of mentioned items in the dialog. The output of the recommender system P rec , which is a probability distribution over the item set, can be combined with the output of the dialog system P dialog ∈ R |V | , where V refers to the vocabulary. A switching mechanism (Gulcehre et al., 2016) controls the decoder to decide whether it should generate a word from the vocabulary or an item from the recommender output at a certain timestep.
where w represents either a word from the vocabulary or an item from the item set, o is the hidden representation in the final layer of the dialog Figure 1: Comparative illustration on modules of the existing baseline framework and our proposed KBRD framework. (a) The connection between the recommender system and the dialog system in the baseline framework is weak. The dialog system takes the plain text of the dialog history as input and the recommender only considers mentioned items in the dialog. (b) Our framework enables interaction between the two systems. First, informative entities are linked to an external knowledge graph and sent to the recommender besides items. They are propagated on the KG via a relational graph convolutional network, enriching the representation of user interest. Second, the knowledge-enhanced user representation is sent back to the dialog system in the form of vocabulary bias, enabling it to generate responses that are consistent with the user's interest.
system. w s ∈ R d and b s ∈ R are the switcher's parameters and σ refers to the sigmoid function. Therefore, the whole system can be trained in an end-to-end fashion.

Proposed Model
In this section, we introduce our proposed framework KBRD that integrates the recommender system and the dialog system effectively via knowledge propagation. We show how knowledge connects the two systems and how they bring mutual benefits to each other.

Dialog-Aware Recommendation with Knowledge
Recommendation of the basic framework is solely based on the mentioned items in the dialog history. Such recommendation ignores contextual information in dialog that often indicates users' preferences.
Here we propose to make use of the dialog contents, including the non-item information, in the process of recommendation. Furthermore, to effectively recommend items from the non-item information, we introduce an external knowledge graph from DBpedia (Lehmann et al., 2015) to our system. The knowledge can build a connection between dialog contents and items.
Incorporating Dialog Contents Specifically, we have a knowledge graph G consisting of triples (h, r, t) where h, t ∈ E and r ∈ R. E and R denote the sets of entities and relations in the knowledge graph. We first match each item in the item set to entities in E by name. 2 We then perform entity linking (Daiber et al., 2013) on dialog contents and thus informative non-item entities appearing in dialog contents are matched to E. 3 Therefore, we can represent a user as T u = e 1 , e 2 , · · · , e |Tu| , where e i ∈ E. To be more specific, it is a set of mentioned items plus nonitem entities extracted from the dialog contents, linked to the knowledge graph. Schlichtkrull et al. (2018), we apply Relational Graph Convolutional Networks (R-GCNs) to encode structural and relational information in the knowledge graph to entity hidden representations. An intuition behind this is that neighboring nodes in knowledge graph may share similar features that are useful for recommendation. For exam-ple, when a user speaks of his/her preference on an actor/actress, the recommender should provide movies that have a close connection to that person. In addition, by taking different relations into consideration, the system models different types of neighbors more accurately.

Relational Graph Propagation Inspired by
Formally, at layer 0, we have a trainable embedding matrix H (0) ∈ R |E|×d (0) for nodes (i.e., entities) on the knowledge graph. Then, for each node v in E at layer l, we compute: denotes the hidden representation of node v at the l-th layer of the graph neural network, and d (l) denotes the dimensionality of the representation at the layer. N r v denotes the set of neighbor indices of node v under relation r ∈ R. W l r is a learnable relation-specific transformation matrix for vectors from neighboring nodes with relation r. W l 0 is a learnable matrix for transforming the nodes' representation at the current layer. c v,r is a normalization constant that can either be learned or chosen in advance (e.g., c v,r = |N r v |). For each node on the graph, it receives and aggregates the messages from its neighboring nodes after relation-specific transformation. Then it combines the information with its hidden representation to form its updated representation at the next layer.
Finally, at the last layer L, structural and relational information is encoded into the entity representation h (L) v for each v ∈ E. We denote the resulting knowledge-enhanced hidden representation matrix for entities in E as H (L) ∈ R |E|×d (L) . We omit the (L) in the following paragraphs for simplicity.
Entity Attention The next step is to recommend items to users based on knowledge-enhanced entity representations. While an item corresponds to an entity on the knowledge graph, a user may have interacted with multiple entities. Given T u , we first look up the knowledge-enhanced representation of entities in T u from H, and we have: where h i ∈ R d is the hidden vector of entity e i . Here our objective is to encode this vector set of variable size to a vector of fixed size so that we can compute the similarity between user and item. Instead of simply averaging these vectors, we choose a linear combination of the |T u | vectors. Specifically, we apply self-attention mechanism (Lin et al., 2017) that takes H u as input and outputs a distribution α u over |T u | vectors: where W a1 ∈ R da×d is a weight matrix and w a2 is a vector of parameters with size d a . The final representation of user u is computed as follows: This enables the recommender system to consider the importance of different items and nonitem entities in the dialog. Finally, the output of our recommender is computed as follows: where mask is an operation that sets the score of non-item entities to −∞. The masking operation ensures that the recommendations are all items.

Recommendation-Aware Dialog
Instead of applying HRED, we introduce the Transformer framework to the dialog system in this task. Transformer (Vaswani et al., 2017) can reach significantly better performances in many tasks, such as machine translation (Vaswani et al., 2017;Ott et al., 2018), question answering (Rajpurkar et al., 2016;Yang et al., 2018;Ding et al., 2019) and natural language generation (Liu et al., 2018;. In our preliminary experiments, we have found that Transformer can also achieve better performance than HRED in this task, and thus we apply this framework to the dialog system. The Transformer is also an encoder-decoder framework for sequence-to-sequence learning. The Transformer encoder consists of an embedding layer and multiple encoder layers. Each encoder layer has a self-attention module and a Point-Wise Feed-Forward Network (FFN). The encoder encodes the dialog history x = (x 1 , x 2 , . . . , x n ) to high-level representations s = (s 1 , s 2 , . . . , s n ). Similarly, the Transformer decoder contains an embedding layer and multiple decoder layers with self-attention and FFN. Moreover, each of them contains a multi-head context attention to extract information from the sourceside context. The decoder generates a representation o at each decoding time step.
In order to predict a word at each decoding time step, the top layer of the decoder, namely the output layer, generates a probability distribution over the vocabulary: where W ∈ R |V |×d and b ∈ R |V | are weight and bias parameters, and V refers to the vocabulary. However, so far the dialog system is completely conditioned on the plain text of the dialog contents. By further introducing the recommender system's knowledge of the items that have appeared in dialog, we guide the dialog system to generate responses that are more consistent with the user's interests. Specifically, we add a vocabulary bias b u to the top layer of the decoder inspired by Michel and Neubig (2018). Different from their work, b u is computed based on the recommender system's hidden representation of user u: where F : R d → R |V | represents a feed-forward neural network and t u is the user representation in the recommendation context introduced in Equation 5. Therefore, the computation of the top layer of the decoder becomes: So far, we have built an end-to-end framework that bridges the recommender system and the dialog system, which enables mutual benefits between the systems.

Experiments
In this section, we provide an introduction to the details of our experiments, including dataset, setting, evaluation as well as further analyses.

Dataset
REcommendations through DIALog (REDIAL) is a dataset for conversational recommendation. Li et al. (2018) collected the dialog data and built the dataset through Amazon Mechanical Turk (AMT). With enough instructions, the workers on the platform generated dialogs for recommendation on movies. Furthermore, in order to achieve and dialog-aware recommendation, besides movies, we introduce the relevant entities, such as director and style, from DBpedia. The number of conversations is 10006 and the number of utterances is 182150. The total number of users and movies are 956 and 51699 respectively.

Setting
We implement the models in PyTorch and train on an NVIDIA 2080Ti. For the recommender, both the entity embedding size d (0) and the hidden representation size d (l) are set to 128. We choose the number of R-GCN layers L = 1 and the normalization constant c v,r to 1. For Transformer, all input embedding dimensions and hidden sizes are set to 300. During training, the batch size is set to 64. We use Adam optimizer (Kingma and Ba, 2015) with the setting β 1 = 0.9, β 2 = 0.999 and = 1 × 10 −8 . The learning rate is 0.003 for the recommender and 0.001 for the Transformer. Gradient clipping restricts the norm of the gradients within [0, 0.1].

Evaluation Metrics
The evaluation of dialog consists of automatic evaluation and human evaluation. The metrics for automatic evaluation are perplexity and distinct ngram. Perplexity is a measurement for the fluency of natural language. Lower perplexity refers to higher fluency. Distinct n-gram is a measurement for the diversity of natural language. Specifically, we use distinct 3-gram and 4-gram at the sentence level to evaluate the diversity. As to human evaluation, we collect ten annotators with knowledge in linguistics and require them to score the candidates on the consistency with the dialog history. We sample 100 multi-turn dialogs from the test set together with the models' corresponding responses, and require them to score the consistency of the responses. 4 The range of score is 1 to 3.
The evaluation for recommendation is Re-call@K. We evaluate that whether the top-k items selected by the recommender system contain the ground truth recommendation provided by human recommenders. Specifically, we use Recall@1, Recall@10, and Recall@50 for the evaluation.  Table 2: Evaluation of the recommender system. We report the results of Recall@1, Recall@10 and Re-call@50 of the models (p 0.01). KBRD (D) stands for only incorporating the dialog contents. KBRD (K) stands for only incorporating knowledge. The results demonstrate that both the interaction with the dialog system and the external knowledge are helpful for the improvement of model performance, and our proposed model reaches the best performance on the three metrics.

Baselines
The baseline models for the experiments are illustrated in the following: • REDIAL This is a basic model for conversational recommendation. It basically consists of a dialog generation system based on HRED (Sordoni et al., 2015a;Serban et al., 2016), a recommendation system based on autoencoder and a sentiment analysis module.
• Transformer We name our implemented baseline model Transformer. It is similar to REDIAL, but its dialog generation system is based on the model Transformer (Vaswani et al., 2017). Except for that, the others remain the same.

Results
In the following, we present the results of our experiments, including the model performances in recommendation and dialog generation.
Recommendation To evaluate the effects of our recommendation system, we conduct an evaluation of Recall@K. We present the results in Table 2. From the results, it can be found that our proposed model reaches the best performances in the evaluation of Recall@1, Recall@10 and Recall@50. Furthermore, we also demonstrate an ablation study to observe the contribution of the dialog system and the introduced knowledge. It can be found that either dialog or knowledge can bring improvement to the performance of the  respectively. This shows that the information from both sources is contributive. The dialog contains users' preferred items as well as attributes, such as movie director and movie style, so that the system can find recommendation based on these inputs. The knowledge contains important features of the movie items so that the system can find items with similar features. Further, the combination brings an advantage even greater than sum of the two parts, which proves the effectiveness of our model.
Dialog Table 3 shows the results of the evaluation of the baseline models and our proposed method in dialog generation. In the evaluation of perplexity, Transformer has much lower perplexity (18.0) compared to REDIAL (28.1), and KBRD can reach the best performance in perplexity. This demonstrates the power of Transformer in modeling natural language. In the evaluation of diversity, we find that the models based on Transformer significantly outperform REDIAL from the results of distinct 3-gram and 4-gram. Besides, it can be found that KBRD has a clear advantage in diversity over the baseline Transformer. This shows that our model can generate more diverse contents without decreasing fluency.
As to the human evaluation, we ask human annotators to score the utterances' consistency with their dialog history. Compared with REDIAL KBRD reaches better performance by +0.22 consistency score, which is an advantage of 15%. Moreover, considering the range is between 1 and 3, this is a large gap between the performances  Figure 2: Performance of the recommender system with different numbers of mentioned items. The xaxis refers to the number of mentioned items in the dialog, the y-axis for the line chart (on the left) refers to the model performance on the Recall@50 evaluation, and the y-axis for the histogram (on the right) refers to proportion in the test set. This shows recommendation is much more difficult with few items mentioned (i.e., at the first few rounds in dialog). Leveraging dialog contents makes a great difference in this situation.
of the two models in this evaluation. To make a consistent response in a dialog, the model should understand the dialog history and better learn the user's preference. The baseline REDIAL does not have a strong connection between the dialog system and user representation. Instead, in our framework, the recommender system provides the recommendation-aware vocabulary bias b u , which is based on the user representation t u , to the dialog system. Thus the dialog system gains knowledge about the user's preference and generates a consistent response.

Discussion
In this section, we conduct a series of analyses to observe the effects of our proposed model. We discuss how dialog can improve the recommendation performance and how recommendation can enhance the dialog quality.

Does dialog help recommendation?
We first evaluate whether the dialog contents can benefit the recommendation system. The results of the evaluation are demonstrated in Figure 2. From the histogram in the figure, we observe that most of the dialogs contain only a few mentioned movies. The dialogs with only 0-2 mentioned movies take up a proportion of 62.8% of the whole testing dataset. Therefore, it is important for the system to perform high-quality recommendation with only a small number of mentioned movies.
This also corresponds to the classical problem "cold start" (Schein et al., 2002) in the recommender system. In real applications, we also expect that the system can perform high-quality recommendation with fewer rounds. This represents the efficiency of the recommender system, which can save users' time and efforts. Specifically, we demonstrate the performances of four systems in Figure 2. They are the basic framework, the one only with the interaction with the dialog system, the one only with the external knowledge and KBRD with both dialog and knowledge incorporation. From the figure, it can be found that while there is no mentioned item in the dialog, the baseline and the one only with knowledge perform the worst. In contrast, the two models with dialog incorporation perform significantly better. This shows that the context in the dialog contains much useful non-item information about users' preferences, such as director, actor/actress in movie recommendation. Therefore, while there is no mentioned item, the recommender system can still perform high-quality recommendation based on the contextual information. With the increase of mentioned items, the contribution of knowledge becomes more significant than the dialog. On average, the system with both information sources performs the best. Dialog introduces contextual information and knowledge introduces movie features and structural connection with other movies.

Does recommendation help dialog?
In Section 4.5, we present the performances of the baselines and our model KBRD in dialog generation. It can be found that the interaction with the recommendation system can enhance the performance of the dialog system in both automatic evaluation and human evaluation. Also, an example of the responses of different models is shown in Table 1. With the dialog history, the baseline REDIAL simply uses a generic response with a recommended movie. Instead, KBRD has more concern about the mentioned items apart from the plain text of dialog history. The user representation from our recommender system contains such information, which is sent to the dialog system to form a vocabulary bias. With such information, KBRD has a better understanding of both the dialog history as well as the user's preference, and thus generates a consistent response.  To further study the effects of the recommender system on dialog generation, we display the top biased words from the vocabulary bias. Note that in KBRD a connection between the recommender system and dialog system is the recommendationaware vocabulary bias b u . To be specific, we compute the recommendation-aware bias b u in dialog and select the components with the top-8 largest values. 5 Then we record the corresponding words and observe whether these words are related to the mentioned movies. We present several examples in Table 4. From the table, we observe that the words are highly related to the mentioned movies. For example, when "The Shining" is mentioned, some of the top biased words are "creepy", "gory" and "scary", which are consistent with the style of the horror movie, and "stephen", who is the original creator of the movie. Therefore, it can be suggested that the recommendation system conveys important information to the dialog system in the form of a vocabulary bias. Furthermore, these biased words can also serve as explicit explanation to recommendation results. From this perspective, this shows the interpretability of our model.

Related Work
Recommender systems aim to find a small set of items that meet users' interest based on users' historical interactions. Traditional recommender systems rely on collaborative filtering (Resnick et al., 1994;Sarwar et al., 2001), and recent advances in this field rely much on neural networks (Wang et al., 2015;He et al., 2017;Ying et al., 2018). To deal with the cold-start problem and the sparsity of user-item interactions which these methods usually suffer, researchers have proposed methods to incorporate external information, such as heterogeneous information networks (Yu et al., 2014), knowledge bases Wang et al., 2018a) and social networks (Jamali and 5 After stop words filtering. Ester, 2010). Besides accuracy, explainability is also an important aspect when evaluating recommender systems (Zhang et al., 2014;Zhang and Chen, 2018;Wang et al., 2018b).
End-to-end dialog systems based on neural networks have shown promising performance in open-ended settings (Vinyals and Le, 2015;Sordoni et al., 2015b;Dodge et al., 2016;Wen et al., 2015) and goal-oriented applications (Bordes et al., 2017). Recent literature also explores the intersection of end-to-end dialog systems with other intelligence systems and creates new tasks such as visual dialog (Das et al., 2017;De Vries et al., 2017), conversational recommendation (Li et al., 2018). In particular, Li et al. (2018) collects a dataset of conversations focused on providing movie recommendations and proposes a baseline model for end-to-end training of recommender and dialog systems. Earlier studies in this field focus on different tasks such as minimizing the number of user queries (Christakopoulou et al., 2016), training the dialog agent to ask for facet values for recommendation (Sun and Zhang, 2018). Related literature can also be found in Thompson et al. (2004), Mahmood and Ricci (2009), Chen and Pu (2012), Widyantoro and Baizal (2014) and Liao et al. (2019).

Conclusion
In this paper, we propose a novel end-to-end framework, KBRD, which bridges the gap between the recommender system and the dialog system via knowledge propagation. Through a series of experiments, we show that KBRD can reach better performances in both recommendation and dialog generation in comparison with the baselines. We also discuss how the two systems benefit each other. Dialog information is effective for the recommender system especially in the setting of cold start, and the introduction of knowledge can strengthen the recommendation performance significantly. Information from the recommender system that contains the user preference and the relevant knowledge can enhance the consistency and diversity of the generated dialogs.