User Memory Reasoning for Conversational Recommendation

We study an end-to-end approach for conversational recommendation that dynamically manages and reasons over users' past (offline) preferences and current (online) requests through a structured and cumulative user memory knowledge graph. This formulation extends existing state tracking beyond the boundary of a single dialog to user state tracking (UST). For this study, we create a new Memory Graph (MG) ↔ Conversational Recommendation parallel corpus called MGConvRex with 7K+ human-to-human role-playing dialogs, grounded on a large-scale user memory bootstrapped from real-world user scenarios. MGConvRex captures human-level reasoning over user memory and has disjoint training/testing sets of users for zero-shot (cold-start) reasoning for recommendation. We propose a simple yet expandable formulation for constructing and updating the MG, and an end-to-end graph-based reasoning model that updates the MG from unstructured utterances and predicts optimal dialog policies (e.g., recommendation) based on the updated MG. The predictions of our model inherit the graph structure, providing a natural way to explain policies. Experiments are conducted with both offline metrics and online simulation, showing competitive results.


Introduction
Conversational recommendation systems have recently gained traction in the dialog community. Such a system aims to learn up-to-date (online) user preferences instead of relying on static (offline) preferences, as traditional recommender systems do (e.g., collaborative filtering (CF)). Most existing works combine a static recommender system with a dialog system by updating user preferences via relevant questions (a paradigm often referred to as "System Ask User Respond (SAUR)") (Zhang et al., 2018). However, this "short-term" update unnaturally isolates users' history from their preferences in the current dialog (which may be forgotten once the dialog is finished). An intelligent system should be able to dynamically maintain and reason over users' knowledge for current (and possibly future) recommendations.
[Figure 1 annotations: (2) Conversational recommendation allows users to express preferences and requirements through dialogs. (3) Our MGConvRex corpus is grounded on user memory, which represents the user's past history as well as newly added preferences.]
To this end, we introduce a novel concept called the user memory graph to holistically represent knowledge about users and their associated items. This graph may contain static knowledge obtained offline (e.g., items, attributes, and the history of users and their past dialogs) as well as user knowledge obtained online (e.g., from state tracking in the current dialog), as illustrated in Figure 1. A user memory graph naturally offers the following benefits. (1) Holistic reasoning: all available knowledge about users and items is considered together to generate the dialog policy. We believe this is the core problem in conversational recommendation, because asking a good question or finding a good candidate item requires exploring the "soft match" between the knowledge about users and the knowledge about items (Zhang et al., 2018). (2) Zero-shot (cold-start) reasoning for users/items unseen during training: the user memory graph naturally separates user/item knowledge from the reasoning process of the policy. As a result, one can train a user/item-agnostic model and later apply it to the user memory graph of a new user (obtained after the model is deployed). In contrast, most CF-based systems "overfit" to existing users/items through their embeddings. (3) Open space policy: the innumerable items involved in the dialog policy are a key challenge in conversational recommendation. This requires a flexible policy space that covers all items (and possibly all valid slots and values for acquiring preferences) instead of a pre-defined, fixed space. The user memory graph can serve as the basis of the policy because it contains all valid entities for the current dialog.
In summary, this paper addresses the following problem. User Memory Reasoning for Recommendation: assume an agent is engaged in a conversational recommendation with a user $e_u$. The agent (1) constructs a user memory graph $G_0 = \{(e, r, e') \mid e, e' \in \mathcal{E}, r \in \mathcal{R}\}$ based on the history knowledge $\mathbf{H}$ of $e_u$, the candidate items $\mathcal{C}$, and their associated slots and values; then, (2) without loss of generality, updates $G_{x-1}$ with new knowledge from the $x$-th turn $d_x \in D$, in the form of tuples $G_x \leftarrow G_{x-1} \cup \{(e, r, e'), \ldots\}$; and (3) performs reasoning over $G_x$ to yield a dialog policy $\pi_x$ that either (i) performs more rounds of interaction to collect the user's knowledge (e.g., via question answering), or (ii) recommends items $T \subset \mathcal{C}$ to the user.
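As a minimal sketch of steps (1) and (2), the memory graph can be held as an immutable set of (head, relation, tail) triples, with each turn's update being a set union, so knowledge only accumulates. The entity/relation strings, the `i_` prefix convention for items, and the helper names are hypothetical and not taken from the MGConvRex release.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# (head entity, relation, tail entity); all names below are illustrative only.
Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def construct_graph(history: Iterable[Triple], candidates: Iterable[Triple]) -> Graph:
    """Step (1): G_0 built from the user's history and the candidate-item triples."""
    return frozenset(history) | frozenset(candidates)


def update_graph(graph: Graph, turn_triples: Iterable[Triple]) -> Graph:
    """Step (2): G_x <- G_{x-1} ∪ {new triples from turn x}; nothing is removed."""
    return graph | frozenset(turn_triples)


def item_entities(graph: Graph) -> Set[str]:
    """Entities a Recommendation policy may rank (items carry an 'i_' prefix here by convention)."""
    return {e for (h, _, t) in graph for e in (h, t) if e.startswith("i_")}


g0 = construct_graph(
    history=[("u_Bob", "r_has_memory", "m_history"), ("m_history", "r_visited", "i_Seas")],
    candidates=[("i_Basil", "r_has_aspect", "v_Thai")],
)
g1 = update_graph(g0, [("m_cur_dialog", "r_pos_on", "v_Thai")])
```

Because the update is a pure union, every earlier graph is a subset of every later one, which matches the cumulative view of user memory described above.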
To this end, we first collect a dataset for this problem, as existing public datasets hardly meet the needs of this paper for the following reasons. (1) They lack users' history, and thus dialogs referring to that history (e.g., the 2nd and 4th turns in Figure 1). One reason is that most datasets target task-oriented systems, where users' history and reasoning are not core issues. (2) They lack fine-grained annotation for updating the user memory graph. Most public datasets for conversational recommendation combine recommender-system datasets with dialogs transcribed separately (Li et al., 2018a; Zhang et al., 2018). This process is not designed for knowledge-grounded dialogs and makes entity-level knowledge hard to annotate. (3) They lack human-level reasoning. Transcription for existing datasets does not aim at reasoning over the available knowledge of both users and items, and some actions are taken at the transcribers' discretion (Li et al., 2018a). Our collected dataset, called Memory Graph ↔ Conversational Recommendation (MGConvRex), contains 7.6K+ dialogs with 73K turns based on real-world users' behavior. It is annotated with dialog acts, items, slots, values, and sentiment polarities that capture human-level reasoning of the dialog policy (see Section 3 and the Appendix for details of data collection).
To construct the user memory graph, we define a simple yet flexible ontology, as detailed in Section 4. One challenge in conversational recommendation is dealing with the open space policy, which requires a flexible policy space that differs from dialog to dialog. We propose a baseline called the User Memory Graph Reasoner (UMGR), which preserves the structure of the user memory graph during reasoning and generates the policy based on the graph. This also potentially makes the dialog policy interpretable.
In summary, the contributions of this paper are as follows: (1) we propose a novel task of user memory reasoning for conversational recommendation; (2) we collect a dataset and propose an ontology to construct the user memory graph; (3) we propose a baseline for reasoning about dialog policy over the user memory graph. Experimental results show that such a reasoning model is promising.

Related Work
Conversational Recommendation is an important type of information-seeking dialog system (Zhang et al., 2018). Existing studies focus on combining a recommender system with a dialog state tracking system through the "System Ask User Respond (SAUR)" paradigm. Once enough user preferences are collected, such systems make personalized recommendations to the user.
For instance, (Li et al., 2018a) propose to mitigate the cold-start problem by learning new users' preferences during conversations and linking the learned preferences to similar existing users in a traditional recommender system. (Sun and Zhang, 2018) propose to update a recommender system in the latent space using the latent space of dialog state tracking, and to tune the dialog policy via reinforcement learning; the updates are short-term and very close to a task-oriented dialog system. (Kang et al., 2019) propose a self-play reinforcement learning (RL) setting to boost the performance of a text-to-text dialog model. (Zhang et al., 2018) leverage reviews to mimic online conversations, updating an existing user's preferences and re-ranking items. In (Misu et al., 2010), the user memory/knowledge is represented as a probabilistic state with a fixed hierarchical structure of a Markov probabilistic model to predict dialog actions; however, it lacks the flexibility to encode richer, fine-grained knowledge and to accumulate new knowledge about users for long-term use. (Zhou et al., 2020) demonstrate the use of user profiles and users' interests from the ongoing dialog in a social chatbot. To the best of our knowledge, no existing system (or dataset) aims to build an explicit user memory for reasoning and long-term use.
Task-oriented Dialog Systems are widely studied with multiple popular benchmark datasets (Henderson et al., 2014; Wen et al., 2016; Budzianowski et al., 2018; Eric et al., 2019; Rastogi et al., 2019). Most state-of-the-art approaches (Wu et al., 2019; Gao et al., 2019; Chao and Lane, 2019) focus on improving dialog state tracking with span-based pointer networks for unseen values, which predict the information essential for completing a specified task (e.g., hotel or air ticket booking). Datasets for task-oriented systems typically lack users' history, probably because users' history matters little for locating the correct record in the current dialog. Although certain dialog acts, slots, and values can be shared by task-oriented systems and conversational recommendation, the core problem of conversational recommendation is to reason and to rank items or questions to ask.
Graph Reasoning is essential for generating the dialog policy from the proposed user memory graph, which can be viewed as a structured form of state representation. Many studies leverage knowledge graphs for recommender systems. For example, (Xian et al., 2019) introduce a graph-based recommender (not dialog) system trained via reinforcement learning. Graph neural networks, which learn hidden representations over discrete graph structures, have become popular in recent years (Scarselli et al., 2008; Duvenaud et al., 2015; Defferrard et al., 2016; Kipf and Welling, 2016). We leverage them in this paper to learn structure-preserving (and thus explainable) reasoning. A number of extensions to the original graph neural network have been proposed (Li et al., 2015; Pham et al., 2017), most notably R-GCNs (Schlichtkrull et al., 2018), which can be applied to large-scale, multi-relational graphs (where relations are associated with typed embeddings).
A few works have recently enabled knowledge graph reasoning in dialog systems. (Moon et al., 2019a,b) propose a new corpus for learning knowledge graph paths that connect dialog turns. (Tuan et al., 2019) introduce a knowledge-grounded dialog generation task given a dynamically updated knowledge graph. However, these works often focus on response generation and do not address reasoning over user knowledge in conversational recommendation.

MGConvRex Dataset
This section describes the construction of the MGConvRex dataset. MGConvRex aims to contain dialogs that draw on the user's history and fine-grained user preferences to update the user memory graph. To this end, we leverage existing data from recommender systems that record users' past behavior to harvest large-scale dialog scenarios. We then define fine-grained dialog acts, slots, values, and sentiment polarities to turn unstructured utterances into structured knowledge for memory graph updates.
This section is organized as follows.
(1) We detail the curation of dialog scenarios in Section 3.1.
(2) We then define structured knowledge such as dialog acts, slots, values, and sentiment polarities for MGConvRex in Section 3.2.
(3) Next, we describe the process for transcribing human-to-human simulated dialogs in a Wizard-of-Oz environment (Henderson et al., 2014; Wen et al., 2016; Budzianowski et al., 2018; Eric et al., 2019) (Section 3.3).
(4) Lastly, we define the ontology for annotating the structured knowledge in utterances and provide the statistics of the dataset in Section 3.4.
As a result, MGConvRex can be used for a broad scope of research in conversational recommendation, including but not limited to policy reasoning, natural language understanding (e.g., intent detection, slot filling, sentiment analysis), and natural language generation.

Dialog Scenarios
We use scenario to refer to a pre-defined user-agent setting for collecting a dialog between two crowd workers, where one plays the user and the other plays the agent. Scenarios in conversational recommendation can be generated from user behaviors in recommender-system datasets. This mitigates the need to curate synthetic dialog scenarios, as is done for task-oriented dialog datasets (Li et al., 2016, 2018b).
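We assume each item is associated with values and each value is associated with at least one slot. Let $\mathbb{B} = \{0, 1\}$ denote the binary set. We define a scenario as a tuple $(e_u, \mathbf{C}, \mathbf{H}, \mathbf{V}, \mathbf{P}, T)$, where $e_u$ is a user; $\mathbf{C} \in \mathbb{B}^{|\mathcal{C}| \times |\mathcal{V}|}$ encodes the candidate items $\mathcal{C}$ and their associated values $\mathcal{V}$; $\mathbf{H} \in \mathbb{B}^{|\mathcal{H}| \times |\mathcal{V}|}$ encodes the user's past history known to the agent (the items $\mathcal{H}$ visited by $e_u$ and their values); $\mathbf{V} \in \mathbb{B}^{|\mathcal{V}| \times |\mathcal{S}|}$ indicates values and their associated slots; $\mathbf{P} \in \mathbb{B}^{|\mathcal{S}| \times |\mathcal{V}|}$ is the user preference (which value the user prefers for each slot); and $T \subset \mathcal{C}$ is the set of ground-truth items.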
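To make the shapes of these binary matrices concrete, the sketch below builds toy $\mathbf{V}$, $\mathbf{P}$ and $\mathbf{C}$ as nested lists and checks two hypothetical properties: every preference in $\mathbf{P}$ pairs a slot with one of its legal values in $\mathbf{V}$, and which candidate items carry all preferred values. The item, slot and value names are invented, and MGConvRex takes $T$ from real user behavior rather than computing it with this filter.

```python
from typing import List

Matrix = List[List[int]]  # binary entries, i.e. values in B = {0, 1}

# Toy index sets (hypothetical): values V, slots S, candidate items C.
VALUES = ["v_Thai", "v_Italian", "v_cheap", "v_expensive"]
SLOTS = ["s_cuisine", "s_price"]
ITEMS = ["i_Basil", "i_Roma", "i_Siam"]

V: Matrix = [[1, 0], [1, 0], [0, 1], [0, 1]]            # |V| x |S|: value -> its slot
P: Matrix = [[1, 0, 0, 0], [0, 0, 1, 0]]                # |S| x |V|: preferred value per slot
C: Matrix = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1]]  # |C| x |V|: item -> its values


def preferences_are_valid(pref: Matrix, value_slot: Matrix) -> bool:
    """Every (slot s, value v) marked in pref must be a legal pairing in value_slot."""
    return all(value_slot[v][s] == 1
               for s, row in enumerate(pref)
               for v, bit in enumerate(row) if bit == 1)


def items_matching_preferences(cand: Matrix, pref: Matrix) -> List[str]:
    """Candidate items that carry every value the user prefers."""
    wanted = {v for row in pref for v, bit in enumerate(row) if bit == 1}
    return [ITEMS[i] for i, row in enumerate(cand) if all(row[v] == 1 for v in wanted)]
```

Here only `i_Basil` is both Thai and cheap, so it is the single item consistent with the toy preference.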
We create dialog scenarios as follows. To make human reasoning sufficiently difficult, we choose $|\mathcal{C}| \in [10, 20]$ candidate items and enforce a certain similarity between the candidate items and the ground-truth items (e.g., all locations are in the same state). For the same user, we also create a duplicate scenario that is identical except that $|\mathcal{H}| = 0$, so that the agent player can only use knowledge from the current dialog for recommendation.
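A minimal sketch of this sampling procedure, assuming the similarity constraint is "same state as the ground-truth item" (the example given in the text) and using a hypothetical `make_scenario_pair` helper and dict layout. It draws $|\mathcal{C}| \in [10, 20]$ candidates including the target and emits the duplicate scenario with an empty history.

```python
import random
from typing import Dict, List, Tuple


def make_scenario_pair(user: str, target: str, history: List[str],
                       item_state: Dict[str, str], rng: random.Random) -> Tuple[dict, dict]:
    """Return (scenario with history, duplicate scenario with |H| = 0)."""
    # Assumed similarity constraint: candidates share the target's state.
    pool = [i for i, st in item_state.items() if st == item_state[target] and i != target]
    size = min(rng.randint(10, 20), len(pool) + 1)  # |C| in [10, 20], target included
    candidates = rng.sample(pool, size - 1) + [target]
    rng.shuffle(candidates)
    with_history = {"e_u": user, "C": candidates, "H": list(history), "T": [target]}
    without_history = {**with_history, "H": []}  # identical except for the empty history
    return with_history, without_history


item_state = {f"i_{n}": ("ON" if n < 30 else "AZ") for n in range(40)}
with_h, no_h = make_scenario_pair("u_Bob", "i_0", ["i_35"], item_state, random.Random(0))
```

Sharing the same candidate list between the two scenarios isolates the effect of the user's history on how the agent player reasons.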

Dialog Acts, Slots, Values and Sentiment Polarities
We further define the following knowledge for curating structured information for graph updates. Dialog Acts ($\mathcal{A}$): Table 1 lists the dialog acts for both the user and the agent. Note that besides the System Ask User Respond (SAUR) paradigm (Sun and Zhang, 2018; Li et al., 2018a; Zhang et al., 2018), we also propose a User Ask System Respond (UASR) paradigm that allows users to actively participate in a recommendation. Acts such as Open question, Yes/no question, and Inform are designed for this purpose.

Slots and Values ($\mathcal{S}$, $\mathcal{V}$):
We select $|\mathcal{S}| = 10$ popular slots with a total of 470+ values for the restaurant domain. To help transcribers use some values naturally in utterances, we change some values (such as price ranges written with dollar signs) into English words (e.g., "cheap").
Sentiment Polarity: We define a user's preference expressed in a conversation as pairs of opinion targets (an item or a value) and their associated sentiment polarities (Hu and Liu, 2004). We adopt three polarities, pos_on, neg_on, and neu_on, to represent positive, negative, and neutral polarity, respectively.

Wizard-of-Oz Collection and Annotation
We build a Wizard-of-Oz system that randomly pairs two crowd workers to engage in a chat session, where each scenario is split into two parts: $(\mathbf{P}, T)$ for the user and $(e_u, \mathbf{C}, \mathbf{H}, \mathbf{V})$ for the agent. A conversation resembles a game between the user and the agent: the agent needs to reason about the user's current preferences and find the ground-truth item, while the user can share information from the preference $\mathbf{P}$ or confirm a recommended item $e_i \in T$, but cannot reveal the ground-truth item directly. The guidelines and screenshots of the Wizard-of-Oz UI can be found in the Appendix.
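The information split can be sketched as two projections of one scenario record, plus a confirmation rule for the user side. The dict keys mirror the tuple fields above; the helper names and toy values are hypothetical.

```python
from typing import Any, Dict, Tuple

Scenario = Dict[str, Any]
USER_FIELDS = ("P", "T")               # shown only to the user worker
AGENT_FIELDS = ("e_u", "C", "H", "V")  # shown only to the agent worker


def split_scenario(scenario: Scenario) -> Tuple[Scenario, Scenario]:
    """Project one scenario into the user's view and the agent's view."""
    user_view = {k: scenario[k] for k in USER_FIELDS}
    agent_view = {k: scenario[k] for k in AGENT_FIELDS}
    return user_view, agent_view


def user_accepts(user_view: Scenario, recommended_item: str) -> bool:
    """The user may confirm a recommended item in T but never names T unprompted."""
    return recommended_item in user_view["T"]


scenario = {"e_u": "u_Bob", "C": ["i_Basil", "i_Roma"], "H": ["i_Seas"],
            "V": {"v_Thai": "s_cuisine"}, "P": {"s_cuisine": "v_Thai"}, "T": ["i_Basil"]}
user_view, agent_view = split_scenario(scenario)
```

Because the agent never sees $\mathbf{P}$ or $T$, it has to recover the preference through the dialog and the history $\mathbf{H}$.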

Summary of MGConvRex
We annotate dialog acts, items, slots, values, and users' utterance-level and entity-level sentiment.
The dialogs are split into training, development, and testing sets with non-overlapping users for zero-shot reasoning on unseen users. The statistics of MGConvRex are in Table 2. For scenarios with users' history, we notice that the average number of turns is slightly smaller than for scenarios without history. We further plot the agent's dialog acts to study the behavior of the agent players, as in Figure 1; agent players seem to use more yes/no questions to confirm the preferences users exhibit in their history. We discuss more details in the Appendix.
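A user-disjoint split can be sketched by shuffling the set of users rather than the dialogs, so every dialog of a test user is unseen in training. The 80/10/10 ratios, the seed and the dialog dict layout are hypothetical; the actual split sizes are those reported in Table 2.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_by_user(dialogs: Sequence[dict], ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),
                  seed: int = 0) -> Dict[str, List[dict]]:
    """Assign whole users (not dialogs) to train/dev/test so test users are unseen."""
    users = sorted({d["user"] for d in dialogs})
    random.Random(seed).shuffle(users)
    n_train = int(len(users) * ratios[0])
    n_dev = int(len(users) * ratios[1])
    members = {
        "train": set(users[:n_train]),
        "dev": set(users[n_train:n_train + n_dev]),
        "test": set(users[n_train + n_dev:]),  # remainder goes to test
    }
    return {name: [d for d in dialogs if d["user"] in group] for name, group in members.items()}


# Two dialogs per user, e.g. a scenario with history and its |H| = 0 duplicate.
dialogs = [{"user": f"u_{n}", "has_history": h} for n in range(20) for h in (True, False)]
splits = split_by_user(dialogs)
```

Keeping both scenarios of a user in the same split avoids leaking a test user's history through their duplicate scenario.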

User Memory Graph
In this section, we describe the formulation of a user memory graph based on a scenario and the annotated user preferences. There are many design choices for constructing a user memory graph; our goal is to model user knowledge and scenarios in an extensible and maintainable way.

Construction
As a reminder, a user memory graph is denoted as $G = \{(e, r, e') \mid e, e' \in \mathcal{E}, r \in \mathcal{R}\}$, which is essentially a heterogeneous graph with typed entities and relations. We first define the ontology (the meta entities and relations) in Table 3. The user memory contains the available items $\mathcal{I}$ for a dialog scenario. An item $i$ can be associated with multiple values $v$ via the r_has_aspect relation. Each value is associated with its slot $s$ via the r_is_a relation. In this way, value and slot entities are easily extensible: new values or slots (or even slots of slots) can be added. Further, each user has their own entity $e_u$ and several associated memory entities $m$. We define a memory entity to model an event or experience of the user, such as visiting a restaurant (via the entity m_history) or having a conversation, as in the current dialog (via m_cur_dialog). The advantage of allowing multiple memory entities is that a user may have different opinions on the same target (an item or a value) from very different experiences (e.g., liking Thai food for lunch but not for dinner). To express a user's history of visited items, we use the r_visited relation to connect a memory entity with a visited item.
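The construction above can be sketched as a function that emits the ontology triples for one user. The relation names follow the text; the slot names (`s_cuisine`, `s_price`), the single `m_history` memory entity (a user may have several), and the helper name are simplifying assumptions.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def build_memory_graph(user: str,
                       item_values: Dict[str, List[str]],
                       value_slot: Dict[str, str],
                       visited: List[str]) -> Set[Triple]:
    """Initial user memory graph G_0 for one user (visited items must appear in item_values)."""
    g: Set[Triple] = set()
    for item, values in item_values.items():
        for v in values:
            g.add((item, "r_has_aspect", v))      # item -> value
            g.add((v, "r_is_a", value_slot[v]))   # value -> slot
    g.add((user, "r_has_memory", "m_cur_dialog"))  # memory of the ongoing dialog
    if visited:
        g.add((user, "r_has_memory", "m_history"))  # one memory entity for past visits
        for item in visited:
            g.add(("m_history", "r_visited", item))
    return g


g_bob = build_memory_graph(
    "u_Bob",
    item_values={"i_Basil": ["v_Thai", "v_affordable"], "i_Seas": ["v_affordable"]},
    value_slot={"v_Thai": "s_cuisine", "v_affordable": "s_price"},
    visited=["i_Seas"],
)
```

Adding a new value or slot only adds triples; no existing structure has to change, which is the extensibility argument made above.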
As an example, we demonstrate the construction for a user u_Bob in the first graph of Figure 3. We will keep using this example to demonstrate the updates of the user memory graph for the dialog in Table 4.

Update
Updates to the user memory graph are assumed to leverage the outputs of natural language understanding (NLU) or state tracking. (We leave the language understanding part to future work; the baselines in this paper use ground-truth annotations.) For simplicity, we use three sentiment relations, r_pos_on, r_neg_on, and r_neu_on, to update a user memory graph; they associate values/items (opinion targets) with the memory entity of the current dialog, m_cur_dialog. We believe human memory works in a more complex way, and we expect more sophisticated memory update mechanisms (e.g., with error correction) in future work.
From the user's first turn in Table 4, we know that u_Bob likes Thai food, so the user memory graph is updated with a new triple (m_cur_dialog, r_pos_on, v_Thai). From the user's second turn, we know that u_Bob is still interested in affordable restaurants, indicated by a new triple (m_cur_dialog, r_pos_on, v_affordable). The agent can then infer the recommendation i_Basil, which can be explained by the following paths: (1) u_Bob → r_has_memory → m_cur_dialog → r_pos_on → v_Thai → r_has_aspect → i_Basil; (2) u_Bob → r_has_memory → m_cur_dialog → r_pos_on → v_affordable → r_has_aspect → i_Basil; and (3) u_Bob → r_has_memory → m_history → r_visited → i_Seas → r_has_aspect → v_affordable → r_has_aspect → i_Basil, where the last path draws the relevance from a visited item to the current recommendation. As we can see, sentiment relations serve as the bridge connecting a user to items and enable reasoning for recommendation.

Table 4: An example dialog corresponding to the graph updates in Figure 3.
Role    Utterance
Agent   what kinds of food do you like ?
User    I like Thai food.
Agent   are you still interested in affordable restaurant ?
User    yes.
Agent   how about Basil, which is affordable and serves Thai food.
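The explanation paths in this example can be recovered mechanically with a bounded depth-first search from the user entity to the recommended item. This is a sketch under two assumptions: edges are traversed in both directions (so a value can lead back to the items that have it), and paths are capped at 4 hops, which happens to yield exactly the three paths listed above. It is an illustration, not UMGR's own explanation mechanism.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def explanation_paths(graph: Set[Triple], source: str, target: str,
                      max_hops: int = 4) -> List[List[str]]:
    """Simple paths source -> target as [entity, relation, entity, ...].
    Edges are followed in both directions so that value -> item hops are allowed."""
    adj: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for h, r, t in graph:
        adj[h].append((r, t))
        adj[t].append((r, h))
    paths: List[List[str]] = []

    def dfs(node: str, path: List[str], seen: Set[str]) -> None:
        if node == target:
            paths.append(path)
            return
        if len(path) // 2 >= max_hops:  # len(path) == 2 * hops + 1
            return
        for rel, nxt in sorted(adj[node]):
            if nxt not in seen:
                dfs(nxt, path + [rel, nxt], seen | {nxt})

    dfs(source, [source], {source})
    return paths


g0: Set[Triple] = {
    ("u_Bob", "r_has_memory", "m_cur_dialog"),
    ("u_Bob", "r_has_memory", "m_history"),
    ("m_history", "r_visited", "i_Seas"),
    ("i_Seas", "r_has_aspect", "v_affordable"),
    ("i_Basil", "r_has_aspect", "v_Thai"),
    ("i_Basil", "r_has_aspect", "v_affordable"),
}
# Turn updates from Table 4: the user likes Thai food and still wants affordable places.
g2 = g0 | {("m_cur_dialog", "r_pos_on", "v_Thai"), ("m_cur_dialog", "r_pos_on", "v_affordable")}
paths = explanation_paths(g2, "u_Bob", "i_Basil")
```

Before the sentiment updates only the history path exists; the two r_pos_on triples add the two dialog-based paths, which shows how sentiment relations bridge the user and the item.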

User Memory Graph Reasoner
In this section, we propose a model called User Memory Graph Reasoner (UMGR), which reasons about the dialog policy over the user memory graph (Figure 4). As discussed in the introduction, we aim to resolve the issue of open space policy in conversational recommendation. We define the inputs and outputs as follows, mapping certain entities from the user memory graph to the policy space. Input: (1) the user's past dialog acts $\mathbf{a}$ up to the current turn; (2) the updated user memory graph $G_x$. Output: the dialog policy $\pi = (\hat{y}_A, \hat{y}_C, \hat{y}_S, \hat{y}_V)$ for the current turn, where $A$, $C$, $S$, $V$ refer to the spaces of dialog acts $\mathcal{A}$, candidate items $\mathcal{C}$, slots $\mathcal{S}$, and values $\mathcal{V}$, respectively.
Note that $\hat{y}_C$, $\hat{y}_S$, and $\hat{y}_V$ can be interpreted as the arguments of dialog acts and are essentially rankings over their corresponding entity sets. For example, when $\hat{y}_A$ = Recommendation, the top-1 item $\arg\max_{e_i \in \mathcal{C}} \hat{y}_C$ is provided to the user. Similarly, $\hat{y}_A$ = Open Question is associated with the top-1 slot $\arg\max_{e_s \in \mathcal{S}} \hat{y}_S$, and $\hat{y}_A$ = Yes/no Question with the top-1 value $\arg\max_{e_v \in \mathcal{V}} \hat{y}_V$. As such, the policy space of UMGR is determined by the user memory graph, and only valid entities can be generated. A structure-preserving model is preferred for this reasoning, so that all entities in the policy are generated by one holistic reasoning process.
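The decoding rule above can be sketched as: pick the highest-scoring act, then take the argmax over the entity type that act uses as its argument, ignoring entities masked out because they are not in the user memory graph. The act-to-argument table follows the text; the score values, names and treatment of other acts as argument-free are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

# Act -> entity ranking that supplies its argument (pairs stated in the text).
ACT_TO_ARG = {"Recommendation": "items", "Open Question": "slots", "Yes/no Question": "values"}


def masked_argmax(scores: List[float], valid: List[bool]) -> int:
    """Index of the best-scoring entity among those present in the memory graph."""
    best, best_score = -1, -math.inf
    for idx, (s, ok) in enumerate(zip(scores, valid)):
        if ok and s > best_score:
            best, best_score = idx, s
    if best < 0:
        raise ValueError("no valid entity in the user memory graph")
    return best


def decode_policy(act_scores: Dict[str, float],
                  entity_scores: Dict[str, Tuple[List[str], List[float], List[bool]]]
                  ) -> Tuple[str, Optional[str]]:
    act = max(act_scores, key=act_scores.get)
    arg_type = ACT_TO_ARG.get(act)
    if arg_type is None:  # acts without an entity argument in this sketch
        return act, None
    names, scores, valid = entity_scores[arg_type]
    return act, names[masked_argmax(scores, valid)]


entity_scores = {
    "items": (["i_Basil", "i_Roma", "i_Ghost"], [1.2, 0.3, 5.0], [True, True, False]),
    "slots": (["s_cuisine", "s_price"], [0.1, 0.9], [True, True]),
    "values": (["v_Thai"], [0.0], [True]),
}
policy = decode_policy({"Recommendation": 2.0, "Open Question": 0.5, "Yes/no Question": 1.0},
                       entity_scores)
```

The masked `i_Ghost` has the highest raw score but can never be recommended, which is how the graph bounds the open policy space.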
UMGR first encodes the past dialog acts $\mathbf{a}$ and the entities $e_j \in \mathcal{E}$ into hidden representations.
Here, $W_A$ and $W_E$ are embedding layers, and the past dialog acts are further encoded by an LSTM encoder. We then incorporate a Relational Graph Convolutional Network (R-GCN) (Schlichtkrull et al., 2018) into UMGR for reasoning. An R-GCN is a GCN (Kipf and Welling, 2016) with typed relations, where each relation has its own weights, enabling reasoning over a heterogeneous graph. Each entity is encoded by multiple R-GCN layers. In the R-GCN update, $\mathbf{h}^{(l)}_j$ is the hidden state of entity $e_j$ in the $l$-th layer, $\mathcal{N}^r_j$ is the set of $e_j$'s neighbors under relation type $r$, and $W^{(l)}_r$ is the weight associated with $r$ in the $l$-th layer, which transforms a neighbor's hidden state $\mathbf{h}^{(l)}_k$. An R-GCN layer updates the hidden state of each entity with incoming messages, namely its neighbors' hidden states, type by type. The R-GCN then sums over all relation types before applying the GELU activation (Hendrycks and Gimpel, 2016). The hidden state of $e_j$ in the $(l+1)$-th layer is computed via a residual connection (He et al., 2016), which keeps the entity's original information rather than only its neighbors' information, followed by layer normalization.
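A dependency-free sketch of one such layer, following the prose description: per-relation weight matrices transform neighbor states, messages are summed over neighbors and relation types, passed through GELU, added residually, and layer-normalized. Assumptions not stated in the text: messages flow head to tail only (inverse relations would be extra relation types), there is no per-relation normalization constant, and LayerNorm has no learnable gain or bias.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]


def rgcn_layer(h: Dict[str, Vector], edges: List[Tuple[str, str, str]],
               W: Dict[str, Matrix]) -> Dict[str, Vector]:
    """h_j' = LayerNorm(h_j + GELU(sum_r sum_{k in N_j^r} W_r h_k)), messages head -> tail."""
    dim = len(next(iter(h.values())))
    msg = {e: [0.0] * dim for e in h}
    for head, rel, tail in edges:
        m = matvec(W[rel], h[head])
        msg[tail] = [a + b for a, b in zip(msg[tail], m)]
    return {e: layer_norm([a + gelu(b) for a, b in zip(h[e], msg[e])]) for e in h}


h = {"i_Basil": [0.0, 1.0], "v_Thai": [1.0, 0.0]}
edges = [("i_Basil", "r_has_aspect", "v_Thai")]
W = {"r_has_aspect": [[1.0, 0.0], [0.0, 1.0]]}
out = rgcn_layer(h, edges, W)  # stacking: apply rgcn_layer repeatedly with per-layer W
```

Because the residual term carries $\mathbf{h}^{(l)}_j$ forward, an entity with no incoming messages keeps its own (normalized) state instead of collapsing to zero.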
The hidden states from the last R-GCN layer are passed into an aggregation layer.
Here, $W_{ag}$ and $b_{ag}$ are the weight and bias of the aggregation layer. The purpose of the aggregation layer is to leverage the information in the user memory graph for predicting dialog acts. In the loss for dialog acts, $\oplus$ is the concatenation operation, $W_A$ merges the hidden states of the dialog acts and the graph, $\mathrm{MLP}_A(\cdot)$ is a multi-layer perceptron for dialog acts, and $y_A$ is the dialog act label. Further, all item, slot, and value entities are trained with a log loss for ranking. For example, in the loss for the candidate items $\mathcal{C}$, $\mathrm{MLP}_I(\cdot)$ is the multi-layer perceptron for items. Similarly, we obtain the losses $L_S$ and $L_V$ for slot entities $\mathcal{S}$ and value entities $\mathcal{V}$, respectively. The total loss is the weighted sum of the losses for dialog acts, items, slots, and values, where $\alpha$, $\beta$, $\gamma$, and $\delta$ are hyper-parameters that balance losses of different scales. Note that during training and prediction, all invalid entities (e.g., entities that do not appear in the user memory graph) are masked out.
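The exact form of the ranking log loss is not reproduced here, so the sketch below assumes a softmax cross-entropy restricted to valid (unmasked) entities, and a total loss that weights the act, item, slot and value losses in that order with $\alpha, \beta, \gamma, \delta$. The defaults 1, 10, 10, 100 are the values reported in the experimental setup; the function names are hypothetical.

```python
import math
from typing import Sequence


def masked_log_loss(scores: Sequence[float], valid: Sequence[bool], gold: int) -> float:
    """Softmax cross-entropy over entities present in the user memory graph only."""
    if not valid[gold]:
        raise ValueError("gold entity is masked out")
    kept = [s for s, ok in zip(scores, valid) if ok]
    m = max(kept)  # log-sum-exp shift for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in kept))
    return log_z - scores[gold]


def total_loss(l_act: float, l_item: float, l_slot: float, l_value: float,
               alpha: float = 1.0, beta: float = 10.0,
               gamma: float = 10.0, delta: float = 100.0) -> float:
    """Weighted sum balancing losses of different scales (weight order is an assumption)."""
    return alpha * l_act + beta * l_item + gamma * l_slot + delta * l_value
```

A masked entity with a very large score leaves the loss unchanged, e.g. `masked_log_loss([0.0, 0.0, 100.0], [True, True, False], 0)` equals $\log 2$.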
As we can see, unlike traditional recommender systems, UMGR makes no assumption about the users/items in the training set and thus supports zero-shot reasoning. The policy space is open-ended because the entities in the policy are determined by the rankings of entities in the user memory graph instead of a set predefined by the model.
This section presents experiments comparing UMGR with baselines for reasoning dialog policy.

Evaluation Metrics
We propose the following metrics to evaluate UMGR both offline (against the collected testing dialogs) and online (against a user simulator running on the testing scenarios of MGConvRex).

Offline metrics
We propose the following offline metrics to evaluate UMGR. Note that all offline metrics assume that UMGR uses the ground-truth annotations of past turns (e.g., for constructing the user memory graph).
Act Accuracy & F1 are reported for all predicted dialog acts against annotated turn acts in testing.
Entity Matching Rate (EMR, k@1, 3, 5) measures turn-level top-k predicted entities against the testing set. These metrics are evaluated only on turns whose dialog act is predicted correctly, since the type of the predicted entities (items, slots, or values) depends on the predicted dialog act $\hat{y}_A$. Item Matching Rate (IMR) measures dialog-level predicted items against the ground-truth items.
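A small sketch of act accuracy and EMR@k as described: EMR only counts turns whose act was predicted correctly and checks whether the gold entity appears among the top-k ranked entities. The turn tuple layout and the single-gold-entity-per-turn simplification are assumptions.

```python
from typing import List, Sequence, Tuple

# One test turn: (gold_act, predicted_act, gold_entity, ranked_predicted_entities)
Turn = Tuple[str, str, str, Sequence[str]]


def act_accuracy(turns: List[Turn]) -> float:
    return sum(g == p for g, p, _, _ in turns) / len(turns)


def entity_matching_rate(turns: List[Turn], k: int) -> float:
    """EMR@k computed only over turns whose dialog act was predicted correctly."""
    scored = [(gold, ranked) for g, p, gold, ranked in turns if g == p]
    if not scored:
        return 0.0
    return sum(gold in list(ranked)[:k] for gold, ranked in scored) / len(scored)


turns: List[Turn] = [
    ("Recommendation", "Recommendation", "i_Basil", ["i_Roma", "i_Basil", "i_Siam"]),
    ("Open Question", "Open Question", "s_cuisine", ["s_cuisine", "s_price"]),
    ("Yes/no Question", "Recommendation", "v_Thai", ["i_Basil"]),  # wrong act: skipped by EMR
    ("Recommendation", "Recommendation", "i_Siam", ["i_Roma", "i_Basil", "i_Siam", "i_Pho"]),
]
```

Conditioning on the correct act keeps EMR from comparing entities of different types, e.g. a predicted item against a gold value.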

Online metrics
In addition to offline evaluation, we use a user simulator (see Appendix) to dynamically evaluate recommendation performance. This relaxes the offline assumption that all past turns are correct, which prevents interactive evaluation of conversations. Success Rate tracks whether the interaction with the user simulator yields the ground-truth item $e_t$. We use the same testing scenarios as in the offline evaluation. The maximum number of turns in a simulation is set to 11. We run the simulations 3 times and report the average.
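The online protocol can be sketched as a loop that lets an agent act for at most 11 turns per test scenario, counts a success when it recommends the ground-truth item, and averages over 3 seeded runs. The toy simulator (rejects wrong items, ignores questions), the agent signature and counting only agent turns are assumptions; the actual simulator is described in the Appendix.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

MAX_TURNS = 11
N_RUNS = 3
# agent(candidates, tried, rng) -> item to recommend, or None for a non-recommendation act
Agent = Callable[[List[str], List[str], random.Random], Optional[str]]


def simulate(agent: Agent, candidates: List[str], target: str, rng: random.Random) -> bool:
    tried: List[str] = []
    for _ in range(MAX_TURNS):
        rec = agent(candidates, tried, rng)
        if rec is None:
            continue  # a question turn; this toy simulator does not model answers
        if rec == target:
            return True
        tried.append(rec)  # the simulated user rejects the item
    return False


def success_rate(agent: Agent, scenarios: Sequence[Tuple[List[str], str]], seed: int = 0) -> float:
    rates = []
    for run in range(N_RUNS):
        rng = random.Random(seed + run)
        rates.append(sum(simulate(agent, c, t, rng) for c, t in scenarios) / len(scenarios))
    return sum(rates) / len(rates)


def no_repeat_recommender(candidates: List[str], tried: List[str], rng: random.Random) -> Optional[str]:
    remaining = [c for c in candidates if c not in tried]
    return rng.choice(remaining) if remaining else None


def never_recommends(candidates: List[str], tried: List[str], rng: random.Random) -> Optional[str]:
    return None


scenarios = [([f"i_{n}" for n in range(8)], "i_3"), (["i_a", "i_b"], "i_b")]
```

With at most 11 candidates, an agent that never repeats a rejected item always succeeds within the turn budget, which illustrates why a recommendation-only rule can score well on this metric.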

Compared Methods
RandomAgent: we implement a baseline agent that randomly picks a dialog act and a random candidate item/slot/value as the dialog policy.
RecAgent: this agent always chooses Recommendation as the dialog act and selects a random candidate item that has not been tried yet (it memorizes all previous trials). This is a strong (yet annoying) rule-based baseline that neither collects nor uses any user preference.
Memory Network (Sukhbaatar et al., 2015; Bordes et al., 2016): we adapt a memory network that encodes the user memory graph as triples; the memory can be updated as new triples are added. Note that memory networks cannot handle the open space policy because of the attention-based aggregation of triple memories. We therefore enumerate all possible combinations of dialog acts and entities in the user memory as the policy space. Specifically, all items in a scenario are indexed as $i_1, i_2, \ldots$ to differentiate candidate items for policy generation.
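The enumerated policy space for the memory-network baseline can be sketched as a flat list of (act, argument) classes, with scenario-specific items mapped to positional tokens $i_1, i_2, \ldots$ so one output layer can serve every scenario. The act-to-argument pairs follow the UMGR section; the helper names and toy entities are hypothetical.

```python
from typing import Dict, List, Tuple

# Act -> entity type it takes as argument.
ACT_ARG_TYPES = {"Recommendation": "items", "Open Question": "slots", "Yes/no Question": "values"}


def index_items(items: List[str]) -> Dict[str, str]:
    """Map a scenario's items to positional tokens i_1, i_2, ... shared across scenarios."""
    return {item: f"i_{n}" for n, item in enumerate(items, start=1)}


def enumerate_policy_space(entities: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Flat list of (act, argument) classes that a memory network scores with one softmax."""
    return [(act, e) for act, arg_type in ACT_ARG_TYPES.items() for e in entities[arg_type]]


item_index = index_items(["Basil", "Roma"])
space = enumerate_policy_space({
    "items": list(item_index.values()),
    "slots": ["s_cuisine"],
    "values": ["v_Thai", "v_cheap", "v_Italian"],
})
```

The size of this flat space grows with the product of acts and entities, and positional item tokens carry no item identity, which is one way to see why this baseline struggles with open space policy.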
The inputs of the memory network are the encoded dialog acts (the same as UMGR).We adopt 5 hops for memory networks.
Pretrained Embeddings: we pre-train graph embeddings and use them as the graph encoder for predicting the dialog policy (without the R-GCN layers of UMGR). The graph embeddings are trained on all scenarios in the training set using TransE-based graph prediction approaches (Nickel et al., 2016). While this approach is widely used in the related literature and carries cross-scenario knowledge, we show that pre-trained graph embeddings alone are sub-optimal for a particular user's scenario, and that the dialog policy needs to perform dynamic reasoning over the user memory graph.
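For reference, TransE scores a triple by how well the relation vector translates the head into the tail, $-\lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert$, and is typically trained with a margin loss against corrupted triples. The sketch below uses the L2 norm and hypothetical 2-d embeddings; the exact variant and training setup used for this baseline are not specified here.

```python
import math
from typing import List

Vector = List[float]


def transe_score(h: Vector, r: Vector, t: Vector) -> float:
    """TransE plausibility -||h + r - t||_2: a true triple should satisfy h + r ≈ t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def margin_loss(pos_score: float, neg_score: float, margin: float = 1.0) -> float:
    """Hinge loss asking a true triple to outscore a corrupted one by at least `margin`."""
    return max(0.0, margin - pos_score + neg_score)


# Hypothetical 2-d embeddings for (i_Basil, r_has_aspect, v_Thai) and a corrupted tail v_cheap.
i_basil, has_aspect, v_thai, v_cheap = [0.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]
pos = transe_score(i_basil, has_aspect, v_thai)
neg = transe_score(i_basil, has_aspect, v_cheap)
```

Such embeddings are fixed after pre-training, so they cannot reflect new sentiment triples added during a dialog, which is consistent with the limitation discussed above.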
UMGR (Proposed): this is the model proposed in Section 5. To enable zero-shot reasoning during inference, all items share the same embedding, so UMGR learns purely to leverage the graph structure for reasoning about the policy. We use 5 R-GCN layers, and all hidden states have size 384. The maximum number of past acts is set to 10. The loss weights α, β, γ, and δ are set to 1, 10, 10, and 100, respectively, based on the scales of the losses. We use a batch size of 160. We further investigate the following ablations of UMGR:
- Prev. User Act Only: this ablation uses only the most recent dialog act from the user, showing how many past dialog acts are needed for policy generation.
- No Dialog Acts: this study removes the dialog act encoder to investigate the importance of dialog acts for recommendation.
- Static G: this study uses the initial user memory graph without any updates during the conversation.
We use this study to demonstrate that dynamic updates of the user memory graph are crucial for reasoning about a better dialog policy.
Table 5: Results of both offline and online evaluation. EMR (entity matching rate) compares all types of predicted entities against annotated ones when the dialog act is predicted correctly; IMR (item matching rate) evaluates predicted items against the ground-truth item across all turns in a dialog.
Figure 5: Visualization of item-level conversational reasoning, given an example dialog.Darker color indicates more salient items for recommendation at each given turn (row), predicted by our UMGR model.

Results and Discussion
The results are summarized in Table 5. Overall, the proposed UMGR architecture outperforms the other baselines in both offline and online evaluation.
Ablations: We notice that dynamically updating the user memory graph with users' new preferences is crucial for recommendation, as indicated by UMGR - Static G, which forbids updating the user memory graph. Removing the previous dialog context degrades performance as expected (UMGR - Prev. User Act Only), although the UMGR architecture still maintains competitive performance. Similarly, although UMGR - No Dialog Acts does not take past dialog acts as input, its results on the non-act prediction metrics are relatively competitive. Both ablation studies indicate that the user memory graph contains enough information for the model to perform dialog reasoning.
UMGR vs. Memory Network. We notice that memory networks may not be suitable for complex reasoning over a user memory graph. Possible reasons include: (1) triples in memory are disconnected, which limits joint reasoning over multiple triples; (2) a memory network is not structure-preserving, which makes it hard to align entities in triples with the output policy, such as a ranking of items; (3) existing research using memory networks (Bordes et al., 2016; Eric and Manning, 2017; Madotto et al., 2018) assumes a static memory that carries a great amount of knowledge from training to testing. Memory networks may therefore not suit our zero-shot reasoning setting, where no user or item knowledge can be carried to testing directly.
UMGR vs. Rule-based Agent. We notice that RecAgent is a good rule-based baseline in terms of recommendation performance. One advantage of RecAgent is that it easily remembers the items recommended in previous turns. However, frequent recommendations can be annoying to the user.
UMGR vs. Pre-trained Graph Embeddings.
We confirm that static pre-trained graph embeddings provide general representations of memory graphs but have limited capability for reasoning about a particular user's scenario. This study indicates that UMGR is capable of personalized recommendation.
Discussion. We first examine the generated dialog acts. UMGR typically asks a few questions and then makes a few recommendations. We observe that UMGR may make more recommendations than the agent workers in MGConvRex. This may be caused by the frequent patterns of dialog acts in conversational recommendation: different types of non-recommendation acts are frequently followed by a recommendation act, and a neural network tends to favor such frequent patterns over the diverse details of reasoning. We believe more diverse and detailed reasoning is an important direction for future improvement. Meanwhile, we argue that human reasoning performance is very limited given the vast number of candidate items in real-world recommendation; learning from human behavior is just a beginning. We expect future work on automatic reasoning over large-scale user knowledge.
Visualization of Item-level Reasoning. Figure 5 shows, for an example dialog, the prominence scores of the candidate items for recommendation at each turn, as predicted by our model (darker color indicates more salient items).
At the beginning of the dialog, the prominence scores (and thus the ranking among the candidate items) are soft-initialized to reflect the user's offline preferences, as encoded in the user memory graph. At this point, UMGR already nearly predicts the ground-truth item. As the dialog progresses and the system collects (or confirms) new user knowledge or requests (e.g., updated slots, opinions on recommended items, "Toronto, ON", etc.), UMGR dynamically updates the ranking of the relevant items to reflect the online preferences. Overall, UMGR effectively incorporates both online and offline preferences through a structured user memory graph, allowing for natural interactions and accurate recommendations.

Conclusion
This paper introduces the novel problem of user memory graph reasoning for conversational recommendation. We plan to release a conversational recommendation dataset whose user memory graphs are grounded in the behaviors of real-world users. The proposed user memory graph accumulates knowledge about a user, over which dialog policies can be reasoned. We propose a baseline model called UMGR that reasons over such a user memory graph in an open policy space. UMGR is structure-preserving in policy generation and supports zero-shot reasoning over user memory graphs that have never been seen before. Experimental results demonstrate the effectiveness of UMGR across a wide spectrum of metrics.

A Appendix
This appendix contains two guidelines for building the MGConvRex dataset, a transcription guideline and an annotation guideline, followed by the statistics of the dataset and a sample implementation of the user simulator.
A.1 Transcription Guideline

A.1.1 Motivation
Getting irrelevant restaurant recommendations is a frustrating experience. The ideal recommendation system should be able to provide better recommendations by understanding your current needs, your restaurant preferences, and your restaurant history.

A.1.2 Overview
In this project, you will generate a dialog between an imaginary person (user) and an imaginary recommendation system (assistant¹⁰). You will play one of the two roles, which will be randomly assigned to you. You will automatically be paired with someone else who will play the other role.
User: A user is expected to interact with an assistant to get a restaurant recommendation.The user will already know his/her general restaurant preferences and also the exact name of the restaurant he/she wants to go to.Further, information about restaurants that the user has visited in the past will be available and shown to the user.
Assistant: An assistant is expected to interact with the user and work towards recommending a restaurant the user wants to go to in the future.The assistant will have access to information about restaurants that the user has previously visited and a list of candidate restaurants.

A.1.3 Task
You will be randomly assigned a single role: either user or assistant. You will see your assignment in the top left corner of the screen, "You are: the user" or "You are: the assistant".
User: You will interact with the assistant to get the correct restaurant recommendation. You will be provided with the following information:
• Restaurant preferences over 10 characteristics (or slots).
• The restaurant you will go to: the "Ground-Truth restaurant".
• Optionally, information about restaurants that you have visited in the past.
¹⁰ We refer to the agent as the "assistant" in these guidelines.
As a user player, you are expected to:
• Answer the questions the assistant asks about your preferences.
• Ask questions about the recommended restaurant to justify why you accept or reject the recommendation.
• If needed, use the information about your visited restaurants to help inform the assistant about your preferences.
• The frequency of characteristics (or slots) shared by multiple restaurants is indicated in parentheses, e.g., "(3) parking lot" means this user has been to 3 restaurants with parking lots.
• When you use information from your visited restaurants in one of your responses, make sure to click the "Use Fact" button.
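The "(3) parking lot" annotation is a per-value count over the visited restaurants. A minimal sketch of how such counts could be computed, assuming each visited restaurant is a mapping from slot to a list of values and that only values shared by more than one restaurant are displayed (both are assumptions about the UI, and the restaurants are hypothetical):

```python
from collections import Counter


def value_frequencies(visited):
    """Count how many visited restaurants carry each (slot, value) pair."""
    counts = Counter()
    for restaurant in visited:
        for slot, values in restaurant.items():
            for value in dict.fromkeys(values):  # de-duplicate, keep order
                counts[(slot, value)] += 1
    return counts


def render_shared(counts):
    """Format values shared by several restaurants, e.g. "(3) parking lot"."""
    return [f"({n}) {slot} {value}" for (slot, value), n in counts.most_common() if n > 1]


# Hypothetical visited restaurants: slot -> list of values.
visited = [
    {"parking": ["lot", "street"], "price": ["cheap"]},
    {"parking": ["lot"], "price": ["moderate"]},
    {"parking": ["lot"], "price": ["cheap"]},
]
print(render_shared(value_frequencies(visited)))  # ['(3) parking lot', '(2) price cheap']
```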
Assistant: You will interact with the user to give the desired recommendation (the ground-truth restaurant) to the user. You will be provided with the following information:
• Name of the user.
• A list of candidate restaurants and their characteristics (slots). One of the restaurants in this list is the desired, or ground-truth, restaurant the user is looking for.
• Optionally, the characteristics (slots) and values of the restaurants the user has visited (visited restaurants); see the definitions of slots below. The frequency of slots shared by multiple restaurants is indicated in parentheses, e.g., "(3) parking lot" means this user has been to 3 restaurants with parking lots. The visited restaurants section may or may not be given to you. If it is given, your goal is to use the information from visited restaurants (by clicking "Use Fact") as much as possible to provide the desired recommendation to the user.
To make an efficient recommendation, you are expected to:
• Ask the user questions about their restaurant preferences.
• If the visited restaurants are available, investigate their slots and values to reduce the number of questions you may need.
• Recommend restaurants to the user based on your knowledge of their preference, their visited restaurants, the information of the candidate restaurants, and from the answers the user gives to your questions.
• Intelligently apply the information the user gives to you to guide your conversation.
• Recommend the desired restaurant.
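Reducing the number of questions amounts to filtering candidates by what is already known and then asking about the slot that best separates the remaining candidates. The sketch below uses an entropy heuristic as one possible way to pick that slot; it is an assumption for illustration, not a procedure the guideline prescribes or that agent workers were given. Candidate names and values are hypothetical.

```python
import math
from collections import Counter


def consistent(candidates, known):
    """Keep candidates that carry every slot value the user has confirmed so far."""
    return {name: item_slots for name, item_slots in candidates.items()
            if all(value in item_slots.get(slot, set()) for slot, value in known.items())}


def slot_entropy(candidates, slot):
    """Entropy (bits) of how the candidates' value sets for `slot` are distributed."""
    groups = Counter(frozenset(item_slots.get(slot, set())) for item_slots in candidates.values())
    total = sum(groups.values())
    return -sum(c / total * math.log2(c / total) for c in groups.values())


def best_question(candidates, asked):
    """Pick the not-yet-asked slot that splits the remaining candidates most evenly."""
    open_slots = {s for item_slots in candidates.values() for s in item_slots} - set(asked)
    if not open_slots:
        return None
    return max(sorted(open_slots), key=lambda s: slot_entropy(candidates, s))


# Hypothetical candidate restaurants: name -> {slot: set of values}.
candidates = {
    "r1": {"category": {"thai"}, "price": {"cheap"}, "location": {"Phoenix,AZ"}},
    "r2": {"category": {"pizza"}, "price": {"cheap"}, "location": {"Phoenix,AZ"}},
    "r3": {"category": {"sushi"}, "price": {"expensive"}, "location": {"Phoenix,AZ"}},
    "r4": {"category": {"thai"}, "price": {"cheap"}, "location": {"Tempe,AZ"}},
}
remaining = consistent(candidates, {"location": "Phoenix,AZ"})
print(sorted(remaining))                             # ['r1', 'r2', 'r3']
print(best_question(remaining, asked=["location"]))  # category
```

Here "location" is already known, and "category" is chosen over "price" because it splits the three remaining candidates into three groups rather than two.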

A.1.4 Instructions
This section describes the details of transcription.
In general, transcribers are required to follow the predefined dialog acts, slots, and values, but are free to compose utterances based on this predefined metadata.
Dialog Acts are the intents of one utterance from a player. Note that the user and assistant have their own sets of dialog acts, as shown in Table 1.
You will need to write the values exactly as you see them in the UI, including the underscores "_" and commas "," and excluding "|". For example, type "Bonfyre_American_Grille" but not "Bonfyre American Grille". The full lists of values and their slots are at the end of the guidelines.
Items and their Names: Each item (restaurant) has an item name and multiple values with their associated slots. An item is typically associated with a recommendation act from the assistant side. When recommending a restaurant (item), you are expected to mention the restaurant name (item name), following the same rule as writing a value in an utterance.
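The rule for writing values verbatim lends itself to a simple automatic check during quality control. A hypothetical helper, assuming the annotated values of an utterance are available as plain strings and that exact substring matching is the intended test:

```python
def check_utterance(utterance, values):
    """Return guideline violations for one transcribed utterance (hypothetical QA helper)."""
    problems = [f"value not written verbatim: {v!r}" for v in values if v not in utterance]
    if "|" in utterance:
        problems.append('contains the UI separator "|"')
    return problems


print(check_utterance("Are you still living in the Phoenix,AZ area?", ["Phoenix,AZ"]))  # []
print(check_utterance("Do you live in Phoenix, AZ?", ["Phoenix,AZ"]))
# ["value not written verbatim: 'Phoenix,AZ'"]
```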

A.1.5 Important Notes
During transcription, it is important to keep the following in mind:
• A dialog can end with either a user or an assistant response.
• However, the person who plays the user is the one who terminates the session by pressing the "Dialog is done!" button.
• The user should NEVER give all of their preferences to the assistant in a single utterance.
• The user should NEVER give the ground-truth restaurant to the assistant.
• When you use content from the visited restaurants in your response, make sure to click the corresponding "Use Fact" buttons before sending your response. The click will be recorded.
• If the user player has sent more than 10 responses (20 including the responses from the assistant), it is up to the user player to decide whether to stop the current dialog or to continue.
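The two NEVER rules for the user and the 10-response threshold can be checked mechanically. A hypothetical QA helper, assuming preference values and the ground-truth name are plain strings and that mentions are detected by substring matching (a simplification that misses paraphrases):

```python
def check_user_turn(utterance, ground_truth, preference_values):
    """Flag user turns that break the two NEVER rules (hypothetical QA helper)."""
    issues = []
    if ground_truth.lower() in utterance.lower():
        issues.append("reveals the ground-truth restaurant")
    mentioned = [v for v in preference_values if v in utterance]
    if len(preference_values) > 1 and len(mentioned) == len(preference_values):
        issues.append("gives all preferences in a single utterance")
    return issues


def user_may_stop(user_responses):
    """After more than 10 user responses, the user player may choose to stop."""
    return user_responses > 10


prefs = ["cheap", "street parking"]  # hypothetical preference values
print(check_user_turn("Somewhere cheap with street parking, please.", "Example Bistro", prefs))
# ['gives all preferences in a single utterance']
print(check_user_turn("Is Example Bistro any good?", "Example Bistro", prefs))
# ['reveals the ground-truth restaurant']
print(user_may_stop(11))  # True
```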
The following actions should be avoided.
• Do not discuss the transcription task itself with the other person in the transcribed dialog, and do not go off-topic.
• Do not share any of your personal information. Always stay "in character", i.e., speak as the user or the assistant.
• NO INDECENCY / DISRESPECT / HARASSMENT. Keep your messages decent and respectful towards the other person. Any violation will result in a ban from further tasks.
• Do not directly copy any of the utterances from this guideline or UI.
• Do not repeat or template your answers; that is, do not prepare one set of responses in advance and then make small changes to it over and over. Please always generate unique and new responses.

A.1.6 Feedback
After the transcription of one dialog is over, both sides need to give feedback about the transcribed dialog, including:
1. Rate the dialog (1-5) based on the smoothness and coherence of the whole dialog and whether it closely follows this guideline.
2. Rate the other side (1-5): whether the other side closely follows this guideline.

A.2 Annotation Guideline
In this task, you will get a transcribed dialog between a user and an assistant, in which the assistant helps the user find the desired restaurant to go to. You will annotate the utterances with dialog acts, slots, values, item names, and sentiment on values or item names. For your reference, the transcription guideline is detailed in Section A.1. This annotation task is further supported by a QA process before and during the annotation to resolve hard cases.
From Figure 9, we can see that agent workers use both Open questions and Yes/no questions to collect preferences. Yes/no questions are more frequent in scenarios with history, where they serve to confirm users' preferences. Figure 10 shows the distribution of slots for Open and Yes/no questions asked by the agent player: category, location, and price are the slots mainly used to collect user preferences and to distinguish among the candidate items in C. We further show an example dialog in Table 6.
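The slot distribution in Figure 10 is a count of slots per question act over agent utterances. A minimal sketch, assuming a hypothetical record schema for annotated utterances and the act labels "Open" and "YNQ"; the utterances are adapted from Table 6 plus one made-up Open question:

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AnnotatedUtterance:
    """Hypothetical record for one annotated utterance."""
    speaker: str                     # "user" or "assistant"
    text: str
    acts: list[str]                  # dialog acts, e.g. ["YNQ"]
    slot_values: list[tuple] = field(default_factory=list)  # (slot, value, sentiment)


def slots_by_question(utterances, question_acts=("Open", "YNQ")):
    """Count slots mentioned in agent questions, per question act."""
    dist = {act: Counter() for act in question_acts}
    for u in utterances:
        if u.speaker != "assistant":
            continue  # only agent questions are counted
        for act in u.acts:
            if act in dist:
                for slot, _value, _sentiment in u.slot_values:
                    dist[act][slot] += 1
    return dist


dialog = [
    AnnotatedUtterance("assistant", "What kind of food are you in the mood for?", ["Open"],
                       [("category", None, None)]),
    AnnotatedUtterance("assistant", "Are you still living in the Phoenix,AZ area?", ["YNQ"],
                       [("location", "Phoenix,AZ", None)]),
    AnnotatedUtterance("assistant", "Do you want a full bar with your meal?", ["YNQ"],
                       [("alcohol", "full bar", None)]),
    AnnotatedUtterance("user", "No just beer and wine are fine.", ["ANS"],
                       [("alcohol", "beer and wine", "pos")]),
]
dist = slots_by_question(dialog)
print(dict(dist["YNQ"]))  # {'location': 1, 'alcohol': 1}
```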

A.4 User Simulator
Our online evaluation is conducted against user simulators in a simulation environment within our developed framework. Here we briefly describe one simulator, given in Algorithm 1. Note that although our predefined dialog acts for the user can be either passive or active (as in Table 1), we mostly focus on a pas-

Figure 1: A conceptual illustration of Memory-grounded conversational recommendation. (1) Past (offline) user preferences are captured as an initial Memory Graph (MG). (2) Conversational recommendation allows users to express preferences and requirements through dialogs. (3) Our MGConvRex corpus is grounded on user memory, which represents the user's past history as well as newly added preferences.
(1) For each user, we draw |H| ∈ [5, 20] visited items and |T| = 1⁷ item as the ground-truth items T, and use the values and their associated slots of the ground-truth items as the user preference P. (2) We negatively sample |C| − |T| items and combine them with the ground-truth items T to form the candidate items C.
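The two sampling steps can be sketched as follows. Several details are assumptions rather than the paper's specification: H and T are drawn from the user's own visited items, negatives are drawn uniformly from catalog items the user never visited, and |C| defaults to 10. The catalog and user below are synthetic.

```python
import random


def sample_scenario(user_items, catalog, rng, n_candidates=10, hist_range=(5, 20), n_truth=1):
    """Sample (H, T, C, P) for one user; |C| = n_candidates is an assumed default."""
    visited = sorted(user_items)  # sort first so a seeded rng is reproducible
    lo, hi = hist_range
    if len(visited) < lo + n_truth:
        raise ValueError("user has too few visited items")
    rng.shuffle(visited)
    # (1) |H| in [5, 20] visited items, then |T| = 1 ground-truth item
    n_hist = rng.randint(lo, min(hi, len(visited) - n_truth))
    history = visited[:n_hist]
    truth = visited[n_hist:n_hist + n_truth]
    # (1) user preference P = values (and their slots) of the ground-truth items
    preference = {}
    for item in truth:
        for slot, values in catalog[item].items():
            preference.setdefault(slot, set()).update(values)
    # (2) |C| - |T| negatives from catalog items the user never visited
    pool = [item for item in catalog if item not in user_items]
    negatives = rng.sample(pool, n_candidates - n_truth)
    candidates = truth + negatives
    rng.shuffle(candidates)
    return history, truth, candidates, preference


catalog = {f"item{i}": {"price": {"cheap" if i % 2 else "expensive"}} for i in range(60)}
user_items = {f"item{i}" for i in range(25)}
H, T, C, P = sample_scenario(user_items, catalog, random.Random(0))
print(len(H) >= 5, len(T), len(C))  # True 1 10
```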

Figure 2: Distribution of dialog acts from the agent side; "w/ his" indicates scenarios that include users' history.

Figure 3: User memory graph construction and updates based on the dialog in Table 4.

Figure 6: Screenshot of the transcription UI for the user.

Figure 7: Screenshot of the transcription UI for the assistant.
Speaker | Utterance | Dialog act
Assistant | …! Are you still living in the Phoenix,AZ location area.? | YNQ
User | Yes. [pos on] | ANS
Assistant | Ok great! Do you want a full bar alcohol with your meal? | YNQ
User | No just beer and wine alcohol [pos on] are fine. | ANS
Assistant | My system shows The Nash item restaurant. They also offer free wifi wifi. | REC
User | Is the food cheap price [pos on] over there? I'm tight on budget. | YNQ, inform
Assistant | It is on the cheap price side of the restaurants. | ANS
User | Great. I'll try them out. Thanks. | Reply & Thanks
Assistant | Thank you and enjoy your meal | Thanks
Table 6: An example dialog in MGConvRex with slots and sentiment polarities annotated.

Figure 8: Distribution of user dialog acts.

Figure 9: Distribution of agent dialog acts.

Figure 10: Distribution of slots asked in Open and Yes/no questions from the agent.

Table 1: Dialog acts for agent and user. A: the spans of items/slot values are italicized.

Table 2: Statistics of the dataset (columns: History, # of Dial., # of Turns, Avg. # of Turns). Dialogs w/ or w/o History indicates whether scenarios include visited items H.

Table 3: Ontology of the user memory graph: bolded relations are used for graph updates or accumulation.

Table 1.
You can only use these pre-defined dialog acts in your utterances. Slots refer to 10 pre-defined characteristics of restaurants. Values: each slot is associated with multiple possible values; for example, the slot Parking can take the value street. Note that a slot can take multiple values at the same time. In the UI, these values are separated by "|". For example, Parking = street | garage means that a restaurant has both street and garage parking. DO NOT include "|" in your responses; instead, use one or multiple values naturally in the utterance, e.g., "I prefer street or garage parking." You do not have to write out all the values of a slot in one utterance. For example, the category slot usually has many values, and you do not need to list them all in your utterance.
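The "|"-separated UI format and the instruction to mention values naturally can be illustrated with a small parser and formatter. Both functions are hypothetical helpers, not part of the transcription tool:

```python
def parse_slot(line):
    """Parse a UI line such as 'Parking = street | garage' into (slot, values)."""
    slot, _, raw = line.partition("=")
    values = [v.strip() for v in raw.split("|") if v.strip()]
    return slot.strip(), values


def mention(values, conj="or"):
    """Join values for an utterance without the '|' separator."""
    if len(values) <= 2:
        return f" {conj} ".join(values)
    return ", ".join(values[:-1]) + f", {conj} " + values[-1]


slot, values = parse_slot("Parking = street | garage")
print(slot, values)                                   # Parking ['street', 'garage']
print(f"I prefer {mention(values)} {slot.lower()}.")  # I prefer street or garage parking.
```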