On a Chatbot Conducting Dialogue-in-Dialogue

We demo a chatbot that delivers content in the form of virtual dialogues automatically produced from plain texts extracted and selected from documents. This virtual dialogue content is provided in the form of answers derived from the found and selected documents split into fragments, and questions are automatically generated for these answers.


Introduction
Presenting knowledge in dialogue format is a popular way to communicate information effectively. It has been demonstrated in games, news, commercials, and educational entertainment. Usability studies have shown that, for people acquiring information, dialogues often communicate it more effectively and persuade more strongly than monologues (Cox et al., 1999; Craig et al., 2000).
We demo a chatbot that delivers content in the form of virtual dialogues automatically produced from plain texts extracted and selected from documents. Given an initial query, this chatbot finds documents, extracts topics from them, organizes these topics in clusters according to conflicting viewpoints, receives the user's clarification on which cluster is most relevant to them, and provides the content for this cluster. This content is presented in the form of a virtual dialogue where the answers are derived from the found and selected documents split into fragments, and questions are automatically generated for these answers.
A virtual dialogue is defined as a multi-turn dialogue between imaginary agents obtained as a result of content transformation. It is designed with the goal of effective information representation and is intended to look as close as possible to a genuine dialogue. Virtual dialogues as search results turn out to be a more effective means of information access than the original documents provided by a conventional chatbot or a search engine.

Dialogue Construction from Plain Text
To form a dialogue from text sharing information or explaining how to do things, we need to split it into parts which will serve as answers.
Then a question needs to be formed for each answer. The cohesiveness of the resultant dialogue should be assured by the integrity of the original text; the questions are designed to "interrupt" the speaker, similar to how journalists conduct interviews. We employ a general mechanism for converting a text paragraph of various styles and genres into dialogue form. The paragraph is split into text fragments serving as a set of answers, and questions are automatically formed for some of these text fragments. The problem of building a dialogue from a text $T$ is formulated as splitting it into a sequence of answers $A = [A_1, \ldots, A_n]$ to form a dialogue $[A_1, \langle Q_1, A_2 \rangle, \ldots, \langle Q_{n-1}, A_n \rangle]$, where $A_i$ answers $Q_{i-1}$ and possibly previous questions, and $\bigcup_i A_i = T$. $Q_{i-1}$ needs to be derived from the whole or a part of $A_i$ by linguistic means and generalization; some inventiveness may also be required to make these questions sound natural. To achieve this, we try to find a semantically similar phrase on the web and merge it with the candidate question.
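The dialogue structure in this formulation can be illustrated with a minimal Python sketch. It assumes the answers and questions have already been produced; the names Turn, build_dialogue, and covers_text are hypothetical and are not part of the described system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    question: Optional[str]  # None for the opening answer A_1
    answer: str


def build_dialogue(answers: List[str], questions: List[str]) -> List[Turn]:
    """Interleave answers A_1..A_n with questions Q_1..Q_{n-1}."""
    if not answers:
        return []
    if len(questions) != len(answers) - 1:
        raise ValueError("expected one question per answer after the first")
    # A_1 opens the dialogue without a question.
    turns = [Turn(question=None, answer=answers[0])]
    # Each later answer A_i is paired with the question Q_{i-1} it answers.
    for question, answer in zip(questions, answers[1:]):
        turns.append(Turn(question=question, answer=answer))
    return turns


def covers_text(answers: List[str], text: str) -> bool:
    """Check that the answers, joined in order, reproduce the text
    (whitespace differences are ignored)."""
    def normalize(s: str) -> str:
        return " ".join(s.split())
    return normalize(" ".join(answers)) == normalize(text)
```

The covers_text check mirrors the requirement that the answers together make up the original text, which is what keeps the resulting dialogue cohesive.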
The main foundation of our dialogue construction algorithm is Rhetorical Structure Theory (RST; Mann and Thompson, 1988), which represents a text as a discourse tree over elementary discourse units (EDUs). Rhetorical relations between the EDUs are usually binary and anti-symmetric, which distinguishes the main units (nuclei) from the subordinate ones (satellites). Thus, once we split a text into EDUs, we know which text fragments will serve as answers to questions: the satellites of all relations. The Elaboration relation is the default, for which a What-question to a verb phrase is formed. The Background relation yields another What-question for the satellite '…as <predicate>-<subject>'. Finally, the Attribution relation is the basis of a "What/who is the source" question.
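The mapping from rhetorical relations to question types can be sketched as a simple lookup. This is a simplified illustration only: the template wording, the hint argument (a generalized phrase assumed to be derived from the satellite by some earlier step), and the function name question_for are assumptions, not the system's actual rules.

```python
# Hypothetical question templates, one per relation named in the text.
QUESTION_TEMPLATES = {
    "elaboration": "What {hint}?",
    "background": "What {hint}?",
    "attribution": "Who is the source of {hint}?",
}


def question_for(relation: str, hint: str) -> str:
    """Pick a question template by rhetorical relation.

    Relations without their own template fall back to Elaboration,
    which the text describes as the default.
    """
    template = QUESTION_TEMPLATES.get(
        relation.lower(), QUESTION_TEMPLATES["elaboration"]
    )
    # Drop trailing sentence punctuation so the template adds its own.
    return template.format(hint=hint.strip().rstrip(".?"))
```

For example, question_for("Elaboration", "does its strategy handle") produces 'What does its strategy handle?'.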

Boris Galitsky 1, Dmitry Ilvovsky 2, and Elizaveta Goncharova 2
1 Oracle Inc., Redwood Shores, CA; 2 National Research University Higher School of Economics
boris.galitsky@oracle.com; dilvovsky@hse.ru; egoncharova@hse.ru

A trivial approach to question generation is a simple conversion of a satellite EDU into a question. But this would make the question too specific and unnatural, such as 'the linchpin of its strategy handled just a small fraction of the tests then sold to whom?'. Instead, a natural dialogue should be formed with more general questions like 'What does its strategy handle?'.
An example of converting a text into a virtual dialogue is shown in Figure 1. First, the text is split into EDUs, which act as answers in the virtual dialogue. The questions generated from them are shown in angle brackets and bolded. Each leaf of the discourse tree, which corresponds to an EDU, starts with 'TEXT'. Rhetorical relations (in italics) are followed by the tag 'LeftToRight' or 'RightToLeft', which specifies the dependency direction between the units, that is, which of the following units is the nucleus and which is the satellite.
Figure 1: Discourse tree of an example text, with generated questions in angle brackets.

attribution (RightToLeft) <who provided the evidence of responsibility?>
TEXT: Dutch accident investigators say
TEXT: that evidence points to pro-Russian rebels as being responsible for shooting down plane .
contrast (RightToLeft)
attribution (RightToLeft)
TEXT: The report indicates joint
TEXT: where the missile was fired from
elaboration (LeftToRight) <what else does report indicate?>
TEXT: and identifies
TEXT: who was in control and pins the downing of the plane on the pro-Russian rebels .
elaboration (LeftToRight)
attribution (RightToLeft)
TEXT: However , the Investigative Committee of the Russian Federation believes
elaboration (LeftToRight)
TEXT: that the plane was hit by a missile from the air
<where was it produced?> TEXT: which was not produced in Russia .
attribution (RightToLeft)
TEXT: At the same time , rebels deny
<who denied about who controlled the territory?> TEXT: that they controlled the territory from which the missile was supposedly fired

The process of building a dialogue from text is shown in Figure 2. Each paragraph of a document is converted into a dialogue by building a communicative discourse tree for it and then generating questions from its satellite elementary discourse units. The current chatbot is a development of a previously built tool that conducted task-oriented conventional dialogues (Galitsky et al., 2017).

Evaluation of Effectiveness
To evaluate the effectiveness of information delivery via virtual dialogues, we compare conventional chatbot sessions, where users were given plain-text answers, with sessions where users were given content via virtual dialogues. The results on the comparative usability of conventional and virtual dialogues are given in Table 1. We assess dialogues with respect to the following usability properties, averaged over the number of experiments.
The speed of arriving at the sought piece of information (first column). It is measured as the number of iterations (the number of user utterances) preceding the final reply of the chatbot that provided the answer wanted by the user. We count the number of steps only if the user confirms that she accepts the answer.
The speed of arriving at a decision to commit a transaction, such as a purchase or reservation, or a product selection (second column). A user is expected to accumulate sufficient information, and this information, such as reviews, should be convincing enough for making such a decision. For both of these measures, the lower the value, the more relevant the information delivered via the dialogue.
We also measure how many entities (in the linguistic sense) were explored during a session with the chatbot (third column). We are interested in how thorough and comprehensive the chatbot session is and how much a user actually learns from it. This assessment sometimes runs opposite to the above two measures but is nevertheless important for understanding the overall usability of various conversational modes.
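As an illustration, the averaging behind these measures could be computed as in the following sketch. The Session record and its field names are hypothetical, and the sketch covers only the first and third measures; the decision-speed measure would be averaged in the same way as the first.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Set


@dataclass
class Session:
    user_utterances: int   # iterations before the final chatbot reply
    answer_accepted: bool  # did the user confirm the answer?
    entities: Set[str]     # entities explored during the session


def mean_iterations(sessions: List[Session]) -> Optional[float]:
    """Average number of iterations, counting only sessions in which
    the user confirmed that the answer was accepted."""
    accepted = [s.user_utterances for s in sessions if s.answer_accepted]
    return mean(accepted) if accepted else None


def mean_entities(sessions: List[Session]) -> Optional[float]:
    """Average number of distinct entities explored per session."""
    counts = [len(s.entities) for s in sessions]
    return mean(counts) if counts else None
```

Sessions without a confirmed answer are excluded from the iteration average but still contribute to the entity count, since exploration happens whether or not the final answer is accepted; the latter is a choice made for this sketch.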
We do not compare precision and recall of search sessions with either dialogue mode since the same information is delivered, but in distinct modes.
In the first and second rows, we assess the standalone systems. Virtual dialogues take fewer iterations on average for information access (4.1 compared to 4.6) and slightly fewer iterations for decisions (6.0 compared to 6.3) than conventional dialogues do.
In the bottom two rows, we observe the usability of the hybrid system. Notice that the bottom row corresponds to the inverse architecture, where a virtual dialogue is followed by a conventional one. This scenario proceeds from right to left: the results of the first step are shown in the last three columns of the table, and the values of the first three columns are calculated afterwards. When a conventional dialogue is followed by a virtual one, a lower portion of users is satisfied by the first step than in the inverse architecture. Thus, the inverse architecture requires considerably fewer iterations for the user to be satisfied with the answer and make a final decision.

Sample ChatBot session
We present an exploratory session that combines information delivery in the form of both traditional textual answers (conventional dialogue) and a virtual dialogue. The chatbot session is shown in Figure 3.
The dialogue starts from the user query, 'advantages and new features of 5G'. The chatbot consults the sources (e.g. public URLs) and extracts the content from each page (or document) expected to be relevant to the query. In this example seven URLs were processed, ranging from domain-specific portals to general knowledge portals like Quora.com. Then the chatbot forms the list of topics extracted from these search results so that the user can select one of interest.
Once the chatbot forms the topics for clarification of the user's search intent, it shows them as a list. In Fig. 3 the list of topics proposed by the chatbot is underlined, and the topics are numbered from 1 to 5. The user selects a topic of interest and requests a specific answer via the topic number or the topic expression ('[5]' or 'next stage in technology'). Once the answer is read, there are multiple options (yes/more/ … / virtual dialogue):
• navigate to the next answer from the chatbot list;
• navigate to a specific answer from the chatbot list;
• reject this answer and attempt to reformulate the query;
• reduce search to a specified web domain (such as quora.com, for example);
• proceed in the same direction to more search results in the form of a virtual dialogue;
• accept the answer and conclude the session.

In the example, the user selects the virtual dialogue option and the chatbot builds a virtual dialogue. It is a conversation among imaginary people in which the topic stays the same, matching the original query. The virtual dialogue is shown in the bottom frame (Fig. 3). As long as an imaginary chatbot responds to the same person, the dialogue is intended to stay cohesive; coreferences in the follow-up questions are maintained. The main dialogue can be viewed as a meta-level one, and the object-level dialogue is naturally embedded into it. Now the user can either browse the built virtual dialogue or search it to find a fragment of conversation that is relevant to the user's current exploration intent. If the user types the query 'Are the features right for me?', they are directed to the virtual dialogue fragment where some other users are discussing whether the technology is 'right for them'. The search matches the query against the fragments of the original text, the generated questions, or both.

Piwek et al. (2007) were pioneers of the automated construction of dialogues, proposing the Text2Dialogue system.
The authors provided a theoretical foundation of the mapping that the system performs from RST structures to Dialogue representation structures. The authors introduced a number of requirements for a dialogue generation system (robustness, extensibility, and variation and control) and reported on the evaluation of the mapping rules.
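Returning to the sample session, the search over a built virtual dialogue, which matches a query against answer fragments, generated questions, or both, can be sketched as follows. Plain word overlap is used here purely as a stand-in, since the actual matching method is not specified; the function names and the scope parameter are hypothetical.

```python
import re
from typing import List, Optional, Set, Tuple

Turn = Tuple[str, str]  # (generated question, answer fragment)


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric word set of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_dialogue(
    query: str, dialogue: List[Turn], scope: str = "both"
) -> Optional[int]:
    """Return the index of the turn sharing the most words with the query,
    or None if no turn shares any word.

    scope selects what is searched: "questions", "answers", or "both".
    """
    if scope not in ("questions", "answers", "both"):
        raise ValueError("scope must be 'questions', 'answers', or 'both'")
    query_words = tokens(query)
    best_index, best_score = None, 0
    for i, (question, answer) in enumerate(dialogue):
        if scope == "questions":
            searchable = question
        elif scope == "answers":
            searchable = answer
        else:
            searchable = question + " " + answer
        score = len(query_words & tokens(searchable))
        if score > best_score:  # ties keep the earliest turn
            best_index, best_score = i, score
    return best_index
```

With a dialogue containing a turn whose question is 'Is the technology right for me?', the query 'Are the features right for me?' is matched to that turn when questions are included in the search scope.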

Related Work and Conclusions
An important body of work concerns tutorial dialogue systems. Some of the work in that area focuses on authoring tools for generating questions, hints, and prompts. Typically, however, these are single utterances by a single interlocutor rather than an entire conversation between two agents. Some researchers have concentrated on generating questions together with possible answers, such as multiple-choice test items, but this work is restricted to a very specific type of Q/A pairs (Mitkov et al., 2006).
Dialogue acts are an important feature that differentiates a dialogue from plain text. The proposed algorithm for building virtual dialogues can assist with building domain-specific chatbot training datasets. The recently released DailyDialog dataset (Li et al., 2017) is the only dataset that has utterances annotated with dialogue acts and is large enough for learning conversation models.
We proposed a novel mode of chatbot interaction via virtual dialogue. It addresses the sparseness of dialogue data on the one hand and the convincingness and perceived authenticity of information presented via dialogues on the other. We quantitatively evaluated the improvement in user satisfaction with virtual dialogue in comparison to regular chatbot replies and confirmed the strong points of the former. We conclude that virtual dialogue is an important feature related to social search to be leveraged by a chatbot.
Chatbot demo videos (please check the 10-minute video) and instructions on how to use the chatbot are available at https://github.com/bgalitsky/relevance-based-on-parse-trees in the "What is new?" section.