Paving the way towards counterfactual generation in argumentative conversational agents

Counterfactual explanations present an effective way to interpret predictions of black-box machine learning algorithms. Whereas there is a significant body of research on counterfactual reasoning in philosophy and theoretical computer science, little attention has been paid to counterfactuals with regard to their explanatory capacity. In this paper, we review the methods of argumentation theory and natural language generation from which counterfactual explanation generation could benefit most, and discuss prospective directions for further research on counterfactual generation in explainable Artificial Intelligence.


Introduction
Automatic decision-making systems using black-box machine learning (ML) algorithms are now widely used in various complex domains, from legislation (Greenleaf et al., 2018) to health care (Gargeya and Leng, 2017). However, such systems cannot be trusted blindly, as their output often comes unexplained to end users (Rudin, 2018). As a result, there is a lack of confidence in such automatic decisions, caused by their low degree of interpretability (Ribeiro et al., 2016).
The need for intelligent systems to explain their decisions has driven a considerable amount of research in the past decades (Biran and Cotton, 2017). However, advances in the social sciences impose novel challenges on explainable agents. For example, recent findings from cognitive science indicate that the key feature of explanations is their contrastiveness (Miller, 2019), that is, the ability to reflect on alternative scenarios to events that actually happened. Whereas little research has been performed on the generation of such counterfactual explanations, we believe that equipping virtual assistants and recommendation systems with the ability to generate them should greatly increase their acceptance among end users.
In this paper, we briefly review prospective methods for addressing the problem of counterfactual explanation generation. Subsequently, we aim to further shape the line of research devoted to counterfactual analysis for explainable Artificial Intelligence (AI) by pointing to the existing field-specific theoretical foundations and potential directions for its algorithmic design. As a result, this work supports a discussion on prospective methods for argumentative conversational agent development.
The rest of the manuscript is organised as follows. Section 2 inspects definitions of a counterfactual explanation and reviews existing generation approaches to counterfactual explanations. Section 3 describes the most prominent formal argumentation frameworks as a theoretical basis for counterfactual analysis. Section 4 discusses the classification of and recent advances in developing argumentative conversational agents in the context of counterfactual generator implementation. Finally, we conclude with outlining open challenges relevant for counterfactual explanation generation in Section 5.

Counterfactual explanations
Explanations are argued to be contrastive (Miller, 2019). According to Miller, people are not satisfied with mere direct explanations in the form of causal relations between an antecedent and a consequent, but also want to know why an alternative (or opposing) event could not have happened. Furthermore, Pearl and Mackenzie (2018) argue that it is the ability to produce such contrastive statements, referred to as counterfactuals, that lies at the top of human reasoning.
In ML, a counterfactual explanation describes an alternative (hypothesised) situation that is as similar as possible to the original event in terms of its feature values while having a different outcome prediction ("the closest possible world") (Molnar, 2019). When searching for a suitable counterfactual explanation, the distance between a given piece of factual information and its counterpart is minimised while the outcome is required to differ, so that the counterfactual presumes only the most relevant alterations to the original fact. In addition, counterfactuals capture contextual information, as they describe "a dependency on the external facts that led to a decision" (Wachter et al., 2018). As a result, explanations supported by counterfactuals are likely to gain acceptance among end users.
While the general understanding of the concept of counterfactuals is shared among researchers, there exist several interpretations of this phenomenon. As counterfactuals are generally assumed to have a clear connection with causation (Pearl and Mackenzie, 2018), they are often viewed as non-observable potential outcomes that would have happened in the absence of the cause (Shadish et al., 2002). In terms of causality, they are informally defined as conditional statements of the form: "If event X had not occurred, event Y would not have occurred" (Lewis, 1973). However, Wachter et al. (2018) propose a causation-free definition of an unconditional counterfactual statement based on the idea of the subject's disbelief in a given hypothetical situation. On the other hand, counterfactuals are also sometimes referred to as "conditional connectives" in conditional logic (Besnard et al., 2013).
In recent years, there have been several attempts to approach the problem of counterfactual explanation generation. Wachter et al. (2018) suggested an approach for calculating counterfactuals based on the use of the Manhattan distance. Sokol and Flach (2018) adopted this approach to implement a counterfactual explanation generator for a decision tree-based AI system. In addition, Hendricks et al. (2018) proposed a model where candidate counterfactual pieces of evidence are selected from the set of all noun phrases of the corresponding textual descriptions of input images. Such evidence is then verified to be absent in the original image so that it can be used in the output counterfactual explanation. A rule-based system is then used to generate fluent negated explanations. Later, Birch et al. (2019) introduced an arbitrated dispute tree model, arguing that the explanations generated by their model are indeed contrastive in accordance with the principles proposed by Miller (2019), as opposite outcomes are presented for all cases. Furthermore, the corresponding features and stages are explicitly found for cases opposing the focus case.
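To make the distance-minimisation view concrete, the following toy sketch searches for a Wachter-style counterfactual by brute force over a grid of perturbations, minimising the Manhattan (L1) distance to the factual instance while requiring the predicted label to flip. The `predict` rule and all feature values are purely illustrative assumptions, not taken from any of the cited systems.

```python
import itertools

def predict(x):
    # Stand-in black box: approve (1) iff income + 2 * savings > 10.
    return int(x[0] + 2 * x[1] > 10)

def counterfactual(x, step=1.0, max_radius=10):
    """Brute-force search over a grid of perturbations, returning the
    candidate with a different prediction and minimal L1 distance."""
    target = 1 - predict(x)
    best, best_dist = None, float("inf")
    offsets = [i * step for i in range(-max_radius, max_radius + 1)]
    for dx in itertools.product(offsets, repeat=len(x)):
        cand = tuple(xi + di for xi, di in zip(x, dx))
        if predict(cand) == target:
            dist = sum(abs(di) for di in dx)  # Manhattan distance
            if dist < best_dist:
                best, best_dist = cand, dist
    return best, best_dist

x = (4.0, 2.0)                 # factual instance, rejected (4 + 4 <= 10)
cf, dist = counterfactual(x)   # closest grid point with a flipped outcome
print(cf, dist)
```

Real generators replace the exhaustive grid search with gradient-based or heuristic optimisation and add plausibility constraints on the features, but the objective is the same: the smallest change that alters the outcome.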
As shown above, the problem of counterfactual explanation generation touches on several topics from philosophy, (computational) linguistics, and AI. While this leaves room for developing novel synergistic methods and algorithms that combine insights from all the relevant fields, the potential challenges in developing such tools multiply. For example, the fact that certain types of counterfactual explanations are preferred over their counterparts (Byrne, 2019) places further restrictions on newly developed frameworks, such as the need to design heuristics that reduce the search space of the most relevant counterfactual explanations in accordance with such additional criteria.
In conclusion, counterfactual explanations are likely to enrich the conversational interfaces of any system that is to be considered explainable. However, counterfactuals produced directly from ML algorithm predictions lack coherence and appear unreliable from an ethical point of view (Kusner et al., 2017). Moreover, they usually do not involve the user in an extensive dialogic interaction, which makes them self-explanatory only in a limited number of cases. Therefore, we hypothesise that a deeper formalisation of counterfactuals is likely to overcome these weaknesses.

Formal argumentation
Formal argumentation (Baroni et al., 2018) provides practitioners with a natural form of counterfactual explanation formalisation. Indeed, argumentation is claimed to mimic human reasoning (Cerutti et al., 2014). As such, it offers a set of tools that have become widely applicable to interpreting the output of ML algorithms. Formal argumentation embraces a wide range of theoretical frameworks, from argumentation schemes (Walton et al., 2008) to dialogue games (Carlson, 1985), among others. In this paper, we focus on abstract argumentation (AA) frameworks as a prospective theoretical basis for counterfactual explanation generation.
While disregarding the internal structure of arguments, AA frameworks primarily deal with the relations between arguments. The pioneering and best-known AA framework was introduced by Dung (1995). It is a directed graph (also referred to as an "argument graph") formally defined as a pair AA = (A, R), where A is a set of arguments and R ⊆ A × A is a set of binary attack relations between pairs of arguments; for a pair (a, b) ∈ R, argument a is said to attack argument b. The acceptability of arguments is defined through numerous semantics in the form of extensions over a conflict-free set of arguments, i.e., a subset of arguments that do not attack each other.
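Dung's definitions can be sketched in a few lines: given A and R, we enumerate the conflict-free subsets and, among them, the admissible ones (conflict-free sets that counter-attack every attacker of their members). The three-argument instance below is a made-up example, not one from the cited works.

```python
from itertools import chain, combinations

A = {"a", "b", "c"}
R = {("a", "b"), ("b", "c")}          # a attacks b, b attacks c

def subsets(s):
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def conflict_free(S):
    # No member of S attacks another member of S.
    return not any((x, y) in R for x in S for y in S)

def admissible(S):
    # Conflict-free, and every attacker of a member is counter-attacked by S.
    if not conflict_free(S):
        return False
    attackers = {x for (x, y) in R if y in S}
    return all(any((d, x) in R for d in S) for x in attackers)

cf_sets = [set(S) for S in subsets(A) if conflict_free(S)]
adm_sets = [set(S) for S in subsets(A) if admissible(S)]
print(adm_sets)
```

Here {a, c} is admissible because a defends c against its attacker b, whereas {b} and {c} are conflict-free but undefended; richer semantics (complete, preferred, grounded) build on the same enumeration.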
Due to its simplicity, Dung's framework only captures the most basic argumentative constructs, and a number of extensions address this limitation. For example, some models extend the original argumentation framework by refining the concept of attacks between arguments, allowing attack-to-attack relations (Modgil, 2007; Baroni et al., 2011). In contrast, a significant body of research aims to complement the nature of relations between arguments by incorporating supportive relations (Verheij, 2002; Amgoud et al., 2008).
It is worth noting that variants of AA have already been employed to address the problem of explanation generation. For example, Amgoud and Serrurier (2008) use the AA framework to resolve a binary classification task and motivate the outcome with arguments that are constructed, compared against each other, and ranked according to their strength. Šešelja and Straßer (2013) augment AA with explanatory features for scientific debate modelling. However, none of these works embodies counterfactual explanations.
Dung et al. (2009) proposed a conceptually novel instance of the AA framework known as the assumption-based argumentation (ABA) framework. ABA operates on a set of assumptions from which conclusions are deduced via inference rules, and redefines attacks as derivations of contraries of the assumptions supporting the original argument. Following this approach, Zhong et al. (2019) implement an ABA multi-attribute explainable decision model that generates textual explanations for decision-making models on the basis of dispute trees. Nevertheless, while justifying why a particular decision is preferred over its counterpart, the model does not offer counterfactual explanations for rejected decisions.
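As a rough illustration of the ABA idea (a deliberately simplified, hypothetical flat instance, not the full formalism of Dung et al. (2009)), a set of assumptions attacks an assumption whenever the rules allow it to derive that assumption's contrary:

```python
# Hypothetical flat ABA instance: all names are illustrative.
ASSUMPTIONS = {"a", "b"}
CONTRARY = {"a": "p", "b": "q"}       # contrary of a is p, of b is q
RULES = [("p", {"b"})]                # p can be derived from assumption b

def derivable(S):
    """Forward-chain the rules from assumption set S to a closure."""
    closure = set(S)
    changed = True
    while changed:
        changed = False
        for head, body in RULES:
            if body <= closure and head not in closure:
                closure.add(head)
                changed = True
    return closure

def attacks(S, assumption):
    # S attacks an assumption iff S derives that assumption's contrary.
    return CONTRARY[assumption] in derivable(S)

print(attacks({"b"}, "a"))   # {b} derives p, the contrary of a
print(attacks({"a"}, "b"))   # {a} derives nothing against b
```

In this toy instance {b} attacks {a} but not vice versa, mirroring how ABA lifts Dung-style attack relations from abstract arguments to structured, rule-derived ones.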
Despite rising interest in counterfactual explanation generation in recent years, little work has been done on applying formal methods (including argumentation) to the generation of counterfactual explanations. While most existing counterfactual frameworks make use of elements of causal inference, counterfactual statements integrate naturally into conditional logic-based (Besnard et al., 2013) as well as abstract argumentation (Sakama, 2014) frameworks. However, none of these frameworks underpins any existing counterfactual explanation generation system so far.

Argumentative conversational agents
Argumentative frameworks can be embedded directly into chatbots or conversational agents to interact with end users. In terms of practical implementation, conversational agents are broadly divided into two main groups: retrieval-based and generative agents (Chen et al., 2017). On the one hand, a retrieval-based agent aims to select the most suitable response to a user's inquiry from a set of predefined responses (Rakshit et al., 2017; Bartl and Spanakis, 2017). Such agents rely on templates and produce grammatical utterances in all cases. However, template-based text generators are expensive to develop and maintain due to the immense expert labour required. On the other hand, generative models can form previously unseen utterances, as they are trained from scratch without any templates in store (Li et al., 2016; Shao et al., 2017). Nevertheless, their generic responses limit their applicability to explainable AI problems.
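The retrieval-based strategy can be illustrated with a deliberately minimal sketch: canned responses are indexed by trigger utterances, and the agent returns the response whose trigger shares the most tokens with the user's inquiry. All templates and responses here are invented for illustration.

```python
# Invented templates for a hypothetical loan-decision explainer.
TEMPLATES = {
    "why was my loan rejected": "Your income was below the required threshold.",
    "how can I change the outcome": "Increasing your savings would flip the decision.",
}

def tokenize(text):
    return set(text.lower().split())

def respond(query):
    # Pick the trigger with the largest token overlap with the query.
    def overlap(trigger):
        return len(tokenize(trigger) & tokenize(query))
    best = max(TEMPLATES, key=overlap)
    return TEMPLATES[best]

print(respond("why was the loan application rejected"))
```

Production systems replace the token-overlap score with learned ranking models, but the pipeline shape (index, match, return a vetted template) is the same, which is why retrieval-based agents stay grammatical yet costly to extend.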
The need for explainability of complex ML-based systems imposes additional requirements on conversational agents. Automatically generated explanations are expected to be convincing enough to increase users' confidence in a system's predictions with respect to the given task. This is hypothesised to lead to an indispensable shift of attention towards the development of argumentative conversational agents (or argumentative dialogue systems) that operate on a set of arguments as responses to users' inquiries. Furthermore, such argumentation-based agents are considered to push the boundaries of present-day conversational agents towards more human-like interaction (Dignum and Bex, 2018). In combination with recent advances in deep learning and reinforcement learning, the use of argumentation as a theoretical basis for conversational agents opens prospects for a new era of generative conversational agents (Rosenfeld and Kraus, 2016; Rach et al., 2019).
Finally, the issue of evaluating argumentation-based conversational agents overlaps with issues coming directly from the fields of natural language generation (NLG) and explainable AI. At present, there is no unifying agreement on a set of evaluation metrics to be used, either within the NLG community (Gatt and Krahmer, 2018) or within the explainable AI community (Adadi and Berrada, 2018). While common objective (automatic) and subjective (human-oriented survey) metrics used for NLG evaluation are found in the literature on conversational agents and dialogue systems, novel metrics are regularly introduced for particular argumentative chatbots (e.g., distinctiveness, as in Le et al. (2018)) and counterfactual generators (e.g., accuracy with counterfactual text and phrase-error, as in Hendricks et al. (2018)). Thus, a direct comparison between analogous agents becomes particularly challenging. As a possible solution, a combination of subjective and objective metrics is believed to be a reasonable starting point for a discussion on the choice of evaluation techniques. At the same time, automatically generated explanations are expected to be accurate, consistent, and comprehensible. As the perception of these properties is highly subjective, they cannot be measured (and therefore evaluated) directly and require further investigation.

Concluding remarks
Our literature review has revisited the foundations of current approaches to counterfactual explanation generation. The limitations identified point to several areas for improvement in the development of explainable AI systems.
First, there is no single definition of a counterfactual explanation. While counterfactuals have various interpretations in the literature, we find it particularly important to suggest a uniform definition that would not only capture all the properties of counterfactual explanations but also allow for designing a universal domain-independent framework for their generation.
Second, existing argumentation-based explanation generation models do not fully solve the problem of counterfactual explanation generation. While some of these models do not offer consistent explanations in textual form, others do not output contrastive explanations. Therefore, a more holistic counterfactual generation framework should be developed to close this gap.
Third, formal argumentation is rarely considered in present-day conversational agents. To the best of our knowledge, existing argumentation-based agents do not use the dialogic information received from direct interaction with the user to contextualise their counterfactual explanations. However, processing such information may improve the quality of the offered counterfactual explanations, making them more personalised. Therefore, capturing such contextual information presents another noteworthy line of research.
The aforementioned issues, along with others not discussed due to space limitations, show that the generation of counterfactual explanations is a timely but complex problem. In the future, we plan to address these issues by designing an argumentation-based dialogue protocol and developing a conversational agent ready to make use of the protocol to output accurate and consistent counterfactual explanations.