Attentive Interaction Model: Modeling Changes in View in Argumentation

We present a neural architecture for modeling argumentative dialogue that explicitly models the interplay between an Opinion Holder’s (OH’s) reasoning and a challenger’s argument, with the goal of predicting if the argument successfully changes the OH’s view. The model has two components: (1) vulnerable region detection, an attention model that identifies parts of the OH’s reasoning that are amenable to change, and (2) interaction encoding, which identifies the relationship between the content of the OH’s reasoning and that of the challenger’s argument. Based on evaluation on discussions from the Change My View forum on Reddit, the two components work together to predict an OH’s change in view, outperforming several baselines. A posthoc analysis suggests that sentences picked out by the attention model are addressed more frequently by successful arguments than by unsuccessful ones.


Introduction
Through engagement in argumentative dialogue, interlocutors present arguments with the goals of winning the debate or contributing to the joint construction of knowledge. Modeling the knowledge co-construction process in particular requires understanding both the substance of viewpoints and how the substance of an argument connects with what it is arguing against. Prior work on argumentation in the NLP community, however, has focused mainly on the first goal and has often reduced the concept of a viewpoint to a discrete side (e.g., pro vs. against, or liberal vs. conservative), missing more nuanced and complex details of viewpoints. In addition, while the strength of the argument and the side it represents have been addressed relatively often, the dialogical aspects of argumentation have received less attention.
To bridge the gap, we present a model that jointly considers an Opinion Holder's (OH's) expressed viewpoint with a challenger's argument in order to predict if the argument succeeded in altering the OH's view. The first component of the architecture, vulnerable region detection, aims to identify important parts in the OH's reasoning that are key to impacting their viewpoint. The intuition behind our model is that addressing certain parts of the OH's reasoning often has little impact in changing the OH's view, even if the OH realizes the reasoning is flawed. On the other hand, some parts of the OH's reasoning are more open to debate, and thus, it is reasonable for the model to learn and attend to parts that have a better chance to change an OH's view when addressed.
The second component of the architecture, interaction encoding, aims to identify the connection between the OH's sentences and the challenger's sentences. Meaningful interaction in argumentation may include agreement/disagreement, topic relevance, or logical implication. Our model encodes the interaction between every pair of the OH's and the challenger's sentences as interaction embeddings, which are then aggregated and used for prediction. Intuitively, the interactions with the most vulnerable regions of the OH's reasoning are most critical. Thus, in our complete model, the interaction embeddings are weighted by the vulnerability scores computed in the first component.
We evaluate our model on discussions from the Change My View forum on Reddit, where users (OHs) post their views on various issues, participate in discussion with challengers who try to change the OH's view, and acknowledge when their views have been impacted. Particularly, we aim to answer the following questions:
• RQ1. Does the architecture of vulnerable region detection and interaction encoding help to predict changes in view?
• RQ2. Can the model identify vulnerable sentences, which are more likely to change the OH's view when addressed? If so, what properties constitute vulnerability?
• RQ3. What kinds of interactions between arguments are captured by the model?
We use our model to predict whether a challenger's argument has impacted the OH's view and compare the result with several baseline models. We also present a posthoc analysis that illuminates the model's behavior in terms of vulnerable region detection and meaningful interaction. For the remainder of the paper, we position our work in the literature (Section 2) and examine the data (Section 3). Then we explain our model design (Section 4). Next, we describe the experiment settings (Section 5), discuss the results (Section 6), and conclude the paper (Section 7).

Background
Argumentation theories have identified important dialogical aspects of (non-)persuasive argumentation, which motivate our attempt to model the interaction of OH's and challenger's arguments. Persuasive arguments build on the hearer's accepted premises (Walton, 2008) and appeal to emotion effectively (Aristotle and Kennedy, 2007). From a challenger's perspective, effective strategies for these factors could be derived from the OH's background and reasoning. On the other hand, nonpersuasive arguments may commit fallacies, such as contradicting the OH's accepted premises, diverting the discussion from the relevant and salient points suggested by the OH, failing to address the issues in question, misrepresenting the OH's reasoning, and shifting the burden of proof to the OH by asking a question (Walton, 2008). These fallacies can be identified only when we can effectively model how the challenger argues in relation to the OH's reasoning.
While prior work in the NLP community has studied argumentation, such as predicting debate winners (Potash and Rumshisky, 2017; Zhang et al., 2016; Wang et al., 2017; Prabhakaran et al., 2013) and winning negotiation games (Keizer et al., 2017), this paper addresses a different angle: predicting whether an argument against an OH's reasoning will successfully impact the OH's view. Some prior work investigates factors that underlie viewpoint changes (Tan et al., 2016; Lukin et al., 2017; Hidey et al., 2017; Wei et al., 2016), but none target our task of identifying the specific arguments that impact an OH's view.
Changing an OH's view depends highly on argumentation quality, which has been the focus of much prior work. Theories of argumentation quality assessment have been reviewed and consolidated into a unified framework. Prior research has focused mainly on the presentation of an argument and some aspects in this framework without considering the OH's reasoning. Specific examples include politeness, sentiment (Tan et al., 2016; Wei et al., 2016), grammaticality, factuality, topic-relatedness (Habernal and Gurevych, 2016b), argument structure (Niculae et al., 2017), topics (Wang et al., 2017), and argumentative strategies (e.g., anecdote, testimony, statistics) (Al Khatib et al., 2017). Some of these aspects have been used as features to predict debate winners (Wang et al., 2017) and view changes (Tan et al., 2016). Habernal and Gurevych (2016a) used crowdsourcing to develop an ontology of reasons for strong/weak arguments.
The persuasiveness of an argument, however, is highly related to the OH's reasoning and how the argument connects with it. Nonetheless, research on this relationship is quite limited in the NLP community. Existing work uses word overlap between the OH's reasoning and an argument as a feature in predicting the OH's viewpoint (Tan et al., 2016). Some studies examined the relationship between the OH's personality traits and receptivity to arguments with different topics (Ding and Pan, 2016) or degrees of sentiment (Lukin et al., 2017).
The most relevant to our work is the related task by Tan et al. (2016). Their task used the same discussions from the Change My View forum as in our work and examined various stylistic features (sentiment, hedging, question marks, etc.) and word overlap features to identify discussions that impacted the OH's view. However, our task is different from theirs in that they made predictions on initial comments only, while we did so for all comments replied to by the OH in each discussion. Our task is more challenging because comments that come later in a discussion have a less direct connection to the original post. Another challenge is the extreme skew in class distribution in our data, whereas Tan et al. (2016) ensured a balance between the positive and negative classes.
The Change My View forum has received attention from recent studies. For example, ad hominem (attacking an arguer) arguments have been studied, along with their types and causes (Habernal et al., 2018). Another study annotated semantic types of arguments and analyzed the relationship between semantic types and a change in view (Hidey et al., 2017). Although this work did not look at the interaction between OHs and specific challengers, it provides valuable insight into persuasive arguments. Additionally, the semantic types may potentially allow our model to better model complex interaction in argumentation.

Data
Our study is based on discussions from the Change My View (CMV) forum (https://www.reddit.com/r/changemyview) on Reddit. In this forum, users (opinion holders, OHs) post their views on a wide range of issues and invite other users (challengers) to change their expressed viewpoint. If an OH gains a new insight after reading a comment, he/she replies to that comment with a ∆ symbol and specifies the reasons behind his/her view change. DeltaBot monitors the forum and marks comments that received a ∆, which we will use as labels indicating whether the comment successfully changed the OH's view.

Figure 2:
Opinion Holder (OH): CMV: DNA tests (especially for dogs) are bullshit. For my line of work (which is not the DNA testing), … I have NEVER seen a DNA test return that a dog is purebred, or even anywhere close to purebred. … these tests are consistently way off on their results. … My mother recently had a DNA test done showing she is 1/4 black. I believe this is also incorrect since she knows who her parents and grandparents are, and none of them are black. …
Challenger 1: I'm not sure what exactly these particular DNA tests are looking at, but they are probably analyzing either SNPs or VNTRs. There's nothing stopping a SNP from mutating at any given generation, or a VNTR from shrinking or expanding due to errors during DNA replication. … The take-home message is that DNA testing isn't complete bullshit, but it does have limitations.
Challenger 2: Knowing your grandparents "aren't black" doesn't really rule out being 25% African American, genetically, because genes combine during fertilization almost completely randomly. … Basically, the biggest conclusion from this information is that race is only barely genetic. It's mostly a social construct.

CMV discussions provide interesting insights into how people accept new information through argumentation, as OHs participate in the discussions with the explicit goal of exposing themselves to new perspectives. In addition, the rules and moderators of this forum assure high quality discussions by requiring that OHs provide enough reasoning in the initial post and replies.
We use the CMV dataset compiled by Tan et al. (2016). The dataset is composed of 18,363 discussions from January 1, 2013 to May 7, 2015 for training data and 2,263 discussions from May 8 to September 1, 2015 for test data.
Qualitative analysis We conducted qualitative analysis to better understand the data. First, to see if there are topical effects on changes in view, we examined the frequency of view changes across different topics. We ran Latent Dirichlet Allocation (Blei et al., 2003) with 20 topics, taking each discussion as one document. We assigned each discussion the topic that has the highest standardized probability. The most discussed topics are government, gender, and everyday life (Figure 1a). As expected, the frequency of changes in view differs across topics (Figure 1b). The most malleable topics are food, computers & games, clothing, art, education, and everyday life. But even in the food domain, OHs give out a ∆ in less than 10% of their replies in most discussions.
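The topic assignment step can be illustrated with a small sketch. The text does not say how the probabilities are standardized, so the code below assumes one plausible reading: each topic's probability is z-scored across discussions, and a discussion gets the topic with the highest z-score. The function name `assign_topics` and the toy document-topic matrix are hypothetical.

```python
from statistics import mean, pstdev

def assign_topics(doc_topic):
    """Assign each document the topic with the highest z-scored probability.

    ASSUMPTION: "standardized" means z-scoring each topic column across
    documents; the paper does not spell this out.
    """
    n_topics = len(doc_topic[0])
    columns = [[row[k] for row in doc_topic] for k in range(n_topics)]
    mus = [mean(col) for col in columns]
    # Fall back to 1.0 when a topic has zero variance to avoid dividing by zero.
    sds = [pstdev(col) or 1.0 for col in columns]
    return [
        max(range(n_topics), key=lambda k: (row[k] - mus[k]) / sds[k])
        for row in doc_topic
    ]

# Toy document-topic probabilities (3 discussions, 2 topics); illustrative only.
doc_topic = [[0.9, 0.1], [0.8, 0.2], [0.5, 0.5]]
topics = assign_topics(doc_topic)
```

Under this reading, a topic that is dominant in almost every discussion does not automatically win: the third toy discussion is assigned topic 1 because its probability for that topic is unusually high relative to the other discussions.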
In order to inform the design of our model, we sampled discussions not in the test set and compared comments that did and did not receive a ∆. A common but often unsuccessful argumentation strategy is to correct detailed reasons and minor points of the OH's reasoning; addressing those points often has little effect, regardless of the validity of the points. On the contrary, successful arguments usually catch incomplete parts in the OH's reasoning and offer another way of looking at an issue without threatening the OH. For instance, in the discussion in Figure 2, the OH presents a negative view on DNA tests, along with his/her reasoning and experiences that justify the view. Challenger 1 addresses the OH's general statement and provides a new fact, which received a ∆. On the other hand, Challenger 2 addresses the OH's issue about race but failed to change the OH's view.
When a comment addresses the OH's points, its success relies on various interactions, including the newness of information, topical relatedness, and politeness. For example, Challenger 1 provides new information that is topically dissimilar to the OH's original reasoning. In contrast, Challenger 2's argument is relatively similar to the OH's reasoning, as it attempts to directly correct the OH's reasoning. These observations motivate the design of our Attentive Interaction Model, described in the next section.

Model Specification
Our Attentive Interaction Model predicts the probability of a comment changing the OH's original view, P(∆ = 1), given the OH's initial post and the comment. The architecture of the model (Figure 3) consists of detecting vulnerable regions in the OH's post (sentences important to address to change the OH's view), embedding the interactions between every sentence in the OH's post and the comment, summarizing the interactions weighted by the vulnerability of OH sentences, and predicting P(∆ = 1).
The main idea of our model is the architecture for capturing interactions in vulnerable regions, rather than methods for measuring specific argumentation-related features (e.g., agreement/disagreement, contradiction, vulnerability, etc.). To better measure these features, we need much richer information than the dataset provides (discussion text and ∆s). Therefore, our proposed architecture is not to replace prior work on argumentation features, but rather to complement it at a higher, architectural level that can potentially integrate various features. Moreover, our architecture serves as a lens for analyzing the vulnerability of OH posts and interactions with arguments.
Formal definition of the model (Figure 3 (A) and (B)) Denote the OH's initial post by $d^O = (x^O_1, \ldots, x^O_{M^O})$, where $x^O_i$ is the $i$th sentence, and $M^O$ is the number of sentences. The sentences are encoded via an RNN, yielding a hidden state for the $i$th sentence $s^O_i \in \mathbb{R}^{D^S}$, where $D^S$ is the dimensionality of the hidden states. Similarly, for a comment $d^C = (x^C_1, \ldots, x^C_{M^C})$, hidden states of the sentences $s^C_j$, $j = 1, \ldots, M^C$, are computed.
Vulnerable region detection (Figure 3 (A)) Given the OH's sentences, the model computes the vulnerability of the $i$th sentence $g(s^O_i) \in \mathbb{R}^1$ (e.g., using a feedforward neural network). From this vulnerability, the attention weight of the sentence is calculated as
$$a_i = \frac{\exp g(s^O_i)}{\sum_{i'=1}^{M^O} \exp g(s^O_{i'})}.$$
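As a minimal sketch of vulnerable region detection, the snippet below turns per-sentence vulnerability scores into attention weights, assuming the weights are a softmax over the scores of the OH's sentences. The scorer `vulnerability` is a hypothetical single linear layer with made-up parameters; the text only says that g may be a feedforward network.

```python
import math

def vulnerability(state, weights, bias):
    """Hypothetical scorer g: one linear layer mapping a sentence state to a scalar."""
    return sum(w * s for w, s in zip(weights, state)) + bias

def attention_weights(scores):
    """Softmax over the OH's sentences, stabilized by subtracting the max score."""
    top = max(scores)
    exps = [math.exp(x - top) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy hidden states for three OH sentences (hidden size 2); values are illustrative.
oh_states = [[0.2, -0.1], [1.0, 0.5], [-0.3, 0.8]]
w, b = [0.7, -0.4], 0.1
scores = [vulnerability(s, w, b) for s in oh_states]
a = attention_weights(scores)
```

The weights are positive and sum to one, so they can later be used to form a weighted sum over the OH's sentences.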
Interaction encoding (Figure 3 (C)) The model computes the interaction embedding of every pair of the OH's $i$th sentence and the comment's $j$th sentence,
$$v_{i,j} = h(s^O_i, s^C_j) \in \mathbb{R}^{D^I},$$
where $D^I$ is the dimensionality of interaction embeddings, and $h$ is an interaction function between two sentence embeddings. $h$ can be a simple inner product (in which case $D^I = 1$), a feedforward neural network, or a more complex network. Ideally, each dimension of $v_{i,j}$ indicates a particular type of interaction between the pair of sentences.
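The two simplest choices of interaction function can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the hidden activation in `feedforward` is assumed to be tanh (the text does not name one), and all weights and sentence states are toy values.

```python
import math

def inner_product(s_o, s_c):
    """Interaction function as an inner product; the embedding has one dimension."""
    return [sum(a * b for a, b in zip(s_o, s_c))]

def feedforward(s_o, s_c, W1, b1, W2, b2):
    """Two-layer network over the concatenated states (tanh is an assumption)."""
    x = s_o + s_c  # list concatenation of the two sentence states
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * hid for w, hid in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]

def interaction_embeddings(oh_states, comment_states, h):
    """v[i][j] is the interaction embedding of OH sentence i and comment sentence j."""
    return [[h(s_o, s_c) for s_c in comment_states] for s_o in oh_states]

# Toy sentence states: 2 OH sentences, 3 comment sentences.
oh_states = [[1.0, 0.0], [0.0, 2.0]]
comment_states = [[1.0, 1.0], [0.5, -1.0], [0.0, 3.0]]
v = interaction_embeddings(oh_states, comment_states, inner_product)
```

To use the feedforward variant with `interaction_embeddings`, the parameters can be bound first, for example with `lambda s_o, s_c: feedforward(s_o, s_c, W1, b1, W2, b2)`.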
Interaction summary (Figure 3 (D)) Next, for each of the OH's sentences, the model summarizes what types of meaningful interaction occur with the comment's sentences. That is, given all interaction embeddings for the OH's $i$th sentence, $v_{i,1}, \ldots, v_{i,M^C}$, the model conducts max pooling for each dimension,
$$u^{max}_{i,k} = \max_{j} v_{i,j,k},$$
where $v_{i,j,k}$ is the $k$th dimension of $v_{i,j}$ and $u^{max}_i \in \mathbb{R}^{D^I}$. Intuitively, max pooling is to capture the existence of an interaction and its highest intensity for each of the OH's sentences; the interaction does not have to occur in all sentences of the comment. Since we have different degrees of interest in the interactions in different parts of the OH's post, we take the attention-weighted sum of $u^{max}_i$ to obtain the final summary vector
$$u^{max} = \sum_{i=1}^{M^O} a_i u^{max}_i.$$
Prediction (Figure 3 (E)) The prediction component consists of at least one feedforward neural network, which takes as input the summary vector $u^{max}$ and optionally the hidden state of the last sentence in the comment $s^C_{M^C}$. More networks may be used to integrate other features as input, such as TFIDF-weighted n-grams of the comment. The outputs of the networks are concatenated and fed to the final prediction layer to compute $P(\Delta = 1)$. Using a single network that takes different kinds of features as input does not perform well, because the features are in different spaces, and linear operations between them are probably not meaningful.
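The interaction summary can be sketched in a few lines: max pooling over the comment's sentences for each OH sentence and dimension, followed by an attention-weighted sum over the OH's sentences. The function name `summarize` and the toy embeddings and attention weights are hypothetical.

```python
def summarize(v, a):
    """Max-pool v[i][j][k] over comment sentences j, then weight OH sentences i by a[i]."""
    dim = len(v[0][0])
    # u_max_i[i][k]: strongest interaction of type k between OH sentence i and the comment.
    u_max_i = [[max(v_ij[k] for v_ij in row) for k in range(dim)] for row in v]
    # Attention-weighted sum over OH sentences gives the final summary vector.
    return [sum(a_i * u[k] for a_i, u in zip(a, u_max_i)) for k in range(dim)]

# Toy input: 2 OH sentences, 2 comment sentences, 2 interaction dimensions.
v = [
    [[0.1, 0.9], [0.4, 0.2]],
    [[0.7, 0.0], [0.3, 0.5]],
]
a = [0.25, 0.75]  # attention weights over the OH's sentences
u_max = summarize(v, a)
```

With these toy values, the per-sentence maxima are [0.4, 0.9] and [0.7, 0.5], and the second OH sentence dominates the summary because it carries three quarters of the attention.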
Loss The loss function is composed of binary cross-entropy loss and margin ranking loss. Assume there are a total of $N^D$ initial posts written by OHs, and the $l$th post has $N_l$ comments. The binary cross-entropy of the $l$th post and its $t$th comment measures the similarity between the predicted $P(\Delta = 1)$ and the true $\Delta$ as:
$$BCE_{l,t} = -\Delta_{l,t} \log P_\Theta(\Delta_{l,t} = 1) - (1 - \Delta_{l,t}) \log\left(1 - P_\Theta(\Delta_{l,t} = 1)\right),$$
where $\Delta_{l,t}$ is the true $\Delta \in \{0, 1\}$ of the comment and $P_\Theta$ is the probability predicted by our model with parameters $\Theta$. Since our data is skewed to negatives, the model may overpredict $\Delta = 0$. To adjust this bias, we use margin ranking loss to drive the predicted probability of positives to be greater than the predicted probability of negatives by a certain margin. The margin ranking loss is defined on a pair of comments $C_1$ and $C_2$ with $\Delta_{C_1} > \Delta_{C_2}$ as:
$$RL_{C_1,C_2} = \max\{0,\; P_\Theta(\Delta_{C_2} = 1) - P_\Theta(\Delta_{C_1} = 1) + \epsilon\},$$
where $\epsilon$ is a margin. Combining the two losses, our final loss is
$$\frac{1}{N^D} \sum_{l=1}^{N^D} \frac{1}{N_l} \sum_{t=1}^{N_l} BCE_{l,t} + \mathbb{E}_{C_1,C_2}\left[RL_{C_1,C_2}\right].$$
For the expectation in the ranking loss, we consider all pairs of comments in each minibatch and take the mean of their ranking losses.
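The combined objective can be sketched for a single minibatch as below. Two simplifications are assumptions of this sketch rather than statements from the text: the cross-entropy is averaged over the minibatch instead of per post, and the probability clipping constant is arbitrary. The ranking term follows the description above by averaging over all positive/negative pairs in the minibatch.

```python
import math
from itertools import product

def bce(p, delta, eps=1e-12):
    """Binary cross-entropy for one comment; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(delta * math.log(p) + (1 - delta) * math.log(1.0 - p))

def margin_ranking(p_pos, p_neg, margin=0.5):
    """Zero once the positive's probability exceeds the negative's by the margin."""
    return max(0.0, p_neg - p_pos + margin)

def minibatch_loss(probs, labels, margin=0.5):
    """Mean cross-entropy plus mean ranking loss over all positive/negative pairs."""
    mean_bce = sum(bce(p, d) for p, d in zip(probs, labels)) / len(probs)
    pos = [p for p, d in zip(probs, labels) if d == 1]
    neg = [p for p, d in zip(probs, labels) if d == 0]
    pairs = list(product(pos, neg))
    if not pairs:  # a minibatch with a single class has no ranking pairs
        return mean_bce
    mean_rl = sum(margin_ranking(pp, pn, margin) for pp, pn in pairs) / len(pairs)
    return mean_bce + mean_rl

loss = minibatch_loss([0.9, 0.1], [1, 0])
```

In the toy call, the positive comment already outranks the negative one by more than the margin, so only the cross-entropy term contributes.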

Experiment
Our task is to predict whether a comment would receive a ∆, given the OH's initial post and the comment. We formulate this task as binary prediction of ∆ ∈ {0, 1}. Since our data is highly skewed, we use as our evaluation metric the AUC score (Area Under the Receiver Operating Characteristic Curve), which measures the probability of a positive instance receiving a higher probability of ∆ = 1 than a negative instance.
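The ranking interpretation of AUC can be made concrete with a short pairwise computation. This is a quadratic-time illustration on toy scores, with ties counted as one half (a common convention, assumed here); the function name `auc` is hypothetical.

```python
def auc(scores, labels):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative instance")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy predictions: two comments with a delta (label 1) and two without (label 0).
score = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

Because the score depends only on how positives rank against negatives, it is unaffected by the proportion of negatives, which is why it suits highly skewed data.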

Data Preprocessing
We exclude (1) DeltaBot's comments with no content, (2) comments replaced with [deleted], (3) system messages that are included in OH posts and DeltaBot's comments, (4) OH posts that are shorter than 100 characters, and (5) discussions where the OH post is excluded. We treat the title of an OH post as its first sentence. After this, every comment to which the OH replies is paired up with the OH's initial post. A comment is labeled as ∆ = 1 if it received a ∆ and ∆ = 0 otherwise. Details are described in Appendix B.
The original dataset comes with training and test splits (Figure 1a). After tokenization and POS tagging with Stanford CoreNLP (Manning et al., 2014), our vocabulary is restricted to the most frequent 40,000 words from the training data. For a validation split, we randomly choose 10% of training discussions for each topic. We train our model on the seven topics that have the highest ∆ ratios (Figure 1b). We test on the same set of topics for in-domain evaluation and on the other 13 topics for cross-domain evaluation. The main reason for choosing the most malleable topics is that these topics provide more information about people learning new perspectives, which is the focus of our paper. Some statistics of the resulting data are in Table 1.

Inputs
We use two basic types of inputs: sentence embeddings and TFIDF vectors. These basic inputs are by no means enough for our complex task, and most prior work utilizes higher-level features (politeness, sentiment, etc.) and task-specific information. Nevertheless, our experiment is limited to the basic inputs to minimize feature engineering and increase replicability, but our model is general enough to incorporate other features as well.
Sentence embeddings Our input sentences x are sentence embeddings obtained by a pretrained sentence encoder (Conneau et al., 2017) (this is different from the sentence encoder in our model). The pretrained sentence encoder is a BiLSTM with max pooling trained on the Stanford Natural Language Inference corpus (Bowman et al., 2015) for textual entailment. Sentence embeddings from this encoder, combined with logistic regression on top, showed good performance in various transfer tasks, such as entailment and caption-image retrieval (Conneau et al., 2017).
TFIDF A whole post or comment is represented as a TFIDF-weighted bag-of-words, where IDF is based on the training data. We consider the top 40,000 n-grams (n = 1, 2, 3) by term frequency.
Word Overlap Although integration of hand-crafted features is beyond the scope of this paper, we test the word overlap features between a comment and the OH's post, introduced by Tan et al. (2016), as a simple proxy for the interaction. For each comment, given the set of its words $C$ and that of the OH's post $O$, these features are defined as $|C \cap O|$, $\frac{|C \cap O|}{|C|}$, $\frac{|C \cap O|}{|O|}$, and $\frac{|C \cap O|}{|C \cup O|}$.
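The four word overlap features can be computed directly from the two word sets. The sketch below assumes whitespace tokenization and no stopword filtering, which may differ from the preprocessing actually used; the function name `word_overlap_features` and the example sentences are hypothetical.

```python
def word_overlap_features(comment_tokens, oh_tokens):
    """Return [|C∩O|, |C∩O|/|C|, |C∩O|/|O|, |C∩O|/|C∪O|] for two token lists."""
    c, o = set(comment_tokens), set(oh_tokens)
    inter = len(c & o)
    union = len(c | o)
    return [
        inter,
        inter / len(c) if c else 0.0,   # share of comment words also in the OH's post
        inter / len(o) if o else 0.0,   # share of OH words echoed by the comment
        inter / union if union else 0.0,  # Jaccard similarity
    ]

# Toy example with whitespace tokenization (an assumption of this sketch).
features = word_overlap_features(
    "dna tests have limitations".split(),
    "dna tests are wrong".split(),
)
```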

Model Setting
Network configurations For sentence encoding, Gated Recurrent Units (Cho et al., 2014) with hidden state sizes 128 or 192 are explored. For attention, a single-layer feedforward neural network (FF) with one output node is used. For interaction encoding, we explore two interaction functions: (1) the inner product of the sentence embeddings and (2) a two-layer FF with 60 hidden nodes and three output nodes with a concatenation of the sentence embeddings as input. For prediction, we explore (1) a single-layer FF with either one output node if the summary vector $u^{max}$ is the only input or 32 or 64 output nodes with ReLU activation if the hidden state of the comment's last sentence is used as input, and optionally (2) a single-layer FF with 1 or 3 output nodes with ReLU activation for the TFIDF-weighted n-grams of the comment. The final prediction layer is a single-layer FF with one output node with sigmoid activation that takes the outputs of the two networks above and optionally the word overlap vector. The margin for the margin ranking loss is 0.5. Optimization is performed using AdaMax with the initial learning rate 0.002, decayed by 5% every epoch.
Training stops after 10 epochs if the average validation AUC score of the last 5 epochs is lower than that of the first 5 epochs; otherwise, training runs 5 more epochs. The minibatch size is 10.
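The schedule described above can be sketched as two small helpers. The stopping rule is stated only for the first 10 epochs, so the sketch assumes one reading for later checks: every 5 epochs, compare the mean validation AUC of the latest 5 epochs with that of the 5 epochs before them. The function names and the toy AUC values are hypothetical.

```python
def learning_rate(epoch, initial=0.002, decay=0.05):
    """Learning rate at a 0-indexed epoch when decayed by 5% every epoch."""
    return initial * (1.0 - decay) ** epoch

def should_stop(val_aucs):
    """Stop if the latest 5 epochs average a lower validation AUC than the 5 before.

    ASSUMPTION: the same comparison is repeated at epochs 10, 15, 20, ...
    """
    if len(val_aucs) < 10:
        return False
    earlier = val_aucs[-10:-5]
    latest = val_aucs[-5:]
    return sum(latest) / 5 < sum(earlier) / 5

# Toy validation curve that peaks at epoch 5 and then declines.
history = [0.60, 0.61, 0.62, 0.63, 0.64, 0.63, 0.62, 0.61, 0.60, 0.59]
stop_now = should_stop(history)
```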

Input configurations
The prediction component of the model takes combinations of the inputs: MAX ($u^{max}$), HSENT (the last hidden state of the sentence encoder $s^C_{M^C}$), TFIDF (TFIDF-weighted n-grams of the comment), and WDO (word overlap).

Baseline
The most similar prior work to ours (Tan et al., 2016) predicted whether an OH would ever give a ∆ in a discussion. The work used logistic regression with bag-of-words features. Hence, we also use logistic regression as our baseline to predict P (∆ = 1). Simple logistic regression using TFIDF is a relatively strong baseline, as it beat more complex features in the aforementioned task.

Input configurations
The model takes combinations of the inputs: TFIDF (TFIDF-weighted n-grams of the comment), TFIDF (+OH) (concatenation of the TFIDF-weighted n-grams of the comment and the OH's post), WDO (word overlap), and SENT (the sum of the input sentence embeddings of the comment).

Results
RQ1. Does the architecture of vulnerable region detection and interaction encoding help to predict changes in view? Interaction information learned by our model and surface-level n-grams in TFIDF have strong predictive power, and attending to vulnerable regions helps. The highest score is achieved by our model (AIM) with both MAX and TFIDF as input (72.0%). The performance drops if the model does not use interaction information, as in (A)IM with HSENT (69.6%), or vulnerability information, as in (A)IM with MAX+TFIDF (69.5%). TFIDF by itself is also a strong predictor, as logistic regression with TFIDF performs well (70.9%). There is a performance drop if TFIDF is not used in most settings. This is unsurprising because TFIDF captures some topical or stylistic information that was shown to play important roles in argumentation in prior work (Tan et al., 2016; Wei et al., 2016). Simply concatenating both comment's and OH's TFIDF features does not help (69.5%), most likely due to the fact that a simple logistic regression does not capture interactions between features.
When the hand-crafted word overlap features are integrated into LR, the accuracy increases slightly, but the difference is not statistically significant compared to LR without these features or to the best AIM configuration. These features do not help AIM (70.9%), possibly because the information is redundant, or AIM requires a more deliberate way of integrating hand-crafted features.
For cross-domain performance, logistic regression with TFIDF performs best (69.6%). Our interaction information does not transfer to unseen topics as well as TFIDF. This weakness is alleviated when our model uses TFIDF in addition to MAX, increasing the cross-domain score (from 67.5% to 69.4%). We expect that information about vulnerability would have more impact within domain than across domains because it may learn domain-specific information about which kinds of reasoning are vulnerable. The rest of the section reports our qualitative analysis based on the best model configuration.
RQ2. Can the model identify vulnerable sentences, which are more likely to change the OH's view when addressed? If so, what properties constitute vulnerability? Our rationale behind vulnerable region detection is that the model is able to learn to pay more attention to sentences that are more likely to change the OH's view when addressed. If the model successfully does this, then we expect more alignment between the attention mechanism and sentences that are actually addressed by successful comments that changed the OH's view.
To verify if our model works as designed, we randomly sampled 30 OH posts from the test set, and for each post, the first successful and unsuccessful comments. We asked a native English speaker to annotate each comment with the two most relevant sentences that it addresses in the OH post, without knowledge of how the model computes vulnerability and whether the comment is successful or not.
After this annotation, we computed the average attention weight of the two selected sentences for each comment. We ran a paired sample t-test and confirmed that the average attention weight of sentences addressed by successful comments was significantly greater than that of sentences addressed by unsuccessful comments (p < 0.05). Thus, as expected in the case where the attention works as designed, the model more often picks out the sentences that successful challengers address.
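The test statistic behind this comparison can be sketched as follows. The attention weights in the example are made up for illustration and are not the study's data; the function name `paired_t_statistic` is hypothetical. Obtaining a p-value additionally requires the t distribution with n − 1 degrees of freedom, which this stdlib-only sketch does not compute.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(x, y):
    """t statistic of a paired-sample t-test on the per-pair differences x[i] - y[i]."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    # Mean difference divided by its standard error (sample standard deviation).
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

# Toy per-post average attention weights (illustrative values only):
# sentences addressed by the successful vs. the unsuccessful comment.
successful = [0.5, 0.6, 0.7]
unsuccessful = [0.4, 0.4, 0.4]
t = paired_t_statistic(successful, unsuccessful)
```

Pairing by post matters here because attention weights depend on the number of sentences in each OH post, so only differences within the same post are comparable.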
As to what the model learns as vulnerability, in most cases, the model attends to sentences that are not punctuation marks, bullet points, or irrelevant to the topic (e.g., can you cmv?). A successful example is illustrated in Figure 4. More successful and unsuccessful examples are included in Appendix C.
RQ3. What kinds of interactions between arguments are captured by the model? We first use existing argumentation theories as a lens for interpreting interaction embeddings (refer to Section 2). For this, we sampled 100 OH posts with all their comments and examined the 150 sentence pairs that have the highest value for each dimension of the interaction embedding (the dimensionality of interaction embeddings is 3 for the best performing configuration). 22% of the pairs in a dimension capture the comment asking the OH a question, which could be related to shifting the burden of proof. In addition, 23% of the top pairs in one dimension capture the comment pointing out that the OH may have missed something (e.g., you don't know the struggles ...). This might represent the challengers' attempt to provide premises that are missing in the OH's reasoning.
As providing missing information plays an important role in our data, we further examine if this attempt by challengers is captured in interaction embeddings even when it is not overtly signaled (e.g., You don't know ...). We first approximate the novelty of a challenger's information with the topic similarity between the challenger's sentence and the OH's sentence, and then see if there is a correlation between topic similarity and each dimension of interaction embeddings (details are in Appendix D). As a result, we found only a small but significant correlation (Pearson's r = −0.04) between topic similarity and one of the three dimensions.
Admittedly, it is not trivial to interpret interaction embeddings and find alignment between embedding dimensions and argumentation theories. The neural network apparently learns complex interactions that are difficult to interpret in a human sense. It is also worth noting that the top pairs contain many duplicate sentences, possibly because the interaction embeddings may capture sentence-specific information, or because some types of interaction are determined mainly by one side of a pair (e.g., disagreement is manifested mostly on the challenger's side).
TFIDF We examine successful and unsuccessful styles reflected in TFIDF-weighted n-grams, based on their weights learned by logistic regression (top n-grams with the highest and lowest weights are in Appendix E). First, challengers are more likely to change the OH's view when talking about themselves than mentioning the OH in their arguments. For instance, first-person pronouns (e.g., i and me) get high weights, whereas second-person pronouns (e.g., you are and then you) get low weights. Second, different kinds of politeness seem to play roles. For example, markers of negative politeness (can and can be, as opposed to should and no) and negative face-threatening markers (thanks), are associated with receiving a ∆. Third, asking a question to the OH (e.g., why, do you, and are you) is negatively associated with changing the OH's view.

Figure 4: Example discussion with the OH's initial post (left), a successful comment (top right), and an unsuccessful comment (bottom right). The OH's post is colored based on attention weights (the higher attention the brighter). Sentences with college and SAT sections (reading, writing, math) get more attention than sentences with other subjects (algebra, geometry). The successful comment addresses parts with high attention, whereas the unsuccessful comment addresses parts with low attention.
OH's initial post: the sat should not include trigonometry in their math section . . most colleges do not require trigonometry for admissions , and do not require students to take a trigonometry course . it seems unfair that the sat would include this in the math section . some will argue that it makes sure students are `` well rounded , '' but it 's incredibly unfair to use this to test a student 's aptitude for college . when i was in high school , i had an 89 % overall gpa . i got mid-range scores on the reading and writing sections of the sat , but did very poorly on the math section . because of this , i was denied admission to many colleges which i applied to . i understand that my scores in reading and writing were average , but it was the low math score which really hurt my chances of admission . this might seem like a personal argument , but the fact remains that i 'm sure many students would agree with me . i understand including algebra and geometry , but i do n't see why they include trigonometry . this is a person 's future which they are dealing with . edit : of the five colleges i applied to , i was rejected by two of them , but was accepted by three of them .
Successful comment (∆=1 / P(∆=1)=0.073): i get and understand that math is not your strong point , that 's great and fine , however it is mine . i got my undergrad in math and i am working on my masters in stats , but just because i do n't see myself as needing reading or writing that does not mean that others feel the same way . my personal opinion of the sat and act is less that is it to make a``well rounded '' person and more to set a bar for entrance into selective schools . to your opening point , the sat did not prevent you from going to college it just prevented you for attending a more selective college , one that desires a higher level of math knowledge than the ones that accepted you . it has little to do with you and more to do with the statistics of placing people . if someone has a better understanding of math they will be able to understand more things in general -LRB-all else being held constant -RRB-.
Unsuccessful comment (∆=0 / P(∆=1)=0.039): > i understand including algebra and geometry , but i do n't see why they include trigonometry . if you know geometry but not trigonometry , you do n't know much geometry . high school geometry classes are supposed to include trigonometry . a lot of applications of geometry in higher-level math and in subjects such as physics will require trigonometry . i do n't know how authoritative -LSB-this source -RSB--LRB-<UNK> -RRB-is , but it seems to be a pretty good list of geometry topics you should master before moving on to <UNK> .

Conclusion
We presented the Attentive Interaction Model, which predicts an opinion holder's (OH's) change in view through argumentation by detecting vulnerable regions in the OH's reasoning and modeling the interaction between the reasoning and a challenger's argument. According to the evaluation on discussions from the Change My View forum, sentences identified by our model as vulnerable were addressed more often by successful challengers than by unsuccessful ones. The model also effectively captured interaction information, so that both vulnerability and interaction information increased accuracy in predicting an OH's change in view.
One key limitation of our model is that making a prediction based only on one comment is not ideal because we miss context information that connects successive comments. As a discussion between a challenger and the OH proceeds, the topic may digress from the initial post. In this case, detecting vulnerable regions and encoding interactions for the initial post may become irrelevant. We leave the question of how to transfer contextual information from the overall discussion as future work.
Our work is a step toward understanding how to model argumentative interactions that are aimed at enriching an interlocutor's perspective. Understanding the process of productive argumentation would benefit both the field of computational argumentation and social applications, including cooperative work and collaborative learning.

A.2 AIM
We implemented our model in PyTorch 0.3.0.

B Data Preprocessing
In the CMV forum, DeltaBot replies to an OH's comment with the confirmation of a ∆, along with the user name to which the OH replied. For most OH replies, the (non-)existence of a ∆ indicates whether a comment to which the OH replied changed the OH's view. However, an OH's view is continually influenced as they participate in argumentation, and thus a ∆ given to a comment may not necessarily be attributed to the comment itself. One example is when a comment does not receive a ∆ when the OH reads it for the first time, but the OH comes back and gives it a ∆ after they interact with other comments. In such cases, we may want to give credit to the comment that actually led the OH to reconsider a previous comment and change their view.
Hence, we use the following labeling that considers the order in which OHs read comments. We treat the (non-)existence of a ∆ in an OH comment as a label for the last comment that the OH read. We reconstruct the order in which the OH reads comments as follows. We assume that when the OH writes a comment, they have read all prior comments in the path to that comment.
Based on this assumption, we linearize (i.e., flatten) the original tree structure of the initial post and all subsequent comments into a linear sequence S. Starting with an empty S, for each of the OH's comments in chronological order, its ancestor comments that are not yet in S and the comment itself are appended to S. Then, for each of the OH's comments, its preceding comment in S is labeled with ∆ = 1 if the OH's comment has a ∆ and 0 otherwise. This ensures that the label of a comment to which the OH replied is the (non-)existence of a ∆ in the OH's first reply. If an OH reply is not the first reply to a certain comment (as in the scenario mentioned above), or a comment to which the OH replied is missing, the (non-)existence of a ∆ in that reply is assigned to the comment that we assume the OH read last, which is located right before the OH's comment in the restructured sequence.
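The linearization and labeling procedure described above can be sketched as follows. This is only a minimal illustration under our own assumptions, not the preprocessing code used for the experiments: the `Comment` record, its field names, and the `linearize_and_label` function are hypothetical, and the sketch labels whatever item directly precedes an OH comment in S, leaving any later filtering (e.g., of items written by the OH) out of scope.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Comment:
    # Hypothetical record; field names are illustrative only.
    comment_id: str
    parent_id: Optional[str]  # None for the initial post
    by_oh: bool               # True if written by the Opinion Holder
    timestamp: float
    has_delta: bool = False   # True if this OH comment awarded a delta


def linearize_and_label(comments: List[Comment]) -> Tuple[List[str], Dict[str, int]]:
    """Flatten the comment tree into a sequence S and label the item before each OH comment."""
    by_id = {c.comment_id: c for c in comments}
    sequence: List[str] = []
    in_sequence = set()
    labels: Dict[str, int] = {}

    # OH comments (excluding the initial post) in chronological order.
    oh_replies = sorted(
        (c for c in comments if c.by_oh and c.parent_id is not None),
        key=lambda c: c.timestamp,
    )
    for reply in oh_replies:
        # Walk up the tree, collecting ancestors that are not yet in S.
        # A missing parent simply ends the walk.
        pending: List[str] = []
        node = by_id.get(reply.parent_id)
        while node is not None and node.comment_id not in in_sequence:
            pending.append(node.comment_id)
            node = by_id.get(node.parent_id)
        # Append the collected ancestors from the root downward.
        for cid in reversed(pending):
            sequence.append(cid)
            in_sequence.add(cid)
        # The item right before the OH comment receives the label.
        if sequence:
            labels[sequence[-1]] = int(reply.has_delta)
        sequence.append(reply.comment_id)
        in_sequence.add(reply.comment_id)
    return sequence, labels


# Toy thread: two challengers reply to the post; the OH awards a delta to the second.
thread = [
    Comment("post", None, True, 0),
    Comment("c1", "post", False, 1),
    Comment("c2", "post", False, 2),
    Comment("r1", "c1", True, 3, has_delta=False),
    Comment("r2", "c2", True, 4, has_delta=True),
]
sequence, labels = linearize_and_label(thread)
```

In this toy thread, S becomes post, c1, r1, c2, r2, with c1 labeled 0 and c2 labeled 1.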
Good example

OH's initial post
`` buckle up , it 's the law '' is an appeal to authority , and therefore not a good slogan to get people to put on their seat belts . . i believe that `` buckle up , it 's the law '' is a very bad slogan , because it is an -LSB- appeal to authority -RSB- -LRB- <UNK> -RRB- which can be rejected easily in people 's minds if they are n't aware of the purpose of a law . instead , an appeal to the motorist 's intelligence by pointing out the consequences of not buckling up , and thus making motorists aware of the possible consequences of not buckling up and making it obvious why it is rather sensible to wear one 's seat belt would be a lot more effective . -LSB- this german ad posted along public roads throughout germany -RSB- -LRB- <UNK> -RRB- is an excellent example of this . the text translates to `` one is distracted , four die '' . a brief but concise outline of cause and effect , enough to raise awareness .

Two comments
∆=1 / P(∆=1)=0.057 this slogan is for people who do not seem to have the iq or common sense to take basic precautions for their own safety . there are two ways to convince these prospective candidates of the darwin award - authority or emotion . appeal to emotion requires some introspection and determining your own worth to your family etc. this is intellectually more involved than common sense and thus clearly beyond the capabilities of these individuals . therefore , an appeal to authority , like law , is your only chance .
∆=0 / P(∆=1)=0.021 but everyone knows there a penalties and fines for breaking the law . its not an appeal to authority , its pointing out the consequences -LRB- the fines -RRB- . and appeal to authority would be closer to `` buckle up , the government says you should '' .

Good example

OH's initial post
shampoo and special body wash products are unnecessary . . bar soap is all you need . and you dont wash your hair at all , you just rinse it . sometimes i use shampoo , maybe once in a month or two , if i did something specially dirty or got chemicals in my hair etc. but your hair is healthier without it , and if i cared enough to find an alternative i would use something natural . if you quit using shampoo , your hair might be greasy for the first couple days , but with nothing but proper rinsing your hair will be able to clean itself . face wash is unnecessary as well . bar soap is fine . special body washes are unnecessary . it is all a marketing ploy . i am a clean and beautiful boy who has no problem attracting the opposite sex , and have never been led to suspect that my habits are somehow smelly or unclean . what is the point of using these products ? please , reddit , change my view : <UNK> products are a scam .

Two comments
∆=1 / P(∆=1)=0.277 it 's hard to say without seeing the skin first hand , but -LRB- if my assumptions were right on everything else other than hair color -RRB- hypothetically ... i suggest using a <UNK> <UNK> - something very gentle on the skin . no more than once every five days . wash it at night , as your skin type -LRB- if my guesses are right -RRB- produces more oil when you sleep . also , do not wash your face in the shower , do it afterwards . your <UNK> are open in the shower -LRB- due to the heat -RRB- , and whatever you clean is going to fill up with soap residue after you washed it . that residue can clog your <UNK> and lead to a break out . pro tip : rinse your face after washing twice - first with hot water , then with cold water . this closes your <UNK> and limits <UNK> . hair ? i 'd have to see it up close , but some simple recommendations -LRB- if my assumptions about slightly oily scalp and hair are right -RRB- would be <UNK> -LRB- brand -RRB- <UNK> oil shampoo and conditioner . let your conditioner sit and soak for at least 4 minutes before rinsing it out . you do n't need to use much , just enough to cover it . if you want or need further help - feel free to pm me . without sounding all pedo -LRB- do n't look at my username -RRB- , take a few <UNK> pics of your face and hair -LRB- so i can see the skin and your hair structure -RRB- and link me to the pics in the pm . i can give you a much better breakdown of what to do when i can see what i am working with . or if you have the balls , you can post those pics here too . up to you , and yes - wash your sheets more often - chicks love a freshly washed set of sheets .
∆=0 / P(∆=1)=0.028 if your hair is actually dirty , you must clean it . for someone with short hair and soft water , soap will be fine . however , in hard water the polar end of the soap binds to calcium and forms a sticky scum that does not easily wash out of long hair . a detergent like shampoo does not have this problem .

Figure 5: Successful examples of vulnerable region detection.

Bad example

OH's initial post
i do n't feel obligated to ask permission to take cosplayer pictures at a convention . . i 've been to a prominent anime convention -LRB- ~ 8000 annual attendees -RRB- , 6 or 7 years now and have never felt the need to ask anyone 's permission before taking pictures . i 'll ask permission to take a picture if : * the cosplayer is dressed up as something i really like and no one else is taking their picture - i want them to do their pose or whatever if they do n't mind because it 's from something i like * they 're dressed in something suggestive , showing a lot of skin , or look uncomfortable being dressed that way in a public setting - i do n't usually take these people 's pictures anyways because 9 times out of 10 me feeling creepy is n't worth the value i 'd get having the picture * they might otherwise enjoy being asked to get their picture taken - little girl , something obscure , whatever i typically wo n't ask to take a picture if : * they 've already got a big crowd of people around them taking pictures * they 've got a cool costume i want to remember , but i do n't care enough to have them do their pose or whatever . * i want to capture some aspect of the convention and anime culture itself - to me a convention is like going to a fair or a festival , it 's an event i want pictures of i think the main reason people are so strongly opposed to people taking unwarranted pictures is creepy people , and that 's a valid concern . however i think with the general discretion that i follow , asking every single person for their picture is a bit unnecessary . at the same time , i know a lot of people feel very strongly about photographic consent and i may very well be overlooking something important so change my view ! edit : wording

Two comments
∆=1 / P(∆=1)=0.018 > i see that as a sort of amateur performance art as someone who has <UNK> , i do n't agree . a street magician , <UNK> , or someone giving a public speech are all asking for your attention . they 're doing what they 're doing for the sake of their audience . some cosplayers fit this category , but for some they just wan na dress up in a cool costume for the day and a con is the best place to do that .
∆=0 / P(∆=1)=0.004 would you walk up to someone on the street and take their picture without asking ?

Bad example

OH's initial post
european style pooping is the worst way to go to the bathroom . 1 . squatting is more comfortable , easier and healthier than sitting . it creates less stress on the the <UNK> muscle allowing for a smoother uninterrupted experience . it plays well with gravity so less pressure is needed and lowers the risk of cancer and other ailments . 2 . toilet paper is messy , expensive and damages the environment . when washed properly the use of your hand is preferable to toilet paper , it might sound disgusting but when you think about it using a thin piece of frail paper to smear around fecal matter with no water or soap is even worse . 3 . modern <UNK> toilets are large , bulky and complex . they take more space , require more maintenance and are ultimately dirtier as butts keep touching them .

Two comments
∆=1 / P(∆=1)=0.131 1 . -RRB- i will concede that on a biological level , squatting is the `` default '' position so our biology and anatomy generally works better in that position . 2 . -RRB- toilet paper is a shield that , hopefully , keeps your hand and any small cuts , or splits cleaner and less prone to nasty infections . it does , as other commentators have said , keep feces out from underneath your fingernails . the associated costs of water usage , soap also affect the environment . -LRB- though it must be noted that you still should wash your hands after <UNK> it just takes less if your not scrubbing last nights dinner off your hand . -RRB- 3 . -RRB- bulky , dirty , and in need of maintenance i will give you . however , if we are talking about a toilet in a home cleanliness should be part of the necessary routine that would be needed if you had say , a bucket and a floor level toilet system . the complexity in a toilet provides a way to shield sewer gasses from coming back up into the restroom . it 's not a perfect system but it 's better than up against a tree in the woods .
∆=0 / P(∆=1)=0.105 1 -RRB- this may be true , but there is no evidence that i am aware of that supports any of your claims . also , cancer ? really ? that sounds almost like a joke : `` i squat when i poop so i wo n't get cancer ! '' 2 -RRB- soap and other cleaning materials also have costs associated with them . the cleanliness bonus is marginal for people who shower daily . you 'll need to use more water too to wash up . are you sure that this is really a plus ? 3 -RRB- they are also a great way to dispose of waste : it has to go somewhere , it can be toxic to plants , and toilets take up a negligibly larger amount of space than a bucket , which then requires ` maintenance ' every time it needs emptied . butts are also , with the exception of the asshole itself , -LSB- probably the cleanest part of our bodies . -RSB- -LRB- <UNK> -RRB- they 're always covered and we rarely directly touch anything with them ; why would they be unclean ?

n-grams for ∆ = 1: and, in, for, use, it, on, thanks, often, delta, time, depression, -RRB-, lot, -LRB-, or, i, can, &, with, more, as, band, *, #, me, -LRB--RRB-, can be, has, deltas, when
n-grams for ∆ = 0: ?, >, sex, why, do you, wear, relationship, child, are you, op, mother, should, wearing, teacher, then, it is, same, no, circumcision, you are, then you, baby, story
Table 3: Top n-grams with the most positive/negative weights for logistic regression.

D Topic Similarity between Sentences
The topic similarity between a pair of sentences is computed as the cosine similarity between the topic distributions of the sentences.
The first step is to extract topics. Using LatentDirichletAllocation in scikit-learn v0.19.1, we ran LDA on the entire data with 100 topics, taking each post/comment as a document. We treat the top 100 words for each topic as topic words.
The second step is to compute the topic distribution of each sentence. We simply counted the frequency of occurrences of topic words for each topic, and normalized the frequencies across topics.
Lastly, we computed the cosine similarity between the topic distributions of a pair of sentences.
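The second and third steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not our actual implementation: it assumes the topic words have already been extracted by LDA and are given as one set of words per topic, the function names `topic_distribution` and `cosine_similarity` are hypothetical, and returning an all-zero distribution (and a similarity of 0) for a sentence that contains no topic words is our own assumption.

```python
import math
from typing import List, Sequence, Set


def topic_distribution(tokens: Sequence[str], topic_words: List[Set[str]]) -> List[float]:
    """Count topic-word occurrences per topic and normalize across topics."""
    counts = [sum(1 for tok in tokens if tok in words) for words in topic_words]
    total = sum(counts)
    if total == 0:
        # Assumption: a sentence with no topic words gets an all-zero vector.
        return [0.0] * len(topic_words)
    return [count / total for count in counts]


def cosine_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Cosine similarity between two topic distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    # Assumption: similarity is defined as 0 when either vector is all zeros.
    return dot / norm if norm > 0 else 0.0


# Toy topic words (two topics instead of 100, a handful of words instead of 100).
toy_topics = [{"sat", "math", "college"}, {"hair", "soap", "shampoo"}]
s1 = topic_distribution("the sat math section".split(), toy_topics)
s2 = topic_distribution("college math is hard".split(), toy_topics)
s3 = topic_distribution("bar soap for hair".split(), toy_topics)
same_topic = cosine_similarity(s1, s2)
different_topic = cosine_similarity(s1, s3)
```

With these toy topics, the first two sentences fall entirely under the first topic and the third under the second, so `same_topic` is 1 and `different_topic` is 0.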

E Top TFIDF n-grams
The n-grams that contribute most to ∆ prediction for logistic regression are shown in Table 3.