Argument Strength is in the Eye of the Beholder: Audience Effects in Persuasion

Americans spend about a third of their time online, with many participating in online conversations on social and political issues. We hypothesize that social media arguments on such issues may be more engaging and persuasive than traditional media summaries, and that particular types of people may be more or less convinced by particular styles of argument, e.g. emotional arguments may resonate with some personalities while factual arguments resonate with others. We report a set of experiments testing at large scale how audience variables interact with argument style to affect the persuasiveness of an argument, an under-researched topic within natural language processing. We show that belief change is affected by personality factors, with conscientious, open and agreeable people being more convinced by emotional arguments.


Introduction
Americans spend a third of their online time on social media, with many participating in online conversations about education, public policy, or other social and political issues. Our hypothesis is that online dialogs have important properties that may make them a useful resource for educating the public about such issues. For example, user-generated content might be more engaging and persuasive than traditional media, due to the prevalence of emotional language, social affiliation, conversational argument structure and audience involvement. Moreover, particular types of people may be more or less convinced by particular styles of argument, e.g. emotional arguments may resonate with some personalities while factual arguments resonate with others.
Table 1. Factual and emotional dialogic exchanges about the death penalty.
Factual: Death Penalty
Q1: I'm sure there have been more repeat murderers than innocent people put to death. As far as the cost goes, is that really an issue? Execution Room = $10,000. Stainless Steel Table = $2,000. Leather Straps = $200. Lethal Injection Chemicals = $5,000. Knowing this person will never possibly be able to kill again = PRICELESS
R1: Actually the room, straps, and table are all multiuse. And the drugs only cost Texas $86.08 per execution as of 2002.
Emotional: Death Penalty
Q2: You mean, the perpetrator is convicted and the defender acquitted? Yes, that's the rule and not the exception. Notice here how no-one ended up dead, or even particularly seriously injured. Additionally the circumstances described are incredibly rare, that's why it makes the news.
R2: The defender shouldn't even have been brought to trial in the first place. That doesn't make it any better. Somebody breaks into your home and threatens your family with rape and murder, they deserve serious injury at the very least.

Table 2. Curated Summary: Death Penalty
PRO: Proponents of the death penalty say it is an important tool for preserving law and order, deters crime, and costs less than life imprisonment. They argue that retribution or "an eye for an eye" honors the victim, helps console grieving families, and ensures that the perpetrators of heinous crimes never have an opportunity to cause future tragedy.
CON: Opponents of capital punishment say it has no deterrent effect on crime, wrongly gives governments the power to take human life, and perpetuates social injustices by disproportionately targeting people of color (racist) and people who cannot afford good attorneys (classist). They say lifetime jail sentences are a more severe and less expensive punishment than death.

For example, contrast the two informal dialogic exchanges about the death penalty in Table 1 with the traditional media professional summary in Table 2. We might expect the argument in Table 2 to be more convincing, because it is carefully written to be balanced and exhaustive (Reed and Rowe, 2004). On the other hand, it seems possible that people find dialogic arguments such as those in Table 1 more engaging and learn more from them. Indeed, about 90% of the people in online forums are so-called lurkers (Whittaker, 1996; Nonnecke and Preece, 2000; Preece et al., 2004) who do not post, suggesting that they are in fact reading opinionated dialogs such as those in Table 1 for interest or entertainment.
Research in social psychology identifies three factors that affect argument persuasiveness (Petty and Cacioppo, 1986; Petty and Cacioppo, 1988).
The ARGUMENT includes the content and its presentation, e.g. whether it is a monolog or a dialog, or whether it is factual or emotional as illustrated in Table 1 and Table 2. The AUDIENCE factor models people's prior beliefs and social affiliations as well as innate individual differences that affect their susceptibility to particular arguments or types of arguments (Anderson, 1971; Davies, 1998; Devine et al., 2000; Petty et al., 1981). Behavioral economics research shows that the cognitive style of the audience interacts with the argument's emotional appeal: emphasizing personal losses is more persuasive for neurotics, whereas gains are effective for extraverts (Carver et al., 2000; Mann et al., 2004). The SOURCE is the speaker, whose influence may depend on factors such as attractiveness, expertise, trustworthiness or group identification or homophily (Eagly and Chaiken, 1975; Kelman, 1961; Bender et al., 2011; Luchok and McCroskey, 1978; Ludford et al., 2004; McPherson et al., 2001).
We present experiments evaluating how properties of social media arguments interact with audience factors to affect belief change. We compare the effects of two aspects of the ARGUMENT: whether it is monologic or dialogic, and whether it is factual or emotional. We also examine how these factors interact with properties of the AUDIENCE. We profile audience prior beliefs to test if more neutral people are swayed by different types of arguments than people with entrenched beliefs. We also profile the audience for Big Five personality traits to see whether different personality types are more open to different types of arguments, e.g., we hypothesize that people who are highly agreeable (A) might be more affected by the combative style of emotional arguments. We provide a new corpus for the research community of audience personality profiles, arguments, and belief change measurements.
Audience factors have been explored in social psychological work on persuasion, but have been neglected in computational work, which has largely drawn from sentiment, rhetorical, or argument structure models (Habernal and Gurevych, 2016b; Conrad et al., 2012; Boltuzic and Šnajder, 2014; Choi and Cardie, 2008). We demonstrate that, indeed, undecided people respond differently to arguments than entrenched people, and that the responses of undecided people correlate with personality. We show that this holds across an array of different arguments. Our research questions are:
• Can we mine social media to find arguments that change people's beliefs?
• Do different argument types have different effects on belief change?
• Do personality and prior beliefs affect belief change?
• Are different personality types differently affected by factual vs. emotional arguments?
Our results show a small but highly reliable effect: short arguments derived from online dialogs do lead people to change their minds about topics such as abortion, gun control, gay marriage, evolution, the death penalty and climate change. As expected, opinion change is greater for people who are initially more neutral about a topic than for those who are entrenched. However, personality variables also have a clear effect on opinion change: neutral, balanced arguments are more successful with all personality types, but conscientious people are more convinced by dialogic emotional arguments, and agreeable people are more persuaded by dialogic factual arguments. We describe how we plan to use these findings to select and repurpose social media arguments, adapting them to people's individual differences and thus maximizing their educational impact.

Related Work
Previous work on belief change has primarily focused on single, experimentally crafted, persuasive messages, rather than exploring whether user-generated dialogic arguments can be repurposed to persuade. Recently, however, several papers have begun to investigate two challenges in argument mining: (1) understanding the structure of an argument and extracting argument components (Lippi and Torroni, 2015; Nguyen and Litman, 2015; Stab and Gurevych, 2014; Biran and Rambow, 2011); and (2) understanding what predicts the persuasiveness of web-sourced argumentative content (Habernal and Gurevych, 2016b; Fang et al., 2016; Wachsmuth et al., 2016; Habernal and Gurevych, 2016a; Tan et al., 2016).
Tan et al. (2016) study belief change in the Reddit /r/ChangeMyView subreddit (CMV), in which an original poster (OP) challenges others to change his/her opinion. They build logistic regression models to predict argument success, identifying two conversational dynamic factors: a) early potential persuaders are more successful and b) after 4 exchanges, the chance of persuasion drops virtually to zero. Linguistic factors of persuasive posts include: a) dissimilar content words to the OP, b) similar stop words, c) being lengthy (in words, sentences, and paragraphs), d) italics and bullets. Finally, susceptibility to persuasion is correlated with singular vs. plural first-person pronouns, which the authors relate to the personality trait of Openness to Experience. The CMV subreddit offers a unique window into how persuasion of self-declared open-minded people occurs online. However, while Tan et al. find potential proxies for personality traits, they cannot examine traits directly because they do not have personality profiles as we do here. They also do not examine the effect of argument style as we do.
Recent work (Habernal and Gurevych, 2016b; Habernal and Gurevych, 2016a) also examines what makes an informal social media argument convincing. The authors created a new dataset of argument pairs, annotated for which argument is more convincing, along with the reasons annotators gave for its convincingness. They test several models for predicting convincingness, comparing an SVM with engineered linguistic features to a BLSTM; both models perform similarly. In contrast to our experiments, they do not explore factors of the audience or explicitly vary the style of the argument.
Previous work also tests the hypothesis that dialogic exchanges might be more engaging, in the context of expository or car sales dialog (André et al., 2000; Lee, 2010; Craig et al., 2006; Stoyanchev and Piwek, 2010). Work comparing monologic vs. dialogic modes of providing information suggests that dialogs: (1) are more memorable and engaging, (2) stimulate the audience to formulate their own questions, and (3) allow audiences to be more successful at following communication (Lee et al., 1998; Fox Tree, 1999; Suzuki and Yamada, 2004; Driscoll et al., 2003; Fox Tree and Mayer, 2008; Liu and Fox Tree, 2011).
Other work (Vydiswaran et al., 2012) explores how user-interface factors (e.g., number and order of argument presentation, whether and how arguments are rated) affect how readers process arguments. Several factors increased the number of passages read, including explicitly presenting contrasting viewpoints simultaneously. This exercise caused people with strong beliefs (about the healthiness of milk) to moderate their views after 20-30 minutes of concentrated study. We do not concentrate on interface factors, and instead explore how persuasiveness relates to audience factors and argumentative style. Also, our experiments are run online with hundreds of users, rather than as a controlled study in the lab.

Experimental Method
Our experimental method consists of the following steps:
• Select user-generated dialogs with persuasive argument features from an online corpus of socio-political debates, exploring the role of affect (Sec. 3.1).
• Profile subjects for personality traits and prior beliefs about socio-political issues (Sec. 3.2).
• Expose subjects to user-generated, factual vs. emotional dialogic exchanges and compare the effects on belief change to balanced, curated arguments (Sec. 3.3).
• Conduct experiments to predict the degree of belief change as a function of prior belief, personality and type of argument.
The participants were pre-qualified using a reading comprehension task that checked their responses against a gold standard to ensure that they read the arguments carefully. Because we make many comparisons, and our experiments are conducted at large scale, all of our results incorporate Bonferroni corrections.
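To make the correction concrete, the following is a minimal sketch of a standard Bonferroni adjustment, in which each p-value is multiplied by the number of comparisons (equivalently, the significance threshold is divided by that number). The function names and the example p-values are illustrative assumptions, not the analysis scripts used in our experiments.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted p-values.

    Each raw p-value is multiplied by the number of comparisons m
    and capped at 1.0.
    """
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each comparison whose raw p-value is below alpha / m."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]


# Hypothetical p-values from three comparisons (not results from the paper).
example = [0.01, 0.04, 0.5]
adjusted = bonferroni_adjust(example)
flags = bonferroni_significant(example)
```

With three comparisons, only a raw p-value below 0.05 / 3 survives the correction, which is why the procedure becomes stricter as the number of comparisons grows.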

Dialog Selection: Identifying Socio-Emotional Arguments
Our work requires a new experimental corpus that is sensitive to readers' prior beliefs and personalities. We utilize online dialogs from 4forums.com downloaded from The Internet Argument Corpus (IAC) (Walker et al., 2012c). The IAC contains quote/response pairs of targeted arguments between two people (Table 1) on topics such as: death penalty, gay marriage, climate change, abortion, evolution and gun control. Each argument is annotated to distinguish arguments making strong appeals to emotional factors versus straightforwardly factual arguments.
We selected a subset of extreme exemplars of factual (FACT) versus emotional (EMOT) arguments, defined as Q/R pairs reliably annotated to be at the extreme ends of the fact/emotion scale, i.e. responses with an average ≥ 4 annotation were considered factual, and those whose annotation averaged ≤ −4 were considered emotional on a scale of -5 to 5. Table 1 illustrates both factual (R1) and emotional (R2) arguments, with additional examples for other topics in Table 3.
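The selection rule above can be sketched as a simple threshold on the mean annotation of each response. This is an illustrative sketch only: the data layout (a mapping from a Q/R pair identifier to a list of annotator scores), the identifiers, and the function names are assumptions, and the IAC itself is distributed in a different format.

```python
from statistics import mean

FACT_THRESHOLD = 4.0   # average annotation >= 4 is treated as factual
EMOT_THRESHOLD = -4.0  # average annotation <= -4 is treated as emotional


def label_response(annotations):
    """Return 'FACT', 'EMOT', or None for one response's annotator scores."""
    avg = mean(annotations)
    if avg >= FACT_THRESHOLD:
        return "FACT"
    if avg <= EMOT_THRESHOLD:
        return "EMOT"
    return None  # not an extreme exemplar, so it is excluded from both subsets


def select_exemplars(pairs):
    """Group hypothetical Q/R pair ids by their extreme fact/emotion label."""
    selected = {"FACT": [], "EMOT": []}
    for pair_id, annotations in pairs.items():
        label = label_response(annotations)
        if label is not None:
            selected[label].append(pair_id)
    return selected


# Hypothetical annotations on the -5 to 5 fact/emotion scale.
toy_pairs = {"qr_a": [5, 4, 4], "qr_b": [-5, -4, -4.5], "qr_c": [1, -2, 0]}
exemplars = select_exemplars(toy_pairs)
```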
In the IAC, 95% of the Q-R pairs are disagreements, and the FACT and EMOT datasets were selected to contain a similar proportion. There was no correlation between agreement/disagreement and emotionality (r = 0.07, ns).

Personality
Personality is usually measured with a standardized survey that calculates a scalar value for the five OCEAN traits: openness to experience (O), conscientiousness (C), extraversion (E), agreeableness (A), and neuroticism (N) (Goldberg, 1990; Norman, 1963). We first conducted an experiment to profile the Big Five personality traits of 637 Turkers using the Ten Item Personality Inventory (TIPI) (Gosling et al., 2003). The TIPI instrument defines each person on a scale from 1 to 7 with 0.5 precision. To guarantee the reliability of our results, we then verified that our pre-qualified Turkers are representative of the population as a whole, by comparing the means and standard deviations of our sample of 637 Turkers with the national standards given in Gosling et al. (2003). Table 4 shows that our survey means and standard deviations are very close to the national norms, suggesting our sample is representative of the public in general, and hence can be used to validate whether social media arguments could fruitfully be used to educate the public.
Prior beliefs affect people's susceptibility to arguments (Anderson, 1971; Davies, 1998; Devine et al., 2000), so we wanted to establish the baseline beliefs of our pre-qualified Turkers before they had been exposed to any arguments about a topic. We therefore collected each Turker's initial stance on a topic by asking them to answer a simple stance question with no context, for example: Should the death penalty be allowed? Likert responses were recorded on a -5 to 5 slider scale with 0.01 degrees of precision, with labels on the slider of "Yes", "No", or "Neutral".
Table 5. Curated Summary: Abortion
PRO: Proponents, identifying themselves as pro-choice, contend that abortion is a right that should not be limited by governmental or religious authority, and which outweighs any right claimed for an embryo or fetus. They argue that pregnant women will resort to unsafe illegal abortions if there is no legal option.
CON: Opponents, identifying themselves as pro-life, assert that personhood begins at conception, and therefore abortion is the immoral killing of an innocent human being. They say abortion inflicts suffering on the unborn child, and that it is unfair to allow abortion when couples who cannot biologically conceive are waiting to adopt.

Our goal is to compare the belief change that results from social-media dialogs with the belief change from professionally-curated monologs. We selected the balanced, monologic argument summaries from the website ProCon.org (Table 2, with an additional example in Table 5). The arguments from ProCon.org are of very high quality and are produced by domain experts.
After probing initial beliefs, we presented participants with one of the three different argument types to test their effect on belief change: a Curated Monolog (MONO) (Table 2), an emotional argument (EMOT) (R2 in Table 1), or a factual argument (FACT) (R1 in Table 1). After each person read one of these three types of arguments, we retested their reactions to the original stance question, while viewing the argument. Responses were again recorded on a -5 to 5 slider scale with 0.01 degrees of precision, with labels on the slider of "Yes", "No", or "Neutral". We computed belief change by measuring differences in stance between the initial and post-argument responses.

Entrenchment and Belief Change
Our first question is whether our method changed participants' beliefs. Table 6 shows belief change as a function of argument type: monologs (MONO), factual (FACT) and emotional (EMOT). Belief change occurred for all argument types, and the change was statistically significant as measured by paired t-tests (t(5184) = 38.31, p < 0.0001). This confirms our hypothesis that social media can be mined for persuasive materials.
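As a rough illustration of the paired t-test reported above, the sketch below computes the t statistic and its degrees of freedom from matched initial and post-argument stance scores. The function name and the toy scores are hypothetical, and we assume here that each pair consists of one participant's two responses to the same stance question; with 5185 paired responses the degrees of freedom would be 5184, as in the reported t(5184).

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t, df) for a paired t-test on matched before/after scores."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    # Work on the per-participant differences between the two measurements.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    # t is the mean difference divided by its standard error; df is n - 1.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical stance scores for four participants (not data from the paper).
t_stat, df = paired_t([0.0, 0.0, 0.0, 0.0], [1.0, 2.0, 1.0, 2.0])
```

The p-value would then be read from a t distribution with the returned degrees of freedom, for example with a statistics package.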
In addition, all three types of arguments independently led to significant changes in belief.
One of the strongest theoretical predictions is that people with entrenched beliefs about an issue are less likely to change their mind when provided new information about that issue. Table 6 shows the relationship between initial beliefs and extent of belief change. We defined people as having more entrenched initial beliefs if their response to the initial stance question was within 0.5 points of the two ends of the scale, i.e. (1.0-1.5) or (4.5-5.0), indicating an extreme initial view.
We tested whether people who were more entrenched initially showed less change than those who were initially more neutral. We conducted a 2 Initial Belief (Entrenched/Neutral) X 3 Argument Type (MONO/EMOT/FACT) ANOVA, with Belief Change as the dependent variable, and Initial Belief and Argument Type as between-subjects factors. Again, as expected, initially Entrenched people showed less change (M = 0.43) than those who began with Neutral views (M = 0.51), ANOVA (F(1,5179) = 5.97, p = 0.015).

Argument Type and Belief Change
We wanted to test whether the engaging, socially interesting, dialogic materials of EMOT and FACT might promote more belief change than balanced curated monologic summaries. We tested the differences between argument types, finding a main effect for argument type (F(2,5179) = 31.59, p < 0.0001), with Tukey post-hoc tests showing MONO led to more belief change than both EMOT and FACT (both p < 0.0001), but no differences between EMOT and FACT overall across all subjects (see Table 6). Finally, there was no interaction between Initial Belief and Argument Type (F(2,5179) = 1.25, p > 0.05): so although neutrals show more belief change overall, this susceptibility does not vary by argument type.

Predicting Belief Change
Our results so far show that our arguments changed people's beliefs as a function of their prior beliefs and argument type. However, we aim to automatically predict belief change, and hypothesize that knowing a person's personality in combination with their prior beliefs will allow us to select social-media arguments that are more persuasive for a particular individual.
Thus, we test whether providing a learner with features about a person's personality improves performance for predicting belief change, when compared with providing information about degree of entrenchment alone. We use different representations for personality and prior beliefs as features: the raw score from the Likert slider for belief change and the TIPI score, normalizations of the raw scores according to the distributions per topic, and categorical binning of the transformed scores.

Feature Development and Selection
New features were created by computing the z-transformation score from the raw prior belief and personality trait scores:

$$ z_i = \frac{x_i - \bar{x}_i}{\sigma_i} \qquad (1) $$

Applying Equation 1 to the raw data yields a standardized distribution whose mean is 0 and whose standard deviation is 1. For prior beliefs, $x_i$ is an individual prior belief for a particular topic, $\bar{x}_i$ is the mean, and $\sigma_i$ is the standard deviation for the particular topic.
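The per-topic z-transformation can be sketched as follows. This is a minimal illustration rather than our feature extraction code: the record layout (topic, raw score), the topic labels, and the use of the population standard deviation are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def z_scores_by_topic(records):
    """Standardize raw scores within each topic.

    records is a list of (topic, raw_score) tuples; the returned list holds
    the z-score of each record in the same order.
    """
    by_topic = defaultdict(list)
    for topic, score in records:
        by_topic[topic].append(score)
    # Assumption: population standard deviation; a sample estimate would
    # differ slightly for small groups.
    stats = {t: (mean(v), pstdev(v)) for t, v in by_topic.items()}
    result = []
    for topic, score in records:
        mu, sigma = stats[topic]
        # A topic with no spread carries no information, so map it to 0.0.
        result.append(0.0 if sigma == 0 else (score - mu) / sigma)
    return result


# Hypothetical prior beliefs on the -5 to 5 slider for two topics.
toy = [("death_penalty", -2.0), ("death_penalty", 0.0),
       ("death_penalty", 2.0), ("gun_control", 1.0), ("gun_control", 3.0)]
z = z_scores_by_topic(toy)
```

Standardizing within a topic means that a score is interpreted relative to how other participants answered the same stance question, which is the motivation for normalizing according to the distributions per topic.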
Categorical bins are derived from the transformed scores to describe the direction of the belief change by comparing prior and final recorded beliefs. The belief change is positive or negative depending upon where the Turkers rate themselves on the belief scale, moving more towards one side (1) or the other (5). Next, to control for variance, we apply a z-transformation to the change scores to standardize them. We classify the resulting distribution into three bins: Low, Medium, and High. The interpretation of what stance the Low and High bins represent is strictly topic dependent. The Medium bin consists of z-transformation values between -1 and 1. These are the people whose belief change is less than one standard deviation from the transformed mean. The Low bin contains z-transformation scores of less than -1 and translates to belief changes of a large magnitude (more than a standard deviation from the mean) in a negative direction, where again, the meaning of "negative" is dependent upon how the question was framed. The High bin contains z-transformation scores of greater than 1 and translates to belief changes of a large magnitude in a positive direction. For example, the stance question "Should the death penalty be allowed?" has "no" at the -5 end and "yes" at the +5 end of the Likert scale. A Low bin indicates movement in the direction of the "no" stance and a High bin movement towards the "yes" stance.
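A minimal sketch of this binning is given below. The function names are hypothetical, the change score is assumed to be the signed difference between final and prior belief, and we assume that values of exactly -1 or 1 fall in the Medium bin, which the description above leaves open.

```python
from statistics import mean, pstdev


def bin_z(z):
    """Map a z-transformed score to 'Low', 'Medium', or 'High'."""
    if z < -1.0:
        return "Low"
    if z > 1.0:
        return "High"
    return "Medium"  # assumption: boundary values of exactly -1 or 1 land here


def bin_belief_changes(priors, finals):
    """Bin signed belief changes (final minus prior) after standardizing them."""
    changes = [final - prior for prior, final in zip(priors, finals)]
    mu, sigma = mean(changes), pstdev(changes)
    if sigma == 0:
        # Everyone changed by the same amount, so no one is an outlier.
        return ["Medium"] * len(changes)
    return [bin_z((c - mu) / sigma) for c in changes]


# Hypothetical prior and final stances for six participants.
bins = bin_belief_changes([0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 3, -3])
```

In this toy example the four participants who did not move land in Medium, while the two who moved three points in opposite directions land in High and Low respectively.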
Bins were also derived for the personality traits, e.g. for Openness, the High bin indicates someone who is very open, the Medium bin is average, and the Low bin is not open at all.
Finally, a binary feature was created to represent how entrenched an individual is in a particular topic. This feature is based on the raw prior belief score and is True if the prior belief score is within 0.5 points of either end point on the stance scale. This feature is different from the prior belief bins because the entrenchment feature groups together people who are at the extremes on both sides of the stance scale, while the prior belief bins distinguish between the two ends.
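The entrenchment feature amounts to a distance check against both ends of the stance scale, as in the sketch below. The function name and the inclusive comparison are assumptions; the scale end points are parameters, with defaults matching the -5 to 5 slider described earlier.

```python
def is_entrenched(prior_belief, scale_min=-5.0, scale_max=5.0, margin=0.5):
    """Return True if the prior belief is within `margin` of either end point.

    Assumption: the boundary itself counts as entrenched (inclusive check).
    """
    near_low_end = prior_belief <= scale_min + margin
    near_high_end = prior_belief >= scale_max - margin
    return near_low_end or near_high_end


# Hypothetical prior beliefs: both extremes map to the same True value,
# which is what separates this feature from the directional prior belief bins.
flags = [is_entrenched(x) for x in (-4.75, 0.0, 4.5)]
```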
We created a development set using data from a prior Mechanical Turk experiment which had 20 HITs, 5 questions per HIT, and 20 people who completed each HIT. In the same manner as the FACT and EMOT HITs, these Turkers (whose personality was already profiled) were asked about their prior beliefs about a topic, then presented with a factual or emotional argument. But in this case they were asked to rate the strength of the argument rather than to report their belief about the topic. We then identified the combination of features that best predicted argument strength in this development data, and then used this feature set for the belief change experiments below. Turkers who participated in this initial study did not participate in the belief change study and vice versa.
Results on the development set showed that the z-transformation scores for prior belief and personality performed better than the raw scores and bins. On the other hand, the belief change feature was most effective when represented as a categorical variable via binning and directionality. We also found that it is better to have both the z-transformed prior belief feature and the entrenchment feature. Thus our experiments below use these feature representations.
We test on three different datasets: MONO, FACT and EMOT, which elicit responses to the monologic summaries and to the factual and emotional dialogic arguments, respectively. The FACT and EMOT datasets carry a scalar value for their degree of factuality or emotionality on a scale of -5 to +5. We create a feature with this value, derived from the crowdsourced Turker judgments about the degree of Fact/Emotion in a Q/R pair, as described earlier. The monologic summaries (MONO) are assumed to be neutral and are not assigned a value for degree of factuality or emotionality.

Belief Change Experimental Results
Our dataset consists of 5185 items, with 3185 responses to the balanced MONO summaries, 1020 responses to FACT, and 980 responses to EMOT. We first applied 10-fold cross-validation with Naive Bayes, Nearest Neighbor, AdaBoost, and JRIP, from the Weka toolkit (Hall et al., 2005). Overall, Naive Bayes had the most consistent scores with our feature sets, thus we only report Naive Bayes experimental results below.
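For readers unfamiliar with the evaluation protocol, the sketch below shows how 10-fold cross-validation partitions a dataset so that every item is used for testing exactly once. It is a plain, unstratified illustration written for this exposition; the function name and seed are assumptions, and it is not the fold assignment procedure implemented in Weka.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# With 5185 items, each fold holds out 518 or 519 items for testing.
fold_sizes = [len(test) for _, test in k_fold_indices(5185, k=10)]
```

A classifier would be trained on each train split and scored on the matching test split, with the reported score averaged over the ten folds.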
Seven feature sets were created for each of the three {MONO, FACT, EMOT} datasets. The None feature sets are the no-personality baseline within each dataset. These baseline features contain no information about the personality of the unseen human subjects. We use the {MONO, FACT, EMOT}+None feature sets for testing our hypothesis that personality affects belief change, and our ability to predict belief change using personality features. The All feature sets have information about all of the human subjects' personality traits as 5 distinct features. The remaining five {O,C,E,A,N} feature sets examine the effect of providing information to the learner about personality using only one personality trait at a time, in order to determine if any personality trait has a larger impact on belief change prediction. Table 7 summarizes our key results, reporting accuracy, precision, recall, and F1 for predicting belief change as a discrete bin: Low, Medium, or High. We balanced each dataset to contain the same number of instances in each bin, thus the accuracy for majority classification is 33% (Row 1).
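The seven feature sets per dataset can be enumerated as in the sketch below. The feature names (`prior_belief_z`, `entrenched`, `fact_emotion_score`, and the `*_z` trait names) are hypothetical labels chosen for illustration; they stand for the z-transformed prior belief, the entrenchment flag, the fact/emotion value, and the z-transformed trait scores described above.

```python
TRAITS = ["O", "C", "E", "A", "N"]
BASE_FEATURES = ["prior_belief_z", "entrenched"]  # hypothetical feature names


def build_feature_sets(dataset):
    """Return the seven feature sets for one dataset (MONO, FACT, or EMOT)."""
    base = list(BASE_FEATURES)
    if dataset in ("FACT", "EMOT"):
        # MONO summaries are assumed neutral and carry no such value.
        base.append("fact_emotion_score")
    sets = {f"{dataset}+None": base}  # no-personality baseline
    for trait in TRAITS:
        sets[f"{dataset}+{trait}"] = base + [f"{trait}_z"]  # one trait at a time
    sets[f"{dataset}+All"] = base + [f"{t}_z" for t in TRAITS]  # all five traits
    return sets


# Example: the feature sets for the emotional dialogic arguments.
emot_sets = build_feature_sets("EMOT")
```

Comparing each single-trait or All set against the matching None set isolates what personality information adds beyond prior belief and entrenchment.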
After running Naive Bayes over all feature sets in the three datasets, we compared the experimental classifier performance of All and {O,C,E,A,N} against the None baselines using a Bonferroni-corrected t-test for the F1 measure. Using statistical ANOVA tests that control for pre- and post-test sample variance, we found small but highly reliable effects. We show all of our results, but focus our discussion below on statistically significant differences in F1. We boldface personality feature sets in Table 7 that are statistically significant when comparing {MONO, FACT, EMOT}+None with the other feature sets in the group.
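Since the comparisons above are made on F1, the following sketch shows how precision, recall, and F1 can be computed for the three belief change bins. The averaging scheme is an assumption: we use an unweighted (macro) average over the bins here purely for illustration, and toolkit-reported figures may be weighted differently. The function name and toy labels are hypothetical.

```python
def macro_prf(gold, predicted, labels=("Low", "Medium", "High")):
    """Return macro-averaged (precision, recall, F1) over the given labels."""
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        # Guard against empty denominators when a label is never predicted
        # or never present in the gold labels.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical gold and predicted bins for four test instances.
scores = macro_prf(["Low", "Medium", "High", "Low"],
                   ["Low", "Medium", "Low", "Low"])
```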
The effect of argument alone (without personality information) can be seen by the no-personality baseline for each argument type, where we exclude personality information ({MONO, FACT, EMOT}+None). All these feature sets perform above the baseline of 33% (Row 1). This supports the results of our prior ANOVA testing over all subjects for belief change, and shows that the argument itself partially predicts belief change.
More interestingly, however, Table 7 also shows that providing the learner with information about personality consistently improves the ability of the learner to predict belief change. For all types of arguments, i.e. the neutral, monologic summaries and the factual and emotional dialogs, the feature sets without any information about the personality traits of the unseen human subjects perform significantly worse than the feature sets that contain all five personality traits. MONO+None compared to MONO+All (rows 2 and 8 respectively) show a slight but significant increase in F1 from 0.51 to
A more interesting result is that Openness to Experience (EMOT+O, row 17) was also important for Emotional arguments, increasing F1 from 0.44 to 0.51 (p = .00001). In contrast, Openness had no effect for Factual arguments (row 10) (p > 0.05). Models for predicting belief change for Emotional arguments also benefit from information about Conscientiousness and Agreeableness. Row 17 (EMOT+O), row 18 (EMOT+C) and row 20 (EMOT+A) all show significant differences in F1, with EMOT+O better than EMOT+None (p = .00001), EMOT+C better than EMOT+None (p = .00001) and EMOT+A better than EMOT+None (p = .0001).
Information about Agreeableness also improves the quality of the belief change models for the factual dialogs (FACT+A, row 13), with an increase in F1 from the 0.46 baseline to 0.49 (p = .004), suggesting that people who are more Agreeable are more influenced by factual arguments. This confirms one of our initial hypotheses: that Agreeable people would be more sensitive to the fact/emotional dimension of arguments because of their desire either to avoid conflict (highly Agreeable people) or to seek conflict (Disagreeable people).

Conclusions
To the best of our knowledge we are the first to examine the interaction of social media argument types with audience factors. Our contributions are:
• A new corpus of personality information and belief change in socio-political arguments;
• A new method for identifying and deploying social media content to inform and engage the public about important social and political topics;
• Results showing at scale (hundreds of users) that we can mine arguments from online discussions to change people's beliefs;
• Results showing that different types of arguments have different effects: while balanced monologic summaries led to the greatest belief change, socio-emotional online exchanges also caused changes in belief.
Although our short question/response pairs did not induce as much belief change as the curated balanced monologs, we believe that these are striking results given that the materials we extracted from online discussions are not balanced or professionally produced, but instead are simple fragments extracted from online discussions.
Further, confirming prior work on persuasion (Eagly and Chaiken, 1975; Kelman, 1961; Petty et al., 1981), we found that these effects depend on audience characteristics. As expected, belief change depended on the strength of prior beliefs, so that initially neutral people were more likely to be persuaded than entrenched individuals, regardless of the type of argument. Again supporting our predictions, argument effectiveness depended on personality type. People who are Open to Experience were influenced by balanced and emotional materials. In contrast, Agreeable people are most affected by factual materials. Emotional arguments had very different effects from factual and balanced monologs: Openness is important but so too are Conscientiousness and Agreeableness.
How can we explain this? People who are more Open are typically receptive to new ideas. But our results for emotional arguments also show that Conscientious people change their views when presented with emotional arguments, possibly because they are careful to process the arguments however expressed. And Agreeable people may also be motivated to change belief by emotional arguments because they are less likely to be influenced by personal feelings.
Our results have numerous implications that suggest further technical experimentation. The fact that we can induce belief change by extracting simple discussion fragments suggests that belief change can be induced without the application of sophisticated text processing tools. While our results for balanced monologs suggest that summaries increase belief change, summary tools for such arguments are still under development. However, high-quality summaries may not be needed if compelling argument fragments can be automatically extracted (Misra et al., 2016b; Subba and Di Eugenio, 2007; Nguyen and Litman, 2015; Swanson et al., 2015).
Our work also suggests the importance of personalization for persuasion, with different personality types being open to different styles of argument. Future work might be based on methods for profiling participant personality from simple online behaviors (Di Eugenio et al., 2013; Liu et al., 2016; Pan and Zhou, 2014; Yee et al., 2011), or from user-generated content such as first-person narratives or conversations (Mairesse and Walker, 2006a; Mairesse and Walker, 2006b; Rahimtoroghi et al., 2016; Rahimtoroghi et al., 2014). We could then select personalized arguments to meet a participant's processing style.
Here we used crowdsourced judgments to select arguments of particular types; elsewhere, we present algorithms for automatically identifying and bootstrapping arguments with different properties. We have methods to extract arguments that represent different stances on an issue (Misra et al., 2016a; Anand et al., 2011; Sridhar et al., 2015; Walker et al., 2012a; Walker et al., 2012b), as well as argument exchanges that are agreements vs. disagreements, factual vs. emotional arguments (Oraby et al., 2015), sarcastic and not-sarcastic arguments, and nasty vs. nice arguments (Oraby et al., 2016; Lukin and Walker, 2013; Justo et al., 2014).
An open question is whether these effects are long term. For practical reasons, our approach limits us to examining belief change during a single session; long-term cross-session comparisons lead to significant participant retention issues.
Our results also suggest new empirical and theoretical methods for studying persuasion at scale. Only recently have studies of persuasion moved beyond small scale lab studies involving simple single arguments (Habernal and Gurevych, 2016b; Habernal and Gurevych, 2016a; Tan et al., 2016). Our research also suggests new methods and tools for larger scale studies of persuasion. While care must be taken in deploying these results, studies of juries and other decision making bodies suggest that exposure to a diversity of opinions and minority views is very important to countering extremism and understanding the issues at stake (Devine et al., 2000; Ludford et al., 2004). The ability to repurpose the huge number of varied opinions available in social media sites for educational purposes could provide a novel way to expose people to a diversity of views.