Exploring the Role of Prior Beliefs for Argument Persuasion

Public debate forums provide a common platform for exchanging opinions on a topic of interest. While recent studies in natural language processing (NLP) have provided empirical evidence that the language of the debaters and their patterns of interaction play a key role in changing the mind of a reader, research in psychology has shown that prior beliefs can affect our interpretation of an argument and could therefore constitute a competing alternative explanation for resistance to changing one’s stance. To study the actual effect of language use vs. prior beliefs on persuasion, we provide a new dataset and propose a controlled setting that takes into consideration two reader-level factors: political and religious ideology. We find that prior beliefs affected by these reader-level factors play a more important role than language use effects and argue that it is important to account for them in NLP studies of persuasion.


Introduction
Public debate forums provide participants with a common platform for expressing their point of view on a topic; they also present participants with the different sides of an argument. The latter can be particularly important: awareness of divergent points of view allows one, in theory, to make a fair and informed decision about an issue, and exposure to new points of view can furthermore persuade a reader to change their overall stance on a topic.
Research in natural language processing (NLP) has begun to study persuasive writing and the role of language in persuasion. Tan et al. (2016) and Zhang et al. (2016), for example, have shown that the language of opinion holders or debaters and their patterns of interaction play a key role in changing the mind of a reader. At the same time, research in psychology has shown that prior beliefs can affect our interpretation of an argument even when the argument consists of numbers and empirical studies that would seemingly belie misinterpretation (Lord et al., 1979; Vallone et al., 1985; Chambliss and Garner, 1996).
We hypothesize that studying the actual effect of language on persuasion will require a more controlled experimental setting, one that takes into account any potentially confounding user-level (i.e., reader-level) factors that could cause a person to change, or keep a person from changing, their opinion. In this paper we study one such type of factor: the prior beliefs of the reader as impacted by their political or religious ideology. We adopt this focus since it has been shown that ideologies play an important role for an individual when they form beliefs about controversial topics, and potentially affect how open the individual is to being persuaded (Stout and Buddenbaum, 1996; Goren, 2005; Croucher and Harris, 2012).
We first present a dataset of online debates that enables us to construct the setting described above in which we can study the effect of language on persuasion while taking into account selected user-level factors. In addition to the text of the debates, the dataset contains a multitude of background information on the users of the debate platform. To the best of our knowledge, it is the first publicly available dataset of debates that simultaneously provides such comprehensive information about the debates, the debaters and those voting on the debates.
With the dataset in hand, we then propose the novel task of studying persuasion (1) at the level of individual users, and (2) in a setting that can control for selected user-level factors, in our case, the prior beliefs associated with the political or religious ideology of the debaters and voters. In particular, previous studies focus on predicting the winner of a debate based on the cumulative change in pre-debate vs. post-debate votes for the opposing sides (Zhang et al., 2016; Potash and Rumshisky, 2017). In contrast, we aim to predict which debater an individual user (i.e., reader of the debate) perceives as more successful, given their stated political and religious ideology.
Finally, we identify which features appear to be most important for persuasion, considering the selected user-level factors as well as the more traditional linguistic features associated with the language of the debate itself. We hypothesize that the effect of political and religious ideology will be stronger when the debate topic is Politics and Religion, respectively. To test this hypothesis, we experiment with debates on only Politics or only Religion vs. debates from all topics including Music, Health, Arts, etc.
Our main finding is that prior beliefs associated with the selected user-level factors play a larger role than linguistic features when predicting the successful debater in a debate. In addition, the effect of these factors varies according to the topic of the debate. The best performance, however, is achieved when we rely on features extracted from user-level factors in conjunction with linguistic features derived from the debate text. Finally, we find that the set of linguistic features that emerges as the most predictive changes when we control for user-level factors (political and religious ideology) vs. when we do not, showing the importance of accounting for these factors when studying the effect of language on persuasion.
In the remainder of the paper, we describe the debate dataset (Section 2) and the prediction task (Section 3) followed by the experimental results and analysis (Section 4), related work (Section 5) and conclusions (Section 6).

Dataset
For this study, we collected 67,315 debates from debate.org (www.debate.org) from 23 different topic categories including Politics, Religion, Health, Science and Music. The dataset will be made publicly available at http://www.cs.cornell.edu/ esindurmus/. In addition to the text of the debates, we collected 198,759 votes from the readers of these debates. Votes evaluate different dimensions of the debate.
To study the effect of user characteristics, we collected user information for 36,294 different users. Aspects of the dataset most relevant to our task are explained in the following section in more detail.

Debates
Debate rounds. Each debate consists of a sequence of ROUNDS in which two debaters from opposing sides (one is supportive of the claim (i.e., PRO) and the other is against the claim (i.e., CON)) provide their arguments. Each debater has a single chance in a ROUND to make their points. Figure 1 shows an example ROUND 1 for the debate claim "PRESCHOOL IS A WASTE OF TIME". The number of ROUNDS in debates ranges from 1 to 5 and the majority of debates (61,474 out of 67,315) contain 3 or more ROUNDS.
Votes. All users in the debate.org community can vote on debates. As shown in Figure 2, voters share their stances on the debate topic before and after the debate and evaluate the debaters' conduct, their spelling and grammar, the convincingness of their arguments and the reliability of the sources they refer to. For each such dimension, voters have the option to choose one of the debaters as better or indicate a tie. This fine-grained voting system gives a glimpse into the reasoning behind the voters' decisions.

Determining the successful debater
There are two alternative criteria for determining the successful debater in a debate. Our experiments consider both.
Criterion 1: Argument quality. As shown in Figure 2, debaters get points for each dimension of the debate. The most important dimension, in that it contributes most to the point total, is making convincing arguments. debate.org uses Criterion 1 to determine the winner of a debate.
Criterion 2: Convinced voters. Since voters share their stances before and after the debate, the debater who convinces more voters to change their stance is declared the winner.
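As an illustration, the two criteria can be sketched as follows. This is a minimal sketch rather than the procedure used by debate.org: the vote record layout and the per-dimension point values in ASSUMED_POINTS are hypothetical (the text only states that convincing arguments contribute most to the point total).

```python
from collections import Counter

# Hypothetical point values per voting dimension (assumption, not from the
# source); the only constraint taken from the text is that "arguments"
# carries the most weight.
ASSUMED_POINTS = {"conduct": 1, "spelling_grammar": 1, "arguments": 3, "sources": 2}


def _larger(pro, con):
    """Return the side with the larger count, or "TIE"."""
    if pro == con:
        return "TIE"
    return "PRO" if pro > con else "CON"


def winner_by_points(votes):
    """Criterion 1: the side with more total points across all votes."""
    totals = Counter()
    for vote in votes:
        for dimension, side in vote["dimensions"].items():
            if side in ("PRO", "CON"):  # a "TIE" awards no points
                totals[side] += ASSUMED_POINTS[dimension]
    return _larger(totals["PRO"], totals["CON"])


def winner_by_convinced_voters(votes):
    """Criterion 2: the side that more voters changed their stance to."""
    convinced = Counter()
    for vote in votes:
        if vote["after"] in ("PRO", "CON") and vote["after"] != vote["before"]:
            convinced[vote["after"]] += 1
    return _larger(convinced["PRO"], convinced["CON"])


# Two made-up votes on one debate.
votes = [
    {"before": "CON", "after": "PRO",
     "dimensions": {"conduct": "TIE", "spelling_grammar": "CON",
                    "arguments": "PRO", "sources": "PRO"}},
    {"before": "PRO", "after": "PRO",
     "dimensions": {"conduct": "CON", "spelling_grammar": "CON",
                    "arguments": "CON", "sources": "TIE"}},
]
points_winner = winner_by_points(votes)
convinced_winner = winner_by_convinced_voters(votes)
```

In this toy example the two criteria disagree, which is one reason to consider both.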

User information
On debate.org, each user has the option to share demographic and private state information such as their age, gender, ethnicity, political ideology, religious ideology, income level, education level, the president and the political party they support. Beyond that, we have access to information about their activities on the website such as their overall success rate of winning debates, the debates they participated in as a debater or voter, and their votes. An example of a user profile is shown in Figure 3.
Opinions on the big issues. debate.org maintains a list of the most controversial debate topics as determined by the editors of the website. These are referred to as big issues. Each user shares their stance on each big issue on their profile (see Figure 3): either PRO (in favor), CON (against), N/O (no opinion), N/S (not saying) or UND (undecided).
[...] big issues to see if we can infer their opinions from these factors. Finally, using our findings from these analyses, we perform the task of predicting which debater will be perceived as more successful by an individual voter.
Figure 4 shows the correlation between pairs of voting dimensions (in the first 8 rows and columns) and the correlation of each dimension with (1) getting more points (row or column 9) and (2) convincing more people as a debater (final row or column). Abbreviations stand for (on the CON side): has better conduct (CBC), makes more convincing arguments (CCA), uses more reliable sources (CRS), has better spelling and grammar (CBSG), gets more total points (CMTP) and convinces more voters (CCMV). For the PRO side we use PBC, PCA, and so on. From Figure 4, we can see that making more convincing arguments (CCA) correlates the most with total points (CMTP) and convincing more voters (CCMV). This analysis motivates us to identify the linguistic features that are indicators of more convincing arguments.

The relationship between a user's opinions on the big issues and their prior beliefs
We disentangle different aspects of a person's prior beliefs to understand how well each correlates with their opinions on the big issues. As noted earlier, we focus here only on prior beliefs in the form of self-identified political and religious ideology.
Representing the big issues. To represent the opinions of a user on a big issue, we use a four-dimensional one-hot encoding where the indices of the vector correspond to PRO, CON, N/O (no opinion), and UND (undecided), respectively (1 if the user chooses that value for the issue, 0 otherwise). Note that we do not have a representation for N/S since we eliminate users having N/S for at least one big issue for this study. We then concatenate the vector for each big issue to get a representation for a user's stance on all the big issues as shown in Figure 5. We denote this vector by BIGISSUES.
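The encoding described above can be sketched as follows. The issue names and the profile dictionary are illustrative assumptions; only the stance order (PRO, CON, N/O, UND) and the exclusion of N/S users come from the text.

```python
STANCES = ["PRO", "CON", "N/O", "UND"]  # index order used in the text


def encode_big_issues(profile, issues):
    """Concatenate one four-dimensional one-hot vector per big issue."""
    vector = []
    for issue in issues:
        stance = profile[issue]
        if stance not in STANCES:
            # Covers N/S: such users are excluded from the study.
            raise ValueError(f"unsupported stance {stance!r} for {issue!r}")
        vector.extend(1 if stance == s else 0 for s in STANCES)
    return vector


# Hypothetical issue names and user profile, for illustration only.
issues = ["Abortion", "Gun Rights"]
profile = {"Abortion": "PRO", "Gun Rights": "UND"}
big_issues = encode_big_issues(profile, issues)
```

The resulting vector has length 4 times the number of big issues, with exactly one non-zero entry per issue.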
We test the correlation between the individual's opinions on big issues and the selected user-level factors in this study using two different approaches: clustering and classification.
Clustering the users' decisions on big issues. We apply PCA on the BIGISSUES vectors of users who identified themselves as CONSERVATIVE vs. LIBERAL (740 users). We do the same for the users who identified themselves as ATHEIST vs. CHRISTIAN (1501 users). In Figure 6, we see that there are distinctive clusters of CONSERVATIVE vs. LIBERAL users in the two-dimensional representation while for ATHEIST vs. CHRISTIAN, the separation is not as distinct. This suggests that people's opinions on the big issues identified by debate.org correlate more with their political ideology than their religious ideology.
Figure 6: PCA representation of decisions on big issues color-coded with political and religious ideology. We see more distinctive clusters for CONSERVATIVE vs. LIBERAL users suggesting that people's opinions are more correlated with their political ideology.
Classification approach. We also treat this as a classification task using the BIGISSUES vectors for each user as features and the user's religious and political ideology as the labels to be predicted. So the classification task is: Given the user's BIGISSUES vector, predict their political and religious ideology. Table 1 shows the accuracy for each case. We see that using the BIGISSUES vectors as features performs significantly better than the majority baseline.
Table 1: Accuracy of the majority baseline vs. the BIGISSUES features for each prior belief type.
Prior belief type | Majority | BIGISSUES
Political ideology | 57.70% | 92.43%
Religious ideology | 52.70% | 82.81%
This analysis shows that there is a clear relationship between people's opinions on the big issues and the selected user-level factors. It raises the question of whether it is even possible to persuade someone with prior beliefs relevant to a debate claim to change their stance on the issue. It may be the case that people prefer to agree with the individuals having the same (or similar) beliefs regardless of the quality of the arguments and the particular language used. Therefore, it is important to understand the relative effect of prior beliefs vs. argument strength on persuasion.
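To make the clustering step concrete, the following is a minimal sketch of projecting BIGISSUES-style vectors onto two principal components. The text does not say how PCA was computed; power iteration with deflation is used here purely as an illustrative, dependency-free choice, and the four toy users are made up.

```python
import math


def _top_eigenvector(matrix, iterations=200):
    """Approximate the dominant eigenvector of a symmetric matrix."""
    d = len(matrix)
    start = [1.0 / (i + 1) for i in range(d)]  # fixed, deterministic start
    scale = math.sqrt(sum(x * x for x in start))
    v = [x / scale for x in start]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # nothing left to extract (e.g., zero matrix)
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * matrix[i][j] * v[j]
                     for i in range(d) for j in range(d))
    return v, eigenvalue


def pca_2d(rows):
    """Project each row onto the first two principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(2):
        v, lam = _top_eigenvector(cov)
        components.append(v)
        # Deflation: remove the component just found from the covariance.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components]
            for row in centered]


# Toy data: two users choose PRO and two choose CON on a single issue.
toy_users = [[1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
coords = pca_2d(toy_users)
```

In the toy data the first coordinate separates the two groups of users, which is the kind of separation one would look for in a plot such as Figure 6.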

Task descriptions
Some of the previous work in NLP on persuasion focuses on predicting the winner of a debate as determined by the change in the number of people supporting each stance before and after the debate (Zhang et al., 2016; Potash and Rumshisky, 2017). However, we believe that studies of the effect of language on persuasion should take into account other, extra-linguistic, factors that can affect opinion change: in particular, we propose an experimental framework for studying the effect of language on persuasion that aims to control for the prior beliefs of the reader as denoted through their self-identified political and religious ideologies. As a result, we study a more fine-grained prediction task: for an individual voter, predict which side/debater/argument the voter will declare as the winner.
Task 1: Controlling for religious ideology. In the first task, we control for religious ideology by selecting debates for which each of the two debaters is from a different religious ideology (e.g., debater 1 is ATHEIST, debater 2 is CHRISTIAN). In addition, we consider only voters that (a) self-identify with one of these religious ideologies (e.g., the voter is either ATHEIST or CHRISTIAN) and (b) changed their stance on the debate claim post-debate vs. pre-debate. For each such voter, we want to predict which of the PRO-side debater or the CON-side debater did the convincing. Thus, in this task, we use Criterion 2 to determine the winner of the debate from the point of view of the voter. Our hypothesis is that the voter will be convinced by the debater that espouses the religious ideology of the voter.
In this setting, we can study the factors that are important for a particular voter to be convinced by a debater. This setting also provides an opportunity to understand how the voters who change their minds perceive arguments from a debater who is expressing the same vs. the opposing prior belief.
To study the effect of the debate topic, we perform this study for two cases: debates belonging to the Religion category only, and debates from all categories. The Religion category contains debates like "IS THE BIBLE AGAINST WOMEN'S RIGHTS?" and "RELIGIOUS THEORIES SHOULD NOT BE TAUGHT IN SCHOOL". We want to see how strongly a user's religious ideology affects the persuasive effect of language in such a topic as compared to all topics. We expect to see stronger effects of prior beliefs for debates on Religion.
Task 2: Controlling for political ideology. Similar to the setting described above, Task 2 controls for political ideology. In particular, we only use debates where the two debaters are from different political ideologies (CONSERVATIVE vs. LIBERAL). In contrast to Task 1, we consider all voters that self-identify with one of the two debater ideologies (regardless of whether the voter's stance changed post-debate vs. pre-debate). This time, we predict whether the voter gives more total points to the PRO side or the CON side argument. Thus, Task 2 uses Criterion 1 to determine the winner of the debate from the point of view of the voter. Our hypothesis is that the voter will assign more points to the debater that has the same political ideology as the voter.
For this task too, we perform the study for two cases: debates from the Politics category only and debates from all categories. We expect to see stronger effects of prior beliefs for debates on Politics.
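The selection rules for the two tasks can be sketched as a single filter. This is a minimal sketch under assumed data structures: the debate and vote dictionaries, their field names, and the toy records are hypothetical and only mirror the conditions stated above.

```python
def select_instances(debates, ideology_key, pair, require_stance_change):
    """Return (debate id, voter id) pairs that satisfy a task's controls."""
    selected = []
    for debate in debates:
        debater_ideologies = {debate["pro"][ideology_key],
                              debate["con"][ideology_key]}
        # The two debaters must hold the two different ideologies in `pair`.
        if debater_ideologies != set(pair):
            continue
        for vote in debate["votes"]:
            # The voter must self-identify with one of the two ideologies.
            if vote["voter"][ideology_key] not in pair:
                continue
            # Task 1 only: the voter's stance must have changed.
            if require_stance_change and vote["before"] == vote["after"]:
                continue
            selected.append((debate["id"], vote["voter"]["id"]))
    return selected


def _user(uid, religious, political):
    return {"id": uid, "religious": religious, "political": political}


# Made-up debates and votes, for illustration only.
debates = [
    {"id": 1,
     "pro": _user("p1", "ATHEIST", "LIBERAL"),
     "con": _user("c1", "CHRISTIAN", "CONSERVATIVE"),
     "votes": [
         {"voter": _user("a", "ATHEIST", "LIBERAL"), "before": "CON", "after": "PRO"},
         {"voter": _user("b", "CHRISTIAN", "CONSERVATIVE"), "before": "CON", "after": "CON"},
         {"voter": _user("c", "MUSLIM", "LIBERAL"), "before": "PRO", "after": "CON"},
     ]},
    {"id": 2,
     "pro": _user("p2", "ATHEIST", "LIBERAL"),
     "con": _user("c2", "ATHEIST", "CONSERVATIVE"),
     "votes": [
         {"voter": _user("a", "ATHEIST", "LIBERAL"), "before": "PRO", "after": "CON"},
     ]},
]

task1 = select_instances(debates, "religious", ("ATHEIST", "CHRISTIAN"), True)
task2 = select_instances(debates, "political", ("CONSERVATIVE", "LIBERAL"), False)
```

Restricting either task to a single topic category would add one more condition on the debate record.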

Features
The features we use in our model are shown in Table 2. They can be divided into two groups: features that describe the prior beliefs of the users and linguistic features of the arguments themselves.

User features
First, we use the cosine similarities between the voter's and each of the debaters' big issue vectors. These features give a good approximation of the overall similarity of two users' opinions. Second, we use indicator features to encode whether the religious and political beliefs of the voter match those of each of the debaters.
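A minimal sketch of these user features for one voter and one debater is given below. The dictionary layout and field names are assumptions made for illustration; the zero-vector fallback in the cosine similarity is also an assumption, since the text does not discuss that case.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def user_features(voter, debater):
    """Opinion similarity plus the two matching-ideology indicators."""
    return {
        "opinion_similarity": cosine_similarity(voter["big_issues"],
                                                debater["big_issues"]),
        "matching_political_ideology": int(voter["political"] == debater["political"]),
        "matching_religious_ideology": int(voter["religious"] == debater["religious"]),
    }


# Hypothetical voter and debater with two big issues each.
voter = {"big_issues": [1, 0, 0, 0, 0, 1, 0, 0],
         "political": "LIBERAL", "religious": "ATHEIST"}
debater = {"big_issues": [1, 0, 0, 0, 1, 0, 0, 0],
           "political": "LIBERAL", "religious": "CHRISTIAN"}
features = user_features(voter, debater)
```

For a full instance, the same function would be applied once for the PRO debater and once for the CON debater.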

Linguistic features
We extract linguistic features separately for both the PRO and CON side of the debate (combining all the utterances of PRO across different turns and doing the same for CON). Table 2 contains a list of these features. It includes features that carry information about the style of the language (e.g., usage of modal verbs, length, punctuation), represent different semantic aspects of the argument (e.g., subjectivity (Wilson et al., 2005), sentiment, swear word features) as well as features that convey different argumentation styles (argument lexicon features (Somasundaran and Wiebe, 2010)). Argument lexicon features include the counts for the phrases that match with the regular expressions of argumentation styles such as assessment, authority, conditioning, contrasting, emphasizing, generalizing, empathy, inconsistency, necessity, possibility, priority, rhetorical questions, desire, and difficulty. We then concatenate these features to get a single feature representation for the entire debate.
Table 2: Features used in our model.
User-based features | Description
Opinion similarity | For userA and userB, the cosine similarity of BIGISSUES_userA and BIGISSUES_userB.
Matching features | For userA and userB, 1 if userA_f == userB_f, 0 otherwise, where f ∈ {political ideology, religious ideology}. We denote these features as matching political ideology and matching religious ideology.
Linguistic features | Description
Length | Number of tokens.
Tf-idf | Unigram, bigram and trigram features.
Referring to the opponent | Whether the debater refers to their opponent using words or phrases like "opponent, my opponent".
Politeness cues | Whether the text includes any signs of politeness such as "thank" and "welcome".
Showing evidence | Whether the text has any signs of citing any other sources (e.g., phrases like "according to"), or quotation.
Sentiment | Average sentiment polarity.
Subjectivity (Wilson et al., 2005) | Number of words with negative strong, negative weak, positive strong, and positive weak subjectivity.
Swear words | [...]
Connotation | Average # of words with positive, negative and neutral connotation.
Personal pronouns | Usage of first, second, and third person pronouns.
Links | # of links.
Exclamation marks | # of exclamation marks.
Questions | # of questions.
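A few of the surface-level features in Table 2 can be approximated with simple string matching, as in the minimal sketch below. The whitespace tokenization, the link pattern, the use of question marks as a proxy for questions, and the example URL are all assumptions; lexicon-based features (subjectivity, connotation, argument lexicon) are not covered.

```python
import re

POLITENESS_CUES = ("thank", "welcome")  # cue words named in Table 2


def surface_features(text):
    """Approximate a handful of the surface-level features for one side."""
    lowered = text.lower()
    return {
        "length": len(text.split()),  # whitespace tokens (assumption)
        "links": len(re.findall(r"https?://\S+|www\.\S+", lowered)),
        "exclamation_marks": text.count("!"),
        "questions": text.count("?"),  # question marks as a proxy
        "refers_to_opponent": int("opponent" in lowered),
        "politeness": int(any(cue in lowered for cue in POLITENESS_CUES)),
        "shows_evidence": int("according to" in lowered or '"' in text),
    }


def debate_features(pro_rounds, con_rounds):
    """Join each side's utterances, then concatenate the two feature sets."""
    features = {}
    for side, rounds in (("pro", pro_rounds), ("con", con_rounds)):
        for name, value in surface_features(" ".join(rounds)).items():
            features[f"{side}_{name}"] = value
    return features


# Made-up utterances; www.example.org is a placeholder link.
pro_rounds = ["Thank you to my opponent.",
              "According to www.example.org, preschool helps! Does it not?"]
con_rounds = ["Preschool is a waste of time."]
features = debate_features(pro_rounds, con_rounds)
```

Prefixing each feature name with the side keeps the PRO and CON values distinct in the single concatenated representation.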

Results and Analysis
For each of the tasks, prediction accuracy is evaluated using 5-fold cross validation. We pick the model parameters for each split with 3-fold cross validation on the training set. We perform ablation over both the user-based and the linguistic features. We report the results for the feature sets that perform better than the baseline. We perform analysis by training logistic regression models using only user-based features, only linguistic features and finally combining user-based and linguistic features for both tasks.
Task 1 for debates in category Religion. As shown in Table 3, the majority baseline (predicting the winning side of the majority of training examples out of PRO or CON) gets 56.10% accuracy. User features alone perform significantly better than the majority baseline. The most important user-based feature is matching religious ideology. This means it is very likely that people change their views in favor of a debater with the same religious ideology. In a linguistic-only features analysis, the combination of the personal pronouns and connotation features emerges as most important and also performs significantly better than the majority baseline at 65.37% accuracy. When we use both user-based and linguistic features to predict, the accuracy improves to 66.42% with connotation features. An interesting observation is that including the user-based features along with the linguistic features changes the set of important linguistic features for persuasion, removing the personal pronouns from the set of important linguistic features. This shows the importance of studying potentially confounding user-level factors.
Task 1 for debates in all categories. As shown in Table 4, for the experiments with user-based features only, matching religious ideology and opinion similarity features are the most important. For this task, length is the most predictive linguistic feature and can achieve significant improvement over the baseline (61.01%). When we combine the language features with user-based features, we see that with the exclamation mark feature the accuracy improves to 65.74%.
Task 2 for debates in category Politics. As shown in Table 5, using user-based features only, the matching political ideology feature performs the best (80.40%). Linguistic features (refer to Table 5 for the full list) alone, however, can still obtain significantly better accuracy than the baseline (59.60%). The most important linguistic features include approval, politeness, modal verbs, punctuation and argument lexicon features such as rhetorical questions and emphasizing. When combining this linguistic feature set with the matching political ideology feature, we see that the accuracy improves to 81.81%. The length feature does not give any improvement when it is combined with the user features.
Task 2 for debates in all categories. As shown in Table 6, when we include all categories, we see that the best performing user-based feature is the opinion similarity feature (73.96%). When using language features only, the length feature (56.88%) is the most important. For this setting, the best accuracy is achieved when we combine user features with length and Tf-idf features. We see that the set of language features that improve the performance of user-based features does not include some of the features that perform significantly better than the baseline when used alone (modal verbs and politeness features).
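The evaluation protocol used throughout this section (5-fold cross validation, with 3-fold cross validation on each training split for parameter selection, and a majority baseline) can be sketched as below. This is a minimal sketch under stated assumptions: folds are contiguous and unshuffled, accuracy is pooled over all held-out examples rather than averaged per fold, and the toy labels are made up; the text does not specify these details.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def nested_splits(n, outer_k=5, inner_k=3):
    """Yield (train, test, inner_folds) for each outer fold.

    The inner folds partition the training indices and would be used to
    pick model parameters before scoring on the held-out test fold.
    """
    for test in k_fold_indices(n, outer_k):
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        inner = [[train[j] for j in fold]
                 for fold in k_fold_indices(len(train), inner_k)]
        yield train, test, inner


def majority_baseline_accuracy(labels, k=5):
    """Predict the majority training label for every held-out example."""
    correct = 0
    for train, test, _ in nested_splits(len(labels), outer_k=k):
        majority = Counter(labels[i] for i in train).most_common(1)[0][0]
        correct += sum(labels[i] == majority for i in test)
    return correct / len(labels)


# Made-up winner labels for ten (voter, debate) instances.
labels = ["PRO"] * 7 + ["CON"] * 3
splits = list(nested_splits(len(labels)))
baseline = majority_baseline_accuracy(labels)
```

A trained model such as logistic regression would replace the majority prediction inside the outer loop, with its parameters chosen on the inner folds.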

Related Work
Below we provide an overview of related work from the multiple disciplines that study persuasion.
Argumentation mining. Although most recent work on argumentation has focused on identifying the structure of arguments and extracting argument components (Persing and Ng, 2015; Palau and Moens, 2009; Biran and Rambow, 2011; Mochales and Moens, 2011; Feng and Hirst, 2011; Stab and Gurevych, 2014; Lippi and Torroni, 2015; Park and Cardie, 2014; Nguyen and Litman, 2015; Peldszus and Stede, 2015; Niculae et al., 2017; Rosenthal and McKeown, 2015), more relevant is research on identifying the characteristics of persuasive text, e.g., what distinguishes persuasive from non-persuasive text (Tan et al., 2016; Zhang et al., 2016; Habernal and Gurevych, 2016a,b; Fang et al., 2016; Hidey et al., 2017). Similar to these, our work aims to understand the characteristics of persuasive text but also considers the effect of people's prior beliefs.
Persuasion. There has been a tremendous amount of research effort in the social sciences (including computational social science) to understand the characteristics of persuasive text (Kelman, 1961; Burgoon et al., 1975; Chaiken, 1987; Tykocinski et al., 1994; Chambliss and Garner, 1996; Dillard and Pfau, 2002; Cialdini, 2007; Durik et al., 2008; Tan et al., 2014; Marquart and Naderer, 2016). Most relevant to our work is the research of Tan et al. (2016), Habernal and Gurevych (2016a) and Hidey et al. (2017). Tan et al. (2016) focused on the effect of user interaction dynamics and language features looking at the ChangeMyView (an internet forum) community on Reddit and found that user interaction patterns as well as linguistic features are connected to the success of persuasion. In contrast, Habernal and Gurevych (2016a) created a crowd-sourced corpus consisting of argument pairs and, given a pair of arguments, asked annotators which is more convincing. This allowed them to experiment with different features and machine learning techniques for persuasion prediction. Taking motivation from Aristotle's definition for modes of persuasion, Hidey et al. (2017) annotated claims and premises extracted from the ChangeMyView community with their semantic types to study if certain semantic types or different combinations of semantic types appear in persuasive but not in non-persuasive essays. In contrast to the above, our work focuses on persuasion in debates rather than in monologues and forum datasets and accounts for user-based features.
Persuasion in debates. Debates are another resource for studying the different aspects of persuasive arguments. Different from monologues where the audience is exposed to only one side of the opinions about an issue, debates allow the audience to see both sides of a particular issue via a controlled discussion. There has been some work on argumentation and persuasion on online debates. Sridhar et al. (2015), Somasundaran and Wiebe (2010) and Hasan and Ng (2014), for example, studied detecting and modeling stance on online debates. Zhang et al. (2016) found that the side that can adapt to their opponents' discussion points over the course of the debate is more likely to be the winner. None of these studies investigated the role of prior beliefs in stance detection or persuasion.
User effects in persuasion. Persuasion is not independent of the characteristics of the people to be persuaded. Research in psychology has shown that people have biases in the ways they interpret the arguments they are exposed to because of their prior beliefs (Lord et al., 1979; Vallone et al., 1985; Chambliss and Garner, 1996). Understanding the effect of persuasion strategies on people, the biases people have and the effect of prior beliefs of people on their opinion change has been an active area of research interest (Correll et al., 2004; Hullett, 2005; Petty et al., 1981). Eagly and Chaiken (1975), for instance, found that the attractiveness of the communicator plays an important role in persuasion. Work in this area could be relevant for future work on modeling shared characteristics between the user and the debaters. To the best of our knowledge, Lukin et al. (2017) is the most relevant work to ours since they consider features of the audience on persuasion. In particular, they studied the effect of an individual's personality features (open, agreeable, extrovert, neurotic, etc.) on the type of argument (factual vs. emotional) they find more persuasive. Our work differs from this work since we study debates and in our setting the voters can see the debaters' profiles as well as all the interactions between the two sides of the debate rather than only being exposed to a monologue. Finally, we look at different types of user profile information such as a user's religious and ideological beliefs and their opinions on various topics.

Conclusion
In this work we provide a new dataset of debates and a more controlled setting to study the effects of prior belief on persuasion. The dataset we provide and the framework we propose open several avenues for future research. One could explore the effect of different aspects of people's background (e.g., gender, education level, ethnicity) on persuasion. Furthermore, it would be interesting to study how people's prior beliefs affect their other activities on the website and the language they use while interacting with people with the same and different prior beliefs. Finally, one could also try to understand in what aspects and how the language used by people with different prior beliefs or backgrounds differs. These different directions would help people better understand characteristics of persuasive arguments and the effects of prior beliefs in language.