A Study of the Impact of Persuasive Argumentation in Political Debates

Persuasive communication is the process of shaping, reinforcing and changing others' responses. In political debates, speakers express their views towards the debated topics by choosing both the content of their discourse and the argumentation process. In this work we study the use of semantic frames for modelling argumentation in speakers' discourse. We investigate the impact of a speaker's argumentation style and its effect in influencing an audience to support their candidature. We model the influence index of each candidate based on their relative standings in the polls released prior to the debate and present a system which ranks speakers in terms of their relative influence using a combination of content and persuasive argumentation features. Our results show that although content alone is predictive of a speaker's influence rank, persuasive argumentation also affects such indices.


Introduction
In recent years, researchers have studied political texts to detect ideological positions (Sim et al., 2013; Hasan and Ng, 2013), predict voting patterns (Thomas et al., 2006; Gerrish and Blei, 2011) and characterise power based on linguistic features (Prabhakaran et al., 2013). While there is a vast amount of theoretical research on the rhetoric of politicians, only recently has there been a growing interest in understanding the argumentation processes involved in political communication by means of computational linguistics (Hasan and Ng, 2013; Boltužić and Šnajder, 2014).
During a debate, a speaker tries to convince the audience of a particular point of view. This normally involves an argumentation process, where the structuring of ideas is built upon logical connections between claims and premises, and a persuasive communication style. In this paper, we study the impact of persuasive argumentation in political debates on candidates' power/influence ranking. As opposed to previous approaches, we propose to characterise political debates based on persuasive argumentation modelled through semantic frames.
Previous work (Rosenberg and Hirschberg, 2009) has analysed political speech transcripts, identifying prosodic and lexical-syntactic cues which correlate with political personalities. Prabhakaran et al. (2013) proposed interactions within political debates as predictors of a candidate's relative power or influence rank in polls. More recently they also found topic-shifting to be a good indicator of candidates' relative rankings in polls (Prabhakaran et al., 2014). Argumentation in debates has been studied from the perspective of automatic argument extraction (Cabrio and Villata, 2012) and stance classification (Hasan and Ng, 2013). However, to the best of our knowledge, argumentation has not been explored as an influence rank indicator. Moreover, the study of persuasion in the NLP community has so far been limited.
The novelty of our work is the proposal of a method to automatically extract persuasive argumentation features from political debates by using semantic frames as pivoting features. We have trained a rank Support Vector Machine (SVM) model based on content and persuasive argumentation features in order to rank debate speakers. Our experimental results on the 20 debates for the Republican primary election show that certain types of persuasive argumentation features, such as Premise and Support Relation, appear to be better predictors of a speaker's influence rank than basic content features such as unigrams. When combined with content-related features, most persuasive argumentation features give superior performance compared to the baselines.

Persuasive Argumentation
Argumentation has been defined as a verbal and social activity of reason which aims to increase the acceptability of a controversial standpoint by putting forward a set of connected propositions intending to justify or refute a standpoint before a rational judge (van Eemeren et al., 1996). Different argumentation theories propose various schemes for describing the underlying structure of an argument (Toulmin, 1958; Walton et al., 2008; Freeman, 2011; Peldszus and Stede, 2013). All these theories generally agree that an argument can be structured by means of two argument components and two argumentative relations. The argument components include claims and premises. A claim is a central component of an argument and is characterised as being a controversial statement to be judged as true or false. Moreover, a claim cannot be accepted by an audience without additional support. Such support is provided in the form of premises underpinning the validity of the claim.
The following sentence is an example of an argument comprising a claim and its supporting premises: "People aren't investing in America because this president has made America a less attractive place for investing and hiring than other places in the world." (Former Governor Mitt Romney)
While argumentation focuses on the rational support structured to justify or refute a standpoint, persuasion focuses on language cues aiming at shaping, reinforcing and changing a response. In persuasive communication such a response may concern perceptions, beliefs, attitudes or behaviours.
Persuasive language is characterised by the use of emotive lexicons (e.g., atrocious, dreadful, sensational, highly effective) where the speaker tries to engage with the audience's emotions (Macagno and Walton, 2014). Often words with emotive meanings can present values and assumptions as uncontroversial, acting therefore as potentially manipulative instruments of argumentation (Macagno, 2010). Other characteristics of persuasive language include the use of alliteration, which is a stylistic device characterised by the repetition of first consonants in a series of words. This artistic constraint enables the speaker to sway the audience by creating a sense of urgency towards a rhetorical situation and intensifying any attitude being signified (Bitzer, 1968; Lanham, 1991). The use of repeating sounds engages the auditory senses, evoking emotions that engage the audience.
The following is an example of persuasive language: "I'm convinced that part of the divide that we're experiencing in the United States, which is unprecedented, it's unnatural, and it's un-American, is because we're divided economically, too few jobs, too few opportunities" (Former Governor Huntsman).
To the best of our knowledge, however, the study of the relation between persuasion and argumentation in political debates is limited. One of the main challenges is the lack of annotated corpora which include both argument annotations and persuasive message annotations. While a corpus of persuasive essays has recently been released (Stab and Gurevych, 2014), containing annotations for both class-level argument components and argument relations, there is as yet no annotated corpus for persuasive arguments in political debates. In order to study whether persuasive cues and persuasive argumentation can be used as predictors of speakers' influence ranking in a debate, we propose to bridge existing persuasive and political corpora through semantic frame features. The following section introduces the proposed strategy to port annotations between the two corpora.

Extracting Persuasive Argumentation Features from Political Debates
In order to study whether persuasive argumentation can be used as predictors of speakers' influence ranking on a debate, we propose to use the persuasive essays corpus compiled by Stab and Gurevych (2014) to study persuasive argumentation in political debates through the use of semantic frames.

Persuasive Essays (PE) Corpus
A persuasive essay is an essay written with the aim of convincing a reader to adopt a way of thinking regarding a stance taken on a topic. Unlike speech, where an audience can be persuaded by means of social features or speech style, essays rely only on the written word and therefore depend solely on the writer's persuasive style. The Persuasive Essays (PE) corpus consists of 90 essays comprising 1,673 sentences. It contains annotations for both class-level argument components and argument relations. The class-level annotations include: 1) major claims; 2) claims; 3) premises and 4) the argumentative relations, being either "support" or "attack". Argumentative relations are directed relations between source and target components (e.g., between premises, claims and major claims). The PE argument annotations follow the scheme described in Table 1.

Table 1: Argument annotation scheme of the PE corpus.
Label          Description
Claim          Controversial statement which is either true or false, and which should not be accepted or otherwise without additional support
Premise        Justifies the validity of a claim
ForStance      Indicates that an argument supports a claim
AgainstStance  Indicates that an argument refutes a claim
SupportRel.    Indicates which supporting premises belong to a claim
AttackRel.     Indicates which refuting premises belong to a claim

Presidential Political Debates (PD) Corpus
Presidential political debates enable candidates to expose and discuss their stances on policy issues contrasting them with other candidates' stances. During a debate, speakers unveil their discourse style as well as the premises supporting their claims.
For our experiments, we collected the manual transcripts of debates for the Republican party presidential primary election from The American Presidency Project. This political debates corpus (PD) consists of 20 debates which took place between May 2011 and February 2012. A total of 10 candidates participated in these debates, with an average participation of 6.7 candidates per debate. This corpus comprises 30-40 hours of interaction time and an average of 20,466.6 words per debate. These debates follow a common structure in which a moderator directly addresses questions to the candidates; disruptions to answers are common due to interruptions from other candidates. In this corpus, each debate transcript lists the speakers, including moderator and candidates, and the questions asked during the debate. Each transcript also clearly delimits turns between speakers and moderators and marks up occurrences of the audience's reactions such as booing and laughter.

Semantic Frames
We propose to make use of the persuasive essays corpus annotations to understand persuasive argumentation in political debates by means of semantic frames. A semantic frame is a description of the context in which a word sense is used. We make use of FrameNet (Baker et al., 1998), which consists of over 1000 patterns used in English (e.g., Leadership, Causality, Awareness, and Hostile encounter). In this work we extract such patterns using SEMAFOR (Das et al., 2010).
Consider the sentence in Table 2 in which two semantic frames are detected. Each parsed semantic frame consists of {Frame, SemanticRole, label} providing a higher level characterisation of a text, highlighting the semantics of the discourse used in this text. If such semantic frames appear to be some of the most prominent features for a certain persuasive argumentation annotation scheme (e.g., "Claim"), then we can extract persuasive argumentation features from the unlabelled Political Debates corpus using semantic frames as pivoting features.
In this work we propose to port annotations between the Persuasive Essays (PE) and Political Debates (PD) corpora by means of the use of semantic frames as pivoting features.
Table 2: Semantic frames parsed from an example sentence.
Sentence: What we need in this country is to use this issue as a national security tool.
Frame              Semantic role  Label
Political locales  Target         national security
Point of dispute   Target         this issue

To represent the PE corpus, let $A = \{a_1, \ldots, a_n\}$ be the set of annotation schemes described in Table 1 and let $T_a = \{t_1, \ldots, t_n\}$ be the collection of sentences annotated with argument scheme $a$. To represent the PD corpus, let $U_D = \{u_1, \ldots, u_n\}$ be the set of speakers taking part in a debate $D$, and let $S_{u_D} = \{s_1, \ldots, s_n\}$ be the set of sentences generated by speaker $u$ in debate $D$.
Taking the PE corpus as a reference corpus, we propose to generate a vector representation of each annotation scheme in $A$ for each speaker in each debate of the PD corpus by following the steps below:
i) Based on tf-idf, we extract the most representative semantic frames for each annotation scheme $a$ in PE as the vector $SF_a$.
ii) We compute the weighted representation of each annotation scheme $a$ in the PD corpus as the vector $f_{u_d,a}$ for each speaker $u$ in each debate $d$ as follows:
a) First, we compute the bag of semantic frames $SF_{u_d}$ of speaker $u$ in debate $d$ based on the speaker's content in the debate.
b) Then, for each annotation scheme $a$, we weight the vector $f_{u_d,a}$ based on the normalised frequency of each semantic frame element in $SF_{u_d}$ appearing in $SF_a$.
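The two steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the smoothed idf formula, the choice to treat all frames of one annotation scheme as a single "document", and the toy frame lists are all assumptions made for illustration.

```python
import math
from collections import Counter


def top_frames_by_tfidf(frames_by_scheme, top_k=100):
    """Step i): for each annotation scheme, keep its top_k frames by tf-idf.

    Assumption: all frames observed for one scheme form one 'document',
    and idf is smoothed as log((1 + N) / (1 + df)) + 1.
    """
    n_schemes = len(frames_by_scheme)
    doc_freq = Counter()
    for frames in frames_by_scheme.values():
        doc_freq.update(set(frames))
    selected = {}
    for scheme, frames in frames_by_scheme.items():
        tf = Counter(frames)
        total = sum(tf.values())
        scores = {
            f: (c / total) * (math.log((1 + n_schemes) / (1 + doc_freq[f])) + 1)
            for f, c in tf.items()
        }
        # Sort by descending score, then by name so ties are deterministic.
        ranked = sorted(scores, key=lambda f: (-scores[f], f))
        selected[scheme] = ranked[:top_k]
    return selected


def speaker_scheme_vector(speaker_frames, scheme_frames):
    """Step ii): normalised frequency, in the speaker's bag of frames,
    of each frame selected for one annotation scheme."""
    bag = Counter(speaker_frames)
    total = sum(bag.values())
    return [bag[f] / total if total else 0.0 for f in scheme_frames]


# Toy data (hypothetical frame lists, not taken from the corpora).
pe_frames = {
    "Claim": ["Capability", "Capability", "Causation"],
    "Premise": ["Causation", "Causation", "Reason"],
}
sf = top_frames_by_tfidf(pe_frames, top_k=2)
speaker_bag = ["Capability", "Reason", "Capability", "Causation"]
claim_vector = speaker_scheme_vector(speaker_bag, sf["Claim"])
```

In this sketch each speaker receives one vector per annotation scheme, so a downstream ranker can use the schemes separately or concatenated.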

Semantic Frames and Argument Types
The statistics of the extracted semantic frames from PE for each argument type are presented in Table 3. Such semantic frames provide a vector representation characterising each persuasive argumentation scheme described in Table 1.
Using the vector representation of each annotation scheme generated from PE, we computed the persuasive argumentation features for the PD corpus. Table 5 presents a sentence sample for each argument type identified in the PD corpus along with the semantic frames characterising the sentence.

Influence Ranking in Political Debates
We study a speaker's influence on an audience based on his/her persuasive language and argumentation style during a political debate. To measure how influential a speaker is on an audience, we make use of the influence index (Prabhakaran et al., 2013), which is calculated based on a speaker's relative standing in the polls released prior to the debate.
Poll scores describe the influence a speaker has to favourably change the electorate's position towards his/her campaign. Given a debate $D$ and the set of speakers $U_D$, we retrieve the poll results released prior to the debate and use the percentage of the electorate supporting each candidate. If for a given debate there are multiple polls, then the index is computed by taking the mean of the poll scores. Therefore the influence index $P$ of speaker $u \in U_D$ is:
$$P(u) = \frac{1}{m} \sum_{i=1}^{m} p_i$$
where $p_i$ is the poll percentage assigned to speaker $u$ in poll $i$ of the $m$ reference polls.
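As a small worked illustration of the mean-of-polls computation described above, the following Python sketch derives an influence index and the resulting rank for each speaker. The speaker names and poll percentages are made up, and the tie-breaking by name is an assumption.

```python
from statistics import mean


def influence_index(poll_percentages):
    """Mean of a speaker's poll percentages over the reference polls."""
    return mean(poll_percentages)


def influence_ranks(polls_by_speaker):
    """Rank speakers by influence index (rank 1 = highest index).

    Ties are broken alphabetically, which is an assumption of this sketch.
    """
    index = {u: influence_index(p) for u, p in polls_by_speaker.items()}
    ordered = sorted(index, key=lambda u: (-index[u], u))
    return {u: rank for rank, u in enumerate(ordered, start=1)}


# Hypothetical poll percentages released before one debate.
polls = {
    "speaker_a": [20.0, 24.0],
    "speaker_b": [10.0],
    "speaker_c": [30.0, 26.0],
}
ranks = influence_ranks(polls)
```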

Table 5: Sample sentence for each argument type identified in the PD corpus, with the semantic frames characterising the sentence.

Claim
Sentence: If we can turn Syria and Lebanon away from Iran, we finally have the capacity to get Iran to pull back.
Semantic frames: Cause Change, Manipulation, Capability

Premise
Sentence: Because they put that money in, the president gave the companies to the UAW, they were part of the reason the companies were in trouble.
Semantic frames: Causation, Predicament, Leadership

ForStance
Sentence: And the reason is because that's how our founding fathers saw this country set up.
Semantic frames: Reason, Kinship, Perception Experience

AgainstStance
Sentence: I was concerned that if we didn't do something, there were some pretty high risks that not just Wall Street banks, but all banks would collapse.
Semantic frames: Emotion Directed, Intentionally Affect, Daring

SupportRel.
Sentence: I went to Washington, testifying in favor of a federal amendment to define marriage as a relationship between man and a woman.
Semantic frames: Taking Sides, Cognitive Connection, Evidence

AttackRel.
Sentence: But you can't stand and say you give me everything I want or I'll vote no.

Features
We characterise each speaker in each debate based on the content and emotion cues he/she generated. Specifically, we analyse each candidate in three dimensions: i) what they said (content features); ii) the persuasiveness of the language they used, including persuasive argumentation features and emotive language; and iii) external emotions evoked during the debates. We describe each set of features below.

Content Features
We use a set of features which characterise the content of a candidate's participation in a debate (Prabhakaran et al., 2013). These include: 1) Unigrams (UG), which represent lexical patterns by counting frequencies of word occurrences; 2) Question Deviation (QD), the difference between the observed percentage of questions asked to a candidate and the fair share percentage of questions in the debate; 3) Word Deviation (WD), the difference between the observed percentage of words spoken by a candidate and the fair share percentage of words in the debate; 4) Mention Percentage (MP), a candidate's mention count normalised by all candidates' mentions in a debate.
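The deviation and mention features can be sketched in a few lines of Python. The text does not spell out how the fair share is computed, so this sketch assumes it is an equal split among the candidates in the debate (100 divided by the number of candidates); the function names and counts are likewise hypothetical.

```python
def deviation_from_fair_share(count_by_candidate):
    """Observed percentage minus fair-share percentage per candidate.

    Assumption: the fair share is an equal split, 100 / number of candidates.
    Works for both word counts (WD) and question counts (QD).
    """
    total = sum(count_by_candidate.values())
    fair_share = 100.0 / len(count_by_candidate)
    return {
        c: 100.0 * n / total - fair_share
        for c, n in count_by_candidate.items()
    }


def mention_percentage(mentions_by_candidate):
    """Each candidate's mentions as a percentage of all candidates' mentions."""
    total = sum(mentions_by_candidate.values())
    return {c: 100.0 * n / total for c, n in mentions_by_candidate.items()}


# Hypothetical counts for one debate.
word_deviation = deviation_from_fair_share({"a": 600, "b": 300, "c": 100})
mentions = mention_percentage({"a": 3, "b": 1})
```

Under the equal-split assumption the deviations within a debate sum to zero, so a positive value means a candidate spoke (or was questioned) more than an even allocation would give.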

Persuasiveness Features
We represent three types of persuasiveness features as follows: 1) Persuasive Argumentation Features. Following the method described in the previous section, we extract the semantic frame feature vector representing each annotation scheme ($f_{u_d,a}$) for each speaker in each debate. These vectors provide information on different argumentation dimensions. We have extracted a total of 710 semantic frames in PD.
2) Alliteration. After removing stopwords, we computed alliteration as repetitions of part of a word or a full word within a sentence.
3) Emotive Language. To characterise the use of emotive language, we generated a list of emotion-related semantic frames (e.g., emotion directed, emotions by stimulus, emotions by possibility); then for each speaker u in each debate d, we generated an emotion-frame vector weighted by tf-idf.
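The alliteration feature in item 2) is described only briefly, so the following Python sketch is one possible reading rather than the authors' procedure. It assumes that "part of a word" means a shared word-initial prefix of two letters, uses a tiny illustrative stopword list, and counts each repeat of a prefix within a sentence.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; any standard list could be used).
STOPWORDS = {"a", "an", "and", "the", "it", "is", "s", "of", "to", "in", "we"}


def alliteration_count(sentence, prefix_len=2):
    """Count repeated word-initial prefixes within one sentence.

    Assumption: a repetition is any additional content word sharing the
    first prefix_len letters with an earlier word in the same sentence.
    """
    words = [
        w for w in re.findall(r"[a-z]+", sentence.lower())
        if w not in STOPWORDS
    ]
    prefixes = Counter(w[:prefix_len] for w in words if len(w) >= prefix_len)
    return sum(count - 1 for count in prefixes.values() if count > 1)


example = "unprecedented, it's unnatural, and it's un-American"
repeats = alliteration_count(example)
```

With these assumptions the fragment from the Huntsman quotation yields two repeats, one for each additional word beginning with "un".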
Once the features for each speaker have been generated, we follow a supervised learning approach for ranking the speakers of a debate based on their influence index, which can be used to denote how well a speaker's participation in a debate has impacted the audience's endorsement of his/her campaign.

External Emotion Cues
Previous work (Strapparava et al., 2010) has shown that an audience's social signal reactions to an idea, such as booing or cheering, are good predictors of hot-spots where persuasion attempts succeeded or at least such attempts were recognised by the audience. In this work, rather than recognising such persuasion hot-spots, we explore these audience reaction cues (e.g., applause) as potential predictors of a candidate's success in a political debate. We refer to such cues as external emotion cues. For each speaker in a debate, we computed the number of i) applauses (APL); ii) booings (BOO); iii) laughs (LAU); and iv) crosstalks (CRO) he/she received during his/her participation in the debate. These counts were normalised by the total number of occurrences of each cue in the debate.
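The normalisation of the four cue counts can be sketched as follows. The sketch assumes the transcript mark-up has already been converted into (speaker, cue) pairs; how a reaction is attributed to a speaker is not specified here, and the function name and sample events are hypothetical.

```python
from collections import Counter

CUES = ("APL", "BOO", "LAU", "CRO")


def normalised_cue_counts(cue_events):
    """Per-speaker cue counts divided by the debate-wide total of each cue.

    cue_events is a list of (speaker, cue) pairs for one debate. Cues that
    never occur in the debate are given a value of 0.0.
    """
    totals = Counter(cue for _, cue in cue_events)
    per_speaker = Counter(cue_events)
    speakers = sorted({speaker for speaker, _ in cue_events})
    return {
        s: {
            cue: per_speaker[(s, cue)] / totals[cue] if totals[cue] else 0.0
            for cue in CUES
        }
        for s in speakers
    }


# Hypothetical reactions extracted from one debate transcript.
events = [("a", "APL"), ("a", "APL"), ("b", "APL"), ("b", "LAU")]
cue_features = normalised_cue_counts(events)
```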
With these features, we train a supervised learning classifier for ranking speakers of the debates based on their influence indices described in the following section.

Influence Ranking Approach
In ranking, a training set consists of an ordered data set. Let "A is preferred to B" be denoted as "$A \succ B$". Let $D$ denote a debate with a set of speakers $U_D = \{u_1, u_2, \ldots, u_n\}$ and influence indexes $P(u_i)$ for $1 \le i \le n$. We specify a training set for ranking as $R = \{(u_1, \gamma_1), \ldots, (u_n, \gamma_n)\}$ where $\gamma_i$ is the ranking of $u_i$ based on its $P(u_i)$, so $\gamma_i < \gamma_j$ if $u_i \succ u_j$.
We want to find a ranking function $F$ which outputs a score for each instance, from which a global ordering of the data is constructed. So the target function $F(u_i)$ outputs a score such that $F(u_i) > F(u_j)$ for any $u_i \succ u_j$. In this work we use the Ranking SVM (Joachims, 2006) to estimate the ranking function $F$.
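To make the pairwise idea behind this formulation concrete, here is a minimal pure-Python sketch. It is not the Ranking SVM implementation used in this work: it swaps in a plain subgradient-descent linear ranker trained on a regularised pairwise hinge loss, and the function names, hyperparameters and feature values are assumptions chosen for illustration.

```python
from itertools import combinations


def pairwise_differences(features, ranks):
    """Build x_hi - x_lo for every pair where hi is ranked above lo
    (a smaller rank number means a more influential speaker)."""
    pairs = []
    for i, j in combinations(range(len(features)), 2):
        if ranks[i] == ranks[j]:
            continue  # no preference to learn from tied speakers
        hi, lo = (i, j) if ranks[i] < ranks[j] else (j, i)
        pairs.append([a - b for a, b in zip(features[hi], features[lo])])
    return pairs


def train_linear_ranker(pairs, epochs=200, lr=0.1, reg=0.01):
    """Subgradient descent on the L2-regularised pairwise hinge loss."""
    dim = len(pairs[0])
    w = [0.0] * dim
    for _ in range(epochs):
        for d in pairs:
            margin = sum(wk * dk for wk, dk in zip(w, d))
            for k in range(dim):
                # Hinge term is active only while the pair violates the margin.
                grad = reg * w[k] - (d[k] if margin < 1.0 else 0.0)
                w[k] -= lr * grad
    return w


def score(w, x):
    """Linear ranking function F(u) = w . x."""
    return sum(wk * xk for wk, xk in zip(w, x))


# Hypothetical two-dimensional features for three speakers in one debate.
X = [[0.9, 0.1], [0.5, 0.3], [0.1, 0.8]]
gold_ranks = [1, 2, 3]
w = train_linear_ranker(pairwise_differences(X, gold_ranks))
scores = [score(w, x) for x in X]
```

Sorting the speakers by descending score then yields the predicted ranked list for the debate.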

Experimental Setup
For our experiments we used the Persuasive Essays (PE) and Political Debates (PD) corpora introduced in previous sections. While the PE was used as a reference corpus, all our experiments were performed on the PD corpus.
All features are computed for the aggregation of a candidate's content in a debate. For content and alliteration features, we first removed stopwords. In particular, for computing unigram features we also stemmed words using a Porter stemmer (Porter, 1997).
To compute persuasive argumentation features we used the collection of semantic frame features for the reference corpus PE.

Evaluation
In this work, the ranking task evaluation for each debate consists of comparing the ranked list of candidates generated by the influence ranking approach introduced above against a reference ranked list. This reference ranked list is our gold standard: the ranked list of candidates generated from the poll scores for that debate.
Following a 5-fold cross validation, we report results using four commonly used evaluation metrics for ranking tasks: nDCG, nDCG-3, Kendall's Tau and Spearman's correlation. The normalised discounted cumulative gain metric (nDCG) penalises inversions happening at the top n elements of a ranked list more than those happening at the bottom. While the nDCG metric weights positions in the list unequally, Kendall's Tau and Spearman's rank correlations penalise inversions equally across the ranked list.
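The three kinds of metric can be sketched in plain Python as below. The gain assigned to each candidate for nDCG is not specified in the text, so the sketch takes it as an input (the toy data uses a higher gain for a better gold rank, which is an assumption); Kendall's Tau is computed as tau-a and both correlations assume rankings without ties.

```python
import math
from itertools import combinations


def dcg(gains):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains))


def ndcg(predicted_order, gain, k=None):
    """nDCG of a predicted ordering; k truncates to the top k (e.g. nDCG-3)."""
    k = len(predicted_order) if k is None else k
    observed = [gain[c] for c in predicted_order[:k]]
    ideal = sorted(gain.values(), reverse=True)[:k]
    return dcg(observed) / dcg(ideal)


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings (dicts item -> rank, no ties)."""
    items = list(rank_a)
    concordant = discordant = 0
    for x, y in combinations(items, 2):
        sign = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(items) * (len(items) - 1) / 2
    return (concordant - discordant) / n_pairs


def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation for rankings without ties."""
    n = len(rank_a)
    d2 = sum((rank_a[x] - rank_b[x]) ** 2 for x in rank_a)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Hypothetical debate with three candidates; the top two are swapped.
gold = {"A": 1, "B": 2, "C": 3}
predicted = {"A": 2, "B": 1, "C": 3}
gain = {"A": 3, "B": 2, "C": 1}  # assumed gains: higher for a better gold rank
ndcg_swapped = ndcg(["B", "A", "C"], gain)
```

In this toy case a single swap at the top of the list lowers nDCG below 1 while the two rank correlations treat it like a swap anywhere else in the list.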
Results and Discussion

Correlation Analysis
We performed a correlation analysis for the content and persuasive emotion numeric features. We computed the Pearson product-moment correlation between each feature and the candidate's influence index $P(u)$ derived from the polls. The computed correlations for these features are presented in Figure 1. Darker bars indicate statistically significant correlations at p < 0.001; lighter dark bars, at p < 0.05; and light bars, correlations that are not statistically significant.
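For reference, the Pearson correlation between one feature and the influence index can be computed as in the sketch below. The function name and the toy values are hypothetical, and the significance test behind the reported p-values is not covered here.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical feature values and influence indexes for four candidates.
word_deviation = [12.0, 3.0, -5.0, -10.0]
influence = [30.0, 22.0, 12.0, 6.0]
r = pearson_r(word_deviation, influence)
```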
These results show that, for the content features, both question deviation (QD) and word deviation (WD) correlate moderately with the influence index, while the mention percentage (MP) feature correlates highly with the influence index (p < 0.05). For the emotion cues, we obtained statistically significant (p < 0.05) moderate correlations between the applause (APL), laugh (LAU) and crosstalk (CRO) cues and the influence index, while the correlation between the booing (BOO) cue and the influence index was not statistically significant. These results indicate that speakers with a higher influence index spoke for longer periods of time, in line with existing empirical findings in sociological studies (Ng and Bradac, 1993; Reid and Ng, 2000; Prabhakaran et al., 2013), and were asked a higher number of questions. This analysis also indicates that speakers with a higher influence index generated more crosstalk, in line with previous empirical sociological findings (Ng et al., 1995), received more applause and made the audience laugh more often.

Influence Ranking Results
Following the results of the correlation analysis, we conducted experiments using those content and emotion cue features presenting statistically significant correlations with the influence index. Apart from these features, we also consider the persuasive argumentation features and a combination of features from both content and persuasion categories. Results for the prediction of influence ranking using these features are presented in Table 6. For the content features, using the simple unigrams gives the best results. The mention percentage (MP) feature also attains competitive performance. A combination of word deviation, question deviation and mention percentage (WD+QD+MP), however, degrades the performance. This is in contrast to the results reported by Prabhakaran et al. (2013) (denoted as [PR13] in Table 6), where the unigram feature gives much worse results and their best results were obtained using WD+QD+MP. One possible reason is that for the unigram feature used in our experiments, we have performed pre-processing by removing stop words and stemming.
For external emotion cues, although some emotion cues appeared to be significantly correlated with the influence index in our analysis, they did not outperform the unigram baseline. We suspect that this might be because, depending on the location of a debate, certain candidates may bring bigger crowds into the debate's venue, so emotion cues can be a deliberately biased form of support. Consequently, emotion cues occurring within the debate venue may not reflect the emotions induced in the audiences that followed the broadcast of the debate.
When analysing the persuasion features, alliteration and emotive language features give better results compared to external emotion cues. But they did not outperform the unigram baseline either.
We find that persuasive argumentation features alone provide an improvement upon the unigram baseline. In particular, in terms of nDCG and nDCG-3, the premise and support relation types provide the best results for this feature category. In terms of Tau and Spearman correlations, the attack relation type provides the best results. When focusing only on persuasive argumentation features, these results suggest that speakers with a higher influence index tend to use well supported arguments (i.e. present more premises supporting their claims) and/or tend to attack other candidates more by presenting premises refuting a claim.
When combining features, we found that the top 100 persuasive argumentation features ranked by tf-idf, together with word deviations and mention percentages, significantly improve upon the baselines for particular argumentation cases including Claim, Premise, ForStance, and SupportRel.
The overall best performance for predicting influence ranking in terms of nDCG, Tau and Spearman was consistently obtained with the combined feature for the Premise type.
Our results improve upon those recently obtained in (Prabhakaran et al., 2014) in both nDCG and Tau where topic shift patterns have been added for influence ranking (denoted as [PR14] in Table 6).
These results suggest that "what they said", the "persuasiveness style of their arguments" and the relative importance given by others by means of mentions are good predictors of influence ranking in political debates. In particular, when combining the lexical content of candidates' discourse with their persuasive argumentation style, our results indicate that candidates with higher influence ranking tend to present more premises while clearly stating their stance (i.e. supporting a claim) on a particular topic.

Conclusions and Future Work
In this paper, we have studied the impact of argumentation in speakers' discourse and its effect in influencing an audience to support their candidature. In particular, we have conducted the study in the domain of political debates. In order to extract persuasive argumentation features from political debates, we have proposed a novel method to port annotations from a persuasive essay corpus using semantic frames as pivot features.
Our experimental results on the 20 debates for the Republican primary election show that, when combined with word deviations and mention percentages, most persuasive argumentation features give superior performance compared to the baselines. In particular, the Premise and SupportRel types appear to be better predictors of a speaker's influence rank. In future work, we will aim to improve the accuracy of the extracted persuasive argumentation features by exploring other methods for identifying persuasive argumentation in text.