Annotating and Analyzing Semantic Role of Elementary Units and Relations in Online Persuasive Arguments

For analyzing online persuasion, one of the important goals is to understand semantically how people construct comments to persuade others. However, the semantic roles of arguments in online persuasion have received little attention. Therefore, in this study, we propose a novel annotation scheme that captures the semantic roles of arguments in a popular online persuasion forum called ChangeMyView. This study makes the following contributions: (i) we propose a scheme that includes five types of elementary units (EUs) and two types of relations; (ii) we annotate ChangeMyView, resulting in 4612 EUs and 2713 relations in 345 posts; and (iii) we analyze the semantic roles of persuasive arguments. Our analyses capture several characteristic phenomena of online persuasion.


Introduction
Changing a person's opinion is difficult because one must first understand that person's opinion and the reasons behind it. Recent studies in argument mining and persuasion detection have investigated the features of persuasiveness in documents from a persuasion forum (Tan et al., 2016; Hidey et al., 2017). Many existing studies of persuasion have focused on lexical features (Tan et al., 2016; Habernal and Gurevych, 2016) and on argumentative features such as post-to-post interaction (Ji et al., 2018), concessions (Musi et al., 2018), and semantic types of argument components. Although these analyses are important, we argue that it is also important to understand fine-grained persuasion strategies by analyzing the semantic roles of arguments. In this study, we investigate the semantic roles of arguments in a persuasion forum by proposing an annotation scheme and applying it to a ChangeMyView dataset (Tan et al., 2016). ChangeMyView is a subreddit in which a user posts an opinion (called a View) and invites challengers to change it through their comments. When the View is changed, the user who wrote the original post (OP) awards a Delta point (∆) to the challenger who changed it. Figure 1 gives an overview of our annotation of ChangeMyView, in which the Positive post is a reply that won a ∆ and the Negative post is a reply that did not.
* The work of this paper was performed when he was a student at Tokyo University of Agriculture and Technology (contact: morio@katfuji.lab.tuat.ac.jp).
To parse arguments in ChangeMyView, we consider five types of elementary units (EUs), namely Fact, Testimony, Value, Policy, and Rhetorical Statement, and two types of relations between EUs, namely Support and Attack. We then demonstrate that EUs and these relations are effective for characterizing persuasive arguments.
The contributions of this study can be summarized as follows: (i) we propose an annotation scheme for EUs and their relations in ChangeMyView; (ii) we annotate 4612 EUs and 2713 relations in 115 threads and compute inter-annotator agreement using Krippendorff's α, obtaining α_EU = 0.677 and α_Rel = 0.532, which are higher than those reported in existing studies; and (iii) we find a significant difference in the distribution of EU types between OP and reply posts, but no significant difference in the types of EUs and relations between persuasive and non-persuasive arguments.

Related Work
Recent studies in argument mining have investigated the characteristics of arguments by considering the roles of argumentative discourse units and their relations (Ghosh et al., 2014; Peldszus and Stede, 2015; Stab and Gurevych, 2014). Other recent studies have focused on the semantics of argument components (Park et al., 2015; Al Khatib et al., 2016; Becker et al., 2016). For example, Hollihan and Baaske (2004) proposed three types of claims, i.e., fact, value, and policy, where a fact can be verified with objective evidence, a value is an interpretation or judgment, and a policy is an assertion of what should be done. Park et al. (2015) extended this model with additional types such as testimony and reference. Al Khatib et al. (2016) proposed an argument model for analyzing argumentation strategies in news editorials; it divides an editorial into argumentative discourse units of six types, such as Common Ground, Assumption, and Testimony. Because persuasion often relies on facts and testimony, this kind of semantic classification of claims is relevant to our study.
Several studies have focused on semantics to analyze the characteristics of persuasive arguments. Wachsmuth et al. (2018) investigated rhetorical strategies for effectively persuading others, and Hidey et al. (2017) focused on the semantics of premises and claims.

Data Source
In our study, we use the ChangeMyView dataset (Tan et al., 2016). ChangeMyView is a forum in which a user initiates a discussion by writing an original post (OP) that states their View (which we also call the Major Claim) in the title. The OP author must describe the reasons behind the View. Challengers then post replies that try to change the OP's View. If a challenger succeeds, the OP author awards a ∆ to that challenger.
In this study, we extracted 115 threads from the ChangeMyView dataset by simple random sampling. Each thread contains a triple of an OP, a Positive reply (which won a ∆), and a Negative reply (which did not). We therefore annotated 345 posts (115 × (OP, Positive, Negative)).
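The sampling step can be sketched as follows. This is a minimal illustration assuming each thread has already been reduced to one OP, one ∆-awarded reply, and one non-awarded reply; the `Thread` container, `sample_posts` helper, and the fixed seed are hypothetical and are not taken from the original pipeline.

```python
import random
from dataclasses import dataclass


@dataclass
class Thread:
    # Hypothetical container: one ChangeMyView thread reduced to a triple.
    thread_id: str
    op: str
    positive: str  # reply that won a Delta
    negative: str  # reply that did not win a Delta


def sample_posts(threads, k=115, seed=0):
    """Draw k threads by simple random sampling (without replacement) and
    flatten each (OP, Positive, Negative) triple into three posts."""
    rng = random.Random(seed)
    chosen = rng.sample(threads, k)
    posts = []
    for t in chosen:
        posts.extend([(t.thread_id, "OP", t.op),
                      (t.thread_id, "Positive", t.positive),
                      (t.thread_id, "Negative", t.negative)])
    return posts


# Toy pool of 500 placeholder threads (not the real dataset).
pool = [Thread(f"t{i}", f"op {i}", f"pos {i}", f"neg {i}") for i in range(500)]
posts = sample_posts(pool)
print(len(posts))  # 345 = 115 x 3
```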

Annotation Scheme
We define five types of EUs and two types of relations between EUs. This scheme allows us to capture the semantic roles of elementary units and how an argument is built from them.

Type of Elementary Units
We use five types of EUs, similar to the scheme that Park et al. (2015) proposed for eRulemaking comments. The scheme is motivated by our expectation that persuasive arguments can be characterized by personal experience, facts, and value judgments. The five types of EUs are defined as follows.
Fact: A proposition describing objective facts as perceived without distortion by personal feelings, prejudices, or interpretations. Unlike Testimony, it can be verified with objective evidence; therefore, it captures the evidential facts used in persuasion. Examples of Fact: "they did exactly this in the U.K. about thirty or so years ago" and "this study shows that women are 75% less likely to speak up in a space when outnumbered".
Testimony: An objective proposition about the author's personal state or experience. It characterizes how users draw on their own experience to persuade. Examples of Testimony: "I do not have children" and "I've heard suggestions of an exorbitant tax on ammunition".
Value: A proposition expressing a subjective value judgment without stating what should be done. It is close to an opinion. Examples of Value: "this is completely unworkable" and "it is absolutely terrifying".
Policy: A proposition that proposes a specific course of action or states what should be done. It typically contains modal verbs, such as should, or imperative forms. Examples of Policy: "everyone needs to be respectful of other patrons" and "intelligent students should be able to see that".
Finally, because ChangeMyView users often use rhetorical questions (Blankenship and Craig, 2006) to make their arguments more persuasive, we introduce a novel EU type that is useful for identifying rhetorical strategies.
Rhetorical Statement: A unit that implicitly states a subjective value judgment through figurative phrases, emotions, or rhetorical questions. It can therefore be regarded as a subset of Value¹. Examples of Rhetorical Statement: "You can observe this phenomenon yourself!" and "if one is paying equal fees to all other students why is one not allowed equal access and how is this a good thing?".

Type of Relations
The two types of relations between EUs are defined as follows.
Support: An EU X supports another EU Y if X provides a reason in favor of Y. Support is typically signaled by connectives such as therefore. Example: X: "Every state in the U.S. allows homeschooling" (Fact) supports Y: "if you are ideologically opposed to the public school system, you are free to opt out" (Value).
Attack: An EU X attacks another EU Y if X provides a reason against Y. Attack is typically signaled by connectives such as however. Example: X: "Young men are the most likely demographic to get into an accident" (Value) attacks Y: "that does not warrant discriminating against every individual in the group" (Value).
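To make the label set concrete, the scheme can be represented with a few small types. This is a sketch under assumptions: the class names, the token-offset fields, and the `is_value_like` helper are hypothetical and only encode what the definitions above state (five EU types, Rhetorical Statement treated as a subset of Value, and directed Support/Attack relations from X to Y).

```python
from dataclasses import dataclass
from enum import Enum


class EUType(Enum):
    FACT = "Fact"
    TESTIMONY = "Testimony"
    VALUE = "Value"
    POLICY = "Policy"
    RHETORICAL_STATEMENT = "Rhetorical Statement"


class RelType(Enum):
    SUPPORT = "Support"
    ATTACK = "Attack"


@dataclass(frozen=True)
class EU:
    eu_id: str
    eu_type: EUType
    start: int  # first token index (inclusive), hypothetical offset field
    end: int    # last token index (exclusive), hypothetical offset field

    def is_value_like(self) -> bool:
        # Rhetorical Statement is treated as a subset of Value in the scheme.
        return self.eu_type in (EUType.VALUE, EUType.RHETORICAL_STATEMENT)


@dataclass(frozen=True)
class Relation:
    source: str  # id of the EU giving the reason (X)
    target: str  # id of the EU being supported or attacked (Y)
    rel_type: RelType


# The Support example from the text, with made-up ids and token offsets.
x = EU("X", EUType.FACT, 0, 8)
y = EU("Y", EUType.VALUE, 10, 30)
r = Relation(source=x.eu_id, target=y.eu_id, rel_type=RelType.SUPPORT)
```

Keeping the relation as an (X, Y) pair of ids rather than nested objects makes it easy to build the argument trees analyzed later.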

Annotation Process
The annotation task consists of two subtasks: (1) segmentation and classification of EUs and (2) relation identification. We recruited 19 students who are proficient in English but are not native speakers as annotators; all annotations were performed on the original English texts. Each annotator was asked to read the guidelines and the entire post before annotating it. We also held several training meetings for each subtask. Furthermore, because the annotators are non-native speakers, the posts were translated into their native language to ensure that all annotators understood the posts consistently. Each document was handled by two annotators: one translated it and the other validated the translation. The translated documents were used only as a reference by the annotators.
In the EU annotation, three annotators independently annotated 87 threads, whereas the remaining 28 threads were annotated by eight expert annotators selected from among the 19 annotators. For the 87 threads, a gold standard was established by merging the three annotations by majority vote. To extract accurate, minimal EU boundaries and to exclude irrelevant tokens, such as therefore and punctuation, we annotated at the token level rather than the sentence level. Token-level annotation allows us to separate an inference step within a single sentence, in which one proposition serves as a claim and the other as its premise.
The following is an example of an inference step: <"Empire Theatres in Canada has a 'Reel Babies' showing for certain movies" [Fact]> so <"parents can take their babies and not worry about disturbing others" [Value]>. Moreover, the boundary of every EU except a Rhetorical Statement should enclose a grammatically complete sentence so that each EU expresses a full proposition.
In the relation annotation, two annotators independently annotated 50 threads, whereas the remaining 65 threads were annotated by eight expert annotators. For the 50 threads, expert annotators were assigned to each thread to merge the two annotations into a gold standard. We modeled the structure of each argument with the one-claim approach (Stab and Gurevych, 2016), which treats an argument as a pair consisting of a single claim and a set of premises that justify it. In OP posts, the Major Claim must be the root node of an argument, and each claim has a stance attribute toward the OP's View.
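The token-level majority vote used to build the EU gold standard can be sketched as below. This is a simplified illustration, not the authors' merging code: it assumes each annotator's output is a list of per-token labels of equal length, with "O" for tokens outside any EU, and it assumes that tokens with no strict majority fall back to "O" (the tie rule is not stated in the text). It also ignores span boundaries between adjacent EUs of the same type.

```python
from collections import Counter


def merge_by_majority(annotations, none_label="O"):
    """Merge per-token labels from several annotators by majority vote.

    annotations: one list of labels per annotator, all of equal length.
    Tokens outside any EU carry `none_label`. Tokens where no label wins a
    strict majority fall back to `none_label` (an assumed tie rule).
    """
    n_annotators = len(annotations)
    merged = []
    for token_labels in zip(*annotations):
        label, votes = Counter(token_labels).most_common(1)[0]
        merged.append(label if votes > n_annotators / 2 else none_label)
    return merged


a1 = ["Fact", "Fact", "O", "Value", "Value"]
a2 = ["Fact", "Fact", "O", "Value", "Fact"]
a3 = ["Fact", "O", "O", "Fact", "Testimony"]
print(merge_by_majority([a1, a2, a3]))
# ['Fact', 'Fact', 'O', 'Value', 'O']
```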
We computed inter-annotator agreement (IAA) using Krippendorff's α. The IAA is α_EU = 0.677 for EUs and α_Rel = 0.532 for relations. These values are higher than those reported by Park and Cardie (2018) for eRulemaking annotation, both for EUs (α = 0.648) and for relations (α = 0.441).² Our IAA for EUs is also higher than that of Hidey et al. (2017) (α = 0.65) on ChangeMyView.³ We believe the higher agreement is due to token-level annotation, because sentence-level annotation cannot accurately separate an inference step.
Most disagreements in EU annotation occurred between Value and the other types. For Value vs. Fact, disagreements occurred when a unit was phrased in general terms, such as "many people" or "generally", and was incorrectly marked as Fact, although it should be Value. For Value vs. Testimony, disagreements occurred when a unit was incorrectly interpreted as Value. For example, "I am an atheist" was incorrectly marked as Value, although it should be labeled Testimony because it describes a personal state.
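For readers who want to reproduce an agreement figure of this kind, the following is a sketch of Krippendorff's α for nominal labels, computed from the coincidence matrix. It assumes a simple setup in which each unit (for example, a token) carries one label per annotator, with None for a missing annotation; the text does not state whether the reported α_EU uses this per-token setup or a unitized variant, so this is an illustration of the coefficient rather than the exact procedure behind the numbers above.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    units: one list per coded item (e.g., a token), holding the label each
    annotator assigned; None marks a missing annotation.
    """
    o = Counter()  # coincidence matrix o[(c, k)]
    for values in units:
        vals = [v for v in values if v is not None]
        m = len(vals)
        if m < 2:
            continue  # units with fewer than two labels are not pairable
        for i, j in permutations(range(m), 2):
            o[(vals[i], vals[j])] += 1.0 / (m - 1)
    n_c = Counter()
    for (c, _k), weight in o.items():
        n_c[c] += weight
    n = sum(n_c.values())
    observed = sum(w for (c, k), w in o.items() if c != k)
    expected = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label occurs")
    # alpha = 1 - D_o / D_e, simplified to 1 - (n - 1) * sum_o / sum_e
    return 1.0 - (n - 1) * observed / expected


# Toy per-token labels from three annotators (not real annotation data).
tokens = [["Fact", "Fact", "Fact"], ["Fact", "Fact", "Value"],
          ["Value", "Value", "Value"], ["O", "O", "O"],
          ["Testimony", "Value", None]]
print(krippendorff_alpha_nominal(tokens))
```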
2 Note that the relation annotation of Park and Cardie (2018) is limited to the Support relation.
3 Note that the IAA for relations cannot be compared because Hidey et al. (2017) did not annotate relations.

Corpus Analysis
To examine the features of persuasive arguments, we analyzed EUs and the relations between them in two comparisons: OP vs. Reply (Positive and Negative) and Positive vs. Negative.
We first investigated how the number of EUs of each type in a post relates to the persuasive strategy. Using the Mann-Whitney U test, we found significant differences in Testimony and Rhetorical Statement between OP and Reply. Testimony is more likely to appear in OP (10.3%) than in Reply (6.6%), whereas Rhetorical Statement is more likely to appear in Reply (17.2%) than in OP (12.1%). This suggests that OP authors tend to describe their View on the basis of their own experience or state, whereas Rhetorical Statements appear more often in replies because replies try to change the OP's View. This result is consistent with intuition. However, there is no significant difference between Positive and Negative posts for any EU type, and the p-values for Testimony, Policy, and Rhetorical Statement all exceed 0.85. This indicates that the frequency of an EU type cannot serve as a feature of persuasiveness.
Figure 2 shows the annotated relations between units in each type of post, where source denotes the type of the supporting EU and target the type of the supported unit. Most targets are Value units. Testimony acts as a source more often in OP than in Reply, and Rhetorical Statement acts as a source more often in Reply than in OP; moreover, relations between Values are more frequent in Positive than in Negative posts.
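The Mann-Whitney U comparison can be sketched with the standard library alone. The sketch assumes the compared quantity is one number per post (for example, the share of Testimony units in that post) and uses the normal approximation with tie and continuity corrections; the text does not specify these details, so treat them as assumptions. `scipy.stats.mannwhitneyu` offers the same test with exact and asymptotic options if SciPy is available.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation with
    tie and continuity corrections. Returns (U for x, p-value)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return u1, 1.0  # every value is tied: no evidence of a difference
    z = (max(u1, u2) - n1 * n2 / 2 - 0.5) / sigma
    p = min(1.0, math.erfc(z / math.sqrt(2)))  # equals 2 * (1 - Phi(z))
    return u1, p


# Toy per-post Testimony proportions (made-up numbers, not the corpus).
op_testimony = [0.20, 0.10, 0.15, 0.05, 0.12]
reply_testimony = [0.00, 0.05, 0.08, 0.02, 0.06, 0.04]
print(mann_whitney_u(op_testimony, reply_testimony))
```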
Next, to investigate the logical strength of an argument (Wachsmuth et al., 2017), we examine degree and depth. The degree of a unit is the number of EUs that support it. For example, in Figure 1, the Major Claim is supported by EU1 and EU2; thus, its degree is 2 and its depth is 2. In contrast, EU4 is supported only by EU5; thus, its degree is 1 and its depth is 1. Figures 3 and 4 show histograms of degree and depth, respectively, for each type of post. According to the results, the distributions do not differ significantly across post types, and they follow a power-law distribution. Most EUs have two or fewer relations, and the depths of arguments are less than three. This suggests that the logical strength of an argument may not contribute to persuasiveness. Moreover, because many arguments have a stance attribute toward the OP's View, how replies interact with the OP may contribute to persuasion.
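Degree and depth can be computed directly from the (source, target) relation pairs. The text defines degree explicitly but not depth; this sketch assumes depth is the length of the longest chain of relations ending at a unit (0 for a unit with nothing pointing at it), which reproduces the two Figure 1 examples. The tree below is a hypothetical structure consistent with those examples, not the actual Figure 1.

```python
from collections import defaultdict


def degree_and_depth(relations):
    """relations: (source, target) pairs; the source supports or attacks the target.

    degree[u]: number of EUs pointing at u.
    depth[u]: longest chain of relations ending at u (assumed definition).
    Assumes the relations form a forest (no cycles).
    """
    children = defaultdict(list)  # target -> EUs that support or attack it
    nodes = set()
    for src, tgt in relations:
        children[tgt].append(src)
        nodes.update((src, tgt))

    depth = {}

    def visit(u):
        # Terminal units (no incoming relations) get depth 0.
        if u not in depth:
            depth[u] = max((visit(c) + 1 for c in children[u]), default=0)
        return depth[u]

    for u in nodes:
        visit(u)
    degree = {u: len(children[u]) for u in nodes}
    return degree, depth


# Hypothetical structure consistent with the Figure 1 examples in the text.
relations = [("EU1", "MajorClaim"), ("EU2", "MajorClaim"),
             ("EU3", "EU1"), ("EU5", "EU4")]
degree, depth = degree_and_depth(relations)
print(degree["MajorClaim"], depth["MajorClaim"])  # 2 2
print(degree["EU4"], depth["EU4"])                # 1 1
```

Passing `degree.values()` or `depth.values()` to `collections.Counter` gives the histogram counts.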
To clarify the roles of EUs within arguments, we investigated the position of each EU type in an argument. Figure 5 shows a histogram of positions, where the position is the depth normalized so that the root node is 0.0 and a terminal node is 1.0. In Positive and Negative posts, Fact and Testimony often appear near the terminal nodes of the argument structure, which suggests that persuasion attempts are grounded in facts and personal experience. In contrast, Value and Policy appear near the root node, which suggests that challengers try to change the View by finally stating an opinion or what should be done as a conclusion. These results are consistent with intuition. An interesting finding is that Rhetorical Statement also tends to appear near the terminal nodes of the argument. This suggests that, as a persuasive strategy, people tend to first appeal to emotions with rhetorical phrases and then assert their opinion.
Furthermore, we conducted statistical tests to examine whether these position distributions differ between OP and Reply and between Positive and Negative posts. We used the Kolmogorov-Smirnov (KS) test and the Levene test for each comparison. For OP vs. Reply, there is a significant difference in the position distribution of Fact according to the KS test (p < 0.05) and of Policy according to the Levene test (p < 0.01). This suggests that people tend to make assertions based on objective facts as a persuasion strategy.
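The normalized position can be sketched for one argument tree as follows. The text fixes only the endpoints (root 0.0, terminal node 1.0), so the formula here is an assumption: pos(u) = d(u) / (d(u) + h(u)), where d(u) is the distance from the root and h(u) is the longest distance from u down to a terminal node. This maps the root to 0.0 and every terminal node to 1.0. The function name and the tree are hypothetical, and each EU is assumed to point at a single target.

```python
from collections import defaultdict


def normalized_positions(relations, root):
    """Position of each EU in one argument tree: 0.0 at the root, 1.0 at terminals.

    relations: (source, target) pairs pointing from a premise to the unit it backs.
    Assumed definition: pos(u) = d(u) / (d(u) + h(u)).
    """
    children = defaultdict(list)
    for src, tgt in relations:
        children[tgt].append(src)

    # d(u): distance from the root, via depth-first traversal.
    dist = {root: 0}
    stack = [root]
    while stack:
        u = stack.pop()
        for c in children[u]:
            dist[c] = dist[u] + 1
            stack.append(c)

    height = {}

    def h(u):
        # h(u): longest distance from u down to a terminal node.
        if u not in height:
            height[u] = max((h(c) + 1 for c in children[u]), default=0)
        return height[u]

    # A single-node argument has d + h == 0; place it at 0.0.
    return {u: (d / (d + h(u)) if d + h(u) > 0 else 0.0) for u, d in dist.items()}


relations = [("EU1", "MC"), ("EU2", "MC"), ("EU3", "EU1")]
print(normalized_positions(relations, "MC"))
```

Collecting these positions per EU type and binning them would yield a histogram like the one described for Figure 5.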
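The two test statistics can be sketched with the standard library: the two-sample KS statistic D (the largest gap between the empirical CDFs) and Levene's W. The text does not say which Levene variant was used; this sketch centers on group medians (the Brown-Forsythe variant, which is also the default of `scipy.stats.levene`). Only the statistics are computed here; for p-values, `scipy.stats.ks_2samp` and `scipy.stats.levene` can be used if SciPy is available. The input lists are assumed to hold normalized positions of one EU type.

```python
from bisect import bisect_right
from statistics import mean, median


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_x(t) - F_y(t)|."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for t in xs + ys:
        fx = bisect_right(xs, t) / len(xs)  # empirical CDF of x at t
        fy = bisect_right(ys, t) / len(ys)  # empirical CDF of y at t
        d = max(d, abs(fx - fy))
    return d


def levene_statistic(*groups):
    """Levene's W using absolute deviations from each group's median
    (Brown-Forsythe variant). Raises ZeroDivisionError if all deviations
    equal their group mean."""
    z = [[abs(v - median(g)) for v in g] for g in groups]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    z_means = [mean(zg) for zg in z]
    z_grand = sum(sum(zg) for zg in z) / n_total
    between = sum(len(zg) * (m - z_grand) ** 2 for zg, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zg, m in zip(z, z_means) for v in zg)
    return (n_total - k) / (k - 1) * between / within


# Toy normalized positions of Fact units (made-up, not the corpus).
fact_op = [0.5, 1.0, 1.0, 0.67, 1.0]
fact_reply = [1.0, 1.0, 0.5, 1.0, 1.0, 0.75]
print(ks_statistic(fact_op, fact_reply), levene_statistic(fact_op, fact_reply))
```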

Conclusion
In this study, we proposed an annotation scheme for capturing the semantic roles of EUs and relations in online persuasion. We annotated five types of EUs and two types of relations, resulting in 4612 EU and 2713 relation annotations. Our analyses revealed that the presence of Rhetorical Statements and the position of Facts in the argument structure characterize the persuasive posts that try to change the View. In future work, we will focus on (i) expanding our corpus by annotating post-to-post interactions and (ii) using our data to train machine learning models, specifically for automatically identifying argument structures and detecting persuasive posts.