Argument Relation Classification Using a Joint Inference Model

In this paper, we address the problem of argument relation classification where argument units are from different texts. We design a joint inference method for the task by modeling argument relation classification and stance classification jointly. We show that our joint model improves the results over several strong baselines.


Introduction
What is a good counterargument or support argument for a given argument? Despite recent advances in computational argumentation, such as argument unit (e.g., claims, premises) mining (Habernal and Gurevych, 2015), argumentative relation (e.g., support, attack) prediction between argument units from the same text (Stab and Gurevych, 2014; Nguyen and Litman, 2016), as well as assessing argument strength of essays (Persing and Ng, 2015) or predicting convincingness of Web arguments (Habernal and Gurevych, 2016), this question is still an unsolved problem.
In this work we focus on the problem of argument relation classification where argument units are from different texts, i.e., given a set of arguments related to the same topic, we aim to predict relations (e.g., agree or disagree) between any two arguments. We are aware of argumentative relations between premises and the conclusion within a structured argument. Instead, here we are interested in modeling relations among atomic argument units in dialogic argumentation. This task is important for argumentation in debates (Zhang et al., 2016), stance classification (Sridhar et al., 2015), or persuasion analysis (Tan et al., 2016), among others.
There are various views on the meaning of "support" and "attack" in argumentation theory (Cayrol and Lagasquie-Schiex, 2005, 2013). In this paper, we use "agree" and "disagree" to represent relations between two arguments which bear a stance regarding the same topic. Specifically, if $a_1$ agrees with $a_2$ regarding the topic $t$, then $a_1$ and $a_2$ are conflict-free. If $a_1$ disagrees with $a_2$, then they are not conflict-free.
There is a close relationship between argument relation classification and stance classification. First, argument relation classification can benefit from knowing the stance information of arguments. Specifically, if two arguments hold different stances with regard to the same topic, then they likely disagree with each other. Likewise, two arguments that hold the same stance regarding the same topic tend to agree with each other. Secondly, stance classification can benefit from modeling relations between arguments. For instance, we would expect two arguments that disagree with each other to hold different stances.
There has been a large amount of work focusing on stance classification in online debate forums by integrating disagreement information between posts connected with reply links (Somasundaran and Wiebe, 2009; Murakami and Raymond, 2010; Sridhar et al., 2015). However, disagreement information is mainly used as an auxiliary variable and is not explicitly evaluated. Our goal in this paper is to examine argument relation classification in dialogic argumentation. Our task is more challenging because, unlike most previous work on disagreement classification, which can exploit meta information (e.g., reply links between posts are strong indicators of disagreement), we are only provided with text information (see examples in Table 1).
In this paper, we model argument relation classification and stance classification jointly. We evaluate our model on a dataset extracted from Debatepedia (http://www.debatepedia.org/).

Table 1:
Debate Topic: Are genetically modified foods (GM foods) beneficial?

Sub Topic: Consumer safety
Arg (1)  Pro  Foods with poisonous allergens can be modified to reduce risks.
Arg (2)  Pro  GM crops can be fortified with vitamins and vaccines.
Arg (3)  Con  There are many instances of GM foods proving dangerous.

Sub Topic: Socio-economic impacts
Arg (4)  Pro  GM crops are made disease-resistant, which increases yields.
Arg (5)  Con  GM agriculture threatens the viability of traditional farming communities.
Arg (6)  Pro  GM crops generate greater wealth for farming communities.

Related Work
Stab and Gurevych (2014), Persing and Ng (2016), and Nguyen and Litman (2016) extracted argument units and predicted relations (i.e., support, attack, none) between argument units in persuasive student essays. Peldszus and Stede (2015) identified the argument structure of short texts in a bilingual corpus. In contrast, in our work the argument units are from different texts. Therefore, we do not have discourse connectives (e.g., "on the contrary" or "however"), which usually are strong indicators for argument relations. Cabrio and Villata (2012) used a textual entailment system to predict argument relations between argument pairs which are extracted from Debatepedia. An argument pair could be an argument coupled with the subtopic, or an argument coupled with another argument of the opposite stance.
Recently, Menini and Tonelli (2016) predicted agreement/disagreement relations between argument pairs of dialogic argumentation in the political domain. The authors also created a large agreement/disagreement dataset by extracting arguments from the same sub-topic of Debatepedia. However, they only consider argument pairs that share a topic keyword. We do not have such constraints (see Arg (1) and Arg (2) in Table 1). In addition, they use an SVM, whereas we perform joint inference.
Stance classification. There has been an increasing interest in modeling stance in debates (e.g., congressional debates or online political forums) (Thomas et al., 2006; Somasundaran and Wiebe, 2009; Murakami and Raymond, 2010; Walker et al., 2012; Gottipati et al., 2013; Hasan and Ng, 2014). As discussed in Section 1, there is a close relationship between stance classification and argument relation classification. For instance, Sridhar et al. (2015) showed that stance classification in online debate forums can benefit from modeling disagreement of the reply links (e.g., one could assume an argument is attacking the preceding argument). In our work, we focus on modeling argument relations.

Joint inference and Markov logic networks.
Markov logic networks (MLNs) (Domingos and Lowd, 2009) are a statistical relational learning framework that combines first-order logic and Markov networks. They have been successfully applied to various NLP tasks such as semantic role labeling (Meza-Ruiz and Riedel, 2009), information extraction (Poon and Domingos, 2010), coreference resolution (Poon and Domingos, 2008) and bridging resolution (Hou et al., 2013). In this paper, we apply MLNs to model argument relation classification and stance classification jointly.
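For reference, the standard MLN formulation can be written as below. This is the usual textbook definition rather than notation taken from this paper: $F_i$ denotes a first-order formula, $w_i$ its weight, $x$ a possible world (a truth assignment to all ground atoms), and $n_i(x)$ the number of true groundings of $F_i$ in $x$. All of these symbols are introduced here for illustration.

```latex
P(X = x) = \frac{1}{Z} \exp\Big( \sum_{i} w_i \, n_i(x) \Big),
\qquad
Z = \sum_{x'} \exp\Big( \sum_{i} w_i \, n_i(x') \Big)
```

A formula with a larger weight makes worlds that violate it less probable, but not impossible, which is what lets soft constraints between relation and stance predictions be expressed.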

Method
As stated in the introduction, our goal is argument relation classification as opposed to stance classification. Therefore, given a topic $t$ and a set of arguments $A$ that belong to $t$, instead of finding the position (i.e., pro or con) that $a_i$ ($a_i \in A$) takes with respect to $t$, we want to predict the relation (i.e., agree or disagree) between $a_i$ and $a_j$.
The approach we propose tries to make the best use of the topics and arguments by classifying the stances of arguments and the relations between arguments jointly, using Markov logic networks (MLNs).
More specifically, given a topic $t$ and its argument set $A$, we would like to find the stance $s_i$ for each argument $a_i$ and the relation $r_{ij}$ between arguments $a_i$ and $a_j$ ($a_i, a_j \in A$) jointly. Let $r_{ij}$ be a relation assignment for an argument pair $a_i, a_j \in A$, $R_A$ be a relation classification result for all arguments in $A$, and $R_A^n$ be the set of all relation classification results for $A$. Let $s_a$ be a stance prediction for an argument $a \in A$, $S_A$ be a stance prediction result for arguments in $A$, and $S_A^n$ be the set of all possible stance prediction results for $A$. Our joint inference for argument relation classification and stance classification can be represented as a log-linear model:

$$P(R_A, S_A \mid A; w) = \frac{\exp\big(w \cdot \Phi(A, R_A, S_A)\big)}{\sum_{R'_A \in R_A^n,\, S'_A \in S_A^n} \exp\big(w \cdot \Phi(A, R'_A, S'_A)\big)}$$

where $w$ is the model's weight vector and $\Phi(A, R_A, S_A)$ is a "global" feature vector which takes the entire relation and stance assignments for all arguments in $A$ into account. We define $\Phi(A, R_A, S_A)$ as:

$$\Phi(A, R_A, S_A) = \sum_{a_i, a_j \in A} \Phi_l(a_i, a_j, r_{ij}) + \sum_{a \in A} \Phi_k(a, s_a) + \sum_{a_i, a_j \in A} \Phi_g(r_{ij}, s_{a_i}, s_{a_j})$$

where $\Phi_l(a_i, a_j, r_{ij})$ and $\Phi_k(a, s_a)$ are local feature functions for argument relation classification and stance classification, respectively. The former looks at two arguments $a_i$ and $a_j$, the latter at the argument $a$ and the stance $s_a$. The global feature function $\Phi_g(r_{ij}, s_{a_i}, s_{a_j})$ looks at the relation and stance assignments for $a_i$ and $a_j$ at the same time (see $f_5$–$f_8$ in Table 2).

This log-linear model can be represented using Markov logic networks (MLNs). Table 2 shows formulas for modeling the problem in MLNs. p1 and p2 are hidden predicates that we predict, i.e., predicting the relation (i.e., agree or disagree) between $a_1$ and $a_2$, and deciding the stance (i.e., pro or con) of $a_1$. $f_1$ models the symmetry of the argument relation. $f_2$ models the transitivity of the agree relation. $f_3$ and $f_4$ model agree/disagree relations among three arguments. $f_5$–$f_8$ model the mutual relation between the two hidden predicates, i.e., arguments holding the same/different stance are likely to agree/disagree with each other. $f_9$ and $f_{10}$ integrate predictions from the local classifiers for argument relation classification and stance classification, respectively.
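The joint model described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it replaces thebeast and cutting plane inference with exhaustive search, which is only feasible for a handful of arguments, and the weights `w_local` and `w_global` are hypothetical values rather than learned ones. Relations are stored once per unordered pair, so symmetry (the role of $f_1$) holds by construction. The three-argument constraints ($f_2$–$f_4$) are left out for brevity. Local probabilities are assumed to lie strictly between 0 and 1.

```python
import itertools
import math

AGREE, DISAGREE = "agree", "disagree"
PRO, CON = "pro", "con"


def joint_map(n_args, p_agree, p_pro, w_local=1.0, w_global=2.0):
    """Exhaustive MAP search over all stance and relation assignments.

    p_agree[(i, j)]: local probability that a_i and a_j agree (i < j).
    p_pro[i]:        local probability that a_i has stance pro.
    w_local, w_global: hypothetical weights, not learned from data.
    """
    pairs = list(itertools.combinations(range(n_args), 2))
    best_score, best = -math.inf, None
    for stances in itertools.product((PRO, CON), repeat=n_args):
        for rels in itertools.product((AGREE, DISAGREE), repeat=len(pairs)):
            score = 0.0
            # Local stance features (the role of f10 in the text).
            for i, s in enumerate(stances):
                p = p_pro[i] if s == PRO else 1.0 - p_pro[i]
                score += w_local * math.log(p)
            for (i, j), r in zip(pairs, rels):
                # Local relation features (the role of f9).
                p = p_agree[(i, j)] if r == AGREE else 1.0 - p_agree[(i, j)]
                score += w_local * math.log(p)
                # Global features (the role of f5-f8): reward assignments in
                # which same stance pairs with agree, different with disagree.
                same_stance = stances[i] == stances[j]
                if (r == AGREE) == same_stance:
                    score += w_global
            if score > best_score:
                best_score, best = score, (stances, dict(zip(pairs, rels)))
    return best


if __name__ == "__main__":
    # Toy inputs: the local relation classifier weakly prefers "disagree"
    # for the pair (0, 1), but both arguments are confidently "pro".
    p_pro = [0.9, 0.8, 0.2]
    p_agree = {(0, 1): 0.45, (0, 2): 0.3, (1, 2): 0.4}
    stances, relations = joint_map(3, p_agree, p_pro)
    print(stances, relations)
```

In the toy input, the local relation classifier alone would label the pair (0, 1) as disagree, since its agree probability is 0.45. The joint search instead returns agree for that pair, because both arguments are confidently pro and the global consistency reward outweighs the small local preference.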

Dataset
Debatepedia is an encyclopedia of arguments collected from different sources on debate topics.
Each debate topic is organized hierarchically. It contains background of the topic and usually a number of subtopics, with pro and con arguments for or against each subtopic (see Table 1 for an example). An argument typically includes a claim and a few pieces of supporting evidence. We create a corpus by extracting all subtopics and their arguments from Debatepedia. We pair all arguments from the same subtopic and label every argument pair as "agree" (for arguments holding the same stance) or "disagree" (for arguments holding opposite stances). In total we collect data from 657 topics. We reserve 25 topics as the development set and 25 topics as the test set, using the remaining 607 topics for the training set. Table 3 gives an overview of the whole corpus. The dataset and splits will be available on publication.

Experimental Setup
Local argument relation classification (localRel). We employ logistic regression to train a local argument relation classification model using agree and disagree pairs from the training set. Our local classifier replicates, to the extent possible, the state-of-the-art local stance classifier from Walker et al. (2012) used by Sridhar et al. (2015) as well as the disagreement classifier from Menini and Tonelli (2016). We include features of unigrams, all word pairs of the concatenation of two arguments, the overall sentiment of each argument from Stanford CoreNLP (Socher et al., 2013; Manning et al., 2014), the content overlap of two arguments, as well as the number of negations in each argument using a list of negation cues (e.g., not, no, neither) from Councill et al. (2010). We also include three types of dependency features (Anand et al., 2011) which consist of triples from the dependency parse of the argument. Specifically, a basic dependency feature $(rel_i, t_j, t_k)$ encodes the syntactic relation $rel_i$ between words $t_j$ and $t_k$. One variant replaces the head word of the relation $rel_i$ with its part-of-speech tag. The other variant replaces the tokens in a triple with their polarities (i.e., + or −) using the MPQA dictionary of opinion words (Wilson et al., 2005).

localStanceToRel. We again employ logistic regression to train a local stance classification model (localStance) using the same features as in localRel. We construct the training instances by pairing a topic $t$ and all its pro/con arguments in the training set. During testing, we predict that two arguments agree/disagree with each other if they have the same/different stances regarding the topic.

LSTM+attention. We adapt the attention-based LSTM model used for textual entailment in Rocktäschel et al. (2016). We use GloVe vectors (Pennington et al., 2014) with 100 dimensions trained on Wikipedia and Gigaword as word embeddings. To avoid overfitting, we apply dropout before and after the LSTM layer with a probability of 0.1. We train the model for 60 epochs using cross-entropy loss. We use Adam for optimization with a learning rate of 0.01.

EDIT. We reimplement the approach for argument relation classification from Cabrio and Villata (2012). Specifically, we train the textual entailment system EDIT on our training set using the same configuration as in Cabrio and Villata (2012). We then apply the trained model to the test set.

Joint model. For our approach described in Section 3, we use the output of the two local classifiers (localRel and localStance) as the input for formulas $f_9$ and $f_{10}$ in Table 2. The weights of the formulas are learned on the development set. We use thebeast to learn weights for the formulas and to perform inference. thebeast employs cutting plane inference (Riedel, 2008) to improve the accuracy and efficiency of MAP inference for Markov logic.

Results and Discussion
Table 4 shows the results of different approaches on argument relation classification. EDIT performs the worst among the four local classifiers, with an accuracy of 0.50. We think this is mainly due to the difference between the corpora, i.e., we do not pair an argument with its topic in our argument relation classification dataset.
Additionally, the results of LSTM+attention are worse than those of localRel and localStanceToRel. We suspect this is because the amount of our training data is only 1/10 of the SNLI corpus used in Rocktäschel et al. (2016). Also, our dataset has richer lexical variability.
In general, the local model localRel is better at predicting disagree than agree. The approach localStanceToRel flips this by predicting more argument pairs as agree. Overall, there is a small improvement in accuracy from localRel to localStanceToRel. Our joint model combines the strengths of the two local classifiers and performs significantly better than both of them in terms of accuracy and macro-average F-score (randomization test, p < 0.01).

Table 4: Experimental results of argument relation classification on the testing dataset. Bold indicates statistically significant differences over the baselines using randomization test (p < 0.01).

Conclusions
We propose a joint inference model for argument relation classification on dialogic argumentation. The model utilizes the mutual support relations between argument relation classification and stance classification. We show that our joint model significantly outperforms other local models.

Table 2: Hidden predicates and formulas used for argument relation classification. $a_1$, $a_2$ represent arguments in the topic $t$; $r \in \{agree, disagree\}$, $s \in \{pro, con\}$.

Table 3: Training, development and testing data.