Automatic Classification of Neutralization Techniques in the Narrative of Climate Change Scepticism

Neutralisation techniques, e.g. denial of responsibility and denial of victim, are used in the narrative of climate change scepticism to justify lack of action or to promote an alternative view. We first draw on social science to introduce the problem to the NLP community and motivate the granularity of the coding schema, then collect manual annotations of neutralisation techniques in text relating to climate change, and experiment with supervised and semi-supervised BERT-based models.


Introduction
There is strong consensus in the scientific community on human-induced climate change (Cook et al., 2016; Powell, 2017). Despite this, action on climate change has become an increasingly partisan issue with strong opposition voices discrediting scientists, and spreading scepticism and misinformation. One such source is climate change counter movement organizations, which are an amalgam of lobbyists, big corporations, conservative think tanks, and media corporations (Dunlap and Jacques, 2013; Boussalis and Coan, 2016; Farrell, 2016; McKie, 2018), whose aim is to fuel climate change scepticism (CCS). Public perception is influenced by the narrative presented to them (Fløttum, 2014; Fløttum et al., 2016), and CCS texts use neutralization techniques to build counter-climate narratives (McKie, 2018).
The cure can't be worse than the disease/problem is a phrase frequently used by climate change sceptics,1 and also recently by Donald Trump in reference to COVID-19 lockdowns.2 Though the two issues are widely different, neutralization is used in both to justify opposing a policy or a lack of action, and thus to promote either total denial of the problem (Diethelm and McKee, 2009) or of its severity. Table 1 presents examples of neutralization in the context of climate change.

[Table 1 examples: (1) "Sure, we should reduce greenhouse gases, but if our climate policies hurt our ability to create more wealth and bring power to the world's poor, then we are ridding the patient of the disease, but only by killing him"; (2) "It's very convenient for alarmist greens to blame the fires of Australia and California on global warming. In reality, global warming is just a natural cycle and the policies they themselves advocate are the culprits."; (3) "The IPCC falsely attributes natural warming and urban warming to greenhouse gas (GHG) emission warming. It ignores the compelling evidence of natural climate change before 1950 that correlates well with indicators of solar activity."]

1 https://www.wired.com/story/the-analogy-between-covid-19-and-climate-change-is-eerily-precise/
2 https://www.business-standard.com/article/international/trump-opposes-perpetual-lockdown-says-cure-cannot-be-worse-than-problem-120101300184_1.html
In social science, neutralization is defined as justification/vindication for a deviant behaviour (Sykes and Matza, 1957; Maruna and Copes, 2005; Kaptein and Van Helvoort, 2019). Though initially developed in the field of criminology, it has been widely studied in different fields ranging from lack of corporate social responsibility (Cherry and Sneirson, 2010), to fast fashion (Joy et al., 2012), the tobacco industry (Fooks et al., 2013; Conway, 2010), and CCS (McKie, 2018). McKie (2018) argued that to fully understand the neutralization narrative around CCS, there is a need to break it down into specific techniques (e.g. denial of responsibility vs. denial of victim; see Section 3). Our paper proposes a method to automatically classify these neutralization techniques (henceforth "NT"), as a tool to analyse CCS narrative at scale and help build counter-narratives.
Our contributions in this work are as follows: (1) we introduce the NT (multilabel) classification task; (2) we develop and release a dataset with manual annotations of NT used in CCS texts; and (3) we explore semi-supervised models for the classification task, achieving strong results on par with human performance. We release the code and data used in our experiments at: https://github.com/sb1992/cc-neutralization.

Related Work
Sykes and Matza (1957) first introduced the techniques of neutralization, known as the "famous five", as a tool for justification of deviant behaviour. The neutralization techniques inventory has since been expanded to include "metaphor of ledger" (Klockars, 1974), "excuse acceptance" (Minor, 1981) and "no one cares" (Shigihara, 2013). More recently, Kaptein and Van Helvoort (2019) developed a schema which combined them into a hierarchy of categorizations and sub-categorizations. McKie (2018) extended the work of Sykes and Matza (1957) to CCS.
Separately, research on fake news and propaganda has primarily operated at the article level, and focused on binary detection (presence vs. absence) (Rashkin et al., 2017). Da San Martino et al. (2019) argued for the need for finer granularity in propaganda detection, both in terms of propaganda sub-types and fragment-level detection. In a similar vein, Nakamura et al. (2020) proposed fine-grained classes of fake news to differentiate between misleading, manipulated, or totally false content. More recently in the climate change domain, Luo et al. (2020) released a stance-annotated dataset for global warming, and proposed an opinion framing task to study discourse used in the debate around global warming.
One challenge in building supervised NLP models is the strong dependency on labelled data. To tackle this, one approach is to apply transfer learning from pretrained language models (Radford et al., 2019; Peters et al., 2018; Yang et al., 2019; Conneau and Lample, 2019; Devlin et al., 2019). Another approach is semi-supervised learning. Yang et al. (2017) and Gururangan et al. (2019) employed variational autoencoders, and leveraged cross-view training using a mixture of labelled and unlabelled data. More recently, pretrained models and semi-supervised learning have been combined with great success, e.g. Xie et al. (2020) used BERT along with consistency regularization on unlabelled data, Croce et al. (2020) extended the fine-tuning process of BERT to a generative adversarial setting, and Chen et al. (2020) used interpolation to mix up the hidden representations of BERT to create augmented data for training.

Neutralization and Frames
Dunlap and Brulle (2015), Farrell (2016), and Boussalis and Coan (2016) categorised CCS arguments into two frames: science ("SCIENCE") and policy ("POLICY"). SCIENCE questions the scientific facts, is heavy on denial, or promotes pseudo-science, whereas POLICY deals with issues of cost and economy (e.g. carbon tax), targets the scientists, or passes the blame for action to other nations. McKie (2018) associates each neutralization technique with one of these frames, as summarised in Table 2. CCS texts often use multiple NT together in their narrative (hence motivating a multilabel classification task), as seen in the second example in Table 1, where Condemn (POLICY) is used to blame the "alarmist greens" and Deny-Responsibility (SCIENCE) is used to claim that global warming is just a natural cycle. Similarly, in the third example, Condemn (POLICY) is used to accuse the IPCC (Intergovernmental Panel on Climate Change), in conjunction with Deny-Responsibility (SCIENCE) to claim that climate change is natural and linked to solar activity.

Dataset
We construct our neutralisation techniques dataset from three sources: (1) paragraphs extracted from CCS documents (Bhatia et al., 2020); (2) CCS sentences/paragraphs from McKie (2018); and (3) anti-global warming opinions (sentences) from Luo et al. (2020). This yields a mixture of sentences and paragraphs, adding diversity to the dataset (with longer snippets expected to have more multilabelling). We henceforth call these text snippets "sentences" for brevity.
Our dataset has a total of 8000 sentences, of which 785 were annotated (and the remainder used as unlabelled data). We formulate the task as a multi-label classification problem where an annotator selects NONE, or one or more NT labels.
To make the task easier for annotators, we split it into 2 NT annotation subtasks based on the two frames: (1) the SCIENCE frame (Deny-Responsibility, Deny-Injury1, Deny-Injury2, Deny-Victim, or NONE); and (2) the POLICY frame (Condemn, Loyalties, Justify, or NONE). We combine annotations by taking a majority vote within each frame, and label a sentence as NONE only if it is the majority-class for both sub-tasks (i.e. none of the NT labels are majority-assigned for either frame). We collect human judgements using Amazon Mechanical Turk with 9 sentences forming a single HIT, one of which acts as a quality control in the form of a labelled data instance from McKie (2018). Each HIT was annotated by a minimum of 5 and maximum of 10 annotators. For further details of the annotation process, see Section 8.
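The per-frame majority-vote aggregation described above can be sketched as follows (a minimal illustration; the data structures and function names are ours, not from the released code):

```python
from collections import Counter

def majority(votes):
    """Return the labels selected by more than half the annotators for one frame.

    votes: a list of label sets, one set per annotator.
    """
    n = len(votes)
    counts = Counter(label for v in votes for label in v)
    return {label for label, c in counts.items() if c > n / 2}

def aggregate(science_votes, policy_votes):
    """Combine the two sub-task annotations into one multi-label set."""
    labels = majority(science_votes) | majority(policy_votes)
    labels.discard("NONE")
    # A sentence is NONE only if neither frame yields a majority NT label.
    return labels or {"NONE"}
```

For instance, with five annotators, a label must be chosen by at least three of them within its frame to survive aggregation.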
We present statistics of the labelled data in Table 3. Interestingly, we see three large classes of NT (Deny-Victim, Condemn, and Loyalties), implying that most CCS narratives completely deny climate change, condemn the scientists, and prioritise the economy.

Automatic Classification
We experiment with an SVM as a baseline, and then explore several BERT-based supervised and semi-supervised models for classification (Devlin et al., 2019). As it is a multilabel classification problem, we add a number of one-vs-rest classification layers (one for each class) on top of BERT, and update all parameters during fine-tuning.
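The architecture can be sketched in PyTorch as follows (a minimal illustration: a stand-in encoder replaces BERT so the sketch is self-contained, and dimensions and names are ours):

```python
import torch
import torch.nn as nn

class MultiLabelHead(nn.Module):
    """One-vs-rest classification layers on top of a (pooled) encoder output."""

    def __init__(self, encoder, hidden_size, num_classes):
        super().__init__()
        self.encoder = encoder
        # One logit per class; each acts as an independent binary classifier.
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        return self.classifier(self.encoder(x))

# Stand-in for BERT's pooled [CLS] representation (hidden size 768).
encoder = nn.Linear(32, 768)
model = MultiLabelHead(encoder, 768, num_classes=8)  # 7 NT labels + NONE

logits = model(torch.randn(4, 32))
# Training uses a per-class binary cross-entropy over sigmoid outputs.
loss = nn.BCEWithLogitsLoss()(logits, torch.zeros(4, 8))
```

In the actual models, `encoder` is BERT and all of its parameters are updated during fine-tuning.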
SVM: Standard linear-kernel SVM used in one vs. rest mode, and adapted to a multilabel setting.
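A minimal sketch of such a baseline with scikit-learn, using the tf-idf unigram/bigram features and C = 10 reported in the Technical Details (the toy data is illustrative only):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Illustrative toy data; real inputs are the annotated CCS sentences.
texts = [
    "the scientists are alarmists",
    "the carbon tax will wreck the economy",
    "alarmist scientists push a ruinous carbon tax",
    "the weather was mild today",
]
labels = [["Condemn"], ["Loyalties"], ["Condemn", "Loyalties"], []]

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(labels)            # binary indicator matrix

vec = TfidfVectorizer(ngram_range=(1, 2))  # unigram + bigram tf-idf
X = vec.fit_transform(texts)

# One linear SVM per class (one-vs-rest) for the multilabel setting.
clf = OneVsRestClassifier(LinearSVC(C=10))
clf.fit(X, y)
pred = clf.predict(X)
```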
BERT: Standard supervised BERT fine-tuned using the labelled data.
MTEXT: A semi-supervised BERT-based model based on Chen et al. (2020), extended to a multilabel setting. MTEXT combines the hidden representations of two training instances (drawn from both labelled and unlabelled instances) via interpolation to create a large number of augmented data samples. The supervised objective (L_s) uses standard cross-entropy loss, whereas the unsupervised objective uses a consistency loss (L_cl) in the form of KL-divergence. L_cl is computed on both labelled and unlabelled data, where the labels for the unlabelled data are inferred in a self-training manner. To encourage sharp probabilities for unsupervised instances, an entropy minimization loss L_em is added, yielding the overall objective L_nt = w_1 L_s + w_2 L_cl + w_3 L_em, where the w_x are tunable hyper-parameters.
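The interpolation step can be sketched as follows (assuming TMix-style mixing as in Chen et al. (2020); the beta parameter of 0.2 follows the Technical Details, while shapes and the keep-first-instance convention are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def tmix(h1, h2, y1, y2, alpha=0.2):
    """Interpolate two hidden representations and their (soft) label vectors.

    With a small alpha, the Beta distribution concentrates near 0 and 1,
    so the mixed sample stays close to one of the originals.
    """
    lam = rng.beta(alpha, alpha)
    lam = max(lam, 1 - lam)          # keep the mix closer to the first instance
    h_mix = lam * h1 + (1 - lam) * h2
    y_mix = lam * y1 + (1 - lam) * y2
    return h_mix, y_mix
```

In MTEXT the mixing is applied to BERT's hidden states at selected layers rather than to raw inputs.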
MTEXT_multi: As we saw in Section 3, each NT is associated with the SCIENCE or POLICY frame. We experiment with adding these frames (including the NONE class, three in total) as an auxiliary objective, creating another supervised loss (L_frame). The final objective is L_nt + α L_frame, where α is a tunable hyper-parameter.
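Putting the pieces together, the overall objective can be restated in code (the weights follow the values reported in the Technical Details: w_1 = 1, w_2 = 1, w_3 = 0.8, α = 0.8; the function itself is ours):

```python
def combined_loss(l_s, l_cl, l_em, l_frame, w=(1.0, 1.0, 0.8), alpha=0.8):
    """L_nt + alpha * L_frame, where L_nt = w1*L_s + w2*L_cl + w3*L_em."""
    l_nt = w[0] * l_s + w[1] * l_cl + w[2] * l_em
    return l_nt + alpha * l_frame
```

Setting alpha to 0 recovers the plain MTEXT objective without the auxiliary frame task.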
[Table 2 (excerpt). Argument or Example / NT / Frame: "There's no indication this is anything other than natural variability, with humans not playing a part" — Deny-Responsibility / SCIENCE; "There is a very real probability that global warming has been overestimated by computer models, and won't be too bad" — …]

Following Gururangan et al. (2020), we also experiment with adaptive pretraining for BERT, i.e. before we fine-tune BERT on our task, we pretrain the off-the-shelf BERT using the masked language model objective on CCS documents (Bhatia et al., 2020). Models with adaptive pretraining are marked with '*', e.g. MTEXT*_multi. At test time, we add two extra post-processing rules for the NONE class: (1) it is automatically selected if all other classes are predicted to be absent; and (2) it is never selected if any other classes are predicted to be present.
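The two NONE rules can be sketched as a simple post-processing step over a predicted label set (data structures are ours):

```python
def apply_none_rules(predicted):
    """Post-process a predicted label set with the two NONE rules.

    Rule 1: NONE is selected automatically when no other class is predicted.
    Rule 2: NONE is dropped whenever any other class is predicted present.
    """
    labels = set(predicted) - {"NONE"}
    return labels if labels else {"NONE"}
```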

Experiments
We split the labelled data into train/dev/test sets of 450/135/200 sentences. The semi-supervised models (the MTEXT variants) also have access to the 7215 unlabelled sentences. We use uncased BERT-base as the pretrained model for all experiments. Full training details and hyper-parameters are given in the supplementary material.
We present micro-precision, micro-recall, and micro-F1 results for the test set in Table 4. To provide an upper bound, we also present estimated human performance, which is computed by randomly isolating a worker's annotations and calculating agreement with the rest for the test instances (repeated 100 times to reduce variance, and micro-averaged).

Table 4: NT multi-label classification performance. "P", "R", and "F" denote micro-precision, micro-recall, and micro-F1 respectively.
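The human-performance estimate can be sketched as follows (a simplified illustration; the data structures and exact aggregation of the held-out worker against the rest are our assumptions):

```python
import random
from collections import Counter

def majority(votesets):
    """Labels chosen by more than half the annotators, falling back to NONE."""
    n = len(votesets)
    counts = Counter(label for v in votesets for label in v)
    return {label for label, c in counts.items() if c > n / 2} or {"NONE"}

def micro_f1(pairs):
    """Micro-averaged F1 over (gold label set, predicted label set) pairs."""
    tp = sum(len(g & p) for g, p in pairs)
    fp = sum(len(p - g) for g, p in pairs)
    fn = sum(len(g - p) for g, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def human_estimate(data, n_trials=100, seed=0):
    """data: one dict per sentence, mapping worker id -> label set."""
    rng = random.Random(seed)
    workers = sorted({w for sent in data for w in sent})
    scores = []
    for _ in range(n_trials):
        w = rng.choice(workers)  # isolate one worker at random
        pairs = [(majority([v for u, v in sent.items() if u != w]), sent[w])
                 for sent in data if w in sent]
        if pairs:
            scores.append(micro_f1(pairs))
    return sum(scores) / len(scores)
```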
We first look at the (fully) supervised results: the SVM, a simple baseline, has the lowest performance overall, and among the neural models the vanilla BERT performs worst, with adaptive pretraining (BERT*) boosting results.

Moving on to the semi-supervised models (MTEXT, MTEXT*, MTEXT_multi and MTEXT*_multi), we see consistent gains, highlighting the benefits of using unlabelled data. MTEXT_multi, with its multitask objective, gives a small but appreciable gain over MTEXT, producing performance on par with human performance. Interestingly, adaptive pretraining (MTEXT* and MTEXT*_multi) does not seem to help here, we suspect because both techniques are based on the same idea, i.e. improving performance by leveraging additional unlabelled data. To better understand how "data efficient" these models are, we present micro-F1 over varying amounts of labelled training data in Figure 1. We see that MTEXT and MTEXT_multi outperform BERT and BERT* substantially with only 30% of the training data (135 instances), and maintain their strong performance as the data quantity increases.
Finally, we present a breakdown of F1 scores for each class in Table 5. Comparing BERT and BERT*, adaptive pretraining mostly improves the two large classes (Deny-Responsibility and Loyalties). When we incorporate semi-supervised learning (MTEXT and MTEXT_multi), we see large improvements for all the small classes (Deny-Injury1, Deny-Injury2, and Justify), suggesting that semi-supervised learning benefits the smaller classes most. As in Table 4, to get an estimated upper bound we also present human F1 scores for each class. Looking at those scores, we observe that even for our best model the gap with human performance is larger for the smaller classes (Deny-Injury1, Deny-Injury2, and Justify), highlighting the limitations of semi-supervised learning.

Technical Details
For the supervised BERT models, we use the following fine-tuning hyper-parameters: batch size = 10, number of epochs = 3, learning rate = 0.0005, with BERT-base-uncased as the base model. For the semi-supervised MTEXT-based models, we use the following hyper-parameters: labelled batch size = 2, unlabelled batch size = 5, sharpening temperature = 0.6, beta distribution parameter = 0.2 (a small value, so that the generated data stays close to the labelled data with only small noise regularization), learning rate = 0.00005, w_1 = 1, w_2 = 1, w_3 = 0.8 in L_nt = w_1 L_s + w_2 L_cl + w_3 L_em, and α = 0.8 for the auxiliary objective α L_frame; we perform data augmentation for unlabelled data using German and Russian as pivot languages, similar to Chen et al. (2020). For the SVM, we use unigrams and bigrams as features with tf-idf weighting and regularization parameter C = 10. Further training details are provided in the supplementary material.

Conclusion
We draw on social science literature in introducing the notion of "neutralisation", in the context of climate change sceptics. We collect annotations of neutralisation techniques in text relating to climate change, and experiment with supervised and semisupervised BERT-based models.

Mechanical Turk
To pass quality control for a given HIT, the annotator has to select the correct class for the quality-control sentence (which is not flagged in any way to the annotator, and is presented in random order); a worker's annotations are not used to determine the consensus labelling if their average pass rate across all HITs attempted is ≤ 0.7. We collect additional annotations by releasing the task internally to a small number of local workers. Each HIT was paid at USD$0.61, and took an average of 5 minutes to complete. This amounts to $7.32 per hour, which is slightly above the US federal minimum wage ($7.25).

Supplementary Material

Training Details
For the supervised BERT models, we use the following fine-tuning hyper-parameters: batch size = 10, number of epochs = 3, learning rate = 0.0005, with BERT-base-uncased as the base model. We tune the decision boundary thresholds for classifying the presence of each label on the development set; they are 0.2 for DOR, 0.2 for DOI1, 0.2 for DOI2, 0.3 for DOV, 0.3 for COC, 0.3 for AHL, 0.2 for JBC, and 0.2 for NONE.
For the semi-supervised MTEXT-based models, we use the following hyper-parameters: labelled batch size = 2, unlabelled batch size = 5, sharpening temperature = 0.6, number of epochs = 3, beta distribution parameter = 0.2 (a small value, so that the generated data stays close to the labelled data with only small noise regularization), learning rate = 0.00005, w_1 = 1, w_2 = 1, w_3 = 0.8 in L_nt = w_1 L_s + w_2 L_cl + w_3 L_em, α = 0.8 for the auxiliary objective α L_frame, and mixing layers 7, 9, and 12, with BERT-base-uncased as the base model. We tune the decision boundary thresholds for classifying the presence of each label on the development set; they are 0.75 for DOR, 0.70 for DOI1, 0.70 for DOI2, 0.80 for DOV, 0.85 for COC, 0.80 for AHL, 0.70 for JBC, and 0.60 for NONE. We use two augmentations (based on back-translation) with Russian and German as the intermediate languages.
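Applying the tuned thresholds at prediction time can be sketched as follows (the threshold values are the MTEXT ones above; the function and data structures are ours):

```python
# Per-class decision thresholds tuned on the development set (MTEXT models).
THRESHOLDS = {"DOR": 0.75, "DOI1": 0.70, "DOI2": 0.70, "DOV": 0.80,
              "COC": 0.85, "AHL": 0.80, "JBC": 0.70, "NONE": 0.60}

def decide(probs, thresholds=THRESHOLDS):
    """probs: dict mapping class abbreviation -> predicted probability."""
    labels = {c for c, p in probs.items() if p >= thresholds[c]}
    labels.discard("NONE")   # NONE only fires when nothing else does
    return labels or {"NONE"}
```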

Other Details
• Computing Infrastructure: We use RTX 2080 Ti and GTX 1080 GPUs. MTEXT-based models are trained on two GPUs when using the RTX 2080 Ti and three GPUs when using the GTX 1080; BERT-based models are trained on a single GPU.

• Average run time: BERT-based models are quite quick, taking about a minute per epoch to fine-tune, whereas MTEXT-based models take around 20 minutes per epoch.
• As all the models are based on BERT-base-uncased, the number of parameters is around 110M (strictly speaking, MTEXT_multi has slightly more parameters due to the auxiliary objective, but the difference is insignificant in the overall picture).

• Validation performance of the various models is given in Table 1.

Table 1: NT multi-label classification performance on validation data. "F" denotes micro-F1.
• Hyperparameter tuning was done using manual search; the criterion was micro-F1 on the validation set.

• Parameters used for the final set of experiments are given in the Training Details section above.