The Pragmatics behind Politics: Modelling Metaphor, Framing and Emotion in Political Discourse

There has been increased interest in modelling political discourse within the natural language processing (NLP) community, in tasks such as political bias and misinformation detection. Metaphor-rich and emotion-eliciting communication strategies are ubiquitous in political rhetoric, according to social science research. Yet, none of the existing computational models of political discourse has incorporated these phenomena. In this paper, we present the first joint models of metaphor, emotion and political rhetoric, and demonstrate that they advance performance in three tasks: predicting the political perspective of news articles, the party affiliation of politicians and the framing of policy issues.


Introduction
The role of metaphor and emotion in political discourse has been investigated in fields such as communication studies (Weeks, 2015; Mourão and Robertson, 2019), political science (Ferrari, 2007; Charteris-Black, 2009) and psychology (Edwards, 1999; Bougher, 2012). Political rhetoric may rely on metaphorical framing to shape public opinion (Lakoff, 1991; Musolff, 2004). Framing selectively emphasises certain aspects of an issue that promote a particular perspective (Entman, 1993). For instance, government spending on the wealthy can be portrayed as a partnership or bailout, spending on the middle class as simply spending or stimulus to the economy and spending on the poor as a giveaway or a moral duty, the former corresponding to the conservative and the latter to the liberal point of view (Stone, 1988). Metaphor is an apt framing device, with different metaphors used across communities with distinct political views (Kövecses, 2002; Lakoff and Wehling, 2012). At the same time, metaphorical language has been shown to express and elicit stronger emotion than literal language (Citron and Goldberg, 2014; Mohammad et al., 2016) and to provoke emotional responses in the context of political discourse covered by mainstream newspapers (Figar, 2014). For instance, the phrase "immigrants are strangling the welfare system" aims to promote fear of immigration. On the other hand, the experienced emotions may influence the effects of news framing on public opinions (Lecheler et al., 2015) and individual variations in emotion regulation styles can predict different political orientations and support for conservative policies (Lee Cunningham et al., 2013). Metaphor and emotion thus represent crucial tools in political communication.
At the same time, computational modelling of political discourse, and its specific aspects, such as political bias in news sources (Kiesel et al., 2019), framing of societal issues (Card et al., 2015), or prediction of political affiliation from text (Iyyer et al., 2014) have received a great deal of attention in the NLP community. Yet, none of this research has incorporated the notions of metaphor and emotion in modelling political rhetoric.
We present the first joint models of metaphor, emotion and political rhetoric within a multi-task learning (MTL) framework. We make use of auxiliary learning, i.e. training a model on more than one task to improve performance on a main task. We experiment with three tasks from the political realm, predicting (1) the political perspective of a news article; (2) the party affiliation of politicians from their social media posts; and (3) the framing dimensions of policy issues. We use metaphor and emotion detection as auxiliary tasks, and investigate whether incorporating metaphor- or emotion-related features enhances the models of political discourse. Our results show that incorporating metaphor or emotion significantly improves performance across all tasks, emphasising the prominent role they play in political rhetoric.

Related work
Modelling political discourse encompasses a broad spectrum of tasks, including estimating policy positions from political texts (Thomas et al., 2006; Lowe et al., 2011), identifying features that differentiate the political rhetoric of opposing parties (Monroe et al., 2008) or predicting the political affiliation of Twitter users (Conover et al., 2011; Pennacchiotti and Popescu, 2011; Preoţiuc-Pietro et al., 2017). Deep neural networks have been widely used to model political perspective, bias or affiliation at the document level: Iyyer et al. (2014) used a recurrent neural network (RNN) to predict political affiliation from US congressional speeches. Li and Goldwasser (2019) identified the political perspective of news articles using a hierarchical Long Short-Term Memory (LSTM) network and modelled social media user data with Graph Convolutional Networks (GCN). Lastly, a recent shared task presented a multitude of deep learning methods to detect political bias in articles (Kiesel et al., 2019). Framing in political discourse has also gained attention recently (Ji and Smith, 2017).

Approaches predicting emotions for a given text typically adopt a categorical model of discrete, prototypical emotions, e.g. the six basic emotions of Ekman (1992). Early computational approaches employed vector space models (Danisman and Alpkocak, 2008) or shallow machine learning classifiers (Alm et al., 2005; Yang et al., 2007). Examples of deep neural methods are the recurrent model of Abdul-Mageed and Ungar (2017), who classified 24 fine-grained emotions, and the transformer-based SentiBERT architecture of Yin et al. (2020).
Computational research on metaphor has mainly focused on detecting metaphorical language in text. Early research performed supervised classification with hand-engineered lexical, syntactic and psycholinguistic features (Tsvetkov et al., 2014; Beigman Klebanov et al., 2016; Turney et al., 2011; Strzalkowski et al., 2013). Alternative approaches perform metaphor detection from distributional properties of words (Shutova et al., 2010; Gutiérrez et al., 2016) or by training deep neural models (Rei et al., 2017; Gao et al., 2018). Dankers et al. (2019) developed a joint model of metaphor and emotion by fine-tuning BERT in an MTL setting.

Tasks and datasets
Political Perspective in News Media Political news can be biased towards the left or right side of the political spectrum. To model such biased perspectives computationally, we classify articles as left, right or centre using data from Li and Goldwasser (2019). The articles are from the website AllSides and are annotated with their source's bias. The training and test sets contain 2,008 and 5,761 articles, respectively. We use 30% of the training data for validation.

Political Affiliation
For this task, we use the dataset of Voigt et al. (2018), which contains public Facebook posts from 412 US politicians. The training, validation and test sets contain 9,792, 2,356 and 2,458 posts, respectively. The classes are balanced and no politician appears in more than one set. The task is thus to predict republican or democrat for posts of unseen politicians.
Framing The Media Frames Corpus (Card et al., 2015) contains news articles discussing five policy issues: tobacco, immigration, same-sex marriage, gun-control and death penalty. There are 15 possible framing dimensions, e.g. economic and political (see Appendix A.3.1). We use the article-level annotation to predict the framing dimension. Of the 23,580 articles, we use 15% as the test set and 15% of the training data for validation.
Metaphor For metaphor detection we use the VU Amsterdam dataset (Steen et al., 2010), a subset of the British National Corpus (Leech, 1992). The dataset contains 9,017 sentences with binary labels (literal or metaphorical) per word. We use the data split of Gao et al. (2018), which places 25% of the sentences in the test set.
Emotion For emotion classification, we use a dataset from SemEval-2018 Task 1 (Mohammad et al., 2018), in which tweets were labelled for eleven emotion classes or as neutral (see Appendix A.3.2). We use the English portion of the dataset (10,097 tweets) and the shared task splits.

Methods
We employ the Robustly optimized BERT approach (RoBERTa-base) presented by Liu et al. (2019) and use the implementation by Wolf et al. (2019). RoBERTa contains twelve stacked transformer layers and assumes the input sequence to be tokenised into subword units called Byte-Pair Encodings (BPE). A special <s> token is inserted at the beginning of the input sequence, and its encoding serves as a contextualised sequence representation.
Our tasks are defined at three levels of the linguistic hierarchy. The auxiliary tasks of metaphor detection and emotion prediction are defined at word and sentence level, respectively, while the main political tasks are defined at document level.
For word-level metaphor identification, the subword encodings from RoBERTa's last layer are processed by a linear classification layer. A word is considered metaphorical if any of its BPEs is labelled as metaphorical. We assume that BPEs arising from inflections are unlikely to cause a word to be incorrectly labelled as metaphorical.
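This BPE-to-word aggregation can be sketched as follows (a minimal illustration, not the released code; the function name and the `word_ids` input, which maps each BPE position to a word index or `None` for special tokens, are our own):

```python
def word_labels_from_bpe(bpe_preds, word_ids):
    """Aggregate per-BPE predictions (0 = literal, 1 = metaphorical) to word
    level: a word counts as metaphorical if ANY of its BPEs is predicted
    metaphorical (the max over its BPE labels)."""
    labels = {}
    for pred, wid in zip(bpe_preds, word_ids):
        if wid is None:  # skip special tokens such as <s>
            continue
        labels[wid] = max(labels.get(wid, 0), pred)
    return [labels[i] for i in sorted(labels)]
```

For example, a word split into two BPEs where only one is predicted metaphorical still receives the metaphorical label.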
For the sentence-level emotion prediction task and the document-level tasks of political affiliation and framing, the <s> encoding serves as sequence representation and is fed to a linear classification layer. For political perspective in news articles, the average document length exceeds the maximum input size of RoBERTa. We, therefore, split its documents into sentences and collect them in a maximum of 5 subdocuments with up to 256 subwords. After applying RoBERTa to the subdocuments, their <s> encodings are fed to an attention layer yielding a document representation to be classified. A model schematic is shown in Figure 1.
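The attention step over subdocument encodings can be sketched in plain Python (a simplified illustration: the scoring vector `w` stands in for the learned attention layer, and encodings are plain lists rather than tensors):

```python
import math

def attention_pool(subdoc_cls, w):
    """Softmax attention over subdocument <s> encodings (rows of subdoc_cls),
    scored by a vector w, yielding one document representation."""
    scores = [sum(x * wi for x, wi in zip(row, w)) for row in subdoc_cls]
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]   # attention distribution over subdocs
    dim = len(subdoc_cls[0])
    return [sum(weights[i] * subdoc_cls[i][j] for i in range(len(subdoc_cls)))
            for j in range(dim)]
```

With a zero scoring vector the weights are uniform and the result is simply the mean of the subdocument encodings.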
All task models use the cross-entropy loss with a sigmoid activation function. For political perspective detection, the loss function includes class weights to account for class imbalance.
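The class-weighted variant can be illustrated with a pure-Python sketch of binary cross-entropy over sigmoid outputs; the per-class weights shown are hypothetical, and the model itself would use a library loss rather than this loop:

```python
import math

def weighted_bce(probs, targets, class_weights):
    """Binary cross-entropy over sigmoid probabilities in (0, 1), with
    per-class weights so that rare classes contribute more to the loss."""
    total = 0.0
    for p, t, w in zip(probs, targets, class_weights):
        total += -w * (t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)
```

Setting all weights to 1 recovers the unweighted loss used for the other tasks.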

Multi-task learning
The MTL architecture uses hard parameter sharing for the first eleven transformer layers. The last layer of RoBERTa, the classification and attention layers are task-specific to allow for specialisation, similar to the approach of Dankers et al. (2019).
The main political tasks are paired with the metaphor and emotion tasks one by one. The task losses are weighted with α for the main task and 1−α for the auxiliary task. We include an auxiliary warm-up period, during which α = 0.01, for some tasks. This allows the model to initially learn the (lower-level) auxiliary task while focusing mostly on the main task afterwards. This approach is similar to Kiperwasser and Ballesteros (2018).
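The loss combination with auxiliary warm-up can be sketched as follows (function and argument names are our own; the warm-up value of 0.01 follows the description above):

```python
def mtl_loss(main_loss, aux_loss, epoch, alpha=0.9, warmup_epochs=5):
    """Weighted multi-task loss. During the auxiliary warm-up period the
    main-task weight is fixed at 0.01, so the model first learns the
    lower-level auxiliary task; afterwards the main task dominates with
    weight alpha, and the auxiliary task receives 1 - alpha."""
    a = 0.01 if epoch < warmup_epochs else alpha
    return a * main_loss + (1 - a) * aux_loss
```

With alpha = 0.9 and a five-epoch warm-up, the auxiliary task carries 99% of the loss in the first five epochs and 10% thereafter.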

Experimental setup
The models are trained with the AdamW optimiser, a learning rate of 1e-5 and a batch size of 32. The learning rate is annealed with a cosine-based schedule, using warm-up ratios of 0.2, 0.3 and 0.15 for the political perspective, political affiliation and framing tasks, respectively. Dropout is applied per layer with a probability of 0.3 for political affiliation and 0.1 otherwise.
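The schedule can be sketched as a standard linear warm-up followed by cosine annealing to zero (our own illustration; the exact library scheduler may differ in details):

```python
import math

def cosine_lr(step, total_steps, warmup_ratio=0.2, base_lr=1e-5):
    """Linear warm-up for warmup_ratio of training, then cosine annealing
    of the learning rate from base_lr down to zero."""
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))
```

The rate peaks at base_lr exactly when warm-up ends and decays smoothly afterwards.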
The auxiliary warm-up period and α values are estimated per main task, for metaphor (α_M) and emotion (α_E) separately. For political perspective in news media, α_M = 0.7, α_E = 0.8, and models were trained for 20 epochs with early stopping. For political affiliation prediction, α_M = α_E = 0.9 and the first 5 epochs are used for auxiliary warm-up; models were trained for 20 epochs in total. For the framing task, α_M = α_E = 0.5, with 5 epochs of auxiliary warm-up for metaphor. Training lasted 10 epochs at most, with early stopping.
We average results over 10 random seeds. We perform significance testing using an approximate permutation test with 10,000 permutations.
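The test can be sketched as follows (a Monte Carlo approximation assuming paired per-instance scores for two systems; the seed and the additive smoothing of the p-value are our own choices):

```python
import random

def permutation_test(scores_a, scores_b, n_perm=10_000, seed=0):
    """Approximate permutation test for the difference in mean scores of two
    systems evaluated on the same instances: randomly swap paired scores and
    count how often the permuted difference is at least as extreme as the
    observed one."""
    rng = random.Random(seed)
    n = len(scores_a)
    observed = abs(sum(scores_a) - sum(scores_b)) / n
    hits = 0
    for _ in range(n_perm):
        diff = 0.0
        for a, b in zip(scores_a, scores_b):
            if rng.random() < 0.5:  # swap the pair with probability 0.5
                a, b = b, a
            diff += a - b
        if abs(diff) / n >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # smoothed p-value
```

Identical score vectors yield a p-value of 1, while consistently diverging scores drive it towards the smoothing floor of 1/(n_perm + 1).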

Discussion
Political Perspective and Affiliation For the political perspective detection task, the performance improvements of MTL models stem mostly from improved predictions for the right-wing class. Example 1 of Table 1 presents an emotive article snippet containing the metaphors of "boil over" and "simmering anger", for which joint learning with metaphor corrected the single-task learning (STL) prediction.
For political affiliation, improvements from auxiliary tasks are due to more accurate identification of the class of democrats. According to Pliskin et al. (2014), liberals are more susceptible to emotions, which could in part explain this result. Appendix A.2.2, Figure 2 visualises the performance across the political spectrum, from which we infer that politicians at the centre are harder to distinguish, and those on the left are better identified by our MTL models. We explored the emotions predicted by the MTL model in politicians' posts, as shown in Table 2. We found that emotions typically associated with conservative rhetoric, e.g. anger, disgust or fear (Jost et al., 2003), were more frequent in republicans' posts. Conversely, emotions associated with liberal rhetoric, e.g. love (Lakoff, 2002) or sadness (Steiger et al., 2019), were more often predicted for democrats' posts. Appendix A.2.2, Table 7 contains example posts where joint learning using emotion corrected the STL setup.
Framing In the case of the framing task, MTL with metaphor prediction yielded the largest improvements for the frames of security and defence, morality and fairness and equality, particularly in articles on the metaphor-rich topics of immigration, gun-control and death penalty. We automatically annotated metaphorical expressions in these articles to conduct a qualitative analysis. We observed that correct identification of linguistic metaphors often accompanies correct frame classification by the MTL model. Examples of such cases are shown in Table 1. In Example 2, metaphors such as "airtight case" and "evaporated" aided the model in identifying the fairness and equality framing within the topic of death penalty. Similarly, presenting border security in Example 3 as a "sticking point in the immigration debate" improved the classification of the security and defence framing of an article on the topic of immigration. Appendix A.2.1, Table 6 presents detailed results per policy issue. The results for the immigration policy subset can be compared to those of Ji and Smith (2017) and Khanehzar et al. (2019).

Conclusion
In this paper, we introduced the first joint models of metaphor, emotion and political rhetoric. We considered predicting the political perspective of news, the party affiliation of politicians and the framing dimensions of policy issues. MTL using metaphor detection resulted in significant performance improvements across all three tasks. This finding emphasises the prevalence of metaphor in political discourse and its importance for the identification of framing strategies. Joint learning with emotion yielded significant performance improvements for the political perspective and affiliation tasks, which suggests that the use of emotion is an important political tool, aiming to influence public opinion. Future research may explore further tasks such as emotion and misinformation detection, which social scientists have found to be inter-related, and deploy more advanced MTL techniques, such as soft parameter sharing. Our code is publicly available at github.com/LittlePea13/mtl_political_discourse.

A.1 Experimental Setup
Our implementation is in PyTorch and uses Huggingface's Transformers library (https://github.com/huggingface/transformers) to load the pretrained models and perform fine-tuning. Some code for training models was adapted from the utils_nlp library (https://github.com/microsoft/nlp-recipes/tree/master/utils_nlp). The data splits and code were attached with our submission. Models were trained on a cluster with 4 x Titan RTX 24 GB GDDR6 GPUs and an Intel Xeon 2.30 GHz CPU. Each model was trained in under two hours; STL models were trained in half the time it took to train MTL models. We fine-tuned the pretrained RoBERTa model and do not account for the time it took to pretrain RoBERTa. RoBERTa itself has 125 million parameters and our task-specific layers add around 5 million parameters, with some variance per task, making a total of around 130 million parameters.

We experimented with multiple α values at intervals of 0.1. To estimate the warm-up period for scheduled learning, we experimented with 3, 4 or 5 epochs. For the political affiliation task, we experimented with dropout probabilities of 0.1, 0.2 and 0.3. The final hyperparameter setups were chosen through manual tuning based on the accuracy scores on the validation sets. Hyperparameters that were shared between MTL and STL for the same main task were selected based on the performance in the STL setup. The validation accuracy scores are listed in Table 5.

A.3.2 Emotion Classes

- Anticipation (also includes interest, vigilance)
- Disgust (also includes disinterest, dislike, loathing)
- Fear (also includes apprehension, anxiety, terror)
- Joy (also includes serenity, ecstasy)
- Love (also includes affection)
- Optimism (also includes hopefulness, confidence)
- Pessimism (also includes cynicism, no confidence)
- Sadness (also includes pensiveness, grief)
- Surprise (also includes distraction, amazement)
- Trust (also includes acceptance, liking, admiration)