Framing Unpacked: A Semi-Supervised Interpretable Multi-View Model of Media Frames

Understanding how news media frame political issues is important due to its impact on public attitudes, yet hard to automate. Computational approaches have largely focused on classifying the frame of a full news article while framing signals are often subtle and local. Furthermore, automatic news analysis is a sensitive domain, and existing classifiers lack transparency in their predictions. This paper addresses both issues with a novel semi-supervised model, which jointly learns to embed local information about the events and related actors in a news article through an auto-encoding framework, and to leverage this signal for document-level frame classification. Our experiments show that: our model outperforms previous models of frame prediction; we can further improve performance with unlabeled training data leveraging the semi-supervised nature of our model; and the learnt event and actor embeddings intuitively corroborate the document-level predictions, providing a nuanced and interpretable article frame representation.


Introduction
Journalists often aim to package complex real-world events into comprehensive narratives, following a logical sequence of events involving a limited set of actors. Constrained by word limits, they necessarily select some facts over others, and make certain perspectives more salient. This phenomenon of framing, be it purposeful or unconscious, has been thoroughly studied in the social and political sciences (Chong and Druckman, 2007). More recently, the natural language processing community has taken an interest in automatically predicting the frames of news articles (Card et al., 2016; Field et al., 2018; Akyürek et al., 2020; Khanehzar et al., 2019; Liu et al., 2019a; Huguet Cabot et al., 2020).
Definitions of framing vary widely including: expressing the same semantics in different forms (equivalence framing); presenting selective facts and aspects (emphasis framing); and using established syntactic and narrative structures to convey information (story framing) (Hallahan, 1999). The model presented in this work builds on the concepts of emphasis framing and story framing, predicting the global (aka. primary) frame of a news article on the basis of the events and participants it features.
Primary frame prediction has attracted substantial interest recently, with the most accurate models being supervised classifiers built on top of large pretrained language models (Khanehzar et al., 2019; Huguet Cabot et al., 2020). Our work advances prior approaches in two ways. First, we explicitly incorporate a formalization of story framing into our frame prediction models: by modeling news stories as latent representations over events and related actors, we obtain interpretable latent representations that lend transparency to our frame prediction models. We argue that transparent machine learning is imperative in a potentially sensitive domain like automatic news analysis, and show that the local, latent labels inferred by our model lend explanatory power to its frame predictions.
Secondly, the latent representations are induced without frame-level supervision, requiring only a pre-trained, off-the-shelf semantic role labeling (SRL) model (Shi and Lin, 2019). This renders our frame prediction models semi-supervised, allowing us to use large unlabeled news corpora.
More technically, we adopt a dictionary learning framework with deep autoencoders through which we learn to map events and their agents and patients independently into their respective structured latent space. Our model thus learns a latent multi-view representation of news stories, with each view contributing evidence to the primary frame prediction from its own perspective. We incorporate the latent multi-view representation into a transformer-based document-level frame classification model to form a semi-supervised model, in which the latent representations are jointly learnt with the classifier.
We demonstrate empirically that our semi-supervised model outperforms current state-of-the-art models in frame prediction. More importantly, through detailed qualitative analysis, we show how our latent features mapped to events and related actors allow for a nuanced analysis and add interpretability to the model predictions. In summary, our contributions are:
• Based on the concepts of story framing and emphasis framing, we develop a novel semi-supervised framework which incorporates local information about core events and actors in news articles into a frame classification model.
• We empirically show that our model, which incorporates the latent multi-view semantic role representations, outperforms existing frame classification models with only labeled articles. By harnessing large sets of unlabeled in-domain data, our model can further improve its performance and achieves new state-of-the-art performance on the frame prediction task.
• Through qualitative analysis, we demonstrate that the latent, multi-view representations aid interpretability of the predicted frames.

Background and Related Work
A widely accepted definition of frames describes them as a selection of aspects of perceived reality, which are made salient in a communicating context to promote a particular problem definition, causal interpretation, moral evaluation and treatment recommendation for the described issue (Entman, 1993). While detecting media frames has attracted much attention and spawned a variety of methods, it poses several challenges for automatic prediction due to its vagueness and complexity. Two common approaches in the study of frames focus either on the detailed issue-specific elements of a frame or, somewhat less nuanced, on generic framing themes prevalent across issues. Within the first approach, Matthes and Kohring (2008) developed a manual coding scheme relying on Entman's (1993) definition. While the scheme assumes that each frame is composed of common elements, categories within those elements are often specific to the particular issue being discussed (e.g., "same sex marriage" or "gun control"), making comparisons across issues, and automatic detection, difficult. Similarly, earlier studies focusing specifically on unsupervised models to extract frames usually employed topic modeling (Boydstun et al., 2013; Nguyen, 2015; Tsur et al., 2015) to find issue-specific frames, limiting across-issue comparisons.
Studies employing generic frames address this shortcoming by proposing common categories applicable to different issues. For example, Boydstun et al. (2013) proposed a list of 15 broad frame categories commonly used when discussing different policy issues, and in different communication contexts. The Media Frames Corpus (MFC; Card et al. (2015)) includes about 12,000 news articles from 13 U.S. newspapers covering five different policy issues, annotated with the dominant frame from Boydstun et al. (2013). Table 5 in the Appendix lists all 15 frame types present in the MFC. The MFC has been previously used for training and testing frame classification models. Card et al. (2016) provide an unsupervised model that clusters articles with similar collections of "personas" (i.e., characterisations of entities) and demonstrate that these personas can help predict the coarse-grained frames annotated in the MFC. While conceptually related to our approach, their work adopts the Bayesian modelling paradigm, and does not leverage the power of deep learning. Ji and Smith (2017) proposed a supervised neural approach incorporating discourse structure. The current best result for predicting the dominant frame of each article in the MFC comes from Khanehzar et al. (2019), who investigated the effectiveness of a variety of pre-trained language models (XLNet, BERT and RoBERTa).
Recent methods have been expanded to multilingual frame detection. Field et al. (2018) used the MFC to investigate framing in Russian news. They introduced embedding-based methods for projecting frames of one language into another (i.e., English to Russian). Akyürek et al. (2020) studied multilingual transfer learning to detect multiple frames in target languages with few or no annotations. Recently, Huguet Cabot et al. (2020) investigated joint models incorporating metaphor, emotion and political rhetoric within multi-task learning to predict framing of policy issues.

[Figure 1 caption fragment: Example sentence: "The proposal, backed by high-tech companies, would raise the limit of so-called H-1B visas granted each year to skilled workers from abroad." (b) The unsupervised module takes as input semantic role embeddings (v_a0, v_p, v_a1) and a sentence embedding (v_si), and learns latent, role-specific embedding matrices (F) in an auto-encoding framework. The latent representations are incorporated into the overall frame classification module. (c) We predict document-level frames based on transformer-based document embeddings (ŷ_s) and the view-specific latent representations (ŷ_u), using cross-entropy loss with the true frame label y.]
Our modelling approach is inspired by recent advances in learning interpretable latent representations of the participants and relationships in fiction stories. Iyyer et al. (2016) present Relationship Modelling Networks (RMNs), which induce latent descriptors of types of relationships between characters in fiction stories, in an unsupervised way. RMNs combine dictionary learning with deep autoencoders, and are trained to effectively encode text passages as linear combinations over latent descriptors, each of which corresponds to a distinct relationship (not unlike topics in a topic model). Frermann and Szarvas (2017) extend the idea to a multi-view setup, jointly learning multiple dictionaries, which capture properties of individual characters in addition to relationships. We adopt this methodology for modeling news articles through three latent views: capturing their events (predicates), and participants (ARG0, ARG1). We combine the unsupervised autoencoder with a frame classifier into an interpretable, semi-supervised framework for article-level frame prediction.

Semi-supervised Interpretable Frame Classification
In this section, we present our Frame classifier, which is Interpretable and Semi-supervised (FRISS). The full model is visualized in Figure 1. Given a corpus of news articles, some of which have a label indicating their primary frame y (Figure 1(a)), FRISS learns to predict ŷ for each document by combining a supervised classification module (Figure 1(c)) and an unsupervised autoencoding module (Figure 1(b)), which are jointly trained. The unsupervised module (i) can be trained with additional unlabeled training data, which improves performance (Section 5.2); and (ii) learns interpretable latent representations which improve the interpretability of the model (Section 5.3). Intuitively, FRISS predicts frames based on aggregated sentence representations (supervised module; Section 3.2) as well as aggregated fine-grained latent representations capturing actors and events in the article (unsupervised module; Section 3.1). The unsupervised module combines an auto-encoding objective with a multi-view dictionary learning framework (Iyyer et al., 2016; Frermann and Szarvas, 2017). We treat predicates, their ARG0 and ARG1 as three separate views, and learn to map each view to an individual latent space representative of their relation to the overall framing objective. Below, we will sometimes refer to views collectively as z ∈ {p, a_0, a_1}. We finally aggregate the view-level representations and sentence representations to predict a document-level frame. The following sections describe FRISS in technical detail.

Input
Each input document is sentence-segmented and automatically annotated by an off-the-shelf transformer-based semantic role labeling model (Shi and Lin, 2019; Pradhan et al., 2013) to indicate spans over the three semantic roles: predicates, ARG0s and ARG1s.
We compute a contextualized vector representation for each semantic role span (s_p, s_a0, s_a1). We describe the process for obtaining predicate input representations v_p here for illustration; contextualized representations for views a_0 (v_a0) and a_1 (v_a1) are obtained analogously. First, we pass each sentence through a sentence encoder, and obtain the predicate embedding by averaging all contextualized token representations v_w (of dimension D_w) in its span s_p of length |s_p|:

v_p = 1/|s_p| Σ_{w ∈ s_p} v_w    (1)

We concatenate v_p with an overall sentence representation v_s, which is computed by averaging all contextualized token embeddings of the sentence s of length |s|:

v_s = 1/|s| Σ_{w ∈ s} v_w    (2)

ṽ_p = [v_p; v_s]    (3)

where [;] denotes vector concatenation. If a sentence has more than one predicate, a separate representation is computed for each of them.
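As a concrete illustration, the span averaging and concatenation described above can be sketched with toy numpy arrays (the function names and toy dimensions are ours, not from a released implementation):

```python
import numpy as np

def span_embedding(token_embs, span):
    """Average the contextualized token embeddings inside a role span."""
    return token_embs[span].mean(axis=0)

def view_input(token_embs, span):
    """Concatenate span and sentence embeddings: v~ = [v_span; v_sent]."""
    v_span = span_embedding(token_embs, span)
    v_sent = token_embs.mean(axis=0)          # whole-sentence average
    return np.concatenate([v_span, v_sent])   # 2 * D_w dimensions

# toy sentence of 5 tokens with D_w = 4
toks = np.arange(20, dtype=float).reshape(5, 4)
v = view_input(toks, slice(1, 3))             # predicate span = tokens 1-2
print(v.shape)                                # (8,)
```

A sentence with several predicates would simply call `view_input` once per predicate span.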

Multi-view Frame representations
We combine ideas from auto-encoding (AE) and dictionary learning, as previously used to capture the content of fictitious stories (Iyyer et al., 2016), and its multi-view extension (Frermann and Szarvas, 2017). We posit a latent space as three view-specific dictionaries (Figure 1(b)) capturing events (predicates; F_p), their first (ARG0; F_a0) and second (ARG1; F_a1) arguments, respectively. Given a view-specific input as described above, the autoencoder maps it to a low-dimensional distribution over "dictionary terms" (henceforth descriptors), which are learnt during training. The descriptors are vector-valued latent variables that live in word embedding space, and are hence interpretable through their nearest neighbors (Table 3 shows examples of descriptors inferred by our model).
By jointly learning the descriptors with the supervised classification objective, each descriptor will capture coherent information corresponding to a frame label in our supervised data set. We hence set the number of descriptors for each dictionary to K = 15, the number of frames in our data set.

More technically, our model follows two steps. First, we encode the input ṽ_z of a known view z by passing it through a feed forward layer W_h of dimensions 2D_w × D_h, shared across all the views, followed by a ReLU non-linearity, and then another feed forward layer W_z of dimensions D_h × K, specific to each view z:

h_z = ReLU(W_h ṽ_z)    (4)

d_z = W_z h_z    (5)

This results in a K-dimensional vector over the view-specific descriptors, which we normalize with a Gumbel softmax (temperature τ) during the training phase:

g_z = gumbel-softmax(d_z, τ)    (6)

We finally reconstruct the view-specific span embedding as

v̂_z = F_z^T g_z    (7)

Unsupervised Objective

Contrastive Loss We use the contrastive max-margin objective function following previous work in dictionary learning (Iyyer et al., 2016; Frermann and Szarvas, 2017; Han et al., 2019). We randomly sample a set of negative samples (N−) with the same view as the current input from the mini-batch. The unregularized objective J_z^u (Eq. 8) is a hinge loss that minimizes the L2 norm between the reconstructed embedding v̂_z and the true input's view-specific embedding v_z, while simultaneously maximizing the L2 norm between v̂_z and negative samples v_z^n:

J_z^u(θ) = 1/|N−| Σ_{v_z^n ∈ N−} max(0, 1 + ||v̂_z − v_z||_2 − ||v̂_z − v_z^n||_2)    (8)

where θ represents the model parameters, |N−| is the number of negative samples, and the margin value is set to 1.
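The contrastive objective can be sketched as follows (a minimal numpy version of the hinge loss described above, with illustrative toy vectors):

```python
import numpy as np

def contrastive_loss(v_hat, v_pos, negatives, margin=1.0):
    """Hinge loss pulling the reconstruction v_hat toward its own span
    embedding and pushing it away from same-view negative spans."""
    d_pos = np.linalg.norm(v_hat - v_pos)
    losses = [max(0.0, margin + d_pos - np.linalg.norm(v_hat - v_n))
              for v_n in negatives]
    return sum(losses) / len(negatives)

v_hat = np.zeros(4)
v_pos = np.array([0.1, 0.0, 0.0, 0.0])        # close to the reconstruction
negs = [np.ones(4) * 5, -np.ones(4) * 5]      # far-away negatives
print(contrastive_loss(v_hat, v_pos, negs))   # 0.0: margin comfortably satisfied
```

When a negative sits closer to the reconstruction than the true span plus the margin, the loss becomes positive and gradients push the two apart.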
Focal Triplet Loss Preliminary studies (Section 5) suggested that some descriptors (aka frames) are more similar to each other than others. We incorporate this intuition through a novel mechanism that moves the descriptors least involved in a reconstruction proportionally further away from the most involved descriptor. Concretely, we select the t descriptors in F_z with the smallest weights in g_z as additional negative samples. We denote the indices of the selected t smallest components in g_z as i_1, i_2, . . . , i_t, and use F_z^t to denote the matrix (t × D_w) with only those t descriptors. We re-normalize the weights of the selected t descriptors, and denote the re-normalized weight vector as g_z^t = [g_z^{i_1}, g_z^{i_2}, . . . , g_z^{i_t}]. For each element in g_z^t, we compute an individual margin based on its magnitude. Intuitively, the smaller the weight, the larger its required margin from a given total margin budget |M|:

m_j = |M| · (1 − g_z^{i_j}) / Σ_{k=1}^t (1 − g_z^{i_k})

We compute the standard margin-based hinge loss over the additional negative samples with sample-specific margins:

J_z^t(θ) = Σ_{j=1}^t max(0, m_j + ||v̂_z − v_z||_2 − ||v̂_z − f_z^{i_j}||_2)

where f_z^{i_j} is the i_j-th descriptor of F_z. We sum the focal triplet objective J_z^t with J_z^u, and then sum over all view-specific spans s ∈ S_z, while adding a regularization term encouraging orthogonality of the descriptors:

J_z(θ) = Σ_{s ∈ S_z} (J_z^u(θ) + J_z^t(θ)) + λ ||F_z F_z^T − I||
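A small sketch of the margin-budget allocation, assuming margins inversely proportional to the renormalized descriptor weights (one reading consistent with the description: smaller weight, larger margin, margins summing to the budget |M|):

```python
def focal_margins(g_t, budget):
    """Split a total margin budget over the t least-involved descriptors,
    giving smaller reconstruction weights proportionally larger margins."""
    inv = [1.0 - g for g in g_t]           # low weight -> large share
    total = sum(inv)
    return [budget * x / total for x in inv]

g_t = [0.05, 0.15, 0.30]                   # renormalised weights of t=3 descriptors
m = focal_margins(g_t, budget=3.0)
print(m)                                    # smallest weight gets the largest margin
```

Each margin then plugs into a standard hinge term for its descriptor, so barely-used descriptors are pushed furthest away.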
where λ is a hyper-parameter that can be tuned. We finally aggregate the loss from all the views:

U(θ) = Σ_{z ∈ {p, a_0, a_1}} J_z(θ)

Supervised Document-level Frame Classification
We incorporate the semantic role level predictions described above into a document-level frame classifier consisting of two parts, which are jointly learnt with the unsupervised model: (i) a classifier based on aggregated span-level representations computed as described in Sec. 3.1 (Fig. 1 (c; left); Sec. 3.2.1), and (ii) a classifier based on aggregated sentence representations (Fig. 1 (c; right); Sec. 3.2.2).

Span-based Classifier
The unsupervised module makes predictions on the semantic role span level; our goal, however, is to predict document-level frame labels. We aggregate the span-level representations d_z (Eq. 5) by averaging across spans and then views:

d = 1/Z Σ_z 1/|S_z| Σ_{s ∈ S_z} d_z^s

where Z is the number of views, and S_z is the set of view-specific spans in the current document. We finally pass the logits d through a softmax layer to predict a distribution over frames, ŷ_u = softmax(d).
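The two-stage averaging can be sketched with toy span logits (the dictionary keys and values below are illustrative):

```python
import numpy as np

def span_classifier_logits(view_spans):
    """Average K-dim span logits within each view, then across views."""
    per_view = [np.mean(d_list, axis=0) for d_list in view_spans.values()]
    return np.mean(per_view, axis=0)

K = 15
doc = {"pred": [np.ones(K), np.zeros(K)],   # d_z vectors for each detected span
       "a0":   [np.full(K, 2.0)],
       "a1":   [np.full(K, 4.0)]}
logits = span_classifier_logits(doc)
print(logits[0])                             # (0.5 + 2 + 4) / 3 ≈ 2.1667
```

Averaging within each view first keeps a view with many spans (often predicates) from dominating the document-level prediction.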

Sentence-based Classifier
We separately predict a document-level frame based on the aggregated sentence-level representations computed in Eq. (2). We first pass each sentence embedding through a feed forward layer W_r of dimensions D_w × D_w, followed by a ReLU non-linearity, and another feed forward layer W_t to map the resulting representation to K dimensions. We then average across the sentences S_d of the current document and pass the result through a softmax layer:

ŷ_s = softmax( 1/|S_d| Σ_{s ∈ S_d} W_t ReLU(W_r v_s) )

Full Loss
We jointly train the supervised and unsupervised model components. The supervised loss X(θ) consists of two cross-entropy terms against the true frame label y, one for the sentence-based classification and one for the aggregated span-based classification: X(θ) = X(ŷ_u, y) + X(ŷ_s, y).
The full loss balances the supervised and unsupervised components with a hyper-parameter α:

L(θ) = α X(θ) + (1 − α) U(θ)

where U(θ) denotes the unsupervised loss aggregated over all views.
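As a sketch, assuming the supervised and unsupervised terms are combined as a convex mixture (the function name and example values are ours):

```python
def full_loss(x_sent, x_span, u_loss, alpha=0.5):
    """Combine the two supervised cross-entropy terms with the
    unsupervised reconstruction loss, weighted by alpha."""
    supervised = x_sent + x_span
    return alpha * supervised + (1.0 - alpha) * u_loss

print(full_loss(0.8, 0.6, 2.0))   # 0.5 * 1.4 + 0.5 * 2.0 = 1.7
```

Setting alpha = 1 recovers a purely supervised classifier; alpha = 0 trains only the autoencoder, which is what makes unlabeled documents usable.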

Experimental Settings
Dataset We follow prior work on automatic prediction of a single, primary frame of a news article as annotated in the Media Frames Corpus (MFC; Card et al. (2015)). The MFC contains a large number of news articles on five contentious policy issues (immigration, smoking, gun control, death penalty, and same-sex marriage), manually annotated with document- and span-level frame labels from a set of 15 general frames (listed in Table 5 in the Appendix). Articles were selected from 13 major U.S. newspapers, published between 1980 and 2012. Following previous work, we focus on the immigration portion of the MFC, which comprises 5,933 annotated articles, as well as an additional 41,286 unlabeled articles. The resulting dataset contains all 15 frames. Table 5 (Appendix) lists the corresponding frame distribution. We partition the labeled dataset into 10 folds, preserving the overall frame distribution for each fold.
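A stratified partition of this kind can be sketched in a few lines of stdlib Python (the exact fold-assignment procedure is not specified in the paper; a round-robin split per label is one simple scheme that preserves the frame distribution):

```python
from collections import defaultdict

def stratified_folds(labels, n_folds=10):
    """Assign document indices to folds round-robin within each frame label,
    approximately preserving the overall frame distribution per fold."""
    by_label = defaultdict(list)
    for idx, y in enumerate(labels):
        by_label[y].append(idx)
    folds = [[] for _ in range(n_folds)]
    for idxs in by_label.values():
        for i, doc in enumerate(idxs):
            folds[i % n_folds].append(doc)
    return folds

labels = ["Political"] * 20 + ["Legality"] * 10
folds = stratified_folds(labels, n_folds=5)
print([len(f) for f in folds])   # [6, 6, 6, 6, 6]: 4 Political + 2 Legality each
```

In practice a library routine such as scikit-learn's StratifiedKFold would serve the same purpose.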

Pre-processing and Semantic Role labeling
We apply a state-of-the-art BERT-based SRL model (Shi and Lin, 2019) to obtain SRL spans for each sentence. The off-the-shelf model from AllenNLP is trained on OntoNotes 5.0 (close to 50% news text). While a domain-adapted model may lead to a small performance gain, the off-the-shelf model enhances generalizability and reproducibility. Qualitative examples of detected SRL spans are shown in Table 4, which confirm that SRL predictions are overall accurate. We extract semantic role spans for predicates, their associated first (ARG0) and second (ARG1) arguments for each sentence in a document. For the unsupervised component, we disregard sentences with no predicate, and sentences missing both ARG0 and ARG1.
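The filtering rule for the unsupervised component can be sketched as follows (the input representation, one set of role labels per detected predicate frame, is an illustrative simplification of typical SRL output):

```python
def keep_for_unsupervised(frames):
    """frames: one set of role labels per detected predicate, e.g. {"V", "ARG0"}.
    Discard sentences with no predicate, or missing both ARG0 and ARG1."""
    has_pred = any("V" in roles for roles in frames)
    has_arg = any(roles & {"ARG0", "ARG1"} for roles in frames)
    return has_pred and has_arg

print(keep_for_unsupervised([{"V", "ARG0", "ARG1"}]))  # True
print(keep_for_unsupervised([{"V"}]))                  # False: no arguments
print(keep_for_unsupervised([]))                       # False: no predicate
```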
Sentence Encoder In all our experiments, we use RoBERTa (Liu et al., 2019b) as our sentence encoder, as previous work (Khanehzar et al., 2019) has shown that it outperforms BERT (Devlin et al., 2019) and XLNet (Yang et al., 2019). We pass each sentence through RoBERTa and retrieve the token-level embeddings. To obtain the sentence embedding, we average the RoBERTa embeddings of all words (Eq. 2). To obtain SRL span embeddings, we average the token embeddings of all words in a predicted span (Eq. 1). Following Gururangan et al. (2020), we pre-train RoBERTa with immigration articles using the masked language model (MLM) objective. Only the labeled data is used for pre-training for fair comparison between FRISS and previous models.

Parameter Settings
We set the maximum sequence length for RoBERTa to 64 tokens, the maximum number of sentences per document to 32, and the maximum number of predicates per sentence to 10. We set the number of dictionary terms K = 15, i.e., the number of frame classes in the MFC corpus. Each dictionary term is of dimension D_w = 768, equal to the RoBERTa token embedding dimension. We also fix the dimensions of the hidden vector w_s (Eqn. 15) and D_h to this value. We set the number of descriptors in the Focal Triplet Loss t = 8 and the margin pool |M| = t. We set the balancing hyper-parameter between the supervised and unsupervised loss to α = 0.5, and λ = 10^-3. The dropout rate is set to 0.3.
We perform stochastic gradient descent with mini-batches of 8 documents. We use the Adam optimizer (Kingma and Ba, 2015) with the default parameters, except for the learning rate, which we set to 2 × 10^-5 (for the RoBERTa parameters) and 5 × 10^-4 (for all other parameters). We use a linear scheduler for learning rate decay. Weight decay is applied to all parameters except for biases and batch normalization parameters. We update the Gumbel softmax temperature with the schedule τ = max(0.5, exp(−5 × 10^-4 × iteration)), updating the temperature every 50 iterations. For all our experiments, we run a maximum of 10 epochs, evaluate every 50 iterations, and apply early stopping if the accuracy does not improve for 20 consecutive evaluations.
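The temperature schedule is a clipped exponential decay, which can be sketched directly:

```python
import math

def gumbel_temperature(iteration, floor=0.5, rate=5e-4):
    """Anneal the Gumbel-softmax temperature, clipped at a floor of 0.5."""
    return max(floor, math.exp(-rate * iteration))

print(gumbel_temperature(0))        # 1.0 at the start of training
print(gumbel_temperature(10_000))   # 0.5: exp(-5) has hit the floor
```

Early in training the high temperature keeps descriptor weights soft and gradients informative; as it anneals, the weights sharpen toward near one-hot descriptor choices.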

Evaluation
In this section, we evaluate the performance of FRISS on primary frame prediction for issue-specific news articles against prior work (Sec 5.1), demonstrate the benefit of adding additional unlabeled data to our semi-supervised model (Sec 5.2), and present a qualitative analysis of our model output corroborating its interpretability (Sec 5.3).

Frame labels are confused non-uniformly, suggesting that some pairs of frames are perceived to be more similar than others. This observation motivated the Focal Triplet Loss and Gumbel regularization components of our model. In particular, the following groups of frame labels are confused most frequently: {"Policy Prescription and Evaluation", "Public Sentiment", "Political"}, {"Fairness", "Legality"}, {"Crime and Punishment", "Security and Defense"}, and {"Morality", "Quality of Life", "Cultural Identity"}. This observation is also corroborated by the empirical gain from the focal triplet loss (Table 2).

Experiment 1: Frame Prediction
For the supervised model, we report accuracy, as has been done in previous work, as well as Macro-F1, which is oblivious to class sizes, shedding light on performance across all frames. Table 1 reports the results. RoBERTa-S corresponds to the sentence-embedding based component of FRISS (Fig 1(b); left) without and with (+MLM) unsupervised pre-training. Overall, we can see that all our model variants outperform previous work in terms of both accuracy and macro-F1. Experiments were run 5 times with 10-fold cross-validation, and the results in Table 1 are averaged across runs.

[Figure 2 caption: FRISS frame prediction performance with different portions of the 41K unlabeled documents.]

To assess the contribution of individual model components, we performed an ablation study on the Focal Triplet Loss, the Gumbel regularization, and the impact of individual views. Table 2 shows that both the focal loss and the Gumbel regularization contribute to model performance. Training FRISS with any single view individually leads to a performance drop, which is most drastic if the two arguments are omitted, suggesting that the model relies on both predicate and argument information, with arguments playing a slightly more important role.

Experiment 2: Benefit of Unlabeled Data
Our semi-supervised model can leverage news articles without a frame label, in addition to a labeled training set. We investigated the impact of training FRISS with different amounts of additional news articles, taken from the unlabeled immigration portion of the MFC. Figure 2 shows the impact of additional unlabeled data on accuracy and F1: models with access to more unlabeled data tend to achieve higher accuracy and Macro-F1 scores. Given the abundance of online news articles, this motivates future work on minimally supervised frame prediction, minimizing the reliance on manual labels and maximizing generalizability to new issues, news outlets or languages.

Experiment 3: Qualitative Evaluation
In this experiment, we explore the added interpretability contributed by the local latent frame representations. Table 4 contains two MFC documents, highlighted with the most highly associated frame for each identified span for p, a_0 or a_1. We can observe that the frame associations (a) are intuitively meaningful; and (b) provide a detailed account of the predicted primary frame. For both documents the gold primary frame is 'Political'; the bottom document is classified correctly, whereas the top document is mis-classified as 'Capacity & Resources'. The detailed span-level predictions help to explain the model prediction, and in fact add support for the mis-prediction, suggesting that predicting a single primary document frame may be inappropriate. In the bottom document "a letter" serves as both a_1 of "Republicans sent a letter", where it is predicted as 'Political', and as a_0 of the clause "a letter [...] describing the legal challenges", where it is classified as 'Legality': another example of the nuance of our model predictions, which can support further in-depth study of issue-specific framing.

Table 3: Semantic role spans most closely associated with four frame descriptors in the latent space.

View | Capacity & Resources | Political | Legality | Public Sentiment
ARG0 | USCIS, state department, agency, federal official | Trump, house republican, Obama, democrat, senate | supreme court, justice, federal judge, court | organizer, activist, protester, demonstrator, marcher
PRED | process, handle, swamp, accommodate, wait, exceed | veto, defeat, vote, win, introduce, endorse, elect | sue, uphold, entitle, appeal, shall, violate, file | chant, march, protest, rally, wave, gather, organize
ARG1 | application, foreign worker, visa, applicant | amendment, reform, legislation, voter, senate bill | political asylum, asylum, lawsuit, suit, status, case | rally, marcher, march, protest, movement, crowd
The potential of our model for fine-grained frame analysis is illustrated in Table 4, which shows how each particular SRL span contributes differently towards the various frame categories. It paints a finer-grained framing picture, and provides an estimate of the trustworthiness of model predictions. It allows us to assess the main actors with respect to a particular frame (within and across articles), as well as the secondary frames in each article. Moreover, using SRL makes our model independent of human annotation, and more generalizable. Going beyond "highlighting indicative phrases", our model can distinguish the roles of such phrases (e.g., "ICE" as an actor vs. participant in a particular frame). Table 3 shows the semantic role spans which are most closely related to the Capacity & Resources (blue), Political (red), Legality (purple) and Public Sentiment (green) descriptors in the latent space. We can observe that all associated spans are intuitively relevant to their {frame, view}. Furthermore, ARG0 spans tend to correspond to active participants (agents) in the policy process (including politicians and government bodies), whereas ARG1 spans illustrate the affected participants (patients such as foreign workers, applicants), processes (reforms, cases, movements), or concepts under debate (political asylum). In future work, we aim to leverage these representations in scalable, in-depth analyses of issue-specific media framing. A full table illustrating the learnt descriptors for all 15 frames in the MFC and all three views is included in Table 6 in the Appendix.

[Article 1] BILL ON IMMIGRANT WORKERS_a1 DIES_p. Legislation_a0 to allow_p nearly twice as many computer-savvy foreigners and other high-skilled immigrants_a1 into the country next year apparently has died_p in Congress. The House_a0 passed_p the compromise measure_a1 last month, 288-133, but Sen. Tom Harkin, D-Iowa_a0, had blocked_p a vote_a1 when in the Senate. The proposal_a0,a1, backed_p by high-tech companies_a0, would raise_p the limit of so-called H-1B visas_a1 granted_p each year to skilled workers from abroad. Only 65,000 visas_a1 are now granted_p each year; the bill_a0 would raise_p the annual cap_a1 to 115,500 for the next two years and to 107,500 in 2001. The ceiling_a1 would return_p to 65,000 in 2002.

[Article 2] The Fix: Immigration all of a sudden a top campaign issue. The Obama administration's decision_a0 to move forward with a legal challenge to Arizona's stringent illegal immigration law will almost certainly elevate_p the issue on the campaign trail_a1 this fall. The Arizona measure_a1, which was signed_p into law by Gov. Jan Brewer (R)_a0 in April, is a major political touchstone, of prime importance to Hispanics, the fastest growing_p demographic group_a1 in the country and a coveted electoral prize for both parties. Democratic strategists_a0 see_p the Arizona law_a1 as a key moment in the ongoing battle to win_p the loyalty of Hispanic voters_a1. They_a0 believe_p that it_a1 will have a similar chilling effect for Republicans with Latinos as the passage of California's Proposition 187 did in the 1990s. Republicans_a0, on the other hand, believe_p that Democrats are badly out of step with the American people on the immigration issue_a1. They_a0 cite_p the Obama administration's aggressive approach_a1 to fighting_p the Arizona law_a1 as yet more evidence of that out-of-touchness. In that vein, nearly two dozen House Republicans_a0 sent_p a letter_a1,a0 to Attorney General Eric Holder on Tuesday describing_p the legal challenge_a1 as the "height of irresponsibility and arrogance." Polling_p on the Arizona law_a1 specifically falls_p in Republicans' favor, although broader data_a0 suggests_p a public_a1 deeply divided_p on immigration. In the latest Washington Post/ABC poll, 58 percent_a0 expressed_p support for the Arizona law_a1 (including_p 42 percent who were strongly supportive_a1) while 41 percent_a0 opposed_p it_a1.

Table 4: Two articles from the MFC, annotated with SRL span-level frame predictions generated by FRISS. The true frame label of both articles is Political (red). Each detected span (p, a_0 or a_1) has been highlighted with its most closely associated frame: Capacity & Resources (blue), Political (red), Legality (purple) or Public Sentiment (green); darker shades indicate higher confidence. The top document is mis-classified as "Capacity & Resources" (blue); the bottom document is classified correctly.

Conclusion
We presented FRISS, an interpretable model of media frame prediction, incorporating notions of emphasis framing (selective highlighting of issue aspects) and story framing (drawing on the events and actors described in an article). Our semi-supervised model predicts the article-level frame of news articles, leveraging local predicate and argument level embeddings. We demonstrated its three-fold advantage: first, our model empirically outperforms existing models for frame classification; second, it can effectively leverage additional unlabeled data, further improving performance; and, finally, its latent representations add transparency to classifier predictions and provide a nuanced article representation. The analyses provided by our model can support downstream applications such as automatic, yet transparent, highlighting of reporting patterns across countries or news outlets; or frame-guided summarization supporting both frame-balanced and frame-specific news summaries. In future work, we plan to extend our work to more diverse news outlets and policy issues, and explore richer latent models of article content, including graph representations over all involved events and actors.

Frame description | % IMM
External Regulation and Reputation: international reputation or foreign policy of the U.S. | 2.2%
Other: any coherent group of frames not covered by the above categories | 0.2%

Table 5: Framing dimensions from Boydstun et al. (2013). The final column (% IMM) denotes the frame prevalence in the Immigration portion of the MFC used in the experiments reported in this paper.