Connotation Frames: A Data-Driven Investigation

Through a particular choice of a predicate (e.g., “x violated y”), a writer can subtly connote a range of implied sentiments and presupposed facts about the entities x and y: (1) writer’s perspective: projecting x as an “antagonist” and y as a “victim”, (2) entities’ perspective: y probably dislikes x, (3) effect: something bad happened to y, (4) value: y is something valuable, and (5) mental state: y is distressed by the event. We introduce connotation frames as a representation formalism to organize these rich dimensions of connotation using typed relations. First, we investigate the feasibility of obtaining connotative labels through crowdsourcing experiments. We then present models for predicting the connotation frames of verb predicates based on their distributional word representations and the interplay between different types of connotative relations. Empirical results confirm that connotation frames can be induced from various data sources that reflect


Introduction
People commonly express their opinions through subtle and nuanced language (Thomas et al., 2006; Somasundaran and Wiebe, 2010). Often, through seemingly objective statements, the writer can influence the readers' judgments toward an event and its participants. Even by choosing a particular predicate, the writer can indicate rich connotative information about the entities that interact through the predicate. More specifically, through a simple statement such as "x violated y", the writer can convey:
(1) writer's perspective: the writer is projecting x as an "antagonist" and y as a "victim", eliciting a negative perspective from readers toward x (i.e., blaming x) and a positive perspective toward y (i.e., sympathetic or supportive toward y).
(2) entities' perspective: y most likely feels negatively toward x as a result of being violated.
(3) effect: something bad happened to y.
(4) value: y is something valuable, since it does not make sense to violate something worthless. In other words, the writer is presupposing a positive value of y as a fact.
(5) mental state: y is likely distressed by the event, assuming y is an entity that can have a mental state.

Table 1: Example typed relations (perspective P(x → y), effect E(x), value V(x), and mental state S(x)). Not all typed relations are shown due to space constraints. The example sentences demonstrate the usage of the predicates in left [L] or right [R] leaning news sources. Example sentence for uphold: "A hearing is scheduled to make a decision on whether to uphold the clinic's suspension."

Even though the writer might not explicitly state any of the interpretations (1)-(5) above, readers will be able to infer these intentions as a part of their comprehension. In this paper, we present an empirical study of how to represent and induce the connotative interpretations that can be drawn from a verb predicate, as illustrated above.
We introduce connotation frames as a representation framework to organize the rich dimensions of the implied sentiment and presupposed facts. Figure 1 shows an example of a connotation frame for the predicate violate. We define four different typed relations: P(x → y) for perspective of x towards y, E(x) for effect on x, V(x) for value of x, and S(x) for mental state of x. These relationships can all be either positive (+), neutral (=), or negative (-).
Our work is the first study to investigate frames as a representation formalism for connotative meanings. This contrasts with previous computational studies and resource development for frame semantics, where the primary focus was almost exclusively on denotational meanings of language (Baker et al., 1998; Palmer et al., 2005). Our formalism nevertheless draws inspiration from the earlier work on frame semantics, in that we investigate the connection between a word and the related world knowledge associated with the word (Fillmore, 1976), which is essential for readers to interpret the many layers of implied sentiment and presupposed value judgments.
We also build upon the extensive literature in sentiment analysis (Pang and Lee, 2008; Liu and Zhang, 2012), especially recent efforts on implied sentiment analysis (Feng et al., 2013; Greene and Resnik, 2009), entity-entity sentiment inference, opinion role induction (Wiegand and Ruppenhofer, 2015), and effect analysis (Choi and Wiebe, 2014). However, our work is the first to organize various aspects of the connotative information into coherent frames.
More concretely, our contributions are threefold: (1) a new formalism, model, and annotated dataset for studying connotation frames from large-scale natural language data and statistics, (2) new data-driven insights into the dynamics among different typed relations within each frame, and (3) an analytic study showing the potential use of connotation frames for analyzing subtle biases in journalism.
The rest of the paper is organized as follows: in §2, we provide the definitions and data-driven insights for connotation frames. In §3, we introduce models for inducing the connotation frames, followed by empirical results, annotation studies, and analysis on news media in §4. We discuss related work in §5 and conclude in §6.

Connotation Frame
Given a predicate v, we define a connotation frame F(v) as a collection of typed relations and their polarity assignments: (i) perspective P_v(a_i → a_j): a directed sentiment from the entity a_i to the entity a_j, (ii) value V_v(a_i): whether a_i is presupposed to be valuable, (iii) effect E_v(a_i): whether the event denoted by the predicate v is good or bad for the entity a_i, and (iv) mental state S_v(a_i): the likely mental state of the entity a_i as a result of the event. We assume that each typed relation can have one of three connotative polarities ∈ {+, −, =}, i.e., positive, negative, or neutral. Our goal in this paper is to focus on the general connotation of the predicate considered out of context. We leave contextual interpretation of connotation as future work.

Table 1 shows example typed relations for the verbs suffer, guard, and uphold, along with example sentences. For instance, for the verb suffer, the writer is likely to have a positive perspective towards the agent (e.g., being supportive or sympathetic toward the "17-year-old girl" in the accompanying example sentence) and a negative perspective towards the theme (e.g., being negative towards "botched abortion").

Table 2: Media Bias in Connotation Frames: Obama, for example, is portrayed as someone who attacks or criticizes others by the right-leaning sources, whereas the left-leaning sources portray Obama as the victim of harsh acts like "attack" and "criticize".
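As an illustration of the definition above, a connotation frame can be held in a small record that maps each typed relation to one of the three polarities. The sketch below is minimal and hypothetical: the class name `ConnotationFrame`, its methods, and the relation keys are assumptions made for illustration, and only the two writer-perspective polarities for suffer come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict

POLARITIES = ("+", "-", "=")  # positive, negative, neutral


@dataclass
class ConnotationFrame:
    """Hypothetical container for the typed relations of one predicate."""
    verb: str
    relations: Dict[str, str] = field(default_factory=dict)

    def set(self, relation: str, polarity: str) -> None:
        # Reject anything outside the three connotative polarities.
        if polarity not in POLARITIES:
            raise ValueError(f"unknown polarity: {polarity!r}")
        self.relations[relation] = polarity

    def get(self, relation: str) -> str:
        # Relations that were never assigned are treated as neutral here
        # (an assumption of this sketch, not a claim about the lexicon).
        return self.relations.get(relation, "=")


# Writer's perspective for "suffer", as described in the text.
suffer = ConnotationFrame("suffer")
suffer.set("P(w->agent)", "+")
suffer.set("P(w->theme)", "-")
```

Keeping the relations in a dictionary rather than fixed fields makes it easy to store only the relations that are known for a verb.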

Data-driven Motivation
Since the meaning of language is ultimately contextual, the exact connotation will vary depending on the context of each utterance. Nonetheless, there still are common shifts or biases in the connotative polarities, as we found from two data-driven analyses. First, we looked at words from the Subjectivity Lexicon that are used in the argument positions of a small selection of predicates in Google Syntactic N-grams (Goldberg and Orwant, 2013). For this analysis, we assumed that the word in the subject position is the agent while the object is the theme. We found 64% of the words in the agent position of suffer are positive, and 94% of the words in the theme position are negative, which is consistent with the polarities of the writer's perspective towards these arguments, as shown in Table 1. For guard, 57% of the subjects and 76% of the objects are positive, and in the case of uphold, 56% of the subjects and 72% of the objects are positive.
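The percentages above come from counting the polarity of lexicon words that fill a verb's argument positions. A minimal sketch of that kind of count follows; the function name `polarity_share`, the toy tuples, and the toy lexicon are all hypothetical, and weighting each word by its tuple count is an assumption (the text does not say whether words were counted by type or by token).

```python
from collections import Counter


def polarity_share(tuples, lexicon, verb, position):
    """Share of lexicon-covered argument words per polarity.

    tuples: iterable of (subject, verb, object, count)
    lexicon: dict mapping a word to "positive" or "negative"
    position: "subject" (assumed agent) or "object" (assumed theme)
    """
    idx = {"subject": 0, "object": 2}[position]
    totals = Counter()
    for row in tuples:
        if row[1] != verb:
            continue
        polarity = lexicon.get(row[idx])
        if polarity is not None:  # skip words the lexicon does not cover
            totals[polarity] += row[3]
    covered = sum(totals.values())
    return {p: n / covered for p, n in totals.items()} if covered else {}


# Toy data for illustration only; not taken from the paper's corpora.
toy_tuples = [
    ("hero", "suffer", "injury", 3),
    ("villain", "suffer", "loss", 1),
    ("stranger", "suffer", "triumph", 1),
]
toy_lexicon = {
    "hero": "positive", "villain": "negative",
    "injury": "negative", "loss": "negative", "triumph": "positive",
}

subject_share = polarity_share(toy_tuples, toy_lexicon, "suffer", "subject")
object_share = polarity_share(toy_tuples, toy_lexicon, "suffer", "object")
```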
We also investigated how media bias can potentially be analyzed through connotation frames. From the Stream Corpus 2014 dataset (KBA, 2014), we selected all articles from news outlets with known political biases, and compared how they use polarized words such as "accuse", "attack", and "criticize" differently in light of the P(w → agent) and P(w → theme) relations of the connotation frames. Table 2 shows interesting contrasts. Obama, for example, is portrayed as someone who attacks or criticizes others according to the right-leaning sources, whereas the left-leaning sources portray Obama as the victim of harsh acts like "attack" or "criticize". Furthermore, by knowing the perspective relationships P(w → a_i) associated with a predicate, we can make predictions about how the left-leaning and right-leaning sources feel about specific people or issues. For example, because left-leaning sources frequently use McCain, Trump, and Limbaugh in the subject position of attack, we might predict that these sources have a negative sentiment towards these entities.

Dynamics between Typed Relations
Given a predicate, the polarity assignments of typed relations are interdependent. For example, if the writer feels positively towards the agent but negatively towards the theme, then it is likely that the agent and the theme do not feel positively towards each other. This insight is related to prior work, but differs in that the polarities are predicate-specific and do not rely on knowledge of prior sentiment towards the arguments. This and other possible interdependencies are summarized in Table 3. These interdependencies serve as general guidelines of what properties we expect to depend on one another, especially in the case where the polarities are non-neutral. We will promote these internal consistencies in our factor graph model (§3) as soft constraints.
Table 3: Potential Dynamics among Typed Relations: we propose models that parameterize these dynamics using log-linear models (frame-level model in §3).
Perspective Triad: If A is positive towards B, and B is positive towards C, then we expect A is also positive towards C. Similar dynamics hold for the negative case.
Perspective-Effect: If a predicate has a positive effect on the Subject, then we expect that the interaction between the Subject and Object was positive. Similar dynamics hold for the negative case and for other perspective relations. $E_{a_1} = P_{a_2 \to a_1}$
Perspective-Value: If A is presupposed as valuable, then we expect that the writer also views A positively. Similar dynamics hold for the negative case.
Effect-Mental State: If the predicate has a positive effect on A, then we expect that A will gain a positive mental state. Similar dynamics hold for the negative case. $S_{a_1} = E_{a_1}$

There also exist other interdependencies that we will use to simplify our task. First, the directed sentiments between the agent and the theme are likely to be reciprocal, or at least do not directly conflict with + and − simultaneously. Therefore, we assume that P(a_1 → a_2) = P(a_2 → a_1) = P(a_1 ↔ a_2), and we only measure these binary relationships in one direction. In addition, we assume the predicted perspective (see footnote 4) from the reader r to an argument, P(r → a), is likely to be the same as the implied perspective from the writer w to the same argument, P(w → a). So, we only try to learn the perspective of the writer. Lifting these assumptions will be future work.
For simplicity, our model only explores the polarities involving the agent and the theme roles. We will assume that these roles are correlated to the subject and object positions, and henceforth refer to them as the "Subject" and "Object" of the event.
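The dynamics in Table 3 can be read as consistency checks over a single frame. The sketch below is one hedged reading, not the paper's model (which treats these as soft constraints): the function name and relation keys are hypothetical, a dynamic is only checked when the polarities involved are non-neutral, and the perspective triad is interpreted as a sign-product rule, which is an assumption about what "similar dynamics hold for the negative case" means. The example polarities for violate follow the abstract, with unknown subject-side relations filled in as neutral placeholders.

```python
SIGN = {"+": 1, "=": 0, "-": -1}


def violated_dynamics(frame):
    """Return the names of Table 3 dynamics that a frame contradicts."""
    s = {name: SIGN[pol] for name, pol in frame.items()}
    violations = []

    # Perspective triad, read as: P(w->o) should equal P(w->s) * P(s->o)
    # whenever all three are non-neutral (assumed interpretation).
    triad = (s["P(w->s)"], s["P(s->o)"], s["P(w->o)"])
    if all(triad) and triad[0] * triad[1] != triad[2]:
        violations.append("perspective triad")

    for role in ("s", "o"):
        checks = [
            # Reciprocity assumption: P(s->o) stands in for both directions.
            ("perspective-effect", s["P(s->o)"], s[f"E({role})"]),
            ("perspective-value", s[f"P(w->{role})"], s[f"V({role})"]),
            ("effect-mental state", s[f"E({role})"], s[f"S({role})"]),
        ]
        for name, a, b in checks:
            if a * b < 0:  # opposite non-neutral polarities
                violations.append(f"{name} ({role})")
    return violations


violate = {
    "P(w->s)": "-", "P(w->o)": "+", "P(s->o)": "-",
    "E(s)": "=", "E(o)": "-",
    "V(s)": "=", "V(o)": "+",
    "S(s)": "=", "S(o)": "-",
}
inconsistent = dict(violate, **{"S(o)": "+"})
```

Here `violated_dynamics(violate)` returns an empty list, while flipping the object's mental state to positive is flagged as an effect-mental state conflict.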

Modeling Connotation Frames
Our task is essentially that of lexicon induction (Akkaya et al., 2009; Feng et al., 2013) in that we want to induce the connotation frames of previously unseen verbs. For each predicate, we infer a connotation frame composed of 9 relationship aspects that represent perspective (P(w → o), P(w → s), P(s → o)), effect (E(o), E(s)), value (V(o), V(s)), and mental state (S(o), S(s)) polarities. We propose two models: an aspect-level model that makes the prediction for each typed relation independently based on the distributional representation of the context in which the predicate appears (§3.1), and a frame-level model that makes the prediction over the connotation frame collectively, taking into consideration the dynamics between typed relations (§3.2).

Footnote 4: Surely different readers can and will form varying opinions after reading the same text. Here we are concerned with the most likely perspective of the general audience, as a result of reading the text.

Figure 2: A factor graph for predicting the polarities of the typed relations that define a connotation frame for a given verb predicate. The factor graph also includes unary factors (ψ_emb), which we left out for brevity.

Aspect-Level
Our aspect-level model predicts labels for each of these typed relations separately. As input, we use the 300-dimensional dependency-based word embeddings from Levy and Goldberg (2014). For each aspect, a separate MaxEnt (maximum entropy) classifier predicts the label of that aspect from the verb's word embedding, which is treated as a 300-dimensional input vector to the classifier. The MaxEnt classifiers learn their weights using L-BFGS on the training examples, with re-weighting of samples to maximize the average F1 score.
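To make the aspect-level setup concrete, the sketch below trains one three-way MaxEnt (softmax regression) classifier for a single aspect. It is a simplified stand-in under stated assumptions: plain batch gradient descent replaces L-BFGS, 2-dimensional toy vectors replace the 300-dimensional embeddings, and inverse-frequency class weights are one possible reading of "re-weighting of samples". All names and data are hypothetical.

```python
import math
from collections import Counter

LABELS = ["-", "=", "+"]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def train_maxent(X, y, class_weight, epochs=200, lr=0.5):
    """Weighted softmax regression trained by batch gradient descent."""
    dim = len(X[0])
    W = [[0.0] * (dim + 1) for _ in LABELS]  # last column is the bias
    for _ in range(epochs):
        grad = [[0.0] * (dim + 1) for _ in LABELS]
        for x, label in zip(X, y):
            feats = list(x) + [1.0]
            probs = softmax([sum(w * f for w, f in zip(row, feats)) for row in W])
            for k, lab in enumerate(LABELS):
                err = probs[k] - (1.0 if lab == label else 0.0)
                for j, f in enumerate(feats):
                    grad[k][j] += class_weight[label] * err * f
        for k in range(len(LABELS)):
            for j in range(dim + 1):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W


def predict(W, x):
    feats = list(x) + [1.0]
    scores = [sum(w * f for w, f in zip(row, feats)) for row in W]
    return LABELS[scores.index(max(scores))]


# Toy "embeddings" for one aspect; a real run would use one vector per verb.
X = [(1.0, 0.1), (0.9, -0.1), (-1.0, 0.0), (-0.8, 0.2), (0.0, 1.0), (0.1, 0.9)]
y = ["+", "+", "-", "-", "=", "="]
counts = Counter(y)
class_weight = {c: len(y) / (len(counts) * n) for c, n in counts.items()}
W = train_maxent(X, y, class_weight)
```

One such classifier would be trained per aspect, so a full aspect-level model is nine independent copies of this procedure.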

Frame-Level
Next we present a factor graph model (Figure 2) of the connotation frames that parameterizes the dynamics between typed relations. Specifically, for each verb predicate (see footnote 5), the factor graph contains 9 nodes representing the different aspects of the connotation frame. All these variables take polarity values from the set {−, =, +}. We define $Y_i := \{P_{wo}, P_{ws}, P_{so}, E_o, E_s, V_o, V_s, S_o, S_s\}$ as the set of relational aspects for the i-th verb. The factor graph for $Y_i$ is illustrated in Figure 2, and we will describe the factor potentials in more detail in the rest of this section. The probability of an assignment of polarities to the nodes in $Y_i$ is proportional to the product of the factor potentials described in this section.

Embedding Factors We include unary factors on all nodes to represent the results of the aspect-level classifier. Incorporating this knowledge as factors, as opposed to fixing the variables as observed, affords us the flexibility of representing noise in the labels as soft evidence. The potential function $\psi_{emb}$ is a log-linear function of a feature vector f, which is a one-hot feature vector representing the polarity of a node (+, −, or =). For example, the potential for the node representing the value of the object ($V_o$) is the exponential of the dot product between a learned weight vector and $f(V_o)$. The potential $\psi_{emb}$ is defined similarly for the remaining 8 nodes. All weights were learned using stochastic gradient descent (SGD) over training data.
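Read together with the interdependency potentials defined in the next subsection, the joint distribution over $Y_i$ can be sketched as a normalized product of factors. The form below is an assumption-laden reconstruction rather than a quotation: it assumes that the object-side factors mirror the subject-side ones, and $Z_i$ is a normalizing constant introduced here for illustration.

```latex
P(Y_i) = \frac{1}{Z_i}\,
  \psi_{PT}(P_{wo}, P_{ws}, P_{so})\,
  \psi_{PV}(P_{ws}, V_s)\, \psi_{PV}(P_{wo}, V_o)\,
  \psi_{PE}(P_{so}, E_s)\, \psi_{PE}(P_{so}, E_o)\,
  \psi_{ES}(E_s, S_s)\, \psi_{ES}(E_o, S_o)
  \prod_{y \in Y_i} \psi_{emb}(y)
```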

Interdependency Factors
We include interdependency factors to promote the properties defined by the dynamics between relations (§2.2). The potentials for the Perspective Triad, Perspective-Value, Perspective-Effect, and Effect-State relationships ($\psi_{PT}$, $\psi_{PV}$, $\psi_{PE}$, and $\psi_{ES}$, respectively) are all defined using log-linear functions of one-hot feature vectors that encode the combination of polarities of the neighboring nodes. The potential $\psi_{PT}$ is therefore:

$$\psi_{PT}(P_{wo}, P_{ws}, P_{so}) = e^{w_{PT} \cdot f(P_{wo}, P_{ws}, P_{so})}$$

and we define the potentials $\psi_{PV}$, $\psi_{PE}$, and $\psi_{ES}$ for the subject nodes as:

$$\psi_{PV}(P_{ws}, V_s) = e^{w_{PV,s} \cdot f(P_{ws}, V_s)}$$
$$\psi_{PE}(P_{so}, E_s) = e^{w_{PE,s} \cdot f(P_{so}, E_s)}$$
$$\psi_{ES}(E_s, S_s) = e^{w_{ES,s} \cdot f(E_s, S_s)}$$

We define the potentials for the object nodes similarly. As with the unary factors, weights were learned using SGD over training data.

Footnote 5: We consider only verb predicates here.

Belief Propagation We use belief propagation to induce the connotation frames of previously unseen verbs. In the belief propagation algorithm, messages are iteratively passed from the nodes to their neighboring factors and vice versa. Each message µ, containing a scalar for each value x ∈ {−, =, +}, is defined from each node v to a neighboring factor a as:

$$\mu_{v \to a}(x) = \prod_{a^* \in N(v) \setminus a} \mu_{a^* \to v}(x)$$

and from each factor a to a neighboring node v as:

$$\mu_{a \to v}(x) = \sum_{x' : x'_v = x} \psi_a(x') \prod_{v^* \in N(a) \setminus v} \mu_{v^* \to a}(x'_{v^*})$$

At the conclusion of message passing, the probability of a specific polarity associated with node v being equal to x is proportional to $\prod_{a \in N(v)} \mu_{a \to v}(x)$. Our factor graph does not contain any loops, so we are able to perform exact inference.
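Because the graph has only 9 ternary nodes, the marginals that belief propagation computes can also be obtained by enumerating all 3^9 = 19,683 assignments. The sketch below does exactly that instead of passing messages; on a loop-free graph both give the same exact marginals. The log-potentials `agree` and `triad` are hand-written, hypothetical stand-ins for the learned log-linear weights, and pairing the object-side factors with P_so and P_wo is an assumption that mirrors the subject-side definitions.

```python
import itertools
import math

POLARITIES = ("-", "=", "+")
NODES = ("P_wo", "P_ws", "P_so", "E_o", "E_s", "V_o", "V_s", "S_o", "S_s")
SIGN = {"-": -1, "=": 0, "+": 1}


def agree(a, b):
    # Hypothetical pairwise log-potential: reward matching non-neutral
    # polarities and penalize opposite ones.
    return 1.0 * SIGN[a] * SIGN[b]


def triad(p_wo, p_ws, p_so):
    # Hypothetical triad log-potential: positive when P_wo = P_ws * P_so.
    return 1.0 * SIGN[p_wo] * SIGN[p_ws] * SIGN[p_so]


def log_score(y, unary):
    total = sum(unary[n][y[n]] for n in NODES)  # embedding (unary) factors
    total += triad(y["P_wo"], y["P_ws"], y["P_so"])
    for r in ("s", "o"):
        total += agree(y[f"P_w{r}"], y[f"V_{r}"])  # perspective-value
        total += agree(y["P_so"], y[f"E_{r}"])     # perspective-effect
        total += agree(y[f"E_{r}"], y[f"S_{r}"])   # effect-mental state
    return total


def marginals(unary):
    """Exact node marginals by summing over every joint assignment."""
    weights = {n: dict.fromkeys(POLARITIES, 0.0) for n in NODES}
    z = 0.0
    for values in itertools.product(POLARITIES, repeat=len(NODES)):
        y = dict(zip(NODES, values))
        w = math.exp(log_score(y, unary))
        z += w
        for n in NODES:
            weights[n][y[n]] += w
    return {n: {p: w / z for p, w in weights[n].items()} for n in NODES}


# Soft evidence that the effect on the object is negative; all else flat.
unary = {n: dict.fromkeys(POLARITIES, 0.0) for n in NODES}
unary["E_o"]["-"] = 2.0
m = marginals(unary)
```

With this evidence, the marginal for the object's mental state shifts toward negative as well, which is the kind of correction the interdependency factors are meant to provide.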

Experiments
We first describe crowd-sourced annotations ( §4.1), then present the empirical results of predicting connotation frames ( §4.2), and conclude with qualitative analysis of a large corpus ( §4.3).

Data and Crowdsourcing
In order to understand how humans interpret connotation frames, we designed an Amazon Mechanical Turk (AMT) annotation study. We gathered a set of transitive verbs commonly used in the New York Times corpus (Sandhaus, 2008), selecting the 2400 verbs that are used more than 200 times in the corpus. Of these, AMT workers annotated the 1000 most frequently used verbs.

Annotation Design In a pilot annotation experiment, we found that annotators have difficulty thinking about subtle connotative polarities when shown predicates without any context. Therefore, we designed the AMT task to provide a generic context as follows. We first split each verb predicate into 5 separate tasks that each gave workers a different generic sentence using the verb. To create generic sentences, we used Google Syntactic N-grams (Goldberg and Orwant, 2013) to come up with a frequently seen Subject-Verb-Object tuple, which served as a simple three-word sentence with generic arguments. For each of the 5 sentences, we asked 3 annotators to answer questions like "How do you think the Subject feels about the event described in this sentence?" In total, each verb has 15 annotations aggregated over 5 different generic sentences containing the verb.
In order to help the annotators, some of the questions also allowed annotators to choose sentiment using additional classes for "positive or neutral" or "negative or neutral" for when they were less confident but still felt that a sentiment might exist. When computing inter-annotator agreement, we count "positive or neutral" as agreeing with either the "positive" or the "neutral" class.

Annotator agreement Table 4 shows agreements and data statistics. The non-conflicting (NC) agreement only counts opposite polarities as disagreement (see footnote 6). From this study, we can see that non-expert annotators are able to see these sorts of relationships based on their understanding of how language is used. From the NC agreement, we see that annotators do not frequently choose completely opposite polarities, indicating that even when they disagree, their disagreements are based on the degree of connotation rather than the polarity itself. The average Krippendorff alpha for all of the questions posed to the workers is 0.25, indicating stronger than random agreement. Considering the subtlety of the implicit sentiments that we are asking them to annotate, it is reasonable that some annotators will pick up on more nuances than others. Overall, the percent agreement is encouraging evidence that the connotative relationships are visible to human annotators.
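The two agreement measures can be illustrated with a small pairwise computation. This is a sketch under assumptions: each answer is modeled as the set of polarities it admits, strict agreement means the two sets overlap, and non-conflicting agreement means some reading of the two answers avoids a direct +/− clash. The function names are hypothetical, and the exact counting used for Table 4 may differ.

```python
import itertools

# Each answer choice, expressed as the set of polarities it admits.
OPTIONS = {
    "positive": {"+"},
    "positive or neutral": {"+", "="},
    "neutral": {"="},
    "negative or neutral": {"-", "="},
    "negative": {"-"},
}


def strict_agree(a, b):
    # "positive or neutral" agrees with "positive" and with "neutral".
    return bool(OPTIONS[a] & OPTIONS[b])


def nc_agree(a, b):
    # Non-conflicting: at least one pairing of admitted polarities is not
    # a direct positive-versus-negative clash.
    return any({p, q} != {"+", "-"} for p in OPTIONS[a] for q in OPTIONS[b])


def pairwise_rate(labels, agree):
    """Fraction of annotator pairs that agree under the given criterion."""
    pairs = list(itertools.combinations(labels, 2))
    return sum(agree(a, b) for a, b in pairs) / len(pairs)


answers = ["positive", "positive or neutral", "negative"]
strict_rate = pairwise_rate(answers, strict_agree)
nc_rate = pairwise_rate(answers, nc_agree)
```

For these three answers, one of the three pairs agrees strictly and two of the three are non-conflicting.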

Aggregating Annotations
We aggregated over crowdsourced labels (fifteen annotations per verb) to create a polarity label for each aspect of a verb (see footnote 7). Final distributions of the aggregated labels are included in the right-hand columns of Table 4. Notably, the distributions are skewed toward positive and neutral labels. The most skewed connotation frame aspect is the value V(x), which tends to be positive, especially for the subject argument. This makes some intuitive sense since, as the subject actively causes the predicate event to occur, they most likely have some intrinsic potential to be valuable. An example of a verb where the subject was labelled as not valuable is "contaminate". In the most generic case, the writer is using contaminate to frame the subject as being worthless (and even harmful) with regard to the other event participants. For example, in the sentence "his touch contaminated the food," it is clear that the writer considers "his touch" to be of negative value in the context of how it impacts the rest of the event.

Footnote 6: Annotators were asked yes/no questions related to Value, so this does not have a corresponding NC agreement score.
Footnote 7: We take the average to obtain a scalar value in [−1, 1] for each aspect of a verb's connotation frame. For simplicity, we cut off the ranges of negative, neutral, and positive polarities as [−1, −0.25), [−0.25, 0.25], and (0.25, 1], respectively.

Table 4: ... inter-annotator agreement. The strict agreement counts agreement over 3 classes ("positive or neutral" was counted as agreeing with either + or neutral), while non-conflicting (NC) agreement also allows agreements between neutral and -/+ (no direct conflicts). Distribution shows the final class distribution of -/+ labels created by averaging annotations.
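The aggregation step can be sketched as an average followed by a threshold. The cutoff of 0.25 comes from the footnote on aggregation; the positive range is assumed to mirror the negative one, and the numeric values given to the hedged answers ("positive or neutral" as 0.5, "negative or neutral" as −0.5) are assumptions made for this illustration, as is the function name.

```python
# Assumed numeric coding of the answer choices.
VALUE = {
    "positive": 1.0,
    "positive or neutral": 0.5,
    "neutral": 0.0,
    "negative or neutral": -0.5,
    "negative": -1.0,
}


def aggregate(annotations, cutoff=0.25):
    """Average the annotations and map the mean to a polarity label."""
    score = sum(VALUE[a] for a in annotations) / len(annotations)
    if score < -cutoff:
        return "-", score
    if score > cutoff:
        return "+", score
    return "=", score  # scores within [-cutoff, cutoff] count as neutral


# Fifteen toy annotations for one aspect of one verb.
mostly_positive = ["positive"] * 9 + ["neutral"] * 4 + ["negative"] * 2
mostly_neutral = ["positive"] * 3 + ["neutral"] * 10 + ["negative"] * 2
label_a, score_a = aggregate(mostly_positive)
label_b, score_b = aggregate(mostly_neutral)
```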

Connotation Frame Prediction
Using our crowdsourced labels, we randomly divided the annotated verbs into training, dev, and held-out test sets of equal size (300 verbs each). For evaluation we measured average accuracy and F1 score over the 9 different Connotation Frame relationship types for which we have annotations: P(w → o), P(w → s), P(s → o), V(o), V(s), E(o), E(s), S(o), and S(s).
Baselines To show the non-trivial challenge of learning Connotation Frames, we include a simple majority-class baseline. The MAJORITY classifier assigns each of the 9 relationships the majority label of that relationship type in the training data. Some of these relationships (in particular, the Value of subject/object) have skewed distributions, so we expect this classifier to achieve a much higher accuracy than random but a much lower overall F1 score.
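The gap between accuracy and F1 for a majority-class predictor on skewed labels is easy to see on toy data. The sketch below uses hypothetical labels and assumes that "average F1" means an unweighted (macro) average over the three classes, which the text does not state explicitly.

```python
from collections import Counter


def majority_label(train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(gold, pred):
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred, classes=("-", "=", "+")):
    scores = []
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no TPs.
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(classes)


# Skewed toy labels for one relationship type.
train = ["+"] * 8 + ["="] + ["-"]
gold = ["+"] * 8 + ["="] + ["-"]
pred = [majority_label(train)] * len(gold)

acc = accuracy(gold, pred)
f1 = macro_f1(gold, pred)
```

On this toy split the baseline reaches 80% accuracy while its macro F1 stays below 0.3, because two of the three classes are never predicted.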
Additionally, we add a GRAPH PROP baseline that is comparable to algorithms like graph propagation or label propagation which are often used for (sentiment) lexicon induction. We use a factor graph with nodes representing the polarity of each typed relation for each verb. Binary factors connect nodes representing a particular type of relation for two similar verbs (e.g. P(w → o) for verbs persuade and convince). These binary factors have hand-tuned potentials that are proportional to the cosine similarity of the verbs' embeddings, encouraging similar verbs to have the same polarity for the various relational aspects. We use words in the training data as the seed set and use loopy belief propagation to propagate polarities from known nodes to the unknown relationships.
Finally, we use a 3-NEAREST NEIGHBOR baseline that labels relationships for a verb based on the predicate's 300-dimensional word embedding representation, using the same embeddings as in our aspect-level model. 3-NEAREST NEIGHBOR labels each verb using the polarities of the three closest verbs found in the training set. The most similar verbs are determined using the cosine similarity between word embeddings.
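A nearest-neighbor baseline of this kind can be sketched in a few lines. The version below is illustrative only: it uses 2-dimensional toy vectors rather than 300-dimensional embeddings, the function names are hypothetical, and resolving a three-way tie in favor of the closest neighbor is an assumption, since the text does not describe tie-breaking.

```python
import math
from collections import Counter


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_label(query, train, k=3):
    """Label a verb by majority vote over its k most similar training verbs.

    train: list of (embedding, polarity) pairs for one relationship type.
    """
    nearest = sorted(train, key=lambda item: cosine(query, item[0]),
                     reverse=True)[:k]
    votes = Counter(label for _, label in nearest)
    # most_common keeps first-seen order among ties, so the closest
    # neighbor's label wins a tie.
    return votes.most_common(1)[0][0]


toy_train = [
    ((1.0, 0.0), "+"), ((0.9, 0.1), "+"),
    ((0.0, 1.0), "-"), ((0.1, 0.9), "-"),
    ((-1.0, 0.0), "="),
]
predicted = knn_label((0.8, 0.2), toy_train)
```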
Results As shown in Table 5, the aspect-level and frame-level models consistently outperform all three baselines (MAJORITY, 3-NN, and GRAPH PROP) on the development set across the different types of relationships. In particular, the improved F1 scores show that these models are able to perform better across all three classes of labels even in the most skewed cases. The frame-level model also frequently improves the F1 scores of the labels over those of the aspect-level model. The summarized comparison of the classifiers' performance on the test set is shown in Table 6. As with the development set, the aspect-level and frame-level models both outperform the baselines. Furthermore, the frame-level formulation improves over the results of the aspect-level classification, indicating that modeling the interdependencies between relationships did help correct some of the mistakes made.
One point of interest about the frame-level results is whether the learned weights over the consistency factors match our initial intuitions about interdependencies between relationships. The weights learned in our algorithm do tell us something interesting about the degree to which these interdependencies are actually found in our data. We show the heat maps for some of the learned weights in Figure 3. In 3a, we show the weights of one of the embedding factors, and how the polarities are more strongly weighted when they match the relation-level output. In the rest of the figure, we show the weights for the other perspective relationships when P(w → o) is negative (3b), neutral (3c), and positive (3d), respectively. Based on the expected interdependencies, when P(w → o) : −, the model should favor P(w → s) ≠ P(s → o), and when P(w → o) : +, the model should favor P(w → s) = P(s → o). Our model does, in fact, learn a similar trend, with slightly higher weights along these two diagonals in the maps 3b and 3d. Interestingly, when P(w → o) is neutral, the weights slightly prefer the other two perspectives to resemble one another, with the highest weights occurring when the other perspectives are also neutral.

Analysis of a Large News Corpus
Using connotation frames, we measure implied sentiment in online journalism.
Data From the Stream Corpus (KBA, 2014), we select 70 million news articles. We extract subject-verb-object relations for this subset using the direct dependencies between noun phrases and verbs as identified by the BBN Serif system, obtaining 1.2 billion unique tuples of the form (url, subject, verb, object, count). We also extracted subject-verb-object tuples from news articles found in the Annotated English Gigaword Corpus (Napoles et al., 2012), which contains nearly 10 million articles. From the Gigaword corpus we extracted a further 120 million unique tuples.

Figure 4: Average sentiment of Democrats and Republicans (as subjects) to selected nouns (as their objects), aggregated over a large corpus using the learned lexicon (§4.2). The line indicates identical sentiments, i.e. Republicans are more positive towards the nouns that are above the line.
Estimating Entity Polarities Using connotation frames, we can also measure entity-to-entity sentiment at a large scale. Figure 4, for example, presents the polarity of the entities "Democrats" and "Republicans" towards a selected set of nouns, computed as the average estimated polarity (using our lexicon) over triples where one of these entities appears as part of the subject (e.g. "Democrats" or "Republican party"). Apart from nouns that both entities are positive ("business", "constitution") or negative ("the allegations", "veto threat") towards, we can also see interesting examples in which Democrats feel more positively (below the line: "nancy pelosi", "unions", "gun control", etc.) and ones where Republicans are more positive ("the pipeline", "gop leaders", "budget cuts", etc.). Also, both entities are neutral towards "idea" and "the proposal", probably because ideas or proposals can be good or bad for either entity depending on the context.
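The averaging described above can be sketched as a weighted mean of the lexicon's subject-to-object perspective over matching tuples. Everything specific in this example is hypothetical: the function name, the three-verb toy lexicon, the toy tuples, the substring match on the subject, and the choice to weight each tuple by its count.

```python
from collections import defaultdict

SIGN = {"+": 1.0, "=": 0.0, "-": -1.0}


def entity_to_object_polarity(tuples, lexicon, entity):
    """Average P(s->o) per object over tuples whose subject mentions entity.

    tuples: iterable of (subject, verb, object, count)
    lexicon: dict mapping a verb to its P(s->o) polarity
    """
    total = defaultdict(float)
    weight = defaultdict(float)
    for subj, verb, obj, count in tuples:
        if entity.lower() not in subj.lower() or verb not in lexicon:
            continue
        total[obj] += SIGN[lexicon[verb]] * count
        weight[obj] += count
    return {obj: total[obj] / weight[obj] for obj in total}


# Toy lexicon entries and tuples, for illustration only.
toy_lexicon = {"support": "+", "oppose": "-", "discuss": "="}
toy_tuples = [
    ("Republican party", "support", "the pipeline", 3),
    ("Republicans", "oppose", "the pipeline", 1),
    ("Democrats", "oppose", "the pipeline", 2),
    ("Republicans", "discuss", "the proposal", 2),
]

rep = entity_to_object_polarity(toy_tuples, toy_lexicon, "republican")
dem = entity_to_object_polarity(toy_tuples, toy_lexicon, "democrat")
```

Plotting the two resulting scores for each shared object against each other gives the kind of scatter that Figure 4 describes.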

Related Work
Most prior work on sentiment lexicons focused on the overall polarity of words without taking into account their semantic arguments (Baccianella et al., 2010; Velikovich et al., 2010; Kaji and Kitsuregawa, 2007; Kamps et al., 2004; Takamura et al., 2005; Andreevskaia and Bergler, 2006). Several recent studies began exploring more specific and nuanced aspects of sentiment such as connotation (Feng et al., 2013), good and bad effects (Choi and Wiebe, 2014), and evoked sentiment (Mohammad and Turney, 2010). Drawing inspiration from them, we present connotation frames as a unifying representation framework to encode the rich dimensions of implied sentiment, presupposed value judgements, and effect evaluation, and propose a factor graph formulation that captures the interplay among different types of connotation relations. Goyal et al. (2010a; 2010b) investigated how characters (protagonists, villains, victims) in children's stories are affected by certain predicates, which is related to the effect relations studied in this work. While Klenner et al. (2014) similarly investigated the relation between the polarity of the verbs and arguments, our work introduces new perspective types and proposes a unified representation and inference model. Wiegand and Ruppenhofer (2015) also looked at perspective-based relationships induced by verb predicates with a focus on opinion roles. Building on this concept, our framework also incorporates information about the perspectives' polarities as well as information about other typed relations. There has been growing interest in modeling framing (Greene and Resnik, 2009; Hasan and Ng, 2013), biased language (Recasens et al., 2013), and ideology detection (Yano et al., 2010). All these tasks are relatively less studied, and we hope our connotation frame lexicon will be useful for them. Sentiment inference rules have also been explored in recent work.
In contrast, we make a novel conceptual connection between inferred sentiments and frame semantics, organized as connotation frames, and present a unified model that integrates different aspects of the connotation frames. Finally, in a broader sense, what we study as connotation frames draws a connection to schema and script theory (Schank and Abelson, 1975). Unlike most prior work that focused on directly observable actions (Chambers and Jurafsky, 2009;Frermann et al., 2014;Bethard et al., 2008), we focus on implied sentiments that are framed by predicate verbs.

Conclusion
In this paper, we presented a novel system of connotative frames that define a set of implied sentiment and presupposed facts for a predicate. Our work also empirically explores different methods of inducing and modelling these connotation frames, incorporating the interplay between relations within frames. Our work suggests new research avenues on learning connotation frames, and their applications to deeper understanding of social and political discourse. All the learned connotation frames and annotations will be shared at http://homes.cs.washington.edu/~hrashkin/connframe.html.