Multilingual Connotation Frames: A Case Study on Social Media for Targeted Sentiment Analysis and Forecast

People around the globe respond to major real world events through social media. To study targeted public sentiments across many languages and geographic locations, we introduce multilingual connotation frames: an extension of the English connotation frames of Rashkin et al. (2016) to 10 additional European languages, focusing on the implied sentiments among event participants engaged in a frame. As a case study, we present a large-scale analysis of targeted public sentiments toward salient events and entities using 1.2 million multilingual connotation frames extracted from Twitter.


Introduction
People around the globe use social media to express their reflections and opinions on major real world events (Atefeh and Khreich, 2015; Radinsky and Horvitz, 2013). In order to facilitate multilingual public sentiment tracking on social media, we introduce multilingual connotation frames, 1 a multilingual extension of the English connotation frames of Rashkin et al. (2016) to 10 additional European languages, including low-resource languages such as Polish, Finnish, and Russian.
Definition 1.1. Connotation Frames: A framework for encoding predicate-specific connotative relationships implied by a predicate towards its arguments. Figure 1 shows a selected subset of the connotation frames that is relevant to our study; see Rashkin et al. (2016) for the full description of the connotation frames.

Figure 1: The connotation frame of "survive" with respect to directional sentiments among "writer", "agent", "theme", and "reader". The tweet examples show automatically induced multilingual connotation frames.
There are two important benefits to developing multilingual connotation frames. First, they serve as a unique lexical resource to enable targeted sentiment analysis, a resource that rarely exists for most languages.
Definition 1.2. Targeted Sentiment: A sentiment label indicating how a source entity feels about a target entity.
In the example shown in Figure 1, "teenager survived Boston Marathon bombing", the connotation frame allows us to correctly interpret the (implied) targeted sentiments:

1. sentiment(teenager → bombing) = −
2. sentiment(writer → bombing) = −
3. sentiment(writer → teenager) = +

Second, they allow us to study a broad spectrum of sentiments, including nuanced ones. In the examples discussed above, connotation frames allow us to infer (1) the likely sentiment among the event participants (e.g., a surviving teenager is likely to be negative toward the Boston bombing), and (2) the likely sentiment of the author towards events and entities (e.g., the writer is likely to be sympathetic toward the teenager while negative toward the incident), even though none of these sentiment implications are overtly expressed.
To validate the empirical utility of the new multilingual connotation lexicon, we present a successful case study of large-scale connotation analysis (Section 4.1) and forecast (Section 4.2) based on connotation frames extracted from 1.2 million tweets in 10 different European languages spanning a 15-day period.

Multilingual Twitter Dataset
We obtained multilingual geo-located tweets spanning Mar 15 – Mar 29, 2016. This 15-day duration covers the Brussels attacks on Mar 22 as well as one whole week before and after, allowing us to study the public sentiment dynamics in response to a major terrorist event. We focus on tweets that are likely to be about "news-worthy" topics by selecting tweets that came from trusted sources, such as Twitter-verified accounts or known news accounts, or that contained the hashtags #breaking or #news. 2 We used the SyntaxNet dependency parser (Andor et al., 2016) and trained additional SyntaxNet models for 10 non-English languages using Universal Dependencies annotations. 3 We extracted 1.2 million agent-verb-theme tuples, as listed in Table 1.

Figure 2: Diagram of the LSTM model for predicting the distribution of perspectives from a location (e.g., UK) towards an entity (e.g., Brussels) on a given day (e.g., March 25), based on the previous days.
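The tuple extraction described above can be sketched as a simple traversal of a Universal Dependencies parse, pairing each verb with its nsubj (agent) and obj (theme) dependents. This is an illustrative sketch, not the authors' actual pipeline; the token dictionary fields and the example sentence are assumptions for demonstration.

```python
# Hedged sketch: extracting agent-verb-theme tuples from a Universal
# Dependencies parse. The token representation below is hypothetical;
# a real pipeline would read SyntaxNet/CoNLL-U output.

def extract_tuples(tokens):
    """tokens: list of dicts with 'id', 'lemma', 'upos', 'head', 'deprel'."""
    tuples = []
    for tok in tokens:
        if tok["upos"] != "VERB":
            continue
        agent = theme = None
        for dep in tokens:
            if dep["head"] == tok["id"]:
                if dep["deprel"] == "nsubj":
                    agent = dep["lemma"]
                elif dep["deprel"] in ("obj", "dobj"):
                    theme = dep["lemma"]
        if agent and theme:
            tuples.append((agent, tok["lemma"], theme))
    return tuples

# "teenager survived bombing" -> one (agent, verb, theme) tuple
sent = [
    {"id": 1, "lemma": "teenager", "upos": "NOUN", "head": 2, "deprel": "nsubj"},
    {"id": 2, "lemma": "survive", "upos": "VERB", "head": 0, "deprel": "root"},
    {"id": 3, "lemma": "bombing", "upos": "NOUN", "head": 2, "deprel": "obj"},
]
print(extract_tuples(sent))  # [('teenager', 'survive', 'bombing')]
```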

Multilingual Connotation Frames
We perform context-based projection of the English connotation frames to 10 additional European languages using large parallel corpora. Since the connotation of a word arises from the context in which the word is used, we want to ensure the translated connotation frames are used in similar contexts. We use existing parallel corpora with automatic word alignment from the Opus Corpus (Tiedemann, 2012): the Multi-UN parallel data (Eisele and Chen, 2010) for Russian and the EuroParl parallel data (Koehn, 2005) for all other languages. More concretely, for each non-English verb v′ (e.g., assassiner in French), we compute the probability p(v | v′) of it being translated to each English verb v by counting the word alignments.
We then define the connotation frame of v′, F(v′), by transferring the connotation frame of the English verb v* that has the highest translation probability: F(v′) := F(v*), where v* = argmax_v p(v | v′). For example, the connotation frame for assassiner is propagated from murder, the English word with which it is aligned most often.
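The projection step above can be sketched in a few lines: relative frequencies of word alignments estimate p(v | v′), and each non-English verb inherits the frame of its most probable English translation. The alignment counts and frame contents below are invented for illustration, not taken from the actual lexicon.

```python
# Hedged sketch of connotation-frame projection via word alignments.
from collections import Counter, defaultdict

def project_frames(alignments, en_frames):
    """alignments: iterable of (foreign_verb, english_verb) aligned pairs.
    en_frames: dict mapping English verbs to their connotation frames.
    Returns {foreign_verb: (frame of v*, p(v* | v'))}."""
    counts = defaultdict(Counter)
    for fv, ev in alignments:
        counts[fv][ev] += 1
    projected = {}
    for fv, ctr in counts.items():
        total = sum(ctr.values())
        # v* = argmax_v p(v | v'), with p estimated by relative frequency
        v_star, c = ctr.most_common(1)[0]
        projected[fv] = (en_frames[v_star], c / total)
    return projected

# Invented example: "assassiner" aligns with "murder" 8 times, "kill" twice.
en_frames = {"murder": {"writer->agent": "-", "writer->theme": "+"},
             "kill":   {"writer->agent": "-", "writer->theme": "+"}}
aligned = [("assassiner", "murder")] * 8 + [("assassiner", "kill")] * 2
print(project_frames(aligned, en_frames)["assassiner"])
```

In practice, the alignment counts would come from the word-aligned Multi-UN or EuroParl data, and ties or low-count verbs might need additional filtering.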

Extracting Targeted Sentiments
Using the connotation frame lexicon, we compute the distribution of targeted sentiments towards the most frequently mentioned named entities. We also compute the sentiments expressed by each country by aggregating all sentiments of the writers located in that country (e.g., the distribution of positive, neutral, and negative perspectives expressed towards Obama in British tweets). The aggregated polarities can be represented as a 3-dimensional probability vector, p = [p+, p=, p−], which we use in the sentiment forecast task below. For the other analyses, we summarize this polarity distribution as a scalar score by taking the expected value of the polarity: E[p] = p+ − p−.
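The aggregation above can be sketched directly: per-country sentiment labels toward an entity collapse into the 3-dimensional vector [p+, p=, p−], summarized by the expected polarity E[p] = p+ − p−. The labels below are invented for illustration.

```python
# Minimal sketch of polarity aggregation; labels are hypothetical.
from collections import Counter

def polarity_vector(labels):
    """labels: list of '+', '=', '-' targeted-sentiment labels."""
    n = len(labels)
    c = Counter(labels)
    return [c["+"] / n, c["="] / n, c["-"] / n]

def expected_polarity(p):
    # E[p] = p+ - p-
    return p[0] - p[2]

labels = ["+", "+", "=", "-", "+"]  # e.g., UK tweets mentioning some entity
p = polarity_vector(labels)
print(p, round(expected_polarity(p), 2))  # [0.6, 0.2, 0.2] 0.4
```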

Forecasting Sentiment Dynamics
We also study forecasting sentiment dynamics: predicting the sentiment distribution of the next day given the sentiment trend of previous days. For this task, we track the distribution of directional sentiments from each country towards the hundred most frequently mentioned named entities. At test time, each model is given the directional sentiment distributions for the previous 4 days as input and predicts tomorrow's distribution (i.e., forecasting 1 day ahead). We also train models for predicting the distribution half a week later (forecasting 4 days ahead). We performed an additional experiment for English (EN J) in which the perspectives of all countries are aggregated together in order to predict the global perspective. For all experiments, we use 10-fold cross-validation and measure the symmetric Kullback-Leibler (KL) divergence between the true distribution and the predicted one.
We experiment with Long Short-Term Memory models (LSTMs) (Hochreiter and Schmidhuber, 1997) to integrate dynamic contextual information from the past, as depicted in Figure 2. The hidden dimension is 16, and we use ADAM optimization with KL divergence as the objective. For implementation, we use Keras 4 on top of Theano. 5

Baselines. We use two baselines. The first is MEAN, the average distribution seen in the training data. The second is an SVM with a linear kernel, which worked well for predicting influenza activity in a similar set-up (Santillana et al., 2015). For the baselines, we encode the distributions from the 4 previous days as a flattened 12-dimensional vector, and each portion of the distribution is predicted separately.
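The evaluation metric used above can be sketched as follows: symmetric KL divergence between the true and predicted 3-dimensional sentiment distributions, with a small epsilon to keep the logarithms finite. The epsilon smoothing and the example distributions are our assumptions, not details from the paper.

```python
# Hedged sketch of the symmetric KL-divergence metric used for evaluation.
import math

def kl(p, q, eps=1e-9):
    """KL(p || q) with epsilon smoothing to avoid log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def symmetric_kl(p, q):
    return kl(p, q) + kl(q, p)

true_dist = [0.6, 0.2, 0.2]   # [p+, p=, p-] observed the next day (invented)
pred_dist = [0.5, 0.3, 0.2]   # a model's forecast (invented)
print(symmetric_kl(true_dist, pred_dist))
```

A perfect forecast yields a score of 0, and the score grows as the predicted distribution diverges from the truth in either direction, which is why lower numbers in Table 2 are better.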

Connotation Analysis
For the most frequently mentioned named entities, we compute heatmaps of the expected perspective expressed towards each entity.
In Figure 3A, we use the English tweets from European countries to plot the change in connotative polarity towards these entities over the course of the 15-day period. Generally, the changes in polarity from day to day are gradual, and the polarity on a given day is frequently similar to that of the day before. There are a couple of exceptions; e.g., the polarity towards Brussels changes abruptly on March 22 (the day of the Brussels attacks), reflecting the change in tone of all tweets related to Brussels at that time.
Overall, the polarities expressed are mostly positive. This may reflect people's tendency to avoid phrasing stories too harshly, choosing to be more euphemistic even when they discuss bad news.
In Figure 3B, we aggregated the polarities of these tweets by country of origin. While most of the polarities are positive to strongly positive, the tweets about Brussels and Belgium are more neutral or even slightly negative.
Lastly, in Figure 3C, we used all of the tweets from European countries to aggregate expected polarities in 11 different languages. Non-English languages show a much higher tendency towards positive scores, particularly the languages with fewer tweets (Polish, Finnish, and Swedish).

As a more detailed analysis, Figure 4 shows a heatmap of how the connotation expressed towards Obama shifts over time across different languages. Obama was not discussed much in Finnish or Swedish, whereas he was discussed every day in English, Spanish, and Russian. In the middle of the two-week period, the perspective towards Obama drops slightly, most notably in Spanish, which overlaps with his controversial trip to Cuba (March 20–22).

Sentiment Dynamics
In Table 2, we summarize the results of our experiments for predicting targeted sentiment dynamics. For each language, we report the average Kullback-Leibler divergence for the baselines and the LSTM model (higher scores are worse). We show prediction results in two settings: predicting the distribution one day ahead vs. four days ahead. The LSTM outperforms the baselines in most languages, with a few exceptions such as Portuguese. All models perform worse at forecasting 4 days into the future than at forecasting one day ahead, demonstrating how much connotation can vary over time as news events change, even within a small time period. On average, the LSTM achieves a KL divergence of 1.7 when predicting one day ahead and 3.26 when predicting 4 days ahead, lower than any of the baselines.

Error Analysis
For error analysis, we removed the entities from Figure 3 from the training data and used them as a small test set for an LSTM trained on the remaining data in English, with aggregation over all countries. In Figure 5, we plot the predicted marginal probabilities for four entities, with the positive portion of the distribution (blue line) on the top half of the y-axis and the negative portion (red line) flipped onto the negative half of the axis.
The LSTM follows the general shape of the true curves, but frequently misses sudden spikes (e.g., the spike in negative polarity towards Russia on March 27th). In Table 3, we also report the KL divergences on predictions towards these entities. The model tends to perform less well at predicting sentiment towards entities where there were sudden spikes in sentiment based on news stories.

Related Work
There have been substantial studies of sentiment analysis on Twitter (Agarwal et al., 2011; Kouloumpis et al., 2011; Pak and Paroubek, 2010; Liu and Zhang, 2012), as well as of targeted sentiment (Deng and Wiebe, 2015), implicit sentiment (Deng and Wiebe, 2014; Feng et al., 2013; Greene and Resnik, 2009), and specific aspects of subjective language (Mohammad and Turney, 2010; Choi and Wiebe, 2014) in other domains. Previous investigations include using targeted sentiment to predict international relations (Chambers et al., 2015), analyzing stylistic elements to predict tweet popularity (Tan et al., 2014), and exploring the re-phrasing of social media posts referencing specific news articles (Tan et al., 2016). Compared to most prior studies, which focused on overt sentiment in English-only tweets, our work aims to study targeted implied sentiments across temporal, spatial, and linguistic borders.
Some work (Tsytsarau et al., 2014;O'Connor et al., 2010;De et al., 2016) has analyzed the transition of overt sentiment over a period of time and related the shifts in sentiment to news events. A body of work has also used predictive signals in Twitter to track and sense upcoming unrest and protests in specific countries (Ramakrishnan et al., 2014;Goode et al., 2015), and the future progression of flu activity based on multiple text sources (Santillana et al., 2015). In contrast, we focus on predicting the sentiment dynamics in social media based on previous trends.

Conclusions
When reporting news, people write with their own implicit and explicit biases and judgments. An author's choice of language reveals connotations towards entities, which can be captured within the connotation frames that we have extended to 10 European languages.
This work is one of the first to present a large-scale analysis of multilingual connotation dynamics, and it helps explore multiple perspectives on diverse issues across languages, time, and countries: a critical piece in understanding journalistic portrayal and biases.