Measuring Issue Ownership using Word Embeddings

Sentiment and topic analysis are common methods used for social media monitoring. Essentially, these methods answer questions such as "what is being talked about, regarding X" and "what do people feel, regarding X". In this paper, we investigate another avenue for social media monitoring, namely issue ownership and agenda setting, which are concepts from political science that have been used to explain voter choice and electoral outcomes. We argue that issue alignment and agenda setting can be seen as a kind of semantic source similarity of the kind "how similar is source A to issue owner P, when talking about issue X", and as such can be measured using word/document embedding techniques. We present work in progress towards measuring that kind of conditioned similarity, and introduce a new notion of similarity for predictive embeddings. We then test this method by measuring the similarity between politically aligned media and political parties, conditioned on bloc-specific issues.


Introduction
Social Media Monitoring (SMM; i.e. monitoring of online discussions in social media) has become an established application domain with a large body of scientific literature and considerable commercial interest. The subfields of Topic Detection and Tracking (Allan et al., 1998; Sridhar, 2015) and Sentiment Analysis (Turney, 2002; Pang and Lee, 2008; Liu, 2012; Pozzi et al., 2016) are both scientific topics spawned entirely within the SMM domain. In its most basic form, SMM entails nothing more than counting occurrences of terms in data: producing frequency lists of commonly used vocabulary and matching term sets related to various topics and sentiments.
More sophisticated approaches use various forms of probabilistic topic detection (such as Latent Dirichlet Allocation) and sentiment analysis based on supervised machine learning.
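The most basic form of SMM described above (frequency lists and term-set matching) can be sketched with the Python standard library alone. The regular-expression tokenizer, the example posts, and the term sets below are illustrative assumptions, not resources used in this paper.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())

def term_frequencies(posts):
    """Frequency list of the vocabulary used across all posts."""
    counts = Counter()
    for post in posts:
        counts.update(tokenize(post))
    return counts

def match_term_sets(post, term_sets):
    """For each named term set, count how many of its terms occur in the post."""
    tokens = set(tokenize(post))
    return {name: len(tokens & terms) for name, terms in term_sets.items()}

# Hypothetical posts and sentiment term sets.
posts = ["Great new phone, love the camera",
         "The camera broke, terrible phone"]
term_sets = {"positive": {"great", "love"}, "negative": {"terrible", "broke"}}

freq = term_frequencies(posts)
print(freq.most_common(3))
for post in posts:
    print(match_term_sets(post, term_sets))
```

Such counts say nothing about how a term is used, which is the limitation the rest of the paper addresses.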
The central questions SMM seeks to answer are "what do users talk about?" and "how do they feel about it?". Answers to these questions may provide useful insight for market research and communications departments. It is apparent how product and service companies may use such analysis to gain an understanding of their target audience. It is also apparent how such analysis may be used in the context of elections for providing an indication of citizens' opinions as manifested in what they write in social media. There are numerous studies attempting to use various forms of social media monitoring techniques to predict the outcome of elections, with varying success (Bermingham and Smeaton, 2011; Ceron et al., 2015).
Most notably, the failure of standard opinion-measuring techniques to forecast the most recent US presidential election and the Brexit referendum demonstrates that, for certain questions related to measuring mass opinion, standard SMM techniques may be inadequate. Political scientists have used the concepts of agenda setting and issue ownership to explain voter choice and election outcomes (Klüver and Sagarzazu, 2016; Kiousis et al., 2015; Stubager, 2018). In short, the issue ownership theory of voting states that voters identify the most credible party proponent of a particular issue and cast their ballots for that issue owner (Bélanger and Meguid, 2008). Agenda setting refers to the media's role in influencing the importance of issues on the public agenda (McCombs and Reynolds, 2002). Note that current social media monitoring techniques are unable to measure these concepts in a satisfactory manner: it does not suffice to measure the occurrence of certain keywords, since most parties tend to use the same vocabulary to discuss issues, and sentiment analysis does not touch upon the issue ownership and agenda setting questions. What is needed for measuring issue ownership and agenda setting is a way to measure language use, i.e. when talking about an issue, to what extent the language used aligns with issue owner A vs. issue owner B.
We argue that issue alignment can be seen as a kind of semantic source similarity of the kind "how similar is source A to issue owner P, when talking about issue X", and as such can be measured using word/document embedding techniques. To measure that kind of conditioned similarity, we introduce a new notion of similarity for predictive word embeddings. This method enables us to manipulate the similarity measure by weighting the set of entities we account for in the predictive scoring function. The proposed method is applied to measure similarity between party programs and various subsets of online text sources, conditioned on bloc-specific issues. The results indicate that this conditioning disentangles issue-specific similarity from overall similarity. We can, for example, observe that while the Left Party representation is, overall, similar to that of nativist media, it differs significantly on nativist issues, while this effect is not seen to the same extent for more mainstream left wing or right wing media.

Vector Similarity
Vector similarity has been a foundational concept in natural language processing ever since the introduction of the vector space model for information retrieval by Salton (1971). In this model, queries and documents are represented as vectors in term space, and similarity is expressed using cosine similarity. The main reason for using cosine similarity in the vector space model is that it normalizes for vector length; the fact that a document (or query) contains a certain word is more important than how many times it occurs in the document. The vector space model was the main source of inspiration for early work on vector semantics, such as Latent Semantic Analysis (Deerwester et al., 1990; Landauer and Dumais, 1997) and the work on word space models by Schütze (1992, 1993). These works continued to embrace cosine similarity as the similarity metric of choice, since length normalization is equally desirable when words are represented by vectors whose elements encode (some function of) co-occurrences with other words. Contemporary research on distributional semantics (Sahlgren, 2006; Bullinaria and Levy, 2007; Turney and Pantel, 2010; Pennington et al., 2014) still uses largely the same mathematical machinery as the vector space model, and cosine similarity is still the preferred similarity metric due to its simplicity and length normalization.
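The length-normalization argument can be checked with a toy example: a document and the same document repeated twice have identical cosine similarity to any query, while their raw dot products differ. The vocabulary and texts below are hypothetical, and whitespace tokenization is assumed.

```python
import math
from collections import Counter

def term_vector(text, vocabulary):
    """Raw term counts of `text` over a fixed vocabulary (vector space model)."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

vocab = ["election", "tax", "climate"]       # hypothetical term space
doc = "election tax election"
doubled = doc + " " + doc                    # same content, twice the length
query = "election climate"

u, v, q = (term_vector(t, vocab) for t in (doc, doubled, query))
print(cosine(u, q), cosine(v, q))            # equal cosine values
print(sum(a * b for a, b in zip(u, q)),      # ...but different raw dot products
      sum(a * b for a, b in zip(v, q)))
```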
Even neural language models, which originate from the neural network community, employ cosine similarity to quantify similarity between learned representations (Bojanowski et al., 2017).
Word embeddings, as these techniques are nowadays referred to, have been used extensively in SMM, both for topic detection (Sridhar, 2015) and for sentiment analysis (Severyn and Moschitti, 2015).
To the best of our knowledge, only one previous study (Dahlberg and Sahlgren, 2014) has used word embeddings to analyze issue ownership. However, that study relied on simple nearest neighbor analysis using cosine similarity to study language use in the Swedish blogosphere.
We believe that prediction-based word embeddings such as Word2Vec are amenable to another notion of similarity, which we call predictive similarity.

Predictive Similarity
Given a function $f : A \times B \to \mathbb{R}$, we define the predictive similarity of two items $x, y \in A$ as the correlation of $f(x, b)$ and $f(y, b)$, where $b$ is a random variable of type $B$:

$$\mathrm{psim}(x, y) = \mathrm{corr}\big(f(x, b), f(y, b)\big) \qquad (1)$$

At a very general level, prediction-based word embeddings such as Word2Vec or FastText consist of a scoring function $s : C \times T \to \mathbb{R}$ with an objective function of the following form:

$$\sum_{(c, t) \in \mathcal{D}} \Big[ l\big(s(c, t)\big) + \sum_{t' \in \mathcal{N}_c} l\big(-s(c, t')\big) \Big] \qquad (2)$$

where $l$ is the logistic loss function $l(x) = \log(1 + e^{-x})$, $\mathcal{D}$ is the set of observed context-target pairs, $\mathcal{N}_c$ is a set of negatively sampled targets, and $s$ is the model-specific scoring function that relates to the probability of observing the target $t$ in the context $c$. For the Skipgram variant of Word2Vec, $s$ is simply the dot product between a vector representation of the target word $t$ and a vector representation of the context word $c$.

| Rank | paint | juice | county |
|---|---|---|---|
| 1 | deep-red | cranberry | siskiyou |
| 2 | fuschia | lime | calaveras |
| 3 | lime-green | caraway | ventura |
| 4 | hand-woven | fanta | osceola |
| 5 | blue | clove | yolo |
| 6 | yellow | zests | mendocino |
| 7 | ocher | coconut | bernardino |
| 8 | linoleum | peppercorns | okanogan |
| 9 | duck-egg | lemons | okfuskee |
| 10 | rust-colored | peach | tuolumne |

Table 1: Examples of predictive similarity neighborhoods of "orange" conditioned on "paint", "juice", and "county", respectively.
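Definition (1) can be estimated directly by sampling $b$. The sketch below is an illustration under stated assumptions, not the authors' code: random vectors stand in for target words, the Skipgram dot product is used as $f$, and the per-pair logistic loss of a negative-sampling objective is included for reference. All vectors and function names are hypothetical, and NumPy is assumed.

```python
import numpy as np

def logistic_loss(x):
    """l(x) = log(1 + exp(-x)), computed stably via logaddexp."""
    return np.logaddexp(0.0, -x)

def skipgram_score(c, t):
    """Skipgram scoring function s(c, t): dot product of context and target vectors."""
    return float(np.dot(c, t))

def pair_loss(c, t, negatives):
    """Loss for one observed (context, target) pair plus negatively sampled targets."""
    loss = logistic_loss(skipgram_score(c, t))
    loss += sum(logistic_loss(-skipgram_score(c, n)) for n in negatives)
    return float(loss)

def predictive_similarity(f, x, y, samples):
    """Definition (1): corr(f(x, b), f(y, b)), estimated over samples of b."""
    fx = np.array([f(x, b) for b in samples])
    fy = np.array([f(y, b) for b in samples])
    return float(np.corrcoef(fx, fy)[0, 1])

rng = np.random.default_rng(0)
targets = rng.normal(size=(1000, 5))            # hypothetical target vectors b
x, y = rng.normal(size=5), rng.normal(size=5)   # two hypothetical word vectors
print(predictive_similarity(skipgram_score, x, y, targets))
print(pair_loss(x, targets[0], targets[1:6]))
```

Scaling either vector leaves the estimate unchanged, as with cosine similarity; what matters is how consistently the two vectors score the same targets.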
The predictive similarity has several interpretations for the Skipgram model, but the simplest is the one where we let $f = s$, i.e. we say that the similarity of two words $x$ and $y$ is the correlation between the scores they assign to target words $b$, i.e. $\mathrm{corr}(s(x, b), s(y, b))$. Since $s$ is linear, this correlation takes a fairly simple form:

$$\mathrm{psim}(x, y) = \frac{x^\top \mathrm{var}(b)\, y}{\sqrt{x^\top \mathrm{var}(b)\, x}\,\sqrt{y^\top \mathrm{var}(b)\, y}} \qquad (3)$$

where $\mathrm{var}(b)$ is the covariance matrix of the target word vectors. It might be interesting to note that this coincides with cosine similarity if $\mathrm{var}(b)$ is a scalar multiple of the identity, i.e. if there is no correlation between dimensions and all dimensions have the same variance.

We argue that we can get a notion of conditioned similarity by estimating a weighted correlation, where the weighting acts as the conditioning. Table 1 shows a small example where we queried the neighborhood of the word "orange", conditioned such that a single word ("paint", "juice", and "county", respectively) accounts for half the weight in $\mathrm{var}(b)$, with all other words in the vocabulary having equal weights.
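The closed form (3) and the conditioning used for Table 1 can be sketched as follows. This is a minimal illustration assuming NumPy, random stand-in vectors, and separate input (word) and output (target) matrices as in Skipgram; the function names are hypothetical, and the neighbors it prints are meaningless for random data.

```python
import numpy as np

def weighted_covariance(vectors, weights):
    """Weighted covariance var(b; w) of the rows of `vectors`."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    centered = vectors - w @ vectors
    return (centered * w[:, None]).T @ centered

def psim(x, y, cov):
    """Closed form (3): x^T var(b) y / sqrt(x^T var(b) x * y^T var(b) y)."""
    return float(x @ cov @ y / np.sqrt((x @ cov @ x) * (y @ cov @ y)))

def conditioning_weights(n_words, focus):
    """Half of the weight on the conditioning word, the rest spread equally."""
    w = np.full(n_words, 0.5 / (n_words - 1))
    w[focus] = 0.5
    return w

def conditioned_neighbors(query, word_vectors, target_vectors, focus, k=10):
    """Indices of the k words most predictively similar to `query`,
    conditioned on the target word at index `focus`."""
    cov = weighted_covariance(target_vectors,
                              conditioning_weights(len(target_vectors), focus))
    scores = np.array([psim(word_vectors[query], v, cov) for v in word_vectors])
    scores[query] = -np.inf  # exclude the query word itself
    return np.argsort(-scores)[:k]

rng = np.random.default_rng(0)
word_vectors = rng.normal(size=(200, 8))    # hypothetical input vectors (x, y)
target_vectors = rng.normal(size=(200, 8))  # hypothetical output vectors (b)
print(conditioned_neighbors(query=3, word_vectors=word_vectors,
                            target_vectors=target_vectors, focus=42))
```

Passing a scaled identity as `cov` reproduces cosine similarity, which makes a convenient sanity check for any implementation.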
Predictive similarity can easily be extended to similar models, and for the purpose of this paper in particular, we extend it to Doc2Vec (Le and Mikolov, 2014), a model where the notion of context is enriched by the source of the utterance. The scoring function $s$ then takes the following form: $s(t, c, d) = t^\top (c + d)$, with $d$ being a vector representation of the source in question.
We argue that by using conditioned predictive similarity on document embeddings we can answer questions such as "how similar is The BBC to The Daily Mail, when talking about Climate Change". The end goal is to measure aggregate similarity on specific issues: "when talking about health policy, to what extent does the general language use align with Source A, Source B, Source C, etc.".

Experiments
To answer the language similarity question posed by issue ownership we measure aggregate predictive similarity between party platforms and various subsets of online text data, conditioned on words pertaining to left wing issues, right wing issues, nativist issues, and general political topics.
We built Doc2Vec embeddings (Le and Mikolov, 2014) on Swedish online data from 2018 crawled by Trendiction, together with manually scraped party platforms from the eight parties in parliament and Feministiskt Initiativ (Feminist Initiative). Doc2Vec requires us to define a notion of source. For the data crawled by Trendiction, we take the source to be the domain name of the document, e.g. www.wikipedia.se, whereas for the manually scraped party platforms, we assign it the appropriate party identifier. The model was trained using the Gensim package (Řehůřek and Sojka, 2010) with embedding dimension 100 and a context window of size 8.
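The source-labeling rule above can be sketched with the standard library: crawled documents are tagged with the domain name of their URL, party platforms with a party identifier. The URLs, texts, the identifier `party_x`, and the function names are hypothetical placeholders, and whitespace tokenization is an assumption.

```python
from urllib.parse import urlparse

def source_tag(url=None, party=None):
    """Source label: the party identifier for party platforms,
    otherwise the domain name of the crawled document's URL."""
    if party is not None:
        return party
    return urlparse(url).netloc.lower()

def tagged_corpus(crawled, platforms):
    """Build (tokens, [source]) pairs from crawled (url, text) items
    and party platform (party_id, text) items."""
    corpus = [(text.lower().split(), [source_tag(url=url)])
              for url, text in crawled]
    corpus += [(text.lower().split(), [source_tag(party=party)])
               for party, text in platforms]
    return corpus

crawled = [("https://www.wikipedia.se/wiki/Sverige", "Sweden is a country")]
platforms = [("party_x", "We want lower taxes")]  # hypothetical party identifier
for tokens, tags in tagged_corpus(crawled, platforms):
    print(tags, tokens)
```

In recent Gensim versions, each pair could be wrapped as `TaggedDocument(words=tokens, tags=tags)` and the list passed to `Doc2Vec(documents, vector_size=100, window=8)`, matching the dimension and window size reported above; the remaining training hyperparameters are not reported here.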
In collaboration with the Political Science department at Gothenburg University, we also extracted keywords for each party from its party platform. We use these party-specific keywords as a crude proxy for issues: we let left wing issues be defined by the union of left bloc party keywords, right wing issues by the right bloc party keywords, and nativist issues by the keywords of Sverigedemokraterna (The Swedish Democrats). We also let the union of all keywords represent general political discourse. The parties' bloc alignment and the size of the data used to generate representations for them can be seen in Table 2. We define the conditioned predictive similarity between two sources x and y by Equation 4, a weighted variant of Equation 3 in which only words among the given issue keywords are accounted for, as described by Equation 5.
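$$\mathrm{psim}(x, y) = \frac{x^\top \mathrm{var}(t; w)\, y}{\sqrt{x^\top \mathrm{var}(t; w)\, x}\,\sqrt{y^\top \mathrm{var}(t; w)\, y}} \qquad (4)$$

$$w_t = \begin{cases} 1 & \text{if } t \text{ is among the issue keywords} \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$

Above, $x$ and $y$ are document vectors and $\mathrm{var}(t; w)$ is the covariance matrix of the target word vectors $t$, weighted by $w_t$. This is equivalent to letting $s(t, c, d) = t^\top d$, i.e. the case where we ignore the effect of context words.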
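Equation (4) can be sketched as below, assuming binary weights (1 for issue keywords, 0 otherwise), which is how we read "only words among the given issue keywords are accounted for". The vocabulary, vectors, keyword set, and function names are hypothetical, and NumPy is assumed.

```python
import numpy as np

def keyword_weights(vocabulary, keywords):
    """Binary weights: 1 for issue keywords, 0 for all other words."""
    return np.array([1.0 if word in keywords else 0.0 for word in vocabulary])

def weighted_covariance(vectors, weights):
    """Weighted covariance var(t; w) of the target word vectors (rows)."""
    w = weights / weights.sum()
    centered = vectors - w @ vectors
    return (centered * w[:, None]).T @ centered

def conditioned_psim(x, y, target_vectors, weights):
    """Equation (4): predictive similarity of document vectors x and y,
    conditioned on the weighted set of target words."""
    cov = weighted_covariance(target_vectors, weights)
    return float(x @ cov @ y / np.sqrt((x @ cov @ x) * (y @ cov @ y)))

rng = np.random.default_rng(1)
vocabulary = [f"word{i}" for i in range(50)]   # hypothetical vocabulary
target_vectors = rng.normal(size=(50, 4))      # hypothetical target vectors
issue = {"word3", "word7", "word11", "word19", "word23", "word42"}  # hypothetical keywords
media_doc, party_doc = rng.normal(size=4), rng.normal(size=4)      # hypothetical sources
w = keyword_weights(vocabulary, issue)
print(conditioned_psim(media_doc, party_doc, target_vectors, w))
```

With binary weights this is simply the correlation of the two sources' scores over the issue keywords. An issue with fewer than two keywords gives a zero covariance matrix, and the measure is then undefined.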
Table 3 shows the average predictive similarity between the political party platforms and various online data sources, conditioned on left wing party issues, right wing party issues, nativist party issues, and general political discourse. The average cosine similarity between the sources and parties is also shown for comparison.
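The four conditioning sets behind Table 3 follow the keyword construction described above: bloc unions for left and right wing issues, one party's keywords for nativist issues, and the union of all keywords for general discourse. A minimal sketch, with hypothetical parties, keywords, and bloc labels (not the actual alignment in Table 2):

```python
def issue_keyword_sets(party_keywords, bloc_of, nativist_party):
    """Issue proxies from party keywords: bloc unions, the nativist party's
    keywords, and the union of all keywords for general political discourse."""
    def bloc_union(bloc):
        return set().union(*(kws for party, kws in party_keywords.items()
                             if bloc_of[party] == bloc))
    return {
        "left": bloc_union("left"),
        "right": bloc_union("right"),
        "nativist": set(party_keywords[nativist_party]),
        "general": set().union(*party_keywords.values()),
    }

# Hypothetical parties, keywords, and bloc assignments, for illustration only.
party_keywords = {
    "party_a": {"welfare", "unions"},
    "party_b": {"welfare", "climate"},
    "party_c": {"taxes", "enterprise"},
    "party_d": {"immigration", "tradition"},
}
bloc_of = {"party_a": "left", "party_b": "left",
           "party_c": "right", "party_d": "nativist"}
issues = issue_keyword_sets(party_keywords, bloc_of, nativist_party="party_d")
for name, keywords in issues.items():
    print(name, sorted(keywords))
```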

Discussion
As can be seen in Table 3, there is a marked difference between conditioning on issues and using regular document (i.e. cosine) similarity. Furthermore, conditioned similarity seems to align left wing media with left wing parties and nativist media with the Swedish Democrats, but does not align right wing media with right wing parties. This effect can be made more apparent by grouping the parties into blocs and fitting a simple additive model for the similarities along all dimensions (i.e. Media, Issues, and Bloc), as a way to normalize for general Media, Issue, and Bloc similarity. The results of this normalization, i.e. the residuals, are shown in Table 4. They show a small trend where left wing media are similar to left wing parties, nativist media are similar to the Swedish Democrats, and both left wing and right wing media are dissimilar to the Swedish Democrats.
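The fitting procedure for the additive model is not specified; the sketch below assumes ordinary least squares with main effects only (Media, Issue, Bloc) on a complete grid of averaged similarities. For such a balanced grid the fitted values reduce to the sum of the three marginal means minus twice the grand mean, so no regression library is needed. Group labels and similarity values are hypothetical, and NumPy is assumed.

```python
import numpy as np

def additive_residuals(sim):
    """Residuals of a main-effects additive fit to sim[media, issue, bloc].

    For a complete, balanced grid, the least-squares fit of
    y = mu + a_media + b_issue + c_bloc equals the sum of the three
    marginal means minus twice the grand mean.
    """
    grand = sim.mean()
    media = sim.mean(axis=(1, 2), keepdims=True)
    issue = sim.mean(axis=(0, 2), keepdims=True)
    bloc = sim.mean(axis=(0, 1), keepdims=True)
    return sim - (media + issue + bloc - 2.0 * grand)

media_groups = ["left wing", "right wing", "nativist"]   # hypothetical labels
issue_names = ["left", "right", "nativist", "general"]
blocs = ["left", "right", "nativist"]
rng = np.random.default_rng(2)
sim = rng.uniform(-1, 1, size=(len(media_groups), len(issue_names), len(blocs)))
resid = additive_residuals(sim)
print(resid[media_groups.index("nativist"), issue_names.index("nativist")])
```

Positive residuals then mark media-issue-bloc combinations that are more similar than the general Media, Issue, and Bloc levels alone would predict.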
Furthermore, we see a strong dissimilarity between nativist media and all parties regarding nativist issues. This is particularly true for parties promoting liberal immigration policy: The Left Party, The Social Democrats, The Green Party, The Centre Party, and The Moderates have all, currently or historically, promoted liberal immigration policies at odds with nativist sentiment.
A shortcoming of the method used here is the rather limited amount of party specific data: the quality and the quantity of the text data used varies drastically between parties, as can be seen in Table 2. Using, for example, parliamentary debates, opinion pieces, and other official party communication might improve data coverage.

Conclusion
In this paper we have presented some very preliminary results on how to measure similarities in language use conditioned on discourse, e.g. "how similar is The BBC to The Daily Mail, when talking about Climate Change". The end goal is to measure aggregate similarity on specific issues, answering questions such as "when talking about health policy, to what extent does the general language use align with Source A, Source B, etc.", and to use such an aggregate measure to study issue ownership at scale. We believe that issue ownership and agenda setting can be explored through the lens of language use and similarity, but deem it necessary to condition similarity on the specific issue at hand. The reason is the need to distinguish between level of engagement in an issue and agreement on an issue: two sources that talk a lot about an issue (e.g. health insurance) but in very different ways should not be considered similar. Conversely, if a source very rarely talks about an issue, but consistently does so in a way that is very similar to the way some political party talks about it, we consider it reasonable to believe that the source's opinion aligns with that party on that specific issue.
While we have not found a satisfactory direct evaluation for this task, we believe that the examples we put forward show some face validity of the proposed method for measuring ideological alignment.