Other Topics You May Also Agree or Disagree: Modeling Inter-Topic Preferences using Tweets and Matrix Factorization

We present in this paper our approach for modeling inter-topic preferences of Twitter users: for example, “those who agree with the Trans-Pacific Partnership (TPP) also agree with free trade”. This kind of knowledge is useful not only for stance detection across multiple topics but also for various real-world applications including public opinion surveys, electoral prediction, electoral campaigns, and online debates. In order to extract users’ preferences on Twitter, we design linguistic patterns in which people agree and disagree about specific topics (e.g., “A is completely wrong”). By applying these linguistic patterns to a collection of tweets, we extract statements agreeing and disagreeing with various topics. Inspired by previous work on item recommendation, we formalize the task of modeling inter-topic preferences as matrix factorization: representing users’ preferences as a user-topic matrix and mapping both users and topics onto a latent feature space that abstracts the preferences. Our experimental results demonstrate both that our approach is useful in predicting missing preferences of users and that the latent vector representations of topics successfully encode inter-topic preferences.


Introduction
Social media have changed the way people shape public opinion. The latest survey by the Pew Research Center reported that a majority of US adults (62%) obtain news via social media, and of those, 18% do so often (Gottfried and Shearer, 2016). Given that news and opinions are shared and amplified by friend networks of individuals (Jamieson and Cappella, 2008), individuals are thereby isolated from information that does not fit well with their opinions (Pariser, 2011). Ironically, cutting-edge social media technologies promote ideological groups despite their potential to deliver diverse information.
A large number of studies have already analyzed discussions, interactions, influences, and communities on social media along the political spectrum from liberal to conservative (Adamic and Glance, 2005; Zhou et al., 2011; Cohen and Ruths, 2013; Bakshy et al., 2015; Wong et al., 2016). Even though these studies provide intuitive visualizations and interpretations along the liberal-conservative axis, political analysts argue that the axis is flawed and insufficient for representing public opinion and ideologies (Kerlinger, 1984; Maddox and Lilie, 1984).
A potential solution for analyzing multiple axes of the political spectrum on social media is stance detection (Thomas et al., 2006;Somasundaran and Wiebe, 2009;Murakami and Raymond, 2010;Anand et al., 2011;Walker et al., 2012;Mohammad et al., 2016;Johnson and Goldwasser, 2016), whose task is to determine whether the author of a text is for, neutral, or against a topic (e.g., free trade, immigration, abortion). However, stance detection across different topics is extremely difficult. Anand et al. (2011) reported that a sophisticated method with topic-dependent features substantially improved the performance of stance detection within a topic, but such an approach could not outperform a baseline method with simple n-gram features when evaluated across topics. More recently, all participants of SemEval 2016 Task 6A (with five topics) could not outperform the baseline supervised method using n-gram features (Mohammad et al., 2016).
In addition, stance detection encounters difficulties with different user types. Cohen and Ruths (2013) observed that existing methods on stance detection fail on "ordinary" users because such methods primarily obtain training and test data from politically vocal users (e.g., politicians); for example, they found that a stance detector trained on a dataset with politicians achieved 91% accuracy on other politicians but only achieved 54% accuracy on "ordinary" users. Establishing a bridge across different topics and users remains a major challenge not only in stance detection, but also in social media analytics.
Figure 1: An overview of this study.
An important component in establishing this bridge is commonsense knowledge about topics. For example, consider the topic of a revision of Article 96 of the Japanese Constitution. We infer that the statement "we should maintain armed forces" tends to favor this topic even without any lexical overlap between the topic and the statement. This inference is reasonable because: the writer of the statement favors armed forces; those who favor armed forces also favor a revision of Article 9 (Article 9 prohibits armed forces in Japan); and those who favor a revision of Article 9 also favor a revision of Article 96 (Article 96 specifies high requirements for making amendments to the Constitution of Japan, including Article 9). In general, this kind of commonsense knowledge can be expressed in the format: those who agree/disagree with topic A also agree/disagree with topic B. We call this kind of knowledge inter-topic preference throughout this paper.
We conjecture that previous work on stance detection indirectly learns inter-topic preferences within the same target through the use of n-gram features on supervised data. In contrast, in the present paper, we directly acquire inter-topic preferences from an unlabeled corpus of tweets. This acquired knowledge regarding inter-topic preferences is useful not only for stance detection, but also for various real-world applications including public opinion surveys, electoral campaigns, electoral predictions, and online debates. Figure 1 provides an overview of this work. In our system, we extract linguistic patterns in which people agree and disagree about specific topics (e.g., "A is completely wrong"); to accomplish this, as described in Section 2.1, we make use of hashtags within a large collection of tweets. The patterns are then used to extract instances of users' preferences regarding various topics, as detailed in Section 2.2. Inspired by previous work on item recommendation, in Section 3, we formalize the task of modeling inter-topic preferences as a matrix factorization: representing a sparse user-topic matrix (i.e., the extracted instances) with the product of low-rank user and topic matrices. These low-rank matrices provide latent vector representations of both users and topics. This approach is also useful for completing preferences of "ordinary" (i.e., less vocal) users, which fills the gap between different types of users.
The contributions of this paper are threefold.
1. To the best of our knowledge, this is the first study that models inter-topic preferences for unlimited targets on real-world data.
2. Our experimental results show that this approach can accurately predict missing topic preferences of users (80-94%).
3. Our experimental results also demonstrate that the latent vector representations of topics successfully encode inter-topic preferences, e.g., those who agree with nuclear power plants also agree with nuclear fuel cycles.
This study uses a Japanese Twitter corpus because of its availability from the authors, but the core idea is applicable to any language.

Mining Topic Preferences of Users
In this section, we describe how we collect statements in which users agree or disagree with various topics on Twitter, which then serves as source data for modeling inter-topic preferences. More formally, we are interested in acquiring a collection of tuples (u, t, v), where: u ∈ U is a user; U is the set of all users on Twitter; t ∈ T is a topic; T is the set of all topics; and v ∈ {+1, −1} is +1 when the user u agrees with the topic t and −1 otherwise (i.e., disagreement). Throughout this work, we use a corpus consisting of 35,328,745,115 Japanese tweets (7,340,730 users) crawled from February 6, 2013 to September 30, 2016. We removed retweets from the corpus.

Mining Linguistic Patterns of Agreement and Disagreement
We use linguistic patterns to extract tuples (u, t, v) from the aforementioned corpus. More specifically, when a tweet message matches one of the linguistic patterns of agreement (e.g., "t is necessary"), we consider that the author u of the tweet agrees with topic t. Conversely, a statement of disagreement is identified by linguistic patterns of disagreement (e.g., "t is unacceptable").
In order to design linguistic patterns, we focus on hashtags appearing in the corpus that have been popular clues for locating subjective statements such as sentiments (Davidov et al., 2010), emotions (Qadir and Riloff, 2014), and ironies (Van Hee et al., 2016). Hashtags are also useful for finding strong supporters and critics, as well as their target topics; for example, #immigrantsWelcome indicates that the author favors immigrants; and #StopAbortion is against abortion.
Based on this intuition, we design regular expressions for both pro hashtags "#(.+)sansei" and con hashtags "#(.+)hantai", where (.+) matches a target topic. These regular expressions can find users who have strong preferences for topics. Using this approach, we extracted 31,068 occurrences of pro/con hashtags used by 18,582 users for 4,899 topics. We regard the set of topics found using this procedure as the set of target topics T in this study.
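The hashtag step can be illustrated with a minimal Python sketch. It assumes the romanized suffixes exactly as written in the regular expressions above and whitespace-delimited tokens; the function name `find_stance_hashtags` and the toy tweets are hypothetical and are not taken from our system.

```python
import re

# Regular expressions as given in the text (romanized suffixes are an assumption here).
PRO_HASHTAG = re.compile(r"#(.+)sansei")
CON_HASHTAG = re.compile(r"#(.+)hantai")

def find_stance_hashtags(tweets):
    """Return (user, topic, polarity) triples for every pro/con hashtag found.

    `tweets` is an iterable of (user, text) pairs; tokens are split on whitespace.
    """
    found = []
    for user, text in tweets:
        for token in text.split():
            for pattern, polarity in ((PRO_HASHTAG, +1), (CON_HASHTAG, -1)):
                match = pattern.fullmatch(token)
                if match:
                    found.append((user, match.group(1), polarity))
    return found

# Toy data (hypothetical).
tweets = [
    ("alice", "Trade matters #TPPsansei"),
    ("bob", "No thanks #TPPhantai"),
    ("carol", "Lunch was great #ramen"),
]
hits = find_stance_hashtags(tweets)
# The captured groups form the set of target topics T.
target_topics = {topic for _, topic, _ in hits}
```

In this sketch the captured group plays the role of the target topic, so the set of distinct captures corresponds to T.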
Each time we encounter a tweet containing a pro/con hashtag, we search for corresponding textual statements as follows. Suppose that a tweet includes a hashtag (e.g., #TPPsansei) for a topic t (e.g., TPP). Assuming that the author of the given tweet does not change their attitude toward a topic over time, we search for other tweets posted by the same author that also contain the topic keyword t. This process retrieves tweets like "I support TPP." Then, we replace the topic keyword with a variable A to extract patterns, e.g., "I support A." Here, the definition of the pattern unit is language specific. For Japanese tweets, we simply recognize a pattern that starts with a variable (i.e., topic) and ends at the end of the sentence.
Because this procedure also extracts useless patterns such as "to A" and "this is A", we manually choose useful patterns in a systematic way: we sort the patterns in descending order of the number of users who use each pattern, check the sorted list manually, and remove useless patterns.
Using this approach, we obtained 100 pro patterns (e.g., "welcome A" and "A is necessary") and 100 con patterns ("do not let A" and "I don't want A").
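The pattern mining procedure above can be sketched as follows. This is only an illustration under stated assumptions: each input string is treated as a single sentence, the pattern runs from the topic keyword to the end of the sentence, and the names `extract_pattern` and `rank_patterns` as well as the English toy sentences are hypothetical.

```python
from collections import defaultdict

def extract_pattern(sentence, topic, variable="A"):
    """Return the pattern that starts at the topic and runs to the sentence end, or None."""
    start = sentence.find(topic)
    if start < 0:
        return None
    return variable + sentence[start + len(topic):]

def rank_patterns(seed_topics, tweets):
    """Rank candidate patterns by the number of distinct users who use them.

    `seed_topics` maps a user to the topics found through their pro/con hashtags;
    `tweets` is an iterable of (user, sentence) pairs.
    """
    users_by_pattern = defaultdict(set)
    for user, sentence in tweets:
        for topic in seed_topics.get(user, ()):
            pattern = extract_pattern(sentence, topic)
            if pattern is not None:
                users_by_pattern[pattern].add(user)
    # Most widely used patterns first, ready for manual inspection.
    return sorted(users_by_pattern.items(), key=lambda kv: (-len(kv[1]), kv[0]))

# Toy data (hypothetical).
seed_topics = {"alice": {"TPP"}, "bob": {"TPP"}, "dave": {"nuclear power"}}
tweets = [
    ("alice", "TPP is necessary"),
    ("dave", "I think nuclear power is necessary"),
    ("bob", "TPP is unacceptable"),
    ("carol", "TPP is necessary"),  # carol used no pro/con hashtag, so she is skipped
]
ranked = rank_patterns(seed_topics, tweets)
```

The manual filtering step is not automated here; the sorted list is simply what an annotator would inspect from the top.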

Extracting Instances of Topic Preferences
By using the pro and con patterns acquired using the approach described in Section 2.1, we extract instances of (u, t, v) as follows. When a sentence in a tweet whose author is user u matches one of the pro patterns (e.g., "t is necessary") and the topic t is included in the set of target topics T , we recognize this as an instance of (u, t, +1). Similarly, when a sentence matches one of the con patterns (e.g., "I don't want t") and the topic t is included in the set of target topics T , we recognize this as an instance of (u, t, −1). Using this approach, we collected 25,805,909 tuples corresponding to 3,302,613 users and 4,899 topics. Because these collected tuples included comparatively infrequent users and topics, we removed users and topics that appeared less than five times. In addition, there were also meaningless frequent topics such as "of" and "it". Therefore, we sorted topics in descending order of their co-occurrence frequencies with each of the pro patterns and con patterns, and then removed meaningless topics in the top 100 topics. This resulted in 9,961,509 tuples regarding 273,417 users and 2,323 topics.
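As a rough illustration of the extraction and filtering steps, the following Python sketch matches sentences against a few patterns and then drops infrequent users and topics. Several details are assumptions made for the example: the patterns are English stand-ins stored as (text before A, text after A) pairs, a sentence must match a pattern exactly, and the frequency filter is applied in a single pass. The manual removal of meaningless topics is not modeled.

```python
from collections import Counter

# Hypothetical English stand-ins; each pattern is (text before A, text after A).
PRO_PATTERNS = [("", " is necessary"), ("welcome ", "")]
CON_PATTERNS = [("I don't want ", ""), ("do not let ", "")]

def match_topic(sentence, patterns, topics):
    """Return the target topic filling slot A if the sentence fits a pattern, else None."""
    for prefix, suffix in patterns:
        if len(sentence) <= len(prefix) + len(suffix):
            continue
        if sentence.startswith(prefix) and sentence.endswith(suffix):
            topic = sentence[len(prefix):len(sentence) - len(suffix)]
            if topic in topics:
                return topic
    return None

def extract_instances(sentences, topics):
    """Turn (user, sentence) pairs into (user, topic, +1/-1) instances."""
    instances = []
    for user, sentence in sentences:
        pro_topic = match_topic(sentence, PRO_PATTERNS, topics)
        if pro_topic is not None:
            instances.append((user, pro_topic, +1))
        con_topic = match_topic(sentence, CON_PATTERNS, topics)
        if con_topic is not None:
            instances.append((user, con_topic, -1))
    return instances

def filter_rare(instances, min_count=5):
    """Keep instances whose user and topic each appear at least min_count times."""
    user_counts = Counter(u for u, _, _ in instances)
    topic_counts = Counter(t for _, t, _ in instances)
    return [
        (u, t, v) for u, t, v in instances
        if user_counts[u] >= min_count and topic_counts[t] >= min_count
    ]

# Toy data (hypothetical).
topics = {"TPP", "nuclear power"}
sentences = [
    ("u1", "TPP is necessary"),
    ("u1", "I don't want nuclear power"),
    ("u2", "lunch is necessary"),  # "lunch" is not a target topic
]
instances = extract_instances(sentences, topics)
frequent = filter_rare(instances * 5 + [("u9", "TPP", -1)], min_count=5)
```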

Matrix Factorization
Using the methods described in Section 2, from the corpus, we collected a number of instances of users' preferences regarding various topics. However, Twitter users do not necessarily express preferences for all topics. In addition, it is by nature impossible to predict whether a new (i.e., nonexistent in the data) user agrees or disagrees with given topics. Therefore, in this section, we apply matrix factorization (Koren et al., 2009) in order to predict missing values, inspired by research regarding item recommendation (Bell and Koren, 2007;Dror et al., 2011). In essence, matrix factorization maps both users and topics onto a latent feature space that abstracts topic preferences of users.
Here, let R be a sparse matrix of size |U| × |T|. Only when a user u expresses a preference for topic t do we compute an element $r_{u,t}$ of the sparse matrix,
$$ r_{u,t} = \frac{\#(u,t,+1) - \#(u,t,-1)}{\#(u,t,+1) + \#(u,t,-1)}. \tag{1} $$
Here, #(u, t, +1) and #(u, t, −1) represent the numbers of occurrences of instances (u, t, +1) and (u, t, −1), respectively. Thus, an element $r_{u,t}$ approaches +1 as the user u favors the topic t, and −1 otherwise. If the user u does not make any statement regarding the topic t (i.e., neither (u, t, +1) nor (u, t, −1) exists in the data), we do not fill the corresponding element, leaving it as a missing value.
Matrix factorization decomposes the sparse matrix R into low-dimensional matrices $P \in \mathbb{R}^{k \times |U|}$ and $Q \in \mathbb{R}^{k \times |T|}$, where k is a parameter that specifies the number of dimensions of the latent space. We minimize the following objective function to find the matrices P and Q,
$$ \min_{P,Q} \sum_{(u,t) \in R} \left( \left(r_{u,t} - p_u^\top q_t\right)^2 + \lambda_P \lVert p_u \rVert^2 + \lambda_Q \lVert q_t \rVert^2 \right). \tag{2} $$
Here, (u, t) ∈ R ranges over the elements filled in the sparse matrix R, $p_u \in \mathbb{R}^k$ and $q_t \in \mathbb{R}^k$ are the u-th column vector of P and the t-th column vector of Q, respectively, and $\lambda_P \geq 0$ and $\lambda_Q \geq 0$ represent coefficients of regularization terms. We call $p_u$ and $q_t$ the user vector and topic vector, respectively.
Using these user and topic vectors, we can predict an element $\hat{r}_{u,t}$ that may be missing in the original matrix R,
$$ \hat{r}_{u,t} \simeq p_u^\top q_t. \tag{3} $$
We use libmf (Chin et al., 2015) to solve the optimization problem in Equation 2. We set regularization coefficients $\lambda_P = 0.1$ and $\lambda_Q = 0.1$ and use default values for the other parameters of libmf.
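To make the formulation concrete, the sketch below builds the preference values and factorizes them with plain stochastic gradient descent in pure Python. It is not libmf and does not reproduce its solver or defaults. It assumes that $r_{u,t}$ is the difference between pro and con counts divided by their sum, and that the loss is the regularized squared error over observed entries only. The learning rate, number of epochs, initialization, and all function names are hypothetical choices for illustration.

```python
import random
from collections import defaultdict

def build_ratings(instances):
    """Aggregate (user, topic, +1/-1) instances into r[(u, t)] in [-1, 1]."""
    pro = defaultdict(int)
    con = defaultdict(int)
    for user, topic, polarity in instances:
        if polarity > 0:
            pro[(user, topic)] += 1
        else:
            con[(user, topic)] += 1
    keys = sorted(set(pro) | set(con))
    return {key: (pro[key] - con[key]) / (pro[key] + con[key]) for key in keys}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def factorize(ratings, k=2, lam_p=0.1, lam_q=0.1, lr=0.05, epochs=300, seed=0):
    """Fit user vectors P and topic vectors Q on observed entries only (plain SGD)."""
    rng = random.Random(seed)
    users = sorted({u for u, _ in ratings})
    topics = sorted({t for _, t in ratings})
    P = {u: [rng.uniform(-0.1, 0.1) for _ in range(k)] for u in users}
    Q = {t: [rng.uniform(-0.1, 0.1) for _ in range(k)] for t in topics}
    observed = sorted(ratings.items())
    for _ in range(epochs):
        rng.shuffle(observed)
        for (u, t), r in observed:
            p, q = P[u], Q[t]
            err = r - dot(p, q)
            for i in range(k):
                p_i, q_i = p[i], q[i]
                # Gradient step on the squared error plus L2 regularization
                # (constant factors are absorbed into the learning rate).
                p[i] += lr * (err * q_i - lam_p * p_i)
                q[i] += lr * (err * p_i - lam_q * q_i)
    return P, Q

def predict(P, Q, user, topic):
    """Predicted preference: inner product of the user and topic vectors."""
    return dot(P[user], Q[topic])

# Toy data (hypothetical): u3 never mentions free trade.
instances = [
    ("u1", "TPP", +1), ("u1", "free trade", +1),
    ("u2", "TPP", -1), ("u2", "free trade", -1), ("u2", "free trade", -1),
    ("u3", "TPP", +1),
]
ratings = build_ratings(instances)
P, Q = factorize(ratings)
missing = predict(P, Q, "u3", "free trade")
```

On this toy data, the missing entry for u3 is filled from the pattern shared with u1, which is the behavior the completion step relies on.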

Determining the Dimension Parameter k
How good is the low-rank approximation found by matrix factorization? And can we find the "sweet spot" for the number of dimensions k of the latent space? We investigate the reconstruction error of matrix factorization using different values of k to answer these questions. We use Root Mean Squared Error (RMSE) to measure error,
$$ \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{(u,t) \in R} \left(r_{u,t} - p_u^\top q_t\right)^2}. \tag{4} $$
Here, N is the number of elements in the sparse matrix R (i.e., the number of known values). Figure 2 shows RMSE values over iterations of libmf with the dimension parameter k ∈ {1, 2, 5, 10, 30, 50, 100, 300, 500}. We observed that the reconstruction error decreased as the iterative method of libmf progressed. The larger the number of dimensions k was, the smaller the reconstruction error became; the lowest reconstruction error was 0.3256 with k = 500. We also observed the error with k = 1, which corresponds to mapping users and topics onto one dimension similarly to the political spectrum of liberal and conservative. Judging from the relatively high RMSE values with k = 1, we conclude that it may be difficult to represent everything in the data using a one-dimensional axis. Based on this result, we concluded that matrix factorization with k = 100 is sufficient for reconstructing the original matrix R and therefore used this parameter value for the rest of our experiments.
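The reconstruction error can be computed as in the following sketch, which assumes the usual definition of RMSE over the N known entries and represents user and topic vectors as dictionaries of lists. The data structures and the toy values are hypothetical.

```python
import math

def rmse(ratings, P, Q):
    """Root mean squared error between known values and inner-product reconstructions."""
    squared_error = 0.0
    for (user, topic), r in ratings.items():
        reconstruction = sum(x * y for x, y in zip(P[user], Q[topic]))
        squared_error += (r - reconstruction) ** 2
    return math.sqrt(squared_error / len(ratings))

# Toy data (hypothetical): two known values for one user, latent dimension k = 2.
ratings = {("u", "t"): 1.0, ("u", "s"): -1.0}
P = {"u": [1.0, 0.0]}
Q = {"t": [0.5, 0.0], "s": [-1.0, 2.0]}
error = rmse(ratings, P, Q)  # reconstructions are 0.5 and -1.0
```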

Predicting Missing Topic Preferences
How accurately can the user and topic vectors predict missing topic preferences? To answer this question, we evaluate the accuracy in predicting hidden preferences in the matrix R as follows. First, we randomly selected 5% of existing elements in R and let Y represent the collection of the selected elements (test set). We then perform matrix factorization on the sparse matrix without the selected elements of Y, that is, only with the remaining 95% of the elements of R (training set). We define the accuracy of the prediction as
$$ \frac{1}{|Y|} \sum_{(u,t) \in Y} \mathbb{1}\left(\mathrm{sign}(\hat{r}_{u,t}) = \mathrm{sign}(r_{u,t})\right). \tag{5} $$
Here, $r_{u,t}$ denotes the actual (i.e., self-declared) preference values, $\hat{r}_{u,t}$ represents the preference value predicted by Equation 3, sign(.) represents the sign of the argument, and $\mathbb{1}(.)$ yields 1 only when the condition described in the argument holds and 0 otherwise. In other words, Equation 5 computes the proportion of correct predictions to all predictions, assuming zero to be the decision boundary between pro and con.
Figure 3 plots prediction accuracy values calculated from different sets of users. Here the x-axis represents a threshold θ, which filters out users whose declarations of topic preferences are no greater than θ topics. In other words, Figure 3 shows prediction accuracy when we know user preferences for at least θ topics. For comparison, we also include the majority baseline that predicts pro and con based on the majority of preferences regarding each topic in the training set.
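The evaluation measure and the majority baseline can be sketched in a few lines of Python. The sketch assumes that predictions are compared by sign with zero as the boundary, and, as a choice made only for this example, that a value of exactly zero counts as con. The function names and toy data are hypothetical.

```python
from collections import defaultdict

def sign(x):
    # Zero is the decision boundary; treating exactly 0 as con is an assumption.
    return 1 if x > 0 else -1

def accuracy(test, predict_fn):
    """Proportion of held-out entries whose predicted sign matches the declared sign."""
    correct = sum(
        1 for (user, topic), r in test.items()
        if sign(predict_fn(user, topic)) == sign(r)
    )
    return correct / len(test)

def majority_baseline(train):
    """Predict, for every user, the majority polarity of each topic in the training set."""
    totals = defaultdict(float)
    for (_, topic), r in train.items():
        totals[topic] += sign(r)
    return lambda user, topic: totals.get(topic, 0.0)

# Toy data (hypothetical).
train = {
    ("u1", "TPP"): 1.0, ("u2", "TPP"): 1.0, ("u3", "TPP"): -1.0,
    ("u1", "new base"): -1.0,
}
test = {("u4", "TPP"): 1.0, ("u4", "new base"): 1.0}
baseline_accuracy = accuracy(test, majority_baseline(train))
```

The same `accuracy` function can score any predictor that maps a (user, topic) pair to a real value, for example the inner product of learned user and topic vectors.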
Our proposed method was able to predict missing preferences with an 82.1% accuracy for users stating preferences for at least five topics. This accuracy increased as our method received more information regarding the users, reaching a 94.0% accuracy when θ = 100. This result again indicates that our proposed method reasonably utilizes known preferences to complete missing preferences.
In contrast, the performance of the majority baseline decreased as it received more information regarding the users. Because this result was rather counter-intuitive, we examined the cause of this phenomenon. This result turned out to be reasonable because the preferences of vocal users deviated from those of average users. In the corresponding figure, the x-axis represents a threshold θ, which filters out users whose statements of topic preferences are no greater than θ topics. We observe that the mean variance increased as we focused on vocal users. Overall, these results demonstrate the usefulness of user and topic vectors in predicting missing preferences.
Table 1 shows examples in which missing preferences of two users were predicted from known statements of agreement and disagreement. In the table, predicted topics are accompanied by the corresponding $\hat{r}_{u,t}$ value in parentheses. As an example, our proposed method predicted that user A, who is positive toward regime change but negative toward Okinawa US military base, may also be positive toward vote of non-confidence to Cabinet but negative toward construction of a new base.

Inter-topic Preferences
Do the topic vectors obtained by matrix factorization capture inter-topic preferences, such as "People who agree with A also agree with B"?
Because no dataset exists for this evaluation, we created a dataset of pairwise inter-topic preferences by using a crowdsourcing service. Sampling topic pairs randomly, we collected 150 topic pairs whose cosine similarities of topic vectors were below −0.6, 150 pairs whose cosine similarities were between −0.6 and 0.6, and 150 pairs whose cosine similarities were above 0.6. In this way, we obtained 450 topic pairs for evaluation.
Given a pair of topics A and B, a crowd worker was asked to choose a label from the following three options: (a) those who agree/disagree with topic A may also agree/disagree with topic B; (b) those who agree/disagree with topic A may conversely disagree/agree with topic B; (c) otherwise (no association between A and B). Creating twenty pairs of topics as gold data, we removed labeling results from workers whose accuracy is less than 90%.
Consequently, we obtained 6-10 human judgements for every topic pair. Regarding (a) as +1 point, (b) as −1 point, and (c) as 0 point, we computed the mean of the points (i.e., average human judgements) for each topic pair. Spearman's rank correlation coefficient (ρ) between cosine similarity values of topic vectors and human judgements was 0.2210. We could observe a moderate correlation even though inter-topic preferences collected in this manner were highly subjective.
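The comparison between topic vectors and human judgements can be illustrated as follows. The sketch computes cosine similarity, averages the crowd labels per pair, and computes Spearman's ρ as the Pearson correlation of average ranks, which is one standard way to handle ties and is an assumption about the exact procedure. All names and toy values are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two topic vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for index in order[i:j + 1]:
            ranks[index] = shared_rank
        i = j + 1
    return ranks

def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def spearman(x, y):
    """Spearman's rank correlation as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy data (hypothetical): labels are +1 for (a), -1 for (b), 0 for (c).
similarities = [0.8, -0.7, 0.1]
labels = [[1, 1, 0], [-1, -1, 0], [0, 1, -1]]
mean_judgements = [sum(votes) / len(votes) for votes in labels]
rho = spearman(similarities, mean_judgements)
```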
In addition to the quantitative evaluation, we also checked similar topics for three controversial topics, Liberal Democratic Party (LDP), constitutional amendment, and right of foreigners to vote (Table 2). Topics similar to LDP included synonymous ones (e.g., Abe's LDP and Abe administration) and other topics promoted by the LDP (e.g., resuming nuclear power plant operations, bus rapid transit (BRT), and hate speech countermeasure law). Considering that people who support the LDP may also tend to favor its policies, we found these results reasonable. As for another example, constitutional amendment had a feature vector that was similar to those of amendment of Article 9, enforcement of specific secret protection law, and security related law. From these results, we concluded that topic vectors were able to capture inter-topic preferences.

Related Work
In this section, we summarize the related work that spreads across various research fields.
Employing a single axis (e.g., liberal to conservative) or a few axes (e.g., political parties and candidates of elections), previous studies on social media provide intuitive visualizations and interpretations along the respective axes. In contrast, this study is the first attempt to recognize and organize various axes of topics on social media with no prior assumptions regarding the axes. Therefore, we think our study provides a new tool for computational social science and political science that enables researchers to analyze and interpret phenomena on social media.
Next, we describe previous research focused on acquiring lexical knowledge of politics. Sim et al. (2013) measured ideological positions of candidates in US presidential elections from their speeches. The study first constructs "cue lexicons" from political writings labeled with ideologies by domain experts, using sparse additive generative models (Eisenstein et al., 2011). These constructed cue lexicons were associated with such ideologies as left, center, and right. Representing each speech of a candidate with cue lexicons, they inferred the proportions of ideologies of the candidate. The study requires a predefined set of labels and text data associated with the labels. Bamman and Smith (2015) presented an unsupervised method for assessing the political stance of a proposition, such as "global warming is a hoax," along the political spectrum of liberal to conservative.
In their work, a proposition was represented by a tuple in the form ⟨subject, predicate⟩, for example, ⟨global warming, hoax⟩. They presented a generative model for users, subjects, and predicates to find a one-dimensional latent space that corresponded to the political spectrum.
Similar to our present work, their work (Bamman and Smith, 2015) did not require labeled data to map users and topics (i.e., subjects) onto a latent feature space. In their paper, they reported that the generative model outperformed Principal Component Analysis (PCA), which is a method for matrix factorization. This empirical result probably reflects the underlying assumption of PCA, which treats missing elements as zero and not as missing data. In contrast, in the present work, we properly distinguish missing values from zero, excluding missing elements of the original matrix from the objective function of Equation 2. Further, this work demonstrated the usefulness of the latent space, that is, topic and user vectors, in predicting missing topic preferences of users and inter-topic preferences.
Fine-grained Opinion Analysis. The method presented in Section 2 is an instance of fine-grained opinion analysis (Wiebe et al., 2005; Choi et al., 2006; Johansson and Moschitti, 2010; Yang and Cardie, 2013; Deng and Wiebe, 2015), which extracts a tuple of a subjective opinion, a holder of the opinion, and a target of the opinion from text. Although these previous studies have the potential to improve the quality of the user-topic matrix R, unfortunately, no corpus or resource is available for the Japanese language. We do not currently have a large collection of English tweets, but combining fine-grained opinion analysis with matrix factorization is immediate future work.
Causality Relation. Some of the inter-topic preferences in this work can be explained by causality relations, for example, "TPP promotes free trade." A number of previous studies acquire instances of causal relations (Girju, 2003; Do et al., 2011) and promote/suppress relations (Hashimoto et al., 2012; Fluck et al., 2015) from text. Causality knowledge is useful for predicting (hypotheses of) future events (Hashimoto et al., 2015).
Inter-topic preferences, however, also include pairs of topics in which a causality relation hardly holds. As an example, it is unreasonable to infer that nuclear plant and railroading of bills have a causal relation, but those who dislike nuclear plant also oppose railroading of bills, presumably because they think the governing political parties rush the bill for resuming a nuclear plant. In this study, we model these inter-topic preferences based on preferences of the public. That said, a promising future direction of our work is to incorporate approaches for acquiring causality knowledge.

Conclusion
In this paper, we presented a novel approach for modeling inter-topic preferences of users on Twitter. Designing linguistic patterns for identifying support and opposition statements, we extracted users' preferences regarding various topics from a large collection of tweets. We formalized the task of modeling inter-topic preferences as a matrix factorization that maps both users and topics onto a latent feature space that abstracts users' preferences. Through our experimental results, we demonstrated that our approach was able to accurately predict missing topic preferences of users (80-94%) and that our latent vector representations of topics properly encoded inter-topic preferences.
For our immediate future work, we plan to embed the topic and user vectors to create a cross-topic stance detector. It is possible to generalize our work to model heterogeneous signals, such as interests and behaviors of people, for example, "those who are interested in A also support B," and "those who favor A also vote for B". Therefore, we believe that our work will bring about new applications in the field of NLP and other disciplines.