Predicting Personal Opinion on Future Events with Fingerprints

Predicting users' opinions in response to social events has important real-world applications, many of which carry political and social impact. Existing approaches derive a population's opinion on an ongoing event from large volumes of user-generated content. In certain scenarios, we may not be able to acquire such content and thus cannot infer an unbiased opinion on those emerging events. To address this problem, we propose to infer opinion on unseen articles from one's fingerprint: one's prior reading and commenting history. This work presents a focused study on modeling and leveraging such fingerprints to predict a user's future opinion. We introduce a recurrent neural network based model that integrates the fingerprint. We collect a large dataset consisting of event-comment pairs from six news websites and evaluate the proposed model on it. The results show substantial performance gains, demonstrating the effectiveness of our approach.


Introduction
Opinion mining and sentiment analysis have important applications with practical socio-political and economic benefits. There are numerous examples of work showing applications of sentiment analysis beyond classification. It can be used for analyzing political preferences of the electorate or for mining the sentiments and emotions of people who lived in the past. The goal of these studies is not only to recognize sentiments, but also to understand how they were formed. All these approaches share the same starting point: abundant availability of user-generated content (UGC), such as reviews and Twitter posts (Dave et al., 2003; Hu and Liu, 2004; Pang and Lee, 2008; Pak and Paroubek, 2010; Liu, 2010; Taboada et al., 2011; Liu, 2012; Zhu et al., 2011; Yang et al., 2016; Joulin et al., 2017). These approaches focus on document classification and predict sentiment, stance, subjectivity, or aspect.
The success of these methods relies on the assumption that people have already expressed their opinion on a topic or event. However, we may not be able to acquire those opinions if people are unwilling to reveal them or have not had the chance to express them (e.g., about a new piece of legislation). Another obstacle is the availability of biased content on the web. One cannot possibly read all the content on the web and can only peruse a tiny part of the information, and people often read content that is consistent with their prior beliefs (Mullainathan and Shleifer, 2005; Xiang and Sarvary, 2007). As a result, existing methods cannot deal with the situation where there is no user-generated content and might fail to collect opinions on certain topics.
We aim to fill this gap. We propose to predict opinion about public events before people leave their comments. We consider news articles as descriptions of events, because people usually check breaking news about emerging events on news portals, micro-blogs, and social media (Di Crescenzo et al., 2017) and leave comments there. We predict the opinion of users on all events regardless of whether they have read the corresponding articles. The key idea is to access the historical event-opinion pairs of a user and learn the user's fingerprint, which we assume to be consistent over a period of time. We expect this fingerprint to implicitly capture what a person has said about events and to assist opinion mining on unseen articles. For example, as illustrated in Figure 1, a user commented "still your president" on events about "Hillary Clinton" and "Pelosi", and we might expect this user to write similar comments on other events involving the Democratic Party, such as the candidacy of Elizabeth Warren for the U.S. presidency.

[Figure 1: The illustration of our model. The example history shown in the figure: Event: "Hillary Clinton reacts to controversial Trump retweet: 'We need a real president'." Comment: "Actually Trump is still your president so don't cry." Event: "Pelosi creates new House committee with subpoena power for coronavirus oversight." Comment: "Still your president. Btw Trumps polls are higher than 2016 so that is a win."]

Since there is no research directly addressing this problem, we collect event-comment pairs from six websites: ArchiveIs, DailyMail, FoxNews, NYTimes, TheGuardian, and WSJ. In total, the dataset contains more than 45K articles, 35M comments, and 376K users. An overview of the dataset is given in Table 2. Empirically, we quantify the concept of opinion by sentiment and subjectivity and apply four methods for automatic labeling. We propose an approach that takes two inputs: (i) a user identifier that enables the system to access historical data; and (ii) an event as reported by a news article. We evaluate four baselines: three of them leverage the historical data, and one learns a user embedding with neural collaborative filtering (He et al., 2017). We further introduce a recurrent neural network model that encodes a user's reading and commenting history. Our experiments indicate that encoding user historical data with a recurrent neural network improves the performance of predicting sentiment and subjectivity on unseen articles over these baselines.

Method
We describe the proposed model in this section. Assume that among $K$ users we are interested in the opinion of user $u_k$ on a newly occurring event $a_T$, and that user $u_k$ has previously left $n$ article-comment pairs, i.e., $[c^{u_k}_0, \dots, c^{u_k}_n]$ and $[a^{u_k}_0, \dots, a^{u_k}_n]$. We aim to model the prediction $p^{u_k}_T$ as the opinion of the user on a future article $a^{u_k}_T$. An overview of the model is illustrated in Figure 1. Since recurrent neural network architectures have achieved state-of-the-art results on encoding sequences, we build our model with RNNs for both words and the user history. Specifically, we use two GRUs (Cho et al., 2014) rather than LSTMs (Hochreiter and Schmidhuber, 1997) because the former have fewer parameters. The first GRU, denoted as W-RNN in Figure 1 for modeling Words, encodes events and comments into fixed-length vectors, i.e., $[h^{a,u_k}_0, \dots, h^{a,u_k}_n]$, $[h^{c,u_k}_0, \dots, h^{c,u_k}_n]$, and $h^{a,u_k}_T$. The second GRU, denoted as S-RNN in Figure 1 for modeling the Sequence of history, takes as input the concatenations of prior event and comment vectors, $[h^{a,u_k}_i; h^{c,u_k}_i]$ for $i = 0, \dots, n$, where $[\,\cdot\,;\,\cdot\,]$ denotes concatenation, and outputs the user's fingerprint embedding $h^{f,u_k}$. Finally, we feed the concatenation of the fingerprint embedding and the current event embedding, i.e., $[h^{f,u_k}; h^{a,u_k}_T]$, to a one-layer feedforward network, denoted as MLP in Figure 1, which outputs the final prediction through a softmax activation function.
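To make the architecture concrete, the following PyTorch sketch shows one possible implementation of the model described above. It is a minimal illustration, not the released code: the class and method names (FPE, encode_text), the vocabulary handling, and the three-class output are our assumptions; only the W-RNN/S-RNN/MLP structure follows the description.

```python
import torch
import torch.nn as nn

class FPE(nn.Module):
    """Minimal sketch of the fingerprint-embedding model (names are ours)."""
    def __init__(self, vocab_size, emb_dim=300, hidden=256, num_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # W-RNN: encodes the words of one article or comment into a vector.
        self.w_rnn = nn.GRU(emb_dim, hidden, batch_first=True)
        # S-RNN: encodes the sequence of [article; comment] history vectors.
        self.s_rnn = nn.GRU(2 * hidden, hidden, batch_first=True)
        self.mlp = nn.Linear(2 * hidden, num_classes)

    def encode_text(self, token_ids):
        # token_ids: (batch, seq_len) -> (batch, hidden)
        _, h = self.w_rnn(self.embed(token_ids))
        return h.squeeze(0)

    def forward(self, hist_articles, hist_comments, target_article):
        # hist_articles / hist_comments: lists of (batch, seq_len) tensors,
        # one per history step; target_article: (batch, seq_len).
        steps = [torch.cat([self.encode_text(a), self.encode_text(c)], dim=-1)
                 for a, c in zip(hist_articles, hist_comments)]
        hist = torch.stack(steps, dim=1)           # (batch, m, 2*hidden)
        _, h_f = self.s_rnn(hist)                  # fingerprint embedding
        h_a = self.encode_text(target_article)     # unseen-article embedding
        logits = self.mlp(torch.cat([h_f.squeeze(0), h_a], dim=-1))
        return torch.softmax(logits, dim=-1)
```

In practice the history steps would be padded and batched; the per-step loop here favors readability over speed.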

Experiments
We describe the experimental setup in this section. We first explain how we process the dataset. Then we detail the baselines and present the results.

Data Preparation
News articles were randomly collected from ArchiveIs, DailyMail, FoxNews, NYTimes, TheGuardian, and WSJ using FLORIN (Liu et al., 2015). We remove users who have fewer than ten comments and then remove articles that the remaining users have not commented on. We manually checked a subset of articles and their comments and found that irrelevant comments are few enough to ignore. We split the data into training, validation, and test sets as in Figure 2. Suppose that up to time $T$ user $u_k$ contributed a sequence of article-comment pairs; we use the most recent article $a^{u_k}_T$ as the unseen article for testing and $a^{u_k}_{T-1}$ as the unseen article for validation, so that their corresponding comments (opinions) are not seen during training. For the training set $[a_1, c_1, \dots, a_{T-2}, c_{T-2}]$, each article is used as the unseen article to form a training instance. Given a user and an unseen article, we include the previous $m$ article-comment pairs as the historical data to model the user fingerprint, and we assume that users hold consistent views and stances on the same event within these $m$ pairs. Thus, the numbers of examples in the test and validation sets are each equal to the number of users $U$, and the number of training examples is $U \times (T - 2)$.
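The sketch below illustrates the per-user split under the stated assumptions. The helper name split_user and the list-of-pairs representation are ours; we also skip the very first article as a training target since it has no prior history, a minor deviation from the $U \times (T-2)$ count.

```python
# Hedged sketch of the per-user train/validation/test split described above.
# `history` is a chronologically ordered list of (article, comment) pairs;
# m is the maximum number of prior pairs used to model the fingerprint.
def split_user(history, m=14):
    test = (history[:-1][-m:], history[-1])    # a_T held out for test
    valid = (history[:-2][-m:], history[-2])   # a_{T-1} held out for validation
    # Every earlier article becomes an unseen target once, with up to m
    # preceding pairs as its history (the first article has none, so we skip it).
    train = [(history[max(0, t - m):t], history[t])
             for t in range(1, len(history) - 2)]
    return train, valid, test
```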
The opinions expressed in the comments are hard to evaluate directly, so we quantify the concept of opinion by sentiment polarity and subjectivity. Given the volume of our dataset, manual annotation would be inefficient. We apply four methods, namely Vader (Hutto and Gilbert, 2014), Flair (Akbik et al., 2019), TextBlob sentiment, and TextBlob subjectivity (Loria et al., 2014), to automatically label all comments. Vader is a rule-based model for general sentiment analysis. It is constructed from a generalizable, valence-based, human-curated gold-standard sentiment lexicon. When assessing the sentiment of tweets, Vader outperforms individual human raters (Hutto and Gilbert, 2014). Flair presents a unified interface for word embeddings and supports methods for producing vector representations of entire documents. We use the Flair pretrained classification model for sentiment labels. The model is trained on the IMDB dataset and achieves a micro F1 score of 90.54. Flair predicts either positive or negative. TextBlob is a simple rule-based API for sentiment analysis. It has both a sentiment model and a subjectivity model, which we refer to as Bsent and Bsubj in Table 3, respectively. We cast real-valued predictions to categorical values by thresholding. For example, Vader predicts a sentiment score between -1 and 1, and we use a threshold of zero.
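As an illustration, the snippet below shows how such automatic labeling could look. The Vader and TextBlob calls are the libraries' actual APIs; the label names and the 0.5 subjectivity threshold are our assumptions (only Vader's zero threshold is stated above).

```python
# Hedged sketch of the automatic labeling step.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from textblob import TextBlob

vader = SentimentIntensityAnalyzer()

def label_comment(text):
    # Vader's compound score lies in [-1, 1]; zero is the stated threshold.
    compound = vader.polarity_scores(text)["compound"]
    vader_label = ("positive" if compound > 0
                   else "negative" if compound < 0 else "neutral")
    blob = TextBlob(text)
    bsent = ("positive" if blob.sentiment.polarity > 0
             else "negative" if blob.sentiment.polarity < 0 else "neutral")
    # 0.5 is an assumed cut-off for the subjectivity score in [0, 1].
    bsubj = "subjective" if blob.sentiment.subjectivity > 0.5 else "objective"
    return {"vader": vader_label, "bsent": bsent, "bsubj": bsubj}
```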
Notably, we do not consider stance prediction because some websites have a clear political orientation, e.g., FoxNews favors the Republican Party, and their readers and comments reveal a similar trend. Nevertheless, our task is not restricted to sentiment and subjectivity. For example, we could also predict one's emotional reaction to unseen articles (Bostan and Klinger, 2018), which we will explore in future work. Essentially, we argue that the proposed task requires models to effectively capture the fingerprint of users based on what they have read and commented on, so that one's history can be generalized to predict their opinion on unseen events.
To validate that the automatic annotation yields reasonable labels, we perform standard document classification on these labels with an RNN classifier and report micro F1 in Table 3, denoted as Oracle. According to the results, labeling with rule-based methods (Vader and TextBlob) gives better performance than the pretrained neural model Flair. The reason could be that rule-based methods introduce less noise: they label an instance as neutral if no sentiment lexicon entry is matched. It is worth noting that the performance of the oracle is consistent across a variety of news sites, giving us confidence in the labels we use to evaluate our model.

Baselines and Implementation
We evaluate four baselines for comparison. UF: we retrieve the most frequent opinion from one's history and use it as the opinion on an unseen article. A-tfidf: we compute the cosine similarity between the unseen article and one's reading history and take the opinion of the most similar article as the opinion on the unseen article; we represent each article with TF-IDF after removing stop words. A-BERT: it also compares the similarity between the unseen article and one's reading history, but adopts pretrained BERT (Devlin et al., 2018) for the representation. CF: this baseline uses neural collaborative filtering (He et al., 2017), which has been widely applied in recommender systems; in our task, we still use a GRU to encode the unseen article but replace the fingerprint embedding with a learnable user embedding. We denote the proposed method FPE, short for FingerPrint Embedding. We implement the model with the PyTorch package. We arbitrarily allow each instance to access at most 14 previous article-comment pairs. In all cases, the hidden dimension size is 256. We use the Adam optimizer (Kingma and Ba, 2015) with a fixed learning rate of 0.001. We apply dropout with a fixed probability of 0.2. We further apply BPEmb to process documents and utilize its pretrained 300-dimensional subword embeddings (Heinzerling and Strube, 2018). We train CF and FPE for 16 epochs and save the model based on the micro F1 score on the validation set. The best model is usually reached around five epochs. The code is released at: https://github.com/fYYw/fingerprinting.
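For concreteness, a minimal version of the A-tfidf baseline could look as follows; the scikit-learn calls are real, while the function name and label representation are our assumptions.

```python
# Hedged sketch of the A-tfidf baseline: return the opinion label of the
# historical article most similar to the unseen one under TF-IDF cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def a_tfidf_predict(history_articles, history_labels, unseen_article):
    vectorizer = TfidfVectorizer(stop_words="english")  # stop words removed, as above
    vectors = vectorizer.fit_transform(history_articles + [unseen_article])
    sims = cosine_similarity(vectors[-1], vectors[:-1])  # unseen vs. each past article
    return history_labels[sims.argmax()]
```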

Results and Discussion
We report the micro F1 scores in Table 3. All results are obtained by evaluating on the test set the model that performed best on the validation set. From Table 3, we first observe that all evaluated methods suffer a large performance drop compared to the oracle setting. This means that the task of predicting opinion on unseen events is challenging. Note that the oracle directly uses the responses, so it does not predict future opinion from past history; we view it as a quality check to ensure the labels are correct. How to effectively model the fingerprint of a user and generalize one's history to future events will require further investigation.
Nevertheless, even on such a hard task, the FPE model outperforms UF, A-tfidf, and A-BERT on all labels. This is important because automatic labeling may introduce noise on individual labels, while considering the four labeling schemas together gives a more reliable picture. The improvement indicates that an RNN can better leverage one's reading and commenting history, producing a user fingerprint that generalizes to unseen events better than the other methods. Since previous work also suggests that recurrent neural networks can effectively track one's sequential actions (Tan et al., 2016; Pei et al., 2017; Beutel et al., 2018), we attribute the improvements to the recurrent architecture. Comparing FPE with CF, we see that FPE consistently outperforms CF except for a small drop on the Flair score for ArchiveIs. Since the only difference between FPE and CF is whether we encode the user's historical data, we conclude that historical event-comment pairs carry valuable information for building one's fingerprint embedding and therefore benefit prediction on unseen articles. We also observe that A-BERT does not perform well, possibly because the pretrained BERT model is not fine-tuned. Another option is to replace BERT with Sentence-BERT (Reimers and Gurevych, 2019) for semantic textual similarity, which could improve A-BERT through better sentence embeddings.

Conclusion
In this work, we introduce a new task: predicting opinion on unseen events based on one's reading and commenting history. We design a recurrent neural network based model that encodes the historical data and uses the resulting fingerprint embedding to infer opinion on new articles. Experiments on our newly collected dataset show that leveraging a recurrent neural network and one's historical data gives better performance than the four baselines. We believe the proposed problem setting lays the foundation for a variety of more rigorous works that fully explore how to learn and generalize user fingerprints. In the future, we plan to quantify one's comments along more dimensions, such as emotion, and predict them on unseen events. We argue that a successful model must effectively leverage one's fingerprint, and it is worthwhile to investigate different architectures for this task.

Acknowledgement
Research was supported in part by grants NSF 1838147, NSF 1838145, ARO W911NF-20-1-0254. The views and conclusions contained in this document are those of the authors and not of the sponsors. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.