Neural Temporal Opinion Modelling for Opinion Prediction on Twitter

Opinion prediction on Twitter is challenging due to the transient nature of tweet content and neighbourhood context. In this paper, we model users’ tweet posting behaviour as a temporal point process to jointly predict the posting time and the stance label of the next tweet given a user’s historical tweet sequence and tweets posted by their neighbours. We design a topic-driven attention mechanism to capture the dynamic topic shifts in the neighbourhood context. Experimental results show that the proposed model predicts both the posting time and the stance labels of future tweets more accurately compared to a number of competitive baselines.


Introduction
Social media platforms allow users to express their opinions online towards various subject matters. Despite much progress in sentiment analysis in social media, the prediction of opinions remains challenging. Opinion formation is a complex process: an individual's opinion could be influenced by their own prior belief, their social circles and external factors. Existing studies often assume that socially connected users hold similar opinions. Social network information is integrated with user representations via weighted links and encoded using neural networks with attention or, more recently, Graph Convolutional Networks (GCNs) (Chen et al., 2016; Li and Goldwasser, 2019). This strand of work, including (Chen et al., 2018; Zhu et al., 2020; Del Tredici et al., 2019), leverages both the chronological tweet sequence and social networks to predict users' opinions.
The majority of previous work requires a manual segmentation of a tweet sequence into equally-spaced intervals based on either tweet counts or time duration. Models trained on the current interval are used to predict users' opinions in the next interval. However, we argue that such a manual segmentation may not be appropriate since users post tweets at different frequencies. Also, the time interval between two consecutively published tweets by a user is important for studying the underlying opinion dynamics and hence should be treated as a random variable.
Inspired by the multivariate Hawkes process (Aalen et al., 2008; Du et al., 2016), we propose to model a user's posting behaviour by a temporal point process: when user u posts a tweet d at time t, they decide whether to post a new topic/opinion, or a topic/opinion influenced by past tweets posted either by other users or by themselves. We thus propose a neural temporal opinion model to jointly predict the time at which the next post will be published and its associated stance. Instead of using the fixed formulation of the multivariate Hawkes process, the intensity function of the point process is learned automatically by a gated recurrent neural network. In addition, a user's neighbourhood context and the topics of their previously published tweets are also taken into account when predicting both the posting time and the stance of the next tweet.
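For reference, the fixed formulation being replaced is the textbook multivariate Hawkes intensity; the kernel and symbols below are the standard parametric form, not part of our model:

```latex
\lambda_u(t) = \mu_u + \sum_{j:\, t_j < t} \alpha_{u, u_j}\, \kappa(t - t_j),
\qquad \kappa(\Delta t) = e^{-\delta\, \Delta t}
```

Here $\mu_u$ is the base rate of user $u$, $\alpha_{u,u_j}$ is a fixed mutual-excitation weight from the user who posted event $j$, and $\kappa$ is an exponentially decaying kernel. In NTOM, this hand-specified history dependence is instead learned by a GRU.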
To the best of our knowledge, this is the first work to exploit a temporal point process for opinion prediction on Twitter. Experimental results on two Twitter datasets relating to Brexit and the US general election show that our proposed model outperforms existing approaches on both stance and posting time prediction.

Methodology
We present in Figure 1 the overall architecture of our proposed Neural Temporal Opinion Model (NTOM).

[Figure 1: Overview of the Neural Temporal Opinion Model.]

The input to the model at time step i consists of the user's own tweet x_i, its bag-of-words representation x^b_i, the time interval τ_i between the (i-1)-th and the i-th tweet, a user embedding u, and the neighbours' tweet queue {d_{i,1}, d_{i,2}, ..., d_{i,L}}. First, a Bi-LSTM layer is applied to extract features from input tweets. The neighbourhood tweets are then processed by a stacked Bi-LSTM/LSTM layer to extract the neighbourhood context, which is fed into an attention module queried by the user's own tweet representation h_i and topic z_i. The output of the attention module is concatenated with the tweet representation, the time interval τ_i, the user representation u, and the topic representation z_i, which is encoded from x^b_i via a Variational Autoencoder (VAE). Finally, the combined representation is sent to a GRU cell, whose hidden state is used to compute the intensity function and the softmax function for the prediction of the posting time interval and the stance label of the next tweet. In the following, we elaborate on the model in more detail.

Tweet representation: Words in tweets are mapped to pre-trained word embeddings (Baziotis et al., 2017), which were trained specifically on tweets (https://github.com/cbaziotis/datastories-semeval2017-task4). A Bi-LSTM is then used to generate the tweet representation.

Topic extraction: The topic representation z_i in Figure 1 captures the topic focus of the i-th tweet. It is learned by a VAE (Kingma and Welling, 2014), which approximates the intractable true posterior by optimising the reconstruction error between the generated tweet and the original tweet. Specifically, we convert each tweet to a bag-of-words vector weighted by term frequency, x^b_i, and feed it to two inference neural networks, f^μ_φ and f^Σ_φ. These generate the mean and variance of a Gaussian distribution from which the latent topic vector z_i is sampled.
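The sampling step just described can be sketched with the reparameterisation trick; the linear inference networks and NumPy stand-ins below are simplifying assumptions for illustration, not the model's actual architecture:

```python
import numpy as np

def sample_topic(x_bow, W_mu, W_logvar, rng):
    """Draw the latent topic vector z_i from q(z_i | x^b_i) via reparameterisation.

    x_bow    : term-frequency bag-of-words vector of tweet i (assumed shape (V,))
    W_mu     : stand-in for the inference network f^mu_phi    (assumed shape (V, K))
    W_logvar : stand-in for the inference network f^Sigma_phi (assumed shape (V, K))
    """
    mu = x_bow @ W_mu                       # mean of the Gaussian posterior
    logvar = x_bow @ W_logvar               # log-variance (diagonal covariance)
    eps = rng.standard_normal(mu.shape)     # eps ~ N(0, I)
    return mu + np.exp(0.5 * logvar) * eps  # z_i = mu + sigma * eps
```

Sampling via a deterministic transform of noise keeps the stochastic node differentiable with respect to the inference-network parameters, which is what allows the VAE to be trained end to end.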
The approximated posterior is then q_φ(z_i | x^b_i) = N(z_i; f^μ_φ(x^b_i), f^Σ_φ(x^b_i)). To generate the observation x̂^b_i conditional on the latent topic vector z_i, we define the generative network as p_θ(x^b_i | z_i). The reconstruction loss for the tweet x^b_i is then:

L_rec = -E_{q_φ(z_i | x^b_i)}[log p_θ(x^b_i | z_i)] + KL(q_φ(z_i | x^b_i) || p(z_i))

Neighbourhood Context Attention: To capture the influence from the neighbourhood context, we first input the neighbours' most recent L tweets to an LSTM in temporally ascending order. The output of the LSTM is weighted by attention signals queried by the user's i-th tweet and topic:

α_{i,l} = softmax_l(h_i^T W_h h^c_{i,l} + z_i^T W_z z^c_{i,l}),   c_i = Σ_{l=1}^{L} α_{i,l} h^c_{i,l}

where {h^c_{i,1}, h^c_{i,2}, ..., h^c_{i,L}} denotes the hidden state output for each tweet d_{i,l} in the neighbourhood context, z^c_{i,l} denotes the associated topic, h_i is the representation of the user's own tweet at time step i, and both W_h and W_z are weight matrices.
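A minimal NumPy sketch of this topic-queried attention step, assuming a bilinear scoring function (the exact score used in the model may differ) and random vectors in place of trained LSTM states:

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def neighbour_attention(h_i, z_i, H_c, Z_c, W_h, W_z):
    """Topic-driven attention over the L neighbourhood tweets.

    h_i, z_i : the user's own tweet and topic representations (the queries)
    H_c, Z_c : (L, d) hidden states and (L, k) topic vectors of neighbour tweets
    W_h, W_z : weight matrices; a bilinear score is assumed here
    """
    scores = H_c @ W_h @ h_i + Z_c @ W_z @ z_i  # one score per neighbour tweet
    alpha = softmax(scores)                     # attention weights alpha_{i,l}
    return alpha @ H_c                          # context vector c_i
```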
We use this attention mechanism to align the user's tweet to the most relevant part of the neighbourhood context. Our rationale is that a user would attend to those neighbours' tweets that discuss similar topics. The attention output c_i is then concatenated with the user's own tweet representation h_i and the extracted topic z_i. We further enrich the representation with the elapsed time τ_i between the posting times of the current and the previously posted tweet, and add a randomly initialised user vector u to distinguish the user from others. The final representation is passed to a GRU cell for the joint prediction of the posting time and stance label of the next tweet. Temporal Point Process: The goal of NTOM is to forecast the time gap until the next post, together with its stance label. Instead of modelling the time interval via regression analysis, we use a GRU (Cho et al., 2014) to simulate the temporal point process.
At each time step, the combined representation is input to the GRU cell to iteratively update the hidden state, taking into account the influence of previous tweets:

g_i = GRU(g_{i-1}, [c_i; h_i; z_i; τ_i; u])

where g_i is the hidden state of the GRU cell. Given g_i, the intensity function is formulated as:

λ(t | H_i) = exp(v_λ^T g_i + w_λ (t - t_i) + b_λ)

Here, H_i summarises all the tweet histories up to tweet i, b_λ denotes the base intensity level, the term v_λ^T g_i captures the influence from all previous tweets, and w_λ (t - t_i) denotes the influence from the instant interval. The likelihood that the next tweet will be posted at interval τ after the current tweet, given the history, is:

f(τ | H_i) = λ(t_i + τ | H_i) exp(-∫_0^τ λ(t_i + s | H_i) ds)

The expected time gap until the next tweet can then be estimated as:

E[τ_{i+1}] = ∫_0^∞ τ f(τ | H_i) dτ

Loss: We encourage the predicted interval to be as close to the actual interval as possible by minimising the Gaussian penalty function:

L_time = -Σ_i log((1/√(2πσ²)) exp(-(τ_{i+1} - τ̂_{i+1})² / (2σ²)))

For stance prediction we employ the cross-entropy loss, denoted L_stan. The final objective function is:

L = η L_rec + β L_time + γ L_stan

where η, β and γ are coefficients determining the contribution of each loss term.
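The intensity, likelihood and expectation above can be sketched numerically as follows; the exponential-affine intensity mirrors RMTPP (Du et al., 2016), while the toy parameters, integration cutoff and trapezoidal estimate are illustrative assumptions:

```python
import numpy as np

def intensity(tau, g_i, v_lam, w_lam, b_lam):
    """lambda(t_i + tau | H_i) = exp(v^T g_i + w * tau + b), RMTPP-style."""
    return np.exp(v_lam @ g_i + w_lam * tau + b_lam)

def next_time_density(tau, g_i, v_lam, w_lam, b_lam):
    """f(tau | H_i) = lambda(tau) * exp(-integral_0^tau lambda(s) ds).

    For this intensity the inner integral has a closed form (assumes w_lam != 0).
    """
    lam0 = np.exp(v_lam @ g_i + b_lam)
    cum = lam0 * (np.exp(w_lam * tau) - 1.0) / w_lam
    return intensity(tau, g_i, v_lam, w_lam, b_lam) * np.exp(-cum)

def expected_interval(g_i, v_lam, w_lam, b_lam, t_max=50.0, n=20000):
    """Estimate E[tau] = integral_0^inf tau * f(tau) dtau by the trapezoidal rule,
    truncating the integral at t_max where the density has decayed to ~0."""
    taus = np.linspace(0.0, t_max, n)
    y = taus * next_time_density(taus, g_i, v_lam, w_lam, b_lam)
    return float(np.sum((y[1:] + y[:-1]) * 0.5 * np.diff(taus)))
```

In training, the time loss is driven by the predicted expectation; at test time the same expectation serves as the forecast posting interval.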

Setup
We perform experiments on two publicly available Twitter datasets (Zhu et al., 2020). We plot in Figure 2 the number of users versus the number of tweets and find that over 81.6% of users have published fewer than 7 tweets; we therefore set the maximum length of the tweet sequence of each user to 7. For users who have published more than 7 tweets, we split their tweet sequence into multiple training sequences of length 7 with an overlapping window size of 1. For each user, we use 90% of their tweets for training and 10% (rounded up) for testing. Our settings are η = 0.2, β = 0.4 and γ = 0.4. We set the topic number to 50 and the vocabulary size to 3k for the tweet bag-of-words input to the VAE. The mini-batch size is 16. We use the Adam optimizer with learning rate 0.0005 and learning rate decay 0.9. The evaluation metrics are accuracy for stance prediction and Mean Squared Error (MSE) for posting time prediction. The results are compared against a number of competitive baselines. We also perform an ablation study by removing the topic extraction component (NTOM-VAE) or the neighbourhood context component (NTOM-context). In addition, to validate that NTOM does benefit from point process modelling and can better forecast the time and stance of the next tweet, we remove the intensity function and directly use a vanilla RNN and its variants, LSTM and GRU, to predict the time interval. Furthermore, to investigate whether it is more beneficial to use a GCN to encode the neighbourhood context, we learn tweet representations using a GCN (Hamilton et al., 2017), which preserves higher-order influence in social networks through convolution. As in (Li and Goldwasser, 2019), we use a 2-hop GCN and denote this variant as NTOM-GCN. For the Brexit dataset, MSE is measured in hours, while for the Election dataset it is measured in minutes due to the high volume of tweets published within two days.
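The sequence-splitting step can be sketched as below; we interpret "overlapping window size of 1" as a stride of 1 between consecutive windows, which is an assumption about the intended convention:

```python
def split_sequence(tweets, max_len=7, stride=1):
    """Split one user's chronological tweet list into training sequences.

    Windows of length `max_len` slide over the list with the given stride, so
    consecutive windows overlap heavily; users with <= max_len tweets keep a
    single (possibly shorter) sequence.
    """
    if len(tweets) <= max_len:
        return [tweets]
    return [tweets[s:s + max_len]
            for s in range(0, len(tweets) - max_len + 1, stride)]
```

For example, a user with 9 tweets yields three overlapping training sequences of length 7.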

Results
We report in Table 1 the stance prediction accuracy and the MSE of predicted posting times. Compared to the baselines, NTOM consistently achieves better performance on both datasets, showing the benefit of modelling the tweet posting sequence as a temporal point process. In the second set of experiments, we study the effect of temporal point process modelling. The results verify the benefit of using the intensity function, with at least a 2% increase in accuracy and a 0.2 decrease in MSE compared with the vanilla RNN and its variants. In the ablation study, the removal of the neighbourhood context component causes the largest performance decline, verifying the importance of social influence in opinion prediction. Removing either the VAE (for topic extraction) or the intensity function (using only the GRU) results in slight drops in stance prediction and more noticeable gaps in time prediction. It can also be observed that using a GCN to model higher-order influence in social networks does not bring any benefit, possibly due to extra noise introduced into the model.

Visualisation of Topical Attention
To investigate the effectiveness of the context attention queried by topics, we first select some example topics from the topic-word matrix in the VAE. The label of each topic is manually assigned based on its associated top 10 words. We then display a tweet's topic distribution together with the topic distributions of its neighbourhood tweets. We also visualise the attention weights assigned to the 3 neighbourhood tweets. Figure 3 illustrates the example topics, topic distributions and attention signals towards context tweets. Here, x_2 and x_4 denote a user's 2nd and 4th tweets respectively. The most recent 3 neighbourhood tweets are denoted as d_1, d_2, d_3. Blue in the leftmost separate column denotes the attention weights, and each row over T1, T2 and T3 denotes the topic distribution. It can be observed that the user's topic of interest shifts from immigration to Boris Johnson within 2 time steps. The drift also appears in the neighbours' tweets. Higher attention weights are assigned to the neighbours' tweets which share a similar topic distribution with the user. We can thus infer that the topic vector does help select the most relevant neighbourhood tweets.

[Figure 3: Example topics T1-T3, topic distributions of the user's tweets x_2 and x_4 and their neighbourhood tweets d_1-d_3, and the corresponding attention weights, shown over example Brexit-related tweets (e.g. "vote leave on thursday! make it our independence day").]

Related Work
The prediction of real-time stances on social media is challenging, partly due to the diversity and fickleness of users (Andrews and Bishop, 2019). A line of work mitigates the problem by taking into account homophily, i.e., the tendency of users to be similar to their friends (McPherson et al., 2001; Halberstam and Knight, 2016). For example, Chen et al. (2016) gauged a user's opinion as an aggregated stance of their neighbourhood users. Linmei et al. (2019) took a step further by exploiting extracted topics, which discern a user's focus among neighbourhood tweets. Recent advances in this strand also include the application of GCNs, with which social relationships are leveraged to enrich user representations (Li and Goldwasser, 2019; Del Tredici et al., 2019).
On the other hand, several studies have utilized the chronological order of tweets. Chen et al. (2018) presented an opinion tracker that predicts a stance every time a user publishes a tweet, and Zhu et al. (2020) extended this work by introducing topic-dependent attention. Shrestha et al. (2019) considered diverse social behaviours and jointly forecast them through a hierarchical neural network. However, the aforementioned work requires a manual segmentation of a tweet sequence. Furthermore, these models are unable to predict when a user will next publish a tweet and what its associated stance will be. These problems can be addressed using the Hawkes process (Hawkes, 1971), which has been successfully applied to event tracking (Srijith et al., 2017), rumor detection (Alvari and Shakarian, 2019) and retweet prediction (Kobayashi and Lambiotte, 2016). A combination of the Hawkes process with recurrent neural networks, called the Recurrent Marked Temporal Point Process (RMTPP), was proposed to automatically capture the influence of past events on future events, and shows promising results on geolocation prediction (Du et al., 2016). Benefiting from the flexibility and scalability of neural networks, further work in this vein includes event sequence prediction (Mei and Eisner, 2017) and failure prediction (Xiao et al., 2017). Our work is partly inspired by RMTPP, but departs from previous work by jointly considering users' social relations and topical attention for stance prediction on social media.

Conclusion
In this paper, we propose a novel Neural Temporal Opinion Model (NTOM) to address users' changing interest and dynamic social context. We model users' tweet posting behaviour based on a temporal point process for the joint prediction of the posting time and stance label of the next tweet. Experimental results verify the effectiveness of the model. Furthermore, visualisation of the topics and attention signals shows that NTOM captures the dynamics in the focused topics and contextual attention.