Modeling Tweet Arrival Times using Log-Gaussian Cox Processes

Research on modeling time series text corpora has typically focused on predicting what text will come next; predicting when the next text event will occur is less well studied. In this paper we address the latter case, framed as modeling continuous inter-arrival times under a log-Gaussian Cox process, a form of inhomogeneous Poisson process which captures the varying rate at which tweets arrive over time. In an application to rumour modeling of tweets surrounding the 2014 Ferguson riots, we show how inter-arrival times between tweets can be accurately predicted, and that incorporating textual features further improves predictions.


Introduction
Twitter is a popular micro-blogging service which provides real-time information on events happening across the world. The evolution of events over time can be monitored there, with applications such as disaster management and journalism. For example, Twitter has been used to detect the occurrence of earthquakes in Japan through user posts (Sakaki et al., 2010). Modeling the temporal dynamics of tweets provides useful information about the evolution of events. Inter-arrival time prediction is one such type of modeling and has applications in many settings featuring continuous-time streaming text corpora, including journalism for event monitoring, real-time disaster monitoring and advertising on social media. For example, journalists track several rumours related to an event. Predicted arrival times of tweets can be used to rank rumours according to their activity, and to prioritise the investigation of a rumour with a short inter-arrival time over one with a longer inter-arrival time.
Modeling the inter-arrival time of tweets is a challenging task due to the complex temporal patterns that tweets exhibit. Tweets associated with an event stream arrive at different rates at different points in time.
For example, Figure 1a shows the arrival times (denoted by black crosses) of tweets associated with an example rumour around the Ferguson riots in 2014. Notice the existence of regions of both high and low density of arrival times over a one hour interval. We propose to address the inter-arrival time prediction problem with a log-Gaussian Cox process (LGCP), an inhomogeneous Poisson process (IPP) which models tweets as generated by an underlying intensity function that varies across time. Moreover, it assumes a non-parametric form for the intensity function, allowing the model complexity to depend on the data set. We also provide an approach that uses the textual content of tweets to model inter-arrival times. We evaluate the models using Twitter rumours from the 2014 Ferguson unrest, and demonstrate that they provide good predictions for inter-arrival times, outperforming baselines such as the homogeneous Poisson process, Gaussian process regression and the univariate Hawkes process. Even though the central application is rumours, one could apply the proposed approaches to model the arrival times of tweets corresponding to other types of memes, e.g. discussions about politics. This paper makes the following contributions: 1. It introduces the log-Gaussian Cox process for predicting tweet arrival times. 2. It demonstrates how incorporating text improves inter-arrival time prediction.

Related Work
Previous approaches to modeling inter-arrival times of tweets (Perera et al., 2010; Sakaki et al., 2010; Esteban et al., 2012; Doerr et al., 2013) were not complex enough to consider their time-varying characteristics. Perera et al. (2010) model inter-arrival times as independent and exponentially distributed with a constant rate parameter. A similar model is used by Sakaki et al. (2010) to monitor the tweets related to earthquakes. The renewal process model used by Esteban et al. (2012) assumes the inter-arrival times to be independent and identically distributed. Gonzalez et al. (2014) attempt to model arrival times of tweets using a Gaussian process but assume the tweet arrivals to be independent every hour. These approaches do not take into account the varying characteristics of arrival times of tweets.
Point processes such as Poisson and Hawkes processes have been used for spatio-temporal modeling of meme spread in social networks (Yang and Zha, 2013; Simma and Jordan, 2010). Hawkes processes (Yang and Zha, 2013) were also found to be useful for modeling the underlying network structure. These models capture relevant network information in the underlying intensity function. We use a log-Gaussian Cox process, which provides a Bayesian method to capture relevant information through the prior. It has been found to be useful e.g. for conflict mapping (Zammit-Mangion et al., 2012) and for frequency prediction in Twitter (Lukasik et al., 2015).

Data & Problem
In this section we describe the data and we formalize the problem of modeling tweet arrival times.
Data We consider the Ferguson rumour data set (Zubiaga et al., 2015), consisting of tweets on rumours around the 2014 Ferguson unrest. It consists of conversational threads that have been manually labeled by annotators to correspond to rumours. Since some rumours have few posts, we consider only those with at least 15 posts in the first hour, as they express interesting behaviour (Lukasik et al., 2015). This results in 114 rumours consisting of a total of 4098 tweets.
Problem Definition Let us consider a time interval [0, 2] measured in hours and a set of rumours $E_i$, each consisting of posts $p^i_j = (x^i_j, t^i_j)$, where $x^i_j$ is the text (in our case a vector of Brown cluster counts, see section 5) and $t^i_j$ is the time of occurrence of post $p^i_j$, measured as time since the first post on rumour $E_i$.
We introduce the problem of predicting the exact times of posts in the future unobserved time interval, which we study as inter-arrival time prediction. In our setting, we observe posts over a target rumour $i$ for one hour and over reference rumours (other than $i$) for two hours. Thus, the training data set consists of the observed posts $E_i^O$ of each rumour: the first hour of the target rumour and both hours of the reference rumours.

Model
The problem of modeling the inter-arrival times of tweets can be approached using Poisson processes (Perera et al., 2010; Sakaki et al., 2010). A homogeneous Poisson process (HPP) assumes the intensity to be constant (with respect to time and the rumour statistics). It is not adequate for modeling the inter-arrival times of tweets because it assumes a constant rate of point arrival across time. An inhomogeneous Poisson process (IPP) (Lee et al., 1991) can model tweets occurring at a variable rate by considering the intensity to be a function of time, i.e. $\lambda(t)$. For example, in Figure 1a we show intensity functions learnt for two different IPP models. Notice how the generated arrival times vary according to the intensity function values.
Log-Gaussian Cox process We consider a log-Gaussian Cox process (LGCP) (Møller and Syversveen, 1998), a special case of IPP, where the intensity function is assumed to be stochastic. The intensity function $\lambda(t)$ is modeled using a latent function $f(t)$ sampled from a Gaussian process (Rasmussen and Williams, 2005). To ensure positivity of the intensity function, we consider $\lambda(t) = \exp(f(t))$. This provides a nonparametric Bayesian approach to model the intensity function, where the complexity of the model is learnt from the training data. Moreover, we can define the functional form of the intensity function through appropriate GP priors.
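As an illustration of the construction above, the following minimal sketch draws a latent function from a zero-mean Gaussian process prior on a small time grid and exponentiates it to obtain a positive intensity. The kernel form $a \exp(-(t - t')^2 / l)$ follows the squared exponential kernel used in this paper; the grid, the hyperparameter values, the jitter term and all function names are assumptions made for the example, not the authors' implementation.

```python
import math
import random


def se_kernel(t1, t2, a=1.0, l=0.5):
    """Squared exponential kernel k(t, t') = a * exp(-(t - t')^2 / l)."""
    return a * math.exp(-((t1 - t2) ** 2) / l)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - acc)
            else:
                lower[i][j] = (matrix[i][j] - acc) / lower[j][j]
    return lower


def sample_lgcp_intensity(times, a=1.0, l=0.5, jitter=1e-6, rng=None):
    """Draw f ~ GP(0, k_time) at the given times and return lambda = exp(f)."""
    rng = rng or random.Random(0)
    n = len(times)
    # Covariance matrix of the latent function; the jitter on the diagonal
    # (an assumption of this sketch) keeps the factorisation numerically stable.
    cov = [[se_kernel(times[i], times[j], a, l) + (jitter if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    lower = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # f = L z has covariance L L^T = cov.
    f = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
    # The exponential link guarantees a strictly positive intensity.
    return [math.exp(v) for v in f]


grid = [0.2 * i for i in range(11)]  # 0.0, 0.2, ..., 2.0 hours
intensity = sample_lgcp_intensity(grid)
```

Each call with a different random generator gives a different intensity curve, which is what makes the intensity stochastic rather than a fixed function of time.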
Modeling inter-arrival time An inhomogeneous Poisson process (unlike an HPP) uses a time-varying intensity function and hence the inter-arrival times are not independent and identically distributed (Ross, 2010). In an IPP, the number of tweets $y$ occurring in an interval $[s, e]$ is Poisson distributed with rate $\int_s^e \lambda(t)\,dt$:
$$p(y \mid \lambda(t), [s, e]) = \mathrm{Poisson}\left(y \,\Big|\, \int_s^e \lambda(t)\,dt\right) = \frac{\left(\int_s^e \lambda(t)\,dt\right)^{y} \exp\left(-\int_s^e \lambda(t)\,dt\right)}{y!} \tag{1}$$
Assume that the $n$th tweet occurred at time $E_n = s$ and we are interested in the inter-arrival time $T_n$ of the next tweet. The arrival time of the next tweet $E_{n+1}$ can be obtained as $E_{n+1} = E_n + T_n$. The cumulative distribution for $T_n$, which provides the probability that a tweet occurs by time $s + u$, can be obtained as
$$p(T_n \le u) = 1 - p(T_n > u \mid \lambda(t), E_n = s) = 1 - \exp\left(-\int_s^{s+u} \lambda(t)\,dt\right) = 1 - \exp\left(-\int_0^{u} \lambda(s + t)\,dt\right) \tag{2}$$
The derivation is obtained by considering a Poisson probability for 0 counts with rate parameter given by $\int_s^{s+u} \lambda(t)\,dt$ and applying integration by substitution to obtain (2). The probability density function of the random variable $T_n$ is obtained by taking the derivative of (2) with respect to $u$:
$$p(T_n = u) = \lambda(s + u) \exp\left(-\int_0^{u} \lambda(s + t)\,dt\right) \tag{3}$$
The computational difficulties arising from integration are dealt with by assuming the intensity function to be constant in an interval and approximating the inter-arrival time density as (Møller and Syversveen, 1998; Vanhatalo et al., 2013)
$$p(T_n = u) \approx \lambda(s + u) \exp\left(-u\,\lambda\left(s + \tfrac{u}{2}\right)\right) \tag{4}$$
We associate a distinct intensity function $\lambda_i(t) = \exp(f_i(t))$ with each rumour $E_i$, as they have varying temporal profiles. The latent function $f_i$ is modelled to come from a zero-mean Gaussian process (GP) (Rasmussen and Williams, 2005) prior with covariance defined by a squared exponential (SE) kernel over time, $k_{time}(t, t') = a \exp(-(t - t')^2 / l)$. We consider the likelihood of posts $E_i^O$ over the entire training period to be a product of Poisson distributions (1) over equal-length sub-intervals, with the rate in a sub-interval $[s, e]$ approximated as $(e - s) \exp(f_i(\tfrac{1}{2}(s + e)))$. The likelihood of posts in the rumour data is obtained by taking the product of the likelihoods over individual rumours.
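The binned likelihood described above can be sketched as follows: arrival times are counted in equal-length sub-intervals, and each count is scored under a Poisson distribution whose rate is the sub-interval length times the exponentiated latent function at the midpoint. This is only an illustration under stated assumptions; the function names, the example arrival times and the constant latent function are hypothetical.

```python
import math


def bin_counts(arrival_times, start, end, width):
    """Count arrivals in consecutive sub-intervals of equal width."""
    n_bins = int(round((end - start) / width))
    counts = [0] * n_bins
    for t in arrival_times:
        if start <= t < end:
            idx = min(int((t - start) / width), n_bins - 1)
            counts[idx] += 1
    return counts


def poisson_log_likelihood(counts, latent_f, start, width):
    """Sum over bins of log Poisson(y | rate), rate = width * exp(f(midpoint))."""
    total = 0.0
    for k, y in enumerate(counts):
        mid = start + (k + 0.5) * width
        log_rate = math.log(width) + latent_f(mid)
        # log Poisson(y | r) = y * log(r) - r - log(y!)
        total += y * log_rate - math.exp(log_rate) - math.lgamma(y + 1)
    return total


# Hypothetical arrival times (in hours) binned into six-minute sub-intervals.
times = [0.05, 0.12, 0.13, 0.48, 0.9]
counts = bin_counts(times, 0.0, 1.0, 0.1)
# A constant latent function f(t) = log(5), i.e. five tweets per hour.
log_lik = poisson_log_likelihood(counts, lambda t: math.log(5.0), 0.0, 0.1)
```

In the full model the latent function is not fixed as here but is given a GP prior, so this quantity plays the role of the likelihood term $p(E_i^O \mid f_i)$.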
The posterior distribution $p(f_i \mid E_i^O)$ is intractable and a Laplace approximation (Rasmussen and Williams, 2005) is used to obtain the posterior. The predictive distribution of $f_i(t_{i*})$ at time $t_{i*}$ is obtained using the approximated posterior. The intensity function value at the point $t_{i*}$ is then obtained from this predictive distribution via $\lambda_i(t_{i*}) = \exp(f_i(t_{i*}))$.

Algorithm 1 Importance sampling for predicting the next arrival time
1: Input: Intensity function $\lambda(t)$, previous arrival time $s$, proposal distribution $q(t) = \mathrm{Exp}(t; 2)$, number of samples $N$
2: for $i = 1$ to $N$ do
3: Sample $u_i \sim q(t)$.
4: Obtain weights $w_i = p(u_i) / q(u_i)$, where $p(t)$ is given by (4).
5: end for
6: Predict expected inter-arrival time as $\bar{u} = \sum_{i=1}^{N} u_i w_i / \sum_{i=1}^{N} w_i$.
7: Predict the next arrival time as $\hat{t} = s + \bar{u}$.
8: Return: $\hat{t}$

Importance sampling We are interested in predicting the next arrival time of a tweet given the time at which the previous tweet was posted. This is achieved by sampling the inter-arrival time of the next tweet using equation (4). We use an importance sampling scheme (Gelman et al., 2003) in which an exponential distribution is used as the proposal density. We set the rate parameter of this exponential distribution to 2, which generates points with a mean value of 0.5. Assuming the previous tweet occurred at time $s$, we obtain the arrival time of the next tweet as outlined in Algorithm 1. We run this algorithm sequentially, i.e. the time $\hat{t}$ returned from Algorithm 1 becomes the starting time $s$ in the next iteration. We stop at the end of the interval of interest, for which a user wants to find times of post occurrences.
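The sequential procedure can be sketched as below. Two points are assumptions of this example rather than statements from the text: the target density is taken to be the constant-intensity approximation $\lambda(s + u) \exp(-u \lambda(s + u/2))$, and the expected inter-arrival time is computed with the self-normalised estimator $\sum_i w_i u_i / \sum_i w_i$. Function names and the constant intensities used in the usage lines are hypothetical.

```python
import math
import random


def interarrival_density(u, s, intensity):
    """Approximate density of the inter-arrival time u after an arrival at s."""
    return intensity(s + u) * math.exp(-u * intensity(s + u / 2.0))


def predict_next_arrival(intensity, s, n_samples=1000, rate=2.0, rng=None):
    """Importance sampling estimate of the next arrival time after s."""
    rng = rng or random.Random(0)
    weighted_sum = 0.0
    weight_total = 0.0
    for _ in range(n_samples):
        u = rng.expovariate(rate)          # proposal: exponential with rate 2
        q = rate * math.exp(-rate * u)     # proposal density at u
        w = interarrival_density(u, s, intensity) / q
        weighted_sum += w * u
        weight_total += w
    # Self-normalised estimate of the expected inter-arrival time.
    return s + weighted_sum / weight_total


def predict_arrivals(intensity, start, end, n_samples=1000, seed=0):
    """Run the predictor sequentially until the end of the interval."""
    rng = random.Random(seed)
    arrivals = []
    t = predict_next_arrival(intensity, start, n_samples, rng=rng)
    while t < end:
        arrivals.append(t)
        # The predicted time becomes the starting time of the next iteration.
        t = predict_next_arrival(intensity, t, n_samples, rng=rng)
    return arrivals


# Hypothetical constant intensity of 2.5 tweets per hour over the second hour.
predicted = predict_arrivals(lambda t: 2.5, 1.0, 2.0, n_samples=2000)
```

With a constant intensity the target density reduces to an exponential distribution, so the predicted gaps should be close to the reciprocal of the intensity; a learnt LGCP intensity would be passed in the same way as a callable.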
Incorporating text We consider adding a kernel over the text from posts to the previously introduced kernel over time.
We join the text from the observed posts of each rumour together, so that a different component is added to the kernel values across different rumours. The full kernel then takes the form $k_{TXT}((t, i), (t', i')) = k_{time}(t, t') + k_{text}(\mathbf{x}_i, \mathbf{x}_{i'})$, where $\mathbf{x}_i$ denotes the joined text representation of the observed posts of rumour $E_i$. We compare text via a linear kernel with an additive underlying base similarity, expressed by $k_{text}(\mathbf{x}, \mathbf{x}') = b + c\,\mathbf{x}^{T}\mathbf{x}'$.
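A minimal sketch of the combined kernel is given below. It assumes that joining the text of a rumour means summing the count vectors of its observed posts; that choice, the function names and the toy vectors are illustrative assumptions.

```python
import math


def k_time(t1, t2, a, l):
    """Squared exponential kernel over time."""
    return a * math.exp(-((t1 - t2) ** 2) / l)


def k_text(x1, x2, b, c):
    """Linear kernel over text with additive base similarity b."""
    return b + c * sum(u * v for u, v in zip(x1, x2))


def join_text(post_vectors):
    """Assumed joining step: element-wise sum of per-post count vectors."""
    return [sum(column) for column in zip(*post_vectors)]


def k_txt(t1, x1, t2, x2, a, l, b, c):
    """Full kernel: time component plus a rumour-level text component."""
    return k_time(t1, t2, a, l) + k_text(x1, x2, b, c)


# Two hypothetical rumours, each with two posts over a three-cluster vocabulary.
rumour_a = join_text([[1, 0, 1], [0, 0, 1]])   # -> [1, 0, 2]
rumour_b = join_text([[0, 1, 0], [0, 0, 1]])   # -> [0, 1, 1]
value = k_txt(0.5, rumour_a, 0.5, rumour_b, a=1.0, l=1.0, b=0.1, c=0.5)
```

Because the text component depends only on the pair of rumours and not on time, it shifts the covariance between any two time points by an amount reflecting how similar the rumours' texts are.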
Optimization All model parameters $(a, l, b, c)$ are obtained by maximizing the marginal likelihood $p(E_i^O) = \int p(E_i^O \mid f_i)\, p(f_i)\, df_i$ over all rumour data sets.

Experiments
Data preprocessing In our experiments, we consider the first two hours of each rumour lifespan. The posts from the first hour of a target rumour are considered as observed (training data) and we predict the arrival times of tweets in the second hour. We consider observations over equal-sized time intervals of length six minutes in the rumour lifespan for learning the intensity function. The text in the tweets is represented using the Brown cluster ids associated with the words. These are obtained using 1000 clusters acquired on a large-scale Twitter corpus (Owoputi et al., 2013).
Evaluation metrics Let the arrival times predicted by a model be $(\hat{t}_1, \ldots, \hat{t}_M)$ and let the actual arrival times be $(t_1, \ldots, t_N)$. We introduce two metrics based on root mean squared error (RMSE) for evaluating predicted inter-arrival times. The first is the aligned root mean squared error (ARMSE), where we align the initial $K = \min(M, N)$ arrival times and calculate the RMSE between the two resulting subsequences.
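The ARMSE definition above translates directly into a few lines of code. The sketch below is an illustration; the function name and the example times are hypothetical.

```python
import math


def armse(predicted, actual):
    """Aligned RMSE over the first K = min(M, N) arrival times."""
    k = min(len(predicted), len(actual))
    if k == 0:
        raise ValueError("both sequences must contain at least one arrival time")
    squared = sum((p - t) ** 2 for p, t in zip(predicted[:k], actual[:k]))
    return math.sqrt(squared / k)


# Four predicted times against three actual times: only the first three align.
score = armse([1.1, 1.5, 1.9, 1.95], [1.0, 1.5, 2.0])
```

Note that the fourth predicted time does not affect the score, which is exactly why a metric that also accounts for the number of predicted points is needed.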
The second is called penalized root mean squared error (PRMSE). In this metric we penalize approaches which predict a different number of inter-arrival times than the actual number. The PRMSE metric is defined as the square root of the following expression.
The second and third terms in (5) respectively penalize for the excessive or insufficient number of points predicted by the model.
to 1000 (above the maximum count yielded by any rumour from our dataset), thus reducing the error from this method.
We also compare against the Hawkes process (HP) (Yang and Zha, 2013), a self-exciting point process where the occurrence of a tweet increases the probability of tweets arriving soon afterwards. We consider a univariate Hawkes process where the intensity function is modeled as $\lambda_i(t) = \mu + \sum_{t^i_j < t} k_{time}(t^i_j, t)$. The kernel parameters and $\mu$ are learnt by maximizing the likelihood. We apply the importance sampling algorithm described in Algorithm 1 for generating arrival times for the Hawkes process. We consider this baseline only in the single-task setting, where reference rumours are not considered.
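The univariate Hawkes intensity used for this baseline can be sketched as follows, with the squared exponential form $a \exp(-(t - t')^2 / l)$ assumed for $k_{time}$ as in the Model section. Parameter values and names in the example are hypothetical; in the paper they are learnt by maximizing the likelihood.

```python
import math


def k_time(t1, t2, a, l):
    """Squared exponential kernel over time."""
    return a * math.exp(-((t1 - t2) ** 2) / l)


def hawkes_intensity(t, past_times, mu, a, l):
    """Base rate mu plus a kernel contribution from every earlier tweet."""
    return mu + sum(k_time(tj, t, a, l) for tj in past_times if tj < t)


# Only the tweet at 0.5 precedes t = 1.0; tweets at or after t are ignored.
value = hawkes_intensity(1.0, [0.5, 1.0, 1.5], mu=0.2, a=1.0, l=1.0)
```

Each earlier tweet can only raise the intensity above the base rate, which is the self-exciting behaviour referred to above.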
LGCP settings In the case of LGCP, the model parameters of the intensity function associated with a rumour are learnt from the observed inter-arrival times from that rumour alone. LGCP Pooled and LGCPTXT consider a different setting where this is learnt additionally using the inter-arrival times of all other rumours observed over the entire two hour lifespan.
Results Table 1 reports the results of predicting arrival times of tweets in the second hour of the rumour lifecycle. In terms of ARMSE, LGCP is the best method, performing better than LGCPTXT (though not statistically significantly) and outperforming other approaches. However, this metric does not penalize for the wrong number of predicted arrival times. Figure 1b depicts an example rumour, where LGCP greatly overestimates the number of points in the interval of interest. Here, the three points from the ground truth (denoted by black crosses) and the initial three points predicted by the LGCP model (denoted by red pluses) happen to lie very close, yielding a low ARMSE. However, LGCP predicts a large number of arrivals in this interval, making it a bad model compared to LGCPTXT, which predicts only four points (denoted by blue dots). ARMSE fails to capture this and hence we use PRMSE. Note that the Hawkes process performs worse than the LGCP approach.
According to PRMSE, LGCPTXT is the most successful method, significantly outperforming all others according to the Wilcoxon signed rank test. Figure 1a depicts the behavior of LGCP and LGCPTXT on rumour 39, which has a larger number of points in the ground truth. Here, LGCPTXT predicts fewer arrivals than LGCP. The performance of the Hawkes process is again worse than the LGCP approach. The self-exciting nature of the Hawkes process may not be appropriate for this dataset and setting, where in the second hour the number of points tends to decrease as time passes.
We also note that GPLIN performs very poorly according to PRMSE. This is because the inter-arrival times predicted by GPLIN for several rumours become smaller as time grows, resulting in a large number of arrival times.

Conclusions
This paper introduced log-Gaussian Cox processes for the problem of predicting the inter-arrival times of tweets. We showed how text from posts helps to achieve significant improvements. Evaluation on a set of rumours from the Ferguson riots showed the efficacy of our methods compared to baselines. The proposed approaches are generalizable to problems other than rumours, e.g. disaster management and advertisement campaigns.