Public Sentiment Drift Analysis Based on Hierarchical Variational Auto-encoder

Detecting public sentiment drift is a challenging task because sentiment changes over time. Existing methods first build a classification model using historical data and subsequently detect drift if the model performs much worse on new data. In this paper, we focus on distribution learning: we propose a novel Hierarchical Variational Auto-Encoder (HVAE) model to learn better distribution representations, and design a new drift measure that directly evaluates distribution changes between historical data and new data. Our experimental results demonstrate that our proposed model achieves better results than three existing state-of-the-art methods.


Introduction
Public sentiment, whose information is hidden in a temporal sequence of documents, is becoming increasingly valuable for real-world applications. In particular, identifying when public sentiment drifts occur is of great importance to stakeholders such as government agencies, companies and news agencies, who can then take proactive action to avoid damage and pay close attention to new topics and sentiments (Hu et al., 2017). However, the dynamic nature of drifts makes this a challenging problem, even though concept drift analysis can be applied to detect variation of data distributions over time. Given historical data and newly arriving data, accurately detecting drift based on distributional change is a critical issue.
Although many methods have been proposed for sentiment analysis (Wu et al., 2019; Fu et al., 2019; Kong et al., 2019; Hoang et al., 2019), many of them are not designed for the sentiment drift detection task. Xia et al. (2016) focus on polarity shift detection, but at the document level rather than the multi-document (public) level. Recently, some research has been conducted on stream sentiment classification (Iosifidis et al., 2017). In particular, statistical process control (SPC) methods (Ross et al., 2012; Raza et al., 2015) build a detection mechanism that accumulates statistical information on drift indications, and Zhou et al. (2018) applied this mechanism to sentiment drift detection. In addition, Bifet and Gavalda (2007) proposed the ADWIN method, which uses the upper bound of Hoeffding's inequality to mark drifts, and Nguyen et al. (2018) combined ADWIN with variational inference to build an online classification system. Wang et al. (2013) proposed a threshold-based opinion drift detection method, which restricts its applicability. The framework proposed by Liu et al. (2016) contains an opinion shift detector based on KL divergence, but its detection performance depends on its labeling results. Tsytsarau and Palpanas (2016) defined a novel concept of opinion contradiction and used it in a sentiment change detection experiment; however, this pairwise method uses little historical information.
We observe that most of the above methods detect drifts indirectly instead of directly evaluating distribution differences, which leads to less effective results. The Variational Auto-Encoder (VAE), proposed by Kingma and Welling (2014), is capable of learning latent distributions of its inputs and has better generalization performance (Zhao et al., 2018). We therefore propose a novel Hierarchical Variational Auto-Encoder (HVAE) model to tackle the sentiment drift problem. In particular, we treat changes in the sentiment distribution as drifts. In practice, sentiments are represented as 2D vectors whose dimensions correspond to polarities (positive or negative) and whose values give the corresponding intensities. Our main contributions can be summarized as follows:
1. We propose an HVAE model, which uses 3-level meta-distributions to extend the VAE to a hierarchical structure, enabling effective learning of latent distribution representations of input sentiments.
2. We propose a new drift detection measure to compare historical and new data distributions learned by the proposed HVAE.
3. Extensive experimental results on real-world data demonstrate that our proposed model is significantly better than state-of-the-art methods for sentiment drift detection.

Methodology
We now introduce our proposed methodology, comprising the HVAE model and the drift measure. At the bottom level of Fig. 1, each document s is viewed as a sample from a middle-level distribution z_i; on the same principle, each z_i distribution is sampled from the top-level meta-distribution z.
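To make the three-level structure concrete, the following is a minimal sketch of ancestral sampling through the hierarchy (top-level z, then one z_i per time period, then documents s). It assumes, as our own simplification rather than the paper's parameterization, that every level is an isotropic Gaussian centred on its parent. The function name `sample_hierarchy` and the scales `tau_mid` and `tau_doc` are hypothetical; `dim=2` mirrors the 2D sentiment vectors.

```python
import numpy as np


def sample_hierarchy(n_periods, docs_per_period, dim=2,
                     tau_mid=0.5, tau_doc=0.2, seed=0):
    """Ancestral sampling: top-level z -> middle-level z_i -> documents s.

    Hypothetical sketch: each level is an isotropic Gaussian centred on its
    parent; tau_mid and tau_doc are assumed spreads, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    # Top level: a single meta-distribution draw for the whole window.
    z = rng.normal(0.0, 1.0, size=dim)
    # Middle level: one z_i per time period, centred on z.
    z_i = z + tau_mid * rng.normal(size=(n_periods, dim))
    # Bottom level: the documents of period i are noisy samples around z_i.
    s = z_i[:, None, :] + tau_doc * rng.normal(size=(n_periods, docs_per_period, dim))
    return z, z_i, s


z, z_i, s = sample_hierarchy(n_periods=5, docs_per_period=100)
print(z.shape, z_i.shape, s.shape)  # (2,) (5, 2) (5, 100, 2)
```

The HVAE encoder works in the opposite direction: given the documents of a window, it infers the distributions over z_i and z.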

HVAE Model
The HVAE is composed of an encoder and a decoder. The encoder infers latent distributions from the inputs; Eqs. 1 and 2 give the meta-distribution inferred among the input sentiments s. The decoder, on the other hand, generates the inputs from the learned latent variables. In Eqs. 5 and 7, z and z_i are sampled; S, indexed over 1:L and N, denotes the historical data in the window; and log p(S) is the log-likelihood of fitting the inputs. The Evidence Lower Bound (ELBO) of the log-likelihood is maximized during training, and the latent meta-distributions are learned at the same time.
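As an illustration of how such an ELBO is assembled (not the paper's exact Eqs. 1 to 7), the sketch below computes a one-sample ELBO for a generic three-level Gaussian hierarchy. Our assumptions: diagonal Gaussian posteriors q(z) and q(z_i), a standard normal prior on z, a prior p(z_i | z) = N(z, tau_mid² I), and a unit-variance Gaussian likelihood around the decoder output. The names `kl_diag_gaussians` and `hierarchical_elbo`, the (N, L, D) array layout and `tau_mid` are all hypothetical.

```python
import numpy as np

LOG_2PI = np.log(2.0 * np.pi)


def kl_diag_gaussians(mu_q, sig_q, mu_p, sig_p):
    """KL(N(mu_q, sig_q^2) || N(mu_p, sig_p^2)) for diagonal Gaussians, summed."""
    return float(np.sum(np.log(sig_p / sig_q)
                        + (sig_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sig_p ** 2)
                        - 0.5))


def hierarchical_elbo(S, recon, q_top, q_mid, tau_mid=1.0, seed=0):
    """One-sample ELBO estimate for a hypothetical three-level Gaussian hierarchy.

    S, recon : (N, L, D) inputs and decoder means (assumed unit-variance likelihood);
               in a full model, recon would be decoded from samples of q(z_i).
    q_top    : (mu, sigma), each of shape (D,), posterior over the top-level z
    q_mid    : (mu, sigma), each of shape (N, D), posteriors over the middle-level z_i
    """
    rng = np.random.default_rng(seed)
    mu_z, sig_z = q_top
    mu_zi, sig_zi = q_mid
    # Reconstruction term: log N(S; recon, I), summed over all entries.
    log_lik = float(-0.5 * np.sum((S - recon) ** 2 + LOG_2PI))
    # Top level: KL(q(z) || N(0, I)).
    kl_top = kl_diag_gaussians(mu_z, sig_z, np.zeros_like(mu_z), np.ones_like(sig_z))
    # Middle level: draw z with the reparameterization trick, then
    # KL(q(z_i) || N(z, tau_mid^2 I)) summed over all periods i.
    z = mu_z + sig_z * rng.standard_normal(mu_z.shape)
    kl_mid = kl_diag_gaussians(mu_zi, sig_zi,
                               np.broadcast_to(z, mu_zi.shape),
                               np.full_like(sig_zi, tau_mid))
    return log_lik - kl_top - kl_mid
```

Training maximizes this quantity with respect to the encoder and decoder parameters; the KL terms are what tie each z_i to its parent meta-distribution.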
Note that at the bottom level of HVAE, the input information of each time period is compressed into a less noisy, condensed representation in the form of a middle-level distribution. In the same spirit, information from the middle-level time periods of a window is further compressed into a concise meta-distribution. Through training, all meta-distributions become more representative. As a result, the subsequent drift comparison step benefits significantly from these better-learned distribution representations, and tolerates the chaotic data that frequently accompanies sentiment drift better than existing methods do.

Drift Measure
The drift measure evaluates the distributional difference between historical and newly arrived data. Unlike existing methods, which are typically based on classifier performance degradation, we compare the data distributions directly, which makes detection more effective. In particular, we measure drift between two distributions, namely z|z and z_new|S_new. The difference between them is illustrated as the shaded area in Fig. 2: the larger the shaded area, the less similar the new and historical data distributions. This comparison is the built-in drift detection step of the HVAE model. Mathematically, the irregular shaded area is the integral of the difference between the two distributions. Accordingly, we propose a new measure, Accumulation of Distribution Differences (ADD), given in Eq. 10. The parameters of the latent distribution over the newly arrived data, z_new, are (μ_φnew, σ_φnew), and the parameters of the latent distribution within the latest historical data window, z|z, are (μ_θ, σ_θ). The drift indicator p grows as the drift becomes more significant, and vice versa.
More specifically, we compute the intersections of the two distribution curves, denoted x_1 and x_2 with x_1 ≤ x_2 (x_1 = x_2 when the curves intersect only once, which happens when both distributions have the same variance). The intersections split the curves into segments, and the accumulated probability differences over these segments are normalized to [0, 1] to give our final drift score.
Figure 2: The difference between two Gaussian distributions, represented as the shaded area; x_1 and x_2 are the intersections of the distributions.
We assume Gaussian distributions because, according to the Central Limit Theorem, the mean sentiment across multiple documents is likely to be approximately Gaussian.
When the inputs are extremely unstable (large drifts), many of the scores will be very close to 1, which degrades system performance. To increase sparsity, the score can therefore be squared to give the final drift score p² (Eq. 11), which we call ADD².
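A minimal sketch of ADD and ADD², not the paper's Eq. 10 verbatim, under these assumptions: both latent distributions are 1D Gaussians, the intersections x_1 ≤ x_2 are found by equating the two log-densities (a quadratic in x), and the accumulated segment differences, which lie in [0, 2], are halved to normalize them to [0, 1]. With this normalization the score equals the total variation distance between the two Gaussians. The function names and the `squared` flag are hypothetical.

```python
import math


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2); math.erf maps +/-inf to +/-1, so infinite bounds work."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def gaussian_intersections(mu1, s1, mu2, s2):
    """Sorted x-coordinates where the two Gaussian pdfs cross (x1 <= x2)."""
    if math.isclose(s1, s2):
        if math.isclose(mu1, mu2):
            return []                      # identical curves: nothing to accumulate
        return [(mu1 + mu2) / 2.0]         # equal variances: single crossing, x1 = x2
    # log f1 = log f2  <=>  a*x^2 + b*x + c = 0
    a = 1.0 / (2 * s1 ** 2) - 1.0 / (2 * s2 ** 2)
    b = mu2 / s2 ** 2 - mu1 / s1 ** 2
    c = mu1 ** 2 / (2 * s1 ** 2) - mu2 ** 2 / (2 * s2 ** 2) + math.log(s1 / s2)
    root = math.sqrt(b * b - 4 * a * c)    # positive whenever s1 != s2
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])


def add_score(mu_hist, s_hist, mu_new, s_new, squared=False):
    """Accumulate |mass_hist - mass_new| over the segments between intersections."""
    bounds = [-math.inf, *gaussian_intersections(mu_hist, s_hist, mu_new, s_new), math.inf]
    total = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        mass_hist = normal_cdf(hi, mu_hist, s_hist) - normal_cdf(lo, mu_hist, s_hist)
        mass_new = normal_cdf(hi, mu_new, s_new) - normal_cdf(lo, mu_new, s_new)
        total += abs(mass_hist - mass_new)
    p = total / 2.0                        # assumed normalization to [0, 1]
    return p * p if squared else p         # ADD^2 when squared=True


print(round(add_score(0.0, 1.0, 2.0, 1.0), 4))                # 0.6827
print(round(add_score(0.0, 1.0, 2.0, 1.0, squared=True), 4))  # 0.4661
```

Because one curve dominates on each segment, summing segment-wise probability-mass differences gives the exact shaded area without numerical integration. Squaring shrinks moderate scores proportionally more than large ones (0.3 becomes 0.09, while 0.9 becomes 0.81).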
Using ADD, the drift scores of all time periods are collected in the current window W. Whether to update the HVAE with the new data in the window can be viewed as a Bernoulli experiment with parameter p̄_i = mean(p_1, p_2, ..., p_N) and standard deviation σ_i = sqrt(p̄_i(1 − p̄_i)/i). These parameters are passed to the SPC method (Bouchachia, 2011) for drift detection, and the model is retrained/updated with the next window whenever SPC signals a drift. If SPC raises no alarm, the window moves forward by one time period and new parameters are computed as new data arrive.
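A minimal sketch of the SPC step, assuming the widely used rule of Gama et al. (2004), which may differ in detail from the Bouchachia (2011) variant used here: track the minimum of p̄_i + σ_i, warn when the current value exceeds p_min + 2·s_min, and signal drift when it exceeds p_min + 3·s_min. The warm-up length, the strict inequalities and the name `spc_detect` are our assumptions.

```python
import math


def spc_detect(scores, warmup=5, warn_k=2.0, drift_k=3.0):
    """Scan a stream of ADD scores in [0, 1]; return (drift_index or None, states).

    Hypothetical sketch of an SPC rule: each score is treated as a Bernoulli
    outcome, and p_bar + sigma is compared against the best level seen so far.
    """
    total = 0.0
    p_min, s_min = math.inf, math.inf
    states = []
    for i, p in enumerate(scores, start=1):
        total += p
        p_bar = total / i                               # running mean of drift scores
        sigma = math.sqrt(p_bar * (1.0 - p_bar) / i)    # Bernoulli std of the mean
        if i < warmup:
            states.append("warmup")
            continue
        if p_bar + sigma < p_min + s_min:               # new best (most stable) level
            p_min, s_min = p_bar, sigma
        if p_bar + sigma > p_min + drift_k * s_min:
            states.append("drift")
            return i - 1, states                        # 0-based index of the alarm
        if p_bar + sigma > p_min + warn_k * s_min:
            states.append("warning")
        else:
            states.append("in-control")
    return None, states


idx, states = spc_detect([0.1] * 20 + [0.9] * 10)
print(idx)  # 23
```

In the toy stream, scores stay low for 20 periods and then jump; the rule passes through two warning periods before signalling drift, at which point the model would be retrained on the next window.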

Experiments
We have conducted extensive experiments to evaluate our proposed HVAE model.

Datasets & Baselines
We employ two datasets in our experiments: Twitter data and CIRCLES data. The first is Twitter Sentiment¹, which contains 1.6M tweets created from 2009-04-06 to 2009-06-25. The dataset is split by hour into time periods, and we delete periods with empty categories, resulting in 432 periods. Note that the tweet inputs are sentiment labels in one-hot format.
¹ http://help.sentiment140.com/
The second dataset is CIRCLES, taken from Gama et al. (2004), whose points are sampled from a uniform distribution with x ∈ [0, 1.2] and y ∈ [0, 1]. It contains 40 data blocks, each consisting of 10 time periods that apply one of four category boundaries (Tab. 2). Each time period contains 100 two-dimensional numeric vectors. The CIRCLES data is noise-free and is used to simulate a gradual drift scenario. Together, the two datasets validate model performance in both a noise-free (ideal) and a real-world scenario. We compare against three state-of-the-art systems: 1) Nguyen et al. (2018), a VAE combined with a built-in drift detection method; 2) Zhou et al. (2018), an improved EWMA algorithm (Raza et al., 2015) based on statistical control charts; and 3) Iosifidis et al. (2017), a stream sentiment classification based method.

Experimental Settings
Given a sequence of tweets, we detect sentiment drifts across different time periods (cutting points). Drift detection is thus treated as a sequence segmentation task: the better the segmentation, the higher the overall accuracy. Therefore, our model and all baselines are compared with the same metric, overall accuracy. To run Nguyen et al. (2018) and Zhou et al. (2018) on the two datasets, we implemented and tested their drift detection components. Our experimental settings are the same as those of Iosifidis et al. (2017). An accumulative multinomial Naive Bayes (Accumulative MNB) model is employed as the sentiment classifier. Data are processed one time period at a time following the principle of prequential evaluation, and drift adaptation is done by rebuilding. Specifically, if no drift occurs when a new period arrives, the labels of its data are predicted, and the data are then appended to the training set to retrain the classifier. Otherwise, the current training set is abandoned and the classifier is updated with a new window. For CIRCLES, each detected drift is viewed as a category boundary switch, and data before the drift are evaluated with the previous boundary.
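The rebuild-style prequential protocol can be sketched as follows. This is a simplified illustration under our own assumptions: a small count-based multinomial Naive Bayes retrained from scratch stands in for the Accumulative MNB, drift decisions are supplied as precomputed flags rather than produced by a detector, and each period's data are appended with their true labels after being predicted. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Count-based multinomial Naive Bayes over token lists (stand-in for Accumulative MNB)."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    for toks, y in zip(docs, labels):
        token_counts[y].update(toks)
    vocab = {t for toks in docs for t in toks}
    return class_counts, token_counts, vocab


def predict_nb(model, toks):
    """Return the class with the highest Laplace-smoothed log-posterior."""
    class_counts, token_counts, vocab = model
    n, v = sum(class_counts.values()), len(vocab)

    def log_post(y):
        total = sum(token_counts[y].values())
        return math.log(class_counts[y] / n) + sum(
            math.log((token_counts[y][t] + 1) / (total + v)) for t in toks)

    return max(class_counts, key=log_post)


def prequential_accuracy(periods, drift_flags):
    """periods: list of (docs, labels); drift_flags[k]: was drift detected at period k?"""
    train_docs, train_labels = list(periods[0][0]), list(periods[0][1])
    model = train_nb(train_docs, train_labels)
    correct = total = 0
    for (docs, labels), drift in zip(periods[1:], drift_flags[1:]):
        # Test first: predict the new period with the current model.
        preds = [predict_nb(model, toks) for toks in docs]
        correct += sum(p == y for p, y in zip(preds, labels))
        total += len(labels)
        if drift:
            # Rebuild: abandon the old training set and restart from this period.
            train_docs, train_labels = list(docs), list(labels)
        else:
            # No drift: append this period to the training set (assumed true labels).
            train_docs += docs
            train_labels += labels
        model = train_nb(train_docs, train_labels)
    return correct / total


stable = ([["good"], ["bad"]], ["pos", "neg"])
flipped = ([["good"], ["bad"]], ["neg", "pos"])
stream = [stable, stable, flipped, flipped]
print(round(prequential_accuracy(stream, [False, False, True, False]), 4))  # 0.6667
print(round(prequential_accuracy(stream, [False] * 4), 4))                  # 0.3333
```

In the toy stream the label semantics flip at the third period: rebuilding on the detected drift recovers the last period, while accumulating stale data does not, which is why segmentation quality shows up directly in overall accuracy.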
For the ablation experiment, we generate several variants of HVAE: NoD does not use the decoder module, NoE does not use the encoder module, and Plain has only one level of meta-distribution. Table 3 shows that HVAE with the ADD² setting achieves the best accuracy, 0.55% better than HVAE with ADD. In addition, HVAE is 3.97%, 4.33% and 8.75% better than Nguyen et al. (2018), Zhou et al. (2018) and Iosifidis et al. (2017) respectively, indicating that it is highly effective for sentiment drift detection. Table 4 shows that HVAE with ADD² once again achieves the best result, 10.5% and 13.5% better than the two existing methods. Note that we did not compare with Iosifidis et al. (2017), as it cannot be applied to numeric data.
The ablation study clearly shows the importance of the proposed 3-level hierarchical structure for meta-distribution learning, and of the encoder and decoder modules. HVAE achieves better performance than all baselines and ablation models, which validates the effectiveness of the novel components of our model. All models achieve much better results on tweets than on CIRCLES, as gradual drift is more difficult to detect. In general, ADD² outperforms ADD, except for the Plain model on the Twitter dataset. Since the sentiment of tweets is stable in most time periods, the disadvantage of ADD may be less pronounced there. Moreover, the Plain variant, which has no latent meta-distributions, lacks generalization capability and performs much worse than the full model with ADD². In the ablation experiment on CIRCLES, unlike the Twitter results, NoE performs worst and Plain performs best. This is because the CIRCLES data are sampled from a uniform distribution, whereas the NoE model relies on a Gaussian distribution assumption and lacks the encoder for fitting the inputs.
Plain suffers less from the distribution assumption because it has no meta-distribution structure. Overall, the results indicate the importance of all of HVAE's components: we need to extract latent distributions from the inputs (encoder) as well as fit the inputs with those distributions (decoder).

Conclusions
To tackle the challenges of sentiment drift detection, we have proposed a novel HVAE model, which uses 3-level meta-distributions to extend the VAE to a hierarchical structure, enabling effective learning of latent distribution representations of input sentiments. In addition, we designed a new drift measure that effectively quantifies the distribution difference between historical and newly arrived data. Unlike existing classifier- and threshold-based models, the proposed method measures distribution differences directly and is therefore more effective. Finally, extensive experimental results on two benchmark datasets demonstrate that HVAE performs significantly better than three state-of-the-art techniques (two on CIRCLES, where Iosifidis et al. (2017) is not applicable), indicating that it can be effectively used for real-world public sentiment drift analysis.