Content-based Popularity Prediction of Online Petitions Using a Deep Regression Model

Online petitions are a cost-effective way for citizens to collectively engage with policy-makers in a democracy. Predicting the popularity of a petition — commonly measured by its signature count — based on its textual content has utility for policymakers as well as those posting the petition. In this work, we model this task using CNN regression with an auxiliary ordinal regression objective. We demonstrate the effectiveness of our proposed approach using UK and US government petition datasets.


Introduction
A petition is a formal request for change or action, addressed to an authority and co-signed by a group of supporters. Research has shown the impact of online petitions on the political system (Lindner and Riehm, 2011; Hansard, 2016; Bochel and Bochel, 2017). Modeling the factors that influence petition popularity, measured by the number of signatures a petition receives, can provide valuable insights to policy makers as well as those authoring petitions (Proskurnia et al., 2017).
Previous work on modeling petition popularity has focused on predicting popularity growth over time based on an initial popularity trajectory (Hale et al., 2013; Yasseri et al., 2017; Proskurnia et al., 2017), e.g., predicting the total number of signatures at the end of a petition's lifetime given the number of signatures it receives in the first x hours. Asher et al. (2017) and Proskurnia et al. (2017) examine the effect of sharing petitions on Twitter on their overall success, framed as a time series regression task. Other work has analyzed the importance of content for the success of a petition (Elnoshokaty et al., 2016). Proskurnia et al. (2017) also consider the anonymity of authors and whether a petition is featured on the front page of the website as additional factors. Huang et al. (2015) analyze 'power' users on petition platforms, and show their influence on other petition signers.
In general, the target authority of a petition can be political or non-political. In this work, we use petitions from the official UK and US government websites, through which citizens can directly appeal to the government for action on an issue. UK petitions are guaranteed an official response at 10k signatures, and a parliamentary debate on the topic at 100k signatures; US petitions are guaranteed a response from the government at 100k signatures. Political scientists refer to this as advocacy democracy (Dalton et al., 2003), in that people are able to engage with elected representatives directly. Our objective is to predict the popularity of a petition at the end of its lifetime, based solely on the petition text. Elnoshokaty et al. (2016) is the closest work to this paper, in that they target Change.org petitions and perform correlation analysis of popularity with a petition's category, target goal set, and the distribution of words in General Inquirer categories (Stone et al., 1962). In our case, we are interested in the task of automatically predicting the number of signatures.
We build on the convolutional neural network (CNN) text regression model of Bitvai and Cohn (2015) to infer deep latent features. In addition, we evaluate the effect of an auxiliary ordinal regression objective, which can discriminate petitions that attract different scales of popularity (e.g., 10 signatures, the minimum count needed for a petition not to be closed, vs. 10k signatures, the minimum count to receive a response from the UK government).
Finally, motivated by text-based message propagation analysis work (Tan et al., 2014;Piotrkowicz et al., 2017), we hand-engineer features which capture wording effects on petition popularity, and measure the ability of the deep model to automatically infer those features.

Proposed Approach
Inspired by the successes of CNNs for text categorization (Kim, 2014) and text regression (Bitvai and Cohn, 2015), we propose a CNN-based model for predicting the signature count. An outline of the model is provided in Figure 1. A petition has three parts: (1) title; (2) main content; and (3) (optionally) additional details, applicable to the UK government petitions only. We concatenate all three parts to form a single document for each petition. We have n petitions as input training examples of the form {a_i, y_i}, where a_i and y_i denote the text and signature count of petition i, respectively. Note that we log-transform the signature count, consistent with previous work (Elnoshokaty et al., 2016; Proskurnia et al., 2017).
We represent each token in the document via its pretrained GloVe embedding (Pennington et al., 2014), which we update during learning. We then apply multiple convolution filters of widths one, two, and three to the dense input document matrix, and apply a ReLU to each. The filter outputs are then passed through a max-pooling layer with a tanh activation function, and finally a multi-layer perceptron with exponential linear unit activations, to obtain the final output ŷ_i, which is guaranteed to be positive. We train the model by minimizing the mean squared error in log-space, where ŷ_i is the estimated signature count for petition i. We refer to this model as CNN_regress.
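As a concrete illustration, the forward pass described above can be sketched in NumPy; the dimensions, weights and function names here are toy placeholders of our own (random values stand in for the trained GloVe embeddings and learned filters), not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def elu(x):
    return np.where(x > 0, x, np.exp(x) - 1.0)

def conv_max_pool(E, W):
    """Apply one convolution filter of width k over the token
    dimension, a ReLU, then max-over-time pooling."""
    k, d = W.shape
    n = E.shape[0] - k + 1
    feats = np.array([np.sum(E[i:i + k] * W) for i in range(n)])
    return np.max(relu(feats))

def cnn_regress_forward(tokens, emb, filters, V, b):
    """Minimal forward pass: embedding lookup -> convolutions of
    widths 1-3 with ReLU -> max-pooling with tanh -> dense output
    layer with ELU activation -> predicted log signature count."""
    E = emb[tokens]                                   # (num_tokens, emb_dim)
    pooled = np.tanh(np.array([conv_max_pool(E, W) for W in filters]))
    return float(elu(pooled @ V + b))                 # scalar prediction

# Toy setup: vocabulary of 50 words, 8-dim embeddings,
# 4 filters per width for widths 1, 2 and 3.
emb = rng.normal(size=(50, 8))
filters = [rng.normal(size=(k, 8)) * 0.1 for k in (1, 2, 3) for _ in range(4)]
V = rng.normal(size=(12,))
b = 0.0
y_hat = cnn_regress_forward(rng.integers(0, 50, size=20), emb, filters, V, b)
```

With random weights the output is of course meaningless; the sketch only shows how the layers compose.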

Auxiliary Ordinal Regression Task
We augment the regression objective with an ordinal regression task, which discriminates petitions that achieve different scales of signatures. The intuition behind this is that there are pre-determined signature thresholds which trigger different events, the most important being 10k (to guarantee a government response) and 100k (to trigger a parliamentary debate) for the UK petitions, and 100k (to get a government response) for the US petitions. In addition to predicting the number of signatures, we would like to be able to predict whether a petition is likely to meet these thresholds, and to this end we use the exponential ordinal scale based on the thresholds O = {10, 100, 1000, 10000, 100000}. Overall this follows the exponential distribution of signature counts closely (Yasseri et al., 2017).

[Figure 1: CNN-Regression Model. y denotes the signature count; >r_k is the auxiliary task that denotes p(petition attracting >r_k signatures). The custom features are passed through fully connected layers.]

We transform the ordinal regression problem into a series of simpler binary classification subproblems, as proposed by Li and Lin (2007), constructing a binary classification objective for each threshold in O. For each petition i we construct an additional binary vector o_i, with a 0-1 encoding for each of the ordinal classes, giving training examples of the form {a_i, y_i, o_i}. The transformation is done in a consistent way: if a petition has y signatures, then in addition to the immediate lower-bound threshold in O determined by l = ⌊log10(y)⌋ (for y < 10^6), all classes with a lesser threshold are also set to 1 (o_{t: t<l}).
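The threshold encoding can be sketched as follows; whether a count exactly at a threshold is counted as reaching it is our assumption, as the paper leaves the boundary case implicit:

```python
import numpy as np

THRESHOLDS = [10, 100, 1000, 10000, 100000]  # the ordinal scale O

def ordinal_encode(y, thresholds=THRESHOLDS):
    """One binary target per threshold in O: bit k is set iff the
    petition's signature count reaches thresholds[k] (we treat the
    boundary count itself as inclusive)."""
    return np.array([1 if y >= t else 0 for t in thresholds])

# A petition with 12,500 signatures reaches 10 through 10,000
# but not 100,000:
ordinal_encode(12500)  # -> array([1, 1, 1, 1, 0])
```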
With this transformation, apart from the real-valued output ŷ_i, we also learn a mapping from h_i with a sigmoid activation for each ordinal class (r_i). Finally, we minimize the cross-entropy loss for each binary classification task, denoted L_aux.
Overall, the loss function for the joint model is:

L = L_mse + γ · L_aux

where L_mse is the mean squared error of the regression task in log-space, and γ ≥ 0 is a hyper-parameter which is tuned on the validation set. We refer to this model as CNN_regress+ord.
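A minimal sketch of the joint objective, assuming sigmoid outputs r_hat for the ordinal tasks; the default γ = 0.1 is an arbitrary placeholder, since the paper tunes γ on the validation set:

```python
import numpy as np

def joint_loss(y_log, y_hat_log, o, r_hat, gamma=0.1, eps=1e-7):
    """Joint objective L = L_mse + gamma * L_aux: squared error on
    the log signature count plus binary cross-entropy over the
    ordinal threshold targets o."""
    l_mse = np.mean((y_hat_log - y_log) ** 2)
    p = np.clip(r_hat, eps, 1.0 - eps)          # avoid log(0)
    l_aux = -np.mean(o * np.log(p) + (1 - o) * np.log(1 - p))
    return l_mse + gamma * l_aux
```

A perfect prediction on both tasks drives the loss to (numerically) zero, while errors on either task increase it.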

Hand-engineered Features
We hand-engineered custom features, partly based on previous work on non-petition text. These include features from Tan et al. (2014) and Piotrkowicz et al. (2017) such as structure, syntax, bias, polarity, informativeness of title, and novelty (or freshness), in addition to novel features developed specifically for our task, such as policy category and political bias features. We provide a brief description of the features below:

• Additional Information (ADD): binary flag indicating whether the petition has additional details or not.

The custom features are passed through a hidden layer with tanh activations (c_i), and concatenated with the hidden representation learnt from the dense input document (Section 2), [h_i; c_i], before mapping to the output layer (Figure 1). We refer to this model as CNN_regress+ord+feat. We use the Adam optimizer (Kingma and Ba, 2014) to train all our models.

Evaluation
We collected our data from the UK and US government websites over the terms of the 2015-17 Conservative and 2011-14 Democratic governments, respectively. The UK dataset contains 10,950 published petitions, with over 31m signatures in total. We removed US petitions with ≤ 150 signatures, resulting in a total of 1,023 petitions, with over 12m signatures in total. We split the data chronologically into train/dev/test splits based on an 80/10/10 breakdown. The distribution over log signature counts is given in Figures 2 and 3.
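The chronological split can be sketched as below, assuming the petitions are already sorted by creation date (the function name is ours):

```python
def chronological_split(petitions, train=0.8, dev=0.1):
    """Split petitions (sorted by creation date) into train/dev/test
    without shuffling, so the dev and test sets are strictly later
    in time than the training set."""
    n = len(petitions)
    i, j = int(n * train), int(n * (train + dev))
    return petitions[:i], petitions[i:j], petitions[j:]
```

Splitting by time rather than at random avoids leaking future petitions into training, which matters when topical trends shift over a government's term.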
To analyze whether each feature varies significantly across the ordinal groups O, we ran a Kruskal-Wallis test (at α = 0.05; Kruskal and Wallis (1952)) on the training set. The test returns the test statistic H and the corresponding p-value, with a high H indicating a difference between the groups. The analysis is given in Table 2, where p < 0.001, p < 0.01 and p < 0.05 are denoted as "***", "**" and "*", respectively. Note that the ordinal groups are different for the two datasets: analyzing the UK dataset with the same ordinal groups used for the US dataset ({1000, 10000, 100000}) resulted in a similarly sparse set of significance values for non-syntactic features as the US dataset.
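For reference, the H statistic compares the mean rank of each group against the overall mean rank; a minimal version (without the tie correction that standard implementations such as scipy.stats.kruskal apply) can be sketched as:

```python
import numpy as np

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction): rank all
    observations jointly, then measure how far each group's mean
    rank deviates from the overall mean rank (N + 1) / 2."""
    pooled = np.concatenate(groups)
    order = np.argsort(pooled)
    ranks = np.empty(len(pooled))
    ranks[order] = np.arange(1, len(pooled) + 1)   # ranks start at 1
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        r = ranks[start:start + len(g)]
        start += len(g)
        total += len(g) * (r.mean() - (n + 1) / 2.0) ** 2
    return 12.0 / (n * (n + 1)) * total
```

Two clearly separated groups yield a large H; a single group (or identically ranked groups) yields H = 0.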
We benchmark our proposed approach against the following baseline approaches:

• Mean: average signature count in the training set.
• Linear_BoW: linear regression (Linear) model using TF-IDF weighted bag-of-words features.
• Linear_GI: linear regression model based on word distributions from the General Inquirer lexicon; similar to Elnoshokaty et al. (2016), but without the target goal set or category of the petition (neither of which is relevant to our datasets).
• SVR_BoW: support vector regression (SVR) model with an RBF kernel and TF-IDF weighted bag-of-words features.
• SVR_feat: SVR model using the hand-engineered features from Section 3.
• SVR_BoW+feat: SVR model using combined TF-IDF weighted bag-of-words and hand-engineered features.

We present the regression results for the baseline and proposed approaches based on: (1) mean absolute error (MAE); and (2) mean absolute percentage error (MAPE, similar to Proskurnia et al. (2017)), calculated as (100/n) Σ_{i=1}^{n} |ŷ_i − y_i| / y_i. Results are given in Table 1.
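The two error measures can be computed directly from the predicted and actual counts:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(y_pred - y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error: (100/n) * sum(|y_hat - y| / y)."""
    return 100.0 * np.mean(np.abs(y_pred - y_true) / y_true)

# Toy example: two petitions, over- and under-predicted by 10%.
y_true = np.array([100.0, 1000.0])
y_pred = np.array([110.0, 900.0])
mae(y_true, y_pred)   # -> 55.0
mape(y_true, y_pred)  # -> 10.0
```

Note how MAPE weights relative error equally across petitions of very different scales, whereas MAE is dominated by the errors on large petitions.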
The proposed CNN models outperform all of the baselines. Comparing the CNN model with the regression loss only, CNN_regress, against the joint model, CNN_regress+ord, the latter is superior across both datasets and measures. When we add the hand-engineered features (CNN_regress+ord+feat), there is a small further improvement. To isolate the effect of the hand-engineered features without the ordinal regression loss, we combine them with the regression task only (CNN_regress+feat), which mildly improves over CNN_regress, but falls below CNN_regress+ord+feat. We also evaluate a variant of CNN_regress+ord+feat with an additional hidden layer, given in the final row of Table 1, and find that it leads to further improvements in the regression results. Adding more hidden layers did not yield further improvements.

Classification Performance
The F-score is calculated over the three classes [0, 10000), [10000, 100000) and [100000, ∞) (corresponding to the thresholds at which a petition triggers a government response or parliamentary debate) for the UK dataset, and the two classes [150, 100000) and [100000, ∞) for the US dataset, by determining whether the predicted and actual signature counts fall in the same bin. We also built an SVM-based ordinal classifier (Li and Lin, 2007) over the significant ordinal classes, as an additional baseline. The CNN models struggle to improve the F-score, in large part due to the imbalanced data. For the UK dataset, the CNN models with an ordinal objective (CNN_regress+ord and CNN_regress+ord+feat) achieve a macro-averaged F-score of 0.36, compared to 0.33 for all other methods. For the US dataset, which is a binary classification task, all methods obtain a 0.49 F-score. In addition to text, considering other factors such as early signature growth (Hale et al., 2013), which determines how quickly an issue gains visibility on the US website, may be necessary.
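The binning and macro-averaged F-score described above can be sketched as below; treating a count exactly on a boundary as belonging to the upper bin is our assumption:

```python
UK_BINS = [10000, 100000]  # boundaries for [0,10k), [10k,100k), [100k,inf)

def to_bin(y, bins=UK_BINS):
    """Map a signature count to its ordinal class index."""
    return sum(y >= b for b in bins)

def macro_f1(gold, pred, n_classes=3):
    """Macro-averaged F1 over signature-count bins: per-class F1,
    averaged with equal weight regardless of class frequency."""
    scores = []
    for c in range(n_classes):
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / n_classes

# Toy example: predicted vs. actual counts for four petitions.
gold = [to_bin(y) for y in (500, 20000, 150000, 50)]
pred = [to_bin(y) for y in (800, 9000, 200000, 30)]
```

Because every class contributes equally to the macro average, a model that misses the rare high-signature bins is penalized heavily, which is exactly the difficulty noted above for the imbalanced data.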

Latent vs. Hand-engineered Features
Finally, we built a linear regression model with the estimated hidden features from CNN_regress+ord as independent variables and each hand-engineered feature as the dependent variable, to study their linear dependencies in a pair-wise fashion. The most significant dependencies (given by p-value, p_hidden) over the test set are given in Table 2. We found that the model is able to learn latent feature representations for the syntactic features (NNC, VBC, ADC, RBC), FRE, NEC, IND and DEF, but not the other features; these can be considered to provide deeper information than can be extracted automatically from the data, or else information that has no utility for the signature prediction task.

[Table 2: Dependency of hand-engineered features against the signature count (p and H) and deep hidden features (p_hidden). ADD is not applicable for the US government petitions dataset. p < 0.001, p < 0.01 and p < 0.05 are denoted as "***", "**" and "*", respectively.]
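A simplified version of this analysis can be sketched as follows; for brevity we report pairwise r² as the dependency strength, rather than the regression p-values used in the paper, and all names and toy data here are ours:

```python
import numpy as np

def pairwise_dependency(hidden, features):
    """For each (hidden unit, hand-engineered feature) pair, the
    squared Pearson correlation, i.e. the r^2 of a simple pairwise
    linear regression, as a measure of linear dependency."""
    h = (hidden - hidden.mean(0)) / hidden.std(0)
    f = (features - features.mean(0)) / features.std(0)
    r = h.T @ f / len(h)        # (n_hidden, n_features) correlations
    return r ** 2

# Toy data: feature 0 is (noisily) a linear function of hidden
# unit 0, while feature 1 is pure noise.
rng = np.random.default_rng(1)
hidden = rng.normal(size=(200, 4))
features = np.stack(
    [hidden[:, 0] * 2.0 + rng.normal(scale=0.1, size=200),
     rng.normal(size=200)],
    axis=1)
r2 = pairwise_dependency(hidden, features)
```

In this toy setup the dependency of feature 0 on hidden unit 0 is near 1, while the noise feature shows no strong dependency on any hidden unit, mirroring the distinction between "learnable" and "complementary" hand-engineered features above.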
Overall, our proposed approach with the auxiliary loss and hand-engineered features (CNN_regress+ord+feat) reduces MAE over CNN_regress by 2.1% and 3.2%, and over SVR by 7.2% and 13.7%, on the UK and US datasets, respectively. Although the ordinal classification performance is not high, it must be noted that the data is heavily skewed (only 2% of the UK test set falls in the [10000, 100000) and [100000, ∞) bins combined), and that we tuned the hyper-parameters with respect to the regression task only.

Conclusion and Future Work
This paper has targeted the prediction of the popularity of petitions directed at the UK and US governments. In addition to introducing a novel task and dataset, the contributions of our work include: (a) demonstrating the utility of an auxiliary ordinal regression objective; and (b) determining which hand-engineered features are complementary to our deep learning model. In future work, we aim to study other factors that can influence petition popularity in conjunction with text, e.g., social media campaigns, news coverage, and early growth rates.