Party Matters: Enhancing Legislative Embeddings with Author Attributes for Vote Prediction

Predicting how Congressional legislators will vote is important for understanding their past and future behavior. However, previous work on roll-call prediction has been limited to single-session settings, thus not allowing for generalization across sessions. In this paper, we show that text alone is insufficient for modeling voting outcomes in new contexts, as session changes lead to changes in the underlying data generation process. We propose a novel neural method for encoding documents alongside additional metadata, achieving an average accuracy boost of 4% over the previous state-of-the-art.


Introduction
Quantitative analysis of the voting behavior of legislators has long been a problem of interest in political science, and recently in NLP as well (Gerrish and Blei, 2011; Kraft et al., 2016). One of the most popular techniques in political science for modeling legislator behavior is the application of spatial, or ideal point, models built from voting records (Poole and Rosenthal, 1985; Clinton et al., 2004), which are often used to represent uni-dimensional or multi-dimensional ideological stances. While roll call votes (i.e., Congressional voting records) provide explanatory power about a legislator's position with respect to previously voted-on bills, these models are limited to in-sample analysis, and are thus incapable of predicting votes on new bills.
To address this limitation, recent work has introduced methods that take advantage of the text of the bill, along with the voting records, to model Congressional voting behavior (Gerrish and Blei, 2011; Nguyen et al., 2015; Kraft et al., 2016). This work is related to a long line of studies on using political text to model behavior, ranging over political books, Supreme Court decisions, speeches and Twitter (Mosteller and Wallace, 1963; Thomas et al., 2006; Yu et al., 2008; Sim et al., 2016; Iyyer et al., 2014a; Sim et al., 2013; Preoţiuc-Pietro et al., 2017).
In addition to enabling prediction, associating text with ideology allows for a further degree of interpretability. However, all previous work incorporating text into roll call prediction has limited its evaluation to in-session training and testing. As legislators typically serve for multiple sessions, and similar bills are proposed across sessions, we want to be able to leverage this data across sessions to inform our model. However, the generalizability of previous methods to a cross-session setting is unknown.
In this work, we explore the problem of roll call prediction across sessions. We show that previous methods are unable to generalize across sessions, thus suggesting that current text representations are not sufficient for modeling voting outcomes in new contexts. We hypothesize that each session has a different underlying data generation process, wherein the ideological position of the observed bills varies depending on the controlling party. This is supported by the observation that about 75% of bills up for a vote in a given session have a sponsor in the party in power.
As noted in Linder et al. (2018), the policy area, or topic, of the bill, and the ideological position, are two separate dimensions underlying the text. Since legislators tend to sponsor bills that are ideologically aligned with them, a model trained on a single session will mostly be exposed to bills with a specific ideology on each topic. Thus, a single session model may get the ideology information as an implicit prior without needing to explicitly capture it. This challenge was not obvious in previous studies that were limited to a single session. Across sessions, however, the ideological prior on a given topic changes, resulting in variations in voting patterns that are not captured by current text modeling methodologies alone.
In applications where the text may contain an insufficient signal, researchers may turn to additional metadata features. This technique has previously been used in various contexts, such as incorporating sponsor and committee features for predicting bill committee survival (Yano et al., 2012), and enhancing tweet recommendations with location data (Xing and Paul, 2017).
We propose a neural architecture that directly models the ideological variation across sessions using metadata about the bill sponsors, and show that this can strongly improve performance with little overhead in complexity and training time.

Model
Spatial voting models assume that a legislator has a numeric ideal point which represents their ideology. Legislators make voting decisions on bills, which also have a numeric representation. While the details of the implementation vary, spatial voting models share the idea that the closer a bill's representation is to a legislator's ideal point, the more likely the legislator is to vote yes.
Following this framework, we model the core vote prediction problem as follows: Given a legislator, L, and a bill, B, predict their vote y, with possible outcomes: yes or no.
Using these inputs, let $v_L$ be an embedding representing the legislator, and $v_B$ be the bill embedding. First, $v_B$ is projected into the legislator embedding space:

$$v_{BL} = W_B v_B + b_B$$

where $W_B$ and $b_B$ are a weight matrix and a bias vector, respectively. Then, we measure the alignment between the two vectors. Previous work used a dot product for this step; instead, we express the comparison as follows:

$$W_v (v_{BL} \odot v_L)$$

where $\odot$ represents element-wise multiplication, and $W_v$ is a weight vector of the same dimensions as $v_L$. Finally, we apply a sigmoid activation function to get the vote prediction:

$$p(y = \text{yes} \mid B, L) = \sigma(W_v (v_{BL} \odot v_L))$$

Using this architecture, we develop several novel bill representations. First, we consider different text-only representations, then we show how to incorporate metadata.
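The scoring steps described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names, the intermediate name `v_bl` for the projected bill vector, and the toy dimensions and values are assumptions, and no bias is added in the final scoring step because none is described in the text.

```python
import math

def project_bill(v_b, W_b, b_b):
    """Project the bill embedding into the legislator space: W_B v_B + b_B."""
    return [sum(w * x for w, x in zip(row, v_b)) + bias
            for row, bias in zip(W_b, b_b)]

def vote_probability(v_l, v_b, W_b, b_b, w_v):
    """Return p(yes) = sigmoid(w_v . (v_BL * v_L)), where * is element-wise."""
    v_bl = project_bill(v_b, W_b, b_b)
    score = sum(w * bl * l for w, bl, l in zip(w_v, v_bl, v_l))
    return 1.0 / (1.0 + math.exp(-score))

# Toy example (hypothetical values): 3-d bill embedding, 2-d legislator embedding.
v_l = [0.5, -1.0]
v_b = [1.0, 0.0, 2.0]
W_b = [[0.1, 0.2, 0.3], [0.0, -0.5, 0.4]]
b_b = [0.0, 0.1]
w_v = [1.0, 1.0]
p_yes = vote_probability(v_l, v_b, W_b, b_b, w_v)
```

When every entry of `w_v` is 1, the score reduces to the plain dot product used in previous work, so the learned weight vector can be read as a per-dimension rescaling of that comparison.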

Text Model
Previous work incorporating text has primarily been based on topic models (Gerrish and Blei, 2011; Lauderdale and Clark, 2014; Nguyen et al., 2015) and embeddings (Kraft et al., 2016). As the embedding framework achieved superior performance, we adopt a similar architecture. While Kraft et al. (2016) represented the text using a mean word embedding (MWE) representation, we replace it with a Convolutional Neural Network (CNN) representation (Kim, 2014), which has achieved superior performance on recent text classification tasks (Dauphin et al., 2016; Wen et al., 2016; Yang et al., 2016). Our CNN uses 4-grams and 400 filter maps.
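To make the difference between the two text encoders concrete, the following sketch contrasts an MWE with a single-width convolutional encoder over 4-grams. It assumes a ReLU nonlinearity and max-over-time pooling in the style of Kim (2014); the text here does not state which nonlinearity or pooling the model uses, and all names and random values are illustrative.

```python
import random

def mean_word_embedding(word_vectors):
    """MWE: average the word vectors dimension by dimension (non-empty input)."""
    n = len(word_vectors)
    return [sum(dim) / n for dim in zip(*word_vectors)]

def cnn_encode(word_vectors, filters, biases, window=4):
    """Apply each filter to every `window`-word span, ReLU, then max over time."""
    n = len(word_vectors)
    # Each span is the concatenation of `window` consecutive word vectors.
    spans = [
        [x for vec in word_vectors[i:i + window] for x in vec]
        for i in range(max(n - window + 1, 0))
    ]
    features = []
    for filt, bias in zip(filters, biases):
        activations = [max(0.0, sum(f * x for f, x in zip(filt, span)) + bias)
                       for span in spans]
        # Texts shorter than the window produce no spans; fall back to 0.0.
        features.append(max(activations) if activations else 0.0)
    return features

# Toy input: 10 words with 50-d embeddings, 400 filters over 4-grams.
rng = random.Random(0)
emb_dim, n_filters, window = 50, 400, 4
words = [[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(10)]
filters = [[rng.uniform(-0.1, 0.1) for _ in range(window * emb_dim)]
           for _ in range(n_filters)]
biases = [0.0] * n_filters
bill_vector = cnn_encode(words, filters, biases, window)
```

The MWE output has the same dimension as the word embeddings, while the CNN output has one entry per filter, so the projection into the legislator space must be sized accordingly.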

Sponsor Metadata
We posit that a legislator's voting behavior is influenced both by the topic and the ideology of a bill. A legislator may be more liberal on one issue and more conservative on another. Thus, we need to capture both aspects. While previous work has shown that text alone contains ideological information (Iyyer et al., 2014b), the metadata of the bill may be a stronger source, especially for ideology. This approach has had success in the related problem of bill committee survival, where signals about the sponsors, committee and chamber were used in conjunction with text models (Yano et al., 2012).
We use this idea to improve our bill representations. One particularly strong signal is the author of the bill, because of their ideological motives. For simplicity, we represent the bill's authorship as the percentage of Republican and Democratic sponsors ($p_r$ and $p_d$). We propose that the Republican and Democratic sponsors influence the text of the bill in different ways. To obtain the overall ideological position of the bill, we combine the versions of the bill influenced by each party. The final bill can thus be represented as follows:

$$v_B = (p_r a_r) \odot T_r + (p_d a_d) \odot T_d$$

where $T_r$ and $T_d$ are the Republican and Democratic copies of the text representation (e.g., MWE or CNN); $p_r$ and $p_d$ are the scalars representing the percentage of sponsors from each party (e.g., 0.7 and 0.3); and $a_r$ and $a_d$ are vectors representing how the percentages should influence each dimension of the text embedding. The larger $p_r$ or $p_d$ is, the stronger the influence of that party on the bill.
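The sponsor-weighted combination described above can be illustrated with a short sketch. It assumes the party shares are computed over all sponsors of a bill and that the influence vectors act element-wise on the text vectors; the function names and toy values are hypothetical.

```python
def party_shares(sponsor_parties):
    """Return (p_r, p_d): fractions of sponsors labeled 'R' and 'D' (non-empty input)."""
    total = len(sponsor_parties)
    return sponsor_parties.count("R") / total, sponsor_parties.count("D") / total

def sponsor_weighted_bill(T_r, T_d, a_r, a_d, p_r, p_d):
    """Combine the party-specific text vectors: (p_r a_r) * T_r + (p_d a_d) * T_d."""
    return [p_r * ar * tr + p_d * ad * td
            for tr, td, ar, ad in zip(T_r, T_d, a_r, a_d)]

# Toy example: 7 Republican and 3 Democratic sponsors, 2-d text vectors.
p_r, p_d = party_shares(["R"] * 7 + ["D"] * 3)
T_r, T_d = [1.0, 2.0], [3.0, -1.0]
a_r, a_d = [1.0, 0.5], [0.5, 1.0]
v_b = sponsor_weighted_bill(T_r, T_d, a_r, a_d, p_r, p_d)
```

With a single-party sponsor list, one of the two terms vanishes and the bill vector is determined entirely by that party's copy of the text representation.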
We test two text representations for $T_r$ and $T_d$: one using MWEs and one using CNNs. The underlying word embeddings are initialized with 50d GloVe vectors (Pennington et al., 2014) and are non-static during training.
The rest of the model weights are initialized randomly with the Glorot uniform distribution (Glorot and Bengio, 2010). The length of $v_L$ is set to 25. All models are trained using binary cross-entropy loss and optimized with the AdaMax algorithm (Kingma and Ba, 2014). The models are trained for 50 epochs, using mini-batches of size 50.
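For readers unfamiliar with these training components, here is a minimal sketch of Glorot uniform initialization and binary cross-entropy in plain Python. The 50-to-25 matrix shape corresponds to projecting a 50-d MWE into the 25-d legislator space; that pairing, the clipping constant `eps`, and the function names are assumptions made for illustration.

```python
import math
import random

def glorot_uniform(fan_in, fan_out, rng):
    """Sample a fan_out x fan_in matrix from U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean of -(y log p + (1 - y) log(1 - p)), with p clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

# Hypothetical projection matrix: 50-d bill vector to 25-d legislator space.
W_b = glorot_uniform(50, 25, random.Random(0))
loss = binary_cross_entropy([1, 0], [0.5, 0.5])
```

An uninformed prediction of 0.5 for every vote yields a loss of log 2, which is a useful reference point when reading training curves.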

Dataset
Our dataset was collected from GovTrack, 4 and consists of nonunanimous roll call votes and texts of resolutions and bills introduced in the 106th to 111th Congressional sessions. 5 We also collect the bill summaries written by the Congressional Research Service 6 (a non-partisan organization), that provide shorter descriptions of the key actions in each bill. All text is preprocessed by lowercasing and removing stop-words.
As bills are often much longer than the typical document encountered in other NLP tasks, with an average of 2683 words per bill, and some bills having hundreds of pages, with correspondingly
4 https://theunitedstates.io/
5 We exclude bills with unanimous votes because these are typically associated with routine matters (for example, the naming of a post office or an official commendation) that do not contain ideological motivation. We consider bills where less than 1% of legislators voted 'no' to be unanimous; about 42% of bills fall into this category.
6 https://www.congress.gov/help/legislative-glossary/
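The unanimity rule stated in the footnote can be expressed as a small filter. This sketch assumes each roll call is available as a non-empty list of 'yes'/'no' strings; the data layout, function name, and toy counts are hypothetical.

```python
def is_unanimous(votes, threshold=0.01):
    """True if the share of 'no' votes is below `threshold` (votes must be non-empty)."""
    no_share = votes.count("no") / len(votes)
    return no_share < threshold

# Toy roll calls with 435 voting members.
near_unanimous = ["yes"] * 431 + ["no"] * 4    # 4 / 435 is below 1%
contested = ["yes"] * 430 + ["no"] * 5         # 5 / 435 is above 1%
kept = [rc for rc in (near_unanimous, contested) if not is_unanimous(rc)]
```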

Results
To understand how sponsor parties and text interact in the input, and how our predictive power changes between in-session and out-of-session bills, we test the following models:
• MWE: mean word embedding text model as described in Kraft et al. (2016), using summaries;
• MWE+FT: MWE model using full bill text;
• CNN: text model from Section 2.1 over summaries;
• MWE+Meta: MWE representation combined with metadata as described in Section 2.2;
• CNN+Meta: like MWE+Meta but using a CNN instead of averaging;
• MWE+Meta+FT: MWE+Meta using full bill text;
• Meta-Only: a variation on MWE+Meta that uses the same random "dummy" text for all the bills, only changing the metadata ($p_r$ and $p_d$).
Each model is first evaluated in-session, where both train and test bills come from the same set of sessions, and thus the same distribution, and then out-of-session, where training bills are from one set of sessions and the model is evaluated on a different set. All results are presented in Table 3.
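The two evaluation settings can be sketched as data splits. The record layout, the session numbers, and the choice to build folds by holding out whole bills are assumptions for illustration; the text only states that train and test bills come from the same or from different sets of sessions.

```python
def out_of_session_split(records, train_sessions, test_sessions):
    """Train on votes from one set of sessions, test on votes from another."""
    train = [r for r in records if r["session"] in train_sessions]
    test = [r for r in records if r["session"] in test_sessions]
    return train, test

def in_session_folds(records, k=5):
    """Yield (train, test) pairs so that each bill is held out exactly once."""
    bill_ids = sorted({r["bill_id"] for r in records})
    for i in range(k):
        held_out = set(bill_ids[i::k])
        train = [r for r in records if r["bill_id"] not in held_out]
        test = [r for r in records if r["bill_id"] in held_out]
        yield train, test

# Hypothetical vote records: one legislator, ten bills across three sessions.
records = [
    {"bill_id": f"b{i}", "session": 106 + i % 3, "legislator": "L1", "vote": i % 2}
    for i in range(10)
]
train, test = out_of_session_split(records, {106, 107}, {108})
folds = list(in_session_folds(records, k=5))
```

Holding out whole bills keeps every test bill unseen during training, which matches the goal of predicting votes on new bills.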

In-session Results
We evaluate our models with accuracy on 5-fold cross-validation. All three models combining text with metadata perform significantly better than the others, showing that the text and meta information have complementary predictive power, and that our models' sponsor-augmented text representation is able to capture the ideological preference. The CNN+Meta achieves the highest accuracy of 86.21, followed by MWE+Meta at 85.96, showing that the CNN learns a somewhat better text representation than MWE. Compare this to the baseline MWE model without meta information, which achieves an accuracy of 81.10, only slightly better than the Meta-Only model at 80.27. Contrary to our hypothesis, MWE achieves higher accuracy than Meta-Only. However, it remains unclear whether this signal is related to ideology or other contextual information. The performance on the out-of-session setting will determine whether this signal is akin to ideology.

Out-of-session Results
In this setting, text with meta information again achieves the best performance on both test sessions. On the 2013-2014 session, the CNN+Meta model does best at 83.59. Unlike the in-session setting, Meta-Only does better than the text-only models (including CNN). This supports the theory that within the sessions we are able to capture contextual ideology from the text, but once we move to a new session the text models no longer contain an accurate representation of the Congressional ideology.
While in other experiments we are able to achieve at least a 17% improvement over the Guess Yes baseline, on 2015-2016, the best model, MWE+Meta, is only able to achieve a 10% gain. During this session divisions arose within the Republican party in the House of Representatives that disrupted the typical voting dynamics. Unlike 2013-2014, the Meta-Only model does worse than the text-only ones; however, the gap between them is much smaller.
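As a point of reference, the Guess Yes baseline simply predicts 'yes' for every vote, so its accuracy equals the share of 'yes' votes in the test set. The sketch below computes it alongside a model's accuracy on toy labels; the labels and predictions are invented for illustration and do not correspond to any reported result.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def guess_yes_accuracy(y_true):
    """Accuracy of always predicting 'yes' (encoded as 1)."""
    return accuracy(y_true, [1] * len(y_true))

# Toy labels: 1 = yes, 0 = no.
y_true = [1, 1, 1, 0, 1, 0, 1, 1]
y_pred = [1, 1, 0, 0, 1, 0, 1, 1]
baseline = guess_yes_accuracy(y_true)
model_acc = accuracy(y_true, y_pred)
```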

Overall Analysis
These experiments provide several interesting insights. First, because using both text and metadata (MWE+Meta or CNN+Meta) results in the strongest model in every case, we confirm that legislators vote based on both the topic and the ideology of the bill.
Second, the text-only models do significantly worse on the out-of-session tests than the in-session ones. This confirms our theory that session-specific contextual information is implicitly captured by the previous single-session models, but that context is not accurate in new sessions. If we were capturing ideology from the text, then the text-only model should have performed well out-of-session.
Third, to further examine whether a neural model was the best technique for modeling text with metadata, we trained an SVM model over the bag-of-words representation of the summary, indicator variables for the legislators and the percent of bill sponsors in each party (e.g., $p_d$). This model did not perform as well as either MWE or Meta-Only, showing that the embedding approach is better at representing this combination of features.
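The feature vector for such an SVM baseline might be assembled as follows. This sketch covers only feature construction, not SVM training, and assumes raw term counts over a fixed vocabulary, a one-hot legislator indicator, and both party shares; the vocabulary, identifiers, and values are hypothetical.

```python
def svm_features(summary_tokens, legislator_id, p_r, p_d, vocabulary, legislators):
    """Concatenate bag-of-words counts, a one-hot legislator indicator, and party shares."""
    index = {word: i for i, word in enumerate(vocabulary)}
    counts = [0.0] * len(vocabulary)
    for token in summary_tokens:
        if token in index:  # out-of-vocabulary tokens are ignored
            counts[index[token]] += 1.0
    one_hot = [1.0 if name == legislator_id else 0.0 for name in legislators]
    return counts + one_hot + [p_r, p_d]

# Toy example with a 3-word vocabulary and 2 legislators.
vocabulary = ["tax", "health", "energy"]
legislators = ["L1", "L2"]
x = svm_features(["tax", "relief", "tax", "energy"], "L2", 0.7, 0.3,
                 vocabulary, legislators)
```

Unlike the embedding model, this representation gives each legislator a single indicator feature, so a linear classifier over it cannot express an interaction between a legislator and the content of a bill.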
Finally, the models that embed the full text (+FT) generally perform worse than those that embed the summaries. While this confirms that the summary contains sufficient information about the topics and the actions in the bill, we did not fully explore the bill text.

Future Work
While Congress introduces close to 20,000 bills every session, very few of them receive a vote, limiting the dataset. We would like to explore various bootstrapping techniques that would allow us to expand the dataset size with artificial votes.
Furthermore, while our text representations are sufficient for modeling shorter text, i.e., summaries, we would like to test more sophisticated representations in the future, in particular, those designed to handle longer texts.

Conclusion
In this paper, we developed a neural network architecture to predict legislators' votes that augments bill text with sponsor metadata. We introduced a new evaluation setting for this task, out-of-session performance, which allows us to examine the generalizability of our proposed model and was not considered in past studies. Finally, we showed that the introduction of metadata to bias the text representations outperforms the existing text-based methods in all experimental settings.