Text-Based Ideal Points

Ideal point models analyze lawmakers' votes to quantify their political positions, or ideal points. But votes are not the only way to express a political position. Lawmakers also give speeches, release press statements, and post tweets. In this paper, we introduce the text-based ideal point model (TBIP), an unsupervised probabilistic topic model that analyzes texts to quantify the political positions of their authors. We demonstrate the TBIP with two types of politicized text data: U.S. Senate speeches and senator tweets. Though the model does not analyze their votes or political affiliations, the TBIP separates lawmakers by party, learns interpretable politicized topics, and infers ideal points close to the classical vote-based ideal points. One benefit of analyzing texts, as opposed to votes, is that the TBIP can estimate ideal points of anyone who authors political texts, including non-voting actors. To this end, we use it to study tweets from the 2020 Democratic presidential candidates. Using only the texts of their tweets, it identifies them along an interpretable progressive-to-moderate spectrum.


Introduction
Ideal point models are widely used to help characterize modern democracies, analyzing lawmakers' votes to estimate their positions on a political spectrum (Poole and Rosenthal, 1985). But votes are not the only way that lawmakers express political preferences: press releases, tweets, and speeches all help convey their positions. Like votes, these signals are recorded and easily collected.
This paper develops the text-based ideal point model (TBIP), a probabilistic topic model for analyzing unstructured political texts to quantify the political preferences of their authors. While classical ideal point models analyze how different people vote on a shared set of bills, the TBIP analyzes how different authors write about a shared set of latent topics. The TBIP is inspired by the idea of political framing: the specific words and phrases used when discussing a topic can convey political messages (Entman, 1993). Given a corpus of political texts, the TBIP estimates the latent topics under discussion, the latent political positions of the authors of the texts, and how per-topic word choice changes as a function of the political position of the author.
A key feature of the TBIP is that it is unsupervised. It can be applied to any political text, regardless of whether the authors belong to known political parties. It can also be used to analyze non-voting actors, such as political candidates. Figure 1 shows a TBIP analysis of the speeches of the 114th U.S. Senate. The model lays the senators out on the real line and accurately separates them by party. (It does not use party labels in its analysis.) Based only on speeches, it has found an interpretable spectrum: Senator Bernie Sanders is liberal, Senator Mitch McConnell is conservative, and Senator Susan Collins is moderate. For comparison, Figure 2 also shows ideal points estimated from the voting record of the same senators; their language and their votes are closely correlated.
The TBIP also finds latent topics, each one a vocabulary-length vector of intensities, that describe the issues discussed in the speeches. For each topic, the TBIP involves both a neutral vector of intensities and a vector of ideological adjustments that describe how the intensities change as a function of the political position of the author. Table 1 illustrates discovered topics about immigration, health care, and gun control. In the gun control topic, the neutral intensities focus on words like "gun" and "firearms." As the author's ideal point becomes more negative, terms like "gun violence" and "background checks" increase in intensity. As the author's ideal point becomes more positive, terms like "constitutional rights" increase.
The TBIP is a bag-of-words model that combines ideas from ideal point models and Poisson factorization topic models (Canny, 2004; Gopalan et al., 2015). The latent variables are the ideal points of the authors, the topics discussed in the corpus, and how those topics change as a function of ideal point. To approximate the posterior, we use an efficient black-box variational inference algorithm with stochastic optimization. It scales to large corpora.
We develop the details of the TBIP and its variational inference algorithm. We study its performance on three sessions of U.S. Senate speeches, and we compare the TBIP to other methods for scaling political texts (Slapin and Proksch, 2008; Lauderdale and Herzog, 2016a). The TBIP performs best, recovering ideal points closest to the vote-based ideal points. We also study its performance on tweets by U.S. senators, again finding that it closely recovers their vote-based ideal points. (In both speeches and tweets, the differences from vote-based ideal points are also qualitatively interesting.) Finally, we study the TBIP on tweets by the 2020 Democratic candidates for President, for which there are no votes for comparison. It lays out the candidates along an interpretable progressive-to-moderate spectrum.

The text-based ideal point model
We develop the text-based ideal point model (TBIP), a probabilistic model that infers political ideology from political texts. We first review Bayesian ideal points and Poisson factorization topic models, the two probabilistic models on which the TBIP is built.

Background: Bayesian ideal points
Ideal points quantify a lawmaker's political preferences based on their roll-call votes (Poole and Rosenthal, 1985; Jackman, 2001; Clinton et al., 2004). Consider a group of lawmakers voting "yea" or "nay" on a shared set of bills. Denote the vote of lawmaker i on bill j by the binary variable v_ij.
The Bayesian ideal point model posits scalar per-lawmaker latent variables x_i and scalar per-bill latent variables (α_j, η_j). It assumes the votes come from a factor model,

v_ij ~ Bernoulli(σ(α_j + x_i η_j)),   (1)

where σ(t) = 1 / (1 + e^(-t)). The latent variable x_i is called the lawmaker's ideal point; the latent variable η_j is the bill's polarity. When x_i and η_j have the same sign, lawmaker i is more likely to vote for bill j; when they have opposite signs, the lawmaker is more likely to vote against it. The per-bill intercept term α_j is called the popularity. It captures that some bills are uncontroversial: all lawmakers are likely to vote for them (or against them) regardless of their ideology.
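To make the factor model concrete, here is a minimal sketch of the vote probability in Equation (1). The function names and toy values are illustrative, not the authors' code:

```python
import numpy as np

def sigmoid(t):
    """Logistic function sigma(t) = 1 / (1 + e^{-t})."""
    return 1.0 / (1.0 + np.exp(-t))

def vote_probability(x_i, alpha_j, eta_j):
    """P(v_ij = 1): probability that lawmaker i votes 'yea' on bill j."""
    return sigmoid(alpha_j + x_i * eta_j)

# A liberal (x < 0) and a conservative (x > 0) facing a bill with
# positive polarity: the conservative is more likely to vote 'yea'.
p_liberal = vote_probability(x_i=-1.0, alpha_j=0.0, eta_j=2.0)
p_conservative = vote_probability(x_i=+1.0, alpha_j=0.0, eta_j=2.0)
```

A large popularity α_j pushes both probabilities toward 1, which is how the model accounts for uncontroversial bills.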
Using data of lawmakers voting on bills, political scientists approximate the posterior of the Bayesian ideal point model with an approximate inference method such as Markov chain Monte Carlo (MCMC) (Jackman, 2001; Clinton et al., 2004) or expectation-maximization (EM) (Imai et al., 2016). Empirically, the posterior ideal points of the lawmakers accurately separate political parties and capture the spectrum of political preferences in American politics (Poole and Rosenthal, 2000).

Background: Poisson factorization
Poisson factorization is a class of non-negative matrix factorization methods often employed as a topic model for bag-of-words text data (Canny, 2004; Cemgil, 2009; Gopalan et al., 2014).
Poisson factorization factorizes a matrix of document/word counts into two positive matrices: a matrix θ that contains per-document topic intensities, and a matrix β that contains the topics. Denote the count of word v in document d by y_dv. Poisson factorization posits the following probabilistic model over word counts, where a and b are hyperparameters:

θ_dk ~ Gamma(a, b),   β_kv ~ Gamma(a, b),   y_dv ~ Pois(Σ_k θ_dk β_kv).   (2)

Given a matrix y, practitioners approximate the posterior factorization with variational inference (Gopalan et al., 2015) or MCMC (Cemgil, 2009). Note that Poisson factorization can be interpreted as a Bayesian variant of non-negative matrix factorization with the so-called "KL loss function" (Lee and Seung, 1999). When the shape parameter a is less than 1, the latent vectors θ_d and β_k tend to be sparse. Consequently, the marginal likelihood of each count places high mass around zero and has heavy tails (Ranganath et al., 2015). The posterior components are interpretable as topics (Gopalan et al., 2015).
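The gamma-Poisson generative process can be sketched in a few lines of numpy. The corpus sizes and hyperparameters below are toy values for illustration (note numpy's gamma takes a scale, so rate b becomes scale 1/b):

```python
import numpy as np

rng = np.random.default_rng(0)
D, V, K = 20, 50, 3      # documents, vocabulary size, topics (toy sizes)
a, b = 0.3, 0.3          # sparse Gamma hyperparameters (shape < 1)

theta = rng.gamma(a, 1.0 / b, size=(D, K))   # per-document topic intensities
beta = rng.gamma(a, 1.0 / b, size=(K, V))    # topics
y = rng.poisson(theta @ beta)                # counts y_dv ~ Pois(theta_d . beta_v)
```

Because a < 1, most entries of theta and beta are near zero, so each document's counts are driven by a few active topics.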

The text-based ideal point model
The text-based ideal point model (TBIP) is a probabilistic model designed to infer political preferences from political texts.
There are important differences between a dataset of votes and a corpus of authored political language. A vote is one of two choices, "yea" or "nay." But political language is high-dimensional: a lawmaker's speech draws on a vocabulary of thousands of terms. A vote sends a clear signal about a lawmaker's opinion of a bill. But political speech is noisy: the use of a word might be irrelevant to ideology, provide only a weak signal about ideology, or change signal depending on context. Finally, votes are organized in a matrix, where each one is unambiguously attached to a specific bill and nearly all lawmakers vote on all bills. But political language is unstructured and sparse. A corpus of political language can discuss any number of issues, with a single speech possibly involving several, and the issues are unlabeled and possibly unknown in advance.
The TBIP is based on the concept of political framing. Framing is the idea that a communicator will emphasize certain aspects of a message, implicitly or explicitly, to promote a perspective or agenda (Entman, 1993; Chong and Druckman, 2007). In politics, an author's word choice for a particular issue is affected by the ideological message she is trying to convey. A conservative discussing abortion is more likely to use terms such as "life" and "unborn," while a liberal discussing abortion is more likely to use terms like "choice" and "body." In this example, the conservative frames the issue in terms of morality, while the liberal frames it in terms of personal liberty.
The TBIP casts political framing in a probabilistic model of language. While the classical ideal point model infers ideology from differences in votes on a shared set of bills, the TBIP infers ideology from differences in word choice on a shared set of topics.
The TBIP is a probabilistic model that builds on Poisson factorization. The observed data are word counts and authors: y_dv is the word count for term v in document d, and a_d is the author of the document. Some of the latent variables in the TBIP are inherited from Poisson factorization: the non-negative K-vector of per-document topic intensities is θ_d, and the topics themselves are non-negative V-vectors β_k, where K is the number of topics and V is the vocabulary size. We refer to β as the neutral topics. Two additional latent variables capture the politics: the ideal point of an author s is a real-valued scalar x_s, and the ideological topic is a real-valued V-vector η_k.
The TBIP uses its latent variables in a generative model of authored political text, where the ideological topic adjusts the neutral topic, and thus the word choice, as a function of the ideal point of the author. Place sparse Gamma priors on θ and β, and normal priors on η and x, so for all documents d, words v, topics k, and authors s,

θ_dk ~ Gamma(a, b),   β_kv ~ Gamma(a, b),   η_kv ~ N(0, 1),   x_s ~ N(0, 1).

These latent variables interact to draw the count of term v in document d,

y_dv ~ Pois(Σ_k θ_dk β_kv exp(x_{a_d} η_kv)).   (3)

For a topic k and term v, a nonzero η_kv will increase the Poisson rate of the word count if it shares the same sign as the ideal point of the author x_{a_d}, and decrease the Poisson rate if they are of opposite signs. Consider a topic about gun control and suppose η_kv > 0 for the term "constitution." An author with an ideal point x_s > 0, say a conservative author, will be more likely to use the term "constitution" when discussing gun control; an author with an ideal point x_s < 0, a liberal author, will be less likely to use the term. Suppose η_kv < 0 for the term "violence." Now the liberal author will be more likely than the conservative to use this term. Finally, suppose η_kv = 0 for the term "gun." This term is equally likely to be used by both authors, regardless of their ideal points.

Table 1. The TBIP learns topics from Senate speeches that vary as a function of the senator's political positions. The neutral topics are for an ideal point of 0; the ideological topics fix ideal points at -1 and +1. We interpret one extreme as liberal and the other as conservative. Data is from the 114th U.S. Senate.
To build more intuition, examine the elements of the sum in the Poisson rate of Equation (3) and rewrite slightly:

θ_dk exp(log β_kv + x_{a_d} η_kv).

Each of these elements mimics the classical ideal point model in Equation (1), where η_kv now measures the "polarity" of term v in topic k and log β_kv is the intercept, or "popularity." When η_kv and x_{a_d} have the same sign, term v is more likely to be used when discussing topic k. If η_kv is near zero, then the term is not politicized, and its count comes from a Poisson factorization. For each document d, the elements of the sum that contribute to the overall rate are those for which θ_dk is positive; that is, those for the topics being discussed in the document.
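The per-topic rate described above can be sketched directly. The arrays below are toy values (not fitted parameters), chosen so that one term in one topic is "conservative-coded":

```python
import numpy as np

def tbip_rate(theta_d, beta, eta, x_author):
    """Poisson rate for one document: sum_k theta_dk * exp(log beta_kv + x * eta_kv)."""
    return (theta_d[:, None] * beta * np.exp(x_author * eta)).sum(axis=0)

K, V = 2, 4
theta_d = np.array([1.0, 0.5])   # topic intensities for this document
beta = np.full((K, V), 0.2)      # neutral topics
eta = np.zeros((K, V))           # ideological adjustments
eta[0, 0] = 1.0                  # term 0 in topic 0 has positive polarity

rate_cons = tbip_rate(theta_d, beta, eta, x_author=+1.0)  # conservative author
rate_lib = tbip_rate(theta_d, beta, eta, x_author=-1.0)   # liberal author
```

Only the politicized term (η_kv ≠ 0) gets a different expected count for the two authors; the unpoliticized terms have identical rates.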
The posterior distribution of the latent variables provides estimates of the ideal points, neutral topics, and ideological topics. For example, we estimate this posterior distribution using a dataset of senator speeches from the 114th United States Senate session. The fitted ideal points in Figure 1 show that the TBIP largely separates lawmakers by political party, despite not having access to party labels or votes. Table 1 depicts neutral topics (fixing the fitted η̂_kv to be 0) and the corresponding ideological topics obtained by varying the sign of η̂_kv. The topic for immigration shows that a liberal framing emphasizes "Dreamers" and "DACA," while the conservative framing emphasizes "laws" and "homeland security." We provide more details and empirical studies in Section 5.

Related work
Most ideal point models focus on legislative roll-call votes. These are typically latent-space factor models (Poole and Rosenthal, 1985; McCarty et al., 1997; Poole and Rosenthal, 2000), which relate closely to item-response models (Bock and Aitkin, 1981; Bailey, 2001). Researchers have also developed Bayesian analogues (Jackman, 2001; Clinton et al., 2004) and extensions to time series, particularly for analyzing the Supreme Court (Martin and Quinn, 2002).
Some recent models combine text with votes or party information to estimate ideal points of legislators. Gerrish and Blei (2011) analyze votes and the text of bills to learn ideological language. Gerrish and Blei (2012) and Lauderdale and Clark (2014) use text and vote data to learn ideal points adjusted for topic. The models in Nguyen et al. (2015) and Kim et al. (2018) analyze votes and floor speeches together. With labeled political party affiliations, machine learning methods can also help map language to party membership. Iyyer et al. (2014) use neural networks to learn partisan phrases, while the models in Tsur et al. (2015) and Gentzkow et al. (2019) use political party labels to analyze differences in speech patterns. Since the TBIP does not use votes or party information, it is applicable to all political texts, even when votes and party labels are not present. Moreover, party labels can be restrictive because they force hard membership in one of two groups (in American politics). The TBIP can infer how topics change smoothly across the political spectrum, rather than simply learning topics for each political party.
Annotated text data has also been used to predict ideological positions. Wordscores (Laver et al., 2003; Lowe, 2008) uses texts that are hand-labeled by political position to measure the conveyed positions of unlabeled texts; it has been used to measure the political landscape of Ireland (Benoit and Laver, 2003; Herzog and Benoit, 2015). Ho et al. (2008) analyze hand-labeled editorials to estimate ideal points for newspapers. The ideological topics learned by the TBIP are also related to political frames (Entman, 1993; Chong and Druckman, 2007). Historically, these frames have either been hand-labeled by annotators (Baumgartner et al., 2008; Card et al., 2015) or used annotated data for supervised prediction (Johnson et al., 2017; Baumer et al., 2015). In contrast to these methods, the TBIP is completely unsupervised. It learns ideological topics that do not need to conform to predefined frames. Moreover, it does not depend on the subjectivity of coders.
Wordfish (Slapin and Proksch, 2008) is a model of authored political texts about a single issue, similar to a single-topic version of the TBIP. Wordfish has been applied to party manifestos (Proksch and Slapin, 2009) and single-issue dialogue (Schwarz et al., 2017).
Wordshoal (Lauderdale and Herzog, 2016a) extends Wordfish to multiple issues by analyzing a collection of labeled texts, such as Senate speeches labeled by debate topic. Wordshoal fits separate Wordfish models to the texts about each label, and combines the fitted models in a one-dimensional factor analysis to produce ideal points. In contrast to these models, the TBIP does not require a grouping of the texts into single issues. It naturally accommodates unstructured texts, such as tweets, and learns both ideal points for the authors and ideology-adjusted topics for the (latent) issues under discussion. Furthermore, by relying on stochastic optimization, the TBIP's inference algorithm scales to large datasets. In Section 5 we empirically study how the TBIP's ideal points compare to those of both models.

Inference
The TBIP involves several types of latent variables: neutral topics β_k, ideological topics η_k, topic intensities θ_d, and ideal points x_s. Conditional on the text, we perform inference of the latent variables through the posterior distribution p(θ, β, η, x | y). But calculating this distribution is intractable, so we rely on approximate inference.
We use mean-field variational inference to fit an approximate posterior distribution (Jordan et al., 1999; Wainwright et al., 2008; Blei et al., 2017). Variational inference frames inference as an optimization problem. Set q_φ(θ, β, η, x) to be a variational family of approximate posterior distributions, indexed by variational parameters φ.
Variational inference aims to find the setting of φ that minimizes the KL divergence between q_φ and the posterior. Minimizing this KL divergence is equivalent to maximizing the evidence lower bound (ELBO),

E_q[log p(θ, β, η, x) + log p(y | θ, β, η, x) - log q_φ(θ, β, η, x)].

The ELBO sums the expectation of the log joint (here broken up into the log prior and log likelihood) and the entropy of the variational distribution.
To approximate the posterior, we set the variational family to be the mean-field family. The mean-field family factorizes over the latent variables, where d indexes documents, k indexes topics, and s indexes authors:

q_φ(θ, β, η, x) = Π_d q(θ_d) Π_k q(β_k) q(η_k) Π_s q(x_s).

We use lognormal factors for the positive variables and Gaussian factors for the real variables:

q(θ_d) = LogNormal(μ_θ_d, σ²_θ_d),  q(β_k) = LogNormal(μ_β_k, σ²_β_k),  q(η_k) = N(μ_η_k, σ²_η_k),  q(x_s) = N(μ_x_s, σ²_x_s).

Our goal is to optimize the ELBO with respect to φ = {μ_θ, σ²_θ, μ_β, σ²_β, μ_η, σ²_η, μ_x, σ²_x}. We use stochastic gradient ascent. We form noisy gradients with Monte Carlo and the "reparameterization trick" (Kingma and Welling, 2014; Rezende et al., 2014), as well as with data subsampling (Hoffman et al., 2013). To set the step size, we use Adam (Kingma and Ba, 2015).
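The reparameterization trick underlying these noisy gradients can be illustrated on a one-dimensional toy objective; this is a hand-derived sketch, not the paper's training loop, and the objective and learning rate are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_sigma, z):
    """Gaussian reparameterization: x = mu + sigma * z, with z ~ N(0, 1)."""
    return mu + np.exp(log_sigma) * z

# Noisy gradient ascent on the toy objective E_q[-x^2] with q = N(mu, sigma^2):
# sampling z and pushing it through the reparameterization makes the sample
# differentiable with respect to mu.
mu, log_sigma = 2.0, 0.0
lr = 0.1
for _ in range(200):
    z = rng.standard_normal()
    x = reparameterize(mu, log_sigma, z)
    grad_mu = -2.0 * x      # d/dmu of -x^2, via dx/dmu = 1
    mu += lr * grad_mu      # one stochastic gradient ascent step
```

The optimum of E_q[-x²] over μ is μ = 0, so the noisy iterates should hover near zero; in the TBIP, the same mechanism is applied to all variational parameters at once, with Adam setting the step size.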
We initialize the neutral topics and topic intensities with a pre-trained model. Specifically, we pre-train a Poisson factorization topic model using the algorithm in Gopalan et al. (2015), and use the resulting factorization to initialize the variational parameters for θ_d and β_k. The full procedure is described in Appendix A.
For the corpus of Senate speeches described in Section 5, training takes 5 hours on a single GPU.

Empirical studies
We study the text-based ideal point model (TBIP) on several datasets of political texts. We first use the TBIP to analyze speeches and tweets (separately) from U.S. senators. For both types of text, the TBIP ideal points, which are estimated from text, are close to the classical ideal points, which are estimated from votes. We also compare the TBIP to existing methods for scaling political texts (Slapin and Proksch, 2008; Lauderdale and Herzog, 2016a). The TBIP performs better, finding ideal points closer to the vote-based ideal points. Finally, we use the TBIP to analyze a group that does not vote: the 2020 Democratic presidential candidates. Using only tweets, it estimates ideal points for the candidates on an interpretable progressive-to-moderate spectrum.

The TBIP on U.S. Senate speeches
We analyze Senate speeches from the 114th U.S. Senate session. In the TBIP estimates, progressive senator Bernie Sanders (I-VT) is on one extreme, and Mitch McConnell (R-KY) is on the other. Susan Collins (R-ME), a Republican senator often described as moderate, is near the middle. The correlation between the TBIP ideal points and the vote-based ideal points is high, at 0.88. Using only the text of the speeches, the TBIP captures meaningful information about political preferences, separating the political parties and organizing the lawmakers on a meaningful political spectrum.
We next study the topics. For selected topics, Table 1 shows neutral terms and ideological terms. To visualize the neutral topics, we list the top words based on β̂_k. To visualize the ideological topics, we calculate term intensities for two poles of the political spectrum, x_s = -1 and x_s = +1. For a fixed k, the ideological topics thus order the words by E[β_kv exp(-η_kv)] and E[β_kv exp(η_kv)].
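This term ordering can be sketched directly. The vocabulary and the values of β and η below are toy numbers invented for illustration, not fitted estimates:

```python
import numpy as np

def top_words(beta_k, eta_k, vocab, x, n=3):
    """Order terms of topic k by beta_kv * exp(x * eta_kv) at ideal point x."""
    intensity = beta_k * np.exp(x * eta_k)
    return [vocab[i] for i in np.argsort(-intensity)[:n]]

vocab = ["gun", "background_checks", "constitutional_rights"]
beta_k = np.array([1.0, 0.4, 0.4])     # neutral intensities (toy)
eta_k = np.array([0.0, -1.5, 1.5])     # ideological adjustments (toy)

neutral = top_words(beta_k, eta_k, vocab, x=0.0)
liberal = top_words(beta_k, eta_k, vocab, x=-1.0)
conservative = top_words(beta_k, eta_k, vocab, x=+1.0)
```

With these toy values, the neutral topic is led by "gun," the liberal pole by "background_checks," and the conservative pole by "constitutional_rights," mirroring the gun control topic in Table 1.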
Based on the separation of political parties in Figure 1, we interpret negative ideal points as liberal and positive ideal points as conservative. Table 1 shows that when discussing immigration, a senator with a neutral ideal point uses terms like "immigration" and "United States." As the author moves left, she will use terms like "Dreamers" and "DACA." As she moves right, she will emphasize terms like "laws" and "homeland security." The TBIP also captures that those on the left refer to health care legislation as the Affordable Care Act, while those on the right call it Obamacare. Additionally, a liberal senator discussing guns brings attention to gun control: "gun violence" and "background checks" are among the largest-intensity terms. Meanwhile, conservative senators are likely to invoke gun rights, emphasizing "constitutional rights."

Comparison to Wordfish and Wordshoal. We next treat the vote-based ideal points as "ground-truth" labels and compare the TBIP ideal points to those found by Wordfish and Wordshoal. Wordshoal requires debate labels, so we use the labeled Senate speech data provided by Lauderdale and Herzog (2016b) on the 111th-113th Senates to train each method. Because we are interested in comparing models, we use the same variational inference procedure to train all methods. See Appendix B for more details.
We use two metrics to compare text-based ideal points to vote-based ideal points: the correlation between ideal points and Spearman's rank correlation between their orderings of the senators. On both metrics, when compared to the vote ideal points from Equation (1), the TBIP outperforms Wordfish and Wordshoal; see Table 2. Comparing to another vote-based method, DW-NOMINATE (Poole, 2005), produces similar results; see Appendix C.

The TBIP on U.S. Senate tweets
We use the TBIP to analyze tweets from U.S. senators during the 114th Senate session, using a corpus provided by VoxGovFEDERAL (2020). Tweet-based ideal points almost completely separate Democrats and Republicans; see Figure 2. Again, Bernie Sanders (I-VT) is the most extreme Democrat, and Mitch McConnell (R-KY) is one of the most extreme Republicans. Susan Collins (R-ME) remains near the middle; she is among the most moderate senators in vote-based, speech-based, and tweet-based models. The correlation between vote-based ideal points and tweet-based ideal points is 0.94.
We also use senator tweets to compare the TBIP to Wordfish (we cannot apply Wordshoal because tweets do not have debate labels). Again, the TBIP learns ideal points closer to the classical vote ideal points; see Table 2.

Using the TBIP as a descriptive tool
As a descriptive tool, the TBIP provides hints about the different ways senators use speeches or tweets to convey political messages. We use a likelihood ratio to help identify the texts that influenced an ideal point. Consider the log likelihood of a document using a fixed ideal point x̃ and fitted values for the other latent variables,

ℓ_d(x̃) = Σ_v log Pois(y_dv; Σ_k θ̂_dk β̂_kv exp(x̃ η̂_kv)).

Ratios based on this likelihood can help point to why the TBIP places a lawmaker as extreme or moderate. For a document d, if ℓ_d(x̂_{a_d}) - ℓ_d(0) is high, then that document was (statistically) influential in making x̂_{a_d} more extreme; if ℓ_d(0) - ℓ_d(x̂_{a_d}) is high, then that document was influential in making x̂_{a_d} less extreme. We emphasize that this diagnostic does not convey any causal information, but rather helps us understand the relationship between the data and the inferences.
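The likelihood ratio can be sketched as follows. All values here are hand-picked toy numbers standing in for fitted posterior means; the document is constructed to use a liberal-coded term (η < 0) heavily, so the ratio comes out positive at a liberal ideal point:

```python
import numpy as np
from math import lgamma

def log_lik(y_d, theta_d, beta, eta, x):
    """ell_d(x): Poisson log likelihood of document d's counts at ideal point x."""
    rate = (theta_d[:, None] * beta * np.exp(x * eta)).sum(axis=0)
    return sum(y * np.log(r) - r - lgamma(y + 1) for y, r in zip(y_d, rate))

theta_d = np.array([1.0])            # one active topic (toy)
beta = np.array([[0.5, 0.5]])        # neutral intensities for two terms (toy)
eta = np.array([[-1.0, 0.0]])        # term 0 is liberal-coded
y_d = np.array([5, 1])               # the document uses term 0 heavily

x_hat = -1.0                         # a hypothetical fitted liberal ideal point
ratio = log_lik(y_d, theta_d, beta, eta, x_hat) - log_lik(y_d, theta_d, beta, eta, 0.0)
```

A positive ratio means the document is better explained at the author's fitted ideal point than at a neutral one, flagging it as influential in pushing the ideal point toward the extreme.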

Bernie Sanders (I-VT).
Bernie Sanders is an Independent senator who caucuses with the Democratic party; we refer to him as a Democrat. Among Democrats, his ideal point changes the most between one estimated from speeches and one estimated from votes. Although his vote-based ideal point is the 17th most liberal, the ideal point based on Senate speeches is the most extreme.
Figure 3. Based on tweets, the TBIP places 2020 Democratic presidential candidates along an interpretable progressive-to-moderate spectrum.

We use the likelihood ratio to understand this difference in his vote-based and speech-based ideal points. His speeches with the highest likelihood ratio are about income inequality and universal health care, which are both progressive issues. The following is an excerpt from one such speech: "The United States is the only major country on Earth that does not guarantee health care to all of our people... At a time when the rich are getting richer and the middle class is getting poorer, the Republicans take from the middle class and working families to give more to the rich and large corporations." Sanders is considered one of the most liberal senators; his extreme speech ideal point is sensible.
That Sanders' vote-based ideal point is not more extreme appears to be a limitation of the vote-based method. Applying the likelihood ratio to votes helps illustrate the issue. (Here a bill takes the place of a document.) The ratio identifies H.R. 2048 as influential. This bill is a rollback of the Patriot Act that Sanders voted against because it did not go far enough to reduce federal surveillance capabilities (RealClearPolitics, 2015). In voting "nay", he was joined by one Democrat and 30 Republicans, almost all of whom voted against the bill because they did not want surveillance capabilities curtailed at all. Vote-based ideal points, which only model binary values, cannot capture this nuance in his opinion. As a result, Sanders' vote-based ideal point is pulled to the right.

Deb Fischer (R-NE).
Turning to tweets, Deb Fischer's tweet-based ideal point is more liberal than her vote-based ideal point; her vote ideal point is the 11th most extreme among senators, while her tweet ideal point is the 43rd most extreme. The likelihood ratio identifies the following tweets as responsible for this moderation: "I want to empower women to be their own best advocates, secure that they have the tools to negotiate the wages they deserve. #EqualPay" "FACT: 1963 Equal Pay Act enables women to sue for wage discrimination. #GetitRight #EqualPayDay" The TBIP associates terms about equal pay and women's rights with liberals. Using the topics in Fischer's first tweet above, a senator with the most liberal ideal point would be expected to use the phrase "#EqualPay" 20 times as much as a senator with the most conservative ideal point, and "women" 9 times as much. Fischer's focus on equal pay for women moderates her tweet ideal point.

Jeff Sessions (R-AL).
The likelihood ratio can also point to model limitations. Jeff Sessions is a conservative voter, but the TBIP identifies his speeches as moderate. One of the most influential speeches for his moderate text ideal point, as identified by the likelihood ratio, criticizes Deferred Action for Childhood Arrivals (DACA), an immigration policy established by President Obama that introduced employment opportunities for undocumented individuals who arrived as children: "The President of the United States is giving work authorizations to more than 4 million people, and for the most part they are adults. Almost all of them are adults. Even the so-called DACA proportion, many of them are in their thirties. So this is an adult job legalization program." This is a conservative stance against DACA. So why does the TBIP identify it as moderate? As depicted in Table 1, liberals bring up "DACA" when discussing immigration, while conservatives emphasize "laws" and "homeland security." The fitted expected count of "DACA" using the most liberal ideal point for the topics in the above speech is 1.04, in contrast to 0.04 for the most conservative ideal point. Since conservatives do not focus on DACA, Sessions even bringing up the program sways his ideal point toward the center. Although Sessions refers to DACA disapprovingly, the bag-of-words model cannot capture this negative sentiment.

Table 3. The TBIP learns topics from 2020 Democratic presidential candidate tweets that vary as a function of the candidate's political positions. The neutral topics are for an ideal point of 0; the ideological topics fix ideal points at -1 and +1. We interpret one extreme as progressive and the other as moderate.

2020 Democratic candidates
We also analyze tweets from Democratic presidential candidates for the 2020 election. Since the presidential candidates do not vote on a shared set of issues, their ideal points cannot be estimated using vote-based methods. Figure 3 shows tweet-based ideal points for the 2020 Democratic candidates. Elizabeth Warren and Bernie Sanders, who are often considered progressive, are on one extreme. Steve Bullock and John Delaney, often considered moderate, are on the other. The selected topics in Table 3 showcase this spectrum. Candidates with progressive ideal points focus on billionaires and Wall Street when discussing the economy, Medicare for All when discussing health care, and the Green New Deal when discussing climate change. On the other extreme, candidates with moderate ideal points focus on trade wars and farmers when discussing the economy, universal plans for health care, and technological solutions to climate change.

Summary
We developed the text-based ideal point model (TBIP), an ideal point model that analyzes texts to quantify the political positions of their authors. It estimates the latent topics of the texts, the ideal points of their authors, and how each author's political position affects her choice of words within each topic.
We used the TBIP to analyze U.S. Senate speeches and tweets. Without analyzing the votes themselves, the TBIP separates lawmakers by party, learns interpretable politicized topics, and infers ideal points close to the classical vote-based ideal points. Moreover, the TBIP can estimate ideal points of anyone who authors political texts, including non-voting actors. When used to study tweets from 2020 Democratic presidential candidates, the TBIP identifies them along a progressive-to-moderate spectrum.

[Fragment of the inference algorithm from Appendix A: while the ELBO has not converged, sample a document index d ∈ {1, 2, ..., D}, sample z_θ, z_β, z_η, z_x ~ N(0, I), and update the variational parameters with reparameterization gradients.]

To handle extreme count values from long speeches, we take the natural logarithm of the counts matrix before performing inference (appropriately adding 1 and rounding so that a word count of 1 is transformed to still be 1).
We use a single Monte Carlo sample to approximate the gradient of each batch. We assume 50 latent topics and posit the following prior distributions: θ_dk, β_kv ~ Gamma(0.3, 0.3) and η_kv, x_s ~ N(0, 1).
We train the vote ideal point model by removing all votes that are not cast as "yea" or "nay" and performing mean-field variational inference with Gaussian variational distributions. Since each variational family is Gaussian, we approximate gradients using the reparameterization trick (Rezende et al., 2014; Kingma and Ba, 2015).
For the comparisons against Wordfish and Wordshoal, we preprocess speeches in the same way as Lauderdale and Herzog (2016a). We train each Senate session separately, thereby only including one timestep for Wordshoal. For this reason, our results on the U.S. Senate differ from those reported by Lauderdale and Herzog (2016a), who train a model jointly over all time periods. Additionally, we use variational inference with reparameterization gradients to train all methods. Specifically, we perform mean-field variational inference, positing Gaussian variational families on all real variables and lognormal variational families on all positive variables.
Senator tweets Our Senate tweet preprocessing is similar to the Senate speech preprocessing, although we now include all terms that appear in at least 0.05% of documents rather than 0.01% to account for the shorter tweet lengths. We remove cities and states in addition to stopwords and the names of politicians. This preprocessing leaves us with 209,779 tweets. We use the same model and hyperparameters as for speeches, although we no longer take the natural logarithm of the counts matrix since individual tweets cannot have extreme word counts due to the character limit. We use a batch size of 1,024.
2020 Democratic candidates We scrape the Twitter feeds of 19 candidates, including all tweets between January 1, 2019 and February 27, 2020. We do not include Andrew Yang, Jay Inslee, and Marianne Williamson, since it is difficult to define the political preferences of non-traditional or single-issue candidates. We follow the same preprocessing we used for the 114th Senate, except we include tokens that are used in more than 0.05% of documents rather than 0.1%. We remove phrases used by only one candidate, along with stopwords and candidate names. This preprocessing leaves us with 45,927 tweets for the 19 candidates. We use the same model and hyperparameters as for senator tweets.

Table 4. The TBIP learns ideal points most similar to vote ideal points for U.S. senator speeches and tweets. It learns closer ideal points than Wordfish and Wordshoal in terms of both correlation (Corr.) and Spearman's rank correlation (SRC). The numbers in the column titles refer to the Senate session of the corpus. Wordshoal cannot be applied to tweets because there are no debate labels.

C Comparison to DW-NOMINATE
DW-NOMINATE (Poole, 2005) is a dynamic method for learning ideal points from votes. As opposed to the vote ideal point model in Equation (1), it analyzes votes across multiple Senate sessions. It also learns two latent dimensions per legislator. We compare text ideal points to the first dimension of DW-NOMINATE, which corresponds to economic/redistributive preferences (Lewis et al., 2020). We use the fitted ideal points available on Voteview (Lewis et al., 2020). The TBIP learns ideal points closer to DW-NOMINATE than Wordfish and Wordshoal do; see Table 4. In Section 5, we observed that Bernie Sanders' vote ideal point is somewhat moderate under the scalar ideal point model from Equation (1). It is worth noting that Sanders' vote ideal point is more extreme under DW-NOMINATE than under the scalar model: his DW-NOMINATE ideal point is the third most extreme among Democrats. Since DW-NOMINATE uses two dimensions to model each legislator's latent preferences, it can more flexibly model Sanders' voting deviations. Additionally, the dynamic nature of DW-NOMINATE may capture salient information from other Senate sessions. However, restricting the vote ideal point to be static and scalar, as it is for the comparison with the TBIP, results in the more moderate vote ideal point in Section 5.