Text-based inference of moral sentiment change

We present a text-based framework for investigating moral sentiment change of the public via longitudinal corpora. Our framework is based on the premise that language use can inform people's moral perceptions of right and wrong, and we build our methodology by exploring moral biases learned from diachronic word embeddings. We demonstrate how a parameter-free model supports the inference of historical shifts in moral sentiment toward concepts such as slavery and democracy over centuries, at three incremental levels: moral relevance, moral polarity, and fine-grained moral dimensions. We apply this methodology to visualizing the moral time courses of individual concepts and to analyzing the relations between psycholinguistic variables and rates of moral sentiment change at scale. Our work offers opportunities for applying natural language processing toward characterizing moral sentiment change in society.


Moral sentiment change and language
People's moral sentiment, that is, our feelings about what is right or wrong, can change over time. For instance, the public's views on slavery have shifted substantially over the past centuries (Oldfield, 2012). How society's moral views evolve has been a long-standing issue and a constant source of controversy, subject to interpretation by social scientists, historians, and philosophers, among others. Here we ask whether natural language processing has the potential to inform the study of moral sentiment change in society at scale, with minimal human labour or intervention.
The topic of moral sentiment has thus far been considered a traditional inquiry in philosophy (Hume, 1739; Smith, 1759; Kant, 1785), with contemporary development of this topic represented in social psychology (Piaget, 1932; Kohlberg, 1969; Stigler et al., 1990; Fiske and Taylor, 1991; Pizarro and Bloom, 2003), cognitive linguistics (Lakoff, 1996), and more recently, the advent of Moral Foundations Theory (Haidt and Joseph, 2004; Haidt et al., 2007; Graham et al., 2013). Despite the fundamental importance and interdisciplinarity of this topic, large-scale formal treatment of moral sentiment, particularly its evolution, is still in its infancy in the natural language processing (NLP) community (see overview in Section 2).
We believe that there is tremendous potential to bring NLP methodologies to bear on the problem of moral sentiment change. We build on extensive recent work showing that word embeddings reveal implicit human biases (Bolukbasi et al., 2016; Caliskan et al., 2017) and social stereotypes (Garg et al., 2018). Differing from this existing work, we demonstrate that moral sentiment change can be revealed by moral biases implicitly learned from diachronic text corpora. Accordingly, we present, to our knowledge, the first text-based framework for probing moral sentiment change at a large scale, with support for different levels of analysis concerning moral relevance, moral polarity, and fine-grained moral dimensions. For any query item such as slavery, our goal is to automatically infer its moral trajectory from sentiments at each of these levels over a long period of time.
Our approach is based on the premise that people's moral sentiments are reflected in natural language, and more specifically, in text (Bloom, 2010). In particular, we know that books are highly effective tools for conveying moral views to the public. For example, Uncle Tom's Cabin (Stowe, 1852) was central to the anti-slavery movement in the United States. The framework that we develop builds on this premise to explore changes in moral sentiment reflected in longitudinal or historical text.

Figure 1: Illustration of moral sentiment change over the past two centuries. Moral sentiment trajectories of three probe concepts, slavery, democracy, and gay, are shown in moral sentiment embedding space through a 2D projection from Fisher's discriminant analysis with respect to seed words from the classes of moral virtue, moral vice, and moral irrelevance. Parenthesized items represent moral categories predicted to be most strongly associated with the probe concepts. Gray markers represent the fine-grained centroids (or anchors) of these moral classes.

Figure 1 offers a preview of our framework by visualizing the evolution trajectories of the public's moral sentiment toward concepts signified by the probe words slavery, democracy, and gay. Each of these concepts illustrates a piece of "moral history" tracked through a period of 200 years (1800 to 2000), and our framework is able to capture nuanced moral changes. For instance, slavery initially lies at the border of moral virtue (positive sentiment) and vice (negative sentiment) in the 1800s, yet gradually moves toward the center of moral vice over the 200-year period; in contrast, democracy, considered morally negative (e.g., subversion and anti-authority under monarchy) in the 1800s, is now perceived as morally positive, as a mechanism for fairness; gay, which came to denote homosexuality only in the 1930s (Kay et al., 2019), is inferred to be morally irrelevant until the modern day.
We will describe systematic evaluations and applications of our framework that extend beyond these anecdotal cases of moral sentiment change.
The general text-based framework that we propose consists of a parameter-free approach that facilitates the prediction of public moral sentiment toward individual concepts, automated retrieval of morally changing concepts, and broad-scale psycholinguistic analyses of historical rates of moral sentiment change. We provide a description of the probabilistic models and data used, followed by comprehensive evaluations of our methodology.

Emerging NLP research on morality
An emerging body of work in natural language processing and computational social science has investigated how NLP systems can detect moral sentiment in online text. For example, moral rhetoric in social media and political discourse (Garten et al., 2016; Johnson and Goldwasser, 2018), the relation between moralization in social media and violent protests (Mooijman et al., 2018), and bias toward refugees in talk radio shows (Gillani and Levy, 2019) have been some of the topics explored in this line of inquiry. In contrast, the development of a formal framework for moral sentiment change is still under-explored, with no existing systematic and formal treatment of this topic (Bloom, 2010).
While there is emerging awareness of ethical issues in NLP (Hovy et al., 2017; Alfano et al., 2018), work exploiting NLP techniques to study principles of moral sentiment change is scarce. Moreover, since morality is variable across cultures and time (Graham et al., 2013; Bloom, 2010), developing systems that capture the diachronic nature of moral sentiment will be a pivotal research direction. Our work leverages and complements existing research that finds implicit human biases in word embeddings (Bolukbasi et al., 2016; Caliskan et al., 2017; Garten et al., 2016) by developing a novel perspective on using NLP methodology to discover principles of moral sentiment change in human society.

Figure 2: Illustration of the three-tier framework that supports moral sentiment inference at different levels.

A three-tier modelling framework
Our framework treats the moral sentiment toward a concept at three incremental levels, as illustrated in Figure 2. First, we consider moral relevance, distinguishing between morally irrelevant and morally relevant concepts. At the second tier, moral polarity, we further split morally relevant concepts into those that are positively or negatively perceived in the moral domain. Finally, a third tier classifies these concepts into fine-grained categories of human morality.
We draw from research in social psychology to inform our methodology, most prominently Moral Foundations Theory (MFT; Graham et al., 2013). MFT seeks to explain the structure and variation of human morality across cultures, and proposes five moral foundations: Care / Harm, Fairness / Cheating, Loyalty / Betrayal, Authority / Subversion, and Sanctity / Degradation. Each foundation is summarized by a positive and a negative pole, resulting in ten fine-grained moral categories.

Lexical data for moral sentiment
To ground moral sentiment in text, we leverage the Moral Foundations Dictionary (MFD; Graham et al., 2009). The MFD is a psycholinguistic resource that associates each MFT category with a set of seed words, which are words that provide evidence for the corresponding moral category in text. We use the MFD for moral polarity classification by dividing seed words into positive and negative sets, and for fine-grained categorization by splitting them into the 10 MFT categories.
To implement the first tier of our framework and detect moral relevance, we complement our morally relevant seed words with a corresponding set of seed words approximating moral irrelevance, based on the notion of valence, i.e., the degree of pleasantness or unpleasantness of a stimulus. We refer to the emotional valence ratings collected by Warriner et al. (2013) for approximately 14,000 English words, and choose the words with the most neutral valence ratings that do not occur in the MFD as our set of morally irrelevant seed words, yielding an equal total number of morally relevant and morally irrelevant words.
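As a concrete sketch, the neutral-valence selection described above might look as follows. The function name and the dict-based input format are illustrative, not the authors' implementation; Warriner et al.'s ratings use a 1-9 scale, so 5.0 serves as the neutral midpoint:

```python
def neutral_seed_words(valence, mfd_words, n_seeds):
    """Pick the n_seeds words whose valence is closest to the neutral
    midpoint of the 1-9 rating scale, excluding MFD entries.

    valence   -- dict mapping word -> mean valence rating (hypothetical
                 input format for the Warriner et al. (2013) norms)
    mfd_words -- set of morally relevant seed words from the MFD
    n_seeds   -- number of irrelevant seeds, matched to the MFD size
    """
    candidates = [(abs(v - 5.0), w) for w, v in valence.items()
                  if w not in mfd_words]
    candidates.sort()  # smallest distance from the neutral midpoint first
    return [w for _, w in candidates[:n_seeds]]
```

The sort-by-distance step makes the "most neutral" criterion explicit: ties aside, the returned list is exactly the n_seeds words nearest valence 5.0 outside the MFD.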

Models
We propose and evaluate a set of probabilistic models to classify concepts in the three tiers of morality specified above. Our models exploit the semantic structure of word embeddings (Mikolov et al., 2013) to perform tiered moral classification of query concepts. In each tier, the model receives a query word embedding vector q and a set of seed words for each class in that tier, and infers the posterior probabilities over the set of classes c with which the query concept may be associated.
The seed words function as "labelled examples" that guide the moral classification of novel concepts, and are organized per classification tier as follows. In moral relevance classification, sets S_0 and S_1 contain the morally irrelevant and morally relevant seed words, respectively; for moral polarity, S_+ and S_- contain the positive and negative seed words; and for fine-grained moral categories, S_1, . . . , S_10 contain the seed words for the 10 categories of MFT. Then our general problem is to estimate p(c | q), where q is a query vector and c is a moral category in the desired tier.
We evaluate the following four models:
• A Centroid model summarizes each set of seed words by its expected vector in embedding space, and classifies concepts into the class of the closest expected embedding in Euclidean distance, following a softmax rule;
• A Naïve Bayes model considers both mean and variance, under the assumption of independence among embedding dimensions, by fitting a normal distribution with a mean vector and diagonal covariance matrix to the set of seed words of each class;
• A k-Nearest Neighbors (kNN) model exploits local density estimation and classifies concepts according to the majority vote of the k seed words closest to the query vector;
• A Kernel Density Estimation (KDE) model performs density estimation at a broader scale by considering the contribution of each seed word toward the total likelihood of each class, regulated by a bandwidth parameter h that controls the sensitivity of the model to distance in embedding space.
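As an illustration, the Centroid model's softmax rule over class centroids can be sketched as below (function and variable names are ours, not the paper's code):

```python
import numpy as np

def centroid_posterior(q, seed_sets):
    """Centroid model: p(c | q) via a softmax over negative Euclidean
    distances from the query vector q to each class centroid.

    seed_sets maps class label -> array of seed-word embeddings
    (shape [num_seeds, dim]); the input format is illustrative.
    """
    labels = list(seed_sets)
    # one centroid (mean embedding) per class
    centroids = np.stack([seed_sets[c].mean(axis=0) for c in labels])
    dists = np.linalg.norm(centroids - q, axis=1)
    # numerically stable softmax over negative distances
    logits = -dists - np.max(-dists)
    scores = np.exp(logits)
    probs = scores / scores.sum()
    return dict(zip(labels, probs))
```

The same interface (query vector in, class posterior out) accommodates the other three models by swapping the per-class score: a fitted diagonal Gaussian likelihood (Naïve Bayes), a k-nearest-seed vote (kNN), or a kernel-smoothed likelihood (KDE).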

Historical corpus data
To apply our models diachronically, we require a word embedding space that captures the meanings of words at different points in time and reflects changes pertaining to a particular word as diachronic shifts in a common embedding space. Following Hamilton et al. (2016), we combine skip-gram word embeddings (Mikolov et al., 2013) trained on longitudinal corpora of English with rotational alignments of embedding spaces to obtain diachronic word embeddings that are aligned through time.
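The rotational alignment used by Hamilton et al. (2016) is an orthogonal Procrustes problem, solvable in closed form via SVD. A minimal sketch, assuming the two embedding matrices share the same vocabulary in the same row order:

```python
import numpy as np

def align_embeddings(base, other):
    """Orthogonal Procrustes alignment of `other` onto `base`:
    solve min_Q ||other @ Q - base||_F over orthogonal matrices Q.
    Rows of the two matrices must correspond to the same words.
    """
    # cross-covariance between the two embedding spaces
    m = other.T @ base
    u, _, vt = np.linalg.svd(m)
    q = u @ vt  # optimal orthogonal map (rotation/reflection)
    return other @ q
```

Applying this decade-to-decade (each decade aligned to a reference space) yields embeddings comparable through time, so that a word's movement reflects semantic change rather than arbitrary differences between independently trained spaces.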
We divide historical time into decade-long bins, and use two sets of embeddings provided by Hamilton et al. (2016), each trained on a different historical corpus of English:
• Google N-grams (Lin et al., 2012): a corpus of 8.5 × 10^11 tokens collected from English literature (Google Books, all genres) spanning the period 1800-1999.
• COHA (Davies, 2010): a smaller corpus of 4.1 × 10^8 tokens from works selected so as to be genre-balanced and representative of American English in the period 1810-2009.

Model evaluations
We evaluated our models in two ways: classification of moral seed words on all three tiers (moral relevance, polarity, and fine-grained categories), and correlation of model predictions with human judgments.

Moral sentiment inference of seed words
In this evaluation, we assessed the ability of our models to classify the seed words that compose our moral environment in a leave-one-out classification task. We performed the evaluation for all three classification tiers: 1) moral relevance, where seed words are split into morally relevant and morally irrelevant; 2) moral polarity, where moral seed words are split into positive and negative; 3) fine-grained categories, where moral seed words are split into the 10 MFT categories. In each test, we removed one seed word from the training set at a time to obtain cross-validated model predictions. Table 2 shows classification accuracy for all models and corpora on each tier for the 1990-1999 period. We observe that all models perform substantially better than chance, confirming the efficacy of our methodology in capturing the moral dimensions of words. We also observe that models using word embeddings trained on Google N-grams perform better than those trained on COHA, which is to be expected given the larger size of the former corpus.
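The leave-one-out protocol above can be sketched as follows; the `classify` callback stands in for any of the four models, and all names are illustrative:

```python
import numpy as np

def loo_accuracy(seed_sets, classify):
    """Leave-one-out accuracy over labelled seed-word embeddings.

    seed_sets -- dict: class label -> array of seed vectors
    classify  -- function (query_vector, seed_sets) -> predicted label,
                 e.g., a Centroid-style classifier
    """
    correct = total = 0
    for label, vecs in seed_sets.items():
        for i in range(len(vecs)):
            held_out = vecs[i]
            # retrain on all seeds except the held-out one
            train = dict(seed_sets)
            train[label] = np.delete(vecs, i, axis=0)
            correct += classify(held_out, train) == label
            total += 1
    return correct / total
```

Running this once per tier (relevance, polarity, fine-grained) and per decade's embeddings reproduces the structure of the evaluation, with chance level given by the class proportions in each tier.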
In the remaining analyses, we employ the Centroid model, which offers competitive accuracy and a simple, parameter-free specification.

Alignment with human valence ratings
Table 2: Classification accuracy of moral seed words for moral relevance, moral polarity, and fine-grained moral categories, based on 1990-1999 word embeddings for two independent corpora, Google N-grams and COHA.

We evaluated the approximate agreement between our methodology and human judgments using valence ratings, i.e., the degree of pleasantness or unpleasantness of a stimulus. Our assumption is that the valence of a concept should correlate with its perceived moral polarity, e.g., morally repulsive ideas should evoke an unpleasant feeling. However, we do not expect this correspondence to be perfect; for example, the concept of dessert evokes a pleasant reaction without being morally relevant.

Table 3: Correlation between human valence ratings and model predictions of positive moral polarity, per corpus.

Corpus          Correlation
Google N-grams  0.43 (n = 12293; p < 0.0001)
COHA            0.38 (n = 7141; p < 0.0001)
In this analysis, we took the valence ratings for the nearly 14,000 English words collected by Warriner et al. (2013) and, for each query word q, generated a corresponding prediction of positive moral polarity from our model, p(c_+ | q). Table 3 shows the correlations between human valence ratings and the predictions of positive moral polarity generated by models trained on each of our corpora. We observe that the correlations are significant, suggesting the ability of our methodology to capture relevant features of moral sentiment from text.
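The comparison reduces to a Pearson correlation between two aligned arrays, one entry per shared word; a minimal sketch (names are ours):

```python
import numpy as np

def valence_polarity_correlation(valence, positive_prob):
    """Pearson correlation between human valence ratings and model
    predictions p(c_+ | q), aligned over the same word list.

    valence       -- sequence of human valence ratings
    positive_prob -- sequence of model positive-polarity probabilities
    """
    v = np.asarray(valence, dtype=float)
    p = np.asarray(positive_prob, dtype=float)
    return np.corrcoef(v, p)[0, 1]
```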
In the remaining applications, we use the diachronic embeddings trained on the Google N-grams corpus, which yielded superior model performance throughout our evaluations.

Applications to diachronic morality
We applied our framework in three ways: 1) evaluation of selected concepts in historical time courses and prediction of human judgments; 2) automatic detection of moral sentiment change; and 3) broad-scale study of the relations between psycholinguistic variables and historical change of moral sentiment toward concepts.

Moral change in individual concepts
Historical time courses. We applied our models diachronically to predict time courses of moral relevance, moral polarity, and fine-grained moral categories toward two historically relevant topics: slavery and democracy. By grounding our model in word embeddings for each decade and querying concepts at the three tiers of classification, we obtained the time courses shown in Figure 3.
We note that these trajectories illustrate actual historical trends. Predictions for democracy show a trend toward morally positive sentiment, consistent with the adoption of democratic regimes in Western societies. On the other hand, predictions for slavery trend down and suggest a drop around the 1860s, coinciding with the American Civil War. We also observe changes in the dominant fine-grained moral categories, such as the perception of democracy as a fair concept, suggesting potential mechanisms behind the polarity changes and providing further insight into the public sentiment toward these concepts as evidenced by text.
Prediction of human judgments. We explored the predictive potential of our framework by comparing model predictions with human judgments of moral relevance and acceptability. We used data from the Pew Research Center's 2013 Global Attitudes survey (Pew Research Center, 2013), in which participants from 40 countries judged 8 topics such as abortion and homosexuality as one of "acceptable", "unacceptable", and "not a moral issue".
We compared human ratings with model predictions at two tiers: for moral relevance, we paired the proportion of "not a moral issue" human responses with irrelevance predictions p(c_0 | q) for each topic, and for moral acceptability, we paired the proportion of "acceptable" responses with positive predictions p(c_+ | q). We used 1990s word embeddings, and obtained predictions for two-word topics by querying the model with their averaged embeddings. Figure 4 shows plots of relevance and polarity predictions against survey proportions, and we observe a visible correspondence between model predictions and human judgments despite the difficulty of this task and the limited number of topics.

Table 5: Top 10 words changing toward morally positive (upper panel) and negative (lower panel) polarities, with the model-inferred most representative moral categories during historical and modern periods, and the switching periods. *, **, and *** denote p < 0.05, p < 0.001, and p < 0.0001, all Bonferroni-corrected for multiple tests.

Retrieval of morally changing concepts
Beyond analyzing selected concepts, we applied our framework predictively on a large repertoire of words to automatically discover the concepts that have exhibited the greatest change in moral sentiment at two tiers, moral relevance and moral polarity.
We selected the 10,000 nouns with the highest total frequency in the 1800-1999 period according to data from Hamilton et al. (2016), using WordNet (Miller, 1995) for validation. For each such word q, we computed diachronic moral relevance scores R_i = p(c_1 | q), i = 1, . . . , 20 for the 20 decades in our time span. Then, we performed a linear regression of R on T = 1, . . . , n and took the fitted slope as a measure of moral relevance change. We repeated the same procedure for moral polarity. Finally, we removed words with an average relevance score below 0.5 to focus on morally relevant retrievals.

Figure 4: Model predictions against the percentage of Pew respondents who selected "Not a moral concern" (left) or "Acceptable" (right), with lines of best fit and Pearson correlation coefficients ρ shown in the background.

Table 4 shows the words with the steepest predicted change toward moral relevance, along with their predicted fine-grained moral categories in modern times (i.e., 1900-1999). Table 5 shows the words with the steepest predicted change toward the positive and negative moral poles. To further investigate the moral sentiment that may have led to such polarity shifts, we also show the predicted fine-grained moral categories of each word at its earliest time of predicted moral relevance and in modern times. Although we do not have access to ground truth for this application, these results offer initial insight into the historical moral landscape of the English language at scale.
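The slope-based measure of change can be sketched as a simple linear fit over the per-decade relevance (or polarity) scores; names are illustrative, and the paper's exact preprocessing may differ:

```python
import numpy as np

def relevance_slope(scores):
    """Fit R_i = a * T_i + b over decades T = 1..n and return the
    slope a, used as the rate of change toward moral relevance.

    scores -- per-decade sequence of p(c_1 | q) for one query word
    """
    t = np.arange(1, len(scores) + 1)
    # np.polyfit returns coefficients highest degree first
    slope, _intercept = np.polyfit(t, scores, deg=1)
    return slope
```

Ranking all candidate words by this slope (after filtering on average relevance) then yields the retrieved lists shown in Tables 4 and 5.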

Broad-scale investigation of moral change
In this application, we investigated the hypothesis that concept concreteness is inversely related to change in moral relevance, i.e., that concepts considered more abstract might become morally relevant at a higher rate than concepts considered more concrete. To test this hypothesis, we performed a multiple linear regression analysis on rate of change toward moral relevance of a large repertoire of words against concept concreteness ratings, word frequency (a correlate of semantic change; see Hamilton et al., 2016), and word length (as a proxy for concept complexity; see Lewis and Frank, 2016).
We obtained norms of concreteness ratings from Warriner et al. (2013). We collected the same set of high-frequency nouns as in the previous analysis, along with their fitted slopes of moral relevance change. Since we were interested in moral relevance change within this large set of words, we restricted our analysis to those words whose model predictions indicate change in moral relevance, in either direction, from the 1800s to the 1990s.
We performed a multiple linear regression under the following model:

ρ(w) = β_f f(w) + β_l l(w) + β_c c(w) + β_0 + ε    (1)

Here ρ(w) is the slope of moral relevance change for word w; f(w) is its average frequency; l(w) is its character length; c(w) is its concreteness rating; β_f, β_l, β_c, and β_0 are the corresponding factor weights and intercept, respectively; and ε ∼ N(0, σ) is the regression error term.

Table 6 shows the results of the multiple linear regression. We observe that concreteness is a significant negative predictor of change toward moral relevance, suggesting that abstract concepts are more strongly associated with increasing moral relevance over time than concrete concepts. This significance persists under a partial correlation test against the control factors (p < 0.01). We further verified the diachronic component of this effect in a random permutation analysis. We generated 1,000 control time courses by randomly shuffling the 20 decades in our data, and repeated the regression analysis to obtain a control distribution for each regression coefficient. All effects became non-significant under the shuffled condition, supporting the relevance of concept concreteness to diachronic change in moral sentiment (see Supplementary Material).
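The regression of Eq. (1) amounts to ordinary least squares with an intercept column; a minimal sketch under that reading (function name and input format are ours):

```python
import numpy as np

def regress_moral_change(slopes, freq, length, concrete):
    """Ordinary least squares for the model in Eq. (1):
    rho(w) = b_f * f(w) + b_l * l(w) + b_c * c(w) + b_0 + error.

    Inputs are per-word arrays (illustrative; the paper's exact
    preprocessing may differ). Returns (b_f, b_l, b_c, b_0).
    """
    X = np.column_stack([freq, length, concrete, np.ones(len(slopes))])
    coef, *_ = np.linalg.lstsq(X, np.asarray(slopes), rcond=None)
    return tuple(coef)
```

The sign and significance of the concreteness weight b_c is the quantity of interest here; a standard OLS package would additionally report standard errors and p-values for each coefficient.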

Discussion and conclusion
We presented a text-based framework for exploring the socio-scientific problem of moral sentiment change. Our methodology uses minimal parameters and exploits implicit moral biases learned from diachronic word embeddings to reveal the public's moral perception toward a large concept repertoire over a long historical period.
Differing from existing work in NLP that treats moral sentiment as a flat classification problem (Garten et al., 2016; Johnson and Goldwasser, 2018), our framework probes moral sentiment change at multiple levels and captures moral dynamics concerning relevance, polarity, and fine-grained categories informed by Moral Foundations Theory (Graham et al., 2013). We applied our methodology to automated analyses of moral change both in individual concepts and at a broad scale, thus providing insights into the psycholinguistic variables that are associated with rates of moral change in the public.
Our current work focuses on exploring moral sentiment change in English-speaking cultures. Future research should evaluate the appropriateness of the framework for probing moral change across a diverse range of cultures and linguistic backgrounds, and the extent to which moral sentiment change interacts with linguistic meaning change and lexical coinage. Our work creates opportunities for applying natural language processing toward characterizing moral sentiment change in society.