Predicting the Growth of Morphological Families from Social and Linguistic Factors

We present the first study that examines the evolution of morphological families, i.e., sets of morphologically related words such as “trump”, “antitrumpism”, and “detrumpify”, in social media. We introduce the novel task of Morphological Family Expansion Prediction (MFEP) as predicting the increase in the size of a morphological family. We create a ten-year Reddit corpus as a benchmark for MFEP and evaluate a number of baselines on this benchmark. Our experiments demonstrate very good performance on MFEP.


Introduction
Lexical change is a prime indicator of topical dynamics in social media. When people or events attract the attention of a user community, this is reflected by the token frequency evolution of individual words. The burst in token frequency of the word "trump" in social media before the 2016 presidential election (see Figure 1), e.g., mirrors the increasing presence of Donald Trump in public discourse during that time.
However, token frequency is only one way of measuring changes in topical prominence. Accompanying the increase in token frequency of "trump", there was a parallel increase in the number of words morphologically related to "trump", i.e., words like "trumpification", "antitrumpism", and "detrumpify" (see Figure 1, Table 1). Most of these words have a very low token frequency and are removed in the first steps of a typical NLP pipeline.
Contributions. We introduce the novel task of Morphological Family Expansion Prediction (MFEP), which aims at predicting whether a morphological family will increase in size or not. We publish a benchmark for MFEP and show that the growth of morphological families can be successfully modeled using social and linguistic factors relating to the morphological parent. Furthermore, our results add a new perspective to the growing body of research on the link between cultural and linguistic change in social media. 1

Morphological Families
We define a morphological family as a set F of words w with a shared free morpheme. Thus, "trump", "trumpification", "antitrumpism", and "detrumpify" are in the same morphological family because they share the free morpheme "trump". By contrast, "antitrumpism" and "antiprogressivism" are in different morphological families: even though both words have two morphemes in common ("anti" and "ism"), they do not belong to the same morphological family according to our definition, since "anti" and "ism" are bound morphemes, not free morphemes as in the case of "trump", "trumpification", "antitrumpism", and "detrumpify". In this study, we only consider derivational morphology. 2 Compounds such as "trumpwall" are not split into their parts, but they can become a parent (see below). Thus, each word belongs to exactly one morphological family. The cardinality |F| of a family will be referred to as the morphological family size, a term also used in other studies (Schreuder and Baayen, 1997; de Jong et al., 2000; del Prado Martín et al., 2004).
The morphological parent w * is the morphologically most basic word of a family F . The word "trump" is the parent of "antitrumpism", "trumpification", and "detrumpify". We denote the morphological family of w * as F (w * ).
Except for the parent, all members of a family are morphological children and form a subset F̄ of the entire family F. The words "antitrumpism", "trumpification", and "detrumpify" are all morphological children. We further distinguish between old children F̄_o, established words in the lexicon, and new children F̄_n, innovative forms. While "trumpify" can still be considered a new child of "trump", "trumpster" is on its way to becoming an old child in the family. 3
As the example of "trumpster" shows, morphological families are in constant flux. Specifically, there are three types of change in morphological families: word birth (∅ → F̄_n), word entrenchment (F̄_n → F̄_o), and word death (F̄_n, F̄_o → ∅). The topic of this paper is word birth, i.e., we ask: given a set F of morphological families, which will increase in size during a specified time interval? This differentiates our work from previous research on lexical change in social media, which has focused on word entrenchment (see Section 7). One question we are particularly interested in is whether endogenous (language-internal) or exogenous (language-external) factors are better predictors of morphological family growth; these factors have been previously compared for changes in word token frequency (Altmann et al., 2011), but not for changes in morphological type frequency.

1 We make all our data and code publicly available at https://github.com/valentinhofmann/mfep.
2 The distinction between inflection and derivation is gradual, not binary (Haspelmath and Sims, 2010). The suffix "ly", e.g., is variously defined as inflectional or derivational (Bauer, 2019). We try to exclude inflectional morphology as far as possible (e.g., by lemmatizing), but we are aware that a clear separation does not exist in linguistic reality.
3 "Trumpster" is already listed in the English Wiktionary at https://en.wiktionary.org/wiki/Trumpster.

Experimental Data
We develop MFEP using Reddit, a social media platform hosting discussions about a variety of topics. Reddit is divided into smaller communities centered around a shared interest, so-called subreddits (SRs), which are highly conducive to linguistic innovation (del Tredici and Fernández, 2018). Concretely, we draw upon the Baumgartner Reddit Corpus, a collection of (almost) all publicly available comments posted on Reddit since 2005. 4 A three-year slice of this corpus was used in a study on lexical change by Stewart and Eisenstein (2018). Gaffney and Matias (2018) show that the corpus's coverage of Reddit is not complete, but we do not expect this to affect our analysis.
Our study examines data from 2007 to 2018 in the four SRs r/gaming, r/movies, r/nba, and r/politics. These SRs were chosen because they are of comparable size, belong to the largest SRs of Reddit, and at the same time all reflect distinct areas of interest (Table 2). For each month, we also draw a random sample of comments from all SRs that will be used for computing word topicality (Section 4). The size of the sample equals the average size of the four selected SRs within the respective month.
Preprocessing. As in previous work (Tan and Lee, 2015), we filter posts for known bots and spammers. We remove abbreviations ("viz."), strings containing numbers ("b4"), references to users ("u/user") and SRs ("r/subreddit"), and both  full and shortened hyperlinks. We convert British English spelling variants to American English and lemmatize all words to remove inflectional morphology. We follow Han and Baldwin (2011) in reducing repetitions of more than three letters ("niiiiice") to three letters. Except for stopwords, we do not employ a frequency threshold; in particular, we include words that occur only once.
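The token-level cleanup steps above can be sketched as follows; the regular expressions and the helper name are illustrative assumptions, not the exact pipeline used in the study (lemmatization and British-to-American normalization are omitted):

```python
import re

def preprocess_token(token):
    """Apply the cleanup rules described above to a single token.

    Returns None for tokens that are removed entirely.
    """
    # Remove references to users ("u/user") and subreddits ("r/subreddit").
    if re.match(r"^(u|r)/", token):
        return None
    # Remove strings containing numbers ("b4") and hyperlinks.
    if re.search(r"\d", token) or token.startswith(("http", "www.")):
        return None
    # Remove abbreviations ending in a period ("viz.").
    if re.match(r"^[a-z]+\.$", token):
        return None
    # Reduce repetitions of more than three letters to three ("niiiiice" -> "niiice").
    return re.sub(r"(.)\1{3,}", r"\1\1\1", token)
```

For example, `preprocess_token("niiiiice")` yields `"niiice"`, while `"b4"` and `"u/someone"` are dropped.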
Computing morphological families. Given a collection of texts S, we define the morphological families as follows. Let V_S be the vocabulary of S, i.e., all words occurring in it. We define the set of parents O_S ⊂ V_S as the 1,000 most frequent words in S, regardless of whether the word is decomposable or not. This means that parents are not necessarily morphological roots (Haspelmath and Sims, 2010). 5 We attempt to segment all other words w using affixes from a representative list of productive prefixes and suffixes in English (Crystal, 1997). We define the set C of candidate parents of w as follows. If w ∈ O_S, then C(w) = {w}. Otherwise, C(w) = ∪_{b ∈ B(w)} C(b), where B(w) is the set of bases that remain when one of w's affixes is removed. For w* ∈ O_S, we then define its morphological family as

F(w*) = {w ∈ V_S : C(w) = {w*}}.

Procedurally, families can be identified by a recursive bottom-up algorithm. The algorithm is sensitive to morpho-orthographic rules of English (Plag, 2003); e.g., when "ness" is removed from "trumpiness", the result is "trumpy", not "trumpi".

5 In a situation where both "sense" and "sensation", e.g., fall above the frequency threshold, we get two separate morphological families: the parent "sense" (a root) with "nonsense", "sensitive", etc., and the parent "sensation" (not a root) with "sensational", "sensationalism", etc. (the children of "sensation" are not added to the family of "sense"). However, most morphological parents are in fact roots.
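A minimal sketch of this recursive bottom-up procedure, assuming a toy affix inventory and a toy parent set (the actual implementation uses the 1,000 most frequent words as parents and a comprehensive affix list):

```python
# Toy affix inventory; the real list covers all productive English affixes.
PREFIXES = ["anti", "de", "non"]
SUFFIXES = ["ification", "ness", "ify", "ism", "er", "y"]

def bases(word):
    """B(w): bases left after removing one affix, with a simple
    morpho-orthographic repair ("trumpi" -> "trumpy")."""
    out = set()
    for p in PREFIXES:
        if word.startswith(p) and len(word) > len(p):
            out.add(word[len(p):])
    for s in SUFFIXES:
        if word.endswith(s) and len(word) > len(s):
            base = word[: -len(s)]
            out.add(base)
            if base.endswith("i"):  # morpho-orthographic rule
                out.add(base[:-1] + "y")
    return out

def candidate_parents(word, parents):
    """C(w): {w} if w is a parent, else the union of C(b) over b in B(w)."""
    if word in parents:
        return {word}
    result = set()
    for b in bases(word):
        result |= candidate_parents(b, parents)
    return result

def family(parent, vocab, parents):
    """F(w*): all words whose candidate parent set is exactly {w*}."""
    return {w for w in vocab if candidate_parents(w, parents) == {parent}}
```

With `parents = {"trump"}`, the family of "trump" in a vocabulary containing "trumpiness", "antitrumpism", and "detrumpify" covers all three derivatives, since each strips down to the single candidate parent "trump".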

Predictors for MFEP
In a setup similar to Altmann et al. (2011), we formalize MFEP as a binary classification task. Given a context interval i (c) , a following temporally adjacent probe interval i (p) , and a morphological parent w * , we ask: what properties of w * can predict whether |F (w * )| increases in i (p) or not?
Here, we set the length of i (c) to 18 months and the length of i (p) to 6 months. Morphological families are computed separately for each pair of i (c) and i (p) . The lowest frequency count of a parent in i (c) is 244. Table 2 summarizes statistics of the morphological families for each SR.
We define a number of predictors for family expansion that are measurements of properties of w*. All predictors are motivated by work in psycholinguistics and NLP. They fall into three natural classes: (i) a type frequency-based predictor (|F̄|), (ii) token frequency-based predictors (f_r, z̄^(p)), and (iii) dissemination-based predictors (D^U_w, D^T_w, Q_w). All predictors except for z̄^(p) (which is measured in i^(c) and i^(p)) are measured in i^(c).
Family size |F |. The family size is a prime example of an endogenous (language-internal) factor, i.e., one that depends on the linguistic system. A morpheme with a large family might combine more readily with new affixes than a morpheme that occurs only with a small number of affixes. This idea bears a theoretical connection to smoothing techniques such as Witten-Bell and Kneser-Ney smoothing, which model the probability of previously unseen n-grams containing a given word (≈ |F n |) by assuming a rich-get-richer process (Manning and Schütze, 1999;Teh, 2006). It is also in line with lexical growth models based on preferential attachment (Steyvers and Tenenbaum, 2005). Intuitively, the fact that morphological children themselves can become the basis for new derivations also suggests a rich-get-richer process.
Notice that |F| is equivalent to the type frequency of w*. In linguistics, type frequency is known to be a good predictor of the productivity of inflectional patterns (Bybee, 1995). Furthermore, it has been shown that morphological family size facilitates lexical processing (Schreuder and Baayen, 1997). To probe whether type frequency also influences the likelihood that a family grows, we include the predictor |F̄|, the morphological family size averaged over the 18 months of i^(c).
Relative token frequency of parent f r . Frequent words are known to be more accessible in lexical processing than rare words (Jescheniak and Levelt, 1994). Therefore, they might be more available for use in novel derivations, causing an increase in morphological family size.
Trending behavior z̄^(p). Changes in the relative frequency of a morphological parent might be indicative of concomitant changes in the morphological family size. If a word gains in popularity and becomes more frequent, this could increase the chances of new morphologically related words being created. The trending behavior is a prime example of an exogenous (language-external) factor, i.e., one that depends on non-linguistic events (e.g., a presidential election). Therefore, we measure whether the parent increases in frequency: we calculate the z-score of the parent's relative frequency in each month of the probe interval i^(p) relative to the frequency distribution in the context interval i^(c) and use the mean of these z-scores as a continuous variable in the model,

z̄^(p) = (1 / |i^(p)|) Σ_{t ∈ i^(p)} (x_t − x̄^(c)) / σ^(c),

where |i^(p)| = 6 is the length in months of the probe interval, x_t is the relative frequency of the parent in month t, and x̄^(c) and σ^(c) are the mean and standard deviation of the relative frequency of the parent in the 18 months of the context interval i^(c). The measure detects increases in frequency relative to the intrinsic variation in usage frequency of a particular word. This is necessary since some words naturally exhibit stronger short-term fluctuations, which we do not want to count as frequency bursts. Similar methods for peak detection in time series are frequently used, e.g., in Baskozos et al. (2019). We use both i^(c) and i^(p) for calculating z̄^(p) because this captures the idea of exogenous forcing without any additional assumptions; notice that the metric is calculated on the parent only and does not include any information about what is being predicted in MFEP, namely changes in the morphological family size.
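As a sketch, the mean z-score can be computed directly from the parent's monthly relative frequencies; the function name and input format are assumptions:

```python
import statistics

def trending_z(freqs_context, freqs_probe):
    """Mean z-score of the probe-interval monthly frequencies, standardized
    by the mean and standard deviation of the context-interval frequencies."""
    mean_c = statistics.mean(freqs_context)
    sd_c = statistics.pstdev(freqs_context)
    return sum((f - mean_c) / sd_c for f in freqs_probe) / len(freqs_probe)
```

A parent whose probe-interval frequencies sit above its usual range yields a large positive value, while ordinary fluctuations around the context-interval mean yield values near zero.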
User dissemination D^U_w. Following findings by Church and Gale (1995) and Altmann et al. (2011), we define user dissemination D^U_w as the extent to which the number of users of a specific word w deviates from a Poisson process,

D^U_w = U_w / Ũ(f_w),

where U_w is the number of users who posted at least one comment including w in i^(c), f_w is the relative frequency of w in i^(c), and Ũ(f_w) is the expected number of users under a Poisson model given the relative frequency f_w. Ũ(f_w) can be calculated as

Ũ(f_w) = Σ_{j=1}^{N_U} Ũ_j ≈ Σ_{j=1}^{N_U} (1 − e^{−f_w m^U_j}),

where N_U is the number of users, Ũ_j is the probability that the posts of user j contain w at least once, and m^U_j is the total number of words posted by user j in i^(c). The approximation is valid for f_w ≪ 1 and m^U_j / Σ_{j=1}^{N_U} m^U_j ≪ 1. Our data satisfy both requirements.
User dissemination and the following dissemination measures have a cognitive justification: it has been shown that items that are used in more diverse situations and contexts are stored in human memory in a way that makes them more retrievable (Anderson and Milson, 1989;Brysbaert et al., 2016). Thus, words with a higher dissemination are more accessible to speakers and could figure more prominently among bases for new formations. The dissemination measures fall into a gray area between exogenous and endogenous factors since they reflect the cognitive representation of language-external properties (Altmann et al., 2011).
Thread dissemination D^T_w. Similar to user dissemination, thread dissemination D^T_w is defined as the extent to which the number of threads containing a specific word w deviates from a Poisson process (Altmann et al., 2011),

D^T_w = T_w / T̃(f_w),

where T_w is the number of threads that include at least one instance of w, and T̃(f_w) is the expected number of threads under a Poisson model. T̃(f_w) can again be calculated as

T̃(f_w) = Σ_{j=1}^{N_T} T̃_j ≈ Σ_{j=1}^{N_T} (1 − e^{−f_w m^T_j}),

where N_T, T̃_j, and m^T_j are defined analogously to N_U, Ũ_j, and m^U_j. The approximation is again valid since the data satisfy m^T_j / Σ_{j=1}^{N_T} m^T_j ≪ 1.

Topicality Q_w. Because SRs are communities centered around interests, words that are characteristic of a SR's topic are more frequent in that SR than in the others. Topicality has been shown to have an impact on lexical dynamics at long time scales (Church, 2000; Montemurro and Zanette, 2016). It could also influence the productivity of morphological families: higher topical dissemination, i.e., lower topicality, could facilitate growth. To capture this effect, we introduce a metric of topical distinctiveness, Q_w, which we define as

Q_w = f_w / f̃_w,

where f_w is the relative frequency of the word w in a SR in i^(c), and f̃_w is the expected relative frequency of w based on a random sample of posts from all SRs in i^(c). The polarity of Q_w is reversed relative to D^U_w and D^T_w: a word that is very clumped in SR space will have a high value of Q_w, but a word that is very clumped in user or thread space will have a low value of D^U_w or D^T_w, respectively.
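A sketch of the dissemination and topicality computations under the definitions above; the function names are assumptions, and thread dissemination D^T_w is computed identically to user dissemination with thread-level counts:

```python
import math

def user_dissemination(u_w, f_w, words_per_user):
    """D^U_w: observed number of users of w divided by the number expected
    under a Poisson model, U~(f_w) = sum_j (1 - exp(-f_w * m^U_j))."""
    expected = sum(1.0 - math.exp(-f_w * m_j) for m_j in words_per_user)
    return u_w / expected

def topicality(f_w, f_w_expected):
    """Q_w: ratio of the word's relative frequency in the SR to its expected
    relative frequency in the all-SR background sample."""
    return f_w / f_w_expected
```

A word used by fewer users than the Poisson baseline predicts (i.e., a "clumped" word) yields D^U_w < 1, while a word twice as frequent in the SR as in the background sample yields Q_w = 2.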

Experimental Setup
Finding growing families. We use two different notions of growth for MFEP: absolute growth and relative growth. We define absolute growth as

δ_a(F) = |F̄|^(p) − |F̄|^(c),

where |F̄|^(p) and |F̄|^(c) are the mean morphological family size in i^(p) and i^(c), respectively. Relative growth is defined similarly as

δ_r(F) = |F̄|^(p) / |F̄|^(c).

For both δ_a and δ_r, we define binary target variables based on thresholds l_a and l_r, i.e., we define a morphological family F to be a positive example if δ_a(F) > l_a for a pair of i^(p) and i^(c) (in the case of absolute growth). We thus train two models: one for predicting whether δ_a(F) > l_a, one for predicting whether δ_r(F) > l_r.
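The label construction can be sketched as follows, assuming the mean family sizes per interval have already been computed (the function name and default thresholds below mirror the values used later in the paper):

```python
def growth_labels(mean_size_context, mean_size_probe, l_a=2.4, l_r=1.6):
    """Binary MFEP targets: absolute growth is the difference of the mean
    family sizes in the probe and context intervals, relative growth their
    ratio; each is thresholded to a positive/negative label."""
    delta_a = mean_size_probe - mean_size_context
    delta_r = mean_size_probe / mean_size_context
    return delta_a > l_a, delta_r > l_r
```

For example, a family growing from a mean size of 2.0 to 5.0 is a positive example under both notions, while growth from 10.0 to 11.0 is negative under both.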
Model. We use Random Forests (RF) to perform the classification (Breiman, 2001). RFs offer two main advantages in comparison with other models. Firstly, as opposed to other tree-based models, RFs decorrelate trees, which is important if the features are correlated (as is the case here). Secondly, the feature importance scores of an RF provide a transparent way to compare the predictive power of features. We do not use more complex albeit potentially better performing methods such as deep architectures since our primary goal is to compare various features and show that MFEP is a feasible computational task.
Since the data contain considerably more negative than positive examples, we randomly sample one negative example for every positive example for the final data. The interval pairs from all SRs are merged into one dataset, which is then split 0.8/0.2 into train/dev and test sets. The train/dev set is split again 0.8/0.2 into train and dev sets. Thus, all sets contain a balanced sample of interval pairs from all SRs. 6 We use a total of 68,000 pairs of intervals (i^(c), i^(p)) where i^(c) is the context interval and i^(p) the probe interval (see also Table 2). Recall that i^(c) has a length of 18 months and i^(p) of 6 months. Temporally adjacent interval pairs are shifted by |i^(p)| months, i.e., every month in the original data is used exactly once in a probe interval and three times in a context interval.
We do not perform hyperparameter tuning and instead choose typical values for the hyperparameters of RF: 80 for the number of trees, and 20 for tree depth. For our initial MFEP models, we set the thresholds l a = 2.4 and l r = 1.6, two values in the mid-range of existing values for δ a and δ r . We will later analyze the sensitivity to these hyperparameters in greater detail.
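The setup can be sketched with scikit-learn's RandomForestClassifier, using the hyperparameter values above; the synthetic data and the feature layout are illustrative only:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Illustrative feature matrix: columns stand in for |F|, f_r, z(p), D^U_w, D^T_w, Q_w.
X = rng.normal(size=(1000, 6))
# Synthetic labels driven mostly by the first feature, with some noise.
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# Hyperparameters from the text: 80 trees, tree depth 20.
clf = RandomForestClassifier(n_estimators=80, max_depth=20, random_state=0)
clf.fit(X, y)

# Per-feature importance scores, used to compare the predictive power of features.
importances = clf.feature_importances_
```

Because the synthetic labels depend mostly on the first feature, its importance score dominates, mirroring how the feature importance analysis is read in the results.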

Results
Overall performance. As shown in Table 3, the RF models exhibit good performance, with an overall prediction accuracy of 80.9% for l_a = 2.4 and 70.8% for l_r = 1.6 (random baselines: 50.0%).

Table 4: Importance loadings for individual features. Importance loadings are calculated for different values of the thresholds l_a and l_r. Highest importance loadings for features are highlighted in gray, second-highest in light gray.
The strongest predictor for both models is type frequency with a feature importance of 39.3% and 25.3%, respectively (Table 4). Models trained only on this feature already achieve accuracies of 74.2% and 64.5%, respectively (Table 3). However, the effect of |F | is reversed: while larger morphological families have higher absolute growth values, which is in accordance with theories of lexical growth based on preferential attachment (Steyvers and Tenenbaum, 2005), smaller morphological families have higher relative growth rates. This can be explained by the observation that a large family needs much higher increases in family size to have the same relative growth rate as a small family. The fact that larger families are generally more likely to grow does not seem to counteract this imbalance.
We now analyze how different design choices impact the performance of our model.
Thresholds l_a and l_r. We systematically vary l_a and l_r in the ranges 0.0 ≤ l_a ≤ 4.8 and 1.0 ≤ l_r ≤ 2.2 to examine their influence on performance. 7 We find that accuracies for predicting larger δ_a and δ_r are considerably higher than for smaller increases (Table 3). For l_a = 4.8 (i.e., the family size increases by more than 4.8 members on average), the model has an error rate of 15.4%, less than half the error rate of 38.2% for l_a = 0.0. This striking result is in line with studies on the predictability of extreme events in social media (Miotto and Altmann, 2014) and statistical physics (Hallerberg and Kantz, 2008) showing that extreme events are generally more predictable than non-extreme events.
We then train single-feature models for varying l_a and l_r. The best predictor for all values of l_a and l_r is |F̄| (Table 3). The overall second-best predictor is z̄^(p), even though it is (sometimes only marginally) outperformed by f_r and D^T_w for several values of l_a and l_r.
To further analyze the relative importance of individual features, we examine the RF feature importance loadings for varying l_a and l_r (Table 4, Figure 2, Figure 3). While |F̄| is again the best predictor overall, z̄^(p) emerges much more clearly as the second-best predictor: especially for δ_r, its importance steadily increases with higher values of l_r and even surpasses |F̄| for l_r = 2.2.
These results indicate that while the family size is most predictive of morphological family growth in general, high growth rates are particularly likely for families of trending parents (most of which initially have small family sizes). An example is the burst in the "trump" family before the 2016 presidential election illustrated in Section 1 (Figure 1, Table 1). This would explain how small morphological families can grow in the first place given the overall dominating importance of a large family size: small families need exogenous forcing (Altmann et al., 2011), i.e., external events leading to a burst in token frequency and a subsequent increase in type frequency. In order to test this hypothesis, we retrain the model for δ_a on small families (1.5 ≤ |F̄| ≤ 1.6), varying 0.0 ≤ l_a ≤ 1 (which is the range of δ_a for families of that size); z̄^(p) has the highest feature importance for all values of l_a (Table 5). 8

f_r      .175  .187  .196  .193  .165  .190   .184 ± .011
z̄^(p)   .222  .216  .223  .253  .269  .234   .236 ± .019
D^U_w    .183  .198  .190  .181  .179  .190   .187 ± .006
D^T_w    .212  .196  .201  .192  .195  .233   .205 ± .014
Q_w      .208  .202  .190  .181  .192  .153   .188 ± .018

Table 5: Importance loadings for individual features with small families (1.5 ≤ |F̄| ≤ 1.6). Features as in Table 3. Importance loadings are calculated for different values of the threshold l_a. Highest importance loadings for features are highlighted in gray, second-highest in light gray.

Furthermore, it is interesting to note that the frequency-based as well as the dissemination-based measures are considerably clumped together in feature importance space for absolute growth δ_a, with the frequency-based predictors topping the dissemination-based ones. This is in line with recent work on the relative importance of frequency and social dissemination in lexical change (Stewart and Eisenstein, 2018).
Higher values in these features correlate with a higher likelihood of growth, except for topicality: here, growth is more likely with lower topicality, which as discussed above indicates higher topical dissemination.
Length of i (c) and i (p) . In previous experiments, the length of i (c) and i (p) was set to 18 and 6 months, respectively. We now analyze how the choice of the interval length, specifically of the length of i (c) , influences the performance of our MFEP model. We retrain the model for 0.0 ≤ l a ≤ 4.8 and 1.0 ≤ l r ≤ 2.2 with |i (c) | = 12 and |i (c) | = 24, i.e., the context intervals are six months shorter and longer than previously. The length of the probe interval is kept unchanged, |i (p) | = 6.
The performances of the two MFEP models are comparable to the model with |i^(c)| = 18 (Table 3). Both show top performance at l_a = 4.8 and l_r = 2.2. However, it is interesting to note that performance with |i^(c)| = 12 tends to be better than with |i^(c)| = 24 for large values of l_a and l_r, but worse for smaller values, suggesting that shorter context intervals have an advantage in predicting large increases in family size while longer context intervals have an advantage in predicting smaller increases.

New children. In our main study design, growth in the families is not necessarily due to new children being added to the family, i.e., due to an increase of |F̄_n|. A rare but established English word w ∈ F̄_o that occurs only a couple of times in the data counts as much toward growth as an innovative form. Here, we try to exclude fluctuations due to F̄_o by excluding all words in the data that are listed on a comprehensive list of English words encompassing over 400,000 word types, an independent estimate of established words. 9 Training the model on the resulting data, we find that accuracies tend to be higher for δ_a 10 but lower for δ_r than the corresponding accuracies on the full dataset (Table 3). This result indicates that our model is not only capable of forecasting the evolution of the entire family but also of predicting the birth of new morphological children.
Error analysis. The segmentation algorithm is doomed to produce a certain number of false positives. To get a clearer picture of its accuracy, we manually examine 500 randomly selected families from one month in the data. Macro-averaged over families, 8.8% of the words are errors, i.e., they do not belong to the morphological family assigned by the algorithm. However, the error rate is not distributed evenly: only 10 of the 500 families are responsible for more than 60% of the errors.
One frequent source of erroneous segmentations is incorrect orthography. The word "representatives", e.g., is frequently written as "represenatives" because it is pronounced without the consonant "t". The algorithm then segments "represenatives" into "re+pre+senate+ive+s" and adds it to F̄("senate"). Another frequent case is the erroneous segmentation of emphatic repetitions of vowels, e.g., "heyy" is segmented as "hey+y" and added to F̄("hey"). Such false positives are a major source of distortion in the data.

Related work
Morphological families and productivity. The concept of morphological families was introduced in psycholinguistic work on lexical processing (Schreuder and Baayen, 1997; de Jong et al., 2000; del Prado Martín et al., 2004). These studies show that response latencies in lexical decision are influenced not only by the token frequency of words but also by their type frequency, i.e., the size of their morphological family. Morphological families have also been used for analyzing lexical change on historical time scales (Keller and Schultz, 2013, 2014). However, this work is not comparable to our study since it relies on dictionaries, which typically exclude the transparently formed, non-entrenched words in F̄_n we are interested in (Bauer, 2001).
The main question of our study (how can we predict the growth of morphological families?) is related to a long-standing problem in traditional linguistic scholarship, i.e., what factors influence morphological productivity. The productivity of a morpheme is defined as its propensity to be used in novel combinations and traditionally understood to refer to bound morphemes (Haspelmath and Sims, 2010). Pierrehumbert and Granell (2018) highlight the fact that morphological productivity, just as morphology (Rácz et al., 2015) and other components of language (Labov, 1963) in general, is heavily influenced by social variation. Social groups differ in the morphological patterns they use and in the extent to which they extend these patterns to new words. This makes morphological productivity an exciting new area for future research in computational social science, and it further underscores the relevance of MFEP for that field.
Derivational morphology in NLP. Derivational morphology has recently received increasing attention in NLP. Key challenges include segmenting derivatives (Cotterell et al., 2016; Luo et al., 2017; Cotterell and Schütze, 2018), modeling their meaning (Lazaridou et al., 2013; Kisselew et al., 2015; Padó et al., 2016; Cotterell and Schütze, 2018), and predicting their form (Deutsch et al., 2018) as well as their morphological well-formedness (Hofmann et al., 2020). Whereas all these studies approach derivational morphology from a synchronic standpoint, MFEP is to the best of our knowledge the first computational task that addresses diachronic aspects of derivation.
Lexical change in social media. Language change (Croft, 2000;Bybee, 2015) is most visible on the lexical level. New words like "detrumpify" attract attention, often becoming the subject of public discourse (Metcalf, 2002). Since innovations are taking place at a much faster rate on internet media (Crystal, 2004), social media have become a central resource for studies on lexical change over the last decade (Altmann et al., 2011;Garley and Hockenmaier, 2012;Danescu-Niculescu-Mizil et al., 2013;Grieve et al., 2016;Kershaw et al., 2016;Sang, 2016;Stewart and Eisenstein, 2018;del Tredici and Fernández, 2018).
One central question in this field is: what factors determine whether a word will survive in the lexicon of an online community? Usage frequency is a well-known factor that influences the evolution of a word at historical time scales (Pagel et al., 2007). Studies on lexical change in online groups have shown that this is also true for shorter time scales (Altmann et al., 2011;Stewart and Eisenstein, 2018). Another main factor is the dissemination of a word, i.e., how widely a word is spread across different social and linguistic contexts. Generally, the more disseminated a word is, the more likely it is to grow. This holds for social dissemination across users and threads (Altmann et al., 2011) as well as linguistic dissemination across different lexical collocations (Stewart and Eisenstein, 2018).
The studies mentioned so far focus on token frequency. An exciting new approach looks instead at the meaning of words using diachronic word embeddings (Hamilton et al., 2016). del Tredici et al. (2019), e.g., explore short-term meaning shifts on Reddit and identify considerable changes even within a period of eight years.
A main goal of this study is to add a third approach to studies on lexical change in social media besides word frequency and word embeddings: word families. From a linguistic point of view, these three approaches can be viewed as complementary: whereas word frequency is context-independent, both word embeddings and word families are context-sensitive measures. However, while word embeddings reflect proximity on the utterance level (which words are close to each other in sentences?), word families reflect proximity on the system level (which words are close to each other in the mental lexicon?).

Conclusion
In this paper, we have proposed MFEP (Morphological Family Expansion Prediction), a new task that aims at predicting how morphological families evolve over time. We have shown that changes in morphological family size provide a fresh look at topical dynamics in social media, thus complementing token frequency as a metric.
Furthermore, we have presented a random forest model for MFEP that achieves very good accuracies, particularly in predicting extreme growth in morphological family size. The strongest predictor of growth is the morphological family size itself, an endogenous factor. However, the initial growth of small families is mainly driven by the trending behavior of the parent, an exogenous factor. This reflection of external events makes morphological families a promising tool for various fields drawing upon NLP techniques for tracing temporal dynamics in text (e.g., virality detection).
Overall, we see our study as an exciting step in the direction of bringing together computational social science and derivational morphology. In future work, we intend to further fine-tune our methodological apparatus for tackling MFEP.