Simple, Interpretable and Stable Method for Detecting Words with Usage Change across Corpora

The problem of comparing two bodies of text and searching for words that differ in their usage between them arises often in digital humanities and computational social science. This is commonly approached by training word embeddings on each corpus, aligning the vector spaces, and looking for words whose cosine distance in the aligned space is large. However, these methods often require extensive filtering of the vocabulary to perform well, and - as we show in this work - result in unstable, and hence less reliable, results. We propose an alternative approach that does not use vector space alignment, and instead considers the neighbors of each word. The method is simple, interpretable and stable. We demonstrate its effectiveness in 9 different setups, considering different corpus splitting criteria (age, gender and profession of tweet authors, time of tweet) and different languages (English, French and Hebrew).


Introduction
Analyzing differences in corpora from different sources (different time periods, populations, geographic regions, news outlets, etc.) is a central use case in digital humanities and computational social science. A particular methodology is to identify individual words that are used differently in the different corpora. This includes words whose meaning changed over time (Kim et al., 2014; Kulkarni et al., 2015; Hamilton et al., 2016b; Kutuzov et al., 2018; Tahmasebi et al., 2018), and words that are used differently by different populations (Azarbonyad et al., 2017; Rudolph et al., 2017). It is thus desirable to have an automatic, robust and simple method for detecting such potential changes in word usage and surfacing them for human analysis. In this work we present such a method. (* Equal contribution.)
A popular method for performing the task ( §4) is to train word embeddings on each corpus and then to project one space to the other using a vectorspace alignment algorithm. Then, distances between a word-form to itself in the aligned space are used as an estimation of word usage change (Hamilton et al., 2016b). We show that the common alignment-based approach is unstable, and hence less reliable for the usage change detection task ( §3, §7). In addition, it is also sensitive to proper nouns and requires filtering them.
We propose a new and simple method for detecting usage change, that does not involve vector space alignment ( §5). Instead of trying to align two different vector spaces, we propose to work directly in the shared vocabulary space: we take the neighbors of a word in a vector space to reflect its usage, and consider words that have drastically different neighbors in the spaces induced by the different corpora to be words that underwent usage change. The intuition behind this approach is that words that are used significantly differently across corpora are expected to have different contexts and thus to have only a few neighboring words in common. In order to determine the extent of the usage change of a word, we simply consider its top-k neighbors in each of the two corpora, and compute the size of the intersection of the two lists. The smaller the intersection is, the bigger we expect the change to be. The words are ranked accordingly.
The advantages of our method are the following: 1. Simplicity: the method is extremely simple to implement and apply, with no need for space alignment, hyperparameter tuning, or vocabulary filtering beyond simple frequency cutoffs.
2. Stability: Our method is stable, producing similar results across different word embeddings trained on the same corpora, in contrast to the alignment-based approach.
3. Interpretability: The ranking produced by our method is very intuitive to analyze. Looking at the neighborhood of a word in the two corpora reveals both the meaning of the word in each, and the extent to which the word has changed.
4. Locality: The interpretability aspect is closely linked to the locality of the decision. In our approach, the score of each word is determined only by its own neighbors in each of the spaces. In contrast, in the projection-based method, the similarity of a pair of words after the projection depends on the projection process, which implicitly takes into account all the other words in both spaces and their relations to each other, as well as the projection lexicon and the projection algorithm. This makes the predictions of projection-based methods opaque and practically impossible to reason about.
We demonstrate the applicability and robustness of the proposed method ( §7) by performing a series of experiments in which we use it to identify word usage changes in a variety of corpus pairs, reflecting different data division criteria. We also demonstrate the cross-linguistic applicability of the method by successfully applying it to two additional languages beyond English: French (a Romance language) and Hebrew (a Semitic language).
We argue that future work on detecting word usage change should use our method as an alternative to the now-dominant projection-based method. To this end, we provide a toolkit for detecting and visualizing word usage change across corpora (https://github.com/gonenhila/usage_change).

Task Definition
Our aim is to analyze differences between corpora by detecting words that are used differently across them. This task is often referred to as "detecting meaning change" (Azarbonyad et al., 2017;Del Tredici et al., 2019).
However, we find the name "meaning change" to be misleading. Words may have several meanings in the different corpora, but a different dominant sense in each corpus, indicating a different use of the word rather than a change in its meaning. For this reason, we refer to this task as "detecting usage change".
We define our task as follows: given two corpora with substantially overlapping vocabularies, identify words whose predominant use differs between the two corpora. The algorithm should return a ranked list of words, from the candidate that is most likely to have undergone usage change to the least likely.
Since the primary use of such an algorithm is corpus-based research, we expect a human to manually verify the results. Thus, while the method does not need to be completely accurate, it is desirable that most of the top returned words indeed underwent change, and it is also desirable to provide explanations or interpretations of the usage of each word in each corpus. Lastly, as humans are susceptible to being convinced by algorithms, we prefer algorithms that reflect real trends in the data rather than accidental changes in environmental conditions.

Stability
A desired property of an analysis method is stability: when applied several times under slightly different conditions, we expect the method to return the same, or very similar, results. Insignificant changes in the initial conditions should result in insignificant changes in the output. This increases the likelihood that the uncovered effects are real and not just artifacts of the initial conditions. Recent works question the stability of word embedding algorithms, demonstrating that different training runs produce different results, especially with small underlying datasets. Antoniak and Mimno (2018) focus on the cosine similarity between words in the learned embedding space, showing large variability under minor manipulations of the corpus. Wendlandt et al. (2018) make a similar argument, showing that word embeddings are unstable by looking at the 10-nearest neighbors (NN) of a word across the different embeddings, and showing that larger lists of nearest neighbors are generally more stable.
In this work, we are concerned with the stability of usage-change detection algorithms, and present a metric for measuring this stability. A usage-change detection algorithm takes as input two corpora, and returns a ranked list r of candidate words, sorted from the most likely to have changed to the least likely. For a stable algorithm, we expect different runs to return similar lists. While we do not care about the exact position of a word within a list, we do care about the composition of words at the top of the list. We thus propose a measure we call intersection@k, measuring the percentage of shared words in the top-k predictions of both outputs:

intersection@k(r_1, r_2) = |r_1^k ∩ r_2^k| / k

where r_1 and r_2 are the two ranked lists, and r_i^k is the set of top-k ranked words in ranking r_i.
A value of 0 in this measure means that there are no words in the intersection, indicating a high level of variability in the results, while a value of 1 means that all the words are in the intersection, indicating that the results are fully consistent. We expect intersection@k to be higher as k grows. This expectation is confirmed by our experiments in Section 7.2.
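The metric above is straightforward to compute; a minimal Python sketch (the function name and the two example rankings are illustrative, not from the paper's toolkit):

```python
def intersection_at_k(r1, r2, k):
    """intersection@k: fraction of words shared between the top-k
    candidates of two ranked lists produced by two runs."""
    return len(set(r1[:k]) & set(r2[:k])) / k

# Two hypothetical runs of a detection algorithm over the same corpora,
# agreeing on 3 of their top-4 candidates.
run_a = ["gay", "press", "van", "check", "major"]
run_b = ["press", "check", "gay", "wing", "van"]
print(intersection_at_k(run_a, run_b, 4))  # -> 0.75
```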
We measure the stability of the usage-change detection algorithms with respect to a change in the underlying word embeddings: we apply the intersection@k metric to two runs of the usage-change detection algorithm on the same corpus-pair, where each run is based on a different run of the underlying word embedding algorithm.

The Predominant Approach
The most prominent method for detecting usage change is that of Hamilton et al. (2016b), originally applied to detect shifts in dominant word senses across time. It is still the predominant approach in practice, 2 with recent works building upon it (Yao et al., 2018;Rudolph and Blei, 2018). This method was also shown to be the best performing one among several others (Schlechtweg et al., 2019).
It works by training word embeddings on the two corpora, aligning the spaces, and then ranking the words by the cosine-distance between their representations in the two spaces, where large distance is expected to indicate significant change in meaning. We refer to this method as AlignCos.
The alignment is performed by finding an orthogonal linear transformation Q that, when given matrices X and Y, projects X to Y while minimizing the squared loss:

Q* = argmin_{Q : Q^T Q = I} ||XQ − Y||_F^2

2 This is also indicated by the large number of citations: 350 according to Google Scholar.
The rows of X correspond to embeddings of words in space A, while the rows of Y are the corresponding embeddings in space B. This optimization is solved using the Orthogonal Procrustes (OP) method (Schönemann, 1966), which provides a closed-form solution.
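The closed-form solution can be sketched in a few lines of NumPy; the function name and the synthetic sanity check below are ours, for illustration only:

```python
import numpy as np

def orthogonal_procrustes(X, Y):
    """Closed-form OP solution (Schoenemann, 1966): the orthogonal Q
    minimizing ||XQ - Y||_F is U @ Vt, where U, _, Vt = svd(X.T @ Y)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Sanity check on synthetic data: if Y is an orthogonal rotation of X,
# OP recovers that rotation exactly.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
R, _ = np.linalg.qr(rng.normal(size=(5, 5)))  # a random orthogonal matrix
Q = orthogonal_procrustes(X, X @ R)
print(np.allclose(Q, R))  # -> True
```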
Vector space alignment methods are also extensively studied outside the area of detecting word change, primarily for aligning embedding spaces across language pairs (Xing et al., 2015; Artetxe et al., 2018b; Lample et al., 2018a; Artetxe et al., 2018a). There, too, the Orthogonal Procrustes method is considered a top contender (Lample et al., 2018b; Kementchedjhieva et al., 2018).

Shortcomings of the alignment approach
Self-contradicting objective. Note that the optimization procedure in the (linear) alignment stage attempts to project each word to itself. This includes words that changed usage, and which therefore should not be near each other in the space. While one may hope that other words and the linearity constraints will intervene, the method may mistakenly succeed in projecting words that did change usage next to each other, at the expense of projecting words that did not change usage further apart than they should be. This is an inherent problem with any alignment-based method that attempts to project the entire vocabulary onto itself.
Requires non-trivial filtering to work well. In addition, the alignment-based method requires nontrivial vocabulary filtering to work well. For example, Hamilton et al. (2016b) extensively filter proper nouns. Indeed, without such filtering, proper-nouns dominate the top of the changed words list. This does not indicate real word usage change, but is an artifact of names being hard to map across embedding spaces. In that respect, it makes sense to filter proper nouns. However, some cases of word usage change do involve names. For example, the word "Harlem", which is used as either a name of a neighborhood in NY or as a name of a basketball team, was detected by our method as a word whose usage changed between tweets of celebrities with different occupations ( §7.1).
Not stable across runs. As we discuss in Section 3 and show in Section 7.2, the approach is not very stable with respect to different random seeds in the embeddings algorithm.

Nearest Neighbors as a Proxy for Meaning
Rather than attempting to project two embedding spaces into a shared space (which may not even map 1:1), we propose to work in the shared vocabulary space. The underlying intuition is that words whose usage changed are likely to be interchangeable with different sets of words, and thus to have different neighbors in the two embedding spaces. This gives rise to a simple and effective algorithm: we represent each word in a corpus as the set of its top-k nearest neighbors (NN). We then compute the score for word usage change across corpora by considering the size of the intersection of the two sets (not to be confused with intersection@k defined in Section 3):

score(w) = −|NN_1^k(w) ∩ NN_2^k(w)|

where NN_i^k(w) is the set of k-nearest neighbors of word w in space i. Words with a smaller intersection are ranked higher, as their usage-change potential is larger.
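The whole procedure can be sketched in a few lines of Python. This is our illustrative reimplementation, not the released toolkit; it assumes two gensim KeyedVectors-style objects exposing `most_similar()`, and the `ToySpace` class and example words are toy stand-ins:

```python
def usage_change_scores(space1, space2, vocab, k=1000):
    """Rank shared-vocabulary words by usage change: the smaller the
    overlap between a word's top-k neighbor sets in the two embedding
    spaces, the higher it ranks as a usage-change candidate."""
    scores = {}
    for w in vocab:
        nn1 = {v for v, _ in space1.most_similar(w, topn=k)}
        nn2 = {v for v, _ in space2.most_similar(w, topn=k)}
        scores[w] = len(nn1 & nn2)  # small intersection => likely change
    return sorted(scores, key=scores.get)  # most-changed candidates first


class ToySpace:
    """Minimal stand-in for a gensim KeyedVectors-style object."""
    def __init__(self, neighbors):
        self.neighbors = neighbors
    def most_similar(self, word, topn):
        return [(v, 1.0) for v in self.neighbors[word][:topn]]


s1 = ToySpace({"gay": ["happy", "cheerful"], "van": ["captain", "officer"]})
s2 = ToySpace({"gay": ["lesbian", "bisexual"], "van": ["captain", "truck"]})
print(usage_change_scores(s1, s2, ["gay", "van"], k=2))  # -> ['gay', 'van']
```

In practice the vocabulary passed in would be the intersection of the two corpora's (frequency-filtered) vocabularies, as described below.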
We only consider the words in the intersection of both vocabularies, as words that are rare in one of the corpora are easy to spot using the frequency in the two spaces, and do not neatly fit the definition of usage change.
Note that our method does not require extensive filtering of words: we only filter words based on their frequency in the corpus.
We use a large value of k = 1000 in practice, because large neighbor sets are more stable than small ones (Wendlandt et al., 2018), leading to improved stability for our algorithm as well.
Limitations Similar to previous methods, our method assumes high-quality embeddings, and hence also a relatively large corpus. Indeed, in many cases we can expect large quantities of data to be available to the user, especially considering that the data needed is raw rather than labeled text. Using a limited amount of data results in lower-quality embeddings, but also in a smaller vocabulary, which might affect our method. For high-quality embeddings with small vocabulary sizes, we believe that adjusting k accordingly should suffice. Naturally, results will likely degrade as embedding quality deteriorates.
It is also important to note that, like previous approaches, our method does not attempt to provide any guarantees that the detected words have indeed undergone usage change. It is only intended to propose and highlight candidates for such words. These candidates are meant to later be verified by a user who needs to interpret the results in light of their hypothesis and familiarity with the domain. Unlike previous methods, as we discuss in Section 7.4, our method also provides intuitive means to aid in such an interpretation process.

Experimental Setup
We compare our proposed method (NN) to the method of Hamilton et al. (2016b) described in Section 4 (AlignCos), in which the vector spaces are first aligned using the OP algorithm, and words are then ranked according to the cosine distance between their representations in the two spaces. This method was shown by Schlechtweg et al. (2019) to outperform all the others it was compared to.
We demonstrate our approach by using it to detect change in word usage in different scenarios. We use the following corpora, whose statistics are listed in Table 1.
We consider three demographics-based distinctions (age, gender, occupation), a day-of-week based distinction, and short-term (4y) diachronic distinctions. We also compare to the longer-term (90y) diachronic setup of Hamilton et al. (2016b), which is based on Google books.
Day-of-week Yang and Leskovec (2011) collect 580 million tweets in English from June 2009 to February 2010, along with their time-stamps. As this is a fairly large corpus, we consider the tweets of a single month (November 2009). We create a split based on the day of week: weekday (tweets created on Tuesday and Wednesday) vs. weekend (tweets created on Saturday and Sunday). We remove duplicated tweets, as preliminary experiments revealed odd behavior of the representations due to heavily duplicated spam tweets.

Hebrew Diachronic (4y, tweets) The Hebrew data we use is taken from a collection of Hebrew tweets we collected over several consecutive years, up to 2018. The collection was performed using the streaming API and filtering for tweets containing at least one of the 400 most frequent Hebrew words. We use the 2014 and 2018 portions of the data, and create a split accordingly.
English Diachronic (90y, books) For the diachronic study on English corpora, we make use of the embeddings trained on Fiction from Google Books (Davies, 2015), provided by the authors of Hamilton et al. (2016b), specifically for the two years 1900 and 1990. These embeddings are originally aligned using Orthogonal Procrustes, and the words whose relative frequencies are above 10^-5 in both time periods are ranked using cosine distance.

Implementation details
Tokenization and Word Embeddings We use 300-dimensional word2vec vectors with a 4-word context window. Further details of the embedding algorithm and tokenization are available in the appendix.
Vocabulary and Filtering We perform frequency-based filtering of the vocabulary, removing stop words (the 200 most frequent words in each corpus, as well as English stop words as defined in nltk), and low-frequency words (we discard the 20% least frequent words in each corpus, and require a minimum of 200 occurrences).
Notably, we do not perform any other form of filtering, and keep proper-nouns and person-names intact.
We consider neighbors with a raw frequency greater than 100, and take the 1000 nearest such neighbors (k = 1000) when computing the intersection.
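The frequency-based filtering above can be sketched as follows; this is our illustrative reading of the described cutoffs (the nltk stop-word list is omitted, and the toy corpus and small thresholds in the demo are ours):

```python
from collections import Counter

def build_vocab(tokens, stop_k=200, drop_frac=0.2, min_count=200):
    """Frequency-based filtering sketch using the cutoffs described
    above: drop the stop_k most frequent words, the drop_frac least
    frequent words, and any word occurring fewer than min_count times."""
    counts = Counter(tokens)
    ranked = [w for w, _ in counts.most_common()]
    stops = set(ranked[:stop_k])
    n_tail = int(len(ranked) * drop_frac)
    tail = set(ranked[len(ranked) - n_tail:]) if n_tail else set()
    return [w for w in ranked
            if w not in stops and w not in tail and counts[w] >= min_count]

# Toy corpus with small cutoffs, for illustration only.
tokens = ["the"] * 50 + ["usage"] * 10 + ["change"] * 8 + ["rare"]
print(build_vocab(tokens, stop_k=1, drop_frac=0.25, min_count=5))
# -> ['usage', 'change']
```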

Qualitative Evaluation: Detected Words
We run our proposed method and AlignCos (Hamilton et al., 2016b) on the different scenarios described in Section 6, and manually inspect the results. While somewhat subjective, we believe that the consistent success on a broad setting, much larger than explored in any earlier work, is convincing. We provide examples for two of the setups (English Diachronic and Performer vs. Sports), with the rest of the setups in the appendix. For each one, we list a few interesting words detected by the method, accompanied by a brief explanation (according to the neighbors in each corpus).
In addition, we depict the top-10 words our method yields for the Age split (Table 2), accompanied by the nearest neighbors in each corpus (excluding words in the intersection), to better understand the context. For comparison, we also mention the top-10 words according to the AlignCos method. Similar tables for the other splits are provided in the Appendix.
Across all splits, our method detects high-quality words as words that undergo usage change, most of them easily explained by their neighboring words in the two corpora. As expected, we see that the AlignCos method (Hamilton et al., 2016b) is highly sensitive to names, featuring many in the top-10 lists across the different splits. As opposed to AlignCos, our method is robust to global changes in the embedding space, since it looks at many neighbors. As a result, it is not sensitive to groups of words that "move together" in the embedding space (which might be the case with names).

Table 2: Top-10 detected words from our method (NN) vs. AlignCos method (last row), for corpus split according to the age of the tweet-author. Each word from our method is accompanied by its top-10 neighbors in each of the two corpora (Young vs. Older).
English (diachronic, 90y) The top-100 words identified by our method cover all the words attested as real semantic shift in Hamilton et al. (2016b)'s top-10, except the word 'wanting'. Specifically, three attested words, 'gay', 'major' and 'check', are present in our top-10, which also contains interesting words not present in Hamilton et al. (2016b)'s top-10 (1900 vs. 1990): van (captain vs. vehicle), press (printing vs. places), oxford (location vs. university). In addition, interesting words that came up in the top-30 list are the following: headed (body part vs. move in a direction), mystery (difficulty in understanding vs. book genre).
Occupation (performer vs. sports) Interesting words found at the top-10 list are the following: cc (carbon copy vs. country club), duo (duet vs. pair of people), wing (politics vs. football player position). In addition, interesting words that came up in the top-30 list are the following: jazz (music genre vs. basketball team), worlds (general meaning vs. championships), stages (platforms vs. company(bikes)), record (music record vs. achievement), harlem (neighborhood vs. basketball team).

Quantitative Evaluation: Stability
We compare the stability of our method to that of the AlignCos method (Hamilton et al., 2016b) using the intersection@k metric, as defined in Section 3. We use k ∈ {10, 20, 50, 100, 200, 500, 1000}.
In Figure 1(a) we plot intersection@k for different values of k for all splits, with solid lines for the results of our method and dashed lines for the results of the AlignCos method. It is clear that our method is significantly more stable, for all k values and across all splits. To better understand the parameters that affect the stability of the different methods, we also examine how the intersection changes with different frequency cut-offs. In Figure 1(b) we plot intersection@100 as a function of the frequency cut-off (the minimum number of word occurrences required for a word to be included in the ranking). Here, our method is again more stable for all corpus splits. In addition, our method is similarly stable regardless of the frequency cut-off, unlike the AlignCos method. We also examine how the size of the NN lists considered for the intersection affects the stability. In Figure 1(c) we plot intersection@100 against the number of neighbors taken into consideration by our method. From around k = 250 onwards, our method is substantially more stable for all splits.

Quantitative Evaluation: DURel and SURel datasets
The field of semantic change detection suffers from a lack of proper evaluation datasets, and there is no common benchmark in use. Two new datasets, DURel and SURel, were recently introduced and used to extensively compare previous methods (Schlechtweg et al., 2019). Both datasets include a limited number of German words, along with human annotations of the degree of semantic relatedness between contexts of the words (across the different texts). However, they are not ideal, as they are extremely limited (22 words each).
Evaluation Metrics Spearman correlation is the standard measure used in this field to compare methods with respect to gold rankings. However, it is extremely important to note its limitations in this setting, since comparing to a very small gold ranking might be tricky. Spearman correlation does not take into account the global ranking of each method, but only the relative position of each of the gold words in each method's ranking. For example, a method that ranks all the gold words at the bottom of the ranking (out of all the words in the vocabulary) in the same order would be considered perfect, even though this is clearly not the case. As a possible solution to this problem, we suggest using Discounted Cumulative Gain (DCG), which better captures global rankings. As opposed to Spearman, this measure takes into account not only the order of the words, but also their actual scores:

DCG(M) = Σ_{w ∈ W} gold(w) / log_2(rank_M(w) + 1)

where W are the words in the gold dataset, gold(w) is the gold score of word w, rank_M(w) is the rank of w in the output of model M, and M is the model being evaluated. We report the results in Table 3. We compute AlignCos results with the best parameters reported in Schlechtweg et al. (2019), measuring with Spearman correlation (averaged over model runs with different numbers of iterations, as done in Schlechtweg et al. (2019)) and with DCG. For DURel, AlignCos gets better results when measuring with Spearman, but both methods are on par when using DCG.
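A minimal implementation of this DCG variant; the gold scores and model ranking below are illustrative, not the actual DURel/SURel data:

```python
import math

def dcg(gold_scores, model_ranking):
    """Discounted Cumulative Gain of a model's full ranking against a
    small gold set: a gold word pushed far down the global ranking is
    penalized via its rank, unlike Spearman over the gold words alone."""
    rank = {w: i + 1 for i, w in enumerate(model_ranking)}
    return sum(score / math.log2(rank[w] + 1)
               for w, score in gold_scores.items())

# Illustrative gold set and full model ranking.
gold = {"wordA": 3.0, "wordB": 1.0}
ranking = ["wordA", "other1", "other2", "wordB"]
print(dcg(gold, ranking))  # = 3.0/log2(2) + 1.0/log2(5) ≈ 3.43
```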

Interpretation and Visualization
We find that in many cases, it is not clear why the returned candidate words were chosen, and questions such as "why is the word 'dam' different across age groups?" often arise. The NN method lends itself to interpretation by considering the top-10 neighbors, as shown in Table 2. This interpretation approach is very reliable in our method: since most of the neighbors of an identified word differ between the two corpora, looking at the neighboring words is guaranteed to provide insight into the usage change. While one can certainly inspect the nearest neighbors for the OP-based methods as well, there is no guarantee of even spotting a difference between the neighbors: it may well be that the identified word moved in the embedding space "together" with most of its neighbors. In this case, looking at the neighbors provides no insight into the nature of the change. We observed this phenomenon in practice. Nonetheless, comparing flat word lists is hard, and 10 words are often insufficient.
We present a visualization method that aids in understanding the model's suggestions. The visualization consists of projecting the word of interest and its top-50 neighbors from each corpus into two dimensions using t-SNE (Maaten and Hinton, 2008), and plotting the result while coloring the neighbors in the intersection in one color and the neighbors unique to each corpus in other colors. We expect a word of interest to have distinct neighbors across the corpora. Figures 2 and 3 show the visualizations for the word clutch in the Gender split, with cyan for female and violet for male, and the word dam in the Age split, with cyan for older and violet for young (in both cases there were no shared neighbors). We plot the projection of the words twice, one plot for each embedding space. We can see that, as expected, the neighboring words are distinct, and that the target word belongs to the respective neighborhood in each space. We conclude that this is a useful tool for interpreting the results of our model.
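The data preparation behind these plots can be sketched as follows. This is our illustrative version, assuming gensim KeyedVectors-like objects (`most_similar()` plus vector lookup); the actual toolkit may differ in details:

```python
import numpy as np
from sklearn.manifold import TSNE

def project_neighbors(word, space1, space2, topn=50):
    """Collect the word's top neighbors in both spaces, label each
    neighbor as shared or unique to one space, and project the vectors
    (here taken from space1) to 2-D with t-SNE. Plotting `coords` with
    one color per label, once per space, reproduces the figures."""
    nn1 = [v for v, _ in space1.most_similar(word, topn=topn)]
    nn2 = [v for v, _ in space2.most_similar(word, topn=topn)]
    words = [word] + sorted(set(nn1) | set(nn2))
    labels = ["target"] + [
        "shared" if w in nn1 and w in nn2
        else "space1" if w in nn1 else "space2"
        for w in words[1:]
    ]
    vecs = np.array([space1[w] for w in words])
    coords = TSNE(n_components=2, perplexity=5.0, init="random",
                  random_state=0).fit_transform(vecs)
    return words, labels, coords
```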
Related Work

Two works are more closely related to our approach. In Azarbonyad et al. (2017), the authors also use the neighbors of a word in order to determine its stability (and therefore, the extent to which it changes). Their best model combines the traditional alignment-based approach with weighting the neighbors according to their rank and their stability. The algorithm is iterative, updating the stability of all the words in the vocabulary in each step. Our method uses the neighbors of the words directly, does not involve an iterative process, and does not rely on cosine distance in the aligned embeddings. In addition, their method requires computation over the whole vocabulary, while other methods, including ours, allow querying for a single word.
Another work that considers the neighbors of a word in order to determine the extent of change is that of Hamilton et al. (2016a), who suggest a measure based on the changes in similarities between the target word and its neighbors in both spaces. They find that this method is more suitable for identifying changes due to cultural factors, rather than linguistic shift. This may serve as another motivation to move from global measures to local ones. A more recent approach (2019) relies on the inclusion of example-based word sense inventories over time from the Oxford dictionary into a BERT model. Doing so provides an efficient fine-grained word sense representation and enables a seemingly accurate way to monitor word sense change over time. Most of these approaches could easily be combined with our method; the inclusion of contextualized embeddings, for example, would be straightforward. We leave this for future work.

Conclusion
Detecting words that are used differently in different corpora is an important use case in corpus-based research. We present a simple and effective method for this task, demonstrating its applicability in multiple different settings. We show that the method is considerably more stable than the popular alignment-based method of Hamilton et al. (2016b), and requires less tuning and word filtering. We suggest that researchers adopt this method, and provide an accompanying software toolkit.

A Implementation Details
Tokenization We tokenize the English, French and Hebrew tweets using ark-twokenize-py, the Moses tokenizer and UDPipe (Straka and Straková, 2017), respectively. We lowercase all the tweets and remove hashtags, mentions, retweets and URLs. We replace all occurrences of numbers with a special token. We discard all words that do not contain one of the following: (1) a character from the respective language; (2) one of these punctuation marks: "-", "'", "."; (3) an emoji.
Word embeddings We construct the word representations using the continuous skip-gram negative sampling model from word2vec (Mikolov et al., 2013a,b). We use the Gensim implementation. For all our experiments, we set the vector dimension to 300, the window size to 4, and the minimum number of occurrences of a word to 20. The rest of the hyperparameters are set to their default values.
For the stability experiments we run the embedding algorithm twice, each time with a different random seed.

B Qualitative Evaluation: Detected Words
We show the top-10 words our method yields for each of the different splits, accompanied by the nearest neighbors in each corpus (excluding words in the intersection), to better understand the context. For comparison, we also show the top-10 words according to the AlignCos method. The splits are the following:

English: 1900 vs. 1990 The list of top-10 detected words from our method (NN) vs. AlignCos method, for corpus split according to the year of the English text, is displayed in Table 4.
Age: Young vs. Older The list of top-10 detected words from our method (NN) vs. AlignCos method, for corpus split according to the age of the tweet-author is displayed in Section 7. Interesting words found at the top-10 list are the following (young vs. older): dem ('them' vs. US political party), dam ('damn' vs. water barrier), assist (football contribution vs. help). In addition, interesting words that came up in the top-30 list are the following: pc (personal computer vs. Canadian party), presents (introduces vs. gifts), wing (general vs. political meaning), prime (general vs. political meaning), lab (school vs. professional).
Gender: Male vs. Female The list of top-10 detected words from our method (NN) vs. AlignCos method, for corpus split according to the gender of the tweet-author is displayed in Table 5. Interesting words found at the top-10 list are the following (male vs. female): clutch (grasping vs. female bag), bra (colloquial usage like 'bro' vs. female clothing), gp (grand prix event vs. general practitioner). In addition, interesting words that came up in the top-40 list are the following: stat (statistics vs. right away), pit (car-related vs. dog-related), dash (radio station vs. quantity), pearl (pearl harbor vs. gemstone and color).
Occupation: Performer vs. Sports The list of top-10 detected words from our method (NN) vs. AlignCos method, for corpus split according to the occupation (Performer vs. Sports) of the tweetauthor is displayed in Table 6.
Occupation: Creator vs. Sports The list of top-10 detected words from our method (NN) vs. Align-Cos method, for corpus split according to the occupation (Creator vs. Sports) of the tweet-author is displayed in Table 7. Interesting words found at the top-10 list are the following (creator vs. sports): cc (carbon copy vs. country club), op (event opening vs. operation), wing (politics vs. football player position), worlds (earth vs. world cup). In addition, interesting words that came up in the top-20 list are the following: oval (oval office vs. sports ground), fantasy (genre vs. fantasy football), striking (shocking vs. salient), chilling (frightening vs. relaxing), fury (book: fire and fury vs. British boxer).
Occupation: Creator vs. Performer The list of top-10 detected words from our method (NN) vs. AlignCos method, for corpus split according to the occupation (Creator vs. Performer) of the tweet-author, is displayed in Table 10.

French: 2014 vs. 2018 Interesting words found at the top-10 list are the following (2014 vs. 2018): ia (frequent misspelled contraction of "ya" in 2014, vernacular form of "il y a", "there is", vs. "intelligence artificielle", artificial intelligence), divergent (the movie vs. the adjective).
In addition, interesting words that came up in the top-30 list are the following: pls (contraction of the borrowing "please" vs. the acronym of "Position latérale de sécurité", lateral safety position, now used as a figurative synonym for "having a stroke"). In the same vein, and tied to political debates, we note apl (contraction of "appel"/"appeler", call/to call, vs. controversial housing subsidies).
Hebrew: 2014 vs. 2018 The list of top-10 detected words from our method (NN) vs. AlignCos method, for corpus split according to the year of the Hebrew text, is displayed in Figure 4. Interesting words found at the top-10 list (2014 vs. 2018) are the following (we use transliteration accompanied by a literal translation to English): beelohim-in god (pledge word vs. religion-related) and kim-Kim (first name vs. Kim Jong-un). In addition, interesting words that came up in the top-30 list are the following: shtifat-washing (plumbing vs. brainwashing), miklat-shelter (building vs. asylum (for refugees)), borot-pit/ignorance (plural of pit vs. ignorance).