Family of Origin and Family of Choice: Massively Parallel Lexiconized Iterative Pretraining for Severely Low Resource Text-based Translation



Abstract
We translate a closed text that is known in advance into a severely low resource language by leveraging massive source parallelism. In other words, given a text in 124 source languages, we translate it into a severely low resource language using only ∼1,000 lines of low resource data without any external help. Firstly, we propose a systematic method to rank and choose source languages that are close to the low resource language. We call the linguistic definition of language family Family of Origin (FAMO), and we call the empirical definition of higher-ranked languages using our metrics Family of Choice (FAMC). Secondly, we build an Iteratively Pretrained Multilingual Order-preserving Lexiconized Transformer (IPML) to train on ∼1,000 lines (∼3.5%) of low resource data. To translate named entities correctly, we build a massive lexicon table for 2,939 Bible named entities in 124 source languages; it includes many entities that occur only once and covers more than 66 severely low resource languages. Moreover, we build a novel method of combining translations from different source languages into one. Using English as a hypothetical low resource language, we get a +23.9 BLEU increase over a multilingual baseline and a +10.3 BLEU increase over our asymmetric baseline on the Bible dataset. We get a 42.8 BLEU score for Portuguese-English translation on the medical EMEA dataset. We also obtain good results for Eastern Pokomchi, a real severely low resource Mayan language.

Introduction
We translate a closed text that is known in advance into a severely low resource language by leveraging massive source parallelism. In other words, we aim to translate well under three constraints: having severely small training data in the new target low resource language, having massive source language parallelism, and having the same closed text across all languages. Generalization to other texts is preferable but not necessary for the goal of producing a high quality translation of the closed text. 2020 is the year the world adopted life-saving hand washing practices globally. Applications like translating water, sanitation, and hygiene (WASH) guidelines into severely low resource languages are highly impactful for communities such as those in Papua New Guinea, home to 839 living languages (Gordon Jr, 2005; Simons and Fennig, 2017). Translating humanitarian texts like WASH guidelines despite scarce data and scarce expert help is key (Bird, 2020).
We focus on five challenges that have not been addressed previously. Most multilingual transformer works that translate into low resource languages limit their training data to languages in the same or nearby language families, or to languages chosen at the researchers' intuitive discretion, and are mostly limited to fewer than 30 languages (Gu et al., 2018; Zhou et al., 2018a; Zhu et al., 2020). Instead, we examine ways to pick useful source languages from 124 source languages in a principled fashion. Secondly, most works require at least 4,000 lines of low resource data (Lin et al., 2020; Qi et al., 2018; Zhou et al., 2018a); we use only ∼1,000 lines of low resource data to simulate the real-life situation of having an extremely small seed target translation. Thirdly, many works use rich resource languages as hypothetical low resource languages; we also evaluate on a real severely low resource language. Moreover, most works do not treat named entities separately; we add an order-preserving lexiconized component for more accurate translation of named entities. Finally, many multilingual works present final results as sets of translations from all source languages; we build a novel method to combine all translations into one.
We have five contributions. Firstly, we rank the 124 source languages to determine their closeness to the low resource language and choose the top few. We call the linguistic definition of language family Family of Origin (FAMO), and we call the empirical definition of higher-ranked languages using our metrics Family of Choice (FAMC). They often overlap, but may not coincide.
Secondly, we build an Iteratively Pretrained Multilingual Order-preserving Lexiconized Transformer (IPML) trained on ∼1,000 lines of low resource data. Using iterative pretraining, we get a +23.9 BLEU increase over a multilingual order-preserving lexiconized transformer baseline (MLc) using English as a hypothetical low resource language, and a +10.3 BLEU increase over our asymmetric baseline. Training with the low resource language on both the source and target sides boosts translation into the target side. Training on randomly sampled 1,093 lines of low resource data, we reach a 31.3 BLEU score testing on 30,022 lines of Bible. We obtain a 42.8 BLEU score for Portuguese-English translation on the medical EMEA dataset.
Thirdly, we use a real-life severely low resource Mayan language, Eastern Pokomchi, a Class 0 language (Joshi et al., 2020) as one of our experiment setups. In addition, we also use English as a hypothetical low resource language for easy evaluation.
We also add an order-preserving lexiconized component to translate named entities well. To solve the variable-binding problem, i.e., to distinguish "Ian calls Yi" from "Yi calls Ian" (Fodor and Pylyshyn, 1988; Graves et al., 2014; Zhou et al., 2018a), we build a lexicon table for 2,939 Bible named entities in 124 source languages, including more than 66 severely low resource languages.
Finally, we combine the translations from all source languages using a novel method. For every sentence, we find the translation that is closest to the translation cluster center. The expected BLEU score of our combined translation is higher than that of the translation from any individual source.

Information Dissemination
Interactive Natural Language Processing (NLP) systems are classified into information assimilation, dissemination, and dialogue (Bird, 2020; Ranzato et al., 2015; Waibel and Fugen, 2008). Information assimilation involves information flow from low resource to rich resource language communities, while information dissemination involves information flow from rich resource to low resource language communities. Taken together, they allow dialogue and interaction of different groups at eye level. Most work focuses on information assimilation (Bérard et al., 2020; Earle et al., 2012; Brownstein et al., 2008). Few works address dissemination, owing to small data, less funding, few experts, and limited writing systems (Östling and Tiedemann, 2017; Zoph et al., 2016; Anastasopoulos et al., 2017; Adams et al., 2017; Bansal et al., 2017).

Multilingual Transformer
In training, each sentence is labeled with the source and target language labels. For example, if we translate from Chuj ("ca") to Cakchiquel ("ck"), each source sentence is tagged with __opt_src_ca __opt_tgt_ck. A sample source sentence is "__opt_src_ca __opt_tgt_ck Tec'b'ejec e b'a mach ex tzeyac'och Jehová yipoc e c'ool". We train on a GeForce RTX 2080 Ti using ∼100 million parameters: a 6-layer encoder and a 6-layer decoder with 512 hidden states, 8 attention heads, 512 word vector size, a dropout of 0.1, an attention dropout of 0.1, 2,048 hidden transformer feed-forward units, a batch size of 6,000, the "adam" optimizer, the "noam" decay method, a label smoothing of 0.1, and a learning rate of 2.5 on OpenNMT (Klein et al., 2017; Vaswani et al., 2017). After 190,000 steps, we validate based on BLEU score with an early stopping patience of 5.
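The language-tag preprocessing step can be sketched as follows; the function name and plain-string handling are our own illustrative choices, not OpenNMT code:

```python
def tag_sentence(src_lang: str, tgt_lang: str, sentence: str) -> str:
    """Prefix a sentence with OpenNMT-style source/target option tokens."""
    return f"__opt_src_{src_lang} __opt_tgt_{tgt_lang} {sentence}"

# Chuj ("ca") to Cakchiquel ("ck"), as in the example above
tagged = tag_sentence("ca", "ck",
                      "Tec'b'ejec e b'a mach ex tzeyac'och Jehová yipoc e c'ool")
print(tagged)
```

Every training and test sentence passes through this tagging before being fed to the shared multilingual model.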

Star Versus Complete Configuration
We show two configurations of translation paths in Figure 1: star graph (multi-source single-target) configuration and complete graph (multi-source multi-target) configuration. The complete configuration data increases quadratically with the number of languages while the star configuration data increases linearly.
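The growth rates of the two configurations can be made concrete with a small sketch (the helper names are ours):

```python
from itertools import permutations

def complete_paths(langs):
    # complete graph: every ordered (source, target) pair, n*(n-1) paths
    return list(permutations(langs, 2))

def star_paths(langs, target):
    # star graph: every source into the single target, n-1 paths
    return [(src, target) for src in langs if src != target]

langs = ["en", "af", "nl", "de", "fr"]
print(len(complete_paths(langs)))    # 20 paths, quadratic growth
print(len(star_paths(langs, "en")))  # 4 paths, linear growth
```

With eleven languages, the complete configuration yields 110 translation paths while the star configuration yields only 10, which is why the two stages of training below use them differently.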

Order-preserving Lexiconized Transformer
The variable-binding problem is difficult in severely low resource scenarios; most neural models cannot distinguish the subject and the object of a simple sentence like "Fatma asks her sister Wati to call Yi, the brother of Andika", especially when all named entities appear once or never appear in training (Fodor and Pylyshyn, 1988; Graves et al., 2014). Recently, researchers have used order-preserving lexiconized Neural Machine Translation models where named entities are sequentially tagged in a sentence as __NEs (Zhou et al., 2018a). The previous example becomes "__NE0 asks her sister __NE1 to call __NE2, the brother of __NE3". This method works under the assumption of translating a closed text known in advance. Its success relies on good coverage of named entities. To cover many named entities, we build on the existing research literature (Wu et al., 2018; Zhou et al., 2018a) to construct a massively parallel lexicon table that covers 2,939 named entities across 124 languages in our Bible database. Our lexicon table is an expansion of the existing one covering 1,129 named entities (Wu et al., 2018). We add 1,810 named entities at the extreme end of the tail that occur only once. We also include 66 more real-life severely low resource languages.
For every sentence pair, we build a target named entity decoding dictionary using all target lexicons from the lexicon table that match those in the source sentence. In the severely low resource setting, our sequence tagging is largely based on dictionary look-up; we also include lexicons that are not in the dictionary but have small edit distances to the source lexicons. In evaluation, we replace all the ordered __NEs using the target decoding dictionary to obtain our final translation.
Let us translate "Fatma asks her sister Wati to call Yi, the brother of Andika" to Chinese and German. Our tagged source sentence that translates to Chinese is "__opt_src_en __opt_tgt_zh __NE0 asks her sister __NE1 to call __NE2, the brother of __NE3"; and we use __opt_tgt_de for German. The source dictionary is "__NE0: Fatma, __NE1: Wati, __NE2: Yi, __NE3: Andika" and we create the target dictionaries. The Chinese output is "__NE0叫她的姐妹__NE1去打电话 给__NE3的兄弟__NE2" and the German output is "__NE0 bittet ihre Schwester __NE1 darum, __NE2, den Bruder __NE3, anzurufen". We decode the named entities to get final translations.
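A minimal sketch of the tagging and decoding steps, using exact string matching (the full system uses dictionary look-up plus edit distance, as described above); the helper names are ours:

```python
def tag_named_entities(sentence, source_lexicon):
    """Replace known named entities with __NE placeholders, numbered
    left to right so that source order is preserved."""
    hits = sorted((sentence.find(name), name)
                  for name in source_lexicon if name in sentence)
    mapping = {}
    for i, (_, name) in enumerate(hits):
        tag = f"__NE{i}"
        mapping[tag] = name
        sentence = sentence.replace(name, tag)
    return sentence, mapping

def decode_named_entities(translation, mapping, target_lexicon):
    """Substitute placeholders with target-language entity forms."""
    # longer tags first so __NE1 never clobbers part of __NE10
    for tag in sorted(mapping, key=len, reverse=True):
        src_name = mapping[tag]
        translation = translation.replace(tag, target_lexicon.get(src_name, src_name))
    return translation

sent = "Fatma asks her sister Wati to call Yi, the brother of Andika"
tagged, mapping = tag_named_entities(sent, ["Yi", "Andika", "Fatma", "Wati"])
print(tagged)   # __NE0 asks her sister __NE1 to call __NE2, the brother of __NE3
```

Decoding then maps each placeholder back through the target side of the lexicon table, so the binding of each entity to its role survives translation.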

Ranking Source Languages
Existing works on translation from multiple source languages into a single low resource language usually have at most 30 source languages (Gu et al., 2018; Zhou et al., 2018a; Zhu et al., 2020). They are limited to the same or nearby language families, to languages with available data, or to languages chosen at the researchers' intuitive discretion. Instead, we examine ways to pick useful source languages in a principled fashion motivated by cross-lingual impacts and similarities (Shoemark et al., 2016; Sapir, 1921; Odlin, 1989; Cenoz, 2001; Toral and Way, 2018; De Raad et al., 1997; Hermans, 2003; Specia et al., 2016). We find that using many languages that are distant from the target low resource language may produce marginal improvements, if not a negative impact. Indeed, existing literature on zero-shot translation also suffers from the limitation of linguistic distance between the source languages and the target language (Lauscher et al., 2020; Lin et al., 2020; Pfeiffer et al., 2020). We therefore rank and select the top few source languages that are closer to the target low resource language using the two metrics below.
We rank source languages according to their closeness to the low resource language. We construct the Family of Choice (FAMC) by comparing different ways of ranking linguistic distances empirically based on the small low resource data.
Let s_s and s_t be the source and target sentences, let l_s be the source length, let P(S_t = s_t | s_s, l_s) be the alignment probability, let F_s be the fertility, i.e., the number of target words a source word is aligned to, and let D_t be the distortion based on the fixed distance-based reordering model (Koehn, 2009).
We first construct a word-replacement model by aligning the small amount of target low resource data with that of each source language using fast_align (Dyer et al., 2013). We replace every source word with the most probable target word according to the product of the alignment probability and the probability of the fertility equalling one and the distortion equalling zero, P(F_s = 1, D_t = 0 | s_t, s_s, l_s). We choose a simple word-replacement model because we aim to work with around 1,000 lines of low resource data; for fast and efficient ranking on such small data, a word-replacement model suits our purpose.
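The replacement rule can be sketched with toy probability tables (in practice the tables come from fast_align output; the names and numbers below are illustrative assumptions):

```python
def best_replacement(src_word, align_prob, fert_dist_prob):
    """Replace a source word by the target word maximizing the product of
    the alignment probability and P(F_s = 1, D_t = 0)."""
    candidates = align_prob.get(src_word, {})
    if not candidates:
        return src_word  # no alignment observed: copy the source word over
    return max(candidates,
               key=lambda tgt: candidates[tgt] * fert_dist_prob.get((src_word, tgt), 0.0))

# toy tables for illustration only
align_prob = {"huis": {"house": 0.7, "home": 0.3}}
fert_dist_prob = {("huis", "house"): 0.9, ("huis", "home"): 1.0}
print(best_replacement("huis", align_prob, fert_dist_prob))  # house (0.63 > 0.30)
```

Applying this word-by-word to the seed data gives the cheap translations that both ranking metrics below are computed from.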
We use two alternatives to create our FAMCs. Our distortion measure is the probability of the distortion equalling zero, P(D_t = 0 | s_t, s_s, l_s), aggregated over all words in a source language. We use the distortion measure to rank the source languages and obtain the distortion-based FAMC (FAMD); we use the translation BLEU scores of the word-replacement model as another alternative to build the performance-based FAMC (FAMP). In Table 1, we list the top ten languages in FAMD and FAMP for Eastern Pokomchi and English.
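Given per-language aggregate scores, the ranking step itself is simple; the scores below are made up for illustration:

```python
def rank_sources(scores):
    """Rank source languages by an aggregate closeness metric, best first:
    P(D_t = 0) aggregated over words for FAMD, or word-replacement
    BLEU for FAMP."""
    return sorted(scores, key=scores.get, reverse=True)

famd_scores = {"de": 0.42, "af": 0.61, "fr": 0.35, "nl": 0.57}  # hypothetical
print(rank_sources(famd_scores))  # ['af', 'nl', 'de', 'fr']
```

The top ten languages under each metric then form the FAMD and FAMP source sets used in pretraining.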
To prepare for transformer training, we choose the top ten languages neighboring our target low resource language in FAMD and FAMP. We choose ten because existing literature shows that training with ten languages from two neighboring language families is sufficient to produce quality translation through cross-lingual transfer (Zhou et al., 2018a). Since some low resource languages may not have ten languages in FAMO in our database, we add languages from neighboring families to make an expanded list denoted by FAMO+.

Iterative Pretraining
We have two stages of pretraining using multilingual order-preserving lexiconized transformer on the complete and the star configuration. We design iterative pretraining on symmetric data to address catastrophic forgetting that is common in training (French, 1999;Kirkpatrick et al., 2017).

Stage 1: Pretraining on Neighbors
Firstly, we pretrain on the complete graph configuration of translation paths using the top ten languages neighboring our target low resource language in FAMD, FAMP, and FAMO+ respectively. The low resource data is excluded from this stage of training.
We use the multilingual order-preserving lexiconized transformer. Our vocabulary is the combination of the vocabularies of the top ten languages together with the low resource vocabulary built from the ∼1,000 lines. The final model can translate between any pair of the ten languages.

Stage 2: Adding Low Resource Data
We include the low resource data in the second stage of training. Since the low resource data covers ∼3.5% of the text while all the source languages cover the whole text, the data is highly asymmetric. To create symmetric data, we align the low resource data with the corresponding subset of data from all source languages. As a result, every source language in the second stage of training has the ∼3.5% of the text that is aligned with the low resource data.
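Assuming each language's text is keyed by a shared line or verse identifier (an assumption of this sketch, not a detail given above), the symmetric subset can be built as:

```python
def make_symmetric(low_resource, source_texts):
    """Keep, for every source language, only the lines whose identifiers
    appear in the ~1,000-line low resource seed, so every language covers
    the same ~3.5% of the text."""
    seed_ids = set(low_resource)
    return {lang: {i: line for i, line in lines.items() if i in seed_ids}
            for lang, lines in source_texts.items()}

low = {"LUK.1.1": "low resource line"}
sources = {"af": {"LUK.1.1": "afrikaans line", "MRK.1.1": "unaligned line"}}
print(make_symmetric(low, sources))  # {'af': {'LUK.1.1': 'afrikaans line'}}
```

After this filtering, every translation path in the second stage trains on the same set of aligned lines.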

[Table 2] Source sentence, IPML translation, and reference for translation into English as a hypothetical low resource language.

Source: En terwyl Hy langs die see van Galiléa loop, sien Hy Simon en Andréas, sy broer, besig om 'n net in die see uit te gooi; want hulle was vissers.
IPML: And as He drew near to the lake of Galilee, He Simon saw Andrew, and his brother, lying in the lake, for they were fishermen.
Reference: And walking along beside the Sea of Galilee, He saw Simon and his brother Andrew casting a small net in the sea; for they were fishers.

IPML: And being in a distance, He saw James, the son of Zebedee, and John, his brother. who kept the nets in the boat.
Reference: And going forward from there a little, He saw James the son of Zebedee, and his brother John. And they were in the boat mending the nets.

Source: En verder Jakobus, die seun van Sebedéüs, en Johannes, die broer van Jakobus-aan hulle het Hy die bynaam Boanérges gegee, dit is, seuns van die donder-
IPML: And James the son of Zebedee, and John the brother of James; and He gave to them the name, which is called Boanerges, being of the voice.
Reference: And on James the son of Zebedee, and John the brother of James, He put on them the names Boanerges, which is, Sons of Thunder.

We therefore create a complete graph configuration of training paths using all eleven languages. Using the pretrained model from the previous stage, we train on the complete graph configuration of translation paths from all eleven languages, including our low resource language. The vocabulary used is the same as before. We employ the multilingual order-preserving lexiconized transformer for pretraining. The final model can translate between any pair of the eleven languages.

Final Training
Finally, we focus on translating into the low resource language. We use the symmetric data built in the second stage of pretraining. However, instead of using the complete configuration, we use the star configuration of translation paths from all source languages to the low resource language. All languages have ∼3.5% of the text. Using the pretrained model from the second stage, we employ the multilingual order-preserving lexiconized transformer on the star graph configuration. We use the same vocabulary as before. The final trained model can translate from any of the ten source languages into the low resource language. Using the lexicon dictionaries, we decode the named entities and obtain our final translations.

Combination of Translations
We have multiple translations, one per source language. Combining all translations into one is useful both for potential post-editing work and for systematic comparison of different experiments, especially when the sets of source languages differ.
Our combination method assumes that we have the same text in all source languages. For each sentence, we form a cluster of translations from all source languages into the low resource language. Our goal is to find the translation that is closest to the center of the cluster. We rank all translations according to how centered each translation is with respect to the others by summing all its similarities to the rest; the top-ranked translation is closest to the center of the translation cluster. We take the most centered translation for every sentence to build the combined translation output. The expected BLEU score of our combined translation is higher than that of the translation from any individual source language.
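A sketch of the centering step; we use difflib's character-level ratio as a stand-in for whatever sentence similarity the system employs, so this is an illustrative assumption rather than the exact metric:

```python
from difflib import SequenceMatcher

def most_centered(translations):
    """Return the translation with the highest summed similarity to all
    other translations, i.e. the one closest to the cluster center."""
    def sim(a, b):
        return SequenceMatcher(None, a, b).ratio()
    scores = [sum(sim(t, other) for j, other in enumerate(translations) if i != j)
              for i, t in enumerate(translations)]
    return translations[scores.index(max(scores))]

cluster = ["he saw Simon and Andrew",
           "he saw Simon and Andrew his brother",
           "a completely different rendering"]
combined_line = most_centered(cluster)
```

Running this per sentence over the per-source outputs yields the combined translation.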

Data
We use the Bible dataset and the medical EMEA dataset (Mayer and Cysouw, 2014; Tiedemann, 2012). The EMEA dataset is from the European Medicines Agency and contains medical information that may be beneficial to low resource communities. Our method can be applied to other datasets like WASH guidelines. For the Bible dataset, we use 124 source languages with 31,103 lines of data and a target low resource language with ∼1,000 lines (∼3.5%) of data. We have two setups for the target low resource language: one uses Eastern Pokomchi, a Mayan language; the other uses English as a hypothetical low resource language. We train on only ∼1,000 lines of low resource data from the book of Luke and test on 678 lines from the book of Mark. Mark is topically similar to Luke, but was written by a different author. For the first stage of pretraining, we use an 80%/10%/10% split for training, validation, and testing. From the second stage onwards, we use a 95%/5% split of Luke for training and validation, and 100% of Mark for testing.
Eastern Pokomchi is Mayan, and English is Germanic. Since our database does not have ten members of each family, we use FAMO+, the expanded version of FAMO. For English, we include five Germanic languages and five Romance languages in FAMO+; for Eastern Pokomchi, we include five Mayan languages and five Amerindian languages in FAMO+. The Amerindian family is broadly believed by the linguistic community to be close to the Mayan family.
We construct FAMCs by comparing different ways of ranking linguistic distances empirically based on ∼1,000 lines of training data. In Table 1, we list the top ten languages for Eastern Pokomchi and English in FAMD and FAMP respectively.
To imitate the real-life situation of having a small seed of target translation data, we choose to use ∼1,000 lines (∼3.5%) of low resource data. We also include Eastern Pokomchi in addition to using English as a hypothetical low resource language. Though data size can be constrained to mimic severely low resource scenarios, much implicit information is still used for a hypothetical low resource language that is actually rich resource. For example, implicit information like English being Germanic is often used. For real low resource scenarios, the family information may have yet to be determined; the neighboring languages may be unknown, and if they are known, they are highly likely to be low resource too. We thus use Eastern Pokomchi as our real-life severely low resource language.
In addition to the Bible dataset, we work with the medical EMEA dataset (Tiedemann, 2012). Using English as a hypothetical low resource language, we train on randomly sampled 1,093 lines of English data and test on 678 lines of data. Since there are only 9 languages from the Germanic and Romance families in the EMEA dataset, we include a Slavic language, Polish, in our FAMO+ for these experiments.
The EMEA dataset is less ideal than the Bible dataset. The Bible dataset contains the same text for all source languages; the EMEA dataset does not. It is built from similar documents but has different parallel data for each language pair. Therefore, at test time, we do not combine the translations from the various source languages for the EMEA dataset.

Results
We compare our iteratively pretrained multilingual order-preserving lexiconized transformer (IPML) with five baselines in Table 3. MLc is a baseline multilingual order-preserving lexiconized transformer trained on the complete configuration; in other words, we skip the first stage of pretraining and train on the second stage in Chapter 3.3.2 only. MLs is a baseline multilingual order-preserving lexiconized transformer trained on the star configuration; in other words, we skip both stages of pretraining and train on the final stage in Chapter 3.4 only. PMLc is a baseline pretrained multilingual order-preserving lexiconized transformer trained on the complete configuration; in other words, we skip the final stage of training after completing both stages of pretraining. PMLs is a baseline pretrained multilingual order-preserving lexiconized transformer trained on the star configuration; in other words, after the first stage of pretraining, we skip the second stage of pretraining and proceed to the final training directly. Finally, AML is a baseline multilingual order-preserving lexiconized transformer trained on asymmetric data: we replicate the ∼1,000 lines of low resource data until it matches the training size of the other source languages, and train on the complete graph configuration using eleven languages. Though the number of low resource training lines is then the same as the others, the information is highly asymmetric.
Pretraining is key, as IPML beats the two baselines that skip pretraining in Table 3. Using English as a hypothetical low resource language training on FAMO+, the combined translation improves from 13.4 (MLc) and 14.7 (MLs) to 37.3 (IPML) with iterative pretraining. Training with the low resource language on both the source and the target sides boosts translation into the target side. The star configuration has a slight advantage over the complete configuration as it gives priority to translation into the low resource language. Iterative pretraining with a BLEU score of 37.3 has an edge over one stage of pretraining with scores of 34.7 (PMLc) and 35.7 (PMLs).
All three models pretrained on symmetric data, IPML, PMLc, and PMLs, beat the asymmetric baseline AML. In Table 3, IPML has a +10.3 BLEU increase over our asymmetric baseline on combined translation using English as a hypothetical low resource language training on FAMO+. All four use the same amount of data, but differ in training strategies and data configuration. In severely low resource scenarios, effective training strategies on symmetric data improve translation greatly.
We compare IPML results training on different sets of source languages, FAMO+, FAMD, and FAMP, for English and Eastern Pokomchi in Tables 4 and 5. FAMP performs best for translation into English, and both FAMP and FAMD outperform FAMO+, as shown in Table 4. FAMD performs best for translation into Eastern Pokomchi, as shown in Table 5. Afrikaans has the highest score in English's FAMD and FAMP, outperforming Dutch, German, and French. A reason may be that Afrikaans is the youngest language in the Germanic family, with many lexical and syntactic borrowings from English and multiple close neighbors of English (Gordon Jr, 2005). When language family information is limited, constructing a FAMC to determine neighbors is very useful for translation.
Comparing Eastern Pokomchi results with English results, we see that translation into real-life severely low resource languages is more difficult than translation into hypothetical ones. The combined score is 38.3 for English in Table 4 and 23.1 for Eastern Pokomchi on FAMD in Table 5. Eastern Pokomchi has ejective consonants, which makes the tokenization process difficult. It is agglutinative, morphologically rich, and ergative, just like Basque (Aissen et al., 2017; Clemens et al., 2015). It is complex, unique, and nontransparent to the outsider (England, 2011). Indeed, translation into real severely low resource languages is difficult.

[Table 8] Source sentence, IPML translation, and reference for the EMEA dataset.

Source: Caso detecte efeitos graves ou outros efeitos não mencionados neste folheto, informe o médico veterinário.
IPML: If you notice any side effects or other side effects not mentioned in this leaflet, please inform the vétérinaire.
Reference: If you notice any serious effects or other effects not mentioned in this leaflet, please inform your veterinarian.

Source: No tratamento de Bovinos com mais de 250 Kg de peso vivo, dividir a dose de forma a não administrar mais de 10 ml por local de injecção.
IPML: In the treatment of infants with more than 250 kg in vivo body weight, a the dose to not exceed 10 ml per injection.
Reference: For treatment of cattle over 250 kg body weight, divide the dose so that no more than 10 ml are injected at one site.

IPML: However, because any of side effects is possible, any treatment that 1-5 weeks should be administered under regular supraveghere.
Reference: However, since side effects might occur, any treatment exceeding 1-2 weeks should be under regular veterinary supervision.
We are curious about how our model trained on ∼1,000 lines of data performs on the rest of the Bible; in other words, how IPML performs if we train on ∼3.5% of the Bible and test on the remaining ∼96.5%. In Table 7, we achieve a BLEU score of 31.3 training IPML on randomly sampled 1,093 lines of data for English on FAMO+. Note that the training data is randomly sampled in Table 7, compared to training on Luke in Table 4 and Table 5. This experiment shows that we obtain good results not only with a specific book, but also with randomly sampled data.
We show qualitative examples in Tables 2 and 9. The source content is translated well overall, and there remain a few places for improvement in Table 2. The words "fishermen" and "fishers" are paraphrases of the same concept: IPML predicts the correct concept, though it is penalized by BLEU.
Infusing the order-preserving lexiconized component into our training greatly improves qualitative evaluation, but it does not affect BLEU much, as BLEU has its limitations in severely low resource scenarios. All our experiments include the lexiconized component in training, and the BLEU comparisons in our paper would equally apply to the same experiments without the order-preserving lexiconized component. The component matters in real-life situations when a low resource lexicon list is not available or has to be invented. For example, a person growing up in a local village in Papua New Guinea may have met many people named "Bosai" or "Kaura", but may have never met a person named "Matthew", and we may need to create a lexicon word in the low resource language for "Matthew", possibly through phonetics.
We also see good results on the medical EMEA dataset. Treating English as a hypothetical low resource language, we train on only 1,093 lines of English data. For Portuguese-English translation we obtain a BLEU score of 42.8, while the remaining languages all obtain BLEU scores above 34, as shown in Table 6 and Table 8. In Table 8, we see that our translation is very good, though a few words are carried over from the source language, including "vétérinaire". This is mainly because our ∼1,000 lines contain a very small vocabulary; however, by carrying the source word over, key information is preserved.

Conclusion
We use ∼1,000 lines of low resource data to translate a closed text that is known in advance into a severely low resource language by leveraging massive source parallelism. We present two metrics to rank the 124 source languages and construct FAMCs. We build an iteratively pretrained multilingual order-preserving lexiconized transformer and combine the translations from all source languages into one using our centric measure. Moreover, we add a multilingual order-preserving lexiconized component to translate the named entities accurately. We build a massively parallel lexicon table for 2,939 Bible named entities in 124 source languages, covering more than 66 severely low resource languages. Our good results on the medical EMEA dataset show that our method is useful for other datasets and applications.
Our final result can also serve as a ranking measure for linguistic distances though it is much more expensive in terms of time and resources. In the future, we would like to explore more metrics that are fast and efficient in ranking linguistic distances to the severely low resource language.

Appendix
In Table 1 and Table 5, Kanjobal is Eastern Kanjobal, Mam is Northern Mam, Cuzco is Cuzco Quechua, Ayacucho is Ayacucho Quechua, Bolivian is South Bolivian Quechua, and Huallaga is Huallaga Quechua. We show an illustration of WASH guidelines in Figure 2. We also show IPML translations into Eastern Pokomchi (Mayan) in Table 9.