Richard Alexander Castro-Mamani

Also published as: Richard Alexander Castro Mamani


2021

Love Thy Neighbor: Combining Two Neighboring Low-Resource Languages for Translation
John E. Ortega | Richard Alexander Castro Mamani | Jaime Rafael Montoya Samame
Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021)

Low-resource languages sometimes take on similar morphological and syntactic characteristics due to their geographic nearness and shared history. Two low-resource neighboring languages found in Peru, Quechua and Ashaninka, can be considered, at first glance, two languages that are morphologically similar. In order to translate the two languages, various approaches have been taken. For Quechua, neural machine transfer-learning has been used along with byte-pair encoding. For Ashaninka, the language of the two with fewer resources, a finite-state transducer is used to transform Ashaninka texts and its dialects for machine translation use. We evaluate and compare two approaches by attempting to use newly-formed Ashaninka corpora for neural machine translation. Our experiments show that combining the two neighboring languages, while similar in morphology, word sharing, and geographical location, improves Ashaninka–Spanish translation but degrades Quechua–Spanish translations.

2020

Overcoming Resistance: The Normalization of an Amazonian Tribal Language
John E. Ortega | Richard Alexander Castro-Mamani | Jaime Rafael Montoya Samame
Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages

Languages can be considered endangered for many reasons. One of the principal reasons for endangerment is the disappearance of a language's speakers. Another, more identifiable, reason is the lack of written resources. We present an automated sub-segmentation system called AshMorph that deals with the morphology of an Amazonian tribal language called Ashaninka, which is at risk of being endangered due to the lack of availability (or resistance) of native speakers and the absence of written resources. We show that by the use of a cross-lingual lexicon and finite state transducers we can increase accuracy by more than 30% when compared to other modern sub-segmentation tools. Our results, made freely available online, are verified by an Ashaninka speaker and perform well in two distinct domains: everyday literary articles and the Bible. This research serves as a first step in helping to preserve Ashaninka by offering a sub-segmentation process that can be used to normalize any Ashaninka text, which will serve as input to a machine translation system for translation into high-resource languages spoken by larger populations, such as Spanish and Portuguese in Peru and Brazil, where Ashaninka is mostly spoken.

2014

Morphological Disambiguation and Text Normalization for Southern Quechua Varieties
Annette Rios Gonzales | Richard Alexander Castro Mamani
Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects