Experimental Fast-Tracking of Morphological Analysers for Nguni Languages

Sonja Bosch, Laurette Pretorius, Kholisa Podile, Axel Fleisch


Abstract
The development of natural language processing (NLP) components is resource-intensive and therefore justifies exploring ways of reducing development time and effort when building NLP components. This paper addresses the experimental fast-tracking of the development of finite-state morphological analysers for Xhosa, Swati and (Southern) Ndebele by using an existing morphological analyser prototype for Zulu. The research question is whether fast-tracking is feasible across the language boundaries between these closely related varieties. The objective is a thorough assessment of recognition rates yielded by the Zulu morphological analyser for the three related languages. The strategy is to use techniques comprising several cycles of the following steps: applying the analyser to corpus data from all languages, identifying failures, and implementing the respective changes in the analyser. Tests show that the high degree of shared typological properties and formal similarities among the Nguni varieties warrants a modular fast-tracking approach. Word forms recognized by the Zulu analyser were mostly adequately interpreted. Therefore, the focus lies on providing adaptations based on failure output analysis for each language. As a result, the development of analysers for Xhosa, Swati and Ndebele is considerably faster than the creation of the Zulu prototype. The paper concludes with comments on the feasibility of the experiment, and the results of the evaluation.
Anthology ID:
L08-1537
Volume:
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month:
May
Year:
2008
Address:
Marrakech, Morocco
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/643_paper.pdf
DOI:
Bibkey:
Cite (ACL):
Sonja Bosch, Laurette Pretorius, Kholisa Podile, and Axel Fleisch. 2008. Experimental Fast-Tracking of Morphological Analysers for Nguni Languages. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
Cite (Informal):
Experimental Fast-Tracking of Morphological Analysers for Nguni Languages (Bosch et al., LREC 2008)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/643_paper.pdf