Developing a Twi (Asante) Dictionary from Akan Interlinear Glossed Texts

Dorothee Beermann, Lars Hellan, Pavel Mihaylov, Anna Struck


Abstract
Traditionally, a lexicographer identifies the lexical items to be added to a dictionary. Here we present a corpus-based approach to dictionary compilation and describe a procedure that derives a Twi dictionary from a TypeCraft corpus of Interlinear Glossed Texts. We first extracted a list of unique words, then excluded words belonging to other dialects of Akan (mostly Fante and Abron). We corrected misspellings and distinguished English loanwords to be integrated into our dictionary from instances of code-switching. Besides the dictionary itself, our work yields a lexicographical model for Akan that represents the lexical resource together with the extended morphological and word-class inventories that provide the information to be aggregated. The model also represents external resources, such as the source corpus and word-level audio files. The Twi dictionary currently contains 1367 words; it will be available online and through an open mobile app.
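The first two steps of the procedure (extracting unique words and excluding other Akan varieties) can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout, the variety tags, the `unique_words` helper, and the toy tokens are all assumptions made for illustration and do not reflect TypeCraft's actual export format.

```python
from collections import Counter

# Toy token records: (word form, variety tag, gloss).
# Layout, tags, and sample words are illustrative assumptions only.
TOKENS = [
    ("kɔ", "twi", "go"),
    ("kɔ", "twi", "go"),
    ("nsuo", "twi", "water"),
    ("nsu", "fante", "water"),
    ("fie", "twi", "home"),
]

# Varieties to leave out of the Twi word list (assumed tag names).
EXCLUDED_VARIETIES = {"fante", "abron"}


def unique_words(tokens, excluded=EXCLUDED_VARIETIES):
    """Count word forms, skipping tokens tagged with an excluded variety."""
    counts = Counter()
    for form, variety, _gloss in tokens:
        if variety.lower() in excluded:
            continue
        counts[form.lower()] += 1
    return counts


candidates = unique_words(TOKENS)
wordlist = sorted(candidates)  # candidate headwords for manual review
```

In this sketch the resulting list is only a set of candidates; spelling correction and the decision between loanword and code-switching would still require manual review, as the abstract describes.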
Anthology ID:
2020.sltu-1.41
Volume:
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Dorothee Beermann, Laurent Besacier, Sakriani Sakti, Claudia Soria
Venue:
SLTU
Publisher:
European Language Resources Association
Pages:
294–297
Language:
English
URL:
https://aclanthology.org/2020.sltu-1.41
Cite (ACL):
Dorothee Beermann, Lars Hellan, Pavel Mihaylov, and Anna Struck. 2020. Developing a Twi (Asante) Dictionary from Akan Interlinear Glossed Texts. In Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), pages 294–297, Marseille, France. European Language Resources Association.
Cite (Informal):
Developing a Twi (Asante) Dictionary from Akan Interlinear Glossed Texts (Beermann et al., SLTU 2020)
PDF:
https://aclanthology.org/2020.sltu-1.41.pdf