Experimental detection of vowel pronunciation variants in Amharic

Thomas Pellegrini, Lori Lamel


Abstract
The pronunciation lexicon is a fundamental element in an automatic speech transcription system. It associates each lexical entry (usually a grapheme), with one or more phonemic or phone-like forms, the pronunciation variants. Thorough knowledge of the target language is a priori necessary to establish the pronunciation baseforms and variants. The reliance on human expertise can pose difficulties in developing a system for a language where such knowledge may not be readily available. In this article a speech recognizer is used to help select pronunciation variants in Amharic, the official language of Ethiopia, focusing on alternate choices for vowels. This study is carried out using an audio corpus composed of 37 hours of speech from radio broadcasts which were orthographically transcribed by native speakers. Since the corpus is relatively small for estimating pronunciation variants, a first set of studies were carried out at a syllabic level. Word lexica were then constructed based on the observed syllable occurences. Automatic alignments were compared for lexica containing different vowel variants, with both context-independent and context-dependent acoustic models sets. The variant2+ measure proposed in (Adda-Decker and Lamel, 1999) is used to assess the potential need for pronunciation variants.
Anthology ID:
L06-1434
Volume:
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Month:
May
Year:
2006
Address:
Genoa, Italy
Editors:
Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/701_pdf.pdf
DOI:
Bibkey:
Cite (ACL):
Thomas Pellegrini and Lori Lamel. 2006. Experimental detection of vowel pronunciation variants in Amharic. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
Cite (Informal):
Experimental detection of vowel pronunciation variants in Amharic (Pellegrini & Lamel, LREC 2006)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/701_pdf.pdf