Developing Resources for Automated Speech Processing of Quebec French

Mélanie Lancien, Marie-Hélène Côté, Brigitte Bigi


Abstract
The analysis of the structure of speech nearly always rests on the alignment of the speech recording with a phonetic transcription. Several tools can now perform this speech segmentation automatically. However, none of them supports the automatic segmentation of Quebec French (QF hereafter), whose acoustics and phonotactics differ widely from those of France French (FF hereafter). To adequately segment QF, features such as the diphthongization of long vowels and the affrication of coronal stops have to be taken into account. Acoustic models for automatic segmentation must therefore be trained on speech samples exhibiting those phenomena. Dictionaries and lexicons must also be adapted to integrate differences in lexical units and in the phonology of QF. This paper presents the development of linguistic resources to be included in the SPPAS software tool in order to perform Text normalization, Phonetization, Alignment and Syllabification. We adapted the existing French lexicon and developed a QF-specific pronunciation dictionary. We then created an acoustic model from the existing ones and adapted it with 5 minutes of manually time-aligned data. These new resources are all freely distributed with SPPAS version 2.7; they perform the full process of speech segmentation in Quebec French.
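As an illustration of why a QF-specific pronunciation dictionary is needed, the affrication of coronal stops mentioned in the abstract can be sketched as a rewrite rule over a phone sequence. This is a minimal sketch, not the authors' method or the SPPAS implementation: it assumes the common description of QF in which /t/ and /d/ surface as affricates before high front vowels and glides, and the phone labels, the `AFFRICATED` and `TRIGGERS` tables, and the `affricate` function are all hypothetical names chosen for this example.

```python
# Hypothetical SAMPA-like phone labels; not taken from the SPPAS resources.
AFFRICATED = {"t": "ts", "d": "dz"}
# Assumed labels for the high front vowels and glides that trigger affrication.
TRIGGERS = {"i", "y", "j", "H"}


def affricate(phones):
    """Return a copy of `phones` with t/d affricated before a trigger phone."""
    result = []
    for index, phone in enumerate(phones):
        # Look one phone ahead; the last phone has no following context.
        following = phones[index + 1] if index + 1 < len(phones) else None
        if phone in AFFRICATED and following in TRIGGERS:
            result.append(AFFRICATED[phone])
        else:
            result.append(phone)
    return result


if __name__ == "__main__":
    # Illustrative phone sequence for "tu dis".
    print(affricate(["t", "y", "d", "i"]))
```

A rule of this kind could be used to derive candidate QF pronunciation variants from an existing French dictionary entry; the actual resources distributed with SPPAS may encode the variants differently.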
Anthology ID:
2020.lrec-1.655
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
5323–5328
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.655
Cite (ACL):
Mélanie Lancien, Marie-Hélène Côté, and Brigitte Bigi. 2020. Developing Resources for Automated Speech Processing of Quebec French. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 5323–5328, Marseille, France. European Language Resources Association.
Cite (Informal):
Developing Resources for Automated Speech Processing of Quebec French (Lancien et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.655.pdf