FAB: The French Absolute Beginner Corpus for Pronunciation Training

Sean Robertson, Cosmin Munteanu, Gerald Penn


Abstract
We introduce the French Absolute Beginner (FAB) speech corpus. The corpus is intended for the development and study of Computer-Assisted Pronunciation Training (CAPT) tools for absolute beginner learners. Data were recorded during two experiments in which participants used a CAPT system in paired role-play tasks. This setting gives FAB three features that distinguish it from other non-native corpora: the experimental setting is ecologically valid, closing the gap between training and deployment; the label set is based on teacher feedback, allowing for context-sensitive CAPT; and the data were collected primarily from absolute beginners, a group often ignored. Participants did not read prompts; instead, they recalled and modified dialogues that were modelled in videos. Because they could not always distinguish the modelled words from the videos alone, speakers often uttered unintelligible or out-of-L2 words. The corpus is split into three partitions: one from an experiment with minimal feedback, another from an experiment with explicit, word-level feedback, and a third containing supplementary read-and-record data. A subset of words in the first partition has been labelled as more or less native, with inter-annotator agreement reported. In the explicit-feedback partition, labels are derived from the experiment's online feedback. The FAB corpus is scheduled to be made freely available by the end of 2020.
Anthology ID:
2020.lrec-1.815
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
6613–6620
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.815
Cite (ACL):
Sean Robertson, Cosmin Munteanu, and Gerald Penn. 2020. FAB: The French Absolute Beginner Corpus for Pronunciation Training. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 6613–6620, Marseille, France. European Language Resources Association.
Cite (Informal):
FAB: The French Absolute Beginner Corpus for Pronunciation Training (Robertson et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.815.pdf