Manual Speech Synthesis Data Acquisition - From Script Design to Recording Speech

Atli Sigurgeirsson, Gunnar Örnólfsson, Jón Guðnason


Atli Þór Sigurgeirsson, atlithors@ru.is, Reykjavik University
Gunnar Thor Örnólfsson, gunnarthor@hi.is, Árni Magnússon Institute for Icelandic Studies
Dr. Jón Guðnason, jg@ru.is

Abstract
In this paper we present work on collecting a large amount of high-quality speech synthesis data for Icelandic. Eight speakers will be recorded for 20 hours each. A script design strategy is proposed, and three scripts of varying length have been generated to maximize diphone coverage. The largest reading script contains 14,400 prompts and includes 87.3% of all Icelandic diphones at least once and 81% of all Icelandic diphones at least twenty times. A recording client was developed to facilitate recording sessions. The client supports easy importing of scripts and maintaining multiple collections in parallel. The recorded data can be downloaded directly from the client. Recording sessions started in October 2019 and are carried out in a professional studio under supervision. As of writing, 58.7 hours of high-quality speech data have been collected. The scripts, the recording software and the speech data will later be released under a CC-BY 4.0 license.
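
The coverage figures quoted in the abstract (the share of diphones seen at least once, and at least twenty times) can be illustrated with a small sketch. This is not the authors' script design strategy, which the abstract does not detail; it is a generic greedy selection over toy data, and the names `diphones`, `coverage`, and `greedy_select`, as well as the three-phone inventory, are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product


def diphones(phones):
    """Return the adjacent phone pairs (diphones) of one phone sequence."""
    return list(zip(phones, phones[1:]))


def coverage(prompts, inventory, min_count=1):
    """Fraction of inventory diphones that occur at least min_count times."""
    counts = Counter(d for p in prompts for d in diphones(p))
    covered = sum(1 for d in inventory if counts[d] >= min_count)
    return covered / len(inventory)


def greedy_select(candidates, inventory, n_prompts):
    """Assumed generic heuristic: repeatedly take the prompt that adds
    the most not-yet-seen inventory diphones, up to n_prompts prompts."""
    inventory = set(inventory)
    remaining = list(candidates)
    chosen, seen = [], set()
    while remaining and len(chosen) < n_prompts:
        best = max(
            remaining,
            key=lambda p: len((set(diphones(p)) & inventory) - seen),
        )
        chosen.append(best)
        seen |= set(diphones(best)) & inventory
        remaining.remove(best)
    return chosen


# Toy data (hypothetical): three phones, so nine possible diphones.
phones = ["a", "b", "c"]
inventory = list(product(phones, repeat=2))
candidates = [["a", "b", "c"], ["a", "b"], ["c", "a", "a"], ["b", "b"]]

script = greedy_select(candidates, inventory, n_prompts=2)
print(script)
print(coverage(script, inventory))               # share seen at least once
print(coverage(script, inventory, min_count=2))  # share seen at least twice
```

Raising `min_count` to 20 gives the second kind of figure reported in the abstract; a real script would use phone sequences produced by a pronunciation dictionary or grapheme-to-phoneme model rather than hand-written lists.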
Anthology ID:
2020.sltu-1.44
Volume:
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Dorothee Beermann, Laurent Besacier, Sakriani Sakti, Claudia Soria
Venue:
SLTU
Publisher:
European Language Resources Association
Pages:
316–320
Language:
English
URL:
https://aclanthology.org/2020.sltu-1.44
Cite (ACL):
Atli Sigurgeirsson, Gunnar Örnólfsson, and Jón Guðnason. 2020. Manual Speech Synthesis Data Acquisition - From Script Design to Recording Speech. In Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), pages 316–320, Marseille, France. European Language Resources Association.
Cite (Informal):
Manual Speech Synthesis Data Acquisition - From Script Design to Recording Speech (Sigurgeirsson et al., SLTU 2020)
PDF:
https://aclanthology.org/2020.sltu-1.44.pdf