Evaluating expressive speech synthesis from audiobook corpora for conversational phrases

Éva Székely, Joao Paulo Cabral, Mohamed Abou-Zleikha, Peter Cahill, Julie Carson-Berndsen


Abstract
Audiobooks are a rich source of large quantities of natural-sounding, highly expressive speech. In previous research we showed that it is possible to detect the different expressive voice styles present in a particular audiobook by using unsupervised clustering to divide the audiobook's speech corpus into smaller subsets, each representing one detected voice style. These voice-style subcorpora reflect the various ways a speaker uses their voice to express involvement and affect, or to imitate characters. This study evaluates the voice styles detected in an audiobook when they are applied to expressive speech synthesis. A further aim is to investigate the usability of audiobooks as a language resource for the expressive synthesis of conversational utterances. Two evaluations were carried out to assess the effect of genre transfer, that is, using speech synthesis to carry expressive speech from read-aloud literature over to conversational phrases. The first evaluation revealed that listeners have distinct voice style preferences for a given conversational phrase. The second evaluation showed that users of speech synthesis systems can learn the characteristics of a voice style well enough to make reliable predictions about how a given utterance will sound when synthesised in that voice style.
Anthology ID:
L12-1513
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3335–3339
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/864_Paper.pdf
Cite (ACL):
Éva Székely, Joao Paulo Cabral, Mohamed Abou-Zleikha, Peter Cahill, and Julie Carson-Berndsen. 2012. Evaluating expressive speech synthesis from audiobook corpora for conversational phrases. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 3335–3339, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Evaluating expressive speech synthesis from audiobook corpora for conversational phrases (Székely et al., LREC 2012)