Open-source Multi-speaker Corpora of the English Accents in the British Isles

Isin Demirsahin, Oddur Kjartansson, Alexander Gutkin, Clara Rivera


Abstract
This paper presents a dataset of transcribed high-quality audio of English sentences recorded by volunteers speaking with different accents of the British Isles. The dataset is intended for linguistic analysis as well as for use in speech technologies. The recording scripts were curated specifically for accent elicitation, covering a variety of phonological phenomena and providing high phoneme coverage. The scripts include pronunciations of global locations, major airlines and common personal names in different accents, as well as native speaker pronunciations of local words. Lines shared by all speakers were included for idiolect elicitation; these include lines identical or similar to those in existing resources such as the CSTR VCTK corpus and the Speech Accent Archive, to allow for easy comparison of personal and regional accents. The resulting corpora include over 31 hours of recordings from 120 volunteers who self-identify as native speakers of Southern England, Midlands, Northern England, Welsh, Scottish and Irish varieties of English.
Anthology ID:
2020.lrec-1.804
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
6532–6541
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.804
Cite (ACL):
Isin Demirsahin, Oddur Kjartansson, Alexander Gutkin, and Clara Rivera. 2020. Open-source Multi-speaker Corpora of the English Accents in the British Isles. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 6532–6541, Marseille, France. European Language Resources Association.
Cite (Informal):
Open-source Multi-speaker Corpora of the English Accents in the British Isles (Demirsahin et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.804.pdf