The Slovene BNSI Broadcast News database and reference speech corpus GOS: Towards the uniform guidelines for future work

Andrej Žgank, Ana Zwitter Vitez, Darinka Verdonik


Abstract
This paper seeks common guidelines for the future development of speech databases for less-resourced languages, so that they are as useful as possible for both of their main fields of use: linguistic research and speech technologies. We compare two standards for creating speech databases: one followed in developing the Slovene speech database for automatic speech recognition, BNSI Broadcast News, and the other followed in developing the Slovene reference speech corpus GOS. From this comparison we outline possible common guidelines for future work. We also present an add-on for the GOS corpus that enables its use for automatic speech recognition.
Anthology ID:
L14-1558
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2644–2647
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/710_Paper.pdf
Cite (ACL):
Andrej Žgank, Ana Zwitter Vitez, and Darinka Verdonik. 2014. The Slovene BNSI Broadcast News database and reference speech corpus GOS: Towards the uniform guidelines for future work. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 2644–2647, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
The Slovene BNSI Broadcast News database and reference speech corpus GOS: Towards the uniform guidelines for future work (Žgank et al., LREC 2014)
PDF:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/710_Paper.pdf