Evaluation of Spoken Document Retrieval for Historic Speech Collections

Willemijn Heeren, Franciska de Jong, Laurens van der Werff, Marijn Huijbregts, Roeland Ordelman


Abstract
The re-use of spoken word audio collections maintained by audiovisual archives is severely hindered by their generally limited access. The CHoral project, which is part of the CATCH program funded by the Dutch Research Council, aims to provide users of speech archives with online, instead of on-location, access to relevant fragments, instead of full documents. To meet this goal, a spoken document retrieval framework is being developed. In this paper the evaluation efforts undertaken so far to assess and improve various aspects of the framework are presented. These efforts include (i) evaluation of the automatically generated textual representations of the spoken word documents that enable word-based search, (ii) the development of measures to estimate the quality of the textual representations for use in information retrieval, and (iii) studies to establish the potential user groups of the to-be-developed technology, and the first versions of the user interface supporting online access to spoken word collections.
Anthology ID:
L08-1142
Volume:
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month:
May
Year:
2008
Address:
Marrakech, Morocco
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/520_paper.pdf
DOI:
Bibkey:
Cite (ACL):
Willemijn Heeren, Franciska de Jong, Laurens van der Werff, Marijn Huijbregts, and Roeland Ordelman. 2008. Evaluation of Spoken Document Retrieval for Historic Speech Collections. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
Cite (Informal):
Evaluation of Spoken Document Retrieval for Historic Speech Collections (Heeren et al., LREC 2008)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/520_paper.pdf