HIFI-AV: An Audio-visual Corpus for Spoken Language Human-Machine Dialogue Research in Spanish

Fernando Fernández-Martínez, Juan Manuel Lucas-Cuesta, Roberto Barra Chicote, Javier Ferreiros, Javier Macías-Guarasa


Abstract
In this paper, we describe a new multi-purpose audio-visual database on the context of speech interfaces for controlling household electronic devices. The database comprises speech and video recordings of 19 speakers interacting with a HIFI audio box by means of a spoken dialogue system. Dialogue management is based on Bayesian Networks and the system is provided with contextual information handling strategies. Each speaker was requested to fulfil different sets of specific goals following predefined scenarios, according to both different complexity levels and degrees of freedom or initiative allowed to the user. Due to a careful design and its size, the recorded database allows comprehensive studies on speech recognition, speech understanding, dialogue modeling and management, microphone array based speech processing, and both speech and video-based acoustic source localisation. The database has been labelled for quality and efficiency studies on dialogue performance. The whole database has been validated through both objective and subjective tests.
Anthology ID:
L10-1156
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/230_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Fernando Fernández-Martínez, Juan Manuel Lucas-Cuesta, Roberto Barra Chicote, Javier Ferreiros, and Javier Macías-Guarasa. 2010. HIFI-AV: An Audio-visual Corpus for Spoken Language Human-Machine Dialogue Research in Spanish. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
HIFI-AV: An Audio-visual Corpus for Spoken Language Human-Machine Dialogue Research in Spanish (Fernández-Martínez et al., LREC 2010)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/230_Paper.pdf