From Human Communication to Intelligent User Interfaces: Corpora of Spoken Estonian

Tiit Hennoste, Olga Gerassimenko, Riina Kasterpalu, Mare Koit, Andriela Rääbis, Krista Strandson


Abstract
We argue for the necessity of studying human-human spoken conversations of various kinds in order to create user interfaces to databases. An efficient user interface benefits from a well-organized corpus that can be used for investigating the strategies people use in conversations in order to be efficient and to handle the spoken communication problems. For modeling the natural behaviour and testing the model we need a dialogue corpus where the roles of participants are close to the roles of the dialogue system and its user. For that reason, we collect and investigate the Corpus of the Spoken Estonian and the Estonian Dialogue Corpus as the sources for human-human interaction investigation. The transcription conventions and annotation typology of spoken human-human dialogues in Estonian are introduced. For creating a user interface the corpus of one institutional conversation type is insufficient, since we need to know what phenomena are inherent for the spoken language in general, what means are used only in certain types of the conversations and what are the differences.
Anthology ID:
L08-1514
Volume:
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month:
May
Year:
2008
Address:
Marrakech, Morocco
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/518_paper.pdf
DOI:
Bibkey:
Cite (ACL):
Tiit Hennoste, Olga Gerassimenko, Riina Kasterpalu, Mare Koit, Andriela Rääbis, and Krista Strandson. 2008. From Human Communication to Intelligent User Interfaces: Corpora of Spoken Estonian. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
Cite (Informal):
From Human Communication to Intelligent User Interfaces: Corpora of Spoken Estonian (Hennoste et al., LREC 2008)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/518_paper.pdf