Prosodic, syntactic, semantic guidelines for topic structures across domains and corpora

Ana Isabel Mata, Helena Moniz, Telmo Móia, Anabela Gonçalves, Fátima Silva, Fernando Batista, Inês Duarte, Fátima Oliveira, Isabel Falé


Abstract
This paper presents the annotation guidelines applied to naturally occurring speech, aiming at an integrated account of contrast and parallel structures in European Portuguese. These guidelines were defined to allow for the empirical study of interactions among intonation and syntax-discourse patterns in selected sets of different corpora (monologues and dialogues, by adults and teenagers). In this paper we focus on the multilayer annotation process of left periphery structures by using a small sample of highly spontaneous speech in which the distinct types of topic structures are displayed. The analysis of this sample provides fundamental training and testing material for further application in a wider range of domains and corpora. The annotation process comprises the following time-linked levels (manual and automatic): phone, syllable and word level transcriptions (including co-articulation effects); tonal events and break levels; part-of-speech tagging; syntactic-discourse patterns (construction type; construction position; syntactic function; discourse function), and disfluency events as well. Speech corpora with such a multi-level annotation are a valuable resource to look into grammar module relations in language use from an integrated viewpoint. Such viewpoint is innovative in our language, and has not been often assumed by studies for other languages.
Anthology ID:
L14-1141
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
1188–1193
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/1201_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Ana Isabel Mata, Helena Moniz, Telmo Móia, Anabela Gonçalves, Fátima Silva, Fernando Batista, Inês Duarte, Fátima Oliveira, and Isabel Falé. 2014. Prosodic, syntactic, semantic guidelines for topic structures across domains and corpora. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 1188–1193, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Prosodic, syntactic, semantic guidelines for topic structures across domains and corpora (Mata et al., LREC 2014)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/1201_Paper.pdf