Parsing transcripts of speech

Andrew Caines, Michael McCarthy, Paula Buttery


Abstract
We present an analysis of parser performance on speech data, comparing word type and token frequency distributions with written data, and evaluating parse accuracy by length of input string. We find that parser performance tends to deteriorate with increasing length of string, more so for spoken than for written texts. We train an alternative parsing model with added speech data and demonstrate improvements in accuracy on speech-units, with no deterioration in performance on written text.
Anthology ID:
W17-4604
Volume:
Proceedings of the Workshop on Speech-Centric Natural Language Processing
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Nicholas Ruiz, Srinivas Bangalore
Venue:
WS
Publisher:
Association for Computational Linguistics
Pages:
27–36
URL:
https://aclanthology.org/W17-4604
DOI:
10.18653/v1/W17-4604
Cite (ACL):
Andrew Caines, Michael McCarthy, and Paula Buttery. 2017. Parsing transcripts of speech. In Proceedings of the Workshop on Speech-Centric Natural Language Processing, pages 27–36, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Parsing transcripts of speech (Caines et al., 2017)
PDF:
https://aclanthology.org/W17-4604.pdf
Data
English Web Treebank