Text Genre and Training Data Size in Human-like Parsing

John Hale, Adhiguna Kuncoro, Keith Hall, Chris Dyer, Jonathan Brennan


Abstract
Domain-specific training typically makes NLP systems work better. We show that this extends to cognitive modeling as well by relating the states of a neural phrase-structure parser to electrophysiological measures from human participants. These measures were recorded as participants listened to a spoken recitation of the same literary text that was supplied as input to the neural parser. Given more training data, the system derives a better cognitive model — but only when the training examples come from the same textual genre. This finding is consistent with the idea that humans adapt syntactic expectations to particular genres during language comprehension (Kaan and Chun, 2018; Branigan and Pickering, 2017).
Anthology ID:
D19-1594
Volume:
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Month:
November
Year:
2019
Address:
Hong Kong, China
Editors:
Kentaro Inui, Jing Jiang, Vincent Ng, Xiaojun Wan
Venues:
EMNLP | IJCNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
5846–5852
URL:
https://aclanthology.org/D19-1594
DOI:
10.18653/v1/D19-1594
Cite (ACL):
John Hale, Adhiguna Kuncoro, Keith Hall, Chris Dyer, and Jonathan Brennan. 2019. Text Genre and Training Data Size in Human-like Parsing. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5846–5852, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal):
Text Genre and Training Data Size in Human-like Parsing (Hale et al., EMNLP-IJCNLP 2019)
PDF:
https://aclanthology.org/D19-1594.pdf