Design of BCCWJ-EEG: Balanced Corpus with Human Electroencephalography

Yohei Oseki, Masayuki Asahara


Abstract
The past decade has witnessed a happy marriage between natural language processing (NLP) and the cognitive science of language. Moreover, given the historical relationship between biological and artificial neural networks, the advent of deep learning has rekindled strong interest in the fusion of NLP and the neuroscience of language. Importantly, this cross-fertilization between NLP, on the one hand, and the cognitive (neuro)science of language, on the other, has been driven by language resources annotated with human language processing data. However, these resources remain limited in several respects, including their annotations, genres, and languages. In this paper, we describe the design of a novel language resource called BCCWJ-EEG, the Balanced Corpus of Contemporary Written Japanese (BCCWJ) experimentally annotated with human electroencephalography (EEG). Specifically, after extensively reviewing the language resources currently available in the literature, with a special focus on eye-tracking and EEG, we summarize the details concerning (i) participants, (ii) stimuli, (iii) procedure, (iv) data preprocessing, (v) corpus evaluation, (vi) resource release, and (vii) compilation schedule. We also discuss potential applications of BCCWJ-EEG to neuroscience and NLP.
Anthology ID:
2020.lrec-1.24
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
189–194
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.24
Cite (ACL):
Yohei Oseki and Masayuki Asahara. 2020. Design of BCCWJ-EEG: Balanced Corpus with Human Electroencephalography. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 189–194, Marseille, France. European Language Resources Association.
Cite (Informal):
Design of BCCWJ-EEG: Balanced Corpus with Human Electroencephalography (Oseki & Asahara, LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.24.pdf