MCScript2.0: A Machine Comprehension Corpus Focused on Script Events and Participants

Simon Ostermann, Michael Roth, Manfred Pinkal


Abstract
We introduce MCScript2.0, a machine comprehension corpus for the end-to-end evaluation of script knowledge. MCScript2.0 contains approx. 20,000 questions on approx. 3,500 texts, crowdsourced based on a new collection process that results in challenging questions. Half of the questions cannot be answered from the reading texts, but require the use of commonsense and, in particular, script knowledge. We give a thorough analysis of our corpus and show that while the task is not challenging to humans, existing machine comprehension models fail to perform well on the data, even if they make use of a commonsense knowledge base. The dataset is available at http://www.sfb1102.uni-saarland.de/?page_id=2582
Anthology ID: S19-1012
Volume: Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)
Month: June
Year: 2019
Address: Minneapolis, Minnesota
Editors: Rada Mihalcea, Ekaterina Shutova, Lun-Wei Ku, Kilian Evang, Soujanya Poria
Venue: *SEM
SIGs: SIGLEX | SIGSEM
Publisher: Association for Computational Linguistics
Pages: 103–117
URL: https://aclanthology.org/S19-1012
DOI: 10.18653/v1/S19-1012
Cite (ACL):
Simon Ostermann, Michael Roth, and Manfred Pinkal. 2019. MCScript2.0: A Machine Comprehension Corpus Focused on Script Events and Participants. In Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019), pages 103–117, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
MCScript2.0: A Machine Comprehension Corpus Focused on Script Events and Participants (Ostermann et al., *SEM 2019)
PDF: https://aclanthology.org/S19-1012.pdf
Data: ConceptNet, MCScript