A Method for Building a Commonsense Inference Dataset based on Basic Events

Kazumasa Omura; Daisuke Kawahara; Sadao Kurohashi

doi:10.18653/v1/2020.emnlp-main.192

Abstract

We present a scalable, low-bias, and low-cost method for building a commonsense inference dataset that combines automatic extraction from a corpus with crowdsourcing. Each problem is a multiple-choice question that asks about contingency between basic events. We applied the proposed method to a Japanese corpus and acquired 104k problems. While humans can solve the resulting problems with high accuracy (88.9%), the accuracy of a high-performance transfer learning model is considerably lower (76.0%). We also confirmed through dataset analysis that the resulting dataset has low bias. We released the dataset to facilitate language understanding research.

Anthology ID: 2020.emnlp-main.192
Volume: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month: November
Year: 2020
Address: Online
Editors: Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 2450–2460
URL: https://aclanthology.org/2020.emnlp-main.192/
DOI: 10.18653/v1/2020.emnlp-main.192
Cite (ACL): Kazumasa Omura, Daisuke Kawahara, and Sadao Kurohashi. 2020. A Method for Building a Commonsense Inference Dataset based on Basic Events. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2450–2460, Online. Association for Computational Linguistics.
Cite (Informal): A Method for Building a Commonsense Inference Dataset based on Basic Events (Omura et al., EMNLP 2020)
PDF: https://aclanthology.org/2020.emnlp-main.192.pdf
Video: https://slideslive.com/38939260
