A Qualitative Comparison of CoQA, SQuAD 2.0 and QuAC

Mark Yatskar


Abstract
We compare three new datasets for question answering: SQuAD 2.0, QuAC, and CoQA, along several of their new features: (1) unanswerable questions, (2) multi-turn interactions, and (3) abstractive answers. We show that the datasets provide complementary coverage of the first two aspects, but weak coverage of the third. Because of the datasets’ structural similarity, a single extractive model can be easily adapted to any of the datasets and we show improved baseline results on both SQuAD 2.0 and CoQA. Despite the similarity, models trained on one dataset are ineffective on another dataset, but we find moderate performance improvement through pretraining. To encourage cross-evaluation, we release code for conversion between datasets.
Anthology ID:
N19-1241
Volume:
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Editors:
Jill Burstein, Christy Doran, Thamar Solorio
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
2318–2323
URL:
https://aclanthology.org/N19-1241
DOI:
10.18653/v1/N19-1241
Cite (ACL):
Mark Yatskar. 2019. A Qualitative Comparison of CoQA, SQuAD 2.0 and QuAC. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2318–2323, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
A Qualitative Comparison of CoQA, SQuAD 2.0 and QuAC (Yatskar, NAACL 2019)
PDF:
https://aclanthology.org/N19-1241.pdf
Code
my89/co-squac
Data
CoQA, QuAC, SQuAD