Standardized Tests as benchmarks for Artificial Intelligence

Mrinmaya Sachan, Minjoon Seo, Hannaneh Hajishirzi, Eric Xing


Abstract
Standardized tests have recently been proposed as replacements for the Turing test as a driver of progress in AI (Clark, 2015). These include tests on understanding passages and stories and answering questions about them (Richardson et al., 2013; Rajpurkar et al., 2016a, inter alia), science question answering (Schoenick et al., 2016, inter alia), algebra word problems (Kushman et al., 2014, inter alia), geometry problems (Seo et al., 2015; Sachan et al., 2016), visual question answering (Antol et al., 2015), etc. Many of these tests require sophisticated understanding of the world and aim to push the boundaries of AI. For this tutorial, we broadly divide these tests into two categories: open-domain tests, such as reading comprehension and elementary school tests, where the goal is to find support for an answer in the student curriculum, and closed-domain tests, such as intermediate-level math and science tests (algebra, geometry, Newtonian physics problems, etc.). Unlike open-domain tests, closed-domain tests require the system to have significant domain knowledge and reasoning capabilities. For example, geometry questions typically involve a number of geometry primitives (lines, quadrilaterals, circles, etc.) and require students to apply axioms and theorems of geometry (the Pythagorean theorem, alternate angles, etc.) to solve them. These closed domains often have a formal logical basis, and a question can be mapped to a formal language by semantic parsing. The formal question representation can then be provided as input to an expert system that solves the question.
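To make the final parse-then-solve step concrete, here is a minimal Python sketch of the pipeline the abstract describes: a geometry question already mapped to a formal representation is handed to a tiny rule-based "expert system". The predicate and field names below are illustrative assumptions, not taken from any published parser.

import math

# Output that a hypothetical semantic parser might produce for:
#   "In right triangle ABC, the right angle is at B, AB = 3, and BC = 4.
#    What is the length of AC?"
# All names here are illustrative, not from any specific system.
parse = {
    "right_angle_at": ("B", ("A", "B", "C")),   # right angle at vertex B of triangle ABC
    "lengths": {frozenset({"A", "B"}): 3.0,     # AB = 3
                frozenset({"B", "C"}): 4.0},    # BC = 4
    "query": frozenset({"A", "C"}),             # asked: length of AC
}

def solve(parse):
    """One-rule 'expert system': if the queried segment is the hypotenuse of a
    right triangle whose two legs have known lengths, apply the Pythagorean theorem."""
    corner, (a, b, c) = parse["right_angle_at"]
    legs = [frozenset({corner, v}) for v in (a, b, c) if v != corner]
    hypotenuse = frozenset(v for v in (a, b, c) if v != corner)
    if parse["query"] == hypotenuse and all(leg in parse["lengths"] for leg in legs):
        return math.hypot(*(parse["lengths"][leg] for leg in legs))
    return None  # no applicable rule

print(solve(parse))  # 5.0

Real systems such as GeoS (Seo et al., 2015) use much richer logical forms, derived from both text and diagram, and solve them with numerical methods rather than a single hand-written rule; this sketch only illustrates the division of labor between the semantic parser and the downstream solver.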
Anthology ID:
D18-3005
Volume:
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts
Month:
October-November
Year:
2018
Address:
Brussels, Belgium
Editors:
Mausam, Lu Wang
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
URL:
https://aclanthology.org/D18-3005
Cite (ACL):
Mrinmaya Sachan, Minjoon Seo, Hannaneh Hajishirzi, and Eric Xing. 2018. Standardized Tests as benchmarks for Artificial Intelligence. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
Standardized Tests as benchmarks for Artificial Intelligence (Sachan et al., EMNLP 2018)