%0 Conference Proceedings
%T An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction
%A Larson, Stefan
%A Mahendran, Anish
%A Peper, Joseph J.
%A Clarke, Christopher
%A Lee, Andrew
%A Hill, Parker
%A Kummerfeld, Jonathan K.
%A Leach, Kevin
%A Laurenzano, Michael A.
%A Tang, Lingjia
%A Mars, Jason
%Y Inui, Kentaro
%Y Jiang, Jing
%Y Ng, Vincent
%Y Wan, Xiaojun
%S Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
%D 2019
%8 November
%I Association for Computational Linguistics
%C Hong Kong, China
%F larson-etal-2019-evaluation
%X Task-oriented dialog systems need to know when a query falls outside their range of supported intents, but current text classification corpora only define label sets that cover every example. We introduce a new dataset that includes queries that are out-of-scope—i.e., queries that do not fall into any of the system’s supported intents. This poses a new challenge because models cannot assume that every query at inference time belongs to a system-supported intent class. Our dataset also covers 150 intent classes over 10 domains, capturing the breadth that a production task-oriented agent must handle. We evaluate a range of benchmark classifiers on our dataset along with several different out-of-scope identification schemes. We find that while the classifiers perform well on in-scope intent classification, they struggle to identify out-of-scope queries. Our dataset and evaluation fill an important gap in the field, offering a way of more rigorously and realistically benchmarking text classification in task-driven dialog systems.
%R 10.18653/v1/D19-1131
%U https://aclanthology.org/D19-1131
%U https://doi.org/10.18653/v1/D19-1131
%P 1311-1316