NL2pSQL: Generating Pseudo-SQL Queries from Under-Specified Natural Language Questions

Fuxiang Chen, Seung-won Hwang, Jaegul Choo, Jung-Woo Ha, Sunghun Kim


Abstract
Generating SQL codes from natural language questions (NL2SQL) is an emerging research area. Existing studies have mainly focused on clear scenarios where specified information is fully given to generate a SQL query. However, in developer forums such as Stack Overflow, questions cover more diverse tasks including table manipulation or performance issues, where a table is not specified. The SQL query posted in Stack Overflow, Pseudo-SQL (pSQL), does not usually contain table schemas and is not necessarily executable, is sufficient to guide developers. Here we describe a new NL2pSQL task to generate pSQL codes from natural language questions on under-specified database issues, NL2pSQL. In addition, we define two new metrics suitable for the proposed NL2pSQL task, Canonical-BLEU and SQL-BLEU, instead of the conventional BLEU. With a baseline model using sequence-to-sequence architecture integrated by denoising autoencoder, we confirm the validity of our task. Experiments show that the proposed NL2pSQL approach yields well-formed queries (up to 43% more than a standard Seq2Seq model). Our code and datasets will be publicly released.
Anthology ID:
D19-1262
Volume:
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Month:
November
Year:
2019
Address:
Hong Kong, China
Editors:
Kentaro Inui, Jing Jiang, Vincent Ng, Xiaojun Wan
Venues:
EMNLP | IJCNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Note:
Pages:
2603–2613
Language:
URL:
https://aclanthology.org/D19-1262
DOI:
10.18653/v1/D19-1262
Bibkey:
Cite (ACL):
Fuxiang Chen, Seung-won Hwang, Jaegul Choo, Jung-Woo Ha, and Sunghun Kim. 2019. NL2pSQL: Generating Pseudo-SQL Queries from Under-Specified Natural Language Questions. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2603–2613, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal):
NL2pSQL: Generating Pseudo-SQL Queries from Under-Specified Natural Language Questions (Chen et al., EMNLP-IJCNLP 2019)
Copy Citation:
PDF:
https://aclanthology.org/D19-1262.pdf