Learning to Synthesize Data for Semantic Parsing

Bailin Wang, Wenpeng Yin, Xi Victoria Lin, Caiming Xiong


Abstract
Synthesizing data for semantic parsing has gained increasing attention recently. However, most methods require handcrafted (high-precision) rules in their generative process, hindering the exploration of diverse unseen data. In this work, we propose a generative model which features a (non-neural) PCFG that models the composition of programs (e.g., SQL), and a BART-based translation model that maps a program to an utterance. Due to the simplicity of PCFG and pre-trained BART, our generative model can be efficiently learned from existing data at hand. Moreover, explicitly modeling compositions using PCFG leads to better exploration of unseen programs, and thus generates more diverse data. We evaluate our method in both in-domain and out-of-domain settings of text-to-SQL parsing on the standard benchmarks of GeoQuery and Spider, respectively. Our empirical results show that the synthesized data generated from our model can substantially help a semantic parser achieve better compositional and domain generalization.
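The program-side half of the generative model described in the abstract can be illustrated with a minimal sketch: estimate PCFG rule probabilities from observed derivations by relative frequency, then sample new programs top-down. This is not the authors' implementation. The toy grammar, the `DERIVATIONS` data, and the function names `estimate_pcfg` and `sample` are all hypothetical, and the BART-based program-to-utterance step is omitted entirely.

```python
import random
from collections import Counter, defaultdict


def estimate_pcfg(derivations):
    """Estimate rule probabilities by relative frequency of each expansion."""
    counts = defaultdict(Counter)
    for derivation in derivations:
        for lhs, rhs in derivation:
            counts[lhs][rhs] += 1
    return {
        lhs: {rhs: n / sum(rules.values()) for rhs, n in rules.items()}
        for lhs, rules in counts.items()
    }


def sample(pcfg, symbol, rng):
    """Expand `symbol` top-down; symbols without rules are treated as terminals."""
    if symbol not in pcfg:
        return [symbol]
    expansions = list(pcfg[symbol])
    weights = [pcfg[symbol][rhs] for rhs in expansions]
    rhs = rng.choices(expansions, weights=weights, k=1)[0]
    tokens = []
    for child in rhs:
        tokens.extend(sample(pcfg, child, rng))
    return tokens


# Hypothetical toy data: each derivation is a list of (lhs, rhs) rule applications.
DERIVATIONS = [
    [("Q", ("SELECT", "Col", "FROM", "Tab")),
     ("Col", ("name",)), ("Tab", ("city",))],
    [("Q", ("SELECT", "Col", "FROM", "Tab")),
     ("Col", ("population",)), ("Tab", ("state",))],
    [("Q", ("SELECT", "Col", "FROM", "Tab", "WHERE", "Cond")),
     ("Col", ("name",)), ("Tab", ("state",)),
     ("Cond", ("area", ">", "Val")), ("Val", ("1000",))],
]

pcfg = estimate_pcfg(DERIVATIONS)
rng = random.Random(0)  # fixed seed so repeated runs draw the same samples
programs = [" ".join(sample(pcfg, "Q", rng)) for _ in range(5)]
for program in programs:
    print(program)
```

Because each nonterminal is expanded independently, the sampler can recombine rules into programs that never appear in the toy data (for instance, pairing a column with a table it was not observed with), which is the kind of exploration of unseen compositions the abstract attributes to the PCFG. In the full approach, each sampled program would then be passed to the translation model to obtain an utterance.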
Anthology ID:
2021.naacl-main.220
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Editors:
Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
2760–2766
URL:
https://aclanthology.org/2021.naacl-main.220
DOI:
10.18653/v1/2021.naacl-main.220
Cite (ACL):
Bailin Wang, Wenpeng Yin, Xi Victoria Lin, and Caiming Xiong. 2021. Learning to Synthesize Data for Semantic Parsing. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2760–2766, Online. Association for Computational Linguistics.
Cite (Informal):
Learning to Synthesize Data for Semantic Parsing (Wang et al., NAACL 2021)
PDF:
https://aclanthology.org/2021.naacl-main.220.pdf
Video:
https://aclanthology.org/2021.naacl-main.220.mp4
Code:
berlino/tensor2struct-public