Incorporating External Knowledge through Pre-training for Natural Language to Code Generation

Frank F. Xu, Zhengbao Jiang, Pengcheng Yin, Bogdan Vasilescu, Graham Neubig


Abstract
Open-domain code generation aims to generate code in a general-purpose programming language (such as Python) from natural language (NL) intents. Motivated by the intuition that developers usually retrieve resources on the web when writing code, we explore the effectiveness of incorporating two varieties of external knowledge into NL-to-code generation: automatically mined NL-code pairs from the online programming QA forum StackOverflow and programming language API documentation. Our evaluations show that combining the two sources with data augmentation and retrieval-based data re-sampling improves the current state-of-the-art by up to 2.2% absolute BLEU score on the code generation testbed CoNaLa. The code and resources are available at https://github.com/neulab/external-knowledge-codegen.
Anthology ID:
2020.acl-main.538
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2020
Address:
Online
Editors:
Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
6045–6052
URL:
https://aclanthology.org/2020.acl-main.538
DOI:
10.18653/v1/2020.acl-main.538
Cite (ACL):
Frank F. Xu, Zhengbao Jiang, Pengcheng Yin, Bogdan Vasilescu, and Graham Neubig. 2020. Incorporating External Knowledge through Pre-training for Natural Language to Code Generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6045–6052, Online. Association for Computational Linguistics.
Cite (Informal):
Incorporating External Knowledge through Pre-training for Natural Language to Code Generation (Xu et al., ACL 2020)
PDF:
https://aclanthology.org/2020.acl-main.538.pdf
Video:
http://slideslive.com/38928800
Code:
neulab/external-knowledge-codegen
Data:
CoNaLa, CoNaLa-Ext