Learning to Generate Textual Data

Guillaume Bouchard, Pontus Stenetorp, Sebastian Riedel
University College London


Abstract

Learning text-understanding models with millions of parameters requires massive amounts of data. We argue instead that generating data can compensate for the need for large datasets. While defining generic data generators is difficult, we propose to allow these generators to be "weakly" specified, leaving the undetermined coefficients to be learned from data. We derive an efficient algorithm called GeneRe that jointly estimates the parameters of the model and the undetermined sampling coefficients, removing the need for costly cross-validation. We illustrate its benefit by learning to solve math exam questions using a sequence-to-sequence recurrent network.