Learning to Generate Textual Data

Guillaume Bouchard, Pontus Stenetorp, Sebastian Riedel
University College London


Abstract

Learning text-understanding models with millions of parameters requires massive amounts of data. We argue instead that generating data can compensate for the need for large datasets. While defining generic data generators is difficult, we propose to allow these generators to be "weakly" specified, leaving the undetermined coefficients to be learned from data. We derive an efficient algorithm called GeneRe that jointly estimates the parameters of the model and the undetermined sampling coefficients, removing the need for costly cross-validation. We illustrate its benefit by learning to solve math exam questions using a sequence-to-sequence recurrent network.