Challenges in Data-to-Document Generation

Sam Wiseman, Stuart Shieber, Alexander Rush


Abstract
Recent neural models have shown significant progress on the problem of generating short descriptive texts conditioned on a small number of database records. In this work, we suggest a slightly more difficult data-to-text generation task, and investigate how effective current approaches are on this task. In particular, we introduce a new, large-scale corpus of data records paired with descriptive documents, propose a series of extractive evaluation methods for analyzing performance, and obtain baseline results using current neural generation methods. Experiments show that these models produce fluent text, but fail to convincingly approximate human-generated documents. Moreover, even templated baselines exceed the performance of these neural models on some metrics, though copy- and reconstruction-based extensions lead to noticeable improvements.
Anthology ID:
D17-1239
Volume:
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Note:
Pages:
2253–2263
URL:
https://www.aclweb.org/anthology/D17-1239
DOI:
10.18653/v1/D17-1239
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
https://www.aclweb.org/anthology/D17-1239.pdf
Attachment:
 D17-1239.Attachment.pdf