Reading Between the Lines: Exploring Infilling in Visual Narratives

Khyathi Raghavi Chandu, Ruo-Ping Dong, Alan W Black


Abstract
Generating long-form narratives such as stories and procedures from multiple modalities has been a long-standing dream for artificial intelligence. Such narratives often carry crucial subtext that is derived from the surrounding contexts. General seq2seq training methods leave models ill-equipped to bridge the gap between these neighbouring contexts. In this paper, we tackle this problem with infilling techniques that predict missing steps in a narrative while generating textual descriptions from a sequence of images. We also present a new large-scale visual procedure telling (ViPT) dataset, with a total of 46,200 procedures and around 340k paired images and textual descriptions, that is rich in such contextual dependencies. Generating steps with the infilling technique proves effective on visual procedures, yielding more coherent texts. We show a METEOR score of 27.51 on procedures, which is higher than the state of the art on visual storytelling. We also demonstrate the effects of interposing new text with missing images during inference. The code and the dataset will be publicly available at https://visual-narratives.github.io/Visual-Narratives/.
Anthology ID:
2020.emnlp-main.93
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Editors:
Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
1220–1229
URL:
https://aclanthology.org/2020.emnlp-main.93
DOI:
10.18653/v1/2020.emnlp-main.93
Cite (ACL):
Khyathi Raghavi Chandu, Ruo-Ping Dong, and Alan W Black. 2020. Reading Between the Lines: Exploring Infilling in Visual Narratives. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1220–1229, Online. Association for Computational Linguistics.
Cite (Informal):
Reading Between the Lines: Exploring Infilling in Visual Narratives (Chandu et al., EMNLP 2020)
PDF:
https://aclanthology.org/2020.emnlp-main.93.pdf
Video:
https://slideslive.com/38939186
Data:
VIST