Improving Context Modelling in Multimodal Dialogue Generation

Shubham Agarwal, Ondřej Dušek, Ioannis Konstas, Verena Rieser


Abstract
In this work, we investigate the task of textual response generation in a multimodal task-oriented dialogue system. Our work is based on the recently released Multimodal Dialogue (MMD) dataset (Saha et al., 2017) in the fashion domain. We introduce a multimodal extension to the Hierarchical Recurrent Encoder-Decoder (HRED) model and show that this extension outperforms strong baselines in terms of text-based similarity metrics. We also showcase the shortcomings of current vision and language models by performing an error analysis on our system’s output.
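The multimodal HRED extension described above can be sketched as a two-level encoder in which image features are fused with each utterance encoding before the context-level recurrence. The sketch below is a minimal, illustrative NumPy forward pass: the plain tanh-RNN cells, all dimensions, and the random parameters are simplifying assumptions, not the authors' exact configuration (the paper's model uses learned recurrent cells and pre-extracted image features).

```python
# Minimal sketch of a multimodal HRED forward pass (illustrative only).
# Cell type, dimensions, and parameters are assumptions, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)
d_word, d_utt, d_img, d_ctx = 8, 16, 10, 16

def rnn_step(W, U, x, h):
    """One plain tanh-RNN step (stand-in for a GRU cell)."""
    return np.tanh(W @ x + U @ h)

# Hypothetical parameters for the two encoder levels of the hierarchy.
W_utt, U_utt = rng.normal(size=(d_utt, d_word)), rng.normal(size=(d_utt, d_utt))
W_ctx, U_ctx = rng.normal(size=(d_ctx, d_utt + d_img)), rng.normal(size=(d_ctx, d_ctx))

def encode_utterance(word_vecs):
    """Utterance-level encoder: fold word embeddings into one vector."""
    h = np.zeros(d_utt)
    for w in word_vecs:
        h = rnn_step(W_utt, U_utt, w, h)
    return h

def encode_dialogue(turns):
    """Context-level encoder over (word_vecs, image_feat) turns.
    The multimodal extension concatenates the image features with the
    utterance encoding before each context-RNN step."""
    h = np.zeros(d_ctx)
    for word_vecs, image_feat in turns:
        turn_vec = np.concatenate([encode_utterance(word_vecs), image_feat])
        h = rnn_step(W_ctx, U_ctx, turn_vec, h)
    return h  # conditioning vector for the response decoder

# Toy dialogue: two turns of 3 and 2 words, one image feature per turn.
turns = [
    ([rng.normal(size=d_word) for _ in range(3)], rng.normal(size=d_img)),
    ([rng.normal(size=d_word) for _ in range(2)], rng.normal(size=d_img)),
]
context = encode_dialogue(turns)
print(context.shape)  # (16,)
```

A text decoder would then condition on `context` to generate the response, which is the part evaluated with text-based similarity metrics in the paper.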
Anthology ID:
W18-6514
Volume:
Proceedings of the 11th International Conference on Natural Language Generation
Month:
November
Year:
2018
Address:
Tilburg University, The Netherlands
Editors:
Emiel Krahmer, Albert Gatt, Martijn Goudbeek
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
129–134
URL:
https://aclanthology.org/W18-6514
DOI:
10.18653/v1/W18-6514
Cite (ACL):
Shubham Agarwal, Ondřej Dušek, Ioannis Konstas, and Verena Rieser. 2018. Improving Context Modelling in Multimodal Dialogue Generation. In Proceedings of the 11th International Conference on Natural Language Generation, pages 129–134, Tilburg University, The Netherlands. Association for Computational Linguistics.
Cite (Informal):
Improving Context Modelling in Multimodal Dialogue Generation (Agarwal et al., INLG 2018)
PDF:
https://aclanthology.org/W18-6514.pdf
Code:
 shubhamagarwal92/mmd