Multimodal Neural Graph Memory Networks for Visual Question Answering

Mahmoud Khademi


Abstract
We introduce a new neural network architecture, Multimodal Neural Graph Memory Networks (MN-GMN), for visual question answering. The MN-GMN uses a graph structure with different region features as node attributes and applies a recently proposed, powerful graph neural network model, the Graph Network (GN), to reason about objects and their interactions in an image. The input module of the MN-GMN generates a set of visual features plus a set of encoded region-grounded captions (RGCs) for the image. The RGCs capture object attributes and their relationships. Two GNs are constructed from the input module using the visual features and encoded RGCs. Each node of the GNs iteratively computes a question-guided contextualized representation of the visual/textual information assigned to it. Then, to combine the information from both GNs, the nodes write the updated representations to an external spatial memory. The final states of the memory cells are fed into an answer module to predict an answer. Experiments show that MN-GMN rivals the state-of-the-art models on the Visual7W, VQA-v2.0, and CLEVR datasets.
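The data flow described in the abstract (two graphs updated under question guidance, then merged through a shared spatial memory) can be illustrated with a toy sketch. This is not the paper's model: it has no learned parameters, the names `gn_step`, `write_to_memory`, and `run` are hypothetical, the update rule (tanh of node state plus mean neighbour state plus question vector) is an assumption chosen for brevity, and assigning each node to exactly one memory cell is likewise an assumption.

```python
import math

def gn_step(nodes, edges, question):
    """One toy message-passing round (assumed rule, no learned weights):
    each node mixes its own state, the mean of its incoming neighbours,
    and the question vector."""
    dim = len(question)
    updated = []
    for i, state in enumerate(nodes):
        # Read from the old states so the update is synchronous.
        incoming = [nodes[s] for s, r in edges if r == i]
        if incoming:
            agg = [sum(v[k] for v in incoming) / len(incoming) for k in range(dim)]
        else:
            agg = [0.0] * dim
        updated.append([math.tanh(state[k] + agg[k] + question[k]) for k in range(dim)])
    return updated

def write_to_memory(node_states, node_cells, num_cells, dim):
    """Average the states of all nodes assigned to each memory cell.
    Cells that receive no node stay at zero."""
    sums = [[0.0] * dim for _ in range(num_cells)]
    counts = [0] * num_cells
    for state, cell in zip(node_states, node_cells):
        counts[cell] += 1
        for k in range(dim):
            sums[cell][k] += state[k]
    return [[x / c for x in row] if c else row for row, c in zip(sums, counts)]

def run(visual, textual, edges_v, edges_t, cells_v, cells_t, question, num_cells, steps=2):
    """Update the visual and textual graphs separately, then merge both
    sets of node states into one shared memory."""
    for _ in range(steps):
        visual = gn_step(visual, edges_v, question)
        textual = gn_step(textual, edges_t, question)
    return write_to_memory(visual + textual, cells_v + cells_t, num_cells, len(question))
```

In this sketch the memory is the only place where the visual and textual graphs meet, which mirrors the role the abstract gives to the external spatial memory; an answer module would then read the returned cell states.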
Anthology ID:
2020.acl-main.643
Volume:
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2020
Address:
Online
Editors:
Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel Tetreault
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
7177–7188
URL:
https://aclanthology.org/2020.acl-main.643
DOI:
10.18653/v1/2020.acl-main.643
Cite (ACL):
Mahmoud Khademi. 2020. Multimodal Neural Graph Memory Networks for Visual Question Answering. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7177–7188, Online. Association for Computational Linguistics.
Cite (Informal):
Multimodal Neural Graph Memory Networks for Visual Question Answering (Khademi, ACL 2020)
PDF:
https://aclanthology.org/2020.acl-main.643.pdf
Video:
http://slideslive.com/38929347