Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding

Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, Marcus Rohrbach


Anthology ID:
D16-1044
Volume:
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2016
Address:
Austin, Texas
Editors:
Jian Su, Kevin Duh, Xavier Carreras
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
457–468
URL:
https://aclanthology.org/D16-1044
DOI:
10.18653/v1/D16-1044
Cite (ACL):
Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, and Marcus Rohrbach. 2016. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 457–468, Austin, Texas. Association for Computational Linguistics.
Cite (Informal):
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding (Fukui et al., EMNLP 2016)
PDF:
https://aclanthology.org/D16-1044.pdf
Code:
akirafukui/vqa-mcb (plus additional community code)
Data:
Flickr30K Entities, Flickr30k, MS COCO, Visual Genome, Visual Question Answering, Visual Question Answering v2.0, Visual7W