VQD: Visual Query Detection In Natural Scenes

Manoj Acharya, Karan Jariwala, Christopher Kanan


Abstract
We propose a new visual grounding task called Visual Query Detection (VQD). In VQD, the task is to localize a variable number of objects in an image where the objects are specified in natural language. VQD is related to visual referring expression comprehension, where the task is to localize only one object. We propose the first algorithms for VQD, and we evaluate them on both visual referring expression datasets and our new VQDv1 dataset.
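Because VQD asks a model to return a variable number of boxes per natural-language query (including zero), a detection-style evaluation is the natural fit. The following is a minimal sketch, not the authors' released code or the official VQDv1 metric, of how an example could be represented and how predictions could be scored with greedy IoU matching; the names VQDExample, iou, and precision_recall are illustrative assumptions.

```python
# Minimal sketch of a VQD-style example and a detection-style score.
# All names here are hypothetical; the official VQDv1 evaluation may differ.
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

@dataclass
class VQDExample:
    image_id: str
    query: str                                       # e.g. "show all the dogs"
    boxes: List[Box] = field(default_factory=list)   # zero or more answer boxes

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall(pred: List[Box], gt: List[Box], thresh: float = 0.5):
    """Greedily match each prediction to an unused ground-truth box at an IoU threshold."""
    if not pred and not gt:
        return 1.0, 1.0          # query with no matching objects, answered correctly
    matched = set()
    tp = 0
    for p in pred:
        best, best_iou = None, thresh
        for i, g in enumerate(gt):
            if i not in matched and iou(p, g) >= best_iou:
                best, best_iou = i, iou(p, g)
        if best is not None:
            matched.add(best)
            tp += 1
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gt) if gt else 0.0
    return precision, recall

if __name__ == "__main__":
    ex = VQDExample("coco_000042", "show all the people wearing hats",
                    boxes=[(10, 20, 110, 220), (300, 40, 380, 200)])
    preds = [(12, 22, 108, 218)]
    print(precision_recall(preds, ex.boxes))  # -> (1.0, 0.5)
```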
Anthology ID:
N19-1194
Volume:
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Month:
June
Year:
2019
Address:
Minneapolis, Minnesota
Editors:
Jill Burstein, Christy Doran, Thamar Solorio
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
1955–1961
URL:
https://aclanthology.org/N19-1194
DOI:
10.18653/v1/N19-1194
Cite (ACL):
Manoj Acharya, Karan Jariwala, and Christopher Kanan. 2019. VQD: Visual Query Detection In Natural Scenes. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1955–1961, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal):
VQD: Visual Query Detection In Natural Scenes (Acharya et al., NAACL 2019)
PDF:
https://aclanthology.org/N19-1194.pdf
Data:
VQDv1, MS COCO, RefCOCO, Visual Question Answering