An exciting workshop on Integrating Language and Vision will be held in coordination with the NIPS 2011 Conference in Granada, Spain.
Workshop date: December 16 or 17, 2011. Location: Sierra Nevada, Spain.
See https://sites.google.com/site/nips2011languagevisionworkshop/home for details.
A growing number of researchers in computer vision have started to explore how language accompanying images and video can be used to aid interpretation and retrieval, as well as train object and activity recognizers. Simultaneously, an increasing number of computational linguists have begun to investigate how visual information can be used to aid language learning and interpretation, and to ground the meaning of words and sentences in perception. However, there has been very little direct interaction between researchers in these two distinct disciplines.
Traditional machine learning for both computer vision and NLP requires manually annotating images, video, text, or speech with detailed labels, parse-trees, segmentations, etc. Methods that integrate language and vision hold the promise of greatly reducing such manual supervision by using naturally co-occurring text and images/video to mutually supervise each other.
There is a wide range of important real-world applications that require integrating vision and language, including but not limited to: image and video retrieval, human-robot interaction, medical image processing, human-computer interaction in virtual worlds, and computer graphics generation.
The workshop presentations will consist mainly of short invited talks; the currently scheduled speakers are:
* Tamara Berg, Stony Brook University
* Dieter Fox, University of Washington
* Julia Hockenmaier, UIUC
* Mirella Lapata, University of Edinburgh
* Percy Liang, Stanford University
* Kate Saenko & Yangqing Jia, UC Berkeley
* Jeff Siskind, Purdue University
* Josh Tenenbaum, MIT
There will also be invited panel discussions, and we invite the submission of abstracts for posters to be presented at the workshop. Topics of interest include, but are not limited to:
* joint modeling of text and images/video
* text-to-image and image-to-text generation
* search in multimodal collections
* situated dialogue
* interactive systems/robotics
* meaning representations, common sense knowledge bases
* grounded language learning
* semantic image parsing
* visual ontologies
Selected abstracts will be presented as posters during the workshop sessions, along with 1-minute spotlight talks. A limited number of student travel grants is also available (see the information on applying).
Submission deadline: 23:59 EST, October 10, 2011.
Acceptance/grant award notification: October 24, 2011.
Submissions should be made by completing this form.
The Workshop Organizers:
* Trevor Darrell, UC Berkeley
* Raymond J. Mooney, UT Austin
* Kate Saenko, UC Berkeley & Harvard