Grounded PCFG Induction with Images

Lifeng Jin, William Schuler


Abstract
Recent work in unsupervised parsing has tried to incorporate visual information into learning, but results suggest that these models need linguistic bias to compete against models that rely only on text. This work proposes grammar induction models which use visual information from images for labeled parsing, and achieve state-of-the-art results on grounded grammar induction in several languages. Results indicate that visual information is especially helpful in languages where high-frequency words are more broadly distributed. Comparison between models with and without visual information shows that the grounded models are able to use visual information to propose noun phrases, gather useful information from images for unknown words, and better predict prepositional phrase attachment.
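
The abstract describes conditioning grammar induction on image features. As an illustration only, and not the paper's actual implementation, the sketch below shows one way an image embedding could parameterize the rule probabilities of a PCFG, in the spirit of compound-PCFG-style induction; all class names, dimensions, and network heads here are assumptions for exposition.

import torch
import torch.nn as nn

class GroundedPCFG(nn.Module):
    """Hypothetical sketch: an image embedding conditions PCFG rule scores."""
    def __init__(self, n_nonterm=10, n_preterm=20, vocab=500,
                 img_dim=512, hid=128):
        super().__init__()
        self.n_nt, self.n_pt, self.vocab = n_nonterm, n_preterm, vocab
        n_sym = n_nonterm + n_preterm  # any symbol may appear on a rule's RHS
        # Project the image embedding into a grammar-conditioning vector.
        self.img_proj = nn.Sequential(nn.Linear(img_dim, hid), nn.ReLU())
        # Image-conditioned scores for root, binary, and emission rules.
        self.root_head = nn.Linear(hid, n_nonterm)
        self.binary_head = nn.Linear(hid, n_nonterm * n_sym * n_sym)
        self.emit_head = nn.Linear(hid, n_preterm * vocab)

    def forward(self, img_emb):
        """Return log-probabilities of all PCFG rules for one image."""
        h = self.img_proj(img_emb)
        root = self.root_head(h).log_softmax(-1)                # P(S -> A)
        binary = (self.binary_head(h)
                  .view(self.n_nt, -1).log_softmax(-1))         # P(A -> B C)
        emit = (self.emit_head(h)
                .view(self.n_pt, self.vocab).log_softmax(-1))   # P(T -> w)
        return root, binary, emit

# Usage: derive an image-specific grammar from one image embedding.
model = GroundedPCFG()
root_lp, binary_lp, emit_lp = model(torch.randn(512))

An inside-algorithm pass over the paired caption (omitted here) would then use these image-conditioned rule probabilities to score candidate trees during training.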
Anthology ID:
2020.aacl-main.42
Volume:
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing
Month:
December
Year:
2020
Address:
Suzhou, China
Editors:
Kam-Fai Wong, Kevin Knight, Hua Wu
Venue:
AACL
Publisher:
Association for Computational Linguistics
Pages:
396–408
URL:
https://aclanthology.org/2020.aacl-main.42
Cite (ACL):
Lifeng Jin and William Schuler. 2020. Grounded PCFG Induction with Images. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, pages 396–408, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Grounded PCFG Induction with Images (Jin & Schuler, AACL 2020)
PDF:
https://aclanthology.org/2020.aacl-main.42.pdf
Data:
MS COCO