Learning Character-level Compositionality with Visual Features

Frederick Liu, Han Lu, Chieh Lo, Graham Neubig


Abstract
Previous work has modeled the compositionality of words by creating character-level models of meaning, reducing problems of sparsity for rare words. However, in many writing systems compositionality has an effect even at the character level: the meaning of a character is derived from the sum of its parts. In this paper, we model this effect by creating embeddings for characters based on their visual characteristics, creating an image for the character and running it through a convolutional neural network to produce a visual character embedding. Experiments on a text classification task demonstrate that such a model allows for better processing of instances with rare characters in languages such as Chinese, Japanese, and Korean. Additionally, qualitative analyses demonstrate that our proposed model learns to focus on the parts of characters that carry topical content, resulting in embeddings that are coherent in visual space.
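The pipeline the abstract describes (character image in, convolutional features, embedding out) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and not the authors' implementation: the 36x36 bitmap size, the single convolutional layer with 8 random 3x3 filters, the global max pooling, the 16-dimensional output, and the helper names `conv2d_valid` and `visual_embedding`. A hand-drawn cross stands in for a rendered character, since rendering real glyphs would require a font file.

```python
import numpy as np


def conv2d_valid(image, kernels):
    """Cross-correlate a 2-D image with a stack of kernels, no padding."""
    n_kernels, kh, kw = kernels.shape
    h, w = image.shape
    out = np.zeros((n_kernels, h - kh + 1, w - kw + 1))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = image[i:i + kh, j:j + kw]
            # Contract each kernel with the patch: one response per kernel.
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1]))
    return out


def visual_embedding(bitmap, kernels, projection):
    """Map a character bitmap to a vector: conv -> ReLU -> max pool -> linear."""
    feature_maps = np.maximum(conv2d_valid(bitmap, kernels), 0.0)  # ReLU
    pooled = feature_maps.max(axis=(1, 2))  # strongest response per filter
    return projection @ pooled


# Hypothetical, untrained parameters; in practice these would be learned
# jointly with the downstream text classifier.
rng = np.random.default_rng(0)
kernels = rng.normal(size=(8, 3, 3))
projection = rng.normal(size=(16, 8))

# Stand-in for a rendered character: a cross drawn on a 36x36 canvas.
bitmap = np.zeros((36, 36))
bitmap[8:28, 17:19] = 1.0   # vertical stroke
bitmap[17:19, 8:28] = 1.0   # horizontal stroke

embedding = visual_embedding(bitmap, kernels, projection)
print(embedding.shape)
```

Because the embedding depends only on pixel content, two characters that share a visual component produce overlapping filter responses even if one of them is rare in the training text, which is the intuition behind using visual features for rare characters.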
Anthology ID:
P17-1188
Volume:
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2017
Address:
Vancouver, Canada
Editors:
Regina Barzilay, Min-Yen Kan
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
2059–2068
URL:
https://aclanthology.org/P17-1188
DOI:
10.18653/v1/P17-1188
Cite (ACL):
Frederick Liu, Han Lu, Chieh Lo, and Graham Neubig. 2017. Learning Character-level Compositionality with Visual Features. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2059–2068, Vancouver, Canada. Association for Computational Linguistics.
Cite (Informal):
Learning Character-level Compositionality with Visual Features (Liu et al., ACL 2017)
PDF:
https://aclanthology.org/P17-1188.pdf
Code:
frederick0329/Wikipedia_title_dataset
Data:
Wikipedia Title