Subcharacter Information in Japanese Embeddings: When Is It Worth It?

Marzena Karpinska, Bofang Li, Anna Rogers, Aleksandr Drozd


Abstract
Languages with logographic writing systems present a difficulty for traditional character-level models. Leveraging subcharacter information was recently shown to be beneficial for a number of intrinsic and extrinsic tasks in Chinese. We examine whether the same strategies can be applied to Japanese, and contribute a new analogy dataset for this language.
Anthology ID:
W18-2905
Volume:
Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP
Month:
July
Year:
2018
Address:
Melbourne, Australia
Editors:
Georgiana Dinu, Miguel Ballesteros, Avirup Sil, Sam Bowman, Wael Hamza, Anders Søgaard, Tahira Naseem, Yoav Goldberg
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
28–37
URL:
https://aclanthology.org/W18-2905
DOI:
10.18653/v1/W18-2905
Cite (ACL):
Marzena Karpinska, Bofang Li, Anna Rogers, and Aleksandr Drozd. 2018. Subcharacter Information in Japanese Embeddings: When Is It Worth It? In Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP, pages 28–37, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal):
Subcharacter Information in Japanese Embeddings: When Is It Worth It? (Karpinska et al., ACL 2018)
PDF:
https://aclanthology.org/W18-2905.pdf