Word Ordering as Unsupervised Learning Towards Syntactically Plausible Word Representations

Noriki Nishida; Hideki Nakayama


Abstract

The research question we explore in this study is how to obtain syntactically plausible word representations without using human annotations. Our underlying hypothesis is that word ordering tests, or linearizations, are suitable for learning syntactic knowledge about words. To verify this hypothesis, we develop a differentiable model called Word Ordering Network (WON) that explicitly learns to recover the correct word order while implicitly acquiring word embeddings that represent syntactic knowledge. We evaluate the word embeddings produced by the proposed method on downstream syntax-related tasks such as part-of-speech tagging and dependency parsing. The experimental results demonstrate that WON consistently outperforms both order-insensitive and order-sensitive baselines on these tasks.
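The abstract treats word ordering as a source of free supervision: any raw sentence already contains its own correct order, so shuffling it yields a training example with no human annotation. The Python below is a minimal sketch of one plausible way to build such an example, assuming whitespace-tokenized sentences and a seeded random shuffle. It is not the WON architecture or the authors' code in norikinishida/won; the function names `make_ordering_example`, `restore`, and `position_accuracy` are hypothetical, and the "predict each word's original position" target is an illustrative assumption about how the task can be formulated.

```python
import random
from typing import List, Sequence, Tuple


def make_ordering_example(tokens: Sequence[str], seed: int) -> Tuple[List[str], List[int]]:
    """Shuffle a sentence and return (shuffled_tokens, target).

    Hypothetical helper (illustrative assumption, not from the paper):
    target[i] is the original position of shuffled_tokens[i], i.e. the
    label a word-ordering model would be trained to recover.
    """
    rng = random.Random(seed)
    perm = list(range(len(tokens)))
    rng.shuffle(perm)
    shuffled = [tokens[p] for p in perm]
    return shuffled, perm


def restore(shuffled: Sequence[str], positions: Sequence[int]) -> List[str]:
    """Place each shuffled token back at its (gold or predicted) position."""
    out = [""] * len(shuffled)
    for tok, pos in zip(shuffled, positions):
        out[pos] = tok
    return out


def position_accuracy(pred: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of tokens assigned to their correct original position."""
    if not gold:
        return 1.0
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    shuffled, gold = make_ordering_example(sentence, seed=0)
    print(shuffled, gold)
    # Restoring with the gold positions always recovers the original sentence.
    print(restore(shuffled, gold))
```

Because the target is derived from the sentence itself, repeated words such as "the" pose no problem: each occurrence keeps its own position index. A learned model would replace the gold positions with its own predictions, and `position_accuracy` gives one simple way to score them.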

Anthology ID: I17-1008
Volume: Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month: November
Year: 2017
Address: Taipei, Taiwan
Editors: Greg Kondrak, Taro Watanabe
Venue: IJCNLP
Publisher: Asian Federation of Natural Language Processing
Pages: 70–79
URL: https://aclanthology.org/I17-1008
Cite (ACL): Noriki Nishida and Hideki Nakayama. 2017. Word Ordering as Unsupervised Learning Towards Syntactically Plausible Word Representations. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 70–79, Taipei, Taiwan. Asian Federation of Natural Language Processing.
Cite (Informal): Word Ordering as Unsupervised Learning Towards Syntactically Plausible Word Representations (Nishida & Nakayama, IJCNLP 2017)
PDF: https://aclanthology.org/I17-1008.pdf
Code: norikinishida/won
Data: BookCorpus, Penn Treebank
