Try to Substitute: An Unsupervised Chinese Word Sense Disambiguation Method Based on HowNet

Bairu Hou, Fanchao Qi, Yuan Zang, Xurui Zhang, Zhiyuan Liu, Maosong Sun


Abstract
Word sense disambiguation (WSD) is a fundamental natural language processing task. Unsupervised knowledge-based WSD relies only on a lexical knowledge base as the sense inventory, so it is more widely applicable than supervised WSD, which requires large amounts of sense-annotated data. HowNet is the most widely used lexical knowledge base in Chinese WSD. Because of HowNet's uniqueness, however, most existing unsupervised WSD methods cannot be applied to HowNet-based WSD, and the methods tailored to it have not achieved satisfactory results. In this paper, we propose a new unsupervised method for HowNet-based Chinese WSD that exploits the masked language model task of pre-trained language models. Because the existing evaluation dataset is small and out of date, we build a new, larger HowNet-based WSD dataset for our experiments. Experimental results demonstrate that our model achieves significantly better performance than all the baseline methods. All code and data for this paper are available at https://github.com/thunlp/SememeWSD.
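The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a substitution-based approach built on a masked language model might look, not the authors' implementation. It assumes each candidate sense comes with a few substitute words, masks the target word, and picks the sense whose substitutes the model finds most plausible at the masked position. The names `disambiguate`, `toy_mlm_prob`, and the example senses and substitutes are all hypothetical; `toy_mlm_prob` is a hand-written stand-in for a real pre-trained model.

```python
from typing import Callable, Dict, List

# Hypothetical scorer type: (masked tokens, mask index, candidate word) -> probability.
MlmScorer = Callable[[List[str], int, str], float]


def disambiguate(
    tokens: List[str],
    target_index: int,
    sense_substitutes: Dict[str, List[str]],
    mlm_prob: MlmScorer,
) -> str:
    """Return the sense whose substitute words best fit the masked position."""
    masked = list(tokens)
    masked[target_index] = "[MASK]"  # hide the ambiguous word

    scores: Dict[str, float] = {}
    for sense, substitutes in sense_substitutes.items():
        if not substitutes:
            continue  # a sense without substitutes cannot be scored
        total = sum(mlm_prob(masked, target_index, w) for w in substitutes)
        scores[sense] = total / len(substitutes)  # average over substitutes

    if not scores:
        raise ValueError("no sense has any substitute words")
    return max(scores, key=scores.get)


def toy_mlm_prob(masked_tokens: List[str], index: int, candidate: str) -> float:
    """Toy stand-in for a masked language model (assumption, not a real model)."""
    tech_context = "手机" in masked_tokens  # "phone" appears in the sentence
    tech_words = {"微软", "谷歌"}  # Microsoft, Google
    if candidate in tech_words:
        return 0.3 if tech_context else 0.01
    return 0.01 if tech_context else 0.3


# Hypothetical sense inventory for the ambiguous word 苹果 (apple / Apple).
SENSES = {
    "fruit": ["香蕉", "梨"],      # banana, pear
    "company": ["微软", "谷歌"],  # Microsoft, Google
}

sentence = ["我", "买", "了", "一", "部", "苹果", "手机"]
print(disambiguate(sentence, 5, SENSES, toy_mlm_prob))
```

In practice the toy scorer would be replaced by probabilities from a pre-trained masked language model, and the substitute words would be derived from the knowledge base rather than written by hand.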
Anthology ID:
2020.coling-main.155
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
1752–1757
URL:
https://aclanthology.org/2020.coling-main.155
DOI:
10.18653/v1/2020.coling-main.155
Cite (ACL):
Bairu Hou, Fanchao Qi, Yuan Zang, Xurui Zhang, Zhiyuan Liu, and Maosong Sun. 2020. Try to Substitute: An Unsupervised Chinese Word Sense Disambiguation Method Based on HowNet. In Proceedings of the 28th International Conference on Computational Linguistics, pages 1752–1757, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
Try to Substitute: An Unsupervised Chinese Word Sense Disambiguation Method Based on HowNet (Hou et al., COLING 2020)
PDF:
https://aclanthology.org/2020.coling-main.155.pdf
Code:
thunlp/sememewsd