Semi-Supervised Disfluency Detection

Feng Wang, Wei Chen, Zhen Yang, Qianqian Dong, Shuang Xu, Bo Xu


Abstract
While disfluency detection has achieved notable success in recent years, it still suffers severely from data scarcity. To tackle this problem, we propose a novel semi-supervised approach that can utilize large amounts of unlabelled data. In this work, we propose a light-weight neural network that extracts hidden features based solely on self-attention, without any Recurrent Neural Network (RNN) or Convolutional Neural Network (CNN). In addition, we use the unlabelled corpus to further improve performance. Finally, Generative Adversarial Network (GAN) training is applied to encourage the labelled and unlabelled data to follow similar distributions. Experimental results show that our approach achieves significant improvements over strong baselines.
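The abstract describes a feature extractor built only from self-attention, with no RNN or CNN, that labels each token as fluent or disfluent. The following is a minimal, illustrative sketch of that general idea: a single-head scaled dot-product self-attention layer followed by a per-token sigmoid classifier, written in plain Python. It is not the authors' model. Their layer count, head count, positional encoding, hyper-parameters, use of the unlabelled corpus, and GAN training are not reproduced here. The names `self_attention`, `tag_tokens`, and `init_params` are hypothetical, and the weights and embeddings are random and untrained, so the printed probabilities are meaningless and only show the data flow.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def matvec(W, x):
    # Multiply a (d_out x d_in) matrix by a d_in vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sequence X (n x d).

    Returns the attended features (n x d_k) and the attention matrix (n x n).
    """
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    d_k = len(K[0])
    out, attn = [], []
    for q in Q:
        # Every token attends to every token in the sentence, including itself.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        attn.append(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out, attn


def tag_tokens(X, params):
    # Hypothetical tagger: self-attention features -> per-token P(disfluent).
    H, attn = self_attention(X, params["Wq"], params["Wk"], params["Wv"])
    probs = []
    for h in H:
        z = sum(w * x for w, x in zip(params["w_out"], h)) + params["b_out"]
        probs.append(1.0 / (1.0 + math.exp(-z)))
    return probs, attn


def init_params(d_model, d_k, seed=0):
    # Random, untrained weights; a real system would learn these.
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "Wq": mat(d_k, d_model),
        "Wk": mat(d_k, d_model),
        "Wv": mat(d_k, d_model),
        "w_out": [rng.gauss(0, 0.1) for _ in range(d_k)],
        "b_out": 0.0,
    }


if __name__ == "__main__":
    rng = random.Random(1)
    tokens = ["i", "want", "a", "flight", "to", "boston", "uh", "i", "mean", "to", "denver"]
    # Stand-in word embeddings (random); a real system would use trained ones.
    X = [[rng.gauss(0, 1) for _ in range(8)] for _ in tokens]
    params = init_params(d_model=8, d_k=4)
    probs, _ = tag_tokens(X, params)
    for tok, p in zip(tokens, probs):
        print(f"{tok:>8s}  P(disfluent)={p:.3f}")
```

Because each token's representation is computed from all other tokens in one step, the layer can relate a reparandum such as "to boston" to its later repair "to denver" regardless of distance. This is one common motivation for attention-only encoders in this task.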
Anthology ID:
C18-1299
Volume:
Proceedings of the 27th International Conference on Computational Linguistics
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Emily M. Bender, Leon Derczynski, Pierre Isabelle
Venue:
COLING
Publisher:
Association for Computational Linguistics
Pages:
3529–3538
URL:
https://aclanthology.org/C18-1299
Cite (ACL):
Feng Wang, Wei Chen, Zhen Yang, Qianqian Dong, Shuang Xu, and Bo Xu. 2018. Semi-Supervised Disfluency Detection. In Proceedings of the 27th International Conference on Computational Linguistics, pages 3529–3538, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Semi-Supervised Disfluency Detection (Wang et al., COLING 2018)
PDF:
https://aclanthology.org/C18-1299.pdf