Planning and Generating Natural and Diverse Disfluent Texts as Augmentation for Disfluency Detection

Jingfeng Yang, Diyi Yang, Zhaoran Ma


Abstract
Existing approaches to disfluency detection heavily depend on human-annotated data. Numbers of data augmentation methods have been proposed to alleviate the dependence on labeled data. However, current augmentation approaches such as random insertion or repetition fail to resemble training corpus well and usually resulted in unnatural and limited types of disfluencies. In this work, we propose a simple Planner-Generator based disfluency generation model to generate natural and diverse disfluent texts as augmented data, where the Planner decides on where to insert disfluent segments and the Generator follows the prediction to generate corresponding disfluent segments. We further utilize this augmented data for pretraining and leverage it for the task of disfluency detection. Experiments demonstrated that our two-stage disfluency generation model outperforms existing baselines; those disfluent sentences generated significantly aided the task of disfluency detection and led to state-of-the-art performance on Switchboard corpus.
Anthology ID:
2020.emnlp-main.113
Volume:
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Month:
November
Year:
2020
Address:
Online
Editors:
Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
1450–1460
Language:
URL:
https://aclanthology.org/2020.emnlp-main.113
DOI:
10.18653/v1/2020.emnlp-main.113
Bibkey:
Cite (ACL):
Jingfeng Yang, Diyi Yang, and Zhaoran Ma. 2020. Planning and Generating Natural and Diverse Disfluent Texts as Augmentation for Disfluency Detection. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1450–1460, Online. Association for Computational Linguistics.
Cite (Informal):
Planning and Generating Natural and Diverse Disfluent Texts as Augmentation for Disfluency Detection (Yang et al., EMNLP 2020)
Copy Citation:
PDF:
https://aclanthology.org/2020.emnlp-main.113.pdf
Video:
 https://slideslive.com/38938957