Bayesian Language Model based on Mixture of Segmental Contexts for Spontaneous Utterances with Unexpected Words

Ryu Takeda, Kazunori Komatani


Abstract
This paper describes a Bayesian language model for predicting spontaneous utterances. Speakers sometimes produce unexpected words, such as fillers or hesitations, which cause conventional N-gram models to mispredict words. Our proposed model considers mixtures of possible segmental contexts, that is, a kind of context-word selection. It can reduce the negative effects of unexpected words because it represents the conditional occurrence probability of a word as a weighted mixture over possible segmental contexts. Tuning the mixture weights is the key issue in this approach because the segment patterns become numerous, so we address it with a Bayesian model. The generative process combines the stick-breaking process with the process used in the variable-order Pitman-Yor language model. Experimental evaluations revealed that our model outperformed contiguous N-gram models in terms of perplexity on noisy text including hesitations.
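The idea of a weighted mixture over segmental contexts can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the fixed break fractions (a Bayesian model would treat them as random variables and infer them), the enumeration of contexts as keep/drop masks, and the toy probability table are all assumptions made for illustration.

```python
def stick_breaking_weights(fractions):
    """Turn break fractions v_1..v_K (each in [0, 1]) into K+1 weights summing to 1."""
    weights = []
    remaining = 1.0  # length of the stick that has not been assigned yet
    for v in fractions:
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # the leftover piece goes to the last component
    return weights


def mixture_probability(word, history, masks, weights, cond_prob):
    """P(word | history) as a weighted sum over contexts selected by keep/drop masks."""
    total = 0.0
    for mask, weight in zip(masks, weights):
        # Keep only the history words this mask selects, preserving their order.
        context = tuple(w for w, keep in zip(history, mask) if keep)
        total += weight * cond_prob(word, context)
    return total


# Invented toy probabilities, for illustration only (not from the paper).
TOY_TABLE = {
    ("to", "uh"): {"tokyo": 0.1},
    ("to",): {"tokyo": 0.6},
    ("uh",): {"tokyo": 0.05},
    (): {"tokyo": 0.2},
}


def toy_cond_prob(word, context):
    """Hypothetical lookup standing in for a trained conditional model."""
    return TOY_TABLE.get(context, {}).get(word, 0.0)


history = ("to", "uh")  # "uh" is a filler between "to" and the next word
masks = [(True, True), (True, False), (False, True), (False, False)]
weights = stick_breaking_weights([0.25, 0.5, 0.5])
p = mixture_probability("tokyo", history, masks, weights, toy_cond_prob)
print(weights, p)
```

In this toy setting, a contiguous bigram context consisting only of the filler "uh" assigns "tokyo" a probability of 0.05, whereas the mixture also draws on the context that skips the filler and therefore assigns a higher value.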
Anthology ID:
C16-1016
Volume:
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Yuji Matsumoto, Rashmi Prasad
Venue:
COLING
Publisher:
The COLING 2016 Organizing Committee
Pages:
161–170
URL:
https://aclanthology.org/C16-1016
Cite (ACL):
Ryu Takeda and Kazunori Komatani. 2016. Bayesian Language Model based on Mixture of Segmental Contexts for Spontaneous Utterances with Unexpected Words. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 161–170, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Bayesian Language Model based on Mixture of Segmental Contexts for Spontaneous Utterances with Unexpected Words (Takeda & Komatani, COLING 2016)
PDF:
https://aclanthology.org/C16-1016.pdf