Online Temporal Language Model Adaptation for a Thai Broadcast News Transcription System

Kwanchiva Saykham, Ananlada Chotimongkol, Chai Wutiwiwatchai


Abstract
This paper investigates the effectiveness of online temporal language model adaptation when applied to a Thai broadcast news transcription task. Our adaptation scheme works as follow: first an initial language model is trained with broadcast news transcription available during the development period. Then the language model is adapted over time with more recent broadcast news transcription and online news articles available during deployment especially the data from the same time period as the broadcast news speech being recognized. We found that the data that are closer in time are more similar in terms of perplexity and are more suitable for language model adaptation. The LMs that are adapted over time with more recent news data are better, both in terms of perplexity and WER, than the static LM trained from only the initial set of broadcast news data. Adaptation data from broadcast news transcription improved perplexity by 38.3% and WER by 7.1% relatively. Though, online news articles achieved less improvement, it is still a useful resource as it can be obtained automatically. Better data pre-processing techniques and data selection techniques based on text similarity could be applied to the news articles to obtain further improvement from this promising result.
Anthology ID:
L10-1169
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/249_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Kwanchiva Saykham, Ananlada Chotimongkol, and Chai Wutiwiwatchai. 2010. Online Temporal Language Model Adaptation for a Thai Broadcast News Transcription System. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
Online Temporal Language Model Adaptation for a Thai Broadcast News Transcription System (Saykham et al., LREC 2010)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/249_Paper.pdf