Suffix Trees as Language Models

Casey Redd Kennington, Martin Kay, Annemarie Friedrich


Abstract
Suffix trees are data structures that can be used to index a corpus. In this paper, we explore how some properties of suffix trees naturally provide the functionality of an n-gram language model with variable n. We explain these properties, which we leverage in our Suffix Tree Language Model (STLM) implementation, and show how a suffix tree implicitly contains the data needed for n-gram language modeling. We also discuss the kinds of smoothing techniques appropriate to such a model. We then show that our suffix-tree language model implementation is competitive with the state-of-the-art language modeling toolkit SRILM (Stolcke, 2002) in statistical machine translation experiments.
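To make concrete the claim that a suffix index implicitly holds n-gram counts for every n, here is a minimal sketch. It is not the authors' STLM implementation: it assumes a plain, uncompressed word-level suffix trie rather than a compact suffix tree, it applies no smoothing, and the names TrieNode, build_suffix_trie, ngram_count and mle_prob are hypothetical helpers introduced only for illustration.

```python
class TrieNode:
    """One node of a word-level suffix trie (hypothetical helper)."""

    def __init__(self):
        self.count = 0      # how many corpus positions begin with this path
        self.children = {}  # next token -> TrieNode


def build_suffix_trie(tokens, max_depth=None):
    """Insert every suffix of `tokens`, optionally truncated to max_depth."""
    root = TrieNode()
    for start in range(len(tokens)):
        node = root
        node.count += 1  # the root counts every position (the empty n-gram)
        end = len(tokens) if max_depth is None else min(len(tokens), start + max_depth)
        for tok in tokens[start:end]:
            node = node.children.setdefault(tok, TrieNode())
            node.count += 1
    return root


def ngram_count(root, ngram):
    """Count of `ngram` in the corpus: the count stored at the end of its path."""
    node = root
    for tok in ngram:
        node = node.children.get(tok)
        if node is None:
            return 0
    return node.count


def mle_prob(root, history, word):
    """Unsmoothed estimate of P(word | history) from the children of `history`."""
    node = root
    for tok in history:
        node = node.children.get(tok)
        if node is None:
            return 0.0
    # Normalise over observed continuations so the estimates sum to 1.
    total = sum(child.count for child in node.children.values())
    child = node.children.get(word)
    if child is None or total == 0:
        return 0.0
    return child.count / total


corpus = "the cat sat on the mat the cat ran".split()
root = build_suffix_trie(corpus)
print(ngram_count(root, ["the"]))          # unigram count
print(ngram_count(root, ["the", "cat"]))   # bigram count, same structure
print(mle_prob(root, ["the", "cat"], "ran"))
```

Because the history can be any path in the trie, the same lookup serves unigram, bigram, or longer contexts without fixing n in advance. Unseen n-grams get probability zero here, which is the gap that smoothing is meant to fill.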
Anthology ID:
L12-1378
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
446–453
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/649_Paper.pdf
Cite (ACL):
Casey Redd Kennington, Martin Kay, and Annemarie Friedrich. 2012. Suffix Trees as Language Models. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 446–453, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Suffix Trees as Language Models (Kennington et al., LREC 2012)