Discriminative, Syntactic Language Modeling through Latent SVMs

Colin Cherry, Chris Quirk


Abstract
We construct a discriminative, syntactic language model (LM) by using a latent support vector machine (SVM) to train an unlexicalized parser to judge sentences. That is, the parser is optimized so that correct sentences receive high-scoring trees, while incorrect sentences do not. Because of this alternative objective, the parser can be trained with only a part-of-speech dictionary and binary-labeled sentences. We follow the paradigm of discriminative language modeling with pseudo-negative examples (Okanohara and Tsujii, 2007), and demonstrate significant improvements in distinguishing real sentences from pseudo-negatives. We also investigate the related task of separating machine-translation (MT) outputs from reference translations, again showing large improvements. Finally, we test our LM in MT reranking, and investigate the language-modeling parser in the context of unsupervised parsing.
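The abstract does not spell out the training objective, so the following is only a minimal sketch of the generic latent-SVM idea it alludes to, not the authors' implementation. It assumes (hypothetically) that each sentence comes with a small, pre-enumerated set of candidate parse-tree feature vectors; a real parser would search over trees with dynamic programming instead. The names `latent_score` and `latent_hinge_loss` and the constant `c` are illustrative. A sentence is scored by its best tree, and a hinge loss pushes real sentences (label +1) above the margin and pseudo-negatives (label -1) below it.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(w: Vector, phi: Vector) -> float:
    """Plain dot product between a weight vector and a feature vector."""
    return sum(a * b for a, b in zip(w, phi))


def latent_score(w: Vector, candidate_trees: List[Vector]) -> float:
    """Score a sentence by its highest-scoring latent structure (tree).

    `candidate_trees` is an assumed, explicitly enumerated list of feature
    vectors, one per candidate parse of the sentence.
    """
    return max(dot(w, phi) for phi in candidate_trees)


def latent_hinge_loss(
    w: Vector,
    examples: List[Tuple[List[Vector], int]],
    c: float = 1.0,
) -> float:
    """Regularized hinge loss over binary-labeled sentences.

    Each example is (candidate_trees, label) with label +1 for a real
    sentence and -1 for a pseudo-negative.
    """
    regularizer = 0.5 * dot(w, w)
    hinge = sum(
        max(0.0, 1.0 - label * latent_score(w, trees))
        for trees, label in examples
    )
    return regularizer + c * hinge


if __name__ == "__main__":
    w = [1.0, -1.0]
    real_sentence = ([[2.0, 0.0], [0.0, 1.0]], +1)    # best tree scores 2.0
    pseudo_negative = ([[0.0, 2.0], [0.5, 0.0]], -1)  # best tree scores 0.5
    print(latent_hinge_loss(w, [real_sentence, pseudo_negative]))
```

In this toy setup the real sentence already clears the margin, while the pseudo-negative still has a tree scoring above -1 and therefore contributes to the loss.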
Anthology ID:
2008.amta-papers.4
Volume:
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers
Month:
October 21-25
Year:
2008
Address:
Waikiki, USA
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
Pages:
65–74
URL:
https://aclanthology.org/2008.amta-papers.4
Cite (ACL):
Colin Cherry and Chris Quirk. 2008. Discriminative, Syntactic Language Modeling through Latent SVMs. In Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers, pages 65–74, Waikiki, USA. Association for Machine Translation in the Americas.
Cite (Informal):
Discriminative, Syntactic Language Modeling through Latent SVMs (Cherry & Quirk, AMTA 2008)
PDF:
https://aclanthology.org/2008.amta-papers.4.pdf