Tutorial 1: Bayesian Nonparametric Structured Models
Percy Liang and Dan Klein
Probabilistic modeling is a dominant approach for both supervised and unsupervised learning tasks in NLP. One constant challenge for models with latent variables is determining the appropriate model complexity, i.e., the question of "how many clusters." While cross-validation can be used to select between a limited number of options, it cannot feasibly be applied in the context of larger hierarchical models, where we must balance complexity in many parts of the model at the same time. Nonparametric "infinite" priors such as Dirichlet processes are powerful tools from the Bayesian statistics literature that address exactly this issue. Such priors, which have seen increasing use in recent NLP work, allow the complexity of the model to adapt to the data and admit more tractable and elegant inference methods than traditional model selection approaches.
In explaining how to do inference in these new models, we try to dispel two myths: first, that Bayesian methods are too slow and cumbersome, and, second, that Bayesian techniques require a whole new set of algorithmic ideas. We depart from the traditional sampling methodology which has dominated past expositions and focus on variational inference, an efficient technique which is a natural extension of EM. This approach allows us to tackle structured models such as HMMs and PCFGs with the benefits of Bayesian nonparametrics while maintaining much of the existing EM machinery so familiar to this community. In addition to our foundational presentation, we both discuss concrete implementation issues and demonstrate the empirical advantages of these methods.
- Properties of Dirichlet priors
- Marginalization and sampling
- Variational inference: from Viterbi EM to EM to variational Bayes
Dirichlet processes
- Limit of finite mixture models
- Stick-breaking construction
- Chinese restaurant process
- Properties of DP: decaying cluster sizes, etc.
- Inference: variational inference, why EM won't work, sampling
- Latent Dirichlet allocation
- Word alignment models
- Hidden Markov models
- Probabilistic context-free grammars
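As a concrete illustration of the stick-breaking construction in the outline above, the following minimal sketch (the function name and truncation level are our own choices, not from the tutorial) draws a truncated sample of DP mixture weights: each weight is a Beta(1, alpha) fraction of the stick length left over by the previous breaks, which is why expected cluster sizes decay.

```python
import numpy as np

def stick_breaking(alpha, num_sticks, rng):
    """Draw a truncated sample of mixture weights from GEM(alpha):
    repeatedly break off a Beta(1, alpha) fraction of the remaining
    unit-length stick."""
    betas = rng.beta(1.0, alpha, size=num_sticks)          # break proportions
    # Length of stick remaining before each break: 1, (1-b1), (1-b1)(1-b2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

rng = np.random.default_rng(0)
weights = stick_breaking(alpha=1.0, num_sticks=20, rng=rng)
# The 20 weights are nonnegative and sum to just under 1; the
# discarded mass corresponds to the truncated infinite tail.
```

Smaller alpha concentrates mass on the first few sticks (few large clusters); larger alpha spreads it out (many small clusters).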
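The Chinese restaurant process from the outline can likewise be sketched in a few lines (the function name and parameter values here are illustrative, not from the tutorial): customers are seated sequentially, joining an occupied table with probability proportional to its size and opening a new table with probability proportional to alpha, which produces the rich-get-richer clustering behavior of the DP.

```python
import random

def crp_seating(num_customers, alpha, rng):
    """Seat customers one at a time: customer n+1 joins table t with
    probability n_t / (n + alpha) and opens a new table with
    probability alpha / (n + alpha)."""
    tables = []                                  # tables[t] = customers at table t
    for _ in range(num_customers):
        # Sampling weights: one per existing table, plus one for a new table.
        weights = tables + [alpha]
        t = rng.choices(range(len(weights)), weights=weights)[0]
        if t == len(tables):
            tables.append(1)                     # open a new table
        else:
            tables[t] += 1                       # join an existing table
    return tables

rng = random.Random(0)
counts = crp_seating(1000, alpha=2.0, rng=rng)
# Every customer is seated, and table sizes are highly skewed:
# the expected number of tables grows only logarithmically in n.
```

Because the seating probabilities depend only on the current table counts, the customers are exchangeable, which is what makes Gibbs sampling in DP mixtures tractable.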
Slides are available online as a PDF document.
Percy Liang is a Ph.D. student in computer science at UC Berkeley. He has a BS in math and a BS/MS in computer science from MIT. His research interests include probabilistic modeling for semi-supervised learning in NLP, especially using Bayesian nonparametrics, and approximate inference algorithms for such models. He holds an NSF Graduate Fellowship and a National Defense Science and Engineering Graduate Fellowship.
Dan Klein is an assistant professor of computer science at the University of California, Berkeley (PhD Stanford, MS Oxford, BA Cornell). Professor Klein's research focuses on statistical natural language processing, including unsupervised methods, syntactic parsing, and machine translation. His academic honors include a British Marshall Fellowship, an inaugural Microsoft New Faculty Fellowship, and best paper awards at the ACL, NAACL, and EMNLP conferences.