< back to Tutorials

SVMS AND STRUCTURED MAX-MARGIN METHODS

Dan Klein and Ben Taskar

This tutorial will be divided into two parts. Part I will be an introduction to SVMs from the ground up, emphasizing how max-margin classifiers relate to maximum entropy classifiers. It will present the arguments for and against max-margin methods in several ways, describe duality in a self-contained way, and discuss what kernels are and aren't. Multiclass SVM formulations and their associated optimization methods will be presented. The goal of this part of the tutorial is to increase awareness of the strengths, weaknesses, and inner workings of SVMs as classifiers for NLP tasks. This part will be most useful (1) to researchers to whom SVMs are currently a magic black box and (2) as a point of departure for part II. Part II will describe structured max-margin methods. The presentation will first be to work out structured solutions in the framework of part I and show how this solution would work (if infeasibly slowly). Then, the max-margin dynamic programming analogs to standard dynamic programs used for calculating expectations will be presented, along with the structured optimization methods. Kernelization and max-margin approaches to combinatorial problems other than sequences and trees will be mentioned, though in less detail. The tutorial will conclude with a discussion of some max-margin learning software we will be making available.

TUTORIAL OUTLINE

  1. SVMs
    • Overview: discriminative feature-based classifiers in NLP
    • Different formulations lead to different classifiers: comparison of SVMs and MaxEnt models
    • Multiclass SVMs: ranking constraints and the associated optimization problem
    • Duality: introduction to duality, dual classifiers, why duality is useful
    • Optimization methods in medium detail, focus on SMO / coordinate ascent
    • Experiments with comparative data analysis
    • Introduction to kernels (but not an extensive coverage of known kernels)
  2. Structured Max-Margin Methods
    • Structured loss functions
    • Exhaustive approach to structured problems
    • Dynamic programming and max-marginal calculations
    • Optimization in structured models
    • Example experiments and results: sequence and tree models
    • Compare / contrast with HMMs, CRFs, PCFGs
    • Kernelization
    • Other max-margin problems: matchings, tours
    • Overview of software resources
DAN KLEIN is an assistant professor in the Computer Science Division at UC Berkeley. He received his bachelor’s degree (summa cum laude) from Cornell University. He then went to Oxford University on a Marshall scholarship, where he earned a master’s degree in linguistics, and finally to Stanford University for his master’s and PhD in computer science. Prof. Klein's research interests include efficient machine learning algorithms for complex NLP tasks, unsupervised learning of linguistic structure, and integrating linguistic ideas into machine learning approaches to NLP. He was the recipent of best paper awards at ACL 2003 and EMNLP 2004.

BEN TASKAR went on to a postdoctoral fellowship at the Computer Science Division, University of California at Berkeley, after receiving his Ph.D. in Computer Science from Stanford University. One of his primary interests is structured models in machine learning, especially in computational linguistics, computer vision and computational biology. Last year, he has co-organized a NIPS workshop on this topic. His work on structured prediction in sequence, tree and other models has received best paper awards at NIPS and EMNLP conferences.


< back to Tutorials