Integer Linear Programming in NLP – Constrained Conditional Models
Ming-Wei Chang, Nicholas Rizzolo, and Dan Roth
University of Illinois at Urbana-Champaign

Overview

Making decisions in natural language processing often involves assigning values to sets of interdependent variables, where the expressive dependency structure can influence, or even dictate, which assignments are possible. Structured learning problems such as semantic role labeling provide one such example, but the setting is broader and includes a range of problems such as named entity and relation recognition and coreference resolution. The setting is also appropriate for cases in which a solution must make use of multiple components (possibly pre-designed or pre-learned), as in summarization, textual entailment and question answering. In all these cases, it is natural to formulate the decision problem as a constrained optimization problem, with an objective function composed of learned models, subject to domain- or problem-specific constraints.

Constrained Conditional Models (also known as the Integer Linear Programming formulation of NLP problems) form a learning and inference framework that augments the learning of conditional (probabilistic or discriminative) models with declarative constraints (written, for example, in a first-order representation) as a way to support decisions in an expressive output space while maintaining the modularity and tractability of training and inference. In most applications of this framework in NLP, following [Roth & Yih, CoNLL’04], Integer Linear Programming (ILP) has been used as the inference framework, although other algorithms can be used for that purpose.
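
Concretely, inference in a CCM selects the best global assignment by trading the learned models' scores off against penalties for violated constraints. In the soft-constraint form commonly written in the CCM literature, the decision is

    \hat{y} = \arg\max_{y \in \mathcal{Y}} \mathbf{w}^{\top} \phi(x, y) - \sum_{k} \rho_k \, d\big(y, \mathbf{1}_{C_k(x)}\big)

where \phi(x, y) is a feature representation of input x and output y, \mathbf{w} holds the learned weights, each C_k is a declarative constraint, d measures the degree to which y violates C_k, and \rho_k is the penalty for that violation (taking \rho_k to infinity recovers a hard constraint). When the features and the violation measures are linear functions of binary indicator variables, this maximization is exactly an Integer Linear Program.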

This framework, with and without Integer Linear Programming as its inference engine, has recently attracted much attention within the NLP community, with multiple papers in all the recent major conferences and a related workshop at NAACL’09. Formulating problems as constrained optimization over the output of learned models has several advantages. It allows one to focus on modeling the problem by providing the opportunity to incorporate problem-specific global constraints using a first-order language, thus freeing the developer from much of the low-level feature engineering, and it guarantees exact inference. It also provides the freedom to decouple the stage of model generation (learning) from the constrained inference stage, often simplifying the learning stage as well as the engineering problem of building an NLP system, while improving the quality of the solutions.

These advantages, and the availability of off-the-shelf solvers, have led to a large variety of natural language processing tasks being formulated within this framework, including semantic role labeling, syntactic parsing, coreference resolution, summarization, transliteration and joint information extraction.

The goal of this tutorial is to introduce the framework of Constrained Conditional Models (CCMs) to the broader ACL community, motivate it as a generic framework for learning and inference in global NLP decision problems, present some of the key theoretical and practical issues involved in using CCMs, and survey some of its existing applications as a way to promote further development of the framework and additional applications. The tutorial will thus be useful to senior and junior researchers interested in global decision problems in NLP, providing a concise overview of recent perspectives and research results.

Tutorial Outline

After briefly motivating and introducing the general framework, the main part of the tutorial is a methodological presentation of some of the key computational issues studied within CCMs, which we will present through case studies published in the NLP literature. In the last part of the tutorial, we will discuss engineering issues that arise in using CCMs and present some tools that facilitate developing CCM models.

  1. Motivation and Task Definition [30 min]
     We will motivate the framework of Constrained Conditional Models and exemplify it using the example of Semantic Role Labeling.

  2. Examples of Existing Applications [30 min]
     We will present in detail several applications that have made use of CCMs, including coreference resolution, sentence compression and information extraction, and use these to explain several of the key advantages the framework offers. In this context, we will also discuss several ways in which constraints can be introduced into an application.

  3. Training Paradigms [30 min]
     The objective function used by CCMs can be decomposed and learned in several ways, ranging from jointly training the model along with the constraints to completely decoupling the learning and inference stages. We will present the advantages and disadvantages of the different training paradigms, provide theoretical and experimental understanding, and compare CCMs to other approaches studied in the literature.

  4. Inference Methods and Constraints [30 min]
     We will present and discuss several possibilities for modeling inference in CCMs, from Integer Linear Programming to search techniques. We will also discuss the use of hard and soft constraints and present ways of modeling constraints; a small worked ILP example appears after this outline.

  5. Introducing Background Knowledge via CCMs [30 min]
     We will look at ways in which Constrained Conditional Models (CCMs) can be used to augment probabilistic models with declarative constraints in order to support decisions in an expressive output space, and at how declarative constraints can aid supervised and semi-supervised training.

  6. Developing CCM Applications [30 min]
     We will present a modeling language that facilitates developing applications within the CCM framework and present some "templates" for possible applications.
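
To make the inference step concrete, the sketch below formulates a toy SRL-style labeling decision as a small Integer Linear Program using the open-source PuLP library. The candidate set, the scores (standing in for a learned local model's outputs) and the "each core argument label is used at most once" constraint are illustrative assumptions, not code from the tutorial or from any of the systems cited above.

    # Toy CCM-style inference: pick one label per argument candidate so that
    # the total local-model score is maximized, subject to declarative
    # constraints. Requires the PuLP package (pip install pulp).
    import pulp

    candidates = ["arg1", "arg2", "arg3"]   # hypothetical argument candidates
    labels = ["A0", "A1", "NONE"]           # hypothetical label set

    # Hypothetical scores from a learned local classifier.
    score = {
        ("arg1", "A0"): 0.9, ("arg1", "A1"): 0.3, ("arg1", "NONE"): 0.1,
        ("arg2", "A0"): 0.8, ("arg2", "A1"): 0.6, ("arg2", "NONE"): 0.2,
        ("arg3", "A0"): 0.2, ("arg3", "A1"): 0.5, ("arg3", "NONE"): 0.4,
    }

    prob = pulp.LpProblem("ccm_inference", pulp.LpMaximize)

    # One binary indicator variable per (candidate, label) decision.
    x = pulp.LpVariable.dicts("x", (candidates, labels), cat="Binary")

    # Objective: total score of the selected assignments.
    prob += pulp.lpSum(score[c, l] * x[c][l] for c in candidates for l in labels)

    # Structural constraint: every candidate receives exactly one label.
    for c in candidates:
        prob += pulp.lpSum(x[c][l] for l in labels) == 1

    # Declarative (hard) constraint: each core argument label is assigned
    # to at most one candidate.
    for l in ("A0", "A1"):
        prob += pulp.lpSum(x[c][l] for c in candidates) <= 1

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    for c in candidates:
        print(c, "->", next(l for l in labels if x[c][l].value() > 0.5))

A soft version of such a constraint is obtained by replacing it with a penalty term in the objective, which is how the \rho_k weights in the objective given earlier are realized in practice.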

Tutorial Instructors

Ming-Wei Chang
University of Illinois at Urbana-Champaign
mchang21@uiuc.edu

Ming-Wei Chang is a PhD candidate at the University of Illinois at Urbana-Champaign.

He has done work on machine learning in natural language processing and information extraction and has published a number of papers at several international conferences, including "Learning and Inference with Constraints" (AAAI’08), "Guiding Semi-Supervision with Constraint-Driven Learning" (ACL’07) and "Unsupervised Constraint Driven Learning for Transliteration Discovery" (NAACL’09). He co-presented a tutorial on CCMs at EACL’09.

Nicholas Rizzolo
University of Illinois at Urbana-Champaign
rizzolo@illinois.edu

Nicholas Rizzolo is a PhD candidate at the University of Illinois at Urbana-Champaign.

He has done work on machine learning in natural language processing and is the principal developer of Learning Based Java (LBJ), a modeling language for Constrained Conditional Models. He has published a number of papers on these topics, including "Learning and Inference with Constraints" (AAAI’08) and "Modeling Discriminative Global Inference" (ICSC’07).

Dan Roth
University of Illinois at Urbana-Champaign
danr@cs.uiuc.edu

Dan Roth is a Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign and the Beckman Institute for Advanced Science and Technology (UIUC), and a Willett Faculty Scholar of the College of Engineering. He has published broadly in machine learning, natural language processing, and knowledge representation and reasoning, and has received several best paper and research awards. He has developed several machine learning based natural language processing systems, including an award-winning semantic parser, and has given invited talks at several international conferences as well as several tutorials on machine learning for NLP. Dan Roth wrote the first paper on Constrained Conditional Models, along with his student Scott Yih, presented at CoNLL’04, and has since worked on learning and inference issues within this framework as well as on applying it to several NLP problems, including semantic role labeling, information extraction and transliteration. He has given several invited talks addressing aspects of this framework.