Empirical Modeling of Semantic Equivalence and Entailment



Workshop at the Annual Meeting of

the Association for Computational Linguistics (ACL 2005)


Ann Arbor, Michigan

June 30, 2005




** Submission Deadline: April 20, 2005 **



Many natural language processing applications require the ability to recognize when two text segments, however superficially distinct, overlap semantically. Question Answering (QA), Information Extraction (IE), command-and-control, and multi-document summarization are examples of applications that need precise information about the relationship between different text segments. The concepts of entailment and paraphrase are useful in characterizing these relationships. For instance, does the meaning of one text entail all or part of the other, as in the following example?


I bought a science fiction novel

I bought a book


Or are the two texts so close in meaning that they can be considered paraphrases, linked by many bidirectional entailments?


On its way to an extended mission at Saturn, the Cassini probe on Friday makes its closest rendezvous with Saturn's dark moon Phoebe.

The Cassini spacecraft, which is en route to Saturn, is about to make a close pass of the ringed planet's mysterious moon Phoebe.


Quantifying semantic overlap is a fundamental challenge that encompasses issues of lexical choice, syntactic alternation, and reference/discourse structure. The last few years have seen a surge in interest in modeling techniques aimed at measuring semantic equivalence and entailment, with work on paraphrase acquisition/generation, WordNet-based expansion, distributional similarity, supervised learning of semantic variability in information extraction, and the identification of patterns in template-based QA. While different applications face similar underlying semantic problems, these problems are typically addressed in an application-specific manner. In the absence of a generic evaluation framework, it is difficult to compare semantic methods that were developed for different applications. One key goal of this workshop will be to stimulate discussion around the issue of developing common datasets and evaluation strategies.


More generally, our aim is to bring together people working on empirical, application-independent approaches to the practical problems of semantic inference. We welcome papers describing original work on computational approaches to modeling the problems of semantic equivalence and entailment, and encourage submissions on the following (non-exclusive) topics:


- Probabilistic modeling of meaning equivalence/entailment between arbitrary text segments

- Methods for generating textual entailments/paraphrases from novel inputs

- String-based vs. linguistically informed approaches to measuring meaning overlap

- Developing training/test corpora: novel sources of data and strategies for automating data collection

- Human inter-annotator agreement on annotation tasks: Can humans reliably tag the kinds of inferences that are needed for real applications? What information can humans reliably tag?

- Automated evaluation metrics for meaning equivalence/paraphrase

- Empirical investigations into the types of entailment/equivalence inferences needed for particular applications

- Methods for determining which of many entailments between two text segments are relevant for specific applications

- Modeling lexical-level entailment relationships intended to serve as components of entailment modeling for larger texts (as opposed to more general measures of similarity)

- Specific applications that exploit general measures of semantic overlap

- Extension of MT techniques to problems of monolingual semantic equivalence/entailment


The workshop will be anchored by two panel discussions, the first exploring how the problem of semantic overlap has been successfully handled in several applications (question answering, information retrieval, etc.) and the second aimed at developing a shared evaluation task focused on this problem.




Paper submission deadline: April 20, 2005

Notification of acceptance: May 10, 2005

Camera ready copy: May 20, 2005

Workshop date: June 30, 2005




Submitted papers should be prepared in PDF format (with all fonts embedded) or Microsoft Word .doc format, and should be no longer than 6 pages, following the ACL style. More detailed information about the format of submissions can be found here:

The language of the workshop is English. Both submission and review processes will be handled electronically. Submissions should be sent as an attachment to the following email address:

billdol AT microsoft DOT com

All submissions should be anonymized for blind review. All accepted papers will be presented in oral sessions of the workshop and collected in the printed proceedings. 





Two datasets are being released in conjunction with this workshop, in order to stimulate submissions and thinking around this topic. While we hope that these will be useful for training, evaluation, and analysis, authors are invited to use whatever resources/approaches are at their disposal. 


- The PASCAL Recognising Textual Entailment Challenge Corpus: 1,000 sentence pairs that have been human-annotated with directional entailments.

- The Microsoft Research Paraphrase Corpus: 5,801 likely sentential paraphrase pairs gathered automatically from topically clustered news articles. Multiple human raters examined each pair, classifying more than 3,900 as close enough in meaning to be considered equivalent. This is the first time this corpus has been made available.



Bill Dolan (Microsoft Research)

Ido Dagan (Bar Ilan University)





For questions, comments, etc., please send email to





Srinivas Bangalore (AT&T Research)

Regina Barzilay (MIT)

Chris Brockett (Microsoft Research)

Pascale Fung (Hong Kong University of Science and Technology)

Oren Glickman (Bar Ilan University)

Cyril Goutte (Xerox Research Centre Europe)

Ed Hovy (ISI)

Kentaro Inui (Nara Institute of Science and Technology)

Dekang Lin (University of Alberta)

Daniel Marcu (ISI)

Kathy McKeown (Columbia University)

Dan Moldovan (University of Texas at Dallas)

Chris Quirk (Microsoft Research)

Maarten de Rijke (University of Amsterdam)

Hinrich Schütze (University of Stuttgart)

Satoshi Sekine (New York University)

Peter Turney (National Research Council of Canada)





Additional funding for this workshop has been provided by PASCAL and Microsoft Research.