NAACL-HLT 2012 Workshop on Evaluation Metrics and System Comparison for Automatic Summarization

Event Notification Type: Call for Papers
Le Centre Sheraton Montreal
Friday, 8 June 2012
Contact Email: 
Submission Deadline: Monday, 26 March 2012


June 8, 2012
Montreal, Quebec, Canada


Interest in summarization research has been steadily growing in the
past decade, with numerous new methods being proposed for generic and
topic-focused summarization of news. Novel genres and domains, most
notably related to spoken input, have become well established,
including summarization of broadcast news, meetings, spoken
conversations and lectures.

At the same time, development of evaluation metrics for summarization
and of resources for some of the new genres and domains has lagged
behind. Manual evaluation protocols (Pyramid scores for content
selection, scores for linguistic quality and overall responsiveness)
show considerable disparity between human performance and the
performance of systems for multi-document summarization of news.
Meanwhile, the widely used suite for automatic evaluation of content,
ROUGE, shows a much narrower difference between machine and human
performance and even fails to distinguish the two. For speech
summarization, ROUGE likewise does not properly reflect the difference
between human and automatic summarizers and, unlike for written news,
correlates poorly with manual evaluation protocols. The challenge
of automatic evaluation of linguistic quality of summaries has also
only recently started to be addressed.

It has also become harder to identify the most competitive approaches
to summarization. This is partly due to confusing or inconsistent
evidence that comes from different test sets. Evaluating the same
system configuration against several test sets will make possible a
fairer comparison between methods and will further stimulate research
on automatic evaluation metrics.

For this workshop we invite submissions on a wide range of topics
related to evaluation and system comparison in summarization. Topics
of interest include:

+ system comparison on several evaluation datasets. For example, for
multi-document summarization, we seek systems evaluated on
multiple years of DUC/TAC data, with emphasis on measuring
statistically significant differences.

+ manual evaluation protocols for summarization in new genres where
existing methods may not apply

+ manual evaluation protocols for abstractive summarization, which
assess the degree of text-to-text generation capabilities of the
systems and reward successful generation capabilities

+ automatic evaluation metrics of linguistic quality

+ automatic evaluation metrics that better reflect the differences in
human and machine performance

+ automatic metrics that significantly outperform ROUGE in content
selection evaluation for news summarization

+ automatic metrics that perform evaluation without the use of human

+ analysis of domain and genre differences that expose weaknesses of
currently adopted evaluation metrics and proposals for addressing
these weaknesses


Submissions will consist of regular full papers of up to 8 pages, plus
additional pages for references. Shorter papers are also welcome.
All papers should be formatted following the NAACL-HLT 2012
guidelines. As the reviewing will be blind, the paper must not
include the authors' names and affiliations. Furthermore,
self-references that reveal the author's identity, e.g., "We
previously showed (Smith, 1991) ..." must be avoided. Instead, use
citations such as "Smith previously showed (Smith, 1991) ..."

We encourage individuals who are submitting papers on automatic
methods for summarization and evaluation to evaluate their approaches
using multiple publicly available datasets, such as those from DUC
and the TAC Summarization track.

Both submission and review processes will be handled
electronically using the Softconf submission software:

The submission deadline is March 26, 2012 by 11:59PM Pacific Standard
Time (GMT-8).


Mar 26: Paper due date
Apr 23: Notification of acceptance
May 04: Camera-ready deadline
Jun 08: Workshop at NAACL-HLT 2012


John Conroy (IDA Center for Computing Sciences)
Hoa Dang (National Institute of Standards and Technology)
Ani Nenkova (University of Pennsylvania)
Karolina Owczarzak (National Institute of Standards and Technology)


Annie Louis (University of Pennsylvania)
George Giannakopoulos (NCSR Demokritos)
Guy Lapalme (University of Montreal)
Horacio Saggion (Universitat Pompeu Fabra)
Josef Steinberger (European Commission Joint Research Centre)
Judith Schlesinger (IDA Center for Computing Sciences)
Kathy McKeown (Columbia University)
Stan Szpakowicz (University of Ottawa)
Steve Renals (University of Edinburgh)
Xiaodan Zhu (National Research Council Canada)
Yang Liu (University of Texas, Dallas)
Giuseppe Carenini (University of British Columbia)
Drago Radev (University of Michigan)
Dan Gillick (University of California at Berkeley)
Gabriel Murray (University of British Columbia)
Dianne O'Leary (University of Maryland)


Please contact us by email: