Workshop on Evaluation Metrics and System Comparison for Automatic Summarization -- DEADLINE EXTENSION

Event Notification Type: Call for Papers
Co-located with NAACL-HLT 2012
Friday, 8 June 2012
Contact Email:
Submission Deadline: Sunday, 1 April 2012


NAACL-HLT 2012 Workshop on Evaluation Metrics and System Comparison
for Automatic Summarization

June 8, 2012
Montreal, Quebec, Canada


Interest in summarization research has been steadily growing in the
past decade, with numerous new methods being proposed for generic and
topic-focused summarization of news. Other genres and domains, most
notably related to spoken input, have also become well established,
including summarization of broadcast news, meetings, spoken
conversations and lectures.

At the same time, development of evaluation metrics for summarization
and of resources for some genres and domains has lagged behind. Manual
evaluation protocols (Pyramid scores for content selection, scores for
linguistic quality and overall responsiveness) show considerable
disparity between human performance and the performance of systems for
multi-document summarization of news; however, the widely used suite
for automatic evaluation of content, ROUGE, shows a much narrower
difference between machine and human performance and even fails to
distinguish the two. For speech summarization, ROUGE also does not
properly reflect the difference between human and automatic
summarizers and, unlike for written news, has low correlations with
manual evaluation protocols. The challenge of automatically evaluating
the linguistic quality of summaries has also only recently started to be
addressed.

Identifying the most competitive approaches to summarization has also
become more challenging, partly due to confusing or inconsistent
evidence that comes from different test sets. Evaluating the same
system configuration against several test sets will make possible a
fairer comparison between methods and will further stimulate research
on automatic evaluation metrics.

For this workshop we invite submissions on a wide range of topics
related to evaluation and system comparison in summarization. Topics
of interest include:

+ system comparison on several evaluation datasets. For example, for
multi-document summarization, we will seek systems evaluated on
multiple years of DUC/TAC data, with emphasis on measuring
statistically significant differences

+ manual evaluation protocols for summarization in new genres where
existing methods may not apply

+ manual evaluation protocols for abstractive summarization, which
assess the degree of text-to-text generation capabilities of the
systems and reward successful generation

+ automatic evaluation metrics of linguistic quality

+ automatic evaluation metrics that better reflect the differences in
human and machine performance

+ automatic metrics that significantly outperform ROUGE in content
selection evaluation for news summarization

+ automatic metrics that perform evaluation without the use of human

+ analysis of domain and genre differences that expose weaknesses of
currently adopted evaluation metrics and proposals for addressing
these weaknesses


Submissions will consist of regular full papers of up to 8 pages, plus
additional pages for references. Shorter papers are also welcome.
All papers should be formatted following the NAACL-HLT 2012
guidelines. As the reviewing will be blind, the paper must not
include the authors' names and affiliations. Furthermore,
self-references that reveal the author's identity, e.g., "We
previously showed (Smith, 1991) ..." must be avoided. Instead, use
citations such as "Smith previously showed (Smith, 1991) ..."

We encourage individuals who are submitting papers on automatic
methods for summarization and evaluation to evaluate their approaches
using multiple publicly available datasets, such as those from DUC
and the TAC Summarization track.

Both submission and review processes will be handled
electronically using the Softconf submission software:

The submission deadline is April 1, 2012, at 11:59 PM Pacific Standard
Time (GMT-8).


April 1: Paper due date (EXTENDED Deadline)
April 25: Notification of acceptance
May 4: Camera-ready deadline
June 8: Workshop at NAACL-HLT 2012


John Conroy (IDA Center for Computing Sciences)
Hoa Dang (National Institute of Standards and Technology)
Ani Nenkova (University of Pennsylvania)
Karolina Owczarzak (National Institute of Standards and Technology)


Enrique Amigo (UNED, Madrid)
Giuseppe Carenini (University of British Columbia)
Katja Filippova (Google Research)
George Giannakopoulos (NCSR Demokritos)
Dan Gillick (University of California at Berkeley)
Min-Yen Kan (National University of Singapore)
Guy Lapalme (University of Montreal)
Yang Liu (University of Texas, Dallas)
Annie Louis (University of Pennsylvania)
Kathy McKeown (Columbia University)
Gabriel Murray (University of British Columbia)
Dianne O'Leary (University of Maryland)
Drago Radev (University of Michigan)
Steve Renals (University of Edinburgh)
Horacio Saggion (Universitat Pompeu Fabra)
Judith Schlesinger (IDA Center for Computing Sciences)
Josef Steinberger (European Commission Joint Research Centre)
Stan Szpakowicz (University of Ottawa)
Lucy Vanderwende (Microsoft Research)
Stephen Wan (CSIRO ICT Centre)
Xiaodan Zhu (National Research Council Canada)


Please contact us by email: