First Workshop on Generation, Evaluation, and Metrics at ACL 2021: Final Call for Papers

Event Notification Type: 
Call for Papers
Abbreviated Title: 
GEM '21
Location: 
Berkeley Hotel
Thursday, 5 August 2021
City: 
Bangkok
Country: 
Thailand
Contact: 
Sebastian Gehrmann
Antoine Bosselut
Esin Durmus
Varun Prashant Gangal
Yacine Jernite
Laura Perez-Beltrachini
Samira Shaikh
Wei Xu
Submission Deadline: 
Monday, 3 May 2021

Final call for papers and shared task submissions for the Workshop on Generation, Evaluation, and Metrics (GEM) at ACL ’21

=========
Call for Participation
=========

Update April 22: Our paper submission deadline has been extended to May 3! Please submit your papers at the SoftConf link listed below. The shared task submission deadline is May 14.

Update March 29: We have released our challenge sets! You can inspect and load them using HuggingFace Datasets or TFDS. For details, please see our updated writeup (https://arxiv.org/abs/2102.01672).
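
For convenience, here is a minimal sketch of how a GEM task could be inspected with HuggingFace Datasets; the config name "common_gen" is just one example, and the full list of tasks and available splits (including the challenge sets) is documented in the write-up linked above.

from datasets import load_dataset

# Load one GEM task; other configs (e.g., "web_nlg_en") are loaded the same way.
data = load_dataset("gem", "common_gen")

# List the splits shipped with this config, including any challenge sets.
print(list(data.keys()))

# Peek at a single training example.
print(data["train"][0])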

Natural language generation is one of the most active research fields in NLP. As such, the number of available datasets, metrics, models, and evaluation strategies is increasing rapidly. This leads to a situation where new models are often evaluated on different, Anglo-centric tasks with incompatible evaluation setups. With GEM, we aim to solve this problem by standardizing and improving the corpora on which NLG models are evaluated, and by supporting the development of better evaluation approaches. In our shared task, models will be applied to a wide set of NLG tasks that cover challenges measuring specific generation aspects, such as content selection and planning, surface realization, paraphrasing, simplification, and others. To avoid hill-climbing on automated metrics, a second part of the shared task focuses on an in-depth analysis of submitted model outputs across both human and automatic evaluation, with the aim of uncovering shortcomings and opportunities for progress. GEM is a SIGGEN-endorsed event.

=======
Shared Tasks
=======

The shared task is described in-depth here: https://gem-benchmark.com/shared_task.

The shared task has two parts:
In the first part, participants are encouraged to apply their models to as many of the included tasks as possible and to submit their formatted outputs. We provide GEM-specific test sets that will be used to evaluate specific generation aspects.
In the second part, all submitted and baseline outputs will be released for an evaluation shared task. Participants can submit analyses and evaluations of the model outputs.
During the GEM workshop, shared task participants will come together to discuss their findings, which will inform future iterations of GEM.

=======
Call for Papers
=======

All papers are allowed unlimited space for references and appendices. For papers associated with the shared task, we additionally strongly encourage publishing the code used to generate the results. We ask for papers in the following categories:

- System Descriptions
Participants of the modeling shared task are invited to submit a system description of 4-8 pages.

- System Evaluation Descriptions
Participants of the evaluation shared task are invited to submit a paper describing their analysis approach and findings of 4-8 pages.

- Research Papers
We welcome papers discussing any of the following topics:
- Automatic evaluation of NLG systems
- Creating challenge sets for NLG corpora
- Critiques of benchmarking efforts (including ours)
- Crowdsourcing strategies to improve the inclusiveness of NLG research
- Measuring progress in NLG / What should a GEM 2.0 look like
- Modeling and data-augmentation strategies for training effective and/or efficient NLG systems that can be applied to a wide range of tasks
- Standardizing human evaluation and making it more robust
We additionally invite every group that contributed to the creation and organization of GEM to submit a description of their considerations and contributions.
These submissions can take either of the following forms:
- Archival Papers: Papers describing original and unpublished work can be submitted in either a short (4-page) or a long (8-page) format.
- Non-Archival Abstracts: To discuss work already presented or under review at a peer-reviewed venue, we allow the submission of 2-page abstracts.
Please note that we are not looking for submissions that focus on specific modeling challenges or introduce new model architectures, etc.; such work would fit better at conferences like ACL or INLG.

Submissions

All submissions should conform to ACL 2021 style guidelines. Archival long and short paper submissions must be anonymized. Abstracts and shared task submission descriptions should include author information. Please submit your papers at the SoftConf link https://www.softconf.com/acl2021/w10_GEM21/.

=======
Important Dates
=======

Workshop

✅ February 2 First Call for Shared Task Submissions and Papers, Release of the Training Data
May 3 Workshop Paper Due Date (excl. shared tasks)
May 28 Notification of Acceptance (excl. shared tasks)
June 7 Camera-ready papers due (excl. shared tasks)

Shared Task Dates

Modeling

✅ February 2 Release of the training data
✅ March 29 Release of the test sets
May 14 Modeling submissions due

Evaluation

April 2 Release of the baseline outputs
May 17 Release of the submission outputs

System Descriptions and Analyses

June 11 System Descriptions and Analyses due
June 25 Notification of Acceptance (shared task)
July 9 Camera-ready papers and task descriptions due
August 5-6 Workshop Dates

=======
Organization
=======

The workshop is organized by
Antoine Bosselut (Stanford University)
Esin Durmus (Cornell University)
Varun Prashant Gangal (Carnegie Mellon University)
Sebastian Gehrmann (Google Research)
Yacine Jernite (Hugging Face)
Laura Perez-Beltrachini (University of Edinburgh)
Samira Shaikh (UNC Charlotte)
Wei Xu (Georgia Tech)

The shared task and the GEM environment are organized by a larger team, which is listed on the GEM website (https://gem-benchmark.com).