The fifth edition of the Natural Language Generation, Evaluation, and Metrics (GEM) Workshop will be at ACL 2026 in San Diego!
Evaluation of language models has grown into a central theme in NLP research, while remaining far from solved. As LMs have become more powerful, errors have become tougher to spot and systems harder to distinguish. Evaluation practices are evolving rapidly, from living benchmarks like Chatbot Arena to LMs serving as evaluators themselves (e.g., LM-as-judge, autoraters). Further research is needed to understand the interplay between metrics, benchmarks, and human-in-the-loop evaluation, and their impact on real-world settings.
Submissions can take any of the following forms:
- Archival Papers: Original and unpublished work, accepted in any of the following tracks: Main, ReproNLP, and Opinion/Statement.
- Non-Archival Extended Abstracts: Work already presented or under review at a peer-reviewed venue. This is an excellent opportunity to share recent or ongoing work with the GEM community without precluding future publication.
- Findings Papers: We additionally welcome presentations of relevant papers accepted to Findings and will share more information at a later date.
Important Dates
- March 19, 2026: Direct paper submission deadline
- April 9, 2026: Pre-reviewed ARR commitment deadline
- April 28, 2026: Notification of acceptance
- May 14, 2026: Camera-ready paper due
- June 4, 2026: Pre-recorded video due (hard deadline)
- July 3–4, 2026: Workshop at ACL in San Diego
Please see our website for more information!