We are delighted to announce *SemEval 2026 Task 8: MTRAGEval: Evaluating Multi-Turn RAG Conversations*
MTRAGEval is a shared task on evaluating multi-turn RAG conversations, covering retrieval, generation, and the full RAG pipeline. MTRAGEval is part of SemEval 2026, which will be co-located with an ACL conference.
Data:
The MTRAG Benchmark is released as the trial and training data for MTRAGEval. You can access the full dataset here: https://github.com/IBM/mt-rag-benchmark
Tasks:
Task A: Retrieval Only
Task B: Generation with Reference Passages (Reference)
Task C: Generation with Retrieved Passages (RAG)
Evaluation Scripts:
Retrieval and generation evaluation scripts are available in the GitHub repo. Please visit the evaluation README for more information: https://github.com/IBM/mt-rag-benchmark/blob/main/scripts/evaluation/REA...
Participation:
Check out our website for more details: https://ibm.github.io/mt-rag-benchmark/MTRAGEval/
For future communications, please join our mailing list: https://groups.google.com/g/mtrageval
Timeline (Tentative):
Sample and training data ready: 15 July 2025
Evaluation start (Tasks A and C): 10 January 2026
Evaluation end (Tasks A and C): 20 January 2026
Evaluation start (Task B): 21 January 2026
Evaluation end (Task B): by 31 January 2026
Paper submission due: February 2026
Notification to authors: March 2026
Camera ready due: April 2026
SemEval workshop: Summer 2026 (co-located with a major NLP conference)
Task Organizers:
Sara Rosenthal
Yannis Katsis
Vraj Shah
Marina Danilevsky