We are excited to announce the ClinSkill QA 2026 shared task, co-located with the BioNLP workshop at ACL 2026.
Website (Details): https://whunextgen.github.io/ClinicalskillQA/
Google Group (Registration & Results): https://groups.google.com/g/clinskill-qa2026
Overview
Multimodal large language models (MLLMs) could support clinical training and assessment by helping medical experts interpret procedural videos and verify adherence to standardized workflows. Reliable deployment in these settings requires evidence that models can continuously track students’ actions during clinical skill assessments, a capability that underpins their understanding of clinical skills. Systematically evaluating and improving both this understanding and this continuous perception is therefore essential for building reliable, high-impact AI systems for medical education.
Dataset
ClinSkill QA is built on 200 sets of shuffled key frames extracted from three types of clinical skill videos. Each set of key frames represents a sequence of continuous actions and is accompanied by expert-annotated ground-truth ordering and order rationales.
Evaluation
For evaluation, ordering results are scored with Task Accuracy (exact match against the ground-truth ordering) and Pairwise Accuracy (the fraction of adjacent pairs correctly ordered). The quality of the ordering explanations is assessed with BERTScore and an LLM-as-judge (G-Eval).
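To make the two ordering metrics concrete, the sketch below shows one possible reading of them. It is not the official scorer: the function names are hypothetical, and the interpretation of Pairwise Accuracy (each pair of frames that is adjacent in the ground-truth ordering counts as correct when the prediction keeps the same relative order) is an assumption. Participants should rely on the scoring details published on the task website.

```python
from typing import Hashable, Sequence


def task_accuracy(pred: Sequence[Hashable], gold: Sequence[Hashable]) -> float:
    """Return 1.0 if the predicted ordering matches the gold ordering exactly, else 0.0."""
    return 1.0 if list(pred) == list(gold) else 0.0


def pairwise_accuracy(pred: Sequence[Hashable], gold: Sequence[Hashable]) -> float:
    """Fraction of gold-adjacent frame pairs whose relative order is kept in pred.

    Assumption: a pair (a, b) adjacent in the gold ordering is correct when
    a appears anywhere before b in the prediction. The official definition
    may instead require a and b to be adjacent in the prediction as well.
    """
    gold = list(gold)
    # Map each predicted frame ID to its position in the predicted ordering.
    position = {frame: i for i, frame in enumerate(pred)}
    pairs = list(zip(gold, gold[1:]))
    if not pairs:
        # A sequence with fewer than two frames has no pairs to get wrong.
        return 1.0
    correct = sum(
        1
        for a, b in pairs
        if a in position and b in position and position[a] < position[b]
    )
    return correct / len(pairs)


# Hypothetical frame IDs, for illustration only.
gold_order = ["A", "B", "C", "D"]
pred_order = ["A", "C", "B", "D"]

print(task_accuracy(pred_order, gold_order))      # exact-match score for this sample
print(pairwise_accuracy(pred_order, gold_order))  # share of gold-adjacent pairs kept in order
```

In this example the prediction swaps the two middle frames, so the exact-match score is 0 while two of the three gold-adjacent pairs keep their relative order. Dataset-level scores would be the mean of the per-sample values.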
Important Dates
- First call for participation: Jan 30, 2026
- Release of task data: Jan 30, 2026
- Paper submission deadline: Apr 17, 2026
- Notification of acceptance: May 4, 2026
- Camera-ready paper due: May 12, 2026
- BioNLP Workshop Date: July 3 or 4, 2026
Note that all deadlines are 23:59:59 AoE (UTC-12).
Organizers
- Xiyang Huang, School of Artificial Intelligence, Wuhan University
- Yihuai Xu, School of Artificial Intelligence, Wuhan University
- Zhiyuan Chen, School of Artificial Intelligence, Wuhan University
- Keying Wu, School of Artificial Intelligence, Wuhan University
- Jiayi Xiang, School of Artificial Intelligence, Wuhan University
- Buzhou Tang, School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen
- Renxiong Wei, Zhongnan Hospital of Wuhan University
- Yanqing Ye, Zhongnan Hospital of Wuhan University
- Jinyu Chen, Zhongnan Hospital of Wuhan University
- Cheng Zeng, School of Artificial Intelligence, Wuhan University
- Min Peng, School of Artificial Intelligence, Wuhan University
- Qianqian Xie, School of Artificial Intelligence, Wuhan University
- Sophia Ananiadou, Department of Computer Science, The University of Manchester
We warmly invite the community to participate in this task to advance the evaluation of clinical skill understanding and continuous perception from shuffled keyframes.
Best regards,
ClinSkill QA Organizers