BioNLP Workshop
BIONLP 2025 and Shared Tasks @ ACL 2025
The 24th BioNLP workshop associated with the ACL SIGBIOMED special interest group is co-located with ACL 2025
IMPORTANT DATES (Tentative)
- Paper submission deadline: March 20, 2025
- Notification of acceptance: April 28, 2025
- Camera-ready paper due: May 25, 2025 -- No extensions due to ACL publication deadline.
- Pre-recorded video due (hard deadline): July 7, 2025
- Workshop: August 1, 2025
Keynote
Speaker: Wojciech Kusa
Incorporating Changes in Review Outcomes in the Evaluation of Systematic Review Automation
Current evaluations of automation methods in systematic literature reviews often treat all included studies as equally important, ignoring their varying influence on review outcomes. This can misrepresent the effectiveness of search strategies, as not all relevant studies contribute equally to the conclusions of the review. To address this limitation, we propose a new evaluation framework that incorporates the differential impact of individual studies on review outcomes. Using data from the CLEF 2019 TAR task, we applied this framework to assess 74 automation models, leveraging meta-analysis effect estimates to weigh the influence of each study. Compared to conventional binary relevance metrics, our approach provided a more nuanced assessment, emphasizing the importance of retrieving high-impact studies. Results showed significant differences in model rankings, underscoring the value of outcome-based evaluation. This framework offers researchers a more precise method for evaluating systematic review automation tools, ultimately supporting higher-quality evidence synthesis and better-informed clinical decisions.
Wojciech is a Senior Researcher at the NASK National Research Institute in Poland, where he leads the Linguistic Engineering and Text Analysis Department. He holds a PhD in NLP from TU Wien, with a focus on applying and evaluating neural methods for domain-specific data. His research interests include the safety and evaluation of large language models, clinical and biomedical NLP, and AI-driven scientific discovery. Wojciech was a Marie Skłodowska-Curie Fellow in the EU Horizon 2020 project DoSSIER, specialising in biomedical information retrieval and NLP. He has industry experience from roles at Samsung and Allegro, and has completed research internships at Sony, UNINOVA, and the Polish Academy of Sciences.
Program Committee
* Daniel Andrade, Hiroshima University, Japan
* Emilia Apostolova, Anthem, Inc., USA
* Eiji Aramaki, University of Tokyo, Japan
* Tanmay Basu, Indian Institute of Science Education and Research Bhopal, India
* Leandra Budau, Toronto Metropolitan University, Canada
* Leonardo Campillos-Llanos, Centro Superior de Investigaciones Científicas - CSIC, Spain
* Liuliu Chen, University of Melbourne, Australia
* Yingjian Chen, Henan University, China
* Brian Connolly, Cincinnati Children's Hospital Medical Center, Ohio, USA
* Mike Conway, University of Melbourne, Australia
* An Dao, University of Tokyo, Japan
* Berry de Bruijn, National Research Council, Canada
* Jean-Benoit Delbrouck, Stanford University, California, USA
* Dina Demner-Fushman, US National Library of Medicine
* Simona Doneva, University of Zurich, Switzerland
* Pietro Ferrazzi, University of Padua, Italy
* Kathleen C. Fraser, National Research Council Canada
* Natalia Grabar, CNRS, U Lille, France
* Cyril Grouin, Université Paris-Saclay, CNRS
* Tudor Groza, EMBL-EBI
* Yingjun Guan, University of Illinois Urbana-Champaign, USA
* Deepak Gupta, US National Library of Medicine
* Thierry Hamon, LIMSI-CNRS, France
* Ben Holgate, King's College London, UK
* Antonio Jimeno Yepes, IBM, Melbourne Area, Australia
* Hidetaka Kamigaito, Nara Institute of Science and Technology, Japan
* Vani Kanjirangat, Dalle Molle Institute for Artificial Intelligence (IDSIA), Switzerland
* Sarvnaz Karimi, CSIRO, Australia
* Nazmul Kazi, University of North Florida, USA
* Siun Kim, Seoul National University, Korea
* Gaurav Kumar, University of California, San Diego, USA
* Andre Lamurias, NOVA School of Science and Technology, Lisbon, Portugal
* Majid Latifi, Department of Computer Science, University of York, York, UK
* Alberto Lavelli, FBK-ICT, Italy
* Robert Leaman, US National Library of Medicine
* Lung-Hao Lee, National Central University, Taiwan
* Ulf Leser, Humboldt-Universität zu Berlin, Germany
* Yuan Liang, Queen Mary University of London, UK
* Siting Liang, German Research Center for Artificial Intelligence, Germany
* Livia Lilli, Fondazione Policlinico Universitario Agostino Gemelli, Italy
* Abdine Maiga, University College London, UK
* Makoto Miwa, Toyota Technological Institute, Japan
* Claire Nedellec, National Research Institute for Agriculture, Food and Environment (INRAE), Paris-Saclay University, France
* Guenter Neumann, DFKI, Germany
* Aurélie Névéol, LISN - CNRS, France
* Mariana Neves, Hasso-Plattner-Institute at the University of Potsdam, Germany
* Andrei Niculae, Carol Davila University of Medicine and Pharmacy, Romania
* Brian Ondov, Yale University, USA
* Noon Pokaratsiri Goldstein, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI)
* François Remy, Ghent University, Belgium
* Francisco J. Ribadas-Pena, University of Vigo, Spain
* Fabio Rinaldi, Dalle Molle Institute for Artificial Intelligence Research (IDSIA), Switzerland
* Roland Roller, DFKI, Germany
* Mourad Sarrouti, CLARA Analytics, USA
* Efstathia Soufleri, Archimedes - Athena Research Center, Greece
* Peng Su, University of Delaware, USA
* Madhumita Sushil, University of California, San Francisco, USA
* Mario Sänger, Humboldt Universität zu Berlin, Germany
* Karin Verspoor, RMIT University, Australia
* Davy Weissenbacher, Cedars-Sinai, Los Angeles, California, USA
* Nathan M. White, James Cook University, Australia
* Dongfang Xu, Cedars-Sinai, USA
* Shweta Yadav, University of Illinois Chicago, USA
* Ken Yano, National Institute of Advanced Industrial Science and Technology, Japan
* Hyunwoo Yoo, Drexel University, USA
* Kai Zhang, Worcester Polytechnic Institute, MA, USA
* Xinyue Zhang, King's College London, UK
* Xiao Yu Cindy Zhang, University of British Columbia, Canada
* Jingqing Zhang, Imperial College London, UK
* Angelo Ziletti, Bayer, Germany
* Ayah Zirikly, Johns Hopkins, USA
* Pierre Zweigenbaum, LIMSI - CNRS, France
Secondary Reviewers
* Joseph Akinyemi, University of York, UK
* Robert Bossy, National Research Institute for Agriculture, Food and Environment (INRAE), France
* Marco Naguib, Interdisciplinary Laboratory on Numerical Sciences (LISN), France
Sponsor
We are pleased to announce that the Chen Institute is co-organizing the BioNLP 2025 Workshop. Founded in 2016 by Tianqiao Chen and Chrissy Luo, the Chen Institute is driven by a bold vision to improve the human experience by understanding how our brains perceive, learn, and interact with the world. Their global platform includes the Tianqiao and Chrissy Chen Institute for Neuroscience at Caltech, the Tianqiao Chen Institute for Translational Research in Shanghai, the Chen Frontier Lab for Applied Neurotechnology, and the Chen Frontier Lab for AI and Mental Health. The Chen Scholars program supports early- to mid-career scientists, and the recently launched Chen Institute and Science Prize for AI Accelerated Research highlights their deep commitment to innovation.
At this year’s BioNLP Workshop, the Chen Institute is interested in exploring how artificial intelligence can accelerate the pace of scientific discovery. We believe there are vast, untapped opportunities to make groundbreaking advances by leveraging the power of AI. The hope is that this meeting will serve as the beginning of an ongoing dialogue—focused on new developments, transformative successes, and emerging thinking at the intersection of AI and science. Through this collaboration, the Chen Institute aims to identify and support promising approaches with the potential to meaningfully change the world.
Workshop Program
Friday, August 1, 2025
- 08:40 - 08:50 Opening remarks
- 08:50 - 10:30 Session 1: Foundational tasks
- 08:50 - 09:10 Accelerating Cross-Encoders in Biomedical Entity Linking, Javier Sanz-Cruzado and Jake Lever, University of Glasgow
- 09:10 - 09:30 Beyond Citations: Integrating Finding-Based Relations for Improved Biomedical Article Representations, Yuan Liang, Massimo Poesio, Roonak Rezvani, Queen Mary University of London, University of Utrecht, Recursion
- 09:30 - 09:50 MedSummRAG: Domain-Specific Retrieval for Medical Summarization, Guanting Luo and Yuki Arase, The University of Osaka, Institute of Science Tokyo
- 09:50 - 10:10 Advancing Biomedical Claim Verification by Using Large Language Models with Better Structured Prompting Strategies, Siting Liang and Daniel Sonntag, German Research Center for Artificial Intelligence
- 10:10 - 10:30 Questioning Our Questions: How Well Do Medical QA Benchmarks Evaluate Clinical Capabilities of Language Models? Siun Kim and Hyung-Jin Yoon, Seoul National University Hospital, Biomedical Engineering, Seoul National University College of Medicine
- 10:30 - 11:00 Coffee Break
- 11:00 - 12:30 Session 2: Clinical NLP
- 11:00 - 11:20 A Retrieval-Based Approach to Medical Procedure Matching in Romanian, Andrei Niculae, Adrian Cosma, Emilian Radoi, National University of Science and Technology Politehnica Bucharest
- 11:20 - 11:40 Error Detection in Medical Note through Multi Agent Debate, Abdine L Maiga, Anoop Shah, Emine Yilmaz, University College London, Amazon
- 11:40 - 12:00 Converting Annotated Clinical Cases into Structured Case Report Forms, Pietro Ferrazzi, Alberto Lavelli, Bernardo Magnini, University of Padova, FBK
- 12:00 - 12:30 Invited Talk -- Wojciech Kusa: Incorporating Changes in Review Outcomes in the Evaluation of Systematic Review Automation
- 12:30 - 14:00 Lunch
- 14:00 - 15:30 Session 3: Shared Tasks
- 14:00 - 14:15 Overview of the BioLaySumm 2025 Shared Task on Lay Summarization of Biomedical Research Articles and Radiology Reports, Chenghao Xiao, Kun Zhao, Xiao Wang, Siwei Wu, Sixing Yan, Tomas Goldsack, Sophia Ananiadou, Noura Al Moubayed, Liang Zhan, William K. Cheung, Chenghua Lin, Durham University, University of Pittsburgh, University of Manchester, Hong Kong Baptist University, University of Sheffield
- 14:15 - 14:20 Poster boaster: AEHRC at BioLaySumm 2025: Leveraging T5 for Lay Summarisation of Radiology Reports. Wenjun Zhang, Shekhar S. Chandra, Bevan Koopman, Jason Dowling and Aaron Nicolson
- 14:20 - 14:25 Poster boaster: Team SXZ at BioLaySumm2025: Combining Section‐Wise Summarization, K‐Shot LLM Prompting, BioBART, and RL Fine‐Tuning for Biomedical Lay Summaries. Pengcheng Xu, Sicheng Shen, Jieli Zhou and Hongyi Xin
- 14:25 - 14:40 SMAFIRA Shared Task at the BioNLP'2025 Workshop: Assessing the Similarity of the Research Goal, Mariana Neves, Iva Sovadinova, Susanne Fieberg, Celine Heinl, Diana Rubel, Gilbert Schönfelder, Bettina Bert, German Federal Institute for Risk Assessment, Masaryk University
- 14:40 - 14:55 Overview of the ClinIQLink 2025 Shared Task on Medical Question-Answering, Brandon C Colelough, Davis Bartels, Dina Demner-Fushman, National Library of Medicine
- 14:55 - 15:00 Poster boaster: VeReaFine: Iterative Verification Reasoning Refinement RAG for Hallucination-Resistant on Open-Ended Clinical QA. Pakawat Phasook, Rapepong Pitijaroonpong, Jiramet Kinchagawat, Amrest Chinkamol, Tossaporn Saengja, Kiartnarin Udomlapsakul, Jitkapat Sawatphol and Piyalitt Ittichaiwong
- 15:00 - 15:15 Overview of the ArchEHR-QA 2025 Shared Task on Grounded Question Answering from Electronic Health Records, Sarvesh Soni, Soumya Gayen, Dina Demner-Fushman, National Library of Medicine
- 15:15 - 15:20 Poster boaster: ArgHiTZ at ArchEHR-QA 2025: A Two-Step Divide and Conquer Approach to Patient Question Answering for Top Factuality. Adrian Cuadron Cortes, Aimar Sagasti, Maitane Urruela, Iker De la Iglesia, Ane García Domingo-Aldama, Aitziber Atutxa Salazar, Josu Goikoetxea and Ander Barrena
- 15:20 - 15:25 Poster boaster: Neural at ArchEHR-QA 2025: Agentic Prompt Optimization for Evidence-Grounded Clinical Question Answering. Sai Prasanna Teja Reddy Bogireddy, Abrar Majeedi, Viswanath Reddy Gajjala, Zhuoyan Xu, Siddhant Rai and Vaishnav Potlapalli
- 15:30 - 16:00 Coffee Break
- 16:00 - 18:00 Poster Sessions (online, onsite, workshop and shared tasks. Note: Shared Task papers listed in Volume 2)
* Improving Barrett's Oesophagus Surveillance Scheduling with Large Language Models: A Structured Extraction Approach, Xinyue Zhang, Agathe Zecevic, Sebastian Zeki, Angus Roberts, King's College London, Guy's and St Thomas' NHS Foundation Trust
* Effective Multi-Task Learning for Biomedical Named Entity Recognition, João Ruano, Gonçalo M Correia, Leonor Maria Machado Barreiros, Afonso Mendes, Priberam
* PetEVAL: A veterinary free text electronic health records benchmark, Sean Farrell, Alan Radford, Noura Al Moubayed, Peter-John Mäntylä Noble, Durham University, University of Liverpool
* Can Large Language Models Classify and Generate Antimicrobial Resistance Genes? Hyunwoo Yoo, Haebin Shin, Gail Rosen, Drexel University, KAIST AI
* Overcoming Data Scarcity in Named Entity Recognition: Synthetic Data Generation with Large Language Models. An Dao, Hiroki Teranishi, Yuji Matsumoto, Florian Boudin, Akiko Aizawa, The University of Tokyo, RIKEN Center for Advanced Intelligence Project, Nantes University, National Institute of Informatics
* Fine-tuning LLMs to Extract Epilepsy Seizure Frequency Data from Health Records, Ben Holgate, Joe Davies, Shichao Fang, Joel S. Winston, James T. Teo, Mark P. Richardson, King's College London
* Transformer-Based Medical Statement Classification in Doctor-Patient Dialogues, Farnod Bahrololloomi, Johannes Luderschmidt, Biying Fu, RheinMain University of Applied Sciences
* PreClinIE: An Annotated Corpus for Information Extraction in Preclinical Studies, Simona Emilova Doneva, Hanna Hubarava, Pia Andrea Härvelid, Wolfgang Emanuel Zürrer, Julia V Bugajska, Bernard Friedrich Hild, David Brüschweiler, Gerold Schneider, Tilia Ellendorff, Benjamin Victor Ineichen, University of Zurich
* QoLAS: A Reddit Corpus of Health-Related Quality of Life Aspects of Mental Disorders, Lynn Greschner, Amelie Wührl, Roman Klinger, University of Bamberg, University of Stuttgart
* Gender-Neutral Large Language Models for Medical Applications: Reducing Bias in PubMed Abstracts, Elizabeth Schaefer and Kirk Roberts, Yale University, University of Texas Health Science Center at Houston
* LLMs as Medical Safety Judges: Evaluating Alignment with Human Annotation in Patient-Facing QA, Yella Leonie Diekmann, Chase M Fensore, Rodrigo M Carrillo-Larco, Eduard R Castejon Rosales, Sakshi Shiromani, Rima Pai, Megha Shah, Joyce C Ho, Emory University
* AdaBioBERT: Adaptive Token Sequence Learning for Biomedical Named Entity Recognition, Sumit Kumar and Tanmay Basu, Indian Institute of Science Education and Research Bhopal
* Enhancing Stress Detection on Social Media Through Multi-Modal Fusion of Text and Synthesized Visuals, Efstathia Soufleri and Sophia Ananiadou, Athena RC, University of Manchester
* MuCoS: Efficient Drug–Target Discovery via Multi-Context-Aware Sampling in Knowledge Graphs, Haji Gul, Abdul Ghani Naim, Ajaz Ahmad Bhat, UBD
* Enhancing Antimicrobial Drug Resistance Classification by Integrating Sequence-Based and Text-Based Representations, Hyunwoo Yoo, Bahrad Sokhansanj, James R Brown, Drexel University
* Effect of Multilingual and Domain-adapted Continual Pre-training on Few-shot Promptability, Ken Yano and Makoto Miwa, The National Institute of Advanced Industrial Science and Technology, Toyota Technological Institute
* Understanding the Impact of Confidence in Retrieval Augmented Generation: A Case Study in the Medical Domain, Shintaro Ozaki, Yuta Kato, Siyuan Feng, Masayo Tomita, Kazuki Hayashi, Wataru Hashimoto, Ryoma Obara, Masafumi Oyamada, Katsuhiko Hayashi, Hidetaka Kamigaito, Taro Watanabe, Nara Institute of Science and Technology, The University of Tokyo, NEC
* Prompting Large Language Models for Italian Clinical Reports: A Benchmark Study, Livia Lilli, Carlotta Masciocchi, Antonio Marchetti, Giovanni Arcuri, Stefano Patarnello, Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Catholic University of the Sacred Heart, Rome, Italy
* CaseReportCollective: A Large-Scale LLM-Extracted Dataset for Structured Medical Case Reports, Xiao Yu Cindy Zhang, Melissa Fong, Wyeth Wasserman, Jian Zhu, University of British Columbia
* RadQA-DPO: A Radiology Question Answering System with Encoder-Decoder Models Enhanced by Direct Preference Optimization, Md Sultan Al Nahian and Ramakanth Kavuluru, University of Kentucky
* Benchmarking zero-shot biomedical relation triplet extraction across language model architectures, Frederik Steensgaard Gade, Ole Lund, Marie Lisandra Zepeda Mendoza, Technical University of Denmark, Novo Nordisk Research Centre Oxford
* Virtual CRISPR: Can LLMs Predict CRISPR Screen Results? Steven Song, Abdalla Abdrabou, Asmita Dabholkar, Kastan Day, Pavan Dharmoju, Jason Perera, Volodymyr Kindratenko, Aly A Khan, University of Chicago, Chan Zuckerberg Biohub Chicago, University of Illinois Urbana-Champaign, Northwestern University
SUBMISSION INSTRUCTIONS
Two types of submissions are invited: full (long) papers (8 pages) and short papers (4 pages).
Submission site for the workshop: https://softconf.com/acl2025/BioNLP2025
Submission site for Shared Tasks: https://softconf.com/acl2025/BioNLP2025-ST
Please follow these formatting guidelines: https://github.com/acl-org/acl-style-files
Please note that the review process is double-blind.
Final versions of accepted papers will be given one additional page of content (up to 9 pages for long papers, up to 5 pages for short papers) to address reviewers’ comments.
Submissions from ACL rolling review
We will consider ACL rolling review submissions with all reviews and scores. If you are interested in submitting your work for consideration, please contact ddemner at gmail.
WORKSHOP OVERVIEW AND SCOPE
The BioNLP workshop, associated with the ACL SIGBIOMED special interest group, is an established primary venue for presenting research in language processing and language understanding for the biological and medical domains. The workshop has been running every year since 2002 and continues to grow stronger. Many other emerging biomedical and clinical language processing workshops can afford to be more specialized because BioNLP truly encompasses the breadth of the domain and brings together researchers in biomedical and clinical NLP from all over the world.
BioNLP 2025 will be particularly interested in evaluation frameworks and metrics that reflect the needs of health-related use cases and provide a good estimate of the reliability of the proposed solutions. BioNLP 2025 continues to focus on the transparency of generative approaches and the factuality of the generated text. Language processing that supports DEIA (Diversity, Equity, Inclusion and Accessibility) continues to be of utmost importance. Work on the detection and mitigation of bias and misinformation continues to be of interest. Research in languages other than English, particularly under-represented languages, and research on health disparities are always of interest to BioNLP. Other active areas of research include, but are not limited to:
- Extraction of complex relations and events;
- Discourse analysis; Anaphora & coreference resolution;
- Text mining & Literature based discovery;
- Question Answering; Summarization; Text simplification;
- Resources and strategies for system testing and evaluation;
- Synthetic data generation & data augmentation;
- Translating NLP research into practice: tangible explainable results of biomedical language processing applications.
SHARED TASKS
SMAFIRA
The SMAFIRA project supports finding alternative methods to animal experiments. The organizers have released the SMAFIRA web tool, which allows researchers to search for methods alternative to animal experiments. The input to the tool is a PubMed identifier (PMID) of a publication that represents the animal experiment for which one wants to find an alternative method. The tool retrieves up to 200 similar articles available in PubMed and presents these as a list of results. The task is to validate and annotate the top 10 similar articles, either automatically, with any system of the participants' choice, or manually using the SMAFIRA tool. See details at https://smafira-bf3r.github.io/smafira-st/
ClinIQLink 2025 - LLM Lie Detector Test
The LLM Lie Detector Test aims to evaluate the effectiveness of generative models in producing factually accurate information, with a benchmark dataset specifically curated to align with the knowledge level of a General Practitioner (GP) Medical Doctor. Participants will submit model outputs to be assessed using a structured set of atomic question-answer pairs (factoid, true/false and list questions), which focus on retrieving precise, factually correct information. The test will evaluate internal model knowledge retrieval. See details at https://brandonio-c.github.io/ClinIQLink-2025/
ArchEHR-QA 2025: Grounded Electronic Health Record Question Answering
The participants will automatically generate answers to patients’ health-related questions that are grounded in the evidence from patients’ clinical notes. The dataset will consist of hand-curated realistic patient questions (submitted through a patient portal) and their corresponding clinician-rewritten versions (crafted to assist in formulating their responses). The task is to construct coherent answers or responses to input questions that must use and be grounded in the provided clinical note excerpts. See details at https://archehr-qa.github.io/
BioLaySumm 2025
This is the 3rd iteration of BioLaySumm, following the success of the 2nd edition of the task at BioNLP 2024, which attracted more than 200 submissions across 53 different teams, and the 1st edition of the task at BioNLP 2023, which attracted 56 submissions across 20 different teams. This edition builds on last year's task by introducing a new task: radiology report generation in layman's terms, extending the shared task to a new domain and multi-modality. See details at https://biolaysumm.org/