===IMPORTANT DATES ===
  
* Paper submission deadline: March 20, 2025
* Notification of acceptance: April 28, 2025
* Camera-ready paper due: May 25, 2025 -- No extensions due to ACL publication deadline.
* Pre-recorded video due (hard deadline): July 10, 2025. Please contact Underline if you need help.
* <b>Workshop: August 1st, 2025</b>
  
=== Keynote ===

Speaker: Wojciech Kusa

Incorporating Changes in Review Outcomes in the Evaluation of Systematic Review Automation

Current evaluations of automation methods in systematic literature reviews often treat all included studies as equally important, ignoring their varying influence on review outcomes. This can misrepresent the effectiveness of search strategies, as not all relevant studies contribute equally to the conclusions of the review. To address this limitation, we propose a new evaluation framework that incorporates the differential impact of individual studies on review outcomes. Using data from the CLEF 2019 TAR task, we applied this framework to assess 74 automation models, leveraging meta-analysis effect estimates to weigh the influence of each study. Compared to conventional binary relevance metrics, our approach provided a more nuanced assessment, emphasizing the importance of retrieving high-impact studies. Results showed significant differences in model rankings, underscoring the value of outcome-based evaluation. This framework offers researchers a more precise method for evaluating systematic review automation tools, ultimately supporting higher-quality evidence synthesis and better-informed clinical decisions.

Wojciech is a Senior Researcher at the NASK National Research Institute in Poland, where he leads the Linguistic Engineering and Text Analysis Department. He holds a PhD in NLP from TU Wien, with a focus on applying and evaluating neural methods for domain-specific data. His research interests include the safety and evaluation of large language models, clinical and biomedical NLP, and AI-driven scientific discovery. Wojciech was a Marie Skłodowska-Curie Fellow in the EU Horizon 2020 project DoSSIER, specialising in biomedical information retrieval and NLP. He has industry experience from roles at Samsung and Allegro, and has completed research internships at Sony, UNINOVA, and the Polish Academy of Sciences.
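As a rough illustration of the general idea behind outcome-weighted evaluation (this is a made-up sketch, not the speaker's actual framework; the study IDs and weights below are hypothetical), one can contrast binary recall, where every relevant study counts equally, with a recall that weights each study by its estimated influence on the review outcome:

```python
def binary_recall(retrieved, relevant):
    """Fraction of relevant studies that were retrieved (all weighted equally)."""
    relevant = set(relevant)
    return len(relevant & set(retrieved)) / len(relevant)

def weighted_recall(retrieved, influence):
    """Recall where each relevant study counts proportionally to its
    influence on the review outcome (e.g., a meta-analysis weight).

    `influence` maps a study id to a non-negative weight; both are hypothetical here.
    """
    found = sum(w for sid, w in influence.items() if sid in set(retrieved))
    return found / sum(influence.values())

# Hypothetical example: one high-impact study dominates the review outcome.
influence = {"s1": 0.7, "s2": 0.2, "s3": 0.1}
retrieved = ["s2", "s3"]
print(binary_recall(retrieved, influence))    # 2 of 3 relevant studies found
print(weighted_recall(retrieved, influence))  # but only a minority of the outcome weight
```

A system that misses the single high-impact study looks strong under binary recall yet weak under the weighted variant, which is the kind of ranking shift the talk reports.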
===Program Committee===

* Daniel Andrade, Hiroshima University, Japan
* Emilia Apostolova, Anthem, Inc., USA
* Eiji Aramaki, University of Tokyo, Japan
* Tanmay Basu, Indian Institute of Science Education and Research Bhopal, India
* Leandra Budau, Toronto Metropolitan University, Canada
* Leonardo Campillos-Llanos, Centro Superior de Investigaciones Científicas - CSIC, Spain
* Liuliu Chen, University of Melbourne, Australia
* Yingjian Chen, Henan University, China
* Brian Connolly, Cincinnati Children's Hospital Medical Center, Ohio, USA
* Mike Conway, University of Melbourne, Australia
* An Dao, University of Tokyo, Japan
* Berry de Bruijn, National Research Council, Canada
* Jean-Benoit Delbrouck, Stanford University, California, USA
* Dina Demner-Fushman, US National Library of Medicine
* Simona Doneva, University of Zurich, Switzerland
* Pietro Ferrazzi, University of Padua, Italy
* Kathleen C. Fraser, National Research Council Canada
* Natalia Grabar, CNRS, U Lille, France
* Cyril Grouin, Université Paris-Saclay, CNRS
* Tudor Groza, EMBL-EBI
* Yingjun Guan, University of Illinois Urbana-Champaign, USA
* Deepak Gupta, US National Library of Medicine
* Thierry Hamon, LIMSI-CNRS, France
* Ben Holgate, King's College London, UK
* Antonio Jimeno Yepes, IBM, Melbourne Area, Australia
* Hidetaka Kamigaito, Nara Institute of Science and Technology, Japan
* Vani Kanjirangat, Dalle Molle Institute for Artificial Intelligence (IDSIA), Switzerland
* Sarvnaz Karimi, CSIRO, Australia
* Nazmul Kazi, University of North Florida, USA
* Siun Kim, Seoul National University, Korea
* Gaurav Kumar, University of California, San Diego, USA
* Andre Lamurias, NOVA School of Science and Technology, Lisbon, Portugal
* Majid Latifi, Department of Computer Science, University of York, York, UK
* Alberto Lavelli, FBK-ICT, Italy
* Robert Leaman, US National Library of Medicine
* Lung-Hao Lee, National Central University, Taiwan
* Ulf Leser, Humboldt-Universität zu Berlin, Germany
* Yuan Liang, Queen Mary University of London, UK
* Siting Liang, German Research Center for Artificial Intelligence, Germany
* Livia Lilli, Fondazione Policlinico Universitario Agostino Gemelli, Italy
* Abdine Maiga, University College London, UK
* Makoto Miwa, Toyota Technological Institute, Japan
* Claire Nedellec, National Research Institute for Agriculture, Food and Environment (INRAE), Paris-Saclay University, France
* Guenter Neumann, DFKI, Germany
* Aurélie Névéol, LISN - CNRS, France
* Mariana Neves, Hasso-Plattner-Institute at the University of Potsdam, Germany
* Andrei Niculae, Carol Davila University of Medicine and Pharmacy, Romania
* Brian Ondov, Yale University, USA
* Noon Pokaratsiri Goldstein, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI)
* François Remy, Ghent University, Belgium
* Francisco J. Ribadas-Pena, University of Vigo, Spain
* Fabio Rinaldi, Dalle Molle Institute for Artificial Intelligence Research (IDSIA), Switzerland
* Roland Roller, DFKI, Germany
* Mourad Sarrouti, CLARA Analytics, USA
* Efstathia Soufleri, Archimedes - Athena Research Center, Greece
* Peng Su, University of Delaware, USA
* Madhumita Sushil, University of California, San Francisco, USA
* Mario Sänger, Humboldt-Universität zu Berlin, Germany
* Karin Verspoor, RMIT University, Australia
* Davy Weissenbacher, Cedars-Sinai, Los Angeles, California, USA
* Nathan M. White, James Cook University, Australia
* Dongfang Xu, Cedars-Sinai, USA
* Shweta Yadav, University of Illinois Chicago, USA
* Ken Yano, National Institute of Advanced Industrial Science and Technology, Japan
* Hyunwoo Yoo, Drexel University, USA
* Kai Zhang, Worcester Polytechnic Institute, MA, USA
* Xinyue Zhang, King's College London, UK
* Xiao Yu Cindy Zhang, University of British Columbia, Canada
* Jingqing Zhang, Imperial College London, UK
* Angelo Ziletti, Bayer, Germany
* Ayah Zirikly, Johns Hopkins, USA
* Pierre Zweigenbaum, LIMSI - CNRS, France
===Secondary Reviewers===

* Joseph Akinyemi, University of York, UK
* Robert Bossy, National Research Institute for Agriculture, Food and Environment (INRAE), France
* Marco Naguib, Interdisciplinary Laboratory on Numerical Sciences (LISN), France
===Sponsor===

We are pleased to announce that the Chen Institute is co-organizing the BioNLP 2025 Workshop. Founded in 2016 by Tianqiao Chen and Chrissy Luo, the Chen Institute is driven by a bold vision to improve the human experience by understanding how our brains perceive, learn, and interact with the world. Their global platform includes the Tianqiao and Chrissy Chen Institute for Neuroscience at Caltech, the Tianqiao Chen Institute for Translational Research in Shanghai, the Chen Frontier Lab for Applied Neurotechnology, and the Chen Frontier Lab for AI and Mental Health. The Chen Scholars program supports early- to mid-career scientists, and the recently launched Chen Institute and Science Prize for AI Accelerated Research highlights their deep commitment to innovation.

At this year's BioNLP Workshop, the Chen Institute is interested in exploring how artificial intelligence can accelerate the pace of scientific discovery. We believe there are vast, untapped opportunities to make groundbreaking advances by leveraging the power of AI. The hope is that this meeting will serve as the beginning of an ongoing dialogue, focused on new developments, transformative successes, and emerging thinking at the intersection of AI and science. Through this collaboration, the Chen Institute aims to identify and support promising approaches with the potential to meaningfully change the world.
===Workshop Program===

Friday, August 1, 2025

*08:40 - 08:50  <b>Opening remarks</b>

*08:50 - 10:30  <b>Session 1: Foundational tasks</b>

*08:50 - 09:10  Accelerating Cross-Encoders in Biomedical Entity Linking, Javier Sanz-Cruzado and Jake Lever, University of Glasgow
*09:10 - 09:30  Beyond Citations: Integrating Finding-Based Relations for Improved Biomedical Article Representations, Yuan Liang, Massimo Poesio, Roonak Rezvani, Queen Mary University of London, University of Utrecht, Recursion
*09:30 - 09:50  MedSummRAG: Domain-Specific Retrieval for Medical Summarization, Guanting Luo and Yuki Arase, The University of Osaka, Institute of Science Tokyo
*09:50 - 10:10  Advancing Biomedical Claim Verification by Using Large Language Models with Better Structured Prompting Strategies, Siting Liang and Daniel Sonntag, German Research Center for Artificial Intelligence
*10:10 - 10:30  Questioning Our Questions: How Well Do Medical QA Benchmarks Evaluate Clinical Capabilities of Language Models? Siun Kim and Hyung-Jin Yoon, Seoul National University Hospital, Biomedical Engineering, Seoul National University College of Medicine

*10:30 - 11:00  <b>Coffee Break</b>

*11:00 - 12:30  <b>Session 2: Clinical NLP</b>

*11:00 - 11:20  A Retrieval-Based Approach to Medical Procedure Matching in Romanian, Andrei Niculae, Adrian Cosma, Emilian Radoi, National University of Science and Technology Politehnica Bucharest
*11:20 - 11:40  Error Detection in Medical Note through Multi Agent Debate, Abdine L Maiga, Anoop Shah, Emine Yilmaz, University College London, Amazon
*11:40 - 12:00  Converting Annotated Clinical Cases into Structured Case Report Forms, Pietro Ferrazzi, Alberto Lavelli, Bernardo Magnini, University of Padova, FBK

*12:00 - 12:30  <b>Invited Talk</b> -- Wojciech Kusa: Incorporating Changes in Review Outcomes in the Evaluation of Systematic Review Automation

*12:30 - 14:00  <b>Lunch</b>

*14:00 - 15:30  <b>Session 3: Shared Tasks</b>

*14:00 - 14:15  Overview of the BioLaySumm 2025 Shared Task on Lay Summarization of Biomedical Research Articles and Radiology Reports, Chenghao Xiao, Kun Zhao, Xiao Wang, Siwei Wu, Sixing Yan, Tomas Goldsack, Sophia Ananiadou, Noura Al Moubayed, Liang Zhan, William K. Cheung, Chenghua Lin, Durham University, University of Pittsburgh, University of Manchester, Hong Kong Baptist University, University of Sheffield
*14:15 - 14:20  Poster boaster: AEHRC at BioLaySumm 2025: Leveraging T5 for Lay Summarisation of Radiology Reports. Wenjun Zhang, Shekhar S. Chandra, Bevan Koopman, Jason Dowling and Aaron Nicolson
*14:20 - 14:25  Poster boaster: Team SXZ at BioLaySumm2025: Combining Section-Wise Summarization, K-Shot LLM Prompting, BioBART, and RL Fine-Tuning for Biomedical Lay Summaries. Pengcheng Xu, Sicheng Shen, Jieli Zhou and Hongyi Xin
*14:25 - 14:40  SMAFIRA Shared Task at the BioNLP'2025 Workshop: Assessing the Similarity of the Research Goal, Mariana Neves, Iva Sovadinova, Susanne Fieberg, Celine Heinl, Diana Rubel, Gilbert Schönfelder, Bettina Bert, German Federal Institute for Risk Assessment, Masaryk University
*14:40 - 14:55  Overview of the ClinIQLink 2025 Shared Task on Medical Question-Answering, Brandon C Colelough, Davis Bartels, Dina Demner-Fushman, National Library of Medicine
*14:55 - 15:00  Poster boaster: VeReaFine: Iterative Verification Reasoning Refinement RAG for Hallucination-Resistant on Open-Ended Clinical QA. Pakawat Phasook, Rapepong Pitijaroonpong, Jiramet Kinchagawat, Amrest Chinkamol, Tossaporn Saengja, Kiartnarin Udomlapsakul, Jitkapat Sawatphol and Piyalitt Ittichaiwong
*15:00 - 15:15  Overview of the ArchEHR-QA 2025 Shared Task on Grounded Question Answering from Electronic Health Records, Sarvesh Soni, Soumya Gayen, Dina Demner-Fushman, National Library of Medicine
*15:15 - 15:20  Poster boaster: ArgHiTZ at ArchEHR-QA 2025: A Two-Step Divide and Conquer Approach to Patient Question Answering for Top Factuality. Adrian Cuadron Cortes, Aimar Sagasti, Maitane Urruela, Iker De la Iglesia, Ane García Domingo-Aldama, Aitziber Atutxa Salazar, Josu Goikoetxea and Ander Barrena
*15:20 - 15:25  Poster boaster: Neural at ArchEHR-QA 2025: Agentic Prompt Optimization for Evidence-Grounded Clinical Question Answering. Sai Prasanna Teja Reddy Bogireddy, Abrar Majeedi, Viswanath Reddy Gajjala, Zhuoyan Xu, Siddhant Rai and Vaishnav Potlapalli

*15:30 - 16:00  <b>Coffee Break</b>

*16:00 - 18:00  <b>Poster Sessions</b> (online, onsite, workshop and shared tasks. Note: Shared Task papers are listed in Volume 2)

* Improving Barrett's Oesophagus Surveillance Scheduling with Large Language Models: A Structured Extraction Approach, Xinyue Zhang, Agathe Zecevic, Sebastian Zeki, Angus Roberts, King's College London, Guy's and St Thomas' NHS Foundation Trust
* Effective Multi-Task Learning for Biomedical Named Entity Recognition, João Ruano, Gonçalo M Correia, Leonor Maria Machado Barreiros, Afonso Mendes, Priberam
* PetEVAL: A veterinary free text electronic health records benchmark, Sean Farrell, Alan Radford, Noura Al Moubayed, Peter-John Mäntylä Noble, Durham University, University of Liverpool
* Can Large Language Models Classify and Generate Antimicrobial Resistance Genes? Hyunwoo Yoo, Haebin Shin, Gail Rosen, Drexel University, KAIST AI
* Overcoming Data Scarcity in Named Entity Recognition: Synthetic Data Generation with Large Language Models. An Dao, Hiroki Teranishi, Yuji Matsumoto, Florian Boudin, Akiko Aizawa, The University of Tokyo, RIKEN Center for Advanced Intelligence Project, Nantes University, National Institute of Informatics
* Fine-tuning LLMs to Extract Epilepsy Seizure Frequency Data from Health Records, Ben Holgate, Joe Davies, Shichao Fang, Joel S. Winston, James T. Teo, Mark P. Richardson, King's College London
* Transformer-Based Medical Statement Classification in Doctor-Patient Dialogues, Farnod Bahrololloomi, Johannes Luderschmidt, Biying Fu, RheinMain University of Applied Sciences
* PreClinIE: An Annotated Corpus for Information Extraction in Preclinical Studies, Simona Emilova Doneva, Hanna Hubarava, Pia Andrea Härvelid, Wolfgang Emanuel Zürrer, Julia V Bugajska, Bernard Friedrich Hild, David Brüschweiler, Gerold Schneider, Tilia Ellendorff, Benjamin Victor Ineichen, University of Zurich
* QoLAS: A Reddit Corpus of Health-Related Quality of Life Aspects of Mental Disorders, Lynn Greschner, Amelie Wührl, Roman Klinger, University of Bamberg, University of Stuttgart
* Gender-Neutral Large Language Models for Medical Applications: Reducing Bias in PubMed Abstracts, Elizabeth Schaefer and Kirk Roberts, Yale University, University of Texas Health Science Center at Houston
* LLMs as Medical Safety Judges: Evaluating Alignment with Human Annotation in Patient-Facing QA, Yella Leonie Diekmann, Chase M Fensore, Rodrigo M Carrillo-Larco, Eduard R Castejon Rosales, Sakshi Shiromani, Rima Pai, Megha Shah, Joyce C Ho, Emory University
* AdaBioBERT: Adaptive Token Sequence Learning for Biomedical Named Entity Recognition, Sumit Kumar and Tanmay Basu, Indian Institute of Science Education and Research Bhopal
* Enhancing Stress Detection on Social Media Through Multi-Modal Fusion of Text and Synthesized Visuals, Efstathia Soufleri and Sophia Ananiadou, Athena RC, University of Manchester
* MuCoS: Efficient Drug–Target Discovery via Multi-Context-Aware Sampling in Knowledge Graphs, Haji Gul, Abdul Ghani Naim, Ajaz Ahmad Bhat, UBD
* Enhancing Antimicrobial Drug Resistance Classification by Integrating Sequence-Based and Text-Based Representations, Hyunwoo Yoo, Bahrad Sokhansanj, James R Brown, Drexel University
* Effect of Multilingual and Domain-adapted Continual Pre-training on Few-shot Promptability, Ken Yano and Makoto Miwa, The National Institute of Advanced Industrial Science and Technology, Toyota Technological Institute
* Understanding the Impact of Confidence in Retrieval Augmented Generation: A Case Study in the Medical Domain, Shintaro Ozaki, Yuta Kato, Siyuan Feng, Masayo Tomita, Kazuki Hayashi, Wataru Hashimoto, Ryoma Obara, Masafumi Oyamada, Katsuhiko Hayashi, Hidetaka Kamigaito, Taro Watanabe, Nara Institute of Science and Technology, The University of Tokyo, NEC
* Prompting Large Language Models for Italian Clinical Reports: A Benchmark Study, Livia Lilli, Carlotta Masciocchi, Antonio Marchetti, Giovanni Arcuri, Stefano Patarnello, Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Catholic University of the Sacred Heart, Rome, Italy
* CaseReportCollective: A Large-Scale LLM-Extracted Dataset for Structured Medical Case Reports, Xiao Yu Cindy Zhang, Melissa Fong, Wyeth Wasserman, Jian Zhu, University of British Columbia
* RadQA-DPO: A Radiology Question Answering System with Encoder-Decoder Models Enhanced by Direct Preference Optimization, Md Sultan Al Nahian and Ramakanth Kavuluru, University of Kentucky
* Benchmarking zero-shot biomedical relation triplet extraction across language model architectures, Frederik Steensgaard Gade, Ole Lund, Marie Lisandra Zepeda Mendoza, Technical University of Denmark, Novo Nordisk Research Centre Oxford
* Virtual CRISPR: Can LLMs Predict CRISPR Screen Results? Steven Song, Abdalla Abdrabou, Asmita Dabholkar, Kastan Day, Pavan Dharmoju, Jason Perera, Volodymyr Kindratenko, Aly A Khan, University of Chicago, Chan Zuckerberg Biohub Chicago, University of Illinois Urbana-Champaign, Northwestern University

<b><font size="5">  BioNLP-ST 2025 Posters</font></b>
 +
  ArgHiTZ at ArchEHR-QA 2025: A Two-Step Divide and Conquer Approach to Patient Question Answering for Top Factuality. Adrian Cuadron Cortes, Aimar Sagasti, Maitane Urruela, Iker De la Iglesia, Ane García Domingo-Aldama, Aitziber Atutxa Salazar, Josu Goikoetxea, Ander Barrena
 +
 +
  UNIBUC-SD at ArchEHR-QA 2025: Prompting Our Way to Clinical QA with Multi-Model Ensembling. Dragos Dumitru Ghinea and Ștefania Rîncu
 +
 +
  Loyola at ArchEHR-QA 2025: Exploring Unsupervised Attribution of Generated Text: Attention and Clustering-Based Methods. Rohan Sethi, Timothy Miller, Majid Afshar, Dmitriy Dligach
 +
 +
  CUNI-a at ArchEHR-QA 2025: Do we need Giant LLMs for Clinical QA? Vojtech Lanz and Pavel Pecina
 +
 +
  WisPerMed at ArchEHR-QA 2025: A Modular, Relevance-First Approach for Grounded Question Answering on Eletronic Health Records. Jan-Henning Büns, Hendrik Damm, Tabea Margareta Grace Pakull, Felix Nensa, Elisabeth Livingstone
 +
 +
  heiDS at ArchEHR-QA 2025: From Fixed-k to Query-dependent-k for Retrieval Augmented Generation. Ashish Chouhan and Michael Gertz
 +
 +
  UniBuc-SB at ArchEHR-QA 2025: A Resource-Constrained Pipeline for Relevance Classification and Grounded Answer Synthesis. Sebastian Balmus, Dura Alexandru Bogdan, Ana Sabina Uban
 +
 +
  KR Labs at ArchEHR-QA 2025: A Verbatim Approach for Evidence-Based Question Answering. Adam Kovacs, Paul Schmitt, Gabor Recski
 +
 +
  LAILab at ArchEHR-QA 2025: Test-time scaling for evidence selection in grounded question answering from electronic health records. Tuan Dung Le, Thanh Duong, Shohreh Haddadan, Behzad Jazayeri, Brandon Manley, Thanh Thieu
 +
 +
  UTSA-NLP at ArchEHR-QA 2025: Improving EHR Question Answering via Self-Consistency Prompting. Sara Shields-Menard, Zach Reimers, Joshua Gardner, David Perry, Anthony Rios
 +
 +
  UTSamuel at ArchEHR-QA 2025: A Clinical Question Answering System for Responding to Patient Portal Messages Using Generative AI. Samuel M Reason, Liwei Wang, Hongfang Liu, Ming Huang
 +
 +
  LAMAR at ArchEHR-QA 2025: Clinically Aligned LLM-Generated Few-Shot Learning for EHR-Grounded Patient Question Answering. Seksan Yoadsanit, Nopporn Lekuthai, Watcharitpol Sermsrisuwan, Titipat Achakulvisut
 +
 +
  Neural at ArchEHR-QA 2025: Agentic Prompt Optimization for Evidence-Grounded Clinical Question Answering. Sai Prasanna Teja Reddy Bogireddy, Abrar Majeedi, Viswanath Reddy Gajjala, Zhuoyan Xu, Siddhant Rai, Vaishnav Potlapalli
 +
 +
  UIC at ArchEHR-QA 2025: Tri-Step Pipeline for Reliable Grounded Medical Question Answering. Mohammad Arvan, Anuj Gautam, Mohan Zalake, Karl M. Kochendorfer
 +
 +
  DMIS Lab at ArchEHR-QA 2025: Evidence-Grounded Answer Generation for EHR-based QA via a Multi-Agent Framework. Hyeon Hwang, Hyeongsoon Hwang, JongMyung Jung, Jaehoon Yun, Minju Song, Yein Park, Dain Kim, Taewhoo Lee, Jiwoong Sohn, Chanwoong Yoon, Sihyeon Park, Jiwoo Lee, Heechul Yang, Jaewoo Kang
 +
 +
  CogStack-KCL-UCL at ArchEHR-QA 2025: Investigating Hybrid LLM Approaches for Grounded Clinical Question Answering. Shubham Agarwal, Thomas Searle, Kawsar Noor, Richard Dobson
 +
 +
  SzegedAI at ArchEHR-QA 2025: Combining LLMs with traditional methods for grounded question answering. Soma Bálint Nagy, Bálint Nyerges, Zsombor Mátyás Kispéter, Gábor Tóth, András Tamás Szlúka, Gábor Kőrösi, Zsolt Szántó, Richárd Farkas
 +
 +
  LIMICS at ArchEHR-QA 2025: Prompting LLMs Beats Fine-Tuned Embeddings. Adam REMAKI, Armand Violle, Vikram Natraj, Étienne Guével, Akram Redjdal
 +
 +
  razreshili at ArchEHR-QA 2025: Contrastive Fine-Tuning for Retrieval-Augmented Biomedical QA. Arina Zemchyk
 +
 +
  DKITNLP at ArchEHR-QA 2025: A Retrieval Augmented LLM Pipeline for Evidence-Based Patient Question Answering. Provia Kadusabe, Abhishek Kaushik, Fiona Lawless
 +
 +
  AEHRC at BioLaySumm 2025: Leveraging T5 for Lay Summarisation of Radiology Reports. Wenjun Zhang, Shekhar S Chandra, Bevan Koopman, Jason Dowling, Aaron Nicolson
 +
 +
  MetninOzU at BioLaySumm2025: Text Summarization with Reverse Data Augmentation and Injecting Salient Sentences. Egecan Evgin, Ilknur Karadeniz, Olcay Taner Yıldız
 +
 +
  Shared Task at Biolaysumm2025 : Extract then summarize approach Augmented with UMLS based Definition Retrieval for Lay Summary generation.. Aaradhya Gupta and Parameswari Krishnamurthy
 +
 +
  RainCityNLP at BioLaySumm2025: Extract then Summarize at Home. Jen Wilson, Michael Pollack, Rachel Edwards, Avery Bellamy, Helen Salgi
 +
 +
  TLPIQ at BioLaySumm: Hide and Seq, a FLAN-T5 Model for Biomedical Summarization. Melody Bechler, Carly Crowther, Emily Luedke, Natasha Schimka, Ibrahim Sharaf
 +
 +
  LaySummX at BioLaySumm: Retrieval-Augmented Fine-Tuning for Biomedical Lay Summarization Using Abstracts and Retrieved Full-Text Context. Fan Lin and Dezhi Yu
 +
 +
  5cNLP at BioLaySumm2025: Prompts, Retrieval, and Multimodal Fusion. Juan Antonio Lossio-Ventura, Callum Chan, Arshitha Basavaraj, Hugo Alatrista-Salas, Francisco Pereira, Diana Inkpen
 +
 +
  MIRAGES at BioLaySumm2025: The Impact of Search Terms and Data Curation for Biomedical Lay Summarization. Benjamin Pong, Ju-hui Chen, Jonathan Jiang, Abimael Hernandez Jimenez, Melody Vahadi
 +
 +
  SUWMIT at BioLaySumm2025: Instruction-based Summarization with Contrastive Decoding. Priyam Basu, Jose Cols, Daniel Jarvis, Yongsin Park, Daniel Rodabaugh
 +
 +
  BDA-UC3M @ BioLaySumm: Efficient Lay Summarization with Small-Scale SoTA LLMs. Ilyass Ramzi and Isabel Segura Bedmar
 +
 +
  KHU_LDI at BioLaySumm2025: Fine-tuning and Refinement for Lay Radiology Report Generation. Nur Alya Dania binti Moriazi and Mujeen Sung
 +
 +
  CUTN_Bio at BioLaySumm: Multi-Task Prompt Tuning with External Knowledge and Readability adaptation for Layman Summarization. Bhuvaneswari Sivagnanam, Rivo Krishnu C H, Princi Chauhan, Saranya Rajiakodi
 +
 +
  Team SXZ at BioLaySumm2025: Team XSZ at BioLaySumm2025: Section-Wise Summarization, Retrieval-Augmented LLM, and Reinforcement Learning Fine-Tuning for Lay Summaries. Pengcheng Xu, Sicheng Shen, Jieli Zhou, Hongyi Xin
 +
 +
  VeReaFine: Iterative Verification Reasoning Refinement RAG for Hallucination-Resistant on Open-Ended Clinical QA. Pakawat Phasook, Rapepong Pitijaroonpong, Jiramet Kinchagawat, Amrest Chinkamol, Tossaporn Saengja, Kiartnarin Udomlapsakul, Jitkapat Sawatphol, Piyalitt Ittichaiwong
  
 
===SUBMISSION INSTRUCTIONS===

<font size="4"> Submission site for the workshop: https://softconf.com/acl2025/BioNLP2025 </font>

<font size="4"> Submission site for Shared Tasks: https://softconf.com/acl2025/BioNLP2025-ST </font>

Please follow these formatting guidelines: https://github.com/acl-org/acl-style-files
  
 
===SHARED TASKS===

====SMAFIRA====
The SMAFIRA project supports finding alternative methods to animal experiments. The organizers have released the SMAFIRA Web tool, which allows researchers to search for methods alternative to animal experiments. The input to the tool is the PubMed identifier (PMID) of a publication that represents the animal experiment for which one wants to find an alternative method. The tool retrieves up to 200 similar articles from PubMed and presents them as a list of results. The task is to validate and annotate the top 10 similar articles, either automatically, with any system of the participants' choice, or manually using the SMAFIRA tool. See details at https://smafira-bf3r.github.io/smafira-st/
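The page does not describe SMAFIRA's retrieval internals. As a hypothetical sketch of the same PMID-to-similar-articles step (not necessarily what SMAFIRA does), PubMed's own "similar articles" list can be requested from NCBI's public E-utilities <code>elink</code> endpoint via the <code>pubmed_pubmed</code> link set; the snippet below only builds the request URL, and the PMID is made up:

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"

def similar_articles_url(pmid: str, retmode: str = "json") -> str:
    """Build an E-utilities `elink` request asking PubMed for articles
    similar to `pmid` (the `pubmed_pubmed` neighbor link set)."""
    params = {
        "dbfrom": "pubmed",            # source database
        "db": "pubmed",                # target database
        "id": pmid,                    # the input publication
        "linkname": "pubmed_pubmed",   # PubMed's "similar articles" links
        "retmode": retmode,
    }
    return f"{EUTILS}?{urlencode(params)}"

# Hypothetical PMID; fetching this URL (e.g., with urllib.request) returns
# a ranked list of similar-article PMIDs that a tool could then annotate.
print(similar_articles_url("12345678"))
```

A participant system would then score or annotate the top-ranked PMIDs, which is the validation step the task asks for.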
  
 
====ClinIQLink 2025 - LLM Lie Detector Test====

====ArchEHR-QA 2025====
Participants will automatically generate answers to patients' health-related questions that are grounded in the evidence from the patients' clinical notes. The dataset consists of hand-curated, realistic patient questions (submitted through a patient portal) and their corresponding clinician-rewritten versions (crafted to assist in formulating the responses). The task is to construct coherent answers to the input questions that use, and are grounded in, the provided clinical note excerpts. See details at https://archehr-qa.github.io/

====BioLaySumm 2025====
This is the 3rd iteration of BioLaySumm, following the success of the 2nd edition of the task at BioNLP 2024, which attracted more than 200 submissions across 53 teams, and the 1st edition at BioNLP 2023, which attracted 56 submissions across 20 teams. This edition builds on last year's task by introducing a new task, radiology report generation in layman's terms, extending the shared task to a new domain and to multimodal inputs. See details at https://biolaysumm.org/

Latest revision as of 12:23, 10 July 2025



BIONLP 2025 and Shared Tasks @ ACL 2025

The 24th BioNLP workshop associated with the ACL SIGBIOMED special interest group is co-located with ACL 2025


* Hyunwoo Yoo, Drexel University, USA
* Kai Zhang, Worcester Polytechnic Institute, MA, USA
* Xinyue Zhang, King's College London, UK
* Xiao Yu Cindy Zhang, University of British Columbia, Canada
* Jingqing Zhang,  Imperial College London, UK
* Angelo Ziletti, Bayer, Germany
* Ayah Zirikly, Johns Hopkins, USA
* Pierre Zweigenbaum, LIMSI - CNRS, France

==== Secondary Reviewers ====

* Joseph Akinyemi, University of York, UK
* Robert Bossy, National Research Institute for Agriculture, Food and Environment (INRAE), France
* Marco Naguib, Interdisciplinary Laboratory on Numerical Sciences (LISN), France

We are pleased to announce that the Chen Institute is co-organizing the BioNLP 2025 Workshop. Founded in 2016 by Tianqiao Chen and Chrissy Luo, the Chen Institute is driven by a bold vision to improve the human experience by understanding how our brains perceive, learn, and interact with the world. Their global platform includes the Tianqiao and Chrissy Chen Institute for Neuroscience at Caltech, the Tianqiao Chen Institute for Translational Research in Shanghai, the Chen Frontier Lab for Applied Neurotechnology, and the Chen Frontier Lab for AI and Mental Health. The Chen Scholars program supports early- to mid-career scientists, and the recently launched Chen Institute and Science Prize for AI Accelerated Research highlights their deep commitment to innovation.

At this year’s BioNLP Workshop, the Chen Institute is interested in exploring how artificial intelligence can accelerate the pace of scientific discovery. We believe there are vast, untapped opportunities to make groundbreaking advances by leveraging the power of AI. The hope is that this meeting will serve as the beginning of an ongoing dialogue—focused on new developments, transformative successes, and emerging thinking at the intersection of AI and science. Through this collaboration, the Chen Institute aims to identify and support promising approaches with the potential to meaningfully change the world.


=== Workshop Program ===

Friday, August 1, 2025

  • 08:40 - 08:50 Opening remarks
  • 08:50 - 10:30 Session 1: Foundational tasks
  • 08:50 - 09:10 Accelerating Cross-Encoders in Biomedical Entity Linking, Javier Sanz-Cruzado and Jake Lever, University of Glasgow
  • 09:10 - 09:30 Beyond Citations: Integrating Finding-Based Relations for Improved Biomedical Article Representations, Yuan Liang, Massimo Poesio, Roonak Rezvani, Queen Mary University of London, University of Utrecht, Recursion
  • 09:30 - 09:50 MedSummRAG: Domain-Specific Retrieval for Medical Summarization, Guanting Luo and Yuki Arase, The University of Osaka, Institute of Science Tokyo
  • 09:50 - 10:10 Advancing Biomedical Claim Verification by Using Large Language Models with Better Structured Prompting Strategies, Siting Liang and Daniel Sonntag, German Research Center for Artificial Intelligence
  • 10:10 - 10:30 Questioning Our Questions: How Well Do Medical QA Benchmarks Evaluate Clinical Capabilities of Language Models? Siun Kim and Hyung-Jin Yoon, Seoul National University Hospital, Biomedical Engineering, Seoul National University College of Medicine
  • 10:30 - 11:00 Coffee Break
  • 11:00 - 12:30 Session 2: Clinical NLP
  • 11:00 - 11:20 A Retrieval-Based Approach to Medical Procedure Matching in Romanian, Andrei Niculae, Adrian Cosma, Emilian Radoi, National University of Science and Technology Politehnica Bucharest
  • 11:20 - 11:40 Error Detection in Medical Note through Multi Agent Debate, Abdine L Maiga, Anoop Shah, Emine Yilmaz, University College London, Amazon
  • 11:40 - 12:00 Converting Annotated Clinical Cases into Structured Case Report Forms, Pietro Ferrazzi, Alberto Lavelli, Bernardo Magnini, University of Padova, FBK
  • 12:00 - 12:30 Invited Talk -- Wojciech Kusa: Incorporating Changes in Review Outcomes in the Evaluation of Systematic Review Automation
  • 12:30 - 14:00 Lunch
  • 14:00 - 15:30 Session 3: Shared Tasks
  • 14:00 - 14:15 Overview of the BioLaySumm 2025 Shared Task on Lay Summarization of Biomedical Research Articles and Radiology Reports, Chenghao Xiao, Kun Zhao, Xiao Wang, Siwei Wu, Sixing Yan, Tomas Goldsack, Sophia Ananiadou, Noura Al Moubayed, Liang Zhan, William K. Cheung, Chenghua Lin, Durham University, University of Pittsburgh, University of Manchester, Hong Kong Baptist University, University of Sheffield
  • 14:15 - 14:20 Poster boaster: AEHRC at BioLaySumm 2025: Leveraging T5 for Lay Summarisation of Radiology Reports. Wenjun Zhang, Shekhar S. Chandra, Bevan Koopman, Jason Dowling and Aaron Nicolson
  • 14:20 - 14:25 Poster boaster: Team SXZ at BioLaySumm2025: Combining Section‐Wise Summarization, K‐Shot LLM Prompting, BioBART, and RL Fine‐Tuning for Biomedical Lay Summaries. Pengcheng Xu, Sicheng Shen, Jieli Zhou and Hongyi Xin
  • 14:25 - 14:40 SMAFIRA Shared Task at the BioNLP'2025 Workshop: Assessing the Similarity of the Research Goal, Mariana Neves, Iva Sovadinova, Susanne Fieberg, Celine Heinl, Diana Rubel, Gilbert Schönfelder, Bettina Bert, German Federal Institute for Risk Assessment, Masaryk University
  • 14:40 - 14:55 Overview of the ClinIQLink 2025 Shared Task on Medical Question-Answering, Brandon C Colelough, Davis Bartels, Dina Demner-Fushman, National Library of Medicine
  • 14:55 - 15:00 Poster boaster: VeReaFine: Iterative Verification Reasoning Refinement RAG for Hallucination-Resistant on Open-Ended Clinical QA. Pakawat Phasook, Rapepong Pitijaroonpong, Jiramet Kinchagawat, Amrest Chinkamol, Tossaporn Saengja, Kiartnarin Udomlapsakul, Jitkapat Sawatphol and Piyalitt Ittichaiwong
  • 15:00 - 15:15 Overview of the ArchEHR-QA 2025 Shared Task on Grounded Question Answering from Electronic Health Records, Sarvesh Soni, Soumya Gayen, Dina Demner-Fushman, National Library of Medicine
  • 15:15 - 15:20 Poster boaster: ArgHiTZ at ArchEHR-QA 2025: A Two-Step Divide and Conquer Approach to Patient Question Answering for Top Factuality. Adrian Cuadron Cortes, Aimar Sagasti, Maitane Urruela, Iker De la Iglesia, Ane García Domingo-Aldama, Aitziber Atutxa Salazar, Josu Goikoetxea and Ander Barrena
  • 15:20 - 15:25 Poster boaster: Neural at ArchEHR-QA 2025: Agentic Prompt Optimization for Evidence-Grounded Clinical Question Answering. Sai Prasanna Teja Reddy Bogireddy, Abrar Majeedi, Viswanath Reddy Gajjala, Zhuoyan Xu, Siddhant Rai and Vaishnav Potlapalli
  • 15:30 - 16:00 Coffee Break
  • 16:00 - 18:00 Poster Sessions (online and onsite; workshop and shared task posters. Note: Shared Task papers are listed in Volume 2)
 	* Improving Barrett's Oesophagus Surveillance Scheduling with Large Language Models: A Structured Extraction Approach, Xinyue Zhang, Agathe Zecevic, Sebastian Zeki, Angus Roberts, King's College London, Guy's and St Thomas' NHS Foundation Trust
 	* Effective Multi-Task Learning for Biomedical Named Entity Recognition, João Ruano, Gonçalo M Correia, Leonor Maria Machado Barreiros, Afonso Mendes, Priberam
 	* PetEVAL: A veterinary free text electronic health records benchmark, Sean Farrell, Alan Radford, Noura Al Moubayed, Peter-John Mäntylä Noble, Durham University, University of Liverpool
 	* Can Large Language Models Classify and Generate Antimicrobial Resistance Genes? Hyunwoo Yoo, Haebin Shin, Gail Rosen, Drexel University, KAIST AI
 	* Overcoming Data Scarcity in Named Entity Recognition: Synthetic Data Generation with Large Language Models. An Dao, Hiroki Teranishi, Yuji Matsumoto, Florian Boudin, Akiko Aizawa, The University of Tokyo, RIKEN Center for Advanced Intelligence Project, Nantes University, National Institute of Informatics
 	* Fine-tuning LLMs to Extract Epilepsy Seizure Frequency Data from Health Records, Ben Holgate, Joe Davies, Shichao Fang, Joel S. Winston, James T. Teo, Mark P. Richardson, King's College London	
 	* Transformer-Based Medical Statement Classification in Doctor-Patient Dialogues, Farnod Bahrololloomi, Johannes Luderschmidt, Biying Fu, RheinMain University of Applied Sciences	
 	* PreClinIE: An Annotated Corpus for Information Extraction in Preclinical Studies, Simona Emilova Doneva, Hanna Hubarava, Pia Andrea Härvelid, Wolfgang Emanuel Zürrer, Julia V Bugajska, Bernard Friedrich Hild, David Brüschweiler, Gerold Schneider, Tilia Ellendorff, Benjamin Victor Ineichen, University of Zurich		
 	* QoLAS: A Reddit Corpus of Health-Related Quality of Life Aspects of Mental Disorders, Lynn Greschner, Amelie Wührl, Roman Klinger, University of Bamberg, University of Stuttgart	
 	* Gender-Neutral Large Language Models for Medical Applications: Reducing Bias in PubMed Abstracts, Elizabeth Schaefer and Kirk Roberts, Yale University, University of Texas Health Science Center at Houston	
 	* LLMs as Medical Safety Judges: Evaluating Alignment with Human Annotation in Patient-Facing QA, Yella Leonie Diekmann, Chase M Fensore, Rodrigo M Carrillo-Larco, Eduard R Castejon Rosales, Sakshi Shiromani, Rima Pai, Megha Shah, Joyce C Ho, Emory University	
 	* AdaBioBERT: Adaptive Token Sequence Learning for Biomedical Named Entity Recognition, Sumit Kumar and Tanmay Basu, Indian Institute of Science Education and Research Bhopal
 	* Enhancing Stress Detection on Social Media Through Multi-Modal Fusion of Text and Synthesized Visuals, Efstathia Soufleri and Sophia Ananiadou, Athena RC, University of Manchester
 	* MuCoS: Efficient Drug–Target Discovery via Multi-Context-Aware Sampling in Knowledge Graphs, Haji Gul, Abdul Ghani Naim, Ajaz Ahmad Bhat, UBD
 	* Enhancing Antimicrobial Drug Resistance Classification by Integrating Sequence-Based and Text-Based Representations, Hyunwoo Yoo, Bahrad Sokhansanj, James R Brown, Drexel University	
 	* Effect of Multilingual and Domain-adapted Continual Pre-training on Few-shot Promptability, Ken Yano and Makoto Miwa, The National Institute of Advanced Industrial Science and Technology, Toyota Technological Institute
 	* Understanding the Impact of Confidence in Retrieval Augmented Generation: A Case Study in the Medical Domain, Shintaro Ozaki, Yuta Kato, Siyuan Feng, Masayo Tomita, Kazuki Hayashi, Wataru Hashimoto, Ryoma Obara, Masafumi Oyamada, Katsuhiko Hayashi, Hidetaka Kamigaito, Taro Watanabe, Nara Institute of Science and Technology, The University of Tokyo, NEC
 	* Prompting Large Language Models for Italian Clinical Reports: A Benchmark Study, Livia Lilli, Carlotta Masciocchi, Antonio Marchetti, Giovanni Arcuri, Stefano Patarnello, Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Catholic University of the Sacred Heart, Rome, Italy	
 	* CaseReportCollective: A Large-Scale LLM-Extracted Dataset for Structured Medical Case Reports, Xiao Yu Cindy Zhang, Melissa Fong, Wyeth Wasserman, Jian Zhu, University of British Columbia	
 	* RadQA-DPO: A Radiology Question Answering System with Encoder-Decoder Models Enhanced by Direct Preference Optimization, Md Sultan Al Nahian and Ramakanth Kavuluru, University of Kentucky	
 	* Benchmarking zero-shot biomedical relation triplet extraction across language model architectures, Frederik Steensgaard Gade, Ole Lund, Marie Lisandra Zepeda Mendoza, Technical University of Denmark, Novo Nordisk Research Centre Oxford		
 	* Virtual CRISPR: Can LLMs Predict CRISPR Screen Results? Steven Song, Abdalla Abdrabou, Asmita Dabholkar, Kastan Day, Pavan Dharmoju, Jason Perera, Volodymyr Kindratenko, Aly A Khan, University of Chicago, Chan Zuckerberg Biohub Chicago, University of Illinois Urbana-Champaign, Northwestern University


=== BioNLP-ST 2025 Posters ===

 	ArgHiTZ at ArchEHR-QA 2025: A Two-Step Divide and Conquer Approach to Patient Question Answering for Top Factuality. Adrian Cuadron Cortes, Aimar Sagasti, Maitane Urruela, Iker De la Iglesia, Ane García Domingo-Aldama, Aitziber Atutxa Salazar, Josu Goikoetxea, Ander Barrena		
 	UNIBUC-SD at ArchEHR-QA 2025: Prompting Our Way to Clinical QA with Multi-Model Ensembling. Dragos Dumitru Ghinea and Ștefania Rîncu	
 	Loyola at ArchEHR-QA 2025: Exploring Unsupervised Attribution of Generated Text: Attention and Clustering-Based Methods. Rohan Sethi, Timothy Miller, Majid Afshar, Dmitriy Dligach	
 	CUNI-a at ArchEHR-QA 2025: Do we need Giant LLMs for Clinical QA? Vojtech Lanz and Pavel Pecina	
 	WisPerMed at ArchEHR-QA 2025: A Modular, Relevance-First Approach for Grounded Question Answering on Electronic Health Records. Jan-Henning Büns, Hendrik Damm, Tabea Margareta Grace Pakull, Felix Nensa, Elisabeth Livingstone
 	heiDS at ArchEHR-QA 2025: From Fixed-k to Query-dependent-k for Retrieval Augmented Generation. Ashish Chouhan and Michael Gertz
 	UniBuc-SB at ArchEHR-QA 2025: A Resource-Constrained Pipeline for Relevance Classification and Grounded Answer Synthesis. Sebastian Balmus, Dura Alexandru Bogdan, Ana Sabina Uban
 	KR Labs at ArchEHR-QA 2025: A Verbatim Approach for Evidence-Based Question Answering. Adam Kovacs, Paul Schmitt, Gabor Recski
 	LAILab at ArchEHR-QA 2025: Test-time scaling for evidence selection in grounded question answering from electronic health records. Tuan Dung Le, Thanh Duong, Shohreh Haddadan, Behzad Jazayeri, Brandon Manley, Thanh Thieu
 	UTSA-NLP at ArchEHR-QA 2025: Improving EHR Question Answering via Self-Consistency Prompting. Sara Shields-Menard, Zach Reimers, Joshua Gardner, David Perry, Anthony Rios
 	UTSamuel at ArchEHR-QA 2025: A Clinical Question Answering System for Responding to Patient Portal Messages Using Generative AI. Samuel M Reason, Liwei Wang, Hongfang Liu, Ming Huang
 	LAMAR at ArchEHR-QA 2025: Clinically Aligned LLM-Generated Few-Shot Learning for EHR-Grounded Patient Question Answering. Seksan Yoadsanit, Nopporn Lekuthai, Watcharitpol Sermsrisuwan, Titipat Achakulvisut
 	Neural at ArchEHR-QA 2025: Agentic Prompt Optimization for Evidence-Grounded Clinical Question Answering. Sai Prasanna Teja Reddy Bogireddy, Abrar Majeedi, Viswanath Reddy Gajjala, Zhuoyan Xu, Siddhant Rai, Vaishnav Potlapalli
 	UIC at ArchEHR-QA 2025: Tri-Step Pipeline for Reliable Grounded Medical Question Answering. Mohammad Arvan, Anuj Gautam, Mohan Zalake, Karl M. Kochendorfer
 	DMIS Lab at ArchEHR-QA 2025: Evidence-Grounded Answer Generation for EHR-based QA via a Multi-Agent Framework. Hyeon Hwang, Hyeongsoon Hwang, JongMyung Jung, Jaehoon Yun, Minju Song, Yein Park, Dain Kim, Taewhoo Lee, Jiwoong Sohn, Chanwoong Yoon, Sihyeon Park, Jiwoo Lee, Heechul Yang, Jaewoo Kang
 	CogStack-KCL-UCL at ArchEHR-QA 2025: Investigating Hybrid LLM Approaches for Grounded Clinical Question Answering. Shubham Agarwal, Thomas Searle, Kawsar Noor, Richard Dobson
 	SzegedAI at ArchEHR-QA 2025: Combining LLMs with traditional methods for grounded question answering. Soma Bálint Nagy, Bálint Nyerges, Zsombor Mátyás Kispéter, Gábor Tóth, András Tamás Szlúka, Gábor Kőrösi, Zsolt Szántó, Richárd Farkas
 	LIMICS at ArchEHR-QA 2025: Prompting LLMs Beats Fine-Tuned Embeddings. Adam Remaki, Armand Violle, Vikram Natraj, Étienne Guével, Akram Redjdal
 	razreshili at ArchEHR-QA 2025: Contrastive Fine-Tuning for Retrieval-Augmented Biomedical QA. Arina Zemchyk
 	DKITNLP at ArchEHR-QA 2025: A Retrieval Augmented LLM Pipeline for Evidence-Based Patient Question Answering. Provia Kadusabe, Abhishek Kaushik, Fiona Lawless
 	AEHRC at BioLaySumm 2025: Leveraging T5 for Lay Summarisation of Radiology Reports. Wenjun Zhang, Shekhar S Chandra, Bevan Koopman, Jason Dowling, Aaron Nicolson
 	MetninOzU at BioLaySumm2025: Text Summarization with Reverse Data Augmentation and Injecting Salient Sentences. Egecan Evgin, Ilknur Karadeniz, Olcay Taner Yıldız
 	Shared Task at Biolaysumm2025: Extract then summarize approach Augmented with UMLS based Definition Retrieval for Lay Summary generation. Aaradhya Gupta and Parameswari Krishnamurthy
 	RainCityNLP at BioLaySumm2025: Extract then Summarize at Home. Jen Wilson, Michael Pollack, Rachel Edwards, Avery Bellamy, Helen Salgi
 	TLPIQ at BioLaySumm: Hide and Seq, a FLAN-T5 Model for Biomedical Summarization. Melody Bechler, Carly Crowther, Emily Luedke, Natasha Schimka, Ibrahim Sharaf
 	LaySummX at BioLaySumm: Retrieval-Augmented Fine-Tuning for Biomedical Lay Summarization Using Abstracts and Retrieved Full-Text Context. Fan Lin and Dezhi Yu
 	5cNLP at BioLaySumm2025: Prompts, Retrieval, and Multimodal Fusion. Juan Antonio Lossio-Ventura, Callum Chan, Arshitha Basavaraj, Hugo Alatrista-Salas, Francisco Pereira, Diana Inkpen
 	MIRAGES at BioLaySumm2025: The Impact of Search Terms and Data Curation for Biomedical Lay Summarization. Benjamin Pong, Ju-hui Chen, Jonathan Jiang, Abimael Hernandez Jimenez, Melody Vahadi
 	SUWMIT at BioLaySumm2025: Instruction-based Summarization with Contrastive Decoding. Priyam Basu, Jose Cols, Daniel Jarvis, Yongsin Park, Daniel Rodabaugh
 	BDA-UC3M @ BioLaySumm: Efficient Lay Summarization with Small-Scale SoTA LLMs. Ilyass Ramzi and Isabel Segura Bedmar
 	KHU_LDI at BioLaySumm2025: Fine-tuning and Refinement for Lay Radiology Report Generation. Nur Alya Dania binti Moriazi and Mujeen Sung
 	CUTN_Bio at BioLaySumm: Multi-Task Prompt Tuning with External Knowledge and Readability adaptation for Layman Summarization. Bhuvaneswari Sivagnanam, Rivo Krishnu C H, Princi Chauhan, Saranya Rajiakodi
 	Team XSZ at BioLaySumm2025: Section-Wise Summarization, Retrieval-Augmented LLM, and Reinforcement Learning Fine-Tuning for Lay Summaries. Pengcheng Xu, Sicheng Shen, Jieli Zhou, Hongyi Xin
 	VeReaFine: Iterative Verification Reasoning Refinement RAG for Hallucination-Resistant on Open-Ended Clinical QA. Pakawat Phasook, Rapepong Pitijaroonpong, Jiramet Kinchagawat, Amrest Chinkamol, Tossaporn Saengja, Kiartnarin Udomlapsakul, Jitkapat Sawatphol, Piyalitt Ittichaiwong

=== SUBMISSION INSTRUCTIONS ===

Two types of submissions are invited: full (long) papers (8 pages) and short papers (4 pages).

* Submission site for the workshop: https://softconf.com/acl2025/BioNLP2025
* Submission site for Shared Tasks: https://softconf.com/acl2025/BioNLP2025-ST

Please follow these formatting guidelines: https://github.com/acl-org/acl-style-files

Please note that the review process is double-blind.

Final versions of accepted papers will be given one additional page of content (up to 9 pages for long papers, up to 5 pages for short papers) to address reviewers’ comments.

==== Submissions from ACL Rolling Review ====

We will consider submissions from ACL Rolling Review, together with all of their reviews and scores. If you are interested in submitting your work for consideration, please contact ddemner at gmail.

=== WORKSHOP OVERVIEW AND SCOPE ===

The BioNLP workshop, associated with the ACL SIGBIOMED special interest group, is an established primary venue for presenting research in language processing and language understanding for the biological and medical domains. The workshop has been held every year since 2002 and continues to grow stronger. Many other emerging biomedical and clinical language processing workshops can afford to be more specialized because BioNLP truly encompasses the breadth of the domain and brings together researchers in biomedical and clinical NLP from all over the world.

BioNLP 2025 will be particularly interested in evaluation frameworks and metrics that reflect the needs of health-related use cases and provide a good estimate of the reliability of the proposed solutions. BioNLP 2025 continues to focus on the transparency of generative approaches and the factuality of generated text. Language processing that supports DEIA (Diversity, Equity, Inclusion and Accessibility) remains of utmost importance, as does work on the detection and mitigation of bias and misinformation. Research in languages other than English, particularly under-represented languages, and work on health disparities are always of interest to BioNLP. Other active areas of research include, but are not limited to:

  • Extraction of complex relations and events;
  • Discourse analysis; anaphora & coreference resolution;
  • Text mining & literature-based discovery;
  • Question answering; summarization; text simplification;
  • Resources and strategies for system testing and evaluation;
  • Synthetic data generation & data augmentation;
  • Translating NLP research into practice: tangible, explainable results of biomedical language processing applications.

=== SHARED TASKS ===

==== SMAFIRA ====

The SMAFIRA project supports finding alternative methods to animal experiments. The organizers have released the SMAFIRA Web tool, which allows researchers to search for methods alternative to animal experiments. The input to the tool is the PubMed identifier (PMID) of a publication that describes the animal experiment for which one wants to find an alternative method. The tool retrieves up to 200 similar articles available in PubMed and presents them as a list of results. The task is to validate and annotate the top 10 similar articles, either automatically, with any system of the participants' choice, or manually, using the SMAFIRA tool. See details at https://smafira-bf3r.github.io/smafira-st/

==== ClinIQLink 2025 - LLM Lie Detector Test ====

The LLM Lie Detector Test aims to evaluate the effectiveness of generative models in producing factually accurate information, with a benchmark dataset specifically curated to align with the knowledge level of a General Practitioner (GP) Medical Doctor. Participants will submit model outputs to be assessed using a structured set of atomic question-answer pairs (factoid, true/false, and list questions), which focus on retrieving precise, factually correct information. The test will evaluate internal model knowledge retrieval. See details at https://brandonio-c.github.io/ClinIQLink-2025/

==== ArchEHR-QA 2025: Grounded Electronic Health Record Question Answering ====

Participants will automatically generate answers to patients' health-related questions that are grounded in evidence from the patients' clinical notes. The dataset consists of hand-curated, realistic patient questions (submitted through a patient portal) and their corresponding clinician-rewritten versions (crafted to assist in formulating the responses). The task is to construct coherent answers to the input questions that use, and are grounded in, the provided clinical note excerpts. See details at https://archehr-qa.github.io/

==== BioLaySumm 2025 ====

This is the 3rd iteration of BioLaySumm, following the success of the 2nd edition of the task at BioNLP 2024, which attracted more than 200 submissions across 53 different teams, and the 1st edition at BioNLP 2023, which attracted 56 submissions across 20 different teams. This edition builds on last year's task by introducing a new task, radiology report generation in layman's terms, extending the shared task to a new domain and modality. See details at https://biolaysumm.org/