User:Qiaojin

I am a researcher at the BioNLP group under NLM/NIH. I received my MD degree from Tsinghua University in 2022. Prior to that, I was an exchange student at the University of Pittsburgh and Carnegie Mellon University. I got my BSc degree from Tsinghua in 2019. My long-term goal is to democratize biomedical knowledge by providing accurate, verifiable, and understandable information to everyone in need. Currently, I work on three topics regarding large language models for biomedicine:

Evaluating the medical capabilities of LLMs. In 2019, we released and characterized the first decoder-only biomedical language model, BioELMo. Our PubMedQA is one of the most commonly used benchmarks for evaluating LLMs in biomedicine. We also reported LLM hallucinations in information seeking, hidden flaws behind expert-level performance, and safety vulnerabilities under adversarial attacks.

Augmentation with retrieval and domain tools. We trained MedCPT, a state-of-the-art embedding model for biomedicine, using large-scale PubMed search logs. Our MedRAG toolkit and benchmark offer practical guidelines for retrieval-augmented generation in medicine. We released a team of biomedical AI agents, including GeneGPT, GeneAgent, and AgentMD.

Novel LLM applications in biomedicine. We are one of the pioneers in using LLMs for patient-to-trial matching with TrialGPT, which won the NIH Director’s Challenge Awards. TrialGPT was covered by POLITICO, Nature, AUANews, and Azure Government. We wrote reviews of the opportunities and challenges in biomedical LLMs and biomedical question answering.

I serve as an Area Chair for the ACL Rolling Review and as an Associate Editor of the Journal of Medical Internet Research (JMIR). I am also on the editorial board of the Journal of Biomedical Informatics (JBI) and on the editorial committee of a special issue of the Journal of the American Medical Informatics Association (JAMIA).
