Evidence-Based Reasoning
Cracking the Rare-Disease Diagnosis Problem: World's First Evidence-Based Medical Reasoning Agent Goes Live
Xin Lang Cai Jing· 2026-02-20 05:14
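Reposted from: China Science Daily (中国科学报)

A team led by Professor Zhang Ya and Associate Professor Xie Weidi of the School of Artificial Intelligence at Shanghai Jiao Tong University, together with Professors Sun Kun and Yu Yongguo of Xinhua Hospital, which is affiliated with the university's School of Medicine, has brought together several research groups to tackle a global problem: rare diseases are hard to diagnose and are frequently missed. The team proposed DeepRare, described as the world's first agentic evidence-based reasoning system for rare-disease diagnosis. The results were published in Nature on February 19.

In the same issue, Nature invited Timo Lassmann, an associate professor at the University of Western Australia, to write an accompanying News & Views commentary. The commentary identifies two main contributions of DeepRare. First, it opens the "black box" of AI clinical diagnosis and earns the trust of the medical community through a complete, credible reasoning process. Second, it establishes a general-purpose AI technique, "real-time knowledge retrieval + self-reflective iteration", that offers a shared problem-solving approach for fields requiring complex logical reasoning.

Eric Topol, a leading figure in translational medicine and director of the Scripps Research Translational Institute in the United States, also promptly recommended the work, citing DeepRare as an excellent example of agentic AI in medicine.

From "black box" to "white box"

Reportedly, more than 350 million people worldwide are affected by rare diseases, which span more than 7,000 conditions, about 80% of them genetic. Before reaching a diagnosis, most patients experience a delay of more than 5 years, more than 7 clinic visits, and more than 3 misdiagnoses, with an average misdiagnosis rate as high as 40%-50%, bringing ...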
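AI Doctors Finally Get a Hard Yardstick! GAPS, the World's First Disease-Specific Evidence-Based Evaluation Framework, Released by Ant Group and Academician Wang Jun's Team at Peking University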
量子位· 2025-12-29 06:37
Core Viewpoint
- The article discusses the launch of GAPS (Grounding, Adequacy, Perturbation, Safety), an evaluation framework for assessing the clinical capabilities of AI models in medicine, with an initial focus on lung cancer [1][2][10].

Group 1: GAPS Framework Overview
- GAPS is described as the world's first disease-specific, evidence-based evaluation framework for AI clinical capabilities. It was developed in collaboration with a team of thoracic surgeons led by Professor Wang Jun of Peking University People's Hospital [1][4].
- The framework addresses the limitations of existing medical AI assessments, which often rely on exam-style questions and do not evaluate clinical depth, completeness, robustness, or safety comprehensively [2][7][10].
- GAPS includes a fully automated evaluation toolchain that generates questions and scoring criteria and performs multi-dimensional scoring. The lung cancer evaluation set contains 92 questions covering 1,691 clinical points [2][18].

Group 2: Evaluation Dimensions
- GAPS breaks clinical competence down into four orthogonal dimensions:
  1. Grounding (G): depth of understanding beyond mere facts, requiring reasoning and decision-making [11].
  2. Adequacy (A): completeness of responses, scored with a three-tier system of essential, conditional, and additional recommendations [12][31].
  3. Perturbation (P): robustness against real-world uncertainty, tested through various perturbation scenarios [13][34].
  4. Safety (S): a risk framework that ensures medical AI does not produce harmful recommendations, with a strict penalty for catastrophic errors [16][36].

Group 3: Technological Innovations
- GAPS features an end-to-end automated evaluation pipeline that generates high-quality assessment sets from clinical guidelines, allowing rapid expansion into other medical specialties [17][19].
- The framework uses evidence-based knowledge graphs and virtual patient generation to ensure that each question is grounded in reliable clinical evidence [20][23].

Group 4: Performance Insights
- Initial evaluations of leading AI models using GAPS revealed significant performance gaps, particularly in handling uncertainty and in providing comprehensive clinical recommendations [29][31].
- The models excelled at factual recall but struggled with complex decision-making and reasoning under uncertainty, highlighting the need for further development of AI clinical capabilities [29][30].

Group 5: Future Implications
- The introduction of GAPS marks a shift in medical AI evaluation from exam scores to clinical competence, emphasizing evidence-grounded reasoning and uncertainty management in future AI development [39][40].
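
To make the Adequacy and Safety dimensions described under Group 2 more concrete, the following is a minimal sketch of how a three-tier rubric with a catastrophic-error penalty could be scored. It is not the GAPS implementation. The tier weights, the `RubricPoint` type, the function names, and the rule that a catastrophic error zeroes the score are all assumptions made for illustration; the summary above names the three tiers and a "strict penalty" but does not specify either.

```python
from dataclasses import dataclass

# Hypothetical tier weights: the source names the three tiers
# but does not say how they are weighted.
TIER_WEIGHTS = {"essential": 3.0, "conditional": 2.0, "additional": 1.0}


@dataclass(frozen=True)
class RubricPoint:
    """One scoring point in a question's rubric (illustrative type)."""
    text: str      # placeholder description of the expected recommendation
    tier: str      # must be a key of TIER_WEIGHTS
    covered: bool  # whether the model's answer included this point


def adequacy_score(points):
    """Return the weighted fraction of rubric points covered, in [0.0, 1.0]."""
    total = sum(TIER_WEIGHTS[p.tier] for p in points)
    if total == 0:
        return 0.0  # an empty rubric earns nothing
    earned = sum(TIER_WEIGHTS[p.tier] for p in points if p.covered)
    return earned / total


def final_score(points, catastrophic_error):
    """Apply the assumed safety rule: a catastrophic error zeroes the score."""
    if catastrophic_error:
        return 0.0
    return adequacy_score(points)


# Placeholder rubric; the texts are not real clinical guidance.
points = [
    RubricPoint("placeholder essential point", "essential", True),
    RubricPoint("placeholder conditional point", "conditional", False),
    RubricPoint("placeholder additional point", "additional", True),
]
print(final_score(points, catastrophic_error=False))
print(final_score(points, catastrophic_error=True))
```

With these assumed weights, the example answer covers 4 of 6 weighted points, so it scores about 0.67 when safe and 0.0 when flagged for a catastrophic error. Zeroing the score outright is the simplest reading of a "strict penalty"; a real framework might instead subtract a fixed amount or cap the score.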