Cracking the Rare Disease Diagnosis Challenge: The World's First Medical Evidence-Based Reasoning Agent Goes Live
Xin Lang Cai Jing· 2026-02-20 05:14
Core Insights
- The article discusses the development of DeepRare, the world's first intelligent rare disease diagnostic system, which addresses the global challenges of difficult diagnosis and high misdiagnosis rates in rare diseases [1][2].

Group 1: System Features
- DeepRare employs a "central-embodiment" architecture that surpasses traditional medical AI by linking vast medical literature with real clinical data, enabling deep understanding and internalization of medical knowledge [2].
- Unlike traditional AI, DeepRare exhibits "slow thinking" capabilities similar to a human doctor's, allowing it to ask questions, validate hypotheses, and iteratively refine diagnostic clues [3].
- The system provides a complete evidence chain for each diagnostic conclusion, making the reasoning process transparent and thereby strengthening trust in the AI [3].

Group 2: Diagnostic Performance
- DeepRare achieved a top-1 accuracy of 57.18% in phenotype-based diagnosis, a 23.79-percentage-point improvement over the previous best international model [4].
- In retrospective human-machine comparison experiments, DeepRare outperformed rare disease specialists with ten years of clinical experience in diagnostic recall [4].
- When gene sequencing data is included, DeepRare's overall top-1 diagnostic accuracy exceeds 70.6%, significantly better than the 53.2% of Exomiser, the current international standard tool [4].

Group 3: Real-World Application
- DeepRare has been deployed at Xinhua Hospital and is undergoing internal testing, functioning as a digital quality-control tool to ensure thorough patient evaluations [7].
- The system's online rare disease diagnostic platform has attracted over 1,000 professional users and covers more than 600 medical and research institutions worldwide [8].
- A "ten-thousand-case clinical validation plan" has been launched to conduct real-world clinical validation of rare disease diagnoses, aiming to build a global intelligent diagnostic network [8].
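The "slow thinking" loop described above (hypothesize, ask a question, validate, refine) can be sketched as a simple iterative process. This is a toy illustration only, not DeepRare's actual algorithm: the knowledge base, disease names, and overlap-based scoring rule below are all invented for the example.

```python
# Toy knowledge base: disease -> phenotypes that support it (invented data).
KNOWLEDGE_BASE = {
    "Disease A": {"fever", "joint pain", "rash"},
    "Disease B": {"fever", "muscle weakness"},
    "Disease C": {"rash", "photosensitivity"},
}

def rank_hypotheses(observed):
    """Score each candidate disease by overlap with observed phenotypes."""
    scores = []
    for disease, phenotypes in KNOWLEDGE_BASE.items():
        overlap = len(observed & phenotypes) / len(phenotypes)
        scores.append((disease, overlap))
    return sorted(scores, key=lambda s: s[1], reverse=True)

def next_question(observed):
    """Pick an unobserved phenotype of the leading hypothesis to ask about."""
    top_disease = rank_hypotheses(observed)[0][0]
    candidates = KNOWLEDGE_BASE[top_disease] - observed
    return min(candidates) if candidates else None

def diagnose(initial, answers, max_rounds=5):
    """Iteratively refine: ask about a discriminating phenotype, record the
    answer in a transparent evidence chain, and re-rank hypotheses."""
    observed = set(initial)
    evidence_chain = []  # transparent record of each reasoning step
    for _ in range(max_rounds):
        question = next_question(observed)
        if question is None:
            break
        present = answers.get(question, False)
        evidence_chain.append((question, present))
        if present:
            observed.add(question)
    best, score = rank_hypotheses(observed)[0]
    return best, score, evidence_chain
```

For example, `diagnose({"fever", "rash"}, {"joint pain": True})` asks one confirming question and returns `("Disease A", 1.0, [("joint pain", True)])`, with the evidence chain exposing why the conclusion was reached.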
AI Doctors Finally Have a Hard Yardstick: GAPS, the World's First Specialty-Specific Evidence-Based Evaluation Framework, Released by Ant Group and Academician Wang Jun's Team at Peking University
量子位· 2025-12-29 06:37
Core Viewpoint
- The article discusses the launch of GAPS (Grounding, Adequacy, Perturbation, Safety), an evaluation framework for assessing the clinical capabilities of AI models in medicine, with an initial focus on lung cancer [1][2][10].

Group 1: GAPS Framework Overview
- GAPS is the world's first evaluation framework for AI clinical capabilities, developed in collaboration with a team of thoracic surgeons led by Professor Wang Jun of Peking University People's Hospital [1][4].
- The framework addresses the limitations of existing medical AI assessments, which often rely on exam-style questions and fail to evaluate clinical depth, completeness, robustness, and safety [2][7][10].
- GAPS includes a fully automated evaluation toolchain that generates questions, scoring criteria, and multi-dimensional scores; the initial benchmark comprises 92 questions covering 1,691 clinical points in lung cancer [2][18].

Group 2: Evaluation Dimensions
- GAPS decomposes clinical competence into four orthogonal dimensions:
  1. Grounding (G): depth of understanding beyond mere facts, requiring reasoning and decision-making [11].
  2. Adequacy (A): completeness of responses, with a three-tier evaluation of essential, conditional, and additional recommendations [12][31].
  3. Perturbation (P): robustness against real-world uncertainty, tested through various perturbation scenarios [13][34].
  4. Safety (S): a risk framework ensuring that medical AI does not produce harmful recommendations, with a strict penalty for catastrophic errors [16][36].

Group 3: Technological Innovations
- GAPS features an end-to-end automated evaluation pipeline that generates high-quality assessment sets from clinical guidelines, allowing rapid expansion into other medical specialties [17][19].
- The framework uses techniques such as evidence-based knowledge graphs and virtual patient generation to ensure that each question is grounded in reliable clinical evidence [20][23].

Group 4: Performance Insights
- Initial evaluations of leading AI models with GAPS revealed significant performance gaps, particularly in handling uncertainty and providing comprehensive clinical recommendations [29][31].
- While the models excelled at factual recall, they struggled with complex decision-making and reasoning under uncertainty, highlighting the need for further development of AI clinical capabilities [29][30].

Group 5: Future Implications
- The introduction of GAPS marks a paradigm shift in medical AI evaluation, from exam scores to clinical competence, emphasizing evidence-grounded reasoning and uncertainty management in future AI development [39][40].
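The scoring scheme the article describes, four dimension scores plus tiered adequacy and a strict penalty for catastrophic errors, can be illustrated with a minimal sketch. The tier weights, the averaging rule, and the score-zeroing penalty below are assumptions made purely for illustration; the article states only that GAPS scores four dimensions and penalizes catastrophic safety errors severely.

```python
def adequacy_score(hits):
    """Three-tier adequacy: essential points weigh more than conditional or
    additional ones. `hits` maps tier -> (points covered, points total).
    The weights are invented for this sketch."""
    weights = {"essential": 0.6, "conditional": 0.3, "additional": 0.1}
    score = 0.0
    for tier, (covered, total) in hits.items():
        if total:
            score += weights[tier] * covered / total
    return score

def gaps_score(grounding, adequacy, perturbation, safety, catastrophic_error):
    """Aggregate the four GAPS dimensions (each assumed to lie in [0, 1]).
    A catastrophic error zeroes the overall score, one possible reading of
    the 'strict penalty' described in the article."""
    if catastrophic_error:
        return 0.0
    return (grounding + adequacy + perturbation + safety) / 4
```

For instance, a response covering 4 of 5 essential, 2 of 4 conditional, and 0 of 2 additional points gets an adequacy score of roughly 0.63; a single catastrophic safety error drops the aggregate to zero regardless of the other dimensions.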