Clinical Medicine - filings, earnings calls, financial reports, news

Clinical Medicine


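Freeing researchers from drudgery! A Stanford team of Chinese researchers builds the first general-purpose biomedical AI agent, automating everything from experimental design and data analysis to drug discovery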

生物世界 · 2025-06-10 08:21

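Editor: 王多鱼 | Layout: 水成文

Biomedical research is the foundation for advancing our understanding of human health and disease, driving drug development, and improving clinical care. Yet in biomedical laboratories, researchers are often overwhelmed by complex experimental protocols, massive databases, a wide assortment of analysis tools, and an ever-growing body of literature. Biomedical research is increasingly constrained by these repetitive and fragmented workflows, which wear researchers out, significantly slow the pace of scientific discovery, and limit innovation. This underscores the scientific community's urgent need for a fundamentally new approach: one that can effectively scale scientific expertise, streamline research workflows, and unlock the full potential of biomedical research.

On June 2, 2025, a team led by Stanford University researchers Kexin Huang (黄柯鑫), Serena Zhang, Hanchen Wang (王瀚宸), Yuanhao Qu (屈元昊), and Yingzhou Lu (陆荧洲), together with Genentech, Arc Institute, the University of California, San Francisco, Princeton University, and other leading research institutions, released Biomni, a general-purpose biomedical AI agent. The agent can autonomously carry out complex research tasks across many biomedical subfields, including genetics, genomics, microbiology, pharmacology, and clinical medicine.

The arrival of Biomni marks AI's leap in biomedical research from "tool user" to "autonomous decision-maker." By bringing together scattered research resources ...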

Artificial Intelligence


Stanford's head-to-head evaluation of clinical medical AI: DeepSeek trounces both Google and OpenAI

量子位· 2025-06-03 06:21

Core Insights
- The article covers a comprehensive evaluation of large language models (LLMs) on medical tasks, in which DeepSeek R1 achieved a 66% win rate, outperforming the other models in a clinical context [1][7][24].

Evaluation Framework
- A comprehensive assessment framework named MedHELM was developed, consisting of 35 benchmark tests covering 22 subcategories of medical tasks [12][20].
- The classification system was validated by 29 practicing clinicians from 14 medical specialties to ensure it reflects real-world clinical activities [4][17].

Model Performance
- DeepSeek R1 led the evaluation with a 66% win rate and a macro average score of 0.75 across the benchmark tests [7][24].
- Other notable models included o3-mini and Claude 3.7 Sonnet, each with a 64% win rate, while Gemini 1.5 Pro ranked lowest with a 24% win rate [26][27].

Benchmark Testing
- The evaluation included 17 existing benchmarks and 13 newly developed tests, with 12 of the new tests based on real electronic health record data [21][20].
- Performance varied by task category: models scored higher on clinical case generation and patient communication tasks than on structured reasoning tasks [32].

Cost-Effectiveness Analysis
- A cost analysis based on token consumption during the evaluation found that non-reasoning models such as GPT-4o mini cost less than reasoning models such as DeepSeek R1 [38][39].
- The analysis indicated that Claude 3.5 Sonnet and Claude 3.7 Sonnet offered good value, delivering their performance at lower cost [39].
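The summary reports a win rate and a macro average score but does not say how either is computed. The sketch below shows one plausible reading, under two assumptions: win rate is the share of head-to-head, per-benchmark comparisons against other models that a model wins (ties count as losses), and macro average is the unweighted mean of a model's per-benchmark scores. The model names, benchmark names, and scores are hypothetical placeholders, not figures from the evaluation.

```python
def mean_win_rate(scores):
    """Share of pairwise, per-benchmark comparisons each model wins.

    scores: {model: {benchmark: score}}. Assumes higher is better and
    that a tie does not count as a win (both are assumptions).
    """
    wins = {m: 0 for m in scores}
    total = {m: 0 for m in scores}
    for a in scores:
        for b in scores:
            if a == b:
                continue
            # Only compare on benchmarks that both models were run on.
            for bench in scores[a].keys() & scores[b].keys():
                total[a] += 1
                if scores[a][bench] > scores[b][bench]:
                    wins[a] += 1
    return {m: (wins[m] / total[m] if total[m] else 0.0) for m in scores}


def macro_average(scores):
    """Unweighted mean of each model's per-benchmark scores."""
    return {m: sum(s.values()) / len(s) for m, s in scores.items()}


# Hypothetical toy data, for illustration only.
toy_scores = {
    "model_a": {"bench_1": 0.8, "bench_2": 0.6},
    "model_b": {"bench_1": 0.7, "bench_2": 0.9},
    "model_c": {"bench_1": 0.5, "bench_2": 0.4},
}

win_rates = mean_win_rate(toy_scores)
macro_avgs = macro_average(toy_scores)
```

In this toy data, `model_a` and `model_b` win the same share of comparisons while `model_b` has the higher macro average, which illustrates why the two metrics are reported side by side and can rank models differently.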

Artificial Intelligence

Clinical Medicine


DeepSeek R1

Llama 3.3 Instruct

Gemini 1.5 Pro
