Overtaking OpenAI: Baichuan's Open-Source Large Model Ranks First in the World for Medical Capability

Core Viewpoint - Baichuan Intelligent has launched Baichuan-M2, a model that has surpassed OpenAI's models in medical capability and cost-effectiveness, establishing itself as a leading open-source medical AI model [2][4][12].

Group 1: Model Performance and Evaluation
- Baichuan-M2 scored 60.1 on the HealthBench evaluation, ahead of OpenAI's latest open-source model, GPT-OSS-120B, which scored 57.6 [4][11].
- It also outscored competing models on several other benchmarks, including AIME24 (83.4), CFBench (77.6), and HealthBench (60.7) [11].
- Baichuan-M2 is only the second model worldwide to score above 32 on HealthBench Hard, indicating that it can handle complex medical questions [14][17].

Group 2: Cost and Deployment
- Baichuan-M2 is optimized for lightweight deployment and can run on a single RTX 4090 GPU, cutting deployment cost to nearly 1/60 of that of models such as DeepSeek-R1 [7][10].
- This design fits the needs of medical institutions, which can deploy the model quickly on hardware they already own [7].

Group 3: Innovation in Training and Data Utilization
- Development of Baichuan-M2 relied on innovations such as an AI patient simulator and end-to-end reinforcement learning, which substantially improved its medical capabilities [19][22].
- A large-scale verification system, which simulates real-world medical scenarios, is used to check the accuracy and safety of the model's outputs [19][20].

Group 4: Adaptation to Local Medical Practices
- Baichuan-M2 is specifically optimized to follow Chinese medical guidelines and clinical practice, providing a solution tailored to local healthcare needs [24][26].
- Its recommendations are based on local patient demographics and treatment protocols, which distinguishes it from Western models [26][28].

Group 5: Real-World Application and Validation
- In real clinical cases, Baichuan-M2 performed very well, correctly diagnosing conditions such as hypothyroidism and bronchial obstruction [32][33].
- Medical experts have judged its ability to integrate patient history and symptoms into the diagnostic process to be comparable to that of senior specialists [30][32].