Ten Reasoning Models Take On the 2025 Gaokao Math Questions: DeepSeek-R1 and Tencent Hunyuan T1 Tie for First; Musk's Grok 3 Meets Its "Waterloo"
Mei Ri Jing Ji Xin Wen · 2025-06-10 13:53
Core Insights
- The difficulty of mathematics in the 2025 college entrance examination (gaokao) remains a hot topic, with attention focused on how various AI reasoning models performed on a standardized test built from the New Curriculum Standard Mathematics Paper I [1]

Group 1: AI Model Performance
- The evaluation tested ten AI reasoning models, including DeepSeek-R1, Tencent's Hunyuan T1, OpenAI's o3, Google's Gemini 2.5 Pro, and xAI's Grok 3, to assess their mathematical capabilities [1]
- DeepSeek-R1 and Tencent's Hunyuan T1 tied for first with perfect scores of 117, the maximum attainable after graphical questions were excluded, performing especially well on algebra and function problems [4]
- Other models scored as follows: iFlytek's Spark X1 with 112, Gemini 2.5 Pro with 109, OpenAI's o3 with 107, Alibaba's Qwen 3 with 106, and Doubao (deep-thinking mode) with 104 [2][7]

Group 2: Evaluation Methodology
- The assessment used a standardized paper with a nominal total of 150 points but excluded questions requiring graphical analysis, to keep the comparison fair across models [3]
- Scoring followed high school examination standards, grading open-ended questions on the final answer rather than the working [3]

Group 3: Notable Failures
- Grok 3, developed by xAI and touted as the "strongest AI", ranked third from the bottom with 91 points, largely because it misread multiple-choice questions [8]
- Second from the bottom was Zhipu's Qingyan reasoning model at 78 points; it often faltered at the final step of a derivation and lost points there [8][10]
- Kimi k1.5 ranked last, losing heavily on the final two challenging questions [10]
Is the GRPO That DeepSeek Uses Really That Special? A 10,000-Character Analysis of Four Standout Papers
机器之心 · 2025-05-24 03:13
Core Insights
- The article reviews recent advances in reasoning models, focusing on GRPO and its improved variants, and on how quickly reinforcement learning for reasoning is evolving [1][2][3]

Group 1: Key Papers and Models
- Kimi k1.5 is a newly released reasoning model trained with reinforcement learning, emphasizing long-context extension and improved policy optimization [10][17]
- Open-Reasoner-Zero is the first complete open reproduction of reinforcement learning training on a base model, with significant results [34][36]
- DAPO explores improvements to GRPO to better suit reasoning training, presenting a large-scale open-source LLM reinforcement learning system [48][54]

Group 2: GRPO and Its Characteristics
- GRPO is closely related to PPO (Proximal Policy Optimization) and shares much with RLOO (REINFORCE Leave-One-Out); notably, many leading research efforts do not use GRPO at all [11][12][9]
- The core takeaway is that current RL algorithms are highly similar in implementation; GRPO is popular but not fundamentally revolutionary [15][6]
- GRPO's distinctive modification targets reasoning training rather than traditional RLHF: it samples multiple answers per prompt and scores each against its group (a minimal sketch follows this summary) [13][12]

Group 3: Training Techniques and Strategies
- Kimi k1.5's training starts from supervised fine-tuning (SFT) and cultivates behavior patterns such as planning, evaluation, reflection, and exploration [23][24]
- Training follows a curriculum that begins with simpler tasks and gradually raises difficulty, akin to how humans learn [27][28]
- The paper stresses that data distribution and prompt quality are decisive for effective reinforcement learning [22][41]

Group 4: DAPO Improvements
- DAPO splits the PPO clipping range into two separate hyperparameters to improve learning dynamics and efficiency (see the second sketch below) [54][60]
- It adds dynamic sampling, removing prompts whose sampled responses all receive the same reward, since such flat-reward groups carry no learning signal [63]
- It uses a token-level loss rather than a per-response loss to better manage learning dynamics on long responses [64][66]

Group 5: Dr. GRPO Modifications
- Dr. GRPO modifies GRPO to reach stronger performance with shorter generated responses (see the third sketch below) [76][79]
- The modifications normalize advantages uniformly across all tokens of a response, which keeps the learning signal well behaved [80][81]
- The paper also argues that high-quality data engineering can absorb much of the effect of such changes, underscoring the need for a balanced distribution of problem difficulty [82][89]
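To make Group 2 concrete: GRPO's key move is to sample a group of answers per prompt and normalize each answer's reward against the group's mean and standard deviation, replacing PPO's learned value baseline. Below is a minimal sketch of that group-relative advantage; the function name, tensor shapes, and the binary verifier reward are illustrative assumptions, not code from the papers discussed.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Group-relative advantages: normalize each response's reward by the
    mean/std of its own group, replacing PPO's learned value baseline.

    rewards: (num_prompts, group_size) scalar rewards, one row per prompt.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# One prompt, four sampled answers, verifier reward 1 if correct else 0.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0]])
print(grpo_advantages(rewards))  # correct answers get a positive advantage
```

Each response's advantage is then broadcast to all of its tokens inside the usual PPO-style clipped-ratio objective, which is why the article treats GRPO as a close cousin of PPO and RLOO rather than a new algorithm.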
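Group 4's three DAPO changes can be read together in one simplified objective: decoupled clipping bounds, a token-level average, and a filter that drops flat-reward groups. This is a sketch under assumed shapes and names; the asymmetric clip values mirror the decoupled-clipping idea but the surrounding training loop is not from the article.

```python
import torch

def dynamic_sample_filter(rewards: torch.Tensor) -> torch.Tensor:
    """Keep only prompt groups whose rewards actually vary; all-correct or
    all-wrong groups have zero group-relative advantage and teach nothing.

    rewards: (num_prompts, group_size) -> boolean keep-mask, (num_prompts,).
    """
    return rewards.std(dim=1) > 0

def dapo_loss(logp_new, logp_old, adv, mask,
              eps_low: float = 0.2, eps_high: float = 0.28) -> torch.Tensor:
    """Simplified DAPO objective over a batch of responses.

    logp_new, logp_old: (B, T) per-token log-probs under the new/old policy.
    adv:  (B,) per-response advantage, broadcast to tokens.
    mask: (B, T) 1 for real tokens, 0 for padding.
    """
    ratio = torch.exp(logp_new - logp_old)              # (B, T)
    adv = adv.unsqueeze(1)                              # (B, 1), broadcast over T
    # Decoupled clipping: a looser upper bound (eps_high > eps_low) lets
    # low-probability tokens with positive advantage be pushed up harder.
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high)
    per_token = torch.minimum(ratio * adv, clipped * adv) * mask
    # Token-level loss: average over every token in the batch instead of
    # averaging inside each response first, so tokens in long responses
    # are not down-weighted relative to tokens in short ones.
    return -per_token.sum() / mask.sum()
```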
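Finally, Group 5's Dr. GRPO is easiest to read as two deletions from GRPO: drop the division by the group's reward standard deviation, and drop per-response length normalization in favor of a constant. A minimal sketch under those assumptions, with PPO-style clipping omitted for brevity and MAX_LEN an assumed constant:

```python
import torch

MAX_LEN = 1024  # assumed constant normalizer, e.g. the generation budget

def dr_grpo_loss(logp_new, logp_old, rewards, mask) -> torch.Tensor:
    """Simplified Dr. GRPO objective for one prompt's group of responses.

    logp_new, logp_old: (G, T) per-token log-probs under the new/old policy.
    rewards: (G,) scalar rewards; mask: (G, T) token mask.
    """
    # Mean-only baseline: no division by the group's reward std, which
    # would otherwise re-weight questions by how uniform their rewards are.
    adv = (rewards - rewards.mean()).unsqueeze(1)       # (G, 1)
    ratio = torch.exp(logp_new - logp_old)
    per_token = ratio * adv * mask
    # Constant normalizer instead of each response's own length: removes
    # the bias that lets wrong answers drift longer during training.
    return -per_token.sum() / (rewards.numel() * MAX_LEN)
```

Per the article's summary, these normalization changes are what drive the shorter generations; group sampling and the rest of the GRPO pipeline stay as they are.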
The Largest "School" in Chinese AI
投资界 · 2025-02-27 07:06
The following article is from the WeChat account 数字力场 (tagline: "Resist entropy; salvage what's interesting"), by She Zongming.

From Kimi to DeepSeek, from Tsinghua to Zhejiang University.

Author | She Zongming
Operations | Li Wan
Source | 数字力场 (ID: shuzilichang)

Two pieces of news have drawn particular attention in TMT circles these past few days:

First, Musk fired the opening shot of a "counterattack on DeepSeek": xAI, which he helms, released the Grok 3 large model.

Trained on a 200,000-GPU cluster; topping the Chatbot Arena (lmarena.ai) leaderboard ahead of DeepSeek-R1 and GPT-4o; hailed by Musk as "the smartest AI on Earth"; then stumbling by answering that "9.11 is greater than 9.9"... with Musk's gift for trending topics behind it, Grok 3 generated a pile of talking points, including the hashtag #全球华人决战AI之巅# (roughly, "Chinese talent worldwide battles for the summit of AI").

Photos from the Grok 3 launch show two Chinese scientists seated front and center; one of them is xAI co-founder Wu Yuhuai, born in the post-1995 generation.

▲ At the Grok 3 launch, two Chinese scientists take center stage; third from left is Wu Yuhuai.

Digging deeper: of xAI's 12 founding members, 4 are ethnic Chinese. Besides Wu Yuhuai, they are former Google scientist Dai Zihang, an undergraduate alumnus of Tsinghua University; former DeepMind scientist Zhang Guodong, an undergraduate alumnus of Zhejiang University; and Harvard math prodigy Yang Ge.

Second, Qunhe Technology (群核科技), one of Hangzhou's "Six Little Dragons", has launched its IPO, aiming to become "the world's first spatial-intelligence stock".

...