The whole industry is busy "eating the shrimp"; MiniMax M2.7 already has the shrimp picking up chopsticks itself
量子位· 2026-03-18 11:32
The wildest part is that it can now build its own Agent Harness, fully fusing thinking and doing and setting itself on a path of self-evolution. And after deep adaptation to the OpenClaw long-term memory framework, nothing fazes it, whether that means immersive role-play with genuine emotional presence or handling extremely complex Office automation workloads.

克雷西 from 凹非寺
量子位 | WeChat official account QbitAI

Just one month after MiniMax released M2.5, another major update has landed. Today the company officially announced the new M2.7 model, now even stronger at complex tasks and agent-team collaboration.

Its reasoning and engineering abilities have also taken a qualitative leap: the kind of headache-inducing fault triage you see on a production line, it can now handle on its own. Earlier models could at best lend a hand by writing a few lines of code; M2.7 is already a seasoned SRE (Site Reliability Engineering) veteran, covering the whole pipeline in one go: automatically correlating monitoring data, precisely pinpointing the bug, and even writing the script that patches the flaw.

M2.7 is now fully live on MiniMax Agent and the open platform, so anyone can go try it out.

Best Cowork Agent model

Let's run through M2.7's hard-core highlights. The most fundamental is the jump in instruction following and multi-agent collaboration: in complex environments with massive numbers of Skills, M2.7 invokes them with remarkable robustness. The official ...
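MiniMax has not published what M2.7's self-built harness looks like internally, but the "correlate monitoring, pinpoint the bug, write the fix" pipeline described above is the classic shape of a tool-calling agent loop. Below is a minimal, hypothetical Python sketch of such a loop; every name in it (TOOLS, call_model, the JSON tool-call shape, the scripted replies) is invented for illustration and is not MiniMax's actual API.

```python
import json

# Hypothetical sketch of an agent-harness loop. MiniMax has not published
# M2.7's harness internals; every name here is invented for illustration.

# Toy tool registry. A real harness would wrap monitoring queries, log
# search, and shell access behind audited, permissioned interfaces.
TOOLS = {
    "query_metrics": lambda service: {"service": service, "error_rate": 0.37},
    "grep_logs": lambda pattern: [f"{pattern}: worker-3 OOMKilled"],
    "run_script": lambda script: "exit 0",
}

# Scripted stand-in for the model, emitting the correlate -> diagnose -> fix
# sequence the article describes. A real harness would call the model API
# here and parse its tool-call output instead.
_REPLIES = iter([
    {"tool": "query_metrics", "args": {"service": "checkout"}},
    {"tool": "grep_logs", "args": {"pattern": "OOMKilled"}},
    {"tool": "run_script", "args": {"script": "kubectl set resources ..."}},
    {"final": "Raised worker memory limits; error rate recovering."},
])

def call_model(messages):
    return next(_REPLIES)

def agent_loop(task: str, max_steps: int = 10) -> str:
    """Minimal think-act loop: the model picks a tool, the harness executes
    it and feeds the observation back, until the model returns an answer."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)
        if "final" in reply:
            return reply["final"]
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "step budget exhausted"

print(agent_loop("Checkout error rate spiked; investigate and fix."))
```

The essential design point is that the harness, not the model, executes tools and feeds observations back into the context, which is what lets "thinking" and "doing" interleave inside a single loop.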
6666! The perfect-score NeurIPS paper is here
量子位· 2025-11-11 11:11
Core Insights
- The article discusses a paper that challenges the prevailing belief that reinforcement learning (RL) is essential for enhancing reasoning in large language models (LLMs), arguing instead that model distillation may be the more effective route [1][5][12].

Group 1: Research Findings
- The paper, "Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?", received a perfect score at NeurIPS, underscoring its impact [5][6].
- The research team from Tsinghua University and Shanghai Jiao Tong University found that RL primarily reinforces reasoning paths the base model already has rather than discovering new ones, contradicting the common assumption that RL expands a model's reasoning capabilities [10][12].
- The study used the pass@k metric to evaluate performance, finding that RL-trained models win at small sampling budgets (low k) but are overtaken by their base models as k grows, suggesting the base models' reasoning abilities have been underestimated [14][20].

Group 2: Methodology
- The experiments covered three key application areas: mathematical reasoning, code generation, and visual reasoning, using authoritative benchmark datasets [17][19].
- The models compared included mainstream LLMs such as Qwen2.5 and LLaMA-3.1, with RL variants trained using algorithms including PPO, GRPO, and Reinforce++ [18][19].
- The analysis focused on the gap in pass@k between RL and base models, and on how that gap trends as the sampling budget increases [21][22].

Group 3: Implications for the Industry
- The findings suggest that the substantial investments and explorations surrounding RLVR may need to be reevaluated, since RL's actual contribution to reasoning capability could be overestimated [4][12].
- The research highlights model distillation as the more promising approach for expanding reasoning capacity in LLMs, which could shift industry focus and funding [10][12].
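To make the pass@k crossover concrete: pass@k asks, given a budget of k sampled answers per problem, whether at least one is correct. The sketch below uses the standard unbiased estimator from Chen et al. (2021); the per-problem correct counts are toy numbers invented here (not the paper's data), chosen only to reproduce the qualitative pattern reported: an RL model concentrates probability on problems it already solves (high pass@1, but zero on problems it has lost), while the base model keeps thin coverage everywhere and catches up as k grows.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): the probability that
    at least one of k draws, taken without replacement from n samples of
    which c are correct, solves the problem."""
    if n - c < k:  # too few incorrect samples to fill a k-draw: guaranteed hit
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Toy per-problem correct counts out of n = 256 samples (invented numbers).
# The "RL" model solves two problems very reliably but never solves the
# other two; the "base" model solves all four, just rarely.
rl_counts = [200, 180, 0, 0]
base_counts = [40, 30, 3, 2]

for k in (1, 8, 64, 256):
    rl = np.mean([pass_at_k(256, c, k) for c in rl_counts])
    base = np.mean([pass_at_k(256, c, k) for c in base_counts])
    print(f"k={k:3d}  RL={rl:.3f}  base={base:.3f}")
# RL leads at k = 1 but plateaus at 0.5 (it has lost two problems outright),
# while the base model's broader coverage pushes it past RL at large k.
```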