Deep Thinking Models

A New Framework for Embodied Scenarios! Embodied-Reasoner Takes On Complex Embodied Interaction Tasks
具身智能之心· 2025-06-21 12:06
Core Viewpoint - The article presents the Embodied-Reasoner framework, which extends deep reasoning capabilities to embodied interactive tasks and addresses their distinctive challenges, such as multimodal interaction and diverse reasoning patterns [3][7][19].
Group 1: Research Background
- Recent deep reasoning models such as OpenAI's o1 have shown exceptional capabilities in mathematics and programming, driven by large-scale reinforcement learning [7].
- However, how well these models work in embodied domains, which require continuous interaction with the environment, has not been fully explored [7].
- The research aims to extend deep reasoning to embodied interactive tasks, tackling challenges such as multimodal interaction and diverse reasoning patterns [7].
Group 2: Embodied Interaction Task Design
- The tasks emphasize high-level planning and reasoning, such as searching for hidden objects in unfamiliar rooms, rather than low-level motion control [8].
- The task environment is built on the AI2-THOR simulator and features 120 unique indoor scenes and 2,100 objects [8].
- Four common task types were designed: Search, Manipulate, Transport, and Composite tasks [8].
Group 3: Data Engine and Training Strategy
- A data engine was developed to synthesize diverse reasoning processes, presenting embodied reasoning trajectories in an observe-think-act format [3].
- A three-stage iterative training process was introduced, consisting of imitation learning, rejection sampling tuning, and reflection tuning, which strengthens the model's interaction, exploration, and reflection capabilities [3][19].
- The synthesized training corpus contains 9,390 unique task instructions with their corresponding observe-think-act trajectories, covering 107 indoor scenes and 2,100 interactive objects [12][16].
Group 4: Experimental Results
- The model significantly outperformed existing advanced models, especially on complex long-horizon tasks, showing more consistent reasoning and more efficient search behavior [3][18].
- In real-world experiments, Embodied-Reasoner achieved a 56.7% success rate across 30 tasks, outperforming OpenAI's o1 and o3-mini [17].
- Its success rate exceeded those of OpenAI o1, OpenAI o3-mini, and Claude-3.7-Sonnet-thinking by 9%, 24%, and 13%, respectively [18].
Group 5: Conclusion and Future Work
- The research extends the deep reasoning paradigm to embodied interactive tasks, demonstrating stronger interaction and reasoning capabilities, especially on complex long-horizon tasks [19].
- Future work may apply the model to a wider variety of embodied tasks and improve its generalization and adaptability in real-world environments [19].
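The rejection-sampling stage mentioned in the Embodied-Reasoner summary can be pictured with a minimal Python sketch. It assumes a hypothetical `Step`/`Trajectory` schema for observe-think-act data and a hypothetical `rollout` callback that runs the current model in a simulator and reports whether the task succeeded; neither is the paper's actual data format or an AI2-THOR API. It only illustrates the filtering idea: sample several trajectories per instruction and keep the successful ones as new fine-tuning data.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One observe-think-act step (hypothetical schema, not the paper's format)."""
    observation: str  # e.g. a text description of the current camera view
    thought: str      # the model's reasoning for this step
    action: str       # e.g. "navigate to fridge", "open fridge"


@dataclass
class Trajectory:
    instruction: str
    steps: List[Step] = field(default_factory=list)
    success: bool = False  # reported by the environment after the last action


def rejection_sample(
    instructions: List[str],
    rollout: Callable[[str], Trajectory],
    samples_per_task: int = 4,
) -> List[Trajectory]:
    """Keep only successful trajectories as new training data.

    `rollout` is a hypothetical stand-in for running the current model
    in the environment; it is not a real simulator API.
    """
    kept: List[Trajectory] = []
    for instruction in instructions:
        for _ in range(samples_per_task):
            traj = rollout(instruction)
            if traj.success:
                kept.append(traj)
                break  # in this sketch, one accepted trajectory per task is enough
    return kept


# Toy rollout: succeeds only on search tasks, so the filtering is visible.
def toy_rollout(instruction: str) -> Trajectory:
    ok = instruction.startswith("find")
    step = Step("kitchen view", "the apple may be in the fridge", "open fridge")
    return Trajectory(instruction, [step], success=ok)


data = rejection_sample(["find the apple", "move the cup to the sink"], toy_rollout)
print([t.instruction for t in data])  # ['find the apple']
```

Stopping at the first success per task is a simplifying choice here; keeping every successful sample would yield more, but more redundant, training data.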
The ERNIE Large Model's "AI Marathon"
机器之心· 2025-05-22 10:25
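Jiqizhixin original. Author: Zhang Qian. For Baidu, the task is to keep the strategic resolve of long-termism while staying flexible on technical paths; this balance of "change and constancy" may be exactly what wins it this round of the technological revolution. In 2025, the importance of model capability still goes without saying. On the pre-training side, even though OpenAI's former chief scientist Ilya Sutskever has said that pre-training data is about to run out, vast multimodal data resources such as images and video remain untapped. On the post-training side, the new reinforcement learning paradigm is giving the Scaling Law a new lease on life, and a new generation of reasoning models keeps making progress on mathematics, code, long-horizon planning, and similar problems. For AI companies, continued investment in foundation model R&D remains essential; at this stage, it is still the essence of the climb toward the peak of intelligence. In this field, Baidu has always been a force to be reckoned with. Since ERNIE 1.0 was released in 2019, ERNIE's technical evolution, from joint learning of knowledge and data to knowledge enhancement and knowledge-point enhancement, and from retrieval enhancement, dialogue enhancement, and logical reasoning enhancement to slow thinking and multimodality, has been no accident: it is the result of the "accumulated foundation" built up through early technical exploration. It is precisely this foundation that led Baidu to build ERNIE 4.5 Turbo, a multimodal large model that surpasses GPT-4o, and, ahead of DeepSeek R1 and V3, the deep-thinking ...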
Volcano Engine President Tan Dai: Many Agents' capabilities are still at a stage similar to L1 in autonomous driving
news flash· 2025-04-17 11:17
Core Viewpoint - The industry is moving toward Agents with advanced reflection, planning, and autonomous decision-making capabilities, beyond the basic level at which many Agents still operate [1].
Group 1: Agent Development
- The foundation for building Agents is deep thinking models, which must be able to think, plan, and reflect [1].
- Agents should support multimodal capabilities, analogous to human vision and hearing, to better handle complex tasks [1].
Group 2: Product Launch
- The Doubao 1.5 deep thinking model was officially launched, showing strong performance on general tasks such as mathematics, programming, scientific reasoning, and creative writing [1].
- A visual version of the deep thinking model was also introduced; it has visual reasoning capabilities, allowing it to associate and reason about what it sees as a human would [1].
The Future of Deep Thinking Models Through the Lens of DeepSeek R1 Reproductions | ML-Summit 2025
AI科技大本营· 2025-03-31 06:55
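The highly anticipated 2025 Global Machine Learning Technology Conference (ML Summit 2025) will be held April 18-19 at the Radisson hotel at Xijiao Manor in Hongqiao, Shanghai (上海虹桥西郊庄园丽笙大酒店). Co-hosted by CSDN & Boolan, the event brings together more than 50 leading experts from academia and industry to discuss the practice of hot AI technologies such as agents, federated learning, and multimodal large models. A longtime friend of the conference, Zhang Junlin, Chief Scientist of Sina Weibo and head of its AI R&D department, will give a talk titled "The Future of Deep Thinking Models Through the Lens of DeepSeek R1 Reproductions". Zhang Junlin is known as "the practitioner who dissects large-model technology most thoroughly"; at the 2024 conference, his in-depth breakdowns of Gemini's multimodal architecture and OpenAI's o1 had developers exclaiming, "Finally, someone explains the essence of the technology." The talk will cover:
- Systematically tracing the technical lineage: a review of the reproduction studies since DeepSeek R1 was open-sourced, covering lightweight adaptation in the SFT stage (e.g., S1) and innovative practice in the RL stage.
- In-depth analysis of the training paradigm: a close look at its core two-stage training pattern, namely how SFT is done through cold-start fine-tuning combined with multi-domain data optimization, and how GRPO reinforcement learning and full-scenario alignment produce the leap in the model's "deep thinking" capability.
- Exploring key technical questions: an attempt to answer a series of closely watched core questions ...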
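For readers unfamiliar with GRPO, the core of its outcome-reward form fits in a few lines: each prompt gets a group of sampled completions, and each completion's advantage is its reward normalized by the group's mean and standard deviation, which replaces a learned critic as the baseline. The Python sketch below is illustrative only; it assumes binary correctness rewards, population standard deviation, and a small `eps` term, and it is not the code used by DeepSeek R1 or any reproduction. Full GRPO, as described in the DeepSeekMath paper, also uses a clipped policy-ratio objective and a KL penalty toward a reference model, both omitted here.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Outcome-level GRPO-style advantages for one group of completions.

    Every completion sampled for the same prompt gets
        A_i = (r_i - mean(r)) / (std(r) + eps)
    so no value network is needed as a baseline.
    Population std and the eps term are assumptions of this sketch.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Example: 4 completions for one math prompt, reward 1.0 if the final answer is correct.
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in adv])  # [1.0, -1.0, -1.0, 1.0]
```

When every completion in a group receives the same reward, all advantages are zero, so that group contributes no learning signal; this is one reason sampling several completions per prompt matters.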