Deep Thinking Models
A New Framework for Embodied Scenarios! Embodied-Reasoner: Tackling Complex Embodied Interaction Tasks
具身智能之心· 2025-06-21 12:06
Core Viewpoint
- The article presents the Embodied-Reasoner framework, which extends deep reasoning capabilities to embodied interactive tasks, addressing unique challenges such as multimodal interaction and diverse reasoning patterns [3][7][19].

Group 1: Research Background
- Recent advancements in deep reasoning models, like OpenAI's o1, have shown exceptional capabilities in mathematical and programming tasks through large-scale reinforcement learning [7].
- However, the effectiveness of these models in embodied domains requiring continuous interaction with the environment has not been fully explored [7].
- The research aims to expand deep reasoning capabilities to embodied interactive tasks, tackling challenges such as multimodal interaction and diverse reasoning patterns [7].

Group 2: Embodied Interaction Task Design
- A high-level planning and reasoning embodied task was designed, focusing on searching for hidden objects in unknown rooms rather than low-level motion control [8].
- The task environment is built on the AI2-THOR simulator, featuring 120 unique indoor scenes and 2100 objects [8].
- Four common task types were designed: Search, Manipulate, Transport, and Composite Tasks [8].

Group 3: Data Engine and Training Strategy
- A data engine was developed to synthesize diverse reasoning processes, presenting embodied reasoning trajectories in an observe-think-act format [3].
- A three-stage iterative training process was introduced, including imitation learning, rejection sampling adjustment, and reflection adjustment, enhancing the model's interaction, exploration, and reflection capabilities [3][19].
- The training corpus synthesized 9390 unique task instructions and their corresponding observe-think-act trajectories, covering 107 different indoor scenes and 2100 interactive objects [12][16].
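The observe-think-act trajectory format described above can be illustrated with a minimal sketch. The `SearchEnv` and `ScriptedPolicy` classes below are hypothetical toy stand-ins, not the paper's actual interface; only the loop structure (observe, reason, act, record) reflects the format the article describes.

```python
class SearchEnv:
    """Toy search environment: the target is found after a fixed number of moves."""
    def __init__(self, steps_to_find=3):
        self.steps_to_find = steps_to_find
        self.t = 0

    def reset(self):
        self.t = 0
        return "room: unexplored"

    def step(self, action):
        self.t += 1
        done = self.t >= self.steps_to_find
        obs = "target visible" if done else f"explored {self.t} spots"
        return obs, done


class ScriptedPolicy:
    """Stand-in policy: emits a canned thought, then an action."""
    def think(self, observation, trajectory):
        return f"I see '{observation}'; keep searching."

    def act(self, thought):
        return "search_next_area"


def run_episode(env, policy, max_steps=20):
    """Collect one observe-think-act trajectory as a list of triples."""
    trajectory = []
    observation = env.reset()
    for _ in range(max_steps):
        thought = policy.think(observation, trajectory)  # reasoning step
        action = policy.act(thought)                     # chosen action
        trajectory.append((observation, thought, action))
        observation, done = env.step(action)
        if done:
            break
    return trajectory


traj = run_episode(SearchEnv(), ScriptedPolicy())
```

Each recorded triple corresponds to one observe-think-act step; a data engine in this spirit would serialize such trajectories as training text.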
Group 4: Experimental Results
- The model demonstrated significant advantages over existing advanced models, particularly in complex long-horizon tasks, showing more consistent reasoning and more efficient search behavior [3][18].
- In real-world experiments, Embodied-Reasoner achieved a success rate of 56.7% across 30 tasks, outperforming OpenAI's o1 and o3-mini [17].
- The model's success rate improved by 9%, 24%, and 13% over OpenAI o1, OpenAI o3-mini, and Claude-3.7-Sonnet-thinking, respectively [18].

Group 5: Conclusion and Future Work
- The research successfully extends the deep reasoning paradigm to embodied interactive tasks, demonstrating enhanced interaction and reasoning capabilities, especially in complex long-horizon tasks [19].
- Future work may explore applying the model to a wider variety of embodied tasks and improving its generalization and adaptability in real-world environments [19].
An "AI Marathon" for the Wenxin Large Models
机器之心· 2025-05-22 10:25
Core Viewpoint
- Baidu's strategy of balancing long-term commitment with flexible technological adaptation is seen as a key to success in the current technological revolution [1][41].

Group 1: Model Development and Innovation
- The importance of model capabilities will remain significant through 2025 [2].
- Despite concerns about the exhaustion of pre-training data, there are still vast resources of multimodal data, such as images and videos, to be explored [3].
- Reinforcement learning is revitalizing the Scaling Law, leading to advancements in reasoning models for complex tasks like mathematics and coding [4].
- Continuous investment in foundational model research is essential for AI companies, with Baidu being a significant player in this field [5].
- Baidu's Wenxin models have evolved through various enhancements, leading to the development of Wenxin 4.5 Turbo and Wenxin X1 Turbo, showcasing Baidu's commitment to foundational research and adaptability in a rapidly changing AI environment [5][10].

Group 2: Performance and Evaluation
- At the recent Baidu AI Day, the performance of Wenxin X1 Turbo was demonstrated, showcasing its ability to integrate multimodal information for problem-solving [7].
- Wenxin X1 Turbo outperformed DeepSeek R1 and V3 in authoritative benchmark tests, validating its capabilities [10].
- The China Academy of Information and Communications Technology (CAICT) rated Wenxin X1 Turbo as the first domestic model to achieve a "4+" level in a comprehensive evaluation, excelling in logical reasoning and tool support [12][14].

Group 3: Cost Efficiency and Market Position
- Wenxin X1 Turbo's pricing strategy positions it at 25% of DeepSeek R1's cost, making it highly competitive and appealing to developers [17][20].
- Baidu's models are designed to be cost-effective, which is crucial for fostering a thriving ecosystem of AI applications [40].
Group 4: Technological Advancements
- Baidu has been a pioneer in multimodal research since 2018, leading to significant advancements in deep semantic understanding [22].
- The company has developed various technologies to enhance multimodal modeling, resulting in improved training efficiency and understanding capabilities [25][30].
- Baidu's long-term commitment to technological investment is evident in its continuous development of multimodal capabilities [27].

Group 5: Ecosystem and Collaboration
- The synergy between Wenxin and the PaddlePaddle deep learning platform is a unique aspect of Baidu's approach, enhancing model performance and efficiency [38].
- Baidu's AI ecosystem includes industry empowerment centers and data ecological centers, facilitating collaboration and data integration across various sectors [39].
Volcano Engine President Tan Dai: Many Agents' Capabilities Are Still Stuck at a Stage Comparable to L1 in Autonomous Driving
news flash· 2025-04-17 11:17
Core Viewpoint
- The industry's development direction is to create Agents with advanced reflection, planning, and autonomous decision-making capabilities, moving beyond the basic level at which many Agents currently operate [1].

Group 1: Agent Development
- The foundation for building Agents is deep thinking models, which must possess the ability to think, plan, and reflect [1].
- Agents should support multimodal capabilities, similar to human visual and auditory functions, to better handle complex tasks [1].

Group 2: Product Launch
- The Doubao 1.5 deep thinking model was officially launched, showcasing strong performance in general tasks such as mathematics, programming, scientific reasoning, and creative writing [1].
- A visual version of the deep thinking model was introduced, with visual reasoning capabilities that allow it to associate and reason about what it sees, much like a human [1].
The Future of Deep Thinking Models as Seen from DeepSeek R1 Reproductions | ML-Summit 2025
AI科技大本营· 2025-03-31 06:55
The much-anticipated 2025 Global Machine Learning Summit (ML Summit 2025) will be held on April 18-19 at the Radisson Hotel Shanghai Hongqiao Xijiao Zhuangyuan. Co-hosted by CSDN & Boolan, the summit brings together more than 50 top experts from academia and industry to discuss hands-on practice in hot AI topics such as agents, federated learning, and multimodal large models.

A longtime friend of the Global Machine Learning Summit, Zhang Junlin, Chief Scientist of Sina Weibo and head of its AI R&D department, will give a talk titled "The Future of Deep Thinking Models as Seen from DeepSeek R1 Reproductions".

Regarded as a practitioner who dissects large-model technology with exceptional clarity, Zhang delivered hard-core breakdowns of Gemini's multimodal architecture and OpenAI's o1 technology at the 2024 summit, prompting developers to exclaim that "finally someone explains the essence of the technology".

Systematically reviewing the technical landscape: surveying the various reproduction studies that followed DeepSeek R1's open-source release, covering lightweight adaptation at the SFT stage (such as S1) and innovative practice at the RL stage.

Analyzing the training paradigm in depth: focusing on the core two-stage training scheme, namely how cold-start fine-tuning combined with multi-domain data optimization is used for SFT, and how GRPO reinforcement learning with all-scenario alignment achieves the leap to "deep thinking" capability.

Examining key technical questions: attempting to answer a series of closely watched core questions ...
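The GRPO reinforcement learning stage mentioned above scores each group of sampled completions relative to one another rather than against a learned value critic. A minimal sketch of that group-relative advantage computation follows; the function name and the toy reward values are illustrative, not from DeepSeek's code.

```python
import numpy as np


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each sampled completion's
    reward by its group's mean and standard deviation, so no
    separate value critic is needed."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


# Four completions sampled for one prompt, scored by a reward model.
adv = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Completions scoring above the group mean receive positive advantages and are reinforced; those below the mean are penalized, which is what drives the policy toward longer, more deliberate "deep thinking" traces when those traces earn higher rewards.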