Say Goodbye to Passive Perception! DriveAgent-R1: An Advanced Hybrid-Thinking Agent with Active Visual Exploration
自动驾驶之心· 2025-08-01 07:05
Core Insights
- DriveAgent-R1 is an advanced autonomous driving agent designed to tackle long-horizon, high-level behavioral decision-making, leveraging a hybrid thinking framework and active perception to strengthen decision-making in complex environments [3][4][32].

Innovation and Methodology
- DriveAgent-R1 introduces two core innovations: a novel three-stage progressive reinforcement learning strategy and a mode-grouping algorithm (MP-GRPO) that sharpens the agent's dual-mode specialization, laying the groundwork for autonomous exploration [4][13].
- The agent's decision-making is driven by active perception: it proactively seeks information to reduce uncertainty, which is crucial for safe and reliable driving [5][6][32].

Performance Metrics
- DriveAgent-R1 achieved state-of-the-art (SOTA) performance on the challenging SUP-AD dataset, surpassing leading multimodal models such as Claude Sonnet 4 and Gemini 2.5 Flash [4][13][27].
- With visual tools enabled, the model showed significant gains in accuracy: first-frame accuracy increased by 14.2% and sequence-average accuracy by 15.9% [27][28].

Training Strategy
- The training strategy consists of three phases: dual-mode supervised fine-tuning (DM-SFT), forced comparative mode reinforcement learning (FCM-RL), and adaptive mode selection reinforcement learning (AMS-RL), which together teach the agent to choose the optimal thinking mode for the context [24][30].
- This gradual training approach turned the potential distraction of visual tools into a performance amplifier, significantly improving the agent's decision-making capabilities [28][30].

Active Perception and Visual Tools
- Active perception is built into DriveAgent-R1 through a robust visual toolkit that lets the agent actively explore its environment, enhancing its perceptual robustness [5][19].
- The visual toolkit includes high-resolution view retrieval, region-of-interest inspection, depth estimation, and 3D object detection, which together improve the agent's ability to make informed decisions under uncertainty [19][20].

Experimental Results
- Experiments confirmed that reinforcement learning (RL) is critical to unlocking the agent's potential: RL-trained variants significantly outperform those trained solely through supervised fine-tuning [29][30].
- DriveAgent-R1's performance relies heavily on visual input; accuracy drops drastically when visual information is removed, underscoring the importance of its active perception mechanism [31].
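The hybrid-thinking dispatch summarized above — efficient text-only reasoning for simple scenes, tool-assisted reasoning when the scene is uncertain — can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `Observation` class, `select_mode` threshold, and the four toolkit entries (mirroring the tools named in the article) are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum

class ThinkingMode(Enum):
    TEXT = "text"   # efficient text-based reasoning
    TOOL = "tool"   # in-depth, tool-assisted reasoning

@dataclass
class Observation:
    frames: list         # camera frames in the current window
    uncertainty: float   # agent's estimated scene uncertainty in [0, 1]

# Hypothetical stand-ins for the four visual tools named in the article.
VISUAL_TOOLS = {
    "high_res_view": lambda obs: f"retrieved {len(obs.frames)} high-res views",
    "roi_inspect":   lambda obs: "zoomed into region of interest",
    "depth_estimate": lambda obs: "per-pixel depth map",
    "detect_3d":     lambda obs: "3D bounding boxes",
}

def select_mode(obs: Observation, threshold: float = 0.5) -> ThinkingMode:
    """Adaptive mode selection: simple scenes stay in text mode;
    uncertain scenes trigger active perception via visual tools."""
    return ThinkingMode.TOOL if obs.uncertainty > threshold else ThinkingMode.TEXT

def decide(obs: Observation) -> dict:
    mode = select_mode(obs)
    evidence = []
    if mode is ThinkingMode.TOOL:
        # Actively query tools to reduce uncertainty before committing to a plan.
        evidence = [tool(obs) for tool in VISUAL_TOOLS.values()]
    return {"mode": mode.value, "evidence": evidence}
```

The key design point this illustrates is that tool calls are conditional: the agent pays the cost of active perception only when its own uncertainty warrants it.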
The Autonomous Driving Agent Is Here! DriveAgent-R1: An Agent with Intelligent Thinking and Active Perception (Shanghai Qi Zhi Institute & Li Auto)
自动驾驶之心· 2025-07-29 23:32
Core Viewpoint
- DriveAgent-R1 represents a significant advance in autonomous driving technology, addressing long-horizon, high-level decision-making through a hybrid thinking framework and an active perception mechanism [2][31].

Group 1: Innovations and Challenges
- DriveAgent-R1 introduces two core innovations: a novel three-stage progressive reinforcement learning strategy and MP-GRPO (Mode-Grouped Reinforcement Policy Optimization), which strengthens the agent's dual-mode specialization [3][12].
- The potential of Vision-Language Models (VLMs) in autonomous driving is currently limited by short-sighted decision-making and passive perception, particularly in complex environments [2][4].

Group 2: Hybrid Thinking and Active Perception
- The hybrid thinking framework lets the agent adaptively switch between efficient text-based reasoning and in-depth tool-assisted reasoning according to scene complexity [5][12].
- The active perception mechanism equips the agent with a powerful visual toolbox for actively exploring the environment, improving the transparency and reliability of its decisions [5][12].

Group 3: Training Strategy and Performance
- A complete three-stage progressive training strategy is designed around dual-mode supervised fine-tuning, forced comparative mode reinforcement learning, and adaptive mode selection reinforcement learning [24][29].
- DriveAgent-R1 achieves state-of-the-art (SOTA) performance on challenging datasets, surpassing leading multimodal models such as Claude Sonnet 4 and Gemini 2.5 Flash [12][26].

Group 4: Experimental Results
- DriveAgent-R1 significantly outperforms baseline models, with first-frame accuracy up 14.2% and sequence-average accuracy up 15.9% when using visual tools [26][27].
- Giving state-of-the-art VLMs access to the visual tools also improves their decision-making, demonstrating the value of actively acquired visual information for driving intelligence [27].

Group 5: Active Perception and Visual Dependency
- Active perception fosters deep visual reliance: DriveAgent-R1's performance drops drastically when visual inputs are removed, confirming that its decisions are genuinely driven by visual data [30][31].
- The training strategy turns the potential distraction of tools into a performance amplifier, showing the importance of structured training for using visual tools effectively [27][29].
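Both articles name MP-GRPO only at a high level. A minimal sketch of the mode-grouped idea — standardizing each rollout's reward against rollouts of the *same* thinking mode, in GRPO's usual group-normalization style, so text-mode and tool-mode trajectories are not scored against each other directly — might look like the following. The function name and the exact grouping rule are assumptions for illustration, not the paper's algorithm.

```python
from statistics import mean, pstdev

def mode_grouped_advantages(rollouts):
    """Compute a GRPO-style advantage for each rollout, but normalize
    within its own thinking-mode group rather than across all rollouts.

    `rollouts` is a list of (mode, reward) pairs, e.g. ("text", 1.0).
    Returns one advantage per rollout, in input order.
    """
    # Bucket rewards by thinking mode.
    groups = {}
    for mode, reward in rollouts:
        groups.setdefault(mode, []).append(reward)

    # Per-group mean and (population) standard deviation.
    stats = {m: (mean(rs), pstdev(rs)) for m, rs in groups.items()}

    # Standardize each reward against its own mode's statistics.
    advantages = []
    for mode, reward in rollouts:
        mu, sigma = stats[mode]
        advantages.append((reward - mu) / (sigma + 1e-8))
    return advantages
```

With this grouping, a mediocre tool-assisted rollout is not unfairly rewarded just because tool use tends to score higher on hard scenes (or vice versa), which is one plausible reading of why mode grouping would preserve dual-mode specialization during RL.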