ZTE's Cui Li: AI Applications Reach the Industry's Deep Waters as the Value Loop Nears Completion
With the rapid development of large AI models, the evolution from infrastructure to upper-layer applications is becoming the key battleground in a new round of technological competition. One industry view holds that the number of foundation models will keep converging toward the single digits, while a far richer set of domain-specific models and applications will grow up around thousands of industries; that is where this AI wave will truly drive technological change.

Physical AI has become an important window onto this shift, accelerating progress in embodied intelligence, autonomous driving, and related fields, with the potential to profoundly change how future society operates. But technical routes remain contested, and the softer foundations of law, compliance, and ethics are still being laid. Even in the "Year of the Agent", challenges remain before AI truly reaches the "deep waters" of the real economy.

In an interview with a reporter from 21st Century Business Herald, ZTE Chief Development Officer Cui Li analyzed in depth where physical AI's technical routes are heading. By her observation, some specific industries are already genuinely leveraging AI and have been the first to close the value loop.

The Physical AI Debate

In early 2024, the sudden arrival of Sora, with its ability to generate video that closely reproduces the physical world, sparked broad discussion of "world models" and surfaced the competition between the two core routes of physical AI: world models and VLA (vision-language-action) models.

Cui Li told the reporter that the breakout of models such as Sora marks AI's evolution from a mere "predictor" into a "simulator": a paradigm shift from "data-driven" to "model-simulation-driven" to "physically aligned" to "general simulation", and also AI's ...
After Reading 40 VLA+RL Papers...
具身智能之心· 2025-11-28 00:04
Core Insights
- The article surveys the shift in research toward incorporating reinforcement learning (RL) into vision-language-action (VLA) models, moving beyond supervised fine-tuning (SFT) to improve performance and adaptability [1][2].

Group 1: RL Methodologies
- RL methods are categorized as online RL, offline RL, iterative RL, and inference-time improvement, though the author stresses that what a method actually achieves matters more than how it is classified [1].
- Real-world applicability is crucial, with safety and efficiency the key concerns during data collection and model deployment [2].

Group 2: Task Performance and Challenges
- Current RL implementations show promising single-task results; for example, Pi-star-0.6 needs roughly 1,000 trajectories for complex tasks such as folding clothes [3].
- A significant open challenge is making RL handle multiple tasks effectively, so that tasks reinforce rather than degrade one another [3].

Group 3: Reward Functions and Research Directions
- Whether reward or value functions need to be learned is debated; reduced variance during optimization is a key benefit, though the need may diminish as pre-trained VLA models improve [4][5].
- Identified research directions include sparse rewards, the scale of policy networks, and the multi-task capability of RL [5].

Group 4: Literature and Keywords
- A list of relevant literature and keywords is provided for further exploration, indicating a rich field of study within RL and VLA [6].
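The SFT-versus-RL contrast that runs through this summary can be illustrated with a toy one-parameter policy. Everything below (the reward landscape, demo action, learning rates, sample counts) is a hypothetical sketch for intuition, not drawn from any of the surveyed papers:

```python
import random

# Toy sketch (all numbers hypothetical): a one-parameter "policy" trained
# two ways. SFT imitates a fixed demonstration; a REINFORCE-style RL update
# instead climbs a reward signal, so it can surpass the demonstrator.

def sft_update(theta, demo_action, lr=0.1):
    """Supervised fine-tuning step: squared-error imitation of the demo."""
    grad = 2.0 * (theta - demo_action)        # d/d_theta of (theta - demo)^2
    return theta - lr * grad                  # gradient descent on the loss

def rl_update(theta, reward_fn, lr=0.1, sigma=0.5, n_samples=256):
    """REINFORCE step for a Gaussian policy N(theta, sigma^2): weight each
    sampled perturbation by its reward (score-function gradient estimate)."""
    grad_est = 0.0
    for _ in range(n_samples):
        eps = random.gauss(0.0, sigma)
        # d log N(theta + eps; theta, sigma^2) / d theta == eps / sigma^2
        grad_est += reward_fn(theta + eps) * eps / sigma**2
    return theta + lr * grad_est / n_samples  # gradient *ascent* on reward

random.seed(0)
reward = lambda a: -(a - 2.0) ** 2            # true optimum at a = 2.0
theta_sft = theta_rl = 0.0
for _ in range(200):
    theta_sft = sft_update(theta_sft, demo_action=1.0)  # demo is suboptimal
    theta_rl = rl_update(theta_rl, reward)

# theta_sft settles near the demo (~1.0); theta_rl climbs toward ~2.0
```

The point of the toy: imitation converges to whatever the demonstration shows, while the RL update follows the reward past it, which is the usual argument for layering RL on top of an SFT-initialized VLA policy.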
Lou Tiancheng: VLA Can't Help L4
自动驾驶之心· 2025-11-15 16:04
Core Insights
- The article discusses advances in autonomous driving technology, focusing on the transition from Level 2 (L2) to Level 4 (L4) autonomy and the complexity and safety challenges involved [5][19][21].

Group 1: Technological Advancements
- PonyWorld, a world-model technology, enhances the safety of the Robotaxi, making it ten times safer than human drivers [9].
- The cost of the autonomous-driving kit has dropped 70% versus previous generations, with all components now vehicle-grade [8][30].
- Perception, prediction, and control have been integrated into an end-to-end model, now standard for L4 vehicles and a requirement for L2 vehicles [15][16].

Group 2: Learning Models
- Two learning modes are contrasted: imitation learning, which is fast but caps the learner at the teacher's level, and reinforcement learning, which allows exploration and surpassing the teacher [12].
- L4 companies are evolving through reinforcement learning, while L2 remains within the bounds of imitation learning [12][21].

Group 3: Market and Product Development
- Bringing L4 technology to personal vehicles is expected to take longer than anticipated, with significant operational and regulatory challenges still to be addressed [22].
- The Robotaxi fleet has logged over 500,000 hours of operation, a significant step toward practical deployment [29].
- The company aims to cut costs through vehicle-grade components and by eliminating the human driver, a significant milestone in the development of autonomous vehicles [33].

Group 4: Industry Perspectives
- Vision-language-action (VLA) models are argued to fall short for L4; specialized models are needed to meet autonomous driving's extreme safety requirements [17].
- The author compares the current state of embodied intelligence to autonomous driving in 2018, implying a similar need for patience and long-term development [26].
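Both summaries lean on reinforcement learning in practice, and Group 3 of the first summary notes that the main argued benefit of learning a reward or value function is variance reduction. That effect can be shown with a toy score-function estimator in which a constant baseline stands in for a learned value function; the reward landscape and baseline value below are hypothetical:

```python
import random
import statistics

# Toy sketch of the variance-reduction argument for learned value/baseline
# functions: subtracting a baseline from the reward leaves the expected
# score-function gradient unchanged but can shrink its variance a lot
# when the reward carries a large constant offset.

def grad_samples(baseline, n=2000, theta=0.0, sigma=1.0):
    """Single-sample REINFORCE gradient estimates for a Gaussian policy."""
    out = []
    for _ in range(n):
        eps = random.gauss(0.0, sigma)
        reward = 100.0 - (theta + eps) ** 2   # large constant offset in reward
        # (reward - baseline) * d log N(a; theta, sigma^2) / d theta
        out.append((reward - baseline) * eps / sigma**2)
    return out

random.seed(1)
no_baseline = grad_samples(baseline=0.0)
with_baseline = grad_samples(baseline=99.0)   # roughly the mean reward

# Both estimators are unbiased (the true gradient at theta = 0.0 is 0),
# but the baselined one has a far smaller variance:
var_no = statistics.variance(no_baseline)
var_b = statistics.variance(with_baseline)
```

A learned value function plays the same role as the constant here, but state-dependently, which is why the debate in the survey is about whether ever-better pre-trained VLA initializations make that extra learned component worth its cost.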