VLA Models
Comparative Research Series: AI Intelligent Driving 2.0, Toward Emergent Intelligence
Ping An Securities· 2025-11-24 12:22
Automobiles | November 24, 2025 | Comparative Research Series: AI Intelligent Driving 2.0, Toward Emergent Intelligence | Rating: Outperform (maintained)

Securities analysts: Wang De'an (investment advisory qualification no. S1060511010006, BQV509, WANGDEAN002@pingan.com.cn); Wang Genhai (investment advisory qualification no. S1060523080001, BVG944, WANGGENHAI964@pingan.com.cn)

Ping An's view: The evolution of intelligent driving has moved from the establishment of the end-to-end paradigm in 2024 into a phase in which AI-driving 2.0 capabilities are delivered at scale. With stronger models and more diverse training data, intelligent-driving systems may develop emergent abilities to handle extreme edge cases on their own, pushing these systems further toward a closed commercial loop.

Table of contents:
I. New software and hardware iterations of Tesla's intelligent driving
II. China's high-level intelligent driving enters its third stage
III. Technical architecture: iteration directions of the leading intelligent-driving players
  3.1 The VLA direction
  3.2 Huawei ADS 4.0, targeting L3
  3.3 Horizon Robotics & Momenta emphasize one-stage end-... (truncated)
Yuanrong Qixing to Launch Robotaxi Soon, Debuting First in Shenzhen and Wuxi
Xin Lang Cai Jing· 2025-11-22 05:23
Core Insights
- The article discusses the growth and strategic direction of Yuanrong Qixing, a Shenzhen-based autonomous driving company, as it prepares to enter the Robotaxi market with a focus on mass production and advanced driving assistance systems [1][11].

Group 1: Market Position and Growth
- Yuanrong Qixing has delivered 200,000 mass-produced vehicles equipped with urban NOA (Navigation on Autopilot), achieving a nearly 40% market share in the third-party supplier market for urban NOA as of October 2025 [2][6].
- The company has seen a significant increase in monthly delivery volumes, rising from approximately 3,000 units at the beginning of the year to around 30,000 units by year-end, with both September and October reaching 30,000 units [2][6].
- The CEO highlighted that the growth is driven by a strategic focus on core customers and models, emphasizing the importance of creating blockbuster models rather than a wide range of partnerships [2][3].

Group 2: Future Projections and Market Trends
- The autonomous driving market is projected to grow from $6.25 billion in 2024 to $19.28 billion in 2025, with expectations to reach $60.33 billion by 2030 [6].
- The penetration rate of L2-and-above autonomous driving passenger vehicles in China is expected to rise from 55.7% in 2024 to 65% in 2025 [6].
- Yuanrong Qixing aims to achieve over one million units in mass-production deliveries by 2026, supported by existing customer projects [6][12].

Group 3: Technological Advancements
- The company has transitioned to a VLA (Vision-Language-Action) model, which enhances scene understanding and defensive driving capabilities, making it the first third-party supplier to offer this model [7][8].
- The VLA model allows for continuous evolution through reinforcement learning, improving performance in complex scenarios compared to traditional models [8][12].
- The company has completed the licensing exam for Robotaxi operations using mass-production vehicles and end-to-end technology, marking a significant step in its strategy [13][14].

Group 4: Robotaxi Business Strategy
- Yuanrong Qixing plans to launch its Robotaxi service in select cities, starting with Wuxi and Shenzhen, focusing on specific operational areas [11][14].
- The company aims to differentiate itself by using mass-produced vehicles rather than modified cars with high-precision maps, which allows for broader operational capabilities [11][14].
- The CEO believes that the data-driven approach will enable the company to develop a more mature foundation model for Robotaxi operations by 2026 [13][14].
Latest from CUHK: Deploying VLA Models Without Fine-Tuning
具身智能之心· 2025-11-20 04:02
Authors: Zhuo Li et al. Editor: 具身智能之心

Problem analysis: VLA models have shown great potential on real-world robotic manipulation tasks, yet pre-trained VLA policies still suffer significant performance degradation when deployed on downstream tasks. Fine-tuning can alleviate this, but it depends on costly demonstration collection and heavy computation, which is impractical in real-world settings. The work proposes VLA-Pilot, a plug-and-play inference-time policy steering method that enables zero-shot deployment of pre-trained VLA models without extra fine-tuning or data collection.

VLA-Pilot was evaluated on six real-world downstream manipulation tasks across two robot embodiments, covering both in-distribution and out-of-distribution scenarios. The results show that VLA-Pilot substantially improves the success rate of off-the-shelf pre-trained VLA policies and generalizes robustly zero-shot across diverse tasks and embodiments. Videos and code: https://rip4kobe.github.io/vla-pilot/.

Background & contributions: In recent years, advances in VLA models have significantly improved ... (truncated)
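The summary above does not describe the steering step itself, so the following is only a generic sketch of what inference-time policy steering can look like: sample several action candidates from the frozen pre-trained policy and execute the one an external scorer prefers. The `policy` and `scorer` objects and their methods are hypothetical illustrations, not VLA-Pilot's actual algorithm.

```python
import numpy as np

def steer_at_inference(policy, scorer, observation, instruction, n_candidates=8):
    """Generic inference-time steering: re-rank action candidates from a frozen policy.

    `policy.sample(...)` and `scorer(...)` are hypothetical interfaces used only to
    illustrate the pattern of improving a pre-trained VLA at deployment time without
    any fine-tuning; VLA-Pilot's actual steering criterion may differ.
    """
    candidates = [policy.sample(observation, instruction) for _ in range(n_candidates)]
    scores = np.array([scorer(observation, instruction, a) for a in candidates])
    return candidates[int(np.argmax(scores))]  # execute the highest-scoring candidate
```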
Solving Tesla's "Sparse Supervision" Problem: Using World Models to Amplify the Scaling Law of Autonomous Driving
具身智能之心· 2025-11-20 00:03
Editor: 机器之心

In autonomous driving, large VLA models are moving from the academic frontier into the "deep water" of industrial deployment. In a recent ICCV talk, Tesla publicly named one of its core challenges: sparse supervision.

The problem strikes at the weak point of today's VLA models: their input is a high-dimensional, dense stream of visual information, while their supervision signal is usually a low-dimensional, sparse driving action (such as waypoints). Even with petabytes of data, the enormous potential of VLA models cannot be released effectively.

Just as the industry was debating this bottleneck, a team from leading domestic academic institutions working with Huawei quietly offered a way through. A new paper, "DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving", provides an insightful solution to the sparse-supervision problem. The study argues that the world model is the key to unlocking the VLA data scaling law (D ... (truncated)
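As a rough illustration of the idea named in the paper's title (dense world-model supervision complementing sparse action labels), and not the DriveVLA-W0 implementation itself, the sketch below adds a next-frame prediction loss alongside the waypoint loss. The module names (`backbone`, `action_head`, `world_head`) and the loss weighting are assumptions.

```python
import torch.nn.functional as F

def training_step(backbone, action_head, world_head, frames, next_frame, waypoints, lam=0.5):
    """Sketch: dense world-model supervision on top of sparse action supervision.

    `backbone`, `action_head`, and `world_head` are hypothetical modules; the real
    DriveVLA-W0 architecture and losses may differ. The point is that predicting the
    next observation supplies a training signal for every pixel, complementing the
    handful of supervised waypoints.
    """
    feats = backbone(frames)                                 # shared visual features
    action_loss = F.l1_loss(action_head(feats), waypoints)   # sparse supervision (waypoints)
    world_loss = F.mse_loss(world_head(feats), next_frame)   # dense supervision (next frame)
    return action_loss + lam * world_loss                    # weighted combination (assumed)
```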
The Physical Intelligence Team Officially Releases π*0.6! VLA + Reinforcement Learning Training
具身智能之心· 2025-11-19 00:34
Author: the Physical Intelligence team. Editor: 具身智能之心

On November 17, the Physical Intelligence team officially released π*0.6, a VLA that learns from experience.
Project link: https://www.pi.website/blog/pistar06
Paper link: https://www.pi.website/download/pistar06.pdf

How can a VLA model improve itself through reinforcement learning in real-world deployment? The team proposes a general method, RECAP (reinforcement learning with experience and corrections via advantage-conditioned policies), which trains the VLA model through an advantage-conditioning mechanism.

The method folds heterogeneous data into the self-improvement process, including demonstration data, data collected online, and expert teleoperated interventions during autonomous execution. RECAP first pre-trains a generalist VLA model with offline reinforcement learning; that model can then be specialized for downstream tasks through on-robot data collection. Experiments show ... (truncated)
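To make the advantage-conditioning idea above concrete, here is a minimal sketch of one possible training step under assumed interfaces; it is not Physical Intelligence's code, and RECAP's actual losses, conditioning scheme, and data mixture may differ.

```python
import torch
import torch.nn.functional as F

def advantage_conditioned_step(policy, value_fn, batch, optimizer):
    """One sketched update of advantage-conditioned policy training.

    `policy(obs, instr, flag)` and `value_fn(obs)` are hypothetical interfaces;
    the real RECAP method may condition and weight samples differently.
    """
    obs, instr, actions, reward_to_go = batch
    with torch.no_grad():
        advantage = reward_to_go - value_fn(obs)   # was this chunk better than expected?
        flag = (advantage > 0).float()             # coarse "good behaviour" indicator
    pred = policy(obs, instr, flag)                # policy conditioned on the indicator
    loss = F.mse_loss(pred, actions)               # imitate actions, conditioned on advantage
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```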
The Physical Intelligence Team Officially Releases π*0.6
自动驾驶之心· 2025-11-19 00:03
Core Insights
- The article discusses the release of the π*0.6 VLA model by the Physical Intelligence team, which utilizes a novel reinforcement learning method called RECAP to enable self-improvement in real-world deployments [2][4][10].

Summary by Sections

Introduction to VLA and RECAP
- The VLA model is designed to learn from experience and improve its performance through RECAP, which integrates heterogeneous data sources including demonstration data, online collected data, and expert interventions during autonomous execution [4][7].

Methodology
- RECAP combines offline reinforcement learning for pre-training the VLA model with further training on data collected during deployment. This method aims to enhance the model's robustness and operational efficiency by integrating feedback from various sources [7][10][11].

Training Process
- The training process involves three main steps, repeated to optimize the VLA model: data collection, value-function training, and advantage-conditioned training [11][12][13].
- Data collection involves running the VLA model on tasks and labeling the results with reward values, with the option for human intervention to correct early errors [12].
- The value function is trained on all collected data to detect faults and estimate the time required for task completion [13][19].
- Advantage-conditioned training improves the VLA policy by incorporating optimality signals derived from the value function [13][19].

Applications and Performance
- RECAP has been applied to complex tasks such as folding clothes, assembling boxes, and making espresso coffee. The model demonstrated significant performance improvements, achieving more than double the throughput and reducing failure rates by approximately 50% on challenging tasks [10][28][30].
- The model's robustness was validated through real-world deployments, where it operated for extended periods without interruption [10][30].

Experimental Analysis
- The article details the tasks evaluated in experiments, including clothing folding, coffee making, and box assembly, each with specific success criteria [23][24][25].
- Results showed that RECAP significantly improved both throughput and success rates across all tasks, with the most notable gains on diverse clothing folding and coffee making [28][30][32].

Future Directions
- The article identifies areas for improvement in the RECAP system, including automating reward feedback and interventions, as well as exploring more sophisticated exploration mechanisms [36].
- It also suggests that transitioning to a fully online reinforcement learning framework could improve the efficiency of VLA training [36].
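The three-step cycle summarized above can be written as a single outer loop; the interfaces and reward labeling below are assumptions for illustration, not the published RECAP pipeline.

```python
def recap_style_loop(policy, value_fn, robot, tasks, label_outcome, n_rounds=3):
    """Sketch of the collect / fit-value / retrain cycle, with hypothetical interfaces
    (`robot.run`, `value_fn.fit`, `policy.fit_advantage_conditioned`); the real RECAP
    pipeline differs in its data handling, reward labels, and training details.
    """
    dataset = []
    for _ in range(n_rounds):
        # Step 1: data collection: run the policy (optionally with human corrections)
        # and label each episode with a reward, e.g. success/failure and time taken.
        for task in tasks:
            episode = robot.run(policy, task)
            episode.reward = label_outcome(episode)
            dataset.append(episode)
        # Step 2: value-function training on all data collected so far.
        value_fn.fit(dataset)
        # Step 3: advantage-conditioned training of the policy.
        policy.fit_advantage_conditioned(dataset, value_fn)
    return policy
```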
A Year of Working on Autonomous Driving VLA
自动驾驶之心· 2025-11-19 00:03
Core Viewpoint
- The article discusses the emergence and significance of Vision-Language-Action (VLA) models in the autonomous driving industry, highlighting their potential to unify perception, reasoning, and action in a single framework, thus addressing the limitations of previous models [3][10][11].

Summary by Sections

What is VLA?
- VLA models are described as multimodal systems that integrate vision, language, and actions, allowing for a more comprehensive understanding of and interaction with the environment [4][7].
- The concept originated in robotics and was popularized in the autonomous driving sector due to its potential to enhance interpretability and decision-making capabilities [3][9].

Why VLA Emerged
- The evolution of autonomous driving can be categorized into several phases: modular systems, end-to-end models, and Vision-Language Models (VLM), each with its own limitations [9][10].
- VLA models emerged as a solution to the shortcomings of previous approaches, providing a unified framework that enhances both understanding and action execution [10][11].

VLA Architecture Breakdown
- The VLA model architecture consists of three main components: input (multimodal data), processing (integration of inputs), and output (action generation) [12][16].
- Inputs include visual data from cameras, sensor data from LiDAR and RADAR, and language inputs for navigation and interaction [13][14].
- The processing layer integrates these inputs to generate driving decisions, while the output layer produces control commands and trajectory planning [18][20].

Development History of VLA
- The article outlines the historical context of VLA development, emphasizing its role in advancing autonomous driving technology by addressing the need for better interpretability and action alignment [21][22].

Key Innovations in VLA Models
- Recent models like LINGO-1 and LINGO-2 focus on integrating natural-language understanding with driving actions, allowing for more interactive and responsive driving systems [22][35].
- Innovations include the ability to explain driving decisions in natural language and to follow complex verbal instructions, enhancing user trust and system transparency [23][36].

Future Directions
- The article raises questions about the necessity of language in future VLA models, suggesting that as technology advances, the role of language may evolve or diminish [70].
- It emphasizes the importance of continuous learning and innovation in the field to keep pace with technological advancements and user expectations [70].
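As an illustration of the input/output breakdown described above, here is a small, hypothetical data-structure sketch of what a VLA driving stack consumes and produces; the field names are assumptions, not any specific vendor's interface.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple
import numpy as np

@dataclass
class VLAInput:
    """The three input modalities listed above."""
    camera_frames: List[np.ndarray]      # dense visual stream from surround cameras
    lidar_points: Optional[np.ndarray]   # optional LiDAR/RADAR point cloud
    instruction: str                     # language input, e.g. a navigation command

@dataclass
class VLAOutput:
    """The output layer listed above."""
    waypoints: List[Tuple[float, float]]   # planned trajectory in ego coordinates
    control: Tuple[float, float, float]    # (steering, throttle, brake) command
    explanation: str                       # natural-language rationale for the decision
```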
Understanding World Models Without Jargon: From Everyday Prediction to Autonomous Driving
自动驾驶之心· 2025-11-14 00:04
Group 1
- The core concept of the article is the definition and function of the "world model," which predicts future scenarios based on past sensory data, similar to how humans anticipate events in daily life [2][3][30].
- The world model operates by taking various forms of input, such as images, sounds, and sensor data, and outputs predictions about future states, emphasizing the importance of recognizing patterns and making forecasts [4][30].
- The distinction between world models and neural networks is highlighted: neural networks serve as tools for recognition and imitation, while world models are the core that enables prediction and understanding [5][10][30].

Group 2
- The article discusses the limitations of creating a "universal" world model due to the vast differences in rules and requirements across various scenarios, leading to the necessity for specialized models [11][12][30].
- Various specialized world models are introduced, including video generation, music generation, game, and industrial production models, each focusing on specific domains to achieve precise predictions [12][14][18][30].
- The autonomous driving world model is described as the most stringent type, as its predictions directly impact safety, requiring rapid response times and high accuracy [18][22][30].

Group 3
- The VLA model is presented as an enhanced version of the autonomous driving world model, incorporating language logic to improve the prediction of actions based on user commands and traffic rules [23][26][30].
- The article concludes that the future of world models lies in becoming more specialized rather than universal, focusing on improving prediction accuracy and speed in specific scenarios [29][30].
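To ground the "predict the future from past inputs" definition above, here is a deliberately tiny, hypothetical example of the interface a world model exposes; real driving world models predict future sensor observations or occupancy rather than a single point.

```python
import numpy as np

class ToyWorldModel:
    """A toy 'world model': given past states and an action, predict the next state.

    Here the world is just a point moving at roughly constant velocity; a real
    driving world model would predict future camera frames or scene occupancy.
    """
    def predict(self, past_states: np.ndarray, action: np.ndarray, dt: float = 0.1) -> np.ndarray:
        velocity = (past_states[-1] - past_states[-2]) / dt   # infer motion from history
        return past_states[-1] + (velocity + action) * dt     # extrapolate one step ahead

# Usage: two past positions plus a small lateral action -> predicted next position.
model = ToyWorldModel()
history = np.array([[0.0, 0.0], [1.0, 0.0]])
print(model.predict(history, action=np.array([0.0, 0.2])))
```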
A New Autonomous Driving Paradigm Emerges at ICCV: Unified World Model + VLA, Moving Toward L4 with a Closed Training Loop
量子位· 2025-11-08 04:10
Core Viewpoint
- The article discusses the shift in the autonomous driving industry from a data-driven approach to a training-driven approach, emphasizing the importance of world models and reinforcement learning in achieving Level 4 (L4) autonomy [2][4][6].

Group 1: Transition from Data Loop to Training Loop
- The current data loop is insufficient for advancing autonomous driving technology, necessitating a shift to a training loop that allows for continuous model iteration through environmental feedback [4][11].
- Li Auto's (referred to below as "Ideal") approach involves building a world-model training environment in the cloud, which integrates prior knowledge and driving capabilities into the vehicle's VLA model [11][30].
- The world model encompasses environment construction, agent modeling, feedback mechanisms, and various scenario simulations, which are crucial for the training loop [13][31].

Group 2: Simulation and Evaluation Techniques
- Ideal employs a combination of reconstruction and generation techniques for simulation, allowing for both stable and dynamic outputs [14][15][16].
- The Hierarchy UGP model, developed in collaboration with academic institutions, achieves state-of-the-art results in large-scale dynamic scene reconstruction [21][19].
- The focus on synthetic data generation enhances the diversity and complexity of training scenarios, improving model performance [25][24].

Group 3: Reinforcement Learning and Challenges
- The reinforcement-learning world engine enables models to explore training environments and receive feedback, with five key factors influencing its effectiveness [25][27].
- Simulating interactions between multiple agents poses significant challenges, with Ideal exploring self-play and reward-function adjustments to enhance sample diversity [27][29].

Group 4: Commercialization and Technological Advancements
- Ideal has successfully established a profitable business model, which supports its ongoing research and development efforts, with over 10 billion yuan invested in the self-developed Star Ring OS [32][33].
- The Star Ring OS enhances vehicle performance by streamlining communication between different control systems, significantly reducing braking distances [35][36].
- The open-source initiative for the Star Ring OS is expected to benefit the entire industry by reducing development costs for other automakers [39][40].

Group 5: Industry Position and Future Outlook
- Ideal is positioning itself as a leading player in the AI-driven automotive sector, with a stated goal of becoming a "space robotics company" [48][50].
- The company has established a research-to-production closed loop, allowing research findings to be applied to production rapidly, as exemplified by the DriveVLM project [52].
- The article concludes that while many companies are investing in AI and robotics, few have achieved the comprehensive capabilities demonstrated by Ideal and Tesla [53].
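As a sketch of the training-loop idea described above (a policy acting inside a simulated world-model environment and learning from its feedback), the following example uses assumed interfaces (`world_env.reset/step`, `policy.act/update`) and is not Li Auto's actual system.

```python
def closed_training_loop(policy, world_env, reward_fn, n_iters=1000):
    """Sketch of a training loop, as opposed to a data loop: the policy is rolled out
    in a world-model-based simulator and updated from the feedback it collects.
    All interfaces here are hypothetical placeholders.
    """
    for _ in range(n_iters):
        obs = world_env.reset()                      # scenario sampled or generated by the world model
        trajectory, done = [], False
        while not done:
            action = policy.act(obs)
            next_obs, done = world_env.step(action)  # the world model rolls the scene forward
            trajectory.append((obs, action, reward_fn(obs, action, next_obs)))
            obs = next_obs
        policy.update(trajectory)                    # reinforcement-style update from feedback
    return policy
```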