VLA Systems
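An in-depth survey that maps out the 20 major challenges of VLA: clear directions, weekly updates, and a way to keep up with the latest breakthroughs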
AI科技大本营· 2025-12-25 01:18
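[Editor's note] Vision-Language-Action (VLA) is moving robots that can "see and understand, explain clearly, and act" from demos into real systems. But the explosive growth of models, data, and paradigms has also created a practical dilemma: newcomers do not know where to start learning, and practitioners find it hard to judge along which dimensions to improve capability systematically. This latest survey, completed jointly by 树根科技, SANY Group's 耘创新实验室, King's College London, The Hong Kong Polytechnic University, the Technical University of Darmstadt, the University of Agder in Norway, Imperial College London, and other institutions, offers a clear "panorama of problems" and a learning roadmap, along with a continuously updated online reference framework.

Recently, embodied intelligence (Embodied AI) has become one of the most active frontiers in artificial intelligence and robotics, and one of those with the most room left to explore. From demos of GPT-like robot assistants to multimodal large models gradually reaching real robot platforms, "letting machines see, understand, and act" is moving from proof of concept to systematic exploration.

However, as model sizes balloon and datasets and methods keep emerging, a structural confusion is becoming more apparent within the field: researchers just entering the area often cannot tell where to begin, while practitioners already in it face a more specific question: along which dimensions, and in what order, should VLA capabilities be improved systematically? At a time of rapid expansion and diverging paths, simply listing models and methods can no longer provide ...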
Behind multiple companies' bets on VLA: are intelligent-driving routes converging?
Mei Ri Jing Ji Xin Wen· 2025-12-16 12:21
"我跟王兴兴观点最不一样的地方在于,他认为模型架构更重要,但我认为模型的关键是要与整个具身 智能系统适配。在此基础上,数据是起决定意义的。"郎咸朋认为,"VLA就是自动驾驶最好的模型方 案。" 近几年,辅助驾驶行业经历了多次"技术底座"的范式迁移——从企业普遍把激光雷达+高精地图奉为"黄 金组合",到引入BEV(鸟瞰图)+Transformer摆脱高精度地图,再到端到端将辅助驾驶带入AI(人工 智能)时代,企业普遍按照这个路径来推进辅助驾驶功能。 进入2025年,行业在辅助驾驶的发展方向上出现了VLA与世界模型的"分歧",而理想与小鹏就是选择 VLA方案的代表。 两技术派别"各执一词" 据记者了解,VLA被业内视为端到端方案的"智能增强版"。其名称中的V代表视觉感知(Vision),A代 表动作执行(Action),而中间的L则是大语言模型(Language Model)。V负责实时感知环境,A负责 输出具体控制指令,L则像"中台"一样,把感知信息转译为可供A执行的规划与决策。 清华大学车辆与运载学院助理研究员颜宏伟表示:"VLA是多模态大模型驱动的智能体架构,其核心突 破在于引入思维链,通过语言模型实现对环 ...
Li Auto Faces Headwinds
36Kr · 2025-11-27 17:40
Core Viewpoint
- Li Auto's Q3 performance declined sharply: revenue fell 36.2% year-on-year to 27.4 billion yuan, and the company posted a net loss of 624 million yuan, compared with a profit of 2.8 billion yuan in the same period last year [3][5][6]

Financial Performance
- Li Auto delivered 93,211 units in Q3, down nearly 39% year-on-year and more than 16% from the previous quarter [3][6]
- The i6, currently the best-selling model, has the lowest gross margin in the entire lineup, which points to future pressure on profitability [3][5]

Strategic Shift to AI
- Li Auto is increasingly treating AI as its strategic pivot, moving away from its initial emphasis on range-extended vehicles, which now face intense competition and market saturation [5][10]
- The company has made significant organizational changes to support its AI strategy, restructuring teams and management to improve efficiency and focus on AI development [12][13][14]

Competitive Landscape
- Li Auto faces stiff competition, with rivals such as XPeng and NIO gaining market share and Li Auto's sales declining as a result [9][10]
- The company has cut prices on its L series to relieve inventory pressure, with discounts of up to 45,000 yuan [9][10]

R&D Investment
- Li Auto plans to invest 12 billion yuan in R&D this year, with 50% allocated specifically to AI, a strong commitment to this area compared with industry averages [19]
- The precisely targeted investment in AI, particularly in the VLA model, reflects a strategic shift toward long-term technological development rather than broad-based spending [19]
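The percentages quoted in this summary can be converted back into implied absolute figures. The sketch below is plain arithmetic on the numbers stated above; the helper name `implied_prior` is hypothetical, and because "nearly 39%" is an approximation in the source, the delivery estimate is rough.

```python
def implied_prior(current: float, pct_decline: float) -> float:
    """Value before a decline of `pct_decline` percent that ended at `current`."""
    return current / (1.0 - pct_decline / 100.0)

# Figures as stated in the summary.
q3_revenue_bn = 27.4          # billion yuan
revenue_yoy_decline = 36.2    # percent
q3_deliveries = 93_211        # units
deliveries_yoy_decline = 39.0 # percent, "nearly 39%" (approximate)
rd_budget_bn = 12.0           # billion yuan
ai_share = 0.50               # share of R&D allocated to AI

prior_year_revenue_bn = implied_prior(q3_revenue_bn, revenue_yoy_decline)
prior_year_deliveries = implied_prior(q3_deliveries, deliveries_yoy_decline)
ai_rd_bn = rd_budget_bn * ai_share

print(f"Implied revenue a year earlier: about {prior_year_revenue_bn:.1f} billion yuan")
print(f"Implied deliveries a year earlier: roughly {prior_year_deliveries:,.0f} units")
print(f"R&D allocated to AI: {ai_rd_bn:.1f} billion yuan")
```

On these inputs the implied year-earlier revenue is about 42.9 billion yuan, and the AI allocation is 6 billion yuan.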
Li Auto
数说新能源· 2025-11-27 02:03
Company Strategy Choices
- Starting in Q4 2025, the company will return to an entrepreneurial organizational model led by the founding team, abandoning the professional-management model it tried over the past three years. The decision is based on the rapidly changing technology and competitive environment in the industry, as well as the founder's extensive startup experience [18][19].
- The product direction will focus on embodied AI robots rather than just electric vehicles or smart devices. The aim is to avoid competing solely on parameters such as range and price, and to address user needs in high-frequency everyday scenarios [18][19].

Technical Route Selection
- The company will build a full-stack AI system oriented toward the physical world rather than follow a language-model route. Key breakthroughs will focus on enhancing perception with 3D Vision Transformers, which could increase the effective perception range by 2-3 times [19][20].
- At the model layer, the goal is to raise the operating frequency of models, increasing the current 10 Hz of a 4-billion-parameter MoE model by 2-3 times, which requires a customized GPU architecture and operating system [20].
- At the hardware layer, the company will develop a drive-by-wire system to cut response time from 550 milliseconds to 350 milliseconds, potentially lowering accident rates by more than 50% [21].

Q3 2025 Financial and Operational Data
- Total Q3 revenue was 27.4 billion RMB, down 36.2% year-on-year and 9.5% quarter-on-quarter. Vehicle sales revenue was 25.9 billion RMB, down 37.4% year-on-year and 10.4% quarter-on-quarter [22].
- The overall gross margin was 16.3%, down 5.2 percentage points year-on-year and 3.8 percentage points quarter-on-quarter. Excluding recall costs, the gross margin was 20.4% [23].
- The net loss for the quarter was 624.4 million RMB, compared with a net profit of 2.8 billion RMB in the same quarter last year [26].
Product and Technology Progress
- The i series models (i8/i6) are positioned to cover the mainstream and high-end family markets, and orders have grown significantly since September. Production capacity is expected to rise to about 20,000 units per month by early 2026 [30].
- The VLA system has been fully deployed and improves path selection at complex intersections; further upgrades are planned to improve safety and perception [44].

Market Strategy and Response
- The company anticipates a significant drop in deliveries in Q1 2026, because consumers are rushing to use policy incentives before they expire. Long-term measures include ensuring that all models meet the new energy-consumption standards so that they qualify for subsidies [33][40].
- The company plans to operate approximately 4,800 supercharging stations by 2026, 35% of them in highway service areas, to improve user experience and support the transition to new energy vehicles [40].
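The model-frequency and response-time targets listed under Technical Route Selection can be translated into more tangible quantities with simple arithmetic. The sketch below does only that: the function names are hypothetical, and the 100 km/h speed is an assumption chosen for illustration, not a figure from the summary. It says nothing about the claimed accident-rate reduction, which cannot be derived from these numbers.

```python
def period_ms(freq_hz: float) -> float:
    """Time between consecutive model outputs, in milliseconds."""
    return 1000.0 / freq_hz

def reaction_distance_m(speed_kmh: float, latency_ms: float) -> float:
    """Distance travelled at constant speed before the system responds."""
    return speed_kmh / 3.6 * latency_ms / 1000.0

# Model layer: 10 Hz today, with a stated target of 2-3 times that.
current_hz = 10.0
for factor in (1, 2, 3):
    hz = current_hz * factor
    print(f"{hz:.0f} Hz -> one output every {period_ms(hz):.1f} ms")

# Hardware layer: response time cut from 550 ms to 350 ms.
assumed_speed_kmh = 100.0  # assumption for illustration only
before = reaction_distance_m(assumed_speed_kmh, 550.0)
after = reaction_distance_m(assumed_speed_kmh, 350.0)
print(f"At {assumed_speed_kmh:.0f} km/h: {before:.2f} m -> {after:.2f} m "
      f"(saves {before - after:.2f} m)")
```

At the assumed 100 km/h, the 200 ms saving corresponds to about 5.56 m less distance travelled before the vehicle responds, and a 2-3 times higher model frequency shortens the interval between outputs from 100 ms to roughly 50 ms or 33 ms.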
The claim that world models can fundamentally solve VLA systems' dependence on data is a false proposition...
自动驾驶之心· 2025-09-23 11:37
Core Viewpoint
- The article discusses the ongoing debate between two approaches in the autonomous driving sector, VLA (Vision-Language-Action) and WA (World Model). It argues that both fundamentally rely on data but differ in their methodologies and in what they imply for the future of autonomous driving [1][2].

Summary by Sections

VLA vs. WA
- In 2025 the autonomous driving landscape is splitting into two camps: companies such as XPeng, Li Auto, and Yuanrong Qixing are betting on the VLA approach, while Huawei and NIO advocate the WA model [1].
- WA is claimed to be the ultimate solution for achieving true autonomous driving, but the article argues that it merely repackages the same data dependency [1].

Data Dependency
- Both VLA and WA rest on the premise that "data determines the upper limit" of capability [2].
- VLA relies on real-world multimodal data to train reasoning abilities, while WA requires a combination of real and simulated data to enhance its capabilities [2].
- The industry conflates "data form" with "data essence", which leads to misconceptions about how much each approach relies on data [2].

Industry Misconceptions
- The article emphasizes that the question is not whether data is needed but how to use it efficiently [2].
- VLA and WA represent different ways of collecting and using data, and data remains the core competitive advantage in autonomous driving until true artificial intelligence is realized [2].

Community and Resources
- The "Autonomous Driving Knowledge Planet" community has more than 4,000 members and aims to grow to nearly 10,000 within two years, providing a platform for technical exchange and knowledge sharing in the autonomous driving field [4][10].
- The community offers learning routes, technical discussions, and access to industry experts, helping both newcomers and advanced practitioners share knowledge [4][11].