VLA Models

XPeng's Autonomous Driving Chief Replaced, NIO Team Undergoes Major Reshuffle: Each With Its Own Agenda
36Kr · 2025-10-10 12:30
Core Insights
- The leadership changes in the autonomous driving divisions of Xiaopeng Motors and NIO indicate a competitive evolution in the smart driving landscape, with both companies adjusting their strategies to enhance their technological capabilities [2][19].

Group 1: Leadership Changes
- Xiaopeng Motors announced that Li Liyun, the former head of the autonomous driving center, will be replaced by Liu Xianming, who previously led the world foundation model team [1][2].
- Liu Xianming, who joined Xiaopeng Motors over a year ago, has a background in machine learning and computer vision, having worked at Facebook and Cruise [6][8].
- NIO's autonomous driving team also experienced significant personnel changes, with multiple key executives leaving, including the head of the world model and the product lead for autonomous driving [2][19].

Group 2: Strategic Focus
- Liu Xianming's promotion reflects Xiaopeng's commitment to advancing its world foundation model, which is crucial for achieving higher levels of autonomous driving capability [13][14].
- The world model developed by Liu's team has a parameter scale of 72 billion, significantly larger than current mainstream VLA models, and is designed to enhance the vehicle's understanding of complex environments [14][16].
- The shift in leadership at both companies suggests a strategic pivot toward different technological approaches, with Xiaopeng focusing on the world model and NIO restructuring to improve its AI integration and delivery efficiency [17][19].

Group 3: Industry Dynamics
- The autonomous driving sector is witnessing a bifurcation in technological approaches, primarily between VLA (Vision-Language-Action) and world model architectures, with different companies aligning with one of these strategies [17][18].
- The recent changes in leadership and organizational structure across various companies indicate a new phase of competition in the smart driving field, as firms seek to establish their technological dominance [20].
These Embodied Intelligence Directions Make Up the So-Called "Large and Small Brain" Algorithms
具身智能之心· 2025-09-19 00:03
Core Viewpoint
- The article discusses the evolution and current trends in embodied intelligence technology, emphasizing the integration of various models and techniques to enhance robotic capabilities in real-world environments [3][10].

Group 1: Technology Development Stages
- The development of embodied intelligence has progressed through several stages, starting from grasp pose detection to behavior cloning, and now to diffusion policy and VLA models [7][10].
- The first stage focused on static object grasping with limited decision-making capabilities [7].
- The second stage introduced behavior cloning, allowing robots to learn from expert demonstrations but facing challenges in generalization and error accumulation [7].
- The third stage, marked by the introduction of diffusion policy methods, improved stability and generalization by modeling action sequences [8].
- The fourth stage, beginning in 2025, explores the integration of VLA models with reinforcement learning and world models to enhance predictive capabilities and multi-modal perception [9][10].

Group 2: Key Technologies and Techniques
- Key technologies in embodied intelligence include VLA, diffusion policy, and reinforcement learning, which collectively enhance robots' task execution and adaptability [5][10].
- VLA models combine visual perception, language understanding, and action generation, enabling robots to interpret human commands and perform complex tasks [8].
- The integration of tactile sensing with VLA models expands the sensory capabilities of robots, allowing for more precise operations in unstructured environments [10].

Group 3: Industry Implications and Opportunities
- The advancements in embodied intelligence are leading to increased demand for engineering and system capabilities, transitioning from theoretical research to practical deployment [10][14].
- There is a growing interest in training and deploying various models, including diffusion policy and VLA, on platforms like Mujoco and IsaacGym [14].
- The industry is witnessing a surge in job opportunities and research interest, prompting many professionals to shift focus toward embodied intelligence [10].
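The third-stage idea above — a diffusion policy that denoises an entire action sequence rather than regressing a single action — can be sketched in a toy form. Here the "learned" denoiser is replaced by the analytic score of a Gaussian expert distribution, an assumption made purely to keep the sketch self-contained; a real diffusion policy (e.g. as trained in Mujoco or IsaacGym) would use a conditional U-Net or transformer denoiser instead.

```python
import numpy as np

rng = np.random.default_rng(1)
HORIZON, D = 8, 2                      # action chunk: 8 steps of 2-D actions
expert_mean = np.tile([0.5, -0.2], (HORIZON, 1))

def score(a, sigma):
    # grad_a log N(a; expert_mean, sigma^2 I); a trained diffusion policy
    # replaces this analytic score with a learned, observation-conditioned denoiser.
    return (expert_mean - a) / sigma**2

def sample_action_chunk(steps=50):
    a = rng.normal(size=(HORIZON, D))  # start from pure noise
    for sigma in np.linspace(1.0, 0.05, steps):
        # Simplified deterministic denoising step (noise term omitted for clarity).
        a = a + (sigma**2 * 0.5) * score(a, sigma)
    return a

chunk = sample_action_chunk()
print(np.abs(chunk - expert_mean).max() < 0.05)  # denoised toward expert actions
```

Modeling the whole chunk at once is what gives diffusion policies their temporal consistency: each denoising step refines all timesteps jointly instead of committing to one action at a time.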
Embodied VLA Post-Training: TeleAI Proposes a Latent-Space-Guided Method for Cross-Embodiment Generalization of VLA Models
具身智能之心· 2025-09-16 00:03
Core Insights
- The article discusses the challenges and solutions related to Vision-Language-Action (VLA) models in the context of cross-embodiment adaptation, highlighting the limitations of existing models and the introduction of a new framework called "Align then Steer" (ATE) to enhance performance in post-training scenarios [1][2][10].

Group 1: Challenges in VLA Models
- Current VLA models require extensive target-domain data for post-training, often needing dozens to hundreds of hours, leading to significant mismatches in action distributions between pre-training and post-training phases [1][10].
- The marginal returns of simply stacking data during post-training diminish rapidly, making it ineffective for fitting the action distribution of target scenarios [1][11].

Group 2: ATE Framework Introduction
- The ATE framework proposed by the TeleAI team aims to align action distributions in latent space, allowing for efficient adaptation of VLA models without altering their core architecture [2][10].
- ATE shifts the focus from adjusting model architecture to adjusting distributions, significantly reducing data requirements for cross-embodiment adaptation [2][15].

Group 3: ATE Framework Mechanism
- The ATE framework consists of two main phases: aligning action distributions in latent space, and guiding the post-training strategy updates using a classifier [14][19].
- In the alignment phase, two small Variational Autoencoders (VAEs) are constructed to embed action data into a unified latent space, ensuring that the adapted actions closely follow the pre-trained distribution [18][19].
- The guiding phase integrates a classifier guidance function to measure the difference between generated actions and target distributions, effectively steering the model outputs toward the desired action distribution [21][22].

Group 4: Experimental Results
- The ATE algorithm demonstrated an average increase of 9.8% in multi-task success rates in simulation evaluations compared to direct post-training, with a maximum success-rate gain of 32% in real-world scenarios [23][24].
- The framework showed robust performance under various conditions, including lighting changes and external disturbances, maintaining task-related focus and recovery capabilities [29][30].

Group 5: Conclusion and Future Directions
- The ATE framework provides a viable solution to the challenges of data scarcity and cross-embodiment adaptation in VLA models, allowing for efficient and robust training without the need for extensive data collection or full model retraining [30].
- The framework can serve as a plug-and-play module compatible with various mainstream VLA models, enhancing their post-training cross-embodiment generalization capabilities [30].
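The align-then-steer idea described above can be sketched minimally. This toy version stands in for the paper's two VAEs with fixed random linear maps and uses a standard Gaussian as the pre-trained latent prior, so the guidance update is simply the prior's score; all names and the simplified prior are assumptions for illustration, not TeleAI's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two small VAE encoders/decoders: fixed
# linear maps into a shared latent space (ATE trains real VAEs for this).
D_ACTION, D_LATENT = 14, 8
W_enc = rng.normal(scale=0.3, size=(D_LATENT, D_ACTION))
W_dec = np.linalg.pinv(W_enc)

def encode(a):      # action chunk -> shared latent space
    return W_enc @ a

def decode(z):      # latent -> action chunk
    return W_dec @ z

def guide(z, lam=0.2, steps=10):
    """Steer a latent toward the pre-trained action prior N(0, I).

    Each step follows the score of the prior, grad_z log N(z; 0, I) = -z,
    mimicking the classifier-guidance-style update ATE applies during
    post-training (simplified here to a fixed Gaussian prior).
    """
    for _ in range(steps):
        z = z + lam * (-z)
    return z

# A target-embodiment action whose distribution drifted far from the prior.
a_target = rng.normal(loc=5.0, size=D_ACTION)
z = encode(a_target)
z_aligned = guide(z)
a_aligned = decode(z_aligned)   # back to action space for execution
print(np.linalg.norm(z_aligned) < np.linalg.norm(z))  # pulled toward the prior
```

The design point is that steering happens in the shared latent space, so the VLA backbone itself never changes — which is what makes the module plug-and-play across embodiments.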
Now Enrolling: A Hands-On Course on Embodied "Large Brain" and "Small Brain" Algorithms
具身智能之心· 2025-09-15 00:04
Core Insights
- The exploration towards Artificial General Intelligence (AGI) highlights embodied intelligence as a key direction, focusing on the interaction and adaptation of intelligent agents within physical environments [1][3]
- The development of embodied intelligence technology has evolved through various stages, from low-level perception to high-level task understanding and generalization [6][14]

Industry Analysis
- In the past two years, numerous star teams in the field of embodied intelligence have emerged, establishing valuable companies such as Xinghaitu, Galaxy General, and Zhujidongli, transitioning from laboratories to commercial and industrial applications [3]
- Major domestic companies like Huawei, JD, Tencent, Ant Group, and Xiaomi are actively investing and collaborating to build a comprehensive ecosystem for embodied intelligence, while international players like Tesla and investment firms support advancements in autonomous driving and warehouse robotics [5]

Technological Evolution
- The evolution of embodied intelligence technology has progressed through several phases:
- The first phase focused on grasp pose detection, which lacked the ability to model task context and action sequences [6]
- The second phase introduced behavior cloning, allowing robots to learn from expert demonstrations but revealing weaknesses in generalization and performance in multi-target scenarios [6]
- The third phase, emerging in 2023, utilized Diffusion Policy methods to enhance stability and generalization by modeling action trajectories [6][7]
- The fourth phase, starting in 2025, explores the integration of VLA models with reinforcement learning and tactile sensing to overcome current limitations [9][11][12]

Educational Initiatives
- The demand for engineering and system capabilities in embodied intelligence is increasing as the industry shifts from research to deployment, necessitating higher engineering skills [17]
- A comprehensive curriculum has been developed to cover various aspects of embodied intelligence, including practical applications and advanced topics, aimed at both beginners and advanced learners [14][20]
NIO, XPeng, and Li Auto's Self-Developed Smart-Driving Chips: Who Is Falling Behind, Who Is Wavering, and Who Is Striding Ahead?
雷峰网· 2025-09-05 12:49
Core Viewpoint
- The delay of Nvidia's Thor chip has made the external chip supply chain uncertain and expensive, highlighting the importance of self-developed chips to reduce costs and enhance the technological narrative of companies [1][35].

Group 1: Industry Overview
- The arms race for computing power in smart driving began in 2021 with the launch of Nvidia's Orin-X chip, which boasts 254 TOPS, outperforming Mobileye's Q5H and Tesla's HW3.0 [2].
- The narrative logic among car manufacturers emphasizes the importance of self-developed chips as a crucial aspect of their strategy [3].

Group 2: NIO's Chip Development
- NIO was the first to propose self-developed chips, with founder Li Bin initiating the "chip-making" project in 2020 despite skepticism [6][7].
- NIO's chip team, led by architect Zhang Danyu, has grown to around 400 members, with significant investment in R&D totaling approximately 41.9 billion RMB from 2021 to 2024 [10][11].
- NIO aims to control the R&D process and reduce supply chain risks by fully self-developing core technologies [11][12].

Group 3: XPeng's Chip Strategy
- XPeng has been aggressive in its chip development, launching the self-developed Turing AI chip for its P7 model, but faced challenges with internal collaboration between the chip and algorithm teams [19][20].
- The company initially outsourced chip design but shifted to full self-development due to delays from partners [20][24].
- XPeng's second-generation chip is under development, targeting a 5nm process, with ambitions to integrate advanced AI models into its vehicles [27][28].

Group 4: Li Auto's Approach
- Li Auto started its chip development later than its competitors but faces fewer internal obstacles due to a unified leadership vision [29][30].
- The company has begun developing its second-generation chip, focusing on integrating its operating system and chip development under a single department [31][34].
- Li Auto aims to leverage its organizational structure to enhance collaboration between its algorithm and chip teams, which could lead to improved efficiency [35].
Happening Today | Physical Intelligence Industry and Capital Summit: L3 High-Level Smart Driving Session and Release of the VLA Model Industry White Paper and Industry Map
国泰海通证券研究· 2025-09-03 22:29
L3 High-Level Smart Driving Session and Release of the VLA Model Industry White Paper and Industry Map
September 4, 2025 (Thursday), Shanghai - Guotai Haitong Bund Financial Plaza

With large models developing rapidly, further integrating them into intelligent driving has become an industry consensus, and recent policy is making commercial deployment of L3 intelligent driving increasingly feasible. Against this backdrop, the Vision-Language-Action (VLA) model has emerged; VLA is expected to build a holistic cognitive framework resembling a human driver's, which will shape the industry landscape and technology roadmap of intelligent driving and embodied intelligence, and bring enormous market and capital opportunities.

Speaker: Yuan Yuji (袁玉记) - Solutions Director, Momenta
Momenta (Beijing Chusudu Technology Co., Ltd.) is a globally leading autonomous driving company dedicated to realizing scalable autonomous driving solutions through deep learning and artificial intelligence. Building on its data-driven "one flywheel" technical insight and "two-legged" product strategy, the company pursues scaled deployment of both mass-production assisted driving and Robo autonomous driving, pioneering a new path to scalable autonomy. Its mass-production assisted-driving solutions support multiple levels of assisted-driving functionality and deliver an end-to-end intelligent assisted-driving experience covering all scenarios. Its Robotaxi offering is a breakthrough, commercially scalable autonomous driving solution for robotaxis.

Agenda: 13:30-13:40 Opening remarks by leadership - Chen ...
Li Auto-W (02015): Turnaround Hinges on i6 Performance, Pending Operational Optimization and VLA-Enabled Advantages
KAIYUAN SECURITIES· 2025-08-31 10:47
Investment Rating
- The investment rating for the company is "Buy" (maintained) [1][10]

Core Views
- The report indicates that the performance of the i6 model is crucial for reversing sales trends, with a focus on operational optimization and leveraging VLA advantages [3][4]
- Revenue forecasts for 2025-2027 have been revised downwards due to anticipated challenges in the electric vehicle market, with expected revenues of 120.9 billion, 154.4 billion, and 182.8 billion respectively, reflecting year-on-year growth rates of -16.3%, 27.8%, and 18.3% [3][5]
- Non-GAAP net profit estimates for the same period have also been reduced to 5.9 billion, 9.4 billion, and 13.5 billion, with corresponding year-on-year growth rates of -44.7%, 60.0%, and 43.0% [3][5]

Financial Summary and Valuation Metrics
- The company's revenue for Q2 2025 was 30.25 billion, a year-on-year decline of 5% but a quarter-on-quarter increase of 17%, with vehicle deliveries slightly exceeding revised guidance at 111,000 units [3][4]
- The average selling price (ASP) decreased by 6,000 to 260,000 due to financial incentives and sales promotions [3]
- Gross margin for Q2 2025 fell to 20.1%, with automotive gross margin at 19.4% and service gross margin at 33.5% [3][5]
- The company is guiding for Q3 2025 revenues between 24.8 billion and 26.2 billion, with deliveries expected to be between 90,000 and 95,000 units [4]
- The report highlights that the company's market capitalization corresponds to price-to-sales (PS) ratios of 1.5, 1.2, and 1.0 for 2025-2027, and price-to-earnings (PE) ratios of 30.5, 19.3, and 13.6 for the same period [3][5]
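As a quick arithmetic sanity check on the multiples above: a PS ratio times forecast revenue and a PE ratio times forecast non-GAAP profit should each back out roughly the same market capitalization. The implied-cap figures below are inferred from the report's stated numbers, not given in it.

```python
# All figures in billions of RMB, taken from the report summary above.
revenue = {2025: 120.9, 2026: 154.4, 2027: 182.8}
profit  = {2025: 5.9,   2026: 9.4,   2027: 13.5}   # non-GAAP net profit
ps      = {2025: 1.5,   2026: 1.2,   2027: 1.0}    # price-to-sales
pe      = {2025: 30.5,  2026: 19.3,  2027: 13.6}   # price-to-earnings

for year in revenue:
    cap_from_ps = ps[year] * revenue[year]   # implied market cap via PS
    cap_from_pe = pe[year] * profit[year]    # implied market cap via PE
    print(year, round(cap_from_ps, 1), round(cap_from_pe, 1))
```

Both routes imply a market capitalization of roughly 180 billion RMB, so the report's PS and PE multiples are internally consistent with its revenue and profit forecasts.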
Yuanrong Qixing (DeepRoute.ai) CEO Zhou Guang: VLA Smart Driving in Its Infancy Beats End-to-End at Its Peak
Jing Ji Guan Cha Wang· 2025-08-31 01:05
Core Insights
- Yuanrong Qixing launched its next-generation driver assistance platform, DeepRoute IO 2.0, which integrates a self-developed Vision-Language-Action (VLA) model, combining visual perception, semantic understanding, and action decision-making capabilities [2][3]
- The shift towards VLA models is driven by the limitations of traditional end-to-end systems and the need for enhanced semantic understanding in complex driving scenarios [3][4]

Group 1: Technological Advancements
- The VLA model utilizes reinforcement learning to evolve and understand the reasoning behind actions, contrasting with the imitation learning of traditional end-to-end architectures [2][3]
- Yuanrong Qixing's CEO, Zhou Guang, emphasizes the urgency of transitioning to a large-model-driven company to avoid being outpaced by competitors [2][3]
- The VLA system aims to teach AI a "defensive driving" approach, enabling it to make cautious decisions in uncertain situations [5][6]

Group 2: Market Dynamics
- Yuanrong Qixing has secured partnerships for over 10 vehicle models, with nearly 100,000 vehicles equipped with its city navigation assistance system delivered, indicating significant market penetration [3][4]
- The increasing scale of production presents new challenges, as any issues become magnified with higher delivery volumes [3][4]

Group 3: Competitive Landscape
- Zhou Guang critiques current mainstream technology routes, particularly the limitations of end-to-end systems based on BEV architecture, which struggle with occluded visual information [4][6]
- The industry is witnessing a surge in VLA model development, with competitors like Xiaopeng Motors and Li Auto also exploring similar technologies [7][8]

Group 4: Future Prospects
- The VLA model is envisioned to extend beyond automotive applications, potentially benefiting robotics and autonomous systems in various environments [7][8]
- Zhou Guang rates the current VLA model's performance at 6 out of 10, indicating room for improvement and growth, with expectations for significant advancements as next-generation chips become available [8][9]
Morgan Stanley's Latest Humanoid Robot Research Report: Mainstream Value-Chain Companies and Trend Analysis (Report Attached)
Robot猎场备忘录· 2025-08-28 00:06
Core Insights
- The report by Morgan Stanley discusses the growth potential of the humanoid robot sector in China, predicting widespread adoption in the second half of 2025 and an increasing competitive edge over countries like the USA [2][3]
- The market is shifting focus from speculative hype to validating commercial value, which will be crucial for the sector's future growth [6]
- The report highlights the importance of technological breakthroughs and practical applications as key drivers for market sentiment in humanoid robotics [12]

Market Dynamics
- The humanoid robot market in China is expected to see a surge in orders in the latter half of the year, indicating a potential explosion in demand [8]
- Continuous product launches and innovations in hardware and software are anticipated to act as catalysts for market growth [9]

Technological Developments
- Major companies are making advancements in their next-generation models, with Tesla's Optimus Gen 3 being a notable example, expected to showcase significant improvements by the end of the year [9]
- The report identifies a trend towards self-developed AI models among leading humanoid robot startups, which is seen as essential for maintaining competitive advantage [15]

Industry Players and Supply Chain
- Morgan Stanley's report has expanded its analysis to include 45 companies within the humanoid robot value chain, covering various components from AI to actuators and batteries [11]
- The report emphasizes the need for companies to establish their own technological capabilities rather than relying solely on external AI models [14]

Commercialization Challenges
- The report notes that while producing humanoid robots is not inherently difficult, achieving effective commercialization and scaling production remains a significant challenge for many startups [14]
- The sustainability of revenue models based on impressive demonstrations rather than practical applications is questioned, highlighting the need for genuine commercial viability [13]
Latest from the University of California: Teaching VLA Models to Refuse Impossible Tasks
具身智能之心· 2025-08-26 00:03
Core Viewpoint
- The article discusses the development and performance of the VLA model, focusing on its ability to handle false-premise instructions in robotic tasks through the proposed IVA framework, which enhances the model's robustness in interpreting and responding to user commands [4][10].

Group 1: Problem Statement and Solution
- The VLA model excels in various robotic tasks by relying on multimodal inputs, but it struggles with false-premise instructions: commands that reference non-existent objects or conditions [6][10].
- The IVA framework is introduced to address this issue, enabling the model to detect unexecutable commands, clarify or correct them through language, and associate reasonable alternatives with perception and action [4][10].

Group 2: Research Gaps and Contributions
- Current research primarily focuses on the success rate of executing correct commands, neglecting the handling of ambiguous or unexecutable instructions [6][10].
- The core contributions of this work include the introduction of the IVA framework, the construction of a large-scale dataset for training, and validation of the model's performance across eight robotic tasks, demonstrating significant improvements in detecting false premises and executing valid commands [10][25].

Group 3: Experimental Results
- The IVA framework achieved a false-premise detection accuracy of 97.56% and a 50.78% increase in successful responses under false-premise scenarios compared to baseline models [5][25].
- Across tasks, IVA outperformed the LLARVA model in overall success rates and false-premise detection rates, with only minor reductions in success rates for true-premise commands [25][28].

Group 4: Limitations and Future Directions
- The dataset used for training is limited to a simulated environment, which may not fully represent real-world human-robot interactions, and the distribution of false premises may not align with actual occurrences [26][27].
- The IVA framework currently lacks the ability to handle complex, multi-turn clarifications and may struggle with longer, more ambiguous user commands in real-world scenarios [27][28].
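The control flow the IVA summary describes — detect a false premise, respond in language with a reasonable alternative, otherwise act — can be sketched as a simple wrapper. All names and the set-membership "perception" below are hypothetical simplifications for illustration, not the paper's actual API; IVA performs this detection inside the VLA model itself, not as a rule-based check.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    visible_objects: set            # what the robot currently perceives

def respond(target: str, obs: Observation) -> dict:
    """Return either an action or a clarifying language response."""
    if target not in obs.visible_objects:           # false-premise detection
        # Offer a grounded alternative instead of blindly executing.
        alternative = next(iter(sorted(obs.visible_objects)), None)
        text = f"I don't see a {target}."
        if alternative:
            text += f" Should I use the {alternative} instead?"
        return {"type": "language", "text": text}
    return {"type": "action", "command": f"pick_up({target})"}

obs = Observation(visible_objects={"red block", "bowl"})
print(respond("mug", obs)["type"])     # language response: premise is false
print(respond("bowl", obs)["type"])    # action: premise holds
```

The point of the structure is that refusing is itself a first-class output: the policy's response space includes language alongside actions, which is what lets it recover gracefully instead of hallucinating a grasp on a non-existent object.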