Vision-Language-Action (VLA)
Future of Tech: VLA as the Next AI Frontier in Autonomous Driving
2026-03-24 01:27
Summary of Key Points from the Conference Call

Industry Overview
- The focus is on the autonomous driving industry, particularly advancements in AI technologies and their implications for automakers in Japan and China [1][4][5][12][13].

Core Insights and Arguments

Global Autonomous Driving Penetration
- Global L2+/L2++ penetration is projected to reach 36% by 2030, up from 15% in 2025, while L3 adoption is expected to remain limited due to complexity and regulatory hurdles [1][25][28].
- In China, L2+/L2++ penetration is expected to rise to approximately 70% by 2030, significantly higher than the global average [30][34].
- The US market is expected to see L2+/L2++ penetration of around 36% by 2030, supported by consumer acceptance of advanced features [43][47].

Japan's Approach to Autonomous Driving
- Japanese automakers are adopting varied commercialization strategies, with Toyota leading through a "multi-pathway" approach that combines internal development and partnerships [4][9][12].
- Upcoming models such as Toyota's RAV4 and Sony Honda Mobility's Afeela are expected to drive the rollout of software-defined vehicles (SDVs) [8][12].
- The Japanese market takes a cautious approach that prioritizes safety and reliability, with L2+/L2++ penetration projected at 29% by 2030 [48][49].

China's Competitive Landscape
- Leading Chinese EV manufacturers such as XPeng and Li Auto are at the forefront of adopting Vision-Language-Action (VLA) models, enhancing user experience and decision-making capabilities [5][13].
- Intense competition among Chinese OEMs is accelerating the development of advanced driver-assistance systems (ADAS), which are becoming essential features in premium EVs [5][13].
- Concerns remain about the monetization potential of these technologies and about Chinese OEMs' ability to introduce advanced features in international markets [5][13].
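As a quick consistency check on the global forecast quoted above, moving from 15% penetration in 2025 to 36% in 2030 implies a compound annual growth rate of roughly 19% over the five years. This is a derived figure, not one stated in the report:

$$
\text{CAGR} = \left(\frac{36\%}{15\%}\right)^{1/5} - 1 = 2.4^{0.2} - 1 \approx 0.191 \quad (\approx 19.1\%\ \text{per year})
$$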
Technological Shifts
- The transition from rule-based systems to end-to-end (E2E) architectures is being driven by the need for faster deployment and better handling of edge cases [2][9].
- VLA models are seen as the next frontier in E2E development, with companies such as Waymo leveraging advanced AI to enhance navigation capabilities [3][9].

Additional Important Insights
- Traditional auto parts suppliers face challenges as automakers assert more control over software layers, potentially reducing suppliers' revenue from design changes [11].
- Japan's government is promoting SDVs as a national priority, with a 30% penetration target for 2030-2035, which may accelerate strategic initiatives across the sector [12].
- High-definition (HD) maps remain relevant even in E2E systems, because they provide localization support and training data for AI models [66][67].

Investment Implications
- Ratings for Japanese automakers and suppliers: Outperform for Suzuki and Toyota; Market-Perform for Honda and Denso; Underperform for Nissan, Mazda, and Subaru [12][14].
- In China, BYD and Xiaomi are rated Outperform, while XPeng, NIO, and Li Auto are rated Market-Perform [14].

This summary captures the key points of the conference call, highlighting the advancements and strategic directions of the autonomous driving industry in Japan and China.
VLA Papers Account for Nearly Half of Embodied AI Research...
具身智能之心· 2025-09-18 04:00
Core Insights
- The article emphasizes the significance of Vision-Language-Action (VLA) models in embodied intelligence, highlighting their ability to let robots make decisions autonomously in diverse environments and thus overcome the limitations of traditional single-task training [1][4].

Industry Development
- The embodied intelligence sector is growing rapidly. Teams such as Unitree, Zhiyuan, Xinghaitu, and Galbot (Yinhe General) are moving from laboratory research to commercialization, while major tech companies such as Huawei, JD, and Tencent are entering the field alongside international firms such as Tesla and Figure AI [3].

Research Opportunities
- VLA is a current research hotspot with many unresolved problems, which makes it a promising area for academic papers. The article announces a specialized VLA research-guidance course aimed at helping people quickly enter, or transition into, this field [3][4].

Course Content and Structure
- The course focuses on how agents interact effectively with the physical world through a perception-cognition-action loop. It traces the evolution of VLA technology from early grasp-pose detection to recent models such as Diffusion Policy and multimodal foundation models [7][8].
- It addresses core challenges in embodied intelligence, such as cross-domain generalization and long-horizon planning, and explores how to integrate large language models with robotic control systems [8].

Learning Outcomes
- On completion, participants are expected to master the theoretical foundations and technical evolution of VLA models, gain proficiency with simulation environments, and develop independent research skills [14].
- The course aims to guide students from idea generation to a finished, high-quality academic paper, so that they can identify research opportunities and design effective experiments [10][14].
China Humanoid Robot: WAIC 2025 Takeaways: Broader Applications, with Wheel-Based Robot Demos More Common than Bipedal
2025-07-29 02:31
Summary of WAIC 2025 Takeaways

Industry Overview
- The conference showcased significant advancements in AI and robotics. Venue size grew 35% to 70,000 sqm, ticket prices rose 31% to Rmb168 per day, and the event featured 800 exhibitors (up 60% year over year) and over 1,200 speakers [1][2].

Core Insights
1. **Application Scenarios**: Exhibitors explored application scenarios in a more targeted way across manufacturing, logistics, retail, and elderly care, indicating a shift toward early commercialization [2][7].
2. **Product Improvements**: Humanoid robots showed meaningful product improvements, moving from static displays to interactive task demonstrations [2][8].
3. **Prototype Trends**: A noticeable shift toward AGV-style wheeled bases suggests a pragmatic path to near-term commercial viability, which may weigh on stocks tied to planetary roller screw components [2][9].
4. **Cost Trends**: Cost curves for humanoid robots are declining, but not significantly. The lowest ASP reported was Rmb40,000, for Unitree's new model [2][14].
5. **Manipulation Challenges**: Manipulation remains a core challenge, with success rates, robustness, and reliability still problematic [2][12].

Notable Exhibitors and Innovations
- **Noematrix**: Showcased wheel-based prototypes performing various tasks, indicating a focus on practical applications [7][18].
- **Galbot**: Demonstrated retail-automation robots capable of complex tasks, with efficiency comparable to human workers [17][18].
- **AgiBot**: Introduced multiple humanoid robots targeting applications including logistics and customer interaction [17].
- **Unitree**: Highlighted advances in dynamic locomotion, with humanoid robots showing improved autonomous capabilities [20].
Future Outlook
- The exhibition reinforced a constructive view of humanoid robots as a long-term technology trend. A technology inflection point is expected to be approaching but has not yet arrived [3][12].
- Upcoming updates on Tesla's Gen 3 Optimus are expected to be significant for the sector [3].

Investment Recommendations
- **Sanhua Intelligent Controls**: Buy, on growth potential in auto/EV thermal management and HVAC systems [21].
- **Zhejiang Supcon Technology Co.**: Buy, with strong market share in process automation and potential for vertical expansion [22].
- **Best Precision**: Neutral, with expectations that it will become a competitive supplier for humanoid robots [23].
- **Leader Harmonious Drive Systems**: Neutral, with potential growth in harmonic reduction gear applications [26].
- **Shanghai Baosight Software**: Neutral, with concerns over reliance on related-party transactions [27].

Conclusion
WAIC 2025 highlighted significant advancements in humanoid robotics, with a clear trend toward practical applications and commercialization. The investment landscape appears promising for select companies in the sector, although challenges remain in manipulation and cost efficiency.
A Review of Recent Progress in RL for VLA
自动驾驶之心· 2025-07-03 12:41
Core Viewpoint
- The article reviews recent advancements in Vision-Language-Action (VLA) models, focusing on how Reinforcement Learning (RL) techniques are integrated to improve their performance and training stability across tasks [1].

Group 1: Early Exploration with iRe-VLA
- iRe-VLA's core RL algorithm is PPO. It introduces a two-stage training paradigm to address the instability of online reinforcement learning [2].
- The implementation uses BLIP-2 3B as the VLM backbone, replacing the final fully connected layer with an action head consisting of a token learner and an MLP [2].
- Experiments use simulation environments such as Meta-World and Franka Kitchen, with tasks divided into three categories for evaluation [2].

Group 2: Preference Alignment with GRAPE
- GRAPE introduces preference alignment into VLA training, designed specifically around the characteristics of VLA models [6].
- The reward for each trajectory has three parts: a success reward, a self-reward, and an external reward based on a custom cost function [8].
- The external reward is computed by decomposing trajectories into stages with a VLM task decomposer and evaluating each stage [9].

Group 3: LOOP and RIPT-VLA
- LOOP combines RLOO and PPO to address sparse rewards and long sequences in multi-task scenarios [11].
- RIPT-VLA uses the LOOP algorithm for online RL, and its code is open source [13].
- The approach includes several tricks to improve training efficiency, such as a dynamic rejection mechanism and multi-task sampling [15].

Group 4: System and Algorithm Innovations in RL4VLA
- RL4VLA models action generation as a multi-modal dialogue and trains with PPO, using dense pseudo-rewards to guide training [18].
- Training uses a Robotic Process Reward Model that predicts the likelihood of action sequences, enriching the reward signal [20].
- The article emphasizes adaptive curriculum selection strategies to improve sample efficiency and generalization [21][23].

Group 5: Engineering Challenges and Future Directions
- The article highlights the need for new RL algorithms suited to VLA-RL, particularly ones that address sparse rewards and improve sample efficiency [30].
- It points out engineering challenges in improving sampling efficiency and managing memory costs in VLA scenarios [30].
- Effective reward design and the application of RL to non-autoregressive VLA architectures are identified as critical areas for future research [30].
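The LOOP-style update and the dynamic rejection trick described for RIPT-VLA can be illustrated with a small, dependency-free sketch. It is not the RIPT-VLA or LOOP code. It assumes K rollouts per task context with scalar rewards (for example, binary success), and it does three things:

- It computes an RLOO leave-one-out advantage for each rollout.
- It skips groups whose rollouts all received the same reward, because those produce zero advantage everywhere and hence no learning signal.
- It evaluates the standard PPO clipped surrogate on per-rollout log-probabilities.

The function names, the `clip_eps=0.2` default, and the toy rewards and log-probabilities are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence


def leave_one_out_advantages(rewards: Sequence[float]) -> List[float]:
    """RLOO-style advantage: each rollout's reward minus the mean reward of the
    other K-1 rollouts sampled for the same task context."""
    k = len(rewards)
    if k < 2:
        raise ValueError("need at least two rollouts per context")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


def keep_group(rewards: Sequence[float]) -> bool:
    """Dynamic rejection: if every rollout in a group got the same reward
    (all succeeded or all failed), every leave-one-out advantage is zero,
    so the group carries no learning signal and is skipped."""
    return max(rewards) != min(rewards)


def ppo_clipped_loss(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """PPO clipped surrogate, negated so that minimizing it maximizes the
    objective. Uses one log-probability per rollout for brevity; real
    implementations typically apply this per action token."""
    terms = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new / pi_old
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        terms.append(min(ratio * adv, clipped * adv))
    return -sum(terms) / len(terms)


def main() -> None:
    # Hypothetical batch: K=4 binary success rewards per task context.
    batch: Dict[str, List[float]] = {
        "open_drawer": [1.0, 0.0, 1.0, 0.0],
        "pick_cup": [1.0, 1.0, 1.0, 1.0],  # all succeeded -> rejected
    }
    # Hypothetical per-rollout log-probs under the old and updated policy.
    logp_old = [-1.0, -1.0, -1.0, -1.0]
    logp_new = [-0.9, -1.1, -0.8, -1.2]
    for task, rewards in batch.items():
        if not keep_group(rewards):
            print(f"{task}: rejected (identical rewards)")
            continue
        adv = leave_one_out_advantages(rewards)
        loss = ppo_clipped_loss(logp_new, logp_old, adv)
        print(f"{task}: advantages={adv}, loss={loss:.4f}")


if __name__ == "__main__":
    main()
```

In this toy batch only `open_drawer` contributes a loss. With sparse binary rewards, rejecting uniform groups avoids spending optimizer steps on rollouts whose leave-one-out advantages are all zero.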