VLA
Opening several autonomous driving technical exchange groups (world models / end-to-end / VLA)
自动驾驶之心· 2025-09-20 16:03
Group 1
- A technical exchange group focused on autonomous driving technologies has been announced [1]
- The group aims to facilitate discussion of topics such as world models, end-to-end systems, and VLA [1]
- The launch coincides with the back-to-school season and the autumn recruitment period, indicating deliberate timing for engagement [1]
Humanoid Robot Tour Takeaways: Market Outlook, Components and Embodied AI
2025-09-18 13:09
Summary of Conference Call Notes on Greater China Industrials (Humanoid Robots and Autonomous Driving)

Industry Overview
- The humanoid robot and autonomous driving (AD) sectors in China are expected to expand rapidly over the next decade, with significant growth anticipated in factory settings within 2-3 years and further opportunities in commercial and household applications in the long term [1]
- The current bill of materials (BOM) cost for a fully functional humanoid robot is approximately US$50-60k, with rapid cost reductions expected over the next five years from improved product design and economies of scale [1]
- Stricter regulations in the AD sector are anticipated to create more opportunities for AD components, particularly LiDAR, which will benefit from new long-distance object detection requirements [1]

Key Players and Developments

Dobot
- Dobot is a leading global collaborative robot (COBOT) brand that achieved 47% year-over-year growth in 6-axis COBOT sales in the first half of 2025, indicating market share gains [8]
- The company has entered the humanoid robot market, launching its first prototype in early 2025 and planning deployment in manufacturing and business scenarios [9]

RoboSense
- RoboSense is focusing on its new EMX LiDAR products, which offer superior precision and detection distance compared with competitors, and expects to ship 600-700k units in 2025 and 1.5 million units in 2026 [10]
- The company is also exploring opportunities in the lawn mower, unmanned delivery, and robotaxi industries, with significant partnerships established [11]

Zhaowei Machinery & Electronics
- Zhaowei has launched new dexterous hand models for humanoid robots and aims for a 10-15% global market share in this segment [12][13]
- The BOM cost of the dexterous hand is estimated to account for 20-30% of the total BOM cost of a humanoid robot [13]

Googol Technology
- Googol Technology specializes in high-end control systems for advanced manufacturing and sees strong growth potential in humanoid robots thanks to its expertise in multi-degree-of-freedom (DoF) control [14][15]

Minieye
- Minieye is making progress with its smart driving solutions, including iPilot and iRobo, and anticipates significant growth in the penetration of front-view camera modules and driver monitoring systems due to new safety regulations [16][17]

Leju Robotics
- Leju aims to deliver over 1,000 robots in 2025, focusing on stability and durability for large-scale applications [18]

Orbbec
- Orbbec is a leading player in robot vision systems, holding over 70% market share in 3D vision systems for service robots in China [21][22]

UBTECH
- UBTECH aims to ship 500 humanoid robots in 2025 and 2,000-3,000 units in 2026, and expects BOM cost reductions in the coming years [23][24]

LK Tech
- LK Tech is focusing on magnesium alloy technology for humanoid robots, which offers lightweighting and other advantages, and has signed cooperation agreements for R&D projects [25][26]

Technology Insights
- The competition between VLA (Vision-Language-Action) and world model technologies for embodied AI is highlighted, with data availability being a key bottleneck [3]
- The vision system of humanoid robots is evolving, with depth cameras becoming the mainstream choice for enhancing sensing and navigation capabilities [22]

Market Outlook
- The humanoid robot market is expected to grow significantly, with projections of 3 million units shipped by 2030, leading to substantial opportunities for component suppliers [13]
- The average selling price (ASP) of humanoid robots is expected to decline to approximately RMB150k (~US$20k) by 2026-2028 due to scale effects [20]

Conclusion
- The humanoid robot and AD sectors in Greater China are poised for significant growth, driven by technological advancements, regulatory changes, and increasing market demand. Key players are actively innovating and expanding their product offerings to capture market share in this rapidly evolving landscape.
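As a rough worked example of the component figures above, the quoted dexterous-hand share (20-30% of BOM) can be applied to the quoted robot BOM (US$50-60k) to get an implied dollar range. This is only a sketch: pairing the low share with the low BOM and the high share with the high BOM is an assumption, and the function name `hand_bom_range` is hypothetical.

```python
# Figures quoted in the summary above (US dollars and percent).
ROBOT_BOM_USD = (50_000, 60_000)   # full humanoid robot BOM, low and high
HAND_SHARE_PCT = (20, 30)          # dexterous-hand share of total BOM, low and high


def hand_bom_range(robot_bom=ROBOT_BOM_USD, share_pct=HAND_SHARE_PCT):
    """Return the (low, high) implied dexterous-hand BOM in US dollars.

    Assumption: the low share pairs with the low BOM and the high share
    with the high BOM, which gives the widest plausible range.
    """
    low = robot_bom[0] * share_pct[0] // 100
    high = robot_bom[1] * share_pct[1] // 100
    return low, high


low, high = hand_bom_range()
print(f"Implied dexterous-hand BOM: US${low:,} to US${high:,}")
```

Under these assumptions the dexterous hand would account for roughly US$10k to US$18k of today's BOM, which helps explain why it is singled out as a major cost component.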
XPeng and Li Auto are going all in on the VLA route: what are its research directions?
自动驾驶之心· 2025-09-17 23:33
Core Viewpoint
- The article discusses the transition in intelligent driving technology from rule-driven to data-driven approaches, highlighting the limitations of end-to-end models in complex scenarios and the potential of VLA (Vision-Language-Action) as a more streamlined solution [1][2].

Group 1: Challenges in Learning and Research
- The technical stack for autonomous driving VLA has not yet converged, leading to a proliferation of algorithms and making it difficult for newcomers to enter the field [2].
- A lack of high-quality documentation, together with knowledge fragmented across domains, raises the entry barrier for beginners in autonomous driving VLA research [2].

Group 2: Course Development
- A new course titled "Autonomous Driving VLA Practical Course" has been developed to address these challenges, focusing on a comprehensive understanding of the VLA technical stack [3][4].
- The course aims to provide a one-stop opportunity to build knowledge across multiple fields, including visual perception, language modules, and action modules, while integrating cutting-edge technologies [2][3].

Group 3: Course Features
- The course emphasizes quick entry into the subject through a Just-in-Time Learning approach, using simple language and case studies to help students grasp core technologies rapidly [3].
- It aims to build a framework for research capabilities, enabling students to categorize papers and extract innovative points to form their own research systems [4].
- Practical application is a key focus, with hands-on sessions designed to close the loop from theory to practice [5].

Group 4: Course Outline
- The course covers the origins of autonomous driving VLA, foundational algorithms, and the differences between modular and integrated VLA [6][10][12].
- It includes practical sessions on dataset creation, model training, and performance enhancement [12][14][16].

Group 5: Instructor Background
- The instructors have extensive experience in multimodal perception, autonomous driving VLA, and large model frameworks, with numerous publications at top-tier conferences [22].

Group 6: Learning Outcomes
- Upon completion, students are expected to thoroughly understand the current advancements in autonomous driving VLA and master core algorithms [23][24].
- The course is designed to benefit students in internships, job recruitment, and further academic pursuits in the field [26].

Group 7: Course Schedule
- The course is set to begin on October 20, with a structured timeline for unlocking chapters and support through online Q&A sessions [27].
Paper walkthrough: HKUST's PLUTO, the first planner to surpass rule-based methods!
自动驾驶之心· 2025-09-15 23:33
Core Viewpoint
- The article discusses the development and features of the PLUTO model within the end-to-end autonomous driving domain, emphasizing its unique two-stage architecture and its direct encoding of structured perception outputs for downstream control tasks [1][2].

Summary by Sections

Overview of PLUTO
- PLUTO is characterized by its three main losses: regression loss, classification loss, and imitation learning loss, which collectively contribute to the model's performance [7].
- Additional auxiliary losses are incorporated to aid model convergence [9].

Course Introduction
- The article introduces a new course titled "End-to-End and VLA Autonomous Driving," developed in collaboration with top algorithm experts from leading domestic manufacturers, aimed at addressing the challenges faced by learners in this rapidly evolving field [12][15].

Learning Challenges
- The course addresses the difficulties learners face due to the fast pace of technological development and the fragmentation of knowledge across domains, which makes it hard for beginners to grasp the necessary concepts [13].

Course Features
- The course is designed to provide quick entry into the field, build a framework for research capabilities, and combine theory with practical applications [15][16][17].

Course Outline
- The course consists of several chapters covering the history and evolution of end-to-end algorithms, background knowledge on various technologies, and detailed discussions of both one-stage and two-stage end-to-end methods [20][21][22][29].

Practical Application
- The course includes practical assignments, such as RLHF fine-tuning, allowing students to apply their theoretical knowledge in real-world scenarios [31].

Instructor Background
- The instructor, Jason, has a strong academic and practical background in cutting-edge algorithms related to end-to-end and large models, contributing to the course's credibility [32].

Target Audience and Expected Outcomes
- The course is aimed at individuals with a foundational understanding of autonomous driving and related technologies, with the goal of elevating their skills to the level of an end-to-end autonomous driving algorithm engineer within a year [36].
How are diffusion models reshaping autonomous driving trajectory planning?
自动驾驶之心· 2025-09-11 23:33
Core Viewpoint
- The article discusses the significance and application of Diffusion Models in various fields, particularly in autonomous driving, emphasizing their ability to denoise and generate data effectively [1][2][11].

Summary by Sections

Introduction to Diffusion Models
- Diffusion Models are generative models built around denoising: they learn the data distribution through a forward diffusion process and a reverse generation process [2][4].
- The concept is illustrated through the analogy of ink dispersing in water, where the model aims to recover the original data from noise [2].

Applications in Autonomous Driving
- In autonomous driving, Diffusion Models are used for data generation, scene prediction, perception enhancement, and path planning [11].
- They can handle both continuous and discrete noise, making them versatile for various decision-making tasks [11].

Course Offering
- The article promotes a new course on end-to-end and VLA (Vision-Language-Action) algorithms in autonomous driving, developed in collaboration with top industry experts [14][17].
- The course aims to address the challenges learners face in keeping up with rapid technological advancements and fragmented knowledge in the field [15][18].

Course Structure
- The course is structured into several chapters, covering the history of end-to-end algorithms, background knowledge on VLA, and detailed discussions of methodologies including one-stage and two-stage end-to-end approaches [22][23][24].
- Special emphasis is placed on the integration of Diffusion Models in multi-modal trajectory prediction, highlighting their growing importance in the industry [28].

Learning Outcomes
- Participants are expected to reach a level of understanding equivalent to one year of experience as an end-to-end autonomous driving algorithm engineer, mastering key frameworks and technologies [38][39].
- The course includes practical components to bridge theory and application [19][36].
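The forward diffusion and reverse recovery described in the summary above can be sketched on a toy trajectory. This is a minimal illustration, not the article's or the course's method: it assumes a DDPM-style closed-form forward step with a linear noise schedule, and the reverse step uses the true noise in place of a trained denoising network. All function names and parameter values are hypothetical.

```python
import math
import random


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (an assumed choice)."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps, with eps ~ N(0, I)."""
    signal = math.sqrt(alpha_bar[t])
    sigma = math.sqrt(1.0 - alpha_bar[t])
    noise = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in x0]
    xt = [(signal * x + sigma * nx, signal * y + sigma * ny)
          for (x, y), (nx, ny) in zip(x0, noise)]
    return xt, noise


def recover_x0(xt, t, alpha_bar, predicted_noise):
    """Invert the forward step given a noise estimate (a trained network would supply it)."""
    signal = math.sqrt(alpha_bar[t])
    sigma = math.sqrt(1.0 - alpha_bar[t])
    return [((x - sigma * nx) / signal, (y - sigma * ny) / signal)
            for (x, y), (nx, ny) in zip(xt, predicted_noise)]


rng = random.Random(0)
alpha_bar = make_alpha_bar()
trajectory = [(2.0 * i, 0.0) for i in range(10)]      # straight 10-waypoint path
noisy, noise = forward_diffuse(trajectory, 500, alpha_bar, rng)
restored = recover_x0(noisy, 500, alpha_bar, noise)   # true noise stands in for the network
```

In a real planner the noise estimate would come from a network conditioned on the scene, and sampling would run over many reverse steps rather than a single inversion.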
2025: who are the top autonomous driving leaders in China's smart driving industry?
自动驾驶之心· 2025-09-10 23:33
Core Viewpoint
- The autonomous driving industry is undergoing a significant technological shift towards "end-to-end" solutions, driven by Tesla's leadership and advancements in large model technologies. This shift is prompting domestic automakers to increase investments and adjust their structures, making "end-to-end" a mainstream production solution by 2024 [1].

Group 1: Key Figures in Autonomous Driving
- The article highlights key figures in China's autonomous driving sector, focusing on those who directly influence technology routes and team growth [1].
- Notable leaders include:
  - **Lang Xianpeng** from Li Auto, who has led advancements in assisted driving technology, including the launch of full-scene NOA and the no-map NOA feature [5].
  - **Ye Hangjun** from Xiaomi, who has been pivotal in the development of Xiaomi's end-to-end driving system and has overseen multiple cutting-edge projects [7][9].
  - **Ren Shaoqing** from NIO, who has significantly contributed to the development of urban NOA and emphasizes the importance of data in smart driving [11].
  - **Li Liyun** from XPeng, who has taken over leadership in smart driving and focuses on a pure vision solution [14][15].
  - **Yang Dongsheng** from BYD, who has led the development of the DM-i hybrid system and is pushing for the integration of advanced driving systems across all BYD models [17][20].
  - **Su Jing** from Horizon Robotics, who is leading the development of end-to-end HSD solutions [21][22].
  - **Cao Xudong** from Momenta, who has developed a data-driven strategy for autonomous driving and is focusing on end-to-end large models [25][26].

Group 2: Technological Trends and Innovations
- The article discusses the technological evolution in the autonomous driving sector, emphasizing the transition to end-to-end architectures and the emergence of large models, world models, and VLM solutions [1][53].
- Companies are adopting various strategies:
  - Li Auto is focusing on E2E and VLA systems [5].
  - Xiaomi is heavily investing in end-to-end technology with significant output [9].
  - NIO is pursuing a world behavior model approach [11].
  - XPeng is committed to a pure vision strategy [15].
  - BYD is integrating advanced driving systems across its entire lineup [20].
  - Momenta is leveraging a dual strategy of L2 and L4 development to enhance its market position [26].

Group 3: Future Outlook
- The article concludes that the leaders in the autonomous driving industry are crucial in shaping the future of smart driving in China, with a shared goal of creating systems that are safe, reliable, and tailored to local conditions [51][53].
- The ongoing competition and collaboration among these leaders will drive the industry towards more intelligent and user-friendly solutions [51].
Is there a "pure-blood VLA" in autonomous driving? A look at what VLMs can actually do for autonomous driving
自动驾驶之心· 2025-09-06 16:05
Core Viewpoint
- The article discusses the challenges and methodologies involved in developing datasets for autonomous driving, focusing on the VLA (Vision-Language-Action) model and its applications in trajectory prediction and scene understanding [1].

Dataset Handling
- Different datasets have different numbers of cameras; the VLM handles this by processing a variable number of image tokens, without needing an explicit camera count [2]
- The output trajectories are expressed in the vehicle's current coordinate system, with predictions given as relative (x, y) values rather than image coordinates, so mapping them onto images requires additional camera parameters [6]
- The VLA model generally adheres to the expected output format, but occasional discrepancies occur and are corrected by normalizing the format in Python [8][9]

Trajectory Prediction
- VLA trajectory prediction differs from traditional methods by incorporating scene understanding through QA training, which improves the model's ability to predict trajectories of dynamic objects such as vehicles and pedestrians [11]
- Dataset construction faced challenges such as data quality issues and inconsistent coordinate formats, which were addressed through rigorous data cleaning and standardization [14][15]

Data Alignment and Structure
- Data alignment is achieved by converting the various dataset formats into a unified relative displacement in the vehicle's coordinate system, organized in a QA format that includes trajectory prediction and dynamic object forecasting [18]
- The input consists of images and trajectory points from the previous 1.5 seconds, used to predict future trajectory points over 5 seconds, adhering to the SANA standard [20]

Community and Resources
- The "Autonomous Driving Heart Knowledge Planet" community focuses on cutting-edge technologies in autonomous driving, covering nearly 40 technical directions and fostering collaboration between industry and academia [22][24]
- The community offers a comprehensive platform for learning, including video tutorials, Q&A sessions, and job opportunities in the autonomous driving sector [28][29]
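Two of the data-handling steps described above, converting trajectories to relative (x, y) displacements in the vehicle frame and normalizing loosely formatted model output in Python, can be sketched as follows. This is an illustration under stated assumptions, not the article's pipeline: the ego frame is assumed to be x-forward and y-left with yaw in radians, the model output is assumed to be text containing bracketed number pairs, and both function names are hypothetical.

```python
import math
import re


def to_ego_frame(points, ego_x, ego_y, ego_yaw):
    """Convert global (x, y) points to displacements in the ego vehicle frame.

    Assumed convention: ego x points forward, ego y points left, and
    ego_yaw is the heading in radians measured from the global x axis.
    """
    cos_y, sin_y = math.cos(ego_yaw), math.sin(ego_yaw)
    ego_points = []
    for gx, gy in points:
        dx, dy = gx - ego_x, gy - ego_y            # translate to the ego origin
        ego_points.append((cos_y * dx + sin_y * dy,    # rotate by -yaw
                           -sin_y * dx + cos_y * dy))
    return ego_points


# Matches "(x, y)" or "[x, y]" with optional whitespace and decimals.
_PAIR = re.compile(r"[(\[]\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*[)\]]")


def parse_trajectory(text):
    """Extract (x, y) waypoints from loosely formatted model output text."""
    return [(float(x), float(y)) for x, y in _PAIR.findall(text)]


# A vehicle at the origin heading along global +y sees a point 5 m up the
# y axis as 5 m straight ahead.
ahead = to_ego_frame([(0.0, 5.0)], 0.0, 0.0, math.pi / 2)
waypoints = parse_trajectory("[(1.0, 0.2), [2.5,-0.1] (3,0)]")
```

Projecting these ego-frame points onto a camera image would additionally need that camera's intrinsic and extrinsic parameters, as the summary notes; that step is omitted here.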
On diffusion models: from image generation to end-to-end trajectory planning
自动驾驶之心· 2025-09-06 11:59
Core Viewpoint
- The article discusses the significance and application of Diffusion Models in various fields, particularly in autonomous driving, emphasizing their ability to denoise and generate data effectively [1][2][11].

Summary by Sections

Introduction to Diffusion Models
- Diffusion Models are generative models that focus on denoising, where the noise follows a specific distribution. The model learns to recover original data from noise through a forward diffusion process and a reverse generation process [1][2].

Applications in Autonomous Driving
- In autonomous driving, Diffusion Models are used for data generation, scene prediction, perception enhancement, and path planning. They can handle both continuous and discrete noise, making them versatile for various decision-making tasks [11].

Course Overview
- The article promotes a new course titled "End-to-End and VLA Autonomous Driving," developed in collaboration with top algorithm experts. The course aims to provide in-depth knowledge of end-to-end algorithms and VLA technology [15][22].

Course Structure
- The course is structured into several chapters, covering topics such as:
  - Comprehensive understanding of end-to-end autonomous driving [18]
  - In-depth background knowledge including large language models, BEV perception, and Diffusion Model theory [21][28]
  - Exploration of two-stage and one-stage end-to-end methods, including the latest advancements in the field [29][36]

Learning Outcomes
- Participants are expected to gain a solid understanding of the end-to-end technology framework, including one-stage, two-stage, world models, and Diffusion Models. The course also aims to enhance knowledge of key technologies like BEV perception and reinforcement learning [41][43].
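Jinqiu Capital portfolio company D-Robotics (地瓜机器人): from VGGT to the data closed loop, breakthroughs and explorations in embodied intelligence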
锦秋集· 2025-09-03 04:30
Core Viewpoint
- The article discusses the transition from autonomous driving technology to robotics, highlighting the challenges and opportunities in the robotics industry, particularly in the context of embodied intelligence and the potential impact of new models like VGGT on 3D perception and robotics applications [5][7][60].

Group 1: Industry Trends
- The robotics industry is at a pivotal moment, with significant technological advancements and a shift towards embodied intelligence, which is seen as the next frontier for AI [5][7].
- The article emphasizes the differences between the autonomous driving and robotics sectors, noting that while autonomous driving has reached a level of standardization, robotics is still exploring diverse hardware forms and algorithms [10][14].
- The VGGT model is introduced as a potential game-changer for 3D geometry, akin to how Transformers revolutionized natural language processing, indicating a shift towards unified solutions for 3D perception [6][67].

Group 2: Technological Migration
- The migration of technology from autonomous driving to robotics is highlighted, with companies like D-Robotics (地瓜机器人) leveraging experience from the autonomous driving sector to enhance their robotics platforms [14][18].
- The challenges of hardware diversity in robotics are discussed, as the lack of standardization complicates data accumulation and algorithm development [10][14].
- The article outlines the evolution of autonomous driving algorithms from modular approaches to end-to-end systems, which are now being adapted for robotics applications [25][27].

Group 3: VGGT and Its Implications
- VGGT is presented as a foundational model that could redefine 3D visual technology, offering a new paradigm for solving traditional geometric problems through large-scale data and models [55][67].
- The potential for VGGT to replace expensive depth cameras with cheaper RGB cameras is discussed, which could significantly reduce the cost of robotics systems [64][66].
- The article concludes that VGGT represents a significant advancement in 3D vision, marking the entry of large models into the realm of geometric processing [67][68].
Tier 1 leader Bosch's end-to-end system finally reaches mass production, and it is one-stage!
自动驾驶之心· 2025-08-30 16:03
Core Viewpoint
- The article discusses the advancements in autonomous driving technology, focusing on WePilot AiDrive, a new end-to-end ADAS solution developed by WeRide, which aims to enhance the driving experience and safety through advanced AI capabilities [5][9][10].

Group 1: WeRide's New Technology
- WeRide has launched a new end-to-end ADAS solution named WePilot AiDrive, which is set to be mass-produced within the year [5].
- The system integrates sensor data input and vehicle trajectory output into a single model, enhancing the efficiency and responsiveness of autonomous driving [10][24].
- The new system demonstrates improved performance in complex driving scenarios, such as navigating through urban villages and recognizing pedestrians in challenging lighting conditions [12][14][24].

Group 2: Comparison with Previous Systems
- The previous two-stage model used separate perception and control models, which often led to data loss and limited understanding of driving environments [25][30].
- The new one-stage model directly learns the relationship between input data and output trajectories, significantly improving the system's performance [33].
- The transition from a rule-based approach to a more integrated model aims to overcome the limitations of earlier systems, which struggled with generalization and adaptability [32][35].

Group 3: Market Implications
- The collaboration between WeRide and Bosch aims to make advanced driving capabilities accessible across various vehicle price segments, not just high-end models [41][44].
- Currently, less than 20% of vehicles in the Chinese market are equipped with advanced intelligent driving features, indicating significant growth potential for WeRide's technology [42].
- The goal is to push L2+ capabilities beyond the "value inflection point," making advanced driving technology more mainstream [44].