自动驾驶之心
NVIDIA's 41-page autonomous-driving VLA framework: causal chain reasoning, deployable on real vehicles
自动驾驶之心· 2025-11-15 03:03
Core Insights
- The article discusses the introduction of the Alpamayo-R1 (AR1) framework by NVIDIA, which aims to enhance decision-making in complex driving scenarios through causal reasoning and trajectory planning [1][2].

Group 1: Background and Development
- Autonomous driving systems have evolved from traditional modular architectures to end-to-end frameworks, now widely adopted across the industry [3].
- Current end-to-end methods struggle with long-tail scenarios because supervisory signals are sparse and higher-order reasoning is required, leaving a significant gap between existing models and the demands of robust Level 4 (L4) autonomous driving [3][4].

Group 2: Innovations in AR1
- AR1 integrates causal chain reasoning with trajectory planning, improving planning accuracy by 12% in high-difficulty scenarios over trajectory-based baseline models [2][8].
- In closed-loop simulation, the model reduces lane deviation rates by 35% and near-collision rates by 25% [2].
- After reinforcement-learning post-training, reasoning quality improved by 45% and reasoning-action consistency by 37% [2].

Group 3: Causal Chain Dataset and Structured Reasoning
- The article emphasizes the necessity of structured causal reasoning in autonomous driving, proposing a causal chain (CoC) dataset that aligns reasoning trajectories with driving decisions [5][29].
- The CoC dataset keeps reasoning trajectories concise and directly linked to specific driving decisions, enhancing the model's interpretability and training efficiency [5][31].

Group 4: Training Strategies and Model Architecture
- AR1 employs a multi-stage training strategy combining supervised fine-tuning and reinforcement learning to optimize both reasoning quality and trajectory prediction [8][12].
- The architecture is modular, remaining compatible with existing vision-language model (VLM) backbones while integrating components tailored for autonomous driving [12][16].

Group 5: Visual Encoding and Action Decoding
- The article examines the challenges of visual encoding in multi-camera setups and proposes efficient tokenization methods to reduce the number of tokens generated during real-time inference [19][22].
- Action decoding is based on a bicycle model to ensure smooth trajectory outputs, enhancing performance in real-world applications [27][28].

Group 6: Quality Assurance and Annotation Process
- A hybrid annotation process combining human and automated labeling ensures high-quality training data for the CoC dataset, balancing efficiency and accuracy [48][49].
- The quality-assurance process includes multiple checks for causal correctness and decision minimality in the annotated data [52][53].
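The action decoder described above is built on a bicycle model, which is what guarantees kinematically feasible, smooth trajectories. As a rough illustration only, here is the generic textbook kinematic bicycle-model rollout, not AR1's actual decoder; the wheelbase and timestep values are assumptions:

```python
import math

def rollout_bicycle(x, y, theta, v, controls, wheelbase=2.8, dt=0.1):
    """Roll out a trajectory from (steering, acceleration) commands.

    Integrating controls through a kinematic bicycle model, rather than
    predicting waypoints directly, keeps the resulting path smooth and
    physically drivable.
    """
    traj = [(x, y)]
    for steer, accel in controls:
        x += v * math.cos(theta) * dt          # advance position
        y += v * math.sin(theta) * dt
        theta += (v / wheelbase) * math.tan(steer) * dt  # heading update
        v = max(0.0, v + accel * dt)           # speed never goes negative
        traj.append((x, y))
    return traj

# Zero steering and zero acceleration at 10 m/s yields a straight path.
path = rollout_bicycle(0.0, 0.0, 0.0, 10.0, [(0.0, 0.0)] * 5)
```

Because every output point is reachable under the vehicle's steering and speed limits, this style of decoding cannot emit the jagged or discontinuous trajectories that direct waypoint regression sometimes produces.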
A day in the life of an end-to-end autonomous driving algorithm engineer
自动驾驶之心· 2025-11-15 03:03
Core Viewpoint
- The article emphasizes the importance of end-to-end algorithms in autonomous driving, highlighting the shift from rule-based to learning-based approaches, particularly for congestion and dynamic-obstacle scenarios [4][7].

Summary by Sections

Overview of End-to-End Tasks
- The transition to end-to-end systems merges perception tasks and makes learning-based control algorithms a mainstream requirement at companies [7].

Two-Stage End-to-End Algorithm Framework
- The two-stage framework is discussed, including its modeling methods and the information transfer between perception and planning, navigation, and control (PNC) [8].

One-Stage End-to-End Algorithm
- The one-stage framework allows lossless information transfer, outperforming the two-stage approach. Several one-stage frameworks are introduced, including those based on VLA and diffusion methods [9].

Navigation Information in Production
- Navigation information is crucial for route guidance and selection. The chapter covers mainstream navigation-map formats and how to effectively encode and embed navigation maps in end-to-end models [10].

Introduction to Reinforcement Learning Algorithms
- Integrating reinforcement learning with imitation learning is necessary, as it helps models learn causal relationships and generalize better across diverse driving scenarios [11].

End-to-End Trajectory Output Optimization
- This section focuses on practical trajectory-planning projects that combine imitation learning and reinforcement learning techniques [12].

Safety Net Solutions: Spatiotemporal Joint Planning
- Post-processing logic is needed to ensure model output quality, including trajectory-smoothing algorithms that enhance stability and reliability [13].
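As a toy illustration of the kind of post-processing "safety net" smoothing mentioned above (this is a minimal sketch, not the course's actual algorithm), a moving-average pass over planned waypoints might look like:

```python
def smooth_trajectory(points, window=3):
    """Moving-average smoother over (x, y) waypoints.

    Each point is replaced by the mean of its neighbors inside `window`;
    endpoints simply use whatever neighbors exist, so the trajectory
    length is preserved.
    """
    n = len(points)
    half = window // 2
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        xs = [p[0] for p in points[lo:hi]]
        ys = [p[1] for p in points[lo:hi]]
        out.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return out

# A noisy planned path; smoothing damps the lateral oscillation.
raw = [(0.0, 0.0), (1.0, 0.5), (2.0, -0.5), (3.0, 0.4), (4.0, 0.0)]
smoothed = smooth_trajectory(raw)
```

Production systems typically use stronger methods (spline fitting, QP-based smoothing with curvature constraints), but the principle is the same: the learned model proposes, and a deterministic post-processor enforces stability.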
Experience Sharing in End-to-End Production
- The final chapter shares production experience from the perspectives of data, models, scenarios, and rules to improve system capabilities [14].

Target Audience
- The course targets advanced learners with a foundational understanding of autonomous driving algorithms, reinforcement learning, and programming [15][16].
End-to-end and VLA positions: three years of experience now commands a 70k monthly salary
自动驾驶之心· 2025-11-14 00:04
Core Insights
- There is significant demand for end-to-end and VLA (Vision-Language-Action) technical talent in the automotive industry, with salaries for experts reaching up to 70k per month for positions requiring 3-5 years of experience [1].
- The end-to-end and VLA technology stack is complex, spanning advanced algorithms such as BEV perception, vision-language models (VLM), diffusion models, reinforcement learning, and world models [1].
- The industry is offering specialized courses to help individuals learn end-to-end and VLA technologies quickly and effectively, featuring collaboration between academia and industry experts [1].

Course Offerings
- The "End-to-End and VLA Autonomous Driving Course" focuses on key algorithms and theoretical foundations in end-to-end autonomous driving, covering both one-stage and two-stage approaches, including BEV perception and large language models [11].
- The "Autonomous Driving VLA and Large Model Practical Course" is designed for beginners in the VLA field, providing a comprehensive overview of VLA, including its Vision, Language, and Action modules, as well as reinforcement learning and diffusion models [2].
- Both courses include practical assignments in which participants build their own VLA models and datasets from scratch [2].

Instructor Profiles
- Instructors include experts with strong academic backgrounds and practical experience in autonomous driving and large-model development, including researchers from Tsinghua University and other top-tier universities [7][10][13].
- Instructors have published numerous papers in prestigious conferences and have led projects on multimodal perception and autonomous driving [7][10][13].

Target Audience
- The courses are aimed at individuals with foundational knowledge of autonomous driving, familiarity with its basic modules, and a grasp of transformer models, reinforcement learning, and BEV perception [15].
- Participants are expected to have a background in probability theory and linear algebra, as well as proficiency in Python and PyTorch [15].
Xpeng's Liu Xianming: the "emergence" of VLA 2.0 was extremely sudden...
自动驾驶之心· 2025-11-14 00:04
Core Insights
- The article discusses the emergence of advanced capabilities in autonomous driving and robotics, focusing on Xiaopeng Motors' developments in VLA (Vision-Language-Action) models and humanoid robots [5][10][28].

Group 1: Technological Advancements
- Xiaopeng Motors has invested heavily in computational power, utilizing 30,000 compute cards and spending over 2 billion on training, leading to a technological breakthrough [7].
- The emergence of capabilities in the second-generation VLA and the humanoid robot IRON was unexpected: months of failure suddenly gave way to significant progress [5][8].
- The core logic of the second-generation VLA is to eliminate the translation from vision to language, enhancing efficiency and enabling self-supervised learning [10][19].

Group 2: Challenges and Solutions
- The transition from structured text data to continuous video signals presents challenges, including information loss and the need for real-time feedback from the physical world [14][15][17].
- Xiaopeng's approach simplifies training by removing complex intermediate steps, taking multimodal data as direct input and producing physical actions as output [20][22].
- The company is optimizing local deployment for low latency and high frame rates to ensure real-time performance on its own hardware [24].

Group 3: Robotics Development
- Xiaopeng's robotics team collaborates closely with the automotive division, emphasizing in-house development to reduce costs and accelerate iteration [28][29].
- The humanoid robot IRON has shown significant improvements in movement, achieving a human-like gait through innovative design and control systems [36][39].
- A universal generative controller allows the robot to perform complex movements, such as Tai Chi, by directly inputting recorded trajectories [46].

Group 4: Future Prospects
- The company envisions a future in which robots establish deeper emotional connections with humans, potentially personalizing their designs to individual preferences [48].
- Advances in robotics and autonomous driving are expected to yield sudden breakthroughs, similar to those seen in the automotive sector [32].
One sentence is all it takes to create a freely explorable 3D world!
自动驾驶之心· 2025-11-14 00:04
Core Insights
- The article discusses the launch of Marble, a world model developed by WorldLabs that allows users to create immersive 3D environments from a single image or text prompt [2][3][7].

Group 1: Product Features
- Marble generates persistent, downloadable 3D environments, distinguishing it from other real-time models [28].
- Users can upload 2D images or 3D models (for a fee) to generate worlds, achieving realism approaching AAA video games [14][16].
- The platform includes AI-native editing tools and a hybrid 3D editor, letting users construct a spatial framework and then fill in visual details [31].

Group 2: User Experience
- Initial testing showed impressive results, including interactive 3D scenes created from a single image [32].
- Users can input multiple images or short videos to create more accurate 3D worlds, enhancing the creative process [48].
- Editing is iterative: generated worlds can be modified extensively, from minor adjustments to major structural changes [49][50].

Group 3: Pricing and Accessibility
- Marble offers three pricing tiers; the highest costs $95 per month for generating up to 75 worlds, while the free version allows 4 worlds [83][84].
- The Pro version is available for $1 in the first month, with standard pricing of $20 per month [85].

Group 4: Future Implications
- The article argues that Marble represents a significant step toward spatial intelligence in AI, expected to unlock new applications in simulation and robotics [70][71].
- Adding interactive capabilities to future world models is highlighted as a key opportunity for enhancing user engagement and applications [69].
Understanding world models without jargon: from everyday prediction to autonomous driving
自动驾驶之心· 2025-11-14 00:04
Group 1
- The core concept of the article is the "world model": a system that predicts future scenarios from past sensory data, much as humans anticipate events in daily life [2][3][30].
- A world model takes varied inputs, such as images, sounds, and sensor data, and outputs predictions about future states; its essence is recognizing patterns and making forecasts [4][30].
- The article distinguishes world models from neural networks: neural networks serve as tools for recognition and imitation, while the world model is the core that enables prediction and understanding [5][10][30].

Group 2
- A "universal" world model is limited by the vast differences in rules and requirements across scenarios, making specialized models necessary [11][12][30].
- Specialized world models are introduced for video generation, music generation, games, and industrial production, each focused on a specific domain to achieve precise predictions [12][14][18][30].
- The autonomous driving world model is described as the most demanding type: its predictions directly affect safety, requiring rapid response times and high accuracy [18][22][30].

Group 3
- The VLA model is presented as an enhanced autonomous-driving world model that incorporates language logic to better predict actions from user commands and traffic rules [23][26][30].
- The article concludes that world models will become more specialized rather than universal, focusing on prediction accuracy and speed in specific scenarios [29][30].
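In the spirit of the article's "predict the future from past observations" framing, the idea can be boiled down to a deliberately minimal sketch: forecast the next state by linear extrapolation of the last two. This is purely illustrative; real world models are learned neural predictors over high-dimensional sensor streams:

```python
def predict_next(observations):
    """Toy 'world model': linearly extrapolate the last two observations.

    Each observation is a tuple of numeric state (e.g. positions). The
    pattern-recognition-and-forecast loop the article describes reduces,
    in its simplest form, to assuming the recent trend continues.
    """
    if len(observations) < 2:
        raise ValueError("need at least two past observations")
    prev, last = observations[-2], observations[-1]
    return tuple(2 * b - a for a, b in zip(prev, last))

# A car observed at x = 10 m, then 12 m, is forecast at 14 m next step.
assert predict_next([(10.0,), (12.0,)]) == (14.0,)
```

What separates a real world model from this toy is exactly what the article stresses: learned, domain-specific dynamics that capture rules (traffic, physics) rather than a single blanket assumption of constant velocity.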
Is Tesla's FSD hiding a VLA? A deep discussion of VLA and world models next week
自动驾驶之心· 2025-11-14 00:04
Core Insights
- The article discusses advancements in autonomous driving technology, focusing on the development of the Vision-Language-Action (VLA) framework and world models, and highlights the contributions of several experts in the field [1][2][3][4][5].

Group 1: Key Contributors
- Jian Kun, a senior director at Li Auto, has built its autonomous driving technology stack from scratch since joining in 2021, achieving milestones such as Highway NoA in 2022 and City NoA in 2023 [1].
- Xu Lingyun, a PhD from the Chinese Academy of Sciences, leads the parking team at Changan Automobile, focusing on autonomous driving perception and end-to-end system research [2].
- Jiang Anqing, a senior algorithm scientist at Bosch, leads research on VLA and closed-loop algorithms [3].

Group 2: Technological Developments
- The discussion covers the potential integration of world models and VLA, asking whether a unified approach is feasible [8].
- The high demand for data and computing power is making it increasingly difficult for academia to participate in intelligent-driving advancements, raising questions about future opportunities in the academic sector [8].

Group 3: Event Highlights
- A live discussion on the future of autonomous driving technologies, including insights on Tesla's FSD v14 and its implications for domestic technology [4][5].
- The event featured a deep dive into the reliability of VLMs in autonomous driving, with expert opinions on data closed-loop engineering [12].
Engineers become AI "commanders": the Geely and Alibaba Cloud software development transformation experiment
自动驾驶之心· 2025-11-13 00:04
Core Insights
- The automotive industry is facing unprecedented challenges in software engineering: the proportion of software developers at Geely has risen from under 10% to 40% in recent years, and complexity is growing exponentially as smart-vehicle codebases surpass 100 million lines [3][5].
- Geely is leveraging AI, specifically through collaboration with Alibaba Cloud's Tongyi Lingma, to enhance development efficiency, achieving a 20% increase in coding efficiency with over 30% of code generation now AI-driven [5][6].
- The shift from hardware-dominated to software-centric automotive products necessitates a transformed development model, moving toward agile and DevOps methodologies that support rapid iteration [8][19].

Development Challenges
- The industry is transitioning from distributed ECU architectures to centralized computing and service-oriented architectures (SOA), which significantly increases system-integration complexity [8].
- Compliance with stringent international safety standards such as ISO 26262 and ASPICE poses additional challenges, creating tension between rapid agile development and necessary safety protocols [8].

AI Integration
- Geely's R&D system encompasses application software development, embedded development, and algorithm research, with AI tools like Tongyi Lingma integrated across all areas [10][11].
- AI automates repetitive tasks so engineers can focus on system architecture and core business logic, yielding a 30% efficiency improvement in coding phases [16][18].

Knowledge Management
- AI's ability to quickly read and interpret legacy code mitigates "technical debt," helping new engineers understand complex systems more rapidly [17][18].
- The Geely-Alibaba Cloud collaboration aims to build a proprietary knowledge base that deepens the AI's contextual understanding of Geely's specific technology stack and business logic [14][15].

Role Transformation
- The role of engineers is evolving from executors to "AI commanders" who define problems and oversee AI execution, shifting the focus from implementation to strategic oversight [20][21].
- The ultimate goal is a highly automated R&D environment in which AI and human engineers collaborate throughout the entire development process [22][23].

Industry Implications
- Demand is rising for cross-disciplinary talent that understands both mechanical hardware and software systems, exposing a significant skills gap in the automotive industry [23].
- AI integration may lower technical barriers in software development, enabling engineers with mechanical backgrounds to participate more actively in software engineering [23].
Pony.ai, having weathered its "vacuum period", has entered an unstoppable positive cycle
自动驾驶之心· 2025-11-13 00:04
Core Viewpoint
- The article highlights Pony.ai's successful IPO, marking its emergence as a leading player in the autonomous driving industry after overcoming a funding and technological "vacuum period" [2][4][30].

Group 1: IPO and Funding
- Pony.ai listed on the Hong Kong Stock Exchange on November 6, 2025, offering approximately 48.25 million shares at HKD 139 each; if the over-allotment option is fully exercised, it could raise up to HKD 7.7 billion, making it the largest IPO in the global autonomous driving sector for 2025 [2][4].
- The IPO attracted significant cornerstone investments, including USD 120 million from top international investment institutions, with Uber contributing USD 100 million [4].
- Despite the sector's downturn, Pony.ai secured two rounds of financing during this period, including a USD 290 million Series D in 2022, demonstrating strong investor confidence in its technological capabilities [8][24].

Group 2: Technological Advancements
- Pony.ai has transitioned from imitating human driving to surpassing it; its Robotaxi technology achieves safety performance ten times better than human drivers [14][20].
- The company developed a proprietary software stack for its "Virtual Driver," which uses an end-to-end model to perceive and understand the environment and predict the behavior of surrounding vehicles and pedestrians [16].
- Its seventh-generation L4 autonomous driving system features a platform design that supports multiple vehicle models and reduces bill-of-materials (BOM) cost by 70% compared to previous generations [16][17].

Group 3: Commercialization and Operations
- Pony.ai has commenced full-scale commercial Robotaxi operations in major Chinese cities, including Beijing, Shanghai, Guangzhou, and Shenzhen, following technological breakthroughs and regulatory approvals [23].
- Total revenue for Q2 2025 reached RMB 154 million, a year-on-year increase of 75.9% [24].
- The Robotaxi segment alone generated USD 3.256 million (approximately RMB 23.32 million) in the first half of 2025, up 178.8% year on year [25].
- Fleet efficiency is enhanced by a highly automated management system with real-time vehicle monitoring, supporting a 1:20 personnel-to-vehicle operational ratio [27].
We are looking for partners in the autonomous driving field...
自动驾驶之心· 2025-11-13 00:04
Hello everyone, I'm 柱哥. We have been upgrading our content and planning more detailed output, gradually moving from single-article interpretations toward deeper technical overviews, solution analyses, and opinion discussions. Autonomous driving has entered deep technical waters, and the industry's difficulties and pain points need more committed people to join in and break through together. Going forward we will add roundtable interviews, hands-on and industrial-grade courses, consulting, and other offerings.

Recently Tesla, Xpeng, and Li Auto have all shared new technology, sparking broad and in-depth discussion, and we are delighted to bring more quality content to everyone. As a technical content platform for China's autonomous driving field, we hope to contribute our strength amid this current and become a platform that truly brings value to the industry. Many hands make light work, and we need more excellent partners to join us.

Main Directions
Including but not limited to: autonomous driving product manager, 4D annotation / data closed loop, world models, VLA, autonomous driving large models, reinforcement learning, end-to-end, and other directions.

Role Description
Mainly covering autonomous driving training partnerships (B-side: enterprises, universities, and research institutes; C-side: mostly students and job seekers), course development, and original article writing.

Contact Us
For compensation and collaboration terms, add WeChat wenyirumo for further discussion.