After comparing the two, VLA's maturity is far ahead of world models...
自动驾驶之心· 2025-09-26 16:03
Core Insights
- The article discusses the competition between VLA (Vision-Language-Action) models and world models in end-to-end autonomous driving, noting that over 90% of current models are segmented end-to-end rather than purely VLA or world models [2][6].

Group 1: Model Comparison
- VLA models, represented by companies such as Gaode Map and Horizon Robotics, outperform world models, with the latest VLA papers published in September 2023 [6][43].
- Performance metrics show VLA models significantly ahead of world models: the best VLA model achieves an average L2 distance of 0.19 meters and a collision rate of 0.08% [5][6].

Group 2: Data Utilization
- The Shanghai AI Lab's GenAD model uses unlabelled data sourced from the internet, primarily YouTube, to improve generalization, in contrast with traditional supervised learning methods that rely on labeled data [7][19].
- The GenAD framework employs a two-tier training approach similar to Tesla's, integrating diffusion models and Transformers, but requires high-precision maps and traffic rules for effective operation [26][32].

Group 3: Testing Methods
- Two primary testing methods for end-to-end autonomous driving are identified: closed-loop testing using synthetic data in simulators such as CARLA, and open-loop testing based on real-world collected data [4][6].
- The article emphasizes the limitations of open-loop testing, which cannot provide feedback on how predicted actions would play out once executed, making closed-loop testing more reliable for evaluating model performance [4][6].

Group 4: Future Directions
- While world models have potential, their current implementations often require additional labeled data, which erodes their advantages in generalization and cost-effectiveness relative to VLA models [43].
- Ongoing research indicates a trend toward better integration of diverse data sources and greater model robustness through advanced training techniques [19][32].
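As an illustration of the open-loop metrics cited above, the average L2 distance and collision rate can be computed roughly as follows. This is a minimal NumPy sketch; the function name, array shapes, and toy data are illustrative assumptions, not taken from any specific benchmark.

```python
import numpy as np

def open_loop_metrics(pred, gt, collisions):
    """Average L2 distance between predicted and ground-truth waypoints,
    plus collision rate over evaluated frames.

    pred, gt: (N, T, 2) arrays of N trajectories with T (x, y) waypoints.
    collisions: (N,) boolean array, True if the predicted trajectory
    would intersect an obstacle in that frame.
    """
    l2 = np.linalg.norm(pred - gt, axis=-1)      # (N, T) per-waypoint distance
    avg_l2 = l2.mean()                           # averaged over frames and horizon
    collision_rate = collisions.mean() * 100.0   # percent
    return avg_l2, collision_rate

# Toy example: two frames, three waypoints each
pred = np.array([[[0.0, 0.0], [1.0, 0.1], [2.0, 0.2]],
                 [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]])
gt = np.array([[[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]],
               [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]])
collisions = np.array([False, False])
avg_l2, rate = open_loop_metrics(pred, gt, collisions)
```

As the summary notes, numbers like these only measure imitation of logged trajectories; they carry no feedback about what would happen after the action is executed, which is why closed-loop evaluation is preferred.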
自动驾驶之心 National Day & Mid-Autumn Festival promotions are underway (discounts on courses, Knowledge Planet, hardware, etc.)
自动驾驶之心· 2025-09-26 16:03
Group 1
- The article promotes discounts on courses, including a 70% discount and reductions of 80 or 99 yuan on specific courses [1][3][4].
- A yearly subscription to the "Big Model Planet" is available for 99 yuan, covering technology, industry insights, and job-hunting resources [1].
- The platform offers a 1v1 tutoring service with a maximum discount of 1,000 yuan off a 5,000 yuan fee, and a 1v6 paper-tutoring service with a 1,000 yuan reduction [1].

Group 2
- The "Autonomous Driving Heart" section highlights cutting-edge self-driving technology with nearly 40 learning routes available [6].
- The community facilitates face-to-face interactions with industry leaders and academic experts, providing insights into the latest developments in autonomous driving [6].
- Topics covered include the competition between the VLA and WA routes, future directions of self-driving technology, and the concept of world models [6].
AnchDrive: a new end-to-end autonomous driving diffusion policy (Shanghai University & Bosch)
自动驾驶之心· 2025-09-26 07:50
Core Insights
- The article introduces AnchDrive, an end-to-end framework for autonomous driving that addresses the challenges of multimodal behavior and generalization in long-tail scenarios [1][10][38].
- AnchDrive uses a hybrid trajectory-anchor approach, combining dynamic and static anchors to improve trajectory quality and planning robustness [10][38].

Group 1: Introduction and Background
- End-to-end autonomous driving algorithms have gained significant attention for their superior scalability and adaptability compared with traditional rule-based motion planning [4][12].
- These methods learn control signals directly from raw sensor data, reducing the complexity of modular design and minimizing cumulative perception errors [4][12].

Group 2: Methodology
- AnchDrive employs a multi-head trajectory decoder that dynamically generates a set of trajectory anchors, capturing behavioral diversity under local environmental conditions [8][15].
- The framework integrates a large-scale static anchor set derived from human driving data, providing cross-scenario behavioral priors [8][15].

Group 3: Experimental Results
- On the NAVSIM v2 simulation platform, AnchDrive achieved an Extended Predictive Driver Model Score (EPDMS) of 85.5, indicating its ability to generate robust, contextually appropriate behavior in complex driving scenarios [9][30][34].
- AnchDrive significantly outperformed existing methods, with an 8.9-point EPDMS gain over VADv2 while reducing the number of trajectory anchors from 8192 to just 20 [34].

Group 4: Contributions
- The main contributions include the AnchDrive framework itself, which uses a truncated diffusion process initialized from a hybrid trajectory-anchor set, significantly improving initial trajectory quality and planning robustness [10][38].
- A mixed perception model with dense and sparse branches enhances the planner's understanding of obstacles and road geometry [11][18].
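The idea of a truncated diffusion process initialized from trajectory anchors can be sketched as below. This is a hedged illustration only: the beta schedule, step counts, and function names are assumptions for exposition, not AnchDrive's actual implementation; the point is that starting denoising from noised anchors rather than pure Gaussian noise leaves far fewer steps to run.

```python
import numpy as np

rng = np.random.default_rng(0)

def truncated_diffusion_init(anchors, t_trunc, total_steps=50, noise_scale=1.0):
    """Start denoising from noised trajectory anchors instead of pure noise.

    anchors: (K, T, 2) hybrid anchor set (dynamic + static), each a T-step (x, y) path.
    t_trunc: diffusion step to resume from (t_trunc << total_steps shortens inference).
    Returns the partially noised anchors and the remaining denoising schedule.
    """
    # Linear beta schedule; alpha_bar is the cumulative signal-retention factor
    betas = np.linspace(1e-4, 0.02, total_steps)
    alpha_bar = np.cumprod(1.0 - betas)[t_trunc]
    noise = rng.normal(scale=noise_scale, size=anchors.shape)
    noised = np.sqrt(alpha_bar) * anchors + np.sqrt(1.0 - alpha_bar) * noise
    remaining_steps = list(range(t_trunc, -1, -1))  # only t_trunc + 1 steps left
    return noised, remaining_steps

# 20 anchors (the count the summary reports), 8 waypoints each
anchors = rng.normal(size=(20, 8, 2))
noised, steps = truncated_diffusion_init(anchors, t_trunc=9)
```

Because the anchors already encode plausible human driving behavior, the denoiser refines good candidates instead of reconstructing trajectories from scratch.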
With the ES8 at 298,800 yuan, NIO has finally come to its senses...
自动驾驶之心· 2025-09-26 03:45
Core Viewpoint
- NIO Day showcased two distinct models, the new ES8 and the ET9 Horizon Special Edition, representing NIO's dual strategy of affordability and luxury in the electric vehicle market [1][21].

Group 1: New ES8
- The new ES8 is positioned as an accessible flagship SUV, starting at 299,800 yuan, 10,000 yuan below its earlier pre-sale price [8][17].
- The vehicle features a spacious design with dimensions of 5280×2010×1800 mm and 6- or 7-seat configurations, enhancing its practicality [11].
- It carries an upgraded 102 kWh battery, providing a CLTC range of up to 635 kilometers and addressing consumer concerns about range anxiety [13].
- The ES8 pairs high performance, including roughly three-second acceleration, with advanced driving-assistance capabilities, balancing performance and safety [14].
- The model aims to redefine luxury as an inclusive experience rather than a privilege for the few [20].

Group 2: ET9 Horizon Special Edition
- The ET9 is priced at 818,000 yuan, targeting a niche market rather than the mainstream and serving as a benchmark for luxury electric vehicles [22].
- Its design emphasizes elegance and sophistication, with aesthetics that resonate with Eastern sensibilities [24][25].
- NIO's attention to detail and quality in the ET9 reflects its commitment to consumers seeking an exceptional experience [28].

Group 3: Strategic Insights
- The dual focus on affordability (普惠) and luxury (极致) is central to NIO's growth strategy, securing market presence while maintaining brand prestige [34].
- The company aims to reach profitability by Q4 2025, with the new ES8 expected to contribute significantly to sales, supported by a production capacity of 40,000 units [39][40].
- NIO's long-term vision is to deepen its market roots while elevating its brand stature, as highlighted during the NIO Day event [33][38].
With some deep learning background, how should you get started in autonomous driving?
自动驾驶之心· 2025-09-25 23:33
Group 1
- The core viewpoint emphasizes the rapid evolution of the autonomous driving technology stack and the need for continuous learning to avoid obsolescence in the field [1].
- The company has established three platforms focused on autonomous driving, embodied intelligence, and large models, encouraging exploration and adaptation in a changing environment [2].
- The company is actively promoting industry advancement and has launched major promotional activities during the National Day and Mid-Autumn Festival, offering discounts on courses [2][4].

Group 2
- The autonomous driving knowledge community includes nearly 40 learning paths, covering cutting-edge technologies such as VLA, world models, and closed-loop simulation [8].
- The community facilitates face-to-face interactions with industry leaders and offers seven premium courses aimed at beginners, fostering skill development [8].
How can human-like reasoning be injected into one-stage end-to-end driving? HKUST's OmniScene proposes a new paradigm...
自动驾驶之心· 2025-09-25 23:33
Core Insights
- The article discusses the limitations of current autonomous driving systems in achieving true scene understanding and proposes OmniScene, a framework that integrates human-like cognitive abilities into the driving process [11][13][14].

Group 1: OmniScene Framework
- OmniScene introduces a vision-language model (OmniVLM) that combines panoramic perception with temporal fusion for comprehensive 4D scene understanding [2][14].
- The framework employs a teacher-student architecture for knowledge distillation, embedding textual representations into 3D instance features to strengthen semantic supervision [2][15].
- A hierarchical fusion strategy (HFS) addresses the imbalance in contributions from different modalities during multi-modal fusion, adaptively calibrating geometric and semantic features [2][16].

Group 2: Performance Evaluation
- OmniScene was evaluated on the nuScenes dataset, outperforming more than ten mainstream models across perception, prediction, planning, and visual question answering (VQA), establishing new benchmarks for each task [3][16].
- Notably, OmniScene achieved a 21.40% improvement in visual question answering, demonstrating robust multi-modal reasoning [3][16].

Group 3: Human-like Scene Understanding
- The framework aims to replicate human visual processing by continuously converting sensory input into scene understanding and adjusting attention to the dynamic driving environment [11][14].
- OmniVLM processes multi-view and multi-frame visual inputs, enabling comprehensive scene perception and attention reasoning [14][15].

Group 4: Multi-modal Learning
- The proposed HFS combines 3D instance representations with multi-view visual inputs and semantic attention derived from textual cues, improving the model's grasp of complex driving scenarios [16][19].
- Integrating visual and textual modalities improves the model's contextual awareness and decision-making in dynamic environments [19][20].

Group 5: Challenges and Solutions
- The article highlights the challenges of integrating vision-language models (VLMs) into autonomous driving, such as the need for domain-specific knowledge and real-time safety requirements [20][21].
- Proposed solutions include designing driving-attention prompts and developing new end-to-end visual-language reasoning methods for safety-critical driving scenarios [22].
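The adaptive calibration idea behind a hierarchical fusion strategy can be sketched as a learned per-instance gate over the two modalities. This is a simplified assumption-laden illustration, not OmniScene's actual HFS: the gating logits here are given as inputs, where in a real model they would be predicted by a small network.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hierarchical_fusion(geo_feat, sem_feat, w_geo, w_sem):
    """Adaptively reweight geometric (3D instance) and semantic (text-derived)
    features before fusing, so neither modality dominates.

    geo_feat, sem_feat: (N, D) per-instance features from the two branches.
    w_geo, w_sem: (N,) per-instance gating logits (learned in a real model).
    """
    gates = softmax(np.stack([w_geo, w_sem], axis=-1))  # (N, 2), rows sum to 1
    fused = gates[:, :1] * geo_feat + gates[:, 1:] * sem_feat
    return fused, gates

rng = np.random.default_rng(1)
geo = rng.normal(size=(4, 8))   # geometric branch features for 4 instances
sem = rng.normal(size=(4, 8))   # semantic branch features for the same instances
fused, gates = hierarchical_fusion(geo, sem, np.zeros(4), np.zeros(4))
```

With zero logits the gate is an even 50/50 blend; during training the logits would shift per instance, e.g. leaning on geometry for nearby obstacles and on semantics for ambiguous scene context.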
The evolution of RL infrastructure, seen through today's mainstream RL libraries
自动驾驶之心· 2025-09-25 23:33
Core Viewpoint
- Reinforcement Learning (RL) is transitioning from a supporting technology to a core driver of model capabilities, with the focus shifting to multi-step, interactive agent training on the path toward Artificial General Intelligence (AGI) [2][6].

Group 1: Modern RL Infrastructure Architecture
- Modern RL infrastructure centers on two components: a Generator, which interacts with the environment to produce trajectories and compute rewards, and a Trainer, which updates model parameters from trajectory data [6][4].
- The generator-trainer architecture, combined with a distributed coordination layer such as Ray, forms the "gold standard" for RL systems [6][4].

Group 2: Primary Development
- Primary-development frameworks serve as foundations for building RL training pipelines, providing core algorithm implementations and integration with the underlying training/inference engines [8][7].
- TRL (Transformer Reinforcement Learning) is a user-friendly RL framework from Hugging Face with support for a range of algorithms [9][10].
- OpenRLHF, developed collaboratively by teams including ByteDance and NetEase, aims to provide an efficient, scalable RLHF and agentic RL framework [11][14].
- veRL, developed by ByteDance's Seed team, is among the most comprehensive frameworks, with extensive algorithm support [16][19].
- AReaL (Asynchronous Reinforcement Learning) is designed for large-scale, high-throughput RL training with a fully asynchronous architecture [20][21].
- NeMo-RL, from NVIDIA, integrates into the broader NeMo ecosystem with a focus on production-grade RL [24][28].
- ROLL, an Alibaba open-source framework, emphasizes asynchronous and agentic capabilities for large-scale LLM RL [30][33].
- slime, developed by Tsinghua and Zhipu, is a lightweight framework focused on seamless integration of SGLang with Megatron [34][36].

Group 3: Secondary Development
- Secondary-development frameworks build on primary frameworks and target specific downstream scenarios such as multi-modal training, multi-agent systems, and GUI automation [44][3].
- Agentic RL frameworks such as verl-agent optimize for asynchronous rollout and training, addressing the core challenges of multi-round interaction with external environments [46][47].
- Multimodal RL frameworks such as VLM-R1 and EasyR1 focus on training visual-language reasoning models, tackling data processing and loss-function design [53][54].
- Multi-agent RL frameworks such as MARTI integrate multi-agent reasoning with reinforcement learning for complex collaborative tasks [59][60].

Group 4: Summary and Trends
- RL infrastructure is evolving from a "workshop" model toward a "standardized pipeline," with framework design becoming increasingly modular [65].
- Asynchronous architectures are becoming essential to address the computational asymmetry between rollout and training [66].
- High-performance inference engines such as vLLM and SGLang significantly accelerate the rollout phase [66].
- The evolution from RLHF to agentic RL reflects the growing complexity of tasks the new frameworks support [66].
- The choice of distributed training framework, such as Megatron-LM or DeepSpeed, is critical for large-scale model training [66].
- Scenario-driven secondary-development frameworks are addressing the unique challenges of vertical domains [66].
- The importance of orchestrators for managing distributed components in RL systems is becoming widely recognized [66].
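The generator-trainer split described above can be sketched as a tiny synchronous loop. Everything here is a stand-in: the classes, the fake trajectories, and the version counter in place of a real gradient step are illustrative assumptions, meant only to show the data flow that frameworks like veRL or AReaL distribute and (in the asynchronous case) overlap.

```python
import random
from collections import deque

class Generator:
    """Rolls out the current policy in the environment and returns
    trajectories with scalar rewards (the 'rollout' half of the loop)."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def rollout(self, policy_version, batch_size=4):
        # Fake trajectories; a real generator would run an inference engine
        return [{"policy_version": policy_version,
                 "tokens": [self.rng.randint(0, 9) for _ in range(5)],
                 "reward": self.rng.random()}
                for _ in range(batch_size)]

class Trainer:
    """Consumes trajectories and updates model parameters
    (here just a version counter standing in for a gradient step)."""
    def __init__(self):
        self.version = 0

    def step(self, trajectories):
        # On-policy check: every trajectory came from the current policy
        assert all(t["policy_version"] == self.version for t in trajectories)
        self.version += 1  # one parameter update per batch
        return self.version

# Synchronous (strictly on-policy) loop; asynchronous frameworks
# overlap rollout and training instead of alternating them
gen, trainer = Generator(), Trainer()
buffer = deque()
for _ in range(3):
    buffer.extend(gen.rollout(trainer.version))
    batch = [buffer.popleft() for _ in range(4)]
    trainer.step(batch)
```

The computational asymmetry the summary mentions shows up here directly: rollout (token generation) dominates wall-clock time, which is why asynchronous designs decouple the two halves rather than running them in lockstep.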
An ultra-cost-effective 3D scanner! Centimeter-level reconstruction for all point cloud and vision scenarios
自动驾驶之心· 2025-09-25 23:33
Core Viewpoint
- The GeoScan S1 is presented as a highly cost-effective 3D laser scanner for industrial and educational applications, combining a lightweight design, simple operation, and high-precision 3D scene reconstruction [1][9].

Group 1: Product Features
- The GeoScan S1 generates point clouds at 200,000 points per second, with a maximum measurement distance of 70 meters and 360° coverage, supporting scans of areas larger than 200,000 square meters [1][29].
- It integrates multiple sensors, including RTK, an IMU, and high-resolution cameras, enabling real-time mapping and high-precision data collection [22][34].
- The handheld device runs Ubuntu and offers connectivity including dual USB 3.0 ports and a high-bandwidth network interface, allowing flexible integration for research and development [3][44].

Group 2: User Experience
- The GeoScan S1 is designed for ease of use: scanning starts with a single button press, and results can be exported without complex setup [5][27].
- It supports real-time modeling and high-fidelity scene reconstruction, producing colored point cloud data through sensor-fusion algorithms [27][30].
- The device weighs 1.3 kg without the battery and 1.9 kg with it, with a battery life of roughly 3 to 4 hours [22][26].

Group 3: Market Positioning
- The GeoScan S1 is marketed as the most cost-effective handheld 3D laser scanner on the domestic market, starting at 19,800 yuan [9][57].
- Several versions are available, including a basic version, a depth-camera version, and online/offline 3DGS versions, catering to different user needs [57][58].
- The product has been validated through numerous projects with academic institutions, strengthening its credibility in the market [9][38].
Hiring several experts to co-build the platform (4D annotation / world models / VLA / model deployment, and more)
自动驾驶之心· 2025-09-25 07:36
Group 1
- The article announces the recruitment of 10 partners for the autonomous driving sector, focusing on course development, paper guidance, and hardware research [2][5].
- Recruitment targets individuals with expertise in advanced areas such as large models, multimodal models, and 3D object detection [3][4].
- Benefits of joining include resource sharing for job seeking, PhD recommendations, and substantial cash incentives [5][6].
48 executive changes in the auto industry in a single month; a new round of transformation is beginning...
自动驾驶之心· 2025-09-25 03:45
Group 1
- The automotive industry is undergoing a new round of transformation, with significant executive changes at companies including Li Auto, BYD, and Changan Automobile [1].
- The autonomous driving sector is evolving rapidly, shifting from traditional methods to new algorithms and models and demanding continuous learning and adaptation [2][3].
- The community is actively discussing the future of autonomous driving, exploring new article formats and hosting online events with industry leaders [3][6].

Group 2
- The community has built platforms for autonomous driving, embodied intelligence, and large models, aiming to find new opportunities amid constant change [3][4].
- A comprehensive resource within the community offers over 40 technical routes and answers practical questions about autonomous driving [5][8].
- The community provides a collaborative environment for both beginners and advanced practitioners, facilitating knowledge sharing and networking [10][14].

Group 3
- Learning resources include video tutorials and structured learning paths for newcomers to autonomous driving [11][13].
- Regular discussions and Q&A sessions address industry questions, such as entry points into end-to-end autonomous driving and the applicability of multi-sensor fusion [17][19].
- The community aims to grow its membership significantly over the next two years, strengthening its role as a hub for technical exchange and career opportunities in autonomous driving [3][19].