FSD V14 Is Very Likely VLA! An Analysis of Ashok's ICCV'25 Technical Talk...
自动驾驶之心· 2025-10-24 00:04
Core Insights
- Tesla's FSD V14 series has evolved rapidly, with four updates in two weeks, signaling a new phase of accelerated development in autonomous driving technology [4][5]
- The transition to an end-to-end architecture starting with version 12 has sparked industry interest in similar technologies, underscoring the importance of a unified neural network model for driving control [7][9]

Technical Advancements
- The end-to-end system removes intermediate processing steps, allowing gradients to backpropagate seamlessly from the output all the way to perception and improving whole-model optimization [7]
- Ashok highlighted the difficulty of encoding human value judgments in driving scenarios, showcasing the system's ability to learn nuanced decisions from human driving data [9]
- Traditional modular systems struggle to define clean interfaces between perception and decision-making, whereas end-to-end models minimize information loss and improve decision-making in rare scenarios [11][13]

Data Utilization
- Tesla's data engine collects vast amounts of driving data, generating the equivalent of 500 years of driving experience daily, which is crucial for training the FSD model [18][19]
- The company employs complex mechanisms to mine data from rare scenarios, ensuring the model generalizes effectively [19]

Model Structure and Challenges
- The ideal end-to-end model maps high-dimensional inputs (e.g., seven streams of 5-megapixel camera video) to low-dimensional output signals, which poses significant training challenges [16]
- The architecture is designed for interpretability and safety, avoiding the pitfalls of a "black box" [20][22]

Evaluation Framework
- A robust evaluation framework is essential for end-to-end systems, focusing on closed-loop performance and the ability to assess diverse driving behaviors [32][34]
- Tesla's closed-loop simulation system plays a critical role in validating the correctness of the end-to-end policy and in generating adversarial samples for model testing [36][38]

Future Implications
- The transfer of Tesla's simulation capabilities to robotics suggests potential advances in embodied AI, extending these techniques across domains [40][42]
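The "seamless gradient backpropagation from output to perception" idea can be sketched in a few lines of PyTorch (a toy stand-in with hypothetical module names, not Tesla's actual architecture): a single imitation loss on the planned trajectory sends gradients back through the planner into the perception backbone, so every stage is optimized jointly.

```python
import torch
import torch.nn as nn

# Toy end-to-end driving stack (hypothetical modules, not Tesla's design):
# camera pixels -> perception features -> planned waypoints, trained by one loss.
class EndToEndDriver(nn.Module):
    def __init__(self):
        super().__init__()
        self.perception = nn.Sequential(   # stand-in for a camera backbone
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.planner = nn.Linear(16, 10)   # 5 future (x, y) waypoints

    def forward(self, images):
        return self.planner(self.perception(images)).view(-1, 5, 2)

model = EndToEndDriver()
images = torch.randn(2, 3, 64, 64)   # a batch of camera frames
expert = torch.randn(2, 5, 2)        # human driving trajectories (imitation target)

loss = nn.functional.mse_loss(model(images), expert)
loss.backward()                      # one loss reaches all the way into perception
print(model.perception[0].weight.grad is not None)  # True
```

In a modular stack, by contrast, perception would be trained against its own labels and the gradient chain would stop at the hand-designed interface between modules.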
After Hosting Several Online Exchange Sessions, I Found That Many People Are Still Quite Lost
自动驾驶之心· 2025-10-24 00:04
Core Viewpoint
- The article emphasizes the establishment of a comprehensive community called "Autonomous Driving Heart Knowledge Planet," aimed at providing a platform for knowledge sharing and networking in the autonomous driving industry and addressing the challenges faced by newcomers to the field [1][3][14]

Group 1: Community Development
- The community has grown to over 4,000 members and aims to reach nearly 10,000 within two years, providing a space for technical sharing and communication among beginners and advanced learners [3][14]
- The community integrates resources including videos, articles, learning paths, Q&A, and job exchange, making it a comprehensive hub for autonomous driving enthusiasts [3][5]

Group 2: Learning Resources
- The community has organized over 40 technical learning paths, covering topics such as end-to-end autonomous driving, multi-modal large models, and data annotation practice, significantly reducing the time needed for literature research [5][14]
- Members can access a variety of video tutorials and courses tailored for beginners, covering essential topics in autonomous driving technology [9][15]

Group 3: Industry Insights
- The community regularly invites industry experts to discuss trends, technological advancements, and production challenges in autonomous driving, fostering a serious, content-driven environment [6][14]
- Members are encouraged to engage with industry leaders for insights on job opportunities and career development within the autonomous driving sector [10][18]

Group 4: Networking Opportunities
- The community connects members with various autonomous driving companies, offering resume-forwarding services to help members secure job placements [10][12]
- Members can freely ask questions about career choices and research directions, receiving guidance from experienced professionals in the field [87][89]
JD.com Enters the New Energy Vehicle Race; the Name Is Officially Announced...
自动驾驶之心· 2025-10-23 08:14
Core Viewpoint
- GAC Group, in collaboration with JD.com and CATL, has officially named its new vehicle the "Aion UT Super," which is positioned as a "national good car" [1]

Group 1
- The Aion UT Super is the first model to feature "GAC Huawei Cloud Car Machine" technology [2]
- It is equipped with a large battery offering a class-first 500 km range and supports battery swapping in just 99 seconds, using CATL's chocolate battery-swapping technology [2]

Group 2
- The article mentions that "Autonomous Driving Heart" has established nearly a hundred technical communication groups covering a range of advanced topics in autonomous driving technology [6]
- The community has around 4,000 members, includes over 300 autonomous driving companies and research institutions, and focuses on more than 30 learning paths in autonomous driving technology [6]
Reconstruct Point Clouds Online in Real Time with a Handheld LiDAR! An Ultra-Cost-Effective 3D Scanner Arrives
自动驾驶之心· 2025-10-23 00:04
Core Viewpoint
- The article introduces the GeoScan S1, a highly cost-effective 3D laser scanner designed for industrial and research applications, emphasizing its lightweight design, ease of use, and advanced features for real-time 3D scene reconstruction

Group 1: Product Features
- The GeoScan S1 offers centimeter-level precision in 3D scene reconstruction using a multi-modal sensor-fusion algorithm, generating point clouds at 200,000 points per second with a range of up to 70 meters [1][29]
- It supports scanning areas exceeding 200,000 square meters and can be fitted with a 3D Gaussian data-collection module for high-fidelity scene reconstruction [1][30]
- The device is designed for one-button operation, allowing users to export scan results without complex setup [5][27]

Group 2: Technical Specifications
- The GeoScan S1 integrates RTK, IMU, and dual wide-angle cameras in a compact body measuring 14.2 cm x 9.5 cm x 45 cm and weighing 1.3 kg without the battery [22][12]
- It accepts a 13.8V-24V power input, draws 25W, and carries an 88.8Wh battery good for roughly 3 to 4 hours of operation [22][26]
- The system runs Ubuntu 20.04 and supports multiple data-export formats, including PCD and LAS [22][42]

Group 3: Market Positioning
- The GeoScan S1 is positioned as the most cost-effective handheld 3D laser scanner on the market, with the basic version starting at 19,800 yuan [9][57]
- The product is backed by extensive research and validation from teams at Tongji University and Northwestern Polytechnical University, and has been tested in over a hundred projects [9][38]
- The scanner suits a wide range of applications, including urban planning, infrastructure monitoring, and complex-scene mapping in environments such as industrial parks and tunnels [46][52]
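The quoted runtime is easy to sanity-check from the listed specs; a quick back-of-envelope calculation (actual endurance will also depend on load and temperature) confirms that an 88.8Wh battery at a 25W draw lands inside the claimed 3-4 hour window:

```python
# Back-of-envelope check of the GeoScan S1 runtime from its listed specs.
battery_wh = 88.8        # battery capacity, watt-hours
draw_w = 25.0            # stated power consumption, watts
points_per_s = 200_000   # stated point-cloud generation rate

runtime_h = battery_wh / draw_w
points_per_charge = points_per_s * runtime_h * 3600
print(f"runtime ≈ {runtime_h:.2f} h")   # ≈ 3.55 h, within the 3-4 h claim
print(f"points per charge ≈ {points_per_charge:.2e}")
```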
SJTU's OccScene: A New Framework for 3D Occupancy Generation (TPAMI)
自动驾驶之心· 2025-10-23 00:04
Core Insights
- The article discusses the integration of generative models with autonomous driving systems, emphasizing the need for high-quality, large-scale annotated data for training perception models, which is often costly and time-consuming to obtain [2]
- OccScene is introduced as a solution that combines 3D scene generation with semantic occupancy perception through a novel joint diffusion framework, achieving a synergistic effect in which the two tasks enhance each other [3]

Innovation and Contributions
- A unified perception-generation framework is proposed, in which the perception model provides detailed geometric and semantic priors to the generator, creating a beneficial feedback loop [5]
- The Mamba-based dual-alignment module (MDA) efficiently aligns camera trajectories, semantic occupancy, and diffusion features, ensuring cross-view consistency and geometric accuracy in generated content [5]
- OccScene demonstrates state-of-the-art (SOTA) performance, generating high-quality images/videos and the corresponding 3D semantic occupancy from text prompts alone, and significantly enhances existing SOTA perception models [5]
- The mutual-learning mechanism drives the model toward broader, more stable loss minima, avoiding the local-minima stagnation seen in independent learning [5]

Comparison with Traditional Methods
- OccScene employs a joint learning framework that promotes bidirectional enhancement, unlike traditional methods that treat generation and perception separately [7]
- It requires only text prompts for flexible scene generation, whereas traditional methods rely on real annotated data [7]
- OccScene provides fine-grained semantic-occupancy guidance for more precise geometry, moving beyond the coarse geometric control of traditional approaches [7]
- The generation process is driven by perception tasks, ensuring the practical utility of generated data [7]

Technical Framework
- The core of OccScene is the joint perception-generation diffusion framework, which integrates semantic occupancy prediction and text-driven generation into a single diffusion process [8]
- The training strategy has two phases: first, tuning the generator to understand occupancy constraints; second, mutual learning to achieve bidirectional enhancement [9][10]
- A dynamically weighted loss function balances the two tasks during joint optimization, keeping training stable [11][13]

Experimental Results
- OccScene achieves SOTA performance in 3D scene generation across various tasks, with significantly lower FID scores than traditional methods, indicating better quality [20][21]
- The generated scenes exhibit more reasonable geometry and clearer details, maintaining high logical consistency across cross-view videos [20][23]
- Used as a data-augmentation strategy, OccScene significantly improves existing SOTA perception models, demonstrating the quality and information richness of the synthetic data [24][25]

Applications and Value
- OccScene is positioned as a critical tool for autonomous driving simulation, generating high-fidelity, diverse driving scenarios, particularly corner cases, and enhancing system robustness at low cost [32]
- It provides controllable, editable virtual environments for navigation and interaction in robotics and AR/VR applications [32]
- As a plug-and-play data generator, OccScene addresses data scarcity for various downstream 3D vision tasks [32]
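The dynamically weighted loss can be illustrated with a minimal sketch (an illustrative cross-normalization scheme, not the paper's exact formulation): each task's loss is scaled by the other's share of the combined magnitude, so whichever task currently dominates is down-weighted and joint training stays balanced.

```python
def dynamic_weighted_loss(gen_loss, percep_loss, eps=1e-8):
    """Cross-normalized weighting (illustrative, not OccScene's exact rule):
    each task is weighted by the other's share of the combined magnitude,
    so the currently larger loss is automatically down-weighted."""
    total = gen_loss + percep_loss + eps
    w_gen = percep_loss / total      # generation weight shrinks as gen_loss grows
    w_percep = gen_loss / total      # and vice versa for perception
    return w_gen * gen_loss + w_percep * percep_loss

# Generation currently 4x worse than perception: it receives the smaller weight.
balanced = dynamic_weighted_loss(4.0, 1.0)
print(balanced)  # ≈ 1.6  (0.2 * 4.0 + 0.8 * 1.0)
```

In real joint training these weights would multiply gradient-carrying tensor losses; plain floats are used here only to keep the arithmetic visible.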
Some Thoughts on Trends in Putting On-Device Large Models into Silicon...
自动驾驶之心· 2025-10-23 00:04
Core Insights
- The article discusses the evolution of algorithms in the chip-design industry, focusing on advances in attention mechanisms and their implications for future chip designs [2][4]

Group 1: Attention Mechanism Evolution
- The Transformer architecture dominates the large-model field, but its self-attention mechanism poses significant computational challenges, especially the power and bandwidth demands of the prefill and decode phases [4]
- Various improvements to the Transformer structure have been proposed, such as Performer, Reformer, and Informer, but none has achieved widespread adoption for lack of strong demand [4]
- Linear attention mechanisms aim to reduce computational complexity to linear levels, with models like RWKV and Mamba following this approach [5]

Group 2: Dynamic Sparsity and MoE Technology
- Dynamic sparsity, particularly through Mixture-of-Experts (MoE) technology, has gained traction: only a subset of experts is activated during inference, which can yield better performance at lower computational cost [8]
- The trend toward increased sparsity in MoE models, as in Ant Group's recent releases, marks a significant industry shift and demands larger memory capacity and bandwidth [9]

Group 3: Low-Bit Quantization
- Low-bit quantization techniques such as FP8 training open new avenues for model efficiency, with weight-only quantization used to relieve bandwidth bottlenecks [11]
- The article highlights the importance of fine-grained quantization and the potential of mixed quantization strategies to optimize model performance, especially in MoE models [12]

Group 4: Token Compression
- Token compression has emerged as a critical lever for reducing the computational burden of large models, particularly in visual-token processing, where redundancy is high [14]
- Research on token compression has surged and could significantly influence chip design by lowering the barriers to deploying large models [14]

Group 5: Future Implications for Chip Design
- Advances in attention mechanisms, dynamic sparsity, low-bit quantization, and token compression are expected to substantially shape the design of future edge chips, which have lagged behind the development of large models [14]
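The "only a subset of experts activated" point can be made concrete with a toy top-k MoE layer in PyTorch (an illustrative sketch, not any specific production design): the router scores all experts for each token, but only the k highest-scoring ones are evaluated, so per-token compute stays roughly constant as total parameters grow.

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    """Toy top-k Mixture-of-Experts layer (illustrative sketch, not any
    specific production design). The router scores every expert, but only
    the k best per token are actually evaluated."""
    def __init__(self, dim=32, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.k = k

    def forward(self, x):                        # x: (tokens, dim)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)        # renormalize over chosen experts
        out = torch.zeros_like(x)
        for t in range(x.size(0)):               # run only the k chosen experts
            for slot in range(self.k):
                e = int(idx[t, slot])
                out[t] += weights[t, slot] * self.experts[e](x[t])
        return out

moe = SparseMoE()
tokens = torch.randn(4, 32)
out = moe(tokens)
print(out.shape)  # torch.Size([4, 32])
```

The memory and bandwidth point in the summary follows directly: all n_experts weight matrices must stay resident even though only k are read per token.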
End-to-End and VLA Are Drawing Attention from More Intelligent-Driving Companies...
自动驾驶之心· 2025-10-23 00:04
Core Insights
- There is significant demand for end-to-end and VLA (Vision-Language-Action) technical talent in the automotive industry, particularly among major manufacturers and suppliers [1][3]
- The industry is evolving from modular production algorithms to end-to-end solutions and now to VLA, with core algorithms spanning BEV perception, VLM, diffusion models, reinforcement learning, and world models [3]

Group 1: Industry Demand and Trends
- Demand for end-to-end and VLA talent is high, with inquiries from multiple companies, including three major manufacturers and several suppliers [1]
- The industry primarily operates under two paradigms, single-stage and two-stage, with UniAD a representative single-stage model [1]
- The end-to-end approach has diversified into various subfields, especially VLA-based ones, with a recent surge in related academic publications and industrial applications [1]

Group 2: Educational Initiatives
- The company has launched courses on end-to-end and VLA autonomous driving, aimed at helping individuals enter these fields quickly and efficiently [3][12]
- The "VLA and Large Model Practical Course" covers VLA from VLM as an autonomous-driving interpreter through modular and integrated VLA, including detailed theoretical foundations and practical assignments [3][12]
- The "End-to-End and VLA Autonomous Driving Course" focuses on key algorithms and theoretical foundations, including BEV perception, large language models, diffusion models, and reinforcement learning [12][14]

Group 3: Instructor Expertise
- The courses are led by experts from academia and industry with backgrounds in multimodal perception, autonomous-driving VLA, and large-model frameworks [8][11][14]
- Instructors have published numerous papers at top-tier conferences and bring extensive research and practical experience in autonomous driving and large models [8][11][14]

Group 4: Target Audience
- The courses are designed for individuals with foundational knowledge of autonomous driving who are familiar with its basic modules and with concepts such as transformer models, reinforcement learning, and BEV perception [15][16]
- Participants are expected to have a background in probability theory and linear algebra, plus programming skills in Python and PyTorch [15][16]
A Large Tier 1's Mid-Tier Project Is Struggling to Reach Mass Production...
自动驾驶之心· 2025-10-23 00:04
Core Viewpoint
- The article discusses the challenges and dynamics of the autonomous driving industry, particularly the relationships between automakers and Tier 1 suppliers, highlighting the shift in power dynamics and the need for collaborative development [5][11][14]

Group 1: Challenges in Production
- A major Tier 1 supplier faced difficulties in mass production for a leading automaker, forcing alternative suppliers to step in [5]
- Many Tier 1 suppliers with strong business-development capabilities but weak engineering skills are struggling with mass production, resulting in project handovers to more capable suppliers [7][8]
- Some automakers are shifting projects from underperforming suppliers to those with solid production capabilities, with companies like 易航 benefiting from this transition [7][10]

Group 2: Development Models
- The article asks whether autonomous driving should be standardized or customized, emphasizing the importance of solutions tailored to different vehicle types and user preferences [6][9]
- A collaborative development model proposed by 易航 lets automakers build their own algorithm capabilities while ensuring the developed solutions meet specific user needs [8][10]
- Joint development between automakers and Tier 1 suppliers is highlighted as essential for creating effective autonomous driving solutions that resonate with end users [9][10]

Group 3: Supplier Dynamics
- Power dynamics in the industry are shifting: Tier 1 suppliers are gaining influence over automakers, to the point where automakers are no longer treated as "valued clients" [12][13]
- Automakers increasingly seek reliable Tier 1 suppliers that provide customized solutions and respond to specific needs, rather than the dominant Tier 1s that may be less flexible [13][14]
- The article identifies only a limited number of Tier 1 suppliers capable of serving as "cornerstone suppliers" for automakers, with 易航 highlighted for its ability to collaborate effectively and support in-house R&D initiatives [14]
From Horizon's 2025 Autonomous Driving Work, We Can See HSD's Ambition...
自动驾驶之心· 2025-10-22 00:03
Core Insights
- Horizon is advancing in the autonomous driving sector by focusing on large-scale production of the new HSD system and by reshaping the foundational logic of autonomous driving through cutting-edge research papers [2][3]
- The company is transitioning from a technology supplier to a standard-defining player in the industry, supported by capital inflows following its Hong Kong listing [2]

Group 1: End-to-End Autonomous Driving
- ResAD introduces a normalized residual-trajectory modeling framework that simplifies the learning task and improves model performance, achieving a PDMS score of 88.6 on the NAVSIM benchmark [8]
- CorDriver improves safety in end-to-end autonomous driving by explicitly defining safe passage areas, reducing collisions with traffic participants by 66.7% [11]
- TTOG unifies motion prediction and path planning, achieving a 36.06% reduction in average L2 error on the nuScenes dataset [15]
- MomAD tackles trajectory-prediction consistency and stability by introducing momentum mechanisms, showing significant improvements in collision rate and trajectory smoothness [19]
- GoalFlow generates high-quality multimodal trajectories through precise target-point guidance, achieving a PDMS score of 90.3 on the NavSim benchmark [22]
- RAD employs a large-scale 3DGS-based reinforcement learning framework to enhance safety, cutting collision rates to roughly one third of those of pure imitation-learning methods [26]
- DiffusionDrive uses a truncated diffusion model for real-time end-to-end autonomous driving, achieving an 88.1 PDMS score and significantly improving planning quality [30]

Group 2: Autonomous Driving Scene Generation & World Models
- Epona is an autoregressive diffusion world model that achieves high-resolution, long-horizon future scene generation and trajectory planning, outperforming existing methods on the nuScenes dataset [33]
- UMGen generates diverse multimodal driving scenes, supports user-controlled scenario generation, and demonstrates superior realism and controllability compared with existing methods [38]
- DrivingWorld builds a world model for autonomous driving on a video-GPT framework, generating high-fidelity videos with strong temporal consistency and structural integrity [41]

Group 3: Autonomous Driving VLM & VLA
- AlphaDrive integrates reinforcement learning and reasoning into vision-language models for high-level planning in autonomous driving, improving planning accuracy by 25.52% over standard fine-tuned models [45]
- The company has built a community of nearly 4,000 members and over 300 autonomous driving companies and research institutions, covering a range of autonomous driving technology stacks [49]
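The one-line ResAD summary suggests a residual parameterization; a generic sketch of the idea (a reading of "normalized residual trajectory modeling," not the paper's implementation) predicts a normalized deviation from a simple constant-velocity prior instead of absolute waypoints, which shrinks and standardizes the regression target.

```python
# Residual trajectory parameterization, sketched generically (not ResAD's code).
def inertial_prior(x, y, vx, vy, horizon=5, dt=0.5):
    """Constant-velocity rollout from the current state: the 'free' part
    of the trajectory that needs no learning."""
    return [(x + vx * dt * k, y + vy * dt * k) for k in range(1, horizon + 1)]

def to_residual(expert, prior, scale):
    """Training target: normalized deviation of the expert from the prior."""
    return [((ex - px) / scale, (ey - py) / scale)
            for (ex, ey), (px, py) in zip(expert, prior)]

def from_residual(residual, prior, scale):
    """Inference: denormalize the predicted residual back into waypoints."""
    return [(px + rx * scale, py + ry * scale)
            for (rx, ry), (px, py) in zip(residual, prior)]

prior = inertial_prior(0.0, 0.0, 10.0, 0.0)       # driving along +x at 10 m/s
expert = [(px, py + 0.5) for px, py in prior]     # expert drifts 0.5 m laterally
res = to_residual(expert, prior, scale=2.0)
recovered = from_residual(res, prior, scale=2.0)
print(recovered == expert)  # True
```

Learning the small, roughly unit-scale residuals is an easier regression target than raw waypoint coordinates, whose magnitude grows with speed and horizon.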
A Big Name Opens Fire: Agents Are All for Show, Reinforcement Learning Is Terrible, and AGI Won't Arrive Even in Ten Years
自动驾驶之心· 2025-10-22 00:03
Core Insights
- The article discusses the current state and future of AI, focusing on the limitations of reinforcement learning and the timeline for achieving Artificial General Intelligence (AGI) [5][6][10]

Group 1: AGI and AI Development
- AGI is expected to take about ten years to develop, contrary to the belief that this would be the year of agents [12][13]
- Current AI agents, such as Claude and Codex, are impressive but still lack essential capabilities, including multimodal abilities and continual learning [13][14]
- The industry has been overly optimistic about the pace of AI development, leading to inflated expectations [12][15]

Group 2: Limitations of Reinforcement Learning
- Reinforcement learning is criticized as inadequate for replicating human learning, since it often relies on trial and error without a deep understanding of the problem [50][51]
- Reinforcement learning injects noise into the learning process because it weights every action by the final outcome rather than by the quality of the individual steps [51][52]
- Human learning involves a more complex reflection on successes and failures that current AI models do not replicate [52][53]

Group 3: Future of AI and Learning Mechanisms
- The future of AI may involve more sophisticated attention mechanisms and learning algorithms that better mimic human cognitive processes [33][32]
- AI models need mechanisms for long-term memory and knowledge retention, which they currently lack [31][32]
- The integration of AI into programming and development is seen as a continuous evolution rather than a sudden leap to superintelligence [45][47]
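The "weights every action based on the final outcome" critique describes return-based credit assignment as in classic REINFORCE; a toy sketch (illustrative only) makes the noise problem concrete: every step in a trajectory is credited with the same scalar return, so good moves in an unlucky episode are penalized and bad moves in a lucky one are reinforced.

```python
import random

def run_episode(steps=5):
    """Toy trajectory: each action is independently good (+1) or bad (-1);
    the only training signal is the summed final return."""
    actions = [random.choice([1, -1]) for _ in range(steps)]
    return actions, sum(actions)

random.seed(0)
actions, final_return = run_episode()

# Return-based credit assignment: every action receives the SAME weight --
# the episode's final return -- regardless of its individual quality.
credits = [final_return] * len(actions)

# A bad action (-1) inside a lucky episode (positive return) is still
# reinforced; only averaging over many episodes washes this noise out.
print(actions, credits)
```

Per-step advantage estimates or process-level feedback on intermediate steps are the standard attempts to replace this uniform credit with a signal closer to the per-step reflection the article describes.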