自动驾驶之心
Shanghai Jiao Tong University's OccScene: A New Framework for 3D Occupancy Generation (TPAMI)
自动驾驶之心· 2025-10-23 00:04
Core Insights
- The article discusses integrating generative models into autonomous driving systems. Training perception models requires large-scale, high-quality annotated data, which is costly and time-consuming to collect [2].
- OccScene addresses this by combining 3D scene generation with semantic occupancy perception in a novel joint diffusion framework, in which the two tasks reinforce each other [3].
Innovation and Contributions
- A unified perception-generation framework in which the perception model supplies detailed geometric and semantic priors to the generator, creating a mutually beneficial feedback loop [5].
- A Mamba-based dual alignment module (MDA) efficiently aligns camera trajectories, semantic occupancy, and diffusion features, ensuring cross-view consistency and geometric accuracy in the generated content [5].
- State-of-the-art (SOTA) performance: from text prompts alone, OccScene generates high-quality images/videos together with the corresponding 3D semantic occupancy, and its synthetic data significantly improves existing SOTA perception models [5].
- Mutual learning drives the model toward broader, more stable loss minima, avoiding the local-minimum stagnation seen when the two tasks are learned independently [5].
Comparison with Traditional Methods
- Joint learning with bidirectional enhancement, whereas traditional methods treat generation and perception separately [7].
- Flexible scene generation from text prompts only, whereas traditional methods rely on real annotated data [7].
- Fine-grained semantic occupancy guidance for more precise geometry, instead of the coarse geometric control of traditional approaches [7].
- Generation driven by the perception task, which keeps the generated data practically useful [7].
Technical Framework
- The core of OccScene is a joint perception-generation diffusion framework that integrates semantic occupancy prediction and text-driven generation into a single diffusion process [8].
- Training proceeds in two phases: first, the generator is tuned to understand occupancy constraints; second, mutual learning achieves bidirectional enhancement [9][10].
- A dynamic weighted loss balances the two tasks during joint optimization and keeps training stable [11][13].
Experimental Results
- OccScene achieves SOTA 3D scene generation across various tasks, with significantly lower FID scores than traditional methods (lower FID indicates better image quality) [20][21].
- The generated scenes have more plausible geometry and clearer details, and cross-view videos remain logically consistent [20][23].
- Used as a data augmentation strategy, OccScene significantly improves existing SOTA perception models, demonstrating the quality and information richness of its synthetic data [24][25].
Applications and Value
- Autonomous driving simulation: generating high-fidelity, diverse driving scenarios, particularly corner cases, to improve system robustness at low cost [32].
- Robotics and AR/VR: controllable, editable virtual environments for navigation and interaction [32].
- A plug-and-play data generator that addresses data scarcity in various downstream 3D vision tasks [32].
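The summary mentions a dynamic weighted loss that balances generation and perception, but does not give the rule. The Python sketch below shows one plausible, hypothetical form: the perception (occupancy) loss is weighted by how clean the diffusion sample is at timestep t, so perception contributes more when the latent carries more signal. The names `perception_weight` and `joint_loss`, the cosine schedule, and `lam_max` are our own assumptions, not OccScene's published formulation.

```python
import math


def perception_weight(t: int, num_steps: int, lam_max: float = 1.0) -> float:
    """Hypothetical timestep-dependent weight for the perception loss.

    t = 0 is (almost) clean data and t = num_steps - 1 is (almost) pure noise.
    A cosine ramp gives lam_max at low noise and 0 at the noisiest step.
    """
    if not 0 <= t < num_steps:
        raise ValueError("t must be in [0, num_steps)")
    progress = t / (num_steps - 1) if num_steps > 1 else 0.0
    return lam_max * 0.5 * (1.0 + math.cos(math.pi * progress))


def joint_loss(gen_loss: float, perc_loss: float, t: int, num_steps: int) -> float:
    """Denoising (generation) loss plus a timestep-weighted perception loss."""
    return gen_loss + perception_weight(t, num_steps) * perc_loss


# At the cleanest step the perception loss counts fully; at the noisiest it is ignored.
print(joint_loss(0.5, 2.0, t=0, num_steps=1000))    # 2.5
print(joint_loss(0.5, 2.0, t=999, num_steps=1000))  # 0.5
```

Tying the weight to the noise level is only one reasonable choice; uncertainty-based or gradient-norm-based task balancing would be equally plausible readings of "dynamic weighting".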
Thoughts on Several Trends in Chip Design for On-Device Large Models...
自动驾驶之心· 2025-10-23 00:04
Core Insights
- The article discusses how algorithms are evolving and what that means for the chip design industry, focusing on advances in attention mechanisms and their implications for future chips [2][4].
Group 1: Attention Mechanism Evolution
- The Transformer architecture dominates the large-model field, but its self-attention mechanism, whose cost grows quadratically with sequence length, poses significant compute and power challenges in both the prefill and decode phases [4].
- Transformer variants such as Performer, Reformer, and Informer have been proposed, but none achieved widespread adoption because there was no strong demand for them [4].
- Linear attention mechanisms aim to reduce computational complexity to linear in sequence length; models such as RWKV and Mamba follow this direction [5].
Group 2: Dynamic Sparsity and MoE Technology
- Dynamic sparsity, particularly Mixture of Experts (MoE), has gained traction: only a subset of experts is activated during inference, which can improve performance while reducing compute cost [8].
- MoE models are becoming sparser, as in Ant Group's recent models, a significant industry shift that raises memory capacity and bandwidth requirements [9].
Group 3: Low-Bit Quantization
- Low-bit techniques such as FP8 training have opened new avenues for model efficiency, with weight-only quantization used to relieve bandwidth bottlenecks [11].
- Fine-grained quantization and mixed quantization strategies can further optimize model performance, especially for MoE models [12].
Group 4: Token Compression
- Token compression has become a key way to reduce the compute burden of large models, particularly for visual tokens, which are highly redundant [14].
- Research on token compression has surged, and it could significantly affect chip design by lowering the barrier to deploying large models [14].
Group 5: Future Implications for Chip Design
- Advances in attention mechanisms, dynamic sparsity, low-bit quantization, and token compression are expected to substantially shape future edge chips, whose design has lagged behind the development of large models [14].
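To make the quadratic-versus-linear point concrete, the Python sketch below implements causal linear attention in its kernelized form with the feature map φ(x) = elu(x) + 1, one well-known variant. It computes the same outputs twice: once with a fixed-size running state updated per token (linear in sequence length, constant memory during decode) and once by summing over all earlier tokens for every position (quadratic). RWKV and Mamba use different gated or state-space recurrences, so this illustrates the shared fixed-state idea rather than their actual computation.

```python
import math
from typing import List

Vector = List[float]


def phi(x: Vector) -> Vector:
    """Positive feature map elu(x) + 1 used by kernelized linear attention."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def linear_attention_recurrent(qs: List[Vector], ks: List[Vector], vs: List[Vector]) -> List[Vector]:
    """Causal linear attention in O(n) time with a fixed-size state.

    S = sum_j phi(k_j) v_j^T (d_k x d_v) and z = sum_j phi(k_j) (d_k) are updated
    once per token, so the per-token cost does not grow with context length.
    """
    d_k, d_v = len(ks[0]), len(vs[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    z = [0.0] * d_k
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        fk, fq = phi(k), phi(q)
        for i in range(d_k):
            z[i] += fk[i]
            for j in range(d_v):
                S[i][j] += fk[i] * v[j]
        denom = sum(fq[i] * z[i] for i in range(d_k))
        outputs.append([sum(fq[i] * S[i][j] for i in range(d_k)) / denom for j in range(d_v)])
    return outputs


def linear_attention_quadratic(qs: List[Vector], ks: List[Vector], vs: List[Vector]) -> List[Vector]:
    """Same result computed pair by pair: O(n^2) query-key interactions."""
    outputs = []
    for t, q in enumerate(qs):
        fq = phi(q)
        weights = [sum(a * b for a, b in zip(fq, phi(ks[s]))) for s in range(t + 1)]
        total = sum(weights)
        outputs.append([sum(w * vs[s][j] for s, w in enumerate(weights)) / total
                        for j in range(len(vs[0]))])
    return outputs
```

Both functions return identical outputs up to floating-point rounding; the recurrent version is the one whose memory footprint stays constant as the context grows, which is what makes this family attractive for edge chips.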
End-to-End and VLA Are Drawing Attention from More Autonomous Driving Companies...
自动驾驶之心· 2025-10-23 00:04
Core Insights
- Demand for end-to-end and VLA (Vision-Language-Action) talent in the automotive industry is significant, particularly among major automakers and suppliers [1][3].
- The industry has evolved from modular production algorithms to end-to-end solutions and now to VLA; the core algorithms involve BEV perception, VLMs, diffusion models, reinforcement learning, and world models [3].
Group 1: Industry Demand and Trends
- Demand for end-to-end and VLA talent is high, with inquiries from multiple companies, including three major automakers and several suppliers [1].
- End-to-end approaches follow two main paradigms, single-stage and two-stage, with UniAD a representative single-stage model [1].
- End-to-end has branched into many subfields, especially VLA-based ones, and related academic publications and industrial applications have surged in recent years [1].
Group 2: Educational Initiatives
- The company has launched courses on end-to-end and VLA autonomous driving to help learners enter these fields quickly and efficiently [3][12].
- The "VLA and Large Model Practical Course" covers VLA from VLMs as autonomous driving interpreters to modular and integrated VLA, including detailed theoretical foundations and practical assignments [3][12].
- The "End-to-End and VLA Autonomous Driving Course" focuses on key algorithms and their theoretical foundations, including BEV perception, large language models, diffusion models, and reinforcement learning [12][14].
Group 3: Instructor Expertise
- Instructors come from both academia and industry, with backgrounds in multimodal perception, autonomous driving VLA, and large model frameworks [8][11][14].
- They have published numerous papers at top-tier conferences and have extensive research and applied experience in autonomous driving and large models [8][11][14].
Group 4: Target Audience
- The courses target learners with foundational autonomous driving knowledge who are familiar with the basic modules and with concepts such as Transformer models, reinforcement learning, and BEV perception [15][16].
- Participants should have a background in probability theory and linear algebra and be able to program in Python and PyTorch [15][16].
A Large Tier 1 Supplier's Mid-Range Project Runs into Mass-Production Trouble...
自动驾驶之心· 2025-10-23 00:04
Core Viewpoint
- The article discusses challenges and dynamics in the autonomous driving industry, focusing on the relationship between automakers and Tier 1 suppliers, the shifting balance of power, and the need for collaborative development [5][11][14].
Group 1: Challenges in Production
- A major Tier 1 supplier struggled to reach mass production for a leading automaker, forcing the automaker to bring in alternative suppliers [5].
- Many Tier 1 suppliers with strong business capabilities but weak engineering skills are struggling with mass production, and their projects are being handed over to more capable suppliers [7][8].
- Some automakers are moving projects from underperforming suppliers to those with solid production capabilities, and companies such as 易航 (Yihang) are benefiting from this shift [7][10].
Group 2: Development Models
- The article asks whether autonomous driving should be standardized or customized, arguing that solutions must be tailored to different vehicle types and user preferences [6][9].
- 易航's collaborative development model lets automakers build up their own algorithm capabilities while ensuring the resulting solutions meet specific user needs [8][10].
- Joint development between automakers and Tier 1 suppliers is presented as essential for building autonomous driving solutions that resonate with end users [9][10].
Group 3: Supplier Dynamics
- The balance of power is shifting toward Tier 1 suppliers, to the point that automakers are no longer treated as "valued clients" [12][13].
- Automakers are therefore seeking reliable Tier 1 suppliers that provide customized solutions and respond to specific needs, rather than dominant suppliers that may be less flexible [13][14].
- The article identifies only a few Tier 1 suppliers capable of serving as "cornerstone suppliers" for automakers, highlighting 易航 for its effective collaboration and its support for automakers' in-house development [14].
From Horizon Robotics' 2025 Autonomous Driving Work, We Can See the Ambition Behind HSD...
自动驾驶之心· 2025-10-22 00:03
Core Insights
- Horizon is advancing in autonomous driving by mass-producing its new HSD system and by reshaping the foundations of autonomous driving through cutting-edge research papers [2][3].
- Backed by capital raised after its Hong Kong listing, the company is moving from technology supplier to industry standard-setter [2].
Group 1: End-to-End Autonomous Driving
- ResAD: a normalized residual trajectory modeling framework that simplifies the learning task and improves performance, reaching a PDMS of 88.6 on the NAVSIM benchmark [8].
- CorDriver: improves safety by explicitly defining safe passage areas, reducing collisions with traffic participants by 66.7% [11].
- TTOG: unifies motion prediction and path planning, reducing average L2 error by 36.06% on the nuScenes dataset [15].
- MomAD: introduces momentum mechanisms to address the consistency and stability of trajectory prediction, significantly reducing collision rates and improving trajectory smoothness [19].
- GoalFlow: generates high-quality multimodal trajectories guided by precise goal points, reaching a PDMS of 90.3 on NAVSIM [22].
- RAD: a large-scale reinforcement learning framework built on 3DGS (3D Gaussian Splatting) that improves safety, with a collision rate 3x lower than pure imitation learning [26].
- DiffusionDrive: a truncated diffusion model for real-time end-to-end driving, reaching a PDMS of 88.1 and significantly improving planning quality [30].
Group 2: Autonomous Driving Scene Generation and World Models
- Epona: an autoregressive diffusion world model for high-resolution, long-horizon future scene generation and trajectory planning, outperforming existing methods on nuScenes [33].
- UMGen: generates diverse multimodal driving scenes and supports user-controlled scenario generation, with better realism and controllability than existing methods [38].
- DrivingWorld: a world model built on a video GPT framework that generates high-fidelity videos with strong temporal consistency and structural integrity [41].
Group 3: Autonomous Driving VLM and VLA
- AlphaDrive: integrates reinforcement learning and reasoning into vision-language models for high-level planning, improving planning accuracy by 25.52% over standard fine-tuning [45].
- The article closes by promoting a community of nearly 4,000 members and over 300 autonomous driving companies and research institutions, covering a wide range of autonomous driving technology stacks [49].
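Several results above are reported as average L2 error on nuScenes open-loop planning (for example, TTOG's 36.06% reduction). As a rough illustration of what that metric measures, the Python sketch below computes the mean Euclidean distance between predicted and ground-truth waypoints up to a horizon, plus the relative reduction used to report improvements. Papers and benchmarks differ in exactly how they aggregate over the horizon, so this is a simplified version under our own assumptions, not a reference implementation of any specific benchmark; the function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def l2_errors(pred: List[Point], gt: List[Point]) -> List[float]:
    """Euclidean distance between predicted and ground-truth waypoints, per step."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of waypoints")
    return [math.hypot(px - gx, py - gy) for (px, py), (gx, gy) in zip(pred, gt)]


def average_l2(pred: List[Point], gt: List[Point], horizon_steps: Optional[int] = None) -> float:
    """Mean L2 error over the first `horizon_steps` waypoints (all if None)."""
    errs = l2_errors(pred, gt)
    if horizon_steps is not None:
        errs = errs[:horizon_steps]
    return sum(errs) / len(errs)


def relative_reduction(baseline: float, new: float) -> float:
    """Percentage reduction, as in 'a 36.06% reduction in average L2 error'."""
    return 100.0 * (baseline - new) / baseline


pred = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
gt = [(1.0, 0.0), (2.0, 1.0), (3.0, 2.0)]
print(l2_errors(pred, gt))   # [0.0, 1.0, 2.0]
print(average_l2(pred, gt))  # 1.0
```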
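A Heavyweight Speaks Out: Agents Are Just Putting on a Show, Reinforcement Learning Is Terrible, and AGI Won't Arrive Even in Ten Years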
自动驾驶之心· 2025-10-22 00:03
Core Insights
- The article discusses the current state and future of AI, focusing on the limitations of reinforcement learning and the timeline for achieving Artificial General Intelligence (AGI) [5][6][10].
Group 1: AGI and AI Development
- AGI is expected to take about ten years to develop, contrary to the claim that this year would be "the year of agents" [12][13].
- Current AI agents such as Claude and Codex are impressive but still lack essential capabilities, including multimodality and continual learning [13][14].
- The industry has been overly optimistic about the pace of AI progress, which has inflated expectations [12][15].
Group 2: Limitations of Reinforcement Learning
- Reinforcement learning is criticized as a poor model of human learning because it relies on trial and error without a deep understanding of the problem [50][51].
- It injects noise into learning because it weights every action by the final outcome rather than by the quality of the individual steps [51][52].
- Humans reflect on successes and failures in a more nuanced way, which current AI models do not replicate [52][53].
Group 3: Future of AI and Learning Mechanisms
- Future AI may need more sophisticated attention mechanisms and learning algorithms that better mimic human cognitive processes [33][32].
- AI models need mechanisms for long-term memory and knowledge retention, which they currently lack [31][32].
- AI's integration into programming and development is seen as continuous evolution rather than a sudden leap to superintelligence [45][47].
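The critique that reinforcement learning "weights every action based on the final outcome" refers to outcome-only credit assignment in REINFORCE-style policy gradients. The toy Python sketch below (our own illustration, not from the article) computes the per-step weights that multiply each action's log-probability gradient. With a single terminal reward, every step in a trajectory receives the same weight, whether it was a good step or a mistake; with dense per-step rewards, the reward-to-go can single out the bad step.

```python
from typing import List


def outcome_only_advantages(num_steps: int, final_reward: float,
                            baseline: float = 0.0) -> List[float]:
    """REINFORCE with only a terminal reward: every step gets the same (R - b)."""
    return [final_reward - baseline] * num_steps


def per_step_returns(step_rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Reward-to-go G_t = sum_{k >= t} gamma^(k - t) * r_k for dense per-step rewards."""
    returns, running = [], 0.0
    for r in reversed(step_rewards):
        running = r + gamma * running
        returns.append(running)
    return list(reversed(returns))


# A 4-step episode that succeeds (R = 1) even though step 1 was a mistake.
print(outcome_only_advantages(4, 1.0, baseline=0.5))  # [0.5, 0.5, 0.5, 0.5]
print(per_step_returns([0.25, -0.5, 0.25, 1.0]))      # [1.0, 0.75, 1.25, 1.0]
```

Under the outcome-only signal the mistaken step is reinforced exactly as much as the others; the dense rewards here are invented for illustration, and in practice obtaining such step-level feedback (for example from a process reward model or a learned critic) is the hard part.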
The Essence of SFT: It Is Actually Optimizing a Lower Bound of the RL Objective...
自动驾驶之心· 2025-10-22 00:03
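Author | 欲壑难填@Zhihu. Reposted from | "SFT 其实在优化 RL 目标的下界" (SFT Actually Optimizes a Lower Bound of the RL Objective). Original link: https://zhuanlan.zhihu.com/p/1950847739404456574
TL;DR: This article shows that, with sparse rewards, the standard SFT training objective is a (loose) lower bound of the RL objective. To tighten this bound while keeping training stable, the authors introduce a bridge distribution that mediates between the two, arriving at an importance-weighted version of the SFT objective.
Paper link: https://arxiv.org/abs/2507.12856
The SFT objective is a lower bound of the RL objective
In the SFT setting we only have "good" response data. From an RL perspective, this can be read as having a scoring function that separates good responses from bad ones, from which we build a reward function that assigns reward 1 only to samples with a positive score and 0 to all others.
First, we connect SFT and RL by deriving their objective functions. In policy-gradient RL algorithms, the trained policy ...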
Our Embodied AI Community Has Added Many New Modules Recently~
自动驾驶之心· 2025-10-22 00:03
Core Viewpoint
- The article highlights the growth of a community focused on embodied intelligence, which has added new modules and resources to support members' projects and learning [1][14].
Group 1: Community Development
- New sections cover VLA, real2sim2real, mobile manipulation, world models, and domain adaptation, along with high-quality live broadcasts [1].
- The community aims to close the loop across industry, academia, job seeking, and Q&A [1].
Group 2: Live Sharing and Technical Resources
- Ongoing live sessions, including roundtable forums, discuss the current state and challenges of the embodied intelligence industry [3].
- A comprehensive technical roadmap gives beginners a structured learning path [5].
Group 3: Industry and Project Solutions
- Members already doing related research get access to valuable industry frameworks and project solutions [9].
- A job referral mechanism with several embodied intelligence companies helps place members in jobs [11].
Group 4: Educational Resources and Networking
- The community has compiled over 40 open-source projects and nearly 60 datasets related to embodied intelligence, along with mainstream simulation platforms and various learning paths [14].
- Members can access exclusive learning videos and documents and network with peers [19].
Group 5: Comprehensive Resource Compilation
- Resources also cover research reports, books, component manufacturers, and simulation platforms [22][25][27][37].
- Specific learning paths are outlined for embodied perception and interaction, as well as for reinforcement learning [43][45][59].
We Are Looking for Partners in Autonomous Driving...
自动驾驶之心· 2025-10-22 00:03
Group 1
- The article announces the recruitment of 10 outstanding partners in autonomous driving for course development, paper guidance, and hardware research [2].
- Areas of expertise sought include large models, multimodal models, diffusion models, end-to-end systems, embodied interaction, joint prediction, SLAM, 3D object detection, world models, closed-loop simulation, and model deployment and quantization [3].
- Preferred candidates hold a master's degree or higher from a QS Top 200 university, especially those with publications at top conferences [4].
Group 2
- Compensation includes shared resources for job seeking, PhD recommendations, and study abroad, along with substantial cash incentives and collaboration on entrepreneurial projects [5].
- Interested candidates can add the team on WeChat for consultation, noting "organization/company + autonomous driving cooperation inquiry" [6].
The Most Professional Platform and Operations Team! We Are Recruiting Operations Staff~
自动驾驶之心· 2025-10-21 00:06
Core Viewpoint
- The autonomous driving industry is evolving rapidly, with growing demand and an expanding number of business lines, indicating healthy development [1].
Group 1: Team Overview
- Over two years, the team has built four key IPs (embodied intelligence, autonomous driving, 3D vision, and large model technology) with a combined audience of nearly 360,000 across platforms [1].
Group 2: Recruitment
- The company is hiring full-time and part-time staff for operations and sales roles to support its expanding business [2].
Group 3: Job Responsibilities and Requirements
- The operations role involves managing course progress, increasing platform engagement, and producing content on the autonomous driving and AI industries [4].
- The sales role involves creating promotional content for online and hardware products and liaising with hardware manufacturers and academic and enterprise clients [5][6].
- Candidates should have strong execution skills, a relevant educational background, and familiarity with social media platforms [12].
Group 4: Growth Opportunities
- Staff work alongside top-tier operations teams and learn operational techniques and sales strategies, enabling rapid personal growth [7].
- There are also opportunities for further academic pursuits, such as research and doctoral studies [9].