Diffusion Models
Starting from Zero! A Learning Roadmap for End-to-End Autonomous Driving and VLA
自动驾驶之心· 2025-08-24 23:32
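End-to-end driving and VLA involve an enormous technology stack, so today we walk through their development path from a beginner's point of view.
First, a look at the key timeline of large language models over the past five years.
Any discussion of large models has to start with the Transformer. To make the later material easier to follow, we give a plain-language summary and then expand on tokenization, BPE, positional encoding, and so on.
Transformer: Attention Is All You Need
3. Merge the two most frequent non-terminal characters into a new character, then recount the frequency of every character (the new character takes over part of the count of the original high-frequency characters).
4. Repeat steps 2-3 until the number of characters reaches the target or the iteration limit is reached.
$$PE_{(pos,2i)} = \sin\left(pos / 10000^{2i/d_{\mathrm{model}}}\right)$$
$$PE_{(pos,2i+1)} = \cos\left(pos / 10000^{2i/d_{\mathrm{model}}}\right)$$
[Figure: the text "这是一段文字" ("This is a piece of text") passes through the tokenizer to give the token IDs 231 34 462 4758 762 38; embedding plus positional encoding then yields a 7 x D vector] ...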
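The sinusoidal positional encoding quoted above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the article's code: the function name `sinusoidal_positional_encoding`, the even-`d_model` restriction, and the toy dimension D = 8 are assumptions made here for the example.

```python
import math
from typing import List


def sinusoidal_positional_encoding(num_positions: int, d_model: int) -> List[List[float]]:
    """Build a num_positions x d_model table of sinusoidal position encodings.

    Hypothetical helper for illustration; assumes d_model is even.
    """
    if d_model % 2 != 0:
        raise ValueError("this sketch assumes an even d_model")
    table = []
    for pos in range(num_positions):
        row = [0.0] * d_model
        for i in range(d_model // 2):
            # Shared angle for the dimension pair (2i, 2i+1).
            angle = pos / (10000 ** (2 * i / d_model))
            row[2 * i] = math.sin(angle)      # PE(pos, 2i)
            row[2 * i + 1] = math.cos(angle)  # PE(pos, 2i+1)
        table.append(row)
    return table


# Seven positions, matching the "7 x D" example, with an assumed D = 8.
pe = sinusoidal_positional_encoding(num_positions=7, d_model=8)
print(len(pe), len(pe[0]))
```

Each row of `pe` would be added to the embedding of the token at that position, so that two identical tokens at different positions end up with different input vectors.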
DiT Comes Under Sudden Fire; Xie Saining Responds Calmly
量子位· 2025-08-20 07:48
Core Viewpoint - The article discusses recent criticism of DiT (Diffusion Transformers), a model regarded as a cornerstone of the diffusion model field, and highlights the importance of scientific scrutiny and empirical validation in research [3][10]. Group 1: Criticism of DiT - A user raised multiple concerns about DiT, claiming it is flawed both mathematically and structurally, and even questioning whether DiT contains any real Transformer elements [4][12]. - The criticism draws on a paper titled "TREAD: Token Routing for Efficient Architecture-agnostic Diffusion Training," which introduces a strategy that passes early-layer tokens to deeper layers without modifying the architecture or adding parameters [12][14]. - The critic argues that the rapid decrease in FID (Fréchet Inception Distance) during training indicates that DiT's architecture has inherent properties that allow it to easily learn the dataset [15]. - TREAD reportedly converges 14 times faster than DiT measured at 400,000 training iterations, and 37 times faster measured against DiT's best performance at 7 million iterations; the critic reads a gain of this size as undermining the earlier method [16][17]. - The critic also suggests that if parts of the network are disabled during training, it could render the network ineffective [19]. - It is noted that the more network units in DiT are replaced with identity mappings during training, the better the model's evaluation results [20]. - DiT's architecture is said to require logarithmic scaling to represent the differences in signal-to-noise ratio across the diffusion process, indicating potential issues with output dynamics [23]. - Concerns are raised about the Adaptive Layer Normalization method: the critic says DiT processes conditional inputs through a standard MLP (Multi-Layer Perceptron) that shows no clear Transformer characteristics [25][26].
Group 2: Response from Xie Saining - Xie Saining, a co-author of DiT, responded to the criticism, asserting that TREAD's findings do not invalidate DiT [27]. - He acknowledges TREAD's contributions but emphasizes that it works because the regularization makes features more robust, not because DiT is incorrect [28]. - Xie highlights that Lightning DiT, an upgraded version of DiT, remains a powerful option and should be preferred when conditions allow [29]. - He also states that there is no evidence that post-layer normalization in DiT causes issues [30]. - Xie summarizes improvements made over the past year, focusing on internal representation learning and various methods for enhancing model training [32]. - He names sd-vae (the Stable Diffusion VAE) as the significant concern for DiT, particularly its high computational cost for processing images at 256×256 resolution [34].
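The token-routing idea summarized above (tokens from an early layer passed directly to a deeper layer, with the skipped layers acting as an identity for them) can be illustrated with a toy sketch. Everything here is hypothetical and chosen for the example: the function `forward_with_routing`, the fixed choice of routed token indices, and the `add_one` stand-in layer are not taken from the TREAD paper, which would select tokens and define losses in its own way.

```python
from typing import Callable, List, Sequence

Token = List[float]
Layer = Callable[[List[Token]], List[Token]]


def forward_with_routing(
    tokens: Sequence[Token],
    layers: Sequence[Layer],
    start: int,
    end: int,
    routed_idx: Sequence[int],
) -> List[Token]:
    """Run all layers; tokens in routed_idx bypass layers[start:end] unchanged."""
    routed = set(routed_idx)
    x = [list(t) for t in tokens]
    for k, layer in enumerate(layers):
        if start <= k < end:
            # Only the non-routed (partial) token set is processed here,
            # which is where the compute saving would come from.
            keep = [j for j in range(len(x)) if j not in routed]
            out = layer([x[j] for j in keep])
            for j, t in zip(keep, out):
                x[j] = t
        else:
            # Outside the route, the full token set is processed.
            x = layer(x)
    return x


def add_one(batch: List[Token]) -> List[Token]:
    """Stand-in for a Transformer block: adds 1 to every value."""
    return [[v + 1.0 for v in t] for t in batch]


tokens = [[0.0], [0.0], [0.0], [0.0]]
result = forward_with_routing(tokens, [add_one] * 4, start=1, end=3, routed_idx=[0, 2])
print(result)
```

In this toy run the routed tokens pass through only the first and last layers, while the others pass through all four. At inference the same stack would simply be run with no routed tokens, matching the statement that the architecture itself is left unchanged.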
Is DiT Mathematically and Formally Wrong? Xie Saining Responds: Don't Do Science in Your Head
机器之心· 2025-08-20 04:26
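Report by 机器之心. Editors: 冷猫, +0
"Guys, DiT is wrong!" A recent post sparked heated discussion on X: a blogger claimed that DiT has architectural flaws and attached a screenshot of a paper.
The paper the blogger cited was published in January of this year (v2 updated in March). It introduces a new method called TREAD, which uses a novel "token routing" mechanism to greatly improve training efficiency and generated image quality without changing the model architecture, significantly surpassing DiT in both speed and performance.
Specifically, during training TREAD works with a "partial token set" rather than the "full token set": predefined routes save information and reintroduce it at deeper layers, skipping part of the computation to cut cost. Routing is used only during training; inference still uses the standard setup. This is similar to methods such as MaskDiT, but more efficient.
Figure 1. We introduce TREAD, a training strategy that significantly improves the training efficiency of token-based diffusion model backbones. Applied to a standard DiT backbone, it achieves a 14/37x training speedup on unguided FID while also converging to better generation quality.
The horizontal axis of the figure shows training time (in A100 ...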
The Company Announced Team Cuts, and Those Who Know End-to-End Stayed...
自动驾驶之心· 2025-08-19 23:32
Core Viewpoint - The article discusses the rapid evolution and challenges in the field of end-to-end autonomous driving technology, emphasizing the need for a comprehensive understanding of various algorithms and models to succeed in this competitive industry [2][4][6]. Group 1: Industry Trends - The shift from modular approaches to end-to-end systems in autonomous driving aims to eliminate cumulative errors between modules, marking a significant technological leap [2]. - The emergence of various algorithms and models, such as UniAD and BEV perception, indicates a growing focus on integrating multiple tasks into a unified framework [4][9]. - The demand for knowledge in multi-modal large models, reinforcement learning, and diffusion models is increasing, reflecting the industry's need for versatile skill sets [5][20]. Group 2: Learning Challenges - New entrants face difficulties due to the fragmented nature of knowledge and the overwhelming volume of research papers in the field, often leading to early abandonment of learning [5][6]. - The lack of high-quality documentation and practical guidance further complicates the transition from theory to practice in end-to-end autonomous driving research [5][6]. Group 3: Course Offerings - A new course titled "End-to-End and VLA Autonomous Driving" has been developed to address the learning challenges, focusing on practical applications and theoretical foundations [6][24]. - The course is structured to provide a comprehensive understanding of end-to-end algorithms, including their historical development and current trends [11][12]. - Practical components, such as real-world projects and assignments, are included to ensure that participants can apply their knowledge effectively [8][21]. 
Group 4: Course Content Overview - The course covers various topics, including the introduction to end-to-end algorithms, background knowledge on relevant technologies, and detailed explorations of both one-stage and two-stage end-to-end methods [11][12][13]. - Specific chapters focus on advanced topics like world models and diffusion models, which are crucial for understanding the latest advancements in autonomous driving [15][17][20]. - The final project involves practical applications of reinforcement learning from human feedback (RLHF), allowing participants to gain hands-on experience [21].
The Starting Point for End-to-End VLA: Large Language Models and CLIP
自动驾驶之心· 2025-08-19 07:20
Core Viewpoint - The article discusses the development and significance of end-to-end (E2E) algorithms in autonomous driving, emphasizing the integration of various advanced technologies such as large language models (LLMs), diffusion models, and reinforcement learning (RL) in enhancing the capabilities of autonomous systems [21][31]. Summary by Sections Section 1: Overview of End-to-End Autonomous Driving - The first chapter provides a comprehensive overview of the evolution of end-to-end algorithms, explaining the transition from modular approaches to end-to-end solutions, and discussing the advantages and challenges of different paradigms [40]. Section 2: Background Knowledge - The second chapter focuses on the technical stack associated with end-to-end systems, detailing the importance of LLMs, diffusion models, and reinforcement learning, which are crucial for understanding the future job market in this field [41][42]. Section 3: Two-Stage End-to-End Systems - The third chapter delves into two-stage end-to-end systems, exploring their emergence, advantages, and disadvantages, while also reviewing notable works in the field such as PLUTO and CarPlanner [42][43]. Section 4: One-Stage End-to-End and VLA - The fourth chapter highlights one-stage end-to-end systems, discussing various subfields including perception-based methods and the latest advancements in VLA (Vision-Language-Action) models, which are pivotal for achieving the ultimate goals of autonomous driving [44][50]. Section 5: Practical Application and RLHF Fine-Tuning - The fifth chapter includes a major project focused on RLHF (Reinforcement Learning from Human Feedback) fine-tuning, providing practical insights into building pre-training and reinforcement learning modules, which are applicable to VLA-related algorithms [52]. 
Course Structure and Learning Outcomes - The course aims to equip participants with a solid understanding of end-to-end autonomous driving technologies, covering essential frameworks and methodologies, and preparing them for roles in the industry [56][57].
Everyone Is Doing End-to-End Now; Does Trajectory Prediction Still Have a Future?
自动驾驶之心· 2025-08-19 03:35
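1. End-to-end cannot do without trajectory prediction
Since end-to-end systems entered mass production, many people working on planning and control or on trajectory prediction have grown anxious and are thinking of switching to perception models, for fear of being out of a job in a couple of years. Yet over the past year or so, as far as 自动驾驶之心 is aware, few one-stage end-to-end systems have actually shipped in vehicles. Many companies still use two-stage end-to-end or modular approaches, and trajectory prediction, or joint prediction, remains the most widely deployed algorithm in production and a research focus for many companies and institutions. Going one step further, training a trajectory prediction model jointly with a perception model is in effect what is called end-to-end, so conferences and journals continue to publish a substantial amount of related work.
自动驾驶之心 has launched its first 1-on-6 small-group course on the currently popular topic of diffusion-based multi-agent trajectory prediction. The project focuses on "multi-agent trajectory prediction methods based on diffusion models." Multi-agent trajectory prediction aims to predict the future trajectories of multiple interacting agents from their historical trajectories, which is crucial in scenarios such as autonomous driving, intelligent surveillance, and robot navigation. However, because human behavior is uncertain and multimodal, the task is very difficult. Traditional methods usually rely on recurrent neural networks, convolutional networks, or graph neural networks to model social interaction, while generative models such as GANs and CVAEs can model multimodal distributions but are inefficient.
Diffusion models are a newer class of models that generate complex distributions through step-by-step denoising, and in recent years they have achieved major breakthroughs in image generation and other fields. Researchers have found that applying diffusion models ...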
Judging from Top Conferences and Production Solutions, Trajectory Prediction Still Has Plenty Worth Doing...
自动驾驶之心· 2025-08-18 12:00
Core Viewpoint - The article emphasizes the ongoing relevance and importance of trajectory prediction in autonomous driving, despite the rise of VLA (Vision-Language-Action) technologies. It highlights that trajectory prediction remains a critical module for ensuring safety and efficiency in driving systems [1][2]. Group 1: Trajectory Prediction Importance - Trajectory prediction is essential for autonomous driving systems as it helps in identifying potential hazards and planning optimal driving routes, thereby enhancing safety and efficiency [1]. - The quality of trajectory prediction directly impacts the planning and control of autonomous vehicles, making it a fundamental component of intelligent driving systems [1]. Group 2: Research and Development in Trajectory Prediction - Academic research in trajectory prediction is thriving, with significant focus on joint prediction, multi-agent prediction, and diffusion-based approaches, which are gaining traction in major conferences [1]. - The introduction of diffusion models has shown promise in improving multi-modal modeling capabilities for trajectory prediction, addressing the challenges posed by human behavior's uncertainty and multi-modality [2][3]. Group 3: Course Offering and Objectives - A new course on trajectory prediction using diffusion models is being offered, aimed at teaching research methods and paper publication strategies, particularly for multi-agent trajectory prediction [2][9]. - The course will cover various aspects, including classic and cutting-edge papers, baseline models, datasets, and writing methodologies, to help students develop a comprehensive understanding of the field [7][9]. Group 4: Course Structure and Content - The course spans 12 weeks of online group research followed by 2 weeks of paper guidance, with a focus on empirical validation using public datasets like ETH, UCY, and SDD [12][24]. 
- Key topics include the introduction of diffusion models, traditional trajectory prediction methods, and advanced techniques for integrating social interaction modeling and conditional control mechanisms [28][29].
Everyone Is Talking About Trajectory Prediction; How Does It Actually Fit into Autonomous Driving?
自动驾驶之心· 2025-08-16 00:03
Core Viewpoint - The article emphasizes the significant role of diffusion models in enhancing the capabilities of autonomous driving systems, particularly in data diversity, perception robustness, and decision-making under uncertainty [2][3]. Group 1: Applications of Diffusion Models - Diffusion models improve 3D occupancy prediction, outperforming traditional methods, especially in occluded or low-visibility areas, thus aiding downstream planning tasks [5]. - Conditional diffusion models are utilized for precise image translation in driving scenarios, enhancing system understanding of various road environments [5]. - Stable diffusion models efficiently predict vehicle trajectories, significantly boosting the predictive capabilities of autonomous driving systems [5]. - The DiffusionDrive framework innovatively applies diffusion models to multimodal action distribution, addressing uncertainties in driving decisions [5]. Group 2: Data Generation and Quality - Diffusion models effectively tackle the challenges of insufficient diversity and authenticity in natural driving datasets, providing high-quality synthetic data for autonomous driving validation [5]. - Future explorations will include video generation to further enhance data quality, particularly in 3D data annotation [5]. Group 3: Recent Research Developments - The dual-conditioned temporal diffusion model (DcTDM) generates realistic long-duration driving videos, outperforming existing models by over 25% in consistency and frame quality [7]. - LD-Scene integrates large language models with latent diffusion models for user-controllable adversarial scenario generation, achieving state-of-the-art performance in generating high adversariality and diversity [11]. - DualDiff enhances multi-view driving scene generation through a dual-branch conditional diffusion model, achieving state-of-the-art performance in various downstream tasks [14][34]. 
Group 4: Traffic Simulation and Scenario Generation - DriveGen introduces a novel traffic simulation framework that generates diverse traffic scenarios, supporting customized designs and improving downstream algorithm performance [26]. - Scenario Dreamer utilizes a vectorized latent diffusion model for generating driving simulation environments, demonstrating superior performance in realism and efficiency [28][31]. - AdvDiffuser generates adversarial safety-critical driving scenarios, enhancing transferability across different systems while maintaining high realism and diversity [68]. Group 5: Safety and Robustness - AVD2 enhances understanding of accident scenarios through the generation of accident videos aligned with natural language descriptions, significantly advancing accident analysis and prevention [39]. - Causal Composition Diffusion Model (CCDiff) improves the generation of closed-loop traffic scenarios by incorporating causal structures, demonstrating enhanced realism and user preference alignment [44].
End-to-End Can't Do Without Trajectory Prediction: Is This Direction Still Worth Researching?
自动驾驶之心· 2025-08-16 00:03
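1. With end-to-end so prevalent, is trajectory prediction still worth researching?
A reader recently asked us through the backend: now that everyone is doing end-to-end, do the earlier trajectory prediction and planning/control topics still have any research value? In practice, few end-to-end systems have actually shipped in vehicles, and many teams still use a layered pipeline. Trajectory prediction, as the core algorithm of the pipeline's second half, remains a research focus for many companies and institutions, covering both joint trajectory prediction and target trajectory prediction. Conferences and journals continue to publish a substantial amount of related work.
自动驾驶之心 has launched its first 1-on-6 small-group course on the currently popular topic of diffusion-based multi-agent trajectory prediction. The project focuses on "multi-agent trajectory prediction methods based on diffusion models." Multi-agent trajectory prediction aims to predict the future trajectories of multiple interacting agents from their historical trajectories, which is crucial in scenarios such as autonomous driving, intelligent surveillance, and robot navigation. However, because human behavior is uncertain and multimodal, the task is very difficult. Traditional methods usually rely on recurrent neural networks, convolutional networks, or graph neural networks to model social interaction, while generative models such as GANs and CVAEs can model multimodal distributions but are inefficient.
Diffusion models are a newer class of models that generate complex distributions through step-by-step denoising, and in recent years they have achieved major breakthroughs in image generation and other fields. Researchers have found that applying diffusion models to trajectory prediction can significantly improve multimodal modeling. For example, the Leapfrog Diffusion Model (LED) uses a trainable ...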
The Technology-Obsessed "Whampoa Academy" of Autonomous Driving Reaches 4,000 Members!
自动驾驶之心· 2025-08-15 14:23
Core Viewpoint - The article emphasizes the establishment of a comprehensive community focused on autonomous driving, aiming to bridge the gap between academia and industry while providing valuable resources for learning and career opportunities in the field [2][16]. Group 1: Community and Resources - The community has created a closed-loop system covering various fields such as industry, academia, job seeking, and Q&A exchanges, enhancing the learning experience for participants [2][3]. - The platform offers cutting-edge academic content, industry roundtables, open-source code solutions, and timely job information, significantly reducing the time needed for research [3][16]. - Members can access nearly 40 technical routes, including industry applications, VLA benchmarks, and entry-level learning paths, catering to both beginners and advanced researchers [3][16]. Group 2: Learning and Development - The community provides a well-structured learning path for beginners, including foundational knowledge in mathematics, computer vision, deep learning, and programming [10][12]. - For those already engaged in research, valuable industry frameworks and project proposals are available to further their understanding and application of autonomous driving technologies [12][14]. - Continuous job sharing and career opportunities are promoted within the community, fostering a complete ecosystem for autonomous driving [14][16]. Group 3: Technical Focus Areas - The community has compiled extensive resources on various technical aspects of autonomous driving, including perception, simulation, planning, and control [16][17]. - Specific learning routes are available for topics such as end-to-end learning, 3DGS principles, and multi-modal large models, ensuring comprehensive coverage of the field [16][17]. - The platform also features a collection of open-source projects and datasets relevant to autonomous driving, facilitating hands-on experience and practical application [32][34].