Diffusion Model

Diffusion²: a dual diffusion model that cracks the "ghost probe" problem in autonomous driving!
自动驾驶之心· 2025-10-09 23:32
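Paper authors | Yuhao Luo et al. Editor | 自动驾驶之心 This recent work from Tongji University and the University of Wisconsin-Madison is an interesting one to share. It points out a problem: when a pedestrian suddenly emerges from a blind spot, there is often not enough observation data (that is, only a momentary trajectory is available), so the risk of a traffic accident is high. In other words, this is the "ghost probe" scenario, and the question is how to predict the pedestrian's trajectory well. To address it, the authors propose Diffusion², designed specifically for momentary trajectory prediction. Diffusion² consists of two diffusion models connected in series: one performs backward prediction, generating the unobserved historical trajectory; the other performs forward prediction, predicting the future trajectory. Because the generated unobserved history may introduce additional noise, the authors propose a dual-head parameterization mechanism to estimate its aleatoric uncertainty, and design a temporally adaptive noise module that dynamically modulates the noise scale in the forward diffusion process. Experiments show that Diffusion² sets a new state of the art for momentary trajectory prediction on the ETH/UCY and Stanford Drone datasets. ...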
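To make the idea of estimating aleatoric uncertainty with two output heads more concrete, here is a minimal sketch of a heteroscedastic Gaussian negative log-likelihood. It is a generic textbook formulation and not the Diffusion² implementation: the function name `gaussian_nll`, the use of a log-variance head, and the sample waypoint are all assumptions made for illustration.

```python
import math

def gaussian_nll(target, mean, log_var):
    """Negative log-likelihood of a diagonal Gaussian, summed over coordinates.

    `mean` and `log_var` stand in for the outputs of two hypothetical heads:
    one predicting the value, one predicting the log of its variance.
    """
    nll = 0.0
    for y, mu, lv in zip(target, mean, log_var):
        nll += 0.5 * (lv + (y - mu) ** 2 / math.exp(lv) + math.log(2 * math.pi))
    return nll

# A made-up 2-D waypoint whose predicted mean is off by 1.0 in x.
target = [1.0, 0.0]
mean = [0.0, 0.0]

# Claiming a small variance while being wrong is penalized more heavily
# than admitting a larger variance for the same error.
overconfident = gaussian_nll(target, mean, log_var=[-2.0, -2.0])
hedged = gaussian_nll(target, mean, log_var=[0.0, 0.0])
print(overconfident > hedged)
```

Under this kind of loss, the variance head is pushed to report larger uncertainty where the mean head tends to be wrong, which is one common way a model can flag noisy inputs such as a generated history.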
Partner recruitment! 4D annotation, world models, VLA, model deployment, and other directions
自动驾驶之心· 2025-09-27 23:33
Group 1 - The article announces the recruitment of 10 partners for the autonomous driving sector, focusing on course development, paper guidance, and hardware research [2][5] - The recruitment targets individuals with expertise in various advanced models and technologies related to autonomous driving, such as large models, multimodal models, and 3D object detection [3] - Candidates with a master's degree or higher from a QS top-200 university are preferred, with priority given to those with significant conference contributions [4] Group 2 - The benefits for partners include resource sharing related to job seeking, doctoral studies, and overseas study recommendations, along with substantial cash incentives [5] - Opportunities for collaboration on entrepreneurial projects are also highlighted [5] - Interested parties are encouraged to get in touch via WeChat with inquiries about collaboration in the autonomous driving field [6]
Horizon Robotics & Tsinghua's Epona: an autoregressive end-to-end world model
自动驾驶之心· 2025-08-12 23:33
Core Viewpoint - The article discusses a unified framework for autonomous driving world models that can generate long-horizon, high-resolution video while providing real-time trajectory planning, addressing limitations of existing methods [5][12]. Group 1: Existing Methods and Limitations - Current diffusion models, such as Vista, can only generate fixed-length videos (≤15 seconds) and struggle with flexible long-horizon prediction (>2 minutes) and multi-modal trajectory control [7]. - GPT-style autoregressive models, like GAIA-1, can extend indefinitely but require discretizing images into tokens, which degrades visual quality, and they cannot generate continuous action trajectories [7][13]. Group 2: Proposed Methodology - The proposed world model uses a sequence of front-camera observations and the corresponding driving trajectories to predict future driving dynamics [10]. - The framework decouples spatiotemporal modeling: causal attention in a GPT-style transformer handles temporal modeling, and a dual-diffusion transformer handles spatial rendering and trajectory generation [12]. - An asynchronous multimodal generation mechanism allows for parallel generation of 3-second trajectories and the next frame image, achieving 20Hz real-time planning with a 90% reduction in inference compute [12]. Group 3: Model Structure and Training - The Multimodal Spatiotemporal Transformer (MST) encodes past driving scenes and action sequences, enhancing temporal position encoding for implicit representation [16]. - The Trajectory Planning Diffusion Transformer (TrajDiT) and Next-frame Prediction Diffusion Transformer (VisDiT) are designed to handle trajectory and image predictions, respectively, with a focus on action control [21]. - A chain-of-forward training strategy is employed to mitigate the "drift problem" in autoregressive inference by simulating prediction noise during training [24].
Group 4: Performance Evaluation - The model demonstrates superior performance in video generation metrics, achieving a FID score of 7.5 and a FVD score of 82.8, outperforming several existing models [28]. - In trajectory control metrics, the proposed method achieves a high accuracy rate of 97.9% in comparison to other methods [34]. Group 5: Conclusion and Future Directions - The framework integrates image generation and vehicle trajectory prediction with high quality, showing strong potential for applications in closed-loop simulation and reinforcement learning [36]. - However, the current model is limited to single-camera input, indicating a need for addressing multi-camera consistency and point cloud generation challenges in the autonomous driving field [36].
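The causal attention mentioned in the methodology can be illustrated with a small sketch. This is not Epona's code: the function name `causal_attention_weights` and the score matrix are made up, and only the masking step of attention is shown, assuming the usual convention that position i may attend to positions j ≤ i.

```python
import math

def causal_attention_weights(scores):
    """Row-wise softmax over scores[i][j], keeping only j <= i.

    Future positions get exactly zero weight, so frame i can depend
    only on frames 0..i, never on later ones.
    """
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]
        peak = max(visible)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        weights.append([e / total for e in exps] + [0.0] * (len(row) - i - 1))
    return weights

# Hypothetical raw scores for three time steps; large values in the upper
# triangle (the "future") are ignored by the mask.
scores = [[0.0, 5.0, 5.0],
          [1.0, 1.0, 5.0],
          [0.0, 0.0, 0.0]]
w = causal_attention_weights(scores)
for row in w:
    print(row)
```

Because every row ignores later positions, a model built this way can be rolled out one step at a time for arbitrarily long sequences, which is the property the summary attributes to GPT-style temporal modeling.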
Autonomous driving paper roundup | GS-Occ3D, BEV-LLM, cooperative perception, reinforcement learning, and more
自动驾驶之心· 2025-07-30 03:01
Group 1 - The article discusses recent advancements in autonomous driving technologies, highlighting several innovative frameworks and models [3][9][21][33][45] - GS-Occ3D achieves state-of-the-art (SOTA) geometric accuracy with a Chamfer distance (CD) of 0.56 on the Waymo dataset, demonstrating superior performance over LiDAR-based methods [3][5] - BEV-LLM introduces a lightweight multimodal scene description model that outperforms existing models by 5% in BLEU-4 score, showcasing the integration of LiDAR and multi-view images [9][10] - CoopTrack presents an end-to-end cooperative perception framework that sets new SOTA performance on the V2X-Seq dataset with 39.0% mAP and 32.8% AMOTA [21][22] - The Diffusion-FS model achieves a 0.7767 IoU in free-space prediction, marking a significant improvement in multimodal driving-corridor prediction [45][48] Group 2 - GS-Occ3D's contributions include a scalable visual occupancy label generation pipeline that eliminates reliance on LiDAR annotations, enhancing the training efficiency for downstream models [5][6] - BEV-LLM utilizes BEVFusion to combine 360-degree panoramic images with LiDAR point clouds, improving the accuracy of scene descriptions [10][12] - CoopTrack's innovative instance-level end-to-end framework integrates cooperative tracking and perception, enhancing the learning capabilities across agents [22][26] - The ContourDiff model introduces a novel self-supervised method for generating free-space samples, reducing dependency on dense annotated data [48][49]
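Two of the metrics quoted above, Chamfer distance and IoU, are simple enough to sketch. The following is an illustrative implementation under stated assumptions, not the evaluation code of any of the papers: Chamfer distance has several conventions (squared or unsquared, summed or averaged), and this sketch uses the unsquared, per-direction mean; the point sets and grid cells are invented.

```python
import math

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets.

    For each point, take the distance to its nearest neighbour in the
    other set, average per direction, and add the two directions.
    """
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)

def iou(pred, truth):
    """Intersection over union of two sets of occupied grid cells."""
    pred, truth = set(pred), set(truth)
    union = pred | truth
    return len(pred & truth) / len(union) if union else 1.0

# Made-up 3-D points: one point matches exactly, the other is off by 1.0.
cd = chamfer_distance([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 1)])

# Made-up free-space cells: 2 shared cells out of 4 distinct cells.
overlap = iou([(0, 0), (0, 1), (1, 1)], [(0, 1), (1, 1), (2, 1)])
print(cd, overlap)
```

Lower Chamfer distance means the reconstructed geometry sits closer to the reference, while higher IoU means the predicted region overlaps the ground truth more.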
Mathematical principles of Diffusion/VAE/RL
自动驾驶之心· 2025-07-29 00:52
Core Viewpoint - The article discusses the principles and applications of Diffusion Models and Variational Autoencoders (VAE) in the context of machine learning, particularly focusing on their mathematical foundations and training methodologies. Group 1: Diffusion Models - The training objective of the network is to fit the mean and variance of two Gaussian distributions during the denoising process [7] - The KL divergence term drives the network's predicted values toward the theoretical values in the denoising process [9] - The unknown \(x_0\) is re-expressed in terms of the unknown noise \(\epsilon\), which the network predicts iteratively at each denoising step [15] Group 2: Variational Autoencoders (VAE) - VAE assumes that the latent distribution follows a Gaussian distribution, which is essential for its generative capabilities [19] - The training of VAE is transformed into a combination of reconstruction loss and KL divergence constraint loss to prevent the latent space from degenerating into a sharp distribution [26] - Minimizing this combined loss corresponds to maximizing the Evidence Lower Bound (ELBO) [27] Group 3: Reinforcement Learning (RL) - The Markov Decision Process (MDP) framework is utilized, which includes states and actions in a sequential manner [35] - The semantic representation aims to approach an impulse (Dirac delta) distribution, while the generated representation is expected to follow a Gaussian distribution [36] - Policy gradient methods are employed to enable the network to learn the optimal action given a state [42]
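For reference, the three groups above can be tied to their standard textbook formulas. These are the usual DDPM, VAE, and REINFORCE expressions, written here as a sketch; the notation (\(\bar{\alpha}_t\), \(\epsilon_\theta\), \(q_\phi\), \(p_\theta\), \(\pi_\theta\), \(G_t\)) is assumed and may differ from the notation used in the summarized article.

```latex
% Diffusion: closed-form forward process and the noise-prediction objective
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\,I\right),
\qquad
x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,
\quad \epsilon \sim \mathcal{N}(0, I)

L_{\text{simple}} = \mathbb{E}_{t,\,x_0,\,\epsilon}
\left[\ \lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2\ \right]

% VAE: the ELBO is a reconstruction term minus a KL term
\log p_\theta(x) \ \ge\
\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
- D_{\mathrm{KL}}\!\left(q_\phi(z \mid x)\ \Vert\ p(z)\right)

% KL between a diagonal Gaussian posterior and a standard normal prior
D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2))\ \Vert\ \mathcal{N}(0, I)\right)
= \tfrac{1}{2} \sum_{j} \left(\mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1\right)

% RL: policy gradient with return G_t
\nabla_\theta J(\theta) =
\mathbb{E}_{\pi_\theta}\!\left[\ \sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t\ \right]
```

Read this way, the negative ELBO is exactly the reconstruction loss plus the KL loss, so minimizing their sum maximizes the ELBO, and the second diffusion equation is what lets \(x_0\) be traded for \(\epsilon\) as the prediction target.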
On one side, graduation means unemployment; on the other, companies can't find people. It's so hard...
自动驾驶之心· 2025-07-23 09:56
Core Insights - The autonomous driving industry is experiencing a paradox where job openings are abundant, yet companies struggle to find suitable talent. This is attributed to a shift in market expectations and a focus on sustainable business models rather than rapid expansion [2][3]. Industry Overview - Companies in the autonomous driving sector are now more cautious with their spending, prioritizing survival and the establishment of viable business models over aggressive hiring and expansion strategies. This shift is expected to lead to significant industry adjustments within the next 1-3 years [2][3]. Talent Demand - There is an unprecedented demand for "top talent" and "highly compatible talent" in the autonomous driving field. Companies are not necessarily unwilling to hire, but they are looking for candidates with exceptional skills and relevant experience [4][3]. Community and Resources - The "Autonomous Driving Heart Knowledge Planet" is the largest community focused on autonomous driving technology in China, established to provide resources and networking opportunities for professionals in the field. It has nearly 4000 members and over 100 industry experts contributing to discussions and knowledge sharing [9][10]. Learning and Development - The community offers comprehensive learning pathways covering various subfields of autonomous driving technology, including perception, mapping, and AI model deployment. This initiative aims to support both newcomers and experienced professionals in enhancing their skills [9][12][13]. Job Placement Support - The community has established a direct referral mechanism with numerous autonomous driving companies, facilitating job placements for members. This service aims to streamline the hiring process and connect qualified candidates with potential employers [10][9].
A graduate student from a "double-non" (non-985, non-211) university, feeling a bit lost in this year's job search...
自动驾驶之心· 2025-07-14 14:04
Core Viewpoint - The article emphasizes the importance of staying updated with cutting-edge technologies in the fields of autonomous driving and embodied intelligence, highlighting the need for strong technical skills and knowledge in advanced areas such as large models, reinforcement learning, and 3D graphics [4][5]. Group 1: Industry Trends - There is a growing demand for talent in the fields of robotics and embodied intelligence, with many startups receiving significant funding and showing rapid growth potential [4][5]. - Major companies are shifting their focus towards more advanced technologies, moving from traditional methods to end-to-end solutions and large models, indicating a technological evolution in the industry [4][5]. - The community aims to build a comprehensive ecosystem that connects academia, products, and recruitment, fostering a collaborative environment for knowledge sharing and job opportunities [6]. Group 2: Technical Directions - The article outlines four key technical directions in the industry: visual large language models, world models, diffusion models, and end-to-end autonomous driving [9]. - It provides resources and summaries of various research papers and datasets related to these technologies, indicating a strong emphasis on research and development [10][17][18][19][20][21][22][23][24][25][26][27][28][29][30][31][32][35][36][38]. Group 3: Community and Learning Resources - The community offers a variety of learning materials, including video courses, hardware, and coding resources, aimed at equipping individuals with the necessary skills for the evolving job market [6]. - There is a focus on creating a supportive environment for discussions on the latest industry trends, technical challenges, and job opportunities, which is crucial for professionals looking to advance their careers [6].
A 4,000-member "Whampoa Academy" for autonomous driving, dedicated to technical sharing and job-search exchange
自动驾驶之心· 2025-07-12 14:43
Core Viewpoint - The smart driving industry is experiencing significant growth, with companies willing to invest heavily in research and talent acquisition, indicating a robust job market and opportunities for new entrants [2][3]. Group 1: Industry Trends - The smart driving sector continues to attract substantial funding for research and development, with companies offering competitive salaries to attract talent [2]. - There is a noticeable trend of shorter technology iteration cycles in the autonomous driving field, with a focus on advanced technologies such as vision-language-action models (VLA) and end-to-end systems [7][11]. Group 2: Community and Learning Resources - The "Autonomous Driving Heart Knowledge Planet" aims to create a comprehensive community for knowledge sharing, focusing on academic and engineering challenges in the autonomous driving industry [3][11]. - The community has established a structured learning path covering various aspects of autonomous driving technology, including perception, planning, and control [13][15]. Group 3: Educational Offerings - The community offers a range of educational resources, including video courses, hardware tutorials, and live sessions with industry experts, aimed at both newcomers and experienced professionals [3][15]. - There are dedicated modules for job preparation, including resume sharing and interview experiences, to help members navigate the job market effectively [5][12]. Group 4: Technical Focus Areas - Key technical areas of focus include visual language models, world models, and end-to-end autonomous driving systems, with ongoing discussions about their integration and application in real-world scenarios [11][36]. - The community emphasizes the importance of understanding the latest advancements in algorithms and models, such as diffusion models and generative techniques, for future developments in autonomous driving [16][36].
A 4,000-member "Whampoa Academy" for autonomous driving, dedicated to technical sharing and job-search exchange
自动驾驶之心· 2025-07-12 05:41
Core Insights - The autonomous driving industry is experiencing significant changes, with many professionals transitioning to related fields like embodied intelligence, while others remain committed to the sector due to strong funding and high salaries for new graduates [2][6] - The article emphasizes the importance of networking and community engagement for knowledge acquisition and job preparation in the autonomous driving field [3][4] Group 1: Industry Trends - The autonomous driving sector continues to attract substantial investment, with companies willing to offer competitive salaries to attract talent [2] - The technology iteration cycle in autonomous driving is becoming shorter, indicating rapid advancements and a focus on cutting-edge technologies such as visual large language models (VLM) and end-to-end systems [8][12] Group 2: Community and Learning Resources - The "Autonomous Driving Heart Knowledge Planet" is highlighted as a leading community for professionals and students in the autonomous driving field, offering resources such as video courses, technical discussions, and job opportunities [4][14] - The community provides a structured learning path covering various aspects of autonomous driving technology, including perception, planning, and machine learning [19][21] Group 3: Technical Focus Areas - Key technical areas identified for 2025 include VLM, end-to-end systems, and world models, which are crucial for the future evolution of autonomous driving technology [8][43] - The community emphasizes the integration of advanced algorithms and models, such as diffusion models and 3D generative simulations, to enhance autonomous driving capabilities [15][22]
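Goodbye Transformer! Peking University, BUPT, and Huawei open-source the pure-convolution DiC: 3x3 convolutions reach SOTA performance, 5x faster than DiT!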
机器之心· 2025-07-11 08:27
Core Viewpoint - The article discusses a new convolution-based diffusion model called DiC (Diffusion CNN) developed by researchers from Peking University, Beijing University of Posts and Telecommunications, and Huawei, which outperforms the popular Diffusion Transformer (DiT) in both performance and inference speed [1][5][24]. Group 1: Introduction and Background - The AI-generated content (AIGC) field has predominantly adopted transformer-based diffusion models, which, while powerful, come with significant computational costs and slow inference speeds [4]. - The researchers challenge the notion that transformer architectures are the only viable path for generative models by reverting to the classic 3x3 convolution [5][9]. Group 2: Technical Innovations - The choice of 3x3 convolution is justified by its excellent hardware support and optimization, making it a key operator for achieving high throughput [8]. - DiC employs a U-Net-style hourglass architecture, which is found to be more effective than the stacked-block layout typical of transformers, because it lets each convolution cover a broader area of the original image [13]. - A series of optimizations, including stage-specific embeddings, optimal injection points for conditional information, and conditional gating mechanisms, enhance the model's ability to utilize conditional information effectively [14][15]. Group 3: Experimental Results - DiC demonstrates superior performance metrics compared to DiT, achieving a FID score of 13.11 and an IS score of 100.15, significantly better than DiT-XL/2's FID score of 20.05 and IS score of 66.74 [17][18]. - The throughput of DiC-XL reaches 313.7, nearly five times that of DiT-XL/2, showcasing its efficiency in inference speed [18]. - DiC converges ten times faster than DiT under the same conditions, indicating its potential for rapid training [18][19].
Group 4: Conclusion and Future Outlook - The emergence of DiC challenges the prevailing belief that generative models must rely on self-attention mechanisms, demonstrating that simple and efficient convolutional networks can still build powerful generative models [24].
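Why an hourglass layout helps 3x3 convolutions see more of the image can be checked with the standard receptive-field recurrence. The sketch below is illustrative only: the layer lists are invented and do not describe DiC's actual configuration, the 2x2 stride-2 downsampling layers are an assumption, and only the downsampling half of an hourglass is modeled.

```python
def receptive_field(layers):
    """Receptive field, in input pixels, after a list of (kernel, stride) layers.

    Uses the usual recurrence: each layer widens the field by
    (kernel - 1) * jump, where jump is the product of earlier strides.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# Four 3x3 convolutions stacked at full resolution.
plain = receptive_field([(3, 1)] * 4)

# The same four 3x3 convolutions, interleaved with two hypothetical
# 2x2 stride-2 downsampling layers, as in the encoder half of an hourglass.
hourglass = receptive_field([(3, 1), (2, 2), (3, 1), (2, 2), (3, 1), (3, 1)])

print(plain, hourglass)
```

With the same number of 3x3 convolutions, the downsampled stack covers a far wider patch of the input, which is consistent with the summary's point that the hourglass design gives convolutions broader coverage of the original image.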