Based on the information so far, the production ceiling of end-to-end should be quite high...
自动驾驶之心· 2025-11-12 00:04
Core Insights
- The article highlights significant developments in the autonomous driving industry, particularly the performance of Horizon HSD and the advancements in Xiaopeng's VLA 2.0, indicating a shift toward end-to-end production models [1][3].

Group 1: Industry Developments
- Horizon HSD's performance has exceeded expectations, marking a return of the industry's focus to one-stage end-to-end production, which has a high potential ceiling [1].
- Xiaopeng's VLA 2.0, which integrates visual and language inputs, reinforces the view that vision-language-action (VLA) capabilities are central to autonomous driving technology [1].

Group 2: Educational Initiatives
- The article introduces a new course, "Practical Class for End-to-End Production," aimed at sharing production experience in autonomous driving and covering methodologies including one-stage and two-stage frameworks, reinforcement learning, and trajectory optimization [3][8].
- The course is limited to 40 participants, emphasizing a targeted approach to skill development in the industry [3][5].

Group 3: Course Structure
- The course consists of eight chapters covering an end-to-end task overview, two-stage and one-stage algorithm frameworks, navigation information applications, reinforcement learning algorithms, trajectory output optimization, fallback solutions, and production experience sharing [8][9][10][11][12][13][14][15].
- Each chapter builds on the previous one, providing a comprehensive understanding of the end-to-end production process in autonomous driving [16].

Group 4: Target Audience and Requirements
- The course targets advanced learners with a background in autonomous driving algorithms, reinforcement learning, and programming, although it is also accessible to those with less experience [16][17].
- Participants need a GPU meeting the recommended specifications and a foundational understanding of the relevant mathematical concepts [17].
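The contrast between the two paradigms above can be sketched in a few lines: a one-stage model maps sensor features and a navigation command directly to a trajectory, with no hand-off between separate perception and planning modules. The sketch below is purely illustrative, assuming made-up dimensions (64-d image features, a 4-way navigation command, 8 waypoints) and random weights; it is not any vendor's actual stack.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_stage_planner(image_feat, nav_command, weights):
    """Toy one-stage end-to-end planner: sensor features plus a navigation
    command go in, a trajectory (N waypoints, x/y in the ego frame) comes
    out directly, with no separate perception/prediction/planning stages."""
    x = np.concatenate([image_feat, nav_command])
    h = np.tanh(weights["w1"] @ x)   # shared scene encoding
    traj = weights["w2"] @ h         # flat (N*2,) waypoint vector
    return traj.reshape(-1, 2)       # (N, 2) waypoints

# Hypothetical sizes: 64-d image features + 4-d one-hot nav command -> 8 waypoints.
weights = {
    "w1": rng.standard_normal((32, 68)) * 0.1,
    "w2": rng.standard_normal((16, 32)) * 0.1,
}
traj = one_stage_planner(rng.standard_normal(64),
                         np.array([1.0, 0.0, 0.0, 0.0]),  # "go straight"
                         weights)
print(traj.shape)  # (8, 2)
```

A two-stage framework would instead train a perception network and a planning network with an explicit intermediate interface (e.g. detected objects or occupancy); the one-stage variant discussed in the article removes that interface entirely.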
Dropping the VAE: can pretrained semantic encoders take Diffusion further?
机器之心· 2025-11-02 01:30
Group 1
- The article discusses the limitations of Variational Autoencoders (VAEs) in the diffusion-model paradigm and explores the potential of pretrained semantic encoders to enhance diffusion processes [1][7][8].
- The shift from VAEs to pretrained semantic encoders such as DINO and MAE aims to address semantic entanglement, computational inefficiency, and the disconnect between generative and perceptual tasks [9][10][11].
- RAE and SVG are two approaches that prioritize semantic representation over compression, leveraging the strong priors of pretrained visual models to improve efficiency and generative quality [10][11].

Group 2
- The article highlights the trend from static image generation toward more complex multimodal content, indicating that the traditional VAE + diffusion framework is becoming a bottleneck for next-generation generative models [8][9].
- The computational burden of the VAE is significant: the VAE encoder in Stable Diffusion 2.1 requires 135.59 GFLOPs, exceeding the 86.37 GFLOPs of the core diffusion U-Net [8][9].
- The discussion also covers the implications of the "lazy and rich" business principle in the AI era, suggesting a shift in value from knowledge storage to "anti-consensus" thinking among human experts [3].
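The RAE/SVG-style idea of diffusing in a frozen pretrained encoder's latent space, rather than in a VAE's compressed latents, can be sketched schematically. Everything here is a stand-in under assumed dimensions: the "encoder" is a fixed random projection playing the role of a frozen DINO/MAE backbone, and the noising rule is a toy linear schedule, not the papers' actual models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pretrained semantic encoder (e.g. DINO/MAE):
# a fixed projection from pixels to a 16-d semantic latent. In the
# RAE/SVG setting the encoder is pretrained and frozen; only the
# denoiser and decoder are trained, so the VAE bottleneck disappears.
# (Per the cited figures, SD 2.1's VAE encoder at 135.59 GFLOPs costs
# ~1.57x the diffusion U-Net's 86.37 GFLOPs.)
W_enc = rng.standard_normal((16, 64)) * 0.1

def encode(image_flat):
    return W_enc @ image_flat  # semantic latent z, no compression objective

def forward_noise(z, t, T=10):
    """Toy forward process: interpolate toward Gaussian noise; at t=T
    the semantic latent is fully noised."""
    alpha = 1.0 - t / T
    return alpha * z + np.sqrt(1.0 - alpha**2) * rng.standard_normal(z.shape)

image = rng.standard_normal(64)  # a tiny 8x8 "image", flattened
z0 = encode(image)               # clean semantic latent
zT = forward_noise(z0, t=10)     # fully noised latent the denoiser starts from
print(z0.shape, zT.shape)        # (16,) (16,)
```

The design point the article makes is visible even in this toy: generation quality then rests on a latent space organized by semantics (what the pretrained encoder learned) rather than by reconstruction-optimal compression (what a VAE learns).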
Is Diffusion really more likely than autoregression to achieve grand unification?
机器之心· 2025-08-31 01:30
Group 1
- The article discusses the potential of Diffusion models to achieve a unified architecture in AI, suggesting that they may surpass autoregressive (AR) models in this regard [7][8][9].
- It highlights the importance of multimodal capabilities in AI development, emphasizing that a unified model is crucial for both understanding and generating heterogeneous data types [8][9].
- It notes that while AR architectures have dominated the field, recent breakthroughs of Diffusion Language Models (DLMs) in natural language processing (NLP) are prompting a reevaluation of Diffusion's potential [8][9][10].

Group 2
- Diffusion models support parallel generation and fine-grained control, capabilities that AR models struggle to achieve [9][10].
- The article outlines the fundamental differences between AR and Diffusion architectures, indicating that Diffusion serves as a powerful compression framework with inherent support for multiple compression modes [11].
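The parallel-generation contrast above can be made concrete with a toy comparison: an AR sampler commits one token per step, strictly left to right, while a masked-diffusion-style sampler starts from a fully masked sequence and fills in batches of positions per refinement round. The "models" below are random stand-ins (no learned distributions), so only the step counts, not the outputs, are meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, LENGTH, MASK = 5, 8, -1

def ar_generate(length):
    """Autoregressive: one token per sequential step."""
    seq, steps = [], 0
    for _ in range(length):
        seq.append(rng.integers(VOCAB))  # stand-in for sampling p(x_i | x_<i)
        steps += 1
    return np.array(seq), steps

def diffusion_generate(length, rounds=3):
    """Masked-diffusion style: start fully masked; each round unmasks
    (refines) a batch of positions in parallel rather than one at a time."""
    seq = np.full(length, MASK)
    for r in range(rounds):
        masked = np.flatnonzero(seq == MASK)
        k = max(1, len(masked) // (rounds - r))    # unmask a batch per round
        picks = rng.choice(masked, size=k, replace=False)
        seq[picks] = rng.integers(VOCAB, size=k)   # stand-in for denoiser output
    return seq, rounds

ar_seq, ar_steps = ar_generate(LENGTH)
df_seq, df_steps = diffusion_generate(LENGTH)
print(ar_steps, df_steps)  # 8 sequential steps vs 3 parallel rounds
```

The fine-grained control the article mentions also falls out of this shape: because every position is revisited each round, a diffusion sampler can pin arbitrary positions (infilling, constraints) for free, whereas an AR model only conditions on the prefix.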