Learn anytime! End-to-End and VLA Autonomous Driving Small-Class Course (video + Q&A)
自动驾驶之心 · 2026-01-08 05:58
Click the card below to follow the "自动驾驶之心" (Autonomous Driving Heart) public account; tap the link to receive learning roadmaps for nearly 30 autonomous driving directions.

The core topics of end-to-end and VLA span BEV perception, vision-language models (VLMs), diffusion models, reinforcement learning, and more; studying end-to-end and VLA autonomous driving gives you command of the most current technology stack in both academia and industry. To that end, we partnered with an industry expert to build this End-to-End and VLA Autonomous Driving Small-Class Course, which has now officially wrapped: enroll anytime (video + Q&A)! The course gives detailed walkthroughs of two-stage and one-stage end-to-end frontier algorithms, essentially the baselines used in industry and academia. Scan the QR code to register and reserve a seat.

Instructor: Jason, C9 undergraduate + QS50 PhD, with 2 published CCF-A papers and several CCF-B papers. He is currently an algorithm expert at a top domestic OEM, doing pre-research and productionization of frontier algorithms such as end-to-end, large models, and world models, and has led multiple autonomous driving perception and end-to-end algorithm programs through to mass-production delivery, giving him extensive end-to-end R&D and hands-on experience.

Chapter 1: Introduction to End-to-End Algorithms. This chapter is an overview of end-to-end autonomous driving: the instructor walks through its development history, where the end-to-end concept came from, why the field moved from modular pipelines to end-to-end, and the one-stage, two-stage, and now VLA paradigms, and what each paradigm ...
ViT's first author raves: this Chinese open-source "PS model" beats Nano Banana
量子位· 2025-12-29 04:32
梦瑶, reporting from 凹非寺 | 量子位 | Public account QbitAI

In other words, individual image elements can now be edited with fine-grained control. Lucas Beyer, a core author of ViT and a member of Meta's superintelligence team, just posted three consecutive posts praising Qwen-Image-Layered, the open-source model Tongyi Qianwen released recently, calling it a clear win over ChatGPT and Nano Banana. In his view, this is the right way to do image generation. He also admitted he had wanted to work on this model direction himself but had been too busy to get around to it.

To be fair, Qwen-Image-Layered genuinely stands out: it delivers true Photoshop-grade layer-splitting freedom. The model's core capability is a cure for the "one image, set in stone" problem: it decomposes an ordinary image into multiple RGBA layers carrying transparency information, making image assets editable in a real sense. Concepts aside, the examples speak for themselves; netizens who saw the results marveled that it feels like an open-source Photoshop. So what exactly makes the model Lucas Beyer keeps praising so strong? Images can now be split apart just like in PS ...
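To see why per-layer alpha matters for editability, here is a minimal sketch (toy data only, not Qwen-Image-Layered's actual pipeline or output format) of how decomposed RGBA layers recombine with standard alpha-over compositing: because each layer keeps its own transparency, any layer can be edited or swapped independently before re-compositing.

```python
# Toy illustration of RGBA layer compositing: once an image is split
# into RGBA layers, editing one layer does not disturb the others,
# and the final image is rebuilt by alpha-over blending.
# Pixel values are in 0..1; the layers below are made-up examples.

def alpha_over(top, bottom):
    """Composite one RGBA pixel over another (Porter-Duff source-over)."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1 - ta)
    if out_a == 0:
        return (0.0, 0.0, 0.0, 0.0)
    out_rgb = tuple(
        (tc * ta + bc * ba * (1 - ta)) / out_a
        for tc, bc in zip((tr, tg, tb), (br, bg, bb))
    )
    return (*out_rgb, out_a)

background = (1.0, 1.0, 1.0, 1.0)  # opaque white layer
subject = (1.0, 0.0, 0.0, 0.5)     # half-transparent red layer

composited = alpha_over(subject, background)
print(composited)  # → (1.0, 0.5, 0.5, 1.0): red blended 50/50 with white
```

Real layered editing runs this per pixel over full layer stacks (e.g. via Pillow's `Image.alpha_composite`), but the per-pixel math is exactly this.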
The market is punishing end-to-end algorithm engineers who only know theory......
自动驾驶之心· 2025-12-29 01:07
I recently chatted with a friend who works in recruiting. His feedback: mid-tier automakers and Tier 1s are starting to pour headcount and resources into end-to-end, but the candidates they interview often know only part of the stack, and some are still stuck at the paper level with no mass-production experience or optimization skills ...

This course, the Production-Oriented End-to-End Practical Small-Class Course, was created by 自动驾驶之心 together with an industry algorithm expert, and it has a single focus: mass production. The core algorithms covered include one-stage end-to-end, two-stage end-to-end, production use of navigation information, open-loop and closed-loop reinforcement learning, diffusion models + reinforcement learning, autoregression + reinforcement learning, joint spatiotemporal planning, and more, closing with hands-on production experience. From one-stage, two-stage, reinforcement learning, navigation applications, and trajectory optimization to fallback strategies and concrete production lessons, the course targets employment and real deployment, so there are no plans for large-scale enrollment: only 15 seats remain. Scan the QR code to contact an assistant!

Instructor: 王路 (Wang Lu), C9 undergraduate + QS50 PhD, with several published CCF-A and CCF-B papers. He is currently an algorithm expert at a top domestic Tier 1, doing pre-research and productionization of large models, world models, and other frontier algorithms; algorithms he developed have shipped in mass production, and he has extensive end-to-end R&D and hands-on experience.
Just finished a beginner-oriented learning roadmap for world models......
自动驾驶之心· 2025-12-25 03:24
Core Viewpoint
- The article discusses the distinction between world models and end-to-end models in autonomous driving, clarifying that world models are not a specific technology but rather a category of models with certain capabilities. It emphasizes the industry trend of using world models for closed-loop simulation to address the high costs associated with corner cases in autonomous driving [2].

Course Overview
- The course on world models in autonomous driving is structured into six chapters, covering the introduction, background knowledge, discussions on general world models, video generation-based models, OCC-based models, and job-related insights in the industry [5][6][7][8][9].

Chapter Summaries
- **Chapter 1: Introduction to World Models** outlines the relationship between world models and end-to-end autonomous driving, discussing the development history and current applications of world models, as well as various streams such as pure simulation, simulation plus planning, and generating sensor inputs [5].
- **Chapter 2: Background Knowledge** covers foundational knowledge related to world models, including scene representation, Transformer technology, and BEV perception, which are crucial for understanding subsequent chapters [6].
- **Chapter 3: General World Models** focuses on popular general world models like Marble from Li Fei-Fei's team and Genie 3 from DeepMind, discussing their core technologies and design philosophies [7].
- **Chapter 4: Video Generation-Based World Models** delves into video generation algorithms, starting with GAIA-1 & GAIA-2 and extending to recent works like UniScene and OpenDWM, highlighting both classic and cutting-edge advancements in this area [8].
- **Chapter 5: OCC-Based World Models** concentrates on OCC generation algorithms, discussing three major papers and a practical project, emphasizing the potential for these methods to extend into vehicle trajectory planning [9].
- **Chapter 6: World Model Job Topics** shares practical insights from the instructor's experience, addressing industry applications, pain points, and interview preparation for positions related to world models [9].

Learning Outcomes
- The course aims to provide a comprehensive understanding of world models in autonomous driving, equipping participants with the knowledge to achieve a level comparable to one year of experience as a world-model algorithm engineer [10].
Class starts next week! We've designed a learning roadmap for autonomous driving world models....
自动驾驶之心· 2025-12-24 09:22
After many recent discussions with industry expert Jason, let me share a question we have been asked a lot: is a world model the same thing as end-to-end? And how should we read the recent explosion of world-model publications?

The answer to the first question is a clear no. Neither "world model" nor "end-to-end" names a specific technique; each is a class of models with certain capabilities. A world model is best understood as one possible route to end-to-end autonomous driving.

At the moment, academia and industry have converged autonomous-driving world models onto two areas, generation and reconstruction, and the mainstream is using world models for closed-loop simulation, which is why so much related work keeps appearing. It also marks a stylistic shift in the field: corner cases are too expensive to collect, so we need other, more efficient means ...

Building on the well-received End-to-End and VLA Autonomous Driving Small-Class Course that the platform created with Jason, we are now launching this world-model small-class course. It focuses on general world models, video generation, OCC generation, and other world-model algorithms, covering Tesla's world model, Marble from Fei-Fei Li's team, and more. Welcome aboard! Early-bird discount, ending when class starts.

Instructor: Jason, C9 undergraduate + QS50 PhD, with 2 published CCF-A papers and several CCF-B papers. Currently an algorithm expert at a top domestic ...
Watching Tencent's ad algorithm competition in person, even I wanted to join the company
量子位· 2025-12-24 05:14
Core Insights
- The article discusses Tencent's algorithm competition, highlighting its significance in attracting talent and providing practical experience in cutting-edge AI technologies [1][28][43].

Group 1: Competition Overview
- The competition offered substantial rewards, including a total prize pool of 3.8 million yuan, with the champion receiving 2 million yuan and all participants gaining access to valuable resources like computing power [32][34].
- The competition attracted over 8,400 students and 2,800 teams from nearly 30 countries, showcasing its global reach and influence [34].

Group 2: Technical Focus
- The competition's theme, "full-modal generative recommendation," addresses advanced challenges in advertising and recommendation systems, emphasizing the integration of various data types such as text, images, and videos [5][11].
- Participants faced real-world challenges, including data noise, alignment issues, and the need for efficient modeling of user behavior over long sequences [13][41].

Group 3: Talent Acquisition Strategy
- Tencent's approach to the competition serves as a recruitment strategy, allowing the company to identify and engage with top talent in a practical setting rather than through traditional recruitment methods [39][42].
- The competition's structure inherently filters candidates, ensuring that only those capable of handling complex data and modeling challenges progress to the final stages [40][41].

Group 4: Industry Context
- The competition reflects Tencent's established AI technology framework, which has been validated through real business applications, indicating the company's commitment to innovation and talent development [29][30].
- The article notes the competitive landscape for talent in the AI sector, with companies like Tencent offering attractive employment packages and support programs to attract young professionals [44][46].
Layout control + identity consistency: Zhejiang University proposes ContextGen, a new SOTA for layout-anchored multi-instance generation
机器之心· 2025-12-20 04:45
As diffusion models continue to iterate, image generation has matured considerably. Yet in multi-instance image generation (MIG), a key area with a large base of user scenarios, existing methods still face a core bottleneck: how to simultaneously achieve spatial layout control over multiple objects and good identity preservation for each of them.

Mainstream approaches rarely manage both: layout-to-image models guided by text and layout struggle with deep per-instance customization and frequently drop instances or leak attributes, while mainstream subject-driven methods suffer severe identity confusion and detail loss as the number of subjects grows.

(Figure: comparison of ContextGen against mainstream SOTA, and usage examples of ContextGen.)

To address this obstacle to highly customized image generation, the ReLER team at Zhejiang University released ContextGen, a new Diffusion Transformer (DiT)-based framework that uses in-context learning to reliably perform image-guided multi-instance generation! ContextGen proposes a brand-new ...
Seven projects worth referencing for end-to-end deployment
自动驾驶之心· 2025-12-19 00:05
Core Viewpoint
- The article emphasizes the importance of end-to-end production in autonomous driving technology, highlighting the need for practical experience in various algorithms and applications to address real-world challenges in the industry [2][7].

Course Overview
- The course is designed to provide in-depth knowledge of end-to-end production techniques, focusing on key algorithms such as one-stage and two-stage frameworks, reinforcement learning, and trajectory optimization [2][4].
- It includes practical projects that cover the entire process from theory to application, ensuring participants gain hands-on experience [2][12].

Instructor Background
- The instructor, Wang Lu, is a top-tier algorithm expert with a strong academic background and extensive experience in developing and implementing advanced algorithms for autonomous driving [3].

Course Structure
- The course consists of eight chapters, each focusing on a different aspect of end-to-end algorithms:
  1. Overview of end-to-end tasks and integration of perception and control systems [7].
  2. Two-stage end-to-end algorithm frameworks and their advantages [8].
  3. One-stage end-to-end algorithms with a focus on performance [9].
  4. Application of navigation information in autonomous driving [10].
  5. Introduction to reinforcement learning algorithms and training strategies [11].
  6. Optimization of trajectory outputs using various algorithms [12].
  7. Post-processing strategies for ensuring reliable outputs [13].
  8. Sharing of production experiences and strategies for real-world applications [14].

Target Audience
- The course is aimed at advanced learners with a foundational understanding of autonomous driving algorithms, including familiarity with reinforcement learning and diffusion models [15][17].
The whole internet is losing it: AI's "finger problem" is driving humans crazy, and six fingers expose a fatal flaw in the Transformer
36Kr · 2025-12-15 12:39
Lately, netizens have been driven mad by AI's "finger problem": handed a six-fingered hand, AI simply cannot count how many fingers it has! Come clean, AI, are you mocking humanity? Behind this lies the Achilles' heel of the Transformer architecture ...

Over the past few days a shadow has fallen over the entire internet: AI is using finger counting to taunt humans. The task people give the AI is simple: label each finger in the image with a number, in order. The catch, of course, is that the hand actually has six fingers. Nano Banana Pro confidently labeled the hand 1, 2, 3, 4, 5, skipping one finger entirely. The absurd scene stunned netizens once again.

Is the AI really that dumb? Many people think not: perhaps the AI is only playing dumb and teasing humans. Quite possibly it is mocking the inferior humans trying to test it; to pass the Turing test, an AI must make itself a little stupid to look human, because if it were too smart, humans would break down.

GPT-5.2 failed the same way. Someone posed the problem to GPT-5.2 with the prompt stating plainly that the image contains six fingers. Yet asked "how many fingers are in the image," GPT-5.2 still answered, categorically: five! Its reasoning: humans have five fingers, so an image without five fingers must be wrong. Others drew fingers in shapes so bizarre that even humans would struggle ...
One year on, DiffusionDrive upgrades to v2 and sets a new record!
自动驾驶之心· 2025-12-11 03:35
Core Insights
- The article discusses the upgrade of DiffusionDrive to version 2, highlighting its advancements in end-to-end autonomous driving trajectory planning through the integration of reinforcement learning to address the challenges of diversity and sustained high quality in trajectory generation [1][3][10].

Background Review
- The shift towards end-to-end autonomous driving (E2E-AD) has emerged as traditional tasks like 3D object detection and motion prediction have matured. Early methods faced limitations in modeling, often generating single trajectories without alternatives in complex driving scenarios [5][10].
- Previous diffusion models applied to trajectory generation struggled with mode collapse, leading to a lack of diversity in generated behaviors. DiffusionDrive introduced a Gaussian Mixture Model (GMM) to define prior distributions for the initial noise, promoting diverse behavior generation [5][13].

Methodology
- DiffusionDriveV2 introduces a novel framework that utilizes reinforcement learning to overcome the limitations of imitation learning, which previously led to a trade-off between diversity and sustained high quality in trajectory generation [10][12].
- The framework incorporates intra-anchor GRPO and inter-anchor truncated GRPO to manage advantage estimation within specific driving intentions, preventing mode collapse by avoiding inappropriate comparisons between different intentions [9][12][28].
- The method employs scale-adaptive multiplicative noise to enhance exploration while maintaining trajectory smoothness, addressing the inherent scale inconsistency between proximal and distal segments of trajectories [24][39].

Experimental Results
- Evaluations on the NAVSIM v1 and NAVSIM v2 datasets demonstrated that DiffusionDriveV2 achieved state-of-the-art performance, with a PDMS score of 91.2 on NAVSIM v1 and 85.5 on NAVSIM v2, significantly outperforming previous models [10][33].
- The results indicate that DiffusionDriveV2 effectively balances trajectory diversity and sustained quality, achieving optimal performance in closed-loop evaluations [38][39].

Conclusion
- The article concludes that DiffusionDriveV2 successfully addresses the inherent challenges of imitation learning in trajectory generation, achieving an optimal trade-off between planning quality and diversity through innovative reinforcement learning techniques [47].
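To make the intra-anchor idea concrete, here is a minimal sketch (not the DiffusionDriveV2 implementation; anchor names and reward values are made up) of GRPO-style group-relative advantages computed separately per anchor, so that trajectories sampled under different driving intentions are never ranked against each other:

```python
# Toy illustration of intra-anchor, group-relative advantage estimation:
# rewards are normalized within each anchor group (driving intention),
# so a low-reward intention such as "turn_left" is not suppressed merely
# because another intention happens to score higher overall.
from statistics import mean, pstdev

def intra_anchor_advantages(rewards_by_anchor):
    """Per-anchor GRPO-style normalization: (r - group mean) / group std."""
    advantages = {}
    for anchor, rewards in rewards_by_anchor.items():
        mu = mean(rewards)
        sigma = pstdev(rewards) or 1.0  # guard against zero variance
        advantages[anchor] = [(r - mu) / sigma for r in rewards]
    return advantages

# Hypothetical rollout rewards for two driving intentions.
rollouts = {
    "keep_lane": [0.9, 0.8, 0.7],  # high-reward intention
    "turn_left": [0.3, 0.1, 0.2],  # low-reward intention
}
adv = intra_anchor_advantages(rollouts)
# Within each group the advantages sum to ~0, so the best "turn_left"
# rollout still gets a positive learning signal despite its low rewards.
```

Had all six rollouts been normalized in one global group, every "turn_left" sample would receive a negative advantage, which is exactly the cross-intention comparison the intra-anchor design avoids.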