Beihang Team Proposes a New Offline Hierarchical Diffusion Framework: Stable Offline Policy Learning via Structural Information Principles | NeurIPS 2025
AI前线· 2025-10-09 04:48
Author | Hao Peng's team, Beihang University (BUAA)

Generative methods based on diffusion models have shown great potential for modeling trajectories from offline reinforcement learning datasets, and hierarchical diffusion has been introduced to mitigate variance accumulation and the computational challenges of long-horizon planning. However, existing methods typically assume a fixed two-level diffusion hierarchy with a single predefined timescale, which limits adaptability to diverse downstream tasks and reduces decision-making flexibility.

Paper title: Structural Information-based Hierarchical Diffusion for Offline Reinforcement Learning
arXiv: https://arxiv.org/abs/2509.21942
Code: https://github.com/SELGroup/SIHD

Research background and motivation
Offline reinforcement learning targets a core challenge: training an effective policy from a fixed historical dataset alone, without any new interaction with the environment. By recasting policy learning as a conditional trajectory-generation task, diffusion models effectively mitigate the "extrapolation error" caused by out-of-distribution (OOD) states and actions ...
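The core idea, treating a whole trajectory as the object being noised and denoised, can be sketched with generic DDPM machinery (a toy NumPy sketch of the forward noising process; the schedule and shapes are illustrative assumptions, not the SIHD implementation):

```python
import numpy as np

# Toy sketch of DDPM-style trajectory noising, as used conceptually by
# diffusion planners in offline RL. A learned noise-prediction network
# (not shown) would later reverse this process to generate trajectories.

def make_alpha_bars(T: int, beta_start: float = 1e-4, beta_end: float = 0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bars, rng):
    """Forward process: noise a clean trajectory x0 to diffusion step t."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

rng = np.random.default_rng(0)
H, D = 32, 4                       # horizon, (state, action) feature dim
x0 = rng.standard_normal((H, D))   # one clean trajectory from the dataset
alpha_bars = make_alpha_bars(T=100)
xt, eps = q_sample(x0, t=50, alpha_bars=alpha_bars, rng=rng)
print(xt.shape)  # (32, 4)
```

Conditioning (e.g. on returns or goals) and the hierarchical timescales discussed in the paper would sit on top of this basic mechanism.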
自动驾驶之心 Is Recruiting Partners! 4D Annotation / World Models / Model Deployment and Other Directions
自动驾驶之心· 2025-10-04 04:04
Business partner recruitment: 自动驾驶之心 is recruiting business partners! This year our team plans to recruit 10 outstanding partners at home and abroad to work on autonomous-driving course development, paper-tutoring business development, and hardware R&D. Main directions: if you work on large models / multimodal large models, diffusion models, VLA, end-to-end driving, embodied interaction, joint prediction, SLAM, 3D object detection, world models, closed-loop simulation with 3DGS, model deployment and quantization-aware inference, or related areas, you are welcome to join us. Compensation: shared autonomous-driving resources (job referrals, PhD and study-abroad recommendations); generous cash incentives; startup-project collaboration and referrals. Contact: add us on WeChat with the note "Institution/Company + autonomous driving cooperation inquiry". Requirements: QS top-200 university, master's degree or above; candidates with top-conference publications preferred. ...
First-Ever Synchronized Generation of Egocentric Video and Human Motion! New Framework Cracks the Two Technical Barriers of Viewpoint-Motion Alignment
量子位· 2025-10-01 01:12
The generated video can be lifted into a 3D scene via 3D Gaussian Splatting, using camera poses derived from the human motion. Wen Le | QbitAI. AI has become adept at generating third-person video, but first-person (egocentric) generation remains immature. To address this, the National University of Singapore, Nanyang Technological University, the Hong Kong University of Science and Technology, and the Shanghai AI Laboratory jointly released EgoTwin, the first framework to achieve joint generation of egocentric video and human motion. It tackles two bottlenecks, viewpoint-motion alignment and causal coupling, opening new entry points for wearable computing, AR, and embodied intelligence. EgoTwin is a diffusion-based framework that jointly generates first-person video and human motion in a viewpoint-consistent and causally coherent manner. Details below. Synchronized generation of egocentric video and human motion. Core challenge: the dilemma of egocentric generation. Egocentric video is essentially a visual record driven by human motion: head movement determines camera position and orientation, while whole-body motion shapes body pose and changes in the surrounding scene. The two are intrinsically coupled and cannot be separated. Traditional video-generation methods struggle with this and face two main problems: 1. Viewpoint alignment: the camera trajectory in the generated video must precisely match the head trajectory derived from the human motion, but existing methods mostly rely on preset camera parameters ...
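The viewpoint-alignment requirement, per-frame camera extrinsics that track a head trajectory, can be illustrated with a standard look-at construction (an illustrative NumPy sketch; the function name and conventions are assumptions, not EgoTwin's actual code):

```python
import numpy as np

# Hedged sketch: build a world-to-camera extrinsic matrix from a head
# position and a gaze target, as one would when lifting generated video
# into a 3D Gaussian Splatting scene. Assumes gaze is not parallel to up.

def look_at(eye, target, up=np.array([0.0, 1.0, 0.0])):
    """Return a 4x4 world-to-camera extrinsic for a head pose."""
    f = target - eye
    f = f / np.linalg.norm(f)          # forward (gaze) direction
    r = np.cross(f, up)
    r = r / np.linalg.norm(r)          # camera right
    u = np.cross(r, f)                 # camera up
    R = np.stack([r, u, -f])           # rows: right, up, -forward
    t = -R @ eye
    E = np.eye(4)
    E[:3, :3] = R
    E[:3, 3] = t
    return E

head_pos = np.array([0.0, 1.6, 0.0])   # head height ~1.6 m
gaze_at = np.array([0.0, 1.5, 2.0])    # point the person looks at
E = look_at(head_pos, gaze_at)
print(E.shape)  # (4, 4)
```

Applying `look_at` per frame along a generated head trajectory yields the camera path that the generated video must stay consistent with.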
Led by Industry Experts! Master End-to-End Autonomous Driving in Three Months
自动驾驶之心· 2025-09-29 08:45
Core Viewpoint
- 2023 is identified as the year of end-to-end production, with 2024 expected to be a significant year for this development in the automotive industry, particularly in autonomous driving technology [1][3].

Group 1: End-to-End Production
- Leading new forces and manufacturers have already achieved end-to-end production [1].
- There are two main paradigms in the industry: one-stage and two-stage approaches, with UniAD being a representative of the one-stage method [1].

Group 2: Development Trends
- Since last year, the one-stage end-to-end approach has rapidly evolved, leading to various derivatives such as perception-based, world model-based, diffusion model-based, and VLA-based one-stage methods [3].
- Major autonomous driving companies are focusing on self-research and mass production of end-to-end autonomous driving solutions [3].

Group 3: Course Offerings
- A course titled "End-to-End and VLA Autonomous Driving" has been launched, covering cutting-edge algorithms in both one-stage and two-stage end-to-end approaches [5].
- The course aims to provide insights into the latest technologies in the field, including BEV perception, visual language models, diffusion models, and reinforcement learning [5].

Group 4: Course Structure
- The course consists of several chapters, starting with an introduction to end-to-end algorithms, followed by background knowledge essential for understanding the technology stack [9][10].
- The second chapter focuses on the most frequently asked technical keywords in job interviews over the next two years [10].
- Subsequent chapters delve into two-stage end-to-end methods, one-stage end-to-end methods, and practical assignments involving RLHF fine-tuning [12][13].

Group 5: Learning Outcomes
- Upon completion, participants are expected to reach a level equivalent to one year of experience as an end-to-end autonomous driving algorithm engineer [19].
- The course aims to deepen understanding of key technologies such as BEV perception, multimodal large models, and reinforcement learning, enabling participants to apply learned concepts to real projects [19].
Meta Just Poached Tsinghua Alum Yang Song from OpenAI
36氪· 2025-09-26 13:35
Core Viewpoint
- The recent hiring of Yang Song, a key figure in diffusion models and an early contributor to DALL·E 2, by Meta Superintelligence Labs (MSL) signals a strategic move in the AI competition, enhancing MSL's talent pool and research capabilities [2][3][11].

Group 1: Talent Acquisition and Team Structure
- Yang Song's addition to MSL strengthens the "dual-core" structure of the team, with one leader managing overall strategy and the other focusing on critical paths in research [16].
- The team composition is becoming clearer, with a more structured division of research responsibilities [17].
- Since summer, over 11 researchers from OpenAI, Google, and Anthropic have joined MSL, indicating a high-frequency recruitment strategy [20].

Group 2: Industry Trends and Dynamics
- The rapid turnover of talent among top AI labs is becoming more common, reflecting a shift towards project compatibility and team dynamics as key factors in employment decisions [25].
- The relationship between researchers and labs is evolving into a "mutual pursuit," where both parties seek alignment in goals and capabilities [47].
- The competition for AI talent is intensifying, with increasing demands on researchers to understand cross-modal capabilities and complete data workflows [48].

Group 3: Research Focus and Strategic Alignment
- Yang Song's research on diffusion models aligns closely with MSL's strategic direction, aiming to develop universal models that can understand various data forms [28][30].
- The integration of Yang Song's expertise is expected to enhance MSL's ability to create a comprehensive AI product system, accelerating the formation of a complete technical loop from modeling to execution [32][41].
- Meta is not only attracting top talent but is also working to transform these capabilities into organizational and product-level resources [44].
AnchDrive: A New End-to-End Autonomous Driving Diffusion Policy (Shanghai University & Bosch)
自动驾驶之心· 2025-09-26 07:50
End-to-end multimodal planning has become a transformative paradigm in autonomous driving, effectively addressing behavioral multimodality and generalization challenges in long-tail scenarios. This article presents AnchDrive, an end-to-end framework that effectively bootstraps a diffusion policy to reduce the high computational cost of traditional generative models. Instead of denoising from pure noise, AnchDrive initializes the planner with a rich set of hybrid trajectory anchors. These anchors come from two complementary sources: a static vocabulary encoding general driving priors, and a set of dynamic, context-aware trajectories. The dynamic trajectories are decoded in real time by a Transformer that processes dense and sparse perception features. The diffusion model then refines these anchors by learning to predict a distribution of trajectory offsets, enabling fine-grained adjustment. This anchor-based guided design efficiently generates diverse, high-quality trajectories. Experiments on the NAVSIM benchmark show that AnchDrive sets a new state of the art and exhibits strong generalization. For more frontier work on end-to-end autonomous driving, VLA, and world models, join the 自动驾驶之心知识星球! I. Introduction: In recent years, end-to-end autonomous driving algorithms have attracted wide attention; compared with traditional rule-based ...
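The anchor-plus-offset idea, starting from trajectory anchors and iteratively refining them with predicted offsets rather than denoising from pure noise, can be sketched as follows (toy NumPy code; `predict_offset` is a hypothetical stand-in for the learned diffusion model, not AnchDrive's implementation):

```python
import numpy as np

# Hedged sketch of anchor-based refinement: candidate trajectories start
# from anchors and are nudged by a model that predicts small offsets.
# A toy closed-form "model" stands in for the learned offset distribution.

def refine_anchors(anchors, predict_offset, steps: int = 5):
    """Iteratively refine anchor trajectories with predicted offsets."""
    traj = anchors.copy()
    for _ in range(steps):
        traj = traj + predict_offset(traj)
    return traj

rng = np.random.default_rng(0)
# 16 anchor trajectories, each with 20 waypoints in (x, y).
anchors = rng.standard_normal((16, 20, 2))
target = np.zeros_like(anchors)  # toy "ideal" trajectory for illustration

# Toy offset predictor: step 20% of the way toward the target each round.
refined = refine_anchors(anchors, lambda t: 0.2 * (target - t))
print(refined.shape)  # (16, 20, 2)
```

The design point is that anchors place the initial samples near plausible driving behavior, so only a few refinement steps are needed, which is where the computational saving over from-scratch denoising comes from.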
Breaking: Meta Just Poached Tsinghua Alum Yang Song from OpenAI
36氪· 2025-09-25 11:56
Core Insights
- Meta has successfully recruited Yang Song, a key figure in diffusion models and an early contributor to DALL·E 2 technology, to lead research at Meta Superintelligence Labs (MSL) [1][12][29].
- This recruitment signals a strategic shift for Meta, indicating a move towards a more collaborative team structure rather than relying solely on individual talent [12][13].

Group 1: Team Dynamics
- The combination of Yang Song and Shengjia Zhao represents a transition for MSL from a focus on individual excellence to a more coordinated team approach [12][13].
- Both individuals share a strong academic background, having studied at Tsinghua University and Stanford, and have significant experience at OpenAI [13][14].
- The team structure is becoming clearer, with defined roles that enhance research efficiency and collaboration [13][29].

Group 2: Talent Acquisition Trends
- Meta's recruitment pace has accelerated, with over 11 researchers from OpenAI, Google, and Anthropic joining MSL since summer [14][18].
- There is a notable trend of talent movement among top AI labs, indicating that project alignment and team culture are becoming critical factors in employment decisions [14][18].
- The departure of some researchers, such as Aurko Roy, highlights the competitive nature of talent retention in the AI sector [14][18].

Group 3: Strategic Focus
- Yang Song's research aligns closely with MSL's strategic direction, particularly in multi-modal reasoning and the development of general models that can process various data types [18][29].
- His expertise in diffusion models is expected to enhance MSL's capabilities in generative AI, contributing to a more integrated research approach [18][28].
- The ongoing evolution of AI projects necessitates a deeper understanding of cross-modal interactions and the integration of research into practical applications [29].
From 300+ Papers: How VLA Is Applied and Implemented Across Different Scenarios...
具身智能之心· 2025-09-25 04:00
Editor | 具身智能之心 (shared for academic purposes only). A new survey jointly produced by Lanzhou University, the Chinese Academy of Sciences, the National University of Singapore, and other institutions: Pure Vision Language Action (VLA) Models: A Comprehensive Survey. Paper: https://arxiv.org/pdf/2509.19012. The emergence of Vision-Language-Action (VLA) models marks a paradigm shift in robotics from traditional policy-based control toward general-purpose robotics, while repositioning Vision-Language Models (VLMs) from passive sequence generators to active agents that act and make decisions in complex, dynamic environments. Robotics has long been an important field of scientific research. Historically, robots mainly relied on pre-programmed instructions and hand-designed control policies for task decomposition and execution. These methods are typically applied to simple, repetitive tasks, such as factory ...
In-Depth Survey | 300+ Papers Show How Pure Vision Pushes VLA to the Forefront of Autonomous Driving and Embodied Intelligence!
自动驾驶之心· 2025-09-24 23:33
The emergence of Vision-Language-Action (VLA) models marks a paradigm shift in robotics from traditional policy-based control toward general-purpose robotics, repositioning Vision-Language Models (VLMs) from passive sequence generators to active agents that act and make decisions in complex, dynamic environments. To this end, a team from Lanzhou University, the Chinese Academy of Sciences, and the National University of Singapore examines advanced VLA methods, aiming to provide a clear taxonomy and a systematic, comprehensive review of existing research. The survey analyzes VLA applications across different scenarios and divides VLA methods into several paradigms: autoregressive, diffusion-based, reinforcement-learning-based, hybrid, and specialized; it discusses in detail the design motivations, core strategies, and implementations of these methods. It also introduces the datasets, benchmarks, and simulation platforms needed for VLA research. Based on the current state of VLA research, the survey further identifies key challenges and future directions to advance VLA models and general-purpose robotics. Synthesizing insights from more than 300 recent studies, it outlines the contours of this fast-moving field and highlights the opportunities and challenges that will shape scalable, general-purpose VLA methods. Paper title: Pure Vision Language Action (VLA) M ...