自动驾驶之心

SimpleVLA-RL: Breaking Through the VLA Model Training Bottleneck, with RL Enabling End-to-End Online Training
自动驾驶之心· 2025-09-15 03:56
Core Insights
- The article discusses the development of the SimpleVLA-RL framework, which enhances the training of Visual-Language-Action (VLA) models in robotics through reinforcement learning (RL) techniques, addressing key challenges in data scarcity and generalization capabilities [3][4][6]

Group 1: Research Background and Core Issues
- VLA models are crucial for robotic manipulation, integrating visual perception, language understanding, and action generation, but current training methods face two main bottlenecks: data scarcity and weak generalization [4][6]
- The traditional training process relies heavily on large-scale human operation data, which is costly and difficult to scale, limiting model scalability [4][6]
- The article raises the question of whether RL can enhance the long-horizon action-planning capabilities of VLA models, despite the unique challenges posed by VLA applications [4][6]

Group 2: SimpleVLA-RL Framework Contributions
- SimpleVLA-RL is designed to improve VLA training efficiency, particularly in data-scarce settings, and has achieved state-of-the-art (SOTA) performance on benchmarks such as LIBERO and RoboTwin [7][8]
- The framework incorporates interactive trajectory sampling, parallel training across multiple environments, and a unified design for training, inference, and rendering, addressing the slow interaction and high cost of VLA rollouts [7][8]
- It delivers significant gains in success rates across tasks, raising LIBERO's average success rate from 91.0% to 99.1% and RoboTwin 2.0's from 38.3% to 68.8% [7][8][14]

Group 3: Data Efficiency and Generalization
- SimpleVLA-RL sharply reduces dependence on large-scale demonstration data, reaching a 96.9% average success rate with only a single demonstration trajectory and surpassing full-trajectory supervised fine-tuning [19][20]
- The framework improves the model's robustness across scenes, objects, and tasks, outperforming traditional methods on unseen tasks [21][24]

Group 4: Real-World Deployment and Innovations
- The framework shows effective Sim-to-Real transfer, improving real-world task success rates from 17.5% to 38.5% while training on simulated data alone [24][27]
- A notable discovery is the "Pushcut" phenomenon, in which the RL-trained model autonomously finds strategies more efficient than the human demonstrations, pointing to a capacity for innovative behavior in VLA models [25][30]

Group 5: Summary and Conclusions
- SimpleVLA-RL addresses three core issues in VLA model training: reducing reliance on large-scale demonstration data, enhancing generalization capabilities, and achieving efficient Sim-to-Real transfer [31][32]
- The findings suggest that RL can enable VLA models to discover superior strategies, paving the way for future autonomous and adaptive robotic systems [31][32]
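As a rough intuition for the outcome-level RL recipe summarized above (sparse binary success rewards compared against a group baseline), here is a toy REINFORCE sketch in Python. Everything in it is invented for illustration: a one-step Bernoulli "task" stands in for a robot rollout, and a single success probability stands in for the VLA policy's parameters; it is not code from SimpleVLA-RL.

```python
import random

random.seed(0)

def run_episode(p):
    """One-step toy episode: the 'policy' succeeds with probability p.

    Returns the sampled action (1 = success) and a sparse binary reward,
    mimicking an outcome-only task-success signal."""
    a = 1 if random.random() < p else 0
    return a, float(a)

def train(iters=400, batch=16, lr=0.05):
    """REINFORCE with a group-mean baseline over a batch of rollouts."""
    p = 0.2                                   # stand-in for policy parameters
    for _ in range(iters):
        samples = [run_episode(p) for _ in range(batch)]
        baseline = sum(r for _, r in samples) / batch   # batch-average reward
        grad = 0.0
        for a, r in samples:
            # gradient of log Bernoulli(a; p) w.r.t. p
            glogp = 1.0 / p if a == 1 else -1.0 / (1.0 - p)
            grad += (r - baseline) * glogp
        p += lr * grad / batch
        p = min(max(p, 0.01), 0.99)           # keep the probability valid
    return p

p_final = train()
```

The group-mean baseline mirrors the idea of scoring each rollout against its own batch, which reduces gradient variance without training a separate critic.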
Hard-Won Advice: Start Preparing a Short Paper as Soon as You Join the Lab in Your First Year of Grad School!
自动驾驶之心· 2025-09-14 23:33
Finding your rhythm at the start of research is the hardest part, and it matters most: as the saying goes, one step ahead keeps you ahead, one step behind keeps you behind. The gap really opens up in these first weeks of the semester. Whoever adapts to the research rhythm and produces results fastest claims the lab's best resources and earns more investment from the advisor; with those resources they move even faster, win still more resources, and produce more results, so the snowball keeps growing. Wait until your second or third year to think about publishing a short paper, and the advisor's attention and the lab's resources will long since have been divided up!

Beyond that, publishing a short paper early brings further benefits:
- easier to win scholarships;
- lays groundwork for your thesis;
- builds research skills;
- gives you an edge in job hunting and further study.

A suggested 12-week schedule:
- Week 1: settle on a research direction and shortlist 3 candidate topics.
- Weeks 2-3: finish the literature review and build the paper's framework.
- Weeks 4-6: design experiments and collect data.
- Weeks 7-8: complete the first draft.
- Weeks 9-10: revise and polish the draft.
- Weeks 11-12: select a journal, submit, and wait for the accept!

Whether you have just joined a lab and want early results but have no idea where to start, or you are close to job hunting or graduation and still cannot produce a paper you are satisfied with, consider seeking professional help: the paper-coaching service from 自动驾驶之心 has officially launched. One second-year student we coached published 3 SCI papers in a year and landed both a PhD admission and a national scholarship! Why choose us? We have 300+ full-time mentors in autonomous driving / embodied intelligence ...
The Autonomous Driving World-Model Technical Discussion Group Is Now Open
自动驾驶之心· 2025-09-14 23:33
The 自动驾驶之心 world-model technical discussion group has been established; everyone is welcome to join and discuss world-model-related topics. Interested readers can add the assistant on WeChat to join: AIDriver005, with the note "nickname + world model group". ...
The Embodied-Brain Leaderboard: A Look at the Key Figures Behind Embodied Brains at Home and Abroad...
自动驾驶之心· 2025-09-14 23:33
Editor丨具身智能之心

Embodied intelligence has become a new global focal point. Building a general-purpose body and brain is what startups keep striving to crack, and it draws intense attention from capital and industry. Today we present a comprehensive survey of well-known companies in the embodied-brain space at home and abroad, analyzing their technical characteristics, product portfolios, and application scenarios, to give companies a panorama of the industry in support of strategic decisions and business development. Focus: companies developing robot "brain" systems, including embodied large models and multimodal perception-and-decision systems.

(1) Domestic companies

自变量机器人 (CEO 王潜)
星海图 (CEO 高继扬)

Company profile: founded in 2023, focused on developing a "general embodied large model", building general-purpose robots with fine manipulation skills primarily from real-world data. Its technical route leans toward the "brain", committed from the start to an end-to-end general embodied large model. In under two years it has closed 8 funding rounds.

Representative work: the WALL-A model, released in October 2024, the world's ...
End-to-End Evolves Again: Building a Thinking Autonomous-Driving Policy with Diffusion Models and MoE (Tongji University)
自动驾驶之心· 2025-09-14 23:33
Recently, large models have begun to make their mark in autonomous driving: vision-language models (VLM) and vision-language-action models (VLA) already perform well at scene understanding, semantic association, and generalization. In real continuous-control settings, however, they still face limits such as slow inference, poorly coherent actions, and difficulty guaranteeing safety.

Meanwhile, diffusion models are reshaping generative modeling in vision, audio, and control. Unlike traditional regression or classification methods, a Diffusion Policy (DP) treats action generation as a step-by-step denoising process, which both expresses multiple plausible driving options and preserves trajectory temporal consistency and training stability. Yet these methods have not been systematically studied in autonomous driving. By directly modeling the output action space, diffusion policies offer a more powerful and flexible way to generate smooth, reliable driving trajectories, well suited to the diversity and long-horizon stability demands of driving decisions.

On another front, Mixture of Experts (MoE) has become an important architecture for large models. By activating only a few experts on demand, it gives models stronger scalability and modularity while keeping computation efficient. MoE has also been tried in autonomous driving, for example in multi-task policies and modular prediction, but most designs remain task-specific, limiting ...
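The on-demand expert activation described above can be made concrete with a minimal top-k routing sketch. This is hypothetical toy code, not the paper's architecture: a gating score ranks the experts, only the top-k are evaluated, and their outputs are blended with softmax-normalized weights, which is what keeps MoE compute sparse.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_scores, k=2):
    """Route input x to the top-k experts by gate score.

    Only the selected experts are evaluated; the rest are skipped,
    so compute stays sparse while capacity scales with expert count."""
    topk = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:k]
    weights = softmax([gate_scores[i] for i in topk])
    return sum(w * experts[i](x) for w, i in zip(weights, topk))

# toy experts: each "specializes" in a different linear transform
experts = [lambda x, a=a: a * x for a in (1.0, 2.0, 3.0, 4.0)]
y = moe_forward(5.0, experts, gate_scores=[0.1, 0.3, 2.0, 1.0], k=2)
```

In a real policy network the gate scores would come from a learned router conditioned on the input, and the experts would be full sub-networks rather than scalar transforms.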
Hiring Several Experts to Co-Build the Platform (World Models / Model Deployment)
自动驾驶之心· 2025-09-14 03:44
Group 1
- The article announces the recruitment of 10 partners for the autonomous driving sector, focusing on course development, paper guidance, and hardware research [2][5]
- The main areas of expertise sought include large models, multimodal models, diffusion models, SLAM, 3D object detection, and closed-loop simulation [3]
- Candidates from QS200 universities with a master's degree or higher, especially those with significant conference experience, are preferred [4]

Group 2
- The company offers benefits such as resource sharing for job seeking, PhD recommendations, and study-abroad opportunities [5]
- Attractive cash incentives and opportunities for entrepreneurial project collaboration are highlighted [5]
- Interested parties are encouraged to contact via WeChat for collaboration inquiries [6]
Diffusion Models: A One-Stop Overview!
自动驾驶之心· 2025-09-13 16:04
Author | 论文推土机  Editor | 自动驾驶之心  Original link: https://zhuanlan.zhihu.com/p/1948137034842611877

A survey of diffusion-model mathematics. This article organizes the math behind diffusion using only a handful of formulas from differential equations, stochastic differential equations, and probability, so a basic math background is enough to follow all of it. If some part still doesn't land, that's fine: just focus on the conclusions in the highlighted blocks. A fast way to read is to look only at the highlighted blocks and accept their conclusions; an even faster way is to read only Chapter 1 on Langevin sampling to build an intuition for diffusion. In summary: relativity says time flies when you're with a beautiful companion and drags when you're working overtime; diffusion says use a network to learn how to solve ordinary/stochastic differential equations.

The article has five parts. First, we organize the basic concepts related to diffusion models; this part is all mathematical definitions, drawn mainly from the following links: [An Introduction to Flow Matching and Diffusion ...
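As a taste of the Chapter-1 intuition mentioned above, here is an illustrative Langevin-sampling sketch (not code from the article). For a standard normal the score function ∇log p(x) = -x is known in closed form, so the unadjusted Langevin update x ← x + ε·score + sqrt(2ε)·z can run without any learned network; a diffusion model replaces this closed-form score with a neural estimate.

```python
import math
import random

random.seed(0)

def langevin_sample(steps=2000, eps=0.01):
    """Draw one approximate sample from N(0, 1) via unadjusted Langevin dynamics."""
    x = 5.0                       # deliberately start far from the mode
    for _ in range(steps):
        score = -x                # closed-form score of the standard normal
        x = x + eps * score + math.sqrt(2 * eps) * random.gauss(0.0, 1.0)
    return x

# the empirical distribution of samples should approach N(0, 1)
samples = [langevin_sample() for _ in range(500)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

Despite starting at x = 5, the noisy gradient steps pull the chain toward the target's mode, so the sample mean lands near 0 and the variance near 1; the small discretization bias of the step size ε is the price of skipping a Metropolis correction.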
An Ultra-Cost-Effective 3D Scanner! Point-Cloud/Visual Full-Scene Reconstruction with Centimeter-Level Accuracy
自动驾驶之心· 2025-09-13 16:04
The best-value 3D laser scanner: an ultra-cost-effective 3D scanner for industrial and research/teaching scenarios has arrived! GeoScan S1 is currently the best-value real-scene 3D laser scanner in China: a lightweight design and one-button start give you an efficient, practical 3D solution. Built around multimodal sensor-fusion algorithms, it reconstructs 3D scenes in real time at centimeter-level accuracy and can be used across a wide range of field applications.

It maps 200,000 points per second, measures out to 70 m with 360° coverage, and supports large scenes above 200,000 m²; an optional 3D Gaussian data-capture module enables high-fidelity scene reproduction. It supports cross-platform integration and provides a high-bandwidth Ethernet port plus dual USB 3.0 interfaces for flexible expansion in research experiments, lowering the development threshold and helping developers quickly build R&D capability.

The GeoScan S1 ships with a handheld Ubuntu system and multiple sensors; the handle integrates the power supply, with output over a D-TAP to XT30 female connector into the GeoScan S1 body, powering the lidar, cameras, and main control board.

A look at the base model's reconstruction results!
- Low barrier to use: simple, intuitive operation; one-button start begins a scan
- Export-and-use results: no complex deployment or tedious processing; scan results are ready on export
- Efficient, accurate mapping: high model accuracy; scan large scenes easily as you walk
- Best price in the industry: high cost-effectiveness with tightly integrated multi-sensor hardware
...
A Certain EV Startup's Intelligent-Driving Organization Is About to Undergo a Major Restructuring...
自动驾驶之心· 2025-09-13 16:04
Group 1
- The core viewpoint of the article highlights a significant organizational restructuring within a leading domestic intelligent-driving company, driven by recent leadership departures and the need for a more efficient structure to meet rising staff demands for promotion [2]
- The company's standing in the industry surged after the success of its intelligent-driving solutions last year, making it a benchmark that many competitors are now following [2]
- The restructuring will expand the number of secondary departments from four to ten, creating a flatter organizational structure that provides more advancement opportunities for employees [2]

Group 2
- There is a notable division in the industry over the technical routes for the next generation of mass-production solutions, particularly between the VLA and WA factions [4]
- The article emphasizes that while VLA (Vision-Language-Action) appears to be a shortcut, it is not the ultimate solution for autonomous driving, with the World-Action model presented as the true end goal [4]
- The organizational adjustments aim to better position the company to optimize VLA for mass production, increase new-vehicle sales, and adapt to external environmental changes [4]
Whether VLA or WM World Models, Both Need a World Engine
自动驾驶之心· 2025-09-13 16:04
Core Viewpoint
- The article discusses the current state and future prospects of end-to-end autonomous driving, emphasizing the concept of a "World Engine" to address challenges in the field [2][21]

Definition of End-to-End Autonomous Driving
- End-to-end autonomous driving is defined as "learning a single model that directly maps raw sensor inputs to driving scenarios and outputs control commands," replacing traditional modular pipelines with a unified function [3][6]

Development Roadmap of End-to-End Autonomous Driving
- The evolution of end-to-end autonomous driving has progressed over 20 years, from simple black-and-white image inputs to more complex methods, including conditional imitation learning and modular approaches [8][10]

Current State of End-to-End Autonomous Driving
- The industry is currently in the "1.5 generation" phase, focusing on foundational models and addressing long-tail problems, with two main branches: the World Model (WM) and Vision-Language-Action (VLA) [10][11]

Challenges in Real-World Deployment
- Collecting data for all scenarios, especially extreme cases, remains a significant challenge for achieving Level 4 (L4) or Level 5 (L5) autonomous driving [17][18]

Concept of the "World Engine"
- The "World Engine" concept aims to learn from human expert driving and generate extreme scenarios for training, which can significantly reduce the costs associated with large fleets [21][24]

Data and Algorithm Engines
- The "World Engine" consists of a Data Engine for generating extreme scenarios and an Algorithm Engine, still under development, to improve and train end-to-end algorithms [24][25]