VLA
A Level-Headed Look at VLA: Neither a Savior Nor "Garbage"
自动驾驶之心· 2025-12-26 09:18
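Author | 郑纯然Range  Editor | 自动驾驶之心  Original link: https://zhuanlan.zhihu.com/p/1921620079314961855 It is not entirely a black box. A recent NVIDIA work, CoT-VLA, mainly demonstrates a VLA chain of thought and splits it into three layers: ... It is genuinely similar to the way humans think. The real challenge is teaching the model to generalize. For performance under occlusion, in cluttered backgrounds, and in 3D space, the key is designing the subgoal embedding well so that it generalizes. The subgoal embedding should have: ... For example, with cross-attention, where task text tokens attend to image patch tokens, all four of the properties above can be guaranteed, and the results may well be good. Learning-based methods may even turn out to have the advantage in complex environments. Last night before bed I came across a post criticizing VLA that said "some companies doing VLA are lazy and stupid... (2,000 words omitted here)". The whole post was very sharp, and I've summarized a few of the criticisms raised by 弗雷哥 (the answerer): It certainly can't be dismissed wholesale ...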
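The excerpt suggests cross-attention in which task text tokens attend to image patch tokens. Below is a minimal pure-Python sketch of single-head scaled dot-product cross-attention, assuming the text tokens act as queries and the image patches as keys and values. The function names, the identity projection matrices, and the toy inputs are hypothetical; the excerpt does not say how the attended outputs are turned into a subgoal embedding.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(
    text_tokens: Matrix, image_patches: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix
) -> Tuple[Matrix, Matrix]:
    """Single-head cross-attention: text tokens (queries) attend to image patches (keys/values)."""
    q = matmul(text_tokens, w_q)    # (n_text, d)
    k = matmul(image_patches, w_k)  # (n_patch, d)
    v = matmul(image_patches, w_v)  # (n_patch, d_v)
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))  # (n_text, n_patch)
    attn = [softmax([s / scale for s in row]) for row in scores]
    return matmul(attn, v), attn      # attended features and the attention map


# Toy example (hypothetical values): one 2-D text token, three 2-D image patches.
identity = [[1.0, 0.0], [0.0, 1.0]]
text = [[1.0, 0.0]]
patches = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
out, attn = cross_attention(text, patches, identity, identity, identity)
print(attn[0])  # largest weight on the patch most aligned with the text token
```

Each row of `attn` shows how strongly one text token weights each image patch, which is one place to check whether the language is grounded in the right part of the image.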
Xiaomi's Chen Guang: We Don't Want to Create Technology Anxiety Anymore
21 Shi Ji Jing Ji Bao Dao· 2025-12-25 08:24
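In 2025, the intelligent-driving industry suffered from "terminology overload": VLA, VA, WA and more, with multiple camps splitting off and debating constantly. Li Auto's intelligent-driving team switched entirely from end-to-end + world model to VLA (Vision Language Action), bringing a large language model (LLM) into its algorithm architecture. Like Li Auto, intelligent-driving supplier DeepRoute.ai (元戎启行) has also firmly committed to VLA. The industry also has firm VLA opponents. Huawei has said it will not go the VLA route and will instead firmly choose WA (World Action, i.e. a world model). XPeng, like Huawei, has also tried removing the Language step. Amid this debate, end-to-end still shows enormous potential, and Xiaomi Auto is one company that keeps digging deep in this direction. "Competition is so fierce now that everyone gets a bit anxious and tends to use all kinds of methods or technologies to make users feel they're more advanced," Chen Guang, head of end-to-end at Xiaomi Auto, told 21汽车·一见Auto. "But whether it's VA, WA or VLA, in my view they're all the same: it all comes down to maximizing the model's intelligence density." Among the leading new EV makers, Xiaomi Auto started end-to-end R&D relatively late. In 2024, Xiaomi formally consolidated internal teams into an "End-to-End Algorithm and Function Department" responsible for developing the mass-production solution, while Li Auto and NIO had both started at least three months earlier. But Xiaomi caught up quickly. In February this year, Xiaomi officially pushed its end-to-end HAD, trained on 3 million clips, to all users, and in July again ...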
Interview with Horizon Robotics VP Lü Peng: If You Can't Do End-to-End Well, You Can't Do VLA Well
21 Shi Ji Jing Ji Bao Dao· 2025-12-23 00:45
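In the first three quarters of this year, passenger cars priced above 200,000 yuan took 30% of the domestic market, while those below 130,000 yuan took as much as 50%, yet most models in the latter segment do not yet offer urban assisted driving. This vast blue-ocean market is drawing intelligent-driving vendors such as Horizon Robotics and Momenta to accelerate their plans and race to seize the opportunity. In April this year, Horizon Robotics officially launched HSD (Horizon SuperDrive), an urban assisted-driving solution based on its Journey 6 series chips. Although not a first mover in this segment, Horizon has quickly reached large-scale mass production. In November, the Exeed ET5 (星途ET5) launched with Horizon's HSD entering mass production alongside it, and another model carrying the solution, the Deepal L06 (深蓝L06), went on sale in the same period. Just two weeks after the two models launched, HSD activations exceeded 12,000 vehicles, a notable mass-production result. Beyond launching new solutions, Horizon is also accelerating market penetration through ecosystem expansion. At its technology ecosystem conference in early December, the company announced two ecosystem initiatives: first, broadening its cooperation models with a new algorithm-service model, "HSD Together", under which it has already signed partnerships with Japan's Denso, Volkswagen's joint venture CARIZON (酷睿程), and HCT (智驾大陆); second, bringing in more ecosystem partners, with companies such as DeepRoute.ai and Zhuoyu (卓驭) having joined. Algorithm companies that lack chip R&D capability and automakers with weak software and hardware R&D are flocking to Horizon. Horizon ...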
Horizon Robotics' Lü Peng: End-to-End Is the Cornerstone; If You Can't Do End-to-End Well, You Can't Do VLA Well
21 Shi Ji Jing Ji Bao Dao· 2025-12-22 13:23
21st Century Business Herald reporter Yi Silin. Horizon Robotics' next goal is to bring urban assisted driving down to 100,000-yuan mass-market models, making the technology broadly accessible, and it plans to reach production volumes on the order of ten million units within the next 3-5 years. Horizon's confidence in setting this goal comes from its long-term commitment to, and deep investment in, end-to-end intelligent-driving solutions. According to Horizon engineers, the company has concentrated its efforts on end-to-end technology since the end of 2024, with 90% of its R&D staff devoted to developing this solution and bringing it to mass production. WA/VLA Both Need End-to-End Support ...
How Far Along Should a Graduate Student's Experiments Be Before Writing a Short Paper?
自动驾驶之心· 2025-12-22 03:23
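If you ... you can take a look at the paper mentoring we've launched, designed to help you produce research results efficiently within limited time and avoid the many pitfalls of writing on your own. Paper mentoring is now live! Directions covered: end-to-end, VLA, world models, reinforcement learning, 3D object detection, multi-sensor fusion, 3DGS, BEV perception, Occupancy Network, multi-task learning, semantic segmentation, trajectory prediction, motion planning, diffusion models, Flow matching, point cloud perception, millimeter-wave radar, monocular perception, and lane lines/online HD maps. You're welcome to come with a topic or research direction for a consultation. We only tell the truth and do real work; we won't exaggerate or hype you up, and we'll listen carefully to your situation and then tell you how you can proceed. WeChat: paperguidance. A common problem among graduate students trying to publish is wanting to start with something grand and sophisticated when they haven't even looked at the data or gotten the baseline running. Keep in mind there's only a little over a month left before the Spring Festival; if you don't submit your short paper now, it will really be too late to see it published in the first half of next year. A short paper is about completeness, not novelty: it's enough for the project to tell a complete story, such as making an improvement on an existing method or solving a specific problem. If the story is clear and the experiments are solid, it can still get published. Whether it's the idea or debugging, an outsider often sees more clearly; the worst case is getting stuck and grinding away alone for two weeks with no progress. Results-oriented, with accompanying code-improvement guidance, providing ...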
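"One Brain, Many Forms" Roundtable: What Concrete Progress Have World Models and Spatial Intelligence Made in Embodied Intelligence? | GAIR 2025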
雷峰网· 2025-12-20 04:07
Core Viewpoint - The article discusses the current state and future potential of embodied intelligence, focusing on the challenges and opportunities presented by world models and spatial intelligence in the field of robotics and AI [2][4][10].
Group 1: Development of Embodied Intelligence
- The technology route for embodied intelligence is still in an exploratory phase, with no convergence yet, which is seen as a positive sign for innovation [4][3].
- There is a consensus among experts that the core issues of embodied intelligence, such as interaction and human-machine collaboration, should be addressed by academic institutions, while industries focus on practical applications [4][5].
- The integration of AI with physical entities is expected to lead to significant advancements in intelligence, but the field must avoid reverting to industrial automation without achieving generalized intelligence [4][5][30].
Group 2: World Models in Autonomous Driving
- World models are currently being utilized by leading companies like Tesla to enhance data generation and improve decision-making processes through closed-loop testing [11][12].
- The concept of world models has gained traction in autonomous driving because scenarios are simpler to generate there than in robotics, with advancements in generative AI enabling the creation of realistic training samples [12][13].
- There is ongoing debate regarding the definition and application of world models in both autonomous driving and robotics, with differing opinions on the necessity of pixel-level reconstruction versus latent state representation [12][13][14].
Group 3: Spatial Intelligence in Robotics
- Spatial intelligence is a critical aspect of robotics, with a focus on perception and understanding spatial relationships, which has evolved from traditional SLAM techniques to more learning-based approaches [20][21].
- The current challenges in spatial intelligence include the need for better data representation and understanding of complex spatial relationships, which are still underdeveloped in robotic systems [22][23].
- The integration of visual and semantic information is essential for enhancing robots' spatial capabilities, but the field is still in its early stages [22][23][24].
Group 4: Commercialization and Future Applications
- The future of drone applications is expected to expand significantly, with potential uses in various sectors, but the timeline for widespread adoption remains uncertain [26][27].
- The gap between technological capabilities and market needs poses challenges for entrepreneurs, as there is often a mismatch between innovative ideas and practical industrial requirements [30][31].
- The shift towards learning-based control paradigms is anticipated to increase the applicability of drones and robots in real-world scenarios, moving beyond traditional automation [28][29].
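The roundtable summary contrasts pixel-level reconstruction with latent state representations and mentions closed-loop testing. As a rough illustration of the latent-state idea (not Tesla's or any other company's actual system), the toy below rolls a policy forward inside a hand-written linear "world model" that operates only on a 2-D latent vector; no pixels are ever decoded. `LatentWorldModel`, `closed_loop_rollout`, the matrices, and the feedback gains are all hypothetical, and a real world model would be a learned neural network rather than fixed matrices.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class LatentWorldModel:
    """Hypothetical toy world model: z_next = A z + B a (linear, hand-written, not learned)."""
    a_mat: List[List[float]]
    b_mat: List[List[float]]

    def step(self, z: Vector, action: Vector) -> Vector:
        return [
            sum(self.a_mat[i][j] * z[j] for j in range(len(z)))
            + sum(self.b_mat[i][j] * action[j] for j in range(len(action)))
            for i in range(len(z))
        ]


def closed_loop_rollout(
    model: LatentWorldModel, policy: Callable[[Vector], Vector], z0: Vector, horizon: int
) -> List[Vector]:
    """Roll the policy inside the world model; each action feeds the next predicted state."""
    states = [z0]
    z = z0
    for _ in range(horizon):
        action = policy(z)         # the policy sees only the latent state, never pixels
        z = model.step(z, action)  # the world model predicts the consequence of that action
        states.append(z)
    return states


def steer_policy(z: Vector) -> Vector:
    """Simple linear feedback controller standing in for a learned driving policy."""
    return [-1.0 * z[0] - 2.0 * z[1]]


# Latent = [lateral_offset, lateral_velocity] with a 0.1 s step (illustrative only).
model = LatentWorldModel(a_mat=[[1.0, 0.1], [0.0, 1.0]], b_mat=[[0.0], [0.1]])
states = closed_loop_rollout(model, steer_policy, z0=[1.0, 0.0], horizon=50)
print(f"offset after 50 steps: {states[-1][0]:.3f}")
```

Because each imagined action feeds back into the next imagined state, a policy can be evaluated over many steps without touching a real vehicle, which is the sense of "closed-loop" here. With these gains, the lateral offset decays from 1.0 to roughly 0.03 after 50 steps.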
Recently, many students have asked us about choosing a research direction in autonomous driving...
自动驾驶之心· 2025-12-19 09:25
Core Insights - The article discusses various advanced directions in autonomous driving research, emphasizing the importance of deep learning and traditional methods for different academic backgrounds [2][3].
Group 1: Research Directions
- Key areas of focus include VLA, end-to-end learning, reinforcement learning, 3DGS, and world models, which are recommended for students in computer science and automation [2].
- For mechanical and vehicle engineering students, traditional methods like PnC and 3DGS are suggested due to their lower computational requirements and ease of entry [2].
Group 2: Paper Guidance Services
- The article announces the launch of a paper guidance service that covers various topics such as end-to-end learning, multi-sensor fusion, and trajectory prediction [3][6].
- The service includes support for topic selection, full process guidance, and experimental assistance [6].
Group 3: Publication Success
- The guidance service has a high acceptance rate for papers submitted to top conferences and journals, including CVPR, AAAI, and ICLR [7].
- The article highlights the range of publication venues, including CCF-A, CCF-B, and various SCI categories [10].
Tesla Once Again Calls the Direction of the Tide
自动驾驶之心· 2025-12-18 09:35
Core Viewpoint - Tesla's AI leader Ashok Elluswamy revealed the technical methodology behind Tesla's Full Self-Driving (FSD) in a recent article, emphasizing the choice of an end-to-end neural network model and addressing the challenges faced in practice [4][6].
Group 1: End-to-End Neural Network Model
- Tesla's decision to adopt an end-to-end neural network model is driven by the need to address complex driving scenarios that cannot be pre-defined by rules, such as the "trolley problem" and second-order effects [6][10].
- The end-to-end model is described as a complete overhaul of previous architectures, fundamentally changing design, coding, and validation processes, leading to a more human-like driving experience [11][19].
- The model outputs driving instructions alongside interpretable "intermediate results," utilizing technologies like generative Gaussian splatting to create dynamic 3D models of the environment in real-time [8][17].
Group 2: VLA and World Model Concepts
- VLA (Vision-Language-Action) is an extension of the end-to-end model that incorporates language information, allowing for a more visual representation of driving behavior [12][14].
- The world model aims to establish a high-bandwidth cognitive system based on video/image data, addressing the limitations of language models in understanding complex, dynamic environments [15][19].
- The relationship between end-to-end, VLA, and world models is clarified, with end-to-end serving as the foundation, VLA as an upgrade, and the world model as the ultimate form of understanding spatial dynamics [12][19].
Group 3: Industry Perspectives and Trends
- The industry is divided into three main technical routes: end-to-end, VLA, and world model, with companies like Horizon Robotics and Bosch primarily adopting end-to-end due to lower costs and higher stability [13][19].
- VLA has faced criticism from industry leaders who argue that its reliance on language models may not be essential for effective autonomous driving, emphasizing the need for spatial understanding instead [16][19].
- Tesla's recent publication has reignited discussions in the industry, positioning the company at the forefront of current technological directions and providing a systematic analysis of practical applications [20].
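The summary says Tesla's model emits interpretable intermediate results using generative Gaussian splatting, but it does not describe how those are produced or rendered. The sketch below only illustrates the standard front-to-back alpha-compositing step of 3D Gaussian splatting at a single pixel, C = Σ_i c_i α_i Π_{j<i} (1 − α_j), assuming the Gaussians have already been projected to screen space. `Splat2D`, `composite_pixel`, and the 0.99 / 1/255 / 1e-4 thresholds are illustrative choices modeled on the original 3D Gaussian splatting rasterizer, not Tesla's code.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Splat2D:
    """A Gaussian already projected to screen space (hypothetical toy structure)."""
    mean: Tuple[float, float]          # pixel-space center
    cov: Tuple[float, float, float]    # 2-D covariance as (sxx, sxy, syy)
    opacity: float                     # opacity in [0, 1]
    color: Tuple[float, float, float]  # RGB
    depth: float                       # camera-space depth, used for sorting


def gaussian_falloff(sp: Splat2D, px: float, py: float) -> float:
    """exp(-0.5 * d^T Sigma^-1 d) for the offset d from the splat center to the pixel."""
    sxx, sxy, syy = sp.cov
    det = sxx * syy - sxy * sxy
    ixx, ixy, iyy = syy / det, -sxy / det, sxx / det  # inverse of the 2x2 covariance
    dx, dy = px - sp.mean[0], py - sp.mean[1]
    return math.exp(-0.5 * (ixx * dx * dx + 2.0 * ixy * dx * dy + iyy * dy * dy))


def composite_pixel(
    splats: List[Splat2D], px: float, py: float
) -> Tuple[Tuple[float, ...], float]:
    """Front-to-back alpha compositing: C = sum_i c_i * a_i * prod_{j<i} (1 - a_j)."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for sp in sorted(splats, key=lambda s: s.depth):  # nearest splat first
        alpha = min(0.99, sp.opacity * gaussian_falloff(sp, px, py))
        if alpha < 1.0 / 255.0:
            continue  # negligible contribution at this pixel
        for ch in range(3):
            color[ch] += transmittance * alpha * sp.color[ch]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:
            break  # pixel is effectively opaque; farther splats are hidden
    return tuple(color), transmittance


red_near = Splat2D((0.0, 0.0), (1.0, 0.0, 1.0), 0.5, (1.0, 0.0, 0.0), depth=1.0)
blue_far = Splat2D((0.0, 0.0), (1.0, 0.0, 1.0), 1.0, (0.0, 0.0, 1.0), depth=5.0)
color, transmittance = composite_pixel([blue_far, red_near], 0.0, 0.0)
print(color, transmittance)
```

Passing the far splat first in the list still yields a color of about (0.5, 0.0, 0.495): sorting by depth, not list order, determines which Gaussian occludes which.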
The First Year of L3 Autonomous Driving in Mass Production: One Step Closer to the L4 Dream?
Xin Lang Cai Jing· 2025-12-17 06:30
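Text | 极智GeeTech. Recently, the Ministry of Industry and Information Technology (MIIT) approved L3 autonomous driving for commercial operation for the first time. The two models that passed L3 access review are the Changan Deepal SL03 and the ARCFOX Alpha S6, marking the first time China has allowed the system, rather than the driver, to perform the driving task under specific conditions. It is foreseeable that 2026 will truly become the "first year of mass production" for L3 autonomous driving. Notably, the approval spells out how liability is divided for L3: when the vehicle drives autonomously on designated road sections at no more than 80 km/h, if an accident occurs while the system is active, the automaker may bear primary responsibility. In addition, the access requirements state that the sensing hardware of L3 vehicles must be "factory-installed in mass production"; retrofitted vehicles are not eligible for the pilot, which safeguards technical stability at the source. The industry generally regards L3 as an important transition from "assisted driving" to "fully autonomous driving", and the L4 level that follows will deliver a bigger breakthrough: within fixed areas, vehicles can operate entirely without human intervention, achieving truly driverless operation. Behind this small step lies a decade of global technology competition. Germany passed its Autonomous Driving Act as early as 2021, specifying that the automaker bears liability for accidents while an L3 system is active and requiring vehicles to carry a "black box" that records operating data. Mercedes-Benz's Drive Pilot system then went live on German highways, becoming the world's first commercial L3 product. By comparison, although China's approval comes somewhat later, it goes straight to the core issue of liability: rather than taking the old "testing" route, it directly launches ...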
Recently, many students have asked us about choosing a research direction in embodied intelligence...
具身智能之心· 2025-12-17 00:05
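[具身智能之心 paper mentoring is officially live! 1-on-1 customized mentoring for top-conference directions: multimodal large models / VLA / reinforcement learning / VLN / teleoperation / data collection / robot simulation / real2sim2real / end-to-end / diffusion] Mentoring range: CCF-A to CCF-C. First, let's look at some embodied directions: VLN, VLA, reinforcement learning, and real2sim2real. Many beginners don't know where to start. Reinforcement learning or VLA? Traditional SLAM or VLN? Which directions need substantial compute, and which don't? Beyond that, what kind of robot platform suits your research, and what if your budget is tight? Is simulation good enough? For students currently working on SLAM, both VLN and VLA are good entry points. If you have a robotic arm, pursuing VLA is a good choice. Students without hardware can run experiments in simulation as much as possible, or use low-cost hardware such as the so-100. There are also many low-cost research platforms, such as mobile manipulation platforms. Quadrupeds and humanoids are better suited to reinforcement learning; VLA on them is too difficult. The rest comes down to methodology, and a good idea is crucial. For many new researchers, arriving at a good idea takes a lot of trial and error. If you're still new and don't know how to get started, take a look at our paper mentoring. Paper mentoring is now live. Recently many people have reached out to us, including students from large-model, traditional robotics, and mechanical engineering backgrounds. ✅ Top conferences/journals ...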