VLA
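Are World Models the Ultimate Answer for Autonomous Driving?
36Kr · 2026-02-05 04:30
In a broader sense, a "world model" is essentially a technology that recreates the real world inside a virtual one, so that AI can understand reality the way people do: its physical laws, the causal relationships between things, and the dynamics of the environment. Most scientists and technology companies regard world models as a key piece of the puzzle in the long march toward "physical-world AI." Stanford professor Fei-Fei Li has said that spatial intelligence is AI's next decade and that world models are the key technology for building it. Leading scientists and technology companies are still exploring, but China's auto industry has already staked out the territory with a variety of novel terms. In practice, the "world models" discussed in the intelligent-driving industry today differ only in name; the technical paths are much the same. They amount to a paradigm upgrade of the industry's existing simulation tools: a virtual world with higher fidelity, finer granularity, richer scenarios, and more degrees of freedom in which to test and validate end-to-end models, all in order to train end-to-end driving models that perform better and drive more like a human. Image source: Visual China. By Xiao Man; edited by Li Qin. Over the past two or three years, automakers have rarely discussed intelligent driving without invoking some new technical term. After end-to-end and VLA, "world model" is the most fashionable term in the field. Different companies have also given it new packaging: XPeng has launched a "world foundation model," NIO calls its version an "end-to-end world model," and Huawei calls its version a "World Action model" (WA). Beyond them, Horizon Robotics, Li Auto, DeepRoute.ai, and Momenta are also working on world ...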
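Jian Tan | Horizon Robotics' Lü Peng: End-to-End Is the Foundation; If You Can't Do End-to-End Well, You Can't Do VLA Well
21st Century Business Herald · 2026-02-03 08:04
On the diverging technical routes in intelligent driving today, Lü Peng believes the market need not suffer from terminology anxiety: end-to-end, VLA, WA, and VA do not fundamentally conflict, because all of these architectures are built on end-to-end. "Without an end-to-end foundation, it is hard to bring in new modalities, and so there is no way to further improve product performance. End-to-end is the foundation; if you can't do end-to-end well, you can't do VLA well." (Source: 21st Century Business Herald) By 21st Century Business Herald reporter Yi Silin. A 21st Century Business Herald reporter held an exclusive interview with Horizon Robotics vice president Lü Peng, focusing on the current split in technical routes, the future of end-to-end, and related topics. ...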
51WORLD (6651.HK) and the "Self-Sparring" of Physical AI: The Closed-Loop Co-Evolution of World Models and VLA
Zhong Jin Zai Xian· 2026-01-28 02:39
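Author: Dr. Hou Tao, physical AI algorithm engineer at 51WORLD (6651.HK). NVIDIA founder and CEO Jensen Huang said at the just-concluded 2026 World Economic Forum Annual Meeting in Davos, Switzerland, that AI is making three major breakthroughs: agents that have evolved from chatting to doing real work, open-source models that lower the barrier to entering AI, and physical intelligence that understands the natural world. Of these, physical intelligence shows AI beginning to grasp the laws of natural science, such as protein structures, chemical molecules, and fluid dynamics. Physical AI has genuinely excited the industry and made everyone realize that it is AI's next wave. But physical AI is also genuinely hard: behind the halo of the term, its R&D involves a great deal of grunt work that has to be done piece by piece. Many people have recently been discussing world models and VLA in physical AI, so I will approach the topic from hands-on R&D experience. Accelerating AI's ability to understand, reconstruct, and generate the physical world depends on world models, an AI tool that belongs to the new category of using AI to train AI. On embodied intelligence's road to generality, the industry is also converging on a consensus: data collected from real robots alone is not enough. We are witnessing the rise of a new paradigm of two cooperating models: VLA (vision-language-action) models, or VA (vision-action) models (some argue that no language layer is needed in between for logical reasoning), sparring with world models and spiraling upward together. Put simply: VLA or VA is responsible for perception, ...
From DeepMind to Embodied Intelligence, Wang Jianan: Algorithms Must Ultimately Serve the Real World | Universal Gravitation (万有引力)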
AI科技大本营· 2026-01-23 10:09
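The following article is from CSDN, by Universal Gravitation (万有引力). Interviewer | Tang Xiaoyin. Guest | Wang Jianan. Editor | Meng Yidan. Produced by | CSDN (ID: CSDNnews). Is the end point of the road to AGI code, or a body? For Wang Jianan, the answer points clearly to embodied intelligence. (Photo: Wang Jianan, left; Tang Xiaoyin, right.) At the 2025 Global Machine Learning Technology Conference, Tang Xiaoyin, executive editor-in-chief of CSDN and New Programmer (《新程序员》), held an in-depth conversation with Wang Jianan, vice president of Astribot (星尘智能) and a former DeepMind researcher. From the ultimate vision of AGI to the practical bottlenecks of embodied intelligence, from the engineering logic of fast and slow systems to the timeline for general-purpose robots and the conviction developers should hold, she gave an answer that is both sober and steeped in long-termism. She completed her studies at the University of Oxford and then joined DeepMind, where she worked on reinforcement learning and continual learning and saw landmark projects such as AlphaStar come into being. When generative AI in China was still in its early stages, she also took part in exploring a unified generation framework, working at the research frontier before the AIGC boom. Whether at the peak of "pure algorithms" or at the starting point of generative models, she has stood inside the wave. In 2024 she joined Astribot, choosing to confront ...
A Summary of 2025 Interviews with Several Autonomous Driving Companies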
自动驾驶之心· 2026-01-22 09:07
Core Algorithm:
- The industry has shifted towards end-to-end solutions and away from modular approaches, at least in public discourse [1]
- World models are now widely adopted: some companies use them to generate training data, while others build them into end-to-end models to improve performance [1][8]
- Opinions diverge on whether language models (VLA) are necessary in autonomous driving, with some companies arguing that language is not essential for driving tasks [1][11]
Simulation and Infrastructure:
- Closed-loop systems have evolved from data-driven loops to simulation-testing and training loops [2]
- 3DGS is highlighted as a crucial technology for building simulation environments, as emphasized by Tesla at CVPR 2025 [5]
- Infrastructure is critical, with companies like Xiaomi and Li Auto noting its benefits for development efficiency [3][14]
Organizational Capability:
- Organizational ability is vital, as large autonomous driving teams face significant management challenges [4]
- Team culture and collaboration are emphasized as essential for overcoming complex technical and management issues [5]
Technical Choices Comparison:
- A comparison of the companies' technical choices reveals differing approaches to core technologies and to the role of world models and simulation tools [9]
- Companies like Li Auto advocate a training loop that evolves from imitation to self-learning, while NVIDIA emphasizes interpretability and reasoning in AI [9]
Key Non-Core Factors:
- R&D infrastructure and engineering efficiency are crucial for the success of autonomous driving technologies [14]
- Simulation and synthetic data are becoming essential for addressing corner cases that real-world data cannot cover [14]
- The scale of computing power and chip adaptation is critical, as autonomous driving is not just a software issue but also a hardware challenge [15]
User Experience and Safety:
- User experience and safety are paramount, with companies like Xiaomi stressing the importance of balancing advanced technology with user concerns [17]
- The need for a dual-stack safety mechanism is highlighted, ensuring that even aggressive end-to-end models have a fallback to traditional rule-based systems for safety [19]
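The dual-stack safety mechanism mentioned in this summary can be sketched roughly as follows. This is a minimal illustration, not any company's actual implementation: the `Command` type, the function names, and every threshold are hypothetical assumptions chosen for the example. The idea is simply that the end-to-end model's output is accepted only if it passes a set of rule-based checks; otherwise a conservative rule-based planner takes over.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative limits only (assumptions); a real system would derive these
# from vehicle dynamics, regulation, and validation data.
MAX_ACCEL_MPS2 = 2.0
MAX_BRAKE_MPS2 = -6.0
MAX_STEER_RAD = 0.5
MIN_TIME_GAP_S = 1.5


@dataclass(frozen=True)
class Command:
    """Hypothetical control request produced by either stack."""
    accel_mps2: float
    steer_rad: float
    source: str


def time_gap_s(speed_mps: float, gap_m: float) -> float:
    """Time until the lead obstacle is reached at the current speed."""
    return gap_m / speed_mps if speed_mps > 0 else float("inf")


def rule_based_fallback(speed_mps: float, gap_m: float) -> Command:
    """Conservative planner: brake if following too closely, else hold speed."""
    too_close = time_gap_s(speed_mps, gap_m) < MIN_TIME_GAP_S
    return Command(-3.0 if too_close else 0.0, 0.0, "rule_based")


def is_safe(cmd: Command, speed_mps: float, gap_m: float) -> bool:
    """Rule-based checks; comparisons are written so that NaN fails them."""
    if not (MAX_BRAKE_MPS2 <= cmd.accel_mps2 <= MAX_ACCEL_MPS2):
        return False
    if not (abs(cmd.steer_rad) <= MAX_STEER_RAD):
        return False
    # Never accelerate while the time gap is already below the minimum.
    if time_gap_s(speed_mps, gap_m) < MIN_TIME_GAP_S and cmd.accel_mps2 > 0:
        return False
    return True


def arbitrate(model_cmd: Optional[Command], speed_mps: float, gap_m: float) -> Command:
    """Use the end-to-end command when it passes the checks, else fall back."""
    if model_cmd is not None and is_safe(model_cmd, speed_mps, gap_m):
        return model_cmd
    return rule_based_fallback(speed_mps, gap_m)
```

For example, `arbitrate(Command(1.0, 0.0, "e2e"), speed_mps=20.0, gap_m=10.0)` rejects the model's request to accelerate, because the time gap is only 0.5 s, and returns the rule-based braking command instead. Keeping the checker and the fallback free of learned components is the design choice that makes the second stack easy to audit.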
The Cost of VLA Tasks Has Just Been Driven Down to Rock-Bottom Prices......
具身智能之心· 2026-01-20 09:30
Core Viewpoint: The cost of robotic arms has decreased significantly, with prices now below 5,000 yuan, making them more accessible for various VLA tasks [1][2].
Group 1: Cost Trends
- Two years ago, a single robotic arm for VLA tasks cost over 30,000 yuan; the price dropped to around 15,000 yuan last year and is now below 5,000 yuan [2].
- The lower cost makes it easier to run VLA tasks such as pi0 and pi0.5 [2].
Group 2: Challenges for Beginners
- Many beginners struggle to replicate VLA tasks because of high costs and a lack of effective data collection methods [3][4].
- Beginners waste a significant amount of time troubleshooting and working around obstacles in data collection and model training [4].
Group 3: Educational Initiatives
- The company has developed a comprehensive course aimed at the challenges beginners face in the VLA field, covering hardware, data collection, algorithms, and practical experiments [9][14].
- The course includes a free SO-100 robotic arm for participants, enhancing hands-on learning [19].
Group 4: Target Audience and Requirements
- The course is designed for people seeking practical experience in VLA, including students and professionals transitioning from other fields [26].
- Participants are expected to have a foundational knowledge of Python and PyTorch, as well as experience in debugging and data collection with real machines [26].
2026: China's Intelligent Driving Enters the Finals
36Kr · 2026-01-15 03:46
Core Insights:
- Tesla's Full Self-Driving (FSD) technology has demonstrated its capability by completing a 4,397 km journey across the U.S. without human intervention, showcasing its stability in complex driving conditions [1]
- Competition in the autonomous driving sector is intensifying, particularly in China, where several companies are facing significant challenges, leading to a consolidation of players [1]
- The industry consensus is that by 2026, only two to three companies will emerge as leaders in the autonomous driving space [1]
Group 1: Tesla's Technological Advancements
- Tesla's FSD V12 and V14 represent critical turning points, with V12 proving the feasibility of a model-driven end-to-end approach and prompting the industry to shift towards it [2]
- FSD V14 addresses the limitations of previous versions by integrating a reasoning capability, leading to a potential unification of L2 and L4 development paradigms [2]
Group 2: Competitive Landscape in China
- Companies like Horizon Robotics, Zhaojun Technology, and WeRide are emerging as strong competitors, with Horizon completing a significant technology architecture switch and launching its HSD model [3][4]
- WeRide has shifted focus from L4 Robotaxi to L2+ solutions, achieving rapid development and production timelines [3]
- Zhaojun Technology has adopted an aggressive strategy, completely overhauling its previous technology framework to focus on end-to-end solutions [4]
Group 3: Industry Trends and Challenges
- The industry is shifting from rule-based to model-driven approaches, with VLA (Vision-Language-Action) models gaining traction among manufacturers like XPeng and Li Auto [5][6]
- Huawei is taking a different approach, rejecting VLA in favor of WA (World Action) models and emphasizing the need for a more streamlined process [6]
- Competition is expected to intensify as companies strive to secure sufficient data and funding to support their autonomous driving technologies
[10][11]
Group 4: Future Outlook
- The autonomous driving sector is entering a phase of stricter regulation and increased competition, with L2+ and urban navigation assistance (NOA) as immediate priorities for many companies [12]
- By 2026, the market is anticipated to narrow down to a few key players, with Huawei currently leading the pack, followed by Horizon, Momenta, and WeRide [12][13]
The Problem of VLA Learning Being "Too Expensive" Is Being Solved......
具身智能之心· 2026-01-14 09:00
Core Viewpoint: The article discusses the challenges beginners face with VLA (Vision-Language-Action) tasks because of high costs and the complexity of data collection and model training, and introduces a comprehensive course that addresses these issues and teaches practical skills to aspiring professionals in the field [3][5][9].
Group 1: Challenges in VLA Tasks
- Many beginners are frustrated by the high cost of mechanical arms and sensors, which can exceed 15,000 yuan, making it difficult for self-learners or those without equipment to work on VLA tasks [3].
- Open-source, low-cost robotic arms are available, but many beginners struggle to get good results because of difficulties in data collection and model training [4].
- Beginners waste a significant amount of time troubleshooting and working around obstacles in data collection, model training, and deployment, particularly with complex models like π0 and π0.5 [5].
Group 2: Course Offerings
- The "Embodied Intelligence Heart" platform has developed a course that replicates methods such as ACT, GR00T, π0, and π0.5, aimed at people who lack access to expensive equipment and do not know how to get started [8].
- The course includes practical tutorials and is designed to help students learn VLA techniques effectively, including those who have access to real machines but are unsure how to use them [9].
- The curriculum covers a wide range of topics, including robotic-arm hardware, data collection, VLA algorithms, evaluation, simulation, deployment of mainstream VLA models, and various real-machine experiments [14].
Group 3: Course Details and Target Audience
- The course is the most comprehensive offering from "Embodied Intelligence Heart," combining software and hardware to support effective learning [15].
- It targets people seeking practical experience and projects in the VLA field, including those transitioning from traditional computer vision, robotics, or autonomous driving [25].
- Participants will receive an SO-100 robotic arm as part of the course, including both the teaching arm and the execution arm, enhancing hands-on learning [18].
NVIDIA Still Can't Let Go of Autonomous Driving
虎嗅APP· 2026-01-13 13:35
Core Viewpoint: The article discusses NVIDIA's recent announcements at CES, particularly the launch of the open-source VLA model Alpamayo, which is aimed at revolutionizing autonomous driving technology, and its implications for the automotive industry [5][8].
Group 1: NVIDIA's Innovations
- NVIDIA introduced the Alpamayo model, which applies Vision-Language-Action (VLA) technology to autonomous driving, allowing vehicles to interpret sensor data as language and symbols for decision-making [6][10].
- Alpamayo is the first open-source VLA model, providing a foundational framework for automakers to develop their own autonomous driving solutions and thus lowering development cost and complexity [12][14].
- The model is complemented by the AlpaSim simulation framework and a dataset containing over 1,727 hours of driving data, offering a comprehensive toolkit for automotive companies [12][14].
Group 2: Competitive Landscape
- The VLA approach has attracted interest from various automakers, including XPeng, Li Auto, and others, who are also pursuing similar technologies [10][11].
- Tesla's Full Self-Driving (FSD) system appears to use a similar VLA architecture, indicating a competitive race in the autonomous driving sector [10][11].
- Despite Tesla's advancements, NVIDIA's Alpamayo aims to provide a more explainable and controllable decision-making process than traditional end-to-end models [11][12].
Group 3: NVIDIA's Business Strategy
- NVIDIA's automotive business, while dominant in high-level autonomous driving, has not met revenue expectations compared to its data center operations, prompting a strategic shift [17][22].
- The company aims to provide standardized tools and frameworks to automakers, allowing them to leverage NVIDIA's technology without needing extensive in-house development capabilities [22][26].
- By offering Alpamayo and associated tools, NVIDIA seeks to maintain its market position while addressing the needs of traditional automakers who may lack advanced algorithm development capabilities [23][26].
NVIDIA Still Can't Let Go of Autonomous Driving
远川研究所· 2026-01-12 13:12
Core Viewpoint: NVIDIA is launching a comprehensive offensive in the autonomous driving sector with its open-source VLA model, Alpamayo, which aims to give car manufacturers a robust foundation for developing their own autonomous driving technologies [6][10][21].
Group 1: NVIDIA's Innovations
- At CES 2026, NVIDIA announced the Alpamayo model, which uses a Vision-Language-Action (VLA) approach to improve decision-making in autonomous driving by making the reasoning process interpretable and traceable [7][10].
- Alpamayo is the first open-source VLA model, allowing car manufacturers to customize it based on their own data and needs, which reduces development complexity while preserving algorithmic differentiation [10][11].
- Alongside Alpamayo, NVIDIA also introduced AlpaSim for closed-loop testing and the Physical AI dataset, which contains over 1,727 hours of driving data, providing a comprehensive toolkit for developers [11][13].
Group 2: Competitive Landscape
- Other companies, such as XPeng and Li Auto, are also developing VLA models, indicating a competitive shift towards this technology in the autonomous driving space [8][10].
- Tesla's FSD appears to be adopting a similar VLA-like architecture, although it remains less transparent than NVIDIA's approach [10][14].
Group 3: NVIDIA's Business Strategy
- NVIDIA's automotive business, while dominant in high-level driving assistance, has not met revenue expectations compared to its data center operations, prompting a strategic shift towards more comprehensive support for car manufacturers [15][20].
- The company aims to create a closed-loop toolchain for intelligent driving that integrates cloud training and vehicle-side inference, making it easier for automakers to adopt its hardware and software [21][22].
- NVIDIA's strategy reflects a balance between standardization and customization, as it seeks to provide a rich software toolbox while avoiding direct involvement in specific autonomous driving projects [22][24].