Mid-tier intelligent-driving players are racing to snap up end-to-end talent...
自动驾驶之心· 2025-12-15 00:04
Core Viewpoint - The article discusses technological anxiety in intelligent driving, particularly among mid-tier manufacturers, and highlights the anticipated growth in demand for end-to-end (E2E) and VLA (Vision-Language-Action) technologies in the coming year [2]
Group 1: Industry Trends - Mass production of cutting-edge technologies such as end-to-end systems is expected to begin next year, with L2 features becoming more standardized and moving into lower-tier markets [2]
- Total sales of passenger vehicles priced above 200,000 yuan run to roughly 7 million units, yet the leading new-force automakers account for less than one-third of that, indicating slow adoption of end-to-end mass-production models [2]
- The maturing of end-to-end technology is seen as a precursor to larger-scale production, and the advance of L3 regulations is making technology upgrades urgent for mid-tier manufacturers [2]
Group 2: Recruitment and Training - Demand is growing for positions related to end-to-end and VLA technologies, and many professionals are looking to learn these advanced skills quickly [3]
- The article announces specialized courses aimed at the practical application of end-to-end and VLA technologies, designed for people already working in the field [3][6]
- The courses cover modules including the use of navigation information, reinforcement-learning optimization, and production experience with diffusion and autoregressive models [3][6]
Group 3: Course Details - The end-to-end production course focuses on practical implementation, detailing key modules and offering seven hands-on exercises suited to those looking to advance their careers [3][6]
- The VLA course covers foundational algorithms and theory, including BEV perception and large language models, with practical applications based on diffusion models and VLA algorithms [6][11]
- The instructors are experienced practitioners from top-tier companies and academic institutions, ensuring a high level of expertise in the training provided [5][8][13]
Jin Xin of the Eastern Institute of Technology: How to Find a Unified "Spatial Language" for Autonomous Driving and Robotics | GAIR 2025
雷峰网· 2025-12-14 06:27
" 当AI拥有「思维链」,赋予机器想象力的世界模型训练新范式。 " 作者丨吴彤 编辑丨 林觉民 在人工智能研究正以前所未有的速度迭代的今天,一位研究者如果同时聚焦于世界模型与具身智能这类高度前沿的课题,并且强调产业应用和市场接受度才是 技术真正的试金石,这可能本身就成为了一种值得关注的信号。 宁波东方理工大学助理教授金鑫便是这样一位研究者。 我们近期的一次交流,恰逢他的团队在美国圣地亚哥NeurIPS会议的活动告一段落——他与上海交通大学、布里斯托大学、清华大学等高校的合作者们在那组 织了一场关于"具身世界模型"( Embodied World Models for Decision Making)的研讨会,并有多位学界和产业界大咖受邀参加并作报告。 从早期的图像视频信号处理、压缩等底层视觉任务,到近年聚焦于表征解耦、世界模型、空间智能等方向,金鑫的研究不断从低维信息向高维信息跃迁,不断 尝试新的挑战,试图让机器变得更加智能,更好地理解物理世界并服务实际产业,其研究路径也反映出AI领域逐渐从简单的感知走向更加复杂的认知与决策。 然而,当对话触及这些光环之下的研究内核时,他表现出一种审慎。 "这只是我们团队现阶 ...
GAIR 2025 "Data & One Brain, Many Forms" Sub-forum: A Heated Debate over AI's Evolutionary Path
雷峰网· 2025-12-14 06:27
Core Insights - The article emphasizes the transition of AI from "specialized" to "generalized" language understanding over the past decade, with the next key battle being the expansion of this generality from the realm of language to the physical world [1] Group 1: Data Paradigm Shift - Data is evolving from a traditional "resource" role to a more fundamental "cognitive foundation" and "value carrier" [3] - High-quality, structured, and logically coherent data is becoming essential for defining the cognitive boundaries and aligning the value of models [3][4] - The forum discussed building a more interpretable, credible, and evolutionary knowledge system amidst the data deluge, highlighting data as a core link driving intelligent evolution and harmonious coexistence with society [4] Group 2: One Brain, Many Forms - The "One Brain, Many Forms" paradigm is redefining how intelligence is constructed, moving beyond single models for specific tasks to a unified cognitive core that can dynamically generate various forms for different scenarios [5] - This approach aims to achieve a leap from "specialized intelligence" to "unified intelligence," allowing the same "brain" to understand language, interpret visuals, and manipulate entities while sharing knowledge across different forms [5] Group 3: Embodied Intelligence and Data Collection - The founder of Noitom Robotics, Dr. Dai Ruoli, highlighted the high demand for quality data in the field of humanoid robots and embodied intelligence, emphasizing the relationship between data scale, quality, and model capability [10] - Dr. Dai identified three structural challenges in remote operation as a data acquisition method, pushing the industry to explore more universal and scalable data acquisition paradigms [11][12] - The concept of a "data pyramid" was introduced, stressing the importance of understanding the core value of data at different levels to create sustainable engineering and business paths [12] Group 4: Future of Embodied Data - The CEO of Jishudai Iteration, Tong Xianqiao, predicted an explosive growth in embodied data volume in the coming years, positioning "embodied data services" as a significant opportunity in the robotics sector [15] - Current data collection methods were categorized into two paths: real machine end and simulation end, focusing on various techniques for data acquisition [16] - A platform design approach was proposed to enhance data collection efficiency and optimize deployment, introducing the concept of AI agents for automatic annotation and resource management [17] Group 5: One Brain, Many Forms Discussions - The forum on "One Brain, Many Forms" featured discussions on the development of embodied intelligence and the integration of world models, with participants emphasizing the ongoing exploration phase in the industry [45][46] - The challenges of achieving a universal controller were discussed, with insights on the differences in performance based on hardware capabilities and algorithmic approaches [47] - The panel concluded with reflections on the future of embodied intelligence, highlighting the gap between innovative ideas and practical applications in the industry [48]
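To ground the "One Brain, Many Forms" idea summarized above, here is a minimal, hypothetical PyTorch sketch of a single shared cognitive core feeding lightweight per-embodiment action heads; the module names, dimensions, and embodiments are assumptions made up for illustration and do not describe any system presented at the forum.

```python
# Minimal "one brain, many forms" sketch: a single shared backbone ("brain")
# with small per-embodiment action heads ("forms"). Dimensions and embodiment
# names are illustrative assumptions, not any vendor's real architecture.
import torch
import torch.nn as nn

class SharedBrain(nn.Module):
    """Fuses language and vision features into one latent that all heads share."""
    def __init__(self, text_dim=768, vision_dim=512, latent_dim=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, latent_dim)
        self.vision_proj = nn.Linear(vision_dim, latent_dim)
        self.fusion = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=latent_dim, nhead=4, batch_first=True),
            num_layers=2,
        )

    def forward(self, text_feat, vision_feat):
        tokens = torch.stack(
            [self.text_proj(text_feat), self.vision_proj(vision_feat)], dim=1
        )  # (B, 2, latent_dim)
        return self.fusion(tokens).mean(dim=1)  # pooled shared latent, (B, latent_dim)

class EmbodimentHead(nn.Module):
    """A small head mapping the shared latent to one body's action space."""
    def __init__(self, latent_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, action_dim)
        )

    def forward(self, latent):
        return self.net(latent)

brain = SharedBrain()
heads = nn.ModuleDict({
    "wheeled_base": EmbodimentHead(256, action_dim=2),   # linear + angular velocity
    "humanoid_arm": EmbodimentHead(256, action_dim=7),   # 7-DoF joint targets
    "gripper": EmbodimentHead(256, action_dim=1),        # open/close
})

# One shared latent drives every form; only the heads differ per embodiment.
text_feat, vision_feat = torch.randn(1, 768), torch.randn(1, 512)
latent = brain(text_feat, vision_feat)
actions = {name: head(latent) for name, head in heads.items()}
print({k: v.shape for k, v in actions.items()})
```

The design point is simply that knowledge lives in the shared backbone, while each "form" only adds a small head for its own action space.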
The "World Model" Race Heats Up: Runway Launches GWM-1, with Real-Time Interaction Sustained for Minutes at a Time
硬AI· 2025-12-13 12:45
Core Insights - Runway aims to evolve from being a "special effects supplier" in the film industry to becoming an "AI architect" in the physical world [2][20] - The company has launched its first General World Model (GWM-1), entering the "world simulation" arena dominated by giants like Google and Nvidia [2][20] - GWM-1 is designed to understand physical laws, geometric structures, and environmental dynamics, focusing on "coherence" and "interactivity" [2][5] GWM-1 Breakdown - The world model allows AI to simulate the mechanisms of the real world without traversing all possible scenarios, enabling reasoning, planning, and action [5] - GWM-1 consists of three autoregressive models tailored for different domains: GWM-Worlds, GWM-Robotics, and GWM-Avatars, all built on the latest Gen-4.5 base model [5][6] GWM-Worlds - GWM-Worlds provides an interactive digital environment exploration interface, allowing users to intervene in real-time and predict subsequent events [8] - The model generates environments at 24fps and 720p resolution, maintaining coherence in long sequences of motion [8] GWM-Robotics - GWM-Robotics addresses the challenge of acquiring real data for extreme weather and unexpected obstacles by generating high-quality synthetic data [10][11] - This approach significantly reduces training costs and helps predict compliance risks before deploying robots in the real world [11] GWM-Avatars - GWM-Avatars integrates video generation with voice, enabling digital avatars to engage in long, continuous conversations without quality loss [14][15] - If successful, this technology could disrupt customer service and online education sectors [15] Base Model Evolution and Computational Power - Runway has upgraded its Gen-4.5 model to enhance native audio and multi-camera editing capabilities, supporting video generation of up to one minute [18] - The company has partnered with CoreWeave to utilize Nvidia's cloud infrastructure for model training and inference, addressing the computational demands of world simulation [18] Strategic Expansion - Runway's strategy is rapidly expanding from creative tools in film to robotics simulation, but it faces stiff competition from established players like Google and Nvidia [19][20] - The ability to leverage GWM-1 to prove its capabilities beyond a special effects supplier will be crucial for the company's valuation growth [20]
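As a rough illustration of the interactive, autoregressive generation GWM-Worlds is described as doing (predicting the next frame from the frames so far plus a user input, at a fixed frame rate), here is a hypothetical Python sketch; the WorldModel stub, the 720p frame shape, and the action encoding are assumptions for illustration and do not reflect Runway's actual model or API.

```python
# Illustrative autoregressive world-model rollout: at each step the model
# predicts the next frame from recent frames plus the user's action.
# The WorldModel stub, 720p frame shape, and action encoding are assumptions,
# not Runway's real GWM-1 interface.
import numpy as np

FPS = 24
HEIGHT, WIDTH = 720, 1280  # "720p" frames

class WorldModel:
    """Stand-in for a learned autoregressive video world model."""
    def predict_next_frame(self, context_frames, action):
        # A real model would run a neural network here; we return noise
        # shaped like a frame so the loop is runnable end to end.
        return np.random.randint(0, 256, size=(HEIGHT, WIDTH, 3), dtype=np.uint8)

def rollout(model, first_frame, actions, context_len=16):
    """Generate one frame per user action, feeding predictions back as context."""
    frames = [first_frame]
    for action in actions:
        context = frames[-context_len:]            # bounded context window
        frames.append(model.predict_next_frame(context, action))
    return frames

model = WorldModel()
start = np.zeros((HEIGHT, WIDTH, 3), dtype=np.uint8)
# e.g. 2 seconds of user input at 24 fps: pan the camera right, then hold.
user_actions = [{"camera": "pan_right"}] * FPS + [{"camera": "hold"}] * FPS
video = rollout(model, start, user_actions)
print(f"generated {len(video) - 1} frames ({(len(video) - 1) / FPS:.1f} s at {FPS} fps)")
```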
Expert: Large-Scale Deployment of Embodied Intelligence Is Still at an Early Stage
Zhong Guo Xin Wen Wang· 2025-12-13 12:33
China News Service, Beijing, December 13 (reporter Liu Yuying). At the 2026 CAICT In-Depth Observation Report Conference held in Beijing on the 13th, Xu Zhiyuan, deputy chief engineer of the China Academy of Information and Communications Technology (CAICT), said that embodied intelligence has achieved breakthroughs on both the cognitive and the physical fronts, but that large-scale deployment is still at an early stage.

Looking ahead, Xu Zhiyuan believes that introducing a world model on top of VLA (vision-language-action) models, and drawing on the world model's ability to understand, predict and reason about the physical world, is a promising path toward further improving robot foundation models. (End)

He said that the model route, the data paradigm and the best robot form factor for embodied intelligence have yet to settle, large-scale deployment remains at an early stage, and the field's future direction is still being shaped by ongoing competition and rapid evolution.

"The industry still faces three core open questions," Xu Zhiyuan said. The first is the debate over the model route, that is, whether the large-model paradigm applies to robots. Although large models have been hugely successful in language, images and video, it remains unproven "whether the same paradigm can transfer directly to robot control," and the industry is exploring multiple approaches.

The second is the debate over the data and training paradigm. Data remains the core bottleneck limiting leaps in robot capability, and directions such as mixed data, multimodal data and world-model-generated data are all being explored.

The third is the debate over form factor, that is, whether humanoid robots are a "real need." Tesla, Figure AI and other companies insist on the fully humanoid route, while in China this year there has emerged a range of ...
The "World Model" Race Heats Up: Runway Launches GWM-1, with Real-Time Interaction Sustained for Minutes at a Time
Hua Er Jie Jian Wen· 2025-12-13 10:36
Core Insights - Runway has launched its first General World Model (GWM-1), entering the competitive "world simulation" arena dominated by giants like Google and Nvidia [1] - GWM-1 is designed to understand physical laws, geometric structures, and environmental dynamics, focusing on "coherence" and "interactivity" [1] - The model consists of three specialized autoregressive models: GWM-Worlds, GWM-Robotics, and GWM-Avatars, all built on Runway's latest Gen-4.5 base model [3] GWM-Worlds - GWM-Worlds allows users to interactively explore digital environments, predicting the next frame based on user inputs [4] - It generates environments at 24fps and 720p resolution, enabling real-time changes in camera angles and environmental conditions [4] - The model aims to provide a training ground for AI agents to navigate and act in the physical world [4] GWM-Robotics - GWM-Robotics addresses the challenge of data scarcity in robotics by generating high-quality synthetic data for various environmental scenarios [6] - This approach significantly reduces training costs and helps predict compliance risks before deploying robots in real-world settings [6] - Runway is actively engaging with robotics companies and offering GWM-Robotics through SDK to expand its B2B industrial client base [6] GWM-Avatars - GWM-Avatars integrates video generation with voice interaction, enabling digital avatars to engage in long-duration conversations without quality loss [8] - If successful, this technology could disrupt customer service and online education sectors [8] Base Model Evolution and Computational Power - Runway has upgraded its Gen-4.5 model to enhance its video generation capabilities, supporting one-minute video generation with consistent character portrayal and native dialogue [10] - The company has partnered with CoreWeave to utilize Nvidia's cloud infrastructure for model training and inference, addressing the computational demands of world simulation [10] Strategic Expansion - Runway's strategy is rapidly evolving from a creative tool for film to a simulator for robotics, but it faces stiff competition from established players like Google and Nvidia [11] - The ability to prove its worth as an "AI architect" in the physical world will be crucial for Runway's valuation and future growth [11]
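To illustrate what generating "high-quality synthetic data for various environmental scenarios" can look like in practice, here is a small, generic Python sketch that samples scenario parameters, runs a stand-in simulator, and writes episodes to a JSONL dataset; the scenario fields, the simulate_episode stub, and the file format are hypothetical and are not Runway's GWM-Robotics SDK.

```python
# Hypothetical synthetic-data generation loop for robot training: sample
# environment conditions that are rare or costly to capture in the real
# world, simulate an episode, and store it as a labeled training sample.
# Scenario fields and the episode format are illustrative assumptions.
import json
import random

WEATHER = ["clear", "heavy_rain", "fog", "snow"]
OBSTACLES = ["none", "fallen_branch", "stray_pallet", "pedestrian_crossing"]

def sample_scenario(rng):
    return {
        "weather": rng.choice(WEATHER),
        "obstacle": rng.choice(OBSTACLES),
        "lighting_lux": round(rng.uniform(5, 100_000), 1),  # night to full sun
    }

def simulate_episode(scenario, steps=50, rng=None):
    """Stand-in for a learned world model or simulator producing a trajectory."""
    rng = rng or random.Random(0)
    return [
        {"t": t, "obs": [rng.random() for _ in range(4)], "action": rng.random()}
        for t in range(steps)
    ]

def generate_dataset(n_episodes, path, seed=0):
    rng = random.Random(seed)
    with open(path, "w") as f:
        for i in range(n_episodes):
            scenario = sample_scenario(rng)
            episode = simulate_episode(scenario, rng=rng)
            f.write(json.dumps({"id": i, "scenario": scenario, "episode": episode}) + "\n")

generate_dataset(100, "synthetic_robot_episodes.jsonl")
print("wrote 100 synthetic episodes")
```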
GAIR 2025 World Model Sub-forum: From General Perception to Video and Physical World Models, a Hundred Schools of Thought Contend
雷峰网· 2025-12-13 09:13
" 具身智能爆发第三年,世界模型凝聚了哪些共识? " 作者丨 张进 吴彤 梁丙鉴 刘欣 齐铖湧 编辑丨 林觉民 马晓宁 13 日,第八届 GAIR 全球人工智能与机器人大会世界模型分论坛圆满成功。 这场的演讲嘉宾是在世界模型领域,研究不同方向的五位青年学者,他们带来了五场围绕世界模型的精彩 演讲,话题聚焦通用感知、三维技术、物理模型、世界模型、数字人重建。通过他们的演讲、我们得以窥 见当下围绕着世界模型的研究是多么广泛与丰富。 目前,世界模型的研究尚处于起步阶段,共识尚未形成,有关该领域的研究形成了无数支流,而这股潮流 中,今天到场的几位嘉宾,用他们的智慧和力量给世界模型领域研究带来了不同的启发。 浙江大学研究员彭思达:面向具身智能的通用空间感知技术 在"世界模型"分论坛上,首位演讲者是浙江大学研究员彭思达。他是浙江大学软件学院"百人计划"研究 员、博士生导师,研究方向为三维计算机视觉和计算机图形学。此次他带来的主题演讲是《面向具身智能 的通用空间感知技术》,介绍了其团队近期在赋予机器人通用感知能力方面的多项工作。 团队主要聚焦于赋予机器人三项基础能力:一是相机定位(Camera Pose Estimatio ...
Han Xiaoguang of CUHK-Shenzhen: 3DGen and the Battle for Humanity's Sense of Security | GAIR 2025
雷峰网· 2025-12-13 09:13
" 构建世界模型,为什么不能只靠「炼丹」? " 作者丨吴彤 编辑丨 林觉民 在香港中文大学(深圳),助理教授韩晓光的实验室名为GAP,意为"像素、点与多边形的生成与分析"。 现在看来,这个名字,也隐喻着他希望弥合真实世界和虚拟世界之间的"鸿沟"的意思。 2018年,韩晓光加入这所大学时,是当时唯一专注于计算机图形学研究的教师。2024年,他尝试从三维 重建拓展至具身智能与世界模型,又一次如入无人之境。 在小红书上,他的账号@韩晓光,简介仅有两行:港中深理工学院助理教授、图形学与三维视觉。他将小 红书视为传播平台,也视为个人思考的整理场所,会公开讨论"显式3D是否还有必要"、"世界模型为何需 要可解释性"等专业问题,也会记录与学生讨论时获得的启发。 这种直接、平实的分享,吸引了一批对技术本质感兴趣的读者,也代表了韩晓光这类青年教师群体打破学 术边界的自觉实践。从某一种角度看,构建世界模型需要理解真实世界的运行逻辑,而他的线上互动,本 身就是一场持续进行的、小规模的"世界模拟"。 在韩晓光的叙述中,他研究演进是自然发生的。从三维重建到动态生成,再到服务于机器人的虚拟环境构 建,核心始终是"三维内容的生成与理解"。 ...
He Xiaopeng Makes a "Bet": Match Tesla's FSD by the End of August Next Year
Mei Ri Jing Ji Xin Wen· 2025-12-13 06:46
Core Viewpoint - Xiaopeng Motors is set to release its VLA 2.0 (Vision-Language-Action) model in the next quarter, and this first release is under significant pressure to deliver [1]
- Xiaopeng chairman He Xiaopeng made a bet with the autonomous driving team: if the system matches Tesla's FSD V14.2 performance by August 30, 2026, he will open a Chinese-style cafeteria at the Silicon Valley office [1]
Group 1: VLA Model and Industry Perspectives - The VLA model is seen as an advanced end-to-end solution, integrating visual perception (V), action execution (A), and a language model (L) to enhance decision-making and environmental understanding (a minimal sketch of this V/L/A split follows after this summary) [5][11]
- The industry has shifted from relying on LiDAR and high-precision maps to adopting AI-driven models like VLA, with a notable divergence in development paths emerging by 2025 [4][11]
- Li Auto's VP emphasized the importance of real-world data over model architecture, asserting that VLA is the best solution because of the company's extensive data collection from millions of vehicles [6][8]
Group 2: Diverging Technical Approaches - Huawei's approach centers on the World Action (WA) model, which bypasses the language-processing step and aims for direct control from visual inputs [8][10]
- The World Model concept allows AI systems to simulate the physical world, enhancing predictive capability and decision-making in autonomous driving [9][11]
- Companies such as NIO and SenseTime are also exploring the World Model approach, indicating a broader industry trend [10]
Group 3: Future Integration and Evolution - There is a growing trend toward integrating VLA and World Models; the two technologies are not mutually exclusive but complementary [11][12]
- Xiaopeng's second-generation VLA model aims to combine VLA and World Model functionality, enhancing data training and decision-making [14][15]
- The automotive industry anticipates further iterations of autonomous driving technology architecture over the next few years, potentially stabilizing by 2028 [15]
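To make the V / L / A decomposition in the summary above concrete, here is a hypothetical PyTorch sketch of a VLA-style forward pass: vision features and an encoded instruction are fused, and an action head emits a short trajectory. All dimensions and modules are illustrative assumptions and do not describe Xiaopeng's, Li Auto's, or Tesla's systems.

```python
# Hypothetical VLA-style forward pass: vision features (V) and language
# features (L) are fused, and an action head (A) outputs a short trajectory.
# All dimensions and modules are illustrative assumptions, not any
# automaker's production architecture.
import torch
import torch.nn as nn

class TinyVLA(nn.Module):
    def __init__(self, vision_dim=512, text_dim=768, hidden=256, horizon=8):
        super().__init__()
        self.vision_proj = nn.Linear(vision_dim, hidden)
        self.text_proj = nn.Linear(text_dim, hidden)
        self.reasoner = nn.GRU(hidden, hidden, batch_first=True)  # stand-in for an LLM
        # Action head: predicts (steering, acceleration) for the next `horizon` steps.
        self.action_head = nn.Linear(hidden, horizon * 2)
        self.horizon = horizon

    def forward(self, vision_feat, text_feat):
        tokens = torch.stack(
            [self.vision_proj(vision_feat), self.text_proj(text_feat)], dim=1
        )                                  # (B, 2, hidden)
        _, h = self.reasoner(tokens)       # fused "reasoning" state, (1, B, hidden)
        actions = self.action_head(h[-1])  # (B, horizon * 2)
        return actions.view(-1, self.horizon, 2)

model = TinyVLA()
camera_features = torch.randn(1, 512)       # from a frozen vision backbone (assumed)
instruction_features = torch.randn(1, 768)  # from a language encoder (assumed)
trajectory = model(camera_features, instruction_features)
print(trajectory.shape)  # torch.Size([1, 8, 2])
```

In a production stack the GRU stand-in would be a large language model and the action head is often a diffusion or autoregressive decoder, but the data flow is the same: V and L in, A out.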
He Xiaopeng Makes a "Bet": Match Tesla's FSD by the End of August Next Year! Li Auto Executive Responds to Unitree Founder Wang Xingxing's Doubts; Is the VLA Bet Made by Multiple Automakers Reliable?
Mei Ri Jing Ji Xin Wen· 2025-12-13 06:31
Core Viewpoint - Xiaopeng Motors is set to release its VLA 2.0 (Vision-Language-Action) model in the next quarter, with significant pressure on its development as it is the first version [1] Group 1: VLA Model Development - Xiaopeng's chairman, He Xiaopeng, has made a special bet with the autonomous driving team, promising to establish a Chinese-style cafeteria in Silicon Valley if the VLA system matches Tesla's FSD V14.2 performance by August 30, 2026 [3] - The VLA model is seen as an advanced end-to-end solution, integrating visual perception, action execution, and language processing to enhance decision-making capabilities [7][12] - The VLA model aims to overcome traditional model limitations by incorporating a reasoning chain through language models, enhancing its adaptability to complex driving environments [7][12] Group 2: Industry Perspectives - There is a divergence in the industry regarding the development paths of VLA and world models, with companies like Li Auto and Xiaopeng favoring the VLA approach [6][12] - Li Auto's VP, Lang Xianpeng, emphasizes the importance of real-world data in developing effective autonomous driving systems, arguing that the VLA model is superior due to its data-driven approach [8][9] - Huawei and other companies are pursuing a world model approach, which focuses on direct control through visual inputs without the intermediary language processing [9][10][11] Group 3: Future Integration and Trends - Despite differing opinions, VLA and world models are not mutually exclusive and may increasingly integrate as both technologies evolve [12][17] - The future of autonomous driving technology is expected to see further iterations and stabilization by 2028, with a potential convergence of VLA and world model methodologies [17]
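To show how a world model can be used for planning rather than mapping observations directly to actions, here is a small, generic sketch of random-shooting model predictive control: a learned dynamics model "imagines" the outcomes of candidate action sequences, and the best-scoring first action is executed. The toy dynamics and reward below are stand-ins for illustration and are not Huawei's WA model or Xiaopeng's second-generation architecture.

```python
# Generic "plan by imagination" sketch: a learned world model predicts the
# consequences of candidate action sequences, and the best-scoring first
# action is executed. The linear dynamics and the reward are toy stand-ins
# for a learned model, not any automaker's system.
import numpy as np

rng = np.random.default_rng(0)

def world_model_step(state, action):
    """Toy stand-in for a learned dynamics model: next_state = f(state, action)."""
    return 0.95 * state + 0.1 * action

def reward(state):
    """Toy objective: stay close to the target state 1.0."""
    return -float(np.abs(state - 1.0))

def plan(state, horizon=10, n_candidates=64):
    """Random-shooting MPC: sample action sequences, roll them out, pick the best."""
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
    best_score, best_first_action = -np.inf, 0.0
    for seq in candidates:
        s, score = state, 0.0
        for a in seq:                      # imagine the future under this sequence
            s = world_model_step(s, a)
            score += reward(s)
        if score > best_score:
            best_score, best_first_action = score, seq[0]
    return best_first_action

state = 0.0
for t in range(20):                          # receding-horizon control loop
    action = plan(state)
    state = world_model_step(state, action)  # in reality: act in the environment
print(f"final state after 20 steps: {state:.3f}")
```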