World Models
NIO's Ren Shaoqing: World models address spatiotemporal cognition, something VLA cannot do.
自动驾驶之心 (Autonomous Driving Heart) · 2025-10-09 23:32
Core Viewpoint
- The article discusses the importance of world models in intelligent driving. It argues that truly understanding the environment requires a high-bandwidth cognitive system that goes beyond language models [2][3][5].

Summary by Sections

World Model vs. Language Model
- The world model addresses spatiotemporal cognition, while the language model addresses conceptual cognition. Language is low-bandwidth and sparse, which makes it ineffective for modeling the four-dimensional space-time of the real world [2][3].
- The world model aims to build capabilities directly at the video level rather than first converting information into language [3][5].

VLA and WA
- VLA (Vision-Language-Action) is essentially an extension of the language model. It adds new modalities but remains rooted in language. The world model, by contrast, is not language with extra modalities attached but a complete cognitive system [3][5].
- The ultimate goal of autonomous driving is open-set interaction, which lets users express commands freely instead of choosing from a fixed set of instructions [3][4].

Importance of Language
- Language remains crucial for three main reasons:
  1. Incorporating physical laws such as gravity and inertia into the model [6].
  2. Understanding and predicting how objects move through three-dimensional space over time [6].
  3. The vast amount of internet data absorbed by language models helps train autonomous driving systems [7].

Industry Trends
- Competition in the autonomous driving industry is intense, and many professionals are considering moving to other fields. The ongoing VLA-versus-WA debate reflects a broader industry transformation [9].
- The article suggests that people who stay in the industry must be versatile and have broad technical backgrounds, because the market is expected to change significantly [9].

Community and Learning Resources
- A community platform has been set up to provide resources for learning and sharing knowledge about autonomous driving. These include video tutorials, technical discussions, and job opportunities [11][12][24].
- The community aims to bring together people from various academic and industrial backgrounds to foster collaboration and knowledge sharing [25].
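A back-of-the-envelope comparison makes the bandwidth argument concrete. It sets one raw camera stream against a running text description of the same scene. Every number is an assumption chosen for illustration: one 1080p RGB camera at 30 fps, and 20 characters of text per second at 8 bits each. Raw bitrate is an upper bound on data volume, not a measure of information content.

```python
# Rough, illustrative bandwidth comparison; every number below is an assumption.

def camera_bitrate(width: int, height: int, channels: int,
                   bits_per_channel: int, fps: int) -> int:
    """Raw (uncompressed) bits per second produced by one camera."""
    return width * height * channels * bits_per_channel * fps

def text_bitrate(chars_per_second: float, bits_per_char: float) -> float:
    """Bits per second carried by a stream of text."""
    return chars_per_second * bits_per_char

# Hypothetical setup: 1080p RGB camera at 30 fps vs. 20 chars/s of 8-bit text.
video = camera_bitrate(1920, 1080, 3, 8, 30)
text = text_bitrate(20, 8)

print(f"video: {video / 1e9:.2f} Gbit/s")  # video: 1.49 Gbit/s
print(f"text:  {text:.0f} bit/s")          # text:  160 bit/s
print(f"ratio: {video / text:,.0f}x")      # ratio: 9,331,200x
```

Compression and redundancy shrink the real gap considerably. Even so, the orders of magnitude show why the text treats language as a sparse channel for describing 4D scenes.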
Ren Shaoqing's Non-Consensus Views on Intelligent Driving: World Models, Long-Horizon Agents, and "Obsessive" Engineering
晚点Auto (LatePost Auto) · 2025-10-09 12:17
The following article is from 晚点LatePost, by the LatePost team.
Ren Shaoqing's Non-Consensus Views on Intelligent Driving: World Models, Long-Horizon Agents, and "Obsessive" Engineering
晚点LatePost· 2025-10-09 10:14
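People stay in intelligent driving not because it is easy, but because it is harder.

Text | Wei Bing, Song Wei   Editor | Song Wei

Ren Shaoqing's hair is distinctive: thick, slightly curly, with bangs covering his forehead. When he walked into the meeting room, people meeting him for the first time took him for an intern. After learning who he was, they joked that only at an AI startup would you find such a young technical leader.

"We are an AI company," Ren Shaoqing replied in earnest.

But he works at NIO, an automaker still fighting for survival in a bloody red ocean, and his battlefield is intelligent driving. This contrarian answer mirrors the trajectory of his life: whenever others think the answer is settled, he insists on heading in another direction.

He entered the University of Science and Technology of China (USTC) in 2007 and received his PhD in 2016. During that time he proposed Faster R-CNN, a deep-learning-based object detection framework. Together with Sun Jian and Kaiming He of the Visual Computing Group at Microsoft Research Asia, and fellow PhD student Zhang Xiangyu, he also worked on ResNet (residual networks). ResNet solved the problem of neural networks "forgetting" as they grew deeper, which allowed models to stack far more layers. It is regarded as a milestone in the history of deep learning. Ren was 27 at the time.

In 2016 he co-founded the autonomous driving company Momenta with Cao Xudong and lived through the hottest era of autonomous-driving startups. Four years later he left the company he had built and joined NIO, which was then struggling at a low point.

The reason was simple: AI development had hit a bottleneck back then, and he believed the next breakthrough could only come from ...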
Financial Times: The Next Gateway to Superintelligence, as Google, Meta, Nvidia and Other Tech Giants Double Down on "World Models"
美股IPO (US Stock IPO) · 2025-09-29 08:51
Core Viewpoint
- Major AI companies such as Google DeepMind, Meta, and Nvidia are shifting their R&D focus toward "world models" to gain an edge in the race toward machine "superintelligence" [1][3][7].

Group 1: Market Potential
- The potential market for world models is estimated at as much as $100 trillion. It spans sectors such as autonomous driving, robotics, and manufacturing [1][3][4].

Group 2: Technological Developments
- Several AI companies have highlighted recent advances in world models. Google DeepMind released Genie 3, which generates video frame by frame and allows scalable AI training without real-world consequences [5].
- Meta is training its V-JEPA model on raw video to mimic how children learn passively through observation, with ongoing tests on robots [5].
- Nvidia's CEO has said the company's next major growth phase will come from "physical AI." Nvidia will use its Omniverse simulation platform to support its expansion into robotics [5].

Group 3: Applications and Innovations
- World models are also being applied in entertainment. Startups such as World Labs are developing models that generate 3D environments from a single image, and Runway is creating game scenes with a better grasp of physical laws [6].

Group 4: Industry Challenges
- The shift toward world models is driven by the perception that large language models (LLMs) are reaching their performance ceiling, and major companies are investing heavily [7][8].
- Despite the promising outlook, building these models requires vast amounts of physical-world data and compute, which remains a significant technical challenge [9].
- Experts believe it may still take up to a decade before next-generation AI systems give machines human-level intelligence [9].
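Frame-by-frame generation, as described for Genie 3, means each new frame is predicted from the frames generated so far plus an action. A model built this way can be rolled out "in imagination" without touching the real world. The sketch below shows only that autoregressive loop. `toy_model` is a hand-written stand-in for a learned video model, and all names are hypothetical, not any company's API.

```python
from typing import Callable, List

Frame = List[float]  # hypothetical: a "frame" reduced to a tiny state vector
Model = Callable[[List[Frame], str], Frame]

def rollout(model: Model, context: List[Frame], actions: List[str]) -> List[Frame]:
    """Autoregressive rollout: each new frame is conditioned on every frame so far."""
    frames = list(context)
    for action in actions:
        frames.append(model(frames, action))  # generated frames become future inputs
    return frames[len(context):]

def toy_model(frames: List[Frame], action: str) -> Frame:
    """Hand-written stand-in for a learned model: a point moving along a line."""
    velocity = frames[-1][0] - frames[-2][0]  # inferred from the last two frames
    accel = {"accelerate": 1.0, "brake": -1.0, "coast": 0.0}[action]
    return [frames[-1][0] + velocity + accel]

imagined = rollout(toy_model, [[0.0], [1.0]], ["coast", "accelerate", "brake"])
print(imagined)  # [[2.0], [4.0], [5.0]]
```

In a real system the model is a large learned network and each frame is an image, but the control flow is the same. Because every output is fed back in, errors can compound over long rollouts.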
After a Head-to-Head Comparison, VLA Is Far More Mature Than World Models...
自动驾驶之心· 2025-09-26 16:03
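Author | Zhou Yanwu   Source | 佐思汽车研究 (Zuosi Automotive Research)

First, note that VLA and world models are both forms of end-to-end driving. Many people consider one-stage end-to-end superior to segmented (multi-stage) end-to-end. Yet more than 90% of systems in both industry and academia are segmented end-to-end, and pure VLA or pure world model systems are very rare.

The VLA camp is represented by Amap's … model, Horizon Robotics' SENNA model, and UCLA's AutoVLA. The world model camp is represented by:
- Shanghai AI Laboratory's GenAD model, whose approach is very close to Tesla's FSD in China;
- the GenAD model from heavy-truck autonomous driving company 中科慧拓 (Zhongke Huituo);
- Drive-OccWorld, a collaboration between Huawei and Zhejiang University;
- Li Auto's World4Drive. Although Li Auto champions VLA, its world model research is also of a very high standard.

| Model | Avg. L2 distance (m) | Avg. collision rate (3 s) | Notes |
| --- | --- | --- | --- |
| AutoDrive-R2 | 0.19 | | 7B-parameter version |
| AutoDrive-R2 | 0.49 | | 3B-parameter version |
| SENNA | 0.22 | 0.08% | With ego-vehicle status ...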
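The two metric columns in the table are common open-loop planning metrics. The sketch below shows how they are commonly computed, under stated assumptions: waypoints are 2D points in meters, and each evaluated sample carries a precomputed collision flag. The exact protocol varies between papers and is not specified in the source. That includes the waypoint sampling rate, whether L2 is averaged over all horizons up to 3 s or taken only at 3 s, and how collisions are checked. Numbers from different papers are therefore not always directly comparable.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in meters

def avg_l2(pred: List[Point], gt: List[Point]) -> float:
    """Mean Euclidean distance between predicted and ground-truth waypoints."""
    assert len(pred) == len(gt) and pred, "trajectories must be non-empty and aligned"
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

def collision_rate(collided: List[bool]) -> float:
    """Fraction of evaluated samples whose planned trajectory hits an obstacle."""
    return sum(collided) / len(collided)

# Hypothetical waypoints at 1 s, 2 s, 3 s for a single sample.
gt = [(0.0, 2.0), (0.0, 4.0), (0.0, 6.0)]
pred = [(0.0, 2.0), (0.3, 4.4), (0.6, 6.8)]  # errors of 0 m, 0.5 m, 1.0 m

print(f"avg L2: {avg_l2(pred, gt):.3f} m")                         # avg L2: 0.500 m
print(f"collision rate: {collision_rate([False] * 999 + [True]):.1%}")  # collision rate: 0.1%
```

Lower is better for both metrics. Open-loop L2 only measures agreement with the logged human trajectory, which is one reason papers often report collision rate alongside it.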
At a Qualcomm-Hosted Gathering, Unitree's Wang Xingxing Spoke Plenty of Hard Truths
量子位 (QbitAI) · 2025-09-26 09:12
Core Viewpoint
- The article discusses the challenges and opportunities in embodied intelligence and robotics. It stresses that industry players must collaborate to solve technical difficulties and accelerate progress [3][25][48].

Group 1: Industry Challenges
- Robotics currently follows many divergent technical routes, so despite the visible excitement, significant progress has been limited [4][25].
- Many robotics and chip companies overlook the critical role of chips in robots, even though chips are essential to performance and reliability [16][18].
- Deploying large-scale compute in robots is difficult because of space constraints, battery capacity, and heat dissipation [20][21].

Group 2: Technological Developments
- Companies such as Unitree (Yushu Technology) aim to build general-purpose AI that lets robots perform varied tasks in unfamiliar environments, a "ChatGPT moment" for robotics [11][12].
- The development stages toward advanced robotic capabilities are:
  1. fixed action demonstrations;
  2. real-time action generation;
  3. task execution in unfamiliar settings;
  4. high success rates in delicate manipulation [12].
- Future embodied intelligence in robots may run on mobile phone chips, which could offer significant room for innovation [24].

Group 3: Collaboration and Open Source
- Open-sourcing models fosters collaboration and accelerates progress in the field, similar to OpenAI's approach with its earlier GPT models [28][29].
- Companies are encouraged to stay open to different models and to collaborate with third parties [30][31].

Group 4: AI and Agent Systems
- Agent systems need device-cloud collaboration to improve user experience and protect privacy [35][36].
- Demand for on-device models is growing, because they are key to understanding user needs and to communicating with cloud models [39][40].
- The industry lacks a unified standard for AI applications across devices, which leads to high development costs and fragmentation [48][50].

Group 5: Future Directions
- AI in robotics and other sectors will likely involve a cross-device operating system that integrates services and improves user experience [50][51].
- Collaboration among industry players is essential to build the necessary infrastructure and support innovation in smart devices [51].
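Device-cloud collaboration can be pictured as a router. It keeps privacy-sensitive or simple requests on the device and escalates demanding ones to a larger cloud model. This is a hypothetical policy sketch, not anything described in the talk. The `Request` fields, the complexity score, and the 0.6 threshold are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Request:
    text: str
    contains_personal_data: bool  # hypothetical flag set by an on-device classifier
    estimated_complexity: float   # hypothetical score in [0, 1]

def route(req: Request, complexity_threshold: float = 0.6) -> str:
    """Decide whether an agent request runs on-device or is escalated to the cloud."""
    if req.contains_personal_data:
        return "device"  # keep private data local
    if req.estimated_complexity > complexity_threshold:
        return "cloud"   # hand hard tasks to the larger cloud model
    return "device"      # cheap, low-latency local handling

print(route(Request("plan a three-day trip", False, 0.9)))  # cloud
print(route(Request("turn on the AC", False, 0.1)))         # device
```

Checking privacy before complexity encodes the assumption that private data should never leave the device, even when the on-device model is less capable. A real system might instead have the on-device model redact the request and then forward it.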
48 Executive Changes in the Auto Industry in a Single Month: A New Round of Transformation Is About to Begin...
自动驾驶之心· 2025-09-25 03:45
Group 1
- The automotive industry is entering a new round of transformation, with significant executive changes at companies including Li Auto, BYD, and Changan Automobile [1].
- The autonomous driving sector is evolving rapidly. Its focus is shifting from traditional methods to new algorithms and models, which demands continuous learning and adaptation [2][3].
- The community is actively discussing the future of autonomous driving, experimenting with new article formats, and hosting online events with industry leaders [3][6].

Group 2
- The community has built platforms for autonomous driving, embodied intelligence, and large models to explore new opportunities amid constant change [3][4].
- A comprehensive resource within the community offers over 40 technical routes and answers practical questions about autonomous driving [5][8].
- The community aims to provide a collaborative environment for both beginners and advanced practitioners, facilitating knowledge sharing and networking [10][14].

Group 3
- The community offers a range of learning resources, including video tutorials and structured learning paths for newcomers to autonomous driving [11][13].
- Regular discussions and Q&A sessions address industry questions, such as how to get started with end-to-end autonomous driving and where multi-sensor fusion applies [17][19].
- The community aims to grow its membership significantly over the next two years and strengthen its role as a hub for technical exchange and career opportunities in autonomous driving [3][19].
What Exactly Is the World Model Route Huawei Is Determined to Take?
自动驾驶之心· 2025-09-24 23:33
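1. Introduction

World modeling has become a fundamental task in artificial intelligence (AI) and robotics. Its core goal is to give agents the ability to understand, represent, and predict the dynamic environments they operate in. In recent years, generative modeling techniques have made significant progress, including variational autoencoders (VAEs), generative adversarial networks (GANs), diffusion models, and autoregressive models. By enabling sophisticated generation and prediction, they have greatly enriched research in this field.

However, these advances have largely concentrated on 2D data, mainly images and video. Real-world scenes, by contrast, are inherently three-dimensional and dynamic, and they often call for models that use native 3D and 4D representations. These include RGB-D images, occupancy grids, LiDAR point clouds, and sequential forms that capture temporal dynamics. Such modalities provide explicit geometry and physical grounding. That grounding is essential for embodied and safety-critical systems such as autonomous driving and robotics.

Beyond these native formats, world modeling research has also expanded into adjacent areas. Some work focuses on video, panoramic, or mesh-based data, with systems capable of large-scale, general-purpose video-mesh generation. Meanwhile, another line of work focuses on 3D objects ...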
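The native 3D representations listed above can be made concrete with a minimal sketch that voxelizes a LiDAR point cloud into a sparse occupancy grid. The function name `occupancy_grid`, the voxel size, and the bounds are illustrative assumptions, not taken from the survey. Real pipelines typically use dense tensors and also label free or unobserved space, which this sketch does not. Stacking one grid per timestep would produce the kind of 4D (3D + time) sequence the text describes.

```python
from typing import Iterable, Set, Tuple

Point3 = Tuple[float, float, float]
Voxel = Tuple[int, int, int]
Bounds = Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]]

def occupancy_grid(points: Iterable[Point3], voxel_size: float, bounds: Bounds) -> Set[Voxel]:
    """Return the set of occupied voxel indices for points inside the given bounds."""
    (xmin, xmax), (ymin, ymax), (zmin, zmax) = bounds
    occupied: Set[Voxel] = set()
    for x, y, z in points:
        if not (xmin <= x < xmax and ymin <= y < ymax and zmin <= z < zmax):
            continue  # drop points outside the region of interest
        occupied.add((int((x - xmin) // voxel_size),
                      int((y - ymin) // voxel_size),
                      int((z - zmin) // voxel_size)))
    return occupied

# Hypothetical scan: two points share a voxel, one is farther out, one is out of range.
scan = [(0.1, 0.1, 0.1), (0.4, 0.2, 0.3), (1.2, 0.0, 0.0), (10.0, 0.0, 0.0)]
grid = occupancy_grid(scan, voxel_size=0.5, bounds=((0, 2), (0, 2), (0, 2)))
print(sorted(grid))  # [(0, 0, 0), (2, 0, 0)]
```

A sparse set keeps memory proportional to the number of occupied cells. A dense array makes convolution-style processing easier, which is why many learned occupancy models prefer it.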
PhysicalAgent: A Foundation World Model Framework Toward General-Purpose Cognitive Robots
具身智能之心 (Embodied Intelligence Heart) · 2025-09-20 16:03
Core Viewpoint
- The article presents PhysicalAgent, a new robot control framework that aims to overcome existing limitations in robot manipulation. It combines iterative reasoning, diffusion-based video generation, and closed-loop execution [2][4].

Group 1: Key Challenges in Robotics
- Current mainstream vision-language-action (VLA) models require task-specific fine-tuning, and their robustness drops sharply when the robot or environment changes [2].
- World-model-based methods depend on specially trained predictive models and carefully curated training data, which limits their generalization [2].

Group 2: Framework Design and Principles
- PhysicalAgent decouples perception and reasoning from the specific robot embodiment. Each robot needs only a lightweight skeleton detection model, which minimizes compute and data requirements [4].
- It builds on pre-trained video generation models that already capture physical processes and object interactions, so they can be integrated quickly without local training [4].
- It mirrors human-like reasoning by generating visual depictions of actions from text instructions, which makes robot control intuitive [4].

Group 3: The VLM's Role in Grounded Reasoning
- A vision-language model (VLM) serves as the cognitive core of the framework. It grounds "instruction-environment-execution" through repeated calls rather than a single planning step [6].
- The framework recasts action generation as conditional video synthesis, departing from the traditional approach of directly learning a control policy [6].

Group 4: Execution Process and Adaptation
- A robot adaptation layer translates the generated action videos into motor commands. This is the only component that requires robot-specific adaptation [6].
- The process includes task decomposition, contextual scene description, and execution monitoring. It is also model-agnostic, which allows flexible choice of models [6].

Group 5: Experimental Validation
- Experiments validate the framework's generalization across embodiments and perception modalities, as well as the robustness of iterative execution [8].
- In the first experiment, the framework achieved significantly higher success rates than task-specific baselines across different robot platforms [12].
- The second experiment confirmed the robustness of the iterative "Perceive→Plan→Reason→Act" pipeline, which reached an 80% success rate across physical robots [13].
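The iterative loop summarized above can be sketched as follows, assuming each stage is supplied as a plain callable. `run_closed_loop`, `perceive`, `plan`, `reason`, `act`, and `check` are hypothetical names, not PhysicalAgent's API. In the described framework, `reason` would involve VLM calls and conditional video generation, and `act` would pass through the robot adaptation layer that turns the video into motor commands. The toy world after the function simply makes every subtask fail once before succeeding, to exercise the retry path.

```python
from typing import Callable, List

def run_closed_loop(instruction: str,
                    perceive: Callable[[], str],
                    plan: Callable[[str, str], List[str]],
                    reason: Callable[[str, str], str],
                    act: Callable[[str], None],
                    check: Callable[[str, str], bool],
                    max_attempts: int = 3) -> bool:
    """Perceive -> Plan -> Reason -> Act, retrying each subtask until check() passes."""
    subtasks = plan(instruction, perceive())  # task decomposition
    for subtask in subtasks:
        for _ in range(max_attempts):
            scene = perceive()                # contextual scene description
            act(reason(subtask, scene))       # choose and execute an action
            if check(subtask, perceive()):    # execution monitoring
                break
        else:
            return False                      # subtask never verified as done
    return True

# Toy simulated world (hypothetical): each subtask succeeds on its second try.
state = {"done": set(), "tries": {}}

def perceive() -> str:
    return ",".join(sorted(state["done"])) or "empty"

def plan(instruction: str, scene: str) -> List[str]:
    return ["grasp cup", "place cup"]

def reason(subtask: str, scene: str) -> str:
    return subtask  # stand-in for a generated action video

def act(action: str) -> None:
    state["tries"][action] = state["tries"].get(action, 0) + 1
    if state["tries"][action] >= 2:
        state["done"].add(action)

def check(subtask: str, scene: str) -> bool:
    return subtask in scene.split(",")

ok = run_closed_loop("put the cup on the shelf", perceive, plan, reason, act, check)
print(ok, state["tries"])  # True {'grasp cup': 2, 'place cup': 2}
```

Re-perceiving after every action is what distinguishes this from one-shot planning. A failed step is detected and retried instead of silently corrupting later subtasks.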
People Who Claim End-to-End Cures Everything Have Never Actually Worked on PnC (Planning and Control)...
自动驾驶之心· 2025-09-16 23:33
Core Viewpoint
- The article discusses the current state and future potential of end-to-end (E2E) autonomous driving systems. It argues that the industry needs to shift from modular to E2E approaches, while acknowledging the challenges and limitations that still stand in the way of maturity [3][5].

Group 1: End-to-End Autonomous Driving
- An end-to-end system processes raw sensor data directly and outputs vehicle control signals. This is a significant departure from the traditional modular approach [3][4].
- E2E systems can carry a comprehensive representation of the information that affects vehicle behavior, which is crucial for handling the open-set scenarios of autonomous driving [4].
- The industry is currently divided, with some companies focusing on Vision-Language-Action (VLA) models and others on traditional methods. Still, there is consensus that E2E systems are the future [2][5].

Group 2: Industry Trends and Challenges
- There is growing recognition that autonomous driving is transitioning from rule-driven to knowledge-driven systems, which requires a deeper understanding of E2E methodologies [5].
- Despite their high potential, E2E systems still face significant challenges before they can fully replace traditional planning and control methods [5].
- The article suggests that companies should give E2E systems more time to mature rather than rushing to deploy them without adequate understanding [5].

Group 3: Community and Learning Resources
- The "Autonomous Driving Heart Knowledge Planet" community provides a platform for sharing knowledge and resources on autonomous driving, including technical routes and job opportunities [8][18].
- The community has over 4,000 members and aims to grow to nearly 10,000 within two years. It offers beginners and advanced learners a space to engage with industry experts [8][18].
- Learning resources, including video tutorials and technical discussions, help members navigate the complexities of autonomous driving technology [12][18].
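A toy example can illustrate the contrast between modular and end-to-end designs, with three numbers standing in for raw sensor input. In the modular version, perception compresses everything into one boolean before planning, so two different scenes produce the same command. The end-to-end version maps the raw input straight to a control signal and can respond in a graded way. Both functions are hypothetical. The weights are hand-picked stand-ins for parameters that a real E2E system would learn from data.

```python
from typing import Sequence

def modular_drive(evidence: Sequence[float]) -> float:
    """Modular pipeline: perception -> fixed interface -> rule-based planning."""
    obstacle_ahead = max(evidence) > 0.5     # perception emits a single boolean
    return -1.0 if obstacle_ahead else 0.5   # planner rule: brake hard or cruise

def end_to_end_drive(raw: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """End-to-end: one mapping from raw input to an acceleration command."""
    u = sum(w * x for w, x in zip(weights, raw)) + bias
    return max(-1.0, min(1.0, u))            # clip to the actuator range

weights = [-0.5, -0.5, -0.5]     # hand-picked stand-in for learned parameters
one_sector = [0.6, 0.0, 0.0]     # weak obstacle evidence in one sensor sector
all_sectors = [0.6, 0.6, 0.6]    # the same evidence in every sector

print(modular_drive(one_sector), modular_drive(all_sectors))  # -1.0 -1.0
print(round(end_to_end_drive(one_sector, weights, 0.5), 3),
      round(end_to_end_drive(all_sectors, weights, 0.5), 3))  # 0.2 -0.4
```

The point is only about information flow: a hand-designed interface can discard details the planner needs. It says nothing about whether E2E is safer or easier to validate, which is the open question the article raises.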