NIO's Ren Shaoqing: World Models Solve Spatiotemporal Cognition, Which VLA Cannot Do
自动驾驶之心· 2025-10-09 23:32
Core Viewpoint
- The article argues that world models are essential to intelligent driving: truly understanding the environment requires a high-bandwidth cognitive system that goes beyond language models [2][3][5].

Summary by Sections

World Model vs. Language Model
- The world model addresses spatiotemporal cognition, while the language model addresses conceptual cognition. Language is a low-bandwidth, sparse representation, which makes language models ill-suited to modeling the real world's four-dimensional space-time [2][3].
- The world model aims to build its capabilities directly at the video level rather than converting information into language first [3][5].

VLA and WA
- VLA (Vision-Language-Action) is essentially an extension of language models: it adds new modalities but remains rooted in language. The world model, in contrast, is not language with something added on but a comprehensive cognitive system in its own right [3][5].
- The ultimate goal of autonomous driving is open-set interaction, where users can express commands freely instead of being limited to a fixed set of instructions [3][4].

Importance of Language
- Language remains crucial for three main reasons:
  1. Incorporation of physical laws such as gravity and inertia into the model [6].
  2. Ability to understand and predict object movements in three-dimensional space over time [6].
  3. The vast amount of data that language models have absorbed from the internet aids in training autonomous driving systems [7].

Industry Trends
- The autonomous driving industry is intensely competitive, and many professionals are considering a move to other fields. The ongoing debate between VLA and WA reflects a larger industry transformation [9].
- The article suggests that those who stay in the industry will need to be versatile, with broad technical backgrounds, because the market is expected to change significantly [9].

Community and Learning Resources
- A community platform has been set up to provide resources for learning and sharing knowledge about autonomous driving, including video tutorials, technical discussions, and job opportunities [11][12][24].
- The community aims to bring together people from a range of academic and industrial backgrounds to foster collaboration and knowledge sharing [25].
Ren Shaoqing's Non-Consensus Views on Intelligent Driving: World Models, Long-Horizon Agents, and "Obsessive" Engineering
晚点LatePost· 2025-10-09 10:14
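Staying in intelligent driving, not because it is easy, but because it is harder.

By Wei Bing and Song Wei | Edited by Song Wei

Ren Shaoqing's hair is distinctive: thick, slightly curly, with bangs covering his forehead. When he walked into the meeting room, someone meeting him for the first time mistook him for an intern and, on learning who he was, joked that only at an AI startup would you see a technical leader this young.

"We are an AI company," Ren replied in all seriousness.

But he works at NIO, a carmaker still fighting for survival in a bloody market, and his battlefield is intelligent driving. That unexpected answer mirrors the trajectory of his life: whenever others assume the answer is settled, he insists on heading in another direction.

He entered the University of Science and Technology of China (USTC) in 2007 and completed his PhD in 2016. During that time he proposed Faster R-CNN (a deep-learning-based object detection framework) and worked on ResNet (the residual network) with Sun Jian and He Kaiming, then of the Visual Computing Group at Microsoft Research Asia, and doctoral student Zhang Xiangyu. ResNet solved the problem of neural networks becoming more "forgetful" the deeper they get, allowing models to stack layers without limit, and is regarded as a milestone in the history of deep learning. Ren was 27 at the time.

In 2016 he co-founded the autonomous driving company Momenta with Cao Xudong and lived through the hottest startup years in autonomous driving. Four years later he left the company he had built from the ground up and joined NIO, which was still struggling through a low point.

The reason was simple: AI development had hit a bottleneck at the time, and he believed the next breakthrough could only come from ...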