World Models
Xpeng Just Released VLA 2.0, but Dropped the Language Translation Step...
自动驾驶之心· 2025-11-06 00:04
Core Viewpoint
- Xpeng Motors has just released VLA 2.0, a significant advancement in autonomous driving technology, particularly in the context of competing with Tesla's innovations [2][10].

Summary by Sections

VLA Development
- Xpeng's VLA is being developed along two parallel paths, V/L→A and V→L→A. The former aligns more closely with Tesla's recent ICCV sharing, in which L is not middleware but a parallel input alongside V [3][6].
- The V/L→A model eliminates language translation while keeping the focus on visual inputs [6].

Technical Specifications
- The first mass-produced physical world model delivers a maximum effective computing power of 2,250 TOPS [6].
- Future plans include entering the robotaxi market with four Turing AI chips totaling 3,000 TOPS [8].

Industry Context
- Competition in L3 technology is intensifying, with various companies analyzing and following Xpeng's VLA developments [10].
- The debate between world-model and VLA pathways remains unresolved, indicating a need for continued exploration in both academia and industry [10].

Community and Learning Resources
- The "Autonomous Driving Heart Knowledge Planet" community provides a comprehensive platform for learning and sharing in the autonomous driving field, with over 4,000 members and plans to grow to nearly 10,000 [14][31].
- The community offers video tutorials, technical discussions, and job-placement mechanisms for both beginners and advanced practitioners [17][29][95].
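The difference between the two wirings is easy to state in code. Below is a toy, purely illustrative sketch (all function names are hypothetical, and the "encoders" are trivial stand-ins for neural networks): in V→L→A the action head only sees a language rendering of the scene, while in V/L→A the vision features and the language instruction are concatenated as parallel inputs, with no intermediate translation of the scene.

```python
# Toy sketch of the two VLA wirings (hypothetical names; not Xpeng's or
# Tesla's actual implementation). Features are plain lists of floats.

def encode_vision(frame):
    """Stand-in vision encoder: one feature per pixel value."""
    return [p / 255.0 for p in frame]

def encode_language(text):
    """Stand-in language encoder: a single length-based feature."""
    return [len(text) / 100.0]

def act(features):
    """Stand-in action head: 'steering' proportional to the mean feature."""
    return sum(features) / len(features)

def v_to_l_to_a(frame):
    """V -> L -> A: vision is first translated into language, and the
    action head sees only that textual bottleneck."""
    caption = f"scene with {len(frame)} pixels"  # the language translation step
    return act(encode_language(caption))

def v_parallel_l_to_a(frame, instruction):
    """V/L -> A: vision and language features are parallel inputs;
    the scene is never translated into text."""
    return act(encode_vision(frame) + encode_language(instruction))
```

The point of the sketch is only the dataflow: in the second variant, removing the translation step means visual detail reaches the action head directly.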
流形空间 CEO Wu Wei: When AI Begins to "Understand the World," World Models Rise and Reshape the Boundaries of Intelligence | A Talk at the "Jinqiu Meetup"
锦秋集· 2025-11-05 14:01
Core Insights
- The article discusses the evolution of AI toward "world models," which enable AI to simulate and understand the world rather than merely generate content. This shift is seen as a critical leap toward "general intelligence" [4][5][9].

Group 1: Definition and Importance of World Models
- World models are defined as generative models that can simulate all scenarios, allowing AI to predict and make better decisions through internal simulation rather than relying solely on experience-based learning [15][18].
- The need for world models arises from their ability to construct agent models for better decision-making and to serve as environment models for offline reinforcement learning, enhancing generalization [18][22].

Group 2: Development and Applications
- Development has been rapid, with significant advances since the 2018 paper "World Models," leading to the emergence of structured models capable of video generation [24][52].
- Key applications include autonomous driving, robotics, and drone technology, where world models provide a foundational layer for general intelligence [9][75].

Group 3: Technical Approaches
- Approaches discussed include explicit physical modeling and generative models that focus on creating environments for reinforcement learning [29][40].
- The article highlights data collection, representation learning, and architectural improvements as levers for enhancing world-model capabilities [69][71].

Group 4: Future Directions
- Future improvements are expected to focus on richer multimodal data collection, stronger representation learning, and the ability to adapt to various tasks and environments [69][70][73].
- The company claims to be the only team globally to have developed a "universal world model" that can be applied across different domains, including ground and aerial intelligent agents [75][81].
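The core idea above, deciding through internal simulation rather than trial-and-error in the real environment, can be sketched minimally. The snippet below is an illustration, not any company's method: a hand-written one-dimensional dynamics function stands in for a learned generative world model, and a random-shooting planner chooses an action by rolling candidate action sequences through the model and keeping the best imagined trajectory.

```python
import random

# Minimal sketch of planning via imagined rollouts. The "world model" is a
# hand-written 1-D transition function standing in for a learned generative
# model; all names and numbers are illustrative.

GOAL = 10.0

def world_model(state, action):
    """Imagined transition: next state plus reward (closer to goal = better)."""
    next_state = state + action
    reward = -abs(GOAL - next_state)
    return next_state, reward

def plan_by_imagination(state, horizon=3, candidates=200, seed=0):
    """Random-shooting planner: sample candidate action sequences, evaluate
    each entirely inside the world model, return the best first action."""
    rng = random.Random(seed)
    best_return, best_first_action = float("-inf"), 0.0
    for _ in range(candidates):
        actions = [rng.uniform(-2.0, 2.0) for _ in range(horizon)]
        s, total = state, 0.0
        for a in actions:
            s, r = world_model(s, a)
            total += r
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action
```

Nothing here touches a real environment: every candidate is evaluated purely in imagination, which is exactly what makes a world model usable as an offline-RL environment.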
A Conversation with Lang Xianpeng: The VLA Technology Debate, Team Turnover, and Proving Himself When Others Doubted
理想TOP2· 2025-11-05 10:29
Core Viewpoint
- The article discusses the evolution and strategic decisions of Li Auto's autonomous driving team, particularly the development of the VLA (Vision-Language-Action) model, which aims to enhance the driving experience by enabling the system to think like a human rather than merely mimic driving behavior [3][4][20].

Organizational Changes
- On September 19, Li Auto restructured its autonomous driving R&D department into 11 secondary departments to promote a more efficient, AI-oriented organization [6].
- The restructuring aims to enhance communication and decision-making efficiency, with all department leaders reporting directly to the head of the autonomous driving team [7].

Technical Development
- Li Auto's autonomous driving team initially faced challenges due to its late market entry, but has since made significant progress by adopting an "end-to-end" approach and now focusing on the VLA model [3][4].
- The VLA model uses multimodal AI to improve the driving experience, emphasizing the system's ability to think and reason [3][4][20].

Industry Reactions
- Industry experts, including Huawei and Bosch representatives, have expressed skepticism about the feasibility of the VLA model, citing challenges in multimodal feature alignment and data training [4][22].
- Li Auto views the criticism from competitors as validation of the VLA's potential, suggesting that the model's complexity is a necessary step for advancement [20][25].

Future Outlook
- Li Auto anticipates that by early next year, significant improvements in the VLA model will be evident, enhancing its competitive position in the autonomous driving market [4][25].
- The company aims to achieve L4-level autonomous driving by 2027, with a focus on building a robust data feedback loop to continuously improve the system's capabilities [43][44].
Tsinghua Team Proposes AirScape: A Low-Altitude World Model with Controllable Action Intent, Fully Open-Sourced!
具身智能之心· 2025-11-05 09:00
Author: Baining Zhao et al.

A key component of human spatial sense is the anticipation of how one's visual observations will change as one moves. This anticipation is essential for task and action decisions during spatial movement. Rollout and imagination are therefore among the foundational problems of embodied intelligence, framed as a prediction task: if the agent executes a movement intent, how will its embodied observations change?

Existing world-model research focuses mainly on humanoid robots and autonomous driving, mostly operating on a two-dimensional plane with limited action spaces. Several key challenges remain.

To address them, a Tsinghua University team proposes AirScape, a generative world model designed specifically for six-degree-of-freedom (6DoF) aerial embodied agents. A video generation foundation model is supervised fine-tuned on a proposed dataset of 11k video-intent pairs; this stage gives the model a basic understanding of low-altitude action intent and the ability to generate accordingly. AirScape can roll out future observation sequences from the current low-altitude visual observation and an action intent. The project's dataset and code are fully open-sourced.

Low-Altitude World Model Dataset: To support the low-altitude world ...
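To make the input/output contract concrete: AirScape maps a current observation plus a 6DoF action intent to a sequence of future observations. The toy sketch below uses hypothetical names, and where the real model generates video frames, it merely integrates a constant per-step 6DoF intent into a pose trajectory; the point is only the shape of the interface.

```python
# Illustrative interface sketch (hypothetical names): observation + 6-DoF
# intent -> future observation sequence. AirScape itself is a fine-tuned
# video generation model; poses stand in for frames here.

from dataclasses import dataclass

@dataclass
class Pose6DoF:
    x: float
    y: float
    z: float
    roll: float
    pitch: float
    yaw: float

def rollout(pose, intent, steps):
    """Roll out future 'observations' from the current observation and a
    constant per-step 6-DoF motion intent."""
    trajectory = []
    for _ in range(steps):
        pose = Pose6DoF(
            pose.x + intent.x, pose.y + intent.y, pose.z + intent.z,
            pose.roll + intent.roll, pose.pitch + intent.pitch,
            pose.yaw + intent.yaw,
        )
        trajectory.append(pose)
    return trajectory
```

A drone hovering at z = 10 m given a "forward and climb" intent, for instance, yields a four-step imagined trajectory rather than a single next state, mirroring the sequential prediction described above.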
GigaVision (极佳视界) Raises a New Hundred-Million-Yuan-Scale A1 Round; CEO: The "Physical World ChatGPT Moment" Will Arrive Within 2 to 3 Years
AI前线· 2025-11-05 05:09
Core Viewpoint
- The article discusses GigaVision's recent financing round, highlighting its focus on physical AI and on world models that drive general intelligence in the physical world. The company has completed three financing rounds within two months, indicating strong investor interest and confidence in its technology and market potential [2][4].

Financing and Company Background
- GigaVision has completed a new financing round in the hundreds of millions of yuan, led by Huawei Hubble and Huakong Fund, following two similar-scale rounds in August [2].
- Founded in 2023, GigaVision focuses on physical AI and offers a range of products including the GigaWorld platform, the GigaBrain model, and the Maker robot body [2][4].

Team and Expertise
- The core team is closely associated with Tsinghua University's Department of Automation and includes top researchers from prestigious institutions as well as executives from leading companies such as Baidu and Microsoft. The team has published over 200 top-tier AI papers and won numerous global AI competition awards [4].

World Model Technology
- GigaVision emphasizes the immediate value of world-model technology, which addresses high-dimensional data scarcity and the Sim2Real gap of traditional simulators. The technology allows AI to model physical environments digitally, improving decision-making and reducing trial-and-error in unfamiliar settings [6][9].
- Major tech companies such as NVIDIA, Google DeepMind, and Tesla are also investing in world-model applications, underscoring the field's significance [6][7].

Future Predictions and Goals
- GigaVision's CEO predicts that a "physical-world ChatGPT moment" will occur within 2 to 3 years, driven by advances in world models, VLA, and reinforcement learning, targeting a 95% success rate on 90% of common tasks [8][14].
- The company aims to build a highly available world-model system that can learn from limited real data, generate high-fidelity synthetic data, and enhance the realism of generated data through multimodal feedback [9][10].

Collaborations and Market Strategy
- GigaVision has established deep collaborations with humanoid-robot innovation centers, research institutions, and cloud computing companies to build a leading data factory and physical-AI platform [13].
- The company plans to continue advancing physical-AI model development and commercial applications, focusing on a three-pronged "intelligence-embodiment-scenario" approach to accelerate the realization of its vision [14].
Google's Dreamer Ace Departs, Admits He Missed the Transformer
36Kr· 2025-11-05 02:20
Danijar Hafner, the ace behind "Dreamer," has just announced his departure from Google, where he worked for nearly a decade.

Before leaving, Danijar served as a Staff Research Scientist at Google DeepMind's San Francisco office. His research goal is "to build general agents that can understand and interact with the world."

As one of Google's leading world-model researchers, Danijar led or co-led the development of the Dreamer series (Dreamer, DreamerV3, Dreamer4, and others).

In his post he wrote: "Today is my last day at DeepMind." Looking back on nearly ten years at Google and DeepMind, Danijar called it "the end of an important chapter."

In his early years at Google, Danijar mostly worked as a researcher across Google Research, DeepMind, and the Brain Team. His educational background also clearly traces his career trajectory.

| Role | Organization | Period |
| --- | --- | --- |
| Researcher | Google (google.com) | 2023 - Present |
| ... | Google (google.com) | 20... |
Li Auto's Lang Xianpeng: VLA Plus Reinforcement Learning Will Become Carmakers' True Moat
晚点LatePost· 2025-11-04 08:03
Core Viewpoint
- The article discusses the evolution of Li Auto's autonomous driving technology, focusing on the development and implementation of the VLA (Vision-Language-Action) model, which aims to enhance the driving experience by integrating multimodal AI capabilities. It covers the challenges the team faced, the strategic decisions made, and the competitive landscape of the autonomous driving sector [5][6][18].

Team Development and Structure
- The Li Auto autonomous driving team has undergone significant change since its inception in 2018, spanning three generations of core personnel. A recent restructuring created a flatter organization with 11 new departments, enhancing communication and decision-making efficiency [8][9][51].
- The team has shifted from a centralized, closed development model to a more open and collaborative approach, reflecting the need for agility in AI development [10][11].

Strategic Decisions
- The decision to pursue the VLA model was driven by the recognition that simply following existing paths, such as those taken by Huawei and Tesla, would not suffice. The team aimed to create a new competitive edge through innovative technology [6][14][18].
- The VLA model is positioned as a significant advance over previous methods, with the goal of achieving L4-level autonomous driving. The model emphasizes human-like reasoning and decision-making in driving [21][29].

Challenges and Criticism
- The VLA model has faced skepticism from industry experts, with concerns about its feasibility and the technical challenges of multimodal AI integration. Critics argue the approach may be overly simplistic or gimmicky compared with other methods [22][24].
- Despite the criticism, the team views the difficulty of the VLA model as indicative of its potential correctness and innovation [24][25].

Future Outlook
- The company aims to establish a robust reinforcement-learning loop to enhance the VLA model's capabilities, expecting significant improvements in user experience by the end of this year and into next [28][39].
- The long-term vision includes achieving L4 autonomous driving by 2027, with a focus on building a comprehensive data-driven ecosystem that supports continuous learning and adaptation [41][44].
A Conversation with Lang Xianpeng: The VLA Technology Debate, Team Turnover, and Proving Himself When Others Doubted
晚点Auto· 2025-11-04 03:58
Core Viewpoint
- The article discusses the evolution of Li Auto's autonomous driving technology, particularly the development and implementation of the VLA (Vision-Language-Action) model, which aims to enhance the driving experience by enabling the system to think like a human rather than merely mimic driving behavior [2][3][4].

Development of Li Auto's Autonomous Driving Team
- The autonomous driving team was established in 2018 and has undergone three generations of key personnel changes, reflecting the challenges and growth within the organization [4][7][46].
- The team initially lacked resources and had to adapt by retrofitting existing vehicles with lidar for technology research [3][4].

Shift to the VLA Model
- Li Auto transitioned to the VLA model to differentiate itself from competitors such as Huawei and Tesla, emphasizing the need for next-generation technology rather than merely following existing paths [3][4][17].
- The VLA model uses multimodal AI to improve the driving experience, aiming for a more human-like decision-making process [3][4][21].

Internal and External Challenges
- VLA development has faced internal team restructuring and external skepticism, with industry leaders questioning its feasibility and effectiveness [3][4][21][22].
- Despite the criticism, the company believes the challenges posed by competitors validate the direction of the VLA model [4][21].

Organizational Changes
- In September, Li Auto restructured its autonomous driving department into 11 sub-departments to promote a more efficient, AI-focused organization [6][7].
- The new structure aims to enhance communication and decision-making efficiency, moving away from a centralized development model [8][9].

Future Goals and Expectations
- Li Auto aims to achieve L4-level autonomous driving by 2027, building on significant milestones reached in 2021 and 2023 [37][39].
- The company anticipates that the VLA model will enable self-iteration and improvement, potentially surpassing competitors in the Chinese market [39][40].

Technical Considerations
- The VLA model is designed to run on existing autonomous driving chips, although these chips were not originally optimized for large models [33][34].
- Li Auto is investing in cloud computing capacity, with a current training capacity of 10 EFLOPS and plans for further expansion [32][33].

Market Positioning
- The company is focused on establishing a strong market presence in China before expanding internationally, recognizing the unique challenges of commercializing autonomous driving technology [41][42].
Starting from DriveVLA-W0: How World Models Amplify VLA Scaling Laws (Chinese Academy of Sciences)
自动驾驶之心· 2025-11-04 00:03
In autonomous driving, scaling vision-language-action (VLA) models with large-scale data is a promising path toward more general driving intelligence. However, VLA models have long faced a "supervision deficit": their large model capacity is supervised only by sparse, low-dimensional action signals, leaving much of their representational potential underused.

To address this, a team from the Chinese Academy of Sciences and Huawei Yinwang proposes DriveVLA-W0, a training paradigm that uses a world model to predict future images. To verify its generality, the paper validates it on two mainstream VLA architectures: an autoregressive world model is designed for VLA models that use discrete visual tokens, and a diffusion world model for VLA models based on continuous visual features. Building on the rich representations learned through world modeling, the paper further introduces a lightweight action expert to address inference latency in real-time deployment.

Livestream: "DriveVLA-W0: Using World Models to Amplify VLA Scaling Laws," Nov 4, 19:30-20:30 ...
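The "supervision deficit" framing suggests a simple joint objective: keep the sparse action loss and add a dense future-image prediction loss from a world-model head. The sketch below is an assumption-laden illustration of that idea, not the paper's actual objective: frames are flat lists of floats, the losses are plain MSE, and `lam` is a made-up weighting term.

```python
# Hedged sketch of joint training with a world-model auxiliary loss.
# All names (drivevla_w0_loss, lam) are illustrative, not from the paper.

def mse(pred, target):
    """Mean squared error over two equal-length float sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def drivevla_w0_loss(pred_action, gt_action,
                     pred_future_frame, gt_future_frame, lam=0.5):
    """Joint objective: sparse low-dimensional action supervision plus a
    dense future-frame prediction term; lam trades off the auxiliary loss."""
    action_loss = mse(pred_action, gt_action)             # sparse signal
    world_loss = mse(pred_future_frame, gt_future_frame)  # dense signal
    return action_loss + lam * world_loss
```

The dense term touches every pixel (or token) of the predicted future frame, which is exactly how world modeling is meant to soak up representational capacity that the action signal alone leaves idle.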
GigaVision (极佳视界) Completes New Hundred-Million-Yuan-Scale A1 Round, Co-led by Huawei Hubble and Huakong Fund
Securities Times (Zheng Quan Shi Bao Wang)· 2025-11-03 11:36
Group 1
- The core news is that GigaVision (极佳视界, romanized "Jijiashijie") has completed a new financing round in the hundreds of millions of yuan, led by Huawei Hubble and Huakong Fund [1]
- GigaVision focuses on physical AI and aims to develop "world-model-driven general intelligence for the physical world," with products including GigaWorld, GigaBrain, and Maker [1]
- Founder and CEO Huang Guan emphasizes that world models are a key and popular direction for embodied intelligence, supported by Huawei's recognition of world models as a top technology trend for 2035 [1][2]

Group 2
- Huang Guan predicts that the "physical-world ChatGPT moment" will arrive within 2 to 3 years, with world models addressing generalization, VLA handling task complexity, and reinforcement learning improving accuracy and reliability [2]
- GigaVision plans to continue advancing physical-AI model development and to accelerate commercialization in benchmark scenarios through "intelligence-embodiment-scenario" integration [2]