World Models
The claim that world models can fundamentally solve VLA systems' dependence on data is a false proposition...
自动驾驶之心· 2025-11-22 02:01
Core Viewpoint - The article discusses the ongoing debate between two approaches in the autonomous driving sector: the VLA (Vision-Language-Action) route favored by companies like Xiaopeng, Li Auto, and Yuanrong Qixing, and the world model (WA) approach promoted by Huawei and NIO. It argues that the claim that WA frees autonomous driving from its dependence on data is a false premise, because data remains the industry's critical asset [2][3].
Summary by Sections
VLA vs. WA - The VLA approach uses vast amounts of real-world data to strengthen reasoning, while the WA approach tries to reduce reliance on real data by using simulated data to expand its capabilities. The article argues, however, that the two approaches differ in how data is used, not in whether data is needed [2][3].
Data Dependency - Both VLA and WA rest on the premise that "data determines the ceiling" of capabilities. VLA relies on multi-modal data from real scenarios, while WA combines real and simulated data to improve generalization. The industry often confuses the form of data with its essence, which leads to misconceptions about the role of data in autonomous driving [3].
Industry Insights - The article stresses that the real challenge is not whether to depend on data but how to use it efficiently. Until true artificial intelligence is realized, it argues, data will remain the core competitive advantage in the autonomous driving industry [3].
Community and Learning Resources - The article promotes a community platform for knowledge sharing among industry professionals and academics, offering learning routes, technical discussions, and job opportunities in the autonomous driving field [8][9][18].
Technical Learning and Development - The community provides learning materials covering more than 40 technical directions in autonomous driving, including VLA, multi-modal models, and various simulation tools, aimed at both beginners and advanced practitioners [19][39].
Networking Opportunities - The platform offers networking with industry leaders and experts, letting members discuss trends, technologies, and career development in the autonomous driving sector [22][92].
As world models rise, the battle over AI's technical path flares up again
36Kr· 2025-11-20 01:58
Core Insights
- The future of AI may hinge on understanding the evolutionary code of the human brain, as highlighted by Yann LeCun's departure from Meta to focus on "world models" [1].
- Fei-Fei Li emphasizes that AI progress should pivot from merely expanding model parameters to embedding "spatial intelligence," a fundamental cognitive ability that humans possess from infancy [1][3].
- The launch of Marble by World Labs, which uses multimodal world models to create persistent 3D digital-twin spaces, marks a significant step toward spatial intelligence in AI [1].
Group 1: AI Development Perspectives
- Yann LeCun's vision diverges from Meta's focus on large language models (LLMs); he argues that LLMs cannot replicate human reasoning [3].
- LLMs are constrained by data quality and scale, and the resulting cognitive limitations hinder their ability to model the physical world and perform dynamic causal reasoning [3][4].
- Reliance on text data keeps AI inside a "symbolic cage"; true AI evolution requires a shift toward a structured understanding of the world [4].
Group 2: World Models vs. Large Language Models
- World models are seen as a solution to the fundamental limitations of LLMs, modeling the physical world directly from high-dimensional perceptual data [4][5].
- Key characteristics of world models include internal representation and prediction, physical cognition, and counterfactual reasoning [11].
- A complete world model consists of a state representation, a dynamics model, and a decision-making model, which together let AI simulate and plan actions in a virtual environment [12][13].
Group 3: Industry Trends and Innovations
- Major tech companies have recently advanced world models, with Google DeepMind's Genie series and Meta's Code World Model leading the charge [16].
- The concept of "physical AI" is gaining traction; Nvidia's CEO asserts that the next growth phase will come from these new models, which will revolutionize robotics [16].
- World models are already influencing sectors such as autonomous driving and robotics, as companies like Tesla integrate them for real-world learning and validation [17].
Group 4: Challenges and Future Directions
- Development faces technical challenges, including the need for extensive multimodal data and the lack of standardized training datasets [20].
- Cognitive challenges arise from the complexity of decision-making inside world models, raising concerns about transparency and alignment with human values [20][21].
- Despite these challenges, global competition in world models is intensifying, with the potential to redefine industries and enhance human-AI collaboration [21][22].
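The three-part decomposition above (state representation, dynamics model, decision-making model) can be illustrated with a deliberately tiny Python sketch. Everything in it is an assumption made for illustration: a one-dimensional environment, a linear dynamics model fitted by least squares, and a brute-force planner that searches over imagined rollouts. Real world models learn high-dimensional latent states from video and sensor streams, so this only shows how the three parts fit together.

```python
import itertools
import random


# Hypothetical "real" environment: x' = 0.9*x + 0.5*u.
# It is only used to log experience; the agent never reads these coefficients.
def true_step(x: float, u: float) -> float:
    return 0.9 * x + 0.5 * u


def encode(observation: dict) -> float:
    """State representation: map a raw observation to a compact state.
    Here the state is just the observed position (illustrative assumption)."""
    return observation["position"]


def fit_dynamics(transitions: list[tuple[float, float, float]]) -> tuple[float, float]:
    """Dynamics model: fit x' = a*x + b*u to logged (x, u, x') tuples by
    ordinary least squares, solving the 2x2 normal equations directly."""
    sxx = sum(x * x for x, _, _ in transitions)
    sxu = sum(x * u for x, u, _ in transitions)
    suu = sum(u * u for _, u, _ in transitions)
    sxy = sum(x * y for x, _, y in transitions)
    suy = sum(u * y for _, u, y in transitions)
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - suy * sxu) / det
    b = (suy * sxx - sxy * sxu) / det
    return a, b


def plan(x0: float, model: tuple[float, float], target: float,
         horizon: int = 3, actions: tuple[float, ...] = (-1.0, 0.0, 1.0)) -> float:
    """Decision-making model: roll every action sequence forward inside the
    learned model (no real interaction) and return the first action of the
    sequence whose imagined trajectory stays closest to the target."""
    a, b = model
    best_cost, best_first = float("inf"), actions[0]
    for seq in itertools.product(actions, repeat=horizon):
        x, cost = x0, 0.0
        for u in seq:
            x = a * x + b * u  # imagined step using the learned dynamics
            cost += (x - target) ** 2
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first


# Log 50 transitions from the real environment using random actions.
random.seed(0)
log = []
x = 0.0
for _ in range(50):
    u = random.choice([-1.0, 0.0, 1.0])
    x_next = true_step(x, u)
    log.append((x, u, x_next))
    x = x_next

model = fit_dynamics(log)
print(model)  # very close to (0.9, 0.5), because the logged data is noise-free
state = encode({"position": 0.0})
print(plan(state, model, target=2.0))
```

The planner never calls `true_step`. It only imagines rollouts with the fitted `(a, b)`, which is the sense in which a world model lets an agent plan in a virtual environment. Exhaustive search grows exponentially with the horizon, so practical systems use sampling-based or learned planners instead.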
Did the Turing Award winner "forget to mention" a Chinese scholar's work? Marcus hammers Yann LeCun
36Kr· 2025-11-19 11:19
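[Editor's note] Silicon Valley's latest clash of titans is here. On one side is an AI godfather said to have been right in his predictions for 40 years. On the other is deep learning's number-one critic, a maxed-out trash-talker who picks a fight with anyone. LeCun's departure from Meta may be only a preview of the turmoil coming to Silicon Valley AI. If one had to pick the AI world's recent "super earthquake," the report that Yann LeCun will leave Meta would certainly qualify. Turing Award winner, public opponent of LLMs, ascetic of world models, guardian of open-source models, full-time poster on X... Yann LeCun carries a great many labels. Recently the WSJ ran a somewhat laudatory piece on LeCun, claiming that "he has been right for 40 years." As an AI veteran on a par with Hinton, and a former colleague of his, LeCun arguably merits such praise. But both the article and LeCun himself drew strong disapproval from Marcus. Marcus's view is that we have all been misled by him for ten years and that he should not be deified. He even stated that the convolutional neural network (CNN) work on which Yann LeCun built his fame came later than a 1988 paper by Wei Zhang et al. Although LeCun played a role in the development of convolutional neural networks, he was neither their inventor nor the first researcher to apply backpropagation to training network weights (even though many people believe otherwise). If Yann LeCun is the "opposition" within the deep learning camp, opposing LLMs while holding fast to deep learning ...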
Is VLA hidden in FSD v14? Who is defining the next generation of autonomous driving: an in-depth discussion of VLA vs. WA...
自动驾驶之心· 2025-11-17 00:05
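This month, discussion of intelligent driving has been more intense than ever, and behind it lies one question: what should the next generation of autonomous driving look like? Today 自动驾驶之心 brings a heavyweight intelligent-driving roundtable that gathers diverse perspectives from academia and industry. The roundtable will discuss VLA and world models in depth. It covers their various forms, their progress toward industrial deployment, and the possibility of combining the two. Topics include the technical reports Tesla and Li Auto recently presented at ICCV, DriveVLA-W0, and the technology behind world models. Don't miss this deep and forward-looking exchange of ideas.
Featured speakers
Jiang Anqing - PhD from Waseda University; senior algorithm scientist at Bosch Corporate Research; team leader for VLA/closed-loop algorithm research.
Xu Lingyun - PhD from the Chinese Academy of Sciences; postdoctoral researcher at the Carnegie Mellon Robotics Institute. Has published 12 papers in top robotics journals and conferences and won the 2019 world championship of the DARPA SubT unmanned vehicle challenge. Research has focused mainly on object detection and tracking. From 2019 to 2024, concentrated on developing intelligent-driving algorithms and led several mass-production projects for driving and parking ...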
After Fei-Fei Li's world model went viral, our hands-on tests found it is still far from "truly usable"
深思SenseAI· 2025-11-14 12:40
Core Insights - The article discusses the launch of World Labs' "world model," which can create 3D worlds from a single image and a text prompt, highlighting both its potential and its limitations for generating immersive environments [1][19].
Group 1: Functionality and User Experience
- The world model can generate environments directly from text prompts or from an uploaded image; the latter yields better results [1].
- Early tests show impressive results in small-scale environments, but quality deteriorates significantly when the generated area is expanded [2][3].
- Quality and consistency drop noticeably as users move away from the original image, producing blurriness and distortion [4][5].
Group 2: Limitations and Challenges
- The model struggles to maintain detail and consistency in larger environments, resulting in sparse details and little immersive gameplay [5].
- The "world extension" feature, which lets users generate multiple worlds, still suffers from severe geometric distortion and abstract representations. It falls short of what playable environments need in practice [6][8].
- The multi-image generation feature often gets stuck while loading, a performance issue that hinders its use for building complex scenes [8][11].
Group 3: Market Position and Future Potential
- The article suggests that although the current version is not fully mature, it represents an early stage of AI-generated games and virtual spaces [19].
- The team's work on "spatial intelligence" is seen as significant, opening new possibilities for virtual-world construction and digital twins [19].
- Despite its limitations, the model is a notable starting point for spatial computing and content-production tools and merits continued attention in the coming years [19].
Is VLA hidden in Tesla's FSD? An in-depth discussion of VLA and world models next week
自动驾驶之心· 2025-11-14 00:04
Core Insights - The article discusses advances in autonomous driving technology, focusing on the Vision-Language-Action (VLA) framework and world models and highlighting the contributions of experts in the field [1][2][3][4][5].
Group 1: Key Contributors
- Jian Kun, a senior director at Li Auto, has built the company's autonomous driving technology stack from scratch since joining in 2021, reaching milestones such as Highway NoA in 2022 and City NoA in 2023 [1].
- Xu Lingyun, who holds a PhD from the Chinese Academy of Sciences, leads the parking team at Changan Automobile, focusing on autonomous driving perception and end-to-end system research [2].
- Jiang Anqing, a senior algorithm scientist at Bosch, leads research on VLA and closed-loop algorithms [3].
Group 2: Technological Developments
- The discussion asks whether world models and VLA can be integrated into a unified approach [8].
- The heavy demand for data and computing power makes it increasingly difficult for academia to contribute to intelligent driving, raising questions about future opportunities for academic research [8].
Group 3: Event Highlights
- A live discussion covers the future of autonomous driving technologies, including Tesla's FSD v14 and its implications for domestic technology [4][5].
- The event includes a deep dive into the reliability of VLMs in autonomous driving, with expert views on data closed-loop engineering [12].
Tencent Research Institute AI Express 20251113
腾讯研究院· 2025-11-12 16:08
Group 1: Generative AI Developments
- Meta's Chief AI Scientist Yann LeCun is leaving the company over strategic disagreements to focus on "world models" at a new startup [1].
- Google's AI model transcribed an 18th-century ledger with a character error rate of only 1.7%, demonstrating advanced abstract reasoning [2].
- ElevenLabs launched the Scribe v2 Realtime model, reaching 93.5% accuracy across 90 languages with a latency of just 150 milliseconds [3].
Group 2: AI in Communication and Music
- OpenAI is set to introduce a group chat feature for ChatGPT that lets users share conversation links while preserving their privacy [4].
- An AI-generated song topped the Billboard country digital singles chart, raising concerns about competition between AI and human artists [5].
Group 3: Investment and Financing in AI
- The AI company Jiga Vision completed a financing round of over 100 million yuan, with investment from Huawei and other funds [6].
- Gamma, an AI presentation tool, raised $68 million in Series B funding at a valuation of $2.1 billion, with annual recurring revenue of $100 million [9].
Group 4: Programming Language Trends
- TypeScript has surpassed Python as the most widely used programming language on GitHub, with a 66% year-over-year increase in contributors [8].
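The 1.7% figure above is a character error rate (CER). Under the usual definition, CER = (S + D + I) / N. S, D, and I are the character substitutions, deletions, and insertions needed to turn the model's transcription into the reference, and N is the number of reference characters. A CER of 1.7% therefore means roughly 17 character errors per 1,000 reference characters. The digest does not say exactly how the ledger transcription was scored, so the Python sketch below only illustrates the standard metric. It assumes plain Levenshtein distance with no text normalization.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character substitutions, deletions and
    insertions needed to turn hyp into ref (dynamic programming, two rows)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion: ref char missing from hyp
                curr[j - 1] + 1,         # insertion: extra char in hyp
                prev[j - 1] + (r != h),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


print(levenshtein("kitten", "sitting"))     # 3
print(cer("ledger entry", "ledgar entry"))  # one substitution over 12 characters
```

Published CER figures often normalize case, whitespace, or historical spellings before scoring, so two evaluations of the same model can report different CERs.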
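Jinqiu Fund portfolio company Manifold AI raised 100 million yuan in three months, proving that world models also need pre-training | Jinqiu Spotlight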
锦秋集· 2025-11-12 12:44
Core Insights - The article discusses the emergence and potential of world models in AI, focusing on the company Manifold AI and its CEO Wu Wei's vision of a robust world model that can understand and predict the physical world [7][10][22].
Investment and Company Overview
- Jinqiu Fund has invested in Manifold AI, which raised over 100 million yuan in seed and angel rounds within three months of its founding [4][6].
- Jinqiu Fund emphasizes a long-term investment philosophy, seeking breakthrough technologies and innovative business models among general artificial intelligence startups [5].
Technology and Market Trends
- World models are gaining traction, with extensive discussion in Silicon Valley about their capabilities, including generative, multimodal, and interactive features [8][9].
- Wu Wei argues that world models can offer better predictive capabilities than Vision-Language-Action (VLA) models, which are limited by their reliance on past experience [18][22].
Technical Development and Challenges
- World model development is still at an early stage, with several approaches being explored, including explicit physical modeling and latent-space interaction [25][30].
- Manifold AI aims to build an embodied world model that can transfer and unify across different scales, in contrast to the top-down strategies of many international teams [33].
Strategic Focus and Market Positioning
- Manifold AI prioritizes robotics and drones over autonomous driving because these markets are fragmented, leaving more room for innovation [43][44].
- The company focuses on giving hardware autonomous reasoning capabilities rather than relying on human-controlled operation [46].
Future Goals and Product Development
- The company plans to release its first generation of base models built on its World Model Architecture (WMA) between late 2025 and early 2026, aiming to advance Physical AI Agents [51].
- Wu Wei stresses pre-training models to understand physical-world dynamics, which can significantly reduce deployment costs [37][40].
Could world models bring the next "singularity moment" for robotics and embodied intelligence?
机器人大讲堂· 2025-11-09 15:30
Core Viewpoint - 2023 is regarded as the "Year of Large Models," and 2025 is expected to be the eve of an explosion in world models. These models are reshaping the core logic of embodied intelligence and pushing the robotics industry toward higher-level intelligence with environmental cognition and proactive decision-making [1].
World Model Definition and Characteristics
- The world model marks a significant advance over traditional robotic frameworks, which follow a linear "perception-decision-control" chain. By building a high-dimensional cognitive model of the real world, it lets robots understand, predict, and plan, reasoning proactively instead of merely executing commands [2][4].
- Its capabilities rest on three forms of internalization: spatial (turning 2D data into a 3D semantic space), rule (learning basic physical rules), and temporal (integrating historical and real-time data for continuous understanding) [3].
Development and Application of World Models
- The concept has evolved over more than three decades. It began with Richard S. Sutton's Dyna architecture in 1990, which integrated learning, planning, and reacting and laid the theoretical groundwork for applications in robotics [7].
- The move toward practical applications began in 2018 with the "World Models" paper, which used deep learning to demonstrate the potential of world models in complex dynamic environments [9].
- Since 2019, advances in computing power and multimodal technology have accelerated world model development. They have brought world models into real-world applications such as Tesla's Full Self-Driving (FSD) system and Xiaopeng Motors' training environments [10].
Impact on the Robotics Industry
- Industrializing world models addresses key challenges of traditional robotics, such as data scarcity and high training costs. For instance, world models can generate large numbers of virtual scenarios from minimal real data, significantly reducing training expenses [12].
- World models enable large-scale training scenarios and comprehensive testing across diverse conditions, improving the safety and reliability of robotics applications [13][15].
- The cognitive leap provided by world models lets robots make human-like decisions, improving their adaptability in complex environments and expanding their application value [15].
Challenges in Industrialization
- Memory and generalization still need to improve before world models can handle long-duration tasks in complex environments [16].
- Fundamental gaps remain between simulation and reality, particularly in texture, dynamic consistency, and non-deterministic events, which can degrade performance in real-world deployment [18].
- Ethical considerations, such as decision-making transparency and data privacy, become critical as world models grow more complex [18].
Future Trends
- Integrating world models with multimodal technologies is expected to improve robots' environmental understanding and prediction, leading to more reliable and generalizable performance [19].
- End-to-end solutions centered on world models will reduce reliance on hand-written rules and high-precision maps, streamlining development [21].
- A shift toward cloud-edge collaborative computing will support large-scale scenario simulation and model training, optimizing performance and reducing deployment costs [21].
Conclusion - The development of world models marks a transformative shift in the robotics industry, addressing long-standing challenges and redefining the technological landscape. By 2030, the market for robots equipped with world models is projected to exceed 3 trillion yuan, with contributions from various sectors [22].
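The Dyna architecture credited above to Sutton (1990) is a classic illustration of the idea. An agent learns a model from a small amount of real experience and then plans with simulated experience drawn from that model. The sketch below is tabular Dyna-Q, the standard textbook variant, run on a hypothetical six-state corridor. The environment, hyperparameters, and names are illustrative assumptions, not anything from the article. Modern world models replace the lookup-table model with a learned neural simulator.

```python
import random

N_STATES = 6          # corridor states 0..5; state 5 is the goal
ACTIONS = (-1, +1)    # move left or move right


def env_step(s: int, a: int) -> tuple[float, int, bool]:
    """Hypothetical deterministic environment: returns (reward, next_state, done)."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    done = s2 == N_STATES - 1
    return (1.0 if done else 0.0), s2, done


def dyna_q(episodes=30, planning_steps=10, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    model = {}  # learned tabular world model: (s, a) -> (reward, next_state, done)

    def greedy(s):
        best = max(q[(s, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if q[(s, a)] == best])  # random tie-break

    def update(s, a, r, s2, done):
        # One-step Q-learning backup; terminal transitions have no future value.
        target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(ACTIONS) if rng.random() < eps else greedy(s)
            r, s2, done = env_step(s, a)      # (1) act in the real environment
            update(s, a, r, s2, done)         # (2) direct RL from real experience
            model[(s, a)] = (r, s2, done)     # (3) model learning
            for _ in range(planning_steps):   # (4) planning on simulated experience
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s = s2
    return q


q = dyna_q()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # greedy action per non-terminal state; +1 (move right) is optimal everywhere
```

Raising `planning_steps` replays each real transition more often, which is the sense in which a learned model stretches limited real data. The model can still only replay what it has actually observed, so it does not remove the need for real data.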
Where is intelligent driving headed? A record of the first autonomous driving roundtable
自动驾驶之心· 2025-11-06 00:04
Core Insights
- The article discusses the evolution and current state of the autonomous driving industry, highlighting the experiences and lessons of industry experts [4][7][11].
- It emphasizes strategic execution and the need for companies to eliminate operational weaknesses to succeed in the competitive autonomous driving landscape [7][11].
Group 1: Industry Evolution
- The autonomous driving industry has changed significantly over the past decade, with early optimism giving way to more realistic approaches focused on Level 2 (L2) automation and safety [5][6].
- Experts reflect on the initial hype around RoboTaxi and the later shift toward practical applications and L2 production, a more commercially viable direction for the industry [6][7].
Group 2: Key Challenges and Lessons
- The industry has faced three major challenges: the abandonment of RoboTaxi, ensuring the safety of L2 systems, and the transition to mass production [7].
- Successful autonomous driving companies need strong strategic execution and no operational weak spots, because the delivery chain for autonomous driving products is long and complex [7][11].
Group 3: Technological Perspectives
- The discussion covers VLA (Vision-Language-Action) and world models, highlighting how they complement each other in addressing challenges in autonomous driving [8][10].
- Experts agree that advances in AI and the integration of new technologies will continue to shape autonomous driving, with a focus on balancing innovation and safety [10][11].
Group 4: Future Opportunities
- Experts agree that the industry still has significant growth potential, particularly in urban navigation and in bringing academic research into practical applications [11].
- AI coding tools are seen as helping engineers focus on core algorithmic challenges rather than eroding the industry's competitive edge [11].