World Models
The "World Model" Race Escalates: Runway Launches GWM-1, with Real-Time Interaction Sustained for Several Minutes
Hua Er Jie Jian Wen · 2025-12-13 10:36
Core Insights
- Runway has launched its first General World Model (GWM-1), entering the competitive "world simulation" arena dominated by giants like Google and Nvidia [1]
- GWM-1 is designed to understand physical laws, geometric structures, and environmental dynamics, focusing on "coherence" and "interactivity" [1]
- The model consists of three specialized autoregressive models: GWM-Worlds, GWM-Robotics, and GWM-Avatars, all built on Runway's latest Gen-4.5 base model [3]

GWM-Worlds
- GWM-Worlds allows users to interactively explore digital environments, predicting the next frame based on user inputs [4]
- It generates environments at 24 fps and 720p resolution, enabling real-time changes to camera angles and environmental conditions [4]
- The model aims to provide a training ground for AI agents learning to navigate and act in the physical world [4]

GWM-Robotics
- GWM-Robotics addresses data scarcity in robotics by generating high-quality synthetic data for varied environmental scenarios [6]
- This approach significantly reduces training costs and helps predict compliance risks before robots are deployed in real-world settings [6]
- Runway is actively engaging with robotics companies and offering GWM-Robotics through an SDK to expand its B2B industrial client base [6]

GWM-Avatars
- GWM-Avatars integrates video generation with voice interaction, enabling digital avatars to hold long-duration conversations without quality loss [8]
- If successful, this technology could disrupt the customer service and online education sectors [8]

Base Model Evolution and Computational Power
- Runway has upgraded its Gen-4.5 model to strengthen its video generation capabilities, supporting one-minute video generation with consistent character portrayal and native dialogue [10]
- The company has partnered with CoreWeave to use Nvidia's cloud infrastructure for model training and inference, addressing the computational demands of world simulation [10]

Strategic Expansion
- Runway's strategy is rapidly evolving from a creative tool for film toward a simulator for robotics, but it faces stiff competition from established players like Google and Nvidia [11]
- Proving its worth as an "AI architect" for the physical world will be crucial to Runway's valuation and future growth [11]
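The interactive loop described above (predict the next frame from the current frame plus a user input, then feed the prediction back in) is the core of any autoregressive world model. A minimal sketch of that control flow; every name here (`ToyWorldModel`, `Frame`, `rollout`) is a hypothetical illustration, not Runway's actual API:

```python
from dataclasses import dataclass

# Toy sketch of an autoregressive interactive world model loop.
# Nothing here reflects Runway's actual GWM-1 API; names are illustrative.

@dataclass
class Frame:
    index: int
    camera_yaw: float  # camera heading in degrees

class ToyWorldModel:
    """Predicts the next frame from the current frame and a user action."""

    def step(self, frame: Frame, action: str) -> Frame:
        # A real model would run a learned network here; this toy applies
        # a trivial rule: "turn_left"/"turn_right" nudge the camera.
        delta = {"turn_left": -5.0, "turn_right": 5.0}.get(action, 0.0)
        return Frame(index=frame.index + 1, camera_yaw=frame.camera_yaw + delta)

def rollout(model: ToyWorldModel, start: Frame, actions: list[str]) -> list[Frame]:
    """Autoregressive rollout: each predicted frame conditions the next step."""
    frames = [start]
    for action in actions:
        frames.append(model.step(frames[-1], action))
    return frames

frames = rollout(ToyWorldModel(), Frame(0, 0.0), ["turn_left", "turn_left", "turn_right"])
print(len(frames), frames[-1].camera_yaw)  # prints: 4 -5.0
```

The point of the loop is that user input enters at every step, which is what distinguishes an interactive world model from one-shot video generation.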
GAIR 2025 World Model Sub-Forum: From General Perception to Video and Physical World Models, a Hundred Schools Contend
雷峰网 · 2025-12-13 09:13
"In the third year of the embodied intelligence boom, what consensus has formed around world models?"

By Zhang Jin, Wu Tong, Liang Bingjian, Liu Xin, Qi Chengyong. Edited by Lin Juemin, Ma Xiaoning.

On December 13, the World Model sub-forum of the 8th GAIR Global Artificial Intelligence and Robotics Conference concluded successfully. The speakers were five young scholars working on different directions within the world model field, who delivered five talks centered on world models, covering general perception, 3D technology, physical models, world models, and digital human reconstruction. Through their talks we got a glimpse of how broad and rich current research around world models has become.

World model research is still in its early stages and no consensus has yet formed; work in the area has branched into countless streams. Within that current, the guests present today brought their own insights to world model research.

Zhejiang University researcher Peng Sida: General Spatial Perception Technology for Embodied Intelligence

The first speaker at the "World Model" sub-forum was Zhejiang University researcher Peng Sida, a "Hundred Talents Program" researcher and doctoral advisor at Zhejiang University's School of Software, whose research covers 3D computer vision and computer graphics. His keynote, "General Spatial Perception Technology for Embodied Intelligence," introduced his team's recent work on endowing robots with general perception capabilities. The team focuses on giving robots three foundational capabilities: first, camera pose estimation (Camera Pose Estimation) ...
CUHK-Shenzhen's Han Xiaoguang: 3DGen, the Battle for Humans' Sense of Security | GAIR 2025
雷峰网 · 2025-12-13 09:13
Core Viewpoint
- The article discusses the importance of understanding the underlying principles of world models, emphasizing that relying solely on data-driven trial and error ("炼丹", literally "alchemy") is insufficient for creating effective AI systems. It advocates integrating human-understandable structures and logic into AI models to enhance their interpretability and reliability [2][63]

Group 1: Development of 3D Generation
- 3D generation has evolved from early attempts at creating 3D models from single images to the current era of large models capable of generating high-quality 3D content from textual descriptions [7][16]
- "Open world" 3D generation emerged around 2023 with the Dreamfusion project, which allowed generation of 3D models without category restrictions, marking a significant shift in the field [11][12]
- Current trends in 3D generation focus on achieving finer details, structured outputs for easier editing, and better alignment between generated models and input images [19][20]

Group 2: Challenges and Opportunities in 3D Generation
- The 3D generation field faces a dilemma in light of advances in video generation technologies, which can produce content without complex 3D modeling processes [24][28]
- Despite the rise of video generation, 3D content creation retains its value through physical realism, spatial consistency, and detailed control over content [29][34]
- The potential crisis for 3D generation lies in the growing capabilities of video generation models, which are beginning to exhibit controllable features, raising questions about the necessity of 3D in future content creation [34][38]

Group 3: The Role of 3D in World Models
- The article categorizes world models into three types: macro models for societal understanding, personal experience models for exploration, and embodied models for machine intelligence, with 3D essential for interactive virtual environments [43][44][45]
- For embodied intelligence, understanding how humans interact with the physical world necessitates 3D modeling to accurately capture and simulate those interactions [48][50]
- The transition from digital designs to physical manufacturing processes, such as 3D printing, underscores the foundational role of 3D data in creating tangible products [52]

Group 4: Technical Approaches in AI
- The article contrasts explicit and implicit approaches in AI development: explicit methods rely on clear geometric and physical modeling, while implicit methods depend on data-driven neural networks [56][57]
- The need for explainability in AI systems is emphasized, suggesting that a balance between performance and interpretability is crucial for user trust and safety [58][63]
- The discussion concludes that 3D and 4D modeling are vital for giving complex AI systems a comprehensible framework, thereby enhancing user confidence [59][63]
He Xiaopeng Makes a "Bet": Match Tesla FSD's Performance by the End of Next August
Mei Ri Jing Ji Xin Wen · 2025-12-13 06:46
Core Viewpoint
- Xiaopeng Motors is set to release its VLA 2.0 (Vision-Language-Action) model next quarter, with significant pressure on this first version [1]
- Xiaopeng's chairman placed a bet with the autonomous driving team around matching Tesla's FSD V14.2 performance by August 30, 2026 [1]

Group 1: VLA Model and Industry Perspectives
- The VLA model is seen as an advanced end-to-end solution, integrating visual perception (V), action execution (A), and a language model (L) to enhance decision-making and environmental understanding [5][11]
- The industry has shifted from relying on LiDAR and high-precision maps toward AI-driven models like VLA, with a notable divergence in development paths emerging by 2025 [4][11]
- Li Auto's VP emphasized the importance of real-world data over model architecture, asserting that VLA is the best solution given the company's extensive data collection from millions of vehicles [6][8]

Group 2: Diverging Technical Approaches
- Huawei's approach centers on the World Action (WA) model, which bypasses the language-processing step and aims for direct control from visual inputs [8][10]
- The World Model concept allows AI systems to simulate the physical world, enhancing predictive capabilities and decision-making in autonomous driving [9][11]
- Companies like NIO and SenseTime are also exploring the World Model approach, indicating a broader industry trend [10]

Group 3: Future Integration and Evolution
- There is a growing trend toward integrating VLA and World Models; the two technologies are complementary rather than mutually exclusive [11][12]
- Xiaopeng's second-generation VLA model aims to combine VLA and World Model functionality, enhancing data training and decision-making processes [14][15]
- The automotive industry anticipates further iterations in autonomous driving technology architecture over the next few years, potentially stabilizing by 2028 [15]
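The V (visual perception), L (language reasoning), A (action execution) decomposition described above can be pictured as a three-stage pipeline. A toy sketch under that reading; the rule-based functions below are hypothetical stand-ins for what would be learned networks in a real VLA stack, and none of this reflects any automaker's implementation:

```python
# Toy sketch of the Vision-Language-Action (VLA) decomposition described
# above. All names are hypothetical illustrations, not any vendor's API.

def vision_encoder(scene: dict) -> dict:
    """V: compress raw perception into a symbolic scene summary."""
    return {"obstacle_ahead": scene["lidar_min_m"] < 10.0,
            "sign": scene.get("sign", "none")}

def language_reasoner(percept: dict) -> str:
    """L: produce a human-readable reasoning step (the 'chain' VLA adds)."""
    if percept["sign"] == "stop" or percept["obstacle_ahead"]:
        return "hazard ahead: slow down and prepare to stop"
    return "road clear: maintain speed"

def action_head(plan: str) -> dict:
    """A: map the reasoned plan to a control command."""
    if "stop" in plan:
        return {"throttle": 0.0, "brake": 0.6}
    return {"throttle": 0.3, "brake": 0.0}

def vla_step(scene: dict) -> dict:
    """One end-to-end tick: perception -> reasoning -> control."""
    return action_head(language_reasoner(vision_encoder(scene)))

print(vla_step({"lidar_min_m": 4.2, "sign": "stop"}))  # a braking command
```

The middle stage is what distinguishes VLA from the "world model" camp described below it: Huawei's WA approach skips `language_reasoner` entirely and maps perception to action directly.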
He Xiaopeng Makes a "Bet": Match Tesla FSD's Performance by the End of Next August! A Li Auto Executive Responds to Unitree Wang Xingxing's Doubts; Is the VLA That Multiple Automakers Are Betting On Reliable?
Mei Ri Jing Ji Xin Wen · 2025-12-13 06:31
Core Viewpoint
- Xiaopeng Motors is set to release its VLA 2.0 (Vision-Language-Action) model next quarter, with significant pressure on its development as the first version [1]

Group 1: VLA Model Development
- Xiaopeng's chairman, He Xiaopeng, has made a special bet with the autonomous driving team, promising to establish a Chinese-style cafeteria in Silicon Valley if the VLA system matches Tesla's FSD V14.2 performance by August 30, 2026 [3]
- The VLA model is seen as an advanced end-to-end solution, integrating visual perception, action execution, and language processing to enhance decision-making capabilities [7][12]
- The VLA model aims to overcome the limitations of traditional models by incorporating a reasoning chain through language models, improving its adaptability to complex driving environments [7][12]

Group 2: Industry Perspectives
- The industry diverges on the development paths of VLA versus world models, with companies like Li Auto and Xiaopeng favoring the VLA approach [6][12]
- Li Auto's VP, Lang Xianpeng, emphasizes the importance of real-world data in developing effective autonomous driving systems, arguing that the VLA model is superior due to its data-driven approach [8][9]
- Huawei and other companies are pursuing a world model approach, which aims for direct control from visual inputs without intermediary language processing [9][10][11]

Group 3: Future Integration and Trends
- Despite differing opinions, VLA and world models are not mutually exclusive and may increasingly integrate as both technologies evolve [12][17]
- Autonomous driving technology is expected to see further iterations and stabilization by 2028, with a potential convergence of VLA and world-model methodologies [17]
As 2026 Approaches, Have World Models Actually Become More "World"?
机器之心 · 2025-12-13 02:30
Core Viewpoint
- The recent launch of GWM Worlds and GWM Robotics by Runway pushes video generation toward an interactive "world simulation" paradigm, reigniting discussion of the definition and scope of "world models": interfaces for creation and interaction, simulators for training and evaluation, or cognitive frameworks for reasoning and decision-making [1]

Group 1: Evolution of World Models
- Over the past two years, world models have come to be considered on par with LLMs in the AGI landscape, moving from a narrow definition rooted in reinforcement learning to a broader understanding that includes generative modeling [4]
- Initially, world models were seen as an agent's internal model of its environment, predicting future states from current conditions and actions to allow internal simulation and decision-making [5]
- From an engineering perspective, a world model combines three capabilities: compressing high-dimensional perception into usable representations, predicting future states over time, and using those predictions for planning and decision-making [6]
- By 2024, the understanding of world models had expanded to encompass general modeling of world evolution, with a trend from language generation to image generation and ultimately to 3D and world generation [6]
- The boundaries of the world model concept have become more ambiguous, with ongoing debate about the nature of representations, the incorporation of physical laws, and the organization of input relationships [6]

Group 2: Industry Layout and Trends
- Major companies are investing in world models, raising the question of whether they are enhancing their "data engines" or building new frameworks for "spatiotemporal cognition" [3]
- In February 2024, OpenAI described the video generation model Sora as a "world simulator," emphasizing its ability to learn the three-dimensional structure and physical laws of the real world [6]
- Concurrently, LeCun introduced V-JEPA, which predicts masked video segments in an abstract representation space, achieving higher training efficiency by discarding unpredictable information [6]
- The discourse has shifted from whether to develop world models to how to model them, with debate over whether to abstract up from the pixel level or operate directly in abstract spaces [7]
- There is recognition that existing approaches may capture only partial physical laws, indicating a need for representations of isolated objects and a priori laws of change across space and time to achieve a coherent world model [7]

Group 3: Definition and Ambiguity of World Models
- By 2025, world models are positioned alongside LLMs, with companies like Google DeepMind, Meta, and Nvidia shifting focus from pure LLMs to world models, pursuing "Physical AI + superintelligence" amid stagnating LLM progress [8]
- What distinguishes world models from existing generative AI is the goal of constructing internal representations of environments that include physical, temporal, and spatial dimensions for planning and decision-making [9]
- The term "world model" has become ambiguous, referring variously to latent states within systems, game-like simulators for training agents, or any content pipeline that can generate navigable 3D scenes [9]
- A November 2025 analysis from Entropy Town categorized world models into three technical routes: interface, simulator, and cognitive framework, highlighting the field's ongoing ambiguity [9]
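The three engineering capabilities listed above (compress perception into a representation, predict future states, plan over those predictions) can be sketched as a minimal loop. This is a generic illustration of the recipe, not any particular paper's architecture; all names and numbers are toy choices:

```python
# Toy illustration of the three world-model capabilities named above:
# (1) compress observations, (2) predict future states, (3) plan by
# simulating candidate actions internally. All names are illustrative.

def encode(observation: list[float]) -> float:
    """(1) Compress a high-dimensional observation into a compact state."""
    return sum(observation) / len(observation)

def predict(state: float, action: float) -> float:
    """(2) Predict the next state from the current state and an action."""
    return state + action  # stand-in for a learned dynamics model

def plan(state: float, candidate_actions: list[float], goal: float) -> float:
    """(3) Choose the action whose simulated outcome lands nearest the goal,
    without ever acting in the real environment."""
    return min(candidate_actions, key=lambda a: abs(predict(state, a) - goal))

obs = [0.5, 1.5, 1.0]            # raw "perception"
state = encode(obs)              # compact state: 1.0
best = plan(state, [-1.0, 0.0, 2.0], goal=3.0)
print(best)  # prints: 2.0
```

The decision is made entirely inside the model's imagined futures, which is the "internal simulation" idea the article attributes to the original agent-centric definition of a world model.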
GAIR 2025 Officially Opens: As the AI Transformation Sails into the Industry's Deep Sea, How Will We Break Through the Dark and Find the Light?
雷峰网 · 2025-12-12 02:49
"Amid the tides of models and compute, sparks of intelligence are gathering into an industrial wave. Watch how AI redraws the vast landscape of industry ecosystems."

By Xu Xiaofei. Edited by Bao Yonggang.

On December 12, Shenzhen, like thousands of cities across the world, lay dormant on the eve of an explosion in the intelligence industry, and a gathering of frontier insight was breaking ground there. At the critical juncture where large-model technology pushes deep into industrial transformation, the 8th GAIR Global Artificial Intelligence and Robotics Conference officially opened at the Sheraton Shenzhen Bolin Tianrui Hotel.

The conference features four themed forums and two closed-door sessions, tracking innovation in large models, AI compute, world models, data and "one brain, many forms," and AI hardware. This is the GAIR conference's eighth year, and another resonance of thought and recalibration of direction for China's AI academic, industrial, and investment communities amid the current technological transformation.

The ancients said that to pluck the pearl from beneath the black dragon's chin, one must carry a torch into the deep sea. The same holds for today's AI large-model industrial transformation: the wave has moved from the "breaking technical walls" of a few years ago into a stage of "deep value cultivation," ever more like the pearl beneath the deep-sea dragon's chin, out of reach for those who float in the shallows.

The GAIR conference, founded in 2016, is that torch into the deep: eight years of cultivation and a flame passed on, gathering the foremost thinking of forward-looking scholars and industry pioneers, illuminating both the arduous road of global AI practitioners and the sweeping journey of the intelligent era from germination to flourishing. To this day, the GAIR conference ...
SenseTime AI Forum Explores Future Paradigms of Intelligence as Visual AI Enters a Second Growth Curve
Group 1
- The "2025 SenseTime Technology AI Forum" was successfully held, focusing on key topics such as breakthroughs in multimodal large models, embodied intelligence, and industrial intelligence upgrades [1]
- SenseTime CEO Xu Li emphasized that the past decade has seen rapid change in AI cognition, a significant technological wave reshaping work across industries [1]
- SenseTime aims to leverage Hong Kong's favorable innovation and technology environment to connect national AI strategies with global innovation networks, serving both local and international markets [1]

Group 2
- SenseTime co-founder and Chief Scientist Lin Dahua highlighted challenges in AI industrialization: reliability, professional data, spatial understanding, and cost [1]
- SenseTime is innovating through foundational technologies such as a natively multimodal fusion architecture and high-efficiency reasoning systems to enhance spatial cognition and real-time interaction capabilities [1]
- The forum also discussed how AI can drive deep paradigm shifts in enterprises; SenseTime's Asia-Pacific business serves nearly 500 clients, 70% of whom maintain long-term partnerships [2]

Group 3
- Wang Xiaogang, SenseTime co-founder and Chairman of Daxiao Robotics, announced the upcoming December 18 launch of Daxiao Robotics, introducing leading technologies and the first domestic open-source "KAIWU" world model 3.0 [2]
- The forum emphasized integrating "model-hardware-scene" ecosystems to promote breakthroughs in embodied intelligence across applications from industrial manufacturing to home companionship [2]
- SenseTime's Visual AI 2.0, empowered by large language models, turns real-time video analysis into actionable solutions, marking a new growth phase for visual AI [2]
The Tables Turned: Meta Copies Alibaba's Tongyi Qianwen Homework, Without Authorization
36Kr · 2025-12-11 11:51
Core Insights
- Meta has brought in Alibaba's Tongyi Qianwen (Qwen) model to fine-tune its next-generation AI model "Avocado," which aims to compete with GPT-5 and is set for release in Q1 2026 [1][2]
- The shift from open source to closed source for the Avocado model raises ethical concerns, as it uses an open-source model for training but will charge for access [4]

Group 1: Meta's AI Strategy
- Meta's flagship AI model "Avocado" is seen as a critical response to the underperformance of Llama 4, which has widened the gap between Meta and competitors like OpenAI and Google [2]
- The Avocado model will transition from the open-source Llama series to a proprietary model, available only through an API and managed services [4]
- Meta's new AI leadership under Alexandr Wang, who was brought in with a significant investment, is now responsible for the closed-source AI development [6][7]

Group 2: Leadership Changes and Industry Dynamics
- Yann LeCun, the founding figure of Meta AI, recently left the company, raising questions about the continuity of Meta's AI vision [5]
- Alexandr Wang, despite his youth and relative inexperience compared to industry veterans, has been given significant authority over Meta's AI initiatives [7]
- The competitive landscape is shifting, with Chinese AI models gaining traction globally, as evidenced by Southeast Asia's move from Meta's Llama to Alibaba's Tongyi Qianwen [8][9]

Group 3: Future Projections
- Predictions suggest that in ten years the global AI market may see a dual dominance of Chinese and American technologies, with China's market share potentially rising from 30% to 40-45% [9]
- Competitive dynamics may lead developing regions to adopt Chinese AI technologies for cost-effectiveness, while wealthier nations prefer American solutions for data privacy and ethical considerations [9]
Intelligent Driving in 2025: Regulators Hit the Brakes, Technology Races Ahead, and the "Di-Da-Hua-Mo" Four (Horizon, DJI, Huawei, Momenta) Vie for Supremacy
36Kr · 2025-12-11 09:55
Core Insights
- The automotive industry in 2025 has seen a significant shift toward safety and responsibility, moving away from exaggerated claims about autonomous driving technology [1][3]
- China's Ministry of Industry and Information Technology has banned the term "autonomous driving," leading car manufacturers to portray the technology more realistically [3][5]

Industry Developments
- The narrative around autonomous driving has changed, with companies now speaking of "assisted driving" and "intelligent driving assistance" instead of "autonomous driving" [3][5]
- The industry is characterized by two main trends: advancement of the technology and democratization of intelligent driving [5][11]

Key Players and Innovations
- Xiaopeng Motors has introduced a second-generation VLA model that eliminates the "middleman" in the translation process, allowing machines to understand physical environments directly [6][7]
- BYD launched the "Tian Shen Zhi Yan" high-level intelligent driving system, targeting the 100,000-yuan market with various versions, including features like highway NOA and automatic parking [11][13]
- Geely has also entered the market with its own intelligent driving system, offering multiple versions with varying capabilities [11][13]

Competitive Landscape
- Tesla's role has evolved; Chinese companies no longer view it as the sole leader in intelligent driving technology [13][14]
- Horizon Robotics has gained traction with its end-to-end architecture, aims to make urban NOA widely available, and has achieved significant market share in the autonomous driving sector [19][21]
- DJI's subsidiary, Zhuoyue Technology, has focused on practical applications and made strides in the European market, showcasing its capabilities in urban NOA [22][24]

Strategic Collaborations
- Huawei has formed numerous partnerships across the automotive industry, providing comprehensive intelligent driving solutions to various manufacturers [25][28]
- Momenta has significantly expanded its collaboration network, working with multiple brands to implement its driving-assistance solutions [29][31]

Challenges and Future Outlook
- Despite the advances, the industry faces challenges around user trust and the potential misuse of autonomous driving systems [33][34]
- Intelligent driving technology is expected to keep evolving, with a focus on making it accessible to a broader market while addressing safety and ethical concerns [35][36]