World Models
Experts: Large-Scale Deployment of Embodied Intelligence Is Still at an Early Stage
Zhong Guo Xin Wen Wang· 2025-12-13 12:33
Core Insights
- Embodied intelligence has achieved breakthroughs in both cognitive and physical intelligence, but large-scale deployment is still at an early stage [1][2]
- The future direction of embodied intelligence is characterized by ongoing competition and rapid evolution [1]

Group 1: Key Issues in the Industry
- The first core issue is the debate over model pathways: whether large-model paradigms apply to robotics. Large models have succeeded in the language, image, and video domains, but it remains unproven that the same paradigm transfers directly to robot control [1]
- The second core issue is the contention over data training paradigms. Data remains the critical bottleneck limiting the leap in robotic capabilities, with approaches such as mixed data, multimodal data, and world-model-generated data all being explored [1]
- The third core issue is the debate over robot form factors: whether humanoid robots represent "true demand." Companies like Tesla and Figure AI are pursuing a fully humanoid approach, while several Chinese companies have introduced "wheel-arm composite robots" this year, emphasizing "engineering feasibility" for scalable commercial applications in the short term [1]

Group 2: Future Development Paths
- The industry agrees that using large models to enhance robots' generalization capabilities is essential, but effective application of large models in robotic systems still spans multiple technical pathways [2]
- Looking ahead, world models built on vision-language-action (VLA) models are expected to significantly enhance large models' capabilities in robotics by leveraging their understanding, prediction, and reasoning about the physical world [2]
The "World Model" Race Escalates: Runway Launches GWM-1, Sustaining Real-Time Interaction for Several Minutes
Hua Er Jie Jian Wen· 2025-12-13 10:36
Core Insights
- Runway has launched its first General World Model (GWM-1), entering the "world simulation" arena dominated by giants like Google and Nvidia [1]
- GWM-1 is designed to understand physical laws, geometric structures, and environmental dynamics, with a focus on "coherence" and "interactivity" [1]
- The release comprises three specialized autoregressive models, GWM-Worlds, GWM-Robotics, and GWM-Avatars, all built on Runway's latest Gen-4.5 base model [3]

GWM-Worlds
- GWM-Worlds lets users interactively explore digital environments, predicting the next frame from user inputs [4]
- It generates environments at 24 fps and 720p resolution, enabling real-time changes to camera angles and environmental conditions [4]
- The model aims to provide a training ground for AI agents learning to navigate and act in the physical world [4]

GWM-Robotics
- GWM-Robotics addresses data scarcity in robotics by generating high-quality synthetic data for varied environmental scenarios [6]
- This approach significantly reduces training costs and helps predict compliance risks before robots are deployed in real-world settings [6]
- Runway is actively engaging with robotics companies and offering GWM-Robotics through an SDK to expand its B2B industrial client base [6]

GWM-Avatars
- GWM-Avatars integrates video generation with voice interaction, enabling digital avatars to hold long conversations without quality loss [8]
- If successful, this technology could disrupt the customer service and online education sectors [8]

Base Model Evolution and Computational Power
- Runway has upgraded its Gen-4.5 model to strengthen video generation, supporting one-minute videos with consistent characters and native dialogue [10]
- The company has partnered with CoreWeave to use Nvidia's cloud infrastructure for model training and inference, addressing the computational demands of world simulation [10]

Strategic Expansion
- Runway's strategy is rapidly evolving from a creative tool for film toward a simulator for robotics, but it faces stiff competition from established players like Google and Nvidia [11]
- Proving its worth as an "AI architect" for the physical world will be crucial to Runway's valuation and future growth [11]
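The interaction model GWM-Worlds implies, an autoregressive predictor that emits the next frame conditioned on the frame history plus the latest user input, can be sketched abstractly. Everything below (the class names, the yaw-only camera state, the 24 fps loop) is a hypothetical stand-in for illustration, not Runway's actual API:

```python
# Toy sketch of an interactive next-frame world model loop.
# Each step conditions on the frame history and the latest user input.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Frame:
    index: int
    camera_yaw: float  # degrees; the only piece of "world state" in this toy

@dataclass
class ToyWorldModel:
    """Stand-in for a learned autoregressive next-frame predictor."""
    history: List[Frame] = field(default_factory=list)

    def step(self, user_yaw_delta: float) -> Frame:
        # "Predict" the next frame from the previous one plus the user input.
        prev = self.history[-1] if self.history else Frame(-1, 0.0)
        nxt = Frame(prev.index + 1, prev.camera_yaw + user_yaw_delta)
        self.history.append(nxt)
        return nxt

model = ToyWorldModel()
# Simulate one second of interaction at 24 fps with a slow camera pan.
frames = [model.step(user_yaw_delta=1.5) for _ in range(24)]
print(len(frames), frames[-1].camera_yaw)  # 24 frames, final yaw 36.0 degrees
```

The point of the sketch is the loop shape: because each frame depends only on the accumulated history and the incoming control signal, interaction length is bounded by how long the predictor stays coherent, which is exactly the "coherence" axis the article says GWM-1 targets.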
GAIR 2025 World Model Sub-Forum: From General Perception to Video and Physical World Models, a Hundred Schools Contend
雷峰网· 2025-12-13 09:13
"In the third year of the embodied intelligence boom, what consensus has formed around world models?"

Authors: Zhang Jin, Wu Tong, Liang Bingjian, Liu Xin, Qi Chengyong. Editors: Lin Juemin, Ma Xiaoning.

On the 13th, the world model sub-forum of the 8th GAIR Global Artificial Intelligence and Robotics Conference concluded successfully. The speakers were five young scholars working on different directions within the world model field, who delivered five talks centered on world models, covering general perception, 3D technology, physical models, world models, and digital human reconstruction. Their talks offered a glimpse of how broad and rich current research around world models has become.

World model research is still in its infancy and no consensus has yet formed; work in the field has branched into countless streams. Amid this tide, the speakers at today's forum brought their own insight and effort to bear, each inspiring the field in a different way.

Zhejiang University researcher Peng Sida: General Spatial Perception Technology for Embodied Intelligence

The first speaker at the "World Model" sub-forum was Peng Sida, a "Hundred Talents Program" researcher and doctoral supervisor at Zhejiang University's School of Software Technology, whose research covers 3D computer vision and computer graphics. His keynote, "General Spatial Perception Technology for Embodied Intelligence," introduced his team's recent work on endowing robots with general perception capabilities. The team focuses on giving robots three foundational capabilities: first, camera pose estimation (Camera Pose Estimatio ...
CUHK-Shenzhen's Han Xiaoguang: 3DGen and the Battle for Humans' Sense of Security | GAIR 2025
雷峰网· 2025-12-13 09:13
Core Viewpoint
- The article stresses the importance of understanding the underlying principles of world models, arguing that relying solely on data-driven trial and error ("炼丹", "alchemy") is insufficient for building effective AI systems, and advocates integrating human-understandable structure and logic into AI models to improve their interpretability and reliability [2][63]

Group 1: Development of 3D Generation
- 3D generation has evolved from early attempts at creating 3D models from single images to today's large models capable of generating high-quality 3D content from text descriptions [7][16]
- "Open world" 3D generation emerged around 2023 with the DreamFusion project, which enabled generation of 3D models without category restrictions, marking a significant shift in the field [11][12]
- Current trends focus on finer detail, structured outputs for easier editing, and better alignment between generated models and input images [19][20]

Group 2: Challenges and Opportunities in 3D Generation
- The 3D generation field faces a dilemma in light of video generation technologies that can produce content without complex 3D modeling pipelines [24][28]
- Despite the rise of video generation, 3D content creation retains its value through physical realism, spatial consistency, and fine-grained control over content [29][34]
- The potential crisis for 3D generation lies in the growing, increasingly controllable capabilities of video generation models, which raise questions about whether 3D will remain necessary for future content creation [34][38]

Group 3: The Role of 3D in World Models
- The article categorizes world models into three types: macro models for societal understanding, personal experience models for exploration, and embodied models for machine intelligence, with 3D essential for interactive virtual environments [43][44][45]
- For embodied intelligence, understanding human interaction with the physical world requires 3D modeling to accurately capture and simulate those interactions [48][50]
- The transition from digital designs to physical manufacturing, such as 3D printing, underscores the foundational role of 3D data in creating tangible products [52]

Group 4: Technical Approaches in AI
- The article contrasts explicit and implicit approaches to AI: explicit methods rely on clear geometric and physical modeling, while implicit methods depend on data-driven neural networks [56][57]
- It emphasizes the need for explainability in AI systems, suggesting that a balance between performance and interpretability is crucial for user trust and safety [58][63]
- It concludes that 3D and 4D modeling are vital for providing a comprehensible framework for understanding complex AI systems, thereby strengthening user confidence [59][63]
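The explicit-versus-implicit contrast above can be made concrete with a toy example: the same unit sphere represented explicitly (a list of surface vertices) and implicitly (a signed-distance function whose zero level set is the surface). This is a generic illustration of the two representation styles, not code from any system discussed in the article:

```python
# Explicit vs. implicit 3D representations of the same unit sphere.
import math

# Explicit: enumerate surface points directly (a crude "mesh").
def explicit_sphere(n: int = 8) -> list:
    # An equatorial ring of n vertices on the unit sphere.
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n), 0.0)
            for k in range(n)]

# Implicit: a function answering "how far is this point from the surface?"
# Negative inside, positive outside, zero exactly on the surface.
def sphere_sdf(x: float, y: float, z: float) -> float:
    return math.sqrt(x * x + y * y + z * z) - 1.0

vertices = explicit_sphere()
print(len(vertices))                        # 8 explicit surface samples
print(round(sphere_sdf(*vertices[0]), 6))   # 0.0: explicit points lie on the implicit surface
```

Explicit geometry is directly editable and inspectable, which is the interpretability the talk argues for; implicit functions (in practice, neural networks rather than this closed-form toy) trade that transparency for flexibility learned from data.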
He Xiaopeng Makes a "Bet": Match Tesla FSD Performance by the End of Next August
Mei Ri Jing Ji Xin Wen· 2025-12-13 06:46
Core Viewpoint
- Xiaopeng Motors is set to release its VLA 2.0 (Vision-Language-Action) model next quarter, and this first version is under significant pressure [1]
- Xiaopeng's chairman has placed a bet with the autonomous driving team: match Tesla's FSD V14.2 performance by August 30, 2026 [1]

Group 1: VLA Model and Industry Perspectives
- The VLA model is seen as an advanced end-to-end solution, integrating visual perception (V), a language model (L), and action execution (A) to enhance decision-making and environmental understanding [5][11]
- The industry has shifted from relying on LiDAR and high-precision maps to AI-driven models like VLA, with a notable divergence in development paths emerging by 2025 [4][11]
- Li Auto's VP emphasized the importance of real-world data over model architecture, asserting that VLA is the best solution given the company's extensive data collection from millions of vehicles [6][8]

Group 2: Diverging Technical Approaches
- Huawei's approach centers on the World Action (WA) model, which bypasses the language-processing step and aims for direct control from visual inputs [8][10]
- The World Model concept lets AI systems simulate the physical world, enhancing predictive capability and decision-making in autonomous driving [9][11]
- Companies such as NIO and SenseTime are also exploring the World Model approach, indicating a broader industry trend [10]

Group 3: Future Integration and Evolution
- VLA and World Models are not mutually exclusive but complementary, and a trend toward integrating them is emerging [11][12]
- Xiaopeng's second-generation VLA model aims to combine VLA and World Model functionality, improving data training and decision-making [14][15]
- The automotive industry anticipates further iterations of autonomous driving architectures over the next few years, potentially stabilizing by 2028 [15]
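The V-L-A pipeline described above, perception feeding a language-model reasoning step that in turn conditions the action output, can be sketched as a toy data flow. The function names and the rule-based "reasoning" below are hypothetical stand-ins; in a real VLA system each stage is a learned model, not a hand-written rule:

```python
# Toy sketch of a Vision-Language-Action data flow.
def perceive(image: dict) -> dict:
    """V: turn raw sensor input into a symbolic scene description (stand-in)."""
    return {"obstacle_ahead": image.get("pixels_blocked", 0) > 100}

def reason(scene: dict) -> str:
    """L: a language-model stand-in that produces a plan as text."""
    return "brake and yield" if scene["obstacle_ahead"] else "maintain speed"

def act(plan: str) -> dict:
    """A: map the textual plan to a control command."""
    if "brake" in plan:
        return {"throttle": 0.0, "brake": 0.8}
    return {"throttle": 0.3, "brake": 0.0}

command = act(reason(perceive({"pixels_blocked": 240})))
print(command)  # {'throttle': 0.0, 'brake': 0.8}
```

The textual middle stage is what distinguishes VLA from the world-model route the article attributes to Huawei: a WA-style system would map `perceive` output to `act` input directly, skipping `reason` entirely.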
He Xiaopeng Makes a "Bet": Match Tesla FSD Performance by the End of Next August! A Li Auto Executive Responds to Unitree's Wang Xingxing; Is the VLA Bet by Multiple Automakers Reliable?
Mei Ri Jing Ji Xin Wen· 2025-12-13 06:31
Core Viewpoint
- Xiaopeng Motors is set to release its VLA 2.0 (Vision-Language-Action) model next quarter, with significant pressure on its development since it is the first version [1]

Group 1: VLA Model Development
- Xiaopeng's chairman He Xiaopeng has made a special bet with the autonomous driving team, promising to open a Chinese-style cafeteria in Silicon Valley if the VLA system matches Tesla's FSD V14.2 performance by August 30, 2026 [3]
- The VLA model is seen as an advanced end-to-end solution, integrating visual perception, language processing, and action execution to enhance decision-making capabilities [7][12]
- The VLA model aims to overcome the limitations of traditional models by incorporating a language-model reasoning chain, improving its adaptability to complex driving environments [7][12]

Group 2: Industry Perspectives
- The industry diverges on the development paths of VLA versus world models, with companies like Li Auto and Xiaopeng favoring the VLA approach [6][12]
- Li Auto's VP Lang Xianpeng emphasizes the importance of real-world data for effective autonomous driving systems, arguing that the VLA model is superior because of its data-driven approach [8][9]
- Huawei and other companies pursue a world-model approach that aims for direct control from visual inputs without an intermediary language-processing step [9][10][11]

Group 3: Future Integration and Trends
- Despite differing opinions, VLA and world models are not mutually exclusive and may increasingly integrate as both technologies evolve [12][17]
- Autonomous driving technology is expected to see further iterations and stabilize by 2028, with a potential convergence of VLA and world-model methodologies [17]
With 2026 Approaching, Have World Models Actually Become More "Worldly"?
机器之心· 2025-12-13 02:30
Core Viewpoint
- Runway's recent launch of GWM Worlds and GWM Robotics pushes video generation toward an interactive "world simulation" paradigm, reigniting debate over the definition and scope of "world models": interfaces for creation and interaction, simulators for training and evaluation, or cognitive frameworks for reasoning and decision-making [1]

Group 1: Evolution of World Models
- Over the past two years, world models have come to be considered on par with LLMs in the AGI landscape, shifting from a narrow definition rooted in reinforcement learning to a broader one that includes generative modeling [4]
- Initially, world models were understood as an agent's internal model of its environment, predicting future states from current conditions and actions to enable internal simulation and decision-making [5]
- An engineering perspective defined world models as the combination of three capabilities: compressing high-dimensional perception into usable representations, predicting future states over time, and using those predictions for planning and decision-making [6]
- By 2024, the understanding of world models had expanded to general world-evolution modeling, with a trend from language generation to image generation and ultimately to 3D and world generation [6]
- The boundaries of the concept have grown more ambiguous, with ongoing debates about the nature of representations, the incorporation of physical laws, and how input relationships are organized [6]

Group 2: Industry Layout and Trends
- Major companies are investing in world models, raising the question of whether they are enhancing their "data engines" or building new frameworks for "spatiotemporal cognition" [3]
- In February 2024, OpenAI described the video generation model Sora as a "world simulator," emphasizing its ability to learn the three-dimensional structure and physical laws of the real world [6]
- Concurrently, LeCun introduced V-JEPA, which predicts masked video segments in an abstract representation space, achieving higher training efficiency by discarding unpredictable information [6]
- The discourse has shifted from whether to build world models to how to model them, with debate over whether to abstract upward from the pixel level or to operate directly in abstract spaces [7]
- There is recognition that existing approaches may capture only partial physical laws, suggesting a need for representations of isolated objects and a priori laws of change across space and time to achieve a coherent world model [7]

Group 3: Definition and Ambiguity of World Models
- By 2025, world models sit alongside LLMs, with companies like Google DeepMind, Meta, and Nvidia shifting focus from pure LLMs to world models, pursuing "Physical AI + superintelligence" amid stagnating LLM progress [8]
- What distinguishes world models from existing generative AI is the goal of constructing internal representations of environments, including physical, temporal, and spatial dimensions, for planning and decision-making [9]
- The term "world model" has become ambiguous, referring variously to latent states within systems, game-like simulators for training agents, or any content pipeline capable of generating navigable 3D scenes [9]
- A November 2025 analysis from Entropy Town categorized world models into three technical routes: interface, simulator, and cognitive framework, highlighting the field's ongoing ambiguity [9]
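The three-capability engineering definition cited above (compress perception, predict future states, plan with the predictions) can be illustrated with a deliberately tiny agent loop, in the spirit of classic world-model work. The dynamics, the one-dimensional "latent state," and the goal are toy stand-ins, not any company's actual model:

```python
# Toy world-model agent: encode -> predict -> plan.
def encode(observation: float) -> float:
    """Compress a (notionally high-dimensional) observation into a latent state."""
    return round(observation, 2)  # toy compression: quantize

def predict(state: float, action: float) -> float:
    """Learned-dynamics stand-in: next latent state from state + action."""
    return state + action

def plan(state: float, candidate_actions, horizon: int = 3) -> float:
    """Pick the action whose imagined rollout ends closest to a goal state."""
    goal = 1.0
    def rollout_cost(a: float) -> float:
        s = state
        for _ in range(horizon):  # imagine the future entirely inside the model
            s = predict(s, a)
        return abs(s - goal)
    return min(candidate_actions, key=rollout_cost)

state = encode(0.1)
best = plan(state, candidate_actions=[-0.3, 0.0, 0.3])
print(best)  # 0.3: three imagined steps land exactly on the goal state 1.0
```

The defining move is that `plan` never touches the environment: it searches over futures imagined by `predict`, which is precisely the internal-simulation role the article attributes to the original, reinforcement-learning-era definition.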
GAIR 2025 Officially Opens: As the AI Transformation Reaches the Industry's Deep Waters, How Do We Find Light in the Dark?
雷峰网· 2025-12-12 02:49
"In the tides of models and compute, sparks of intelligence are converging into an industrial wave; watch how AI redraws the landscape of industry."

Author: Xu Xiaofei. Editor: Bao Yonggang.

On December 12, Shenzhen, like countless cities around the world, lay in wait on the eve of the intelligent industry's dawn, and a gathering of frontier insight broke ground here. At the critical juncture where large-model technology pushes deep into industrial transformation, the 8th GAIR Global Artificial Intelligence and Robotics Conference officially opened at the Sheraton Shenzhen Bolin Tianrui Hotel. The conference features four themed forums and two closed-door sessions, focused on the innovative pulse of large models, AI compute, world models, data & one-brain-many-forms, and AI hardware. This is GAIR's eighth year, and another moment of intellectual resonance and course correction for China's AI community across academia, industry, and investment.

The ancients said that to pluck the pearl from beneath the black dragon's jaw, one must carry a torch into the deep sea. The same holds for today's transformation of the AI large-model industry: the wave has moved from the "technical breakthrough" of a few years ago into a stage of "deep value cultivation," ever more like the pearl under the dragon's jaw, beyond the reach of those who stay in shallow water. The GAIR conference, founded in 2016, has been that deep-sea torch: eight years of gathering the leading ideas of forward-looking scholars and industry pioneers, illuminating both the hard-won progress of AI practitioners worldwide and the sweeping journey of the intelligent era from germination to full bloom. To date, the GAIR conference ...
SenseTime AI Forum Explores Future Intelligence Paradigms as Visual AI Enters a Second Growth Curve
Zheng Quan Shi Bao Wang· 2025-12-11 14:03
Group 1
- The "2025 SenseTime Technology AI Forum" was held successfully, focusing on breakthroughs in multimodal large models, embodied intelligence, and industrial intelligence upgrades [1]
- SenseTime CEO Xu Li emphasized that the past decade has seen rapid shifts in AI cognition, marking a significant technological wave that is reshaping work across industries [1]
- SenseTime aims to leverage Hong Kong's favorable innovation and technology environment to connect national AI strategy with global innovation networks, serving both local and international markets [1]

Group 2
- SenseTime co-founder and Chief Scientist Lin Dahua highlighted challenges in AI industrialization: reliability, professional data, spatial understanding, and cost [1]
- SenseTime is innovating through foundational technologies such as a natively multimodal fusion architecture and high-efficiency reasoning systems to enhance spatial cognition and real-time interaction capabilities [1]
- The forum also discussed how AI can drive deep paradigm shifts in enterprises; SenseTime's Asia-Pacific business serves nearly 500 clients, 70% of whom maintain long-term partnerships [2]

Group 3
- Wang Xiaogang, co-founder of SenseTime and Chairman of Daxiao Robotics, announced the upcoming launch of Daxiao Robotics on December 18, introducing leading technologies and the first domestically open-sourced "KAIWU" world model 3.0 [2]
- The forum emphasized integrating "model-hardware-scene" ecosystems to drive breakthroughs in embodied intelligence across applications from industrial manufacturing to home companionship [2]
- SenseTime's Visual AI 2.0, empowered by large language models, turns real-time video analysis into actionable solutions, marking a new growth phase for visual AI [2]
The Tables Have Turned: Meta Cribs from Alibaba's Tongyi Qianwen, Without Authorization
3 6 Ke· 2025-12-11 11:51
Core Insights
- Meta has brought in Alibaba's Tongyi Qianwen (Qwen) model to fine-tune its next-generation AI model "Avocado," which aims to compete with GPT-5 and is slated for release in Q1 2026 [1][2]
- The Avocado model's shift from open source to closed source raises ethical concerns: it uses an open-source model for training but will charge for access [4]

Group 1: Meta's AI Strategy
- Meta's flagship "Avocado" model is seen as a critical response to the underperformance of Llama 4, which widened the gap between Meta and competitors like OpenAI and Google [2]
- Avocado will move away from the open-source Llama series to a proprietary model, available only through API and managed services [4]
- Meta's new AI leadership under Alexandr Wang, brought in with a significant investment, is now responsible for the closed-source AI development [6][7]

Group 2: Leadership Changes and Industry Dynamics
- Yann LeCun, a founding figure of Meta AI, recently left the company, raising questions about the continuity of Meta's AI vision [5]
- Despite his youth and relative inexperience compared with industry veterans, Alexandr Wang has been given significant authority over Meta's AI initiatives [7]
- The competitive landscape is shifting as Chinese AI models gain global traction, evidenced by Southeast Asia's move from Meta's Llama to Alibaba's Tongyi Qianwen [8][9]

Group 3: Future Projections
- Predictions suggest that in ten years the global AI market may be dominated jointly by Chinese and American technologies, with China's market share potentially rising from 30% to 40-45% [9]
- Competitive dynamics may lead developing regions to adopt Chinese AI technologies for cost-effectiveness, while wealthier nations prefer American solutions for data privacy and ethical considerations [9]