World Models
Experts Directed by Outsiders, Constantly Fearing Layoffs: Meta Staff Are Now Lost and Caught in a Rat Race
AI Frontline (AI前线)· 2025-11-16 05:33
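Compiled by | Chu Xingjuan
According to the Financial Times, Yann LeCun, chief AI scientist at social media giant Meta and a Turing Award laureate, plans to leave the company to found an AI startup. People familiar with the matter say the French-American scientist, widely known as a "pioneer of deep learning," has told colleagues internally that he will leave in the coming months. LeCun has already begun early talks with potential investors.
Neither Meta nor LeCun has commented on the report. LeCun has not disclosed a timeline for his departure, and no details about the direction of the new company have been made public. He will keep his professorship at New York University.
LeCun's departure comes at a critical time, as Zuckerberg overhauls Meta's AI strategy. Zuckerberg has gradually shifted the company's focus away from the long-term research of Meta's Fundamental AI Research lab (FAIR), which LeCun has led since 2013, toward shipping models and AI products faster. He believes Meta has fallen behind its competitors.
An Unsurprising Departure
LeCun's departure is not surprising. Over the past few months he has grown increasingly unhappy with some internal changes at Meta. He was reportedly particularly frustrated by the company's new internal rules on publishing research, which require results to pass stricter internal review before external release. Several team members felt the policy restricted academic freedom. Meta ...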
The World Model Rivalry Between Fei-Fei Li and LeCun
Heart of Embodied AI (具身智能之心)· 2025-11-15 16:03
Core Viewpoint
- The article discusses the competition among three major players in the AI industry (Fei-Fei Li, LeCun, and Google) over the development of world models, highlighting their distinct technological approaches and their implications for artificial general intelligence (AGI) [2][22][39].
Group 1: Fei-Fei Li's Marble
- Fei-Fei Li's company, World Labs, has launched its first commercial world model, Marble, which is considered to have significant commercial potential because it can generate persistent, downloadable 3D environments [5][21].
- Marble includes a native AI world editor called Chisel that lets users create and modify worlds with simple prompts, which is particularly useful for VR and game developers [7][9].
- However, some experts argue that Marble resembles a 3D rendering model more than a true world model, because it focuses on visual representation without modeling the underlying physical laws needed for robot training [10][20].
Group 2: LeCun's JEPA
- LeCun's approach to world models, exemplified by JEPA (Joint Embedding Predictive Architecture), draws on control theory and cognitive science rather than 3D graphics, focusing on abstract representations that let robots predict changes in their environment [22][25].
- JEPA captures the essential state of the world without generating visually appealing images, which makes it better suited to robot training [27][29].
- It contrasts sharply with Marble, prioritizing an understanding of the world's structure over visual fidelity [39].
Group 3: Google's Genie 3
- Google DeepMind's Genie 3, launched in August, generates interactive video environments from prompts, with improvements in long-term consistency and event triggering [31][34].
- Despite these advances, Genie 3 is still at heart a video-generation model and lacks the deep understanding of physical laws that, in the article's view, LeCun's JEPA provides [35][36].
- Genie 3's visual quality and resolution are also limited compared with Marble, which offers high-precision, exportable 3D assets [38].
Group 4: Comparative Analysis
- The three world models represent different paradigms: Marble focuses on visual representation, Genie 3 on dynamic video generation, and JEPA on understanding the underlying structure of the world [39].
- Together they form a "world model pyramid," in which models become increasingly abstract and more closely aligned with AI's cognitive processes toward the top [47][48].
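To make the JEPA contrast concrete: a JEPA-style objective scores a prediction in embedding space rather than pixel space, so no image is ever rendered. The following is a minimal NumPy sketch in the style of I-JEPA, assuming linear encoders, a linear predictor, and an exponential-moving-average (EMA) target encoder. The function names and shapes are hypothetical illustrations, not Meta's code, and the gradient step is omitted.

```python
import numpy as np

def encode(x, w):
    """Toy linear encoder: maps inputs (N, D_in) to embeddings (N, D_emb)."""
    return x @ w

def jepa_loss(x_context, x_target, w_context, w_target, w_pred):
    """Mean squared error between predicted and actual target embeddings.

    The target branch is treated as a fixed label (a stop-gradient in a real
    training loop), so the loss never compares pixels.
    """
    predicted = encode(x_context, w_context) @ w_pred
    target = encode(x_target, w_target)
    return float(np.mean(np.sum((predicted - target) ** 2, axis=1)))

def ema_update(w_target, w_context, tau=0.99):
    """Let the target encoder slowly track the context encoder."""
    return tau * w_target + (1.0 - tau) * w_context

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 32))          # e.g. 8 flattened image patches
w = rng.standard_normal((32, 4))          # 32-dim input -> 4-dim embedding
print(jepa_loss(x, x, w, w, np.eye(4)))   # identical views, identity predictor: 0.0
```

The design point is that the loss depends only on 4-dimensional embeddings. Details that the encoder discards, such as texture or lighting, cost nothing to mispredict, which is the property the article contrasts with Marble's and Genie 3's pixel-level output.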
The World Model Rivalry Between Fei-Fei Li and LeCun
QbitAI (量子位)· 2025-11-15 05:00
Core Viewpoint
- The article discusses the competition among three major players in the AI industry (Fei-Fei Li, Yann LeCun, and Google) over the development of world models, highlighting their distinct technological approaches and their implications for artificial general intelligence (AGI) [1][3][42].
Group 1: Fei-Fei Li and Marble
- Fei-Fei Li's company, World Labs, has launched its first commercial world model, Marble, which is seen as having significant commercial potential because it can generate persistent, downloadable 3D environments [2][5].
- Marble includes a native AI world editor called Chisel that lets users create and modify worlds with simple prompts, which is particularly useful for VR and game developers [7][9].
- However, some experts argue that Marble resembles a 3D rendering model more than a true world model, because it focuses on visual representation without modeling the underlying physical laws needed for robot training [10][18][20].
Group 2: Yann LeCun and JEPA
- LeCun's approach to world models, exemplified by JEPA, draws on control theory and cognitive science rather than 3D graphics. It aims to let robots predict changes in the environment without generating visually appealing images [24][26].
- JEPA focuses on capturing the abstract representations of the world that matter for AI decision-making, which makes it better suited to training robots [28][30].
Group 3: Google and Genie 3
- Google DeepMind's Genie 3, launched in August, lets users generate interactive video environments from a single prompt and addresses long-term consistency problems in generated worlds [32][35].
- Despite its dynamic capabilities, Genie 3 is still at heart a video-generation model and lacks the deeper understanding of physical laws that JEPA provides, making it less effective for robot training [38][40].
Group 4: World Model Pyramid
- The article arranges the three world models in a pyramid, with Marble as the interface, Genie 3 as the simulator, and JEPA as the cognitive framework, reflecting their different levels of abstraction and suitability for AI training [53][54].
- Moving up the pyramid, the models become more abstract and more closely aligned with AI's cognitive processes, while those at the bottom are more visually appealing but harder for robots to comprehend [54].
Fei-Fei Li's "World Model" Officially Opens to Everyone; Pro Tier Costs Only 7 Yuan for the First Month
36Kr (36氪)· 2025-11-14 13:36
Core Insights
- The article covers the launch of Marble, a world model from World Labs that lets users create immersive 3D environments from a single image or text prompt [2][3][4].
- Fei-Fei Li describes "spatial intelligence" as a key focus of AI development over the next decade [6][7][70].
Group 1: Product Features
- Marble generates persistent, downloadable 3D environments, which distinguishes it from real-time models [21].
- Users can upload 2D images, or 3D models (a paid feature), to create worlds with visuals comparable to AAA games [11][13].
- The platform includes AI-native editing tools and a hybrid 3D editor, so users can first lay out a spatial structure and then fill in the visual detail [23][50].
Group 2: Creative Control
- Marble supports multi-image prompts, giving users more creative control and higher precision when building worlds [39][43].
- Users can supply multiple images or short videos to generate 3D worlds that incorporate real-world elements [44].
- Editing is iterative, so generated worlds can be refined and modified extensively [46][47].
Group 3: Export Options
- Marble offers several export options, including high-quality meshes and video, which makes it easier to integrate worlds into downstream projects [54][62].
- The system can generate both low-precision collision meshes for physics simulation and high-quality visual meshes [59][61].
Group 4: Pricing Structure
- Marble has three pricing tiers: a free tier with limited world generation, a standard tier at $20 per month, and a pro tier at $95 per month for up to 75 worlds [82][84].
- The pro tier includes a large allotment of credits and commercial-use rights, which makes it more attractive to professional users [87].
Spatial Intelligence Series, Part 3: Physical AI, the Foundation for Digital Twins and Embodied Intelligence
Investment Rating
- The report maintains a positive outlook on the Physical AI industry and identifies it as a key driver of the next wave of AI development [3][4].
Core Insights
- Physical AI is a systems-engineering approach that integrates spatial intelligence and world models so that AI can interact with the physical world [3][11].
- Its implementation rests on three technological pillars: world models, physical simulation engines, and embodied intelligent controllers [17][21].
- NVIDIA has built a comprehensive Physical AI ecosystem through its "chip-algorithm-platform" strategy, which gives it a competitive advantage [3][4].
- Digital twins are the most mature application of Physical AI, letting industries optimize production lines and reduce costs with high-fidelity virtual models [3][48].
- The most promising applications are intelligent driving and embodied intelligence, where end-to-end models, VLA (vision-language-action) models, and world models are all being explored [3][60].
Summary by Sections
1. Physical AI: The Next Wave of AI
- Physical AI marks a transition from virtual to real-world applications, focusing on understanding and interacting with physical laws [11][12].
- Its core structure can be simplified into three parts: spatial intelligence, world models, and Physical AI as the integrating system [12][16].
2. Applications of Physical AI: Understanding the World and Predicting the Future
- Physical AI is moving quickly toward large-scale commercial use, improving efficiency and creating new business models across industries [47].
- Digital twins are a critical tool for industrial digital transformation, enabling real-time simulation and control of physical assets [48][52].
- Intelligent driving and embodied intelligence are identified as the areas where Physical AI can have the greatest impact [47][60].
3. Physical AI Industry Chain Analysis
- Value is clearly distributed along the Physical AI industry chain, with significant changes under way in chips, data supply, algorithms, and applications [4][3].
- Key players include NVIDIA, Qualcomm, and companies working on data acquisition and algorithm development [3][4].
4. Core Targets and Related Companies
- Core targets in the Physical AI industry include Zhiwei Intelligent, Tianzhun Technology, and Desay SV [3][4].
- Companies in data supply and algorithm development are also highlighted, indicating a diverse investment landscape [3][4].
Fei-Fei Li's Long Essay Takes Silicon Valley by Storm
PEdaily (投资界)· 2025-11-14 08:01
Core Insights
- The article argues that spatial intelligence is the next frontier for AI and could transform creativity, robotics, scientific discovery, and more [6][10][14].
- It outlines three core capabilities that a world model must have: it must be generative, multimodal, and interactive [4][18][19].
Group 1: Importance of Spatial Intelligence
- Spatial intelligence is foundational to human cognition and shapes how people interact with the physical world [11][14].
- Historical examples show how spatial intelligence has driven major advances in civilization, such as Eratosthenes' calculation of the Earth's circumference and Watson and Crick's discovery of the structure of DNA [12][13].
Group 2: Current Limitations of AI
- Current AI models, particularly large language models (LLMs), lack the spatial reasoning abilities that humans have, which limits how well they understand and interact with the physical world [15][16].
- Despite recent progress, AI still struggles with tasks such as estimating distances and navigating environments, pointing to a fundamental gap in spatial understanding [15][16].
Group 3: Future Directions for AI Development
- Building world models is essential for creating AI that understands and interacts with the world the way humans do [18][24].
- World models should be able to generate consistent virtual worlds, process multimodal inputs, and predict future states based on actions [18][19][20].
Group 4: Applications of Spatial Intelligence
- Potential applications span creativity, robotics, science, medicine, and education [34][35].
- In the creative industries, tools such as World Labs' Marble platform let creators build immersive experiences without traditional design constraints [28][29].
- In robotics, spatial intelligence can improve machine learning and human-robot collaboration, making robots more effective across environments [30][31].
Group 5: Vision for the Future
- The article envisions AI that augments human capabilities rather than replacing them, and it stresses aligning AI development with human needs [26][36].
- The ultimate goal is machines that understand and interact with the physical world, improving human welfare and helping address major challenges [38].
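One way to read the three required capabilities (generative, consistent, and action-conditioned prediction) is as an interface contract. The following is a hypothetical Python sketch: the names `WorldModel`, `WorldState`, `generate`, and `step` are illustrative assumptions, not World Labs' API. The toy grid world accepts only text prompts, so it does not demonstrate multimodal input.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class WorldState:
    width: int
    height: int
    agent: tuple[int, int]


class WorldModel(Protocol):
    def generate(self, prompt: str) -> WorldState: ...  # build a world from a prompt
    def step(self, state: WorldState, action: str) -> WorldState: ...  # predict the next state


class ToyGridWorld:
    """Deterministic toy model: the same state and action always yield the same next state."""

    MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

    def generate(self, prompt: str) -> WorldState:
        # Stand-in for a generative model: world size derived from prompt length (3 to 10).
        size = max(3, min(10, len(prompt.split())))
        return WorldState(size, size, (0, 0))

    def step(self, state: WorldState, action: str) -> WorldState:
        dx, dy = self.MOVES.get(action, (0, 0))  # unknown actions leave the world unchanged
        x = min(max(state.agent[0] + dx, 0), state.width - 1)  # clamp to world bounds
        y = min(max(state.agent[1] + dy, 0), state.height - 1)
        return WorldState(state.width, state.height, (x, y))


model: WorldModel = ToyGridWorld()
state = model.generate("a small room with a door")
state = model.step(state, "right")
print(state.agent)  # (1, 0)
```

Because `step` is a pure function of its inputs, replaying an action sequence always reproduces the same world. That is the simplest form of the consistency the essay asks for. A real world model would learn this transition instead of hard-coding it.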
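"Reading Ten Thousand Books" Is No Match for "Traveling Ten Thousand Miles"! VeriSilicon Chief Wayne Dai on the Next Stop for AI Chips: Edge Inference and Real-World Deployment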
Xin Lang Zheng Quan· 2025-11-14 04:08
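Special Topic: 2025 Shanghai Stock Exchange International Investor Conference
The SSE International Investor Conference was held on November 12 and 13. Wayne Dai, chairman and president of VeriSilicon, said that demand for custom AI chips (AI ASICs) is growing significantly. Facing rising AI ASIC demand from customers in China and abroad, the semiconductor veteran outlined VeriSilicon's strategy and future opportunities.
GPUs and AI ASICs Complement Each Other, Each With Its Own Focus
The Rise of Edge Inference: "Digital Leaves" Hold Huge Business Opportunities
The industry's demand for compute is now clearly split between the "cloud" and the "edge." Dai offered a vivid image: large-scale training in the cloud is the sturdy "trunk," but the vast space for value creation lies in the "lush foliage" growing from it.
"The edge mainly handles two kinds of AI computing: inference and fine-tuning," he explained. Running model inference and fine-tuning on devices such as phones, cars, smart glasses, and IoT devices (for example, optimizing models for vertical domains like healthcare, finance, and education) is the key to AI deployment and commercialization.
Empowering Devices: VeriSilicon's Edge Practice, From Smart Glasses to AI Toys
Dai used several examples to illustrate the broad prospects of edge AI. He noted that on smartphones, custom AI chips can deliver photography, image-quality optimization, and power management far beyond what is possible today. He is especially optimistic about AI in smart glasses ...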
LeCun's Last Paper at Meta
36Kr· 2025-11-14 03:04
Core Insights
- The article discusses Yann LeCun's recent paper on LeJEPA, a self-supervised learning method that is seen as his farewell work at Meta as he leaves the company [1][33].
- LeJEPA introduces a new framework that improves predictive performance by requiring the embedding space to follow a specific statistical distribution [2].
Group 1: LeJEPA Framework
- LeJEPA is built on isotropic Gaussian embeddings and addresses the representation-collapse problem of traditional JEPA frameworks, significantly improving model generalization [1][5].
- The framework uses Sketched Isotropic Gaussian Regularization (SIGReg) to match the embedding distribution to this target, which turns the problem into a statistical hypothesis test [6][11].
Group 2: Experimental Validation
- Extensive experiments covered large architectures such as ViT, ConvNeXt, and ResNet, with models approaching 1 billion parameters [8].
- The results indicate that LeJEPA outperforms existing methods while keeping training simple and robust, particularly on domain-specific datasets such as Galaxy10 and Food101 [10].
Group 3: Statistical Insights
- The research shows that an isotropic Gaussian embedding distribution minimizes bias and variance during training, improving stability and accuracy on downstream tasks [3][5].
- Non-isotropic distributions lead to higher bias and variance, and several experiments confirm the advantage of the isotropic Gaussian [3].
Group 4: Future Directions
- Although LeCun is leaving Meta, reports suggest he is raising funds for a startup that will advance his work on world models, so he is expected to keep contributing to the AI field [33][34].
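Framing distribution matching as a hypothesis test can be made concrete with a small sketch. As we understand SIGReg, it projects the batch of embeddings onto random 1-D directions (the "sketch") and checks each projection against a standard normal. This NumPy version uses a characteristic-function statistic in the spirit of the Epps-Pulley test. That choice, the Gaussian weighting, the grid of `t` values, and all defaults are assumptions rather than the paper's exact recipe. A training implementation would compute the statistic in an autodiff framework and add it to the JEPA prediction loss.

```python
import numpy as np

def sigreg(z, num_directions=64, t_max=3.0, num_t=17, seed=0):
    """Sketched isotropic-Gaussian penalty for embeddings z of shape (N, D).

    Small when every random 1-D projection of z looks like N(0, 1);
    large for collapsed, shifted, or badly scaled embeddings.
    """
    rng = np.random.default_rng(seed)
    n, d = z.shape
    # 1. Sketch: project the embeddings onto random unit directions.
    dirs = rng.standard_normal((d, num_directions))
    dirs /= np.linalg.norm(dirs, axis=0, keepdims=True)
    proj = z @ dirs                                        # (N, M)
    # 2. Empirical characteristic function of each projection on a grid of t.
    t = np.linspace(-t_max, t_max, num_t)                  # (T,)
    ecf = np.exp(1j * proj[:, :, None] * t).mean(axis=0)   # (M, T)
    # 3. Weighted squared distance to the N(0, 1) characteristic function exp(-t^2 / 2).
    gauss_cf = np.exp(-0.5 * t**2)
    err = np.abs(ecf - gauss_cf) ** 2 * gauss_cf           # Gaussian weight on t
    # 4. Trapezoidal integral over t, scaled by N, then averaged over directions.
    dt = t[1] - t[0]
    stat = 0.5 * dt * (err[:, :-1] + err[:, 1:]).sum(axis=1)
    return float(n * stat.mean())

rng = np.random.default_rng(1)
z = rng.standard_normal((1000, 16))
print(sigreg(z), sigreg(np.zeros_like(z)), sigreg(3.0 * z))
```

Isotropic standard-normal embeddings should score far lower than collapsed embeddings (all zeros) or over-scaled ones. This matches the article's point that the regularizer rules out collapse without extra training heuristics.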
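Wang Zhenhui Succeeds Hu Wei as CEO of JD Logistics; Didi's Autonomous Driving Makes Its First Overseas Stop in Abu Dhabi | Zaozidao Morning Brief (早资道)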
Sou Hu Cai Jing· 2025-11-14 01:12
Group 1
- JD Logistics appoints Wang Zhenhui as CEO, replacing Hu Wei, effective November 13, 2025 [2].
- Hu Wei steps down as CEO of JD Logistics to take on other roles within JD Group [2].
Group 2
- Didi Chuxing establishes its first international presence for autonomous driving in Abu Dhabi, in partnership with the Abu Dhabi Investment Office [3].
- The collaboration focuses on autonomous-driving innovation, AI talent development, and ecosystem building [3].
- Didi plans to expand its operations throughout the Middle East [3].
Group 3
- Alibaba Cloud's Bailian (Model Studio) platform announces price cuts for the Qwen3-Max (Tongyi Qianwen 3-Max) model starting November 13, 2025 [4].
- Batch calls are billed at half price, and implicit cache hits are billed at 20% of the standard input-token price [4].
- With explicit caching, creating cached tokens costs 125% of the standard input price, and subsequent cache hits cost only 10% [4].
Group 4
- Tencent President Martin Lau (Liu Chiping) addressed the agreement with Apple on a 15% fee for purchases in WeChat mini-games during the Q3 earnings call [5].
- Lau emphasized the strong relationship between Tencent and Apple and their ongoing discussions on strengthening the mini-game ecosystem [5].
Group 5
- Stanford professor Fei-Fei Li's startup World Labs launches its first commercial world model, Marble [6].
- Marble supports large-scale multimodal input, so 3D worlds can be created from a variety of sources [6].
- Users can interactively edit, expand, and combine worlds in Marble [6].
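The Qwen3-Max caching discounts reduce to simple arithmetic. The following sketch compares the input-token cost of sending the same prompt prefix several times. The 50%, 20%, 125%, and 10% multipliers come from the announcement. Everything else is a hypothetical assumption for illustration: the base price `base` per million input tokens, the assumption that the whole prefix is cached, the assumption that the first implicit-mode request misses the cache, and the decision not to combine batch and caching discounts.

```python
def input_cost(prefix_tokens: int, requests: int, base: float, mode: str) -> float:
    """Input-token cost of sending the same prefix `requests` times (requests >= 1).

    base: hypothetical standard price per million input tokens.
    mode: "none", "batch", "implicit", or "explicit".
    """
    per_request = prefix_tokens / 1_000_000 * base  # one full-price pass over the prefix
    if mode == "none":
        return requests * per_request
    if mode == "batch":
        return requests * per_request * 0.5  # batch calls at half price
    if mode == "implicit":
        # Assumed: first call pays full price, later calls hit the cache at 20%.
        return per_request + (requests - 1) * per_request * 0.20
    if mode == "explicit":
        # Creating the cache costs 125%; each later hit costs 10%.
        return per_request * 1.25 + (requests - 1) * per_request * 0.10
    raise ValueError(f"unknown mode: {mode}")


for mode in ("none", "batch", "implicit", "explicit"):
    print(mode, input_cost(1_000_000, 10, 1.0, mode))
```

Under these assumptions, ten requests over a one-million-token prefix at a base price of 1.0 cost 10.0 uncached, 5.0 in batch, about 2.8 with implicit caching, and about 2.15 with explicit caching. A single explicitly cached request costs 25% more than an uncached one, so explicit caching only pays off once the prefix is reused.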
HKUST and Collaborators Propose WMPO: A World-Model-Based Policy Optimization Framework for VLA Models
Heart of Embodied AI (具身智能之心)· 2025-11-14 01:02
Core Insights
- The article introduces WMPO (World Model-based Policy Optimization), a framework from the Hong Kong University of Science and Technology and the ByteDance Seed team. It uses pixel-level video generation to improve sample efficiency, task performance, generalization, and lifelong learning for VLA (Vision-Language-Action) models [5][25].
Research Background and Pain Points
- Existing solutions struggle to balance scalability and effectiveness: human intervention requires continuous supervision, and adapting simulators to diverse scenarios is costly [4].
- Traditional latent-space world models are misaligned with web-scale pre-trained visual features, so they cannot fully exploit pre-trained knowledge [4][6].
Core Framework Design
- WMPO generates trajectories in an "imagination" space with a high-fidelity, pixel-level world model. This replaces real-environment interaction and supports stronger on-policy reinforcement learning [5][11].
- Training iterates through "imagined trajectory generation → trajectory sampling and evaluation → policy update" [5].
Key Modules
- **Generative World Model**: simulates how the robot and environment change over time, generating visual trajectories aligned with the VLA model's pre-trained features [8].
- **Lightweight Reward Model**: automatically judges whether an imagined trajectory succeeds or fails, providing sparse reward signals and avoiding complex reward shaping [9].
- **On-Policy Policy Optimization (GRPO)**: adapts Group Relative Policy Optimization to sparse-reward settings, balancing stability and scalability [10].
Core Innovations
- **Pixel Space Priority**: generates trajectories directly in pixel space, matching the VLA model's pre-trained visual features and maximizing the value of pre-trained knowledge [11].
- **Trajectory Generation Logic**: predicts action chunks from the initial frame and the language instruction, then generates subsequent frames iteratively [12].
- **Dynamic Sampling Strategy**: generates several imagined trajectories from the same initial state and discards groups in which every trajectory succeeds or every trajectory fails, so that each retained group provides a useful training signal [12].
Experimental Validation and Key Results
- In simulation, WMPO outperformed the baselines (GRPO, DPO) on four fine-manipulation tasks, reaching an average success rate of 47.1% with a rollout budget of 128 and 57.6% with a budget of 1280, which demonstrates superior sample efficiency [13][14].
- In real-world tests, WMPO achieved a 70% success rate on a "block insertion" task, significantly higher than the baseline policies [15].
Emergent Behaviors
- WMPO shows self-correction, autonomously adjusting its actions after reaching a failure state, whereas baseline policies repeat erroneous actions until they time out [17].
Generalization Ability
- WMPO reached an average success rate of 29.6% in out-of-distribution scenarios, outperforming all baselines. This suggests that it learns general manipulation skills rather than spurious visual cues [19][20].
Lifelong Learning
- WMPO improved steadily as new trajectories were collected iteratively, whereas DPO was unstable and required more expert demonstrations [23].
Conclusion and Significance
- WMPO establishes a new paradigm for VLA optimization by combining world models with on-policy reinforcement learning, addressing the high cost and low sample efficiency of real-environment interaction. It improves performance, generalization, and lifelong learning, paving the way for scalable general-purpose robotic manipulation [25].
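The GRPO update with sparse rewards and the dynamic-sampling filter can be sketched as follows. This minimal NumPy illustration assumes binary success rewards (1 or 0) from the reward model and the standard GRPO group normalization, in which each reward has the group mean subtracted and is divided by the group standard deviation. WMPO's exact variant may differ. The function name and the `eps` constant are hypothetical, and the policy-gradient step itself is omitted.

```python
import numpy as np

def group_advantages(rewards, eps=1e-6):
    """Group-relative advantages with dynamic sampling.

    rewards: array-like of shape (G, K) with binary success rewards for K
    imagined rollouts from each of G initial states. Groups whose rollouts all
    succeed or all fail have zero variance, carry no learning signal, and are
    dropped. Returns (advantages for the kept groups, indices of kept groups).
    """
    rewards = np.asarray(rewards, dtype=float)
    keep = np.flatnonzero(rewards.std(axis=1) > 0)   # dynamic sampling filter
    kept = rewards[keep]
    mean = kept.mean(axis=1, keepdims=True)
    std = kept.std(axis=1, keepdims=True)
    return (kept - mean) / (std + eps), keep

adv, kept = group_advantages([[1, 1, 1, 1], [0, 0, 0, 0], [1, 0, 0, 0]])
print(kept)  # [2]: only the mixed group survives the filter
```

In the mixed group, the single successful rollout receives a positive advantage and the three failures receive negative ones. Each advantage would then weight the log-probability of its imagined trajectory in a clipped policy-gradient objective. That step depends on the VLA policy and is not shown.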