World Models
LeCun's Last Paper at Meta
36Kr · 2025-11-14 03:04
Core Insights
- The article discusses Yann LeCun's recent paper on a self-supervised learning method called LeJEPA, which is seen as his farewell work at Meta as he departs the company [1][33].
- LeJEPA introduces a new framework that enhances predictive performance by ensuring the embedding space follows a specific statistical distribution [2].

Group 1: LeJEPA Framework
- LeJEPA is based on isotropic Gaussian embedding and addresses the representation collapse issue in traditional JEPA frameworks, significantly improving model generalization [1][5].
- The framework utilizes Sketched Isotropic Gaussian Regularization (SIGReg) to achieve distribution matching, transforming the problem into a statistical hypothesis test [6][11].

Group 2: Experimental Validation
- Extensive experiments were conducted on large architectures such as ViT, ConvNeXt, and ResNet, with models approaching 1 billion parameters [8].
- Results indicate that LeJEPA outperforms existing methods while maintaining training simplicity and robustness, particularly on domain-specific datasets like Galaxy10 and Food101 [10].

Group 3: Statistical Insights
- The research highlights that isotropic Gaussian distribution minimizes bias and variance during training, enhancing stability and accuracy in downstream tasks [3][5].
- Non-isotropic distributions lead to higher bias and variance, confirming the superiority of isotropic Gaussian distribution through various experiments [3].

Group 4: Future Directions
- Despite LeCun's departure from Meta, it is suggested that he is raising funds to establish a startup focused on advancing his work in world models, indicating ongoing contributions to the AI field [33][34].
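The variance half of the statistical claim can be illustrated with a standard ordinary-least-squares fact (not the paper's own derivation): for fixed noise, the total variance of an OLS estimate is proportional to the sum of the inverse eigenvalues of the feature covariance, and for a fixed total variance budget that sum is smallest when all eigenvalues are equal. The sketch below uses two hypothetical 4-dimensional spectra chosen only for illustration.

```python
def ols_variance_factor(eigenvalues):
    """Sum of 1/eigenvalue. Proportional to the total variance of an OLS
    estimate whose feature covariance has these eigenvalues."""
    return sum(1.0 / ev for ev in eigenvalues)


# Hypothetical embedding spectra with the same total variance (4.0).
isotropic = [1.0, 1.0, 1.0, 1.0]
anisotropic = [2.5, 1.0, 0.4, 0.1]

iso_factor = ols_variance_factor(isotropic)
aniso_factor = ols_variance_factor(anisotropic)

# The anisotropic spectrum pays heavily for its weakest direction (1 / 0.1).
print(f"isotropic: {iso_factor:.2f}  anisotropic: {aniso_factor:.2f}")
```

Under these assumptions the low-variance directions dominate the estimator's variance, which is one way to read the summary's point that non-isotropic embeddings are less stable for downstream regression.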
Wang Zhenhui Succeeds Hu Wei as JD Logistics CEO; Didi Autonomous Driving Lands in Abu Dhabi as Its First Overseas Stop | 早资道
Sohu Finance · 2025-11-14 01:12
Group 1
- JD Logistics appoints Wang Zhenhui as CEO, replacing Hu Wei, effective November 13, 2025 [2].
- Hu Wei resigns as CEO of JD Logistics to take on other roles within JD Group [2].

Group 2
- Didi Chuxing establishes its first international presence for autonomous driving in Abu Dhabi, partnering with the Abu Dhabi Investment Office [3].
- The collaboration focuses on innovation in autonomous driving technology, AI talent development, and ecosystem building [3].
- Didi plans to expand operations throughout the Middle East [3].

Group 3
- Alibaba Cloud's Bailian announces a price reduction for the Tongyi Qianwen 3-Max model starting November 13, 2025 [4].
- Batch invocation will be charged at half price, and implicit cache hits at 20% of the standard input token price [4].
- Explicit caching will charge 125% of the standard input token price for creating cache tokens, with subsequent hits costing only 10% [4].

Group 4
- Tencent President Martin Lau (Liu Chiping) addresses the agreement with Apple regarding a 15% fee on purchases in WeChat mini-games during the Q3 earnings call [5].
- Lau emphasizes the strong relationship and ongoing discussions between Tencent and Apple to enhance the mini-game ecosystem [5].

Group 5
- Stanford professor Fei-Fei Li's startup World Labs launches the first commercial world model, Marble [6].
- Marble supports large-scale multimodal capabilities, allowing the creation of 3D worlds from various inputs [6].
- Users can interactively edit, expand, and combine worlds using Marble [6].
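The Bailian pricing multipliers in Group 3 lend themselves to a quick comparison. The sketch below expresses the input-token cost of sending the same prompt several times, in units of one standard-price call. The function name is hypothetical, and it assumes the whole prompt is cacheable and that every call after the first is a cache hit, which the announcement summary does not guarantee.

```python
def relative_input_cost(n_calls, mode):
    """Input-token cost of sending one prompt n_calls times,
    measured in units of a single standard-price call."""
    if mode == "standard":
        return n_calls * 1.0
    if mode == "batch":
        return n_calls * 0.5                 # batch invocation at half price
    if mode == "implicit_cache":
        # Assumption: the first call misses, later calls hit at 20%.
        return 1.0 + (n_calls - 1) * 0.2
    if mode == "explicit_cache":
        # The first call creates the cache at 125%, later hits cost 10%.
        return 1.25 + (n_calls - 1) * 0.1
    raise ValueError(f"unknown mode: {mode}")


modes = ("standard", "batch", "implicit_cache", "explicit_cache")
costs = {mode: relative_input_cost(10, mode) for mode in modes}
for mode, cost in costs.items():
    print(f"{mode:>15}: {cost:.2f}")
```

Under these assumptions explicit caching is already cheaper than uncached calls from the second call onward, since 1.25 + 0.10 is less than 2.00.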
HKUST and Collaborators Propose WMPO: A World-Model-Based Policy Optimization Framework for VLA
具身智能之心 · 2025-11-14 01:02
Core Insights
- The article introduces WMPO (World Model-based Policy Optimization), a framework developed by a team from the Hong Kong University of Science and Technology and ByteDance Seed. It uses pixel-level video generation to improve the sample efficiency, task performance, generalization ability, and lifelong learning of VLA (Vision-Language-Action) models [5][25].

Research Background and Pain Points
- Existing solutions struggle to balance scalability and effectiveness: human intervention requires continuous supervision, and adapting simulators to diverse scenarios is costly [4].
- Traditional latent-space world models are misaligned with web-scale pre-trained visual features and fail to fully leverage pre-trained knowledge [4][6].

Core Framework Design
- WMPO generates trajectories in an "imagination" space using a high-fidelity pixel-level world model, replacing real environment interactions and supporting stronger on-policy reinforcement learning [5][11].
- The iterative process follows "imagination trajectory generation → trajectory sampling evaluation → policy update" [5].

Key Modules
- **Generative World Model**: Simulates dynamic changes between the robot and the environment, generating visual trajectories aligned with VLA pre-trained features [8].
- **Lightweight Reward Model**: Automatically assesses the success or failure of imagined trajectories, providing sparse reward signals to avoid complex reward shaping [9].
- **On-Policy Policy Optimization (GRPO)**: Adapts Group Relative Policy Optimization for sparse reward scenarios, balancing stability and scalability [10].

Core Innovations
- **Pixel Space Priority**: Directly generates trajectories in pixel space, matching VLA pre-trained visual features and maximizing the value of pre-trained knowledge [11].
- **Trajectory Generation Logic**: Predicts action chunks based on initial frames and language instructions, generating subsequent frames iteratively [12].
- **Dynamic Sampling Strategy**: Generates multiple imagined trajectories from the initial state and filters out groups in which every trajectory succeeds or every trajectory fails, so that the remaining samples carry a training signal [12].

Experimental Validation and Key Results
- In simulation environments, WMPO outperformed baseline methods (GRPO, DPO) across four fine manipulation tasks, achieving an average success rate of 47.1% with a rollout budget of 128 and 57.6% with a budget of 1280, demonstrating superior sample efficiency [13][14].
- In real environments, WMPO achieved a success rate of 70% in a "block insertion" task, significantly higher than the baseline policies [15].

Emergent Behaviors
- WMPO exhibits self-correcting capabilities, autonomously adjusting actions in response to failure states, unlike baseline policies that continue erroneous actions until timeout [17].

Generalization Ability
- WMPO achieved an average success rate of 29.6% in out-of-distribution scenarios, outperforming all baseline methods, which indicates that it learns general manipulation skills rather than spurious visual cues [19][20].

Lifelong Learning
- WMPO showed stable performance improvement through iterative collection of trajectories, while DPO struggled with instability and required more expert demonstrations [23].

Conclusion and Significance
- WMPO establishes a new paradigm for VLA optimization by integrating world models with on-policy reinforcement learning, addressing the high cost and low sample efficiency of real environment interactions. It enhances performance, generalization, and lifelong learning capabilities, paving the way for scalable applications in general robotic manipulation [25].
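The dynamic sampling filter and the group-relative advantage can be sketched in a few lines. This is a minimal illustration assuming binary success rewards and the standard GRPO normalization (reward minus group mean, divided by group standard deviation). The summary does not state that WMPO uses exactly this form, and the function names and sample rewards are hypothetical.

```python
import statistics


def keep_group(rewards):
    """Dynamic sampling: drop groups whose imagined rollouts all succeed
    or all fail, since they give every rollout the same reward."""
    return 0 < sum(rewards) < len(rewards)


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: each reward standardized within its group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical success flags (1 = success) for rollouts imagined from three initial states.
groups = {
    "state_a": [1, 0, 0, 1],
    "state_b": [0, 0, 0, 0],   # all fail: no learning signal
    "state_c": [1, 1, 1, 1],   # all succeed: no learning signal
}

kept = {
    name: group_relative_advantages(rewards)
    for name, rewards in groups.items()
    if keep_group(rewards)
}
print(kept)
```

Groups with identical rewards would yield zero advantages for every rollout, which is why filtering them out keeps the rollout budget focused on informative samples.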
One Sentence Is All It Takes to Create a 3D World You Can Roam Freely!
自动驾驶之心 · 2025-11-14 00:04
Core Insights
- The article discusses the launch of Marble, a world model developed by World Labs, which allows users to create immersive 3D environments from a single image or text prompt [2][3][7].

Group 1: Product Features
- Marble generates persistent, downloadable 3D environments, distinguishing it from other real-time models [28].
- Users can upload 2D images or 3D models (for a fee) to generate worlds, achieving realism akin to AAA video games [14][16].
- The platform includes AI-native editing tools and a hybrid 3D editor, allowing users to construct spatial frameworks and fill in visual details [31].

Group 2: User Experience
- The initial testing phase showed impressive results, with the ability to create interactive 3D scenes from a single image [32].
- Users can input multiple images or short videos to create more accurate 3D worlds, enhancing the creative process [48].
- The editing process is iterative, allowing users to modify generated worlds extensively, from minor adjustments to major structural changes [49][50].

Group 3: Pricing and Accessibility
- Marble offers three pricing tiers, with the highest tier costing $95 per month for generating up to 75 worlds, while the free version allows for 4 worlds [83][84].
- The Pro version is available for the first month at just $1, with standard pricing at $20 per month [85].

Group 4: Future Implications
- The article emphasizes that Marble represents a significant step towards spatial intelligence in AI, which is expected to unlock new applications in simulation and robotics [70][71].
- The integration of interactive capabilities in future world models is highlighted as a key opportunity for enhancing user engagement and application [69].
Understanding World Models Without Jargon: From Everyday Prediction to Autonomous Driving
自动驾驶之心 · 2025-11-14 00:04
Group 1
- The core concept of the article is the definition and function of the "world model," which predicts future scenarios based on past sensory data, similar to how humans anticipate events in daily life [2][3][30].
- The world model takes various forms of input, such as images, sounds, and sensor data, and outputs predictions about future states, emphasizing the importance of recognizing patterns and making forecasts [4][30].
- The article distinguishes world models from neural networks: neural networks serve as tools for recognition and imitation, while the world model is the core that enables prediction and understanding [5][10][30].

Group 2
- The article discusses the limitations of creating a "universal" world model: rules and requirements differ so much across scenarios that specialized models are necessary [11][12][30].
- Various specialized world models are introduced, including video generation, music generation, game, and industrial production models, each focusing on a specific domain to achieve precise predictions [12][14][18][30].
- The autonomous driving world model is described as the most demanding type, as its predictions directly affect safety, requiring rapid response times and high accuracy [18][22][30].

Group 3
- The VLA model is presented as an enhanced version of the autonomous driving world model, incorporating language logic to improve the prediction of actions based on user commands and traffic rules [23][26][30].
- The article concludes that the future of world models lies in becoming more specialized rather than universal, focusing on improving prediction accuracy and speed in specific scenarios [29][30].
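The "past observations in, predicted future states out" idea can be made concrete with a deliberately tiny example. The sketch below is a toy stand-in, not a learned world model: it assumes a lead vehicle moves at constant velocity and extrapolates its position. The function name and sample numbers are hypothetical.

```python
def predict_positions(past_positions, dt, horizon_steps):
    """Toy 'world model': extrapolate future positions (metres) from past
    observations, assuming the object keeps its most recent velocity."""
    if len(past_positions) < 2:
        raise ValueError("need at least two observations to estimate velocity")
    velocity = (past_positions[-1] - past_positions[-2]) / dt
    last = past_positions[-1]
    return [last + velocity * dt * k for k in range(1, horizon_steps + 1)]


# Observed positions of a lead vehicle, sampled every 0.5 s.
observed = [0.0, 1.0, 2.0]
future = predict_positions(observed, dt=0.5, horizon_steps=3)
print(future)
```

A real driving world model replaces the constant-velocity rule with learned dynamics over camera and sensor inputs, but the interface is the same: history in, predicted future out.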
LeCun's Last Paper at Meta
量子位 · 2025-11-13 11:52
Core Insights
- The article discusses the introduction of LeJEPA, a self-supervised learning method developed by Yann LeCun, marking his farewell work at Meta [2][3][4].
- LeJEPA aims to address the representation collapse issue in traditional JEPA frameworks by using isotropic Gaussian embeddings and introducing SIGReg regularization to enhance model generalization [5][6].

Group 1: LeJEPA Overview
- LeJEPA is based on isotropic Gaussian embeddings, which effectively mitigate the representation collapse problem and significantly improve model generalization [5].
- The traditional JEPA framework often encounters representation collapse, where the model maps all inputs to a single point and can no longer capture semantic differences [6].

Group 2: Impact of Embedding Distribution
- The study analyzed the impact of the embedding distribution on bias and variance through ordinary least squares regression, revealing that an isotropic Gaussian distribution minimizes both during training [8][9].
- An isotropic Gaussian distribution yields lower bias and variance than non-isotropic distributions, enhancing stability and accuracy in downstream tasks [9][11][13].

Group 3: SIGReg Regularization
- SIGReg (Sketched Isotropic Gaussian Regularization) is introduced as a method to achieve distribution matching, transforming the problem into a hypothesis testing framework [15][17].
- It employs a combination of univariate directional tests and Epps-Pulley tests to assess the match between the embedding distribution and the target isotropic Gaussian distribution [16][17].

Group 4: High-Dimensional Challenges
- LeJEPA addresses computational challenges in high-dimensional spaces by combining the SIGReg loss with a predictive loss, and mini-batch training keeps optimization efficient and stable [19][21].
- The total loss in LeJEPA is a weighted sum of the SIGReg loss and the predictive loss, with a hyperparameter λ to balance their contributions [22].

Group 5: Experimental Validation
- Extensive experiments on large architectures, including ViT, ConvNeXt, ResNet, MaxViT, and Swin Transformer, demonstrated that LeJEPA outperforms existing methods while maintaining training simplicity and robustness [20][23].
- On domain-specific datasets like Galaxy10 and Food101, LeJEPA surpassed DINOv2-based transfer learning methods when pre-trained directly on target data [24].

Group 6: JEPA Framework Evolution
- JEPA (Joint-Embedding Predictive Architecture) has evolved over three years since its introduction by LeCun, focusing on enhancing model expressiveness and reasoning capabilities through joint prediction methods [31][28].
- Unlike generative models, JEPA captures the dependencies between x and y without explicitly generating predictions for y [32].

Group 7: Future Directions
- Although LeJEPA signifies the end of LeCun's research at Meta, it does not mark the conclusion of JEPA's development, as LeCun is reportedly raising funds to establish a startup focused on world models [72][71].
- LeCun's departure from Meta, while not entirely graceful, closes a significant period of achievement in AI research that contributed to the field's advancement [74][79].
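The SIGReg idea in Groups 3 and 4 (project embeddings onto random directions, test each 1-D projection against a standard Gaussian, then mix the result with the predictive loss) can be sketched as follows. This is a simplified stdlib-only illustration, not the paper's implementation: the function names, the grid of evaluation points, the Gaussian weighting window, the number of directions, and the convex-combination form of the total loss are all assumptions. The per-direction statistic is a discretized, Epps-Pulley-style gap between the empirical characteristic function and that of N(0, 1).

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly on the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def ecf_gaussian_gap(samples, t_values):
    """Weighted squared gap between the empirical characteristic function
    of 1-D samples and the N(0, 1) characteristic function exp(-t^2 / 2)."""
    n = len(samples)
    total = 0.0
    for t in t_values:
        real = sum(math.cos(t * s) for s in samples) / n
        imag = sum(math.sin(t * s) for s in samples) / n
        target = math.exp(-0.5 * t * t)
        weight = math.exp(-0.5 * t * t)      # assumed weighting window
        total += weight * ((real - target) ** 2 + imag ** 2)
    return total / len(t_values)


def sigreg_like(embeddings, num_directions=8, seed=0):
    """Average the 1-D Gaussianity gap over random projection directions."""
    rng = random.Random(seed)
    dim = len(embeddings[0])
    t_values = [-3.0 + 0.5 * k for k in range(13)]
    gaps = []
    for _ in range(num_directions):
        u = random_unit_vector(dim, rng)
        projected = [sum(a * b for a, b in zip(e, u)) for e in embeddings]
        gaps.append(ecf_gaussian_gap(projected, t_values))
    return sum(gaps) / len(gaps)


def total_loss(pred_loss, sigreg_loss, lam=0.05):
    """Assumed form of the weighted sum; the source only names a weight λ."""
    return (1.0 - lam) * pred_loss + lam * sigreg_loss


data_rng = random.Random(1)
gaussian = [[data_rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(400)]
collapsed = [[0.0] * 4 for _ in range(400)]   # every input mapped to one point

gauss_score = sigreg_like(gaussian)
collapsed_score = sigreg_like(collapsed)
print(f"gaussian: {gauss_score:.5f}  collapsed: {collapsed_score:.5f}")
```

The collapsed embeddings score far worse than the Gaussian ones, which is the behavior a regularizer of this kind needs in order to penalize representation collapse.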
Fei-Fei Li's World Model Arrives: Generate a 3D World from One Sentence, and AI Is Truly Starting to Understand Reality
36Kr · 2025-11-13 11:42
Core Insights
- The launch of Marble by World Labs marks the first public product in the realm of world models, showcasing significant advancements in spatial intelligence and AI capabilities [1][2][3].

Group 1: Marble's Core Capabilities
- Marble features three main capabilities: multimodal generation, AI-native world editing, and a practical production workflow [1].
- It can reconstruct a complete 3D world from various inputs, including text, images, and videos, allowing for a seamless creative process [4][7].
- Users can edit generated worlds similarly to real scenes, enabling continuous refinement and expansion of 3D environments [13][14].

Group 2: Applications and Integration
- Marble allows for the export of generated worlds into various formats compatible with industry-standard tools like Unreal, Unity, and Blender, facilitating integration into game and film production workflows [15][17].
- The platform supports high-quality rendering and video generation, enhancing the usability of created worlds in real-world applications [18][19].

Group 3: Theoretical Foundations and Future Implications
- The development of Marble is rooted in the concept of spatial intelligence, which is essential for AI to interact with the physical world [20][21].
- A mature world model must possess generative, multimodal, and interactive capabilities, which are foundational for future advancements in robotics and scientific research [22][23][24].
- Marble's release signifies a step towards achieving comprehensive spatial intelligence, paving the way for future applications in automation and simulation [27].
Shockwaves in AI! Turing Award Winner Yann LeCun to Leave Meta for a "World Model" Startup
机器人圈 · 2025-11-13 10:40
Core Viewpoint
- The departure of Yann LeCun from Meta signifies a major shift in the AI landscape, highlighting internal strategic disagreements and a pivot in Meta's AI development approach [2][3][4].

Group 1: Departure and Strategic Shift
- Yann LeCun, a prominent figure in AI and Meta's Chief AI Scientist, is leaving the company after 12 years, marking a formal split with CEO Mark Zuckerberg over AI strategy [2][3].
- The decision to leave was foreshadowed by increasing disagreements with Meta's management regarding the AI development roadmap and company strategy [3][4].
- Meta's internal restructuring has shifted focus from long-term foundational research led by LeCun's FAIR lab to a more agile product development approach, driven by immediate market needs [4][7].

Group 2: Internal Changes and Leadership Dynamics
- Meta has made significant changes, including a $100 million compensation package to attract young talent from competitors, and the formation of a new "superintelligence" team led by 28-year-old Alexandr Wang [4].
- LeCun's reporting structure changed, requiring him to report to Wang instead of the Chief Product Officer, which marginalized his FAIR lab and its research initiatives [4][7].

Group 3: Technological Disagreements
- LeCun has publicly criticized the current trend of large language models (LLMs), arguing they are inadequate for achieving true reasoning and planning capabilities, which diverges from Zuckerberg's focus on immediate monetization [7][8].
- The emphasis on "world models," which LeCun advocates, contrasts sharply with the short-term goals set by Meta's leadership, leading to his decision to leave [7][8].

Group 4: Future Aspirations
- Post-Meta, LeCun aims to fully commit to developing "world models," which he believes will redefine AI by enabling machines to learn from observing the physical world, akin to human cognitive development [8].
- He predicts that within 3-5 years, "world models" will become the mainstream AI architecture, challenging the current dominance of LLMs [8].

Group 5: Legacy and Impact
- LeCun's career has been pivotal in the evolution of AI, having co-developed convolutional neural networks (CNNs) and led the FAIR lab to prominence [9].
- His departure is seen as a significant loss for Meta, indicating a potential shift in the AI research landscape and the company's future direction [9].
Digital Economy Biweekly (2025, Issue 20): Tech Giants Join Forces as a Global AI Computing Alliance Takes Shape - 20251113
Galaxy Securities · 2025-11-13 09:07
Core Insights
- The collaboration between OpenAI and Amazon marks a new phase in the global AI computing landscape, transitioning into a "multi-cloud collaboration" model [1][5][6].
- OpenAI's partnership with AWS completes its supply chain in North America, integrating major cloud providers like Microsoft, Oracle, Google, and Amazon into its computing ecosystem [1][6].
- The AI industry is returning to a "compute is king" paradigm, with a focus on computational power and capital investment as key competitive advantages [11].

Section Summaries

1. Focus of the Report: Expansion of the Computing Alliance with OpenAI and Amazon
- OpenAI's collaboration with AWS is seen as the final piece of its computing ecosystem, indicating a comprehensive multi-cloud strategy [5][6].
- The partnership is expected to enhance OpenAI's capabilities, with significant investments in GPU resources and infrastructure planned for the coming years [5][6].

2. China Dynamics: Accelerated "Artificial Intelligence +" Initiatives
- Chinese government policies are increasingly focused on integrating AI into manufacturing, transportation, and healthcare sectors, with a systematic approach to AI deployment [12][13].
- Local policies and industry funds are fostering collaboration across regions, creating new industrial hubs [14][15].
- Financial tools and capital markets are aligning to support AI initiatives, indicating a robust investment environment [15].

3. U.S. Dynamics: Parallel Expansion of AI Computing and Regulatory Restructuring
- NVIDIA continues to strengthen its dominance in the AI ecosystem, with its market capitalization surpassing $5 trillion, raising concerns about systemic risks [18].
- The dual focus on chip and energy sectors by companies like AMD and Google is creating a resonance between computing power and energy supply [19].

4. European Dynamics: Technology Sovereignty and AI Governance
- The EU is reshaping its technological sovereignty through initiatives that combine funding and infrastructure development, alongside AI governance [3][4].

5. Technological Frontiers: Rise of World Models and Acceleration of Physical Intelligence
- The development of world models and embodied intelligence is pushing AI towards a new era of "physical intelligence" [3][6].

6. Think Tank Insights: IDC's Three Forces Reshaping Future IT Landscape
- IDC identifies three major forces driving the transformation of the IT landscape, with AI becoming a core engine of enterprise leadership [3][6].
Turing Award Winner Yann LeCun Leaves to Start a Company; Meta Stock Sheds 140 Billion
TMTPost App · 2025-11-13 08:38
Core Viewpoint
- The departure of Yann LeCun, a Turing Award winner and chief scientist at Meta, has caused significant turmoil in the AI industry, leading to a 1.5% drop in Meta's stock price and a market value loss of 140 billion yuan [1][2].

Group 1: Background and Context
- Yann LeCun is a foundational figure in deep learning, credited with developing the Convolutional Neural Network (CNN) architecture, which has been pivotal for modern AI advancements [1].
- LeCun's departure is not merely a personal career change but reflects a broader ideological conflict regarding the future direction of AI development, particularly between his vision of "world models" and Meta's focus on Large Language Models (LLMs) [2][3].

Group 2: Internal Dynamics at Meta
- Meta has faced challenges in the AI space, with competitors like DeepSeek making breakthroughs in the Mixture of Experts (MoE) architecture, while Meta's own Llama 4 model series has received lackluster market feedback [4].
- The company's financial commitment to AI has increased, with capital expenditures for AI reaching 70 billion yuan, and organizational restructuring has led to the establishment of a "Super Intelligence Lab" under new leadership, sidelining LeCun [6][7].
- LeCun's role has shifted from a strategic leader to a symbolic figure within the company, as he now reports to a younger executive and faces restrictions on publishing his team's research [6][7].

Group 3: Ideological Conflict
- The ideological rift between LeCun and Meta's leadership became apparent with the emergence of ChatGPT, as Meta was slow to engage with LLM technology, leading to internal dissatisfaction and frustration [8][9].
- LeCun's insistence that LLMs represent a "dead end" in AI development has been a point of contention, as he believes they lack the necessary understanding of the physical world and cannot achieve true AGI [14][16].
- He advocates for a "world model" approach, which emphasizes learning through interaction with the environment rather than solely through text, proposing a modular AI architecture that contrasts with the monolithic nature of LLMs [17].