The World Model Debate Between Fei-Fei Li and LeCun
具身智能之心· 2025-11-15 16:03
Core Viewpoint
- The article discusses the competition among three major players in the AI industry (Fei-Fei Li, Yann LeCun, and Google) over the development of world models, highlighting their distinct technological approaches and their implications for artificial general intelligence (AGI) [2][22][39].

Group 1: Fei-Fei Li's Marble
- Fei-Fei Li's company, World Labs, has launched its first commercial world model, Marble, which is considered to have significant commercial potential thanks to its ability to generate persistent, downloadable 3D environments [5][21].
- Marble features a native AI world editor called Chisel, which lets users create and modify worlds with simple prompts and is particularly useful for VR and game developers [7][9].
- However, some experts argue that Marble resembles a 3D rendering model rather than a true world model, since it focuses on visual representation without incorporating the underlying physical laws needed for robot training [10][20].

Group 2: LeCun's JEPA
- LeCun's approach to world models, exemplified by JEPA, draws on control theory and cognitive science rather than 3D graphics, focusing on abstract representations that let robots predict changes in their environment [22][25].
- JEPA is designed to train robots by capturing essential world states without generating visually appealing images, making it better suited to robot training [27][29].
- This model contrasts sharply with Marble, as it prioritizes understanding the structure of the world over visual fidelity [39].

Group 3: Google's Genie 3
- Google DeepMind's Genie 3, launched in August, generates interactive video environments from prompts, showing improvements in long-term consistency and event triggering [31][34].
- Despite these advances, Genie 3 remains fundamentally a video-logic model, lacking the deep understanding of physical laws that LeCun's JEPA aims for [35][36].
- The visual quality and resolution of Genie 3 are also limited compared to Marble, which offers high-precision, exportable 3D assets [38].

Group 4: Comparative Analysis
- The three world models (Marble, Genie 3, and JEPA) represent different paradigms: Marble focuses on visual representation, Genie 3 on dynamic video generation, and JEPA on understanding the underlying structure of the world [39].
- Together they form a "world model pyramid": moving up the hierarchy, the models become increasingly abstract and aligned with AI's cognitive processes [47][48].
The World Model Debate Between Fei-Fei Li and LeCun
量子位· 2025-11-15 05:00
Core Viewpoint
- The article discusses the competition among three major players in the AI industry (Fei-Fei Li, Yann LeCun, and Google) over the development of world models, highlighting their distinct technological approaches and their implications for artificial general intelligence (AGI) [1][3][42].

Group 1: Fei-Fei Li and Marble
- Fei-Fei Li's company, World Labs, has launched its first commercial world model, Marble, which is seen as having significant commercial potential thanks to its ability to generate persistent, downloadable 3D environments [2][5].
- Marble features a native AI world editor called Chisel, which lets users create and modify worlds with simple prompts and is particularly useful for VR and game developers [7][9].
- However, some experts argue that Marble resembles a 3D rendering model rather than a true world model, since it focuses on visual representation without incorporating the underlying physical laws needed for robot training [10][18][20].

Group 2: Yann LeCun and JEPA
- LeCun's approach to world models, exemplified by JEPA, draws on control theory and cognitive science rather than 3D graphics, aiming to let robots predict changes in their environment without needing to generate visually appealing images [24][26].
- JEPA focuses on capturing the abstract representations of the world that matter for AI decision-making, making it better suited to training robots [28][30].

Group 3: Google and Genie 3
- Google DeepMind's Genie 3, launched in August, lets users generate interactive video environments from a single prompt, addressing long-term consistency issues in generated worlds [32][35].
- Despite its dynamic capabilities, Genie 3 is still fundamentally a video-logic model and lacks the deeper understanding of physical laws that JEPA provides, making it less effective for robot training [38][40].

Group 4: World Model Pyramid
- The article arranges the three world models into a pyramid: Marble as the interface, Genie 3 as the simulator, and JEPA as the cognitive framework, illustrating their varying levels of abstraction and suitability for AI training [53][54].
- Moving up the pyramid, the models become more abstract and aligned with AI's cognitive processes, while those at the bottom are more visually appealing but harder for robots to comprehend [54].
LeCun's Last Paper at Meta
36Ke· 2025-11-14 03:04
Core Insights
- The article discusses Yann LeCun's recent paper on a self-supervised learning method called LeJEPA, seen as his farewell work at Meta as he departs the company [1][33].
- LeJEPA introduces a new framework that improves predictive performance by ensuring the embedding space follows a specific statistical distribution [2].

Group 1: LeJEPA Framework
- LeJEPA is built on isotropic Gaussian embeddings and addresses the representation-collapse issue in traditional JEPA frameworks, significantly improving model generalization [1][5].
- The framework uses Sketched Isotropic Gaussian Regularization (SIGReg) to achieve distribution matching, recasting the problem as a statistical hypothesis test [6][11].

Group 2: Experimental Validation
- Extensive experiments were conducted on large architectures such as ViT, ConvNeXt, and ResNet, with models approaching 1 billion parameters [8].
- Results indicate that LeJEPA outperforms existing methods while keeping training simple and robust, particularly on domain-specific datasets like Galaxy10 and Food101 [10].

Group 3: Statistical Insights
- The research shows that an isotropic Gaussian embedding distribution minimizes bias and variance during training, improving stability and accuracy on downstream tasks [3][5].
- Non-isotropic distributions lead to higher bias and variance, confirming the advantage of the isotropic Gaussian distribution across various experiments [3].

Group 4: Future Directions
- Despite LeCun's departure from Meta, he is reportedly raising funds to establish a startup focused on advancing his work on world models, indicating continued contributions to the AI field [33][34].
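The objective structure described above (a predictive loss plus a regularizer that pulls the embedding distribution toward an isotropic Gaussian) can be sketched minimally. The moment-matching penalty below is a simplified stand-in for the paper's actual SIGReg statistic, and every function name and constant here is illustrative, not taken from the paper.

```python
import numpy as np

# Rough sketch of a LeJEPA-style combined objective as the summary describes it:
# total loss = predictive loss + lam * isotropy regularizer.
# The regularizer is a simplified moment-matching stand-in for SIGReg:
# it penalizes deviation of the embedding mean from 0 and covariance from I.

def isotropy_penalty(z):
    mean = z.mean(axis=0)                    # should be ~0 for N(0, I)
    cov = np.cov(z, rowvar=False)            # should be ~I for N(0, I)
    eye = np.eye(z.shape[1])
    return float(np.sum(mean ** 2) + np.sum((cov - eye) ** 2))

def lejepa_style_loss(pred, target, z, lam=0.1):
    predictive = float(np.mean((pred - target) ** 2))
    return predictive + lam * isotropy_penalty(z)

rng = np.random.default_rng(0)
z = rng.standard_normal((256, 8))            # embeddings already near N(0, I)
loss = lejepa_style_loss(z, z, z, lam=0.1)   # predictive term is 0 here
```

Scaling the embeddings (e.g. `3 * z`) inflates the covariance away from the identity and raises the penalty, which is the collapse-preventing pressure the article attributes to the regularizer.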
LeCun's Last Paper at Meta? He's Co-First Author, Too. LeJEPA: Completing the Theoretical Puzzle of JEPAs
机器之心· 2025-11-14 01:33
机器之心 report. Editor: +0

This may be one of the last papers LeCun publishes at Meta. This time, LeCun fills in a key piece of the theoretical puzzle behind the JEPA architecture.

Learning manipulable representations of the world and its dynamics is a core goal of artificial intelligence, and Joint-Embedding Predictive Architectures (JEPAs) are a promising blueprint for achieving it. The core idea: learn an organized, manipulable high-dimensional embedding space by maximizing the agreement between the embeddings of semantically related views (for example, different transformations or crops of an image).

However, current JEPA training methods lack solid theoretical guidance, making development ad hoc and fragile. They share a common failure mode: representation collapse, in which all inputs map to similar embeddings.

To head off such "shortcut solutions", today's state-of-the-art methods lean heavily on a collection of intricate heuristics: stop-gradients, teacher-student networks (with carefully tuned EMA schedules), asymmetric view generation, and explicit normalization and whitening layers. These mechanisms not only make training complex and brittle, they are also highly sensitive to hyperparameters, architecture, and data distribution, and they come with no solid theoretical guarantees.

LeCun's team proposes a comprehensive theory of JEPAs and instantiates it as LeJEPA, a lean, scalable, and theoretically grounded training objective.

Paper title: LeJEPA: Provable and Scalable Self-Su ...
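The core JEPA recipe described above (predicting the embedding of one view of an input from the embedding of another, semantically related view) can be sketched in a few lines. The linear encoder, predictor, and noisy "views" below are illustrative stand-ins chosen for brevity, not the architecture from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal JEPA-style sketch: two views of the same input are mapped into a
# shared embedding space, and a predictor maps the context embedding onto
# the target embedding. Agreement is measured in embedding space, not in
# pixel space, which is the architecture's defining trait.
W_enc = rng.standard_normal((32, 16)) / np.sqrt(32)   # shared encoder weights
W_pred = rng.standard_normal((16, 16)) / np.sqrt(16)  # predictor weights

def jepa_loss(view_a, view_b):
    z_a = view_a @ W_enc          # context embedding
    z_b = view_b @ W_enc          # target embedding
    pred = z_a @ W_pred           # predict target from context
    return float(np.mean((pred - z_b) ** 2))

x = rng.standard_normal((8, 32))
views = (x + 0.1 * rng.standard_normal((8, 32)), x)   # two noisy "views" of x
loss = jepa_loss(*views)
```

Minimizing this loss naively admits the "shortcut solution" the article describes (both encoders collapsing to a constant), which is exactly what the heuristics listed above, and LeJEPA's principled objective, are meant to rule out.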
LeCun's Last Paper at Meta
量子位· 2025-11-13 11:52
Core Insights
- The article discusses the introduction of LeJEPA, a self-supervised learning method developed by Yann LeCun, marking his farewell from Meta [2][3][4].
- LeJEPA addresses the representation-collapse issue in traditional JEPA frameworks by using isotropic Gaussian embeddings and introducing SIGReg regularization to improve model generalization [5][6].

Group 1: LeJEPA Overview
- LeJEPA is built on isotropic Gaussian embeddings, which effectively mitigate the representation-collapse problem and significantly improve model generalization [5].
- The traditional JEPA framework often suffers representation collapse, where the model maps all inputs to a single point and so fails to capture semantic differences [6].

Group 2: Impact of Embedding Distribution
- The study analyzed the impact of the embedding distribution on bias and variance through ordinary least squares regression, showing that the isotropic Gaussian distribution minimizes both during training [8][9].
- An isotropic Gaussian distribution yields lower bias and variance than non-isotropic distributions, improving stability and accuracy on downstream tasks [9][11][13].

Group 3: SIGReg Regularization
- SIGReg (Sketched Isotropic Gaussian Regularization) is introduced to achieve distribution matching, recasting the problem as a statistical hypothesis test [15][17].
- It combines univariate directional tests with the Epps-Pulley test to assess how well the embedding distribution matches the target isotropic Gaussian [16][17].

Group 4: High-Dimensional Challenges
- SIGReg tackles the computational challenges of high-dimensional spaces, and mini-batch training keeps optimization efficient and stable [19][21].
- The total loss in LeJEPA is a weighted sum of the SIGReg loss and the predictive loss, with a hyperparameter λ balancing their contributions [22].
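The SIGReg mechanics summarized above (project the embeddings onto univariate directions, then test each projection against a standard Gaussian with an Epps-Pulley-style characteristic-function statistic) can be sketched roughly as follows. The direction count, the grid of frequencies, and the uniform weighting are assumptions chosen for illustration, not the paper's settings.

```python
import numpy as np

# Sketch of the SIGReg idea as the article describes it: random 1-D
# projections of the embeddings, each scored with an Epps-Pulley-style
# statistic (distance between the empirical characteristic function and
# the standard-normal characteristic function exp(-t^2 / 2)).

def epps_pulley_stat(x, t_grid):
    # empirical characteristic function of the 1-D sample x at each t
    ecf = np.exp(1j * np.outer(t_grid, x)).mean(axis=1)
    gauss_cf = np.exp(-t_grid ** 2 / 2)      # N(0, 1) characteristic function
    return float(np.mean(np.abs(ecf - gauss_cf) ** 2))

def sigreg_sketch(z, n_dirs=64, seed=0):
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((n_dirs, z.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # unit directions
    t_grid = np.linspace(-3.0, 3.0, 31)                  # illustrative grid
    # average the 1-D statistic over the random projections
    return float(np.mean([epps_pulley_stat(z @ u, t_grid) for u in dirs]))

rng = np.random.default_rng(1)
z_gauss = rng.standard_normal((512, 16))   # near-isotropic embeddings
z_collapsed = np.ones((512, 16))           # fully collapsed embeddings
```

A collapsed embedding (all points identical) scores far higher than near-Gaussian embeddings under this statistic, which is precisely the behavior a collapse-preventing regularizer needs.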
Group 5: Experimental Validation
- Extensive experiments on large architectures, including ViT, ConvNeXt, ResNet, MaxViT, and Swin Transformer, showed that LeJEPA outperforms existing methods while keeping training simple and robust [20][23].
- On domain-specific datasets like Galaxy10 and Food101, LeJEPA pre-trained directly on the target data surpassed DINOv2-based transfer learning [24].

Group 6: JEPA Framework Evolution
- JEPA (Joint-Embedding Predictive Architecture) has evolved over the three years since LeCun introduced it, focusing on improving model expressiveness and reasoning through joint prediction [28][31].
- Unlike generative models, JEPA captures the dependencies between x and y without explicitly generating predictions of y [32].

Group 7: Future Directions
- Although LeJEPA closes out LeCun's research at Meta, it does not mark the end of JEPA's development: LeCun is reportedly raising funds to establish a startup focused on world models [71][72].
- LeCun's departure from Meta, while not entirely graceful, caps a significant period of achievement in AI research [74][79].