JEPA
Yann LeCun publicly tears into Meta's internal environment: "LLMs sucked all the air out of the room"; the physical world is the endgame for AGI
AI科技大本营· 2026-03-30 09:12
Core Viewpoint
- The article discusses the limitations of current AI models, particularly generative video technology, and proposes that the missing piece in AI development is a world model that can learn abstract representations and predict outcomes, with JEPA (Joint Embedding Predictive Architecture) as a potential solution [4][7][12].

Summary by Sections

AI's Missing Component
- Current AI lacks a significant component: a world model capable of learning abstract representations and supporting planning [8][9].
- The evolution of AI has seen two major revolutions, deep learning and large language models (LLMs), with the latter centered on next-token prediction [9][10].

Limitations of Generative Models
- The limitations of LLMs stem from their reliance on next-token prediction, which is ill-suited to the unpredictable nature of the real world [7][14].
- Predicting every detail in real-world data, such as video, is fundamentally flawed; the focus should instead be on learning abstract representations that can support prediction [12][13].

JEPA as a Solution
- JEPA aims to find representations that retain input information while being predictive, in contrast to traditional methods that attempt to reconstruct all details [12][13].
- The approach holds that effective modeling requires ignoring many details while retaining enough structure to support prediction [12][13].

Experience and Evidence
- Historical experiments indicate that joint embedding methods consistently outperform reconstruction methods at learning representations [16][17].
- The best way to learn representations for natural signals is therefore not reconstruction, but methods that do not attempt to reconstruct every detail [17].

Transition to AMI Labs
- Meta's shift toward short-term goals and LLMs led to the decision to leave and pursue JEPA at AMI Labs, where these ideas can be applied in areas such as industrial process control and robotics [21][22].

Future Directions
- A hierarchical JEPA model is discussed, which would allow predictions across different time and spatial scales, drawing parallels with concepts in physics [23].
- Understanding complex systems, such as economic models, may benefit from a data-driven approach similar to JEPA that focuses on higher-level abstractions [26][27].
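The contrast the article draws between reconstructing every detail and predicting in representation space can be made concrete with a toy sketch. Everything below (the frozen linear "encoder" and "predictor", the dimensions) is an illustrative assumption, not the architecture from LeCun's JEPA papers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions and frozen random "networks" -- purely illustrative.
D_IN, D_LAT = 32, 8
W_enc = rng.normal(size=(D_IN, D_LAT)) / np.sqrt(D_IN)     # "encoder"
W_pred = rng.normal(size=(D_LAT, D_LAT)) / np.sqrt(D_LAT)  # latent "predictor"

def encode(x):
    """Map raw input to an abstract representation."""
    return x @ W_enc

def jepa_loss(x_context, x_target):
    """Predict the target's *representation*, not its raw pixels."""
    z_hat = encode(x_context) @ W_pred   # predicted latent of the target
    z_tgt = encode(x_target)             # actual latent (stop-gradient in real training)
    return float(np.mean((z_hat - z_tgt) ** 2))

def reconstruction_loss(x_hat, x_target):
    """What a generative model optimizes instead: every input dimension must match."""
    return float(np.mean((x_hat - x_target) ** 2))
```

The point of the sketch is only the shape of the objective: `jepa_loss` compares 8-dimensional representations, so unpredictable input detail the encoder discards costs nothing, while `reconstruction_loss` pays for every one of the 32 input dimensions.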
What exactly is a "world model"?
虎嗅APP· 2026-03-08 03:04
Core Viewpoint
- The article discusses the concept of "world models" in AI, emphasizing their potential to enable machines to understand, predict, and interact with the world, moving toward Artificial General Intelligence (AGI) [4][6].

What is a World Model?
- The definition of a world model is still evolving, but it is rooted in the idea that humans use mental models to predict outcomes based on their understanding of the world [7][8].
- World models are essential for AI to achieve true intelligence, allowing machines to simulate and predict the consequences of their actions [10][12].
- The concept has been explored since the 1940s, with developments in AI and reinforcement learning leading to its formalization in recent years [9][17].
- A world model consists of three core components: observation of the world, prediction of future states, and learning to act within an internal representation of the world [18][24].

Why Study World Models?
- World models differ from large language models (LLMs) in their objectives, training data, and outputs, focusing on dynamic understanding of and interaction with the environment [28][30].
- The limitations of LLMs have prompted renewed interest in world models, which are seen as a necessary step toward AGI [32][40].
- The emergence of multi-modal technologies has made it feasible to train effective world models, which require vast amounts of visual and action data [44][46].

Current Approaches to World Models
- The industry is exploring approaches that can be categorized into three layers: foundational theories, representation forms, and training objectives [49][50].
- World generation is crucial, as it lays the groundwork for understanding how the world evolves over time and how AI can interact with it [54][56].
- The two main technical routes for world generation are video generation and 3D spatial generation, each with its own advantages and challenges [56][70].

Impact on Key Industries
- The robotics industry stands to benefit significantly, as world models can enable robots to understand and predict their environment, enhancing their adaptability and functionality [106][109].
- In autonomous driving, world models can improve a system's ability to predict future scenarios, addressing current limitations in perception and decision-making [110][113].
- Wearable devices can evolve from simple data recorders into intelligent companions that understand and interact with the user's environment, fundamentally changing human-device relationships [114][116].
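The three core components named above (observe, predict, act within an internal representation) can be sketched as a minimal interface. The linear dynamics and goal-seeking action selection are hypothetical stand-ins for a learned model, chosen only to make the loop runnable:

```python
import numpy as np

class LinearWorldModel:
    """Toy world model with the three components from the article:
    observe the world, predict future states, act in an internal
    representation. All structure here is illustrative."""

    def __init__(self, A, B):
        self.A = np.asarray(A, dtype=float)  # latent state transition
        self.B = np.asarray(B, dtype=float)  # effect of an action on the latent state

    def observe(self, obs):
        # identity "encoder": a real system would map raw sensors to a latent state
        return np.asarray(obs, dtype=float)

    def predict(self, state, action):
        # imagined next internal state, without touching the real world
        return self.A @ state + self.B @ np.asarray(action, dtype=float)

    def act(self, state, candidate_actions, goal):
        # choose the action whose *predicted* outcome lands closest to the goal
        return min(candidate_actions,
                   key=lambda a: np.linalg.norm(self.predict(state, a) - goal))
```

For a 2-D identity system starting at the origin with goal `[1, 0]`, `act` picks the candidate `[1, 0]`, because the model's one-step prediction for that action lands exactly on the goal.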
Counting down three weeks to his departure, LeCun's final warning: Silicon Valley has fallen into a collective delusion
36Kr · 2025-12-16 07:11
Core Viewpoint
- LeCun criticizes Silicon Valley's obsession with large language models (LLMs), asserting that this approach is a dead end and will not lead to artificial general intelligence (AGI) [1][3][26].

Group 1: Critique of Current AI Approaches
- LeCun argues that the current trend of stacking LLMs and relying on extensive synthetic data is misguided and ineffective for achieving true intelligence [1][3][26].
- He emphasizes that the real challenge in AI is not achieving human-like intelligence but understanding basic intelligence, of the kind demonstrated by cats and young children [3][12].
- The focus on LLMs is seen as a dangerous "herd mentality" in the industry, with major companies such as OpenAI, Google, and Meta all pursuing similar strategies [26][30].

Group 2: Introduction of World Models
- LeCun advocates a different approach, "world models," which make predictions in an abstract representation space rather than relying solely on pixel-level outputs [3][14].
- He believes world models can effectively handle the high-dimensional, continuous, and noisy data that LLMs struggle with [14][12].
- World models are tied to the idea of planning: the system predicts the outcomes of actions in order to optimize task completion [14][12].

Group 3: Future Directions and Company Formation
- LeCun plans to establish a new company, Advanced Machine Intelligence (AMI), focused on world models and maintaining an open research tradition [4][5][30].
- AMI aims not only to conduct research but also to develop practical products related to world models and planning [9][30].
- The company will be global, headquartered in Paris with offices in other locations, including New York [30].

Group 4: Perspectives on AGI and the AI Development Timeline
- LeCun dismisses the concept of AGI as meaningless, arguing that human intelligence is highly specialized and cannot be replicated in a single model [31][36].
- He predicts that significant advances could occur within 5-10 years, potentially reaching intelligence comparable to a dog's, but acknowledges that unforeseen obstacles may extend this timeline [31][33].

Group 5: Advice for Future AI Professionals
- LeCun advises against pursuing computer science as a primary focus, suggesting instead subjects with long-lasting relevance, such as mathematics, engineering, and physics [45][46].
- He emphasizes learning how to learn and adapting to rapid technological change in the AI field [45][46].
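The planning idea in Group 2, predicting the outcomes of actions to optimize task completion, can be sketched as random-shooting model-predictive control. The `dynamics` function below is a hypothetical stand-in for a trained world model, not anything from LeCun's actual systems:

```python
import numpy as np

rng = np.random.default_rng(0)

def dynamics(state, action):
    # pretend learned transition: the world model's "imagination" step
    return state + 0.1 * action

def plan(state, goal, horizon=5, n_candidates=256):
    """Random shooting: sample action sequences, roll each one out in
    imagination, and return the first action of the best sequence."""
    best_seq, best_cost = None, np.inf
    for _ in range(n_candidates):
        seq = rng.uniform(-1.0, 1.0, size=(horizon, state.shape[0]))
        s = state
        for a in seq:
            s = dynamics(s, a)                   # predicted outcome, no real-world step
        cost = float(np.linalg.norm(s - goal))   # how far the imagined future is from the goal
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq[0]  # execute one action, then replan (MPC style)
```

The loop never acts in the real environment while searching; every candidate future is evaluated inside the model, which is precisely the role the article assigns to a world model.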
The world model dispute between Fei-Fei Li and LeCun
具身智能之心· 2025-11-15 16:03
Core Viewpoint
- The article discusses the competition among three major players in the AI industry, Fei-Fei Li, LeCun, and Google, over the development of world models, highlighting their distinct technological approaches and implications for artificial general intelligence (AGI) [2][22][39].

Group 1: Fei-Fei Li's Marble
- Fei-Fei Li's company, World Labs, has launched its first commercial world model, Marble, considered to have significant commercial potential due to its ability to generate persistent, downloadable 3D environments [5][21].
- Marble features a native AI world editor called Chisel, allowing users to create and modify worlds with simple prompts, which is particularly beneficial for VR and game developers [7][9].
- However, some experts argue that Marble resembles a 3D rendering model rather than a true world model, as it focuses on visual representation without incorporating the underlying physical laws necessary for robotic training [10][20].

Group 2: LeCun's JEPA
- LeCun's approach to world models, exemplified by JEPA, emphasizes control theory and cognitive science rather than 3D graphics, focusing on abstract representations that enable robots to predict changes in the environment [22][25].
- JEPA is designed to train robots by capturing essential world states without generating visually appealing images, making it better suited to robotic training [27][29].
- This model contrasts sharply with Marble, as it prioritizes understanding the structure of the world over visual fidelity [39].

Group 3: Google's Genie 3
- Google DeepMind's Genie 3, launched in August, generates interactive video environments from prompts, showcasing improvements in long-term consistency and event triggering [31][34].
- Despite its advances, Genie 3 remains fundamentally a video-logic model, lacking the deep understanding of physical laws that LeCun's JEPA targets [35][36].
- Genie 3's visual quality and resolution are also limited compared with Marble, which offers high-precision, exportable 3D assets [38].

Group 4: Comparative Analysis
- The three world models represent different paradigms: Marble focuses on visual representation, Genie 3 on dynamic video generation, and JEPA on understanding the underlying structure of the world [39].
- This creates a "world model pyramid," in which models become more abstract and better aligned with AI's cognitive processes as one moves up the hierarchy [47][48].
The world model dispute between Fei-Fei Li and LeCun
量子位· 2025-11-15 05:00
Core Viewpoint
- The article discusses the competition among three major players in the AI industry, Li Feifei (Fei-Fei Li), Yann LeCun, and Google, over the development of world models, highlighting their distinct technological approaches and implications for artificial general intelligence (AGI) [1][3][42].

Group 1: Li Feifei and Marble
- Li Feifei's company, World Labs, has launched its first commercial world model, Marble, which is seen as having significant commercial potential due to its ability to generate persistent, downloadable 3D environments [2][5].
- Marble features a native AI world editor called Chisel, allowing users to create and modify worlds with simple prompts, which is particularly beneficial for VR and game developers [7][9].
- However, some experts argue that Marble resembles a 3D rendering model rather than a true world model, as it focuses on visual representation without incorporating the underlying physical laws necessary for robotic training [10][18][20].

Group 2: Yann LeCun and JEPA
- LeCun's approach to world models, exemplified by JEPA, emphasizes control theory and cognitive science rather than 3D graphics, aiming to enable robots to predict changes in the environment without needing to generate visually appealing images [24][26].
- JEPA focuses on capturing abstract representations of the world that are essential for AI decision-making, making it better suited to training robots [28][30].

Group 3: Google and Genie 3
- Google DeepMind's Genie 3, launched in August, allows users to generate interactive video environments with a single prompt, addressing long-term consistency issues in generated worlds [32][35].
- Despite its dynamic capabilities, Genie 3 is still fundamentally a video-logic model and lacks the deeper understanding of physical laws that JEPA targets, making it less effective for robotic training [38][40].

Group 4: World Model Pyramid
- The article categorizes the three world models into a pyramid: Marble as the interface, Genie 3 as the simulator, and JEPA as the cognitive framework, illustrating their varying levels of abstraction and suitability for AI training [53][54].
- As one moves up the pyramid, the models become more abstract and aligned with AI's cognitive processes, while those at the bottom are more visually appealing but harder for robots to comprehend [54].
LeCun's last paper at Meta
36Kr · 2025-11-14 03:04
Core Insights
- The article discusses Yann LeCun's recent paper on a self-supervised learning method called LeJEPA, seen as his farewell work at Meta as he departs the company [1][33].
- LeJEPA introduces a new framework that enhances predictive performance by ensuring the embedding space follows a specific statistical distribution [2].

Group 1: LeJEPA Framework
- LeJEPA is based on isotropic Gaussian embeddings and addresses the representation-collapse issue in traditional JEPA frameworks, significantly improving model generalization [1][5].
- The framework utilizes Sketched Isotropic Gaussian Regularization (SIGReg) to achieve distribution matching, transforming the problem into a statistical hypothesis test [6][11].

Group 2: Experimental Validation
- Extensive experiments were conducted on large architectures such as ViT, ConvNeXt, and ResNet, with models approaching 1 billion parameters [8].
- Results indicate that LeJEPA outperforms existing methods while maintaining training simplicity and robustness, particularly on domain-specific datasets like Galaxy10 and Food101 [10].

Group 3: Statistical Insights
- The research highlights that an isotropic Gaussian distribution minimizes bias and variance during training, enhancing stability and accuracy on downstream tasks [3][5].
- Non-isotropic distributions lead to higher bias and variance, confirming the superiority of the isotropic Gaussian across various experiments [3].

Group 4: Future Directions
- Despite LeCun's departure from Meta, he is reportedly raising funds to establish a startup focused on advancing his work on world models, indicating ongoing contributions to the AI field [33][34].
LeCun's last paper at Meta? And as co-first author. LeJEPA: completing the theoretical puzzle of JEPAs
机器之心· 2025-11-14 01:33
Core Viewpoint
- The article discusses the development of LeJEPA, a new self-supervised learning framework that addresses the limitations of existing Joint Embedding Predictive Architectures (JEPAs) by providing a solid theoretical foundation and eliminating reliance on heuristic methods [4][5][8].

Group 1: Theoretical Foundation
- The research team established that the optimal embedding distribution for JEPAs is an isotropic Gaussian, which minimizes downstream prediction risk across various tasks [5].
- A novel distribution-matching objective, Sketched Isotropic Gaussian Regularization (SIGReg), was introduced to efficiently push the embeddings toward the ideal isotropic Gaussian distribution [6][8].
- LeJEPA combines the predictive objective of JEPA with SIGReg, yielding a statistically optimal solution that mitigates representation collapse [8][9].

Group 2: Practical Implementation
- LeJEPA is simple, robust, and high-performing thanks to its principled theoretical design, which eliminates the need for complex heuristics like gradient stopping and teacher-student networks [9][11].
- The implementation requires only about 50 lines of PyTorch code, making it user-friendly and easy to deploy [11][19].

Group 3: Experimental Validation
- LeJEPA was tested across more than 10 datasets and 60 architectures, matching or surpassing state-of-the-art results, such as 79% accuracy on ImageNet-1K with ViT-H/14 [10].
- The framework outperformed DINOv2-based transfer learning on domain-specific datasets, indicating its suitability for in-domain pre-training [10][33].

Group 4: Stability and Scalability
- LeJEPA remains stable across different hyperparameters and architectures, with the recommended settings yielding competitive performance even at small batch sizes [24][26].
- The design is architecture-agnostic, allowing high-quality representations to be learned across various model types [26][27].

Group 5: Semantic Structure Emergence
- LeJEPA's self-supervised training produced emergent semantic structure without explicit supervision, as evidenced by attention patterns that correspond to object boundaries and salient regions [41][43].
- The attention maps demonstrated temporal consistency, enabling unsupervised video segmentation and indicating that the learned features capture both spatial semantics and temporal structure [43].
LeCun's last paper at Meta
量子位· 2025-11-13 11:52
Core Insights
- The article discusses the introduction of LeJEPA, a self-supervised learning method developed by Yann LeCun, marking his farewell from Meta [2][3][4].
- LeJEPA aims to address the representation-collapse issue in traditional JEPA frameworks by utilizing isotropic Gaussian embeddings and introducing SIGReg regularization to enhance model generalization [5][6].

Group 1: LeJEPA Overview
- LeJEPA is based on isotropic Gaussian embeddings, which effectively mitigate the representation-collapse problem and significantly improve generalization [5].
- The traditional JEPA framework often encounters representation collapse, where the model maps all inputs to a single point and fails to capture semantic differences [6].

Group 2: Impact of Embedding Distribution
- The study analyzed the impact of the embedding distribution on bias and variance through ordinary least squares regression, revealing that an isotropic Gaussian minimizes both during training [8][9].
- An isotropic Gaussian ensures lower bias and variance than non-isotropic distributions, enhancing stability and accuracy on downstream tasks [9][11][13].

Group 3: SIGReg Regularization
- SIGReg (Sketched Isotropic Gaussian Regularization) achieves distribution matching by transforming the problem into a hypothesis-testing framework [15][17].
- It employs a combination of univariate directional tests and Epps-Pulley tests to assess how well the embedding distribution matches the target isotropic Gaussian [16][17].

Group 4: High-Dimensional Challenges
- SIGReg addresses the computational challenges of high-dimensional spaces by combining the SIGReg and predictive losses, keeping training efficient and stable with mini-batches [19][21].
- The total LeJEPA loss is a weighted sum of the SIGReg loss and the predictive loss, with a hyperparameter λ balancing their contributions [22].

Group 5: Experimental Validation
- Extensive experiments on large architectures, including ViT, ConvNeXt, ResNet, MaxViT, and Swin Transformer, demonstrated that LeJEPA outperforms existing methods while maintaining training simplicity and robustness [20][23].
- On domain-specific datasets like Galaxy10 and Food101, LeJEPA surpassed DINOv2-based transfer learning when pre-trained directly on the target data [24].

Group 6: JEPA Framework Evolution
- JEPA (Joint-Embedding Predictive Architecture) has evolved over the three years since LeCun introduced it, focusing on enhancing model expressiveness and reasoning through joint prediction methods [31][28].
- Unlike generative models, JEPA captures the dependencies between x and y without explicitly generating predictions of y [32].

Group 7: Future Directions
- Although LeJEPA closes out LeCun's research at Meta, it does not mark the end of JEPA's development, as LeCun is reportedly raising funds to establish a startup focused on world models [72][71].
- LeCun's departure from Meta, while not entirely graceful, caps a significant period of achievement in AI research, contributing to the field's advancement [74][79].
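The λ-weighted objective described in Group 4 can be sketched numerically. One loud simplification: the moment-matching check below stands in for SIGReg's actual sketched hypothesis tests (random 1-D projections scored with an Epps-Pulley statistic), so treat this as an illustration of the loss's shape, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigreg_surrogate(z, n_directions=16):
    """Penalize departure of embeddings z (n_samples, dim) from an isotropic
    standard Gaussian, checked on random 1-D projections. Moment matching is
    a stand-in for the paper's Epps-Pulley test."""
    d = z.shape[1]
    penalties = []
    for _ in range(n_directions):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)           # random unit direction
        p = z @ u                        # embeddings projected to 1-D
        # an isotropic N(0, I) has mean 0 and variance 1 along every direction
        penalties.append(p.mean() ** 2 + (p.var() - 1.0) ** 2)
    return float(np.mean(penalties))

def lejepa_loss(z_pred, z_target, lam=0.05):
    """Total loss = predictive loss + lambda * distribution regularizer,
    matching the weighted-sum structure described in the article."""
    predictive = float(np.mean((z_pred - z_target) ** 2))
    return predictive + lam * sigreg_surrogate(z_pred)
```

A collapsed embedding (every input mapped to the origin) has zero variance along every direction, so the surrogate penalty is exactly 1.0; this is the mechanism by which the regularizer blocks the representation-collapse failure mode described above.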