Core Viewpoint
The article discusses the limitations of current AI models, particularly generative video models. It argues that the missing piece in AI development is a world model that learns abstract representations and predicts outcomes, and it proposes JEPA (Joint Embedding Predictive Architecture) as a potential solution [4][7][12].

Summary by Sections

AI's Missing Component
- Current AI lacks a major component: a world model capable of learning abstract representations and supporting planning [8][9].
- AI has gone through two major revolutions so far, deep learning and large language models (LLMs), the latter built on next-token prediction [9][10].

Limitations of Generative Models
- The limitations of LLMs stem from their reliance on next-token prediction, which is ill-suited to the unpredictable nature of the real world [7][14].
- Trying to predict every detail of real-world data such as video is fundamentally flawed; models should instead learn abstract representations in which predictions can be made [12][13].

JEPA as a Solution
- JEPA aims to find representations that retain information about the input while also being predictive, in contrast to traditional methods that try to reconstruct every detail [12][13].
- Effective modeling requires ignoring many details while retaining enough structure to make predictions [12][13].

Experience and Evidence
- Past experiments show that joint embedding methods consistently outperform reconstruction methods for learning representations [16][17].
- The best way to learn representations of natural signals is therefore not reconstruction, but methods that do not attempt to reconstruct every detail [17].
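The contrast between reconstructing every detail and predicting in representation space can be illustrated with a deliberately tiny toy. This is a hand-built sketch under stated assumptions, not JEPA as described in the article: the data generator, the averaging encoder, and the predictors (`make_frames`, `mean_encoder`, `collapsed_encoder`, `drift_predictor`, `identity_predictor`) are all hypothetical, and nothing is learned. Each "frame" is a slowly drifting state plus per-pixel noise that no model can predict.

```python
import random
import statistics

DRIFT = 0.1  # hypothetical: the predictable state advances by this much per frame


def make_frames(n_steps=50, n_pixels=1000, noise=1.0, seed=0):
    """Toy 'video' (hypothetical data): a drifting scalar state copied to every
    pixel, plus independent per-pixel Gaussian noise that is unpredictable."""
    rng = random.Random(seed)
    frames, state = [], 0.0
    for _ in range(n_steps):
        state += DRIFT  # the structured, predictable part of the dynamics
        frames.append([state + rng.gauss(0.0, noise) for _ in range(n_pixels)])
    return frames


# Hand-built stand-ins for learned networks (assumptions, not the article's method).
def mean_encoder(frame):
    # Averaging pixels discards most per-pixel noise but keeps the drifting state.
    return statistics.fmean(frame)


def collapsed_encoder(frame):
    # Degenerate encoder: maps every frame to the same point, keeping no information.
    return 0.0


def drift_predictor(z):
    # Assumes the true latent dynamics (advance by DRIFT) are already known.
    return z + DRIFT


def identity_predictor(z):
    # Trivial predictor; paired with a collapsed encoder it is never wrong.
    return z


def pixel_prediction_mse(frames):
    """Generative-style objective: predict every pixel of the next frame."""
    errors = []
    for cur, nxt in zip(frames, frames[1:]):
        guess = drift_predictor(mean_encoder(cur))  # best available guess per pixel
        errors.extend((p - guess) ** 2 for p in nxt)
    return statistics.fmean(errors)


def latent_prediction_mse(frames, encoder, predictor):
    """Joint-embedding-style objective: predict the next frame's embedding only."""
    zs = [encoder(f) for f in frames]
    return statistics.fmean((predictor(a) - b) ** 2 for a, b in zip(zs, zs[1:]))


def embedding_variance(frames, encoder):
    # Spread of embeddings over time; 0 means the encoder retained no information.
    return statistics.pvariance([encoder(f) for f in frames])


frames = make_frames()
print(f"pixel-space prediction MSE:   {pixel_prediction_mse(frames):.3f}")
print(f"latent prediction MSE:        "
      f"{latent_prediction_mse(frames, mean_encoder, drift_predictor):.4f}")
print(f"collapsed latent MSE:         "
      f"{latent_prediction_mse(frames, collapsed_encoder, identity_predictor):.4f}")
print(f"collapsed embedding variance: {embedding_variance(frames, collapsed_encoder):.4f}")
```

In this toy, pixel-space error stays near the noise variance (roughly 1.0) no matter how good the predictor is, while the latent error is roughly 2/n_pixels, because the encoder has already thrown away the unpredictable detail. The collapsed encoder reaches zero latent error only by retaining nothing (zero embedding variance), which is why a joint-embedding objective must also keep representations informative; practical joint-embedding methods add a dedicated term or training mechanism for this, which the sketch does not model.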
Transition to AMI Labs
- Meta's shift in focus toward short-term goals and LLMs led Yann LeCun to leave and pursue JEPA at AMI Labs, where these ideas can be applied in areas such as industrial process control and robotics [21][22].

Future Directions
- A hierarchical JEPA is discussed, which would make predictions across different time and spatial scales, drawing parallels with concepts in physics [23].
- The article suggests that understanding complex systems, such as economic models, may benefit from a data-driven, JEPA-like approach that focuses on higher-level abstractions [26][27].
Yann LeCun publicly "rips apart" Meta's internal environment: "LLMs sucked the air out of the room," and the physical world is the endgame for AGI