Joint Embedding Predictive Architecture (JEPA) - filings, earnings calls, financial reports, news

机器之心· 2026-01-02 05:00

Core Viewpoint - The article discusses Next-Embedding Predictive Autoregression (NEPA), a new approach to visual pre-training that shifts the paradigm from learning representations to learning models and performs strongly on visual tasks, much as language models do on text [2][18].

Group 1: NEPA Overview
- NEPA is a minimalist approach that predicts the next patch embedding of an image, akin to how language models predict the next word [20].
- The method uses causal masking and stop-gradient to keep predictions stable without requiring complex architectures [17][25].
- NEPA is competitive on benchmarks such as ImageNet-1K, reaching Top-1 accuracy of 83.8% with ViT-B and 85.3% with ViT-L, surpassing several state-of-the-art methods [29].

Group 2: Methodology and Architecture
- The architecture uses a standard Vision Transformer (ViT) backbone with causal attention masking and directly predicts future image patch embeddings from past embeddings [22].
- Unlike pixel-level reconstruction methods, NEPA does not require a separate decoder, which simplifies the model design [22].
- Training splits each image into patches, encodes them into vectors, and predicts the next patch embedding, with stop-gradient preventing the model from "cheating" [25].

Group 3: Performance and Applications
- NEPA transfers well, achieving 48.3% and 54.0% mIoU on the ADE20K semantic segmentation task, which indicates that it learns the rich semantic features needed for dense prediction [29].
- The model can be adapted to various downstream tasks by simply changing the classification head [30].
- Visual analysis shows that NEPA learns long-range, object-centered attention patterns, ignoring background noise and focusing on semantically relevant areas [37].
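The training recipe summarized above (patch embeddings, causal masking, next-embedding prediction) can be illustrated with a small sketch in plain Python. It is not the authors' implementation: the negative cosine similarity loss is an assumption, since the summary does not name the loss, and `causal_mean_predictor` is a hypothetical stand-in for the causal ViT. Plain Python has no gradients, so the stop-gradient step appears only as a comment.

```python
import math


def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors, with a small epsilon for stability."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def causal_mean_predictor(embeddings):
    """Hypothetical stand-in for the causal Transformer.

    Position t only sees embeddings 0..t (enforced by the causal mask)
    and outputs their mean as its guess for the embedding at t + 1.
    """
    mask = causal_mask(len(embeddings))
    dim = len(embeddings[0])
    predictions = []
    for row in mask:
        visible = [e for e, allowed in zip(embeddings, row) if allowed]
        predictions.append(
            [sum(e[d] for e in visible) / len(visible) for d in range(dim)]
        )
    return predictions


def next_embedding_loss(predictions, targets):
    """Mean negative cosine similarity between prediction t and target t + 1.

    Assumption: in an autodiff framework the targets would be wrapped in a
    stop-gradient so the loss cannot be lowered by changing the targets.
    """
    pairs = list(zip(predictions[:-1], targets[1:]))
    return -sum(cosine_similarity(p, t) for p, t in pairs) / len(pairs)


# Toy patch embeddings for a three-patch "image".
patch_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
loss = next_embedding_loss(causal_mean_predictor(patch_embeddings), patch_embeddings)
print(f"next-embedding loss: {loss:.4f}")
```

The last prediction has no target and is dropped, mirroring how next-token losses shift targets by one position. A loss near -1 would mean every prediction points in the same direction as the following patch embedding.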

Turing Award winner Yann LeCun: Large models are a "dead end"; which path is he betting on next?

36Kr · 2025-11-28 01:43

Core Insights
- Yann LeCun, a Turing Award winner, announced his departure from Meta to establish a new company focused on Advanced Machine Intelligence (AMI), marking a significant shift in his career and the AI landscape [1][2].
- LeCun criticizes large language models (LLMs), labeling them a "dead end" for achieving human-like intelligence and emphasizing their lack of real-world understanding and their limitations in reasoning and action [3][4].

Group 1: Critique of Large Language Models
- LeCun argues that while LLMs perform well in language tasks, they do not possess true understanding of the world, lacking common sense and causal reasoning [5][6].
- He highlights that the performance of LLMs is reaching a saturation point, where increasing model size does not equate to enhanced intelligence [6][7].
- Training data and computational costs are approaching their limits, leading to diminishing returns in understanding [7][8].
- LLMs are described as unable to plan or take action effectively, with LeCun providing examples of how human-like intelligence involves more than language skills [12][13].

Group 2: The Concept of World Models
- LeCun proposes that the next generation of AI should focus on building "world models" that allow AI to understand and interact with the physical world [14][15].
- He introduces the Joint Embedding Predictive Architecture (JEPA) as a new learning paradigm that contrasts with LLMs by enabling AI to learn from multi-modal inputs and develop an internal representation of the world [16][17].
- JEPA emphasizes the importance of action and planning, moving beyond mere language processing to a more holistic understanding of the environment [18][19].

Group 3: Diverging Paths in AI Development
- Both LeCun and former OpenAI chief scientist Ilya Sutskever are questioning the current trajectory of AI, but they propose different solutions: LeCun focuses on world models, while Sutskever emphasizes safety and control in AI systems [25][26].
- The industry is witnessing a shift towards new architectures and approaches, as evidenced by significant investments and developments in embodied intelligence and robotics [34][35].
- The future of AI is seen as a marathon rather than a sprint, with both LeCun and Sutskever acknowledging that their proposed directions will take years to mature [38][40].

Group 4: Implications for Entrepreneurs and Developers
- LeCun's transition signals that larger models do not necessarily equate to better intelligence, highlighting the need for architectural innovation [41].
- There are opportunities in vertical applications, particularly in fields requiring physical interaction, such as robotics and autonomous driving [42].
- The importance of open-source development is emphasized, as LeCun's new company will continue to support this approach, allowing smaller teams to contribute to new paradigms [43].

Artificial Intelligence

Turing Award winner LeCun: Human intelligence is not general intelligence; the next generation of AI may be non-generative

量子位· 2025-04-14 09:09

Core Viewpoint - Human intelligence is not general intelligence; it is specialized and evolved to solve survival-related problems, which makes the term AGI (Artificial General Intelligence) misleading [2][18].

Group 1: Next Generation AI
- The next breakthrough in AI may come from non-generative models, contrary to the current focus on generative AI [3][14].
- Current AI technologies, such as large language models (LLMs), exhibit limitations in generalization and reasoning capabilities, which are essential for achieving human-like intelligence [20][21].
- To reach human-level intelligence, new technologies must be invented, as the current state of AI is far from this goal [8][10].

Group 2: AI Capabilities
- Future AI must possess several key abilities, including world modeling, reasoning, planning, and long-term memory, which are not solely reliant on language [17][22].
- The ability to understand the physical world and adapt to it is crucial for AI to function similarly to biological entities [21][23].

Group 3: Open Source Strategy
- Meta's decision to open-source the LLaMA series models is driven by ethical considerations and aims to foster innovation and participation from academia and startups [25][27].
- Open-source strategies are seen as essential for accelerating breakthroughs in AI, as no single company can monopolize all innovations [28][33].

Group 4: Future Directions
- Smart glasses are identified as an important direction for the practical application of AI technology [29].
- The future of AI assistants should focus on multi-sensory interaction, specialized virtual assistant teams, and the ability to adapt to user environments [34].

Meta Platforms(US:META)

Artificial General Intelligence (AGI)

Non-generative AI

Joint Embedding Predictive Architecture (JEPA)

Artificial Intelligence

LLaMA series models

Smart glasses

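Yann LeCun "crashes the party" at NVIDIA: he does not quite agree with Jensen Huang, says the way current large models reason is fundamentally wrong, and says tokens are not the right way to represent the physical world | GTC 2025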

AI科技大本营· 2025-03-21 06:35

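Editor | 王启隆 · Produced by | AI 科技大本营 (ID: rgznai100)

It feels like only a few days since Jensen Huang's keynote, and this year's NVIDIA GTC conference is already drawing to a close. This year Bill Dally sat down with "AI godfather" Yann LeCun, which nicely echoes the keynote. But GTC is not only about Jensen Huang and Yann LeCun; there were many other excellent talks and conversations, for example: ………… Over the coming period, CSDN AI 科技大本营 will keep publishing full write-ups of these highlights in its "GTC 2025 大师谈" column, so stay tuned. Bill Dally himself gave a talk after interviewing Yann LeCun, systematically covering the progress of NVIDIA's four major projects over the whole of 2024, with a great deal of substance; OpenAI o1 author Noam Brown held a conversation with NVIDIA's AI scientists, in which he argued that what the AI community most needs to revolutionize is its assortment of benchmarks, and that changing them would not even take much compute; 2018 Nobel laureate in Chemistry Frances Arnold took part in a fairly hardcore roundtable on AI for Science and protein engineering; UC Berkeley professor Pieter Abbeel (P ...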