Joint Embedding Predictive Architecture (JEPA)
Yann LeCun Teams Up With Saining Xie Again, Nvidia Invests; New Company Bets on "Post-LLM" AI
36Kr · 2026-03-10 05:17
Core Insights
- AMI, founded by Turing Award winner Yann LeCun, has closed a $1.03 billion funding round at a $3.5 billion pre-money valuation, focusing on world models for AI development [1][4]
- The company aims to establish Europe as a third global AI hub alongside the US and China, with headquarters in Paris and offices in New York, Montreal, and Singapore [3][24]

Funding and Investment
- The round was led by prominent investors including KKR Innovation, Greycroft, Hiro Capital, HV Capital, and Bezos Expeditions, with strategic investors such as Nvidia, Toyota Ventures, Temasek, SoftBank, and Mark Cuban participating [4][5]
- The diverse investor base reflects global interest in AI and a push to build a European presence independent of US and Chinese influence [24]

Leadership and Team
- AMI's Chief Scientific Officer is Saining Xie, a leading expert in foundational AI research known for his work on diffusion transformers [2][13]
- The core founding team includes four members from Meta's FAIR team, indicating a strong background in AI research and development [3][18]

Technological Focus
- AMI is developing a new generation of AI systems that can understand the world, maintain long-term memory, and perform genuine reasoning and planning, moving beyond the limitations of large language models (LLMs) [8][12]
- The company's approach is based on the Joint Embedding Predictive Architecture (JEPA), which learns abstract representations of the world rather than merely processing language [10][12]

Strategic Vision
- LeCun envisions AMI as a platform that does not rely on existing US or Chinese models, aiming to create an open-source AI ecosystem that addresses sovereignty concerns in AI technology [24]
- The company is positioned to leverage its funding and expertise to focus on long-term research and product reliability in world models [2][24]
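The JEPA idea mentioned above can be made concrete with a toy sketch: rather than reconstructing raw inputs, a predictor is trained to match the embedding of a hidden target produced by a separate encoder. This is an illustrative minimal version, not AMI's or Meta's implementation; all module names and sizes are invented, and real JEPA variants typically use an exponential-moving-average target encoder rather than the frozen branch shown here.

```python
# Minimal sketch of a JEPA-style objective: predict the target's
# *embedding* from the context's embedding (regression in latent
# space), instead of reconstructing the target in input space.
# All dimensions and module choices are illustrative assumptions.
import torch
import torch.nn as nn

class ToyJEPA(nn.Module):
    def __init__(self, in_dim=128, dim=64):
        super().__init__()
        self.context_encoder = nn.Linear(in_dim, dim)  # encodes the visible context
        self.target_encoder = nn.Linear(in_dim, dim)   # encodes the hidden target
        self.predictor = nn.Sequential(                # maps context emb -> predicted target emb
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )

    def loss(self, context, target):
        z_ctx = self.context_encoder(context)
        with torch.no_grad():                          # no gradients through the target branch
            z_tgt = self.target_encoder(target)
        z_pred = self.predictor(z_ctx)
        # Latent-space regression: the core JEPA idea
        return nn.functional.mse_loss(z_pred, z_tgt)

model = ToyJEPA()
ctx = torch.randn(4, 128)
tgt = torch.randn(4, 128)
print(model.loss(ctx, tgt).item())
```

The key design point is that the loss lives in embedding space, so the model is free to discard unpredictable low-level detail and keep only abstract, predictable structure.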
LeCun's JEPA Evolves Into a Vision-Language Model: 1.6B Parameters Rival the 72B Qwen-VL
机器之心 · 2025-12-20 07:00
Core Insights
- The article discusses advances in the Joint Embedding Predictive Architecture (JEPA) with the introduction of VL-JEPA, a vision-language model developed by a collaborative team from Meta, Hong Kong University of Science and Technology, Sorbonne University, and New York University [2][3]

Group 1: Model Overview
- VL-JEPA is the first non-generative model based on the joint embedding predictive architecture that can perform general-domain vision-language tasks in real time [3]
- Unlike traditional vision-language models (VLMs), which generate tokens autoregressively, VL-JEPA predicts continuous embeddings of the target text, focusing on task-relevant semantics while ignoring superficial language variation [4][13]

Group 2: Model Efficiency
- The model turns expensive token-generation learning into more efficient latent-space semantic prediction, which simplifies the target distribution and eases learning [11][16]
- Because it is non-autoregressive, VL-JEPA can produce continuous streams of target semantic embeddings with very low latency, which is particularly beneficial for real-time applications such as action tracking and scene recognition [17]

Group 3: Performance Comparison
- In a comparative study, VL-JEPA delivered consistently higher performance on zero-shot caption generation and classification while using roughly half the trainable parameters of traditional token-generating VLMs, indicating improved learning efficiency [20]
- VL-JEPA's selective decoding strategy reduced the number of decoding operations by about 2.85x while maintaining overall output quality as measured by average CIDEr scores [22]
Group 4: Training Phases and Results
- VL-JEPA is trained in two phases. The first produces VL-JEPA_BASE, which outperformed models such as CLIP and SigLIP2 in average classification accuracy and retrieval recall across eight datasets [23][24]
- The second phase, which incorporates domain-specific training data, significantly improves classification performance, yielding VL-JEPA_SFT, which approaches the performance of specialized models [25][28]

Group 5: Application and Demonstration
- The article includes demonstrations of VL-JEPA's capabilities, such as real-time robot state tracking, showcasing its practical applications in various fields [29]