Core Insights
- Meta is actively pursuing advances in artificial intelligence, notably by forming a "Super Intelligence Team" and releasing the V-JEPA 2 model, which is trained on video to build world-modeling and predictive capabilities [2][3][4].

Group 1: Meta's AI Developments
- Meta is assembling a "Super Intelligence Team" led by Mark Zuckerberg, offering nine-figure compensation to attract talent for the development of artificial general intelligence [3].
- The newly launched V-JEPA 2 model is designed to improve environmental understanding and prediction, enabling zero-shot planning and robot control in unfamiliar environments [4][5].
- Yann LeCun, Meta's Chief AI Scientist, emphasizes that world models let AI understand and predict physical interactions without extensive trial and error, with significant implications for applications such as assistive technologies and personalized education [6].

Group 2: V-JEPA 2 Model Specifications
- V-JEPA 2 has 1.2 billion parameters and is built on the Joint Embedding Predictive Architecture (JEPA), which has shown strong performance on images and 3D point clouds [8].
- The model improves on its predecessor, V-JEPA, with stronger action prediction and world modeling, allowing robots to interact with unfamiliar objects and environments [9].
- V-JEPA 2 delivers superior results across a range of tasks, achieving 100% on planning and robot control tasks and markedly improving on action-anticipation and understanding benchmarks relative to previous models [12].

Group 3: Training and Performance
- V-JEPA 2 is trained in two phases: a pre-training phase using over 1 million hours of video and 1 million images, followed by action-conditioned training on minimal robot data [21][25].
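The two-phase recipe above can be illustrated with a minimal, self-contained sketch. This is not Meta's code: the scalar "latent state", the linear drift and action-gain dynamics, and all function names are illustrative assumptions. Phase 1 fits a predictor of the next latent state from passively observed sequences alone; phase 2 then fits a small action-conditioned term using a limited set of state-action-next-state transitions, mirroring the idea of pre-training on video before action-conditioned training on little robot data.

```python
import random

random.seed(0)

# Toy 1-D "world": the latent state drifts by a fixed amount each step,
# and an action (phase 2 only) displaces it by TRUE_GAIN * action.
DRIFT = 0.5
TRUE_GAIN = 2.0

def passive_rollout(steps, start):
    """A latent trajectory observed without any actions (like raw video)."""
    traj = [start]
    for _ in range(steps):
        traj.append(traj[-1] + DRIFT)
    return traj

def pretrain_predictor(num_clips=200, lr=0.05, epochs=20):
    """Phase 1: learn to predict the next latent state from the current one."""
    w = 0.0  # learned per-step drift
    clips = [passive_rollout(5, random.uniform(-1, 1)) for _ in range(num_clips)]
    for _ in range(epochs):
        for traj in clips:
            for s, s_next in zip(traj, traj[1:]):
                pred = s + w
                w -= lr * (pred - s_next)  # gradient step on squared error
    return w

def fit_action_head(w, num_transitions=100, lr=0.05, epochs=50):
    """Phase 2: learn how an action displaces the state, from few transitions."""
    g = 0.0  # learned action gain
    data = []
    for _ in range(num_transitions):
        s = random.uniform(-1, 1)
        a = random.uniform(-1, 1)
        data.append((s, a, s + DRIFT + TRUE_GAIN * a))
    for _ in range(epochs):
        for s, a, s_next in data:
            pred = s + w + g * a
            g -= lr * (pred - s_next) * a
    return g

w = pretrain_predictor()
g = fit_action_head(w)
# Both learned parameters should end up close to the true drift and gain.
print(round(w, 2), round(g, 2))
```

The point of the split is the same as in the article: most of the model's knowledge of how the world evolves comes from cheap passive data, so only a small action-dependent component must be learned from expensive robot interaction.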
- The model's ability to predict world states and plan actions is demonstrated on tasks such as grasping and placing objects, achieving success rates of 65% to 80% in new environments [26].
- Meta has also introduced new benchmarks for evaluating models' understanding of physical interactions; V-JEPA 2 ranks first on physical reasoning, but a significant gap to human performance remains [28][34].

Group 4: Future Directions
- Meta plans to explore hierarchical JEPA models that can learn and plan across multiple temporal and spatial scales, as well as multimodal models that integrate additional sensory inputs for stronger predictive capabilities [36].
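Planning with a world model of this kind typically means imagining candidate action sequences inside the model and executing the one whose predicted outcome best matches the goal. The sketch below shows one standard technique for such model-based planners, the cross-entropy method; whether V-JEPA 2 uses exactly this optimizer is not stated in the text above, and the scalar state and linear `world_model` here are toy stand-ins for the model's learned video-embedding predictor.

```python
import random
import statistics

random.seed(0)

# Assumed toy latent dynamics standing in for a learned predictor:
# next_state = state + GAIN * action.
GAIN = 2.0

def world_model(state, action):
    return state + GAIN * action

def cem_plan(state, goal, horizon=4, iters=6, pop=200, n_elite=20):
    """Cross-entropy-method planner: sample action sequences, roll each
    through the world model, and refit the sampling distribution to the
    sequences whose predicted end state lies closest to the goal."""
    mu = [0.0] * horizon
    sigma = [1.0] * horizon

    def cost(seq):
        s = state
        for a in seq:
            s = world_model(s, a)  # imagined rollout, no real interaction
        return abs(s - goal)

    for _ in range(iters):
        seqs = [[random.gauss(mu[t], sigma[t]) for t in range(horizon)]
                for _ in range(pop)]
        elites = sorted(seqs, key=cost)[:n_elite]
        for t in range(horizon):
            vals = [e[t] for e in elites]
            mu[t] = statistics.mean(vals)
            sigma[t] = statistics.pstdev(vals) + 1e-3
    return mu

# Plan a 4-step action sequence meant to carry the latent state from
# 0.0 to the goal value 5.0, then execute it in the same model.
plan = cem_plan(0.0, 5.0)
state = 0.0
for action in plan:
    state = world_model(state, action)
```

Because all trial and error happens in the model's imagination, the robot only ever executes the final, already-vetted sequence, which is what makes zero-shot control in unfamiliar environments plausible.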
Just now, LeCun himself appeared on camera as Meta unveiled its new world model!