LeCun Announces It Himself! Meta's World Model V-JEPA 2 Debuts! Zero-Shot Control with Just 62 Hours of Robot Data!

Core Viewpoint
- Meta has launched V-JEPA 2, an advanced AI system designed to improve how machines understand, predict, and interact with the physical world, marking a significant step toward more general AI agents [3][27].

Group 1: V-JEPA 2 Overview
- V-JEPA 2 is trained on video and aims to provide deeper understanding of the physical world along with stronger predictive capabilities [3].
- The model ranks first on the Hugging Face physical reasoning leaderboard, surpassing GPT-4o [6].
- Training consists of two phases: unsupervised pre-training on more than 1 million hours of video and 1 million images, followed by action-conditioned training [9][10].

Group 2: Model Performance
- V-JEPA 2 has demonstrated strong understanding and prediction capabilities, achieving state-of-the-art results on a range of action recognition and prediction tasks [12][14].
- The model can perform zero-shot task planning, completing tasks in entirely new environments with a success rate of 65% to 80% on object manipulation [17].

Group 3: World Model Concept
- A world model lets AI predict the consequences of actions by running an internal simulation of the physical world [21].
- Meta identifies understanding, predicting, and planning as the key capabilities of an AI world model [25].

Group 4: New Benchmark Tests
- Meta has released three new benchmarks, IntPhys 2, MVPBench, and CausalVQA, to evaluate AI models' understanding of physical laws, causal relationships, and counterfactual reasoning [23].
- These benchmarks highlight the gap between human performance (85%-95% accuracy) and current AI models, including V-JEPA 2 [24].

Group 5: Future Directions
- Future work will focus on developing hierarchical world models and strengthening multimodal modeling to improve AI's understanding and predictive abilities [30].
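
To make the world-model idea from Group 3 more concrete, here is a minimal Python sketch of planning by internal simulation: sample candidate action sequences, imagine each one with a predictive model, and execute the first action of the best sequence. This is not Meta's implementation. The 2-D state, the additive `predict_next` rule, the random-sampling planner, and all function names and parameters are assumptions chosen only for illustration.

```python
import math
import random


def predict_next(state, action):
    """Hypothetical stand-in for a learned world model.

    The state is a 2-D position and the action a 2-D displacement. A real
    model would predict in a learned representation space; this additive
    rule is an assumption made purely for illustration.
    """
    return (state[0] + action[0], state[1] + action[1])


def distance(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def imagined_cost(state, actions, goal):
    """Roll an action sequence forward inside the model and sum the
    squared distance to the goal at every imagined step."""
    total = 0.0
    for action in actions:
        state = predict_next(state, action)
        total += distance(state, goal) ** 2
    return total


def plan(state, goal, rng, horizon=3, n_candidates=300, max_step=0.5):
    """Random-sampling planner: return the first action of the candidate
    sequence with the lowest imagined cost."""
    best_actions, best_cost = None, float("inf")
    for _ in range(n_candidates):
        actions = [
            (rng.uniform(-max_step, max_step), rng.uniform(-max_step, max_step))
            for _ in range(horizon)
        ]
        cost = imagined_cost(state, actions, goal)
        if cost < best_cost:
            best_actions, best_cost = actions, cost
    return best_actions[0]


def run_episode(start, goal, steps=20, seed=0):
    """Replan at every step, executing only the first planned action."""
    rng = random.Random(seed)
    state = start
    for _ in range(steps):
        action = plan(state, goal, rng)
        # Toy assumption: the environment behaves exactly like the model.
        state = predict_next(state, action)
    return state


if __name__ == "__main__":
    start, goal = (0.0, 0.0), (2.0, 1.0)
    final = run_episode(start, goal)
    print("start distance:", round(distance(start, goal), 3))
    print("final distance:", round(distance(final, goal), 3))
```

Replanning at every step and executing only the first action is a common pattern in model-based control. In this toy, the goal is a point in position space and no task-specific training is involved, which loosely mirrors the zero-shot planning described above.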