Meta Releases a World Model: The Ridiculed Former King of Open Source Is Set to Strike Back

Core Viewpoint
- Meta is doubling down on its commitment to AI development, particularly through the launch of its new model V-JEPA 2, which aims to enhance AI's understanding of the physical world and its ability to perform tasks autonomously [1][2][4].

Group 1: Investment and Team Formation
- Founder Mark Zuckerberg is personally leading the formation of a "superintelligence" team, investing heavily in AI and recruiting top scientists from Google and OpenAI with nine-figure sums [2][3].
- Meta's strategy includes open-sourcing its latest model, V-JEPA 2, to further its AI capabilities [3].

Group 2: V-JEPA 2 Model Features
- V-JEPA 2 is designed to enable AI to understand the world and possess physical reasoning capabilities, allowing it to perform tasks in unfamiliar environments without extensive training [4][12].
- The model has 1.2 billion parameters and focuses on prediction rather than mere recognition, enabling it to anticipate future events based on observed data [12][13].

Group 3: Training and Capabilities
- The training process for V-JEPA 2 consists of two phases: a pre-training phase using over 1 million hours of video and 1 million images, followed by a phase incorporating 62 hours of robot data for action execution [16][20].
- V-JEPA 2 has demonstrated strong capabilities in zero-shot robot planning, successfully executing tasks like grasping and transporting objects in new environments [21][22].

Group 4: Benchmarking and Testing
- Meta has introduced three new benchmarks, IntPhys 2, Minimal Video Pairs, and CausalVQA, to evaluate the model's understanding of physical concepts and causal relationships [25][30].
- The IntPhys 2 test assesses the model's ability to identify violations of physical laws in video sequences, while Minimal Video Pairs challenges the model to discern subtle differences in similar videos [26][33].

Group 5: Future Directions
- Meta plans to develop a multi-time-scale hierarchical JEPA model to support complex tasks requiring step-by-step execution, as well as a multi-modal JEPA model that integrates various sensory inputs [40][41].
- The ultimate goal is to advance AI's understanding of causal relationships in the physical world, moving closer to achieving general action intelligence [42].
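
As a rough illustration of the predict-then-plan idea summarized under Group 2 and Group 3, the toy sketch below plans on a 2D grid: it imagines the outcome of every short action sequence with a predictor, picks the sequence whose predicted states stay closest to the goal, executes only the first action, and then replans. This is not Meta's implementation. The names `encode`, `predict`, `plan`, and `run_episode`, the action set, and the grid world are all hypothetical, and the hand-written predictor stands in for what would be a learned network in a real world model.

```python
from itertools import product

# Hypothetical discrete action set: unit moves on a 2D grid, plus "stay".
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1), (0, 0)]


def encode(observation):
    """Stand-in encoder: maps an observation to a latent state.

    A real world model would use a learned video encoder; here the
    observation is already an (x, y) position, so it passes through.
    """
    return tuple(observation)


def predict(latent, action):
    """Stand-in predictor: the latent state expected after one action."""
    return (latent[0] + action[0], latent[1] + action[1])


def distance(a, b):
    """L1 distance between two latent states."""
    return sum(abs(p - q) for p, q in zip(a, b))


def rollout_cost(latent, actions, goal_latent):
    """Imagine the action sequence and sum the distance to the goal
    after every predicted step (lower means closer, sooner)."""
    cost = 0
    for action in actions:
        latent = predict(latent, action)
        cost += distance(latent, goal_latent)
    return cost


def plan(latent, goal_latent, horizon=3):
    """Score every action sequence of length `horizon` and return the
    first action of the cheapest one."""
    best = min(
        product(ACTIONS, repeat=horizon),
        key=lambda seq: rollout_cost(latent, seq, goal_latent),
    )
    return best[0]


def run_episode(start, goal, max_steps=20):
    """Plan, execute one action, observe, and replan until the goal."""
    state = tuple(start)
    goal_latent = encode(goal)
    trajectory = [state]
    for _ in range(max_steps):
        if state == tuple(goal):
            break
        action = plan(encode(state), goal_latent)
        # Toy environment: the world moves exactly as the predictor assumes.
        state = (state[0] + action[0], state[1] + action[1])
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    print(run_episode(start=(0, 0), goal=(2, -1)))
```

The exhaustive search over action sequences only works because the action set and horizon are tiny; with continuous robot actions, a sampling-based optimizer would be the more plausible choice. The point of the sketch is only that no task-specific training happens at planning time: the goal is supplied as a target state, and the predictor is reused as-is.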