World Model
AI大家说 (AI Talk) | Harvard & MIT: AI Can Predict, but It Still Can't Explain the "Why"
红杉汇 · 2025-10-22 00:06
Core Insights
- The article discusses an experiment by Harvard and MIT researchers that asks whether large language models (LLMs) can learn a "world model" or merely predict the next word from probabilities [3][4][5].
- The experiment used orbital mechanics as its testing ground, asking whether a model that predicts planetary motion has also derived Newton's laws [4][5].
- The models predicted planetary trajectories accurately but did not encode the underlying physical laws, which points to a gap between prediction and explanation [6][10].

Group 1: Experiment Design and Findings
- The research team trained a small Transformer model on 10 million simulated solar-system coordinate sequences, totaling 20 billion tokens, to assess whether it would use Newton's laws to predict planetary movements [8].
- The model generated precise trajectory predictions, but it relied on situation-specific heuristics rather than the fundamental laws of physics [10][11].
- Its predictions did not generalize to scenarios outside the training data, indicating that it lacked a stable world model [10][11].

Group 2: Implications for AI Development
- The research raises questions about the fundamental limitations of AI models, particularly their ability to construct the coherent world model that scientific discovery requires [11][12].
- The article suggests that LLMs are not useless, but that they are currently insufficient for achieving scientific breakthroughs [13].
- Future AI development may require a combination of larger models and new methodologies to improve both understanding and prediction [13][14].

Group 3: Philosophical Considerations
- The article reflects on a classic scientific debate: whether the essence of science lies in precise prediction or in understanding why phenomena occur [14].
- It emphasizes the importance of developing AI that can not only predict but also comprehend the logic of the world, which will determine its ultimate impact on scientific history [14].
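To make the orbital-mechanics setup concrete, here is a minimal sketch of how coordinate sequences can be generated from Newton's law of gravitation, and how the force law can be read back out of the coordinates alone. It is not the researchers' data pipeline. The units (`G_M = 1.0`), the velocity Verlet integrator, the step size, and the helper names `simulate_orbit` and `implied_gm` are all assumptions made for illustration.

```python
import math

G_M = 1.0  # assumed gravitational parameter G*M, in arbitrary units


def acceleration(x, y):
    """Newtonian acceleration toward a central mass fixed at the origin."""
    r3 = (x * x + y * y) ** 1.5
    return -G_M * x / r3, -G_M * y / r3


def simulate_orbit(x, y, vx, vy, dt=0.001, steps=10000):
    """Integrate one planet with velocity Verlet; return its (x, y) positions."""
    trajectory = [(x, y)]
    ax, ay = acceleration(x, y)
    for _ in range(steps):
        # Half-step velocity update, then full-step position update.
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
        x += dt * vx
        y += dt * vy
        # Recompute the force at the new position and finish the velocity update.
        ax, ay = acceleration(x, y)
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
        trajectory.append((x, y))
    return trajectory


def implied_gm(trajectory, dt, i):
    """Estimate G*M from three consecutive positions (second finite difference)."""
    (x0, y0), (x1, y1), (x2, y2) = trajectory[i - 1], trajectory[i], trajectory[i + 1]
    ax = (x0 - 2 * x1 + x2) / dt**2
    ay = (y0 - 2 * y1 + y2) / dt**2
    r = math.hypot(x1, y1)
    # Under an inverse-square law, |a| * r^2 equals G*M at every point.
    return math.hypot(ax, ay) * r * r


if __name__ == "__main__":
    # A circular orbit: unit radius, unit tangential speed.
    orbit = simulate_orbit(1.0, 0.0, 0.0, 1.0, dt=0.001, steps=2000)
    print(implied_gm(orbit, 0.001, 1000))
```

For this circular orbit the printed estimate should be very close to the assumed `G_M`. In this toy setting the law is recoverable from the coordinates themselves, so the open question the article describes is whether a sequence model trained on such data ends up representing the law internally or only fits the trajectories.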
World Models: Can Machines Understand Reality?
36Kr · 2025-10-20 13:01
Core Concept
- The article discusses the concept of "world models" in artificial intelligence (AI), which are internal representations of the environment that AI systems use to evaluate predictions and decisions before executing tasks [1][4].

Group 1: Definition and Importance of World Models
- World models are considered essential for building intelligent, scientific, and safe AI systems, as emphasized by leading figures in deep learning [1].
- The idea of a world model has historical roots, dating back to Kenneth Craik's 1943 proposal of a "small-scale model" in the brain that allows organisms to simulate various scenarios [2].

Group 2: Historical Context and Evolution
- Early AI systems like SHRDLU demonstrated the use of world models but struggled with scalability and complexity in real-world environments [3].
- The rise of machine learning and deep learning has revitalized the concept of world models, allowing AI to build internal approximations of environments through trial and error [3].

Group 3: Current Challenges and Perspectives
- Despite the potential of world models, there is still a lack of consensus among researchers regarding their definition, content, and verification methods [2].
- Current generative AI models, such as large language models (LLMs), exhibit heuristic rules but lack a coherent and unified world model, leading to inconsistencies in their outputs [4][6].

Group 4: Future Directions and Research Focus
- Researchers are exploring how to develop robust and verifiable world models, which could enhance AI's reliability and interpretability [6][7].
- There are differing opinions on how to create these models, with some suggesting that sufficient multimodal training data could naturally lead to their emergence, while others advocate for entirely new architectures [7].
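The idea of evaluating decisions against an internal model before acting can be sketched in a few lines. The example below is a toy illustration, not any system described in the article. The corridor environment, the `ACTIONS` set, and the function names `world_model` and `plan` are all hypothetical, and the model's dynamics are hand-written rather than learned.

```python
from itertools import product

ACTIONS = (-1, 0, 1)  # hypothetical moves: step left, stay, step right


def world_model(state, action):
    """Internal prediction of the next state (assumed: a walled corridor, cells 0..9)."""
    return max(0, min(9, state + action))


def plan(state, goal, horizon=3):
    """Imagine every action sequence of length `horizon`; return the best first action."""
    best_action, best_cost = 0, float("inf")
    for sequence in product(ACTIONS, repeat=horizon):
        imagined, cost = state, 0
        for action in sequence:
            imagined = world_model(imagined, action)  # simulate only, nothing is executed
            cost += abs(goal - imagined)  # accumulate distance to the goal along the path
        if cost < best_cost:
            best_action, best_cost = sequence[0], cost
    return best_action


if __name__ == "__main__":
    position, goal = 2, 7
    while position != goal:
        action = plan(position, goal)
        position = world_model(position, action)  # here the real world matches the model
        print(position)
```

The planner is only as good as `world_model`. If the internal dynamics were an incoherent patchwork of heuristics, the imagined rollouts would diverge from reality and the chosen actions would degrade, which is the practical concern behind calls for robust and verifiable world models.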