Core Viewpoint
- The article examines the disparity in success between large language models (LLMs) and video models, asking why LLMs can learn so effectively from next-token prediction while video models struggle with next-frame prediction [1][5][21].

Group 1
- AI technology is evolving rapidly, prompting deeper reflection on the limits of AI capabilities and on the similarities and differences between human brains and computers [2][3].
- Sergey Levine argues that current LLMs are merely indirect "scans" of human thought processes: they do not replicate true human cognition but mimic it through a kind of reverse engineering [5][26].
- The success of LLMs raises questions about the current direction of Artificial General Intelligence (AGI) research, suggesting its focus may need adjustment [8][10].

Group 2
- While LLMs have achieved significant success in simulating human intelligence, they still exhibit limitations that warrant fundamental questioning [17][19].
- The core algorithm of LLMs is relatively simple, essentially next-word prediction, which invites speculation about whether this simplicity reflects a universal algorithm also used by the human brain [18][24].
- Although video models have access to richer information, they have not matched the cognitive capabilities of LLMs, which can handle complex reasoning tasks that video models cannot [21][30].

Group 3
- The article posits that LLMs may not learn about the world through direct observation but rather by analyzing the human thought processes reflected in text, a form of indirect learning [26][28].
- This indirect learning allows LLMs to simulate certain cognitive functions without acquiring the underlying learning algorithms that humans use [30][32].
- The implications for AI development suggest that while LLMs can imitate human cognitive skills, they may struggle with autonomous learning from real-world experiences, highlighting a gap in achieving true adaptability [36][38].
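The "relatively simple core algorithm" described above can be illustrated with a toy sketch: next-token prediction reduced to counting which token most often follows a given context. This bigram counter is purely illustrative (the corpus and function names here are invented for the example); it stands in for the vastly larger neural predictors the article discusses, but the training signal is the same in spirit: given what came before, predict what comes next.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for each context token, how often each next token follows it.

    A deliberately tiny stand-in for the LLM objective the article
    describes: the entire "algorithm" is learning to predict what comes next.
    """
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, context):
    """Return the most frequently observed next token after `context`."""
    if context not in counts:
        return None  # unseen context: no prediction
    return counts[context].most_common(1)[0][0]

corpus = "the cat sat on the mat the cat ran".split()
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once
```

Real LLMs replace the frequency table with a transformer producing a probability distribution over the whole vocabulary, but the point the article makes survives the simplification: the objective itself is nothing more than next-token prediction.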
Are large models observing the world "from inside a cave"? A reinforcement learning heavyweight sounds the alarm over a fatal flaw of LLMs
机器之心·2025-06-10 03:58