What exactly did Fei-Fei Li say a year ago, and why is it trending again?
量子位·2025-09-11 01:58

Core Viewpoint
- The limitations of large language models (LLMs) in understanding the physical world are highlighted, emphasizing that language is a generated signal dependent on human input, while the physical world is an objective reality governed by its own laws [1][5][19].

Group 1: Language Models and Their Limitations
- Language models operate on a one-dimensional representation of discrete tokens, making them adept at handling written text but inadequate for representing the three-dimensional nature of the physical world [12][14].
- The challenge of spatial intelligence lies in extracting, representing, and generating information from the real world, which is fundamentally different from language processing [17][19].
- Experiments show that LLMs struggle with physical tasks, performing poorly compared to human children and specialized robots [22][28].

Group 2: Experimental Findings
- In a test using the Animal-AI environment, LLMs could only complete simple tasks, failing at more complex ones even with additional teaching examples [26][27].
- A tool named ABench-Physics was developed to assess LLMs' physical reasoning abilities, revealing that even the best models achieved only a 43% accuracy rate on basic physics problems [30][34].
- Visual tasks further demonstrated the limitations of LLMs, with human accuracy at 95.7% compared to a maximum of 51% for the models [37][41].

Group 3: Philosophical and Future Considerations
- The discussion includes perspectives on whether language can sometimes describe reality better than perception and the potential for AI to develop its own language for understanding the physical world [46][47].
- The ongoing development of models based on physical and multimodal understanding indicates a shift towards addressing these limitations [44].
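The contrast between a one-dimensional token stream and three-dimensional space can be sketched concretely. The toy vocabulary and occupancy grid below are illustrative assumptions, not anything from the article; they only show that text reaches a model as a flat sequence of discrete symbol ids, while even a coarse description of a physical scene is a grid of continuous values along three spatial axes.

```python
# Illustrative sketch (assumed toy vocabulary, not from the article):
# text enters a language model as a 1-D sequence of discrete token ids.
vocab = {"the": 0, "ball": 1, "rolls": 2, "downhill": 3}
sentence = "the ball rolls downhill"
token_ids = [vocab[w] for w in sentence.split()]
print(token_ids)        # one axis only: position in the text

# By contrast, a toy 4x4x4 occupancy grid for a region of space:
# each cell holds a continuous value (e.g. density), indexed by x, y, z.
grid = [[[0.0 for _ in range(4)] for _ in range(4)] for _ in range(4)]
grid[1][2][3] = 0.8     # three axes: x, y, z

print(len(token_ids))   # 4 values describe the whole sentence
print(4 * 4 * 4)        # 64 values describe even this tiny scene
```

The point of the sketch is only the shape mismatch: the token list has a single index, while the grid needs three, which is one way to read the article's claim that a one-dimensional representation is a poor fit for spatial structure.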