Concept Hacking
MLLMs Collectively Flop, Lacking Infant-Level Common Sense: Industry's First Core Cognition Benchmark Released, Reposted by LeCun
36Kr · 2025-08-05 01:45
Core Insights
- Current large models trail humans by 10-30% across 12 core cognitive areas, a significant gap in foundational knowledge [1][5]
- A new evaluation framework, CoreCognition, has been developed to assess these models. It emphasizes that a solid grasp of basic knowledge has to come before higher-level intelligence [1][8]

Model Performance
- On a comprehensive test of 1,503 questions, mainstream models showed substantial deficiencies in common sense. The best-performing model, InternVL3-78B, scored only 74.1% on Object Permanence against 88.1% for humans, a gap of 14 percentage points [5][6]
- Across the 12 "kindergarten" tests, the models collectively underperformed. In Intuitive Physics, for example, the best model scored 75.45% against 91.52% for humans, a gap of just over 16 percentage points [5][6]

Findings from CoreCognition
- Finding 1: Models lack core knowledge, which suggests that their high-level reasoning is not built on a solid foundation [13][15]
- Finding 2: Different cognitive abilities are disconnected, with low-level skills showing little correlation with higher-level reasoning tasks [17]
- Finding 3: Core knowledge is beneficial across tasks, with a strong positive correlation between foundational skills and performance on higher-level tasks [20]
- Finding 4: Increasing the parameter count does not necessarily enhance core knowledge, and larger models often fail to improve on foundational tasks [22]
- Finding 5: Larger models tend to rely on shortcuts rather than develop true understanding, indicating a regression in core knowledge as model size increases [23]

Research Implications
- The research emphasizes the importance of foundational cognitive abilities in AI development and suggests shifting the focus from merely scaling models to enhancing core knowledge [26]
- The study also highlights potential risks in applications such as autonomous driving, where a lack of basic understanding could lead to critical errors [26]
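
The human-model gaps quoted under Model Performance are differences between two accuracy percentages, so they are most naturally read as percentage points. The sketch below recomputes them from the four scores reported in this summary. The dictionary layout and the helper names `gap_points` and `gap_relative` are illustrative assumptions, not part of CoreCognition.

```python
# Accuracy scores (in percent) exactly as reported in this summary.
# The "best_model" / "human" keys are an illustrative layout, not the benchmark's format.
scores = {
    "Object Permanence": {"best_model": 74.1, "human": 88.1},
    "Intuitive Physics": {"best_model": 75.45, "human": 91.52},
}


def gap_points(human: float, model: float) -> float:
    """Absolute gap between two accuracies, in percentage points."""
    return round(human - model, 2)


def gap_relative(human: float, model: float) -> float:
    """Gap expressed as a percentage of the human score."""
    return round((human - model) / human * 100, 2)


# Percentage-point gap per task.
gaps = {
    task: gap_points(s["human"], s["best_model"])
    for task, s in scores.items()
}

for task, s in scores.items():
    print(
        f"{task}: {gaps[task]} points "
        f"({gap_relative(s['human'], s['best_model'])}% of the human score)"
    )
```

Reporting both figures makes the distinction explicit: a 14-point gap on Object Permanence corresponds to a shortfall of roughly 16% relative to the human score, so the unit matters when comparing numbers across tasks.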