Core Cognitive Abilities
230 Large Models Collectively Flunk Infant-Level Cognition Tests! Revealing the Core Knowledge Deficits of Multimodal Large Models
量子位 · 2025-10-10 01:03
Core Insights - The article highlights that while most AI models perform well on complex tasks, they struggle significantly with basic cognitive abilities that humans develop from a young age [1][4]
Core Cognition Benchmark - Researchers created the CoreCognition benchmark, which includes 1,503 classic developmental-psychology test items covering 12 core cognitive abilities that emerge in early childhood [2][9]
- The benchmark aims to systematically test models on their understanding of fundamental cognitive concepts such as object permanence and intuitive physics [5][9]
Model Performance - A comparison of 230 mainstream models revealed a "core knowledge blind spot": many models show significant deficits in basic cognitive abilities, often lagging behind human performance by double-digit percentages [3][4][16]
- Lower-level cognitive abilities (e.g., boundary perception, continuity) are significantly weaker in models than higher-level abilities (e.g., intentional understanding, mechanical reasoning) [16][18]
Key Findings - The research identified five key findings regarding the cognitive capabilities of models:
1. Models exhibit systematic shortcomings in foundational "core knowledge" compared to human cognitive development [16]
2. There is only a weak correlation between lower-level abilities and higher-level reasoning, indicating that models lack the developmental scaffolding seen in humans [18]
3. Core abilities are positively correlated with performance on public benchmarks, suggesting that stronger core knowledge leads to better task performance [20]
4. Increasing model size does not significantly improve lower-level cognitive abilities; some abilities even deteriorate as model size increases [22]
5. Concept Hacking experiments showed that larger models do not necessarily perform better, indicating that scaling alone does not eliminate reliance on shortcuts [24]
Cognitive Instruction and Model Understanding - Cognitive instructions can provide short-term performance gains, but they do not close the underlying gaps in foundational knowledge [27][29]
- The study suggests that true intelligence rests on understanding the most basic rules of the world, not merely on increasing model parameters [31][32]
Recommendations - The article advocates shifting the focus from merely scaling models to first solidifying foundational cognitive knowledge, emphasizing that core knowledge is multiplicative rather than additive (an illustrative formalization follows this summary) [33][34]
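To make the "multiplicative rather than additive" claim concrete, here is a minimal formal sketch. It is an illustrative reading rather than a formula from the article: under a multiplicative model, one near-zero core ability collapses overall competence, whereas under an additive model it merely subtracts one small term.

```latex
% Illustrative sketch only; the article states the claim qualitatively.
% c_i \in [0,1]: score on core ability i (12 abilities in CoreCognition);
% w_i: hypothetical importance weights.
C_{\text{additive}} = \sum_{i=1}^{12} w_i \, c_i
\qquad
C_{\text{multiplicative}} = \prod_{i=1}^{12} c_i^{\,w_i}
% If any c_j \approx 0, then C_multiplicative \approx 0,
% while C_additive loses only the single term w_j c_j.
```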
Do Multimodal Large Models Really "Understand" the World? Revealing the Core Knowledge Deficits of MLLMs
机器之心 · 2025-07-28 02:47
Core Insights - The article highlights that Multi-Modal Language Models (MLLMs) exhibit impressive capabilities in high-level visual understanding and reasoning tasks, yet they frequently fail in seemingly simple tasks that even infants can accomplish [1][2] - It questions whether MLLMs lack "core knowledge," which is essential for early human learning, indicating a potential cognitive blind spot in these models [2][5] Research Findings - A study from UC San Diego titled "Core Knowledge Deficits in Multi-Modal Language Models" systematically analyzes the lack of core cognitive abilities in mainstream MLLMs [3][5] - The research reveals that current MLLMs widely lack core cognitive abilities, which cannot be naturally acquired through model scaling [5][12] CoreCognition Framework - The authors developed an innovative multi-modal assessment system called CoreCognition, along with a unique "Concept Hacking" method to test whether models genuinely understand the core knowledge behind tasks or are merely guessing [6][18] - CoreCognition is a large-scale assessment framework focusing on core knowledge, inspired by Piaget's theories of cognitive development, and aims to bridge the gap between cognitive science and AI testing [9][11] Assessment Design - The CoreCognition dataset includes 1,503 image-question pairs and generates 2,530 evaluation data points across 230 mainstream multi-modal models and 11 prompt designs, effectively covering various model scales and instruction comprehension [11] - The assessment is designed to be discriminative, minimizing confounding factors and avoiding text shortcuts, ensuring that models must engage in multi-modal reasoning to arrive at correct answers [11][12] Key Findings on Model Performance - MLLMs show significant deficiencies in basic cognitive tasks, particularly in areas like boundary perception and spatial awareness, performing poorly compared to their understanding of more complex tasks [12][14] - The study indicates that increasing model size does not significantly enhance basic cognitive abilities, and in some cases, larger models perform worse on foundational tasks [16][20] Concept Hacking Methodology - The Concept Hacking method involves creating control and manipulated groups to test models' understanding of core concepts by reversing key features while keeping other conditions constant [18][29] - Results show that many models perform well on standard tasks but fail dramatically when key features are altered, indicating a reliance on superficial learning rather than true understanding [20][30] Implications and Future Directions - The findings suggest that MLLMs lack the foundational cognitive scaffolding that humans use to build higher-level reasoning, posing a fundamental challenge to the current model development path focused on scaling [22][30] - Future directions may include explicitly injecting physical and spatial common sense into pre-training phases, exploring cognitive-guided training mechanisms, and developing more controlled assessments of cognitive abilities [30]