Do Multimodal Large Models Really "Understand" the World? Uncovering the Core Knowledge Deficits of MLLMs
机器之心· 2025-07-28 02:47
Core Insights
- The article highlights that multimodal large language models (MLLMs) show impressive capabilities on high-level visual understanding and reasoning tasks, yet frequently fail at seemingly simple tasks that even infants can accomplish [1][2]
- It asks whether MLLMs lack "core knowledge", the foundational concepts on which early human learning is built, pointing to a potential cognitive blind spot in these models [2][5]

Research Findings
- A study from UC San Diego titled "Core Knowledge Deficits in Multi-Modal Language Models" systematically analyzes the absence of core cognitive abilities in mainstream MLLMs [3][5]
- The research finds that current MLLMs broadly lack core cognitive abilities, and that these abilities do not emerge naturally from scaling up the models [5][12]

CoreCognition Framework
- The authors developed a multimodal assessment suite called CoreCognition, together with a "Concept Hacking" method that tests whether models genuinely understand the core knowledge behind a task or are merely guessing [6][18]
- CoreCognition is a large-scale benchmark focused on core knowledge, inspired by Piaget's theory of cognitive development, and aims to bridge the gap between cognitive science and AI evaluation [9][11]

Assessment Design
- The CoreCognition dataset contains 1,503 image-question pairs; evaluating 230 mainstream multimodal models under 11 prompt designs yields 2,530 evaluation data points (230 × 11), covering a wide range of model scales and instruction-following behaviors [11] (a minimal evaluation-loop sketch appears at the end of this summary)
- The assessment is designed to be discriminative: it minimizes confounding factors and blocks text-only shortcuts, so that models must perform genuine multimodal reasoning to reach the correct answers [11][12]

Key Findings on Model Performance
- MLLMs show significant deficiencies on basic cognitive tasks, particularly boundary perception and spatial awareness, performing markedly worse than they do on more complex, high-level tasks [12][14]
- The study indicates that increasing model size does not significantly improve these basic cognitive abilities, and in some cases larger models perform worse on foundational tasks [16][20]

Concept Hacking Methodology
- Concept Hacking builds matched control and manipulated test pairs: the key feature carrying the target concept is reversed while all other conditions are held constant, probing whether a model relies on the concept itself or on surface cues [18][29] (see the paired-evaluation sketch at the end of this summary)
- Many models perform well on the standard control items but fail dramatically when the key feature is flipped, indicating reliance on superficial shortcuts rather than genuine understanding [20][30]

Implications and Future Directions
- The findings suggest that MLLMs lack the foundational cognitive scaffolding that humans use to build higher-level reasoning, which poses a fundamental challenge to the current scaling-centric development path [22][30]
- Future directions may include explicitly injecting physical and spatial common sense during pre-training, exploring cognition-guided training mechanisms, and developing more controlled assessments of cognitive abilities [30]
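To make the evaluation grid in the Assessment Design section concrete, the sketch below shows one plausible way the model × prompt loop could be organized, assuming a simple per-item accuracy metric: 230 models × 11 prompt designs gives the 2,530 data points mentioned above. The model identifiers, prompt styles, and the `ask` callable are placeholders of ours, not the paper's actual code or API.

```python
# Hypothetical sketch of a CoreCognition-style evaluation grid: one accuracy
# number per (model, prompt design) pair, each computed over the image-question
# items. All names below are placeholders, not the paper's actual interface.

from itertools import product
from typing import Callable, Dict, List, Tuple

Item = dict  # e.g. {"image": "img_001.png", "question": "...", "answer": "B"}

def accuracy(ask: Callable[[str, Item], str], prompt_style: str,
             dataset: List[Item]) -> float:
    """Fraction of items a model answers correctly under one prompt design."""
    correct = sum(ask(prompt_style, item) == item["answer"] for item in dataset)
    return correct / len(dataset)

def run_grid(models: Dict[str, Callable], prompt_styles: List[str],
             dataset: List[Item]) -> Dict[Tuple[str, str], float]:
    """One accuracy number per (model, prompt) pair -> the full evaluation grid."""
    return {
        (name, style): accuracy(ask, style, dataset)
        for (name, ask), style in product(models.items(), prompt_styles)
    }

if __name__ == "__main__":
    # Toy stand-ins: a 'model' that always answers "B", two prompt styles, two items.
    toy_models = {"toy_mllm": lambda style, item: "B"}
    toy_prompts = ["zero_shot", "chain_of_thought"]
    toy_data = [{"image": "x.png", "question": "q1", "answer": "B"},
                {"image": "y.png", "question": "q2", "answer": "A"}]
    print(run_grid(toy_models, toy_prompts, toy_data))
    # {('toy_mllm', 'zero_shot'): 0.5, ('toy_mllm', 'chain_of_thought'): 0.5}
```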
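Similarly, the following minimal sketch illustrates the control-versus-manipulated comparison behind Concept Hacking, under the assumption that each probe comes as a matched pair and that a large accuracy drop on the manipulated twin signals shortcut learning. The pairing format, the `shortcut_gap` score, and the toy example are illustrative and are not taken from the paper.

```python
# Hypothetical sketch of a Concept Hacking comparison: each probe has a control
# item and a manipulated twin in which the concept-bearing feature is reversed
# while everything else stays constant. The metric names are illustrative.

from typing import Callable, Dict, List

def concept_hack_report(ask: Callable[[dict], str],
                        pairs: List[Dict[str, dict]]) -> Dict[str, float]:
    """Compare accuracy on control items vs. their feature-reversed twins."""
    control_correct = sum(ask(p["control"]) == p["control"]["answer"] for p in pairs)
    hacked_correct = sum(ask(p["manipulated"]) == p["manipulated"]["answer"] for p in pairs)
    n = len(pairs)
    report = {"control_acc": control_correct / n, "manipulated_acc": hacked_correct / n}
    # A large drop from control to manipulated suggests the model was keying on
    # a surface feature (a shortcut) rather than the underlying concept.
    report["shortcut_gap"] = report["control_acc"] - report["manipulated_acc"]
    return report

if __name__ == "__main__":
    # Toy object-permanence-style pair: the correct answer flips when the
    # concept-bearing feature is reversed, but a surface-cue model keeps
    # answering as if nothing changed.
    pairs = [{
        "control":     {"image": "ball_hidden.png",  "question": "Is the ball still there?", "answer": "yes"},
        "manipulated": {"image": "ball_removed.png", "question": "Is the ball still there?", "answer": "no"},
    }]
    surface_cue_model = lambda item: "yes"  # ignores the manipulated feature
    print(concept_hack_report(surface_cue_model, pairs))
    # {'control_acc': 1.0, 'manipulated_acc': 0.0, 'shortcut_gap': 1.0}
```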