Abstract Reasoning
Loop-ViT: Teaching AI to "Think Repeatedly," a 3.8M-Parameter Small Model Matches the Human Average
机器之心 (Synced) · 2026-02-12 10:08
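When we work through a hard math problem or study an abstract pattern, the brain usually has to think repeatedly and reason step by step. Today's mainstream deep learning models, by contrast, take a "single pass" route: data goes in, flows through a network with a fixed number of layers, and the answer comes out. This feed-forward architecture excels at perceptual tasks such as image classification but struggles with abstract problems that require multi-step reasoning. The most typical example is the ARC-AGI benchmark, widely regarded as a touchstone for AI's abstract reasoning ability. Recently, a research team from the Hong Kong University of Science and Technology, the Institute of Automation of the Chinese Academy of Sciences, and UC Santa Cruz proposed Loop-ViT, introducing looped Transformers to visual reasoning for the first time. With only 18M parameters, the model reaches 65.8% accuracy on the ARC-AGI-1 benchmark, surpassing the VARC ensemble model, which has 73M parameters. More surprisingly, its small 3.8M-parameter version still reaches 60.1% accuracy, nearly matching the human average (60.2%). What is ARC-AGI, and why is it so hard? ARC-AGI (Abstraction and Reasoning Corpus) is an abstract reasoning benchmark proposed by François Chollet, the creator of Keras. Unlike Image ...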
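The contrast drawn above between a fixed-depth single pass and repeated thinking can be illustrated with a toy sketch. The summary does not describe Loop-ViT's internals, so this shows only the generic weight-tying idea behind looped models, not the authors' architecture. The names `make_block`, `apply_block`, `feed_forward`, `looped`, and `count_params` are hypothetical, and the tanh residual update is an assumed stand-in for a real Transformer block.

```python
import math
import random


def make_block(dim, rng):
    """Create one toy block: a single dim x dim weight matrix (assumption)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]


def apply_block(weights, state):
    """Residual update: state + tanh(W @ state)."""
    return [s + math.tanh(sum(w * x for w, x in zip(row, state)))
            for s, row in zip(state, weights)]


def feed_forward(blocks, state):
    """Fixed-depth pass: every layer owns its weights and runs exactly once."""
    for weights in blocks:
        state = apply_block(weights, state)
    return state


def looped(weights, state, steps):
    """Weight-tied pass: the same block is reapplied `steps` times."""
    for _ in range(steps):
        state = apply_block(weights, state)
    return state


def count_params(blocks):
    """Total number of scalar weights across a list of blocks."""
    return sum(len(row) for weights in blocks for row in weights)


rng = random.Random(0)
dim, depth = 4, 6
stack = [make_block(dim, rng) for _ in range(depth)]  # six distinct layers
shared = make_block(dim, rng)                         # one reusable layer
x = [1.0, 0.0, -1.0, 0.5]

print(count_params(stack))     # 96: parameters grow with depth
print(count_params([shared]))  # 16: independent of the number of loop steps
y_stack = feed_forward(stack, x)
y_loop = looped(shared, x, steps=depth)
```

In this toy setting the looped variant reuses the same 16 weights for any number of steps, while the stacked variant's parameter count grows linearly with depth. That is one plausible reading of how a small looped model can afford more computation per parameter.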
AI Cracks an 18th-Century "Indecipherable" Ledger in Seconds; Google's New Model Goes Viral After Blind Tests
36Kr · 2025-11-12 10:44
Core Insights
- Google has potentially solved two longstanding challenges in AI with a mysterious model that successfully recognized and corrected a 200-year-old merchant's handwritten ledger, showcasing advanced reasoning capabilities that astonished historians [1][3][15].

Group 1: Model Performance
- The mysterious model achieved near-perfect performance in handwritten text recognition (HTR) and corrected a formatting error in the original ledger, indicating its ability to understand the logic and context behind the text [3][15].
- The model's performance in HTR reached human expert-level accuracy, with a strict Character Error Rate (CER) of 1.7% and a Word Error Rate (WER) of 6.5% on a challenging test set [13][15].
- Compared to previous models, the new Gemini model demonstrated significant improvements, with Gemini-2.5-Pro showing a 50-70% enhancement over earlier versions [11][15].

Group 2: Historical Context and Challenges
- Recognizing historical handwriting requires not only visual recognition but also an understanding of the historical context, making it a complex task for AI models [5][8].
- The model's ability to accurately transcribe difficult historical documents, including those with ambiguous numbers and inconsistent styles, marks a significant advancement in AI capabilities [19][23].
- The model successfully interpreted complex historical currency and measurement systems, showcasing its potential for abstract reasoning and contextual understanding [23][24].

Group 3: Expert Validation
- Historian Mark Humphries utilized the model to test its capabilities, emphasizing that the final accuracy in historical text recognition is crucial for practical use [8][9].
- The model's performance in transcribing a ledger with non-standard formats and mixed languages was particularly impressive, as it corrected errors and inferred missing context [20][23].
- Humphries noted that the model's ability to perform multi-step reasoning and contextual inference suggests a shift towards genuine understanding in AI, beyond mere pattern recognition [24].
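
The CER and WER figures quoted under Group 1 are both edit-distance ratios. The following is a minimal sketch of the standard definitions, assuming plain Levenshtein distance over characters and over whitespace-separated words. The source does not say what normalization its "strict" variant applies, and the sample ledger strings are invented for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character edits / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Invented example: a single misread digit in a ledger line.
reference = "paid 3 pounds 10 shillings"
hypothesis = "paid 8 pounds 10 shillings"

print(round(cer(reference, hypothesis), 4))  # 0.0385 (1 of 26 characters)
print(wer(reference, hypothesis))            # 0.2 (1 of 5 words)
```

One wrong character makes its whole word count as an error, which is why WER typically sits well above CER on the same transcription, as in the 6.5% versus 1.7% figures above.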