Variant Sudoku

Large models can't play Sudoku?! Startup founded by a Transformer author publishes a leaderboard: o3 Mini High reaches only 2.9% accuracy on "variant Sudoku"

量子位 (QbitAI) · 2025-05-28 04:22

Core Insights
- The article discusses how AI models perform at solving Sudoku puzzles. It reports an overall accuracy of only 15%, with the best model reaching just 2.9% accuracy on 9x9 puzzles [1][25].

Group 1: AI Model Performance
- Sakana AI introduced a new benchmark called Sudoku-Bench, which tests AI models on Sudoku puzzles ranging from 4x4 to 9x9 [1][6].
- The leaderboard shows that the top-performing model, o3 Mini High, solved 14% of puzzles, while Gemini 2.5 Pro and Qwen 3 235B A22B solved 11% and 8% respectively [2][22].
- Even the most advanced models struggled, with many failing to place a single correct number in the puzzles [21][25].

Group 2: Challenges Faced by AI Models
- A significant issue is the "memory dependence" of large models: they rely on memorized solutions rather than logical reasoning [7][8].
- Models often fail to adapt to new rules or unseen patterns, which leads to ineffective problem-solving strategies [9][10].
- Traditional Sudoku puzzles may be too simple a test for these models, because they tend to memorize patterns instead of developing creative problem-solving skills [10].

Group 3: Innovative Testing Approach
- Sudoku-Bench includes "variant Sudoku" puzzles that require multi-step reasoning and cannot be solved through memory alone, which makes them well suited to testing AI reasoning capabilities [11][12].
- The benchmark features both traditional and modern Sudoku problems at varying difficulty levels [15][16].

Group 4: Company Background
- Sakana AI was founded in July 2023 by former Google researchers Llion Jones and David Ha and focuses on generative AI models [24].
- The company has previously released AI models capable of generating academic papers and reviewing AI-generated content [26][29].

Large-model reasoning ability

Variant Sudoku

Artificial Intelligence

Sudoku-Bench

Continuous Thought Machine (CTM)