An 8B model beats GPT-5 on tasks? StepFun (阶跃星辰) open-sources a new Deep Think framework, letting small models unlock million-token test-time compute
机器之心·2025-12-14 02:49

Core Insights
- The article covers the launch of the PaCoRe framework by Jieyue Xingchen (StepFun), which enables large models to perform parallel coordinated reasoning, overcoming the limits of linear chains of thought and fixed context window sizes [2][3][7].
- The PaCoRe-8B model scored 94.5 on the HMMT 2025 mathematics benchmark, surpassing GPT-5's 93.2, by effectively utilizing up to 2 million tokens during problem-solving [3][23].

PaCoRe Framework
- PaCoRe decouples reasoning from context capacity by shifting the focus from "serial depth" to "parallel collaborative breadth," allowing reasoning to extend far beyond a single context window [7].
- The framework exhibits strong test-time scaling: increasing the number of parallel trajectories and coordination rounds yields steadily improved results [9].

Inference Mechanism
- Inference proceeds by iterative message passing: each reasoning round starts from a set of compacted messages produced by the previous round, enabling broad parallel exploration [12][13].
- This iterative coordination lets the model refine its understanding and correct errors across rounds, producing effective test-time compute that exceeds the physical context window limit [14].

Training Methodology
- Training focuses on moving the model from isolated reasoning to active collaboration, using outcome-based reinforcement learning to develop reasoning-synthesis capabilities [15][16].
- The training data is curated to exclude simple problems solvable by heuristic rules, pushing the model to develop genuine collaborative reasoning skills [16].

Performance Evaluation
- PaCoRe-8B performed strongly on both mathematics and coding benchmarks, reaching 78.2% on LiveCodeBench and remaining competitive with much larger models [23].
- The emergence of "synthesis" capabilities was tracked via the frequency of cross-checking language features, indicating a marked shift in reasoning dynamics driven by reinforcement learning [25].

Future Directions
- The team plans to apply PaCoRe to more powerful foundation models, expanding task domains and enhancing both the breadth and depth of reasoning [30].
- Future goals include maximizing per-token intelligence density and exploring emergent multi-agent intelligence through collaborative learning environments [31].
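The iterative message-passing mechanism described above can be sketched as a simple round-based loop. This is a hedged illustration, not the released PaCoRe implementation: the function names (`explore`, `compact`, `pacore`) and all numeric budgets (round count, trajectory width, per-trajectory token cost) are assumptions chosen only to show how total test-time compute can exceed any single context window.

```python
# Hypothetical sketch of PaCoRe-style parallel coordinated reasoning.
# All names and numbers below are illustrative assumptions, not the
# actual framework's API or configuration.

CONTEXT_WINDOW = 128_000          # per-call context limit (assumed)
WIDTH = 16                        # parallel trajectories per round (assumed)
ROUNDS = 4                        # coordination rounds (assumed)
TOKENS_PER_TRAJECTORY = 30_000    # reasoning tokens spent per trajectory (assumed)

def explore(messages):
    """Stand-in for one reasoning trajectory seeded with prior messages."""
    return f"trajectory over {len(messages)} messages"

def compact(trajectory):
    """Stand-in for compressing a finished trajectory into a short message."""
    return f"summary({trajectory})"

def pacore(problem):
    """Run ROUNDS of WIDTH parallel trajectories, passing compacted
    messages between rounds, and tally total tokens consumed."""
    messages = [problem]
    total_tokens = 0
    for _ in range(ROUNDS):
        trajectories = [explore(messages) for _ in range(WIDTH)]
        total_tokens += WIDTH * TOKENS_PER_TRAJECTORY
        # Each new round restarts from short compacted messages, so the
        # per-call context stays small even as total compute grows.
        messages = [compact(t) for t in trajectories]
    return messages, total_tokens

messages, total = pacore("HMMT problem statement")
# Aggregate compute far exceeds any single context window:
assert total == ROUNDS * WIDTH * TOKENS_PER_TRAJECTORY  # 1,920,000 tokens
assert total > CONTEXT_WINDOW
```

Under these assumed budgets, four rounds of sixteen trajectories consume roughly 1.9 million reasoning tokens, on the order of the 2 million tokens the article credits to PaCoRe-8B, while no individual model call ever needs a context longer than the compacted messages plus one trajectory.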
