NeurIPS 2025 | DynaAct: Beyond DeepSeek R1, Exploring Another Path for Large Model Reasoning

Core Insights
- The article describes a new paradigm in large model reasoning: a shift from train-time scaling to test-time scaling (TTS), where the goal is more efficient inference rather than simply longer reasoning chains [3][10].
- A research team from Ant Group and the University of Hong Kong introduces DynaAct, an approach that optimizes the action space dynamically to improve reasoning efficiency [4][7].

Group 1: DynaAct Overview
- DynaAct is built on Action Space Optimization: at each reasoning step it dynamically constructs the set of actions the model can choose from, which makes inference more structured and efficient [7][11].
- The core idea is to recast action space learning as a set selection problem and to solve it with submodular optimization, which yields an algorithm with linear complexity [14].

Group 2: Methodology and Implementation
- DynaAct uses a submodular function with two components: a utility term, which measures how similar the actions in the action space are to the current state, and a diversity term, which measures redundancy among the actions in the action space [14].
- The implementation is supported by a high-performance Monte Carlo Tree Search (MCTS) framework that significantly speeds up node expansion, rollout, and reward calculation [19].

Group 3: Performance and Results
- DynaAct outperforms CoT, RAP, and rStar across six reasoning benchmarks, demonstrating the effectiveness of dynamic action spaces [21].
- DynaAct scores 70.22 on the MMLU benchmark, surpassing other models, and shows a stable test-time scaling trend as the number of MCTS rollout iterations increases [22][25].

Group 4: Future Directions
- The research team plans to extend dynamic action spaces to multi-agent planning scenarios and to combine submodular optimization with reinforcement learning for adaptive reasoning strategies [26].
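
To make the set selection idea from Group 2 concrete, here is a minimal sketch of greedy selection under a utility-plus-diversity objective. It is an illustration under stated assumptions, not the DynaAct implementation: the article does not give the exact form of the submodular function, so this sketch assumes that actions and the state are embedding vectors, uses clipped cosine similarity to the state as the utility term, and uses a facility-location coverage term as a stand-in for the diversity term. The names `cosine`, `objective`, `select_actions`, and the weight `lam` are hypothetical. The plain greedy loop below also makes no attempt to reproduce the linear complexity reported for DynaAct.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity, clipped to [0, 1] so the objective stays monotone."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return max(0.0, dot / (norm_u * norm_v))


def objective(selected: List[int], state: Vector,
              candidates: Sequence[Vector], lam: float) -> float:
    """Assumed objective: modular utility plus lam times facility-location coverage.

    Utility rewards actions similar to the current state. Coverage rewards sets
    that represent the whole candidate pool, so a near-duplicate of an action
    that is already selected adds little (this is the diversity pressure).
    """
    if not selected:
        return 0.0
    utility = sum(cosine(candidates[i], state) for i in selected)
    coverage = sum(
        max(cosine(c, candidates[i]) for i in selected) for c in candidates
    )
    return utility + lam * coverage


def select_actions(state: Vector, candidates: Sequence[Vector],
                   k: int, lam: float = 0.5) -> List[int]:
    """Greedily pick up to k candidate indices by largest marginal gain."""
    selected: List[int] = []
    remaining = set(range(len(candidates)))
    current = 0.0
    while remaining and len(selected) < k:
        best_index, best_gain = -1, -math.inf
        # Iterate in index order so ties resolve to the lowest index.
        for i in sorted(remaining):
            gain = objective(selected + [i], state, candidates, lam) - current
            if gain > best_gain:
                best_index, best_gain = i, gain
        selected.append(best_index)
        remaining.remove(best_index)
        current += best_gain
    return selected


if __name__ == "__main__":
    state = [1.0, 0.0]
    # Candidates 0 and 1 are near-duplicates; candidate 2 is very different.
    pool = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
    print(select_actions(state, pool, k=2, lam=0.0))
    print(select_actions(state, pool, k=2, lam=2.0))
```

In this toy pool, setting `lam=0.0` ranks purely by utility and selects the two near-duplicate actions (indices `[0, 1]`), while `lam=2.0` selects indices `[1, 2]`, trading a little utility for an action that covers a different part of the pool. That trade-off between relevance to the current state and redundancy within the set is the behavior the utility and diversity components are described as balancing.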