Large Reasoning Models
R-HORIZON: The Era of Long-Horizon Reasoning Arrives, as Fudan NLP and Meituan LongCat Release a New Paradigm for Probing the Capability Limits of LRMs
机器之心· 2025-10-22 08:46
Core Insights
- The article discusses the introduction of R-HORIZON, a benchmark for evaluating and enhancing the long-chain reasoning capabilities of large reasoning models (LRMs) [8][39]
- It highlights the limitations of current training and evaluation paradigms, which focus primarily on isolated single-step problems and fail to address the complexities of real-world reasoning scenarios [4][5]

Group 1: Background and Motivation
- The transition from "single-step reasoning" to "long-chain decision-making" is emphasized as a critical evolution in AI reasoning capabilities [3]
- Existing benchmarks like MATH500 and AIME focus on independent problems, which do not reflect the interconnected nature of real-world reasoning tasks [4]

Group 2: R-HORIZON Benchmark
- R-HORIZON is the first systematic method and benchmark for assessing and enhancing LRMs' long-chain reasoning abilities [8]
- It employs a query-composition method to transform isolated tasks into complex multi-step reasoning scenarios, allowing for a more accurate evaluation of model capabilities [11]

Group 3: Key Findings
- A significant performance drop was observed in top models on long-chain reasoning tasks, revealing a "reasoning cliff" where even advanced models struggle [16]
- The benchmark includes six representative datasets covering reasoning tasks such as mathematical reasoning and code generation [15]

Group 4: Mechanisms and Bottlenecks
- Three key bottlenecks were identified in current LRMs: limited effective reasoning length, localized reflection mechanisms, and imbalanced thinking-budget allocation [20][23]
- The analysis revealed that all models suffered significant performance declines as the number of interdependent problems increased, with larger models showing more resilience [21]

Group 5: Training and Performance Improvement
- R-HORIZON training demonstrated a dual performance gain, improving both long-chain task performance and single-problem accuracy [30][33]
- The training process led to more efficient reasoning lengths and better token-budget allocation across multi-step problems, addressing the earlier imbalances [34][35]

Group 6: Future Directions
- The launch of R-HORIZON marks a paradigm shift in LRM research, focusing on how far reasoning capabilities extend rather than only on problem-solving ability [39]
- The framework is open-sourced, inviting collaboration from global researchers to advance the development of next-generation reasoning models [40]
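The query-composition method described above is, at its core, a chaining construction: the answer to one problem is injected into the statement of the next, so a model can only solve the composed task by carrying correct intermediate state across every step. The sketch below illustrates that idea; the function names, template scheme, and scoring rule are assumptions made for illustration, not the actual R-HORIZON implementation.

```python
# Minimal sketch of query composition: chain otherwise independent
# problems so each problem's answer becomes an input of the next one.
# All names here are illustrative assumptions, not the R-HORIZON API.

def compose_queries(problems):
    """problems: list of (template, solver) pairs.

    Each template may contain a "{prev}" placeholder that receives the
    gold answer of the previous problem; the first template ignores it.
    Returns the composed multi-step prompt and the final gold answer.
    """
    prompt_parts = []
    prev_answer = None
    for i, (template, solver) in enumerate(problems, start=1):
        question = template.format(prev=prev_answer)
        prompt_parts.append(f"Step {i}: {question}")
        prev_answer = solver(prev_answer)  # gold answer for this step
    composed_prompt = "\n".join(prompt_parts)
    return composed_prompt, prev_answer

# Usage: two toy arithmetic problems chained into one multi-step task.
problems = [
    ("What is 7 * 6?", lambda _: 7 * 6),
    ("Add 8 to the previous answer ({prev}). What do you get?",
     lambda prev: prev + 8),
]
prompt, gold = compose_queries(problems)
# gold == 50; a model is credited only if it recovers the final answer.
```

Because the final answer depends on every intermediate one, a single early mistake propagates through the whole chain, which is one simple way to see why composed-task accuracy can fall off a cliff as chains grow.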
How Long Until AI Becomes a Capable Assistant to Mathematicians?
Ke Ji Ri Bao· 2025-06-17 01:18
Core Viewpoint
- The article discusses the current state and future potential of AI in assisting mathematical research, highlighting both advances and limitations in AI's ability to solve complex mathematical problems.

Group 1: AI Advancements in Mathematics
- The U.S. Defense Advanced Research Projects Agency (DARPA) launched the "Exponential Mathematics" program to develop AI systems that can significantly enhance the efficiency of mathematical research [1]
- New-generation large language models (LLMs) like OpenAI's o3 and Anthropic's Claude 4 Thinking have shown improvements, performing at levels close to excellent high-school students in competitions [2]
- Google's AlphaProof system combines LLMs with chess AI, achieving results comparable to silver medalists at the International Mathematical Olympiad [2]
- Google's AlphaEvolve model has found solutions to long-standing mathematical and computational problems that outperform existing human methods [2]

Group 2: Limitations of AI in Mathematics
- Despite impressive performances, experts believe current AI models lack the capability to assist in genuine mathematical research, as competition problems are closer to intellectual games with recognizable patterns [2]
- A test by Epoch AI revealed that LLMs struggled with high-difficulty problems designed to avoid previously seen training data, indicating significant limits to their problem-solving abilities [3]
- AI faces challenges with "super-long reasoning chains": complex problems may require millions of steps to solve, making it difficult for AI to find correct solutions [5]

Group 3: Innovative Approaches and Future Directions
- Researchers are developing methods to package multiple steps into "super steps" to tackle complex problems, which has led to breakthroughs on classic unsolved problems [5][6]
- Exploring new mathematical ideas is crucial; AI tools like AlphaEvolve can generate and refine solutions while allowing human intervention to provide inspiration [7]
- AI is seen as a potential tool for discovering new mathematical objects, but it currently lacks true creativity, with significant innovations still attributed to human mathematicians [8]
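The "super step" idea mentioned above (collapsing many elementary reasoning steps into a single larger, verified step) has a familiar analogue in computation: exponentiation by squaring reaches the same result as repeated multiplication in logarithmically many steps. The sketch below is purely an analogy for step packaging, not the researchers' actual method for attacking million-step proofs.

```python
# Step-compression analogy: the same result reached with far fewer,
# larger steps. Purely illustrative of the "super step" idea, not the
# researchers' actual technique.

def pow_naive(x, n):
    """Computes x**n with n - 1 elementary multiplication steps."""
    result, steps = x, 0
    for _ in range(n - 1):
        result *= x
        steps += 1
    return result, steps

def pow_super_steps(x, n):
    """Exponentiation by squaring: O(log n) packaged 'super steps'."""
    result, base, steps = 1, x, 0
    while n > 0:
        if n & 1:
            result *= base  # fold in the current power of two
        base *= base        # one squaring "super step"
        n >>= 1
        steps += 1
    return result, steps

v1, s1 = pow_naive(3, 16)        # 15 small steps
v2, s2 = pow_super_steps(3, 16)  # 5 super steps (one per bit of 16)
# v1 == v2 == 43046721, but the second chain is far shorter.
```

Shrinking a chain of 15 multiplications to 5 squaring steps is a toy version of why packaging steps can make otherwise intractably long derivations manageable.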