Microsoft Releases the First Large-Scale Study of Test-Time Scaling, Plus the Ultimate Guide
机器之心· 2025-12-10 10:30
Core Insights
- The article discusses Test-time Scaling (TTS) for Large Language Models (LLMs), emphasizing that letting models "think" longer during inference can improve results [1][2]
- Microsoft has conducted a comprehensive study revealing distinct personality traits among models, which fall into "short-sighted" and "long-sighted" groups; this trait determines how well a model responds to a given TTS strategy [2][26]

TTS Strategies Overview
- TTS strategies for LLMs fall into four categories: parallel, sequential, mixed/meta methods, and internal computation mechanisms, with no single strategy being universally optimal [4][11]
- The study analyzed eight open-source LLMs with parameter counts ranging from 7 billion to 235 billion, generating over 30 billion tokens across four inference datasets [5]

Parallel Scaling Strategy
- Parallel scaling aggregates answers from multiple independent sampling paths to enhance performance, with methods such as Self-consistency and Best-of-n sampling being widely used [8]
- Recent advances include more principled voting schemes such as weighted majority voting and multi-agent verification [8]

Sequential Scaling Strategy
- Sequential scaling deepens reasoning through iterative correction, restarts, or backtracking, using techniques such as Chain of Thought (CoT) prompting and structured search methods [9]
- Systems like AlphaGeometry combine symbolic proof search with LLMs for step-level control [9]

Mixed Scaling Strategy
- Mixed strategies combine elements of both parallel and sequential methods, using meta-reasoners to dynamically select a TTS strategy based on perceived task difficulty [10]
- Internal scaling strategies modify the model's internal computation during inference without explicitly adjusting the number of samples or reasoning steps [10]

Research Findings
- Beam search exhibited an inverse-scaling pattern: increasing the beam size led to a decline in performance for certain model families [16][20]
- Reasoning path length correlates with quality: shorter paths are often more effective for short-sighted models, while longer paths may benefit long-sighted models under certain conditions [21][26]

Decision Matrix for TTS Strategy
- Microsoft developed a practical decision matrix for selecting TTS strategies based on model type, problem difficulty, and computational budget, providing actionable guidance for algorithm engineers [38][41]
- For short-sighted models, majority voting (MV@N) with a large N is recommended under high budgets, while FFS with k=1 is suggested under low budgets [41][42]
- Long-sighted models require a more nuanced approach, favoring longer paths for difficult problems and shorter paths for easier ones, with MV@N remaining a robust choice [46][48]
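The parallel aggregation methods named above (majority voting / self-consistency and weighted majority voting) can be sketched as follows. This is a minimal illustration, not the study's implementation: the sampled final answers and the verifier weights are invented for the example, and in practice they would come from N independent decoding runs of the model and a scoring model, respectively.

```python
from collections import Counter, defaultdict

def majority_vote(answers):
    """MV@N / self-consistency: return the most frequent final answer
    among N independently sampled reasoning paths."""
    return Counter(answers).most_common(1)[0][0]

def weighted_majority_vote(answers, weights):
    """Weighted majority voting: each sampled answer contributes a score
    (e.g., from a verifier) instead of a raw count."""
    scores = defaultdict(float)
    for answer, weight in zip(answers, weights):
        scores[answer] += weight
    return max(scores, key=scores.get)

# Hypothetical final answers from N=8 sampled reasoning paths.
samples = ["42", "42", "41", "42", "17", "42", "41", "42"]
print(majority_vote(samples))  # -> 42

# With verifier scores, a less frequent but higher-scored answer can win.
print(weighted_majority_vote(["41", "42", "42"], [0.9, 0.2, 0.3]))  # -> 41
```

Best-of-n sampling is the degenerate case of the weighted scheme: keep only the single highest-scored sample instead of summing scores per answer.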