Investment Rating
- The report maintains a "Positive" investment rating for the industry [1][34].

Core Insights
- The report highlights "sampling-based search," a method for scaling model performance at inference time that significantly improves reasoning accuracy on complex tasks [7][10].
- It emphasizes the role of self-verification in selecting the best solutions from the large pool of candidate answers produced by random sampling [13][18].
- The findings suggest that reasoning effectiveness is positively correlated with sampling scale: increasing the number of samples raises the probability that a correct solution is generated [15][27].

Summary by Sections

Section 1: Sampling-based Search and Post-training Scaling
- The research introduces a novel approach to inference-time computation that improves model performance through a sampling-based search strategy [10].
- The core mechanism generates a large number of candidate solutions and filters them through self-verification, retaining only the best options and thereby improving reasoning precision [7][10]; a minimal sketch of this generate-then-verify loop appears after this summary.
- Experimental results show that even basic random sampling combined with self-verification can outperform specialized reasoning models on challenging benchmarks [10][15].

Section 1.1: Self-verification Mechanism
- Self-verification follows the principle of "broad sampling, precise filtering": the model selects the most accurate solutions from a diverse set of candidates [13][20].
- The process relies on probability coverage: extensive exploration of the solution space raises the likelihood that at least one sampled candidate is correct [13][14] (see the coverage calculation after this summary).

Section 1.2: Enhancing Sampling Quality
- Two strategies are identified for improving verification quality: comparing candidate solutions directly against one another to locate errors, and rewriting outputs in a task-appropriate format so they are easier to verify [18][20]; the comparison idea is sketched after this summary.
- Both strategies aim to sharpen the model's ability to spot discrepancies and improve the overall accuracy of the selected solutions [18][20].

Section 2: Scaling Law and Multi-linear Narratives
- The report discusses how recent gains in model scaling have come from breakthroughs in the post-training and inference phases, and highlights sampling-based search as a scalable direction [24].
- It notes that the theoretical ceiling of brute-force search is bounded only by available computational resources, making it a distinct and independent axis of scaling [24].

Section 3: Experimental Validation
- The report presents data showing a significant positive correlation between reasoning effectiveness and sampling scale, reinforcing the dual optimization effect of generating and then filtering solutions [15][27].
- As the number of samples increases, both the probability of producing a correct solution and the rigor of the verified solutions improve [15][27].
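
The generate-then-verify loop described in Section 1 can be summarized with a minimal sketch. The function names (`generate`, `verify`) and the stand-in implementations are illustrative assumptions, not the report's actual code; in a real system both would be LLM calls.

```python
import random
from typing import Callable


def sampling_based_search(
    prompt: str,
    generate: Callable[[str], str],       # draws one candidate solution
    verify: Callable[[str, str], float],  # scores a candidate in [0, 1]
    num_samples: int = 64,
) -> str:
    """Sample many candidates, score each with self-verification, keep the best."""
    candidates = [generate(prompt) for _ in range(num_samples)]
    scored = [(verify(prompt, c), c) for c in candidates]
    return max(scored, key=lambda pair: pair[0])[1]


# Stand-in model calls so the sketch runs end to end; a real system would
# back both with LLM requests.
def fake_generate(prompt: str) -> str:
    return f"candidate answer {random.randint(0, 9)}"


def fake_verify(prompt: str, candidate: str) -> float:
    return random.random()


if __name__ == "__main__":
    print(sampling_based_search("Solve: 17 * 24 = ?", fake_generate, fake_verify))
```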
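
For the probability-coverage point in Section 1.1, a back-of-the-envelope model (my simplification, assuming independent samples that each succeed with a fixed probability p) shows why coverage grows quickly with the number of samples:

```python
def coverage(p: float, n: int) -> float:
    """Probability that at least one of n independent samples is correct."""
    return 1.0 - (1.0 - p) ** n


for n in (1, 4, 16, 64, 256):
    print(f"n={n:>3}  coverage={coverage(0.05, n):.4f}")
# With p = 0.05: n=1 -> 0.0500, n=64 -> ~0.9625, n=256 -> ~1.0000.
```

Coverage only bounds the generation side; realized accuracy still depends on the verifier picking the correct candidate out of the pool, which is why the report pairs broad sampling with precise filtering.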
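
The candidate-comparison strategy from Section 1.2 can be pictured as a simple pairwise tournament. This is one possible reading of "comparing candidates to locate errors," not the report's implementation; `compare` is a hypothetical stand-in for an LLM call that contrasts two solutions and flags the likelier error.

```python
import itertools
import random
from collections import Counter
from typing import Callable


def pairwise_select(
    prompt: str,
    candidates: list[str],
    compare: Callable[[str, str, str], str],  # returns the preferred candidate
) -> str:
    """Run every pair through the comparator; keep the candidate with the most wins."""
    wins: Counter[str] = Counter({c: 0 for c in candidates})
    for a, b in itertools.combinations(candidates, 2):
        wins[compare(prompt, a, b)] += 1
    return wins.most_common(1)[0][0]


def fake_compare(prompt: str, a: str, b: str) -> str:
    return random.choice((a, b))  # stand-in for an LLM comparison call


if __name__ == "__main__":
    pool = ["answer A", "answer B", "answer C", "answer D"]
    print(pairwise_select("Solve the puzzle.", pool, fake_compare))
```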
Test-Time Reasoning: The "Brute-Force" Aesthetics of Random Sampling (测试时推理：随机采样的“暴力”美学)
Caitong Securities (财通证券) · 2025-03-26 07:27