Test-Time Scaling
Extending the external test-time scaling law — a new finding from Zhongguancun Academy: a lightweight verifier can unlock optimal selection for LLM reasoning
机器之心· 2025-11-06 05:28
Core Insights
- The article discusses Test-Time Scaling (TTS) as a method to enhance the reasoning capabilities of large language models (LLMs) by allocating more computational resources during the model's response phase [4][6]
- It introduces TrajSelector, a lightweight yet powerful Best-of-N strategy that leverages the hidden states of large models to evaluate reasoning paths without expensive process annotations or large reward models [7][10]

Summary by Sections

Research Background
- TTS is categorized into internal and external methods, with the latter focusing on parallel reasoning that generates multiple candidate paths before settling on a final answer [4][6]

Existing Methods and Their Limitations
- Traditional Best-of-N methods include Majority Voting and the Process Reward Model (PRM), both of which have significant drawbacks such as instability and inefficiency [5][10]; a minimal voting baseline is sketched after this summary

TrajSelector Methodology
- TrajSelector operates through a three-step pipeline: parallel sampling, step scoring, and aggregation, selecting the optimal reasoning path (see the second sketch below) [12][14]
- It uses a lightweight scoring model (0.6B parameters) that assesses reasoning steps from the hidden states of a larger policy model, achieving better scoring performance at a fraction of the parameter count [13][14]

Training Approach
- TrajSelector employs a weak-supervision training scheme that eliminates the need for extensive manual annotation, allowing the model to learn effectively from large datasets [16][17]

Experimental Results
- The article reports performance for a range of N values on Best-of-N tasks, showing that TrajSelector outperforms traditional methods across multiple benchmarks [19][20]

Conclusion
- TrajSelector offers a significant advance in optimizing reasoning for large models, emphasizing that effectively utilizing existing model capabilities matters more than merely increasing model size [22][23]
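For reference, here is a minimal sketch of the Majority Voting baseline mentioned above: sample N answers and keep the most frequent one. The `samples` list stands in for N decoded final answers; this illustrates only the voting rule, not code from the paper.

```python
from collections import Counter

def majority_vote(samples: list[str]) -> str:
    """Return the most frequent final answer among N sampled answers."""
    return Counter(samples).most_common(1)[0][0]

print(majority_vote(["42", "41", "42", "42", "40"]))  # prints "42"
```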
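And a minimal, self-contained sketch of the three-step pipeline the summary describes (parallel sampling, step scoring, aggregation). The `Verifier` head and `sample_trajectory` function are toy stand-ins invented for illustration: the real method scores steps with a 0.6B model reading the policy LLM's hidden states, whereas this sketch uses random hidden states and a linear head.

```python
import torch
import torch.nn as nn

HIDDEN = 64  # toy hidden size; a real policy model's is far larger

class Verifier(nn.Module):
    """Lightweight scoring head: pooled hidden state -> scalar step score."""
    def __init__(self, hidden: int = HIDDEN):
        super().__init__()
        self.score = nn.Linear(hidden, 1)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.score(h).squeeze(-1)

def sample_trajectory(prompt: str):
    """Toy stand-in for the policy LLM: returns (answer text, one pooled
    hidden state per reasoning step)."""
    n_steps = torch.randint(3, 8, ()).item()
    states = torch.randn(n_steps, HIDDEN)
    return f"answer-for-{prompt}-{torch.rand(()).item():.3f}", states

@torch.no_grad()
def best_of_n(prompt: str, verifier: Verifier, n: int = 8) -> str:
    best_text, best_score = "", float("-inf")
    for _ in range(n):                          # 1) parallel sampling
        text, states = sample_trajectory(prompt)
        step_scores = verifier(states)          # 2) step scoring from hidden states
        traj_score = step_scores.mean().item()  # 3) aggregation (mean over steps)
        if traj_score > best_score:
            best_text, best_score = text, traj_score
    return best_text

print(best_of_n("2+2=?", Verifier()))
```

Mean aggregation is one simple choice here; the key design point the summary highlights is that the scorer reuses the large model's hidden states rather than rerunning a separate large reward model.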
In video generation, 1.3B crushes 14B; in image generation, results approach GPT-4o! HKUST and Kuaishou open-source a new test-time scaling paradigm
机器之心· 2025-06-10 03:58
The paper's first author is He Haoran, a second-year PhD student at the Hong Kong University of Science and Technology (HKUST), whose research interests include reinforcement learning, generative flow networks (GFlowNets), and embodied intelligence. The corresponding author is Pan Ling, an assistant professor in HKUST's Department of Electronic and Computer Engineering and Department of Computer Science and Engineering.

Test-Time Scaling has greatly boosted the performance of large language models, producing breakout hits such as OpenAI's o-series models and DeepSeek R1. So what is test-time scaling in the visual domain, and how should it be defined?

To answer this question, HKUST and Kuaishou's Kling team recently introduced Evolutionary Search (EvoSearch), a method that substantially improves a model's generation quality by increasing inference-time compute. It supports both image and video generation and works with today's most advanced diffusion-based and flow-based models. EvoSearch requires no training and no gradient updates, yet achieves clearly superior results across a range of tasks, and exhibits strong scaling-up ability, robustness, and generalization.

As test-time compute increases, EvoSearch shows that SD2.1 and Flux.1-dev have the potential to match or even surpass GPT-4o. For video generation, Wan 1.3B can even surpass Wan 14B ...
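As a rough illustration of the idea, here is a minimal evolutionary-search loop in the spirit of EvoSearch: evolve a population of initial noise latents by scoring decoded samples, keeping the best, and mutating them, with no training or gradient updates. The `decode` and `reward` functions are toy stand-ins for a diffusion/flow sampler and a quality reward model; nothing below is the team's released code.

```python
import torch

def decode(latent: torch.Tensor) -> torch.Tensor:
    """Toy stand-in for a diffusion/flow sampler mapping noise -> sample."""
    return torch.tanh(latent)

def reward(sample: torch.Tensor) -> float:
    """Toy quality score; in practice an aesthetic/alignment reward model."""
    return -(sample - 0.5).pow(2).mean().item()

@torch.no_grad()
def evo_search(dim=256, pop=16, elite=4, steps=10, sigma=0.3):
    population = torch.randn(pop, dim)  # initial noise latents
    for _ in range(steps):
        # Score every candidate, select the top-`elite` as parents,
        # then produce the next generation by Gaussian mutation.
        scores = torch.tensor([reward(decode(z)) for z in population])
        parents = population[scores.topk(elite).indices]
        population = parents.repeat(pop // elite, 1) + sigma * torch.randn(pop, dim)
    scores = torch.tensor([reward(decode(z)) for z in population])
    return decode(population[scores.argmax()])

best_sample = evo_search()
```

Since only forward passes through the sampler and the reward are needed, spending more test-time compute just means a larger population or more generations, which matches the scaling-up behavior the article describes.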