New MIT Research: Adding Noise to Large Models Can Replace GRPO/PPO Tuning
量子位· 2026-03-16 06:11
Core Viewpoint
- A new paper from MIT suggests that by simply adding Gaussian noise to pre-trained models, performance can match or even exceed that of traditional tuning algorithms like GRPO/PPO, thus simplifying the tuning process significantly [1][3][7].

Group 1: Findings on Pre-trained Models
- The paper reveals that expert models already exist within the weight space of pre-trained models, described as a "Neural Thicket" phenomenon, where small perturbations can uncover task-specific experts [6][9][26].
- The authors propose a method called RandOpt, which involves adding Gaussian noise to large language models and ensembling the results, achieving comparable or superior performance on various tasks without complex tuning [7][35].
- Larger models exhibit better performance because denser regions of effective perturbations surround their weights, making it easier to find task-specific improvements [8][16][17].

Group 2: Mechanism of RandOpt
- RandOpt operates in two simple steps: randomly perturbing the model parameters to find "expert" versions, then using a voting mechanism to determine the best output from these models [28][32].
- The method allows for testing different noise strengths to ensure a variety of expert types are identified, and it can run multiple models simultaneously on different GPUs for efficiency [33][34].
- Initial results indicate that RandOpt achieves accuracy similar to or higher than mainstream tuning methods across various tasks, including language and vision-language models [35][38].

Group 3: Implications and Limitations
- The research emphasizes the need for high-quality pre-training, as the effectiveness of RandOpt relies on the model's initial training data [58].
- While RandOpt can enhance performance on specific tasks, it cannot enable the model to learn new skills beyond its pre-trained capabilities [58].
- The approach is best suited for tasks with clear answers, such as structured generation tasks, and may require further refinement for more complex tasks [59].
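The two-step mechanism described above (one-shot Gaussian perturbation, then majority voting over the perturbed copies) can be sketched in a toy form. This is a minimal illustration, not the paper's implementation: the real method perturbs LLM weights and votes over generated answers, whereas here a hypothetical "model" is just a list of float weights whose "answer" is a thresholded dot product.

```python
import random
from collections import Counter

def perturb(weights, sigma, rng):
    """Step 1: one-shot Gaussian perturbation of the weights.
    No iterations, no learning rate, no gradients."""
    return [w + rng.gauss(0.0, sigma) for w in weights]

def toy_predict(weights, x):
    """Hypothetical stand-in for a model's output on input x."""
    score = sum(w * xi for w, xi in zip(weights, x))
    return 1 if score > 0 else 0

def randopt_vote(base_weights, x, sigma=0.01, n_experts=16, seed=0):
    """Step 2: ensemble the perturbed copies by majority vote."""
    rng = random.Random(seed)
    votes = [toy_predict(perturb(base_weights, sigma, rng), x)
             for _ in range(n_experts)]
    return Counter(votes).most_common(1)[0][0]

base = [0.5, -0.2, 0.1]   # pre-trained weights (toy)
x = [1.0, 1.0, 1.0]
print(randopt_vote(base, x))
```

In a real deployment, each perturbed copy could run on its own GPU and `sigma` would be swept across a few values to surface different kinds of "experts", as the summary notes.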
Is RL Dead in Post-Training? New MIT Algorithm Challenges Conventional Post-Training Thinking, Shared by Saining Xie
机器之心· 2026-03-15 06:00
机器之心 Editorial Team

This finding has disruptive implications for how we understand the parameter space of large models. As early as 2001, Schmidhuber et al. argued that "random guessing" cannot count as an effective learning algorithm, holding that "good solutions must be extremely sparse in weight space." However, Gan and Isola's research reveals a counterintuitive phenomenon: after pre-training, an LLM's weight space actually forms a dense "Neural Thicket," a state in which simple random sampling can discover effective solutions.

The paper points out that a pre-trained model is not merely the "starting point" for post-training: a large number of task experts already lie hidden within its weight space. As model scale grows, the density of these experts in weight space increases sharply, enough for random perturbation and ensembling to effectively capture superior solutions.

Building on this theory, RandOpt's procedure is extremely simple: add single-step Gaussian noise to a pre-trained model (no iterations, learning rates, or gradient computations required) and ensemble multiple perturbed copies of the model. Experiments show that this minimal operation alone lets the model match, or even exceed, the performance of traditional post-training methods such as PPO or GRPO on complex tasks like mathematical reasoning and code generation.

In current LLM development, the post-training stage is usually viewed as the key step for endowing models with specific capabilities. The traditional view holds that models ...