New MIT Study: Adding Noise to Large Models Can Replace GRPO/PPO Tuning
量子位 · 2026-03-16 06:11
Core Viewpoint
- A new paper from MIT suggests that simply adding Gaussian noise to pre-trained models can match or even exceed the performance of traditional tuning algorithms like GRPO/PPO, significantly simplifying the tuning process [1][3][7].

Group 1: Findings on Pre-trained Models
- The paper finds that expert models already exist within the weight space of pre-trained models, a phenomenon the authors call the "Neural Thicket": small perturbations can uncover task-specific experts [6][9][26].
- The authors propose a method called RandOpt, which adds Gaussian noise to large language models and aggregates the results, achieving comparable or superior performance on various tasks without complex tuning [7][35].
- Larger models perform better because denser regions of effective perturbations surround their weights, making task-specific improvements easier to find [8][16][17].

Group 2: Mechanism of RandOpt
- RandOpt operates in two simple steps: randomly perturb the model parameters to find "expert" versions, then use a voting mechanism to select the best output from these models [28][32].
- The method tests different noise strengths so that a variety of expert types are identified, and it can run multiple models simultaneously on different GPUs for efficiency [33][34].
- Initial results indicate that RandOpt achieves accuracy similar to or higher than mainstream tuning methods across a range of tasks, including language and vision-language models [35][38].

Group 3: Implications and Limitations
- The research emphasizes the need for high-quality pre-training, as RandOpt's effectiveness relies on the model's initial training data [58].
- While RandOpt can enhance performance on specific tasks, it cannot teach the model new skills beyond its pre-trained capabilities [58].
- The approach is best suited for tasks with clear answers, such as structured generation tasks, and may require further refinement for more complex tasks [59].
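The two-step recipe described above (perturb, then vote) can be sketched in miniature. This is a hypothetical illustration, not the paper's implementation: the "model" here is just a weight vector scored on a toy sign-prediction task, and the names `perturb`, `predict`, and `majority_vote` are made up for the sketch.

```python
import random
from collections import Counter

def perturb(weights, sigma, rng):
    """Step 1: sample one candidate 'expert' by adding Gaussian noise."""
    return [w + rng.gauss(0.0, sigma) for w in weights]

def predict(weights, x):
    """Toy forward pass: sign of a dot product stands in for a model output."""
    s = sum(w * xi for w, xi in zip(weights, x))
    return 1 if s >= 0 else 0

def majority_vote(experts, x):
    """Step 2: aggregate the perturbed models' outputs by voting."""
    votes = Counter(predict(w, x) for w in experts)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
base = [0.5, -0.2, 0.1]  # pre-trained weights (toy stand-in)

# Try several noise strengths, as the article notes, so that different
# kinds of experts can surface; each sigma yields a pool of candidates.
experts = [perturb(base, sigma, rng)
           for sigma in (0.01, 0.05, 0.1)
           for _ in range(5)]

answer = majority_vote(experts, [1.0, 0.3, -0.2])
print(answer)
```

In a real setting, each perturbed copy could be evaluated independently (e.g. one per GPU, as the article mentions), which is what makes the approach parallelizable compared with gradient-based tuning.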