No SFT, no RL: SLOT, a powerful sample-level inference-time optimization method, boosts accuracy by +10% with ease

Core Viewpoint - The article discusses SLOT (Sample-specific Language Model Optimization at Test-time), a method developed by the MAPLE lab at Westlake University that lets a language model "temporarily learn" from each individual prompt during inference, yielding significant performance gains on complex tasks [1][2][10].

Group 1: Methodology
- SLOT treats each input prompt as a small piece of training data, so the model can better understand the specific question before it generates an answer [2][10].
- The method is simple: it optimizes only a lightweight parameter vector (delta) at the model's last layer, with minimal computational overhead (inference time rises by only 7.9%) [5][12].
- The delta is fitted by minimizing the cross-entropy loss on the prompt itself, which adapts the model efficiently without modifying the original weights [12][19].

Group 2: Performance Improvements
- On the GSM8K math reasoning benchmark, Qwen2.5-7B's accuracy rose from 57.54% to 66.19%, a gain of 8.65 percentage points [7].
- DeepSeek-R1-Distill-Llama-70B set a new record of 68.69% on GPQA Diamond, further evidence that SLOT is effective across different models [7][21].
- On challenging benchmarks such as AIME 2024, several models improved by more than 10% [7][22].

Group 3: Broader Implications
- SLOT delivers stable gains across model sizes and families, from 1.5B to 70B parameters, indicating broad applicability [18][20].
- The method encourages deeper reasoning by reshaping the probability distribution over the output vocabulary, favoring thoughtful responses over superficial pattern matching [17][19].
- Unlike traditional fine-tuning, SLOT requires no large training dataset, no complex sampling strategies, and no substantial compute, making it a more accessible way to improve model performance [18][19].
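The optimization step described under Methodology (fit a delta vector at the last layer by minimizing cross-entropy on the prompt, then generate with it) can be illustrated with a toy, dependency-free sketch. It is not the MAPLE lab's code. Assumptions made for illustration: random "last-layer" hidden states stand in for a frozen transformer's output, a random matrix stands in for the LM head, plain gradient descent replaces whatever optimizer the authors use, and `steps=5`, `lr=0.1` are arbitrary. The helper names (`softmax`, `lm_head`, `prompt_loss_and_grad`, `slot_optimize`) are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lm_head(hidden, delta, weight):
    """Logits for one position: (hidden + delta) @ weight, where weight is d x V."""
    shifted = [h + dv for h, dv in zip(hidden, delta)]
    vocab = len(weight[0])
    return [sum(shifted[i] * weight[i][v] for i in range(len(shifted)))
            for v in range(vocab)]


def prompt_loss_and_grad(hiddens, tokens, delta, weight):
    """Mean next-token cross-entropy on the prompt and its gradient w.r.t. delta.

    Position t predicts tokens[t + 1]; the same delta is added at every position.
    """
    dim = len(delta)
    n_pred = len(tokens) - 1
    loss = 0.0
    grad = [0.0] * dim
    for t in range(n_pred):
        probs = softmax(lm_head(hiddens[t], delta, weight))
        target = tokens[t + 1]
        loss -= math.log(probs[target])
        # dLoss/dlogits = probs - one_hot(target); chain rule through (h + delta) @ W.
        err = list(probs)
        err[target] -= 1.0
        for i in range(dim):
            grad[i] += sum(weight[i][v] * err[v] for v in range(len(err)))
    return loss / n_pred, [g / n_pred for g in grad]


def slot_optimize(hiddens, tokens, weight, steps=5, lr=0.1):
    """Fit a sample-specific delta on the prompt; weight and hiddens stay frozen.

    Assumption: plain gradient descent with hypothetical steps/lr values.
    """
    delta = [0.0] * len(hiddens[0])
    for _ in range(steps):
        _, grad = prompt_loss_and_grad(hiddens, tokens, delta, weight)
        delta = [x - lr * g for x, g in zip(delta, grad)]
    return delta


# Toy prompt: random "last-layer" hidden states and a random LM head.
rng = random.Random(0)
d_model, vocab = 8, 20
weight = [[rng.gauss(0, 1 / math.sqrt(d_model)) for _ in range(vocab)]
          for _ in range(d_model)]
tokens = [rng.randrange(vocab) for _ in range(12)]
hiddens = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in tokens]

loss_before, _ = prompt_loss_and_grad(hiddens, tokens, [0.0] * d_model, weight)
delta = slot_optimize(hiddens, tokens, weight)
loss_after, _ = prompt_loss_and_grad(hiddens, tokens, delta, weight)

# Generation then reuses the same delta when scoring the next token.
next_logits = lm_head(hiddens[-1], delta, weight)
next_token = max(range(vocab), key=lambda v: next_logits[v])
print(f"prompt loss {loss_before:.4f} -> {loss_after:.4f}, next token {next_token}")
```

Running it prints the prompt loss before and after fitting delta. Only `delta` changes while `weight` and `hiddens` are left untouched, mirroring the claim that the original model is not modified. Because delta sits after all the frozen layers, the sketch computes the prompt's hidden states once and re-runs only the cheap LM-head projection at each step, which is one plausible reason such an approach can keep its overhead small.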