Perturbing High-Entropy Words at Inference Time to Boost LLM Performance
机器之心·2025-10-29 01:07

Core Insights - The article discusses emerging research on test-time scaling for large language models (LLMs), highlighting a phenomenon of localized uncertainty during inference: a small number of high-entropy words disproportionately determine whether the output is correct [2][20].

Methodology - The research team from the Hong Kong University of Science and Technology (Guangzhou) proposed Minimal Test-Time Intervention (MTI), which comprises two methods: selective CFG intervention and lightweight negative-prompt guidance. MTI enhances the reasoning capabilities of LLMs at inference time without requiring any additional training [3][20].

Selective CFG Intervention - This method reduces the uncertainty of high-entropy words, which often destabilize multi-step reasoning. The team found that erroneous LLM responses exhibited higher entropy, driven primarily by high-entropy words. Applying classifier-free guidance (CFG) only to these words stabilizes the reasoning process while preserving efficiency and improving performance [7][8].

Lightweight Negative-Prompt Guidance - This approach reuses the key-value (KV) cache and injects a negative prompt, saving memory while maintaining a better unconditional space. The team observed that standard CFG requires a separate KV cache for the unconditional branch, which undermines the efficiency of modern LLM inference accelerators. Treating the unconditional branch as a negative-prompt channel instead improves performance while conserving resources [9][10].

Experimental Results - The team ran systematic tests across general tasks (WinoGrande, MMLU-Pro), coding tasks (HumanEval, HumanEval+, LiveCodeBench), and math and science tasks (GPQA-Diamond, MATH500). Applying MTI to only 3.5% of high-entropy words on the Qwen3-14B-Reasoning model yielded an average improvement of 1.58 points across all tasks [12][20].
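The selective CFG step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the entropy threshold, the guidance scale, and the assumption that conditional and unconditional logits are both available at each step are all illustrative choices.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(probs):
    # Shannon entropy (in nats) of a probability distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)

def selective_cfg(cond_logits, uncond_logits, threshold=1.0, scale=1.5):
    """Apply classifier-free guidance only when the conditional
    next-token distribution is high-entropy (the model is uncertain).
    `threshold` and `scale` are illustrative hyperparameters, not the
    paper's reported settings."""
    if entropy(softmax(cond_logits)) < threshold:
        # Low-entropy (confident) token: skip guidance entirely.
        return cond_logits
    # CFG: push logits away from the unconditional branch.
    return [u + scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]

# Confident distribution: guidance is skipped, logits pass through unchanged.
confident = [8.0, 0.0, 0.0, 0.0]
print(selective_cfg(confident, [1.0, 1.0, 1.0, 1.0]))

# Near-uniform (uncertain) distribution: guided logits differ from the raw ones.
uncertain = [1.2, 1.0, 0.9, 1.1]
print(selective_cfg(uncertain, [1.0, 1.0, 1.0, 1.0]))
```

Because the confident branch returns the conditional logits untouched, the intervention adds cost only at the small fraction of uncertain positions, which is what keeps the method cheap.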
Analysis of Findings - The study revealed that some low-entropy words are resistant to CFG, because the LLM is already highly confident in its output at those positions. This indicates that not all words need CFG intervention; the method mainly affects high-entropy words where the model lacks confidence [17][19].

Conclusion - Overall, the work demonstrates that a small number of high-entropy words can significantly influence the correctness of LLM outputs. The proposed MTI method, combining selective CFG intervention with lightweight negative-prompt guidance, is easy to implement and integrates with modern acceleration frameworks and diverse decoding strategies. It improves model performance across numerous tasks and opens new avenues for exploring the potential of LLMs during the reasoning phase [20].
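As a concrete illustration of which positions the intervention targets, here is a minimal sketch of selecting the top 3.5% of positions by next-token entropy (the fraction reported in the article). Quantile-based selection is an assumption for illustration; the method may instead use a fixed entropy threshold.

```python
import math

def entropy(logits):
    # Shannon entropy (in nats) of the softmax of one logit row.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return -sum((e / z) * math.log(e / z) for e in exps)

def select_high_entropy(logit_rows, fraction=0.035):
    """Return positions whose next-token entropy falls in the top
    `fraction` of the sequence. Quantile selection is an illustrative
    assumption, not necessarily the paper's exact criterion."""
    ents = [entropy(row) for row in logit_rows]
    k = max(1, round(fraction * len(logit_rows)))
    ranked = sorted(range(len(ents)), key=lambda i: ents[i], reverse=True)
    return sorted(ranked[:k])

# Toy sequence: position 2 is near-uniform (uncertain), the rest are confident.
rows = [[9.0, 0.0, 0.0], [7.0, 0.5, 0.0], [1.0, 1.1, 0.9], [8.0, 0.0, 1.0]]
print(select_high_entropy(rows, fraction=0.25))  # → [2]
```

Only the returned positions would receive the CFG perturbation; the model's confident predictions elsewhere are left untouched, consistent with the finding that low-entropy words resist CFG anyway.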