ACL 2025 | Why Do Your Prompts Succeed? A New Theory Reveals the Secrets and Effectiveness of LLM Prompt Design

Core Insights
- The article discusses the importance of prompt design in enhancing the performance of large language models (LLMs) during complex reasoning tasks, emphasizing that effective prompts can significantly improve model accuracy and efficiency [2][7][36]
- A theoretical framework is proposed to quantify the complexity of the prompt search space, transitioning prompt engineering from an empirical practice to a more scientific approach [5][35]

Group 1: Prompt Design and Its Impact
- The effectiveness of prompt engineering has historically been viewed as somewhat mystical, with certain combinations yielding significant performance boosts while others fall short [7]
- Prompts serve as critical "selectors" in the chain of thought (CoT) reasoning process, guiding the model in extracting relevant information from its internal hidden states [12][36]
- The study reveals that the choice of prompt templates directly influences the reasoning performance of LLMs, with optimal prompt designs leading to performance improvements exceeding 50% [29][36]

Group 2: Theoretical Framework and Experimental Evidence
- The research introduces a systematic approach to finding optimal prompts by breaking down the CoT reasoning process into two interconnected search spaces: the prompt space and the answer space [22][35]
- Experimental results demonstrate that the introduction of CoT mechanisms allows LLMs to perform recursive calculations, which are essential for tackling multi-step reasoning tasks [26][30]
- The study highlights that well-designed prompts can effectively dictate the output of each reasoning step, ensuring that only the most relevant information is utilized for subsequent calculations [28][36]

Group 3: Limitations and Future Directions
- The article notes that relying solely on generic prompts can severely limit the model's performance on complex tasks, indicating the need for tailored prompt designs [36]
- Variants of CoT, such as Tree-of-Thought (ToT) and Graph-of-Thought (GoT), can enhance performance but are still constrained by the underlying prompt templates used [32][33]
- The findings underscore the necessity for a deeper understanding of task requirements to design prompts that effectively guide LLMs in extracting and utilizing core information [23][35]
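
The "selector" role of prompts and the recursive computation that CoT enables can be illustrated with a toy sketch. This is not the paper's method or code: the names `step`, `run_cot`, and the feature keys `pos`, `parity`, and `last_bit` are hypothetical, and the "hidden state" is modeled as a plain dictionary. The assumption is that each reasoning step can only read what the previous step wrote out as text, and that the prompt template decides which hidden-state features get written.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical toy model: a "hidden state" is a dict of features computed in one step.
State = Dict[str, int]


def step(bits: List[int], written: State) -> State:
    """One simulated forward pass.

    The step sees the input bits plus ONLY the features written out by the
    previous step; anything not written is lost and falls back to 0.
    """
    pos = written.get("pos", 0)
    parity = written.get("parity", 0)
    bit = bits[pos]
    return {"pos": pos + 1, "parity": parity ^ bit, "last_bit": bit}


def run_cot(bits: List[int], template: Sequence[str]) -> Optional[int]:
    """Run one reasoning step per input bit.

    `template` plays the role of the prompt: it selects which hidden-state
    features are verbalized and therefore carried into the next step.
    Returns the final written parity, or None if parity was never written.
    """
    written: State = {}
    for _ in bits:
        hidden = step(bits, written)
        written = {key: hidden[key] for key in template}
    return written.get("parity")


bits = [1, 0, 1, 1]  # true parity is 1

good = run_cot(bits, ("pos", "parity"))      # carries exactly what the task needs
lossy = run_cot(bits, ("pos", "last_bit"))   # drops the running parity
stuck = run_cot(bits, ("parity",))           # drops the position, rereads bit 0
print(good, lossy, stuck)
```

In this sketch only the first template recovers the correct parity. The other two fail for different reasons: one discards the running result, the other discards the position and keeps reprocessing the first bit. This mirrors, in a very simplified form, the claim that the template determines which information survives between reasoning steps.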