MIPROv2

Search documents
当提示词优化器学会进化,竟能胜过强化学习
机器之心· 2025-07-31 08:58
Core Viewpoint - The article discusses the introduction of GEPA (Genetic-Pareto), a new optimization technique that outperforms the GRPO reinforcement learning algorithm by 20% while significantly reducing the number of rollouts to 1/35 of the original [2][39]. Group 1: GEPA Overview - GEPA employs a technique called reflective prompt evolution, which enhances the performance of composite AI systems [2][6]. - The core principles of GEPA include genetic prompt evolution, utilizing natural language feedback, and Pareto-based candidate selection [7][8]. Group 2: GEPA Algorithm - GEPA initializes a candidate pool with parameters from the composite AI system and iteratively proposes new candidates until the evaluation budget is exhausted [12][15]. - The optimization process involves mutation or crossover of existing candidates, allowing GEPA to accumulate learning signals and improve candidate performance over iterations [16][17]. Group 3: Reflective Feedback Mechanism - Natural language trajectories generated during the execution of the composite AI system provide insights into the reasoning steps, enabling diagnostic value for decision-making [19][20]. - GEPA utilizes these trajectories for implicit credit assignment, allowing targeted updates to modules based on their performance [21][22]. Group 4: Candidate Selection Strategy - GEPA employs a Pareto-based candidate selection strategy to avoid local optima and ensure a balance between exploration and exploitation [27][30]. - This strategy involves identifying candidates that have achieved the best scores across training tasks, filtering out strictly dominated candidates [31][32]. Group 5: Performance Evaluation - Experimental results show that GEPA consistently outperforms MIPROv2 and GRPO across various benchmarks, achieving improvements of up to 14.29% [42][39]. - GEPA demonstrates high sample efficiency, outperforming GRPO while requiring significantly fewer rollouts [39][41]. Group 6: Observations and Insights - The next candidate selection strategy significantly impacts optimization trajectories and final performance, with Pareto-based sampling showing clear advantages [43]. - Optimized prompts from GEPA are shorter and more efficient than few-shot demonstration prompts, enhancing computational efficiency [45]. - A unique system-aware crossover strategy, GEPA+Merge, yields additional performance gains by identifying complementary strategies from different optimization lineages [47].