Latest from the Transformer co-author's startup: an open-source framework breaks the evolutionary-computation bottleneck, boosting sample efficiency by tens of times
量子位·2025-09-28 11:54

Core Insights
- The article discusses the launch of an open-source framework called ShinkaEvolve, developed by Sakana AI, which significantly enhances sample efficiency across a range of computational tasks, achieving results that previously required thousands of evaluations with only 150 samples [1][3][22].

Group 1: Framework Overview
- ShinkaEvolve allows large language models (LLMs) to optimize their own code while maintaining efficiency, likened to equipping evolutionary computation with an "acceleration engine" [3][6].
- The framework demonstrates performance comparable to Google's AlphaEvolve, but with higher sample efficiency and open-source accessibility [6][22].

Group 2: Key Innovations
- The framework incorporates three major architectural innovations that enhance its performance across tasks such as mathematical optimization, agent design, and competitive programming [5][11].
- The first innovation is a parent sampling technique that balances exploration and exploitation through a layered strategy and multi-method integration [11][13].
- The second innovation is a novelty rejection sampling method that reduces wasted computation by filtering out low-novelty variants with a two-tier mechanism [14][16].
- The third innovation is a multi-armed bandit LLM selection strategy based on the UCB1 algorithm, which dynamically schedules LLMs according to their performance in different task phases [17][18].

Group 3: Performance Validation
- In mathematical optimization, ShinkaEvolve achieved a significant breakthrough, requiring only 150 evaluations to optimize the placement of 26 circles within a unit square, compared to the thousands needed by AlphaEvolve [20][22].
- In agent design, experiments showed that ShinkaEvolve outperformed baseline models on mathematical reasoning problems, reaching maximum performance with just seven LLM queries [23][25].
- In competitive programming benchmarks, ShinkaEvolve improved average scores by 2.3% across ten AtCoder problems, demonstrating its effectiveness without extensive code restructuring [28].
- The framework also excelled at designing load-balancing loss functions for mixture-of-experts models, showing higher accuracy and lower perplexity across multiple downstream tasks [30][32].
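The parent sampling idea described above — choosing which archived program to mutate next while balancing exploration and exploitation — can be sketched with a simple fitness-weighted (softmax) draw. This is a minimal illustration, not ShinkaEvolve's actual sampler: the article only says it uses a layered strategy with multi-method integration, so the softmax weighting and the `sample_parent` helper below are assumptions for demonstration.

```python
import math
import random

def sample_parent(archive, temperature=1.0, rng=random):
    """Fitness-weighted parent sampling (illustrative sketch).

    archive: list of (program, fitness) pairs.
    Low temperature concentrates picks on the fittest parents
    (exploitation); high temperature flattens toward a uniform
    draw (exploration).
    """
    fits = [f for _, f in archive]
    m = max(fits)  # subtract max for numerical stability
    weights = [math.exp((f - m) / temperature) for f in fits]
    r = rng.random() * sum(weights)
    acc = 0.0
    for (prog, _), w in zip(archive, weights):
        acc += w
        if r <= acc:
            return prog
    return archive[-1][0]

# With a very low temperature, the best program is picked almost surely.
random.seed(1)
parent = sample_parent([("slow_v1", 0.0), ("fast_v2", 100.0)], temperature=0.01)
```

A layered (island-style) variant would simply run this draw within one of several sub-archives chosen first.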
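The two-tier novelty rejection mechanism can likewise be sketched: a cheap similarity check rejects near-duplicates outright, and only borderline cases are escalated to an expensive judge (an LLM in ShinkaEvolve's case). The concrete metric and thresholds below are assumptions — Jaccard token overlap stands in for the embedding similarity a real system would use, and `judge` is a hypothetical callable.

```python
def jaccard(a: str, b: str) -> float:
    """Tier-1 cheap similarity: Jaccard overlap of token sets."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def is_novel(candidate: str, archive: list,
             reject_at: float = 0.9, escalate_at: float = 0.7,
             judge=None) -> bool:
    """Two-tier novelty filter (illustrative sketch).

    Tier 1: similarity >= reject_at  -> reject without any LLM call.
    Tier 2: borderline similarity    -> ask the expensive judge,
            a stand-in for ShinkaEvolve's LLM-based check.
    """
    for prog in archive:
        sim = jaccard(candidate, prog)
        if sim >= reject_at:
            return False                    # near-duplicate: cheap reject
        if sim >= escalate_at and judge is not None:
            if not judge(candidate, prog):  # expensive tie-breaker
                return False
    return True

archive = ["def f(x): return x + 1"]
novel = is_novel("def g(y): return y * y", archive)       # dissimilar
dup = is_novel("def f(x): return x + 1", archive)         # exact duplicate
```

Filtering this way is what saves evaluations: mutations that would re-test an already-explored program never reach the (costly) fitness evaluation stage.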
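The UCB1-based LLM scheduling is the most standard of the three innovations, so it can be shown faithfully: each LLM is a bandit arm, and UCB1 trades off the average reward an LLM has delivered against how rarely it has been tried. The reward definition (fitness gain of generated programs) and the toy success rates below are assumptions for illustration.

```python
import math
import random

def ucb1_select(counts, rewards, total_pulls, c=math.sqrt(2)):
    """Return the index of the LLM maximizing the UCB1 score.

    counts[i]  -- times LLM i has been chosen so far
    rewards[i] -- cumulative reward credited to LLM i (e.g. fitness
                  improvement of the programs it produced)
    """
    best_arm, best_score = 0, float("-inf")
    for i, n in enumerate(counts):
        if n == 0:
            return i  # try every LLM at least once
        score = rewards[i] / n + c * math.sqrt(math.log(total_pulls) / n)
        if score > best_score:
            best_arm, best_score = i, score
    return best_arm

# Toy run: three hypothetical LLMs with hidden success rates.
random.seed(0)
true_rates = [0.1, 0.3, 0.9]
counts, rewards = [0, 0, 0], [0.0, 0.0, 0.0]
for t in range(200):
    arm = ucb1_select(counts, rewards, total_pulls=t + 1)
    counts[arm] += 1
    rewards[arm] += 1.0 if random.random() < true_rates[arm] else 0.0
# counts now records how often each LLM was scheduled
```

Because the exploration bonus shrinks as an arm is pulled, the scheduler naturally shifts queries toward whichever LLM is paying off in the current phase of the task — matching the article's description of phase-dependent scheduling.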
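For the circle-packing result, the key point is that each of the 150 evaluations is a call to a fixed scoring function over a candidate placement. A minimal evaluator is sketched below, assuming the standard objective of maximizing the sum of radii of non-overlapping circles inside the unit square (the article does not state the exact objective, so `packing_score` is an assumption).

```python
import math

def packing_score(circles):
    """Score a candidate packing: sum of radii if feasible, else -inf.

    circles: list of (x, y, r) triples. A packing is feasible iff every
    circle lies fully inside the unit square and no two circles overlap.
    """
    for x, y, r in circles:
        if r <= 0 or x - r < 0 or x + r > 1 or y - r < 0 or y + r > 1:
            return float("-inf")  # out of bounds
    for i in range(len(circles)):
        for j in range(i + 1, len(circles)):
            xi, yi, ri = circles[i]
            xj, yj, rj = circles[j]
            if math.hypot(xi - xj, yi - yj) < ri + rj - 1e-12:
                return float("-inf")  # circles i and j overlap
    return sum(r for _, _, r in circles)

# Two quarter-radius circles on opposite corners: feasible, score 0.5.
score = packing_score([(0.25, 0.25, 0.25), (0.75, 0.75, 0.25)])
```

The evolutionary loop then just proposes code that emits 26 such triples and keeps whatever scores highest — which is why cutting evaluations from thousands to 150 is the headline efficiency gain.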