OpenEvolve
AI discovers a new MoE algorithm in five hours: 5x faster than the human-designed algorithm, with costs down 26%
36Kr · 2025-10-24 13:03
Core Insights
- The article discusses advances in AI-driven algorithm creation, highlighting ADRS (AI-Driven Research for Systems), a system developed by a UC Berkeley research team whose discovered algorithms outperform human-designed ones by up to five times [1][2]

Group 1: Algorithm Efficiency
- The ADRS framework, built on OpenEvolve, has delivered significant algorithm-performance improvements across various fields, achieving up to 5x higher operational efficiency or a 26% reduction in costs [2]
- OpenEvolve discovered a new heuristic that replaces a traditional linear-search method, cutting runtime to 3.7 milliseconds, a fivefold improvement over previous implementations [12]

Group 2: Expert Parallelism Load Balancer (EPLB)
- The EPLB algorithm optimizes load balancing among expert networks in large language models (LLMs) by dynamically adjusting the distribution of experts across GPUs, minimizing load imbalance and maximizing system throughput [6]
- The EPLB algorithm operates in three phases: distributing experts to balance load, creating replicas for hotspot experts, and assigning those replicas to GPUs [6]
- The research team evaluated existing methods, including a greedy "bin packing" strategy, and found them slower and less efficient than the new EPLB approach [7]

Group 3: Research Team and Contributions
- The research team includes Audrey Cheng, Shu Liu, and Melissa Pan, who focus on improving system performance through innovative scheduling algorithms and large-scale machine learning [14][16][17]
- The article also references a related development in AI, in which a meta-learning algorithm was created to discover new reinforcement-learning algorithms, pointing to a broader trend of AI innovation in algorithm design [20][22]
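The three-phase balancing procedure described above (spread experts, replicate hotspots, place replicas on GPUs) can be sketched as follows. This is a minimal illustration under assumed heuristics, not DeepSeek's actual EPLB implementation: the greedy least-loaded placement and the function name `balance_experts` are stand-ins.

```python
def balance_experts(loads, num_gpus, num_extra_replicas):
    """Toy three-phase expert load balancer (illustrative, not the real EPLB).

    loads: per-expert load estimates (e.g. recent token counts).
    Phase 1: rank experts by load, heaviest first.
    Phase 2: grant extra replicas to the hottest ("hotspot") experts,
             splitting each expert's load evenly across its replicas.
    Phase 3: greedily assign replicas to the least-loaded GPU.
    """
    # Phase 1: heaviest experts first.
    experts = sorted(range(len(loads)), key=lambda e: -loads[e])

    # Phase 2: hotspot replication (extra replicas go to the heaviest experts).
    replica_counts = {e: 1 for e in experts}
    for e in experts[:num_extra_replicas]:
        replica_counts[e] += 1
    replicas = []
    for e in experts:
        share = loads[e] / replica_counts[e]
        replicas.extend([(e, share)] * replica_counts[e])

    # Phase 3: greedy "least-loaded bin" placement of replicas onto GPUs.
    # (A production balancer would also avoid co-locating replicas of the
    # same expert; this sketch does not.)
    gpu_loads = [0.0] * num_gpus
    placement = [[] for _ in range(num_gpus)]
    for expert, share in sorted(replicas, key=lambda r: -r[1]):
        g = min(range(num_gpus), key=lambda i: gpu_loads[i])
        gpu_loads[g] += share
        placement[g].append(expert)
    return placement, gpu_loads
```

A placement can then be scored with an imbalance metric such as `max(gpu_loads) / mean(gpu_loads)`, which is the kind of objective a search over balancing heuristics would try to drive toward 1.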
AI discovers a new MoE algorithm in five hours: 5x faster than the human-designed algorithm, with costs down 26%
QbitAI (量子位) · 2025-10-24 07:50
Core Insights
- The article discusses advances in AI-driven algorithm creation, highlighting a new system called ADRS (AI-Driven Research for Systems) whose generated algorithms outperform human-designed ones by up to 5 times [2][4]

Group 1: AI Algorithm Development
- The ADRS framework, built on OpenEvolve, has delivered significant algorithm-performance improvements across various fields, achieving up to 5x efficiency gains or 26% cost reductions compared to human-designed algorithms [4]
- The research team works with the mixture-of-experts (MoE) architecture in large language models (LLMs), which dynamically routes input tokens to specific expert networks to improve inference efficiency [6]

Group 2: Load Balancing Challenges
- A key challenge in this architecture is load balancing among experts: some experts can become "hotspots," leading to computational bottlenecks [7]
- The proposed solution is an Expert Parallelism Load Balancer (EPLB) that dynamically adjusts the distribution of experts across GPUs to minimize load imbalance and maximize system throughput [9][12]

Group 3: EPLB Algorithm Optimization
- The EPLB algorithm operates in three phases: determining the required number of expert replicas, mapping the replicas to specific GPUs, and optimizing the load distribution [10]
- The research team compared its EPLB algorithm against two baseline methods and found the existing solutions slower and less efficient at achieving load balance [13][14]

Group 4: OpenEvolve Implementation
- The team used OpenEvolve to search for an optimized EPLB algorithm, with an objective of maximizing load balance while minimizing rebalancing time [17][18]
- The evolutionary process ran for 300 iterations and produced a new heuristic that reduced rebalancing time to 3.7 milliseconds, a 5-fold performance improvement over internal benchmarks [25]
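The search described in Group 4 (propose a candidate algorithm, score it, keep the best over 300 iterations) can be sketched as a minimal evaluate-and-mutate loop. In OpenEvolve the candidate is a program and the mutation step is an LLM rewriting its code; here both are abstract callables, and the demo fitness function is a toy stand-in, not the EPLB objective.

```python
import random

def evolve(initial, mutate, fitness, iterations=300, seed=0):
    """Minimal evolutionary search loop in the spirit of OpenEvolve.

    Keeps the best-scoring candidate seen so far (higher fitness is
    better). In a real ADRS/OpenEvolve run, `fitness` would measure
    load balance and rebalancing time of a candidate EPLB program.
    """
    rng = random.Random(seed)
    best, best_score = initial, fitness(initial)
    for _ in range(iterations):
        candidate = mutate(best, rng)   # propose a variation of the incumbent
        score = fitness(candidate)
        if score > best_score:          # greedy acceptance of improvements
            best, best_score = candidate, score
    return best, best_score

# Toy demo: maximize -(x - 3)^2, i.e. search for x near 3.
best, score = evolve(
    initial=0.0,
    mutate=lambda x, rng: x + rng.uniform(-1, 1),
    fitness=lambda x: -(x - 3.0) ** 2,
)
```

The greedy hill-climb here is the simplest possible acceptance rule; evolutionary frameworks typically keep a population and sometimes accept worse candidates to escape local optima.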
Group 5: Broader Implications
- The article also references a related development in AI, in which a meta-learning algorithm was created to discover new reinforcement-learning algorithms, further emphasizing AI's capability to innovate independently [35][38]
AI roundup: SJTU's AI agent shines, AlphaEvolve-generated code overtakes humans
China Post Securities · 2025-07-08 14:03
Quantitative Models and Construction Methods

Model Name: ML-Master
- **Model Construction Idea**: The ML-Master model simulates human expert cognitive strategies, addressing the three major bottlenecks in existing AI4AI systems: low exploration efficiency, limited reasoning ability, and module fragmentation[12]
- **Model Construction Process**:
  - **Balanced Multi-Trajectory Exploration Module**: Uses a parallelized Monte Carlo tree search to model the AI development process as a dynamic decision tree, with each node representing a potential solution state. The module dynamically allocates computing resources across the branches of 75 Kaggle tasks according to their potential value, avoiding local optima and raising the medal rate on medium-difficulty tasks to 20.2%, 2.2 times the baseline method[13]
  - **Controllable Reasoning Module**: Overcomes the static decision limitations of large language models by filtering key code fragments, performance metrics, and cross-node insights from historical explorations through an adaptive memory mechanism. This grounds the reasoning process in verifiable execution feedback rather than probabilistic guesses, improving high-difficulty task performance by 30%, well above the 18.7% of Microsoft's system[13]
  - **Adaptive Memory Mechanism**: Integrates the exploration and reasoning modules into a closed-loop evolution system. Code-execution results collected during the exploration phase are intelligently filtered and embedded into the reasoning model's "think" phase, and the optimized solutions from the reasoning output guide subsequent exploration paths. This dual empowerment allows ML-Master to reach Grandmaster level among the top 259 global Kaggle participants after 900 machine hours of training, with solution quality improving by 120% over multiple iterations[15]
- **Model Evaluation**: The ML-Master model demonstrates significant advantages in exploration efficiency, reasoning ability, and module integration, making it a leading system in the AI4AI field[12][13][15]

Model Backtesting Results
- **ML-Master**:
  - **Average Medal Rate**: 29.3%[12]
  - **Effective Submission Rate**: 93.3%[19]
  - **Task Performance**: 44.9% of tasks outperform more than half of human participants, and 17.3% of tasks win gold medals[19]

Quantitative Factors and Construction Methods

Factor Name: OpenEvolve
- **Factor Construction Idea**: OpenEvolve autonomously evolves code, achieving significant performance improvements on GPU kernel optimization tasks[22]
- **Factor Construction Process**:
  - **Algorithm Layer**: Over 25 generations of evolutionary iteration, OpenEvolve autonomously discovered three key optimization strategies. For example, its SIMD optimization for Apple Silicon showed a precise grasp of the hardware's characteristics, exactly matching the hardware's SIMD width when processing 128-dimensional attention heads[23]
  - **Technical Implementation**: Uses a multi-model collaborative evolutionary architecture. The main model, Gemini-2.5-Flash, handles rapid exploration, while the auxiliary model, Gemini-2.5-Pro, performs deep optimization. The system divides the Metal kernel source code into evolvable blocks, keeps the integration code with the MLX framework unchanged, and evolves five subpopulations in parallel using the island model, with a population size of 25 individuals per generation[24]
  - **Performance Evaluation**: The evaluation phase adopts a high-robustness design, including Metal command-buffer protection, memory-access-violation handling, and exponential-backoff retry mechanisms, so the system can attempt aggressive optimizations without risking crashes[25]
- **Factor Evaluation**: OpenEvolve redefines the boundary of human-machine collaboration, demonstrating AI's potential to autonomously explore optimization paths that require deep professional knowledge[22][23][24]

Factor Backtesting Results
- **OpenEvolve**:
  - **Average Performance Improvement**: 12.5% in decoding speed, 14.4% in prefill speed, and 10.4% in overall throughput[25]
  - **Peak Performance Improvement**: 106% in decoding speed on repetitive-pattern generation tasks[25]
  - **Accuracy and Error Rate**: Maintains 100% numerical accuracy with zero GPU errors[25]
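The island-model setup described above, five subpopulations evolving in parallel with 25 individuals per generation, can be sketched as follows. This is a toy illustration with an assumed ring-migration policy and a stand-in numeric fitness; the real system evolves Metal kernel code blocks scored by benchmark runs.

```python
import random

def island_evolve(fitness, mutate, init, n_islands=5, pop_size=25,
                  generations=25, migrate_every=5, seed=0):
    """Toy island-model evolution: independent subpopulations with
    periodic migration of each island's champion to its ring neighbor."""
    rng = random.Random(seed)
    islands = [[init(rng) for _ in range(pop_size)] for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        for pop in islands:
            # Mutate every individual, then keep the best pop_size
            # (truncation selection over parents + children).
            pop.extend(mutate(ind, rng) for ind in list(pop))
            pop.sort(key=fitness, reverse=True)
            del pop[pop_size:]
        if gen % migrate_every == 0:
            # Ring migration: each island's best replaces the next
            # island's worst, spreading good solutions between islands.
            champions = [pop[0] for pop in islands]
            for i, pop in enumerate(islands):
                pop[-1] = champions[(i - 1) % n_islands]
    return max((ind for pop in islands for ind in pop), key=fitness)

# Toy demo: search for x near 7 by maximizing -(x - 7)^2.
best = island_evolve(
    fitness=lambda x: -(x - 7.0) ** 2,
    mutate=lambda x, rng: x + rng.gauss(0, 0.5),
    init=lambda rng: rng.uniform(0, 1),
)
```

Keeping the islands mostly isolated preserves diversity (each subpopulation can pursue a different optimization strategy), while occasional migration lets a strong variant spread, which is the usual rationale for the island model.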